Merge pull request #79 from alvarotap/patch-9

Some minor typos in chapter 7
Sylvain Gugger 2020-04-01 08:49:56 -04:00, committed by GitHub
commit a909a8cb5b

@@ -389,7 +389,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"As we have seen, the kinds of features that are learned by convolutional neural networks are not in any way specific to the size of the image — early layers find things like edges and gradients, and later layers may find things like noses and sunsets. So, when we change image size in the middle of training, it doesn't mean that we have two find totally different parameters for our model.\n",
"As we have seen, the kinds of features that are learned by convolutional neural networks are not in any way specific to the size of the image — early layers find things like edges and gradients, and later layers may find things like noses and sunsets. So, when we change image size in the middle of training, it doesn't mean that we have to find totally different parameters for our model.\n",
"\n",
"But clearly there are some differences between small images and big ones, so we shouldn't expect our model to continue working exactly as well, with no changes at all. Does this remind you of something? When we developed this idea, it reminded us of transfer learning! We are trying to get our model to learn to do something a little bit different to what it has learned to do before. Therefore, we should be able to use the `fine_tune` method after we resize our images.\n",
"\n",
@@ -803,7 +803,7 @@
"source": [
"#hide_input\n",
"#id mixup_example\n",
"#caption Mixing a chruch and a gas station\n",
"#caption Mixing a church and a gas station\n",
"#alt An image of a church, a gas station and the two mixed up.\n",
"church = PILImage.create(get_image_files_sorted(path/'train'/'n03028079')[0])\n",
"gas = PILImage.create(get_image_files_sorted(path/'train'/'n03425413')[0])\n",
@@ -910,7 +910,7 @@
"source": [
"Let's practice our paper reading skills to try to interpret this. \"This maximum\" is refering to the previous section of the paper, which talked about the fact that `1` is the value of the label for the positive class. So any value (except infinity) can't result in `1` after sigmoid or softmax. In a paper, you won't normally see \"any value\" written, but instead it would get a symbol; in this case, it's $z_k$. This is helpful in a paper, because it can be refered to again later, and the reader knows what value is being discussed.\n",
"\n",
"The it says: $z_y\\gg z_k$ for all $k\\neq y$. In this case, the paper immediately follows with \"that is...\", which is handy, because you can just read the English instead of the math. In the math, the $y$ is refering to the target ($y$ is defined earlier in the paper; sometimes it's hard to find where symbols are defined, but nearly all papers will define all their symbols somewhere), and $z_y$ is the activation corresponding to the target. So to get close to `1`, this activation needs to be much higher than all the others for that prediction.\n",
"Then it says: $z_y\\gg z_k$ for all $k\\neq y$. In this case, the paper immediately follows with \"that is...\", which is handy, because you can just read the English instead of the math. In the math, the $y$ is refering to the target ($y$ is defined earlier in the paper; sometimes it's hard to find where symbols are defined, but nearly all papers will define all their symbols somewhere), and $z_y$ is the activation corresponding to the target. So to get close to `1`, this activation needs to be much higher than all the others for that prediction.\n",
"\n",
"Next up is \"if the model learns to assign full probability to the ground-truth label for each training example, it is not guaranteed to generalize\". This is saying that making $z_y$ really big means we'll need large weights and large activations throughout our model. Large weights lead to \"bumpy\" functions, where a small change in input results in a big change to predictions. This is really bad for generalization, because it means just one pixel changing a bit could change our prediction entirely!\n",
"\n",