This commit is contained in:
Jeremy Howard 2020-03-05 13:57:14 -08:00
parent e24f16fc97
commit a872892185
14 changed files with 4220 additions and 4132 deletions

3
.gitignore vendored
View File

@ -1 +1,4 @@
__pycache__/
.last_checked
.gitconfig
.ipynb_checkpoints/

1267
11_midlevel_data.ipynb Normal file

File diff suppressed because one or more lines are too long

File diff suppressed because it is too large

File diff suppressed because it is too large

2350
12_nlp_dive.ipynb Normal file

File diff suppressed because it is too large

File diff suppressed because one or more lines are too long

View File

@ -26,6 +26,15 @@
"# Resnets"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this chapter, we will build on top of the CNNs introduced in the previous chapter and explain to you the ResNet (for residual network) architecture. It was introduced in 2015 in [this article](https://arxiv.org/abs/1512.03385) and is by far the most used model architecture nowadays. More recent developments in models almost always use the same trick of residual connections, and most of the time, they are just a tweak of the original ResNet.\n",
"\n",
"We will frist show you the basic ResNet as it was first designed, then explain to you what modern tweaks to it make it more performamt. But first, we will need a problem a little bit more difficult that the MNIST dataset, since we are already close to 100% accuracy with a regular CNN on it."
]
},
{
"cell_type": "markdown",
"metadata": {},
@ -37,7 +46,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"It's going to be tough to judge any improvement we do to our models when we are already at an accuracy that is as high as we saw on MNIST in the previous chapter, so we will tackle a tougher problem by going back to Imagenette. We'll stick with small images to keep things reasonably fast.\n",
"It's going to be tough to judge any improvement we do to our models when we are already at an accuracy that is as high as we saw on MNIST in the previous chapter, so we will tackle a tougher image classification problem by going back to Imagenette. We'll stick with small images to keep things reasonably fast.\n",
"\n",
"Let's grab the data--we'll use the already-resized 160px version to make things faster still, and will random crop to 128px:"
]
@ -303,7 +312,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"That's a pretty good start, considering we have to pick the correct one of ten categories, and we're training from scratch for just 5 epochs!"
"That's a pretty good start, considering we have to pick the correct one of ten categories, and we're training from scratch for just 5 epochs! But we can do way better than this using a deeper model. However, just stacking new layers won't really improve our results (you can try and see for yourself!). To work around this problem, ResNets introduce the idea of skip connections. Let's have a look at what it is exactly."
]
},
{
@ -317,7 +326,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We now have all the pieces needed to build the models we have been using in each computer vision task since the beginning of this book: ResNets. We introduce the main idea behind them and show how it improves accuracy Imagenette compared to our previous model, before building a version with all the recent tweaks."
"We now have all the pieces needed to build the models we have been using in each computer vision task since the beginning of this book: ResNets. We'll introduce the main idea behind them and show how it improves accuracy Imagenette compared to our previous model, before building a version with all the recent tweaks."
]
},
{
@ -335,7 +344,7 @@
"\n",
"> : Unexpectedly, such degradation is not caused by overfitting, and adding more layers to a suitably deep model leads to higher training error, as [previously reported] and thoroughly verified by our experiments.\n",
"\n",
"This is the graph they showed, with training error on the left, and test on the right:"
"They showed the graph in <<resnet_depth>>, with training error on the left, and test on the right."
]
},
{
@ -361,7 +370,7 @@
"\n",
"What has that gained us, then? The key thing is that those 36 extra layers, as they stand, are an *identity mapping*, but they have *parameters*, which means they are *trainable*. So, we can start with our best 20 layer model, add these 36 extra layers which initially do nothing at all, and then *fine tune the whole 56 layer model*. If those extra 36 layers can be useful, then they can learn parameters to do so!\n",
"\n",
"The ResNet paper actually proposed a variant of this, which is to instead \"skip over\" every 2nd convolution, so effectively we get `x+conv2(conv1(x))`. Or In diagram form (from the paper):"
"The ResNet paper actually proposed a variant of this, which is to instead \"skip over\" every 2nd convolution, so effectively we get `x+conv2(conv1(x))`. This is shown by the diagram in <<resnet_block>> (from the paper)."
]
},
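{
"cell_type": "markdown",
"metadata": {},
"source": [
"In code, a block like this is only a few lines. Here is a minimal sketch in plain PyTorch (not fastai's actual implementation; `conv_bn` is an illustrative helper bundling a 3x3 convolution, batchnorm, and ReLU):\n",
"\n",
"```python\n",
"import torch.nn as nn\n",
"\n",
"def conv_bn(ni, nf):\n",
"    # 3x3 convolution followed by batchnorm and ReLU\n",
"    return nn.Sequential(nn.Conv2d(ni, nf, 3, padding=1, bias=False),\n",
"                         nn.BatchNorm2d(nf), nn.ReLU())\n",
"\n",
"class SimpleResBlock(nn.Module):\n",
"    def __init__(self, nf):\n",
"        super().__init__()\n",
"        self.conv1,self.conv2 = conv_bn(nf, nf),conv_bn(nf, nf)\n",
"    # the skip connection: add the input back to the convolutions' output\n",
"    def forward(self, x): return x + self.conv2(self.conv1(x))\n",
"```"
]
},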
{
@ -659,7 +668,7 @@
"\n",
"The authors of the ResNet paper went on to win the 2015 ImageNet challenge. At the time, this was by far the most important annual event in computer vision. We have already seen another ImageNet winner: the 2013 winners, Zeiler and Fergus. It is interesting to note that in both cases the starting point for the breakthroughs were experimental observations. Observations about what layers actually learn, in the case of Zeiler and Fergus, and observations about which kind of networks can be trained, in the case of the ResNet authors. This ability to design and analyse thoughtful experiments, or even just to see an unexpected result say \"hmmm, that's interesting\" — and then, most importantly, to figure out what on earth is going on, with great tenacity, is at the heart of many scientific discoveries. Deep learning is not like pure mathematics. It is a heavily experimental field, so it's important to be strong practitioner, not just a theoretician.\n",
"\n",
"Since the ResNet was introduced, there's been many papers studying it and applying it to many domains. One of the most interesting, published in 2018, is [Visualizing the Loss Landscape of Neural Nets](https://arxiv.org/abs/1712.09913). It shows that using skip connections help smoothen the loss function, which makes training easier as it avoids us falling into a very sharp area. Here's a stunning picture from the paper, showing the bumpy terrain that SGD has to navigate to optimize a regular CNN (left) versus the smooth surface of a ResNet (right):"
"Since the ResNet was introduced, there's been many papers studying it and applying it to many domains. One of the most interesting, published in 2018, is [Visualizing the Loss Landscape of Neural Nets](https://arxiv.org/abs/1712.09913). It shows that using skip connections help smoothen the loss function, which makes training easier as it avoids us falling into a very sharp area. <<resnet_surface>> shows a stunning picture from the paper, showing the bumpy terrain that SGD has to navigate to optimize a regular CNN (left) versus the smooth surface of a ResNet (right)."
]
},
{
@ -669,6 +678,13 @@
"<img alt=\"Impact of ResNet on loss landscape\" width=\"600\" caption=\"Impact of ResNet on loss landscape\" id=\"resnet_surface\" src=\"images/att_00044.png\">"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This first model is already good, but further research has discovered more tricks we can apply to make it better."
]
},
{
"cell_type": "markdown",
"metadata": {},
@ -910,7 +926,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Even although we have more channels (and our model is therefore even more accurate), our training is just as fast as before, thanks to our optimized stem."
"Even although we have more channels (and our model is therefore even more accurate), our training is just as fast as before, thanks to our optimized stem.\n",
"\n",
"To make our model deeper without taking too much compute or memory, the ResNet paper introduced anotehr kind of blocks for ResNets with a depth of 50 or more, using something called a bottleneck. "
]
},
{
@ -924,7 +942,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Things are a tiny bit more complicated for deeper models like `resnet50` as they don't use the same resnet blocks: instead of stacking two convolutions with a kernel size of 3, they use three different convolutions: two 1x1 (at the beginning and the end) and one 3x3, as shown in the right of this image from the ResNet paper (using an example of 64 channel output, comparing to the regular ResBlock on the left):"
"Instead of stacking two convolutions with a kernel size of 3, *bottleneck layers* use three different convolutions: two 1x1 (at the beginning and the end) and one 3x3, as shown in the right of <<resnet_compare>> the ResNet paper (using an example of 64 channel output, comparing to the regular ResBlock on the left)."
]
},
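{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough sketch of the shape of such a block (not fastai's actual implementation; this extends the illustrative `conv_bn` helper from the earlier sketch with a kernel-size argument):\n",
"\n",
"```python\n",
"import torch.nn as nn\n",
"\n",
"def conv_bn(ni, nf, ks=3):\n",
"    return nn.Sequential(nn.Conv2d(ni, nf, ks, padding=ks//2, bias=False),\n",
"                         nn.BatchNorm2d(nf), nn.ReLU())\n",
"\n",
"class BottleneckBlock(nn.Module):\n",
"    def __init__(self, nf):\n",
"        super().__init__()\n",
"        self.convs = nn.Sequential(\n",
"            conv_bn(nf, nf//4, ks=1),     # 1x1: diminish the channels\n",
"            conv_bn(nf//4, nf//4, ks=3),  # 3x3 in the squeezed space\n",
"            conv_bn(nf//4, nf, ks=1))     # 1x1: restore the channels\n",
"    def forward(self, x): return x + self.convs(x)\n",
"```"
]
},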
{
@ -938,7 +956,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Why? 1x1 convolutions are much faster, so even if this seems to be a more complex design, this block executes faster than the first resnet block we saw. This then lets us use more filters: as we see on the illustration, the number of filters in and out is 4 times higher (256) and the 1 by 1 convs are here to diminish then restore the number of channels (hence the name bottleneck). The overall impact is that we can use more filters in the same amount of time.\n",
"Why is that useful? 1x1 convolutions are much faster, so even if this seems to be a more complex design, this block executes faster than the first resnet block we saw. This then lets us use more filters: as we see on the illustration, the number of filters in and out is 4 times higher (256) and the 1 by 1 convs are here to diminish then restore the number of channels (hence the name bottleneck). The overall impact is that we can use more filters in the same amount of time.\n",
"\n",
"Let's try replacing our ResBlock with this bottleneck design:"
]
@ -1174,6 +1192,13 @@
"The bottleneck design we've shown here is only used in ResNet50, 101, and 152 in all official models we've seen. ResNet18 and 34 use the non-bottleneck design seen in the previous section. However, we've noticed that the bottleneck layer generally works better even for the shallower networks. This just goes to show that the little details in papers tend to stick around for years, even if they're actually not quite the best design! Questioning assumptions and \"stuff everyone knows\" is always a good idea, because this is still a new field, and there's lots of details that aren't always done well."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"TK add conclusion"
]
},
{
"cell_type": "markdown",
"metadata": {},
@ -1243,31 +1268,6 @@
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.5"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": false,
"sideBar": true,
"skip_h1_title": true,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
}
},
"nbformat": 4,

View File

@ -28,7 +28,11 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We are now in the exciting position that we can fully understand the entire architectures that we have been using for our state-of-the-art models for computer vision, natural language processing, and tabular analysis. In this chapter, we're going to fill in all the missing details on how fastai's application models work."
"We are now in the exciting position that we can fully understand the entire architectures that we have been using for our state-of-the-art models for computer vision, natural language processing, and tabular analysis. In this chapter, we're going to fill in all the missing details on how fastai's application models work and show you how to build the models they use.\n",
"\n",
"We will also go back to the custom data preprocessing pipeline we saw in <<chapter_midlevel_data>> for Siamese networks and show you how you can use the components in the fastai library to build custom pretrained models for new tasks.\n",
"\n",
"We will go voer each application in turn, starting with computer vision."
]
},
{
@ -38,6 +42,13 @@
"## Computer vision"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In computer vision, we used the functions `cnn_learner` and `unet_learner` to build our models, depending on the task. Let's see how they start from a pretrained ResNet to build the `Learner` objects we have used in part 1 and 2 of this book."
]
},
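{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a reminder, here is the kind of one-liner we have been using (a sketch, assuming `dls` is an image `DataLoaders`):\n",
"\n",
"```python\n",
"learn = cnn_learner(dls, resnet34, metrics=accuracy)\n",
"```"
]
},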
{
"cell_type": "markdown",
"metadata": {},
@ -162,6 +173,13 @@
"> note: One parameter to create_head that is worth looking at is bn_final. Setting this to true will cause a batchnorm layer to be added as your final layer. This can be useful in helping your model to more easily ensure that it is scaled appropriately for your output activations. We haven't seen this approach published anywhere, as yet, but we have found that it works well in practice, wherever we have used it."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's now have a look at what `unet_learner` did in the segmentation problem we showed in <<chapter_intro>>."
]
},
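{
"cell_type": "markdown",
"metadata": {},
"source": [
"Again, as a one-line reminder (a sketch, assuming `dls` is a segmentation `DataLoaders` like the CamVid one):\n",
"\n",
"```python\n",
"learn = unet_learner(dls, resnet34)\n",
"```"
]
},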
{
"cell_type": "markdown",
"metadata": {},
@ -181,7 +199,7 @@
"\n",
"We will (naturally) do this with a neural network! So we need some kind of layer which can increase the grid size in a CNN. One very simple approach to this is to replace every pixel in the 7x7 grid with four pixels in a 2x2 square. Each of those four pixels would have the same value — this is known as nearest neighbour interpolation. PyTorch provides a layer which does this for us, so we could create a head which contains stride one convolutional layers (along with batchnorm and ReLU as usual) interspersed with 2x2 nearest neighbour interpolation layers. In fact, you could try this now! See if you can create a custom head designed like this, and see if it can complete the CamVid segmentation task. You should find that you get some reasonable results, although it won't be as good as our <<chapter_intro>> results.\n",
"\n",
"Another approach is to replace the nearest neighbour and convolution combination with a *transposed convolution* otherwise known as a *stride half convolution*. This is identical to a regular convolution, but first zero padding is inserted between every pixel in the input. This is easiest to see with a picture — here's a diagram from the excellent convolutional arithmetic paper we have seen before, showing a 3x3 transposed convolution applied to a 3x3 image:"
"Another approach is to replace the nearest neighbour and convolution combination with a *transposed convolution* otherwise known as a *stride half convolution*. This is identical to a regular convolution, but first zero padding is inserted between every pixel in the input. This is easiest to see with a picture — <<transp_conv>> shows a diagram from the excellent convolutional arithmetic paper we have seen before, showing a 3x3 transposed convolution applied to a 3x3 image."
]
},
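{
"cell_type": "markdown",
"metadata": {},
"source": [
"Both approaches are available as single layers in PyTorch. Here is a quick sketch (the shapes in the comments are what these layers produce for a batch of 64-channel 7x7 activations):\n",
"\n",
"```python\n",
"import torch\n",
"import torch.nn as nn\n",
"\n",
"x = torch.randn(1, 64, 7, 7)  # batch of 7x7 activations\n",
"\n",
"# nearest neighbour interpolation followed by a stride-1 convolution\n",
"upsample = nn.Sequential(nn.Upsample(scale_factor=2, mode='nearest'),\n",
"                         nn.Conv2d(64, 64, 3, padding=1))\n",
"\n",
"# a transposed (stride half) convolution\n",
"transposed = nn.ConvTranspose2d(64, 64, kernel_size=2, stride=2)\n",
"\n",
"upsample(x).shape, transposed(x).shape  # both: [1, 64, 14, 14]\n",
"```"
]
},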
{
@ -199,7 +217,7 @@
"\n",
"Neither of these approaches, however, works really well. The problem is that our 7x7 grid simply doesn't have enough information to create a 224x224 pixel output. It's asking an awful lot of the activations of each of those grid cells to have enough information to fully regenerate every pixel in the output. The solution to this problem is to use skip connections, like in a resnet, but skipping from the activations in the body of the resnet all the way over to the activations of the transposed convolution on the opposite side of the architecture. This is known as a U-Net, and it was developed in the 2015 paper [U-Net: Convolutional Networks for Biomedical Image Segmentation](https://arxiv.org/abs/1505.04597). Although the paper focussed on medical applications, the U-Net has revolutionized all kinds of generation vision models.\n",
"\n",
"The U-Net paper shows the architecture like this:"
"<<unet>> shows the U-Net architecture (form the paper). "
]
},
{
@ -218,6 +236,13 @@
"With this architecture, the input to the transposed convolutions is not just the lower resolution grid in the preceding layer, but also the higher resolution grid in the resnet head. This allows the U-Net to use all of the information of the original image, as it is needed. One challenge with U-Nets is that the exact architecture depends on the image size. fastai has a unique `DynamicUnet` class which auto-generates an architecture of the right size based on the data provided."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now that we've seen how to create complete state of the art computer vision models, let's move on to NLP."
]
},
{
"cell_type": "markdown",
"metadata": {},
@ -229,9 +254,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Now that we've seen how to create complete state of the art computer vision models, let's move on to NLP.\n",
"\n",
"Converting an AWD-LSTM language model into a transfer learning classifier follows a very similar process to what we saw for `cnn_learner` in the first section of this chapter. We do not need a \"meta\" dictionary in this case, because we do not have such a variety of architectures to support in the body. All we need to do is to select the stacked RNN for the encoder in the language model, which is a single PyTorch module. This encoder will provide an activation for every word of the input, because a language model needs to output a prediction for every next word.\n",
"Converting an AWD-LSTM language model into a transfer learning classifier as we have done in <<chapter_nlp>> follows a very similar process to what we saw for `cnn_learner` in the first section of this chapter. We do not need a \"meta\" dictionary in this case, because we do not have such a variety of architectures to support in the body. All we need to do is to select the stacked RNN for the encoder in the language model, which is a single PyTorch module. This encoder will provide an activation for every word of the input, because a language model needs to output a prediction for every next word.\n",
"\n",
"To create a classifier from this we use an approach described in the ULMFiT paper as \"BPTT for Text Classification (BPT3C)\". The paper describes this:"
]
@ -240,7 +263,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"> In order to make fine-tuning a classifier for large documents feasible, we propose BPTT for Text Classification (BPT3C): We divide the document into fixed-length batches of size `b`. At the beginning of each batch, the model is initialized with the final state of the previous batch; we keep track of the hidden states for mean and max-pooling; gradients are back-propagated to the batches whose hidden states contributed to the final prediction. In practice, we use variable length backpropagation sequences."
"> : In order to make fine-tuning a classifier for large documents feasible, we propose BPTT for Text Classification (BPT3C): We divide the document into fixed-length batches of size `b`. At the beginning of each batch, the model is initialized with the final state of the previous batch; we keep track of the hidden states for mean and max-pooling; gradients are back-propagated to the batches whose hidden states contributed to the final prediction. In practice, we use variable length backpropagation sequences."
]
},
{
@ -256,6 +279,13 @@
"This is done automatically behind the scenes by the fastai library when creating our `DataLoaders`."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The last application where we used fastai's model we haven't shown you yet is tabular."
]
},
{
"cell_type": "markdown",
"metadata": {},
@ -336,7 +366,9 @@
"\n",
"```\n",
"\n",
"Finally, this is passed through the linear layers (each of which includes batchnorm, if `use_bn` is True, and dropout, if `ps` is set to some value or list of values)."
"Finally, this is passed through the linear layers (each of which includes batchnorm, if `use_bn` is True, and dropout, if `ps` is set to some value or list of values).\n",
"\n",
"Congratulations! Now, you know every single piece of the architectures used in the fastai library!"
]
},
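{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before moving on, here is a usage sketch with illustrative sizes (assuming `TabularModel` is imported from fastai's tabular module) showing how you could instantiate such a model directly:\n",
"\n",
"```python\n",
"model = TabularModel(emb_szs=[(10,5),(7,4)],  # (cardinality, embedding size) per categorical variable\n",
"                     n_cont=3, out_sz=2, layers=[200,100])\n",
"```"
]
},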
{
@ -354,7 +386,7 @@
"\n",
"Now that we have investigated all of the pieces of a model and the data that is passed into it, we can consider what this means for practical deep learning. If you have unlimited data, unlimited memory, and unlimited time, then the advice is easy: train a huge model on all of your data for a really long time. The reason that deep learning is not straightforward is because your data, memory, and time is limited. If you are running out of memory or time, then the solution is to train a smaller model. If you are not able to train for long enough to overfit, then you are not taking advantage of the capacity of your model.\n",
"\n",
"So step one is to get to the point that you can overfit. Then, the question is how to reduce that overfitting. Here is how we recommend prioritising the steps from there:"
"So step one is to get to the point that you can overfit. Then, the question is how to reduce that overfitting. <<reduce_overfit>> shows how we recommend prioritising the steps from there."
]
},
{

View File

@ -23,22 +23,24 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Variants of SGD"
"# The training process"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now that you know all about how the architectures are put together, it's time to start exploring the training process.\n",
"Since we now know how to create state-of-the-art architectures for computer vision, natural image processing, tabular analysis, and collaborative filtering, and we know how to train them quickly, we're done, right? Not quite yet. We still have to explorea little bit more the training process.\n",
"\n",
"We explained earlier the basis of Stochastic Gradient Descent: pass a minibatch in the model, compare it to our target with the loss function then compute the gradients of this loss function with regards to each weight before updating the weights with the formula:\n",
"We explained in <<chapter_mnist_basics>> the basis of Stochastic Gradient Descent: pass a minibatch in the model, compare it to our target with the loss function then compute the gradients of this loss function with regards to each weight before updating the weights with the formula:\n",
"\n",
"```python\n",
"new_weight = weight - lr * weight.grad\n",
"```\n",
"\n",
"We implemented this from scratch in a training loop, and also saw that Pytorch provides a simple `nn.SGD` class that does this calculation for each parameter for us. Let's now build some faster optimizers, using a flexible foundation."
"We implemented this from scratch in a training loop, and also saw that Pytorch provides a simple `nn.SGD` class that does this calculation for each parameter for us. In this chapter, we will build some faster optimizers, using a flexible foundation. But that's not all what we might want to change in the training process. For any tweak of the training loop, we will need a way to add some code to the basis of SGD. The fastai library has a system of callbacks to do this, and we will teach you all about it.\n",
"\n",
"Firs things first, let's start with standard SGD to get a baseline, then we will introduce most commonly used optimizers."
]
},
{
@ -429,7 +431,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"It's working! So that's how we create SGD from scratch in fastai."
"It's working! So that's how we create SGD from scratch in fastai. Now let's see see what this momentum is exactly."
]
},
{
@ -456,7 +458,7 @@
"\n",
"Note that we are writing `weight.avg` to highlight the fact we need to store thoe moving averages for each parameter of the model (and they all their own independent moving averages).\n",
"\n",
"Here is an example of noisy data for a single parameter, with the momentum curve plotted in red, and the gradients of the parameter plotted in blue. The gradients increase, and then decrease, and the momentum does a good job of following the general trend, without getting too influenced by noise:"
"<<img_momentum>> shows an example of noisy data for a single parameter, with the momentum curve plotted in red, and the gradients of the parameter plotted in blue. The gradients increase, and then decrease, and the momentum does a good job of following the general trend, without getting too influenced by noise."
]
},
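{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a tiny numeric sketch of that moving average for a single scalar parameter:\n",
"\n",
"```python\n",
"beta,avg = 0.9,0.\n",
"for grad in [0.5, 0.7, -0.2, 0.3]:    # pretend gradients\n",
"    avg = beta*avg + (1-beta)*grad    # exponential moving average\n",
"    print(avg)                        # approx. 0.05, 0.115, 0.0835, 0.105\n",
"```"
]
},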
{
@ -480,6 +482,10 @@
}
],
"source": [
"#hide_input\n",
"#id img_mommentum\n",
"#caption An example of momentum\n",
"#alt Graph showing an example of momentum\n",
"x = np.linspace(-4, 4, 100)\n",
"y = 1 - (x/3) ** 2\n",
"x1 = x + np.random.randn(100) * 0.1\n",
@ -499,7 +505,7 @@
"source": [
"It works particularly well if the loss function has narrow canyons we need to navigate: vanilla SGD would send us from one side to the other while SGD with momentum will average those to roll down inside. The parameter `beta` determines the strength of that momentum we are using: with a small beta we stay closer to the actual gradient values whereas with a high beta, we will mostly go in the direction of the average of the gradients and it will take a while before any change in the gradients makes that trend move.\n",
"\n",
"With a large beta, we might miss that the gradients have changed directions and roll over a small local minima which is a desired side-effect: intuitively, when we show a new picture/text/data to our model, it will look like something in the training set but won't be exactly like it. That means it will correspond to a point in the loss function that is closest to the minimum we ended up with at the end of training, but not exactly *at* that minimum. We then would rather end up training in a wide minimum, where nearby points have approximately the same loss (or if you prefer, a point where the loss is as flat as possible). Here's how the above chart varies as we change beta:"
"With a large beta, we might miss that the gradients have changed directions and roll over a small local minima which is a desired side-effect: intuitively, when we show a new picture/text/data to our model, it will look like something in the training set but won't be exactly like it. That means it will correspond to a point in the loss function that is closest to the minimum we ended up with at the end of training, but not exactly *at* that minimum. We then would rather end up training in a wide minimum, where nearby points have approximately the same loss (or if you prefer, a point where the loss is as flat as possible). <<img_betas>> shows how the chart in <<img_momentum>> varies as we change beta."
]
},
{
@ -523,6 +529,10 @@
}
],
"source": [
"#hide_input\n",
"#id img_betas\n",
"#caption Momentum with different beta values\n",
"#alt Graph showing how the beta value imfluence momentum\n",
"x = np.linspace(-4, 4, 100)\n",
"y = 1 - (x/3) ** 2\n",
"x1 = x + np.random.randn(100) * 0.1\n",
@ -852,7 +862,9 @@
"\n",
"In fastai, Adam is the default optimizer we use since it allows faster training, but we found that `beta2=0.99` is better suited for the type of schedule we are using. `beta1` is the momentum parameter, which we specify with the argument `moms` in our call to `fit_one_cycle`. As for `eps`, fastai uses a default of 1e-5. `eps` is not just useful for numerical stability. A higher `eps` limits the maximum value of the adjusted learning rate. To take an extreme example, if `eps` is 1, then the adjusted learning will never be higher than the base learning rate. \n",
"\n",
"Rather than show all the code for this in the book, we'll let you look at the optimizer notebook in fastai's GitHub repository--you'll see all the code we've seen so far, along with Adam and other optimizers, and lots of examples and tests."
"Rather than show all the code for this in the book, we'll let you look at the optimizer notebook in fastai's GitHub repository--you'll see all the code we've seen so far, along with Adam and other optimizers, and lots of examples and tests.\n",
"\n",
"One thing that changes when we go from SGD to Adam is the way we apply weight decay, and it can have important consequences."
]
},
{
@ -889,7 +901,319 @@
"\n",
"Most libraries use the first formulation, but it was pointed out in [Decoupled Weight Regularization](https://arxiv.org/pdf/1711.05101.pdf) by Ilya Loshchilov and Frank Hutter, second one is the only correct approach with the Adam optimizer or momentum, which is why fastai makes it its default.\n",
"\n",
"Now you know everything that is hidden behind the line `learn.fit_one_cycle`!"
"Now you know everything that is hidden behind the line `learn.fit_one_cycle`!\n",
"\n",
"OPtimizers are only one part of the training process. When you need to change the training loop with fastai, you can't directly change the code inside the library. Instead, we have designed a system of callbacks to let you write any tweak in independent blocks you can then mix and match. "
]
},
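{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make that difference concrete, here is a sketch of the two formulations for a single parameter tensor, with illustrative names and a plain update (the two coincide for vanilla SGD, but with Adam the first penalty gets rescaled along with the gradients while the second does not):\n",
"\n",
"```python\n",
"import torch\n",
"\n",
"weight = torch.randn(4)\n",
"grad = torch.randn(4)   # pretend gradient of the loss w.r.t. weight\n",
"lr,wd = 0.1,0.01\n",
"\n",
"# 1) L2 regularization: fold the penalty into the gradient, so an\n",
"#    adaptive optimizer would rescale it along with the gradient\n",
"new_weight_l2 = weight - lr * (grad + wd * weight)\n",
"\n",
"# 2) decoupled weight decay: apply the shrinkage directly in the\n",
"#    step, independently of any gradient rescaling\n",
"new_weight_decoupled = weight - lr * grad - lr * wd * weight\n",
"```"
]
},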
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Callbacks"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Sometimes you need to change how things work a little bit. In fact, we have already seen examples of this: mixup, FP16 training, resetting the model after each epoch for training RNNs, and so forth. How do we go about making these kinds of tweaks to the training process?\n",
"\n",
"We've seen the basic training loop, which, with the help of the `Optimizer` class, looks like this for a single epoch:\n",
"\n",
"```python\n",
"for xb,yb in dl:\n",
" loss = loss_func(model(xb), yb)\n",
" loss.backward()\n",
" opt.step()\n",
" opt.zero_grad()\n",
"```\n",
"\n",
"<<basic_loop>> shows how to picture that."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img alt=\"Basic training loop\" width=\"300\" caption=\"Basic training loop\" id=\"basic_loop\" src=\"images/att_00048.png\">"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The usual way for deep learning practitioners to customise the training loop is to make a copy of an existing training loop, and then insert their code necessary for their particular changes into it. This is how nearly all code that you find online will look. But it has some very serious problems.\n",
"\n",
"It's not very likely that some particular tweaked training loop is going to meet your particular needs. There are hundreds of changes that can be made to a training loop, which means there are billions and billions of possible permutations. You can't just copy one tweak from a training loop here, another from a training loop there, and expect them all to work together. Each will be based on different assumptions about the environment that it's working in, use different naming conventions, and expect the data to be in different formats.\n",
"\n",
"We need a way to allow users to insert their own code at any part of the training loop, but in a consistent and well-defined way. Computer scientists have already come up with an answer to this question: the callback. A callback is a piece of code that you write, and inject into another piece of code at some predefined point. In fact, callbacks have been used with deep learning training loops for years. The problem is that only a small subset of places that may require code injection have been available in previous libraries, and, more importantly, callbacks were not able to do all the things they needed to do.\n",
"\n",
"In order to be just as flexible as manually copying and pasting a training loop and directly inserting code into it, a callback must be able to read every possible piece of information available in the training loop, modify all of it as needed, and fully control when a batch, epoch, or even all the whole training loop should be terminated. fastai is the first library to provide all of this functionality. It modifies the training loop so it looks like <<cb_loop>>."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img alt=\"Training loop with callbacks\" width=\"550\" caption=\"Training loop with callbacks\" id=\"cb_loop\" src=\"images/att_00049.png\">"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The real test of whether this works has been borne out over the last couple of years — it has turned out that every single new paper implemented, or use a request fulfilled, for modifying the training loop has successfully been achieved entirely by using the fastai callback system. The training loop itself has not required modifications. <<some_cbs>> shows just a few of the callbacks that have been added."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img alt=\"Some fastai callbacks\" width=\"500\" caption=\"Some fastai callbacks\" id=\"some_cbs\" src=\"images/att_00050.png\">"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The reason that this is important for all of us is that it means that whatever idea we have in our head, we can implement it. We need never dig into the source code of PyTorch or fastai and act together some one-off system to try out our ideas. And when we do implement our own callbacks to develop our own ideas, we know that they will work together with all of the other functionality provided by fastai so we will get progress bars, mixed precision training, hyperparameter annealing, and so forth.\n",
"\n",
"Another advantage is that it makes it easy to gradually remove or add functionality and perform ablation studies. You just need to adjust the list of callbacks you pass along to your fit function."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an example, here is the fastai source code that is run for each batch of the training loop:\n",
"\n",
"```python\n",
"try:\n",
" self._split(b); self('begin_batch')\n",
" self.pred = self.model(*self.xb); self('after_pred')\n",
" self.loss = self.loss_func(self.pred, *self.yb); self('after_loss')\n",
" if not self.training: return\n",
" self.loss.backward(); self('after_backward')\n",
" self.opt.step(); self('after_step')\n",
" self.opt.zero_grad()\n",
"except CancelBatchException: self('after_cancel_batch')\n",
"finally: self('after_batch')\n",
"```\n",
"\n",
"The calls of the form `self('...')` are where the callbacks are called. As you see, after every step a callback is called. The callback will receive the entire state of training, and can also modify it. For instance, as you see above, the input data and target labels are in `self.xb` and `self.yb` respectively. A callback can modify these to modify the data the training loop sees. It can also modify `self.loss`, or even modify the gradients.\n",
"\n",
"Let's see how this work in practice by writing a `Callback`."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Creating a callback"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When you want to write your own callback, the full list of available events is:\n",
"\n",
"- `begin_fit`:: called before doing anything, ideal for initial setup.\n",
"- `begin_epoch`:: called at the beginning of each epoch, useful for any behavior you need to reset at each epoch.\n",
"- `begin_train`:: called at the beginning of the training part of an epoch.\n",
"- `begin_batch`:: called at the beginning of each batch, just after drawing said batch. It can be used to do any setup necessary for the batch (like hyper-parameter scheduling) or to change the input/target before it goes in the model (change of the input with techniques like mixup for instance).\n",
"- `after_pred`:: called after computing the output of the model on the batch. It can be used to change that output before it's fed to the loss.\n",
"- `after_loss`:: called after the loss has been computed, but before the backward pass. It can be used to add any penalty to the loss (AR or TAR in RNN training for instance).\n",
"- `after_backward`:: called after the backward pass, but before the update of the parameters. It can be used to do any change to the gradients before said update (gradient clipping for instance).\n",
"- `after_step`:: called after the step and before the gradients are zeroed.\n",
"- `after_batch`:: called at the end of a batch, for any clean-up before the next one.\n",
"- `after_train`:: called at the end of the training phase of an epoch.\n",
"- `begin_validate`:: called at the beginning of the validation phase of an epoch, useful for any setup needed specifically for validation.\n",
"- `after_validate`:: called at the end of the validation part of an epoch.\n",
"- `after_epoch`:: called at the end of an epoch, for any clean-up before the next one.\n",
"- `after_fit`:: called at the end of training, for final clean-up.\n",
"\n",
"This list is available as attributes of the special variable `event`; so just type `event.` and hit `Tab` in your notebook to see a list of all the options"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's take a look at an example. Do you recall how in <<chapter_nlp_dive>> we needed to ensure that our special `reset` method was called at the start of training and validation for each epoch? We used the `ModelReseter` callback provided by fastai to do this for us. But how did `ModelReseter` do that exactly? Here's the full actual source code to that class:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"class ModelReseter(Callback):\n",
" def begin_train(self): self.model.reset()\n",
" def begin_validate(self): self.model.reset()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Yes, that's actually it! It just does what we said in the paragraph above: after completing training and epoch or validation for an epoch, call a method named `reset`.\n",
"\n",
"Callbacks are often \"short and sweet\" like this one. In fact, let's look at one more. Here's the fastai source for the callback that add RNN regularization (*AR* and *TAR*):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"class RNNRegularizer(Callback):\n",
" def __init__(self, alpha=0., beta=0.): self.alpha,self.beta = alpha,beta\n",
"\n",
" def after_pred(self):\n",
" self.raw_out,self.out = self.pred[1],self.pred[2]\n",
" self.learn.pred = self.pred[0]\n",
"\n",
" def after_loss(self):\n",
" if not self.training: return\n",
" if self.alpha != 0.:\n",
" self.learn.loss += self.alpha * self.out[-1].float().pow(2).mean()\n",
" if self.beta != 0.:\n",
" h = self.raw_out[-1]\n",
" if len(h)>1:\n",
" self.learn.loss += self.beta * (h[:,1:] - h[:,:-1]\n",
" ).float().pow(2).mean()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> stop: Go back to where we discussed TAR and AR regularization, and compare to the code here. Made sure you understand what it's doing, and why."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In both of these examples, notice how we can access attributes of the training loop by directly checking `self.model` or `self.pred`. That's because a `Callback` will always try to get an attribute it doesn't have inside the `Learner` associated to it. This is a shortcut for `self.learn.model` or `self.learn.pred`. Note that this shortcut works for reading attributes, but not for writing them, which is why when `RNNRegularizer` changes the loss or the predictions, you see `self.learn.loss = ` or `self.learn.pred = `. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When writing a callback, the following attributes of `Learner` are available:\n",
"\n",
"- `model`: the model used for training/validation\n",
"- `data`: the underlying `DataLoaders`\n",
"- `loss_func`: the loss function used\n",
"- `opt`: the optimizer used to udpate the model parameters\n",
"- `opt_func`: the function used to create the optimizer\n",
"- `cbs`: the list containing all `Callback`s\n",
"- `dl`: current `DataLoader` used for iteration\n",
"- `x`/`xb`: last input drawn from `self.dl` (potentially modified by callbacks). `xb` is always a tuple (potentially with one element) and `x` is detuplified. You can only assign to `xb`.\n",
"- `y`/`yb`: last target drawn from `self.dl` (potentially modified by callbacks). `yb` is always a tuple (potentially with one element) and `y` is detuplified. You can only assign to `yb`.\n",
"- `pred`: last predictions from `self.model` (potentially modified by callbacks)\n",
"- `loss`: last computed loss (potentially modified by callbacks)\n",
"- `n_epoch`: the number of epochs in this training\n",
"- `n_iter`: the number of iterations in the current `self.dl`\n",
"- `epoch`: the current epoch index (from 0 to `n_epoch-1`)\n",
"- `iter`: the current iteration index in `self.dl` (from 0 to `n_iter-1`)\n",
"\n",
"The following attributes are added by `TrainEvalCallback` and should be available unless you went out of your way to remove that callback:\n",
"\n",
"- `train_iter`: the number of training iterations done since the beginning of this training\n",
"- `pct_train`: from 0. to 1., the percentage of training iterations completed\n",
"- `training`: flag to indicate if we're in training mode or not\n",
"\n",
"The following attribute is added by `Recorder` and should be available unless you went out of your way to remove that callback:\n",
"\n",
"- `smooth_loss`: an exponentially-averaged version of the training loss"
]
},
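{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a small, hypothetical example of using these attributes, here is a callback that reports the smoothed training loss at the end of every epoch:\n",
"\n",
"```python\n",
"class PrintLossCallback(Callback):\n",
"    def after_epoch(self):\n",
"        print(f'epoch {self.epoch}: smooth loss {float(self.smooth_loss):.4f}')\n",
"```\n",
"\n",
"You would enable it by passing it to your fit call, for instance `learn.fit(2, cbs=PrintLossCallback())`."
]
},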
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Callbacks can also interrupt any part of the training loop by using a system of exceptions."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Callback ordering and exceptions"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Sometimes, callbacks need to be able to tell fastai to skip over a batch, or an epoch, or stop training altogether. For instance, consider `TerminateOnNaNCallback`. This handy callback will automatically stop training any time the loss becomes infinite or `NaN` (*not a number*). Here's the fastai source for this callback:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"class TerminateOnNaNCallback(Callback):\n",
" run_before=Recorder\n",
" def after_batch(self):\n",
" if torch.isinf(self.loss) or torch.isnan(self.loss):\n",
" raise CancelFitException"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The way it tells the training loop to interrupt training at this point is to `raise CancelFitException`. The training loop catches this exception and does not run any further training or validation. The callback control flow exceptions available are:\n",
"\n",
"- `CancelFitException`:: Skip the rest of this batch and go to `after_batch\n",
"- `CancelEpochException`:: Skip the rest of the training part of the epoch and go to `after_train\n",
"- `CancelTrainException`:: Skip the rest of the validation part of the epoch and go to `after_validate\n",
"- `CancelValidException`:: Skip the rest of this epoch and go to `after_epoch\n",
"- `CancelBatchException`:: Interrupts training and go to `after_fit"
]
},
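{
"cell_type": "markdown",
"metadata": {},
"source": [
"For instance, here is a hypothetical sketch of a callback that uses one of these exceptions to cut training short, ending each epoch (and skipping its validation) after 100 training batches, which is handy when debugging a long run:\n",
"\n",
"```python\n",
"class ShortEpochCallback(Callback):\n",
"    def after_batch(self):\n",
"        # skip straight to after_epoch once 100 training batches are done\n",
"        if self.training and self.iter >= 100: raise CancelEpochException\n",
"```"
]
},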
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can detect one of those exceptions occurred and add code that executes right after with the following events:\n",
"\n",
"- `after_cancel_batch`:: reached immediately after a `CancelBatchException` before proceeding to `after_batch`\n",
"- `after_cancel_train`:: reached immediately after a `CancelTrainException` before proceeding to `after_epoch`\n",
"- `after_cancel_valid`:: reached immediately after a `CancelValidException` before proceeding to `after_epoch`\n",
"- `after_cancel_epoch`:: reached immediately after a `CancelEpochException` before proceeding to `after_epoch`\n",
"- `after_cancel_fit`:: reached immediately after a `CancelFitException` before proceeding to `after_fit`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Sometimes, callbacks need to be called in a particular order. In the case of `TerminateOnNaNCallback`, it's important that `Recorder` runs its `after_batch` after this callback, to avoid registering an NaN loss. You can specify `run_before` (this callback must run before ...) or `run_after` (this callback must run after ...) in your callback to ensure the ordering that you need."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now that we have seen how to tweak the training loop of fastai to do anything we need, let's take a step back and dig a little bit deeper in the foundations of that training loop."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"TK Write a conclusion"
]
},
{
@ -920,7 +1244,16 @@
"1. Calculate the value of `unbias_avg` and `w.avg` for a few batches of dummy values.\n",
"1. What's the impact of having a high eps in Adam?\n",
"1. Read through the optimizer notebook in fastai's repo, and execute it.\n",
"1. In what situations do dynamic learning rate methods like Adam change the behaviour of weight decay?"
"1. In what situations do dynamic learning rate methods like Adam change the behaviour of weight decay?\n",
"1. What are the four steps of a training loop?\n",
"1. Why is the use of callbacks better than writing a new training loop for each tweak you want to add?\n",
"1. What are the necessary points in the design of the fastai's callback system that make it as flexible as copying and pasting bits of code?\n",
"1. How can you get the list of events available to you when writing a callback?\n",
"1. Write the `ModelResetter` callback (without peeking).\n",
"1. How can you access the necessary attributes of the training loop inside a callback? When can you use or not use the shortcut that goes with it?\n",
"1. How can a callback influence the control flow of the training loop.\n",
"1. Write the `TerminateOnNaN` callback (without peeking if possible).\n",
"1. How do you make sure your callback runs after or before another callback?"
]
},
{
@ -934,7 +1267,28 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"1. Look up the \"rectified Adam\" paper and implement it using the general optimizer framework, and try it out. Search for other recent optimizers that work well in practice, and pick one to implement."
"1. Look up the \"rectified Adam\" paper and implement it using the general optimizer framework, and try it out. Search for other recent optimizers that work well in practice, and pick one to implement.\n",
"1. Look at the mixed precision callback with the documentation. Try to understand what each event and line of code does.\n",
"1. Implement your own version of ther learning rate finder from scratch. Compare it with fastai's version.\n",
"1. Look at the source code of the callbacks that ship with fastai. See if you can find one that's similar to what you're looking to do, to get some inspiration."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Foundations of Deep Learning: Wrap up"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Congratulations, you have made it to the end of the \"foundations of deep learning\" section. You now understand how all of fastai's applications and most important architectures are built, and the recommended ways to train them, and have all the information you need to build these from scratch. Whilst you probably won't need to create your own training loop, or batchnorm layer, for instance, knowing what is going on behind the scenes is very helpful for debugging, profiling, and deploying your solutions.\n",
"\n",
"Since you understand all of the foundations of fastai's applications now, be sure to spend some time digging through fastai's source notebooks, and running and experimenting with parts of them, since you can and see exactly how everything in fastai is developed.\n",
"\n",
"In the next section, we will be looking even further under the covers, to see how the actual forward and backward passes of a neural network are done, and we will see what tools are at our disposal to get better performance. We will then finish up with a project that brings together everything we have learned throughout the book, which we will use to build a method for interpreting convolutional neural networks."
]
},
{

View File

@ -39,7 +39,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## A neural net from scratch"
"## A neural net layer from scratch"
]
},
{
@ -470,14 +470,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Broadcasting"
"### Broadcasting"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As we discussed in <<chapter_mnist_basics>>, broadcasting is a term introduced by the numpy library that describes how tensor of different ranks are treated during arithmetic operations. For instance, it's obvious there is no way to add a 3 by 3 matrix with a 4 by 5 matrix, but what if we want to add one scalar (which can be represented as a 1 by 1 tensor) with a matrix? Or a vector of size 3 with a 3 by 4 matrix? In both cases, we can find a way to make sense of what the operation could be.\n",
"As we discussed in <<chapter_mnist_basics>>, broadcasting is a term introduced by the [numpy library](https://docs.scipy.org/doc/) that describes how tensor of different ranks are treated during arithmetic operations. For instance, it's obvious there is no way to add a 3 by 3 matrix with a 4 by 5 matrix, but what if we want to add one scalar (which can be represented as a 1 by 1 tensor) with a matrix? Or a vector of size 3 with a 3 by 4 matrix? In both cases, we can find a way to make sense of what the operation could be.\n",
"\n",
"Broadcasting gives specific rules to codify when shapes are compatible when trying to do an element-wise operation, and how the tensor of the smaller shape is expanded to match the tensor of the bigger shape. It's essential to master those rules if you want to be able to write code that executes quickly. In this section, we'll expand our previous treatment of broadcasting to understand these rules."
]
@ -486,14 +486,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Broadcasting with a scalar"
"#### Broadcasting with a scalar"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This is the easiest broadcating: when we have a tensor `a` and a scalar, we just imagine a tensor of the same shape as `a` filled with that scalar and perform the operation."
"Broadcasting with a scalar is the easiest broadcating: when we have a tensor `a` and a scalar, we just imagine a tensor of the same shape as `a` filled with that scalar and perform the operation."
]
},
{
@ -553,7 +553,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Broadcasting a vector to a matrix"
"Now you could have different means for each row of the matrix, in which case you would need to broadcast a vector to a matrix."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Broadcasting a vector to a matrix"
]
},
{
@ -1027,14 +1034,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We're now 3,700 times faster than our first implementation!"
"We're now 3,700 times faster than our first implementation! Now let's discuss the exact rules of broadcasting."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Broadcasting Rules"
"#### Broadcasting Rules"
]
},
{
@ -1077,7 +1084,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Einstein summation"
"Another useful thing for tensor manipulations is the use of Einstein summations."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Einstein summation"
]
},
{
@ -1157,6 +1171,13 @@
"As we see, not only is it practical, but it's *very* fast. `einsum` is often the fastest way to do custom operations in PyTorch, without diving into C++ and CUDA. (But it's generally not as fast as carefully optimized CUDA code, as you see in the matrix multiplication example)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now that we know how to implement a matrix multiplication from scratch, we are ready to build our neural net, specifically its forward and backward passes, just using matrix multiplications."
]
},
{
"cell_type": "markdown",
"metadata": {},
@ -1168,7 +1189,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Now that we have defined `matmul` from scratch, we are ready to define our first neural net. As we saw in <<chapter_mnist_basics>>, to train it, we will need to compute all the gradients of a given a loss with respect to its parameters, which is known as the *backward pass*. The *forward pass* is computing the output of the model on a given input, which is just based on the matrix products we saw. As we define our first neural net, we will also delve in the problem of properly initializing the weights, which is crucial to make training start properly."
"As we saw in <<chapter_mnist_basics>>, to train it, we will need to compute all the gradients of a given a loss with respect to its parameters, which is known as the *backward pass*. The *forward pass* is computing the output of the model on a given input, which is just based on the matrix products we saw. As we define our first neural net, we will also delve in the problem of properly initializing the weights, which is crucial to make training start properly."
]
},
{
@ -1734,6 +1755,13 @@
"loss = mse(out, y)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"That is all for the forward pass, let now look at the gradients."
]
},
{
"cell_type": "markdown",
"metadata": {},
@ -1932,6 +1960,13 @@
"And now we can access to the gradients of our model parameters in `w1.g`, `b1.g`, `w2.g`, `b2.g`."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We have sucessfuly defined our model, now let's make it a bit more like a PyTorch module."
]
},
{
"cell_type": "markdown",
"metadata": {},
@ -2194,6 +2229,7 @@
"Then the structure used to build a more complex model that takes advantage of those functions is a `torch.nn.Module`. This is the base structure for all models and all the neural nets you have seen up until now where from that class. It mostly helps to register all the trainable parameters, which as we've seen can be used in the training loop.\n",
"\n",
"To implement a `nn.Module` you just need to\n",
"\n",
"- Make sure the superclass `__init__` is called first when you initiliaze it,\n",
"- Define any parameter of the model as attributes with `nn.Parameter`,\n",
"- Define a `forward` function that returns the output of your model.\n",
@ -2314,6 +2350,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"TK tweak this and make it a real conclusion\n",
"\n",
"- A neural net is basically a bunch of matrix multiplications with non-linearities in-between.\n",
"- Python is slow so to write fast code we have to vectorize it and take advantage of element-wise arithmetic or broadcasting.\n",
"- Two tensors are broadcastable if the dimensions starting from the end and going backward match (they are the same or one of them is 1). To make tensors broadcastable, we may need to add dimensions of size 1 with `unsqueeze` or a `None` index.\n",

View File

@ -30,7 +30,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Now that we know how to build up pretty much anything from scratch, let's use that knowledge to create entirely new (and very useful!) functionality: the *class activation map*. In the process, we'll learn about one handy feature of PyTorch we haven't seen before, the *hook*, and we'll apply many of the concepts classes we've learned in the rest of the book. If you want to really test out your understanding of the material in this book, after you've finished this chapter, try putting the book aside, and recreate the ideas here yourself from scratch (no peaking!)"
"Now that we know how to build up pretty much anything from scratch, let's use that knowledge to create entirely new (and very useful!) functionality: the *class activation map*. It gives a us an hindsight of why a CNN made the predictions it did.\n",
"\n",
"In the process, we'll learn about one handy feature of PyTorch we haven't seen before, the *hook*, and we'll apply many of the concepts classes we've learned in the rest of the book. If you want to really test out your understanding of the material in this book, after you've finished this chapter, try putting the book aside, and recreate the ideas here yourself from scratch (no peaking!)"
]
},
{
@ -44,7 +46,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Class Activation Mapping (or CAM) was introduced by Zhou et al. in [Learning Deep Features for Discriminative Localization](https://arxiv.org/abs/1512.04150). It uses the output of the last convolutional layer (just before our average pooling) together with the predictions to give us some heatmap visulaization of why the model made its decision.\n",
"Class Activation Mapping (or CAM) was introduced by Zhou et al. in [Learning Deep Features for Discriminative Localization](https://arxiv.org/abs/1512.04150). It uses the output of the last convolutional layer (just before our average pooling) together with the predictions to give us some heatmap visulaization of why the model made its decision. This is a useful tool for intepretation.\n",
"\n",
"More precisely, at each position of our final convolutional layer we have has many filters as the last linear layer. We can then compute the dot product of those activations by the final weights to have, for each location on our feature map, the score of the feature that was used to make a decision.\n",
"\n",
@ -422,6 +424,13 @@
"fastai provides this `Hook` class for you, as well as some other handy classes to make working with hooks easier."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This method is useful but only works for the last layer. Gradient CAM is a variant that addreses this problem."
]
},
{
"cell_type": "markdown",
"metadata": {},
@ -580,6 +589,13 @@
" interpolation='bilinear', cmap='magma');"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"TK Write conclusion"
]
},
{
"cell_type": "markdown",
"metadata": {},

View File

@ -1,419 +0,0 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#hide\n",
"from utils import *"
]
},
{
"cell_type": "raw",
"metadata": {},
"source": [
"[[chapter_callbacks]]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Callbacks"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Introduction to callbacks"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since we now know how to create state-of-the-art architectures for computer vision, natural image processing, tabular analysis, and collaborative filtering, and we know how to train them quickly with accelerated optimisers, and we know how to regularise them effectively, we're done, right?\n",
"\n",
"Well… Yes, sort of. But other things come up. Sometimes you need to change how things work a little bit. In fact, we have already seen examples of this: mixup, FP16 training, resetting the model after each epoch for training RNNs, and so forth. How do we go about making these kinds of tweaks to the training process?\n",
"\n",
"We've seen the basic training loop, which, with the help of the `Optimizer` class, looks like this for a single epoch:\n",
"\n",
"```python\n",
"for xb,yb in dl:\n",
" loss = loss_func(model(xb), yb)\n",
" loss.backward()\n",
" opt.step()\n",
" opt.zero_grad()\n",
"```\n",
"\n",
"Here's one way to picture that:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img alt=\"Basic training loop\" width=\"300\" caption=\"Basic training loop\" id=\"basic_loop\" src=\"images/att_00048.png\">"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The usual way for deep learning practitioners to customise the training loop is to make a copy of an existing training loop, and then insert their code necessary for their particular changes into it. This is how nearly all code that you find online will look. But it has some very serious problems.\n",
"\n",
"It's not very likely that some particular tweaked training loop is going to meet your particular needs. There are hundreds of changes that can be made to a training loop, which means there are billions and billions of possible permutations. You can't just copy one tweak from a training loop here, another from a training loop there, and expect them all to work together. Each will be based on different assumptions about the environment that it's working in, use different naming conventions, and expect the data to be in different formats.\n",
"\n",
"We need a way to allow users to insert their own code at any part of the training loop, but in a consistent and well-defined way. Computer scientists have already come up with an answer to this question: the callback. A callback is a piece of code that you write, and inject into another piece of code at some predefined point. In fact, callbacks have been used with deep learning training loops for years. The problem is that only a small subset of places that may require code injection have been available in previous libraries, and, more importantly, callbacks were not able to do all the things they needed to do.\n",
"\n",
"In order to be just as flexible as manually copying and pasting a training loop and directly inserting code into it, a callback must be able to read every possible piece of information available in the training loop, modify all of it as needed, and fully control when a batch, epoch, or even all the whole training loop should be terminated. fastai is the first library to provide all of this functionality. It modifies the training loop so it looks like this:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img alt=\"Training loop with callbacks\" width=\"550\" caption=\"Training loop with callbacks\" id=\"cb_loop\" src=\"images/att_00049.png\">"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The real test of whether this works has been borne out over the last couple of years — it has turned out that every single new paper implemented, or use a request fulfilled, for modifying the training loop has successfully been achieved entirely by using the fastai callback system. The training loop itself has not required modifications. Here are just a few of the callbacks that have been added:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img alt=\"Some fastai callbacks\" width=\"500\" caption=\"Some fastai callbacks\" id=\"some_cbs\" src=\"images/att_00050.png\">"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The reason that this is important for all of us is that it means that whatever idea we have in our head, we can implement it. We need never dig into the source code of PyTorch or fastai and act together some one-off system to try out our ideas. And when we do implement our own callbacks to develop our own ideas, we know that they will work together with all of the other functionality provided by fastai so we will get progress bars, mixed precision training, hyperparameter annealing, and so forth.\n",
"\n",
"Another advantage is that it makes it easy to gradually remove or add functionality and perform ablation studies. You just need to adjust the list of callbacks you pass along to your fit function."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an example, here is the fastai source code that is run for each batch of the training loop:\n",
"\n",
"```python\n",
"try:\n",
" self._split(b); self('begin_batch')\n",
" self.pred = self.model(*self.xb); self('after_pred')\n",
" self.loss = self.loss_func(self.pred, *self.yb); self('after_loss')\n",
" if not self.training: return\n",
" self.loss.backward(); self('after_backward')\n",
" self.opt.step(); self('after_step')\n",
" self.opt.zero_grad()\n",
"except CancelBatchException: self('after_cancel_batch')\n",
"finally: self('after_batch')\n",
"```\n",
"\n",
"The calls of the form `self('...')` are where the callbacks are called. As you see, after every step a callback is called. The callback will receive the entire state of training, and can also modify it. For instance, as you see above, the input data and target labels are in `self.xb` and `self.yb` respectively. A callback can modify these to modify the data the training loop sees. It can also modify `self.loss`, or even modify the gradients."
]
},
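{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make those `self('...')` calls concrete, here is a toy sketch of how an event dispatcher like this might work (an illustration of the idea only, not fastai's actual implementation):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"class ToyLearner:\n",
"    def __init__(self, cbs): self.cbs = cbs\n",
"    def __call__(self, event_name):\n",
"        # call the matching method, if defined, on each callback in turn\n",
"        for cb in self.cbs:\n",
"            method = getattr(cb, event_name, None)\n",
"            if method is not None: method()"
]
},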
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Creating a callback"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The full list of available callback events is:\n",
"\n",
"- `begin_fit`: called before doing anything, ideal for initial setup.\n",
"- `begin_epoch`: called at the beginning of each epoch, useful for any behavior you need to reset at each epoch.\n",
"- `begin_train`: called at the beginning of the training part of an epoch.\n",
"- `begin_batch`: called at the beginning of each batch, just after drawing said batch. It can be used to do any setup necessary for the batch (like hyper-parameter scheduling) or to change the input/target before it goes in the model (change of the input with techniques like mixup for instance).\n",
"- `after_pred`: called after computing the output of the model on the batch. It can be used to change that output before it's fed to the loss.\n",
"- `after_loss`: called after the loss has been computed, but before the backward pass. It can be used to add any penalty to the loss (AR or TAR in RNN training for instance).\n",
"- `after_backward`: called after the backward pass, but before the update of the parameters. It can be used to do any change to the gradients before said update (gradient clipping for instance).\n",
"- `after_step`: called after the step and before the gradients are zeroed.\n",
"- `after_batch`: called at the end of a batch, for any clean-up before the next one.\n",
"- `after_train`: called at the end of the training phase of an epoch.\n",
"- `begin_validate`: called at the beginning of the validation phase of an epoch, useful for any setup needed specifically for validation.\n",
"- `after_validate`: called at the end of the validation part of an epoch.\n",
"- `after_epoch`: called at the end of an epoch, for any clean-up before the next one.\n",
"- `after_fit`: called at the end of training, for final clean-up.\n",
"\n",
"This list is available as attributes of the special variable `event`; so just type `event.` and hit `Tab` in your notebook to see a list of all the options"
]
},
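{
"cell_type": "markdown",
"metadata": {},
"source": [
"For instance, here is a hypothetical callback (not one that ships with fastai) that uses a few of these events to time each epoch:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import time\n",
"\n",
"class EpochTimer(Callback):\n",
"    def begin_fit(self):   self.times = []              # reset at the start of training\n",
"    def begin_epoch(self): self.start = time.time()\n",
"    def after_epoch(self): self.times.append(time.time() - self.start)"
]
},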
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's take a look at an example. Do you recall how in <<chapter_nlp_dive>> we needed to ensure that our special `reset` method was called at the start of training and validation for each epoch? We used the `ModelReseter` callback provided by fastai to do this for us. But how did `ModelReseter` do that exactly? Here's the full actual source code to that class:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"class ModelReseter(Callback):\n",
" def begin_train(self): self.model.reset()\n",
" def begin_validate(self): self.model.reset()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Yes, that's actually it! It just does what we said in the paragraph above: after completing training and epoch or validation for an epoch, call a method named `reset`.\n",
"\n",
"Callbacks are often \"short and sweet\" like this one. In fact, let's look at one more. Here's the fastai source for the callback that add RNN regularization (*AR* and *TAR*):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"class RNNRegularizer(Callback):\n",
" def __init__(self, alpha=0., beta=0.): self.alpha,self.beta = alpha,beta\n",
"\n",
" def after_pred(self):\n",
" self.raw_out,self.out = self.pred[1],self.pred[2]\n",
" self.learn.pred = self.pred[0]\n",
"\n",
" def after_loss(self):\n",
" if not self.training: return\n",
" if self.alpha != 0.:\n",
" self.learn.loss += self.alpha * self.out[-1].float().pow(2).mean()\n",
" if self.beta != 0.:\n",
" h = self.raw_out[-1]\n",
" if len(h)>1:\n",
" self.learn.loss += self.beta * (h[:,1:] - h[:,:-1]\n",
" ).float().pow(2).mean()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> stop: Go back to where we discussed TAR and AR regularization, and compare to the code here. Made sure you understand what it's doing, and why."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In both of these examples, notice how we can access attributes of the training loop by directly checking `self.model` or `self.pred`. That's because a `Callback` will always try to get an attribute it doesn't have inside the `Learner` associated to it. This is a shortcut for `self.learn.model` or `self.learn.pred`. Note that this shortcut works for reading attributes, but not for writing them, which is why when `RNNRegularizer` changes the loss or the predictions, you see `self.learn.loss = ` or `self.learn.pred = `. "
]
},
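{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here is a tiny hypothetical example, just to illustrate the read/write distinction:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"class HalveLoss(Callback):\n",
"    def after_loss(self):\n",
"        current = self.loss               # reading works through the shortcut\n",
"        self.learn.loss = current * 0.5   # writing must go through self.learn"
]
},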
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When writing a callback, the following attributes of `Learner` are available:\n",
"\n",
"- `model`: the model used for training/validation\n",
"- `data`: the underlying `DataLoaders`\n",
"- `loss_func`: the loss function used\n",
"- `opt`: the optimizer used to udpate the model parameters\n",
"- `opt_func`: the function used to create the optimizer\n",
"- `cbs`: the list containing all `Callback`s\n",
"- `dl`: current `DataLoader` used for iteration\n",
"- `x`/`xb`: last input drawn from `self.dl` (potentially modified by callbacks). `xb` is always a tuple (potentially with one element) and `x` is detuplified. You can only assign to `xb`.\n",
"- `y`/`yb`: last target drawn from `self.dl` (potentially modified by callbacks). `yb` is always a tuple (potentially with one element) and `y` is detuplified. You can only assign to `yb`.\n",
"- `pred`: last predictions from `self.model` (potentially modified by callbacks)\n",
"- `loss`: last computed loss (potentially modified by callbacks)\n",
"- `n_epoch`: the number of epochs in this training\n",
"- `n_iter`: the number of iterations in the current `self.dl`\n",
"- `epoch`: the current epoch index (from 0 to `n_epoch-1`)\n",
"- `iter`: the current iteration index in `self.dl` (from 0 to `n_iter-1`)\n",
"\n",
"The following attributes are added by `TrainEvalCallback` and should be available unless you went out of your way to remove that callback:\n",
"\n",
"- `train_iter`: the number of training iterations done since the beginning of this training\n",
"- `pct_train`: from 0. to 1., the percentage of training iterations completed\n",
"- `training`: flag to indicate if we're in training mode or not\n",
"\n",
"The following attribute is added by `Recorder` and should be available unless you went out of your way to remove that callback:\n",
"\n",
"- `smooth_loss`: an exponentially-averaged version of the training loss"
]
},
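{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a hypothetical illustration of these attributes in use, here is a callback that periodically reports progress (it assumes `TrainEvalCallback` and `Recorder` are present, as they are by default):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"class ProgressPrinter(Callback):\n",
"    def after_batch(self):\n",
"        # report every 100 training iterations, using the attributes listed above\n",
"        if self.training and self.train_iter % 100 == 0:\n",
"            print(f'{self.pct_train:.1%} done, smooth loss {self.smooth_loss:.4f}')"
]
},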
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Callback ordering and exceptions"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Sometimes, callbacks need to be able to tell fastai to skip over a batch, or an epoch, or stop training altogether. For instance, consider `TerminateOnNaNCallback`. This handy callback will automatically stop training any time the loss becomes infinite or `NaN` (*not a number*). Here's the fastai source for this callback:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"class TerminateOnNaNCallback(Callback):\n",
" run_before=Recorder\n",
" def after_batch(self):\n",
" if torch.isinf(self.loss) or torch.isnan(self.loss):\n",
" raise CancelFitException"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The way it tells the training loop to interrupt training at this point is to `raise CancelFitException`. The training loop catches this exception and does not run any further training or validation. The callback control flow exceptions available are:\n",
"\n",
"- `CancelFitException`: Skip the rest of this batch and go to `after_batch\n",
"- `CancelEpochException`: Skip the rest of the training part of the epoch and go to `after_train\n",
"- `CancelTrainException`: Skip the rest of the validation part of the epoch and go to `after_validate\n",
"- `CancelValidException`: Skip the rest of this epoch and go to `after_epoch\n",
"- `CancelBatchException`: Interrupts training and go to `after_fit"
]
},
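{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a hypothetical example of these exceptions in action (again, not a built-in fastai callback), here is one that cuts every epoch short after a fixed number of training batches, which can be handy for a quick smoke test:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"class StopAfterNBatches(Callback):\n",
"    def __init__(self, n=10): self.n = n\n",
"    def after_batch(self):\n",
"        if self.training and self.iter >= self.n:\n",
"            raise CancelEpochException   # skip the rest of this epoch"
]
},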
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can detect one of those exceptions occurred and add code that executes right after with the following events:\n",
"\n",
"- `after_cancel_batch`: reached immediately after a `CancelBatchException` before proceeding to `after_batch`\n",
"- `after_cancel_train`: reached immediately after a `CancelTrainException` before proceeding to `after_epoch`\n",
"- `after_cancel_valid`: reached immediately after a `CancelValidException` before proceeding to `after_epoch`\n",
"- `after_cancel_epoch`: reached immediately after a `CancelEpochException` before proceeding to `after_epoch`\n",
"- `after_cancel_fit`: reached immediately after a `CancelFitException` before proceeding to `after_fit`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Sometimes, callbacks need to be called in a particular order. In the case of `TerminateOnNaNCallback`, it's important that `Recorder` runs its `after_batch` after this callback, to avoid registering an NaN loss. You can specify `run_before` (this callback must run before ...) or `run_after` (this callback must run after ...) in your callback to ensure the ordering that you need.\n",
"\n",
"Now that we have seen how to tweak the training loop of fastai to do anything we need, let's take a step back and dig a little bit deeper in the foundations of that training loop."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Questionnaire"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"1. What are the four steps of a training loop?\n",
"1. Why is the use of callbacks better than writing a new training loop for each tweak you want to add?\n",
"1. What are the necessary points in the design of the fastai's callback system that make it as flexible as copying and pasting bits of code?\n",
"1. How can you get the list of events available to you when writing a callback?\n",
"1. Write the `ModelResetter` callback (without peeking).\n",
"1. How can you access the necessary attributes of the training loop inside a callback? When can you use or not use the shortcut that goes with it?\n",
"1. How can a callback influence the control flow of the training loop.\n",
"1. Write the `TerminateOnNaN` callback (without peeking if possible).\n",
"1. How do you make sure your callback runs after or before another callback?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Further research"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"1. Look at the mixed precision callback with the documentation. Try to understand what each event and line of code does.\n",
"1. Implement your own version of ther learning rate finder from scratch. Compare it with fastai's version.\n",
"1. Look at the source code of the callbacks that ship with fastai. See if you can find one that's similar to what you're looking to do, to get some inspiration."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Foundations of Deep Learning: Wrap up"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Congratulations, you have made it to the end of the \"foundations of deep learning\" section. You now understand how all of fastai's applications and most important architectures are built, and the recommended ways to train them, and have all the information you need to build these from scratch. Whilst you probably won't need to create your own training loop, or batchnorm layer, for instance, knowing what is going on behind the scenes is very helpful for debugging, profiling, and deploying your solutions.\n",
"\n",
"Since you understand all of the foundations of fastai's applications now, be sure to spend some time digging through fastai's source notebooks, and running and experimenting with parts of them, since you can and see exactly how everything in fastai is developed.\n",
"\n",
"In the next section, we will be looking even further under the covers, to see how the actual forward and backward passes of a neural network are done, and we will see what tools are at our disposal to get better performance. We will then finish up with a project that brings together everything we have learned throughout the book, which we will use to build a method for interpreting convolutional neural networks."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"jupytext": {
"split_at_heading": true
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.5"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": false,
"sideBar": true,
"skip_h1_title": true,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@ -2,7 +2,7 @@
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -35,7 +35,7 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -44,7 +44,7 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -53,7 +53,7 @@
"Path('/home/jhoward/.fastai/data/imagenette2-160/val/n03417042/n03417042_3752.JPEG')"
]
},
"execution_count": 3,
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
@ -65,7 +65,7 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -74,7 +74,7 @@
"Path('/home/jhoward/.fastai/data/imagenette2-160/val/n03417042/n03417042_3752.JPEG')"
]
},
"execution_count": 4,
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
@ -87,7 +87,7 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -97,7 +97,7 @@
"<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=213x160 at 0x7F471FDA55D0>"
]
},
"execution_count": 5,
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
@ -109,7 +109,7 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -118,7 +118,7 @@
"(#10) ['n03417042','n03445777','n03888257','n03394916','n02979186','n03000684','n03425413','n01440764','n03028079','n02102040']"
]
},
"execution_count": 6,
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
@ -129,7 +129,7 @@
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -147,7 +147,7 @@
" 'n02102040': 9}"
]
},
"execution_count": 7,
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
@ -158,7 +158,7 @@
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -167,7 +167,7 @@
"torch.Size([160, 213, 3])"
]
},
"execution_count": 8,
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
@ -186,7 +186,7 @@
},
{
"cell_type": "code",
"execution_count": 9,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -201,7 +201,7 @@
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -210,7 +210,7 @@
"(9469, 3925)"
]
},
"execution_count": 10,
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
@ -223,7 +223,7 @@
},
{
"cell_type": "code",
"execution_count": 11,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -232,7 +232,7 @@
"(torch.Size([64, 64, 3]), tensor(0))"
]
},
"execution_count": 11,
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
@ -245,7 +245,7 @@
},
{
"cell_type": "code",
"execution_count": 12,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -267,7 +267,7 @@
},
{
"cell_type": "code",
"execution_count": 13,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -278,7 +278,7 @@
},
{
"cell_type": "code",
"execution_count": 14,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -287,7 +287,7 @@
"(torch.Size([2, 64, 64, 3]), tensor([0, 0]))"
]
},
"execution_count": 14,
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
@ -299,7 +299,7 @@
},
{
"cell_type": "code",
"execution_count": 15,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -319,7 +319,7 @@
},
{
"cell_type": "code",
"execution_count": 16,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -328,7 +328,7 @@
"(torch.Size([128, 64, 64, 3]), torch.Size([128]), 74)"
]
},
"execution_count": 16,
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
@ -343,7 +343,7 @@
},
{
"cell_type": "code",
"execution_count": 17,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -352,7 +352,7 @@
"[tensor([0.4544, 0.4453, 0.4141]), tensor([0.2812, 0.2766, 0.2981])]"
]
},
"execution_count": 17,
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
@ -364,7 +364,7 @@
},
{
"cell_type": "code",
"execution_count": 18,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -378,7 +378,7 @@
},
{
"cell_type": "code",
"execution_count": 19,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -388,7 +388,7 @@
},
{
"cell_type": "code",
"execution_count": 20,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -397,7 +397,7 @@
"(tensor([0.3732, 0.4907, 0.5633]), tensor([1.0212, 1.0311, 1.0131]))"
]
},
"execution_count": 20,
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
@ -416,7 +416,7 @@
},
{
"cell_type": "code",
"execution_count": 23,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -427,7 +427,7 @@
},
{
"cell_type": "code",
"execution_count": 24,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -466,7 +466,7 @@
},
{
"cell_type": "code",
"execution_count": 21,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -475,7 +475,7 @@
},
{
"cell_type": "code",
"execution_count": 25,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -496,7 +496,7 @@
},
{
"cell_type": "code",
"execution_count": 26,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -505,7 +505,7 @@
"2"
]
},
"execution_count": 26,
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
@ -517,7 +517,7 @@
},
{
"cell_type": "code",
"execution_count": 27,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -526,7 +526,7 @@
"torch.Size([128, 4, 64, 64])"
]
},
"execution_count": 27,
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
@ -539,7 +539,7 @@
},
{
"cell_type": "code",
"execution_count": 28,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -555,7 +555,7 @@
},
{
"cell_type": "code",
"execution_count": 29,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -564,7 +564,7 @@
"torch.Size([3, 2])"
]
},
"execution_count": 29,
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
@ -577,7 +577,7 @@
},
{
"cell_type": "code",
"execution_count": 30,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -589,7 +589,7 @@
},
{
"cell_type": "code",
"execution_count": 31,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -598,7 +598,7 @@
"4"
]
},
"execution_count": 31,
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
@ -610,7 +610,7 @@
},
{
"cell_type": "code",
"execution_count": 32,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -619,7 +619,7 @@
"device(type='cuda', index=5)"
]
},
"execution_count": 32,
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
@ -638,7 +638,7 @@
},
{
"cell_type": "code",
"execution_count": 33,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -655,7 +655,7 @@
},
{
"cell_type": "code",
"execution_count": 34,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -665,7 +665,7 @@
},
{
"cell_type": "code",
"execution_count": 35,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -682,7 +682,7 @@
},
{
"cell_type": "code",
"execution_count": 36,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -691,7 +691,7 @@
"10"
]
},
"execution_count": 36,
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
@ -703,7 +703,7 @@
},
{
"cell_type": "code",
"execution_count": 37,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -713,7 +713,7 @@
},
{
"cell_type": "code",
"execution_count": 38,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -732,7 +732,7 @@
"torch.Size([128, 10])"
]
},
"execution_count": 38,
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
@ -751,7 +751,7 @@
},
{
"cell_type": "code",
"execution_count": 39,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -767,7 +767,7 @@
},
{
"cell_type": "code",
"execution_count": 40,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -776,7 +776,7 @@
},
{
"cell_type": "code",
"execution_count": 41,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -785,7 +785,7 @@
"tensor(-2.7753, grad_fn=<SelectBackward>)"
]
},
"execution_count": 41,
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
@ -796,7 +796,7 @@
},
{
"cell_type": "code",
"execution_count": 42,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -805,7 +805,7 @@
"tensor(2.5293, grad_fn=<NegBackward>)"
]
},
"execution_count": 42,
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
@ -828,7 +828,7 @@
},
{
"cell_type": "code",
"execution_count": 43,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -837,7 +837,7 @@
"tensor(-2.7753, grad_fn=<SelectBackward>)"
]
},
"execution_count": 43,
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
@ -863,7 +863,7 @@
},
{
"cell_type": "code",
"execution_count": 44,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -872,7 +872,7 @@
"tensor(False)"
]
},
"execution_count": 44,
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
@ -885,7 +885,7 @@
},
{
"cell_type": "code",
"execution_count": 45,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -896,7 +896,7 @@
},
{
"cell_type": "code",
"execution_count": 46,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -905,7 +905,7 @@
"tensor(2.3158, grad_fn=<SelectBackward>)"
]
},
"execution_count": 46,
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
@ -923,7 +923,7 @@
},
{
"cell_type": "code",
"execution_count": 47,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -932,7 +932,7 @@
},
{
"cell_type": "code",
"execution_count": 48,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -941,7 +941,7 @@
"tensor(-2.7753, grad_fn=<SelectBackward>)"
]
},
"execution_count": 48,
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
@ -952,7 +952,7 @@
},
{
"cell_type": "code",
"execution_count": 49,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -968,7 +968,7 @@
},
{
"cell_type": "code",
"execution_count": 50,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -982,7 +982,7 @@
},
{
"cell_type": "code",
"execution_count": 51,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -992,7 +992,7 @@
},
{
"cell_type": "code",
"execution_count": 52,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -1001,7 +1001,7 @@
},
{
"cell_type": "code",
"execution_count": 53,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -1045,7 +1045,7 @@
},
{
"cell_type": "code",
"execution_count": 54,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -1054,7 +1054,7 @@
},
{
"cell_type": "code",
"execution_count": 55,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -1077,7 +1077,7 @@
},
{
"cell_type": "code",
"execution_count": 56,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -1091,7 +1091,7 @@
},
{
"cell_type": "code",
"execution_count": 57,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -1100,7 +1100,7 @@
},
{
"cell_type": "code",
"execution_count": 58,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -1109,7 +1109,7 @@
},
{
"cell_type": "code",
"execution_count": 75,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -1160,7 +1160,7 @@
},
{
"cell_type": "code",
"execution_count": 59,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -1182,7 +1182,7 @@
},
{
"cell_type": "code",
"execution_count": 60,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -1191,7 +1191,7 @@
},
{
"cell_type": "code",
"execution_count": 61,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -1200,7 +1200,7 @@
},
{
"cell_type": "code",
"execution_count": 62,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -1273,7 +1273,7 @@
},
{
"cell_type": "code",
"execution_count": 113,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -1296,7 +1296,7 @@
},
{
"cell_type": "code",
"execution_count": 136,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -1323,7 +1323,7 @@
},
{
"cell_type": "code",
"execution_count": 137,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -1332,7 +1332,7 @@
},
{
"cell_type": "code",
"execution_count": 138,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -1341,7 +1341,7 @@
},
{
"cell_type": "code",
"execution_count": 139,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -1623,7 +1623,7 @@
},
{
"cell_type": "code",
"execution_count": 140,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -1731,34 +1731,6 @@
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.5"
},
"toc": {
"base_numbering": 1,
"nav_menu": {
"height": "140px",
"width": "202px"
},
"number_sections": false,
"sideBar": true,
"skip_h1_title": true,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
}
},
"nbformat": 4,

View File

@ -20,7 +20,7 @@
"source": [
"Congratulations! You've made it! If you have worked through all of the notebooks to this point, then you have joined a small, but growing group of people that are able to harness the power of deep learning to solve real problems. You may not feel that way; in fact you probably do not feel that way. We have seen again and again that students that complete the fast.AI courses dramatically underestimate how effective they are as deep learning practitioners. We've also seen that these people are often underestimated by those that have come out of a classic academic background. So for you to rise above your own expectations and the expectations of others what you do next, after closing this book, is even more important than what you've done to get to this point.\n",
"\n",
"The most important thing is to keep the momentum going. In fact, as you know from your study of optimisers, momentum is something which can build upon itself! So think about what it is you can do now to maintain and accelerate your deep learning journey. Here's a few ideas:"
"The most important thing is to keep the momentum going. In fact, as you know from your study of optimisers, momentum is something which can build upon itself! So think about what it is you can do now to maintain and accelerate your deep learning journey. <<do_next>> can give you a few ideas."
]
},
{
@ -69,18 +69,6 @@
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
}
},
"nbformat": 4,