commit 5ddb589296
@@ -72,7 +72,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Now if we are going to understand how to extract the breed of each pet from each image were going to need to understand how this data is laid out. Such details of data layout are a vital piece of the deep learning puzzle. Data is usually provided in one of these two ways:\n",
+"Now if we are going to understand how to extract the breed of each pet from each image we're going to need to understand how this data is laid out. Such details of data layout are a vital piece of the deep learning puzzle. Data is usually provided in one of these two ways:\n",
 "\n",
 "- Individual files representing items of data, such as text documents or images, possibly organised into folders or with filenames representing information about those items, or\n",
 "- A table of data, such as in CSV format, where each row is an item, each row which may include filenames providing a connection between the data in the table and data in other formats such as text documents and images.\n",
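As a concrete illustration of the two data layouts described in the cell above, here is a hedged sketch; the directory name, filename convention, and CSV columns are assumptions for illustration, not taken from the notebook:

from pathlib import Path
import pandas as pd

path = Path('oxford-iiit-pet')   # hypothetical local copy of the dataset

# Layout 1: individual files, with the label encoded in each filename,
# e.g. images/great_pyrenees_173.jpg -> breed "great_pyrenees"
files = sorted((path/'images').glob('*.jpg'))
breeds = [f.name.rsplit('_', 1)[0] for f in files]

# Layout 2: a table where each row is an item and a column holds the filename
df = pd.read_csv(path/'labels.csv')   # hypothetical CSV with 'fname' and 'breed' columns
print(df.head())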
@@ -1073,7 +1073,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"To see exactly what's happening here, let's put all the columns together in a table. Here, the first two columns are out activations, then we have the targets, the row index, and finally the result shown immediately above:"
+"To see exactly what's happening here, let's put all the columns together in a table. Here, the first two columns are our activations, then we have the targets, the row index, and finally the result shown immediately above:"
 ]
 },
 {
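A hedged sketch of how such a table can be assembled; the activation values below are made up, and `acts`/`targ` stand in for the notebook's own variables:

import torch
import pandas as pd

torch.manual_seed(42)
acts = torch.randn(6, 2) * 2              # made-up two-column activations
targ = torch.tensor([0, 1, 0, 1, 1, 0])   # made-up targets
idx = range(6)
result = acts[idx, targ]                  # picks column targ[i] from row i

# First two columns: activations; then targets, row index, and the indexed result
pd.DataFrame({'act_0': acts[:, 0], 'act_1': acts[:, 1],
              'target': targ, 'idx': list(idx), 'result': result})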
@@ -1599,7 +1599,7 @@
 "source": [
 "That did not look good. Here's what happened. The optimiser stepped in the correct direction, but it stepped so far that it totally overshot the minimum loss. Repeating that multiple times makes it get further and further away, not closer and closer!\n",
 "\n",
-"What do we do to find the perfect learning rate, not too high, and not too low? In 2015 the researcher Leslie Smith came up with a brilliant idea, called the *learning rate finder*. His idea was to start with a very very small learning rate, something so small that we would never expect it to be too big to handle. We use that for one mini batch, find what the losses afterwards, and then increase the learning rate by some percentage (e.g. doubling it each time). Then we do another mini batch, track the loss, and double the learning rate again. We keep doing this until the loss gets worse, instead of better. This is the point where we know we have gone too far. We then select a learning rate a bit lower than this point. Our advice is to pick either:\n",
+"What do we do to find the perfect learning rate, not too high, and not too low? In 2015 the researcher Leslie Smith came up with a brilliant idea, called the *learning rate finder*. His idea was to start with a very very small learning rate, something so small that we would never expect it to be too big to handle. We use that for one mini batch, find what the losses are afterwards, and then increase the learning rate by some percentage (e.g. doubling it each time). Then we do another mini batch, track the loss, and double the learning rate again. We keep doing this until the loss gets worse, instead of better. This is the point where we know we have gone too far. We then select a learning rate a bit lower than this point. Our advice is to pick either:\n",
 "\n",
 "- one order of magnitude less than where the minimum loss was achieved (i.e. the minimum divided by 10)\n",
 "- the last point where the loss was clearly decreasing. \n",
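For reference, a minimal sketch of running the learning rate finder described above, assuming a fastai `Learner` built roughly as elsewhere in this chapter (the filename pattern and architecture are assumptions, and the return value of `lr_find` differs between fastai versions):

from fastai.vision.all import *

path = untar_data(URLs.PETS)/'images'
dls = ImageDataLoaders.from_name_re(
    path, get_image_files(path), pat=r'(.+)_\d+.jpg$', item_tfms=Resize(224))
learn = cnn_learner(dls, resnet34, metrics=error_rate)

# Plots loss against learning rate; pick either ~minimum/10 or the last point
# where the loss was still clearly decreasing, as suggested above.
learn.lr_find()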
@@ -1784,11 +1784,11 @@
 "\n",
 "This final linear layer is unlikely to be of any use for us, when we are fine tuning in a transfer learning setting, because it is specifically designed to classify the categories in the original pretraining dataset. So when we do transfer learning we remove it, throw it away, and replace it with a new linear layer with the correct number of outputs for our desired task (in this case, there would be 37 activations).\n",
 "\n",
-"This newly added linear layer will have entirely random weights. Therefore, our model prior to fine tuning has entirely random outputs. But that does not mean that it is an entirely random model! All of the layers prior to the last one have been carefully trained to be good at image classification tasks in general. As we saw in the images from the Zeiler and Fergus paper in <<chapter_intro>> (see <<img_layer1>> and followings), the first layers encode very general concepts such as finding gradients and edges, and later layers encode concepts that are still very useful for us, such as finding eyeballs and fur.\n",
+"This newly added linear layer will have entirely random weights. Therefore, our model prior to fine tuning has entirely random outputs. But that does not mean that it is an entirely random model! All of the layers prior to the last one have been carefully trained to be good at image classification tasks in general. As we saw in the images from the Zeiler and Fergus paper in <<chapter_intro>> (see <<img_layer1>> and followings), the first few layers encode very general concepts such as finding gradients and edges, and later layers encode concepts that are still very useful for us, such as finding eyeballs and fur.\n",
 "\n",
 "We want to train a model in such a way that we allow it to remember all of these generally useful ideas from the pretrained model, use them to solve our particular task (classify pet breeds), and only adjust them as required for the specifics of our particular task.\n",
 "\n",
-"Our challenge than when fine tuning is to replace the random weights in our added linear layers with weights that correctly achieve our desired task (classifying pet breeds) without breaking the carefully pretrained weights and the other layers. There is actually a very simple trick to allow this to happen: tell the optimiser to only update the weights in those randomly added final layers. Don't change the weights in the rest of the neural network at all. This is called *freezing* those pretrained layers."
+"Our challenge when fine tuning is to replace the random weights in our added linear layers with weights that correctly achieve our desired task (classifying pet breeds) without breaking the carefully pretrained weights and the other layers. There is actually a very simple trick to allow this to happen: tell the optimiser to only update the weights in those randomly added final layers. Don't change the weights in the rest of the neural network at all. This is called *freezing* those pretrained layers."
 ]
 },
 {
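A hedged sketch of what freezing and then unfreezing look like with a fastai `Learner` (the epoch counts and learning rates are illustrative assumptions, not the notebook's values):

# Assumes `learn` is a cnn_learner as in the sketch above.
learn.freeze()                  # only the newly added, randomly initialised head is updated
learn.fit_one_cycle(3, 3e-3)    # train that head without disturbing the pretrained weights

learn.unfreeze()                # now allow the pretrained layers to be updated as well
learn.fit_one_cycle(6, lr_max=slice(1e-6, 1e-4))   # discriminative learning rates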
@@ -2308,7 +2308,7 @@
 "\n",
 "You may have to restart your notebook when this happens, and the way to solve it is to use a smaller *batch size*, which means we will pass smaller groups of images at any given time through our model. We can pass the batch size we want to the call creating our `DataLoaders` with `bs=`.\n",
 "\n",
-"The other downside of deeper architectures is that they take quite a bit longer to train. One thing that can speed things up a lot is *mixed precision training*. This refers to using less precise numbers (*half precision floating point*, also called *fp16*) where possible during training. As we are writing this words (early 2020) nearly all current NVIDIA GPUs support a special feature called *tensor cores* which can dramatically (2x-3x) speed up neural network training. They also require a lot less GPU memory. To enable this feature in fastai, just add `to_fp16()` after your `Learner` creation (you also need to import the module).\n",
+"The other downside of deeper architectures is that they take quite a bit longer to train. One thing that can speed things up a lot is *mixed precision training*. This refers to using less precise numbers (*half precision floating point*, also called *fp16*) where possible during training. As we are writing these words (early 2020) nearly all current NVIDIA GPUs support a special feature called *tensor cores* which can dramatically (2x-3x) speed up neural network training. They also require a lot less GPU memory. To enable this feature in fastai, just add `to_fp16()` after your `Learner` creation (you also need to import the module).\n",
 "\n",
 "You can't really know ahead of time what the best architecture for your particular problem is, until you try training some. So let's try a resnet 50 now with mixed precision:"
 ]
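A minimal sketch combining the two remedies mentioned above, a smaller batch size and mixed precision, under the same assumptions as the earlier sketches (the `bs` value and epoch counts are examples only):

from fastai.vision.all import *
from fastai.callback.fp16 import *   # module needed for to_fp16()

path = untar_data(URLs.PETS)/'images'
dls = ImageDataLoaders.from_name_re(
    path, get_image_files(path), pat=r'(.+)_\d+.jpg$',
    item_tfms=Resize(460), batch_tfms=aug_transforms(size=224),
    bs=32)                           # smaller batch size to fit in GPU memory

learn = cnn_learner(dls, resnet50, metrics=error_rate).to_fp16()
learn.fine_tune(6, freeze_epochs=3)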
@@ -2458,7 +2458,7 @@
 "source": [
 "In this chapter we learned some important practical tips, both for getting our image data ready for modeling (presizing; data block summary) and for fitting the model (learning rate finder, unfreezing, discriminative learning rates, setting the number of epochs, and using deeper architectures). Using these tools will help you to build more accurate image models, more quickly.\n",
 "\n",
-"We also learned about cross entropy loss. This part of the book is worth spending plenty of time on. You aren't likely to need to actually implement cross entropy loss from scratch yourself in practice, but it's really important you understand the inputs to and output from that function, because it (or a variant of it, as we'll see in the next chapter) is used in nearly everything classification model. So when you want to debug a model, or put a model in production, or improve the accuracy of a model, you're going to need to be able to look at its activations and loss, and understand what's going on, and why. You can't do that properly if you don't understand your loss function.\n",
+"We also learned about cross entropy loss. This part of the book is worth spending plenty of time on. You aren't likely to need to actually implement cross entropy loss from scratch yourself in practice, but it's really important you understand the inputs to and output from that function, because it (or a variant of it, as we'll see in the next chapter) is used in nearly every classification model. So when you want to debug a model, or put a model in production, or improve the accuracy of a model, you're going to need to be able to look at its activations and loss, and understand what's going on, and why. You can't do that properly if you don't understand your loss function.\n",
 "\n",
 "If cross entropy loss hasn't \"clicked\" for you just yet, don't worry--you'll get there! First, go back to the last chapter and make sure you really understand `mnist_loss`. Then work gradually through the cells of the notebook for this chapter, where we step through each piece of cross entropy loss. Make sure you understand what each calculation is doing, and why. Try creating some small tensors yourself and pass them into the functions, to see what they return.\n",
 "\n",
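Following the suggestion above to pass small tensors through the functions, a self-contained sketch of cross entropy computed by hand and with PyTorch's built-in (the numbers are arbitrary):

import torch
import torch.nn.functional as F

# Two items, three classes: raw activations (logits) and integer class targets
acts = torch.tensor([[0.6, -0.4, 2.0],
                     [1.2,  0.1, -1.5]])
targs = torch.tensor([2, 0])

# By hand: softmax across classes, then the negative log of the target class's probability
sm = torch.softmax(acts, dim=1)
manual = -sm[range(len(targs)), targs].log().mean()

# Built-in version combines log_softmax and negative log likelihood in one call
builtin = F.cross_entropy(acts, targs)
print(manual, builtin)   # the two values should match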
@@ -2495,7 +2495,7 @@
 "1. What two steps does the fine_tune method do?\n",
 "1. In Jupyter notebook, how do you get the source code for a method or function?\n",
 "1. What are discriminative learning rates?\n",
-"1. How is a Python slice object interpreted when past as a learning rate to fastai?\n",
+"1. How is a Python slice object interpreted when passed as a learning rate to fastai?\n",
 "1. Why is early stopping a poor choice when using one cycle training?\n",
 "1. What is the difference between resnet 50 and resnet101?\n",
 "1. What does to_fp16 do?"
@@ -2532,6 +2532,18 @@
 "display_name": "Python 3",
 "language": "python",
 "name": "python3"
 },
+"language_info": {
+"codemirror_mode": {
+"name": "ipython",
+"version": 3
+},
+"file_extension": ".py",
+"mimetype": "text/x-python",
+"name": "python",
+"nbconvert_exporter": "python",
+"pygments_lexer": "ipython3",
+"version": "3.7.6"
+}
 },
 "nbformat": 4,