Update
@@ -71,7 +71,7 @@
 "\n",
 "Here's a list of some of the thousands of tasks where deep learning, or methods heavily using deep learning, is now the best in the world:\n",
 "\n",
-"- Natural language processing:: answering questions; speech recognition; summarizing documents; classifying documents; finding names, dates, etc. in documents; searching for articles mentioning a concept\n",
+"- Natural Language Processing (NLP):: answering questions; speech recognition; summarizing documents; classifying documents; finding names, dates, etc. in documents; searching for articles mentioning a concept\n",
 "- Computer vision:: satellite and drone imagery interpretation (e.g. for disaster resilience); face recognition; image captioning; reading traffic signs; locating pedestrians and vehicles in autonomous vehicles\n",
 "- Medicine:: Finding anomalies in radiology images, including CT, MRI, and x-ray; counting features in pathology slides; measuring features in ultrasounds; diagnosing diabetic retinopathy\n",
 "- Biology:: folding proteins; classifying proteins; many genomics tasks, such as tumor-normal sequencing and classifying clinically actionable genetic mutations; cell classification; analyzing protein/protein interactions\n",
@@ -175,7 +175,7 @@
 "\n",
 "- How to train models that achieve state of the art results in:\n",
 " - Computer vision: Image classification (e.g. classify pet photos by breed), and image localization and detection (e.g. find where the animals in an image are)\n",
-" - Natural Language Processing (NLP): Document classification (e.g. movie review sentiment analysis), and language modelling\n",
+" - NLP: Document classification (e.g. movie review sentiment analysis), and language modelling\n",
 " - Tabular data (e.g. sales prediction) with categorical data, continuous data, and mixed data, including time series\n",
 " - Collaborative filtering (e.g. movie recommendation)\n",
 "- How to turn your models into web applications\n",
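As a minimal sketch of the first bullet above (classifying pet photos by breed), the fastai high-level API can express this in a handful of lines. The dataset, the filename regex used for labels, and the `cnn_learner`/`fine_tune` calls below follow the book's pets example but are assumptions here, and exact names may vary between fastai versions:

```python
from fastai.vision.all import *

# Download the Oxford-IIIT Pets dataset and label each image by breed,
# which is encoded in the filename (e.g. 'great_pyrenees_173.jpg').
path = untar_data(URLs.PETS)/'images'
dls = ImageDataLoaders.from_name_re(
    path, get_image_files(path), pat=r'(.+)_\d+.jpg$',
    valid_pct=0.2, seed=42, item_tfms=Resize(224))

# Fine-tune an ImageNet-pretrained ResNet for a couple of epochs.
learn = cnn_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(2)
```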
@@ -369,7 +369,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Have a look at each image, and check that each one seems to have the correct label for that breed of pet. Often, data scientists work with data with which they are not familiar as domain experts me: for instance, I actually don't know what a lot of these pet breeds are. Since I am not an expert on pet breeds, I would use Google images at this point to search for a few of these breeds, and make sure the images looks similar to what I see in this output.\n",
+"Have a look at each image, and check that each one seems to have the correct label for that breed of pet. Often, data scientists work with data with which they are not as familiar as domain experts may be: for instance, I actually don't know what a lot of these pet breeds are. Since I am not an expert on pet breeds, I would use Google images at this point to search for a few of these breeds, and make sure the images looks similar to what I see in this output.\n",
 "\n",
 "If you made a mistake while building your `DataBlock` it is very likely you won't see it before this step. To debug this, we encourage you to use the `summary` method. It will attempt to create a batch from the source you give it, with a lot of details. Also, if it fails, you will see exactly at which point the error happens, and the library will try to give you some help. For instance, one common mistake is to forget to put a `Resize` transform, ending up with pictures of different sizes and not able to batch them. Here is what the summary would look like in that case (note that the exact text may have changed since the time of writing, but it will give you an idea):"
 ]
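As a hedged sketch of the `summary` workflow described in that hunk: the `DataBlock` below mirrors the pets example (the blocks, regex labeller, and paths are taken as assumptions from that example) and deliberately omits `Resize`, so `summary` should walk through building one batch and report where collation fails:

```python
from fastai.vision.all import *

path = untar_data(URLs.PETS)

# A DataBlock with no Resize item transform -- images keep their original sizes,
# so collating them into a batch is expected to fail.
pets = DataBlock(blocks=(ImageBlock, CategoryBlock),
                 get_items=get_image_files,
                 splitter=RandomSplitter(seed=42),
                 get_y=using_attr(RegexLabeller(r'(.+)_\d+.jpg$'), 'name'))

# summary() shows each transform applied to one sample and explains
# at which step building the batch breaks down.
pets.summary(path/"images")

# Adding the missing transform fixes the problem:
pets = pets.new(item_tfms=Resize(224))
dls = pets.dataloaders(path/"images")
```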
@@ -624,7 +624,7 @@
 "- It works even when our dependent variable has more than two categories\n",
 "- It results in faster and more reliable training.\n",
 "\n",
-"In order to understand how cross entropy loss works for dependent variables with more than two categories, we first have to understand what the actual data and activations that are loss function is seen look like."
+"In order to understand how cross entropy loss works for dependent variables with more than two categories, we first have to understand what the actual data and activations that are seen by the loss function look like."
 ]
 },
 {
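To make that last sentence concrete, here is a small sketch, with made-up activations and targets, of what the loss function actually sees when the dependent variable has more than two categories, and how softmax plus negative log likelihood reduce to PyTorch's built-in `F.cross_entropy`:

```python
import torch
import torch.nn.functional as F

# Illustrative activations for a batch of 2 items over 5 classes (e.g. 5 pet breeds),
# plus the index of the correct class for each item.
acts = torch.tensor([[ 0.6, -2.1,  3.4,  0.2, -0.8],
                     [ 1.2,  0.3, -0.5,  2.2,  0.1]])
targets = torch.tensor([2, 0])

# Softmax turns the raw activations into probabilities that sum to 1 per row...
probs = torch.softmax(acts, dim=1)

# ...and cross entropy is the negative log of the probability assigned to the correct class.
idx = torch.arange(len(targets))
manual = -probs[idx, targets].log().mean()

# PyTorch's built-in version combines log_softmax and nll_loss in one,
# more numerically stable, call.
builtin = F.cross_entropy(acts, targets)

print(manual, builtin)  # the two values match
```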
@@ -1599,7 +1599,7 @@
 "source": [
 "That did not look good. Here's what happened. The optimiser stepped in the correct direction, but it stepped so far that it totally overshot the minimum loss. Repeating that multiple times makes it get further and further away, not closer and closer!\n",
 "\n",
-"What do we do to find the perfect learning rate, not too high, and not too low? In 2015 the researcher Leslie Smith came up with a brilliant idea, called the *learning rate finder*. His idea was to start with a very very small learning rate, something so small that we would never expect it to be too big to handle. We use that for one mini batch, find the losses afterwards, and then increase the learning rate by some percentage (e.g. doubling it each time). Then we do another mini batch, track the loss, and double the learning rate again. We keep doing this until the loss gets worse, instead of better. This is the point where we know we have gone too far. We then select a learning rate a bit lower than this point. Our advice is to pick either:\n",
+"What do we do to find the perfect learning rate, not too high, and not too low? In 2015 the researcher Leslie Smith came up with a brilliant idea, called the *learning rate finder*. His idea was to start with a very very small learning rate, something so small that we would never expect it to be too big to handle. We use that for one mini batch, find what the losses afterwards, and then increase the learning rate by some percentage (e.g. doubling it each time). Then we do another mini batch, track the loss, and double the learning rate again. We keep doing this until the loss gets worse, instead of better. This is the point where we know we have gone too far. We then select a learning rate a bit lower than this point. Our advice is to pick either:\n",
 "\n",
 "- one order of magnitude less than where the minimum loss was achieved (i.e. the minimum divided by 10)\n",
 "- the last point where the loss was clearly decreasing. \n",
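Here is a rough sketch of the learning rate finder loop that hunk describes, written in plain PyTorch. The `lr_find` helper, its parameters, and the stopping rule (stop once the loss exceeds a multiple of the best loss seen) are illustrative choices, not fastai's exact implementation, which is available as `Learner.lr_find`:

```python
import copy
import torch

def lr_find(model, train_dl, loss_func, start_lr=1e-7, multiplier=2.0, max_factor=4.0):
    """Sketch of Leslie Smith's learning rate finder: train on one mini-batch at a time,
    increasing the learning rate after each step, and stop once the loss blows up."""
    model = copy.deepcopy(model)            # don't disturb the real model's weights
    lrs, losses, lr, best = [], [], start_lr, float('inf')
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for xb, yb in train_dl:
        for g in opt.param_groups: g['lr'] = lr
        loss = loss_func(model(xb), yb)
        loss.backward(); opt.step(); opt.zero_grad()
        lrs.append(lr); losses.append(loss.item())
        best = min(best, loss.item())
        if loss.item() > max_factor * best:  # loss is clearly getting worse: we've gone too far
            break
        lr *= multiplier                     # e.g. double the learning rate each mini batch
    # Plot losses against lrs, then pick either the minimum divided by 10
    # or the last point where the loss was clearly decreasing.
    return lrs, losses
```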
@@ -1788,7 +1788,7 @@
 "\n",
 "We want to train a model in such a way that we allow it to remember all of these generally useful ideas from the pretrained model, use them to solve our particular task (classify pet breeds), and only adjust them as required for the specifics of our particular task.\n",
 "\n",
-"Our challenge then when fine tuning is to replace the random weights in our added linear layers with weights that correctly achieve our desired task (classifying pet breeds) without breaking the carefully pretrained weights and the other layers. There is actually a very simple trick to allow this to happen: tell the optimiser to only update the weights in those randomly added final layers. Don't change the weights in the rest of the neural network at all. This is called *freezing* those pretrained layers."
+"Our challenge than when fine tuning is to replace the random weights in our added linear layers with weights that correctly achieve our desired task (classifying pet breeds) without breaking the carefully pretrained weights and the other layers. There is actually a very simple trick to allow this to happen: tell the optimiser to only update the weights in those randomly added final layers. Don't change the weights in the rest of the neural network at all. This is called *freezing* those pretrained layers."
 ]
 },
 {
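A minimal sketch of what freezing looks like in plain PyTorch, assuming torchvision's pretrained `resnet34` (older torchvision versions use `pretrained=True` instead of the `weights` argument); fastai's `Learner.freeze` and `fine_tune` handle the same steps automatically:

```python
import torch
from torch import nn
from torchvision import models

# Start from a pretrained body and replace the final layer with a new, randomly
# initialised head sized for our task (e.g. 37 pet breeds).
model = models.resnet34(weights=models.ResNet34_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 37)

# "Freeze" the pretrained layers: exclude their weights from gradient updates...
for name, p in model.named_parameters():
    p.requires_grad_(name.startswith('fc'))

# ...so the optimiser only ever steps the randomly initialised head.
opt = torch.optim.SGD((p for p in model.parameters() if p.requires_grad), lr=1e-3)
```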