Sylvain Gugger 2020-04-15 05:49:39 -07:00
parent 3fdf8c070b
commit 676ebb9941
3 changed files with 6 additions and 18 deletions

View File

@@ -2508,7 +2508,7 @@
"\n",
"- **Initialize**:: we initialize the parameters to random values. This may sound surprising. There are certainly other choices we could make, such as initialising them to the percentage of times that that pixel is activated for that category. But since we already know that we have a routine to improve these weights, it turns out that just starting with random weights works perfectly well\n",
"- **Loss**:: This is the thing Arthur Samuel referred to: \"*testing the effectiveness of any current weight assignment in terms of actual performance*\". We need some function that will return a number that is small if the performance of the model is good (the standard approach is to treat a small loss as good, and a large loss as bad, although this is just a convention)\n",
"- **Step**:: A simple way to figure out whether a weight should be increased a bit, or decreased a bit, would be just to try it. Increase the weight by a small amount, and see if the loss goes up or down. Once you find the correct direction, you could then change that amount by a bit more, and a bit less, until you find an amount which works well. However, this is slow! As we will see, the magic of calculus allows us to directly figure out which direction, and roughly how much, to change each weight, without having to try all these small changes. The way to do this is by calculating *gradients*. This is just a performance optimisation. We would get exactly the same results by using the slower manual process as well\n",
"- **Step**:: A simple way to figure out whether a weight should be increased a bit, or decreased a bit, would be just to try it. Increase the weight by a small amount, and see if the loss goes up or down. Once you find the correct direction, you could then change that amount by a bit more, and a bit less, until you find an amount which works well. However, this is slow! As we will see, the magic of calculus allows us to directly figure out which direction, and roughly how much, to change each weight, without having to try all these small changes. The way to do this is by calculating *gradients*. This is just a performance optimisation, we would get exactly the same results by using the slower manual process as well\n",
"- **Stop**:: We have already discussed how to choose how many epochs to train a model for. This is where that decision is applied. For our digit classifier, we would keep training until the accuracy of the model started getting worse, or we ran out of time."
]
},
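The bullet points in the hunk above describe the four moving parts of training: initialize the weights randomly, measure performance with a loss, step the weights using gradients, and stop after some number of epochs. The sketch below is not part of the notebook or of this commit; it is a minimal illustration of those four parts, with an invented one-parameter model and a made-up quadratic loss.

    import torch

    # Initialize: start from a random parameter value.
    w = torch.randn(1, requires_grad=True)

    for epoch in range(20):                # Stop: a fixed budget of epochs for the sketch
        loss = ((w - 3.0) ** 2).sum()      # Loss: small when the parameter is good
        loss.backward()                    # gradients via calculus, not trial and error
        with torch.no_grad():
            w -= 0.1 * w.grad              # Step: move against the gradient
            w.grad.zero_()

In the notebook's digit classifier the loss is computed on real predictions and labels, and training stops when validation accuracy stops improving rather than after a fixed count.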
@@ -3505,7 +3505,7 @@
"source": [
"To summarize, at the beginning, the weights of our model can be random (training *from scratch*) or come from a pretrained model (*transfer learning*). In the first case, the output we will get from our inputs won't have anything to do with what we want, and even in the second case, it's very likely the pretrained model won't be very good at the specific task we are targeting. So the model will need to *learn* better weights.\n",
"\n",
"To do this, we will compare the outputs the model gives us with our targets (we have labelled data, so we know what result the model should give) using a *loss function*, which returns a number that needs to be as low as possible. Our weights need to be improved. To do this, we take a few data items (such as images) and feed them to our model. After going through our model, we compare the corresponding targets using our loss function. The score we get tells us how wrong our predictions were, and we will change the weights a little bit to make it slightly better.\n",
"To do this, we will compare the outputs the model gives us with our targets (we have labelled data, so we know what result the model should give) using a *loss function*, which returns a number that needs to be as low as possible. Our weights need to be improved. To do this, we take a few data items (such as images) that and feed them to our model. After going through our model, we compare the corresponding targets using our loss function. The score we get tells us how wrong our predictions were, and we will change the weights a little bit to make it slightly better.\n",
"\n",
"To find how to change the weights to make the loss a bit better, we use calculus to calculate the *gradient*. (Actually, we let PyTorch do it for us!) Let's imagine you are lost in the mountains with your car parked at the lowest point. To find your way, you might wander in a random direction but that probably won't help much. Since you know your vehicle is at the lowest point, you would be better to go downhill. By always taking a step in the direction of the steepest downward slope, you should eventually arrive at your destination. We use the magnitude of the gradient (i.e., the steepness of the slope) to tell us how big a step to take; specifically, we multiply the gradient by a number we choose called the *learning rate* to decide on the step size."
]
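The paragraph in this hunk summarizes the update rule: run a few labelled items through the model, score the predictions with the loss function, let PyTorch compute the gradient, and take a step whose size is the gradient scaled by the learning rate. A compact illustration of that last sentence, again not taken from the notebook (the tiny linear model, the mean-squared-error loss, and the 0.01 learning rate are all invented for the example):

    import torch

    xb = torch.randn(4, 3)                    # a few data items (a mini-batch)
    yb = torch.randn(4, 1)                    # their labelled targets
    w  = torch.randn(3, 1, requires_grad=True)

    preds = xb @ w                            # feed the batch through a (very small) model
    loss = ((preds - yb) ** 2).mean()         # compare predictions with targets
    loss.backward()                           # PyTorch computes the gradient for us

    lr = 0.01                                 # the learning rate we choose
    with torch.no_grad():
        w -= lr * w.grad                      # step size = learning rate * gradient
        w.grad.zero_()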
@@ -5760,18 +5760,6 @@
"display_name": "Python 3",
"language": "python",
"name": "python3"
-},
-"language_info": {
-"codemirror_mode": {
-"name": "ipython",
-"version": 3
-},
-"file_extension": ".py",
-"mimetype": "text/x-python",
-"name": "python",
-"nbconvert_exporter": "python",
-"pygments_lexer": "ipython3",
-"version": "3.7.6"
}
},
"nbformat": 4,

View File

@@ -1723,7 +1723,7 @@
" self.i_h = nn.Embedding(vocab_sz, n_hidden)\n",
" self.rnn = nn.LSTM(n_hidden, n_hidden, n_layers, batch_first=True)\n",
" self.h_o = nn.Linear(n_hidden, vocab_sz)\n",
" self.h = [torch.zeros(n_layers, bs, n_hidden) for _ in range(1)]\n",
" self.h = [torch.zeros(n_layers, bs, n_hidden) for _ in range(2)]\n",
" \n",
" def forward(self, x):\n",
" res,h = self.rnn(self.i_h(x), self.h)\n",
@@ -2039,7 +2039,7 @@
" self.drop = nn.Dropout(p)\n",
" self.h_o = nn.Linear(n_hidden, vocab_sz)\n",
" self.h_o.weight = self.i_h.weight\n",
" self.h = [torch.zeros(n_layers, bs, n_hidden) for _ in range(1)]\n",
" self.h = [torch.zeros(n_layers, bs, n_hidden) for _ in range(2)]\n",
" \n",
" def forward(self, x):\n",
" raw,h = self.rnn(self.i_h(x), self.h)\n",

View File

@@ -1154,7 +1154,7 @@
" self.i_h = nn.Embedding(vocab_sz, n_hidden)\n",
" self.rnn = nn.LSTM(n_hidden, n_hidden, n_layers, batch_first=True)\n",
" self.h_o = nn.Linear(n_hidden, vocab_sz)\n",
" self.h = [torch.zeros(n_layers, bs, n_hidden) for _ in range(1)]\n",
" self.h = [torch.zeros(n_layers, bs, n_hidden) for _ in range(2)]\n",
" \n",
" def forward(self, x):\n",
" res,h = self.rnn(self.i_h(x), self.h)\n",
@@ -1362,7 +1362,7 @@
" self.drop = nn.Dropout(p)\n",
" self.h_o = nn.Linear(n_hidden, vocab_sz)\n",
" self.h_o.weight = self.i_h.weight\n",
" self.h = [torch.zeros(n_layers, bs, n_hidden) for _ in range(1)]\n",
" self.h = [torch.zeros(n_layers, bs, n_hidden) for _ in range(2)]\n",
" \n",
" def forward(self, x):\n",
" raw,h = self.rnn(self.i_h(x), self.h)\n",