Update 16_accel_sgd.ipynb
This commit is contained in:
parent 62ac21d085
commit c96681f486
@@ -30,7 +30,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"You now know how to create state-of-the-art architectures for computer vision, natural image processing, tabular analysis, and collaborative filtering, and you know how to train them quickly. So we're done, right? Not quite yet. We still have to explorea little bit more the training process.\n",
+"You now know how to create state-of-the-art architectures for computer vision, natural image processing, tabular analysis, and collaborative filtering, and you know how to train them quickly. So we're done, right? Not quite yet. We still have to explore a little bit more the training process.\n",
 "\n",
 "We explained in <<chapter_mnist_basics>> the basis of stochastic gradient descent: pass a mini-batch to the model, compare it to our target with the loss function, then compute the gradients of this loss function with regard to each weight before updating the weights with the formula:\n",
 "\n",
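The weight-update formula referenced in this hunk's markdown is the plain SGD step. As a point of reference, here is a minimal PyTorch sketch of that step; the tiny model, random mini-batch, and `lr` value are illustrative assumptions, not taken from the notebook:

```python
import torch

# Illustrative setup: a tiny model and one random mini-batch (not the book's code).
model = torch.nn.Linear(2, 1)
x, y = torch.randn(8, 2), torch.randn(8, 1)
lr = 0.1  # assumed learning rate, for illustration only

loss = torch.nn.functional.mse_loss(model(x), y)  # compare predictions to the target
loss.backward()                                   # gradients of the loss w.r.t. each weight
with torch.no_grad():
    for p in model.parameters():
        p -= lr * p.grad    # weight = weight - lr * weight.grad
        p.grad.zero_()      # reset gradients before the next mini-batch
```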
@@ -456,7 +456,7 @@
 "\n",
 "Here `beta` is some number we choose which defines how much momentum to use. If `beta` is 0, then the first equation becomes `weight.avg = weight.grad`, so we end up with plain SGD. But if it's a number close to 1, then the main direction chosen is an average of the previous steps. (If you have done a bit of statistics, you may recognize in the first equation an *exponentially weighted moving average*, which is very often used to denoise data and get the underlying tendency.)\n",
 "\n",
-"Note that we are writing `weight.avg` to highlight the fact that we need to store the moving averages for each parameter of the model (they all their own independent moving averages).\n",
+"Note that we are writing `weight.avg` to highlight the fact that we need to store the moving averages for each parameter of the model (they all have their own independent moving averages).\n",
 "\n",
 "<<img_momentum>> shows an example of noisy data for a single parameter, with the momentum curve plotted in red, and the gradients of the parameter plotted in blue. The gradients increase, then decrease, and the momentum does a good job of following the general trend without getting too influenced by noise."
 ]
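The "first equation" this hunk refers to is an exponentially weighted moving average of the gradients. A short NumPy sketch of that averaging; the fake gradient sequence and the value of `beta` are assumptions for illustration:

```python
import numpy as np

beta = 0.9                                                     # illustrative momentum value
grads = np.linspace(1, -1, 200) + np.random.randn(200) * 0.3   # fake noisy gradients
avg, history = 0.0, []
for g in grads:
    avg = beta * avg + (1 - beta) * g   # exponentially weighted moving average
    history.append(avg)
# `history` follows the underlying trend of `grads` while smoothing the noise,
# which is what the per-parameter momentum buffer (`weight.avg` in the text) stores.
```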
@@ -532,7 +532,7 @@
 "#hide_input\n",
 "#id img_betas\n",
 "#caption Momentum with different beta values\n",
-"#alt Graph showing how the beta value imfluence momentum\n",
+"#alt Graph showing how the beta value influences momentum\n",
 "x = np.linspace(-4, 4, 100)\n",
 "y = 1 - (x/3) ** 2\n",
 "x1 = x + np.random.randn(100) * 0.1\n",
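The code cell in this hunk is truncated after the noisy `x1` values. For context, a hedged sketch of how a figure like <<img_betas>> ("Momentum with different beta values") can be produced; the chosen beta values and plotting details below are assumptions, not the notebook's exact code:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-4, 4, 100)
y = 1 - (x/3) ** 2
x1 = x + np.random.randn(100) * 0.1
y1 = y + np.random.randn(100) * 0.1          # noisy observations of the curve

betas = [0.5, 0.7, 0.9, 0.99]                # assumed values, for illustration
fig, axs = plt.subplots(2, 2, figsize=(8, 6))
for beta, ax in zip(betas, axs.flatten()):
    avg, res = 0.0, []
    for v in y1:
        avg = beta * avg + (1 - beta) * v    # exponentially weighted moving average
        res.append(avg)
    ax.scatter(x1, y1, alpha=0.3)            # noisy data points
    ax.plot(x, res, color='red')             # momentum curve
    ax.set_title(f'beta={beta}')
plt.show()
```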
@@ -1107,7 +1107,7 @@
 "- `model`:: The model used for training/validation.\n",
 "- `data`:: The underlying `DataLoaders`.\n",
 "- `loss_func`:: The loss function used.\n",
-"- `opt`:: The optimizer used to udpate the model parameters.\n",
+"- `opt`:: The optimizer used to update the model parameters.\n",
 "- `opt_func`:: The function used to create the optimizer.\n",
 "- `cbs`:: The list containing all the `Callback`s.\n",
 "- `dl`:: The current `DataLoader` used for iteration.\n",
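The attributes listed in this hunk are the ones a `Callback` can read from the `Learner`. A minimal sketch of a callback using a few of them; the class name and the printed values are illustrative, and the attribute access relies on fastai's usual delegation from a callback to its `Learner`:

```python
from fastai.callback.core import Callback

class InspectLearnerCallback(Callback):
    "Illustrative callback that reads a few of the Learner attributes listed above."
    def before_fit(self):
        print(type(self.model).__name__)   # the model used for training/validation
        print(type(self.opt).__name__)     # the optimizer updating the parameters
        print(len(self.cbs))               # all registered Callbacks

    def before_batch(self):
        print(len(self.dl))                # the DataLoader currently being iterated
```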