Grammar/word fixes in chapters 1, 4, 5, 6, 9, 11 (#348)
* Update 04_mnist_basics.ipynb: change 'turn to be' to 'turn out to be'
* Update 04_mnist_basics.ipynb: change 'linear layout' to 'linear layer'
* Update 04_mnist_basics.ipynb: add comma
* Update 05_pet_breeds.ipynb: change 'using to' to 'using it to'
* Update 05_pet_breeds.ipynb: change 'probem' to 'problem'
* Update 05_pet_breeds.ipynb: change "train the model all" to "train the model at all"
* Fix typo: change "same people appears" to "same people appear"
* Fix typo: change 'tfrom' to 'from'
* Make text match code: text said 2e-2 but code has 1e-2
* Fix redundant words: change "on your about your" to "on your"
* Fix grammar: change "inputs images" to "input images"
* Fix grammar: change "updating weight" to "updating weights"
* Fix grammar
* Fix grammar
* Fix newline added mistakenly
This commit is contained in:
parent
1cf5630742
commit
896ff67eed
@@ -1307,7 +1307,7 @@
 "\n",
 "What we would like is some kind of function that is so flexible that it could be used to solve any given problem, just by varying its weights. Amazingly enough, this function actually exists! It's the neural network, which we already discussed. That is, if you regard a neural network as a mathematical function, it turns out to be a function which is extremely flexible depending on its weights. A mathematical proof called the *universal approximation theorem* shows that this function can solve any problem to any level of accuracy, in theory. The fact that neural networks are so flexible means that, in practice, they are often a suitable kind of model, and you can focus your effort on the process of training them—that is, of finding good weight assignments.\n",
 "\n",
-"But what about that process? One could imagine that you might need to find a new \"mechanism\" for automatically updating weight for every problem. This would be laborious. What we'd like here as well is a completely general way to update the weights of a neural network, to make it improve at any given task. Conveniently, this also exists!\n",
+"But what about that process? One could imagine that you might need to find a new \"mechanism\" for automatically updating weights for every problem. This would be laborious. What we'd like here as well is a completely general way to update the weights of a neural network, to make it improve at any given task. Conveniently, this also exists!\n",
 "\n",
 "This is called *stochastic gradient descent* (SGD). We'll see how neural networks and SGD work in detail in <<chapter_mnist_basics>>, as well as explaining the universal approximation theorem. For now, however, we will instead use Samuel's own words: *We need not go into the details of such a procedure to see that it could be made entirely automatic and to see that a machine so programmed would \"learn\" from its experience.*"
 ]
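Editor's note: the passage above leaves SGD as a promise for <<chapter_mnist_basics>>. As a reader aid, here is a minimal sketch of what "a completely general way to update the weights" looks like in PyTorch; the weights, target, loss, and learning rate below are toy values chosen for illustration, not anything from the book.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)   # toy "weights"
target = torch.tensor([3.0, -1.0])                  # toy ideal values

def loss_fn(w):
    return ((w - target) ** 2).sum()                # toy loss: distance from the target

lr = 0.1
for _ in range(100):
    loss = loss_fn(w)
    loss.backward()                 # gradients say how to nudge each weight
    with torch.no_grad():
        w -= lr * w.grad            # the general update rule: step against the gradient
        w.grad.zero_()              # clear gradients before the next step

print(w)   # ends up close to tensor([ 3., -1.])
```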
@@ -5202,14 +5202,14 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"> J: There is an enormous amount of jargon in deep learning, including terms like _rectified linear unit_. The vast vast majority of this jargon is no more complicated than can be implemented in a short line of code, as we saw in this example. The reality is that for academics to get their papers published they need to make them sound as impressive and sophisticated as possible. One of the ways that they do that is to introduce jargon. Unfortunately, this has the result that the field ends up becoming far more intimidating and difficult to get into than it should be. You do have to learn the jargon, because otherwise papers and tutorials are not going to mean much to you. But that doesn't mean you have to find the jargon intimidating. Just remember, when you come across a word or phrase that you haven't seen before, it will almost certainly turn to be referring to a very simple concept."
+"> J: There is an enormous amount of jargon in deep learning, including terms like _rectified linear unit_. The vast vast majority of this jargon is no more complicated than can be implemented in a short line of code, as we saw in this example. The reality is that for academics to get their papers published they need to make them sound as impressive and sophisticated as possible. One of the ways that they do that is to introduce jargon. Unfortunately, this has the result that the field ends up becoming far more intimidating and difficult to get into than it should be. You do have to learn the jargon, because otherwise papers and tutorials are not going to mean much to you. But that doesn't mean you have to find the jargon intimidating. Just remember, when you come across a word or phrase that you haven't seen before, it will almost certainly turn out to be referring to a very simple concept."
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"The basic idea is that by using more linear layers, we can have our model do more computation, and therefore model more complex functions. But there's no point just putting one linear layout directly after another one, because when we multiply things together and then add them up multiple times, that could be replaced by multiplying different things together and adding them up just once! That is to say, a series of any number of linear layers in a row can be replaced with a single linear layer with a different set of parameters.\n",
+"The basic idea is that by using more linear layers, we can have our model do more computation, and therefore model more complex functions. But there's no point just putting one linear layer directly after another one, because when we multiply things together and then add them up multiple times, that could be replaced by multiplying different things together and adding them up just once! That is to say, a series of any number of linear layers in a row can be replaced with a single linear layer with a different set of parameters.\n",
 "\n",
 "But if we put a nonlinear function between them, such as `max`, then this is no longer true. Now each linear layer is actually somewhat decoupled from the other ones, and can do its own useful work. The `max` function is particularly interesting, because it operates as a simple `if` statement."
 ]
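Editor's note: since this hunk touches the "no point stacking linear layers" passage, a small sketch (toy sizes of my own, not the notebook's code) makes both claims concrete: two linear layers collapse into one, and putting `max(x, 0)` between them breaks that equivalence.

```python
import torch
import torch.nn as nn

x = torch.randn(5, 10)
l1, l2 = nn.Linear(10, 20), nn.Linear(20, 3)

# A linear layer after a linear layer is itself just a linear layer:
W = l2.weight @ l1.weight                  # combined weight matrix
b = l2.weight @ l1.bias + l2.bias          # combined bias
print(torch.allclose(l2(l1(x)), x @ W.t() + b, atol=1e-5))   # True

# With a nonlinearity in between (max(x, 0), i.e. a rectified linear unit),
# no single linear layer can reproduce the computation any more:
y = l2(torch.clamp(l1(x), min=0))
```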
@@ -5644,7 +5644,7 @@
 "1. A function that can solve any problem to any level of accuracy (the neural network) given the correct set of parameters\n",
 "1. A way to find the best set of parameters for any function (stochastic gradient descent)\n",
 "\n",
-"This is why deep learning can do things which seem rather magical such fantastic things. Believing that this combination of simple techniques can really solve any problem is one of the biggest steps that we find many students have to take. It seems too good to be true—surely things should be more difficult and complicated than this? Our recommendation: try it out! We just tried it on the MNIST dataset and you have seen the results. And since we are doing everything from scratch ourselves (except for calculating the gradients) you know that there is no special magic hiding behind the scenes."
+"This is why deep learning can do things which seem rather magical, such fantastic things. Believing that this combination of simple techniques can really solve any problem is one of the biggest steps that we find many students have to take. It seems too good to be true—surely things should be more difficult and complicated than this? Our recommendation: try it out! We just tried it on the MNIST dataset and you have seen the results. And since we are doing everything from scratch ourselves (except for calculating the gradients) you know that there is no special magic hiding behind the scenes."
 ]
 },
 {
@@ -530,7 +530,7 @@
 "source": [
 "You can see exactly how we gathered the data and split it, how we went from a filename to a *sample* (the tuple (image, category)), then what item transforms were applied and how it failed to collate those samples in a batch (because of the different shapes). \n",
 "\n",
-"Once you think your data looks right, we generally recommend the next step should be using to train a simple model. We often see people put off the training of an actual model for far too long. As a result, they don't actually find out what their baseline results look like. Perhaps your probem doesn't need lots of fancy domain-specific engineering. Or perhaps the data doesn't seem to train the model all. These are things that you want to know as soon as possible. For this initial test, we'll use the same simple model that we used in <<chapter_intro>>:"
+"Once you think your data looks right, we generally recommend the next step should be using it to train a simple model. We often see people put off the training of an actual model for far too long. As a result, they don't actually find out what their baseline results look like. Perhaps your problem doesn't need lots of fancy domain-specific engineering. Or perhaps the data doesn't seem to train the model at all. These are things that you want to know as soon as possible. For this initial test, we'll use the same simple model that we used in <<chapter_intro>>:"
 ]
 },
 {
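Editor's note: to keep the "train a simple model early" advice actionable, here is a sketch of what such a baseline might look like; it assumes a fastai 2.7+ install and that the `DataLoaders` built above is named `dls` (both assumptions — the notebook's own cell follows this text).

```python
from fastai.vision.all import *

# Quick baseline: a pretrained resnet, fine-tuned for a couple of epochs.
# `dls` is assumed to be the DataLoaders assembled in the preceding cells.
learn = vision_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(2)
```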
@@ -1638,9 +1638,9 @@
 "source": [
 "We can pass this function to `DataBlock` as `get_y`, since it is responsible for labeling each item. We'll resize the images to half their input size, just to speed up training a bit.\n",
 "\n",
-"One important point to note is that we should not just use a random splitter. The reason for this is that the same people appears in multiple images in this dataset, but we want to ensure that our model can generalize to people that it hasn't seen yet. Each folder in the dataset contains the images for one person. Therefore, we can create a splitter function that returns true for just one person, resulting in a validation set containing just that person's images.\n",
+"One important point to note is that we should not just use a random splitter. The reason for this is that the same people appear in multiple images in this dataset, but we want to ensure that our model can generalize to people that it hasn't seen yet. Each folder in the dataset contains the images for one person. Therefore, we can create a splitter function that returns true for just one person, resulting in a validation set containing just that person's images.\n",
 "\n",
-"The only other difference tfrom the previous data block examples is that the second block is a `PointBlock`. This is necessary so that fastai knows that the labels represent coordinates; that way, it knows that when doing data augmentation, it should do the same augmentation to these coordinates as it does to the images:"
+"The only other difference from the previous data block examples is that the second block is a `PointBlock`. This is necessary so that fastai knows that the labels represent coordinates; that way, it knows that when doing data augmentation, it should do the same augmentation to these coordinates as it does to the images:"
 ]
 },
 {
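Editor's note: because this hunk is about the splitter and the `PointBlock`, a sketch of the kind of `DataBlock` being described may help. The labeling function, the held-out person name, and the sizes below are placeholders, not the notebook's actual values.

```python
from fastai.vision.all import *

def get_ctr(f):
    # placeholder: in the real notebook this would read the centre point
    # belonging to image file `f`; here it just returns a fixed dummy point
    return tensor([160.0, 120.0])

biwi = DataBlock(
    blocks=(ImageBlock, PointBlock),      # PointBlock: labels are coordinates
    get_items=get_image_files,
    get_y=get_ctr,
    splitter=FuncSplitter(lambda o: o.parent.name == '13'),  # one person -> validation set
    batch_tfms=aug_transforms(size=(120, 160)),  # same augmentation applied to the points
)
```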
@@ -1934,7 +1934,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"We'll try an LR of 2e-2:"
+"We'll try an LR of 1e-2:"
 ]
 },
 {
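Editor's note: the training call this sentence introduces is of the form below (epoch count illustrative; `learn` is assumed from the preceding cells).

```python
# Assuming `learn` was created in the cells above; 3 epochs is illustrative.
learn.fine_tune(3, 1e-2)
```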
@@ -2106,7 +2106,7 @@
 "source": [
 "In problems that are at first glance completely different (single-label classification, multi-label classification, and regression), we end up using the same model with just different numbers of outputs. The loss function is the one thing that changes, which is why it's important to double-check that you are using the right loss function for your problem.\n",
 "\n",
-"fastai will automatically try to pick the right one from the data you built, but if you are using pure PyTorch to build your `DataLoader`s, make sure you think hard when you have to decide on your about your choice of loss function, and remember that you most probably want:\n",
+"fastai will automatically try to pick the right one from the data you built, but if you are using pure PyTorch to build your `DataLoader`s, make sure you think hard when you have to decide on your choice of loss function, and remember that you most probably want:\n",
 "\n",
 "- `nn.CrossEntropyLoss` for single-label classification\n",
 "- `nn.BCEWithLogitsLoss` for multi-label classification\n",
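Editor's note: a short sketch of that mapping for the pure-PyTorch case the text warns about. The hunk is cut off after the second bullet; the regression line below is my own addition for completeness, not part of this diff.

```python
import torch.nn as nn

single_label_loss = nn.CrossEntropyLoss()    # single-label classification
multi_label_loss  = nn.BCEWithLogitsLoss()   # multi-label classification
regression_loss   = nn.MSELoss()             # regression (assumed; not shown in this hunk)

# With plain PyTorch DataLoaders you would pass the choice explicitly, e.g.:
# learn = Learner(dls, model, loss_func=multi_label_loss)
```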
@@ -2140,7 +2140,7 @@
 "1. When is it okay to tune a hyperparameter on the validation set?\n",
 "1. How is `y_range` implemented in fastai? (See if you can implement it yourself and test it without peeking!)\n",
 "1. What is a regression problem? What loss function should you use for such a problem?\n",
-"1. What do you need to do to make sure the fastai library applies the same data augmentation to your inputs images and your target point coordinates?"
+"1. What do you need to do to make sure the fastai library applies the same data augmentation to your input images and your target point coordinates?"
 ]
 },
 {
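Editor's note: if you want to check your attempt at the `y_range` question after trying it yourself, one way to implement it is essentially a scaled sigmoid. This is a sketch, not copied from the library source.

```python
import torch

def sigmoid_range(x, lo, hi):
    # squash activations into (lo, hi): sigmoid gives (0, 1), then rescale
    return torch.sigmoid(x) * (hi - lo) + lo

print(sigmoid_range(torch.tensor([-10.0, 0.0, 10.0]), 0, 5))   # roughly [0.00, 2.50, 5.00]
```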
@@ -168,7 +168,7 @@
 "\n",
 "Another benefit is that we can combine our continuous embedding values with truly continuous input data in a straightforward manner: we just concatenate the variables, and feed the concatenation into our first dense layer. In other words, the raw categorical data is transformed by an embedding layer before it interacts with the raw continuous input data. This is how fastai and Guo and Berkhahn handle tabular models containing continuous and categorical variables.\n",
 "\n",
-"An example using this concatenation approach is how Google does it recommendations on Google Play, as explained in the paper [\"Wide & Deep Learning for Recommender Systems\"](https://arxiv.org/abs/1606.07792). <<google_recsys>> illustrates."
+"An example using this concatenation approach is how Google does its recommendations on Google Play, as explained in the paper [\"Wide & Deep Learning for Recommender Systems\"](https://arxiv.org/abs/1606.07792). <<google_recsys>> illustrates."
 ]
 },
 {
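Editor's note: a toy sketch of the concatenation described above, with sizes invented for illustration — the categorical input goes through an embedding, and the embedding output is concatenated with the continuous inputs before the first dense layer.

```python
import torch
import torch.nn as nn

emb = nn.Embedding(num_embeddings=10, embedding_dim=4)   # one categorical variable with 10 levels
dense = nn.Linear(4 + 3, 8)                              # embedding output + 3 continuous inputs

cat_x = torch.tensor([2, 7, 0])        # a batch of 3 category indices
cont_x = torch.randn(3, 3)             # the matching continuous features

x = torch.cat([emb(cat_x), cont_x], dim=1)   # concatenate along the feature dimension
out = torch.relu(dense(x))                   # fed into the first dense layer
print(out.shape)                             # torch.Size([3, 8])
```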
@@ -4490,7 +4490,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"This shows a chart of the distribution of the data for each split point. We can clearly see that there's a problem with our `YearMade` data: there are bulldozers made in the year 1000, apparently! Presumably this is actually just a missing value code (a value that doesn't otherwise appear in the data and that is used as a placeholder in cases where a value is missing). For modeling purposes, 1000 is fine, but as you can see this outlier makes visualization the values we are interested in more difficult. So, let's replace it with 1950:"
+"This shows a chart of the distribution of the data for each split point. We can clearly see that there's a problem with our `YearMade` data: there are bulldozers made in the year 1000, apparently! Presumably this is actually just a missing value code (a value that doesn't otherwise appear in the data and that is used as a placeholder in cases where a value is missing). For modeling purposes, 1000 is fine, but as you can see this outlier makes visualization of the values we are interested in more difficult. So, let's replace it with 1950:"
 ]
 },
 {
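Editor's note: the replacement itself is a pandas one-liner; here is a self-contained toy version (the notebook applies it to its own bulldozers table, whose variable name may differ).

```python
import pandas as pd

df = pd.DataFrame({'YearMade': [1000, 1998, 2004]})   # toy stand-in for the real table
df.loc[df['YearMade'] < 1900, 'YearMade'] = 1950      # treat the missing-value code as 1950
print(df['YearMade'].tolist())                        # [1950, 1998, 2004]
```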
@@ -7736,7 +7736,7 @@
 "source": [
 "One of the most important properties of random forests is that they aren't very sensitive to the hyperparameter choices, such as `max_features`. You can set `n_estimators` to as high a number as you have time to train—the more trees you have, the more accurate the model will be. `max_samples` can often be left at its default, unless you have over 200,000 data points, in which case setting it to 200,000 will make it train faster with little impact on accuracy. `max_features=0.5` and `min_samples_leaf=4` both tend to work well, although sklearn's defaults work well too.\n",
 "\n",
-"The sklearn docs [show an example](http://scikit-learn.org/stable/auto_examples/ensemble/plot_ensemble_oob.html) of the effects different `max_features` choices, with increasing numbers of trees. In the plot, the blue plot line uses the fewest features and the green line uses the most (it uses all the features). As you can see in <<max_features>>, the models with the lowest error result from using a subset of features but with a larger number of trees."
+"The sklearn docs [show an example](http://scikit-learn.org/stable/auto_examples/ensemble/plot_ensemble_oob.html) of the effects of different `max_features` choices, with increasing numbers of trees. In the plot, the blue plot line uses the fewest features and the green line uses the most (it uses all the features). As you can see in <<max_features>>, the models with the lowest error result from using a subset of features but with a larger number of trees."
 ]
 },
 {
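Editor's note: spelled out as scikit-learn arguments, the advice above looks roughly like this (a sketch; `xs` and `y` stand for whatever feature matrix and target you have prepared, and `max_samples` only applies if you have that many rows).

```python
from sklearn.ensemble import RandomForestRegressor

rf = RandomForestRegressor(
    n_jobs=-1,
    n_estimators=100,       # more trees -> better accuracy, longer training
    max_samples=200_000,    # cap rows per tree on very large datasets
    max_features=0.5,       # each split considers half of the features
    min_samples_leaf=4,     # stop splitting once leaves get this small
)
# rf.fit(xs, y)   # then check accuracy on your validation set
```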
@@ -8009,7 +8009,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"It's not normally enough to just to know that a model can make accurate predictions—we also want to know *how* it's making predictions. *feature importance* gives us insight into this. We can get these directly from sklearn's random forest by looking in the `feature_importances_` attribute. Here's a simple function we can use to pop them into a DataFrame and sort them:"
+"It's not normally enough just to know that a model can make accurate predictions—we also want to know *how* it's making predictions. *feature importance* gives us insight into this. We can get these directly from sklearn's random forest by looking in the `feature_importances_` attribute. Here's a simple function we can use to pop them into a DataFrame and sort them:"
 ]
 },
 {
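Editor's note: the helper the text mentions is short enough to sketch here; the notebook's own version may differ in details such as the function name.

```python
import pandas as pd

def rf_feat_importance(m, df):
    # pair each column with the fitted forest's importance score and sort,
    # so the most influential features come first
    return (pd.DataFrame({'cols': df.columns, 'imp': m.feature_importances_})
              .sort_values('imp', ascending=False))

# usage: rf_feat_importance(rf, xs).head(10)
```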
@@ -9049,7 +9049,7 @@
 "source": [
 "We have a big problem! Our predictions outside of the domain that our training data covered are all too low. Why do you suppose this is?\n",
 "\n",
-"Remember, a random forest just averages the predictions of a number of trees. And a tree simply predicts the average value of the rows in a leaf. Therefore, a tree and a random forest can never predict values outside of the range of the training data. This is particularly problematic for data where there is a trend over time, such as inflation, and you wish to make predictions for a future time.. Your predictions will be systematically too low.\n",
+"Remember, a random forest just averages the predictions of a number of trees. And a tree simply predicts the average value of the rows in a leaf. Therefore, a tree and a random forest can never predict values outside of the range of the training data. This is particularly problematic for data where there is a trend over time, such as inflation, and you wish to make predictions for a future time. Your predictions will be systematically too low.\n",
 "\n",
 "But the problem extends beyond time variables. Random forests are not able to extrapolate outside of the types of data they have seen, in a more general sense. That's why we need to make sure our validation set does not contain out-of-domain data."
 ]
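Editor's note: the extrapolation point is easy to see on synthetic data. This tiny illustration (entirely made up here) trains on an upward trend and then asks for predictions beyond the training range.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
x = np.arange(40, dtype=float).reshape(-1, 1)
y = x.ravel() + rng.normal(scale=0.5, size=40)        # a simple upward trend

rf = RandomForestRegressor(n_estimators=40, random_state=0).fit(x[:30], y[:30])
print(rf.predict(x[30:]))   # predictions plateau near the training maximum (~29), not ~30-39
```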
@@ -9836,7 +9836,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"In fact, this result is better than any score shown on the Kaggle leaderboard. It's not directly comparable, however, because the Kaggle leaderboard uses a separate dataset that we do not have access to. Kaggle does not allow us to submit to this old competition to find out how we would done, but our results certainly look very encouraging!"
+"In fact, this result is better than any score shown on the Kaggle leaderboard. It's not directly comparable, however, because the Kaggle leaderboard uses a separate dataset that we do not have access to. Kaggle does not allow us to submit to this old competition to find out how we would have done, but our results certainly look very encouraging!"
 ]
 },
 {
@@ -914,7 +914,7 @@
 "source": [
 "But now, you know how to customize every single piece of it!\n",
 "\n",
-"Let's practice what we just learned on this mid-level API for data preprocessing about using a computer vision example now."
+"Let's practice what we just learned about this mid-level API for data preprocessing, using a computer vision example now."
 ]
 },
 {
@@ -1152,7 +1152,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"In the mid-level API for data collection we have two objects that can help us apply transforms on a set of items, `TfmdLists` and `Datasets`. If you remember what we have just seen, one applies a `Pipeline` of transforms and the other applies several `Pipeline` of transforms in parallel, to build tuples. Here, our main transform already builds the tuples, so we use `TfmdLists`:"
+"In the mid-level API for data collection we have two objects that can help us apply transforms on a set of items, `TfmdLists` and `Datasets`. If you remember what we have just seen, one applies a `Pipeline` of transforms and the other applies several `Pipeline`s of transforms in parallel, to build tuples. Here, our main transform already builds the tuples, so we use `TfmdLists`:"
 ]
 },
 {
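Editor's note: a toy illustration of the distinction (throwaway transforms of my own, not the chapter's): `TfmdLists` runs one `Pipeline` in sequence, while `Datasets` runs several in parallel and returns a tuple per item.

```python
from fastai.data.core import TfmdLists, Datasets

def double(o): return o * 2
def add_one(o): return o + 1

items = [0, 1, 2, 3]
tls = TfmdLists(items, [double, add_one])        # one Pipeline: double, then add one
dsets = Datasets(items, [[double], [add_one]])   # two parallel Pipelines -> tuples

print(tls[1])     # 3
print(dsets[1])   # (2, 2)
```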
@@ -1307,4 +1307,4 @@
 },
 "nbformat": 4,
 "nbformat_minor": 2
-}
+}