Merge pull request #135 from philtrade/master

[11_midlevel_data] Suggested edits for clarity and brevity.
This commit is contained in:
Sylvain Gugger 2020-04-23 08:34:08 -04:00 committed by GitHub
commit 787ceaf44c
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23

View File

@ -340,7 +340,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"If you need either `setup` or `decode`, you will need to subclass `Transform`. When writing this subclass, you need to implement the actual function in `encodes`, then (optionally), the setup behavior in `setups` and the decoding behavior in `decodes`:"
"If you need either `setup` or `decode`, you will need to subclass `Transform` to implement the actual encoding behavior in `encodes`, then (optionally), the setup behavior in `setups` and the decoding behavior in `decodes`:"
]
},
{
@ -647,7 +647,7 @@
"source": [
"If you have manually written a `Transform` that returns your whole data (input and target) from the raw items you had, then `TfmdLists` is the class you need. You can directly convert it to a `DataLoaders` object with the `dataloaders` method. This is what we will do in our Siamese example further in this chapter.\n",
"\n",
"In general though, you have two (or more) parallel pipelines of transforms: one for processing your raw items into inputs and one to process your raw items into targets. For instance, here, the pipeline we defined only processes the input. If we want to do text classification, we have to process the labels as well. \n",
"In general though, you have two (or more) parallel pipelines of transforms: one for processing your raw items into inputs and one to process your raw items into targets. For instance, here, the pipeline we defined only processes the raw text into inputs. If we want to do text classification, we have to process the labels as well, into targets. \n",
"\n",
"Here we need to do two things: first take the label name from the parent folder. There is a function `parent_label` for this:"
]
@ -876,7 +876,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The two differences with what we had above is the use of `GrandParentSplitter` to split our training and validation data, and the `dl_type` argument. This is to tell `dataloaders` to use the `SortedDL` class of `DataLoader`, and not the usual one. This is the class that will handle the construction of batches by putting samples of roughly the same lengths into batches.\n",
"The two differences with what we had above is the use of `GrandParentSplitter` to split our training and validation data, and the `dl_type` argument. This is to tell `dataloaders` to use the `SortedDL` class of `DataLoader`, and not the usual one. `SortedDL` constructs batches by putting samples of roughly the same lengths into batches.\n",
"\n",
"This does the exact same thing as our `DataBlock` from above:"
]