Update 10_nlp.ipynb (#291)

fix typo: replaced "preforms" with "it performs"
ricardocalleja 2020-10-12 05:59:57 -03:00 committed by GitHub
parent 69548f0e11
commit 717535fe3b
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23

@@ -1320,7 +1320,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
-"One thing that's different to previous types we've used in `DataBlock` is that we're not just using the class directly (i.e., `TextBlock(...)`, but instead are calling a *class method*. A class method is a Python method that, as the name suggests, belongs to a *class* rather than an *object*. (Be sure to search online for more information about class methods if you're not familiar with them, since they're commonly used in many Python libraries and applications; we've used them a few times previously in the book, but haven't called attention to them.) The reason that `TextBlock` is special is that setting up the numericalizer's vocab can take a long time (we have to read and tokenize every document to get the vocab). To be as efficient as possible preforms a few optimizations: \n",
+"One thing that's different to previous types we've used in `DataBlock` is that we're not just using the class directly (i.e., `TextBlock(...)`, but instead are calling a *class method*. A class method is a Python method that, as the name suggests, belongs to a *class* rather than an *object*. (Be sure to search online for more information about class methods if you're not familiar with them, since they're commonly used in many Python libraries and applications; we've used them a few times previously in the book, but haven't called attention to them.) The reason that `TextBlock` is special is that setting up the numericalizer's vocab can take a long time (we have to read and tokenize every document to get the vocab). To be as efficient as possible it performs a few optimizations: \n",
"\n",
"- It saves the tokenized documents in a temporary folder, so it doesn't have to tokenize them more than once\n",
"- It runs multiple tokenization processes in parallel, to take advantage of your computer's CPUs\n",