This commit is contained in:
Jeremy Howard 2020-02-29 11:55:03 -08:00
parent 6eb60c897e
commit 931cc6935b
3 changed files with 418 additions and 186 deletions

.gitignore (new file)

@ -0,0 +1 @@
.ipynb_checkpoints/

File diff suppressed because one or more lines are too long


@ -29,7 +29,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The five lines of code we've seen are just one small part of the process of using deep learning in practice. In this section, we're going to use a computer vision example to look at the end-to-end process of creating a deep learning application. More specifically: we're going to build a bear classifier! In the process, we'll discuss the capabilities and constraints of deep learning, learn about how to create datasets, look at possible gotchas when using deep learning in practice, and more."
"The five lines of code we've seen in <<chaptter_intro>> are just one small part of the process of using deep learning in practice. In this chapter, we're going to use a computer vision example to look at the end-to-end process of creating a deep learning application. More specifically: we're going to build a bear classifier! In the process, we'll discuss the capabilities and constraints of deep learning, learn about how to create datasets, look at possible gotchas when using deep learning in practice, and more. Let's start with how you should frame your problem.\n",
"\n",
"TK: the next section title seems a bit inadequate, let's double check"
]
},
{
@ -48,117 +50,6 @@
"The best thing to do is to keep an open mind. If you remain open to the possibility that deep learning might solve part of your problem with less data or complexity than you expect, then it is possible to design a process where you can find the specific capabilities and constraints related to your particular problem as you work through the process. This doesn't mean making any risky bets — we will show you how you can gradually roll out models so that they don't create significant risks, and can even backtest them prior to putting them in production."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### The state of deep learning"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In general, here is a summary of the state of deep learning is at the start of 2020. However, things move very fast, and by the time you read this some of these constraints may no longer exist. We will try to keep the book website up-to-date; in addition, a Google search for \"what can AI do now\" there is likely to provide some up-to-date information."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Computer vision**: there are many domains in which deep learning has not been used to analyse images yet, but those where it has been tried have nearly universally shown that computers can recognise what items are in an image at least as well as people can — even specially trained people, such as radiologists. This is known as *object recognition*. Deep learning is also good at recognizing whereabouts objects in an image are, and can highlight their location and name each found object. This is known as *object detection* (there is also a variant of this we saw in <<chapter_intro>>, where every pixel is categorized based on what kind of object it is part of--this is called *segmentation*). Deep learning algorithms are generally not good at recognizing images that are significantly different in structure or style to those used to train the model. For instance, if there were no black-and-white images in the training data, the model may well do poorly on black-and-white images. If the training data did not contain hand-drawn images then the model will probably do poorly on hand-drawn images. There is no general way to check what types of image are missing in your training set, but we will show in this chapter some ways to try to recognize when unexpected image types arise in the data when the model is being used in production (this is known as checking for *out of domain* data).\n",
"\n",
"One major challenge for object detection systems is that image labelling can be slow and expensive. There is a lot of work at the moment going into tools to try to make this labelling faster and more easy, and require less handcrafted labels to train accurate object detection models. One approach which is particularly helpful is to synthetically generate variations of input images, such as by rotating them, or changing their brightness and contrast; this is called *data augmentation* and also works well for text and other types of model. We will be discussing it in detail in this chapter.\n",
"\n",
"Another point to consider is that although your problem might not look like a computer vision problem, it might be possible with a little imagination to turn it into one. For instance, if what you are trying to classify is sounds, you might try converting the sounds into images of their acoustic waveforms and then training a model on those images."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Text (natural language processing)**: just like in computer vision, computers are very good at categorising both short and long documents based on categories such as spam, sentiment, author, source website, and so forth. We are not aware of any rigorous work done in this area to compare to human performance, but anecdotally it seems to us that deep learning performance is similar to human performance here. Deep learning is also very good at generating context-appropriate text, such as generating replies to social media posts, and imitating a particular author's style. It is also good at making this content compelling to humans, and has been shown to be even more compelling than human-generated text. However, deep learning is currently not good at generating *correct* responses! We don't currently have a reliable way to, for instance, combine a knowledge base of medical information, along with a deep learning model for generating medically correct natural language responses. This is very dangerous, because it is so easy to create content which appears to a layman to be compelling, but actually is entirely incorrect.\n",
"\n",
"Another concern is that context-appropriate, highly compelling responses on social media can be used at massive scale — thousands of times greater than any troll farm previously seen — to spread disinformation, create unrest, and encourage conflict. As a rule of thumb, text generation will always be technologically a bit ahead of the ability of models to recognize automatically generated text. For instance, as we will see in this book, it is possible to use a model that can recognize artificially generated content to actually improve the generator that creates that content, until the classification model is no longer able to complete its task.\n",
"\n",
"Despite these issues, deep learning can be used to translate text from one language to another, summarize long documents into something which can be digested more quickly, find all mentions of a concept of interest, and so forth. Unfortunately, the translation or summary could well include completely incorrect information! However, it is already good enough that many people are using the systems — for instance Google's online translation system (and every other online service we are aware of) is based on deep learning."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Combining text and images**: the ability of deep learning to combine text and images into a single model is, generally, far better than most people intuitively expect. For example, a deep learning model can be trained on input images, and output captions written in English, and can learn to generate surprisingly appropriate captions automatically for new images! But again, we have the same warning that we discussed in the previous section: there is no guarantee that these captions will actually be correct.\n",
"\n",
"Because of this serious issue we generally recommend that deep learning be used not as a entirely automated process, but as part of a process in which the model and a human user interact closely. This can potentially make humans orders of magnitude more productive than they would be with entirely manual methods, and actually result in more accurate processes than using a human alone. For instance, an automatic system can be used to identify potential strokes directly from CT scans, send a high priority alert to have potential/scans looked at quickly. There is only a three-hour window to treat strokes, so this fast feedback loop could save lives. At the same time, however, all scans could continue to be sent to radiologists in the usual way, so there would be no reduction in human input. Other deep learning models could automatically measure items seen on the scan, and insert those measurements into report, warn the radiologist about findings that they may have missed, and tell the radiologist about other cases which might be relevant."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Tabular**: for analysing timeseries and tabular data, deep learning has recently been making great strides. However, deep learning is generally used as part of a ensemble of multiple types of model. If you already have a system that is using random forests or gradient boosting machines (popular tabular modelling tools that we will learn about soon) then switching to, or adding, deep learning may not result in any dramatic improvement. Deep learning does greatly increase the variety of columns that you can include, for example columns containing natural language (e.g. book titles, reviews, etc), and *high cardinality categorical* columns (i.e. something that contains a large number of discrete choices, such as zip code or product id). On the downside, deep learning models generally take longer to train than random forests or gradient boosting machines, although this is changing thanks to libraries such as [RAPIDS](https://rapids.ai/), which provides GPU acceleration for the whole modeling pipeline. We cover the pros and cons of all these methods in detail in <<chapter_tabular>> in this book."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Recommendation systems**: Recommendation systems are really just a special type of tabular data. In particular, they generally have a high cardinality categorical variable representing users, and another one representing products (or something similar). A company like Amazon represents every purchase that has ever been made as a giant sparse matrix, with customers as the rows and products as the columns. Once they have the data in this format, data scientists apply some form of collaborative filtering to *fill in the matrix*. For example, if customer A buys products 1 and 10, and customer B buys products 1, 2, 4, and 10, the engine will recommend that A buy 2 and 4. Because deep learning models are good at handling high cardinality categorical variables they are quite good at handling recommendation systems. They particularly come into their own, just like for tabular data, when combining these variables with other kinds of data, such as natural language, or images. They can also do a good job of combining all of these types of information additional meta data represented as tables, such as user information, previous transactions, and so forth.\n",
"\n",
"However, nearly all machine learning approaches have the downside that the only tell you what products a particular user might like, rather than what recommendations would be helpful for a user. Many kinds of recommendations for products a user might like may not be at all helpful, for instance, if the user is already familiar with its products, or if they are simply different packagings of products they have already purchased (such as a boxed set of novels, where they already have each of the items in that set). Jeremy likes reading books by Terry Pratchett, and for a while Amazon was recommending nothing but Terry Pratchett books to him, which really wasn't helpful because he already was aware of these books!"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img alt=\"Terry Pratchett books recommendation\" caption=\"A not-so-useful recommendation\" id=\"pratchett\" src=\"images/pratchett.png\">"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Other data types**: Often you will find that domain-specific data types fit very nicely into existing categories. For instance, protein chains look a lot like natural language documents, in that they are long sequences of discrete tokens with complex relationships and meaning throughout the sequence. And indeed, it does turn out the using NLP deep learning methods is the current state of the art approach for many types of protein analysis. As another example: sounds can be represented as spectrograms, which can be treated as images; standard deep learning approaches for images turn out to work really well on spectrograms."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### The Drivetrain approach"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There are many accurate models that are of no use to anyone, and many inaccurate models that are highly useful. To ensure that your modeling work is useful in practice, you need to consider how your work will be used. In 2012 Jeremy, along with Margit Zwemer and Mike Loukides, introduced a method called *The Drivetrain Approach* for thinking about this issue, which we will summarize here. For more information, see the full article on oreilly.com [Designing Great Data Products](https://www.oreilly.com/radar/drivetrain-approach-data-products/).\n",
"\n",
"Consider a model in an autonomous vehicle, you want to help a car drive safely from point A to point B without human intervention. Great predictive modeling is an important part of the solution, but it doesn't stand on its own; as products become more sophisticated, it disappears into the plumbing. Someone using a self-driving car is completely unaware of the hundreds (if not thousands) of models and the petabytes of data that make it work. But as data scientists build increasingly sophisticated products, they need a systematic design approach.\n",
"\n",
"We use data not just to generate more data (in the form of predictions), but to produce *actionable outcomes*. That is the goal of the Drivetrain Approach. Start by defining a clear **objective**. For instance, Google, when creating their first search engine, considered \"What is the users main objective in typing in a search query?\", and their answer was \"show the most relevant search result\". The next step is to consider what **levers** you can pull (i.e. what actions could you take) to better achieve that objective. In Google's case, that was the ranking of the search results. The third step was to consider what new **data** they would need to produce such a ranking; they realized that the implicit information regarding which pages linked to which other pages could be used for this purpose. Only after these first three steps do we begin thinking about building the predictive **models**. Our objective and available levers, what data we already have and what additional data we will need to collect, determine the models we can build. The models will take both the levers and any uncontrollable variables as their inputs; the outputs from the models can be combined to predict the final state for our objective."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src=\"images/drivetrain-approach.png\" id=\"drivetrain\" caption=\"The Drivetrain approach\">"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's consider another example: recommendation systems. The **objective** of a recommendation engine is to drive additional sales by surprising and delighting the customer with recommendations of items they would not have purchased without the recommendation. The **lever** is the ranking of the recommendations. New **data** must be collected to generate recommendations that will *cause new sales*. This will require conducting many randomized experiments in order to collect data about a wide range of recommendations for a wide range of customers. This is a step that few organizations take; but without it, you don't have the information you need to actually optimize recommendations based on your true objective (more sales!)\n",
"\n",
"Finally, you could build two **models** for purchase probabilities, conditional on seeing or not seeing a recommendation. The difference between these two probabilities is a utility function for a given recommendation to a customer. It will be low in cases where the algorithm recommends a familiar book that the customer has already rejected (both components are small) or a book that he or she would have bought even without the recommendation (both components are large and cancel each other out).\n",
"\n",
"As you can see, in practice often the practical implementation of your model will require a lot more than just training a model! You'll often need to run experiments to collect more data, and consider how to incorporate your models into the overall system you're developing."
]
},
{
"cell_type": "markdown",
"metadata": {},
@ -198,7 +89,160 @@
"\n",
"Sometimes, you have to get a bit creative. Maybe you can find some previous machine learning project, such as a Kaggle competition, that is related to your field of interest. Sometimes, you have to compromize. Maybe you can't find the exact data you need for the precise project you have in mind; but you might be able to find something from a similar domain, or measured in a different way, tackling a slightly different problem. Working on these kinds of similar projects will still give you a good understanding of the overall process, and may help you identify other shortcuts, data sources, and so forth.\n",
"\n",
"Especially when you are just starting out with deep learning it's not a good idea to branch out into very different areas to places that deep learning has not been applied to before. That's because if your model does not work at first, you will not know whether it is because you have made a mistake, or if the very problem you are trying to solve is simply not solvable with deep learning. And you won't know where to look to get help. Therefore, it is best at first to start with something where you can find an example online of somebody who has had good results with something that is at least somewhat similar to what you are trying to achieve, or where you can convert your data into a format similar what someone else has used before (such as creating an image from your data). Have a look at the *state of deep learning* earlier in this chapter for a reminder of what kinds of things deep learning is good at right now."
"Especially when you are just starting out with deep learning it's not a good idea to branch out into very different areas to places that deep learning has not been applied to before. That's because if your model does not work at first, you will not know whether it is because you have made a mistake, or if the very problem you are trying to solve is simply not solvable with deep learning. And you won't know where to look to get help. Therefore, it is best at first to start with something where you can find an example online of somebody who has had good results with something that is at least somewhat similar to what you are trying to achieve, or where you can convert your data into a format similar what someone else has used before (such as creating an image from your data). Let's have a look at the state of deep learning, jsut so you know what kinds of things deep learning is good at right now."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### The state of deep learning"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First things first, let's make sure that deep learning cn be any good at the problem you are considering. In general, here is a summary of the state of deep learning is at the start of 2020. However, things move very fast, and by the time you read this some of these constraints may no longer exist. We will try to keep the book website up-to-date; in addition, a Google search for \"what can AI do now\" there is likely to provide some up-to-date information."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Computer vision"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There are many domains in which deep learning has not been used to analyse images yet, but those where it has been tried have nearly universally shown that computers can recognise what items are in an image at least as well as people can — even specially trained people, such as radiologists. This is known as *object recognition*. Deep learning is also good at recognizing whereabouts objects in an image are, and can highlight their location and name each found object. This is known as *object detection* (there is also a variant of this we saw in <<chapter_intro>>, where every pixel is categorized based on what kind of object it is part of--this is called *segmentation*). Deep learning algorithms are generally not good at recognizing images that are significantly different in structure or style to those used to train the model. For instance, if there were no black-and-white images in the training data, the model may well do poorly on black-and-white images. If the training data did not contain hand-drawn images then the model will probably do poorly on hand-drawn images. There is no general way to check what types of image are missing in your training set, but we will show in this chapter some ways to try to recognize when unexpected image types arise in the data when the model is being used in production (this is known as checking for *out of domain* data).\n",
"\n",
"One major challenge for object detection systems is that image labelling can be slow and expensive. There is a lot of work at the moment going into tools to try to make this labelling faster and more easy, and require less handcrafted labels to train accurate object detection models. One approach which is particularly helpful is to synthetically generate variations of input images, such as by rotating them, or changing their brightness and contrast; this is called *data augmentation* and also works well for text and other types of model. We will be discussing it in detail in this chapter.\n",
"\n",
"Another point to consider is that although your problem might not look like a computer vision problem, it might be possible with a little imagination to turn it into one. For instance, if what you are trying to classify is sounds, you might try converting the sounds into images of their acoustic waveforms and then training a model on those images."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Text (natural language processing)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Just like in computer vision, computers are very good at categorising both short and long documents based on categories such as spam, sentiment, author, source website, and so forth. We are not aware of any rigourous work done in this area to compare to human performance, but anecdotally it seems to us that deep learning performance is similar to human performance here. Deep learning is also very good at generating context-appropriate text, such as generating replies to social media posts, and imitating a particular author's style. It is also good at making this content compelling to humans, and has been shown to be even more compelling than human-generated text. However, deep learning is currently not good at generating *correct* responses! We don't currently have a reliable way to, for instance, combine a knowledge base of medical information, along with a deep learning model for generating medically correct natural language responses. This is very dangerous, because it is so easy to create content which appears to a layman to be compelling, but actually is entirely incorrect.\n",
"\n",
"Another concern is that context-appropriate, highly compelling responses on social media can be used at massive scale — thousands of times greater than any troll farm previously seen — to spread disinformation, create unrest, and encourage conflict. As a rule of thumb, text generation will always be technologically a bit ahead of the ability of models to recognize automatically generated text. For instance, it is possible to use a model that can recognize artificially generated content to actually improve the generator that creates that content, until the classification model is no longer able to complete its task.\n",
"\n",
"Despite these issues, deep learning can be used to translate text from one language to another, summarize long documents into something which can be digested more quickly, find all mentions of a concept of interest, and so forth. Unfortunately, the translation or summary could well include completely incorrect information! However, it is already good enough that many people are using the systems — for instance Google's online translation system (and every other online service we are aware of) is based on deep learning."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Combining text and images"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The ability of deep learning to combine text and images into a single model is, generally, far better than most people intuitively expect. For example, a deep learning model can be trained on input images, and output captions written in English, and can learn to generate surprisingly appropriate captions automatically for new images! But again, we have the same warning that we discussed in the previous section: there is no guarantee that these captions will actually be correct.\n",
"\n",
"Because of this serious issue we generally recommend that deep learning be used not as a entirely automated process, but as part of a process in which the model and a human user interact closely. This can potentially make humans orders of magnitude more productive than they would be with entirely manual methods, and actually result in more accurate processes than using a human alone. For instance, an automatic system can be used to identify potential strokes directly from CT scans, send a high priority alert to have potential/scans looked at quickly. There is only a three-hour window to treat strokes, so this fast feedback loop could save lives. At the same time, however, all scans could continue to be sent to radiologists in the usual way, so there would be no reduction in human input. Other deep learning models could automatically measure items seen on the scan, and insert those measurements into report, warn the radiologist about findings that they may have missed, and tell the radiologist about other cases which might be relevant."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Tabular data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For analysing timeseries and tabular data, deep learning has recently been making great strides. However, deep learning is generally used as part of a ensemble of multiple types of model. If you already have a system that is using random forests or gradient boosting machines (popular tabular modelling tools that we will learn about soon) then switching to, or adding, deep learning may not result in any dramatic improvement. Deep learning does greatly increase the variety of columns that you can include, for example columns containing natural language (e.g. book titles, reviews, etc), and *high cardinality categorical* columns (i.e. something that contains a large number of discrete choices, such as zip code or product id). On the downside, deep learning models generally take longer to train than random forests or gradient boosting machines, although this is changing thanks to libraries such as [RAPIDS](https://rapids.ai/), which provides GPU acceleration for the whole modeling pipeline. We cover the pros and cons of all these methods in detail in <<chapter_tabular>> in this book."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Recommendation systems"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Recommendation systems are really just a special type of tabular data. In particular, they generally have a high cardinality categorical variable representing users, and another one representing products (or something similar). A company like Amazon represents every purchase that has ever been made as a giant sparse matrix, with customers as the rows and products as the columns. Once they have the data in this format, data scientists apply some form of collaborative filtering to *fill in the matrix*. For example, if customer A buys products 1 and 10, and customer B buys products 1, 2, 4, and 10, the engine will recommend that A buy 2 and 4. Because deep learning models are good at handling high cardinality categorical variables they are quite good at handling recommendation systems. They particularly come into their own, just like for tabular data, when combining these variables with other kinds of data, such as natural language, or images. They can also do a good job of combining all of these types of information with additional meta data represented as tables, such as user information, previous transactions, and so forth.\n",
"\n",
"However, nearly all machine learning approaches have the downside that they only tell you what products a particular user might like, rather than what recommendations would be helpful for a user. Many kinds of recommendations for products a user might like may not be at all helpful, for instance, if the user is already familiar with its products, or if they are simply different packagings of products they have already purchased (such as a boxed set of novels, where they already have each of the items in that set). Jeremy likes reading books by Terry Pratchett, and for a while Amazon was recommending nothing but Terry Pratchett books to him (see <<pratchett>>), which really wasn't helpful because he already was aware of these books!"
]
},
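{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the *fill in the matrix* idea concrete, here is a tiny illustration of the customer A / customer B example above. This is just a toy sketch in plain Python and numpy (not the book's code, and not how a production recommender works); real systems apply collaborative filtering to millions of sparse rows."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Toy version of the giant sparse purchase matrix described above\n",
"products  = [1, 2, 4, 10]\n",
"purchases = {'A': {1, 10}, 'B': {1, 2, 4, 10}}\n",
"matrix = np.array([[int(p in purchases[c]) for p in products] for c in 'AB'])\n",
"print(matrix)        # rows: customers A and B; columns: products 1, 2, 4, 10\n",
"\n",
"# 'Filling in the matrix' for A: suggest items bought by customers whose purchases overlap with A's\n",
"similar_to_a = [c for c in purchases if c != 'A' and purchases[c] & purchases['A']]\n",
"recs = set().union(*(purchases[c] for c in similar_to_a)) - purchases['A']\n",
"print(recs)          # {2, 4}"
]
},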
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img alt=\"Terry Pratchett books recommendation\" caption=\"A not-so-useful recommendation\" id=\"pratchett\" src=\"images/pratchett.png\">"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Other data types**: Often you will find that domain-specific data types fit very nicely into existing categories. For instance, protein chains look a lot like natural language documents, in that they are long sequences of discrete tokens with complex relationships and meaning throughout the sequence. And indeed, it does turn out that using NLP deep learning methods is the current state of the art approach for many types of protein analysis. As another example: sounds can be represented as spectrograms, which can be treated as images; standard deep learning approaches for images turn out to work really well on spectrograms."
]
},
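{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a hedged illustration of that last point, the sketch below turns a sound file into a spectrogram image using scipy and matplotlib; the resulting picture could then be fed to an ordinary image classifier. The filename `sound.wav` is only a placeholder, and this is not code from the book."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"from scipy.io import wavfile\n",
"\n",
"sample_rate, signal = wavfile.read('sound.wav')   # placeholder audio file\n",
"if signal.ndim > 1: signal = signal[:, 0]         # keep one channel if stereo\n",
"\n",
"plt.specgram(signal, Fs=sample_rate)              # time on x, frequency on y, energy as colour\n",
"plt.axis('off')\n",
"plt.savefig('sound.png', bbox_inches='tight', pad_inches=0)\n",
"# sound.png can now be treated like any other image by an image classifier"
]
},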
{
"cell_type": "markdown",
"metadata": {},
"source": [
"TK Jeremy: please add a transition"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### The Drivetrain approach"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There are many accurate models that are of no use to anyone, and many inaccurate models that are highly useful. To ensure that your modeling work is useful in practice, you need to consider how your work will be used. In 2012 Jeremy, along with Margit Zwemer and Mike Loukides, introduced a method called *The Drivetrain Approach* for thinking about this issue, which we will summarize here, and illustrate in <<drivetrain>>. For more information, see the full article on oreilly.com [Designing Great Data Products](https://www.oreilly.com/radar/drivetrain-approach-data-products/).\n",
"\n",
"Consider a model in an autonomous vehicle, you want to help a car drive safely from point A to point B without human intervention. Great predictive modeling is an important part of the solution, but it doesn't stand on its own; as products become more sophisticated, it disappears into the plumbing. Someone using a self-driving car is completely unaware of the hundreds (if not thousands) of models and the petabytes of data that make it work. But as data scientists build increasingly sophisticated products, they need a systematic design approach.\n",
"\n",
"We use data not just to generate more data (in the form of predictions), but to produce *actionable outcomes*. That is the goal of the Drivetrain Approach. Start by defining a clear **objective**. For instance, Google, when creating their first search engine, considered \"What is the users main objective in typing in a search query?\", and their answer was \"show the most relevant search result\". The next step is to consider what **levers** you can pull (i.e. what actions could you take) to better achieve that objective. In Google's case, that was the ranking of the search results. The third step was to consider what new **data** they would need to produce such a ranking; they realized that the implicit information regarding which pages linked to which other pages could be used for this purpose. Only after these first three steps do we begin thinking about building the predictive **models**. Our objective and available levers, what data we already have and what additional data we will need to collect, determine the models we can build. The models will take both the levers and any uncontrollable variables as their inputs; the outputs from the models can be combined to predict the final state for our objective."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src=\"images/drivetrain-approach.png\" id=\"drivetrain\" caption=\"The Drivetrain approach\">"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's consider another example: recommendation systems. The **objective** of a recommendation engine is to drive additional sales by surprising and delighting the customer with recommendations of items they would not have purchased without the recommendation. The **lever** is the ranking of the recommendations. New **data** must be collected to generate recommendations that will *cause new sales*. This will require conducting many randomized experiments in order to collect data about a wide range of recommendations for a wide range of customers. This is a step that few organizations take; but without it, you don't have the information you need to actually optimize recommendations based on your true objective (more sales!)\n",
"\n",
"Finally, you could build two **models** for purchase probabilities, conditional on seeing or not seeing a recommendation. The difference between these two probabilities is a utility function for a given recommendation to a customer. It will be low in cases where the algorithm recommends a familiar book that the customer has already rejected (both components are small) or a book that he or she would have bought even without the recommendation (both components are large and cancel each other out).\n",
"\n",
"As you can see, in practice often the practical implementation of your model will require a lot more than just training a model! You'll often need to run experiments to collect more data, and consider how to incorporate your models into the overall system you're developing. Speaking of data, let's now focus on how to find find data for your project."
]
},
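{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make that utility function concrete, here is a minimal sketch (ours, not from the original article), assuming you already have two trained models available as plain callables that return purchase probabilities with and without the recommendation being shown."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def recommendation_utility(customer, product, prob_if_seen, prob_if_not_seen):\n",
"    # Expected extra purchases *caused* by showing this recommendation\n",
"    return prob_if_seen(customer, product) - prob_if_not_seen(customer, product)\n",
"\n",
"# Purely illustrative stand-ins for the two models\n",
"familiar_book    = lambda customer, product: 0.02   # already seen and rejected: low either way\n",
"would_buy_anyway = lambda customer, product: 0.90   # would buy even without the nudge\n",
"\n",
"print(recommendation_utility('jeremy', 'a boxed set he already owns', familiar_book, familiar_book))          # both small -> ~0\n",
"print(recommendation_utility('jeremy', 'a boxed set he already owns', would_buy_anyway, would_buy_anyway))    # both large -> cancel out"
]
},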
{
@ -483,7 +527,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"One thing to be aware of in this process: as we discussed in <<chapter_intro>>, models can only reflect the data used to train them. And the world is full of biased data, which ends up reflected in, for example, Bing Image Search (which we used to create our dataset). For instance, let's say you were interested in creating an app which could help users figure out whether they had healthy skin, so you trained a model on the results of searches for (say) *healthy skin*. Here's the results you would get:"
"One thing to be aware of in this process: as we discussed in <<chapter_intro>>, models can only reflect the data used to train them. And the world is full of biased data, which ends up reflected in, for example, Bing Image Search (which we used to create our dataset). For instance, let's say you were interested in creating an app which could help users figure out whether they had healthy skin, so you trained a model on the results of searches for (say) *healthy skin*. <<healthy_skin>> shows you the results you would get."
]
},
{
@ -500,6 +544,13 @@
"So with this as your training data, you would end up not with a healthy skin detector, but a *young white woman touching her face* detector! Be sure to think carefully about the types of data that you might expect to see in practice in your application, and check carefully to ensure that all these types are reflected in your model's source data. (Thanks to Deb Raji, who came up with the *healthy skin* example. See her paper *Actionable Auditing: Investigating the Impact of Publicly Naming Biased Performance Results of Commercial AI Products* for more fascinating insights into model bias.)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now that we have downloaded some data, we need to assemble it in a format suitable for model training. In fastai, that means creating an object called `DataLoaders`."
]
},
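{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a preview, here is a sketch of the kind of `DataBlock` definition involved, assuming the downloaded images sit in one sub-folder per bear type under `path` (a sketch only, not necessarily the exact code used later in the notebook)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from fastai.vision.all import *\n",
"\n",
"bears = DataBlock(\n",
"    blocks=(ImageBlock, CategoryBlock),              # inputs are images, labels are categories\n",
"    get_items=get_image_files,                       # how to list the items\n",
"    splitter=RandomSplitter(valid_pct=0.2, seed=42), # hold out 20% for validation\n",
"    get_y=parent_label,                              # the label is the parent folder's name\n",
"    item_tfms=Resize(128))                           # make every item the same size\n",
"\n",
"dls = bears.dataloaders(path)                        # `path` points at the image folders"
]
},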
{
"cell_type": "markdown",
"metadata": {},
@ -697,14 +748,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data augmentation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"All of these approaches seem somewhat wasteful, or problematic. If we squished or stretch the images then the end up unrealistic shapes, leading to a model that learns that things look different to how they actually are, which we would expect to result in lower accuracy. If we crop the images then we remove some of the features that allow us to recognize them. For instance, if we were trying to recognise the breed of dog or cat, we may end up cropping out a key part of the body or the face necessary to distinguish between similar breeds. If we pat the images then we have a whole lot of empty space, which is just wasted computation for our model, and results in a lower effective resolution for the part of the image we actually use.\n",
"All of these approaches seem somewhat wasteful, or problematic. If we squished or stretch the images then the end up unrealistic shapes, leading to a model that learns that things look different to how they actually are, which we would expect to result in lower accuracy. If we crop the images then we remove some of the features that allow us to recognize them. For instance, if we were trying to recognise the breed of dog or cat, we may end up cropping out a key part of the body or the face necessary to distinguish between similar breeds. If we pad the images then we have a whole lot of empty space, which is just wasted computation for our model, and results in a lower effective resolution for the part of the image we actually use.\n",
"\n",
"Instead, what we normally do in practice is to randomly select part of the image, and crop to just that part. On each epoch (which is one complete pass through all of our images in the dataset) we randomly select a different part of each image. This means that our model can learn to focus on, and recognize, different features in our images. It also reflects how images work in the real world; different photos of the same thing may be framed in slightly different ways.\n",
"\n",
@ -749,7 +793,21 @@
"source": [
"In fact, an entirely untrained neural network knows nothing whatsoever about how images behave. It doesn't even recognise that when an object is moved one pixel to the left, then it still is a picture of the same thing! So actually training the neural network with examples of images that are in slightly different places, and slightly different sizes, helps it to understand the basic concept of what a *object* is, and how it can be represented in an image.\n",
"\n",
"This is a specific example of a more general technique, called *data augmentation*. Data augmentation refers to creating random variations of our input data, such that they appear a different, but are not expected to change the meaning of the data. Examples of common data augmentation for images are rotation, flipping, perspective warping, brightness changes, contrast changes, and much more. For natural photo images such as the ones we are using here, there is a standard set of augmentations which we have found work pretty well, and are provided with the get transforms function. Because the images are now all the same size, we can apply these augmentation is to an entire batch of the time using the GPU, which will save a lot of time. To tell fastai we want to use these transforms to a batch, we use the `batch_tfms` parameter. (Note that's we're not using `RandomResizedCrop` in this example, so you can see the differences more clearly; we're also using double the amount of augmentation compared to the default, for the same reason)."
"This is a specific example of a more general technique, called *data augmentation*."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Data augmentation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Data augmentation refers to creating random variations of our input data, such that they appear a different, but are not expected to change the meaning of the data. Examples of common data augmentation for images are rotation, flipping, perspective warping, brightness changes, contrast changes, and much more. For natural photo images such as the ones we are using here, there is a standard set of augmentations which we have found work pretty well, and are provided with the get transforms function. Because the images are now all the same size, we can apply these augmentations to an entire batch of them using the GPU, which will save a lot of time. To tell fastai we want to use these transforms to a batch, we use the `batch_tfms` parameter. (Note that's we're not using `RandomResizedCrop` in this example, so you can see the differences more clearly; we're also using double the amount of augmentation compared to the default, for the same reason)."
]
},
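{
"cell_type": "markdown",
"metadata": {},
"source": [
"For instance, assuming the `bears` DataBlock and `path` defined earlier in the notebook, the change amounts to roughly this sketch:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Standard augmentations, applied to whole batches on the GPU, at double the default strength\n",
"bears = bears.new(item_tfms=Resize(128), batch_tfms=aug_transforms(mult=2))\n",
"dls = bears.dataloaders(path)"
]
},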
{
@ -777,6 +835,13 @@
"dls.train.show_batch(max_n=8, rows=2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now that we have assembled our data in a format fit for model training, let's actually train an image classifier using it."
]
},
{
"cell_type": "markdown",
"metadata": {},
@ -788,7 +853,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We'll use `RandomResizedCrop` and default `aug_transforms` for our model, and an image size of 224px, which is fairly standard for image classification."
"Time to use the same lined of codes as in <<chapter_intro>> to train our bear classifier.\n",
"\n",
"We don't have a lot of data for our pblem (150 pictures of each sort of bear at most), so to train our model, we'll use `RandomResizedCrop` and default `aug_transforms` for our model, on an image size of 224px, which is fairly standard for image classification."
]
},
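{
"cell_type": "markdown",
"metadata": {},
"source": [
"In condensed form (again assuming the `bears` DataBlock and `path` from earlier; a sketch, not necessarily the exact cells that follow), those lines look roughly like this:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"bears = bears.new(\n",
"    item_tfms=RandomResizedCrop(224, min_scale=0.5),  # random 224px crops of each image\n",
"    batch_tfms=aug_transforms())                      # default augmentations, on the GPU\n",
"dls = bears.dataloaders(path)\n",
"\n",
"learn = cnn_learner(dls, resnet18, metrics=error_rate)\n",
"learn.fine_tune(4)"
]
},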
{
@ -1067,6 +1134,13 @@
"> note: After cleaning the dataset using the above steps, we generally are seeing 100% accuracy on this task. We even see that result when we download a lot less images than the 150 per class we're using here. As you can see, the common complaint *you need massive amounts of data to do deep learning* can be a very long way from the truth!"
]
},
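{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, the cleaning mentioned in the note above uses fastai's `ImageClassifierCleaner`; a condensed sketch (assuming a trained `learn` and the image `path` from earlier) looks roughly like this:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from fastai.vision.widgets import ImageClassifierCleaner\n",
"import shutil\n",
"\n",
"cleaner = ImageClassifierCleaner(learn)\n",
"cleaner   # shows the images the model is least confident about, with delete/re-label menus\n",
"\n",
"# ...then, in a later cell, apply whatever you selected in the widget:\n",
"# for idx in cleaner.delete(): cleaner.fns[idx].unlink()\n",
"# for idx, cat in cleaner.change(): shutil.move(str(cleaner.fns[idx]), path/cat)"
]
},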
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now that we have trained our model, let's see how we can deploy it to be used in practice."
]
},
{
"cell_type": "markdown",
"metadata": {},
@ -1074,6 +1148,13 @@
"## Turning your model into an online application"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We are now going to look at what it takes to take this model and turn it into a working online application. We will just go as far as creating a basic working prototype; we do not have the scope in this book to teach you all the details of web application development generally."
]
},
{
"cell_type": "markdown",
"metadata": {},
@ -1085,8 +1166,6 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We are now going to look at what it takes to take this model and turn it into a working online application. We will just go as far as creating a basic working prototype; we do not have the scope in this book to teach you all the details of web application development generally.\n",
"\n",
"Once you've got a model you're happy with, you need to save it, so that you can then copy it over to a server where you'll use it in production. Do you remember exactly what a model is? It consists of two parts: the *architecture*, and the trained *parameters*. The easiest way to save a model is to save both of these, because that way when you load a model you can be sure that you have the matching architecture and parameters. To save both parts, use the `export` method.\n",
"\n",
"This method even saves the definition of how to create your `DataLoaders`. This is important, because otherwise you would have to redefine how to transform your data in order to use your model in production. When you call export, fastai will save a file called `export.pkl`."
@ -1218,6 +1297,13 @@
"We can see here that if we index into the vocab with the integer returned by `predict` then we get back \"grizzly\", as expected. Also, note that if we index into the list of probabilities, we see a nearly 1.00 probability that this is a grizzly."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We know how to make predictions from our saved model, so we have everything we need to start building our app. We can do it directly in a Jupyter Notenook."
]
},
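{
"cell_type": "markdown",
"metadata": {},
"source": [
"In condensed form, the save-and-predict round trip the app will rely on looks roughly like this (the image filename is just a placeholder):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"learn.export()                                   # writes export.pkl alongside the data\n",
"\n",
"learn_inf = load_learner(path/'export.pkl')      # on the server, or in the app\n",
"pred, pred_idx, probs = learn_inf.predict('images/grizzly.jpg')   # placeholder image\n",
"pred, probs[pred_idx]"
]
},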
{
"cell_type": "markdown",
"metadata": {},
@ -1500,6 +1586,13 @@
"<img alt=\"The whole widget\" width=\"233\" src=\"images/att_00011.png\">"
]
},
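{
"cell_type": "markdown",
"metadata": {},
"source": [
"Condensed into one place, the notebook app amounts to roughly the following sketch (the `.data` attribute of `FileUpload` is specific to the ipywidgets version we are using as of this writing, so check your version's documentation):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from fastai.vision.all import *\n",
"from IPython.display import display\n",
"import ipywidgets as widgets\n",
"\n",
"learn_inf = load_learner('export.pkl')\n",
"\n",
"btn_upload = widgets.FileUpload()\n",
"out_pl     = widgets.Output()\n",
"lbl_pred   = widgets.Label()\n",
"btn_run    = widgets.Button(description='Classify')\n",
"\n",
"def on_click_classify(change):\n",
"    img = PILImage.create(btn_upload.data[-1])   # bytes of the most recently uploaded file\n",
"    out_pl.clear_output()\n",
"    with out_pl: display(img.to_thumb(128, 128))\n",
"    pred, pred_idx, probs = learn_inf.predict(img)\n",
"    lbl_pred.value = f'Prediction: {pred}; Probability: {probs[pred_idx]:.04f}'\n",
"\n",
"btn_run.on_click(on_click_classify)\n",
"widgets.VBox([widgets.Label('Select your bear!'), btn_upload, btn_run, out_pl, lbl_pred])"
]
},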
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We have written all the code necessary for our app. The next step is to convert it in something we can deploy."
]
},
{
"cell_type": "markdown",
"metadata": {},
@ -1522,7 +1615,9 @@
"\n",
"Voila runs Jupyter notebooks, just like the Jupyter notebook server you are using now does, except that it does something very important: it removes all of the cell inputs, and only shows output (including ipywidgets), along with your markdown cells. So what's left is a web application! To view your notebook as a voila web application replace the word \"notebooks\" in your browser's URL with: \"voila/render\". You will see the same content as your notebook, but without any of the code cells.\n",
"\n",
"Of course, you don't need to use Voila or ipywidgets. Your model is just a function you can call: `pred,pred_idx,probs = learn.predict(img)` . So you can use it with any framework, hosted on any platform. And you can take something you've prototyped in ipywidgets and Voila and later convert it into a regular web application. We're showing you this approach in the book because we think it's a great way for data scientists and other folks that aren't web development experts to create applications from their models."
"Of course, you don't need to use Voila or ipywidgets. Your model is just a function you can call: `pred,pred_idx,probs = learn.predict(img)` . So you can use it with any framework, hosted on any platform. And you can take something you've prototyped in ipywidgets and Voila and later convert it into a regular web application. We're showing you this approach in the book because we think it's a great way for data scientists and other folks that aren't web development experts to create applications from their models.\n",
"\n",
"We have our app, now let's deploy it!"
]
},
{
@ -1553,7 +1648,7 @@
"For at least the initial prototype of your application, and for any hobby projects that you want to show off, you can easily host them for free. The best place and the best way to do this will vary over time so check the book website for the most up-to-date recommendations. As we're writing this book in 2020 the simplest (and free!) approach is called [Binder](https://mybinder.org/). To publish your web app on Binder, you follow these steps:\n",
"\n",
"1. Add your notebook to a [GitHub repository](http://github.com/), \n",
"2. Paste the URL of that repo in the URL field of Binder, \n",
"2. Paste the URL of that repo in the URL field of Binder as shown in <<deploy-binder>>, \n",
"3. Change the \"File\" dropdown to instead select \"URL\",\n",
"4. In the Path field, enter `/voila/render/name.ipynb` (replacing `name.ipynb` as appropriate for your notebook):\n",
"5. Click the \"Copy the URL\" button and paste it somewhere safe. \n",
@ -1595,7 +1690,9 @@
"source": [
"> A: I've had a chance to see up close how the mobile ML landscape is changing in my work. We offer an iPhone app that depends on computer vision and for years we ran our own computer vision models in the cloud. This was the only way to do it then since those models needed significant memory and compute resources and took minutes to process. This approach required building not only the models (fun!) but infrastructure to ensure a certain number of \"compute worker machines\" was absolutely always running (scary), that more machines would automatically come online if traffic increased, that there was stable storage for large inputs and outputs, that the iOS app could know and tell the user how their job was doing, etc... Nowadays, Apple provides APIs for converting models to run efficiently on device and most iOS devices have dedicated ML hardware, so we run our new models on device. So, in a few years that strategy has gone from impossible to possible but it's still not easy. In our case it's worth it, for a faster user experiene and to worry less about servers. What works for you will depend, realistically, on the user experience you're trying to create and what you personally find it easy to do. If you really know how to run servers, do it. If you really know how to build native mobile apps, do that. There are many roads up the hill.\n",
"\n",
"Overall, we'd recommend using a simple CPU-based server approach where possible, for as long as you can get away with it. If you're lucky enough to have a very successful application, then you'll be able to justify the investment in more complex deployment approaches at that time."
"Overall, we'd recommend using a simple CPU-based server approach where possible, for as long as you can get away with it. If you're lucky enough to have a very successful application, then you'll be able to justify the investment in more complex deployment approaches at that time.\n",
"\n",
"Congratulations, you have succesfully built a deep learning model and deployed it! Now is a good time to take a pause and think about what could go wrong."
]
},
{
@ -1609,7 +1706,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"In practice, a deep learning model will be just one piece of a much bigger system. As we discussed at the start of this chapter, a *data product* requires thinking about the entire end to end process within which or model lives.\n",
"In practice, a deep learning model will be just one piece of a much bigger system. As we discussed at the start of this chapter, a *data product* requires thinking about the entire end to end process within which our model lives.\n",
"\n",
"One of the biggest issues with this is that understanding and testing the behavior of a deep learning model is much more difficult than most code that you would write. With normal software development you can analyse the exact steps that the software is taking, and carefully study with of these steps match the desired behaviour that you are trying to create. But with a neural network the behavior emerges from the models attempt to match the training data, rather than being exactly defined.\n",
"\n",
@ -1632,7 +1729,7 @@
"\n",
"There are other reasons we need to be careful too. One very common problem is *domain shift*; this is where the type of data that our model sees changes over time. For instance, an insurance company may use a deep learning model as part of their pricing and risk algorithm, but over time the type of customers that they attract, and the type of risks that they represent, may change so much that the original training data is no longer relevant.\n",
"\n",
"Out of domain data, and domain shift, are examples of the problem that you can never fully no the entire behaviour of your neural network. They have far too many parameters to be able to analytically understand all of their possible behaviours. This is the natural downside of the thing that they're so good at — their flexibility in being able to solve complex problems where we may not even be able to fully specify our preferred solution approaches. The good news, however, is that there are ways to mitigate these risks using a carefully thought out process. The details of this will vary depending on the details of the problem you are solving, but we will attempt to lay out here a high-level approach which we hope will provide useful guidance."
"Out of domain data, and domain shift, are examples of the problem that you can never fully know the entire behaviour of your neural network. They have far too many parameters to be able to analytically understand all of their possible behaviours. This is the natural downside of the thing that they're so good at — their flexibility in being able to solve complex problems where we may not even be able to fully specify our preferred solution approaches. The good news, however, is that there are ways to mitigate these risks using a carefully thought out process. The details of this will vary depending on the details of the problem you are solving, but we will attempt to lay out here a high-level approach summarized in <<deploy_process>> which we hope will provide useful guidance."
]
},
{
@ -1660,6 +1757,13 @@
"> j: I started a company 20 years ago called *Optimal Decisions* which used machine learning and optimisation to help giant insurance companies set their pricing, impacting tens of billions of dollars of risks. We used the approaches described above to manage the potential downsides of something that might go wrong. Also, before we worked with our clients to put anything in production, we tried to simulate the impact by testing the end to end system on their previous year's data. It was always quite a nerve-wracking process, putting these new algorithms in production, but every rollout was successful."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As you analyze the results while deploying your model progressively, you should check for the following unexpected behaviors."
]
},
{
"cell_type": "markdown",
"metadata": {},
@ -1680,6 +1784,13 @@
"Such a thought exercise might help you to construct a more careful rollout plan, ongoing monitoring systems, and human oversight. Of course, human oversight isn't useful if it isn't listened to; so make sure that there are reliable and resilient communication channels so that the right people will be aware of issues, and will have the power to fix them."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Congratulations, you have finished your first deep learning project! To help with understanding the material, we really recommend you start writing about what you learned."
]
},
{
"cell_type": "markdown",
"metadata": {},
@ -1785,6 +1896,31 @@
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.5"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": false,
"sideBar": true,
"skip_h1_title": true,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
}
},
"nbformat": 4,