updates from repo

This commit is contained in:
Jeremy Howard 2020-03-03 06:11:00 -08:00
parent cd65c692a2
commit 284a24325b
15 changed files with 476 additions and 551 deletions

File diff suppressed because one or more lines are too long

View File

@ -2,7 +2,7 @@
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -29,16 +29,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The five lines of code we've seen in <<chaptter_intro>> are just one small part of the process of using deep learning in practice. In this chapter, we're going to use a computer vision example to look at the end-to-end process of creating a deep learning application. More specifically: we're going to build a bear classifier! In the process, we'll discuss the capabilities and constraints of deep learning, learn about how to create datasets, look at possible gotchas when using deep learning in practice, and more. Let's start with how you should frame your problem.\n",
"\n",
"TK: the next section title seems a bit inadequate, let's double check"
"The five lines of code we saw in <<chaptter_intro>> are just one small part of the process of using deep learning in practice. In this chapter, we're going to use a computer vision example to look at the end-to-end process of creating a deep learning application. More specifically: we're going to build a bear classifier! In the process, we'll discuss the capabilities and constraints of deep learning, learn about how to create datasets, look at possible gotchas when using deep learning in practice, and more. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Picking a problem"
"## The practice of deep learning"
]
},
{
@ -47,7 +45,9 @@
"source": [
"We've seen that deep learning can solve a lot of challenging problems quickly and with little code. However, deep learning isn't magic! We often talk to people who overestimate both the constraints, and the capabilities of deep learning. Both of these can be problems: underestimating the capabilities means that you might not even try things which could be very beneficial; underestimating the constraints might mean that you fail to consider and react to important issues.\n",
"\n",
"The best thing to do is to keep an open mind. If you remain open to the possibility that deep learning might solve part of your problem with less data or complexity than you expect, then it is possible to design a process where you can find the specific capabilities and constraints related to your particular problem as you work through the process. This doesn't mean making any risky bets — we will show you how you can gradually roll out models so that they don't create significant risks, and can even backtest them prior to putting them in production."
"The best thing to do is to keep an open mind. If you remain open to the possibility that deep learning might solve part of your problem with less data or complexity than you expect, then it is possible to design a process where you can find the specific capabilities and constraints related to your particular problem as you work through the process. This doesn't mean making any risky bets — we will show you how you can gradually roll out models so that they don't create significant risks, and can even backtest them prior to putting them in production.\n",
"\n",
"Let's start with how you should frame your problem."
]
},
{
@ -103,7 +103,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"First things first, let's make sure that deep learning cn be any good at the problem you are considering. In general, here is a summary of the state of deep learning is at the start of 2020. However, things move very fast, and by the time you read this some of these constraints may no longer exist. We will try to keep the book website up-to-date; in addition, a Google search for \"what can AI do now\" there is likely to provide some up-to-date information."
"Let's make sure that deep learning can be any good at the problem you are considering. In general, here is a summary of the state of deep learning is at the start of 2020. However, things move very fast, and by the time you read this some of these constraints may no longer exist. We will try to keep the book website up-to-date; in addition, a Google search for \"what can AI do now\" there is likely to provide some up-to-date information."
]
},
{
@ -282,7 +282,7 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -308,7 +308,7 @@
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -317,7 +317,7 @@
"<function utils.search_images_bing(key, term, min_sz=128)>"
]
},
"execution_count": 8,
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
@ -583,7 +583,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"So with this as your training data, you would end up not with a healthy skin detector, but a *young white woman touching her face* detector! Be sure to think carefully about the types of data that you might expect to see in practice in your application, and check carefully to ensure that all these types are reflected in your model's source data. (Thanks to Deb Raji, who came up with the *healthy skin* example. See her paper *Actionable Auditing: Investigating the Impact of Publicly Naming Biased Performance Results of Commercial AI Products* for more fascinating insights into model bias.)"
"So with this as your training data, you would end up not with a healthy skin detector, but a *young white woman touching her face* detector! Be sure to think carefully about the types of data that you might expect to see in practice in your application, and check carefully to ensure that all these types are reflected in your model's source data.footnote:[Thanks to Deb Raji, who came up with the *healthy skin* example. See her paper *Actionable Auditing: Investigating the Impact of Publicly Naming Biased Performance Results of Commercial AI Products* for more fascinating insights into model bias.]"
]
},
{
@ -688,7 +688,7 @@
"item_tfms=Resize(128)\n",
"```\n",
"\n",
"Our images are all different sizes, and this is a problem for deep learning: we don't feed the model one image at a time but several (what we call a *mini-batch*) of them. To group them in a big array (usually called *tensor*) that is going to go through our model, they all need to be of the same size. So we need to add a transform which will resize these images to the same size. *item transforms* are pieces of code which run on each individual item, whether it be an image, category, or so forth. fastai includes many predefined transforms; we will use the `Resize` transform here.\n",
"Our images are all different sizes, and this is a problem for deep learning: we don't feed the model one image at a time but several (what we call a *mini-batch*) of them. To group them in a big array (usually called *tensor*) that is going to go through our model, they all need to be of the same size. So we need to add a transform twhich will resize these images to the same size. *item transforms* are pieces of code which run on each individual item, whether it be an image, category, or so forth. fastai includes many predefined transforms; we will use the `Resize` transform here.\n",
"\n",
"This command has given us a `DataBlock` object. This is like a *template* for creating a `DataLoaders`. We still need to tell fastai the actual source of our data — in this case, the path where the images can be found."
]
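},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch of how these pieces fit together (assuming the `DataBlock` above was assigned to a variable `bears`, and `path` points at the folder of downloaded bear images; both names are placeholders):\n",
"\n",
"```\n",
"dls = bears.dataloaders(path)  # apply the template to an actual data source\n",
"dls.show_batch(max_n=4)        # inspect a few resized items from one mini-batch\n",
"```"
]
},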
@ -741,9 +741,7 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"metadata": {},
"outputs": [
{
"data": {
@ -792,7 +790,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"All of these approaches seem somewhat wasteful, or problematic. If we squished or stretch the images then the end up unrealistic shapes, leading to a model that learns that things look different to how they actually are, which we would expect to result in lower accuracy. If we crop the images then we remove some of the features that allow us to recognize them. For instance, if we were trying to recognise the breed of dog or cat, we may end up cropping out a key part of the body or the face necessary to distinguish between similar breeds. If we pad the images then we have a whole lot of empty space, which is just wasted computation for our model, and results in a lower effective resolution for the part of the image we actually use.\n",
"All of these approaches seem somewhat wasteful, or problematic. If we squished or stretch the images then they end up unrealistic shapes, leading to a model that learns that things look different to how they actually are, which we would expect to result in lower accuracy. If we crop the images then we remove some of the features that allow us to recognize them. For instance, if we were trying to recognise the breed of dog or cat, we may end up cropping out a key part of the body or the face necessary to distinguish between similar breeds. If we pad the images then we have a whole lot of empty space, which is just wasted computation for our model, and results in a lower effective resolution for the part of the image we actually use.\n",
"\n",
"Instead, what we normally do in practice is to randomly select part of the image, and crop to just that part. On each epoch (which is one complete pass through all of our images in the dataset) we randomly select a different part of each image. This means that our model can learn to focus on, and recognize, different features in our images. It also reflects how images work in the real world; different photos of the same thing may be framed in slightly different ways.\n",
"\n",
@ -1061,7 +1059,9 @@
"source": [
"Each row here represents all the black, grizzly, and teddy bears in our dataset, respectively. Each column represents the images which the model predicted as black, grizzly, and teddy bears, respectively. Therefore, the diagonal of the matrix shows the images which were classified correctly, and the other, off diagonal, cells represent those which were classified incorrectly. This is called a *confusion matrix* and is one of the many ways that fastai allows you to view the results of your model. It is (of course!) calculated using the validation set. With the color coding, the goal is to have white everywhere, except the diagonal where we want dark blue. Our bear classifier isn't making many mistakes!\n",
"\n",
"It's helpful to see where exactly our errors are occuring, to see whether it's due to a dataset problem (e.g. images that aren't bears at all, or are labelled incorrectly, etc), or a model problem (e.g. perhaps it isn't handling images taken with unusual lighting, or from a different angle, etc.) To do this, we can sort out images by their *loss*. The *loss* is a number that is higher if the model is incorrect (and especially if it's also confident of its incorrect answer), or if it's correct, but not confident of its correct answer. (We'll learn how loss is calculated later in the book.) `plot_top_losses` shows us the images with the highest loss in our dataset. As the title of the output says, each image is labeled with four things: prediction, actual (target label), loss, and probability. The *probability* here is the confidence level, from zero to one, that the model has assigned to its prediction."
"It's helpful to see where exactly our errors are occuring, to see whether it's due to a dataset problem (e.g. images that aren't bears at all, or are labelled incorrectly, etc), or a model problem (e.g. perhaps it isn't handling images taken with unusual lighting, or from a different angle, etc). To do this, we can sort out images by their *loss*.\n",
"\n",
"The *loss* is a number that is higher if the model is incorrect (and especially if it's also confident of its incorrect answer), or if it's correct, but not confident of its correct answer. In a couple chapters we'll learn in depth how loss is calculated and used in training process. For now, `plot_top_losses` shows us the images with the highest loss in our dataset. As the title of the output says, each image is labeled with four things: prediction, actual (target label), loss, and probability. The *probability* here is the confidence level, from zero to one, that the model has assigned to its prediction."
]
},
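{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a short sketch of the calls involved (these are fastai's standard interpretation helpers; `learn` is the `Learner` trained earlier):\n",
"\n",
"```\n",
"interp = ClassificationInterpretation.from_learner(learn)\n",
"interp.plot_confusion_matrix()\n",
"interp.plot_top_losses(5, nrows=1)\n",
"```"
]
},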
{
@ -1147,7 +1147,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"<img alt=\"Cleaner widget\" width=\"700\" caption=\"Cleaner widget\" id=\"cleaner\" src=\"images/att_00007.png\">"
"<img alt=\"Cleaner widget\" width=\"700\" src=\"images/att_00007.png\">"
]
},
{
@ -1210,7 +1210,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Once you've got a model you're happy with, you need to save it, so that you can then copy it over to a server where you'll use it in production. Do you remember exactly what a model is? It consists of two parts: the *architecture*, and the trained *parameters*. The easiest way to save a model is to save both of these, because that way when you load a model you can be sure that you have the matching architecture and parameters. To save both parts, use the `export` method.\n",
"Once you've got a model you're happy with, you need to save it, so that you can then copy it over to a server where you'll use it in production. Remember that a model consists of two parts: the *architecture*, and the trained *parameters*. The easiest way to save a model is to save both of these, because that way when you load a model you can be sure that you have the matching architecture and parameters. To save both parts, use the `export` method.\n",
"\n",
"This method even saves the definition of how to create your `DataLoaders`. This is important, because otherwise you would have to redefine how to transform your data in order to use your model in production. When you call export, fastai will save a file called `export.pkl`."
]
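},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of what this looks like (assuming `learn` is our trained `Learner`):\n",
"\n",
"```\n",
"learn.export()\n",
"path = Path()\n",
"path.ls(file_exts='.pkl')  # the folder should now contain export.pkl\n",
"```"
]
},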
@ -1750,7 +1750,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"In practice, a deep learning model will be just one piece of a much bigger system. As we discussed at the start of this chapter, a *data product* requires thinking about the entire end to end process within which our model lives.\n",
"In practice, a deep learning model will be just one piece of a much bigger system. As we discussed at the start of this chapter, a *data product* requires thinking about the entire end to end process within which our model lives. In this book, we can't hope to cover all the complexity of managing deployed data products, such as managing multiple versions of models, A/B testing, canarying, refreshing the data (should we just grow and grow our datasets all the time, or should we regularly remove some of the old data), handling data labelling, monitoring all this, detecting model rot, and so forth. However, there is an excellent book that covers many deployment issues, which is [Building Machine Learning Powered Applications](https://www.amazon.com/Building-Machine-Learning-Powered-Applications/dp/149204511X), by Emmanuel Ameisen. In this section, we will give an overview of some of the most important issues to consider.\n",
"\n",
"One of the biggest issues with this is that understanding and testing the behavior of a deep learning model is much more difficult than most code that you would write. With normal software development you can analyse the exact steps that the software is taking, and carefully study with of these steps match the desired behaviour that you are trying to create. But with a neural network the behavior emerges from the models attempt to match the training data, rather than being exactly defined.\n",
"\n",
@ -1801,13 +1801,6 @@
"> j: I started a company 20 years ago called *Optimal Decisions* which used machine learning and optimisation to help giant insurance companies set their pricing, impacting tens of billions of dollars of risks. We used the approaches described above to manage the potential downsides of something that might go wrong. Also, before we worked with our clients to put anything in production, we tried to simulate the impact by testing the end to end system on their previous year's data. It was always quite a nerve-wracking process, putting these new algorithms in production, but every rollout was successful."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As you analyze the results while deploying your model progressively, you should check for the following unexpected behaviors."
]
},
{
"cell_type": "markdown",
"metadata": {},
@ -1821,20 +1814,13 @@
"source": [
"One of the biggest challenges in rolling out a model is that your model may change the behaviour of the system it is a part of. For instance, consider YouTube's recommendation system. A couple of years ago Google talked about how they had introduced reinforcement learning (closely related to deep learning, but where your loss function represents a result which could be a long time after an action occurs) to improve their recommendation system. They described how they used an algorithm which made recommendations such that watch time would be optimised.\n",
"\n",
"However, human beings tend to be drawn towards controversial content. This meant that videos about wings like conspiracy theories started to get recommended more and more by the recommendation system. Furthermore, it turns out that the kinds of people that are interested in conspiracy theories are also people that watch a lot of online videos! So, they started to get drawn more and more towards YouTube. The increasing number of conspiracy theorists watching YouTube resulted in the algorithm recommending more and more conspiracy theories and other extremist content, which resulted in more extremists watching videos on YouTube, and more people watching YouTube developing extremist views, which led to the algorithm recommending more extremist content... The system became so out of control that in February 2019 it led the New York Times to run the headline \"YouTube Unleashed a Conspiracy Theory Boom. Can It Be Contained?\"\n",
"However, human beings tend to be drawn towards controversial content. This meant that videos about wings like conspiracy theories started to get recommended more and more by the recommendation system. Furthermore, it turns out that the kinds of people that are interested in conspiracy theories are also people that watch a lot of online videos! So, they started to get drawn more and more towards YouTube. The increasing number of conspiracy theorists watching YouTube resulted in the algorithm recommending more and more conspiracy theories and other extremist content, which resulted in more extremists watching videos on YouTube, and more people watching YouTube developing extremist views, which led to the algorithm recommending more extremist content... The system became so out of control that in February 2019 it led the New York Times to run the headline \"YouTube Unleashed a Conspiracy Theory Boom. Can It Be Contained?\"footnote:[https://www.nytimes.com/2019/02/19/technology/youtube-conspiracy-stars.html]\n",
"\n",
"A helpful exercise prior to rolling out a significant machine learning system is to consider this question: \"what would happen if it went really, really well?\" In other words, what if the predictive power was extremely high, and its ability to influence behaviour was extremely significant? In that case, who would be most impacted? What would the most extreme results potentially look like? How would you know what was really going on?\n",
"\n",
"Such a thought exercise might help you to construct a more careful rollout plan, ongoing monitoring systems, and human oversight. Of course, human oversight isn't useful if it isn't listened to; so make sure that there are reliable and resilient communication channels so that the right people will be aware of issues, and will have the power to fix them."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Congratulations, you have finished your first deep learning project! To help with understanding the material, we really recommend you start writing about what you learned."
]
},
{
"cell_type": "markdown",
"metadata": {},
@ -1940,31 +1926,6 @@
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.5"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": false,
"sideBar": true,
"skip_h1_title": true,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
}
},
"nbformat": 4,

View File

@ -18,7 +18,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Acknowledgement: Dr Rachel Thomas"
"**Acknowledgement: Dr Rachel Thomas**"
]
},
{
@ -28,13 +28,6 @@
"This chapter was co-authored by Dr Rachel Thomas, the co-founder of fast.ai, and founding director of the Center for Applied Data Ethics at the University of San Francisco. It largely follows a subset of her syllabus for the \"Introduction to Data Ethics\" course that she developed."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Introduction to data ethics"
]
},
{
"cell_type": "markdown",
"metadata": {},
@ -73,7 +66,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Getting started with some examples"
"## Key examples for data ethics"
]
},
{
@ -130,7 +123,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Dr. Latanya Sweeney is a professor at Harvard and director of their data privacy lab. In the paper [Discrimination in Online Ad Delivery](https://arxiv.org/abs/1301.6822) she describes her discovery that googling her name resulted in advertisements saying \"Latanya Sweeney arrested\" even although she is the only Latanya Sweeney and has never been arrested. However when she googled other names, such as Kirsten Lindquist, she got more neutral ads, even though Kirsten Lindquist has been arrested three times."
"Dr. Latanya Sweeney is a professor at Harvard and director of their data privacy lab. In the paper [Discrimination in Online Ad Delivery](https://arxiv.org/abs/1301.6822) (see <<lantanya_arrested>>) she describes her discovery that googling her name resulted in advertisements saying \"Latanya Sweeney arrested\" even although she is the only Latanya Sweeney and has never been arrested. However when she googled other names, such as Kirsten Lindquist, she got more neutral ads, even though Kirsten Lindquist has been arrested three times."
]
},
{
@ -153,7 +146,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## So what?"
"TK Jeremy: \"Why does this matter?\" as an alternative title."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### So what?"
]
},
{
@ -222,7 +222,7 @@
"\n",
"These are not just algorithm questions. They are data product design questions. But the product managers, executives, judges, journalists, doctors… whoever ends up developing and using the system of which your model is a part will not be well-placed to understand the decisions that you made, let alone change them.\n",
"\n",
"For instance, two studies found that Amazons facial recognition software produced [inaccurate](https://www.nytimes.com/2018/07/26/technology/amazon-aclu-facial-recognition-congress.html) and [racially biased results](https://www.theverge.com/2019/1/25/18197137/amazon-rekognition-facial-recognition-bias-race-gender). Amazon claimed that the researchers should have changed the default parameters. However, it turned out that [Amazon was not instructing police departments](https://gizmodo.com/defense-of-amazons-face-recognition-tool-undermined-by-1832238149) that use its software to do this either. There was, presumably, a big distance between the researchers that developed these algorithms, and the Amazon documentation staff that wrote the guidelines provided to the police. A lack of tight integration led to serious problems for society, the police, and Amazon themselves. It turned out that their system erroneously *matched* 28 members of congress to criminal mugshots! (And these members of congress wrongly matched to criminal mugshots disproportionately included people of color.)"
"For instance, two studies found that Amazons facial recognition software produced [inaccurate](https://www.nytimes.com/2018/07/26/technology/amazon-aclu-facial-recognition-congress.html) and [racially biased results](https://www.theverge.com/2019/1/25/18197137/amazon-rekognition-facial-recognition-bias-race-gender). Amazon claimed that the researchers should have changed the default parameters, they did not explain how it would change the racially baised results. Further more, it turned out that [Amazon was not instructing police departments](https://gizmodo.com/defense-of-amazons-face-recognition-tool-undermined-by-1832238149) that use its software to do this either. There was, presumably, a big distance between the researchers that developed these algorithms, and the Amazon documentation staff that wrote the guidelines provided to the police. A lack of tight integration led to serious problems for society, the police, and Amazon themselves. It turned out that their system erroneously *matched* 28 members of congress to criminal mugshots! (And these members of congress wrongly matched to criminal mugshots disproportionately included people of color as seen in <<congressmen>>.)"
]
},
{
@ -255,6 +255,7 @@
"metadata": {},
"source": [
"Data ethics is a big field, and we can't cover everything. Instead, we're going to pick a few topics which we think are particularly relevant:\n",
"\n",
"- need for recourse and accountability\n",
"- feedback loops\n",
"- bias\n",
@ -265,14 +266,21 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Errors and recourse"
"TK Jeremy-Rachel: Explain why those topics are important and transition to errors and recourse."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In a complex system it is easy for no one person to feel responsible for outcomes. While this is understandable, it does not lead to good results. In the example above of the Arkansas healthcare system in which a bug led to people with cerebral palsy losing access to needed care, the creator of the algorithm blamed government officials, and government officials could blame those who implemented the software. NYU professor danah boyd described this phenomenon: \"bureaucracy has often been used to evade responsibility, and today's algorithmic systems are extending bureaucracy.\"\n",
"### Errors and recourse"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In a complex system it is easy for no one person to feel responsible for outcomes. While this is understandable, it does not lead to good results. In the earlier example of the Arkansas healthcare system in which a bug led to people with cerebral palsy losing access to needed care, the creator of the algorithm blamed government officials, and government officials could blame those who implemented the software. NYU professor danah boyd described this phenomenon: \"bureaucracy has often been used to evade responsibility, and today's algorithmic systems are extending bureaucracy.\"\n",
"\n",
"An additional reason why recourse is so necessary, is because data often contains errors. Mechanisms for audits and error-correction are crucial. A database of suspected gang members maintained by California law enforcement officials was found to be full of errors, including 42 babies who had been added to the database when they were less than 1 year old (28 of whom were marked as “admitting to being gang members”). In this case, there was no process in place for correcting mistakes or removing people once theyve been added. Another example is the US credit report system; in a large-scale study of credit reports by the FTC in 2012, it was found that 26% of consumers had at least one mistake in their files, and 5% had errors that could be devastating. Yet, the process of getting such errors corrected is incredibly slow and opaque. When public-radio reporter Bobby Allyn discovered that he was erroneously listed as having a firearms conviction, it took him \"more than a dozen phone calls, the handiwork of a county court clerk and six weeks to solve the problem. And that was only after I contacted the companys communications department as a journalist.\" (as covered in the article [How the careless errors of credit reporting agencies are ruining peoples lives](https://www.washingtonpost.com/posteverything/wp/2016/09/08/how-the-careless-errors-of-credit-reporting-agencies-are-ruining-peoples-lives/))\n",
"\n",
@ -283,14 +291,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Feedback loops"
"### Feedback loops"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The New York Times published another article on YouTube's recommendation system, titled [On YouTubes Digital Playground, an Open Gate for Pedophiles](https://www.nytimes.com/2019/06/03/world/americas/youtube-pedophiles.html). The article started with this chilling story:"
"We have already explained in <<chapter_intro>> how an algorithm can interact with its enviromnent to create a feedback loop, making prediction that reinforces actions taken in the field, which lead to predictions even more pronounced in the same direciton. The New York Times published another article on YouTube's recommendation system, titled [On YouTubes Digital Playground, an Open Gate for Pedophiles](https://www.nytimes.com/2019/06/03/world/americas/youtube-pedophiles.html). The article started with this chilling story:"
]
},
{
@ -312,7 +320,7 @@
"\n",
"Part of the problem here is the centrality of metrics in driving a financially important system. When an algorithm has a metric to optimise, as you have seen, it will do everything it can to optimise that number. This tends to lead to all kinds of edge cases, and humans interacting with a system will search for, find, and exploit these edge cases and feedback loops for their advantage.\n",
"\n",
"There are signs that this is exactly what has happened with YouTube's recommendation system. The Guardian ran an article [How an ex-YouTube insider investigated its secret algorithm](https://www.theguardian.com/technology/2018/feb/02/youtube-algorithm-election-clinton-trump-guillaume-chaslot) about Guillaume Chaslot, an ex-YouTube engineer who created AlgoTransparency, which tracks these issues. Chaslot published this chart, following the release of Robert Mueller's \"Report on the Investigation Into Russian Interference in the 2016 Presidential Election\":"
"There are signs that this is exactly what has happened with YouTube's recommendation system. The Guardian ran an article [How an ex-YouTube insider investigated its secret algorithm](https://www.theguardian.com/technology/2018/feb/02/youtube-algorithm-election-clinton-trump-guillaume-chaslot) about Guillaume Chaslot, an ex-YouTube engineer who created AlgoTransparency, which tracks these issues. Chaslot published the chart in <<ethics_yt_rt>>, following the release of Robert Mueller's \"Report on the Investigation Into Russian Interference in the 2016 Presidential Election\"."
]
},
{
@ -342,6 +350,13 @@
"> : \"once people join a single conspiracy-minded \\[Facebook\\] group, they are algorithmically routed to a plethora of others. Join an anti-vaccine group, and your suggestions will include anti-GMO, chemtrail watch, flat Earther (yes, really), and curing cancer naturally groups. Rather than pulling a user out of the rabbit hole, the recommendation engine pushes them further in.\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It is extremely important to keep in mind this kind of behavior can happen, and to either anticipate a feedback loop or take positive action to break it when you can the first signs of it in your own projects. Another thing to keep in mind is bias."
]
},
{
"cell_type": "markdown",
"metadata": {},
@ -355,7 +370,7 @@
"source": [
"Discussions of bias online tend to get pretty confusing pretty fast. The word bias mean so many different things. Statisticians often think that when data ethicists are talking about bias that they're talking about the statistical definition of the term bias. But they're not. And they're certainly not talking about the bias is that appear in the weights and bias is which are the parameters of your model!\n",
"\n",
"What they're talking about is the social science concept of bias. In [A Framework for Understanding Unintended Consequences of Machine Learning](https://arxiv.org/abs/1901.10002) MIT's Suresh and Guttag describe six types of bias in machine learning, summarized in this figure from their paper:"
"What they're talking about is the social science concept of bias. In [A Framework for Understanding Unintended Consequences of Machine Learning](https://arxiv.org/abs/1901.10002) MIT's Suresh and Guttag describe six types of bias in machine learning, summarized in <<bias>> from their paper."
]
},
{
@ -365,6 +380,13 @@
"<img src=\"images/ethics/image5.png\" id=\"bias\" caption=\"Bias in machine learning can come from multiple sources\" alt=\"A diagram showing all sources where bias can appear in machine learning\" width=\"650\">"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"TK Jeremy: \"Why only four? Tell the reader.\" If you have anything interesting to say about that here, otherwise we can ignore."
]
},
{
"cell_type": "markdown",
"metadata": {},
@ -454,7 +476,7 @@
"\n",
"One of the MIT researchers, Joy Buolamwini, warned, \"We have entered the age of automation overconfident yet underprepared. If we fail to make ethical and inclusive artificial intelligence, we risk losing gains made in civil rights and gender equity under the guise of machine neutrality\".\n",
"\n",
"Part of the issue appears to be a systematic imbalance in the make up of popular datasets used for training models. The abstract to the paper [No Classification without Representation: Assessing Geodiversity Issues in Open Data Sets for the Developing World](https://arxiv.org/abs/1711.08536) states, \"We analyze two large, publicly available image data sets to assess geo-diversity and find that these data sets appear to exhibit an observable amerocentric and eurocentric representation bias. Further, we analyze classifiers trained on these data sets to assess the impact of these training distributions and find strong differences in the relative performance on images from different locales\". Here is one of the charts from the paper, showing the geographic make up of what was, at the time (and still, as this book is being written), the two most important image datasets for training models:"
"Part of the issue appears to be a systematic imbalance in the make up of popular datasets used for training models. The abstract to the paper [No Classification without Representation: Assessing Geodiversity Issues in Open Data Sets for the Developing World](https://arxiv.org/abs/1711.08536) states, \"We analyze two large, publicly available image data sets to assess geo-diversity and find that these data sets appear to exhibit an observable amerocentric and eurocentric representation bias. Further, we analyze classifiers trained on these data sets to assess the impact of these training distributions and find strong differences in the relative performance on images from different locales\". <<image_provenance>> shows one of the charts from the paper, showing the geographic make up of what was, at the time (and still, as this book is being written), the two most important image datasets for training models."
]
},
{
@ -468,7 +490,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The vast majority of the images are from the United States and other Western countries, leading to models trained on ImageNet performing worse on scenes from other countries and cultures. For instance, [research](https://arxiv.org/pdf/1906.02659.pdf) found that such models are worse at identifying household items (such as soap, spices, sofas, or beds) from lower-income countries. Below is an image from the paper, [Does Object Recognition Work for Everyone?](https://arxiv.org/pdf/1906.02659.pdf)."
"The vast majority of the images are from the United States and other Western countries, leading to models trained on ImageNet performing worse on scenes from other countries and cultures. For instance, [research](https://arxiv.org/pdf/1906.02659.pdf) found that such models are worse at identifying household items (such as soap, spices, sofas, or beds) from lower-income countries. <<object_detect>> shows an image from the paper, [Does Object Recognition Work for Everyone?](https://arxiv.org/pdf/1906.02659.pdf)."
]
},
{
@ -482,7 +504,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"As we will discuss shortly, in addition, the vast majority of AI researchers and developers are young white men. Most projects that we have seen do most user testing using friends and families of the immediate product development group. Given this, the kinds of problems we saw above should not be surprising.\n",
"TK Jeremy: \"Tell the reader what the figure shows, what's the takeaway?\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As we will discuss shortly, in addition, the vast majority of AI researchers and developers are young white men. Most projects that we have seen do most user testing using friends and families of the immediate product development group. Given this, the kinds of problems we just discussed should not be surprising.\n",
"\n",
"Similar historical bias is found in the texts used as data for natural language processing models. This crops up in downstream machine learning tasks in many ways. For instance, until last year Google Translate showed systematic bias in how it translated the Turkish gender-neutral pronoun \"bir\" into English. For instance, when applied to jobs which are often associated with males, it used \"he\", and when applied to jobs which are often associated with females, it used \"she\":"
]
@ -494,6 +523,13 @@
"<img src=\"images/ethics/image11.png\" width=\"600\">"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"TK Jeremy: Link to the study needed"
]
},
{
"cell_type": "markdown",
"metadata": {},
@ -553,7 +589,7 @@
"source": [
"The abstract of the paper [Bias in Bios: A Case Study of Semantic Representation Bias in a High-Stakes Setting](https://arxiv.org/abs/1901.09451) notes that there is gender imbalance in occupations (e.g. females are more likely to be nurses, and males are more likely to be pastors), and says that: \"differences in true positive rates between genders are correlated with existing gender imbalances in occupations, which may compound these imbalances\".\n",
"\n",
"What this is saying is that the researchers noticed that models predicting occupation did not only reflect the actual gender imbalance in the underlying population, but actually amplified it! This is quite common, particularly for simple models. When there is some clear, easy to see underlying relationship, a simple model will often simply assume that that relationship holds all the time. As the show with the paper, for occupations which had a higher percentage of females, the model tended to overestimate the prevalence of that occupation:"
"What this is saying is that the researchers noticed that models predicting occupation did not only reflect the actual gender imbalance in the underlying population, but actually amplified it! This is quite common, particularly for simple models. When there is some clear, easy to see underlying relationship, a simple model will often simply assume that that relationship holds all the time. As <<representation_bias>> from the paper shows, for occupations which had a higher percentage of females, the model tended to overestimate the prevalence of that occupation."
]
},
{
@ -567,14 +603,16 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"For example, in the training dataset, 14.6% of surgeons were women, yet in the model predictions, only 11.6% of the true positives were women."
"For example, in the training dataset, 14.6% of surgeons were women, yet in the model predictions, only 11.6% of the true positives were women. The model is thus amplifying the bias existing in the training set.\n",
"\n",
"Now that we saw those bias existed, what can we do to mitigate them?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Addressing different types of bias"
"## Addressing different types of bias"
]
},
{
@ -597,10 +635,10 @@
"source": [
"We often hear this question — \"humans are biased, so does algorithmic bias even matter?\" This comes up so often, there must be some reasoning that makes sense to the people that ask it, but it doesn't seem very logically sound to us! Independently of whether this is logically sound, it's important to realise that algorithms and people are different. Machine learning, particularly so. Consider these points about machine learning algorithms:\n",
"\n",
" - *Machine learning can create feedback loops*: small amounts of bias can very rapidly, exponentially increase due to feedback loops\n",
" - *Machine learning can amplify bias*: human bias can lead to larger amounts of machine learning bias\n",
" - *Algorithms & humans are used differently*: human decision makers and algorithmic decision makers are not used in a plug-and-play interchangeable way in practice. For instance, algorithmic decisions are more likely to be implemented at scale and without a process for recourse. Furthermore, people are more likely to mistakenly believe that the result of an algorithm is objective and error-free.\n",
" - *Technology is power*. And with that comes responsibility.\n",
" - _Machine learning can create feedback loops_:: small amounts of bias can very rapidly, exponentially increase due to feedback loops\n",
" - _Machine learning can amplify bias_:: human bias can lead to larger amounts of machine learning bias\n",
" - _Algorithms & humans are used differently_:: human decision makers and algorithmic decision makers are not used in a plug-and-play interchangeable way in practice. For instance, algorithmic decisions are more likely to be implemented at scale and without a process for recourse. Furthermore, people are more likely to mistakenly believe that the result of an algorithm is objective and error-free.\n",
" - _Technology is power_:: And with that comes responsibility.\n",
"\n",
"As the Arkansas healthcare example showed, machine learning is often implemented in practice not because it leads to better outcomes, but because it is cheaper and more efficient. Cathy O'Neill, in her book *Weapons of Math Destruction*, described the pattern of how the privileged are processed by people, the poor are processed by algorithms. This is just one of a number of ways that algorithms are used differently than human decision makers. Others include:\n",
"\n",
@ -614,14 +652,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Data contains errors"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because data is likely to contain errors, mechanisms for audits and error-correction are important. A database of suspected gang members maintained by California law enforcement officials was found to be full of errors, including 42 babies who had been added to the database when they were less than 1 year old (28 of whom were marked as *admitting to being gang members*). In this case, there was no process in place for correcting mistakes or removing people once theyve been added. Another example is the US credit report system; in a large-scale study of credit reports by the FTC in 2012, it was found that 26% of consumers had at least one mistake in their files, and 5% had errors that could be devastating. Yet, the process of getting such errors corrected is incredibly slow and opaque. When public-radio reporter Bobby Allyn discovered that he was erroneously listed as having a firearms conviction, it took him \"more than a dozen phone calls, the handiwork of a county court clerk and six weeks to solve the problem. And that was only after I contacted the companys communications department as a journalist.\" (as covered in the article [How the careless errors of credit reporting agencies are ruining peoples lives](https://www.washingtonpost.com/posteverything/wp/2016/09/08/how-the-careless-errors-of-credit-reporting-agencies-are-ruining-peoples-lives/))"
"TK Jeremy: Takeaway for readers and transition to disinformation."
]
},
{
@ -662,6 +693,13 @@
"One proposed approach is to develop some form of digital signature, implement it in a seamless way, and to create norms that we should only trust content which has been verified. Head of the Allen Institute on AI, Oren Etzioni, wrote such a proposal in an article titled [How Will We Prevent AI-Based Forgery?](https://hbr.org/2019/03/how-will-we-prevent-ai-based-forgery), \"AI is poised to make high-fidelity forgery inexpensive and automated, leading to potentially disastrous consequences for democracy, security, and society. The specter of AI forgery means that we need to act to make digital signatures de rigueur as a means of authentication of digital content.\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"TK Jeremy: Wrap up section and transition to next. Also change next title to What to do about bla or What to do with foo."
]
},
{
"cell_type": "markdown",
"metadata": {},
@ -683,6 +721,13 @@
"- increase diversity"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's walk through each step next, staring with analyzing a project you are working on."
]
},
{
"cell_type": "markdown",
"metadata": {},
@ -705,6 +750,13 @@
" - How diverse is the team that built it?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"TK Jeremy: Expand--add some additional details and takeaways from the reader. What will they get out of doing this and how should they go about it? Then transition to \"Process to Implement\""
]
},
{
"cell_type": "markdown",
"metadata": {},
@ -726,6 +778,13 @@
" - Who might use this product that we didn't expect to use it, or for purposes we didn't initially intend?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"TK Jeremy: Add takeaways and transition to Ethical Lenses"
]
},
{
"cell_type": "markdown",
"metadata": {},
@ -739,11 +798,11 @@
"source": [
"Another useful resource from the Markkula Center is [Conceptual Frameworks in Technology and Engineering Practice](https://www.scu.edu/ethics-in-technology-practice/conceptual-frameworks/). This considers how different foundational ethical lenses can help identify concrete issues, and lays out the following approaches and key questions:\n",
"\n",
" - The Rights Approach: Which option best respects the rights of all who have a stake?\n",
" - The Justice Approach: Which option treats people equally or proportionately?\n",
" - The Utilitarian Approach: Which option will produce the most good and do the least harm?\n",
" - The Common Good Approach: Which option best serves the community as a whole, not just some members?\n",
" - The Virtue Approach: Which option leads me to act as the sort of person I want to be?"
" - The Rights Approach:: Which option best respects the rights of all who have a stake?\n",
" - The Justice Approach:: Which option treats people equally or proportionately?\n",
" - The Utilitarian Approach:: Which option will produce the most good and do the least harm?\n",
" - The Common Good Approach:: Which option best serves the community as a whole, not just some members?\n",
" - The Virtue Approach:: Which option leads me to act as the sort of person I want to be?"
]
},
{
@ -807,6 +866,13 @@
"In philosophy, and especially philosophy of ethics, this is one of the most effective tools: first, come up with a process, definition, set of questions, etc, which is designed to resolve some problem. Then try to come up with an example where that apparent solution results in a proposal that no-one would consider acceptable. This can then lead to a further refinement of the solution."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"TK Jeremy: Add takeaways for the reader and transition to Role of Policy."
]
},
{
"cell_type": "markdown",
"metadata": {},
@ -821,16 +887,25 @@
"The ethical issues that arise in the use of automated decision systems, such as machine learning, can be complex and far-reaching. To better address them, we will need thoughtful policy, in addition to the ethical efforts of those in industry. Neither is sufficient on its own.\n",
"\n",
"Policy is the appropriate tool for addressing:\n",
"\n",
"- Negative externalities\n",
"- Misaligned economic incentives\n",
"- “Race to the bottom” situations\n",
"- Enforcing accountability.\n",
"\n",
"Ethical behavior in industry is necessary as well, since:\n",
"\n",
"- Law will not always keep up\n",
"- Edge cases will arise in which practitioners must use their best judgement."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"TK Jeremy: Expand this section. What does this mean for the reader? Add transition to The Power of Diversity"
]
},
{
"cell_type": "markdown",
"metadata": {},

View File

@ -27,6 +27,15 @@
"# Under the hood: training a digit classifier"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now that weve seen what it looks like to actually train a variety of models, lets now dig under the hood and see exactly what is going on. Well start with computer vision, and will use that to introduce many of the key concepts of deep learning. In future chapters well do deep dives into other applications as well, and well see how to use these insights to both improve our models accuracy, speed up its training, and turn it into a real working web application.\n",
"\n",
"First, let's start by how images are represented in a computer, then we will make our way up to how to classify different type of images."
]
},
{
"cell_type": "markdown",
"metadata": {},
@ -38,8 +47,6 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Now that weve seen what it looks like to actually train a variety of models, lets now dig under the hood and see exactly what is going on. Well start with computer vision, and will use that to introduce many of the key concepts of deep learning. In future chapters well do deep dives into other applications as well, and well see how to use these insights to both improve our models accuracy, speed up its training, and turn it into a real working web application.\n",
"\n",
"In order to understand what happens in a computer vision model, we first have to understand how computers handle images. We'll use one of the most famous datasets in computer vision, [MNIST](https://en.wikipedia.org/wiki/MNIST_database), for our experiments. MNIST contains hand-written digits, collected by the National Institute of Standards and Technology, and collated into a machine learning dataset by Yann Lecun and his colleagues. Lecun used MNIST in 1998 to demonstrate [Lenet 5](http://yann.lecun.com/exdb/lenet/), the first computer system to demonstrate practically useful recognition of hand-written digit sequences. This was one of the most important breakthroughs in the history of AI."
]
},
@ -1750,6 +1757,13 @@
"> j: When I first came across this \"L1\" thingie, I looked it up to see what on Earth it meant, found on Google that it is a _vector norm_ using _absolute value_, so looked up _vector norm_ and started reading: _Given a vector space V over a field F of the real or complex numbers, a norm on V is a nonnegative-valued any function p: V → \\[0,+∞) with the following properties: For all a ∈ F and all u, v ∈ V, p(u + v) ≤ p(u) + p(v)..._ Then I stopped reading. \"Ugh, I'll never understand math!\" I thought, for the thousandth time. Since then I've learned that every time these complex mathy bits of jargon come up in practice, it turns out I can replace them with a tiny bit of code! Like the _L1 loss_ is just equal to `(a-b).abs().mean()`, where `a` and `b` are tensors. I guess mathy folks just think differently to me... I'll make sure, in this book, every time some mathy jargon comes up, I'll give you the little bit of code it's equal to as well, and explain in common sense terms what's going on."
]
},
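{
"cell_type": "markdown",
"metadata": {},
"source": [
"For instance, here is a tiny sketch with made-up tensors, showing that the jargon really does boil down to one line of code:\n",
"\n",
"```\n",
"a = tensor([1., 2., 3.])\n",
"b = tensor([1., 0., 5.])\n",
"l1 = (a - b).abs().mean()        # L1 loss (mean absolute difference): 1.3333\n",
"l2 = ((a - b)**2).mean().sqrt()  # the L2 (RMSE) counterpart: 1.6330\n",
"```"
]
},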
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the above code we completed various mathematical operations on *PyTorch tensors*. If you've done some numeric programming in Pytorch before, you may recognize these as being similar to *Numpy arrays*. Let's have a look at those two very important classes."
]
},
{
"cell_type": "markdown",
"metadata": {},
@ -1761,7 +1775,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"In the above code we completed various mathematical operations on *PyTorch tensors*. If you've done some numeric programming in Pytorch before, you may recognize these as being similar to *Numpy arrays*. [Numpy](https://numpy.org/) is the most widely used library for scientific and numeric programming in Python, and provides very similar functionality and a very similar API to that provided by PyTorch; however, it does not support using the GPU, or calculating gradients, which are both critical for deep learning. Therefore, in this book we will generally use PyTorch tensors instead of NumPy arrays, where possible. (Note that fastai adds some features to NumPy and PyTorch to make them a bit more similar to each other; if any code in this book doesn't work on your computer, it's possible that you forgot to include a line at the start of your notebook such as: `from fastai.vision.all import *`.)\n",
"[Numpy](https://numpy.org/) is the most widely used library for scientific and numeric programming in Python, and provides very similar functionality and a very similar API to that provided by PyTorch; however, it does not support using the GPU, or calculating gradients, which are both critical for deep learning. Therefore, in this book we will generally use PyTorch tensors instead of NumPy arrays, where possible. (Note that fastai adds some features to NumPy and PyTorch to make them a bit more similar to each other; if any code in this book doesn't work on your computer, it's possible that you forgot to include a line at the start of your notebook such as: `from fastai.vision.all import *`.)\n",
"\n",
"So, what's an array? And what's a tensor?\n",
"\n",
@ -2012,14 +2026,21 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Broadcasting and metrics"
"So, is our baseline model any good? To quantify this, we will use a metric."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"So, is our baseline model any good? To quantify this, we will use a metric. A metric is a number which is calculated from the predictions of our model, and the correct labels in our dataset, and tells us something about how good our model is. For instance, we could use either of the functions we saw in the previous section, mean squared error or mean absolute error, and take the average of them over the whole dataset. However, neither of these are numbers that are very understandable to most people; in practice, we normally use *accuracy* as the metric for classification models.\n",
"## Metrics and broadcasting"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A metric is a number which is calculated from the predictions of our model, and the correct labels in our dataset, and tells us something about how good our model is. For instance, we could use either of the functions we saw in the previous section, mean squared error or mean absolute error, and take the average of them over the whole dataset. However, neither of these are numbers that are very understandable to most people; in practice, we normally use *accuracy* as the metric for classification models.\n",
"\n",
"As we've discussed, we need to use a *validation set* to calculate our metric. That means we need to do is remove some of the data from training entirely, so it is not seen by the model at all. As it turns out, the creators of the MNIST dataset have already done this for us. Do you remember how there was a whole separate directory called \"valid\"? That's what this directory is for!\n",
"\n",
@ -2300,7 +2321,7 @@
"\n",
"> : _Suppose we arrange for some automatic means of testing the effectiveness of any current weight assignment in terms of actual performance and provide a mechanism for altering the weight assignment so as to maximize the performance. We need not go into the details of such a procedure to see that it could be made entirely automatic and to see that a machine so programed would \"learn\" from its experience._\n",
"\n",
"As we discussed, this is the key to allowing us to have something which can get better and better — to learn. But our pixel similarity approach does not really do this. We do not have any kind of weight assignment, or any way of improving based on testing the effectiveness of a weight assignment. In other words, we can't really improve our pixel similarity approach by modifying a set of parameters. In order to take advantage of the power of deep learning, we will first have to represent our task in the way that Arthur Samuel described it.\n",
"As we discussed, this is the key to allowing us to have something which can get better and better — to learn. But our pixel similarity approach does not really do this. We do not have any kind of weight assignment, or any way of improving based on testing the effectiveness of a weight assignment. In other words, we can't really improve our pixel similarity approach by modifying a set of parameters (which will be the SGD part, as we will see). In order to take advantage of the power of deep learning, we will first have to represent our task in the way that Arthur Samuel described it.\n",
"\n",
"Instead of trying to find the similarity between an image and a \"ideal image\" we could instead look at each individual pixel, and come up with a set of weights for each pixel, such that the highest weights are associated with those pixels most likely to be black for a particular category. For instance, pixels towards the bottom right are not very likely to be activated for a seven, so they should have a low weight for a seven, but are more likely to be activated for an eight, so they should have a high weight for an eight. This can be represented as a function for each possible category, for instance the probability of being the number eight:\n",
"\n",
@ -2448,14 +2469,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"These seven steps are the key to the training of all deep learning models and we'll be using the seven terms in the above diagram throughout this book. That deep learning turns out to rely entirely on these steps is extremely surprising and counter-intuitive. It's amazing that this process can solve such complex problems. But, as you'll see, it really does!\n",
"These seven steps, illustrated in <<gradient_descent>> are the key to the training of all deep learning models and we'll be using the seven terms in the above diagram throughout this book. That deep learning turns out to rely entirely on these steps is extremely surprising and counter-intuitive. It's amazing that this process can solve such complex problems. But, as you'll see, it really does!\n",
"\n",
"There are many different ways to do each of these seven steps, and we will be learning about them throughout the rest of this book. These are the details which make a big difference for deep learning practitioners. But it turns out that the general approach to each one generally follows some basic principles:\n",
"\n",
"- **Initialize**: we initialise the parameters to random values. This may sound surprising. There are certainly other choices we could make, such as initialising them to the percentage of times that that pixel is activated for that category. But since we already know that we have a routine to improve these weights, it turns out that just starting with random weights works perfectly well\n",
"- **Loss**: This is the thing Arthur Samuel refered to: \"*testing the effectiveness of any current weight assignment in terms of actual performance*\". We need some function that will return a number that is small if the performance of the model is good, and vice versa (the standard approach is to treat a small loss as good, and a large loss as bad, although this is just a convention)\n",
"- **Step**: A simple way to figure out whether a weight should be increased a bit, or decreased a bit, would be just to try it. Increase the weight by a small amount, and see if the loss goes up or down. Once you find the correct direction, you could then change that amount by a bit more, and a bit less, until you find an amount which works well. However, this is slow! As we will see, the magic of calculus allows us to directly figure out which direction, and roughly how much, to change each weight, without having to try all these small changes, by calculating *gradients*. This is just a performance optimisation, we would get exactly the same results by using the slower manual process as well\n",
"- **Stop**: We have already discussed how to choose how many epochs to train a model for. This is where that decision is applied. For our digit classifier, we would keep training until the accuracy of the model started getting worse, or we ran out of time."
"- **Initialize**:: we initialise the parameters to random values. This may sound surprising. There are certainly other choices we could make, such as initialising them to the percentage of times that that pixel is activated for that category. But since we already know that we have a routine to improve these weights, it turns out that just starting with random weights works perfectly well\n",
"- **Loss**:: This is the thing Arthur Samuel refered to: \"*testing the effectiveness of any current weight assignment in terms of actual performance*\". We need some function that will return a number that is small if the performance of the model is good, and vice versa (the standard approach is to treat a small loss as good, and a large loss as bad, although this is just a convention)\n",
"- **Step**:: A simple way to figure out whether a weight should be increased a bit, or decreased a bit, would be just to try it. Increase the weight by a small amount, and see if the loss goes up or down. Once you find the correct direction, you could then change that amount by a bit more, and a bit less, until you find an amount which works well. However, this is slow! As we will see, the magic of calculus allows us to directly figure out which direction, and roughly how much, to change each weight, without having to try all these small changes, by calculating *gradients*. This is just a performance optimisation, we would get exactly the same results by using the slower manual process as well\n",
"- **Stop**:: We have already discussed how to choose how many epochs to train a model for. This is where that decision is applied. For our digit classifier, we would keep training until the accuracy of the model started getting worse, or we ran out of time."
]
},
{
@ -2544,7 +2565,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"<img alt=\"A graph showing the squared function with the slope at one point\" width=\"400\" caption=\"The slope of a function\" src=\"images/grad_illustration.svg\" id=\"slope\"/>"
"<img alt=\"A graph showing the squared function with the slope at one point\" width=\"400\" src=\"images/grad_illustration.svg\"/>"
]
},
{
@ -2558,7 +2579,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"<img alt=\"An illustration of gradient descent\" width=\"400\" caption=\"Gradient descent\" src=\"images/chapter2_perfect.svg\" id=\"descent\"/>"
"<img alt=\"An illustration of gradient descent\" width=\"400\" src=\"images/chapter2_perfect.svg\"/>"
]
},
{
@ -2776,15 +2797,20 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Stepping with a learning rate"
"The gradient only tells us the slope of our function, it doesn't actually tell us how far to adjust the parameters. It gives us some idea of how far to adjust them; if the slope is very large, then that may suggest that we have more adjustments to do, whereas if the slope is very small, that may suggest that we are close to the optimal value."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Stepping with a learning rate"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The gradient only tells us the slope of our function, it doesn't actually tell us how far to adjust the parameters. It gives us some idea of how far to adjust them; if the slope is very large, then that may suggest that we have more adjustments to do, whereas if the slope is very small, that may suggest that we are close to the optimal value.\n",
"\n",
"Deciding how to change our parameters based on the value of the gradients is an important part of the deep learning process. Nearly all approaches start with the basic idea of multiplying the gradient by some small number, called the *learning rate* (LR). The learning rate is often a number between 0.001 and 0.1, although it could be anything. Often, people select a learning rate just by trying a few, and finding which results in the best model after training (we'll show you a better approach later in this book, called the *learning rate finder*). Once you've picked a learning rate, you can adjust your parameters using this simple function:\n",
"\n",
"```\n",
@ -2793,7 +2819,7 @@
"\n",
"This is known as *stepping* your parameters, using a *optimiser step*.\n",
"\n",
"If you pick a learning rate that's too low, it can mean having to do for a lot of steps:"
"If you pick a learning rate that's too low, it can mean having to do for a lot of steps. <<descent_small>> illustrates that."
]
},
{
@ -2807,7 +2833,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Although picking a learning rate that's too high is even worse--it can actually result in the loss getting *worse*!"
"Although picking a learning rate that's too high is even worse--it can actually result in the loss getting *worse* as we see in <<descent_div>>!"
]
},
{
@ -2821,7 +2847,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"If the learning rate is too high, it may also \"bounce\" around, rather than actually diverging; this has the result of taking many steps to train successfully:"
"If the learning rate is too high, it may also \"bounce\" around, rather than actually diverging; <<descent_bouncy>> shows how this has the result of taking many steps to train successfully."
]
},
{
@ -2831,6 +2857,13 @@
"<img alt=\"An illustation of gradient descent with a bouncy LR\" width=\"400\" caption=\"Gradient descent with bouncy LR\" src=\"images/chapter2_bouncy.svg\" id=\"descent_bouncy\"/>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's apply all of this on an end-to-end example."
]
},
{
"cell_type": "markdown",
"metadata": {},
@ -3371,7 +3404,11 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"`torch.where(a,b,c)` is the same as running the list comprehension `[b[i] if a[i] else c[i] for i in range(len(a))]`, except it works on tensors, at C/CUDA speed. (It's important to learn about PyTorch functions like this, because looping over tensors in Python performs at Python speed, not C/CUDA speed!) Try running `help(torch.where)` now to read the docs for this function, or, better still, look it up on the PyTorch documentation site."
"`torch.where(a,b,c)` is the same as running the list comprehension `[b[i] if a[i] else c[i] for i in range(len(a))]`, except it works on tensors, at C/CUDA speed. \n",
"\n",
"> note: It's important to learn about PyTorch functions like this, because looping over tensors in Python performs at Python speed, not C/CUDA speed!\n",
"\n",
"Try running `help(torch.where)` now to read the docs for this function, or, better still, look it up on the PyTorch documentation site."
]
},
{
@ -3448,6 +3485,13 @@
"mnist_loss(tensor([0.9, 0.4, 0.8]),tgt)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"One problem with mnist_loss as currently defined is that it assumes that inputs are always between zero and one. We need to ensure, then, that this is actually the case! As it happens, there is a function that does exactly that--it always outputs a number between zero and one and it's called sigmoid."
]
},
{
"cell_type": "markdown",
"metadata": {},
@ -3459,7 +3503,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"One problem with `mnist_loss` as currently defined is that it assumes that inputs are always between zero and one. We need to ensure, then, that this is actually the case! As it happens, there is a function that does exactly that--it always outputs a number between one and one. This function is called *sigmoid* and is defined by:"
"The function called *sigmoid* is defined by:"
]
},
{
@ -3531,7 +3575,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Stochastic gradient descent and mini-batches"
"### SGD and mini-batches"
]
},
{
@ -3631,6 +3675,13 @@
"list(dl)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We are now read to write our first training loop for a model using SGD!"
]
},
{
"cell_type": "markdown",
"metadata": {},
@ -3642,7 +3693,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"In code, our process will be implemented something like this for each epoch:\n",
"it's time to implement the graph we saw in <<gradient_descent>>. In code, our process will be implemented something like this for each epoch:\n",
"\n",
"```python\n",
"for x,y in dl:\n",
@ -3885,7 +3936,7 @@
"source": [
"Whilst we could use a python for loop to calculate the prediction for each image, that would be very slow. Because Python loops don't run on the GPU, and because Python is a slow language for loops in general, we need to represent as much of the computation in a model as possible using higher-level functions.\n",
"\n",
"In this case, there's an extremely convenient mathematical operation that calculates `w*x` for every row of a matrix--it's called *matrix multiplication*. Here's what matrix multiplication looks like (diagram from Wikipedia):"
"In this case, there's an extremely convenient mathematical operation that calculates `w*x` for every row of a matrix--it's called *matrix multiplication*. <<matmul>> show what matrix multiplication looks like (diagram from Wikipedia)."
]
},
{
@ -4090,7 +4141,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Our only remaining step will be to update the weights and bias based on the gradient and learning rate. When we do so, we have to tell PyTorch not to take the gradient of this step too, otherwise things will get very confusing! If we assign to the `data` attribute of a tensor then PyTorch will not take the gradient of that step. Here's our basic training loop for an epoch:"
"Our only remaining step will be to update the weights and bias based on the gradient and learning rate. When we do so, we have to tell PyTorch not to take the gradient of this step too, otherwise things will get very confusing when we try to compute the derivative at the next batch! If we assign to the `data` attribute of a tensor then PyTorch will not take the gradient of that step. Here's our basic training loop for an epoch:"
]
},
{
@ -4274,7 +4325,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Looking good! We're already about at the same accuracy as our \"pixel similarity\" approach, and we've created a general purpose foundation we can build on."
"Looking good! We're already about at the same accuracy as our \"pixel similarity\" approach, and we've created a general purpose foundation we can build on. Our next step will be to create an object that will handle the SGD step for us. In PyTorch, it's called an *optimizer*."
]
},
{
@ -4288,7 +4339,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Because this is such a useful general foundation, PyTorch provides some useful classes to make it easier to implement. The first we'll use is to replace our `linear()` function with PyTorch's `nn.Linear` *module*. A \"module\" is an object of a class that inherits from the PyTorch `nn.Module` class. Objects of this class behave identically to a standard Python function, in that you can call it using parentheses, and it will return the activations of a model.\n",
"Because this is such a general foundation, PyTorch provides some useful classes to make it easier to implement. The first we'll use is to replace our `linear()` function with PyTorch's `nn.Linear` *module*. A \"module\" is an object of a class that inherits from the PyTorch `nn.Module` class. Objects of this class behave identically to a standard Python function, in that you can call it using parentheses, and it will return the activations of a model.\n",
"\n",
"`nn.Linear` does the same thing as our `init_params` and `linear` together. It contains both the *weights* and *bias* in a single class. Here's how we replicate our model from the previous section:"
]
@ -4649,7 +4700,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"So far we have a general procedure for optimising the parameters of a function, and we have tried it out on a very boring function: a simple linear classifier. A linear classifier is very constrained in terms of what it can do. Let's instead use a neural network. Here is the entire definition of a basic neural network:"
"So far we have a general procedure for optimising the parameters of a function, and we have tried it out on a very boring function: a simple linear classifier. A linear classifier is very constrained in terms of what it can do. To make it a bit more complex (and able to handle more tasks), we need to add a non-linearity between two linear classifiers, and this is what will gived us a neural network.\n",
"\n",
"Here is the entire definition of a basic neural network:"
]
},
{
@ -4692,7 +4745,7 @@
"source": [
"The key point about this is that `w1` has 30 output activations (which means that `w2` must have 30 input activations, so they match). That means that the first layer can construct 30 different features, each representing some different mix of pixels. You can change that `30` to anything you like, to make the model more or less complex.\n",
"\n",
"That little function `res.max(tensor(0.0))` is called a *rectified linear unit*, also known as *ReLU*. I think we can all agree that *rectified linear unit* sounds pretty fancy and complicated... But actually, there's nothing more to it than `res.max(tensor(0.0))`, in other words: replace every negative number with a zero. This tiny function is also available in PyTorch as `F.relu`:"
"That little function `res.max(tensor(0.0))` is called a *rectified linear unit*, also known as *ReLU*. We think we can all agree that *rectified linear unit* sounds pretty fancy and complicated... But actually, there's nothing more to it than `res.max(tensor(0.0))`, in other words: replace every negative number with a zero. This tiny function is also available in PyTorch as `F.relu`:"
]
},
{
@ -4730,8 +4783,20 @@
"source": [
"The basic idea is that by using more linear layers, we can have our model do more computation, and therefore model more complex functions. But there's no point just putting one linear layout directly after another one, because when we multiply things together and then at them up multiple times, that can be replaced by multiplying different things together and adding them up just once! That is to say, a series of any number of linear layers in a row can be replaced with a single linear layer with a different set of parameters.\n",
"\n",
"But if we put a non-linear function between them, such as max, then this is no longer true. Now, each linear layer is actually somewhat decoupled from the other ones, and can do its own useful work. The max function is particularly interesting, because it operates as a simple \"if\" statement. For any arbitrarily wiggly function, we can approximate it as a bunch of lines joined together; to make it more close to the wiggly function, we just have to use shorter lines.\n",
"\n",
"But if we put a non-linear function between them, such as max, then this is no longer true. Now, each linear layer is actually somewhat decoupled from the other ones, and can do its own useful work. The max function is particularly interesting, because it operates as a simple \"if\" statement. For any arbitrarily wiggly function, we can approximate it as a bunch of lines joined together; to make it more close to the wiggly function, we just have to use shorter lines."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> s: Mathematically, we say the composition of two linear functions is another linear function. So we can stack as many linear classifiers on top or each other, without non-linear functions between them, it will jsut be the same as one linear classifier."
]
},
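{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can check this numerically. In this sketch (our own), two stacked linear maps collapse into a single one:\n",
"\n",
"```python\n",
"import torch\n",
"x = torch.randn(5)\n",
"w1, w2 = torch.randn(4, 5), torch.randn(3, 4)\n",
"two_layers = w2 @ (w1 @ x)   # two linear layers, no nonlinearity between them\n",
"one_layer  = (w2 @ w1) @ x   # a single linear layer with parameters w2 @ w1\n",
"print(torch.allclose(two_layers, one_layer, atol=1e-5))  # True\n",
"```"
]
},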
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Amazingly enough, it can be mathematically proven that this little function can solve any computable problem to an arbitrarily high level of accuracy, if you can find the right parameters for `w1` and `w2`, and if you make these matrices big enough. This is known as the *universal approximation theorem* . The three lines of code that we have here are known as *layers*. The first and third are known as *linear layers*, and the second line of code is known variously as a *nonlinearity*, or *activation function*.\n",
"\n",
"Just like the previous section, we can replace this code with something a bit simpler, by taking advantage of PyTorch:"
@ -5217,7 +5282,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Deep learning"
"## Jargon recap"
]
},
{
@ -5250,7 +5315,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### _Choose Your Own Adventure_ reminder"
"#### _Choose Your Own Adventure_ reminder"
]
},
{

View File

@ -2466,6 +2466,31 @@
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.5"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": false,
"sideBar": true,
"skip_h1_title": true,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
}
},
"nbformat": 4,

View File

@ -1898,6 +1898,31 @@
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.5"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": false,
"sideBar": true,
"skip_h1_title": true,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
}
},
"nbformat": 4,

View File

@ -2,7 +2,7 @@
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -55,7 +55,7 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -73,7 +73,7 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -152,7 +152,7 @@
"4 166 346 1 886397596"
]
},
"execution_count": 3,
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
@ -188,7 +188,7 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -204,7 +204,7 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -220,7 +220,7 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -229,7 +229,7 @@
"2.1420000000000003"
]
},
"execution_count": 6,
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
@ -261,7 +261,7 @@
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -277,7 +277,7 @@
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -286,7 +286,7 @@
"-1.611"
]
},
"execution_count": 8,
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
@ -345,7 +345,7 @@
},
{
"cell_type": "code",
"execution_count": 9,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -412,7 +412,7 @@
"4 5 Copycat (1995)"
]
},
"execution_count": 9,
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
@ -432,7 +432,7 @@
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -517,7 +517,7 @@
"4 306 242 5 876503793 Kolya (1996)"
]
},
"execution_count": 10,
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
@ -536,7 +536,7 @@
},
{
"cell_type": "code",
"execution_count": 11,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -637,7 +637,7 @@
},
{
"cell_type": "code",
"execution_count": 12,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -660,7 +660,7 @@
},
{
"cell_type": "code",
"execution_count": 13,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -669,7 +669,7 @@
"tensor([-0.4586, -0.9915, -0.4052, -0.3621, -0.5908])"
]
},
"execution_count": 13,
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
@ -688,7 +688,7 @@
},
{
"cell_type": "code",
"execution_count": 14,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -697,7 +697,7 @@
"tensor([-0.4586, -0.9915, -0.4052, -0.3621, -0.5908])"
]
},
"execution_count": 14,
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
@ -755,7 +755,7 @@
},
{
"cell_type": "code",
"execution_count": 15,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -773,7 +773,7 @@
},
{
"cell_type": "code",
"execution_count": 16,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -782,7 +782,7 @@
"'Hello Sylvain, nice to meet you.'"
]
},
"execution_count": 16,
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
@ -803,7 +803,7 @@
},
{
"cell_type": "code",
"execution_count": 17,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -829,7 +829,7 @@
},
{
"cell_type": "code",
"execution_count": 18,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -838,7 +838,7 @@
"torch.Size([64, 2])"
]
},
"execution_count": 18,
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
@ -857,7 +857,7 @@
},
{
"cell_type": "code",
"execution_count": 19,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -874,7 +874,7 @@
},
{
"cell_type": "code",
"execution_count": 20,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -944,7 +944,7 @@
},
{
"cell_type": "code",
"execution_count": 21,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -962,7 +962,7 @@
},
{
"cell_type": "code",
"execution_count": 22,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -1036,7 +1036,7 @@
},
{
"cell_type": "code",
"execution_count": 23,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -1065,7 +1065,7 @@
},
{
"cell_type": "code",
"execution_count": 24,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -1153,7 +1153,7 @@
},
{
"cell_type": "code",
"execution_count": 25,
"execution_count": null,
"metadata": {
"hide_input": true
},
@ -1205,7 +1205,7 @@
},
{
"cell_type": "code",
"execution_count": 26,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -1291,7 +1291,7 @@
},
{
"cell_type": "code",
"execution_count": 30,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -1300,7 +1300,7 @@
"(#0) []"
]
},
"execution_count": 30,
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
@ -1321,7 +1321,7 @@
},
{
"cell_type": "code",
"execution_count": 32,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -1331,7 +1331,7 @@
"tensor([1., 1., 1.], requires_grad=True)]"
]
},
"execution_count": 32,
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
@ -1352,7 +1352,7 @@
},
{
"cell_type": "code",
"execution_count": 37,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -1364,7 +1364,7 @@
" [ 0.8159]], requires_grad=True)]"
]
},
"execution_count": 37,
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
@ -1379,7 +1379,7 @@
},
{
"cell_type": "code",
"execution_count": 41,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -1388,7 +1388,7 @@
"torch.nn.parameter.Parameter"
]
},
"execution_count": 41,
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
@ -1406,7 +1406,7 @@
},
{
"cell_type": "code",
"execution_count": 58,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -1423,7 +1423,7 @@
},
{
"cell_type": "code",
"execution_count": 59,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -1452,7 +1452,7 @@
},
{
"cell_type": "code",
"execution_count": 57,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -2227,31 +2227,6 @@
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.5"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": false,
"sideBar": true,
"skip_h1_title": true,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
}
},
"nbformat": 4,

View File

@ -1281,31 +1281,6 @@
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.5"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": false,
"sideBar": true,
"skip_h1_title": true,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
}
},
"nbformat": 4,

View File

@ -2,7 +2,7 @@
"cells": [
{
"cell_type": "code",
"execution_count": 15,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -12,7 +12,7 @@
},
{
"cell_type": "code",
"execution_count": 16,
"execution_count": null,
"metadata": {
"hide_input": false
},
@ -62,7 +62,7 @@
},
{
"cell_type": "code",
"execution_count": 17,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -133,7 +133,7 @@
},
{
"cell_type": "code",
"execution_count": 18,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -154,7 +154,7 @@
},
{
"cell_type": "code",
"execution_count": 19,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -399,7 +399,7 @@
},
{
"cell_type": "code",
"execution_count": 20,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -432,7 +432,7 @@
},
{
"cell_type": "code",
"execution_count": 21,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -462,7 +462,7 @@
},
{
"cell_type": "code",
"execution_count": 22,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -471,7 +471,7 @@
"tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])"
]
},
"execution_count": 22,
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
@ -482,7 +482,7 @@
},
{
"cell_type": "code",
"execution_count": 23,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -491,7 +491,7 @@
"(tensor([0, 1, 2, 3, 4]), tensor([5, 6, 7, 8, 9]))"
]
},
"execution_count": 23,
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
@ -516,7 +516,7 @@
},
{
"cell_type": "code",
"execution_count": 24,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -538,7 +538,7 @@
},
{
"cell_type": "code",
"execution_count": 25,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -736,7 +736,7 @@
},
{
"cell_type": "code",
"execution_count": 29,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -821,7 +821,7 @@
},
{
"cell_type": "code",
"execution_count": 48,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -853,7 +853,7 @@
},
{
"cell_type": "code",
"execution_count": 55,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -871,7 +871,7 @@
},
{
"cell_type": "code",
"execution_count": 50,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -888,7 +888,7 @@
},
{
"cell_type": "code",
"execution_count": 54,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -1122,34 +1122,6 @@
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.5"
},
"toc": {
"base_numbering": 1,
"nav_menu": {
"height": "245px",
"width": "258px"
},
"number_sections": false,
"sideBar": true,
"skip_h1_title": true,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
}
},
"nbformat": 4,

View File

@ -2,7 +2,7 @@
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -33,7 +33,7 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -42,7 +42,7 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -52,7 +52,7 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -61,7 +61,7 @@
"(#2) [Path('testing'),Path('training')]"
]
},
"execution_count": 4,
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
@ -79,7 +79,7 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -104,7 +104,7 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -147,7 +147,7 @@
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -172,7 +172,7 @@
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -196,7 +196,7 @@
},
{
"cell_type": "code",
"execution_count": 9,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -212,7 +212,7 @@
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -225,7 +225,7 @@
},
{
"cell_type": "code",
"execution_count": 11,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -277,7 +277,7 @@
},
{
"cell_type": "code",
"execution_count": 12,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -306,7 +306,7 @@
},
{
"cell_type": "code",
"execution_count": 13,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -349,7 +349,7 @@
},
{
"cell_type": "code",
"execution_count": 14,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -358,7 +358,7 @@
},
{
"cell_type": "code",
"execution_count": 15,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -406,7 +406,7 @@
},
{
"cell_type": "code",
"execution_count": 16,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -464,7 +464,7 @@
},
{
"cell_type": "code",
"execution_count": 17,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -477,7 +477,7 @@
},
{
"cell_type": "code",
"execution_count": 18,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -527,7 +527,7 @@
},
{
"cell_type": "code",
"execution_count": 19,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -564,7 +564,7 @@
},
{
"cell_type": "code",
"execution_count": 20,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -595,7 +595,7 @@
},
{
"cell_type": "code",
"execution_count": 21,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -656,7 +656,7 @@
},
{
"cell_type": "code",
"execution_count": 22,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -729,7 +729,7 @@
},
{
"cell_type": "code",
"execution_count": 23,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -749,7 +749,7 @@
},
{
"cell_type": "code",
"execution_count": 24,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -797,7 +797,7 @@
},
{
"cell_type": "code",
"execution_count": 25,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -830,7 +830,7 @@
},
{
"cell_type": "code",
"execution_count": 26,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -1037,31 +1037,6 @@
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.5"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": false,
"sideBar": true,
"skip_h1_title": true,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
}
},
"nbformat": 4,

View File

@ -444,31 +444,6 @@
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.5"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": false,
"sideBar": true,
"skip_h1_title": true,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
}
},
"nbformat": 4,

View File

@ -953,31 +953,6 @@
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.5"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": false,
"sideBar": true,
"skip_h1_title": true,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
}
},
"nbformat": 4,

View File

@ -392,31 +392,6 @@
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.5"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": false,
"sideBar": true,
"skip_h1_title": true,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
}
},
"nbformat": 4,

View File

@ -2,7 +2,7 @@
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"execution_count": null,
"metadata": {
"hide_input": false
},
@ -55,7 +55,7 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -140,7 +140,7 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -157,7 +157,7 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -174,7 +174,7 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -191,7 +191,7 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -207,7 +207,7 @@
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -223,7 +223,7 @@
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -232,7 +232,7 @@
"tensor([[2.7374e-09, 1.0000e+00]], device='cuda:5')"
]
},
"execution_count": 8,
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
@ -250,7 +250,7 @@
},
{
"cell_type": "code",
"execution_count": 9,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -259,7 +259,7 @@
"(#2) [False,True]"
]
},
"execution_count": 9,
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
@ -284,7 +284,7 @@
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -293,7 +293,7 @@
"torch.Size([1, 3, 224, 224])"
]
},
"execution_count": 10,
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
@ -304,7 +304,7 @@
},
{
"cell_type": "code",
"execution_count": 11,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -313,7 +313,7 @@
"torch.Size([2, 7, 7])"
]
},
"execution_count": 11,
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
@ -334,7 +334,7 @@
},
{
"cell_type": "code",
"execution_count": 12,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -369,7 +369,7 @@
},
{
"cell_type": "code",
"execution_count": 13,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -385,7 +385,7 @@
},
{
"cell_type": "code",
"execution_count": 14,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -406,7 +406,7 @@
},
{
"cell_type": "code",
"execution_count": 15,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -440,7 +440,7 @@
},
{
"cell_type": "code",
"execution_count": 16,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -462,7 +462,7 @@
},
{
"cell_type": "code",
"execution_count": 17,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -484,7 +484,7 @@
},
{
"cell_type": "code",
"execution_count": 18,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -494,7 +494,7 @@
},
{
"cell_type": "code",
"execution_count": 19,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -526,7 +526,7 @@
},
{
"cell_type": "code",
"execution_count": 20,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -540,7 +540,7 @@
},
{
"cell_type": "code",
"execution_count": 21,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -557,7 +557,7 @@
},
{
"cell_type": "code",
"execution_count": 22,
"execution_count": null,
"metadata": {},
"outputs": [
{
@ -637,31 +637,6 @@
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.5"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": false,
"sideBar": true,
"skip_h1_title": true,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
}
},
"nbformat": 4,

View File

@ -23,11 +23,7 @@
"source": [
"This final chapter (other than the conclusion, and the online chapters) is going to look a bit different. We will have far more code, and far less pros than previous chapters. We will introduce new Python keywords and libraries without discussing them. This chapter is meant to be the start of a significant research project for you. You see, we are going to implement all of the key pieces of the fastai and PyTorch APIs from scratch, building on nothing other than the components that we developed in <<chapter_foundations>>! The key goal here is to end up with our own `Learner` class, and some callbacks--enough to be able to train a model on Imagenette, including examples of each of the key techniques we've studied. On the way to building Learner, we will be creating Module, Parameter, and even our own parallel DataLoader… and much more.\n",
"\n",
"The end of chapter questionnaire is particularly important for this chapter. This is where we will be getting you started on the many interesting directions that you could take, using this chapter as your starting point. What we are really saying is: follow through with this chapter on your computer, not on paper, and do lots of experiments, web searches, and whatever else you need to understand what's going on. You've built up the skills and expertise to do this in the rest of this book, so we think you are going to do great!"
]
},
{
@ -1746,7 +1742,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.4"
"version": "3.7.5"
},
"toc": {
"base_numbering": 1,