diff --git a/03_ethics.ipynb b/03_ethics.ipynb index 7a11db7..7783731 100644 --- a/03_ethics.ipynb +++ b/03_ethics.ipynb @@ -135,21 +135,21 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Bias: Professor Lantanya Sweeney \"Arrested\"" + "### Bias: Professor Latanya Sweeney \"Arrested\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Dr. Latanya Sweeney is a professor at Harvard and director of the university's data privacy lab. In the paper [\"Discrimination in Online Ad Delivery\"](https://arxiv.org/abs/1301.6822) (see <>) she describes her discovery that Googling her name resulted in advertisements saying \"Latanya Sweeney, arrested?\" even though she is the only known Latanya Sweeney and has never been arrested. However when she Googled other names, such as \"Kirsten Lindquist,\" she got more neutral ads, even though Kirsten Lindquist has been arrested three times." + "Dr. Latanya Sweeney is a professor at Harvard and director of the university's data privacy lab. In the paper [\"Discrimination in Online Ad Delivery\"](https://arxiv.org/abs/1301.6822) (see <>) she describes her discovery that Googling her name resulted in advertisements saying \"Latanya Sweeney, arrested?\" even though she is the only known Latanya Sweeney and has never been arrested. However when she Googled other names, such as \"Kirsten Lindquist,\" she got more neutral ads, even though Kirsten Lindquist has been arrested three times." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "\"Screenshot" + "\"Screenshot" ] }, { @@ -243,7 +243,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "\"Picture" + "\"Picture" ] }, { @@ -312,7 +312,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "We explained in <> how an algorithm can interact with its enviromnent to create a feedback loop, making predictions that reinforce actions taken in the real world, which lead to predictions even more pronounced in the same direction. \n", + "We explained in <> how an algorithm can interact with its environment to create a feedback loop, making predictions that reinforce actions taken in the real world, which lead to predictions even more pronounced in the same direction. \n", "As an example, let's again consider YouTube's recommendation system. A couple of years ago the Google team talked about how they had introduced reinforcement learning (closely related to deep learning, but where your loss function represents a result potentially a long time after an action occurs) to improve YouTube's recommendation system. They described how they used an algorithm that made recommendations such that watch time would be optimized.\n", "\n", "However, human beings tend to be drawn to controversial content. This meant that videos about things like conspiracy theories started to get recommended more and more by the recommendation system. Furthermore, it turns out that the kinds of people that are interested in conspiracy theories are also people that watch a lot of online videos! So, they started to get drawn more and more toward YouTube. The increasing number of conspiracy theorists watching videos on YouTube resulted in the algorithm recommending more and more conspiracy theory and other extremist content, which resulted in more extremists watching videos on YouTube, and more people watching YouTube developing extremist views, which led to the algorithm recommending more extremist content... The system was spiraling out of control.\n", @@ -826,7 +826,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "The hiring process is particularly broken in tech. One study indicative of the disfunction comes from Triplebyte, a company that helps place software engineers in companies, conducting a standardized technical interview as part of this process. They have a fascinating dataset: the results of how over 300 engineers did on their exam, coupled with the results of how those engineers did during the interview process for a variety of companies. The number one finding from [Triplebyte’s research](https://triplebyte.com/blog/who-y-combinator-companies-want) is that “the types of programmers that each company looks for often have little to do with what the company needs or does. Rather, they reflect company culture and the backgrounds of the founders.”\n", + "The hiring process is particularly broken in tech. One study indicative of the dysfunction comes from Triplebyte, a company that helps place software engineers in companies, conducting a standardized technical interview as part of this process. They have a fascinating dataset: the results of how over 300 engineers did on their exam, coupled with the results of how those engineers did during the interview process for a variety of companies. The number one finding from [Triplebyte’s research](https://triplebyte.com/blog/who-y-combinator-companies-want) is that “the types of programmers that each company looks for often have little to do with what the company needs or does. Rather, they reflect company culture and the backgrounds of the founders.”\n", "\n", "This is a challenge for those trying to break into the world of deep learning, since most companies' deep learning groups today were founded by academics. These groups tend to look for people \"like them\"—that is, people that can solve complex math problems and understand dense jargon. They don't always know how to spot people who are actually good at solving real problems using deep learning.\n", "\n", diff --git a/06_multicat.ipynb b/06_multicat.ipynb index b0f55fe..3fbbac8 100644 --- a/06_multicat.ipynb +++ b/06_multicat.ipynb @@ -40,7 +40,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "In the previous chapter you learned some important practical techniques for training models in practice. COnsiderations like selecting learning rates and the number of epochs are very important to getting good results.\n", + "In the previous chapter you learned some important practical techniques for training models in practice. Considerations like selecting learning rates and the number of epochs are very important to getting good results.\n", "\n", "In this chapter we are going to look at two other types of computer vision problems: multi-label classification and regression. The first one is when you want to predict more than one label per image (or sometimes none at all), and the second is when your labels are one or several numbers—a quantity instead of a category.\n", "\n", diff --git a/07_sizing_and_tta.ipynb b/07_sizing_and_tta.ipynb index ad331ff..07a0c6f 100644 --- a/07_sizing_and_tta.ipynb +++ b/07_sizing_and_tta.ipynb @@ -68,7 +68,7 @@ "\n", "We thought that seemed very unlikely to be true. We had never actually seen a study that showed that ImageNet happen to be exactly the right size, and that other datasets could not be developed which would provide useful insights. So we thought we would try to create a new dataset that researchers could test their algorithms on quickly and cheaply, but which would also provide insights likely to work on the full ImageNet dataset.\n", "\n", - "About three hours later we had created Imagenette. We selected 10 classes from the full ImageNet that looked very different from one another. As we had hopep, we were able to quickly and cheaply create a classifier capable of recognizing these classes. We then tried out a few algorithmic tweaks to see how they impacted Imagenette. We found some that worked pretty well, and tested them on ImageNet as well—and we were very pleased to find that our tweaks worked well on ImageNet too!\n", + "About three hours later we had created Imagenette. We selected 10 classes from the full ImageNet that looked very different from one another. As we had hoped, we were able to quickly and cheaply create a classifier capable of recognizing these classes. We then tried out a few algorithmic tweaks to see how they impacted Imagenette. We found some that worked pretty well, and tested them on ImageNet as well—and we were very pleased to find that our tweaks worked well on ImageNet too!\n", "\n", "There is an important message here: the dataset you get given is not necessarily the dataset you want. It's particularly unlikely to be the dataset that you want to do your development and prototyping in. You should aim to have an iteration speed of no more than a couple of minutes—that is, when you come up with a new idea you want to try out, you should be able to train a model and see how it goes within a couple of minutes. If it's taking longer to do an experiment, think about how you could cut down your dataset, or simplify your model, to improve your experimentation speed. The more experiments you can do, the better!\n", "\n", @@ -289,7 +289,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Let's check how what effet this had on training our model:" + "Let's check what effect this had on training our model:" ] }, { @@ -914,7 +914,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Let's practice our paper-reading skills to try to interpret this. \"This maximum\" is refering to the previous part of the paragraph, which talked about the fact that 1 is the value of the label for the positive class. So it's not possible for any value (except infinity) to result in 1 after sigmoid or softmax. In a paper, you won't normally see \"any value\" written; instead it will get a symbol, which in this case is $z_k$. This shorthand is helpful in a paper, because it can be refered to again later and the reader will know what value is being discussed.\n", + "Let's practice our paper-reading skills to try to interpret this. \"This maximum\" is refering to the previous part of the paragraph, which talked about the fact that 1 is the value of the label for the positive class. So it's not possible for any value (except infinity) to result in 1 after sigmoid or softmax. In a paper, you won't normally see \"any value\" written; instead it will get a symbol, which in this case is $z_k$. This shorthand is helpful in a paper, because it can be referred to again later and the reader will know what value is being discussed.\n", "\n", "Then it says \"if $z_y\\gg z_k$ for all $k\\neq y$.\" In this case, the paper immediately follows the math with an English description, which is handy because you can just read that. In the math, the $y$ is refering to the target ($y$ is defined earlier in the paper; sometimes it's hard to find where symbols are defined, but nearly all papers will define all their symbols somewhere), and $z_y$ is the activation corresponding to the target. So to get close to 1, this activation needs to be much higher than all the others for that prediction.\n", "\n", diff --git a/08_collab.ipynb b/08_collab.ipynb index 617df20..9c664a5 100644 --- a/08_collab.ipynb +++ b/08_collab.ipynb @@ -793,7 +793,7 @@ "source": [ "In computer vision, we have a very easy way to get all the information of a pixel through its RGB values: each pixel in a colored image is represented by three numbers. Those three numbers give us the redness, the greenness and the blueness, which is enough to get our model to work afterward.\n", "\n", - "For the problem at hand, we don't have the same easy way to characterize a user or a movie. There are probably relations with genres: if a given user likes romance, they are likely to give higher scores to romance movies. Other factors might be wether the movie is more action-oriented versus heavy on dialogue, or the presence of a specific actor that a user might particularly like. \n", + "For the problem at hand, we don't have the same easy way to characterize a user or a movie. There are probably relations with genres: if a given user likes romance, they are likely to give higher scores to romance movies. Other factors might be whether the movie is more action-oriented versus heavy on dialogue, or the presence of a specific actor that a user might particularly like. \n", "\n", "How do we determine numbers to characterize those? The answer is, we don't. We will let our model *learn* them. By analyzing the existing relations between users and movies, our model can figure out itself the features that seem important or not.\n", "\n", @@ -1006,7 +1006,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "The first thing we can do to make this model a little bit better is to force those predictions to be between 0 and 5. For this, we just need to use `sigmoid_range`, like in <>. One thing we discovered empirically is that it's better to have the range go a little bit over 5, so we use `(0, 5.5)`:" + "The first thing we can do to make this model a little bit better is to force those predictions to be between 0 and 5. For this, we just need to use `sigmoid_range`, like in <>. One thing we discovered empirically is that it's better to have the range go a little bit over 5, so we use `(0, 5.5)`:" ] }, { @@ -1262,7 +1262,7 @@ "loss_with_wd = loss + wd * (parameters**2).sum()\n", "```\n", "\n", - "In practice, though, it would be very inefficient (and maybe numerically unstable) to compute that big sum and add it to the loss. If you remember a little bit of high schoool math, you might recall that the derivative of `p**2` with respect to `p` is `2*p`, so adding that big sum to our loss is exactly the same as doing:\n", + "In practice, though, it would be very inefficient (and maybe numerically unstable) to compute that big sum and add it to the loss. If you remember a little bit of high school math, you might recall that the derivative of `p**2` with respect to `p` is `2*p`, so adding that big sum to our loss is exactly the same as doing:\n", "\n", "``` python\n", "parameters.grad += wd * 2 * parameters\n", diff --git a/09_tabular.ipynb b/09_tabular.ipynb index 8c93e8d..69dc5d9 100644 --- a/09_tabular.ipynb +++ b/09_tabular.ipynb @@ -92,7 +92,7 @@ "source": [ "At the end of 2015, the [Rossmann sales competition](https://www.kaggle.com/c/rossmann-store-sales) ran on Kaggle. Competitors were given a wide range of information about various stores in Germany, and were tasked with trying to predict sales on a number of days. The goal was to help the company to manage stock properly and be able to satisfy demand without holding unnecessary inventory. The official training set provided a lot of information about the stores. It was also permitted for competitors to use additional data, as long as that data was made public and available to all participants.\n", "\n", - "One of the gold medalists used deep learning, in one of the earliest known examples of a state-of-the-art deep learning tabular model. Their method involved far less feature engineering, based on domain knowledge, than those of the other gold medalists. The paper, [\"Entity Embeddings of Categorical Variables\"](https://arxiv.org/abs/1604.06737) describes their approach. In an online-only chapter on the [book's website](https://book.fast.ai/) we show how to replicate it from scratch and attain the same accuracy shown in the paper. In the abstract of the paper the authors (Cheng Guo and Felix Bekhahn) say:" + "One of the gold medalists used deep learning, in one of the earliest known examples of a state-of-the-art deep learning tabular model. Their method involved far less feature engineering, based on domain knowledge, than those of the other gold medalists. The paper, [\"Entity Embeddings of Categorical Variables\"](https://arxiv.org/abs/1604.06737) describes their approach. In an online-only chapter on the [book's website](https://book.fast.ai/) we show how to replicate it from scratch and attain the same accuracy shown in the paper. In the abstract of the paper the authors (Cheng Guo and Felix Berkhahn) say:" ] }, { @@ -174,7 +174,7 @@ "\n", "In addition, it is valuable in its own right that embeddings are continuous, because models are better at understanding continuous variables. This is unsurprising considering models are built of many continuous parameter weights and continuous activation values, which are updated via gradient descent (a learning algorithm for finding the minimums of continuous functions).\n", "\n", - "Another benefit is that we can combine our continuous embedding values with truly continuous input data in a straightforward manner: we just concatenate the variables, and feed the concatenation into our first dense layer. In other words, the raw categorical data is transformed by an embedding layer before it interacts with the raw continuous input data. This is how fastai and Guo and Berkham handle tabular models containing continuous and categorical variables.\n", + "Another benefit is that we can combine our continuous embedding values with truly continuous input data in a straightforward manner: we just concatenate the variables, and feed the concatenation into our first dense layer. In other words, the raw categorical data is transformed by an embedding layer before it interacts with the raw continuous input data. This is how fastai and Guo and Berkhahn handle tabular models containing continuous and categorical variables.\n", "\n", "An example using this concatenation approach is how Google does it recommendations on Google Play, as explained in the paper [\"Wide & Deep Learning for Recommender Systems\"](https://arxiv.org/abs/1606.07792). <> illustrates." ] @@ -598,7 +598,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "> A: Here's a productive question to ponder. If you consider that the procedure for defining a decision tree essentially chooses one _sequence of splitting questions about variables_, you might ask yourself, how do we know this procedure chooses the _correct sequence_? The rule is to choose the splitting question that produces the best split (i.e., that most accurately separates the itmes into two distinct categories), and then to apply the same rule to the groups that split produces, and so on. This is known in computer science as a \"greedy\" approach. Can you imagine a scenario in which asking a “less powerful” splitting question would enable a better split down the road (or should I say down the trunk!) and lead to a better result overall?" + "> A: Here's a productive question to ponder. If you consider that the procedure for defining a decision tree essentially chooses one _sequence of splitting questions about variables_, you might ask yourself, how do we know this procedure chooses the _correct sequence_? The rule is to choose the splitting question that produces the best split (i.e., that most accurately separates the items into two distinct categories), and then to apply the same rule to the groups that split produces, and so on. This is known in computer science as a \"greedy\" approach. Can you imagine a scenario in which asking a “less powerful” splitting question would enable a better split down the road (or should I say down the trunk!) and lead to a better result overall?" ] }, { @@ -612,7 +612,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "The first piece of data preparation we need to do is to enrich our representation of dates. The fundamental basis of the decision tree that we just described is *bisection*— dividing a group into two. We look at the ordinal variables and divide up the dataset based on whether the variable's value is greater (or lower) than a threshhold, and we look at the categorical variables and divide up the dataset based on whether the variable's level is a particular level. So this algorithm has a way of dividing up the dataset based on both ordinal and categorical data.\n", + "The first piece of data preparation we need to do is to enrich our representation of dates. The fundamental basis of the decision tree that we just described is *bisection*— dividing a group into two. We look at the ordinal variables and divide up the dataset based on whether the variable's value is greater (or lower) than a threshold, and we look at the categorical variables and divide up the dataset based on whether the variable's level is a particular level. So this algorithm has a way of dividing up the dataset based on both ordinal and categorical data.\n", "\n", "But how does this apply to a common data type, the date? You might want to treat a date as an ordinal value, because it is meaningful to say that one date is greater than another. However, dates are a bit different from most ordinal values in that some dates are qualitatively different from others in a way that that is often relevant to the systems we are modeling.\n", "\n", diff --git a/10_nlp.ipynb b/10_nlp.ipynb index 378f18b..05e13a4 100644 --- a/10_nlp.ipynb +++ b/10_nlp.ipynb @@ -903,7 +903,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "In a perfect world, we could then give this one batch to our model. But that approach doesn't scale, because outside of this toy example it's unlikely that a signle batch containing all the texts would fit in our GPU memory (here we have 90 tokens, but all the IMDb reviews together give several million).\n", + "In a perfect world, we could then give this one batch to our model. But that approach doesn't scale, because outside of this toy example it's unlikely that a single batch containing all the texts would fit in our GPU memory (here we have 90 tokens, but all the IMDb reviews together give several million).\n", "\n", "So, we need to divide this array more finely into subarrays of a fixed sequence length. It is important to maintain order within and across these subarrays, because we will use a model that maintains a state so that it remembers what it read previously when predicting what comes next. \n", "\n", diff --git a/12_nlp_dive.ipynb b/12_nlp_dive.ipynb index 917d64d..3738fa2 100644 --- a/12_nlp_dive.ipynb +++ b/12_nlp_dive.ipynb @@ -1532,7 +1532,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "This inaccuracy means that often the gradients calculated for updating the weights end up as zero or infinity for deep networks. This is commonly refered to as the *vanishing gradients* or *exploding gradients* problem. It means that in SGD, the weights are either not updated at all or jump to infinity. Either way, they won't improve with training.\n", + "This inaccuracy means that often the gradients calculated for updating the weights end up as zero or infinity for deep networks. This is commonly referred to as the *vanishing gradients* or *exploding gradients* problem. It means that in SGD, the weights are either not updated at all or jump to infinity. Either way, they won't improve with training.\n", "\n", "Researchers have developed a number of ways to tackle this problem, which we will be discussing later in the book. One option is to change the definition of a layer in a way that makes it less likely to have exploding activations. We'll look at the details of how this is done in <>, when we discuss batch normalization, and <>, when we discuss ResNets, although these details don't generally matter in practice (unless you are a researcher that is creating new approaches to solving this problem). Another strategy for dealing with this is by being careful about initialization, which is a topic we'll investigate in <>.\n", "\n", diff --git a/13_convolutions.ipynb b/13_convolutions.ipynb index 98b11d9..2982ebc 100644 --- a/13_convolutions.ipynb +++ b/13_convolutions.ipynb @@ -1343,7 +1343,7 @@ "\n", "it will return $a1+a2+a3-a7-a8-a9$. If we are in a part of the image where $a1$, $a2$, and $a3$ add up to the same as $a7$, $a8$, and $a9$, then the terms will cancel each other out and we will get 0. However, if $a1$ is greater than $a7$, $a2$ is greater than $a8$, and $a3$ is greater than $a9$, we will get a bigger number as a result. So this filter detects horizontal edges—more precisely, edges where we go from bright parts of the image at the top to darker parts at the bottom.\n", "\n", - "Changing our filter to have the row of `1`s at the top and the `-1`s at the bottom would detect horizonal edges that go from dark to light. Putting the `1`s and `-1`s in columns versus rows would give us filters that detect vertical edges. Each set of weights will produce a different kind of outcome.\n", + "Changing our filter to have the row of `1`s at the top and the `-1`s at the bottom would detect horizontal edges that go from dark to light. Putting the `1`s and `-1`s in columns versus rows would give us filters that detect vertical edges. Each set of weights will produce a different kind of outcome.\n", "\n", "Let's create a function to do this for one location, and check it matches our result from before:" ] @@ -1826,7 +1826,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "To explain the math behing convolutions, fast.ai student Matt Kleinsmith came up with the very clever idea of showing [CNNs from different viewpoints](https://medium.com/impactai/cnns-from-different-viewpoints-fab7f52d159c). In fact, it's so clever, and so helpful, we're going to show it here too!\n", + "To explain the math behind convolutions, fast.ai student Matt Kleinsmith came up with the very clever idea of showing [CNNs from different viewpoints](https://medium.com/impactai/cnns-from-different-viewpoints-fab7f52d159c). In fact, it's so clever, and so helpful, we're going to show it here too!\n", "\n", "Here's our 3×3 pixel image, with each pixel labeled with a letter:" ] @@ -2093,7 +2093,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "> jargon: channels and features: These two terms are largely used interchangably, and refer to the size of the second axis of a weight matrix, which is, the number of activations per grid cell after a convolution. _Features_ is never used to refer to the input data, but _channels_ can refer to either the input data (generally channels are colors) or activations inside the network." + "> jargon: channels and features: These two terms are largely used interchangeably, and refer to the size of the second axis of a weight matrix, which is, the number of activations per grid cell after a convolution. _Features_ is never used to refer to the input data, but _channels_ can refer to either the input data (generally channels are colors) or activations inside the network." ] }, { @@ -2416,7 +2416,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "The *receptive field* is the area of an image that is involved in the calculation of a layer. On the [book's website](https://book.fast.ai/), you'll find an Excel spreadsheet called *conv-example.xlsx* that shows the calculation of two stride-2 convolutional layers using an MNIST digit. Each layer has a single kernel. <> shows what we see if we click on one of the cells in the *conv2* section, which shows the output of the second convolutional layer, and click *trace precendents*." + "The *receptive field* is the area of an image that is involved in the calculation of a layer. On the [book's website](https://book.fast.ai/), you'll find an Excel spreadsheet called *conv-example.xlsx* that shows the calculation of two stride-2 convolutional layers using an MNIST digit. Each layer has a single kernel. <> shows what we see if we click on one of the cells in the *conv2* section, which shows the output of the second convolutional layer, and click *trace precedents*." ] }, { @@ -2446,7 +2446,7 @@ "source": [ "In this example, we have just two convolutional layers, each of stride 2, so this is now tracing right back to the input image. We can see that a 7×7 area of cells in the input layer is used to calculate the single green cell in the Conv2 layer. This 7×7 area is the *receptive field* in the input of the green activation in Conv2. We can also see that a second filter kernel is needed now, since we have two layers.\n", "\n", - "As you see from this example, the deeper we are in the network (specifically, the more stride-2 convs we have before a layer), the larger the receptive field for an activation in that layer. A large receptive field means that a large amount of the input image is used to calculate each activation in that layer is. We now know that in the deeper layers of the network we have semantically rich features, corresponding to larger receptive fields. Therefore, we'd expect that we'd need more weights for each of our features to handle this increasing complexity. This is another way of saying the same thing we mentionedin the previous section: when we introduce a stride-2 conv in our network, we should also increase the number of channels." + "As you see from this example, the deeper we are in the network (specifically, the more stride-2 convs we have before a layer), the larger the receptive field for an activation in that layer. A large receptive field means that a large amount of the input image is used to calculate each activation in that layer is. We now know that in the deeper layers of the network we have semantically rich features, corresponding to larger receptive fields. Therefore, we'd expect that we'd need more weights for each of our features to handle this increasing complexity. This is another way of saying the same thing we mentioned in the previous section: when we introduce a stride-2 conv in our network, we should also increase the number of channels." ] }, { @@ -2469,7 +2469,7 @@ "source": [ "We are not, to say the least, big users of social networks in general. But our goal in writing this book is to help you become the best deep learning practitioner you can, and we would be remiss not to mention how important Twitter has been in our own deep learning journeys.\n", "\n", - "You see, there's another part of Twitter, far away from Donald Trump and the Kardashians, which is the part of Twitter where deep learning researchers and practitioners talk shop every day. As we were writing this section, Jeremy wanted to double-checkthat what we were saying about stride-2 convolutions was accurate, so he asked on Twitter:" + "You see, there's another part of Twitter, far away from Donald Trump and the Kardashians, which is the part of Twitter where deep learning researchers and practitioners talk shop every day. As we were writing this section, Jeremy wanted to double-check that what we were saying about stride-2 convolutions was accurate, so he asked on Twitter:" ] }, { diff --git a/17_foundations.ipynb b/17_foundations.ipynb index ab31025..69f5518 100644 --- a/17_foundations.ipynb +++ b/17_foundations.ipynb @@ -502,7 +502,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Broadcasting with a scalar is the easiest type of broadcating. When we have a tensor `a` and a scalar, we just imagine a tensor of the same shape as `a` filled with that scalar and perform the operation:" + "Broadcasting with a scalar is the easiest type of broadcasting. When we have a tensor `a` and a scalar, we just imagine a tensor of the same shape as `a` filled with that scalar and perform the operation:" ] }, { @@ -1984,7 +1984,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "We have sucessfuly defined our model—now let's make it a bit more like a PyTorch module." + "We have successfully defined our model—now let's make it a bit more like a PyTorch module." ] }, { @@ -2252,7 +2252,7 @@ "\n", "To implement an `nn.Module` you just need to:\n", "\n", - "- Make sure the superclass `__init__` is called first when you initiliaze it.\n", + "- Make sure the superclass `__init__` is called first when you initialize it.\n", "- Define any parameters of the model as attributes with `nn.Parameter`.\n", "- Define a `forward` function that returns the output of your model.\n", "\n", diff --git a/18_CAM.ipynb b/18_CAM.ipynb index 6b452ec..91bfba0 100644 --- a/18_CAM.ipynb +++ b/18_CAM.ipynb @@ -58,7 +58,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "The class activation map (CAM) was introduced by Bolei Zhou et al. in [\"Learning Deep Features for Discriminative Localization\"](https://arxiv.org/abs/1512.04150). It uses the output of the last convolutional layer (just before the average pooling layer) together with the predictions to give us a heatmap visualization of why the model made its decision. This is a useful tool for intepretation.\n", + "The class activation map (CAM) was introduced by Bolei Zhou et al. in [\"Learning Deep Features for Discriminative Localization\"](https://arxiv.org/abs/1512.04150). It uses the output of the last convolutional layer (just before the average pooling layer) together with the predictions to give us a heatmap visualization of why the model made its decision. This is a useful tool for interpretation.\n", "\n", "More precisely, at each position of our final convolutional layer, we have as many filters as in the last linear layer. We can therefore compute the dot product of those activations with the final weights to get, for each location on our feature map, the score of the feature that was used to make a decision.\n", "\n", @@ -440,7 +440,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "This method is useful, but only works for the last layer. *Gradient CAM* is a variant that addreses this problem." + "This method is useful, but only works for the last layer. *Gradient CAM* is a variant that addresses this problem." ] }, { diff --git a/19_learner.ipynb b/19_learner.ipynb index bcbab1e..e53b42a 100644 --- a/19_learner.ipynb +++ b/19_learner.ipynb @@ -1827,7 +1827,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "We have explored the key concepts of the fastai library are implemented by re-implementing them in this chapter. Since it's mostly full of code, you should definitely try to experiment with it by looking at the corresponding notebook on the book's website. Now that you know how it's built, as a next step be sure to check out the intermediate and advanced tutorials in the fastai documentation to learn how to customize every bit of the libraryt." + "We have explored the key concepts of the fastai library are implemented by re-implementing them in this chapter. Since it's mostly full of code, you should definitely try to experiment with it by looking at the corresponding notebook on the book's website. Now that you know how it's built, as a next step be sure to check out the intermediate and advanced tutorials in the fastai documentation to learn how to customize every bit of the library." ] }, { diff --git a/20_conclusion.ipynb b/20_conclusion.ipynb index 8450da2..3149e37 100644 --- a/20_conclusion.ipynb +++ b/20_conclusion.ipynb @@ -62,7 +62,7 @@ "\n", "Also, you may want to take a look at the fast.ai free online course that covers the same material as this book. Sometimes, seeing the same material in two different ways can really help to crystallize the ideas. In fact, human learning researchers have found that one of the best ways to learn material is to see the same thing from different angles, described in different ways.\n", "\n", - "Your final mission, should you choose to accept it, is to take this book and give it to somebody that you know—and get somebody else starte on their own deep learning journey!" + "Your final mission, should you choose to accept it, is to take this book and give it to somebody that you know—and get somebody else started on their own deep learning journey!" ] }, { diff --git a/app_blog.ipynb b/app_blog.ipynb index d429f2a..f932275 100644 --- a/app_blog.ipynb +++ b/app_blog.ipynb @@ -61,7 +61,7 @@ "source": [ "A great solution is to host your blog on a platform called [GitHub Pages](https://pages.github.com/), which is free, has no ads or pay wall, and makes your data available in a standard way such that you can at any time move your blog to another host. But all the approaches I’ve seen to using GitHub Pages have required knowledge of the command line and arcane tools that only software developers are likely to be familiar with. For instance, GitHub's [own documentation](https://help.github.com/en/github/working-with-github-pages/creating-a-github-pages-site-with-jekyll) on setting up a blog includes a long list of instructions that involve installing the Ruby programming language, using the `git` command-line tool, copying over version numbers, and more—17 steps in total!\n", "\n", - "To cut down the hassle, weve created an easy approach that allows you to use an *entirely browser-based interface* for all your blogging needs. You will be up and running with your new blog within about five minutes. It doesn’t cost anything, and you can easily add your own custom domain to it if you wish to. In this section, we'll explain how to do it, using a template we've created called *fast\\_template*. (NB: be sure to check the [book's website](https://book.fast.ai) for the latest blog recommendations, since new tools are always coming out)." + "To cut down the hassle, we’ve created an easy approach that allows you to use an *entirely browser-based interface* for all your blogging needs. You will be up and running with your new blog within about five minutes. It doesn’t cost anything, and you can easily add your own custom domain to it if you wish to. In this section, we'll explain how to do it, using a template we've created called *fast\\_template*. (NB: be sure to check the [book's website](https://book.fast.ai) for the latest blog recommendations, since new tools are always coming out)." ] }, { @@ -76,10 +76,10 @@ "metadata": {}, "source": [ "You’ll need an account on GitHub, so head over there now and create an account if you don’t have one already. Make sure that you are logged in. Normally, GitHub is used by software developers for writing code, and they use a sophisticated command-line tool to work with it—but we're going to show you an approach that doesn't use the command line at all!\n", + "\n",_re + "To get started, point your browser to [https://github.com/fastai/fast_template/generate](https://github.com/fastai/fast_template/generate) (you need to be logged in to GitHub for the link to work). This will allow you to create a place to store your blog, called a *repository*. You will a screen like the one in <>. Note that you have to enter your repository name using the *exact* format shown here—that is, your GitHub username followed by `.github.io`.\n", "\n", - "To get started, point your browser to [https://github.com/fastai/fast_template/generate](https://github.com/fastai/fast_template/generate) (you need to be logged in to GitHub for the link to work). This will allow you to create a place to store your blog, called a *repository*. You will a screen like the one in <>. Note that you have to enter your repository name using the *exact* format shown here—that is, your GitHub username followed by `.github.io`.\n", - "\n", - "\"Screebshot\n", + "\"Screenshot\n", "\n", "Once you’ve entered that, and any description you like, click \"Create repository from template.\" You have the choice to make the repository \"private,\" but since you are creating a blog that you want other people to read, having the underlying files publicly available hopefully won't be a problem for you.\n", "\n", @@ -141,7 +141,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Now you’re ready to create your first post. All your posts will go in the *\\_posts* folder. Click on that now, and then click the \"Create file\" button. You need to be careful to name your file using the format *<year>-<month>-<day>-<name>.md*, as shwon in <>, where *<year>* is a four-digit number, and *<month>* and *<day>* are two-digit numbers. *<name>* can be anything you want that will help you remember what this post was about. The *.md* extension is for markdown documents.\n", + "Now you’re ready to create your first post. All your posts will go in the *\\_posts* folder. Click on that now, and then click the \"Create file\" button. You need to be careful to name your file using the format *<year>-<month>-<day>-<name>.md*, as shown in <>, where *<year>* is a four-digit number, and *<month>* and *<day>* are two-digit numbers. *<name>* can be anything you want that will help you remember what this post was about. The *.md* extension is for markdown documents.\n", "\n", "\"Screenshot\n", "\n", @@ -156,7 +156,7 @@ "source": [ "As before, you can click the \"Preview\" button to see how your markdown formatting will look (<>).\n", "\n", - "\"Screenshot\n", + "\"Screenshot\n", "\n", "And you will need to click the \"Commit new file\" button to save it to GitHub, as shown in <>.\n", "\n", diff --git a/clean/03_ethics.ipynb b/clean/03_ethics.ipynb index f020568..102ec31 100644 --- a/clean/03_ethics.ipynb +++ b/clean/03_ethics.ipynb @@ -46,7 +46,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Bias: Professor Lantanya Sweeney \"Arrested\"" + "### Bias: Professor Latanya Sweeney \"Arrested\"" ] }, {