Fix writing ch3

Joe Bender 2020-04-22 14:27:48 -04:00
parent f94b161c0c
commit a131aa3f8f

@@ -130,7 +130,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Dr. Latanya Sweeney is a professor at Harvard and director of their data privacy lab. In the paper [Discrimination in Online Ad Delivery](https://arxiv.org/abs/1301.6822) (see <<lantanya_arrested>>) she describes her discovery that googling her name resulted in advertisements saying \"Latanya Sweeney arrested\" even although she is the only Latanya Sweeney and has never been arrested. However when she googled other names, such as Kirsten Lindquist, she got more neutral ads, even though Kirsten Lindquist has been arrested three times."
"Dr. Latanya Sweeney is a professor at Harvard and director of their data privacy lab. In the paper [Discrimination in Online Ad Delivery](https://arxiv.org/abs/1301.6822) (see <<lantanya_arrested>>) she describes her discovery that googling her name resulted in advertisements saying \"Latanya Sweeney arrested\" even though she is the only Latanya Sweeney and has never been arrested. However when she googled other names, such as Kirsten Lindquist, she got more neutral ads, even though Kirsten Lindquist has been arrested three times."
]
},
{
@@ -162,7 +162,7 @@
"source": [
"One very natural reaction to considering these issues is: \"So what? What's that got to do with me? I'm a data scientist, not a politician. I'm not one of the senior executives at my company who make the decisions about what we do. I'm just trying to build the most predictive model I can.\"\n",
"\n",
"These are very reasonable questions. But we're going to try to convince you that the answer is: everybody who is training models absolutely needs to consider how their model will be used. And to consider how to best ensure that it is used as positively as possible. There are things you can do. And if you don't do these things, then things can go pretty bad.\n",
"These are very reasonable questions. But we're going to try to convince you that the answer is: everybody who is training models absolutely needs to consider how their model will be used. And to consider how to best ensure that it is used as positively as possible. There are things you can do. And if you don't do these things, then things can go pretty badly.\n",
"\n",
"One particularly hideous example of what happens when technologists focus on technology at all costs is the story of IBM and Nazi Germany. A Swiss judge ruled \"It does not thus seem unreasonable to deduce that IBM's technical assistance facilitated the tasks of the Nazis in the commission of their crimes against humanity, acts also involving accountancy and classification by IBM machines and utilized in the concentration camps themselves.\"\n",
"\n",
@@ -194,7 +194,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Of course, the project managers and engineers and technicians involved were just living their ordinary lives. Caring for their families, going to the church on Sunday, doing their jobs as best as they could. Following orders. The marketers were just doing what they could to meet their business development goals. Edwin Black, author of \"IBM and the Holocaust\", said: \"To the blind technocrat, the means were more important than the ends. The destruction of the Jewish people became even less important because the invigorating nature of IBM's technical achievement was only heightened by the fantastical profits to be made at a time when bread lines stretched across the world.\"\n",
"Of course, the project managers and engineers and technicians involved were just living their ordinary lives. Caring for their families, going to the church on Sunday, doing their jobs the best they could. Following orders. The marketers were just doing what they could to meet their business development goals. Edwin Black, author of \"IBM and the Holocaust\", said: \"To the blind technocrat, the means were more important than the ends. The destruction of the Jewish people became even less important because the invigorating nature of IBM's technical achievement was only heightened by the fantastical profits to be made at a time when bread lines stretched across the world.\"\n",
"\n",
"Step back for a moment and consider: how would you feel if you discovered that you had been part of a system that ended up hurting society? Would you even know? Would you be open to finding out? How can you help make sure this doesn't happen? We have described the most extreme situation here in Nazi Germany, but there are many negative societal consequences happening due to AI and machine learning right now, some of which we'll describe in this chapter.\n",
"\n",
@@ -220,11 +220,11 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Presumably the reason you're doing this work is because you hope it will be used for something. Otherwise, you're just wasting your time. So, let's start with the assumption that your work will end up somewhere. Now, as you are collecting your data and developing your model, you are making lots of decisions. What level of aggregation will you store your data at? What loss function should you use? What validation and training sets should you use? Should you focus on simplicity of implementation, speed of inference, or accuracy of the model? How will your model handle out of domain data items? Can it be fine-tuned, or must it be retrained from scratch over time?\n",
"Presumably the reason you're doing this work is because you hope it will be used for something. Otherwise, you're just wasting your time. So, let's start with the assumption that your work will end up somewhere. Now, as you are collecting your data and developing your model, you are making lots of decisions. What level of aggregation will you store your data at? What loss function should you use? What validation and training sets should you use? Should you focus on simplicity of implementation, speed of inference, or accuracy of the model? How will your model handle out-of-domain data items? Can it be fine-tuned, or must it be retrained from scratch over time?\n",
"\n",
"These are not just algorithm questions. They are data product design questions. But the product managers, executives, judges, journalists, doctors… whoever ends up developing and using the system of which your model is a part will not be well-placed to understand the decisions that you made, let alone change them.\n",
"\n",
"For instance, two studies found that Amazons facial recognition software produced [inaccurate](https://www.nytimes.com/2018/07/26/technology/amazon-aclu-facial-recognition-congress.html) and [racially biased results](https://www.theverge.com/2019/1/25/18197137/amazon-rekognition-facial-recognition-bias-race-gender). Amazon claimed that the researchers should have changed the default parameters, they did not explain how it would change the racially baised results. Furthermore, it turned out that [Amazon was not instructing police departments](https://gizmodo.com/defense-of-amazons-face-recognition-tool-undermined-by-1832238149) that used its software to do this either. There was, presumably, a big distance between the researchers that developed these algorithms, and the Amazon documentation staff that wrote the guidelines provided to the police. A lack of tight integration led to serious problems for society, the police, and Amazon themselves. It turned out that their system erroneously *matched* 28 members of congress to criminal mugshots! (And these members of congress wrongly matched to criminal mugshots disproportionately included people of color as seen in <<congressmen>>.)"
"For instance, two studies found that Amazons facial recognition software produced [inaccurate](https://www.nytimes.com/2018/07/26/technology/amazon-aclu-facial-recognition-congress.html) and [racially biased results](https://www.theverge.com/2019/1/25/18197137/amazon-rekognition-facial-recognition-bias-race-gender). Amazon claimed that the researchers should have changed the default parameters; they did not explain how it would change the racially biased results. Furthermore, it turned out that [Amazon was not instructing police departments](https://gizmodo.com/defense-of-amazons-face-recognition-tool-undermined-by-1832238149) that used its software to do this either. There was, presumably, a big distance between the researchers that developed these algorithms, and the Amazon documentation staff that wrote the guidelines provided to the police. A lack of tight integration led to serious problems for society, the police, and Amazon themselves. It turned out that their system erroneously *matched* 28 members of congress to criminal mugshots! (And these members of congress wrongly matched to criminal mugshots disproportionately included people of color as seen in <<congressmen>>.)"
]
},
{
@@ -238,11 +238,11 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Data scientists need to be part of a cross disciplinary team. And researchers need to work closely with the kinds of people who will end up using their research. Better still is if the domain experts themselves have learnt enough to be able to train and debug some models themselves — hopefully there's a few of you reading this book right now!\n",
"Data scientists need to be part of a cross-disciplinary team. And researchers need to work closely with the kinds of people who will end up using their research. Better still is if the domain experts themselves have learnt enough to be able to train and debug some models themselves — hopefully there are a few of you reading this book right now!\n",
"\n",
"The modern workplace is a very specialised place. Everybody tends to have very well-defined jobs to perform. Especially in large companies, it can be very hard to know what all the pieces of the puzzle are. Sometimes companies even intentionally obscure the overall project goals that are being worked on, if they know that their employees are not going to like the answers. This is sometimes done by compartmentalising pieces as much as possible\n",
"The modern workplace is a very specialised place. Everybody tends to have very well-defined jobs to perform. Especially in large companies, it can be very hard to know what all the pieces of the puzzle are. Sometimes companies even intentionally obscure the overall project goals that are being worked on, if they know that their employees are not going to like the answers. This is sometimes done by compartmentalising pieces as much as possible.\n",
"\n",
"In other words, we're not saying that any of this is easy. It's hard. It's really hard. We all have to do our best. And we have often seen that the people who do get involved in the higher-level context of these projects, and attempt to develop cross-disciplinary capabilities and teams, become some of the most important and well rewarded members of their organisations. It's the kind of work that tends to be highly appreciated by senior executives, even if it is considered, sometimes, rather uncomfortable by middle management."
"In other words, we're not saying that any of this is easy. It's hard. It's really hard. We all have to do our best. And we have often seen that the people who do get involved in the higher-level context of these projects, and attempt to develop cross-disciplinary capabilities and teams, become some of the most important and well rewarded members of their organisations. It's the kind of work that tends to be highly appreciated by senior executives, even if it is sometimes considered rather uncomfortable by middle management."
]
},
{
@@ -341,13 +341,13 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Russia Today's coverage of the Mueller report was an extreme outlier in how many channels were recommending it. This suggests the possibility that Russia Today, a state-owned Russia media outlet, has been successful in gaming YouTube's recommendation algorithm. The lack of transparency of systems like this make it hard to uncover the kinds of problems that we're discussing.\n",
"Russia Today's coverage of the Mueller report was an extreme outlier in how many channels were recommending it. This suggests the possibility that Russia Today, a state-owned Russia media outlet, has been successful in gaming YouTube's recommendation algorithm. The lack of transparency of systems like this makes it hard to uncover the kinds of problems that we're discussing.\n",
"\n",
"One of our reviewers for this book, Aurélien Géron, led YouTube's video classification team from 2013 to 2016 (well before the events discussed above). He pointed out that it's not just feedback loops involving humans that are a problem. There can also be feedback loops without humans! He told us about an example from YouTube:\n",
"\n",
"> : \"One important signal to classify the main topic of a video is the channel it comes from. For example, a video uploaded to a cooking channel is very likely to be a cooking video. But how do we know what topic a channel is about? Well… in part by looking at the topics of the videos it contains! Do you see the loop? For example, many videos have a description which indicates what camera was used to shoot the video. As a result, some of these videos might get classified as videos about “photography”. If a channel has such as misclassified video, it might be classified as a “photography” channel, making it even more likely for future videos on this channel to be wrongly classified as “photography”. This could even lead to runaway virus-like classifications! One way to break this feedback loop is to classify videos with and without the channel signal. Then when classifying the channels, you can only use the classes obtained without the channel signal. This way, the feedback loop is broken.\"\n",
"> : \"One important signal to classify the main topic of a video is the channel it comes from. For example, a video uploaded to a cooking channel is very likely to be a cooking video. But how do we know what topic a channel is about? Well… in part by looking at the topics of the videos it contains! Do you see the loop? For example, many videos have a description which indicates what camera was used to shoot the video. As a result, some of these videos might get classified as videos about “photography”. If a channel has such a misclassified video, it might be classified as a “photography” channel, making it even more likely for future videos on this channel to be wrongly classified as “photography”. This could even lead to runaway virus-like classifications! One way to break this feedback loop is to classify videos with and without the channel signal. Then when classifying the channels, you can only use the classes obtained without the channel signal. This way, the feedback loop is broken.\"\n",
"\n",
"There are positive examples of people and organizations attempting to combat these problems. Evan Estola, lead machine learning engineer at Meetup, [discussed the example](https://www.youtube.com/watch?v=MqoRzNhrTnQ) of men expressing more interest than women in tech meetups. Meetups algorithm could recommend fewer tech meetups to women, and as a result, fewer women would find out about and attend tech meetups, which could cause the algorithm to suggest even fewer tech meetups to women, and so on in a self-reinforcing feedback loop. Evan and his team made the ethical decision for their recommendation algorithm to not create such a feedback loop, by explicitly not using gender for that part of their model. It is encouraging to see a company not just unthinkingly optimize a metric, but to consider their impact. \"You need to decide which feature not to use in your algorithm… the most optimal algorithm is perhaps not the best one to launch into production\", he said.\n",
"There are positive examples of people and organizations attempting to combat these problems. Evan Estola, lead machine learning engineer at Meetup, [discussed the example](https://www.youtube.com/watch?v=MqoRzNhrTnQ) of men expressing more interest than women in tech meetups. Meetups algorithm could recommend fewer tech meetups to women, and as a result, fewer women would find out about and attend tech meetups, which could cause the algorithm to suggest even fewer tech meetups to women, and so on in a self-reinforcing feedback loop. Evan and his team made the ethical decision for their recommendation algorithm to not create such a feedback loop, by explicitly not using gender for that part of their model. It is encouraging to see a company not just unthinkingly optimize a metric, but to consider its impact. \"You need to decide which feature not to use in your algorithm… the most optimal algorithm is perhaps not the best one to launch into production\", he said.\n",
"\n",
"While Meetup chose to avoid such an outcome, Facebook provides an example of allowing a runaway feedback loop to run wild. Facebook radicalizes users interested in one conspiracy theory by introducing them to more. As [Renee DiResta, a researcher on proliferation of disinformation, writes](https://www.fastcompany.com/3059742/social-network-algorithms-are-distorting-reality-by-boosting-conspiracy-theories):"
]
@@ -444,7 +444,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Yes, that is showing what you think it is: Google Photos classified a Black user's photo with their friend as \"gorillas\"! This algorithmic mis-step got a lot of attention in the media. “Were appalled and genuinely sorry that this happened,” a company spokeswoman said. “There is still clearly a lot of work to do with automatic image labeling, and were looking at how we can prevent these types of mistakes from happening in the future.”\n",
"Yes, that is showing what you think it is: Google Photos classified a Black user's photo with their friend as \"gorillas\"! This algorithmic misstep got a lot of attention in the media. “Were appalled and genuinely sorry that this happened,” a company spokeswoman said. “There is still clearly a lot of work to do with automatic image labeling, and were looking at how we can prevent these types of mistakes from happening in the future.”\n",
"\n",
"Unfortunately, fixing problems in machine learning systems when the input data has problems is hard. Google's first attempt didn't inspire confidence, as covered by The Guardian:"
]
@@ -474,7 +474,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"IBM's system, for instance, had a 34.7% error rate for darker females, vs 0.3% for lighter males—over 100 times more errors! Some people incorrectly reacted to these experiments by claiming that the difference was simply because darker skin is harder for computers to recognise. However, what actually happened, is after the negative publicity that this result created, all of the companies in question dramatically improved their models for darker skin, such that one year later they were nearly as good as for lighter skin. So what this actually showed is that the developers failed to utilise datasets containing enough darker faces, or test their product with darker faces.\n",
"IBM's system, for instance, had a 34.7% error rate for darker females, vs 0.3% for lighter males—over 100 times more errors! Some people incorrectly reacted to these experiments by claiming that the difference was simply because darker skin is harder for computers to recognise. However, what actually happened is that, after the negative publicity that this result created, all of the companies in question dramatically improved their models for darker skin, such that one year later they were nearly as good as for lighter skin. So what this actually showed is that the developers failed to utilise datasets containing enough darker faces, or test their product with darker faces.\n",
"\n",
"One of the MIT researchers, Joy Buolamwini, warned, \"We have entered the age of automation overconfident yet underprepared. If we fail to make ethical and inclusive artificial intelligence, we risk losing gains made in civil rights and gender equity under the guise of machine neutrality\".\n",
"\n",
@@ -563,7 +563,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"*Aggregation bias* occurs when models do not aggregate data in a way that incorporates all of the appropriate factors, or when a model does not include the necessary interaction terms, nonlinearities, or so forth. This can particularly occur in medical settings. For instance, the way diabetes is treated is often based on simple univariate statistics and studies involving small groups of heterogeneous people. Analysis of results is often done in a way that does not take account of different ethnicities or genders. However it turns out that diabetes patients have [different complications across ethnicities](https://www.ncbi.nlm.nih.gov/pubmed/24037313), and HbA1c levels (widely used to diagnose and monitor diabetes) [differ in complex ways across ethnicities and genders](https://www.ncbi.nlm.nih.gov/pubmed/22238408). This can result in people being misdiagnosed or incorrectly treated because medical decisions are based on a model which does not include these important variables and interactions."
"*Aggregation bias* occurs when models do not aggregate data in a way that incorporates all of the appropriate factors, or when a model does not include the necessary interaction terms, nonlinearities, or so forth. This can particularly occur in medical settings. For instance, the way diabetes is treated is often based on simple univariate statistics and studies involving small groups of heterogeneous people. Analysis of results is often done in a way that does not take account of different ethnicities or genders. However, it turns out that diabetes patients have [different complications across ethnicities](https://www.ncbi.nlm.nih.gov/pubmed/24037313), and HbA1c levels (widely used to diagnose and monitor diabetes) [differ in complex ways across ethnicities and genders](https://www.ncbi.nlm.nih.gov/pubmed/22238408). This can result in people being misdiagnosed or incorrectly treated because medical decisions are based on a model which does not include these important variables and interactions."
]
},
{
@@ -579,7 +579,7 @@
"source": [
"The abstract of the paper [Bias in Bios: A Case Study of Semantic Representation Bias in a High-Stakes Setting](https://arxiv.org/abs/1901.09451) notes that there is gender imbalance in occupations (e.g. females are more likely to be nurses, and males are more likely to be pastors), and says that: \"differences in true positive rates between genders are correlated with existing gender imbalances in occupations, which may compound these imbalances\".\n",
"\n",
"What this is saying is that the researchers noticed that models predicting occupation did not only reflect the actual gender imbalance in the underlying population, but actually amplified it! This is quite common, particularly for simple models. When there is some clear, easy to see underlying relationship, a simple model will often simply assume that that relationship holds all the time. As <<representation_bias>> from the paper shows, for occupations which had a higher percentage of females, the model tended to overestimate the prevalence of that occupation."
"What this is saying is that the researchers noticed that models predicting occupation did not only reflect the actual gender imbalance in the underlying population, but actually amplified it! This is quite common, particularly for simple models. When there is some clear, easy-to-see underlying relationship, a simple model will often simply assume that this relationship holds all the time. As <<representation_bias>> from the paper shows, for occupations which had a higher percentage of females, the model tended to overestimate the prevalence of that occupation."
]
},
{
@@ -595,7 +595,7 @@
"source": [
"For example, in the training dataset, 14.6% of surgeons were women, yet in the model predictions, only 11.6% of the true positives were women. The model is thus amplifying the bias existing in the training set.\n",
"\n",
"Now that we saw those biases existed, what can we do to mitigate them?"
"Now that we've seen that those biases exist, what can we do to mitigate them?"
]
},
{
@@ -630,7 +630,7 @@
" - _Algorithms & humans are used differently_:: human decision makers and algorithmic decision makers are not used in a plug-and-play interchangeable way in practice. For instance, algorithmic decisions are more likely to be implemented at scale and without a process for recourse. Furthermore, people are more likely to mistakenly believe that the result of an algorithm is objective and error-free.\n",
" - _Technology is power_:: And with that comes responsibility.\n",
"\n",
"As the Arkansas healthcare example showed, machine learning is often implemented in practice not because it leads to better outcomes, but because it is cheaper and more efficient. Cathy O'Neill, in her book *Weapons of Math Destruction*, described the pattern of how the privileged are processed by people, the poor are processed by algorithms. This is just one of a number of ways that algorithms are used differently than human decision makers. Others include:\n",
"As the Arkansas healthcare example showed, machine learning is often implemented in practice not because it leads to better outcomes, but because it is cheaper and more efficient. Cathy O'Neill, in her book *Weapons of Math Destruction*, described the pattern of how the privileged are processed by people, whereas the poor are processed by algorithms. This is just one of a number of ways that algorithms are used differently than human decision makers. Others include:\n",
"\n",
" - People are more likely to assume algorithms are objective or error-free (even if theyre given the option of a human override)\n",
" - Algorithms are more likely to be implemented with no appeals process in place\n",
@@ -675,7 +675,7 @@
"\n",
"Disinformation through auto-generated text is a particularly significant issue, due to the greatly increased capability provided by deep learning. We discuss this issue in depth when we learn to create language models, in <<chapter_nlp>>.\n",
"\n",
"One proposed approach is to develop some form of digital signature, implement it in a seamless way, and to create norms that we should only trust content which has been verified. Head of the Allen Institute on AI, Oren Etzioni, wrote such a proposal in an article titled [How Will We Prevent AI-Based Forgery?](https://hbr.org/2019/03/how-will-we-prevent-ai-based-forgery), \"AI is poised to make high-fidelity forgery inexpensive and automated, leading to potentially disastrous consequences for democracy, security, and society. The specter of AI forgery means that we need to act to make digital signatures de rigueur as a means of authentication of digital content.\"\n",
"One proposed approach is to develop some form of digital signature, to implement it in a seamless way, and to create norms that we should only trust content which has been verified. The head of the Allen Institute on AI, Oren Etzioni, wrote such a proposal in an article titled [How Will We Prevent AI-Based Forgery?](https://hbr.org/2019/03/how-will-we-prevent-ai-based-forgery): \"AI is poised to make high-fidelity forgery inexpensive and automated, leading to potentially disastrous consequences for democracy, security, and society. The specter of AI forgery means that we need to act to make digital signatures de rigueur as a means of authentication of digital content.\"\n",
"\n",
"Whilst we can't hope to discuss all the ethical issues that deep learning, and algorithms more generally, bring up, hopefully this brief introduction has been a useful starting point you can build on. We'll now move on to the questions of how to identify ethical issues, and what to do about them."
]
@@ -719,14 +719,14 @@
" - Should we even be doing this?\n",
" - What bias is in the data?\n",
" - Can the code and data be audited?\n",
" - What are error rates for different sub-groups?\n",
" - What are the error rates for different sub-groups?\n",
" - What is the accuracy of a simple rule-based alternative?\n",
" - What processes are in place to handle appeals or mistakes?\n",
" - How diverse is the team that built it?\n",
"\n",
"These questions may be able to help you identify outstanding issues, and possible alternatives that are easier to understand and control. In addition to asking the right questions, it's also important to consider practices and processes to implement.\n",
"\n",
"One thing to consider at this stage is what data you are collecting and storing. Data often ends up being used for different purposes than why it was originally collected for. For instance, IBM began selling to Nazi Germany well before the Holocaust, including helping with Germanys 1933 census conducted by Adolf Hitler, which was effective at identifying far more Jewish people than had previously been recognized in Germany. US census data was used to round up Japanese-Americans (who were US citizens) for internment during World War II. It is important to recognize how data and images collected can be weaponized later. Columbia professor [Tim Wu wrote](https://www.nytimes.com/2019/04/10/opinion/sunday/privacy-capitalism.html) that “You must assume that any personal data that Facebook or Android keeps are data that governments around the world will try to get or that thieves will try to steal.”"
"One thing to consider at this stage is what data you are collecting and storing. Data often ends up being used for different purposes than what it was originally collected for. For instance, IBM began selling to Nazi Germany well before the Holocaust, including helping with Germanys 1933 census conducted by Adolf Hitler, which was effective at identifying far more Jewish people than had previously been recognized in Germany. US census data was used to round up Japanese-Americans (who were US citizens) for internment during World War II. It is important to recognize how data and images collected can be weaponized later. Columbia professor [Tim Wu wrote](https://www.nytimes.com/2019/04/10/opinion/sunday/privacy-capitalism.html) that “You must assume that any personal data that Facebook or Android keeps are data that governments around the world will try to get or that thieves will try to steal.”"
]
},
{
@@ -799,7 +799,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Currently, less than 12% of AI researchers are women, according to a study from element AI. The statistics are similarly dire when it comes to race and age. When everybody on a team has similar backgrounds, they are likely to have similar blindspots around ethical risks. The Harvard Business Review (HBR) has published a number of studies showing many benefits of diverse teams, including:\n",
"Currently, less than 12% of AI researchers are women, according to a study from Element AI. The statistics are similarly dire when it comes to race and age. When everybody on a team has similar backgrounds, they are likely to have similar blindspots around ethical risks. The Harvard Business Review (HBR) has published a number of studies showing many benefits of diverse teams, including:\n",
"\n",
"- [How Diversity Can Drive Innovation](https://hbr.org/2013/12/how-diversity-can-drive-innovation)\n",
"- [Teams Solve Problems Faster When Theyre More Cognitively Diverse](https://hbr.org/2017/03/teams-solve-problems-faster-when-theyre-more-cognitively-diverse)\n",
@@ -839,7 +839,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The professional society for computer scientists, the ACM, runs a conference on data ethics called the \"Conference on Fairness, Accountability, and Transparency\". \"Fairness, Accountability, and Transparency\" sometimes goes under the acronym *FAT*, although nowadays it's changing to *FAccT*. Microsoft has a group focused on \"Fairness, Accountability, Transparency, and Ethics\" (FATE). The various versions of this lens have resulted in the acronym \"FAT*\" seeing wide usage. In this section, we'll use \"FAccT\" to refer to the concepts of *Fairness, Accountability, and Transparency*.\n",
"The professional society for computer scientists, the ACM, runs a conference on data ethics called the \"Conference on Fairness, Accountability, and Transparency\". \"Fairness, Accountability, and Transparency\" sometimes goes under the acronym *FAT*, although nowadays it's changing to *FAccT*. Microsoft has a group focused on \"Fairness, Accountability, Transparency, and Ethics\" (FATE). The various versions of this lens have resulted in the acronym \"FAT\" seeing wide usage. In this section, we'll use \"FAccT\" to refer to the concepts of *Fairness, Accountability, and Transparency*.\n",
"\n",
"FAccT is another lens that you may find useful in considering ethical issues. One useful resource for this is the free online book [Fairness and machine learning; Limitations and Opportunities](https://fairmlbook.org/), which \"gives a perspective on machine learning that treats fairness as a central concern rather than an afterthought.\" It also warns, however, that it \"is intentionally narrow in scope... A narrow framing of machine learning ethics might be tempting to technologists and businesses as a way to focus on technical interventions while sidestepping deeper questions about power and accountability. We caution against this temptation.\" Rather than provide an overview of the FAccT approach to ethics (which is better done in books such as the one linked above), our focus here will be on the limitations of this kind of narrow framing.\n",
"\n",
@@ -876,7 +876,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We often talk to people who are eager for technical or design fixes to be a full solution to the kinds of problems that we've been discussing; for instance, a technical approach to debias data, or design guidelines for making technology less addictive. While such measures can be useful, they will not be sufficient to address the underlying problems that have led to our current state. For example, as long as it is incredibly profitable to create addictive technology, companies will continue to do so, regardless of whether this has the side effect of promoting conspiracy theories and polluting our information ecosystem. While individual designers may try to tweak product designs, we will not see substantial changes until the underlying profit incentives changes."
"We often talk to people who are eager for technical or design fixes to be a full solution to the kinds of problems that we've been discussing; for instance, a technical approach to debias data, or design guidelines for making technology less addictive. While such measures can be useful, they will not be sufficient to address the underlying problems that have led to our current state. For example, as long as it is incredibly profitable to create addictive technology, companies will continue to do so, regardless of whether this has the side effect of promoting conspiracy theories and polluting our information ecosystem. While individual designers may try to tweak product designs, we will not see substantial changes until the underlying profit incentives change."
]
},
{
@@ -963,11 +963,11 @@
"1. What was the role of IBM in Nazi Germany? Why did the company participate as they did? Why did the workers participate?\n",
"1. What was the role of the first person jailed in the VW diesel scandal?\n",
"1. What was the problem with a database of suspected gang members maintained by California law enforcement officials?\n",
"1. Why did YouTube's recommendation algorithm recommend videos of partially clothed children to pedophiles, even although no employee at Google programmed this feature?\n",
"1. Why did YouTube's recommendation algorithm recommend videos of partially clothed children to pedophiles, even though no employee at Google programmed this feature?\n",
"1. What are the problems with the centrality of metrics?\n",
"1. Why did Meetup.com not include gender in their recommendation system for tech meetups?\n",
"1. What are the six types of bias in machine learning, according to Suresh and Guttag?\n",
"1. Give two examples of historical race bias in the US\n",
"1. Give two examples of historical race bias in the US.\n",
"1. Where are most images in Imagenet from?\n",
"1. In the paper \"Does Machine Learning Automate Moral Hazard and Error\" why is sinusitis found to be predictive of a stroke?\n",
"1. What is representation bias?\n",