Copy edits.

This commit is contained in:
Roger Labbe 2016-01-09 08:51:03 -08:00
parent 6b9b97927d
commit eeab89c16c


@ -307,7 +307,7 @@
"\n",
"When I begin listening to the sensor I have no reason to believe that Simon is at any particular position in the hallway. He is equally likely to be in any position. Their are 10 positions, so the probability that he is in any given position is 1/10. \n",
"\n",
"Let's represent our belief of his position at any time in a NumPy array."
"Let's represent our belief of his position at any time in a NumPy array. I could use a Python list, but NumPy arrays offer a lot of functionality that we will be using soon."
]
},
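A minimal sketch of what that uniform prior could look like, assuming only NumPy:

```python
import numpy as np

# 10 hallway positions, each equally likely before any measurement
belief = np.full(10, 1/10)
```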
{
@ -336,13 +336,17 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"In Bayesian statistics this is called our *prior*. It basically means the probability prior to incorporating measurements or other information. More completely, this is called the *prior probability distribution*, but that is a mouthful and so it is normally shorted to prior. A *probability distribution* is a collection of all possible probabilities for an event. Probability distributions always to sum to 1 because something had to happen; the distribution lists all the different somethings and the probability of each.\n",
"In Bayesian statistics this is called our *prior*. It is the probability prior to incorporating measurements or other information. More completely, this is called the *prior probability distribution*. A *probability distribution* is a collection of all possible probabilities for an event. Probability distributions always to sum to 1 because something had to happen; the distribution lists all the different somethings and the probability of each.\n",
"\n",
"I'm sure you've used probabilities before - as in \"the probability of rain today is 30%\". The last paragraph sounds like more of that. But Bayesian statistics was a revolution in probability because it treats the probability as a belief about a single event. Let's take an example. I know if I flip a fair coin a infinitely many times 50% of the flips will be heads and 50% tails. That is standard probability, not Bayesian, and is called *frequentist statistics* to distinguish it from Bayes. Now, let's say I flip the coin one more time. Which way do I believe it landed? Frequentist probablility has nothing to say about that; it will merely state that 50% of coin flips land as heads. Bayes treats this as a belief about a single event - the strength of my belief that this specific coin flip is heads is 50%.\n",
"I'm sure you've used probabilities before - as in \"the probability of rain today is 30%\". The last paragraph sounds like more of that. But Bayesian statistics was a revolution in probability because it treats the probability as a belief about a single event. Let's take an example. I know that if I flip a fair coin infinitely many times I will get 50% heads and 50% tails. This is called *frequentist statistics* to distinguish it from Bayesian statistics. \n",
"\n",
"There are more differences, but for now recognize that Bayesian statistics reasons about our belief about a single event, whereas frequentists reason about collections of events. In this rest of this chapter, and most of the book, when I talk about the probability of something I am referring to the probability that some specific thing is true. When I do that I'm taking the Bayesian approach.\n",
"I flip the coin one more time and let it land. Which way do I believe it landed? Frequentist probability has nothing to say about that; it will merely state that 50% of coin flips land as heads. In some ways it is meaningless to to assign a probability to the current state of the coin. It is either heads or tails, we just don't know which. Bayes treats this as a belief about a single event - the strength of my belief or knowledge that this specific coin flip is heads is 50%.\n",
"\n",
"Now let's create a map of the hallway in another list. Suppose there are first two doors close together, and then another door quite a bit further down the hallway. We will use 1 to denote a door, and 0 to denote a wall:"
"Bayesian statistics takes past information (the prior) into account. We observe that it rains 4 times every 100 days. From this I state the chance of rain tomorrow to be 1/25. This is not how weather prediction is done. If I know it is raining today, and there are no weather patterns to move this storm system, I predict that it is likely to rain tomorrow. Weather prediction is Bayesian.\n",
"\n",
"In practice statisticians use a mix of frequentist and Bayesian techniques. In some areas finding the prior is difficult or impossible, and frequentist techniques rule. But in this book we can find the prior. When I talk about the probability of something I am referring to the probability that some specific thing is true given past events. When I do that I'm taking the Bayesian approach.\n",
"\n",
"Now let's create a map of the hallway in another list. We'll place the first two doors close together, and then another door further away. We will use 1 to denote a door, and 0 to denote a wall:"
]
},
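One possible hallway map matching that description (the exact door positions are illustrative):

```python
import numpy as np

# 1 marks a door, 0 marks a wall: two doors near the left end, one further down
hallway = np.array([1, 1, 0, 0, 0, 0, 0, 0, 1, 0])
```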
{
@ -360,7 +364,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"So I start listening to Simon's transmissions on the network, and the first data I get from the sensor is \"door\". For the moment assume the sensor always returns the correct answer. From this I conclude that he is in front of a door, but which one? I have no idea. I have no reason to believe he is is in front of the first, second, or third door. But what I can do is assign a probability to each door. All doors are equally likely, and there are three of them, so I assign a probability of 1/3 to each door. "
"I start listening to Simon's transmissions on the network, and the first data I get from the sensor is \"door\". For the moment assume the sensor always returns the correct answer. From this I conclude that he is in front of a door, but which one? I have no reason to believe he is in front of the first, second, or third door. What I can do is assign a probability to each door. All doors are equally likely, and there are three of them, so I assign a probability of 1/3 to each door. "
]
},
{
@ -386,7 +390,7 @@
"import book_plots as bp\n",
"import matplotlib.pyplot as plt\n",
"\n",
"belief = np.array([1./3, 1./3, 0, 0, 0, 0, 0, 0, 1./3, 0])\n",
"belief = np.array([1/3, 1/3, 0, 0, 0, 0, 0, 0, 1/3, 0])\n",
"set_figsize(y=2)\n",
"bp.bar_plot(belief)"
]
@ -395,11 +399,11 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"This distribution is called a *categorical distribution*, which is the term used to describe a discrete distribution describing the probability of observing $n$ outcomes. It is a *multimodal distribution* because we have multiple beliefs about the position of our dog. Of course we are not saying that we think he is simultaneously in three different locations, merely that so far we have narrowed down our knowledge in his position to be one of these three locations. My (Bayesian) belief is that there is a 33.3% chance of being at door 0, 33.3% at door 1, and a 33.3% chance of being at door 8.\n",
"This distribution is called a *categorical distribution*, which is a discrete distribution describing the probability of observing $n$ outcomes. It is a *multimodal distribution* because we have multiple beliefs about the position of our dog. Of course we are not saying that we think he is simultaneously in three different locations, merely that so far we have narrowed down our knowledge in his position to be one of these three locations. My (Bayesian) belief is that there is a 33.3% chance of being at door 0, 33.3% at door 1, and a 33.3% chance of being at door 8.\n",
"\n",
"A few words about the *mode* of a distribution. This terms from elementary statistics. Given a set of numbers, such as {1, 2, 2, 2, 3, 3, 4}, the *mode* is the number that occurs most often. For this set the mode is 2. A set can contain more than one mode. The set {1, 2, 2, 2, 3, 3, 4, 4, 4} contains the modes 2 and 4, because both occur three times. We say the former set is *unimodal*, and the latter is *multimodal*.\n",
"A few words about the *mode* of a distribution. Given a set of numbers, such as {1, 2, 2, 2, 3, 3, 4}, the *mode* is the number that occurs most often. For this set the mode is 2. A set can contain more than one mode. The set {1, 2, 2, 2, 3, 3, 4, 4, 4} contains the modes 2 and 4, because both occur three times. We say the former set is *unimodal*, and the latter is *multimodal*.\n",
"\n",
"I hand coded the `belief` array in the code above. How would we implement this in code? Well, hallway represents each door as a 1, and wall as 0, so we will multiply the hallway variable by the percentage, like so;"
"I hand coded the `belief` array in the code above. How would we implement this in code? We represent doors with 1, and walls as 0, so we will multiply the hallway variable by the percentage, like so;"
]
},
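A sketch of that multiplication, assuming the `hallway` array defined above:

```python
import numpy as np

hallway = np.array([1, 1, 0, 0, 0, 0, 0, 0, 1, 0])
# each of the three doors gets probability 1/3; every wall position gets 0
belief = hallway * (1/3)
```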
{
@ -441,7 +445,7 @@
" * door\n",
" \n",
"\n",
"Can we deduce where Simon is at the end of that sequence? Of course! Given the hallway's layout there is only one place where you can be in front of a door, move once to the right, and be in front of another door, and that is at the left end. Therefore we can confidently state that Simon is in front of the second doorway. If this is not clear, suppose Simon had started at the second or third door. After moving to the right, his sensor would have returned 'wall'. That doesn't match the sensor readings, so we know he didn't start there. We can continue with that logic for all the remaining starting positions. Therefore the only possibility is that he is now in front of the second door. We implement this in Python with:"
"Can we deduce Simon's location? Of course! Given the hallway's layout there is only one place from which you can get this sequence, and that is at the left end. Therefore we can confidently state that Simon is in front of the second doorway. If this is not clear, suppose Simon had started at the second or third door. After moving to the right, his sensor would have returned 'wall'. That doesn't match the sensor readings, so we know he didn't start there. We can continue with that logic for all the remaining starting positions. The only possibility is that he is now in front of the second door. We represent this in Python with:"
]
},
{
@ -477,11 +481,11 @@
"source": [
"Perfect sensors are rare. Perhaps the sensor would not detect a door if Simon sat in front of it while scratching himself, or it might report there is a door if he is facing towards the wall instead of down the hallway. So in practice when I get a report 'door' I cannot assign 1/3 as the probability for each door. I have to assign something less than 1/3 to each door, and then assign a small probability to each blank wall position. \n",
"\n",
"At this point it doesn't matter exactly what numbers we assign; let us say that the probability of the sensor being right is 3 times more likely to be right than wrong. How would we do this?\n",
"At this point it doesn't matter exactly what numbers we assign; let's say the sensor is 3 times more likely to be right than wrong. How would we do this?\n",
"\n",
"At first this may seem like an insurmountable problem. If the sensor is noisy it casts doubt on every piece of data. How can we conclude anything if we are always unsure?\n",
"\n",
"The answer, as with the problem above, is probabilities. We are already comfortable assigning a probabilistic belief about the location of the dog; now we have to incorporate the additional uncertainty caused by the sensor noise. Lets say we get a reading of 'door'. We already said that the sensor is three times as likely to be correct as incorrect, so we should scale the probability distribution by 3 where there is a door. If we do that the result will no longer be a probability distribution, but we will learn how to correct that in a moment.\n",
"The answer, as with the problem abovef, is probabilities. We are already comfortable assigning a probabilistic belief about the location of the dog; now we have to incorporate the additional uncertainty caused by the sensor noise. Lets say we get a reading of 'door'. We already said that the sensor is three times as likely to be correct as incorrect, so we should scale the probability distribution by 3 where there is a door. If we do that the result will no longer be a probability distribution, but we will learn how to correct that in a moment.\n",
"\n",
"Let's look at that in Python code. Here I use the variable `z` to denote the measurement as that is the customary choice in the literature (`y` is also commonly used). "
]
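A sketch of that scaling, where the reading `z` is compared against the hallway map; the function name and signature here are illustrative, not the chapter's final code:

```python
import numpy as np

def update_belief(hall, belief, z, correct_scale):
    # scale the belief wherever the map agrees with the sensor reading z
    for i, val in enumerate(hall):
        if val == z:
            belief[i] *= correct_scale

hallway = np.array([1, 1, 0, 0, 0, 0, 0, 0, 1, 0])
belief = np.full(10, 0.1)
update_belief(hallway, belief, z=1, correct_scale=3.)
# belief now sums to more than 1; normalization comes next
```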
@ -594,7 +598,7 @@
"\n",
"$$\\mathtt{posterior} = \\frac{\\mathtt{prior}\\times \\mathtt{likelihood}}{\\mathtt{normalization}}$$ \n",
"\n",
"It is very important to learn and internalize these terms as most of the literature uses them exclusively."
"It is very important to learn and internalize these terms as most of the literature uses them extensively."
]
},
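In code form, the posterior computation might look like this (a sketch; the function name is illustrative):

```python
def update(likelihood, prior):
    posterior = prior * likelihood       # element-wise product
    return posterior / sum(posterior)    # normalize so the result sums to 1
```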
{
@ -1162,9 +1166,16 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Here things have degraded a bit due to the long string of wall positions in the map. We cannot be as sure where we are when there is an undifferentiated line of wall positions, so naturally our probabilities spread out a bit.\n",
"Here things have degraded a bit due to the long string of wall positions in the map. We cannot be as sure where we are when there is an undifferentiated line of wall positions, so naturally our probabilities spread out a bit."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## The Discrete Bayes Algorithm\n",
"\n",
"I spread the computation across several cells, iteratively calling `predict()` and `update()`. This chart from the **g-h Filter** chapter illustrates the algorithm."
"This chart illustrates the algorithm:"
]
},
{
@ -1214,7 +1225,9 @@
" 3. Determine whether whether the measurement matches each state\n",
" 4. Update state belief if it matches the measurement\n",
"\n",
"When we cover the Kalman filter we will use this exact same algorithm; only the details of the computation will differ. "
"When we cover the Kalman filter we will use this exact same algorithm; only the details of the computation will differ. \n",
"\n",
"Algorithms in this form are sometimes called *predictor correctors*. We make a prediction, then correct them. I prefer this language to predict/update, but update is used in the Kalman filter literature, and so I adopt it as well."
]
},
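A self-contained sketch of that loop, assuming perfect movement and a sensor that is right 75% of the time; the chapter's own `predict()` additionally models movement noise with a kernel, so treat these helper names as illustrative:

```python
import numpy as np

def predict(belief, move):
    # predict step: shift the belief to account for the dog's movement
    return np.roll(belief, move)

def update(likelihood, prior):
    # update (correct) step: combine prior and likelihood, then normalize
    posterior = prior * likelihood
    return posterior / posterior.sum()

def likelihood(hall, z, z_prob=0.75):
    # sensor model: positions matching the reading are z_prob/(1-z_prob) times more likely
    scale = z_prob / (1. - z_prob)
    lh = np.ones(len(hall))
    lh[hall == z] *= scale
    return lh

hallway = np.array([1, 1, 0, 0, 0, 0, 0, 0, 1, 0])
posterior = np.full(10, 1/10)            # start from the uniform prior
for z in [1, 1, 0]:                      # an illustrative sequence of readings
    prior = predict(posterior, 1)        # dog moves one position to the right
    posterior = update(likelihood(hallway, z), prior)
```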
{
@ -1837,7 +1850,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.0"
"version": "3.5.1"
}
},
"nbformat": 4,