Rewrote a lot to make it more understandable.
This commit is contained in:
parent
3504351fe6
commit
f3939cd774
192
Gaussians.ipynb
@@ -1,7 +1,7 @@
{
"metadata": {
"name": "",
"signature": "sha256:5f19c6fe106aa81c2c579cadd757aca741e13f7982c7052c54831a08f4254eee"
"signature": "sha256:5304ce00179761b51c04bc474726572fbd4057258b20454a864ab8a3baca398b"
},
"nbformat": 3,
"nbformat_minor": 0,
@@ -14,77 +14,58 @@
"source": [
"# Gaussian Probabilities\n",
"\n",
"#### Introduction\n",
"### Introduction\n",
"\n",
"The histogram filter uses a vector or multidimensional array of probabilities to represent our belief of the system state. This is very powerful, but it has limitations. Let us list a few. First, it is multimodal, and we may not want a multimodal result. In other words, we would not want our GPS telling us that we are 50% likely to be here, 40% likely to be there, and so on. There are plenty of techniques to go from a multimodal histogram to a unimodal result, but they all involve significant processing time. This becomes worse as we extend the problem into multiple dimensions. In general for $d$ dimensions the computational run time will be $O(n^d)$. So these filters become intractable in higher dimensions. \n",
"The last chapter ended by discussing some of the drawbacks of the Discrete Bayesian filter. For many tracking and filtering problems our desire is to have a filter that is *unimodal* and *continuous*. That is, we want to model our system using floating point math (continuous) and to have only one belief represented (unimodal). For example, we want to say an aircraft is at (12.34381, -95.54321, 2389.5) where that is latitude, longitude, and altitude. We do not want our filter to tell us \"it might be at (1,65,78) or at (34,656,98).\" That doesn't match our physical intuition of how the world works, and as we discussed, it is prohibitively expensive to compute.\n",
"<p>\n",
"<div style=\"border:1px black;background-color:#DCDCDC\">So we desire a unimodal, continuous way to represent probabilities that models how the real world works, and that is very computationally efficient to calculate. As you might guess from the chapter name, Gaussian distributions provide all of these features.</div>\n",
"\n",
"Second, the results are hard to analyze. Recall the GPS in your car. It probably has a display that tells you how much error is in the filter - it tells you that you are at a specific location on the earth, with for example 9 meters of error. How would you derive that information from a histogram filter? There is no clear cut way. Most, if not all of the bins in the histogram will be non-zero, and there will usually be several clusters of higher probabilities. Being multimodal, there is no clear way to derive a single error estimate. Certainly one could come up with a heuristic, but often we want mathematically precise answers, not heuristics.\n",
"\n",
"Third, the histogram is discrete, but we live in a continuous world. The histogram requires that you model the output of your filter as a set of discrete points. In our dog in the hallway example, we used 10 positions, which is obviously far too few positions for anything but a toy problem. For example, for a 100 meter hallway you would need 10,000 positions to model the hallway to 1cm accuracy. So each sense and update operation would entail performing calculations for 10,000 different probabilities. It gets exponentially worse as we add dimensions. If our dog was roaming in a $100\\times 100 m^2$ courtyard, we would need 100,000,000 bins ($10,000^2$) to get 1cm accuracy.\n",
"\n",
"Finally, the histogram does not represent what happens in the physical world very well. For example, in the last chapter we had this as a probability distribution: [ 0.2245871 0.06288015 0.06109133 0.0581008 0.09334062 0.2245871\n",
" 0.06288015 0.06109133 0.0581008 0.09334062]. The largest probabilities are in position 0 and position 5. This does not fit our physical intuition at all. A dog cannot be in two places at once.\n",
" \n",
" \n",
"Consider how a bimetallic thermometer works. These thermometers use a strip of bimetallic material bent into a loose coil shape, with a pointer attached to the spring. The two metals expand at different rates, so the strip will either coil tighter or uncoil as the temperature changes, and the pointer indicates the current temperature. It is in some ways a crude system, yet it works well. What kinds of errors might we expect from this system? It will not respond to rapid temperature changes, so it will always lag the current temperature a small amount. There may be friction in the system causing the pointer to not register the correct value. And so on. Finally, when you read the dial you will rarely be exactly perpendicular to the face of the dial. Viewing the pointer against the dial from an angle will result in a small reading error - 34 might look like 35 from the left of the dial, for example. However, in total all of these effects will generally be low, and we would expect the total errors to always be small. We would probably agree that an error of 2 degrees is reasonable, but an error of 100 degrees would occur extremely rarely, if ever. Furthermore, we would expect the errors to cluster around the correct value. So in most systems we would expect the errors to be equally likely to be larger or smaller than the correct value. If the correct temperature was 35 degrees, and we were told the thermometer was accurate within 2 degrees, we wouldn't be surprised to read either 33 degrees or 37 degrees. \n",
"\n",
"So we desire a unimodal, continuous way to represent probabilities that models how the real world works, and that is very computationally efficient to calculate. As you might guess from the chapter name, gaussian distributions provide all of these features.\n",
"\n",
"\n",
"#### Probability Distributions\n",
"\n",
"Before we go into the math, let's look at a graph of the gaussian distribution. Don't bother reading the code yet; it is not important at this stage."
"Before we go into the math, let's just look at a graph of the Gaussian distribution to get a sense of what we are talking about."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import math\n",
"\n",
"def gaussian (x, mu, sigma):\n",
"    ''' compute the gaussian pdf with the specified mean (mu) and variance (sigma)'''\n",
"    return math.exp (-0.5 * (x-mu)**2 / sigma) / math.sqrt(2.*math.pi*sigma)\n"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#ignore this code.\n",
"%matplotlib inline\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"from __future__ import division, print_function\n",
"from gaussian_internal import plot_gaussian\n",
"import matplotlib.pylab as pylab\n",
"pylab.rcParams['figure.figsize'] = 10,6\n",
"\n",
"xs = np.arange(0,10,0.1)\n",
"ys = [gaussian (x, 5, 3) for x in xs]\n",
"plt.plot (xs, ys)\n",
"plt.show()"
"plot_gaussian(mu=100, variance=15*15, xlim=(45,155), xlabel='IQ', ylabel='percent')"
],
"language": "python",
"metadata": {},
"outputs": []
"outputs": [],
"prompt_number": ""
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Probably this is immediately recognizable to you as a 'bell curve'. This curve is ubiquitous because under real world conditions most observations are distributed in such a manner. We will not prove the math here, but the **central limit theorem** proves that under certain conditions the arithmetic mean of independent observations will be distributed in this manner, even if the observations themselves do not have this distribution. In nonmathematical terms, this means that if you take a bunch of measurements from a sensor and use them in a filter, they are very likely to create this distribution.\n",
"Probably this is immediately recognizable to you as a 'bell curve'. This curve is ubiquitous because under real world conditions most observations are distributed in such a manner. In fact, this is the bell curve for IQ (Intelligence Quotient). You've probably seen this before, and understand it. It tells us that the average IQ is 100, and that the number of people that have IQs higher or lower than that drops off as they get further away from 100. It's hard to see the exact number, but we can see that very few people have an IQ over 150 or under 50, but a lot have an IQ of 90 or 110. \n",
"\n",
"Before we go further, a moment for terminology. This is variously called a normal distribution, a Gaussian distribution, or a bell curve. However, other distributions also have a bell shaped curve, so that name is somewhat ambiguous, and we will not use *bell curve* in this book.\n",
"This curve is not unique to IQ distributions - a vast number of natural phenomena exhibit this sort of distribution, including the sensors that we use in filtering problems. As we will see, it also has all the attributes that we are looking for - it represents a unimodal belief or value as a probability, it is continuous, and it is computationally efficient. We will soon discover that it also has other desirable qualities that we do not yet recognize we need.\n",
"\n",
"Often *univariate* is tacked onto the front of the name to indicate that this is one dimensional - it is the gaussian for a scalar value, so often you will see it as *univariate normal distribution*. We will use this term often when we need to distinguish between the 1D case and the multidimensional cases that we will use in later chapters. For reference, we will learn that the multidimensional case is called *multivariate normal distribution*. If the context of what we are discussing makes the dimensionality clear we will often leave off the dimensional qualifier, as in the rest of this chapter."
"#### Nomenclature\n",
"\n",
"A bit of nomenclature before we continue - this chart depicts the probability of a *random variable* having any value between $(-\\infty..\\infty)$. For example, for this chart the probability of the variable being 100 is roughly 2.7%, whereas the probability of it being 80 is around 1%.\n",
"> *Random variable* will be precisely defined later. For now just think of it as a variable that can 'freely' and 'randomly' vary. A dog's position in a hallway, air temperature, and a drone's height above the ground are all random variables. The position of the North Pole is not, nor is a sine wave (a sine wave is anything but 'free').\n",
"\n",
"You may object that human IQs cannot be less than zero, let alone $-\\infty$. This is true, but this is a common limitation of mathematical modelling. \"The map is not the territory\" is a common expression, and it is true for Bayesian filtering and statistics. The Gaussian distribution above very closely models the distribution of IQ test results, but being a model it is necessarily imperfect. The difference between model and reality will come up again and again in these filters. \n",
"\n",
"You will see these distributions called *Gaussian distributions*, *normal distributions*, and *bell curves*. Bell curve is ambiguous because there are other distributions which also look bell shaped but are not Gaussian distributions, so we will not use it further in this book. But *Gaussian* and *normal* both mean the same thing, and are used interchangeably. I will use both throughout this book as different sources will use either term, and so I want you to be used to seeing both. Finally, as in this paragraph, it is typical to shorten the name and just talk about a *Gaussian* or *normal* - these are both typical shortcut names for the *Gaussian distribution*. "
]
},
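{
"cell_type": "markdown",
"metadata": {},
"source": [
"To connect these numbers to code, here is a small sketch that evaluates the IQ curve directly. It is only a sanity check, and it assumes the same `gaussian(x, mu, variance)` function from gaussian.py that is used later in this chapter; any implementation of the normal distribution would do."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from gaussian import gaussian\n",
"\n",
"# the IQ curve above is N(100, 15**2). Evaluating it at a few points\n",
"# reproduces the numbers quoted in the text (roughly 2.7% at 100, 1% at 80).\n",
"for iq in [100, 80, 150]:\n",
"    print('IQ %3d: %.2f%%' % (iq, gaussian(iq, 100, 15*15) * 100))"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": ""
},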
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Gaussian Distributions\n",
"### Gaussian Distributions\n",
"\n",
"So let us explore how gaussians work. A gaussian is a continuous probability distribution that is completely described with two parameters, the mean ($\\mu$) and the variance ($\\sigma^2$). It is defined as:\n",
"So let us explore how Gaussians work. A Gaussian is a *continuous probability distribution* that is completely described with two parameters, the mean ($\\mu$) and the variance ($\\sigma^2$). It is defined as:\n",
"$$ \n",
"f(x, \\mu, \\sigma) = \\frac{1}{\\sigma\\sqrt{2\\pi}} e^{-\\frac{1}{2}{(x-\\mu)^2}/\\sigma^2 }\n",
"$$"
@@ -94,62 +75,105 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"You will not need to understand how this equation comes about, or remember it for this book, but it is useful to look at it to see how it works. Specifically, notice the term for $e$. When $x==\\mu$, the term reduces to $e^0=1$. Any other value of x will result in a smaller value for the exponent term due to the negative sign, so the curve will always be highest at $x==\\mu$.\n",
"<p> Don't be dissuaded by the equation if you haven't seen it before; you will not need to memorize or manipulate it. The computation of this function is stored in gaussian.py. You are free to look at the code if interested by using the '%load gaussian.py' magic. \n",
"\n",
"Now we will plot a gaussian centered around 23 ($\\mu=23$), with a variance of 1 ($\\sigma^2=1$). "
"We will plot a Gaussian with a mean of 22 ($\\mu=22$), with a variance of 4 ($\\sigma^2=4$), and then discuss what this means. "
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"xs = np.arange(16,30,0.1)\n",
"ys = [gaussian (x,23,1) for x in xs]\n",
"plt.plot (xs,ys, 'r')\n",
"plt.axvline(23); plt.axvline(24) \n",
"plt.show()"
"from gaussian import gaussian\n",
"plot_gaussian(22,4,True,xlabel='$^{\\circ}C$',ylabel=\"Percent\")\n",
"\n",
"print('Probability of 22 is %.2f' % (gaussian(22,22,4)*100))\n",
"print('Probability of 24 is %.2f' % (gaussian(24,22,4)*100))"
],
"language": "python",
"metadata": {},
"outputs": []
"outputs": [],
"prompt_number": ""
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As we expected the curve is centered around 23. The width of the curve is defined by the variance. If the variance is large then the curve will be wide, and if the variance is small the curve will be narrow.\n",
"So what does this curve *mean*? Assume for a moment that we have a thermometer, which reads 22$\\,^{\\circ}C$. No thermometer is perfectly accurate, and so we normally expect the thermometer to read $\\pm$ some amount around that temperature each time we read it. Furthermore, a theorem called the **Central Limit Theorem** states that if we make many measurements, the measurements will be normally distributed. If that is true, then this chart can be interpreted as a continuous curve depicting our belief that the temperature is any given temperature. In this curve, we assign a probability of 19.95% to the temperature being exactly $22\\,^{\\circ}$. Looking to the right, we assign a probability of 12.10% to the temperature being $24\\,^{\\circ}$. Because of the curve's symmetry, the probability of $20\\,^{\\circ}$ is also 12.10%.\n",
"\n",
"So what does this curve *mean*? Assume for a moment that we have a thermometer that reads 23$\\,^{\\circ}$. We have asserted (without proof) that the distribution will be *normal*. If that is true, then this chart can be interpreted as a continuous curve depicting our belief that the temperature is any given temperature. In this curve, we assign a probability of the temperature being exactly 23$\\,^{\\circ}$ is 40%. Looking to the right, we assign the probability that the temperature is 24$\\,^{\\circ}$ is about 25%. We find 20$\\,^{\\circ}$ and 26$\\,^{\\circ}$ quite unlikely, and temperatures beyond that range extremely rare. \n",
"So the mean ($\\mu$) is what it sounds like - the average of all possible values. Because of the symmetric shape of the curve it is also the tallest part of the curve. The thermometer reads $22\\,^{\\circ}C$, so that is what we used for the mean. \n",
"\n",
"> *Important*: I will repeat what I wrote at the top of this section: \"A Gaussian...is completely described with two parameters\"\n",
"\n",
"The standard notation for a normal distribution for a random variable $X$ is $X \\sim\\ \\mathcal{N}(\\mu,\\sigma^2)$. This means I can express the temperature reading of our thermometer as $$temp = \\mathcal{N}(22,4)$$ This is an **extremely important** result. Gaussians allow me to capture an infinite number of possible values with only two numbers! With the values $\\mu=22$ and $\\sigma^2=4$ I can compute the probability of the temperature being $22\\,^{\\circ}C$, of $20\\,^{\\circ}C$, of $87.3429\\,^{\\circ}C$, or any other arbitrary value.\n",
"\n",
"###### The Variance\n",
"\n",
"Since this is a probability distribution it is required that the area under the curve always equals one. This should be intuitively clear - the area under the curve represents all possible occurrences, which must sum to one.\n",
"\n",
"This leads to an important insight. If the variance is small the curve will be narrow. To keep the area == 1, the curve must also be tall. On the other hand if the variance is large the curve will be wide, and thus it will also have to be short to make the area == 1.\n",
"This leads to an important insight. If the variance is small the curve will be narrow. To keep the area equal to 1, the curve must also be tall. On the other hand if the variance is large the curve will be wide, and thus it will also have to be short to make the area equal to 1.\n",
"\n",
"Let us look at that:"
"Let's look at that graphically:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"plt.plot (xs,[gaussian(x, 23, .2) for x in xs],'b')\n",
"plt.plot (xs,[gaussian(x, 23, 1) for x in xs],'g')\n",
"plt.plot (xs,[gaussian(x, 23, 5) for x in xs],'r')\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"\n",
"xs = np.arange(15,30,0.1)\n",
"p1, = plt.plot (xs,[gaussian(x, 23, .2) for x in xs],'b')\n",
"p2, = plt.plot (xs,[gaussian(x, 23, 1) for x in xs],'g')\n",
"p3, = plt.plot (xs,[gaussian(x, 23, 5) for x in xs],'r')\n",
"plt.legend([p1,p2,p3], ['var = .2', 'var = 1', 'var = 5'])\n",
"plt.show()"
],
"language": "python",
"metadata": {},
"outputs": []
"outputs": [],
"prompt_number": ""
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"So what is this telling us? The blue gaussian is very narrow. It is saying that we believe x=23, and that we are very sure about that. In contrast, the red gaussian also believes that x=23, but we are much less sure about that. Our belief that x=23 is lower, and so our belief about the likely possible values for x is spread out - we think it is quite likely that x=2 or x=8, for example. The blue gaussian has almost completely eliminated 22 or 24 as possible values - their probability is almost 0.0, whereas the red curve considers them nearly as likely as 23.\n",
"So what is this telling us? The blue Gaussian is very narrow. It is saying that we believe x=23, and that we are very sure about that (90%). In contrast, the red Gaussian also believes that x=23, but we are much less sure about that (18%). Our belief that x=23 is lower, and so our belief about the likely possible values for x is spread out - we think it is quite likely that x=20 or x=26, for example. The blue Gaussian has almost completely eliminated 22 or 24 as possible values - their probability is almost 0%, whereas the red curve considers them nearly as likely as 23.\n",
"\n",
"If we think back to the thermometer, we can consider these three curves as representing 3 thermometers. The blue curve represents a very accurate thermometer, and the red one represents a fairly inaccurate one. Green of course represents one in between the two others. Note the very powerful property the Gaussian distribution affords us - we can entirely represent both the reading and the error of a thermometer with only two numbers - the mean and the variance.\n",
"If we think back to the thermometer, we can consider these three curves as representing the readings from 3 different thermometers. The blue curve represents a very accurate thermometer, and the red one represents a fairly inaccurate one. Green of course represents one in between the two others. Note the very powerful property the Gaussian distribution affords us - we can entirely represent both the reading and the error of a thermometer with only two numbers - the mean and the variance.\n",
"\n",
"The standard notation for a normal distribution is just $N(\\mu,\\sigma^2)$. I will not go into detail as to why $\\sigma^2$ is used, other than to note that $\\sigma$ is commonly called the *standard deviation*, which has enormous utility in statistics. The standard deviation is not really used in this book, so I will not address it further. The important thing to understand is that the variance ($\\sigma^2$) is a measure of the width of the curve. The curve above is notated as $N(23, 1)$, since $\\mu=23$ and $\\sigma=1$. We will use this notation throughout the rest of the book, so learn it now.\n"
"The standard notation for a normal distribution for a random variable $X$ is just $X \\sim\\ \\mathcal{N}(\\mu,\\sigma^2)$ where $\\mu$ is the mean and $\\sigma^2$ is the variance. It may seem odd to use $\\sigma$ squared - why not just $\\sigma$? We will not go into great detail about the math at this point, but in statistics $\\sigma$ is the *standard deviation* of a normal distribution. *Variance* is defined as the square of the standard deviation, hence $\\sigma^2$.\n",
"\n",
"It is worth spending a few words on standard deviation now. The standard deviation is a measure of how much variation from the mean exists. For Gaussian distributions, 68% of all the data falls within one standard deviation ($1\\sigma$) of the mean, 95% falls within two standard deviations ($2\\sigma$), and 99.7% within three ($3\\sigma$). This is often called the 68-95-99.7 rule. So if you were told that the average test score in a class was 71 with a standard deviation of 9.4, you could conclude that 95% of the students received a score between 52.2 and 89.8 if the distribution is normal (that is calculated with $71 \\pm (2 * 9.4)$). "
]
},
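{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough check of the 68-95-99.7 rule and the test score example, here is a short sketch. It assumes numpy is available (it was imported earlier in this chapter); it simply draws a large sample from $\\mathcal{N}(71, 9.4^2)$ and counts how many samples fall within one, two, and three standard deviations of the mean."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy as np\n",
"\n",
"mu, std = 71., 9.4\n",
"scores = np.random.normal(mu, std, 100000)   # simulated test scores\n",
"\n",
"for k in [1, 2, 3]:\n",
"    within = (np.abs(scores - mu) <= k*std).mean()\n",
"    print('within %d std (%.1f to %.1f): %.1f%%' % (k, mu - k*std, mu + k*std, within*100))"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": ""
},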
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following graph depicts the relationship between the standard deviation and the normal distribution. "
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from gaussian_internal import display_stddev_plot\n",
"display_stddev_plot()"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": ""
},
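{
"cell_type": "markdown",
"metadata": {},
"source": [
"The Central Limit Theorem mentioned earlier is easy to see with a short simulation. This sketch assumes only numpy and matplotlib (both imported earlier): each data point is the average of 100 draws from a uniform distribution, and even though the individual draws are not normally distributed, the averages pile up into the familiar bell shape."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"\n",
"# average groups of uniform random numbers; the averages are (approximately) normal\n",
"means = np.random.uniform(0., 1., size=(10000, 100)).mean(axis=1)\n",
"\n",
"plt.hist(means, bins=50)\n",
"plt.xlabel('mean of 100 uniform draws')\n",
"plt.show()"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": ""
},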
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<p>\n",
"<p>\n",
"<div style=\"border:1px dotted black;background-color:#DCDCDC\">**Sidebar**: An equivalent formulation for a Gaussian is $\\mathcal{N}(\\mu,1/\\tau)$ where $\\mu$ is the *mean* and $\\tau$ the *precision*. Here $1/\\tau = \\sigma^2$; it is the reciprocal of the variance. While we do not use this formulation in this book, it underscores that the variance is a measure of how precise our data is. A small variance yields large precision - our measurement is very precise. Conversely, a large variance yields low precision - our belief is spread out across a large area. You should become comfortable with thinking about Gaussians in these equivalent forms. Gaussians reflect our *belief* about a measurement, they express the *precision* of the measurement, and they express how much *variance* there is in the measurements. These are all different ways of stating the same fact.</div>"
]
},
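{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the sidebar concrete, here is a tiny sketch relating variance, precision, and the height of the curve. It reuses the `gaussian()` function from gaussian.py and the three variances plotted above; the exact numbers printed are just for illustration."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from gaussian import gaussian\n",
"\n",
"for variance in [0.2, 1., 5.]:\n",
"    tau = 1. / variance                  # precision is the reciprocal of the variance\n",
"    peak = gaussian(23., 23., variance)  # height of the curve at the mean\n",
"    print('variance %.1f   precision %.2f   peak height %.3f' % (variance, tau, peak))"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": ""
},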
{
@@ -158,7 +182,7 @@
"source": [
"### Interactive Gaussians\n",
"\n",
"For those that are using this directly in IPython Notebook, here is an interactive version of the Gaussian plots. Use the sliders to modify $\\mu$ and $\\sigma^2$. Adjusting $\\mu$ will move the graph to the left and right because you are adjusting the mean, and adjusting $\\sigma^2$ will make the bell curve thicker and thinner."
"For those that are reading this in IPython Notebook, here is an interactive version of the Gaussian plots. Use the sliders to modify $\\mu$ and $\\sigma^2$. Adjusting $\\mu$ will move the graph to the left and right because you are adjusting the mean, and adjusting $\\sigma^2$ will make the bell curve thicker and thinner."
]
},
{
@@ -167,25 +191,21 @@
"input": [
"import math\n",
"from IPython.html.widgets import interact, interactive, fixed\n",
"#from IPython.html import widgets\n",
"#from IPython.display import clear_output, display, HTML\n",
"\n",
"def gaussian (x, mu, sigma):\n",
"    ''' compute the gaussian with the specified mean(mu) and sigma'''\n",
"    return math.exp (-0.5 * (x-mu)**2 / sigma) / math.sqrt(2.*math.pi*sigma)\n",
"import IPython.html.widgets as widgets\n",
"\n",
"def plt_g (mu,variance):\n",
"    xs = np.arange(0,10,0.15)\n",
"    xs = np.arange(2,8,0.1)\n",
"    ys = [gaussian (x, mu,variance) for x in xs]\n",
"    plt.plot (xs, ys)\n",
"    plt.ylim((0,1))\n",
"    plt.show()\n",
"\n",
"interact (plt_g, mu=(0,10), variance=(0.2,4.5))"
"interact (plt_g, mu=(0,10), variance=widgets.FloatSliderWidget(value=0.6,min=0.2,max=4.5))"
],
"language": "python",
"metadata": {},
"outputs": []
"outputs": [],
"prompt_number": ""
},
{
"cell_type": "markdown",
@@ -193,14 +213,27 @@
"source": [
"#### Computational Properties of the Gaussian\n",
"\n",
"Recall how our histogram filter worked. We had a vector (Python array) representing our belief at a certain moment in time. When we performed another measurement using the *sense()* function we had to multiply probabilities together, and when we performed the motion step using the *update()* function we had to shift and add probabilities. I've promised you that the Kalman filter uses essentially the same process, and that it uses Gaussians instead of histograms, so you might reasonably expect that we will be multiplying, adding, and shifting Gaussians in the Kalman filter.\n",
"Recall how our discrete Bayesian filter worked. We had a vector implemented as a numpy array representing our belief at a certain moment in time. When we performed another measurement using the *sense()* function we had to multiply probabilities together, and when we performed the motion step using the *update()* function we had to shift and add probabilities. I've promised you that the Kalman filter uses essentially the same process, and that it uses Gaussians instead of histograms, so you might reasonably expect that we will be multiplying, adding, and shifting Gaussians in the Kalman filter.\n",
"\n",
"A typical math book would directly launch into a multipage proof of the behavior of Gaussians under these operations, but I don't see the value in that unless you plan to do statistics. I think the math will be much more intuitive and clear if we just start developing a Kalman filter using Gaussians, and I will provide the equations for multiplying and shifting Gaussians at the appropriate time. You will then be able to develop a physical intuition for what these operations do, rather than be forced to digest a lot of fairly abstract math.\n",
"A typical textbook would directly launch into a multipage proof of the behavior of Gaussians under these operations, but I don't see the value in that right now. I think the math will be much more intuitive and clear if we just start developing a Kalman filter using Gaussians. I will provide the equations for multiplying and shifting Gaussians at the appropriate time. You will then be able to develop a physical intuition for what these operations do, rather than be forced to digest a lot of fairly abstract math.\n",
"\n",
"The key point, which I will only assert for now, is that all the operations are very simple, and that they preserve the properties of the Gaussian. This is somewhat remarkable, in that the Gaussian is a nonlinear function, and typically if you multiply a nonlinear equation with itself you end up with a different equation. For example, the shape of $sin(x)sin(x)$ is very different from $sin(x)$. But the result of multiplying two Gaussians is yet another Gaussian. This is a fundamental discovery, and the key reason why Kalman filters are possible.\n",
"The key point, which I will only assert for now, is that all the operations are very simple, and that they preserve the properties of the Gaussian. This is somewhat remarkable, in that the Gaussian is a nonlinear function, and typically if you multiply a nonlinear equation with itself you end up with a different equation. For example, the shape of $sin(x)sin(x)$ is very different from $sin(x)$. But the result of multiplying two Gaussians is yet another Gaussian. This is a fundamental discovery, and the key reason why Kalman filters are possible."
]
},
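{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can check that last claim numerically without deriving any equations yet. The sketch below multiplies two Gaussian curves point by point, rescales the product so its area is 1, and overlays a Gaussian with the same mean and variance; the two curves lie on top of each other. It assumes the `gaussian()` function from gaussian.py plus numpy and matplotlib, all used earlier in this chapter."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"from gaussian import gaussian\n",
"\n",
"xs = np.arange(15, 30, 0.01)\n",
"g1 = np.array([gaussian(x, 21, 2) for x in xs])   # N(21, 2)\n",
"g2 = np.array([gaussian(x, 25, 3) for x in xs])   # N(25, 3)\n",
"\n",
"product = g1 * g2\n",
"product = product / np.trapz(product, xs)         # rescale so the area under the curve is 1\n",
"\n",
"# compute the mean and variance of the rescaled product numerically\n",
"mean = np.trapz(xs * product, xs)\n",
"var = np.trapz((xs - mean)**2 * product, xs)\n",
"fit = [gaussian(x, mean, var) for x in xs]\n",
"\n",
"plt.plot(xs, product, 'b', label='product of the two Gaussians')\n",
"plt.plot(xs, fit, 'r--', label='Gaussian with the same mean and variance')\n",
"plt.legend()\n",
"plt.show()\n",
"print('the product has mean %.2f and variance %.2f' % (mean, var))"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": ""
},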
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Summary and Key Points\n",
"\n",
"The following points **must** be understood by you before we continue:\n",
"\n",
"#### Summary and Key Points"
"* Normal distributions occur throughout nature\n",
"* They express a continuous probability distribution\n",
"* They are completely described by two parameters: the mean ($\\mu$) and variance ($\\sigma^2$)\n",
"* $\\mu$ is the average of all possible values\n",
"* $\\sigma^2$ represents how much our measurements vary from the mean\n",
"\n"
]
},
{
@@ -216,7 +249,8 @@
],
"language": "python",
"metadata": {},
"outputs": []
"outputs": [],
"prompt_number": ""
}
],
"metadata": {}