Lots of minor layout edits. Generated new PDF of book.

This commit is contained in:
Roger Labbe 2014-05-28 23:15:39 -07:00
parent 2e63fc3fad
commit 9fdb649396
8 changed files with 317 additions and 6968 deletions


@ -1,7 +1,7 @@
{
"metadata": {
"name": "",
"signature": "sha256:47e47ce6c78a985606fd325605b42a0e55e381111a625eb69820fb2771542ce2"
"signature": "sha256:6f0219267a66cc62e4a150a277649629939dd335bea9ca5857a6ff6f806acb53"
},
"nbformat": 3,
"nbformat_minor": 0,
@ -282,7 +282,7 @@
"source": [
"This first attempt at tracking a robot will closely resemble the 1-D dog tracking problem of previous chapters. This will allow us to 'get our feet wet' with Kalman filtering. So, instead of a sensor that outputs position in a hallway, we now have a sensor that supplies a noisy measurement of position in a 2-D space, such as an open field. That is, at each time $T$ it will provide an $(x,y)$ coordinate pair specifying the measurement of the sensor's position in the field.\n",
"\n",
"Implemention of code to interact with real sensors is beyond the scope of this book, so as before we will program simple simuations in Python to represent the sensors. We will develop several of these sensors as we go, each with more complications, so as I program them I will just append a number to the function name. *pos_sensor1 ()* is the first sensor we write, and so on. \n",
"Implemention of code to interact with real sensors is beyond the scope of this book, so as before we will program simple simuations in Python to represent the sensors. We will develop several of these sensors as we go, each with more complications, so as I program them I will just append a number to the function name. $\\verb,pos_sensor1 (),$ is the first sensor we write, and so on. \n",
"\n",
"So let's start with a very simple sensor, one that travels in a straight line. It takes as input the last position, velocity, and how much noise we want, and returns the new position. "
]
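A minimal sketch of what such a simulated sensor might look like (the exact signature and noise model are my assumptions, not the notebook's code):

    from numpy.random import randn

    def pos_sensor1(last_pos, velocity, noise_scale, dt=1.):
        # hypothetical sketch: advance the position one time step in a
        # straight line, then corrupt it with zero-mean Gaussian noise
        x = last_pos[0] + velocity[0]*dt + randn()*noise_scale
        y = last_pos[1] + velocity[1]*dt + randn()*noise_scale
        return (x, y)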
@ -465,7 +465,7 @@
"\\end{aligned}\n",
"$$\n",
"\n",
"Perhaps the $\\frac{1}{0.3048}$ caught you off guard. At first I always found $\\small\\mathbf{H}$ a bit counter-intuitive to design because it takes you from the state variables to the measurements, but the Kalman filter is trying to take measurements and produce state variables to them. If you read the math chapter you will understand why $\\small\\mathbf{H}$ is designed to go in this direction. If not, well, you'll have to remember how this works and trust that it is correct.\n",
"Perhaps the $\\frac{1}{0.3048}$ caught you off guard. At first I always found $\\mathbf{H}$ a bit counter-intuitive to design because it takes you from the state variables to the measurements, but the Kalman filter is trying to take measurements and produce state variables to them. If you read the math chapter you will understand why $\\mathbf{H}$ is designed to go in this direction. If not, well, you'll have to remember how this works and trust that it is correct.\n",
"\n",
"So, here is the Python that implements this:"
]
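The notebook's actual implementation follows in the next cell; as a rough hedged sketch of the idea (assuming the state is ordered $[x, \dot{x}, y, \dot{y}]$ in meters and the sensor reports position in feet), $\mathbf{H}$ could be built like this:

    import numpy as np

    # one row per measurement (x and y in feet), one column per state
    # variable [x, x', y, y'] in meters; 1/0.3048 converts meters to feet
    H = np.mat([[1/0.3048, 0., 0.,       0.],
                [0.,       0., 1/0.3048, 0.]])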
@ -684,7 +684,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"I encourage you to play with this, setting $\\small\\mathbf{Q}$ and $\\small\\mathbf{R}$ to various values. However, we did a fair amount of that sort of thing in the last chapters, and we have a lot of material to cover, so I will move on to more complicated cases where we will also have a chance to experience changing these values.\n",
"I encourage you to play with this, setting $\\mathbf{Q}$ and $\\mathbf{R}$ to various values. However, we did a fair amount of that sort of thing in the last chapters, and we have a lot of material to cover, so I will move on to more complicated cases where we will also have a chance to experience changing these values.\n",
"\n",
"Now I will run the same Kalman filter with the same settings, but also plot the covariance ellipse for $x$ and $y$. First, the code without explanation, so we can see the output. I print the last covariance to see what it looks like. But before you scroll down to look at the results, what do you think it will look like? You have enough information to figure this out but this is still new to you, so don't be discouraged if you get it wrong."
]
@ -774,7 +774,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Did you correctly predict what the covariance matrix and plots would look like? Perhaps you were expecting a tilted ellipse, as in the last chapters. If so, recall that in those chapters we were not plotting $x$ against $y$, but $x$ against $\\dot{x}$. $x$ *is correlated* to $\\dot{x}$, but $x$ is not correlated or dependent on $y$. Therefore our ellipses are not tilted. Furthermore, the noise for both $x$ and $y$ are modelled to have the same value, 5, in $\\small\\mathbf{R}$. If we were to set R to, for example,\n",
"Did you correctly predict what the covariance matrix and plots would look like? Perhaps you were expecting a tilted ellipse, as in the last chapters. If so, recall that in those chapters we were not plotting $x$ against $y$, but $x$ against $\\dot{x}$. $x$ *is correlated* to $\\dot{x}$, but $x$ is not correlated or dependent on $y$. Therefore our ellipses are not tilted. Furthermore, the noise for both $x$ and $y$ are modelled to have the same value, 5, in $\\mathbf{R}$. If we were to set R to, for example,\n",
"\n",
"$$\\mathbf{R} = \\begin{bmatrix}10&0\\\\0&5\\end{bmatrix}$$\n",
"\n",
@ -785,7 +785,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The final P tells us everything we need to know about the correlation beween the state variables. If we look at the diagonal alone we see the variance for each variable. In other words $P_{0,0}$ is the variance for x, $P_{1,1}$ is the variance for $\\dot{x}$, $P_{2,2}$ is the variance for y, and $P_{3,3}$ is the variance for $\\dot{y}$. We can extract the diagonal of a matrix using *numpy.diag()*."
"The final P tells us everything we need to know about the correlation beween the state variables. If we look at the diagonal alone we see the variance for each variable. In other words $\\mathbf{P}_{0,0}$ is the variance for x, $\\mathbf{P}_{1,1}$ is the variance for $\\dot{x}$, $\\mathbf{P}_{2,2}$ is the variance for y, and $\\mathbf{P}_{3,3}$ is the variance for $\\dot{y}$. We can extract the diagonal of a matrix using *numpy.diag()*."
]
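A tiny illustration with a made-up covariance matrix:

    import numpy as np

    P = np.array([[2.0, 1.2, 0.0, 0.0],
                  [1.2, 3.0, 0.0, 0.0],
                  [0.0, 0.0, 2.0, 1.2],
                  [0.0, 0.0, 1.2, 3.0]])   # values are arbitrary
    print(np.diag(P))                      # [ 2.  3.  2.  3.]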
},
{
@ -848,9 +848,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The covariance contains the data for $x$ and $\\dot{x}$ in the upper left because of how it is organized. Recall that entries $\\small\\mathbf{P}_{i,j}$ and $\\small\\mathbf{P}_{j,i}$ contain $p\\sigma_1\\sigma_2$.\n",
"The covariance contains the data for $x$ and $\\dot{x}$ in the upper left because of how it is organized. Recall that entries $\\mathbf{P}_{i,j}$ and $\\mathbf{P}_{j,i}$ contain $p\\sigma_1\\sigma_2$.\n",
"\n",
"Finally, let's look at the lower left side of $\\small\\mathbf{P}$, which is all 0s. Why 0s? Consider $\\small\\mathbf{P}_{3,0}$. That stores the term $p\\sigma_3\\sigma_0$, which is the covariance between $\\dot{y}$ and $x$. These are independent, so the term will be 0. The rest of the terms are for similarly independent variables."
"Finally, let's look at the lower left side of $\\mathbf{P}$, which is all 0s. Why 0s? Consider $\\mathbf{P}_{3,0}$. That stores the term $p\\sigma_3\\sigma_0$, which is the covariance between $\\dot{y}$ and $x$. These are independent, so the term will be 0. The rest of the terms are for similarly independent variables."
]
},
{
@ -920,14 +920,19 @@
"The first step is to design our state variables. We will assume that the robot is travelling in a straight direction with constant velocity. This is unlikely to be true for a long period of time, but is acceptable for short periods of time. This does not differ from the previous problem - we will want to track the values for the robot's position and velocity. Hence,\n",
"\n",
"$$\\mathbf{x} = \n",
"\\begin{bmatrix}x\\\\v_x\\\\y\\\\v_y\\end{bmatrix}$$\n",
"\n",
"\\begin{bmatrix}x\\\\v_x\\\\y\\\\v_y\\end{bmatrix}$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The next step is to design the state transistion function. This also will be the same as the previous problem, so without further ado,\n",
"\n",
"$$\n",
"\\mathbf{x}' = \\begin{bmatrix}1& \\Delta t& 0& 0\\\\0& 1& 0& 0\\\\0& 0& 1& \\Delta t\\\\ 0& 0& 0& 1\\end{bmatrix}\\mathbf{x}$$\n",
"\n",
"The next step is to design the control inputs. We have none, so we set ${\\small\\mathbf{B}}=0$.\n",
"The next step is to design the control inputs. We have none, so we set ${\\mathbf{B}}=0$.\n",
"\n",
"The next step is to design the measurement function $\\mathbf{z} = \\mathbf{Hx}$. We can model the measurement using the Pythagorean theorem.\n",
"\n",
@ -945,7 +950,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Instead of computing $\\small\\mathbf{H}$ we will compute the partial derivative of $\\mathbf{H}$ with respect to the robot's position $\\small\\mathbf{x}$. You are probably familiar with the concept of partial derivative, but if not, it just means how $\\small \\mathbf{H}$ changes with respect to the robot's position.\n",
"Instead of computing $\\mathbf{H}$ we will compute the partial derivative of $\\mathbf{H}$ with respect to the robot's position $\\mathbf{x}$. You are probably familiar with the concept of partial derivative, but if not, it just means how $\\mathbf{H}$ changes with respect to the robot's position.\n",
"\n",
"$$\\frac{\\partial\\mathbf{h}}{\\partial\\mathbf{x}}=\n",
"\\small\\begin{bmatrix}\n",
@ -974,12 +979,17 @@
"\\end{bmatrix}\n",
"$$\n",
"\n",
"However, this raises a huge problem. We are no longer computing $\\small\\mathbf{H}$, but $\\Delta\\small\\mathbf{H}$, the change of $\\small\\mathbf{H}$. If we passed this into our Kalman filter without altering the rest of the design the output would be nonsense. Recall, for example, that we multiply $\\small\\mathbf{Hx}$ to generate the measurements that would result from the given estimate of $\\small\\mathbf{x}$ But now that $\\small\\mathbf{H}$ is linearized around our position it contains the *change* in the measurement function. \n",
"However, this raises a huge problem. We are no longer computing $\\mathbf{H}$, but $\\Delta\\mathbf{H}$, the change of $\\mathbf{H}$. If we passed this into our Kalman filter without altering the rest of the design the output would be nonsense. Recall, for example, that we multiply $\\mathbf{Hx}$ to generate the measurements that would result from the given estimate of $\\mathbf{x}$ But now that $\\mathbf{H}$ is linearized around our position it contains the *change* in the measurement function. \n",
"\n",
"We are forced, therefore, to use the *change* in $\\small\\mathbf{x}$ for our state variables. So we have to go back and redesign our state variables. \n",
"\n",
">Please note this is a completely normal occurance in designing Kalman filters. The textbooks present examples like this as *fait accompli*, as if it is trivially obvious that the state variables needed to be velocities, not positions. Perhaps once you do enough of these problems it would be trivially obvious, but at that point why are you reading a textbook? I find myself reading through a presentation multiple times, trying to figure out why they made a choice, finally to realize that it is because of the consequences of something on the next page. My presentation is longer, but it reflects what actually happens when you design a filter. You make what seem reasonable design choices, and as you move forward you discover properties that require you to recast your earlier steps. As a result, I am going to somewhat abandon my **step 1**, **step 2**, etc., approach, since so many real problems are not quite that straightforward.\n",
"We are forced, therefore, to use the *change* in $\\mathbf{x}$ for our state variables. So we have to go back and redesign our state variables. \n",
"\n",
">Please note this is a completely normal occurance in designing Kalman filters. The textbooks present examples like this as *fait accompli*, as if it is trivially obvious that the state variables needed to be velocities, not positions. Perhaps once you do enough of these problems it would be trivially obvious, but at that point why are you reading a textbook? I find myself reading through a presentation multiple times, trying to figure out why they made a choice, finally to realize that it is because of the consequences of something on the next page. My presentation is longer, but it reflects what actually happens when you design a filter. You make what seem reasonable design choices, and as you move forward you discover properties that require you to recast your earlier steps. As a result, I am going to somewhat abandon my **step 1**, **step 2**, etc., approach, since so many real problems are not quite that straightforward."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If our state variables contain the velocities of the robot and not the position then how do we track where the robot is? We can't. Kalman filters that are linearized in this fashion use what is called a *nominal trajectory* - i.e. you assume a position and track direction, and then apply the changes in velocity and acceleration to compute the changes in that trajectory. How could it be otherwise? Recall the graphic showing the intersection of the two range circles - there are two areas of intersection. Think of what this would look like if the two transmitters were very close to each other - the intersections would be two very long cresent shapes. This Kalman filter, as designed, has no way of knowing your true position from only distance measurements to the transmitters. Perhaps your mind is already leaping to ways of working around this problem. If so, stay engaged, as later sections and chapters will provide you with these techniques. Presenting the full solution all at once leads to more confusion than insight, in my opinion. \n",
"\n",
"So let's redesign our *state transition function*. We are assuming constant velocity and no acceleration, giving state equations of\n",
@ -994,7 +1004,7 @@
"$$\n",
"\\mathbf{F} = \\begin{bmatrix}0 &1 & 0& 0\\\\0& 0& 0& 0\\\\0& 0& 0& 1\\\\ 0& 0& 0& 0\\end{bmatrix}$$\n",
"\n",
"A final complication comes from the measurements that we pass in. $\\small\\mathbf{Hx}$ is now computing the *change* in the measurement from our nominal position, so the measurement that we pass in needs to be not the range to A and B, but the *change* in range from our measured range to our nomimal position. \n",
"A final complication comes from the measurements that we pass in. $\\mathbf{Hx}$ is now computing the *change* in the measurement from our nominal position, so the measurement that we pass in needs to be not the range to A and B, but the *change* in range from our measured range to our nomimal position. \n",
"\n",
"There is a lot here to take in, so let's work through the code bit by bit. First we will define a function to compute $\\frac{\\partial\\mathbf{h}}{\\partial\\mathbf{x}}$ for each time step."
]
@ -1059,7 +1069,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's write the code to compute $\\small\\frac{\\partial\\mathbf{h}}{\\partial\\mathbf{x}}$."
"Now let's write the code to compute $\\frac{\\partial\\mathbf{h}}{\\partial\\mathbf{x}}$."
]
},
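The notebook's implementation follows in the next cell; a rough sketch of such a function (the transmitter positions $\verb,pos_a,$ and $\verb,pos_b,$ and the state ordering $[x, \dot{x}, y, \dot{y}]$ are my assumptions) might be:

    import numpy as np

    def dh_dx(nominal_pos, pos_a, pos_b):
        # Jacobian of the two range measurements with respect to the state,
        # evaluated at the nominal (x, y) position; velocity terms are zero
        x, y = nominal_pos
        ra = np.sqrt((x - pos_a[0])**2 + (y - pos_a[1])**2)
        rb = np.sqrt((x - pos_b[0])**2 + (y - pos_b[1])**2)
        return np.mat([[(x - pos_a[0])/ra, 0., (y - pos_a[1])/ra, 0.],
                       [(x - pos_b[0])/rb, 0., (y - pos_b[1])/rb, 0.]])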
{

File diff suppressed because one or more lines are too long


@ -1,7 +1,7 @@
{
"metadata": {
"name": "",
"signature": "sha256:00dc5e666bed6e6a433a1c086c374cef621d05ba381473abe5a36db5a5cd4ef7"
"signature": "sha256:df1d770cec87d6a479373130e0f54d7645b1762707c5d4ae14ccfc6562a26688"
},
"nbformat": 3,
"nbformat_minor": 0,
@ -328,16 +328,16 @@
"\n",
"If you are reasonably well-versed in linear algebra this equation should look quite managable; if not, don't worry! If you want to learn the math we will cover it in detail in the next optional chapter. If you choose to skip that chapter the rest of this book should still be managable for you\n",
"\n",
"I have programmed it and saved it in the file *stats.py* with the function name *multivariate_gaussian*. I am not showing the code here because I have taken advantage of the linear algebra solving apparatus of numpy to efficiently compute a solution - the code does not correspond to the equation in a one to one manner. If you wish to view the code, I urge you to either load it in an editor, or load it into this worksheet by putting *%load -s multivariate_gaussian stats.py* in the next cell and executing it with ctrl-enter. "
"I have programmed it and saved it in the file *stats.py* with the function name *multivariate_gaussian*. I am not showing the code here because I have taken advantage of the linear algebra solving apparatus of numpy to efficiently compute a solution - the code does not correspond to the equation in a one to one manner. If you wish to view the code, I urge you to either load it in an editor, or load it into this worksheet by putting $\\verb,%load -s multivariate_gaussian stats.py,$ in the next cell and executing it with ctrl-enter. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
">As of version 0.14 scipy.stats has implemented the multivariate normal equation with the function **multivariate_normal()**. It is superior to my function in several ways. First, it is implemented in Fortran, and is therefore faster than mine. Second, it implements a 'frozen' form where you set the mean and covariance once, and then calculate the probability for any number of values for x over any arbitrary number of calls. This is much more efficient then recomputing everything in each call. So, if you have version 0.14 or later you may want to substitute my function for the built in version. Use **scipy.version.version** to get the version number. I deliberately named my function **multivariate_gaussian()** to ensure it is never confused with the built in version.\n",
">As of version 0.14 scipy.stats has implemented the multivariate normal equation with the function $\\verb,multivariate_normal(),$. It is superior to my function in several ways. First, it is implemented in Fortran, and is therefore faster than mine. Second, it implements a 'frozen' form where you set the mean and covariance once, and then calculate the probability for any number of values for x over any arbitrary number of calls. This is much more efficient then recomputing everything in each call. So, if you have version 0.14 or later you may want to substitute my function for the built in version. Use $\\verb,scipy.version.version,$ to get the version number. I deliberately named my function $\\verb,multivariate_gaussian(),$ to ensure it is never confused with the built in version.\n",
"\n",
"> If you intend to use Python for Kalman filters, you will want to read the <a href=\"http://docs.scipy.org/doc/scipy/reference/tutorial/stats.html\">tutorial</a> for the scipy.stats module, which explains 'freezing' distributions and other very useful features. As of this date, it includes an example of using the multivariate_normal function, which does work a bit differently from my function."
"> If you intend to use Python for Kalman filters, you will want to read the <a href=\"http://docs.scipy.org/doc/scipy/reference/tutorial/stats.html\">tutorial</a> for the $\\verb,scipy.stats,$ module, which explains 'freezing' distributions and other very useful features. As of this date, it includes an example of using the multivariate_normal function, which does work a bit differently from my function."
]
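For example, the 'frozen' form works roughly like this (requires scipy 0.14 or later; the mean and covariance values here are arbitrary):

    from scipy.stats import multivariate_normal

    # freeze the distribution once...
    rv = multivariate_normal(mean=[2., 7.], cov=[[8., 0.], [0., 4.]])

    # ...then evaluate the density as many times as you like
    print(rv.pdf([2.5, 7.3]))
    print(rv.pdf([1.0, 6.0]))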
},
{
@ -396,7 +396,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, we have to define our covariance matrix. In the problem statement we did not mention any correlation between $x$ and $y$, and we will assume there is none. This makes sense; a dog can choose to independently wander in either the $x$ direction or $y$ direction without affecting the other. If there is no correlation between the values you just fill in the diagonal of the covariance matrix with the variances. I will use the seemingly arbitrary name $P$ for the covariance matrix. The Kalman filters use the name $P$ for this matrix, so I will introduce the terminology now to avoid explaining why I change the name later. "
"Finally, we have to define our covariance matrix. In the problem statement we did not mention any correlation between $x$ and $y$, and we will assume there is none. This makes sense; a dog can choose to independently wander in either the $x$ direction or $y$ direction without affecting the other. If there is no correlation between the values you just fill in the diagonal of the covariance matrix with the variances. I will use the seemingly arbitrary name $\\textbf{P}$ for the covariance matrix. The Kalman filters use the name $\\textbf{P}$ for this matrix, so I will introduce the terminology now to avoid explaining why I change the name later. "
]
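For instance, with a variance of 2 for both $x$ and $y$ and no correlation, the matrix is just the variances on the diagonal (a sketch; the values match the example that follows):

    import numpy as np

    P = np.array([[2., 0.],
                  [0., 2.]])   # off-diagonal terms are 0: no correlation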
},
{
@ -534,7 +534,7 @@
"source": [
"The result is clearly a 3D bell shaped curve. We can see that the gaussian is centered around (2,7), and that the probability quickly drops away in all directions. On the sides of the plot I have drawn the Gaussians for $x$ in greens and for $y$ in orange.\n",
"\n",
"As beautiful as this is, it is perhaps a bit hard to get useful information. For example, it is not easy to tell if $x$ and $y$ both have the same variance or not. So for most of the rest of this book we will display multidimensional Gaussian using contour plots. I will use some helper functions in *gaussian.py* to plot them. If you are interested in linear algebra go ahead and look at the code used to produce these contours, otherwise feel free to ignore it."
"As beautiful as this is, it is perhaps a bit hard to get useful information. For example, it is not easy to tell if $x$ and $y$ both have the same variance or not. So for most of the rest of this book we will display multidimensional Gaussian using contour plots. I will use some helper functions in $\\verb,gaussian.py,$ to plot them. If you are interested in linear algebra go ahead and look at the code used to produce these contours, otherwise feel free to ignore it."
]
},
{
@ -581,10 +581,10 @@
"The first plot uses the mean and covariance matrices of\n",
"$$\n",
"\\begin{aligned}\n",
"\\mu &= \\begin{bmatrix}2\\\\7\\end{bmatrix} \\\\\n",
"\\sigma^2 &= \\begin{bmatrix}2&0\\\\0&2\\end{bmatrix}\n",
"\\mathbf{\\mu} &= \\begin{bmatrix}2\\\\7\\end{bmatrix} \\\\\n",
"\\mathbf{\\sigma}^2 &= \\begin{bmatrix}2&0\\\\0&2\\end{bmatrix}\n",
"\\end{aligned}\n",
"$$\n",
"$$ \n",
"\n",
"Let this be our current belief about the position of our dog in a field. In other words, we believe that he is positioned at (2,7) with a variance of $\\sigma^2=2$ for both x and y. The contour plot shows where we believe the dog is located with the '+' in the center of the ellipse. The ellipse shows the boundary for the $1\\sigma^2$ probability - points where the dog is quite likely to be based on our current knowledge. Of course, the dog might be very far from this point, as Gaussians allow the mean to be any value. For example, the dog could be at (3234.76,189989.62), but that has vanishing low probability of being true. Generally speaking displaying the $1\\sigma^2$ to $2\\sigma^2$ contour captures the most likely values for the distribution. An equivelent way of thinking about this is the circle/ellipse shows us the amount of error in our belief. A tiny circle would indicate that we have a very small error, and a very large circle indicates a lot of error in our belief. We will use this throughout the rest of the book to display and evaluate the accuracy of our filters at any point in time. "
]
@ -768,11 +768,11 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"So in general terms we can show how a multidimensional Kalman filter works. In the example above, we compute velocity from the previous position measurements using something called the **measurement function**. Then we predict the next position by using the current estimate and something called the **state transition function**. In our example above,\n",
"So in general terms we can show how a multidimensional Kalman filter works. In the example above, we compute velocity from the previous position measurements using something called the *measurement function*. Then we predict the next position by using the current estimate and something called the *state transition function*. In our example above,\n",
"\n",
"$$new\\_position = old\\_position + velocity*time$$ \n",
"\n",
"Next, we take the measurement from the sensor, and compare it to the prediction we just made. In a world with perfect sensors and perfect airplanes the prediction will always match the measured value. In the real world they will always be at least slightly different. We call the difference between the two the **residual**. Finally, we use something called the **Kalman gain** to update our estimate to be somewhere between the measured position and the predicted position. I will not describe how the gain is set, but suppose we had perfect confidence in our measurement - no error is possible. Then, clearly, we would set the gain so that 100% of the position came from the measurement, and 0% from the prediction. At the other extreme, if he have no confidence at all in the sensor (maybe it reported a hardware fault), we would set the gain so that 100% of the position came from the prediction, and 0% from the measurement. In normal cases, we will take a ratio of the two: maybe 53% of the measurement, and 47% of the prediction. The gain is updated on every cycle based on the variance of the variables (in a way yet to be explained). It should be clear that if the variance of the measurement is low, and the variance of the prediction is high we will favor the measurement, and vice versa. \n",
"Next, we take the measurement from the sensor, and compare it to the prediction we just made. In a world with perfect sensors and perfect airplanes the prediction will always match the measured value. In the real world they will always be at least slightly different. We call the difference between the two the *residual*. Finally, we use something called the *Kalman gain* to update our estimate to be somewhere between the measured position and the predicted position. I will not describe how the gain is set, but suppose we had perfect confidence in our measurement - no error is possible. Then, clearly, we would set the gain so that 100% of the position came from the measurement, and 0% from the prediction. At the other extreme, if he have no confidence at all in the sensor (maybe it reported a hardware fault), we would set the gain so that 100% of the position came from the prediction, and 0% from the measurement. In normal cases, we will take a ratio of the two: maybe 53% of the measurement, and 47% of the prediction. The gain is updated on every cycle based on the variance of the variables (in a way yet to be explained). It should be clear that if the variance of the measurement is low, and the variance of the prediction is high we will favor the measurement, and vice versa. \n",
"\n",
"The chart shows a prior estimate of $x=1$ and $\\dot{x}=1$ ($\\dot{x}$ is the shorthand for the derivative of x, which is velocity). Therefore we predict $\\hat{x}=2$. However, the new measurement $x^{'}=1.3$, giving a residual $r=0.7$. Finally, Kalman filter gain $k$ gives us a new estimate of $\\hat{x^{'}}=1.8$.\n",
"\n",
@ -1144,30 +1144,30 @@
"\n",
"**1**: We just assign the initial value for our state. Here we just initialize both the position and velocity to zero. \n",
"\n",
"**2**: We set $\\small\\mathbf{F}=(\\begin{smallmatrix}1&1\\\\0&1\\end{smallmatrix})$, as in design step 2 above. \n",
"**2**: We set $\\textbf{F}=(\\begin{smallmatrix}1&1\\\\0&1\\end{smallmatrix})$, as in design step 2 above. \n",
"\n",
"**3**: We set $H=(\\begin{smallmatrix}1&0\\end{smallmatrix})$, as in design step 3 above.\n",
"**3**: We set $\\textbf{H}=(\\begin{smallmatrix}1&0\\end{smallmatrix})$, as in design step 3 above.\n",
"\n",
"**4**: We set $\\small\\mathbf{R} = 5$ and $\\small\\mathbf{Q}=0$ as in steps 5 and 6.\n",
"**4**: We set $\\textbf{R} = 5$ and $\\mathbf{Q}=0$ as in steps 5 and 6.\n",
"\n",
"**5**: Recall in the last chapter we set our initial belief to $\\mathcal{N}(\\mu,\\sigma^2)=\\mathcal{N}(0,500)$ to signify our lack of knowledge about the initial conditions. We implemented this in Python with a list that contained both $\\mu$ and $\\sigma^2$ in the variable $pos$:\n",
"\n",
" pos = (0,500)\n",
" \n",
"Multidimensional Kalman filters stores the state variables in $\\mathbf{x}$ and their *covariance* in $\\small\\mathbf{P}$. These are $f.x$ and $f.P$ in the code above. Notionally, this is similar as the one dimension case, but instead of having a mean and variance we have a mean and covariance. For the multidimensional case, we have\n",
"Multidimensional Kalman filters stores the state variables in $\\mathbf{x}$ and their *covariance* in $\\mathbf{P}$. These are $\\verb,f.x,$ and $\\verb,f.P,$ in the code above. Notionally, this is similar as the one dimension case, but instead of having a mean and variance we have a mean and covariance. For the multidimensional case, we have\n",
"\n",
"$$\\mathcal{N}(\\mu,\\sigma^2)=\\mathcal{N}(\\mathbf{x},\\mathbf{P})$$\n",
"\n",
"$\\small\\mathbf{P}$ is initialized to the identity matrix of size $n{\\times}n$, so multiplying by 500 assigns a variance of 500 to $x$ and $\\dot{x}$. So $f.P$ contains\n",
"$\\mathbf{P}$ is initialized to the identity matrix of size $n{\\times}n$, so multiplying by 500 assigns a variance of 500 to $x$ and $\\dot{x}$. So $\\verb,f.P,$ contains\n",
"\n",
"$$\\begin{bmatrix} 500&0\\\\0&500\\end{bmatrix}$$\n",
"\n",
"This will become much clearer once we look at the covariance matrix in detail in later sessions. For now recognize that each diagonal element $e_{ii}$ is the variance for the $ith$ state variable. \n",
"\n",
"> Summary: For our dog tracking problem, in the 1-D case $\\mu$ was the position, and $\\sigma^2$ was the variance. In the 2-D case $\\small\\mathbf{x}$ is our position and velocity, and $\\small\\mathbf{P}$ is the *covariance* of the position and velocity. It is the same thing, just in higher dimensions!\n",
"> Summary: For our dog tracking problem, in the 1-D case $\\mu$ was the position, and $\\sigma^2$ was the variance. In the 2-D case $\\mathbf{x}$ is our position and velocity, and $\\mathbf{P}$ is the *covariance* of the position and velocity. It is the same thing, just in higher dimensions!\n",
"\n",
"\n",
"All that is left is to run the code! The *DogSensor* class from the previous chapter has been placed in *DogSensor.py*."
"All that is left is to run the code! The $\\tt DogSensor$ class from the previous chapter has been placed in $\\verb,DogSensor.py,$."
]
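Putting the steps together, the setup described above amounts to roughly the following (a sketch assuming the book's $\verb,KalmanFilter,$ class is importable from $\verb,KalmanFilter.py,$ and takes the dimension as its constructor argument):

    import numpy as np
    from KalmanFilter import KalmanFilter

    f = KalmanFilter(dim=2)
    f.x = np.mat([[0.], [0.]])        # step 1: position and velocity start at 0
    f.F = np.mat([[1., 1.],
                  [0., 1.]])          # step 2: state transition
    f.H = np.mat([[1., 0.]])          # step 3: we measure position only
    f.R = np.mat([[5.]])              # step 4: measurement noise
    f.Q = np.mat(np.zeros((2, 2)))    # step 4: no process noise
    f.P = np.mat(np.eye(2)) * 500.    # step 5: large initial uncertainty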
},
{
@ -1217,9 +1217,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"This is the complete code for the filter, and most of it is just boilerplate. The first function *dog_tracking_filter()* is a helper function that creates a *KalmamFilter* object with specified $\\small\\mathbf{R}$, $\\small\\mathbf{Q}$ and $\\small\\mathbf{P}$ matrices. We've shown this code already, so I will not discuss it more here. \n",
"This is the complete code for the filter, and most of it is just boilerplate. The first function $\\verb,dog_tracking_filter(),$ is a helper function that creates a $\\verb,KalmanFilter,$ object with specified $\\mathbf{R}$, $\\mathbf{Q}$ and $\\mathbf{P}$ matrices. We've shown this code already, so I will not discuss it more here. \n",
"\n",
"The function *filter_dog()* implements the filter itself. Lets work through it line by line. The first line creates the simulation of the DogSensor, as we have seen in the previous chapter.\n",
"The function $\\verb,filter_dog(),$ implements the filter itself. Lets work through it line by line. The first line creates the simulation of the DogSensor, as we have seen in the previous chapter.\n",
"\n",
" dog = DogSensor(velocity=1, noise=noise)\n",
"\n",
@ -1233,7 +1233,7 @@
" zs = [None] * count\n",
" cov = [None] * count\n",
" \n",
"Finally we get to the filter. All we need to do is perform the update and predict steps of the Kalman filter for each measurement. The *KalmanFilter* class provides the two functions *update()* and *predict()* for this purpose. *update()* performs the measurement update step of the Kalman filter, and so it takes a variable containing the sensor measurement. \n",
"Finally we get to the filter. All we need to do is perform the update and predict steps of the Kalman filter for each measurement. The $\\verb,KalmanFilter,$ class provides the two functions $\\verb,update(),$ and $\\verb,predict(),$ for this purpose. $\\verb,update(),$ performs the measurement update step of the Kalman filter, and so it takes a variable containing the sensor measurement. \n",
"\n",
"Absent the bookkeeping work of storing the filter's data, the for loop reads:\n",
"\n",
@ -1242,9 +1242,9 @@
" dog_filter.update (z)\n",
" dog_filter.predict()\n",
" \n",
"It really cannot get much simpler than that. As we tackle more complicated problems this code will remain largely the same; all of the work goes into setting up the $KalmanFilter$ variables; executing the filter is trivial.\n",
"It really cannot get much simpler than that. As we tackle more complicated problems this code will remain largely the same; all of the work goes into setting up the $\\verb,KalmanFilter,$ variables; executing the filter is trivial.\n",
"\n",
"Now let's look at the result. Here is some code that calls *filter_track()* and then plots the result. It is fairly uninteresting code, so I will not walk through it."
"Now let's look at the result. Here is some code that calls $\\verb,filter_track(),$ and then plots the result. It is fairly uninteresting code, so I will not walk through it."
]
},
{
@ -1285,7 +1285,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, call it. We will start by filtering 100 measurements with a noise factor of 30, $\\small\\mathbf{R}=5$ and $\\small\\mathbf{Q}=0$."
"Finally, call it. We will start by filtering 100 measurements with a noise factor of 30, $\\mathbf{R}=5$ and $\\mathbf{Q}=0$."
]
},
{
@ -1324,7 +1324,7 @@
"\n",
"The first plot plots the output of the Kalman filter against the measurements and the actual position of our dog (drawn in green). After the initial settling in period the filter should track the dog's position very closely.\n",
"\n",
"The next two plots show the variance of $x$ and of $\\dot{x}$. If you look at the code, you will see that I have plotted the diagonals of $\\small\\mathbf{P}$ over time. Recall that the diagonal of a covariance matrix contains the variance of each state variable. So $\\small\\mathbf{P}[0,0]$ is the variance of $x$, and $\\small\\mathbf{P}[1,1]$ is the variance of $\\dot{x}$. You can see that despite initializing $\\small\\mathbf{P}=(\\begin{smallmatrix}500&0\\\\0&500\\end{smallmatrix})$ we quickly converge to small variances for both the position and velocity. We will spend a lot of time on the covariance matrix later, so for now I will leave it at that.\n",
"The next two plots show the variance of $x$ and of $\\dot{x}$. If you look at the code, you will see that I have plotted the diagonals of $\\mathbf{P}$ over time. Recall that the diagonal of a covariance matrix contains the variance of each state variable. So $\\mathbf{P}[0,0]$ is the variance of $x$, and $\\mathbf{P}[1,1]$ is the variance of $\\dot{x}$. You can see that despite initializing $\\mathbf{P}=(\\begin{smallmatrix}500&0\\\\0&500\\end{smallmatrix})$ we quickly converge to small variances for both the position and velocity. We will spend a lot of time on the covariance matrix later, so for now I will leave it at that.\n",
"\n",
"In the previous chapter we filtered very noisy signals with much simpler code than the code above. However, realize that right now we are working with a very simple example - an object moving through 1-D space and one sensor. That is about the limit of what we can compute with the code in the last chapter. In contrast, we can implement very complicated, multidimensional filter with this code merely by altering are assignments to the filter's variables. Perhaps we want to track 100 dimensions in financial models. Or we have an aircraft with a GPS, INS, TACAN, radar altimeter, baro altimeter, and airspeed indicator, and we want to integrate all those sensors into a model that predicts position, velocity, and accelerations in 3D (which requires 9 state variables). We can do that with the code in this chapter."
]
@ -1341,9 +1341,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The code in the $KalmanFilter$ is a nearly verbatim transcription of the linear algebra equations. I take advantage of numpy matrices to implement the linear algebra. It is worth looking at this code if for no other reason than to realize how easy it is to implement linear algebra with Python and numpy. For most of this book you will only really need to know how to call this class, not how to implement it from scratch.\n",
"The code in the $\\verb,KalmanFilter,$ is a nearly verbatim transcription of the linear algebra equations. I take advantage of numpy matrices to implement the linear algebra. It is worth looking at this code if for no other reason than to realize how easy it is to implement linear algebra with Python and numpy. For most of this book you will only really need to know how to call this class, not how to implement it from scratch.\n",
"\n",
"> **sidebar**: numpy provides two data structures which can be used to perform linear algebra: *numpy.array* and *numpy.matrix*. The usual advice is to use *numpy.array*, not *numpy.matrix*. Ever the contrarian, I have chosen to use *numpy.matrix*, but for what I think are good pedalogical reasons. *numpy.array* is usually recommended because it can be sized to any arbitrary number of dimensions, and *numpy.matrix* is constrained to two dimensions. However, for Kalman filters we only need 2 dimensions. More importantly, *numpy.matrix* allows you to use very natural sytax. Multipying a by b is written *a&ast;b* if using *numpy.matrix*, but *a.dot(b)* if they are *numpy.array*. It is also more natural to mix scalars and matrices using *numpy.matrix*. Finally, the resulting code is extremely close to the equivelent Matlab code; if you are more familiar with Matlab than Python this code should feel very familiar to you.\n",
"> **sidebar**: numpy provides two data structures which can be used to perform linear algebra: $\\verb,numpy.array,$ and $\\verb,numpy.matrix,$. The usual advice is to use $\\verb,numpy.array,$, not $\\verb,numpy.matrix,$. Ever the contrarian, I have chosen to use $\\verb,numpy.matrix,$, but for what I think are good pedalogical reasons. $\\verb,numpy.array,$ is usually recommended because it can be sized to any arbitrary number of dimensions, and $\\verb,numpy.matrix,$ is constrained to two dimensions. However, for Kalman filters we only need 2 dimensions. More importantly, $\\verb,numpy.matrix,$ allows you to use very natural sytax. Multipying a by b is written $\\verb,a*b,$ if using $\\verb,numpy.matrix,$, but $\\verb,a.dot(b),$ if they are $\\verb,numpy.array,$. It is also more natural to mix scalars and matrices using $\\verb,numpy.matrix,$. Finally, the resulting code is extremely close to the equivalent Matlab code; if you are more familiar with Matlab than Python this code should feel very familiar to you.\n",
"\n",
"\n",
"The constructor of the class creates variables for each of the Kalman filter variables, and assigns them a reasonable default value. This is the code in its entirety:\n",
@ -1361,7 +1361,7 @@
" self.R = np.matrix(np.eye(1))\n",
" self.I = np.matrix(np.eye(dim))\n",
"\n",
"The function *predict()* implements the Kalman filter prediction equations.\n",
"The function $\\verb,predict(),$ implements the Kalman filter prediction equations.\n",
"\n",
" def predict(self):\n",
" self.x = (self.F*self.x) + (self.B * self.u)\n",
@ -1376,7 +1376,7 @@
"\\end{aligned}\n",
"$$\n",
"\n",
"Finally, the *update()* function implements the Kalman filter update equations in an equally straightforward way:\n",
"Finally, the $\\verb,update(),$ function implements the Kalman filter update equations in an equally straightforward way:\n",
"\n",
" def update(self, Z):\n",
" \"\"\"\n",
@ -1390,7 +1390,7 @@
" self.x = self.x + (K*y)\n",
" self.P = (self.I - (K*self.H))*self.P\n",
" \n",
"Finally, for those reading this online or in a printed form, here is the code in KalmanFilter.py absent the unit testing code that is included in that file."
"Finally, for those reading this online or in a printed form, here is the code in $\\verb,KalmanFilter.py,$ absent the unit testing code that is included in that file."
]
},
{
@ -1458,9 +1458,9 @@
"source": [
"Your results will vary slightly depending on what numbers your random generator creates for the noise componenet of the noise, but the filter in the last section should track the actual position quite well. Typically as the filter starts up the first several predictions are quite bad, and varies a lot. But as the filter builds its state the estimates become much better. \n",
"\n",
"Let's start varying our parameters to see the effect of various changes. This is a *very normal* thing to be doing with Kalman filters. It is difficult, and often impossible to exactly model our sensors. An imperfect model means imperfect output from our filter. Engineers spend a lot of time tuning Kalman filters so that they perform well with real world sensors. We will spend time now to learn the effect of these changes. As you learn the effect of each change you will develop an intuition for how to design a Kalman filter. As I wrote earlier, designing a Kalman filter is as much art as science. The science is, roughly, designing the $\\small{\\mathbf{H}}$ and $\\small{\\mathbf{F}}$ matrices - they develop in an obvious manner based on the physics of the system we are modelling. The art comes in modelling the sensors and selecting appropriate values for the rest of our variables.\n",
"Let's start varying our parameters to see the effect of various changes. This is a *very normal* thing to be doing with Kalman filters. It is difficult, and often impossible to exactly model our sensors. An imperfect model means imperfect output from our filter. Engineers spend a lot of time tuning Kalman filters so that they perform well with real world sensors. We will spend time now to learn the effect of these changes. As you learn the effect of each change you will develop an intuition for how to design a Kalman filter. As I wrote earlier, designing a Kalman filter is as much art as science. The science is, roughly, designing the ${\\mathbf{H}}$ and ${\\mathbf{F}}$ matrices - they develop in an obvious manner based on the physics of the system we are modelling. The art comes in modelling the sensors and selecting appropriate values for the rest of our variables.\n",
"\n",
"Let's look at the effects of the noise parameters $\\small{\\mathbf{R}}$ and $\\small{\\mathbf{Q}}$. I will only run the filter for twenty steps to ensure we can see see the difference between the measurements and filter output. I will start by holding $\\small{\\mathbf{R}}$ to 5 and vary $\\small{\\mathbf{Q}}$. "
"Let's look at the effects of the noise parameters ${\\mathbf{R}}$ and ${\\mathbf{Q}}$. I will only run the filter for twenty steps to ensure we can see see the difference between the measurements and filter output. I will start by holding ${\\mathbf{R}}$ to 5 and vary ${\\mathbf{Q}}$. "
]
},
{
@ -1498,15 +1498,15 @@
"source": [
"The filter in the first plot should follow the noisy measurement almost exactly. In the second plot the filter should vary from the measurement quite a bit, and be much closer to a straight line than in the first graph. \n",
"\n",
"In the Kalman filter $\\small{\\mathbf{R}}$ is the *measurement noise* and $\\small{\\mathbf{Q}}$ is the *process uncertainty*. $\\small{\\mathbf{R}}$ is the same in both plots, so ignore it for the moment. Why does $\\small{\\mathbf{Q}}$ affect the plots this way?\n",
"In the Kalman filter ${\\mathbf{R}}$ is the *measurement noise* and ${\\mathbf{Q}}$ is the *process uncertainty*. ${\\mathbf{R}}$ is the same in both plots, so ignore it for the moment. Why does ${\\mathbf{Q}}$ affect the plots this way?\n",
"\n",
"Let's remind ourselves of what the term *process uncertainty* means. Consider the problem of tracking a ball. We can accurately model its behavior in statid air with math, but if there is any wind our model will diverge from reality. \n",
"\n",
"In the first case we set $\\small{\\mathbf{Q}}=100$, which is quite large. In physical terms this is telling the filter \"I don't trust my motion prediction step\". Strictly speaking, we are telling the filter there is a lot of external noise that we are not modeling with $\\small{\\mathbf{F}}$, but the upshot of that is to not trust the motion prediction step. So the filter will be computing velocity ($\\dot{x}$), but then mostly ignoring it because we are telling the filter that the computation is extremely suspect. Therefore the filter has nothing to use but the measurements, and thus it follows the measurements closely. \n",
"In the first case we set ${\\mathbf{Q}}=100$, which is quite large. In physical terms this is telling the filter \"I don't trust my motion prediction step\". Strictly speaking, we are telling the filter there is a lot of external noise that we are not modeling with $\\small{\\mathbf{F}}$, but the upshot of that is to not trust the motion prediction step. So the filter will be computing velocity ($\\dot{x}$), but then mostly ignoring it because we are telling the filter that the computation is extremely suspect. Therefore the filter has nothing to use but the measurements, and thus it follows the measurements closely. \n",
"\n",
"In the second case we set $\\small{\\mathbf{Q}}=0.1$, which is quite small. In physical terms we are telling the filter \"trust the motion computation, it is really good!\". Again, more strictly this actually says there is very small amounts of process noise, so the motion computation will be accurate. So the filter ends up ignoring some of the measurement as it jumps up and down, because the variation in the measurement does not match our trustworthy velocity prediction. \n",
"In the second case we set ${\\mathbf{Q}}=0.1$, which is quite small. In physical terms we are telling the filter \"trust the motion computation, it is really good!\". Again, more strictly this actually says there is very small amounts of process noise, so the motion computation will be accurate. So the filter ends up ignoring some of the measurement as it jumps up and down, because the variation in the measurement does not match our trustworthy velocity prediction. \n",
"\n",
"Now let's leave $\\small{\\mathbf{Q}}=0.1$, but bump $\\small{\\mathbf{R}}$ up to $1000$. This is telling the filter that the measurement noise is very large. "
"Now let's leave ${\\mathbf{Q}}=0.1$, but bump ${\\mathbf{R}}$ up to $1000$. This is telling the filter that the measurement noise is very large. "
]
},
{
@ -1537,7 +1537,7 @@
"\n",
"The filter is strongly preferring the motion update to the measurement, so if the prediction is off it takes a lot of measurements to correct it. It will eventually correct because the velocity is a hidden variable - it is computed from the measurements, but it will take awhile.\n",
"\n",
"To some extent you can get similar looking output by varying either $\\small{\\mathbf{R}}$ or $\\small{\\mathbf{Q}}$, but I urge you to not 'magically' alter these until you get output that you like. Always think about the physical implications of these assignments, and vary $\\small{\\mathbf{R}}$ and/or $\\small{\\mathbf{Q}}$ based on your knowledge of the system you are filtering."
"To some extent you can get similar looking output by varying either ${\\mathbf{R}}$ or ${\\mathbf{Q}}$, but I urge you to not 'magically' alter these until you get output that you like. Always think about the physical implications of these assignments, and vary ${\\mathbf{R}}$ and/or ${\\mathbf{Q}}$ based on your knowledge of the system you are filtering."
]
},
{
@ -1552,7 +1552,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"So far I have not given a lot of coverage of the covariance matrix. $\\small\\mathbf{P}$, the covariance matrix is nothing more than the variance of our state - such as the position of our dog. It has many elements in it, but don't be daunted; we will learn how to interpret a very large $9{\\times}9$ covariance matrix, or even larger.\n",
"So far I have not given a lot of coverage of the covariance matrix. $\\mathbf{P}$, the covariance matrix is nothing more than the variance of our state - such as the position of our dog. It has many elements in it, but don't be daunted; we will learn how to interpret a very large $9{\\times}9$ covariance matrix, or even larger.\n",
"\n",
"Recall the beginning of the chapter, where we provided the equation for the covariance matrix. It read:\n",
"\n",
@ -1565,7 +1565,7 @@
" \\end{pmatrix}\n",
"$$\n",
"\n",
"(I have subtituted $\\small\\mathbf{P}$ for $\\Sigma$ because of the nomenclature used by the Kalman filter literature).\n",
"(I have subtituted $\\mathbf{P}$ for $\\Sigma$ because of the nomenclature used by the Kalman filter literature).\n",
"\n",
"The diagonal contains the variance of each of our state variables. So, if our state variables are\n",
"\n",
@ -1731,13 +1731,13 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The output on these is a bit messy, but you should be able to see what is happening. In both plots we are drawing the covariance matrix for each point. We start with the covariance $\\small\\mathbf{P}=(\\begin{smallmatrix}50&0\\\\0&50\\end{smallmatrix})$, which signifies a lot of uncertainty about our initial belief. After we receive the first measurement the Kalman filter updates this belief, and so the variance is no longer as large. In the top plot the first ellipse (the one on the far left) should be a slighly squashed ellipse. As the filter continues processing the measurements the covariance ellipse quickly shifts shape until it settles down to being a long, narrow ellipse tilted in the direction of movement.\n",
"The output on these is a bit messy, but you should be able to see what is happening. In both plots we are drawing the covariance matrix for each point. We start with the covariance $\\mathbf{P}=(\\begin{smallmatrix}50&0\\\\0&50\\end{smallmatrix})$, which signifies a lot of uncertainty about our initial belief. After we receive the first measurement the Kalman filter updates this belief, and so the variance is no longer as large. In the top plot the first ellipse (the one on the far left) should be a slighly squashed ellipse. As the filter continues processing the measurements the covariance ellipse quickly shifts shape until it settles down to being a long, narrow ellipse tilted in the direction of movement.\n",
"\n",
"Think about what this means physically. The x-axis of the ellipse denotes our uncertainty in position, and the y-axis our uncertainty in velocity. So, an ellipse that is taller than it is wide signifies that we are more uncertain about the velocity than the position. Conversely, a wide, narrow ellipse shows high uncertainty in position and low uncertainty in velocity. Finally, the amount of tilt shows the amount of correlation between the two variables. \n",
"\n",
"The first plot, with $\\small\\mathbf{R}=5$, finishes up with an ellipse that is wider than it is tall. If that is not clear I have printed out the variances for the last ellipse in the lower right hand corner. The variance for position is 3.85, and the variance for velocity is 3.0. \n",
"The first plot, with $\\mathbf{R}=5$, finishes up with an ellipse that is wider than it is tall. If that is not clear I have printed out the variances for the last ellipse in the lower right hand corner. The variance for position is 3.85, and the variance for velocity is 3.0. \n",
"\n",
"In contrast, the second plot, with $\\small\\mathbf{R}=0.5$, has a final ellipse that is taller than wide. The ellipses in the second plot are all much smaller than the ellipses in the first plot. This stands to reason because a small $\\small\\mathbf{R}$ implies a small amount of noise in our measurements. Small noise means accurate predictions, and thus a strong belief in our position. "
"In contrast, the second plot, with $\\mathbf{R}=0.5$, has a final ellipse that is taller than wide. The ellipses in the second plot are all much smaller than the ellipses in the first plot. This stands to reason because a small $\\small\\mathbf{R}$ implies a small amount of noise in our measurements. Small noise means accurate predictions, and thus a strong belief in our position. "
]
},
{
@ -1752,7 +1752,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Why are the ellipses for $R=5$ shorter, and more tilted than the ellipses for $\\small\\mathbf{R}=0.5$. Hint: think about this in the context of what these ellipses mean physically, not in terms of the math. If you aren't sure about the answer,change $\\small\\mathbf{R}$ to truly large and small numbers such as 100 and 0.1, observe the changes, and think about what this means. "
"Why are the ellipses for $\\mathbf{R}=5$ shorter, and more tilted than the ellipses for $\\mathbf{R}=0.5$. Hint: think about this in the context of what these ellipses mean physically, not in terms of the math. If you aren't sure about the answer,change $\\mathbf{R}$ to truly large and small numbers such as 100 and 0.1, observe the changes, and think about what this means. "
]
},
{
@ -1767,11 +1767,11 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The $x$ axis is for position, and $y$ is velocity. An ellipse that is vertical, or nearly so, says there is no correlation between position and velocity, and an ellipse that is diagnal says that there is a lot of correlation. Phrased that way, it sounds unlikely - either they are correlated or not. But this is a measure of the *output of the filter*, not a description of the actual, physical world. When $\\small\\mathbf{R}$ is very large we are telling the filter that there is a lot of noise in the measurements. In that case the Kalman gain $\\small\\mathbf{K}$ is set to favor the prediction over the measurement, and the prediction comes from the velocity state variable. So, there is a large correlation between $x$ and $\\dot{x}$. Conversely, if $\\small\\mathbf{R}$ is small, we are telling the filter that the measurement is very trustworthy, and $\\small\\mathbf{K}$ is set to favor the measurement over the prediction. Why would the filter want to use the prediction if the measurement is nearly perfect? If the filter is not using much from the prediction there will be very little correlation reported. \n",
"The $x$ axis is for position, and $y$ is velocity. An ellipse that is vertical, or nearly so, says there is no correlation between position and velocity, and an ellipse that is diagnal says that there is a lot of correlation. Phrased that way, it sounds unlikely - either they are correlated or not. But this is a measure of the *output of the filter*, not a description of the actual, physical world. When $\\mathbf{R}$ is very large we are telling the filter that there is a lot of noise in the measurements. In that case the Kalman gain $\\mathbf{K}$ is set to favor the prediction over the measurement, and the prediction comes from the velocity state variable. So, there is a large correlation between $x$ and $\\dot{x}$. Conversely, if $\\mathbf{R}$ is small, we are telling the filter that the measurement is very trustworthy, and $\\mathbf{K}$ is set to favor the measurement over the prediction. Why would the filter want to use the prediction if the measurement is nearly perfect? If the filter is not using much from the prediction there will be very little correlation reported. \n",
"\n",
"**This is a critical point to understand!**. The Kalman filter is just a mathematical model for a real world system. A report of little correlation *does not mean* there is no correlation in the physical system, just that there was no correlation in the mathematical model. It's just a report of how much measurement vs prediction was incorporated into the model. \n",
"\n",
"Let's bring that point home with a truly large measurement error. We will set $\\small\\mathbf{R}=500$. Think about what the plot will look like before scrolling down. To emphasize the issue, I will set the amount of noise injected into the measurements to 0, so the measurement will exactly equal the actual position. "
"Let's bring that point home with a truly large measurement error. We will set $\\mathbf{R}=500$. Think about what the plot will look like before scrolling down. To emphasize the issue, I will set the amount of noise injected into the measurements to 0, so the measurement will exactly equal the actual position. "
]
},
{
@ -1805,7 +1805,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Keep looking at these plots until you grasp how to interpret the covariance matrix $\\small\\mathbf{P}$. When you start dealing with a, say, $9{\\times}9$ matrix it may seem overwhelming - there are 81 numbers to interpret. Just break it down - the diagonal contains the variance for each state variable, and all off diagonal elements are the product of two variances and a scaling factor $p$. You will not be able to plot a $9{\\times}9$ matrix on the screen because it would require living in 10-D space, so you have to develop your intution and understanding in this simple, 2-D case. \n",
"Keep looking at these plots until you grasp how to interpret the covariance matrix $\\mathbf{P}$. When you start dealing with a, say, $9{\\times}9$ matrix it may seem overwhelming - there are 81 numbers to interpret. Just break it down - the diagonal contains the variance for each state variable, and all off diagonal elements are the product of two variances and a scaling factor $p$. You will not be able to plot a $9{\\times}9$ matrix on the screen because it would require living in 10-D space, so you have to develop your intution and understanding in this simple, 2-D case. \n",
"\n",
"> **sidebar**: when plotting covariance ellipses, make sure to always use *plt.axis('equal')* in your code. If the axis use different scales the ellipses will be drawn distorted. For example, the ellipse may be drawn as being taller than it is wide, but it may actually be wider than tall."
]
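A tiny illustration of the sidebar's point (the ellipse here is arbitrary):

    import numpy as np
    import matplotlib.pyplot as plt

    t = np.linspace(0, 2*np.pi, 100)
    plt.plot(3*np.cos(t), np.sin(t))   # an ellipse 3 units wide, 1 unit tall
    plt.axis('equal')                  # without this the aspect ratio is distorted
    plt.show()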

File diff suppressed because one or more lines are too long

book.ipynb (6615 changed lines)

File diff suppressed because one or more lines are too long


@ -1,5 +1,6 @@
python merge_book.py Preface.ipynb Signals_and_Noise.ipynb g-h_filter.ipynb discrete_bayes.ipynb Gaussians.ipynb Kalman_Filters.ipynb Multidimensional_Kalman_Filters.ipynb Kalman_Filter_Math.ipynb Designing_Kalman_Filters.ipynb Extended_Kalman_Filters.ipynb Unscented_Kalman_Filter.ipynb > book.ipynb
python merge_book.py Preface.ipynb Signals_and_Noise.ipynb g-h_filter.ipynb discrete_bayes.ipynb Gaussians.ipynb Kalman_Filters.ipynb Multidimensional_Kalman_Filters.ipynb Kalman_Filter_Math.ipynb Designing_Kalman_Filters.ipynb Extended_Kalman_Filters.ipynb Unscented_Kalman_Filter.ipynb >Kalman_and_Bayesian_Filters_in_Python.ipynb
ipython nbconvert --to latex --template book --post PDF Kalman_and_Bayesian_Filters_in_Python.ipynb


@ -1,7 +1,7 @@
{
"metadata": {
"name": "",
"signature": "sha256:1556bb7a2a15e47b34d1f079dab4a85d78afe50dc20aeb56325b4990232aeebd"
"signature": "sha256:266729b27e2d51ad90c675d939766430799e16551541a85150394b3f7eead92e"
},
"nbformat": 3,
"nbformat_minor": 0,
@ -53,7 +53,7 @@
"\n",
" .text_cell_render h1 {\n",
" font-weight: 200;\n",
" font-size: 36pt;\n",
" font-size: 30pt;\n",
" line-height: 100%;\n",
" color:#c76c0c;\n",
" margin-bottom: 0.5em;\n",
@ -63,7 +63,6 @@
" } \n",
" h2 {\n",
" font-family: 'Open sans',verdana,arial,sans-serif;\n",
" text-indent:1em;\n",
" }\n",
" .text_cell_render h2 {\n",
" font-weight: 200;\n",
@ -71,8 +70,8 @@
" font-style: italic;\n",
" line-height: 100%;\n",
" color:#c76c0c;\n",
" margin-bottom: 1.5em;\n",
" margin-top: 0.5em;\n",
" margin-bottom: 0.5em;\n",
" margin-top: 1.5em;\n",
" display: block;\n",
" white-space: nowrap;\n",
" } \n",
@ -243,13 +242,13 @@
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 3,
"prompt_number": 1,
"text": [
"<IPython.core.display.HTML at 0x247b650>"
"<IPython.core.display.HTML at 0xf9c990>"
]
}
],
"prompt_number": 3
"prompt_number": 1
},
{
"cell_type": "markdown",
@ -300,7 +299,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's create a map of the hallway in another list. Suppose rehere are first two doors close together, and then another door quite a bit further down the hallway. We will use 1 to denote a door, and 0 to denote a wall:"
"Now let's create a map of the hallway in another list. Suppose there are first two doors close together, and then another door quite a bit further down the hallway. We will use 1 to denote a door, and 0 to denote a wall:"
]
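For example, such a map might look like this (the exact layout is illustrative):

    hallway = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]   # doors at positions 0, 1, and 8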
},
{
@ -1069,7 +1068,7 @@
"level": 2,
"metadata": {},
"source": [
"Drawbacks and Limitations to the Discrete Bayesian Filter"
"Drawbacks and Limitations"
]
},
{

File diff suppressed because one or more lines are too long