Start of re-edit.
parent 5b5aabaead
commit 9449955e14
@ -284,21 +284,26 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The techniques in the last chapter are very powerful, but they only work in one dimension. The gaussians represent a mean and variance that are scalars - real numbers. They provide no way to represent multidimensional data, such as the position of a dog in a field. You may retort that you could use two Kalman filters from the last chapter. One would track the x coordinate and the other the y coordinate. That does work, but put that thought aside, because soon you will see some enormous benefits to implementing the multidimensional case. Through one key insight we will achieve markedly better filter performance than was possible with the equations from the last chapter.\n",
"The techniques in the last chapter are very powerful, but they only work with one variable or dimension. Gaussians represent a mean and variance that are scalars - real numbers. They provide no way to represent multidimensional data, such as the position of a dog in a field. You may retort that you could use two Kalman filters from the last chapter. One would track the x coordinate and the other the y coordinate. That does work, but suppose we want to track position, velocity, acceleration, and attitude. These values are related to each other, and as we learned in the g-h chapter we should never throw away information. Through one key insight we will achieve markedly better filter performance than was possible with the equations from the last chapter.\n",
"\n",
"In this chapter I am purposefully glossing over many aspects of the mathematics behind Kalman filters. If you are familiar with the topic you will read statements that you disagree with because they contain simplifications that do not necessarily hold in more general cases. If you are not familiar with the topic, expect some paragraphs to be somewhat 'magical' - it will not be clear how I derived a certain result. I prefer that you develop an intuition for how these filters work through several worked examples. If I started by presenting a rigorous mathematical formulation you would be left scratching your head about what all these terms mean and how you might apply them to your problem. In later chapters I will provide a more rigorous mathematical foundation, and at that time I will have to either correct approximations that I made in this chapter or provide additional information that I did not cover here. \n",
"In this chapter I am purposefully glossing over many aspects of the mathematics behind Kalman filters. Some things I show you will only work for special cases, others will be 'magical' - it will not be clear how I derived a certain result. I prefer that you develop an intuition for how these filters work through several worked examples. If I started with rigorous, generalized equations you would be left scratching your head about what all these terms mean and how you might apply them to your problem. In later chapters I will provide a more rigorous mathematical foundation, and at that time I will have to either correct approximations that I made in this chapter or provide additional information that I did not cover here. \n",
"\n",
"To make this possible we will restrict ourselves to a subset of problems which we can describe with Newton's equations of motion. In the literature these filters are sometimes called \n",
"**discretized continuous-time kinematic filters**. In the *Kalman Filter Math* chapter we will develop the math required for solving any kind of dynamic system. \n",
"To make this possible we will restrict ourselves to a subset of problems which we can describe with Newton's equations of motion. These filters are called **discretized continuous-time kinematic filters**. In the *Kalman Filter Math* chapter we will develop the math required for solving any kind of dynamic system. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Newton's Equations of Motion\n",
"\n",
"\n",
"This subset consists of systems which can be described with Newton's equations of motion: given a constant velocity $v$ of a system we can compute its position $x$ after time $t$ with:\n",
"Newton's equations of motion are: given a constant velocity $v$ of a system we can compute its position $x$ after time $t$ with:\n",
"\n",
"$$x = vt + x_0$$\n",
"\n",
"For example, if we start at position 13 ($x_0=13$), our velocity is 10 m/s ($v=10$) and we travel for 12 seconds($t=12$) our final position is 133 ($133=10\\times 12 + 13$).\n",
"For example, if we start at position 13 ($x_0=13$), our velocity is 10 m/s ($v=10$) and we travel for 12 seconds($t=12$) our final position is $133=(10\\times 12) + 13$.\n",
"\n",
"We can incorporate constant accleration with this equation\n",
"We can incorporate constant acceleration with this equation\n",
"\n",
"$$x = \\frac{1}{2}at^2 + v_0 t + x_0$$\n",
"\n",
@ -306,13 +311,15 @@
"\n",
"$$x = \\frac{1}{6}jt^3 + \\frac{1}{2}a_0 t^2 + v_0 t + x_0$$\n",
"\n",
"As a reminder, we can generate these equations using basic calculus. Given a constant velocity v we can compute the distance traveled over time with the equation\n",
"As a reminder, we generate these equations by integrating a differential equation. Given a constant velocity v we can compute the distance traveled over time with the equation\n",
"\n",
"$$\\begin{aligned} v &= \\frac{dx}{dt}\\\\\n",
"dx &= v\\, dt \\\\\n",
"\\int_{x_0}^x\\, dx &= \\int_0^t v\\, dt\\\\\n",
"x - x_0 &= vt - 0\\\\\n",
"x &= vt + x_0\\end{aligned}$$"
"x &= vt + x_0\\end{aligned}$$\n",
"\n",
"Dynamic systems are describable with differential equations. Most are not easily integrable in this way. We start with Newton because we can integrate and get a closed form solution, which makes the Kalman filter easier to design."
]
},
{
@ -326,9 +333,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"In the last two chapters we used Gaussians for a scalar (one dimensional) variable, expressed as $\\mathcal{N}(\\mu, \\sigma^2)$. A more formal term for this is **univariate normal**, where univariate just means 'one variable'. The probability distribution of the Gaussian is also called the **univariate normal distribution**\n",
"In the last two chapters we used Gaussians for a scalar (one dimensional) variable, expressed as $\\mathcal{N}(\\mu, \\sigma^2)$. A more formal term for this is **univariate normal**, where univariate means 'one variable'. The probability distribution of the Gaussian is known as the **univariate normal distribution**\n",
"\n",
"What might a **multivariate normal distribution** be? In this context, multivariate just means multiple variables. Our goal is to be able to represent a normal distribution across multiple dimensions. Consider the 2 dimensional case. Let's say we believe that $x = 2$ and $y = 17$. This might be the *x* and *y* coordinates for the position of our dog, it might be the position and velocity of our dog on the x-axis, or the temperature and wind speed at our weather station, it doesn't really matter. We can see that for $N$ dimensions, we need $N$ means, which we will arrange in a column matrix (vector) like so:\n",
"What might a **multivariate normal distribution** be? *Multivariate* means multiple variables. Our goal is to be able to represent a normal distribution across multiple dimensions. I don't necessarily mean spatial dimensions - it could be position, velocity, and acceleration. Consider a two dimensional case. Let's say we believe that $x = 2$ and $y = 17$. This might be the *x* and *y* coordinates for the position of our dog, it might be the position and velocity of our dog on the x-axis, or the temperature and wind speed at our weather station. It doesn't really matter. We can see that for $N$ dimensions, we need $N$ means, which we will arrange in a column matrix (vector) like so:\n",
"\n",
"$$\n",
"\\mu = \\begin{bmatrix}{\\mu}_1\\\\{\\mu}_2\\\\ \\vdots \\\\{\\mu}_n\\end{bmatrix}\n",
@ -344,7 +351,9 @@
"\n",
"$$\\sigma^2 = \\begin{bmatrix}10\\\\4\\end{bmatrix}$$ \n",
"\n",
"This is incorrect because it does not consider the more general case. For example, suppose we were tracking house prices vs total $m^2$ of the floor plan. These numbers are **correlated**. It is not an exact correlation, but in general houses in the same neighborhood are more expensive if they have a larger floor plan. We want a way to express not only what we think the variance is in the price and the $m^2$, but also the degree to which they are correlated. We use a **covariance matrix** to denote **covariances** with multivariate normal distributions. You might guess, correctly, that **covariance** is short for **correlated variances**."
"This is incorrect because it does not consider the more general case. For example, suppose we were tracking house prices vs total $m^2$ of the floor plan. These numbers are **correlated**. It is not an exact correlation, but in general houses in the same neighborhood are more expensive if they have a larger floor plan. We want a way to express not only what we think the variance is in the price and the $m^2$, but also the degree to which they are correlated. The **covariance** describes how two variables are correlated. **Covariance** is short for **correlated variances**\n",
"\n",
"We use a **covariance matrix** to denote **covariances** with multivariate normal distributions, and it looks like this:"
]
},
{
@ -360,7 +369,13 @@
" \\end{bmatrix}\n",
"$$\n",
"\n",
"If you haven't seen this before it is probably a bit confusing at the moment. Rather than explain the math right now, we will take our usual tactic of building our intuition first with various thought experiments. At this point, note that the diagonal contains the variance for each state variable, and that all off-diagonal elements (covariances) are represent how much the $i$th (row) and $j$th (column) state variable are linearly correlated to each other. In other words, it is a measure for how much they change together. A covariance of 0 indicates no correlation. So, for example, if the variance for x is 10, the variance for y is 4, and there is no linear correlation between x and y, then we would say\n",
"If you haven't seen this before it is probably a bit confusing. Instead of starting with the mathematical definition I will build your intuition with thought experiments. At this point, note that the diagonal contains the variance for each state variable, and that all off-diagonal elements (covariances) are represent how much the $i$th (horizontal row) and $j$th (vertical column) state variable are linearly correlated to each other. In other words, covariance is a *measure for how much they change together*. \n",
"\n",
"A couple of examples. Generally speaking as the square footage of a house increases the price increases. These variables are correlated. As the temperature of an engine increases its life expectancy lowers. These are **inversely correlated**. The price of tea and the number of tail wags my dog makes have no relation to each other, and we say they are not correlated - each can change independent of the other.\n",
"\n",
"Correlation implies *prediction*. If our houses are in the same neighborhood, and you have twice the square footage I can predict that the price is likely to be higher. This is not guaranteed as there are other factors such as proximity to garbage dumps which also affect the price. If my car engine significantly overheats I start planning on replacing it soon. If my dog wags his tail grocery I don't conclude that tea prices will be increasing.\n",
"\n",
"A covariance of 0 indicates no correlation. So, for example, if the variance for x is 10, the variance for y is 4, and there is no linear correlation between x and y, then we would say\n",
"\n",
"$$\\Sigma = \\begin{bmatrix}10&0\\\\0&4\\end{bmatrix}$$\n",
"\n",
@ -375,18 +390,18 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, without explanation, here is the full equation for the multivariate normal distribution in $n$ dimensions.\n",
"Now, without explanation, here is the multivariate normal distribution in $n$ dimensions.\n",
"\n",
"$$\\mathcal{N}(\\mu,\\,\\Sigma) = \\frac{1}{(2\\pi)^{\\frac{n}{2}}|\\Sigma|^{\\frac{1}{2}}}\\, \\exp \\Big [{ -\\frac{1}{2}(\\mathbf{x}-\\mu)^\\mathsf{T}\\Sigma^{-1}(\\mathbf{x}-\\mu) \\Big ]}\n",
"$$f(\\mathbf{x},\\, \\mu,\\,\\Sigma) = \\frac{1}{(2\\pi)^{\\frac{n}{2}}|\\Sigma|^{\\frac{1}{2}}}\\, \\exp \\Big [{ -\\frac{1}{2}(\\mathbf{x}-\\mu)^\\mathsf{T}\\Sigma^{-1}(\\mathbf{x}-\\mu) \\Big ]}\n",
"$$\n",
"\n",
"I urge you to not try to remember this function. We will program it in a Python function and then call it if we need to compute a specific value. Plus, it turns out that the Kalman filter equations will compute this for us automatically; we never have to compute it ourselves. However, if you look at it briefly you will note that it looks quite similar to the univariate normal distribution except it uses matrices instead of scalar values, and the root of $\\pi$ is scaled by $n$. Here is the univariate equation for reference:\n",
"I urge you to not try to remember this function. We will program it in a Python function and then call it if we need to compute a specific value. Plus, the Kalman filter equations compute this for us automatically; we never have to explicitly compute it. However, note that it has the same form as the univariate normal distribution. It uses matrices instead of scalar values, and the root of $\\pi$ is scaled by $n$. Here is the univariate equation for reference:\n",
"\n",
"$$ \n",
"f(x, \\mu, \\sigma) = \\frac{1}{\\sigma\\sqrt{2\\pi}} \\exp \\Big [{-\\frac{1}{2}}{(x-\\mu)^2}/\\sigma^2 \\Big ]\n",
"$$\n",
"\n",
"The multivariate version merely replaces the scalars of the univariate equations with matrices. If you are reasonably well-versed in linear algebra this equation should look quite manageable; if not, don't worry! Let's just plot it and see what it looks like."
"The multivariate version merely replaces the scalars of the univariate equations with matrices. If you are reasonably well-versed in linear algebra this equation should look quite manageable; if not, don't worry! Let's plot it and see what it looks like."
]
},
{
@ -416,9 +431,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Here we have plotted a two dimensional multivariate Gaussian with a mean of $\\mu=[\\begin{smallmatrix}2\\\\17\\end{smallmatrix}]$ and a covariance of $\\Sigma=[\\begin{smallmatrix}10&0\\\\0&4\\end{smallmatrix}]$. The three dimensional shape shows the probability of for any value of (x,y) in the z-axis. I have projected just the variance for x and y onto the walls of the chart - you can see that they take on the normal Gaussian bell curve shape. You can also see that, as we might hope, that the curve for x is wider than the curve for y, which is explained by $\\sigma_x^2=10$ and $\\sigma_y^2=4$. Also, the highest point of the curve is centered over (2,17), the means for x and y. I hope this demystifies the equation for you. Any multivariate Gaussian will create this sort of shape. If we think of this as a the Gaussian for our dog's position in a two dimensional field, the z-value at each point of (x,y) is the probability density for the dog being at that position. So, he has the highest probability of being near (2,17), a modest probability of being near (5,14), and a very low probability of being near (10,10).\n",
"Here we have plotted a two dimensional multivariate Gaussian with a mean of $\\mu=[\\begin{smallmatrix}2\\\\17\\end{smallmatrix}]$ and a covariance of $\\Sigma=[\\begin{smallmatrix}10&0\\\\0&4\\end{smallmatrix}]$. The three dimensional shape shows the probability of for any value of (x,y) in the z-axis. I have projected the variance for x and y onto the walls of the chart - you can see that they take on the normal Gaussian bell curve shape. You can also see that, as we might hope, that the curve for x is wider than the curve for y, which is explained by $\\sigma_x^2=10$ and $\\sigma_y^2=4$. Also, the highest point of the curve is centered over (2,17), the means for x and y. I hope this demystifies the equation for you. Any multivariate Gaussian will create this sort of shape. If we think of this as a the Gaussian for our dog's position in a two dimensional field, the z-value at each point of (x,y) is the probability density for the dog being at that position. So, he has the highest probability of being near (2,17), a modest probability of being near (5,14), and a very low probability of being near (10,10).\n",
"\n",
"We will discuss the mathematical description of covariances in the Kalman Filter math chapter. For this chapter we just need to understand the following.\n",
"We will discuss the mathematical description of covariances in the Kalman Filter math chapter. For this chapter we need to understand the following.\n",
"\n",
"1. The diagonal of the matrix contains the variance for each variable. \n",
"\n",
@ -426,7 +441,7 @@
"\n",
"3. $\\sigma_{ij} = \\sigma_{ji}$: if i gets larger when j gets larger, then it must be true that j gets larger when i gets larger.\n",
"\n",
"4. The covariance between x and itself is just the variance of x: $\\sigma_{xx} = \\sigma_x^2$.\n",
"4. The covariance between x and itself is the variance of x: $\\sigma_{xx} = \\sigma_x^2$.\n",
"\n",
"5. This chart only shows a 2 dimensional Gaussian, but the equation works for any number of dimensions > 0. It's *kind of* hard to show a chart for the higher dimensions, so we will have to be satisfied with 2 dimensions."
]
@ -464,7 +479,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's use it to compute a few values just to make sure we know how to call and use the function, and then move on to more interesting things.\n",
"Let's use it to compute a few values to ensure we know how to call and use the function, and then move on to more interesting things.\n",
"\n",
"First, let's find the probability density for our dog being at (2.5, 7.3) if we believe he is at (2, 7) with a variance of 8 for $x$ and a variance of 4 for $y$.\n",
"\n",
@ -522,7 +537,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Now just call the function"
"Now call the function"
]
},
{
@ -592,7 +607,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's look at this in a slightly different way. Instead of plotting a surface showing the probability distribution I will just generate 1,000 points with the distribution of $[\\begin{smallmatrix}8&0\\\\0&4\\end{smallmatrix}]$."
"Let's look at this in a slightly different way. Instead of plotting a surface showing the probability distribution I will generate 1,000 points with the distribution of $[\\begin{smallmatrix}8&0\\\\0&4\\end{smallmatrix}]$."
]
},
{
@ -908,9 +923,9 @@
"\\Sigma &\\approx \\frac{\\Sigma_1\\Sigma_2}{(\\Sigma_1+\\Sigma_2)}\n",
"\\end{aligned}$$\n",
"\n",
"In this form we can surmise that these equations are just the linear algebra form of the univariate equations.\n",
"In this form we can surmise that these equations are the linear algebra form of the univariate equations.\n",
"\n",
"Now let's explore multivariate Gaussians in terms of a concrete example. Suppose that we are tracking an aircraft with two radar systems. I will ignore altitude as this is easier to graph in two dimensions. Radars give us the range and bearing to a target. We start out being uncertain about the position of the aircraft, so the covariance, which is just our uncertainty about the position, might look like this. "
"Now let's explore multivariate Gaussians in terms of a concrete example. Suppose that we are tracking an aircraft with two radar systems. I will ignore altitude as this is easier to graph in two dimensions. Radars give us the range and bearing to a target. We start out being uncertain about the position of the aircraft, so the covariance, which is our uncertainty about the position, might look like this. "
]
},
{
@ -940,7 +955,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Now suppose that there is a radar to the lower left of the aircraft. Further suppose that the radar is very accurate in the bearing measurement, but not very accurate at the range. That covariance, which is just the uncertainty in the reading might look like this (plotted in blue):"
"Now suppose that there is a radar to the lower left of the aircraft. Further suppose that the radar is very accurate in the bearing measurement, but not very accurate at the range. That covariance, which is the uncertainty in the reading might look like this (plotted in blue):"
]
},
{
@ -1543,7 +1558,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"What do all of the variables mean? What is $\\mathbf{P}$, for example? Don't worry right now. Instead, I am just going to design a Kalman filter, and introduce the names as we go. Then we will just pass them into Python function that implement the equations above, and we will have our solution. Later sections will then delve into more detail about each step and equation. I think learning by example and practice is far easier than trying to memorize a dozen abstract facts at once. "
"What do all of the variables mean? What is $\\mathbf{P}$, for example? Don't worry right now. Instead, I am going to design a Kalman filter, and introduce the names as we go. Then we will pass them into Python function that implement the equations above, and we will have our solution. Later sections will then delve into more detail about each step and equation. I think learning by example and practice is far easier than trying to memorize a dozen abstract facts at once. "
]
},
{
@ -1557,7 +1572,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Before we go any further let's gain some familiarity with the equations by programming them in Python. I have written a production quality implementation of the Kalman filter equations in my `FilterPy` library, and we will be using that later in the chapter and the remainder of the book. We could just look at that code, but it contains a significant amount of code to ensure that the computations are numerically stable, that you do not pass in bad data, and so on. Let's just try to program this.\n",
"Before we go any further let's gain some familiarity with the equations by programming them in Python. I have written a production quality implementation of the Kalman filter equations in my `FilterPy` library, and we will be using that later in the chapter and the remainder of the book. We could just look at that code, but it contains a significant amount of code to ensure that the computations are numerically stable, that you do not pass in bad data, and so on. Let's try to program this.\n",
"\n",
"The filter equations are *linear algebra* equations, so we will use the Python library that implements linear algebra - NumPy. In the filter equations a **bold** variable denotes a matrix. Numpy provides two types to implement matrices: `numpy.array` and `numpy.matrix`. You might suspect that the latter is the one we want to use. As it turns out `numpy.matrix` does support linear algebra well, except for one problem - most of the rest of `numpy` uses `numpy.array`, not `numpy.matrix`. You can pass a `numpy.matrix` into a function, and get a `numpy.array` back as a result. Hence, the standard advice is that `numpy.matrix` is deprecated, and you should always use `numpy.array` even when `numpy.matrix` is more convenient. I ignored this advice in a early version of this code and ended up regretting that choice, and so now I use `numpy.array` only."
]
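For readers new to NumPy, here is a minimal sketch of the `numpy.array` operations the filter code relies on (the `@` operator performs matrix multiplication):

```python
import numpy as np

F = np.array([[1., 1.],
              [0., 1.]])
x = np.array([[10.],    # position
              [4.5]])   # velocity

print(F @ x)             # matrix-vector multiplication
print(F.T)               # transpose
print(np.linalg.inv(F))  # matrix inverse
```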
@ -1630,11 +1645,11 @@
" 8. compute covariance for x to account for additional\n",
" information the measurement provides\n",
" \n",
"That is the entire Kalman filter algorithm. It is both what we described above in words, and it is what the rather obscure Kalman Filter equations do. This is the same algorithm we used in the last chapter except here $x$ can be multidimesional (it can contain positions in the x-axis and y-axis, velocities, accelerations, and so on). And since $x$ is multidimensional our variance must also be multidimensional, so it becomes a **covariance**. The Kalman filter equations just express this algorithm by using linear algebra. \n",
"That is the entire Kalman filter algorithm. It is both what we described above in words, and it is what the rather obscure Kalman Filter equations do. This is the same algorithm we used in the last chapter except here $x$ can be multidimesional (it can contain positions in the x-axis and y-axis, velocities, accelerations, and so on). And since $x$ is multidimensional our variance must also be multidimensional, so it becomes a **covariance**. The Kalman filter equations express this algorithm by using linear algebra. \n",
"\n",
"Step 4 might be confusing. Suppose we are measuring temperature. The thermometer may provide readings in volts, and thus the noise in the reading is also expressed as volts, not temperature. The Kalman filter will convert everything into volts so it can compute the Kalman gain in terms of volts, not temperature, and then convert everything back to temperatures. This is called working in the **measurement space**. We'll see how that works soon.\n",
"\n",
"As I mentioned above, there is actually very little programming involved in creating a Kalman filter. We will just be defining several matrices and parameters that get passed into the Kalman filter algorithm code. "
"As I mentioned above, there is actually very little programming involved in creating a Kalman filter. We will be defining several matrices and parameters that get passed into the Kalman filter algorithm code. "
]
},
{
@ -1643,7 +1658,7 @@
"source": [
"## Designing the Kalman Filter\n",
"\n",
"Rather than try to explain each of the steps ahead of time, which can be a bit abstract and hard to follow, let's just do it for our by now well known dog tracking problem. Naturally this one example will not cover every use case of the Kalman filter, but we will learn by starting with a simple problem and then slowly start addressing more complicated situations. But based on the algorithm above we can see that we have "
"Rather than try to explain each of the steps ahead of time, which can be a bit abstract and hard to follow, let's do it for our by now well known dog tracking problem. Naturally this one example will not cover every use case of the Kalman filter, but we will learn by starting with a simple problem and then slowly start addressing more complicated situations. But based on the algorithm above we can see that we have "
]
},
{
@ -1675,7 +1690,7 @@
"\n",
"$$\\mathbf{x} =\\begin{bmatrix}x \\\\ \\dot{x}\\end{bmatrix}$$\n",
"\n",
"We use $\\mathbf{x}$ instead of $\\mu$, but recognize this is just the mean of the multivariate Gaussian. It is common to use the transpose notation to write the state variable as $\\mathbf{x} =\\begin{bmatrix}x & \\dot{x}\\end{bmatrix}^\\mathbf{T}$ because the transpose of a row vector is a column vector.\n",
"We use $\\mathbf{x}$ instead of $\\mu$, but recognize this is the mean of the multivariate Gaussian. It is common to use the transpose notation to write the state variable as $\\mathbf{x} =\\begin{bmatrix}x & \\dot{x}\\end{bmatrix}^\\mathbf{T}$ because the transpose of a row vector is a column vector.\n",
"\n",
"One point that might be confusing: $\\mathbf{x}$ and $x$ somewhat coincidentally have the same name. If we were tracking the dog in the y-axis we would write $\\mathbf{x} =\\begin{bmatrix}y & \\dot{y}\\end{bmatrix}^\\mathsf{T}$.\n",
"\n",
@ -1706,7 +1721,7 @@
"source": [
"### **Step 2:** Design the State Transition Function\n",
"\n",
"The next step in designing a Kalman filter is telling it how to predict the next state from the current state. We do this by providing it with equations that describe the physical model of the system. For example, for our dog tracking problem we are tracking a moving object, so we just need to provide it with the Newtonian equations for motion. If we were tracking a thrown ball we would have to provide equations for how a ball moves in a gravitational field, and perhaps include the effects of things like air drag. If we were writing a Kalman filter for a rocket we would have to tell it how the rocket responds to its thrusters and main engine. A Kalman filter for a bowling ball would incorporate the effects of friction and ball rotation. You get the idea. \n",
"The next step in designing a Kalman filter is telling it how to predict the next state from the current state. We do this by providing it with equations that describe the physical model of the system. For example, for our dog tracking problem we are tracking a moving object, so we need to provide it with the Newtonian equations for motion. If we were tracking a thrown ball we would have to provide equations for how a ball moves in a gravitational field, and perhaps include the effects of things like air drag. If we were writing a Kalman filter for a rocket we would have to tell it how the rocket responds to its thrusters and main engine. A Kalman filter for a bowling ball would incorporate the effects of friction and ball rotation. You get the idea. \n",
"\n",
"In the language of Kalman filters the physical model is sometimes called the **process model**. That is probably a better term than *physical model* because the Kalman filter can be used to track non-physical things like stock prices. We describe the process model with a set of equations we call the **State Transition Function.** \n",
"\n",
@ -1870,7 +1885,7 @@
"source": [
"### **Step 5**: Design the Measurement Noise Matrix\n",
"\n",
"The *measurement noise matrix* is a matrix that models the noise in our sensors as a covariance matrix. This can be admittedly a very difficult thing to do in practice. A complicated system may have many sensors, the correlation between them might not be clear, and usually their noise is not a pure Gaussian. For example, a sensor might be biased to read high if the temperature is high, and so the noise is not distributed equally on both sides of the mean. Later we will address this topic in detail. For now I just want you to get used to the idea of the measurement noise matrix so we will keep it deliberately simple.\n",
"The *measurement noise matrix* is a matrix that models the noise in our sensors as a covariance matrix. This can be admittedly a very difficult thing to do in practice. A complicated system may have many sensors, the correlation between them might not be clear, and usually their noise is not a pure Gaussian. For example, a sensor might be biased to read high if the temperature is high, and so the noise is not distributed equally on both sides of the mean. Later we will address this topic in detail. For now I want you to get used to the idea of the measurement noise matrix so we will keep it deliberately simple.\n",
"\n",
"In the last chapter we used a variance of 5 meters for our position sensor. Let's use the same value here. The Kalman filter equations uses the symbol $R$ for this matrix. In general the matrix will have dimension $m{\\times}m$, where $m$ is the number of sensors. It is $m{\\times}m$ because it is a covariance matrix, as there may be correlations between the sensors. We have only 1 sensor here so we write:\n",
"\n",
@ -2093,7 +2108,7 @@
"source": [
"Let's look at this line by line. \n",
"\n",
"**1**: We just assign the initial value for our state. Here we just initialize both the position and velocity to zero.\n",
"**1**: We assign the initial value for our state. Here we initialize both the position and velocity to zero.\n",
"\n",
"**2**: We set $\\textbf{F}=\\begin{bmatrix}1&1\\\\0&1\\end{bmatrix}$, as in design step 2 above. \n",
"\n",
@ -2174,7 +2189,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"This is the complete code for the filter, and most of it is just boilerplate. I've made it flexible enough to support several uses in this chapter, so it is a bit verbose. The first function `dog_tracking_filter()` is a helper function that creates a `KalmanFilter` object with specified $\\mathbf{R}$, $\\mathbf{Q}$ and $\\mathbf{P}$ matrices. We've shown this code already, so I will not discuss it more here. \n",
"This is the complete code for the filter, and most of it is\n",
"boilerplate. I've made it flexible enough to support several uses in this chapter, so it is a bit verbose. The first function `dog_tracking_filter()` is a helper function that creates a `KalmanFilter` object with specified $\\mathbf{R}$, $\\mathbf{Q}$ and $\\mathbf{P}$ matrices. We've shown this code already, so I will not discuss it more here. \n",
"\n",
"The function `filter_dog()` implements the filter itself. Lets work through it line by line. The first line creates the simulation of the DogSensor, as we have seen in the previous chapter.\n",
"\n",
@ -2289,7 +2305,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"I want you to get a better feel for how the Gaussians chnage over time, so here is a 3D plot showing the Gaussians from the plot above. I generated this by computing the probability distribution for each over a large area, and then just summing them together so I can show them all in the same plot. Note that it makes no mathematical sense to add these together, it is just a way to display them all at once. The scale of the y axis is much smaller than the x axis, so the Gaussians look very stretched horizontally, but they are not. This scaling just makes it easier to see everything and minimizes the amount of computation needed. You should still be able to tell that our certainty in both the position and velocity improve over time. We know this because the Gaussian gets narrow in both axis as time increases, and it simultaneously gets taller."
"I want you to get a better feel for how the Gaussians change over time, so here is a 3D plot showing the Gaussians from the plot above. I generated this by computing the probability distribution for each over a large area, and then summing them together so I can show them all in the same plot. Note that it makes no mathematical sense to add these together, it is merely a way to display them all at once. The scale of the y axis is much smaller than the x axis, so the Gaussians look very stretched horizontally, but they are not. This scaling makes it easier to see everything and minimizes the amount of computation needed. You should still be able to tell that our certainty in both the position and velocity improve over time. We know this because the Gaussian gets narrow in both axis as time increases, and it simultaneously gets taller."
]
},
{
@ -2427,9 +2443,9 @@
" \n",
"2. Therefore, the simple Bayesian reasoning we used in the last chapter applies to this chapter as well.\n",
"\n",
"3. Therefore, the equations in this chapter might 'look ugly', but they really are just implementing multiplying and addition of Gaussians.\n",
"3. Therefore, the equations in this chapter might 'look ugly', but they really implement multiplying and addition of Gaussians.\n",
"\n",
"> The above might not seem worth emphasizing, but as we continue in the book the mathematical demands will increase significantly. It is easy to get lost in a thicket of linear algebra equations when you read a book or paper on optimal estimation. Any time you start getting lost, just go back to the basics of the predict/update cycle based on residuals between measurements and predictions and the meaning of the math will usually be much clearer. The math *looks* daunting, and can sometimes be very hard to solve analytically, but the concepts are quite simple."
"> The above might not seem worth emphasizing, but as we continue in the book the mathematical demands will increase significantly. It is easy to get lost in a thicket of linear algebra equations when you read a book or paper on optimal estimation. Any time you start getting lost, go back to the basics of the predict/update cycle based on residuals between measurements and predictions and the meaning of the math will usually be much clearer. The math *looks* daunting, and can sometimes be very hard to solve analytically, but the concepts are quite simple."
]
},
{
@ -2584,7 +2600,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Here we can see that the filter cannot acquire the track. This happens because even though the filter is getting reasonably good measurements it assumes that the measurements are bad, and eventually just predicts forward from a bad position at each step. If you think that perhaps that bad initial position would give similar results for a smaller measurement noise, let's set it back to the correct value of 240 $m^2$."
"Here we can see that the filter cannot acquire the track. This happens because even though the filter is getting reasonably good measurements it assumes that the measurements are bad, and eventually predicts forward from a bad position at each step. If you think that perhaps that bad initial position would give similar results for a smaller measurement noise, let's set it back to the correct value of 240 $m^2$."
]
},
{
@ -2847,7 +2863,7 @@
"source": [
"This *looks* good at first blush. The plot does not have the spike that the former plot did; the filter starts tracking the measurements and doesn't take any time to settle to the signal. However, if we look at the plots for P you can see that there is an initial spike for the variance in position, and that it never really converges. Poor design leads to a long convergence time, and suboptimal results. \n",
"\n",
"So despite the filter tracking very close to the actual signal we cannot conclude that the 'magic' is to just use a small $\\text{P}$. Yes, this will avoid having the Kalman filter take time to accurately track the signal, but if we are truly uncertain about the initial measurements this can cause the filter to generate very bad results. If we are tracking a living object we are probably very uncertain about where it is before we start tracking it. On the other hand, if we are filtering the output of a thermometer, we are just as certain about the first measurement as the 1000th. For your Kalman filter to perform well you must set $\\text{P}$ to a value that truly reflects your knowledge about the data. \n",
"So despite the filter tracking very close to the actual signal we cannot conclude that the 'magic' is to use a small $\\text{P}$. Yes, this will avoid having the Kalman filter take time to accurately track the signal, but if we are truly uncertain about the initial measurements this can cause the filter to generate very bad results. If we are tracking a living object we are probably very uncertain about where it is before we start tracking it. On the other hand, if we are filtering the output of a thermometer, we are as certain about the first measurement as the 1000th. For your Kalman filter to perform well you must set $\\text{P}$ to a value that truly reflects your knowledge about the data. \n",
"\n",
"Let's see the result of a bad initial estimate coupled with a very small $\\text{P}$ We will set our initial estimate at 100 m (whereas the dog actually starts at 0m), but set `P=1 m`."
]
@ -3039,7 +3055,7 @@
"source": [
"The $x$ axis is for position, and $y$ is velocity. An ellipse that is vertical, or nearly so, says there is no correlation between position and velocity, and an ellipse that is diagonal says that there is a lot of correlation. Phrased that way, it sounds unlikely - either they are correlated or not. But this is a measure of the *output of the filter*, not a description of the actual, physical world. When $\\mathbf{R}$ is very large we are telling the filter that there is a lot of noise in the measurements. In that case the Kalman gain $\\mathbf{K}$ is set to favor the prediction over the measurement, and the prediction comes from the velocity state variable. So, there is a large correlation between $x$ and $\\dot{x}$. Conversely, if $\\mathbf{R}$ is small, we are telling the filter that the measurement is very trustworthy, and $\\mathbf{K}$ is set to favor the measurement over the prediction. Why would the filter want to use the prediction if the measurement is nearly perfect? If the filter is not using much from the prediction there will be very little correlation reported. \n",
"\n",
"**This is a critical point to understand!**. The Kalman filter is just a mathematical model for a real world system. A report of little correlation *does not mean* there is no correlation in the physical system, just that there was no *linear* correlation in the mathematical model. It's just a report of how much measurement vs prediction was incorporated into the model. \n",
"**This is a critical point to understand!**. The Kalman filter is a mathematical model for a real world system. A report of little correlation *does not mean* there is no correlation in the physical system, just that there was no *linear* correlation in the mathematical model. It's a report of how much measurement vs prediction was incorporated into the model. \n",
"\n",
"Let's bring that point home with a truly large measurement error. We will set $\\mathbf{R}=500 m^2$. Think about what the plot will look like before scrolling down. To emphasize the issue, I will set the amount of noise injected into the measurements to 0, so the measurement will exactly equal the actual position. "
]
@ -3197,7 +3213,7 @@
"\n",
"However, this does not incorporate the process noise $w$. We assume that the process noise is white. This often isn't true, but it makes the math possible, and making this assumption even when it isn't strictly true usually works.\n",
"\n",
"We need to convert $w$ into a covariance matrix so that it can be added to $\\mathbf{P}$. We can do this by computing its expected value: $\\mathbf{Q} = E[\\mathbf{ww}^\\mathsf{T}]$. This is covered in the *Kalman Math* chapter. Once we have $\\mathbf{Q}$ it is just added to the covariance matrix, giving use the equation at the top of this section. I show it here along with an example of how we would do it for the univariate case.\n",
"We need to convert $w$ into a covariance matrix so that it can be added to $\\mathbf{P}$. We can do this by computing its expected value: $\\mathbf{Q} = E[\\mathbf{ww}^\\mathsf{T}]$. This is covered in the *Kalman Math* chapter. Once we have $\\mathbf{Q}$ it is added to the covariance matrix, giving use the equation at the top of this section. I show it here along with an example of how we would do it for the univariate case.\n",
"\n",
"$$\\begin{aligned}\n",
"\\sigma &= \\sigma + \\sigma_{move} &+ &\\sigma_{noise}\\,\\,\\, &Univariate\\\\\n",
@ -3248,7 +3264,7 @@
"\n",
"In other words, the *Kalman gain* equation is doing nothing more than computing a ratio based on how much we trust the prediction vs the measurement. If we are confident in our measurements and unconfident in our predictions $\\mathbf{K}$ will favor the measurement, and vice versa. The equation is complicated because we are doing this in multiple dimensions via matrices, but the concept is simple - scale by a ratio.\n",
"\n",
"Without going into the derivation of $\\mathbf{K}$, I'll say that this equation is the result of finding a value of $\\mathbf{K}$ that optimizes the *mean-square estimation error*. It does this by finding the minimal values for $\\mathbf{P}$ along its diagonal. Recall that the diagonal of $\\mathbf{P}$ is just the variance for each state variable. So, this equation for $\\mathbf{K}$ ensures that the Kalman filter output is optimal. To put this in concrete terms, for our dog tracking problem this means that the estimates for both position and velocity will be optimal in a least squares sense."
"Without going into the derivation of $\\mathbf{K}$, I'll say that this equation is the result of finding a value of $\\mathbf{K}$ that optimizes the *mean-square estimation error*. It does this by finding the minimal values for $\\mathbf{P}$ along its diagonal. Recall that the diagonal of $\\mathbf{P}$ is the variance for each state variable. So, this equation for $\\mathbf{K}$ ensures that the Kalman filter output is optimal. To put this in concrete terms, for our dog tracking problem this means that the estimates for both position and velocity will be optimal in a least squares sense."
]
},
{
@ -3294,7 +3310,7 @@
"\n",
"$$\\mathbf{x} = \\mathbf{x} + \\mathbf{Ky}$$\n",
"\n",
"This just multiplies the residual by the Kalman gain, and adds it to the state variable. In other words, this is the computation of our new estimate.\n",
"This multiplies the residual by the Kalman gain, and adds it to the state variable. In other words, this is the computation of our new estimate.\n",
"\n",
"### Update Covariance\n",
"\n",
@ -3386,7 +3402,7 @@
"source": [
"I have an entire chapter on using the Kalman filter to smooth data; I will not repeat the chapter's information here. However, it is so easy to use, and offers such a profoundly improved output that I will tease you will a few examples. The smoothing chapter is not especially difficult; you are sufficiently prepared to read it now.\n",
"\n",
"Let's assume that we are tracking a car that has been traveling in a straight line. We get a measurement that implies that the car is starting to turn to the left. The Kalman filter moves the state estimate somewhat towards the measurement, but it cannot judge whether this is just a particularly noisy measurement or the true start of a turn. \n",
"Let's assume that we are tracking a car that has been traveling in a straight line. We get a measurement that implies that the car is starting to turn to the left. The Kalman filter moves the state estimate somewhat towards the measurement, but it cannot judge whether this is a particularly noisy measurement or the true start of a turn. \n",
"\n",
"However, if we have future measurements we can decide if a turn was made. Suppose the subsequent measurements all continue turning left. We can then be sure that that a turn was initiated. On the other hand, if the subsequent measurements continued on in a straight line we would know that the measurement was noisy and should be mostly ignored. Instead of making an estimate part way between the measurement and prediction the estimate will either fully incorporate the measurement or ignore it, depending on what the future measurements imply about the object's movement.\n",
"\n",