Fixes small mistakes and corrects equations in chap. 6
parent b0bc969cbe
commit 0ebd3fd488
@@ -810,7 +810,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Let's test this! `KalmanFilter` has a `predict` method that performs the prediction by computing $\\mathbf{\\bar{x}} = \\mathbf{Fx}$. Let's call it and see what happens. We've set the position to 10.0 and the velocity to 0.45 meter/sec. We've defined `dt = 0.1`, which means the time step is 0.1 seconds, so we expect the new position to be 10.45 meters after the innovation. The velocity should be unchanged."
+"Let's test this! `KalmanFilter` has a `predict` method that performs the prediction by computing $\\mathbf{\\bar{x}} = \\mathbf{Fx}$. Let's call it and see what happens. We've set the position to 10.0 and the velocity to 4.5 meter/sec. We've defined `dt = 0.1`, which means the time step is 0.1 seconds, so we expect the new position to be 10.45 meters after the innovation. The velocity should be unchanged."
 ]
 },
 {
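A quick numeric check of the corrected arithmetic, as a plain NumPy sketch rather than the notebook's `KalmanFilter` object; the values 10.0, 4.5, and `dt = 0.1` come from the cell above, everything else is illustrative:

```python
import numpy as np

dt = 0.1
x = np.array([[10.0],    # position (m)
              [4.5]])    # velocity (m/s)
F = np.array([[1.0, dt],
              [0.0, 1.0]])   # constant-velocity state transition

x_bar = F @ x                # prediction: x_bar = F x
print(x_bar.ravel())         # [10.45  4.5 ] -- position advances, velocity unchanged
```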
@@ -950,7 +950,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"You can see that the center of the ellipse shifted by a small amount (from 10 to 10.45) and the ellipse elongated, showing the correlation between position and velocity. How does the filter compute new values for $\\mathbf{\\bar{P}}$, and what is it based on? It's a little to early to discuss this, but recall that in every filter so far the predict step entailed a loss of information. The same is true here. I will give you the details once we have covered a bit more ground."
+"You can see that the center of the ellipse shifted by a small amount (from 10 to 10.90) and the ellipse elongated, showing the correlation between position and velocity. How does the filter compute new values for $\\mathbf{\\bar{P}}$, and what is it based on? It's a little too early to discuss this, but recall that in every filter so far the predict step entailed a loss of information. The same is true here. I will give you the details once we have covered a bit more ground."
 ]
 },
 {
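The shift from 10 to 10.90 corresponds to two `dt = 0.1` predictions at 4.5 m/s. A small sketch of that, with a made-up starting covariance to show the elongation the cell mentions (the notebook's actual $\mathbf{P}$ is not part of this hunk):

```python
import numpy as np

dt = 0.1
F = np.array([[1.0, dt],
              [0.0, 1.0]])
x = np.array([[10.0], [4.5]])
P = np.diag([1.0, 1.0])      # illustrative prior covariance, not the notebook's

for _ in range(2):           # two predict steps: 10 -> 10.45 -> 10.90
    x = F @ x
    P = F @ P @ F.T          # covariance prediction (process noise omitted here)

print(x[0, 0])               # 10.9
print(P)                     # off-diagonal terms appear: position and velocity correlate
```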
@@ -1095,7 +1095,7 @@
 "\n",
 "$$\\mathbf{y} = \\mathbf{z} - \\mathbf{H \\bar{x}}$$\n",
 "\n",
-"where $\\textbf{y}$ is the residual, $\\mathbf{x^-}$ is the prior, $\\textbf{z}$ is the measurement, and $\\textbf{H}$ is the measurement function. So we take the prior, convert it to a measurement, and subtract it from the measurement our sensor gave us. This gives us the difference between our prediction and measurement in measurement space!\n",
+"where $\\mathbf{y}$ is the residual, $\\mathbf{\\bar{x}}$ is the prior, $\\mathbf{z}$ is the measurement, and $\\mathbf{H}$ is the measurement function. So we take the prior, convert it to a measurement, and subtract it from the measurement our sensor gave us. This gives us the difference between our prediction and measurement in measurement space!\n",
 "\n",
 "We need to design $\\mathbf{H}$ so that $\\mathbf{H\\bar{x}}$ yields a measurement. For this problem we have a sensor that measures position, so $\\mathbf{z}$ will be a one variable vector:\n",
 "\n",
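A sketch of the residual computed in measurement space, using the position-only $\mathbf{H} = [1\ \ 0]$ this passage leads up to; the measurement value is made up:

```python
import numpy as np

x_bar = np.array([[10.45],   # prior position (m)
                  [4.5]])    # prior velocity (m/s)
H = np.array([[1.0, 0.0]])   # measurement function: state -> position measurement
z = np.array([[10.2]])       # hypothetical sensor reading (m)

y = z - H @ x_bar            # residual y = z - H x_bar, in measurement space
print(y)                     # [[-0.25]]
```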
@@ -1372,7 +1372,7 @@
 "It really cannot get much simpler than that. As we tackle more complicated problems this code will remain largely the same; all of the work goes into setting up the `KalmanFilter` variables; executing the filter is trivial.\n",
 "\n",
 "\n",
-"The rest of the code optionally plots the results and returns the saved states and covaraniances."
+"The rest of the code optionally plots the results and returns the saved states and covariances."
 ]
 },
 {
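For reference, the "trivial" execution the cell refers to is a predict/update loop along these lines. This is a self-contained sketch assuming FilterPy's `KalmanFilter`; the numbers are placeholders, not the notebook's dog-tracking values:

```python
import numpy as np
from filterpy.kalman import KalmanFilter

dt = 1.0
kf = KalmanFilter(dim_x=2, dim_z=1)
kf.x = np.array([[0.0], [0.0]])           # initial position and velocity
kf.F = np.array([[1.0, dt], [0.0, 1.0]])  # state transition
kf.H = np.array([[1.0, 0.0]])             # measure position only
kf.P *= 500.0                             # initial state uncertainty
kf.R *= 5.0                               # measurement noise variance
kf.Q *= 0.02                              # crude process noise (placeholder)

zs = [1.1, 2.0, 2.9, 4.2, 5.1]            # made-up measurements
xs, covs = [], []
for z in zs:
    kf.predict()
    kf.update(z)
    xs.append(kf.x.copy())                # saved states...
    covs.append(kf.P.copy())              # ...and covariances, as the cell describes
```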
@@ -1422,7 +1422,7 @@
 "source": [
 "There is still a lot to learn, but we have implemented our first, full Kalman filter using the same theory and equations as published by Rudolf Kalman! Code very much like this runs inside of your GPS and phone, inside every airliner, inside of robots, and so on. \n",
 "\n",
-"The first plot plots the output of the Kalman filter against the measurements and the actual position of our dog (drawn in green). After the initial settling in period the filter should track the dog's position very closely. The yellow shaded portion between the black dotted lines shows 1 standard deviations of the filter's variance, which I explain in the next paragraph.\n",
+"The first plot plots the output of the Kalman filter against the measurements and the actual position of our dog (the 'Track' line). After the initial settling in period the filter should track the dog's position very closely. The yellow shaded portion between the black dotted lines shows one standard deviation of the filter's variance, which I explain in the next paragraph.\n",
 "\n",
 "The next two plots show the variance of $x$ and of $\\dot{x}$. If you look at the code, you will see that I have plotted the diagonals of $\\mathbf{P}$ over time. Recall that the diagonal of a covariance matrix contains the variance of each state variable. So $\\mathbf{P}[0,0]$ is the variance of $x$, and $\\mathbf{P}[1,1]$ is the variance of $\\dot{x}$. You can see that despite initializing $\\mathbf{P}=(\\begin{smallmatrix}500&0\\\\0&400\\end{smallmatrix})$ we quickly converge to small variances for both the position and velocity. The covariance matrix $\\mathbf{P}$ tells us the *theoretical* performance of the filter *assuming* everything we tell it is true. Recall from the Gaussian chapter that the standard deviation is the square root of the variance, and that approximately 68% of a Gaussian distribution occurs within one standard deviation. Therefore, if at least 68% of the filter output is within one standard deviation we can be sure that the filter is performing well. In the top chart I have displayed the one standard deviation as the yellow shaded area between the two dotted lines. To my eye it looks like perhaps the filter is slightly exceeding those bounds, so the filter probably needs some tuning. We will discuss this later in the chapter."
 ]
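A sketch of the consistency check described here: track the diagonal of $\mathbf{P}$ over time and see what fraction of the position errors fall within one standard deviation. The track, noise values, and $\mathbf{Q}$ below are made up, not the notebook's:

```python
import numpy as np
from filterpy.kalman import KalmanFilter

np.random.seed(3)
dt, R_var = 1.0, 5.0
kf = KalmanFilter(dim_x=2, dim_z=1)
kf.x = np.array([[0.0], [1.0]])
kf.F = np.array([[1.0, dt], [0.0, 1.0]])
kf.H = np.array([[1.0, 0.0]])
kf.P = np.diag([500.0, 49.0])
kf.R *= R_var
kf.Q *= 0.02                              # placeholder process noise

track = np.arange(50, dtype=float)        # target really moves at 1 m/s
zs = track + np.random.randn(50) * np.sqrt(R_var)

errs, stds = [], []
for z, truth in zip(zs, track):
    kf.predict()
    kf.update(z)
    errs.append(abs(kf.x[0, 0] - truth))
    stds.append(np.sqrt(kf.P[0, 0]))      # P[0,0] is the position variance

frac = np.mean(np.array(errs) <= np.array(stds))
print(f"{frac:.0%} of position errors within one standard deviation")
```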
@@ -1496,7 +1496,7 @@
 "\n",
 "$\\underline{\\textbf{Mean}}$\n",
 "\n",
-"$\\mathbf{x}^- = \\mathbf{Fx} + \\mathbf{Bu}$\n",
+"$\\mathbf{\\bar{x}} = \\mathbf{Fx} + \\mathbf{Bu}$\n",
 "\n",
 "We are starting out easy - you were already exposed to the first equation while designing the state transition function $\\mathbf{F}$ and control function $\\mathbf{B}$. \n",
 "\n",
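A numeric sketch of $\mathbf{\bar{x}} = \mathbf{Fx} + \mathbf{Bu}$. The dog-tracking filter in this chapter has no control input, so the $\mathbf{B}$ and $\mathbf{u}$ below are purely illustrative (an acceleration command):

```python
import numpy as np

dt = 0.1
x = np.array([[10.0], [4.5]])
F = np.array([[1.0, dt],
              [0.0, 1.0]])
B = np.array([[0.5 * dt**2],
              [dt]])           # hypothetical control model: maps an acceleration u
u = np.array([[2.0]])          # into position and velocity change; 2 m/s^2, illustrative

x_bar = F @ x + B @ u          # x_bar = F x + B u
print(x_bar.ravel())           # [10.46  4.7 ]
```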
@@ -1521,7 +1521,7 @@
 "\n",
 "$$\\bar{x} = \\dot{x}\\Delta t + x$$\n",
 "\n",
-"Since we do not have perfect knowledge of the value of $\\dot{x}$ the sum $x^- = \\dot{x}\\Delta t + x$ gains uncertainty. Because the positions and velocities are correlated we cannot simply add the covariance matrices. The correct equation is\n",
+"Since we do not have perfect knowledge of the value of $\\dot{x}$ the sum $\\bar{x} = \\dot{x}\\Delta t + x$ gains uncertainty. Because the positions and velocities are correlated we cannot simply add the covariance matrices. The correct equation is\n",
 "\n",
 "$$\\mathbf{\\bar{P}} = \\mathbf{FPF}^\\mathsf{T}$$\n",
 "\n",
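A numeric sketch of $\mathbf{\bar{P}} = \mathbf{FPF}^\mathsf{T}$: start with uncorrelated position and velocity variances and watch the prediction both inflate the position variance and introduce off-diagonal (correlation) terms. The variances are made up:

```python
import numpy as np

dt = 0.6
F = np.array([[1.0, dt],
              [0.0, 1.0]])
P = np.diag([1.5, 3.0])      # uncorrelated position/velocity variances (illustrative)

P_bar = F @ P @ F.T
print(P_bar)
# [[2.58 1.8 ]   position variance grew by dt**2 * var(velocity) = 0.36 * 3
#  [1.8  3.  ]]  new off-diagonal terms: the prediction correlates x with x-dot
```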
@@ -1568,7 +1568,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"You can see that with a velocity of 5 the position correctly moves 3 units in each 6/10ths of a second step. At each step the width of the ellipse is larger, indicating that we have lost information asbout the position due to adding $\\dot{x}\\Delta t$ to x at each step. The height has not changed - our system model say the velocity does not change, so the belief we have about the velocity cannot change. As time continues you can see that the ellipse becomes more and more tilted. Recall that a tilt indicates *correlation*. $\\mathbf{F}$ linearly correlates $x$ with $\\dot{x}$ with the expression $\\bar{x} = \\dot{x} \\Delta t + x$. The $\\mathbf{FPF}^\\mathsf{T}$ computation correctly incorporates this correlation into the covariance matrix!\n",
+"You can see that with a velocity of 5 the position correctly moves 3 units in each 6/10ths of a second step. At each step the width of the ellipse is larger, indicating that we have lost information about the position due to adding $\\dot{x}\\Delta t$ to x at each step. The height has not changed - our system model says the velocity does not change, so the belief we have about the velocity cannot change. As time continues you can see that the ellipse becomes more and more tilted. Recall that a tilt indicates *correlation*. $\\mathbf{F}$ linearly correlates $x$ with $\\dot{x}$ with the expression $\\bar{x} = \\dot{x} \\Delta t + x$. The $\\mathbf{FPF}^\\mathsf{T}$ computation correctly incorporates this correlation into the covariance matrix!\n",
 "\n",
 "Here is an animation of this equation that allows you to change the design of $\\mathbf{F}$ to see how it affects the shape of $\\mathbf{P}$. The `F00` slider affects the value of F[0, 0]. `covar` sets the initial covariance between the position and velocity ($\\sigma_x\\sigma_{\\dot{x}}$). I recommend answering these questions at a minimum\n",
 "\n",
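The repeated-prediction behaviour described here (position advancing 3 units per 0.6 s step at velocity 5, and the ellipse tilting more each step) can be reproduced without plotting by tracking the correlation coefficient of $\mathbf{\bar{P}}$. The starting covariance is illustrative:

```python
import numpy as np

dt = 0.6
F = np.array([[1.0, dt], [0.0, 1.0]])
x = np.array([[0.0], [5.0]])          # velocity of 5, as in the text
P = np.diag([1.5, 3.0])               # illustrative starting covariance

for step in range(1, 5):
    x = F @ x
    P = F @ P @ F.T
    corr = P[0, 1] / np.sqrt(P[0, 0] * P[1, 1])
    print(step, x[0, 0], round(corr, 3))
# position moves 3 per step; the correlation coefficient keeps climbing (more tilt)
```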
@@ -1637,7 +1637,7 @@
 "source": [
 "### Update Equations\n",
 "\n",
-"The update equations look messier than the predict equations, but that is mostly due to the Kalman filter computing the update in **measurement space**. This is because measurement are not *invertable*. For example, later on we will be using sensors that only give you the range to a target. It is impossible to convert a range into a position - an infinite number of positions (in a circle) will yield the same range. On the other hand, we can always compute the range (measurement) given a position(state).\n",
+"The update equations look messier than the predict equations, but that is mostly due to the Kalman filter computing the update in **measurement space**. This is because measurements are not *invertible*. For example, later on we will be using sensors that only give you the range to a target. It is impossible to convert a range into a position - an infinite number of positions (in a circle) will yield the same range. On the other hand, we can always compute the range (measurement) given a position (state).\n",
 "\n",
 "Before I continue, recall that we are trying to do something very simple: choose a new estimate somewhere between a measurement and a prediction, as in this chart:"
 ]
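A tiny illustration of why the range example is not invertible: the measurement function maps a 2D position to a single range, and many positions share that range. (The dog-tracking filter in this chapter measures position directly; this only illustrates the forward-but-not-inverse point.)

```python
import numpy as np

def h_range(pos, sensor=(0.0, 0.0)):
    """Measurement function: 2D position -> range to the sensor. Not invertible."""
    return np.hypot(pos[0] - sensor[0], pos[1] - sensor[1])

# different positions, identical measurement -- the inverse does not exist
for pos in [(3.0, 4.0), (5.0, 0.0), (0.0, -5.0)]:
    print(pos, h_range(pos))     # each prints 5.0
```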
@@ -1679,14 +1679,14 @@
 "source": [
 "<u>**System Uncertainty**</u>\n",
 "\n",
-"$\\textbf{S} = \\mathbf{HP^-H}^\\mathsf{T} + \\mathbf{R}$\n",
+"$\\textbf{S} = \\mathbf{H\\bar{P}H}^\\mathsf{T} + \\mathbf{R}$\n",
 "\n",
 "To work in measurement space the Kalman filter has to project the covariance matrix into measurement space. The math for this is $\\mathbf{H\\bar{P}H}^\\mathsf{T}$, where $\\mathbf{\\bar{P}}$ is the *prior* covariance and $\\mathbf{H}$ is the measurement function.\n",
 "\n",
 "\n",
 "You should recognize this $\\mathbf{ABA}^\\mathsf{T}$ form - the prediction step used $\\mathbf{FPF}^\\mathsf{T}$ to update $\\mathbf{P}$ with the state transition function. Here, we use the same form to update it with the measurement function. In a real sense the linear algebra is changing the coordinate system for us. \n",
 "\n",
-"Once the covariace is in measurement space we need to account for the sensor noise. This is very easy - we just add matrices. The result is variously called either the **system uncertainty** or **innovation covariance**.\n",
+"Once the covariance is in measurement space we need to account for the sensor noise. This is very easy - we just add matrices. The result is variously called either the **system uncertainty** or **innovation covariance**.\n",
 "\n",
 "I want you to compare the equation for the system uncertainty and the covariance\n",
 "\n",
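A numeric sketch of $\mathbf{S} = \mathbf{H\bar{P}H}^\mathsf{T} + \mathbf{R}$ for the position-only measurement function, showing the $\mathbf{ABA}^\mathsf{T}$ projection into measurement space followed by adding the sensor noise (values are illustrative):

```python
import numpy as np

P_bar = np.array([[2.58, 1.8],
                  [1.8,  3.0]])   # illustrative prior covariance
H = np.array([[1.0, 0.0]])        # position-only measurement function
R = np.array([[5.0]])             # sensor variance

S = H @ P_bar @ H.T + R           # system uncertainty / innovation covariance
print(S)                          # [[7.58]] -- a 1x1 matrix living in measurement space
```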
@@ -1747,13 +1747,13 @@
 "\n",
 "$\\underline{\\texttt{State Update}}$\n",
 "\n",
-"$\\mathbf{x} = \\mathbf{x}^- + \\mathbf{Ky}$\n",
+"$\\mathbf{x} = \\mathbf{\\bar{x}} + \\mathbf{Ky}$\n",
 "\n",
-"We select our new state to be along the residual, scaled by the Kalman gain. The scaling is performed by $\\mathbf{Ky}$, which both scales the residual and converts it back into state space. This is added to the prior, yielding the equation: $\\mathbf{x} =\\mathbf{x}^- + \\mathbf{Ky}$.\n",
+"We select our new state to be along the residual, scaled by the Kalman gain. The scaling is performed by $\\mathbf{Ky}$, which both scales the residual and converts it back into state space. This is added to the prior, yielding the equation: $\\mathbf{x} =\\mathbf{\\bar{x}} + \\mathbf{Ky}$.\n",
 "\n",
 "$\\underline{\\texttt{Covariance Update}}$\n",
 "\n",
-"$\\mathbf{P} = (\\mathbf{I}-\\mathbf{KH})\\mathbf{P}^-$\n",
+"$\\mathbf{P} = (\\mathbf{I}-\\mathbf{KH})\\mathbf{\\bar{P}}$\n",
 "\n",
 "$\\mathbf{I}$ is the identity matrix, and is the way we represent $1$ in multiple dimensions. $\\mathbf{H}$ is our measurement function, and is a constant. So, simplified, this is simply $\\mathbf{P} = (1-c\\mathbf{K})\\mathbf{P}$. $\\mathbf{K}$ is our ratio of how much prediction vs measurement we use. So, if $\\mathbf{K}$ is large then $(1-\\mathbf{cK})$ is small, and $\\mathbf{P}$ will be made smaller than it was. If $\\mathbf{K}$ is small, then $(1-\\mathbf{cK})$ is large, and $\\mathbf{P}$ will be made larger than it was. So we adjust the size of our uncertainty by some factor of the *Kalman gain*.\n",
 "\n",
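Putting the update equations together in one short NumPy sketch (Kalman gain, state update, covariance update), with the same illustrative numbers as in the earlier sketches:

```python
import numpy as np

x_bar = np.array([[10.45], [4.5]])      # prior state
P_bar = np.array([[2.58, 1.8],
                  [1.8,  3.0]])          # prior covariance (illustrative)
H = np.array([[1.0, 0.0]])
R = np.array([[5.0]])
z = np.array([[10.2]])

y = z - H @ x_bar                        # residual
S = H @ P_bar @ H.T + R                  # system uncertainty
K = P_bar @ H.T @ np.linalg.inv(S)       # Kalman gain
x = x_bar + K @ y                        # x = x_bar + K y
P = (np.eye(2) - K @ H) @ P_bar          # P = (I - K H) P_bar

print(x.ravel())                         # estimate pulled part-way toward the measurement
print(np.diag(P))                        # both variances shrink after the update
```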
@@ -1792,7 +1792,7 @@
 "\n",
 "The notation above makes use of the Bayesian a$\\mid$b notation, which means a given the evidence of b. The hat means estimate. So, $\\hat{\\mathbf{x}}_{k\\mid k}$ means the estimate of the state $\\mathbf{X}$ at time $k$ (the first k) given the evidence from time $k$ (the second k). The posterior, in other words. $\\hat{\\mathbf{x}}_{k\\mid k-1}$ means the estimate for the state $\\mathbf{x}$ at time k given the estimate from time k - 1. The prior, in other words. \n",
 "\n",
-"This notation allows a mathematician to express himself exactly, and when it comes to formal publications presenting new results this precision is necessary. As a programmer I find all of that fairly unreadable; I am used to thinking about variables changing state as a program runs, and do not use a different variable name for each new computation. There is no agreed upon format, so each author makes different choices. I find it challenging to switch quickly between books an papers, and so have adopted my admittedly less precise notation. Mathematicians will write scathing emails to me, but I hope the programmers and students will rejoice.\n",
+"This notation allows a mathematician to express himself exactly, and when it comes to formal publications presenting new results this precision is necessary. As a programmer I find all of that fairly unreadable; I am used to thinking about variables changing state as a program runs, and do not use a different variable name for each new computation. There is no agreed upon format, so each author makes different choices. I find it challenging to switch quickly between books and papers, and so have adopted my admittedly less precise notation. Mathematicians will write scathing emails to me, but I hope the programmers and students will rejoice.\n",
 "\n",
 "Here are some examples for how other authors write the prior: $X^*_{n+1,n}$, $\\underline{\\hat{x}}_k(-)$ (really!), $\\hat{\\textbf{x}}^-_{k+1}$, $\\hat{x}_{k}$. If you are lucky an author defines the notation; more often you have to read the equations in context to recognize what the author is doing. Of course, people write within a tradition; papers on Kalman filters in finance are likely to use one set of notations while papers on radar tracking are likely to use a different set. Over time you will start to become familiar with trends, and also instantly recognize when somebody just copied equations wholesale from another work. For example - the equations I gave above were copied from the Wikipedia [Kalman Filter](https://en.wikipedia.org/wiki/Kalman_filter#Details) [[1]](#[wiki_article]) article.\n",
 "\n",
@@ -2047,7 +2047,7 @@
 "\n",
 "Let's remind ourselves of what the term *process uncertainty* means. Consider the problem of tracking a ball. We can accurately model its behavior in static air with math, but if there is any wind our model will diverge from reality. \n",
 "\n",
-"In the first case we set `Q_var=20 m^2`, which is quite large. In physical terms this is telling the filter \"I don't trust my motion prediction step\" as we are saying that the variance in the velocity is 10. Strictly speaking, we are telling the filter there is a lot of external noise that we are not modeling with $\\small{\\mathbf{F}}$, but the upshot of that is to not trust the motion prediction step. So the filter will be computing velocity ($\\dot{x}$), but then mostly ignoring it because we are telling the filter that the computation is extremely suspect. Therefore the filter has nothing to use but the measurements, and thus it follows the measurements closely. \n",
+"In the first case we set `Q_var=20 m^2`, which is quite large. In physical terms this is telling the filter \"I don't trust my motion prediction step\" as we are saying that the variance in the velocity is 20. Strictly speaking, we are telling the filter there is a lot of external noise that we are not modeling with $\\small{\\mathbf{F}}$, but the upshot of that is to not trust the motion prediction step. So the filter will be computing velocity ($\\dot{x}$), but then mostly ignoring it because we are telling the filter that the computation is extremely suspect. Therefore the filter has nothing to use but the measurements, and thus it follows the measurements closely. \n",
 "\n",
 "In the second case we set `Q_var=0.02 m^2`, which is quite small. In physical terms we are telling the filter \"trust the motion computation, it is really good!\". Again, more strictly this actually says there is a very small amount of process noise (variance 0.02 $m^2$), so the motion computation will be accurate. So the filter ends up ignoring some of the measurement as it jumps up and down, because the variation in the measurement does not match our trustworthy velocity prediction."
 ]
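A sketch of the two settings being compared. It assumes FilterPy's `Q_discrete_white_noise` helper; the track and noise values are placeholders, not the notebook's. With the large `Q_var` the gain stays high and the estimate tracks each measurement; with the tiny `Q_var` the estimates are heavily smoothed:

```python
import numpy as np
from filterpy.kalman import KalmanFilter
from filterpy.common import Q_discrete_white_noise

def make_filter(Q_var, dt=1.0, R_var=5.0):
    kf = KalmanFilter(dim_x=2, dim_z=1)
    kf.x = np.array([[0.0], [0.0]])
    kf.F = np.array([[1.0, dt], [0.0, 1.0]])
    kf.H = np.array([[1.0, 0.0]])
    kf.P = np.diag([500.0, 49.0])
    kf.R *= R_var
    kf.Q = Q_discrete_white_noise(dim=2, dt=dt, var=Q_var)
    return kf

np.random.seed(1)
zs = np.arange(20, dtype=float) + np.random.randn(20) * np.sqrt(5.0)

jumpy, smooth = make_filter(20.0), make_filter(0.02)
for z in zs:
    for kf in (jumpy, smooth):
        kf.predict()
        kf.update(z)

print(jumpy.x[0, 0], smooth.x[0, 0], zs[-1])  # compare each estimate with the last noisy z
```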
@@ -2375,7 +2375,7 @@
 "\n",
 "So despite the filter tracking very close to the actual signal we cannot conclude that the 'magic' is to use a small $\\mathbf{P}$. Yes, this will avoid having the Kalman filter take time to accurately track the signal, but if we are truly uncertain about the initial measurements this can cause the filter to generate very bad results. If we are tracking a living object we are probably very uncertain about where it is before we start tracking it. On the other hand, if we are filtering the output of a thermometer, we are as certain about the first measurement as the 1000th. For your Kalman filter to perform well you must set $\\mathbf{P}$ to a value that truly reflects your knowledge about the data. \n",
 "\n",
-"Let's see the result of a bad initial estimate coupled with a very small $\\mathbf{P}$ We will set our initial estimate at 100 m (whereas the dog actually starts at 0m), but set `P=1 m`."
+"Let's see the result of a bad initial estimate coupled with a very small $\\mathbf{P}$. We will set our initial estimate at 100 m (whereas the dog actually starts at 0m), but set `P=1 m`."
 ]
 },
 {
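A sketch of the experiment this cell sets up: a confident but wrong prior. The supporting values ($\mathbf{F}$, $\mathbf{R}$, $\mathbf{Q}$, the simulated track) are placeholders; the point is only that `x = 100` combined with a tiny $\mathbf{P}$ produces a large transient before the filter recovers:

```python
import numpy as np
from filterpy.kalman import KalmanFilter

np.random.seed(2)
dt = 1.0
kf = KalmanFilter(dim_x=2, dim_z=1)
kf.x = np.array([[100.0], [0.0]])         # bad initial estimate: dog really starts at 0 m
kf.P = np.diag([1.0, 1.0])                # ...yet we claim to be very sure about it
kf.F = np.array([[1.0, dt], [0.0, 1.0]])
kf.H = np.array([[1.0, 0.0]])
kf.R *= 5.0
kf.Q *= 0.02

zs = np.arange(30, dtype=float) + np.random.randn(30) * np.sqrt(5.0)
for i, z in enumerate(zs[:8]):
    kf.predict()
    kf.update(z)
    print(i, round(kf.x[0, 0], 1))        # true position is about i; note the transient
```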
@@ -2598,7 +2598,7 @@
 "## Filter Initialization\n",
 "\n",
 "\n",
-"There are many schemes for initializing the filter (i.e. choosing the initial values for $\\mathbf{x}$ and $\\mathbf{P}$. I will share a common approach that performs well in most situations. In this scheme you do not initialize the filter until you get the first measurement, $\\mathbf{z}_0$. From this you can compute the initial value for $\\mathbf{x}$ with $\\mathbf{x}_0 = \\mathbf{z}_0$. If $\\mathbf{z}$ is not of the same size, type, and units as $\\mathbf{x}$, which is usually the case, we can use our measurement function as follow.\n",
+"There are many schemes for initializing the filter (i.e. choosing the initial values for $\\mathbf{x}$ and $\\mathbf{P}$). I will share a common approach that performs well in most situations. In this scheme you do not initialize the filter until you get the first measurement, $\\mathbf{z}_0$. From this you can compute the initial value for $\\mathbf{x}$ with $\\mathbf{x}_0 = \\mathbf{z}_0$. If $\\mathbf{z}$ is not of the same size, type, and units as $\\mathbf{x}$, which is usually the case, we can use our measurement function as follows.\n",
 "\n",
 "We know\n",
 "\n",
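One concrete version of this initialization scheme, sketched with FilterPy. The specific choices here (velocity set to 0, its variance set from a guessed maximum speed, position variance set from the sensor variance) are common practice but illustrative; they are not necessarily the choices the notebook goes on to make:

```python
import numpy as np
from filterpy.kalman import KalmanFilter

def init_from_first_measurement(z0, dt=1.0, R_var=5.0, max_speed=20.0):
    """Defer filter construction until the first measurement z0 arrives."""
    kf = KalmanFilter(dim_x=2, dim_z=1)
    kf.F = np.array([[1.0, dt], [0.0, 1.0]])
    kf.H = np.array([[1.0, 0.0]])
    kf.R *= R_var
    kf.x = np.array([[z0], [0.0]])          # position straight from z0, velocity unknown
    kf.P = np.diag([R_var, max_speed**2])   # position no more certain than the sensor;
    return kf                               # velocity variance wide enough for plausible speeds

kf = init_from_first_measurement(z0=10.2)
print(kf.x.ravel(), np.diag(kf.P))
```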