Expanding the math chapter.

I have once again altered some of my notation - I need to go back
and revise the rest of the book to use it. For now, the book is
in an inconsistent state as far as notation goes, but each chapter
should be self-consistent.
This commit is contained in:
Roger Labbe 2014-09-11 09:48:22 -07:00
parent 049783e92f
commit d5f4a15ad9


@ -1,7 +1,7 @@
{
"metadata": {
"name": "",
"signature": "sha256:5e618e97b591e89d2e34bd50626e7a9afaab2de1c0ab60da49a9119975fdd7aa"
"signature": "sha256:a73447c9246d1dd1dc9c4cb604b69a1588146a3a4d575d676bb8402d8964f8d0"
},
"nbformat": 3,
"nbformat_minor": 0,
@ -300,10 +300,15 @@
"\\end{aligned}\n",
"$$\n",
"\n",
"But, of course we know this is not all that is happening. First, we do not have perfect measures of things like the velocity and acceleration - there is always noise in the measurements, and we have to model that. Second, no car travels on a perfect road. There are bumps that cause the car to slow down, there is wind drag, there are hills that raise and lower the speed. If we do not have explicit knowledge of these factors we lump them all together under the term \"process noise\".\n",
"And once we learned calculus we saw them in this form:\n",
"\n",
"Trying to model off of those factors explicitly and exactly is impossible for anything but the most trivial problem. I could try to modify Newton's equations for things like bumps in the road, the behavior of the car's suspension system, heck, the effects of hitting bugs with the windshield, but the job would never be done - there would always be more effects to add. What is worse, each of those models would in themselves be a simplification - do I assume the wind is constant, that the drag of the car is the same for all angles of the wind, that the suspension act as perfect springs, and so on?\n",
"$$\n",
" \\mathbf{v} = \\frac{d \\mathbf{d}}{d t}\\\\ \\quad \\mathbf{a} = \\frac{d \\mathbf{v}}{d t} = \\frac{d^2 \\mathbf{d}}{d t^2} \\,\\!\n",
" $$\n",
" \n",
"A typical problem would have you compute the distance travelled given a constant velocity or acceleration. But, of course we know this is not all that is happening. First, we do not have perfect measures of things like the velocity and acceleration - there is always noise in the measurements, and we have to model that. Second, no car travels on a perfect road. There are bumps that cause the car to slow down, there is wind drag, there are hills that raise and lower the speed. If we do not have explicit knowledge of these factors we lump them all together under the term \"process noise\".\n",
"\n",
"Trying to model all of those factors explicitly and exactly is impossible for anything but the most trivial problem. I could try to include equations for things like bumps in the road, the behavior of the car's suspension system, heck, the effects of hitting bugs with the windshield, but the job would never be done - there would always be more effects to add. What is worse, each of those models would in themselves be a simplification - do I assume the wind is constant, that the drag of the car is the same for all angles of the wind, that the suspension act as perfect springs, that the suspension for each wheel acts identically, and so on.\n",
"\n",
"So control theory makes a mathematically correct simplification. We acknowledge that there are many factors that influence the system that we either do not know or that we don't want to have to model. At any time $t$ we say that the actual value (say, the position of our car) is the predicted value plus some unknown process noise:\n",
"\n",
@ -311,22 +316,22 @@
"x(t) = x_{pred}(t) + noise(t)\n",
"$$\n",
"\n",
"This is not meant to imply that $noise(t)$ is a function that we can derive analytically or that it is well behaved. If there is a bump in the road at $t=10$ then the noise factor will just incorporate that effect. Again, this is not implying that we model, compute, or even know the value of *noise(t)*, it is merely a statement of fact - we can *always* describe the actual value as the predicted value plus some other value. \n",
"This is not meant to imply that $noise(t)$ is a function that we can derive analytically or that it is well behaved. If there is a bump in the road at $t=10$ then the noise factor will just incorporate that effect. Again, this is not implying that we model, compute, or even know the value of *noise(t)*, it is merely a statement of fact - we can *always* describe the actual value as the predicted value from our idealized model plus some other value. \n",
"\n",
"Let's express this in linear algebra. Using the same notation from previous chapters, we can say that our model of the system (without noise) is:\n",
"\n",
"$$ f(x) = Fx$$\n",
"$$ f(\\mathbf{x}) = \\mathbf{Fx}$$\n",
"\n",
"That is, we have a set of linear equations that describe our system. For our car, \n",
"$F$ will be the coefficients for Newton's equations of motion. \n",
"$mathbf{F}$ will be the coefficients for Newton's equations of motion. \n",
"\n",
"Now we need to model the noise. We will just call that *w*, and add it to the equation.\n",
"\n",
"$$ f(x) = Fx + w$$\n",
"$$ f(\\mathbf{x}) = \\mathbf{Fx} + \\mathbf{w}$$\n",
"\n",
"Finally, we need to consider inputs into the system. We are dealing with linear problems here, so we will assume that there is some input $u$ into the system, and that we have some linear model that defines how that input changes the system. For example, if you press down on the accelerator in your car the car will accelerate. We will need a matrix $G$ to convert $u$ into the effect on the system. We just add that into our equation:\n",
"Finally, we need to consider inputs into the system. We are dealing with linear problems here, so we will assume that there is some input $u$ into the system, and that we have some linear model that defines how that input changes the system. For example, if you press down on the accelerator in your car the car will accelerate. We will need a matrix $\\mathbf{B}$ to convert $u$ into the effect on the system. We just add that into our equation:\n",
"\n",
"$$ f(x) = Fx + Gu + w$$\n",
"$$ f(\\mathbf{x}) = \\mathbf{Fx} + \\mathbf{Bu} + \\mathbf{w}$$\n",
"\n",
"And that's it. That is the equation that Kalman set out to solve, and he found a way to compute an optimal solution if we assume certain properties of $w$.\n",
"\n",
@ -351,18 +356,19 @@
"$$\n",
"\\begin{aligned}\n",
"\\text{Predict Step}\\\\\n",
"\\mathbf{x}' &= \\mathbf{F x} + \\mathbf{G u}\\;\\;\\;\\;&(1) \\\\\n",
"\\mathbf{x} &= \\mathbf{F x} + \\mathbf{B u}\\;\\;\\;\\;&(1) \\\\\n",
"\\mathbf{P} &= \\mathbf{FP{F}}^T + \\mathbf{Q}\\;\\;\\;\\;&(2) \\\\\n",
"\\\\\n",
"\\text{Update Step}\\\\\n",
"\\mathbf{\\gamma} &= \\mathbf{z} - \\mathbf{H x}\\;\\;\\;\\;&(3) \\\\\n",
"\\mathbf{K}&= \\mathbf{PH}^T (\\mathbf{HPH}^T + \\mathbf{R})^{-1}\\;\\;\\;\\;&(4) \\\\\n",
"\\mathbf{x}&=\\mathbf{x}' +\\mathbf{K\\gamma}\\;\\;\\;\\;&(5) \\\\\n",
"\\mathbf{P}&= (\\mathbf{I}-\\mathbf{KH})\\mathbf{P}\\;\\;\\;\\;&(6)\n",
"\\textbf{y} &= \\mathbf{z} - \\mathbf{H}\\mathbf{x}\\;\\;\\;&(3) \\\\\n",
"\\mathbf{S} &= \\mathbf{HPH}^T + \\mathbf{R} \\;\\;\\;&(4) \\\\\n",
"\\mathbf{K} &= \\mathbf{PH}^T\\mathbf{S}^{-1}\\;\\;\\;&(5) \\\\\n",
"\\mathbf{x} &= \\mathbf{x} +\\mathbf{K}\\mathbf{y} \\;\\;\\;&(6)\\\\\n",
"\\mathbf{P} &= (\\mathbf{I}-\\mathbf{K}\\mathbf{H})\\mathbf{P}\\;\\;\\;&(7)\n",
"\\end{aligned}\n",
"$$\n",
"\n",
"I will start with the measurement step, as that is what we started with in the one dimensional Kalman filter case. Our first equation is\n",
"I will start with the update step, as that is what we started with in the one dimensional Kalman filter case. Our first equation is\n",
"\n",
"$$\n",
"\\mathbf{\\gamma} = \\mathbf{z} - \\mathbf{H x}\\tag{3}\n",
@ -396,28 +402,66 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The blue prediction line is the output of $\\mathbf{Hx}$, and the dot labeled \"measurement\" is $\\mathbf{z}$. Therefore, $\\gamma = \\mathbf{z} - \\mathbf{Hx}$ is how we compute the residual, drawn in red. So $\\gamma$ is the residual.\n",
"The blue point labeled \"prediction\" is the output of $\\mathbf{Hx}$, and the dot labeled \"measurement\" is $\\mathbf{z}$. Therefore, $\\gamma = \\mathbf{z} - \\mathbf{Hx}$ is how we compute the residual, drawn in red, where $\\gamma$ is the residual.\n",
"\n",
"The next line is the formidable:\n",
"The next two lines are the formidable:\n",
"\n",
"$$\\mathbf{K}= \\mathbf{PH}^T (\\mathbf{HPH}^T + \\mathbf{R})^{-1}\\tag{4}$$\n",
"$$\n",
"\\begin{aligned}\n",
"\\mathbf{S} &= \\textbf{HPH}^T + \\textbf{R} \\;\\;\\;&(4) \\\\\n",
"\\textbf{K} &= \\textbf{PH}^T\\mathbf{S}^{-1}\\;\\;\\;&(5) \\\\\n",
"\\end{aligned}\n",
"$$\n",
"Unfortunately it is a fair amount of linear algebra to derive this. The derivation can be quite elegant, and I urge you to look it up if you have the mathematical education to follow it. But $\\mathbf{K}$ is just the *Kalman gain* - the ratio of how much measurement vs prediction we should use to create the new estimate. $\\mathbf{R}$ is the *measurement noise*, and $\\mathbf{P}$ is our *uncertainty covariance matrix* from the prediction step.\n",
"\n",
"Unfortunately it is a fair amount of linear algebra to derive this. The derivation can be quite elegant, and I urge you to look it up if you have the mathematical education to follow it. But $\\mathbf{K}$ is just the *Kalman gain* - the ratio of how much measurement vs prediction we should use to create the new estimate. $\\mathbf{R}$ is the *measurement noise*, and $\\mathbf{P}$ is our *uncertainty covariance matrix*.\n",
"** author's note: the following aside probably belongs elsewhere in the book**\n",
"\n",
"So let's work through this expression by expression. Start with $\\mathbf{HPH}^T$. The linear equation $\\mathbf{ABA}^T$ can be thought of as changing the basis of $\\mathbf{B}$ to $\\mathbf{A}$. So $\\mathbf{HPH}^T$ is taking the covariance $\\mathbf{P}$ and putting it in measurement ($\\mathbf{H}$) space. Then, once in measurement space, we can add the measurement noise $\\mathbf{R}$ to it. Hence, the expression for the uncertainty in the measurement is:\n",
"> As an aside, most textbooks are more exact with the notation, in Gelb[1] for example, *Pk(+)* is used to denote the uncertainty covariance for the prediction step, and *Pk(-)* for the uncertainty covariance for the update step. Other texts use subscripts with 'k|k-1', superscipt $^-$, and many other variations. As a programmer I find all of that fairly unreadable; I am used to thinking about variables changing state as a program runs, and do not use a different variable name for each new computation. There is no agreed upon format, so each author makes different choices. I find it challenging to switch quickly between books an papers, and so have adopted this admittedly less precise notation. Mathematicians will write scathing emails to me, but I hope the programmers and students will rejoice.\n",
"\n",
"$$(\\mathbf{HPH}^T + \\mathbf{R})^{-1}$$\n",
"> If you are a programmer trying to understand a paper's math equations, I strongly recommend just removing all of the superscripts, subscripts, and diacriticals, replacing them with a single letter. If you work with equations like this every day this is superflous advice, but when I read I am usually trying to understand the flow of computation. To me it is far more understandable to remember that $P$ in this step represents the updated value of $P$ computed in the last step, as opposed to trying to remember what $P_{k-1}(+)$ denotes, and what its relation to $P_k(-)$ is, if any.\n",
"\n",
"Taking the inverse is linear algebra's way of doing $\\frac{1}{x}$. So if you accept my admittedly hand wavey explanation it can be seen to be computing:\n",
"> For example, for the equation of $\\mathbf{S}$ above, Wikipedia uses\n",
"\n",
"$$ \n",
"gain_{measurement\\,space} = \\frac{uncertainty_{prediction}}{uncertainty_{measurement}}\n",
"> $$\\textbf{S}_k = \\textbf{H}_k \\textbf{P}_{k\\mid k-1} \\textbf{H}_k^\\text{T} + \\textbf{R}_k\n",
"$$\n",
"\n",
"> Is that more exact? Absolutely. Is it easier or harder to read? You'll need to answer that for yourself.\n",
"\n",
"> For reference, the Appendix **Symbols and Notations** lists the symbology used by the major authors in the field.\n",
"\n",
"\n",
"So let's work through this expression by expression. Start with $\\mathbf{HPH}^T$. The linear equation $\\mathbf{ABA}^T$ can be thought of as changing the basis of $\\mathbf{B}$ to $\\mathbf{A}$. So $\\mathbf{HPH}^T$ is taking the covariance $\\mathbf{P}$ and putting it in measurement ($\\mathbf{H}$) space. \n",
"\n",
"In English, consider the problem of reading a temperature with a thermometer that provices readings in volts. Our state is in terms of temperature, but we are now doing calculations in *measurement space* - volts. So we need to convert $\\mathbf{P}$ from applying to temperatures to volts. The linear algebra form $\\textbf{H}\\textbf{P}\\textbf{H}$ takes $\\mathbf{P}$ to the basis used by $\\mathbf{H}$, namely volts. \n",
"\n",
"Then, once in measurement space, we can add the measurement noise $\\mathbf{R}$ to it. Hence, the expression for the uncertainty once we include the measurement is:\n",
"\n",
"$$\\mathbf{S} = \\mathbf{HP}\\mathbf{H}^T + \\mathbf{R}$$"
]
},
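{
"cell_type": "markdown",
"metadata": {},
"source": [
"Continuing the thermometer illustration with invented numbers (the 0.5 volts-per-degree scale and the variances are made up for this sketch):\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"P = np.array([[4.]])                 # prediction uncertainty, in degrees squared\n",
"H = np.array([[0.5]])                # measurement function: 0.5 volts per degree\n",
"R = np.array([[0.04]])               # measurement noise, in volts squared\n",
"\n",
"S = np.dot(H, np.dot(P, H.T)) + R    # HPH^T moves P into volts squared, then add R\n",
"```"
]
},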
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The next equation is\n",
"$$\\textbf{K} = \\textbf{P}\\textbf{H}^T\\mathbf{S}^{-1}\\\\\n",
"$$\n",
"\n",
"$\\mathbf{K}$ is the *Kalman gain* - the ratio that chooses how far along the residual to select between the measurement and prediction in the graph above.\n",
"\n",
"We can think of the inverse of a matrix as linear algebra's way ofcomputing $\\frac{1}{x}$. So we can read the equation for $\\textbf{K}$ as\n",
"\n",
"$$ \\textbf{K} = \\frac{\\textbf{P}\\textbf{H}^T}{\\mathbf{S}} $$\n",
"\n",
"\n",
"$$\n",
"\\textbf{K} = \\frac{uncertainty_{prediction}}{uncertainty_{measurement}}\n",
"$$\n",
"\n",
"\n",
"In other words, the *Kalman gain* equation is doing nothing more than computing a ratio based on how much we trust the prediction vs the measurement. If we are confident in our measurements and unconfident in our predictions $\\mathbf{K}$ will favor the measurement, and vice versa. The equation is complicated because we are doing this in multiple dimensions via matrices, but the concept is simple - scale by a ratio.\n",
"\n",
"Without going into the derivation of $\\mathbf{K}$, I'll say that this equation is the result of finding a value of $\\mathbf{K}$ that optimizes the *mean-square estimation error*. It does this by finding the minimal values for $\\mathbf{P}$ along it's diagonal. Recall that the diagonal of $P$ is just the variance for each state variable. So, this equation for $\\mathbf{K}$ ensures that the Kalman filter output is optimal. To put this in concrete terms, for our dog tracking problem this means that the estimates for both position and velocity will be optimal - a value of $\\mathbf{K}$ that made the position extremely accurate but the velocity very inaccurate would be rejected in favor of a $\\mathbf{K}$ that made both position and velocity just somewhat accurate."
"Without going into the derivation of $\\mathbf{K}$, I'll say that this equation is the result of finding a value of $\\mathbf{K}$ that optimizes the *mean-square estimation error*. It does this by finding the minimal values for $\\mathbf{P}$ along it's diagonal. Recall that the diagonal of $\\mathbf{P}$ is just the variance for each state variable. So, this equation for $\\mathbf{K}$ ensures that the Kalman filter output is optimal. To put this in concrete terms, for our dog tracking problem this means that the estimates for both position and velocity will be optimal - a value of $\\mathbf{K}$ that made the position extremely accurate but the velocity very inaccurate would be rejected in favor of a $\\mathbf{K}$ that made both position and velocity just somewhat accurate."
]
},
{
@ -447,9 +491,9 @@
"source": [
"Now we have the measurement steps. The first equation is\n",
"\n",
"$$\\mathbf{x}' = \\mathbf{Fx} + \\mathbf{Gu}\\tag{1}$$\n",
"$$\\mathbf{x} = \\mathbf{Fx} + \\mathbf{Bu}\\tag{1}$$\n",
"\n",
"This is just our state transition equation which we have already discussed. $\\mathbf{Fx}$ multiplies $\\mathbf{x}$ with the state transition matrix to compute the next state. $G$ and $u$ add in the contribution of the control input $\\mathbf{u}$, if any.\n",
"This is just our state transition equation which we have already discussed. $\\mathbf{Fx}$ multiplies $\\mathbf{x}$ with the state transition matrix to compute the next state. $B$ and $u$ add in the contribution of the control input $\\mathbf{u}$, if any.\n",
"\n",
"The final equation is:\n",
"$$\\mathbf{P} = \\mathbf{FPF}^T + \\mathbf{Q}\\tag{2}$$\n",