From 9529459f1414101f4b7966d06ce4151a90a47a0c Mon Sep 17 00:00:00 2001
From: Roger Labbe
Date: Wed, 17 Sep 2014 12:47:34 -0700
Subject: [PATCH] More work on making the notation consistent. Going back to using ^- for the estimates. Using \mathsf{T} for the transpose character

---
 .../Multivariate_Kalman_Filters.ipynb | 137 +++++++++---------
 1 file changed, 67 insertions(+), 70 deletions(-)

diff --git a/Chapter06_Multivariate_Kalman_Filter/Multivariate_Kalman_Filters.ipynb b/Chapter06_Multivariate_Kalman_Filter/Multivariate_Kalman_Filters.ipynb
index aa863fa..501cf0d 100644
--- a/Chapter06_Multivariate_Kalman_Filter/Multivariate_Kalman_Filters.ipynb
+++ b/Chapter06_Multivariate_Kalman_Filter/Multivariate_Kalman_Filters.ipynb
@@ -1,7 +1,7 @@
 {
  "metadata": {
   "name": "",
-  "signature": "sha256:a173822504e2e8404015e6d858158001601adf2d500c7a1c4d4494238bb8fd25"
+  "signature": "sha256:3aac89b941d2833559c577596712d8055827531a775ebd4870e9cce0e8efb9e2"
  },
  "nbformat": 3,
  "nbformat_minor": 0,
@@ -257,13 +257,13 @@
      ],
      "metadata": {},
      "output_type": "pyout",
-     "prompt_number": 3,
+     "prompt_number": 1,
      "text": [
-      ""
+      ""
      ]
     }
    ],
-   "prompt_number": 3
+   "prompt_number": 1
  },
  {
   "cell_type": "heading",
@@ -297,7 +297,7 @@
    "\n",
    "$$ x = \frac{j}{6}t^3 + \frac{a_0}{2}t^2 + v_0 t + x_0$$\n",
    "\n",
-   "As a reminder, we can compute these equations using basic calculus. Given a constant velocity v we can compute the distance travelled over time with the equation\n",
+   "As a reminder, we can generate these equations using basic calculus. Given a constant velocity v we can compute the distance travelled over time with the equation\n",
    "\n",
    "$$ v = \frac{dx}{dt}\\ x = \int v \, \mathrm{d}t\\ x = vt + x_0$$"
@@ -641,7 +641,7 @@
    "\end{aligned}\n",
    "$$ \n",
    "\n",
-   "Let this be our current belief about the position of our dog in a field. In other words, we believe that he is positioned at (2,7) with a variance of $\sigma^2=2$ for both x and y. The contour plot shows where we believe the dog is located with the '+' in the center of the ellipse. The ellipse shows the boundary for the $1\sigma$ probability, where $1\sigma$ is one *standard deviation*. In other words, for this Gaussian 68% of the data will fall within this ellipse. Recall from the Gaussians chapter the the 68-95-99.7 rule - 68% of all values will fall within 1 standard deviation ($1\sigma$), 95% within $2\sigma$, and 99.7% within $3\sigma$. The dog could be at (356443,58483), but values that far away from the mean are infitestimally small.\n",
+   "Let this be our current belief about the position of our dog in a field. In other words, we believe that he is positioned at (2,7) with a variance of $\sigma^2=2$ for both x and y. The contour plot shows where we believe the dog is located with the '+' in the center of the ellipse. The ellipse shows the boundary for the $1\sigma$ probability, where $1\sigma$ is one *standard deviation*. In other words, for this Gaussian 68% of the data will fall within this ellipse. Recall from the Gaussians chapter the 68-95-99.7 rule - 68% of all values will fall within 1 standard deviation ($1\sigma$), 95% within $2\sigma$, and 99.7% within $3\sigma$. The dog could be at (356443,58483), but the chances for values that far away from the mean are infinitesimally small.\n",
    "\n",
    "An equivalent way of thinking about this is that the circle/ellipse shows us the amount of error in our belief. 
A tiny circle would indicate that we have a very small error, and a very large circle indicates a lot of error in our belief. We will use this throughout the rest of the book to display and evaluate the accuracy of our filters at any point in time. "
   ]
  },
@@ -659,7 +659,7 @@
    "\end{aligned}\n",
    "$$\n",
    "\n",
-   "This time we use a different variance for $x$ (2) vs $y$ (9). The result is an ellipse. When we look at it we can immediately tell that we have a lot more uncertainty in the $y$ value vs the $x$ value. Our belief that the value is (2,7) is the same in both cases, but errors are different. This sort of thing happens naturally as we track objects in the world - one sensor has a better view of the object, or is closer, than another sensor, and so we end up with different error rates in the different axis."
+   "This time we use a different variance for $x$ (2) vs $y$ (9). The result is an ellipse. When we look at it we can immediately tell that we have a lot more uncertainty in the $y$ value vs the $x$ value. Our belief that the value is (2,7) is the same in both cases, but the errors are different. In this case the standard deviation in $x$ is $\sigma_x = \sqrt{2}=1.414$ and the standard deviation for $y$ is $\sigma_y = \sqrt{9}=3$. This sort of thing happens naturally as we track objects in the world - one sensor has a better view of the object, or is closer, than another sensor, and so we end up with different error rates in the different axes."
   ]
  },
@@ -885,7 +885,7 @@
    "> $$\n",
    "\begin{aligned}\n",
    "\hat{\textbf{x}}^-_{k+1} &= \mathbf{F}_{k}\hat{\textbf{x}}_{k} + \mathbf{B}_k\mathbf{u}_k\;\;\;&(1) \\\n",
-   "\textbf{P}^-_{k+1} &= \mathbf{F}_k \textbf{P}_k\mathbf{F}_k^T + \textbf{Q}_{k}\;\;\;&(2)\n",
+   "\textbf{P}^-_{k+1} &= \mathbf{F}_k \textbf{P}_k\mathbf{F}_k^\mathsf{T} + \textbf{Q}_{k}\;\;\;&(2)\n",
    "\end{aligned}\n",
    "$$\n",
    "\n",
@@ -894,8 +894,8 @@
    ">$$\n",
    "\begin{aligned}\n",
    "\textbf{y}_k &= \textbf{z}_k - \textbf{H}_k\hat{\textbf{x}}^-_k\;\;\;&(3) \\\n",
-   "\mathbf{S}_k &= \textbf{H}_k\textbf{P}^-_k\textbf{H}_k^T + \textbf{R}_k \;\;\;&(4) \\\n",
-   "\textbf{K}_k &= \textbf{P}^-_k\textbf{H}_k^T\mathbf{S}_k^{-1}\;\;\;&(5) \\\n",
+   "\mathbf{S}_k &= \textbf{H}_k\textbf{P}^-_k\textbf{H}_k^\mathsf{T} + \textbf{R}_k \;\;\;&(4) \\\n",
+   "\textbf{K}_k &= \textbf{P}^-_k\textbf{H}_k^\mathsf{T}\mathbf{S}_k^{-1}\;\;\;&(5) \\\n",
    "\hat{\textbf{x}}_k &= \hat{\textbf{x}}^-_k +\textbf{K}_k\textbf{y} \;\;\;&(6)\\\n",
    "\mathbf{P}_k &= (\mathbf{I}-\mathbf{K}_k\mathbf{H}_k)\mathbf{P}^-_k\;\;\;&(7)\n",
    "\end{aligned}\n",
    "$$\n",
@@ -910,11 +910,8 @@
   "source": [
    "These are nothing more than linear algebra equations that implement the algorithm we used in the last chapter, but using multidimensional Gaussians instead of univariate Gaussians, and optimized for a least squares fit.\n",
    "\n",
-   " The subscripts indicate which time step the data comes from; $k$ is now, $k+1$ is the next step. $A^T$ is the transpose of the matrix A, and $A^{-1}$ is the inverse. Finally, the hat denotes an estimate, so $\hat{x}_k$ is the estimate of $x$ at time $k$.\n",
    "\n",
-   "Different texts use different notation and variable names for the Kalman filter. Later we will expose you to these different forms to prepare you for reading the original literature. In the equations above I have adopted the variable names used by the Wikipedia article[2] on Kalman filters. Each bold letter denotes a matrix or vector. 
The subscripts indicate which time step the data comes from; $k$ is now, $k+1$ is the next step. The caret (^) indicates that the value is an estimate. Finally, I particularly like how Brown [3] uses a raised $^-$ to denote a prediction, and so I have adopted that approach. For a matrix $\mathbf{A}$, $\mathbf{A}^T$ signifies its transpose, and $\mathbf{A}^{-1}$ its inverse. So, taken together, $\hat{\mathbf{x}}^-_{k+1}$ represents the prediction for the estimate of $\mathbf{x}$ at time step $k+1$, where $\mathbf{x}$ is some vector in the form $\mathbf{x} = \begin{bmatrix}x_1&x_2&..&x_n\end{bmatrix}^T.$ The notation does not specify that $\mathbf{x}$ is a column vector - we will learn the shapes and sizes of all of the variables later in the chapter.\n",
-   "\n",
-   "**author's note: do we really want to explain notation here?**"
+   "Different texts use different notation and variable names for the Kalman filter. Later we will expose you to these different forms to prepare you for reading the original literature. In the equations above I have adopted the variable names used by the Wikipedia article[2] on Kalman filters. Each bold letter denotes a matrix or vector. The subscripts indicate which time step the data comes from; $k$ is now, $k+1$ is the next step. The caret (^) indicates that the value is an estimate. Finally, I particularly like how Brown [3] uses a raised $^-$ to denote a prediction, and so I have adopted that approach. For a matrix $\mathbf{A}$, $\mathbf{A}^\mathsf{T}$ signifies its transpose, and $\mathbf{A}^{-1}$ its inverse. So, taken together, $\hat{\mathbf{x}}^-_{k+1}$ represents the prediction for the estimate of $\mathbf{x}$ at time step $k+1$, where $\mathbf{x}$ is some vector in the form $\mathbf{x} = \begin{bmatrix}x_1&x_2&\dots&x_n\end{bmatrix}^\mathsf{T}.$ The notation does not specify that $\mathbf{x}$ is a column vector - we will learn the shapes and sizes of all of the variables later in the chapter."
   ]
  },
@@ -934,15 +931,15 @@
    "$$\n",
    "\begin{aligned}\n",
    "\text{Predict Step}\\\n",
-   "\mathbf{x} &:= \mathbf{F x} + \mathbf{B u}\;\;\;&(1) \\\n",
-   "\mathbf{P } &:= \mathbf{FP{F}}^T + \mathbf{Q}\;\;\;&(2) \\\n",
+   "\mathbf{x^-} &= \mathbf{F x} + \mathbf{B u}\;\;\;&(1) \\\n",
+   "\mathbf{P^-} &= \mathbf{FPF}^\mathsf{T} + \mathbf{Q}\;\;\;&(2) \\\n",
    "\\\n",
    "\text{Update Step}\\\n",
-   "\textbf{y} &:= \mathbf{z} - \mathbf{H x} \;\;\;&(3)\\\n",
-   "\textbf{S} &:= \mathbf{HPH}^T + \mathbf{R} \;\;\;&(4)\\\n",
-   "\mathbf{K} &:= \mathbf{PH}^T \mathbf{S}^{-1}\;\;\;&(5) \\\n",
-   "\mathbf{x}&:=\mathbf{x} +\mathbf{K\textbf{y}} \;\;\;&(6)\\\n",
-   "\mathbf{P}&:= (\mathbf{I}-\mathbf{KH})\mathbf{P}\;\;\;&(7)\n",
+   "\textbf{y} &= \mathbf{z} - \mathbf{H x^-} \;\;\;&(3)\\\n",
+   "\textbf{S} &= \mathbf{HP^-H}^\mathsf{T} + \mathbf{R} \;\;\;&(4)\\\n",
+   "\mathbf{K} &= \mathbf{P^-H}^\mathsf{T} \mathbf{S}^{-1}\;\;\;&(5) \\\n",
+   "\mathbf{x} &= \mathbf{x^-} + \mathbf{Ky} \;\;\;&(6)\\\n",
+   "\mathbf{P} &= (\mathbf{I}-\mathbf{KH})\mathbf{P^-}\;\;\;&(7)\n",
    "\end{aligned}\n",
    "$$"
   ]
  },
@@ -951,8 +948,6 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-   "This is an algorithm, so $:=$ denotes assignment, not equality. For example, equation (6) has P on both sides of the $:=$. This equation updates the value of P by the computation on the right hand side; it does not imply that the two sides of the equation are equal (they are not). 
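+   "In code, these seven equations are only a few lines of NumPy. Here is a minimal sketch (illustrative only - the helper names `predict` and `update` and the use of bare arrays are mine; we will use a proper filter class later in the chapter):\n",
+   "\n",
+   "    import numpy as np\n",
+   "\n",
+   "    def predict(x, P, F, B, u, Q):\n",
+   "        x = F.dot(x) + B.dot(u)      # (1) predicted state\n",
+   "        P = F.dot(P).dot(F.T) + Q    # (2) predicted covariance\n",
+   "        return x, P\n",
+   "\n",
+   "    def update(x, P, z, H, R):\n",
+   "        y = z - H.dot(x)                        # (3) residual\n",
+   "        S = H.dot(P).dot(H.T) + R               # (4) system uncertainty\n",
+   "        K = P.dot(H.T).dot(np.linalg.inv(S))    # (5) Kalman gain\n",
+   "        x = x + K.dot(y)                        # (6) new state estimate\n",
+   "        P = (np.eye(len(x)) - K.dot(H)).dot(P)  # (7) new covariance\n",
+   "        return x, P\n",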
\n", - "\n", "What do all of the variables mean? What is $\\mathbf{P}$, for example? Don't worry right now. Instead, I am just going to design a Kalman filter, and introduce the names as we go. Then we will just pass them into Python function that implement the equations above, and we will have our solution. Later sections will then delve into more detail about each step and equation. I think learning by example and practice is far easier than trying to memorize a dozen abstract facts at once. " ] }, @@ -1176,8 +1171,8 @@ "\n", "$$\n", "\\begin{aligned}\n", - "\\mathbf{x} &:= \\mathbf{F x} + \\mathbf{B u}\\;\\;\\;&(1) \\\\\n", - "\\mathbf{P} &:= \\mathbf{FPF}^T + \\mathbf{Q}\\;\\;\\;&(2) \\\\\n", + "\\mathbf{x^-} &= \\mathbf{F x} + \\mathbf{B u}\\;\\;\\;&(1) \\\\\n", + "\\mathbf{P^-} &= \\mathbf{FPF}^\\mathsf{T} + \\mathbf{Q}\\;\\;\\;&(2) \\\\\n", "\\end{aligned}\n", "$$\n", "\n", @@ -1192,11 +1187,11 @@ "\n", "$$\n", "\\begin{aligned}\n", - "\\textbf{y} &:= \\mathbf{z} - \\mathbf{H x} \\;\\;\\;&(3)\\\\\n", - "\\textbf{S} &:= \\mathbf{HPH}^T + \\mathbf{R} \\;\\;\\;&(4)\\\\\n", - "\\mathbf{K} &:= \\mathbf{PH}^T \\mathbf{S}^{-1}\\;\\;\\;&(5) \\\\\n", - "\\mathbf{x}&:=\\mathbf{x} +\\mathbf{K\\textbf{y}} \\;\\;\\;&(6)\\\\\n", - "\\mathbf{P}&:= (\\mathbf{I}-\\mathbf{KH})\\mathbf{P}\\;\\;\\;&(7)\n", + "\\textbf{y} &= \\mathbf{z} - \\mathbf{H x} \\;\\;\\;&(3)\\\\\n", + "\\textbf{S} &= \\mathbf{HP^-H}^\\mathsf{T} + \\mathbf{R} \\;\\;\\;&(4)\\\\\n", + "\\mathbf{K} &= \\mathbf{P^-H}^\\mathsf{T} \\mathbf{S}^{-1}\\;\\;\\;&(5) \\\\\n", + "\\mathbf{x}&=\\mathbf{x^-} +\\mathbf{K\\textbf{y}} \\;\\;\\;&(6)\\\\\n", + "\\mathbf{P}&= (\\mathbf{I}-\\mathbf{KH})\\mathbf{P^-}\\;\\;\\;&(7)\n", "\\end{aligned}\n", "$$\n", "\n", @@ -1431,11 +1426,15 @@ "In the language of Kalman filters the physical model is call the *process model*. That is probably a better term than *physical model* because the Kalman filter can be used to track non-physical things like stock prices. We describe the process model with a set of equations we call the *State Transition Function.* \n", "\n", "\n", - "We know from elementary physics how to compute a future position given our current position and velocity. Let $x$ be our current position, and $\\Delta t$ be the amount of time in the future, and $x^-$ be our predicted position. The velocity is then the derivative of $x$, which we notate as $\\dot{x}$. We can then write\n", + "We know from elementary physics how to compute a future position given our current position and velocity. Let $x$ be our current position, and $\\Delta t$ be the amount of time in the future. The velocity is then the derivative of $x$, which we notate as $\\dot{x}$. We can then write \n", + "\n", + "$$x_t = \\dot{x}\\Delta t + x_{t-1}$$\n", + "\n", + "However, to avoid subscripts we will adopt the notation that uses $x^-$ to denote our predicted position, and write\n", "\n", "$$x^- = \\dot{x}\\Delta t + x$$\n", "\n", - "Equation (1) of the Kalman filter $\\mathbf{x}^- = \\mathbf{Fx} + \\mathbf{B u}$ implements the *state transition function* that we are discussing. This requires us to formulate the motion equation above with matrices, so let's learn how to do that now. For the moment we will ignore the $\\mathbf{B u}$ term, as for our problem it turns out that it is equal to zero. Thus, we must express our equations in the form \n", + "Equation (1) of the Kalman filter $\\mathbf{x^-} = \\mathbf{Fx} + \\mathbf{B u}$ implements the *state transition function* that we are discussing. 
This requires us to formulate the motion equation above with matrices, so let's learn how to do that now. For the moment we will ignore the $\mathbf{B u}$ term, as for our problem it turns out that it is equal to zero. This will be explained in step 3. Thus, we must express our equations in the form \n",
    "$\mathbf{x}^- = \mathbf{Fx}.$\n",
    "\n",
    "A quick review on how to represent linear equations with matrices. Take the following two equations:\n",
    "\n",
    "$$2x+3y=8\\3x-y=1$$\n",
    "\n",
    "These can be expressed in matrix form as:\n",
    "\n",
    "$$\begin{bmatrix}2& 3 \\ 3& -1\end{bmatrix}\begin{bmatrix}x\\y\end{bmatrix}=\begin{bmatrix}8\\1\end{bmatrix}$$\n",
    "If you perform the matrix multiplication in this equation the result will be the two equations above.\n",
    "\n",
-   "So, given that $\mathbf{x} = \begin{bmatrix}x & \dot{x}\end{bmatrix}^T$ we can write:\n",
+   "So, given that $\mathbf{x} = \begin{bmatrix}x \\ \dot{x}\end{bmatrix}$ we can write:\n",
    "\n",
    "$$\n",
    "\begin{aligned}\n",
@@ -1517,9 +1516,9 @@
    "\n",
    "We will cover this use case later, but for now, for passive tracking applications, we set those terms to 0. In step 2 there was the unexplained term $\mathbf{Bu}$ in equation (1):\n",
    "\n",
-   "$$\mathbf{\hat{x}} = \mathbf{Fx} + \mathbf{Bu}$$.\n",
+   "$$\mathbf{x^-} = \mathbf{Fx} + \mathbf{Bu}$$\n",
    "\n",
-   "Here $\mathbf{u}$ is the control input, and $\mathbf{B}$ is its transfer function. For example, $\mathbf{u}$ might be a voltage controlling how fast the wheel's motor turns, and multiplying by $\mathbf{B}$ yields $\begin{smallmatrix}x\\\dot{x}\end{smallmatrix}$. Since we do not need these terms we will set them both to zero and not concern ourselves with them for now.\n"
+   "Here $\mathbf{u}$ is the control input, and $\mathbf{B}$ is its transfer function. For example, $\mathbf{u}$ might be a voltage controlling how fast the wheel's motor turns, and multiplying by $\mathbf{B}$ yields $[\begin{smallmatrix}x\\\dot{x}\end{smallmatrix}]$. Since we do not need these terms we will set them both to zero and not concern ourselves with them for now.\n"
   ]
  },
@@ -1550,7 +1549,7 @@
    "\n",
    "where $\textbf{y}$ is the residual, $\mathbf{x^-}$ is the predicted value for $\mathbf{x}$, $\textbf{z}$ is the measurement, and $\textbf{H}$ is the measurement function. It is just a matrix that we multiply with the state to convert the state into a measurement.\n",
    "\n",
-   "For our dog tracking problem we have a sensor that measures position, but no sensor that measures velocity. So for a given state $\mathbf{x}=\begin{bmatrix}x & \dot{x}\end{bmatrix}^T$ we will want to multiply the position $x$ by 1 to get the corresponding measurement of the position, and multiply the velocity $\dot{x}$ by 0 to get the corresponding measurement of velocity (of which there is none).\n",
+   "For our dog tracking problem we have a sensor that measures position, but no sensor that measures velocity. So for a given state $\mathbf{x}=\begin{bmatrix}x & \dot{x}\end{bmatrix}^\mathsf{T}$ we will want to multiply the position $x$ by 1 to get the corresponding measurement of the position, and multiply the velocity $\dot{x}$ by 0 to get the corresponding measurement of velocity (of which there is none).\n",
    "\n",
    "We only have 1 measurement in this example, so the dimension of the residual matrix needs to be $1\times 1$. $\mathbf{x}$ is $2\times 1$, so $\mathbf{H}$ needs to be $1\times 2$ to get the right result. 
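As a quick sanity check on these dimensions, here is a small NumPy sketch (the numbers are made up for illustration, and it anticipates the $\mathbf{H}$ we are about to define):

    import numpy as np

    H = np.array([[1., 0.]])      # 1x2: measure position, ignore velocity
    x = np.array([[10.], [4.5]])  # 2x1 state: position and velocity
    z = H.dot(x)                  # (1x2)(2x1) -> 1x1: array([[ 10.]])

The product has the $1\times 1$ shape we need for the residual computation. 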
If we put this in linear algebra terms we get:\n",
    "\n",
@@ -1599,9 +1598,9 @@
    "\n",
    "As you might expect, the Kalman filter uses the process noise matrix during the prediction step because the prediction step is the step that uses the process model to predict the next state. It is used in equation (2) from the Kalman filter equations:\n",
    "\n",
-   "$$\mathbf{P} = \mathbf{FP{F}}^T + \mathbf{Q}$$\n",
+   "$$\mathbf{P}^- = \mathbf{FPF}^\mathsf{T} + \mathbf{Q}$$\n",
    "\n",
-   "If you look back at step one you will recall that $\mathbf{P}$ is the covariance matrix. It will be of size $n\times n$ where $n$ is the number of state variables. $\mathbf{FP{F}}^T$ is just some linear algebra 'magic' that switches $\mathbf{P}$ into state space. We will cover this in detail in the Kalman math chapter; for now I will just say that when the Kalman filter performs the math for Step 4 above it needed to convert $\mathbf{P}$ into measurement space to compute the residual and Kalman gain, so here it it converted back into state space to perform the state prediction.\n",
+   "If you look back at step one you will recall that $\mathbf{P}$ is the covariance matrix. It will be of size $n\times n$ where $n$ is the number of state variables. $\mathbf{FPF}^\mathsf{T}$ is just some linear algebra 'magic' that switches $\mathbf{P}$ into state space. We will cover this in detail in the Kalman math chapter; for now I will just say that when the Kalman filter performs the math for Step 4 above it needs to convert $\mathbf{P}$ into measurement space to compute the residual and Kalman gain, so here it is converted back into state space to perform the state prediction.\n",
    "\n",
    "In pseudocode we might express this equation as:\n",
@@ -2125,8 +2124,8 @@
    "Let's start with the predict step, which is slightly easier. Here are the multivariate equations. \n",
    "$$\n",
    "\begin{aligned}\n",
-   "\mathbf{x}^- &:= \mathbf{F x} + \mathbf{B u} \\\n",
-   "\mathbf{P} &:= \mathbf{FP{F}}^T + \mathbf{Q}\n",
+   "\mathbf{x}^- &= \mathbf{F x} + \mathbf{B u} \\\n",
+   "\mathbf{P^-} &= \mathbf{FPF}^\mathsf{T} + \mathbf{Q}\n",
    "\end{aligned}\n",
    "$$\n",
    "\n",
@@ -2145,14 +2144,14 @@
    "\n",
    "Hopefully the general process is clear, so now I will go a bit faster on the rest. Our other equation for the predict step is\n",
    "\n",
-   "$$\mathbf{P} := \mathbf{FP{F}}^T + \mathbf{Q}$$\n",
+   "$$\mathbf{P}^- = \mathbf{FPF}^\mathsf{T} + \mathbf{Q}$$\n",
    "\n",
    "Again, since our state has only one variable, $\mathbf{P}$ and $\mathbf{Q}$ must also be $1\times 1$ matrices, which we can treat as scalars, yielding \n",
-   "$$P := FPF^T + Q$$\n",
+   "$$P^- = FPF^\mathsf{T} + Q$$\n",
    "\n",
-   "We already know $F=1$. The transpose of a scalar is the scalar, so $F^T = 1$. This yields\n",
+   "We already know $F=1$. The transpose of a scalar is the scalar, so $F^\mathsf{T} = 1$. 
This yields\n",
    "\n",
-   "$$P := P + Q$$\n",
+   "$$P^- = P + Q$$\n",
    "\n",
    "which is equivalent to the Gaussian equation of \n",
    "$$\sigma^2 = \sigma_1^2 + \sigma_2^2$$"
   ]
  },
@@ -2165,10 +2164,10 @@
    "Here are our multivariate Kalman filter equations for the update step.\n",
    "$$\n",
    "\begin{aligned}\n",
-   "\textbf{y} &:= \mathbf{z} - \mathbf{H x}\\\n",
-   "\mathbf{K}&:= \mathbf{PH}^T (\mathbf{HPH}^T + \mathbf{R})^{-1} \\\n",
-   "\mathbf{x}&:=\mathbf{x}^- +\mathbf{K\textbf{y}} \\\n",
-   "\mathbf{P}&:= (\mathbf{I}-\mathbf{KH})\mathbf{P}\n",
+   "\textbf{y} &= \mathbf{z} - \mathbf{H x^-}\\\n",
+   "\mathbf{K}&= \mathbf{P^-H}^\mathsf{T} (\mathbf{HP^-H}^\mathsf{T} + \mathbf{R})^{-1} \\\n",
+   "\mathbf{x}&=\mathbf{x}^- +\mathbf{Ky} \\\n",
+   "\mathbf{P}&= (\mathbf{I}-\mathbf{KH})\mathbf{P^-}\n",
    "\end{aligned}\n",
    "$$\n",
    "\n",
@@ -2176,10 +2175,10 @@
    "\n",
    "$$\n",
    "\begin{aligned}\n",
-   "y &:= z - x\\\n",
-   "K &:=P / (P + R) \\\n",
-   "x &:=x +Ky \\\n",
-   "P &:= (1-K)P\n",
+   "y &= z - x^-\\\n",
+   "K &= P^- / (P^- + R) \\\n",
+   "x &= x^- + Ky \\\n",
+   "P &= (1-K)P^-\n",
    "\end{aligned}\n",
    "$$\n",
    "\n",
@@ -2244,7 +2243,7 @@
    "##### Exercise: Compare to a Filter That Incorporates Velocity\n",
    "\n",
-   "The last example did not use one of the fundamental insights of this chapter, unobserved variables. In this example velocity would the the unobserved variable. Write a Kalman filter that uses the state $\mathbf{x}=\begin{bmatrix}x & \dot{x}\end{bmatrix}^T$ and compare it against the filter in the last exercise which used the state $\mathbf{x}=\begin{bmatrix}x\end{bmatrix}$."
+   "The last example did not use one of the fundamental insights of this chapter, unobserved variables. In this example velocity would be the unobserved variable. Write a Kalman filter that uses the state $\mathbf{x}=\begin{bmatrix}x & \dot{x}\end{bmatrix}^\mathsf{T}$ and compare it against the filter in the last exercise which used the state $\mathbf{x}=\begin{bmatrix}x\end{bmatrix}$."
   ]
  },
@@ -2433,7 +2432,7 @@
    "\n",
    "If we have $m$ state variables then $\textbf{Q}$ will be an $m{\times}m$ matrix. $\textbf{Q}$ is a covariance matrix for the state variables, so it will contain the variances and covariances for the process noise for each state variable. \n",
    "\n",
-   "Let's make this concrete. Assume our state variables are $x=\begin{bmatrix}x&\dot{x}\end{bmatrix}^T$. (**note**: it is customary to use this transpose form of writing an matrix in text. It is how we denote that $x$ is a column matrix without taking up a lot of line space). Then $\textbf{Q}$ will contain:\n",
+   "Let's make this concrete. Assume our state variables are $x=\begin{bmatrix}x&\dot{x}\end{bmatrix}^\mathsf{T}$. (**note**: it is customary to use this transpose form of writing a matrix in text. It is how we denote that $x$ is a column matrix without taking up a lot of line space). 
Then $\\textbf{Q}$ will contain:\n", "\n", "$$Q = \\begin{bmatrix}\n", "\\sigma_x^2 & \\sigma_{x\\dot{x}} \\\\\n", @@ -2666,7 +2665,7 @@ " 54.235, 62.974, 61.742, 54.863, 52.831, 61.122, 61.187, 58.441, \n", " 47.769, 56.855, 53.693, 61.534, 70.665, 60.355, 65.095, 63.386]\n", "\n", - "plot_track (data=zs, R=5, Q=.02, P=500., count=len(zs), plot_P=False, title='P=500')" + "plot_track (data=zs, R=5., Q=.02, P=500., count=len(zs), plot_P=False, title='P=500')" ], "language": "python", "metadata": {}, @@ -2695,7 +2694,7 @@ "cell_type": "code", "collapsed": false, "input": [ - "plot_track (data=zs, R=5, Q=.02, P=1., count=len(zs), plot_P=False, title='P=1')" + "plot_track (data=zs, R=5., Q=.02, P=1., count=len(zs), plot_P=False, title='P=1')" ], "language": "python", "metadata": {}, @@ -2727,8 +2726,8 @@ "cell_type": "code", "collapsed": false, "input": [ - "x = np.array([[100,0]]).T\n", - "plot_track(data=zs, R=5, Q=.02, P=1., initial_x=x,count=len(zs),\n", + "x = np.array([[100.,0.]]).T\n", + "plot_track(data=zs, R=5., Q=.02, P=1., initial_x=x,count=len(zs),\n", " plot_P=False, title='P=1')" ], "language": "python", @@ -2907,7 +2906,7 @@ "cell_type": "code", "collapsed": false, "input": [ - "plot_track_ellipses (noise=0, R=500, Q=.2, count=7, title='R = 500')" + "plot_track_ellipses (noise=0., R=500., Q=.2, count=7, title='R = 500')" ], "language": "python", "metadata": {}, @@ -3001,8 +3000,8 @@ "\n", "$$\n", "\\begin{aligned}\n", - "\\mathbf{\\hat{x}} &= \\mathbf{F x} + \\mathbf{B u} \\\\\n", - "\\mathbf{P} &= \\mathbf{FP{F}}^T + \\mathbf{Q} \n", + "\\mathbf{x}^- &= \\mathbf{F x} + \\mathbf{B u} \\\\\n", + "\\mathbf{P}^- &= \\mathbf{FP{F}}^\\mathsf{T} + \\mathbf{Q} \n", "\\end{aligned}\n", "$$\n", "\n", @@ -3016,21 +3015,19 @@ "\n", "With that in mind, the Python code `self.F.dot(self.x)` implements the math expression $\\mathbf{F x}$.\n", "\n", - "Numpy's `array` implements matrix transposition by using the `.T` property. Therefore, `F.T` is the python implementation of $\\mathbf{F}^T$.\n", + "Numpy's `array` implements matrix transposition by using the `.T` property. Therefore, `F.T` is the python implementation of $\\mathbf{F}^\\mathsf{T}$.\n", "\n", "The `update()` method implements the update equations of the Kalman filter, which are\n", "\n", "$$\n", "\\begin{aligned}\n", - "\\mathbf{\\tilde{y}} &= \\mathbf{z} - \\mathbf{H}\\hat{\\mathbf{x}} \\\\\n", - "\\mathbf{K} &= \\mathbf{P} \\mathbf{H}^T (\\mathbf{H} \\mathbf{P} \\mathbf{H}^T +\\mathbf{R})^{-1} \\\\\n", - "\\hat{\\mathbf{x}} &= \\hat{\\mathbf{x}} + \\mathbf{K} \\mathbf{\\tilde{y}} \\\\\n", - "\\mathbf{P} &= (\\mathbf{I} - \\mathbf{K} \\mathbf{H})\\mathbf{P}\n", + "\\mathbf{y} &= \\mathbf{z} - \\mathbf{H}\\mathbf{x^-} \\\\\n", + "\\mathbf{K} &= \\mathbf{P} \\mathbf{H}^\\mathsf{T} (\\mathbf{H} \\mathbf{P^-} \\mathbf{H}^\\mathsf{T} +\\mathbf{R})^{-1} \\\\\n", + "\\mathbf{x} &= \\mathbf{x}^- + \\mathbf{K} \\mathbf{y} \\\\\n", + "\\mathbf{P} &= (\\mathbf{I} - \\mathbf{K} \\mathbf{H})\\mathbf{P^-}\n", "\\end{aligned}\n", "$$\n", "\n", - "\n", - "\n", "The corresponding code is:\n", "\n", " def update(self, Z, R=None):\n", @@ -3102,11 +3099,11 @@ "\n", "The rest of the code implements the Kalman filter equations, with one exception. 
Instead of implementing \n",
    "\n",
-   "$$\mathbf{P} = (\mathbf{I} - \mathbf{KH})\mathbf{P}$$\n",
+   "$$\mathbf{P} = (\mathbf{I} - \mathbf{KH})\mathbf{P}^-$$\n",
    "\n",
    "it implements the somewhat more complicated form \n",
    "\n",
-   "$$\mathbf{P} = (\mathbf{I} - \mathbf{KH})\mathbf{P}(\mathbf{I} - \mathbf{KRK})^T + \mathbf{KRK}^T$$.\n",
+   "$$\mathbf{P} = (\mathbf{I} - \mathbf{KH})\mathbf{P}^-(\mathbf{I} - \mathbf{KH})^\mathsf{T} + \mathbf{KRK}^\mathsf{T}$$\n",
    "\n",
    "The reason for this altered equation is that it is more numerically stable than the former equation, at the cost of being a bit more expensive to compute. It is not always possible to find the optimal value for $\mathbf{K}$, in which case the former equation will not produce good results because it assumes optimality. The longer reformulation used in the code is derived from more general math that does not assume optimality, and hence provides good results for non-optimal filters (such as when we cannot correctly model our measurement error).\n",
    "\n",
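+   "To see what this looks like in code, here is a minimal NumPy sketch of an update step that uses this stabilized equation, often called the Joseph form (a hedged illustration - the function name and calling convention are mine, not the library's API):\n",
+   "\n",
+   "    import numpy as np\n",
+   "\n",
+   "    def update_joseph(x, P, z, H, R):\n",
+   "        # x and P are the predicted (prior) state and covariance\n",
+   "        S = H.dot(P).dot(H.T) + R             # system uncertainty\n",
+   "        K = P.dot(H.T).dot(np.linalg.inv(S))  # Kalman gain\n",
+   "        x = x + K.dot(z - H.dot(x))           # state update\n",
+   "        # stabilized covariance update: correct for any gain K,\n",
+   "        # optimal or not\n",
+   "        I_KH = np.eye(P.shape[0]) - K.dot(H)\n",
+   "        P = I_KH.dot(P).dot(I_KH.T) + K.dot(R).dot(K.T)\n",
+   "        return x, P\n",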