From c1b3703ce68449cf07e36356aa22bb7e5bc2dbb7 Mon Sep 17 00:00:00 2001
From: Roger Labbe <rlabbejr@gmail.com>
Date: Sun, 26 Jul 2015 19:20:58 -0700
Subject: [PATCH] Changed notation for prior to a bar.

The ^- notation was pretty unreadable, and inconsistent with the
hat notation for estimates.
---
 06-Multivariate-Kalman-Filters.ipynb   |  95 +++++++------
 07-Kalman-Filter-Math.ipynb            |  28 ++--
 08-Designing-Kalman-Filters.ipynb      |  12 +-
 10-Unscented-Kalman-Filter.ipynb       |  36 ++---
 11-Extended-Kalman-Filters.ipynb       |  14 +-
 Appendix-B-Symbols-and-Notations.ipynb | 177 +++++++++++--------------
 code/ukf_internal.py                   |   2 +
 7 files changed, 168 insertions(+), 196 deletions(-)

diff --git a/06-Multivariate-Kalman-Filters.ipynb b/06-Multivariate-Kalman-Filters.ipynb
index 5d29578..0198b8c 100644
--- a/06-Multivariate-Kalman-Filters.ipynb
+++ b/06-Multivariate-Kalman-Filters.ipynb
@@ -714,37 +714,37 @@
     "\n",
     "$$\\mathbf{A} = \\begin{bmatrix}2& 3 \\\\ 3&-1\\end{bmatrix},\\, \\mathbf{x} = \\begin{bmatrix}x\\\\y\\end{bmatrix}, \\mathbf{b}=\\begin{bmatrix}8\\\\1\\end{bmatrix}$$\n",
     "\n",
-    "We call the set of equations that describe how the systems behaves the **process model**. We use the process model to perform the innovation, because the equations tell us what the next state will be given the current state. Kalman filters implement this using the linear equation, where $\\mathbf{x}^-$ is the *prior*, or predicted state:\n",
+    "We call the set of equations that describe how the system behaves the **process model**. We use the process model to perform the innovation, because the equations tell us what the next state will be given the current state. Kalman filters implement this using the linear equation, where $\\mathbf{\\bar{x}}$ is the *prior*, or predicted state:\n",
     "\n",
-    "$$\\mathbf{x}^- = \\mathbf{Fx}$$\n",
+    "$$\\mathbf{\\bar{x}} = \\mathbf{Fx}$$\n",
     "\n",
-    "Our job as Kalman filters designers is to specify $\\mathbf{F}$ such that $\\mathbf{x}^- = \\mathbf{Fx}$ performs the innovation (prediction) for our system. To do this we need one equation for each state variable. In our problem $\\mathbf{x} = \\begin{bmatrix}x & \\dot{x}\\end{bmatrix}^\\mathtt{T}$, so we need one equation for $x$ and a second one for $\\dot{x}$ . We already know the equation for the position innovation:\n",
+    "Our job as Kalman filter designers is to specify $\\mathbf{F}$ such that $\\bar{\\mathbf{x}} = \\mathbf{Fx}$ performs the innovation (prediction) for our system. To do this we need one equation for each state variable. In our problem $\\mathbf{x} = \\begin{bmatrix}x & \\dot{x}\\end{bmatrix}^\\mathtt{T}$, so we need one equation for $x$ and a second one for $\\dot{x}$. We already know the equation for the position innovation:\n",
     "\n",
-    "$$x^- = \\dot{x} \\Delta t + x$$\n",
+    "$$\\bar{x} = \\dot{x} \\Delta t + x$$\n",
     "\n",
     "What is our equation for velocity ($\\dot{x}$)? Unfortunately, we have no predictive model for how our dog's velocity will change over time. In this case we assume that it remains constant between innovations. Of course this is not exactly true, but so long as the velocity doesn't change *too much* over each innovation you will see that the filter performs very well. So we say\n",
     "\n",
-    "$$\\dot{x}^- = \\dot{x}$$\n",
+    "$$\\bar{\\dot{x}} = \\dot{x}$$\n",
     "\n",
     "This gives us the process model for our system \n",
     "\n",
     "$$\\begin{aligned}\n",
-    "x^- &= \\dot{x} \\Delta t + x \\\\\n",
-    "\\dot{x}^- &= \\dot{x}\n",
+    "\\bar{x} &= \\dot{x} \\Delta t + x \\\\\n",
+    "\\bar{\\dot{x}} &= \\dot{x}\n",
     "\\end{aligned}$$\n",
     "\n",
-    "We need to express this set of equations in the form $\\mathbf{x}^- = \\mathbf{Fx}$. Let me rearrange terms to make it easier to see what to do.\n",
+    "We need to express this set of equations in the form $\\bar{\\mathbf{x}}  = \\mathbf{Fx}$. Let me rearrange terms to make it easier to see what to do.\n",
     "\n",
     "$$\\begin{aligned}\n",
-    "x^- &= 1x + &\\Delta t \\dot{x} \\\\\n",
-    "\\dot{x}^- &=0x + &\\dot{x}\n",
+    "\\bar{x} &= 1x + &\\Delta t \\dot{x} \\\\\n",
+    "\\bar{\\dot{x}} &=0x + &\\dot{x}\n",
     "\\end{aligned}$$\n",
     "\n",
     "We can rewrite this in matrix form as\n",
     "\n",
     "$$\\begin{aligned}\n",
     "{\\begin{bmatrix}x\\\\\\dot{x}\\end{bmatrix}}^- &= \\begin{bmatrix}1&\\Delta t  \\\\ 0&1\\end{bmatrix}  \\begin{bmatrix}x \\\\ \\dot{x}\\end{bmatrix}\\\\\n",
-    "\\mathbf{x}^- &= \\mathbf{Fx}\n",
+    "\\mathbf{\\bar{x}} &= \\mathbf{Fx}\n",
     "\\end{aligned}$$\n",
     "\n",
     "$\\mathbf{F}$ is often called the **state transition function**. In the `KalmanFilter` class we implement the state transition function with"
@@ -777,7 +777,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Let's test this! `KalmanFilter` has a `predict` method that performs the prediction by computing $\\mathbf{x}^- = \\mathbf{Fx}$. Let's call it and see what happens. We've set the position to 10.0 and the velocity to  0.45 meter/sec. We've defined `dt=0.1`, which means the time step is 0.1 seconds, so we expect the new position to be 10.45 meters after the innovation. The velocity should be unchanged."
+    "Let's test this! `KalmanFilter` has a `predict` method that performs the prediction by computing $\\mathbf{\\bar{x}} = \\mathbf{Fx}$. Let's call it and see what happens. We've set the position to 10.0 and the velocity to  0.45 meter/sec. We've defined `dt=0.1`, which means the time step is 0.1 seconds, so we expect the new position to be 10.45 meters after the innovation. The velocity should be unchanged."
    ]
   },
   {
@@ -808,7 +808,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "This worked. Note that the code does not distinguish between between the *prior* and *posterior* in the variable names, so after calling predict the prior $\\mathbf{x}^-$ is stored in `KalmanFilter.x`. If we call `predict()` several times in a row the value will be updated each time."
+    "This worked. Note that the code does not distinguish between the *prior* and *posterior* in the variable names, so after calling predict the prior $\\bar{\\mathbf{x}}$ is stored in `KalmanFilter.x`. If we call `predict()` several times in a row the value will be updated each time."
    ]
   },
   {
@@ -845,7 +845,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "`KalmanFilter.predict()` computes both the mean and covariance of the innovation. This is the value of $\\mathbf{P}$ after three innovations (predictions)."
+    "`KalmanFilter.predict()` computes both the mean and covariance of the innovation. This is the value of $\\mathbf{P}$ after three innovations (predictions), which we denote $\\mathbf{\\bar{P}}$ in the Kalman filter equations."
    ]
   },
   {
@@ -914,7 +914,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "How does the filter compute new values for $\\mathbf{P}$, and what is it based on? It's a little to early to discuss this, but recall that in every filter so far the predict step entailed a loss of information. The same is true here. I will give you the details once we have covered a bit more ground."
+    "How does the filter compute new values for $\\mathbf{\\bar{P}}$, and what is it based on? It's a little too early to discuss this, but recall that in every filter so far the predict step entailed a loss of information. The same is true here. I will give you the details once we have covered a bit more ground."
    ]
   },
   {
@@ -993,7 +993,7 @@
     "\n",
     "Therefore the complete Kalman filter equation for the prior mean is\n",
     "\n",
-    "$$\\mathbf{x^-} = \\mathbf{Fx} + \\mathbf{Bu}$$\n",
+    "$$\\mathbf{\\bar{x}} = \\mathbf{Fx} + \\mathbf{Bu}$$\n",
     "\n",
     "Your dog may be trained to respond to voice commands. All available evidence suggests that my dog has no control inputs, so I set $\\mathbf{B}$ to zero. In Python we write"
    ]
@@ -1057,11 +1057,11 @@
     "\n",
     "Both the measurement $\\mathbf{z}$ and state $\\mathbf{x}$ are vectors so we need to use a matrix to perform the conversion. The Kalman filter equation that performs this step is:\n",
     "\n",
-    "$$\\textbf{y} = \\mathbf{z} - \\mathbf{H x^-}$$\n",
+    "$$\\textbf{y} = \\mathbf{z} - \\mathbf{H \\bar{x}}$$\n",
     "\n",
     "where $\\textbf{y}$ is the residual, $\\mathbf{x^-}$ is the prior, $\\textbf{z}$ is the measurement, and $\\textbf{H}$ is the measurement function. So we take the prior, convert it to a measurement, and subtract it from the measurement our sensor gave us. This gives us the difference between our prediction and measurement in measurement space!\n",
     "\n",
-    "We need to design $\\mathbf{H}$ so that $\\textbf{Hx}^-$ yields a measurement. For this problem we have a sensor that measures position, so $\\mathbf{z}$ will be a one variable vector:\n",
+    "We need to design $\\mathbf{H}$ so that $\\mathbf{H\\bar{x}}$ yields a measurement. For this problem we have a sensor that measures position, so $\\mathbf{z}$ will be a one variable vector:\n",
     "\n",
     "$$\\mathbf{z} = \\begin{bmatrix}z\\end{bmatrix}$$\n",
     "\n",
@@ -1069,7 +1069,7 @@
     "\n",
     "$$\n",
     "\\begin{aligned}\n",
-    "\\textbf{y} &= \\mathbf{z} - \\mathbf{H}\\mathbf{x^-}  \\\\\n",
+    "\\textbf{y} &= \\mathbf{z} - \\mathbf{H\\bar{x}}  \\\\\n",
     "\\begin{bmatrix}y \\end{bmatrix} &= \\begin{bmatrix}z\\end{bmatrix} - \\begin{bmatrix}?&?\\end{bmatrix} \\begin{bmatrix}x \\\\ \\dot{x}\\end{bmatrix}\n",
     "\\end{aligned}\n",
     "$$\n",
@@ -1492,11 +1492,11 @@
     "The Kalman filter uses these equations to compute the *prior* - the predicted next state of the system. They compute the mean ($\\mathbf{x}$)  and covariance ($\\mathbf{P}$) of the system.\n",
     "\n",
     "$$\\begin{aligned}\n",
-    "\\mathbf{x}^- &= \\mathbf{Fx} + \\mathbf{Bu}\\\\\n",
-    "\\mathbf{P}^- &= \\mathbf{FPF}^\\mathsf{T} + \\mathbf{Q}\n",
+    "\\mathbf{\\bar{x}} &= \\mathbf{Fx} + \\mathbf{Bu}\\\\\n",
+    "\\mathbf{\\bar{P}} &= \\mathbf{FPF}^\\mathsf{T} + \\mathbf{Q}\n",
     "\\end{aligned}$$\n",
     "\n",
-    "<u>**Mean**</u>\n",
+    "$\\underline{\\textbf{Mean}}$\n",
     "\n",
     "$\\mathbf{x}^- = \\mathbf{Fx} + \\mathbf{Bu}$\n",
     "\n",
@@ -1506,13 +1506,12 @@
     "\n",
     "If $\\mathbf{F}$ contains the state transition for a given time step, then the product $\\mathbf{Fx}$ computes the state after that transition. Easy! Likewise, $\\mathbf{B}$ is the control function, $\\mathbf{u}$ is the control input, so $\\mathbf{Bu}$ computes the contribution of the controls to the state after the transition.\n",
     "\n",
-    "<u>**Covariance**</u>\n",
+    "$\\underline{\\textbf{Covariance}}$\n",
     "\n",
-    "$\\mathbf{P}^- = \\mathbf{FPF}^\\mathsf{T} + \\mathbf{Q}$\n",
+    "$\\mathbf{\\bar{P}} = \\mathbf{FPF}^\\mathsf{T} + \\mathbf{Q}$\n",
     "\n",
     "This equation is not as easy to understand so we will spend more time on it. \n",
     "\n",
-    "\n",
     "In the univariate chapter when we added Gaussians in the predict step we did it this way:\n",
     "\n",
     "$$\\mu = \\mu + \\mu_1\\\\\n",
@@ -1522,15 +1521,15 @@
     "\n",
     "In a multivariate Gaussians the state variables are *correlated*. What does this imply? Our knowledge of the velocity is imperfect, but we are adding it to the position with\n",
     "\n",
-    "$$x^- = \\dot{x}\\Delta t + x$$\n",
+    "$$\\bar{x} = \\dot{x}\\Delta t + x$$\n",
     "\n",
     "Since we do not have perfect knowledge of the value of $\\dot{x}$ the sum $x^- = \\dot{x}\\Delta t + x$ gains uncertainty. Because the positions and velocities are correlated we cannot simply add the covariance matrices. The correct equation is\n",
     "\n",
-    "$$\\mathbf{P} = \\mathbf{FPF}^\\mathsf{T}$$\n",
+    "$$\\mathbf{\\bar{P}} = \\mathbf{FPF}^\\mathsf{T}$$\n",
     "\n",
     "The expression $\\mathbf{FPF}^\\mathsf{T}$ is seen all the time in linear algebra. You can think of it as *projecting* the middle term by the outer term. We will be using this many times in the rest of the book. I explain its derivation in the *Kalman Math* chapter. \n",
     "\n",
-    "For now we will look at its effect. Here I use $\\mathbf{F}$ from our filter and project the state forward 6/10ths of a second. I do this five times so you can see how $\\mathbf{P}$ continues to change. "
+    "For now we will look at its effect. Here I use $\\mathbf{F}$ from our filter and project the state forward 6/10ths of a second. I do this five times so you can see how $\\mathbf{\\bar{P}}$ continues to change. "
    ]
   },
   {
@@ -1571,7 +1570,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "You can see that with a velocity of 5 the position correctly moves 3 units in each 6/10ths of a second step. At each step the width of the ellipse is larger, indicating that we have lost information asbout the position due to adding $\\dot{x}\\Delta t$ to x at each step. The height has not changed - our system model say the velocity does not change, so the belief we have about the velocity cannot change. As time continues you can see that the ellipse becomes more and more tilted. Recall that a tilt indicates *correlation*. $\\mathbf{F}$ linearly correlates $x$ with $\\dot{x}$ with the expression $x^- = \\dot{x} \\Delta t + x$. The $\\mathbf{FPF}^\\mathsf{T}$ computation correctly incorporates this correlation into the covariance matrix!\n",
+    "You can see that with a velocity of 5 the position correctly moves 3 units in each 6/10ths of a second step. At each step the width of the ellipse is larger, indicating that we have lost information about the position due to adding $\\dot{x}\\Delta t$ to x at each step. The height has not changed - our system model says the velocity does not change, so the belief we have about the velocity cannot change. As time continues you can see that the ellipse becomes more and more tilted. Recall that a tilt indicates *correlation*. $\\mathbf{F}$ linearly correlates $x$ with $\\dot{x}$ with the expression $\\bar{x} = \\dot{x} \\Delta t + x$. The $\\mathbf{FPF}^\\mathsf{T}$ computation correctly incorporates this correlation into the covariance matrix!\n",
     "\n",
     "Here is an animation of this equation that allows you to change the design of $\\mathbf{F}$ to see how it affects shape of $\\mathbf{P}$. The `F00` slider affects the value of F[0, 0]. `covar` sets the intial covariance between the position and velocity($\\sigma_x\\sigma_{\\dot{x}}$). I recommend answering these questions at a minimum\n",
     "\n",
@@ -1684,7 +1683,7 @@
     "\n",
     "$\\textbf{S} = \\mathbf{HP^-H}^\\mathsf{T} + \\mathbf{R}$\n",
     "\n",
-    "To work in measurement space the Kalman filter has to project the covariance matrix into measurement space. The math for this is $\\mathbf{HP^-H}^\\mathsf{T}$, where $\\mathbf{P}^-$ is the *prior* covariance and $\\mathbf{H}$ is the measurement function.\n",
+    "To work in measurement space the Kalman filter has to project the covariance matrix into measurement space. The math for this is $\\mathbf{H\\bar{P}H}^\\mathsf{T}$, where $\\mathbf{\\bar{P}}$ is the *prior* covariance and $\\mathbf{H}$ is the measurement function.\n",
     "\n",
     "\n",
     "You should recognize this $\\mathbf{ABA}^\\mathsf{T}$ form - the prediction step used $\\mathbf{FPF}^\\mathsf{T}$ to update $\\mathbf{P}$ with the state transition function. Here, we use the same form to update it with the measurement function. In a real sense the linear algebra is changing the coordinate system for us. \n",
@@ -1695,8 +1694,8 @@
     "\n",
     "\n",
     "$$\\begin{aligned}\n",
-    "\\mathbf{S} &= \\mathbf{HP^-H}^\\mathsf{T} + \\mathbf{R}\\\\\n",
-    "\\mathbf{P} &= \\mathbf{FPF}^\\mathsf{T} + \\mathbf{Q}\n",
+    "\\mathbf{S} &= \\mathbf{H\\bar{P}H}^\\mathsf{T} + \\mathbf{R}\\\\\n",
+    "\\mathbf{\\bar{P}} &= \\mathbf{FPF}^\\mathsf{T} + \\mathbf{Q}\n",
     "\\end{aligned}$$\n",
     "\n",
     "You can see that they are performing the same computation. In each $\\mathbf{P}$ is put into a different space with either the function $\\mathbf{H}$ or $\\mathbf{F}$. Once that is done we add the noise matrix associated with that space."
@@ -1708,7 +1707,7 @@
    "source": [
     "<u>**Kalman Gain**</u>\n",
     "\n",
-    "$\\mathbf{K} = \\mathbf{P^-H}^\\mathsf{T} \\mathbf{S}^{-1}$\n",
+    "$\\mathbf{K} = \\mathbf{\\bar{P}H}^\\mathsf{T} \\mathbf{S}^{-1}$\n",
     "\n",
     "Look back at the diagram above. Once we have a prediction and a measurement we need to select an estimate somewhere between the two. If we have more certainty about the measurement the estimate will be closer to it. If instead we have more certainty about the prediction then the estimate will be closer to it. \n",
     "\n",
@@ -1723,10 +1722,10 @@
     "\n",
     "Here $K$ is the Kalman gain, and it is a scaler between 0 and 1. Examine this equation and ensure you understand how it selects a mean somewhere between the prediction and measurement. In this form the Kalman gain is essentially a *percentage* or *ratio* - if K is .9 it takes 90% of the measurement and 10% of the prediction. \n",
     "\n",
-    "For the multivariate Kalman filter $\\mathbf{K}$ is a vector, not a scalar. Here is the equation again: $\\mathbf{K} = \\mathbf{P^-H}^\\mathsf{T} \\mathbf{S}^{-1}$. Is this a *ratio*? We can think of the inverse of a matrix as linear algebra's way of doing matrix division. Division is not defined for matrices, but it is useful to think of it in this way. So we can read the equation for $\\textbf{K}$ as meaning\n",
+    "For the multivariate Kalman filter $\\mathbf{K}$ is a vector, not a scalar. Here is the equation again: $\\mathbf{K} = \\mathbf{\\bar{P}H}^\\mathsf{T} \\mathbf{S}^{-1}$. Is this a *ratio*? We can think of the inverse of a matrix as linear algebra's way of doing matrix division. Division is not defined for matrices, but it is useful to think of it in this way. So we can read the equation for $\\textbf{K}$ as meaning\n",
     "\n",
-    "$$\\begin{aligned} \\textbf{K} &\\approx \\frac{\\textbf{P}\\textbf{H}^\\mathsf{T}}{\\mathbf{S}} \\\\\n",
-    "\\textbf{K} &\\approx \\frac{\\mathsf{uncertainty}_\\mathsf{prediction}}{\\mathsf{uncertainty}_\\mathsf{measurement}}\\textbf{H}^\\mathsf{T}\n",
+    "$$\\begin{aligned} \\mathbf{K} &\\approx \\frac{\\mathbf{\\bar{P}}\\mathbf{H}^\\mathsf{T}}{\\mathbf{S}} \\\\\n",
+    "\\mathbf{K} &\\approx \\frac{\\mathsf{uncertainty}_\\mathsf{prediction}}{\\mathsf{uncertainty}_\\mathsf{measurement}}\\mathbf{H}^\\mathsf{T}\n",
     "\\end{aligned}$$"
    ]
   },
@@ -1768,15 +1767,15 @@
     "$$\n",
     "\\begin{aligned}\n",
     "\\text{Predict Step}\\\\\n",
-    "\\mathbf{x^-} &= \\mathbf{F x} + \\mathbf{B u}\\;\\;\\;&(1) \\\\\n",
-    "\\mathbf{P^-} &= \\mathbf{FP{F}}^\\mathsf{T} + \\mathbf{Q}\\;\\;\\;&(2) \\\\\n",
+    "\\mathbf{\\bar{x}} &= \\mathbf{F x} + \\mathbf{B u} \\\\\n",
+    "\\mathbf{\\bar{P}} &= \\mathbf{FP{F}}^\\mathsf{T} + \\mathbf{Q} \\\\\n",
     "\\\\\n",
     "\\text{Update Step}\\\\\n",
-    "\\textbf{S} &= \\mathbf{HP^-H}^\\mathsf{T} + \\mathbf{R} \\;\\;\\;&(3)\\\\\n",
-    "\\mathbf{K} &= \\mathbf{P^-H}^\\mathsf{T} \\mathbf{S}^{-1}\\;\\;\\;&(4) \\\\\n",
-    "\\textbf{y} &= \\mathbf{z} - \\mathbf{H x^-} \\;\\;\\;&(5)\\\\\n",
-    "\\mathbf{x} &=\\mathbf{x^-} +\\mathbf{K\\textbf{y}} \\;\\;\\;&(6)\\\\\n",
-    "\\mathbf{P} &= (\\mathbf{I}-\\mathbf{KH})\\mathbf{P^-}\\;\\;\\;&(7)\n",
+    "\\textbf{S} &= \\mathbf{H\\bar{P}H}^\\mathsf{T} + \\mathbf{R} \\\\\n",
+    "\\mathbf{K} &= \\mathbf{\\bar{P}H}^\\mathsf{T} \\mathbf{S}^{-1} \\\\\n",
+    "\\textbf{y} &= \\mathbf{z} - \\mathbf{H \\bar{x}} \\\\\n",
+    "\\mathbf{x} &=\\mathbf{\\bar{x}} +\\mathbf{K\\textbf{y}} \\\\\n",
+    "\\mathbf{P} &= (\\mathbf{I}-\\mathbf{KH})\\mathbf{\\bar{P}}\n",
     "\\end{aligned}\n",
     "$$\n",
     "\n",
@@ -1794,13 +1793,13 @@
     "\\\\\\end{aligned}\n",
     "$$\n",
     "\n",
-    "The notation above makes heavy use of the Bayesian a$\\mid$b notation, which means a given the evidence of b. The hat means estimate. So, $\\hat{\\mathbf{x}}_{k\\mid k}$ means the estimate of the state $\\mathbf{X}$ at time $k$ (the first k) given the evidence from time $k$ (the second k). The posterior, in other words. $\\hat{\\mathbf{x}}_{k\\mid k-1}$ means the estimate for the state $\\mathbf{x}$ at time k given the estimate from time k - 1. The prior, in other words. \n",
+    "The notation above makes use of the Bayesian a$\\mid$b notation, which means a given the evidence of b. The hat means estimate. So, $\\hat{\\mathbf{x}}_{k\\mid k}$ means the estimate of the state $\\mathbf{x}$ at time $k$ (the first k) given the evidence from time $k$ (the second k). The posterior, in other words. $\\hat{\\mathbf{x}}_{k\\mid k-1}$ means the estimate for the state $\\mathbf{x}$ at time k given the estimate from time k - 1. The prior, in other words. \n",
     "\n",
     "This notation allows a mathematician to express himself exactly, and when it comes to formal publications presenting new results this precision is necessary. As a programmer I find all of that fairly unreadable; I am used to thinking about variables changing state as a program runs, and do not use a different variable name for each new computation. There is no agreed upon format, so each author makes different choices. I find it challenging to switch quickly between books an papers, and so have adopted my admittedly less precise notation. Mathematicians will write scathing emails to me, but I hope the programmers and students will rejoice.\n",
     "\n",
     "Here are some examples for how other authors write the prior: $X^*_{n+1,n}$, $\\underline{\\hat{x}}_k(-)$ (really!), $\\hat{\\textbf{x}}^-_{k+1}$, $\\hat{x}_{k}$. If you are lucky an author defines the notation; more often you have to read the equations in context to recognize what the author is doing. Of course, people write within a tradition; papers on Kalman filters in finance are likely to use one set of notations while papers on radar tracking are likely to use a different set. Over time you will start to become familiar with trends, and also instantly recognize when somebody just copied equations wholesale from another work. For example - the equations I gave above were copied from the  Wikipedia [Kalman Filter](https://en.wikipedia.org/wiki/Kalman_filter#Details) [[1]](#[wiki_article]) article.\n",
     "\n",
-    "The *Symbology* Chapter lists the notation used by various authors. This brings up another difficulty. Different authors use different variable names. $\\mathbf{x}$ is fairly universal, but after that it is anybody's guess. Again, you need to read carefully, and hope that the author defines their variables (they often do not).\n",
+    "The *Symbology* Appendix lists the notation used by various authors. This brings up another difficulty. Different authors use different variable names. $\\mathbf{x}$ is fairly universal, but after that it is anybody's guess. Again, you need to read carefully, and hope that the author defines their variables (they often do not).\n",
     "\n",
     "If you are a programmer trying to understand a paper's math equations, I suggest starting by removing all of the superscripts, subscripts, and diacriticals, replacing them with a single letter. If you work with equations like this every day this is superfluous advice, but when I read I am usually trying to understand the flow of computation. To me it is far more understandable to remember that $P$ in this step represents the updated value of $P$ computed in the last step, as opposed to trying to remember what $P_{k-1}(+)$ denotes, and what its relation to $P_k(-)$ is, if any, and how any of that relates to the completely different notation used in the paper I read 5 minutes ago."
    ]
@@ -2325,9 +2324,9 @@
    "source": [
     "This *looks* good at first blush. The plot does not have the spike that the former plot did; the filter starts tracking the measurements and doesn't take any time to settle to the signal. However, if we look at the plots for P you can see that there is an initial spike for the variance in position, and that it never really converges. Poor design leads to a long convergence time, and suboptimal results. \n",
     "\n",
-    "So despite the filter tracking very close to the actual signal we cannot conclude that the 'magic' is to use a small $\\text{P}$. Yes, this will avoid having the Kalman filter take time to accurately track the signal, but if we are truly uncertain about the initial measurements this can cause the filter to generate very bad results. If we are tracking a living object we are probably very uncertain about where it is before we start tracking it. On the other hand, if we are filtering the output of a thermometer, we are as certain about the first measurement as the 1000th. For your Kalman filter to perform well you must set $\\text{P}$ to a value that truly reflects your knowledge about the data. \n",
+    "So despite the filter tracking very close to the actual signal we cannot conclude that the 'magic' is to use a small $\\mathbf{P}$. Yes, this will avoid having the Kalman filter take time to accurately track the signal, but if we are truly uncertain about the initial measurements this can cause the filter to generate very bad results. If we are tracking a living object we are probably very uncertain about where it is before we start tracking it. On the other hand, if we are filtering the output of a thermometer, we are as certain about the first measurement as the 1000th. For your Kalman filter to perform well you must set $\\mathbf{P}$ to a value that truly reflects your knowledge about the data. \n",
     "\n",
-    "Let's see the result of a bad initial estimate coupled with a very small $\\text{P}$ We will set our initial estimate at 100 m (whereas the dog actually starts at 0m), but set `P=1 m`."
+    "Let's see the result of a bad initial estimate coupled with a very small $\\mathbf{P}$. We will set our initial estimate at 100 m (whereas the dog actually starts at 0m), but set `P=1 m`."
    ]
   },
   {
@@ -2361,7 +2360,7 @@
    "source": [
     "We can see that the initial estimates are terrible and that it takes the filter a long time to start converging onto the signal . This is because we told the Kalman filter that we strongly believe in our initial estimate of 100 m and were incorrect in that belief.\n",
     "\n",
-    "Now, let's provide a more reasonable value for P and see the difference."
+    "Now, let's provide a more reasonable value for `P` and see the difference."
    ]
   },
   {
diff --git a/07-Kalman-Filter-Math.ipynb b/07-Kalman-Filter-Math.ipynb
index ab62ca2..aa726da 100644
--- a/07-Kalman-Filter-Math.ipynb
+++ b/07-Kalman-Filter-Math.ipynb
@@ -746,12 +746,12 @@
     "\n",
     "> **Note:** This section will provide you with a strong intuition into what the Kalman filter equations are actually doing. While this section is not strictly required, I recommend reading this section carefully as it should make the rest of the material easier to understand. It is not merely a proof of correctness that you would normally want to skip past! The equations look complicated, but they are actually doing something quite simple.\n",
     "\n",
-    "Let's start with the predict step, which is slightly easier. Here are the multivariate equations.\n",
+    "Let's start with the predict step, which is slightly easier. Here are the multivariate equations. \n",
     "\n",
     "$$\n",
     "\\begin{aligned}\n",
-    "\\mathbf{x} &= \\mathbf{F x} + \\mathbf{B u} \\\\\n",
-    "\\mathbf{P} &= \\mathbf{FPF}^\\mathsf{T} + \\mathbf{Q}\n",
+    "\\mathbf{\\bar{x}} &= \\mathbf{F x} + \\mathbf{B u} \\\\\n",
+    "\\mathbf{\\bar{P}} &= \\mathbf{FPF}^\\mathsf{T} + \\mathbf{Q}\n",
     "\\end{aligned}\n",
     "$$\n",
     "\n",
@@ -771,7 +771,7 @@
     "\n",
     "Hopefully the general process is clear, so now I will go a bit faster on the rest. Our other equation for the predict step is\n",
     "\n",
-    "$$\\mathbf{P} = \\mathbf{FPF}^\\mathsf{T} + \\mathbf{Q}$$\n",
+    "$$\\mathbf{\\bar{P}} = \\mathbf{FPF}^\\mathsf{T} + \\mathbf{Q}$$\n",
     "\n",
     "Again, since our state only has one variable $\\mathbf{P}$ and $\\mathbf{Q}$ must also be $1\\times 1$ matrix, which we can treat as scalars, yielding  \n",
     "\n",
@@ -796,10 +796,10 @@
     "\n",
     "$$\n",
     "\\begin{aligned}\n",
-    "\\textbf{y} &= \\mathbf{z} - \\mathbf{H x}\\\\\n",
-    "\\mathbf{K}&= \\mathbf{PH}^\\mathsf{T} (\\mathbf{HPH}^\\mathsf{T} + \\mathbf{R})^{-1} \\\\\n",
-    "\\mathbf{x}&=\\mathbf{x} +\\mathbf{K\\textbf{y}} \\\\\n",
-    "\\mathbf{P}&= (\\mathbf{I}-\\mathbf{KH})\\mathbf{P}\n",
+    "\\textbf{y} &= \\mathbf{z} - \\mathbf{H \\bar{x}}\\\\\n",
+    "\\mathbf{K}&= \\mathbf{\\bar{P}H}^\\mathsf{T} (\\mathbf{H\\bar{P}H}^\\mathsf{T} + \\mathbf{R})^{-1} \\\\\n",
+    "\\mathbf{x}&=\\mathbf{\\bar{x}} +\\mathbf{K\\textbf{y}} \\\\\n",
+    "\\mathbf{P}&= (\\mathbf{I}-\\mathbf{KH})\\mathbf{\\bar{P}}\n",
     "\\end{aligned}\n",
     "$$\n",
     "\n",
@@ -1963,17 +1963,17 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "The Kalman filter predict equation is $\\mathbf{x}^- = \\mathbf{Fx} + \\mathbf{Bu}$. Hence the prediction is\n",
+    "The Kalman filter predict equation is $\\mathbf{\\bar{x}} = \\mathbf{Fx} + \\mathbf{Bu}$. Hence the prediction is\n",
     "\n",
-    "$$\\mathbf{x}^- = \\begin{bmatrix}\n",
+    "$$\\mathbf{\\bar{x}} = \\begin{bmatrix}\n",
     "1 & \\Delta t \\\\ 0 & 1\\end{bmatrix}\\begin{bmatrix}\n",
     "x\\\\ \\dot{x}\\end{bmatrix}\n",
     "$$\n",
     "\n",
     "which multiplies out to \n",
     "\n",
-    "$$\\begin{aligned}x^- &= x + v\\Delta t \\\\\n",
-    "\\dot{x}^- &= \\dot{x}\\end{aligned}$$.\n",
+    "$$\\begin{aligned}\\bar{x} &= x + v\\Delta t \\\\\n",
+    "\\bar{\\dot{x}} &= \\dot{x}\\end{aligned}$$.\n",
     "\n",
     "This works for linear ordinary differential equations (ODEs), but does not work (well) for nonlinear equations. For example, consider trying to predict the position of a rapidly turning car. Cars turn by pivoting the front wheels, which cause the car to pivot around the rear axle. Therefore the path will be continuously varying and a linear prediction will necessarily produce an incorrect value. If the change in the system is small enough relative to $\\Delta t$ this can often produce adequate results, but that will rarely be the case with the nonlinear Kalman filters we will be studying in subsequent chapters. Another problem is that even trivial systems produce differential equations for which finding closed form solutions is difficult or impossible. \n",
     "\n",
@@ -2364,9 +2364,9 @@
     "\n",
     "Let's start with some definitions which should be familiar to you. First, we define the innovation as \n",
     "\n",
-    "$$\\delta \\mathbf{z}^-= \\mathbf{z} - h(\\mathbf{x}^-)$$\n",
+    "$$\\delta \\mathbf{\\bar{z}}= \\mathbf{z} - h(\\mathbf{\\bar{x}})$$\n",
     "\n",
-    "where $\\mathbf{z}$ is the measurement, $h(\\bullet)$ is the measurement function, and $\\delta \\mathbf{z}^-$ is the innovation, which we abbreviate as $y$ in FilterPy. I don't use the $\\mathbf{x}^-$ symbology often, but it is the prediction for the state variable. In other words, this is the equation $\\mathbf{y} = \\mathbf{z} - \\mathbf{Hx}$ in the linear Kalman filter's update step.\n",
+    "where $\\mathbf{z}$ is the measurement, $h(\\bullet)$ is the measurement function, and $\\delta \\mathbf{\\bar{z}}$ is the innovation, which we abbreviate as $y$ in FilterPy. In other words, this is the equation $\\mathbf{y} = \\mathbf{z} - \\mathbf{H\\bar{x}}$ in the linear Kalman filter's update step.\n",
     "\n",
     "Next, the *measurement residual* is\n",
     "\n",
diff --git a/08-Designing-Kalman-Filters.ipynb b/08-Designing-Kalman-Filters.ipynb
index f643956..8925521 100644
--- a/08-Designing-Kalman-Filters.ipynb
+++ b/08-Designing-Kalman-Filters.ipynb
@@ -413,15 +413,15 @@
    "source": [
     "Our next step is to design the state transition function. Recall that the state transition function is implemented as a matrix $\\mathbf{F}$ that we multiply with the previous state of our system to get the next state, like so. \n",
     "\n",
-    "$$\\mathbf{x}^- = \\mathbf{Fx}$$\n",
+    "$$\\mathbf{\\bar{x}} = \\mathbf{Fx}$$\n",
     "\n",
     "I will not belabor this as it is very similar to the 1-D case we did in the previous chapter. The state equations are\n",
     "\n",
     "$$\n",
     "\\begin{aligned}\n",
-    "x^- &= 1x + \\Delta t \\dot{x} + y + 0 \\dot{y} \\\\\n",
+    "x &= 1x + \\Delta t \\dot{x} + 0y + 0 \\dot{y} \\\\\n",
-    "v_x &= 0x + 1\\dot{x} + 1y + 0 \\dot{y} \\\\\n",
+    "v_x &= 0x + 1\\dot{x} + 0y + 0 \\dot{y} \\\\\n",
-    "y^- &= 0x + 0\\dot{x} + 1y + \\Delta t \\dot{y} \\\\\n",
+    "y &= 0x + 0\\dot{x} + 1y + \\Delta t \\dot{y} \\\\\n",
     "v_y &= 0x + 0\\dot{x} + 0y + 1 \\dot{y}\n",
     "\\end{aligned}\n",
     "$$"
@@ -2826,7 +2826,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "We might think to use the same state variables as used for tracking the dog. However, this will not work. Recall that the Kalman filter state transition must be written as $\\mathbf{x}^- = \\mathbf{Fx} + \\mathbf{Bu}$, which means we must calculate the current state from the previous state. Our assumption is that the ball is traveling in a vacuum, so the velocity in x is a constant, and the acceleration in y is solely due to the gravitational constant $g$. We can discretize the Newtonian equations using the well known Euler method in terms of $\\Delta t$ are:\n",
+    "We might think to use the same state variables as used for tracking the dog. However, this will not work. Recall that the Kalman filter state transition must be written as $\\mathbf{\\bar{x}} = \\mathbf{Fx} + \\mathbf{Bu}$, which means we must calculate the current state from the previous state. Our assumption is that the ball is traveling in a vacuum, so the velocity in x is a constant, and the acceleration in y is solely due to the gravitational acceleration $g$. Discretizing the Newtonian equations with the well-known Euler method, in terms of $\\Delta t$, gives:\n",
     "\n",
     "$$\\begin{aligned}\n",
     "x_t &=  x_{t-1} + v_{x(t-1)} {\\Delta t} \\\\\n",
@@ -2837,7 +2837,7 @@
     "\\end{aligned}\n",
     "$$\n",
     "\n",
-    "> **sidebar**: *Euler's method integrates a differential equation stepwise by assuming the slope (derivative) is constant at time $t$. In this case the derivative of the position is velocity. At each time step $\\Delta t$ we assume a constant velocity, compute the new position, and then update the velocity for the next time step. There are more accurate methods, such as Runge-Kutta available to us, but because we are updating the state with a measurement in each step Euler's method is very accurate.* If you need to use Runge-Kutta you will have to write your own `predict()` function which computes the state transition for $\\mathbf{x}$, and then uses the normal Kalman filter equation $\\mathbf{P}=\\mathbf{FPF}^\\mathsf{T} + \\mathbf{Q}$ to update the covariance matrix.\n",
+    "> **sidebar**: *Euler's method integrates a differential equation stepwise by assuming the slope (derivative) is constant over each time step. In this case the derivative of the position is velocity. At each time step $\\Delta t$ we assume a constant velocity, compute the new position, and then update the velocity for the next time step. More accurate methods, such as Runge-Kutta, are available to us, but because we are updating the state with a measurement in each step, Euler's method is usually accurate enough.* If you need to use Runge-Kutta you will have to write your own `predict()` function which computes the state transition for $\\mathbf{x}$, and then uses the normal Kalman filter equation $\\mathbf{\\bar{P}}=\\mathbf{FPF}^\\mathsf{T} + \\mathbf{Q}$ to update the covariance matrix.\n",
     "\n",
     "This implies that we need to incorporate acceleration for $y$ into the Kalman filter, but not for $x$. This suggests the following state variables.\n",
     "\n",
@@ -2854,7 +2854,7 @@
     "\n",
     "However, the acceleration is due to gravity, which is a constant. Instead of asking the Kalman filter to track a constant we can treat gravity as what it really is - a control input. In other words, gravity is a force that alters the behavior of the system in a known way, and it is applied throughout the flight of the ball. \n",
     "\n",
-    "The equation for the state prediction is $\\mathbf{x^-} = \\mathbf{Fx} + \\mathbf{Bu}$. $\\mathbf{Fx}$ is the familiar state transition function which we will use to model the position and velocity of the ball. The vector $\\mathbf{u}$ lets you specify a control input into the filter. For a car the control input will be things such as the amount the accelerator and brake are pressed, the position of the steering wheel, and so on. For our ball the control input will be gravity. The matrix $\\mathbf{B}$ models how the control inputs affect the behavior of the system. Again, for a car $\\mathbf{B}$ will convert the inputs of the brake and accelerator into changes of velocity, and the input of the steering wheel into a different position and heading. For our ball tracking problem it will compute the velocity change due to gravity. We will go into the details of that in step 3. For now, we design the state variable to be\n",
+    "The equation for the state prediction is $\\mathbf{\\bar{x}} = \\mathbf{Fx} + \\mathbf{Bu}$. $\\mathbf{Fx}$ is the familiar state transition function which we will use to model the position and velocity of the ball. The vector $\\mathbf{u}$ lets you specify a control input into the filter. For a car the control input will be things such as the amount the accelerator and brake are pressed, the position of the steering wheel, and so on. For our ball the control input will be gravity. The matrix $\\mathbf{B}$ models how the control inputs affect the behavior of the system. Again, for a car $\\mathbf{B}$ will convert the inputs of the brake and accelerator into changes of velocity, and the input of the steering wheel into a different position and heading. For our ball tracking problem it will compute the velocity change due to gravity. We will go into the details of that in step 3. For now, we design the state variable to be\n",
     "\n",
     "$$\n",
     "\\mathbf{x} = \n",
diff --git a/10-Unscented-Kalman-Filter.ipynb b/10-Unscented-Kalman-Filter.ipynb
index 554b8d3..b53dfc2 100644
--- a/10-Unscented-Kalman-Filter.ipynb
+++ b/10-Unscented-Kalman-Filter.ipynb
@@ -745,8 +745,8 @@
     "Now we compute the predicted mean and covariance using the *unscented transform *on the transformed sigma points. I've dropped the subscript $i$ for readability.\n",
     "\n",
     "$$\\begin{aligned}\n",
-    "\\mathbf{\\mu}^- &= \\sum w^m\\boldsymbol{\\mathcal{Y}} \\\\\n",
-    "\\mathbf{\\Sigma}^- &= \\sum w^c({\\boldsymbol{\\mathcal{Y}}-\\bf{\\mu}^-)(\\boldsymbol{\\mathcal{Y}}-\\bf{\\mu}^-)^\\mathsf{T}} + \\mathbf{Q}\n",
+    "\\mathbf{\\bar{\\mu}} &= \\sum w^m\\boldsymbol{\\mathcal{Y}} \\\\\n",
+    "\\mathbf{\\bar{\\Sigma}} &= \\sum w^c(\\boldsymbol{\\mathcal{Y}}-\\mathbf{\\bar{\\mu}})(\\boldsymbol{\\mathcal{Y}}-\\mathbf{\\bar{\\mu}})^\\mathsf{T} + \\mathbf{Q}\n",
     "\\end{aligned}\n",
     "$$\n",
     "\n",
@@ -755,8 +755,8 @@
     "$$\\begin{array}{l|l}\n",
     "\\mathrm{Kalman} & \\mathrm{Unscented} \\\\\n",
     "\\hline \n",
-    "\\mathbf{x}^- = \\mathbf{Fx} & \\mathbf{\\mu}^- = \\sum w^m\\boldsymbol{\\mathcal{Y}}  \\\\\n",
-    "\\mathbf{P}^- = \\mathbf{FPF}^\\mathsf{T}+\\mathbf{Q}  & \\mathbf{P}^- = \\mathbf{FPF}^\\mathsf{T}+\\mathbf{Q}\n",
+    "\\mathbf{\\bar{x}} = \\mathbf{Fx} & \\mathbf{\\bar{\\mu}} = \\sum w^m\\boldsymbol{\\mathcal{Y}}  \\\\\n",
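+    "\\mathbf{\\bar{P}} = \\mathbf{FPF}^\\mathsf{T}+\\mathbf{Q}  & \\mathbf{\\bar{\\Sigma}} = \\sum w^c(\\boldsymbol{\\mathcal{Y}}-\\mathbf{\\bar{\\mu}})(\\boldsymbol{\\mathcal{Y}}-\\mathbf{\\bar{\\mu}})^\\mathsf{T}+\\mathbf{Q}\n",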
     "\\end{array}$$\n",
     "\n"
    ]
@@ -780,7 +780,7 @@
     "\n",
     "$$\\begin{aligned}\n",
     "\\mathbf{\\mu}_z &= \\sum w^m\\boldsymbol{\\mathcal{Z}} \\\\\n",
-    "\\mathbf{P}_z &= \\sum w^c{(\\boldsymbol{\\mathcal{Z}}-\\mu^-)(\\boldsymbol{\\mathcal{Z}}-\\mu^-)^\\mathsf{T}} + \\mathbf{R}\n",
+    "\\mathbf{P}_z &= \\sum w^c{(\\boldsymbol{\\mathcal{Z}}-\\mathbf{\\mu}_z)(\\boldsymbol{\\mathcal{Z}}-\\mathbf{\\mu}_z)^\\mathsf{T}} + \\mathbf{R}\n",
     "\\end{aligned}\n",
     "$$\n",
     "\n",
@@ -805,11 +805,11 @@
     "\n",
     "Finally, we compute the new state estimate using the residual and Kalman gain:\n",
     "\n",
-    "$$\\mathbf{x} = \\mathbf{x}^- + \\mathbf{Ky}$$\n",
+    "$$\\mathbf{x} = \\mathbf{\\bar{x}} + \\mathbf{Ky}$$\n",
     "\n",
     "and the new covariance is computed as:\n",
     "\n",
-    "$$ \\mathbf{P} = \\mathbf{\\Sigma}^- - \\mathbf{KP_z}\\mathbf{K}^\\mathsf{T}$$\n",
+    "$$ \\mathbf{P} = \\mathbf{\\bar{\\Sigma}} - \\mathbf{KP_z}\\mathbf{K}^\\mathsf{T}$$\n",
     "\n",
     "This step contains a few equations you have to take on faith, but you should be able to see how they relate to the linear Kalman filter equations. We convert the mean and covariance into measurement space, add the measurement error into the measurement covariance, compute the residual and kalman gain, compute the new state estimate as the old estimate plus the residual times the Kalman gain, and adjust the covariance for the information provided by the measurement. The linear algebra is slightly different from the linear Kalman filter, but the algorithm is the same Bayesian algorithm we have been implementing throughout the book. \n",
     "\n",
@@ -823,17 +823,17 @@
     "$$\\begin{array}{l|l}\n",
     "\\textrm{Kalman Filter} & \\textrm{Unscented Kalman Filter} \\\\\n",
     "\\hline \n",
-    "\\mathbf{x}^- = \\mathbf{Fx} & \\mathbf{\\mu}^- = \\sum w^m\\boldsymbol{\\mathcal{Y}}  \\\\\n",
-    "\\mathbf{P}^- = \\mathbf{FPF}^\\mathsf{T}+\\mathbf{Q}  & \\mathbf{P}^- = \\mathbf{FPF}^\\mathsf{T}+\\mathbf{Q} \\\\\n",
+    "\\mathbf{\\bar{x}} = \\mathbf{Fx} & \\mathbf{\\bar{\\mu}} = \\sum w^m\\boldsymbol{\\mathcal{Y}}  \\\\\n",
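+    "\\mathbf{\\bar{P}} = \\mathbf{FPF}^\\mathsf{T}+\\mathbf{Q}  & \\mathbf{\\bar{\\Sigma}} = \\sum w^c(\\boldsymbol{\\mathcal{Y}}-\\mathbf{\\bar{\\mu}})(\\boldsymbol{\\mathcal{Y}}-\\mathbf{\\bar{\\mu}})^\\mathsf{T}+\\mathbf{Q} \\\\\n",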
     "\\hline \n",
-    "\\mathbf{y} = \\boldsymbol{\\mathbf{z}} - \\mathbf{Hx}  &\n",
+    "\\mathbf{y} = \\boldsymbol{\\mathbf{z}} - \\mathbf{H\\bar{x}}  &\n",
     "\\mathbf{y} = \\mathbf{z} - \\sum w^m h(\\boldsymbol{\\mathcal{Y}})\\\\\n",
-    "\\mathbf{S} = \\mathbf{HP^-H}^\\mathsf{T} + \\mathbf{R} & \n",
-    "\\mathbf{P}_z = \\sum w^c{(\\boldsymbol{\\mathcal{Z}}-\\mu^-)(\\boldsymbol{\\mathcal{Z}}-\\mu^-)^\\mathsf{T}} + \\mathbf{R} \\\\ \n",
-    "\\mathbf{K} = \\mathbf{P^-H}^\\mathsf{T} \\mathbf{S}^{-1} &\n",
+    "\\mathbf{S} = \\mathbf{H\\bar{P}H}^\\mathsf{T} + \\mathbf{R} & \n",
+    "\\mathbf{P}_z = \\sum w^c{(\\boldsymbol{\\mathcal{Z}}-\\mathbf{\\mu}_z)(\\boldsymbol{\\mathcal{Z}}-\\mathbf{\\mu}_z)^\\mathsf{T}} + \\mathbf{R} \\\\ \n",
+    "\\mathbf{K} = \\mathbf{\\bar{P}H}^\\mathsf{T} \\mathbf{S}^{-1} &\n",
     "\\mathbf{K} = \\left[\\sum w^c(\\boldsymbol{\\chi}-\\mu)(\\boldsymbol{\\mathcal{Z}}-\\mathbf{\\mu}_z)^\\mathsf{T}\\right] \\mathbf{P}_z^{-1}\\\\\n",
-    "\\mathbf{x} = \\mathbf{x}^- + \\mathbf{Ky} & \\mathbf{x} = \\mathbf{x}^- + \\mathbf{Ky}\\\\\n",
-    "\\mathbf{P} = (\\mathbf{I}-\\mathbf{KH})\\mathbf{P}^- & \\mathbf{P} = \\mathbf{\\Sigma}^- - \\mathbf{KP_z}\\mathbf{K}^\\mathsf{T}\n",
+    "\\mathbf{x} = \\mathbf{\\bar{x}} + \\mathbf{Ky} & \\mathbf{x} = \\mathbf{\\bar{x}} + \\mathbf{Ky}\\\\\n",
+    "\\mathbf{P} = (\\mathbf{I}-\\mathbf{KH})\\mathbf{\\bar{P}} & \\mathbf{P} = \\mathbf{\\bar{\\Sigma}} - \\mathbf{KP_z}\\mathbf{K}^\\mathsf{T}\n",
     "\\end{array}$$"
    ]
   },
@@ -1509,7 +1509,7 @@
     "\n",
     "This requires the following change to the state transition function, which is still linear.\n",
     "\n",
-    "$$\\mathbf{x}^- = \\begin{bmatrix} 1 & \\Delta t & 0 &0 \\\\ 0& 1& 0 &0\\\\ 0&0&1&dt \\\\ 0&0&0&1\\end{bmatrix}\n",
+    "$$\\mathbf{\\bar{x}} = \\begin{bmatrix} 1 & \\Delta t & 0 &0 \\\\ 0& 1& 0 &0\\\\ 0&0&1&\\Delta t \\\\ 0&0&0&1\\end{bmatrix}\n",
     "\\begin{bmatrix}x \\\\\\dot{x}\\\\ y\\\\ \\dot{y}\\end{bmatrix} \n",
     "$$\n",
     "\n",
@@ -2359,11 +2359,11 @@
     "\n",
     "\n",
     "$$K = \\mathbf{P}_{xz} \\mathbf{P}_z^{-1}\\\\\n",
-    "{\\mathbf{x}} = \\mathbf{x}^- + \\mathbf{Ky}$$\n",
+    "{\\mathbf{x}} = \\mathbf{\\bar{x}} + \\mathbf{Ky}$$\n",
     "\n",
     "and the new covariance is computed as:\n",
     "\n",
-    "$$ \\mathbf{P} = \\mathbf{P}^- - \\mathbf{KP}_z\\mathbf{K}^\\mathsf{T}$$\n",
+    "$$ \\mathbf{P} = \\mathbf{\\bar{P}} - \\mathbf{KP}_z\\mathbf{K}^\\mathsf{T}$$\n",
     "\n",
     "This function can be implemented as follows, assuming it is a method of a class that stores the necessary matrices and data."
    ]
diff --git a/11-Extended-Kalman-Filters.ipynb b/11-Extended-Kalman-Filters.ipynb
index 10cc9d4..43385aa 100644
--- a/11-Extended-Kalman-Filters.ipynb
+++ b/11-Extended-Kalman-Filters.ipynb
@@ -456,8 +456,8 @@
     "$$\n",
     "\\begin{array}{ll}\n",
     "\\textbf{Linear} & \\textbf{Nonlinear} \\\\\n",
-    "x = Fx & x = \\underline{f(x)} \\\\\n",
-    "P = FPF^T + Q & P = FPF^T + Q\n",
+    "\\mathbf{\\bar{x}} = \\mathbf{Fx} & \\mathbf{\\bar{x}} = \\underline{f(x)} \\\\\n",
+    "\\mathbf{\\bar{P}} = \\mathbf{FPF}^\\mathsf{T} + \\mathbf{Q} & \\mathbf{\\bar{P}} = \\mathbf{FPF}^\\mathsf{T} + \\mathbf{Q}\n",
     "\\end{array}\n",
     "$$\n",
     "\n",
@@ -466,9 +466,9 @@
     "$$\n",
     "\\begin{array}{ll}\n",
     "\\textbf{Linear} & \\textbf{Nonlinear} \\\\\n",
-    "K = PH^T(HPH^T + R)^{-1}& K = PH^T(HPH^T + R)^{-1}\\\\\n",
-    "x = x + K(z-Hx) & x = x + K(z-\\underline{h(x)}) \\\\\n",
-    "P = P(I - KH) & P = P(I - KH)\\\\\n",
+    "\\mathbf{K} = \\mathbf{\\bar{P}H}^\\mathsf{T}(\\mathbf{H\\bar{P}H}^\\mathsf{T} + \\mathbf{R})^{-1}& \\mathbf{K} = \\mathbf{\\bar{P}H}^\\mathsf{T}(\\mathbf{H\\bar{P}H}^\\mathsf{T} + \\mathbf{R})^{-1}\\\\\n",
+    "\\mathbf{x} = \\mathbf{\\bar{x}} + \\mathbf{K}(\\mathbf{z}-\\mathbf{H\\bar{x}}) & \\mathbf{x} = \\mathbf{\\bar{x}} + \\mathbf{K}(\\mathbf{z}-\\underline{h(x)}) \\\\\n",
+    "\\mathbf{P} = (\\mathbf{I} - \\mathbf{KH})\\mathbf{\\bar{P}} & \\mathbf{P} = (\\mathbf{I} - \\mathbf{KH})\\mathbf{\\bar{P}}\\\\\n",
     "\\end{array}\n",
     "$$"
    ]
@@ -1280,11 +1280,11 @@
     "This gives us the final form of our prediction equations:\n",
     "\n",
     "$$\\begin{aligned}\n",
-    "\\mathbf{x}^- &= \\mathbf{x} + \n",
+    "\\mathbf{\\bar{x}} &= \\mathbf{x} + \n",
     "\\begin{bmatrix}- R\\sin(\\theta) + R\\sin(\\theta + \\beta) \\\\\n",
     "R\\cos(\\theta) - R\\cos(\\theta + \\beta) \\\\\n",
     "\\beta\\end{bmatrix}\\\\\n",
-    "\\mathbf{P}^- &=\\mathbf{FPF}^{\\mathsf{T}} + \\mathbf{VMV}^{\\mathsf{T}}\n",
+    "\\mathbf{\\bar{P}} &=\\mathbf{FPF}^{\\mathsf{T}} + \\mathbf{VMV}^{\\mathsf{T}}\n",
     "\\end{aligned}$$\n",
     "\n",
     "One final point. This form of linearization is not the only way to predict $\\mathbf{x}$. For example, we could use a numerical integration technique like *Runge Kutta* to compute the position of the robot in the future. In fact, if the time step is relatively large you will have to do that. As I am sure you are realizing, things are not as cut and dried with the EKF as it was for the KF. For a real problem you have to very carefully model your system with differential equations and then determine the most appropriate way to solve that system. The correct approach depends on the accuracy you require, how nonlinear the equations are, your processor budget, and numerical stability concerns. These are all topics beyond the scope of this book."
diff --git a/Appendix-B-Symbols-and-Notations.ipynb b/Appendix-B-Symbols-and-Notations.ipynb
index be34e36..e90939b 100644
--- a/Appendix-B-Symbols-and-Notations.ipynb
+++ b/Appendix-B-Symbols-and-Notations.ipynb
@@ -254,13 +254,9 @@
    ],
    "source": [
     "#format the book\n",
-    "%matplotlib inline\n",
-    "%load_ext autoreload\n",
-    "%autoreload 2  \n",
-    "from __future__ import division, print_function\n",
     "import sys\n",
     "sys.path.insert(0,'./code')\n",
-    "from book_format import load_style, set_figsize, figsize\n",
+    "from book_format import load_style\n",
     "load_style()"
    ]
   },
@@ -268,145 +264,120 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Symbology"
+    "# Symbols and Notations\n",
+    "\n",
+    "Here is a collection of the notation used by various authors for the linear Kalman filter. I list the equations in the same order for each author so that you can compare and contrast their choices. I use each author's original variable names; I did not try to normalize them."
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "This is just notes at this point. \n",
+    "## Labbe\n",
+    "\n",
+    "$$\n",
+    "\\begin{aligned}\n",
+    "\\overline{\\mathbf{x}} &= \\mathbf{F}\\hat{\\mathbf{x}} + \\mathbf{B}\\mathbf{u} \\\\\n",
+    "\\overline{\\mathbf{P}} &=  \\mathbf{F} \\mathbf{P}\\mathbf{F}^\\mathsf{T} + \\textbf{Q} \\\\ \\\\\n",
+    "\\mathbf{y} &= \\mathbf{z} - \\mathbf{H}\\overline{\\textbf{x}} \\\\\n",
+    "\\mathbf{S} &= \\mathbf{H}\\overline{\\mathbf{P}}\\mathbf{H}^\\mathsf{T} + \\mathbf{R} \\\\\n",
+    "\\mathbf{K} &= \\overline{\\mathbf{P}}\\mathbf{H}^\\mathsf{T}\\mathbf{S}^{-1} \\\\\n",
+    "\\hat{\\textbf{x}}  &= \\overline{\\mathbf{x}} +\\mathbf{K}\\mathbf{y} \\\\\n",
+    "\\mathbf{P} &= (\\mathbf{I}-\\mathbf{K}\\mathbf{H})\\overline{\\mathbf{P}}\n",
+    "\\end{aligned}$$\n",
     "\n",
     "\n",
-    "## State\n",
+    "## Wikipedia\n",
+    "$$\n",
+    "\\begin{aligned}\n",
+    "\\hat{\\textbf{x}}_{k\\mid k-1} &= \\textbf{F}_{k}\\hat{\\textbf{x}}_{k-1\\mid k-1} + \\textbf{B}_{k} \\textbf{u}_{k} \\\\\n",
+    "\\textbf{P}_{k\\mid k-1} &=  \\textbf{F}_{k} \\textbf{P}_{k-1\\mid k-1} \\textbf{F}_{k}^{\\textsf{T}} + \\textbf{Q}_{k}\\\\\n",
+    "\\tilde{\\textbf{y}}_k &= \\textbf{z}_k - \\textbf{H}_k\\hat{\\textbf{x}}_{k\\mid k-1} \\\\\n",
+    "\\textbf{S}_k &= \\textbf{H}_k \\textbf{P}_{k\\mid k-1} \\textbf{H}_k^\\textsf{T} + \\textbf{R}_k \\\\\n",
+    "\\textbf{K}_k &= \\textbf{P}_{k\\mid k-1}\\textbf{H}_k^\\textsf{T}\\textbf{S}_k^{-1} \\\\\n",
+    "\\hat{\\textbf{x}}_{k\\mid k} &= \\hat{\\textbf{x}}_{k\\mid k-1} + \\textbf{K}_k\\tilde{\\textbf{y}}_k \\\\\n",
+    "\\textbf{P}_{k|k} &= (I - \\textbf{K}_k \\textbf{H}_k) \\textbf{P}_{k|k-1}\n",
+    "\\end{aligned}$$\n",
     "\n",
-    "$x$ (Brookner, Zarchan, Brown)\n",
-    "\n",
-    "$\\underline{x}$ Gelb)\n",
-    "\n",
-    "## State at step n\n",
-    "\n",
-    "$x_n$ (Brookner)\n",
-    "\n",
-    "$x_k$ (Brown, Zarchan)\n",
-    "\n",
-    "$\\underline{x}_k$ (Gelb)\n",
-    "\n",
-    "\n",
-    "\n",
-    "## Prediction\n",
-    "\n",
-    "$x^-$\n",
-    "\n",
-    "$x_{n,n-1}$  (Brookner) \n",
-    "\n",
-    "$x_{k+1,k}$\n",
-    "\n",
-    "\n",
-    "## measurement\n",
-    "\n",
-    "\n",
-    "$x^*$\n",
-    "\n",
-    "\n",
-    "\n",
-    "Y_n (Brookner)\n",
-    "\n",
-    "##control transition Matrix\n",
-    "\n",
-    "$G$ (Zarchan)\n",
-    "\n",
-    "\n",
-    "Not used (Brookner)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "##Nomenclature\n",
-    "\n",
-    "\n",
-    "### Equations\n",
-    "#### Brookner\n",
+    "## Brookner\n",
     "\n",
     "$$\n",
     "\\begin{aligned}\n",
     "X^*_{n+1,n} &= \\Phi X^*_{n,n} \\\\\n",
     "X^*_{n,n}  &= X^*_{n,n-1} +H_n(Y_n - MX^*_{n,n-1}) \\\\\n",
-    "H_n &= S^*_{n,n-1}M^T[R_n + MS^*_{n,n-1}M^T]^{-1} \\\\\n",
-    "S^*_{n,n-1} &= \\Phi S^*_{n-1,n-1}\\Phi^T + Q_n \\\\\n",
+    "H_n &= S^*_{n,n-1}M^\\mathsf{T}[R_n + MS^*_{n,n-1}M^\\mathsf{T}]^{-1} \\\\\n",
+    "S^*_{n,n-1} &= \\Phi S^*_{n-1,n-1}\\Phi^\\mathsf{T} + Q_n \\\\\n",
     "S^*_{n-1,n-1} &= (I-H_{n-1}M)S^*_{n-1,n-2}\n",
     "\\end{aligned}$$\n",
     "\n",
-    "#### Gelb\n",
+    "## Gelb\n",
     "\n",
     "$$\n",
     "\\begin{aligned}\n",
     "\\underline{\\hat{x}}_k(-) &= \\Phi_{k-1} \\underline{\\hat{x}}_{k-1}(+) \\\\\n",
     "\\underline{\\hat{x}}_k(+)  &= \\underline{\\hat{x}}_k(-) +K_k[Z_k - H_k\\underline{\\hat{x}}_k(-)] \\\\\n",
-    "K_k &= P_k(-)H_k^T[H_kP_k(-)H_k^T + R_k]^{-1}\\\\\n",
-    "P_k(+) &=  \\Phi_{k-1} P_{k-1}(+)\\Phi_{k-1}^T + Q_{k-1} \\\\\n",
+    "K_k &= P_k(-)H_k^\\mathsf{T}[H_kP_k(-)H_k^\\mathsf{T} + R_k]^{-1}\\\\\n",
+    "P_k(-) &=  \\Phi_{k-1} P_{k-1}(+)\\Phi_{k-1}^\\mathsf{T} + Q_{k-1} \\\\\n",
-    "P_k(-) &= (I-K_kH_k)P_k(-)\n",
+    "P_k(+) &= (I-K_kH_k)P_k(-)\n",
     "\\end{aligned}$$\n",
     "\n",
     "\n",
-    "#### Brown\n",
+    "## Brown\n",
     "\n",
     "$$\n",
     "\\begin{aligned}\n",
     "\\hat{\\textbf{x}}^-_{k+1} &= \\mathbf{\\phi}_{k}\\hat{\\textbf{x}}_{k} \\\\\n",
     "\\hat{\\textbf{x}}_k  &= \\hat{\\textbf{x}}^-_k +\\textbf{K}_k[\\textbf{z}_k - \\textbf{H}_k\\hat{\\textbf{}x}^-_k] \\\\\n",
-    "\\textbf{K}_k &= \\textbf{P}^-_k\\textbf{H}_k^T[\\textbf{H}_k\\textbf{P}^-_k\\textbf{H}_k^T + \\textbf{R}_k]^{-1}\\\\\n",
-    "\\textbf{P}^-_{k+1} &=  \\mathbf{\\phi}_k \\textbf{P}_k\\mathbf{\\phi}_k^T + \\textbf{Q}_{k} \\\\\n",
+    "\\textbf{K}_k &= \\textbf{P}^-_k\\textbf{H}_k^\\mathsf{T}[\\textbf{H}_k\\textbf{P}^-_k\\textbf{H}_k^\\mathsf{T} + \\textbf{R}_k]^{-1}\\\\\n",
+    "\\textbf{P}^-_{k+1} &=  \\mathbf{\\phi}_k \\textbf{P}_k\\mathbf{\\phi}_k^\\mathsf{T} + \\textbf{Q}_{k} \\\\\n",
     "\\mathbf{P}_k &= (\\mathbf{I}-\\mathbf{K}_k\\mathbf{H}_k)\\mathbf{P}^-_k\n",
     "\\end{aligned}$$\n",
     "\n",
     "\n",
-    "#### Zarchan\n",
+    "## Zarchan\n",
     "\n",
     "$$\n",
     "\\begin{aligned}\n",
     "\\hat{x}_{k} &= \\Phi_{k}\\hat{x}_{k-1} + G_ku_{k-1} + K_k[z_k - H\\Phi_{k}\\hat{x}_{k-1} - HG_ku_{k-1} ] \\\\\n",
-    "M_{k} &=  \\Phi_k P_{k-1}\\phi_k^T + Q_{k} \\\\\n",
-    "K_k &= M_kH^T[HM_kH^T + R_k]^{-1}\\\\\n",
+    "M_{k} &=  \\Phi_k P_{k-1}\\Phi_k^\\mathsf{T} + Q_{k} \\\\\n",
+    "K_k &= M_kH^\\mathsf{T}[HM_kH^\\mathsf{T} + R_k]^{-1}\\\\\n",
     "P_k &= (I-K_kH)M_k\n",
-    "\\end{aligned}$$"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "### Wikipedia\n",
-    "$$\n",
-    "\\begin{aligned}\n",
-    "\\hat{\\textbf{x}}_{k\\mid k-1} &= \\textbf{F}_{k}\\hat{\\textbf{x}}_{k-1\\mid k-1} + \\textbf{B}_{k} \\textbf{u}_{k} \\\\\n",
-    "\\textbf{P}_{k\\mid k-1} &=  \\textbf{F}_{k} \\textbf{P}_{k-1\\mid k-1} \\textbf{F}_{k}^{\\text{T}} + \\textbf{Q}_{k}\\\\\n",
-    "\\tilde{\\textbf{y}}_k &= \\textbf{z}_k - \\textbf{H}_k\\hat{\\textbf{x}}_{k\\mid k-1} \\\\\n",
-    "\\textbf{S}_k &= \\textbf{H}_k \\textbf{P}_{k\\mid k-1} \\textbf{H}_k^\\text{T} + \\textbf{R}_k \\\\\n",
-    "\\textbf{K}_k &= \\textbf{P}_{k\\mid k-1}\\textbf{H}_k^\\text{T}\\textbf{S}_k^{-1} \\\\\n",
-    "\\hat{\\textbf{x}}_{k\\mid k} &= \\hat{\\textbf{x}}_{k\\mid k-1} + \\textbf{K}_k\\tilde{\\textbf{y}}_k \\\\\n",
-    "\\textbf{P}_{k|k} &= (I - \\textbf{K}_k \\textbf{H}_k) \\textbf{P}_{k|k-1}\n",
-    "\\end{aligned}$$"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "### Labbe\n",
+    "\\end{aligned}$$\n",
     "\n",
-    "$$\n",
-    "\\begin{aligned}\n",
-    "\\hat{\\textbf{x}}^-_{k+1} &= \\mathbf{F}_{k}\\hat{\\textbf{x}}_{k} + \\mathbf{B}_k\\mathbf{u}_k \\\\\n",
-    "\\textbf{P}^-_{k+1} &=  \\mathbf{F}_k \\textbf{P}_k\\mathbf{F}_k^T + \\textbf{Q}_{k} \\\\\n",
-    "\\textbf{y}_k &= \\textbf{z}_k - \\textbf{H}_k\\hat{\\textbf{}x}^-_k \\\\\n",
-    "\\mathbf{S}_k &= \\textbf{H}_k\\textbf{P}^-_k\\textbf{H}_k^T + \\textbf{R}_k \\\\\n",
-    "\\textbf{K}_k &= \\textbf{P}^-_k\\textbf{H}_k^T\\mathbf{S}_k^{-1} \\\\\n",
-    "\\hat{\\textbf{x}}_k  &= \\hat{\\textbf{x}}^-_k +\\textbf{K}_k\\textbf{y} \\\\\n",
-    "\\mathbf{P}_k &= (\\mathbf{I}-\\mathbf{K}_k\\mathbf{H}_k)\\mathbf{P}^-_k\n",
-    "\\end{aligned}$$"
+    "\n",
+    "## Bar-Shalom\n",
+    "\n",
+    "## Thrun"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Terminology\n",
+    "\n",
+    "### Bar-Shalom\n",
+    "\n",
+    "* State-space\n",
+    "* x: state vector\n",
+    "* u: input vector\n",
+    "* Process noise or plant noise\n",
+    "* system matrix: A  (for $\\dot{x} = Ax + Bu + Dv$)\n",
+    "* F: state transition matrix\n",
+    "* H : Measurement matrix\n",
+    "* y : measurement residual\n",
+    "* $\\overline{P}$ : state prediction covariance\n",
+    "* P: updated state covariance\n",
+    "* S: innovation covariance\n",
+    "* K(W): filter gain\n",
+    "* $\\hat{x}$: updated state estimate\n",
+    "\n",
+    "\n",
+    "smoothed state = retrodicted state\n",
+    "\n",
+    "x0 = initial estimate\n",
+    "\n",
+    "P0 : initial covariance\n"
    ]
   }
  ],
diff --git a/code/ukf_internal.py b/code/ukf_internal.py
index eb89749..8e90736 100644
--- a/code/ukf_internal.py
+++ b/code/ukf_internal.py
@@ -277,6 +277,8 @@ def print_sigmas(n=1, mean=5, cov=3, alpha=.1, beta=2., kappa=2):
     print('sum cov', sum(Wc))
 
 
+
+
 def plot_rts_output(xs, Ms, t):
     plt.figure()
     plt.plot(t, xs[:, 0]/1000., label='KF', lw=2)