Latex cleanup.

This commit is contained in:
Roger Labbe 2015-12-20 12:23:09 -08:00
parent 5ffd42a25d
commit 3445153b7c
8 changed files with 534 additions and 563 deletions

View File

@ -435,21 +435,21 @@
"\n",
"Now suppose that each value has a different probability of happening. Say 1 has an 80% chance of occurring, 3 has an 15% chance, and 5 has only a 5% chance. In this case we compute the expected value by multiplying each value of $x$ by the percent chance of it occurring, and summing the result. So for this case we could compute\n",
"\n",
"$$E[X] = (1)(0.8) + (3)(0.15) + (5)(0.05) = 1.5$$\n",
"$$\\mathbb E[X] = (1)(0.8) + (3)(0.15) + (5)(0.05) = 1.5$$\n",
"\n",
"Here I have introduced the notation $E[X]$ for the expected value of $x$. Some texts use $E(x)$. The value 1.5 for $x$ makes intuitive sense because x is far more like to be 1 than 3 or 5, and 3 is more likely than 5 as well.\n",
"Here I have introduced the notation $\\mathbb E[X]$ for the expected value of $x$. Some texts use $E(x)$. The value 1.5 for $x$ makes intuitive sense because x is far more like to be 1 than 3 or 5, and 3 is more likely than 5 as well.\n",
"\n",
"We can formalize this by letting $x_i$ be the $i^{th}$ value of $X$, and $p_i$ be the probability of its occurrence. This gives us\n",
"\n",
"$$E[X] = \\sum_{i=1}^n p_ix_i$$\n",
"$$\\mathbb E[X] = \\sum_{i=1}^n p_ix_i$$\n",
"\n",
"A trivial bit of algebra shows that if the probabilities are all equal, the expected value is the same as the mean:\n",
"\n",
"$$E[X] = \\sum_{i=1}^n p_ix_i = \\sum_{i=1}^n \\frac{1}{n}x_i = \\mu_x$$\n",
"$$\\mathbb E[X] = \\sum_{i=1}^n p_ix_i = \\sum_{i=1}^n \\frac{1}{n}x_i = \\mu_x$$\n",
"\n",
"If $x$ is continuous we substitute the sum for an integral, like so\n",
"\n",
"$$E[X] = \\int_{-\\infty}^\\infty x\\, f(x)$$\n",
"$$\\mathbb E[X] = \\int_{-\\infty}^\\infty x\\, f(x)$$\n",
"\n",
"where $f(x)$ is the probability distribution function of $x$. We won't be using this equation yet, but we will be using it in the next chapter."
]
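As a quick check of the discrete case, here is a minimal NumPy sketch using the values and probabilities from the example above:

```python
import numpy as np

x = np.array([1., 3., 5.])       # possible values of X
p = np.array([0.8, 0.15, 0.05])  # probability of each value
print(np.sum(p * x))             # E[X] = sum of p_i * x_i = 1.5
```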
@ -1718,7 +1718,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.0"
"version": "3.5.1"
}
},
"nbformat": 4,

View File

@ -374,16 +374,16 @@
"\n",
"The equation for the covariance between $X$ and $Y$ is\n",
"\n",
"$$ COV(X, Y) = \\sigma_{xy} = E\\big[(X-\\mu_x)(Y-\\mu_y)\\big]$$\n",
"$$ COV(X, Y) = \\sigma_{xy} = \\mathbb E\\big[(X-\\mu_x)(Y-\\mu_y)\\big]$$\n",
"\n",
"Where $E[X]$ is the *expected value* of X defined as\n",
"Where $\\mathbb E[X]$ is the *expected value* of X defined as\n",
"\n",
"$$E[X] = \\begin{cases} \\sum_{i=1}^n p_ix_i & \\mbox{discrete}\\\\ \\int_{-\\infty}^\\infty x\\, f(x) & \\mbox{continuous}\\end{cases}$$\n",
"$$\\mathbb E[X] = \\begin{cases} \\sum_{i=1}^n p_ix_i & \\mbox{discrete}\\\\ \\int_{-\\infty}^\\infty x\\, f(x) & \\mbox{continuous}\\end{cases}$$\n",
"\n",
"\n",
"Compare the covariance equation to the equation for the variance:\n",
"\n",
"$$VAR(X) = \\sigma_x^2 = E[(X - \\mu)^2]$$\n",
"$$VAR(X) = \\sigma_x^2 = \\mathbb E[(X - \\mu)^2]$$\n",
"\n",
"As you can see they are very similar."
]
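As a sanity check, here is a minimal NumPy sketch of the discrete covariance computation; the data values are made up for illustration:

```python
import numpy as np

X = np.array([1., 2., 3., 4.])
Y = np.array([2.1, 3.9, 6.2, 7.8])               # made-up data
cov = np.mean((X - X.mean()) * (Y - Y.mean()))   # E[(X-mu_x)(Y-mu_y)]
print(cov, np.cov(X, Y, bias=True)[0, 1])        # both print the same value
```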
@ -1979,7 +1979,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.0"
"version": "3.5.1"
}
},
"nbformat": 4,

View File

@ -467,15 +467,15 @@
"\n",
"$$\\mu = \\begin{bmatrix}10.0\\\\4.5\\end{bmatrix}$$\n",
"\n",
"The Kalman filter is implemented using linear algebra. We use an $n\\times 1$ matrix (called a *vector*) to store $n$ state variables. For the dog tracking problem, we use $x$ to denote position, and the first derivative of $x$, $\\dot{x}$, for velocity. Kalman filter equations use $\\mathbf{x}$ for the state, so we define $\\mathbf{x}$ as:\n",
"The Kalman filter is implemented using linear algebra. We use an $n\\times 1$ matrix (called a *vector*) to store $n$ state variables. For the dog tracking problem, we use $x$ to denote position, and the first derivative of $x$, $\\dot x$, for velocity. Kalman filter equations use $\\mathbf x$ for the state, so we define $\\mathbf x$ as:\n",
"\n",
"$$\\mathbf{x} =\\begin{bmatrix}x \\\\ \\dot{x}\\end{bmatrix}$$\n",
"$$\\mathbf x =\\begin{bmatrix}x \\\\ \\dot x\\end{bmatrix}$$\n",
"\n",
"We use $\\mathbf{x}$ instead of $\\mu$, but recognize this is the mean of the multivariate Gaussian.\n",
"We use $\\mathbf x$ instead of $\\mu$, but recognize this is the mean of the multivariate Gaussian.\n",
"\n",
"Another way to write this is $\\mathbf{x} =\\begin{bmatrix}x & \\dot{x}\\end{bmatrix}^\\mathsf{T}$ because the transpose of a row vector is a column vector. This notation is easier to use in text because it takes less vertical space.\n",
"Another way to write this is $\\mathbf x =\\begin{bmatrix}x & \\dot x\\end{bmatrix}^\\mathsf T$ because the transpose of a row vector is a column vector. This notation is easier to use in text because it takes less vertical space.\n",
"\n",
"$\\mathbf{x}$ and the position $x$ coincidentally have the same name. If we were tracking the dog in the y-axis we would write $\\mathbf{x} =\\begin{bmatrix}y & \\dot{y}\\end{bmatrix}^\\mathsf{T}$, not $\\mathbf{y} =\\begin{bmatrix}y & \\dot{y}\\end{bmatrix}^\\mathsf{T}$. $\\mathbf{x}$ is the standard name for the state variable used in the Kalman filter literature and we will not vary it to give it a more meaningful name. This consistency in naming allows us to communicate with our peers.\n",
"$\\mathbf x$ and the position $x$ coincidentally have the same name. If we were tracking the dog in the y-axis we would write $\\mathbf x =\\begin{bmatrix}y & \\dot y\\end{bmatrix}^\\mathsf T$, not $\\mathbf y =\\begin{bmatrix}y & \\dot y\\end{bmatrix}^\\mathsf T$. $\\mathbf x$ is the standard name for the state variable used in the Kalman filter literature and we will not vary it to give it a more meaningful name. This consistency in naming allows us to communicate with our peers.\n",
"\n",
"Let's code this. FilterPy's `KalmanFilter` class contains a variable `x` for the state variable. Initialization of `x` is as simple as"
]
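A minimal sketch of what that initialization looks like (the constructor arguments `dim_x` and `dim_z` give the sizes of the state and measurement vectors):

```python
import numpy as np
from filterpy.kalman import KalmanFilter

dog_filter = KalmanFilter(dim_x=2, dim_z=1)  # 2 state variables, 1 measurement
dog_filter.x = np.array([[10.0],             # position
                         [4.5]])             # velocity
```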
@ -555,7 +555,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The other multivariate Gaussian variable is the covariance matrix $\\Sigma$. Kalman filter equations typically use the symbol $\\mathbf{P}$. In the univariate Kalman filter we specified an initial value for $\\sigma^2$, and then the filter took care of updating its value as measurements were added to the filter. The same thing happens in the multidimensional Kalman filter. We specify an initial value for $\\mathbf{P}$ and the filter updates it during each epoch.\n",
"The other multivariate Gaussian variable is the covariance matrix $\\Sigma$. Kalman filter equations typically use the symbol $\\mathbf P$. In the univariate Kalman filter we specified an initial value for $\\sigma^2$, and then the filter took care of updating its value as measurements were added to the filter. The same thing happens in the multidimensional Kalman filter. We specify an initial value for $\\mathbf P$ and the filter updates it during each epoch.\n",
"\n",
"We need to set the variances to reasonable values. For example, we may choose $\\sigma_\\mathtt{pos}^2=500 m^2$ if we are quite uncertain about the initial position. Top speed for a dog is around 21 m/s, so in the absence of any other information about the velocity we can set $3\\sigma_\\mathtt{vel}=21$, or $\\sigma_\\mathtt{vel}^2=49$. \n",
"\n",
@ -564,7 +564,7 @@
"Recall that the diagonals of the covariance matrix contains the variance of each variable, and the off-diagonal elements contains the covariances. Thus we have:\n",
"\n",
"$$\n",
"\\mathbf{P} = \\begin{bmatrix}500 & 0 \\\\ 0&49\\end{bmatrix}\n",
"\\mathbf P = \\begin{bmatrix}500 & 0 \\\\ 0&49\\end{bmatrix}\n",
"$$\n",
"\n",
"I can code this using the function `numpy.diag` which creates a diagonal matrix from the values for the diagonal. Recall from linear algebra that a diagonal matrix is one with zeros in the off-diagonal elements."
@ -657,9 +657,9 @@
"\n",
"We will do the same thing in this chapter, using multivariate Gaussians instead of univariate Gaussians. You might imagine this sort of implementation:\n",
"\n",
"$$ \\mathbf{x} = \\begin{bmatrix}5.4\\\\4.2\\end{bmatrix}, \\, \\, \n",
"\\dot{\\mathbf{x}} = \\begin{bmatrix}1.1\\\\0.\\end{bmatrix} \\\\\n",
"\\mathbf{x} = \\mathbf{x} + \\dot{\\mathbf{x}}$$\n",
"$$ \\mathbf x = \\begin{bmatrix}5.4\\\\4.2\\end{bmatrix}, \\, \\, \n",
"\\dot{\\mathbf x} = \\begin{bmatrix}1.1\\\\0.\\end{bmatrix} \\\\\n",
"\\mathbf x = \\mathbf x + \\dot{\\mathbf x}$$\n",
"\n",
"But we need to generalize this. The Kalman filter equations work with any linear system, not just Newtonian ones. Maybe the system you are filtering is the plumbing system in a chemical plant, and the amount of flow in a given pipe is determined by a linear combination of the settings of different valves. \n",
"\n",
@ -676,48 +676,48 @@
"\n",
"$$\\begin{bmatrix}2& 3 \\\\ 3&-1\\end{bmatrix} \\begin{bmatrix}x\\\\y\\end{bmatrix} = \\begin{bmatrix}8\\\\1\\end{bmatrix}$$\n",
"\n",
"If you perform the [matrix multiplication](https://en.wikipedia.org/wiki/Matrix_multiplication#General_definition_of_the_matrix_product) in this equation the result will be the two equations above. In linear algebra we would write this as $\\mathbf{Ax}=\\mathbf{b}$, where\n",
"If you perform the [matrix multiplication](https://en.wikipedia.org/wiki/Matrix_multiplication#General_definition_of_the_matrix_product) in this equation the result will be the two equations above. In linear algebra we would write this as $\\mathbf{Ax}=\\mathbf B$, where\n",
"\n",
"$$\\mathbf{A} = \\begin{bmatrix}2& 3 \\\\ 3&-1\\end{bmatrix},\\, \\mathbf{x} = \\begin{bmatrix}x\\\\y\\end{bmatrix}, \\mathbf{b}=\\begin{bmatrix}8\\\\1\\end{bmatrix}$$\n",
"$$\\mathbf{A} = \\begin{bmatrix}2& 3 \\\\ 3&-1\\end{bmatrix},\\, \\mathbf x = \\begin{bmatrix}x\\\\y\\end{bmatrix}, \\mathbf B=\\begin{bmatrix}8\\\\1\\end{bmatrix}$$\n",
"\n",
"We call the set of equations that describe how the systems behaves the *process model*. We use the process model to perform the *innovation*, because the equations tell us what the next state will be given the current state. Kalman filters implement this using the linear equation, where $\\mathbf{\\bar{x}}$ is the *prior*, or predicted state:\n",
"We call the set of equations that describe how the systems behaves the *process model*. We use the process model to perform the *innovation*, because the equations tell us what the next state will be given the current state. Kalman filters implement this using the linear equation, where $\\mathbf{\\overline x}$ is the *prior*, or predicted state:\n",
"\n",
"$$\\mathbf{\\bar{x}} = \\mathbf{Fx}$$\n",
"$$\\mathbf{\\overline x} = \\mathbf{Fx}$$\n",
"\n",
"Our job as Kalman filters designers is to specify $\\mathbf{F}$ such that $\\bar{\\mathbf{x}} = \\mathbf{Fx}$ performs the innovation (prediction) for our system. To do this we need one equation for each state variable. In our problem $\\mathbf{x} = \\begin{bmatrix}x & \\dot{x}\\end{bmatrix}^\\mathtt{T}$, so we need one equation to compute $\\bar{x}$ and a another to compute $\\bar{\\dot{x}}$ . We already know the equation for the position innovation:\n",
"Our job as Kalman filters designers is to specify $\\mathbf F$ such that $\\overline{\\mathbf x} = \\mathbf{Fx}$ performs the innovation (prediction) for our system. To do this we need one equation for each state variable. In our problem $\\mathbf x = \\begin{bmatrix}x & \\dot x\\end{bmatrix}^\\mathtt{T}$, so we need one equation to compute $\\overline x$ and a another to compute $\\overline{\\dot x}$ . We already know the equation for the position innovation:\n",
"\n",
"$$\\bar{x} = \\dot{x} \\Delta t + x$$\n",
"$$\\overline x = \\dot x \\Delta t + x$$\n",
"\n",
"What is our equation for velocity? We have no predictive model for how our dog's velocity will change over time. In this case we assume that it remains constant between innovations. Of course this is not exactly true, but so long as the velocity doesn't change too much over each innovation you will see that the filter performs very well. So we say\n",
"\n",
"$$\\bar{\\dot{x}} = \\dot{x}$$\n",
"$$\\overline{\\dot x} = \\dot x$$\n",
"\n",
"This gives us the process model for our system \n",
"\n",
"$$\\begin{cases}\n",
"\\begin{aligned}\n",
"\\bar{x} &= \\dot{x} \\Delta t + x \\\\\n",
"\\bar{\\dot{x}} &= \\dot{x}\n",
"\\overline x &= \\dot x \\Delta t + x \\\\\n",
"\\overline{\\dot x} &= \\dot x\n",
"\\end{aligned}\n",
"\\end{cases}$$\n",
"\n",
"We need to express this set of equations in the form $\\bar{\\mathbf{x}} = \\mathbf{Fx}$. Rearranging terms makes it easier to see what to do.\n",
"We need to express this set of equations in the form $\\overline{\\mathbf x} = \\mathbf{Fx}$. Rearranging terms makes it easier to see what to do.\n",
"\n",
"$$\\begin{cases}\n",
"\\begin{aligned}\n",
"\\bar{x} &= 1x + &\\Delta t\\, \\dot{x} \\\\\n",
"\\bar{\\dot{x}} &=0x + &1\\, \\dot{x}\n",
"\\overline x &= 1x + &\\Delta t\\, \\dot x \\\\\n",
"\\overline{\\dot x} &=0x + &1\\, \\dot x\n",
"\\end{aligned}\n",
"\\end{cases}$$\n",
"\n",
"We can rewrite this in matrix form as\n",
"\n",
"$$\\begin{aligned}\n",
"{\\begin{bmatrix}x\\\\\\dot{x}\\end{bmatrix}}^- &= \\begin{bmatrix}1&\\Delta t \\\\ 0&1\\end{bmatrix} \\begin{bmatrix}x \\\\ \\dot{x}\\end{bmatrix}\\\\\n",
"\\mathbf{\\bar{x}} &= \\mathbf{Fx}\n",
"{\\overline{\\begin{bmatrix}x\\\\\\dot x\\end{bmatrix}}} &= \\begin{bmatrix}1&\\Delta t \\\\ 0&1\\end{bmatrix} \\begin{bmatrix}x \\\\ \\dot x\\end{bmatrix}\\\\\n",
"\\mathbf{\\overline x} &= \\mathbf{Fx}\n",
"\\end{aligned}$$\n",
"\n",
"$\\mathbf{F}$ is often called the *state transition function*. In the `KalmanFilter` class we implement the state transition function with"
"$\\mathbf F$ is often called the *state transition function*. In the `KalmanFilter` class we implement the state transition function with"
]
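Presumably along these lines, continuing the sketch above with the time step `dt`:

```python
import numpy as np

dt = 0.1
dog_filter.F = np.array([[1., dt],   # x_bar    = x + dt * x_dot
                         [0., 1.]])  # xdot_bar = x_dot
```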
},
{
@ -747,7 +747,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's test this! `KalmanFilter` has a `predict` method that performs the prediction by computing $\\mathbf{\\bar{x}} = \\mathbf{Fx}$. Let's call it and see what happens. We've set the position to 10.0 and the velocity to 4.5 meter/sec. We've defined `dt = 0.1`, which means the time step is 0.1 seconds, so we expect the new position to be 10.45 meters after the innovation. The velocity should be unchanged."
"Let's test this! `KalmanFilter` has a `predict` method that performs the prediction by computing $\\mathbf{\\overline x} = \\mathbf{Fx}$. Let's call it and see what happens. We've set the position to 10.0 and the velocity to 4.5 meter/sec. We've defined `dt = 0.1`, which means the time step is 0.1 seconds, so we expect the new position to be 10.45 meters after the innovation. The velocity should be unchanged."
]
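A self-contained version of that test might look like this (a sketch):

```python
import numpy as np
from filterpy.kalman import KalmanFilter

dt = 0.1
dog_filter = KalmanFilter(dim_x=2, dim_z=1)
dog_filter.x = np.array([[10.0], [4.5]])        # position, velocity
dog_filter.F = np.array([[1., dt], [0., 1.]])
dog_filter.predict()
print(dog_filter.x)                             # expect [[10.45], [4.5]]
```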
},
{
@ -780,7 +780,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"This worked. Note that the code does not distinguish between between the prior and posterior in the variable names, so after calling predict the prior $\\bar{\\mathbf{x}}$ is stored in `KalmanFilter.x`. If we call `predict()` several times in a row the value will be updated each time. "
"This worked. Note that the code does not distinguish between between the prior and posterior in the variable names, so after calling predict the prior $\\overline{\\mathbf x}$ is stored in `KalmanFilter.x`. If we call `predict()` several times in a row the value will be updated each time. "
]
},
{
@ -817,7 +817,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"`KalmanFilter.predict()` computes both the mean and covariance of the innovation. This is the value of $\\mathbf{P}$ after three innovations (predictions), which we denote $\\mathbf{\\bar{P}}$ in the Kalman filter equations."
"`KalmanFilter.predict()` computes both the mean and covariance of the innovation. This is the value of $\\mathbf P$ after three innovations (predictions), which we denote $\\mathbf{\\overline P}$ in the Kalman filter equations."
]
},
{
@ -888,7 +888,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"You can see that the center of the ellipse shifted by a small amount (from 10 to 10.90) because the position changed. The ellipse also elongated, showing the correlation between position and velocity. How does the filter compute new values for $\\mathbf{\\bar{P}}$, and what is it based on? It's a little to early to discuss this, but recall that in every filter so far the predict step entailed a loss of information. The same is true here. I will give you the details once we have covered a bit more ground."
"You can see that the center of the ellipse shifted by a small amount (from 10 to 10.90) because the position changed. The ellipse also elongated, showing the correlation between position and velocity. How does the filter compute new values for $\\mathbf{\\overline P}$, and what is it based on? It's a little to early to discuss this, but recall that in every filter so far the predict step entailed a loss of information. The same is true here. I will give you the details once we have covered a bit more ground."
]
},
{
@ -897,15 +897,15 @@
"source": [
"#### Process Noise\n",
"\n",
"A quick review on *process noise*. A car is driving along the road with the cruise control on; it should travel at a constant speed. We model this with $x_k=\\dot{x}_k\\Delta t + x_{k-1}$. However, it is affected by a number of unknown factors. The cruise control cannot perfectly maintain a constant velocity. Winds affect the car, as do hills and potholes. Passengers roll down windows, changing the drag profile of the car. \n",
"A quick review on *process noise*. A car is driving along the road with the cruise control on; it should travel at a constant speed. We model this with $x_k=\\dot x_k\\Delta t + x_{k-1}$. However, it is affected by a number of unknown factors. The cruise control cannot perfectly maintain a constant velocity. Winds affect the car, as do hills and potholes. Passengers roll down windows, changing the drag profile of the car. \n",
"\n",
"We can model this system with the differential equation\n",
"\n",
"$$\\dot{\\mathbf{x}} = f(\\mathbf{x}) + w$$\n",
"$$\\dot{\\mathbf x} = f(\\mathbf x) + w$$\n",
"\n",
"where $f(\\mathbf{x})$ models the state transition and $w$ is white process noise.\n",
"where $f(\\mathbf x)$ models the state transition and $w$ is white process noise.\n",
"\n",
"We will learn how to go from a differential equation to the Kalman filter matrices in the **Kalman Filter Math** chapter. For now you just need to know that the to account for the noise in the system the filter adds a process noise covariance matrix $\\mathbf{Q}$ to the covariance $\\mathbf{P}$. We do not add anything to $\\mathbf{x}$ because the noise is *white* - which means that the mean of the noise will be 0. If the mean is 0, $\\mathbf{x}$ will not change."
"We will learn how to go from a differential equation to the Kalman filter matrices in the **Kalman Filter Math** chapter. For now you just need to know that the to account for the noise in the system the filter adds a process noise covariance matrix $\\mathbf Q$ to the covariance $\\mathbf P$. We do not add anything to $\\mathbf x$ because the noise is *white* - which means that the mean of the noise will be 0. If the mean is 0, $\\mathbf x$ will not change."
]
},
{
@ -914,14 +914,14 @@
"source": [
"The univariate Kalman filter used `variance = variance + process_noise` to compute the variance for the variance of the prediction step. The multivariate Kalman filter does the same, essentially `P = P + Q`. I say 'essentially' because there are other terms unrelated to noise in the covariance equation that we will see later.\n",
"\n",
"Deriving the process noise matrix can be quite demanding, and we will put it off until the Kalman math chapter. For now know that $\\mathbf{Q}$ equals the expected value of the white noise $w$, computed as $\\mathbf{Q} = E[\\mathbf{ww}^\\mathsf{T}]$. In this chapter we will focus on building an intuitive understanding on how modifying this matrix alters the behavior of the filter."
"Deriving the process noise matrix can be quite demanding, and we will put it off until the Kalman math chapter. For now know that $\\mathbf Q$ equals the expected value of the white noise $w$, computed as $\\mathbf Q = \\mathbb E[\\mathbf{ww}^\\mathsf T]$. In this chapter we will focus on building an intuitive understanding on how modifying this matrix alters the behavior of the filter."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"FilterPy provides functions which compute $\\mathbf{Q}$ for the kinematic problems of this chapter. `Q_discrete_white_noise` takes 3 parameters. `dim`, which specifies the dimension of the matrix, `dt`, which is the time step in seconds, and `var`, the variance in the noise. Briefly, it discretizes the noise over the given time period under assumptions that we will discuss later. This code computes $\\mathbf{Q}$ for white noise with a variance of 2.35 and a time step of 0.1 seconds:"
"FilterPy provides functions which compute $\\mathbf Q$ for the kinematic problems of this chapter. `Q_discrete_white_noise` takes 3 parameters. `dim`, which specifies the dimension of the matrix, `dt`, which is the time step in seconds, and `var`, the variance in the noise. Briefly, it discretizes the noise over the given time period under assumptions that we will discuss later. This code computes $\\mathbf Q$ for white noise with a variance of 2.35 and a time step of 0.1 seconds:"
]
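For example:

```python
from filterpy.common import Q_discrete_white_noise

Q = Q_discrete_white_noise(dim=2, dt=0.1, var=2.35)
print(Q)
```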
},
{
@ -962,16 +962,16 @@
"\n",
"$$\\mathbf{Bu}$$\n",
"\n",
"Here $\\mathbf{u}$ is the *control input*, and $\\mathbf{B}$ is the *control input model* or *control function*. For example, $\\mathbf{u}$ might be a voltage controlling how fast the wheel's motor turns, and multiplying by $\\mathbf{B}$ yields $[\\begin{smallmatrix}x\\\\\\dot{x}\\end{smallmatrix}]$.\n",
"Here $\\mathbf u$ is the *control input*, and $\\mathbf B$ is the *control input model* or *control function*. For example, $\\mathbf u$ might be a voltage controlling how fast the wheel's motor turns, and multiplying by $\\mathbf B$ yields $[\\begin{smallmatrix}x\\\\\\dot x\\end{smallmatrix}]$.\n",
"\n",
"\n",
"Therefore the complete Kalman filter equation for the prior mean is\n",
"\n",
"$$\\mathbf{\\bar{x}} = \\mathbf{Fx} + \\mathbf{Bu}$$\n",
"$$\\mathbf{\\overline x} = \\mathbf{Fx} + \\mathbf{Bu}$$\n",
"\n",
"and this is the equation that is computed when you call `KalmanFilter.predict()`.\n",
"\n",
"Your dog may be trained to respond to voice commands. All available evidence suggests that my dog has no control inputs whatsoever, so I set $\\mathbf{B}$ to zero. In Python we write"
"Your dog may be trained to respond to voice commands. All available evidence suggests that my dog has no control inputs whatsoever, so I set $\\mathbf B$ to zero. In Python we write"
]
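Continuing the sketch, presumably no more than:

```python
dog_filter.B = 0.  # my dog ignores all control inputs
```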
},
{
@ -994,11 +994,11 @@
"\n",
"Your job as a designer is to specify the matrices for\n",
"\n",
"* $\\mathbf{x}$: the state variables and initial value\n",
"* $\\mathbf{P}$: the covariance matrix initial value\n",
"* $\\mathbf{F}$: the state transition function\n",
"* $\\mathbf x$: the state variables and initial value\n",
"* $\\mathbf P$: the covariance matrix initial value\n",
"* $\\mathbf F$: the state transition function\n",
"* $\\mathbf{B,u}$: Optionally, the control input and function\n",
"* $\\mathbf{Q}$: the process noise matrix"
"* $\\mathbf Q$: the process noise matrix"
]
},
{
@ -1061,38 +1061,38 @@
"\n",
"Why are we working in measurement space? Why not work in state space by converting the voltage into a temperature, allowing the residual to be a difference in temperature?\n",
"\n",
"We cannot do that because most measurements are not *invertible*. The state for the tracking problem contains the hidden variable $\\dot{x}$. There is no way to convert a measurement of position into a state containing velocity. On the other hand, it is trivial to convert a state containing position and measurement into a equivalent \"measurement\" containing only position. We have to work in measurement space to make the computation of the residual possible.\n",
"We cannot do that because most measurements are not *invertible*. The state for the tracking problem contains the hidden variable $\\dot x$. There is no way to convert a measurement of position into a state containing velocity. On the other hand, it is trivial to convert a state containing position and measurement into a equivalent \"measurement\" containing only position. We have to work in measurement space to make the computation of the residual possible.\n",
"\n",
"Both the measurement $\\mathbf{z}$ and state $\\mathbf{x}$ are vectors so we need to use a matrix to perform the conversion. The Kalman filter equation that performs this step is:\n",
"Both the measurement $\\mathbf z$ and state $\\mathbf x$ are vectors so we need to use a matrix to perform the conversion. The Kalman filter equation that performs this step is:\n",
"\n",
"$$\\mathbf{y} = \\mathbf{z} - \\mathbf{H \\bar{x}}$$\n",
"$$\\mathbf y = \\mathbf z - \\mathbf{H \\overline x}$$\n",
"\n",
"where $\\mathbf{y}$ is the residual, $\\mathbf{\\bar{x}}$ is the prior, $\\mathbf{z}$ is the measurement, and $\\mathbf{H}$ is the measurement function. So we take the prior, convert it to a measurement by multiplying it with $\\mathbf{H}$, and subtract that from the measurement. This gives us the difference between our prediction and measurement in measurement space!\n",
"where $\\mathbf y$ is the residual, $\\mathbf{\\overline x}$ is the prior, $\\mathbf z$ is the measurement, and $\\mathbf H$ is the measurement function. So we take the prior, convert it to a measurement by multiplying it with $\\mathbf H$, and subtract that from the measurement. This gives us the difference between our prediction and measurement in measurement space!\n",
"\n",
"We need to design $\\mathbf{H}$ so that $\\mathbf{H\\bar{x}}$ yields a measurement. For this problem we have a sensor that measures position, so $\\mathbf{z}$ will be a one variable vector:\n",
"We need to design $\\mathbf H$ so that $\\mathbf{H\\overline x}$ yields a measurement. For this problem we have a sensor that measures position, so $\\mathbf z$ will be a one variable vector:\n",
"\n",
"$$\\mathbf{z} = \\begin{bmatrix}z\\end{bmatrix}$$\n",
"$$\\mathbf z = \\begin{bmatrix}z\\end{bmatrix}$$\n",
"\n",
"The residual equation will have the form\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"\\textbf{y} &= \\mathbf{z} - \\mathbf{H\\bar{x}} \\\\\n",
"\\begin{bmatrix}y \\end{bmatrix} &= \\begin{bmatrix}z\\end{bmatrix} - \\begin{bmatrix}?&?\\end{bmatrix} \\begin{bmatrix}x \\\\ \\dot{x}\\end{bmatrix}\n",
"\\textbf{y} &= \\mathbf z - \\mathbf{H\\overline x} \\\\\n",
"\\begin{bmatrix}y \\end{bmatrix} &= \\begin{bmatrix}z\\end{bmatrix} - \\begin{bmatrix}?&?\\end{bmatrix} \\begin{bmatrix}x \\\\ \\dot x\\end{bmatrix}\n",
"\\end{aligned}\n",
"$$\n",
"\n",
"$\\mathbf{H}$ has to be a 1x2 matrix for $\\mathbf{Hx}$ to be 1x1. Recall that multiplying matrices $m\\times n$ by $n\\times p$ yields a $m\\times p$ matrix.\n",
"$\\mathbf H$ has to be a 1x2 matrix for $\\mathbf{Hx}$ to be 1x1. Recall that multiplying matrices $m\\times n$ by $n\\times p$ yields a $m\\times p$ matrix.\n",
"\n",
"We will want to multiply the position $x$ by 1 to get the corresponding measurement of the position. We do not need to use velocity to find the corresponding measurement so we multiply $\\dot{x}$ by 0.\n",
"We will want to multiply the position $x$ by 1 to get the corresponding measurement of the position. We do not need to use velocity to find the corresponding measurement so we multiply $\\dot x$ by 0.\n",
"\n",
"$$\n",
"\\textbf{y} = \\mathbf{z} - \\begin{bmatrix}1&0\\end{bmatrix} \\begin{bmatrix}x \\\\ \\dot{x}\\end{bmatrix}\n",
"\\textbf{y} = \\mathbf z - \\begin{bmatrix}1&0\\end{bmatrix} \\begin{bmatrix}x \\\\ \\dot x\\end{bmatrix}\n",
"$$\n",
"\n",
"And so, for our Kalman filter we set\n",
"\n",
"$$\\mathbf{H}=\\begin{bmatrix}1&0\\end{bmatrix}$$"
"$$\\mathbf H=\\begin{bmatrix}1&0\\end{bmatrix}$$"
]
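Continuing the sketch from above:

```python
import numpy as np

dog_filter.H = np.array([[1., 0.]])  # measure position, ignore velocity
```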
},
{
@ -1121,7 +1121,7 @@
"\n",
"The *measurement noise matrix* models the noise in our sensors as a covariance matrix. In practice this can be difficult. A complicated system may have many sensors, the correlation between them might not be clear, and usually their noise is not a pure Gaussian. For example, a sensor might be biased to read high if the temperature is high, and so the noise is not distributed equally on both sides of the mean. We will learn to deal with these problems later.\n",
"\n",
"The Kalman filter equations uses a covariance matrix $\\mathbf{R}$ for the measurement noise. The matrix will have dimension $m{\\times}m$, where $m$ is the number of sensors. It is a covariance matrix to account for correlations between the sensors. We have only 1 sensor so R is:\n",
"The Kalman filter equations uses a covariance matrix $\\mathbf R$ for the measurement noise. The matrix will have dimension $m{\\times}m$, where $m$ is the number of sensors. It is a covariance matrix to account for correlations between the sensors. We have only 1 sensor so R is:\n",
"\n",
"$$R = \\begin{bmatrix}\\sigma^2_z\\end{bmatrix}$$\n",
"\n",
@ -1230,9 +1230,9 @@
"\n",
"**4**: We set $\\textbf{R} = \\begin{bmatrix}5\\end{bmatrix}$. \n",
"\n",
"**5** We use the `Q_discrete_white_noise()` method to set $\\mathbf{Q}$'s variance to 0.1.\n",
"**5** We use the `Q_discrete_white_noise()` method to set $\\mathbf Q$'s variance to 0.1.\n",
"\n",
"**6**: We set $\\mathbf{P} = \\begin{bmatrix} 500&0\\\\0&49\\end{bmatrix}$\n",
"**6**: We set $\\mathbf P = \\begin{bmatrix} 500&0\\\\0&49\\end{bmatrix}$\n",
"\n",
"I've put this in a function to allow you to specify different initial values for `R`, `P`, and `Q` and put it in a helper function. We will be creating and running many of these filters, and this saves us a lot of headaches."
]
@ -1400,9 +1400,9 @@
"\n",
"The first plot plots the output of the Kalman filter against the measurements and the actual position of our dog (the 'Track' line). After the initial settling in period the filter should track the dog's position very closely. The yellow shaded portion between the black dotted lines shows 1 standard deviations of the filter's variance, which I explain in the next paragraph.\n",
"\n",
"The next two plots show the variance of $x$ and of $\\dot{x}$. If you look at the code, you will see that I have plotted the diagonals of $\\mathbf{P}$ over time. Recall that the diagonal of a covariance matrix contains the variance of each state variable. So $\\mathbf{P}[0,0]$ is the variance of $x$, and $\\mathbf{P}[1,1]$ is the variance of $\\dot{x}$. You can see that we quickly converge to small variances for both. \n",
"The next two plots show the variance of $x$ and of $\\dot x$. If you look at the code, you will see that I have plotted the diagonals of $\\mathbf P$ over time. Recall that the diagonal of a covariance matrix contains the variance of each state variable. So $\\mathbf P[0,0]$ is the variance of $x$, and $\\mathbf P[1,1]$ is the variance of $\\dot x$. You can see that we quickly converge to small variances for both. \n",
"\n",
"The covariance matrix $\\mathbf{P}$ tells us the *theoretical* performance of the filter *assuming* everything we tell it is true. Recall from the Gaussian chapter that the standard deviation is the square root of the variance, and that approximately 68% of a Gaussian distribution occurs within one standard deviation. Therefore, if at least 68% of the filter output is within one standard deviation we can be sure that the filter is performing well. In the top chart I have displayed the one standard deviation as the yellow shaded area between the two dotted lines. To my eye it looks like perhaps the filter is slightly exceeding that bounds, so the filter probably needs some tuning."
"The covariance matrix $\\mathbf P$ tells us the *theoretical* performance of the filter *assuming* everything we tell it is true. Recall from the Gaussian chapter that the standard deviation is the square root of the variance, and that approximately 68% of a Gaussian distribution occurs within one standard deviation. Therefore, if at least 68% of the filter output is within one standard deviation we can be sure that the filter is performing well. In the top chart I have displayed the one standard deviation as the yellow shaded area between the two dotted lines. To my eye it looks like perhaps the filter is slightly exceeding that bounds, so the filter probably needs some tuning."
]
},
{
@ -1463,26 +1463,26 @@
"\n",
"### Prediction Equations\n",
"\n",
"The Kalman filter uses these equations to compute the *prior* - the predicted next state of the system. They compute the mean ($\\mathbf{x}$) and covariance ($\\mathbf{P}$) of the system.\n",
"The Kalman filter uses these equations to compute the *prior* - the predicted next state of the system. They compute the mean ($\\mathbf x$) and covariance ($\\mathbf P$) of the system.\n",
"\n",
"$$\\begin{aligned}\n",
"\\mathbf{\\bar{x}} &= \\mathbf{Fx} + \\mathbf{Bu}\\\\\n",
"\\mathbf{\\bar{P}} &= \\mathbf{FPF}^\\mathsf{T} + \\mathbf{Q}\n",
"\\mathbf{\\overline x} &= \\mathbf{Fx} + \\mathbf{Bu}\\\\\n",
"\\mathbf{\\overline P} &= \\mathbf{FPF}^\\mathsf T + \\mathbf Q\n",
"\\end{aligned}$$\n",
"\n",
"$\\underline{\\textbf{Mean}}$\n",
"\n",
"$\\mathbf{\\bar{x}} = \\mathbf{Fx} + \\mathbf{Bu}$\n",
"$\\mathbf{\\overline x} = \\mathbf{Fx} + \\mathbf{Bu}$\n",
"\n",
"We are starting out easy - you were already exposed to the first equation while designing the state transition function $\\mathbf{F}$ and control function $\\mathbf{B}$. \n",
"We are starting out easy - you were already exposed to the first equation while designing the state transition function $\\mathbf F$ and control function $\\mathbf B$. \n",
"\n",
"As a reminder, the linear equation $\\mathbf{Ax} = \\mathbf{b}$ represents a system of equations, where $\\mathbf{A}$ holds the coefficients of the state $\\mathbf{x}$. Performing the multiplication $\\mathbf{Ax}$ computes the right hand side values for that set of equations.\n",
"As a reminder, the linear equation $\\mathbf{Ax} = \\mathbf B$ represents a system of equations, where $\\mathbf{A}$ holds the coefficients of the state $\\mathbf x$. Performing the multiplication $\\mathbf{Ax}$ computes the right hand side values for that set of equations.\n",
"\n",
"If $\\mathbf{F}$ contains the state transition for a given time step, then the product $\\mathbf{Fx}$ computes the state after that transition. Easy! Likewise, $\\mathbf{B}$ is the control function, $\\mathbf{u}$ is the control input, so $\\mathbf{Bu}$ computes the contribution of the controls to the state after the transition. Thus, the prior of the state ($\\mathbf{\\bar{x}}$) is computed as the sum of $\\mathbf{Fx}$ and $\\mathbf{Bu}$.\n",
"If $\\mathbf F$ contains the state transition for a given time step, then the product $\\mathbf{Fx}$ computes the state after that transition. Easy! Likewise, $\\mathbf B$ is the control function, $\\mathbf u$ is the control input, so $\\mathbf{Bu}$ computes the contribution of the controls to the state after the transition. Thus, the prior of the state ($\\mathbf{\\overline x}$) is computed as the sum of $\\mathbf{Fx}$ and $\\mathbf{Bu}$.\n",
"\n",
"$\\underline{\\textbf{Covariance}}$\n",
"\n",
"$\\mathbf{\\bar{P}} = \\mathbf{FPF}^\\mathsf{T} + \\mathbf{Q}$\n",
"$\\mathbf{\\overline P} = \\mathbf{FPF}^\\mathsf T + \\mathbf Q$\n",
"\n",
"This equation is not as easy to understand so we will spend more time on it. \n",
"\n",
@ -1491,17 +1491,17 @@
"$$\\mu = \\mu + \\mu_{move}\\\\\n",
"\\sigma^2 = \\sigma^2 + \\sigma^2_{move}$$\n",
"\n",
"We added the variance of the movement to the variance of our estimate to reflect the loss of knowlege. We need to do the same thing here, except it isn't quite that easy with multivariate Gaussians. We can't simply write $\\mathbf{\\bar{P}} = \\mathbf{P} + \\mathbf{Q}$.\n",
"We added the variance of the movement to the variance of our estimate to reflect the loss of knowlege. We need to do the same thing here, except it isn't quite that easy with multivariate Gaussians. We can't simply write $\\mathbf{\\overline P} = \\mathbf P + \\mathbf Q$.\n",
"\n",
"In a multivariate Gaussians the state variables are *correlated*. What does this imply? Our knowledge of the velocity is imperfect, but we are adding it to the position with\n",
"\n",
"$$\\bar{x} = \\dot{x}\\Delta t + x$$\n",
"$$\\overline x = \\dot x\\Delta t + x$$\n",
"\n",
"Since we do not have perfect knowledge of the value of $\\dot{x}$ the sum $\\bar{x} = \\dot{x}\\Delta t + x$ gains uncertainty. Because the positions and velocities are correlated we cannot simply add the covariance matrices. The correct equation is\n",
"Since we do not have perfect knowledge of the value of $\\dot x$ the sum $\\overline x = \\dot x\\Delta t + x$ gains uncertainty. Because the positions and velocities are correlated we cannot simply add the covariance matrices. The correct equation is\n",
"\n",
"$$\\mathbf{\\bar{P}} = \\mathbf{FPF}^\\mathsf{T}$$\n",
"$$\\mathbf{\\overline P} = \\mathbf{FPF}^\\mathsf T$$\n",
"\n",
"Expressions in the form $\\mathbf{ABA}^\\mathsf{T}$ are common in linear algebra. You can think of it as *projecting* the middle term by the outer term. We will be using this many times in the rest of the book. \n",
"Expressions in the form $\\mathbf{ABA}^\\mathsf T$ are common in linear algebra. You can think of it as *projecting* the middle term by the outer term. We will be using this many times in the rest of the book. \n",
"\n",
"I admit this may be a 'magical' equation to you. "
]
@ -1510,14 +1510,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"When we initialize $\\mathbf{P}$ with\n",
"When we initialize $\\mathbf P$ with\n",
"\n",
"$$\\mathbf{P} = \\begin{bmatrix}\\sigma^2_x & 0 \\\\ 0 & \\sigma^2_v\\end{bmatrix}$$\n",
"$$\\mathbf P = \\begin{bmatrix}\\sigma^2_x & 0 \\\\ 0 & \\sigma^2_v\\end{bmatrix}$$\n",
"\n",
"\n",
"the value for $\\mathbf{FPF}^\\mathsf{T}$ is:\n",
"the value for $\\mathbf{FPF}^\\mathsf T$ is:\n",
"\n",
"$$\\mathbf{FPF}^\\mathsf{T} = \\begin{bmatrix}1&\\Delta t\\\\0&1\\end{bmatrix}\n",
"$$\\mathbf{FPF}^\\mathsf T = \\begin{bmatrix}1&\\Delta t\\\\0&1\\end{bmatrix}\n",
"\\begin{bmatrix}\\sigma^2_x & 0 \\\\ 0 & \\sigma^2_{v}\\end{bmatrix}\n",
"\\begin{bmatrix}1&0\\\\\\Delta t&1\\end{bmatrix} \\\\\n",
"= \\begin{bmatrix}\\sigma^2_x&\\sigma_v^2\\Delta t\\\\ 0 & \\sigma^2_{v}\\end{bmatrix}\n",
@ -1531,7 +1531,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The initial value for $\\mathbf{P}$ had no covariance between the position and velocity. Position is computed as $\\dot{x}\\Delta t + x$, so there is a correlation between the position and velocity. The multiplication $\\mathbf{FPF}^\\mathsf{T}$ computes a covariance of $\\sigma_v^2 \\Delta t$. The exact value is not important; you just need to recognize that $\\mathbf{FPF}^\\mathsf{T}$ uses the process model to automatically compute the covariance between the position and velocity!"
"The initial value for $\\mathbf P$ had no covariance between the position and velocity. Position is computed as $\\dot x\\Delta t + x$, so there is a correlation between the position and velocity. The multiplication $\\mathbf{FPF}^\\mathsf T$ computes a covariance of $\\sigma_v^2 \\Delta t$. The exact value is not important; you just need to recognize that $\\mathbf{FPF}^\\mathsf T$ uses the process model to automatically compute the covariance between the position and velocity!"
]
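A quick numeric check of this projection, with made-up numbers:

```python
import numpy as np

dt = 0.3
F = np.array([[1., dt], [0., 1.]])
P = np.diag([500., 49.])
print(F @ P @ F.T)   # [[504.41  14.7 ]
                     #  [ 14.7   49.  ]]
```

The off-diagonal terms are $\sigma_v^2\Delta t = 14.7$, exactly as the algebra predicts.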
},
{
@ -1541,21 +1541,21 @@
"If you have some experience with linear algebra and statistics, this may help. The covariance due to the prediction can be modeled as the expected value of the error in the prediction step, given by this equation. \n",
"\n",
"$$\\begin{aligned}\n",
"\\bar{\\mathbf{P}} &= E[(\\mathbf{Fx})(\\mathbf{Fx})^\\mathsf{T}]\\\\\n",
" &= E[\\mathbf{Fxx}^\\mathsf{T}\\mathbf{F}^\\mathsf{T}] \\\\\n",
" &= \\mathbf{F}\\, E[\\mathbf{xx}^\\mathsf{T}]\\, \\mathbf{F}^\\mathsf{T}\n",
"\\overline{\\mathbf P} &= \\mathbb E[(\\mathbf{Fx})(\\mathbf{Fx})^\\mathsf T]\\\\\n",
" &= \\mathbb E[\\mathbf{Fxx}^\\mathsf T\\mathbf F^\\mathsf T] \\\\\n",
" &= \\mathbf F\\, \\mathbb E[\\mathbf{xx}^\\mathsf T]\\, \\mathbf F^\\mathsf T\n",
"\\end{aligned}$$\n",
"\n",
"Of course, $E[\\mathbf{xx}^\\mathsf{T}]$ is just $\\mathbf{P}$, giving us\n",
"Of course, $\\mathbb E[\\mathbf{xx}^\\mathsf T]$ is just $\\mathbf P$, giving us\n",
"\n",
"$$\\bar{\\mathbf{P}} = \\mathbf{FPF}^\\mathsf{T}$$"
"$$\\overline{\\mathbf P} = \\mathbf{FPF}^\\mathsf T$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For now we will look at its effect. Here I use $\\mathbf{F}$ from our filter and project the state forward 6/10ths of a second. I do this five times so you can see how $\\mathbf{\\bar{P}}$ continues to change. "
"For now we will look at its effect. Here I use $\\mathbf F$ from our filter and project the state forward 6/10ths of a second. I do this five times so you can see how $\\mathbf{\\overline P}$ continues to change. "
]
},
{
@ -1596,14 +1596,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"You can see that with a velocity of 5 the position correctly moves 3 units in each 6/10ths of a second step. At each step the width of the ellipse is larger, indicating that we have lost information about the position due to adding $\\dot{x}\\Delta t$ to x at each step. The height has not changed - our system model says the velocity does not change, so the belief we have about the velocity cannot change. As time continues you can see that the ellipse becomes more and more tilted. Recall that a tilt indicates *correlation*. $\\mathbf{F}$ linearly correlates $x$ with $\\dot{x}$ with the expression $\\bar{x} = \\dot{x} \\Delta t + x$. The $\\mathbf{FPF}^\\mathsf{T}$ computation correctly incorporates this correlation into the covariance matrix.\n",
"You can see that with a velocity of 5 the position correctly moves 3 units in each 6/10ths of a second step. At each step the width of the ellipse is larger, indicating that we have lost information about the position due to adding $\\dot x\\Delta t$ to x at each step. The height has not changed - our system model says the velocity does not change, so the belief we have about the velocity cannot change. As time continues you can see that the ellipse becomes more and more tilted. Recall that a tilt indicates *correlation*. $\\mathbf F$ linearly correlates $x$ with $\\dot x$ with the expression $\\overline x = \\dot x \\Delta t + x$. The $\\mathbf{FPF}^\\mathsf T$ computation correctly incorporates this correlation into the covariance matrix.\n",
"\n",
"Here is an animation of this equation that allows you to change the design of $\\mathbf{F}$ to see how it affects shape of $\\mathbf{P}$. The `F00` slider affects the value of F[0, 0]. `covar` sets the intial covariance between the position and velocity($\\sigma_x\\sigma_{\\dot{x}}$). I recommend answering these questions at a minimum\n",
"Here is an animation of this equation that allows you to change the design of $\\mathbf F$ to see how it affects shape of $\\mathbf P$. The `F00` slider affects the value of F[0, 0]. `covar` sets the intial covariance between the position and velocity($\\sigma_x\\sigma_{\\dot x}$). I recommend answering these questions at a minimum\n",
"\n",
"* what if $x$ is not correlated to $\\dot{x}$? (set F01 to 0, the rest at defaults)\n",
"* what if $x = 2\\dot{x}\\Delta t + x_0$? (set F01 to 2, the rest at defaults)\n",
"* what if $x = \\dot{x}\\Delta t + 2x_0$? (set F00 to 2, the rest at defaults)\n",
"* what if $x = \\dot{x}\\Delta t$? (set F00 to 0, the rest at defaults)"
"* what if $x$ is not correlated to $\\dot x$? (set F01 to 0, the rest at defaults)\n",
"* what if $x = 2\\dot x\\Delta t + x_0$? (set F01 to 2, the rest at defaults)\n",
"* what if $x = \\dot x\\Delta t + 2x_0$? (set F00 to 2, the rest at defaults)\n",
"* what if $x = \\dot x\\Delta t$? (set F00 to 0, the rest at defaults)"
]
},
{
@ -1707,23 +1707,23 @@
"source": [
"<u>**System Uncertainty**</u>\n",
"\n",
"$\\textbf{S} = \\mathbf{H\\bar{P}H}^\\mathsf{T} + \\mathbf{R}$\n",
"$\\textbf{S} = \\mathbf{H\\overline PH}^\\mathsf T + \\mathbf R$\n",
"\n",
"To work in measurement space the Kalman filter has to project the covariance matrix into measurement space. The math for this is $\\mathbf{H\\bar{P}H}^\\mathsf{T}$, where $\\mathbf{\\bar{P}}$ is the *prior* covariance and $\\mathbf{H}$ is the measurement function.\n",
"To work in measurement space the Kalman filter has to project the covariance matrix into measurement space. The math for this is $\\mathbf{H\\overline PH}^\\mathsf T$, where $\\mathbf{\\overline P}$ is the *prior* covariance and $\\mathbf H$ is the measurement function.\n",
"\n",
"\n",
"You should recognize this $\\mathbf{ABA}^\\mathsf{T}$ form - the prediction step used $\\mathbf{FPF}^\\mathsf{T}$ to update $\\mathbf{P}$ with the state transition function. Here, we use the same form to update it with the measurement function. In a real sense the linear algebra is changing the coordinate system for us. \n",
"You should recognize this $\\mathbf{ABA}^\\mathsf T$ form - the prediction step used $\\mathbf{FPF}^\\mathsf T$ to update $\\mathbf P$ with the state transition function. Here, we use the same form to update it with the measurement function. In a real sense the linear algebra is changing the coordinate system for us. \n",
"\n",
"Once the covariance is in measurement space we need to account for the sensor noise. This is very easy - we just add matrices. The result is variously called the *system uncertainty* or *innovation covariance*.\n",
"\n",
"Compare the equations for the system uncertainty and the covariance\n",
"\n",
"$$\\begin{aligned}\n",
"\\mathbf{S} &= \\mathbf{H\\bar{P}H}^\\mathsf{T} + \\mathbf{R}\\\\\n",
"\\mathbf{\\bar{P}} &= \\mathbf{FPF}^\\mathsf{T} + \\mathbf{Q}\n",
"\\mathbf{S} &= \\mathbf{H\\overline PH}^\\mathsf T + \\mathbf R\\\\\n",
"\\mathbf{\\overline P} &= \\mathbf{FPF}^\\mathsf T + \\mathbf Q\n",
"\\end{aligned}$$\n",
"\n",
"In each equation $\\mathbf{P}$ is put into a different space with either the function $\\mathbf{H}$ or $\\mathbf{F}$. Once that is done we add the noise matrix associated with that space."
"In each equation $\\mathbf P$ is put into a different space with either the function $\\mathbf H$ or $\\mathbf F$. Once that is done we add the noise matrix associated with that space."
]
},
{
@ -1732,7 +1732,7 @@
"source": [
"<u>**Kalman Gain**</u>\n",
"\n",
"$\\mathbf{K} = \\mathbf{\\bar{P}H}^\\mathsf{T} \\mathbf{S}^{-1}$\n",
"$\\mathbf K = \\mathbf{\\overline PH}^\\mathsf T \\mathbf{S}^{-1}$\n",
"\n",
"Look back at the residual diagram. Once we have a prediction and a measurement we need to select an estimate somewhere between the two. If we have more certainty about the measurement the estimate will be closer to it. If instead we have more certainty about the prediction then the estimate will be closer to it. \n",
"\n",
@ -1747,10 +1747,10 @@
"\n",
"$K$ is the *Kalman gain*, and it is a real number between 0 and 1. Ensure you understand how it selects a mean somewhere between the prediction and measurement. The Kalman gain is a *percentage* or *ratio* - if K is .9 it takes 90% of the measurement and 10% of the prediction. \n",
"\n",
"For the multivariate Kalman filter $\\mathbf{K}$ is a vector, not a scalar. Here is the equation again: $\\mathbf{K} = \\mathbf{\\bar{P}H}^\\mathsf{T} \\mathbf{S}^{-1}$. Is this a *ratio*? We can think of the inverse of a matrix as linear algebra's way of finding the reciprocal. Division is not defined for matrices, but it is useful to think of it in this way. So we can read the equation for $\\textbf{K}$ as meaning\n",
"For the multivariate Kalman filter $\\mathbf K$ is a vector, not a scalar. Here is the equation again: $\\mathbf K = \\mathbf{\\overline PH}^\\mathsf T \\mathbf{S}^{-1}$. Is this a *ratio*? We can think of the inverse of a matrix as linear algebra's way of finding the reciprocal. Division is not defined for matrices, but it is useful to think of it in this way. So we can read the equation for $\\textbf{K}$ as meaning\n",
"\n",
"$$\\begin{aligned} \\mathbf{K} &\\approx \\frac{\\mathbf{\\bar{P}}\\mathbf{H}^\\mathsf{T}}{\\mathbf{S}} \\\\\n",
"\\mathbf{K} &\\approx \\frac{\\mathsf{uncertainty}_\\mathsf{prediction}}{\\mathsf{uncertainty}_\\mathsf{measurement}}\\mathbf{H}^\\mathsf{T}\n",
"$$\\begin{aligned} \\mathbf K &\\approx \\frac{\\mathbf{\\overline P}\\mathbf H^\\mathsf T}{\\mathbf{S}} \\\\\n",
"\\mathbf K &\\approx \\frac{\\mathsf{uncertainty}_\\mathsf{prediction}}{\\mathsf{uncertainty}_\\mathsf{measurement}}\\mathbf H^\\mathsf T\n",
"\\end{aligned}$$"
]
},
@ -1758,7 +1758,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The Kalman gain equation computes a ratio based on how much we trust the prediction vs the measurement. We did the same thing in every prior chapter. The equation is complicated because we are doing this in multiple dimensions via matrices, but the concept is simple. The $\\mathbf{H}^\\mathsf{T}$ term is less clear, I'll explain it soon."
"The Kalman gain equation computes a ratio based on how much we trust the prediction vs the measurement. We did the same thing in every prior chapter. The equation is complicated because we are doing this in multiple dimensions via matrices, but the concept is simple. The $\\mathbf H^\\mathsf T$ term is less clear, I'll explain it soon."
]
},
{
@ -1767,27 +1767,27 @@
"source": [
"$\\underline{\\texttt{Residual}}$\n",
"\n",
"$\\mathbf{y} = \\mathbf{z} - \\mathbf{Hx}$\n",
"$\\mathbf y = \\mathbf z - \\mathbf{Hx}$\n",
"\n",
"This is an easy one as we've covered this equation while designing the measurement function $\\mathbf{H}$. Recall that the measurement function converts a state into a measurement. So $\\mathbf{Hx}$ converts $\\mathbf{x}$ into an equivalent measurement. Once that is done, we can subtract it from the measurement $\\mathbf{z}$ to get the residual - the difference between the measurement and prediction.\n",
"This is an easy one as we've covered this equation while designing the measurement function $\\mathbf H$. Recall that the measurement function converts a state into a measurement. So $\\mathbf{Hx}$ converts $\\mathbf x$ into an equivalent measurement. Once that is done, we can subtract it from the measurement $\\mathbf z$ to get the residual - the difference between the measurement and prediction.\n",
"\n",
"$\\underline{\\texttt{State Update}}$\n",
"\n",
"$\\mathbf{x} = \\mathbf{\\bar{x}} + \\mathbf{Ky}$\n",
"$\\mathbf x = \\mathbf{\\overline x} + \\mathbf{Ky}$\n",
"\n",
"We select our new state to be along the residual, scaled by the Kalman gain. The scaling is performed by $\\mathbf{Ky}$, which both scales the residual and converts it back into state space with the $\\mathbf{H}^\\mathsf{T}$ term which is in $\\mathbf{K}$. This is added to the prior, yielding the equation: $\\mathbf{x} =\\mathbf{\\bar{x}} + \\mathbf{Ky}$. Let me write out $\\mathbf{K}$ so we can see the entire computation:\n",
"We select our new state to be along the residual, scaled by the Kalman gain. The scaling is performed by $\\mathbf{Ky}$, which both scales the residual and converts it back into state space with the $\\mathbf H^\\mathsf T$ term which is in $\\mathbf K$. This is added to the prior, yielding the equation: $\\mathbf x =\\mathbf{\\overline x} + \\mathbf{Ky}$. Let me write out $\\mathbf K$ so we can see the entire computation:\n",
"\n",
"$$\\begin{aligned}\n",
"\\mathbf{x} &= \\mathbf{\\bar{x}} + \\mathbf{Ky} \\\\\n",
"&= \\mathbf{\\bar{x}} + \\mathbf{\\bar{P}H}^\\mathsf{T} \\mathbf{S}^{-1}\\mathbf{y} \\\\\n",
"&\\approx \\mathbf{\\bar{x}} + \\frac{\\mathsf{uncertainty}_\\mathsf{prediction}}{\\mathsf{uncertainty}_\\mathsf{measurement}}\\mathbf{H}^\\mathsf{T}\\mathbf{y}\n",
"\\mathbf x &= \\mathbf{\\overline x} + \\mathbf{Ky} \\\\\n",
"&= \\mathbf{\\overline x} + \\mathbf{\\overline PH}^\\mathsf T \\mathbf{S}^{-1}\\mathbf y \\\\\n",
"&\\approx \\mathbf{\\overline x} + \\frac{\\mathsf{uncertainty}_\\mathsf{prediction}}{\\mathsf{uncertainty}_\\mathsf{measurement}}\\mathbf H^\\mathsf T\\mathbf y\n",
"\\end{aligned}$$\n",
"\n",
"$\\underline{\\texttt{Covariance Update}}$\n",
"\n",
"$\\mathbf{P} = (\\mathbf{I}-\\mathbf{KH})\\mathbf{\\bar{P}}$\n",
"$\\mathbf P = (\\mathbf{I}-\\mathbf{KH})\\mathbf{\\overline P}$\n",
"\n",
"$\\mathbf{I}$ is the identity matrix, and is the way we represent $1$ in multiple dimensions. $\\mathbf{H}$ is our measurement function, and is a constant. We can think of the equation as $\\mathbf{P} = (1-c\\mathbf{K})\\mathbf{P}$. $\\mathbf{K}$ is our ratio of how much prediction vs measurement we use. If $\\mathbf{K}$ is large then $(1-\\mathbf{cK})$ is small, and $\\mathbf{P}$ will be made smaller than it was. If $\\mathbf{K}$ is small, then $(1-\\mathbf{cK})$ is large, and $\\mathbf{P}$ will be relatively larger. This means that we adjust the size of our uncertainty by some factor of the Kalman gain.\n",
"$\\mathbf{I}$ is the identity matrix, and is the way we represent $1$ in multiple dimensions. $\\mathbf H$ is our measurement function, and is a constant. We can think of the equation as $\\mathbf P = (1-c\\mathbf K)\\mathbf P$. $\\mathbf K$ is our ratio of how much prediction vs measurement we use. If $\\mathbf K$ is large then $(1-\\mathbf{cK})$ is small, and $\\mathbf P$ will be made smaller than it was. If $\\mathbf K$ is small, then $(1-\\mathbf{cK})$ is large, and $\\mathbf P$ will be relatively larger. This means that we adjust the size of our uncertainty by some factor of the Kalman gain.\n",
"\n",
"This equation can be numerically unstable and I don't use it in FilterPy. Later I'll share more complicated but numerically stable forms of this equation."
]
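To make the data flow concrete, here is one predict/update cycle written out in NumPy directly from the equations above - a sketch with made-up values, not a replacement for FilterPy:

```python
import numpy as np

dt = 0.1
x = np.array([[10.0], [4.5]])        # state: position, velocity
P = np.diag([500., 49.])             # state covariance
F = np.array([[1., dt], [0., 1.]])   # state transition function
Q = np.eye(2) * 0.01                 # process noise (made up)
H = np.array([[1., 0.]])             # measurement function
R = np.array([[5.]])                 # measurement noise
z = np.array([[10.5]])               # a measurement (made up)

# predict
x = F @ x                            # x_bar = Fx (no control input)
P = F @ P @ F.T + Q                  # P_bar = FPF' + Q

# update
S = H @ P @ H.T + R                  # system uncertainty
K = P @ H.T @ np.linalg.inv(S)       # Kalman gain
y = z - H @ x                        # residual
x = x + K @ y                        # state update
P = (np.eye(2) - K @ H) @ P          # covariance update
```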
@ -1884,7 +1884,7 @@
"source": [
"The results are identical to the FilterPy version. Which you prefer is up to you. If you do not like object oriented programming FilterPy might be off-putting. I prefer not polluting my namespace with variables such as `x`, `P`, and so on; `dog_filter.x` is, to me, more readable.\n",
"\n",
"More importantly, this example requires you to remember and program the equations for the Kalman filter. Sooner or later you will make a mistake. FilterPy's version ensures that your code will be correct. On the other hand, if you make a mistake in your definitions, such as making $\\mathbf{H}$ a column vector instead of a row vector, FilterPy's error message will be harder to debug than this explicit code. \n",
"More importantly, this example requires you to remember and program the equations for the Kalman filter. Sooner or later you will make a mistake. FilterPy's version ensures that your code will be correct. On the other hand, if you make a mistake in your definitions, such as making $\\mathbf H$ a column vector instead of a row vector, FilterPy's error message will be harder to debug than this explicit code. \n",
"\n",
"FilterPy's KalmanFilter class provides additional functionality such as smoothing, batch processing, faded memory filtering, computation of the maximum likelihood function, and more. You get all of this functionality without having to explicitly program it.\n",
"\n",
@ -1899,20 +1899,20 @@
"source": [
"### Summary\n",
"\n",
"We have learned the Kalman filter equations. Here they are all together for your review. There was a lot to learn, but I hope that as you went through each you recognized it's kinship with the equations in the univariate filter. In the *Kalman Math* chapter I will show you that if we set the dimension of $\\mathbf{x}$ to one that these equations revert back to the equations for the univariate filter. This is not \"like\" the univariate filter - it is a multiple dimension implementation of it.\n",
"We have learned the Kalman filter equations. Here they are all together for your review. There was a lot to learn, but I hope that as you went through each you recognized it's kinship with the equations in the univariate filter. In the *Kalman Math* chapter I will show you that if we set the dimension of $\\mathbf x$ to one that these equations revert back to the equations for the univariate filter. This is not \"like\" the univariate filter - it is a multiple dimension implementation of it.\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"\\text{Predict Step}\\\\\n",
"\\mathbf{\\bar{x}} &= \\mathbf{F x} + \\mathbf{B u} \\\\\n",
"\\mathbf{\\bar{P}} &= \\mathbf{FP{F}}^\\mathsf{T} + \\mathbf{Q} \\\\\n",
"\\mathbf{\\overline x} &= \\mathbf{F x} + \\mathbf{B u} \\\\\n",
"\\mathbf{\\overline P} &= \\mathbf{FP{F}}^\\mathsf T + \\mathbf Q \\\\\n",
"\\\\\n",
"\\text{Update Step}\\\\\n",
"\\textbf{S} &= \\mathbf{H\\bar{P}H}^\\mathsf{T} + \\mathbf{R} \\\\\n",
"\\mathbf{K} &= \\mathbf{\\bar{P}H}^\\mathsf{T} \\mathbf{S}^{-1} \\\\\n",
"\\textbf{y} &= \\mathbf{z} - \\mathbf{H \\bar{x}} \\\\\n",
"\\mathbf{x} &=\\mathbf{\\bar{x}} +\\mathbf{K\\textbf{y}} \\\\\n",
"\\mathbf{P} &= (\\mathbf{I}-\\mathbf{KH})\\mathbf{\\bar{P}}\n",
"\\textbf{S} &= \\mathbf{H\\overline PH}^\\mathsf T + \\mathbf R \\\\\n",
"\\mathbf K &= \\mathbf{\\overline PH}^\\mathsf T \\mathbf{S}^{-1} \\\\\n",
"\\textbf{y} &= \\mathbf z - \\mathbf{H \\overline x} \\\\\n",
"\\mathbf x &=\\mathbf{\\overline x} +\\mathbf{K\\textbf{y}} \\\\\n",
"\\mathbf P &= (\\mathbf{I}-\\mathbf{KH})\\mathbf{\\overline P}\n",
"\\end{aligned}\n",
"$$\n",
"\n",
@ -1920,21 +1920,21 @@
"\n",
" $$\n",
"\\begin{aligned}\n",
"\\hat{\\mathbf{x}}_{k\\mid k-1} &= \\mathbf{F}_k\\hat{\\mathbf{x}}_{k-1\\mid k-1} + \\mathbf{B}_k \\mathbf{u}_k \\\\\n",
"\\mathbf{P}_{k\\mid k-1} &= \\mathbf{F}_k \\mathbf{P}_{k-1\\mid k-1} \\mathbf{F}_k^\\mathsf{T} + \\mathbf{Q}_k \\\\ \t\n",
"\\tilde{\\mathbf{y}}_k &= \\mathbf{z}_k - \\mathbf{H}_k\\hat{\\mathbf{x}}_{k\\mid k-1}\\\\\n",
"\\mathbf{S}_k &= \\mathbf{H}_k \\mathbf{P}_{k\\mid k-1} \\mathbf{H}_k^\\mathsf{T} + \\mathbf{R}_k \\\\\n",
"\\mathbf{K}_k &= \\mathbf{P}_{k\\mid k-1}\\mathbf{H}_k^\\mathsf{T} \\mathbf{S}_k^{-1}\\\\\n",
"\\hat{\\mathbf{x}}_{k\\mid k} &= \\hat{\\mathbf{x}}_{k\\mid k-1} + \\mathbf{K}_k\\tilde{\\mathbf{y}}_k\\\\\n",
"\\mathbf{P}_{k|k} &= (I - \\mathbf{K}_k \\mathbf{H}_k) \\mathbf{P}_{k|k-1}\n",
"\\hat{\\mathbf x}_{k\\mid k-1} &= \\mathbf F_k\\hat{\\mathbf x}_{k-1\\mid k-1} + \\mathbf B_k \\mathbf u_k \\\\\n",
"\\mathbf P_{k\\mid k-1} &= \\mathbf F_k \\mathbf P_{k-1\\mid k-1} \\mathbf F_k^\\mathsf T + \\mathbf Q_k \\\\ \t\n",
"\\tilde{\\mathbf y}_k &= \\mathbf z_k - \\mathbf H_k\\hat{\\mathbf x}_{k\\mid k-1}\\\\\n",
"\\mathbf{S}_k &= \\mathbf H_k \\mathbf P_{k\\mid k-1} \\mathbf H_k^\\mathsf T + \\mathbf R_k \\\\\n",
"\\mathbf K_k &= \\mathbf P_{k\\mid k-1}\\mathbf H_k^\\mathsf T \\mathbf{S}_k^{-1}\\\\\n",
"\\hat{\\mathbf x}_{k\\mid k} &= \\hat{\\mathbf x}_{k\\mid k-1} + \\mathbf K_k\\tilde{\\mathbf y}_k\\\\\n",
"\\mathbf P_{k|k} &= (I - \\mathbf K_k \\mathbf H_k) \\mathbf P_{k|k-1}\n",
"\\\\\\end{aligned}\n",
"$$\n",
"\n",
"This notation uses the Bayesian $a\\mid b$ notation, which means $a$ given the evidence of $b$. The hat means estimate. Thus $\\hat{\\mathbf{x}}_{k\\mid k}$ means the estimate of the state $\\mathbf{x}$ at step $k$ (the first k) given the evidence from step $k$ (the second k). The posterior, in other words. $\\hat{\\mathbf{x}}_{k\\mid k-1}$ means the estimate for the state $\\mathbf{x}$ at step $k$ given the estimate from step $k - 1$. The prior, in other words. \n",
"This notation uses the Bayesian $a\\mid b$ notation, which means $a$ given the evidence of $b$. The hat means estimate. Thus $\\hat{\\mathbf x}_{k\\mid k}$ means the estimate of the state $\\mathbf x$ at step $k$ (the first k) given the evidence from step $k$ (the second k). The posterior, in other words. $\\hat{\\mathbf x}_{k\\mid k-1}$ means the estimate for the state $\\mathbf x$ at step $k$ given the estimate from step $k - 1$. The prior, in other words. \n",
"\n",
"This notation, copied from [Wikipedia](https://en.wikipedia.org/wiki/Kalman_filter#Details) [[1]](#[wiki_article]), allows a mathematician to express himself exactly. In formal publications presenting new results this precision is necessary. As a programmer I find it fairly unreadable. I am used to thinking about variables changing state as a program runs, and do not use a different variable name for each new computation. There is no agreed upon format in the literature, so each author makes different choices. I find it challenging to switch quickly between books and papers, and so have adopted my admittedly less precise notation. Mathematicians may write scathing emails to me, but I hope programmers and students will rejoice at my simplified notation.\n",
"\n",
"The **Symbology** Appendix lists the notation used by various authors. This brings up another difficulty. Different authors use different variable names. $\\mathbf{x}$ is fairly universal, but after that it is anybody's guess. For example, it is common to use $\\mathbf{A}$ for what I call $\\mathbf{F}$. You must read carefully, and hope that the author defines their variables (they often do not).\n",
"The **Symbology** Appendix lists the notation used by various authors. This brings up another difficulty. Different authors use different variable names. $\\mathbf x$ is fairly universal, but after that it is anybody's guess. For example, it is common to use $\\mathbf{A}$ for what I call $\\mathbf F$. You must read carefully, and hope that the author defines their variables (they often do not).\n",
"\n",
"If you are a programmer trying to understand a paper's math equations, I suggest starting by removing all of the superscripts, subscripts, and diacriticals, replacing them with a single letter. If you work with equations like this every day this is superfluous advice, but when I read I am usually trying to understand the flow of computation. To me it is far more understandable to remember that $P$ in this step represents the updated value of $P$ computed in the last step, as opposed to trying to remember what $P_{k-1}(+)$ denotes, and what its relation to $P_k(-)$ is, if any, and how any of that relates to the completely different notation used in the paper I read 5 minutes ago."
]
@ -1947,7 +1947,7 @@
"\n",
"In our filter velocity is a hidden variable. How would a filter perform if we did not use velocity in the state?\n",
"\n",
"Write a Kalman filter that uses the state $\\mathbf{x}=\\begin{bmatrix}x\\end{bmatrix}$ and compare it against a filter that uses $\\mathbf{x}=\\begin{bmatrix}x & \\dot{x}\\end{bmatrix}^\\mathsf{T}$."
"Write a Kalman filter that uses the state $\\mathbf x=\\begin{bmatrix}x\\end{bmatrix}$ and compare it against a filter that uses $\\mathbf x=\\begin{bmatrix}x & \\dot x\\end{bmatrix}^\\mathsf T$."
]
},
{
@ -2135,7 +2135,7 @@
"metadata": {},
"source": [
"$$\\begin{aligned}\n",
"\\textbf{S} &= \\mathbf{H\\bar{P}H}^\\mathsf{T} + \\mathbf{R} \\\\\n",
"\\textbf{S} &= \\mathbf{H\\overline PH}^\\mathsf T + \\mathbf R \\\\\n",
"&= \\begin{bmatrix} 1 & 0\\end{bmatrix}\n",
"\\begin{bmatrix}\\sigma^2_x & \\sigma_{xv} \\\\ \\sigma_{xv} & \\sigma^2_v\\end{bmatrix}\n",
"\\begin{bmatrix} 1 \\\\ 0\\end{bmatrix} + \\begin{bmatrix}\\sigma_z^2\\end{bmatrix}\\\\\n",
@ -2150,7 +2150,7 @@
"source": [
"Now that we have $\\mathbf S$ we can find the value for the Kalman gain:\n",
"$$\\begin{aligned}\n",
"\\mathbf{K} &= \\mathbf{\\bar{P}H}^\\mathsf{T} \\mathbf{S}^{-1} \\\\\n",
"\\mathbf K &= \\mathbf{\\overline PH}^\\mathsf T \\mathbf{S}^{-1} \\\\\n",
"&= \\begin{bmatrix}\\sigma^2_x & \\sigma_{xv} \\\\ \\sigma_{xv} & \\sigma^2_v\\end{bmatrix}\n",
"\\begin{bmatrix} 1 \\\\ 0\\end{bmatrix}\n",
"\\begin{bmatrix}\\frac{1}{\\sigma_x^2 +\\sigma_z^2}\\end{bmatrix} \\\\\n",
@ -2177,21 +2177,21 @@
"metadata": {},
"source": [
"$$\\begin{aligned}\\mathbf x \n",
"&=\\mathbf{\\bar x}+\\mathbf{K}(z-\\mathbf{Hx)} \\\\\n",
"&=\\mathbf{\\bar x}+\\mathbf K(z-\\mathbf{Hx)} \\\\\n",
"&= \\mathbf{\\bar x}+\\mathbf Ky\\end{aligned}$$\n",
"\n",
"Here the residual $y$ is a scalar. Therefore it is multiplied into each element of $\\mathbf K$. Therefore we have\n",
"\n",
"$$\\begin{bmatrix}x \\\\ \\dot x\\end{bmatrix}=\\begin{bmatrix}x \\\\ \\dot x\\end{bmatrix}^- + \\begin{bmatrix}K_x \\\\ K_{\\dot x}\\end{bmatrix}y$$\n",
"$$\\begin{bmatrix}x \\\\ \\dot x\\end{bmatrix}=\\overline{\\begin{bmatrix}x \\\\ \\dot x\\end{bmatrix}} + \\begin{bmatrix}K_x \\\\ K_{\\dot x}\\end{bmatrix}y$$\n",
"\n",
"Which gives this system of equations: \n",
"\n",
"$$x = \\bar x + yK_x\\\\\n",
"\\dot x = \\bar{\\dot x} + yK_{\\dot x}$$\n",
"\\dot x = \\overline{\\dot x} + yK_{\\dot x}$$\n",
"\n",
"The prediction $\\bar x$ was computed as $x + \\bar x \\Delta t$. If the prediction was perfect then the residual will be $y=0$ (ignoring noise in the measurement) and the velocity estimate will be unchanged. On the other hand, if the velocity estimate was very bad then the prediction will be very bad, and the residual will be large: $y >> 0$. In this case we update the velocity estimate with $yK_{\\dot x}$. $K_{\\dot x}$ is proportional to $COV(x,\\dot x)$. Therefore the velocity is updated by the error in the position times the a value proportional to the covariance between the position and velocity. The higher the correlation the larger the correction. \n",
"\n",
"To bring this full circle, $COV(x,\\dot x)$ is the off-diagonal elements of $\\mathbf P$. Recall that those values were computed with $\\mathbf{FPF}^\\mathsf{T}$. So the covariance of position and velocity is computed during the predict step. The Kalman gain for the velocity is proportional to this covariance, and we adjust the velocity estimate based on how inaccurate it was during the last epoch times a value proportional to this covariance. \n",
"To bring this full circle, $COV(x,\\dot x)$ is the off-diagonal elements of $\\mathbf P$. Recall that those values were computed with $\\mathbf{FPF}^\\mathsf T$. So the covariance of position and velocity is computed during the predict step. The Kalman gain for the velocity is proportional to this covariance, and we adjust the velocity estimate based on how inaccurate it was during the last epoch times a value proportional to this covariance. \n",
"\n",
"In summary, these linear algebra equations may be very unfamiliar to you, but computation is actually very simple. It is essentially the same computation that we performed in the g-h filter. Our constants are different in this chapter because we take the noise in the process model and sensors into account, but the math is the same."
]
@ -2214,7 +2214,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's look at the effects of the measurement noise $\\mathbf{R}$ and process noise $\\mathbf{Q}$. We will want to see the effect of different settings for $\\mathbf{R}$ and $\\mathbf{Q}$, so I have hard coded our measurements in `zs` based on a variance of 225 meters squared. That is very large, but it magnifies the effects of various design choices on the graphs, making it easier to recognize what is happening. our first experiment holds $\\mathbf{R}$ constant while varying $\\mathbf{Q}$."
"Let's look at the effects of the measurement noise $\\mathbf R$ and process noise $\\mathbf Q$. We will want to see the effect of different settings for $\\mathbf R$ and $\\mathbf Q$, so I have hard coded our measurements in `zs` based on a variance of 225 meters squared. That is very large, but it magnifies the effects of various design choices on the graphs, making it easier to recognize what is happening. our first experiment holds $\\mathbf R$ constant while varying $\\mathbf Q$."
]
},
{
@ -2261,11 +2261,11 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The filter in the first plot should follow the noisy measurement closely. In the second plot the filter should vary from the measurement quite a bit, and be much closer to a straight line than in the first graph. Why does ${\\mathbf{Q}}$ affect the plots this way?\n",
"The filter in the first plot should follow the noisy measurement closely. In the second plot the filter should vary from the measurement quite a bit, and be much closer to a straight line than in the first graph. Why does ${\\mathbf Q}$ affect the plots this way?\n",
"\n",
"Let's remind ourselves of what the term *process uncertainty* means. Consider the problem of tracking a ball. We can accurately model its behavior in a vacuum with math, but with wind, varying air density, temperature, and an spinning ball with an imperfect surface our model will diverge from reality. \n",
"\n",
"In the first case we set `Q_var=20 m^2`, which is quite large. In physical terms this is telling the filter \"I don't trust my motion prediction step\" as we are saying that the variance in the velocity is 20. Strictly speaking, we are telling the filter there is a lot of external noise that we are not modeling with $\\small{\\mathbf{F}}$, but the upshot of that is to not trust the motion prediction step. The filter will be computing velocity ($\\dot{x}$), but then mostly ignoring it because we are telling the filter that the computation is extremely suspect. Therefore the filter has nothing to trust but the measurements, and thus it follows the measurements closely. \n",
"In the first case we set `Q_var=20 m^2`, which is quite large. In physical terms this is telling the filter \"I don't trust my motion prediction step\" as we are saying that the variance in the velocity is 20. Strictly speaking, we are telling the filter there is a lot of external noise that we are not modeling with $\\small{\\mathbf F}$, but the upshot of that is to not trust the motion prediction step. The filter will be computing velocity ($\\dot x$), but then mostly ignoring it because we are telling the filter that the computation is extremely suspect. Therefore the filter has nothing to trust but the measurements, and thus it follows the measurements closely. \n",
"\n",
"In the second case we set `Q_var=0.02 m^2`, which is quite small. In physical terms we are telling the filter \"trust the prediction, it is really good!\". More strictly this actually says there is very small amounts of process noise (variance 0.02 $m^2$), so the process model is very accurate. So the filter ends up ignoring some of the measurement as it jumps up and down, because the variation in the measurement does not match our trustworthy velocity prediction."
]
@ -2305,7 +2305,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The effect of this can be subtle. We have created an suboptimal filter because the actual measurement noise variance is 225 $m^2$, not 10,000 $m^2$. By setting the filter's noise variance so high we force the filter to favor the prediction over the measurement. This can lead to apparently very smooth and good looking results. In the chart above the track may look extremely good to you since it follows the ideal path very closely. But, the 'great' behavior at the start should give you pause - the filter has not converged yet ($\\mathbf{P}$ is still large) so it should not be able to be so close to the actual position. We can see that $\\mathbf{P}$ has not converged because the entire chart is colored with the yellow background denoting the size of $\\mathbf{P}$. Let's see the result of a bad initial guess for the position by guessing the initial position to be 50 m and the initial velocity to be 1 m/s."
"The effect of this can be subtle. We have created an suboptimal filter because the actual measurement noise variance is 225 $m^2$, not 10,000 $m^2$. By setting the filter's noise variance so high we force the filter to favor the prediction over the measurement. This can lead to apparently very smooth and good looking results. In the chart above the track may look extremely good to you since it follows the ideal path very closely. But, the 'great' behavior at the start should give you pause - the filter has not converged yet ($\\mathbf P$ is still large) so it should not be able to be so close to the actual position. We can see that $\\mathbf P$ has not converged because the entire chart is colored with the yellow background denoting the size of $\\mathbf P$. Let's see the result of a bad initial guess for the position by guessing the initial position to be 50 m and the initial velocity to be 1 m/s."
]
},
{
@ -2367,9 +2367,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Here we see that the filter initially struggles for several iterations to acquire the track, but then it accurately tracks our dog. In fact, this is nearly optimum - we have not designed $\\mathbf{Q}$ optimally, but $\\mathbf{R}$ is optimal. A rule of thumb for $\\mathbf{Q}$ is to set it between $\\frac{1}{2}\\Delta a$ to $\\Delta a$, where $\\Delta a$ is the maximum amount that the acceleration will change between sample period. This only applies for the assumption we are making in this chapter - that acceleration is constant and uncorrelated between each time period. In the Kalman Math chapter we will discuss several different ways of designing $\\mathbf{Q}$.\n",
"Here we see that the filter initially struggles for several iterations to acquire the track, but then it accurately tracks our dog. In fact, this is nearly optimum - we have not designed $\\mathbf Q$ optimally, but $\\mathbf R$ is optimal. A rule of thumb for $\\mathbf Q$ is to set it between $\\frac{1}{2}\\Delta a$ to $\\Delta a$, where $\\Delta a$ is the maximum amount that the acceleration will change between sample period. This only applies for the assumption we are making in this chapter - that acceleration is constant and uncorrelated between each time period. In the Kalman Math chapter we will discuss several different ways of designing $\\mathbf Q$.\n",
"\n",
"To some extent you can get similar looking output by varying either ${\\mathbf{R}}$ or ${\\mathbf{Q}}$, but I urge you to not 'magically' alter these until you get output that you like. Always think about the physical implications of these assignments, and vary ${\\mathbf{R}}$ and/or ${\\mathbf{Q}}$ based on your knowledge of the system you are filtering. Back that up with extensive simulations and/or trial runs of real data."
"To some extent you can get similar looking output by varying either ${\\mathbf R}$ or ${\\mathbf Q}$, but I urge you to not 'magically' alter these until you get output that you like. Always think about the physical implications of these assignments, and vary ${\\mathbf R}$ and/or ${\\mathbf Q}$ based on your knowledge of the system you are filtering. Back that up with extensive simulations and/or trial runs of real data."
]
},
{
@ -2426,18 +2426,18 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Looking at the output we see a very large spike in the filter output at the beginning. We set $\\text{P}=500\\, \\mathbf{I}_2$ (this is shorthand notation for a 2x2 diagonal matrix with 500 in the diagonal). We now have enough information to understand what this means, and how the Kalman filter treats it. The 500 in the upper left hand corner corresponds to $\\sigma^2_x$; therefore we are saying the standard deviation of `x` is $\\sqrt{500}$, or roughly 22.36 m. Roughly 99% of the samples occur withing $3\\sigma$, therefore $\\sigma^2_x=500$ is telling the Kalman filter that the prediction (the prior) could be up to 67 meters off. That is a large error, so when the measurement spikes the Kalman filter distrusts its own estimate and jumps wildly to try to incorporate the measurement. Then, as the filter evolves $\\mathbf{P}$ quickly converges to a more realistic value.\n",
"Looking at the output we see a very large spike in the filter output at the beginning. We set $\\text{P}=500\\, \\mathbf{I}_2$ (this is shorthand notation for a 2x2 diagonal matrix with 500 in the diagonal). We now have enough information to understand what this means, and how the Kalman filter treats it. The 500 in the upper left hand corner corresponds to $\\sigma^2_x$; therefore we are saying the standard deviation of `x` is $\\sqrt{500}$, or roughly 22.36 m. Roughly 99% of the samples occur withing $3\\sigma$, therefore $\\sigma^2_x=500$ is telling the Kalman filter that the prediction (the prior) could be up to 67 meters off. That is a large error, so when the measurement spikes the Kalman filter distrusts its own estimate and jumps wildly to try to incorporate the measurement. Then, as the filter evolves $\\mathbf P$ quickly converges to a more realistic value.\n",
"\n",
"Let's look at the math behind this. The equation for the Kalman gain is\n",
"\n",
"$$\\mathbf{K} = \\mathbf{\\bar{P}} \\mathbf{H}^\\mathsf{T}\\mathbf{S}^{-1} \\approx \\frac{\\mathbf{\\bar{P}}\\mathbf{H}^\\mathsf{T}}{\\mathbf{S}} \n",
"\\approx \\frac{\\mathsf{uncertainty}_\\mathsf{prediction}}{\\mathsf{uncertainty}_\\mathsf{measurement}}\\mathbf{H}^\\mathsf{T}\n",
"$$\\mathbf K = \\mathbf{\\overline P} \\mathbf H^\\mathsf T\\mathbf{S}^{-1} \\approx \\frac{\\mathbf{\\overline P}\\mathbf H^\\mathsf T}{\\mathbf{S}} \n",
"\\approx \\frac{\\mathsf{uncertainty}_\\mathsf{prediction}}{\\mathsf{uncertainty}_\\mathsf{measurement}}\\mathbf H^\\mathsf T\n",
"$$\n",
"\n",
"It is a ratio of the uncertainty of the prediction vs measurement. Here the uncertainty in the prediction is large, so $\\mathbf{K}$ is large (near 1 if this was a scalar). $\\mathbf{K}$ is multiplied by the residual $\\textbf{y} = \\mathbf{z} - \\mathbf{H \\bar{x}}$, which is the measurement minus the prediction, so a large $\\mathbf{K}$ favors the measurement. Therefore if $\\mathbf{P}$ is large relative to the sensor uncertainty $\\mathbf{R}$ the filter will form most of the estimate from the measurement. \n",
"It is a ratio of the uncertainty of the prediction vs measurement. Here the uncertainty in the prediction is large, so $\\mathbf K$ is large (near 1 if this was a scalar). $\\mathbf K$ is multiplied by the residual $\\textbf{y} = \\mathbf z - \\mathbf{H \\overline x}$, which is the measurement minus the prediction, so a large $\\mathbf K$ favors the measurement. Therefore if $\\mathbf P$ is large relative to the sensor uncertainty $\\mathbf R$ the filter will form most of the estimate from the measurement. \n",
"\n",
"\n",
"Now let us see the effect of a smaller initial value of $\\mathbf{P} = 1.0\\, \\mathbf{I}_2$."
"Now let us see the effect of a smaller initial value of $\\mathbf P = 1.0\\, \\mathbf{I}_2$."
]
},
{
@ -2479,9 +2479,9 @@
"source": [
"This *looks* good at first blush. The plot does not have the spike that the former plot did; the filter starts tracking the measurements and doesn't take any time to settle to the signal. However, if we look at the plots for P you can see that there is an initial spike for the variance in position, and that it never really converges. Poor design leads to a long convergence time, and suboptimal results. \n",
"\n",
"So despite the filter tracking very close to the actual signal we cannot conclude that the 'magic' is to use a small $\\mathbf{P}$. Yes, this will avoid having the Kalman filter take time to accurately track the signal, but if we are truly uncertain about the initial measurements this can cause the filter to generate very bad results. If we are tracking a living object we are probably very uncertain about where it is before we start tracking it. On the other hand, if we are filtering the output of a thermometer, we are as certain about the first measurement as the 1000th. For your Kalman filter to perform well you must set $\\mathbf{P}$ to a value that truly reflects your knowledge about the data. \n",
"So despite the filter tracking very close to the actual signal we cannot conclude that the 'magic' is to use a small $\\mathbf P$. Yes, this will avoid having the Kalman filter take time to accurately track the signal, but if we are truly uncertain about the initial measurements this can cause the filter to generate very bad results. If we are tracking a living object we are probably very uncertain about where it is before we start tracking it. On the other hand, if we are filtering the output of a thermometer, we are as certain about the first measurement as the 1000th. For your Kalman filter to perform well you must set $\\mathbf P$ to a value that truly reflects your knowledge about the data. \n",
"\n",
"Let's see the result of a bad initial estimate coupled with a very small $\\mathbf{P}$. We will set our initial estimate at x = 100 m (whereas the dog actually starts at 0m), but set `P=1` m$^2$. This is clearly an incorrect value for $\\mathbf{P}$ as the estimate is off by 100 m but we tell the filter that it the $3\\sigma$ error is 3 m."
"Let's see the result of a bad initial estimate coupled with a very small $\\mathbf P$. We will set our initial estimate at x = 100 m (whereas the dog actually starts at 0m), but set `P=1` m$^2$. This is clearly an incorrect value for $\\mathbf P$ as the estimate is off by 100 m but we tell the filter that it the $3\\sigma$ error is 3 m."
]
},
{
@ -2609,7 +2609,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"If you are viewing this in Jupyter Notebook or on the web, here is an animation of the filter filtering the data. I've tuned the filter parameters such that it is easy to see a change in $\\mathbf{P}$ as the filter progresses.\n",
"If you are viewing this in Jupyter Notebook or on the web, here is an animation of the filter filtering the data. I've tuned the filter parameters such that it is easy to see a change in $\\mathbf P$ as the filter progresses.\n",
"<img src='animations/multivariate_track1.gif'>"
]
},
@ -2617,13 +2617,13 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The output on these is a bit messy, but you should be able to see what is happening. In both plots we are drawing the covariance matrix for each point. We start with the covariance $\\mathbf{P}=(\\begin{smallmatrix}20&0\\\\0&20\\end{smallmatrix})$, which signifies a lot of uncertainty about our initial belief. After we receive the first measurement the Kalman filter updates this belief, and so the variance is no longer as large. In the top plot the first ellipse (the one on the far left) should be a slightly squashed ellipse. As the filter continues processing the measurements the covariance ellipse quickly shifts shape until it settles down to being a long, narrow ellipse tilted in the direction of movement.\n",
"The output on these is a bit messy, but you should be able to see what is happening. In both plots we are drawing the covariance matrix for each point. We start with the covariance $\\mathbf P=(\\begin{smallmatrix}20&0\\\\0&20\\end{smallmatrix})$, which signifies a lot of uncertainty about our initial belief. After we receive the first measurement the Kalman filter updates this belief, and so the variance is no longer as large. In the top plot the first ellipse (the one on the far left) should be a slightly squashed ellipse. As the filter continues processing the measurements the covariance ellipse quickly shifts shape until it settles down to being a long, narrow ellipse tilted in the direction of movement.\n",
"\n",
"Think about what this means physically. The x-axis of the ellipse denotes our uncertainty in position, and the y-axis our uncertainty in velocity. So, an ellipse that is taller than it is wide signifies that we are more uncertain about the velocity than the position. Conversely, a wide, narrow ellipse shows high uncertainty in position and low uncertainty in velocity. Finally, the amount of tilt shows the amount of correlation between the two variables. \n",
"\n",
"The first plot, with $R=5 m^2$, finishes up with an ellipse that is wider than it is tall. If that is not clear I have printed out the variances for the last ellipse in the lower right hand corner.\n",
"\n",
"In contrast, the second plot, with `R=0.5` $m^2$, has a final ellipse that is taller than wide. The ellipses in the second plot are all much smaller than the ellipses in the first plot. This stands to reason because a small $\\small\\mathbf{R}$ implies a small amount of noise in our measurements. Small noise means accurate predictions, and thus a strong belief in our position. "
"In contrast, the second plot, with `R=0.5` $m^2$, has a final ellipse that is taller than wide. The ellipses in the second plot are all much smaller than the ellipses in the first plot. This stands to reason because a small $\\small\\mathbf R$ implies a small amount of noise in our measurements. Small noise means accurate predictions, and thus a strong belief in our position. "
]
},
{
@ -2637,7 +2637,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Why are the ellipses for $\\mathbf{R}=5 m^2$ shorter, and more tilted than the ellipses for $\\mathbf{R}=0.5 m^2$. Hint: think about this in the context of what these ellipses mean physically, not in terms of the math. If you aren't sure about the answer,change $\\mathbf{R}$ to truly large and small numbers such as 100 $m^2$ and 0.1 $m^2$, observe the changes, and think about what this means. "
"Why are the ellipses for $\\mathbf R=5 m^2$ shorter, and more tilted than the ellipses for $\\mathbf R=0.5 m^2$. Hint: think about this in the context of what these ellipses mean physically, not in terms of the math. If you aren't sure about the answer,change $\\mathbf R$ to truly large and small numbers such as 100 $m^2$ and 0.1 $m^2$, observe the changes, and think about what this means. "
]
},
{
@ -2651,11 +2651,11 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The x-axis is for position, and x-axis is velocity. An ellipse that is vertical, or nearly so, says there is no correlation between position and velocity, and an ellipse that is diagonal says that there is a lot of correlation. Phrased that way, the results sound unlikely. The tilt of the ellipse changes, but the correlation shouldn't be changing over time. But this is a measure of the *output of the filter*, not a description of the actual, physical world. When $\\mathbf{R}$ is very large we are telling the filter that there is a lot of noise in the measurements. In that case the Kalman gain $\\mathbf{K}$ is set to favor the prediction over the measurement, and the prediction comes from the velocity state variable. Thus there is a large correlation between $x$ and $\\dot{x}$. Conversely, if $\\mathbf{R}$ is small, we are telling the filter that the measurement is very trustworthy, and $\\mathbf{K}$ is set to favor the measurement over the prediction. Why would the filter want to use the prediction if the measurement is nearly perfect? If the filter is not using much from the prediction there will be very little correlation reported. \n",
"The x-axis is for position, and x-axis is velocity. An ellipse that is vertical, or nearly so, says there is no correlation between position and velocity, and an ellipse that is diagonal says that there is a lot of correlation. Phrased that way, the results sound unlikely. The tilt of the ellipse changes, but the correlation shouldn't be changing over time. But this is a measure of the *output of the filter*, not a description of the actual, physical world. When $\\mathbf R$ is very large we are telling the filter that there is a lot of noise in the measurements. In that case the Kalman gain $\\mathbf K$ is set to favor the prediction over the measurement, and the prediction comes from the velocity state variable. Thus there is a large correlation between $x$ and $\\dot x$. Conversely, if $\\mathbf R$ is small, we are telling the filter that the measurement is very trustworthy, and $\\mathbf K$ is set to favor the measurement over the prediction. Why would the filter want to use the prediction if the measurement is nearly perfect? If the filter is not using much from the prediction there will be very little correlation reported. \n",
"\n",
"**This is a critical point to understand!**. The Kalman filter is a mathematical model for a real world system. A report of little correlation *does not mean* there is no correlation in the physical system, just that there was no *linear* correlation in the mathematical model. It's a report of how much measurement vs prediction was incorporated into the model. \n",
"\n",
"Let's bring that point home with a truly large measurement error. We will set $\\mathbf{R}=200\\, m^2$. Think about what the plot will look like before scrolling down."
"Let's bring that point home with a truly large measurement error. We will set $\\mathbf R=200\\, m^2$. Think about what the plot will look like before scrolling down."
]
},
{
@ -2684,14 +2684,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"I hope the result was what you were expecting. The ellipse quickly became very wide and not very tall. It did this because the Kalman filter mostly used the prediction vs the measurement to produce the filtered result. We can also see how the filter output is slow to acquire the track. The Kalman filter assumes that the measurements are extremely noisy, and so it is very slow to update its estimate for $\\dot{x}$. "
"I hope the result was what you were expecting. The ellipse quickly became very wide and not very tall. It did this because the Kalman filter mostly used the prediction vs the measurement to produce the filtered result. We can also see how the filter output is slow to acquire the track. The Kalman filter assumes that the measurements are extremely noisy, and so it is very slow to update its estimate for $\\dot x$. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Keep looking at these plots until you grasp how to interpret the covariance matrix $\\mathbf{P}$. When you work with a $9{\\times}9$ matrix it may seem overwhelming - there are 81 numbers to interpret. Just break it down - the diagonal contains the variance for each state variable, and all off diagonal elements are the product of two variances and a scaling factor $p$. You cannot plot a $9{\\times}9$ matrix on the screen so you have to develop your intuition and understanding in this simple, 2-D case. \n",
"Keep looking at these plots until you grasp how to interpret the covariance matrix $\\mathbf P$. When you work with a $9{\\times}9$ matrix it may seem overwhelming - there are 81 numbers to interpret. Just break it down - the diagonal contains the variance for each state variable, and all off diagonal elements are the product of two variances and a scaling factor $p$. You cannot plot a $9{\\times}9$ matrix on the screen so you have to develop your intuition and understanding in this simple, 2-D case. \n",
"\n",
">When plotting covariance ellipses, make sure to always use ax.set_aspect('equal') or plt.axis('equal') in your code (the former lets you set the xlim and ylim values). If the axis use different scales the ellipses will be drawn distorted. For example, the ellipse may be drawn as being taller than it is wide, but it may actually be wider than tall."
]
@ -2703,19 +2703,19 @@
"## Filter Initialization\n",
"\n",
"\n",
"There are many schemes for initializing the filter (i.e. choosing the initial values for $\\mathbf{x}$ and $\\mathbf{P}$). I will share a common approach that performs well in most situations. In this scheme you do not initialize the filter until you get the first measurement, $\\mathbf{z}_0$. From this you can compute the initial value for $\\mathbf{x}$ with $\\mathbf{x}_0 = \\mathbf{z}_0$. If $\\mathbf{z}$ is not of the same size, type, and units as $\\mathbf{x}$, which is usually the case, we can use our measurement function as follow.\n",
"There are many schemes for initializing the filter (i.e. choosing the initial values for $\\mathbf x$ and $\\mathbf P$). I will share a common approach that performs well in most situations. In this scheme you do not initialize the filter until you get the first measurement, $\\mathbf z_0$. From this you can compute the initial value for $\\mathbf x$ with $\\mathbf x_0 = \\mathbf z_0$. If $\\mathbf z$ is not of the same size, type, and units as $\\mathbf x$, which is usually the case, we can use our measurement function as follow.\n",
"\n",
"We know\n",
"\n",
"$$\\mathbf{z} = \\mathbf{Hx}$$\n",
"$$\\mathbf z = \\mathbf{Hx}$$\n",
"\n",
"Hence,\n",
"\n",
"$$\\begin{aligned}\n",
"\\mathbf{H}^{-1}\\mathbf{Hx} &= \\mathbf{H}^{-1}\\mathbf{z} \\\\\n",
"\\mathbf{x} &= \\mathbf{H}^{-1}\\mathbf{z}\\end{aligned}$$\n",
"\\mathbf H^{-1}\\mathbf{Hx} &= \\mathbf H^{-1}\\mathbf z \\\\\n",
"\\mathbf x &= \\mathbf H^{-1}\\mathbf z\\end{aligned}$$\n",
"\n",
"Matrix inversion requires a square matrix, but $\\mathbf{H}$ is rarely square. SciPy will compute the Moore-Penrose pseudo-inverse of a matrix with `scipy.linalg.pinv`, so your code might look like"
"Matrix inversion requires a square matrix, but $\\mathbf H$ is rarely square. SciPy will compute the Moore-Penrose pseudo-inverse of a matrix with `scipy.linalg.pinv`, so your code might look like"
]
},
{
@ -2750,11 +2750,11 @@
"source": [
"Specialized knowledge of your problem domain may lead you to a different computation, but this is one way to do it. \n",
"\n",
"Now we need to compute a value for $\\mathbf{P}$. This will vary by problem, but in general you will use the measurement error $\\mathbf{R}$ for identical terms, and maximum values for the rest of the terms. Maybe that isn't clear. In this chapter we have been tracking and object using position and velocity as the state, and the measurements have been positions. In that case we would initialize $\\mathbf{P}$ with\n",
"Now we need to compute a value for $\\mathbf P$. This will vary by problem, but in general you will use the measurement error $\\mathbf R$ for identical terms, and maximum values for the rest of the terms. Maybe that isn't clear. In this chapter we have been tracking and object using position and velocity as the state, and the measurements have been positions. In that case we would initialize $\\mathbf P$ with\n",
"\n",
"$$\\mathbf{P} = \\begin{bmatrix}\\mathbf{R}_0 & 0 \\\\0 & vel_{max}^2\\end{bmatrix}$$\n",
"$$\\mathbf P = \\begin{bmatrix}\\mathbf R_0 & 0 \\\\0 & vel_{max}^2\\end{bmatrix}$$\n",
"\n",
"The diagonal of $\\mathbf{P}$ contains the variance of each state variable, so we populate it with reasonable values. $\\mathbf{R}$ is a reasonable variance for the position, and the maximum velocity squared is a reasonable variance for the velocity. It is squared because variance is squared: $\\sigma^2$.\n",
"The diagonal of $\\mathbf P$ contains the variance of each state variable, so we populate it with reasonable values. $\\mathbf R$ is a reasonable variance for the position, and the maximum velocity squared is a reasonable variance for the velocity. It is squared because variance is squared: $\\sigma^2$.\n",
"\n",
"You really need to understand the domain in which you are working and initialize your filter on the best available information. For example, suppose we were trying to track horses in a horse race. The initial measurements might be very bad, and provide you with a position far from the starting gate. We know that the horse must start at the starting gate; initializing the filter to the initial measurement would lead to suboptimal results. In this scenario we would want to always initialize the Kalman filter with the starting gate position of the horse."
]
@ -2979,7 +2979,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We see that the noise swamps the signal, causing the raw values to be essentially worthless. The filter is maintaining a separate estimate for velocity. The Kalman gain $\\mathbf{k}$ is multidimensional. For example, it might have the value $\\mathbf{k} = [0.1274, 0.843]^\\mathsf{T}$. the first value is used to scale the residual of the position, and the second value will scale the residual of the velocity. The covariance matrix tells the filter how correlated the position and velocity are, and each will be optimally filtered. \n",
"We see that the noise swamps the signal, causing the raw values to be essentially worthless. The filter is maintaining a separate estimate for velocity. The Kalman gain $\\mathbf K$ is multidimensional. For example, it might have the value $\\mathbf K = [0.1274, 0.843]^\\mathsf T$. the first value is used to scale the residual of the position, and the second value will scale the residual of the velocity. The covariance matrix tells the filter how correlated the position and velocity are, and each will be optimally filtered. \n",
"\n",
"I show this to reiterate the importance of using Kalman filters to compute velocities, accelerations, and even higher order values. I use a Kalman filter even when my measurements are so accurate that I am willing to use them unfiltered because it allows me accurate estimates for velocities and accelerations."
]
@ -3000,15 +3000,15 @@
"\n",
"I established a series of steps for designing a Kalman filter. These are not a usual part of the Kalman filter literature, and are only meant as a guide, not a prescription. Designing for a hard problem is an iterative process. You make a guess at the state vector, work out what your measurement and state models are, run some tests, and then alter the design as necessary. \n",
"\n",
"The design of $\\mathbf{R}$ and $\\mathbf{Q}$ is often quite challenging. I've made it appear to be quite scientific. Your sensor has Gaussian noise of $\\mathcal{N}(0, \\sigma^2)$, so set $\\mathbf{R}=\\sigma^2$. Easy! But I told you a dirty lie. Sensors are not Gaussian. We started the book with a bathroom scale. Suppose $\\sigma=1$ kg, and you try to weigh something that weighs 0.5 kg. Theory tells us we will get negative measurements, but of course the scale will never report weights less than zero. Real world sensors typically have *fat tails* (known as *kurtosis*) and *skew*. In some cases, such as with the scale, one or both tails are truncated.\n",
"The design of $\\mathbf R$ and $\\mathbf Q$ is often quite challenging. I've made it appear to be quite scientific. Your sensor has Gaussian noise of $\\mathcal{N}(0, \\sigma^2)$, so set $\\mathbf R=\\sigma^2$. Easy! But I told you a dirty lie. Sensors are not Gaussian. We started the book with a bathroom scale. Suppose $\\sigma=1$ kg, and you try to weigh something that weighs 0.5 kg. Theory tells us we will get negative measurements, but of course the scale will never report weights less than zero. Real world sensors typically have *fat tails* (known as *kurtosis*) and *skew*. In some cases, such as with the scale, one or both tails are truncated.\n",
"\n",
"The case with $\\mathbf{Q}$ is more dire. I hope you were skeptical when I blithely assigned a noise matrix to my prediction about the movements of a dog. Who can say what a dog will do next? The Kalman filter in my GPS doesn't know about hills, the outside winds, or my terrible driving skills. Yet the filter requires a precise number to encapsulate all of that information, and it needs to work while I drive off-road in the desert, and when a Formula One champion drives on a closed circuit track.\n",
"The case with $\\mathbf Q$ is more dire. I hope you were skeptical when I blithely assigned a noise matrix to my prediction about the movements of a dog. Who can say what a dog will do next? The Kalman filter in my GPS doesn't know about hills, the outside winds, or my terrible driving skills. Yet the filter requires a precise number to encapsulate all of that information, and it needs to work while I drive off-road in the desert, and when a Formula One champion drives on a closed circuit track.\n",
"\n",
"These problems led some researchers and engineers to derogatorily call the Kalman filter a 'ball of mud'. In other words, it doesn't always hold together so well. Another term to know - Kalman filters can become *smug*. Their estimates are based solely on what you tell it the noises are. Those values can lead to overly confident estimates. $\\mathbf{P}$ gets smaller and smaller while the filter is actually becoming more and more inaccurate! In the worst case the filter diverges. We will see a lot of that when we start studying nonlinear filters. \n",
"These problems led some researchers and engineers to derogatorily call the Kalman filter a 'ball of mud'. In other words, it doesn't always hold together so well. Another term to know - Kalman filters can become *smug*. Their estimates are based solely on what you tell it the noises are. Those values can lead to overly confident estimates. $\\mathbf P$ gets smaller and smaller while the filter is actually becoming more and more inaccurate! In the worst case the filter diverges. We will see a lot of that when we start studying nonlinear filters. \n",
"\n",
"The Kalman filter is a mathematical model of the world. The output is only as accurate as that model. To make the math tractable we had to make some assumptions. We assume that the sensors and motion model have Gaussian noise. We assume that everything is linear. If that is true, the Kalman filter is *optimal* in a least squares sense. This means that there is no way to make a better estimate than what the filter gives us. However, these assumption are almost never true, and hence the model is necessarily limited, and a working filter is rarely optimal.\n",
"\n",
"In later chapters we will deal with the problem of nonlinearity. For now I want you to understand that designing the matrices of a linear filter is an experimental procedure more than a mathematical one. Use math to establish the initial values, but then you need to experiment. If there is a lot of unaccounted noise in the world (wind, etc) you may have to make $\\mathbf{Q}$ larger. If you make it too large the filter fails to respond quickly to changes. In the **Adaptive Filters** chapter you will learn some alternative techniques that allow you to change the filter design in real time in response to the inputs and performance, but for now you need to find one set of values that works for the conditions your filter will encounter. Noise matrices for an acrobatic plane might be different if the pilot is a student than if the pilot is an expert as the dynamics will be quite different."
"In later chapters we will deal with the problem of nonlinearity. For now I want you to understand that designing the matrices of a linear filter is an experimental procedure more than a mathematical one. Use math to establish the initial values, but then you need to experiment. If there is a lot of unaccounted noise in the world (wind, etc) you may have to make $\\mathbf Q$ larger. If you make it too large the filter fails to respond quickly to changes. In the **Adaptive Filters** chapter you will learn some alternative techniques that allow you to change the filter design in real time in response to the inputs and performance, but for now you need to find one set of values that works for the conditions your filter will encounter. Noise matrices for an acrobatic plane might be different if the pilot is a student than if the pilot is an expert as the dynamics will be quite different."
]
},
{
@ -3045,7 +3045,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.0"
"version": "3.5.1"
}
},
"nbformat": 4,


@ -311,9 +311,9 @@
"\n",
"$$\n",
"\\begin{aligned}\n",
" \\mathbf{v} &= \\frac{d \\mathbf{x}}{d t}\\\\ \n",
" \\quad \\mathbf{a} &= \\frac{d \\mathbf{v}}{d t} \\\\\n",
" &= \\frac{d^2 \\mathbf{x}}{d t^2} \\,\\!\n",
" \\mathbf v &= \\frac{d \\mathbf x}{d t}\\\\ \n",
" \\quad \\mathbf{a} &= \\frac{d \\mathbf v}{d t} \\\\\n",
" &= \\frac{d^2 \\mathbf x}{d t^2} \\,\\!\n",
"\\end{aligned}\n",
" $$\n",
" \n",
@ -329,18 +329,18 @@
"\n",
"In the next section we will learn techniques to convert a set of differential equations into a set of linear equations. Using the same notation from previous chapters, we can say that our model of the system without noise is:\n",
"\n",
"$$ f(\\mathbf{x}) = \\mathbf{Ax}$$\n",
"$$ f(\\mathbf x) = \\mathbf{Ax}$$\n",
"\n",
"That is, we have a set of linear equations that describe our system. For our car, \n",
"$\\mathbf{A}$ will be the coefficients for Newton's equations of motion. \n",
"$\\mathbf A$ will be the coefficients for Newton's equations of motion. \n",
"\n",
"Now we need to model the noise. We will call that *w*, and add it to the equation.\n",
"\n",
"$$ f(\\mathbf{x}) = \\mathbf{Ax} + \\mathbf{w}$$\n",
"$$ f(\\mathbf x) = \\mathbf{Ax} + \\mathbf w$$\n",
"\n",
"Finally, we need to consider inputs into the system. We are dealing with linear problems here, so we will assume that there is some input $u$ into the system, and that we have some linear model that defines how that input changes the system. For example, pressing the accelerator in your car makes it accelerate, and gravity causes balls to fall. Both are contol inputs. We will need a matrix $\\mathbf{B}$ to convert $u$ into the effect on the system. We add that into our equation:\n",
"Finally, we need to consider inputs into the system. We are dealing with linear problems here, so we will assume that there is some input $u$ into the system, and that we have some linear model that defines how that input changes the system. For example, pressing the accelerator in your car makes it accelerate, and gravity causes balls to fall. Both are contol inputs. We will need a matrix $\\mathbf B$ to convert $u$ into the effect on the system. We add that into our equation:\n",
"\n",
"$$ f(\\mathbf{x}) = \\mathbf{Ax} + \\mathbf{Bu} + \\mathbf{w}$$\n",
"$$ f(\\mathbf x) = \\mathbf{Ax} + \\mathbf{Bu} + \\mathbf{w}$$\n",
"\n",
"And that's it. That is one of the equations that Dr. Kalman set out to solve, and he found a way to compute an optimal solution if we assume certain properties of $w$."
]
@ -356,7 +356,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The equation $\\mathbf{v} = \\frac{d \\mathbf{x}}{d t}$ is the simplest possible differential equation. We trivially integrate it into the high school physics equation $x = v\\Delta t + x_0$. We then use a computer to compute the position $\\mathbf{x}$ for any arbitrary time step $\\Delta t$. We call this equation *discretized* because we can compute the state for a discrete time step $\\Delta t$. Almost all other differential equations encountered in physical systems will not yield to this approach. \n",
"The equation $\\mathbf{v} = \\frac{d \\mathbf x}{d t}$ is the simplest possible differential equation. We trivially integrate it into the high school physics equation $x = v\\Delta t + x_0$. We then use a computer to compute the position $\\mathbf x$ for any arbitrary time step $\\Delta t$. We call this equation *discretized* because we can compute the state for a discrete time step $\\Delta t$. Almost all other differential equations encountered in physical systems will not yield to this approach. \n",
"\n",
"*State-space* methods became popular around the time of the Apollo missions, largely due to the work of Dr. Kalman. The idea is simple. Start with a system of $n^{th}$-order differential equations which model a dynamic system. Next, convert the equations into an equivalent set of first-order differential equations. We can then represent that set of first order equations in vector matrix form. These equations are continuous. We wish to computes these equations with a computer, so we use another technique to *discretize* the equations. Once in this form we use the formidable powers of linear algebra to solve the system of equations. The Kalman filter is an example of this power. \n",
"\n",
@ -392,11 +392,11 @@
"\n",
"which we then write as\n",
"\n",
"$$\\frac{d\\mathbf{x}}{dt} = \\mathbf{Ax} + \\mathbf{bu}$$\n",
"$$\\frac{d\\mathbf x}{dt} = \\mathbf{Ax} + \\mathbf{bu}$$\n",
"\n",
"which you will recognize this form from the previous section. We are interested in the state $\\mathbf{x}$, not $\\frac{d\\mathbf{x}}{dt}$, so we will need a technique to find the *fundamental matrix* $\\Phi$ for that first order differential equation. The fundamental matrix discretizes the differential equation by propogating the system state from time $t_0$ to time $t_1$ with the recursive equation:\n",
"which you will recognize this form from the previous section. We are interested in the state $\\mathbf x$, not $\\frac{d\\mathbf x}{dt}$, so we will need a technique to find the *fundamental matrix* $\\Phi$ for that first order differential equation. The fundamental matrix discretizes the differential equation by propogating the system state from time $t_0$ to time $t_1$ with the recursive equation:\n",
"\n",
"$$\\mathbf{x}(t) = \\Phi(t_1-t_0)\\mathbf{x}(t_0)$$\n",
"$$\\mathbf x(t) = \\Phi(t_1-t_0)\\mathbf x(t_0)$$\n",
"\n",
"$\\Phi$ is the state transition function matrix $\\mathbf{F}$ that we use in the Kalman filter predict step."
]
@ -409,28 +409,28 @@
"\n",
"I asserted that we can model dynamic systems with a set of first order differential equations. For example, we already presented the Newtonian equation\n",
"\n",
"$$\\mathbf{v}= \\frac{d \\mathbf{x}}{d t} = \\dot{\\mathbf{x}}$$\n",
"$$\\mathbf{v}= \\frac{d \\mathbf x}{d t} = \\dot{\\mathbf x}$$\n",
"\n",
"In general terms we can then say that a dynamic system consists of equations of the form\n",
"\n",
"$$ g(t) = \\dot{x}$$\n",
"$$ g(t) = \\dot x$$\n",
"\n",
"if the behavior of the system depends on time. However, if the system is *time invariant* the equations are of the form\n",
"\n",
"$$ f(x) = \\dot{x}$$\n",
"$$ f(x) = \\dot x$$\n",
"\n",
"What does *time invariant* mean? Consider a home stereo. If you input a signal $x$ into it at time $t$, it will output some signal $f(x)$. If you instead perform the input at time $t + \\Delta t$ the output signal will be the same except shifted in time. A counterexample is $x(t) = \\sin(t)$, and the sytem is $f(x) = t\\times x(t) = t \\sin(t)$. This is not time invariant; the value will be different at different times due to the multiplication by t. An aircraft is not time invariant. If you make a control input to the aircraft at a later time it's behavior will be different because it will have burned additional fuel and thus lost weight, drag may be different due to being at a different altitude, and so on.\n",
"\n",
"We can solve these equations by integrating each side. The time variant equation is very straightforward. Starting with \n",
"\n",
"$$\\dot{\\mathbf{x}}=\\mathbf{v}$$ we get the expected\n",
"$$\\dot{\\mathbf x}=\\mathbf{v}$$ we get the expected\n",
"\n",
"$$ \\int \\dot{\\mathbf{x}}\\mathrm{d}t = \\int \\mathbf{v} \\mathrm{d}t\\\\\n",
"$$ \\int \\dot{\\mathbf x}\\mathrm{d}t = \\int \\mathbf{v} \\mathrm{d}t\\\\\n",
"x = vt + x_0$$\n",
"\n",
"However, integrating the time invariant equation is not so straightforward. \n",
"\n",
"$$ \\dot{x} = f(x) \\\\\n",
"$$ \\dot x = f(x) \\\\\n",
"\\frac{dx}{dt} = f(x)\n",
"$$ \n",
"\n",
@ -462,32 +462,32 @@
"\n",
"Many models of physical systems require second or higher order equations. State-space methods require first-order differential equations. Any higher order system of equations can be converted to a first order set of equations by defining extra variables for the first order terms and then solving. \n",
"\n",
"Let's do an example. Given the system $\\ddot{x} - 6\\dot{x} + 9x = t$ find the first order equations.\n",
"Let's do an example. Given the system $\\ddot{x} - 6\\dot x + 9x = t$ find the first order equations.\n",
"\n",
"The first step is to isolate the highest order term onto one side of the equation .\n",
"\n",
"$$\\ddot{x} = 6\\dot{x} - 9x + t$$\n",
"$$\\ddot{x} = 6\\dot x - 9x + t$$\n",
"\n",
"We define two new variables:\n",
"\n",
"$$ x_1(t) = x \\\\\n",
"x_2(t) = \\dot{x}\n",
"x_2(t) = \\dot x\n",
"$$\n",
"\n",
"Now we will substitute these into the original equation and solve, giving us a set of first order equations in terms of these new variables. It is conventional to drop the $(t)$ for notational convenience.\n",
"\n",
"First, we know that $\\dot{x}_1 = x_2$ and that $\\dot{x}_2 = \\ddot{x}$. Therefore\n",
"First, we know that $\\dot x_1 = x_2$ and that $\\dot x_2 = \\ddot{x}$. Therefore\n",
"\n",
"$$\\begin{aligned}\n",
"\\dot{x}_2 &= \\ddot{x} \\\\\n",
" &= 6\\dot{x} - 9x + t\\\\\n",
"\\dot x_2 &= \\ddot{x} \\\\\n",
" &= 6\\dot x - 9x + t\\\\\n",
" &= 6x_2-9x_1 + t\n",
"\\end{aligned}$$\n",
"\n",
"Therefore our first order system of equations is\n",
"\n",
"$$\\begin{aligned}\\dot{x}_1 &= x_2 \\\\\n",
"\\dot{x}_2 &= 6x_2-9x_1 + t\\end{aligned}$$\n",
"$$\\begin{aligned}\\dot x_1 &= x_2 \\\\\n",
"\\dot x_2 &= 6x_2-9x_1 + t\\end{aligned}$$\n",
"\n",
"If you practice this a bit you will become adept at it. Isolate the highest term, define a new variable and its derivatives, and then substitute."
]
@ -500,23 +500,23 @@
"\n",
"We express the system equations in state-space form (i.e. using linear algebra equations) with\n",
"\n",
"$$ \\dot{\\mathbf{x}} = \\mathbf{Ax}$$\n",
"$$ \\dot{\\mathbf x} = \\mathbf{Ax}$$\n",
"\n",
"Now we assert that we want to find the *fundamental matrix* $\\Phi$ that propagates the state $\\mathbf{x}$ with the equation\n",
"Now we assert that we want to find the *fundamental matrix* $\\Phi$ that propagates the state $\\mathbf x$ with the equation\n",
"\n",
"$$\\mathbf{x}(t_1) = \\Phi(t_1-t_0)\\mathbf{x}(t_0)$$\n",
"$$\\mathbf x(t_1) = \\Phi(t_1-t_0)\\mathbf x(t_0)$$\n",
"\n",
"which we can equivalently write as\n",
"\n",
"$$\\mathbf{x}(t_k) = \\Phi(\\Delta t)\\mathbf{x}(t_{k-1})$$\n",
"$$\\mathbf x(t_k) = \\Phi(\\Delta t)\\mathbf x(t_{k-1})$$\n",
"\n",
"You will recognize as the state transition matrix we use in the prediction step of the Kalman filter. We want to compute the value of $\\mathbf{x}$ at time $t$ by multiplying its previous value by some matrix $\\Phi$.\n",
"You will recognize as the state transition matrix we use in the prediction step of the Kalman filter. We want to compute the value of $\\mathbf x$ at time $t$ by multiplying its previous value by some matrix $\\Phi$.\n",
"\n",
"It is conventional to drop the $t_k$ and use the notation\n",
"\n",
"$$\\mathbf{x}_k = \\Phi(\\Delta t)\\mathbf{x}_{k-1}$$\n",
"$$\\mathbf x_k = \\Phi(\\Delta t)\\mathbf x_{k-1}$$\n",
"\n",
"$\\mathbf{x}_k$ does not mean the k$^{th}$ value of $\\mathbf{x}$, but the value of $\\mathbf{x}$ at the k$^{th}$ value of $t$.\n",
"$\\mathbf x_k$ does not mean the k$^{th}$ value of $\\mathbf x$, but the value of $\\mathbf x$ at the k$^{th}$ value of $t$.\n",
"\n",
"Broadly speaking there are three common ways to find this matrix for Kalman filters. The technique most often used with Kalman filters is to use a Taylor-series expansion. Linear Time Invariant Theory, also known as LTI System Theory, is a second technique. Finally, there are numerical techniques. You may know of others, but these three are what you will most likely encounter in the Kalman filter literature and praxis."
]
@ -593,11 +593,11 @@
"\n",
"Let's consider tracking an object moving in a vacuum. In one dimension the differential equation for motion with zero acceleration is\n",
"\n",
"$$ v = \\dot{x}\\\\a=\\ddot{x} =0,$$\n",
"$$ v = \\dot x\\\\a=\\ddot{x} =0,$$\n",
"\n",
"which we can put in state-space matrix form as\n",
"\n",
"$$\\begin{bmatrix}\\dot{x} \\\\ \\ddot{x}\\end{bmatrix} =\\begin{bmatrix}0&1\\\\0&0\\end{bmatrix} \\begin{bmatrix}x \\\\ \\dot{x}\\end{bmatrix}$$\n",
"$$\\begin{bmatrix}\\dot x \\\\ \\ddot{x}\\end{bmatrix} =\\begin{bmatrix}0&1\\\\0&0\\end{bmatrix} \\begin{bmatrix}x \\\\ \\dot x\\end{bmatrix}$$\n",
"\n",
"This is a first order differential equation, so we can set $\\mathbf{F}=\\begin{bmatrix}0&1\\\\0&0\\end{bmatrix}$ and solve the following equation.\n",
"\n",
@ -617,15 +617,15 @@
"$$\n",
"\\begin{aligned}\n",
"x(t_k) &= \\Phi(\\Delta t)x(t_{k-1}) \\\\\n",
"x^- &= \\Phi(\\Delta t)x \\\\\n",
"x^- &=\\begin{bmatrix}1&\\Delta t\\\\0&1\\end{bmatrix}x \\\\\n",
"\\begin{bmatrix}x \\\\ \\dot{x}\\end{bmatrix}^- &=\\begin{bmatrix}1&\\Delta t\\\\0&1\\end{bmatrix}\\begin{bmatrix}x \\\\ \\dot{x}\\end{bmatrix}\n",
"\\overline x &= \\Phi(\\Delta t)x \\\\\n",
"\\overline x &=\\begin{bmatrix}1&\\Delta t\\\\0&1\\end{bmatrix}x \\\\\n",
"\\overline{\\begin{bmatrix}x \\\\ \\dot x\\end{bmatrix}} &=\\begin{bmatrix}1&\\Delta t\\\\0&1\\end{bmatrix}\\begin{bmatrix}x \\\\ \\dot x\\end{bmatrix}\n",
"\\end{aligned}$$\n",
"\n",
"This should look very familiar to you! This is the equation we used in the **Multivariate Kalman Filter** chapter to track a moving object.\n",
"\n",
"$$\n",
"{\\begin{bmatrix}x\\\\\\dot{x}\\end{bmatrix}}^- =\\begin{bmatrix}1&\\Delta t \\\\ 0&1\\end{bmatrix} \\begin{bmatrix}x \\\\ \\dot{x}\\end{bmatrix}\n",
"\\overline{{\\begin{bmatrix}x\\\\\\dot x\\end{bmatrix}}} =\\begin{bmatrix}1&\\Delta t \\\\ 0&1\\end{bmatrix} \\begin{bmatrix}x \\\\ \\dot x\\end{bmatrix}\n",
"$$\n",
"\n",
"We derived this equation in that chapter by using techniques that are much easier to understand. The advantage of the Taylor series expansion is that we can use it for any arbitrary set of differential equations which are time invariant. \n",
@ -652,11 +652,11 @@
"source": [
"### Numerical Solutions\n",
"\n",
"Finally, there are numerous numerical techniques to find $\\Phi$. As filters get larger finding analytical solutions becomes very tedious (though packages like SymPy make it easier). C. F. van Loan [3] has developed a technique that finds both $\\Phi$ and $\\mathbf{Q}$ numerically. Given the continuous model\n",
"Finally, there are numerous numerical techniques to find $\\Phi$. As filters get larger finding analytical solutions becomes very tedious (though packages like SymPy make it easier). C. F. van Loan [3] has developed a technique that finds both $\\Phi$ and $\\mathbf Q$ numerically. Given the continuous model\n",
"\n",
"$$ x' = Fx + Gu$$\n",
"\n",
"where $u$ is the unity white noise, van Loan's method computes the $\\Phi$ and $\\mathbf{Q}_k$ which discretizes that equation.\n",
"where $u$ is the unity white noise, van Loan's method computes the $\\Phi$ and $\\mathbf Q_k$ which discretizes that equation.\n",
" \n",
"I have implemented van Loan's method in `FilterPy`. You may use it as follows:\n",
"\n",
@ -682,14 +682,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"In general the design of the $\\mathbf{Q}$ matrix is among the most difficult aspects of Kalman filter design. This is due to several factors. First, the math requires a good foundation in signal theory. Second, we are trying to model the noise in something for which we have little information. Consider trying to model the process noise for a thrown baseball. We can model it as a sphere moving through the air, but that leave many unknown factors - the wind, ball rotation and spin decay, the coefficient of drag of a ball with stitches, the effects of wind and air density, and so on. We develop the equations for an exact mathematical solution for a given process model, but since the process model is incomplete the result for $\\mathbf{Q}$ will also be incomplete. This has a lot of ramifications for the behavior of the Kalman filter. If $\\mathbf{Q}$ is too small then the filter will be overconfident in its prediction model and will diverge from the actual solution. If $\\mathbf{Q}$ is too large than the filter will be unduly influenced by the noise in the measurements and perform sub-optimally. In practice we spend a lot of time running simulations and evaluating collected data to try to select an appropriate value for $\\mathbf{Q}$. But let's start by looking at the math.\n",
"In general the design of the $\\mathbf Q$ matrix is among the most difficult aspects of Kalman filter design. This is due to several factors. First, the math requires a good foundation in signal theory. Second, we are trying to model the noise in something for which we have little information. Consider trying to model the process noise for a thrown baseball. We can model it as a sphere moving through the air, but that leave many unknown factors - the wind, ball rotation and spin decay, the coefficient of drag of a ball with stitches, the effects of wind and air density, and so on. We develop the equations for an exact mathematical solution for a given process model, but since the process model is incomplete the result for $\\mathbf Q$ will also be incomplete. This has a lot of ramifications for the behavior of the Kalman filter. If $\\mathbf Q$ is too small then the filter will be overconfident in its prediction model and will diverge from the actual solution. If $\\mathbf Q$ is too large than the filter will be unduly influenced by the noise in the measurements and perform sub-optimally. In practice we spend a lot of time running simulations and evaluating collected data to try to select an appropriate value for $\\mathbf Q$. But let's start by looking at the math.\n",
"\n",
"\n",
"Let's assume a kinematic system - some system that can be modeled using Newton's equations of motion. We can make a few different assumptions about this process. \n",
"\n",
"We have been using a process model of\n",
"\n",
"$$ f(\\mathbf{x}) = \\mathbf{Fx} + \\mathbf{w}$$\n",
"$$ f(\\mathbf x) = \\mathbf{Fx} + \\mathbf{w}$$\n",
"\n",
"where $\\mathbf{w}$ is the process noise. Kinematic systems are *continuous* - their inputs and outputs can vary at any arbitrary point in time. However, our Kalman filters are *discrete*. We sample the system at regular intervals. Therefore we must find the discrete representation for the noise term in the equation above. This depends on what assumptions we make about the behavior of the noise. We will consider two different models for the noise."
]
@ -711,7 +711,7 @@
"\n",
"Since the noise is changing continuously we will need to integrate to get the discrete noise for the discretization interval that we have chosen. We will not prove it here, but the equation for the discretization of the noise is\n",
"\n",
"$$\\mathbf{Q} = \\int_0^{\\Delta t} \\Phi(t)\\mathbf{Q_c}\\Phi^\\mathsf{T}(t) dt$$\n",
"$$\\mathbf Q = \\int_0^{\\Delta t} \\Phi(t)\\mathbf{Q_c}\\Phi^\\mathsf{T}(t) dt$$\n",
"\n",
"where $\\mathbf{Q_c}$ is the continuous noise. This gives us\n",
"\n",
@ -901,7 +901,7 @@
"\n",
"The covariance of the process noise is then\n",
"\n",
"$$Q = E[\\Gamma w(t) w(t) \\Gamma^\\mathsf{T}] = \\Gamma\\sigma^2_v\\Gamma^\\mathsf{T}$$.\n",
"$$Q = \\mathbb E[\\Gamma w(t) w(t) \\Gamma^\\mathsf{T}] = \\Gamma\\sigma^2_v\\Gamma^\\mathsf{T}$$.\n",
"\n",
"We can compute that with SymPy as follows"
]
@ -963,7 +963,7 @@
"\n",
"The covariance of the process noise is then\n",
"\n",
"$$Q = E[\\Gamma w(t) w(t) \\Gamma^\\mathsf{T}] = \\Gamma\\sigma^2_v\\Gamma^\\mathsf{T}$$.\n",
"$$Q = \\mathbb E[\\Gamma w(t) w(t) \\Gamma^\\mathsf{T}] = \\Gamma\\sigma^2_v\\Gamma^\\mathsf{T}$$.\n",
"\n",
"We can compute that with SymPy as follows"
]
@ -1030,7 +1030,7 @@
"source": [
"### Using FilterPy to Compute Q\n",
"\n",
"FilterPy offers several routines to compute the $\\mathbf{Q}$ matrix. The function `Q_continuous_white_noise()` computes $\\mathbf{Q}$ for a given value for $\\Delta t$ and the spectral density."
"FilterPy offers several routines to compute the $\\mathbf Q$ matrix. The function `Q_continuous_white_noise()` computes $\\mathbf Q$ for a given value for $\\Delta t$ and the spectral density."
]
},
{
@ -1083,7 +1083,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The function `Q_discrete_white_noise()` computes $\\mathbf{Q}$ assuming a piecewise model for the noise."
"The function `Q_discrete_white_noise()` computes $\\mathbf Q$ assuming a piecewise model for the noise."
]
},
{
@ -1135,7 +1135,7 @@
"source": [
"### Simplification of Q\n",
"\n",
"Many treatments use a much simpler form for $\\mathbf{Q}$, setting it to zero except for a noise term in the lower rightmost element. Is this justified? Well, consider the value of $\\mathbf{Q}$ for a small $\\Delta t$"
"Many treatments use a much simpler form for $\\mathbf Q$, setting it to zero except for a noise term in the lower rightmost element. Is this justified? Well, consider the value of $\\mathbf Q$ for a small $\\Delta t$"
]
},
{
@ -1171,16 +1171,16 @@
"source": [
"We can see that most of the terms are very small. Recall that the only Kalman filter using this matrix is\n",
"\n",
"$$ \\mathbf{P}=\\mathbf{FPF}^\\mathsf{T} + \\mathbf{Q}$$\n",
"$$ \\mathbf P=\\mathbf{FPF}^\\mathsf{T} + \\mathbf Q$$\n",
"\n",
"If the values for $\\mathbf{Q}$ are small relative to $\\mathbf{P}$\n",
"than it will be contributing almost nothing to the computation of $\\mathbf{P}$. Setting $\\mathbf{Q}$ to the zero matrix except for the lower right term\n",
"If the values for $\\mathbf Q$ are small relative to $\\mathbf P$\n",
"than it will be contributing almost nothing to the computation of $\\mathbf P$. Setting $\\mathbf Q$ to the zero matrix except for the lower right term\n",
"\n",
"$$\\mathbf{Q}=\\begin{bmatrix}0&0&0\\\\0&0&0\\\\0&0&\\sigma^2\\end{bmatrix}$$\n",
"$$\\mathbf Q=\\begin{bmatrix}0&0&0\\\\0&0&0\\\\0&0&\\sigma^2\\end{bmatrix}$$\n",
"\n",
"while not correct, is often a useful approximation. If you do this you will have to perform quite a few studies to guarantee that your filter works in a variety of situations. \n",
"\n",
"If you do this, 'lower right term' means the most rapidly changing term for each variable. If the state is $x=\\begin{bmatrix}x & \\dot{x} & \\ddot{x} & y & \\dot{y} & \\ddot{y}\\end{bmatrix}^\\mathsf{T}$ Then Q will be 6x6; the elements for both $\\ddot{x}$ and $\\ddot{y}$ will have to be set to non-zero in $\\mathbf{Q}$."
"If you do this, 'lower right term' means the most rapidly changing term for each variable. If the state is $x=\\begin{bmatrix}x & \\dot x & \\ddot{x} & y & \\dot{y} & \\ddot{y}\\end{bmatrix}^\\mathsf{T}$ Then Q will be 6x6; the elements for both $\\ddot{x}$ and $\\ddot{y}$ will have to be set to non-zero in $\\mathbf Q$."
]
},
{
@ -1196,7 +1196,7 @@
"source": [
"So far in this book we have been working with systems that can be expressed with simple linear differential equations such as\n",
"\n",
"$$v = \\dot{x} = \\frac{dx}{dt}$$\n",
"$$v = \\dot x = \\frac{dx}{dt}$$\n",
"\n",
"which we can integrate into a closed form solution, in this case $x(t) =vt + x_0$. This equation is then put into the system matrix $\\mathbf{F}$, which allows the Kalman filter equations to predict the system state in the future. For example, our constant velocity filters use\n",
"\n",
@ -1212,13 +1212,13 @@
"\n",
"$$\\mathbf{\\bar{x}} = \\begin{bmatrix}\n",
"1 & \\Delta t \\\\ 0 & 1\\end{bmatrix}\\begin{bmatrix}\n",
"x\\\\ \\dot{x}\\end{bmatrix}\n",
"x\\\\ \\dot x\\end{bmatrix}\n",
"$$\n",
"\n",
"which multiplies out to \n",
"\n",
"$$\\begin{aligned}\\bar{x} &= x + v\\Delta t \\\\\n",
"\\bar{\\dot{x}} &= \\dot{x}\\end{aligned}$$.\n",
"\\bar{\\dot x} &= \\dot x\\end{aligned}$$.\n",
"\n",
"This works for linear ordinary differential equations (ODEs), but does not work well for nonlinear equations. For example, consider trying to predict the position of a rapidly turning car. Cars turn by pivoting the front wheels, which cause the car to pivot around the rear axle. Therefore the path will be continuously varying and a linear prediction will necessarily produce an incorrect value. If the change in the system is small enough relative to $\\Delta t$ this can often produce adequate results, but that will rarely be the case with the nonlinear Kalman filters we will be studying in subsequent chapters. Another problem is that even trivial systems produce differential equations for which finding closed form solutions is difficult or impossible. \n",
"\n",
@ -1571,11 +1571,11 @@
"$$\n",
"\\begin{aligned}\n",
"\\mathbf{\\bar{x}} &= \\mathbf{F x} + \\mathbf{B u} \\\\\n",
"\\mathbf{\\bar{P}} &= \\mathbf{FPF}^\\mathsf{T} + \\mathbf{Q}\n",
"\\mathbf{\\bar{P}} &= \\mathbf{FPF}^\\mathsf{T} + \\mathbf Q\n",
"\\end{aligned}\n",
"$$\n",
"\n",
"For a univariate problem the state $\\mathbf{x}$ only has one variable, so it is a $1\\times 1$ matrix. Our motion $\\mathbf{u}$ is also a $1\\times 1$ matrix. Therefore, $\\mathbf{F}$ and $\\mathbf{B}$ must also be $1\\times 1$ matrices. That means that they are all scalars, and we can write\n",
"For a univariate problem the state $\\mathbf x$ only has one variable, so it is a $1\\times 1$ matrix. Our motion $\\mathbf{u}$ is also a $1\\times 1$ matrix. Therefore, $\\mathbf{F}$ and $\\mathbf B$ must also be $1\\times 1$ matrices. That means that they are all scalars, and we can write\n",
"\n",
"$$\\bar{x} = Fx + Bu$$\n",
"\n",
@ -1591,9 +1591,9 @@
"\n",
"Hopefully the general process is clear, so now I will go a bit faster on the rest. We have\n",
"\n",
"$$\\mathbf{\\bar{P}} = \\mathbf{FPF}^\\mathsf{T} + \\mathbf{Q}$$\n",
"$$\\mathbf{\\bar{P}} = \\mathbf{FPF}^\\mathsf{T} + \\mathbf Q$$\n",
"\n",
"Again, since our state only has one variable $\\mathbf{P}$ and $\\mathbf{Q}$ must also be $1\\times 1$ matrix, which we can treat as scalars, yielding \n",
"Again, since our state only has one variable $\\mathbf P$ and $\\mathbf Q$ must also be $1\\times 1$ matrix, which we can treat as scalars, yielding \n",
"\n",
"$$\\bar{P} = FPF^\\mathsf{T} + Q$$\n",
"\n",
@ -1611,10 +1611,10 @@
"\n",
"$$\n",
"\\begin{aligned}\n",
"\\mathbf{K}&= \\mathbf{\\bar{P}H}^\\mathsf{T} (\\mathbf{H\\bar{P}H}^\\mathsf{T} + \\mathbf{R})^{-1} \\\\\n",
"\\textbf{y} &= \\mathbf{z} - \\mathbf{H \\bar{x}}\\\\\n",
"\\mathbf{x}&=\\mathbf{\\bar{x}} +\\mathbf{K\\textbf{y}} \\\\\n",
"\\mathbf{P}&= (\\mathbf{I}-\\mathbf{KH})\\mathbf{\\bar{P}}\n",
"\\mathbf{K}&= \\mathbf{\\bar{P}H}^\\mathsf{T} (\\mathbf{H\\bar{P}H}^\\mathsf{T} + \\mathbf R)^{-1} \\\\\n",
"\\textbf{y} &= \\mathbf z - \\mathbf{H \\bar{x}}\\\\\n",
"\\mathbf x&=\\mathbf{\\bar{x}} +\\mathbf{K\\textbf{y}} \\\\\n",
"\\mathbf P&= (\\mathbf{I}-\\mathbf{KH})\\mathbf{\\bar{P}}\n",
"\\end{aligned}\n",
"$$\n",
"\n",
@ -1740,7 +1740,7 @@
"\n",
"The end result is multiplying the residual of the two measurements by a constant and adding to our previous value, which is the $g$ equation for the g-h filter. $g$ is the variance of the new estimate divided by the variance of the measurement. Of course in this case $g$ is not a constant as it varies with each time step as the variance changes. We can also derive the formula for $h$ in the same way. It is not a particularly illuminating derivation and I will skip it. The end result is\n",
"\n",
"$$h_n = \\frac{COV (x,\\dot{x})}{\\sigma^2_{y}}$$\n",
"$$h_n = \\frac{COV (x,\\dot x)}{\\sigma^2_{y}}$$\n",
"\n",
"The takeaway point is that $g$ and $h$ are specified fully by the variance and covariances of the measurement and predictions at time $n$. In other words, we are picking a point between the measurement and prediction by a scale factor determined by the quality of each of those two inputs."
]
@ -1801,23 +1801,23 @@
"\n",
"Let's start with some definitions which should be familiar to you. First, we define the innovation as \n",
"\n",
"$$\\delta \\mathbf{\\bar{z}}= \\mathbf{z} - h(\\mathbf{\\bar{x}})$$\n",
"$$\\delta \\mathbf{\\bar{z}}= \\mathbf z - h(\\mathbf{\\bar{x}})$$\n",
"\n",
"where $\\mathbf{z}$ is the measurement, $h(\\bullet)$ is the measurement function, and $\\delta \\mathbf{\\bar{z}}$ is the innovation, which we abbreviate as $y$ in FilterPy. In other words, this is the equation $\\mathbf{y} = \\mathbf{z} - \\mathbf{H\\bar{x}}$ in the linear Kalman filter's update step.\n",
"where $\\mathbf z$ is the measurement, $h(\\bullet)$ is the measurement function, and $\\delta \\mathbf{\\bar{z}}$ is the innovation, which we abbreviate as $y$ in FilterPy. In other words, this is the equation $\\mathbf{y} = \\mathbf z - \\mathbf{H\\bar{x}}$ in the linear Kalman filter's update step.\n",
"\n",
"Next, the *measurement residual* is\n",
"\n",
"$$\\delta \\mathbf{z}^+ = \\mathbf{z} - h(\\mathbf{x}^+)$$\n",
"$$\\delta \\mathbf z^+ = \\mathbf z - h(\\mathbf x^+)$$\n",
"\n",
"I don't use the plus superscript much because I find it quickly makes the equations unreadable, but $\\mathbf{x}^+$ is the *a posteriori* state estimate, which is the predicted or unknown future state. In other words, the predict step of the linear Kalman filter computes this value. Here it is stands for the value of x which the ILS algorithm will compute on each iteration.\n",
"I don't use the plus superscript much because I find it quickly makes the equations unreadable, but $\\mathbf x^+$ is the *a posteriori* state estimate, which is the predicted or unknown future state. In other words, the predict step of the linear Kalman filter computes this value. Here it is stands for the value of x which the ILS algorithm will compute on each iteration.\n",
"\n",
"These equations give us the following linear algebra equation:\n",
"\n",
"$$\\delta \\mathbf{z}^- = \\mathbf{H}\\delta \\mathbf{x} + \\delta \\mathbf{z}^+$$\n",
"$$\\delta \\mathbf z^- = \\mathbf H\\delta \\mathbf x + \\delta \\mathbf z^+$$\n",
"\n",
"$\\mathbf{H}$ is our measurement function, defined as\n",
"$\\mathbf H$ is our measurement function, defined as\n",
"\n",
"$$\\mathbf{H} = \\frac{d\\mathbf{h}}{d\\mathbf{x}} = \\frac{d\\mathbf{z}}{d\\mathbf{x}}$$\n",
"$$\\mathbf H = \\frac{d\\mathbf H}{d\\mathbf x} = \\frac{d\\mathbf z}{d\\mathbf x}$$\n",
"\n",
"We find the minimum of an equation by taking the derivative and setting it to zero. In this case we want to minimize the square of the residuals, so our equation is"
]
@ -1826,29 +1826,29 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"$$ \\frac{\\partial}{\\partial \\mathbf{x}}({\\delta \\mathbf{z}^+}^\\mathsf{T}\\delta \\mathbf{z}^+) = 0,$$\n",
"$$ \\frac{\\partial}{\\partial \\mathbf x}({\\delta \\mathbf z^+}^\\mathsf{T}\\delta \\mathbf z^+) = 0,$$\n",
"\n",
"where\n",
"\n",
"$$\\delta \\mathbf{z}^+=\\delta \\mathbf{z}^- - \\mathbf{H}\\delta \\mathbf{x}.$$\n",
"$$\\delta \\mathbf z^+=\\delta \\mathbf z^- - \\mathbf H\\delta \\mathbf x.$$\n",
"\n",
"Here I have switched to using the matrix $\\mathbf{H}$ as the measurement function. We want to use linear algebra to peform the ILS, so for each step we will have to compute the matrix $\\mathbf{H}$ which corresponds to $h(\\mathbf{x^-})$ during each iteration. $h(\\bullet)$ is usually nonlinear for these types of problems so you will have to linearize it at each step (more about this soon).\n",
"Here I have switched to using the matrix $\\mathbf H$ as the measurement function. We want to use linear algebra to peform the ILS, so for each step we will have to compute the matrix $\\mathbf H$ which corresponds to $h(\\mathbf{x^-})$ during each iteration. $h(\\bullet)$ is usually nonlinear for these types of problems so you will have to linearize it at each step (more about this soon).\n",
"\n",
"For various reasons you may want to weigh some measurement more than others. For example, the geometry of the problem might favor orthogonal measurements, or some measurements may be more noisy than others. We can do that with the equation\n",
"\n",
"$$ \\frac{\\partial}{\\partial \\mathbf{x}}({\\delta \\mathbf{z}^+}^\\mathsf{T}\\mathbf{W}\\delta \\mathbf{z}^+) = 0$$\n",
"$$ \\frac{\\partial}{\\partial \\mathbf x}({\\delta \\mathbf z^+}^\\mathsf{T}\\mathbf{W}\\delta \\mathbf z^+) = 0$$\n",
"\n",
"If we solve the first equation for ${\\delta \\mathbf{x}}$ (the derivation is shown in the next section) we get\n",
"If we solve the first equation for ${\\delta \\mathbf x}$ (the derivation is shown in the next section) we get\n",
"\n",
"$${\\delta \\mathbf{x}} = {{(\\mathbf{H}^\\mathsf{T}\\mathbf{H})^{-1}}\\mathbf{H}^\\mathsf{T} \\delta \\mathbf{z}^-}\n",
"$${\\delta \\mathbf x} = {{(\\mathbf H^\\mathsf{T}\\mathbf H)^{-1}}\\mathbf H^\\mathsf{T} \\delta \\mathbf z^-}\n",
"$$\n",
"\n",
"And the second equation yields\n",
"\n",
"$${\\delta \\mathbf{x}} = {{(\\mathbf{H}^\\mathsf{T}\\mathbf{WH})^{-1}}\\mathbf{H}^\\mathsf{T}\\mathbf{W} \\delta \\mathbf{z}^-}\n",
"$${\\delta \\mathbf x} = {{(\\mathbf H^\\mathsf{T}\\mathbf{WH})^{-1}}\\mathbf H^\\mathsf{T}\\mathbf{W} \\delta \\mathbf z^-}\n",
"$$\n",
"\n",
"Since the equations are overdetermined we cannot solve these equations exactly so we use an iterative approach. An initial guess for the position is made, and this guess is used to compute for $\\delta \\mathbf{x}$ via the equation above. $\\delta \\mathbf{x}$ is added to the intial guess, and this new state is fed back into the equation to produce another $\\delta \\mathbf{x}$. We iterate in this manner until the difference in the measurement residuals is suitably small."
"Since the equations are overdetermined we cannot solve these equations exactly so we use an iterative approach. An initial guess for the position is made, and this guess is used to compute for $\\delta \\mathbf x$ via the equation above. $\\delta \\mathbf x$ is added to the intial guess, and this new state is fed back into the equation to produce another $\\delta \\mathbf x$. We iterate in this manner until the difference in the measurement residuals is suitably small."
]
},
{
@ -1862,18 +1862,18 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"I will implement the ILS in code, but first let's derive the equation for $\\delta \\mathbf{x}$. You can skip the derivation if you want, but it is somewhat instructive and not too hard if you know basic linear algebra and partial differential equations.\n",
"I will implement the ILS in code, but first let's derive the equation for $\\delta \\mathbf x$. You can skip the derivation if you want, but it is somewhat instructive and not too hard if you know basic linear algebra and partial differential equations.\n",
"\n",
"Substituting $\\delta \\mathbf{z}^+=\\delta \\mathbf{z}^- - \\mathbf{H}\\delta \\mathbf{x}$ into the partial differential equation we get\n",
"Substituting $\\delta \\mathbf z^+=\\delta \\mathbf z^- - \\mathbf H\\delta \\mathbf x$ into the partial differential equation we get\n",
"\n",
"$$ \\frac{\\partial}{\\partial \\mathbf{x}}(\\delta \\mathbf{z}^- -\\mathbf{H} \\delta \\mathbf{x})^\\mathsf{T}(\\delta \\mathbf{z}^- - \\mathbf{H} \\delta \\mathbf{x})=0$$\n",
"$$ \\frac{\\partial}{\\partial \\mathbf x}(\\delta \\mathbf z^- -\\mathbf H \\delta \\mathbf x)^\\mathsf{T}(\\delta \\mathbf z^- - \\mathbf H \\delta \\mathbf x)=0$$\n",
"\n",
"which expands to\n",
"\n",
"$$ \\frac{\\partial}{\\partial \\mathbf{x}}({\\delta \\mathbf{x}}^\\mathsf{T}\\mathbf{H}^\\mathsf{T}\\mathbf{H}\\delta \\mathbf{x} - \n",
"{\\delta \\mathbf{x}}^\\mathsf{T}\\mathbf{H}^\\mathsf{T}\\delta \\mathbf{z}^- - \n",
"{\\delta \\mathbf{z}^-}^\\mathsf{T}\\mathbf{H}\\delta \\mathbf{x} +\n",
"{\\delta \\mathbf{z}^-}^\\mathsf{T}\\delta \\mathbf{z}^-)=0$$"
"$$ \\frac{\\partial}{\\partial \\mathbf x}({\\delta \\mathbf x}^\\mathsf{T}\\mathbf H^\\mathsf{T}\\mathbf H\\delta \\mathbf x - \n",
"{\\delta \\mathbf x}^\\mathsf{T}\\mathbf H^\\mathsf{T}\\delta \\mathbf z^- - \n",
"{\\delta \\mathbf z^-}^\\mathsf{T}\\mathbf H\\delta \\mathbf x +\n",
"{\\delta \\mathbf z^-}^\\mathsf{T}\\delta \\mathbf z^-)=0$$"
]
},
{
@ -1882,66 +1882,66 @@
"source": [
"We know that \n",
"\n",
"$$\\frac{\\partial \\mathbf{A}^\\mathsf{T}\\mathbf{B}}{\\partial \\mathbf{B}} = \\frac{\\partial \\mathbf{B}^\\mathsf{T}\\mathbf{A}}{\\partial \\mathbf{B}} = \\mathbf{A}^\\mathsf{T}$$\n",
"$$\\frac{\\partial \\mathbf{A}^\\mathsf{T}\\mathbf B}{\\partial \\mathbf B} = \\frac{\\partial \\mathbf B^\\mathsf{T}\\mathbf{A}}{\\partial \\mathbf B} = \\mathbf{A}^\\mathsf{T}$$\n",
"\n",
"Therefore the third term can be computed as\n",
"\n",
"$$\\frac{\\partial}{\\partial \\mathbf{x}}{\\delta \\mathbf{z}^-}^\\mathsf{T}\\mathbf{H}\\delta \\mathbf{x} = {\\delta \\mathbf{z}^-}^\\mathsf{T}\\mathbf{H}$$\n",
"$$\\frac{\\partial}{\\partial \\mathbf x}{\\delta \\mathbf z^-}^\\mathsf{T}\\mathbf H\\delta \\mathbf x = {\\delta \\mathbf z^-}^\\mathsf{T}\\mathbf H$$\n",
"\n",
"and the second term as\n",
"\n",
"$$\\frac{\\partial}{\\partial \\mathbf{x}}{\\delta \\mathbf{x}}^\\mathsf{T}\\mathbf{H}^\\mathsf{T}\\delta \\mathbf{z}^-={\\delta \\mathbf{z}^-}^\\mathsf{T}\\mathbf{H}$$\n",
"$$\\frac{\\partial}{\\partial \\mathbf x}{\\delta \\mathbf x}^\\mathsf{T}\\mathbf H^\\mathsf{T}\\delta \\mathbf z^-={\\delta \\mathbf z^-}^\\mathsf{T}\\mathbf H$$\n",
"\n",
"We also know that\n",
"$$\\frac{\\partial \\mathbf{B}^\\mathsf{T}\\mathbf{AB}}{\\partial \\mathbf{B}} = \\mathbf{B}^\\mathsf{T}(\\mathbf{A} + \\mathbf{A}^\\mathsf{T})$$\n",
"$$\\frac{\\partial \\mathbf B^\\mathsf{T}\\mathbf{AB}}{\\partial \\mathbf B} = \\mathbf B^\\mathsf{T}(\\mathbf{A} + \\mathbf{A}^\\mathsf{T})$$\n",
"\n",
"Therefore the first term becomes\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"\\frac{\\partial}{\\partial \\mathbf{x}}{\\delta \\mathbf{x}}^\\mathsf{T}\\mathbf{H}^\\mathsf{T}\\mathbf{H}\\delta \\mathbf{x} &= {\\delta \\mathbf{x}}^\\mathsf{T}(\\mathbf{H}^\\mathsf{T}\\mathbf{H} + {\\mathbf{H}^\\mathsf{T}\\mathbf{H}}^\\mathsf{T})\\\\\n",
"&= {\\delta \\mathbf{x}}^\\mathsf{T}(\\mathbf{H}^\\mathsf{T}\\mathbf{H} + \\mathbf{H}^\\mathsf{T}\\mathbf{H}) \\\\\n",
"&= 2{\\delta \\mathbf{x}}^\\mathsf{T}\\mathbf{H}^\\mathsf{T}\\mathbf{H}\n",
"\\frac{\\partial}{\\partial \\mathbf x}{\\delta \\mathbf x}^\\mathsf{T}\\mathbf H^\\mathsf{T}\\mathbf H\\delta \\mathbf x &= {\\delta \\mathbf x}^\\mathsf{T}(\\mathbf H^\\mathsf{T}\\mathbf H + {\\mathbf H^\\mathsf{T}\\mathbf H}^\\mathsf{T})\\\\\n",
"&= {\\delta \\mathbf x}^\\mathsf{T}(\\mathbf H^\\mathsf{T}\\mathbf H + \\mathbf H^\\mathsf{T}\\mathbf H) \\\\\n",
"&= 2{\\delta \\mathbf x}^\\mathsf{T}\\mathbf H^\\mathsf{T}\\mathbf H\n",
"\\end{aligned}$$\n",
"\n",
"Finally, the fourth term is\n",
"\n",
"$$ \\frac{\\partial}{\\partial \\mathbf{x}}\n",
"{\\delta \\mathbf{z}^-}^\\mathsf{T}\\delta \\mathbf{z}^-=0$$\n",
"$$ \\frac{\\partial}{\\partial \\mathbf x}\n",
"{\\delta \\mathbf z^-}^\\mathsf{T}\\delta \\mathbf z^-=0$$\n",
"\n",
"Replacing the terms in the expanded partial differential equation gives us\n",
"\n",
"$$\n",
" 2{\\delta \\mathbf{x}}^\\mathsf{T}\\mathbf{H}^\\mathsf{T}\\mathbf{H} -\n",
" {\\delta \\mathbf{z}^-}^\\mathsf{T}\\mathbf{H} - {\\delta \\mathbf{z}^-}^\\mathsf{T}\\mathbf{H}\n",
" 2{\\delta \\mathbf x}^\\mathsf{T}\\mathbf H^\\mathsf{T}\\mathbf H -\n",
" {\\delta \\mathbf z^-}^\\mathsf{T}\\mathbf H - {\\delta \\mathbf z^-}^\\mathsf{T}\\mathbf H\n",
" =0\n",
"$$\n",
"\n",
"$${\\delta \\mathbf{x}}^\\mathsf{T}\\mathbf{H}^\\mathsf{T}\\mathbf{H} -\n",
" {\\delta \\mathbf{z}^-}^\\mathsf{T}\\mathbf{H} = 0$$\n",
"$${\\delta \\mathbf x}^\\mathsf{T}\\mathbf H^\\mathsf{T}\\mathbf H -\n",
" {\\delta \\mathbf z^-}^\\mathsf{T}\\mathbf H = 0$$\n",
" \n",
"$${\\delta \\mathbf{x}}^\\mathsf{T}\\mathbf{H}^\\mathsf{T}\\mathbf{H} =\n",
" {\\delta \\mathbf{z}^-}^\\mathsf{T}\\mathbf{H}$$\n",
"$${\\delta \\mathbf x}^\\mathsf{T}\\mathbf H^\\mathsf{T}\\mathbf H =\n",
" {\\delta \\mathbf z^-}^\\mathsf{T}\\mathbf H$$\n",
"\n",
"Multiplying each side by $(\\mathbf{H}^\\mathsf{T}\\mathbf{H})^{-1}$ yields\n",
"Multiplying each side by $(\\mathbf H^\\mathsf{T}\\mathbf H)^{-1}$ yields\n",
"\n",
"$${\\delta \\mathbf{x}}^\\mathsf{T} =\n",
"{\\delta \\mathbf{z}^-}^\\mathsf{T}\\mathbf{H}(\\mathbf{H}^\\mathsf{T}\\mathbf{H})^{-1}$$\n",
"$${\\delta \\mathbf x}^\\mathsf{T} =\n",
"{\\delta \\mathbf z^-}^\\mathsf{T}\\mathbf H(\\mathbf H^\\mathsf{T}\\mathbf H)^{-1}$$\n",
"\n",
"Taking the transpose of each side gives\n",
"\n",
"$${\\delta \\mathbf{x}} = ({{\\delta \\mathbf{z}^-}^\\mathsf{T}\\mathbf{H}(\\mathbf{H}^\\mathsf{T}\\mathbf{H})^{-1}})^\\mathsf{T} \\\\\n",
"={{(\\mathbf{H}^\\mathsf{T}\\mathbf{H})^{-1}}^T\\mathbf{H}^\\mathsf{T} \\delta \\mathbf{z}^-} \\\\\n",
"={{(\\mathbf{H}^\\mathsf{T}\\mathbf{H})^{-1}}\\mathbf{H}^\\mathsf{T} \\delta \\mathbf{z}^-}\n",
"$${\\delta \\mathbf x} = ({{\\delta \\mathbf z^-}^\\mathsf{T}\\mathbf H(\\mathbf H^\\mathsf{T}\\mathbf H)^{-1}})^\\mathsf{T} \\\\\n",
"={{(\\mathbf H^\\mathsf{T}\\mathbf H)^{-1}}^T\\mathbf H^\\mathsf{T} \\delta \\mathbf z^-} \\\\\n",
"={{(\\mathbf H^\\mathsf{T}\\mathbf H)^{-1}}\\mathbf H^\\mathsf{T} \\delta \\mathbf z^-}\n",
"$$\n",
"\n",
"For various reasons you may want to weigh some measurement more than others. We can do that with the equation\n",
"\n",
"$$ \\frac{\\partial}{\\partial \\mathbf{x}}({\\delta \\mathbf{z}}^\\mathsf{T}\\mathbf{W}\\delta \\mathbf{z}) = 0$$\n",
"$$ \\frac{\\partial}{\\partial \\mathbf x}({\\delta \\mathbf z}^\\mathsf{T}\\mathbf{W}\\delta \\mathbf z) = 0$$\n",
"\n",
"Replicating the math above with the added $\\mathbf{W}$ term results in\n",
"\n",
"$${\\delta \\mathbf{x}} = {{(\\mathbf{H}^\\mathsf{T}\\mathbf{WH})^{-1}}\\mathbf{H}^\\mathsf{T}\\mathbf{W} \\delta \\mathbf{z}^-}\n",
"$${\\delta \\mathbf x} = {{(\\mathbf H^\\mathsf{T}\\mathbf{WH})^{-1}}\\mathbf H^\\mathsf{T}\\mathbf{W} \\delta \\mathbf z^-}\n",
"$$"
]
},
@ -1957,19 +1957,19 @@
"metadata": {},
"source": [
"Our goal is to implement an iterative solution to \n",
"$${\\delta \\mathbf{x}} = {{(\\mathbf{H}^\\mathsf{T}\\mathbf{H})^{-1}}\\mathbf{H}^\\mathsf{T} \\delta \\mathbf{z}^-}\n",
"$${\\delta \\mathbf x} = {{(\\mathbf H^\\mathsf{T}\\mathbf H)^{-1}}\\mathbf H^\\mathsf{T} \\delta \\mathbf z^-}\n",
"$$\n",
"\n",
"First, we have to compute $\\mathbf{H}$, where $\\mathbf{H} = d\\mathbf{z}/d\\mathbf{x}$. To keep the example small so the results are easier to interpret we will do this in 2D. Therefore for $n$ satellites $\\mathbf{H}$ expands to\n",
"First, we have to compute $\\mathbf H$, where $\\mathbf H = d\\mathbf z/d\\mathbf x$. To keep the example small so the results are easier to interpret we will do this in 2D. Therefore for $n$ satellites $\\mathbf H$ expands to\n",
"\n",
"$$\\mathbf{H} = \\begin{bmatrix}\n",
"$$\\mathbf H = \\begin{bmatrix}\n",
"\\frac{\\partial p_1}{\\partial x_1} & \\frac{\\partial p_1}{\\partial y_1} \\\\\n",
"\\frac{\\partial p_2}{\\partial x_2} & \\frac{\\partial p_2}{\\partial y_2} \\\\\n",
"\\vdots & \\vdots \\\\\n",
"\\frac{\\partial p_n}{\\partial x_n} & \\frac{\\partial p_n}{\\partial y_n}\n",
"\\end{bmatrix}$$\n",
"\n",
"We will linearize $\\mathbf{H}$ by computing the partial for $x$ as\n",
"We will linearize $\\mathbf H$ by computing the partial for $x$ as\n",
"\n",
"$$ \\frac{estimated\\_x\\_position - satellite\\_x\\_position}{estimated\\_range\\_to\\_satellite}$$\n",
"\n",
@ -2115,10 +2115,10 @@
"source": [
"So let's think about this. The first iteration is essentially performing the computation that the linear Kalman filter computes during the update step:\n",
"\n",
"$$\\mathbf{y} = \\mathbf{z} - \\mathbf{Hx}\\\\\n",
"\\mathbf{x} = \\mathbf{x} + \\mathbf{Ky}$$\n",
"$$\\mathbf{y} = \\mathbf z - \\mathbf{Hx}\\\\\n",
"\\mathbf x = \\mathbf x + \\mathbf{Ky}$$\n",
"\n",
"where the Kalman gain equals one. You can see that despite the very inaccurate initial guess (900, 90) the computed value for $\\mathbf{x}$, (805.4, 205.3), was very close to the actual value of (800, 200). However, it was not perfect. But after three iterations the ILS algorithm was able to find the exact answer. So hopefully it is clear why we use ILS instead of doing the sensor fusion with the Kalman filter - it gives a better result. Of course, we started with a very inaccurate guess; what if the guess was better?"
"where the Kalman gain equals one. You can see that despite the very inaccurate initial guess (900, 90) the computed value for $\\mathbf x$, (805.4, 205.3), was very close to the actual value of (800, 200). However, it was not perfect. But after three iterations the ILS algorithm was able to find the exact answer. So hopefully it is clear why we use ILS instead of doing the sensor fusion with the Kalman filter - it gives a better result. Of course, we started with a very inaccurate guess; what if the guess was better?"
]
},
{
@ -2259,7 +2259,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.0"
"version": "3.5.1"
}
},
"nbformat": 4,


@ -392,10 +392,10 @@
"source": [
"As always, the first step is to choose our state variables. We are tracking in two dimensions and have a sensor that gives us a reading in each of those two dimensions, so we know that we have the two *observed variables* $x$ and $y$. If we created our Kalman filter using only those two variables the performance would not be very good because we would be ignoring the information velocity can provide to us. We will want to incorporate velocity into our equations as well. I will represent this as\n",
"\n",
"$$\\mathbf{x} = \n",
"\\begin{bmatrix}x & \\dot{x} & y & \\dot{y}\\end{bmatrix}^\\mathsf{T}$$\n",
"$$\\mathbf x = \n",
"\\begin{bmatrix}x & \\dot x & y & \\dot y\\end{bmatrix}^\\mathsf T$$\n",
"\n",
"There is nothing special about this organization. I could have used $\\begin{bmatrix}x & y & \\dot{x} & \\dot{y}\\end{bmatrix}^\\mathsf{T}$ or something less logical. I just need to be consistent in the rest of the matrices. I like keeping positions and velocities next to each other because it keeps the covariances between positions and velocities in the same sub block of the covariance matrix. In my formulation `P[1,0]` contains the covariance of of $x$ and $\\dot{x}$. In the alternative formulation that covariance is at `P[2, 0]`. This gets worse as the number of dimension increases (e.g. 3D space, accelerations).\n",
"There is nothing special about this organization. I could have used $\\begin{bmatrix}x & y & \\dot x & \\dot y\\end{bmatrix}^\\mathsf T$ or something less logical. I just need to be consistent in the rest of the matrices. I like keeping positions and velocities next to each other because it keeps the covariances between positions and velocities in the same sub block of the covariance matrix. In my formulation `P[1,0]` contains the covariance of of $x$ and $\\dot x$. In the alternative formulation that covariance is at `P[2, 0]`. This gets worse as the number of dimension increases (e.g. 3D space, accelerations).\n",
"\n",
"Let's pause and address how you identify the hidden variables. This example is somewhat obvious because we've already worked through the 1D case, but other problems won't be obvious There is no easy answer to this question. The first thing to ask yourself is what is the interpretation of the first and second derivatives of the data from the sensors. We do that because obtaining the first and second derivatives is mathematically trivial if you are reading from the sensors using a fixed time step. The first derivative is just the difference between two successive readings. In our tracking case the first derivative has an obvious physical interpretation: the difference between two successive positions is velocity. \n",
"\n",
@ -413,18 +413,18 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Our next step is to design the state transition function. Recall that the state transition function is implemented as a matrix $\\mathbf{F}$ that we multiply with the previous state of our system to get the next state, like so. \n",
"Our next step is to design the state transition function. Recall that the state transition function is implemented as a matrix $\\mathbf F$ that we multiply with the previous state of our system to get the next state, like so. \n",
"\n",
"$$\\mathbf{\\bar{x}} = \\mathbf{Fx}$$\n",
"$$\\mathbf{\\overline x} = \\mathbf{Fx}$$\n",
"\n",
"I will not belabor this as it is very similar to the 1-D case we did in the previous chapter. The state equations are\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"x &= 1x + \\Delta t \\dot{x} + 0y + 0 \\dot{y} \\\\\n",
"v_x &= 0x + 1\\dot{x} + 0y + 0 \\dot{y} \\\\\n",
"y &= 0x + 0\\dot{x} + 1y + \\Delta t \\dot{y} \\\\\n",
"v_y &= 0x + 0\\dot{x} + 0y + 1 \\dot{y}\n",
"x &= 1x + \\Delta t \\dot x + 0y + 0 \\dot y \\\\\n",
"v_x &= 0x + 1\\dot x + 0y + 0 \\dot y \\\\\n",
"y &= 0x + 0\\dot x + 1y + \\Delta t \\dot y \\\\\n",
"v_y &= 0x + 0\\dot x + 0y + 1 \\dot y\n",
"\\end{aligned}\n",
"$$"
]
@ -433,10 +433,10 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Laying it out that way shows us both the values and row-column organization required for $\\small\\mathbf{F}$. In linear algebra, we would write this as:\n",
"Laying it out that way shows us both the values and row-column organization required for $\\small\\mathbf F$. In linear algebra, we would write this as:\n",
"\n",
"$$\n",
"\\begin{bmatrix}x \\\\ \\dot{x} \\\\ y \\\\ \\dot{y}\\end{bmatrix}^- = \\begin{bmatrix}1& \\Delta t& 0& 0\\\\0& 1& 0& 0\\\\0& 0& 1& \\Delta t\\\\ 0& 0& 0& 1\\end{bmatrix}\\begin{bmatrix}x \\\\ \\dot{x} \\\\ y \\\\ \\dot{y}\\end{bmatrix}$$\n",
"\\begin{bmatrix}x \\\\ \\dot x \\\\ y \\\\ \\dot y\\end{bmatrix}^- = \\begin{bmatrix}1& \\Delta t& 0& 0\\\\0& 1& 0& 0\\\\0& 0& 1& \\Delta t\\\\ 0& 0& 0& 1\\end{bmatrix}\\begin{bmatrix}x \\\\ \\dot x \\\\ y \\\\ \\dot y\\end{bmatrix}$$\n",
"\n",
"So, let's do this in Python. It is very simple; the only thing new here is setting `dim_z` to 2. We will see why it is set to 2 in step 4."
]
@ -465,7 +465,7 @@
"source": [
"### Step 3: Design the Process Noise Matrix\n",
"\n",
"We will use FilterPy's functions to compute the $\\mathbf{Q}$ matrix for us. For simplicity I will assume the noise is a discrete time Wiener process - that it is constant for each time period. This assumption allows me to use a variance to specify how much I think the model changes between steps. Revisit the Kalman Filter Math chapter if this is not clear."
"We will use FilterPy's functions to compute the $\\mathbf Q$ matrix for us. For simplicity I will assume the noise is a discrete time Wiener process - that it is constant for each time period. This assumption allows me to use a variance to specify how much I think the model changes between steps. Revisit the Kalman Filter Math chapter if this is not clear."
]
},
{
@ -500,7 +500,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Here I assume the noise in x and y are independent, so the covariances between any x and y variable should be zero. This allows me to compute $\\mathbf{Q}$ for one dimension, and then use `block_diag` to copy it for the x and y axis. The printed result shows how much noise we add during each evolution (prediction). "
"Here I assume the noise in x and y are independent, so the covariances between any x and y variable should be zero. This allows me to compute $\\mathbf Q$ for one dimension, and then use `block_diag` to copy it for the x and y axis. The printed result shows how much noise we add during each evolution (prediction). "
]
},
{
@ -548,7 +548,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The measurement function $\\mathbf{H}$ defines how we go from the state variables to the measurements using the equation $\\mathbf{z} = \\mathbf{Hx}$. In this case we have measurements for (x,y), so we will design $\\mathbf{z}$ as $\\begin{bmatrix}x & y\\end{bmatrix}^\\mathsf{T}$ which is dimension 2x1. Our state variable is size 4x1. We can deduce the required size for $\\textbf{H}$ by recalling that multiplying a matrix of size MxN by NxP yields a matrix of size MxP. Thus,\n",
"The measurement function $\\mathbf H$ defines how we go from the state variables to the measurements using the equation $\\mathbf z = \\mathbf{Hx}$. In this case we have measurements for (x,y), so we will design $\\mathbf z$ as $\\begin{bmatrix}x & y\\end{bmatrix}^\\mathsf T$ which is dimension 2x1. Our state variable is size 4x1. We can deduce the required size for $\\textbf{H}$ by recalling that multiplying a matrix of size MxN by NxP yields a matrix of size MxP. Thus,\n",
"\n",
"$$(2\\times 1) = (a\\times b)(4 \\times 1) = (2\\times 4)(4\\times 1)$$\n",
"\n",
@ -556,7 +556,7 @@
"\n",
"Filling in the values for $\\textbf{H}$ is easy because the measurement is the position of the robot, which is the $x$ and $y$ variables of the state $\\textbf{x}$. Let's make this slightly more interesting by deciding we want to change units. The measurements are returned in feet, and that we desire to work in meters. $\\textbf{H}$ changes from state to measurement, so the conversion is $\\mathsf{feet} = \\mathsf{meters} / 0.3048$. This yields\n",
"\n",
"$$\\mathbf{H} =\n",
"$$\\mathbf H =\n",
"\\begin{bmatrix} \n",
"\\frac{1}{0.3048} & 0 & 0 & 0 \\\\\n",
"0 & 0 & \\frac{1}{0.3048} & 0\n",
@ -607,7 +607,7 @@
"source": [
"In this step we model the noise in our sensor. For now we will make the simple assumption that the $x$ and $y$ variables are independent white Gaussian processes. That is, the noise in x is not in any way dependent on the noise in y, and the noise is normally distributed about the mean 0. For now let's set the variance for $x$ and $y$ to be 5 meters$^2$. They are independent, so there is no covariance, and our off diagonals will be 0. This gives us:\n",
"\n",
"$$\\mathbf{R} = \\begin{bmatrix}\\sigma_x^2 & \\sigma_y\\sigma_x \\\\ \\sigma_x\\sigma_y & \\sigma_{y}^2\\end{bmatrix} \n",
"$$\\mathbf R = \\begin{bmatrix}\\sigma_x^2 & \\sigma_y\\sigma_x \\\\ \\sigma_x\\sigma_y & \\sigma_{y}^2\\end{bmatrix} \n",
"= \\begin{bmatrix}5&0\\\\0&5\\end{bmatrix}$$\n",
"\n",
"It is a $2{\\times}2$ matrix because we have 2 sensor inputs, and covariance matrices are always of size $n{\\times}n$ for $n$ variables. In Python we write:"
@ -646,10 +646,10 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"For our simple problem we will set the initial position at (0,0) with a velocity of (0,0). Since that is a pure guess, we will set the covariance matrix $\\small\\mathbf{P}$ to a large value.\n",
"For our simple problem we will set the initial position at (0,0) with a velocity of (0,0). Since that is a pure guess, we will set the covariance matrix $\\small\\mathbf P$ to a large value.\n",
"\n",
"$$ \\mathbf{x} = \\begin{bmatrix}0\\\\0\\\\0\\\\0\\end{bmatrix}, \\,\n",
"\\mathbf{P} = \\begin{bmatrix}500&0&0&0\\\\0&500&0&0\\\\0&0&500&0\\\\0&0&0&500\\end{bmatrix}$$\n",
"$$ \\mathbf x = \\begin{bmatrix}0\\\\0\\\\0\\\\0\\end{bmatrix}, \\,\n",
"\\mathbf P = \\begin{bmatrix}500&0&0&0\\\\0&500&0&0\\\\0&0&500&0\\\\0&0&0&500\\end{bmatrix}$$\n",
"\n",
"The Python implementation is"
]
@ -741,7 +741,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"I encourage you to play with this, setting $\\mathbf{Q}$ and $\\mathbf{R}$ to various values. However, we did a fair amount of that sort of thing in the last chapters, and we have a lot of material to cover, so I will move on to more complicated cases where we will also have a chance to experience changing these values.\n",
"I encourage you to play with this, setting $\\mathbf Q$ and $\\mathbf R$ to various values. However, we did a fair amount of that sort of thing in the last chapters, and we have a lot of material to cover, so I will move on to more complicated cases where we will also have a chance to experience changing these values.\n",
"\n",
"Now I will run the same Kalman filter with the same settings, but also plot the covariance ellipse for $x$ and $y$. First, the code without explanation, so we can see the output. I print the last covariance to see what it looks like. But before you scroll down to look at the results, what do you think it will look like? You have enough information to figure this out but this is still new to you, so don't be discouraged if you get it wrong."
]
@ -797,9 +797,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Did you correctly predict what the covariance matrix and plots would look like? Perhaps you were expecting a tilted ellipse, as in the last chapters. If so, recall that in those chapters we were not plotting $x$ against $y$, but $x$ against $\\dot{x}$. $x$ is correlated to $\\dot{x}$, but $x$ is not correlated or dependent on $y$. Therefore our ellipses are not tilted. Furthermore, the noise for both $x$ and $y$ are modeled to have the same value, 5, in $\\mathbf{R}$. If we were to set R to, for example,\n",
"Did you correctly predict what the covariance matrix and plots would look like? Perhaps you were expecting a tilted ellipse, as in the last chapters. If so, recall that in those chapters we were not plotting $x$ against $y$, but $x$ against $\\dot x$. $x$ is correlated to $\\dot x$, but $x$ is not correlated or dependent on $y$. Therefore our ellipses are not tilted. Furthermore, the noise for both $x$ and $y$ are modeled to have the same value, 5, in $\\mathbf R$. If we were to set R to, for example,\n",
"\n",
"$$\\mathbf{R} = \\begin{bmatrix}10&0\\\\0&5\\end{bmatrix}$$\n",
"$$\\mathbf R = \\begin{bmatrix}10&0\\\\0&5\\end{bmatrix}$$\n",
"\n",
"we would be telling the Kalman filter that there is more noise in $x$ than $y$, and our ellipses would be longer than they are tall.\n"
]
@ -808,7 +808,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The final P tells us everything we need to know about the correlation between the state variables. If we look at the diagonal alone we see the variance for each variable. In other words $\\mathbf{P}_{0,0}$ is the variance for x, $\\mathbf{P}_{1,1}$ is the variance for $\\dot{x}$, $\\mathbf{P}_{2,2}$ is the variance for y, and $\\mathbf{P}_{3,3}$ is the variance for $\\dot{y}$. We can extract the diagonal of a matrix using `numpy.diag()`."
"The final P tells us everything we need to know about the correlation between the state variables. If we look at the diagonal alone we see the variance for each variable. In other words $\\mathbf P_{0,0}$ is the variance for x, $\\mathbf P_{1,1}$ is the variance for $\\dot x$, $\\mathbf P_{2,2}$ is the variance for y, and $\\mathbf P_{3,3}$ is the variance for $\\dot y$. We can extract the diagonal of a matrix using `numpy.diag()`."
]
},
{
@ -834,7 +834,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The covariance matrix contains four $2{\\times}2$ matrices that you should be able to easily pick out. This is due to the correlation of $x$ to $\\dot{x}$, and of $y$ to $\\dot{y}$. The upper left hand side shows the covariance of $x$ to $\\dot{x}$."
"The covariance matrix contains four $2{\\times}2$ matrices that you should be able to easily pick out. This is due to the correlation of $x$ to $\\dot x$, and of $y$ to $\\dot y$. The upper left hand side shows the covariance of $x$ to $\\dot x$."
]
},
{
@ -873,9 +873,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The covariance contains the data for $x$ and $\\dot{x}$ in the upper left because of how it is organized. Recall that entries $\\mathbf{P}_{i,j}$ and $\\mathbf{P}_{j,i}$ contain $\\sigma_i\\sigma_j$.\n",
"The covariance contains the data for $x$ and $\\dot x$ in the upper left because of how it is organized. Recall that entries $\\mathbf P_{i,j}$ and $\\mathbf P_{j,i}$ contain $\\sigma_i\\sigma_j$.\n",
"\n",
"Finally, let's look at the lower left side of $\\mathbf{P}$, which is all 0s. Why 0s? Consider $\\mathbf{P}_{3,0}$. That stores the term $\\sigma_3\\sigma_0$, which is the covariance between $\\dot{y}$ and $x$. These are independent, so the term will be 0. The rest of the terms are for similarly independent variables."
"Finally, let's look at the lower left side of $\\mathbf P$, which is all 0s. Why 0s? Consider $\\mathbf P_{3,0}$. That stores the term $\\sigma_3\\sigma_0$, which is the covariance between $\\dot y$ and $x$. These are independent, so the term will be 0. The rest of the terms are for similarly independent variables."
]
},
{
@ -1026,15 +1026,15 @@
"source": [
"A zero order Kalman filter is just a filter that tracks with no derivatives. We are tracking position, so that means we only have a state variable for position (no velocity or acceleration), and the state transition function also only accounts for position. Using the matrix formulation we would say that the state variable is\n",
"\n",
"$$\\mathbf{x} = \\begin{bmatrix}x\\end{bmatrix}$$\n",
"$$\\mathbf x = \\begin{bmatrix}x\\end{bmatrix}$$\n",
"\n",
"The state transition function is very simple. There is no change in position, so we need to model $x=x$; in other words, *x* at time t+1 is the same as it was at time t. In matrix form, our state transition function is\n",
"\n",
"$$\\mathbf{F} = \\begin{bmatrix}1\\end{bmatrix}$$\n",
"$$\\mathbf F = \\begin{bmatrix}1\\end{bmatrix}$$\n",
"\n",
"The measurement function is very easy. Recall that we need to define how to convert the state variable $\\mathbf{x}$ into a measurement. We will assume that our measurements are positions. The state variable only contains a position, so we get\n",
"The measurement function is very easy. Recall that we need to define how to convert the state variable $\\mathbf x$ into a measurement. We will assume that our measurements are positions. The state variable only contains a position, so we get\n",
"\n",
"$$\\mathbf{H} = \\begin{bmatrix}1\\end{bmatrix}$$\n",
"$$\\mathbf H = \\begin{bmatrix}1\\end{bmatrix}$$\n",
"\n",
"Let's write a function that constructs and returns a zero order Kalman filter."
]
@ -1075,7 +1075,7 @@
"\n",
"A first order system has position and velocity, so the state variable needs both of these. The matrix formulation could be\n",
"\n",
"$$ \\mathbf{x} = \\begin{bmatrix}x\\\\\\dot{x}\\end{bmatrix}$$\n",
"$$ \\mathbf x = \\begin{bmatrix}x\\\\\\dot x\\end{bmatrix}$$\n",
"\n",
"So now we have to design our state transition. The Newtonian equations for a time step are:\n",
"\n",
@ -1084,21 +1084,21 @@
" \n",
"Recall that we need to convert this into the linear equation\n",
"\n",
"$$\\begin{bmatrix}x\\\\\\dot{x}\\end{bmatrix} = \\mathbf{F}\\begin{bmatrix}x\\\\\\dot{x}\\end{bmatrix}$$\n",
"$$\\begin{bmatrix}x\\\\\\dot x\\end{bmatrix} = \\mathbf F\\begin{bmatrix}x\\\\\\dot x\\end{bmatrix}$$\n",
"\n",
"Setting\n",
"\n",
"$$\\mathbf{F} = \\begin{bmatrix}1 &\\Delta t\\\\ 0 & 1\\end{bmatrix}$$\n",
"$$\\mathbf F = \\begin{bmatrix}1 &\\Delta t\\\\ 0 & 1\\end{bmatrix}$$\n",
"\n",
"gives us the equations above.\n",
"\n",
"Finally, we design the measurement function. The measurement function needs to implement\n",
"\n",
"$$z = \\mathbf{Hx}$$\n",
"$$\\mathbf z = \\mathbf{Hx}$$\n",
"\n",
"Our sensor still only reads position, so it should take the position from the state, and 0 out the velocity, like so:\n",
"\n",
"$$\\mathbf{H} = \\begin{bmatrix}1 & 0 \\end{bmatrix}$$\n",
"$$\\mathbf H = \\begin{bmatrix}1 & 0 \\end{bmatrix}$$\n",
"\n",
"This function constructs and returns a first order Kalman filter."
]
@ -1138,7 +1138,7 @@
"source": [
"A second order Kalman filter tracks a second order system, such as position, velocity and acceleration. The state variable will be\n",
"\n",
"$$ \\mathbf{x} = \\begin{bmatrix}x\\\\\\dot{x}\\\\\\ddot{x}\\end{bmatrix}$$\n",
"$$ \\mathbf x = \\begin{bmatrix}x\\\\\\dot x\\\\\\ddot{x}\\end{bmatrix}$$\n",
"\n",
"So now we have to design our state transition. The Newtonian equations for a time step are:\n",
"\n",
@ -1148,11 +1148,11 @@
" \n",
"Recall that we need to convert this into the linear equation\n",
"\n",
"$$\\begin{bmatrix}x\\\\\\dot{x}\\\\\\ddot{x}\\end{bmatrix} = \\mathbf{F}\\begin{bmatrix}x\\\\\\dot{x}\\\\\\ddot{x}\\end{bmatrix}$$\n",
"$$\\begin{bmatrix}x\\\\\\dot x\\\\\\ddot{x}\\end{bmatrix} = \\mathbf F\\begin{bmatrix}x\\\\\\dot x\\\\\\ddot{x}\\end{bmatrix}$$\n",
"\n",
"Setting\n",
"\n",
"$$\\mathbf{F} = \\begin{bmatrix}1 & \\Delta t &.5\\Delta t^2\\\\ \n",
"$$\\mathbf F = \\begin{bmatrix}1 & \\Delta t &.5\\Delta t^2\\\\ \n",
"0 & 1 & \\Delta t \\\\\n",
"0 & 0 & 1\\end{bmatrix}$$\n",
"\n",
@ -1164,7 +1164,7 @@
"\n",
"Our sensor still only reads position, so it should take the position from the state, and 0 out the velocity, like so:\n",
"\n",
"$$\\mathbf{H} = \\begin{bmatrix}1 & 0 & 0\\end{bmatrix}$$\n",
"$$\\mathbf H = \\begin{bmatrix}1 & 0 & 0\\end{bmatrix}$$\n",
"\n",
"This function constructs and returns a second order Kalman filter."
]
@ -1207,9 +1207,9 @@
"source": [
"Now we can run each Kalman filter against the simulation and evaluate the results. \n",
"\n",
"How do we evaluate the results? We can do this qualitatively by plotting the track and the Kalman filter output and eyeballing the results. However, a rigorous approach uses mathematics. Recall that system covariance matrix $\\mathbf{P}$ contains the computed variance and covariances for each of the state variables. The diagonal contains the variance. Remember that roughly 99% of all measurements fall within $3\\sigma$ if the noise is Gaussian. If this is not clear please review the Gaussian chapter before continuing, as this is an important point. \n",
"How do we evaluate the results? We can do this qualitatively by plotting the track and the Kalman filter output and eyeballing the results. However, a rigorous approach uses mathematics. Recall that system covariance matrix $\\mathbf P$ contains the computed variance and covariances for each of the state variables. The diagonal contains the variance. Remember that roughly 99% of all measurements fall within $3\\sigma$ if the noise is Gaussian. If this is not clear please review the Gaussian chapter before continuing, as this is an important point. \n",
"\n",
"So we can evaluate the filter by looking at the residuals between the estimated state and actual state and comparing them to the standard deviations which we derive from $\\mathbf{P}$. If the filter is performing correctly 99% of the residuals will fall within $3\\sigma$. This is true for all the state variables, not just for the position. \n",
"So we can evaluate the filter by looking at the residuals between the estimated state and actual state and comparing them to the standard deviations which we derive from $\\mathbf P$. If the filter is performing correctly 99% of the residuals will fall within $3\\sigma$. This is true for all the state variables, not just for the position. \n",
"\n",
"I must mention that this is only true for simulated systems. Real sensors are not perfectly Gaussian, and you may need to expand your criteria to, say, $5\\sigma$ with real sensor data.\n",
"\n",
@ -1455,7 +1455,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We can see that the filter diverges almost immediately. After a few seconds the residual exceeds the bounds of three standard deviations. It is important to understand that the covariance matrix $\\mathbf{P}$ is only reporting the *theoretical* performance of the filter *assuming* all of the inputs are correct. In other words, this Kalman filter is diverging, but $\\mathbf{P}$ implies that the Kalman filter's estimates are getting better and better with time because the variance is getting smaller. The filter has no way to know that you are lying to it about the system. This is sometimes referred to a *smug* filter - it is overconfident in its performance.\n",
"We can see that the filter diverges almost immediately. After a few seconds the residual exceeds the bounds of three standard deviations. It is important to understand that the covariance matrix $\\mathbf P$ is only reporting the *theoretical* performance of the filter *assuming* all of the inputs are correct. In other words, this Kalman filter is diverging, but $\\mathbf P$ implies that the Kalman filter's estimates are getting better and better with time because the variance is getting smaller. The filter has no way to know that you are lying to it about the system. This is sometimes referred to a *smug* filter - it is overconfident in its performance.\n",
"\n",
"In this system the divergence is immediate and striking. In many systems it will only be gradual, and/or slight. It is important to look at charts like these for your systems to ensure that the performance of the filter is within the bounds of its theoretical performance."
]
@ -1581,7 +1581,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"I have one more trick up my sleeve. We have a first order system; i.e. the velocity is more-or-less constant. Real world systems are never perfect, so of course the velocity is never exactly the same between time periods. When we use a first order filter we account for that slight variation in velocity with the *process noise*. The matrix $\\mathbf{Q}$ is computed to account for this slight variation. If we move to a second order filter we are now accounting for the changes in velocity. Perhaps now we have no process noise, and we can set $\\mathbf{Q}$ to zero! "
"I have one more trick up my sleeve. We have a first order system; i.e. the velocity is more-or-less constant. Real world systems are never perfect, so of course the velocity is never exactly the same between time periods. When we use a first order filter we account for that slight variation in velocity with the *process noise*. The matrix $\\mathbf Q$ is computed to account for this slight variation. If we move to a second order filter we are now accounting for the changes in velocity. Perhaps now we have no process noise, and we can set $\\mathbf Q$ to zero! "
]
},
{
@ -1957,11 +1957,11 @@
"source": [
"## Evaluating Filter Performance\n",
"\n",
"It is easy to design a Kalman filter for a simulated situation. You know how much noise you are injecting in your process model, so you specify $\\mathbf{Q}$ to have the same value. You also know how much noise is in the measurement simulation, so the measurement noise matrix $\\mathbf{R}$ is equally trivial to define. \n",
"It is easy to design a Kalman filter for a simulated situation. You know how much noise you are injecting in your process model, so you specify $\\mathbf Q$ to have the same value. You also know how much noise is in the measurement simulation, so the measurement noise matrix $\\mathbf R$ is equally trivial to define. \n",
"\n",
"In practice design is more ad hoc. Real sensors rarely perform to spec, and they rarely perform in a Gaussian manner. They are also easily fooled by environmental noise. For example, circuit noise causes voltage flucuations which can affect the output of a sensor. Creating a process model and noise is even more difficult. Modelling an automobile is very difficult. The steering causes nonlinear behavior, tires slip, people brake and accelerate hard enough to cause tire slips, winds push the car off course. The end result is the Kalman filter is an *inexact* model of the system. This inexactness causes suboptimal behavior, which in the worst case causes the filter to diverge completely. \n",
"\n",
"Because of the unknowns you will be unable to analytically compute the correct values for the filter's matrices. You will start by making the best estimate possible, and then test your filter against a wide variety of simulated and real data. Your evaluation of the performance will guide you towards what changes you need to make to the matrices. We've done some of this already - I've shown you the effect of $\\mathbf{Q}$ being too large or small.\n",
"Because of the unknowns you will be unable to analytically compute the correct values for the filter's matrices. You will start by making the best estimate possible, and then test your filter against a wide variety of simulated and real data. Your evaluation of the performance will guide you towards what changes you need to make to the matrices. We've done some of this already - I've shown you the effect of $\\mathbf Q$ being too large or small.\n",
"\n",
"Now let's consider more analytical ways of accessing performance. If the Kalman filter is performing optimally its estimation errors (the difference between the true state and the estimated state will have these properties:\n",
"\n",
@ -1970,13 +1970,13 @@
" \n",
"### Normalized Estimated Error Squared (NEES)\n",
"\n",
"The first method is the most powerful, but only possible in simulations. If you are simulating a system you know its true value, or 'ground truth'. It is then trival to compute the error of the system at any step as the difference between ground truth ($\\mathbf{x}$) and the filter's state estimate ($\\hat{\\mathbf{x}}$):\n",
"The first method is the most powerful, but only possible in simulations. If you are simulating a system you know its true value, or 'ground truth'. It is then trival to compute the error of the system at any step as the difference between ground truth ($\\mathbf x$) and the filter's state estimate ($\\hat{\\mathbf x}$):\n",
"\n",
"$$\\tilde{\\mathbf{x}} = \\mathbf{x} - \\hat{\\mathbf{x}}$$\n",
"$$\\tilde{\\mathbf x} = \\mathbf x - \\hat{\\mathbf x}$$\n",
"\n",
"We can then define the **normalized estimated error squared** (NEES) as \n",
"\n",
"$$\\mathbf{\\epsilon} = \\tilde{\\mathbf{x}}^\\mathsf{T}\\mathbf{P}^{-1}\\tilde{\\mathbf{x}}$$\n",
"$$\\mathbf{\\epsilon} = \\tilde{\\mathbf x}^\\mathsf T\\mathbf P^{-1}\\tilde{\\mathbf x}$$\n",
"\n",
"To understand this equation lets look at it if the state's dimension is one. In that case both x and P are scalars, so\n",
"\n",
@ -1986,9 +1986,9 @@
"\n",
"So as the covariance matrix gets smaller NEES gets larger for the same error. The covariance matrix is the filter's estimate of it's error, so if it is small relative to the estimation error then it is performing worse than if it is large relative to the same estimation error.\n",
"\n",
"This computation gives us a scalar result. If $\\mathbf{x}$ is dimension ($n \\times 1$), then the dimension of the computation is ($1 \\times n$)($n \\times n$)($n \\times 1$) = ($1 \\times 1$). What do we do with this number?\n",
"This computation gives us a scalar result. If $\\mathbf x$ is dimension ($n \\times 1$), then the dimension of the computation is ($1 \\times n$)($n \\times n$)($n \\times 1$) = ($1 \\times 1$). What do we do with this number?\n",
"\n",
"The math is outside the scope of this book, but a random variable in the form $\\tilde{\\mathbf{x}}^\\mathsf{T}\\mathbf{P}^{-1}\\tilde{\\mathbf{x}}$ is said to be **chi-squared distributed with n degrees of freedom**, and thus the expected value of the sequence should be $n$. Bar-Shalom [1] has an excellent discussion of this topic. \n",
"The math is outside the scope of this book, but a random variable in the form $\\tilde{\\mathbf x}^\\mathsf T\\mathbf P^{-1}\\tilde{\\mathbf x}$ is said to be **chi-squared distributed with n degrees of freedom**, and thus the expected value of the sequence should be $n$. Bar-Shalom [1] has an excellent discussion of this topic. \n",
"\n",
"In plain terms, take the average of all the NESS values, and they should be less then the dimension of x. Let's see that using an example from earlier in the chapter:"
]
@ -2063,15 +2063,15 @@
"The residual and system uncertainty of the filter is defined as\n",
"\n",
"$$\\begin{aligned}\n",
"\\mathbf{y} &= \\mathbf{z} - \\mathbf{H \\bar{x}}\\\\\n",
"\\mathbf{S} &= \\mathbf{H\\bar{P}H}^\\mathsf{T} + \\mathbf{R}\n",
"\\mathbf y &= \\mathbf z - \\mathbf{H \\overline x}\\\\\n",
"\\mathbf S &= \\mathbf{H\\overline{P}H}^\\mathsf T + \\mathbf R\n",
"\\end{aligned}\n",
"$$\n",
"\n",
"Given these we can compute the likelihood function with\n",
"\n",
"$$\n",
"\\mathcal{L} = \\frac{1}{\\sqrt{2\\pi S}}\\exp [-\\frac{1}{2}\\mathbf{y}^\\mathsf{T}\\mathbf{S}^{-1}\\mathbf{y}]$$\n",
"\\mathcal{L} = \\frac{1}{\\sqrt{2\\pi S}}\\exp [-\\frac{1}{2}\\mathbf y^\\mathsf T\\mathbf S^{-1}\\mathbf y]$$\n",
"\n",
"That may look complicated, but notice that the exponential is the equation for a Gaussian. This suggests an implementation of\n",
"\n",
@ -2187,18 +2187,18 @@
"\n",
"In this chapter we learned that the equation for the state prediction is:\n",
"\n",
"$$\\bar{\\mathbf{x}} = \\mathbf{Fx} + \\mathbf{Bu}$$\n",
"$$\\overline{\\mathbf x} = \\mathbf{Fx} + \\mathbf{Bu}$$\n",
"\n",
"Our state is a vector, so we need to represent the control input as a vector. Here $\\mathbf{u}$ is the control input, and $\\mathbf{B}$ is a matrix that transforms the control input into a change in $\\mathbf{x}$. Let's consider a simple example. Suppose the state is $x = \\begin{bmatrix} x & \\dot{x}\\end{bmatrix}$ for a robot we are controlling and the control input is commanded velocity. This gives us a control input of \n",
"Our state is a vector, so we need to represent the control input as a vector. Here $\\mathbf{u}$ is the control input, and $\\mathbf{B}$ is a matrix that transforms the control input into a change in $\\mathbf x$. Let's consider a simple example. Suppose the state is $x = \\begin{bmatrix} x & \\dot x\\end{bmatrix}$ for a robot we are controlling and the control input is commanded velocity. This gives us a control input of \n",
"\n",
"$$\\mathbf{u} = \\begin{bmatrix}\\dot{x}_\\mathtt{cmd}\\end{bmatrix}$$\n",
"$$\\mathbf{u} = \\begin{bmatrix}\\dot x_\\mathtt{cmd}\\end{bmatrix}$$\n",
"\n",
"For simplicity we will assume that the robot can respond instantly to changes to this input. That means that the new position and velocity after $\\Delta t$ seconds will be\n",
"\n",
"$$\\begin{aligned}x &= x + \\dot{x}_\\mathtt{cmd} \\Delta t \\\\\n",
"\\dot{x} &= \\dot{x}_\\mathtt{cmd}\\end{aligned}$$\n",
"$$\\begin{aligned}x &= x + \\dot x_\\mathtt{cmd} \\Delta t \\\\\n",
"\\dot x &= \\dot x_\\mathtt{cmd}\\end{aligned}$$\n",
"\n",
"We need to represent this set of equation in the form $\\bar{\\mathbf{x}} = \\mathbf{Fx} + \\mathbf{Bu}$."
"We need to represent this set of equation in the form $\\overline{\\mathbf x} = \\mathbf{Fx} + \\mathbf{Bu}$."
]
},
{
@ -2209,8 +2209,8 @@
"\n",
"\n",
"\n",
"$$\\begin{bmatrix}x\\\\\\dot{x}\\end{bmatrix} = \\begin{bmatrix}1 & 0\\\\0 & 0 \\end{bmatrix}\\begin{bmatrix}x\\\\\\dot{x}\\end{bmatrix} +\n",
"\\begin{bmatrix}\\Delta t \\\\ 1\\end{bmatrix}\\begin{bmatrix}\\dot{x}_\\mathtt{cmd}\\end{bmatrix}\n",
"$$\\begin{bmatrix}x\\\\\\dot x\\end{bmatrix} = \\begin{bmatrix}1 & 0\\\\0 & 0 \\end{bmatrix}\\begin{bmatrix}x\\\\\\dot x\\end{bmatrix} +\n",
"\\begin{bmatrix}\\Delta t \\\\ 1\\end{bmatrix}\\begin{bmatrix}\\dot x_\\mathtt{cmd}\\end{bmatrix}\n",
"$$\n",
"\n",
"\n",
@ -2282,23 +2282,23 @@
"\n",
"Suppose we have a train or cart on a railway. It has a sensor attached to the wheels counting revolutions, which can be converted to a distance along the track. Then, suppose we have a GPS-like sensor which I'll call a 'position sensor' mounted to the train which reports position. I'll explain why I don't just use a GPS in the next section. Thus, we have two measurements, both reporting position along the track. Suppose further that the accuracy of the wheel sensor is 1m, and the accuracy of the position sensor is 10m. How do we combine these two measurements into one filter? This may seem quite contrived, but aircraft use sensor fusion to fuse the measurements from sensors such as a GPS, INS, Doppler radar, VOR, the airspeed indicator, and more.\n",
"\n",
"Kalman filters for inertial filters are very difficult, but fusing data from two or more sensors providing measurements of the same state variable (such as position) is quite easy. The relevant matrix is the measurement matrix $\\mathbf{H}$. Recall that this matrix tells us how to convert from the Kalman filter's state $\\mathbf{x}$ to a measurement $\\mathbf{z}$. Suppose that we decide that our Kalman filter state should contain the position and velocity of the train, so that\n",
"Kalman filters for inertial filters are very difficult, but fusing data from two or more sensors providing measurements of the same state variable (such as position) is quite easy. The relevant matrix is the measurement matrix $\\mathbf H$. Recall that this matrix tells us how to convert from the Kalman filter's state $\\mathbf x$ to a measurement $\\mathbf z$. Suppose that we decide that our Kalman filter state should contain the position and velocity of the train, so that\n",
"\n",
"$$ \\mathbf{x} = \\begin{bmatrix}x \\\\ \\dot{x}\\end{bmatrix}$$\n",
"$$ \\mathbf x = \\begin{bmatrix}x \\\\ \\dot x\\end{bmatrix}$$\n",
"\n",
"We have two measurements for position, so we will define the measurement vector to be a vector of the measurements from the wheel and the position sensor.\n",
"\n",
"$$ \\mathbf{z} = \\begin{bmatrix}x_{wheel} \\\\ x_{ps}\\end{bmatrix}$$\n",
"$$ \\mathbf z = \\begin{bmatrix}x_{wheel} \\\\ x_{ps}\\end{bmatrix}$$\n",
"\n",
"So we have to design the matrix $\\mathbf{H}$ to convert $\\mathbf{x}$ to $\\mathbf{z}$ . They are both positions, so the conversion is nothing more than multiplying by one:\n",
"So we have to design the matrix $\\mathbf H$ to convert $\\mathbf x$ to $\\mathbf z$ . They are both positions, so the conversion is nothing more than multiplying by one:\n",
"\n",
"$$ \\begin{bmatrix}x_{wheel} \\\\ x_{ps}\\end{bmatrix} = \\begin{bmatrix}1 &0 \\\\ 1& 0\\end{bmatrix} \\begin{bmatrix}x \\\\ \\dot{x}\\end{bmatrix}$$\n",
"$$ \\begin{bmatrix}x_{wheel} \\\\ x_{ps}\\end{bmatrix} = \\begin{bmatrix}1 &0 \\\\ 1& 0\\end{bmatrix} \\begin{bmatrix}x \\\\ \\dot x\\end{bmatrix}$$\n",
"\n",
"To make it clearer, suppose that the wheel reports not position but the number of rotations of the wheels, where 1 revolution yields 2 meters of travel. In that case we would write\n",
"\n",
"$$ \\begin{bmatrix}x_{wheel} \\\\ x_{ps}\\end{bmatrix} = \\begin{bmatrix}0.5 &0 \\\\ 1& 0\\end{bmatrix} \\begin{bmatrix}x \\\\ \\dot{x}\\end{bmatrix}$$\n",
"$$ \\begin{bmatrix}x_{wheel} \\\\ x_{ps}\\end{bmatrix} = \\begin{bmatrix}0.5 &0 \\\\ 1& 0\\end{bmatrix} \\begin{bmatrix}x \\\\ \\dot x\\end{bmatrix}$$\n",
"\n",
"Now we have to design the measurement noise matrix $\\mathbf{R}$. Suppose that the measurement variance for the position is twice the variance of the wheel, and the standard deviation of the wheel is 1.5 meters. That gives us\n",
"Now we have to design the measurement noise matrix $\\mathbf R$. Suppose that the measurement variance for the position is twice the variance of the wheel, and the standard deviation of the wheel is 1.5 meters. That gives us\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
@ -2309,7 +2309,7 @@
"\\end{aligned}\n",
"$$\n",
"\n",
"That is pretty much our Kalman filter design. We need to design for $\\mathbf{Q}$, but that is invariant to whether we are doing sensor fusion or not, so I will just choose some arbitrary value.\n",
"That is pretty much our Kalman filter design. We need to design for $\\mathbf Q$, but that is invariant to whether we are doing sensor fusion or not, so I will just choose some arbitrary value.\n",
"\n",
"So let's run a simulation of this design. I will assume a velocity of 10 m/s with an update rate of 0.1 seconds."
]
@ -2574,7 +2574,7 @@
"\n",
"This is a difficult problem that hobbyists face when trying to integrate GPS, IMU's and other off the shelf sensors. \n",
"\n",
"Let's look at the effect. A commercial GPS reports position, and an estimated error range. The estimated error just comes from the Kalman filter's $\\mathbf{P}$ matrix. So let's filter some noisy data, take the filtered output as the new noisy input to the filter, and see what the result is. In other words, $\\mathbf{x}$ will supply the $\\mathbf{z}$ input, and $\\mathbf{P}$ will supply the measurement covariance $\\mathbf{R}$. To exaggerate the effects somewhat to make them more obvious I will plot the effects of doing this one, and then a second time. The second iteration doesn't make any 'sense' (no one would try that), it just helps me illustrate a point. First, the code and plots."
"Let's look at the effect. A commercial GPS reports position, and an estimated error range. The estimated error just comes from the Kalman filter's $\\mathbf P$ matrix. So let's filter some noisy data, take the filtered output as the new noisy input to the filter, and see what the result is. In other words, $\\mathbf x$ will supply the $\\mathbf z$ input, and $\\mathbf P$ will supply the measurement covariance $\\mathbf R$. To exaggerate the effects somewhat to make them more obvious I will plot the effects of doing this one, and then a second time. The second iteration doesn't make any 'sense' (no one would try that), it just helps me illustrate a point. First, the code and plots."
]
},
{
@ -2654,7 +2654,7 @@
"source": [
"We see that the filtered output of the reprocessed signal is smoother, but it also diverges from the track. What is happening? Recall that the Kalman filter requires that the signal not be time correlated. However the output of the Kalman filter *is* time correlated because it incorporates all previous measurements into its estimate for this time period. So look at the last graph, for 2 iterations. The measurements start with several peaks that are larger than the track. This is 'remembered' (that is vague terminology, but I am trying to avoid the math) by the filter, and it has started to compute that the object is above the track. Later, at around 13 seconds we have a period where the measurements all happen to be below the track. This also gets incorporated into the memory of the filter, and the iterated output diverges far below the track. \n",
"\n",
"Now let's look at this in a different way. The iterated output is *not* using $\\mathbf{z}$ as the measurement, but the output of the previous Kalman filter estimate. So I will plot the output of the filter against the previous filter's output."
"Now let's look at this in a different way. The iterated output is *not* using $\\mathbf z$ as the measurement, but the output of the previous Kalman filter estimate. So I will plot the output of the filter against the previous filter's output."
]
},
{
@ -2846,7 +2846,7 @@
"So far we have assumed that the various matrices in the Kalman filter are *stationary* - nonchanging over time. For example, in the robot tracking section we assumed that $\\Delta t = 1.0$ seconds, and designed the state transition matrix to be\n",
"\n",
"$$\n",
"\\mathbf{F} = \\begin{bmatrix}1& \\Delta t& 0& 0\\\\0& 1& 0& 0\\\\0& 0& 1& \\Delta t\\\\ 0& 0& 0& 1\\end{bmatrix} = \\begin{bmatrix}1& 1& 0& 0\\\\0& 1& 0& 0\\\\0& 0& 1& 1\\\\ 0& 0& 0& 1\\end{bmatrix}$$\n",
"\\mathbf F = \\begin{bmatrix}1& \\Delta t& 0& 0\\\\0& 1& 0& 0\\\\0& 0& 1& \\Delta t\\\\ 0& 0& 0& 1\\end{bmatrix} = \\begin{bmatrix}1& 1& 0& 0\\\\0& 1& 0& 0\\\\0& 0& 1& 1\\\\ 0& 0& 0& 1\\end{bmatrix}$$\n",
"\n",
"But what if our data rate changes in some unpredictable manner? Or what if we have two sensors, each running at a different rate? What if the error of the measurement changes? \n",
"\n",
@ -2854,8 +2854,8 @@
"\n",
"\n",
"$$\\begin{aligned}\n",
"\\mathbf{\\bar{x}} &= {\\begin{bmatrix}x\\\\\\dot{x}\\end{bmatrix}}^- \\\\\n",
"\\mathbf{F} &= \\begin{bmatrix}1&\\Delta t \\\\ 0&1\\end{bmatrix} \n",
"\\mathbf{\\overline x} &= {\\begin{bmatrix}x\\\\\\dot x\\end{bmatrix}}^- \\\\\n",
"\\mathbf F &= \\begin{bmatrix}1&\\Delta t \\\\ 0&1\\end{bmatrix} \n",
"\\end{aligned}$$\n",
"\n",
"and set the Kalman filter variable `F` during initialization like so:\n",
@ -2927,27 +2927,27 @@
"source": [
"We can do this by setting the data rate to 0.25 seconds, which is 4 Hz. As we loop, on every iteration we call `update()` with only the wheel measurement. Then, every fourth time we will call `update()` with both the wheel and PS measurements. \n",
"\n",
"This means that we vary the amount of data in the z parameter. The matrices associated with the measurement are $\\mathbf{H}$ and $\\mathbf{R}$. In the code above we designed H to be \n",
"This means that we vary the amount of data in the z parameter. The matrices associated with the measurement are $\\mathbf H$ and $\\mathbf R$. In the code above we designed H to be \n",
"\n",
"$$\\mathbf{H} = \\begin{bmatrix}1 &0 \\\\ 1& 0\\end{bmatrix}$$\n",
"$$\\mathbf H = \\begin{bmatrix}1 &0 \\\\ 1& 0\\end{bmatrix}$$\n",
"\n",
"to account for the two measurements of position. When only the wheel reading is available, we must set\n",
"\n",
"$$\\mathbf{H} = \\begin{bmatrix}1 &0\\end{bmatrix}.$$\n",
"$$\\mathbf H = \\begin{bmatrix}1 &0\\end{bmatrix}.$$\n",
"\n",
"The matrix $\\mathbf{R}$ specifies the noise in each measurement. In the code above we set\n",
"The matrix $\\mathbf R$ specifies the noise in each measurement. In the code above we set\n",
"\n",
"$$\\mathbf{R} = \\begin{bmatrix}\\sigma_{wheel}^2 &0 \\\\ 0 & \\sigma_{ps}^2\\end{bmatrix}$$\n",
"$$\\mathbf R = \\begin{bmatrix}\\sigma_{wheel}^2 &0 \\\\ 0 & \\sigma_{ps}^2\\end{bmatrix}$$\n",
"\n",
"When only the wheel measurement is available, we must set\n",
"\n",
"$$\\mathbf{R} = \\begin{bmatrix}\\sigma_{wheel}^2\\end{bmatrix}$$\n",
"$$\\mathbf R = \\begin{bmatrix}\\sigma_{wheel}^2\\end{bmatrix}$$\n",
"\n",
"The two matrices that incorporate time are $\\mathbf{F}$ and $\\mathbf{Q}$. For example,\n",
"The two matrices that incorporate time are $\\mathbf F$ and $\\mathbf Q$. For example,\n",
"\n",
"$$\\mathbf{F} = \\begin{bmatrix}1 & \\Delta t \\\\ 0& 1\\end{bmatrix}.$$\n",
"$$\\mathbf F = \\begin{bmatrix}1 & \\Delta t \\\\ 0& 1\\end{bmatrix}.$$\n",
"\n",
"Since the wheel and position sensor reading coincide once every 4 readings we can just set $\\Delta t =0.25$ and not modify it while filtering. If the readings did not coincide in each iteration you would have to calculate how much time has passed since the last predict, compute a new $\\mathbf{F}$ and $\\mathbf{Q}$, and then call `predict()` so the filter could make a correct prediction based on the time step required."
"Since the wheel and position sensor reading coincide once every 4 readings we can just set $\\Delta t =0.25$ and not modify it while filtering. If the readings did not coincide in each iteration you would have to calculate how much time has passed since the last predict, compute a new $\\mathbf F$ and $\\mathbf Q$, and then call `predict()` so the filter could make a correct prediction based on the time step required."
]
},
{
@ -3204,7 +3204,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We might think to use the same state variables as used for tracking the dog. However, this will not work. Recall that the Kalman filter state transition must be written as $\\mathbf{\\bar{x}} = \\mathbf{Fx} + \\mathbf{Bu}$, which means we must calculate the current state from the previous state. Our assumption is that the ball is traveling in a vacuum, so the velocity in x is a constant, and the acceleration in y is solely due to the gravitational constant $g$. We can discretize the Newtonian equations using the well known Euler method in terms of $\\Delta t$ are:\n",
"We might think to use the same state variables as used for tracking the dog. However, this will not work. Recall that the Kalman filter state transition must be written as $\\mathbf{\\overline x} = \\mathbf{Fx} + \\mathbf{Bu}$, which means we must calculate the current state from the previous state. Our assumption is that the ball is traveling in a vacuum, so the velocity in x is a constant, and the acceleration in y is solely due to the gravitational constant $g$. We can discretize the Newtonian equations using the well known Euler method in terms of $\\Delta t$ are:\n",
"\n",
"$$\\begin{aligned}\n",
"x_t &= x_{t-1} + v_{x(t-1)} {\\Delta t} \\\\\n",
@ -3215,32 +3215,32 @@
"\\end{aligned}\n",
"$$\n",
"\n",
"> **sidebar**: *Euler's method integrates a differential equation stepwise by assuming the slope (derivative) is constant at time $t$. In this case the derivative of the position is velocity. At each time step $\\Delta t$ we assume a constant velocity, compute the new position, and then update the velocity for the next time step. There are more accurate methods, such as Runge-Kutta available to us, but because we are updating the state with a measurement in each step Euler's method is very accurate.* If you need to use Runge-Kutta you will have to write your own `predict()` function which computes the state transition for $\\mathbf{x}$, and then uses the normal Kalman filter equation $\\mathbf{\\bar{P}}=\\mathbf{FPF}^\\mathsf{T} + \\mathbf{Q}$ to update the covariance matrix.\n",
"> **sidebar**: *Euler's method integrates a differential equation stepwise by assuming the slope (derivative) is constant at time $t$. In this case the derivative of the position is velocity. At each time step $\\Delta t$ we assume a constant velocity, compute the new position, and then update the velocity for the next time step. There are more accurate methods, such as Runge-Kutta available to us, but because we are updating the state with a measurement in each step Euler's method is very accurate.* If you need to use Runge-Kutta you will have to write your own `predict()` function which computes the state transition for $\\mathbf x$, and then uses the normal Kalman filter equation $\\mathbf{\\overline{P}}=\\mathbf{FPF}^\\mathsf T + \\mathbf Q$ to update the covariance matrix.\n",
"\n",
"This implies that we need to incorporate acceleration for $y$ into the Kalman filter, but not for $x$. This suggests the following state variables.\n",
"\n",
"$$\n",
"\\mathbf{x} = \n",
"\\mathbf x = \n",
"\\begin{bmatrix}\n",
"x \\\\\n",
"\\dot{x} \\\\\n",
"\\dot x \\\\\n",
"y \\\\\n",
"\\dot{y} \\\\\n",
"\\dot y \\\\\n",
"\\ddot{y}\n",
"\\end{bmatrix}\n",
"$$\n",
"\n",
"However, the acceleration is due to gravity, which is a constant. Instead of asking the Kalman filter to track a constant we can treat gravity as what it really is - a control input. In other words, gravity is a force that alters the behavior of the system in a known way, and it is applied throughout the flight of the ball. \n",
"\n",
"The equation for the state prediction is $\\mathbf{\\bar{x}} = \\mathbf{Fx} + \\mathbf{Bu}$. $\\mathbf{Fx}$ is the familiar state transition function which we will use to model the position and velocity of the ball. The vector $\\mathbf{u}$ lets you specify a control input into the filter. For a car the control input will be things such as the amount the accelerator and brake are pressed, the position of the steering wheel, and so on. For our ball the control input will be gravity. The matrix $\\mathbf{B}$ models how the control inputs affect the behavior of the system. Again, for a car $\\mathbf{B}$ will convert the inputs of the brake and accelerator into changes of velocity, and the input of the steering wheel into a different position and heading. For our ball tracking problem it will compute the velocity change due to gravity. We will go into the details of that in step 3. For now, we design the state variable to be\n",
"The equation for the state prediction is $\\mathbf{\\overline x} = \\mathbf{Fx} + \\mathbf{Bu}$. $\\mathbf{Fx}$ is the familiar state transition function which we will use to model the position and velocity of the ball. The vector $\\mathbf{u}$ lets you specify a control input into the filter. For a car the control input will be things such as the amount the accelerator and brake are pressed, the position of the steering wheel, and so on. For our ball the control input will be gravity. The matrix $\\mathbf{B}$ models how the control inputs affect the behavior of the system. Again, for a car $\\mathbf{B}$ will convert the inputs of the brake and accelerator into changes of velocity, and the input of the steering wheel into a different position and heading. For our ball tracking problem it will compute the velocity change due to gravity. We will go into the details of that in step 3. For now, we design the state variable to be\n",
"\n",
"$$\n",
"\\mathbf{x} = \n",
"\\mathbf x = \n",
"\\begin{bmatrix}\n",
"x \\\\\n",
"\\dot{x} \\\\\n",
"\\dot x \\\\\n",
"y \\\\\n",
"\\dot{y} \n",
"\\dot y \n",
"\\end{bmatrix}\n",
"$$"
]
@ -3256,7 +3256,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Our next step is to design the state transition function. Recall that the state transistion function is implemented as a matrix $\\mathbf{F}$ that we multiply with the previous state of our system to get the next state $\\mathbf{x}' = \\mathbf{Fx}$.\n",
"Our next step is to design the state transition function. Recall that the state transistion function is implemented as a matrix $\\mathbf F$ that we multiply with the previous state of our system to get the next state $\\mathbf x' = \\mathbf{Fx}$.\n",
"\n",
"I will not belabor this as it is very similar to the 1-D case we did in the previous chapter. Our state equations for position and velocity would be:\n",
"\n",
@ -3273,7 +3273,7 @@
"In matrix form we write this as:\n",
"\n",
"$$\n",
"\\mathbf{F} = \\begin{bmatrix}\n",
"\\mathbf F = \\begin{bmatrix}\n",
"1 & \\Delta t & 0 & 0 \\\\\n",
"0 & 1 & 0 & 0 \\\\\n",
"0 & 0 & 1 & \\Delta t \\\\\n",
@ -3293,7 +3293,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We will use the control input to account for the force of gravity. The term $\\mathbf{Bu}$ is added to $\\mathbf{\\bar{x}}$ to account for how much $\\mathbf{\\bar{x}}$ changes due to gravity. We can say that $\\mathbf{Bu}$ contains $\\begin{bmatrix}\\Delta x_g & \\Delta \\dot{x_g} & \\Delta y_g & \\Delta \\dot{y_g}\\end{bmatrix}^\\mathsf{T}$.\n",
"We will use the control input to account for the force of gravity. The term $\\mathbf{Bu}$ is added to $\\mathbf{\\overline x}$ to account for how much $\\mathbf{\\overline x}$ changes due to gravity. We can say that $\\mathbf{Bu}$ contains $\\begin{bmatrix}\\Delta x_g & \\Delta \\dot{x_g} & \\Delta y_g & \\Delta \\dot{y_g}\\end{bmatrix}^\\mathsf T$.\n",
"\n",
"If we look at the discretized equations we see that gravity only affect the velocity for $y$.\n",
"\n",
@ -3306,7 +3306,7 @@
"\\end{aligned}\n",
"$$\n",
"\n",
"Therefore we want the product $\\mathbf{Bu}$ to equal $\\begin{bmatrix}0 & 0 & 0 & -g \\Delta t \\end{bmatrix}^\\mathsf{T}$. In some sense it is arbitrary how we define $\\mathbf{B}$ and $\\mathbf{u}$ so long as multiplying them together yields this result. For example, we could define $\\mathbf{B}=1$ and $\\mathbf{u} = \\begin{bmatrix}0 & 0 & 0 & -g \\Delta t \\end{bmatrix}^\\mathsf{T}$. But this doesn't really fit with our definitions for $\\mathbf{B}$ and $\\mathbf{u}$, where $\\mathbf{u}$ is the control input, and $\\mathbf{B}$ is the control function. The control input is $-g$ for the velocity of y. So this is one possible definition.\n",
"Therefore we want the product $\\mathbf{Bu}$ to equal $\\begin{bmatrix}0 & 0 & 0 & -g \\Delta t \\end{bmatrix}^\\mathsf T$. In some sense it is arbitrary how we define $\\mathbf{B}$ and $\\mathbf{u}$ so long as multiplying them together yields this result. For example, we could define $\\mathbf{B}=1$ and $\\mathbf{u} = \\begin{bmatrix}0 & 0 & 0 & -g \\Delta t \\end{bmatrix}^\\mathsf T$. But this doesn't really fit with our definitions for $\\mathbf{B}$ and $\\mathbf{u}$, where $\\mathbf{u}$ is the control input, and $\\mathbf{B}$ is the control function. The control input is $-g$ for the velocity of y. So this is one possible definition.\n",
"\n",
"$$\\mathbf{B} = \\begin{bmatrix}0&0&0&0 \\\\ 0&0&0&0 \\\\0&0&0&0 \\\\0&0&0&\\Delta t\\end{bmatrix}, \\mathbf{u} = \\begin{bmatrix}0\\\\0\\\\0\\\\-g\\end{bmatrix}$$\n",
"\n",
@ -3335,7 +3335,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The measurement function defines how we go from the state variables to the measurements using the equation $\\mathbf{z} = \\mathbf{Hx}$. We will assume that we have a sensor that provides us with the position of the ball in (x,y), but cannot measure velocities or accelerations. Therefore our function must be:\n",
"The measurement function defines how we go from the state variables to the measurements using the equation $\\mathbf z = \\mathbf{Hx}$. We will assume that we have a sensor that provides us with the position of the ball in (x,y), but cannot measure velocities or accelerations. Therefore our function must be:\n",
"\n",
"$$\n",
"\\begin{bmatrix}z_x \\\\ z_y \\end{bmatrix}= \n",
@ -3345,13 +3345,13 @@
"\\end{bmatrix} * \n",
"\\begin{bmatrix}\n",
"x \\\\\n",
"\\dot{x} \\\\\n",
"\\dot x \\\\\n",
"y \\\\\n",
"\\dot{y} \\end{bmatrix}$$\n",
"\\dot y \\end{bmatrix}$$\n",
"\n",
"where\n",
"\n",
"$$\\mathbf{H} = \\begin{bmatrix}\n",
"$$\\mathbf H = \\begin{bmatrix}\n",
"1 & 0 & 0 & 0 \\\\\n",
"0 & 0 & 1 & 0\n",
"\\end{bmatrix}$$"
@ -3370,7 +3370,7 @@
"source": [
"As with the robot, we will assume that the error is independent in $x$ and $y$. In this case we will start by assuming that the measurement error in x and y are 0.5 meters. Hence,\n",
"\n",
"$$\\mathbf{R} = \\begin{bmatrix}0.5&0\\\\0&0.5\\end{bmatrix}$$"
"$$\\mathbf R = \\begin{bmatrix}0.5&0\\\\0&0.5\\end{bmatrix}$$"
]
},
{
@ -3386,7 +3386,7 @@
"source": [
"We are assuming a ball moving in a vacuum, so there should be no process noise. We have 4 state variables, so we need a $4{\\times}4$ covariance matrix:\n",
"\n",
"$$\\mathbf{Q} = \\begin{bmatrix}0&0&0&0\\\\0&0&0&0\\\\0&0&0&0\\\\0&0&0&0\\end{bmatrix}$$"
"$$\\mathbf Q = \\begin{bmatrix}0&0&0&0\\\\0&0&0&0\\\\0&0&0&0\\\\0&0&0&0\\end{bmatrix}$$"
]
},
{
@ -3400,7 +3400,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We already performed this step when we tested the state transition function. Recall that we computed the initial velocity for $x$ and $y$ using trigonometry, and set the value of $\\mathbf{x}$ with:\n",
"We already performed this step when we tested the state transition function. Recall that we computed the initial velocity for $x$ and $y$ using trigonometry, and set the value of $\\mathbf x$ with:\n",
"\n",
"```python\n",
"omega = radians(omega)\n",
@ -3938,7 +3938,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.0"
"version": "3.5.1"
}
},
"nbformat": 4,

View File

@ -705,7 +705,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we can present the entire Unscented Kalman filter algorithm. Assume that there is a function $f(x, dt)$ that performs the state transition for our filter - it predicts the next state given the current state. It is the analogue of the $\\mathbf{F}$ matrix of the Kalman filter. Also assume there is a measurement function $h(x)$. It takes the current state and computes what measurement that state corresponds to. It is the analogue of the $\\mathbf{H}$ matrix used by the Kalman filter."
"Now we can present the entire Unscented Kalman filter algorithm. Assume that there is a function $f(x, dt)$ that performs the state transition for our filter - it predicts the next state given the current state. It is the analogue of the $\\mathbf F$ matrix of the Kalman filter. Also assume there is a measurement function $h(x)$. It takes the current state and computes what measurement that state corresponds to. It is the analogue of the $\\mathbf H$ matrix used by the Kalman filter."
]
},
{
@ -741,7 +741,7 @@
"\n",
"$$\\begin{aligned}\n",
"\\mathbf{\\bar{\\mu}} &= \\sum w^m\\boldsymbol{\\mathcal{Y}} \\\\\n",
"\\mathbf{\\bar{\\Sigma}} &= \\sum w^c({\\boldsymbol{\\mathcal{Y}}-\\bf{\\bar{\\mu}})(\\boldsymbol{\\mathcal{Y}}-\\bf{\\bar{\\mu}})^\\mathsf{T}} + \\mathbf{Q}\n",
"\\mathbf{\\bar{\\Sigma}} &= \\sum w^c({\\boldsymbol{\\mathcal{Y}}-\\bf{\\bar{\\mu}})(\\boldsymbol{\\mathcal{Y}}-\\bf{\\bar{\\mu}})^\\mathsf{T}} + \\mathbf Q\n",
"\\end{aligned}\n",
"$$\n",
"\n",
@ -751,7 +751,7 @@
"\\mathrm{Kalman} & \\mathrm{Unscented} \\\\\n",
"\\hline \n",
"\\mathbf{\\bar{x}} = \\mathbf{Fx} & \\mathbf{\\bar{\\mu}} = \\sum w^m\\boldsymbol{\\mathcal{Y}} \\\\\n",
"\\mathbf{\\bar{P}} = \\mathbf{FPF}^\\mathsf{T}+\\mathbf{Q} & \\sum w^c({\\boldsymbol{\\mathcal{Y}}-\\bf{\\bar{\\mu}})(\\boldsymbol{\\mathcal{Y}}-\\bf{\\bar{\\mu}})^\\mathsf{T}}+\\mathbf{Q}\n",
"\\mathbf{\\bar{P}} = \\mathbf{FPF}^\\mathsf{T}+\\mathbf Q & \\sum w^c({\\boldsymbol{\\mathcal{Y}}-\\bf{\\bar{\\mu}})(\\boldsymbol{\\mathcal{Y}}-\\bf{\\bar{\\mu}})^\\mathsf{T}}+\\mathbf Q\n",
"\\end{array}$$\n",
"\n"
]
@ -775,7 +775,7 @@
"\n",
"$$\\begin{aligned}\n",
"\\mathbf{\\mu}_z &= \\sum w^m\\boldsymbol{\\mathcal{Z}} \\\\\n",
"\\mathbf{P}_z &= \\sum w^c{(\\boldsymbol{\\mathcal{Z}}-\\bar{\\mu})(\\boldsymbol{\\mathcal{Z}}-\\bar{\\mu})^\\mathsf{T}} + \\mathbf{R}\n",
"\\mathbf P_z &= \\sum w^c{(\\boldsymbol{\\mathcal{Z}}-\\bar{\\mu})(\\boldsymbol{\\mathcal{Z}}-\\bar{\\mu})^\\mathsf{T}} + \\mathbf R\n",
"\\end{aligned}\n",
"$$\n",
"\n",
@ -787,24 +787,24 @@
"\n",
"To compute the Kalman gain we first compute the [cross covariance](https://en.wikipedia.org/wiki/Cross-covariance) of the state and the measurements, which is defined as: \n",
"\n",
"$$\\mathbf{P}_{xz} =\\sum w^c(\\boldsymbol{\\chi}-\\mu)(\\boldsymbol{\\mathcal{Z}}-\\mathbf{\\mu}_z)^\\mathsf{T}$$\n",
"$$\\mathbf P_{xz} =\\sum w^c(\\boldsymbol{\\chi}-\\mu)(\\boldsymbol{\\mathcal{Z}}-\\mathbf{\\mu}_z)^\\mathsf{T}$$\n",
"\n",
"And then the Kalman gain is defined as\n",
"\n",
"$$\\mathbf{K} = \\mathbf{P}_{xz} \\mathbf{P}_z^{-1}$$\n",
"$$\\mathbf{K} = \\mathbf P_{xz} \\mathbf P_z^{-1}$$\n",
"\n",
"If you think of the inverse as a *kind of* matrix division, you can see that the Kalman gain is a simple ratio which computes:\n",
"\n",
"$$\\mathbf{K} \\approx \\frac{\\mathbf{P}_{xz}}{\\mathbf{P}_z^{-1}} \n",
"$$\\mathbf{K} \\approx \\frac{\\mathbf P_{xz}}{\\mathbf P_z^{-1}} \n",
"\\approx \\frac{\\text{belief in state}}{\\text{belief in measurement}}$$\n",
"\n",
"Finally, we compute the new state estimate using the residual and Kalman gain:\n",
"\n",
"$$\\mathbf{x} = \\mathbf{\\bar{x}} + \\mathbf{Ky}$$\n",
"$$\\mathbf x = \\mathbf{\\bar{x}} + \\mathbf{Ky}$$\n",
"\n",
"and the new covariance is computed as:\n",
"\n",
"$$ \\mathbf{P} = \\mathbf{\\bar{\\Sigma}} - \\mathbf{KP_z}\\mathbf{K}^\\mathsf{T}$$\n",
"$$ \\mathbf P = \\mathbf{\\bar{\\Sigma}} - \\mathbf{KP_z}\\mathbf{K}^\\mathsf{T}$$\n",
"\n",
"This step contains a few equations you have to take on faith, but you should be able to see how they relate to the linear Kalman filter equations. We convert the mean and covariance into measurement space, add the measurement error into the measurement covariance, compute the residual and kalman gain, compute the new state estimate as the old estimate plus the residual times the Kalman gain, and adjust the covariance for the information provided by the measurement. The linear algebra is slightly different from the linear Kalman filter, but the algorithm is the same Bayesian algorithm we have been implementing throughout the book. \n",
"\n",
@ -819,16 +819,16 @@
"\\textrm{Kalman Filter} & \\textrm{Unscented Kalman Filter} \\\\\n",
"\\hline \n",
"\\mathbf{\\bar{x}} = \\mathbf{Fx} & \\mathbf{\\bar{\\mu}} = \\sum w^m\\boldsymbol{\\mathcal{Y}} \\\\\n",
"\\mathbf{\\bar{P}} = \\mathbf{FPF}^\\mathsf{T}+\\mathbf{Q} & \\mathbf{\\bar{P}} = \\mathbf{FPF}^\\mathsf{T}+\\mathbf{Q} \\\\\n",
"\\mathbf{\\bar{P}} = \\mathbf{FPF}^\\mathsf{T}+\\mathbf Q & \\mathbf{\\bar{P}} = \\mathbf{FPF}^\\mathsf{T}+\\mathbf Q \\\\\n",
"\\hline \n",
"\\mathbf{y} = \\boldsymbol{\\mathbf{z}} - \\mathbf{H\\bar{x}} &\n",
"\\mathbf{y} = \\mathbf{z} - \\sum w^m h(\\boldsymbol{\\mathcal{Y}})\\\\\n",
"\\mathbf{S} = \\mathbf{H\\bar{P}H}^\\mathsf{T} + \\mathbf{R} & \n",
"\\mathbf{P}_z = \\sum w^c{(\\boldsymbol{\\mathcal{Z}}-\\bar{\\mu})(\\boldsymbol{\\mathcal{Z}}-\\bar{\\mu})^\\mathsf{T}} + \\mathbf{R} \\\\ \n",
"\\mathbf{S} = \\mathbf{H\\bar{P}H}^\\mathsf{T} + \\mathbf R & \n",
"\\mathbf P_z = \\sum w^c{(\\boldsymbol{\\mathcal{Z}}-\\bar{\\mu})(\\boldsymbol{\\mathcal{Z}}-\\bar{\\mu})^\\mathsf{T}} + \\mathbf R \\\\ \n",
"\\mathbf{K} = \\mathbf{\\bar{P}H}^\\mathsf{T} \\mathbf{S}^{-1} &\n",
"\\mathbf{K} = \\left[\\sum w^c(\\boldsymbol{\\chi}-\\mu)(\\boldsymbol{\\mathcal{Z}}-\\mathbf{\\mu}_z)^\\mathsf{T}\\right] \\mathbf{P}_z^{-1}\\\\\n",
"\\mathbf{x} = \\mathbf{\\bar{x}} + \\mathbf{Ky} & \\mathbf{x} = \\mathbf{\\bar{x}} + \\mathbf{Ky}\\\\\n",
"\\mathbf{P} = (\\mathbf{I}-\\mathbf{KH})\\mathbf{\\bar{P}} & \\mathbf{P} = \\mathbf{\\bar{\\Sigma}} - \\mathbf{KP_z}\\mathbf{K}^\\mathsf{T}\n",
"\\mathbf{K} = \\left[\\sum w^c(\\boldsymbol{\\chi}-\\mu)(\\boldsymbol{\\mathcal{Z}}-\\mathbf{\\mu}_z)^\\mathsf{T}\\right] \\mathbf P_z^{-1}\\\\\n",
"\\mathbf x = \\mathbf{\\bar{x}} + \\mathbf{Ky} & \\mathbf x = \\mathbf{\\bar{x}} + \\mathbf{Ky}\\\\\n",
"\\mathbf P = (\\mathbf{I}-\\mathbf{KH})\\mathbf{\\bar{P}} & \\mathbf P = \\mathbf{\\bar{\\Sigma}} - \\mathbf{KP_z}\\mathbf{K}^\\mathsf{T}\n",
"\\end{array}$$"
]
},
@ -896,7 +896,7 @@
"\n",
"$$ \\mathcal{X}_0 = \\mu$$\n",
"\n",
"For notational convenience we define $\\lambda = \\alpha^2(n+\\kappa)-n$, where $n$ is the dimension of $\\mathbf{x}$. Then the remaining sigma points are computed as\n",
"For notational convenience we define $\\lambda = \\alpha^2(n+\\kappa)-n$, where $n$ is the dimension of $\\mathbf x$. Then the remaining sigma points are computed as\n",
"\n",
"$$ \n",
"\\boldsymbol{\\chi}_i = \\begin{cases}\n",
@ -924,7 +924,7 @@
"\n",
"### Reasonable Choices for the Parameters\n",
"\n",
"$\\beta=2$ is a good choice for Gaussian problems, $\\kappa=3-n$ where $n$ is the dimension of $\\mathbf{x}$ is a good choice for $\\kappa$, and $0 \\le \\alpha \\le 1$ is an appropriate choice for $\\alpha$, where a larger value for $\\alpha$ spreads the sigma points further from the mean."
"$\\beta=2$ is a good choice for Gaussian problems, $\\kappa=3-n$ where $n$ is the dimension of $\\mathbf x$ is a good choice for $\\kappa$, and $0 \\le \\alpha \\le 1$ is an appropriate choice for $\\alpha$, where a larger value for $\\alpha$ spreads the sigma points further from the mean."
]
},
{
@ -944,11 +944,11 @@
"\n",
"To design a linear Kalman filter you need to design the $\\bf{x}$, $\\bf{F}$, $\\bf{H}$, $\\bf{R}$, and $\\bf{Q}$ matrices. We have done this many times already so let me present a design to you without a lot of comment. We want a constant velocity model, so we can define $\\bf{x}$ to be\n",
"\n",
"$$ \\mathbf{x} = \\begin{bmatrix}x & \\dot{x} & y & \\dot{y} \\end{bmatrix}^\\mathsf{T}$$\n",
"$$ \\mathbf x = \\begin{bmatrix}x & \\dot x & y & \\dot y \\end{bmatrix}^\\mathsf{T}$$\n",
"\n",
"With this ordering of state variables we can define our state transition model to be\n",
"\n",
"$$\\mathbf{F} = \\begin{bmatrix}1 & \\Delta t & 0 & 0 \\\\\n",
"$$\\mathbf F = \\begin{bmatrix}1 & \\Delta t & 0 & 0 \\\\\n",
"0&1&0&0 \\\\\n",
"0&0&1&\\Delta t\\\\\n",
"0&0&0&1\n",
@ -956,20 +956,20 @@
"\n",
"which implements the Newtonian equation\n",
"\n",
"$$x_k = x_{k-1} + \\dot{x}_{k-1}\\Delta t$$\n",
"$$x_k = x_{k-1} + \\dot x_{k-1}\\Delta t$$\n",
"\n",
"Our sensors provide position measurements but not velocity, so the measurement function is\n",
"\n",
"$$\\mathbf{H} = \\begin{bmatrix}1&0&0&0 \\\\ 0&0&1&0\n",
"$$\\mathbf H = \\begin{bmatrix}1&0&0&0 \\\\ 0&0&1&0\n",
"\\end{bmatrix}$$\n",
"\n",
"Let's say our sensor gives positions in meters with an error of $1\\sigma=0.3$ meters in both the *x* and *y* coordinates. This gives us a measurement noise matrix of \n",
"\n",
"$$\\mathbf{R} = \\begin{bmatrix}0.3^2 &0\\\\0 & 0.3^2\\end{bmatrix}$$\n",
"$$\\mathbf R = \\begin{bmatrix}0.3^2 &0\\\\0 & 0.3^2\\end{bmatrix}$$\n",
"\n",
"Finally, let's assume that the process noise can be represented by the discrete white noise model - that is, that over each time period the acceleration is constant. We can use `FilterPy`'s `Q_discrete_white_noise()` method to create this matrix for us, but for review the matrix is\n",
"\n",
"$$\\mathbf{Q} = \\begin{bmatrix}\n",
"$$\\mathbf Q = \\begin{bmatrix}\n",
"\\frac{1}{4}\\Delta t^4 & \\frac{1}{2}\\Delta t^3 \\\\\n",
"\\frac{1}{2}\\Delta t^3 & \\Delta t^2\\end{bmatrix} \\sigma^2$$\n",
"\n",
@ -1030,7 +1030,7 @@
"source": [
"This should hold no surprises for you. Now let's implement this filter as an Unscented Kalman filter. Again, this is purely for educational purposes; using a UKF for a linear filter confers no benefit.\n",
"\n",
"The equations of the UKF are implemented for you with the `FilterPy` class `UnscentedKalmanFilter`; all you have to do is specify the matrices and the nonlinear functions `f(x)` and `h(x)`. `f(x)` implements the state transition function that is implemented by the matrix $\\mathbf{F}$ in the linear filter, and `h(x)` implements the measurement function implemented with the matrix $\\mathbf{H}$. In nonlinear problems these functions are nonlinear, so we cannot use matrices to specify them.\n",
"The equations of the UKF are implemented for you with the `FilterPy` class `UnscentedKalmanFilter`; all you have to do is specify the matrices and the nonlinear functions `f(x)` and `h(x)`. `f(x)` implements the state transition function that is implemented by the matrix $\\mathbf F$ in the linear filter, and `h(x)` implements the measurement function implemented with the matrix $\\mathbf H$. In nonlinear problems these functions are nonlinear, so we cannot use matrices to specify them.\n",
"\n",
"For our linear problem we can define these functions to implement the linear equations. The function `f(x)` takes the signature `def f(x,dt)` and `h(x)` takes the signature `def h(x)`. Below is a reasonable implementation of these two functions. Each is expected to return a 1-D NumPy array containing the result."
]
@ -1150,7 +1150,7 @@
"source": [
"This gave me a standard deviation of 0.013 which is quite small. \n",
"\n",
"So far implementation of the UKF is not *that* different from the linear Kalman filter. Instead of implementing the state function and measurement function as the matrices $\\mathbf{F}$ and $\\mathbf{H}$ you supply functions `f(x)` and `h(x)`. The rest of the theory and implementation remains the same. Of course the `FilterPy` code for the UKF is quite different than the Kalman filter code, but from a designer's point of view the problem formulation and filter design is very similar."
"So far implementation of the UKF is not *that* different from the linear Kalman filter. Instead of implementing the state function and measurement function as the matrices $\\mathbf F$ and $\\mathbf H$ you supply functions `f(x)` and `h(x)`. The rest of the theory and implementation remains the same. Of course the `FilterPy` code for the UKF is quite different than the Kalman filter code, but from a designer's point of view the problem formulation and filter design is very similar."
]
},
{
@ -1201,7 +1201,7 @@
"source": [
"We will assume that the aircraft is flying at a constant altitude. This leads us to design a three variable state vector:\n",
"\n",
"$$\\mathbf{x} = \\begin{bmatrix}\\mathtt{distance} \\\\\\mathtt{velocity}\\\\ \\mathtt{altitude}\\end{bmatrix}= \\begin{bmatrix}x \\\\ \\dot{x}\\\\ y\\end{bmatrix}$$"
"$$\\mathbf x = \\begin{bmatrix}\\mathtt{distance} \\\\\\mathtt{velocity}\\\\ \\mathtt{altitude}\\end{bmatrix}= \\begin{bmatrix}x \\\\ \\dot x\\\\ y\\end{bmatrix}$$"
]
},
{
@ -1211,7 +1211,7 @@
"Our state transition function is linear \n",
"\n",
"$$\\mathbf{\\bar{x}} = \\begin{bmatrix} 1 & \\Delta t & 0 \\\\ 0& 1& 0 \\\\ 0&0&1\\end{bmatrix}\n",
"\\begin{bmatrix}x \\\\ \\dot{x}\\\\ y\\end{bmatrix}\n",
"\\begin{bmatrix}x \\\\ \\dot x\\\\ y\\end{bmatrix}\n",
"$$\n",
"\n",
"which we can implement very much like the previous problem."
@ -1342,9 +1342,9 @@
"source": [
"Now let's put it all together. A military grade radar system can achieve 1 meter RMS range accuracy, and 1 mrad RMS for bearing [1]. We will assume a more modest 5 meter range accuracy, and 0.5$^\\circ$ angular accuracy as this provides a more challenging data set for the filter.\n",
"\n",
"The design of $\\mathbf{Q}$ requires some discussion. The state $\\mathbf{x}$ contains $\\begin{bmatrix}x & \\dot{x} & y\\end{bmatrix}^\\mathtt{T}$. The first two elements are position (down range distance) and velocity, so we can use `Q_discrete_white_noise` noise to compute the values for the upper left hand side of Q. The third element of $\\mathbf{x}$ is altitude, which we are assuming is independent of the down range distance. That leads us to a block design of $\\mathbf{Q}$ of:\n",
"The design of $\\mathbf Q$ requires some discussion. The state $\\mathbf x$ contains $\\begin{bmatrix}x & \\dot x & y\\end{bmatrix}^\\mathtt{T}$. The first two elements are position (down range distance) and velocity, so we can use `Q_discrete_white_noise` noise to compute the values for the upper left hand side of Q. The third element of $\\mathbf x$ is altitude, which we are assuming is independent of the down range distance. That leads us to a block design of $\\mathbf Q$ of:\n",
"\n",
"$$\\mathbf{Q} = \\begin{bmatrix}\\mathbf{Q}_\\mathtt{x} & 0 \\\\ 0 & Q_\\mathtt{y}\\end{bmatrix}$$\n",
"$$\\mathbf Q = \\begin{bmatrix}\\mathbf Q_\\mathtt{x} & 0 \\\\ 0 & Q_\\mathtt{y}\\end{bmatrix}$$\n",
"\n",
"I'll start with the aircraft positioned directly over the radar station flying at 100 m/s. A typical radar might update only once every 12 seconds and so we will use that for our update period. "
]
@ -1509,12 +1509,12 @@
"I hope you answered add climb rate to the state, like so:\n",
"\n",
"\n",
"$$\\mathbf{x} = \\begin{bmatrix}\\mathtt{distance} \\\\\\mathtt{velocity}\\\\ \\mathtt{altitude} \\\\ \\mathtt{climb\\, rate}\\end{bmatrix}= \\begin{bmatrix}x \\\\\\dot{x}\\\\ y \\\\ \\dot{y}\\end{bmatrix}$$\n",
"$$\\mathbf x = \\begin{bmatrix}\\mathtt{distance} \\\\\\mathtt{velocity}\\\\ \\mathtt{altitude} \\\\ \\mathtt{climb\\, rate}\\end{bmatrix}= \\begin{bmatrix}x \\\\\\dot x\\\\ y \\\\ \\dot y\\end{bmatrix}$$\n",
"\n",
"This requires the following change to the state transition function, which is still linear.\n",
"\n",
"$$\\mathbf{\\bar{x}} = \\begin{bmatrix} 1 & \\Delta t & 0 &0 \\\\ 0& 1& 0 &0\\\\ 0&0&1&\\Delta t \\\\ 0&0&0&1\\end{bmatrix}\n",
"\\begin{bmatrix}x \\\\\\dot{x}\\\\ y\\\\ \\dot{y}\\end{bmatrix} \n",
"\\begin{bmatrix}x \\\\\\dot x\\\\ y\\\\ \\dot y\\end{bmatrix} \n",
"$$\n",
"\n",
"The measurement function stays the same, but we will have to alter Q to account for the state dimensionality change."
@ -1679,7 +1679,7 @@
"source": [
"To include the doppler we need to add the velocity in $x$ and $y$ into the measurement. The `ACSim` class stores the velocity in the data member `vel`. To perform the Kalman filter update we just need to call `update` with a list containing the bearing, distance, and velocity in $x$ and $y$:\n",
"\n",
"$$z = [slant\\_range, bearing, \\dot{x}, \\dot{y}]$$\n",
"$$z = [slant\\_range, bearing, \\dot x, \\dot y]$$\n",
"\n",
"The measurement contains four values so the measurement function also needs to return four values. The slant range and bearing will be computed as before, but we do not need to compute the velocity in $x$ and $y$ as they are provided in the state estimate."
]
@ -2069,7 +2069,7 @@
"source": [
"Each individual measurement has a very large position error. However, a plot of successive measurements shows a clear trend - the target is obviously moving towards the upper right. When a Kalman filter computes the Kalman gain it takes the distribution of errors into account by using the measurement function. In this example the error lies on an approximately $45^\\circ$ degree line, so the filter will discount errors in that direction. On the other hand, there is almost no error in measurement orthogonal to that, and again the Kalman gain will be taking that into account. The end result is the track can be computed even with this type of noise distribution. This graph makes it look easy because we have plotted 100 possible measurements for each position update. This makes the aircraft's movement obvious. In contrast, the Kalman filter only gets one measurement per update. Therefore the filter will not be able to generate as good a fit as the dotted green line implies. \n",
"\n",
"The next interaction we must think about is the fact that the bearing gives us no distance information. Suppose we set the intial position to be 1,000 kilometers away from the sensor (vs the actual distance of 7.07 km) and make $\\mathbf{P}$ very small. At that distance a $1^\\circ$ error translates into a positional error of 17.5 km. The KF would never be able to converge onto the actual target position because the filter is incorrectly very certain about its position estimates and because there is no distance information provided in the measurements."
"The next interaction we must think about is the fact that the bearing gives us no distance information. Suppose we set the intial position to be 1,000 kilometers away from the sensor (vs the actual distance of 7.07 km) and make $\\mathbf P$ very small. At that distance a $1^\\circ$ error translates into a positional error of 17.5 km. The KF would never be able to converge onto the actual target position because the filter is incorrectly very certain about its position estimates and because there is no distance information provided in the measurements."
]
},
{
@ -2312,7 +2312,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"All that is left is to compute $\\mathbf{P} = \\sum_i w_i{(\\mathcal{X}_i-\\mu)(\\mathcal{X}_i-\\mu)^\\mathsf{T}} + \\mathbf{Q}$:\n",
"All that is left is to compute $\\mathbf P = \\sum_i w_i{(\\mathcal{X}_i-\\mu)(\\mathcal{X}_i-\\mu)^\\mathsf{T}} + \\mathbf Q$:\n",
"\n",
"```python\n",
"kmax, n = sigmas.shape\n",
@ -2393,18 +2393,18 @@
"\n",
"The mean and covariance of those points is computed with the unscented transform. The residual and Kalman gain is then computed. The cross variance is computed as:\n",
"\n",
"$$\\mathbf{P}_{xz} =\\sum w^c(\\boldsymbol{\\chi}-\\mu)(\\boldsymbol{\\mathcal{Z}}-\\mathbf{\\mu}_z)^\\mathsf{T}$$\n",
"$$\\mathbf P_{xz} =\\sum w^c(\\boldsymbol{\\chi}-\\mu)(\\boldsymbol{\\mathcal{Z}}-\\mathbf{\\mu}_z)^\\mathsf{T}$$\n",
"\n",
"\n",
"Finally, we compute the new state estimate using the residual and Kalman gain:\n",
"\n",
"\n",
"$$K = \\mathbf{P}_{xz} \\mathbf{P}_z^{-1}\\\\\n",
"{\\mathbf{x}} = \\mathbf{\\bar{x}} + \\mathbf{Ky}$$\n",
"$$K = \\mathbf P_{xz} \\mathbf P_z^{-1}\\\\\n",
"{\\mathbf x} = \\mathbf{\\bar{x}} + \\mathbf{Ky}$$\n",
"\n",
"and the new covariance is computed as:\n",
"\n",
"$$ \\mathbf{P} = \\mathbf{\\bar{P}} - \\mathbf{KP}_z\\mathbf{K}^\\mathsf{T}$$\n",
"$$ \\mathbf P = \\mathbf{\\bar{P}} - \\mathbf{KP}_z\\mathbf{K}^\\mathsf{T}$$\n",
"\n",
"This function can be implemented as follows, assuming it is a method of a class that stores the necessary matrices and data."
]
@ -2955,7 +2955,7 @@
"\n",
"For our robot we will maintain the position and orientation of the robot:\n",
"\n",
"$$\\mathbf{x} = \\begin{bmatrix}x & y & \\theta\\end{bmatrix}^\\mathsf{T}$$\n",
"$$\\mathbf x = \\begin{bmatrix}x & y & \\theta\\end{bmatrix}^\\mathsf{T}$$\n",
"\n",
"I could include velocities into this model, but as you will see the math will already be quite challenging.\n",
"\n",
@ -3048,7 +3048,7 @@
"Thus our measurement function is\n",
"\n",
"$$\\begin{aligned}\n",
"\\mathbf{z}& = h(\\mathbf{x}, \\mathbf{p}) &+ \\mathcal{N}(0, R)\\\\\n",
"\\mathbf{z}& = h(\\mathbf x, \\mathbf P) &+ \\mathcal{N}(0, R)\\\\\n",
"&= \\begin{bmatrix}\n",
"\\sqrt{(p_x - x)^2 + (p_y - y)^2} \\\\\n",
"\\tan^{-1}(\\frac{p_y - y}{p_x - x}) - \\theta \n",
@ -3066,7 +3066,7 @@
"\n",
"This is quite straightforward as we need to specify measurement noise in measurement space, hence it is linear. It is reasonable to assume that the range and bearing measurement noise is independent, hence\n",
"\n",
"$$\\mathbf{R}=\\begin{bmatrix}\\sigma_{range}^2 & 0 \\\\ 0 & \\sigma_{bearing}^2\\end{bmatrix}$$"
"$$\\mathbf R=\\begin{bmatrix}\\sigma_{range}^2 & 0 \\\\ 0 & \\sigma_{bearing}^2\\end{bmatrix}$$"
]
},
{
@ -3150,7 +3150,7 @@
"metadata": {},
"source": [
"We can now implement the measurement model. The equation is\n",
"$$h(\\mathbf{x}, \\mathbf{p})\n",
"$$h(\\mathbf x, \\mathbf P)\n",
"= \\begin{bmatrix}\n",
"\\sqrt{(p_x - x)^2 + (p_y - y)^2} \\\\\n",
"\\tan^{-1}(\\frac{p_y - y}{p_x - x}) - \\theta \n",
@ -3539,7 +3539,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.0"
"version": "3.5.1"
}
},
"nbformat": 4,

View File

@ -389,36 +389,36 @@
"source": [
"## Linearizing the Kalman Filter\n",
"\n",
"To implement the extended Kalman filter we will leave the linear equations as they are and use partial derivatives to evaluate the system matrix $\\mathbf{F}$ and the measurement matrix $\\mathbf{H}$ at the filter's state at time t ($\\mathbf{x}_t$). In other words we linearize the equations at time t by finding the slope (derivative) of the equations at that time. Since $\\mathbf{F}$ also depends on the control input vector $\\mathbf{u}$ we will need to include that term:\n",
"To implement the extended Kalman filter we will leave the linear equations as they are and use partial derivatives to evaluate the system matrix $\\mathbf F$ and the measurement matrix $\\mathbf H$ at the filter's state at time t ($\\mathbf x_t$). In other words we linearize the equations at time t by finding the slope (derivative) of the equations at that time. Since $\\mathbf F$ also depends on the control input vector $\\mathbf u$ we will need to include that term:\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"\\mathbf{F} \n",
"&\\equiv {\\frac{\\partial{f(\\mathbf{x}_t, \\mathbf{u}_t)}}{\\partial{\\mathbf{x}}}}\\biggr|_{{\\mathbf{x}_t},{\\mathbf{u}_t}} \\\\\n",
"\\mathbf{H} &\\equiv \\frac{\\partial{h(\\mathbf{x}_t)}}{\\partial{\\mathbf{x}}}\\biggr|_{\\mathbf{x}_t} \n",
"\\mathbf F \n",
"&\\equiv {\\frac{\\partial{f(\\mathbf x_t, \\mathbf u_t)}}{\\partial{\\mathbf x}}}\\biggr|_{{\\mathbf x_t},{\\mathbf u_t}} \\\\\n",
"\\mathbf H &\\equiv \\frac{\\partial{h(\\mathbf x_t)}}{\\partial{\\mathbf x}}\\biggr|_{\\mathbf x_t} \n",
"\\end{aligned}\n",
"$$\n",
"\n",
"This notation states that at each update step we compute $\\mathbf{F}$ as the partial derivative of our function $f()$ evaluated at $\\mathbf{x}_t$, $\\mathbf{u}_t$.\n",
"This notation states that at each update step we compute $\\mathbf F$ as the partial derivative of our function $f()$ evaluated at $\\mathbf x_t$, $\\mathbf u_t$.\n",
"\n",
"This leads to these equations for the EKF:\n",
"\n",
"<u>Predict step</u>:\n",
"\n",
"$$\\begin{aligned}\n",
"\\mathbf{F} &= {\\frac{\\partial{f(\\mathbf{x})}}{\\partial{\\mathbf{x}}}}\\biggr|_{{\\mathbf{x}_t},{\\mathbf{u}_t}} \\\\\n",
"\\mathbf{\\bar{x}} &= \\mathbf{Fx} + \\mathbf{Bu} \\\\\n",
"\\mathbf{\\bar{P}} &= \\mathbf{FPF}^\\mathsf{T} + \\mathbf{Q} \n",
"\\mathbf F &= {\\frac{\\partial{f(\\mathbf x)}}{\\partial{\\mathbf x}}}\\biggr|_{{\\mathbf x_t},{\\mathbf u_t}} \\\\\n",
"\\mathbf{\\overline x} &= \\mathbf{Fx} + \\mathbf{Bu} \\\\\n",
"\\mathbf{\\overline P} &= \\mathbf{FPF}^\\mathsf T + \\mathbf Q \n",
"\\end{aligned}$$\n",
"\n",
"<u>Update step</u>:\n",
"\n",
"\n",
"$$\\begin{aligned}\n",
"\\mathbf{H} &= \\frac{\\partial{h(\\mathbf{x})}}{\\partial{\\mathbf{x}}}\\biggr|_{\\mathbf{x}_t} \\\\\n",
"\\mathbf{K} &= \\mathbf{\\bar{P}H}^\\mathsf{T}(\\mathbf{H\\bar{P}H}^\\mathsf{T} + \\mathbf{R})^{-1}\\\\\n",
"\\mathbf{x} &= \\mathbf{\\bar{x}} + \\mathbf{K}(\\mathbf{z}-\\mathbf{H\\bar{x}}) \\\\\n",
"\\mathbf{P} &= \\mathbf{\\bar{P}}(\\mathbf{I} - \\mathbf{KH})\n",
"\\mathbf H &= \\frac{\\partial{h(\\mathbf x)}}{\\partial{\\mathbf x}}\\biggr|_{\\mathbf x_t} \\\\\n",
"\\mathbf K &= \\mathbf{\\overline PH}^\\mathsf T(\\mathbf{H\\overline PH}^\\mathsf T + \\mathbf R)^{-1}\\\\\n",
"\\mathbf x &= \\mathbf{\\overline x} + \\mathbf K(\\mathbf{z}-\\mathbf{H\\overline x}) \\\\\n",
"\\mathbf P &= \\mathbf{\\overline P}(\\mathbf{I} - \\mathbf{KH})\n",
"\\end{aligned}$$\n",
"\n",
"\n",
@ -488,7 +488,7 @@
"source": [
"We want to track the position of an aircraft assuming a constant velocity and altitude, and measurements of the slant distance to the aircraft. That means we need 3 state variables - horizontal distance, horizonal velocity, and altitude:\n",
"\n",
"$$\\mathbf{x} = \\begin{bmatrix}\\mathtt{distance} \\\\\\mathtt{velocity}\\\\ \\mathtt{altitude}\\end{bmatrix}= \\begin{bmatrix}x \\\\ \\dot{x}\\\\ y\\end{bmatrix}$$"
"$$\\mathbf x = \\begin{bmatrix}\\mathtt{distance} \\\\\\mathtt{velocity}\\\\ \\mathtt{altitude}\\end{bmatrix}= \\begin{bmatrix}x \\\\ \\dot x\\\\ y\\end{bmatrix}$$"
]
},
{
@ -504,7 +504,7 @@
"source": [
"We assume a Newtonian, kinematic system for the aircraft. We've used this model in previous chapters, so by inspection you may recognize that we want\n",
"\n",
"$$\\mathbf{F} = \\begin{bmatrix}1 & \\Delta t & 0\\\\\n",
"$$\\mathbf F = \\begin{bmatrix}1 & \\Delta t & 0\\\\\n",
"0 & 1 & 0 \\\\\n",
"0 & 0 & 1\\end{bmatrix}$$\n",
"\n",
@ -512,46 +512,46 @@
"\n",
"We model nonlinear systems with a set of differential equations. We need an equation in the form \n",
"\n",
"$$\\dot{\\mathbf{x}} = \\mathbf{Ax} + \\mathbf{w}$$\n",
"$$\\dot{\\mathbf x} = \\mathbf{Ax} + \\mathbf{w}$$\n",
"\n",
"where $\\mathbf{w}$ is the system noise. \n",
"\n",
"The variables $x$ and $y$ are independent so we can compute them separately. The differential equations for motion in one dimension are:\n",
"\n",
"$$\\begin{aligned}v &= \\dot{x} \\\\\n",
"$$\\begin{aligned}v &= \\dot x \\\\\n",
"a &= \\ddot{x} = 0\\end{aligned}$$\n",
"\n",
"Now we put the differential equations into state-space form. If this was a second or greater order differential system we would have to first reduce them to an equivalent set of first degree equations. The equations are first order, so we put them in state space matrix form as\n",
"\n",
"$$\\begin{bmatrix}\\dot{x} \\\\ \\ddot{x}\\end{bmatrix} =\\begin{bmatrix}0&1\\\\0&0\\end{bmatrix} \\begin{bmatrix}x \\\\ \\dot{x}\\end{bmatrix}$$\n",
"$$\\begin{bmatrix}\\dot x \\\\ \\ddot{x}\\end{bmatrix} =\\begin{bmatrix}0&1\\\\0&0\\end{bmatrix} \\begin{bmatrix}x \\\\ \\dot x\\end{bmatrix}$$\n",
"\n",
"\n",
"We let $\\mathbf{F}=\\begin{bmatrix}0&1\\\\0&0\\end{bmatrix}$ and solve the following Taylor series expansion to linearize the equations at $t$:\n",
"We let $\\mathbf F=\\begin{bmatrix}0&1\\\\0&0\\end{bmatrix}$ and solve the following Taylor series expansion to linearize the equations at $t$:\n",
"\n",
"$$\\Phi(t) = e^{\\mathbf{F}t} = \\mathbf{I} + \\mathbf{F}t + \\frac{(\\mathbf{F}t)^2}{2!} + \\frac{(\\mathbf{F}t)^3}{3!} + ... $$\n",
"$$\\Phi(t) = e^{\\mathbf Ft} = \\mathbf{I} + \\mathbf Ft + \\frac{(\\mathbf Ft)^2}{2!} + \\frac{(\\mathbf Ft)^3}{3!} + ... $$\n",
"\n",
"\n",
"$\\mathbf{F}^2 = \\begin{bmatrix}0&0\\\\0&0\\end{bmatrix}$, so all higher powers of $\\mathbf{F}$ are also $\\mathbf{0}$. Thus the Taylor series expansion is:\n",
"$\\mathbf F^2 = \\begin{bmatrix}0&0\\\\0&0\\end{bmatrix}$, so all higher powers of $\\mathbf F$ are also $\\mathbf{0}$. Thus the Taylor series expansion is:\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"\\Phi(t) &=\\mathbf{I} + \\mathbf{F}t + \\mathbf{0} \\\\\n",
"\\Phi(t) &=\\mathbf{I} + \\mathbf Ft + \\mathbf{0} \\\\\n",
"&= \\begin{bmatrix}1&0\\\\0&1\\end{bmatrix} + \\begin{bmatrix}0&1\\\\0&0\\end{bmatrix}t\\\\\n",
"&= \\begin{bmatrix}1&t\\\\0&1\\end{bmatrix}\n",
"\\end{aligned}$$\n",
"\n",
"We are trying to find the state transition function for\n",
"\n",
"$$\\mathbf{x}(t_k) = \\Phi(t_k-t_{k-1})\\mathbf{x}(t_{k-1})$$ \n",
"$$\\mathbf x(t_k) = \\Phi(t_k-t_{k-1})\\mathbf x(t_{k-1})$$ \n",
"\n",
"If we define $\\Delta t = t_k-t_{k-1}$ and substitute it for $t$ in the equation above we get\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"x(t_k) &= \\Phi(\\Delta t)x(t_{k-1}) \\\\\n",
"\\mathbf{\\bar{x}} &= \\Phi(\\Delta t)\\mathbf{x} \\\\\n",
"\\mathbf{\\bar{x}} &=\\begin{bmatrix}1&\\Delta t\\\\0&1\\end{bmatrix}\\mathbf{x} \\\\\n",
"\\begin{bmatrix}x \\\\ \\dot{x}\\end{bmatrix}^- &=\\begin{bmatrix}1&\\Delta t\\\\0&1\\end{bmatrix}\\begin{bmatrix}x \\\\ \\dot{x}\\end{bmatrix}\n",
"\\mathbf{\\overline x} &= \\Phi(\\Delta t)\\mathbf x \\\\\n",
"\\mathbf{\\overline x} &=\\begin{bmatrix}1&\\Delta t\\\\0&1\\end{bmatrix}\\mathbf x \\\\\n",
"\\overline{\\begin{bmatrix}x \\\\ \\dot x\\end{bmatrix}} &=\\begin{bmatrix}1&\\Delta t\\\\0&1\\end{bmatrix}\\begin{bmatrix}x \\\\ \\dot x\\end{bmatrix}\n",
"\\end{aligned}$$\n",
"\n",
"This is the same result used by the kinematic equations! This exercise was unnecessary other than to illustrate linearizing differential equations. Subsequent examples will require you to use these techniques. "
@ -568,19 +568,19 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The measurement function for our filter needs to take the filter state $\\mathbf{x}$ and turn it into a measurement, which is the slant range distance. We use the Pythagorean theorem to derive\n",
"The measurement function for our filter needs to take the filter state $\\mathbf x$ and turn it into a measurement, which is the slant range distance. We use the Pythagorean theorem to derive\n",
"\n",
"$$h(\\mathbf{x}) = \\sqrt{x_{pos}^2 + x_{alt}^2}$$\n",
"$$h(\\mathbf x) = \\sqrt{x_{pos}^2 + x_{alt}^2}$$\n",
"\n",
"The relationship between the slant distance and the position on the ground is nonlinear due to the square root term. To use it in the EKF we must linearize it. As we discussed above, the best way to linearize an equation at a point is to find its slope, which we do by taking its derivative.\n",
"\n",
"$$\n",
"\\mathbf{H} \\equiv \\frac{\\partial{h}}{\\partial{x}}\\biggr|_x \n",
"\\mathbf H \\equiv \\frac{\\partial{h}}{\\partial{x}}\\biggr|_x \n",
"$$\n",
"\n",
"The derivative of a matrix is called a Jacobian, which in general takes the form \n",
"\n",
"$$\\frac{\\partial \\mathbf{h}}{\\partial \\mathbf{x}} = \n",
"$$\\frac{\\partial \\mathbf H}{\\partial \\mathbf x} = \n",
"\\begin{bmatrix}\n",
"\\frac{\\partial h_1}{\\partial x_1} & \\frac{\\partial h_1}{\\partial x_2} &\\dots \\\\\n",
"\\frac{\\partial h_2}{\\partial x_1} & \\frac{\\partial h_2}{\\partial x_2} &\\dots \\\\\n",
@ -590,7 +590,7 @@
"\n",
"In other words, each element in the matrix is the partial derivative of the function $h$ with respect to the variables $x$. For our problem we have\n",
"\n",
"$$\\mathbf{H} = \\begin{bmatrix}\\frac{\\partial h}{\\partial x_{pos}} & \\frac{\\partial h}{\\partial x_{vel}} & \\frac{\\partial h}{\\partial x_{alt}}\\end{bmatrix}$$\n",
"$$\\mathbf H = \\begin{bmatrix}\\frac{\\partial h}{\\partial x_{pos}} & \\frac{\\partial h}{\\partial x_{vel}} & \\frac{\\partial h}{\\partial x_{alt}}\\end{bmatrix}$$\n",
"\n",
"where $h(x) = \\sqrt{x_{pos}^2 + x_{alt}^2}$ as given above.\n",
"\n",
@ -616,7 +616,7 @@
"\n",
"giving us \n",
"\n",
"$$\\mathbf{H} = \n",
"$$\\mathbf H = \n",
"\\begin{bmatrix}\n",
"\\frac{x_{pos}}{\\sqrt{x_{pos}^2 + x_{alt}^2}} & \n",
"0 &\n",
@ -624,9 +624,9 @@
"\\frac{x_{alt}}{\\sqrt{x_{pos}^2 + x_{alt}^2}}\n",
"\\end{bmatrix}$$\n",
"\n",
"This may seem daunting, so step back and recognize that all of this math is doing something very simple. We have an equation for the slant range to the airplane which is nonlinear. The Kalman filter only works with linear equations, so we need to find a linear equation that approximates $\\mathbf{H}$. As we discussed above, finding the slope of a nonlinear equation at a given point is a good approximation. For the Kalman filter, the 'given point' is the state variable $\\mathbf{x}$ so we need to take the derivative of the slant range with respect to $\\mathbf{x}$. \n",
"This may seem daunting, so step back and recognize that all of this math is doing something very simple. We have an equation for the slant range to the airplane which is nonlinear. The Kalman filter only works with linear equations, so we need to find a linear equation that approximates $\\mathbf H$. As we discussed above, finding the slope of a nonlinear equation at a given point is a good approximation. For the Kalman filter, the 'given point' is the state variable $\\mathbf x$ so we need to take the derivative of the slant range with respect to $\\mathbf x$. \n",
"\n",
"To make this more concrete, let's now write a Python function that computes the Jacobian of $\\mathbf{H}$ for this problem. The `ExtendedKalmanFilter` class will be using this to generate `ExtendedKalmanFilter.H` at each step of the process."
"To make this more concrete, let's now write a Python function that computes the Jacobian of $\\mathbf H$ for this problem. The `ExtendedKalmanFilter` class will be using this to generate `ExtendedKalmanFilter.H` at each step of the process."
]
},
{
@ -651,7 +651,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, let's provide the code for $h(\\mathbf{x})$"
"Finally, let's provide the code for $h(\\mathbf x)$"
]
},
{
@ -724,17 +724,17 @@
"\n",
"The radar returns the range distance. A good radar can achieve accuracy of $\\sigma_{range}= 5$ meters, so we will use that value. This gives us\n",
"\n",
"$$\\mathbf{R} = \\begin{bmatrix}\\sigma_{range}^2\\end{bmatrix} = \\begin{bmatrix}25\\end{bmatrix}$$\n",
"$$\\mathbf R = \\begin{bmatrix}\\sigma_{range}^2\\end{bmatrix} = \\begin{bmatrix}25\\end{bmatrix}$$\n",
"\n",
"\n",
"The design of $\\mathbf{Q}$ requires some discussion. The state $\\mathbf{x}= \\begin{bmatrix}x & \\dot{x} & y\\end{bmatrix}^\\mathtt{T}$. The first two elements are position (down range distance) and velocity, so we can use `Q_discrete_white_noise` noise to compute the values for the upper left hand side of Q. The third element of $\\mathbf{x}$ is altitude, which we are assuming is independent of the down range distance. That leads us to a block design of $\\mathbf{Q}$ of:\n",
"The design of $\\mathbf Q$ requires some discussion. The state $\\mathbf x= \\begin{bmatrix}x & \\dot x & y\\end{bmatrix}^\\mathtt{T}$. The first two elements are position (down range distance) and velocity, so we can use `Q_discrete_white_noise` noise to compute the values for the upper left hand side of $\\mathbf Q$. The third element of $\\mathbf x$ is altitude, which we are assuming is independent of the down range distance. That leads us to a block design of $\\mathbf Q$ of:\n",
"\n",
"$$\\mathbf{Q} = \\begin{bmatrix}\\mathbf{Q}_\\mathtt{x} & 0 \\\\ 0 & Q_\\mathtt{y}\\end{bmatrix}$$\n",
"$$\\mathbf Q = \\begin{bmatrix}\\mathbf Q_\\mathtt{x} & 0 \\\\ 0 & \\mathbf Q_\\mathtt{y}\\end{bmatrix}$$\n",
"\n",
"\n",
"### Implementation\n",
"\n",
"The `FilterPy` library provides the class `ExtendedKalmanFilter`. It works very similar to the `KalmanFilter` class we have been using, except that it allows you to provide functions that compute the Jacobian of $\\mathbf{H}$ and the function $h(\\mathbf{x})$. We have already written the code for these two functions, so let's get going.\n",
"The `FilterPy` library provides the class `ExtendedKalmanFilter`. It works very similar to the `KalmanFilter` class we have been using, except that it allows you to provide functions that compute the Jacobian of $\\mathbf H$ and the function $h(\\mathbf x)$. We have already written the code for these two functions, so let's get going.\n",
"\n",
"We start by importing the filter and creating it. There are 3 variables in `x` and only 1 measurement. At the same time we will create our radar simulator.\n",
"\n",
@ -760,7 +760,7 @@
" [0, 0, 0]])*dt\n",
"```\n",
"\n",
"After assigning reasonable values to $\\mathbf{R}$, $\\mathbf{Q}$, and $\\mathbf{P}$ we can run the filter with a simple loop\n",
"After assigning reasonable values to $\\mathbf R$, $\\mathbf Q$, and $\\mathbf P$ we can run the filter with a simple loop\n",
"\n",
"```python\n",
"for i in range(int(20/dt)):\n",
@ -1006,11 +1006,11 @@
"\n",
"For our robot we will maintain the position and orientation of the robot:\n",
"\n",
"$$\\mathbf{x} = \\begin{bmatrix}x \\\\ y \\\\ \\theta\\end{bmatrix}$$\n",
"$$\\mathbf x = \\begin{bmatrix}x \\\\ y \\\\ \\theta\\end{bmatrix}$$\n",
"\n",
"Our control input $\\mathbf{u}$ is the velocity and steering angle\n",
"Our control input $\\mathbf u$ is the velocity and steering angle\n",
"\n",
"$$\\mathbf{u} = \\begin{bmatrix}v \\\\ \\alpha\\end{bmatrix}$$"
"$$\\mathbf u = \\begin{bmatrix}v \\\\ \\alpha\\end{bmatrix}$$"
]
},
{
@ -1021,35 +1021,35 @@
"\n",
"In general we model our system as a nonlinear motion model plus noise.\n",
"\n",
"$$x^- = x + f(x, u) + \\mathcal{N}(0, Q)$$\n",
"$$\\overline x = x + f(x, u) + \\mathcal{N}(0, Q)$$\n",
"\n",
"Using the motion model for a robot that we created above, we can expand this to\n",
"\n",
"$$\\begin{bmatrix}x\\\\y\\\\\\theta\\end{bmatrix}^- = \\begin{bmatrix}x\\\\y\\\\\\theta\\end{bmatrix} + \n",
"$$\\overline{\\begin{bmatrix}x\\\\y\\\\\\theta\\end{bmatrix}} = \\begin{bmatrix}x\\\\y\\\\\\theta\\end{bmatrix} + \n",
"\\begin{bmatrix}- R\\sin(\\theta) + R\\sin(\\theta + \\beta) \\\\\n",
"R\\cos(\\theta) - R\\cos(\\theta + \\beta) \\\\\n",
"\\beta\\end{bmatrix}$$\n",
"\n",
"We linearize this with a taylor expansion at $x$:\n",
"\n",
"$$f(x, u) \\approx \\mathbf{x} + \\frac{\\partial f(\\mathbf{x}, \\mathbf{u})}{\\partial x}\\biggr|_{\\mathbf{x}, \\mathbf{u}} $$\n",
"$$f(x, u) \\approx \\mathbf x + \\frac{\\partial f(\\mathbf x, \\mathbf u)}{\\partial x}\\biggr|_{\\mathbf x, \\mathbf u} $$\n",
"\n",
"We replace $f(x, u)$ with our state estimate $\\mathbf{x}$, and the derivative is the Jacobian of $f$."
"We replace $f(x, u)$ with our state estimate $\\mathbf x$, and the derivative is the Jacobian of $f$."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The Jacobian $\\mathbf{F}$ is\n",
"The Jacobian $\\mathbf F$ is\n",
"\n",
"$$\\mathbf{F} = \\frac{\\partial f(x, u)}{\\partial x} =\\begin{bmatrix}\n",
"\\frac{\\partial \\dot{x}}{\\partial x} & \n",
"\\frac{\\partial \\dot{x}}{\\partial y} &\n",
"\\frac{\\partial \\dot{x}}{\\partial \\theta}\\\\\n",
"\\frac{\\partial \\dot{y}}{\\partial x} & \n",
"\\frac{\\partial \\dot{y}}{\\partial y} &\n",
"\\frac{\\partial \\dot{y}}{\\partial \\theta} \\\\\n",
"$$\\mathbf F = \\frac{\\partial f(x, u)}{\\partial x} =\\begin{bmatrix}\n",
"\\frac{\\partial \\dot x}{\\partial x} & \n",
"\\frac{\\partial \\dot x}{\\partial y} &\n",
"\\frac{\\partial \\dot x}{\\partial \\theta}\\\\\n",
"\\frac{\\partial \\dot y}{\\partial x} & \n",
"\\frac{\\partial \\dot y}{\\partial y} &\n",
"\\frac{\\partial \\dot y}{\\partial \\theta} \\\\\n",
"\\frac{\\partial \\dot{\\theta}}{\\partial x} & \n",
"\\frac{\\partial \\dot{\\theta}}{\\partial y} &\n",
"\\frac{\\partial \\dot{\\theta}}{\\partial \\theta}\n",
@ -1058,7 +1058,7 @@
"\n",
"When we calculate these we get\n",
"\n",
"$$\\mathbf{F} = \\begin{bmatrix}\n",
"$$\\mathbf F = \\begin{bmatrix}\n",
"1 & 0 & -R\\cos(\\theta) + R\\cos(\\theta+\\beta) \\\\\n",
"0 & 1 & -R\\sin(\\theta) + R\\sin(\\theta+\\beta) \\\\\n",
"0 & 0 & 1\n",
@ -1166,11 +1166,11 @@
"\n",
"$$\\mathbf{M} = \\begin{bmatrix}\\sigma_{vel}^2 & 0 \\\\ 0 & \\sigma_\\alpha^2\\end{bmatrix}$$\n",
"\n",
"If this was a linear problem we would convert from control space to state space using the by now familiar $\\mathbf{FMF}^\\mathsf{T}$ form. Since our motion model is nonlinear we do not try to find a closed form solution to this, but instead linearize it with a Jacobian which we will name $\\mathbf{V}$. \n",
"If this was a linear problem we would convert from control space to state space using the by now familiar $\\mathbf{FMF}^\\mathsf T$ form. Since our motion model is nonlinear we do not try to find a closed form solution to this, but instead linearize it with a Jacobian which we will name $\\mathbf{V}$. \n",
"\n",
"$$\\mathbf{V} = \\frac{\\partial f(x, u)}{\\partial u} \\begin{bmatrix}\n",
"\\frac{\\partial \\dot{x}}{\\partial v} & \\frac{\\partial \\dot{x}}{\\partial \\alpha} \\\\\n",
"\\frac{\\partial \\dot{y}}{\\partial v} & \\frac{\\partial \\dot{y}}{\\partial \\alpha} \\\\\n",
"\\frac{\\partial \\dot x}{\\partial v} & \\frac{\\partial \\dot x}{\\partial \\alpha} \\\\\n",
"\\frac{\\partial \\dot y}{\\partial v} & \\frac{\\partial \\dot y}{\\partial \\alpha} \\\\\n",
"\\frac{\\partial \\dot{\\theta}}{\\partial v} & \\frac{\\partial \\dot{\\theta}}{\\partial \\alpha}\n",
"\\end{bmatrix}$$\n",
"\n",
@ -1247,14 +1247,14 @@
"This gives us the final form of our prediction equations:\n",
"\n",
"$$\\begin{aligned}\n",
"\\mathbf{\\bar{x}} &= \\mathbf{x} + \n",
"\\mathbf{\\overline x} &= \\mathbf x + \n",
"\\begin{bmatrix}- R\\sin(\\theta) + R\\sin(\\theta + \\beta) \\\\\n",
"R\\cos(\\theta) - R\\cos(\\theta + \\beta) \\\\\n",
"\\beta\\end{bmatrix}\\\\\n",
"\\mathbf{\\bar{P}} &=\\mathbf{FPF}^{\\mathsf{T}} + \\mathbf{VMV}^{\\mathsf{T}}\n",
"\\mathbf{\\overline P} &=\\mathbf{FPF}^{\\mathsf T} + \\mathbf{VMV}^{\\mathsf T}\n",
"\\end{aligned}$$\n",
"\n",
"One final point. This form of linearization is not the only way to predict $\\mathbf{x}$. For example, we could use a numerical integration technique like *Runge Kutta* to compute the position of the robot in the future. In fact, if the time step is relatively large you will have to do that. As I am sure you are realizing, things are not as cut and dried with the EKF as it was for the KF. For a real problem you have to very carefully model your system with differential equations and then determine the most appropriate way to solve that system. The correct approach depends on the accuracy you require, how nonlinear the equations are, your processor budget, and numerical stability concerns. These are all topics beyond the scope of this book."
"One final point. This form of linearization is not the only way to predict $\\mathbf x$. For example, we could use a numerical integration technique like *Runge Kutta* to compute the position of the robot in the future. In fact, if the time step is relatively large you will have to do that. As I am sure you are realizing, things are not as cut and dried with the EKF as it was for the KF. For a real problem you have to very carefully model your system with differential equations and then determine the most appropriate way to solve that system. The correct approach depends on the accuracy you require, how nonlinear the equations are, your processor budget, and numerical stability concerns. These are all topics beyond the scope of this book."
]
},
{
@ -1263,7 +1263,7 @@
"source": [
"### Design the Measurement Model\n",
"\n",
"Now we need to design our measurement model. For this problem we are assuming that we have a sensor that receives a noisy bearing and range to multiple known locations in the landscape. The measurement model must convert the state $\\begin{bmatrix}x & y&\\theta\\end{bmatrix}^\\mathsf{T}$ into a range and bearing to the landmark. Using $p$ be the position of a landmark, the range $r$ is\n",
"Now we need to design our measurement model. For this problem we are assuming that we have a sensor that receives a noisy bearing and range to multiple known locations in the landscape. The measurement model must convert the state $\\begin{bmatrix}x & y&\\theta\\end{bmatrix}^\\mathsf T$ into a range and bearing to the landmark. Using $p$ be the position of a landmark, the range $r$ is\n",
"\n",
"$$r = \\sqrt{(p_x - x)^2 + (p_y - y)^2}$$\n",
"\n",
@ -1276,14 +1276,14 @@
"\n",
"\n",
"$$\\begin{aligned}\n",
"\\mathbf{x}& = h(x,p) &+ \\mathcal{N}(0, R)\\\\\n",
"\\mathbf x& = h(x,p) &+ \\mathcal{N}(0, R)\\\\\n",
"&= \\begin{bmatrix}\n",
"\\sqrt{(p_x - x)^2 + (p_y - y)^2} \\\\\n",
"\\arctan(\\frac{p_y - y}{p_x - x}) - \\theta \n",
"\\end{bmatrix} &+ \\mathcal{N}(0, R)\n",
"\\end{aligned}$$\n",
"\n",
"This is clearly nonlinear, so we need linearize $h(x, p)$ at $\\mathbf{x}$ by taking its Jacobian. We compute that with SymPy below."
"This is clearly nonlinear, so we need linearize $h(x, p)$ at $\\mathbf x$ by taking its Jacobian. We compute that with SymPy below."
]
},
{
@ -1644,7 +1644,7 @@
"\n",
"From this we can see that there is a lot of uncertainty added by our motion model, and that most of the error in in the direction of motion. We can see that from the shape of the blue ellipses. After a few steps we can see that the filter incorporates the landmark measurements.\n",
"\n",
"I used the same initial conditions and landmark locations in the UKF chapter. You can see both in the plot and in the printed final value for $\\mathbf{P}$ that the UKF achieves much better accuracy in terms of the error ellipse. The black solid line denotes the robot's actual path. Both perform roughly as well as far as their estimate for $\\mathbf{x}$ is concerned. \n",
"I used the same initial conditions and landmark locations in the UKF chapter. You can see both in the plot and in the printed final value for $\\mathbf P$ that the UKF achieves much better accuracy in terms of the error ellipse. The black solid line denotes the robot's actual path. Both perform roughly as well as far as their estimate for $\\mathbf x$ is concerned. \n",
"\n",
"Now lets add another landmark."
]
@ -1931,7 +1931,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.0"
"version": "3.5.1"
}
},
"nbformat": 4,


@ -266,7 +266,7 @@
"source": [
"# Symbols and Notations\n",
"\n",
"Here is a collection of the notation used by various authors for the linear Kalman filter. I have ordered them in the same order so that you can compare and contrast the choices. I use the original variable names; I did not try to normalize them."
"Here is a collection of the notation used by various authors for the linear Kalman filter equations."
]
},
{
@ -275,28 +275,29 @@
"source": [
"## Labbe\n",
"\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"\\overline{\\mathbf{x}} &= \\mathbf{F}\\hat{\\mathbf{x}} + \\mathbf{B}\\mathbf{u} \\\\\n",
"\\overline{\\mathbf{P}} &= \\mathbf{F} \\mathbf{P}\\mathbf{F}^\\mathsf{T} + \\textbf{Q} \\\\ \\\\\n",
"\\mathbf{y} &= \\mathbf{z} - \\mathbf{H}\\overline{\\textbf{x}} \\\\\n",
"\\mathbf{S} &= \\mathbf{H}\\overline{\\mathbf{P}}\\mathbf{H}^\\mathsf{T} + \\mathbf{R} \\\\\n",
"\\mathbf{K} &= \\overline{\\mathbf{P}}\\mathbf{H}^\\mathsf{T}\\mathbf{S}^{-1} \\\\\n",
"\\hat{\\textbf{x}} &= \\overline{\\mathbf{x}} +\\mathbf{K}\\mathbf{y} \\\\\n",
"\\mathbf{P} &= (\\mathbf{I}-\\mathbf{K}\\mathbf{H})\\overline{\\mathbf{P}}\n",
"\\overline{\\mathbf x} &= \\mathbf{Fx} + \\mathbf{Bu} \\\\\n",
"\\overline{\\mathbf P} &= \\mathbf{FPF}^\\mathsf{T} + \\mathbf Q \\\\ \\\\\n",
"\\mathbf y &= \\mathbf z - \\mathbf{H}\\overline{\\mathbf x} \\\\\n",
"\\mathbf S &= \\mathbf{H}\\overline{\\mathbf P}\\mathbf{H}^\\mathsf{T} + \\mathbf R \\\\\n",
"\\mathbf K &= \\overline{\\mathbf P}\\mathbf{H}^\\mathsf{T}\\mathbf{S}^{-1} \\\\\n",
"\\mathbf x &= \\overline{\\mathbf x} +\\mathbf{Ky} \\\\\n",
"\\mathbf P &= (\\mathbf{I}-\\mathbf{KH})\\overline{\\mathbf P}\n",
"\\end{aligned}$$\n",
"\n",
"\n",
"## Wikipedia\n",
"$$\n",
"\\begin{aligned}\n",
"\\hat{\\textbf{x}}_{k\\mid k-1} &= \\textbf{F}_{k}\\hat{\\textbf{x}}_{k-1\\mid k-1} + \\textbf{B}_{k} \\textbf{u}_{k} \\\\\n",
"\\textbf{P}_{k\\mid k-1} &= \\textbf{F}_{k} \\textbf{P}_{k-1\\mid k-1} \\textbf{F}_{k}^{\\textsf{T}} + \\textbf{Q}_{k}\\\\\n",
"\\tilde{\\textbf{y}}_k &= \\textbf{z}_k - \\textbf{H}_k\\hat{\\textbf{x}}_{k\\mid k-1} \\\\\n",
"\\textbf{S}_k &= \\textbf{H}_k \\textbf{P}_{k\\mid k-1} \\textbf{H}_k^\\textsf{T} + \\textbf{R}_k \\\\\n",
"\\textbf{K}_k &= \\textbf{P}_{k\\mid k-1}\\textbf{H}_k^\\textsf{T}\\textbf{S}_k^{-1} \\\\\n",
"\\hat{\\textbf{x}}_{k\\mid k} &= \\hat{\\textbf{x}}_{k\\mid k-1} + \\textbf{K}_k\\tilde{\\textbf{y}}_k \\\\\n",
"\\textbf{P}_{k|k} &= (I - \\textbf{K}_k \\textbf{H}_k) \\textbf{P}_{k|k-1}\n",
"\\hat{\\mathbf x}_{k\\mid k-1} &= \\mathbf{F}_{k}\\hat{\\mathbf x}_{k-1\\mid k-1} + \\mathbf{B}_{k} \\mathbf{u}_{k} \\\\\n",
"\\mathbf P_{k\\mid k-1} &= \\mathbf{F}_{k} \\mathbf P_{k-1\\mid k-1} \\mathbf{F}_{k}^{\\textsf{T}} + \\mathbf Q_{k}\\\\\n",
"\\tilde{\\mathbf{y}}_k &= \\mathbf{z}_k - \\mathbf{H}_k\\hat{\\mathbf x}_{k\\mid k-1} \\\\\n",
"\\mathbf{S}_k &= \\mathbf{H}_k \\mathbf P_{k\\mid k-1} \\mathbf{H}_k^\\textsf{T} + \\mathbf{R}_k \\\\\n",
"\\mathbf{K}_k &= \\mathbf P_{k\\mid k-1}\\mathbf{H}_k^\\textsf{T}\\mathbf{S}_k^{-1} \\\\\n",
"\\hat{\\mathbf x}_{k\\mid k} &= \\hat{\\mathbf x}_{k\\mid k-1} + \\mathbf{K}_k\\tilde{\\mathbf{y}}_k \\\\\n",
"\\mathbf P_{k|k} &= (I - \\mathbf{K}_k \\mathbf{H}_k) \\mathbf P_{k|k-1}\n",
"\\end{aligned}$$\n",
"\n",
"## Brookner\n",
@ -326,11 +327,11 @@
"\n",
"$$\n",
"\\begin{aligned}\n",
"\\hat{\\textbf{x}}^-_{k+1} &= \\mathbf{\\phi}_{k}\\hat{\\textbf{x}}_{k} \\\\\n",
"\\hat{\\textbf{x}}_k &= \\hat{\\textbf{x}}^-_k +\\textbf{K}_k[\\textbf{z}_k - \\textbf{H}_k\\hat{\\textbf{}x}^-_k] \\\\\n",
"\\textbf{K}_k &= \\textbf{P}^-_k\\textbf{H}_k^\\mathsf{T}[\\textbf{H}_k\\textbf{P}^-_k\\textbf{H}_k^T + \\textbf{R}_k]^{-1}\\\\\n",
"\\textbf{P}^-_{k+1} &= \\mathbf{\\phi}_k \\textbf{P}_k\\mathbf{\\phi}_k^\\mathsf{T} + \\textbf{Q}_{k} \\\\\n",
"\\mathbf{P}_k &= (\\mathbf{I}-\\mathbf{K}_k\\mathbf{H}_k)\\mathbf{P}^-_k\n",
"\\hat{\\mathbf x}^-_{k+1} &= \\mathbf{\\phi}_{k}\\hat{\\mathbf x}_{k} \\\\\n",
"\\hat{\\mathbf x}_k &= \\hat{\\mathbf x}^-_k +\\mathbf{K}_k[\\mathbf{z}_k - \\mathbf{H}_k\\hat{\\mathbf{}x}^-_k] \\\\\n",
"\\mathbf{K}_k &= \\mathbf P^-_k\\mathbf{H}_k^\\mathsf{T}[\\mathbf{H}_k\\mathbf P^-_k\\mathbf{H}_k^T + \\mathbf{R}_k]^{-1}\\\\\n",
"\\mathbf P^-_{k+1} &= \\mathbf{\\phi}_k \\mathbf P_k\\mathbf{\\phi}_k^\\mathsf{T} + \\mathbf Q_{k} \\\\\n",
"\\mathbf P_k &= (\\mathbf{I}-\\mathbf{K}_k\\mathbf{H}_k)\\mathbf P^-_k\n",
"\\end{aligned}$$\n",
"\n",
"\n",
@ -349,36 +350,6 @@
"\n",
"## Thrun"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Terminology\n",
"\n",
"### Bar-Shalom\n",
"\n",
"* State-space\n",
"* x: state vector\n",
"* u: input vector\n",
"* Process noise or plant noise\n",
"* system matrix: A (for $\\dot(x) = Ax + Bu + Dv$)\n",
"* F: state transition matrix\n",
"* H : Measurement matrix\n",
"* y : measurement residual\n",
"* $\\overline{P}$ : state prediction covariance\n",
"* P: updated state covariance\n",
"* S: innovatin covariance\n",
"* K(W): filter gain\n",
"* $\\hat{x}$ updated state estimate\n",
"\n",
"\n",
"smoothed state = retrodicted state\n",
"\n",
"x0 = initial estimate\n",
"\n",
"P0 : initial covariance\n"
]
}
],
"metadata": {
@ -397,7 +368,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.0"
"version": "3.5.1"
}
},
"nbformat": 4,