Fixed eqn alignment issues. overline to bar.

Some equations used \\ without a gathered or aligned block. They render fine in the notebook, but not in the PDF. Also, reverted my ill-chosen use of \overline back to \bar.
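A rough way to look for remaining offenders is to scan each notebook's markdown cells for `$$...$$` blocks that contain a `\\` line break but no `\begin{...}` environment. The sketch below is a hypothetical helper, not part of this commit; the name `find_bare_linebreaks` is illustrative, and it assumes the standard notebook JSON layout (a `cells` list whose entries carry `cell_type` and `source`). It will miss a bare `\\` in a block that also contains an environment such as `bmatrix`.

```python
import json
import re

# Display-math blocks: everything between a pair of $$ delimiters.
DISPLAY_MATH = re.compile(r"\$\$(.+?)\$\$", re.DOTALL)


def find_bare_linebreaks(notebook):
    r"""Return (cell_index, math_source) pairs for $$...$$ blocks that use
    a LaTeX line break (\\) without any \begin{...} environment."""
    hits = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "markdown":
            continue
        source = cell.get("source", "")
        # The source field may be a single string or a list of lines.
        text = source if isinstance(source, str) else "".join(source)
        for match in DISPLAY_MATH.finditer(text):
            body = match.group(1)
            if "\\\\" in body and "\\begin{" not in body:
                hits.append((index, body.strip()))
    return hits


# Tiny in-memory example: the first cell is flagged, the second is not.
example = {"cells": [
    {"cell_type": "markdown",
     "source": ["$$a = 1 \\\\\n", "b = 2$$"]},
    {"cell_type": "markdown",
     "source": ["$$\\begin{gathered}a = 1 \\\\\n", "b = 2\\end{gathered}$$"]},
]}
hits = find_bare_linebreaks(example)
print(hits)
```

For a real file, load it first with `json.load(open(path, encoding="utf-8"))` and pass the resulting dictionary.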
2016-01-24 14:08:07 -08:00 · 71da22ad8c
commit 71da22ad8c
parent d5ea503fde
9 changed files with 199 additions and 180 deletions
--- a/03-Gaussians.ipynb
+++ b/03-Gaussians.ipynb
@@ -373,7 +373,7 @@
    "\n",
    "Another example is a fair coin. It has the sample space {H, T}. The coin is fair, so the probability for heads (H) is 50%, and the probability for tails (T) is 50%. We write this as\n",
    "\n",
-    "$$P(X{=}H) = 0.5\\\\P(X{=}T)=0.5$$\n",
+    "$$\\begin{gathered}P(X{=}H) = 0.5\\\\P(X{=}T)=0.5\\end{gathered}$$\n",
    "\n",
    "Sample spaces are not unique. One sample space for a die is {1, 2, 3, 4, 5, 6}. Another valid sample space would be {even, odd}. Another might be {dots in all corners, not dots in all corners}. A sample space is valid so long as it covers all possibilities, and any single event is described by only one element.  {even, 1, 3, 4, 5} is not a valid sample space for a die since a value of 4 is matched both by 'even' and '4'.\n",
    "\n",
@@ -1442,9 +1442,9 @@
    "\n",
    "The sum of two Gaussians is given by\n",
    "\n",
-    "$$\\mu = \\mu_1 + \\mu_2 \\\\\n",
+    "$$\\begin{gathered}\\mu = \\mu_1 + \\mu_2 \\\\\n",
    "\\sigma^2 = \\sigma^2_1 + \\sigma^2_2\n",
-    "$$\n",
+    "\\end{gathered}$$\n",
    "\n",
    "There are several proofs for this. I will use convolution since we used convolution in the previous chapter for the histograms of probabilities. \n",
    "\n",
@@ -1455,17 +1455,18 @@
    "This is the equation for a convolution. Now we just do some math:\n",
    "\n",
    "\n",
-    "$p(x) = \\int\\limits_{-\\infty}^\\infty f_2(x-x_1)f_1(x_1)\\, dx \\\\\n",
-    "=  \\int\\limits_{-\\infty}^\\infty \n",
-    "\\frac{1}{\\sqrt{2\\pi}\\sigma_z}\\exp\\left[-\\frac{x - z - \\mu_z}{2\\sigma^2_z}\\right]\n",
-    "\\frac{1}{\\sqrt{2\\pi}\\sigma_p}\\exp\\left[-\\frac{x - \\mu_p}{2\\sigma^2_p}\\right] \\, dx \\\\\n",
-    "=  \\int\\limits_{-\\infty}^\\infty\n",
-    "\\frac{1}{\\sqrt{2\\pi}\\sqrt{\\sigma_p^2 + \\sigma_z^2}} \\exp\\left[ -\\frac{(x - (\\mu_p + \\mu_z)))^2}{2(\\sigma_z^2+\\sigma_p^2)}\\right]\n",
-    "\\frac{1}{\\sqrt{2\\pi}\\frac{\\sigma_p\\sigma_z}{\\sqrt{\\sigma_p^2 + \\sigma_z^2}}} \\exp\\left[ -\\frac{(x - \\frac{\\sigma_p^2(x-\\mu_z) + \\sigma_z^2\\mu_p}{}))^2}{2\\left(\\frac{\\sigma_p\\sigma_x}{\\sqrt{\\sigma_z^2+\\sigma_p^2}}\\right)^2}\\right] \\, dx\n",
-    "= \\frac{1}{\\sqrt{2\\pi}\\sqrt{\\sigma_p^2 + \\sigma_z^2}} \\exp\\left[ -\\frac{(x - (\\mu_p + \\mu_z)))^2}{2(\\sigma_z^2+\\sigma_p^2)}\\right] \\int\\limits_{-\\infty}^\\infty\n",
-    "\\frac{1}{\\sqrt{2\\pi}\\frac{\\sigma_p\\sigma_z}{\\sqrt{\\sigma_p^2 + \\sigma_z^2}}} \\exp\\left[ -\\frac{(x - \\frac{\\sigma_p^2(x-\\mu_z) + \\sigma_z^2\\mu_p}{}))^2}{2\\left(\\frac{\\sigma_p\\sigma_x}{\\sqrt{\\sigma_z^2+\\sigma_p^2}}\\right)^2}\\right] \\, dx\n",
-    "$\n",
+    "$p(x) = \\int\\limits_{-\\infty}^\\infty f_2(x-x_1)f_1(x_1)\\, dx$\n",
    "\n",
+    "$=  \\int\\limits_{-\\infty}^\\infty \n",
+    "\\frac{1}{\\sqrt{2\\pi}\\sigma_z}\\exp\\left[-\\frac{x - z - \\mu_z}{2\\sigma^2_z}\\right]\n",
+    "\\frac{1}{\\sqrt{2\\pi}\\sigma_p}\\exp\\left[-\\frac{x - \\mu_p}{2\\sigma^2_p}\\right] \\, dx$\n",
+    "\n",
+    "$=  \\int\\limits_{-\\infty}^\\infty\n",
+    "\\frac{1}{\\sqrt{2\\pi}\\sqrt{\\sigma_p^2 + \\sigma_z^2}} \\exp\\left[ -\\frac{(x - (\\mu_p + \\mu_z)))^2}{2(\\sigma_z^2+\\sigma_p^2)}\\right]\n",
+    "\\frac{1}{\\sqrt{2\\pi}\\frac{\\sigma_p\\sigma_z}{\\sqrt{\\sigma_p^2 + \\sigma_z^2}}} \\exp\\left[ -\\frac{(x - \\frac{\\sigma_p^2(x-\\mu_z) + \\sigma_z^2\\mu_p}{}))^2}{2\\left(\\frac{\\sigma_p\\sigma_x}{\\sqrt{\\sigma_z^2+\\sigma_p^2}}\\right)^2}\\right] \\, dx$\n",
+    "\n",
+    "$= \\frac{1}{\\sqrt{2\\pi}\\sqrt{\\sigma_p^2 + \\sigma_z^2}} \\exp\\left[ -\\frac{(x - (\\mu_p + \\mu_z)))^2}{2(\\sigma_z^2+\\sigma_p^2)}\\right] \\int\\limits_{-\\infty}^\\infty\n",
+    "\\frac{1}{\\sqrt{2\\pi}\\frac{\\sigma_p\\sigma_z}{\\sqrt{\\sigma_p^2 + \\sigma_z^2}}} \\exp\\left[ -\\frac{(x - \\frac{\\sigma_p^2(x-\\mu_z) + \\sigma_z^2\\mu_p}{}))^2}{2\\left(\\frac{\\sigma_p\\sigma_x}{\\sqrt{\\sigma_z^2+\\sigma_p^2}}\\right)^2}\\right] \\, dx$\n",
    "\n",
    "The expression inside the integral is a normal distribution. The sum of a normal distribution is one, hence the integral is one. This gives us\n",
    "\n",
@@ -1473,8 +1474,8 @@
    "\n",
    "This is in the form of a normal, where\n",
    "\n",
-    "$$\\mu_x = \\mu_p + \\mu_z \\\\\n",
-    "\\sigma_x^2 = \\sigma_z^2+\\sigma_p^2\\, \\square$$"
+    "$$\\begin{gathered}\\mu_x = \\mu_p + \\mu_z \\\\\n",
+    "\\sigma_x^2 = \\sigma_z^2+\\sigma_p^2\\, \\square\\end{gathered}$$"
   ]
  },
  {
--- a/04-One-Dimensional-Kalman-Filters.ipynb
+++ b/04-One-Dimensional-Kalman-Filters.ipynb
@@ -525,17 +525,19 @@
    "\n",
    "What is the sum of two Gaussians? In the last chapter I proved that:\n",
    "\n",
-    "\n",
-    "$$\\mu = \\mu_1 + \\mu_2 \\\\\n",
+    "$$\\begin{gathered}\n",
+    "\\mu = \\mu_1 + \\mu_2 \\\\\n",
    "\\sigma^2 = \\sigma^2_1 + \\sigma^2_2\n",
-    "$$\n",
+    "\\end{gathered}$$\n",
    "\n",
    "This is fantastic news; the sum of two Gaussians is another Gaussian! \n",
    "\n",
    "The math works, but does this make intuitive sense?  Think of the physical representation of this abstract equation. We have \n",
    "\n",
-    "$$x=\\mathcal N(10, 0.2^2)\\\\\n",
-    "f_x = \\mathcal N (15, 0.7^2)$$\n",
+    "$$\\begin{gathered}\n",
+    "x=\\mathcal N(10, 0.2^2)\\\\\n",
+    "f_x = \\mathcal N (15, 0.7^2)\n",
+    "\\end{gathered}$$\n",
    "\n",
    "If we add these we get:\n",
    "\n",
@@ -1036,8 +1038,7 @@
    "\n",
    "Let's work a few examples. If the measurement is nine times more accurate than the prior, then $\\bar\\sigma^2 = 9\\sigma_z^2$, and\n",
    "\n",
-    "$$\n",
-    "\\begin{aligned}\n",
+    "$$\\begin{aligned}\n",
    "\\mu&=\\frac{9 \\sigma_z^2 \\mu_z + \\sigma_z^2\\, \\bar\\mu} {9 \\sigma_z^2 + \\sigma_\\mathtt{z}^2} \\\\\n",
    "&= \\left(\\frac{9}{10}\\right) \\mu_z + \\left(\\frac{1}{10}\\right) \\bar\\mu\n",
    "\\end{aligned}\n",
@@ -1047,8 +1048,10 @@
    "\n",
    "If the measurement and prior are equally accurate, then $\\bar\\sigma^2 = \\sigma_z^2$ and\n",
    "\n",
-    "$$\\mu=\\frac{\\sigma_z^2\\,  (\\bar\\mu + \\mu_z)}{2\\sigma_\\mathtt{z}^2}\\\\\n",
-    "= \\left(\\frac{1}{2}\\right)\\bar\\mu + \\left(\\frac{1}{2}\\right)\\mu_z$$\n",
+    "$$\\begin{gathered}\n",
+    "\\mu=\\frac{\\sigma_z^2\\,  (\\bar\\mu + \\mu_z)}{2\\sigma_\\mathtt{z}^2} \\\\\n",
+    "= \\left(\\frac{1}{2}\\right)\\bar\\mu + \\left(\\frac{1}{2}\\right)\\mu_z\n",
+    "\\end{gathered}$$\n",
    "\n",
    "which is the average of the two means. It makes intuitive sense to take the average of two equally accurate values.\n",
    "\n",
--- a/06-Multivariate-Kalman-Filters.ipynb
+++ b/06-Multivariate-Kalman-Filters.ipynb
@@ -809,7 +809,7 @@
    "We can rewrite this in matrix form as\n",
    "\n",
    "$$\\begin{aligned}\n",
-    "{\\overline{\\begin{bmatrix}x\\\\\\dot x\\end{bmatrix}}} &= \\begin{bmatrix}1&\\Delta t  \\\\ 0&1\\end{bmatrix}  \\begin{bmatrix}x \\\\ \\dot x\\end{bmatrix}\\\\\n",
+    "\\begin{bmatrix}\\bar x \\\\ \\bar{\\dot x}\\end{bmatrix} &= \\begin{bmatrix}1&\\Delta t  \\\\ 0&1\\end{bmatrix}  \\begin{bmatrix}x \\\\ \\dot x\\end{bmatrix}\\\\\n",
    "\\mathbf{\\bar x} &= \\mathbf{Fx}\n",
    "\\end{aligned}$$\n",
    "\n",
@@ -1615,14 +1615,15 @@
    "\n",
    "the value for $\\mathbf{FPF}^\\mathsf T$ is:\n",
    "\n",
-    "$$\\mathbf{FPF}^\\mathsf T = \\begin{bmatrix}1&\\Delta t\\\\0&1\\end{bmatrix}\n",
+    "$$\\begin{aligned}\n",
+    "\\mathbf{FPF}^\\mathsf T &= \\begin{bmatrix}1&\\Delta t\\\\0&1\\end{bmatrix}\n",
    "\\begin{bmatrix}\\sigma^2_x & 0 \\\\  0 & \\sigma^2_{v}\\end{bmatrix}\n",
    "\\begin{bmatrix}1&0\\\\\\Delta t&1\\end{bmatrix} \\\\\n",
-    "= \\begin{bmatrix}\\sigma^2_x&\\sigma_v^2\\Delta t\\\\  0 & \\sigma^2_{v}\\end{bmatrix}\n",
+    "&= \\begin{bmatrix}\\sigma^2_x&\\sigma_v^2\\Delta t\\\\  0 & \\sigma^2_{v}\\end{bmatrix}\n",
    "\\begin{bmatrix}1&0\\\\\\Delta t&1\\end{bmatrix} \\\\\n",
-    "= \\begin{bmatrix}\\sigma^2_x +  \\sigma_v^2\\Delta t^2  &  \\sigma_v^2\\Delta t \\\\\n",
+    "&= \\begin{bmatrix}\\sigma^2_x +  \\sigma_v^2\\Delta t^2  &  \\sigma_v^2\\Delta t \\\\\n",
    "\\sigma_v^2\\Delta t & \\sigma^2_{v}\\end{bmatrix}\n",
-    "$$\n",
+    "\\end{aligned}$$\n",
    "\n",
    "The initial value for $\\mathbf P$ had no covariance between the position and velocity.  Position is computed as $\\dot x\\Delta t + x$, so there is a correlation between the position and velocity. The multiplication $\\mathbf{FPF}^\\mathsf T$ computes a covariance of $\\sigma_v^2 \\Delta t$. The exact value is not important; you just need to recognize that $\\mathbf{FPF}^\\mathsf T$ uses the process model to automatically compute the covariance between the position and velocity!\n",
    "\n",
--- a/07-Kalman-Filter-Math.ipynb
+++ b/07-Kalman-Filter-Math.ipynb
@@ -327,9 +327,9 @@
    "\n",
    "Modeling dynamic systems is properly the topic of several college courses. To an extent there is no substitute for a few semesters of ordinary and partial differential equations followed by a graduate course in control system theory. If you are a hobbyist, or trying to solve one very specific filtering problem at work you probably do not have the time and/or inclination to devote a year or more to that education.\n",
    "\n",
-    "Fortunately, I can present enough of the theory to allow us to create the system equations for many different Kalman filters. My goal is to get you to the stage where you can read a publication and understand it well enough to implement the algorithms. The background math is deep, but in practice we end up using a few simple techniques over and over again. \n",
+    "Fortunately, I can present enough of the theory to allow us to create the system equations for many different Kalman filters. My goal is to get you to the stage where you can read a publication and understand it well enough to implement the algorithms. The background math is deep, but in practice we end up using a few simple techniques. \n",
    "\n",
-    "This is the longest section of pure math in this book. You will need to master everything in this section to understand the Extended Kalman filter (EKF), the workhorse nonlinear filter. I do cover more modern filters that do not require as much of this math. You can choose so skim now, and come back to this if you decide to learn the EKF.\n",
+    "This is the longest section of pure math in this book. You will need to master everything in this section to understand the Extended Kalman filter (EKF), the most common nonlinear filter. I do cover more modern filters that do not require as much of this math. You can choose to skim now, and come back to this if you decide to learn the EKF.\n",
    "\n",
    "We need to start by understanding the underlying equations and assumptions that the Kalman filter uses. We are trying to model real world phenomena, so what do we have to consider?\n",
    "\n",
@@ -348,9 +348,9 @@
    "\\quad \\mathbf a = \\frac{d \\mathbf v}{d t} = \\frac{d^2 \\mathbf x}{d t^2}\n",
    "$$\n",
    "\n",
-    "A typical automobile tracking problem would have you compute the distance traveled given a constant velocity or acceleration as we did in previous chapters. But, of course we know this is not all that is happening. No car travels on a perfect road. There are bumps that cause the car to slow down, there is wind drag, there are hills that raise and lower the speed. The suspension is a mechanical system with friction and imperfect springs. Gusts of wind alter the car's state.\n",
+    "A typical automobile tracking problem would have you compute the distance traveled given a constant velocity or acceleration, as we did in previous chapters. But, of course we know this is not all that is happening. No car travels on a perfect road. There are bumps, wind drag, and hills that raise and lower the speed. The suspension is a mechanical system with friction and imperfect springs.\n",
    "\n",
-    "Acurately modeling a system with linear equations is impossible except for the most trivial problems. So control theory is forced to make a simplification. At any time $t$ we say that the true state (such as the position of our car) is the predicted value from the imperfect model plus some unknown *process noise*:\n",
+    "Perfectly modeling a system is impossible except for the most trivial problems. We are forced to make a simplification. At any time $t$ we say that the true state (such as the position of our car) is the predicted value from the imperfect model plus some unknown *process noise*:\n",
    "\n",
    "$$\n",
    "x(t) = x_{pred}(t) + noise(t)\n",
@@ -358,7 +358,7 @@
    "\n",
    "This is not meant to imply that $noise(t)$ is a function that we can derive analytically. It is merely a statement of fact - we can always describe the true value as the predicted value  plus the process noise. \"Noise\" does not imply random events. If we are tracking a thrown ball in the atmosphere, and our model assumes the ball is in a vacuum, then the effect of air drag is process noise in this context.\n",
    "\n",
-    "In the next section we will learn techniques to convert a set of differential equations into a set of first-order differential equations.  Assuming that, we can say that our model of the system without noise is:\n",
+    "In the next section we will learn techniques to convert a set of higher order differential equations into a set of first-order differential equations.  After the conversion the model of the system without noise is:\n",
    "\n",
    "$$ \\dot{\\mathbf x} = \\mathbf{Ax}$$\n",
    "\n",
@@ -372,7 +372,7 @@
    "\n",
    "$$ \\dot{\\mathbf x} = \\mathbf{Ax} + \\mathbf{Bu} + \\mathbf{w}$$\n",
    "\n",
-    "And that's it. That is one of the equations that Dr. Kalman set out to solve, and he found an optimal esitmator if we assume certain properties of $\\mathbf w$."
+    "And that's it. That is one of the equations that Dr. Kalman set out to solve, and he found an optimal estimator if we assume certain properties of $\\mathbf w$."
   ]
  },
  {
@@ -386,11 +386,11 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "In the last section we derived the equation\n",
+    "We've derived the equation\n",
    "\n",
-    "$$ \\dot{\\mathbf x} = \\mathbf{Ax}+ \\mathbf{Bu} + \\mathbf{w}$$.\n",
+    "$$ \\dot{\\mathbf x} = \\mathbf{Ax}+ \\mathbf{Bu} + \\mathbf{w}$$\n",
    "\n",
-    "However, for our filters we are not interested in the derivative of $\\mathbf x$, but in $\\mathbf x$ itself. Ignoring the noise for a moment, we want an equation that recusively finds the value of $\\mathbf x$ at time $t_k$ in terms of $\\mathbf x$ at time $t_{k-1}$:\n",
+    "However, we are not interested in the derivative of $\\mathbf x$, but in $\\mathbf x$ itself. Ignoring the noise for a moment, we want an equation that recusively finds the value of $\\mathbf x$ at time $t_k$ in terms of $\\mathbf x$ at time $t_{k-1}$:\n",
    "\n",
    "$$\\mathbf x(t_k) = \\mathbf F(\\Delta t)\\mathbf x(t_{k-1}) + \\mathbf B(t_k) + \\mathbf u (t_k)$$\n",
    "\n",
@@ -399,13 +399,13 @@
    "\n",
    "$$\\mathbf x_k = \\mathbf{Fx}_{k-1} + \\mathbf B_k\\mathbf u_k$$\n",
    "\n",
-    "$\\mathbf F$ is the familiar *state transition matrix*, named due to its ability to transition the state from the previous time step to the current time step. It is very similar to the system dynamics matrix $\\mathbf A$. The difference is that the system dynamics matrix $\\mathbf A$ models a set of linear differential equations, and is continuous. $\\mathbf F$ is discrete, and represents a set of linear equations (not differential equations) which step transition $\\mathbf x_{k-1}$ to $\\mathbf x_k$ over a discrete time step $\\Delta t$. \n",
+    "$\\mathbf F$ is the familiar *state transition matrix*, named due to its ability to transition the state's value between discrete time steps. It is very similar to the system dynamics matrix $\\mathbf A$. The difference is that $\\mathbf A$ models a set of linear differential equations, and is continuous. $\\mathbf F$ is discrete, and represents a set of linear equations (not differential equations) which transitions $\\mathbf x_{k-1}$ to $\\mathbf x_k$ over a discrete time step $\\Delta t$. \n",
    "\n",
-    "Normally finding this equation is quite difficult. The equation $\\dot x = v$ is the simplest possible differential equation and we trivially integrate it as:\n",
+    "Finding this matrix is often quite difficult. The equation $\\dot x = v$ is the simplest possible differential equation and we trivially integrate it as:\n",
    "\n",
-    "$$ \\int\\limits_{x_{k-1}}^{x_k}  \\mathrm{d}x = \\int\\limits_{0}^{\\Delta t} v\\, \\mathrm{d}t \\\\\n",
-    "x_k-x_0 = v \\Delta t \\\\\n",
-    "x_k = v \\Delta t + x_0$$\n",
+    "$$ \\int\\limits_{x_{k-1}}^{x_k}  \\mathrm{d}x = \\int\\limits_{0}^{\\Delta t} v\\, \\mathrm{d}t $$\n",
+    "$$x_k-x_0 = v \\Delta t$$\n",
+    "$$x_k = v \\Delta t + x_0$$\n",
    "\n",
    "This equation is *recursive*: we compute the value of $x$ at time $t$ based on its value at time $t-1$. This recursive form enables us to represent the system (process model) in the form required by the Kalman filter:\n",
    "\n",
@@ -439,26 +439,26 @@
    "State-space methods require first-order equations. Any higher order system of equations can be reduced to first-order by defining extra variables for the derivatives and then solving. \n",
    "\n",
    "\n",
-    "Let's do an example. Given the system $\\ddot{x} - 6\\dot x + 9x = t$ find the equivalent first order equations. I've used the dot notation for the time derivatives for clarity.\n",
+    "Let's do an example. Given the system $\\ddot{x} - 6\\dot x + 9x = u$ find the equivalent first order equations. I've used the dot notation for the time derivatives for clarity.\n",
    "\n",
    "The first step is to isolate the highest order term onto one side of the equation.\n",
    "\n",
-    "$$\\ddot{x} = 6\\dot x - 9x + t$$\n",
+    "$$\\ddot{x} = 6\\dot x - 9x + u$$\n",
    "\n",
    "We define two new variables:\n",
    "\n",
-    "$$ x_1(t) = x \\\\\n",
-    "x_2(t) = \\dot x\n",
-    "$$\n",
+    "$$\\begin{aligned} x_1(u) &= x \\\\\n",
+    "x_2(u) &= \\dot x\n",
+    "\\end{aligned}$$\n",
    "\n",
-    "Now we will substitute these into the original equation and solve. The solution yields a set of first-order equations in terms of these new variables. It is conventional to drop the $(t)$ for notational convenience.\n",
+    "Now we will substitute these into the original equation and solve. The solution yields a set of first-order equations in terms of these new variables. It is conventional to drop the $(u)$ for notational convenience.\n",
    "\n",
    "We know that $\\dot x_1 = x_2$ and that $\\dot x_2 = \\ddot{x}$. Therefore\n",
    "\n",
    "$$\\begin{aligned}\n",
    "\\dot x_2 &= \\ddot{x} \\\\\n",
-    "          &= 6\\dot x - 9x + t\\\\\n",
-    "          &= 6x_2-9x_1 + t\n",
+    "         &= 6\\dot x - 9x + t\\\\\n",
+    "         &= 6x_2-9x_1 + t\n",
    "\\end{aligned}$$\n",
    "\n",
    "Therefore our first-order system of equations is\n",
@@ -522,9 +522,7 @@
    "\n",
    "$$\\mathbf x_k = \\mathbf {Fx}_{k-1}$$\n",
    "\n",
-    "$\\mathbf x_k$ does not mean the k$^{th}$ value of $\\mathbf x$, but the value of $\\mathbf x$ at the k$^{th}$ value of $t$. In this book I go even further and drop the suffixes entirely in favor of an overline to denote the prediction: $\\overline{\\mathbf x} = \\mathbf{Fx}$.\n",
-    "\n",
-    "Broadly speaking there are three common ways to find this matrix for Kalman filters. The technique most often used with Kalman filters is to use a matrix exponential. Linear Time Invariant Theory, also known as LTI System Theory, is a second technique. Finally, there are numerical techniques. You may know of others, but these three are what you will most likely encounter in the Kalman filter literature and praxis."
+    "Broadly speaking there are three common ways to find this matrix for Kalman filters. The technique most often used is the matrix exponential. Linear Time Invariant Theory, also known as LTI System Theory, is a second technique. Finally, there are numerical techniques. You may know of others, but these three are what you will most likely encounter in the Kalman filter literature and praxis."
   ]
  },
  {
@@ -535,13 +533,13 @@
    "\n",
    "The solution to the equation $\\frac{dx}{dt} = kx$ can be found by:\n",
    "\n",
-    "$$\\frac{dx}{dt} = kx \\\\\n",
+    "$$\\begin{gathered}\\frac{dx}{dt} = kx \\\\\n",
    "\\frac{dx}{x} = k\\, dt \\\\\n",
    "\\int \\frac{1}{x}\\, dx = \\int k\\, dt \\\\\n",
    "\\log x = kt + c \\\\\n",
    "x = e^{kt+c} \\\\\n",
    "x = e^ce^{kt} \\\\\n",
-    "x = c_0e^{kt}$$\n",
+    "x = c_0e^{kt}\\end{gathered}$$\n",
    "\n",
    "Using similar math, the solution to the first-order equation \n",
    "\n",
@@ -610,10 +608,10 @@
    "\n",
    "We can solve these equations by integrating each side. I demonstrated integrating the time invariant system $v = \\dot x$ above. However, integrating the time invariant equation $\\dot x = f(x)$ is not so straightforward. Using the *separation of variables* technique we divide by $f(x)$ and move the $dt$ term to the right so we can integrate each side:\n",
    "\n",
-    "$$\n",
+    "$$\\begin{gathered}\n",
    "\\frac{dx}{dt} = f(x) \\\\\n",
-    "\\int^x_{x_0} \\frac{1}{f(x)} dx = \\int^t_{t_0} dt\\\\\n",
-    "$$\n",
+    "\\int^x_{x_0} \\frac{1}{f(x)} dx = \\int^t_{t_0} dt\n",
+    "\\end{gathered}$$\n",
    "\n",
    "If we let $F(x) = \\int \\frac{1}{f(x)} dx$ we get\n",
    "\n",
@@ -621,8 +619,10 @@
    "\n",
    "We then solve for x with\n",
    "\n",
-    "$$F(x) = t - t_0 + F(x_0) \\\\\n",
-    "x = F^{-1}[t-t_0 + F(x_0)]$$\n",
+    "$$\\begin{gathered}\n",
+    "F(x) = t - t_0 + F(x_0) \\\\\n",
+    "x = F^{-1}[t-t_0 + F(x_0)]\n",
+    "\\end{gathered}$$\n",
    "\n",
    "In other words, we need to find the inverse of $F$. This is not trivial, and a significant amount of coursework in a STEM education is devoted to finding tricky, analytic solutions to this problem. \n",
    "\n",
@@ -1259,7 +1259,9 @@
    "\n",
    "Let's say we have the initial condition problem of \n",
    "\n",
-    "$$ y' = y, \\\\ y(0) = 1$$\n",
+    "$$\\begin{gathered}\n",
+    "y' = y, \\\\ y(0) = 1\n",
+    "\\end{gathered}$$\n",
    "\n",
    "We happen to know the exact answer is $y=e^t$ because we solved it earlier, but for an arbitrary ODE we will not know the exact solution. In general all we know is the derivative of the equation, which is equal to the slope. We also know the initial value: at $t=0$, $y=1$. If we know these two pieces of information we can predict the value at $y(t=1)$ using the slope at $t=0$ and the value of $y(0)$. I've plotted this below."
   ]
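The prediction described here is a single Euler step: start at $y(0) = 1$, read the slope $y' = y = 1$, and extend it across the interval to $t = 1$. A minimal sketch, assuming a hypothetical helper `euler` that is not the book's own code:

```python
import math


def euler(f, t0, y0, t_end, steps):
    """Integrate y' = f(t, y) from t0 to t_end with fixed-size Euler steps."""
    h = (t_end - t0) / steps
    t, y = t0, y0
    for _ in range(steps):
        y += h * f(t, y)  # follow the current slope for one step
        t += h
    return y


def slope(t, y):
    """Right-hand side of y' = y."""
    return y


one_step = euler(slope, 0.0, 1.0, 1.0, 1)
many_steps = euler(slope, 0.0, 1.0, 1.0, 1000)
print(one_step, many_steps, math.e)
```

One step across the whole interval gives 2, well short of $e$; shrinking the step size brings the estimate much closer to the exact answer.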
@@ -1718,9 +1720,9 @@
    "\n",
    "Here we will say that $\\mu_1$ is the state $x$, and $\\mu_2$ is the measurement $z$. Thus it follows that $\\sigma_1^2$ is the state uncertainty $P$, and $\\sigma_2^2$ is the measurement noise $R$. Let's substitute those in.\n",
    "\n",
-    "$$ \\mu = \\frac{Pz + Rx}{P+R} \\\\\n",
-    "\\sigma^2 = \\frac{1}{\\frac{1}{P} + \\frac{1}{R}}\n",
-    "$$\n",
+    "$$\\begin{aligned} \\mu &= \\frac{Pz + Rx}{P+R} \\\\\n",
+    "\\sigma^2 &= \\frac{1}{\\frac{1}{P} + \\frac{1}{R}}\n",
+    "\\end{aligned}$$\n",
    "\n",
    "I will handle $\\mu$ first. The corresponding equation in the multivariate case is\n",
    "\n",
@@ -1785,9 +1787,10 @@
    "\n",
    "We are almost done, but recall that the variance of the estimate is given by \n",
    "\n",
-    "$${\\sigma_{x}^2} = \\frac{1}{ \\frac{1}{\\sigma_1^2} +  \\frac{1}{\\sigma_2^2}}\\\\\n",
-    "= \\frac{1}{ \\frac{1}{a} +  \\frac{1}{b}}\n",
-    "$$\n",
+    "$$\\begin{aligned}\n",
+    "\\sigma_{x}^2 &= \\frac{1}{\\frac{1}{\\sigma_1^2} +  \\frac{1}{\\sigma_2^2}} \\\\\n",
+    "&= \\frac{1}{\\frac{1}{a} +  \\frac{1}{b}}\n",
+    "\\end{aligned}$$\n",
    "\n",
    "We can incorporate that term into our equation above by observing that\n",
    "\n",
--- a/08-Designing-Kalman-Filters.ipynb
+++ b/08-Designing-Kalman-Filters.ipynb
@@ -446,7 +446,7 @@
   "source": [
    "Our next step is to design the state transition function. Recall that the state transition function is implemented as a matrix $\\mathbf F$ that we multiply with the previous state of our system to get the next state, like so. \n",
    "\n",
-    "$$\\mathbf{\\overline x} = \\mathbf{Fx}$$\n",
+    "$$\\mathbf{\\bar x} = \\mathbf{Fx}$$\n",
    "\n",
    "I will not belabor this as it is very similar to the 1-D case we did in the previous chapter. The state equations are\n",
    "\n",
@@ -2094,8 +2094,8 @@
    "The residual and system uncertainty of the filter is defined as\n",
    "\n",
    "$$\\begin{aligned}\n",
-    "\\mathbf y &= \\mathbf z - \\mathbf{H \\overline x}\\\\\n",
-    "\\mathbf S &= \\mathbf{H\\overline{P}H}^\\mathsf T + \\mathbf R\n",
+    "\\mathbf y &= \\mathbf z - \\mathbf{H \\bar x}\\\\\n",
+    "\\mathbf S &= \\mathbf{H\\bar{P}H}^\\mathsf T + \\mathbf R\n",
    "\\end{aligned}\n",
    "$$\n",
    "\n",
@@ -2218,7 +2218,7 @@
    "\n",
    "In this chapter we learned that the equation for the state prediction is:\n",
    "\n",
-    "$$\\overline{\\mathbf x} = \\mathbf{Fx} + \\mathbf{Bu}$$\n",
+    "$$\\bar{\\mathbf x} = \\mathbf{Fx} + \\mathbf{Bu}$$\n",
    "\n",
    "Our state is a vector, so we need to represent the control input as a vector. Here $\\mathbf{u}$ is the control input, and $\\mathbf{B}$ is a matrix that transforms the control input into a change in $\\mathbf x$. Let's consider a simple example. Suppose the state is $x = \\begin{bmatrix} x & \\dot x\\end{bmatrix}$ for a robot we are controlling and the control input is commanded velocity. This gives us a control input of \n",
    "\n",
@@ -2229,7 +2229,7 @@
    "$$\\begin{aligned}x &= x + \\dot x_\\mathtt{cmd} \\Delta t \\\\\n",
    "\\dot x &= \\dot x_\\mathtt{cmd}\\end{aligned}$$\n",
    "\n",
-    "We need to represent this set of equation in the form $\\overline{\\mathbf x} = \\mathbf{Fx} + \\mathbf{Bu}$."
+    "We need to represent this set of equation in the form $\\bar{\\mathbf x} = \\mathbf{Fx} + \\mathbf{Bu}$."
   ]
  },
  {
@@ -2885,7 +2885,7 @@
    "\n",
    "\n",
    "$$\\begin{aligned}\n",
-    "\\mathbf{\\overline x} &= {\\begin{bmatrix}x\\\\\\dot x\\end{bmatrix}}^- \\\\\n",
+    "\\mathbf{\\bar x} &= {\\begin{bmatrix}x\\\\\\dot x\\end{bmatrix}}^- \\\\\n",
    "\\mathbf F &= \\begin{bmatrix}1&\\Delta t  \\\\ 0&1\\end{bmatrix} \n",
    "\\end{aligned}$$\n",
    "\n",
@@ -3235,7 +3235,7 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "We might think to use the same state variables as used for tracking the dog. However, this will not work. Recall that the Kalman filter state transition must be written as $\\mathbf{\\overline x} = \\mathbf{Fx} + \\mathbf{Bu}$, which means we must calculate the current state from the previous state. Our assumption is that the ball is traveling in a vacuum, so the velocity in x is a constant, and the acceleration in y is solely due to the gravitational constant $g$. We can discretize the Newtonian equations using the well known Euler method in terms of $\\Delta t$ are:\n",
+    "We might think to use the same state variables as used for tracking the dog. However, this will not work. Recall that the Kalman filter state transition must be written as $\\mathbf{\\bar x} = \\mathbf{Fx} + \\mathbf{Bu}$, which means we must calculate the current state from the previous state. Our assumption is that the ball is traveling in a vacuum, so the velocity in x is a constant, and the acceleration in y is solely due to the gravitational constant $g$. We can discretize the Newtonian equations using the well known Euler method in terms of $\\Delta t$ are:\n",
    "\n",
    "$$\\begin{aligned}\n",
    "x_t &=  x_{t-1} + v_{x(t-1)} {\\Delta t} \\\\\n",
@@ -3246,7 +3246,7 @@
    "\\end{aligned}\n",
    "$$\n",
    "\n",
-    "> **sidebar**: *Euler's method integrates a differential equation stepwise by assuming the slope (derivative) is constant at time $t$. In this case the derivative of the position is velocity. At each time step $\\Delta t$ we assume a constant velocity, compute the new position, and then update the velocity for the next time step. There are more accurate methods, such as Runge-Kutta available to us, but because we are updating the state with a measurement in each step Euler's method is very accurate.* If you need to use Runge-Kutta you will have to write your own `predict()` function which computes the state transition for $\\mathbf x$, and then uses the normal Kalman filter equation $\\mathbf{\\overline{P}}=\\mathbf{FPF}^\\mathsf T + \\mathbf Q$ to update the covariance matrix.\n",
+    "> **sidebar**: *Euler's method integrates a differential equation stepwise by assuming the slope (derivative) is constant at time $t$. In this case the derivative of the position is velocity. At each time step $\\Delta t$ we assume a constant velocity, compute the new position, and then update the velocity for the next time step. There are more accurate methods, such as Runge-Kutta available to us, but because we are updating the state with a measurement in each step Euler's method is very accurate.* If you need to use Runge-Kutta you will have to write your own `predict()` function which computes the state transition for $\\mathbf x$, and then uses the normal Kalman filter equation $\\mathbf{\\bar P}=\\mathbf{FPF}^\\mathsf T + \\mathbf Q$ to update the covariance matrix.\n",
    "\n",
    "This implies that we need to incorporate acceleration for $y$ into the Kalman filter, but not for $x$. This suggests the following state variables.\n",
    "\n",
@@ -3263,7 +3263,7 @@
    "\n",
    "However, the acceleration is due to gravity, which is a constant. Instead of asking the Kalman filter to track a constant we can treat gravity as what it really is - a control input. In other words, gravity is a force that alters the behavior of the system in a known way, and it is applied throughout the flight of the ball. \n",
    "\n",
-    "The equation for the state prediction is $\\mathbf{\\overline x} = \\mathbf{Fx} + \\mathbf{Bu}$. $\\mathbf{Fx}$ is the familiar state transition function which we will use to model the position and velocity of the ball. The vector $\\mathbf{u}$ lets you specify a control input into the filter. For a car the control input will be things such as the amount the accelerator and brake are pressed, the position of the steering wheel, and so on. For our ball the control input will be gravity. The matrix $\\mathbf{B}$ models how the control inputs affect the behavior of the system. Again, for a car $\\mathbf{B}$ will convert the inputs of the brake and accelerator into changes of velocity, and the input of the steering wheel into a different position and heading. For our ball tracking problem it will compute the velocity change due to gravity. We will go into the details of that in step 3. For now, we design the state variable to be\n",
+    "The equation for the state prediction is $\\mathbf{\\bar x} = \\mathbf{Fx} + \\mathbf{Bu}$. $\\mathbf{Fx}$ is the familiar state transition function which we will use to model the position and velocity of the ball. The vector $\\mathbf{u}$ lets you specify a control input into the filter. For a car the control input will be things such as the amount the accelerator and brake are pressed, the position of the steering wheel, and so on. For our ball the control input will be gravity. The matrix $\\mathbf{B}$ models how the control inputs affect the behavior of the system. Again, for a car $\\mathbf{B}$ will convert the inputs of the brake and accelerator into changes of velocity, and the input of the steering wheel into a different position and heading. For our ball tracking problem it will compute the velocity change due to gravity. We will go into the details of that in step 3. For now, we design the state variable to be\n",
    "\n",
    "$$\n",
    "\\mathbf x = \n",
@ -3324,7 +3324,7 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "We will use the control input to account for the force of gravity. The term $\\mathbf{Bu}$ is added to $\\mathbf{\\overline x}$ to account for how much $\\mathbf{\\overline x}$ changes due to gravity. We can say that  $\\mathbf{Bu}$ contains $\\begin{bmatrix}\\Delta x_g & \\Delta \\dot{x_g} & \\Delta y_g & \\Delta \\dot{y_g}\\end{bmatrix}^\\mathsf T$.\n",
+    "We will use the control input to account for the force of gravity. The term $\\mathbf{Bu}$ is added to $\\mathbf{\\bar x}$ to account for how much $\\mathbf{\\bar x}$ changes due to gravity. We can say that  $\\mathbf{Bu}$ contains $\\begin{bmatrix}\\Delta x_g & \\Delta \\dot{x_g} & \\Delta y_g & \\Delta \\dot{y_g}\\end{bmatrix}^\\mathsf T$.\n",
    "\n",
    "If we look at the discretized equations we see that gravity only affect the velocity for $y$.\n",
    "\n",
@ -3577,14 +3577,17 @@
    "\n",
    "where $B_2$ is a coefficient derived experimentally, and $v$ is the velocity of the object. $F_{drag}$ can be factored into $x$ and $y$ components with\n",
    "\n",
-    "$$F_{drag,x} = -B_2v v_x\\\\\n",
-    "F_{drag,y} = -B_2v v_y\n",
-    "$$\n",
+    "$$\\begin{aligned}\n",
+    "F_{drag,x} &= -B_2v v_x\\\\\n",
+    "F_{drag,y} &= -B_2v v_y\n",
+    "\\end{aligned}$$\n",
    "\n",
    "If $m$ is the mass of the ball, we can use $F=ma$ to compute the acceleration as\n",
    "\n",
-    "$$ a_x = -\\frac{B_2}{m}v v_x\\\\\n",
-    "a_y = -\\frac{B_2}{m}v v_y$$\n",
+    "$$\\begin{aligned} \n",
+    "a_x &= -\\frac{B_2}{m}v v_x\\\\\n",
+    "a_y &= -\\frac{B_2}{m}v v_y\n",
+    "\\end{aligned}$$\n",
    "\n",
    "Giordano provides the following function for $\\frac{B_2}{m}$, which takes air density, the cross section of a baseball, and its roughness into account. Understand that this is an approximation based on wind tunnel tests and several simplifying assumptions. It is in SI units: velocity is in meters/sec and time is in seconds.\n",
    "\n",
--- a/10-Unscented-Kalman-Filter.ipynb
+++ b/10-Unscented-Kalman-Filter.ipynb
@ -515,7 +515,7 @@
    "\n",
    "The first two equations are the constraint that the weights must sum to one. The third equation is how you compute a weight mean. The forth equation may be less familiar, but recall that the equation for the covariance of two random variables is:\n",
    "\n",
-    "$$COV(x,y) = \\frac{\\sum(x-\\overline x)(y-\\bar{y})}{n}$$\n",
+    "$$COV(x,y) = \\frac{\\sum(x-\\bar x)(y-\\bar{y})}{n}$$\n",
    "\n",
    "These constraints do not form a unique solution. For example, if you make $w^m_0$ smaller you can compensate by making $w^m_1$ and $w^m_2$ larger. You can use different weights for the mean and covariances, or the same weights. Indeed, these equations do not require that any of the points be the mean of the input at all, though it seems 'nice' to do so, so to speak.\n",
    "\n",
@ -733,14 +733,14 @@
   "source": [
    "We compute the mean and covariance of the prior using the *unscented transform* on the transformed sigma points.  \n",
    "\n",
-    "$$\\mathbf{\\overline x}, \\mathbf{\\overline P} = \n",
+    "$$\\mathbf{\\bar x}, \\mathbf{\\bar P} = \n",
    "UT(\\mathcal{Y}, w_m, w_c, \\mathbf Q)$$\n",
    "\n",
    "The unscented transform is computed with the equations below. I've dropped the subscript $i$ for readability.\n",
    "\n",
    "$$\\begin{aligned}\n",
-    "\\mathbf{\\overline x} &= \\sum w^m\\boldsymbol{\\mathcal{Y}} \\\\\n",
-    "\\mathbf{\\overline P} &= \\sum w^c({\\boldsymbol{\\mathcal{Y}} - \\mathbf{\\overline x})(\\boldsymbol{\\mathcal{Y}}-\\mathbf{\\overline x})^\\mathsf{T}} + \\mathbf Q\n",
+    "\\mathbf{\\bar x} &= \\sum w^m\\boldsymbol{\\mathcal{Y}} \\\\\n",
+    "\\mathbf{\\bar P} &= \\sum w^c({\\boldsymbol{\\mathcal{Y}} - \\mathbf{\\bar x})(\\boldsymbol{\\mathcal{Y}}-\\mathbf{\\bar x})^\\mathsf{T}} + \\mathbf Q\n",
    "\\end{aligned}\n",
    "$$\n",
    "\n",
@ -755,10 +755,10 @@
    "\\text{Kalman} & \\text{Unscented} \\\\\n",
    "\\hline \n",
    "& \\boldsymbol{\\mathcal Y} = f(\\boldsymbol\\chi) \\\\\n",
-    "\\mathbf{\\overline x} = \\mathbf{Fx} & \n",
-    "\\mathbf{\\overline x} = \\sum w^m\\boldsymbol{\\mathcal Y}  \\\\\n",
-    "\\mathbf{\\overline P} = \\mathbf{FPF}^\\mathsf T + \\mathbf Q  & \n",
-    "\\mathbf{\\overline P} = \\sum w^c({\\boldsymbol{\\mathcal Y} - \\mathbf{\\overline x})(\\boldsymbol{\\mathcal Y} - \\mathbf{\\overline x})^\\mathsf T}+\\mathbf Q\n",
+    "\\mathbf{\\bar x} = \\mathbf{Fx} & \n",
+    "\\mathbf{\\bar x} = \\sum w^m\\boldsymbol{\\mathcal Y}  \\\\\n",
+    "\\mathbf{\\bar P} = \\mathbf{FPF}^\\mathsf T + \\mathbf Q  & \n",
+    "\\mathbf{\\bar P} = \\sum w^c({\\boldsymbol{\\mathcal Y} - \\mathbf{\\bar x})(\\boldsymbol{\\mathcal Y} - \\mathbf{\\bar x})^\\mathsf T}+\\mathbf Q\n",
    "\\end{array}$$"
   ]
  },
@ -793,7 +793,7 @@
    "\n",
    "To compute the Kalman gain we first compute the [cross covariance](https://en.wikipedia.org/wiki/Cross-covariance) of the state and the measurements, which is defined as: \n",
    "\n",
-    "$$\\mathbf P_{xz} =\\sum w^c(\\boldsymbol\\chi-\\mathbf{\\overline x})(\\boldsymbol{\\mathcal Z}-\\boldsymbol\\mu_z)^\\mathsf T$$\n",
+    "$$\\mathbf P_{xz} =\\sum w^c(\\boldsymbol\\chi-\\mathbf{\\bar x})(\\boldsymbol{\\mathcal Z}-\\boldsymbol\\mu_z)^\\mathsf T$$\n",
    "\n",
    "And then the Kalman gain is defined as\n",
    "\n",
@ -806,11 +806,11 @@
    "\n",
    "Finally, we compute the new state estimate using the residual and Kalman gain:\n",
    "\n",
-    "$$\\mathbf x = \\overline{\\mathbf x} + \\mathbf{Ky}$$\n",
+    "$$\\mathbf x = \\bar{\\mathbf x} + \\mathbf{Ky}$$\n",
    "\n",
    "and the new covariance is computed as:\n",
    "\n",
-    "$$ \\mathbf P = \\mathbf{\\overline P} - \\mathbf{KP_z}\\mathbf{K}^\\mathsf{T}$$\n",
+    "$$ \\mathbf P = \\mathbf{\\bar P} - \\mathbf{KP_z}\\mathbf{K}^\\mathsf{T}$$\n",
    "\n",
    "This step contains a few equations you have to take on faith, but you should be able to see how they relate to the linear Kalman filter equations. The linear algebra is slightly different from the linear Kalman filter, but the algorithm is the same Bayesian algorithm we have been implementing throughout the book. \n",
    "\n",
@ -825,21 +825,21 @@
    "\\textrm{Kalman Filter} & \\textrm{Unscented Kalman Filter} \\\\\n",
    "\\hline \n",
    "& \\boldsymbol{\\mathcal Y} = f(\\boldsymbol\\chi) \\\\\n",
-    "\\mathbf{\\overline x} = \\mathbf{Fx} & \n",
-    "\\mathbf{\\overline x} = \\sum w^m\\boldsymbol{\\mathcal Y}  \\\\\n",
-    "\\mathbf{\\overline P} = \\mathbf{FPF}^\\mathsf T+\\mathbf Q  & \n",
-    "\\mathbf{\\overline P} = \\sum w^c({\\boldsymbol{\\mathcal Y}-\\overline{\\boldsymbol\\mu})(\\boldsymbol{\\mathcal Y} - \\bar{\\boldsymbol\\mu})^\\mathsf{T}} + \\mathbf Q \\\\\n",
+    "\\mathbf{\\bar x} = \\mathbf{Fx} & \n",
+    "\\mathbf{\\bar x} = \\sum w^m\\boldsymbol{\\mathcal Y}  \\\\\n",
+    "\\mathbf{\\bar P} = \\mathbf{FPF}^\\mathsf T+\\mathbf Q  & \n",
+    "\\mathbf{\\bar P} = \\sum w^c({\\boldsymbol{\\mathcal Y}-\\bar{\\boldsymbol\\mu})(\\boldsymbol{\\mathcal Y} - \\bar{\\boldsymbol\\mu})^\\mathsf{T}} + \\mathbf Q \\\\\n",
    "\\hline \n",
    "& \\boldsymbol{\\mathcal Z} =  h(\\boldsymbol{\\mathcal{Y}}) \\\\\n",
    "& \\boldsymbol\\mu_z = \\sum w^m\\boldsymbol{\\mathcal{Z}} \\\\\n",
    "\\mathbf y = \\mathbf z - \\mathbf{Hx} &\n",
    "\\mathbf y = \\mathbf z - \\boldsymbol\\mu_z \\\\\n",
-    "\\mathbf S = \\mathbf{H\\overline PH}^\\mathsf{T} + \\mathbf R & \n",
+    "\\mathbf S = \\mathbf{H\\bar PH}^\\mathsf{T} + \\mathbf R & \n",
    "\\mathbf P_z = \\sum w^c{(\\boldsymbol{\\mathcal Z}-\\boldsymbol\\mu_z)(\\boldsymbol{\\mathcal{Z}}-\\boldsymbol\\mu_z)^\\mathsf{T}} + \\mathbf R \\\\ \n",
-    "\\mathbf K = \\mathbf{\\overline PH}^\\mathsf T \\mathbf S^{-1} &\n",
-    "\\mathbf K = \\left[\\sum w^c(\\boldsymbol\\chi-\\overline{\\mathbf x})(\\boldsymbol{\\mathcal{Z}}-\\boldsymbol\\mu_z)^\\mathsf{T}\\right] \\mathbf P_z^{-1} \\\\\n",
-    "\\mathbf x = \\mathbf{\\overline x} + \\mathbf{Ky} & \\mathbf x = \\mathbf{\\overline x} + \\mathbf{Ky}\\\\\n",
-    "\\mathbf P = (\\mathbf{I}-\\mathbf{KH})\\mathbf{\\overline P} & \\mathbf P = \\overline{\\mathbf P} - \\mathbf{KP_z}\\mathbf{K}^\\mathsf{T}\n",
+    "\\mathbf K = \\mathbf{\\bar PH}^\\mathsf T \\mathbf S^{-1} &\n",
+    "\\mathbf K = \\left[\\sum w^c(\\boldsymbol\\chi-\\bar{\\mathbf x})(\\boldsymbol{\\mathcal{Z}}-\\boldsymbol\\mu_z)^\\mathsf{T}\\right] \\mathbf P_z^{-1} \\\\\n",
+    "\\mathbf x = \\mathbf{\\bar x} + \\mathbf{Ky} & \\mathbf x = \\mathbf{\\bar x} + \\mathbf{Ky}\\\\\n",
+    "\\mathbf P = (\\mathbf{I}-\\mathbf{KH})\\mathbf{\\bar P} & \\mathbf P = \\bar{\\mathbf P} - \\mathbf{KP_z}\\mathbf{K}^\\mathsf{T}\n",
    "\\end{array}$$"
   ]
  },
@ -968,8 +968,10 @@
    "\n",
    "which implement the Newtonian equations\n",
    "\n",
-    "$$x_k = x_{k-1} + \\dot x_{k-1}\\Delta t \\\\\n",
-    "  y_k = y_{k-1} + \\dot y_{k-1}\\Delta t$$\n",
+    "$$\\begin{aligned}\n",
+    "x_k &= x_{k-1} + \\dot x_{k-1}\\Delta t \\\\\n",
+    "y_k &= y_{k-1} + \\dot y_{k-1}\\Delta t\n",
+    "\\end{aligned}$$\n",
    "\n",
    "Our sensors provide position but not velocity, so the measurement function is\n",
    "\n",
@ -1223,7 +1225,7 @@
   "source": [
    "The state transition function is linear \n",
    "\n",
-    "$$\\mathbf{\\overline x} = \\begin{bmatrix} 1 & \\Delta t & 0 \\\\ 0& 1& 0 \\\\ 0&0&1\\end{bmatrix}\n",
+    "$$\\mathbf{\\bar x} = \\begin{bmatrix} 1 & \\Delta t & 0 \\\\ 0& 1& 0 \\\\ 0&0&1\\end{bmatrix}\n",
    "\\begin{bmatrix}x \\\\ \\dot x\\\\ y\\end{bmatrix}\n",
    "$$\n",
    "\n",
@ -2298,7 +2300,7 @@
    "We will choose an alternative definition that has numerical properties which makes it easier easier to compute. We can define the square root as the matrix S, which when multiplied by its transpose, returns $\\Sigma$:\n",
    "\n",
    "$$\n",
-    "\\Sigma = \\mathbf{SS}^\\mathsf T \\\\\n",
+    "\\Sigma = \\mathbf{SS}^\\mathsf T\n",
    "$$\n",
    "\n",
    "This definition is frequently chosen because $\\mathbf S$ is computed using the *Cholesky decomposition* [3]. It decomposes a Hermitian, positive-definite matrix into a lower triangular matrix and its conjugate transpose. $\\mathbf P$ has these properties, so we can treat $\\mathbf S = \\text{cholesky}(\\mathbf P)$ as the square root of $\\mathbf P$.\n",
@ -2480,16 +2482,16 @@
    "\n",
    "$$\\mathbf P_{xz} =\\sum w^c(\\boldsymbol{\\chi}-\\mu)(\\boldsymbol{\\mathcal{Z}}-\\mathbf{\\mu}_z)^\\mathsf{T}$$\n",
    "\n",
-    "\n",
    "Finally, we compute the new state estimate using the residual and Kalman gain:\n",
    "\n",
-    "\n",
-    "$$K = \\mathbf P_{xz} \\mathbf P_z^{-1}\\\\\n",
-    "{\\mathbf x} = \\mathbf{\\overline x} + \\mathbf{Ky}$$\n",
+    "$$\\begin{aligned}\n",
+    "\\mathbf K &= \\mathbf P_{xz} \\mathbf P_z^{-1}\\\\\n",
+    "{\\mathbf x} &= \\mathbf{\\bar x} + \\mathbf{Ky}\n",
+    "\\end{aligned}$$\n",
    "\n",
    "and the new covariance is computed as:\n",
    "\n",
-    "$$ \\mathbf P = \\mathbf{\\overline P} - \\mathbf{KP}_z\\mathbf{K}^\\mathsf{T}$$\n",
+    "$$ \\mathbf P = \\mathbf{\\bar P} - \\mathbf{KP}_z\\mathbf{K}^\\mathsf{T}$$\n",
    "\n",
    "This function can be implemented as follows, assuming it is a method of a class that stores the necessary matrices and data."
   ]
@ -2931,9 +2933,10 @@
    "\n",
    "With $\\theta$ being the robot's orientation we compute the position $C$ before the turn starts as\n",
    "\n",
-    "$$ C_x = x - R\\sin(\\theta) \\\\\n",
-    "C_y = y + R\\cos(\\theta)\n",
-    "$$\n",
+    "$$\\begin{aligned}\n",
+    "C_x &= x - R\\sin(\\theta) \\\\\n",
+    "C_y &= y + R\\cos(\\theta)\n",
+    "\\end{aligned}$$\n",
    "\n",
    "After the move forward for time $\\Delta t$ the new position and orientation of the robot is\n",
    "\n",
@ -2979,7 +2982,7 @@
    "\n",
    "We model our system as a nonlinear motion model plus noise.\n",
    "\n",
-    "$$\\overline x = x + f(x, u) + \\mathcal{N}(0, Q)$$\n",
+    "$$\\bar x = x + f(x, u) + \\mathcal{N}(0, Q)$$\n",
    "\n",
    "Using the motion model for a robot that we created above, we can write:"
   ]
--- a/11-Extended-Kalman-Filters.ipynb
+++ b/11-Extended-Kalman-Filters.ipynb
@ -325,13 +325,13 @@
    "\n",
    "For the linear filter we have these equations for the process and measurement models:\n",
    "\n",
-    "$$\\begin{aligned}\\overline{\\mathbf x} &= \\mathbf{Ax} + \\mathbf{Bu} + w_x\\\\\n",
+    "$$\\begin{aligned}\\bar{\\mathbf x} &= \\mathbf{Ax} + \\mathbf{Bu} + w_x\\\\\n",
    "\\mathbf z &= \\mathbf{Hx} + w_z\n",
    "\\end{aligned}$$\n",
    "\n",
    "For the nonlinear model these equations must be modified to read:\n",
    "\n",
-    "$$\\begin{aligned}\\overline{\\mathbf x} &= f(\\mathbf x, \\mathbf u) + w_x\\\\\n",
+    "$$\\begin{aligned}\\bar{\\mathbf x} &= f(\\mathbf x, \\mathbf u) + w_x\\\\\n",
    "\\mathbf z &= h(\\mathbf x) + w_z\n",
    "\\end{aligned}$$\n",
    "\n",
@ -394,7 +394,7 @@
    "\\end{aligned}\n",
    "$$\n",
    "\n",
-    "$h(\\overline{\\mathbf x})$ is computed with the prior, but I drop the bar on for notational convenience."
+    "$h(\\bar{\\mathbf x})$ is computed with the prior, but I drop the bar for notational convenience."
   ]
  },
  {
@ -414,8 +414,8 @@
    "\\hline \n",
    "& \\boxed{\\mathbf A = {\\frac{\\partial{f(\\mathbf x_t, \\mathbf u_t)}}{\\partial{\\mathbf x}}}\\biggr|_{{\\mathbf x_t},{\\mathbf u_t}}} \\\\\n",
    "& \\boxed{\\mathbf F = e^{\\mathbf A \\Delta t}} \\\\\n",
-    "\\mathbf{\\overline x} = \\mathbf{Fx} + \\mathbf{Bu} & \\boxed{\\mathbf{\\overline x} = f(\\mathbf x, \\mathbf u)}  \\\\\n",
-    "\\mathbf{\\overline P} = \\mathbf{FPF}^\\mathsf{T}+\\mathbf Q  & \\mathbf{\\overline P} = \\mathbf{FPF}^\\mathsf{T}+\\mathbf Q \\\\\n",
+    "\\mathbf{\\bar x} = \\mathbf{Fx} + \\mathbf{Bu} & \\boxed{\\mathbf{\\bar x} = f(\\mathbf x, \\mathbf u)}  \\\\\n",
+    "\\mathbf{\\bar P} = \\mathbf{FPF}^\\mathsf{T}+\\mathbf Q  & \\mathbf{\\bar P} = \\mathbf{FPF}^\\mathsf{T}+\\mathbf Q \\\\\n",
    "\\hline\n",
    "& \\boxed{\\mathbf H = \\frac{\\partial{h(\\mathbf x_t)}}{\\partial{\\mathbf x}}\\biggr|_{\\mathbf x_t}} \\\\\n",
    "\\textbf{y} = \\mathbf z - \\mathbf{H \\bar{x}} & \\textbf{y} = \\mathbf z - \\boxed{h(\\bar{x})}\\\\\n",
@ -424,7 +424,7 @@
    "\\mathbf P= (\\mathbf{I}-\\mathbf{KH})\\mathbf{\\bar{P}} & \\mathbf P= (\\mathbf{I}-\\mathbf{KH})\\mathbf{\\bar{P}}\n",
    "\\end{array}$$\n",
    "\n",
-    "We don't normally use $\\mathbf{Fx}$ to propagate the state for the EKF as the linearization causes inaccuracies. It is typical to compute $\\overline{\\mathbf x}$ using a suitable numerical integration technique such as Euler or Runge Kutta. Thus I wrote $\\mathbf{\\overline x} = f(\\mathbf x, \\mathbf u)$. For the same reasons we don't use $\\mathbf{H\\overline{x}}$ in the computation for the residual, opting for the more accurate $h(\\overline{\\mathbf x})$.\n",
+    "We don't normally use $\\mathbf{Fx}$ to propagate the state for the EKF as the linearization causes inaccuracies. It is typical to compute $\\bar{\\mathbf x}$ using a suitable numerical integration technique such as Euler or Runge Kutta. Thus I wrote $\\mathbf{\\bar x} = f(\\mathbf x, \\mathbf u)$. For the same reasons we don't use $\\mathbf{H\\bar{x}}$ in the computation for the residual, opting for the more accurate $h(\\bar{\\mathbf x})$.\n",
    "\n",
    "I think the easiest way to understand the EKF is to start off with an example. Later you may want to come back and reread this section."
   ]
@ -477,8 +477,10 @@
   "source": [
    "This gives us the equalities:\n",
    "\n",
-    "$$\\theta = \\tan^{-1} \\frac y x\\\\\n",
-    "r^2 = x^2 + y^2$$ "
+    "$$\\begin{aligned}\n",
+    "\\theta &= \\tan^{-1} \\frac y x\\\\\n",
+    "r^2 &= x^2 + y^2\n",
+    "\\end{aligned}$$ "
   ]
  },
  {
@ -547,7 +549,7 @@
   "source": [
    "### Design the Measurement Model\n",
    "\n",
-    "The measurement function takes the state estimate of the prior $\\overline{\\mathbf x}$ and turn it into a measurement of the slant range distance. For notational convenience I will use $\\mathbf x$, not $\\overline{\\mathbf x}$. We use the Pythagorean theorem to derive:\n",
+    "The measurement function takes the state estimate of the prior $\\bar{\\mathbf x}$ and turns it into a measurement of the slant range distance. For notational convenience I will use $\\mathbf x$, not $\\bar{\\mathbf x}$. We use the Pythagorean theorem to derive:\n",
    "\n",
    "$$h(\\mathbf x) = \\sqrt{x^2 + y^2}$$\n",
    "\n",
@ -603,7 +605,7 @@
    "\\frac{y}{\\sqrt{x^2 + y^2}}\n",
    "\\end{bmatrix}$$\n",
    "\n",
-    "This may seem daunting, so step back and recognize that all of this math is doing something very simple. We have an equation for the slant range to the airplane which is nonlinear. The Kalman filter only works with linear equations, so we need to find a linear equation that approximates $\\mathbf H$. As we discussed above, finding the slope of a nonlinear equation at a given point is a good approximation. For the Kalman filter, the 'given point' is the state variable $\\mathbf x$ so we need to take the derivative of the slant range with respect to $\\mathbf x$. For the linear Kalman filter $\\mathbf H$ was a constant that we computed prior to running the filter. For the EKF $\\mathbf H$ is updated at each step as the evaluation point $\\overline{\\mathbf x}$ changes at each epoch.\n",
+    "This may seem daunting, so step back and recognize that all of this math is doing something very simple. We have an equation for the slant range to the airplane which is nonlinear. The Kalman filter only works with linear equations, so we need to find a linear equation that approximates $h$. As we discussed above, finding the slope of a nonlinear equation at a given point is a good approximation. For the Kalman filter, the 'given point' is the state variable $\\mathbf x$ so we need to take the derivative of the slant range with respect to $\\mathbf x$. For the linear Kalman filter $\\mathbf H$ was a constant that we computed prior to running the filter. For the EKF $\\mathbf H$ is updated at each step as the evaluation point $\\bar{\\mathbf x}$ changes at each epoch.\n",
    "\n",
    "To make this more concrete, let's now write a Python function that computes the Jacobian of $h$ for this problem."
   ]
@ -982,13 +984,13 @@
    "\n",
    "We model our system as a nonlinear motion model plus noise.\n",
    "\n",
-    "$$\\overline x = x + f(x, u) + \\mathcal{N}(0, Q)$$\n",
+    "$$\\bar x = x + f(x, u) + \\mathcal{N}(0, Q)$$\n",
    "\n",
    "\n",
    "\n",
    "Using the motion model for a robot that we created above, we can expand this to\n",
    "\n",
-    "$$\\overline{\\begin{bmatrix}x\\\\y\\\\\\theta\\end{bmatrix}} = \\begin{bmatrix}x\\\\y\\\\\\theta\\end{bmatrix} + \n",
+    "$$\\bar{\\begin{bmatrix}x\\\\y\\\\\\theta\\end{bmatrix}} = \\begin{bmatrix}x\\\\y\\\\\\theta\\end{bmatrix} + \n",
    "\\begin{bmatrix}- R\\sin(\\theta) + R\\sin(\\theta + \\beta) \\\\\n",
    "R\\cos(\\theta) - R\\cos(\\theta + \\beta) \\\\\n",
    "\\beta\\end{bmatrix}$$"
@ -1206,11 +1208,11 @@
    "This gives us the final form of our prediction equations:\n",
    "\n",
    "$$\\begin{aligned}\n",
-    "\\mathbf{\\overline x} &= \\mathbf x + \n",
+    "\\mathbf{\\bar x} &= \\mathbf x + \n",
    "\\begin{bmatrix}- R\\sin(\\theta) + R\\sin(\\theta + \\beta) \\\\\n",
    "R\\cos(\\theta) - R\\cos(\\theta + \\beta) \\\\\n",
    "\\beta\\end{bmatrix}\\\\\n",
-    "\\mathbf{\\overline P} &=\\mathbf{FPF}^{\\mathsf T} + \\mathbf{VMV}^{\\mathsf T}\n",
+    "\\mathbf{\\bar P} &=\\mathbf{FPF}^{\\mathsf T} + \\mathbf{VMV}^{\\mathsf T}\n",
    "\\end{aligned}$$\n",
    "\n",
    "This form of linearization is not the only way to predict $\\mathbf x$. For example, we could use a numerical integration technique such as *Runge Kutta* to compute the movement\n",
--- a/14-Adaptive-Filtering.ipynb
+++ b/14-Adaptive-Filtering.ipynb
--- a/Supporting_Notebooks/Iterative-Least-Squares-for-Sensor-Fusion.ipynb
+++ b/Supporting_Notebooks/Iterative-Least-Squares-for-Sensor-Fusion.ipynb
@ -477,10 +477,11 @@
    "\n",
    "Taking the transpose of each side gives\n",
    "\n",
-    "$${\\delta \\mathbf x} = ({{\\delta \\mathbf z^-}^\\mathsf{T}\\mathbf H(\\mathbf H^\\mathsf{T}\\mathbf H)^{-1}})^\\mathsf{T} \\\\\n",
-    "={{(\\mathbf H^\\mathsf{T}\\mathbf H)^{-1}}^T\\mathbf H^\\mathsf{T} \\delta \\mathbf z^-} \\\\\n",
-    "={{(\\mathbf H^\\mathsf{T}\\mathbf H)^{-1}}\\mathbf H^\\mathsf{T} \\delta \\mathbf z^-}\n",
-    "$$\n",
+    "$$\\begin{aligned}\n",
+    "{\\delta \\mathbf x} &= ({{\\delta \\mathbf z^-}^\\mathsf{T}\\mathbf H(\\mathbf H^\\mathsf{T}\\mathbf H)^{-1}})^\\mathsf{T} \\\\\n",
+    "&={{(\\mathbf H^\\mathsf{T}\\mathbf H)^{-1}}^T\\mathbf H^\\mathsf{T} \\delta \\mathbf z^-} \\\\\n",
+    "&={{(\\mathbf H^\\mathsf{T}\\mathbf H)^{-1}}\\mathbf H^\\mathsf{T} \\delta \\mathbf z^-}\n",
+    "\\end{aligned}$$\n",
    "\n",
    "For various reasons you may want to weigh some measurement more than others. We can do that with the equation\n",
    "\n",
@ -660,8 +661,10 @@
   "source": [
    "So let's think about this. The first iteration is essentially performing the computation that the linear Kalman filter computes during the update step:\n",
    "\n",
-    "$$\\mathbf{y} = \\mathbf z - \\mathbf{Hx}\\\\\n",
-    "\\mathbf x = \\mathbf x + \\mathbf{Ky}$$\n",
+    "$$\\begin{aligned}\n",
+    "\\mathbf y &= \\mathbf z - \\mathbf{Hx}\\\\\n",
+    "\\mathbf x &= \\mathbf x + \\mathbf{Ky}\n",
+    "\\end{aligned}$$\n",
    "\n",
    "where the Kalman gain equals one. You can see that despite the very inaccurate initial guess (900, 90) the computed value for $\\mathbf x$, (805.4, 205.3), was very close to the actual value of (800, 200). However, it was not perfect. But after three iterations the ILS algorithm was able to find the exact answer. So hopefully it is clear why we use ILS instead of doing the sensor fusion with the Kalman filter - it gives a better result. Of course, we started with a very inaccurate guess; what if the guess was better?"
   ]