Merge pull request #61 from kcamd/master
More fixes for chap. 1, 3, 4, 5, 6, 7
commit 9f9f5d8447
@ -750,7 +750,7 @@
"\n",
"I hand generated the weight data to correspond to a true starting weight of 160 lbs, and a weight gain of 1 lb per day. In other words, on the first day (day zero) the true weight is 160 lbs, on the second day (day one, the first day of weighing) the true weight is 161 lbs, and so on. \n",
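As a concrete illustration of how such data can be produced, here is a minimal sketch (not the book's code; the function name, noise level, and seed are illustrative assumptions) that generates noisy scale readings around that 1 lb/day trend with NumPy:

```python
import numpy as np

def gen_weight_data(x0=160., daily_gain=1., days=12, measurement_std=2., seed=3):
    """Simulate daily scale readings for a true weight that starts at
    x0 lbs and gains daily_gain lbs per day."""
    rng = np.random.default_rng(seed)
    true_weights = x0 + daily_gain * np.arange(1, days + 1)  # day one is 161 lbs, and so on
    measurements = true_weights + rng.normal(0., measurement_std, days)
    return true_weights, measurements

truth, zs = gen_weight_data()
print(truth[:3])
print(zs[:3])
```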
"\n",
"We need to make a guess for the initial weight. It is too early to talk about initialization strategies, so for now I will assume 159 lbs."
"We need to make a guess for the initial weight. It is too early to talk about initialization strategies, so for now I will assume 160 lbs."
]
},
{
@ -296,7 +296,7 @@
"\n",
"### Random Variables\n",
"\n",
"To understand Gaussians we first need to understand a few simple mathematical computations. We start with a **random variable** x. A random variable is a variable whose value depends on some random process. If you flip a coin, you could have a variable $c$, and assign it the value 1 for heads, and 0 for tails. That a random value. It can be the height of the students in a class. That may not seem random to you, but chances are you cannot predict the height of the student Reem Nassar because her height is not deterministically determined. For a specific classroom perhaps the heights are\n",
"To understand Gaussians we first need to understand a few simple mathematical computations. We start with a **random variable** x. A random variable is a variable whose value depends on some random process. If you flip a coin, you could have a variable $c$, and assign it the value 1 for heads, and 0 for tails. That is a random value. It can be the height of the students in a class. That may not seem random to you, but chances are you cannot predict the height of the student Reem Nassar because her height is not deterministically determined. For a specific classroom perhaps the heights are\n",
"\n",
"$$x= [1.8, 2.0, 1.7, 1.9, 1.6]$$\n",
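For reference, the sample mean and variance of this small data set are easy to check with NumPy (a quick sketch, not part of the chapter's listings):

```python
import numpy as np

heights = np.array([1.8, 2.0, 1.7, 1.9, 1.6])
print(np.mean(heights))  # 1.8
print(np.var(heights))   # 0.02
```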
"\n",
@ -792,7 +792,7 @@
"\n",
"Probably this is immediately recognizable to you as a 'bell curve'. This curve is ubiquitous because under real world conditions many observations are distributed in such a manner. In fact, this is the curve for the student heights given earlier. I will not use the term 'bell curve' to refer to a Gaussian because several probability distributions have a similar bell curve shape. Non-mathematical sources might not be so precise, so be judicious in what you conclude when you see the term used without definition.\n",
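If you want to see the curve for yourself, a quick sketch with SciPy follows; the mean and standard deviation are simply chosen to roughly match the height data above, not taken from the book's code:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

xs = np.linspace(1.4, 2.2, 200)
plt.plot(xs, norm(loc=1.8, scale=0.14).pdf(xs))  # mean 1.8 m, std 0.14 m (illustrative)
plt.xlabel('height (m)')
plt.ylabel('probability density')
plt.show()
```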
"\n",
"This curve is not unique to heights - a vast amount of natural phenomena exhibits this sort of distribution, including the sensors that we use in filtering problems. As we will see, it also has all the attributes that we are looking for - it represents a unimodal belief or value as a probability, it is continuous, and it is computationally efficient. We will soon discover that it also other desirable qualities which we may not realize we desire.\n",
"This curve is not unique to heights - a vast amount of natural phenomena exhibits this sort of distribution, including the sensors that we use in filtering problems. As we will see, it also has all the attributes that we are looking for - it represents a unimodal belief or value as a probability, it is continuous, and it is computationally efficient. We will soon discover that it also has other desirable qualities which we may not realize we desire.\n",
"\n",
"To further motivate you, recall the shapes of the probability distributions in the *Discrete Bayes* chapter. They were not perfect Gaussian curves, but they were similar, as in the plot below. We will be using Gaussians to replace the discrete probabilities used in that chapter! Please note that eyeball comparisons of PDF curves are strongly discouraged, as humans have trouble estimating areas; CDFs are usually the preferred choice. "
]
@ -1418,7 +1418,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The area under the curve cannot equal 1, so it is not a probability distribution. What actually happens is that a more students than predicted by a normal distribution get scores nearer the upper end of the range (for example), and that tail becomes 'fat'. Also, the test is probably not able to perfectly distinguish incredibly minute differences in skill in the students, so the distribution to the left of the mean is also probably a bit bunched up in places. The resulting distribution is called a *fat tail distribution*. \n",
"The area under the curve cannot equal 1, so it is not a probability distribution. What actually happens is that more students than predicted by a normal distribution get scores nearer the upper end of the range (for example), and that tail becomes 'fat'. Also, the test is probably not able to perfectly distinguish incredibly minute differences in skill in the students, so the distribution to the left of the mean is also probably a bit bunched up in places. The resulting distribution is called a *fat tail distribution*. \n",
"\n",
"Kalman filters use sensors to measure the world. The errors in a sensor's measurements are rarely truly Gaussian. It is far too early to be talking about the difficulties that this presents to the Kalman filter designer. It is worth keeping in the back of your mind the fact that the Kalman filter math is based on a somewhat idealized model of the world. For now I will present a bit of code that I will be using later in the book to form fat tail distributions to simulate various processes and sensors. This distribution is called the Student's t distribution. \n",
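As a stand-in for that code, here is one way to draw fat-tailed, Student's-t distributed noise with NumPy; this is only a sketch, not FilterPy's `rand_student_t` implementation, and the degrees of freedom and scale are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(42)

def student_t_noise(df=7, mu=0., std=1., size=1):
    """Return fat-tailed noise: standard Student's t draws, scaled and shifted."""
    return mu + std * rng.standard_t(df, size)

print(student_t_noise(df=7, std=2., size=5))
```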
"\n",
@ -1532,7 +1532,7 @@
"\n",
"It is unlikely that the Student's T distribution is an accurate model of how your sensor (say, a GPS or Doppler) performs, and this is not a book on how to model physical systems. However, it does produce reasonable data to test your filter's performance when presented with real world noise. We will be using distributions like these throughout the rest of the book in our simulations and tests. \n",
"\n",
"This is not an idle concern. The Kalman filter equations assume the noise is normally distributed, and perform sub-optimally if this is not true. Designers for mission critical filters, such as the filters on spacecraft, need to master a lot of theory and emperical knowledge about the performance of the sensors on their spacecraft. \n",
"This is not an idle concern. The Kalman filter equations assume the noise is normally distributed, and perform sub-optimally if this is not true. Designers for mission critical filters, such as the filters on spacecraft, need to master a lot of theory and empirical knowledge about the performance of the sensors on their spacecraft. \n",
"\n",
"The code for `rand_student_t` is included in `filterpy.stats`. You may use it with\n",
"\n",
@ -335,7 +335,7 @@
"\n",
"$$ x = v \\Delta t + x_0$$\n",
"\n",
"This is a model of the dog's behavior, and we call it the **system model**. However, clearly it is not perfect. The dog might speed up or slow down due to hills, winds, or whimsy. We could add those things to the model if we knew about them, but there will always be more things that we don't know. this is not due to the tracked object being a living creature. The same holds if we are tracking a missile, aircraft, or paint nozzle on a factory robotic painter. There will always be unpredictable errors in physical systems. We can model this mathematically by saying the dog's actual position is the prediction that comes from our **system model** plus what we call the **process noise**. In the literature this is usually written something like this:\n",
"This is a model of the dog's behavior, and we call it the **system model**. However, clearly it is not perfect. The dog might speed up or slow down due to hills, winds, or whimsy. We could add those things to the model if we knew about them, but there will always be more things that we don't know. This is not due to the tracked object being a living creature. The same holds if we are tracking a missile, aircraft, or paint nozzle on a factory robotic painter. There will always be unpredictable errors in physical systems. We can model this mathematically by saying the dog's actual position is the prediction that comes from our **system model** plus what we call the **process noise**. In the literature this is usually written something like this:\n",
"\n",
"$$ x = f(x) +\\epsilon$$\n",
"\n",
@ -416,7 +416,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The constructor `__init()__` initializes the DogSimulation class with an initial position `x0`, velocity `vel`, and the variance in the measurement and noise. The `move()` method has the dog move based on its velocity and the process noise. The `sense_position()` method computes and returns a measurement of the dog's position based on the current position and the sensor noise.\n",
"The constructor `__init__()` initializes the DogSimulation class with an initial position `x0`, velocity `vel`, and the variance in the measurement and process noise. The `move()` method moves the dog based on its velocity and the process noise. The `sense_position()` method computes and returns a measurement of the dog's position based on the current position and the sensor noise.\n",
"\n",
"We need to convert the variances that are passed into `__init__()` into standard deviations because `randn()` is scaled by the standard deviation. Variance is the standard deviation squared, so we take the square root of the variance in `__init__()`."
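For reference, a class with the interface described above might look like the following sketch; it is consistent with the description, but not necessarily the book's exact listing:

```python
from math import sqrt
from numpy.random import randn

class DogSimulation:
    def __init__(self, x0=0., velocity=1., measurement_var=0., process_var=0.):
        self.x = x0
        self.velocity = velocity
        self.measurement_std = sqrt(measurement_var)  # randn() is scaled by std, not variance
        self.process_std = sqrt(process_var)

    def move(self, dt=1.):
        # perturb the velocity with process noise, then advance the position
        velocity = self.velocity + randn() * self.process_std
        self.x += velocity * dt

    def sense_position(self):
        # the measurement is the true position corrupted by sensor noise
        return self.x + randn() * self.measurement_std

dog = DogSimulation(x0=0., velocity=1., measurement_var=2.25, process_var=1.)
dog.move()
print(dog.sense_position())
```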
]
@ -959,7 +959,7 @@
"\n",
"$$\n",
"\\begin{aligned}\n",
"\\mu&=\\frac{9 \\sigma_\\mathtt{z}^2 \\mu_2 + \\sigma_\\mathtt{z}^2 \\mu_\\mathtt{prior}} {9 \\sigma_\\mathtt{z}^2 + \\sigma_\\mathtt{z}^2} \\\\\n",
"\\mu&=\\frac{9 \\sigma_\\mathtt{z}^2 \\mu_\\mathtt{z} + \\sigma_\\mathtt{z}^2 \\mu_\\mathtt{prior}} {9 \\sigma_\\mathtt{z}^2 + \\sigma_\\mathtt{z}^2} \\\\\n",
"\\text{or}\\\\\n",
"\\mu&= \\frac{1}{10} \\mu_\\mathtt{prior} + \\frac{9}{10} \\mu_\\mathtt{z}\n",
"\\end{aligned}\n",
@ -2190,7 +2190,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"It is easy to see that the filter is not correctly responding to the measurements which are clearly telling the filter that the dog is changing speed. I encourage you to adjust the amount of movement in the dog vs process variance. we will also be studying this topic much more in the following chapter."
"It is easy to see that the filter is not correctly responding to the measurements which are clearly telling the filter that the dog is changing speed. I encourage you to adjust the amount of movement in the dog vs process variance. We will also be studying this topic much more in the following chapter."
]
},
{
@ -757,7 +757,7 @@
"\n",
"This is the first contour that has values in the off-diagonal elements of the covariance, and this is the first contour plot with a slanted ellipse. This is not a coincidence. The two facts are telling us the same thing. A slanted ellipse tells us that the $x$ and $y$ values are somehow **correlated**. We denote that in the covariance matrix with values off the diagonal.\n",
"\n",
"What does this mean in physical terms? Think of parallel parking a car. You can not pull up beside the spot and then move sideways into the space because cars cannot drive sideways. $x$ and $y$ are not independent. This is a consequence of the steering mechanism. When the steering wheel is turned the car rotates around its rear axle while moving forward. Or think of a horse attached to a pivoting exercise bar in a corral. The horse can only walk in circles, he cannot vary $x$ and $y$ independently, which means he cannot walk in a straight line or a zig zag. If $x$ changes, $y$ must also change in a defined way. \n",
"What does this mean in physical terms? Think of parallel parking a car. You can not pull up beside the spot and then move sideways into the space because cars cannot drive sideways. $x$ and $y$ are not independent. This is a consequence of the steering mechanism. When the steering wheel is turned the car rotates around its rear axle while moving forward. Or think of a horse attached to a pivoting exercise bar in a corral. The horse can only walk in circles, it cannot vary $x$ and $y$ independently, which means it cannot walk in a straight line or a zig zag. If $x$ changes, $y$ must also change in a defined way. \n",
"\n",
"When we see this ellipse we know that $x$ and $y$ are correlated, and that the correlation is \"strong\". The size of the ellipse shows how much error we have in each axis, and the slant shows the relative sizes of the variance in $x$ and $y$. For example, a very long and narrow ellipse tilted almost to the horizontal has a strong correlation between $x$ and $y$ (because the ellipse is narrow), and the variance of $x$ is much larger than that of $y$ (because the ellipse is much longer in $x$)."
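To make the link between the off-diagonal terms and correlation concrete, here is a small sketch; the covariance values are made up for illustration:

```python
import numpy as np

P = np.array([[8., 3.],
              [3., 2.]])  # hypothetical covariance with nonzero off-diagonal terms

rho = P[0, 1] / np.sqrt(P[0, 0] * P[1, 1])  # correlation coefficient implied by P
print(rho)  # 0.75 -- a fairly strong positive correlation, i.e. a slanted ellipse
```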
]
@ -963,7 +963,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Recall that Bayesian statistics calls this the *evidence*. The ellipse points towards the radar. It is very long because the range measurement is inaccurate, and he aircraft could be within a considerable distance of the measured range. It is very narrow because the bearing estimate is very accurate and thus the aircraft must be very close to the bearing estimate.\n",
"Recall that Bayesian statistics calls this the *evidence*. The ellipse points towards the radar. It is very long because the range measurement is inaccurate, and the aircraft could be within a considerable distance of the measured range. It is very narrow because the bearing estimate is very accurate and thus the aircraft must be very close to the bearing estimate.\n",
"\n",
"We want to find the *posterior* - the mean and covariance that result from incorporating the evidence into the prior. As in every chapter so far we multiply them together. I have the equations for this and we could use those, but I will use FilterPy's `multivariate_multiply` method."
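A sketch of that computation is below. The prior and evidence values are invented for illustration, and I am assuming the `multivariate_multiply(mean1, cov1, mean2, cov2)` call from `filterpy.stats`:

```python
from filterpy.stats import multivariate_multiply

prior_mean, prior_cov = [10., 10.], [[6., 0.], [0., 6.]]  # hypothetical prior
z_mean, z_cov = [12., 11.], [[2., 1.9], [1.9, 2.]]        # hypothetical evidence

post_mean, post_cov = multivariate_multiply(prior_mean, prior_cov, z_mean, z_cov)
print(post_mean)
print(post_cov)
```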
]
@ -810,7 +810,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's test this! `KalmanFilter` has a `predict` method that performs the prediction by computing $\\mathbf{\\bar{x}} = \\mathbf{Fx}$. Let's call it and see what happens. We've set the position to 10.0 and the velocity to 0.45 meter/sec. We've defined `dt = 0.1`, which means the time step is 0.1 seconds, so we expect the new position to be 10.45 meters after the innovation. The velocity should be unchanged."
"Let's test this! `KalmanFilter` has a `predict` method that performs the prediction by computing $\\mathbf{\\bar{x}} = \\mathbf{Fx}$. Let's call it and see what happens. We've set the position to 10.0 and the velocity to 4.5 meters/sec. We've defined `dt = 0.1`, which means the time step is 0.1 seconds, so we expect the new position to be 10.45 meters after the prediction. The velocity should be unchanged."
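Here is a minimal sketch of that experiment with FilterPy's `KalmanFilter`; the covariance-related attributes are left at their defaults because only the mean matters here:

```python
import numpy as np
from filterpy.kalman import KalmanFilter

dt = 0.1
kf = KalmanFilter(dim_x=2, dim_z=1)
kf.x = np.array([[10.0],      # position
                 [4.5]])      # velocity
kf.F = np.array([[1., dt],
                 [0., 1.]])   # constant velocity state transition
kf.predict()
print(kf.x)                   # position becomes ~10.45, velocity is unchanged
```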
]
},
{
@ -950,7 +950,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"You can see that the center of the ellipse shifted by a small amount (from 10 to 10.45) and the ellipse elongated, showing the correlation between position and velocity. How does the filter compute new values for $\\mathbf{\\bar{P}}$, and what is it based on? It's a little to early to discuss this, but recall that in every filter so far the predict step entailed a loss of information. The same is true here. I will give you the details once we have covered a bit more ground."
"You can see that the center of the ellipse shifted by a small amount (from 10 to 10.90) and the ellipse elongated, showing the correlation between position and velocity. How does the filter compute new values for $\\mathbf{\\bar{P}}$, and what is it based on? It's a little too early to discuss this, but recall that in every filter so far the predict step entailed a loss of information. The same is true here. I will give you the details once we have covered a bit more ground."
]
},
{
@ -1095,7 +1095,7 @@
"\n",
"$$\\mathbf{y} = \\mathbf{z} - \\mathbf{H \\bar{x}}$$\n",
"\n",
"where $\\textbf{y}$ is the residual, $\\mathbf{x^-}$ is the prior, $\\textbf{z}$ is the measurement, and $\\textbf{H}$ is the measurement function. So we take the prior, convert it to a measurement, and subtract it from the measurement our sensor gave us. This gives us the difference between our prediction and measurement in measurement space!\n",
"where $\\mathbf{y}$ is the residual, $\\mathbf{\\bar{x}}$ is the prior, $\\mathbf{z}$ is the measurement, and $\\mathbf{H}$ is the measurement function. So we take the prior, convert it to a measurement, and subtract it from the measurement our sensor gave us. This gives us the difference between our prediction and measurement in measurement space!\n",
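In code the residual is just a matrix product and a subtraction. A sketch with made-up numbers, assuming a position-only sensor so that the measurement function picks the first state variable:

```python
import numpy as np

H = np.array([[1., 0.]])   # measurement function: picks position out of [position, velocity]
x_bar = np.array([[10.45],
                  [4.5]])  # prior (made-up values)
z = np.array([[10.2]])     # sensor measurement (made up)

y = z - H @ x_bar          # residual, computed in measurement space
print(y)                   # [[-0.25]]
```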
"\n",
"We need to design $\\mathbf{H}$ so that $\\mathbf{H\\bar{x}}$ yields a measurement. For this problem we have a sensor that measures position, so $\\mathbf{z}$ will be a one variable vector:\n",
"\n",
@ -1372,7 +1372,7 @@
"It really cannot get much simpler than that. As we tackle more complicated problems this code will remain largely the same; all of the work goes into setting up the `KalmanFilter` variables; executing the filter is trivial.\n",
"\n",
"\n",
"The rest of the code optionally plots the results and returns the saved states and covaraniances."
"The rest of the code optionally plots the results and returns the saved states and covariances."
]
},
{
@ -1422,7 +1422,7 @@
"source": [
"There is still a lot to learn, but we have implemented our first, full Kalman filter using the same theory and equations as published by Rudolf Kalman! Code very much like this runs inside of your GPS and phone, inside every airliner, inside of robots, and so on. \n",
"\n",
"The first plot plots the output of the Kalman filter against the measurements and the actual position of our dog (drawn in green). After the initial settling in period the filter should track the dog's position very closely. The yellow shaded portion between the black dotted lines shows 1 standard deviations of the filter's variance, which I explain in the next paragraph.\n",
"The first plot plots the output of the Kalman filter against the measurements and the actual position of our dog (the 'Track' line). After the initial settling in period the filter should track the dog's position very closely. The yellow shaded portion between the black dotted lines shows one standard deviation of the filter's variance, which I explain in the next paragraph.\n",
"\n",
"The next two plots show the variance of $x$ and of $\\dot{x}$. If you look at the code, you will see that I have plotted the diagonals of $\\mathbf{P}$ over time. Recall that the diagonal of a covariance matrix contains the variance of each state variable. So $\\mathbf{P}[0,0]$ is the variance of $x$, and $\\mathbf{P}[1,1]$ is the variance of $\\dot{x}$. You can see that despite initializing $\\mathbf{P}=(\\begin{smallmatrix}500&0\\\\0&400\\end{smallmatrix})$ we quickly converge to small variances for both the position and velocity. The covariance matrix $\\mathbf{P}$ tells us the *theoretical* performance of the filter *assuming* everything we tell it is true. Recall from the Gaussian chapter that the standard deviation is the square root of the variance, and that approximately 68% of a Gaussian distribution occurs within one standard deviation. Therefore, if at least 68% of the filter output is within one standard deviation we can be sure that the filter is performing well. In the top chart I have displayed the one standard deviation as the yellow shaded area between the two dotted lines. To my eye it looks like perhaps the filter is slightly exceeding those bounds, so the filter probably needs some tuning. We will discuss this later in the chapter."
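The plotting referred to above boils down to pulling the diagonal out of each saved covariance matrix and taking square roots. A self-contained sketch with a few made-up matrices:

```python
import numpy as np

# hypothetical covariance matrices saved from three steps of a filter run
covs = np.array([[[500., 0.], [0., 400.]],
                 [[ 12., 5.], [5.,  20.]],
                 [[  3., 1.], [1.,   8.]]])

pos_std = np.sqrt(covs[:, 0, 0])  # standard deviation of position at each step
vel_std = np.sqrt(covs[:, 1, 1])  # standard deviation of velocity at each step
print(pos_std)
print(vel_std)
```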
]
@ -1496,7 +1496,7 @@
"\n",
"$\\underline{\\textbf{Mean}}$\n",
"\n",
"$\\mathbf{x}^- = \\mathbf{Fx} + \\mathbf{Bu}$\n",
"$\\mathbf{\\bar{x}} = \\mathbf{Fx} + \\mathbf{Bu}$\n",
"\n",
"We are starting out easy - you were already exposed to the first equation while designing the state transition function $\\mathbf{F}$ and control function $\\mathbf{B}$. \n",
"\n",
@ -1521,7 +1521,7 @@
"\n",
"$$\\bar{x} = \\dot{x}\\Delta t + x$$\n",
"\n",
"Since we do not have perfect knowledge of the value of $\\dot{x}$ the sum $x^- = \\dot{x}\\Delta t + x$ gains uncertainty. Because the positions and velocities are correlated we cannot simply add the covariance matrices. The correct equation is\n",
"Since we do not have perfect knowledge of the value of $\\dot{x}$ the sum $\\bar{x} = \\dot{x}\\Delta t + x$ gains uncertainty. Because the positions and velocities are correlated we cannot simply add the covariance matrices. The correct equation is\n",
"\n",
"$$\\mathbf{\\bar{P}} = \\mathbf{FPF}^\\mathsf{T}$$\n",
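A two-line check of this equation shows how the multiplication introduces off-diagonal (correlation) terms even when the covariance starts out diagonal; the numbers are arbitrary:

```python
import numpy as np

dt = 1.
F = np.array([[1., dt],
              [0., 1.]])
P = np.array([[1., 0.],
              [0., 1.]])  # uncorrelated position and velocity

P_bar = F @ P @ F.T       # predicted covariance
print(P_bar)              # [[2. 1.] [1. 1.]] -- off-diagonal terms have appeared
```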
"\n",
@ -1568,7 +1568,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"You can see that with a velocity of 5 the position correctly moves 3 units in each 6/10ths of a second step. At each step the width of the ellipse is larger, indicating that we have lost information asbout the position due to adding $\\dot{x}\\Delta t$ to x at each step. The height has not changed - our system model say the velocity does not change, so the belief we have about the velocity cannot change. As time continues you can see that the ellipse becomes more and more tilted. Recall that a tilt indicates *correlation*. $\\mathbf{F}$ linearly correlates $x$ with $\\dot{x}$ with the expression $\\bar{x} = \\dot{x} \\Delta t + x$. The $\\mathbf{FPF}^\\mathsf{T}$ computation correctly incorporates this correlation into the covariance matrix!\n",
"You can see that with a velocity of 5 the position correctly moves 3 units in each 6/10ths of a second step. At each step the width of the ellipse is larger, indicating that we have lost information about the position due to adding $\\dot{x}\\Delta t$ to x at each step. The height has not changed - our system model says the velocity does not change, so the belief we have about the velocity cannot change. As time continues you can see that the ellipse becomes more and more tilted. Recall that a tilt indicates *correlation*. $\\mathbf{F}$ linearly correlates $x$ with $\\dot{x}$ with the expression $\\bar{x} = \\dot{x} \\Delta t + x$. The $\\mathbf{FPF}^\\mathsf{T}$ computation correctly incorporates this correlation into the covariance matrix!\n",
"\n",
"Here is an animation of this equation that allows you to change the design of $\\mathbf{F}$ to see how it affects the shape of $\\mathbf{P}$. The `F00` slider affects the value of F[0, 0]. `covar` sets the initial covariance between the position and velocity ($\\sigma_x\\sigma_{\\dot{x}}$). I recommend answering these questions at a minimum\n",
"\n",
@ -1637,7 +1637,7 @@
"source": [
"### Update Equations\n",
"\n",
"The update equations look messier than the predict equations, but that is mostly due to the Kalman filter computing the update in **measurement space**. This is because measurement are not *invertable*. For example, later on we will be using sensors that only give you the range to a target. It is impossible to convert a range into a position - an infinite number of positions (in a circle) will yield the same range. On the other hand, we can always compute the range (measurement) given a position(state).\n",
"The update equations look messier than the predict equations, but that is mostly due to the Kalman filter computing the update in **measurement space**. This is because measurements are not *invertible*. For example, later on we will be using sensors that only give you the range to a target. It is impossible to convert a range into a position - an infinite number of positions (in a circle) will yield the same range. On the other hand, we can always compute the range (measurement) given a position (state).\n",
"\n",
"Before I continue, recall that we are trying to do something very simple: choose a new estimate somewhere between a measurement and a prediction, as in this chart:"
]
@ -1679,14 +1679,14 @@
"source": [
"<u>**System Uncertainty**</u>\n",
"\n",
"$\\textbf{S} = \\mathbf{HP^-H}^\\mathsf{T} + \\mathbf{R}$\n",
"$\\textbf{S} = \\mathbf{H\\bar{P}H}^\\mathsf{T} + \\mathbf{R}$\n",
"\n",
"To work in measurement space the Kalman filter has to project the covariance matrix into measurement space. The math for this is $\\mathbf{H\\bar{P}H}^\\mathsf{T}$, where $\\mathbf{\\bar{P}}$ is the *prior* covariance and $\\mathbf{H}$ is the measurement function.\n",
"\n",
"\n",
"You should recognize this $\\mathbf{ABA}^\\mathsf{T}$ form - the prediction step used $\\mathbf{FPF}^\\mathsf{T}$ to update $\\mathbf{P}$ with the state transition function. Here, we use the same form to update it with the measurement function. In a real sense the linear algebra is changing the coordinate system for us. \n",
"\n",
"Once the covariace is in measurement space we need to account for the sensor noise. This is very easy - we just add matrices. The result is variously called either the **system uncertainty** or **innovation covariance**.\n",
"Once the covariance is in measurement space we need to account for the sensor noise. This is very easy - we just add matrices. The result is variously called either the **system uncertainty** or **innovation covariance**.\n",
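Written out with made-up numbers, the computation is the same ABA-transpose pattern followed by adding the sensor noise; a sketch:

```python
import numpy as np

H = np.array([[1., 0.]])      # measure position only
P_bar = np.array([[2., 1.],
                  [1., 1.]])  # prior covariance (made up)
R = np.array([[5.]])          # sensor noise variance (made up)

S = H @ P_bar @ H.T + R       # system uncertainty / innovation covariance
print(S)                      # [[7.]]
```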
"\n",
"I want you to compare the equation for the system uncertainty and the covariance\n",
"\n",
@ -1747,13 +1747,13 @@
"\n",
"$\\underline{\\texttt{State Update}}$\n",
"\n",
"$\\mathbf{x} = \\mathbf{x}^- + \\mathbf{Ky}$\n",
"$\\mathbf{x} = \\mathbf{\\bar{x}} + \\mathbf{Ky}$\n",
"\n",
"We select our new state to be along the residual, scaled by the Kalman gain. The scaling is performed by $\\mathbf{Ky}$, which both scales the residual and converts it back into state space. This is added to the prior, yielding the equation: $\\mathbf{x} =\\mathbf{x}^- + \\mathbf{Ky}$.\n",
"We select our new state to be along the residual, scaled by the Kalman gain. The scaling is performed by $\\mathbf{Ky}$, which both scales the residual and converts it back into state space. This is added to the prior, yielding the equation: $\\mathbf{x} =\\mathbf{\\bar{x}} + \\mathbf{Ky}$.\n",
"\n",
"$\\underline{\\texttt{Covariance Update}}$\n",
"\n",
"$\\mathbf{P} = (\\mathbf{I}-\\mathbf{KH})\\mathbf{P}^-$\n",
"$\\mathbf{P} = (\\mathbf{I}-\\mathbf{KH})\\mathbf{\\bar{P}}$\n",
"\n",
"$\\mathbf{I}$ is the identity matrix, and is the way we represent $1$ in multiple dimensions. $\\mathbf{H}$ is our measurement function, and is a constant. So, simplified, this is simply $\\mathbf{P} = (1-c\\mathbf{K})\\mathbf{P}$. $\\mathbf{K}$ is our ratio of how much prediction vs measurement we use. So, if $\\mathbf{K}$ is large then $(1-c\\mathbf{K})$ is small, and $\\mathbf{P}$ will be made smaller than it was. If $\\mathbf{K}$ is small, then $(1-c\\mathbf{K})$ is close to one, and $\\mathbf{P}$ will be relatively unchanged. So we adjust the size of our uncertainty by some factor of the *Kalman gain*.\n",
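Putting the update equations together in NumPy with the same kind of made-up numbers gives the following sketch; it mirrors the equations above, not FilterPy's internal implementation:

```python
import numpy as np

x_bar = np.array([[10.45], [4.5]])  # prior state (made up)
P_bar = np.array([[2., 1.],
                  [1., 1.]])        # prior covariance (made up)
H = np.array([[1., 0.]])
R = np.array([[5.]])
z = np.array([[10.2]])              # measurement (made up)

S = H @ P_bar @ H.T + R             # system uncertainty
K = P_bar @ H.T @ np.linalg.inv(S)  # Kalman gain
y = z - H @ x_bar                   # residual
x = x_bar + K @ y                   # state update
P = (np.eye(2) - K @ H) @ P_bar     # covariance update
print(x)
print(P)
```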
"\n",
@ -1792,7 +1792,7 @@
"\n",
"The notation above makes use of the Bayesian a$\\mid$b notation, which means a given the evidence of b. The hat means estimate. So, $\\hat{\\mathbf{x}}_{k\\mid k}$ means the estimate of the state $\\mathbf{x}$ at time $k$ (the first k) given the evidence from time $k$ (the second k). The posterior, in other words. $\\hat{\\mathbf{x}}_{k\\mid k-1}$ means the estimate for the state $\\mathbf{x}$ at time k given the estimate from time k - 1. The prior, in other words. \n",
"\n",
"This notation allows a mathematician to express himself exactly, and when it comes to formal publications presenting new results this precision is necessary. As a programmer I find all of that fairly unreadable; I am used to thinking about variables changing state as a program runs, and do not use a different variable name for each new computation. There is no agreed upon format, so each author makes different choices. I find it challenging to switch quickly between books an papers, and so have adopted my admittedly less precise notation. Mathematicians will write scathing emails to me, but I hope the programmers and students will rejoice.\n",
"This notation allows a mathematician to express himself exactly, and when it comes to formal publications presenting new results this precision is necessary. As a programmer I find all of that fairly unreadable; I am used to thinking about variables changing state as a program runs, and do not use a different variable name for each new computation. There is no agreed upon format, so each author makes different choices. I find it challenging to switch quickly between books and papers, and so have adopted my admittedly less precise notation. Mathematicians will write scathing emails to me, but I hope the programmers and students will rejoice.\n",
"\n",
"Here are some examples for how other authors write the prior: $X^*_{n+1,n}$, $\\underline{\\hat{x}}_k(-)$ (really!), $\\hat{\\textbf{x}}^-_{k+1}$, $\\hat{x}_{k}$. If you are lucky an author defines the notation; more often you have to read the equations in context to recognize what the author is doing. Of course, people write within a tradition; papers on Kalman filters in finance are likely to use one set of notations while papers on radar tracking are likely to use a different set. Over time you will start to become familiar with trends, and also instantly recognize when somebody just copied equations wholesale from another work. For example - the equations I gave above were copied from the Wikipedia [Kalman Filter](https://en.wikipedia.org/wiki/Kalman_filter#Details) [[1]](#[wiki_article]) article.\n",
"\n",
@ -2047,7 +2047,7 @@
"\n",
"Let's remind ourselves of what the term *process uncertainty* means. Consider the problem of tracking a ball. We can accurately model its behavior in static air with math, but if there is any wind our model will diverge from reality. \n",
"\n",
"In the first case we set `Q_var=20 m^2`, which is quite large. In physical terms this is telling the filter \"I don't trust my motion prediction step\" as we are saying that the variance in the velocity is 10. Strictly speaking, we are telling the filter there is a lot of external noise that we are not modeling with $\\small{\\mathbf{F}}$, but the upshot of that is to not trust the motion prediction step. So the filter will be computing velocity ($\\dot{x}$), but then mostly ignoring it because we are telling the filter that the computation is extremely suspect. Therefore the filter has nothing to use but the measurements, and thus it follows the measurements closely. \n",
"In the first case we set `Q_var=20 m^2`, which is quite large. In physical terms this is telling the filter \"I don't trust my motion prediction step\" as we are saying that the variance in the velocity is 20. Strictly speaking, we are telling the filter there is a lot of external noise that we are not modeling with $\\small{\\mathbf{F}}$, but the upshot of that is to not trust the motion prediction step. So the filter will be computing velocity ($\\dot{x}$), but then mostly ignoring it because we are telling the filter that the computation is extremely suspect. Therefore the filter has nothing to use but the measurements, and thus it follows the measurements closely. \n",
"\n",
"In the second case we set `Q_var=0.02 m^2`, which is quite small. In physical terms we are telling the filter \"trust the motion computation, it is really good!\". Again, more strictly this actually says there is a very small amount of process noise (variance 0.02 $m^2$), so the motion computation will be accurate. So the filter ends up ignoring some of the measurements as they jump up and down, because the variation in the measurements does not match our trustworthy velocity prediction."
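If you want to experiment with these two settings yourself, they can be generated with FilterPy's `Q_discrete_white_noise`; the `dt` value below is an assumption for illustration:

```python
from filterpy.common import Q_discrete_white_noise

Q_large = Q_discrete_white_noise(dim=2, dt=1., var=20.)   # "don't trust the prediction"
Q_small = Q_discrete_white_noise(dim=2, dt=1., var=0.02)  # "trust the prediction"
print(Q_large)
print(Q_small)
```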
]
@ -2375,7 +2375,7 @@
"\n",
"So despite the filter tracking very close to the actual signal we cannot conclude that the 'magic' is to use a small $\\mathbf{P}$. Yes, this will avoid having the Kalman filter take time to accurately track the signal, but if we are truly uncertain about the initial measurements this can cause the filter to generate very bad results. If we are tracking a living object we are probably very uncertain about where it is before we start tracking it. On the other hand, if we are filtering the output of a thermometer, we are as certain about the first measurement as the 1000th. For your Kalman filter to perform well you must set $\\mathbf{P}$ to a value that truly reflects your knowledge about the data. \n",
"\n",
"Let's see the result of a bad initial estimate coupled with a very small $\\mathbf{P}$ We will set our initial estimate at 100 m (whereas the dog actually starts at 0m), but set `P=1 m`."
"Let's see the result of a bad initial estimate coupled with a very small $\\mathbf{P}$. We will set our initial estimate at 100 m (whereas the dog actually starts at 0m), but set `P=1 m`."
]
},
{
@ -2598,7 +2598,7 @@
"## Filter Initialization\n",
"\n",
"\n",
"There are many schemes for initializing the filter (i.e. choosing the initial values for $\\mathbf{x}$ and $\\mathbf{P}$. I will share a common approach that performs well in most situations. In this scheme you do not initialize the filter until you get the first measurement, $\\mathbf{z}_0$. From this you can compute the initial value for $\\mathbf{x}$ with $\\mathbf{x}_0 = \\mathbf{z}_0$. If $\\mathbf{z}$ is not of the same size, type, and units as $\\mathbf{x}$, which is usually the case, we can use our measurement function as follow.\n",
"There are many schemes for initializing the filter (i.e. choosing the initial values for $\\mathbf{x}$ and $\\mathbf{P}$). I will share a common approach that performs well in most situations. In this scheme you do not initialize the filter until you get the first measurement, $\\mathbf{z}_0$. From this you can compute the initial value for $\\mathbf{x}$ with $\\mathbf{x}_0 = \\mathbf{z}_0$. If $\\mathbf{z}$ is not of the same size, type, and units as $\\mathbf{x}$, which is usually the case, we can use our measurement function as follows.\n",
"\n",
"We know\n",
"\n",
@ -400,7 +400,7 @@
"&= 1\\end{aligned}$$\n",
"\n",
"\n",
"Using similar math we can compute that $VAR(a) = 0.25$ and $VAR(c)=4$. This allows us to fill in the covariance matrix with\n",
"Using similar math we can compute that $VAR(b) = 0.25$ and $VAR(c)=4$. This allows us to fill in the covariance matrix with\n",
"\n",
"$$\\Sigma = \\begin{bmatrix}1 & & \\\\ & 0.25 & \\\\ &&4\\end{bmatrix}$$"
]
@ -847,7 +847,7 @@
"\n",
"However, I can present enough of the theory to allow us to create the system equations for many different Kalman filters, and give you enough background to at least follow the mathematics in the literature. My goal is to get you to the stage where you can read a Kalman filtering book or paper and understand it well enough to implement the algorithms. The background math is deep, but we end up using a few simple techniques over and over again in practice.\n",
"\n",
"I struggle a bit with the proper way to present this material. If you have not encountered this math before I fear reading this section will not be very profitable for you. In the **Extended Kalman Filter** chapter I take a more ad-hoc way of presenting this information where I expose a problem that the KF needs to solve, then provide the math without a lot of supporting theory. This gives you the motivation behind the mathematics at the cost of not knowing why the math I give you is correct. On the other hand, the following section gives you the math, but somewhat divorced from the specifics of the problem we are trying to solve. Only you know what kind of learner your are. If you like the presentation of the book so far (practical first, then the math) you may want to wait until you read the **Extended Kalman Filter** before \n",
"I struggle a bit with the proper way to present this material. If you have not encountered this math before I fear reading this section will not be very profitable for you. In the **Extended Kalman Filter** chapter I take a more ad-hoc way of presenting this information where I expose a problem that the KF needs to solve, then provide the math without a lot of supporting theory. This gives you the motivation behind the mathematics at the cost of not knowing why the math I give you is correct. On the other hand, the following section gives you the math, but somewhat divorced from the specifics of the problem we are trying to solve. Only you know what kind of learner you are. If you like the presentation of the book so far (practical first, then the math) you may want to wait until you read the **Extended Kalman Filter** chapter before working through this section.\n",
"In particular, if your intent is to work with Extended Kalman filters (a very prevalent form of nonlinear Kalman filtering) you will need to understand this math at least at the level I present it. If that is not your intent this section may still prove to be beneficial if you need to simulate a nonlinear system in order to test your filter.\n",
"\n",
"Let's lay out the problem and discuss what the solution will be. We model *dynamic systems* with a set of first order *differential equations*. This should not be a surprise as calculus is the math of things that vary. For example, we say that velocity is the derivative of distance with respect to time\n",
@ -900,7 +900,7 @@
"\n",
"If we let the solution to the left hand side be named $F(x)$, we get\n",
"\n",
"$$F(x) - f(x_0) = t-t_0$$\n",
"$$F(x) - F(x_0) = t-t_0$$\n",
"\n",
"We then solve for x with\n",
"\n",
@ -1009,7 +1009,7 @@
"source": [
"This is not bad for only three terms. If you are curious, go ahead and implement this as a Python function to compute the series for an arbitrary number of terms. But I will forge ahead to the matrix form of the equation. \n",
"\n",
"Let's consider tracking an object moving in a vacuum. In one dimesion the differential equation for motion with zero acceleration is\n",
"Let's consider tracking an object moving in a vacuum. In one dimension the differential equation for motion with zero acceleration is\n",
"\n",
"$$ v = \\dot{x}\\\\a=\\ddot{x} =0,$$\n",
"\n",
@ -1165,7 +1165,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We model kinematic systems using Newton's equations. So far in this book we have either used position and velocity, or position,velocity, and acceleration as the models for our systems. There is nothing stopping us from going further - we can model jerk, jounce, snap, and so on. We don't do that normally because adding terms beyond the dynamics of the real system actually degrades the solution. \n",
"We model kinematic systems using Newton's equations. So far in this book we have either used position and velocity, or position, velocity, and acceleration as the models for our systems. There is nothing stopping us from going further - we can model jerk, jounce, snap, and so on. We don't do that normally because adding terms beyond the dynamics of the real system actually degrades the solution. \n",
"\n",
"Let's say that we need to model the position, velocity, and acceleration. We can then assume that acceleration is constant. Of course, there is process noise in the system and so the acceleration is not actually constant. In this section we will assume that the acceleration changes by a continuous time zero-mean white noise $w(t)$. In other words, we are assuming that the acceleration changes by small amounts that over time average to 0 (zero-mean). \n",
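For reference, FilterPy can build the process noise matrix for this continuous white noise model; this is a sketch, and the `dt` and spectral density values are arbitrary assumptions:

```python
from filterpy.common import Q_continuous_white_noise

dt = 0.1
Q = Q_continuous_white_noise(dim=3, dt=dt, spectral_density=0.5)
print(Q)
```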
"\n",
@ -2076,7 +2076,7 @@
"\n",
"$$\\delta \\mathbf{z}^+ = \\mathbf{z} - h(\\mathbf{x}^+)$$\n",
"\n",
"I don't use the plus superscript much because I find it quickly makes the equations unreadable, but $\\mathbf{x}^+$ it is the *a posteriori* state estimate, which is the predicted or unknown future state. In other words, the predict step of the linear Kalman filter computes this value. Here it is stands for the value of x which the ILS algorithm will compute on each iteration.\n",
"I don't use the plus superscript much because I find it quickly makes the equations unreadable, but $\\mathbf{x}^+$ is the *a posteriori* state estimate, which is the predicted or unknown future state. In other words, the predict step of the linear Kalman filter computes this value. Here it stands for the value of $\\mathbf{x}$ which the ILS algorithm will compute on each iteration.\n",
"\n",
"These equations give us the following linear algebra equation:\n",
"\n",