From 63669a8c43d63910ba44d34ae5dc99ddcc076776 Mon Sep 17 00:00:00 2001 From: Roger Labbe Date: Fri, 14 Aug 2015 16:44:30 -0700 Subject: [PATCH 1/7] Spelling corrections. github issue #52 - it wasn't possible to accept the merge because notebooks don't merge well after cells are run. --- 01-g-h-filter.ipynb | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/01-g-h-filter.ipynb b/01-g-h-filter.ipynb index 5275d97..d4f8d6d 100644 --- a/01-g-h-filter.ipynb +++ b/01-g-h-filter.ipynb @@ -828,7 +828,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "That is pretty good! There is a lot of data here, so let's talk about how to interpret it. The thick green line shows the estimate from the filter. It starts at day 0 with the inital guess of 160 lbs. The red line shows the prediction that is made from the previous day's weight. So, on day one the previous weight was 160 lbs, the weight gain is 1 lb, and so the first prediction is 161 lbs. The estimate on day one is then part way between the prediction and measurement at 159.8 lbs. Above the chart is a print out of the previous weight, predicted weight, and new estimate for each day. Finally, the thin black line shows the actual weight gain of the person being weighed. \n", + "That is pretty good! There is a lot of data here, so let's talk about how to interpret it. The thick green line shows the estimate from the filter. It starts at day 0 with the initial guess of 160 lbs. The red line shows the prediction that is made from the previous day's weight. So, on day one the previous weight was 160 lbs, the weight gain is 1 lb, and so the first prediction is 161 lbs. The estimate on day one is then part way between the prediction and measurement at 159.8 lbs. Above the chart is a printout of the previous weight, predicted weight, and new estimate for each day. Finally, the thin black line shows the actual weight gain of the person being weighed. 
\n", "\n", "The estimates are not a straight line, but they are straighter than the measurements and somewhat close to the trend line we created. Also, it seems to get better over time. \n", "\n", @@ -936,7 +936,7 @@ "gain_rate = gain_rate\n", "```\n", " \n", - "This obviously has no effect, and can be removed. I wrote this to emphasize that in the prediction step you need to predict next value for **all** variables, both *weight* and *gain_rate*. In this case we are assuming that the the gain does not vary, but when we generalize this algorithm we will remove that assumption. " + "This obviously has no effect, and can be removed. I wrote this to emphasize that in the prediction step you need to predict the next value for **all** variables, both *weight* and *gain_rate*. In this case we are assuming that the gain does not vary, but when we generalize this algorithm we will remove that assumption. " ] }, { @@ -996,7 +996,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Let me introduce some more formal terminology. The predict step is known as **sytem propagation**. The *system* is whatever we are estimating - in this case my weight. We *propogate* it into the future. Some texts call this the **evolution**. It means the same thing. The update step is usually known as the **measurement update**. One iteration of the system propagation and measurement update is known as an **epoch**. \n", + "Let me introduce some more formal terminology. The predict step is known as **system propagation**. The *system* is whatever we are estimating - in this case my weight. We *propagate* it into the future. Some texts call this the **evolution**. It means the same thing. The update step is usually known as the **measurement update**. One iteration of the system propagation and measurement update is known as an **epoch**. \n", "\n", "Now let's explore a few different problem domains to better understand this algorithm. 
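One epoch of the system propagation and measurement update described above can be sketched in a few lines of Python. This is a minimal illustration consistent with the chapter's prose, not the book's code cell; the function name and the `g`, `h` values are my own, chosen so that day one reproduces the numbers quoted earlier (prediction 161 lbs, estimate 159.8 lbs):

```python
def g_h_filter(data, x0, dx, g, h, dt=1.0):
    """Run one g-h filter pass over the measurements in `data`.

    x0 is the initial state (weight), dx the initial rate (gain_rate).
    g scales how much the residual corrects the state estimate;
    h scales how much it corrects the rate estimate.
    """
    x, estimates = x0, []
    for z in data:
        # system propagation: project *both* variables forward
        x_pred = x + dx * dt
        # measurement update: blend the prediction with the measurement
        residual = z - x_pred
        dx = dx + h * residual / dt
        x = x_pred + g * residual
        estimates.append(x)
    return estimates

# day one: predict 161 from 160 + 1 lb gain, measure 158.0,
# and land part way between at 159.8
day_one = g_h_filter(data=[158.0], x0=160.0, dx=1.0, g=4/10, h=1/3)
```

With `g = 4/10` the estimate lands four tenths of the way from the prediction toward the measurement, which is how a 159.8 lbs estimate arises from a 161 lbs prediction and a 158 lbs measurement.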
Consider the problem of trying to track a train on a track. The track constrains the position of the train to a very specific region. Furthermore, trains are large and slow. It takes them many minutes to slow down or speed up significantly. So, if I know that the train is at kilometer marker 23 km at time t and moving at 18 kph, I can be extremely confident in predicting its position at time t + 1 second. And why is that important? Suppose we can only measure its position with an accuracy of $\\pm$ 250 meters. The train is moving at 18 kph, which is 5 meters per second. So at t+1 second the train will be at 23.005 km yet the measurement could be anywhere from 22.755 km to 23.255 km. So if the next measurement says the position is at 23.4 we know that must be wrong. Even if at time t the engineer slammed on the brakes the train will still be very close to 23.005 km because a train cannot slow down very much in 1 second. If we were to design a filter for this problem (and we will, a bit further on in the chapter!) we would want to design a filter that gave a very high weighting to the prediction vs the measurement. \n", "\n", @@ -1262,7 +1262,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Excercise - Create arrays\n", + "### Exercise - Create arrays\n", "\n", "I want you to create a NumPy array of 10 elements with each element containing 1/10. There are several ways to do this; try to implement as many as you can think of. " ] @@ -1949,7 +1949,7 @@ "\n", "If you really want to test yourself, read the next paragraph and try to predict the results before you move the sliders. \n", "\n", - "Some things to try include setting $g$ and $h$ to their miminum values. See how perfectly the filter tracks the data! This is only because we are perfectly predicting the weight gain. Adjust $\\dot{x}$ to larger or smaller than 5. The filter should diverge from the data and never reacquire it. 
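For the "Create arrays" exercise above, here are a few equivalent constructions (a sketch of possible answers, not the book's solution cell):

```python
import numpy as np

# several ways to build a 10-element array of 1/10
a = np.ones(10) / 10        # build an array of ones, then scale
b = np.full(10, 1/10)       # fill directly with the value
c = np.array([1/10] * 10)   # convert a Python list
d = np.repeat(1/10, 10)     # repeat a scalar ten times

# all four produce identical arrays
assert all(np.array_equal(a, arr) for arr in (b, c, d))
```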
Start adding back either $g$ or $h$ and see how the filter snaps back to the data. See what the difference in the line is when you add only $g$ vs only $h$. Can you explain the reason for the difference? Then try setting $g$ greater than 1. Can you explain the results? Put $g$ back to a reasonable value (such as 0.1), and then make $h$ very large. Can you explain these results? Finally, set both $g$ and $h$ to their largest values. \n", + "Some things to try include setting $g$ and $h$ to their minimum values. See how perfectly the filter tracks the data! This is only because we are perfectly predicting the weight gain. Adjust $\\dot{x}$ to larger or smaller than 5. The filter should diverge from the data and never reacquire it. Start adding back either $g$ or $h$ and see how the filter snaps back to the data. See what the difference in the line is when you add only $g$ vs only $h$. Can you explain the reason for the difference? Then try setting $g$ greater than 1. Can you explain the results? Put $g$ back to a reasonable value (such as 0.1), and then make $h$ very large. Can you explain these results? Finally, set both $g$ and $h$ to their largest values. \n", " \n", "If you want to explore with this more, change the value of the array `zs` to the values used in any of the charts above and rerun the cell to see the result." ] }, { @@ -2329,7 +2329,7 @@ "source": [ "There are two lessons to be learned here. First, use the $h$ term to respond to changes in velocity that you are not modeling. But, far more importantly, there is a trade off here between responding quickly and accurately to changes in behavior and producing ideal output when the system is in a steady state. If the train never changes velocity we would make $h$ extremely small to avoid having the filtered estimate unduly affected by the noise in the measurement. But in an interesting problem there are almost always changes in state, and we want to react to them quickly. 
The more quickly we react to them, the more we are affected by the noise in the sensors. \n", "\n", - "I could go on, but my aim is not to develop g-h filter theory here so much as to build insight into how combining measurements and predictions leads to a filtered solution. Tthere is extensive literature on choosing $g$ and $h$ for problems such as this, and there are optimal ways of choosing them to achieve various goals. As I explained earlier it is easy to 'lie' to the filter when experimenting with test data like this. In the subsequent chapters we will learn how the Kalman filter solves this problem in the same basic manner, but with far more sophisticated mathematics. " + "I could go on, but my aim is not to develop g-h filter theory here so much as to build insight into how combining measurements and predictions leads to a filtered solution. There is extensive literature on choosing $g$ and $h$ for problems such as this, and there are optimal ways of choosing them to achieve various goals. As I explained earlier it is easy to 'lie' to the filter when experimenting with test data like this. In the subsequent chapters we will learn how the Kalman filter solves this problem in the same basic manner, but with far more sophisticated mathematics. " ] }, { @@ -2349,7 +2349,7 @@ "\n", " pip install filterpy\n", " \n", - "Read Appendix A for more information on installing or downloding FilterPy from GitHub.\n", + "Read Appendix A for more information on installing or downloading FilterPy from GitHub.\n", "\n", "To use the g-h filter import it and create an object from the class `GHFilter`. " ] From b98f157169c58b2766dcc0bd31da535237d554ef Mon Sep 17 00:00:00 2001 From: Roger Labbe Date: Sat, 15 Aug 2015 07:08:20 -0700 Subject: [PATCH 2/7] Minor language clean up. 
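The `GHFilter` class mentioned above can be pictured with this minimal stand-in, which mirrors the interface the text describes (a constructor taking `x`, `dx`, `dt`, `g`, `h`, plus an `update(z)` method). It is a sketch for illustration only; install FilterPy for the real implementation:

```python
class GHFilterSketch:
    """Object-style g-h filter mirroring the interface the text describes."""

    def __init__(self, x, dx, dt, g, h):
        self.x, self.dx = x, dx          # state and rate estimates
        self.dt, self.g, self.h = dt, g, h

    def update(self, z):
        # predict forward, then blend in the measurement z
        x_pred = self.x + self.dx * self.dt
        residual = z - x_pred
        self.dx += self.h * residual / self.dt
        self.x = x_pred + self.g * residual
        return self.x, self.dx

f = GHFilterSketch(x=0., dx=0., dt=1., g=.8, h=.2)
f.update(z=1.2)  # with g=.8 the state moves most of the way toward 1.2
```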
--- 01-g-h-filter.ipynb | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/01-g-h-filter.ipynb b/01-g-h-filter.ipynb index 5275d97..56a05e7 100644 --- a/01-g-h-filter.ipynb +++ b/01-g-h-filter.ipynb @@ -1435,9 +1435,9 @@ "source": [ "The g-h filter is not one filter - it is a classification for a family of filters. Eli Brookner in *Tracking and Kalman Filtering Made Easy* lists 11, and I am sure there are more. Not only that, but each type of filter has numerous subtypes. Each filter is differentiated by how $g$ and $h$ are chosen. So there is no 'one size fits all' advice that I can give here. Some filters set $g$ and $h$ as constants, others vary them dynamically. The Kalman filter varies them dynamically at each step. Some filters allow $g$ and $h$ to take any value within a range, others constrain one to be dependent on the other by some function $f(\\dot{}), \\mbox{where }g = f(h)$.\n", "\n", - "The topic of this book is not the entire family of g-h filters; more importantly, we are interested in the *Bayesian* aspect of these filters, which I have not addressed yet. Therefore I will not cover selection of $g$ and $h$ in depth. Eli Brookner's book *Tracking and Kalman Filtering Made Easy* is an excellent resource for that topic, if it interests you. If this strikes you as an odd position for me to take, recognize that the typical formulation of the Kalman filter does not use $g$ and $h$ at all; the Kalman filter is a g-h filter because it mathematically reduces to this algorithm. When we design the Kalman filter we will be making a number of carefully considered choices to optimize it's performance, and those choices indirectly affect $g$ and $h$, but you will not be choosing $g$ and $h$ directly. 
Don't worry if this is not too clear right now, it will be much clearer later after we develop the Kalman filter theory.\n", + "The topic of this book is not the entire family of g-h filters; more importantly, we are interested in the *Bayesian* aspect of these filters, which I have not addressed yet. Therefore I will not cover selection of $g$ and $h$ in depth. *Tracking and Kalman Filtering Made Easy* is an excellent resource for that topic. If this strikes you as an odd position for me to take, recognize that the typical formulation of the Kalman filter does not use $g$ and $h$ at all. The Kalman filter is a g-h filter because it mathematically reduces to this algorithm. When we design the Kalman filter we use design criteria that can be mathematically reduced to $g$ and $h$, but the Kalman filter form is usually a much more powerful way to think about the problem. Don't worry if this is not too clear right now; it will be much clearer after we develop the Kalman filter theory.\n", "\n", - "However, it is worth seeing how varying $g$ and $h$ affects the results, so we will work through some examples. This will give us strong insight into the fundamental strengths and limitations of this type of filter, and help us understand the behavior of the rather more sophisticated Kalman filter." + "It is worth seeing how varying $g$ and $h$ affects the results, so we will work through some examples. This will give us strong insight into the fundamental strengths and limitations of this type of filter, and help us understand the behavior of the rather more sophisticated Kalman filter." ] }, { From 3ae8eb88d5c0a24fc95f34d59784aa527e5489a8 Mon Sep 17 00:00:00 2001 From: Roger Labbe Date: Sat, 15 Aug 2015 12:01:04 -0700 Subject: [PATCH 3/7] Improved wording for Bayes Theorem section. 
--- 02-Discrete-Bayes.ipynb | 18 ++++++++++-------- 1 file changed, 10 insertions(+), 8 deletions(-) diff --git a/02-Discrete-Bayes.ipynb b/02-Discrete-Bayes.ipynb index 59a5de4..84d9f9a 100644 --- a/02-Discrete-Bayes.ipynb +++ b/02-Discrete-Bayes.ipynb @@ -1671,15 +1671,21 @@ "source": [ "We developed the math in this chapter merely by reasoning about the information we have at each moment. In the process we discovered **Bayes Theorem**. We will go into the specifics of the math of Bayes theorem later in the book. For now we will take a more intuitive approach. Recall from the preface that Bayes theorem tells us how to compute the probability of an event given previous information. That is exactly what we have been doing in this chapter. With luck our code should match the Bayes Theorem equation! \n", "\n", - "Bayes theorem is written as\n", + "We implemented the `update()` function with this probability calculation:\n", + "\n", + "$$ \\mathtt{posterior} = \\frac{\\mathtt{evidence}\\times \\mathtt{prior}}{\\mathtt{normalization}}$$ \n", + "\n", + "To review, the *prior* is the probability of something happening before we include the measurement and the *posterior* is the probability we compute after incorporating the information from the measurement.\n", + "\n", + "Bayes theorem is\n", "\n", "$$P(A|B) = \\frac{P(B | A)\\, P(A)}{P(B)}\\cdot$$\n", "\n", "If you are not familiar with this notation, let's review. $P(A)$ means the probability of event $A$. If $A$ is the event of a fair coin landing heads, then $P(A) = 0.5$.\n", "\n", - "$P(A|B)$ is called a **conditional probability**. That is, it represents the probability of $A$ happening *if* $B$ happened. For example, it is more likely to rain today if it also rained yesterday because rain systems tend to last more than one day. We'd write the probability of it raining today given that it rained yesterday as $P(rain_{today}|rain_{yesterday})$.\n", + "$P(A|B)$ is called a **conditional probability**. 
That is, it represents the probability of $A$ happening *if* $B$ happened. For example, it is more likely to rain today if it also rained yesterday because rain systems tend to last more than one day. We'd write the probability of it raining today given that it rained yesterday as $P(\\mathtt{rain_{today}}|\\mathtt{rain_{yesterday}})$.\n", "\n", "In Bayes theorem $P(A)$ is the *prior*, $P(B)$ is the *evidence*, and $P(A|B)$ is the *posterior*. By substituting the mathematical terms with the corresponding words you can see that Bayes theorem matches our update equation. Let's rewrite the equation in terms of our problem. We will use $x_i$ for the position at *i*, and $Z$ for the measurement. Hence, we want to know $P(x_i|Z)$, that is, the probability of the dog being at $x_i$ given the measurement $Z$. \n", "\n", "So, let's plug that into the equation and solve it.\n", "\n", @@ -1697,11 +1703,7 @@ "\n", "I added the `else` here, which has no mathematical effect, to point out that every element in $x$ (called `belief` in the code) is multiplied by a probability. You may object that I am multiplying by a scale factor, which I am, but this scale factor is derived from the probability of the measurement being correct vs the probability being incorrect.\n", "\n", - "The last term to consider is the denominator $P(Z)$. This is the probability of getting the measurement $Z$ without taking the location into account. We compute that by taking the sum of $x$, or `sum(belief)` in the code. That is how we compute the normalization! So, the `update()` function is doing nothing more than computing Bayes theorem. 
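The normalization argument above is easy to check numerically. This sketch uses my own illustrative hallway data, not the book's; it multiplies a uniform prior by the measurement likelihood and divides by the sum, exactly the $P(Z)$ computation described:

```python
import numpy as np

def update(prior, likelihood):
    posterior = prior * likelihood      # P(B|A) * P(A), elementwise
    return posterior / posterior.sum()  # divide by P(Z), the normalization

hallway = np.array([1, 1, 0, 0, 0, 0, 0, 0, 1, 0])  # 1 marks a door
prior = np.full(10, 0.1)                            # uniform belief
likelihood = np.where(hallway == 1, 0.75, 0.25)     # p_hit vs p_miss
posterior = update(prior, likelihood)
# probability mass concentrates at the three door positions,
# and the posterior still sums to 1
```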
Recall this equation from earlier in the chapter:\n", - "\n", - "$$ \\mathtt{posterior} = \\frac{\\mathtt{prior}\\times \\mathtt{evidence}}{\\mathtt{normalization}}$$ \n", - "\n", - "That is the Bayes theorem written in words instead of mathematical symbols. I could have given you Bayes theorem and then written a function, but I doubt that would have been illuminating unless you already know Bayesian statistics. Instead, we figured out what to do just by reasoning about the situation, and so of course the resulting code ended up implementing Bayes theorem. Students spend a lot of time struggling to understand this theorem; I hope you found it relatively straightforward." + "The last term to consider is the denominator $P(Z)$. This is the probability of getting the measurement $Z$ without taking the location into account. We compute that by taking the sum of $x$, or `sum(belief)` in the code. That is how we compute the normalization! So, the `update()` function is doing nothing more than computing Bayes theorem." ] }, { From 44552eea48e11863322be9d89df53fc0cfa67c8c Mon Sep 17 00:00:00 2001 From: Roger Labbe Date: Sat, 15 Aug 2015 12:02:10 -0700 Subject: [PATCH 4/7] Deleted empty cell. --- 02-Discrete-Bayes.ipynb | 9 --------- 1 file changed, 9 deletions(-) diff --git a/02-Discrete-Bayes.ipynb b/02-Discrete-Bayes.ipynb index 84d9f9a..9adf9c3 100644 --- a/02-Discrete-Bayes.ipynb +++ b/02-Discrete-Bayes.ipynb @@ -1798,15 +1798,6 @@ "\n", " https://en.wikipedia.org/wiki/Time_evolution" ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "collapsed": true - }, - "outputs": [], - "source": [] } ], "metadata": { From 5d13589b8d08eabcb14fd5ea9270e703e83273f0 Mon Sep 17 00:00:00 2001 From: Roger Labbe Date: Sat, 15 Aug 2015 15:56:45 -0700 Subject: [PATCH 5/7] Cleaned up a lot of duplicate text. 
--- 07-Kalman-Filter-Math.ipynb | 165 +++--------------------------------- 1 file changed, 12 insertions(+), 153 deletions(-) diff --git a/07-Kalman-Filter-Math.ipynb b/07-Kalman-Filter-Math.ipynb index 70d300f..a1743e0 100644 --- a/07-Kalman-Filter-Math.ipynb +++ b/07-Kalman-Filter-Math.ipynb @@ -276,116 +276,11 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "** author's note:** *the ordering of material in this chapter is questionable. I delve into solving ODEs before discussing the basic Kalman equations. If you are reading this while it is being worked on (so long as this notice exists), you may find it easier to skip around a bit until I organize it better.*\n", - "\n", - "\n", "If you've gotten this far I hope that you are thinking that the Kalman filter's fearsome reputation is somewhat undeserved. Sure, I hand waved some equations away, but I hope implementation has been fairly straightforward for you. The underlying concept is quite straightforward - take two measurements, or a measurement and a prediction, and choose the output to be somewhere between the two. If you believe the measurement more your guess will be closer to the measurement, and if you believe the prediction is more accurate your guess will lie closer to it. That's not rocket science (little joke - it is exactly this math that got Apollo to the moon and back!). \n", "\n", "Well, to be honest I have been choosing my problems carefully. For any arbitrary problem finding some of the matrices that we need to feed into the Kalman filter equations can be quite difficult. I haven't been *too tricky*, though. Equations like Newton's equations of motion can be trivially computed for Kalman filter applications, and they make up the bulk of the kind of problems that we want to solve. If you are a hobbyist, you can safely pass by this chapter for now, and perhaps forever. 
Some of the later chapters will assume the material in this chapter, but much of the work will still be accessible to you. \n", + "To be honest I have been choosing my problems carefully. For any arbitrary problem finding some of the matrices that we need to feed into the Kalman filter equations can be quite difficult. I haven't been *too tricky*, though. Equations like Newton's equations of motion can be trivially computed for Kalman filter applications, and they make up the bulk of the kind of problems that we want to solve. \n", "\n", - "But, I urge everyone to at least read the first section, and to skim the rest. It is not much harder than what you have done - the difficulty comes in finding closed form expressions for specific problems, not understanding the math in this chapter. " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Bayesian Probability" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The title of this book is *Kalman and Bayesian Filters in Python* but to date I have not touched on the Bayesian aspect much. There was enough going on in the earlier chapters that adding this form of reasoning about filters could be a distraction rather than a help. I now wish to take some time to explain what Bayesian probability is and how a Kalman filter is in fact a Bayesian filter. This is not a diversion. First of all, a lot of the Kalman filter literature uses this formulation when talking about filters, so you will need to understand what they are talking about. Second, this math plays a strong role in filtering design once we move past the Kalman filter. \n", - "\n", - "To do so we will go back to our first tracking problem - tracking a dog in a hallway. Recall the update step - we believed with some degree of precision that the dog was at position 7 (for example), and then receive a measurement that the dog is at position 7.5. We want to incorporate that measurement into our belief. 
In the *Discrete Bayes* chapter we used histograms to denote our estimates at each hallway position, and in the *One Dimensional Kalman Filters* we used Gaussians. Both are method of using *Bayesian* probability.\n", - "\n", - "Briefly, *Bayesian* probability is a branch of math that lets us evaluate a hypothesis or new data point given some uncertain information about the past. For example, suppose you are driving down your neighborhood street and see one of your neighbors at their door, using a key to let themselves in. Three doors down you see two people in masks breaking a window of another house. What might you conclude?\n", - "\n", - "It is likely that you would reason that in the first case your neighbors were getting home and unlocking their door to get inside. In the second case you at least strongly suspect a robbery is in progress. In the first case you would proceed on, and in the second case you'd probably call the police.\n", - "\n", - "Of course, things are not always what they appear. Perhaps unbeknownst to you your neighbor sold their house that morning, and they were now breaking in to steal the new owner's belongings. In the second case, perhaps the owners of the house were at a costume event at the next house, they had a medical emergency with their child, realized they lost their keys, and were breaking into their own house to get the much needed medication. Those are both *unlikely* events, but possible. Adding a few additional pieces of information would allow you to determine the true state of affairs in all but the most complicated situations.\n", - "\n", - "These are instances of *Bayesian* reasoning. We take knowledge from the past and integrate in new information. You know that your neighbor owned their house yesterday, so it is still highly likely that they still own it today. 
You know that owners of houses normally have keys to the front door, and that the normal mode of entrance into your own house is not breaking windows, so the second case is *likely* to be a breaking and entering. The reasoning is not ironclad as shown by the alternative explanations, but it is likely." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Bayes' theorem" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "*Bayes' theorem* mathematically formalizes the above reasoning. It is written as\n", - "\n", - "$$P(A|B) = \\frac{P(B | A)\\, P(A)}{P(B)}\\cdot$$\n", - "\n", - "\n", - "Before we do some computations, let's review what the terms mean. P(A) is called the *prior probability* of the event A, and is often shortened to the *prior*. What is the prior? It is the probability of A being true *before* we incorporate new evidence. In our dog tracking problem above, the prior is the probability we assign to our belief that the dog is positioned at 7 before we make the measurement of 7.5. It is important to master this terminology if you expect to read a lot of the literature.\n", - "\n", - "$P(A|B)$ is the *conditional probability* that A is true given that B is true. For example, if it is true that your neighbor still owns their house, then it will be very likely that they are not breaking into their house. In Bayesian probability this is called the *posterior*, and it denotes our new belief after incorporating the measurement/knowledge of B. For our dog tracking problem the posterior is the probability given to the estimated position after incorporating the measurement 7.5. For the neighbor problem the posterior would be the probability of a break in after you find out that your neighbor sold their home last week.\n", - "\n", - "What math did we use for the dog tracking problem? 
Recall that we used this equation to compute the new mean and probability distribution\n", - "\n", - "$$\n", - "\\begin{aligned}\n", - "N(estimate) * N(measurement) &= \\\\\n", - "N(\\mu_1, \\sigma_1^2)*N(\\mu_2, \\sigma_2^2) &= N(\\frac{\\sigma_1^2 \\mu_2 + \\sigma_2^2 \\mu_1}{\\sigma_1^2 + \\sigma_2^2},\\frac{1}{\\frac{1}{\\sigma_1^2} + \\frac{1}{\\sigma_2^2}}) \\cdot\n", - "\\end{aligned}\n", - "$$ \n", - "\n", - "\n", - "Here $N(\\mu_1, \\sigma_1^2)$ is the old estimated position, so $\\sigma_1$ is an indication of our *prior* probability. $N(\\mu_2, \\sigma_2^2)$ is the mean and variance of our measurement, and so the result can be thought of as the new position and probability distribution after incorporating the new measurement. In other words, our *posterior distribution* is \n", - "\n", - "$$\\frac{1}{{\\sigma_{estimate}}^2} + \\frac{1}{{\\sigma_{measurement}}^2}$$" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "This is still a little hard to compare to Bayes' equation because we are dealing with probability distributions rather than probabilities. So let's cast our minds back to the discrete Bayes chapter where we computed the probability that our dog was at any given position in the hallway. 
It looked like this:\n", - "\n", - "```python\n", - "def update(pos_belief, measure, p_hit, p_miss):\n", - " for i in range(len(hallway)):\n", - " if hallway[i] == measure:\n", - " pos_belief[i] *= p_hit\n", - " else:\n", - " pos_belief[i] *= p_miss\n", - "\n", - " pos_belief /= sum(pos_belief)\n", - "```\n", - "\n", - "Let's rewrite this using our newly learned terminology.\n", - "\n", - "```python\n", - "def update(prior, measure, prob_hit, prob_miss):\n", - " posterior = np.zeros(len(prior))\n", - " for i in range(len(hallway)):\n", - " if hallway[i] == measure:\n", - " posterior[i] = prior[i] * p_hit\n", - " else:\n", - " posterior[i] = prior[i] * p_miss\n", - "\n", - " return posterior / sum(posterior)\n", - "```\n", - "\n", - "\n", - "So what is this doing? It's multiplying the old belief that the dog is at position *i* (prior probability) with the probability that the measurement is correct for that position, and then dividing by the total probability for that new event.\n", - "\n", - "Now let's look at Bayes' equation again.\n", - "\n", - "$$P(A|B) = \\frac{P(B | A)\\, P(A)}{P(B)}\\cdot$$\n", - "\n", - "It's the same thing being calculated by the code. Multiply the prior ($P(A)$) by the probability of the measurement at each position ($P(B|A)$) and divide by the total probability for the event ($P(B)$).\n", - "\n", - "In other words the first half of the Discrete Bayes chapter developed Bayes' equation from a thought experiment. I could have presented Bayes' equation and then given you the Python routine above to implement it, but chances are you would not have understood *why* Bayes' equation works. Presenting the equation first is the normal approach of Kalman filtering texts, and I always found it extremely nonintuitive. " + "I have strived to illustrate concepts with code and reasoning, not math. But there are topics that do require more mathematics than I have used so far. 
In this chapter I will give you the math behind the topics that we have learned so far, and introduce the math that you will need to understand the topics in the rest of the book. Many topics are optional." ] }, { @@ -699,8 +594,7 @@ "$$\n", "\\begin{aligned}\n", " \\mathbf{v} &= \\frac{d \\mathbf{x}}{d t}\\\\ \n", - " \\quad \\mathbf{a} &= \\frac{d \\mathbf{v}}{d t}\\\\\n", - " &= \\frac{d^2 \\mathbf{x}}{d t^2} \\,\\!\n", + " \\quad \\mathbf{a} &= \\frac{d \\mathbf{v}}{d t} = \\frac{d^2 \\mathbf{x}}{d t^2} \\,\\!\n", "\\end{aligned}\n", " $$\n", " \n", @@ -1235,44 +1129,6 @@ "If you practice this a bit you will become adept at it. Isolate the highest term, define a new variable and its derivatives, and then substitute." ] }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "**ORPHAN TEXT**\n", - "\n", - "I'll address the computation of $\\mathbf{Q}$ in the next paragraph. For now you can see that as time passes our uncertainty in velocity is growing slowly. It is a bit harder to see, but if you compare this graph to the previous one the uncertainty in position also increased.\n", - "\n", - "We have not given the math for computing the elements of $\\mathbf{Q}$ yet, but if you suspect the math is sometimes difficult you would be correct. One of the problems is that we are usually modeling a *continuous* system - the behavior of the system is changing at every instant, but the Kalman filter is *discrete*. That means that we have to somehow convert the continuous noise of the system into a discrete value, which usually involves calculus. There are other difficulties I will not mention now.\n", - "\n", - "However, for the class of problems we are solving in this chapter (*discretized continuous-time kinematic filters*) we can directly compute the state equations for moving objects by using Newton's equations.\n", - "\n", - "For these kinds of problems we can rely on precomputed forms for $\\mathbf{Q}$. 
We will learn how to derive these matrices in the next chapter. For now I present them without proof. If we assume that for each time period the acceleration due to process noise is constant and uncorrelated, we get the following.\n", - "\n", - "For constant velocity the form is\n", - "\n", - "$$\\begin{bmatrix}\n", - "\\frac{1}{4}{\\Delta t}^4 & \\frac{1}{2}{\\Delta t}^3 \\\\\n", - "\\frac{1}{2}{\\Delta t}^3 & \\Delta t^2\n", - "\\end{bmatrix}\\sigma^2\n", - "$$\n", - "\n", - "and for constant acceleration we have\n", - "\n", - "$$\\begin{bmatrix}\n", - "\\frac{1}{4}{\\Delta t}^4 & \\frac{1}{2}{\\Delta t}^3 & \\frac{1}{2}{\\Delta t}^2 \\\\\n", - "\\frac{1}{2}{\\Delta t}^3 & {\\Delta t}^2 & \\Delta t \\\\\n", - "\\frac{1}{2}{\\Delta t}^2 & \\Delta t & 1\n", - "\\end{bmatrix} \\sigma^2\n", - "$$\n", - "\n", - "It general it is not true that acceleration will be constant and uncorrelated, but this is still a useful approximation for moderate time period, and will suffice for this chapter. Fortunately you can get a long way with approximations and simulation. Let's think about what these matrices are implying. We are trying to model the effects of *process noise*, such as the wind buffeting the flight of a thrown ball. Variations in wind will cause changes in acceleration, and so the effect on the acceleration is large. However, the effects on velocity and position are proportionally smaller. In the matrices, the acceleration term is in the lower right, and this is the largest value. **A good rule of thumb is to set $\\sigma$ somewhere from $\\frac{1}{2}\\Delta a$ to $\\Delta a$, where $\\Delta a$ is the maximum amount that the acceleration will change between sample periods**. In practice we pick a number, run simulations on data, and choose a value that works well. \n", - "\n", - "The filtered result will not be optimal, but in my opinion the promise of optimal results from Kalman filters is mostly wishful thinking. Consider, for example, tracking a car. 
In that problem the process noise would include things like potholes, wind gusts, changing drag due to turning, rolling down windows, and many more factors. We cannot realistically model that analytically, and so in practice we work out a simplified model, compute $\\mathbf{Q}$ based on that simplified model, and then add *a bit more* to $\\small\\mathbf{Q}$ in hopes of taking the incalculable factors into account. Then we use a lot of simulations and trial runs to see if the filter behaves well; if it doesn't we adjust $\\small\\mathbf{Q}$ until the filter performs well. In this chapter we will focus on forming an intuitive understanding on how adjusting $\\small\\mathbf{Q}$ affects the output of the filter. In the Kalman Filter Math chapter we will discuss the analytic computation of $\\small\\mathbf{Q}$, and also provide code that will compute it for you.\n", - "\n", - "For now we will import the code from the `FilterPy` module, where it is already implemented. I will import it and call help on it so you can see the documentation for it." - ] - }, { "cell_type": "markdown", "metadata": {}, @@ -1623,7 +1479,9 @@ "source": [ "We cannot say that this model is more or less correct than the continuous model - both are approximations to what is happening to the actual object. Only experience and experiments can guide you to the appropriate model. In practice you will usually find that either model provides reasonable results, but typically one will perform better than the other.\n", "\n", - "The advantage of the second model is that we can model the noise in terms of $\\sigma^2$ which we can describe in terms of the motion and the amount of error we expect. The first model requires us to specify the spectral density, which is not very intuitive, but it handles varying time samples much more easily since the noise is integrated across the time period. 
However, these are not fixed rules - use whichever model (or a model of your own devising) based on testing how the filter performs and/or your knowledge of the behavior of the physical model." + "The advantage of the second model is that we can model the noise in terms of $\\sigma^2$ which we can describe in terms of the motion and the amount of error we expect. The first model requires us to specify the spectral density, which is not very intuitive, but it handles varying time samples much more easily since the noise is integrated across the time period. However, these are not fixed rules - use whichever model (or a model of your own devising) based on testing how the filter performs and/or your knowledge of the behavior of the physical model.\n", + "\n", + "A good rule of thumb is to set $\\sigma$ somewhere from $\\frac{1}{2}\\Delta a$ to $\\Delta a$, where $\\Delta a$ is the maximum amount that the acceleration will change between sample periods. In practice we pick a number, run simulations on data, and choose a value that works well." ] }, { @@ -1790,9 +1648,10 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "** author's note: this is just notes to a section. If you need to know this in depth, \n", - "*Computational Physics in Python * by Dr. Eric Ayars is excellent, and available here.\n", - "http://phys.csuchico.edu/ayars/312/Handouts/comp-phys-python.pdf **\n", + "> ** author's note: This topic requires multiple books to fully cover it. If you need to know this in depth, \n", + "*Computational Physics in Python * by Dr. 
Eric Ayars is excellent, and available for free here.\n", + "\n", + "> http://phys.csuchico.edu/ayars/312/Handouts/comp-phys-python.pdf **\n", "\n", "So far in this book we have been working with systems that can be expressed with simple linear differential equations such as\n", "\n", @@ -1820,7 +1679,7 @@ "$$\\begin{aligned}\\bar{x} &= x + v\\Delta t \\\\\n", "\\bar{\\dot{x}} &= \\dot{x}\\end{aligned}$$.\n", "\n", - "This works for linear ordinary differential equations (ODEs), but does not work (well) for nonlinear equations. For example, consider trying to predict the position of a rapidly turning car. Cars turn by pivoting the front wheels, which cause the car to pivot around the rear axle. Therefore the path will be continuously varying and a linear prediction will necessarily produce an incorrect value. If the change in the system is small enough relative to $\\Delta t$ this can often produce adequate results, but that will rarely be the case with the nonlinear Kalman filters we will be studying in subsequent chapters. Another problem is that even trivial systems produce differential equations for which finding closed form solutions is difficult or impossible. \n", + "This works for linear ordinary differential equations (ODEs), but does not work well for nonlinear equations. For example, consider trying to predict the position of a rapidly turning car. Cars turn by pivoting the front wheels, which cause the car to pivot around the rear axle. Therefore the path will be continuously varying and a linear prediction will necessarily produce an incorrect value. If the change in the system is small enough relative to $\\Delta t$ this can often produce adequate results, but that will rarely be the case with the nonlinear Kalman filters we will be studying in subsequent chapters. Another problem is that even trivial systems produce differential equations for which finding closed form solutions is difficult or impossible. 
\n", "\n", "For these reasons we need to know how to numerically integrate differential equations. This can be a vast topic, and SciPy provides integration routines such as `scipy.integrate.ode`. These routines are robust, but " ] @@ -2157,7 +2016,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Iterative Least Squares for Sensor Fusion" + "## Iterative Least Squares for Sensor Fusion (Optional)" ] }, { From 2f3894cc735030080a3a519f71377a8dc2731540 Mon Sep 17 00:00:00 2001 From: Roger Labbe Date: Wed, 19 Aug 2015 17:51:12 -0700 Subject: [PATCH 6/7] Added space after ## for Notebook 4.0 --- 00-Preface.ipynb | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/00-Preface.ipynb b/00-Preface.ipynb index ca85949..2357799 100644 --- a/00-Preface.ipynb +++ b/00-Preface.ipynb @@ -416,7 +416,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "##Downloading the book" + "## Downloading the book" ] }, { @@ -446,7 +446,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "##Installation and Software Requirements" + "## Installation and Software Requirements" ] }, { @@ -472,7 +472,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "##My Libraries and Modules" + "## My Libraries and Modules" ] }, { @@ -497,7 +497,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "##Thoughts on Python and Coding Math" + "## Thoughts on Python and Coding Math" ] }, { @@ -519,7 +519,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "##License" + "## License" ] }, { @@ -551,7 +551,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "##Resources" + "## Resources" ] }, { From 570b7862787bf35fb2c9fc94b2d9581b6628f012 Mon Sep 17 00:00:00 2001 From: Roger Labbe Date: Wed, 19 Aug 2015 17:59:11 -0700 Subject: [PATCH 7/7] Fixed ## headings for IPython 4.0 changes. 
--- 08-Designing-Kalman-Filters.ipynb | 16 +++++++++++----- 12-Particle-Filters.ipynb | 2 +- 2 files changed, 12 insertions(+), 6 deletions(-) diff --git a/08-Designing-Kalman-Filters.ipynb b/08-Designing-Kalman-Filters.ipynb index b7e40d4..e2e9985 100644 --- a/08-Designing-Kalman-Filters.ipynb +++ b/08-Designing-Kalman-Filters.ipynb @@ -1985,7 +1985,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "##Evaluating Filter Performance\n", + "## Evaluating Filter Performance\n", "\n", "It is easy to design a Kalman filter for a simulated situation. You know how much noise you are injecting in your process model, so you specify $\\mathbf{Q}$ to have the same value. You also know how much noise is in the measurement simulation, so the measurement noise matrix $\\mathbf{R}$ is equally trivial to define. \n", "\n", @@ -2195,7 +2195,8 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "###NIS" + "### NIS\n", + "todo" ] }, { @@ -2923,6 +2924,13 @@ "The code is fairly straightforward. The `update()` method optionally takes R as an argument, and I chose to do that rather than alter `KalmanFilter.R`, mostly to show that it is possible. Either way is fine. I modified `KalmanFilter.H` on each update depending on whether there are 1 or 2 measurements available. The only other difficulty was storing the wheel and PS measurements in two different arrays because there are a different number of measurements for each. " ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Control Inputs\n" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -3409,8 +3417,6 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "**author's note - I originally had ball tracking code in 2 different places in the book. One has been copied here, so now we have 2 sections on ball tracking. I need to edit this into one section, obviously. Sorry for the duplication.**\n", - "\n", "We are now ready to design a practical Kalman filter application. 
For this problem we assume that we are tracking a ball traveling through the Earth's atmosphere. The path of the ball is influenced by wind, drag, and the rotation of the ball. We will assume that our sensor is a camera; code that we will not implement will perform some type of image processing to detect the position of the ball. This is typically called *blob detection* in computer vision. However, image processing code is not perfect; in any given frame it is possible to either detect no blob or to detect spurious blobs that do not correspond to the ball. Finally, we will not assume that we know the starting position, angle, or rotation of the ball; the tracking code will have to initiate tracking based on the measurements that are provided. The main simplification that we are making here is a 2D world; we assume that the ball is always traveling orthogonal to the plane of the camera's sensor. We have to make that simplification at this point because we have not yet discussed how we might extract 3D information from a camera, which necessarily provides only 2D data. " ] }, @@ -3829,7 +3835,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "##References\n", + "## References\n", "\n", "[1] Bar-Shalom, Yaakov, et al. *Estimation with Applications to Tracking and Navigation.* John Wiley & Sons, 2001." ] diff --git a/12-Particle-Filters.ipynb b/12-Particle-Filters.ipynb index 8058341..bcad700 100644 --- a/12-Particle-Filters.ipynb +++ b/12-Particle-Filters.ipynb @@ -1094,7 +1094,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "##Importance Sampling\n", + "## Importance Sampling\n", "\n", "In the filter above I hand waved a difficulty away. There is some probability distribution that describes the position and movement of our robot. This might be impossible to integrate analytically, so we want to draw a sample of particles from that distribution and compute the integral using MC methods. \n", "\n",
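The importance-sampling idea in the patched cell above can be sketched in a few lines. This is a generic illustration, not the robot example from the chapter; the choice of target and proposal distributions is mine. Samples drawn from an easy proposal $q$ are reweighted by $p/q$ so that their weighted average estimates an expectation under the hard-to-sample target $p$.

```python
import math
import random

random.seed(1)

def norm_pdf(x):
    """Standard normal density -- the target distribution p."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

# Target p = N(0, 1); proposal q = Uniform(-5, 5), so q(x) = 1/10.
# Estimate E_p[x^2] (exactly 1.0) with self-normalized importance
# weights w = p(x) / q(x).
n = 200_000
xs = [random.uniform(-5.0, 5.0) for _ in range(n)]
ws = [norm_pdf(x) / 0.1 for x in xs]
estimate = sum(w * x * x for w, x in zip(ws, xs)) / sum(ws)
```

Self-normalizing (dividing by the sum of weights) means the densities only need to be known up to a constant, which is exactly the situation in the particle filter, where the posterior's normalizing constant is unknown.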