Cleaned up a lot of duplicate text.

This commit is contained in:
Roger Labbe 2015-08-15 15:56:45 -07:00
parent 44552eea48
commit 5d13589b8d


@ -276,116 +276,11 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"** author's note:** *the ordering of material in this chapter is questionable. I delve into solving ODEs before discussing the basic Kalman equations. If you are reading this while it is being worked on (so long as this notice exists), you may find it easier to skip around a bit until I organize it better.*\n",
"\n",
"\n",
"If you've gotten this far I hope that you are thinking that the Kalman filter's fearsome reputation is somewhat undeserved. Sure, I hand waved some equations away, but I hope implementation has been fairly straightforward for you. The underlying concept is quite straightforward - take two measurements, or a measurement and a prediction, and choose the output to be somewhere between the two. If you believe the measurement more your guess will be closer to the measurement, and if you believe the prediction is more accurate your guess will lie closer it it. That's not rocket science (little joke - it is exactly this math that got Apollo to the moon and back!). \n",
"\n",
"Well, to be honest I have been choosing my problems carefully. For any arbitrary problem finding some of the matrices that we need to feed into the Kalman filter equations can be quite difficult. I haven't been *too tricky*, though. Equations like Newton's equations of motion can be trivially computed for Kalman filter applications, and they make up the bulk of the kind of problems that we want to solve. If you are a hobbyist, you can safely pass by this chapter for now, and perhaps forever. Some of the later chapters will assume the material in this chapter, but much of the work will still be accessible to you. \n",
"To be honest I have been choosing my problems carefully. For any arbitrary problem finding some of the matrices that we need to feed into the Kalman filter equations can be quite difficult. I haven't been *too tricky*, though. Equations like Newton's equations of motion can be trivially computed for Kalman filter applications, and they make up the bulk of the kind of problems that we want to solve. \n",
"\n",
"But, I urge everyone to at least read the first section, and to skim the rest. It is not much harder than what you have done - the difficulty comes in finding closed form expressions for specific problems, not understanding the math in this chapter. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Bayesian Probability"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The title of this book is *Kalman and Bayesian Filters in Python* but to date I have not touched on the Bayesian aspect much. There was enough going on in the earlier chapters that adding this form of reasoning about filters could be a distraction rather than a help. I now wish to take some time to explain what Bayesian probability is and how a Kalman filter is in fact a Bayesian filter. This is not a diversion. First of all, a lot of the Kalman filter literature uses this formulation when talking about filters, so you will need to understand what they are talking about. Second, this math plays a strong role in filtering design once we move past the Kalman filter. \n",
"\n",
"To do so we will go back to our first tracking problem - tracking a dog in a hallway. Recall the update step - we believed with some degree of precision that the dog was at position 7 (for example), and then receive a measurement that the dog is at position 7.5. We want to incorporate that measurement into our belief. In the *Discrete Bayes* chapter we used histograms to denote our estimates at each hallway position, and in the *One Dimensional Kalman Filters* we used Gaussians. Both are method of using *Bayesian* probability.\n",
"\n",
"Briefly, *Bayesian* probability is a branch of math that lets us evaluate a hypothesis or new data point given some uncertain information about the past. For example, suppose you are driving down your neighborhood street and see one of your neighbors at their door, using a key to let themselves in. Three doors down you see two people in masks breaking a window of another house. What might you conclude?\n",
"\n",
"It is likely that you would reason that in the first case your neighbors were getting home and unlocking their door to get inside. In the second case you at least strongly suspect a robbery is in progress. In the first case you would proceed on, and in the second case you'd probably call the police.\n",
"\n",
"Of course, things are not always what they appear. Perhaps unbeknownst to you your neighbor sold their house that morning, and they were now breaking in to steal the new owner's belongings. In the second case, perhaps the owners of the house were at a costume event at the next house, they had a medical emergency with their child, realized they lost their keys, and were breaking into their own house to get the much needed medication. Those are both *unlikely* events, but possible. Adding a few additional pieces of information would allow you to determine the true state of affairs in all but the most complicated situations.\n",
"\n",
"These are instances of *Bayesian* reasoning. We take knowledge from the past and integrate in new information. You know that your neighbor owned their house yesterday, so it is still highly likely that they still own it today. You know that owners of houses normally have keys to the front door, and that the normal mode of entrance into your own house is not breaking windows, so the second case is *likely* to be a breaking and entering. The reasoning is not ironclad as shown by the alternative explanations, but it is likely."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Bayes' theorem"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"*Bayes' theorem* mathematically formalizes the above reasoning. It is written as\n",
"\n",
"$$P(A|B) = \\frac{P(B | A)\\, P(A)}{P(B)}\\cdot$$\n",
"\n",
"\n",
"Before we do some computations, let's review what the terms mean. P(A) is called the *prior probability* of the event A, and is often shortened to the *prior*. What is the prior? It is the probability of A being true *before* we incorporate new evidence. In our dog tracking problem above, the prior is the probability we assign to our belief that the dog is positioned at 7 before we make the measurement of 7.5. It is important to master this terminology if you expect to read a lot of the literature.\n",
"\n",
"$P(A|B)$ is the *conditional probability* that A is true given that B is true. For example, if it is true that your neighbor still owns their house, then it will be very likely that they are not breaking into their house. In Bayesian probability this is called the *posterior*, and it denotes our new belief after incorporating the measurement/knowledge of B. For our dog tracking problem the posterior is the probability given to the estimated position after incorporating the measurement 7.5. For the neighbor problem the posterior would be the probability of a break in after you find out that your neighbor sold their home last week.\n",
"\n",
"What math did we use for the dog tracking problem? Recall that we used this equation to compute the new mean and probability distribution\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"N(estimate) * N(measurement) &= \\\\\n",
"N(\\mu_1, \\sigma_1^2)*N(\\mu_2, \\sigma_2^2) &= N(\\frac{\\sigma_1^2 \\mu_2 + \\sigma_2^2 \\mu_1}{\\sigma_1^2 + \\sigma_2^2},\\frac{1}{\\frac{1}{\\sigma_1^2} + \\frac{1}{\\sigma_2^2}}) \\cdot\n",
"\\end{aligned}\n",
"$$ \n",
"\n",
"\n",
"Here $N(\\mu_1, \\sigma_1^2)$ is the old estimated position, so $\\sigma_1$ is an indication of our *prior* probability. $N(\\mu_2, \\sigma_2^2)$ is the mean and variance of our measurement, and so the result can be thought of as the new position and probability distribution after incorporating the new measurement. In other words, our *posterior distribution* is \n",
"\n",
"$$\\frac{1}{{\\sigma_{estimate}}^2} + \\frac{1}{{\\sigma_{measurement}}^2}$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This is still a little hard to compare to Bayes' equation because we are dealing with probability distributions rather than probabilities. So let's cast our minds back to the discrete Bayes chapter where we computed the probability that our dog was at any given position in the hallway. It looked like this:\n",
"\n",
"```python\n",
"def update(pos_belief, measure, p_hit, p_miss):\n",
" for i in range(len(hallway)):\n",
" if hallway[i] == measure:\n",
" pos_belief[i] *= p_hit\n",
" else:\n",
" pos_belief[i] *= p_miss\n",
"\n",
" pos_belief /= sum(pos_belief)\n",
"```\n",
"\n",
"Let's rewrite this using our newly learned terminology.\n",
"\n",
"```python\n",
"def update(prior, measure, prob_hit, prob_miss):\n",
" posterior = np.zeros(len(prior))\n",
" for i in range(len(hallway)):\n",
" if hallway[i] == measure:\n",
" posterior[i] = prior[i] * p_hit\n",
" else:\n",
" posterior[i] = prior[i] * p_miss\n",
"\n",
" return posterior / sum(posterior)\n",
"```\n",
"\n",
"\n",
"So what is this doing? It's multiplying the old belief that the dog is at position *i* (prior probability) with the probability that the measurement is correct for that position, and then dividing by the total probability for that new event.\n",
"\n",
"Now let's look at Bayes' equation again.\n",
"\n",
"$$P(A|B) = \\frac{P(B | A)\\, P(A)}{P(B)}\\cdot$$\n",
"\n",
"It's the same thing being calculated by the code. Multiply the prior ($P(A)$) by the probability of the measurement at each position ($P(B|A)$) and divide by the total probability for the event ($P(B)$).\n",
"\n",
"In other words the first half of the Discrete Bayes chapter developed Bayes' equation from a thought experiment. I could have presented Bayes' equation and then given you the Python routine above to implement it, but chances are you would not have understood *why* Bayes' equation works. Presenting the equation first is the normal approach of Kalman filtering texts, and I always found it extremely nonintuitive. "
"I have strived to illustrate concepts with code and reasoning, not math. But there are topics that do require more mathematics than I have used so far. In this chapter I will give you the math behind the topics that we have learned so far, and introduce the math that you will need to understand the topics in the rest of the book. Many topics are optional."
]
},
{
@ -699,8 +594,7 @@
"$$\n",
"\\begin{aligned}\n",
" \\mathbf{v} &= \\frac{d \\mathbf{x}}{d t}\\\\ \n",
" \\quad \\mathbf{a} &= \\frac{d \\mathbf{v}}{d t}\\\\\n",
" &= \\frac{d^2 \\mathbf{x}}{d t^2} \\,\\!\n",
" \\quad \\mathbf{a} &= \\frac{d \\mathbf{v}}{d t} = \\frac{d^2 \\mathbf{x}}{d t^2} \\,\\!\n",
"\\end{aligned}\n",
" $$\n",
" \n",
@ -1235,44 +1129,6 @@
"If you practice this a bit you will become adept at it. Isolate the highest term, define a new variable and its derivatives, and then substitute."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**ORPHAN TEXT**\n",
"\n",
"I'll address the computation of $\\mathbf{Q}$ in the next paragraph. For now you can see that as time passes our uncertainty in velocity is growing slowly. It is a bit harder to see, but if you compare this graph to the previous one the uncertainty in position also increased.\n",
"\n",
"We have not given the math for computing the elements of $\\mathbf{Q}$ yet, but if you suspect the math is sometimes difficult you would be correct. One of the problems is that we are usually modeling a *continuous* system - the behavior of the system is changing at every instant, but the Kalman filter is *discrete*. That means that we have to somehow convert the continuous noise of the system into a discrete value, which usually involves calculus. There are other difficulties I will not mention now.\n",
"\n",
"However, for the class of problems we are solving in this chapter (*discretized continuous-time kinematic filters*) we can directly compute the state equations for moving objects by using Newton's equations.\n",
"\n",
"For these kinds of problems we can rely on precomputed forms for $\\mathbf{Q}$. We will learn how to derive these matrices in the next chapter. For now I present them without proof. If we assume that for each time period the acceleration due to process noise is constant and uncorrelated, we get the following.\n",
"\n",
"For constant velocity the form is\n",
"\n",
"$$\\begin{bmatrix}\n",
"\\frac{1}{4}{\\Delta t}^4 & \\frac{1}{2}{\\Delta t}^3 \\\\\n",
"\\frac{1}{2}{\\Delta t}^3 & \\Delta t^2\n",
"\\end{bmatrix}\\sigma^2\n",
"$$\n",
"\n",
"and for constant acceleration we have\n",
"\n",
"$$\\begin{bmatrix}\n",
"\\frac{1}{4}{\\Delta t}^4 & \\frac{1}{2}{\\Delta t}^3 & \\frac{1}{2}{\\Delta t}^2 \\\\\n",
"\\frac{1}{2}{\\Delta t}^3 & {\\Delta t}^2 & \\Delta t \\\\\n",
"\\frac{1}{2}{\\Delta t}^2 & \\Delta t & 1\n",
"\\end{bmatrix} \\sigma^2\n",
"$$\n",
"\n",
"It general it is not true that acceleration will be constant and uncorrelated, but this is still a useful approximation for moderate time period, and will suffice for this chapter. Fortunately you can get a long way with approximations and simulation. Let's think about what these matrices are implying. We are trying to model the effects of *process noise*, such as the wind buffeting the flight of a thrown ball. Variations in wind will cause changes in acceleration, and so the effect on the acceleration is large. However, the effects on velocity and position are proportionally smaller. In the matrices, the acceleration term is in the lower right, and this is the largest value. **A good rule of thumb is to set $\\sigma$ somewhere from $\\frac{1}{2}\\Delta a$ to $\\Delta a$, where $\\Delta a$ is the maximum amount that the acceleration will change between sample periods**. In practice we pick a number, run simulations on data, and choose a value that works well. \n",
"\n",
"The filtered result will not be optimal, but in my opinion the promise of optimal results from Kalman filters is mostly wishful thinking. Consider, for example, tracking a car. In that problem the process noise would include things like potholes, wind gusts, changing drag due to turning, rolling down windows, and many more factors. We cannot realistically model that analytically, and so in practice we work out a simplified model, compute $\\mathbf{Q}$ based on that simplified model, and then add *a bit more* to $\\small\\mathbf{Q}$ in hopes of taking the incalculable factors into account. Then we use a lot of simulations and trial runs to see if the filter behaves well; if it doesn't we adjust $\\small\\mathbf{Q}$ until the filter performs well. In this chapter we will focus on forming an intuitive understanding on how adjusting $\\small\\mathbf{Q}$ affects the output of the filter. In the Kalman Filter Math chapter we will discuss the analytic computation of $\\small\\mathbf{Q}$, and also provide code that will compute it for you.\n",
"\n",
"For now we will import the code from the `FilterPy` module, where it is already implemented. I will import it and call help on it so you can see the documentation for it."
]
},
{
"cell_type": "markdown",
"metadata": {},
@ -1623,7 +1479,9 @@
"source": [
"We cannot say that this model is more or less correct than the continuous model - both are approximations to what is happening to the actual object. Only experience and experiments can guide you to the appropriate model. In practice you will usually find that either model provides reasonable results, but typically one will perform better than the other.\n",
"\n",
"The advantage of the second model is that we can model the noise in terms of $\\sigma^2$ which we can describe in terms of the motion and the amount of error we expect. The first model requires us to specify the spectral density, which is not very intuitive, but it handles varying time samples much more easily since the noise is integrated across the time period. However, these are not fixed rules - use whichever model (or a model of your own devising) based on testing how the filter performs and/or your knowledge of the behavior of the physical model."
"The advantage of the second model is that we can model the noise in terms of $\\sigma^2$ which we can describe in terms of the motion and the amount of error we expect. The first model requires us to specify the spectral density, which is not very intuitive, but it handles varying time samples much more easily since the noise is integrated across the time period. However, these are not fixed rules - use whichever model (or a model of your own devising) based on testing how the filter performs and/or your knowledge of the behavior of the physical model.\n",
"\n",
"A good rule of thumb is to set $\\sigma$ somewhere from $\\frac{1}{2}\\Delta a$ to $\\Delta a$, where $\\Delta a$ is the maximum amount that the acceleration will change between sample periods. In practice we pick a number, run simulations on data, and choose a value that works well."
]
},
{
@ -1790,9 +1648,10 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"** author's note: this is just notes to a section. If you need to know this in depth, \n",
"*Computational Physics in Python * by Dr. Eric Ayars is excellent, and available here.\n",
"http://phys.csuchico.edu/ayars/312/Handouts/comp-phys-python.pdf **\n",
"> ** author's note: This topic requires multiple books to fully cover it. If you need to know this in depth, \n",
"*Computational Physics in Python * by Dr. Eric Ayars is excellent, and available for free here.\n",
"\n",
"> http://phys.csuchico.edu/ayars/312/Handouts/comp-phys-python.pdf **\n",
"\n",
"So far in this book we have been working with systems that can be expressed with simple linear differential equations such as\n",
"\n",
@ -1820,7 +1679,7 @@
"$$\\begin{aligned}\\bar{x} &= x + v\\Delta t \\\\\n",
"\\bar{\\dot{x}} &= \\dot{x}\\end{aligned}$$.\n",
"\n",
"This works for linear ordinary differential equations (ODEs), but does not work (well) for nonlinear equations. For example, consider trying to predict the position of a rapidly turning car. Cars turn by pivoting the front wheels, which cause the car to pivot around the rear axle. Therefore the path will be continuously varying and a linear prediction will necessarily produce an incorrect value. If the change in the system is small enough relative to $\\Delta t$ this can often produce adequate results, but that will rarely be the case with the nonlinear Kalman filters we will be studying in subsequent chapters. Another problem is that even trivial systems produce differential equations for which finding closed form solutions is difficult or impossible. \n",
"This works for linear ordinary differential equations (ODEs), but does not work well for nonlinear equations. For example, consider trying to predict the position of a rapidly turning car. Cars turn by pivoting the front wheels, which cause the car to pivot around the rear axle. Therefore the path will be continuously varying and a linear prediction will necessarily produce an incorrect value. If the change in the system is small enough relative to $\\Delta t$ this can often produce adequate results, but that will rarely be the case with the nonlinear Kalman filters we will be studying in subsequent chapters. Another problem is that even trivial systems produce differential equations for which finding closed form solutions is difficult or impossible. \n",
"\n",
"For these reasons we need to know how to numerically integrate differential equations. This can be a vast topic, and SciPy provides integration routines such as `scipy.integrate.ode`. These routines are robust, but "
]
@ -2157,7 +2016,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Iterative Least Squares for Sensor Fusion"
"## Iterative Least Squares for Sensor Fusion (Optional)"
]
},
{