|
|
|
|
@@ -1,7 +1,7 @@
|
|
|
|
|
{
|
|
|
|
|
"metadata": {
|
|
|
|
|
"name": "",
|
|
|
|
|
"signature": "sha256:85df0fc94fc4635a6e5ff492b47171d902e6302afab1b362a6d0272e085fd392"
|
|
|
|
|
"signature": "sha256:dedbd7f55073b104a3b3626e73108b6a60611b932a173ce508666a264357c866"
|
|
|
|
|
},
|
|
|
|
|
"nbformat": 3,
|
|
|
|
|
"nbformat_minor": 0,
|
|
|
|
|
@@ -278,7 +278,114 @@
|
|
|
|
|
"\n",
|
|
|
|
|
"Well, to be honest I have been choosing my problems carefully. For any arbitrary problem finding some of the matrices that we need to feed into the Kalman filter equations can be quite difficult. I haven't been *too tricky*, though. Equations like Newton's equations of motion can be trivially computed for Kalman filter applications, and they make up the bulk of the kind of problems that we want to solve. If you are a hobbyist, you can safely pass by this chapter for now, and perhaps forever. Some of the later chapters will assume the material in this chapter, but much of the work will still be accessible to you. \n",
|
|
|
|
|
"\n",
|
|
|
|
|
"But, I urge everyone to at least read the first section, and to skim the rest. It is not much harder than what you have done - the difficulty comes in finding closed form expressions for specific problems, not understanding the math in this chapter. \n"
|
|
|
|
|
"But, I urge everyone to at least read the first section, and to skim the rest. It is not much harder than what you have done - the difficulty comes in finding closed form expressions for specific problems, not understanding the math in this chapter. "
|
|
|
|
|
]
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "heading",
|
|
|
|
|
"level": 2,
|
|
|
|
|
"metadata": {},
|
|
|
|
|
"source": [
|
|
|
|
|
"Bayesian Probability"
|
|
|
|
|
]
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
"metadata": {},
|
|
|
|
|
"source": [
|
|
|
|
|
"The title of this book is *Kalman and Bayesian Filters in Python* but to date I have not touched on the Bayesian aspect much. There was enough going on in the earlier chapters that adding this form of reasoning about filters could be a distraction rather than a help. I now wish to take some time to explain what Bayesian probability is and how a Kalman filter is in fact a Bayesian filter. This is not just a diversion. First of all, a lot of the Kalman filter literature uses this formulation when talking about filters, so you will need to understand what they are talking about. Second, this math plays a strong role in filter design once we move past the Kalman filter. \n",
|
|
|
|
|
"\n",
|
|
|
|
|
"To do so we will go back to our first tracking problem - tracking a dog in a hallway. Recall the update step - we believed with some degree of precision that the dog was at position 7 (for example), and then received a measurement that the dog is at position 7.5. We want to incorporate that measurement into our belief. In the *Discrete Bayes* chapter we used histograms to denote our estimates at each hallway position, and in the \n",
|
|
|
|
|
"*One Dimensional Kalman Filters* chapter we used Gaussians. Both are methods of using *Bayesian* probability.\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"Briefly, *Bayesian* probability is a branch of math that lets us evaluate a hypothesis or new data point given some uncertain information about the past. For example, suppose you are driving down your neighborhood street and see one of your neighbors at their door, using a key to let themselves in. Three doors down you see two people in masks breaking a window of another house. What might you conclude?\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"It is likely that you would reason that in the first case your neighbors were getting home and unlocking their door to get inside. In the second case you at least strongly suspect a robbery is in progress. In the first case you would just proceed on, and in the second case you'd probably call the police.\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"Of course, things are not always what they appear. Perhaps unbeknownst to you your neighbor sold their house that morning, and they were now breaking in to steal the new owner's belongings. In the second case, perhaps the owners of the house were at a costume event at the next house, they had a medical emergency with their child, realized they lost their keys, and were breaking into their own house to get the much needed medication. Those are both *unlikely* events, but possible. Adding a few additional pieces of information would allow you to determine the true state of affairs in all but the most complicated situations.\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"These are instances of *Bayesian* reasoning. We take knowledge from the past and integrate new information into it. You know that your neighbor owned their house yesterday, so it is still highly likely that they own it today. You know that owners of houses normally have keys to the front door, and that the normal mode of entrance into your own house is not breaking windows, so the second case is *likely* to be a breaking and entering. The reasoning is not ironclad, as shown by the alternative explanations, but it is likely."
|
|
|
|
|
]
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "heading",
|
|
|
|
|
"level": 3,
|
|
|
|
|
"metadata": {},
|
|
|
|
|
"source": [
|
|
|
|
|
"Bayes' theorem"
|
|
|
|
|
]
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
"metadata": {},
|
|
|
|
|
"source": [
|
|
|
|
|
"*Bayes' theorem* mathematically formalizes the above reasoning. It is written as\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"$$P(A|B) = \\frac{P(B | A)\\, P(A)}{P(B)}\\cdot$$\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"Before we do some computations, let's review what the terms mean. $P(A)$ is called the *prior probability* of the event $A$, and is often just shortened to the *prior*. What is the prior? It is just the probability of $A$ being true *before* we incorporate new evidence. In our dog tracking problem above, the prior is the probability we assign to our belief that the dog is positioned at 7 before we make the measurement of 7.5. It is important to master this terminology if you expect to read a lot of the literature.\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"$P(A|B)$ is the *conditional probability* that $A$ is true given that $B$ is true. For example, if it is true that your neighbor still owns their house, then it is very likely that they are not breaking into it. In Bayesian probability this is called the *posterior*, and it denotes our new belief after incorporating the measurement/knowledge of $B$. For our dog tracking problem the posterior is the probability given to the estimated position after incorporating the measurement 7.5. For the neighbor problem the posterior would be the probability of a break in after you find out that your neighbor sold their home.\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"What math did we use for the dog tracking problem? Recall that we used this equation to compute the new mean and probability distribution\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"$$\n",
|
|
|
|
|
"\\begin{aligned}\n",
|
|
|
|
|
"N(estimate) * N(measurement) &= \\\\\n",
|
|
|
|
|
"N(\\mu_1, \\sigma_1^2)*N(\\mu_2, \\sigma_2^2) &= N(\\frac{\\sigma_1^2 \\mu_2 + \\sigma_2^2 \\mu_1}{\\sigma_1^2 + \\sigma_2^2},\\frac{1}{\\frac{1}{\\sigma_1^2} + \\frac{1}{\\sigma_2^2}}) \\cdot\n",
|
|
|
|
|
"\\end{aligned}\n",
|
|
|
|
|
"$$ \n",
|
|
|
|
|
"\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"Here $N(\\mu_1, \\sigma_1^2)$ is the old estimated position, so $\\sigma_1$ is an indication of our *prior* probability. $N(\\mu_2, \\sigma_2^2)$ is the mean and variance of our measurement, and so the result can be thought of as the new position and probability distribution after incorporating the new measurement. In other words, our *posterior distribution* is \n",
|
|
|
|
|
"\n",
|
|
|
|
|
"$$N(\\frac{{\\sigma_{estimate}}^2 \\mu_{measurement} + {\\sigma_{measurement}}^2 \\mu_{estimate}}{{\\sigma_{estimate}}^2 + {\\sigma_{measurement}}^2}, \\frac{1}{\\frac{1}{{\\sigma_{estimate}}^2} + \\frac{1}{{\\sigma_{measurement}}^2}})$$"
|
|
|
|
|
]
|
|
|
|
|
},
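The product formula above is easy to check numerically. Here is a minimal sketch using the chapter's prior position of 7 and measurement of 7.5; the variances 0.5 and 0.3 are illustrative assumptions, not values from the text.

```python
# Multiply two Gaussians N(mu1, var1) * N(mu2, var2), as in the
# update-step formula above. Prior: N(7, 0.5). Measurement: N(7.5, 0.3).

def gaussian_product(mu1, var1, mu2, var2):
    """Return the (mean, variance) of the product of two Gaussians."""
    mu = (var1 * mu2 + var2 * mu1) / (var1 + var2)
    var = 1.0 / (1.0 / var1 + 1.0 / var2)
    return mu, var

mu, var = gaussian_product(7.0, 0.5, 7.5, 0.3)
print(mu, var)   # mean lies between prior and measurement;
                 # variance is smaller than either input
```

Note that the posterior mean falls between the prior and the measurement, weighted toward the less uncertain of the two, and the posterior variance is smaller than both inputs - incorporating a measurement always reduces our uncertainty.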
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
"metadata": {},
|
|
|
|
|
"source": [
|
|
|
|
|
"This is still a little hard to compare to Bayes' equation because we are dealing with probability distributions rather than probabilities. So let's cast our minds back to the discrete Bayes chapter where we computed the probability that our dog was at any given position in the hallway. It looked like this:\n",
|
|
|
|
|
"\n",
|
|
|
|
|
" def normalize(p):\n",
|
|
|
|
|
" s = sum(p)\n",
|
|
|
|
|
" for i in range (len(p)):\n",
|
|
|
|
|
" p[i] = p[i] / s\n",
|
|
|
|
|
"\n",
|
|
|
|
|
" def update(pos, measure, p_hit, p_miss):\n",
|
|
|
|
|
" q = np.array(pos, dtype=float)\n",
|
|
|
|
|
" for i in range(len(hallway)):\n",
|
|
|
|
|
" if hallway[i] == measure:\n",
|
|
|
|
|
" q[i] = pos[i] * p_hit\n",
|
|
|
|
|
" else:\n",
|
|
|
|
|
" q[i] = pos[i] * p_miss\n",
|
|
|
|
|
" normalize(q)\n",
|
|
|
|
|
" return q\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"Let's rewrite this using our newly learned terminology.\n",
|
|
|
|
|
"\n",
|
|
|
|
|
" def update(prior_probability, measure, prob_hit, prob_miss):\n",
|
|
|
|
|
" posterior_probability = np.array(prior_probability, dtype=float)\n",
|
|
|
|
|
" for i in range(len(hallway)):\n",
|
|
|
|
|
" if hallway[i] == measure:\n",
|
|
|
|
|
"                posterior_probability[i] = prior_probability[i] * prob_hit\n",
|
|
|
|
|
" else:\n",
|
|
|
|
|
"                posterior_probability[i] = prior_probability[i] * prob_miss\n",
|
|
|
|
|
"\n",
|
|
|
|
|
" return posterior_probability / sum(posterior_probability)\n",
|
|
|
|
|
"\n",
|
|
|
|
|
" \n",
|
|
|
|
|
"So what is this doing? It's multiplying the old belief that the dog is at position *i* (prior probability) with the probability that the measurement is correct for that position, and then dividing by the total probability for that new event.\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"Now let's look at Bayes' equation again.\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"$$P(A|B) = \\frac{P(B | A)\\, P(A)}{P(B)}\\cdot$$\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"It's the same thing being calculated by the code. Multiply the prior ($P(A)$) by the probability of the measurement at each position ($P(B|A)$) and divide by the total probability for the event ($P(B)$).\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"In other words the first half of the Discrete Bayes chapter developed Bayes' equation from a thought experiment. I could have just presented Bayes' equation and then given you the Python routine above to implement it, but chances are you would not have understood *why* Bayes' equation works. Presenting the equation first is the normal approach of Kalman filtering texts, and I always found it extremely nonintuitive. \n"
|
|
|
|
|
]
|
|
|
|
|
},
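To see the renamed routine run end to end, here is a self-contained sketch; the hallway layout, the uniform prior, and the hit/miss probabilities are illustrative assumptions, not values from the text.

```python
import numpy as np

# A runnable version of update() above: 1 marks a door, 0 a wall.
hallway = np.array([1, 1, 0, 0, 0, 0, 0, 0, 1, 0])

def update(prior_probability, measure, prob_hit, prob_miss):
    posterior = np.array(prior_probability, dtype=float)
    for i in range(len(hallway)):
        if hallway[i] == measure:
            posterior[i] = prior_probability[i] * prob_hit   # P(B|A) * P(A)
        else:
            posterior[i] = prior_probability[i] * prob_miss
    return posterior / posterior.sum()                       # divide by P(B)

prior = np.ones(len(hallway)) / len(hallway)   # uniform prior: no idea where the dog is
posterior = update(prior, measure=1, prob_hit=0.9, prob_miss=0.1)
print(posterior)
```

After sensing a door, the probability mass concentrates on the door positions while the result still sums to one, exactly the normalization that dividing by $P(B)$ performs in Bayes' equation.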
|
|
|
|
|
{
|
|
|
|
|
@@ -295,13 +402,13 @@
|
|
|
|
|
"source": [
|
|
|
|
|
"We need to start by understanding the underlying equations and assumptions that the Kalman filter uses. We are trying to model real world phenomena, so what do we have to consider?\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"First, each physical system has a process. For example, a car traveling at a certain velocity goes so far in a fixed amount of time, and it's velocity varies as a function of its acceleration. We describe that behavior with the well known Newtonian equations we learned in high school.\n",
|
|
|
|
|
"First, each physical system has a process. For example, a car traveling at a certain velocity goes so far in a fixed amount of time, and its velocity varies as a function of its acceleration. We describe that behavior with the well known Newtonian equations we learned in high school.\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"$$\n",
|
|
|
|
|
"\\begin{aligned}\n",
|
|
|
|
|
"v&=at\\\\\n",
|
|
|
|
|
"x &= \\frac{1}{2}at_2 + v_0t + d_0\n",
|
|
|
|
|
"x &= \\frac{1}{2}at^2 + v_0t + d_0\n",
|
|
|
|
|
"\\end{aligned}\n",
|
|
|
|
|
"$$\n",
|
|
|
|
|
"\n",
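As a quick sanity check, the Newtonian equations above can be evaluated directly; the acceleration, initial velocity, and initial position below are illustrative values, not ones used elsewhere in the text.

```python
# The high-school kinematics equations from above, as plain functions.

def position(a, v0, d0, t):
    """x = (1/2)*a*t^2 + v0*t + d0"""
    return 0.5 * a * t**2 + v0 * t + d0

def velocity(a, v0, t):
    """v = a*t + v0 (with an initial velocity v0 included)"""
    return a * t + v0

print(position(a=2.0, v0=3.0, d0=10.0, t=4.0))   # 38.0
print(velocity(a=2.0, v0=3.0, t=4.0))            # 11.0
```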
|
|
|
|
|
@@ -342,9 +449,9 @@
|
|
|
|
|
"\n",
|
|
|
|
|
"$$ f(\\mathbf{x}) = \\mathbf{Fx} + \\mathbf{Bu} + \\mathbf{w}$$\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"And that's it. That is the equation that Kalman set out to solve, and he found a way to compute an optimal solution if we assume certain properties of $w$.\n",
|
|
|
|
|
"And that's it. That is one of the equations that Kalman set out to solve, and he found a way to compute an optimal solution if we assume certain properties of $w$.\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"However, we took advantage of something I left mostly unstated in the last chapter. We were able to provide a definition for $\\mathbf{F}$ because we were able to take advantage of the exact solution that Newtonian equations provide us. However, if you have an engineering background you will realize what a small class of problems that covers.If you don't, I will explain it next, and provide you with several ways to compute $\\mathbf{F}$ for arbitrary systems."
|
|
|
|
|
"However, we took advantage of something I left mostly unstated in the last chapter. We were able to provide a definition for $\\mathbf{F}$ because we were able to take advantage of the exact solution that Newtonian equations provide us. However, if you have an engineering background you will realize what a small class of problems that covers. If you don't, I will explain it next, and provide you with several ways to compute $\\mathbf{F}$ for arbitrary systems."
|
|
|
|
|
]
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
@@ -359,9 +466,27 @@
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
"metadata": {},
|
|
|
|
|
"source": [
|
|
|
|
|
"Modelling dynamic systems is properly the topic of at least one undergraduate course in mathmatics; I took several. However, I can present enough of the theory to allow us to create the system equations for many different Kalman filters, and give you enough background to at least follow the mathematics in the literature. My goal is to get you to the stage where you can read Brown, Zarchan, Bar-Shalom, or any other book and understand them well enough to implement the algorithms. Even without reading those books, you should be able to design many common forms of Filters by following the examples. \n",
|
|
|
|
|
"Modelling dynamic systems is properly the topic of at least one undergraduate course in mathematics. To an extent there is no substitute for a few semesters of ordinary and partial differential equations. If you are a hobbyist, or trying to solve one very specific filtering problem at work, you probably do not have the time and/or inclination to devote a year or more to that education.\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"We model dynamic systems with a set of differential equations. For example, we already presented the Newtonian equation\n",
|
|
|
|
|
"However, I can present enough of the theory to allow us to create the system equations for many different Kalman filters, and give you enough background to at least follow the mathematics in the literature. My goal is to get you to the stage where you can read a Kalman filtering book or paper and understand it well enough to implement the algorithms. The background math is deep, but we end up using a few simple techniques over and over again in practice. \n",
|
|
|
|
|
"\n",
|
|
|
|
|
"Let's lay out the problem and discuss what the solution will be. We model *dynamic systems* with a set of first order *differential equations*. This should not be a surprise as calculus is the math of things that vary. For example, we say that velocity is the derivative of distance with respect to time\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"$$\\mathbf{v}= \\frac{d \\mathbf{x}}{d t} = \\dot{\\mathbf{x}}$$\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"where $\\dot{\\mathbf{x}}$ is the notation for the derivative of $\\mathbf{x}$ with respect to $t$.\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"We need to use these equations for the predict step of the Kalman filter. Given the state of the system at time $t$ we want to predict its state at time $t + \\Delta t$. The Kalman filter matrices do not accept differential equations, so we need a mathematical technique that will find the solution to those equations at each time step. In general it is extremely difficult to find analytic solutions to systems of differential equations, so we will normally use *numerical* techniques to find accurate approximations for these equations."
|
|
|
|
|
]
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
"metadata": {},
|
|
|
|
|
"source": [
|
|
|
|
|
"### Why This is Hard\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"We model dynamic systems with a set of first order differential equations. For example, we already presented the Newtonian equation\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"$$\\mathbf{v}=\\dot{\\mathbf{x}}$$\n",
|
|
|
|
|
"\n",
|
|
|
|
|
@@ -374,14 +499,13 @@
|
|
|
|
|
"if the behaviour of the system depends on time. However, if the system is *time invariant* the equations are of the form\n",
|
|
|
|
|
"$$ f(x) = \\dot{x}$$\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"What does *time invariant* mean? Consider a home stereo. If you input a signal $x$ into it at time $t$, it will output some signal $f(x)$ a moment later. If you instead make the input at a later time the output signal will still be exactly the same, just shifted in time. This is different from, say, an aircraft. If you make a control input to the aircraft at a later time it's behavior will be different because it will have burned additonal fuel (and thus lost weight), drag may be different due to being at a different altitude, and so on.\n",
|
|
|
|
|
"What does *time invariant* mean? Consider a home stereo. If you input a signal $x$ into it at time $t$, it will output some signal $f(x)$. If you instead make the input at a later time $t + \\Delta t$ the output signal will still be exactly the same, just shifted in time. This is different from, say, an aircraft. If you make a control input to the aircraft at a later time its behavior will be different because it will have burned additional fuel (and thus lost weight), drag may be different due to being at a different altitude, and so on.\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"We can solve these equations by integrating each side. The time variant equation is very straightforward. We essentially solved this problem with the Newtonian equations above, but let's be explicit and write it out. Starting with $$\\dot{\\mathbf{x}}=\\mathbf{v}$$ we get the expected\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"$$ \\int \\dot{\\mathbf{x}}\\mathrm{d}t = \\int \\mathbf{v} \\mathrm{d}t\\\\\n",
|
|
|
|
|
"x = vt + x_0$$\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"However, integrating the time invariant equation is not so straightforward. \n",
|
|
|
|
|
"\n",
|
|
|
|
|
"$$ \\dot{x} = f(x) \\\\\n",
|
|
|
|
|
@@ -405,7 +529,7 @@
|
|
|
|
|
"\n",
|
|
|
|
|
"In other words, we need to find the inverse of $F$. This is not at all trivial, and a significant amount of course work in a STEM education is devoted to finding tricky, analytic solutions to this problem, backed by several centuries of research. \n",
|
|
|
|
|
"\n",
|
|
|
|
|
"In the end, however, they are tricks, and many simple forms of $f(x)$ either have no closed form solution, or pose extreme difficulties. Instead, the practicing engineer turns to numerical methods to find a solution to her problems. I would suggest that students would be better served by learning fewer analytic mathematical tricks and instead focusing on learning numerical methods."
|
|
|
|
|
"In the end, however, they are tricks, and many simple forms of $f(x)$ either have no closed form solution, or pose extreme difficulties. Instead, the practicing engineer turns to numerical methods to find a solution to her problems. I would suggest that students would be better served by learning fewer analytic mathematical tricks and instead focusing on learning numerical methods, but that is the topic for another book."
|
|
|
|
|
]
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
@@ -420,7 +544,7 @@
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
"metadata": {},
|
|
|
|
|
"source": [
|
|
|
|
|
"So let me leap over quite a bit of mathematics and present the typical numerical techniques used in Kalman filter design. \n",
|
|
|
|
|
"If you already have the mathematical training in solving partial differential equations you may be able to put it to use; I am not assuming that sort of background. So let me skip over quite a bit of mathematics and present the typical numerical techniques used in Kalman filter design. \n",
|
|
|
|
|
"\n",
|
|
|
|
|
"First, we express the system equations in state-space form (i.e. using linear algebra equations) with\n",
|
|
|
|
|
"\n",
|
|
|
|
|
@@ -430,20 +554,19 @@
|
|
|
|
|
"\n",
|
|
|
|
|
"$$x(t) = \\Phi(t-t_0)x(t_0)$$\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"In other words, we just want to compute the value of $x$ at time $t$ by multipying its previous value by some matrix $\\Phi$.\n",
|
|
|
|
|
"In other words, we just want to compute the value of $x$ at time $t$ by multiplying its previous value by some matrix $\\Phi$. This is not trivial to do because the original equations do not include time. \n",
|
|
|
|
|
"\n",
|
|
|
|
|
"Broadly speaking there are three ways to find $\\Phi$. *Linear Time Invarient Theory*, known as LTI System Theory, gives us a way to find $\\Phi$ using the inverse Laplace transform. You are either nodding your head now, or completely lost. Don't worry, I will not be using the Lapace transform in this book except in this paragraph, as the computation is quite difficult to perform in practice, and there are better techniques we can avail ourselves to. LTI system theory tells us that \n",
|
|
|
|
|
"\n",
|
|
|
|
|
"$$ \\Phi(t) = \\mathcal{L}^{-1}[(s\\mathbf{I} - \\mathbf{F})^{-1}]$$\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"I have no intention of going into this other than to say that the inverse Lapace transform converts a signal into the frequency (time) domain, but finding a solution to the equation above is non-trivial. If you are interested, the Wikipedia article on LTI system theory provides an introduction [1].\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"The second technique is to use a Taylor-series expansion: \n",
|
|
|
|
|
"Broadly speaking there are three ways to find $\\Phi$. The technique most often used with Kalman filters is to use a Taylor-series expansion: \n",
|
|
|
|
|
"\n",
|
|
|
|
|
"$$ \\Phi(t) = e^{\\mathbf{F}t} = \\mathbf{I} + \\mathbf{F}t + \\frac{(\\mathbf{F}t)^2}{2!} + \\frac{(\\mathbf{F}t)^3}{3!} + ... $$\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"This is much easier to compute, and is the typical approach used in Kalman filter design when the filter is reasonably small. If you are wondering where $e$ came from, I again point to wikipedia - this time the matrix exponential article [2]. Here the important point is to recognize the very simple and regular form this equation takes. We will put this form into use in the next chapter, so I will not belabor its use here. \n",
|
|
|
|
|
"This is much easier to compute, and thus is the typical approach used in Kalman filter design when the filter is reasonably small. If you are wondering where $e$ came from, I point you to the Wikipedia article on the matrix exponential [1]. Here the important point is to recognize the very simple and regular form this equation takes. We will put this form into use in the next chapter, so I will not belabor its use here. \n",
|
|
|
|
|
"\n",
|
|
|
|
|
"*Linear Time Invariant Theory*, also known as LTI System Theory, gives us a way to find $\\Phi$ using the inverse Laplace transform. You are either nodding your head now, or completely lost. Don't worry, I will not be using the Laplace transform in this book except in this paragraph, as the computation is quite difficult to perform in practice. LTI system theory tells us that \n",
|
|
|
|
|
"\n",
|
|
|
|
|
"$$ \\Phi(t) = \\mathcal{L}^{-1}[(s\\mathbf{I} - \\mathbf{F})^{-1}]$$\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"I have no intention of going into this other than to say that the inverse Laplace transform converts a signal from the frequency domain back into the time domain, but finding a solution to the equation above is non-trivial. If you are interested, the Wikipedia article on LTI system theory provides an introduction [2].\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"Finally, there are numerical techniques to find $\\Phi$. As filters get larger finding analytical solutions becomes very tedious (though packages like Sympy make it easier). C. F. van Loan [3] has developed a technique that finds both $\\Phi$ and $Q$ numerically.\n",
|
|
|
|
|
"\n",
|
|
|
|
|
@@ -523,6 +646,45 @@
|
|
|
|
|
],
|
|
|
|
|
"prompt_number": 2
|
|
|
|
|
},
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
"metadata": {},
|
|
|
|
|
"source": [
|
|
|
|
|
"### Forming First Order Equations from Higher Order Equations\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"In the sections above I spoke of *first order* differential equations; these are equations with only first derivatives. However, physical systems often require second or higher order equations. Any higher order system of equations can be converted to a first order set of equations by defining extra variables for the derivatives and then substituting. Let's do an example. \n",
|
|
|
|
|
"\n",
|
|
|
|
|
"Given the system $\\ddot{x} - 6\\dot{x} + 9x = t$ find the first order equations.\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"The first step is to isolate the highest order term onto one side of the equation.\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"$$\\ddot{x} = 6\\dot{x} - 9x + t$$\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"We define two new variables:\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"$$ x_1(t) = x \\\\\n",
|
|
|
|
|
"x_2(t) = \\dot{x}\n",
|
|
|
|
|
"$$\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"Now we will substitute these into the original equation and solve, giving us a set of first order equations in terms of these new variables.\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"First, we know that $\\dot{x}_1 = x_2$ and that $\\dot{x}_2 = \\ddot{x}$. Therefore\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"$$\\begin{aligned}\n",
|
|
|
|
|
"\\dot{x}_2 &= \\ddot{x} \\\\\n",
|
|
|
|
|
" &= 6\\dot{x} - 9x + t\\\\\n",
|
|
|
|
|
" &= 6x_2-9x_1 + t\n",
|
|
|
|
|
"\\end{aligned}$$\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"Therefore our first order system of equations is\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"$$\\begin{aligned}\\dot{x}_1 &= x_2 \\\\\n",
|
|
|
|
|
"\\dot{x}_2 &= 6x_2-9x_1 + t\\end{aligned}$$\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"If you practice this a bit you will become adept at it. Just isolate the highest term, define a new variable and its derivatives, and then substitute."
|
|
|
|
|
]
|
|
|
|
|
},
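The worked example above translates directly into code. Here is a minimal sketch of the first order system for $\ddot{x} - 6\dot{x} + 9x = t$ in terms of $x_1 = x$ and $x_2 = \dot{x}$, stepped forward with simple Euler integration; the step size and initial conditions are illustrative assumptions.

```python
def derivatives(t, x1, x2):
    """The first order system derived above:
    x1' = x2,  x2' = 6*x2 - 9*x1 + t."""
    x1_dot = x2
    x2_dot = 6*x2 - 9*x1 + t
    return x1_dot, x2_dot

def euler_step(t, x1, x2, dt):
    """Advance the state one small time step."""
    d1, d2 = derivatives(t, x1, x2)
    return x1 + d1*dt, x2 + d2*dt

# Start at x = 1 with zero velocity and integrate out to t = 1.
x1, x2, t, dt = 1.0, 0.0, 0.0, 0.01
for _ in range(100):
    x1, x2 = euler_step(t, x1, x2, dt)
    t += dt
print(x1, x2)
```

This is exactly the pattern the predict step needs: given the state at time $t$, produce the state at $t + \Delta t$, even though we never found a closed form solution.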
|
|
|
|
|
{
|
|
|
|
|
"cell_type": "heading",
|
|
|
|
|
"level": 2,
|
|
|
|
|
@@ -553,7 +715,7 @@
|
|
|
|
|
"\\end{aligned}\n",
|
|
|
|
|
"$$\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"I will start with the update step, as that is what we started with in the one dimensional Kalman filter case. Our first equation is\n",
|
|
|
|
|
"I will start with the update step, as that is what we started with in the one dimensional Kalman filter case. The first equation is\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"$$\n",
|
|
|
|
|
"\\mathbf{y} = \\mathbf{z} - \\mathbf{H x}\\tag{3}\n",
|
|
|
|
|
@@ -599,25 +761,9 @@
|
|
|
|
|
"$$\n",
|
|
|
|
|
"Unfortunately it is a fair amount of linear algebra to derive this. The derivation can be quite elegant, and I urge you to look it up if you have the mathematical education to follow it. But $\\mathbf{K}$ is just the *Kalman gain* - the ratio of how much measurement vs prediction we should use to create the new estimate. $\\mathbf{R}$ is the *measurement noise*, and $\\mathbf{P}$ is our *uncertainty covariance matrix* from the prediction step.\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"** author's note: the following aside probably belongs elsewhere in the book**\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"> As an aside, most textbooks are more exact with the notation, in Gelb[1] for example, *Pk(+)* is used to denote the uncertainty covariance for the prediction step, and *Pk(-)* for the uncertainty covariance for the update step. Other texts use subscripts with 'k|k-1', superscipt $^-$, and many other variations. As a programmer I find all of that fairly unreadable; I am used to thinking about variables changing state as a program runs, and do not use a different variable name for each new computation. There is no agreed upon format, so each author makes different choices. I find it challenging to switch quickly between books an papers, and so have adopted this admittedly less precise notation. Mathematicians will write scathing emails to me, but I hope the programmers and students will rejoice.\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"> If you are a programmer trying to understand a paper's math equations, I strongly recommend just removing all of the superscripts, subscripts, and diacriticals, replacing them with a single letter. If you work with equations like this every day this is superflous advice, but when I read I am usually trying to understand the flow of computation. To me it is far more understandable to remember that $P$ in this step represents the updated value of $P$ computed in the last step, as opposed to trying to remember what $P_{k-1}(+)$ denotes, and what its relation to $P_k(-)$ is, if any.\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"> For example, for the equation of $\\mathbf{S}$ above, Wikipedia uses\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"> $$\\textbf{S}_k = \\textbf{H}_k \\textbf{P}_{k\\mid k-1} \\textbf{H}_k^\\mathsf{T} + \\textbf{R}_k\n",
|
|
|
|
|
"$$\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"> Is that more exact? Absolutely. Is it easier or harder to read? You'll need to answer that for yourself.\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"> For reference, the Appendix **Symbols and Notations** lists the symbology used by the major authors in the field.\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"So let's work through this expression by expression. Start with $\\mathbf{HPH}^\\mathsf{T}$. The linear equation $\\mathbf{ABA}^T$ can be thought of as changing the basis of $\\mathbf{B}$ to $\\mathbf{A}$. So $\\mathbf{HPH}^\\mathsf{T}$ is taking the covariance $\\mathbf{P}$ and putting it in measurement ($\\mathbf{H}$) space. \n",
|
|
|
|
|
"\n",
|
|
|
|
|
"In English, consider the problem of reading a temperature with a thermometer that provices readings in volts. Our state is in terms of temperature, but we are now doing calculations in *measurement space* - volts. So we need to convert $\\mathbf{P}$ from applying to temperatures to volts. The linear algebra form $\\textbf{H}\\textbf{P}\\textbf{H}$ takes $\\mathbf{P}$ to the basis used by $\\mathbf{H}$, namely volts. \n",
|
|
|
|
|
"In English, consider the problem of reading a temperature with a thermometer that provides readings in volts. Our state is in terms of temperature, but we are now doing calculations in *measurement space* - volts. So we need to convert $\\mathbf{P}$ from applying to temperatures to volts. The linear algebra form $\\textbf{H}\\textbf{P}\\textbf{H}^\\mathsf{T}$ takes $\\mathbf{P}$ to the basis used by $\\mathbf{H}$, namely volts. \n",
|
|
|
|
|
"\n",
|
|
|
|
|
"Then, once in measurement space, we can add the measurement noise $\\mathbf{R}$ to it. Hence, the expression for the uncertainty once we include the measurement is:\n",
|
|
|
|
|
"\n",
|
|
|
|
|
@@ -640,13 +786,13 @@
|
|
|
|
|
"\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"$$\n",
|
|
|
|
|
"\\textbf{K} = \\frac{uncertainty_{prediction}}{uncertainty_{measurement}}\n",
|
|
|
|
|
"\\textbf{K} = \\frac{uncertainty_{prediction}}{uncertainty_{measurement}}\\textbf{H}^\\mathsf{T}\n",
|
|
|
|
|
"$$\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"In other words, the *Kalman gain* equation is doing nothing more than computing a ratio based on how much we trust the prediction vs the measurement. If we are confident in our measurements and unconfident in our predictions $\\mathbf{K}$ will favor the measurement, and vice versa. The equation is complicated because we are doing this in multiple dimensions via matrices, but the concept is simple - scale by a ratio.\n",
"\n",
"Without going into the derivation of $\\mathbf{K}$, I'll say that this equation is the result of finding a value of $\\mathbf{K}$ that optimizes the *mean-square estimation error*. It does this by finding the minimal values for $\\mathbf{P}$ along its diagonal. Recall that the diagonal of $\\mathbf{P}$ is just the variance for each state variable. So, this equation for $\\mathbf{K}$ ensures that the Kalman filter output is optimal. To put this in concrete terms, for our dog tracking problem this means that the estimates for both position and velocity will be optimal - a value of $\\mathbf{K}$ that made the position extremely accurate but the velocity very inaccurate would be rejected in favor of a $\\mathbf{K}$ that made both position and velocity just somewhat accurate."
]
},
{
@@ -1172,6 +1318,27 @@
"while not correct, is often a useful approximation. If you do this you will have to perform quite a few studies to guarantee that your filter works in a variety of situations. Given the availability of functions to compute the correct values of $\\mathbf{Q}$ for you I would strongly recommend not using approximations. Perhaps it is justified for quick-and-dirty filters, or on embedded devices where you need to wring out every last bit of performance, and seek to minimize the number of matrix operations required. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"** author's note: the following aside probably belongs elsewhere in the book**\n",
"\n",
"> As an aside, most textbooks are more exact with the notation, in Gelb[1] for example, *Pk(+)* is used to denote the uncertainty covariance for the prediction step, and *Pk(-)* for the uncertainty covariance for the update step. Other texts use subscripts with 'k|k-1', superscipt $^-$, and many other variations. As a programmer I find all of that fairly unreadable; I am used to thinking about variables changing state as a program runs, and do not use a different variable name for each new computation. There is no agreed upon format, so each author makes different choices. I find it challenging to switch quickly between books an papers, and so have adopted this admittedly less precise notation. Mathematicians will write scathing emails to me, but I hope the programmers and students will rejoice.\n",
"\n",
"> If you are a programmer trying to understand a paper's math equations, I strongly recommend just removing all of the superscripts, subscripts, and diacriticals, replacing them with a single letter. If you work with equations like this every day this is superflous advice, but when I read I am usually trying to understand the flow of computation. To me it is far more understandable to remember that $P$ in this step represents the updated value of $P$ computed in the last step, as opposed to trying to remember what $P_{k-1}(+)$ denotes, and what its relation to $P_k(-)$ is, if any.\n",
"\n",
"> For example, for the equation of $\\mathbf{S}$ above, Wikipedia uses\n",
"\n",
"> $$\\textbf{S}_k = \\textbf{H}_k \\textbf{P}_{k\\mid k-1} \\textbf{H}_k^\\mathsf{T} + \\textbf{R}_k\n",
"$$\n",
"\n",
"> Is that more exact? Absolutely. Is it easier or harder to read? You'll need to answer that for yourself.\n",
"\n",
"> For reference, the Appendix **Symbols and Notations** lists the symbology used by the major authors in the field."
]
},
{
"cell_type": "heading",
"level": 2,
@@ -1184,9 +1351,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
" * [1] *LTI System Theory* http://en.wikipedia.org/wiki/LTI_system_theory\n",
|
|
|
|
|
" \n",
|
|
|
|
|
" * [2] *Matrix Exponential* http://en.wikipedia.org/wiki/Matrix_exponential \n",
|
|
|
|
|
" * [1] *Matrix Exponential* http://en.wikipedia.org/wiki/Matrix_exponential \n",
"\n",
" * [2] *LTI System Theory* http://en.wikipedia.org/wiki/LTI_system_theory\n",
" \n",
" * [3] C.F. van Loan, \"Computing Integrals Involving the Matrix Exponential,\" IEEE Transactions Automatic Control, June 1978."
]