Minor fixes

This commit is contained in:
George Shaw 2018-07-10 15:36:22 -07:00
parent 47cbc4bf43
commit 609b8be237
8 changed files with 15 additions and 15 deletions

View File

@ -1541,7 +1541,7 @@
"\n",
"This is a broad topic which I will not treat exhaustively. \n",
"\n",
"Let's consider a trivial example. We think of things like test scores as being normally distributed. If you have ever had a professor “grade on a curve” you have been subject to this assumption. But of course test scores cannot follow a normal distribution. This is because the distribution assigns a nonzero probability distribution for *any* value, no matter how far from the mean. So, for example, say your mean is 90 and the standard deviation is 13. The normal distribution assumes that there is a large chance of somebody getting a 90, and a small chance of somebody getting a 40. However, it also implies that there is a tiny chance of somebody getting a grade of -10, or 150. It assigns an extremely chance of getting a score of $-10^{300}$ or $10^{32986}$. The tails of a Gaussian distribution are infinitely long.\n",
"Let's consider a trivial example. We think of things like test scores as being normally distributed. If you have ever had a professor “grade on a curve” you have been subject to this assumption. But of course test scores cannot follow a normal distribution. This is because the distribution assigns a nonzero probability distribution for *any* value, no matter how far from the mean. So, for example, say your mean is 90 and the standard deviation is 13. The normal distribution assumes that there is a large chance of somebody getting a 90, and a small chance of somebody getting a 40. However, it also implies that there is a tiny chance of somebody getting a grade of -10, or 150. It assigns an extremely small chance of getting a score of $-10^{300}$ or $10^{32986}$. The tails of a Gaussian distribution are infinitely long.\n",
"\n",
"But for a test we know this is not true. Ignoring extra credit, you cannot get less than 0, or more than 100. Let's plot this range of values using a normal distribution to see how poorly this represents real test scores distributions."
]

View File

@ -630,7 +630,7 @@
"\n",
"$$x = \\| \\mathcal L\\bar x \\|$$\n",
"\n",
"We've just shown that we can represent the prior with a Gaussian. What about the likelihood? The likelihood is the probability of the measurement given the current state. We've learned how to represent measurements as a Gaussians. For example, maybe our sensor states that the dog is at 23 m, with a standard deviation of 0.4 meters. Our measurement, expressed as a likelihood, is $z = \\mathcal N (23, 0.16)$.\n",
"We've just shown that we can represent the prior with a Gaussian. What about the likelihood? The likelihood is the probability of the measurement given the current state. We've learned how to represent measurements as Gaussians. For example, maybe our sensor states that the dog is at 23 m, with a standard deviation of 0.4 meters. Our measurement, expressed as a likelihood, is $z = \\mathcal N (23, 0.16)$.\n",
"\n",
"Both the likelihood and prior are modeled with Gaussians. Can we multiply Gaussians? Is the product of two Gaussians another Gaussian?\n",
"\n",

View File

@ -308,7 +308,7 @@
"\n",
"$$\\sigma^2 = \\begin{bmatrix}10\\\\4\\end{bmatrix}$$ \n",
"\n",
"This is incomplete because it does not consider the more general case. In the **Gaussians** chapter we computed the variance in the heights of students. That is a measure of how the weights vary relative to each other. If all students are the same height the variance is 0, and if their heights are wildly different the variance will be large. \n",
"This is incomplete because it does not consider the more general case. In the **Gaussians** chapter we computed the variance in the heights of students. That is a measure of how the heights vary relative to each other. If all students are the same height, then the variance is 0, and if their heights are wildly different, then the variance will be large. \n",
"\n",
"There is also a relationship between height and weight. In general, a taller person weighs more than a shorter person. Height and weight are *correlated*. We want a way to express not only what we think the variance is in the height and the weight, but also the degree to which they are correlated. In other words, we want to know how weight varies compared to the heights. We call that the *covariance*. \n",
"\n",

View File

@ -1580,7 +1580,7 @@
" kf.update(z)\n",
"```\n",
"\n",
"Each call to `predict` and `update` modifies the state variables `x` and `P`. Therefore, after the call to `predict` `kf.x` contains the prior. After the call to update `kf.x` contains the posterior. `data` contains the actual position and measurement of the dog, so we use `[:, 1]` to get an array of measurements.\n",
"Each call to `predict` and `update` modifies the state variables `x` and `P`. Therefore, after the call to `predict`, `kf.x` contains the prior. After the call to update, `kf.x` contains the posterior. `data` contains the actual position and measurement of the dog, so we use `[:, 1]` to get an array of measurements.\n",
"\n",
"It really cannot get much simpler than that. As we tackle more complicated problems this code will remain largely the same; all of the work goes into setting up the `KalmanFilter` matrices; executing the filter is trivial.\n",
"\n",
@ -2555,7 +2555,7 @@
"\n",
"Let's start varying our parameters to see the effect of various changes. This is a very normal thing to be doing with Kalman filters. It is difficult, and often impossible to exactly model our sensors. An imperfect model means imperfect output from our filter. Engineers spend a lot of time tuning Kalman filters so that they perform well with real world sensors. We will spend time now to learn the effect of these changes. As you learn the effect of each change you will develop an intuition for how to design a Kalman filter. Designing a Kalman filter is as much art as science. We are modeling a physical system using math, and models are imperfect.\n",
"\n",
"Let's look at the effects of the measurement noise $\\mathbf R$ and process noise $\\mathbf Q$. We will want to see the effect of different settings for $\\mathbf R$ and $\\mathbf Q$, so I have given the measurements a variance of 225 meters squared. That is very large, but it magnifies the effects of various design choices on the graphs, making it easier to recognize what is happening. our first experiment holds $\\mathbf R$ constant while varying $\\mathbf Q$."
"Let's look at the effects of the measurement noise $\\mathbf R$ and process noise $\\mathbf Q$. We will want to see the effect of different settings for $\\mathbf R$ and $\\mathbf Q$, so I have given the measurements a variance of 225 meters squared. That is very large, but it magnifies the effects of various design choices on the graphs, making it easier to recognize what is happening. Our first experiment holds $\\mathbf R$ constant while varying $\\mathbf Q$."
]
},
{
@ -2603,7 +2603,7 @@
"source": [
"The filter in the first plot should follow the noisy measurement closely. In the second plot the filter should vary from the measurement quite a bit, and be much closer to a straight line than in the first graph. Why does ${\\mathbf Q}$ affect the plots this way?\n",
"\n",
"Let's remind ourselves of what the term *process uncertainty* means. Consider the problem of tracking a ball. We can accurately model its behavior in a vacuum with math, but with wind, varying air density, temperature, and an spinning ball with an imperfect surface our model will diverge from reality. \n",
"Let's remind ourselves of what the term *process uncertainty* means. Consider the problem of tracking a ball. We can accurately model its behavior in a vacuum with math, but with wind, varying air density, temperature, and a spinning ball with an imperfect surface our model will diverge from reality. \n",
"\n",
"In the first case we set `Q_var=20 m^2`, which is quite large. In physical terms this is telling the filter \"I don't trust my motion prediction step\" as we are saying that the variance in the velocity is 20. Strictly speaking, we are telling the filter there is a lot of external noise that we are not modeling with $\\small{\\mathbf F}$, but the upshot of that is to not trust the motion prediction step. The filter will be computing velocity ($\\dot x$), but then mostly ignoring it because we are telling the filter that the computation is extremely suspect. Therefore the filter has nothing to trust but the measurements, and thus it follows the measurements closely. \n",
"\n",
@ -2697,7 +2697,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Here we see that the filter initially struggles for several iterations to acquire the track, but then it accurately tracks our dog. In fact, this is nearly optimum - we have not designed $\\mathbf Q$ optimally, but $\\mathbf R$ is optimal. A rule of thumb for $\\mathbf Q$ is to set it between $\\frac{1}{2}\\Delta a$ to $\\Delta a$, where $\\Delta a$ is the maximum amount that the acceleration will change between sample period. This only applies for the assumption we are making in this chapter - that acceleration is constant and uncorrelated between each time period. In the Kalman Math chapter we will discuss several different ways of designing $\\mathbf Q$.\n",
"Here we see that the filter initially struggles for several iterations to acquire the track, but then it accurately tracks our dog. In fact, this is nearly optimum - we have not designed $\\mathbf Q$ optimally, but $\\mathbf R$ is optimal. A rule of thumb for $\\mathbf Q$ is to set it between $\\frac{1}{2}\\Delta a$ to $\\Delta a$, where $\\Delta a$ is the maximum amount that the acceleration will change between sample periods. This only applies for the assumption we are making in this chapter - that acceleration is constant and uncorrelated between each time period. In the Kalman Math chapter we will discuss several different ways of designing $\\mathbf Q$.\n",
"\n",
"To some extent you can get similar looking output by varying either ${\\mathbf R}$ or ${\\mathbf Q}$, but I urge you to not 'magically' alter these until you get output that you like. Always think about the physical implications of these assignments, and vary ${\\mathbf R}$ and/or ${\\mathbf Q}$ based on your knowledge of the system you are filtering. Back that up with extensive simulations and/or trial runs of real data."
]
@ -3226,7 +3226,7 @@
"source": [
"## Exercise: Compare Velocities\n",
"\n",
"Since we are plotting velocities let's look at what the the 'raw' velocity is, which we can compute by subtracting subsequent measurements. i.e the velocity at time 1 can be approximated by `xs[1] - xs[0]`. Plot the raw value against the values estimated by the Kalman filter and by the RTS filter. Discuss what you see."
"Since we are plotting velocities let's look at what the 'raw' velocity is, which we can compute by subtracting subsequent measurements. i.e the velocity at time 1 can be approximated by `xs[1] - xs[0]`. Plot the raw value against the values estimated by the Kalman filter and by the RTS filter. Discuss what you see."
]
},
{
@ -3342,7 +3342,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.4"
"version": "3.6.5"
},
"widgets": {
"application/vnd.jupyter.widget-state+json": {

View File

@ -1189,7 +1189,7 @@
"$$ \\mathbf P=\\mathbf{FPF}^\\mathsf{T} + \\mathbf Q$$\n",
"\n",
"If the values for $\\mathbf Q$ are small relative to $\\mathbf P$\n",
"than it will be contributing almost nothing to the computation of $\\mathbf P$. Setting $\\mathbf Q$ to the zero matrix except for the lower right term\n",
"then it will be contributing almost nothing to the computation of $\\mathbf P$. Setting $\\mathbf Q$ to the zero matrix except for the lower right term\n",
"\n",
"$$\\mathbf Q=\\begin{bmatrix}0&0&0\\\\0&0&0\\\\0&0&\\sigma^2\\end{bmatrix}$$\n",
"\n",

View File

@ -990,7 +990,7 @@
"\n",
"$$\\mathbf z = \\mathbf{Hx}$$\n",
"\n",
"Our sensor still only reads position, so it should take the position from the state, and 0 out the velocity, like so:\n",
"Our sensor still only reads position, so it should take the position from the state, and 0 out the velocity and acceleration, like so:\n",
"\n",
"$$\\mathbf H = \\begin{bmatrix}1 & 0 \\end{bmatrix}$$\n",
"\n",
@ -2058,7 +2058,7 @@
"We could add a check that either takes the performance limits of the aircraft into account:\n",
"```\n",
"vel = y / dt\n",
"if vel >= MIN_AC_VELOCITY and vel <= MIN_AC_VELOCITY:\n",
"if vel >= MIN_AC_VELOCITY and vel <= MAX_AC_VELOCITY:\n",
" kf.update()\n",
"```\n",
" \n",
@ -2445,7 +2445,7 @@
"\n",
"In practice it happens a bit differently. Likelihoods can be difficult to deal with mathmatically. It is common to compute and use the *log-likelihood* instead, which is just the natural log of the likelihood. This has several benefits. First, the log is strictly increasing, and it reaches it's maximum value at the same point of the function it is applied to. If you want to find the maximum of a function you normally take the derivative of it; it can be difficult to find the derivative of some arbitrary function, but finding $\\frac{d}{dx} log(f(x))$ is trivial, and the result is the same as $\\frac{d}{dx} f(x)$. We don't use this property in this book, but it is essential when performing analysis on filters.\n",
"\n",
"The likelihood and log-likelihood are computed you when `update()` is called, and is accessible via the 'log_likelihood' and `likelihood` data attribute. Let's look at this: I'll run the filter with several measurements within expected range, and then inject measurements far from the expected values:"
"The likelihood and log-likelihood are computed for you when `update()` is called, and is accessible via the 'log_likelihood' and `likelihood` data attribute. Let's look at this: I'll run the filter with several measurements within expected range, and then inject measurements far from the expected values:"
]
},
{

View File

@ -847,7 +847,7 @@
"\n",
"The workhorses of nonlinear filters are the *linearized Kalman filter* and *extended Kalman filter* (EKF). These two techniques were invented shortly after Kalman published his paper and they have been the main techniques used since then. The flight software in airplanes, the GPS in your car or phone almost certainly use one of these techniques. \n",
"\n",
"However, these techniques are extremely demanding. The EKF linearizes the differential equations at one point, which requires you to find a solution to a matrix of partial derivatives (a Jacobian). This can be difficult or impossible to do analytically. If impossible, you have to use numerical techniques to find the Jacobian, but this is expensive computationally and introduces more error into the system. Finally, if the problem is quite nonlinear the linearization leads to a lot of error being introduced in each step, and the filters frequently diverge. You can not throw some equations into some arbitrary solver and expect to to get good results. It's a difficult field for professionals. I note that most Kalman filtering textbooks merely gloss over the EKF despite it being the most frequently used technique in real world applications. \n",
"However, these techniques are extremely demanding. The EKF linearizes the differential equations at one point, which requires you to find a solution to a matrix of partial derivatives (a Jacobian). This can be difficult or impossible to do analytically. If impossible, you have to use numerical techniques to find the Jacobian, but this is expensive computationally and introduces more error into the system. Finally, if the problem is quite nonlinear the linearization leads to a lot of error being introduced in each step, and the filters frequently diverge. You cannot throw some equations into some arbitrary solver and expect to to get good results. It's a difficult field for professionals. I note that most Kalman filtering textbooks merely gloss over the EKF despite it being the most frequently used technique in real world applications. \n",
"\n",
"Recently the field has been changing in exciting ways. First, computing power has grown to the point that we can use techniques that were once beyond the ability of a supercomputer. These use *Monte Carlo* techniques - the computer generates thousands to tens of thousands of random points and tests all of them against the measurements. It then probabilistically kills or duplicates points based on how well they match the measurements. A point far away from the measurement is unlikely to be retained, whereas a point very close is quite likely to be retained. After a few iterations there is a clump of particles closely tracking your object, and a sparse cloud of points where there is no object.\n",
"\n",

View File

@ -582,7 +582,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"However, we must understand that this smoothing is predicated on the system model. We have told the filter that that what we are tracking follows a constant velocity model with very low process error. When the filter *looks ahead* it sees that the future behavior closely matches a constant velocity so it is able to reject most of the noise in the signal. Suppose instead our system has a lot of process noise. For example, if we are tracking a light aircraft in gusty winds its velocity will change often, and the filter will be less able to distinguish between noise and erratic movement due to the wind. We can see this in the next graph. "
"However, we must understand that this smoothing is predicated on the system model. We have told the filter that what we are tracking follows a constant velocity model with very low process error. When the filter *looks ahead* it sees that the future behavior closely matches a constant velocity so it is able to reject most of the noise in the signal. Suppose instead our system has a lot of process noise. For example, if we are tracking a light aircraft in gusty winds its velocity will change often, and the filter will be less able to distinguish between noise and erratic movement due to the wind. We can see this in the next graph. "
]
},
{