Formatting - making matrix variables bold, small, etc.

parent 6dc9c02af0
commit 1599069a64
@@ -1,7 +1,7 @@
{
 "metadata": {
  "name": "",
  "signature": "sha256:be0a709066adb31921ff0f85de1570fb3fe218d1a139417a7dc26ef13ea519e2"
  "signature": "sha256:1dc131b15da6cd7244d52ea00f85302deed12274cd93ac32c078d53e70b8ecac"
 },
 "nbformat": 3,
 "nbformat_minor": 0,
@@ -1013,13 +1013,13 @@
"source": [
"##### **Step 6**: Design the Process Noise Matrix\n",
"\n",
"What is *process noise*? Consider the motion of a thrown ball. In a vacuum and with constant gravitational force it moves in a parabola. However, if you throw the ball on the surface of the earth you will also need to model factors like rotation and air drag. However, even when you have done all of that there is usually things you cannot account for. For example, consider wind. On a windy day the ball's trajectory will differ from the computed trajectory, perhaps by a significant amount. Without wind sensors, we may have no way to model the wind. Wind can come from any direction, so it is likely to have a near Gaussian distribution. The Kalman filter models this as *process noise*, and calls it $\\mathbf{Q}$.\n",
"What is *process noise*? Consider the motion of a thrown ball. In a vacuum and with constant gravitational force it moves in a parabola. However, if you throw the ball on the surface of the earth you will also need to model factors like rotation and air drag. Even when you have done all of that, there are usually things you cannot account for. For example, consider wind. On a windy day the ball's trajectory will differ from the computed trajectory, perhaps by a significant amount. Without wind sensors, we may have no way to model the wind. Wind can come from any direction, so it is likely to have a near Gaussian distribution. The Kalman filter models this as *process noise*, and calls it $\\small\\mathbf{Q}$.\n",
"\n",
"Astute readers will realize that we can inspect the ball's path and extract wind as an unobserved state variable, but the point to grasp here is that there will always be some unmodelled noise in our process, and the Kalman filter gives us a way to model it.\n",
"\n",
"Designing the process noise matrix can be quite demanding. For our first example, we will set it to 0, like so: $\\mathbf{Q}=0$. It is unlikely that you would do that for a real filter.\n",
"Designing the process noise matrix can be quite demanding. For our first example, we will set it to 0, like so: $\\small\\mathbf{Q}=0$. It is unlikely that you would do that for a real filter.\n",
"\n",
"> Some books and papers use $\\mathbf{R}$ for measurement noise and $\\mathbf{Q}$ for the process noise. Others do the opposite, using $\\mathbf{Q}$ for measurement noise and $\\mathbf{R}$ for the process noise! Read carefully, and make sure you don't get confused."
"> Some books and papers use $\\small\\mathbf{R}$ for measurement noise and $\\small\\mathbf{Q}$ for the process noise. Others do the opposite, using $\\small\\mathbf{Q}$ for measurement noise and $\\small\\mathbf{R}$ for the process noise! Read carefully, and make sure you don't get confused. I use the following mnemonic. Radars are used to measure positions, and they have measurement error. So, for me, $\\small\\mathbf{R}$ is the **R**adar's measurement noise. I've read a lot of Kalman filter literature in the context of radar tracking, so it makes sense to me. I don't have a good one for $\\small\\mathbf{Q}$, other than to note that it alphabetically follows the P in **P**rocess."
]
},
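As a rough illustration of this design step, here is the Q = 0 choice used in this first example next to one common alternative, the discrete white noise model. This is a minimal NumPy sketch, not the chapter's code; the `dt` and `q_var` values are made up for illustration.

```python
import numpy as np

dt = 1.      # time step between measurements
dim = 2      # state is [position, velocity]

# First example: no process noise at all, Q = 0
Q_zero = np.zeros((dim, dim))

# A more realistic choice: discrete white noise on the acceleration.
# With G = [dt**2/2, dt]^T and acceleration variance q_var,
# Q = G @ G.T * q_var gives the familiar 2x2 form.
q_var = 0.1                      # illustrative value, not from the text
G = np.array([[0.5 * dt**2],
              [dt]])
Q_white = G @ G.T * q_var

print(Q_zero)
print(Q_white)
```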
{
@@ -1088,30 +1088,32 @@
"\n",
"**1**: We just assign the initial value for our state. Here we just initialize both the position and velocity to zero. \n",
"\n",
"**2**: We set $F=(\\begin{smallmatrix}1&1\\\\0&1\\end{smallmatrix})$, as in design step 2 above. \n",
"**2**: We set $\\small\\mathbf{F}=(\\begin{smallmatrix}1&1\\\\0&1\\end{smallmatrix})$, as in design step 2 above. \n",
"\n",
"**3**: We set $H=(\\begin{smallmatrix}1&0\\end{smallmatrix})$, as in design step 3 above.\n",
"\n",
"**4**: We set $R = 5$ and $Q=0$ as in steps 5 and 6.\n",
"**4**: We set $\\small\\mathbf{R} = 5$ and $\\small\\mathbf{Q}=0$ as in steps 5 and 6.\n",
"\n",
"**5**: Recall in the last chapter we set our initial belief to $\\mathcal{N}(\\mu,\\sigma^2)=\\mathcal{N}(0,500)$ to signify our lack of knowledge about the initial conditions. We implemented this in Python with a tuple that contained both $\\mu$ and $\\sigma^2$ in the variable $pos$:\n",
"\n",
"    pos = (0,500)\n",
"    \n",
"Multidimensional Kalman filters stores the state variables in $\\mathbf{x}$ and their *covariance* in $\\mathbf{P}$. These are $f.x$ and $f.P$ in the code above. Notionally, this is similar as the one dimension case, but instead of having a mean and variance we have a mean and covariance. For the multidimensional case, we have $$\\mathcal{N}(\\mu,\\sigma^2)=\\mathcal{N}(x,P)$$\n",
"Multidimensional Kalman filters store the state variables in $\\mathbf{x}$ and their *covariance* in $\\small\\mathbf{P}$. These are $f.x$ and $f.P$ in the code above. Notionally, this is similar to the one-dimensional case, but instead of having a mean and variance we have a mean and covariance. For the multidimensional case, we have\n",
"\n",
"$P$ is initialized to the identity matrix of size $n{\\times}n$, so multiplying by 500 assigns a variance of 500 to $x$ and $\\dot{x}$. So $f.P$ contains\n",
"$$\\mathcal{N}(\\mu,\\sigma^2)=\\mathcal{N}(\\mathbf{x},\\mathbf{P})$$\n",
"\n",
"$\\small\\mathbf{P}$ is initialized to the identity matrix of size $n{\\times}n$, so multiplying by 500 assigns a variance of 500 to $x$ and $\\dot{x}$. So $f.P$ contains\n",
"\n",
"$$\\begin{bmatrix} 500&0\\\\0&500\\end{bmatrix}$$\n",
"\n",
"This will become much clearer once we look at the covariance matrix in detail in later sections. For now recognize that each diagonal element $e_{ii}$ is the variance for the $i$th state variable. \n",
"\n",
"> Summary: For our dog tracking problem, in the 1-D case $\\mu$ was the position, and $\\sigma^2$ was the variance. In the 2-D case $x$ is our position and velocity, and $P$ is the *covariance* of the position and velocity. It is the same thing, just in higher dimensions!\n",
"> Summary: For our dog tracking problem, in the 1-D case $\\mu$ was the position, and $\\sigma^2$ was the variance. In the 2-D case $\\small\\mathbf{x}$ is our position and velocity, and $\\small\\mathbf{P}$ is the *covariance* of the position and velocity. It is the same thing, just in higher dimensions!\n",
"\n",
">| | 1D | 2D and up|\n",
">|--|----|---|\n",
">|state|$\\mu$|$x$|\n",
">|uncertainty|$\\sigma^2$|$P$|\n",
">|uncertainty|$\\sigma^2$|$\\small\\mathbf{P}$|\n",
"\n",
"All that is left is to run the code! The *DogSensor* class from the previous chapter has been placed in *DogSensor.py*."
]
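The design steps above map directly onto a handful of array assignments. Here is a minimal NumPy sketch standing in for the chapter's filter attributes (the `f.x`, `f.F`, `f.H`, `f.R`, `f.Q`, `f.P` referred to in the text); it is an illustration, not the notebook's code.

```python
import numpy as np

# Step 1: initial state - position and velocity both start at 0
x = np.array([[0.],
              [0.]])

# Step 2: state transition matrix F
F = np.array([[1., 1.],
              [0., 1.]])

# Step 3: measurement function H - we only measure position
H = np.array([[1., 0.]])

# Steps 5 and 6: measurement noise R and process noise Q
R = np.array([[5.]])
Q = np.zeros((2, 2))

# Initial belief: multiply the identity by 500 to give both state
# variables a variance of 500, as described above
P = np.eye(2) * 500.
print(P)
```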
@@ -1163,7 +1165,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"This is the complete code for the filter, and most of it is just boilerplate. The first function *dog_tracking_filter()* is a helper function that creates a *KalmamFilter* object with specified $\\mathbf{R}$, $\\mathbf{Q}$ and $\\mathbf{P}$ matrices. We've shown this code already, so I will not discuss it more here. \n",
"This is the complete code for the filter, and most of it is just boilerplate. The first function *dog_tracking_filter()* is a helper function that creates a *KalmanFilter* object with specified $\\small\\mathbf{R}$, $\\small\\mathbf{Q}$ and $\\small\\mathbf{P}$ matrices. We've shown this code already, so I will not discuss it more here. \n",
"\n",
"The function *filter_dog()* implements the filter itself. Let's work through it line by line. The first line creates the simulation of the DogSensor, as we have seen in the previous chapter.\n",
"\n",
@@ -1231,7 +1233,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, call it. We will start by filtering 100 measurements with a noise factor of 30, $\\mathbf{R}=5$ and $\\mathbf{Q}=0$."
"Finally, call it. We will start by filtering 100 measurements with a noise factor of 30, $\\small\\mathbf{R}=5$ and $\\small\\mathbf{Q}=0$."
]
},
{
@@ -1270,7 +1272,7 @@
"\n",
"The first plot shows the output of the Kalman filter against the measurements and the actual position of our dog (drawn in green). After the initial settling in period the filter should track the dog's position very closely.\n",
"\n",
"The next two plots show the variance of $x$ and of $\\dot{x}$. If you look at the code, you will see that I have plotted the diagonals of $P$ over time. Recall that the diagonal of a covariance matrix contains the variance of each state variable. So $P[0,0]$ is the variance of $x$, and $P[1,1]$ is the variance of $\\dot{x}$. You can see that despite initializing $P=(\\begin{smallmatrix}500&0\\\\0&500\\end{smallmatrix})$ we quickly converge to small variances for both the position and velocity. We will spend a lot of time on the covariance matrix later, so for now I will leave it at that.\n",
"The next two plots show the variance of $x$ and of $\\dot{x}$. If you look at the code, you will see that I have plotted the diagonals of $\\small\\mathbf{P}$ over time. Recall that the diagonal of a covariance matrix contains the variance of each state variable. So $\\small\\mathbf{P}[0,0]$ is the variance of $x$, and $\\small\\mathbf{P}[1,1]$ is the variance of $\\dot{x}$. You can see that despite initializing $\\small\\mathbf{P}=(\\begin{smallmatrix}500&0\\\\0&500\\end{smallmatrix})$ we quickly converge to small variances for both the position and velocity. We will spend a lot of time on the covariance matrix later, so for now I will leave it at that.\n",
"\n",
"In the previous chapter we filtered very noisy signals with much simpler code than the code above. However, realize that right now we are working with a very simple example - an object moving through 1-D space and one sensor. That is about the limit of what we can compute with the code in the last chapter. In contrast, we can implement very complicated, multidimensional filters with this code merely by altering our assignments to the filter's variables. Perhaps we want to track 100 dimensions in financial models. Or we have an aircraft with a GPS, INS, TACAN, radar altimeter, baro altimeter, and airspeed indicator, and we want to integrate all those sensors into a model that predicts position, velocity, and acceleration in 3D (which requires 9 state variables). We can do that with the code in this chapter."
]
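To reproduce the variance plots described here you only need to record P[0,0] and P[1,1] at each step. A sketch with NumPy and matplotlib follows; the simulated dog and its noise level are placeholders, not the notebook's DogSensor.

```python
import numpy as np
import matplotlib.pyplot as plt

F = np.array([[1., 1.], [0., 1.]])
H = np.array([[1., 0.]])
R = np.array([[5.]])
Q = np.zeros((2, 2))
x = np.array([[0.], [0.]])
P = np.eye(2) * 500.

np.random.seed(2)
# dog moves 1 unit per step; measurement noise scale is illustrative
zs = np.arange(100.) + np.random.normal(0., 5., size=100)

var_pos, var_vel = [], []
for z in zs:
    x = F @ x
    P = F @ P @ F.T + Q
    y = np.array([[z]]) - H @ x
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ y
    P = (np.eye(2) - K @ H) @ P
    var_pos.append(P[0, 0])   # variance of position
    var_vel.append(P[1, 1])   # variance of velocity

plt.plot(var_pos, label='P[0,0] (position variance)')
plt.plot(var_vel, label='P[1,1] (velocity variance)')
plt.legend()
plt.show()
```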
@@ -1473,14 +1475,14 @@
"| | 1D | 2+D|\n",
"|--|----|---|\n",
"|state|$\\mu$|$x$|\n",
"|uncertainty|$\\sigma^2$|$P$|\n",
"|uncertainty|$\\sigma^2$|$\\small\\mathbf{P}$|\n",
"\n",
"This should remind you that $P$, the covariance matrix is nothing more than the variance of our state - such as the position of our dog. It has many elements in it, but don't be daunted; we will learn how to interpret a very large $9\\times 9$ covariance matrix, or even larger.\n",
"This should remind you that $\\small\\mathbf{P}$, the covariance matrix, is nothing more than the variance of our state - such as the position of our dog. It has many elements in it, but don't be daunted; we will learn how to interpret a very large $9{\\times}9$ covariance matrix, or even larger.\n",
"\n",
"Recall the beginning of the chapter, where we provided the equation for the covariance matrix. It read:\n",
"\n",
"$$\n",
"P = \\begin{pmatrix}\n",
"\\mathbf{P} = \\begin{pmatrix}\n",
" {{\\sigma}_{1}}^2 & p{\\sigma}_{1}{\\sigma}_{2} & \\cdots & p{\\sigma}_{1}{\\sigma}_{n} \\\\\n",
" p{\\sigma}_{2}{\\sigma}_{1} &{{\\sigma}_{2}}^2 & \\cdots & p{\\sigma}_{2}{\\sigma}_{n} \\\\\n",
" \\vdots & \\vdots & \\ddots & \\vdots \\\\\n",
@@ -1488,7 +1490,7 @@
" \\end{pmatrix}\n",
"$$\n",
"\n",
"(I have subtituted $P$ for $\\Sigma$ because of the nomenclature used by the Kalman filter literature).\n",
"(I have substituted $\\small\\mathbf{P}$ for $\\Sigma$ because of the nomenclature used by the Kalman filter literature).\n",
"\n",
"The diagonal contains the variance of each of our state variables. So, if our state variables are\n",
"\n",
@@ -1558,7 +1560,7 @@
"Here the ellipse is slanted, signifying that $x$ and $\\dot{x}$ are correlated (and, of course, dependent - all correlated variables are dependent). You may or may not have noticed that the off diagonal elements were set to the same value, 2.4. This was not an accident. Let's look at the equation for the covariance for the case where the number of dimensions is two.\n",
"\n",
"$$\n",
"P = \\begin{pmatrix}\n",
"\\mathbf{P} = \\begin{pmatrix}\n",
" \\sigma_1^2 & p\\sigma_1\\sigma_2 \\\\\n",
" p\\sigma_2\\sigma_1 &\\sigma_2^2 \n",
" \\end{pmatrix}\n",
@@ -1567,18 +1569,18 @@
"Look at the computation for the off diagonal elements. \n",
"\n",
"$$\\begin{align*}\n",
"P_{0,1}&=p\\sigma_1\\sigma_2 \\\\\n",
"P_{1,0}&=p\\sigma_2\\sigma_1.\n",
"\\mathbf{P}_{0,1}&=p\\sigma_1\\sigma_2 \\\\\n",
"\\mathbf{P}_{1,0}&=p\\sigma_2\\sigma_1.\n",
"\\end{align*}$$\n",
"\n",
"If we re-arrange terms we get\n",
"$$\\begin{align*}\n",
"P_{0,1}&=p\\sigma_1\\sigma_2 \\\\\n",
"P_{1,0}&=p\\sigma_1\\sigma_1 \\mbox{, yielding} \\\\\n",
"P_{0,1}&=P_{1,0}\n",
"\\mathbf{P}_{0,1}&=p\\sigma_1\\sigma_2 \\\\\n",
"\\mathbf{P}_{1,0}&=p\\sigma_1\\sigma_2 \\mbox{, yielding} \\\\\n",
"\\mathbf{P}_{0,1}&=\\mathbf{P}_{1,0}\n",
"\\end{align*}$$\n",
"\n",
"In general, we can state that $P_{i,j}=P_{j,i}$.\n",
"In general, we can state that $\\small\\mathbf{P}_{i,j}=\\small\\mathbf{P}_{j,i}$.\n",
"\n",
"So for my example I multiplied the diagonals, 2 and 6, to get 12, and then scaled that with the arbitrarily chosen $p=.2$ to get 2.4.\n",
"\n",
@@ -1654,13 +1656,13 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The output on these is a bit messy, but you should be able to see what is happening. In both plots we are drawing the covariance matrix for each point. We start with the covariance $P=(\\begin{smallmatrix}50&0\\\\0&50\\end{smallmatrix})$, which signifies a lot of uncertainty about our initial belief. After we receive the first measurement the Kalman filter updates this belief, and so the variance is no longer as large. In the top plot the first ellipse (the one on the far left) should be a slighly squashed ellipse. As the filter continues processing the measurements the covariance ellipse quickly shifts shape until it settles down to being a long, narrow ellipse tilted in the direction of movement.\n",
"The output of these plots is a bit messy, but you should be able to see what is happening. In both plots we are drawing the covariance matrix for each point. We start with the covariance $\\small\\mathbf{P}=(\\begin{smallmatrix}50&0\\\\0&50\\end{smallmatrix})$, which signifies a lot of uncertainty about our initial belief. After we receive the first measurement the Kalman filter updates this belief, and so the variance is no longer as large. In the top plot the first ellipse (the one on the far left) should be a slightly squashed ellipse. As the filter continues processing the measurements the covariance ellipse quickly shifts shape until it settles down to being a long, narrow ellipse tilted in the direction of movement.\n",
"\n",
"Think about what this means physically. The x-axis of the ellipse denotes our uncertainty in position, and the y-axis our uncertainty in velocity. So, an ellipse that is taller than it is wide signifies that we are more uncertain about the velocity than the position. Conversely, a wide, narrow ellipse shows high uncertainty in position and low uncertainty in velocity. Finally, the amount of tilt shows the amount of correlation between the two variables. \n",
"\n",
"The first plot, with $R=5$, finishes up with an ellipse that is wider than it is tall. If that is not clear I have printed out the variances for the last ellipse in the lower right hand corner. The variance for position is 3.85, and the variance for velocity is 3.0. \n",
"The first plot, with $\\small\\mathbf{R}=5$, finishes up with an ellipse that is wider than it is tall. If that is not clear I have printed out the variances for the last ellipse in the lower right hand corner. The variance for position is 3.85, and the variance for velocity is 3.0. \n",
"\n",
"In contrast, the second plot, with $R=0.5$, has a final ellipse that is taller than wide. The ellipses in the second plot are all much smaller than the ellipses in the first plot. This stands to reason because a small $R$ implies a small amount of noise in our measurements. Small noise means accurate predictions, and thus a strong belief in our position. "
"In contrast, the second plot, with $\\small\\mathbf{R}=0.5$, has a final ellipse that is taller than wide. The ellipses in the second plot are all much smaller than the ellipses in the first plot. This stands to reason because a small $\\small\\mathbf{R}$ implies a small amount of noise in our measurements. Small noise means accurate predictions, and thus a strong belief in our position. "
]
},
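If you want to draw these ellipses yourself, one way (a sketch, not the plotting code used in the notebook) is to take the eigendecomposition of P: the eigenvectors give the tilt and the square roots of the eigenvalues give the axis lengths.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Ellipse

def plot_cov_ellipse(P, mean=(0., 0.), ax=None):
    """Draw the 1-sigma ellipse of a 2x2 covariance matrix."""
    vals, vecs = np.linalg.eigh(P)                       # vals are ascending
    angle = np.degrees(np.arctan2(vecs[1, 1], vecs[0, 1]))
    width, height = 2 * np.sqrt(vals[1]), 2 * np.sqrt(vals[0])
    ax = ax or plt.gca()
    ax.add_patch(Ellipse(mean, width, height, angle=angle,
                         fill=False, color='b'))

P = np.array([[2.0, 2.4],
              [2.4, 6.0]])
plot_cov_ellipse(P)
plt.xlim(-4, 4)
plt.ylim(-4, 4)
plt.axis('equal')   # avoid drawing the ellipse distorted (see the sidebar below)
plt.show()
```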
{
@@ -1669,15 +1671,15 @@
"source": [
"##### Question: Explain Ellipse Differences\n",
"\n",
"Why are the ellipses for $R=5$ shorter, and more tilted than the ellipses for $R=0.5$. Hint: think about this in the context of what these ellipses mean physically, not in terms of the math. If you aren't sure about the answer,change R to truly large and small numbers such as 100 and 0.1, observe the changes, and think about what this means. \n",
"Why are the ellipses for $\\small\\mathbf{R}=5$ shorter and more tilted than the ellipses for $\\small\\mathbf{R}=0.5$? Hint: think about this in the context of what these ellipses mean physically, not in terms of the math. If you aren't sure about the answer, change $\\small\\mathbf{R}$ to truly large and small numbers such as 100 and 0.1, observe the changes, and think about what this means. \n",
"\n",
"##### Solution\n",
"\n",
"The $x$ axis is for position, and $y$ is velocity. An ellipse that is vertical, or nearly so, says there is no correlation between position and velocity, and an ellipse that is diagnal says that there is a lot of correlation. Phrased that way, it sounds unlikely - either they are correlated or not. But this is a measure of the *output of the filter*, not a description of the actual, physical world. When $R$ is very large we are telling the filter that there is a lot of noise in the measurements. In that case the Kalman gain $K$ is set to favor the prediction over the measurement, and the prediction comes from the velocity state variable. So, there is a large correlation between $x$ and $\\dot{x}$. Conversely, if $R$ is small, we are telling the filter that the measurement is very trustworthy, and $K$ is set to favor the measurement over the prediction. Why would the filter want to use the prediction if the measurement is nearly perfect? If the filter is not using much from the prediction there will be very little correlation reported. \n",
"The $x$ axis is for position, and $y$ is velocity. An ellipse that is vertical, or nearly so, says there is no correlation between position and velocity, and an ellipse that is diagonal says that there is a lot of correlation. Phrased that way, it sounds unlikely - either they are correlated or not. But this is a measure of the *output of the filter*, not a description of the actual, physical world. When $\\small\\mathbf{R}$ is very large we are telling the filter that there is a lot of noise in the measurements. In that case the Kalman gain $\\small\\mathbf{K}$ is set to favor the prediction over the measurement, and the prediction comes from the velocity state variable. So, there is a large correlation between $x$ and $\\dot{x}$. Conversely, if $\\small\\mathbf{R}$ is small, we are telling the filter that the measurement is very trustworthy, and $\\small\\mathbf{K}$ is set to favor the measurement over the prediction. Why would the filter want to use the prediction if the measurement is nearly perfect? If the filter is not using much from the prediction there will be very little correlation reported. \n",
"\n",
"**This is a critical point to understand!** The Kalman filter is just a mathematical model for a real world system. A report of little correlation *does not mean* there is no correlation in the physical system, just that there was no correlation in the mathematical model. It's just a report of how much measurement vs. prediction was incorporated into the model. \n",
"\n",
"Let's bring that point home with a truly large measurement error. We will set $R=500$. Think about what the plot will look like before scrolling down. To emphasize the issue, I will set the amount of noise injected into the measurements to 0, so the measurement will exactly equal the actual position. "
"Let's bring that point home with a truly large measurement error. We will set $\\small\\mathbf{R}=500$. Think about what the plot will look like before scrolling down. To emphasize the issue, I will set the amount of noise injected into the measurements to 0, so the measurement will exactly equal the actual position. "
]
},
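You can see this effect directly by running the same filter with different values of R and looking at the Kalman gain it settles on. This is a sketch: the values 100 and 0.1 echo the hint above, and I add a small amount of process noise, which the chapter's example does not, purely so the gain reaches a steady value instead of decaying toward zero.

```python
import numpy as np

def steady_state_gain(R_value, steps=200):
    """Run the position/velocity filter on noise-free measurements and
    return the position element of the Kalman gain after it settles."""
    dt, q_var = 1., 0.1                     # illustrative process noise so
    G = np.array([[0.5 * dt**2], [dt]])     # the gain reaches a steady state
    Q = G @ G.T * q_var
    F = np.array([[1., dt], [0., 1.]])
    H = np.array([[1., 0.]])
    R = np.array([[R_value]])
    x = np.array([[0.], [0.]])
    P = np.eye(2) * 500.
    K = np.zeros((2, 1))
    for z in np.arange(float(steps)):       # measurements exactly equal position
        x = F @ x
        P = F @ P @ F.T + Q
        y = np.array([[z]]) - H @ x
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ y
        P = (np.eye(2) - K @ H) @ P
    return K[0, 0]

# larger R -> smaller gain (favor the prediction); smaller R -> gain near 1
for R_value in (100., 5., 0.5, 0.1):
    print('R =', R_value, ' position gain =', steady_state_gain(R_value))
```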
{
@@ -1711,7 +1713,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Keep looking at these plots until you grasp how to interpret the covariance matrix $P$. When you start dealing with a, say, $9\\times 9$ matrix it may seem overwhelming - there are 81 numbers to interpret. Just break it down - the diagonal contains the variance for each state variable, and all off diagonal elements are the product of two variances and a scaling factor $p$. You will not be able to plot a $9\\times 9$ matrix on the screen because it would require living in 10-D space, so you have to develop your intution and understanding in this simple, 2-D case. \n",
"Keep looking at these plots until you grasp how to interpret the covariance matrix $\\small\\mathbf{P}$. When you start dealing with a, say, $9{\\times}9$ matrix it may seem overwhelming - there are 81 numbers to interpret. Just break it down - the diagonal contains the variance for each state variable, and all off diagonal elements are the product of two variances and a scaling factor $p$. You will not be able to plot a $9{\\times}9$ matrix on the screen because it would require living in 10-D space, so you have to develop your intuition and understanding in this simple, 2-D case. \n",
"\n",
"> **sidebar**: when plotting covariance ellipses, make sure to always use *plt.axis('equal')* in your code. If the axes use different scales the ellipses will be drawn distorted. For example, the ellipse may be drawn as being taller than it is wide, but it may actually be wider than tall."
]
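One way to "break it down" in code is to pull the variances off the diagonal and divide each off-diagonal element by the corresponding pair of standard deviations, which recovers the scaling factor $p$ for every pair of state variables. This is a sketch with a made-up (but valid) 3x3 covariance matrix.

```python
import numpy as np

# a made-up but valid covariance matrix for a 3-variable state
P = np.array([[ 4.0,  1.2, -0.6],
              [ 1.2,  9.0,  2.1],
              [-0.6,  2.1,  1.0]])

variances = np.diag(P)        # the diagonal: one variance per state variable
sigmas = np.sqrt(variances)   # standard deviations

# divide each off-diagonal element by sigma_i * sigma_j to recover
# the scaling factor p for every pair of state variables
p = P / np.outer(sigmas, sigmas)

print(variances)
print(np.round(p, 3))
```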