Fixed bad description of fat tails. GitHub #217

Updated my description of the fat tails to be more general, especially
since my tails were truncated not fat!

Also added a list of Wikipedia links at the end for reference of
those reading the PDF.
Roger Labbe 2018-04-25 17:05:53 -07:00
parent 237ad1f9cb
commit 47062b018d


@ -1520,15 +1520,17 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Fat Tails\n",
"## Limitations of Using Gaussians to Model the World\n",
"\n",
"Earlier I mentioned the *central limit theorem*, which states that under certain conditions the arithmetic sum of any independent random variable will be normally distributed, regardless of how the random variables are distributed. This is important to us because nature is full of distributions which are not normal, but when we apply the central limit theorem over large populations we end up with normal distributions. \n",
"\n",
"However, a key part of the proof is “under certain conditions”. These conditions often do not hold for the physical world. The resulting distributions are called *fat tailed*. Tails is a colloquial term for the far left and right side parts of the curve where the probability density is close to zero.\n",
"However, a key part of the proof is “under certain conditions”. These conditions often do not hold for the physical world. For example, a kitchen scale cannot read below zero, but if we represent the measurement error as a Gaussian the left side of the curve extends to negative infinity, implying a very small chance of giving a negative reading. \n",
"\n",
"Let's consider a trivial example. We think of things like test scores as being normally distributed. If you have ever had a professor “grade on a curve” you have been subject to this assumption. But of course test scores cannot follow a normal distribution. This is because the distribution assigns a nonzero probability distribution for *any* value, no matter how far from the mean. So, for example, say your mean is 90 and the standard deviation is 13. The normal distribution assumes that there is a large chance of somebody getting a 90, and a small chance of somebody getting a 40. However, it also implies that there is a tiny chance of somebody getting a grade of -10, or 150. It assigns an infinitesimal chance of getting a score of $-10^{300}$ or $10^{32986}$. The tails of a Gaussian distribution are infinitely long.\n",
"This is a broad topic which I will not treat exhaustively. \n",
"\n",
"But for a test we know this is not true. Ignoring extra credit, you cannot get less than 0, or more than 100. Let's plot this range of values using a normal distribution."
"Let's consider a trivial example. We think of things like test scores as being normally distributed. If you have ever had a professor “grade on a curve” you have been subject to this assumption. But of course test scores cannot follow a normal distribution. This is because the distribution assigns a nonzero probability distribution for *any* value, no matter how far from the mean. So, for example, say your mean is 90 and the standard deviation is 13. The normal distribution assumes that there is a large chance of somebody getting a 90, and a small chance of somebody getting a 40. However, it also implies that there is a tiny chance of somebody getting a grade of -10, or 150. It assigns an extremely chance of getting a score of $-10^{300}$ or $10^{32986}$. The tails of a Gaussian distribution are infinitely long.\n",
"\n",
"But for a test we know this is not true. Ignoring extra credit, you cannot get less than 0, or more than 100. Let's plot this range of values using a normal distribution to see how poorly this represents real test scores distributions."
]
},
{
@ -1559,9 +1561,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The area under the curve cannot equal 1, so it is not a probability distribution. What actually happens is that more students than predicted by a normal distribution get scores nearer the upper end of the range (for example), and that tail becomes “fat”. Also, the test is probably not able to perfectly distinguish minute differences in skill in the students, so the distribution to the left of the mean is also probably a bit bunched up in places. The resulting distribution is called a [*fat tail distribution*](https://en.wikipedia.org/wiki/Fat-tailed_distribution). \n",
"The area under the curve cannot equal 1, so it is not a probability distribution. What actually happens is that more students than predicted by a normal distribution get scores nearer the upper end of the range (for example), and that tail becomes “fat”. Also, the test is probably not able to perfectly distinguish minute differences in skill in the students, so the distribution to the left of the mean is also probably a bit bunched up in places. \n",
"\n",
"Sensors measure the world. The errors in a sensor's measurements are rarely truly Gaussian. It is far too early to be talking about the difficulties that this presents to the Kalman filter designer. It is worth keeping in the back of your mind the fact that the Kalman filter math is based on an idealized model of the world. For now I will present a bit of code that I will be using later in the book to form fat tail distributions to simulate various processes and sensors. This distribution is called the [*Student's $t$-distribution*](https://en.wikipedia.org/wiki/Student%27s_t-distribution). \n",
"Sensors measure the world. The errors in a sensor's measurements are rarely truly Gaussian. It is far too early to be talking about the difficulties that this presents to the Kalman filter designer. It is worth keeping in the back of your mind the fact that the Kalman filter math is based on an idealized model of the world. For now I will present a bit of code that I will be using later in the book to form distributions to simulate various processes and sensors. This distribution is called the [*Student's $t$-distribution*](https://en.wikipedia.org/wiki/Student%27s_t-distribution). \n",
"\n",
"Let's say I want to model a sensor that has some white noise in the output. For simplicity, let's say the signal is a constant 10, and the standard deviation of the noise is 2. We can use the function `numpy.random.randn()` to get a random number with a mean of 0 and a standard deviation of 1. I can simulate this with:"
]
@ -1613,7 +1615,7 @@
"source": [
"That looks like I would expect. The signal is centered around 10. A standard deviation of 2 means that 68% of the measurements will be within $\\pm$ 2 of 10, and 99% will be within $\\pm$ 6 of 10, and that looks like what is happening. \n",
"\n",
"Now let's look at a fat tailed distribution generated with the Student's $t$-distribution. I will not go into the math, but just give you the source code for it and then plot a distribution using it."
"Now let's look at distribution generated with the Student's $t$-distribution. I will not go into the math, but just give you the source code for it and then plot a distribution using it."
]
},
{
@ -1665,17 +1667,19 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We can see from the plot that while the output is similar to the normal distribution there are outliers that go far more than 3 standard deviations from the mean (7 to 13). This is what causes the 'fat tail'.\n",
"We can see from the plot that while the output is similar to the normal distribution there are outliers that go far more than 3 standard deviations from the mean (7 to 13). \n",
"\n",
"It is unlikely that the Student's $t$-distribution is an accurate model of how your sensor (say, a GPS or Doppler) performs, and this is not a book on how to model physical systems. However, it does produce reasonable data to test your filter's performance when presented with real world noise. We will be using distributions like these throughout the rest of the book in our simulations and tests. \n",
"\n",
"This is not an idle concern. The Kalman filter equations assume the noise is normally distributed, and perform sub-optimally if this is not true. Designers for mission critical filters, such as the filters on spacecraft, need to master a lot of theory and empirical knowledge about the performance of the sensors on their spacecraft. \n",
"This is not an idle concern. The Kalman filter equations assume the noise is normally distributed, and perform sub-optimally if this is not true. Designers for mission critical filters, such as the filters on spacecraft, need to master a lot of theory and empirical knowledge about the performance of the sensors on their spacecraft. For example, a presentation I saw on a NASA mission stated that while theory states that they should use 3 standard deviations to distinguish noise from valid measurements in practice they had to use 5 to 6 standard deviations. This was something they determined by experiments.\n",
"\n",
"The code for rand_student_t is included in `filterpy.stats`. You may use it with\n",
"\n",
"```python\n",
"from filterpy.stats import rand_student_t\n",
"```"
"```\n",
"\n",
"While I'll not cover it here, statistics has defined ways of describing the shape of a probability distribution by how it varies from an exponential distribution. The normal distribution is shaped symmetrically around the mean - like a bell curve. However, a probability distribution can be asymmetrical around the mean. The measure of this is called [*skew*](https://en.wikipedia.org/wiki/Skewness). The tails can be shortened, fatter, thinner, or otherwise shaped differently from an exponential distribution. The measure of this is called [*kurtosis*](https://en.wikipedia.org/wiki/Kurtosis)."
]
},
{
@ -1684,7 +1688,7 @@
"source": [
"## Summary and Key Points\n",
"\n",
"This chapter is a poor introduction to statistics in general. I've only covered the concepts that needed to use Gaussians in the remainder of the book, no more. What I've covered will not get you very far if you intend to read the Kalman filter literature. If this is a new topic to you I suggest reading a statistics textbook. I've always liked the Schaum series for self study, and Alan Downey's *Think Stats* [5] is also very good. \n",
"This chapter is a poor introduction to statistics in general. I've only covered the concepts that needed to use Gaussians in the remainder of the book, no more. What I've covered will not get you very far if you intend to read the Kalman filter literature. If this is a new topic to you I suggest reading a statistics textbook. I've always liked the Schaum series for self study, and Alan Downey's *Think Stats* [5] is also very good and freely available online. \n",
"\n",
"The following points **must** be understood by you before we continue:\n",
"\n",
@ -1693,7 +1697,9 @@
"* $\\mu$ is the average of all possible values\n",
"* The variance $\\sigma^2$ represents how much our measurements vary from the mean\n",
"* The standard deviation ($\\sigma$) is the square root of the variance ($\\sigma^2$)\n",
"* Many things in nature approximate a normal distribution"
"* Many things in nature approximate a normal distribution, but the math is not perfect.\n",
"\n",
"The next several chapters will be using Gaussians to help perform filtering. As noted in the last section, sometimes Gaussians do not describe the world very well. Latter parts of the book are dedicated to filters which work even when the noise or system's behavior is very non-Gaussian. "
]
},
{
@ -1721,6 +1727,39 @@
"\n",
"http://greenteapress.com/thinkstats/"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Useful Wikipedia Links\n",
"\n",
"https://en.wikipedia.org/wiki/Probability_distribution\n",
"\n",
"https://en.wikipedia.org/wiki/Random_variable\n",
"\n",
"https://en.wikipedia.org/wiki/Sample_space\n",
"\n",
"https://en.wikipedia.org/wiki/Central_tendency\n",
"\n",
"https://en.wikipedia.org/wiki/Expected_value\n",
"\n",
"https://en.wikipedia.org/wiki/Standard_deviation\n",
"\n",
"https://en.wikipedia.org/wiki/Variance\n",
"\n",
"https://en.wikipedia.org/wiki/Probability_density_function\n",
"\n",
"https://en.wikipedia.org/wiki/Central_limit_theorem\n",
"\n",
"https://en.wikipedia.org/wiki/68%E2%80%9395%E2%80%9399.7_rule\n",
"\n",
"https://en.wikipedia.org/wiki/Cumulative_distribution_function\n",
"\n",
"https://en.wikipedia.org/wiki/Skewness\n",
"\n",
"https://en.wikipedia.org/wiki/Kurtosis"
]
}
],
"metadata": {
@ -1740,7 +1779,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.3"
"version": "3.6.4"
},
"widgets": {
"application/vnd.jupyter.widget-state+json": {