diff --git a/03-Gaussians.ipynb b/03-Gaussians.ipynb
index 01369ae..8f1af68 100644
--- a/03-Gaussians.ipynb
+++ b/03-Gaussians.ipynb
@@ -354,7 +354,7 @@
     "\n",
     "for discrete distributions, and as \n",
     "\n",
-    "$$\\int P(X{=}u)= 1$$\n",
+    "$$\\int P(X{=}u) \\,du= 1$$\n",
     "\n",
     "for continuous distributions."
    ]
@@ -447,7 +447,7 @@
     "\n",
     "$$\\mathbb E[X] = (1)(0.8) + (3)(0.15) + (5)(0.05) = 1.5$$\n",
     "\n",
-    "Here I have introduced the notation $\\mathbb E[X]$ for the expected value of $x$. Some texts use $E(x)$. The value 1.5 for $x$ makes intuitive sense because x is far more like to be 1 than 3 or 5, and 3 is more likely than 5 as well.\n",
+    "Here I have introduced the notation $\\mathbb E[X]$ for the expected value of $x$. Some texts use $E(x)$. The value 1.5 for $x$ makes intuitive sense because $x$ is far more likely to be 1 than 3 or 5, and 3 is more likely than 5 as well.\n",
     "\n",
     "We can formalize this by letting $x_i$ be the $i^{th}$ value of $X$, and $p_i$ be the probability of its occurrence. This gives us\n",
     "\n",
@@ -459,7 +459,7 @@
     "\n",
     "If $x$ is continuous we substitute the sum for an integral, like so\n",
     "\n",
-    "$$\\mathbb E[X] = \\int_{-\\infty}^\\infty x\\, f(x)$$\n",
+    "$$\\mathbb E[X] = \\int_{-\\infty}^\\infty x\\, f(x) \\,dx$$\n",
     "\n",
     "where $f(x)$ is the probability distribution function of $x$. We won't be using this equation yet, but we will be using it in the next chapter."
    ]
@@ -531,11 +531,11 @@
     "\n",
     "Statistics has formalized this concept of measuring variation into the notion of *standard deviation* and *variance*. The equation for computing the *variance* is\n",
     "\n",
-    "$$VAR(X) = E[(X - \\mu)^2]$$\n",
+    "$$\\mathit{VAR}(X) = E[(X - \\mu)^2]$$\n",
     "\n",
-    "Ignoring the squared terms for a moment, you can see that the variance is the *expected value* for how much the sample space (X) varies from the mean (squared, of course). We have the formula for the expected value $E[X] = \\sum\\limits_{i=1}^n p_ix_i$, and we will assume that any height is equally probable, so we can substitute that into the equation above to get\n",
+    "Ignoring the squared terms for a moment, you can see that the variance is the *expected value* for how much the sample space ($X$) varies from the mean (squared, of course). We have the formula for the expected value $E[X] = \\sum\\limits_{i=1}^n p_ix_i$, and we will assume that any height is equally probable, so we can substitute that into the equation above to get\n",
     "\n",
-    "$$VAR(X) = \\frac{1}{n}\\sum_{i=1}^n (x_i - \\mu)^2$$\n",
+    "$$\\mathit{VAR}(X) = \\frac{1}{n}\\sum_{i=1}^n (x_i - \\mu)^2$$\n",
     "\n",
     "Let's compute the variance of the three classes to see what values we get and to become familiar with this concept.\n",
     "\n",
@@ -543,9 +543,9 @@
     "\n",
     "$$ \n",
     "\\begin{aligned}\n",
-    "VAR(X) &=\\frac{(1.8-1.8)^2 + (2-1.8)^2 + (1.7-1.8)^2 + (1.9-1.8)^2 + (1.6-1.8)^2} {5} \\\\\n",
+    "\\mathit{VAR}(X) &=\\frac{(1.8-1.8)^2 + (2-1.8)^2 + (1.7-1.8)^2 + (1.9-1.8)^2 + (1.6-1.8)^2} {5} \\\\\n",
     "&= \\frac{0 + 0.04 + 0.01 + 0.01 + 0.04}{5} \\\\\n",
-    "VAR(X)&= 0.02 \\, m^2\n",
+    "\\mathit{VAR}(X)&= 0.02 \\, m^2\n",
     "\\end{aligned}$$\n",
     "\n",
     "NumPy provides the function `var()` to compute the variance:"
@@ -576,9 +576,9 @@
    "source": [
     "This is perhaps a bit hard to interpret. Heights are in meters, yet the variance is meters squared. Thus we have a more commonly used measure, the *standard deviation*, which is defined as the square root of the variance:\n",
     "\n",
-    "$$\\sigma = \\sqrt{VAR(X)}=\\sqrt{\\frac{1}{n}\\sum_{i=1}^n(x_i - \\mu)^2}$$\n",
+    "$$\\sigma = \\sqrt{\\mathit{VAR}(X)}=\\sqrt{\\frac{1}{n}\\sum_{i=1}^n(x_i - \\mu)^2}$$\n",
     "\n",
-    "It is typical to use $\\sigma$ for the *standard deviation* and $\\sigma^2$ for the *variance*. In most of this book I will be using $\\sigma^2$ instead of $VAR(X)$ for the variance; they symbolize the same thing.\n",
+    "It is typical to use $\\sigma$ for the *standard deviation* and $\\sigma^2$ for the *variance*. In most of this book I will be using $\\sigma^2$ instead of $\\mathit{VAR}(X)$ for the variance; they symbolize the same thing.\n",
     "\n",
     "For the first class we compute the standard deviation with\n",
     "\n",
@@ -660,7 +660,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "For only 5 students we obviously will not get exactly 68% within one standard deviation. We do see that 3 out of 5 students are within $1\\sigma$, or 60%, which is as close as you can get to 68% with only 5 samples. I haven't yet introduced enough math or Python for you to fully understand the next bit of code, but let's look at the results for a class with 100 students.\n",
+    "For only 5 students we obviously will not get exactly 68% within one standard deviation. We do see that 3 out of 5 students are within $\\pm1\\sigma$, or 60%, which is as close as you can get to 68% with only 5 samples. I haven't yet introduced enough math or Python for you to fully understand the next bit of code, but let's look at the results for a class with 100 students.\n",
     "\n",
     ">  We write one standard deviation as  $1\\sigma$, which is pronounced \"one standard deviation\", not \"one sigma\". Two standard deviations is $2\\sigma$, and so on."
    ]
@@ -706,7 +706,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "We can see by eye that roughly 68% of the heights lie within $1\\sigma$ of the mean 1.8."
+    "We can see by eye that roughly 68% of the heights lie within $\\pm1\\sigma$ of the mean 1.8."
    ]
   },
   {
@@ -830,7 +830,7 @@
     "\n",
     "This is clearly incorrect, as there is more than 0 variance in the data. \n",
     "\n",
-    "Maybe we can use the absolute value? We can see by inspection that the result is $12/4=3$ which is certainly correct - each value varies by 3 from the mean. But what if have $Y=[6, -2, -3, 1]$? In this case we get $12/4=3$. $Y$ is clearly more spread out than $X$, but the computation yields the same variance. If we use the correct formula we get a variance of 3.5 for $Y$, which reflects its larger variation.\n",
+    "Maybe we can use the absolute value? We can see by inspection that the result is $12/4=3$ which is certainly correct — each value varies by 3 from the mean. But what if we have $Y=[6, -2, -3, 1]$? In this case we get $12/4=3$. $Y$ is clearly more spread out than $X$, but the computation yields the same variance. If we use the correct formula we get a variance of 3.5 for $Y$, which reflects its larger variation.\n",
     "\n",
     "This is not a proof of correctness. Indeed, Carl Friedrich Gauss, the inventor of the technique, recognized that is is somewhat arbitrary. If there are outliers then squaring the difference gives disproportionate weight to that term. For example, let's see what happens if we have $X = [1,-1,1,-2,3,2,100]$."
    ]
@@ -914,11 +914,11 @@
    "metadata": {},
    "source": [
     "> I explain how to plot Gaussians, and much more, in the Notebook *Computing_and_Plotting_PDFs* in the \n",
-    "Supporting_Notebooks folder. You can read it online [here](https://github.com/rlabbe/Kalman-and-Bayesian-Filters-in-Python/blob/master/Supporting_Notebooks/Computing_and_plotting_PDFs.ipynb)[1]\n",
+    "Supporting_Notebooks folder. You can read it online [here](https://github.com/rlabbe/Kalman-and-Bayesian-Filters-in-Python/blob/master/Supporting_Notebooks/Computing_and_plotting_PDFs.ipynb) [1].\n",
     "\n",
     "This may be recognizable to you as a 'bell curve'. This curve is ubiquitous because under real world conditions many observations are distributed in such a manner. In fact, this is the  curve for the student heights given earlier. I will not use the term 'bell curve' to refer to a Gaussian because many probability distributions have a similar bell curve shape. Non-mathematical sources might not be so precise, so be judicious in what you conclude when you see the term used without definition.\n",
     "\n",
-    "This curve is not unique to heights - a vast amount of natural phenomena exhibits this sort of distribution, including the sensors that we use in filtering problems. As we will see, it also has all the attributes that we are looking for - it represents a unimodal belief or value as a probability, it is continuous, and it is computationally efficient. We will soon discover that it also has other desirable qualities which we may not realize we desire.\n",
+    "This curve is not unique to heights — a vast amount of natural phenomena exhibits this sort of distribution, including the sensors that we use in filtering problems. As we will see, it also has all the attributes that we are looking for — it represents a unimodal belief or value as a probability, it is continuous, and it is computationally efficient. We will soon discover that it also has other desirable qualities which we may not realize we desire.\n",
     "\n",
     "To further motivate you, recall the shapes of the probability distributions in the *Discrete Bayes* chapter. They were not perfect Gaussian curves, but they were similar, as in the plot below. We will be using Gaussians to replace the discrete probabilities used in that chapter!"
    ]
@@ -959,7 +959,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "A bit of nomenclature before we continue - this chart depicts the *probability density* of of a *random variable* having any value between ($-\\infty..\\infty)$. What does that mean? Imagine we take an infinite number of infinitely precise measurements of the speed of automobiles on a section of highway. We could then plot the results by showing the relative number of cars going at any given speed. If the average was 120 kph, it might look like this:"
+    "A bit of nomenclature before we continue - this chart depicts the *probability density* of a *random variable* having any value between ($-\\infty..\\infty)$. What does that mean? Imagine we take an infinite number of infinitely precise measurements of the speed of automobiles on a section of highway. We could then plot the results by showing the relative number of cars going past at any given speed. If the average was 120 kph, it might look like this:"
    ]
   },
   {
@@ -989,11 +989,11 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "The y-axis depicts the *probability density* - the relative amount of cars that are going the speed at the corresponding x-axis.\n",
+    "The y-axis depicts the *probability density* — the relative amount of cars that are going the speed at the corresponding x-axis.\n",
     "\n",
-    "You may object that human heights or automobile speeds cannot be less than zero, let alone $-\\infty$ or $-\\infty$. This is true, but this is a common limitation of mathematical modeling. \"The map is not the territory\" is a common expression, and it is true for Bayesian filtering and statistics. The Gaussian distribution above models the distribution of the measured automobile speeds, but being a model it is necessarily imperfect. The difference between model and reality will come up again and again in these filters. Gaussians are used in many branches of mathematics, not because they perfectly model reality, but because they are easier to use than any other relatively accurate choice. However, even in this book Gaussians will fail to model reality, forcing us to use computationally expensive alternative. \n",
+    "You may object that human heights or automobile speeds cannot be less than zero, let alone $-\\infty$ or $-\\infty$. This is true, but this is a common limitation of mathematical modeling. “The map is not the territory” is a common expression, and it is true for Bayesian filtering and statistics. The Gaussian distribution above models the distribution of the measured automobile speeds, but being a model it is necessarily imperfect. The difference between model and reality will come up again and again in these filters. Gaussians are used in many branches of mathematics, not because they perfectly model reality, but because they are easier to use than any other relatively accurate choice. However, even in this book Gaussians will fail to model reality, forcing us to use computationally expensive alternative. \n",
     "\n",
-    "You will see these distributions called *Gaussian distributions* or *normal distributions*.  *Gaussian* and *normal* both mean the same thing in this context, and are used interchangeably. I will use both throughout this book as different sources will use either term, and I want you to be used to seeing both. Finally, as in this paragraph, it is typical to shorten the name and talk about a *Gaussian* or *normal* - these are both typical shortcut names for the *Gaussian distribution*. "
+    "You will see these distributions called *Gaussian distributions* or *normal distributions*.  *Gaussian* and *normal* both mean the same thing in this context, and are used interchangeably. I will use both throughout this book as different sources will use either term, and I want you to be used to seeing both. Finally, as in this paragraph, it is typical to shorten the name and talk about a *Gaussian* or *normal* — these are both typical shortcut names for the *Gaussian distribution*. "
    ]
   },
   {
@@ -1119,7 +1119,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "So the mean ($\\mu$) is what it sounds like - the average of all possible probabilities. Because of the symmetric shape of the curve it is also the tallest part of the curve. The thermometer reads $22^{\\circ}C$, so that is what we used for the mean.  \n",
+    "So the mean ($\\mu$) is what it sounds like — the average of all possible probabilities. Because of the symmetric shape of the curve it is also the tallest part of the curve. The thermometer reads 22°C, so that is what we used for the mean.  \n",
     "\n",
     "The notation for a normal distribution for a random variable $X$ is $X \\sim\\ \\mathcal{N}(\\mu,\\sigma^2)$ where $\\sim$ means *distributed according to*. This means I can express the temperature reading of our thermometer as\n",
     "\n",
@@ -1139,7 +1139,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Since this is a probability density distribution it is required that the area under the curve always equals one. This should be intuitively clear - the area under the curve represents all possible outcomes, *something* happened, and the probability of *something happening* is one, so the density must sum to one. We can prove this ourselves with a bit of code. (If you are mathematically inclined, integrate the Gaussian equation from $-\\infty$ to $\\infty$)"
+    "Since this is a probability density distribution it is required that the area under the curve always equals one. This should be intuitively clear — the area under the curve represents all possible outcomes, *something* happened, and the probability of *something happening* is one, so the density must sum to one. We can prove this ourselves with a bit of code. (If you are mathematically inclined, integrate the Gaussian equation from $-\\infty$ to $\\infty$)"
    ]
   },
   {
@@ -1204,17 +1204,17 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "So what is this telling us? The Gaussian with $\\sigma^2=0.05$ is very narrow. It is saying that we believe $x=23$, and that we are very sure about that. In contrast, the Gaussian with $\\sigma^2=5$ also believes that $x=23$, but we are much less sure about that. Our believe that $x=23$ is lower, and so our belief about the likely possible values for $x$ is spread out - we think it is quite likely that $x=20$ or $x=26$, for example. $\\sigma^2=0.05$ has almost completely eliminated $22$ or $24$ as possible values, whereas $\\sigma^2=5$ considers them nearly as likely as $23$.\n",
+    "So what is this telling us? The Gaussian with $\\sigma^2=0.05$ is very narrow. It is saying that we believe $x=23$, and that we are very sure about that. In contrast, the Gaussian with $\\sigma^2=5$ also believes that $x=23$, but we are much less sure about that. Our believe that $x=23$ is lower, and so our belief about the likely possible values for $x$ is spread out — we think it is quite likely that $x=20$ or $x=26$, for example. $\\sigma^2=0.05$ has almost completely eliminated $22$ or $24$ as possible values, whereas $\\sigma^2=5$ considers them nearly as likely as $23$.\n",
     "\n",
-    "If we think back to the thermometer, we can consider these three curves as representing the readings from three different thermometers. The curve for $\\sigma^2=0.05$ represents a very accurate thermometer, and curve for $\\sigma^2=5$ represents a fairly inaccurate one. Note the very powerful property the Gaussian distribution affords us - we can entirely represent both the reading and the error of a thermometer with only two numbers - the mean and the variance.\n",
+    "If we think back to the thermometer, we can consider these three curves as representing the readings from three different thermometers. The curve for $\\sigma^2=0.05$ represents a very accurate thermometer, and curve for $\\sigma^2=5$ represents a fairly inaccurate one. Note the very powerful property the Gaussian distribution affords us — we can entirely represent both the reading and the error of a thermometer with only two numbers — the mean and the variance.\n",
     "\n",
-    "An equivalent formation for a Gaussian is $\\mathcal{N}(\\mu,1/\\tau)$ where $\\mu$ is the *mean* and $\\tau$ the *precision*. $1/\\tau = \\sigma^2$; it is the reciprocal of the variance. While we do not use this formulation in this book, it underscores that the variance is a measure of how precise our data is. A small variance yields large precision - our measurement is very precise. Conversely, a large variance yields low precision - our belief is spread out across a large area. You should become comfortable with thinking about Gaussians in these equivalent forms. In Bayesian terms Gaussians reflect our *belief* about a measurement, they express the *precision* of the measurement, and they express how much *variance* there is in the measurements. These are all different ways of stating the same fact.\n",
+    "An equivalent formation for a Gaussian is $\\mathcal{N}(\\mu,1/\\tau)$ where $\\mu$ is the *mean* and $\\tau$ the *precision*. $1/\\tau = \\sigma^2$; it is the reciprocal of the variance. While we do not use this formulation in this book, it underscores that the variance is a measure of how precise our data is. A small variance yields large precision — our measurement is very precise. Conversely, a large variance yields low precision — our belief is spread out across a large area. You should become comfortable with thinking about Gaussians in these equivalent forms. In Bayesian terms Gaussians reflect our *belief* about a measurement, they express the *precision* of the measurement, and they express how much *variance* there is in the measurements. These are all different ways of stating the same fact.\n",
     "\n",
     "I'm getting ahead of myself, but in the next chapters we will use Gaussians to express our belief in things like the estimated position of the object we are tracking, or the accuracy of the sensors we are using.\n",
     "\n",
     "## The  68-95-99.7 Rule\n",
     "\n",
-    "It is worth spending a few words on standard deviation now. The standard deviation is a measure of how much variation from the mean exists. For Gaussian distributions, 68% of all the data falls within one standard deviation ($1\\sigma$) of the mean, 95% falls within two standard deviations ($2\\sigma$), and 99.7% within three ($3\\sigma$). This is often called the 68-95-99.7 rule. So if you were told that the average test score in a class was 71 with a standard deviation of 9.4, you could conclude that 95% of the students received a score between 52.2 and 89.8 if the distribution is normal (that is calculated with $71 \\pm (2 * 9.4)$). \n",
+    "It is worth spending a few words on standard deviation now. The standard deviation is a measure of how much variation from the mean exists. For Gaussian distributions, 68% of all the data falls within one standard deviation ($\\pm1\\sigma$) of the mean, 95% falls within two standard deviations ($\\pm2\\sigma$), and 99.7% within three ($\\pm3\\sigma$). This is often called the 68-95-99.7 rule. So if you were told that the average test score in a class was 71 with a standard deviation of 9.4, you could conclude that 95% of the students received a score between 52.2 and 89.8 if the distribution is normal (that is calculated with $71 \\pm (2 * 9.4)$). \n",
     "\n",
     "Finally, these are not arbitrary numbers. If the Gaussian for our position is $\\mu=22$ meters, then the standard deviation also has units meters. Thus $\\sigma=0.2$ implies that 68% of the measurements range from 21.8 to 22.2 meters. Variance is the standard deviation squared, so $\\sigma^2 = .04$ meters$^2$."
    ]
@@ -1398,7 +1398,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "If we look at the documentation for `scipy.stats.norm` <a href=\"http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.norm.html#scipy.stats.normfor\">here</a>[2] we see that there are many other functions that norm provides.\n",
+    "If we look at the documentation for `scipy.stats.norm` [here](http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.norm.html#scipy.stats.normfor) [2] we see that there are many other functions that norm provides.\n",
     "\n",
     "For example, we can generate $n$ samples from the distribution with the `rvs()` function."
    ]
@@ -1502,9 +1502,9 @@
    "source": [
     "Earlier I spoke very briefly about the *central limit theorem*, which states that under certain conditions the arithmetic sum of any independent random variable will be normally distributed, regardless of how the random variables are distributed. This is extremely important for (at least) two reasons. First, nature is full of distributions which are not normal, but when we apply the central limit theorem over large populations we end up with normal distributions. Second, Gaussians are mathematically *tractable*. We will see this more as we develop the Kalman filter theory, but there are very nice closed form solutions for operations on Gaussians that allow us to use them analytically.\n",
     "\n",
-    "However, a key part of the proof is \"under certain conditions\". These conditions often do not hold for the physical world. The resulting distributions are called *fat tailed*. Tails is a colloquial term for the far left and right side parts of the curve where the probability density is close to zero.\n",
+    "However, a key part of the proof is “under certain conditions”. These conditions often do not hold for the physical world. The resulting distributions are called *fat tailed*. Tails is a colloquial term for the far left and right side parts of the curve where the probability density is close to zero.\n",
     "\n",
-    "Let's consider a trivial example. We think of things like test scores as being normally distributed. If you have ever had a professor 'grade on a curve' you have been subject to this assumption. But of course test scores cannot follow a normal distribution. This is because the distribution assigns a nonzero probability distribution for *any* value, no matter how far from the mean. So, for example, say your mean is 90 and the standard deviation is 13. The normal distribution assumes that there is a large chance of somebody getting a 90, and a small chance of somebody getting a 40. However, it also implies that there is a tiny chance of somebody getting a grade of -10, or 150. It assigns an infinitesimal chance of getting a score of $-10^{300}$ or $10^{32986}$. The *tails* of a Gaussian distribution are infinitely long.\n",
+    "Let's consider a trivial example. We think of things like test scores as being normally distributed. If you have ever had a professor “grade on a curve” you have been subject to this assumption. But of course test scores cannot follow a normal distribution. This is because the distribution assigns a nonzero probability distribution for *any* value, no matter how far from the mean. So, for example, say your mean is 90 and the standard deviation is 13. The normal distribution assumes that there is a large chance of somebody getting a 90, and a small chance of somebody getting a 40. However, it also implies that there is a tiny chance of somebody getting a grade of -10, or 150. It assigns an infinitesimal chance of getting a score of $-10^{300}$ or $10^{32986}$. The *tails* of a Gaussian distribution are infinitely long.\n",
     "\n",
     "But for a test we know this is not true. Ignoring extra credit, you cannot get less than 0, or more than 100. Let's plot this range of values using a normal distribution."
    ]
@@ -1539,9 +1539,9 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "The area under the curve cannot equal 1, so it is not a probability distribution. What actually happens is that more students than predicted by a normal distribution get scores nearer the upper end of the range (for example), and that tail becomes 'fat'. Also, the test is probably not able to perfectly distinguish incredibly minute differences in skill in the students, so the distribution to the left of the mean is also probably a bit bunched up in places. The resulting distribution is called a *fat tail distribution*. \n",
+    "The area under the curve cannot equal 1, so it is not a probability distribution. What actually happens is that more students than predicted by a normal distribution get scores nearer the upper end of the range (for example), and that tail becomes “fat”. Also, the test is probably not able to perfectly distinguish incredibly minute differences in skill in the students, so the distribution to the left of the mean is also probably a bit bunched up in places. The resulting distribution is called a *fat tail distribution*. \n",
     "\n",
-    "Kalman filters use sensors to measure the world. The errors in sensor's measurements are rarely truly Gaussian. It is far too early to be talking about the difficulties that this presents to the Kalman filter designer. It is worth keeping in the back of your mind the fact that the Kalman filter math is based on an idealized model of the world.  For now I will present a bit of code that I will be using later in the book to form fat tail distributions to simulate various processes and sensors. This distribution is called the *Student's t distribution*. \n",
+    "Kalman filters use sensors to measure the world. The errors in a sensor's measurements are rarely truly Gaussian. It is far too early to be talking about the difficulties that this presents to the Kalman filter designer. It is worth keeping in the back of your mind the fact that the Kalman filter math is based on an idealized model of the world.  For now I will present a bit of code that I will be using later in the book to form fat tail distributions to simulate various processes and sensors. This distribution is called the *Student's $t$-distribution*. \n",
     "\n",
     "Let's say I want to model a sensor that has some white noise in the output. For simplicity, let's say the signal is a constant 10, and the standard deviation of the noise is 2. We can use the function `numpy.random.randn()` to get a random number with a mean of 0 and a standard deviation of 1. So I could simulate this sensor with"
    ]
@@ -1595,7 +1595,7 @@
    "source": [
     "That looks like I would expect. The signal is centered around 10. A standard deviation of 2 means that 68% of the measurements will be within $\\pm$ 2 of 10, and 99% will be within $\\pm$ 6 of 10, and that looks like what is happening. \n",
     "\n",
-    "Now let's look at a fat tailed distribution generated with the Student's T distribution. I will not go into the math, but just give you the source code for it and then plot a distribution using it."
+    "Now let's look at a fat tailed distribution generated with the Student's $t$-distribution. I will not go into the math, but just give you the source code for it and then plot a distribution using it."
    ]
   },
   {
@@ -1651,7 +1651,7 @@
    "source": [
     "We can see from the plot that while the output is similar to the normal distribution there are outliers that go far more than 3 standard deviations from the mean (7 to 13). This is what causes the 'fat tail'.\n",
     "\n",
-    "It is unlikely that the Student's T distribution is an accurate model of how your sensor (say, a GPS or Doppler) performs, and this is not a book on how to model physical systems. However, it does produce reasonable data to test your filter's performance when presented with real world noise. We will be using distributions like these throughout the rest of the book in our simulations and tests. \n",
+    "It is unlikely that the Student's $t$-distribution is an accurate model of how your sensor (say, a GPS or Doppler) performs, and this is not a book on how to model physical systems. However, it does produce reasonable data to test your filter's performance when presented with real world noise. We will be using distributions like these throughout the rest of the book in our simulations and tests. \n",
     "\n",
     "This is not an idle concern. The Kalman filter equations assume the noise is normally distributed, and perform sub-optimally if this is not true. Designers for mission critical filters, such as the filters on spacecraft, need to master a lot of theory and empirical knowledge about the performance of the sensors on their spacecraft. \n",
     "\n",