diff --git a/05-Multivariate-Gaussians.ipynb b/05-Multivariate-Gaussians.ipynb
index 87e3f3c..2b306bc 100644
--- a/05-Multivariate-Gaussians.ipynb
+++ b/05-Multivariate-Gaussians.ipynb
@@ -1849,7 +1849,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Theory states that roughly 99% of a distribution will fall withing 3 standard deviations, and this appears to be the case.\n",
+    "Theory states that roughly 99% of a distribution will fall within 3 standard deviations, and this appears to be the case.\n",
     "\n",
     "Nine dimensions? I haven't quite figured out how to plot a 9D ellipsoid on a 2D screen, so there will be no graphs. The concept is the same; the standard deviation error of the distribution can be described by a 9D ellipsoid."
    ]
diff --git a/07-Kalman-Filter-Math.ipynb b/07-Kalman-Filter-Math.ipynb
index 113c4bf..e44d74f 100644
--- a/07-Kalman-Filter-Math.ipynb
+++ b/07-Kalman-Filter-Math.ipynb
@@ -143,7 +143,7 @@
     "\n",
     "$$ \\dot{\\mathbf x} = \\mathbf{Ax}+ \\mathbf{Bu} + \\mathbf{w}$$\n",
     "\n",
-    "However, we are not interested in the derivative of $\\mathbf x$, but in $\\mathbf x$ itself. Ignoring the noise for a moment, we want an equation that recusively finds the value of $\\mathbf x$ at time $t_k$ in terms of $\\mathbf x$ at time $t_{k-1}$:\n",
+    "However, we are not interested in the derivative of $\\mathbf x$, but in $\\mathbf x$ itself. Ignoring the noise for a moment, we want an equation that recursively finds the value of $\\mathbf x$ at time $t_k$ in terms of $\\mathbf x$ at time $t_{k-1}$:\n",
     "\n",
     "$$\\mathbf x(t_k) = \\mathbf F(\\Delta t)\\mathbf x(t_{k-1}) + \\mathbf B(t_k)\\mathbf u (t_k)$$\n",
     "\n",
@@ -340,7 +340,7 @@
     "\n",
     "You will recognize this as the matrix we derived analytically for the constant velocity Kalman filter in the **Multivariate Kalman Filter** chapter.\n",
     "\n",
-    "SciPy's linalg module includes a routine `expm()` to compute the matrix exponential. It does not use the Taylor series method, but the [Padé Approximation](https://en.wikipedia.org/wiki/Pad%C3%A9_approximant). There are many (at least 19) methods to computed the matrix exponential, and all suffer from numerical difficulties[1]. You should be aware of the problems, especially when $\\mathbf A$ is large. If you search for \"pade approximation matrix exponential\" you will find many publications devoted to this problem. \n",
+    "SciPy's linalg module includes a routine `expm()` to compute the matrix exponential. It does not use the Taylor series method, but the [Padé Approximation](https://en.wikipedia.org/wiki/Pad%C3%A9_approximant). There are many (at least 19) methods to compute the matrix exponential, and all suffer from numerical difficulties[1]. You should be aware of the problems, especially when $\\mathbf A$ is large. If you search for \"pade approximation matrix exponential\" you will find many publications devoted to this problem. \n",
     "\n",
-    "In practice this may not be of concern to you as for the Kalman filter we normally just take the first two terms of the Taylor series. But don't assume my treatment of the problem is complete and run off and try to use this technique for other problem without doing a numerical analysis of the performance of this technique. Interestingly, one of the favored ways of solving $e^{\\mathbf At}$ is to use a generalized ode solver. In other words, they do the opposite of what we do - turn $\\mathbf A$ into a set of differential equations, and then solve that set using numerical techniques! \n",
+    "In practice this may not be of concern to you as for the Kalman filter we normally just take the first two terms of the Taylor series. But don't assume my treatment of the problem is complete and run off and try to use this technique for other problems without doing a numerical analysis of its performance. Interestingly, one of the favored ways of solving $e^{\\mathbf At}$ is to use a generalized ODE solver. In other words, they do the opposite of what we do - turn $\\mathbf A$ into a set of differential equations, and then solve that set using numerical techniques! \n",
     "\n",
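To see the `expm()` discussion above in practice, here is a minimal sketch for the constant velocity model the text refers to. The `dt` value is an arbitrary illustrative choice, and the two-term Taylor series is exact here only because this particular $\mathbf A$ is nilpotent:

```python
import numpy as np
from scipy.linalg import expm

dt = 1.0
# system matrix for the constant velocity model: dx/dt = A x
A = np.array([[0., 1.],
              [0., 0.]])

# state transition matrix from SciPy's Pade-based matrix exponential
F_expm = expm(A * dt)

# first two terms of the Taylor series: F = I + A*dt
# exact in this case because A @ A = 0, so all higher terms vanish
F_taylor = np.eye(2) + A * dt

print(F_expm)                         # [[1. 1.]
                                      #  [0. 1.]]
print(np.allclose(F_expm, F_taylor))  # True
```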
diff --git a/08-Designing-Kalman-Filters.ipynb b/08-Designing-Kalman-Filters.ipynb
index 3bb7ae2..d83e66c 100644
--- a/08-Designing-Kalman-Filters.ipynb
+++ b/08-Designing-Kalman-Filters.ipynb
@@ -2117,7 +2117,7 @@
     " \n",
     "### Normalized Estimated Error Squared (NEES)\n",
     "\n",
-    "The first method is the most powerful, but only possible in simulations. If you are simulating a system you know its true value, or 'ground truth'. It is then trival to compute the error of the system at any step as the difference between ground truth ($\\mathbf x$) and the filter's state estimate ($\\hat{\\mathbf x}$):\n",
+    "The first method is the most powerful, but only possible in simulations. If you are simulating a system, you know its true value, or 'ground truth'. It is then trivial to compute the error of the system at any step as the difference between ground truth ($\\mathbf x$) and the filter's state estimate ($\\hat{\\mathbf x}$):\n",
     "\n",
     "$$\\tilde{\\mathbf x} = \\mathbf x - \\hat{\\mathbf x}$$\n",
     "\n",
@@ -2195,7 +2195,7 @@
     "```python\n",
     "from filterpy.stats import NEES\n",
     "```\n",
-    "This is an excellent measure of the filter's performance, and should be used whenever possible, especially in production code when you need to evaluate the filter while it is running. While designing a filter I still prefer plotting the residauls as it gives me a more intuitive understand what is happening.\n",
+    "This is an excellent measure of the filter's performance, and should be used whenever possible, especially in production code when you need to evaluate the filter while it is running. While designing a filter I still prefer plotting the residuals as it gives me a more intuitive understanding of what is happening.\n",
     "\n",
     "However, if your simulation is of limited fidelity then you need to use another approach."
    ]
@@ -2234,7 +2234,7 @@
     "likelihood = multivariate_normal.pdf(z.flatten(), mean=hx, cov=S)\n",
     "```\n",
     "\n",
-    "In practice it happens a bit differently. Likelihoods can be difficult to deal with mathmatically. It is common to compute and use the *log-likelihood* instead, which is just the natural log of the likelihood. This has several benefits. First, the log is strictly increasing, and it reaches it's maximum value at the same point of the function it is applied to. If you want to find the maximum of a function you normally take the derivative of it; it can be difficult to find the derivative of some arbitrary function, but finding $\\frac{d}{dx} log(f(x))$ is trivial, and the result is the same as $\\frac{d}{dx} f(x)$. We don't use this property in this book, but it is essential when performing analysis on filters.\n",
+    "In practice it happens a bit differently. Likelihoods can be difficult to deal with mathematically. It is common to compute and use the *log-likelihood* instead, which is just the natural log of the likelihood. This has several benefits. First, the log is strictly increasing, and it reaches its maximum value at the same point as the function it is applied to. If you want to find the maximum of a function you normally take the derivative of it; it can be difficult to find the derivative of some arbitrary function, but finding $\\frac{d}{dx} log(f(x))$ is trivial, and it is zero at exactly the same points as $\\frac{d}{dx} f(x)$. We don't use this property in this book, but it is essential when performing analysis on filters.\n",
     "\n",
-    "The likelihood and log-likelihood are computed for you when `update()` is called, and is accessible via the 'log_likelihood' and `likelihood` data attribute. Let's look at this: I'll run the filter with several measurements within expected range, and then inject measurements far from the expected values:"
+    "The likelihood and log-likelihood are computed for you when `update()` is called, and are accessible via the `log_likelihood` and `likelihood` data attributes. Let's look at this: I'll run the filter with several measurements within expected range, and then inject measurements far from the expected values:"
    ]
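As a rough sketch of the two measures discussed above — with made-up values for the state, covariance, and residual, since none of these come from the notebooks — NEES folds the covariance into the state error, and `logpdf()` computes the log-likelihood directly without ever forming the (possibly underflowing) likelihood:

```python
import numpy as np
from scipy.stats import multivariate_normal

def nees(x_true, x_est, P):
    # normalized estimated error squared: error^T inv(P) error
    err = x_true - x_est
    return err @ np.linalg.inv(P) @ err

x_true = np.array([10.0, 1.0])   # ground truth from a simulation
x_est = np.array([10.2, 0.9])    # filter's state estimate
P = np.diag([0.5, 0.1])          # filter's covariance
print(nees(x_true, x_est, P))    # 0.18

# log-likelihood of a residual y given innovation covariance S
y = np.array([0.5, -0.2])
S = np.diag([1.2, 0.9])
print(multivariate_normal.logpdf(y, mean=np.zeros(2), cov=S))
```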
diff --git a/09-Nonlinear-Filtering.ipynb b/09-Nonlinear-Filtering.ipynb
index 2811be7..3da1c64 100644
--- a/09-Nonlinear-Filtering.ipynb
+++ b/09-Nonlinear-Filtering.ipynb
@@ -104,7 +104,7 @@
     "* homogeneity: $f(ax) = af(x)$\n",
     "\n",
     "\n",
-    "This leads us to say that a linear system is defined as a system whose output is linearly proportional to the sum of all its inputs. A consequence of this is that to be linear if the input is zero than the output must also be zero. Consider an audio amp - if I sing into a microphone, and you start talking, the output should be the sum of our voices (input) scaled by the amplifier gain. But if the amplifier outputs a nonzero signal such as a hum for a zero input the additive relationship no longer holds. This is because linearity requires that $amp(voice) = amp(voice + 0)$. This clearly should give the same output, but if amp(0) is nonzero, then\n",
+    "This leads us to say that a linear system is defined as a system whose output is linearly proportional to the sum of all its inputs. A consequence of this is that, for a system to be linear, if the input is zero then the output must also be zero. Consider an audio amp - if I sing into a microphone, and you start talking, the output should be the sum of our voices (input) scaled by the amplifier gain. But if the amplifier outputs a nonzero signal such as a hum for a zero input the additive relationship no longer holds. This is because linearity requires that $amp(voice) = amp(voice + 0)$. This clearly should give the same output, but if amp(0) is nonzero, then\n",
     "\n",
     "$$\n",
     "\\begin{aligned}\n",
@@ -640,7 +640,7 @@
     "\n",
     "The workhorses of nonlinear filters are the *linearized Kalman filter* and *extended Kalman filter* (EKF). These two techniques were invented shortly after Kalman published his paper and they have been the main techniques used since then. The flight software in airplanes, the GPS in your car or phone almost certainly use one of these techniques. \n",
     "\n",
-    "However, these techniques are extremely demanding. The EKF linearizes the differential equations at one point, which requires you to find a solution to a matrix of partial derivatives (a Jacobian). This can be difficult or impossible to do analytically. If impossible, you have to use numerical techniques to find the Jacobian, but this is expensive computationally and introduces more error into the system. Finally, if the problem is quite nonlinear the linearization leads to a lot of error being introduced in each step, and the filters frequently diverge. You cannot throw some equations into some arbitrary solver and expect to to get good results. It's a difficult field for professionals. I note that most Kalman filtering textbooks merely gloss over the EKF despite it being the most frequently used technique in real world applications. \n",
+    "However, these techniques are extremely demanding. The EKF linearizes the differential equations at one point, which requires you to find a solution to a matrix of partial derivatives (a Jacobian). This can be difficult or impossible to do analytically. If impossible, you have to use numerical techniques to find the Jacobian, but this is expensive computationally and introduces more error into the system. Finally, if the problem is quite nonlinear the linearization leads to a lot of error being introduced in each step, and the filters frequently diverge. You cannot throw some equations into some arbitrary solver and expect to get good results. It's a difficult field for professionals. I note that most Kalman filtering textbooks merely gloss over the EKF despite it being the most frequently used technique in real world applications. \n",
     "\n",
     "Recently the field has been changing in exciting ways. First, computing power has grown to the point that we can use techniques that were once beyond the ability of a supercomputer. These use *Monte Carlo* techniques - the computer generates thousands to tens of thousands of random points and tests all of them against the measurements. It then probabilistically kills or duplicates points based on how well they match the measurements. A point far away from the measurement is unlikely to be retained, whereas a point very close is quite likely to be retained. After a few iterations there is a clump of particles closely tracking your object, and a sparse cloud of points where there is no object.\n",
     "\n",
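The fallback to numerical techniques mentioned above can be as simple as a central-difference approximation. This is a sketch of that idea under stated assumptions, not the book's implementation; the example function is an illustrative range measurement:

```python
import numpy as np

def numerical_jacobian(f, x, eps=1e-6):
    # approximate the Jacobian of f at x with central differences
    x = np.asarray(x, dtype=float)
    m = np.atleast_1d(f(x)).size
    J = np.zeros((m, x.size))
    for i in range(x.size):
        dx = np.zeros_like(x)
        dx[i] = eps
        J[:, i] = (np.atleast_1d(f(x + dx)) -
                   np.atleast_1d(f(x - dx))) / (2 * eps)
    return J

# range from the origin to a point at (3, 4)
f = lambda x: np.sqrt(x[0]**2 + x[1]**2)
print(numerical_jacobian(f, [3., 4.]))   # approximately [[0.6 0.8]]
```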
diff --git a/10-Unscented-Kalman-Filter.ipynb b/10-Unscented-Kalman-Filter.ipynb
index 9daa335..8feb52c 100644
--- a/10-Unscented-Kalman-Filter.ipynb
+++ b/10-Unscented-Kalman-Filter.ipynb
@@ -532,7 +532,7 @@
     "\\end{aligned}\n",
     "$$\n",
     "\n",
-    "These equations should be familar - they are the constraint equations we developed above. \n",
+    "These equations should be familiar - they are the constraint equations we developed above. \n",
     "\n",
-    "In short, the unscented transform takes points sampled from some arbitary probability distribution, passes them through an arbitrary, nonlinear function and produces a Gaussian for each transformed points. I hope you can envision how we can use this to implement a nonlinear Kalman filter. Once we have Gaussians all of the mathematical apparatus we have already developed comes into play!\n",
+    "In short, the unscented transform takes points sampled from some arbitrary probability distribution, passes them through an arbitrary, nonlinear function and produces a Gaussian from the transformed points. I hope you can envision how we can use this to implement a nonlinear Kalman filter. Once we have Gaussians, all of the mathematical apparatus we have already developed comes into play!\n",
     "\n",
@@ -2305,7 +2305,7 @@
    "source": [
     "## Batch Processing\n",
     "\n",
-    "The Kalman filter is recursive - estimates are based on the currernt measurement and prior estimate. But it is very common to have a set of data that have been already collected which we want to filter. In this case the filter can be run in a *batch* mode, where all of the measurements are filtered at once.\n",
+    "The Kalman filter is recursive - estimates are based on the current measurement and prior estimate. But it is very common to have a set of data that has already been collected which we want to filter. In this case the filter can be run in a *batch* mode, where all of the measurements are filtered at once.\n",
     "\n",
     "Collect your measurements into an array or list.\n",
     "\n",
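To make batch mode concrete, here is a hand-rolled sketch assuming a filter object with the `predict()`/`update()` interface used throughout the book; FilterPy's filters also provide a `batch_filter()` method that packages this loop for you:

```python
import numpy as np

def run_batch(kf, zs):
    # filter an already-collected sequence of measurements in one pass
    means, covariances = [], []
    for z in zs:
        kf.predict()
        kf.update(z)
        means.append(kf.x.copy())
        covariances.append(kf.P.copy())
    return np.array(means), np.array(covariances)
```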
diff --git a/11-Extended-Kalman-Filters.ipynb b/11-Extended-Kalman-Filters.ipynb
index bbc4146..59d28d2 100644
--- a/11-Extended-Kalman-Filters.ipynb
+++ b/11-Extended-Kalman-Filters.ipynb
@@ -225,7 +225,7 @@
    "source": [
     "### Design the State Variables\n",
     "\n",
-    "We want to track the position of an aircraft assuming a constant velocity and altitude, and measurements of the slant distance to the aircraft. That means we need 3 state variables - horizontal distance, horizonal velocity, and altitude:\n",
+    "We want to track the position of an aircraft assuming a constant velocity and altitude, and measurements of the slant distance to the aircraft. That means we need 3 state variables - horizontal distance, horizontal velocity, and altitude:\n",
     "\n",
     "$$\\mathbf x = \\begin{bmatrix}\\mathtt{distance} \\\\\\mathtt{velocity}\\\\ \\mathtt{altitude}\\end{bmatrix}= \\begin{bmatrix}x \\\\ \\dot x\\\\ y\\end{bmatrix}$$"
    ]
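For this state design the slant distance is nonlinear in the state, which is what forces the EKF machinery in the first place. Here is a sketch of the measurement function and its hand-derived Jacobian; the function names are illustrative, not the notebook's:

```python
import numpy as np

def h_radar(x):
    # slant range from a radar at the origin to the aircraft;
    # state is [horizontal distance, horizontal velocity, altitude]
    dist, _, alt = x
    return np.sqrt(dist**2 + alt**2)

def H_jacobian(x):
    # partials of the slant range with respect to each state variable
    dist, _, alt = x
    slant = np.sqrt(dist**2 + alt**2)
    return np.array([[dist / slant, 0., alt / slant]])

print(h_radar([300., 100., 400.]))      # 500.0
print(H_jacobian([300., 100., 400.]))   # [[0.6 0.  0.8]]
```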