em dash; sentence case

This commit is contained in:
jverzani
2025-07-27 15:26:00 -04:00
parent c3b221cd29
commit 33c6e62d68
59 changed files with 385 additions and 243 deletions

View File

@@ -1,4 +1,4 @@
# Matrix Calculus
# Matrix calculus
This section illustrates a more general setting for taking derivatives, that unifies the different expositions taken prior.
@@ -74,7 +74,7 @@ Additionally, many other set of objects form vector spaces. Certain families of
Let's take differentiable functions as an example. These form a vector space as the derivative of a linear combination of differentiable functions is defined through the simplest derivative rule: $[af(x) + bg(x)]' = a[f(x)]' + b[g(x)]'$. If $f$ and $g$ are differentiable, then so is $af(x)+bg(x)$.
A finite vector space is described by a *basis* -- a minimal set of vectors needed to describe the space, after consideration of linear combinations. For some typical vector spaces, this is the set of special vectors with $1$ as one of the entries, and $0$ otherwise.
A finite vector space is described by a *basis*---a minimal set of vectors needed to describe the space, after consideration of linear combinations. For some typical vector spaces, this is the set of special vectors with $1$ as one of the entries, and $0$ otherwise.
A key fact about a basis for a finite vector space is every vector in the vector space can be expressed *uniquely* as a linear combination of the basis vectors. The set of numbers used in the linear combination, along with an order to the basis, means an element in a finite vector space can be associated with a unique coordinate vector.
@@ -88,7 +88,7 @@ Vectors and matrices have properties that are generalizations of the real number
* Viewing a vector as a matrix is possible. The association chosen here is common and is through a *column* vector.
* The *transpose* of a matrix comes by permuting the rows and columns. The transpose of a column vector is a row vector, so $v\cdot w = v^T w$, where we use a superscript $T$ for the transpose. The transpose of a product, is the product of the transposes -- reversed: $(AB)^T = B^T A^T$; the tranpose of a transpose is an identity operation: $(A^T)^T = A$; the inverse of a transpose is the tranpose of the inverse: $(A^{-1})^T = (A^T)^{-1}$.
* The *transpose* of a matrix comes by permuting the rows and columns. The transpose of a column vector is a row vector, so $v\cdot w = v^T w$, where we use a superscript $T$ for the transpose. The transpose of a product is the product of the transposes---reversed: $(AB)^T = B^T A^T$; the transpose of a transpose is an identity operation: $(A^T)^T = A$; the inverse of a transpose is the transpose of the inverse: $(A^{-1})^T = (A^T)^{-1}$.
* Matrices for which $A = A^T$ are called symmetric.
@@ -231,7 +231,7 @@ Various differentiation rules are still available such as the sum, product, and
### Sum and product rules for the derivative
Using the differential notation -- which implicitly ignores higher order terms as they vanish in a limit -- the sum and product rules can be derived.
Using the differential notation---which implicitly ignores higher order terms as they vanish in a limit---the sum and product rules can be derived.
For the sum rule, let $f(x) = g(x) + h(x)$. Then
@@ -377,7 +377,7 @@ Multiplying left to right (the first) is called reverse mode; multiplying right
The reason comes down to the shape of the matrices. To see why, we need to know that multiplying an $m \times q$ matrix by a $q \times n$ matrix takes on the order of $mqn$ operations.
When $m=1$, the derviative is a product of matrices of size $n\times j$, $j\times k$, and $k \times 1$ yielding a matrix of size $n \times 1$ matching the function dimension.
When $m=1$, the derivative is a product of matrices of size $n\times j$, $j\times k$, and $k \times 1$ yielding a matrix of size $n \times 1$ matching the function dimension.
The operations involved in multiplying from left to right can be quantified. The first multiplication takes $njk$ operations, leaving an $n\times k$ matrix; the next multiplication then takes another $nk \cdot 1$ operations, or $njk + nk$ together.
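These counts can be tallied directly. A minimal sketch, with illustrative sizes, comparing the two orders of association:

```{julia}
# operation counts for the product of n×j, j×k, and k×1 matrices,
# using the rule that an m×q by q×n multiplication takes m*q*n operations
n, j, k = 100, 100, 100
left_to_right = n*j*k + n*k*1   # (n×j times j×k) first, then times k×1
right_to_left = j*k*1 + n*j*1   # (j×k times k×1) first, then n×j times that
(left_to_right, right_to_left)
```

The two counts differ dramatically, quantifying why the order of association matters when the matrix shapes are lopsided.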
@@ -435,7 +435,7 @@ That is $f'(A)$ is the operator $f'(A)[\delta A] = A \delta A + \delta A A$. (Th
Alternatively, we can identify $A$ through its
components, as a vector in $R^{n^2}$ and then leverage the Jacobian.
One such identification is vectorization -- consecutively stacking the
One such identification is vectorization---consecutively stacking the
column vectors into a single vector. In `Julia` the `vec` function does this
operation:
@@ -444,7 +444,7 @@ operation:
vec(A)
```
The stacking by column follows how `Julia` stores matrices and how `Julia` references a matrices entries by linear index:
The stacking by column follows how `Julia` stores matrices and how `Julia` references entries in a matrix by linear index:
```{julia}
vec(A) == [A[i] for i in eachindex(A)]
@@ -562,7 +562,7 @@ all(l == r for (l, r) ∈ zip(L, R))
----
Now to use this relationship to recognize $df = A dA + dA A$ with the Jacobian computed from $\text{vec}{f(a)}$.
Now to use this relationship to recognize $df = A dA + dA A$ with the Jacobian computed from $\text{vec}(f(a))$.
We have $\text{vec}(A dA + dA A) = \text{vec}(A dA) + \text{vec}(dA A)$, by the linearity of $\text{vec}$. Now inserting an identity matrix, $I$, which is symmetric, in a useful spot we have:
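This manipulation is the vectorization identity $\text{vec}(AXB) = (B^T \otimes A)\,\text{vec}(X)$ in disguise: $\text{vec}(A\, dA\, I) = (I \otimes A)\,\text{vec}(dA)$ and $\text{vec}(I\, dA\, A) = (A^T \otimes I)\,\text{vec}(dA)$. A minimal numerical sketch, with an arbitrary $2\times 2$ example, checks that $I \otimes A + A^T \otimes I$ acts as the Jacobian on $\text{vec}(dA)$:

```{julia}
using LinearAlgebra
A  = [1.0 2.0; 3.0 4.0]
dA = [0.5 -1.0; 0.25 2.0]
Id = Matrix(1.0I, 2, 2)
# vec(A*dA*I) = kron(Id, A) * vec(dA); vec(I*dA*A) = kron(A', Id) * vec(dA)
J = kron(Id, A) + kron(A', Id)
vec(A*dA + dA*A) ≈ J * vec(dA)   # true
```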
@@ -683,7 +683,7 @@ det(I + dA) - det(I)
## The adjoint method
The chain rule brings about a series of products. The adjoint method illustrated below, shows how to approach the computation of the series in a direction that minimizes the computational cost, illustrating why reverse mode is preferred to forward mode when a scalar function of several variables is considered.
The chain rule brings about a series of products. The adjoint method, illustrated by @BrightEdelmanJohnson and summarized below, shows how to approach the computation of the series in a direction that minimizes the computational cost, illustrating why reverse mode is preferred to forward mode when a scalar function of several variables is considered.
@BrightEdelmanJohnson consider the derivative of
@@ -778,9 +778,9 @@ Here $v$ can be solved for by taking adjoints (as before). Let $A = \partial h/\
## Second derivatives, Hessian
@CarlssonNikitinTroedssonWendt
We reference a theorem presented by [Carlsson, Nikitin, Troedsson, and Wendt](https://arxiv.org/pdf/2502.03070v1) for exposition with some modification
We reference a theorem presented by @CarlssonNikitinTroedssonWendt for exposition with some modification
::: {.callout-note appearance="minimal"}
Theorem 1. Let $f:X \rightarrow Y$, where $X,Y$ are finite dimensional *inner product* spaces with elements in $R$. Suppose $f$ is smooth (has a sufficient number of continuous derivatives). Then for each $x$ in $X$ there exists a unique linear operator, $f'(x)$, and a unique *bilinear* *symmetric* operator $f'': X \oplus X \rightarrow Y$ such that
@@ -804,7 +804,7 @@ $$
\begin{align*}
f(x + dx) &= f(x) +
\frac{\partial f}{\partial x_1} dx_1 + \frac{\partial f}{\partial x_2} dx_2\\
&+ \frac{1}{2}\left(
&{+} \frac{1}{2}\left(
\frac{\partial^2 f}{\partial x_1^2}dx_1^2 +
2\frac{\partial^2 f}{\partial x_1 \partial x_2}dx_1dx_2 +
\frac{\partial^2 f}{\partial x_2^2}dx_2^2
@@ -832,7 +832,7 @@ $$
$H$ being the *Hessian* with entries $H_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j}$.
This formula -- $f(x+dx)-f(x) \approx f'(x)dx + dx^T H dx$ -- is valid for any $n$, showing $n=2$ was just for ease of notation when expressing in the coordinates and not as matrices.
This formula---$f(x+dx)-f(x) \approx f'(x)dx + \frac{1}{2} dx^T H dx$---is valid for any $n$, showing $n=2$ was just for ease of notation when expressing in the coordinates and not as matrices.
By uniqueness, we have under these assumptions that the Hessian is *symmetric* and the expression $dx^T H dx$ is a *bilinear* form, which we can identify as $f''(x)[dx,dx]$.
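The symmetry can be seen numerically. A minimal sketch, using a central-difference approximation of the Hessian (the function and step size are illustrative choices):

```{julia}
# central-difference approximation of H_ij = ∂²f/∂xᵢ∂xⱼ
function hessian_fd(f, x; h = 1e-4)
    n = length(x)
    H = zeros(n, n)
    for i in 1:n, j in 1:n
        eᵢ = zeros(n); eᵢ[i] = h
        eⱼ = zeros(n); eⱼ[j] = h
        H[i, j] = (f(x + eᵢ + eⱼ) - f(x + eᵢ - eⱼ) -
                   f(x - eᵢ + eⱼ) + f(x - eᵢ - eⱼ)) / (4h^2)
    end
    H
end

g(v) = v[1]^2 * v[2] + v[2]^3
H = hessian_fd(g, [1.0, 2.0])   # ≈ [4 2; 2 12]; note H ≈ H'
```

For this smooth $g$ the mixed partials agree, so the approximated matrix is (numerically) symmetric, as the theorem guarantees.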
@@ -909,24 +909,23 @@ $$
&= \left(
\text{det}(A) + \text{det}(A)\text{tr}(A^{-1}dA')
\right)
\text{tr}((A^{-1} - A^{-1}dA' A^{-1})dA) - \text{det}(A) \text{tr}(A^{-1}dA) \\
\text{tr}((A^{-1} - A^{-1}dA' A^{-1})dA)\\
&\quad{-} \text{det}(A) \text{tr}(A^{-1}dA) \\
&=
\text{det}(A) \text{tr}(A^{-1}dA)\\
&+ \text{det}(A)\text{tr}(A^{-1}dA')\text{tr}(A^{-1}dA) \\
&- \text{det}(A)\text{tr}(A^{-1}dA' A^{-1}dA)\\
&- \text{det}(A)\text{tr}(A^{-1}dA')\text{tr}(A^{-1}dA' A^{-1}dA)\\
&- \text{det}(A) \text{tr}(A^{-1}dA) \\
\textcolor{blue}{\text{det}(A) \text{tr}(A^{-1}dA)}\\
&\quad{+} \text{det}(A)\text{tr}(A^{-1}dA')\text{tr}(A^{-1}dA) \\
&\quad{-} \text{det}(A)\text{tr}(A^{-1}dA' A^{-1}dA)\\
&\quad{-} \textcolor{red}{\text{det}(A)\text{tr}(A^{-1}dA')\text{tr}(A^{-1}dA' A^{-1}dA)}\\
&\quad{-} \textcolor{blue}{\text{det}(A) \text{tr}(A^{-1}dA)} \\
&= \text{det}(A)\text{tr}(A^{-1}dA')\text{tr}(A^{-1}dA) - \text{det}(A)\text{tr}(A^{-1}dA' A^{-1}dA)\\
&+ \text{third order term}
&\quad{+} \textcolor{red}{\text{third order term}}
\end{align*}
$$
So, after dropping the third-order term, we see:
$$
\begin{align*}
f''(A)[dA,dA']
&= \text{det}(A)\text{tr}(A^{-1}dA')\text{tr}(A^{-1}dA)\\
&\quad - \text{det}(A)\text{tr}(A^{-1}dA' A^{-1}dA).
\end{align*}
= \text{det}(A)\text{tr}(A^{-1}dA')\text{tr}(A^{-1}dA) -
\text{det}(A)\text{tr}(A^{-1}dA' A^{-1}dA).
$$
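This bilinear form can be sanity checked with a mixed finite difference in two scalar directions, $s$ and $t$; the matrices below are arbitrary choices for illustration:

```{julia}
using LinearAlgebra
A   = [2.0 1.0; 1.0 3.0]
dA  = [0.0 1.0; 1.0 0.0]
dA2 = [1.0 2.0; 0.0 1.0]    # plays the role of dA′

g(s, t) = det(A + s*dA + t*dA2)
h = 1e-3
# mixed second partial ∂²g/∂s∂t at (0,0) approximates f''(A)[dA, dA′]
mixed = (g(h,h) - g(h,-h) - g(-h,h) + g(-h,-h)) / (4h^2)
formula = det(A) * (tr(A\dA2) * tr(A\dA) - tr((A\dA2) * (A\dA)))
mixed ≈ formula   # true
```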

View File

@@ -1,4 +1,4 @@
# Polar Coordinates and Curves
# Polar coordinates and curves
{{< include ../_common_code.qmd >}}
@@ -226,7 +226,7 @@ The folium has radial part $0$ when $\cos(\theta) = 0$ or $\sin(2\theta) = b/4a$
plot_polar(𝒂0..(pi/2-𝒂0), 𝒓)
```
The second - which is too small to appear in the initial plot without zooming in - with
The second---which is too small to appear in the initial plot without zooming in---with
```{julia}

View File

@@ -388,7 +388,7 @@ For a scalar function, Define a *level curve* as the solutions to the equations
contour(xsₛ, ysₛ, zzsₛ)
```
Were one to walk along one of the contour lines, then there would be no change in elevation. The areas of greatest change in elevation - basically the hills - occur where the different contour lines are closest. In this particular area, there is a river that runs from the upper right through to the lower left and this is flanked by hills.
Were one to walk along one of the contour lines, then there would be no change in elevation. The areas of greatest change in elevation---basically the hills---occur where the different contour lines are closest. In this particular area, there is a river that runs from the upper right through to the lower left and this is flanked by hills.
The $c$ values for the levels drawn may be specified through the `levels` argument:
@@ -636,7 +636,7 @@ This says, informally, for any scale about $L$ there is a "ball" about $C$ (not
In the univariate case, it can be useful to characterize a limit at $x=c$ existing if *both* the left and right limits exist and the two are equal. Generalizing to getting close in $R^m$ leads to the intuitive idea of a limit existing in terms of paths: any continuous "path" that approaches $C$ in the $x$-$y$ plane yields a limit, and all such limits are equal. Let $\gamma$ describe the path, and $\lim_{s \rightarrow t}\gamma(s) = C$. Then $f \circ \gamma$ will be a univariate function. If there is a limit, $L$, then this composition will also have the same limit as $s \rightarrow t$. Conversely, if for *every* path this composition has the *same* limit, then $f$ will have a limit.
The "two path corollary" is a trick to show a limit does not exist - just find two paths where there is a limit, but they differ, then a limit does not exist in general.
The "two path corollary" is a trick to show a limit does not exist---just find two paths where there is a limit, but they differ, then a limit does not exist in general.
### Continuity of scalar functions
@@ -997,7 +997,7 @@ The figure suggests a potential geometric relationship between the gradient and
We see here how the gradient of $f$, $\nabla{f} = \langle f_{x_1}, f_{x_2}, \dots, f_{x_n} \rangle$, plays a similar role as the derivative does for univariate functions.
First, we consider the role of the derivative for univariate functions. The main characterization - the derivative is the slope of the line that best approximates the function at a point - is quantified by Taylor's theorem. For a function $f$ with a continuous second derivative:
First, we consider the role of the derivative for univariate functions. The main characterization---the derivative is the slope of the line that best approximates the function at a point---is quantified by Taylor's theorem. For a function $f$ with a continuous second derivative:
$$
@@ -1174,7 +1174,7 @@ atand(mean(slopes))
This seems about right for a generally uphill trail section, as this is.
In the above example, the data is given in terms of a sample, not a functional representation. Suppose instead, the surface was generated by `f` and the path - in the $x$-$y$ plane - by $\gamma$. Then we could estimate the maximum and average steepness by a process like this:
In the above example, the data is given in terms of a sample, not a functional representation. Suppose instead, the surface was generated by `f` and the path---in the $x$-$y$ plane---by $\gamma$. Then we could estimate the maximum and average steepness by a process like this:
```{julia}

View File

@@ -918,7 +918,7 @@ zs = fₗ.(xs, ys)
scatter3d!(xs, ys, zs)
```
A contour plot also shows that some - and only one - extrema happens on the interior:
A contour plot also shows that an extremum---and only one---happens on the interior:
```{julia}
@@ -967,10 +967,10 @@ We confirm this by looking at the Hessian and noting $H_{11} > 0$:
Hₛ = subs.(hessian(exₛ, [x,y]), x=>xstarₛ[x], y=>xstarₛ[y])
```
As it occurs at $(\bar{x}, \bar{y})$ where $\bar{x} = (x_1 + x_2 + x_3)/3$ and $\bar{y} = (y_1+y_2+y_3)/3$ - the averages of the three values - the critical point is an interior point of the triangle.
As it occurs at $(\bar{x}, \bar{y})$ where $\bar{x} = (x_1 + x_2 + x_3)/3$ and $\bar{y} = (y_1+y_2+y_3)/3$---the averages of the three values---the critical point is an interior point of the triangle.
As mentioned by Strang, the real problem is to minimize $d_1 + d_2 + d_3$. A direct approach with `SymPy` - just replacing `d2` above with the square root fails. Consider instead the gradient of $d_1$, say. To avoid square roots, this is taken implicitly from $d_1^2$:
As mentioned by Strang, the real problem is to minimize $d_1 + d_2 + d_3$. A direct approach with `SymPy`---just replacing `d2` above with the square root---fails. Consider instead the gradient of $d_1$, say. To avoid square roots, this is taken implicitly from $d_1^2$:
$$
@@ -1016,7 +1016,7 @@ psₛₗ = [a*u for (a,u) in zip(asₛ₁, usₛ)]
plot!(polygon(psₛₗ)...)
```
Let's see where the minimum distance point is by constructing a plot. The minimum must be on the boundary, as the only point where the gradient vanishes is the origin, not in the triangle. The plot of the triangle has a contour plot of the distance function, so we see clearly that the minimum happens at the point `[0.5, -0.866025]`. On this plot, we drew the gradient at some points along the boundary. The gradient points in the direction of greatest increase - away from the minimum. That the gradient vectors have a non-zero projection onto the edges of the triangle in a direction pointing away from the point indicates that the function `d` would increase if moved along the boundary in that direction, as indeed it does.
Let's see where the minimum distance point is by constructing a plot. The minimum must be on the boundary, as the only point where the gradient vanishes is the origin, not in the triangle. The plot of the triangle has a contour plot of the distance function, so we see clearly that the minimum happens at the point `[0.5, -0.866025]`. On this plot, we drew the gradient at some points along the boundary. The gradient points in the direction of greatest increase---away from the minimum. That the gradient vectors have a non-zero projection onto the edges of the triangle in a direction pointing away from the point indicates that the function `d` would increase if moved along the boundary in that direction, as indeed it does.
```{julia}
@@ -1064,7 +1064,7 @@ The smallest value is when $t=0$ or $t=1$, so at one of the points, as `li` is d
##### Example: least squares
We know that two points determine a line. What happens when there are more than two points? This is common in statistics where a bivariate data set (pairs of points $(x,y)$) are summarized through a linear model $\mu_{y|x} = \alpha + \beta x$, That is the average value for $y$ given a particular $x$ value is given through the equation of a line. The data is used to identify what the slope and intercept are for this line. We consider a simple case - $3$ points. The case of $n \geq 3$ being similar.
We know that two points determine a line. What happens when there are more than two points? This is common in statistics where a bivariate data set (pairs of points $(x,y)$) is summarized through a linear model $\mu_{y|x} = \alpha + \beta x$. That is, the average value for $y$ given a particular $x$ value is given through the equation of a line. The data is used to identify what the slope and intercept are for this line. We consider a simple case---$3$ points. The case of $n \geq 3$ is similar.
We have a line $l(x) = \alpha + \beta x$ and three points $(x_1, y_1)$, $(x_2, y_2)$, and $(x_3, y_3)$. Unless these three points *happen* to be collinear, they can't possibly all lie on the same line. So to *approximate* a relationship by a line requires some inexactness. One measure of inexactness is the *vertical* distance to the line:
@@ -1118,7 +1118,7 @@ As found, the formulas aren't pretty. If $x_1 + x_2 + x_3 = 0$ they simplify. Fo
subs(outₗₛ[β], sum(xₗₛ) => 0)
```
Let $\vec{x} = \langle x_1, x_2, x_3 \rangle$ and $\vec{y} = \langle y_1, y_2, y_3 \rangle$ this is simply $(\vec{x} \cdot \vec{y})/(\vec{x}\cdot \vec{x})$, a formula that will generalize to $n > 3$. The assumption is not a restriction - it comes about by subtracting the mean, $\bar{x} = (x_1 + x_2 + x_3)/3$, from each $x$ term (and similarly subtract $\bar{y}$ from each $y$ term). A process called "centering."
With $\vec{x} = \langle x_1, x_2, x_3 \rangle$ and $\vec{y} = \langle y_1, y_2, y_3 \rangle$, this is simply $(\vec{x} \cdot \vec{y})/(\vec{x}\cdot \vec{x})$, a formula that will generalize to $n > 3$. The assumption is not a restriction---it comes about by subtracting the mean, $\bar{x} = (x_1 + x_2 + x_3)/3$, from each $x$ term (and similarly subtracting $\bar{y}$ from each $y$ term), a process called "centering."
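A minimal numerical sketch (the data is made up for illustration) checks the centered formula against a direct least-squares solve:

```{julia}
xs = [1.0, 2.0, 4.0]
ys = [1.0, 3.0, 4.0]
xc = xs .- sum(xs)/3                 # center the x values
yc = ys .- sum(ys)/3                 # center the y values
β = sum(xc .* yc) / sum(xc .* xc)    # (x⋅y)/(x⋅x) after centering
α = sum(ys)/3 - β * sum(xs)/3
[α, β] ≈ [ones(3) xs] \ ys           # matches the least-squares line
```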
With this observation, the formulas can be re-expressed through:
@@ -1587,7 +1587,7 @@ $$
G(\epsilon_1, \epsilon_2) = L.
$$
Now, Lagrange's method can be employed. This will be fruitful - even though we know the answer - it being $\epsilon_1 = \epsilon_2 = 0$!
Now, Lagrange's method can be employed. This will be fruitful---even though we know the answer---it being $\epsilon_1 = \epsilon_2 = 0$!
Forging ahead, we compute $\nabla{F}$ and $\lambda \nabla{G}$ and set $\epsilon_1 = \epsilon_2 = 0$ where the two are equal. This will lead to a description of $y$ in terms of $y'$.

View File

@@ -111,7 +111,7 @@ Plot of a vector field from $R^2 \rightarrow R^2$ illustrated by drawing curves
To the plot, we added the partial derivatives with respect to $r$ (in red) and with respect to $\theta$ (in blue). These are found with the soon-to-be discussed Jacobian. From the graph, you can see that these vectors are tangent vectors to the drawn curves.
The curves form a non-rectangular grid. Were the cells exactly parallelograms, the area would be computed taking into account the length of the vectors and the angle between them -- the same values that come out of a cross product.
The curves form a non-rectangular grid. Were the cells exactly parallelograms, the area would be computed taking into account the length of the vectors and the angle between them---the same values that come out of a cross product.
## Parametrically defined surfaces
@@ -323,7 +323,7 @@ plt = plot_axes()
We are using the vector of tuples interface (representing points) to specify the curve to draw.
Now we add on some curves for fixed $t$ and then fixed $\theta$ utilizing the fact that `project` returns a tuple of $x$--$y$ values to display.
Now we add on some curves for fixed $t$ and then fixed $\theta$ utilizing the fact that `project` returns a tuple of $x$-$y$ values to display.
```{julia}
for t in range(t₀, tₙ, 20)
@@ -1225,9 +1225,9 @@ q = interpolate(vcat(basic_conditions, new_conds))
plot_q_level_curve(q;layout=(1,2))
```
For this shape, if $b$ increases away from $b_0$, the secant line connecting $(a_0,0)$ and $(b, f(b)$ will have a negative slope, but there are no points nearby $x=c_0$ where the derivative has a tangent line with negative slope, so the continuous function is only on the left side of $b_0$. Mathematically, as $f$ is increasing $c_0$ -- as $f'''(c_0) = 3 > 0$ -- and $f$ is decreasing at $f(b_0)$ -- as $f'(b_0) = -1 < 0$, the signs alone suggest the scenario. The contour plot reveals, not one, but two one-sided functions of $b$ giving $c$.
For this shape, if $b$ increases away from $b_0$, the secant line connecting $(a_0,0)$ and $(b, f(b))$ will have a negative slope, but there are no points nearby $x=c_0$ where the derivative has a tangent line with negative slope, so the continuous function is only on the left side of $b_0$. Mathematically, as $f$ is increasing at $c_0$---as $f'''(c_0) = 3 > 0$---and $f$ is decreasing at $b_0$---as $f'(b_0) = -1 < 0$---the signs alone suggest the scenario. The contour plot reveals, not one, but two one-sided functions of $b$ giving $c$.
----
---
Now to characterize all possibilities.
@@ -1291,7 +1291,7 @@ $$
Then $F(c, b) = g_1(b) - g_2(c)$.
By construction, $g_2(c_0) = 0$ and $g_2^{(k)}(c_0) = f^{(k+1)}(c_0)$,
Adjusting $f$ to have a vanishing second -- but not third -- derivative at $c_0$ means $g_2$ will satisfy the assumptions of the lemma assuming $f$ has at least four continuous derivatives (as all our example polynomials do).
Adjusting $f$ to have a vanishing second---but not third---derivative at $c_0$ means $g_2$ will satisfy the assumptions of the lemma assuming $f$ has at least four continuous derivatives (as all our example polynomials do).
As for $g_1$, we have by construction $g_1(b_0) = 0$. By differentiation we get a pattern for some constants $c_j = (j+1)(j+2)\cdots k$ with $c_k = 1$.

View File

@@ -981,7 +981,7 @@ $$
\vec{v} \times \vec{c} = GM \hat{x} + \vec{d}.
$$
As $\vec{x}$ and $\vec{v}\times\vec{c}$ lie in the same plane - orthogonal to $\vec{c}$ - so does $\vec{d}$. With a suitable re-orientation, so that $\vec{d}$ is along the $x$ axis, $\vec{c}$ is along the $z$-axis, then we have $\vec{c} = \langle 0,0,c\rangle$ and $\vec{d} = \langle d ,0,0 \rangle$, and $\vec{x} = \langle x, y, 0 \rangle$. Set $\theta$ to be the angle, then $\hat{x} = \langle \cos(\theta), \sin(\theta), 0\rangle$.
As $\vec{x}$ and $\vec{v}\times\vec{c}$ lie in the same plane---orthogonal to $\vec{c}$---so does $\vec{d}$. With a suitable re-orientation, so that $\vec{d}$ is along the $x$ axis, $\vec{c}$ is along the $z$-axis, then we have $\vec{c} = \langle 0,0,c\rangle$ and $\vec{d} = \langle d ,0,0 \rangle$, and $\vec{x} = \langle x, y, 0 \rangle$. Set $\theta$ to be the angle, then $\hat{x} = \langle \cos(\theta), \sin(\theta), 0\rangle$.
Now
@@ -1662,7 +1662,7 @@ $$
The first equation relates the steering angle with the curvature. If the steering angle is not changed ($d\alpha/du=0$) then the curvature is constant and the motion is circular. It will be greater for larger angles (up to $\pi/2$). As the curvature is the reciprocal of the radius, this means the radius of the circular trajectory will be smaller. For the same constant steering angle, the curvature will be smaller for longer wheelbases, meaning the circular trajectory will have a larger radius. For cars, which have similar dynamics, this means longer wheelbase cars will take more room to make a U-turn.
The second equation may be interpreted in ratio of arc lengths. The infinitesimal arc length of the rear wheel is proportional to that of the front wheel only scaled down by $\cos(\alpha)$. When $\alpha=0$ - the bike is moving in a straight line - and the two are the same. At the other extreme - when $\alpha=\pi/2$ - the bike must be pivoting on its rear wheel and the rear wheel has no arc length. This cosine, is related to the speed of the back wheel relative to the speed of the front wheel, which was used in the initial differential equation.
The second equation may be interpreted in terms of a ratio of arc lengths. The infinitesimal arc length of the rear wheel is proportional to that of the front wheel, only scaled down by $\cos(\alpha)$. When $\alpha=0$---the bike is moving in a straight line---the two are the same. At the other extreme---when $\alpha=\pi/2$---the bike must be pivoting on its rear wheel and the rear wheel has no arc length. This cosine is related to the speed of the back wheel relative to the speed of the front wheel, which was used in the initial differential equation.
The last equation relates the curvature of the back wheel track to the steering angle of the front wheel. When $\alpha=\pm\pi/2$, the rear-wheel curvature, $k$, is infinite, resulting in a cusp (no circle with non-zero radius will approximate the trajectory). This occurs when the front wheel is steered orthogonal to the direction of motion. As was seen in previous graphs of the trajectories, a cusp can happen for quite regular front wheel trajectories.
@@ -1875,7 +1875,7 @@ $$
$$
We see $\vec\beta'$ is zero (the curve is non-regular) when $\kappa'(s) = 0$. The curvature changes from increasing to decreasing, or vice versa at each of the $4$ crossings of the major and minor axes - there are $4$ non-regular points, and we see $4$ cusps in the evolute.
We see $\vec\beta'$ is zero (the curve is non-regular) when $\kappa'(s) = 0$. The curvature changes from increasing to decreasing, or vice versa, at each of the $4$ crossings of the major and minor axes---there are $4$ non-regular points, and we see $4$ cusps in the evolute.
The curve parameterized by $\vec{r}(t) = 2(1 - \cos(t)) \langle \cos(t), \sin(t)\rangle$ over $[0,2\pi]$ is a cardioid. It is formed by rolling a circle of radius $r$ around another similarly sized circle. The following graphically shows the evolute is a smaller cardioid (one-third the size). For fun, the evolute of the evolute is drawn:

View File

@@ -81,7 +81,7 @@ $$
\| \vec{v} \| = \sqrt{ v_1^2 + v_2^2 + \cdots + v_n^2}.
$$
The definition of a norm leads to a few properties. First, if $c$ is a scalar, $\| c\vec{v} \| = |c| \| \vec{v} \|$ - which says scalar multiplication by $c$ changes the length by $|c|$. (Sometimes, scalar multiplication is described as "scaling by....") The other property is an analog of the triangle inequality, in which for any two vectors $\| \vec{v} + \vec{w} \| \leq \| \vec{v} \| + \| \vec{w} \|$. The right hand side is equal only when the two vectors are parallel.
The definition of a norm leads to a few properties. First, if $c$ is a scalar, $\| c\vec{v} \| = |c| \| \vec{v} \|$---which says scalar multiplication by $c$ changes the length by $|c|$. (Sometimes, scalar multiplication is described as "scaling by....") The other property is an analog of the triangle inequality, in which for any two vectors $\| \vec{v} + \vec{w} \| \leq \| \vec{v} \| + \| \vec{w} \|$. The right hand side is equal only when the two vectors are parallel.
A vector with length $1$ is called a *unit* vector. Dividing a non-zero vector by its norm will yield a unit vector, a consequence of the first property above. Unit vectors are often written with a "hat": $\hat{v}$.
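These properties are easy to check numerically; a small sketch with arbitrary vectors:

```{julia}
using LinearAlgebra
v, w = [3.0, 4.0], [1.0, 2.0]
c = -2.0
norm(c*v) ≈ abs(c) * norm(v)      # scaling property
norm(v + w) ≤ norm(v) + norm(w)   # triangle inequality
u = v / norm(v)                   # a unit vector in the direction of v
norm(u) ≈ 1
```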
@@ -234,7 +234,7 @@ A simple example might be to add up a sequence of numbers. A direct way might be
x1, x2, x3, x4, x5, x6 = 1, 2, 3, 4, 5, 6
x1 + x2 + x3 + x4 + x5 + x6
```
Someone doesn't need to know `Julia`'s syntax to guess what this computes, save for the idiosyncratic tuple assignment used, which could have been bypassed at the cost of even more typing.
A more efficient means to do this---as each component isn't named---would be to store the data in a container:
@@ -267,7 +267,7 @@ These two functions are *reductions*. There are others, such as `maximum` and `m
reduce(+, xs; init=0) # sum(xs)
```
or
or
```{julia}
reduce(*, xs; init=1) # prod(xs)
@@ -289,9 +289,9 @@ and
foldr(=>, xs)
```
Next, we do a slightly more complicated problem.
Next, we do a slightly more complicated problem.
Recall the distance formula between two points, also called the *norm*. It is written here with the square root on the other side: $d^2 = (x_1-y_1)^2 + (x_0 - y_0)^2$. This computation can be usefully generalized to higher dimensional points (with $n$ components each).
Recall the distance formula between two points, also called the *norm*. It is written here with the square root on the other side: $d^2 = (x_1-y_1)^2 + (x_0 - y_0)^2$. This computation can be usefully generalized to higher dimensional points (with $n$ components each).
This first example shows how the value for $d^2$ can be found using broadcasting and `sum`:
@@ -309,10 +309,10 @@ This formula is a sum after applying an operation to the paired off values. Usin
sum((xi - yi)^2 for (xi, yi) in zip(xs, ys))
```
The `zip` function, used above, produces an iterator over tuples of the paired off values in the two (or more) containers passed to it.
The `zip` function, used above, produces an iterator over tuples of the paired off values in the two (or more) containers passed to it.
This pattern -- where a reduction follows a function's application to the components -- is implemented in `mapreduce`.
This pattern---where a reduction follows a function's application to the components---is implemented in `mapreduce`.
```{julia}
@@ -337,7 +337,7 @@ mapreduce((xi,yi) -> (xi-yi)^2, +, xs, ys)
At times, extracting all but the first or last value can be of interest. For example, a polygon comprised of $n$ points (the vertices), might be stored using a vector for the $x$ and $y$ values with an additional point that mirrors the first. Here are the points:
```{julia}
xs = [1, 3, 4, 2]
xs = [1, 3, 4, 2]
ys = [1, 1, 2, 3]
pts = zip(xs, ys) # recipe for [(x1,y1), (x2,y2), (x3,y3), (x4,y4)]
```
@@ -392,7 +392,7 @@ The `take` method could be used to remove the padded value from the `xs` and `ys
##### Example: Riemann sums
In the computation of a Riemann sum, the interval $[a,b]$ is partitioned using $n+1$ points $a=x_0 < x_1 < \cdots < x_{n-1} < x_n = b$.
In the computation of a Riemann sum, the interval $[a,b]$ is partitioned using $n+1$ points $a=x_0 < x_1 < \cdots < x_{n-1} < x_n = b$.
```{julia}
a, b, n = 0, 1, 4
@@ -414,7 +414,7 @@ sum(f ∘ first, partitions)
```
This uses a few things: like `mapreduce`, `sum` allows a function to
be applied to each element in the `partitions` collection. (Indeed, the default method to compute `sum(xs)` for an arbitrary container resolves to `mapreduce(identity, add_sum, xs)` where `add_sum` is basically `+`.)
be applied to each element in the `partitions` collection. (Indeed, the default method to compute `sum(xs)` for an arbitrary container resolves to `mapreduce(identity, add_sum, xs)` where `add_sum` is basically `+`.)
In this case, the
values come as tuples to the function to apply to each component.
@@ -636,7 +636,7 @@ But the associative property does not make sense, as $(\vec{u} \cdot \vec{v}) \c
## Matrices
Algebraically, the dot product of two vectors - pair off by components, multiply these, then add - is a common operation. Take for example, the general equation of a line, or a plane:
Algebraically, the dot product of two vectors---pair off by components, multiply these, then add---is a common operation. Take for example, the general equation of a line, or a plane:
$$
@@ -764,7 +764,7 @@ Vectors are defined similarly. As they are identified with *column* vectors, we
```{julia}
𝒷 = [10, 11, 12] # not 𝒷 = [10 11 12], which would be a row vector.
a = [10, 11, 12] # not a = [10 11 12], which would be a row vector.
```
In `Julia`, entries in a matrix (or a vector) are stored in a container with a type wide enough to accommodate each entry. In this example, the type is SymPy's `Sym` type:
@@ -822,7 +822,7 @@ We can then see how the system of equations is represented with matrices:
```{julia}
M * xs - 𝒷
M * xs - a
```
Here we use `SymPy` to verify the above:
@@ -899,7 +899,7 @@ and
```
:::{.callout-note}
## Note
The adjoint is defined *recursively* in `Julia`. In the `CalculusWithJulia` package, we overload the `'` notation for *functions* to yield a univariate derivative found with automatic differentiation. This can lead to problems: if we have a matrix of functions, `M`, and took the transpose with `M'`, then the entries of `M'` would be the derivatives of the functions in `M` - not the original functions. This is very much likely to not be what is desired. The `CalculusWithJulia` package commits **type piracy** here *and* abuses the generic idea for `'` in Julia. In general type piracy is very much frowned upon, as it can change expected behaviour. It is defined in `CalculusWithJulia`, as that package is intended only to act as a means to ease users into the wider package ecosystem of `Julia`.
The adjoint is defined *recursively* in `Julia`. In the `CalculusWithJulia` package, we overload the `'` notation for *functions* to yield a univariate derivative found with automatic differentiation. This can lead to problems: if we have a matrix of functions, `M`, and took the transpose with `M'`, then the entries of `M'` would be the derivatives of the functions in `M`---not the original functions. This is very likely not what is desired. The `CalculusWithJulia` package commits **type piracy** here *and* abuses the generic idea for `'` in Julia. In general type piracy is very much frowned upon, as it can change expected behaviour. It is defined in `CalculusWithJulia`, as that package is intended only to act as a means to ease users into the wider package ecosystem of `Julia`.
:::
---
@@ -1081,7 +1081,7 @@ norm(u₂ × v₂)
---
This analysis can be extended to the case of 3 vectors, which - when not co-planar - will form a *parallelepiped*.
This analysis can be extended to the case of 3 vectors, which---when not co-planar---will form a *parallelepiped*.
```{julia}