Commit ad52202c92 (parent fa5f9f449d) by jverzani, 2025-05-09 07:33:21 -04:00.


$$
\begin{align*}
df &= f(x + dx) - f(x)\\
&= (x + dx)^T A (x + dx) - x^TAx \\
&= \textcolor{blue}{x^TAx} + dx^TA x + x^TAdx + \textcolor{red}{dx^T A dx} - \textcolor{blue}{x^TAx}\\
&= dx^TA x + x^TAdx \\
&= (dx^TAx)^T + x^TAdx \\
&= x^T A^T dx + x^T A dx\\
&= x^T\left(A^T + A\right) dx
\end{align*}
$$

That is, $f'(x) = x^T(A^T + A)$, which reduces to $2x^TA$ when $A$ is symmetric.
For the sum rule, with $f(x) = g(x) + h(x)$:

$$
\begin{align*}
df &= f(x + dx) - f(x) \\
&= \left(g(x + dx) + h(x + dx)\right) - \left(g(x) + h(x)\right)\\
&= \left(g(x) + g'(x)[dx]\right) + \left(h(x) + h'(x)[dx]\right) - g(x) - h(x)\\
&= g'(x)[dx] + h'(x)[dx]\\
&= \left(g'(x) + h'(x)\right)[dx]
\end{align*}
$$
Comparing, we get $f'(x)[dx] = (g'(x) + h'(x))[dx]$ or $f'(x) = g'(x) + h'(x)$. (The last two lines above show how the new linear operator $g'(x) + h'(x)$ is defined on a value, by adding the applications of each.)
The sum rule has the same derivation as was done with univariate, scalar functions. Similarly for the product rule.
For the product rule, with scalar-valued $g$ and $h$ and $f(x) = g(x)h(x)$:

$$
\begin{align*}
df &= f(x + dx) - f(x) \\
&= g(x+dx)h(x + dx) - g(x) h(x)\\
&= \left(g(x) + g'(x)[dx]\right)\left(h(x) + h'(x) [dx]\right) - g(x) h(x) \\
&= \textcolor{blue}{g(x)h(x)} + g'(x) [dx] h(x) + g(x) h'(x) [dx] + \textcolor{red}{g'(x)[dx] h'(x) [dx]} - \textcolor{blue}{g(x) h(x)}\\
&= \left(g'(x)[dx]\right)h(x) + g(x)\left(h'(x) [dx]\right)\\
&= dg h + g dh
\end{align*}
$$
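As a quick numeric sanity check of this rule for scalar functions (a sketch in Python, with $g$ and $h$ chosen arbitrarily; not from the source):

```python
import math

# arbitrary smooth choices for g and h
g, dg = math.sin, math.cos       # g and its derivative
h, dh = math.exp, math.exp       # h and its derivative

x, dx = 0.7, 1e-6
f = lambda t: g(t) * h(t)

df = f(x + dx) - f(x)                          # actual change in f
linear = (dg(x) * h(x) + g(x) * dh(x)) * dx    # (g'h + g h') dx

# the two agree up to terms of order dx^2
print(abs(df - linear))
```

The discrepancy is on the order of $dx^2$, exactly the red cross term dropped in the derivation above.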
Multiplying left to right (the first) is called reverse mode; multiplying right to left (the second) is called forward mode.
* if $f:R^n \rightarrow R^m$ has $n=1$ and $m$ much bigger than one, then it is faster to do right to left multiplication (many more outputs than inputs)
The reason comes down to the shape of the matrices. To see why, we need to know that matrix multiplication of an $m \times q$ matrix by a $q \times n$ matrix takes on the order of $mqn$ operations.
When $m=1$, the derivative is a product of matrices of size $n\times j$, $j\times k$, and $k \times 1$, yielding a matrix of size $n \times 1$, matching the function dimension.
The operations involved in multiplying from left to right can be quantified. The first multiplication takes $njk$ operations, leaving an $n\times k$ matrix; the next multiplication then takes another $nk \cdot 1$ operations, or $njk + nk$ together.
Whereas computing from right to left first takes $jk \cdot 1$ operations, leaving a $j \times 1$ matrix; the next multiplication then takes another $nj \cdot 1$ operations. In total:
* left to right is $njk + nk = nk \cdot (j + 1)$.
* right to left is $jk + jn = j\cdot (k+n)$.
When $j=k$, say, comparing the two shows the second requires many fewer operations. This can be quite significant in higher dimensions, whereas for the dimensions of calculus (where $n$ and $m$ are $3$ or less) it is not an issue.
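The operation counts can be illustrated with a short sketch (plain Python, not from the source): a naive matrix multiply that tallies scalar multiplications, applied in both association orders to matrices of size $n\times j$, $j \times k$, and $k \times 1$.

```python
def matmul_count(A, B):
    # naive product of an m x q and a q x n matrix: m*q*n multiplications
    m, q, n = len(A), len(B), len(B[0])
    C = [[sum(A[i][t] * B[t][j] for t in range(q)) for j in range(n)]
         for i in range(m)]
    return C, m * q * n

n, j, k = 5, 4, 3
A = [[1.0] * j for _ in range(n)]  # n x j
B = [[1.0] * k for _ in range(j)]  # j x k
C = [[1.0] for _ in range(k)]      # k x 1

AB, c1 = matmul_count(A, B)        # n*j*k multiplications
_, c2 = matmul_count(AB, C)        # n*k more
left_to_right = c1 + c2            # n*k*(j + 1)

BC, c3 = matmul_count(B, C)        # j*k multiplications
_, c4 = matmul_count(A, BC)        # n*j more
right_to_left = c3 + c4            # j*(k + n)

print(left_to_right, right_to_left)
```

With these small sizes the tallies already match the formulas above; the gap grows with the dimensions.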
##### Example
Whereas the relationship is changed when the first matrix is skinny and the last one wide:
```julia
@btime (A*B)*C setup=(A=rand(m,k); B=rand(k,j); C=rand(j,n));
```
----
In calculus, $n$ and $m$ are typically $1$, $2$, or $3$. But that need not be the case, especially if differentiation is over a parameter space.
XXX insert example XXX
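As one possible illustration (hypothetical; not from the source): a least-squares loss over a parameter vector $\theta \in R^p$ is a map $R^p \rightarrow R$, and its gradient can be checked against finite differences.

```python
# Hypothetical example: gradient of L(theta) = sum_i (y_i - x_i . theta)^2
p = 4
X = [[1.0, 0.5, -1.0, 2.0],
     [0.0, 1.0, 1.0, -1.0],
     [2.0, -1.0, 0.5, 0.0]]   # three data rows, p columns (made-up numbers)
y = [1.0, -2.0, 0.5]
theta = [0.1, -0.2, 0.3, 0.4]

def loss(t):
    return sum((yi - sum(xi[j] * t[j] for j in range(p))) ** 2
               for xi, yi in zip(X, y))

# analytic gradient: -2 * sum_i r_i * x_i, with residual r_i = y_i - x_i . theta
r = [yi - sum(xi[j] * theta[j] for j in range(p)) for xi, yi in zip(X, y)]
grad = [-2.0 * sum(r[i] * X[i][j] for i in range(len(X))) for j in range(p)]

# forward finite-difference check, one coordinate at a time
h = 1e-6
fd = []
for j in range(p):
    tp = theta[:]
    tp[j] += h
    fd.append((loss(tp) - loss(theta)) / h)

err = max(abs(a - b) for a, b in zip(grad, fd))
print(err)
```

Here $n = p$ can be large while $m = 1$, the shape for which the right-to-left ordering above pays off.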
## Derivatives of matrix functions
What is the derivative of $f(A) = A^2$?
The function $f$ takes an $n\times n$ matrix and returns a matrix of the same size.
This derivative can be derived directly from the *product rule*:
$$
\begin{align*}
df &= d(A^2) = d(AA)\\
&= dA\, A + A\, dA
\end{align*}
$$
That is, $f'(A)$ is the operator $f'(A)[\delta A] = A \delta A + \delta A A$. (This is not $2A\delta A$, as $A$ may not commute with $\delta A$.)
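This can be checked numerically (a sketch in Python on a small $2\times 2$ example; not from the source): for a small perturbation $dA = h\,E$, $f(A+dA) - f(A)$ matches $A\,dA + dA\,A$ up to $O(h^2)$, while $2A\,dA$ does not.

```python
def mmul(Xm, Ym):
    # 2x2 matrix product
    return [[sum(Xm[i][t] * Ym[t][j] for t in range(2)) for j in range(2)]
            for i in range(2)]

def madd(Xm, Ym):
    return [[Xm[i][j] + Ym[i][j] for j in range(2)] for i in range(2)]

def msub(Xm, Ym):
    return [[Xm[i][j] - Ym[i][j] for j in range(2)] for i in range(2)]

A = [[1.0, 2.0], [3.0, 4.0]]
E = [[0.0, 1.0], [1.0, 0.0]]       # perturbation direction; E does not commute with A
h = 1e-6
dA = [[h * e for e in row] for row in E]

ApdA = madd(A, dA)
df = msub(mmul(ApdA, ApdA), mmul(A, A))       # f(A + dA) - f(A) for f(A) = A^2

lin = madd(mmul(A, dA), mmul(dA, A))          # the operator A dA + dA A
wrong = [[2.0 * v for v in row] for row in mmul(A, dA)]  # the naive guess 2 A dA

err = max(abs(df[i][j] - lin[i][j]) for i in range(2) for j in range(2))
err_wrong = max(abs(df[i][j] - wrong[i][j]) for i in range(2) for j in range(2))
print(err, err_wrong)   # err is O(h^2); err_wrong is O(h)
```

The residual for $A\,dA + dA\,A$ is just the dropped $dA\,dA$ term; the residual for $2A\,dA$ contains the commutator $dA\,A - A\,dA$, which is of order $h$.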
### Vectorization of a matrix
```julia
J = vec(f(A)).jacobian(vec(A)) # jacobian of f̃
```
We do this via linear algebra first, then see a more elegant manner following the notes.
A course in linear algebra shows that any linear operator on a finite vector space can be represented as a matrix. The basic idea is to represent what the operator does to each *basis* element and put these values as columns of the matrix.
In this $3 \times 3$ case, the linear operator works on an object with $9$ slots and returns an object with $9$ slots, so the matrix will be $9 \times 9$.
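For this particular operator, $f'(A)[\delta A] = A\,\delta A + \delta A\,A$, the matrix can also be written with Kronecker products: with column-major $\operatorname{vec}$, the standard identity $\operatorname{vec}(AXB) = (B^T \otimes A)\operatorname{vec}(X)$ gives $\operatorname{vec}(AX + XA) = (I \otimes A + A^T \otimes I)\operatorname{vec}(X)$. A pure-Python sketch of the $9 \times 9$ case (illustrative, not from the source):

```python
def matmul(Am, Bm):
    n, q, m = len(Am), len(Bm), len(Bm[0])
    return [[sum(Am[i][t] * Bm[t][j] for t in range(q)) for j in range(m)]
            for i in range(n)]

def madd(Am, Bm):
    return [[Am[i][j] + Bm[i][j] for j in range(len(Am[0]))] for i in range(len(Am))]

def transpose(Am):
    return [list(col) for col in zip(*Am)]

def vec(Am):
    # column-major stacking: the columns of Am, one after another
    n = len(Am)
    return [Am[i][j] for j in range(n) for i in range(n)]

def kron(Am, Bm):
    # Kronecker product of an n x n and an m x m matrix
    n, m = len(Am), len(Bm)
    return [[Am[i // m][j // m] * Bm[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]

def eye(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

n = 3
A = [[1.0, 2.0, 0.0], [0.0, 3.0, 1.0], [2.0, 0.0, 1.0]]
X = [[1.0, 0.0, 2.0], [3.0, 1.0, 0.0], [0.0, 2.0, 1.0]]

# the 9 x 9 matrix representing dX -> A X + X A
M = madd(kron(eye(n), A), kron(transpose(A), eye(n)))

lhs = vec(madd(matmul(A, X), matmul(X, A)))           # vec(AX + XA)
vX = vec(X)
rhs = [sum(M[i][j] * vX[j] for j in range(n * n)) for i in range(n * n)]
print(lhs == rhs)
```

The columns of $M$ are exactly the images of the basis matrices under the operator, matching the recipe described above.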