edits
@@ -176,7 +176,7 @@ $$
 \begin{align*}
 df &= f(x + dx) - f(x)\\
 &= (x + dx)^T A (x + dx) - x^TAx \\
-&= \textcolor{blue}{x^TAx} + dx^TA x + \textcolor{blue}{x^TAx} + \textcolor{red}{dx^T A dx} - \textcolor{blue}{x^TAx}\\
+&= \textcolor{blue}{x^TAx} + dx^TA x + x^TAdx + \textcolor{red}{dx^T A dx} - \textcolor{blue}{x^TAx}\\
 &= dx^TA x + x^TAdx \\
 &= (dx^TAx)^T + x^TAdx \\
 &= x^T A^T dx + x^T A dx\\
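The conclusion $df = x^T(A^T + A)\,dx$ can be sanity-checked numerically (a sketch using numpy; the size, seed, and step are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))     # A need not be symmetric
x = rng.standard_normal(n)
dx = 1e-6 * rng.standard_normal(n)  # a small perturbation

f = lambda v: v @ A @ v
df = f(x + dx) - f(x)            # actual change
lin = x @ (A.T + A) @ dx         # predicted f'(x)[dx] = x^T (A^T + A) dx
assert abs(df - lin) < 1e-9      # they differ only by the dx^T A dx term
```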
@@ -246,7 +246,7 @@ df &= f(x + dx) - f(x) \\
 \end{align*}
 $$
 
-Comparing we get $f'(x)dx = (g'(x) + h'(x))[dx]$ or $f'(x) = g'(x) + h'(x)$.
+Comparing, we get $f'(x)[dx] = (g'(x) + h'(x))[dx]$, or $f'(x) = g'(x) + h'(x)$. (The last two lines above show how the new linear operator $g'(x) + h'(x)$ acts on a value: apply each operator and add the results.)
 
 The sum rule has the same derivation as for univariate, scalar functions. Similarly for the product rule.
 
@@ -256,10 +256,10 @@ $$
 \begin{align*}
 df &= f(x + dx) - f(x) \\
 &= g(x+dx)h(x + dx) - g(x) h(x)\\
-&= \left(g(x) + g'(x)dx\right)\left(h(x) + h'(x) dx\right) - \left(g(x) h(x)\right) \\
-&= \textcolor{blue}{g(x)h(x)} + g'(x) dx h(x) + g(x) h'(x) dx + \textcolor{red}{g'(x)dx h'(x) dx} - \textcolor{blue}{g(x) h(x)}\\
-&= g'(x)dxh(x) + g(x)h'(x) dx\\
-&= (g'(x)h(x) + g(x)h'(x)) dx
+&= \left(g(x) + g'(x)[dx]\right)\left(h(x) + h'(x) [dx]\right) - g(x) h(x) \\
+&= \textcolor{blue}{g(x)h(x)} + g'(x) [dx] h(x) + g(x) h'(x) [dx] + \textcolor{red}{g'(x)[dx] h'(x) [dx]} - \textcolor{blue}{g(x) h(x)}\\
+&= \left(g'(x)[dx]\right)h(x) + g(x)\left(h'(x) [dx]\right)\\
+&= dg h + g dh
 \end{align*}
 $$
 
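The product rule just derived can be spot-checked numerically for scalar functions (a sketch; $g = \sin$ and $h = \exp$ are arbitrary choices):

```python
import numpy as np

# g and h are arbitrary smooth scalar functions with known derivatives
g, dg = np.sin, np.cos
h, dh = np.exp, np.exp

x, dx = 0.7, 1e-7
df = g(x + dx) * h(x + dx) - g(x) * h(x)          # actual change in f = g*h
approx = dg(x) * dx * h(x) + g(x) * dh(x) * dx    # dg h + g dh
assert abs(df - approx) < 1e-12                   # agree to first order in dx
```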
@@ -369,16 +369,18 @@ Multiplying left to right (the first) is called reverse mode; multiplying right
 * if $f:R^n \rightarrow R^m$ has $n=1$ and $m$ much bigger than one, then it is faster to do right-to-left multiplication (more outputs than inputs)
 
-The basic idea comes down to the shape of the matrices. When $m=1$, the derviative is a product of matrices of size $n\times j$, $j\times k$, and $k \times 1$ yielding a matrix of size $n \times 1$ matching the function dimension. Matrix multiplication of an $m \times q$ matrix times a $q \times n$ matrix takes an order of $mqn$ operations.
+The reason comes down to the shape of the matrices. To see this, we need to know that matrix multiplication of an $m \times q$ matrix times a $q \times n$ matrix takes an order of $mqn$ operations.
+
+When $m=1$, the derivative is a product of matrices of size $n\times j$, $j\times k$, and $k \times 1$, yielding a matrix of size $n \times 1$ matching the function dimension.
 
-The operations involved in multiplication of left to right can be quantified. The first operation takes $njk$ operation leaving an $n\times k$ matrix, the next multiplication then takes another $nk1$ operations or $njk + nk$ together.
+The operations involved in multiplication from left to right can be quantified. The first multiplication takes $njk$ operations, leaving an $n\times k$ matrix; the next then takes another $nk \cdot 1$ operations, or $njk + nk$ together.
 
 Whereas computing from right to left is first $jk \cdot 1$ operations, leaving a $j \times 1$ matrix. The next operation would take another $nj \cdot 1$ operations. In total:
 
-* left to right is $njk + nk$ = $nk \cdot (1 + j)$.
+* left to right is $njk + nk = nk \cdot (j + 1)$.
 * right to left is $jk + jn = j\cdot (k+n)$.
 
 When $j=k$, say, we can compare: the ratio of the two counts is $n(j+1)/(j+n)$, so the second requires fewer operations. This can be quite significant in higher dimensions, whereas in the dimensions of calculus (where $n$ and $m$ are $3$ or less) it is not an issue.
 
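The two counts above can be verified with a small helper (a sketch; `matmul_cost` is a hypothetical helper tallying the multiplications in one matrix product, and the sizes are arbitrary):

```python
def matmul_cost(p, q, r):
    """Approximate operation count for a (p x q) times (q x r) multiply."""
    return p * q * r

n, j, k = 5, 4, 3
# (A*B)*C: n x j times j x k, then n x k times k x 1
left = matmul_cost(n, j, k) + matmul_cost(n, k, 1)
# A*(B*C): j x k times k x 1, then n x j times j x 1
right = matmul_cost(j, k, 1) + matmul_cost(n, j, 1)

assert left == n * k * (j + 1)    # 75 operations
assert right == j * (k + n)       # 32 operations
```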
 ##### Example
@@ -401,17 +403,15 @@ Whereas the relationship is changed when the first matrix is skinny and the last
 @btime (A*B)*C setup=(A=rand(m,k); B=rand(k,j); C=rand(j,n));
 ```
 
-##### Example
+----
 
 In calculus, $n$ and $m$ are $1$, $2$, or $3$. But that need not be the case, especially if differentiation is over a parameter space.
 
 XXX insert example XXX
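One possibility (a hypothetical illustration with numpy, not necessarily the intended example): in least-squares fitting, a scalar loss is differentiated over a parameter space with $n$ parameters, so $m=1$ while $n$ can be in the thousands, which is the reverse-mode regime.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 1000, 50                    # many parameters, one scalar output
X = rng.standard_normal((m, n))
y = rng.standard_normal(m)
theta = rng.standard_normal(n)

loss = lambda t: 0.5 * np.sum((X @ t - y) ** 2)   # f: R^n -> R, Jacobian is 1 x n
grad = X.T @ (X @ theta - y)                      # the gradient, an n-vector

# finite-difference check in one random direction
v = rng.standard_normal(n)
h = 1e-6
fd = (loss(theta + h * v) - loss(theta - h * v)) / (2 * h)
assert np.isclose(fd, grad @ v, rtol=1e-5)
```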
 
 ## Derivatives of matrix functions
 
 What is the derivative of $f(A) = A^2$?
 
-The function $f$ takes a $n\times n$ matrix and returns a matrix of the same size. This innocuous question isn't directly handled, here, by the Jacobian, which is defined for vector-valued functions $f:R^n \rightarrow R^m$.
+The function $f$ takes an $n\times n$ matrix and returns a matrix of the same size.
 
 This derivative can be derived directly from the *product rule*:
 
@@ -422,7 +422,7 @@ df &= d(A^2) = d(AA)\\
 \end{align*}
 $$
 
-That is $f'(A)$ is the operator $f'(A)[\delta A] = A \delta A + \delta A A$ and not $2A\delta A$, as $A$ may not commute with $\delta A$.
+That is, $f'(A)$ is the operator $f'(A)[\delta A] = A \delta A + \delta A A$. (This is not $2A\delta A$, as $A$ may not commute with $\delta A$.)
 
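This can also be checked numerically (a sketch with numpy; size, seed, and perturbation scale are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
A = rng.standard_normal((n, n))
dA = 1e-6 * rng.standard_normal((n, n))   # a small matrix perturbation

df = (A + dA) @ (A + dA) - A @ A     # actual change in f(A) = A^2
lin = A @ dA + dA @ A                # f'(A)[dA]
assert np.allclose(df, lin, atol=1e-10)
# the naive 2 A dA is measurably different, since A and dA do not commute
assert np.linalg.norm(df - 2 * A @ dA) > 1e-8
```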
 ### Vectorization of a matrix
 
@@ -463,7 +463,7 @@ J = vec(f(A)).jacobian(vec(A)) # jacobian of f̃
 
 We do this via linear algebra first, then see a more elegant manner following the notes.
 
-A basic course in linear algebra shows that any linear operator on a finite vector space can be represented as a matrix. The basic idea is to represent what the operator does to each *basis* element and put these values as columns of the matrix.
+A course in linear algebra shows that any linear operator on a finite-dimensional vector space can be represented as a matrix. The basic idea is to record what the operator does to each *basis* element and put these values as the columns of the matrix.
 
 In this $3 \times 3$ case, the linear operator works on an object with $9$ slots and returns an object with $9$ slots, so the matrix will be $9 \times 9$.
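The basis-column construction can be carried out numerically (a sketch with numpy, reusing the operator $dA \mapsto A\,dA + dA\,A$ from the $A^2$ example; `vec` is column-major vectorization):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 3
A = rng.standard_normal((n, n))

op = lambda dA: A @ dA + dA @ A       # the linear operator f'(A)
vec = lambda M: M.flatten(order="F")  # column-major vectorization

# column j of J is vec(op(E_j)), where E_j is the j-th standard basis matrix
J = np.column_stack([
    vec(op(np.eye(n * n)[:, j].reshape(n, n, order="F")))
    for j in range(n * n)
])

dA = rng.standard_normal((n, n))
assert np.allclose(J @ vec(dA), vec(op(dA)))   # the 9 x 9 matrix represents the operator
# equivalently, J = I kron A + A^T kron I for column-major vec
assert np.allclose(J, np.kron(np.eye(n), A) + np.kron(A.T, np.eye(n)))
```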