WIP
@@ -24,7 +24,7 @@ For a scalar function $f: R^n \rightarrow R$, the gradient of $f$, $\nabla{f}$,
| Function | Classification | Visualization | Notation |
|:---|:---|:---|:---|
| $f: R\rightarrow R$ | univariate | familiar graph of function | $f$ |
| $f: R\rightarrow R^m$ | vector-valued | space curve when $m=2$ or $3$ | $\vec{r}$, $\vec{N}$ |
| $f: R^n\rightarrow R$ | scalar | a surface when $n=2$ | $f$ |
| $F: R^n\rightarrow R^n$ | vector field | a vector field when $n=2, 3$ | $F$ |
| $F: R^n\rightarrow R^m$ | multivariable | $n=2, m=3$ describes a surface | $F$, $\Phi$ |
@@ -34,7 +34,9 @@ After an example where the use of a multivariable function is of necessity, we d
## Vector fields

We have seen that the gradient of a scalar function, $f:R^2 \rightarrow R$, takes a point in $R^2$ and associates a vector in $R^2$. As such, $\nabla{f}:R^2 \rightarrow R^2$ is a vector field. A vector field is a vector-valued function from $R^n \rightarrow R^n$ for $n \geq 2$.

An input/output pair can be visualized by identifying the input values as a point and the output as a vector anchored at that point. A vector field is a sampling of such pairs, usually taken over some ordered grid. The details, as previously mentioned, are in the `vectorfieldplot` function of `CalculusWithJulia`.
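As a concrete sketch (plain base Julia, with an illustrative hand-rolled numeric gradient rather than any package function), the scalar function $f(x,y) = x^2/2 + y^2/2$ gives rise to the radial vector field $\nabla{f} = \langle x, y\rangle$:

```julia
# An assumed, minimal sketch: approximate ∇f by central differences to
# illustrate that the gradient maps a point in R² to a vector in R².
f(x, y) = x^2/2 + y^2/2

function grad(f, x, y; h=1e-6)
    fx = (f(x + h, y) - f(x - h, y)) / 2h   # ∂f/∂x
    fy = (f(x, y + h) - f(x, y - h)) / 2h   # ∂f/∂y
    [fx, fy]
end

grad(f, 1.0, 2.0)   # ≈ [1.0, 2.0], the radial field at (1, 2)
```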
```{julia}
@@ -1071,7 +1073,7 @@ Adjusting $f$ to have a vanishing second -- but not third -- derivative at $c_0$
As for $g_1$, we have by construction $g_1(b_0) = 0$. By differentiation we get a pattern with constants $c_j = (j+1)(j+2)\cdots k$, where $c_k = 1$.
$$
g^{(k)}(b) = k! \cdot \frac{f(a_0) - f(b)}{(a_0-b)^{k+1}} - \sum_{j=1}^k c_j \frac{f^{(j)}(b)}{(a_0 - b)^{k-j+1}}.
$$
Of note: when $f(a_0) = f(b_0) = 0$ and $f^{(k)}(b_0)$ is the first non-vanishing derivative of $f$ at $b_0$, then $g^{(k)}(b_0) = f^{(k)}(b_0)/(b_0 - a_0)$ (they have the same sign).
@@ -1146,6 +1148,30 @@ This handles most cases, but leaves the possibility that a function with infinit
## Questions
##### Question
```{julia}
#| echo: false
gr()
p1 = vectorfieldplot((x,y) -> [x,y], xlim=(-4,4), ylim=(-4,4), nx=9, ny=9, title="A");
p2 = vectorfieldplot((x,y) -> [x-y,x], xlim=(-4,4), ylim=(-4,4), nx=9, ny=9, title="B");
p3 = vectorfieldplot((x,y) -> [y,0], xlim=(-4,4), ylim=(-4,4), nx=9, ny=9, title="C");
p4 = vectorfieldplot((x,y) -> [-y,x], xlim=(-4,4), ylim=(-4,4), nx=9, ny=9, title="D");
plot(p1, p2, p3, p4; layout=(2,2))
```
In the above figure, match the function with the vector field plot.
```{julia}
#| echo: false
plotly()
matchq(("`F(x,y)=[-y,x]`", "`F(x,y)=[y,0]`",
        "`F(x,y)=[x-y,x]`", "`F(x,y)=[x,y]`"),
       ("A", "B", "C", "D"),
       (4, 3, 2, 1);
       label="For each function mark the correct vector field plot")
```
###### Question
@@ -126,6 +126,7 @@ window.Quarto = {
<p>XXX Add in examples from paper XXX optimization? large number of parameters? ,…</p>
<p>Mention numerator layout from https://en.wikipedia.org/wiki/Matrix_calculus#Layout_conventions</p>
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
@@ -140,18 +141,18 @@ Based on Bright, Edelman, and Johnson’s notes
</div>
</div>
<p>We have seen several “derivatives” of a function, based on the number of inputs and outputs. The first one was for functions <span class="math inline">\(f: R \rightarrow R\)</span>.</p>
<p>In this case, we saw that <span class="math inline">\(f\)</span> has a derivative at <span class="math inline">\(c\)</span> if this limit exists:</p>
<p><span class="math display">\[
\lim_{h \rightarrow 0}\frac{f(c + h) - f(c)}{h}.
\]</span></p>
<p>The derivative as a function of <span class="math inline">\(x\)</span> is defined using this rule for any <span class="math inline">\(x\)</span> in the domain. Common notation is:</p>
<p><span class="math display">\[
f'(x) = \frac{dy}{dx} = \lim_{h \rightarrow 0}\frac{f(x + h) - f(x)}{h}
\]</span></p>
<p>(when the limit exists).</p>
<p>This limit gets re-expressed in different ways:</p>
<ul>
<li><p>linearization writes <span class="math inline">\(f(x+\Delta x) - f(x) \approx f'(x)\Delta x\)</span>, where <span class="math inline">\(\Delta x\)</span> is a small displacement from <span class="math inline">\(x\)</span>. The reason there isn’t equality is the unwritten higher order terms that vanish in a limit.</p></li>
<li><p>Alternate limits. Another way of writing this is in terms of explicit smaller order terms:</p></li>
</ul>
<p><span class="math display">\[
@@ -159,37 +160,39 @@ f'(x) = \frac{dy}{dx} = \lim_{h \rightarrow 0}\frac{f(x + h) - f(x)}{h}
\]</span></p>
<p>which means if we divide both sides by <span class="math inline">\(h\)</span> and take the limit, we will get <span class="math inline">\(0\)</span> on the right and the relationship on the left.</p>
<ul>
<li>Differential notation simply writes this as <span class="math inline">\(dy = f'(x)dx\)</span>. Focusing on <span class="math inline">\(f\)</span> and not <span class="math inline">\(y=f(x)\)</span>, we might write</li>
</ul>
<p><span class="math display">\[
df = f(x+dx) - f(x) = f'(x) dx.
\]</span></p>
<p>In the above, <span class="math inline">\(df\)</span> and <span class="math inline">\(dx\)</span> are differentials, made rigorous by a limit, which hides the higher order terms.</p>
<p>In these notes the limit has been defined, with suitable modification, for functions of vectors (multiple values) with scalar or vector outputs.</p>
<p>For example, when <span class="math inline">\(f: R \rightarrow R^m\)</span> was a vector-valued function the derivative was defined similarly through a limit of <span class="math inline">\((f(t + \Delta t) - f(t))/{\Delta t}\)</span>, where each component needed to have a limit. This can be rewritten through <span class="math inline">\(f(t + dt) - f(t) = f'(t) dt\)</span>, again using differentials to avoid the higher order terms.</p>
<p>When <span class="math inline">\(f: R^n \rightarrow R\)</span> is a scalar-valued function with vector inputs, differentiability was defined by a gradient existing with <span class="math inline">\(f(c+h) - f(c) - \nabla{f}(c) \cdot h\)</span> being <span class="math inline">\(\mathscr{o}(\|h\|)\)</span>. In other words <span class="math inline">\(df = f(c + dh) - f(c) = \nabla{f}(c) \cdot dh\)</span>. The gradient has the same shape as <span class="math inline">\(c\)</span>, a column vector. If we take the row vector (e.g. <span class="math inline">\(f'(c) = \nabla{f}(c)^T\)</span>) then again we see <span class="math inline">\(df = f(c+dh) - f(c) = f'(c) dh\)</span>, where the last term uses matrix multiplication of a row vector times a column vector.</p>
<p>Finally, when <span class="math inline">\(f:R^n \rightarrow R^m\)</span>, the Jacobian was defined and characterized by <span class="math inline">\(\| f(x + dx) - f(x) - J_f(x)dx \|\)</span> being <span class="math inline">\(\mathscr{o}(\|dx\|)\)</span>. Again, we can express this through <span class="math inline">\(df = f(x + dx) - f(x) = f'(x)dx\)</span> where <span class="math inline">\(f'(x) = J_f(x)\)</span>.</p>
<p>In writing <span class="math inline">\(df = f(x + dx) - f(x) = f'(x) dx\)</span> generically, some underlying facts are left implicit: <span class="math inline">\(dx\)</span> has the same shape as <span class="math inline">\(x\)</span> (so can be added); <span class="math inline">\(f'(x) dx\)</span> may mean usual multiplication or matrix multiplication; and there is an underlying concept of distance and size that allows the above to be rigorous. This may be an absolute value or a norm.</p>
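<p>A minimal numeric sketch of this shared form (the function here is an assumed example, not one from the notes): the error <span class="math inline">\(f(x+dx) - f(x) - J_f(x)\,dx\)</span> shrinks faster than <span class="math inline">\(\|dx\|\)</span>:</p>

```julia
using LinearAlgebra

# Assumed example f: R² → R² with its hand-computed Jacobian J.
f(v) = [v[1]^2 + v[2], v[1]*v[2]]
J(v) = [2v[1] 1.0; v[2] v[1]]

x = [1.0, 2.0]
# ‖f(x+dx) - f(x) - J(x)·dx‖ / ‖dx‖ for shrinking dx
ratios = [norm(f(x + h*[1,1]) - f(x) - J(x)*(h*[1,1])) / norm(h*[1,1])
          for h in (1e-2, 1e-3, 1e-4)]
# each ratio is about 10 times smaller than the one before
```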
<p>Further, various differentiation rules apply such as the sum, product, and chain rules.</p>
<p>The <span class="citation" data-cites="BrightEdelmanJohnson">@BrightEdelmanJohnson</span> notes cover differentiation of functions in this uniform manner and then extend the form by treating derivatives as <em>linear operators</em>.</p>
<p>A <a href="https://en.wikipedia.org/wiki/Operator_(mathematics)">linear operator</a> is a mathematical object which satisfies</p>
<p><span class="math display">\[
f[\alpha v + \beta w] = \alpha f[v] + \beta f[w].
\]</span></p>
<p>where the <span class="math inline">\(\alpha\)</span> and <span class="math inline">\(\beta\)</span> are scalars, and <span class="math inline">\(v\)</span> and <span class="math inline">\(w\)</span>, possibly not scalars, come from a <em>vector space</em>. Regular multiplication and matrix multiplication are familiar linear operations, but there are many others.</p>
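<p>A quick check of this linearity for matrix multiplication, a familiar linear operator (the particular values are arbitrary):</p>

```julia
# Arbitrary matrix, vectors, and scalars to check A[αv + βw] = αA[v] + βA[w].
A = [1.0 2.0; 3.0 4.0]
v, w = [1.0, 0.0], [0.0, 1.0]
α, β = 2.0, -3.0

A*(α*v + β*w) ≈ α*(A*v) + β*(A*w)   # true
```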
<p>The referenced notes identify <span class="math inline">\(f'(x) dx\)</span> as <span class="math inline">\(f'(x)[dx]\)</span>, the latter emphasizing that <span class="math inline">\(f'(x)\)</span> acts on <span class="math inline">\(dx\)</span> and the notation is not commutative (e.g., it is not <span class="math inline">\(dx f'(x)\)</span>). The use of <span class="math inline">\([]\)</span> indicates that <span class="math inline">\(f'(x)\)</span> “acts” on <span class="math inline">\(dx\)</span> in a linear manner. It may be multiplication, matrix multiplication, or something else. Parentheses are not used, as they might imply function application or multiplication.</p>
<p>Linear operators are related to vector spaces.</p>
<p>A <a href="https://en.wikipedia.org/wiki/Vector_space">vector space</a> is a set of mathematical objects which can be added together and also multiplied by a scalar. Vectors of a common size, as previously discussed, are the typical example, with the previously discussed vector addition and scalar multiplication. Matrices of a common size (and some subclasses) also form a vector space. Many other sets of objects form vector spaces as well, for example polynomial functions of degree <span class="math inline">\(n\)</span> or less, continuous functions, or functions with a certain number of derivatives.</p>
<p>Take differentiable functions as an example, then the simplest derivative rules <span class="math inline">\([af(x) + bg(x)]' = a[f(x)]' + b[g(x)]'\)</span> show the linearity of the derivative in this setting.</p>
<p>A finite vector space is described by a <em>basis</em> – a minimal set of vectors needed to describe the space, after consideration of linear combinations. For some typical vector spaces, this is the set of special vectors with <span class="math inline">\(1\)</span> as one of the entries, and <span class="math inline">\(0\)</span> otherwise.</p>
<p>A key fact about a basis for a finite vector space is every vector in the vector space can be expressed <em>uniquely</em> as a linear combination of the basis vectors.</p>
<p>Vectors and matrices have properties that are generalizations of the real numbers. As vectors and matrices form vector spaces, the concept of addition of vectors and matrices is defined, as is scalar multiplication. Additionally, we have seen:</p>
<ul>
<li><p>The dot product between two vectors of the same length is defined easily (<span class="math inline">\(v\cdot w = \Sigma_i v_i w_i\)</span>). It is coupled with the length as <span class="math inline">\(\|v\|^2 = v\cdot v\)</span>.</p></li>
<li><p>Matrix multiplication is defined for two properly sized matrices. If <span class="math inline">\(A\)</span> is <span class="math inline">\(m \times k\)</span> and <span class="math inline">\(B\)</span> is <span class="math inline">\(k \times n\)</span> then <span class="math inline">\(AB\)</span> is a <span class="math inline">\(m\times n\)</span> matrix with <span class="math inline">\((i,j)\)</span> term given by the dot product of the <span class="math inline">\(i\)</span>th row of <span class="math inline">\(A\)</span> (viewed as a vector) and the <span class="math inline">\(j\)</span>th column of <span class="math inline">\(B\)</span> (viewed as a vector). Matrix multiplication is associative but <em>not</em> commutative. (E.g. <span class="math inline">\((AB)C = A(BC)\)</span> but <span class="math inline">\(AB\)</span> and <span class="math inline">\(BA\)</span> need not be equal, or even defined, as the shapes may not match up).</p></li>
<li><p>A square matrix <span class="math inline">\(A\)</span> has an <em>inverse</em> <span class="math inline">\(A^{-1}\)</span> if <span class="math inline">\(AA^{-1} = A^{-1}A = I\)</span>, where <span class="math inline">\(I\)</span> is the identity matrix (a matrix which is zero except on its diagonal entries, which are all <span class="math inline">\(1\)</span>). Square matrices may or may not have an inverse. A matrix without an inverse is called <em>singular</em>.</p></li>
<li><p>Viewing a vector as a matrix is possible. The association is typically through a <em>column</em> vector.</p></li>
<li><p>The <em>transpose</em> of a matrix comes by permuting the rows and columns. The transpose of a column vector is a row vector, so <span class="math inline">\(v\cdot w = v^T w\)</span>, where we use a superscript <span class="math inline">\(T\)</span> for the transpose. The transpose of a product is the product of the transposes – reversed: <span class="math inline">\((AB)^T = B^T A^T\)</span>; the transpose of a transpose is an identity operation: <span class="math inline">\((A^T)^T = A\)</span>; the inverse of a transpose is the transpose of the inverse: <span class="math inline">\((A^{-1})^T = (A^T)^{-1}\)</span>.</p></li>
<li><p>The <em>adjoint</em> of a matrix is related to the transpose, only complex conjugates are also taken.</p></li>
<li><p>Matrices for which <span class="math inline">\(A = A^T\)</span> are called symmetric.</p></li>
<li><p>A few of the operations on matrices are the transpose and the inverse. These return a matrix, when defined. There are also the determinant and the trace, which return a scalar from a matrix. The trace is just the sum of the diagonal; the determinant is more involved to compute, but was previously seen to have a relationship to the volume of a certain parallelepiped. There are a few other operations described in the following.</p></li>
</ul>
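<p>For instance, the trace and determinant of a small matrix, using the standard library <code>LinearAlgebra</code> functions <code>tr</code> and <code>det</code>:</p>

```julia
using LinearAlgebra

A = [1 2; 3 4]
tr(A)    # 1 + 4 = 5, the sum of the diagonal
det(A)   # 1*4 - 2*3 = -2.0
```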
@@ -233,26 +236,26 @@ df &= f(x + dx) - f(x)\\
\]</span></p>
<p>The term <span class="math inline">\(dx^T A dx\)</span> is dropped, as it is higher order (it goes to zero faster, containing two <span class="math inline">\(dx\)</span> terms). In the second-to-last step, an identity operation (taking the transpose of the scalar quantity) is used to simplify the algebra. Finally, as <span class="math inline">\(df = f'(x)[dx]\)</span>, the identification <span class="math inline">\(f'(x) = x^T(A^T+A)\)</span> is made, or, taking transposes, <span class="math inline">\(\nabla f = (A + A^T)x\)</span>.</p>
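<p>A numeric spot check of the identity <span class="math inline">\(\nabla f = (A + A^T)x\)</span> for an arbitrary non-symmetric <span class="math inline">\(A\)</span>, comparing against central differences:</p>

```julia
using LinearAlgebra

A = [1.0 2.0; 3.0 4.0]          # arbitrary, non-symmetric
f(x) = dot(x, A*x)              # f(x) = xᵀAx
x = [1.0, 2.0]

h = 1e-6
es = ([1.0, 0.0], [0.0, 1.0])   # coordinate directions
num = [(f(x + h*e) - f(x - h*e)) / 2h for e in es]   # numeric gradient

num ≈ (A + A')*x   # true: both are ≈ [12.0, 21.0]
```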
<p>Compare the elegance above with the component version; even though simplified, the component version still requires a specification of the size to carry out the following:</p>
<div id="c963d045" class="cell" data-execution_count="1">
<div class="sourceCode cell-code" id="cb1"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="im">using</span> <span class="bu">SymPy</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a><span class="pp">@syms</span> x[<span class="fl">1</span><span class="op">:</span><span class="fl">3</span>]<span class="op">::</span><span class="dt">real </span>A[<span class="fl">1</span><span class="op">:</span><span class="fl">3</span>, <span class="fl">1</span><span class="op">:</span><span class="fl">3</span>]<span class="op">::</span><span class="dt">real</span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a>u <span class="op">=</span> x<span class="op">'</span> <span class="op">*</span> A <span class="op">*</span> x</span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a>grad_u <span class="op">=</span> [<span class="fu">diff</span>(u, xi) for xi <span class="kw">in</span> x]</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<div class="cell-output cell-output-display cell-output-markdown" data-execution_count="2">
<p><span class="math inline">\(\left[\begin{smallmatrix}2 A₁_{₁} x₁ + A₁_{₂} x₂ + A₁_{₃} x₃ + A₂_{₁} x₂ + A₃_{₁} x₃\\A₁_{₂} x₁ + A₂_{₁} x₁ + 2 A₂_{₂} x₂ + A₂_{₃} x₃ + A₃_{₂} x₃\\A₁_{₃} x₁ + A₂_{₃} x₂ + A₃_{₁} x₁ + A₃_{₂} x₂ + 2 A₃_{₃} x₃\end{smallmatrix}\right]\)</span></p>
</div>
</div>
<p>Compare to the formula for the gradient just derived:</p>
<div id="2c247a7b" class="cell" data-execution_count="2">
<div class="sourceCode cell-code" id="cb2"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a>grad_u_1 <span class="op">=</span> (A <span class="op">+</span> A<span class="op">'</span>)<span class="op">*</span>x</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<div class="cell-output cell-output-display cell-output-markdown" data-execution_count="3">
<p><span class="math inline">\(\left[\begin{smallmatrix}2 A₁_{₁} x₁ + x₂ \left(A₁_{₂} + A₂_{₁}\right) + x₃ \left(A₁_{₃} + A₃_{₁}\right)\\2 A₂_{₂} x₂ + x₁ \left(A₁_{₂} + A₂_{₁}\right) + x₃ \left(A₂_{₃} + A₃_{₂}\right)\\2 A₃_{₃} x₃ + x₁ \left(A₁_{₃} + A₃_{₁}\right) + x₂ \left(A₂_{₃} + A₃_{₂}\right)\end{smallmatrix}\right]\)</span></p>
</div>
</div>
<p>The two are, of course, equal:</p>
<div id="07d2e734" class="cell" data-execution_count="3">
<div class="sourceCode cell-code" id="cb3"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="fu">all</span>(a <span class="op">==</span> b <span class="cf">for</span> (a,b) <span class="op">∈</span> <span class="fu">zip</span>(grad_u, grad_u_1))</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<div class="cell-output cell-output-display" data-execution_count="4">
<pre><code>true</code></pre>
</div>
</div>
@@ -384,24 +387,24 @@ f' = (a'b')c' \text{ or } f' = a'(b'c')
<section id="example-1" class="level5">
<h5 class="anchored" data-anchor-id="example-1">Example</h5>
<p>Using the <code>BenchmarkTools</code> package, we can check the time to compute various products:</p>
<div id="af27cda0" class="cell" data-execution_count="4">
<div class="sourceCode cell-code" id="cb5"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="im">using</span> <span class="bu">BenchmarkTools</span></span>
<span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a>n,j,k,m <span class="op">=</span> <span class="fl">20</span>,<span class="fl">15</span>,<span class="fl">10</span>,<span class="fl">1</span></span>
<span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a><span class="pp">@btime</span> <span class="fu">A*</span>(B<span class="op">*</span>C) setup<span class="op">=</span>(A<span class="op">=</span><span class="fu">rand</span>(n,j);B<span class="op">=</span><span class="fu">rand</span>(j,k); C<span class="op">=</span><span class="fu">rand</span>(k,m));</span>
<span id="cb5-4"><a href="#cb5-4" aria-hidden="true" tabindex="-1"></a><span class="pp">@btime</span> (A<span class="op">*</span>B)<span class="op">*</span>C setup<span class="op">=</span>(A<span class="op">=</span><span class="fu">rand</span>(n,j);B<span class="op">=</span><span class="fu">rand</span>(j,k); C<span class="op">=</span><span class="fu">rand</span>(k,m));</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code> 452.474 ns (4 allocations: 432 bytes)
837.192 ns (4 allocations: 1.88 KiB)</code></pre>
</div>
</div>
<p>The latter computation is noticeably slower – here nearly twice as slow.</p>
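<p>A back-of-the-envelope count explains the gap: multiplying a <span class="math inline">\(p\times q\)</span> matrix by a <span class="math inline">\(q\times r\)</span> matrix takes roughly <span class="math inline">\(pqr\)</span> scalar multiplications, so the two parenthesizations have very different costs:</p>

```julia
# Same shapes as in the benchmark above.
n, j, k, m = 20, 15, 10, 1
cost_right = j*k*m + n*j*m   # A*(B*C): form B*C, then multiply by A
cost_left  = n*j*k + n*k*m   # (A*B)*C: form A*B, then multiply by C
(cost_right, cost_left)      # (450, 3200)
```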
<p>The relationship reverses when the first matrix is skinny and the last is not:</p>
<div id="786d1609" class="cell" data-execution_count="5">
<div class="sourceCode cell-code" id="cb7"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="pp">@btime</span> <span class="fu">A*</span>(B<span class="op">*</span>C) setup<span class="op">=</span>(A<span class="op">=</span><span class="fu">rand</span>(m,k);B<span class="op">=</span><span class="fu">rand</span>(k,j); C<span class="op">=</span><span class="fu">rand</span>(j,n));</span>
<span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a><span class="pp">@btime</span> (A<span class="op">*</span>B)<span class="op">*</span>C setup<span class="op">=</span>(A<span class="op">=</span><span class="fu">rand</span>(m,k);B<span class="op">=</span><span class="fu">rand</span>(k,j); C<span class="op">=</span><span class="fu">rand</span>(j,n));</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code> 1.098 μs (4 allocations: 1.88 KiB)
779.020 ns (4 allocations: 432 bytes)</code></pre>
</div>
</div>
</section>
@@ -415,30 +418,31 @@ f' = (a'b')c' \text{ or } f' = a'(b'c')
<section id="derivatives-of-matrix-functions" class="level2">
<h2 class="anchored" data-anchor-id="derivatives-of-matrix-functions">Derivatives of matrix functions</h2>
<p>What is the derivative of <span class="math inline">\(f(A) = A^2\)</span>?</p>
<p>The function <span class="math inline">\(f\)</span> takes a <span class="math inline">\(n\times n\)</span> matrix and returns a matrix of the same size. This innocuous question isn’t directly handled, here, by the Jacobian, which is defined for vector-valued functions <span class="math inline">\(f:R^n \rightarrow R^m\)</span>.</p>
<p>This derivative can be derived directly from the <em>product rule</em>:</p>
<p><span class="math display">\[
\begin{align*}
df &= d[AA]\\
&= A dA + dA A
\end{align*}
\]</span></p>
<p>That is, <span class="math inline">\(f'(A)\)</span> is the operator <span class="math inline">\(f'(A)[\delta A] = A \delta A + \delta A A\)</span> and not <span class="math inline">\(2A\delta A\)</span>, as <span class="math inline">\(A\)</span> may not commute with <span class="math inline">\(\delta A\)</span>.</p>
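<p>A numeric illustration (with an arbitrary <span class="math inline">\(A\)</span> and a small <span class="math inline">\(\delta A\)</span> that does not commute with it) that <span class="math inline">\(A\,\delta A + \delta A\, A\)</span>, and not <span class="math inline">\(2A\,\delta A\)</span>, matches <span class="math inline">\(f(A + \delta A) - f(A)\)</span> to higher order:</p>

```julia
A  = [1.0 2.0; 3.0 4.0]
δA = 1e-6 * [0.0 1.0; 0.0 0.0]   # small, and it does not commute with A

lhs = (A + δA)^2 - A^2           # f(A + δA) - f(A)
err_good = maximum(abs, lhs - (A*δA + δA*A))   # higher order: ~‖δA‖²
err_bad  = maximum(abs, lhs - 2A*δA)           # first order: ~‖δA‖, much larger
```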
<p>XXX THIS ISN’T EVEN RIGHT</p>
<section id="vectorization-of-a-matrix" class="level3">
<h3 class="anchored" data-anchor-id="vectorization-of-a-matrix">Vectorization of a matrix</h3>
<p>Alternatively, we can identify <span class="math inline">\(A\)</span> through its components, as a vector in <span class="math inline">\(R^{n^2}\)</span> and then leverage the Jacobian.</p>
<p>One such identification is vectorization – consecutively stacking the column vectors into a vector. In <code>Julia</code> the <code>vec</code> function does this operation:</p>
<div id="9743defb" class="cell" data-execution_count="6">
<div class="sourceCode cell-code" id="cb9"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a><span class="pp">@syms</span> A[<span class="fl">1</span><span class="op">:</span><span class="fl">2</span>, <span class="fl">1</span><span class="op">:</span><span class="fl">2</span>]</span>
<span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a><span class="fu">vec</span>(A)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
|
||||
<div class="cell-output cell-output-display cell-output-markdown" data-execution_count="7">
<p><span class="math inline">\(\left[\begin{smallmatrix}A₁_{₁}\\A₂_{₁}\\A₁_{₂}\\A₂_{₂}\end{smallmatrix}\right]\)</span></p>
</div>
</div>
<p>The stacking by column follows how <code>Julia</code> stores matrices and how <code>Julia</code> references a matrix’s entries by linear index:</p>
<div id="35d92b4b" class="cell" data-execution_count="7">
<div class="sourceCode cell-code" id="cb10"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a><span class="fu">vec</span>(A) <span class="op">==</span> [A[i] for i <span class="kw">in</span> <span class="fu">eachindex</span>(A)]</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<div class="cell-output cell-output-display" data-execution_count="8">
<pre><code>true</code></pre>
</div>
</div>
@@ -447,11 +451,11 @@ f(A) &= [AA]'\\
\tilde{f}(\text{vec}(A)) = \text{vec}(f(A))
\]</span></p>
<p>We use <code>SymPy</code> to compute the Jacobian of this vector-valued function.</p>
<div id="5074252c" class="cell" data-execution_count="8">
<div class="sourceCode cell-code" id="cb12"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true" tabindex="-1"></a><span class="pp">@syms</span> A[<span class="fl">1</span><span class="op">:</span><span class="fl">3</span>, <span class="fl">1</span><span class="op">:</span><span class="fl">3</span>]<span class="op">::</span><span class="dt">real</span></span>
<span id="cb12-2"><a href="#cb12-2" aria-hidden="true" tabindex="-1"></a><span class="fu">f</span>(x) <span class="op">=</span> x<span class="op">^</span><span class="fl">2</span></span>
<span id="cb12-3"><a href="#cb12-3" aria-hidden="true" tabindex="-1"></a>J <span class="op">=</span> <span class="fu">vec</span>(<span class="fu">f</span>(A)).<span class="fu">jacobian</span>(<span class="fu">vec</span>(A)) <span class="co"># jacobian of f̃</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<div class="cell-output cell-output-display cell-output-markdown" data-execution_count="9">
<p><span class="math inline">\(\left[\begin{smallmatrix}2 A₁_{₁} & A₁_{₂} & A₁_{₃} & A₂_{₁} & 0 & 0 & A₃_{₁} & 0 & 0\\A₂_{₁} & A₁_{₁} + A₂_{₂} & A₂_{₃} & 0 & A₂_{₁} & 0 & 0 & A₃_{₁} & 0\\A₃_{₁} & A₃_{₂} & A₁_{₁} + A₃_{₃} & 0 & 0 & A₂_{₁} & 0 & 0 & A₃_{₁}\\A₁_{₂} & 0 & 0 & A₁_{₁} + A₂_{₂} & A₁_{₂} & A₁_{₃} & A₃_{₂} & 0 & 0\\0 & A₁_{₂} & 0 & A₂_{₁} & 2 A₂_{₂} & A₂_{₃} & 0 & A₃_{₂} & 0\\0 & 0 & A₁_{₂} & A₃_{₁} & A₃_{₂} & A₂_{₂} + A₃_{₃} & 0 & 0 & A₃_{₂}\\A₁_{₃} & 0 & 0 & A₂_{₃} & 0 & 0 & A₁_{₁} + A₃_{₃} & A₁_{₂} & A₁_{₃}\\0 & A₁_{₃} & 0 & 0 & A₂_{₃} & 0 & A₂_{₁} & A₂_{₂} + A₃_{₃} & A₂_{₃}\\0 & 0 & A₁_{₃} & 0 & 0 & A₂_{₃} & A₃_{₁} & A₃_{₂} & 2 A₃_{₃}\end{smallmatrix}\right]\)</span></p>
</div>
</div>
@@ -459,10 +463,10 @@ f(A) &= [AA]'\\
<p>A basic course in linear algebra shows that any linear operator on a finite-dimensional vector space can be represented as a matrix. The basic idea is to represent what the operator does to each <em>basis</em> element and put these values as columns of the matrix.</p>
<p>In this <span class="math inline">\(3 \times 3\)</span> case, the linear operator works on an object with <span class="math inline">\(9\)</span> slots and returns an object with <span class="math inline">\(9\)</span> slots, so the matrix will be <span class="math inline">\(9 \times 9\)</span>.</p>
<p>The basis elements are simply the matrices with a <span class="math inline">\(1\)</span> in spot <span class="math inline">\((i,j)\)</span> and zero elsewhere. Here we generate them through a function:</p>
<div id="5b9348a2" class="cell" data-execution_count="9">
<div class="sourceCode cell-code" id="cb13"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb13-1"><a href="#cb13-1" aria-hidden="true" tabindex="-1"></a><span class="fu">basis</span>(i,j,A) <span class="op">=</span> (b<span class="op">=</span><span class="fu">zeros</span>(<span class="dt">Int</span>, <span class="fu">size</span>(A)<span class="op">...</span>); b[i,j] <span class="op">=</span> <span class="fl">1</span>; b)</span>
<span id="cb13-2"><a href="#cb13-2" aria-hidden="true" tabindex="-1"></a>JJ <span class="op">=</span> [<span class="fu">vec</span>(<span class="fu">basis</span>(i,j,A)<span class="op">*</span>A <span class="op">+</span> <span class="fu">A*basis</span>(i,j,A)) for j <span class="kw">in</span> <span class="fl">1</span><span class="op">:</span><span class="fl">3</span> for i <span class="kw">in</span> <span class="fl">1</span><span class="op">:</span><span class="fl">3</span>]</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<div class="cell-output cell-output-display" data-execution_count="10">
<pre><code>9-element Vector{Vector{Sym{PyCall.PyObject}}}:
 [2*A₁_₁, A₂_₁, A₃_₁, A₁_₂, 0, 0, A₁_₃, 0, 0]
 [A₁_₂, A₁_₁ + A₂_₂, A₃_₂, 0, A₁_₂, 0, 0, A₁_₃, 0]
@@ -477,16 +481,16 @@ f(A) &= [AA]'\\
</div>
<p>The elements of <code>JJ</code> show the representation of each of the <span class="math inline">\(9\)</span> basis elements under the linear transformation.</p>
<p>To construct the matrix representing the linear operator, we need to concatenate these horizontally as column vectors:</p>
<div id="52d7fb12" class="cell" data-execution_count="10">
<div class="sourceCode cell-code" id="cb15"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb15-1"><a href="#cb15-1" aria-hidden="true" tabindex="-1"></a>JJ <span class="op">=</span> <span class="fu">hcat</span>(JJ<span class="op">...</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<div class="cell-output cell-output-display cell-output-markdown" data-execution_count="11">
<p><span class="math inline">\(\left[\begin{smallmatrix}2 A₁_{₁} & A₁_{₂} & A₁_{₃} & A₂_{₁} & 0 & 0 & A₃_{₁} & 0 & 0\\A₂_{₁} & A₁_{₁} + A₂_{₂} & A₂_{₃} & 0 & A₂_{₁} & 0 & 0 & A₃_{₁} & 0\\A₃_{₁} & A₃_{₂} & A₁_{₁} + A₃_{₃} & 0 & 0 & A₂_{₁} & 0 & 0 & A₃_{₁}\\A₁_{₂} & 0 & 0 & A₁_{₁} + A₂_{₂} & A₁_{₂} & A₁_{₃} & A₃_{₂} & 0 & 0\\0 & A₁_{₂} & 0 & A₂_{₁} & 2 A₂_{₂} & A₂_{₃} & 0 & A₃_{₂} & 0\\0 & 0 & A₁_{₂} & A₃_{₁} & A₃_{₂} & A₂_{₂} + A₃_{₃} & 0 & 0 & A₃_{₂}\\A₁_{₃} & 0 & 0 & A₂_{₃} & 0 & 0 & A₁_{₁} + A₃_{₃} & A₁_{₂} & A₁_{₃}\\0 & A₁_{₃} & 0 & 0 & A₂_{₃} & 0 & A₂_{₁} & A₂_{₂} + A₃_{₃} & A₂_{₃}\\0 & 0 & A₁_{₃} & 0 & 0 & A₂_{₃} & A₃_{₁} & A₃_{₂} & 2 A₃_{₃}\end{smallmatrix}\right]\)</span></p>
</div>
</div>
<p>The matrix <span class="math inline">\(JJ\)</span> is identical to <span class="math inline">\(J\)</span>, above:</p>
<div id="c81dcb86" class="cell" data-execution_count="11">
<div class="sourceCode cell-code" id="cb16"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb16-1"><a href="#cb16-1" aria-hidden="true" tabindex="-1"></a><span class="fu">all</span>(j <span class="op">==</span> jj <span class="cf">for</span> (j, jj) <span class="kw">in</span> <span class="fu">zip</span>(J, JJ))</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<div class="cell-output cell-output-display" data-execution_count="12">
<pre><code>true</code></pre>
</div>
</div>
@@ -509,10 +513,10 @@ a_{m1}B & a_{m2}B & \cdots & a_{mn}B
\end{bmatrix}
\]</span></p>
<p>The function <code>kron</code> forms this product:</p>
<div id="641a155b" class="cell" data-execution_count="12">
<div class="sourceCode cell-code" id="cb18"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb18-1"><a href="#cb18-1" aria-hidden="true" tabindex="-1"></a><span class="pp">@syms</span> A[<span class="fl">1</span><span class="op">:</span><span class="fl">2</span>, <span class="fl">1</span><span class="op">:</span><span class="fl">3</span>] B[<span class="fl">1</span><span class="op">:</span><span class="fl">3</span>, <span class="fl">1</span><span class="op">:</span><span class="fl">4</span>]</span>
<span id="cb18-2"><a href="#cb18-2" aria-hidden="true" tabindex="-1"></a><span class="fu">kron</span>(A, B) <span class="co"># same as hcat((vcat((A[i,j]*B for i in 1:2)...) for j in 1:3)...)</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<div class="cell-output cell-output-display cell-output-markdown" data-execution_count="13">
<p><span class="math inline">\(\left[\begin{smallmatrix}A₁_{₁} B₁_{₁} & A₁_{₁} B₁_{₂} & A₁_{₁} B₁_{₃} & A₁_{₁} B₁_{₄} & A₁_{₂} B₁_{₁} & A₁_{₂} B₁_{₂} & A₁_{₂} B₁_{₃} & A₁_{₂} B₁_{₄} & A₁_{₃} B₁_{₁} & A₁_{₃} B₁_{₂} & A₁_{₃} B₁_{₃} & A₁_{₃} B₁_{₄}\\A₁_{₁} B₂_{₁} & A₁_{₁} B₂_{₂} & A₁_{₁} B₂_{₃} & A₁_{₁} B₂_{₄} & A₁_{₂} B₂_{₁} & A₁_{₂} B₂_{₂} & A₁_{₂} B₂_{₃} & A₁_{₂} B₂_{₄} & A₁_{₃} B₂_{₁} & A₁_{₃} B₂_{₂} & A₁_{₃} B₂_{₃} & A₁_{₃} B₂_{₄}\\A₁_{₁} B₃_{₁} & A₁_{₁} B₃_{₂} & A₁_{₁} B₃_{₃} & A₁_{₁} B₃_{₄} & A₁_{₂} B₃_{₁} & A₁_{₂} B₃_{₂} & A₁_{₂} B₃_{₃} & A₁_{₂} B₃_{₄} & A₁_{₃} B₃_{₁} & A₁_{₃} B₃_{₂} & A₁_{₃} B₃_{₃} & A₁_{₃} B₃_{₄}\\A₂_{₁} B₁_{₁} & A₂_{₁} B₁_{₂} & A₂_{₁} B₁_{₃} & A₂_{₁} B₁_{₄} & A₂_{₂} B₁_{₁} & A₂_{₂} B₁_{₂} & A₂_{₂} B₁_{₃} & A₂_{₂} B₁_{₄} & A₂_{₃} B₁_{₁} & A₂_{₃} B₁_{₂} & A₂_{₃} B₁_{₃} & A₂_{₃} B₁_{₄}\\A₂_{₁} B₂_{₁} & A₂_{₁} B₂_{₂} & A₂_{₁} B₂_{₃} & A₂_{₁} B₂_{₄} & A₂_{₂} B₂_{₁} & A₂_{₂} B₂_{₂} & A₂_{₂} B₂_{₃} & A₂_{₂} B₂_{₄} & A₂_{₃} B₂_{₁} & A₂_{₃} B₂_{₂} & A₂_{₃} B₂_{₃} & A₂_{₃} B₂_{₄}\\A₂_{₁} B₃_{₁} & A₂_{₁} B₃_{₂} & A₂_{₁} B₃_{₃} & A₂_{₁} B₃_{₄} & A₂_{₂} B₃_{₁} & A₂_{₂} B₃_{₂} & A₂_{₂} B₃_{₃} & A₂_{₂} B₃_{₄} & A₂_{₃} B₃_{₁} & A₂_{₃} B₃_{₂} & A₂_{₃} B₃_{₃} & A₂_{₃} B₃_{₄}\end{smallmatrix}\right]\)</span></p>
</div>
</div>
@@ -533,11 +537,11 @@ a_{m1}B & a_{m2}B & \cdots & a_{mn}B
<p>Appropriate sizes for <span class="math inline">\(A\)</span>, <span class="math inline">\(B\)</span>, and <span class="math inline">\(C\)</span> are determined by the various products in <span class="math inline">\(BCA^T\)</span>.</p>
<p>If <span class="math inline">\(A\)</span> is <span class="math inline">\(m \times n\)</span> and <span class="math inline">\(B\)</span> is <span class="math inline">\(r \times s\)</span>, then since <span class="math inline">\(BC\)</span> is defined, <span class="math inline">\(C\)</span> has <span class="math inline">\(s\)</span> rows, and since <span class="math inline">\(CA^T\)</span> is defined, <span class="math inline">\(C\)</span> must have <span class="math inline">\(n\)</span> columns, as <span class="math inline">\(A^T\)</span> is <span class="math inline">\(n \times m\)</span>, so <span class="math inline">\(C\)</span> must be <span class="math inline">\(s\times n\)</span>. Checking this is correct on the other side, <span class="math inline">\(A \times B\)</span> would be size <span class="math inline">\(mr \times ns\)</span> and <span class="math inline">\(\vec{C}\)</span> would be size <span class="math inline">\(sn\)</span>, so that product works, size-wise.</p>
<p>The referred-to notes have an explanation for this formula, but we confirm with an example with <span class="math inline">\(m=n=2\)</span>, <span class="math inline">\(r=s=3\)</span>:</p>
<div id="cf055fff" class="cell" data-execution_count="13">
<div class="sourceCode cell-code" id="cb19"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb19-1"><a href="#cb19-1" aria-hidden="true" tabindex="-1"></a><span class="pp">@syms</span> A[<span class="fl">1</span><span class="op">:</span><span class="fl">2</span>, <span class="fl">1</span><span class="op">:</span><span class="fl">2</span>]<span class="op">::</span><span class="dt">real </span>B[<span class="fl">1</span><span class="op">:</span><span class="fl">3</span>, <span class="fl">1</span><span class="op">:</span><span class="fl">3</span>]<span class="op">::</span><span class="dt">real </span>C[<span class="fl">1</span><span class="op">:</span><span class="fl">3</span>, <span class="fl">1</span><span class="op">:</span><span class="fl">2</span>]<span class="op">::</span><span class="dt">real</span></span>
<span id="cb19-2"><a href="#cb19-2" aria-hidden="true" tabindex="-1"></a>L, R <span class="op">=</span> <span class="fu">kron</span>(A,B)<span class="fu">*vec</span>(C), <span class="fu">vec</span>(B<span class="op">*</span>C<span class="op">*</span>A<span class="op">'</span>)</span>
<span id="cb19-3"><a href="#cb19-3" aria-hidden="true" tabindex="-1"></a><span class="fu">all</span>(l <span class="op">==</span> r <span class="cf">for</span> (l, r) <span class="op">∈</span> <span class="fu">zip</span>(L, R))</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<div class="cell-output cell-output-display" data-execution_count="14">
<pre><code>true</code></pre>
</div>
</div>
@@ -557,13 +561,13 @@ a_{m1}B & a_{m2}B & \cdots & a_{mn}B
\left((I \otimes A) + (A^T \otimes I)\right) \text{vec}(dA)
\]</span></p>
<p>We should then get the Jacobian we computed from the following:</p>
<div id="3e891953" class="cell" data-execution_count="14">
<div class="sourceCode cell-code" id="cb21"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb21-1"><a href="#cb21-1" aria-hidden="true" tabindex="-1"></a><span class="pp">@syms</span> A[<span class="fl">1</span><span class="op">:</span><span class="fl">3</span>, <span class="fl">1</span><span class="op">:</span><span class="fl">3</span>]<span class="op">::</span><span class="dt">real</span></span>
<span id="cb21-2"><a href="#cb21-2" aria-hidden="true" tabindex="-1"></a><span class="im">using</span> <span class="bu">LinearAlgebra</span>: I</span>
<span id="cb21-3"><a href="#cb21-3" aria-hidden="true" tabindex="-1"></a>J <span class="op">=</span> <span class="fu">vec</span>(A<span class="op">^</span><span class="fl">2</span>).<span class="fu">jacobian</span>(<span class="fu">vec</span>(A))</span>
<span id="cb21-4"><a href="#cb21-4" aria-hidden="true" tabindex="-1"></a>JJ <span class="op">=</span> <span class="fu">kron</span>(<span class="fu">I</span>(<span class="fl">3</span>), A) <span class="op">+</span> <span class="fu">kron</span>(A<span class="op">'</span>, <span class="fu">I</span>(<span class="fl">3</span>))</span>
<span id="cb21-5"><a href="#cb21-5" aria-hidden="true" tabindex="-1"></a><span class="fu">all</span>(j <span class="op">==</span> jj <span class="cf">for</span> (j,jj) <span class="kw">in</span> <span class="fu">zip</span>(J,JJ))</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<div class="cell-output cell-output-display" data-execution_count="15">
<pre><code>true</code></pre>
</div>
</div>
@@ -622,11 +626,11 @@ C_{ij},
\]</span></p>
<p>This agrees with the formula that computes the inverse of a matrix through its cofactor matrix divided by its determinant.</p>
<p>That the trace gets involved can be seen from this computation, which shows the only first-order terms are from the diagonal sum:</p>
<div id="2d53b599" class="cell" data-execution_count="15">
<div class="sourceCode cell-code" id="cb23"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb23-1"><a href="#cb23-1" aria-hidden="true" tabindex="-1"></a><span class="im">using</span> <span class="bu">LinearAlgebra</span></span>
<span id="cb23-2"><a href="#cb23-2" aria-hidden="true" tabindex="-1"></a><span class="pp">@syms</span> dA[<span class="fl">1</span><span class="op">:</span><span class="fl">2</span>, <span class="fl">1</span><span class="op">:</span><span class="fl">2</span>]</span>
<span id="cb23-3"><a href="#cb23-3" aria-hidden="true" tabindex="-1"></a><span class="fu">det</span>(I <span class="op">+</span> dA) <span class="op">-</span> <span class="fu">det</span>(I)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<div class="cell-output cell-output-display cell-output-markdown" data-execution_count="16">
<p><span class="math inline">\(dA₁_{₁} dA₂_{₂} + dA₁_{₁} - dA₁_{₂} dA₂_{₁} + dA₂_{₂}\)</span></p>
</div>
</div>
@@ -3,6 +3,9 @@
XXX Add in examples from paper XXX
optimization? large number of parameters? ,...

Mention numerator layout from https://en.wikipedia.org/wiki/Matrix_calculus#Layout_conventions

::: {.callout-note}
## Based on Bright, Edelman, and Johnson's notes

@@ -12,13 +15,13 @@ This section samples material from the notes [Matrix Calculus (for Machine Learn
We have seen several "derivatives" of a function, based on the number of inputs and outputs. The first one was for functions $f: R \rightarrow R$.

In this case, we saw that $f$ has a derivative at $c$ if this limit exists:

$$
\lim_{h \rightarrow 0}\frac{f(c + h) - f(c)}{h}.
$$

The derivative is a function of $x$, given by this rule for any $x$ in the domain. Common notation is:

$$
f'(x) = \frac{dy}{dx} = \lim_{h \rightarrow 0}\frac{f(x + h) - f(x)}{h}
@@ -27,9 +30,9 @@ $$
(when the limit exists).


This limit gets re-expressed in different ways:

* linearization writes $f(x+\Delta x) - f(x) \approx f'(x)\Delta x$, where $\Delta x$ is a small displacement from $x$. The reason there isn't equality is the unwritten higher-order terms that vanish in the limit.

* Alternate limits. Another way of writing this is in terms of explicit smaller-order terms:
@@ -39,13 +42,15 @@ $$

which means if we divide both sides by $h$ and take the limit, we will get $0$ on the right and the relationship on the left.

* Differential notation simply writes this as $dy = f'(x)dx$. Focusing on $f$ and not $y=f(x)$, we might write

$$
df = f(x+dx) - f(x) = f'(x) dx.
$$

We will see all the derivatives encountered so far can be similarly expressed.

In the above, $df$ and $dx$ are differentials, made rigorous by a limit, which hides the higher order terms.

In these notes the limit has been defined, with suitable modification, for functions of vectors (multiple values) with scalar or vector outputs.
@@ -53,7 +58,7 @@ In these notes the limit has been defined, with suitable modification, for funct

For example, when $f: R \rightarrow R^m$ was a vector-valued function the derivative was defined similarly through a limit of $(f(t + \Delta t) - f(t))/{\Delta t}$, where each component needed to have a limit. This can be rewritten through $f(t + dt) - f(t) = f'(t) dt$, again using differentials to avoid the higher order terms.

When $f: R^n \rightarrow R$ is a scalar-valued function with vector inputs, differentiability was defined by a gradient existing with $f(c+h) - f(c) - \nabla{f}(c) \cdot h$ being $\mathscr{o}(\|h\|)$. In other words $df = f(c + dh) - f(c) = \nabla{f}(c) \cdot dh$. The gradient has the same shape as $c$, a column vector. If we take the row vector (e.g. $f'(c) = \nabla{f}(c)^T$) then again we see $df = f(c+dh) - f(c) = f'(c) dh$, where the last term uses matrix multiplication of a row vector times a column vector.

Finally, when $f:R^n \rightarrow R^m$, the Jacobian was defined and characterized by
$\| f(x + dx) - f(x) - J_f(x)dx \|$ being $\mathscr{o}(\|dx\|)$. Again, we can express this through $df = f(x + dx) - f(x) = f'(x)dx$ where $f'(x) = J_f(x)$.
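
This characterization can be illustrated numerically. The following is only a sketch: the function and its hand-computed Jacobian are arbitrary choices, not an example from the notes.

```{julia}
using LinearAlgebra: norm

f(x) = [x[1]^2 + x[2], x[1]*x[2]]   # f: R^2 -> R^2
J(x) = [2x[1] 1; x[2] x[1]]         # its Jacobian, computed by hand

x, dx = [1.0, 2.0], [1e-5, -2e-5]

err = norm(f(x + dx) - f(x) - J(x)*dx)
err / norm(dx)    # much smaller than 1, reflecting the o(||dx||) behavior
```

Shrinking `dx` further makes the ratio smaller still, which is what $\mathscr{o}(\|dx\|)$ demands.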
@@ -61,7 +66,7 @@ $\| f(x + dx) - f(x) - J_f(x)dx \|$ being $\mathscr{o}(\|dx\|)$. Again, we can e

In writing $df = f(x + dx) - f(x) = f'(x) dx$ generically, some underlying facts are left implicit: $dx$ has the same shape as $x$ (so can be added); $f'(x) dx$ may mean usual multiplication or matrix multiplication; and there is an underlying concept of distance and size that allows the above to be rigorous. This may be an absolute value or a norm.

Further, various differentiation rules apply such as the sum, product, and chain rules.

The @BrightEdelmanJohnson notes cover differentiation of functions in this uniform manner and then extend the form by treating derivatives as *linear operators*.
@@ -75,29 +80,35 @@ $$

where $\alpha$ and $\beta$ are scalars, while $v$ and $w$ need not be and instead come from a *vector space*. Regular multiplication and matrix multiplication are familiar linear operations, but there are many others.

The referenced notes identify $f'(x) dx$ as $f'(x)[dx]$, the latter emphasizing that $f'(x)$ acts on $dx$ and the notation is not commutative (e.g., it is not $dx f'(x)$). The brackets indicate that $f'(x)$ "acts" on $dx$ in a linear manner; this may be multiplication, matrix multiplication, or something else. Parentheses are not used, as they might suggest function application or multiplication.

Linear operators are related to vector spaces.
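
A tiny check of linearity for one such operator on matrices, $L[X] = AX + XA$ (the matrices and scalars below are arbitrary choices for illustration):

```{julia}
# L is linear but is not plain multiplication by a single matrix.
A = [1.0 2.0; 3.0 4.0]
L(X) = A*X + X*A

X = [1.0 0.0; 1.0 1.0]
Y = [2.0 1.0; 0.0 1.0]
a, b = 3.0, -2.0

L(a*X + b*Y) == a*L(X) + b*L(Y)   # true: L respects linear combinations
```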

A [vector space](https://en.wikipedia.org/wiki/Vector_space) is a set of mathematical objects which can be added together and also multiplied by a scalar. Vectors of a common size, with the vector addition and scalar multiplication previously discussed, are the typical example. Matrices of a common size (and some subclasses) also form a vector space. Many other sets of objects form vector spaces as well; examples include polynomial functions of degree $n$ or less, continuous functions, and functions with a certain number of derivatives.

Take differentiable functions as an example, then the simplest derivative rules $[af(x) + bg(x)]' = a[f(x)]' + b[g(x)]'$ show the linearity of the derivative in this setting.

A finite-dimensional vector space is described by a *basis* -- a minimal set of vectors needed to describe the space, after consideration of linear combinations. For some typical vector spaces, this is the set of special vectors with $1$ as one of the entries, and $0$ otherwise.

A key fact about a basis for a finite-dimensional vector space is that every vector in the vector space can be expressed *uniquely* as a linear combination of the basis vectors.

Vectors and matrices have properties that are generalizations of the real numbers. As vectors and matrices form vector spaces, the concept of addition of vectors and matrices is defined, as is scalar multiplication. Additionally, we have seen:

Vectors and matrices have properties that are generalizations of the real numbers. As vectors and matrices form vector spaces, the concept of addition of vectors and matrices is defined, as is scalar multiplication. Additionally, we have seen:

* The dot product between two vectors of the same length is defined easily ($v\cdot w = \Sigma_i v_i w_i$). It is coupled with the length as $\|v\|^2 = v\cdot v$.

* Matrix multiplication is defined for two properly sized matrices. If $A$ is $m \times k$ and $B$ is $k \times n$ then $AB$ is a $m\times n$ matrix with $(i,j)$ term given by the dot product of the $i$th row of $A$ (viewed as a vector) and the $j$th column of $B$ (viewed as a vector). Matrix multiplication is associative but *not* commutative. (E.g. $(AB)C = A(BC)$ but $AB$ and $BA$ need not be equal, or even defined, as the shapes may not match up).

* A square matrix $A$ has an *inverse* $A^{-1}$ if $AA^{-1} = A^{-1}A = I$, where $I$ is the identity matrix (a matrix which is zero except on its diagonal entries, which are all $1$). Square matrices may or may not have an inverse. A matrix without an inverse is called *singular*.

* Viewing a vector as a matrix is possible. The association is typically through a *column* vector.

* The *transpose* of a matrix comes by permuting the rows and columns. The transpose of a column vector is a row vector, so $v\cdot w = v^T w$, where we use a superscript $T$ for the transpose. The transpose of a product is the product of the transposes -- reversed: $(AB)^T = B^T A^T$; the transpose of a transpose is an identity operation: $(A^T)^T = A$; the inverse of a transpose is the transpose of the inverse: $(A^{-1})^T = (A^T)^{-1}$.

* The *adjoint* of a matrix is related to the transpose, only complex conjugates are also taken.
* Matrices for which $A = A^T$ are called symmetric.
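
These identities can be spot-checked numerically; a small sketch with arbitrarily chosen matrices:

```{julia}
using LinearAlgebra  # for inv

A = [1.0 2.0; 3.0 4.0]
B = [0.0 1.0; 1.0 1.0]

(A*B)' == B' * A'      # the transpose of a product reverses the factors
(A')' == A             # transposing twice is the identity
inv(A') ≈ inv(A)'      # inverse and transpose commute
A*B == B*A             # false: multiplication does not commute
```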
@@ -377,20 +388,20 @@ XXXX

What is the derivative of $f(A) = A^2$?

The function $f$ takes an $n\times n$ matrix and returns a matrix of the same size. This innocuous question isn't directly handled here by the Jacobian, which is defined for vector-valued functions $f:R^n \rightarrow R^m$.

This derivative can be computed directly from the *product rule*:

$$
\begin{align*}
f(A) &= AA\\
df &= d(AA)\\
&= A\,dA + dA\,A
\end{align*}
$$

That is $f'(A)$ is the operator $f'(A)[\delta A] = A \delta A + \delta A A$ and not $2A\delta A$, as $A$ may not commute with $\delta A$.
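
A quick numeric confirmation, where `A` and `dA` are arbitrary choices:

```{julia}
# f(A) = A^2; check df = A*dA + dA*A, and that 2A*dA is wrong.
A  = [1.0 2.0; 3.0 4.0]
dA = 1e-6 * [1.0 -1.0; 2.0 0.5]

f(M) = M^2
lhs = f(A + dA) - f(A)      # the actual change df
rhs = A*dA + dA*A           # the proposed derivative applied to dA

maximum(abs.(lhs - rhs))    # second order in dA -- tiny
maximum(abs.(lhs - 2A*dA))  # first order in dA -- not tiny
```

The leftover in the first comparison is exactly `dA^2`, the higher-order term hidden by the differential notation; the second comparison differs by the commutator `dA*A - A*dA`, which does not vanish.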
### Vectorization of a matrix