diff --git a/quarto/differentiable_vector_calculus/vector_fields.qmd b/quarto/differentiable_vector_calculus/vector_fields.qmd
index b8172b6..3e3363f 100644
--- a/quarto/differentiable_vector_calculus/vector_fields.qmd
+++ b/quarto/differentiable_vector_calculus/vector_fields.qmd
@@ -24,7 +24,7 @@ For a scalar function $f: R^n \rightarrow R$, the gradient of $f$, $\nabla{f}$,
| $f: R\rightarrow R$ | univariate | familiar graph of function | $f$ |
| $f: R\rightarrow R^m$ | vector-valued | space curve when m=2 or 3 | $\vec{r}$, $\vec{N}$ |
| $f: R^n\rightarrow R$ | scalar | a surface when n=2 | $f$ |
-| $F: R^n\rightarrow R^n$ | vector field | a vector field when n=2 | $F$ |
+| $F: R^n\rightarrow R^n$ | vector field | a vector field when n=2, 3 | $F$ |
| $F: R^n\rightarrow R^m$ | multivariable | n=2,m=3 describes a surface | $F$, $\Phi$ |
@@ -34,7 +34,9 @@ After an example where the use of a multivariable function is of necessity, we d
## Vector fields
-We have seen that the gradient of a scalar function, $f:R^2 \rightarrow R$, takes a point in $R^2$ and associates a vector in $R^2$. As such $\nabla{f}:R^2 \rightarrow R^2$ is a vector field. A vector field can be visualized by sampling a region and representing the field at those points. The details, as previously mentioned, are in the `vectorfieldplot` function of `CalculusWithJulia`.
+We have seen that the gradient of a scalar function, $f:R^2 \rightarrow R$, takes a point in $R^2$ and associates a vector in $R^2$. As such $\nabla{f}:R^2 \rightarrow R^2$ is a vector field. More generally, a vector field is a vector-valued function $F: R^n \rightarrow R^n$ for $n \geq 2$.
+
+An input/output pair can be visualized by identifying the input value with a point and anchoring the output vector at that point. A vector field plot shows a sampling of such pairs, usually taken over some ordered grid. The details, as previously mentioned, are in the `vectorfieldplot` function of `CalculusWithJulia`.
```{julia}
@@ -1071,7 +1073,7 @@ Adjusting $f$ to have a vanishing second -- but not third -- derivative at $c_0$
As for $g_1$, we have by construction $g_1(b_0) = 0$. By differentiation we get a pattern for some constants $c_j = (j+1)\cdot(j+2)\cdots k$ with $c_k = 1$.
$$
-g^{(k)}(b) = k! \cdot \frac{f(a_0) - f(b)}{(a_0-b)^{k+1}} - \sum_{j=1}^k c_j \frac{f^{(j)}(b)}{(a_0 - b)^{k-j+1}}.
+g^{(k)}(b) = k! \cdot \frac{f(a_0) - f(b)}{(a_0-b)^{k+1}} - \sum_{j=1}^k c_j \frac{f^{(j)}(b)}{(a_0 - b)^{k-j+1}}.
$$
Of note: when $f(a_0) = f(b_0) = 0$, if $f^{(k)}(b_0)$ is the first non-vanishing derivative of $f$ at $b_0$, then $g^{(k)}(b_0) = f^{(k)}(b_0)/(b_0 - a_0)$ (they have the same sign).
@@ -1146,6 +1148,30 @@ This handles most cases, but leaves the possibility that a function with infinit
## Questions
+###### Question
+
+```{julia}
+#| echo: false
+gr()
+p1 = vectorfieldplot((x,y) -> [x,y], xlim=(-4,4), ylim=(-4,4), nx=9, ny=9, title="A");
+p2 = vectorfieldplot((x,y) -> [x-y,x], xlim=(-4,4), ylim=(-4,4), nx=9, ny=9,title="B");
+p3 = vectorfieldplot((x,y) -> [y,0], xlim=(-4,4), ylim=(-4,4), nx=9, ny=9, title="C");
+p4 = vectorfieldplot((x,y) -> [-y,x], xlim=(-4,4), ylim=(-4,4), nx=9, ny=9, title="D");
+plot(p1, p2, p3, p4; layout=(2,2))
+```
+
+In the above figure, match the function with the vector field plot.
+
+```{julia}
+#| echo: false
+plotly()
+matchq(("`F(x,y)=[-y ,x]`", "`F(x,y)=[y,0]`",
+ "`F(x,y)=[x-y,x]`", "`F(x,y)=[x,y]`"),
+ ("A", "B", "C", "D"),
+ (4,3,2,1);
+ label="For each function mark the correct vector field plot"
+ )
+```
###### Question
diff --git a/quarto/staging/matrix-calculus-notes.html b/quarto/staging/matrix-calculus-notes.html
index dd6fd95..9f7b7e8 100644
--- a/quarto/staging/matrix-calculus-notes.html
+++ b/quarto/staging/matrix-calculus-notes.html
@@ -126,6 +126,7 @@ window.Quarto = {
XXX Add in examples from paper XXX optimization? large number of parameters? ,…
+
Mention numerator layout from https://en.wikipedia.org/wiki/Matrix_calculus#Layout_conventions
@@ -140,18 +141,18 @@ Based on Bright, Edelman, and Johnson’s notes
We have seen several “derivatives” of a function, based on the number of inputs and outputs. The first one was for functions \(f: R \rightarrow R\).
-
Then \(f\) has a derivative at \(x\) if this limit exists
+
In this case, we saw that \(f\) has a derivative at \(c\) if this limit exists:
linearization write \(f(x+\Delta x) - f(x) \approx f'(x)\Delta x\), where \(\delta x\) is a small displacement from \(x\). The reason there isn’t equality is the unwritten higher order terms that vanish in a limit.
+
linearization writes \(f(x+\Delta x) - f(x) \approx f'(x)\Delta x\), where \(\Delta x\) is a small displacement from \(x\). The reason there isn’t equality is the unwritten higher order terms that vanish in a limit.
Alternate limits. Another way of writing this is in terms of explicit smaller order terms:
which means if we divide both sides by \(h\) and take the limit, we will get \(0\) on the right and the relationship on the left.
-
Differential notation simply writes this as \(dy = f(x)dx\). More verbosely, we might write
+
Differential notation simply writes this as \(dy = f'(x)dx\). Focusing on \(f\) and not \(y=f(x)\), we might write
\[
df = f(x+dx) - f(x) = f'(x) dx.
\]
-
Here \(dx\) is a differential, made rigorous by a limit, which hides the higher order terms.
+
We will see all the derivatives encountered so far can be similarly expressed.
+
In the above, \(df\) and \(dx\) are differentials, made rigorous by a limit, which hides the higher order terms.
In these notes the limit has been defined, with suitable modification, for functions of vectors (multiple values) with scalar or vector outputs.
For example, when \(f: R \rightarrow R^m\) was a vector-valued function the derivative was defined similarly through a limit of \((f(t + \Delta t) - f(t))/{\Delta t}\), where each component needed to have a limit. This can be rewritten through \(f(t + dt) - f(t) = f'(t) dt\), again using differentials to avoid the higher order terms.
-
When \(f: R^n \rightarrow R\) is a scalar-valued function of a derivative, differentiability was defined by a gradient existing with \(f(c+h) - f(c) - \nabla{f}(c) \cdot h\) being \(\mathscr{o}(\|h\|)\). In other words \(df = f(c + dh) - f(c) = \nabla{f}(c) \cdot dh\). The gradient has the same shape as \(c\), a column vector. If we take the row vector (e.g. \(f'(c) = \nabla{f}(c)^T\)) then again we see \(df = f(c+dh) - f(c) = f'(c) dh\), where the last term uses matrix multiplication of a row vector times a column vector.
+
When \(f: R^n \rightarrow R\) is a scalar-valued function with vector inputs, differentiability was defined by a gradient existing with \(f(c+h) - f(c) - \nabla{f}(c) \cdot h\) being \(\mathscr{o}(\|h\|)\). In other words \(df = f(c + dh) - f(c) = \nabla{f}(c) \cdot dh\). The gradient has the same shape as \(c\), a column vector. If we take the row vector (e.g. \(f'(c) = \nabla{f}(c)^T\)) then again we see \(df = f(c+dh) - f(c) = f'(c) dh\), where the last term uses matrix multiplication of a row vector times a column vector.
Finally, when \(f:R^n \rightarrow R^m\), the Jacobian was defined and characterized by \(\| f(x + dx) - f(x) - J_f(x)dx \|\) being \(\mathscr{o}(\|dx\|)\). Again, we can express this through \(df = f(x + dx) - f(x) = f'(x)dx\) where \(f'(x) = J_f(x)\).
In writing \(df = f(x + dx) - f(x) = f'(x) dx\) generically, some underlying facts are left implicit: \(dx\) has the same shape as \(x\) (so can be added); \(f'(x) dx\) may mean usual multiplication or matrix multiplication; and there is an underlying concept of distance and size that allows the above to be rigorous. This may be an absolute value or a norm.
-
Further, various differentiation rules apply such as the sum, product, and chain rule.
+
Further, various differentiation rules apply such as the sum, product, and chain rules.
The @BrightEdelmanJohnson notes cover differentiation of functions in this uniform manner and then extend the form by treating derivatives as linear operators.
where the \(\alpha\) and \(\beta\) are scalars, and \(v\) and \(w\) possibly not and come from a vector space. Regular multiplication and matrix multiplication are familiar linear operations, but there are many others.
-
The referenced notes identify \(f'(x) dx\) with \(f'(x)[dx]\), the latter emphasizing \(f'(x)\) acts on \(dx\) and the notation is not commutative (e.g., it is not \(dx f'(x)\)).
+
The referenced notes identify \(f'(x) dx\) as \(f'(x)[dx]\), the latter emphasizing \(f'(x)\) acts on \(dx\) and the notation is not commutative (e.g., it is not \(dx f'(x)\)). The use of \([]\) indicates that \(f'(x)\) “acts” on \(dx\) in a linear manner. It may be multiplication, matrix multiplication, or something else. Parentheses are not used, as they might suggest function application or multiplication.
Linear operators are related to vector spaces.
A vector space is a set of mathematical objects which can be added together and also multiplied by a scalar. Vectors of similar size, as previously discussed, are the typical example, with vector addition and scalar multiplication previously discussed topics. Matrices of similar size (and some subclasses) also form a vector space. Additionally, many other sets of objects form vector spaces. Examples include polynomial functions of degree \(n\) or less, continuous functions, or functions with a certain number of derivatives.
-
Take differentiable functions as an example, then the simplest derivative rules \([af(x) + bg(x)]' = a[f(x)]' + b[g(x)]'\) show the linearity of the derivative in this setting. This linearity is different from how the derivative is a linear operator on \(dx\).
-
A vector space is described by a basis – a minimal set of vectors needed to describe the space, after consideration of linear combinations. For many vectors, this the set of special vectors with \(1\) as one of the entries, and \(0\) otherwise.
-
A key fact about a basis is every vector in the vector space can be expressed uniquely as a linear combination of the basis vectors.
+
Take differentiable functions as an example; then the simplest derivative rules \([af(x) + bg(x)]' = a[f(x)]' + b[g(x)]'\) show the linearity of the derivative in this setting.
+
A finite-dimensional vector space is described by a basis – a minimal set of vectors needed to describe the space, after consideration of linear combinations. For some typical vector spaces, this is the set of special vectors with \(1\) as one of the entries, and \(0\) otherwise.
+
A key fact about a basis for a finite-dimensional vector space is that every vector in the vector space can be expressed uniquely as a linear combination of the basis vectors.
Vectors and matrices have properties that are generalizations of the real numbers. As vectors and matrices form vector spaces, the concept of addition of vectors and matrices is defined, as is scalar multiplication. Additionally, we have seen:
The dot product between two vectors of the same length is defined easily (\(v\cdot w = \sum_i v_i w_i\)). It is coupled with the length as \(\|v\|^2 = v\cdot v\).
-
Matrix multiplication is defined for two properly sized matrices. If \(A\) is \(m \times k\) and \(B\) is \(k \times n\) then \(AB\) is a \(m\times n\) matrix with \((i,j)\) term given by the dot product of the \(i\)th row of \(A\) (viewed as a vector) and the \(j\)th column of \(B\) (viewed as a vector). Matrix multiplication is associative but not commutative. (E.g. \((AB)C = A(BC)\) but \(AB\) and \(BA\) need not be equal (or even defined, as the shapes may not match up).
-
A square matrix \(A\) has an inverse\(A^{-1}\) if \(AA^{-1} = A^{-1}A = I\), where \(I\) is the identity matrix (a matrix which is zero except on its diagonal entries which are all \(1\)). Square matrices may or may not have an inverse. When they don’t the matrix is called singular.
+
Matrix multiplication is defined for two properly sized matrices. If \(A\) is \(m \times k\) and \(B\) is \(k \times n\) then \(AB\) is a \(m\times n\) matrix with \((i,j)\) term given by the dot product of the \(i\)th row of \(A\) (viewed as a vector) and the \(j\)th column of \(B\) (viewed as a vector). Matrix multiplication is associative but not commutative. (E.g. \((AB)C = A(BC)\) but \(AB\) and \(BA\) need not be equal, or even defined, as the shapes may not match up).
+
A square matrix \(A\) has an inverse \(A^{-1}\) if \(AA^{-1} = A^{-1}A = I\), where \(I\) is the identity matrix (a matrix which is zero except on its diagonal entries, which are all \(1\)). Square matrices may or may not have an inverse. A matrix without an inverse is called singular.
Viewing a vector as a matrix is possible. The association is typically through a column vector.
-
The transpose of a matrix comes by permuting the rows and columns. The transpose of a column vector is a row vector, so \(v\cdot w = v^T w\), where we use a superscript \(T\) for the transpose. The transpose of a product, is the product of the transposes – reversed: \((AB)^T = B^T A^T\); the tranpose of a transpose is an identity operation: \((A^T)^T = A\); the inverse of a transpose is the tranpose of the inverse: \((A^{-1})^T = (A^T){-1}\).
+
The transpose of a matrix comes by permuting the rows and columns. The transpose of a column vector is a row vector, so \(v\cdot w = v^T w\), where we use a superscript \(T\) for the transpose. The transpose of a product is the product of the transposes – reversed: \((AB)^T = B^T A^T\); the transpose of a transpose is an identity operation: \((A^T)^T = A\); the inverse of a transpose is the transpose of the inverse: \((A^{-1})^T = (A^T)^{-1}\).
+
The adjoint of a matrix is related to the transpose, except that complex conjugates are also taken.
Matrices for which \(A = A^T\) are called symmetric.
A few of the operations on matrices are the transpose and the inverse. These return a matrix, when defined. There is also the determinant and the trace, which return a scalar from a matrix. The trace is just the sum of the diagonal; the determinant is more involved to compute, but was previously seen to have a relationship to the volume of a certain parallelepiped. There are a few other operations described in the following.
@@ -233,26 +236,26 @@ df &= f(x + dx) - f(x)\\
\]
The term \(dx^T A dx\) is dropped, as it is higher order (it goes to zero faster), containing two \(dx\) terms. In the second to last step, an identity operation (taking the transpose of the scalar quantity) is taken to simplify the algebra. Finally, as \(df = f'(x)[dx]\) the identity of \(f'(x) = x^T(A^T+A)\) is made, or taking transposes \(\nabla f = (A + A^T)x\).
Compare the elegance above with the component version; even though simplified, it still requires a specification of the size to carry the following out:
-
+
using SymPy
@syms x[1:3]::real A[1:3, 1:3]::real
u = x' * A * x
grad_u = [diff(u, xi) for xi in x]
@@ -415,30 +418,31 @@ f' = (a'b')c' \text{ or } f' = a'(b'c')
Derivatives of matrix functions
What is the derivative of \(f(A) = A^2\)?
-
The function \(f\) takes a \(n\times n\) matrix and returns a matrix of the same size. This innocuous question isn’t directly handled by the Jacobian, which is defined for vector valued function \(f:R^n \rightarrow R^m\).
+
+The function \(f\) takes an \(n\times n\) matrix and returns a matrix of the same size. This innocuous question isn’t directly handled here by the Jacobian, which is defined for vector-valued functions \(f:R^n \rightarrow R^m\).
This derivative can be derived directly from the product rule:
\[
\begin{align*}
-f(A) &= [AA]'\\
+df &= d(AA)\\
&= A dA + dA A
\end{align*}
\]
That is, \(f'(A)\) is the operator \(f'(A)[\delta A] = A \delta A + \delta A A\) and not \(2A\delta A\), as \(A\) may not commute with \(\delta A\).
+
XXX THIS ISN'T EVEN RIGHT
Vectorization of a matrix
Alternatively, we can identify \(A\) through its components, as a vector in \(R^{n^2}\) and then leverage the Jacobian.
One such identification is vectorization – consecutively stacking the column vectors into a vector. In Julia the vec function does this operation:
A basic course in linear algebra shows that any linear operator on a finite vector space can be represented as a matrix. The basic idea is to represent what the operator does to each basis element and put these values as columns of the matrix.
In this \(3 \times 3\) case, the linear operator works on an object with \(9\) slots and returns an object with \(9\) slots, so the matrix will be \(9 \times 9\).
The basis elements are simply the matrices with a \(1\) in spot \((i,j)\) and zero elsewhere. Here we generate them through a function:
-
+
basis(i, j, A) = (b = zeros(Int, size(A)...); b[i,j] = 1; b)
JJ = [vec(basis(i,j,A)*A + A*basis(i,j,A)) for j in 1:3 for i in 1:3]
Appropriate sizes for \(A\), \(B\), and \(C\) are determined by the various products in \(BCA^T\).
If \(A\) is \(m \times n\) and \(B\) is \(r \times s\), then since \(BC\) is defined, \(C\) has \(s\) rows, and since \(CA^T\) is defined, \(C\) must have \(n\) columns, as \(A^T\) is \(n \times m\), so \(C\) must be \(s\times n\). Checking this is correct on the other side, \(A \otimes B\) would be size \(mr \times ns\) and \(\vec{C}\) would be size \(sn\), so that product works, size wise.
The referred to notes have an explanation for this formula, but we confirm with an example with \(m=n=2\), \(r=s=3\):
-
+
@syms A[1:2, 1:2]::real B[1:3, 1:3]::real C[1:3, 1:2]::real
L, R = kron(A,B)*vec(C), vec(B*C*A')
all(l == r for (l, r) ∈ zip(L, R))
diff --git a/quarto/staging/matrix-calculus-notes.qmd b/quarto/staging/matrix-calculus-notes.qmd
index a9e457e..032c48d 100644
--- a/quarto/staging/matrix-calculus-notes.qmd
+++ b/quarto/staging/matrix-calculus-notes.qmd
@@ -3,6 +3,9 @@
XXX Add in examples from paper XXX
optimization? large number of parameters? ,...
+
+Mention numerator layout from https://en.wikipedia.org/wiki/Matrix_calculus#Layout_conventions
+
::: {.callout-note}
## Based on Bright, Edelman, and Johnson's notes
@@ -12,13 +15,13 @@ This section samples material from the notes [Matrix Calculus (for Machine Learn
We have seen several "derivatives" of a function, based on the number of inputs and outputs. The first one was for functions $f: R \rightarrow R$.
-Then $f$ has a derivative at $x$ if this limit exists
+In this case, we saw that $f$ has a derivative at $c$ if this limit exists:
$$
-\lim_{h \rightarrow 0}\frac{f(x + h) - f(x)}{h}.
+\lim_{h \rightarrow 0}\frac{f(c + h) - f(c)}{h}.
$$
-The derivative of the function $x$ is this limit for a given $x$. Common notation is:
+The derivative, as a function of $x$, is given by this limit for any $x$ in the domain where it exists. Common notation is:
$$
f'(x) = \frac{dy}{dx} = \lim_{h \rightarrow 0}\frac{f(x + h) - f(x)}{h}
@@ -27,9 +30,9 @@ $$
(when the limit exists).
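+
+A quick numerical sketch of this limit (a made-up example, not from the notes): for $f(x) = x^2$ at $c=3$, the difference quotients settle toward $f'(3) = 6$ as $h$ shrinks.
+
+```{julia}
+f(x) = x^2
+c = 3
+# difference quotients for shrinking h head toward the derivative, 6
+[(f(c + h) - f(c))/h for h in (0.1, 0.01, 0.001)]
+```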
-This limit gets expressed in different ways:
+This limit gets re-expressed in different ways:
-* linearization write $f(x+\Delta x) - f(x) \approx f'(x)\Delta x$, where $\delta x$ is a small displacement from $x$. The reason there isn't equality is the unwritten higher order terms that vanish in a limit.
+* linearization writes $f(x+\Delta x) - f(x) \approx f'(x)\Delta x$, where $\Delta x$ is a small displacement from $x$. The reason there isn't equality is the unwritten higher order terms that vanish in a limit.
* Alternate limits. Another way of writing this is in terms of explicit smaller order terms:
@@ -39,13 +42,15 @@ $$
which means if we divide both sides by $h$ and take the limit, we will get $0$ on the right and the relationship on the left.
-* Differential notation simply writes this as $dy = f(x)dx$. More verbosely, we might write
+* Differential notation simply writes this as $dy = f'(x)dx$. Focusing on $f$ and not $y=f(x)$, we might write
$$
df = f(x+dx) - f(x) = f'(x) dx.
$$
-Here $dx$ is a differential, made rigorous by a limit, which hides the higher order terms.
+We will see all the derivatives encountered so far can be similarly expressed.
+
+In the above, $df$ and $dx$ are differentials, made rigorous by a limit, which hides the higher order terms.
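+
+The hidden higher-order terms can be seen numerically (a sketch with a made-up function and step size): for $f(x) = \sin(x)$ the gap between $df$ and $f'(x)dx$ is of order $dx^2$.
+
+```{julia}
+f(x) = sin(x)
+x, dx = 1.0, 1e-6
+df = f(x + dx) - f(x)
+# the mismatch is the hidden higher-order term, of order dx^2
+abs(df - cos(x) * dx)
+```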
In these notes the limit has been defined, with suitable modification, for functions of vectors (multiple values) with scalar or vector outputs.
@@ -53,7 +58,7 @@ In these notes the limit has been defined, with suitable modification, for funct
For example, when $f: R \rightarrow R^m$ was a vector-valued function the derivative was defined similarly through a limit of $(f(t + \Delta t) - f(t))/{\Delta t}$, where each component needed to have a limit. This can be rewritten through $f(t + dt) - f(t) = f'(t) dt$, again using differentials to avoid the higher order terms.
-When $f: R^n \rightarrow R$ is a scalar-valued function of a derivative, differentiability was defined by a gradient existing with $f(c+h) - f(c) - \nabla{f}(c) \cdot h$ being $\mathscr{o}(\|h\|)$. In other words $df = f(c + dh) - f(c) = \nabla{f}(c) \cdot dh$. The gradient has the same shape as $c$, a column vector. If we take the row vector (e.g. $f'(c) = \nabla{f}(c)^T$) then again we see $df = f(c+dh) - f(c) = f'(c) dh$, where the last term uses matrix multiplication of a row vector times a column vector.
+When $f: R^n \rightarrow R$ is a scalar-valued function with vector inputs, differentiability was defined by a gradient existing with $f(c+h) - f(c) - \nabla{f}(c) \cdot h$ being $\mathscr{o}(\|h\|)$. In other words $df = f(c + dh) - f(c) = \nabla{f}(c) \cdot dh$. The gradient has the same shape as $c$, a column vector. If we take the row vector (e.g. $f'(c) = \nabla{f}(c)^T$) then again we see $df = f(c+dh) - f(c) = f'(c) dh$, where the last term uses matrix multiplication of a row vector times a column vector.
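+
+A numerical check of $df = \nabla{f}(c) \cdot dh$ for a hypothetical scalar function (this sketch assumes the `ForwardDiff` and `LinearAlgebra` packages):
+
+```{julia}
+using ForwardDiff, LinearAlgebra
+f(x) = x[1]^2 + 3x[2]
+c, dh = [1.0, 2.0], [1e-6, -1e-6]
+df = f(c + dh) - f(c)
+# the difference from the differential is o(norm(dh))
+abs(df - dot(ForwardDiff.gradient(f, c), dh))
+```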
Finally, when $f:R^n \rightarrow R^m$, the Jacobian was defined and characterized by
$\| f(x + dx) - f(x) - J_f(x)dx \|$ being $\mathscr{o}(\|dx\|)$. Again, we can express this through $df = f(x + dx) - f(x) = f'(x)dx$ where $f'(x) = J_f(x)$.
@@ -61,7 +66,7 @@ $\| f(x + dx) - f(x) - J_f(x)dx \|$ being $\mathscr{o}(\|dx\|)$. Again, we can e
In writing $df = f(x + dx) - f(x) = f'(x) dx$ generically, some underlying facts are left implicit: $dx$ has the same shape as $x$ (so can be added); $f'(x) dx$ may mean usual multiplication or matrix multiplication; and there is an underlying concept of distance and size that allows the above to be rigorous. This may be an absolute value or a norm.
-Further, various differentiation rules apply such as the sum, product, and chain rule.
+Further, various differentiation rules apply such as the sum, product, and chain rules.
The @BrightEdelmanJohnson notes cover differentiation of functions in this uniform manner and then extend the form by treating derivatives as *linear operators*.
@@ -75,29 +80,35 @@ $$
where the $\alpha$ and $\beta$ are scalars, and $v$ and $w$ possibly not and come from a *vector space*. Regular multiplication and matrix multiplication are familiar linear operations, but there are many others.
-The referenced notes identify $f'(x) dx$ with $f'(x)[dx]$, the latter emphasizing $f'(x)$ acts on $dx$ and the notation is not commutative (e.g., it is not $dx f'(x)$).
+The referenced notes identify $f'(x) dx$ as $f'(x)[dx]$, the latter emphasizing $f'(x)$ acts on $dx$ and the notation is not commutative (e.g., it is not $dx f'(x)$). The use of $[]$ indicates that $f'(x)$ "acts" on $dx$ in a linear manner. It may be multiplication, matrix multiplication, or something else. Parentheses are not used, as they might suggest function application or multiplication.
+
Linear operators are related to vector spaces.
A [vector space](https://en.wikipedia.org/wiki/Vector_space) is a set of mathematical objects which can be added together and also multiplied by a scalar. Vectors of similar size, as previously discussed, are the typical example, with vector addition and scalar multiplication previously discussed topics. Matrices of similar size (and some subclasses) also form a vector space. Additionally, many other sets of objects form vector spaces. Examples include polynomial functions of degree $n$ or less, continuous functions, or functions with a certain number of derivatives.
-Take differentiable functions as an example, then the simplest derivative rules $[af(x) + bg(x)]' = a[f(x)]' + b[g(x)]'$ show the linearity of the derivative in this setting. This linearity is different from how the derivative is a linear operator on $dx$.
+Take differentiable functions as an example; then the simplest derivative rules $[af(x) + bg(x)]' = a[f(x)]' + b[g(x)]'$ show the linearity of the derivative in this setting.
-A vector space is described by a *basis* -- a minimal set of vectors needed to describe the space, after consideration of linear combinations. For many vectors, this the set of special vectors with $1$ as one of the entries, and $0$ otherwise.
+A finite-dimensional vector space is described by a *basis* -- a minimal set of vectors needed to describe the space, after consideration of linear combinations. For some typical vector spaces, this is the set of special vectors with $1$ as one of the entries, and $0$ otherwise.
-A key fact about a basis is every vector in the vector space can be expressed *uniquely* as a linear combination of the basis vectors.
+A key fact about a basis for a finite-dimensional vector space is that every vector in the vector space can be expressed *uniquely* as a linear combination of the basis vectors.
Vectors and matrices have properties that are generalizations of the real numbers. As vectors and matrices form vector spaces, the concept of addition of vectors and matrices is defined, as is scalar multiplication. Additionally, we have seen:
* The dot product between two vectors of the same length is defined easily ($v\cdot w = \sum_i v_i w_i$). It is coupled with the length as $\|v\|^2 = v\cdot v$.
-* Matrix multiplication is defined for two properly sized matrices. If $A$ is $m \times k$ and $B$ is $k \times n$ then $AB$ is a $m\times n$ matrix with $(i,j)$ term given by the dot product of the $i$th row of $A$ (viewed as a vector) and the $j$th column of $B$ (viewed as a vector). Matrix multiplication is associative but *not* commutative. (E.g. $(AB)C = A(BC)$ but $AB$ and $BA$ need not be equal (or even defined, as the shapes may not match up).
+* Matrix multiplication is defined for two properly sized matrices. If $A$ is $m \times k$ and $B$ is $k \times n$ then $AB$ is a $m\times n$ matrix with $(i,j)$ term given by the dot product of the $i$th row of $A$ (viewed as a vector) and the $j$th column of $B$ (viewed as a vector). Matrix multiplication is associative but *not* commutative. (E.g. $(AB)C = A(BC)$ but $AB$ and $BA$ need not be equal, or even defined, as the shapes may not match up).
-* A square matrix $A$ has an *inverse* $A^{-1}$ if $AA^{-1} = A^{-1}A = I$, where $I$ is the identity matrix (a matrix which is zero except on its diagonal entries which are all $1$). Square matrices may or may not have an inverse. When they don't the matrix is called singular.
+* A square matrix $A$ has an *inverse* $A^{-1}$ if $AA^{-1} = A^{-1}A = I$, where $I$ is the identity matrix (a matrix which is zero except on its diagonal entries, which are all $1$). Square matrices may or may not have an inverse. A matrix without an inverse is called *singular*.
* Viewing a vector as a matrix is possible. The association is typically through a *column* vector.
-* The transpose of a matrix comes by permuting the rows and columns. The transpose of a column vector is a row vector, so $v\cdot w = v^T w$, where we use a superscript $T$ for the transpose. The transpose of a product, is the product of the transposes -- reversed: $(AB)^T = B^T A^T$; the tranpose of a transpose is an identity operation: $(A^T)^T = A$; the inverse of a transpose is the tranpose of the inverse: $(A^{-1})^T = (A^T){-1}$.
+* The *transpose* of a matrix comes by permuting the rows and columns. The transpose of a column vector is a row vector, so $v\cdot w = v^T w$, where we use a superscript $T$ for the transpose. The transpose of a product is the product of the transposes -- reversed: $(AB)^T = B^T A^T$; the transpose of a transpose is an identity operation: $(A^T)^T = A$; the inverse of a transpose is the transpose of the inverse: $(A^{-1})^T = (A^T)^{-1}$.
+
+* The *adjoint* of a matrix is related to the transpose, except that complex conjugates are also taken.
* Matrices for which $A = A^T$ are called symmetric.
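+
+These algebraic facts can be spot-checked with small matrices (a sketch with arbitrarily chosen entries):
+
+```{julia}
+A = [1 2; 3 4]
+B = [0 1; 1 0]
+A*B == B*A        # false: matrix multiplication is not commutative
+(A*B)' == B'*A'   # true: the transpose of a product reverses the factors
+(A')' == A        # true: transposing twice is an identity operation
+```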
@@ -377,20 +388,20 @@ XXXX
What is the derivative of $f(A) = A^2$?
-The function $f$ takes a $n\times n$ matrix and returns a matrix of the same size. This innocuous question isn't directly
-handled by the Jacobian, which is defined for vector valued function $f:R^n \rightarrow R^m$.
+The function $f$ takes an $n\times n$ matrix and returns a matrix of the same size. This innocuous question isn't directly handled here by the Jacobian, which is defined for vector-valued functions $f:R^n \rightarrow R^m$.
This derivative can be derived directly from the *product rule*:
$$
\begin{align*}
-f(A) &= [AA]'\\
+df &= d(AA)\\
&= A dA + dA A
\end{align*}
$$
That is, $f'(A)$ is the operator $f'(A)[\delta A] = A \delta A + \delta A A$ and not $2A\delta A$, as $A$ may not commute with $\delta A$.
+XXX THIS ISN'T EVEN RIGHT
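+
+A finite-difference check of the operator formula (a sketch with an arbitrary $A$ and a small perturbation; it assumes `LinearAlgebra` for `norm`):
+
+```{julia}
+using LinearAlgebra
+A  = [1.0 2.0; 3.0 4.0]
+δA = 1e-6 * [0.5 -0.25; 0.125 1.0]
+f(M) = M*M
+# the residual is second order in δA (here roughly 1e-12)
+norm(f(A + δA) - f(A) - (A*δA + δA*A))
+```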
### Vectorization of a matrix