added 2 additional figures for hig, sip overviews
parent a22c6763e1
commit a8c116eb1e
@@ -61,6 +61,16 @@ It might seem attractive at first to clamp singular values to a small value $\ta
The use of a partial inversion via $^{-1/2}$ instead of a full inversion with $^{-1}$ helps to prevent small singular values from leading to overly large contributions in the update step. This is inspired by Adam, which normalizes the search direction via $J/\sqrt{\text{diag}(J^T J)}$ instead of inverting it via $J/(J^T J)$, where $J$ here denotes only the diagonal of the Jacobian matrix. For Adam, this compromise is necessary due to the rough approximation via the diagonal. For HIGs, we use the full Jacobian and could hence perform a proper inversion. Nonetheless, as outlined in the original paper {cite}`schnell2022hig`, the half-inversion regularizes the inverse and provides substantial improvements for learning, while reducing the chance of gradient explosions.
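To see the effect numerically, the following small NumPy example (a hypothetical toy diagonal Jacobian, not taken from the paper) compares the full inversion with the half-inversion for a matrix with one very small singular value:

```python
import numpy as np

J = np.diag([1.0, 1e-4])                     # toy Jacobian with one tiny singular value
U, s, Vt = np.linalg.svd(J)

full_inv = Vt.T @ np.diag(s ** -1.0) @ U.T   # exponent -1: the small value blows up to 1e4
half_inv = Vt.T @ np.diag(s ** -0.5) @ U.T   # exponent -1/2: its contribution only grows to 1e2

print(np.diag(full_inv))   # [1.e+00 1.e+04]
print(np.diag(half_inv))   # [1.e+00 1.e+02]
```

The half-inversion still rescales poorly conditioned directions, but it caps how strongly a near-singular direction can dominate the update.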
```{figure} resources/physgrad-hig-spaces.jpg
---
height: 160px
name: hig-spaces
---
A visual overview of the different spaces involved in HIG training. Most importantly, it makes use of the joint, inverse Jacobian for neural network and physics.
```
## Constructing the Jacobian
The formulation above hides one important aspect of HIGs: the search direction we compute not only jointly takes into account the scaling of neural network and physics, but can also incorporate information from all the samples in a mini-batch. This has the advantage of finding the optimal direction (in an $L^2$ sense) to minimize the loss, instead of averaging directions as done with SGD or Adam.
@@ -79,7 +89,7 @@
\right) \ .
$$
The notation with $\big\vert_{x_i}$ also makes clear that all parts of the Jacobian are evaluated with the corresponding input states. In contrast to regular optimizations, where larger batches typically don't pay off too much due to the averaging effect, the HIGs have a stronger dependence on the batch size. They often profit from larger mini-batch sizes.
To summarize, computing the HIG update requires evaluating the individual Jacobians of a batch, performing an SVD of the combined Jacobian, truncating and half-inverting the singular values, and computing the update direction by re-assembling the half-inverted Jacobian matrix.
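The sketch below illustrates such a step in NumPy. The function name `hig_update`, the argument layout (per-sample Jacobians and loss gradients passed in as lists), and the truncation threshold `tau` are assumptions for illustration, not the reference implementation of {cite}`schnell2022hig`:

```python
import numpy as np

def hig_update(jacobians, loss_grads, eta=1.0, tau=1e-6):
    """One (sketched) HIG step for a mini-batch.

    jacobians:  per-sample Jacobians dy_i/dtheta of the joint network+physics chain,
                each of shape (dim_y, n_params), evaluated at the corresponding x_i.
    loss_grads: per-sample loss gradients dL/dy_i, each of shape (dim_y,).
    """
    J = np.concatenate(jacobians, axis=0)         # stack the batch along the output dimension
    g = np.concatenate(loss_grads, axis=0)
    U, s, Vt = np.linalg.svd(J, full_matrices=False)
    s_hi = np.where(s > tau, s ** -0.5, 0.0)      # truncate small singular values, then half-invert
    J_half_inv = Vt.T @ np.diag(s_hi) @ U.T       # re-assemble the half-inverted Jacobian
    return -eta * J_half_inv @ g                  # update direction for the network parameters
```

Since the whole batch enters a single SVD, the resulting direction accounts for all samples jointly instead of averaging per-sample updates.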
@@ -20,21 +20,23 @@ $$ (eq:unsupervised-training)
## NN training
To integrate the update step from equation {eq}`PG-def` into the training process for an NN, we consider three components: the NN itself, the physics simulator, and the loss function:
```{figure} resources/physgrad-sip-spaces.jpg
---
height: 160px
name: sip-spaces
---
A visual overview of the different spaces involved in SIP training.
```
To join these three pieces together, we use the following algorithm. As introduced by Holl et al. {cite}`holl2021pg`, we'll denote this training process as _scale-invariant physics_ (SIP) training.
```{admonition} Scale-Invariant Physics (SIP) Training
:class: tip
@@ -84,7 +86,18 @@ Due to the dependency of $\mathcal P^{-1}$ on the prediction $y$, it does not av
To demonstrate this, consider the case where GD is used as the solver for the inverse simulation.
Then the total loss is defined purely in $y$ space, and the training reduces to a regular first-order optimization.
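A short calculation makes this concrete (a sketch under the assumption that the inverse simulation is approximated by a single gradient descent step starting from the current $x$, with all step sizes absorbed into $\eta$, and with the proxy target $\tilde{x}$ treated as a constant):

$$
\tilde{x} = x - \eta \Big(\frac{\partial y}{\partial x}\Big)^{T} \frac{\partial L}{\partial y} , \qquad
\frac{\partial}{\partial \theta} \frac{1}{2} \big\| x - \tilde{x} \big\|_2^2
= \Big(\frac{\partial x}{\partial \theta}\Big)^{T} \big( x - \tilde{x} \big)
= \eta \Big(\frac{\partial x}{\partial \theta}\Big)^{T} \Big(\frac{\partial y}{\partial x}\Big)^{T} \frac{\partial L}{\partial y} ,
$$

which is just $\eta$ times the regular gradient of the $y$-space loss with respect to the network weights.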
Hence, to summarize: with SIPs we employ a trivial Newton step for the loss in $y$, and a proxy $L^2$ loss in $x$ that connects the computational graphs of the inverse physics and the NN for backpropagation. The following figure visualizes the different steps.
```{figure} resources/physgrad-sip.jpg
---
height: 220px
name: sip-training
---
A visual overview of SIP training for an entry $i$ of a mini-batch, including the two loss computations in $y$ and in $x$ space (for the proxy loss).
```
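As a code-level illustration, the following self-contained PyTorch sketch implements one such SIP step. The toy setup (a diagonal, badly scaled `physics` operator with a closed-form `physics_inverse`, the small network, and Adam as the optimizer for the proxy loss) is an assumption for illustration only, not the setup of the referenced papers:

```python
import torch

# Toy, ill-conditioned "physics": P(x) = scale * x, with a closed-form inverse.
scale = torch.tensor([1.0, 1e-3])
physics = lambda x: scale * x
physics_inverse = lambda y: y / scale

net = torch.nn.Sequential(torch.nn.Linear(2, 32), torch.nn.Tanh(), torch.nn.Linear(32, 2))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

def sip_step(a, y_target, eta=1.0):
    x = net(a)                                    # NN prediction in x space
    y = physics(x)                                # forward simulation into y space
    dy = eta * (y_target - y)                     # trivial Newton step for the L2 loss in y
    x_proxy = physics_inverse(y + dy)             # inverse simulation yields a target in x space
    loss_x = ((x - x_proxy.detach()) ** 2).sum()  # proxy L2 loss; gradients flow only into the NN
    opt.zero_grad()
    loss_x.backward()
    opt.step()
    return loss_x.item()

# usage: sip_step(torch.randn(16, 2), torch.randn(16, 2) * scale)
```

Detaching the proxy target means the (possibly non-differentiable) inverse solver never needs to be backpropagated through; only the NN receives gradients from the $L^2$ proxy loss in $x$ space.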
## Iterations and time dependence
BIN resources/physgrad-hig-spaces.jpg (new file, 154 KiB, binary file not shown)
BIN resources/physgrad-sip-spaces.jpg (new file, 152 KiB, binary file not shown)