added first figure, cleanup
parent e84633aaa2
commit a22c6763e1
_toc.yml
@@ -1,7 +1,7 @@
 format: jb-book
 root: intro.md
 parts:
-- caption: Introduction bla
+- caption: Introduction
 chapters:
 - file: intro-teaser.ipynb
 - file: overview.md
intro.md
@@ -32,6 +32,8 @@ As a _sneak preview_, the next chapters will show:

- How to more tightly interact with a full simulator for _inverse problems_. E.g., we'll demonstrate how to circumvent the convergence problems of standard reinforcement learning techniques by leveraging simulators in the training loop.

- We'll also discuss the importance of _inversion_ for the update steps, and how higher-order information can be used to speed up convergence and obtain more accurate neural networks.

Throughout this text,
we will introduce different approaches for incorporating physical models
into deep learning, i.e., _physics-based deep learning_ (PBDL) approaches.
@@ -45,8 +47,9 @@ different techniques is particularly useful.
:class: tip
We focus on Jupyter notebooks, a key advantage of which is that all code examples
can be executed _on the spot_, from your browser. You can modify things and
-immediately see what happens -- give it a try...
-<br><br>
+immediately see what happens -- give it a try by
+[[running this teaser example in your browser]](https://colab.research.google.com/github/tum-pbs/pbdl-book/blob/main/intro-teaser.ipynb).

Plus, Jupyter notebooks are great because they're a form of [literate programming](https://en.wikipedia.org/wiki/Literate_programming).
```
@@ -52,12 +52,12 @@
"be interesting how this influences the positions in $\\mathbf{y}$ that develop while searching for\n",
"the right position in $\\mathbf{x}$.\n",
"\n",
-"```{figure} resources/placeholder.png\n",
+"```{figure} resources/physgrad-3spaces.jpg\n",
"---\n",
-"height: 220px\n",
-"name: pg-three-spaces\n",
+"height: 150px\n",
+"name: three-spaces\n",
"---\n",
-"TODO, visual overview of 3 spaces\n",
+"We're targeting inverse problems to retrieve an entry in $\\mathbf x$ from a loss computed in terms of the output of a physics simulator $\\mathbf y$. Hence in a forward pass, we transform from $\\mathbf x$ to $\\mathbf y$, and then compute a loss $L$. The backward pass transforms back to $\\mathbf x$. Thus, the accuracy in terms of $\\mathbf x$ is the most crucial one, but we can likewise track the progress of an optimization in terms of $\\mathbf y$ and $L$.\n",
"```\n",
"\n"
]
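
As a minimal sketch of this setup (with a hypothetical two-component toy simulator standing in for the physics solver, and a plain $L_2$ loss; all names below are illustrative, not from the accompanying notebook), the three quantities $\mathbf{x}$, $\mathbf{y}$, and $L$ appear as follows:

```python
import tensorflow as tf

# Hypothetical toy "simulator": maps parameters x to observables y.
def simulator(x):
    return tf.stack([tf.sin(x[0]), x[1] * x[1]])

x = tf.Variable([0.5, 1.5])          # current guess in x-space
y_target = tf.constant([0.3, 1.0])   # desired output in y-space

with tf.GradientTape() as tape:
    y = simulator(x)                                # forward pass: x -> y
    L = 0.5 * tf.reduce_sum((y - y_target) ** 2)    # loss computed in terms of y

dL_dx = tape.gradient(L, x)   # backward pass: from L back to x-space
# An optimizer would now update x; progress can be tracked in x, y, and L.
```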
@@ -58,5 +58,8 @@ The HIGs on the other hand, go back to first order information in the form of Ja

However, in both cases, the resulting models can give a performance that we simply can't obtain by, e.g., training longer with a simpler DP or supervised approach. So, if we plan to evaluate these models often, e.g., when shipping them in an application, this increased one-time cost can pay off in the long run.


This concludes the chapter on improved learning methods for physics-based NNs.
It's clearly an active topic of research, with plenty of room for new methods, but the algorithms here already
indicate the potential of tailored learning algorithms for physical problems.
This also concludes the focus on numerical simulations as DL components. In the next chapter, we'll instead
turn to a different, statistical viewpoint: the inclusion of uncertainty.
@@ -76,9 +76,6 @@
"import tensorflow as tf\n",
"import time, os\n",
"\n",
"os.environ[\"CUDA_VISIBLE_DEVICES\"]='' # hide GPUs, run on CPU\n",
"#tf.config.run_functions_eagerly(True) # deactivate TensorFlow tracing\n",
"\n",
"# main switch for the three methods:\n",
"MODE = 'HIG' # HIG | SIP | GD"
]
@@ -817,4 +814,4 @@
},
"nbformat": 4,
"nbformat_minor": 0
}
@@ -59,7 +59,7 @@ It might seem attractive at first to clamp singular values to a small value $\ta

```

-The use of a partial inversion via $^{-1/2}$ instead of a full inversion with $^{-1}$ helps preventing that small eigenvalues lead to overly large contributions in the update step. This is inspired by Adam, which normalizes the search direction via $J/(\sqrt(diag(J^{R}J)))$ instead of inverting it via $J/(J^{T}J)$, with $J$ being the diagonal of the Jacobian matrix. For Adam, this compromise is necessary due to the rough approximation via the diagonal. For HIGs, we use the full Jacobian, and hence can do a proper inversion. Nonetheless, as outlined in the original paper {cite}`schnell2022hig`, the half-inversion regularizes the inverse and provides substantial improvements for the learning, while reducing the chance of gradient explosions.
+The use of a partial inversion via $^{-1/2}$ instead of a full inversion with $^{-1}$ helps to prevent small singular values from leading to overly large contributions in the update step. This is inspired by Adam, which normalizes the search direction via $J/\sqrt{\text{diag}(J^T J)}$ instead of inverting it via $J/(J^T J)$, with $J$ denoting the Jacobian matrix. For Adam, this compromise is necessary due to the rough approximation via the diagonal. For HIGs, we use the full Jacobian, and hence can do a proper inversion. Nonetheless, as outlined in the original paper {cite}`schnell2022hig`, the half-inversion regularizes the inverse and provides substantial improvements for learning, while reducing the chance of gradient explosions.
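
As a rough sketch of the half-inversion itself (not the exact HIG algorithm from {cite}`schnell2022hig`; `J`, `grad_y`, `eta`, and `tau` below are illustrative stand-ins, and the handling of small singular values is simplified):

```python
import numpy as np

def half_inverse_step(J, grad_y, eta=1e-2, tau=1e-6):
    """Sketch of an update built from the half-inverse of the Jacobian J.

    J      : Jacobian of the (network + physics) outputs w.r.t. the weights
    grad_y : gradient of the loss w.r.t. those outputs
    """
    U, s, Vt = np.linalg.svd(J, full_matrices=False)
    # Simplified treatment of small singular values: truncate below tau,
    # otherwise apply the half-inversion s^(-1/2) instead of a full s^(-1).
    s_half_inv = np.where(s > tau, 1.0 / np.sqrt(np.maximum(s, tau)), 0.0)
    J_half_inv = Vt.T @ np.diag(s_half_inv) @ U.T   # "half-inverted" Jacobian
    return -eta * (J_half_inv @ grad_y)             # update for the flattened weights

# Toy usage with random placeholders, e.g. 8 outputs and 5 weights:
J = np.random.randn(8, 5)
grad_y = np.random.randn(8)
delta_w = half_inverse_step(J, grad_y)
```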
## Constructing the Jacobian
@@ -100,10 +100,10 @@ $$
and a scaled loss function

$$
-L(y,\hat{y};\lambda)= \frac{1}{2} \big(y^1-\hat{y}^1\big)^2+ \frac{1}{2} \big(\lambda \cdot y^2-\hat{y}^2\big)^2 \ .
+L(y,\hat{y};\lambda)= \frac{1}{2} \big(y_1-\hat{y}_1\big)^2+ \frac{1}{2} \big(\lambda \cdot y_2-\hat{y}_2 \big)^2 \ .
$$

-Here $y^1$ and $y^2$ denote the first, and second component of $y$ (in contrast to the subscript used for the entries of a mini-batch above). Note that the scaling via $\lambda$ is intentionally only applied to the second component in the loss. This mimics an uneven scaling of the two components as commonly encountered in physical simulation settings, the amount of which can be chosen via $\lambda$.
+Here $y_1$ and $y_2$ denote the first and second components of $y$ (in contrast to the subscript $i$ used for the entries of a mini-batch above). Note that the scaling via $\lambda$ is intentionally only applied to the second component in the loss. This mimics an uneven scaling of the two components, as commonly encountered in physical simulation settings, the amount of which can be chosen via $\lambda$.

We'll use a small neural network with a single hidden layer consisting of 7 neurons with _tanh()_ activations, and the objective of learning $\hat{y}$.
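
A minimal sketch of this toy setup (assuming a 2D network input and an arbitrary example value for $\lambda$; names and details may differ from the accompanying notebook):

```python
import tensorflow as tf

LAMBDA = 100.0   # example value; the text leaves the scaling lambda as a free parameter

# Small network: a single hidden layer with 7 tanh neurons and a 2D output.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(7, activation='tanh', input_shape=(2,)),
    tf.keras.layers.Dense(2),
])

def scaled_loss(y, y_hat):
    # L = 1/2 (y_1 - y_hat_1)^2 + 1/2 (lambda * y_2 - y_hat_2)^2, averaged over the batch
    l = 0.5 * (y[:, 0] - y_hat[:, 0]) ** 2 + 0.5 * (LAMBDA * y[:, 1] - y_hat[:, 1]) ** 2
    return tf.reduce_mean(l)
```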
@@ -26,12 +26,12 @@ To integrate the update step from equation {eq}`PG-def` into the training proces
To join these three pieces together, we use the following algorithm. As introduced by Holl et al. {cite}`holl2021pg`, we'll denote this training process as _scale-invariant physics_ (SIP) training.


-```{figure} resources/placeholder.png
+```{figure} resources/physgrad-sip.jpg
---
height: 220px
-name: pg-training
+name: sip-training
---
-TODO, visual overview of SIP training
+A visual overview of SIP training for an entry $i$ of a mini-batch, including the two loss computations in $y$ and in $x$ space (for the proxy loss).
```
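
A rough sketch of one such SIP training step is shown below. This is a simplified version, not the exact procedure from {cite}`holl2021pg`; the differentiable simulator `P`, its local inverse `P_inv`, and the step size `eta` are illustrative stand-ins:

```python
import tensorflow as tf

def sip_train_step(network, P, P_inv, inp, y_target, opt, eta=1.0):
    """One SIP-style update: loss in y-space, proxy L2 loss in x-space."""
    with tf.GradientTape() as tape:
        x = network(inp)                                   # prediction in x-space
        y = P(x)                                           # simulate: x -> y
        loss_y = 0.5 * tf.reduce_sum((y - y_target) ** 2)  # loss in y-space (monitoring)
        # Take a step towards the target in y-space, then map it back to x
        # with the (local) inverse simulator to obtain a proxy target:
        x_proxy = tf.stop_gradient(P_inv(y - eta * (y - y_target)))
        loss_x = tf.reduce_sum((x - x_proxy) ** 2)         # proxy L2 loss in x-space
    grads = tape.gradient(loss_x, network.trainable_variables)
    opt.apply_gradients(zip(grads, network.trainable_variables))
    return loss_y, loss_x
```

In this sketch, the y-space loss only serves to monitor progress; the weight update itself comes purely from the proxy loss in x-space, in line with the figure above.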
resources/physgrad-3spaces.jpg (new binary file, 113 KiB)
resources/physgrad-sip.jpg (new binary file, 88 KiB)