intro updates

This commit is contained in:
NT 2021-02-15 16:04:09 +08:00
parent b66e6cda2c
commit 0625cb6b0b
6 changed files with 120 additions and 67 deletions

View File

@ -6,6 +6,8 @@
- file: overview.md
sections:
- file: overview-equations.md
- file: overview-burgers-forw-v2.ipynb
- file: overview-ns-forw-v2.ipynb
- file: overview-burgers-forw.ipynb
- file: overview-ns-forw.ipynb
- file: supervised
@ -21,6 +23,7 @@
- file: diffphys-code-gradient.ipynb
- file: diffphys-code-tf.ipynb
- file: diffphys-discuss.md
- file: diffphys-code-ns-v2.ipynb
- file: diffphys-code-ns.ipynb
- file: diffphys-code-sol.ipynb
- file: diffphys-outlook.md

View File

@ -2,6 +2,7 @@
"cells": [
{
"cell_type": "markdown",
"id": "supported-manner",
"metadata": {},
"source": [
"# A Teaser Example"
@ -9,11 +10,12 @@
},
{
"cell_type": "markdown",
"id": "lesbian-brave",
"metadata": {},
"source": [
"Let's directly look at a very reduced example that highlights some of the key capabilities of physics-based learning approaches.\n",
"\n",
"Take a look at the following picture - the desired solution is shown in light gray. If we don't take care we'll learn approximations like the red one shown in the left, which are completely off! With an improved learning setup, ideally by using a discretized numerical solver, we can at least accurately represent a part of the solutions (green on the right).\n",
"Take a look at the following picture - the desired solution is shown in light gray. If we don't take care we'll learn approximations like the red one shown on the left, which are completely off! With an improved learning setup, ideally by using a discretized numerical solver, we can at least accurately represent a part of the solutions (shown in green on the right).\n",
"\n",
"```{figure} resources/intro-teaser-side-by-side.png\n",
"---\n",
@ -26,6 +28,7 @@
},
{
"cell_type": "markdown",
"id": "deadly-paint",
"metadata": {},
"source": [
"## Differentiable physics"
@ -33,21 +36,23 @@
},
{
"cell_type": "markdown",
"id": "funky-tamil",
"metadata": {},
"source": [
"Let's illustrate the properties of deep learning via differentiable physics (DP) with a simple example: We'd like to find an unknown function $f^*$ that generates solutions from a space $Y$ that we're interested in. \n",
"Let's illustrate the properties of deep learning via _differentiable physics_ (DP) with a simple example: We'd like to find an unknown function $f^*$ that generates solutions from a space $Y$, taking inputs from $X$, i.e. $f^*: X \\to Y$ (in the following, we'll often denote _idealized_, and unknown functions with a $^*$). \n",
"\n",
"Let's additionally assume we have a generic differential equation $\\mathcal P^*: Y \\to X$ (our _model_ equation), that encodes a property of the solutions, here modeled in terms of a mapping to the input space $X$. Alternatively, we could imagine $X$ representing future solutions that we're looking for (then $\\mathcal P^*$ would model a time evolution), or it could be a constraint for conservation of mass (then $\\mathcal P^*$ would measure divergence).\n",
"\n",
"Using a neural network $f$ to learn the unknown and ideal function $f^*$, we could turn to classic _supervised_ training to obtain $f$ by collecting data. This classical setup requires a dataset by sampling $x$ from $X$ and adding the corresponding solutions $y$ from $Y$. We could obtain these, e.g., by classical numerical techniques. Then we train the NN $f$ in the usual way using this dataset. \n",
"\n",
"In contrast to this, differentiable physics takes advantage of the fact that we can directly use a discretized version of the physical model $\\mathcal P$ and employ it to guide the training of $f$. I.e., we want $f$ to _interact_ with our _simulator_ $\\mathcal P$. This can vastly improve the learning, as we'll illustrate below with a super simple example, and later on with more complex ones.\n",
"In contrast to this supervised approach, employing differentiable physics takes advantage of the fact that we can directly use a discretized version of the physical model $\\mathcal P$ and employ it to guide the training of $f$. I.e., we want $f$ to _interact_ with our _simulator_ $\\mathcal P$. This can vastly improve the learning, as we'll illustrate below with a very simple example (more complex ones will follow later on).\n",
"\n",
"(Note that it order for the DP approach to work, $\\mathcal P$ has to differentiable, as implied by the name. These differentials, in the form of a gradient, are what's driving the learning process.)\n"
"Note that it order for the DP approach to work, $\\mathcal P$ has to differentiable, as implied by the name. These differentials, in the form of a gradient, are what's driving the learning process.\n"
]
},
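As a brief aside on why the differentiability of $\mathcal P$ matters (a sketch of the notation only, not part of the notebook): if the training loss compares $\mathcal P(f(x;\theta))$ with the input $x$, the gradient driving the weight updates is assembled via the chain rule,

$$ \frac{\partial e}{\partial \theta} = \frac{\partial e}{\partial \mathcal P}\,\frac{\partial \mathcal P}{\partial f}\,\frac{\partial f}{\partial \theta} , $$

and the middle factor only exists if the simulator $\mathcal P$ is differentiable.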
{
"cell_type": "markdown",
"id": "recreational-table",
"metadata": {},
"source": [
"## Finding the inverse function of a parabola"
@ -55,6 +60,7 @@
},
{
"cell_type": "markdown",
"id": "latest-amino",
"metadata": {},
"source": [
"To illustrate these two approaches, we consider the following simplified setting: Given the function $\\mathcal P: y\\to y^2$ for $y$ in the inverval $[0,1]$, find the unknown function $f$ such that $\\mathcal P(f(x)) = x$ for all $x$ in $[0,1]$. Note that the _discretization_ here is simply given by representing the $x$ and $y$ via floating point numbers in the computer.\n",
@ -68,6 +74,7 @@
{
"cell_type": "code",
"execution_count": 1,
"id": "accompanied-anaheim",
"metadata": {},
"outputs": [],
"source": [
@ -78,6 +85,7 @@
},
{
"cell_type": "markdown",
"id": "numerous-emphasis",
"metadata": {},
"source": [
"For supervised training, we need a method to find for each datapoint the corresponding solution of the above problem. We simply use our solver $\\mathcal P$ for the problem to pre-compute these solutions: We randomly choose between the positive and the negative square root. This makes sense because in the generic case, when we require optimization techniques to do this step, these methods are not expected to favor one particular mode in multimodal solutions."
@ -86,6 +94,7 @@
{
"cell_type": "code",
"execution_count": 2,
"id": "realistic-event",
"metadata": {},
"outputs": [],
"source": [
@ -97,6 +106,7 @@
{
"cell_type": "code",
"execution_count": 3,
"id": "capable-month",
"metadata": {},
"outputs": [],
"source": [
@ -107,6 +117,7 @@
},
{
"cell_type": "markdown",
"id": "stone-science",
"metadata": {},
"source": [
"Now we can define a network, loss, and training configuration. We'll use a simple `keras` model with three hidden layers, ReLU activations."
@ -115,6 +126,7 @@
{
"cell_type": "code",
"execution_count": 4,
"id": "weighted-costa",
"metadata": {},
"outputs": [],
"source": [
@ -129,6 +141,7 @@
{
"cell_type": "code",
"execution_count": 5,
"id": "adolescent-yellow",
"metadata": {},
"outputs": [],
"source": [
@ -140,6 +153,7 @@
{
"cell_type": "code",
"execution_count": 6,
"id": "underlying-continuity",
"metadata": {},
"outputs": [
{
@ -170,6 +184,7 @@
},
{
"cell_type": "markdown",
"id": "governmental-mixture",
"metadata": {},
"source": [
"As both model and data set are very small, the training converges very quickly, but if we inspect the predictions of the network, we can see that it nowhere near the solution we wer hoping to find: it averages between the data points on both sides of the x-axis and therefore, fails to find satisfying solutions to our above problem.\n",
@ -180,6 +195,7 @@
{
"cell_type": "code",
"execution_count": 7,
"id": "sought-basement",
"metadata": {},
"outputs": [
{
@ -208,11 +224,12 @@
},
{
"cell_type": "markdown",
"id": "reduced-airplane",
"metadata": {},
"source": [
"😱 This is obviously completely wrong! The red solution is nowhere near one of the two modes of our solution shown in blue.\n",
"😱 This is obviously completely wrong! The red solution is nowhere near one of the two modes of our solution shown in gray.\n",
"\n",
"Note that the line is often not perfectly at zero, which is where the two modes of the solution should average out in the continuous setting. This is caused by the relatively coarse sampling with only 200 points in this example.\n",
"Note that the red line is often not perfectly at zero, which is where the two modes of the solution should average out in the continuous setting. This is caused by the relatively coarse sampling with only 200 points in this example.\n",
"<br>\n",
"\n",
"---"
@ -220,36 +237,33 @@
},
{
"cell_type": "markdown",
"id": "dated-requirement",
"metadata": {},
"source": [
"## Finding f via a Differentiable Physics approach"
"## A Differentiable Physics approach"
]
},
{
"cell_type": "markdown",
"id": "acoustic-review",
"metadata": {},
"source": [
"Now let's apply the differentiable physics approach: we'll directly include our discretized model $\\mathcal P$ in the training. \n",
"Now let's apply the differentiable physics approach to find $f$: we'll directly include our discretized model $\\mathcal P$ in the training. \n",
"\n",
"There is no real data generation step; we only need to sample from the $[0,1]$ interval. We'll simply keep the same $x$ locations used in the previous case."
"There is no real data generation step; we only need to sample from the $[0,1]$ interval. We'll simply keep the same $x$ locations used in the previous case, and a new instance of a model with the same architecture as before `model_dp`:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "extensive-forward",
"metadata": {},
"outputs": [],
"source": [
"# X-Data\n",
"# X = X , we can directly re-use the X from above, nothing has changed..."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"# X = X , we can directly re-use the X from above, nothing has changed...\n",
"# Y is evaluated on the fly\n",
"\n",
"# Model\n",
"model_dp = tf.keras.models.Sequential([\n",
" tf.keras.layers.Dense(8, activation=act),\n",
@ -259,16 +273,18 @@
},
{
"cell_type": "markdown",
"id": "conscious-budapest",
"metadata": {},
"source": [
"This loss function is the **crucial** point now: we directly incorporate the function f into the loss. In this simple case, the `Loss_dp` function simply computes the square of the prediction `y_pred`. \n",
"The loss function is the **crucial** point now: we directly incorporate the function f into the loss. In this simple case, the `Loss_dp` function simply computes the square of the prediction `y_pred`. \n",
"\n",
"Later on, a lot more could happen here: we could evaluate finite difference stencils on the predicted solution, or compute a whole implicit time-integration step of a solver. Note: here we effectively have a simple residual equation $y_{\\text{pred}}^2 - y_{\\text{true}} = 0$ which we are minimizing via a _mean-squared error_. It's not necessary to make it so simple: the more knowledge and numerical methods we can incorporate, the better we can guide the training process."
"Later on, a lot more could happen here: we could evaluate finite difference stencils on the predicted solution, or compute a whole implicit time-integration step of a solver. Here we effectively have a simple residual equation $y_{\\text{pred}}^2 - y_{\\text{true}} = 0$ which we are minimizing via a _mean-squared error_. It's not necessary to make it so simple: the more knowledge and numerical methods we can incorporate, the better we can guide the training process."
]
},
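A minimal sketch of such a loss in keras terms (the implementation details are an assumption; only the residual $y_{\text{pred}}^2 - y_{\text{true}}$ and its mean-squared minimization are taken from the text):

```python
import tensorflow as tf

def Loss_dp(y_true, y_pred):
    # apply the discretized model P (squaring) to the network prediction and penalize
    # the residual y_pred^2 - y_true via a mean-squared error; y_true holds the x samples
    return tf.reduce_mean(tf.square(tf.square(y_pred) - y_true))

# hypothetical usage: the "targets" passed to fit are simply the x values again
# model_dp.compile(optimizer='adam', loss=Loss_dp)
# model_dp.fit(X, X, epochs=500, batch_size=20, verbose=0)
```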
{
"cell_type": "code",
"execution_count": 10,
"id": "western-leader",
"metadata": {},
"outputs": [],
"source": [
@ -283,6 +299,7 @@
{
"cell_type": "code",
"execution_count": 11,
"id": "artistic-table",
"metadata": {},
"outputs": [
{
@ -313,6 +330,7 @@
},
{
"cell_type": "markdown",
"id": "spatial-agency",
"metadata": {},
"source": [
"Now the network actually has learned a good inverse of the parabola function! The following plot shows the solution in green."
@ -321,6 +339,7 @@
{
"cell_type": "code",
"execution_count": 12,
"id": "indonesian-abraham",
"metadata": {},
"outputs": [
{
@ -350,13 +369,14 @@
},
{
"cell_type": "markdown",
"id": "prostate-radio",
"metadata": {},
"source": [
"This looks much better 😎, at least in the range of 0.1 to 1. \n",
"\n",
"What has happened here?\n",
"\n",
"- We've prevented an undesired averaging of multiple modes in the solution by evaluating our discrete model w.r.t. current prediction of the network, rather than using a pre-computed solution. This let's us find the best single mode near the network prediction, and prevents an averaging of the modes that are contained in the solution manifold.\n",
"- We've prevented an undesired averaging of multiple modes in the solution by evaluating our discrete model w.r.t. current prediction of the network, rather than using a pre-computed solution. This let's us find the best single mode near the network prediction, and prevents an averaging of the modes that exist in the solution manifold.\n",
"\n",
"- We're still only getting one side of the curve! This is to be expected, because we're representing the solutions with a deterministic function. Hence we can only represent a single mode. Interestingly, whether it's the top or bottom mode is determined by the random initialization of the weights in $f$ - run the example a couple of time to see this effect in action. To capture multiple modes we'd need to extend the model to capture the full distribution of the outputs and parametrize it with additional dimensions.\n",
"\n",
@ -365,6 +385,7 @@
},
{
"cell_type": "markdown",
"id": "necessary-filename",
"metadata": {},
"source": [
"## Discussion\n",
@ -377,6 +398,7 @@
},
{
"cell_type": "markdown",
"id": "useful-special",
"metadata": {},
"source": [
"## Next steps\n",
@ -389,6 +411,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "victorian-discipline",
"metadata": {},
"outputs": [],
"source": []
@ -410,7 +433,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.6"
"version": "3.8.5"
}
},
"nbformat": 4,

View File

@ -24,13 +24,13 @@ Some visual examples of hybrid solvers, i.e. numerical simulators that are enhan
As a _sneak preview_, in the next chapters we'll show:
- How to train networks to infer fluid flow solutions around shapes like airfoils in one go, i.e., without needing a simulator.
- How to train networks to infer fluid flows around shapes like airfoils in one go, i.e., a _surrogate model_ that replaces a traditional numerical simulation.
- We'll show how to use model equations as residual to train networks that represent solutions, and how to improve upon this behavior by using differentiable simulations.
- We'll show how to use model equations as residual to train networks that represent solutions, and how to improve upon these residual constraints by using _differentiable simulations_.
- Even more tightly coupling a full _rough_ simulator for control problems is another topic. E.g., we'll demonstrate how to circumvent the convergence problems of standard reinforcement learning techniques by leveraging simulators in the training loop.
- How to more tightly interact with a full simulator for _control problems_. E.g., we'll demonstrate how to circumvent the convergence problems of standard reinforcement learning techniques by leveraging simulators in the training loop.
This _book_, where book stands for a collection of text, equations, images and code examples,
This _book_, where "book" stands for a collection of texts, equations, images and code examples,
is maintained by the
[TUM Physics-based Simulation Group](https://ge.in.tum.de). Feel free to contact us via
[old fashioned email](mailto:i15ge@cs.tum.edu) if you have any comments.
@ -41,7 +41,7 @@ This collection of materials is a living document, and will grow and change over
Feel free to contribute 😀
We also maintain a [link collection](https://github.com/thunil/Physics-Based-Deep-Learning) with recent research papers.
```{admonition} Code, executable, right here, right now
```{admonition} Executable code, right here, right now
:class: tip
We focus on jupyter notebooks, a key advantage of which is that all code examples
can be executed _on the spot_, out of a browser. You can modify things and

View File

@ -23,10 +23,13 @@
| Abbreviation | Meaning |
| --- | --- |
| CNN | Convolutional neural network |
| DL | Deep learning |
| NN | Neural network |
| PBDL | Physics-based deep learning |
| CNN | Convolutional Neural Network |
| DL | Deep Learning |
| GD | (steepest) Gradient Descent |
| NN | Neural Network |
| PDE | Partial Differential Equation |
| PBDL | Physics-Based Deep Learning |
| SGD | Stochastic Gradient Descent |

View File

@ -1,6 +1,57 @@
Model Equations
Models and Equations
============================
Below we'll give a _very_ (really very!) brief intro to deep learning, primarily to introduce the notation.
In addition, we'll discuss some _model equations_. Note that, in contrast to some other texts, we won't use _model_ to denote trained neural networks; these will only be called "NNs" or "networks". A "model" will always denote model equations for a physical effect, typically a PDE.
## Deep Learning and Neural Networks
There are lots of great introductions to deep learning - hence, we'll keep it short:
our goal is to approximate an unknown function $f^*(x)=y^*$ with an NN $f(x;\theta)$,
given some formulation of an error $e(y,y^*)$, with $y=f(x;\theta)$ being the output
of the NN and $y^*$ denoting a reference or ground truth value.
This gives a minimization problem: find the weights $\theta$ of $f(x;\theta)$ such that $e$ is minimized.
We typically optimize, i.e. _train_,
with some variant of a stochastic gradient descent (SGD) optimizer.
We'll rely on auto-diff to compute the gradient of the error w.r.t. the weights, $\partial e / \partial \theta$.
We will also assume that $e$ denotes a _scalar_ error function (sometimes also
called cost or objective function).
This is crucial for the efficient calculation of gradients.
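To make this concrete (a sketch of the notation only; conventions vary), training can be written as the minimization problem

$$ \theta^{*} = \underset{\theta}{\text{arg min}} \sum_i e\big(f(x_i;\theta),\, y^{*}_i\big) , $$

where the sum runs over the training samples (or a mini-batch thereof), and an SGD step updates the weights via $\theta \leftarrow \theta - \eta \, \partial e/\partial \theta$ with a learning rate $\eta$.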
<!-- general goal, minimize E for e(x,y) ... cf. eq. 8.1 from DLbook
introduce scalar loss, always(!) scalar... (also called *cost* or *objective* function) -->
For training we distinguish: the **training** data set drawn from some distribution,
the **validation** set (from the same distribution, but different data),
and **test** data sets with _some_ different distribution than the training one.
The latter distinction is important! For the test set we want
_out of distribution_ (OOD) data to check how well our trained model generalizes.
Note that this gives a huge range of difficulties: from tiny changes that will certainly work
up to completely different inputs that are essentially guaranteed to fail. Hence,
test data should be generated with care.
Enough for now - if all the above wasn't totally obvious to you, we very strongly recommend
reading chapters 6 to 9 of the [Deep Learning book](https://www.deeplearningbook.org),
especially the sections about [MLPs](https://www.deeplearningbook.org/contents/mlp.html) and
"Conv-Nets", i.e. [CNNs](https://www.deeplearningbook.org/contents/convnets.html).
```{admonition} Note: Classification vs Regression
:class: tip
The classic ML distinction between _classification_ and _regression_ problems is not so important here:
we only deal with _regression_ problems in the following.
```
<!--
maximum likelihood estimation
Also interesting: from a math standpoint ''just'' non-linear optimization ...
-->
## Partial Differential Equations as Physical Models
TODO
give an overview of PDE models to be used later on ...
@ -98,10 +149,11 @@ $\frac{\partial u}{\partial{t}} + u \nabla u = \nu \nabla \cdot \nabla u $ .
---
## Some PDEs we'll use later on
Later on, additional equations...
Navier-Stokes, in 2D:
$\begin{aligned}

View File

@ -134,17 +134,16 @@ each of the different techniques is particularly useful.
To be a bit more specific, _physics_ is a huge field, and we can't cover everything...
```{note}
For now our focus are:
- _field-based simulations_ (no Lagrangian methods)
- combinations with _deep learning_ (plenty of other interesting ML techniques, but not here)
- experiments as _outlook_ (replace synthetic data with real)
```{note} The focus of this book is on...
- _Field-based simulations_ (no Lagrangian methods)
- Combinations with _deep learning_ (plenty of other interesting ML techniques, but not here)
- Experiments as _outlook_ (replace synthetic data with real)
```
It's also worth noting that we're building up the methods from some very
fundamental steps. If you'd like to skip ahead to the later chapters, here are some considerations.
```{admonition} You can skip ahead if...
```{admonition} Hint: You can skip ahead if...
:class: tip
- you're very familiar with numerical methods and PDE solvers, and want to get started with DL topics right away. The _Supervised Learning_ chapter is a good starting point then.
@ -171,30 +170,3 @@ Ling et al. isotropic turb, small FC, unused?
PINNs ... and more ... -->
## Deep Learning and Neural Networks
TODO
Very brief intro, basic equations... approximate $f^*(x)=y$ with NN $f(x;\theta)$ ...
learn via GD, $\partial f / \partial \theta$
general goal, minimize E for e(x,y) ... cf. eq. 8.1 from DLbook
introduce scalar loss, always(!) scalar...
(also called *cost* or *objective* function)
distuingish: training, validation and (out of distribution!) test sets.
Read chapters 6 to 9 of the [Deep Learning book](https://www.deeplearningbook.org),
especially about [MLPs]https://www.deeplearningbook.org/contents/mlp.html and
"Conv-Nets", i.e. [CNNs](https://www.deeplearningbook.org/contents/convnets.html).
**Note:** Classic distinction between _classification_ and _regression_ problems not so important here,
we only deal with _regression_ problems in the following.
maximum likelihood estimation
Also interesting: from a math standpoint ''just'' non-linear optimization ...