more text

This commit is contained in:
NT 2021-01-12 11:50:42 +08:00
parent 03a4c7ef29
commit 0063c71c05
12 changed files with 447 additions and 169 deletions

View File

@ -1,9 +1,9 @@
# PBDL Table of content (cf https://jupyterbook.org/customize/toc.html)
#
- file: intro
- file: overview.md
  sections:
  - file: overview-equations.md
  - file: overview-burgers-forw.ipynb
  - file: overview-ns-forw.ipynb
- file: supervised
@ -12,6 +12,7 @@
- file: physicalloss
  sections:
  - file: physicalloss-code.ipynb
  - file: physicalloss-discuss.md
- file: diffphys
  sections:
  - file: diffphys-code-gradient.ipynb
@ -23,3 +24,4 @@
- file: markdown
- file: notebooks
- file: references
- file: notation

View File

@ -29,7 +29,7 @@ For the PINN representation with fully-connected networks on the other hand, we
The following table summarizes these findings:
| Method | Pro | Con |
|----------|-------------|------------|
| **PINN** | - Analytic derivatives via back-propagation | - Expensive evaluation of NN, as well as derivative calculations |
|          | - Simple to implement | - Incompatible with existing numerical methods |

View File

@ -73,9 +73,15 @@ The contents of the following files would not have been possible without the hel
- Ms. y
- ...
% tests...
% some markdown tests follow ...
---
a b c
```{admonition} My title2
@ -86,6 +92,7 @@ See also... Test link: {doc}`supervised`
✅ Do this , ❌ Don't do this
% ----------------
---
@ -152,6 +159,6 @@ time series, sequence prediction?] {cite}`wiewel2019lss,bkim2019deep,wiewel2020l
_Misc jupyter book TODOs_
- Fix latex PDF output
- How to include links to papers in the bibtex references?

View File

@ -1,5 +1,8 @@
Old Jupyter Book Reference Stuff
=======================
There are many ways to write content in Jupyter Book. This short section
covers a few tips for how to do so.
TODO remove sometime...

notation.md Normal file
View File

@ -0,0 +1,38 @@
# Notation and Abbreviations
## Math notation:
| Symbol | Meaning |
| --- | --- |
| $A$ | matrix |
| $\eta$ | learning rate or step size |
| $\Gamma$ | boundary of computational domain $\Omega$ |
| $f()$ | approximated version of $f^{*}$ |
| $f^{*}()$ | generic function to be approximated, typically unknown |
| $\Omega$ | computational domain |
| $\mathcal P$ | physical model, PDE |
| $\theta$ | neural network params |
| $t$ | time dimension |
| $\mathbf{u}$ | vector-valued velocity |
| $x$ | neural network input or spatial coordinate |
| $y$ | neural network output |
## Summary of the most important abbreviations:
| Abbreviation | Meaning |
| --- | --- |
| CNN | Convolutional neural network |
| DL | Deep learning |
| NN | Neural network |
| PBDL | Physics-based deep learning |
% test table formatting in markdown
% | | Sentence # | Word | POS | Tag |
% |---:|:-------------|:-----------|:------|:------|
% | 1 | Sentence: 1 | They | PRP | O |
% | 2 | Sentence: 1 | marched | VBD | O |

overview-equations.md Normal file
View File

@ -0,0 +1,138 @@
Model Equations
============================
Overview of the PDE models to be used later on ...
domain $\Omega$, boundary $\Gamma$
continuous functions, but few assumptions about continuity for now...
```{admonition} Notation and abbreviations
:class: seealso
If unsure, please check the summary of our mathematical notation
and the abbreviations used in: {doc}`notation`, at the bottom of the left panel.
```
% \newcommand{\pde}{\mathcal{P}} % PDE ops
% \newcommand{\pdec}{\pde_{s}}
% \newcommand{\manifsrc}{\mathscr{S}} % coarse / "source"
% \newcommand{\pder}{\pde_{R}}
% \newcommand{\manifref}{\mathscr{R}}
% vc - coarse solutions
% \renewcommand{\vc}[1]{\vs_{#1}} % plain coarse state at time t
% \newcommand{\vcN}{\vs} % plain coarse state without time
% vc - coarse solutions, modified by correction
% \newcommand{\vct}[1]{\tilde{\vs}_{#1}} % modified / over time at time t
% \newcommand{\vctN}{\tilde{\vs}} % modified / over time without time
% vr - fine/reference solutions
% \renewcommand{\vr}[1]{\mathbf{r}_{#1}} % fine / reference state at time t , never modified
% \newcommand{\vrN}{\mathbf{r}} % plain coarse state without time
% \newcommand{\project}{\mathcal{T}} % transfer operator fine <> coarse
% \newcommand{\loss}{\mathcal{L}} % generic loss function
% \newcommand{\nn}{f_{\theta}}
% \newcommand{\dt}{\Delta t} % timestep
% \newcommand{\corrPre}{\mathcal{C}_{\text{pre}}} % analytic correction , "pre computed"
% \newcommand{\corr}{\mathcal{C}} % just C for now...
% \newcommand{\nnfunc}{F} % {\text{NN}}
Some notation from SoL, move with parts from overview into "appendix"?
We typically solve a discretized PDE $\mathcal{P}$ by performing discrete time steps of size $\Delta t$.
Each subsequent step can depend on any number of previous steps,
$\mathbf{u}(\mathbf{x},t+\Delta t) = \mathcal{P}(\mathbf{u}(\mathbf{x},t), \mathbf{u}(\mathbf{x},t-\Delta t),...)$,
where
$\mathbf{x} \in \Omega \subseteq \mathbb{R}^d$ for the domain $\Omega$ in $d$
dimensions, and $t \in \mathbb{R}^{+}$.
Numerical methods yield approximations of a smooth function such as $\mathbf{u}$ in a discrete
setting and invariably introduce errors. These errors can be measured in terms
of the deviation from the exact analytical solution.
For discrete simulations of
PDEs, these errors are typically expressed as a function of the truncation, $O(\Delta t^k)$
for a given step size $\Delta t$ and an exponent $k$ that is discretization dependent.
The following PDEs typically work with a continuous
velocity field $\mathbf{u}$ with $d$ dimensions and components, i.e.,
$\mathbf{u}(\mathbf{x},t): \mathbb{R}^d \rightarrow \mathbb{R}^d $.
For discretized versions below, $d_{i,j}$ will denote the dimensionality
of a field such as the velocity,
with domain size $d_{x},d_{y},d_{z}$ for source and reference in 3D.
% with $i \in \{s,r\}$ denoting source/inference manifold and reference manifold, respectively.
%This yields $\vc{} \in \mathbb{R}^{d \times d_{s,x} \times d_{s,y} \times d_{s,z} }$ and $\vr{} \in \mathbb{R}^{d \times d_{r,x} \times d_{r,y} \times d_{r,z} }$
%Typically, $d_{r,i} > d_{s,i}$ and $d_{z}=1$ for $d=2$.
For all PDEs, we use non-dimensional parametrizations as outlined below,
and the components of the velocity vector are typically denoted by $x,y,z$ subscripts, i.e.,
$\mathbf{u} = (u_x,u_y,u_z)^T$ for $d=3$.
Burgers' equation in 2D represents a well-studied advection-diffusion PDE:
$\frac{\partial u_x}{\partial{t}} + \mathbf{u} \cdot \nabla u_x =
\nu \nabla\cdot \nabla u_x + g_x(t),
\\
\frac{\partial u_y}{\partial{t}} + \mathbf{u} \cdot \nabla u_y =
\nu \nabla\cdot \nabla u_y + g_y(t)
$,
where $\nu$ and $\mathbf{g}$ denote diffusion constant and external forces, respectively.
Burgers' equation in 1D without forces with $u_x = u$:
%\begin{eqnarray}
$\frac{\partial u}{\partial{t}} + u \nabla u = \nu \nabla \cdot \nabla u $ .
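To make the discrete time stepping above concrete, here is a minimal sketch of an explicit finite-difference step for this 1D Burgers equation. The grid size, viscosity and step sizes are illustrative assumptions, not values used later on:

```python
import numpy as np

# illustrative discretization parameters (assumptions for this sketch)
N, nu = 128, 0.01
dx, dt = 2.0 / N, 0.0005

x = np.linspace(-1.0, 1.0, N, endpoint=False)
u = -np.sin(np.pi * x)  # a smooth initial condition

def burgers_step(u, dt, dx, nu):
    """One explicit Euler step of u_t + u u_x = nu u_xx with periodic boundaries."""
    u_x  = (np.roll(u, -1) - np.roll(u, 1)) / (2.0 * dx)        # central difference
    u_xx = (np.roll(u, -1) - 2.0 * u + np.roll(u, 1)) / dx**2   # discrete Laplacian
    return u + dt * (-u * u_x + nu * u_xx)

for _ in range(100):    # advancing the state corresponds to applying P repeatedly
    u = burgers_step(u, dt, dx, nu)
```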
---
Later on, additional equations...
Navier-Stokes, in 2D:
$
\frac{\partial u_x}{\partial{t}} + \mathbf{u} \cdot \nabla u_x =
- \frac{1}{\rho}\nabla{p} + \nu \nabla\cdot \nabla u_x
\\
\frac{\partial u_y}{\partial{t}} + \mathbf{u} \cdot \nabla u_y =
- \frac{1}{\rho}\nabla{p} + \nu \nabla\cdot \nabla u_y
\\
\text{subject to} \quad \nabla \cdot \mathbf{u} = 0
$
Navier-Stokes, in 2D with Boussinesq:
%$\frac{\partial u_x}{\partial{t}} + \mathbf{u} \cdot \nabla u_x$
%$ -\frac{1}{\rho} \nabla p $
$
\frac{\partial u_x}{\partial{t}} + \mathbf{u} \cdot \nabla u_x = - \frac{1}{\rho} \nabla p
\\
\frac{\partial u_y}{\partial{t}} + \mathbf{u} \cdot \nabla u_y = - \frac{1}{\rho} \nabla p + \eta d
\\
\text{subject to} \quad \nabla \cdot \mathbf{u} = 0,
\\
\frac{\partial d}{\partial{t}} + \mathbf{u} \cdot \nabla d = 0
$
Navier-Stokes, in 3D:
$
\frac{\partial u_x}{\partial{t}} + \mathbf{u} \cdot \nabla u_x = - \frac{1}{\rho} \nabla p + \nu \nabla\cdot \nabla u_x
\\
\frac{\partial u_y}{\partial{t}} + \mathbf{u} \cdot \nabla u_y = - \frac{1}{\rho} \nabla p + \nu \nabla\cdot \nabla u_y
\\
\frac{\partial u_z}{\partial{t}} + \mathbf{u} \cdot \nabla u_z = - \frac{1}{\rho} \nabla p + \nu \nabla\cdot \nabla u_z
\\
\text{subject to} \quad \nabla \cdot \mathbf{u} = 0.
$

View File

@ -1,12 +1,14 @@
Overview
============================
The following collection of digital documents, i.e. "book",
targets _Physics-Based Deep Learning_ techniques.
By that we mean combining physical modeling and numerical simulations with
methods based on artificial neural networks, i.e., deep learning (DL).
The general direction of Physics-Based Deep Learning represents a very
active, quickly growing and exciting field of research -- we want to provide
a starting point for new researchers as well as a hands-on introduction into
state-of-the-art research topics.
## Motivation
@ -50,8 +52,8 @@ whether key phenomena are visible in the solutions or not.
:class: tip
Thus, a key aspect that we want to address in the following is:
- explain how to use DL,
- how to combine it with existing knowledge of physics and simulations,
- **without throwing away** all existing numerical knowledge and techniques!
```
Rather, we want to build on all the neat techniques that we have
@ -112,7 +114,7 @@ starting points with code examples, and illustrate pros and cons of the
different approaches. In particular, it's important to know in which scenarios
each of the different techniques is particularly useful.
```{admonition} You can skip ahead if...
:class: tip
- you're very familiar with numerical methods and PDE solvers, and want to get started with DL topics right away. The _Supervised Learning_ chapter is a good starting point then.
@ -138,37 +140,13 @@ PINNs ... and more ...
## Deep Learning and Neural Networks
Very brief intro, basic equations... approximate $f^*(x)=y$ with NN $f(x;\theta)$ ...
learn via GD, $\partial f / \partial \theta$
Read chapters 6 to 9 of the [Deep Learning book](https://www.deeplearningbook.org),
especially about [MLPs](https://www.deeplearningbook.org/contents/mlp.html) and
"Conv-Nets", i.e. [CNNs](https://www.deeplearningbook.org/contents/convnets.html).
**Note:** The classic distinction between _classification_ and _regression_ problems is not so important here:
we only deal with _regression_ problems in the following.
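To illustrate the $\partial f / \partial \theta$ step, here is a minimal sketch of a single gradient-descent update for a small fully-connected network. TensorFlow is assumed purely for illustration, and the network size, data and learning rate are arbitrary placeholders:

```python
import tensorflow as tf

# a tiny fully-connected NN f(x; theta); sizes chosen arbitrarily
f = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="tanh"),
    tf.keras.layers.Dense(1),
])

x = tf.random.uniform((64, 1))   # stand-in inputs
y = tf.sin(3.0 * x)              # stand-in targets for an unknown f*(x)

with tf.GradientTape() as tape:
    loss = tf.reduce_mean((f(x) - y) ** 2)           # L2 loss
grads = tape.gradient(loss, f.trainable_variables)   # d loss / d theta

eta = 0.01  # learning rate / step size
for w, g in zip(f.trainable_variables, grads):
    w.assign_sub(eta * g)        # one gradient-descent step on theta
```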

physicalloss-discuss.md Normal file
View File

@ -0,0 +1,37 @@
Discussion of Physical Soft-Constraints
=======================
The good news so far is that we have a DL method that can include
physical laws in the form of soft constraints, by minimizing residuals.
However, as the very simple previous example illustrates, this is just a conceptual
starting point.
On the positive side, we can leverage DL frameworks with backpropagation to compute
the derivatives of the model. At the same time, this puts us at the mercy of the learned
representation regarding the reliability of these derivatives. Also, each derivative
requires backpropagation through the full network, which can be very slow. Especially so
for higher-order derivatives.
And while the setup is relatively simple, it is generally difficult to control. The NN
has flexibility to refine the solution by itself, but at the same time, tricks are necessary
when it doesn't pick the right regions of the solution.
In general, a fundamental drawback of this approach is that it does not combine well with traditional
numerical techniques. E.g., the learned representation is not suitable to be refined with
a classical iterative solver such as the conjugate gradient method. This means many
powerful techniques that were developed in the past decades cannot be used in this context.
Bringing these numerical methods back into the picture will be one of the central
goals of the next sections.
✅ Pro:
- uses physical model
- derivatives via backpropagation
❌ Con:
- slow ...
- only soft constraints
- largely incompatible with _classical_ numerical methods
- derivatives rely on learned representation
Next, let's look at how we can leverage numerical methods to improve the DL accuracy and efficiency
by making use of differentiable solvers.

View File

@ -1,134 +1,98 @@
Physical Loss Terms
=======================
The supervised setting of the previous sections can quickly
yield approximate solutions with a fairly simple training process, but what's
quite sad to see here is that we only use physical models and numerics
as an "external" tool to produce a big pile of data 😢.
## Using Physical Models
We can improve this setting by trying to bring the model equations (or parts thereof)
into the training process. E.g., given a PDE for $\mathbf{u}(x,t)$ with a time evolution,
we can typically express it in terms of a function $\mathcal F$ of the derivatives
of $\mathbf{u}$ via
$
\mathbf{u}_t = \mathcal F ( \mathbf{u}_{x}, \mathbf{u}_{xx}, ... \mathbf{u}_{x..x})
$,
where the $_{x}$ subscripts denote spatial derivatives of higher order.
In this context we can employ DL by approximating the unknown $\mathbf{u}$ itself
with a NN, denoted by $\tilde{\mathbf{u}}$. If the approximation is accurate, the PDE
naturally should be satisfied, i.e., the residual $R$ should be equal to zero:
$
R = \mathbf{u}_t - \mathcal F ( \mathbf{u}_{x}, \mathbf{u}_{xx}, ... \mathbf{u}_{x..x}) = 0
$
This nicely integrates with the objective for training a neural network: similar to before
we can collect sample solutions
$[x_0,y_0], ...[x_n,y_n]$ for $\mathbf{u}$ with $\mathbf{u}(x)=y$.
This is typically important, as most practical PDEs we encounter do not have unique solutions
unless initial and boundary conditions are specified. Hence, if we only consider $R$ we might
get solutions with random offset or other undesirable components. The supervised sample points therefore
help to _pin down_ the solution in certain places.
Now our training objective becomes
$\text{arg min}_{\theta} \ \alpha_0 \sum_i (f(x_i ; \theta)-y_i)^2 + \alpha_1 \sum_i R(x_i) $,
where $\alpha_{0,1}$ denote hyper parameters that scale the contribution of the supervised term and
the residual term, respectively. We could of course add additional residual terms with suitable scaling factors here.
Note that, similar to the data samples used for supervised training, we have no guarantees that the
residual terms $R$ will actually reach zero during training. The non-linear optimization of the training process
will minimize the supervised and residual terms as much as possible, but worst case, large non-zero residual
contributions can remain. We'll look at this in more detail in the upcoming code example, for now it's important
to remember that physical constraints in this way only represent _soft-constraints_, without guarantees
of minimizing these constraints.
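As a minimal sketch, such a combined objective could look as follows. TensorFlow is assumed for illustration, `alpha_0`, `alpha_1` and the `residual` function are placeholders, and the residual is squared here so that minimizing it drives it towards zero:

```python
import tensorflow as tf

alpha_0, alpha_1 = 1.0, 1.0   # illustrative weights for the two loss terms

def combined_loss(f, x, y, residual):
    """Supervised L2 term plus a (squared) PDE residual term, both at the sample points x."""
    supervised = tf.reduce_mean((f(x) - y) ** 2)
    physics    = tf.reduce_mean(residual(f, x) ** 2)
    return alpha_0 * supervised + alpha_1 * physics
```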
## Neural network derivatives
In order to compute the residuals at training time, it would be possible to store
the unknowns of $\mathbf{u}$ on a computational mesh, e.g., a grid, and discretize the equations of
$R$ there. This has a fairly long "tradition" in DL, and was proposed by Tompson et al. {cite}`tompson2017` early on.
Instead, a more widely used variant of employing physical soft-constraints {cite}`raissi2018hiddenphys`
uses fully connected NNs to represent $\mathbf{u}$. This has some interesting pros and cons that we'll outline in the following.
Due to the popularity of this version, we'll also focus on it in the following code examples and comparisons.
The central idea here is that the aforementioned general function $f$ that we're after in our learning problems
can be seen as a representation of a physical field. Thus, $\mathbf{u}(x)$ will
be turned into $\mathbf{u}(x, \theta)$, where we choose $\theta$ such that the solution to $\mathbf{u}$ is
represented as precisely as possible.
One nice side effect of this viewpoint is that NN representations inherently support the calculation of derivatives.
The derivative $\partial f / \partial \theta$ was a key building block for learning via gradient descent, as explained
in {doc}`overview`. Here, we can use the same tools to compute spatial derivatives such as $\partial \mathbf{u} / \partial x$.
Note that above for $R$ we've written this derivative in the shortened notation as $\mathbf{u}_{x}$.
For functions over time this of course also works for $\partial \mathbf{u} / \partial t$, i.e. $\mathbf{u}_{t}$ in the notation above.
Thus, for some generic $R$, made up of $\mathbf{u}_t$ and $\mathbf{u}_{x}$ terms, we can rely on the back-propagation algorithm
of DL frameworks to compute these derivatives once we have a NN that represents $\mathbf{u}$. Essentially, this gives us a
function (the NN) that receives space and time coordinates to produce a solution for $\mathbf{u}$. Hence, the input is typically
quite low-dimensional, e.g., 3+1 values for a 3D case over time, and often produces a scalar value or a spatial vector.
Due to the lack of explicit spatial sampling points, an MLP, i.e., a fully-connected NN, is the architecture of choice here.
To pick a simple example, Burgers equation in 1D,
$\frac{\partial u}{\partial{t}} + u \nabla u = \nu \nabla \cdot \nabla u $ , we can directly
formulate a loss term $R = \frac{\partial u}{\partial t} + u \frac{\partial u}{\partial x} - \nu \frac{\partial^2 u}{\partial x^2}$ that should be minimized as much as possible at training time. For each of the terms, e.g. $\frac{\partial u}{\partial x}$,
we can simply query the DL framework that realizes $u$ to obtain the corresponding derivative.
For higher order derivatives, such as $\frac{\partial^2 u}{\partial x^2}$, we can typically simply query the derivative function of the framework twice. In the following section, we'll give a specific example of how that works in tensorflow.
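As a rough preview of that idea, here is a sketch only, assuming TensorFlow 2.x; `u_net` is a hypothetical MLP taking $(x,t)$ as input:

```python
import tensorflow as tf

# hypothetical MLP representing u(x, t); architecture chosen arbitrarily
u_net = tf.keras.Sequential([
    tf.keras.layers.Dense(20, activation="tanh"),
    tf.keras.layers.Dense(20, activation="tanh"),
    tf.keras.layers.Dense(1),
])

def burgers_residual(x, t, nu=0.01):
    """R = u_t + u u_x - nu u_xx, with all derivatives obtained via back-propagation."""
    with tf.GradientTape(persistent=True) as tape2:
        tape2.watch([x, t])
        with tf.GradientTape(persistent=True) as tape1:
            tape1.watch([x, t])
            u = u_net(tf.concat([x, t], axis=1))
        u_x = tape1.gradient(u, x)   # first derivatives, queried from the framework
        u_t = tape1.gradient(u, t)
    u_xx = tape2.gradient(u_x, x)    # second derivative via a second (outer) tape
    return u_t + u * u_x - nu * u_xx

x = tf.random.uniform((100, 1))
t = tf.random.uniform((100, 1))
R = burgers_residual(x, t)   # residual values to be driven towards zero during training
```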
## Summary so far
This gives us a method to include physical equations into DL learning as a soft-constraint.
Typically, this setup is suitable for _inverse_ problems, where we have certain measurements or observations
that we wish to find a solution of a model PDE for. Because of the high expense of the reconstruction (to be
demonstrated in the following), the solution manifold typically shouldn't be overly complex. E.g., it is difficult
to capture a wide range of solutions, such as the previous supervised airfoil example, in this way.
```{figure} resources/placeholder.png
---
height: 220px
name: pinn-training
---
TODO, visual overview of PINN training
```

View File

@ -762,3 +762,34 @@
PUBLISHER = {Dept. of Computer Science 10, University of Erlangen-Nuremberg}
}
% ----------------- external --------------------
@inproceedings{tompson2017,
title = {Accelerating Eulerian Fluid Simulation With Convolutional Networks},
booktitle = {Proceedings of Machine Learning Research},
author = {Tompson, Jonathan and Schlachter, Kristofer and Sprechmann, Pablo and Perlin, Ken},
year = 2017,
pages = {3424--3433}
}
@article{raissi2018hiddenphys,
title={Hidden physics models: Machine learning of nonlinear partial differential equations},
author={Raissi, Maziar and Karniadakis, George Em},
journal={Journal of Computational Physics},
volume={357},
pages={125--141},
year={2018},
publisher={Elsevier}
}

resources/placeholder.png Normal file

View File

Binary file not shown (binary image, 20 KiB).

View File

@ -1,5 +1,85 @@
Supervised Learning
=======================
_Supervised_ here essentially means: "doing things the old fashioned way". Old fashioned in the context of
deep learning (DL), of course, so it's still fairly new, and old fashioned also doesn't always mean bad.
In a way this viewpoint is a starting point for all projects one would encounter in the context of DL, and
hence is worth studying. And although it typically yields inferior results to approaches that more tightly
couple with physics, it nonetheless can be the only choice in certain application scenarios where no good
model equations exist.
## Problem Setting
For supervised learning, we're faced with an
unknown function $f^*(x)=y$. We collect lots of pairs of data $[x_0,y_0], ...[x_n,y_n]$ (the training data set)
and directly train a NN to represent an approximation of $f^*$, denoted as $f$, such
that $f(x)=y$.
The $f$ we can obtain is typically not exact,
but instead we obtain it via a minimization problem:
by adjusting the weights $\theta$ of our representation $f$ such that
$\text{arg min}_{\theta} \sum_i (f(x_i ; \theta)-y_i)^2$.
This will give us $\theta$ such that $f(x;\theta) \approx y$ as accurately as possible given
our choice of $f$ and the hyper parameters for training. Note that above we've assumed
the simplest case of an $L^2$ loss. A more general version would use an error metric $e(x,y)$
to be minimized via $\text{arg min}_{\theta} \sum_i e( f(x_i ; \theta) , y_i)$. The choice
of a suitable metric is a topic we will get back to later on.
Irrespective of our choice of metric, this formulation
gives the actual "learning" process for a supervised approach.
The training data typically needs to be of substantial size, and hence it is attractive
to use numerical simulations to produce a large number of training input-output pairs.
This means that the training process uses a set of model equations, and approximates
them numerically, in order to train the NN representation $f$. This
has a bunch of advantages, e.g., we don't have measurement noise of real-world devices
and we don't need manual labour to annotate a large number of samples to get training data.
On the other hand, this approach inherits the common challenges of replacing experiments
with simulations: first, we need to ensure the chosen model has enough power to predict the
behavior of real-world phenomena that we're interested in.
In addition, the numerical approximations have numerical errors
which need to be kept small enough for a chosen application. As these topics are studied in depth
for classical simulations, the existing knowledge can likewise be leveraged to
set up DL training tasks.
```{figure} resources/placeholder.png
---
height: 220px
name: supervised-training
---
TODO, visual overview of supervised training
```
## Applications
Let's directly look at an example with a fairly complicated context:
we have a turbulent airflow around wing profiles, and we'd like to know the average motion
and pressure distribution around this airfoil for different Reynolds numbers and angles of attack.
Thus, given an airfoil shape, Reynolds numbers, and angle of attack, we'd like to obtain
a velocity field $\mathbf{u}$ and a pressure field $p$ in a computational domain $\Omega$
around the airfoil in the center of $\Omega$.
This is classically approximated with _Reynolds-Averaged Navier Stokes_ (RANS) models, and this
setting is still one of the most widely used applications of Navier-Stokes solvers in industry.
However, instead of relying on traditional numerical methods to solve the RANS equations,
we now aim at training a neural network that completely bypasses the numerical solver,
and produces the solution in terms of $\mathbf{u}$ and $p$.
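Purely as an illustration of how such a network could be set up, here is a sketch that encodes inputs and outputs as image-like channels on a Cartesian grid. The channel layout, resolution and architecture are assumptions for this sketch, not the setup of the later code example:

```python
import tensorflow as tf

# inputs: e.g. freestream velocity components (encoding Re and angle of attack) plus an airfoil
# mask as 3 channels; outputs: pressure p and velocity u_x, u_y as 3 channels on the same grid
inp = tf.keras.Input(shape=(128, 128, 3))
h   = tf.keras.layers.Conv2D(32, 5, padding="same", activation="relu")(inp)
h   = tf.keras.layers.Conv2D(32, 5, padding="same", activation="relu")(h)
out = tf.keras.layers.Conv2D(3, 5, padding="same")(h)   # [p, u_x, u_y]
model = tf.keras.Model(inp, out)

# supervised L2 loss against reference RANS solutions
model.compile(optimizer="adam", loss="mse")
```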
## Discussion
TODO , add as separate section after code?
TODO , discuss pros / cons of supervised learning
TODO , CNNs powerful, graphs & co likewise possible
Pro:
- very fast output and training
Con:
- lots of data needed
- undesirable averaging / inaccuracies due to direct loss
Outlook: interactions with external "processes" (such as embedding into a solver) very problematic, see DP later on...