unified caps of headings

This commit is contained in:
NT 2021-04-12 09:19:00 +08:00
parent a9397074e1
commit f1f475373d
17 changed files with 69 additions and 46 deletions

View File

@ -293,7 +293,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## More Optimization Steps\n",
"## More optimization steps\n",
"\n",
"Before moving on to more complex physics simulations, or involving NNs, let's finish the optimization task at hand, and run more steps to get a better solution.\n",
"\n"
@ -492,7 +492,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Physics-Informed vs. Differentiable Physics Reconstruction\n",
"## Physics-informed vs. differentiable physics reconstruction\n",
"\n",
"Now we have both versions, the one with the PINN, and the DP version, so let's compare both reconstructions in more detail.\n",
"\n",
@ -599,7 +599,7 @@
"Now we have a first example to show similarities and differences of the two approaches. In the section, we'll present a discussion of the findings so far, before moving to more complex cases.\n",
"\n",
"\n",
"## Next Steps\n",
"## Next steps\n",
"\n",
"As with the PINN version, there's variety of things that can be improved and experimented with the code above:\n",
"\n",

View File

@ -89,7 +89,7 @@
"id": "rdSTbMoaS0Uz"
},
"source": [
"## Batched Simulations\n",
"## Batched simulations\n",
"\n",
"Now we can set up the simulation, which will work in line with the previous \"regular\" simulation example from the {doc}`overview-ns-forw`. However, now we'll directly include an additional dimension, in line with a mini-batch used for NN training. For this, we'll include an additional named dimension called `inflow_loc`. This dimension will exist \"above\" the previous spatial dimensions `x`, `y` and the channel dimensions `vector`. As indicated by the name `inflow_loc`, the main differences for this dimension will lie in different locations of the inflow, in order to obtain different flow simulations. The named dimensions in phiflow make it very convenient to broadcast information\n",
"\n",
@ -515,7 +515,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Next Steps\n",
"## Next steps\n",
"\n",
"Based on the code example above, we can recommend experimenting with the following:\n",
"\n",

View File

@ -12,7 +12,7 @@
" \n",
"Pretty much all numerical methods contain some form of iterative process. That can be repeated updates over time for explicit solvers,or within a single update step for implicit solvers. Below we'll target iterations over time, an example for the second case could be found [here](https://github.com/tum-pbs/CG-Solver-in-the-Loop).\n",
"\n",
"## Problem Formulation\n",
"## Problem formulation\n",
"\n",
"In the context of reducing errors, it's crucial to have a _differentiable physics solver_, so that the learning process can take the reaction of the solver into account. This interaction is not possible with supervised learning or PINN training. Even small inference errors of a supervised NN can accumulate over time, and lead to a data distribution that differs from the distribution of the pre-computed data. This distribution shift can lead to sub-optimal results, or even cause blow-ups of the solver.\n",
"\n",
@ -96,7 +96,7 @@
"\n",
"---\n",
"\n",
"## Getting started with the Implementation\n",
"## Getting started with the implementation\n",
"\n",
"First, let's download the prepared data set (for details on generation & loading cf. https://github.com/tum-pbs/Solver-in-the-Loop), and let's get the data handling out of the way, so that we can focus on the _interesting_ parts..."
]
@ -174,7 +174,7 @@
"id": "OhnzPdoww11P"
},
"source": [
"## Simulation Setup\n",
"## Simulation setup\n",
"\n",
"Now we can set up the _source_ simulation $\\mathcal{P}_{s}$. \n",
"Note that we won't deal with \n",
@ -232,7 +232,7 @@
"id": "RYFUGICgxk0K"
},
"source": [
"## Network Architecture\n",
"## Network architecture\n",
"\n",
"We'll also define two alternative neural networks to represent \n",
"$\\newcommand{\\vcN}{\\mathbf{s}} \\newcommand{\\corr}{\\mathcal{C}} \\corr$: \n",
@ -364,7 +364,7 @@
"source": [
"---\n",
"\n",
"## Data Handling\n",
"## Data handling\n",
"\n",
"So far so good - we also need to take care of a few more mundane tasks, e.g. the some data handling and randomization. Below we define a `Dataset` class that stores all \"ground truth\" reference data (already downsampled).\n",
"\n",
@ -637,7 +637,7 @@
"id": "AbpNPzplQZMF"
},
"source": [
"## Interleaving Simulation and Network\n",
"## Interleaving simulation and NN\n",
"\n",
"Now comes the **most crucial** step in the whole setup: we define the chain of simulation steps and network evaluations to be used at training time. After all the work defining helper functions, it's acutally pretty simple: we loop over `msteps`, call the simulator via `KarmanFlow.step` for an input state, and afterwards evaluate the correction via `network(to_keras())`. The correction is then added to the last simulation state in the `prediction` list (we're actually simply overwriting the last simulated step `prediction[-1]` with `velocity + correction[-1]`.\n",
"\n",

View File

@ -100,7 +100,7 @@
"id": "tFb5WYgzL-Wf"
},
"source": [
"## Control of Incompressible Fluids \n",
"## Control of incompressible fluids \n",
"\n",
"The next sections will walk you through data generation, supervised network initialization and end-to-end training using the differentiable PDE solver, [Φ<sub>Flow</sub>](https://github.com/tum-pbs/PhiFlow). \n",
"(_Note: this example uses an older version 1.4.1 of Φ<sub>Flow</sub>._)\n",
@ -141,7 +141,7 @@
"id": "vQCiicZv30de"
},
"source": [
"## Data Generation\n",
"## Data generation\n",
"\n",
"Before starting the training, we have to generate a data set to train with. I.e., the goal is to pre-compute our a set of ground truth time sequences $u^*$. Due to the complexity of the training below, we'll use a staged approach that pre-trains a supervised network as a rough initialization, and then refines it to learn control looking further and further ahead into the future (i.e., being trained for longer simulation sequences). \n",
"\n",
@ -322,7 +322,9 @@
"id": "cwrv1oHs30dj"
},
"source": [
"## Supervised Initialization"
"## Supervised initialization\n",
"\n",
"First we define a split of the 1000 data samples into 100 test, 100 validation, and 800 training samples."
]
},
{
@ -425,7 +427,7 @@
"id": "imuKMrRG30dm"
},
"source": [
"## CFE Pretraining with Differentiable Physics"
"## CFE pretraining with differentiable physics"
]
},
{
@ -492,7 +494,7 @@
"id": "g9HcP2oK30do"
},
"source": [
"## End-to-end Training with Differentiable Physics"
"## End-to-end training with differentiable physics"
]
},
{
@ -763,11 +765,22 @@
"id": "XKzyhAGjL-Wv"
},
"source": [
"## Next Steps\n",
"## Next steps\n",
"\n",
"- Change the `test_range` indices to look at different examples, and test the generalization of the trained controller networks.\n",
"- Try using a `RefinedSequence` (instead of a `StaggeredSequence`) to train with the prediction refinement scheme. This will yield a further improved control and reduced density error."
]
},
{
"cell_type": "code",
"metadata": {
"id": "71VhyjxMqCtV"
},
"source": [
""
],
"execution_count": null,
"outputs": []
}
]
}

View File

@ -10,7 +10,8 @@ additional properties, and summarize the pros and cons.
![Divider](resources/divider4.jpg)
## Time Steps and Iterations
## Time steps and iterations
When using DP approaches for learning applications, there is a large amount of flexibility
w.r.t. the combination of DP and NN building blocks.
@ -61,7 +62,7 @@ Note that this picture (and the ones before) have assumed an _additive_ influenc
DP setups with many time steps can be difficult to train: the gradients need to backpropagate through the full chain of PDE solver evaluations and NN evaluations. Typically, each of them represents a non-linear and complex function. Hence for larger numbers of steps, the vanishing and exploding gradient problem can make training difficult (see {doc}`diffphys-code-sol` for some practical tips on how to alleviate this).
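A toy illustration of the effect (not from the text): the end-to-end sensitivity is a product of per-step derivatives, so per-step magnitudes slightly below or above one shrink or blow up exponentially with the number of steps.

```python
def chain_gradient(jac, steps):
    """Magnitude of d(output)/d(input) after `steps` scalar steps,
    each with local derivative `jac`: the product |jac|**steps."""
    return abs(jac) ** steps

# 50 chained steps: 0.9 vanishes, 1.0 stays put, 1.1 explodes.
for jac in (0.9, 1.0, 1.1):
    print(jac, chain_gradient(jac, 50))  # ~0.005, 1.0, ~117
```

Real solver/NN chains have full Jacobians rather than scalars, but the same compounding applies to their singular values.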
## Alternatives: Noise
## Alternatives: noise
It is worth mentioning here that other works have proposed perturbing the inputs and
the iterations at training time with noise {cite}`sanchez2020learning` (somewhat similar to

View File

@ -3,7 +3,10 @@ Diff. Physics versus Phys.-informed Training
In the previous sections we've seen example reconstructions that used physical residuals as soft constraints, in the form of the PINNs, and reconstructions that used a differentiable physics (DP) solver. While both methods can find minimizers for the same minimization problem, the solutions thus obtained differ substantially, as does the behavior of the non-linear optimization problem that we get from each formulation. In the following we discuss these differences in more detail, and we will combine conclusions drawn from the behavior of the Burgers case of the previous sections with observations from research papers.
## Compatibility with Existing Numerical Methods
![Divider](resources/divider3.jpg)
## Compatibility with existing numerical methods
It is very obvious that the PINN implementation is quite simple, which is a positive aspect, but at the same time it differs strongly from "typical" discretizations and solution approaches that are usually employed for equations like Burgers equation. The derivatives are computed via the neural network, and hence rely on a fairly accurate representation of the solution to provide a good direction for optimization problems.
@ -31,6 +34,10 @@ That being said, because the DP approaches can cover much larger solution manifo
As a consequence, these training runs not only take more computational resources per NN iteration, they also need longer to converge. Regarding resources, each computation of the look-ahead potentially requires a large number of simulation steps, and typically a similar amount of resources for the backprop step. Regarding convergence, the more complex signal that should be learned can take more training iterations or even require larger NN structures.
![Divider](resources/divider2.jpg)
## Summary
The following table summarizes these findings:

View File

@ -27,7 +27,7 @@ Training with differentiable physics mean that one or more differentiable operat
provide directions to steer the learning process.
```
## Differentiable Operators
## Differentiable operators
With the DP direction we build on existing numerical solvers. I.e.,
the approach is strongly relying on the algorithms developed in the larger field
@ -123,7 +123,7 @@ one by one.
For the details of forward and reverse mode differentiation, please check out external materials such
as this [nice survey by Baydin et al.](https://arxiv.org/pdf/1502.05767.pdf).
## Learning via DP Operators
## Learning via DP operators
Thus, once the operators of our simulator support computations of the Jacobian-vector
products, we can integrate them into DL pipelines just like you would include a regular fully-connected layer
@ -331,7 +331,7 @@ and backpropagate through these steps. In line with other DP approaches, this en
---
## Summary of Differentiable Physics so far
## Summary of differentiable physics so far
To summarize, using differentiable physical simulations
gives us a tool to include physical equations with a chosen discretization into DL learning.

View File

@ -240,7 +240,7 @@
"id": "dated-requirement",
"metadata": {},
"source": [
"## A Differentiable Physics approach"
"## A differentiable physics approach"
]
},
{

View File

@ -4,9 +4,11 @@ Models and Equations
Below we'll give a very (really _very_!) brief intro to deep learning, primarily to introduce the notation.
In addition we'll discuss some _model equations_ below. Note that we won't use _model_ to denote trained neural networks, in contrast to some other texts. These will only be called "NNs" or "networks". A "model" will always denote model equations for a physical effect, typically a PDE.
## Deep Learning and Neural Networks
## Deep learning and neural networks
There are lots of great introductions to deep learning - hence, we'll keep it short:
In this book we focus on the connection with physical
models, and there are lots of great introductions to deep learning.
Hence, we'll keep it short:
our goal is to approximate an unknown function
$f^*(x) = y^*$ ,
@ -56,7 +58,7 @@ maximum likelihood estimation
Also interesting: from a math standpoint ''just'' non-linear optimization ...
-->
## Partial Differential Equations as Physical Models
## Partial differential equations as physical models
The following section will give a brief outlook for the model equations
we'll be using later on in the DL examples.

View File

@ -176,7 +176,7 @@
"\n",
"Just for testing, we've also printed the mean value of the velocities, and the max density after the update. As you can see in the resulting image, we have a first round region of smoke, with a slight upwards motion (which does not show here yet). \n",
"\n",
"## Datatypes and Dimensions\n",
"## Datatypes and dimensions\n",
"\n",
"The created grids are instances of the class `Grid`.\n",
"Like tensors, grids also have the `shape` attribute which lists all batch, spatial and channel dimensions.\n",
@ -248,7 +248,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Time Evolution\n",
"## Time evolution\n",
"\n",
"With this setup, we can easily advance the simulation forward in time a bit more by repeatedly calling the `step` function."
]

View File

@ -140,11 +140,11 @@ different approaches. In particular, it's important to know in which scenarios
each of the different techniques is particularly useful.
## More Specifically
## More specifically
To be a bit more specific, _physics_ is a huge field, and we can't cover everything...
_Physics_ is a huge field, and we can't cover everything here...
```{note} The focus of this book lies on...
```{note} The focus of this book lies on:
- _Field-based simulations_ (no Lagrangian methods)
- Combinations with _deep learning_ (plenty of other interesting ML techniques, but not here)
- Experiments as _outlook_ (replace synthetic data with real-world observations)

View File

@ -9,7 +9,7 @@
"The previous section has made many comments about the advantages and disadvantages of different optimization methods. Below we'll show with a practical example how much differences these properties actually make.\n",
"\n",
"\n",
"## Problem Formulation\n",
"## Problem formulation\n",
"\n",
"We'll consider a very simple setup to clearly illustrate what's happening: we have a two-dimensional input space $\\mathbf{x}$, a mock \"physical model\" likewise with two dimensions $\\mathbf{z}$, and a scalar loss $L$, i.e. \n",
"$\\mathbf{x} \\in \\mathbb{R}^2$, \n",
@ -207,7 +207,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Gradient Descent\n",
"## Gradient descent\n",
"\n",
"For gradient descent, the simple gradient based update from equation {eq}`GD-update`\n",
"in our setting gives the following update step in $\\mathbf{x}$:\n",
@ -782,7 +782,7 @@
"\n",
"---\n",
"\n",
"## Approximate Inversions\n",
"## Approximate inversions\n",
"\n",
"If an analytic inverse like the `fun_z_inv_analytic` above is not readily available, we can actually resort to optimization schemes like Newton's method or BFGS to approximate it numerically. This is a topic that is orthogonal to the comparison of different optimization methods, but it can be easily illustrated based on the PG example above.\n",
"\n",
@ -915,7 +915,7 @@
"\n",
"---\n",
"\n",
"## Next Steps\n",
"## Next steps\n",
"\n",
"Based on this code example you can try the following modifications:\n",
"\n",

View File

@ -6,7 +6,7 @@ The discussion in the previous two sections already hints at physical gradients
By default, PGs would be restricted to functions with square Jacobians. Hence we wouldn't be able to directly use them in optimizations or learning problems, which typically have scalar objective functions.
In this section, we will first show how PGs can be integrated into the optimization pipeline to optimize scalar objectives.
## Physical Gradients and Loss Functions
## Physical Gradients and loss functions
As before, we consider a scalar objective function $L(z)$ that depends on the result of an invertible simulator $z = \mathcal P(x)$. In {doc}`physgrad` we've outlined the inverse gradient (IG) update $\Delta x = \frac{\partial x}{\partial L} \cdot \Delta L$, where $\Delta L$ denotes a step to take in terms of the loss.
@ -40,7 +40,7 @@ Using equation {eq}`quasi-newton-update`, we get $\Delta z = \eta \cdot (z^\text
Once $\Delta z$ is determined, the gradient can be backpropagated to earlier time steps using the inverse simulator $\mathcal P^{-1}$. We've already used this combination of a Newton step for the loss and PGs for the PDE in {doc}`physgrad-comparison`.
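A toy numeric sketch of this combination: a hypothetical scalar simulator with an exact inverse, and a full Newton-like step $\eta = 1$ in $z$ (all names here are illustrative):

```python
import math

def P(x):
    """Hypothetical invertible scalar simulator."""
    return math.exp(x)

def P_inv(z):
    """Its exact inverse."""
    return math.log(z)

z_target = 5.0
x = 0.0

z = P(x)
z_new = z + 1.0 * (z_target - z)  # step towards the target in z-space (eta = 1)
x = P_inv(z_new)                  # pull the update back through the inverse simulator

print(P(x))  # 5.0
```

With $\eta = 1$ and an exact inverse, a single update already reaches the target; smaller $\eta$ and approximate inverses trade this off against robustness.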
## NN Training
## NN training
The previous step gives us an update for the input of the discretized PDE $\mathcal P^{-1}(x)$, i.e. a $\Delta x$. If $x$ was an output of an NN, we can then use established DL algorithms to backpropagate the desired change to the weights of the network.
We have a large collection of powerful methodologies for training neural networks at our disposal,
@ -82,7 +82,7 @@ $$
where $\mathcal P_{(x,z)}^{-1}(z + \Delta z)$ is treated as a constant.
## Iterations and Time Dependence
## Iterations and time dependence
The above procedure describes the optimization of neural networks that make a single prediction.
This is suitable for scenarios to reconstruct the state of a system at $t_0$ given the state at a $t_e > t_0$ or to estimate an optimal initial state to match certain conditions at $t_e$.
@ -93,7 +93,7 @@ Such scenarios arise e.g. in control tasks, where a network induces small forces
In these scenarios, the process above (Newton step for loss, PG step for physics, GD for the NN) is iteratively repeated, e.g., over the course of different time steps, leading to a series of additive terms in $L$.
This typically makes the learning task more difficult, as we repeatedly backpropagate through the iterations of the physical solver and the NN, but the PG learning algorithm above extends to these cases just like regular GD training.
## Time Reversal
## Time reversal
The inverse function of a simulator is typically the time-reversed physical process.
In some cases, simply inverting the time axis of the forward simulator, $t \rightarrow -t$, can yield an adequate global inverse simulator.
@ -103,7 +103,7 @@ Unless the simulator destroys information in practice, e.g., due to accumulated
---
## A Learning Toolbox
## A learning toolbox
Taking a step back, what we have here is a flexible "toolbox" for propagating update steps
through different parts of a system to be optimized. An important takeaway message is that

View File

@ -252,7 +252,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## The loss functions & training\n",
"## Loss function and training\n",
"\n",
"As objective for the learning process we can now combine the _direct_ constraints, i.e., the solution at $t=0.5$ and the Dirchlet $u=0$ boundary conditions with the loss from the PDE residuals. For both boundary constraints we'll use 100 points below, and then sample the solution in the inner region with an additional 1000 points.\n",
"\n",

View File

@ -14,7 +14,7 @@ name: physloss-overview
Physical losses typically combine a supervised loss with derivatives computed from the neural network.
```
## Using Physical Models
## Using physical models
We can improve this setting by trying to bring the model equations (or parts thereof)
into the training process. E.g., given a PDE for $\mathbf{u}(\mathbf{x},t)$ with a time evolution,

View File

@ -67,7 +67,7 @@ at extrapolation. So we can't expect an NN to magically work with new inputs.
Rather, we need to make sure that we can properly shape the input space,
e.g., by normalization and by focusing on invariants. In short, if you always train
your networks for inputs in the range $[0\dots1]$, don't expect it to work
with inputs of $[10\dots11]$. You might be able to subtract an offset of $10$ beforehand,
with inputs of $[27\dots39]$. You might be able to shift and rescale such inputs into the training range beforehand,
and undo that transformation after evaluating the network.
As a rule of thumb: always make sure you
actually train the NN on the kinds of input you want to use at inference time.
@ -96,7 +96,7 @@ avoid overfitting.
![Divider](resources/divider2.jpg)
## Supervised Training in a nutshell
## Supervised training in a nutshell
To summarize, supervised training has the following properties.

View File

@ -10,7 +10,7 @@ hence is worth studying. While it typically yields inferior results to approache
couple with physics, it nonetheless can be the only choice in certain application scenarios where no good
model equations exist.
## Problem Setting
## Problem setting
For supervised training, we're faced with an
unknown function $f^*(x)=y^*$, collect lots of pairs of data $[x_0,y^*_0], ...[x_n,y^*_n]$ (the training data set)