first version DP

This commit is contained in:
NT
2021-01-16 13:30:26 +08:00
parent cbf20ff8fe
commit b68bbcfaa5
5 changed files with 120 additions and 63 deletions

View File

@@ -4,13 +4,13 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Burgers Optimization with a Manual Differentiable Physics Gradient\n",
"# Burgers Optimization with a (Manual) Differentiable Physics Gradient\n",
"\n",
"manual!\n",
"To illustrate the process of computing gradients in a differentiable physics setting, we first demonstrate a process that isn't recommended for more complex real-world cases, but that more clearly shows how we can get gradient information from a physical simulation in a DL framework. We'll target tensorflow in the following.\n",
"\n",
"like PINN... now only DOFs are initial state! rest is determined by simulation\n",
"We target the same reconstruction task as for the PINN example. This has some immediate implications: the evolution of the system is now fully determined by our PDE formulation. Hence, the only real unknown is the initial state! We will still need to re-compute all the states between the initial and target state many times, but now we won't need an NN for this step; we can rely on our discretized model. Also, as we choose an initial discretization for the DP approach, the unknown initial state consists of the sampling points of the involved physical fields, and we can simply represent these unknowns as floating point variables. Hence, even for the initial state we do not need to set up an NN. Thus, our Burgers reconstruction problem reduces to a gradient-based optimization without any NN when solving it with DP. Nonetheless, it's a very good starting point to illustrate the process.\n",
"\n",
"build graph from sim (ignore warning)"
"First, we'll set up our discretized simulation. Here we can employ the phiflow solver, as shown in {doc}`overview-burgers-forw`. phiflow directly gives us a computational graph in tensorflow. So, as a first step, let's set up our grid with 128 points, and construct a tensorflow graph for 32 steps of the simulation. (As usual, you can ignore the warnings when loading TF and phiflow.)"
]
},
{
@@ -85,7 +85,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Note, this didnt run anything! let's set up the loss"
"Note, a lot has happened in these first steps, but we didn't run a single calculation for our simulation. In contrast to the first example, which used numpy as the phiflow backend, we have now used the tensorflow backend (the trick here was importing phi.tf.flow instead of phi.flow). This gave us a tensorflow graph, but didn't run any real simulation steps. Those will only happen once we run a tensorflow session and pass some data through this graph. So, in a way, _nothing_ has happened so far in terms of our simulation! Next, let's set up the loss. That's pretty simple: we want the solution at t=0.5, i.e. step number 16, to respect the data in terms of an L2 norm. Let's also directly initialize the TF variables, and see what loss we get from the initial zeroed initialization:"
]
},
{

View File

@@ -4,9 +4,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Burgers Optimization with a DL-based Differentiable Physics Gradient\n",
"# Burgers Optimization with a DL-based DP\n",
"\n",
"Use DL framework, i.e. an actual optimizer from TensorFlow"
"Use a DL framework, i.e. an actual optimizer from TensorFlow. The _proper_ way to do things..."
]
},
{

View File

@@ -1,7 +1,6 @@
Differentiable Physics
=======================
%... are much more powerful ...
As a next step towards a tighter and more generic combination of deep learning
methods and physical simulations we will target incorporating _differentiable physical
simulations_ into the learning process. In the following, we'll shorten
@@ -16,6 +15,14 @@ advantages such as improved learning feedback and generalization, as we'll outli
In contrast to physics-informed loss functions, it also enables handling more complex
solution manifolds instead of single inverse problems.
```{figure} resources/placeholder.png
---
height: 220px
name: dp-training
---
TODO, visual overview of DP training
```
## Differentiable Operators
With this direction we build on existing numerical solvers. I.e.,
@@ -99,6 +106,7 @@ $
\
\frac{ \partial \mathcal P_2 }{ \partial \mathbf{u} }|_{\mathbf{u}^n}
$,
which is just the vector-valued version of the "classic" chain rule
$f(g(x))' = f'(g(x)) g'(x)$, and directly extends to larger numbers of composed functions, i.e. $i>2$.
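The chain rule is easy to sanity-check numerically. A minimal sketch (with hypothetical scalar functions $f$ and $g$, purely for illustration), comparing the analytic chain-rule gradient against a central finite difference:

```python
import math

# Two simple differentiable functions and their analytic derivatives
def g(x): return math.sin(x)
def dg(x): return math.cos(x)
def f(y): return y * y
def df(y): return 2.0 * y

def chain_grad(x):
    # f(g(x))' = f'(g(x)) * g'(x)
    return df(g(x)) * dg(x)

def numeric_grad(x, eps=1e-6):
    # central finite difference of the composition f(g(x))
    return (f(g(x + eps)) - f(g(x - eps))) / (2.0 * eps)

x = 0.7
assert abs(chain_grad(x) - numeric_grad(x)) < 1e-6
```

For vector-valued functions the same check applies component-wise, with the products above replaced by Jacobian multiplications.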
@@ -168,76 +176,126 @@ Note that to simplify things, we assume that $\mathbf{u}$ is only a function in
i.e. constant over time. We'll bring back the time evolution of $\mathbf{u}$ later on.
%
[TODO, write out simple finite diff approx?]
[denote discrete d as $\mathbf{d}$ below?]
%
Let's denote this re-formulation as $\mathcal P$ that maps a state of $d(t)$ into a
Let's denote this re-formulation as $\mathcal P$. It maps a state of $d(t)$ into a
new state at an evolved time, i.e.:
$$
d(t+\Delta t) = \mathcal P ( d(t), \mathbf{u}, t+\Delta t)
d(t+\Delta t) = \mathcal P ( ~ d(t), \mathbf{u}, t+\Delta t)
$$
As a simple optimization and learning task, let's consider the problem of
finding an unknown motion $\mathbf{u}$ such that starting with a given initial state $d^{~0}$ at $t=t^0$,
the time evolved scalar density at time $t=t^e$ has a certain shape or configuration $d^{\text{target}}$.
As a simple example of an optimization and learning task, let's consider the problem of
finding a motion $\mathbf{u}$ such that starting with a given initial state $d^{~0}$ at $t^0$,
the time evolved scalar density at time $t^e$ has a certain shape or configuration $d^{\text{target}}$.
Informally, we'd like to find a motion that deforms $d^{~0}$ into a target state.
The simplest way to express this goal is via an $L^2$ loss between the two states, hence we want
to minimize $|d(t^e) - d^{\text{target}}|^2$.
The simplest way to express this goal is via an $L^2$ loss between the two states. So we want
to minimize the loss function $F=|d(t^e) - d^{\text{target}}|^2$.
Note that as described here this is a pure optimization task; there's no NN involved,
and our goal is $\mathbf{u}$. We do not want to apply this motion to other, unseen _test data_,
and our goal is to obtain $\mathbf{u}$. We do not want to apply this motion to other, unseen _test data_,
as would be customary in a real learning task.
The final state of our marker density $d(t^e)$ is fully determined by the evolution
of $\mathcal P$ via $\mathbf{u}$, which gives the following minimization as overall goal:
of $\mathcal P$ via $\mathbf{u}$, which gives the following minimization problem as overall goal:
$$
\text{arg min}_{~\mathbf{u}} | \mathcal P ( d^{~0}, \mathbf{u}, t^e) - d^{\text{target}}|^2
\text{arg min}_{~\mathbf{u}} | \mathcal P ( d^{~0}, \mathbf{u}, t^e - t^0 ) - d^{\text{target}}|^2
$$
We'd now like to find the minimizer for this objective by
gradient descent (as the most straight-forward starting point), where the
gradient is determined by the differentiable physics approach described above.
As the velocity field $\mathbf{u}$ contains our degrees of freedom,
what we're looking for is the Jacobian
$\partial \mathcal P / \partial \mathbf{u}$, which we luckily don't need as a full
matrix, but instead only mulitplied by the vector obtained from the derivative of the
$L^2$ loss $L= |d(t^e) - d^{\text{target}}|^2$, thus
$\partial L / \partial d$
_gradient descent_ (GD), where the
gradient is determined by the differentiable physics approach described earlier in this chapter.
Once things are working with GD, we can relatively easily switch to better optimizers or bring
an NN into the picture, hence it's always a good starting point.
TODO, introduce L earlier
P gives us d, next pd / du -> update u
As the discretized velocity field $\mathbf{u}$ contains all our degrees of freedom,
what we need is to update the velocity by an amount
$\Delta \mathbf{u} = \partial L / \partial \mathbf{u}$,
which can be decomposed into
$\Delta \mathbf{u} =
\frac{ \partial d }{ \partial \mathbf{u}}
\frac{ \partial L }{ \partial d}
$.
And as the evolution of $d$ is given by our discretized physical model $\mathcal P$,
what we're actually looking for is the Jacobian
$\partial \mathcal P / \partial \mathbf{u}$ to
compute
$\Delta \mathbf{u} =
\frac{ \partial \mathcal P }{ \partial \mathbf{u}}
\frac{ \partial L }{ \partial d}$.
Luckily, we don't need $\partial \mathcal P / \partial \mathbf{u}$ as a full
matrix; we only need it multiplied by the vector obtained from the derivative of our scalar
loss function $L$.
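As a tiny, purely illustrative sketch of this update (using a hypothetical scalar stand-in for $\mathcal P$, not the advection problem itself): we minimize $L = (\mathcal P(u) - d^{\text{target}})^2$ by repeatedly applying the chain-rule gradient, with the usual negative sign and step size of gradient descent:

```python
# Toy gradient descent: find u such that P(u) = u * d0 matches a target.
d0, target = 2.0, 3.0

def P(u):          # stand-in for the discretized simulator
    return u * d0

def grad_L(u):     # dL/du = dP/du * dL/dd = d0 * 2*(P(u) - target)
    return d0 * 2.0 * (P(u) - target)

u, lr = 0.0, 0.1
for _ in range(100):
    u -= lr * grad_L(u)   # gradient descent update

assert abs(P(u) - target) < 1e-9
```

For this linear toy $\mathcal P$ the iteration converges to the exact minimizer $u=1.5$; for a real simulator only the gradient evaluation changes, everything else in the loop stays the same.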
...to obtain an explicit update of the form
$d(t+\Delta t) = A d(t)$,
where the matrix $A$ represents the discretized advection step of size $\Delta t$ for $\mathbf{u}$.
%the $L^2$ loss $L= |d(t^e) - d^{\text{target}}|^2$, thus
E.g., for a simple first order upwinding scheme on a Cartesian grid we'll get a matrix that essentially encodes
linear interpolation coefficients for positions $\mathbf{x} + \Delta t \mathbf{u}$. For a grid of
size $d_x \times d_y$ we'd have a
So what are the actual Jacobians here:
the one for $L$ is simple enough, we simply get a column vector with entries of the form
$2(d(t^e)_i - d^{\text{target}})_i$ for one component $i$.
$\partial \mathcal P / \partial \mathbf{u}$ is more interesting:
here we'll get derivatives of the chosen advection operator w.r.t. each component of the
velocities.
simplest example, advection step of passive scalar
turn into matrix, mult by transpose
matrix free
%...to obtain an explicit update of the form $d(t+\Delta t) = A d(t)$, where the matrix $A$ represents the discretized advection step of size $\Delta t$ for $\mathbf{u}$. ... we'll get a matrix that essentially encodes linear interpolation coefficients for positions $\mathbf{x} + \Delta t \mathbf{u}$. For a grid of size $d_x \times d_y$ we'd have a
```{figure} resources/placeholder.png
---
height: 100px
name: advection-upwind
---
TODO, small sketch of 1D advection
```
E.g., for a simple [first order upwinding scheme](https://en.wikipedia.org/wiki/Upwind_scheme)
on a Cartesian grid in 1D, with marker density and velocity $d_i$ and $u_i$ for cell $i$
(superscripts for time $t$ are omitted for brevity),
we get
$$ \begin{aligned}
& d_i^{~t+\Delta t} = d_i - u_i^+ (d_{i+1} - d_{i}) + u_i^- (d_{i} - d_{i-1}) \text{ with } \\
& u_i^+ = \text{max}(u_i \Delta t / \Delta x,0) \\
& u_i^- = \text{min}(u_i \Delta t / \Delta x,0)
\end{aligned} $$
E.g., for a positive $u_i$ we have
$d_i^{~t+\Delta t} = (1 + \frac{u_i \Delta t }{ \Delta x}) d_i - \frac{u_i \Delta t }{ \Delta x} d_{i+1}$
and hence the derivative
$\partial \mathcal P / \partial u_i$ for cell $i$ is $\frac{\Delta t}{\Delta x} ( d_i - d_{i+1} )$.
For the full Jacobian of the advection step we'd additionally need to collect
the contributions from cells $i+1$ and $i-1$, which appear depending on the sign of their velocities.
In practice this step is similar to evaluating a transposed matrix multiplication.
If we rewrite the calculation above in matrix form, e.g. as
$d^{~t+\Delta t} = A \, d^{~t}$ with a matrix $A$ that depends on $\mathbf{u}$, then the vector-Jacobian product needed for backpropagation amounts to a multiplication by the transposed matrix $A^T$.
In many practical cases, a matrix free implementation of this multiplication might
be preferable to actually constructing $A$.
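The scheme above fits in a few lines of plain Python. The sketch below (periodic boundaries are an assumption made here for simplicity) implements the upwind step exactly as in the equations, and checks the derivative w.r.t. $u_i$ for the positive-velocity case against a finite difference:

```python
# 1D first-order upwind advection step, as in the equations above,
# with periodic boundaries (an assumption for this sketch).
def upwind_step(d, u, dt, dx):
    n = len(d)
    out = []
    for i in range(n):
        up = max(u[i] * dt / dx, 0.0)    # u_i^+
        um = min(u[i] * dt / dx, 0.0)    # u_i^-
        dp = d[(i + 1) % n] - d[i]       # d_{i+1} - d_i
        dm = d[i] - d[(i - 1) % n]       # d_i - d_{i-1}
        out.append(d[i] - up * dp + um * dm)
    return out

d = [0.0, 1.0, 0.5, 0.25]
u = [0.3, 0.3, 0.3, 0.3]
dt, dx, i = 0.1, 1.0, 1

# analytic derivative for positive u_i: (dt/dx) * (d_i - d_{i+1})
analytic = dt / dx * (d[i] - d[(i + 1) % len(d)])

# finite-difference check of d(out_i)/d(u_i)
eps = 1e-6
u2 = list(u); u2[i] += eps
fd = (upwind_step(d, u2, dt, dx)[i] - upwind_step(d, u, dt, dx)[i]) / eps
assert abs(fd - analytic) < 1e-6
```

The per-cell derivatives computed this way are exactly the entries that a matrix-free transposed multiplication would accumulate during backpropagation.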
## A (slightly) more complex example
[TODO]
more complex, matrix inversion, eg Poisson solve
dont backprop through all CG steps (available in phiflow though)
rather, re-use linear solver to compute multiplication by inverse matrix
note - time can be "virtual" , solving for steady state
only assumption: some iterative procedure, not just single explicit step - then things simplify.
[note 1: essentially yields implicit derivative, cf implicit function theorem & co]
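This idea can be sketched concretely (a hypothetical 2x2 system in plain Python, with Cramer's rule standing in for an iterative solver): the gradient of a loss w.r.t. the right-hand side $b$ of $Ax=b$ needs only one additional solve with the transposed matrix, $\partial L / \partial b = A^{-T} \, \partial L / \partial x$, instead of backpropagation through all solver iterations:

```python
# Solve A x = b and backprop through the solve by re-using the
# solver on the transposed system (adjoint of a linear solve).
def solve2(A, b):
    # Cramer's rule for 2x2, stands in for an iterative solver like CG
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    x0 = (b[0] * A[1][1] - A[0][1] * b[1]) / det
    x1 = (A[0][0] * b[1] - b[0] * A[1][0]) / det
    return [x0, x1]

A = [[4.0, 1.0], [2.0, 3.0]]
b = [1.0, 2.0]
x = solve2(A, b)

# loss L = sum(x_i^2), its gradient w.r.t. x:
gx = [2.0 * x[0], 2.0 * x[1]]

# adjoint: dL/db = A^{-T} gx  ->  one extra solve with A transposed
At = [[A[0][0], A[1][0]], [A[0][1], A[1][1]]]
gb = solve2(At, gx)

# finite-difference check on b[0]
eps = 1e-6
x2 = solve2(A, [b[0] + eps, b[1]])
L  = x[0]**2 + x[1]**2
L2 = x2[0]**2 + x2[1]**2
assert abs((L2 - L) / eps - gb[0]) < 1e-4
```

The same pattern extends to large sparse systems: whatever solver produced $x$ can be called once more on the transposed (or, for symmetric matrices, the same) system to obtain the gradient.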
## Summary of Differentiable Physics
[note 2: time can be "virtual" , solving for steady state
only assumption: some iterative procedure, not just single explicit step - then things simplify.]
This gives us a method to include physical equations into DL learning as a soft-constraint.
Typically, this setup is suitable for _inverse_ problems, where we have certain measurements or observations
that we wish to find a solution of a model PDE for. Because of the high expense of the reconstruction (to be
demonstrated in the following), the solution manifold typically shouldn't be overly complex. E.g., it is difficult
to capture a wide range of solutions, such as the previous supervised airfoil example, in this way.
## Summary of Differentiable Physics so far
```{figure} resources/placeholder.png
---
height: 220px
name: dp-training
---
TODO, visual overview of DP training
```
To summarize, using differentiable physical simulations
gives us a tool to include physical equations with a chosen discretization into DL learning.
In contrast to the residual constraints of the previous chapter,
this makes it possible to let NNs seamlessly interact with physical solvers.
We'd previously fully discard our physical model and solver
once the NN is trained: in the example from {doc}`physicalloss-code`
the NN gives us the solution directly, bypassing
any solver or model equation. With the DP approach we can train an NN alongside
a numerical solver, and thus we can make use of the physical model (as represented by
the solver) later on at inference time. This allows us to move beyond solving single
inverse problems, and can yield NNs that quite robustly generalize to new inputs.
Let's revisit the sample problem from {doc}`physicalloss-code` in the context of DPs.

View File

@@ -16,11 +16,10 @@ And while the setup is relatively simple, it is generally difficult to control.
has flexibility to refine the solution by itself, but at the same time, tricks are necessary
when it doesn't pick the right regions of the solution.
## Is it "Machine Learning"
## Is it "Machine Learning"?
TODO, discuss - more akin to classical optimization:
we test for space/time positions at training time, and are interested in the
solution there afterwards.
we test for space/time positions at training time, and are interested in the solution there afterwards.
hence, no real generalization, or test data with a different distribution.
more similar to an inverse problem that solves for a single state, e.g. via BFGS or Newton.

View File

@@ -6,6 +6,14 @@ yield approximate solutions with a fairly simple training process, but what's
quite sad to see here is that we only use physical models and numerics
as an "external" tool to produce a big pile of data 😢.
```{figure} resources/placeholder.png
---
height: 220px
name: pinn-training
---
TODO, visual overview of PINN training
```
## Using Physical Models
We can improve this setting by trying to bring the model equations (or parts thereof)
@@ -88,12 +96,4 @@ that we wish to find a solution of a model PDE for. Because of the high expense
demonstrated in the following), the solution manifold typically shouldn't be overly complex. E.g., it is difficult
to capture a wide range of solutions, such as the previous supervised airfoil example, in this way.
```{figure} resources/placeholder.png
---
height: 220px
name: pinn-training
---
TODO, visual overview of PINN training
```