first version DP

2021-01-16 13:30:26 +08:00
parent cbf20ff8fe
commit b68bbcfaa5
5 changed files with 120 additions and 63 deletions
--- a/diffphys.md
+++ b/diffphys.md
@@ -1,7 +1,6 @@
 Differentiable Physics
 =======================

-%... are much more powerful ...
 As a next step towards a tighter and more generic combination of deep learning
 methods and deep learning we will target incorporating _differentiable physical 
 simulations_ into the learning process. In the following, we'll shorten
@@ -16,6 +15,14 @@ advantages such as improved learning feedback and generalization, as we'll outli
 In contrast to physics-informed loss functions, it also enables handling more complex
 solution manifolds instead of single inverse problems.

+```{figure} resources/placeholder.png
+---
+height: 220px
+name: dp-training
+---
+TODO, visual overview of DP training
+```
+
 ## Differentiable Operators

 With this direction we build on existing numerical solvers. I.e., 
@@ -99,6 +106,7 @@ $
    \ 
    \frac{ \partial \mathcal P_2 }{ \partial \mathbf{u} }|_{\mathbf{u}^n}
 $,
+
 which is just the vector valued version of the "classic" chain rule
 $f(g(x))' = f'(g(x)) g'(x)$, and directly extends for larger numbers of composited functions, i.e. $i>2$.

@@ -168,76 +176,126 @@ Note that to simplify things, we assume that $\mathbf{u}$ is only a function in
 i.e. constant over time. We'll bring back the time evolution of $\mathbf{u}$ later on.
 %
 [TODO, write out simple finite diff approx?]
+[denote discrete d as $\mathbf{d}$ below?]
 %
-Let's denote this re-formulation as $\mathcal P$ that maps a state of $d(t)$ into a 
+Let's denote this re-formulation as $\mathcal P$. It maps a state of $d(t)$ into a 
 new state at an evoled time, i.e.:

 $$
-    d(t+\Delta t) = \mathcal P ( d(t), \mathbf{u}, t+\Delta t) 
+    d(t+\Delta t) = \mathcal P ( ~ d(t), \mathbf{u}, t+\Delta t) 
 $$

-As a simple optimization and learning task, let's consider the problem of
-finding an unknown motion $\mathbf{u}$ such that starting with a given initial state $d^{~0}$ at $t=t^0$,
-the time evolved scalar density at time $t=t^e$ has a certain shape or configuration $d^{\text{target}}$.
+As a simple example of an optimization and learning task, let's consider the problem of
+finding an motion $\mathbf{u}$ such that starting with a given initial state $d^{~0}$ at $t^0$,
+the time evolved scalar density at time $t^e$ has a certain shape or configuration $d^{\text{target}}$.
 Informally, we'd like to find a motion that deforms $d^{~0}$ into a target state.
-The simplest way to express this goal is via an $L^2$ loss between the two states, hence we want
-to minimize $|d(t^e) - d^{\text{target}}|^2$. 
+The simplest way to express this goal is via an $L^2$ loss between the two states. So we want
+to minimize the loss function $F=|d(t^e) - d^{\text{target}}|^2$. 
+
 Note that as described here this is a pure optimization task, there's no NN involved,
-and our goal is $\mathbf{u}$. We do not want to apply this motion to other, unseen _test data_,
+and our goal is to obtain $\mathbf{u}$. We do not want to apply this motion to other, unseen _test data_,
 as would be custom in a real learning task.

 The final state of our marker density $d(t^e)$ is fully determined by the evolution 
-of $\mathcal P$ via $\mathbf{u}$, which gives the following minimization as overall goal:
+of $\mathcal P$ via $\mathbf{u}$, which gives the following minimization problem as overall goal:

 $$
-    \text{arg min}_{~\mathbf{u}} | \mathcal P ( d^{~0}, \mathbf{u}, t^e) - d^{\text{target}}|^2
+    \text{arg min}_{~\mathbf{u}} | \mathcal P ( d^{~0}, \mathbf{u}, t^e - t^0 ) - d^{\text{target}}|^2
 $$

 We'd now like to find the minimizer for this objective by
-gradient descent (as the most straight-forward starting point), where the 
-gradient is determined by the differentiable physics approach described above.
-As the velocity field $\mathbf{u}$ contains our degrees of freedom,
-what we're looking for is the Jacobian 
-$\partial \mathcal P / \partial \mathbf{u}$, which we luckily don't need as a full
-matrix, but instead only mulitplied by the vector obtained from the derivative of the 
-$L^2$ loss $L= |d(t^e) - d^{\text{target}}|^2$, thus
-$\partial L / \partial d$
+_gradient descent_ (GD), where the 
+gradient is determined by the differentiable physics approach described earlier in this chapter.
+Once things are working with GD, we can relatively easily switch to better optimizers or bring
+an NN into the picture, hence it's always a good starting point.

-TODO, introduce L earlier
-P gives us d, next pd / du -> udpate u
+As the discretized velocity field $\mathbf{u}$ contains all our degrees of freedom,
+what we need to update the velocity by an amount 
+$\Delta \mathbf{u} = \partial L / \partial \mathbf{u}$, 
+which can be decomposed into 
+$\Delta \mathbf{u} = 
+\frac{ \partial d }{ \partial \mathbf{u}}
+\frac{ \partial L }{ \partial d}
+$.
+And as the evolution of $d$ is given by our discretized physical model $P$,
+what we're acutally looking for is the Jacobian 
+$\partial \mathcal P / \partial \mathbf{u}$ to
+compute 
+$\Delta \mathbf{u} = 
+ \frac{ \partial \mathcal P }{ \partial \mathbf{u}}
+ \frac{ \partial L }{ \partial d}$. 
+We luckily don't need $\partial \mathcal P / \partial \mathbf{u}$ as a full
+matrix, but instead only mulitplied by the vector obtained from the derivative of our scalar 
+loss function $L$.

-...to obtain an explicit update of the form
-$d(t+\Delta t) = A d(t)$,
-where the matrix $A$ represents the discretized advection step of size $\Delta t$ for $\mathbf{u}$.
+%the $L^2$ loss $L= |d(t^e) - d^{\text{target}}|^2$, thus

-E.g., for a simple first order upwinding scheme on a Cartesian grid we'll get a matrix that essentially encodes
-linear interpolation coefficients for positions $\mathbf{x} + \Delta t \mathbf{u}$. For a grid of 
-size $d_x \times d_y$ we'd have a 
+So what are the actual Jacobians here:
+the one for $L$ is simple enough, we simply get a column vector with entries of the form
+$2(d(t^e)_i - d^{\text{target}})_i$ for one component $i$.
+$\partial \mathcal P / \partial \mathbf{u}$ is more interesting:
+here we'll get derivatives of the chosen advection operator w.r.t. each component of the 
+velocities.

-simplest example, advection step of passive scalar
-turn into matrix, mult by transpose 
-matrix free
+%...to obtain an explicit update of the form $d(t+\Delta t) = A d(t)$, where the matrix $A$ represents the discretized advection step of size $\Delta t$ for $\mathbf{u}$. ... we'll get a matrix that essentially encodes linear interpolation coefficients for positions $\mathbf{x} + \Delta t \mathbf{u}$. For a grid of size $d_x \times d_y$ we'd have a 

+```{figure} resources/placeholder.png
+---
+height: 100px
+name: advection-upwing
+---
+TODO, small sketch of 1D advection
+```
+
+E.g., for a simple [first order upwinding scheme](https://en.wikipedia.org/wiki/Upwind_scheme) 
+on a Cartesian grid in 1D, with marker density and velocity $d_i$ and $u_i$ for cell $i$
+(superscripts for time $t$ are omitted for brevity), 
+we get 
+
+$$ \begin{aligned}
+    & d_i^{~t+\Delta t} = d_i - u_i^+ (d_{i+1} - d_{i}) +  u_i^- (d_{i} - d_{i-1}) \text{ with }  \\
+    & u_i^+ = \text{max}(u_i \Delta t / \Delta x,0) \\
+    & u_i^- = \text{min}(u_i \Delta t / \Delta x,0)
+\end{aligned} $$
+
+E.g., for a positive $u_i$ we have 
+$d_i^{~t+\Delta t} = (1 + \frac{u_i \Delta t }{ \Delta x}) d_i - \frac{u_i \Delta t }{ \Delta x} d_{i+1}$
+and hence 
+$\partial \mathcal P / \partial u_i$ from cell $i$ would be $1 + \frac{u_i \Delta t }{ \Delta x}$.
+For the full gradient we'd need to add 
+the potential contributions from cells $i+1$ and $i-1$, depending on the sign of their velocities.
+
+In practice this step is similar to evaluating a transposed matrix multiplication.
+If we rewrite the calculation above as 
+$d^{~t+\Delta t} = A \mathbf{u}$, then $\partial \mathcal P / \partial \mathbf{u} = A^T$.
+In many practical cases, a matrix free implementation of this multiplication might 
+be preferable to actually constructing $A$.
+
+## A (slightly) more complex example
+
+[TODO]
 more complex, matrix inversion, eg Poisson solve
 dont backprop through all CG steps (available in phiflow though)
 rather, re-use linear solver to compute multiplication by inverse matrix

-note - time can be "virtual" , solving for steady state
-only assumption: some iterative procedure, not just single eplicit step - then things simplify.
+[note 1: essentialy yields implicit derivative, cf implicit function theorem & co]

-## Summary of Differentiable Physics
+[note 2: time can be "virtual" , solving for steady state
+only assumption: some iterative procedure, not just single eplicit step - then things simplify.]

-This gives us a method to include physical equations into DL learning as a soft-constraint.
-Typically, this setup is suitable for _inverse_ problems, where we have certain measurements or observations
-that we wish to find a solution of a model PDE for. Because of the high expense of the reconstruction (to be 
-demonstrated in the following), the solution manifold typically shouldn't be overly complex. E.g., it is difficult 
-to capture a wide range of solutions, such as the previous supervised airfoil example, in this way.
+## Summary of Differentiable Physics so far

-```{figure} resources/placeholder.png
---
-height: 220px
-name: dp-training
---
-TODO, visual overview of DP training
-```
+To summarize, using differentiable physical simulations 
+gives us a tool to include phsyical equations with a chosen discretization into DL learning.
+In contrast to the residual constraints of the previous chapter,
+this makes it possible to left NNs seamlessly interact with physical solvers.

+We'd previously fully discard our physical model and solver
+once the NN is trained: in the example from {doc}`physicalloss-code` 
+the NN gives us the solution directly, bypassing
+any solver or model equation. With the DP approach we can train an NN alongside
+a numerical solver, and thus we can make use of the physical model (as represented by 
+the solver) later on at inference time. This allows us to move beyond solving single
+inverse problems, and can yield NNs that quite robustly generalize to new inputs.
+
+Let's revisit the sample problem from {doc}`physicalloss-code` in the context of DPs.