Starting diffphys chapter
This commit is contained in:
parent
0063c71c05
commit
cbf20ff8fe
@@ -6,7 +6,7 @@
"id": "o4JZ84moBKMr"
},
"source": [
"# Differentiable Fluid Simulations with Φ<sub>Flow</sub>\n",
"# Differentiable Fluid Simulations\n",
"\n",
"... now a more complex example with fluid simulations (Navier-Stokes) ...\n"
]

241 diffphys.md
@@ -1,6 +1,243 @@
Differentiable Simulations
Differentiable Physics
=======================

... are much more powerful ...
%... are much more powerful ...

As a next step towards a tighter and more generic combination of deep learning
methods and physical simulations, we will target incorporating _differentiable physical
simulations_ into the learning process. In the following, we'll shorten
that to "differentiable physics" (DP).

The central goal of these methods is to use existing numerical solvers and equip
them with functionality to compute gradients with respect to their inputs.
Once this is realized for all operators of a simulation, we can leverage
the autodiff functionality of DL frameworks with back-propagation to let gradient
information flow from a simulator into an NN and vice versa. This has numerous
advantages such as improved learning feedback and generalization, as we'll outline below.
In contrast to physics-informed loss functions, it also enables handling more complex
solution manifolds instead of single inverse problems.

## Differentiable Operators

With this direction we build on existing numerical solvers, i.e.,
the approach relies strongly on the algorithms developed in the larger field
of computational methods for a vast range of physical effects in our world.
To start with, we need a continuous formulation as a model for the physical effect that we'd like
to simulate -- if this is missing we're in trouble. But luckily, we can resort to
a huge library of established model equations, and ideally also to an established
method for discretization of the equation.

Let's assume we have a continuous formulation $\mathcal P^*(\mathbf{x}, \nu)$ of the physical quantity of
interest $\mathbf{u}(\mathbf{x}, t): \mathbb R^d \times \mathbb R^+ \rightarrow \mathbb R^d$,
with a model parameter $\nu$ (e.g., a diffusion or viscosity constant).
The components of $\mathbf{u}$ will be denoted by a numbered subscript, i.e.,
$\mathbf{u} = (u_1,u_2,\dots,u_d)^T$.
%and a corresponding discrete version that describes the evolution of this quantity over time: $\mathbf{u}_t = \mathcal P(\mathbf{x}, \mathbf{u}, t)$.
Typically, we are interested in the temporal evolution of such a system,
and discretization yields a formulation $\mathcal P(\mathbf{x}, \nu)$
that we can re-arrange to compute a future state after a time step $\Delta t$ via a sequence of
operations $\mathcal P_1, \mathcal P_2 \dots \mathcal P_m$ such that
$\mathbf{u}(t+\Delta t) = \mathcal P_1 \circ \mathcal P_2 \circ \dots \mathcal P_m ( \mathbf{u}(t),\nu )$,
where $\circ$ denotes function composition, i.e. $f(g(x)) = f \circ g(x)$.

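For instance (purely as an illustrative sketch, with hypothetical `advect` and `diffuse` operators rather than the API of any particular solver), such a decomposition of a single time step could look as follows:

```python
import numpy as np

def advect(u, dt):
    # hypothetical operator P_1: shift the state by dt cells on a periodic 1D grid
    return np.roll(u, int(round(dt)))

def diffuse(u, dt, nu=0.1):
    # hypothetical operator P_2: explicit diffusion step with a 3-point Laplacian
    lap = np.roll(u, 1) - 2.0 * u + np.roll(u, -1)
    return u + dt * nu * lap

def step(u, dt):
    # u(t + dt) = P_1( P_2( u(t) ) ), i.e. the composition of the individual operators
    return advect(diffuse(u, dt), dt)

u = np.zeros(64)
u[30:34] = 1.0            # some initial state u(t)
u_next = step(u, dt=1.0)  # u(t + dt)
```
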
```{note}
In order to integrate this solver into a DL process, we need to ensure that every operator
$\mathcal P_i$ provides a gradient w.r.t. its inputs, i.e. in the example above
$\partial \mathcal P_i / \partial \mathbf{u}$.
```

Note that we typically don't need derivatives
for all parameters of $\mathcal P$, e.g., we omit $\nu$ in the following, assuming that this is a
given model parameter with which the NN should not interact. Naturally, it can still vary,
but $\nu$ will not be the output of an NN representation. If this is the case, we can omit
providing $\partial \mathcal P_i / \partial \nu$ in our solver.

## Jacobians

As $\mathbf{u}$ is typically a vector-valued function, $\partial \mathcal P_i / \partial \mathbf{u}$ denotes
a Jacobian matrix $J$ rather than a single value:
% test
$$ \begin{aligned}
\frac{ \partial \mathcal P_i }{ \partial \mathbf{u} } =
\begin{bmatrix}
\partial \mathcal P_{i,1} / \partial u_{1} & \ \cdots \ & \partial \mathcal P_{i,1} / \partial u_{d} \\
\vdots & \ & \ \\
\partial \mathcal P_{i,d} / \partial u_{1} & \ \cdots \ & \partial \mathcal P_{i,d} / \partial u_{d}
\end{bmatrix}
\end{aligned} $$
where, as above, $d$ denotes the number of components in $\mathbf{u}$. As $\mathcal P$ maps one value of
$\mathbf{u}$ to another, the Jacobian is square here. Of course this isn't necessarily the case
for general model equations, but non-square Jacobian matrices would not cause any problems for differentiable
simulations.

In practice, we can rely on the _reverse mode_ differentiation that all modern DL
frameworks provide, and focus on computing a matrix-vector product of the Jacobian transpose
with a vector $\mathbf{a}$, i.e. the expression
$( \frac{\partial \mathcal P_i }{ \partial \mathbf{u} } )^T \mathbf{a}$.
If we had to construct and store all full Jacobian matrices that we encounter during training,
this would cause huge memory overheads and unnecessarily slow down training.
Instead, for backpropagation, we can provide faster operations that compute products
with the Jacobian transpose, because we always have a scalar loss function at the end of the chain.

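In PyTorch, for example (used here merely as one stand-in for any framework with reverse-mode autodiff), such a Jacobian-transpose product can be evaluated without ever building the full matrix; the `grad_outputs` argument takes the role of the vector $\mathbf{a}$:

```python
import torch

def P(u):
    # stand-in operator mapping a state u to a new state (some nonlinear update)
    return u + 0.1 * torch.sin(u)

u = torch.randn(1000, requires_grad=True)
a = torch.randn(1000)   # vector to be multiplied with the transposed Jacobian

y = P(u)
# computes (dP/du)^T a as a single vector, never forming the 1000 x 1000 Jacobian
Jt_a = torch.autograd.grad(y, u, grad_outputs=a)[0]
```
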
[TODO check transpose of Jacobians in equations]

Given the formulation above, we need to resolve the derivatives
of the chain of function compositions of the $\mathcal P_i$ at some current state $\mathbf{u}^n$ via the chain rule.
E.g., for two of them

$
\frac{ \partial (\mathcal P_1 \circ \mathcal P_2) }{ \partial \mathbf{u} }|_{\mathbf{u}^n}
=
\frac{ \partial \mathcal P_1 }{ \partial \mathbf{u} }|_{\mathcal P_2(\mathbf{u}^n)}
\
\frac{ \partial \mathcal P_2 }{ \partial \mathbf{u} }|_{\mathbf{u}^n}
$,
which is just the vector-valued version of the "classic" chain rule
$f(g(x))' = f'(g(x)) g'(x)$, and directly extends to larger numbers of composed functions, i.e. $i>2$.

Here, the derivatives for $\mathcal P_1$ and $\mathcal P_2$ are still Jacobian matrices, but knowing that
at the "end" of the chain we have our scalar loss (cf. {doc}`overview`), the Jacobian of this loss
is a matrix with only a single row, i.e., transposed, a vector. During reverse mode, we start with this vector,
and compute the multiplications with the transposed Jacobians,
$\frac{ \partial \mathcal P_1 }{ \partial \mathbf{u} }$ above and then $\frac{ \partial \mathcal P_2 }{ \partial \mathbf{u} }$,
one by one.

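The following NumPy sketch (with two arbitrary linear operators standing in for $\mathcal P_1$ and $\mathcal P_2$) spells this out: starting from the derivative of the scalar loss, we multiply by one transposed Jacobian after the other, so only matrix-vector products are ever needed:

```python
import numpy as np

d = 4
rng = np.random.default_rng(0)
J1 = rng.standard_normal((d, d))  # Jacobian of P_1, evaluated at P_2(u^n)
J2 = rng.standard_normal((d, d))  # Jacobian of P_2, evaluated at u^n
dL_dy = rng.standard_normal(d)    # derivative of the scalar loss w.r.t. the final state

# reverse mode: work backwards from the loss, one Jacobian-transpose product at a time
grad_u = J2.T @ (J1.T @ dL_dy)

# sanity check against the (never needed) full Jacobian of the composition
assert np.allclose(grad_u, (J1 @ J2).T @ dL_dy)
```
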
For the details of forward and reverse mode differentiation, please check out external materials such
as this [nice survey by Baydin et al.](https://arxiv.org/pdf/1502.05767.pdf).

## Learning via DP Operators

Thus, long story short, once the operators of our simulator support computations of the Jacobian-vector
products, we can integrate them into DL pipelines just like you would include a regular fully-connected layer
or a ReLU activation.

At this point, the following (very valid) question often comes up: "_Most physics solvers can be broken down into a
sequence of vector and matrix operations. All state-of-the-art DL frameworks support these, so why don't we just
use these operators to realize our physics solver?_"

It's true that this would theoretically be possible. The problem here is that each of the vector and matrix
operations in TensorFlow and PyTorch is computed individually, and internally needs to store the current
state of the forward evaluation for backpropagation (the "$g(x)$" above). For a typical
simulation, however, we're not overly interested in every single intermediate result our solver produces.
Typically, we're more concerned with significant updates such as the step from $\mathbf{u}(t)$ to $\mathbf{u}(t+\Delta t)$.

%provide discretized simulator of physical phenomenon as differentiable operator.
Thus, in practice it is a very good idea to break down the solving process into a sequence
of meaningful but _monolithic_ operators. This not only saves a lot of work by preventing the calculation
of unnecessary intermediate results, it also allows us to choose the best possible numerical methods
to compute the updates (and derivatives) for these operators.
%in practice break down into larger, monolithic components
E.g., as this process is very similar to adjoint method optimizations, we can re-use many of the techniques
that were developed in this field, or leverage established numerical methods. For instance,
we could leverage the $O(n)$ complexity of multigrid solvers for matrix inversion.

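In PyTorch, wrapping such a monolithic operator boils down to a custom function whose backward pass implements the Jacobian-transpose product directly. The sketch below is hypothetical: `solver_step` and `solver_step_adjoint` are placeholders for an actual numerical method and its hand-derived adjoint, not part of any existing library:

```python
import torch

def solver_step(u, dt):
    # placeholder for a monolithic numerical operator, possibly living outside the DL framework
    return u + dt * u

def solver_step_adjoint(a, dt):
    # hand-written (dP/du)^T a for the operator above; for a real solver this would be
    # the adjoint of the chosen discretization (cf. the adjoint method)
    return a + dt * a

class SolverStep(torch.autograd.Function):
    @staticmethod
    def forward(ctx, u, dt):
        ctx.dt = dt
        return solver_step(u, dt)

    @staticmethod
    def backward(ctx, grad_output):
        # no gradient w.r.t. dt is needed here, so we return None for it
        return solver_step_adjoint(grad_output, ctx.dt), None

u = torch.ones(8, requires_grad=True)
loss = SolverStep.apply(u, 0.1).sum()
loss.backward()   # backpropagation uses our analytic Jacobian-transpose product
print(u.grad)
```
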
The flipside of this approach is that it requires some understanding of the problem at hand,
and of the numerical methods. Also, a given solver might not provide gradient calculations out of the box.
Thus, if we want to employ DL for model equations that we don't have a proper grasp of, it might not be a good
idea to directly go for learning via a DP approach. However, if we don't really understand our model, we probably
should go back to studying it a bit more anyway...

Also, in practice we can be _greedy_ with the derivative operators, and only
provide those which are relevant for the learning task. E.g., if our network
never produces the parameter $\nu$ in the example above, and it doesn't appear in our
loss formulation, we will never encounter a $\partial/\partial \nu$ derivative
in our backpropagation step.

---

## A practical example

As a simple example, let's consider the advection of a passive scalar density $d(\mathbf{x},t)$ in
a velocity field $\mathbf{u}$ as physical model $\mathcal P^*$:

$$
\frac{\partial d}{\partial{t}} + \mathbf{u} \cdot \nabla d = 0
$$

Instead of using this formulation as a residual equation right away (as in {doc}`physicalloss`),
we can discretize it with our favorite mesh and discretization scheme,
to obtain a formulation that updates the state of our system over time. This is a standard
procedure for a _forward_ solve.
Note that to simplify things, we assume that $\mathbf{u}$ is only a function of space,
i.e. constant over time. We'll bring back the time evolution of $\mathbf{u}$ later on.
%
[TODO, write out simple finite diff approx?]
%
Let's denote this re-formulation as $\mathcal P$, which maps a state of $d(t)$ into a
new state at an evolved time, i.e.:

$$
d(t+\Delta t) = \mathcal P ( d(t), \mathbf{u}, t+\Delta t)
$$

As a simple optimization and learning task, let's consider the problem of
finding an unknown motion $\mathbf{u}$ such that, starting with a given initial state $d^{~0}$ at $t=t^0$,
the time evolved scalar density at time $t=t^e$ has a certain shape or configuration $d^{\text{target}}$.
Informally, we'd like to find a motion that deforms $d^{~0}$ into a target state.
The simplest way to express this goal is via an $L^2$ loss between the two states, hence we want
to minimize $|d(t^e) - d^{\text{target}}|^2$.
Note that, as described here, this is a pure optimization task: there's no NN involved,
and our goal is to compute $\mathbf{u}$. We do not want to apply this motion to other, unseen _test data_,
as would be customary in a real learning task.

The final state of our marker density $d(t^e)$ is fully determined by the evolution
of $\mathcal P$ via $\mathbf{u}$, which gives the following minimization as overall goal:

$$
\text{arg min}_{~\mathbf{u}} | \mathcal P ( d^{~0}, \mathbf{u}, t^e) - d^{\text{target}}|^2
$$

We'd now like to find the minimizer for this objective by
gradient descent (as the most straight-forward starting point), where the
gradient is determined by the differentiable physics approach described above.
As the velocity field $\mathbf{u}$ contains our degrees of freedom,
what we're looking for is the Jacobian
$\partial \mathcal P / \partial \mathbf{u}$, which we luckily don't need as a full
matrix, but instead only multiplied by the vector obtained from the derivative of the
$L^2$ loss $L= |d(t^e) - d^{\text{target}}|^2$, thus
$\partial L / \partial d$.

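Below is a compact, hedged sketch of this optimization in PyTorch: a crude 1D periodic semi-Lagrangian advection step stands in for $\mathcal P$ (it is differentiable with respect to $\mathbf{u}$ via the interpolation weights), autodiff supplies the product of the Jacobian with $\partial L / \partial d$, and plain gradient descent updates $\mathbf{u}$. All names and parameters here are illustrative assumptions:

```python
import torch

N, dt, steps = 64, 1.0, 8
x = torch.arange(N, dtype=torch.float32)

def advect(d, u, dt):
    # semi-Lagrangian step: read d at the back-traced positions x - u*dt,
    # with linear interpolation on a periodic 1D grid (differentiable w.r.t. u)
    pos = (x - u * dt) % N
    i0 = torch.floor(pos).long() % N
    i1 = (i0 + 1) % N
    w = pos - torch.floor(pos)
    return (1 - w) * d[i0] + w * d[i1]

d0       = torch.exp(-0.5 * ((x - 20.0) / 3.0) ** 2)  # initial density d^0
d_target = torch.exp(-0.5 * ((x - 30.0) / 3.0) ** 2)  # target density, a shifted blob

u = torch.zeros(N, requires_grad=True)     # velocity field: our degrees of freedom
optimizer = torch.optim.SGD([u], lr=0.5)

for it in range(100):
    d = d0
    for _ in range(steps):                 # d(t^e) = P(...P(d^0, u)..., u)
        d = advect(d, u, dt)
    loss = torch.sum((d - d_target) ** 2)  # L = |d(t^e) - d^target|^2
    optimizer.zero_grad()
    loss.backward()                        # gradient via backprop through all steps
    optimizer.step()
```
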
TODO, introduce L earlier
P gives us d, next pd / du -> update u

...to obtain an explicit update of the form
$d(t+\Delta t) = A d(t)$,
where the matrix $A$ represents the discretized advection step of size $\Delta t$ for $\mathbf{u}$.

E.g., for a simple first order upwinding scheme on a Cartesian grid we'll get a matrix that essentially encodes
linear interpolation coefficients for positions $\mathbf{x} + \Delta t \mathbf{u}$. For a grid of
size $d_x \times d_y$ we'd have a sparse matrix $A$ with $d_x \cdot d_y$ rows and columns.

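To make the matrix view concrete, here is a small 1D sketch (first-order upwind, periodic boundary, constant positive velocity, all chosen just for illustration) that assembles such an $A$ explicitly; backpropagation then only needs the product of $A^T$ with an incoming gradient vector, which could equally well be evaluated matrix-free:

```python
import numpy as np

N, dt, dx, vel = 8, 0.5, 1.0, 1.0      # small periodic 1D grid, constant velocity > 0
c = vel * dt / dx                      # CFL number, assumed <= 1

# first-order upwind update d_i <- (1-c) d_i + c d_{i-1}, written as the matrix A
A = (1.0 - c) * np.eye(N) + c * np.roll(np.eye(N), 1, axis=0)

d = np.zeros(N); d[2] = 1.0
d_next = A @ d                         # one forward advection step, d(t+dt) = A d(t)

a = np.random.rand(N)                  # incoming gradient vector from the loss
grad_d = A.T @ a                       # Jacobian-transpose product needed for backprop
```
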
simplest example, advection step of passive scalar
turn into matrix, mult by transpose
matrix free

more complex, matrix inversion, e.g. Poisson solve
don't backprop through all CG steps (available in phiflow though)
rather, re-use linear solver to compute multiplication by inverse matrix

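For such implicit steps the key observation is that if $\mathbf{x} = A^{-1}\mathbf{b}$, the gradient with respect to $\mathbf{b}$ is simply $A^{-T}$ applied to the incoming gradient, i.e. one more linear solve with the (transposed) system rather than a backprop through every CG iteration. A hedged SciPy sketch of this idea:

```python
import numpy as np
from scipy.sparse.linalg import cg

# small symmetric positive-definite system standing in for e.g. a pressure Poisson solve
n = 50
rng = np.random.default_rng(1)
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)
b = rng.standard_normal(n)

x, _ = cg(A, b)                   # forward pass: x = A^{-1} b via conjugate gradients

grad_x = rng.standard_normal(n)   # incoming gradient dL/dx from later operations
grad_b, _ = cg(A.T, grad_x)       # backward pass: dL/db = A^{-T} dL/dx, reusing the solver
```
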
note - time can be "virtual", solving for steady state
only assumption: some iterative procedure, not just a single explicit step - then things simplify.

## Summary of Differentiable Physics

This gives us a method to include physical equations into DL learning as a soft constraint.
Typically, this setup is suitable for _inverse_ problems, where we have certain measurements or observations
for which we wish to find a solution of a model PDE. Because of the high expense of the reconstruction (to be
demonstrated in the following), the solution manifold typically shouldn't be overly complex. E.g., it is difficult
to capture a wide range of solutions, such as the previous supervised airfoil example, in this way.

```{figure} resources/placeholder.png
---
height: 220px
name: dp-training
---
TODO, visual overview of DP training
```

11 overview.md
@@ -144,9 +144,20 @@ Very brief intro, basic equations... approximate $f^*(x)=y$ with NN $f(x;\theta)

learn via GD, $\partial f / \partial \theta$

general goal, minimize E for e(x,y) ... cf. eq. 8.1 from DLbook

$$
test \~ \approx eq \ \RR
$$

introduce scalar loss, always(!) scalar...

Read chapters 6 to 9 of the [Deep Learning book](https://www.deeplearningbook.org),
especially about [MLPs](https://www.deeplearningbook.org/contents/mlp.html) and
"Conv-Nets", i.e. [CNNs](https://www.deeplearningbook.org/contents/convnets.html).

**Note:** The classic distinction between _classification_ and _regression_ problems is not so important here;
we only deal with _regression_ problems in the following.

maximum likelihood estimation

@@ -16,6 +16,17 @@ And while the setup is relatively simple, it is generally difficult to control.
has flexibility to refine the solution by itself, but at the same time, tricks are necessary
when it doesn't pick the right regions of the solution.

## Is it "Machine Learning"
|
||||
|
||||
TODO, discuss - more akin to classical optimization:
|
||||
we test for space/time positions at training time, and are interested in the
|
||||
solution there afterwards.
|
||||
|
||||
hence, no real generalization, or test data with different distribution.
|
||||
more similar to inverse problem that solves single state e.g. via BFGS or Newton.
|
||||
|
||||
## Summary

In general, a fundamental drawback of this approach is that it does not combine with traditional
numerical techniques well. E.g., the learned representation is not suitable to be refined with
a classical iterative solver such as the conjugate gradient method. This means many
@@ -9,24 +9,25 @@ as an "external" tool to produce a big pile of data 😢.
## Using Physical Models

We can improve this setting by trying to bring the model equations (or parts thereof)
into the training process. E.g., given a PDE for $\mathbf{u}(x,t)$ with a time evolution,
into the training process. E.g., given a PDE for $\mathbf{u}(\mathbf{x},t)$ with a time evolution,
we can typically express it in terms of a function $\mathcal F$ of the derivatives
of $\mathbf{u}$ via
$
\mathbf{u}_t = \mathcal F ( \mathbf{u}_{x}, \mathbf{u}_{xx}, ... \mathbf{u}_{x..x})
\mathbf{u}_t = \mathcal F ( \mathbf{u}_{x}, \mathbf{u}_{xx}, ... \mathbf{u}_{xx...x} )
$,
where the $_{x}$ subscripts denote spatial derivatives of higher order.
where the $_{\mathbf{x}}$ subscripts denote spatial derivatives with respect to one of the spatial dimensions,
of higher and higher order (this can of course also include derivatives with respect to different axes).

In this context we can employ DL by approxmating the unknown $\mathbf{u}$ itself
In this context we can employ DL by approximating the unknown $\mathbf{u}$ itself
with a NN, denoted by $\tilde{\mathbf{u}}$. If the approximation is accurate, the PDE
naturally should be satisfied, i.e., the residual $R$ should be equal to zero:
$
R = \mathbf{u}_t - \mathcal F ( \mathbf{u}_{x}, \mathbf{u}_{xx}, ... \mathbf{u}_{x..x}) = 0
R = \mathbf{u}_t - \mathcal F ( \mathbf{u}_{x}, \mathbf{u}_{xx}, ... \mathbf{u}_{xx...x} ) = 0
$

This nicely integrates with the objective for training a neural network: similar to before,
we can collect sample solutions
$[x_0,y_0], ...[x_n,y_n]$ for $\mathbf{u}$ with $\mathbf{u}(x)=y$.
$[x_0,y_0], ...[x_n,y_n]$ for $\mathbf{u}$ with $\mathbf{u}(\mathbf{x})=y$.
This is typically important, as most practical PDEs we encounter do not have unique solutions
unless initial and boundary conditions are specified. Hence, if we only consider $R$ we might
get solutions with a random offset or other undesirable components. Thus the supervised sample points
@@ -35,7 +36,7 @@ Now our training objective becomes

$\text{arg min}_{\theta} \ \alpha_0 \sum_i (f(x_i ; \theta)-y_i)^2 + \alpha_1 R(x_i) $,

where $\alpha_{0,1}$ denote hyper parameters that scale the contribution of the supervised term and
where $\alpha_{0,1}$ denote hyperparameters that scale the contribution of the supervised term and
the residual term, respectively. We could of course add additional residual terms with suitable scaling factors here.

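As a minimal sketch of this combined objective (in PyTorch, with the 1D diffusion equation $\mathbf{u}_t = \nu \mathbf{u}_{xx}$ chosen purely as an illustrative $\mathcal F$, and with made-up sample and collocation points), the residual can be evaluated with autodiff derivatives of the network output with respect to its inputs:

```python
import torch

net = torch.nn.Sequential(torch.nn.Linear(2, 64), torch.nn.Tanh(), torch.nn.Linear(64, 1))
nu, alpha0, alpha1 = 0.1, 1.0, 1.0

def residual(x, t):
    # R = u_t - nu * u_xx, with derivatives of the network output taken via autodiff
    x.requires_grad_(True); t.requires_grad_(True)
    u = net(torch.stack([x, t], dim=-1))
    u_t  = torch.autograd.grad(u.sum(), t, create_graph=True)[0]
    u_x  = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    u_xx = torch.autograd.grad(u_x.sum(), x, create_graph=True)[0]
    return u_t - nu * u_xx

# made-up supervised samples (x_i, t_i) -> y_i and collocation points for the residual
x_s, t_s, y_s = torch.rand(128), torch.rand(128), torch.rand(128, 1)
x_r, t_r = torch.rand(512), torch.rand(512)

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(1000):
    supervised = ((net(torch.stack([x_s, t_s], dim=-1)) - y_s) ** 2).mean()
    loss = alpha0 * supervised + alpha1 * (residual(x_r, t_r) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```
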
Note that, similar to the data samples used for supervised training, we have no guarantees that the
@@ -56,8 +57,8 @@ uses fully connected NNs to represent $\mathbf{u}$. This has some interesting pr
Due to the popularity of this version, we'll also focus on it in the following code examples and comparisons.

The central idea here is that the aforementioned general function $f$ that we're after in our learning problems
can be seen as a representation of a physical field we're after. Thus, the $\mathbf{u}(x)$ will
be turned into $\mathbf{u}(x, \theta)$ where we choose $\theta$ such that the solution to $\mathbf{u}$ is
can be seen as a representation of a physical field we're after. Thus, the $\mathbf{u}(\mathbf{x})$ will
be turned into $\mathbf{u}(\mathbf{x}, \theta)$ where we choose $\theta$ such that the solution to $\mathbf{u}$ is
represented as precisely as possible.

One nice side effect of this viewpoint is that NN representations inherently support the calculation of derivatives.
@@ -22,7 +22,7 @@ by adjusting weights $\theta$ of our representation with $f$ such that
$\text{arg min}_{\theta} \sum_i (f(x_i ; \theta)-y_i)^2$.

This will give us $\theta$ such that $f(x;\theta) \approx y$ as accurately as possible given
our choice of $f$ and the hyper parameters for training. Note that above we've assumed
our choice of $f$ and the hyperparameters for training. Note that above we've assumed
the simplest case of an $L^2$ loss. A more general version would use an error metric $e(x,y)$
to be minimized via $\text{arg min}_{\theta} \sum_i e( f(x_i ; \theta) , y_i)$. The choice
of a suitable metric is a topic we will get back to later on.

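A tiny sketch of this (with a made-up dataset, and an $L^1$ metric standing in for a general $e(x,y)$; purely illustrative, not the setup of the later examples):

```python
import torch

f = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.ReLU(), torch.nn.Linear(32, 1))
x = torch.linspace(-1, 1, 200).unsqueeze(-1)   # made-up sample positions x_i
y = torch.sin(3 * x)                           # made-up targets y_i

e = torch.nn.L1Loss()                          # one possible choice for the metric e
opt = torch.optim.Adam(f.parameters(), lr=1e-3)
for step in range(2000):
    loss = e(f(x), y)                          # sum_i e( f(x_i; theta), y_i )
    opt.zero_grad(); loss.backward(); opt.step()
```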