updated physical loss chapter
This commit is contained in:
parent
c8c0158ec2
commit
42c0aafa21
@@ -92,7 +92,6 @@
"name": "stdout",
"output_type": "stream",
"text": [
"WARNING:tensorflow:From /Users/thuerey/miniconda3/envs/tf/lib/python3.8/site-packages/tensorflow/python/ops/math_grad.py:297: setdiff1d (from tensorflow.python.ops.array_ops) is deprecated and will be removed after 2018-11-30.\n",
"Instructions for updating:\n",
"This op will be removed after the deprecation date. Please switch to tf.sets.difference().\n",
"Loss: 0.382915\n"
@@ -592,4 +591,4 @@
},
"nbformat": 4,
"nbformat_minor": 4
}
}
@@ -153,7 +153,7 @@ techniques:
- _Loss-terms_: the physical dynamics (or parts thereof) are encoded in the
loss function, typically in the form of differentiable operations. The
learning process can repeatedly evaluate the loss, and usually receives
gradients from a PDE-based formulation. These soft-constraints sometimes also go
gradients from a PDE-based formulation. These soft constraints sometimes also go
under the name "physics-informed" training.

- _Interleaved_: the full physical simulation is interleaved and combined with
File diff suppressed because one or more lines are too long
@@ -1,4 +1,4 @@
Discussion of Physical Soft-Constraints
Discussion of Physical Losses
=======================

The good news so far is - we have a DL method that can include
@@ -9,7 +9,7 @@ starting point.
On the positive side, we can leverage DL frameworks with backpropagation to compute
the derivatives of the model. At the same time, this puts us at the mercy of the learned
representation regarding the reliability of these derivatives. Also, each derivative
requires backpropagation through the full network, which can be very expensive. Especially so
requires backpropagation through the full network. This can be very expensive, especially
for higher-order derivatives.

And while the setup is relatively simple, it is generally difficult to control. The NN
@@ -23,23 +23,23 @@ Of course, such denomination questions are superficial - if an algorithm is usef
what name it has. However, here the question helps to highlight some important properties
that are typically associated with algorithms from fields like machine learning or optimization.

One main reason _not_ to call these physical constraints machine learning (ML), is that the
One main reason _not_ to call the optimization of the previous notebook machine learning (ML), is that the
positions where we test and constrain the solution are the final positions we are interested in.
As such, there is no real distinction between training, validation and (out of distribution) test sets.
As such, there is no real distinction between training, validation and test sets.
Computing the solution for a known and given set of samples is much more akin to classical optimization,
where inverse problems like the previous Burgers example stem from.

For machine learning, we typically work under the assumption that the final performance of our
model will be evaluated on a different, potentially unknown set of inputs. The _test data_
should usually capture such out of distribution (OOD) behavior, so that we can make estimates
should usually capture such _out of distribution_ (OOD) behavior, so that we can make estimates
about how well our model will generalize to "real-world" cases that we will encounter when
we deploy it into an application.
we deploy it in an application.

In contrast, for the PINN training as described here, we reconstruct a single solution in a known
and given space-time region. As such, any samples from this domain follow the same distribution
and hence don't really represent test or OOD sampes. As the NN directly encodes the solution,
and hence don't really represent test or OOD samples. As the NN directly encodes the solution,
there is also little hope that it will yield different solutions, or perform well outside
of the training distribution. If we're interested in a different solution, we most likely
of the training range. If we're interested in a different solution, we
have to start training the NN from scratch.


@@ -48,7 +48,7 @@ have to start training the NN from scratch.

Thus, the physical soft constraints allow us to encode solutions to
PDEs with the tools of NNs.
An inherent drawback of this approach is that it yields single solutions,
An inherent drawback of variant 2 is that it yields single solutions,
and that it does not combine with traditional numerical techniques well.
E.g., the learned representation is not suitable to be refined with
a classical iterative solver such as the conjugate gradient method.
@@ -10,13 +10,8 @@ We as humans have a lot of knowledge about how to describe physical processes
mathematically. As the following chapters will show, we can improve the
training process by guiding it with our human knowledge of physics.

```{figure} resources/physloss-overview.jpg
---
height: 220px
name: physloss-overview
---
Physical losses typically combine a supervised loss with a combination of derivatives from the neural network.
```



## Using physical models

@@ -63,17 +58,57 @@ Note that, similar to the data samples used for supervised training, we have no
residual terms $R$ will actually reach zero during training. The non-linear optimization of the training process
will minimize the supervised and residual terms as much as possible, but there is no guarantee. Large, non-zero residual
contributions can remain. We'll look at this in more detail in the upcoming code example; for now it's important
to remember that physical constraints in this way only represent _soft-constraints_, without guarantees
to remember that physical constraints in this way only represent _soft constraints_, without guarantees
of minimizing these constraints.

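To make the combination of terms concrete, here is a minimal, hypothetical sketch (not this chapter's reference code) of how such a residual typically enters the training loss alongside a supervised term; `network`, `R`, and `lambda_R` are illustrative stand-ins:

```python
import torch

def training_loss(network, x_data, u_data, x_residual, R, lambda_R=1.0):
    """Soft-constraint loss: supervised term on known samples plus a weighted PDE residual.

    network:    stand-in for f(x; theta), mapping inputs to the solution u
    x_data:     points where reference values u_data are available
    x_residual: points where the residual R(u, x) is evaluated
    lambda_R:   weighting of the soft constraint (no guarantee the residual reaches zero)
    """
    loss_supervised = ((network(x_data) - u_data) ** 2).mean()
    loss_residual = (R(network(x_residual), x_residual) ** 2).mean()
    return loss_supervised + lambda_R * loss_residual
```
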
## Neural network derivatives
|
||||
The previous overview did not really make clear how an NN produces $\mathbf{u}$.
|
||||
We can distinguish two different approaches here:
|
||||
via a chosen explicit representation of the target function (v1 in the following), or via using fully-connected neural networks to represent the solution (v2).
|
||||
E.g., for v1 we could set up a _spatial_ grid (or graph, or a set of sample points), while in the second case no explicit representation exists, and the NN instead receives the _spatial coordinate_ to produce the solution at a query position.
|
||||
We'll outline these two variants in more detail the following.
|
||||
|
||||
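As a rough, hypothetical sketch of the difference (illustrative PyTorch code, not taken from the book's notebooks): in v1 a network outputs the discretized solution on a chosen mesh all at once, while in v2 the network itself acts as the solution and is queried at continuous coordinates:

```python
import torch
import torch.nn as nn

# v1: explicit representation -- a CNN produces u on a chosen 64x64 grid
net_v1 = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.Conv2d(16, 1, 3, padding=1))
u_grid = net_v1(torch.randn(1, 1, 64, 64))     # the whole solution on the mesh

# v2: the network is the representation -- an MLP maps an (x, t) coordinate to u
net_v2 = nn.Sequential(
    nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 64), nn.Tanh(), nn.Linear(64, 1))
u_query = net_v2(torch.tensor([[0.3, 0.5]]))   # solution value at x=0.3, t=0.5
```
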
In order to compute the residuals at training time, it would be possible to store
the unknowns of $\mathbf{u}$ on a computational mesh, e.g., a grid, and discretize the equations of
$R$ there. This has a fairly long "tradition" in DL, and was proposed by Tompson et al. {cite}`tompson2017` early on.
---

## Variant 1: Residual derivatives for explicit representations

For variant 1, we choose the discretization and set up a computational mesh that covers our target domain. Without loss of generality, let's assume this is a Cartesian grid that samples the space with positions $\mathbf{p}$. Now, an NN is trained to produce the solution on the grid: $\mathbf{u}(\mathbf{p}) = f(\mathbf{x} ; \theta)$. For a regular grid, a CNN would be a good choice for $f$, while for triangle meshes we could use a graph-network, or a network with point-convolutions for particles.

```{figure} resources/physloss-overview-v1.jpg
---
height: 220px
name: physloss-overview-v1
---
Variant 1: the solution is represented by a chosen computational mesh, and produced by an NN. The residual is discretized there, and can be combined with supervised terms.
```

Now, we can discretize the equations of
$R$ on our computational mesh, and compute derivatives with our method of choice. Only caveat: to incorporate the residual
into training, we have to formulate the evaluation such that a deep learning framework can backpropagate through the
calculations. As our network $f()$ produces the solution $\mathbf{u}$, and the residual depends on it ($R(\mathbf{u})$), we at least need $\partial R / \partial \mathbf u$, such that the gradient can be backpropagated for the weights $\theta$. Luckily, if we formulate $R$ in terms of operations of a DL framework, this will be taken care of by the backpropagation functionality of the framework.

This variant has a fairly long "tradition" in DL, and was, e.g., proposed by Tompson et al. {cite}`tompson2017` early on to learn
divergence-free motions. To give a specific example: if our goal is to learn velocities $\mathbf u(t)$ which are divergence free $\nabla \cdot \mathbf u=0$, we can employ this training approach to train an NN without having to pre-compute divergence free velocity fields as training data. For brevity, we will drop the spatial index ($\mathbf p$) here, and focus on $t$, which we can likewise simplify: divergence-freeness has to hold at all times, and hence we can consider a single step from $t=0$ with $\Delta t=1$, i.e., a normalized step from a divergent $\mathbf u(0)$ to a divergence-free $\mathbf u(1)$. For a normal solver, we'd have to compute a pressure
$p=\nabla^{-2} \big( \nabla \cdot \mathbf{u}(0) \big)$, such that $\mathbf{u}(1) = \mathbf{u}(0) - \nabla p$. This is the famous fundamental
theorem of vector calculus, or
[Helmholtz decomposition](https://en.wikipedia.org/wiki/Helmholtz_decomposition), splitting a vector field into a _solenoidal_ (divergence-free) and irrotational part (the pressure gradient).

To learn this decomposition, we can approximate $p$ with a CNN on our computational mesh: $p = f(\mathbf{u}(0) ; \theta)$. The learning objective becomes minimizing the divergence of the corrected velocity $\mathbf{u}(1)$, which means minimizing
$\nabla \cdot \big( \mathbf{u}(0) - \nabla f(\mathbf{u}(0);\theta) \big)$.
To implement this residual, all we need to do is provide the divergence operator $(\nabla \cdot)$ of $\mathbf u$ on our computational mesh. This is typically easy to do via
a convolutional layer in the DL framework that contains the finite difference weights for the divergence.
Nicely enough, in this case we don't even need additional supervised samples, and can typically train purely with this residual formulation. Also, in contrast to variant 2 below, we can directly handle fairly large spaces of solutions here (we're not restricted to learning single solutions).
An example implementation can be found in this [code repository](https://github.com/tum-pbs/CG-Solver-in-the-Loop).

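A minimal PyTorch sketch of this idea is shown below; it is an illustration under simplifying assumptions (collocated 2D grid, unit grid spacing, random fields standing in for real velocity samples), not the code from the linked repository. A small CNN predicts $p$ from $\mathbf{u}(0)$, fixed convolution kernels provide the finite-difference $\nabla$ and $\nabla\cdot$ operators, and training minimizes the squared divergence of the corrected field:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# fixed central-difference kernels (dx=1); conv2d performs cross-correlation
kx = torch.tensor([[0., 0., 0.], [-0.5, 0., 0.5], [0., 0., 0.]])  # d/dx (width axis)
ky = torch.tensor([[0., -0.5, 0.], [0., 0., 0.], [0., 0.5, 0.]])  # d/dy (height axis)
div_kernel = torch.stack([kx, ky]).unsqueeze(0)    # [1,2,3,3]: du_x/dx + du_y/dy
grad_kernel = torch.stack([kx, ky]).unsqueeze(1)   # [2,1,3,3]: (dp/dx, dp/dy)

def divergence(u):   # u: [B,2,H,W] -> [B,1,H-2,W-2]
    return F.conv2d(u, div_kernel)

def gradient(p):     # p: [B,1,H,W] -> [B,2,H-2,W-2]
    return F.conv2d(p, grad_kernel)

# small CNN standing in for p = f(u(0); theta); padding keeps the spatial size
model = nn.Sequential(
    nn.Conv2d(2, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 1, 3, padding=1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(1000):
    u0 = torch.randn(8, 2, 64, 64)              # stand-in for divergent velocity samples
    p = model(u0)
    u1 = u0[:, :, 1:-1, 1:-1] - gradient(p)     # u(1) = u(0) - grad(p), cropped to match
    loss = divergence(u1).square().mean()       # purely residual-based training objective
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Note that no pre-computed divergence-free fields appear anywhere; the residual alone drives the training.
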
Overall, variant 1 has a lot in common with _differentiable physics_ training (it's basically a subset). As we'll discuss differentiable physics in a lot more detail
in {doc}`diffphys` and the chapters after it, we'll focus on the direct NN representation (variant 2) from now on.

---

## Variant 2: Derivatives from a neural network representation

The second variant of employing physical residuals as soft constraints
instead uses fully connected NNs to represent $\mathbf{u}$. This _physics-informed_ approach was popularized by Raissi et al. {cite}`raissi2019pinn`, and has some interesting pros and cons that we'll outline in the following. We will target this physics-informed version (variant 2) in the following code examples and discussions.

A popular variant of employing physical soft-constraints {cite}`raissi2019pinn`
instead uses fully connected NNs to represent $\mathbf{u}$. This has some interesting pros and cons that we'll outline in the following, and we will also focus on it in the following code examples and comparisons.

The central idea here is that the aforementioned general function $f$ that we're after in our learning problems
can also be used to obtain a representation of a physical field, e.g., a field $\mathbf{u}$ that satisfies $R=0$. This means $\mathbf{u}(\mathbf{x})$ will
@@ -86,6 +121,14 @@ in {doc}`overview`. Now, we can use the same tools to compute spatial derivative
Note that above for $R$ we've written this derivative in the shortened notation as $\mathbf{u}_{x}$.
For functions over time this of course also works for $\partial \mathbf{u} / \partial t$, i.e. $\mathbf{u}_{t}$ in the notation above.

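For instance (a hypothetical PyTorch sketch, not this chapter's notebook code), the derivatives $\mathbf{u}_x$ and $\mathbf{u}_t$ of such a coordinate-based network can be requested directly from the framework's automatic differentiation:

```python
import torch
import torch.nn as nn

# u(x, t) represented by a small fully-connected network
net = nn.Sequential(
    nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 64), nn.Tanh(), nn.Linear(64, 1))

x = torch.rand(128, 1, requires_grad=True)   # sample positions
t = torch.rand(128, 1, requires_grad=True)   # sample times
u = net(torch.cat([x, t], dim=1))

ones = torch.ones_like(u)
# first derivatives via backpropagation through the network; create_graph=True keeps
# the graph so the residual itself can later be differentiated w.r.t. the weights
u_x = torch.autograd.grad(u, x, grad_outputs=ones, create_graph=True)[0]
u_t = torch.autograd.grad(u, t, grad_outputs=ones, create_graph=True)[0]
# higher-order terms simply repeat the process, e.g. a second derivative in x
u_xx = torch.autograd.grad(u_x, x, grad_outputs=ones, create_graph=True)[0]

# e.g., a Burgers-style residual u_t + u u_x - nu u_xx that would enter the loss
nu = 0.01 / 3.1416
residual = u_t + u * u_x - nu * u_xx
loss_residual = residual.square().mean()
```

Each of these calls backpropagates through the full network, which is where the cost of higher-order derivatives mentioned earlier comes from.
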
```{figure} resources/physloss-overview-v2.jpg
---
height: 220px
name: physloss-overview-v2
---
Variant 2: the solution is produced by a fully-connected network, typically requiring a supervised loss with a combination of derivatives from the neural network for the residual. These NN derivatives have their own share of advantages and disadvantages.
```

Thus, for some generic $R$, made up of $\mathbf{u}_t$ and $\mathbf{u}_{x}$ terms, we can rely on the backpropagation algorithm
of DL frameworks to compute these derivatives once we have an NN that represents $\mathbf{u}$. Essentially, this gives us a
function (the NN) that receives space and time coordinates to produce a solution for $\mathbf{u}$. Hence, the input is typically
@@ -101,7 +144,7 @@ For higher order derivatives, such as $\frac{\partial^2 u}{\partial x^2}$, we ca

## Summary so far

The approach above gives us a method to include physical equations into DL learning as a soft-constraint: the residual loss.
The approach above gives us a method to include physical equations into DL learning as a soft constraint: the residual loss.
Typically, this setup is suitable for _inverse problems_, where we have certain measurements or observations
for which we want to find a PDE solution. Because of the high cost of the reconstruction (to be
demonstrated in the following), the solution manifold shouldn't be overly complex. E.g., it is not possible
BIN resources/physloss-overview-v1.jpg (new file, binary not shown, 86 KiB)
BIN resources/physloss-overview-v2.jpg (new file, binary not shown, 142 KiB)
BIN (removed binary image, not shown, previously 127 KiB)
@@ -158,7 +158,7 @@ To summarize, supervised training has the following properties.
- Interactions with external "processes" (such as embedding into a solver) are difficult.

The next chapters will explain how to alleviate these shortcomings of supervised training.
First, we'll look at bringing model equations into the picture via soft-constraints, and afterwards
First, we'll look at bringing model equations into the picture via soft constraints, and afterwards
we'll revisit the challenges of bringing together numerical simulations and learned approaches.