updated physical loss chapter
This commit is contained in:
parent
c8c0158ec2
commit
42c0aafa21
@@ -92,7 +92,6 @@
"name": "stdout",
"output_type": "stream",
"text": [
"WARNING:tensorflow:From /Users/thuerey/miniconda3/envs/tf/lib/python3.8/site-packages/tensorflow/python/ops/math_grad.py:297: setdiff1d (from tensorflow.python.ops.array_ops) is deprecated and will be removed after 2018-11-30.\n",
"Instructions for updating:\n",
"This op will be removed after the deprecation date. Please switch to tf.sets.difference().\n",
"Loss: 0.382915\n"
@@ -592,4 +591,4 @@
},
"nbformat": 4,
"nbformat_minor": 4
}
}
@@ -153,7 +153,7 @@ techniques:
- _Loss-terms_: the physical dynamics (or parts thereof) are encoded in the
loss function, typically in the form of differentiable operations. The
learning process can repeatedly evaluate the loss, and usually receives
gradients from a PDE-based formulation. These soft-constraints sometimes also go
gradients from a PDE-based formulation. These soft constraints sometimes also go
under the name "physics-informed" training.

- _Interleaved_: the full physical simulation is interleaved and combined with
File diff suppressed because one or more lines are too long
@@ -1,4 +1,4 @@
Discussion of Physical Soft-Constraints
Discussion of Physical Losses
=======================

The good news so far is - we have a DL method that can include
@@ -9,7 +9,7 @@ starting point.
On the positive side, we can leverage DL frameworks with backpropagation to compute
the derivatives of the model. At the same time, this puts us at the mercy of the learned
representation regarding the reliability of these derivatives. Also, each derivative
requires backpropagation through the full network, which can be very expensive. Especially so
requires backpropagation through the full network. This can be very expensive, especially
for higher-order derivatives.

And while the setup is relatively simple, it is generally difficult to control. The NN
@@ -23,23 +23,23 @@ Of course, such denomination questions are superficial - if an algorithm is usef
what name it has. However, here the question helps to highlight some important properties
that are typically associated with algorithms from fields like machine learning or optimization.

One main reason _not_ to call these physical constraints machine learning (ML), is that the
One main reason _not_ to call the optimization of the previous notebook machine learning (ML), is that the
positions where we test and constrain the solution are the final positions we are interested in.
As such, there is no real distinction between training, validation and (out of distribution) test sets.
As such, there is no real distinction between training, validation and test sets.
Computing the solution for a known and given set of samples is much more akin to classical optimization,
where inverse problems like the previous Burgers example stem from.

For machine learning, we typically work under the assumption that the final performance of our
model will be evaluated on a different, potentially unknown set of inputs. The _test data_
should usually capture such out of distribution (OOD) behavior, so that we can make estimates
should usually capture such _out of distribution_ (OOD) behavior, so that we can make estimates
about how well our model will generalize to "real-world" cases that we will encounter when
we deploy it into an application.
we deploy it in an application.

In contrast, for the PINN training as described here, we reconstruct a single solution in a known
and given space-time region. As such, any samples from this domain follow the same distribution
and hence don't really represent test or OOD sampes. As the NN directly encodes the solution,
and hence don't really represent test or OOD samples. As the NN directly encodes the solution,
there is also little hope that it will yield different solutions, or perform well outside
of the training distribution. If we're interested in a different solution, we most likely
of the training range. If we're interested in a different solution, we
have to start training the NN from scratch.


@@ -48,7 +48,7 @@ have to start training the NN from scratch.

Thus, the physical soft constraints allow us to encode solutions to
PDEs with the tools of NNs.
An inherent drawback of this approach is that it yields single solutions,
An inherent drawback of variant 2 is that it yields single solutions,
and that it does not combine with traditional numerical techniques well.
E.g., the learned representation is not suitable to be refined with
a classical iterative solver such as the conjugate gradient method.
@@ -10,13 +10,8 @@ We as humans have a lot of knowledge about how to describe physical processes
mathematically. As the following chapters will show, we can improve the
training process by guiding it with our human knowledge of physics.

```{figure} resources/physloss-overview.jpg
---
height: 220px
name: physloss-overview
---
Physical losses typically combine a supervised loss with a combination of derivatives from the neural network.
```



## Using physical models

@@ -63,17 +58,57 @@ Note that, similar to the data samples used for supervised training, we have no
residual terms $R$ will actually reach zero during training. The non-linear optimization of the training process
will minimize the supervised and residual terms as much as possible, but there is no guarantee. Large, non-zero residual
contributions can remain. We'll look at this in more detail in the upcoming code example; for now it's important
to remember that physical constraints in this way only represent _soft-constraints_, without guarantees
to remember that physical constraints in this way only represent _soft constraints_, without guarantees
of minimizing these constraints.

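To make the combination of terms concrete, here is a minimal, hypothetical sketch (not this chapter's reference code) of how such a residual typically enters the training loss alongside a supervised term; `network`, `R`, and `lambda_R` are illustrative stand-ins:

```python
import torch

def training_loss(network, x_data, u_data, x_residual, R, lambda_R=1.0):
    """Soft-constraint loss: supervised term on known samples plus a weighted PDE residual.

    network:    stand-in for f(x; theta), mapping inputs to the solution u
    x_data:     points where reference values u_data are available
    x_residual: points where the residual R(u, x) is evaluated
    lambda_R:   weighting of the soft constraint (no guarantee the residual reaches zero)
    """
    loss_supervised = ((network(x_data) - u_data) ** 2).mean()
    loss_residual = (R(network(x_residual), x_residual) ** 2).mean()
    return loss_supervised + lambda_R * loss_residual
```
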
## Neural network derivatives
|
||||
The previous overview did not really make clear how an NN produces $\mathbf{u}$.
|
||||
We can distinguish two different approaches here:
|
||||
via a chosen explicit representation of the target function (v1 in the following), or via using fully-connected neural networks to represent the solution (v2).
|
||||
E.g., for v1 we could set up a _spatial_ grid (or graph, or a set of sample points), while in the second case no explicit representation exists, and the NN instead receives the _spatial coordinate_ to produce the solution at a query position.
|
||||
We'll outline these two variants in more detail the following.
|
||||
|
||||
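As a rough, hypothetical sketch of the difference (illustrative PyTorch code, not taken from the book's notebooks): in v1 a network outputs the discretized solution on a chosen mesh all at once, while in v2 the network itself acts as the solution and is queried at continuous coordinates:

```python
import torch
import torch.nn as nn

# v1: explicit representation -- a CNN produces u on a chosen 64x64 grid
net_v1 = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.Conv2d(16, 1, 3, padding=1))
u_grid = net_v1(torch.randn(1, 1, 64, 64))     # the whole solution on the mesh

# v2: the network is the representation -- an MLP maps an (x, t) coordinate to u
net_v2 = nn.Sequential(
    nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 64), nn.Tanh(), nn.Linear(64, 1))
u_query = net_v2(torch.tensor([[0.3, 0.5]]))   # solution value at x=0.3, t=0.5
```
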
In order to compute the residuals at training time, it would be possible to store
the unknowns of $\mathbf{u}$ on a computational mesh, e.g., a grid, and discretize the equations of
$R$ there. This has a fairly long "tradition" in DL, and was proposed by Tompson et al. {cite}`tompson2017` early on.
---

## Variant 1: Residual derivatives for explicit representations

For variant 1, we choose the discretization and set up a computational mesh that covers our target domain. Without loss of generality, let's assume this is a Cartesian grid that samples the space with positions $\mathbf{p}$. Now, an NN is trained to produce the solution on the grid: $\mathbf{u}(\mathbf{p}) = f(\mathbf{x} ; \theta)$. For a regular grid, a CNN would be a good choice for $f$, while for triangle meshes we could use a graph-network, or a network with point-convolutions for particles.

```{figure} resources/physloss-overview-v1.jpg
---
height: 220px
name: physloss-overview-v1
---
Variant 1: the solution is represented by a chosen computational mesh, and produced by an NN. The residual is discretized there, and can be combined with supervised terms.
```

Now, we can discretize the equations of
$R$ on our computational mesh, and compute derivatives with our method of choice. Only caveat: to incorporate the residual
into training, we have to formulate the evaluation such that a deep learning framework can backpropagate through the
calculations. As our network $f()$ produces the solution $\mathbf{u}$, and the residual depends on it ($R(\mathbf{u})$), we at least need $\partial R / \partial \mathbf u$, such that the gradient can be backpropagated for the weights $\theta$. Luckily, if we formulate $R$ in terms of operations of a DL framework, this will be taken care of by the backpropagation functionality of the framework.

This variant has a fairly long "tradition" in DL, and was, e.g., proposed by Tompson et al. {cite}`tompson2017` early on to learn
divergence-free motions. To give a specific example: if our goal is to learn velocities $\mathbf u(t)$ which are divergence free $\nabla \cdot \mathbf u=0$, we can employ this training approach to train an NN without having to pre-compute divergence free velocity fields as training data. For brevity, we will drop the spatial index ($\mathbf p$) here, and focus on $t$, which we can likewise simplify: divergence-freeness has to hold at all times, and hence we can consider a single step from $t=0$ with $\Delta t=1$, i.e., a normalized step from a divergent $\mathbf u(0)$ to a divergence-free $\mathbf u(1)$. For a normal solver, we'd have to compute a pressure
$p=\nabla^{-2} \big( \nabla \cdot \mathbf{u}(0) \big)$, such that $\mathbf{u}(1) = \mathbf{u}(0) - \nabla p$. This is the famous fundamental
theorem of vector calculus, or
[Helmholtz decomposition](https://en.wikipedia.org/wiki/Helmholtz_decomposition), splitting a vector field into a _solenoidal_ (divergence-free) and irrotational part (the pressure gradient).

To learn this decomposition, we can approximate $p$ with a CNN on our computational mesh: $p = f(\mathbf{u}(0) ; \theta)$. The learning objective becomes minimizing the divergence of the corrected velocity $\mathbf{u}(1)$, which means minimizing
$\nabla \cdot \big( \mathbf{u}(0) - \nabla f(\mathbf{u}(0);\theta) \big)$.
To implement this residual, all we need to do is provide the divergence operator $(\nabla \cdot)$ of $\mathbf u$ on our computational mesh. This is typically easy to do via
a convolutional layer in the DL framework that contains the finite difference weights for the divergence.
Nicely enough, in this case we don't even need additional supervised samples, and can typically train purely with this residual formulation. Also, in contrast to variant 2 below, we can directly handle fairly large spaces of solutions here (we're not restricted to learning single solutions).
An example implementation can be found in this [code repository](https://github.com/tum-pbs/CG-Solver-in-the-Loop).

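A minimal PyTorch sketch of this idea is shown below; it is an illustration under simplifying assumptions (collocated 2D grid, unit grid spacing, random fields standing in for real velocity samples), not the code from the linked repository. A small CNN predicts $p$ from $\mathbf{u}(0)$, fixed convolution kernels provide the finite-difference $\nabla$ and $\nabla\cdot$ operators, and training minimizes the squared divergence of the corrected field:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# fixed central-difference kernels (dx=1); conv2d performs cross-correlation
kx = torch.tensor([[0., 0., 0.], [-0.5, 0., 0.5], [0., 0., 0.]])  # d/dx (width axis)
ky = torch.tensor([[0., -0.5, 0.], [0., 0., 0.], [0., 0.5, 0.]])  # d/dy (height axis)
div_kernel = torch.stack([kx, ky]).unsqueeze(0)    # [1,2,3,3]: du_x/dx + du_y/dy
grad_kernel = torch.stack([kx, ky]).unsqueeze(1)   # [2,1,3,3]: (dp/dx, dp/dy)

def divergence(u):   # u: [B,2,H,W] -> [B,1,H-2,W-2]
    return F.conv2d(u, div_kernel)

def gradient(p):     # p: [B,1,H,W] -> [B,2,H-2,W-2]
    return F.conv2d(p, grad_kernel)

# small CNN standing in for p = f(u(0); theta); padding keeps the spatial size
model = nn.Sequential(
    nn.Conv2d(2, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 1, 3, padding=1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(1000):
    u0 = torch.randn(8, 2, 64, 64)              # stand-in for divergent velocity samples
    p = model(u0)
    u1 = u0[:, :, 1:-1, 1:-1] - gradient(p)     # u(1) = u(0) - grad(p), cropped to match
    loss = divergence(u1).square().mean()       # purely residual-based training objective
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Note that no pre-computed divergence-free fields appear anywhere; the residual alone drives the training.
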
Overall, variant 1 has a lot in common with _differentiable physics_ training (it's basically a subset). As we'll discuss differentiable physics in a lot more detail
in {doc}`diffphys` and the chapters after it, we'll focus on the direct NN representation (variant 2) from now on.

---

## Variant 2: Derivatives from a neural network representation

The second variant of employing physical residuals as soft constraints
instead uses fully connected NNs to represent $\mathbf{u}$. This _physics-informed_ approach was popularized by Raissi et al. {cite}`raissi2019pinn`, and has some interesting pros and cons that we'll outline in the following. We will target this physics-informed version (variant 2) in the following code examples and discussions.

A popular variant of employing physical soft-constraints {cite}`raissi2019pinn`
instead uses fully connected NNs to represent $\mathbf{u}$. This has some interesting pros and cons that we'll outline in the following, and we will also focus on it in the following code examples and comparisons.

The central idea here is that the aforementioned general function $f$ that we're after in our learning problems
can also be used to obtain a representation of a physical field, e.g., a field $\mathbf{u}$ that satisfies $R=0$. This means $\mathbf{u}(\mathbf{x})$ will
@@ -86,6 +121,14 @@ in {doc}`overview`. Now, we can use the same tools to compute spatial derivative
Note that above for $R$ we've written this derivative in the shortened notation as $\mathbf{u}_{x}$.
For functions over time this of course also works for $\partial \mathbf{u} / \partial t$, i.e. $\mathbf{u}_{t}$ in the notation above.

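For instance (a hypothetical PyTorch sketch, not this chapter's notebook code), the derivatives $\mathbf{u}_x$ and $\mathbf{u}_t$ of such a coordinate-based network can be requested directly from the framework's automatic differentiation:

```python
import torch
import torch.nn as nn

# u(x, t) represented by a small fully-connected network
net = nn.Sequential(
    nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 64), nn.Tanh(), nn.Linear(64, 1))

x = torch.rand(128, 1, requires_grad=True)   # sample positions
t = torch.rand(128, 1, requires_grad=True)   # sample times
u = net(torch.cat([x, t], dim=1))

ones = torch.ones_like(u)
# first derivatives via backpropagation through the network; create_graph=True keeps
# the graph so the residual itself can later be differentiated w.r.t. the weights
u_x = torch.autograd.grad(u, x, grad_outputs=ones, create_graph=True)[0]
u_t = torch.autograd.grad(u, t, grad_outputs=ones, create_graph=True)[0]
# higher-order terms simply repeat the process, e.g. a second derivative in x
u_xx = torch.autograd.grad(u_x, x, grad_outputs=ones, create_graph=True)[0]

# e.g., a Burgers-style residual u_t + u u_x - nu u_xx that would enter the loss
nu = 0.01 / 3.1416
residual = u_t + u * u_x - nu * u_xx
loss_residual = residual.square().mean()
```

Each of these calls backpropagates through the full network, which is where the cost of higher-order derivatives mentioned earlier comes from.
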
```{figure} resources/physloss-overview-v2.jpg
---
height: 220px
name: physloss-overview-v2
---
Variant 2: the solution is produced by a fully-connected network, typically requiring a supervised loss with a combination of derivatives from the neural network for the residual. These NN derivatives have their own share of advantages and disadvantages.
```

Thus, for some generic $R$, made up of $\mathbf{u}_t$ and $\mathbf{u}_{x}$ terms, we can rely on the backpropagation algorithm
of DL frameworks to compute these derivatives once we have an NN that represents $\mathbf{u}$. Essentially, this gives us a
function (the NN) that receives space and time coordinates to produce a solution for $\mathbf{u}$. Hence, the input is typically
@@ -101,7 +144,7 @@ For higher order derivatives, such as $\frac{\partial^2 u}{\partial x^2}$, we ca

## Summary so far

The approach above gives us a method to include physical equations into DL learning as a soft-constraint: the residual loss.
The approach above gives us a method to include physical equations into DL learning as a soft constraint: the residual loss.
Typically, this setup is suitable for _inverse problems_, where we have certain measurements or observations
for which we want to find a PDE solution. Because of the high cost of the reconstruction (to be
demonstrated in the following), the solution manifold shouldn't be overly complex. E.g., it is not possible
BIN resources/physloss-overview-v1.jpg (new file, binary not shown, 86 KiB)
BIN resources/physloss-overview-v2.jpg (new file, binary not shown, 142 KiB)
BIN (removed binary image, not shown, previously 127 KiB)
@@ -158,7 +158,7 @@ To summarize, supervised training has the following properties.
- Interactions with external "processes" (such as embedding into a solver) are difficult.

The next chapters will explain how to alleviate these shortcomings of supervised training.
First, we'll look at bringing model equations into the picture via soft-constraints, and afterwards
First, we'll look at bringing model equations into the picture via soft constraints, and afterwards
we'll revisit the challenges of bringing together numerical simulations and learned approaches.