large update of differentiable physics chapter

2025-02-17 14:01:59 +08:00
commit 3f8c7bc672
parent 3907a75d1a
7 changed files with 85 additions and 59 deletions
--- a/diffphys-code-control.ipynb
+++ b/diffphys-code-control.ipynb
--- a/diffphys-code-ns.ipynb
+++ b/diffphys-code-ns.ipynb
--- a/diffphys-code-sol.ipynb
+++ b/diffphys-code-sol.ipynb
@@ -8,7 +8,7 @@
   "source": [
    "# Reducing Numerical Errors with Neural Operators\n",
    "\n",
-    "In this example we will target numerical errors that arise in the discretization of a continuous PDE $\\mathcal P^*$, i.e. when we formulate $\\mathcal P$. This approach will demonstrate that, despite the lack of closed-form descriptions, discretization errors often are functions with regular and repeating structures and, thus, can be learned by a neural operator. Once trained, the neural network (NN) can be evaluated locally to improve the solution of a PDE-solver, i.e., to reduce its numerical error. The resulting method is a hybrid one: it will always perform (a coarse) PDE solve, and then improve it at runtime with corrections inferred by an NN.\n",
+    "In this example we will target numerical errors that arise in the discretization of a continuous PDE $\\mathcal P^*$, i.e. when we formulate $\\mathcal P$. This approach will demonstrate that, despite the lack of closed-form descriptions, discretization errors often are functions with regular and repeating structures and, thus, can be learned by a discretized neural operator. Once trained, the neural network (NN) can be evaluated locally to improve the solution of a PDE-solver, i.e., to reduce its numerical error. The resulting method is a hybrid one: it will always perform (a coarse) PDE solve, and then improve it at runtime with corrections inferred by an NN.\n",
    "\n",
    "\n",
    "Pretty much all numerical methods contain some form of iterative process: repeated updates over time for explicit solvers, or within a single update step for implicit or steady-state solvers.\n",
@@ -79,10 +79,8 @@
    "\\newcommand{\\vr}[1]{\\mathbf{r}_{#1}}\n",
    "\\loss ( \\mathcal{P}_{s}( \\corr (\\mathcal{T} \\vr{t}) ) , \\mathcal{T} \\vr{t+1}) < \\loss ( \\mathcal{P}_{s}( \\mathcal{T} \\vr{t} ), \\mathcal{T} \\vr{t+1})$.\n",
    "\n",
-    "The correction operator  \n",
-    "$\\newcommand{\\vcN}{\\mathbf{s}} \\newcommand{\\corr}{\\mathcal{C}} \\corr (\\vcN | \\theta)$\n",
-    "is represented as a deep neural network with weights $\\theta$\n",
-    "and receives the state $\\mathbf{s}$ to infer an additive correction field with the same dimension.\n",
+    "The correction operator $\\newcommand{\\vcN}{\\mathbf{s}} \\newcommand{\\corr}{\\mathcal{C}} \\corr (\\vcN | \\theta)$ is represented \n",
+    "as a deep neural network with weights $\\theta$ and receives the state $\\mathbf{s}$ to infer an additive correction field with the same dimension.\n",
    "To distinguish the original states $\\mathbf{s}$ from the corrected ones, we'll denote the latter with an added tilde $\\tilde{\\mathbf{s}}$.\n",
    "The overall learning goal now becomes\n",
    "\n",
@@ -136,8 +134,9 @@
   "source": [
    "try:\n",
    "    import google.colab  # to ensure that we are inside colab\n",
-    "    !pip install --upgrade --quiet phiflow==3.2\n",
+    "    !pip install --upgrade --quiet phiflow==3.3\n",
    "    #!pip install --upgrade --quiet git+https://github.com/tum-pbs/PhiFlow@develop\n",
+    "    \n",
    "    # for pbdl-dataset:\n",
    "    !pip install --upgrade --quiet git+https://github.com/tum-pbs/pbdl-dataset\n",
    "\n",
--- a/diffphys-discuss.md
+++ b/diffphys-discuss.md
@@ -11,13 +11,19 @@ What is primarily exciting in this context are the implications that arise from
 Most importantly, training via differentiable physics allows us to seamlessly bring the two fields together:
 we can obtain _hybrid_ methods, that use the best numerical methods that we have at our disposal for the simulation itself, as well as for the training process. We can then use the trained model to improve forward or backward solves. Thus, in the end, we have a solver that combines a _traditional_ solver and a _learned_ component that in combination can improve the capabilities of numerical methods.

-## Interaction
+## Reducing data shift via interaction

-One key aspect that is important for these hybrids to work well is to let the NN _interact_ with the PDE solver at training time. Differentiable simulations allow a trained model to "explore and experience" the physical environment, and receive directed feedback regarding its interactions throughout the solver iterations. This combination nicely fits into the broader context of machine learning as _differentiable programming_. 
+One key aspect that is important for these hybrids to work well is to let the NN _interact_ with the PDE solver at training time. Differentiable simulations allow a trained model to "explore and experience" the physical environment, and receive directed feedback regarding its interactions throughout the solver iterations. 
+
+This addresses the classic **data shift** problem of machine learning: rather than relying on an _a-priori_ specified distribution for training the network, the training process generates new trajectories via unrolling on the fly, and computes training signals from them. This can be seen as an _a-posteriori_ approach, and makes the trained NN significantly more resilient to unseen inputs. As we'll evaluate in more detail in {doc}`probmodels-uncond`, it's actually hard to beat a good unrolling setup with other approaches.
+
+Note that the topic of _differentiable physics_ nicely fits into the broader context of machine learning as _differentiable programming_. 

 ## Generalization

-The hybrid approach also bears particular promise for simulators: it improves generalizing capabilities of the trained models by letting the PDE-solver handle large-scale _changes to the data distribution_ such that the learned model can focus on localized structures not captured by the discretization. While physical models generalize very well, learned models often specialize in data distributions seen at training time. This was, e.g., shown for the models reducing numerical errors of the previous chapter: the trained models can deal with solution manifolds with significant amounts of varying physical behavior, while simpler training variants quickly deteriorate over the course of recurrent time steps.
+The hybrid approach also bears particular promise for simulators: it improves generalizing capabilities of the trained models by letting the PDE-solver handle large-scale _changes to the data distribution_. This allows the learned model to focus on localized structures not captured by the discretization. While physical models generalize very well, learned models often specialize in data distributions seen at training time. Hence, this aspect benefits from the previous reduction of data shift, and effectively allows for even larger differences in terms of input distribution. If the NN is set up correctly, these can be handled by the classical solver in a hybrid approach.
+
+These benefits were, e.g., shown for the models reducing numerical errors of {doc}`diffphys-code-sol`: the trained models can deal with solution manifolds with significant amounts of varying physical behavior, while simpler training variants would deteriorate over the course of recurrent time steps.


 ![Divider](resources/divider5.jpg)
@@ -28,16 +34,17 @@ To summarize, the pros and cons of training NNs via DP:
 - Uses physical model and numerical methods for discretization.
 - Efficiency and accuracy of selected methods carries over to training.
 - Very tight coupling of physical models and NNs possible.
-- Improved generalization via solver interactions.
+- Improved resilience and generalization.

 ❌ Con: 
 - Not compatible with all simulators (need to provide gradients).
 - Require more heavy machinery (in terms of framework support) than previously discussed methods.

-_Outlook_: the last negative point (regarding heavy machinery) is bound to strongly improve given the current pace of software and API developments in the DL area. However, for now it's important to keep in mind that not every simulator is suitable for DP training out of the box. Hence, in this book we'll focus on examples using phiflow, which was designed for interfacing with deep learning frameworks. 
+_Outlook_: the last negative point (regarding heavy machinery) is strongly improving at the moment. Many existing simulators, e.g. the popular open source framework _OpenFOAM_, as well as many commercial simulators are working on tight integrations with NNs. However, there's still plenty of room for improvement, and in this book we're focusing on examples using phiflow, which was designed for interfacing with deep learning frameworks from the ground up. 

-The training via differentiable physics (DP) allows us to integrate full numerical simulations into the training of deep neural networks.
-It is also a very generic approach that is applicable to a wide range of combinations of PDE-based models and deep learning. 
+The training via differentiable physics (DP) allows us to integrate full numerical simulations into the training of deep neural networks. 
+This effectively provides **hard constraints**, as the coupled solver can project and enforce constraints just like classical solvers would.
+It is a very generic approach that is applicable to a wide range of combinations of PDE-based models and deep learning. 

-In the next chapters, we will first compare DP training to model-free alternatives for control problems, and afterwards target the underlying learning process to obtain even better NN states.
+In the next chapters, we will first expand the scope of the learning tasks to incorporate uncertainties, i.e. to work with full distributions rather than single deterministic states and trajectories. Afterwards, we'll also compare DP training to reinforcement learning, and target the underlying learning process to obtain even better NN states.

--- a/diffphys-dpvspinn.md
+++ b/diffphys-dpvspinn.md
@@ -16,13 +16,13 @@ The DP version on the other hand inherently relies on a numerical solver that is

 The reliance on a suitable discretization requires some understanding and knowledge of the problem under consideration. A sub-optimal discretization can impede the learning process or, worst case, lead to diverging training runs. However, given the large body of theory and practical realizations of stable solvers for a wide variety of physical problems, this is typically not an unsurmountable obstacle.

-The PINN approaches on the other hand do not require an a-priori choice of a discretization, and as such seems to be "discretization-less". This, however, is only an advantage on first sight. As they yield solutions in a computer, they naturally _have_ to discretize the problem. They construct this discretization over the course of the training process, in a way that lies at the mercy of the underlying nonlinear optimization, and is not easily controllable from the outside. Thus, the resulting accuracy is determined by how well the training manages to estimate the complexity of the problem for realistic use cases, and how well the training data approximates the unknown regions of the solution.
+The PINN approaches on the other hand do not require an a-priori choice of a discretization, and as such seem to be "discretization-less". This, however, is only an advantage at first sight. By now, researchers are trying to "re-integrate" discretizations into PINN training. Generally, PINNs inevitably yield solutions in a computer and thus _have_ to discretize the problem. They construct this discretization over the course of the training process, in a way that lies at the mercy of the underlying nonlinear optimization, and is not easily controllable from the outside. Thus, the resulting accuracy is determined by how well the training manages to estimate the complexity of the problem for realistic use cases, and how well the training data approximates the unknown regions of the solution. 

 E.g., as demonstrated with the Burgers example, the PINN solutions typically have significant difficulties propagating information _backward_ in time. This is closely coupled to the efficiency of the method.

 ## Efficiency

-The PINN approaches typically perform a localized sampling and correction of the solutions, which means the corrections in the form of weight updates are likewise typically local. The fulfillment of boundary conditions in space and time can be correspondingly slow, leading to long training runs in practice.
+The PINN approach also results in fundamentally more difficult training tasks that cause convergence problems. PINNs typically perform a localized sampling and correction of the solutions, which means the corrections in the form of weight updates are likewise typically local. The fulfillment of boundary conditions in space and time can be correspondingly slow, leading to long training runs in practice.

 A well-chosen discretization of a DP approach can remedy this behavior, and provide an improved flow of gradient information. At the same time, the reliance on a computational grid means that solutions can be obtained very quickly. Given an interpolation scheme or a set of basis functions, the solution can be sampled at any point in space or time given a very local neighborhood of the computational grid. Worst case, this can lead to slight memory overheads, e.g., by repeatedly storing mostly constant values of a solution.

@@ -54,4 +54,4 @@ The following table summarizes these pros and cons of physics-informed (PI) and

 As a summary, both methods are definitely interesting, and have a lot of potential. There are numerous more complicated extensions and algorithmic modifications that change and improve on the various negative aspects we have discussed for both sides.

-However, as of this writing, the physics-informed (PI) approach has clear limitations when it comes to performance and compatibility with existing numerical methods. Thus, when knowledge of the problem at hand is available, which typically is the case when we choose a suitable PDE model to constrain the learning process, employing a differentiable physics solver can significantly improve the training process as well as the quality of the obtained solution. So, in the following we'll focus on DP variants, and illustrate their capabilities with more complex scenarios in the next chapters. First, we'll consider a case that very efficiently computes space-time gradients for a transient fluid simulations.
+However, as of this writing, the PINN approach has clear limitations when it comes to performance and compatibility with existing numerical methods. Thus, when knowledge of the problem at hand is available, which typically is the case when we choose a suitable PDE model to constrain the learning process, employing a differentiable physics solver to train neural operators can significantly improve the training process as well as the quality of the obtained solution. So, in the following we'll focus on DP variants, and illustrate their capabilities with more complex scenarios in the next chapters. First, we'll consider a case that very efficiently computes space-time gradients for a transient fluid simulation.
--- a/diffphys-examples.md
+++ b/diffphys-examples.md
@@ -7,7 +7,26 @@ When using DP approaches for learning applications,
 there is a lot of flexibility w.r.t. the combination of DP and NN building blocks. 
 As some of the differences are subtle, the following section will go into more detail.
 We'll especially focus on solvers that repeat the PDE and NN evaluations multiple times,
-e.g., to compute multiple states of the physical system over time.
+e.g., to compute multiple states of the physical system over time. In classical numerics,
+this would be called an iterative time stepping method, while in the context of AI, it's
+an _autoregressive_ method.
+
+
+
+```{admonition} Hint: Correction vs Prediction
+:class: tip
+
+The problems that are best tackled with DP approaches are very fundamental. The combination of 
+an imperfect physical model and an _improvement term_ classically goes under many different names:
+_closure problems_ in fluid dynamics and turbulence, _homogenization_ or _coarse-graining_ 
+in materials science, while it's called _parametrization_ in climate and weather.
+
+In the following, we'll generically denote all these tasks containing NN+solver as **correction** tasks, in contrast
+to pure **prediction** tasks for cases where no solver is involved at inference time.
+
+```
+
+

 To re-cap, here's the previous figure about combining NNs and DP operators. 
 In the figure these operators look like a loss term: they typically don't have weights,
@@ -22,7 +41,11 @@ The DP approach as described in the previous chapters. A network produces an inp
 ```

 This setup can be seen as the network receiving information about how its output influences the outcome of the PDE solver. I.e., the gradient will provide information on how to produce an NN output that minimizes the loss. 
-Similar to the previously described _physical losses_ (from {doc}`physicalloss`), this can mean upholding a conservation law.
+Similar to the previously described {doc}`physicalloss`, this can, e.g., mean upholding a conservation law or generally a PDE-based constraint over time.
+
+
+
+

 ## Switching the order 

@@ -36,15 +59,15 @@ name: diffphys-switch
 A PDE solver produces an output which is processed by an NN.
 ```

-In this case the PDE solver essentially represents an _on-the-fly_ data generator. That's not necessarily always useful: this setup could be replaced by a pre-computation of the same inputs, as the PDE solver is not influenced by the NN. Hence, there's no backpropagation through $\mathcal P$, and it could be replaced by a simple "loading" function. On the other hand, evaluating the PDE solver at training time with a randomized sampling of input parameters can lead to an excellent sampling of the data distribution of the input. If we have realistic ranges for how the inputs vary, this can improve the NN training. If implemented correctly, the solver can also alleviate the need to store and load large amounts of data, and instead produce them more quickly at training time, e.g., directly on a GPU. 
+In this case the PDE solver essentially represents an _on-the-fly_ data generator. That's not necessarily always useful: this setup could be replaced by a pre-computation of the same inputs, as the PDE solver is not influenced by the NN. Hence, there's no backpropagation through $\mathcal P$, and it could be replaced by a simple "loading" function. On the other hand, evaluating the PDE solver at training time with a randomized sampling of input parameters can lead to an excellent sampling of the data distribution of the input. If we have realistic ranges for how the inputs vary, this can improve the NN training. If implemented correctly, the solver can also alleviate the need to store and load large amounts of data, and instead produce them more quickly at training time, e.g., directly on a GPU. Recent methods explore this direction in the context of _Active Learning_.

-However, this version does not leverage the gradient information from a differentiable solver, which is why the following variant is much more interesting.
+However, this version does not leverage the gradient information from a differentiable solver, which is why the following variant is more interesting.

 ## Recurrent evaluation

-In general, there's no combination of NN layers and DP operators that is _forbidden_ (as long as their dimensions are compatible). One that makes particular sense is to "unroll" the iterations of a time stepping process of a simulator, and let the state of a system be influenced by an NN.
+A combination that makes particular sense is to **unroll** the iterations of a time stepping process of a simulator, and let the state of a system be influenced by an NN. (In general, no combination of NN layers and DP operators is _forbidden_, as long as their dimensions are compatible.)

-In this case we compute a (potentially very long) sequence of PDE solver steps in the forward pass. In-between these solver steps, an NN modifies the state of our system, which is then used to compute the next PDE solver step. During the backpropagation pass, we move backwards through all of these steps to evaluate contributions to the loss function (it can be evaluated in one or more places anywhere in the execution chain), and to backprop the gradient information through the DP and NN operators. This unrollment of solver iterations essentially gives feedback to the NN about how it's "actions" influence the state of the physical system and resulting loss. Here's a visual overview of this form of combination:
+In the case of unrolling, we compute a (potentially very long) sequence of PDE solver steps in the forward pass. In-between these solver steps, an NN modifies the state of our system, which is then used to compute the next PDE solver step. During the backpropagation pass, we move backwards through all of these steps to evaluate contributions to the loss function (it can be evaluated in one or more places anywhere in the execution chain), and to backprop the gradient information through the DP and NN operators. This unrolling of solver iterations essentially gives feedback to the NN about how its "actions" influence the state of the physical system and resulting loss. Here's a visual overview of this form of combination:

 ```{figure} resources/diffphys-multistep.jpg
 ---
@@ -54,7 +77,7 @@ name: diffphys-mulitstep
 Time stepping with interleaved DP and NN operations for $k$ solver iterations. The dashed gray arrows indicate optional intermediate evaluations of loss terms (similar to the solid gray arrow for the last step $k$), and intermediate outputs of the NN are indicated with a tilde.
 ```

-Due to the iterative nature of this process, errors will start out very small, and then slowly increase exponentially over the course of iterations. Hence they are extremely difficult to detect in a single evaluation, e.g., with a simpler supervised training setup. Rather, it is crucial to provide feedback to the NN at training time how the errors evolve over course of the iterations. Additionally, a pre-computation of the states is not possible for such iterative cases, as the iterations depend on the state of the NN. Naturally, the NN state is unknown before training time and changes while being trained. Hence, a DP-based training is crucial in these recurrent settings to provide the NN with gradients about how its current state influences the solver iterations, and correspondingly, how the weights should be changed to better achieve the learning objectives.
+Due to the iterative nature of this process, errors will start out very small, and then (for modes whose eigenvalues in the Jacobian have a magnitude larger than one) grow exponentially over the course of iterations. Hence they are extremely difficult to detect in a single evaluation, e.g., with a simpler supervised training setup. Rather, it is crucial to provide feedback to the NN at training time about how the errors evolve over the course of the iterations. Additionally, a pre-computation of the states is not possible for such iterative cases, as the iterations depend on the state of the NN. Naturally, the NN state is unknown before training time and changes while being trained. This is the classic ML problem of **data shift**. Hence, a DP-based training is crucial in these recurrent settings to provide the NN with gradients about how its current state influences the solver iterations, and correspondingly, how the weights should be changed to better achieve the learning objectives.

 DP setups with many time steps can be difficult to train: the gradients need to backpropagate through the full chain of PDE solver evaluations and NN evaluations. Typically, each of them represents a non-linear and complex function. Hence for larger numbers of steps, the vanishing and exploding gradient problem can make training difficult. Some practical considerations for alleviating this will follow in {doc}`diffphys-code-sol`.

@@ -169,6 +192,9 @@ for training setups that tend to overfit. However, if possible, it is preferable
 actual solver in the training loop via a DP approach to give the network feedback about the time 
 evolution of the system.

+With the current state of affairs, generative modeling approaches (denoising diffusion or flow matching) 
+provide a better-founded approach for incorporating noise. We'll look into this topic in more detail in {doc}`probmodels-uncond`.
+
 ---

 ## Complex examples
--- a/diffphys.md
+++ b/diffphys.md
@@ -6,14 +6,17 @@ methods and physical simulations we will target incorporating _differentiable
 numerical simulations_ into the learning process. In the following, we'll shorten
 these "differentiable numerical simulations of physical systems" to just "differentiable physics" (DP).

-The central goal of these methods is to use existing numerical solvers, and equip
+The central goal of these methods is to use existing numerical solvers
+to empower and improve AI systems.
+This requires equipping
 them with functionality to compute gradients with respect to their inputs.
 Once this is realized for all operators of a simulation, we can leverage 
 the autodiff functionality of DL frameworks with backpropagation to let gradient 
 information flow from a simulator into an NN and vice versa. This has numerous 
 advantages such as improved learning feedback and generalization, as we'll outline below.

-In contrast to physics-informed loss functions, it also enables handling more complex
+In contrast to the physics-informed loss functions of the previous chapter, 
+it also enables handling more complex
 solution manifolds instead of single inverse problems. 
 E.g., instead of using deep learning
 to solve single inverse problems as in the previous chapter, 
@@ -31,7 +34,7 @@ provide directions in the form of gradients to steer the learning process.

 ## Differentiable operators

-With the DP direction we build on existing numerical solvers. I.e., 
+With DP we build on _existing_ numerical solvers. I.e., 
 the approach is strongly relying on the algorithms developed in the larger field 
 of computational methods for a vast range of physical effects in our world.
 To start with, we need a continuous formulation as model for the physical effect that we'd like 
@@ -128,6 +131,7 @@ one by one.
 For the details of forward and reverse mode differentiation, please check out external materials such 
 as this [nice survey by Baydin et al.](https://arxiv.org/pdf/1502.05767.pdf).

+
 ## Learning via DP operators 

 Thus, once the operators of our simulator support computations of the Jacobian-vector 
@@ -209,7 +213,7 @@ Informally, we'd like to find a flow that deforms $d^{~0}$ through the PDE model
 The simplest way to express this goal is via an $L^2$ loss between the two states. So we want
 to minimize the loss function $L=|d(t^e) - d^{\text{target}}|^2$. 

-Note that as described here this inverse problem is a pure optimization task: there's no NN involved,
+Note that as described here, this inverse problem is a pure optimization task: there's no NN involved,
 and our goal is to obtain $\mathbf{u}$. We do not want to apply this velocity to other, unseen _test data_,
 as would be custom in a real learning task.
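
Reviewer sketch (not part of the commit): the inverse problem described in the last hunk can be illustrated with a small, self-contained example. It is only a sketch under strong simplifying assumptions that do not come from the book: a 1D periodic grid, a single scalar velocity `u` instead of a velocity field $\mathbf{u}$, a first-order upwind advection step, and a hand-written backward pass in place of the autodiff machinery of a DL framework. The names `step` and `loss_and_grad` are hypothetical and do not correspond to the phiflow API. The backward loop applies the transposed Jacobian of each solver step, which is the quantity a differentiable operator has to provide.

```python
import numpy as np

def step(d, u, dt, dx):
    """One explicit upwind advection step on a periodic grid (needs 0 <= u*dt/dx <= 1)."""
    return d - u * dt / dx * (d - np.roll(d, 1))

def loss_and_grad(u, d0, d_target, n_steps, dt, dx):
    """Unroll n_steps of the solver, then backpropagate the L2 loss to u by hand."""
    states = [d0]
    for _ in range(n_steps):                 # forward pass, keeping all states
        states.append(step(states[-1], u, dt, dx))
    diff = states[-1] - d_target
    loss = np.sum(diff ** 2)                 # L = |d(t^e) - d_target|^2

    c = u * dt / dx
    lam = 2.0 * diff                         # adjoint state: dL/d(final state)
    grad_u = 0.0
    for d in reversed(states[:-1]):          # backward pass, one solver step at a time
        # direct dependence of this step's output on u
        grad_u += np.dot(lam, -(dt / dx) * (d - np.roll(d, 1)))
        # transposed step Jacobian applied to the adjoint state
        lam = (1.0 - c) * lam + c * np.roll(lam, -1)
    return loss, grad_u

# Hypothetical setup: a Gaussian density bump on the unit interval.
n, n_steps = 64, 20
dx = 1.0 / n
dt = 0.5 * dx
x = np.arange(n) * dx
d0 = np.exp(-0.5 * ((x - 0.5) / 0.1) ** 2)

# Target state, produced with a reference velocity that the optimization should recover.
u_true = 1.0
d_target = d0
for _ in range(n_steps):
    d_target = step(d_target, u_true, dt, dx)

# Plain gradient descent on u: a pure optimization task, no NN involved.
u, lr = 0.5, 0.02
for _ in range(200):
    _, grad_u = loss_and_grad(u, d0, d_target, n_steps, dt, dx)
    u -= lr * grad_u

final_loss, _ = loss_and_grad(u, d0, d_target, n_steps, dt, dx)
print(f"recovered u = {u:.4f}, final loss = {final_loss:.3e}")
```

In a DP framework the backward loop would not be written by hand; it is spelled out here only to show which per-step derivative information the solver must expose. The learning rate and step counts are illustrative choices for this toy setup, not recommendations.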