Compare commits

..

68 Commits

Author SHA1 Message Date
N_T 45d2b6529e more phiflow 3.4 updates for HEAT SIP 2025-08-12 10:58:56 +02:00
N_T 1396482270 more phiflow 3.4 updates; warning SoL code not yet working 2025-08-12 09:26:34 +02:00
N_T eda7ba974e phiflow 3.4 updates 2025-08-12 09:23:18 +02:00
N_T a3de575c19 fixed several typos 2025-08-06 15:08:15 +02:00
N_T be1dba99e4 fixed typos 2025-06-13 16:19:45 +02:00
N_T cc2a7ef4ce clarified JVPs 2025-06-03 15:35:29 +02:00
N_T 68bd753ceb clarified FNO scaling 2025-04-27 16:07:38 +02:00
N_T 4919e7a429 Fixed 2nd device bug in teaser example 2025-03-31 16:25:31 +02:00
N_T cf13364482 Merge branch 'main' of github.com:tum-pbs/pbdl-book 2025-03-31 16:24:59 +02:00
N_T 8eb2c3c7f7 Fixed device bug in teaser example 2025-03-31 16:23:15 +02:00
N_T 50044397a4 fixed pm graph equations 2025-03-24 20:51:21 +01:00
N_T d95c94ac58 fixed typo in README title 2025-03-22 21:59:31 +01:00
N_T 3503fc77bf Updated links 2025-03-21 12:28:49 +01:00
N_T 971f397e79 Updated readme 2025-03-21 12:26:57 +01:00
N_T f5e25a9d78 missing file 2025-03-20 20:19:42 +01:00
N_T b7667370d2 updated intro teaser 2025-03-20 15:55:14 +01:00
N_T 39fcd963ab added genAI dividers 2025-03-20 13:56:40 +01:00
N_T ac3586cfc1 added transformer block figure 2025-03-19 21:28:54 +01:00
N_T 8afab892b1 smaller tweaks 2025-03-19 17:01:16 +01:00
N_T a70c03517c updated for graphs Mario 2025-03-19 17:01:00 +01:00
N_T 27b3940d06 fixes Mario (arch) and equation cleanup 2025-03-18 20:33:37 +01:00
N_T 4589cf2860 updates and teaks flow matching and continuity across the whole chapter 2025-03-18 16:20:10 +01:00
N_T 6391dbab10 updated NF 2025-03-17 16:54:53 +01:00
N_T fb5229a105 prob models introduction with code from Benjamin 2025-03-17 15:32:01 +01:00
N_T 3b73717017 updated references 2025-02-20 09:49:35 +08:00
N_T 16f0f351ac cleanup of code examples 2025-02-19 09:53:02 +08:00
N_T c59992f349 updated intro and outlook 2025-02-19 09:52:46 +08:00
N_T 3f8c7bc672 large update of differentiable physics chapter 2025-02-17 14:01:59 +08:00
N_T 3907a75d1a fixed jupyterbook error for indented pip install 2025-02-17 13:53:49 +08:00
N_T deaf4c5066 physgrad sin images as jpgs 2025-02-17 13:49:12 +08:00
N_T 7278a04cf1 fixing PDf output, removing citations in figure captions for now as these are causing problem in the tex output 2025-02-17 09:57:48 +08:00
N_T 16e2c13930 added prob models discussion, phi33 notebook fixes 2025-02-14 12:12:53 +08:00
N_T dacb0d1a2d notebook updates 2025-02-10 14:08:54 +08:00
N_T 4f1763f696 update physloss chapter 2025-02-10 11:35:22 +08:00
N_T 0981a281fe update supervised chapter 2025-02-06 14:05:50 +08:00
N_T 4dd1611430 update of overview chapter 2025-02-05 11:34:07 +08:00
N_T b87621c92e updated intro overview picture 2025-02-03 19:54:44 +08:00
N_T 8f8634119d updated intro discussion 2025-02-03 15:28:55 +08:00
N_T dbd5d53e31 updated intro and logo 2025-01-28 09:38:16 +08:00
N_T 38ca428a8a ellipse code updates, added run-in-colab-links 2025-01-27 15:06:59 +08:00
N_T 458934b3c8 first version of DGN example 2025-01-27 14:12:26 +08:00
N_T 60dd9aa3bc first version of ellipse notebook 2025-01-23 13:56:00 +08:00
N_T 8e4b659a4e first version of graph DMs 2025-01-21 12:27:15 +08:00
N_T 54d0dfc203 files and tweaks for SBI-sim notebook 2025-01-17 17:54:09 +08:00
N_T 4c0fdd8dc0 first version of SBI-sim notebook 2025-01-17 17:53:34 +08:00
N_T 3b53adb75d first version of SMDP integration 2025-01-15 11:26:59 +08:00
N_T d317201c66 added probmodel figures 2025-01-09 15:44:40 +08:00
N_T 084b0e6265 added fno arch figures 2025-01-07 13:53:55 +08:00
N_T 0e2736df52 first round of arch figures 2025-01-07 11:38:03 +08:00
N_T 4febc59084 first draft of arch section 2025-01-03 14:57:53 +08:00
N_T df71d662fa probmodels phys section 2024-12-27 20:04:29 +08:00
N_T 24f80a841f added diff-models graph section, intro probmodels-ddpm-fm 2024-12-27 10:04:12 +08:00
N_T 3e694b217c fixed typos Georg 2024-12-18 12:32:59 +08:00
N_T 5cb92b4943 diffusion time prediction updates 2024-12-13 13:06:49 +08:00
N_T c469ed4e14 updated diffusion time prediction and stability sections 2024-12-13 11:37:28 +08:00
N_T 47a51ba60c added uncond. stability chapter 2024-12-09 16:57:18 +08:00
N_T abb6b46d0f SoL typo 2024-12-09 16:39:09 +08:00
N_T 1049044612 updated SoL code to phiflow3.2 2024-12-09 16:37:50 +08:00
N_T 148118cbe3 improved pip install for probmodels-ddpm-fm 2024-12-09 14:24:48 +08:00
N_T fe1393fcd1 added references, minor typos, TOC, todo: move dppm to ddpm notebook 2024-12-09 10:31:53 +08:00
N_T dad3e8fc8d supervised airfoils with images 2024-11-29 18:09:05 +08:00
N_T 2f9c37141f supervised airfoils fixed typo 2024-11-29 18:02:55 +08:00
N_T 960887d527 updated supervised airfoils notebook 2024-11-29 18:01:49 +08:00
N_T 285bff8b95 fixed pip install 2024-11-05 14:33:18 +08:00
N_T 4595ba208d first version of DDPM to FM notebook 2024-11-05 14:14:16 +08:00
N_T dc9580b092 clarified tensor vs grid differences 2024-11-05 13:56:27 +08:00
N_T 2685e69f7d fixed typos 2024-10-25 14:00:47 +08:00
N_T 9bd9f531ea added HH learning code 2024-10-25 13:40:52 +08:00
104 changed files with 11934 additions and 1106 deletions


@@ -1,4 +1,4 @@
# Welcome to the Physics-based Deep Learning book (PBDL) v0.2
# Welcome to the Physics-based Deep Learning book (PBDL) v0.3
This is the source code repository for the Jupyter book "Physics-based Deep Learning". You can find the full, readable version online at:
[https://physicsbaseddeeplearning.org/](https://physicsbaseddeeplearning.org/)
@@ -9,19 +9,26 @@ A single-PDF version is also available on arXiv: https://arxiv.org/pdf/2109.0523
# A Short Synopsis
The PBDL book contains a practical and comprehensive introduction of everything related to deep learning in the context of physical simulations. As much as possible, all topics come with hands-on code examples in the form of Jupyter notebooks to quickly get started. Beyond standard supervised learning from data, we'll look at physical loss constraints, more tightly coupled learning algorithms with differentiable simulations, as well as reinforcement learning and uncertainty modeling. We live in exciting times: these methods have a huge potential to fundamentally change what we can achieve with simulations.
The PBDL book contains a hands-on, comprehensive guide to deep learning in the realm of physical simulations. Rather than just theory, we emphasize practical application: every concept is paired with interactive Jupyter notebooks to get you up and running quickly. Beyond traditional supervised learning, we dive into physical loss constraints, differentiable simulations, diffusion-based approaches for probabilistic generative AI, as well as reinforcement learning and advanced neural network architectures. These foundations are paving the way for the next generation of scientific foundation models. We are living in an era of rapid transformation. These methods have the potential to redefine what's possible in computational science.
The key aspects that we will address in the following are:
* explain how to use deep learning techniques to solve PDE problems,
* how to combine them with existing knowledge of physics,
* without discarding our knowledge about numerical methods.
* How to train neural networks to predict the fluid flow around airfoils with diffusion modeling. This gives a probabilistic surrogate model that replaces and outperforms traditional simulators.
* How to use model equations as residuals to train networks that represent solutions, and how to improve upon these residual constraints by using differentiable simulations.
* How to more tightly interact with a full simulator for inverse problems. E.g., we'll demonstrate how to circumvent the convergence problems of standard reinforcement learning techniques by leveraging simulators in the training loop.
* We'll also discuss the importance of choosing the right network architecture: whether to consider global or local interactions, continuous or discrete representations, and structured versus unstructured graph meshes.
The focus of this book lies on:
* Field-based simulations (not much on Lagrangian methods)
* Combinations with deep learning (plenty of other interesting ML techniques exist, but won't be discussed here)
* Experiments are left as an outlook (such as replacing synthetic data with real-world observations)
* how to use deep learning techniques to solve PDE problems,
* how to combine them with existing knowledge of physics,
* without discarding numerical methods.
At the same time, it's worth noting what we won't be covering:
* There's no in-depth introduction to deep learning and numerical simulations,
* nor is the aim a broad survey of research articles in this area.
The name of this book, _Physics-based Deep Learning_, denotes combinations of physical modeling and numerical simulations with methods based on artificial neural networks. The general direction of Physics-Based Deep Learning represents a very active, quickly growing and exciting field of research.
@@ -29,24 +36,27 @@ The aim is to build on all the powerful numerical techniques that we have at our
The resulting methods have a huge potential to improve what can be done with numerical methods: in scenarios where a solver targets cases from a certain well-defined problem domain repeatedly, it can for instance make a lot of sense to invest significant resources once to train a neural network that supports the repeated solves. Based on the domain-specific specialization of this network, such a hybrid could vastly outperform traditional, generic solvers.
![Divider](resources/divider-gen2.jpg)
# What's new?
* For readers familiar with v0.1 of this text, the [extended section on differentiable physics training](http://physicsbaseddeeplearning.org/diffphys-examples.html) and the
brand new chapter on [improved learning methods for physics problems](http://physicsbaseddeeplearning.org/diffphys-examples.html) are highly recommended starting points.
What's new in v0.3? This latest edition takes things even further with a major new chapter on generative modeling, covering cutting-edge techniques like denoising, flow-matching, autoregressive learning, physics-integrated constraints, and diffusion-based graph networks. We've also introduced a dedicated section on neural architectures specifically designed for physics simulations. All code examples have been updated to leverage the latest frameworks.
# Teasers
To mention a few highlights: the book contains a notebook to train hybrid fluid flow (Navier-Stokes) solvers via differentiable physics to reduce numerical errors. Try it out:
To mention a few highlights: the book contains a notebook to train hybrid fluid flow (Navier-Stokes) solvers via differentiable physics to reduce numerical errors. Try it out in Colab:
https://colab.research.google.com/github/tum-pbs/pbdl-book/blob/main/diffphys-code-sol.ipynb
In v0.2 there's a new notebook for an improved learning scheme which jointly computes update directions for neural networks and physics (via half-inverse gradients):
https://colab.research.google.com/github/tum-pbs/pbdl-book/blob/main/physgrad-hig-code.ipynb
PBDL also has example code to train diffusion denoising and flow matching networks for RANS flow predictions around airfoils that yield uncertainty estimates. You can run the code right away here:
https://colab.research.google.com/github/tum-pbs/pbdl-book/blob/main/probmodels-ddpm-fm.ipynb
It also has example code to train a Bayesian Neural Network for RANS flow predictions around airfoils that yield uncertainty estimates. You can run the code right away here:
https://colab.research.google.com/github/tum-pbs/pbdl-book/blob/main/bayesian-code.ipynb
There's a notebook for an improved learning scheme which jointly computes update directions for neural networks and physics (via half-inverse gradients):
https://colab.research.google.com/github/tum-pbs/pbdl-book/blob/main/physgrad-hig-code.ipynb
And a notebook to compare proximal policy-based reinforcement learning with physics-based learning for controlling PDEs (spoiler: the physics-aware version does better in the end). Give it a try:
https://colab.research.google.com/github/tum-pbs/pbdl-book/blob/main/reinflearn-code.ipynb
![Divider](resources/divider-gen4.jpg)


@@ -2,9 +2,9 @@
# Learn more at https://jupyterbook.org/customize/config.html
title: Physics-based Deep Learning
author: N. Thuerey, P. Holl, M. Mueller, P. Schnell, F. Trost, K. Um
author: N. Thuerey, B. Holzschuh, P. Holl, G. Kohl, M. Lino, Q. Liu, P. Schnell, F. Trost
logo: resources/logo.jpg
copyright: "2021,2022"
copyright: "2021 - 2025"
only_build_toc_files: true
launch_buttons:
@@ -34,3 +34,12 @@ html:
use_issues_button: true
use_repository_button: true
favicon: "favicon.ico"
# for $$ equations in text
parse:
myst_dmath_double_inline: true
sphinx:
extra_extensions:
- sphinx_proof


@@ -10,13 +10,16 @@ parts:
- file: overview-burgers-forw.ipynb
- file: overview-ns-forw.ipynb
- file: overview-optconv.md
- caption: Neural Surrogates and Operators
chapters:
- file: supervised.md
sections:
- file: supervised-airfoils.ipynb
- file: supervised-discuss.md
- file: supervised-arch.md
- file: supervised-airfoils.ipynb
- file: supervised-discuss.md
- caption: Physical Losses
chapters:
- file: physicalloss.md
- file: physicalloss-div.ipynb
- file: physicalloss-code.ipynb
- file: physicalloss-discuss.md
- caption: Differentiable Physics
@@ -25,12 +28,25 @@ parts:
- file: diffphys-code-burgers.ipynb
- file: diffphys-dpvspinn.md
- file: diffphys-code-ns.ipynb
- caption: Differentiable Physics with NNs
chapters:
- file: diffphys-examples.md
- file: diffphys-code-sol.ipynb
- file: diffphys-code-control.ipynb
- file: diffphys-discuss.md
- caption: Probabilistic Learning
chapters:
- file: probmodels-intro.md
- file: probmodels-normflow.ipynb
- file: probmodels-score.ipynb
- file: probmodels-diffusion.ipynb
- file: probmodels-flowmatching.ipynb
- file: probmodels-ddpm-fm.ipynb
- file: probmodels-phys.md
- file: probmodels-sbisim.ipynb
- file: probmodels-time.ipynb
- file: probmodels-uncond.md
- file: probmodels-graph.md
- file: probmodels-graph-ellipse.ipynb
- file: probmodels-discuss.md
- caption: Reinforcement Learning
chapters:
- file: reinflearn-intro.md
@@ -44,16 +60,12 @@ parts:
- file: physgrad-hig.md
- file: physgrad-hig-code.ipynb
- file: physgrad-discuss.md
- caption: PBDL and Uncertainty
chapters:
- file: bayesian-intro.md
- file: bayesian-code.ipynb
- caption: Fast Forward Topics
chapters:
- file: others-intro.md
- file: others-timeseries.md
- file: others-GANs.md
- file: others-lagrangian.md
- file: others-GANs.md
- caption: End Matter
chapters:
- file: outlook.md


@@ -32,7 +32,7 @@
},
"outputs": [],
"source": [
"!pip install --upgrade --quiet phiflow==3.1\n",
"!pip install --upgrade --quiet phiflow==3.4\n",
"from phi.tf.flow import *\n",
"\n",
"N = 128\n",
@@ -325,7 +325,7 @@
"Optimization step 35, loss: 0.008185\n",
"Optimization step 40, loss: 0.005186\n",
"Optimization step 45, loss: 0.003263\n",
"Runtime 130.33s\n"
"Runtime 132.33s\n"
]
}
],
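The hunk above pins the notebook to phiflow 3.4. A small hedged helper (not part of the book's code) for checking an installed version string against such a major.minor pin before running a notebook; the `phi.__version__` attribute in the comment is an assumption about the package:

```python
def version_ok(installed: str, required: str = "3.4") -> bool:
    # True if an installed version string matches the required
    # major.minor pin, e.g. "3.4.1" matches "3.4" but "3.1" does not.
    return installed.split(".")[: len(required.split("."))] == required.split(".")

# Hypothetical usage inside a notebook (assumes phi exposes __version__):
# import phi; assert version_ok(phi.__version__), "run: pip install phiflow==3.4"
```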

File diff suppressed because one or more lines are too long (3 files)


@@ -1,4 +1,4 @@
Discussion
Discussion of Differentiable Physics
=======================
The previous sections have explained the _differentiable physics_ approach for deep learning, and have given a range of examples: from a very basic gradient calculation, all the way to complex learning setups powered by advanced simulations. This is a good time to take a step back and evaluate: in the end, the differentiable physics components of these approaches are not too complicated. They are largely based on existing numerical methods, with a focus on efficiently using those methods not only to do a forward simulation, but also to compute gradient information.
@@ -11,13 +11,19 @@ What is primarily exciting in this context are the implications that arise from
Most importantly, training via differentiable physics allows us to seamlessly bring the two fields together:
we can obtain _hybrid_ methods, that use the best numerical methods that we have at our disposal for the simulation itself, as well as for the training process. We can then use the trained model to improve forward or backward solves. Thus, in the end, we have a solver that combines a _traditional_ solver and a _learned_ component that in combination can improve the capabilities of numerical methods.
## Interaction
## Reducing data shift via interaction
One key aspect that is important for these hybrids to work well is to let the NN _interact_ with the PDE solver at training time. Differentiable simulations allow a trained model to "explore and experience" the physical environment, and receive directed feedback regarding its interactions throughout the solver iterations. This combination nicely fits into the broader context of machine learning as _differentiable programming_.
One key aspect that is important for these hybrids to work well is to let the NN _interact_ with the PDE solver at training time. Differentiable simulations allow a trained model to "explore and experience" the physical environment, and receive directed feedback regarding its interactions throughout the solver iterations.
This addresses the classic **data shift** problem of machine learning: rather than relying on an _a-priori_ specified distribution for training the network, the training process generates new trajectories via unrolling on the fly, and computes training signals from them. This can be seen as an _a-posteriori_ approach, and makes the trained NN significantly more resilient to unseen inputs. As we'll evaluate in more detail in {doc}`probmodels-uncond`, it's actually hard to beat a good unrolling setup with other approaches.
Note that the topic of _differentiable physics_ nicely fits into the broader context of machine learning as _differentiable programming_.
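As a toy illustration of training via unrolling, consider the following sketch. Everything here is hypothetical and chosen for brevity: a scalar decay "solver", a single-parameter linear stand-in for the NN, and finite-difference gradients standing in for backpropagation through the solver chain:

```python
import numpy as np

def solver_step(x, dt=0.1):
    # Coarse, imperfect PDE step: discretized exponential decay.
    return x * (1.0 - 0.9 * dt)

def true_step(x, dt=0.1):
    # Reference solution the learned correction should recover.
    return x * np.exp(-dt)

def rollout_loss(theta, x0=1.0, steps=20):
    # Unroll solver + correction and accumulate the loss along the
    # trajectory -- training data is generated on the fly, not a-priori.
    x = x_ref = x0
    loss = 0.0
    for _ in range(steps):
        x = solver_step(x) + theta * x   # NN stand-in: linear correction
        x_ref = true_step(x_ref)
        loss += (x - x_ref) ** 2
    return loss

# "Training": gradient descent on the unrolled loss, with finite
# differences replacing autodiff for this self-contained example.
theta, eps, lr = 0.0, 1e-6, 1e-3
for _ in range(200):
    grad = (rollout_loss(theta + eps) - rollout_loss(theta - eps)) / (2 * eps)
    theta -= lr * grad

print(rollout_loss(theta) < rollout_loss(0.0))  # trained correction beats the raw solver
```

In a real setup the correction would be a network and the gradients would flow through a differentiable solver such as phiflow; the structure of the loop is the same.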
## Generalization
The hybrid approach also bears particular promise for simulators: it improves generalizing capabilities of the trained models by letting the PDE-solver handle large-scale _changes to the data distribution_ such that the learned model can focus on localized structures not captured by the discretization. While physical models generalize very well, learned models often specialize in data distributions seen at training time. This was, e.g., shown for the models reducing numerical errors of the previous chapter: the trained models can deal with solution manifolds with significant amounts of varying physical behavior, while simpler training variants quickly deteriorate over the course of recurrent time steps.
The hybrid approach also bears particular promise for simulators: it improves generalizing capabilities of the trained models by letting the PDE-solver handle large-scale _changes to the data distribution_. This allows the learned model to focus on localized structures not captured by the discretization. While physical models generalize very well, learned models often specialize in data distributions seen at training time. Hence, this aspect benefits from the previous reduction of data shift, and effectively allows for even larger differences in terms of input distribution. If the NN is set up correctly, these can be handled by the classical solver in a hybrid approach.
These benefits were, e.g., shown for the models reducing numerical errors of {doc}`diffphys-code-sol`: the trained models can deal with solution manifolds with significant amounts of varying physical behavior, while simpler training variants would deteriorate over the course of recurrent time steps.
![Divider](resources/divider5.jpg)
@@ -28,16 +34,17 @@ To summarize, the pros and cons of training NNs via DP:
- Uses physical model and numerical methods for discretization.
- Efficiency and accuracy of selected methods carries over to training.
- Very tight coupling of physical models and NNs possible.
- Improved generalization via solver interactions.
- Improved resilience and generalization.
❌ Con:
- Not compatible with all simulators (need to provide gradients).
- Requires more heavy machinery (in terms of framework support) than previously discussed methods.
_Outlook_: the last negative point (regarding heavy machinery) is bound to strongly improve given the current pace of software and API developments in the DL area. However, for now it's important to keep in mind that not every simulator is suitable for DP training out of the box. Hence, in this book we'll focus on examples using phiflow, which was designed for interfacing with deep learning frameworks.
_Outlook_: the last negative point (regarding heavy machinery) is strongly improving at the moment. Many existing simulators, e.g. the popular open source framework _OpenFOAM_, as well as many commercial simulators are working on tight integrations with NNs. However, there's still plenty of room for improvement, and in this book we're focusing on examples using phiflow, which was designed for interfacing with deep learning frameworks from the ground up.
The training via differentiable physics (DP) allows us to integrate full numerical simulations into the training of deep neural networks.
It is also a very generic approach that is applicable to a wide range of combinations of PDE-based models and deep learning.
The training via differentiable physics (DP) allows us to integrate full numerical simulations into the training of deep neural networks.
This effectively provides **hard constraints**, as the coupled solver can project and enforce constraints just like classical solvers would.
It is a very generic approach that is applicable to a wide range of combinations of PDE-based models and deep learning.
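A minimal sketch of what such a hard constraint looks like in practice; the additive "mass" projection here is a hypothetical example for illustration, not the book's solver:

```python
import numpy as np

def project_mass(x, total):
    # Shift the state so its sum (the discrete "mass") is exactly `total`,
    # re-enforcing the conservation constraint after an NN correction,
    # just as a classical solver's projection step would.
    return x + (total - x.sum()) / x.size

state = np.array([0.2, 0.5, 0.1, 0.4])        # discrete mass 1.2
state = state + 0.01 * np.random.randn(4)     # NN update may violate it
state = project_mass(state, total=1.2)        # hard constraint restored
```

More realistic instances include divergence-free projections via a pressure solve; the principle of projecting back onto the constraint manifold after each learned update is the same.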
In the next chapters, we will first compare DP training to model-free alternatives for control problems, and afterwards target the underlying learning process to obtain even better NN states.
In the next chapters, we will first expand the scope of the learning tasks to incorporate uncertainties, i.e. to work with full distributions rather than single deterministic states and trajectories. Afterwards, we'll also compare DP training to reinforcement learning, and target the underlying learning process to obtain even better NN states.


@@ -16,13 +16,13 @@ The DP version on the other hand inherently relies on a numerical solver that is
The reliance on a suitable discretization requires some understanding and knowledge of the problem under consideration. A sub-optimal discretization can impede the learning process or, worst case, lead to diverging training runs. However, given the large body of theory and practical realizations of stable solvers for a wide variety of physical problems, this is typically not an insurmountable obstacle.
The PINN approaches on the other hand do not require an a-priori choice of a discretization, and as such seems to be "discretization-less". This, however, is only an advantage on first sight. As they yield solutions in a computer, they naturally _have_ to discretize the problem. They construct this discretization over the course of the training process, in a way that lies at the mercy of the underlying nonlinear optimization, and is not easily controllable from the outside. Thus, the resulting accuracy is determined by how well the training manages to estimate the complexity of the problem for realistic use cases, and how well the training data approximates the unknown regions of the solution.
The PINN approaches on the other hand do not require an a-priori choice of a discretization, and as such seem to be "discretization-less". This, however, is only an advantage at first sight. By now, researchers are trying to "re-integrate" discretizations into PINN training. Generally, PINNs inevitably yield solutions in a computer and thus _have_ to discretize the problem. They construct this discretization over the course of the training process, in a way that lies at the mercy of the underlying nonlinear optimization, and is not easily controllable from the outside. Thus, the resulting accuracy is determined by how well the training manages to estimate the complexity of the problem for realistic use cases, and how well the training data approximates the unknown regions of the solution.
E.g., as demonstrated with the Burgers example, the PINN solutions typically have significant difficulties propagating information _backward_ in time. This is closely coupled to the efficiency of the method.
## Efficiency
The PINN approaches typically perform a localized sampling and correction of the solutions, which means the corrections in the form of weight updates are likewise typically local. The fulfillment of boundary conditions in space and time can be correspondingly slow, leading to long training runs in practice.
The PINN approach also results in fundamentally more difficult training tasks that cause convergence problems. PINNs typically perform a localized sampling and correction of the solutions, which means the corrections in the form of weight updates are likewise typically local. The fulfillment of boundary conditions in space and time can be correspondingly slow, leading to long training runs in practice.
A well-chosen discretization of a DP approach can remedy this behavior, and provide an improved flow of gradient information. At the same time, the reliance on a computational grid means that solutions can be obtained very quickly. Given an interpolation scheme or a set of basis functions, the solution can be sampled at any point in space or time given a very local neighborhood of the computational grid. Worst case, this can lead to slight memory overheads, e.g., by repeatedly storing mostly constant values of a solution.
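For instance, with a 1D grid and linear basis functions, sampling the solution at an arbitrary point only touches the two neighboring grid values. A generic sketch, not tied to any particular solver:

```python
import numpy as np

def sample(u, x):
    # Sample a grid solution u, stored on equidistant nodes in [0, 1],
    # at an arbitrary location x via linear interpolation; only the two
    # neighboring grid values contribute.
    nodes = np.linspace(0.0, 1.0, len(u))
    return np.interp(x, nodes, u)

u = np.linspace(0.0, 1.0, 11) ** 2   # u(x) = x^2 sampled on 11 grid nodes
print(sample(u, 0.5))                # exact at a grid node: 0.25
```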
@@ -54,4 +54,4 @@ The following table summarizes these pros and cons of physics-informed (PI) and
As a summary, both methods are definitely interesting, and have a lot of potential. There are numerous more complicated extensions and algorithmic modifications that change and improve on the various negative aspects we have discussed for both sides.
However, as of this writing, the physics-informed (PI) approach has clear limitations when it comes to performance and compatibility with existing numerical methods. Thus, when knowledge of the problem at hand is available, which typically is the case when we choose a suitable PDE model to constrain the learning process, employing a differentiable physics solver can significantly improve the training process as well as the quality of the obtained solution. So, in the following we'll focus on DP variants, and illustrate their capabilities with more complex scenarios in the next chapters. First, we'll consider a case that very efficiently computes space-time gradients for a transient fluid simulations.
However, as of this writing, the PINN approach has clear limitations when it comes to performance and compatibility with existing numerical methods. Thus, when knowledge of the problem at hand is available, which typically is the case when we choose a suitable PDE model to constrain the learning process, employing a differentiable physics solver to train neural operators can significantly improve the training process as well as the quality of the obtained solution. So, in the following we'll focus on DP variants, and illustrate their capabilities with more complex scenarios in the next chapters. First, we'll consider a case that very efficiently computes space-time gradients for transient fluid simulations.


@@ -7,7 +7,26 @@ When using DP approaches for learning applications,
there is a lot of flexibility w.r.t. the combination of DP and NN building blocks.
As some of the differences are subtle, the following section will go into more detail.
We'll especially focus on solvers that repeat the PDE and NN evaluations multiple times,
e.g., to compute multiple states of the physical system over time.
e.g., to compute multiple states of the physical system over time. In classical numerics,
this would be called an iterative time stepping method, while in the context of AI, it's
an _autoregressive_ method.
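Schematically, with a trivial stand-in for the combined solver+network update, an autoregressive rollout just feeds each output back in as the next input:

```python
def step(x):
    # Stand-in for one combined solver + network update.
    return 0.95 * x

def rollout(x0, n_steps):
    # Autoregressive evaluation: the model's output becomes its next
    # input, exactly like iterative time stepping in classical numerics.
    states = [x0]
    for _ in range(n_steps):
        states.append(step(states[-1]))
    return states

trajectory = rollout(1.0, 3)   # four states: the initial one plus 3 steps
```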
```{admonition} Hint: Correction vs Prediction
:class: tip
The problems that are best tackled with DP approaches are very fundamental. The combination of
an imperfect physical model and an _improvement term_ classically goes under many different names:
_closure problems_ in fluid dynamics and turbulence, _homogenization_ or _coarse-graining_
in material science, while it's called _parametrization_ in climate and weather.
In the following, we'll generically denote all these tasks containing NN+solver as **correction** task, in contrast
to pure **prediction** tasks for cases where no solver is involved at inference time.
```
To re-cap, here's the previous figure about combining NNs and DP operators.
In the figure these operators look like a loss term: they typically don't have weights,
@@ -22,7 +41,11 @@ The DP approach as described in the previous chapters. A network produces an inp
```
This setup can be seen as the network receiving information about how its output influences the outcome of the PDE solver. I.e., the gradient will provide information on how to produce an NN output that minimizes the loss.
Similar to the previously described _physical losses_ (from {doc}`physicalloss`), this can mean upholding a conservation law.
Similar to the previously described {doc}`physicalloss`, this can, e.g., mean upholding a conservation law or generally a PDE-based constraint over time.
## Switching the order
@@ -36,15 +59,15 @@ name: diffphys-switch
A PDE solver produces an output which is processed by an NN.
```
In this case the PDE solver essentially represents an _on-the-fly_ data generator. That's not necessarily always useful: this setup could be replaced by a pre-computation of the same inputs, as the PDE solver is not influenced by the NN. Hence, there's no backpropagation through $\mathcal P$, and it could be replaced by a simple "loading" function. On the other hand, evaluating the PDE solver at training time with a randomized sampling of input parameters can lead to an excellent sampling of the data distribution of the input. If we have realistic ranges for how the inputs vary, this can improve the NN training. If implemented correctly, the solver can also alleviate the need to store and load large amounts of data, and instead produce them more quickly at training time, e.g., directly on a GPU.
In this case the PDE solver essentially represents an _on-the-fly_ data generator. That's not necessarily always useful: this setup could be replaced by a pre-computation of the same inputs, as the PDE solver is not influenced by the NN. Hence, there's no backpropagation through $\mathcal P$, and it could be replaced by a simple "loading" function. On the other hand, evaluating the PDE solver at training time with a randomized sampling of input parameters can lead to an excellent sampling of the data distribution of the input. If we have realistic ranges for how the inputs vary, this can improve the NN training. If implemented correctly, the solver can also alleviate the need to store and load large amounts of data, and instead produce them more quickly at training time, e.g., directly on a GPU. Recent methods explore this direction in the context of _Active Learning_.
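A sketch of this "solver as data generator" pattern; the solver, its parameter range, and the batch layout are all made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def pde_solve(nu, n=64):
    # Hypothetical cheap solver: a decaying sine profile with "viscosity" nu.
    x = np.linspace(0.0, 1.0, n)
    return np.exp(-nu) * np.sin(np.pi * x)

def training_batches(n_batches, batch_size=4):
    # On-the-fly generation: sample solver inputs from a realistic range
    # instead of loading a fixed, pre-computed dataset.
    for _ in range(n_batches):
        nus = rng.uniform(0.01, 1.0, size=batch_size)
        yield nus, np.stack([pde_solve(nu) for nu in nus])

for params, solutions in training_batches(2):
    # here, (params, solutions) would be fed to the network's optimizer
    assert solutions.shape == (4, 64)
```

With a fast solver this replaces disk I/O entirely, and each epoch sees freshly sampled data.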
However, this version does not leverage the gradient information from a differentiable solver, which is why the following variant is much more interesting.
However, this version does not leverage the gradient information from a differentiable solver, which is why the following variant is more interesting.
## Recurrent evaluation
A combination that makes particular sense is to **unroll** the iterations of a time stepping process of a simulator, and let the state of a system be influenced by an NN. (In general, no combination of NN layers and DP operators is _forbidden_, as long as their dimensions are compatible.)
In the case of unrolling, we compute a (potentially very long) sequence of PDE solver steps in the forward pass. In-between these solver steps, an NN modifies the state of our system, which is then used to compute the next PDE solver step. During the backpropagation pass, we move backwards through all of these steps to evaluate contributions to the loss function (which can be evaluated in one or more places anywhere in the execution chain), and to backprop the gradient information through the DP and NN operators. This unrolling of solver iterations essentially gives feedback to the NN about how its "actions" influence the state of the physical system and the resulting loss. Here's a visual overview of this form of combination:
```{figure} resources/diffphys-multistep.jpg
---
name: diffphys-mulitstep
---
Time stepping with interleaved DP and NN operations for $k$ solver iterations. The dashed gray arrows indicate optional intermediate evaluations of loss terms (similar to the solid gray arrow for the last step $k$), and intermediate outputs of the NN are indicated with a tilde.
```
Due to the iterative nature of this process, errors start out very small, and then (for modes with eigenvalues larger than one in the Jacobian) grow exponentially over the course of the iterations. Hence they are extremely difficult to detect in a single evaluation, e.g., with a simpler supervised training setup. Rather, it is crucial to provide feedback to the NN at training time about how the errors evolve over the course of the iterations. Additionally, a pre-computation of the states is not possible for such iterative cases, as the iterations depend on the state of the NN. Naturally, the NN state is unknown before training time and changes while being trained. This is the classic ML problem of **data shift**. Hence, a DP-based training is crucial in these recurrent settings to provide the NN with gradients about how its current state influences the solver iterations, and correspondingly, how the weights should be changed to better achieve the learning objectives.
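This exponential error growth can be illustrated with a toy iterated map; a minimal numpy sketch, where the matrix `A` is a made-up stand-in for the Jacobian of one solver step with one eigenvalue slightly above one:

```python
import numpy as np

# Toy iterated map standing in for an unrolled simulator: A is a made-up
# Jacobian with eigenvalues 1.05 and 0.9.
A = np.array([[1.05, 0.1],
              [0.0,  0.9]])

u_ref = np.array([1.0, 1.0])           # reference trajectory
u_per = u_ref + np.array([1e-6, 0.0])  # perturbed trajectory, tiny initial error

errors = []
for _ in range(100):
    u_ref = A @ u_ref
    u_per = A @ u_per
    errors.append(np.linalg.norm(u_per - u_ref))

# The per-step growth of the error is tiny, but over 100 iterations it has
# grown by roughly 1.05**99, i.e., about two orders of magnitude.
```

A single step looks almost exact here, which is exactly why such errors are invisible to one-step supervised training and only show up over the unrolled chain.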
DP setups with many time steps can be difficult to train: the gradients need to backpropagate through the full chain of PDE solver evaluations and NN evaluations. Typically, each of them represents a non-linear and complex function. Hence for larger numbers of steps, the vanishing and exploding gradient problem can make training difficult. Some practical considerations for alleviating this will follow in {doc}`diffphys-code-sol`.
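To make the unrolled setup concrete, here is a minimal, hypothetical PyTorch sketch: `solver_step` stands in for one differentiable PDE solver step (a simple explicit diffusion update on a periodic 1D grid), and the 8-step unrollment and the toy objective of driving the state to zero are made up purely for illustration:

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-in for one differentiable PDE solver step: an explicit
# diffusion update on a periodic 1D grid. Any differentiable solver works here.
def solver_step(u, nu=0.1):
    return u + nu * (torch.roll(u, 1) - 2 * u + torch.roll(u, -1))

# Small NN that proposes a per-cell correction of the state.
net = torch.nn.Sequential(
    torch.nn.Linear(32, 64), torch.nn.Tanh(), torch.nn.Linear(64, 32))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

u0 = torch.rand(32)
target = torch.zeros(32)          # toy objective: drive the final state to zero
losses = []
for it in range(100):
    u = u0
    for k in range(8):            # unrolled chain: NN -> solver -> NN -> ...
        u = u + 0.1 * net(u)      # the NN modifies the state ...
        u = solver_step(u)        # ... then the solver advances it
    loss = ((u - target) ** 2).mean()
    opt.zero_grad()
    loss.backward()               # backprop through all 8 solver and NN steps
    opt.step()
    losses.append(float(loss))
```

The single `backward()` call differentiates through all interleaved solver and network evaluations, which is precisely what gives the NN feedback about the long-term effect of its corrections.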
for training setups that tend to overfit. However, if possible, it is preferable
actual solver in the training loop via a DP approach to give the network feedback about the time
evolution of the system.
With the current state of affairs, generative modeling approaches (denoising diffusion or flow matching)
provide a better-founded approach for incorporating noise. We'll look into this topic in more detail in {doc}`probmodels-uncond`.
---
## Complex examples


methods and physical simulations we will target incorporating _differentiable
numerical simulations_ into the learning process. In the following, we'll shorten
these "differentiable numerical simulations of physical systems" to just "differentiable physics" (DP).
The central goal of these methods is to use existing numerical solvers
to empower and improve AI systems.
This requires equipping
them with functionality to compute gradients with respect to their inputs.
Once this is realized for all operators of a simulation, we can leverage
the autodiff functionality of DL frameworks with backpropagation to let gradient
information flow from a simulator into an NN and vice versa. This has numerous
advantages such as improved learning feedback and generalization, as we'll outline below.
In contrast to the physics-informed loss functions of the previous chapter,
it also enables handling more complex
solution manifolds instead of single inverse problems.
E.g., instead of using deep learning
to solve single inverse problems as in the previous chapter,
provide directions in the form of gradients to steer the learning process.
## Differentiable operators
With DP we build on _existing_ numerical solvers. I.e.,
the approach is strongly relying on the algorithms developed in the larger field
of computational methods for a vast range of physical effects in our world.
To start with, we need a continuous formulation as model for the physical effect that we'd like
to simulate -- if this is missing we're in trouble. But luckily, we can
tap into existing collections of model equations and established methods
for discretizing continuous models.
Let's assume we have a continuous formulation $\mathcal P^*(\mathbf{u}, \nu)$ of the physical quantity of
interest $\mathbf{u}(\mathbf{x}, t): \mathbb R^d \times \mathbb R^+ \rightarrow \mathbb R^d$,
with model parameters $\nu$ (e.g., diffusion, viscosity, or conductivity constants).
The components of $\mathbf{u}$ will be denoted by a numbered subscript, i.e.,
$\mathbf{u} = (u_1,u_2,\dots,u_d)^T$.
%and a corresponding discrete version that describes the evolution of this quantity over time: $\mathbf{u}_t = \mathcal P(\mathbf{x}, \mathbf{u}, t)$.
Typically, we are interested in the temporal evolution of such a system.
Discretization yields a formulation $\mathcal P(\mathbf{u}, \nu)$
that we re-arrange to compute a future state after a time step $\Delta t$.
The state at $t+\Delta t$ is computed via a sequence of
operations $\mathcal P_1, \mathcal P_2 \dots \mathcal P_m$ such that
$\partial \mathcal P_i / \partial \mathbf{u}$.
```
Note that we typically don't need derivatives
for all parameters of $\mathcal P(\mathbf{u}, \nu)$, e.g.,
we omit $\nu$ in the following, assuming that this is a
given model parameter with which the NN should not interact.
Naturally, it can vary within the solution manifold that we're interested in,
E.g., for two of them
$$
\frac{ \partial (\mathcal P_1 \circ \mathcal P_2) }{ \partial \mathbf{u} } \Big|_{\mathbf{u}^n}
=
\frac{ \partial \mathcal P_1 }{ \partial \mathcal P_2 } \big|_{\mathcal P_2(\mathbf{u}^n)}
\
\frac{ \partial \mathcal P_2 }{ \partial \mathbf{u} } \big|_{\mathbf{u}^n} \ ,
$$
one by one.
For the details of forward and reverse mode differentiation, please check out external materials such
as this [nice survey by Baydin et al.](https://arxiv.org/pdf/1502.05767.pdf).
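The chain rule for two composed operators can also be checked numerically with the autodiff machinery of a DL framework; a small PyTorch sketch, where `P1` and `P2` are arbitrary differentiable stand-ins for solver operators:

```python
import torch

# Two stand-in solver operators P1 and P2 (arbitrary differentiable maps).
def P2(u):
    return torch.sin(u) + 0.5 * u

def P1(u):
    return u ** 2 - u

u_n = torch.tensor([0.3, -1.2, 2.0])

# Jacobian of the composition P1(P2(u)), evaluated at u^n ...
J_comp = torch.autograd.functional.jacobian(lambda v: P1(P2(v)), u_n)

# ... equals the product of the individual Jacobians, with P1's Jacobian
# evaluated at P2(u^n), exactly as in the chain rule above.
J1 = torch.autograd.functional.jacobian(P1, P2(u_n))
J2 = torch.autograd.functional.jacobian(P2, u_n)
assert torch.allclose(J_comp, J1 @ J2)
```

In practice, reverse-mode backpropagation never forms these Jacobians explicitly; it only evaluates the corresponding vector-Jacobian products, which is what makes long chains of operators tractable.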
## Learning via DP operators
Thus, once the operators of our simulator support computations of the Jacobian-vector
Informally, we'd like to find a flow that deforms $d^{~0}$ through the PDE model
The simplest way to express this goal is via an $L^2$ loss between the two states. So we want
to minimize the loss function $L=|d(t^e) - d^{\text{target}}|^2$.
Note that as described here, this inverse problem is a pure optimization task: there's no NN involved,
and our goal is to obtain $\mathbf{u}$. We do not want to apply this velocity to other, unseen _test data_,
as would be custom in a real learning task.
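As a minimal sketch of such a pure optimization (no NN involved; `simulate` below is a hypothetical, differentiable stand-in rather than a real advection solver), gradient descent on $\mathbf{u}$ could look like:

```python
import torch

# Hypothetical differentiable "solver": the control u forces an initial state
# d0, which then diffuses for a few steps. This is a stand-in, not a real
# advection scheme, just enough to make the inverse problem non-trivial.
def simulate(d, u, steps=4):
    for _ in range(steps):
        d = d + u                                              # forcing by u
        d = d + 0.2 * (torch.roll(d, 1) - 2 * d + torch.roll(d, -1))
    return d

d0 = torch.zeros(16)
x = torch.arange(16, dtype=torch.float32)
d_target = torch.sin(2 * torch.pi * x / 16)                    # desired end state

u = torch.zeros(16, requires_grad=True)   # the unknown control, no NN anywhere
opt = torch.optim.SGD([u], lr=0.05)
losses = []
for it in range(200):
    loss = ((simulate(d0, u) - d_target) ** 2).sum()  # L = |d(t^e)-d^target|^2
    opt.zero_grad()
    loss.backward()
    opt.step()
    losses.append(float(loss))
```

After the loop, pushing `u` through `simulate` approximately reproduces `d_target`; swapping the stand-in for a real differentiable solver changes nothing structurally.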



name: pbdl-logo-large
---
```
Welcome to the _Physics-based Deep Learning Book_ (v0.3, the _GenAI_ edition) 👋
**TL;DR**:
This document is a hands-on, comprehensive guide to deep learning in the realm of physical simulations. Rather than just theory, we emphasize practical application: every concept is paired with interactive Jupyter notebooks to get you up and running quickly. Beyond traditional supervised learning, we dive into physical _loss-constraints_, _differentiable_ simulations, _diffusion-based_ approaches for _probabilistic generative AI_, as well as reinforcement learning and advanced neural network architectures. These foundations are paving the way for the next generation of scientific _foundation models_.
We are living in an era of rapid transformation. These methods have the potential to redefine what's possible in computational science.
```{note}
_What's new in v0.3?_
This latest edition adds a major new chapter on generative modeling, covering powerful techniques like denoising, flow-matching, autoregressive learning, physics-integrated constraints, and diffusion-based graph networks. We've also introduced a dedicated section on neural architectures specifically designed for physics simulations. All code examples have been updated to leverage the latest frameworks.
```
---
As a _sneak preview_, the next chapters will show:
- How to train neural networks to [predict the fluid flow around airfoils with diffusion modeling](probmodels-ddpm-fm). This gives a probabilistic _surrogate model_ that replaces and outperforms traditional simulators.
- How to use model equations as residuals to train networks that [represent solutions](diffphys-dpvspinn), and how to improve upon these residual constraints by using [differentiable simulations](diffphys-code-sol).
- How to more tightly interact with a full simulator for [inverse problems](diffphys-code-control). E.g., we'll demonstrate how to circumvent the convergence problems of standard reinforcement learning techniques by leveraging [simulators in the training loop](reinflearn-code).
- We'll also discuss the importance of [choosing the right network architecture](supervised-arch): whether to consider global or local interactions, continuous or discrete representations, and structured versus unstructured graph meshes.
Throughout this text,
we will introduce different approaches for introducing physical models
Some visual examples of numerically simulated time sequences.
## Thanks!
This project would not have been possible without the help of the many people who contributed to it. A big thanks to everyone 🙏 Here's an alphabetical list:
- [Benjamin Holzschuh](https://ge.in.tum.de/about/)
- [Philipp Holl](https://ge.in.tum.de/about/philipp-holl/)
- [Georg Kohl](https://ge.in.tum.de/about/georg-kohl/)
- [Mario Lino](https://ge.in.tum.de/about/mario-lino/)
- [Qiang Liu](https://ge.in.tum.de/about/qiang-liu/)
- [Patrick Schnell](https://ge.in.tum.de/about/patrick-schnell/)
- [Felix Trost](https://ge.in.tum.de/about/)
- [Nils Thuerey](https://ge.in.tum.de/about/n-thuerey/)
Additional thanks go to
Georg Kohl for the nice divider images (cf. {cite}`kohl2020lsim`),
Li-Wei Chen,
Xin Luo,
Maximilian Mueller,
Chloe Paillard,
Kiwon Um,
and all github contributors!
## Citation
If you find this book useful, please cite it via:
```
@book{thuerey2021pbdl,
title={Physics-based Deep Learning},
  author={N. Thuerey and B. Holzschuh and P. Holl and G. Kohl and M. Lino and Q. Liu and P. Schnell and F. Trost},
url={https://physicsbaseddeeplearning.org},
year={2021},
publisher={WWW}
}
```
## Time to get started
The future of simulation is being rewritten, and with the following AI and deep learning techniques, you'll be at the forefront of these developments. Let's dive in!


# source this file with "." in a shell
# note this script assumes the following paths/versions: python3.7 , /Users/thuerey/Library/Python/3.7/bin/jupyter-book
# updated for nMBA !
# do clean git checkout for changes from json-cleanup-for-pdf.py via:
# git checkout diffphys-code-burgers.ipynb diffphys-code-ns.ipynb diffphys-code-sol.ipynb physicalloss-code.ipynb bayesian-code.ipynb supervised-airfoils.ipynb reinflearn-code.ipynb physgrad-code.ipynb physgrad-comparison.ipynb physgrad-hig-code.ipynb
echo
echo WARNING - still requires one manual quit of first pdf/latex pass, use shift-x to quit, then fix latex
echo
PYT=python3
${PYT} json-cleanup-for-pdf.py
# clean / remove _build dir ?
/Users/thuerey/Library/Python/3.9/bin/jupyter-book build .
/Users/thuerey/Library/Python/3.9/bin/jupyter-book build . --builder pdflatex
exit # sufficient for newer jupyter book versions
# manual?
#xelatex book
# old "pre" GEN
#/Users/thuerey/Library/Python/3.9/bin/jupyter-book build . --builder pdflatex
# old cleanup
cd _build/latex
#mv book.pdf book-xetex.pdf # not necessary, failed anyway
# this generates book.tex
rm -f book-in.tex sphinxmessages-in.sty book-in.aux book-in.toc
# rename book.tex -> book-in.tex (this is the original output!)
mv book.tex book-in.tex
mv sphinxmessages.sty sphinxmessages-in.sty
mv book.aux book-in.aux
mv book.toc book-in.toc
#mv sphinxmanual.cls sphinxmanual-in.cls
${PYT} ../../fixup-latex.py
# reads book-in.tex -> writes book-in2.tex
# remove unicode chars via unix iconv
# reads book-in2.tex -> writes book.tex
iconv -c -f utf-8 -t ascii book-in2.tex > book.tex
# finally run pdflatex, now it should work:
# pdflatex -recorder book
pdflatex book
pdflatex book
# unused fixup-latex.py
# for convenience, archive results in main dir
mv book.pdf ../../pbfl-book-pdflatex.pdf
tar czvf ../../pbdl-latex-for-arxiv.tar.gz *
cd ../..
ls -l ./pbfl-book-pdflatex.pdf ./pbdl-latex-for-arxiv.tar.gz
#mv book.pdf ../../pbfl-book-pdflatex.pdf
#tar czvf ../../pbdl-latex-for-arxiv.tar.gz *


| Abbreviation | Meaning |
| --- | --- |
| AI | Mysterious buzzword popping up in all kinds of places these days |
| BNN | Bayesian neural network |
| CNN | Convolutional neural network (specific NN architecture) |
| DDPM | Denoising diffusion probabilistic models (diffusion modeling variant) |
| DL | Deep Learning |
| FM | Flow matching (diffusion modeling variant) |
| FNO | Fourier neural operator (specific NN architecture) |
| GD | (steepest) Gradient Descent |
| MLP | Multi-Layer Perceptron, a neural network with fully connected layers |
| NN | Neural network (a generic one, in contrast to, e.g., a CNN or MLP) |
| PDE | Partial Differential Equation |
| PBDL | Physics-Based Deep Learning |
| SGD | Stochastic Gradient Descent |


Generative Adversarial Networks
=======================
We've dealt with generative AI techniques and diffusion modeling
in detail in {doc}`probmodels-intro`.
As outlined there, the fundamental problem to fully represent
all possible states of a variable $\mathbf{x}$ under consideration,
i.e. to capture its full distribution, is a very old topic. Hence,
even before DDPMs&Co. there were techniques to make this possible,
and _generative adversarial networks_ (GANs) were
shown to be powerful tools in this context. While they've been largely replaced
by diffusion approaches in research, GANs use a highly interesting approach,
and the following sections will give an introduction and show what's possible with GANs.
Traditionally, GANs were employed when the data has ambiguous solutions,
and no differentiable physics model is available to disambiguate the data. In such a case
a supervised learning would yield an undesirable averaging that can be prevented with
a GAN approach.
results can be highly ambiguous.
## Maximum likelihood estimation
To train a GAN we have to briefly turn to _classification problems_, which we've managed to ignore up to now.
For classification, the learning objective takes a slightly different form than the
regression objective in equation {eq}`learn-l2` of {doc}`overview-equations`:
We now want to maximize the likelihood of a learned representation
$f$ that assigns a probability to an input $\mathbf{x}_i$ given a set of weights $\theta$ for
a chosen set of distinct classes. This yields a maximization problem of the form
$$
\text{arg max}_{\theta} \Pi_i f(\mathbf{x}_i;\theta) ,
$$
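In practice, this product is maximized in log space: taking the logarithm turns the product into a sum without changing the maximizer, and flipping the sign yields a minimization problem,

$$
\text{arg max}_{\theta} \Pi_i f(\mathbf{x}_i;\theta)
= \text{arg max}_{\theta} \sum_i \log f(\mathbf{x}_i;\theta)
= \text{arg min}_{\theta} - \sum_i \log f(\mathbf{x}_i;\theta) ,
$$

which is the familiar negative log-likelihood, i.e., the cross-entropy objective commonly used to train classifiers such as the discriminator.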
This is a highly challenging solution manifold, and requires an extended "cyclic" training
that pushes the discriminator to take all the physical parameters under consideration into account.
Interestingly, the generator learns to produce realistic and accurate solutions despite
being trained purely on data, i.e. without explicit help in the form of a differentiable physics solver setup.
The figure below shows a range of example outputs of a physically-parametrized GAN {cite}`chu2021physgan`.
```{figure} resources/others-GANs-meaningful-fig11.jpg
---
name: others-GANs-meaningful-fig11
---
A range of example outputs of a physically-parametrized GAN {cite}`chu2021physgan`.
The network can successfully extrapolate to buoyancy settings beyond the
range of values seen at training time.
```


Additional Topics
=======================
The next sections will give a shorter introduction to other classic topics that are
interesting in the context of physics-based deep learning. These topics (for now) do
not come with executable notebooks, but we will still point to existing open source
implementations for each of them.


While this is straight-forward for cases such as data consisting only of integer
for continuously changing quantities such as the temperature in a room.
While the previous examples have focused on aspects beyond discretization
(and used Cartesian grids as a placeholder), the following chapter will target
scenarios where learning neural operators with dynamically changing
and adaptive discretizations has benefits.
## Types of computational meshes
As outlined in {doc}`supervised-arch`, we can distinguish three types of computational meshes (or "grids")
with which discretizations are typically performed:
- **structured** meshes: Structured meshes have a regular
for the next stage of convolutions. After expanding
the size of the latent space over the course of a few layers, it is contracted again
to produce the desired result, e.g., an acceleration.
% {cite}`prantl2019tranquil`
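The expand-then-contract layout described above can be sketched in a few lines; a hedged PyTorch example, where all channel counts and kernel sizes are made up for illustration:

```python
import torch

# Hypothetical expand-then-contract stack: channel counts grow over a few
# convolutions into a larger latent space, then shrink back to the output
# (e.g., an acceleration per grid cell).
net = torch.nn.Sequential(
    torch.nn.Conv1d(3, 32, 5, padding=2), torch.nn.ReLU(),   # expand
    torch.nn.Conv1d(32, 64, 5, padding=2), torch.nn.ReLU(),  # latent space
    torch.nn.Conv1d(64, 32, 5, padding=2), torch.nn.ReLU(),  # contract
    torch.nn.Conv1d(32, 3, 5, padding=2))                    # e.g., acceleration

out = net(torch.rand(1, 3, 128))  # batch of 1, 3 channels, 128 cells
```

The spatial resolution is preserved throughout; only the width of the latent space changes, mirroring the expansion and contraction described in the text.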
## Continuous convolutions
to reproduce such behavior.
Nonetheless, an interesting side-effect of having a trained NN for such a liquid simulation
by construction provides a differentiable solver. Based on a pre-trained network, the learned solver
then supports optimization via gradient descent, e.g., w.r.t. input parameters such as viscosity.
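A sketch of this idea follows, with an untrained `learned_solver` network standing in for a pre-trained model; all names and sizes are made up, and only the mechanics of optimizing a physical input through frozen network weights are illustrated:

```python
import torch

torch.manual_seed(42)

# Made-up stand-in for a pre-trained learned solver: a network mapping
# (initial state, viscosity) to a later state. Here it is untrained; in
# practice its weights would come from a prior training run.
learned_solver = torch.nn.Sequential(
    torch.nn.Linear(9, 32), torch.nn.Tanh(), torch.nn.Linear(32, 8))
for p in learned_solver.parameters():
    p.requires_grad_(False)   # weights stay fixed; only the input is optimized

state0 = torch.rand(8)
target = torch.rand(8)

nu = torch.tensor(0.5, requires_grad=True)  # viscosity, the input we optimize
opt = torch.optim.Adam([nu], lr=0.01)
losses = []
for it in range(100):
    out = learned_solver(torch.cat([state0, nu.reshape(1)]))
    loss = ((out - target) ** 2).mean()
    opt.zero_grad()
    loss.backward()           # gradient flows through the frozen network to nu
    opt.step()
    losses.append(float(loss))
```

The key point is that the gradient of the loss w.r.t. the scalar `nu` comes "for free" from the network's differentiability, without any adjoint code for the original solver.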
The following image shows an exemplary _prediction_ task with continuous convolutions from {cite}`ummenhofer2019contconv`.
```{figure} resources/others-lagrangian-canyon.jpg
---
name: others-lagrangian-canyon
---
An example of a particle-based liquid spreading in a landscape scenario, simulated with
learned, continuous convolutions.
```
## Source code


Ideally, this step is furthermore unrolled over time to stabilize the evolution
The resulting training will be significantly more expensive, as more weights need to be trained at once,
and a much larger number of intermediate states needs to be processed. However, the increased
cost typically pays off with a reduced overall inference error.
The following images show several time frames of an example prediction of {cite}`wiewel2020lsssubdiv`,
which additionally couples the learned time evolution with a numerically solved advection step.
```{figure} resources/others-timeseries-lss-subdiv-prediction.jpg
---
height: 300px
name: timeseries-lss-subdiv-prediction
---
The learned prediction is shown at the top, the reference simulation at the bottom.
```


Outlook
=======================
Despite the in-depth discussions and diverse examples we've explored, we've really only begun to tap into the vast potential of physics-based deep learning. The techniques covered in the previous chapters aren't just useful, they have the power to reshape computational methods for decades to come. As we've seen in the code examples, there's no magic at play; rather, deep learning provides an incredibly powerful new tool to work with complex, non-linear functions.
Crucially, deep learning doesn't replace traditional numerical methods. Instead, it enhances them. Together, they form a groundbreaking synergy, with a huge potential to unlock new frontiers in simulation and modeling. One aspect we haven't yet touched upon is perhaps the most profound: at its core, our ultimate goal is to deepen human understanding of the world. The notion of neural networks as impenetrable "black boxes" is outdated. Instead, they should be seen as just another numerical tool, one that is as interpretable as traditional simulations when used correctly.
Looking ahead, one of the most exciting challenges is to refine our ability to analyze learned networks. By distilling the patterns and structures these networks uncover, we move closer to extracting fundamental, human-readable insights from their solution manifolds. The future of differentiable simulation isn't just about better predictions, it's about revealing the hidden order of the physical world in ways we've never imagined.
![Divider](resources/divider2.jpg)
## Some specific directions
Beyond this long term outlook, there are many interesting and immediate steps.
Beyond this long-term vision, there are plenty of exciting and immediate next steps. While our deep dives into Burgers equation and Navier-Stokes solvers have tackled non-trivial challenges, they represent just a fraction of the landscape of PDE models and operators that these techniques can improve. Here are just a few promising directions from other fields:
* Chemical reaction PDEs often exhibit intricate behaviors due to multi-species interactions. A particularly exciting avenue is training models that can rapidly predict experimental or industrial processes and dynamically adjust control parameters to stabilize them, enabling real-time, intelligent control.
* Plasma simulations share similarities with vorticity-based fluid formulations but introduce additional complexities due to electric and magnetic interactions. This makes them a prime candidate for deep learning methods, especially for plasma fusion experiments and energy generators, where differentiable physics could be a game-changer.
* Weather and climate modeling remains among the most critical scientific challenges for humanity. These highly complex, multi-scale systems involve fluid flows intertwined with countless environmental factors. Leveraging deep learning to enhance numerical simulations in this space holds immense potential, not just for more accurate forecasts, but for unlocking deeper insights into the dynamics of our planet.
![Divider](resources/divider3.jpg)
## Closing remarks
These are just a few examples, but they illustrate the incredible breadth of opportunities where differentiable physics and deep learning can make an impact. There's lots of exciting research work left to do - the next years and decades definitely won't be boring. 🤗 👍
```{figure} resources/logo.jpg
---


"source": [
"# Simple Forward Simulation of Burgers Equation with phiflow\n",
"\n",
"This chapter will give an introduction to how to run _forward_, i.e., regular simulations starting with a given initial state and approximating a later state numerically, and introduce the Φ<sub>Flow</sub> framework (in the following \"phiflow\"). Phiflow provides a set of differentiable building blocks that directly interface with deep learning frameworks, and hence is a very good basis for the topics of this book. Before going for deeper and more complicated integrations, this notebook (and the next one) will show how regular simulations can be done with phiflow. Later on, we'll show that these simulations can be easily coupled with neural networks.\n",
"\n",
"The main repository for Φ<sub>Flow</sub> (in the following \"phiflow\") is [https://github.com/tum-pbs/PhiFlow](https://github.com/tum-pbs/PhiFlow), and additional API documentation and examples can be found at [https://tum-pbs.github.io/PhiFlow/](https://tum-pbs.github.io/PhiFlow/).\n",
"The main repository for phiflow is [https://github.com/tum-pbs/PhiFlow](https://github.com/tum-pbs/PhiFlow), and additional API documentation and examples can be found at [https://tum-pbs.github.io/PhiFlow/](https://tum-pbs.github.io/PhiFlow/).\n",
"\n",
"For this jupyter notebook (and all following ones), you can find a _\"[run in colab]\"_ link at the end of the first paragraph (alternatively you can use the launch button at the top of the page). This will load the latest version from the PBDL github repo in a colab notebook that you can execute on the spot: \n",
"[[run in colab]](https://colab.research.google.com/github/tum-pbs/pbdl-book/blob/main/overview-burgers-forw.ipynb)\n",
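As a framework-free preview of what "forward simulation" means here, the following sketch advances Burgers' equation $\partial u/\partial t + u \, \partial u/\partial x = \nu \, \partial^2 u/\partial x^2$ with explicit Euler steps and periodic boundaries, assuming the $-\sin(\pi x)$ initial state used later in this notebook. Grid size, step sizes, and viscosity are illustrative choices; phiflow replaces all of this with differentiable building blocks.

```python
import math

def burgers_step(u, dt, dx, nu):
    """One explicit Euler step of 1D Burgers' equation with periodic BCs."""
    n = len(u)
    u_new = [0.0] * n
    for i in range(n):
        um, up = u[i - 1], u[(i + 1) % n]            # periodic neighbors
        adv = u[i] * (up - um) / (2 * dx)            # advection term u * u_x
        diff = nu * (up - 2 * u[i] + um) / dx ** 2   # diffusion term nu * u_xx
        u_new[i] = u[i] + dt * (-adv + diff)
    return u_new

# Illustrative setup: 128 cells on [-1, 1], u(x, 0) = -sin(pi x)
N, DX, DT, NU = 128, 2.0 / 128, 0.001, 0.01
u = [-math.sin(math.pi * (-1.0 + (i + 0.5) * DX)) for i in range(N)]
for _ in range(100):                                 # advance to t = 0.1
    u = burgers_step(u, DT, DX, NU)
```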
@@ -31,7 +31,7 @@
"source": [
"## Importing and loading phiflow\n",
"\n",
"Let's get some preliminaries out of the way: first we'll import the phiflow library, more specifically the `numpy` operators for fluid flow simulations: `phi.flow` (differentiable versions for a DL framework _X_ are loaded via `phi.X.flow` instead).\n",
"Let's get some preliminaries out of the way: first we'll import the phiflow library, more specifically the `numpy` operators for fluid flow simulations: `phi.flow` (differentiable versions for a DL framework _X_ are loaded via `phi.X.flow` instead). This makes it easy to switch between backends, e.g., phiflow solvers can run in PyTorch, TensorFlow, or JAX.\n",
"\n",
"**Note:** Below, the first command with a \"!\" prefix will install the [phiflow python package from GitHub](https://github.com/tum-pbs/PhiFlow) via `pip` in your python environment once you uncomment it. We've assumed that phiflow isn't installed, but if you have already done so, just comment out the first line (the same will hold for all following notebooks)."
]
@@ -45,15 +45,13 @@
"name": "stdout",
"output_type": "stream",
"text": [
"Using phiflow version: 3.1.0\n"
"Using phiflow version: 3.4.0\n"
]
}
],
"source": [
"!pip install --upgrade --quiet phiflow==3.1\n",
"!pip install --upgrade --quiet phiflow==3.4\n",
"from phi.flow import *\n",
"\n",
"from phi import __version__\n",
"print(\"Using phiflow version: {}\".format(phi.__version__))"
]
},
@@ -95,7 +93,7 @@
"source": [
"\n",
"Next, we initialize a 1D `velocity` grid from the `INITIAL` numpy array that was converted into a tensor.\n",
"The extent of our domain $\\Omega$ is specifiied via the `bounds` parameter $[-1,1]$, and the grid uses periodic boundary conditions (`extrapolation.PERIODIC`). These two properties are the main difference between a tensor and a grid: the latter has boundary conditions and a physical extent.\n",
"The extent of our domain $\Omega$ is specified via the `bounds` parameter $[-1,1]$, and the grid uses periodic boundary conditions (`extrapolation.PERIODIC`). These two properties are the main difference between phiflow's tensor and grid objects: the latter has boundary conditions and a physical extent.\n",
"\n",
"Just to illustrate, we'll also print some info about the velocity object: it's a `phi.math` tensor with a size of 128. Note that the actual grid content is contained in the `values` of the grid. Below we're printing five entries by using the `numpy()` function to convert the content of the phiflow tensor into a numpy array. For tensors with more dimensions, we'd need to specify the additional dimensions here, e.g., `'y,x,vector'` for a 2D velocity field. (For tensors with a single dimension we could leave it out.)"
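To illustrate the tensor-vs-grid distinction, here is a toy, hypothetical `PeriodicGrid1D` class (not phiflow's API) that bundles values with a physical extent and periodic boundary handling:

```python
class PeriodicGrid1D:
    """Toy analogue (not phiflow's API): values plus extent and periodic BCs."""
    def __init__(self, values, lower, upper):
        self.values = list(values)             # the actual grid content
        self.lower, self.upper = lower, upper  # physical bounds, e.g. [-1, 1]

    @property
    def dx(self):
        """Cell size implied by the physical extent."""
        return (self.upper - self.lower) / len(self.values)

    def __getitem__(self, i):
        return self.values[i % len(self.values)]  # periodic boundary handling

g = PeriodicGrid1D([0.0, 1.0, 2.0, 3.0], lower=-1.0, upper=1.0)
```

Indexing past either end wraps around, e.g. `g[5]` returns the same value as `g[1]`; a plain tensor has neither this wrap-around behavior nor a `dx` derived from physical bounds.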
]

View File

@@ -1,25 +1,22 @@
Models and Equations
============================
Below we'll give a brief (really _very_ brief!) intro to deep learning, primarily to introduce the notation.
Below we'll give a _very_ brief intro to deep learning, primarily to introduce the notation.
In addition we'll discuss some _model equations_ below. Note that we'll avoid using _model_ to denote trained neural networks, in contrast to some other texts and APIs. These will be called "NNs" or "networks". A "model" will typically denote a set of model equations for a physical effect, usually PDEs.
## Deep learning and neural networks
In this book we focus on the connection with physical
models, and there are lots of great introductions to deep learning.
Hence, we'll keep it short:
the goal in deep learning is to approximate an unknown function
The goal in deep learning is to approximate an unknown function
$$
f^*(x) = y^* ,
$$ (learn-base)
where $y^*$ denotes reference or "ground truth" solutions.
$f^*(x)$ should be approximated with an NN representation $f(x;\theta)$. We typically determine $f$
where $y^*$ denotes reference or "ground truth" solutions, and
$f^*(x)$ should be approximated with an NN $f(x;\theta)$. We typically determine $f$
with the help of some variant of a loss function $L(y,y^*)$, where $y=f(x;\theta)$ is the output
of the NN.
This gives a minimization problem to find $f(x;\theta)$ such that $e$ is minimized.
This gives a minimization problem to find $f(x;\theta)$ such that $L$ is minimized.
In the simplest case, we can use an $L^2$ error, giving
$$
@@ -28,7 +25,7 @@ $$ (learn-l2)
We typically optimize, i.e. _train_,
with a stochastic gradient descent (SGD) optimizer of choice, e.g. Adam {cite}`kingma2014adam`.
We'll rely on auto-diff to compute the gradient of a _scalar_ loss $L$ w.r.t. the weights, $\partial L / \partial \theta$.
We'll rely on auto-diff to compute the gradient of the _scalar_ loss $L$ w.r.t. the weights, $\partial L / \partial \theta$.
It is crucial for the calculation of gradients that this function is scalar,
and the loss function is often also called "error", "cost", or "objective" function.
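As a minimal sketch of this setup, the following pure-Python snippet minimizes the $L^2$ loss for a one-parameter "network" $f(x;\theta)=\theta x$ with plain gradient descent (Adam and auto-diff are omitted for brevity; the gradient is written out by hand):

```python
def f(x, theta):
    """A one-parameter 'network': f(x; theta) = theta * x."""
    return theta * x

def loss(theta, data):
    """L2 loss: sum over (f(x; theta) - y*)^2."""
    return sum((f(x, theta) - y) ** 2 for x, y in data)

def grad(theta, data):
    """dL/dtheta, written out by hand (auto-diff would provide this)."""
    return sum(2.0 * (f(x, theta) - y) * x for x, y in data)

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # ground truth y* = 2x
theta = 0.0
for _ in range(200):
    theta -= 0.01 * grad(theta, data)        # plain gradient descent update
```

After training, `theta` has converged to the ground-truth slope of 2, and the scalar loss is (near) zero.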
@@ -38,14 +35,14 @@ introduce scalar loss, always(!) scalar... (also called *cost* or *objective* f
For training we distinguish: the **training** data set drawn from some distribution,
the **validation** set (from the same distribution, but different data),
and **test** data sets with _some_ different distribution than the training one.
The latter distinction is important. For the test set we want
The latter distinction is important. For testing, we usually want
_out of distribution_ (OOD) data to check how well our trained model generalizes.
Note that this gives a huge range of possibilities for the test data set:
from tiny changes that will certainly work,
up to completely different inputs that are essentially guaranteed to fail.
There's no gold standard, but test data should be generated with care.
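A minimal sketch of the first distinction, splitting in-distribution data into training and validation sets, using a hypothetical `split_dataset` helper; the OOD test set would be generated separately:

```python
import random

def split_dataset(samples, val_fraction=0.2, seed=0):
    """Split in-distribution data into training and validation sets.
    The test set should come from a *different* distribution (OOD),
    so it is generated separately rather than carved out here."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]  # (training, validation)

train, val = split_dataset(range(100))
```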
Enough for now - if all the above wasn't totally obvious for you, we very strongly recommend to
If the overview above wasn't obvious for you, we strongly recommend that you
read chapters 6 to 9 of the [Deep Learning book](https://www.deeplearningbook.org),
especially the sections about [MLPs](https://www.deeplearningbook.org/contents/mlp.html)
and "Conv-Nets", i.e. [CNNs](https://www.deeplearningbook.org/contents/convnets.html).
@@ -53,7 +50,7 @@ and "Conv-Nets", i.e. [CNNs](https://www.deeplearningbook.org/contents/convnets.
```{note} Classification vs Regression
The classic ML distinction between _classification_ and _regression_ problems is not so important here:
we only deal with _regression_ problems in the following.
we only deal with _regression_ problems in the following.
```
@@ -66,8 +63,19 @@ Also interesting: from a math standpoint ''just'' non-linear optimization ...
The following section will give a brief outlook for the model equations
we'll be using later on in the DL examples.
We typically target continuous PDEs denoted by $\mathcal P^*$
whose solution is of interest in a spatial domain $\Omega \subset \mathbb{R}^d$ in $d \in {1,2,3} $ dimensions.
We typically target a continuous PDE operator denoted by $\mathcal P^*$,
which maps inputs in $\mathcal U$ to outputs in $\mathcal V$, where in the most general case $\mathcal U, \mathcal V$
are both infinite dimensional Banach spaces, i.e. $\mathcal P^*: \mathcal U \rightarrow \mathcal V$.
```{admonition} Learned solution operators vs traditional ones
:class: tip
Later on, the goal will be to learn $\mathcal P^*$ (or parts of it) with a neural network. A
variety of different names are used in research: learned surrogates, hybrid simulators or emulators,
neural operators or solvers, and autoregressive models (if timesteps are involved), to name a few.
```
In practice,
the solution of interest lies in a spatial domain $\Omega \subset \mathbb{R}^d$ in $d \in \{1,2,3\}$ dimensions.
In addition, we often consider a time evolution for a finite time interval $t \in \mathbb{R}^{+}$.
The corresponding fields are either d-dimensional vector fields, for instance $\mathbf{u}: \mathbb{R}^d \times \mathbb{R}^{+} \rightarrow \mathbb{R}^d$,
or scalar $\mathbf{p}: \mathbb{R}^d \times \mathbb{R}^{+} \rightarrow \mathbb{R}$.
@@ -79,12 +87,11 @@ To obtain unique solutions for $\mathcal P^*$ we need to specify suitable
initial conditions, typically for all quantities of interest at $t=0$,
and boundary conditions for the boundary of $\Omega$, denoted by $\Gamma$ in
the following.
$\mathcal P^*$ denotes
a continuous formulation, where we make mild assumptions about
a continuous formulation, where we need to make mild assumptions about
its continuity: we will typically assume that first and second derivatives exist.
We can then use numerical methods to obtain approximations
Traditionally, we can use numerical methods to obtain approximations
of a smooth function such as $\mathcal P^*$ via discretization.
These invariably introduce discretization errors, which we'd like to keep as small as possible.
These errors can be measured in terms of the deviation from the exact analytical solution,
@@ -127,7 +134,7 @@ and the abbreviations used in: {doc}`notation`.
%This yields $\vc{} \in \mathbb{R}^{d \times d_{s,x} \times d_{s,y} \times d_{s,z} }$ and $\vr{} \in \mathbb{R}^{d \times d_{r,x} \times d_{r,y} \times d_{r,z} }$
%Typically, $d_{r,i} > d_{s,i}$ and $d_{z}=1$ for $d=2$.
We solve a discretized PDE $\mathcal{P}$ by performing steps of size $\Delta t$.
With numerical simulations we solve a discretized PDE $\mathcal{P}$ by performing steps of size $\Delta t$.
The solution can be expressed as a function of $\mathbf{u}$ and its derivatives:
$\mathbf{u}(\mathbf{x},t+\Delta t) =
\mathcal{P}( \mathbf{u}_{x}, \mathbf{u}_{xx}, ... \mathbf{u}_{xx...x} )$, where
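To make this notation concrete, here is a sketch using the heat equation $u_t = \nu u_{xx}$ as $\mathcal{P}$, with periodic central finite differences standing in for the derivative terms (grid spacing and $\nu$ are illustrative choices):

```python
def ddx(u, dx):
    """u_x via periodic central differences."""
    n = len(u)
    return [(u[(i + 1) % n] - u[i - 1]) / (2 * dx) for i in range(n)]

def d2dx2(u, dx):
    """u_xx via periodic central differences."""
    n = len(u)
    return [(u[(i + 1) % n] - 2 * u[i] + u[i - 1]) / dx ** 2 for i in range(n)]

def P(u, dx, dt, nu=0.1):
    """Discretized operator P for the heat equation:
    u(x, t + dt) = u + dt * nu * u_xx."""
    uxx = d2dx2(u, dx)
    return [u[i] + dt * nu * uxx[i] for i in range(len(u))]
```

Each call to `P` advances the state by one step of size $\Delta t$; chaining calls yields the time evolution.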

File diff suppressed because one or more lines are too long

View File

@@ -2,7 +2,7 @@ Optimization and Convergence
============================
This chapter will give an overview of the derivations for different optimization algorithms.
In contrast to other texts, we'll start with _the_ classic optimization algorithm, Newton's method,
In contrast to other texts, we'll start with _the most classic_ optimization algorithm, Newton's method,
derive several widely used variants from it, before coming back full circle to deep learning (DL) optimizers.
The main goal is to put DL into the context of these classical methods. While we'll focus on DL, we will also revisit
the classical algorithms later on in this book to improve learning algorithms. Physics simulations exacerbate the difficulties caused by neural networks, which is why the topics below have a particular relevance for physics-based learning tasks.
@@ -62,7 +62,7 @@ In several instances we'll make use of the fundamental theorem of calculus, repe
$$f(x+\Delta) = f(x) + \int_0^1 \text{d}s ~ f'(x+s \Delta) \Delta \ . $$
In addition, we'll make use of Lipschitz-continuity with constant $\mathcal L$:
$|f(x+\Delta) + f(x)|\le \mathcal L \Delta$, and the well-known Cauchy-Schwartz inequality:
$|f(x+\Delta) - f(x)|\le \mathcal L |\Delta|$, and the well-known Cauchy-Schwarz inequality:
$ u^T v \le |u| \cdot |v| $.
## Newton's method
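A one-dimensional sketch of the basic update, $x \leftarrow x - L'(x)/L''(x)$, applied to a simple quadratic (function names are illustrative):

```python
def newton_minimize(grad, hess, x0, steps=20):
    """Newton's method for optimization in 1D: x <- x - L'(x) / L''(x)."""
    x = x0
    for _ in range(steps):
        x -= grad(x) / hess(x)
    return x

# Minimize L(x) = (x - 3)^2 + 1: gradient 2(x - 3), Hessian 2.
x_min = newton_minimize(lambda x: 2.0 * (x - 3.0), lambda x: 2.0, x0=0.0)
```

For a quadratic, a single Newton step lands exactly on the minimum, which is what makes the method the natural starting point for the derivations in this chapter.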

View File

@@ -2,10 +2,10 @@ Overview
============================
The name of this book, _Physics-Based Deep Learning_,
denotes combinations of physical modeling and numerical simulations with
methods based on artificial neural networks.
The general direction of Physics-Based Deep Learning represents a very
active, quickly growing and exciting field of research. The following chapter will
denotes combinations of physical modeling and **numerical simulations** with
methods based on **artificial intelligence**, i.e. neural networks.
The general direction of Physics-Based Deep Learning, also going under the name _Scientific Machine Learning_,
represents a very active, quickly growing and exciting field of research. The following chapter will
give a more thorough introduction to the topic and establish the basics
for following chapters.
@@ -15,9 +15,9 @@ height: 240px
name: overview-pano
---
Understanding our environment, and predicting how it will evolve is one of the key challenges of humankind.
A key tool for achieving these goals are simulations, and next-gen simulations
could strongly profit from integrating deep learning components to make even
more accurate predictions about our world.
A key tool for achieving these goals is computer simulation, and the next generation of these simulations
will likely profit strongly from integrating AI and deep learning components, in order to make even
more accurate predictions about the phenomena in our environment.
```
## Motivation
@@ -28,11 +28,11 @@ to the control of plasma fusion {cite}`maingi2019fesreport`,
using numerical analysis to obtain solutions for physical models has
become an integral part of science.
In recent years, machine learning technologies and _deep neural networks_ in particular,
In recent years, artificial intelligence driven by _deep neural networks_,
have led to impressive achievements in a variety of fields:
from image classification {cite}`krizhevsky2012` over
natural language processing {cite}`radford2019language`,
and more recently also for protein folding {cite}`alquraishi2019alphafold`.
and protein folding {cite}`alquraishi2019alphafold`, to various foundation models.
The field is very vibrant and quickly developing, with the promise of vast possibilities.
### Replacing traditional simulations?
@@ -45,14 +45,17 @@ for real-world, industrial applications such as airfoil flows {cite}`chen2021hig
same time outperforming traditional solvers by orders of magnitude in terms of runtime.
Instead of relying on models that are carefully crafted
from first principles, can data collections of sufficient size
be processed to provide the correct answers?
from first principles, can sufficiently large datasets
be processed instead to provide the correct answers?
As we'll show in the next chapters, this concern is unfounded.
Rather, it is crucial for the next generation of simulation systems
to bridge both worlds: to
combine _classical numerical_ techniques with _deep learning_ methods.
combine _classical numerical_ techniques with _A.I._ methods.
In addition, the latter offer exciting new possibilities in areas that
have been challenging for traditional methods, such as dealing
with complex _distributions and uncertainty_ in simulations.
One central reason for the importance of this combination is
One central reason for the importance of the combination with numerics is
that DL approaches are powerful, but at the same time strongly profit
from domain knowledge in the form of physical models.
DL techniques and NNs are novel, sometimes difficult to apply, and
@@ -70,47 +73,48 @@ developed in the field of numerical mathematics, this book will
show that it is highly beneficial to use them as much as possible
when applying DL.
### Black boxes and magic?
### Black boxes?
People who are unfamiliar with DL methods often associate neural networks
with _black boxes_, and see the training processes as something that is beyond the grasp
In the past, trained neural networks have often been associated
with _black boxes_, implying that they are something beyond the grasp
of human understanding. However, these viewpoints typically stem from
relying on hearsay and not dealing with the topic enough.
relying on hearsay and general skepticism about "hyped" topics.
Rather, the situation is a very common one in science: we are facing a new class of methods,
and "all the gritty details" are not yet fully worked out. This is pretty common
The situation is a very common one in science, though: we are facing a new class of methods,
and "all the gritty details" are not yet fully worked out. This is and has been pretty common
for all kinds of scientific advances.
Numerical methods themselves are a good example. Around 1950, numerical approximations
and solvers had a tough standing. E.g., to cite H. Goldstine,
numerical instabilities were considered to be a
"constant source of anxiety in the future" {cite}`goldstine1990history`.
By now we have a pretty good grasp of these instabilities, and numerical methods
are ubiquitous and well established.
are ubiquitous and well established. AI and neural networks are following
the same path.
Thus, it is important to be aware of the fact that -- in a way -- there is nothing
magical or otherworldly to deep learning methods. They're simply another set of
numerical tools. That being said, they're clearly fairly new, and right now
very special or otherworldly to deep learning methods. They're simply a new set of
numerical tools. That being said, they're still comparatively young, and right now
definitely the most powerful set of tools we have for non-linear problems.
Just because all the details aren't fully worked out and nicely written up,
that shouldn't stop us from including these powerful methods in our numerical toolbox.
That all the details aren't fully worked out and nicely written up yet
shouldn't stop us from including these powerful methods in our numerical toolbox.
### Reconciling DL and simulations
### Reconciling AI and simulations
Taking a step back, the aim of this book is to build on all the powerful techniques that we have
at our disposal for numerical simulations, and use them wherever we can in conjunction
with deep learning.
As such, a central goal is to _reconcile_ the data-centered viewpoint with physical simulations.
As such, a central goal is to _reconcile_ the AI viewpoint with physical simulations.
```{admonition} Goals of this document
:class: tip
The key aspects that we will address in the following are:
- explain how to use deep learning techniques to solve PDE problems,
- how to use deep learning techniques to **solve PDE** problems,
- how to combine them with **existing knowledge** of physics,
- without **discarding** our knowledge about numerical methods.
- without **discarding** numerical methods.
At the same time, it's worth noting what we won't be covering:
- introductions to deep learning and numerical simulations,
- we're neither aiming for a broad survey of research articles in this area.
- there's no in-depth **introduction** to deep learning and numerical simulations (other great works already cover this),
- nor is the aim a broad survey of research articles in this area.
```
The resulting methods have a huge potential to improve
@@ -118,26 +122,28 @@ what can be done with numerical methods: in scenarios
where a solver targets cases from a certain well-defined problem
domain repeatedly, it can for instance make a lot of sense to once invest
significant resources to train
a neural network that supports the repeated solves. Based on the
domain-specific specialization of this network, such a hybrid solver
could vastly outperform traditional, generic solvers. And despite
a neural network that supports the repeated solves.
The development of large so-called "foundation models" is especially
promising in this area.
Based on the domain-specific specialization via fine-tuning with a smaller dataset,
a hybrid solver could vastly outperform traditional, generic solvers. And despite
the many open questions, first publications have demonstrated
that this goal is not overly far away {cite}`um2020sol,kochkov2021`.
that this goal is a realistic one {cite}`um2020sol,kochkov2021`.
Another way to look at it is that all mathematical models of our nature
are idealized approximations and contain errors. A lot of effort has been
made to obtain very good model equations, but to make the next
big step forward, DL methods offer a very powerful tool to close the
big step forward, AI and DL methods offer a very powerful tool to close the
remaining gap towards reality {cite}`akkaya2019solving`.
## Categorization
Within the area of _physics-based deep learning_,
we can distinguish a variety of different
approaches, from targeting constraints, combined methods, and
optimizations to applications. More specifically, all approaches either target
approaches, e.g., targeting constraints, combined methods,
optimizations and applications. More specifically, all approaches either target
_forward_ simulations (predicting state or temporal evolution) or _inverse_
problems (e.g., obtaining a parametrization for a physical system from
problems (e.g., obtaining a parametrization or state for a physical system from
observations).
![An overview of categories of physics-based deep learning methods](resources/physics-based-deep-learning-overview.jpg)
@@ -160,17 +166,14 @@ techniques:
gradients from a PDE-based formulation. These soft constraints sometimes also go
under the name "physics-informed" training.
- _Interleaved_: the full physical simulation is interleaved and combined with
an output from a deep neural network; this requires a fully differentiable
simulator and represents the tightest coupling between the physical system and
the learning process. Interleaved differentiable physics approaches are especially important for
temporal evolutions, where they can yield an estimate of the future behavior of the
dynamics.
- _Hybrid_: the full physical simulation is interleaved and combined with
an output from a deep neural network; this usually requires a fully differentiable
simulator. It represents the tightest coupling between the physical system and
the learning process and results in a hybrid solver that combines classic techniques with AI-based ones.
Thus, methods can be categorized in terms of forward versus inverse
solve, and how tightly the physical model is integrated into the
optimization loop that trains the deep neural network. Here, especially
interleaved approaches that leverage _differentiable physics_ allow for
solve, and how tightly the physical model is integrated with the neural network.
Here, especially hybrid approaches that leverage _differentiable physics_ allow for
very tight integration of deep learning and numerical simulation methods.
@@ -186,19 +189,28 @@ In contrast, we'll focus on _physical_ simulations from now on, hence the name.
When coming from other backgrounds, other names are more common however. E.g., the differentiable
physics approach is equivalent to using the adjoint method, and coupling it with a deep learning
procedure. Effectively, it is also equivalent to applying backpropagation / reverse-mode differentiation
to a numerical simulation. However, as mentioned above, motivated by the deep learning viewpoint,
to a numerical simulation.
However, as mentioned above, motivated by the deep learning viewpoint,
we'll refer to all these as "differentiable physics" approaches from now on.
The hybrid solvers that result from integrating DL with a traditional solver can also be seen
as a classic topic: in this context, the neural network has the task to _correct_ the solver.
This correction can in turn either target numerical errors, or unresolved terms in an equation.
This is a fundamental problem in science that has been addressed under various names, e.g.,
as the _closure problem_ in fluid dynamics and turbulence, as _homogenization_ or _coarse-graining_
in material science, and _parametrization_ in climate and weather simulation. The re-invention
of this goal in the different fields points to the importance of the underlying problem,
and this text will illustrate the new ways that DL offers to tackle it.
---
## Looking ahead
_Physical simulations_ are a huge field, and we won't be able to cover all possible types of physical models and simulations.
_Physics simulations_ are a huge field, and we won't be able to cover all possible types of physical models and simulations.
```{note} Rather, the focus of this book lies on:
- _Field-based simulations_ (no Lagrangian methods)
- Dense _field-based simulations_ (no Lagrangian methods)
- Combinations with _deep learning_ (plenty of other interesting ML techniques exist, but won't be discussed here)
- Experiments are left as an _outlook_ (i.e., replacing synthetic data with real-world observations)
```
@@ -218,24 +230,17 @@ A brief look at our _notation_ in the {doc}`notation` chapter won't hurt in both
## Implementations
This text also represents an introduction to a wide range of deep learning and simulation APIs.
We'll use popular deep learning APIs such as _pytorch_ [https://pytorch.org](https://pytorch.org) and _tensorflow_ [https://www.tensorflow.org](https://www.tensorflow.org), and additionally
give introductions into the differentiable simulation framework _Φ<sub>Flow</sub> (phiflow)_ [https://github.com/tum-pbs/PhiFlow](https://github.com/tum-pbs/PhiFlow). Some examples also use _JAX_ [https://github.com/google/jax](https://github.com/google/jax). Thus after going through
these examples, you should have a good overview of what's available in current APIs, such that
This text also represents an introduction to deep learning and simulation APIs.
We'll primarily use the popular deep learning API _pytorch_ [https://pytorch.org](https://pytorch.org), but also a bit of _tensorflow_ [https://www.tensorflow.org](https://www.tensorflow.org), and additionally
give introductions into the differentiable simulation framework _Φ<sub>Flow</sub> (phiflow)_ [https://github.com/tum-pbs/PhiFlow](https://github.com/tum-pbs/PhiFlow). Some examples also use _JAX_ [https://github.com/google/jax](https://github.com/google/jax), which provides an interesting alternative.
Thus after going through these examples, you should have a good overview of what's available in current APIs, such that
the best one can be selected for new tasks.
As we're (in most Jupyter notebook examples) dealing with stochastic optimizations, many of the following code examples will produce slightly different results each time they're run. This is fairly common with NN training, but it's important to keep in mind when executing the code. It also means that the numbers discussed in the text might not exactly match the numbers you'll see after re-running the examples.
As we're dealing with stochastic optimizations in most of the Jupyter notebooks, many of the following code examples will produce slightly different results each time they're run. This is fairly common with NN training, but it's important to keep in mind when executing the code. It also means that the numbers discussed in the text might not exactly match the numbers you'll see after re-running the examples.
<!-- ## A brief history of PBDL in the context of Fluids
First:
Tompson, seminal...
First: Tompson, seminal...
Chu, descriptors, early but not used
Ling et al. isotropic turb, small FC, unused?
PINNs ... and more ... -->

View File

@@ -84,7 +84,7 @@
}
],
"source": [
"!pip install --upgrade --quiet phiflow==3.1\n",
"!pip install --upgrade --quiet phiflow==3.4\n",
"from phi.torch.flow import * # switch to TF with \"phi.tf.flow\""
]
},
@@ -254,9 +254,9 @@
" signal_prior = 0.5\n",
" expected_amp = 1. * kernel.shape.get_size('x') * inv_kernel # This can be measured\n",
" signal_likelihood = math.exp(-0.5 * (abs(amp) / expected_amp) ** 2) * signal_prior # this can be NaN\n",
" signal_likelihood = math.where(math.isfinite(signal_likelihood), signal_likelihood, math.zeros_like(signal_likelihood))\n",
" signal_likelihood = math.where(math.is_finite(signal_likelihood), signal_likelihood, math.zeros_like(signal_likelihood))\n",
" noise_likelihood = math.exp(-0.5 * (abs(amp) / f_uncertainty) ** 2) * (1 - signal_prior)\n",
" probability_signal = math.divide_no_nan(signal_likelihood, (signal_likelihood + noise_likelihood))\n",
" probability_signal = math.safe_div(signal_likelihood, (signal_likelihood + noise_likelihood))\n",
" action = math.where((0.5 >= probability_signal) | (probability_signal >= 0.68), 2 * (probability_signal - 0.5), 0.) # 1 sigma required to take action\n",
" prob_kernel = math.exp(log_kernel * action)\n",
" return prob_kernel, probability_signal\n",
@@ -310,7 +310,7 @@
"BATCH = batch(batch=128)\n",
"STEPS = 50\n",
"\n",
"math.seed(0)\n",
"#math.seed(0)\n",
"net = u_net(1, 1)\n",
"optimizer = adam(net, 0.001)"
]
@@ -347,7 +347,7 @@
"def loss_function(net, x_gt: CenteredGrid, sip: bool):\n",
" y_target = diffuse.fourier(x_gt, 8., 1)\n",
" with math.precision(32):\n",
" prediction = field.native_call(net, field.to_float(y_target)).vector[0]\n",
" prediction = field.native_call(net, field.to_float(y_target))\n",
" prediction += field.mean(x_gt) - field.mean(prediction)\n",
" x = field.stop_gradient(prediction)\n",
" if sip:\n",
@@ -459,7 +459,7 @@
}
],
"source": [
"math.seed(0)\n",
"#math.seed(0)\n",
"net_gd = u_net(1, 1)\n",
"optimizer_gd = adam(net_gd, 0.001)\n",
"\n",
@@ -648,4 +648,4 @@
},
"nbformat": 4,
"nbformat_minor": 0
}
}

View File

@@ -126,7 +126,7 @@ The value of $\xi$ determines the conditioning of $\mathcal P$ with large $\xi$
Here's an example of the resulting loss landscape for $y^*=(0.3, -0.5)$, $\xi=1$, $\phi=15^\circ$ that shows the entangling of the sine function for $x_1$ and linear change for $x_2$:
```{figure} resources/physgrad-sin-loss.png
```{figure} resources/physgrad-sin-loss.jpg
---
height: 200px
name: physgrad-sin-loss
@@ -137,7 +137,7 @@ Next we train a fully-connected neural network to invert this problem via equati
We'll compare SIP training using a saddle-free Newton solver to various state-of-the-art network optimizers.
For fairness, the best learning rate is selected independently for each optimizer.
When choosing $\xi=1$ the problem is perfectly conditioned. In this case all network optimizers converge, with Adam having a slight advantage. This is shown in the left graph:
```{figure} resources/physgrad-sin-time-graphs.png
```{figure} resources/physgrad-sin-time-graphs.jpg
---
height: 180px
name: physgrad-sin-time-graphs
@@ -154,7 +154,7 @@ While the evaluation of the Hessian inherently requires more computations, the p
By increasing $\xi$ while keeping $\phi=0$ fixed we can show how the conditioning continually influences the different methods,
as shown on the left here:
```{figure} resources/physgrad-sin-add-graphs.png
```{figure} resources/physgrad-sin-add-graphs.jpg
---
height: 180px
name: physgrad-sin-add-graphs

File diff suppressed because one or more lines are too long

View File

@@ -3,12 +3,13 @@ Discussion of Physical Losses
The good news so far is - we have a DL method that can include
physical laws in the form of soft constraints by minimizing residuals.
However, as the very simple previous example illustrates, this is just a conceptual
starting point.
However, as the very simple previous example illustrates, this causes
new difficulties, and is just a conceptual starting point.
On the positive side, we can leverage DL frameworks with backpropagation to compute
the derivatives of the model. At the same time, this puts us at the mercy of the learned
representation regarding the reliability of these derivatives. Also, each derivative
the derivatives of the model. At the same time, this makes the loss landscape more
complicated, and ties the reliability of the derivatives to the learned
representation. Also, each derivative
requires backpropagation through the full network. This can be very expensive, especially
for higher-order derivatives.
@@ -16,16 +17,12 @@ And while the setup is relatively simple, it is generally difficult to control.
has flexibility to refine the solution by itself, but at the same time, tricks are necessary
when it doesn't focus on the right regions of the solution.
## Is it "Machine Learning"?
## Generalization?
One question that might also come to mind at this point is: _can we really call it machine learning_?
Of course, such denomination questions are superficial - if an algorithm is useful, it doesn't matter
what name it has. However, here the question helps to highlight some important properties
that are typically associated with algorithms from fields like machine learning or optimization.
One main reason _not_ to call the optimization of the previous notebook machine learning (ML), is that the
One aspect to note with the previous PINN optimization is that the
positions where we test and constrain the solution are the final positions we are interested in.
As such, there is no real distinction between training, validation and test sets.
As such, from a classic ML standpoint, there is no real distinction between training, validation and test sets.
Computing the solution for a known and given set of samples is much more akin to classical optimization,
where inverse problems like the previous Burgers example stem from.
@@ -33,7 +30,8 @@ For machine learning, we typically work under the assumption that the final perf
model will be evaluated on a different, potentially unknown set of inputs. The _test data_
should usually capture such _out of distribution_ (OOD) behavior, so that we can make estimates
about how well our model will generalize to "real-world" cases that we will encounter when
we deploy it in an application.
we deploy it in an application. The v1 version, using a prescribed discretization, actually
had this property, and could generalize to new inputs.
In contrast, for the PINN training as described here, we reconstruct a single solution in a known
and given space-time region. As such, any samples from this domain follow the same distribution
@@ -47,26 +45,27 @@ have to start training the NN from scratch.
## Summary
Thus, the physical soft constraints allow us to encode solutions to
PDEs with the tools of NNs.
An inherent drawback of this variant 2 is that it yields single solutions,
and that it does not combine with traditional numerical techniques well.
PDEs with the tools of NNs. As they're more widely used, we'll focus on PINNs (v2) here:
An inherent drawback is that they typically yield single solutions or very narrow solution manifolds,
and that they do not combine with traditional numerical techniques well.
In comparison to the Neural surrogates/operators from {doc}`supervised` we've made a step backwards in some way.
E.g., the learned representation is not suitable to be refined with
a classical iterative solver such as the conjugate gradient method.
This means many
powerful techniques that were developed in the past decades cannot be used in this context.
Bringing these numerical methods back into the picture will be one of the central
goals of the next sections.
✅ Pro:
- Uses physical model.
- Derivatives can be conveniently computed via backpropagation.
- Uses physical model
- Derivatives can be conveniently computed via backpropagation
❌ Con:
- Quite slow ...
- Physical constraints are enforced only as soft constraints.
- Largely incompatible with _classical_ numerical methods.
- Accuracy of derivatives relies on learned representation.
- Problematic convergence
- Physical constraints are enforced only as soft constraints
- Largely incompatible with _classical_ numerical methods
- Usefulness of derivatives relies on learned representation
To address these issues,
we'll next look at how we can leverage existing numerical methods to improve the DL process

869
physicalloss-div.ipynb Normal file

File diff suppressed because one or more lines are too long


@@ -2,9 +2,9 @@ Physical Loss Terms
=======================
The supervised setting of the previous sections can quickly
yield approximate solutions with a fairly simple training process. However, what's
quite sad to see here is that we only use physical models and numerical methods
as an "external" tool to produce a big pile of data 😢.
yield approximate solutions with a simple and stable training process. However, it's
unfortunate that we only use physical models and numerical methods
as an "external" tool to produce lots of data 😢.
We as humans have a lot of knowledge about how to describe physical processes
mathematically. As the following chapters will show, we can improve the
@@ -23,20 +23,21 @@ $$
\mathbf{u}_t = \mathcal F ( \mathbf{u}_{x}, \mathbf{u}_{xx}, ... \mathbf{u}_{xx...x} ) ,
$$
where the $_{\mathbf{x}}$ subscripts denote spatial derivatives with respect to one of the spatial dimensions
of higher and higher order (this can of course also include mixed derivatives with respect to different axes). $\mathbf{u}_t$ denotes the changes over time.
In this context, we can approximate the unknown $\mathbf{u}$ itself with a neural network. If the approximation, which we call $\tilde{\mathbf{u}}$, is accurate, the PDE should be satisfied naturally. In other words, the residual R should be equal to zero:
where the $_{\mathbf{x}}$ subscripts denote spatial derivatives with respect to the spatial dimensions
(this could of course also include mixed derivatives with respect to different axes). $\mathbf{u}_t$ denotes the changes over time.
Given a solution $\mathbf{u}$, we can compute the residual R, which naturally should be equal to zero for a correct solution:
$$
R = \mathbf{u}_t - \mathcal F ( \mathbf{u}_{x}, \mathbf{u}_{xx}, ... \mathbf{u}_{xx...x} ) = 0 .
$$
In this context, we can approximate the unknown $\mathbf{u}$ itself with a neural network.
If the approximation is accurate, the PDE residual should likewise be zero.
This nicely integrates with the objective for training a neural network: we can train for
minimizing this residual in combination with direct loss terms.
Similar to before, we can use pre-computed solutions
$[x_0,y_0], ...[x_n,y_n]$ for $\mathbf{u}$ with $\mathbf{u}(\mathbf{x})=y$ as constraints
in addition to the residual terms.
In addition to relying on the residual, we can use pre-computed solutions
$[x_0,y_0], ...[x_n,y_n]$ for $\mathbf{u}$ with $\mathbf{u}(\mathbf{x})=y$ as targets.
This is typically important, as most practical PDEs do not have unique solutions
unless initial and boundary conditions are specified. Hence, if we only consider $R$ we might
get solutions with random offset or other undesirable components. The supervised sample points
@@ -51,19 +52,22 @@ where $\alpha_{0,1}$ denote hyperparameters that scale the contribution of the s
the residual term, respectively. We could of course add additional residual terms with suitable scaling factors here.
It is instructive to note what the two different terms in equation {eq}`physloss-training` mean: The first term is a conventional, supervised L2-loss. If we were to optimize only this loss, our network would learn to approximate the training samples well, but might average multiple modes in the solutions, and do poorly in regions in between the sample points.
If we, instead, were to optimize only the second term (the physical residual), our neural network might be able to locally satisfy the PDE, but still could produce solutions that are still far away from our training data. This can happen due to "null spaces" in the solutions, i.e., different solutions that all satisfy the residuals.
If we, instead, were to optimize only the second term (the physical residual), our neural network might be able to locally satisfy the PDE, but
could have large difficulties finding a solution that fits globally.
This can happen due to "null spaces" in the solutions, i.e., different solutions that all satisfy the residuals. Then different local regions can converge
to different solutions, which in combination yield a very suboptimal one.
Therefore, we optimize both objectives simultaneously such that, in the best case, the network learns to approximate the specific solutions of the training data while still capturing knowledge about the underlying PDE.
Note that, similar to the data samples used for supervised training, we have no guarantees that the
residual terms $R$ will actually reach zero during training. The non-linear optimization of the training process
will minimize the supervised and residual terms as much as possible, but there is no guarantee. Large, non-zero residual
contributions can remain. We'll look at this in more detail in the upcoming code example, for now it's important
to remember that physical constraints in this way only represent _soft constraints_, without guarantees
to keep in mind that the physical constraints formulated this way only represent _soft constraints_, without guarantees
of minimizing these constraints.
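As a minimal sketch, the combined objective of equation {eq}`physloss-training` is just a weighted sum of the supervised term and the residual term. The arrays below are placeholders standing in for network outputs, pre-computed targets, and residual values, and the weights are illustrative assumptions:

```python
import numpy as np

# Minimal sketch of the combined objective: a supervised L2 term on sample
# points plus a physical residual term. Data, residuals, and the weights
# alpha0/alpha1 are placeholder assumptions for illustration.
rng = np.random.default_rng(1)
u_pred   = rng.normal(size=16)          # network outputs at sample points
u_target = rng.normal(size=16)          # pre-computed solution values y_i
R        = rng.normal(size=64) * 0.1    # PDE residual at collocation points

alpha0, alpha1 = 1.0, 0.1               # hyperparameters scaling the two terms
loss = alpha0 * np.mean((u_pred - u_target)**2) + alpha1 * np.mean(R**2)
print(loss)
```

In practice both terms are minimized jointly by the optimizer; the balance between $\alpha_0$ and $\alpha_1$ is a tuning choice.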
The previous overview did not really make clear how an NN produces $\mathbf{u}$.
We can distinguish two different approaches here:
via a chosen explicit representation of the target function (v1 in the following), or via using fully-connected neural networks to represent the solution (v2).
via a chosen explicit representation of the target function (v1 in the following), or with a _Neural field_ based on fully-connected neural networks to represent the solution (v2).
E.g., for v1 we could set up a _spatial_ grid (or graph, or a set of sample points), while in the second case no explicit representation exists, and the NN instead receives the _spatial coordinate_ to produce the solution at a query position.
We'll outline these two variants in more detail in the following.
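To make the v2 idea concrete, here is a minimal, untrained sketch of such a coordinate-based network in NumPy. Layer sizes and the random weights are illustrative assumptions; the actual examples later use a DL framework so that derivatives can be queried via autodiff:

```python
import numpy as np

# Sketch of a v2 "neural field": a small fully-connected network mapping a
# space-time coordinate (x, t) to the solution value u(x, t). Weights are
# random here; training them is the subject of the following sections.
rng = np.random.default_rng(0)

def init_mlp(sizes):
    return [(rng.normal(0, 1/np.sqrt(m), (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def u_field(coords, params):
    h = coords                      # shape (batch, 2) for (x, t)
    for i, (W, b) in enumerate(params):
        h = h @ W + b
        if i < len(params) - 1:
            h = np.tanh(h)          # smooth activation -> smooth field
    return h

params = init_mlp([2, 32, 32, 1])   # illustrative layer sizes
xt = np.stack([np.linspace(-1, 1, 5), np.full(5, 0.5)], axis=-1)
u = u_field(xt, params)             # query u at 5 positions for t = 0.5
print(u.shape)                      # one value per query coordinate
```

Note that no grid exists anywhere: the network is simply evaluated at whatever coordinates we query.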
@@ -96,30 +100,28 @@ To learn this decomposition, we can approximate $p$ with a CNN on our computatio
$\nabla \cdot \big( \mathbf{u}(0) - \nabla f(\mathbf{u}(0);\theta) \big)$.
To implement this residual, all we need to do is provide the divergence operator $(\nabla \cdot)$ of $\mathbf u$ on our computational mesh. This is typically easy to do via
a convolutional layer in the DL framework that contains the finite difference weights for the divergence.
Nicely enough, in this case we don't even need additional supervised samples, and can typically purely train with this residual formulation. Also, in contrast to variant 2 below, we can directly handle fairly large spaces of solutions here (we're not restricted to learning single solutions)
Nicely enough, in this case we don't even need additional supervised samples, and can typically purely train with this residual formulation. Also, in contrast to variant 2 below, we can directly handle fairly large spaces of solutions here (we're not restricted to learning single solutions).
An example implementation can be found in this [code repository](https://github.com/tum-pbs/CG-Solver-in-the-Loop).
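As a small illustration of the key ingredient of variant 1, the following NumPy sketch applies a central-difference divergence stencil (the same fixed weights a convolutional layer would carry) on a regular grid. The divergence-free test field built from a stream function is an assumption chosen for this check:

```python
import numpy as np

# Divergence via periodic central differences, equivalent to a fixed
# 3-point convolution stencil. Checked on an analytically divergence-free
# field u = (d psi/dy, -d psi/dx) with psi = sin(x) sin(y).
n = 64
x = np.linspace(0, 2*np.pi, n, endpoint=False)
X, Y = np.meshgrid(x, x, indexing='ij')
dx = x[1] - x[0]

ux =  np.sin(X) * np.cos(Y)
uy = -np.cos(X) * np.sin(Y)

def divergence(ux, uy, dx):
    dudx = (np.roll(ux, -1, 0) - np.roll(ux, 1, 0)) / (2*dx)
    dvdy = (np.roll(uy, -1, 1) - np.roll(uy, 1, 1)) / (2*dx)
    return dudx + dvdy

div = divergence(ux, uy, dx)
print(np.abs(div).max())  # numerically zero for this divergence-free field
```

In the variant-1 training, exactly such an operator is applied to $\mathbf{u}(0) - \nabla f(\mathbf{u}(0);\theta)$ to obtain the residual.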
Overall, this variant 1 has a lot in common with _differentiable physics_ training (it's basically a subset). As we'll discuss differentiable physics in a lot more detail
in {doc}`diffphys` and after, we'll focus on direct NN representations (variant 2) from now on.
Overall, this variant 1 has a lot in common with _differentiable physics_ training (it's basically a subset) that will be covered with a lot more detail in {doc}`diffphys`. Hence, we'll focus a bit more on direct NN representations (variant 2) in this chapter.
---
## Variant 2: Derivatives from a neural network representation
The second variant of employing physical residuals as soft constraints
instead uses fully connected NNs to represent $\mathbf{u}$. This _physics-informed_ approach was popularized by Raissi et al. {cite}`raissi2019pinn`, and has some interesting pros and cons that we'll outline in the following. We will target this physics-informed version (variant 2) in the following code examples and discussions.
instead uses fully connected NNs to represent $\mathbf{u}$. This _physics-informed_ (PINN) approach was popularized by Raissi et al. {cite}`raissi2019pinn`, and has some interesting pros and cons that we'll outline in the following. By now, this approach can be seen as part of the _Neural field_ representations that e.g. also include NeRFs and learned signed distance functions.
The central idea here is that the aforementioned general function $f$ that we're after in our learning problems
The central idea with Neural fields is that the aforementioned general function $f$ that we're after
can also be used to obtain a representation of a physical field, e.g., a field $\mathbf{u}$ that satisfies $R=0$. This means $\mathbf{u}(\mathbf{x})$ will
be turned into $\mathbf{u}(\mathbf{x}, \theta)$ where we choose the NN parameters $\theta$ such that a desired $\mathbf{u}$ is
represented as precisely as possible.
represented as precisely as possible, and $\mathbf{u}$ simply returns the right value at spatial location $\mathbf{x}$.
One nice side effect of this viewpoint is that NN representations inherently support the calculation of derivatives.
One nice side effect of this viewpoint is that NN representations inherently support the calculation of derivatives w.r.t. inputs.
The derivative $\partial f / \partial \theta$ was a key building block for learning via gradient descent, as explained
in {doc}`overview`. Now, we can use the same tools to compute spatial derivatives such as $\partial \mathbf{u} / \partial x$,
in {doc}`overview`. Now, we can use the same tools to compute spatial derivatives such as $\partial \mathbf{u} / \partial x = \partial f / \partial x$.
Note that above for $R$ we've written this derivative in the shortened notation as $\mathbf{u}_{x}$.
For functions over time this of course also works for $\partial \mathbf{u} / \partial t$, i.e. $\mathbf{u}_{t}$ in the notation above.
For functions over time this of course also works by adding $t$ as input to compute $\partial \mathbf{u} / \partial t$, i.e. $\mathbf{u}_{t}$ in the notation above.
```{figure} resources/physloss-overview-v2.jpg
---
@@ -139,14 +141,22 @@ To pick a simple example, Burgers equation in 1D,
$\frac{\partial u}{\partial{t}} + u \nabla u = \nu \nabla \cdot \nabla u $ , we can directly
formulate a loss term $R = \frac{\partial u}{\partial t} + u \frac{\partial u}{\partial x} - \nu \frac{\partial^2 u}{\partial x^2}$ that should be minimized as much as possible at training time. For each of the terms, e.g. $\frac{\partial u}{\partial x}$,
we can simply query the DL framework that realizes $u$ to obtain the corresponding derivative.
For higher order derivatives, such as $\frac{\partial^2 u}{\partial x^2}$, we can simply query the derivative function of the framework multiple times. In the following section, we'll give a specific example of how that works in tensorflow.
For higher order derivatives, such as $\frac{\partial^2 u}{\partial x^2}$, we can query the derivative function of the framework multiple times.
In the following section, we'll give a specific example of how that works in TensorFlow.
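As a sanity check of the residual itself, independent of any network, the following NumPy sketch evaluates $R$ with finite differences in place of autodiff. The exact traveling-wave solution of Burgers equation and the step sizes are assumptions for this illustration only:

```python
import numpy as np

# Burgers residual R = u_t + u u_x - nu u_xx, evaluated numerically for the
# exact traveling-wave solution u(x,t) = 1 - tanh((x - t)/(2 nu)).
# R should vanish for this exact solution.
nu = 0.1
u = lambda x, t: 1.0 - np.tanh((x - t) / (2.0 * nu))

x = np.linspace(-2.0, 2.0, 41)
t, h = 0.5, 1e-4
u_t  = (u(x, t + h) - u(x, t - h)) / (2 * h)       # time derivative
u_x  = (u(x + h, t) - u(x - h, t)) / (2 * h)       # first spatial derivative
u_xx = (u(x + h, t) - 2 * u(x, t) + u(x - h, t)) / h**2  # second derivative
R = u_t + u(x, t) * u_x - nu * u_xx
print(np.abs(R).max())  # close to zero for the exact solution
```

For a network-based $u$, each of these finite-difference quotients is replaced by a query to the framework's derivative function.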
## Summary so far
The approach above gives us a method to include physical equations into DL learning as a soft constraint: the residual loss.
Typically, this setup is suitable for _inverse problems_, where we have certain measurements or observations
for which we want to find a PDE solution. Because of the high cost of the reconstruction (to be
demonstrated in the following), the solution manifold shouldn't be overly complex. E.g., it is typically not possible
to capture a wide range of solutions, such as with the previous supervised airfoil example, by only using a physical residual loss.
While v1 relies on an inductive bias in the form of a discretization, v2 relies on derivatives computed via autodiff.
Typically, v2 is especially suitable for _inverse problems_, where we have certain measurements or observations
for which we want to find a PDE solution.
Because of the ill-posedness of the optimization and learning problem,
and the high cost of the reconstruction (to be
demonstrated in the following), the solution manifold shouldn't be overly complex for these PINN approaches.
E.g., it is typically very difficult to capture time dependence or a wide range of solutions,
such as with the previous supervised airfoil example.
Next, we'll demonstrate these concepts with code: first, we'll show how learning the Helmholtz decomposition works out in
practice with a **v1**-approach. Afterwards, we'll illustrate the **v2** PINN-approaches with a practical example.

1180
probmodels-ddpm-fm.ipynb Normal file

File diff suppressed because one or more lines are too long

743
probmodels-diffusion.ipynb Normal file

File diff suppressed because one or more lines are too long

26
probmodels-discuss.md Normal file

@@ -0,0 +1,26 @@
Discussion of Probabilistic Learning
=======================
As the previous sections have demonstrated, probabilistic learning offers a wide range of very exciting possibilities in the context of physics-based learning. First, these methods come with a highly interesting and well developed theory. Surprisingly, some parts of this theory are actually better developed than basic questions about simpler learning approaches.
At the same time, they enable a fundamentally different way to work with simulations: they provide a simple way to work with complex distributions of solutions. This is of huge importance for inverse problems, e.g. in the context of obtaining likelihood-based estimates for _simulation-based inference_.
![Divider](resources/divider-gen1.jpg)
That being said, diffusion-based approaches show relatively few advantages for deterministic settings: they are not more accurate, and typically induce slightly larger computational costs. An interesting exception is the long-term stability, as discussed in {doc}`probmodels-uncond`. To summarize the key aspects of probabilistic deep learning approaches:
✅ Pro:
- Enable training and inference for distributions
- Well developed theory
- Stable training
❌ Con:
- (Slightly) increased inference cost
- No real advantage for deterministic settings
One more concluding recommendation: if your problem contains ambiguities, diffusion modeling in the form of _flow matching_ is the method of choice. If your data contains reliable input-output pairs, go with simpler _deterministic training_ instead.
![Divider](resources/divider-gen3.jpg)
Next, we can turn to a new viewpoint on learning problems, the field of _reinforcement learning_. As the next sections will point out, it is actually not so different from the topics of the previous chapters despite the new viewpoint.

0
probmodels-dppm-fm.ipynb Normal file

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

201
probmodels-graph.md Normal file

@@ -0,0 +1,201 @@
Graph-based Diffusion Models
=======================
Similar to classical numerics, regular grids are ideal for certain situations, but sub-optimal for others. Diffusion models are no different, but luckily the concepts of the previous sections do carry over when replacing the regular grids with graphs. Importantly, denoising and flow matching work similarly well on unstructured Eulerian meshes, as will be demonstrated below. This test case will illustrate another important aspect: diffusion models excel at _completing_ data distributions. I.e., even when the training data has an incomplete distribution for a single example (defined by the geometry of the physical domain, boundary conditions and physical parameters), the "global" view of learning from different examples lets the networks _complete_ the posterior distribution over the course of seeing partial data for many different examples.
Simulation problems like fluid flows are often poorly represented by a single mean solution. E.g., for many practical applications involving turbulence, it is crucial to **access the full distribution of possible flow states**, from which relevant statistics (e.g., RMS and two-point correlations) can be derived. This is where diffusion models can leverage their strengths: instead of having to simulate a lengthy transient phase to converge towards an equilibrium state, diffusion models can completely skip the transient warm-up, and directly produce the desired samples. Hence, this allows for computing the relevant flow statistics very efficiently compared to classic solvers.
## Diffusion Graph Net (DGN)
In the following, we'll demonstrate these capabilities based on the _diffusion graph net_ (DGN) approach {cite}`lino2025dgn`, the full source code for which [can be found here](https://github.com/tum-pbs/dgn4cfd/).
To learn the probability distribution of dynamical states of physical systems, defined by their discretization mesh and their physical parameters, the DDPM and flow matching frameworks can directly be applied to the mesh nodes. Additionally, DGN introduces a second model variant, which operates in a pre-trained semantic _latent space_ rather than directly in the physical space (this variant will be called LDGN).
In contrast to relying on regular grid discretizations as in previous sections, the system's geometry is now represented using a mesh with nodes $\mathcal{V}_M$ and edges ${\mathcal{E}}_M$, where each node $i$ is located at ${x}_i$. The system's state at time $t$, ${Y}(t)$, is defined by $F$ continuous fields sampled at the mesh nodes: ${Y}(t) := \{ {y}_i(t) \in \mathbb{R}^{F} \ | \ i \in {\mathcal{V}}_M \}$, with the short form ${y}_i(t) \equiv {y}({x}_i,t)$. Simulators evolve the system through a sequence of states, $\mathcal{Y} = \{{Y}(t_0), {Y}(t_1), \dots, {Y}(t_n), \dots \}$, starting from an initial state ${Y}(t_0)$.
We assume that after an initial transient phase, the system reaches a statistical equilibrium. In this stage, statistical measures of ${Y}$, computed over sufficiently long time intervals, are time-invariant, even if the dynamics display oscillatory or chaotic behavior. The states in the equilibrium stage, ${\mathcal{Z}} \subset {\mathcal{Y}}$, depend only on the system's geometry and physical parameters, and not on its initial state. This is illustrated in the following picture.
```{figure} resources/probmodels-graph-over.jpg
---
height: 180px
name: probmodels-graph-over
---
(a) DGN learns the probability distribution of the systems' converged states provided only a short trajectory of length $\delta \ll T$ per system. (b) An example with a turbulent wing experiment. The distribution learned by the LDGN model accurately captures the variance of all states (bottom right), despite seeing only an incomplete distribution for each wing during training (top right).
```
In many engineering applications, such as aerodynamics and structural vibrations, the primary focus is not on each individual state along the trajectory, but rather on the statistics that characterize the system's dynamics. However, simulating a trajectory of converged states $\mathcal{Z}$ long enough to accurately capture these statistics can be very expensive, especially for real-world problems involving 3D chaotic systems. The DGN approach described in the following aims to directly sample converged states ${Z}(t) \in \mathcal{Z}$ without simulating the initial transient phase. Subsequently, we can analyze the system's dynamics by drawing enough samples.
Given a dataset of short trajectories from $N$ systems, $\mathfrak{Z} = \{\mathcal{Z}_1, \mathcal{Z}_2, ..., \mathcal{Z}_N\}$, the goal in the following is to learn a probabilistic model of $\mathfrak{Z}$ that enables sampling of a converged state ${Z}(t) \in \mathcal{Z}$, conditioned on the system's mesh, boundary conditions, and physical parameters. This model must capture the underlying probability distributions even when trained on trajectories that are too short to fully characterize their individual statistics. Although this is an ill-posed problem, given sufficient training trajectories, diffusion models on graphs manage to uncover the statistical correlations and shared patterns, enabling interpolation across the condition space.
## Diffusion on Graphs
We'll use DDPM (and later flow matching) to generate states ${Z}(t)$ by denoising a sample ${Z}^R \in \mathbb{R}^{|\mathcal{V}_M| \times F}$ drawn from an isotropic Gaussian distribution. The system's conditional information is encoded in a directed graph ${\mathcal{G}} :=({\mathcal{V}}, {\mathcal{E}})$, where ${\mathcal{V}} \equiv {\mathcal{V}}_M$ and the mesh edges ${\mathcal{E}}_M$ are represented as bi-directional graph edges ${\mathcal{E}}$. Node attributes ${V}_c = \{{v}_{i}^c \ | \ i \in {\mathcal{V}} \}$ and edge attributes ${E}_c = \{{e}_{ij}^c \ | \ (i,j) \in {\mathcal{E}} \}$ encode the conditional features, including the relative positions between adjacent nodes, ${x}_j - {x}_i$.
In the *diffusion* (or *forward*) process, node features from ${Z}^1 \in \mathbb{R}^{|\mathcal{V}| \times F}$ to ${Z}^R \in \mathbb{R}^{|\mathcal{V}| \times F}$ are generated by sequentially adding Gaussian noise:
$$
q({Z}^r|{Z}^{r-1})=\mathcal{N}({Z}^r; \sqrt{1-\beta_r} {Z}^{r-1}, \beta_r \mathbf{I}),
$$
where $\beta_r \in (0,1)$, and $Z^0 \equiv Z(t)$. Any ${Z}^r$ can be sampled directly via:
$$
{Z}^r = \sqrt{\bar{\alpha}_r} {Z}^0 + \sqrt{1-\bar{\alpha}_r} {\epsilon},
$$
with $\alpha_r := 1 - \beta_r$, $\bar{\alpha}_r := \prod_{s=1}^r \alpha_s$ and ${\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$.
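A minimal NumPy sketch of this closed-form sampling, with an assumed linear $\beta_r$ schedule and illustrative node/feature counts:

```python
import numpy as np

# Closed-form forward process on mesh-node features:
# Z^r = sqrt(abar_r) Z^0 + sqrt(1 - abar_r) eps.
# Schedule, step count, and sizes are illustrative assumptions.
rng = np.random.default_rng(0)
R_steps = 100
beta = np.linspace(1e-4, 0.02, R_steps)   # beta_r in (0, 1)
alpha = 1.0 - beta
abar = np.cumprod(alpha)                  # abar_r = prod_s alpha_s

Z0  = rng.normal(size=(500, 3))           # |V| = 500 nodes, F = 3 fields
eps = rng.normal(size=Z0.shape)
r = 60
Zr = np.sqrt(abar[r]) * Z0 + np.sqrt(1.0 - abar[r]) * eps

# Given the true eps, the closed form can be inverted exactly:
Z0_rec = (Zr - np.sqrt(1.0 - abar[r]) * eps) / np.sqrt(abar[r])
print(np.abs(Z0_rec - Z0).max())  # ~0: forward map is invertible given eps
```

The denoising network of course never sees the true ${\epsilon}$; it has to predict it, as described next.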
The denoising process removes noise through learned Gaussian transitions:
$$
p_\theta({Z}^{r-1}|{Z}^r) =\mathcal{N} ({Z}^{r-1}; {\mu}_\theta^r, {\Sigma}_\theta^r),
$$
where the mean and variance are parameterized as:
$$
\begin{aligned}
{\mu}_\theta^r = \frac{1}{\sqrt{\alpha_r}} \left( {Z}^r - \frac{\beta_r}{\sqrt{1-\bar{\alpha}_r}} {\epsilon}_\theta^r \right),
\qquad
{\Sigma}_\theta^r = \exp\left( \mathbf{v}_\theta^r \log \beta_r + (1-\mathbf{v}_\theta^r)\log \tilde{\beta}_r \right),
\end{aligned}
$$
with $\tilde{\beta}_r := (1 - \bar{\alpha}_{r-1}) / (1 - \bar{\alpha}_r) \beta_r$. Here, ${\epsilon}_\theta^r \in \mathbb{R}^{|\mathcal{V}| \times F}$ predicts the noise ${\epsilon}$ from the forward-process sampling equation above, and $\mathbf{v}_\theta^r \in \mathbb{R}^{|\mathcal{V}| \times F}$ interpolates between the two bounds of the process' entropy, $\beta_r$ and $\tilde{\beta}_r$.
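The mean parameterization can be checked numerically: if the true noise ${\epsilon}$ is plugged in for ${\epsilon}_\theta^r$, then ${\mu}_\theta^r$ coincides with the mean of the closed-form posterior $q({Z}^{r-1}|{Z}^r,{Z}^0)$. A sketch with an assumed schedule and illustrative sizes:

```python
import numpy as np

# Consistency check of the denoising mean: with the true eps standing in
# for eps_theta, mu_theta^r equals the posterior mean of q(Z^{r-1}|Z^r,Z^0).
rng = np.random.default_rng(1)
beta = np.linspace(1e-4, 0.02, 100)       # assumed linear schedule
alpha = 1.0 - beta
abar = np.cumprod(alpha)

Z0  = rng.normal(size=(200, 2))
eps = rng.normal(size=Z0.shape)
r = 40                                    # some intermediate step, r >= 1
Zr = np.sqrt(abar[r]) * Z0 + np.sqrt(1 - abar[r]) * eps

# mu_theta^r with the true noise plugged in
mu_theta = (Zr - beta[r] / np.sqrt(1 - abar[r]) * eps) / np.sqrt(alpha[r])
# closed-form posterior mean of q(Z^{r-1} | Z^r, Z^0)
mu_post = (np.sqrt(abar[r-1]) * beta[r] * Z0
           + np.sqrt(alpha[r]) * (1 - abar[r-1]) * Zr) / (1 - abar[r])
print(np.abs(mu_theta - mu_post).max())  # ~0: the parameterizations agree
```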
DGNs predict ${\epsilon}_\theta^r$ and $\mathbf{v}_\theta^r$ using a regular message-passing-based GNN {cite}`sanchez2020learning`. This takes ${Z}^{r-1}$ as input, and it is conditioned on the graph ${\mathcal{G}}$, its node and edge features, and the diffusion step $r$:
$$
\begin{aligned}
[{\epsilon}_\theta^r, \mathbf{v}_\theta^r] \leftarrow \text{{DGN}}_\theta({Z}^{r-1}, {\mathcal{G}}, {V}_c, {E}_c, r).
\end{aligned}
$$
The _DGN_ network is trained using the hybrid loss function proposed in *"Improved Denoising Diffusion Probabilistic Models"* by Nichol et al. The full denoising process requires $R$ evaluations of the DGN to transition from ${Z}^R$ to ${Z}^0$.
DGN follows the widely used encoder-processor-decoder GNN architecture. In addition to the node and edge encoders, the encoder includes a diffusion-step encoder, which generates a vector ${r}_\text{emb} \in \mathbb{R}^{F_\text{emb}}$ that embeds the diffusion step $r$. The node encoder processes the conditional node features ${v}_i^c$, alongside ${r}_\text{emb}$. Specifically, the diffusion-step encoder and the node encoder operate as follows:
$$
\begin{aligned}
{r}_\text{emb} \leftarrow
\phi \circ {\small \text{Linear}} \circ {\small \text{SinEmb}} (r),
\quad
{v}_i \leftarrow {\small \text{Linear}} \left( \left[ \phi \circ {\small \text{Linear}} ({v}_i^c) \ | \ {r}_\text{emb}
\right] \right),
\quad \forall i \in \mathcal{V},
\end{aligned}
$$
where $\phi$ denotes the activation function and ${\small \text{SinEmb}}$ is the sinusoidal embedding function. The edge encoder applies a linear layer to the conditional edge features ${e}_{ij}^c$.
The encoded node and edge features are $\mathbb{R}^{F_{h}}$-dimensional vectors ($F_\text{emb} = 4 \times F_h$). We condition each message-passing layer on $r$ by projecting ${r}_\text{emb}$ to an $F_{h}$-dimensional space and adding the result to the node features before each of these layers — i.e., ${v}_i \leftarrow {v}_i + {\small \text{Linear}}({r}_\text{emb})$. Each message-passing layer follows:
$$
\begin{aligned}
\mathbf{e}_{ij} &\leftarrow W_e \mathbf{e}_{ij} + \text{MLP}^e \left( \text{LN} \left([\mathbf{e}_{ij}|\mathbf{v}_{i}|\mathbf{v}_{j}] \right) \right), \qquad \forall (i,j) \in \displaystyle \mathcal{E},\\
\bar{\mathbf{e}}_{j} &\leftarrow \sum_{i \in \mathcal{N}^-_j} \mathbf{e}_{ij}, \qquad \forall j \in \displaystyle \mathcal{V},\\
\mathbf{v}_j &\leftarrow W_v \mathbf{v}_j + \text{MLP}^v \left( \text{LN} \left( [\bar{\mathbf{e}}_{j} | \mathbf{v}_j]\right) \right), \qquad \forall j \in \displaystyle \mathcal{V}.
\end{aligned}
$$
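A compact NumPy sketch of one such message-passing layer on a toy graph. Single linear layers with a ReLU stand in for the MLPs, a simple normalization stands in for LN, and all weights are random, illustrative assumptions:

```python
import numpy as np

# One message-passing layer following the three update equations:
# edge update from [e_ij | v_i | v_j], sum-aggregation over incoming
# edges, node update from [ebar_j | v_j].
rng = np.random.default_rng(0)
F = 8                                     # hidden feature size F_h
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]  # bi-directional graph edges
V = rng.normal(size=(3, F))               # node features v_i
E = rng.normal(size=(len(edges), F))      # edge features e_ij

ln = lambda z: (z - z.mean(-1, keepdims=True)) / (z.std(-1, keepdims=True) + 1e-6)
relu = lambda z: np.maximum(z, 0.0)
We, Wv = rng.normal(size=(F, F)), rng.normal(size=(F, F))
Me, Mv = rng.normal(size=(3 * F, F)), rng.normal(size=(2 * F, F))
src = [i for i, j in edges]
dst = [j for i, j in edges]

# edge update: e_ij <- W_e e_ij + MLP^e(LN([e_ij | v_i | v_j]))
E = E @ We + relu(ln(np.concatenate([E, V[src], V[dst]], axis=-1))) @ Me
# aggregation: ebar_j <- sum of e_ij over incoming edges (i, j)
Ebar = np.zeros_like(V)
np.add.at(Ebar, dst, E)
# node update: v_j <- W_v v_j + MLP^v(LN([ebar_j | v_j]))
V = V @ Wv + relu(ln(np.concatenate([Ebar, V], axis=-1))) @ Mv
print(V.shape)  # updated node features, one row per node
```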
Previous work on graph-based diffusion models has used sequential message passing to propagate node features across the graph. However, this approach fails for large-scale phenomena, such as the flows studied in the context of DGN, as denoising of global features becomes bottlenecked by the limited reach of message passing.
To address this, a multi-scale GNN is adopted for the processor, applying message passing on ${\mathcal{G}}$ and multiple coarsened versions of it in a U-Net fashion. This design leverages the U-Net's effectiveness in removing both high- and low-frequency noise. To obtain each lower-resolution graph from its higher-resolution counterpart, we use Guillard's coarsening algorithm, originally developed for fast mesh coarsening in CFD applications. As in the conventional U-Net, pooling and unpooling operations, now based on message passing, are used to transition between higher- and lower-resolution graphs.
```{figure} resources/probmodels-graph-pooling.jpg
---
height: 200px
name: probmodels-graph-pooling
---
Message passing is applied on ${\mathcal{G}}$ and multiple coarsened versions of it in a U-Net fashion. The lower-resolution graphs are obtained using a mesh coarsening algorithm popularized in CFD applications.
```
## Diffusion in Latent Space
Diffusion models can also operate in a lower-dimensional graph-based representation that is perceptually equivalent to $\mathfrak{Z}$. This space is defined as the latent space of a Variational Graph Auto-Encoder (VGAE) trained to reconstruct ${Z}(t)$. We'll refer to a DGN trained on this latent space as a Latent DGN (LDGN).
```{figure} resources/probmodels-graph-arch.jpg
---
height: 220px
name: probmodels-graph-arch
---
(a) The VGAE consists of a condition encoder, a (node) encoder, and a (node) decoder. The multi-scale latent features from the condition encoder serve as conditioning inputs to both the encoder and the decoder. (b) During LDGN inference, Gaussian noise is sampled in the VGAE latent space and, after multiple denoising steps conditioned on the low-resolution outputs from the VGAE's condition encoder, transformed into the physical space by the VGAE's decoder.
```
In this configuration, the VGAE captures high-frequency information (e.g., spatial gradients and small vortices), while the LDGN focuses on modeling mid- to large-scale patterns (e.g., the wake and vortex street). By decoupling these two tasks, the generative learning process is simplified, allowing the LDGN to concentrate on more meaningful latent representations that are less sensitive to small-scale fluctuations. Additionally, during inference, the VGAE's decoder helps remove residual noise from the samples generated by the LDGN. This approach significantly reduces sampling costs since the LDGN operates on a smaller graph rather than directly on ${\mathcal{G}}$.
For the VGAE, an encoder-decoder architecture is used with an additional condition encoder to handle conditioning inputs. The condition encoder processes ${V}_c$ and ${E}_c$, encoding these into latent node features ${V}^\ell_c$ and edge features ${E}^\ell_c$ across $L$ graphs $\{{\mathcal{G}}^\ell := ({\mathcal{V}}^\ell, {\mathcal{E}}^\ell) \mid 1 \leq \ell \leq L\}$, where ${\mathcal{G}}^1 \equiv {\mathcal{G}}$ and the size of the graphs decreases progressively, i.e., $|{\mathcal{V}}^1| > |{\mathcal{V}}^2| > \dots > |{\mathcal{V}}^L|$. This transformation begins by linearly projecting ${V}_c$ and ${E}_c$ to a $F_\text{ae}$-dimensional space and applying two message-passing layers to yield ${V}^1_c$ and ${E}^1_c$. Then, $L-1$ encoding blocks are applied sequentially:
$$
\begin{aligned}
\left[{V}^{\ell+1}_c, {E}^{\ell+1}_c \right] \leftarrow {\small MP} \circ {\small MP} \circ {\small \text{GraphPool}} \left({V}^\ell_c, {E}^\ell_c \right), \quad \text{for} \ \ell = 1, 2, \dots, L-1,
\end{aligned}
$$
where _MP_ denotes a message-passing layer and _GraphPool_ denotes a graph-pooling layer.
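To make the structure of these blocks concrete, here is a minimal numpy sketch of one sum-aggregation message-passing step and a mean-based graph pooling. The aggregation rules, feature sizes, and the toy cluster assignment are illustrative assumptions, not the actual (L)DGN layers:

```python
import numpy as np

def message_passing(V, E, edges):
    # V: (num_nodes, F) node features; E: (num_edges, F) edge features
    # edges: (num_edges, 2) array of (sender, receiver) node indices
    senders, receivers = edges[:, 0], edges[:, 1]
    # edge update from the two incident nodes (sum as a simple aggregator)
    E_new = E + V[senders] + V[receivers]
    # node update: aggregate incoming edge messages at each receiver
    V_new = V.copy()
    np.add.at(V_new, receivers, E_new)
    return V_new, E_new

def graph_pool(V, cluster):
    # cluster[i] = index of the coarse node that fine node i maps to
    num_coarse = cluster.max() + 1
    V_coarse = np.zeros((num_coarse, V.shape[1]))
    counts = np.zeros(num_coarse)
    np.add.at(V_coarse, cluster, V)
    np.add.at(counts, cluster, 1.0)
    return V_coarse / counts[:, None]   # mean pooling

# toy graph: 4 nodes with 2 features, 3 edges, pooled into 2 coarse nodes
V = np.ones((4, 2))
E = np.zeros((3, 2))
edges = np.array([[0, 1], [1, 2], [2, 3]])
V1, E1 = message_passing(V, E, edges)
Vc = graph_pool(V1, cluster=np.array([0, 0, 1, 1]))
```

Stacking `message_passing` twice followed by `graph_pool` mirrors the structure of one encoding block above, just without learned weights.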
The encoder produces two $F_L$-dimensional vectors for each node $i \in {\mathcal{V}}^L$, the mean ${\mu}_i$ and standard deviation ${\sigma}_i$ that parametrize a Gaussian distribution over the latent space. It takes as input a state ${Z}(t)$, which is linearly projected to a $F_\text{ae}$-dimensional vector space and then passed through $L-1$ sequential down-sampling blocks (message passing + graph pooling), each conditioned on the outputs of the condition encoder:
$$
\begin{aligned}
{V} \leftarrow {\small \text{GraphPool}} \circ {\small MP} \circ {\small MP} \left( {V} + {\small \text{Linear}}\left({V}^\ell_c \right), {\small \text{Linear}}\left({E}^\ell_c \right) \right), \ \text{for} \ \ell = 1, 2, \dots, L-1;
\end{aligned}
$$
and a bottleneck block:
$$
\begin{aligned}
{V} \leftarrow {\small MP} \circ {\small MP} \left( {V} + {\small \text{Linear}}\left({V}^L_c \right), {\small \text{Linear}}\left({E}^L_c \right) \right).
\end{aligned}
$$
The output features are passed through a node-wise MLP that returns ${\mu}_i$ and ${\sigma}_i$ for each node $i \in {\mathcal{V}}^L$. The latent variables are then computed as ${\zeta}_i = {\small \text{BatchNorm}}({\mu}_i + {\sigma}_i {\epsilon}_i)$, where ${\epsilon}_i \sim \mathcal{N}(0, {I})$. Finally, the decoder mirrors the encoder, employing a symmetric architecture (replacing graph pooling with graph unpooling layers) to upsample the latent features back to the original graph ${\mathcal{G}}$. Its blocks are also conditioned on the outputs of the condition encoder. The message passing and the graph pooling and unpooling layers in the VGAE are the same as in the (L)DGN.
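The reparameterization step above can be sketched in a few lines of numpy; the feature sizes are arbitrary, and the simple per-feature normalization is an illustrative stand-in for the actual BatchNorm layer:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_latents(mu, sigma):
    # reparameterization trick: zeta = mu + sigma * eps with eps ~ N(0, I)
    eps = rng.standard_normal(mu.shape)
    zeta = mu + sigma * eps
    # per-feature normalization, a stand-in for the BatchNorm layer
    return (zeta - zeta.mean(axis=0)) / (zeta.std(axis=0) + 1e-6)

mu = np.zeros((100, 4))       # per-node means on the coarse graph
sigma = np.ones((100, 4))     # per-node standard deviations
zeta = sample_latents(mu, sigma)
```

Sampling via `mu + sigma * eps` instead of drawing from $\mathcal{N}(\mu, \sigma^2)$ directly keeps the operation differentiable w.r.t. $\mu$ and $\sigma$, which is what allows the VGAE to be trained end-to-end.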
The VGAE is trained to reconstruct states ${Z}(t) \in \mathfrak{Z}$ with a KL-penalty towards a standard normal distribution on the learned latent space. Once trained, the LDGN can be trained following the DGN approach from the previous sections. However, the objective is now to learn the distribution of the latent states ${\zeta}$, defined on the coarse graph ${\mathcal{G}}^L$, conditioned on the outputs ${V}^L_c$ and ${E}^L_c$ from the condition encoder.
During inference, the condition encoder generates the conditioning features ${V}^\ell_c$ and ${E}^\ell_c$ (for $\ell = 1, 2, \dots, L$), and after the LDGN completes its denoising steps, the decoder transforms the generated ${\zeta}_0$ back into the physical feature-space defined on ${\mathcal{G}}$.
Unlike in conventional VGAEs, the condition encoder is necessary because, at inference time, an encoding of ${V}_c$ and ${E}_c$ is needed on graph ${\mathcal{G}}^L$, where the LDGN operates. This encoding cannot be directly generated by the encoder, as it also requires ${Z}(t)$ as input, which is unavailable during inference. An alternative approach would be to define the conditions directly in the coarse representation of the system provided by ${\mathcal{G}}^L$, but this representation lacks fine-grained details, leading to sub-optimal results.
![Divider](resources/divider-gen4.jpg)
## Turbulent Flows around Wings in 3D
Let's directly turn to a complex case to illustrate the capabilities of DGN. (A more basic case will be studied in the Jupyter notebook on the following page.)
The Wing experiments of the DGN project target wings in 3D turbulent flow, characterized by detailed vortices that form and dissipate on the wing surface. This task is particularly challenging due to the high-dimensional, chaotic nature of turbulence and its inherent multi-scale interactions across a wide range of scales.
The geometry of the wings varies in terms of relative thickness, taper ratio, sweep angle, and twist angle.
These simulations are computationally expensive, and using GNNs allows us to concentrate computational effort on the wing's surface, avoiding the need for costly volumetric fields. A regular grid around the wing would require over $10^5$ cells, in contrast to approximately 7,000 nodes for the surface mesh representation. The surface pressure can be used to determine both the aerodynamic performance of the wing and its structural requirements.
Fast access to the probabilistic distribution of these quantities would be highly valuable for aerodynamic modeling tasks.
The training dataset for this task was generated using Detached Eddy Simulation (DES) with OpenFOAM's PISO solver,
using 250 consecutive states shortly after the data-generating simulator reached statistical equilibrium.
This represents about **10%** of the states needed to achieve statistically stationary variance, thus the models are trained with a very partial view of each case.
## Distributional accuracy
A high accuracy for each sample does not necessarily imply that a model is learning the true distribution. In fact, these properties often conflict. For instance, in VGAEs, the KL-divergence penalty allows control over whether to prioritize sample quality or mode coverage.
To evaluate how well models capture the probability distribution of system states, we use the Wasserstein-2 distance. This metric can be computed in two ways: (i) by treating the distribution at each node independently and averaging the result across all nodes, or (ii) by considering the joint distribution across all nodes in the graph. These metrics are denoted as $W_2^\text{node}$ and $W_2^\text{graph}$, respectively. The node-level measure ($W_2^\text{node}$) provides insights into how accurately the model estimates point-wise statistics, such as the mean and standard deviation at each node. However, it does not penalize inaccurate spatial correlations, whereas the graph-wise measure ($W_2^\text{graph}$) does.
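For one-dimensional marginals, $W_2^\text{node}$ is straightforward to estimate from sorted samples (exact for empirical distributions with equal sample counts); $W_2^\text{graph}$, in contrast, requires a full optimal-transport solve over the joint distribution. A small numpy sketch of the node-level variant, with purely illustrative sample data:

```python
import numpy as np

def w2_node(samples_a, samples_b):
    # samples: (num_samples, num_nodes); the per-node 1D Wasserstein-2
    # distance is the RMS difference of the sorted samples (the empirical
    # quantile functions), averaged over all nodes
    a = np.sort(samples_a, axis=0)
    b = np.sort(samples_b, axis=0)
    per_node = np.sqrt(np.mean((a - b) ** 2, axis=0))
    return per_node.mean()

rng = np.random.default_rng(1)
reference = rng.normal(0.0, 1.0, size=(2500, 50))   # "target" states
generated = rng.normal(0.0, 1.0, size=(2500, 50))   # "model" samples
dist = w2_node(reference, generated)
```

For unequal sample counts, the sorted arrays would need to be interpolated to a common set of quantiles first.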
To ensure stable results when computing these metrics, the target distribution is represented by 2,500 consecutive states, and the predicted one by 3,000 samples.
While the trajectories in the training data are long enough to capture the mean flow, they fall short of capturing the standard deviation, spatial correlations, or higher-order statistics. Despite these challenges, the DGN, and especially the LDGN, are capable of accurately learning the complete probability distributions of the training trajectories and of generating new distributions for both in- and out-of-distribution physical settings. The figure below shows a qualitative evaluation together with correlation measurements. Both DGN variants also fare much better than the _Gaussian-Mixture model_ baseline denoted as GM-GNN.
```{figure} resources/probmodels-graph-wing.jpg
---
height: 220px
name: probmodels-graph-wing
---
(a) The _Wing_ task targets pressure distributions on a wing in 3D turbulent flow. (b) The standard deviation of the distribution generated by the LDGN is the closest to the ground truth (shown here in terms of correlation).
```
In terms of Wasserstein distance $W_2^\text{graph}$, the latent-space diffusion model also outperforms the others, with a distance of $\textbf{1.95 ± 0.89}$, while the DGN follows with $2.12 ± 0.90$, and the Gaussian mixture model gives $4.32 ± 0.86$.
## Computational Performance
Comparisons between runtimes of different implementations should always be taken with a grain of salt.
Nonetheless, for the Wing experiments, the ground-truth simulator, running on 8 CPU threads, required 2,989 minutes to simulate the initial transient phase plus 2,500 equilibrium states. This duration is just enough to obtain a well converged variance. In contrast, the LDGN model took only 49 minutes on 8 CPU threads and 2.43 minutes on a single GPU to generate 3,000 samples.
If we consider the generation of a single converged state (for use as an initial condition in another simulator, for example), the speedup is four orders of magnitude on the CPU, and five orders of magnitude on the GPU.
Thanks to its latent space, the LDGN model is not only more accurate, but also $8\times$ faster than the DGN model, while requiring only about 55% more training time.
These significant efficiency advantages suggest that graph-based diffusion models can be particularly valuable in scenarios where computational costs are otherwise prohibitive.
These results indicate that diffusion modeling in the context of unstructured simulations represents a significant step towards leveraging probabilistic methods in real-world engineering applications.
To highlight the aspects of DGN and its implementation, we now turn to a simpler test case that can be analyzed in detail within a Jupyter notebook.

Introduction to Probabilistic Learning
=======================
So far we've treated the target function $f(x)=y$ as being deterministic, with a unique solution $y$ for every input. That's certainly a massive simplification: in practice, solutions can be ambiguous, our learned model might mix things up, and both effects could show up in combination. This all calls for moving towards a probabilistic setting, which we'll address here. The machinery from previous sections will come in handy, as the probabilistic viewpoint essentially introduces another dimension for the problem. Instead of a single $y$, we now have a multitude of solutions drawn from a distribution $Y$, each with a probability $p_Y(y)$, often shortened to $p(y)$.
Samples $y \sim p(y)$ drawn from the distribution should follow this probability, so that we can distinguish rare and frequent cases.
To summarize, instead of individual solutions $y$ we're facing a large number of samples $y \sim p(y)$.
![Divider](resources/divider-gen-full.jpg)
## Uncertainty
All measurements, models, and discretizations that we are working with exhibit uncertainties. For measurements and observations, they typically appear in the form of measurement errors. Model equations, on the other hand, usually encompass only parts of a system we're interested in (leaving the remainder as an uncertainty), while for numerical simulations we inevitably introduce discretization errors. In the context of machine learning, we additionally have errors introduced by the trained model. All these errors and unclear aspects make up the uncertainties of the predicted outcomes, the _predictive uncertainty_. For practical applications it's crucial to have means for quantifying this uncertainty. This is a central motivation for working with probabilistic models, and for adjacent fields such as "uncertainty quantification" (UQ).
```{note} Aleatoric vs. Epistemic Uncertainty.
The predictive uncertainty in many cases can
be distinguished in terms of two types of uncertainty:
- _Aleatoric_ uncertainty denotes uncertainty within the data, e.g., noise in measurements.
- _Epistemic_ uncertainty, on the other hand, describes uncertainties within a model such as a trained neural network.
A word of caution is important here:
while this distinction seems clear cut, both effects overlap and can be difficult to tell apart. E.g., when facing discretization errors, uncertain outcomes could be caused by unknown ambiguities in the data, or by a suboptimal discrete representation.
These aspects can be very difficult to disentangle in practice.
```
Closely aligned, albeit taking a slightly different perspective, are so-called _simulation-based inference_ (SBI) methods. Here the main motivation is to estimate likelihoods in computer-based simulations, so that reliable probability distributions for the solutions can be obtained. The SBI viewpoint provides a methodological approach for working with computer simulations and uncertainties, and will provide a red thread for the following sections.
## Forward or Backward?
At this point it's important to revisit the central distinction between forward and inverse ("backward") problems: most classic numerical methods target ➡️ **forward** ➡️ problems to compute solutions for steady-state or future states of a system.
Forward problems arise in many settings, but across the board, at least as many problems are ⬅️ **inverse** ⬅️ problems, where a forward simulation plays a central role, but the main question is not a state that it generates, but rather the value of a simulator parameter that explains a certain measurement or observation. To formalize this, our simulator $f$ is parametrized by a set of inputs $\nu$, e.g., a viscosity, and takes states $x$ to produce a modified state $y$. We have an observation $\tilde{y}$ and are interested in the value of $\nu$ that produces the observation. In the easiest case this inverse problem can be tackled as the minimization problem
$\text{arg min}_{\nu} | f(x;\nu) - \tilde{y} |_2^2$. Solving it would tell us the viscosity of an observed material, and similar problems arise in pretty much all fields, from material science to cosmology. To simplify the notation, we'll merge $\nu$ into $x$, and minimize for $x$ correspondingly, but it's important to keep in mind that $x$ can encompass any set of parameters or state samples that we'd like to solve for with our inverse problem.
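As a tiny, self-contained illustration of this minimization view (with a made-up scalar "simulator" $f(x;\nu) = x e^{-\nu}$, not one from this book), gradient descent on $\nu$ recovers the parameter that explains the observation:

```python
import numpy as np

# toy differentiable "simulator": exponential decay of a state x with rate nu
def f(x, nu):
    return x * np.exp(-nu)

x0, nu_true = 2.0, 0.7
y_obs = f(x0, nu_true)        # the observation we'd like to explain

# solve arg min_nu |f(x; nu) - y_obs|^2 with analytic gradient descent
nu = 0.0
for _ in range(500):
    residual = f(x0, nu) - y_obs
    grad = residual * (-x0 * np.exp(-nu))   # d/dnu of 0.5 * residual**2
    nu -= 0.5 * grad
```

This deterministic solve yields a single point estimate of $\nu$; the probabilistic viewpoint of the following sections replaces it with a full distribution over plausible parameters.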
In the following, we will focus on inverse problems, as these best illustrate the capabilities of the probabilistic modeling, but the algorithms discussed are not exclusively applicable to inverse problems (an example will follow).
## Simulation-based Inference
For inverse problems, it is in practice not sufficient to match a single observation $\tilde{y}$. Rather, we'd like to ensure that the parameter we obtain explains a wide range of observations, and we might be interested in the possibility of multiple values explaining our observations. Similarly, quantifying the uncertainty of the estimate is important in real world settings: is the observation explained by only a very narrow range of parameters, or could the parameter vary by orders of magnitude without really influencing the observation? These questions require a statistical analysis, typically called _inference_, to draw conclusions about the results obtained from the inverse problem solve. To connect this viewpoint with the distinction regarding epistemic and aleatoric uncertainties above, we're primarily addressing the latter here: which uncertainties lie in our observations, given a scientific hypothesis in the form of a simulator.
To formalize these inverse problems let's consider a vector-valued input $x$ that can contain states and / or the aforementioned parameters (like $\nu$). We also have a distribution of latent variables $z \sim p(z|x)$ that describes the unknown part of our system. Examples for $z$ are unobservable stochastic variables, intermediate simulation steps, or the control flow of the simulator.
```{note} Bayes' theorem is fundamental for all of the following. For completeness, here it is: $p(x|y)~p(y) = p(y|x)~p(x)$. And it's worth keeping in mind that both sides are equivalent to the joint probabilities, i.e. $... = p(x,y) = p(y,x)$.
```
For $x$ there is a prior distribution $X$ with a probability density $p(x)$ for the inputs,
and the simulator produces an observation or output $y \sim p(y | x, z)$. Thus, $x$ can take different values, maybe it contains some noise, and $z$ is out of our control, and can likewise influence the $y$ that is produced.
The function for the conditional probability $p(y|x)$ is called the **likelihood** function, and is a crucial value in the following. Note that it does not depend on $z$, as these latent states are out of our control.
So we actually need to
compute the marginal likelihood $p(y|x) = \int p(y, z | x) dz$ by integrating over all possible $z$.
This is necessary because the likelihood function shouldn't depend on $z$, otherwise we'd need to know the exact values of $z$ before being able to calculate the likelihood.
Unfortunately, this is often intractable, as $z$ can be difficult to sample, and in some cases we can't even control it in a reasonable way.
Several algorithms have been proposed to compute likelihoods, a popular one being Approximate Bayesian Computation (ABC), but all approaches are highly expensive and require a lot of expert knowledge to set up. They suffer from the _curse of dimensionality_, i.e. become very expensive when facing larger numbers of degrees of freedom. Thus,
obtaining good approximations of the likelihood will be a topic that we'll revisit below.
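A brute-force way to see what this integral means: for a toy simulator $y = x + z + \text{noise}$ (an assumption purely for illustration), the marginal likelihood can be estimated by Monte Carlo, averaging $p(y|x,z)$ over samples of $z$:

```python
import numpy as np

rng = np.random.default_rng(2)

# toy simulator: y = x + z + noise, with latent z ~ N(0, 1) out of our control
def likelihood_given_z(y, x, z, noise_std=0.1):
    # Gaussian density p(y|x,z) of the observation noise
    return np.exp(-0.5 * ((y - x - z) / noise_std) ** 2) / (
        noise_std * np.sqrt(2.0 * np.pi))

def marginal_likelihood(y, x, num_z=200_000):
    # p(y|x) = integral of p(y|x,z) p(z) dz, estimated via Monte Carlo over z
    z = rng.standard_normal(num_z)
    return likelihood_given_z(y, x, z).mean()

p = marginal_likelihood(y=1.0, x=0.0)
```

For this linear-Gaussian toy the marginal is available analytically ($y|x \sim \mathcal{N}(x, 1 + 0.1^2)$), which makes it easy to check the estimate; in realistic simulators neither the density of $z$ nor the integral is this convenient, which is exactly the intractability discussed above.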
![Divider](resources/divider-gen4.jpg)
With a function for the likelihood we can compute the
**distribution of the posterior**, the main quantity we're after,
in the following way:
$p(x|y) = \frac{p(y|x)p(x)}{\int p(y|x') p(x') dx'}$,
where the denominator
$\int p(y|x') p(x') dx'$ is called the _evidence_.
The evidence is just $p(y)$, which shows
that the equation for the posterior follows directly from Bayes' theorem $p(x|y) = p(y|x) p(x) / p(y)$.
The evidence can be computed with stochastic methods such as Markov Chain Monte Carlo (MCMC).
It primarily "normalizes" our posterior distribution and is typically easier to obtain than the likelihood, but nonetheless still a challenging term.
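On a discretized parameter space, all three ingredients (prior, likelihood, evidence) can be computed directly, which is a useful mental model before turning to learned estimators. A small numpy example with a Gaussian prior and a Gaussian likelihood, both chosen purely for illustration:

```python
import numpy as np

# discretized prior over the parameter x: an unnormalized N(0, 1) density
x_grid = np.linspace(-3.0, 3.0, 601)
prior = np.exp(-0.5 * x_grid**2)
prior /= prior.sum()

# toy likelihood p(y|x): Gaussian observation noise with std 0.5
y_obs = 1.0
likelihood = np.exp(-0.5 * ((y_obs - x_grid) / 0.5) ** 2)

# Bayes: posterior is proportional to likelihood * prior;
# the evidence p(y) is the sum that normalizes it
evidence = (likelihood * prior).sum()
posterior = likelihood * prior / evidence
```

The evidence here is just a sum over the grid; in high-dimensional spaces this is exactly the quantity that requires MCMC or learned estimators.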
```{admonition} Leveraging Deep Learning
:class: tip
This is where deep learning turns out to be extremely useful: we can use it to train a conditional density estimator $q_\theta(x|y)$ for the posterior $p(x|y)$ that allows sampling, and can be trained from simulations $y \sim p(y|x)$ alone.
```
Deep learning has been instrumental in providing new ways of addressing the classic challenges of obtaining accurate estimates of posterior distributions, and this is what we'll focus on in this chapter. Previously, we called our neural networks $f_\theta$, but in the following we'll use $q_\theta = f_\theta$ to make clear we're dealing with a learned probability. Specifically, we'll target neural networks that learn a probability density, i.e. $\int q_\theta(x) dx = 1$.
We'll often first target unconditional densities, and then show how they can be modified to learn conditional versions $q_\theta(x|y)$.
Looking ahead, the learned SBI methods, i.e. approaches for computing posterior distributions, have the following properties:
✅ Pro:
* Fast inference (once trained)
* Less affected by curse of dimensionality
* Can represent arbitrary priors
❌ Con:
* Require costly upfront training
* Lack rigorous theoretical guarantees
In the following we'll explain how to obtain and derive a very popular and powerful family of methods that can be summarized as **diffusion models**. We could simply provide the final algorithm (which will turn out to be surprisingly simple), but it's actually very interesting to see where it all comes from.
We'll focus on the basics, and leave the _physics-based extensions_ (i.e. including differentiable simulators) for a later section. The path towards diffusion models also introduces a few highly interesting concepts from machine learning along the way, and provides a nice "red thread" for discussing seminal papers from the past few years. Here we go...
<br>
![Divider](resources/divider-gen6.jpg)
```{note} Historic Alternative: Bayesian Neural Networks
A classic variant that should be mentioned here are "Bayesian Neural Networks". They
follow Bayes more closely, and prescribe a prior distribution on the neural network
parameters to learn the posterior distribution. Every weight and bias in the NN is assumed to be Gaussian with its own mean and variance, which are adjusted at training time. For inference, we can then "sample" a network, and use it like any regular NN.
Despite being a very good idea on paper, this method turned out to have problems with learning complex distributions, and requires careful tuning of the hyperparameters involved. Hence, these days, it's strongly recommended to use flow matching (or at least a diffusion model) instead.
If you're interested in details, BNNs with a code example can be found, e.g., in v0.3 of PBDL: https://arxiv.org/abs/2109.05237v3 .
```

Incorporating Physical Constraints
=======================
Despite the powerful capabilities of diffusion- and flow-based networks for generative modeling that we discussed in the previous sections, there is no direct feedback loop between the network, the observation and the sample at training time. This means there is no direct mechanism to include **physics-based constraints** such as priors from PDEs. As a consequence, it's very difficult to produce highly accurate samples based on learning alone: for scientific applications, we often want to make sure the errors can be driven below any chosen threshold.
In this chapter, we will outline two strategies to remedy this shortcoming. Building on the content of previous chapters, the central goal of both is to get **differentiable simulations** back into the training and inference loop. The previous chapters have shown that they're very capable tools, so the main question is how to best employ them in the context of diffusion modeling.
```{note}
Below we'll focus on the inverse problem setting from {doc}`probmodels-intro`. I.e., we have a system $y=f(x)$ (with numerical simulator $y=\mathcal P(x)$) and given an observation $y$, we'd like to obtain the posterior distribution for the distributional solution $x \sim p(x|y)$ of the inverse problem.
```
## Guiding Diffusion Models
Having access to a physical model with a differentiable simulation $\mathcal{P}(x)=y$ means we can obtain gradients $\nabla_x$ through the simulation. As before, we aim for solving _inverse_ problems where, given an output $y$ we'd like to sample from the conditional posterior distribution $p(x|y)$ to obtain samples $x$ that explain $y$. The previous chapter demonstrated learning such distributions with diffusion models, and given a physics prior $\mathcal{P}$, there's a first fundamental choice: should we use the gradient at _training time_, i.e., trying to improve the learned distribution $p_\theta$, or at _inference time_, to improve sampling $x \sim p_\theta(x|y)$?
**Training with physics priors:** Incorporating physics-based signals in the form of gradients at training time aims to improve the state of $p_\theta$ after training. While this could, e.g., compensate for sparse training data, there is little hope for substantially improving the accuracy of the learned distribution. The training process for diffusion and flow matching models typically yields very capable neural networks that are excellent at producing approximate samples from the posterior. They're typically limited in terms of their accuracy by model and training data size, and it's difficult to fundamentally improve the capabilities of a model at this stage. Rather, in this context it is more interesting to obtain higher accuracies at inference time.
**Inference with physics priors:** For scientific applications, classic simulations typically yield control knobs that allow for choosing a level of accuracy. E.g., iterative solvers for linear systems provide iteration counts and residual thresholds, and if a solution is not accurate enough, a user can simply reduce the residual threshold to obtain a more accurate output. In contrast, neural networks typically come without such controls, and even the iteration count of denoising or velocity integration (for flow matching) are bounded in terms of final accuracy. More steps typically reduce noise, and correspondingly the error, but will plateau at a level of accuracy given by the capabilities of the trained model. This is exactly where the gradients of a physics solver show promise: they provide an external process that can guide and improve the output of a diffusion model. As we'll show below, this makes it possible to push the levels of accuracy beyond those of pure learning, and can yield inverse problem solvers that outperform traditional solvers.
Recall that for denoising, we train a noise estimator $\epsilon_\theta$, and at inference time iterate denoising steps of the form
$x_{\text{new}} = x - \hat \alpha_t \epsilon_\theta(x, t) + \hat \sigma_t \mathcal N(0,I)$, where $\hat \alpha,\hat \sigma$ denote the merged scaling factors for both terms.
The most straightforward approach for including gradients is to additionally take a step in the direction of the gradient $\nabla_x || \mathcal P(x) - y||_2$. For simplicity, we take an $L^2$ distance towards the observation $y$ here. This was shown to direct sampling even when the posterior is not conditional, i.e., if we only have access to $x \sim p_\theta(x)$, and is known as _diffusion posterior sampling_ {cite}`chung2023dps`.
While this approach manages to include $\mathcal P$, there are two challenges: $x$ is typically noisy, and the gradient step can distort the distributional sampling of the denoising process. The first point is handled quite easily with an _extrapolation step_ (more details below), while the second one is more difficult to address: the gradient descent steps via $\nabla_x \mathcal P$ are akin to a classic optimization for the inverse problem and could strongly distort the outputs of the diffusion model. E.g., in the worst case they could pull the different points of the posterior distribution towards a single case favored by the simulator $\mathcal P$. Hence, the following paragraphs will outline a strategy that merges simulator and learning, while preserving the distribution of the posterior.
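The structure of such a guided denoising step can be sketched as follows; note that `eps_theta` and `P` below are trivial placeholders (a shrinkage rule and a linear map) standing in for a trained noise estimator and a real differentiable simulator:

```python
import numpy as np

rng = np.random.default_rng(3)

def eps_theta(x, t):
    return 0.1 * x                    # placeholder for a trained noise estimator

def P(x):
    return 2.0 * x                    # placeholder differentiable simulator

def dps_step(x, t, y, alpha_t, sigma_t, guidance_scale=0.05):
    # plain denoising step: x - alpha_t * eps_theta + sigma_t * noise
    x_new = x - alpha_t * eps_theta(x, t) + sigma_t * rng.standard_normal(x.shape)
    # additional step along the gradient of ||P(x) - y||^2
    # (analytic here since P is linear: d/dx ||2x - y||^2 = 2 * 2 * (2x - y))
    grad = 2.0 * 2.0 * (P(x) - y)
    return x_new - guidance_scale * grad

out = dps_step(np.ones(4), t=0.5, y=np.zeros(4), alpha_t=0.1, sigma_t=0.0)
```

The `guidance_scale` hyperparameter controls exactly the trade-off discussed above: larger values pull samples more strongly towards matching $y$, at the risk of collapsing the posterior.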
We'll focus on flow matching as a state-of-the-art approach next, and afterwards discuss a variant that treats the diffusion steps themselves as a physical process.
![Divider](resources/divider-genA.jpg)
## Physics-Guided Flow Matching
To reintroduce control signals using simulators into the flow matching algorithm we'll follow {cite}`holzschuh2024fm`. The goal is to augment an existing pretrained flow-based network, as outlined in {doc}`probmodels-intro`, with a flexible control signal, aggregating the learned flow and the control signals into a _controlled flow_. This is the task of a second neural network, the _control network_, which makes sure that the posterior distribution is not negatively affected by the signals from the simulator. This second network is small compared to the pretrained flow network, and freezing the weights of the pretrained network works very well; thus, the refinement for control needs only a fairly small amount of additional parameters and computing resources.
```{figure} resources/probmodels-phys-overview.jpg
---
height: 240px
name: probmodels-phys-overview
---
An overview of the control framework. We will consider a pretrained flow network $v_\theta$ and use the predicted flow for the trajectory point $x_t$ at time $t$ to estimate $\hat{x}_1$.
On the right, we show a gradient-based control signal with a differentiable simulator and cost function $C$ for improving $\hat{x}_1$.
An additional network learns to combine the predicted flow with feedback via the control signal to give a new controlled flow.
By combining learning-based updates with suitable controls, we avoid local optima and obtain high-accuracy samples with low inference times.
```
The control signals can be based on gradients and a cost function, if the simulator is differentiable, but they can also be learned directly from the simulator output.
Below, we'll show that performance gains due to simulator feedback are substantial and cannot be achieved by training on larger datasets alone.
Specifically, we'll show that flow matching with simulator feedback is competitive with MCMC baselines for a problem from gravitational lensing in terms of accuracy, and it beats them significantly regarding inference time. This indicates that it provides a very attractive tool for practical applications.
**Controlled flow $v_\theta^C$** First, it's a good idea to pretrain a regular, conditional flow network $v_\theta(x,y,t)$ without any control signals to make sure that we can realize the best performance achievable based on learning alone.
Then, in a second training phase, a control network $v_\theta^C(v, c,t)$ is introduced. It receives the pretrained flow $v$ and control signal $c$ as input. Based on these additional inputs, it can use, e.g., the gradient of a PDE to produce an improved flow matching velocity. At inference time, we integrate
$dx/dt = v^C_\theta(v,c,t)$ just like before, only now each step means evaluating $v_\theta(x,y,t)$ and then $c$ beforehand. (We'll focus on the details of $c$ in a moment.)
First, the control network is much smaller in size than the regular flow network, making up ca. $10\%$ of the weights $\theta$. The network weights of $v_\theta$ can be frozen, to train with the conditional flow matching loss {eq}`conditional-flow-matching` for a small number of additional steps. This reduces training time and compute since we do not need to backpropagate gradients through $v_\theta(x, y,t)$. Freezing the weights of $v_\theta$ typically does not negatively affect the performance, although a joint end-to-end training could provide some additional improvements.
**1-step prediction** The conditional flow matching networks $v_\theta(x,y,t)$ from {doc}`probmodels-intro` gradually transform samples from $p_0$ to $p_1$ during inference via integrating the simple ODE $dx_t/dt = v_\theta(x_t,y,t)$ step by step. There is no direct feedback loop between the current point on the trajectory $x_t$, the observation $y$, and a physical model that we could bring into the picture. An important first issue is that the current trajectory point $x_t$ is often not close to a good estimate of a posterior sample $x_1$.
This is especially severe at the beginning of inference, where $x_0$ is drawn from the source distribution (typically a Gaussian), and hence $x_t$ will be very noisy. Most simulators really don't like very noisy inputs, and trying to compute gradients on top of them is clearly a bad idea.
This issue is alleviated by extrapolating $x_t$ forward in time to obtain an estimated $\hat{x}_1$
$$
\begin{align}
\hat{x}_1 = x_t + (1-t) v_\theta(x_t, y, t).
\end{align}
$$ (eq:1_step_prediction)
and then performing subsequent operations for control and guidance on $\hat{x}_1$ instead of the current, potentially noisy $x_t$.
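Putting the extrapolation and the controlled flow together, inference can be sketched as an explicit Euler loop; all functions below are trivial placeholders (the real $v_\theta$ and $v_\theta^C$ are trained networks, and the real control signal comes from a simulator):

```python
import numpy as np

# trivial placeholders: the real v_theta and v_theta^C are trained networks
def v_theta(x, y, t):
    return y - x                      # stand-in flow pointing at the condition

def control_signal(x_hat, y):
    return y - x_hat                  # stand-in control signal

def v_controlled(v, c, t):
    return v + 0.1 * c                # stand-in controlled flow

def sample(y, num_steps=100):
    # integrate dx/dt = v^C(v, c, t) from t=0 to t=1 with explicit Euler
    x = np.zeros_like(y)              # source sample (zeros for a deterministic demo)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        v = v_theta(x, y, t)
        x_hat = x + (1.0 - t) * v     # 1-step prediction of x_1
        c = control_signal(x_hat, y)
        x = x + dt * v_controlled(v, c, t)
    return x

x1 = sample(np.full(3, 2.0))
```

The key structural point is that the control signal is always evaluated on the extrapolated $\hat{x}_1$, never on the noisy intermediate $x_t$.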
Note that this 1-step prediction is also conceptually related to diffusion sampling using [_likelihood-guidance_](https://dblp.org/rec/conf/nips/WuTNBC23). In diffusion models, sampling is based on the conditional score $\nabla_{x_t} \log p(x_t|y)$, which can be decomposed into
$$
\begin{align}
\nabla_{x_t} \log p(x_t|y) = \nabla_{x_t} \log p(x_t) + \nabla_{x_t} \log p(y|x_t).
\end{align}
$$
The first expression can be estimated using a pretrained diffusion network, whereas the latter is usually intractable, but can be approximated using
$p(y|x_t) \approx p_{y|x_0}(y|\hat{x}(x_t))$,
where the denoising estimate $\hat{x}(x_t) = \mathbb{E}_q[x_0|x_t]$ is usually obtained via Tweedie's formula $(\mathbb{E}_q[x_0|x_t] - x_t) / t\sigma^2$. In practice, the estimate $\hat{x}(x_t)$ is very poor when $x_t$ is still noisy, making inference difficult in the early stages. In contrast, flows based on linear conditional transport paths have empirically been shown to have trajectories with less curvature compared to, for example, denoising-based networks. This property of flow matching enables inference in fewer steps and provides better estimates for $\hat{x}_1$.
### Physics-based Controls
Now we focus on the content of the control signal $c$ that was already used above. We extend the idea of self-conditioning via physics-based control signals to include an additional feedback loop between the network output and an underlying physics-based prior. We'll distinguish between two types of controls in the following: a gradient-based control from a differentiable simulator, and one from a learned estimator network.
```{figure} resources/probphys02-control.jpg
---
height: 240px
name: probphys02-control
---
Types of control signals. (a) From a differentiable simulator, and (b) from a learned encoder.
```
**Gradient-based control signal** In the first case, we make use of a differentiable simulator $\mathcal{P}$ to construct a cost function $C$. Naturally, $C$ will likewise be differentiable such that we can compute a gradient for a predicted solution. Also, we will rely on the stochasticity of diffusion/flow matching, and as such the simulator can be deterministic.
Given an observation $y$ and the estimated 1-step prediction $\hat{x}_1$, the control signal quantifies how well $\hat{x}_1$ explains $y$ via the cost function $C$. Good choices for the cost are, e.g., an $L^2$ loss or a likelihood $p(y|\hat{x}_1)$. We define the control signal $c$ to consist of two components: the cost itself, and the gradient of the cost w.r.t. $\hat{x}_1$:
$$
\begin{align}
c(\hat{x}_1, y) := [C(\mathcal{P}(\hat{x}_1), y); \nabla_{\hat{x}_1} C(\mathcal{P}(\hat{x}_1), y)].
\end{align}
$$
As this information is passed to a network, the network can freely make use of the current distance to the target (the value of $C$) and the direction towards lowering it in the form of $\nabla_{\hat{x}_1} C$.
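To make this concrete, here is a small sketch with a toy linear operator standing in for the differentiable simulator $\mathcal{P}$; the names `simulator`, `cost`, and `control_signal` are illustrative, and a real setup would obtain the gradient from the solver via automatic differentiation rather than analytically:

```python
import numpy as np

def simulator(x, A):
    # toy differentiable "physics" P(x); a stand-in for a real solver
    return A @ x

def cost(x, y, A):
    # L2 cost C(P(x), y)
    r = simulator(x, A) - y
    return 0.5 * float(r @ r)

def cost_grad(x, y, A):
    # analytic gradient of C w.r.t. x (what autodiff would provide for a real solver)
    return A.T @ (simulator(x, A) - y)

def control_signal(x1_hat, y, A):
    # c = [C; grad_x C], concatenated as an additional network input
    return np.concatenate(([cost(x1_hat, y, A)], cost_grad(x1_hat, y, A)))
```

The resulting vector has size $1+\mathrm{dim}(x)$: the scalar distance to the target followed by the direction towards lowering it.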
**Learning-based control signal** When the simulator is non-differentiable, the second variant of using a learned estimator comes in handy.
To combine the simulator output with the observation $y$, a learnable encoder network _Enc_ with parameters $\theta_E$ can be introduced to judge the similarity of the simulation and the observation. The output of the encoder is small and of size $O(\mathrm{dim}(x))$.
The control signal is then defined as
$$
\begin{align}
c(\hat{x}_1, y) := Enc(\mathcal{P}(\hat{x}_1), y).
\end{align}
$$
The gradient backpropagation is stopped at the output of the simulator $\mathcal{P}$, as shown in {numref}`figure {number} <probphys02-control>`.
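A minimal sketch of this variant follows, with a toy black-box simulator and a randomly initialized (untrained) linear encoder; in an actual training framework, the stop-gradient would be realized via, e.g., `detach()` in PyTorch or `jax.lax.stop_gradient`:

```python
import numpy as np

def simulator(x):
    # black-box, possibly non-differentiable simulator (toy stand-in)
    return np.tanh(x)

def encoder(sim_out, y, W):
    # learned encoder Enc(P(x1_hat), y) with (hypothetical) parameters W;
    # in a training framework, sim_out would be wrapped in a stop-gradient
    # so that no gradients are backpropagated into the simulator
    return W @ np.concatenate([sim_out, y])

rng = np.random.default_rng(0)
dim_x = 4
W = rng.normal(size=(dim_x, 2 * dim_x))  # output stays of size O(dim(x))
x1_hat, y = rng.normal(size=dim_x), rng.normal(size=dim_x)
c = encoder(simulator(x1_hat), y, W)
```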
Before showing some examples of the capabilities of these two types of control, we'll discuss some of their properties.
![Divider](resources/divider-genB.jpg)
### Additional Considerations
**Stochastic simulators** Many Bayesian inference problems have a stochastic simulator. For simplicity, we assume that all stochasticity within such a simulator can be controlled via a variable $z \sim \mathcal{N}(0, I)$, which is an additional input. Motivated by the equivalence of exchanging expectation and gradient
$$
\begin{align}
\nabla_{\hat{x}_1} \mathbb{E}_{z\sim \mathcal{N}(0,I)} [ C(\mathcal P_z(\hat{x}_1), y)] = \mathbb{E}_{z\sim \mathcal{N}(0,I)} [ \nabla_{\hat{x}_1} C(\mathcal P_z(\hat{x}_1), y)],
\end{align}
$$
when calling the simulator, we draw a random realization of $z$. During training, we randomly draw $z$ for each sample and step while during inference we keep the value of $z$ fixed for each trajectory.
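The different handling of $z$ during training and inference can be sketched as follows (with a hypothetical toy simulator):

```python
import numpy as np

def stochastic_simulator(x, z):
    # toy stochastic simulator; all randomness is controlled by z ~ N(0, I)
    return x + 0.1 * z

rng = np.random.default_rng(0)
x = np.zeros(4)

# training: a fresh z is drawn for every sample and every step
for step in range(3):
    z = rng.normal(size=x.shape)
    out = stochastic_simulator(x, z)

# inference: z is drawn once and kept fixed along the whole trajectory
z_fixed = rng.normal(size=x.shape)
for step in range(3):
    out = stochastic_simulator(x, z_fixed)
```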
**Time-dependence**
If the estimate $\hat{x}_1$ is bad and the corresponding cost $C(\hat{x}_1, y)$ is high, gradients and control signals can become unreliable. It turns out that the estimates $\hat{x}_1$ become more reliable for later times in the flow matching process.
In practice, $t \geq 0.8$ is a good threshold. Therefore, we only train the control network $v_\theta^C$ in this range, which allows it to focus on control signals containing more useful information to, e.g., fine-tune the solutions with the accurate gradients of a differentiable simulator. For $t < 0.8$, we directly output the pretrained flow $v_\theta(x_t, y, t)$.
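Putting the threshold together with the 1-step prediction, the velocity used during inference could be assembled as follows (all function handles are hypothetical placeholders for trained networks and the control computation):

```python
import numpy as np

def controlled_velocity(x_t, y, t, v_theta, v_theta_C, control, t_switch=0.8):
    # below the threshold, output the pretrained flow; above it, form the
    # 1-step prediction and feed the control signal to the controlled network
    if t < t_switch:
        return v_theta(x_t, y, t)
    x1_hat = x_t + (1.0 - t) * v_theta(x_t, y, t)
    return v_theta_C(x_t, y, t, control(x1_hat, y))
```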
**Theoretical correctness**
In the formulation above, the approximation $\hat{x}_1$ only influences the control signal, which is an input to the controlled flow network $v_\theta^C$. In the case of a deterministic simulator, this makes the control signal a function of $x_t$. The controlled flow network is trained with the same loss as vanilla flow matching. This has the nice consequence that the theoretical properties are preserved.
This is in contrast to, e.g., "likelihood-based guidance", which uses an approximation of $\nabla_{x_t} \log p(y|x_t)$ as a guidance term during inference; this approximation is not covered by the original flow matching theory.
### An Example from Astrophysics
To demonstrate how this guidance from a physics solver affects the accuracy of samples and the posterior, we show an example from strong gravitational lensing: a challenging inverse problem in astrophysics that requires precise posteriors for accurate modeling of observations. In galaxy-scale strong lenses, light from a source galaxy is deflected by the gravitational potential of a galaxy between the source and the observer, causing multiple images of the source to be seen. Traditional computational approaches require several minutes to many hours or days to model a single lens system. Therefore, there is an urgent need to reduce the computational cost of inference with learning-based methods. In this experiment, it's shown that using flow matching and control signals with feedback from a simulator gives posterior distributions for lens modeling that are competitive with the posteriors obtained by MCMC-based methods, while being much faster at inference.
```{figure} resources/probmodels-astro.jpg
---
height: 240px
name: probmodels-astro
---
Results from flow matching for reconstructing gravitational lenses. Left: flow matching with a differentiable simulator (bottom) clearly outperforms pure flow matching (top). Right: comparisons against classic baselines. The FM+simulator variant is more accurate while being faster.
```
The image above shows an example reconstruction and the residual errors. While flow matching and the physics-based variant are both very accurate (it's hard to visually make out differences), the FM version is just on par with classic inverse solvers. The version with the simulator, however, provides a substantial boost in accuracy that is very difficult to achieve even for classic solvers. The quantitative results are shown in the table on the right: the best classic baseline is AIES with an average $\chi^2$ statistic of 1.74, while FM with simulator yields 1.48. Given that the best possible result due to noisy observations is 1.17 for this scenario, the FM+simulator version is highly accurate.
At the same time, the performance numbers for _modeling time_ in the right column show that the FM variant clearly outperforms the classic solvers. While the simulator increases inference time compared to only the neural network (10s to 19s), the classic baselines require more than $50\times$ longer reconstruction times. Interestingly, this example also highlights the problems of "simpler" physics combinations in the form of DPS. The DPS version does not manage to keep up with the classic solvers in terms of accuracy. To conclude, the _FM+simulator_ variant is not only substantially more accurate, but also ca. $35\times$ faster than the best classic solver above (AIES). (Source code for this approach will be available soon [in this repository](https://github.com/tum-pbs/sbi-sim).)
---
A summary of the physics-based flow matching is given by the following bullet points:
✅ Pro:
* Improved accuracy over purely learned diffusion models
* Gives control over residual accuracy
* Reduced runtime compared to traditional inverse solvers
❌ Con:
* Requires differentiable physical process
* Increased computational resources
![Divider](resources/divider-gen1.jpg)
## Score Matching with Differentiable Physics
So far we have treated the _diffusion time_ of denoising and flow matching as a process that is purely virtual and orthogonal to the time of the physical process to be represented by the forward and inverse problems. This is the most generic viewpoint, and works nicely, as demonstrated above. However, it's interesting to think about the alternative: merging the two processes, i.e., treating the diffusion process as an inherent component of the physics system.
```{figure} resources/probmodels-smdp-1trainB.jpg
---
height: 240px
name: probmodels-smdp-trainB
---
The physics process (heat diffusion as an example, left) perturbs and "destroys" the initial state. At inference time (right, Buoyancy flow as an example), the solver is used to compute inverse steps and produce solutions by combining steps along the score and the gradient of the solver.
```
The following sections will explain such a combined approach, following the paper "Solving Inverse Physics Problems with Score Matching" {cite}`holzschuh2023smdp`, for which [code is available in this repository](https://github.com/tum-pbs/SMDP).
This approach solves inverse physics problems by leveraging the ideas of score matching. The system's current state is moved backward in time step by step by combining an approximate inverse physics simulator and a learned correction function. A central insight of this work is that training the learned correction with a single-step loss is equivalent to a score matching objective, while recursively predicting longer parts of the trajectory during training relates to maximum likelihood training of a corresponding probability flow. The resulting inverse solver exhibits good accuracy and temporal stability. In line with diffusion modeling, and in contrast to classic learned solvers, it allows for sampling the posterior of the solutions. The method will be called _SMDP_ (for _Score Matching with Differentiable Physics_) in the following.
### Training and Inference with SMDP
For training, SMDP fits a neural ODE, the probability flow, to the set of perturbed training trajectories. The probability flow is comprised of an approximate reverse physics simulator $\tilde{\mathcal{P}}^{-1}$ as well as a correction function $s_\theta$. For inference, we simulate the system backward in time from $\mathbf{x}_T$ to $\mathbf{x}_0$ by combining $\tilde{\mathcal{P}}^{-1}$, the trained $s_\theta$ and Gaussian noise in each step.
For optimizing $s_\theta$, our approach moves a sliding window of size $S$ along the training trajectories and reconstructs the current window. Gradients for $\theta$ are accumulated and backpropagated through all prediction steps. This process is illustrated in the following figure:
```{figure} resources/probmodels-smdp-1train.jpg
---
height: 240px
name: probmodels-smdp-train
---
Overview of the score matching training process, incorporating a physics solver $\mathcal{P}$ and its approximate inverse $\tilde{\mathcal{P}}^{-1}$.
```
A differentiable solver or a learned surrogate model is employed for $\tilde{\mathcal{P}}^{-1}$.
The neural network $s_\theta(\mathbf{x}, t)$ parameterized by $\theta$ is trained such that
$$
\mathbf{x}_{m} \approx \mathbf{x}_{m+1} + \Delta t \left[ \tilde{\mathcal{P}}^{-1}(\mathbf{x}_{m+1}) + s_\theta(\mathbf{x}_{m+1}, t_{m+1}) \right].
$$
In this equation, the term $s_\theta(\mathbf{x}_{m+1}, t_{m+1})$ corrects approximation errors and resolves uncertainties from the stochastic forcing $F_{t_m}(z_m)$. Potentially, this process can be unrolled over multiple steps at training time to improve accuracy and stability. At inference time, the stochastic differential equation
$$
d\mathbf{x} = \left[ -\tilde{\mathcal{P}}^{-1}(\mathbf{x}) + C \, s_\theta(\mathbf{x},t) \right] dt + g(t) dW
$$
is integrated via the Euler-Maruyama method to obtain a solution for the inverse problem.
Setting $C=1$ and excluding the noise gives the probability flow ODE: a unique, deterministic solution. This deterministic variant is no longer probabilistic, but has other interesting properties.
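A sketch of this inference loop, implementing the discrete reverse update above with an optional Euler-Maruyama noise term (`p_inv`, `s_theta`, and `g` are placeholder callables for the reverse physics step, the trained correction, and the noise schedule):

```python
import numpy as np

def smdp_reverse(x_T, p_inv, s_theta, g, t_grid, C=1.0, stochastic=True, seed=0):
    """Move x_T backward to x_0: x_m = x_{m+1} + dt * [P~^{-1} + C * s_theta],
    plus an optional Euler-Maruyama noise term g(t) * sqrt(dt) * N(0, I)."""
    rng = np.random.default_rng(seed)
    x = np.array(x_T, dtype=float)
    for m in range(len(t_grid) - 1, 0, -1):
        dt = t_grid[m] - t_grid[m - 1]
        x = x + dt * (p_inv(x) + C * s_theta(x, t_grid[m]))
        if stochastic:  # omit the noise for the deterministic probability flow ODE
            x = x + g(t_grid[m]) * np.sqrt(dt) * rng.normal(size=x.shape)
    return x
```

With `stochastic=False`, repeated calls give the unique ODE solution; with the noise term enabled, each call samples a different posterior candidate.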
```{figure} resources/probmodels-smdp-2infer.jpg
---
height: 148px
name: probmodels-smdp-infer
---
An overview of SMDP at inference time.
```
### SMDP in Action
This section shows experiments for the stochastic heat equation $\frac{\partial u}{\partial t} = \alpha \Delta u + g(t)\,\xi$, where $\xi$ is space-time white noise. It slightly perturbs the deterministic heat diffusion process, which plays a fundamental role in many physical systems. For the experiments, we fix the diffusivity constant to $\alpha = 1$ and sample initial conditions at $t=0$ from Gaussian random fields with $n=4$ at resolution $32 \times 32$. We simulate the heat diffusion with noise from $t=0$ until $t=0.2$ using the Euler-Maruyama method and a spectral solver $\mathcal{P}_h$ with a fixed step size and $g \equiv 0.1$. Given a simulation end state $\mathbf{x}_T$, we want to recover a possible initial state $\mathbf{x}_0$.
In this experiment, the forward solver cannot be used to infer $\mathbf{x}_0$ directly, since high frequencies due to noise are amplified, leading to physically implausible solutions. Instead, the reverse physics step $\tilde{\mathcal{P}}^{-1}$ is implemented by using the forward step of the solver $\mathcal{P}_h(\mathbf{x})$, i.e. $\tilde{\mathcal{P}}^{-1}(\mathbf{x}) \approx - \mathcal{P}_h (\mathbf{x})$.
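As an illustration, a toy spectral step for the heat equation on a square periodic grid, together with the corresponding sign-flipped reverse physics step, could look as follows (unit grid spacing is assumed; the solver used in the paper differs in its details):

```python
import numpy as np

def heat_step(x, dt=0.01, alpha=1.0):
    # one forward step of the heat equation via a (toy) spectral solver:
    # each Fourier mode decays exactly by exp(-alpha * k^2 * dt)
    k = 2.0 * np.pi * np.fft.fftfreq(x.shape[0])
    kx, ky = np.meshgrid(k, k, indexing="ij")
    decay = np.exp(-alpha * (kx**2 + ky**2) * dt)
    return np.real(np.fft.ifft2(np.fft.fft2(x) * decay))

def reverse_physics_step(x, dt=0.01, alpha=1.0):
    # P~^{-1}(x) ~ -P_h(x): reuse the forward update direction with flipped sign
    # (here expressed as a rate, i.e. the per-time update of the forward solver)
    return -(heat_step(x, dt, alpha) - x) / dt
```

The forward step damps all nonzero frequencies while preserving the mean, which is exactly why noisy high frequencies explode when it is naively inverted.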
A small ResNet-like architecture based on an encoder and a decoder part is used as representation for the score function $s_\theta(\mathbf{x}, t)$. The spectral solver is implemented via differentiable programming in _JAX_. As baseline methods, a supervised training of the same architecture as $s_\theta(\mathbf{x}, t)$, a Bayesian neural network (BNN), as well as an FNO network are considered. An $L_2$ loss is used for all these methods, i.e., the training data consists of pairs of initial state $\mathbf{x}_0$ and end state $\mathbf{x}_T$. Additionally, a variant of the SMDP method is included for which the reverse physics step $\tilde{\mathcal{P}}^{-1}$ is removed, such that the inversion of the dynamics has to be learned entirely by $s_\theta$, denoted by "$s_\theta$ only".
```{figure} resources/probmodels-smdp-3heat.jpg
---
name: probmodels-smdp-heat
---
While the ODE trajectories provide smooth solutions with the lowest reconstruction MSE, the SDE solutions synthesize high-frequency content, significantly improving spectral error.
The "$s_\theta$ only" version without the reverse physics step exhibits a significantly larger spectral error. Metrics (right) are averaged over three runs.
```
SMDP and the baselines are evaluated by considering the _reconstruction MSE_ on a test set of $500$ initial conditions and end states. For the reconstruction MSE, the prediction of the network is simulated forward in time with the solver $\mathcal{P}_h$ to obtain a corresponding end state, which is compared to the ground truth via the $L_2$ distance. This metric has the disadvantage that it does not measure how well the prediction matches the training data manifold, i.e., for this case, whether the prediction resembles the properties of the initial Gaussian random field. For that reason, the power spectral density of the states is shown as a _spectral loss_. An evaluation and visualization of the reconstructions are given in {numref}`figure {number} <probmodels-smdp-heat>`, which shows that the ODE inference performs best regarding the reconstruction MSE. However, its solutions are smooth and do not contain the necessary small-scale structures. This is reflected in a high spectral error. The SDE variant, on the other hand, performs very well in terms of spectral error and yields visually convincing solutions with only a slight increase in the reconstruction MSE.
This highlights the role of noise as a source of entropy in the inference process for diffusion models, such as the SDE in SMDP, which is essential for synthesizing small-scale structures. Note that there is a natural tradeoff between both metrics, and the ODE and SDE inference perform best for each of the cases while using an identical set of weights. This heat diffusion example highlights the advantages and properties of treating the physical process as part of the diffusion process. This, of course, extends to other physics. E.g., [the SMDP repository](https://github.com/tum-pbs/SMDP) additionally shows a case with an inverse Navier-Stokes solve.
## Summary of Physics-based Diffusion Models
Overall, the sections above have explained two methods to incorporate physics-based constraints and models in the form of PDEs into diffusion modeling. Interestingly, the inclusion is largely in line with {doc}`diffphys`, i.e. gradients of the physics solver are a central quantity, and concepts like unrolling play an important role. On the other hand, the probabilistic modeling introduces additional complexity on the training and inference sides. It provides powerful tools and access to distributions of solutions (we haven't even touched follow-up applications such as uncertainty quantification above), but this comes at a cost.
As a rule of thumb 👍, diffusion modeling should only be used if the solution is a distribution that is _not_ well represented by the mean of the solutions. If the mean is acceptable, "regular" neural networks offer substantial advantages in terms of reduced complexity for training and inference.
However, if the solutions are a distribution 🌦️, diffusion models are powerful tools to work with complex and varied solutions. Given its capabilities, deep learning with diffusion models arguably introduces surprisingly _little_ additional complexity. E.g., training flow matching models is surprisingly robust, can be built on top of deterministic training, and introduces only a mild computational overhead.
To show how the combination of physics solvers and diffusion models turns out in terms of an implementation, the next section shows source code for an SMDP use case.

probmodels-sbisim.ipynb (new file; diff suppressed because one or more lines are too long)

probmodels-score.ipynb (new file; diff suppressed because one or more lines are too long)

probmodels-time.ipynb (new file; diff suppressed because one or more lines are too long)

probmodels-uncond.md (new file)
Unconditional Stability
=======================
The results of the previous section, for time predictions with diffusion models, and earlier ones ({doc}`diffphys-discuss`)
make it clear that unconditionally stable networks are definitely possible.
This has also been reported in various other works. However, there is still a fair number of approaches that seem to have trouble with long-term stability.
This poses a very interesting question: which ingredients are necessary to obtain _unconditional stability_?
Unconditional stability here means obtaining trained networks that are stable for arbitrarily long rollouts. Are inductive biases or special training methodologies necessary, or is it simply a matter of training enough different initializations? Our setup provides a very good starting point to shed light on this topic.
The "success stories" from earlier chapters, some with fairly simple setups, indicate that unconditional stability is “nothing special” for neural network based predictors. I.e., it does not require special loss functions or tricks beyond a proper learning setup (suitable hyperparameters, sufficiently large model plus enough data).
As errors will accumulate over time, we can expect that network size and the total number of update steps in training are important. Interestingly, it seems that the neural network architecture doesn't really matter: we can obtain stable rollouts with pretty much “any” architecture once it's sufficiently large.
Note that we'll focus on time steps with a **fixed length** in the following. "Unconditional stability" here refers to being stable over an arbitrary number of iterative steps. The following networks could potentially be trained for variable time step sizes as well, but we will focus on the "dimension" of stability over multiple, iterative network calls below.
![Divider](resources/divider-gen2.jpg)
## Main Considerations for an Evaluation
As shown in the previous chapter, diffusion models perform extremely well. This can be attributed to the underlying task of working with pure noise as input (e.g., for denoising or flow matching tasks). Likewise, the network architecture has only a minor influence: the network simply needs to be large enough to provide a converging iteration. For supervised or unrolled training, we can leverage a variety of discrete and continuous neural operators. CNNs, Unets, FNOs and Transformers are popular approaches here.
Interestingly, FNOs, due to their architecture, _project_ the solution onto a subspace of the frequencies in the discretization. This inherently removes high frequencies that primarily drive instabilities. As such, they're influenced by unrolling to a lesser extent [(details can be found, e.g., here)](https://tum-pbs.github.io/apebench-paper/).
Operators that better preserve small-scale details, such as convolutions, can strongly benefit from unrolling. This will be a focus of the following ablations.
Interestingly, it turns out that the batch size and the length of the unrolling horizon play a crucial but conflicting role: small batches are preferable, but in the worst case under-utilize the hardware and require long training runs. Unrolling on the other hand significantly stabilizes the rollout, but leads to increased resource usage due to the longer computational graph for each NN update. Thus, our experiments show that a “sweet spot” along the Pareto-front of batch size vs unrolling horizon can be obtained by aiming for as-long-as-possible rollouts at training time in combination with a batch size that sufficiently utilizes the available GPU memory.
**Learning task:** To analyze the temporal stability of autoregressive networks on long rollouts, two flow prediction tasks from the [ACDM benchmark](https://github.com/tum-pbs/autoreg-pde-diffusion) are considered: an easier incompressible cylinder flow (denoted by _Inc_), and a complex transonic wake flow (denoted by _Tra_) at Reynolds number 10 000. For Inc, the networks are trained on flows with Reynolds numbers between 200 and 900 and are required to extrapolate to Reynolds numbers of 960, 980, and 1000 during inference (_Inc-high_). For Tra, the training data consists of flows with Mach numbers between 0.53 and 0.9, and networks are tested on the Mach numbers 0.50, 0.51, and 0.52 (denoted by _Tra-ext_). This regime is tough, as it contains a substantial amount of shocks that interact with the flow.
For each sequence in both data sets, three training runs of each architecture are unrolled over 200 000 steps. This unrolling length is no proof that these networks yield infinitely long stable rollouts, but it indicates an extremely small probability for blowups.
## Comparing Architectures
As a first comparison, we'll train three networks with an identical U-Net backbone that use different stabilization techniques. This comparison shows that it is possible to successfully achieve "unconditional stability" in different ways:
- Unrolled training (_U-Net-ut_) where gradients are backpropagated through multiple time steps during training.
- Networks trained on a single prediction step with added training noise (_U-Net-tn_). This technique is known to improve stability by reducing data shift, as the added noise emulates errors that accumulate during inference.
- Autoregressive conditional diffusion models (ACDM). A denoising diffusion model is conditioned on the previous time step and iteratively refines noise to create a prediction for the next step, as shown in {doc}`probmodels-time`.
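To illustrate what unrolled training means in the simplest possible setting, here is a toy example: a scalar linear system $x_{n+1} = a\,x_n$, where the unrolled loss over multiple predicted steps and its gradient (which backpropagation through time would otherwise compute) are written out analytically. All names and the setup are illustrative; the experiments above apply the same principle to full neural operators via automatic differentiation:

```python
import numpy as np

def unrolled_loss_grad(a, x0, traj):
    # loss and analytic gradient for a rollout of the toy model x_{n+1} = a * x_n,
    # unrolled over len(traj) steps; gradients flow through every predicted step
    loss, grad = 0.0, 0.0
    for m, target in enumerate(traj, start=1):
        pred = a**m * x0
        loss += (pred - target) ** 2
        grad += 2.0 * (pred - target) * m * a ** (m - 1) * x0
    return loss, grad

a_true, x0 = 0.9, 1.0
traj = [a_true**m * x0 for m in range(1, 9)]  # ground-truth rollout, 8 steps

a = 0.5  # initial model parameter
for _ in range(500):
    _, grad = unrolled_loss_grad(a, x0, traj)
    a -= 0.005 * grad  # plain gradient descent
```

After the descent loop, the parameter recovers the true dynamics $a \approx 0.9$, since errors from all unrolled steps contribute to the gradient.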
```{figure} resources/probmodels-uncond01.png
---
height: 240px
name: probmodels-uncond-inc
---
Vorticity predictions for an incompressible flow with a Reynolds number of 1000 over 200 000 time steps (Inc-high).
```
The figure above illustrates the resulting predictions. All methods and training runs remain unconditionally stable over the entire rollout on Inc-high. Since this flow is unsteady but fully periodic, the results of all networks are simple, periodic trajectories that prevent error accumulation. This example serves to show that for simpler tasks, long-term stability is less of an issue. Networks have a relatively easy time keeping their predictions within the manifold of the solutions. Let's consider a tougher example: the transonic flows with shock waves in Tra.
```{figure} resources/probmodels-uncond02.png
---
height: 240px
name: probmodels-uncond-tra
---
Vorticity predictions for transonic flows with a Mach number 0.52 (Tra-ext, outside the training data range) over 200 000 time steps.
```
For the test sequences from Tra-ext, one of the three trained U-Net-tn networks has stability issues within the first few thousand steps. This network deteriorates to a simple, mean flow prediction without vortices. Unrolled training (U-Net-ut) and diffusion models (ACDM), on the other hand, are fully stable across sequences and training runs for this case, indicating a higher resistance to rollout errors which normally cause instabilities. The autoregressive diffusion models turn out to be unconditionally stable across the board [(details here)](https://arxiv.org/abs/2309.01745), so we'll drop them in the following evaluations and focus on models where stability is more difficult to achieve: the U-Nets, as representatives of convolutional, discrete neural operators.
## Stability Criteria
Focusing on the U-Net networks with unrolled training, we will next focus on training multiple models (3 each time), and measure the percentage of stable runs they achieve. This provides more thorough statistics compared to the single, qualitative examples above.
We'll first investigate the key criterion of rollout length, to show how it influences fully stable rollouts over extremely long horizons.
The figure below lists the percentage of stable runs for a range of ablation networks on the Tra-ext data set with rollouts over 200 000 time steps. Results on the individual Mach numbers, as well as an average (top row), are shown.
```{figure} resources/probmodels-uncond03-ma.png
---
height: 210px
name: probmodels-uncond03-ma
---
Percentage of stable runs on the Tra-ext data set for different ablations of unrolled training.
```
The different generalization tests over Mach numbers make no difference.
The most important criterion for stability is the number of unrolling steps $m$: while networks with $m \leq 4$ consistently do not achieve stable rollouts, using $m \geq 8$ is sufficient for stability across different Mach numbers.
**Negligible Aspects:**
Three factors that did not substantially impact rollout stability in experiments are the prediction strategy, the amount of training data, and the backbone architecture. We'll only briefly summarize the results here. First, using residual predictions, i.e., predicting the difference to the previous time step instead of the full state itself, did not impact stability. Second, the stability is not affected when reducing the amount of available training data by a factor of 8, from 1000 time steps per Mach number to 125 steps (while training with 8× more epochs to ensure a fair comparison). This training data reduction still retains the full physical behavior, i.e., complete vortex shedding periods. Third, it is possible to train other backbone architectures with unrolling to achieve fully stable rollouts as well, such as dilated ResNets. For ResNets without dilation, only one trained network is stable, most likely due to the reduced receptive field. However, we expect achieving full stability is also possible with longer training rollout horizons.
------
## Batch Size vs Rollout
Interestingly, the batch size turns out to be an important factor:
it can substantially impact the stability of autoregressive networks. This is similar to the image domain, where smaller batches are known to improve generalization (this is the motivation for using mini-batching instead of gradients over the full data set). The impact of the batch size on stability and training time is shown in the figures below, for both investigated data sets. Networks that only come close to the ideal rollout length at a large batch size can be stabilized with smaller batches. However, this effect does not completely remove the need for unrolled training, as networks without unrolling were unstable across all tested batch sizes. For the Inc case, the U-Net width was reduced by a factor of 8 across layers (in comparison to above) to artificially increase the difficulty of this task. Otherwise, all parameter configurations would already be stable, which would obscure the effect of varying the batch size.
```{figure} resources/probmodels-uncond04a.png
---
height: 210px
name: probmodels-uncond04a
---
Percentage of stable runs and training time for different combinations of rollout length and batch size for the Tra-ext data set. Grey configurations are omitted due to memory limitations (mem) or due to high computational demands (-).
```
```{figure} resources/probmodels-uncond04b.png
---
height: 210px
name: probmodels-uncond04b
---
Percentage of stable runs and training time for rollout length and batch size for the Inc-high data set. Grey again indicates out-of-memory (mem) or overly high computational demands (-).
```
This shows that increasing the batch size is more expensive in terms of training time on both data sets, due to less memory-efficient computations. Using longer rollouts during training does not necessarily induce longer training times, as we compensate for longer rollouts with a smaller number of updates per epoch. E.g., we use either 250 batches with a rollout of 4, or 125 batches with a rollout of 8. Thus the number of simulation states that each network sees over the course of training remains constant. However, we did in practice observe additional computational costs for training the larger U-Net network on Tra-ext. This leads to the central question in these ablations: which combination of rollout length and batch size is most efficient?
```{figure} resources/probmodels-uncond05.png
---
height: 180px
name: probmodels-uncond05
---
Training time for different combinations of rollout length and batch size on the Tra-ext data set (left) and the Inc-high data set (right). Only configurations that lead to highly stable networks (stable run percentage >= 89%) are shown.
```
The figure above answers this question by showing the central tradeoff between rollout length and batch size (only stable versions are included here).
To achieve _unconditionally stable_ networks and neural operators, it is consistently beneficial to choose configurations where large rollout lengths are paired with a batch size that is big enough to sufficiently utilize the available GPU memory. This means improved stability is achieved more efficiently with longer training rollouts rather than smaller batches, as indicated by the green dots with the lowest training times.
## Summary
To conclude the results above: with a suitable training setup, unconditionally stable predictions with extremely long rollouts are clearly possible, even for complex flows. According to the experiments, the most important factor that impacts stability is the decision for or against diffusion-based training.
Without diffusion, several factors need to be considered:
- Long rollouts at training time
- Small batch sizes
- Comparing these two factors: longer rollouts are preferable, and result in faster training times than smaller batch sizes
- At the same time, sufficiently large networks are necessary (this depends on the complexity of the learning task).
Factors that did not substantially impact long-term stability are:
- Prediction paradigm during training, i.e., residual and direct prediction are viable
- Additional training data without new physical behavior
- Different network architectures, although the ideal number of unrolling steps might vary for each architecture
This concludes the topic of "unconditional stability".
Further details of these experiments can be found in the [ACDM paper](https://arxiv.org/abs/2309.01745).

@@ -13,6 +13,68 @@
@STRING{NeurIPS = "Advances in Neural Information Processing Systems"}
@article{braun2025msbg,
title ={{Adaptive Phase-Field-FLIP for Very Large Scale Two-Phase Fluid Simulation}},
author = {Braun, Bernhard and Bender, Jan and Thuerey, Nils},
journal = {{ACM} Transactions on Graphics},
volume = {44 (3)},
year = {2025},
publisher = {ACM},
}
@inproceedings{lino2025dgn,
title={Learning Distributions of Complex Fluid Simulations with Diffusion Graph Networks},
author={Mario Lino and Tobias Pfaff and Nils Thuerey},
booktitle={International Conference on Learning Representations},
year={2025}
}
@inproceedings{liu2025config,
title={ConFIG: Towards Conflict-free Training of Physics Informed Neural Networks},
author={Qiang Liu and Mengyu Chu and Nils Thuerey},
booktitle={International Conference on Learning Representations},
year={2025}
}
@inproceedings{bhatia2025prdp,
title={Progressively Refined Differentiable Physics},
author={Kanishk Bhatia and Felix Koehler and Nils Thuerey},
booktitle={International Conference on Learning Representations},
year={2025}
}
@inproceedings{koehler2024ape,
title={APEBench: A Benchmark for Autoregressive Neural Emulators of PDEs},
author={Felix Koehler and Simon Niedermayr and Ruediger Westermann and Nils Thuerey},
booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
year={2024}
}
@article{list2025differentiability,
title={Differentiability in unrolled training of neural physics simulators on transient dynamics},
author={List, Bjoern and Chen, Li-Wei and Bali, Kartik and Thuerey, Nils},
journal={Computer Methods in Applied Mechanics and Engineering},
volume={433},
pages={117441},
year={2025},
publisher={Elsevier}
}
@inproceedings{shehata2025trunc,
title={Truncation Is All You Need: Improved Sampling Of Diffusion Models For Physics-Based Simulations},
author={Youssef Shehata and Benjamin Holzschuh and Nils Thuerey},
booktitle={International Conference on Learning Representations},
year={2025}
}
@inproceedings{schnell2025td,
title={Temporal Difference Learning: Why It Can Be Fast and How It Will Be Faster},
author={Patrick Schnell and Luca Guastoni and Nils Thuerey},
booktitle={International Conference on Learning Representations},
year={2025}
}
@inproceedings{holl2024phiflow,
title={phiflow: Differentiable Simulations for PyTorch, TensorFlow and Jax},
@@ -21,7 +83,6 @@
year={2024}
}
@inproceedings{liu2024airfoils,
title={Uncertainty-aware Surrogate Models for Airfoil Flow Simulations with Denoising Diffusion Probabilistic Models},
author={Liu, Qiang and Thuerey, Nils},
@@ -51,35 +112,59 @@
url={https://joss.theoj.org/papers/10.21105/joss.06171},
}
@article{kohl2023benchmarking,
title={Benchmarking autoregressive conditional diffusion models for turbulent flow simulation},
author={Kohl, Georg and Chen, Li-Wei and Thuerey, Nils},
journal={arXiv:2309.01745},
year={2023}
}
@article{brahmachary2024unsteady,
title={Unsteady cylinder wakes from arbitrary bodies with differentiable physics-assisted neural network},
author={Brahmachary, Shuvayan and Thuerey, Nils},
journal={Physical Review E},
volume={109},
number={5},
year={2024},
publisher={APS}
}
@article{holzschuh2024fm,
title={Solving Inverse Physics Problems with Score Matching},
author={Benjamin Holzschuh and Nils Thuerey},
journal={Advances in Neural Information Processing Systems (NeurIPS)},
volume={36},
year={2023}
}
@article{holzschuh2023smdp,
title={Solving Inverse Physics Problems with Score Matching},
author={Benjamin Holzschuh and Simona Vegetti and Nils Thuerey},
journal={Advances in Neural Information Processing Systems (NeurIPS)},
volume={36},
year={2023}
}
@inproceedings{franz2023nglobt,
title={Learning to Estimate Single-View Volumetric Flow Motions without 3D Supervision},
author={Erik Franz and Barbara Solenthaler and Nils Thuerey},
booktitle={ICLR},
year={2023},
url={https://github.com/tum-pbs/Neural-Global-Transport},
}
@inproceedings{kohl2023volSim,
title={Learning Similarity Metrics for Volumetric Simulations with Multiscale CNNs},
author={Kohl, Georg and Chen, Li-Wei and Thuerey, Nils},
booktitle={AAAI Conference on Artificial Intelligence},
year={2022},
url={https://github.com/tum-pbs/VOLSIM},
}
@inproceedings{list2022piso,
title={Learned Turbulence Modelling with Differentiable Fluid Solvers},
author={Bjoern List and Liwei Chen and Nils Thuerey},
booktitle={Journal of Fluid Mechanics (929/25)},
year={2022},
url={https://ge.in.tum.de/publications/},
}
@@ -120,8 +205,8 @@
}
@article{chu2021physgan,
title ={{Learning Meaningful Controls for Fluids}},
author = {Chu, Mengyu and Thuerey, Nils and Seidel, Hans-Peter and Theobalt, Christian and Zayer, Rhaleb},
journal = ACM_TOG,
volume = {40(4)},
year = {2021},
@@ -1032,5 +1117,81 @@
year={2019}
}
# archs & prob mod
@article{goodfellow2014gan,
title={Generative adversarial networks},
author={Goodfellow, Ian and Pouget-Abadie, Jean and Mirza, Mehdi and Xu, Bing and Warde-Farley, David and Ozair, Sherjil and Courville, Aaron and Bengio, Yoshua},
journal={Advances in Neural Information Processing Systems},
volume={27},
year={2014}
}
@inproceedings{ronneberger2015unet,
title={U-net: Convolutional networks for biomedical image segmentation},
author={Ronneberger, Olaf and Fischer, Philipp and Brox, Thomas},
booktitle={International Conference on Medical Image Computing and Computer-Assisted Intervention},
year={2015},
}
@article{yu2015dilate,
title={Multi-scale context aggregation by dilated convolutions},
author={Yu, Fisher and Koltun, Vladlen},
journal={arXiv preprint arXiv:1511.07122},
year={2015}
}
@article{li2021fno,
title={Fourier neural operator for parametric partial differential equations},
author={Z. Li and N. B. Kovachki and K. Azizzadenesheli and B. Liu and K. Bhattacharya and A. M. Stuart and A. Anandkumar},
journal={ICLR}, year={2021}
}
@article{chen2019node,
title={Neural Ordinary Differential Equations},
author={Ricky T. Q. Chen and Yulia Rubanova and Jesse Bettencourt and David Duvenaud},
journal={arXiv:1806.07366}, year={2019}
}
@article{vincent2011dsm,
title={A connection between score matching and denoising autoencoders},
author={Vincent, Pascal},
journal={Neural computation},
volume={23},
number={7},
pages={1661--1674},
year={2011},
publisher={MIT Press}
}
@article{kobyzev2020nf,
title={Normalizing flows: An introduction and review of current methods},
author={Kobyzev, Ivan and Prince, Simon JD and Brubaker, Marcus A},
journal={IEEE transactions on pattern analysis and machine intelligence},
volume={43}, number={11},
year={2020},
publisher={IEEE}
}
@article{lipman2022flow,
title={Flow matching for generative modeling},
author={Lipman, Yaron and Chen, Ricky TQ and Ben-Hamu, Heli and Nickel, Maximilian and Le, Matt},
journal={arXiv:2210.02747}, year={2022}
}
@article{liu2022rect,
title={Flow straight and fast: Learning to generate and transfer data with rectified flow},
author={Liu, Xingchao and Gong, Chengyue and Liu, Qiang},
journal={arXiv:2209.03003}, year={2022}
}
@inproceedings{chung2023dps,
title={Diffusion posterior sampling for general noisy inverse problems},
author={Chung, Hyungjin and Kim, Jeongsol and Mccann, Michael and Klasky, Marc and Ye, Jong Chul},
booktitle={International Conference on Learning Representations},
year={2023}
}

@@ -1,7 +1,8 @@
Introduction to Reinforcement Learning
=======================
Deep reinforcement learning, which we'll just call _reinforcement learning_ (RL) from now on, is a class of methods in the larger field of deep learning that takes a different viewpoint from the classic "train with data" one:
RL effectively lets an AI agent learn from interactions with an environment. While performing actions, the agent receives reward signals and tries to discern which actions contribute to higher rewards, to adapt its behavior accordingly. RL has been very successful at playing games such as Go {cite}`silver2017mastering`, and it bears promise for engineering applications such as robotics.
The setup for RL generally consists of two parts: the environment and the agent. The environment receives actions $a$ from the agent while supplying it with observations in the form of states $s$, and rewards $r$. The observations represent the fraction of the information from the respective environment state that the agent is able to perceive. The rewards are given by a predefined function, usually tailored to the environment, and might contain, e.g., a game score, a penalty for wrong actions, or a bounty for successfully finished tasks.
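The agent-environment interaction loop described above can be sketched as follows. This is a deliberately tiny, hypothetical environment (the `ToyEnv` class, its state, and its reward rule are all made up for illustration; real RL code would use a library such as Gymnasium and a learned policy):

```python
import random

class ToyEnv:
    """Minimal environment: the state counts successful actions,
    and the episode ends after five successes."""
    def reset(self):
        self.s = 0
        return self.s
    def step(self, a):
        # Reward +1 for action 1, 0 otherwise; done after 5 successes.
        r = 1.0 if a == 1 else 0.0
        self.s += int(a == 1)
        done = self.s >= 5
        return self.s, r, done

env = ToyEnv()
s = env.reset()
total, done = 0.0, False
while not done:
    a = random.choice([0, 1])  # a real agent would choose a via its policy
    s, r, done = env.step(a)   # environment returns state, reward, done flag
    total += r
print("episode return:", total)
```

The loop makes the division of labor explicit: the environment owns the state transition and the reward function, while the agent only sees observations and rewards and supplies actions.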
