updated intro discussion

This commit is contained in:
N_T 2025-02-03 15:28:55 +08:00
parent dbd5d53e31
commit 8f8634119d
4 changed files with 88 additions and 77 deletions

View File

@ -2,7 +2,7 @@
# Learn more at https://jupyterbook.org/customize/config.html
title: Physics-based Deep Learning
author: N. Thuerey, B. Holzschuh, P. Holl, G. Kohl, M. Lino, P. Schnell, F. Trost
author: N. Thuerey, B. Holzschuh, P. Holl, G. Kohl, M. Lino, Q. Liu, P. Schnell, F. Trost
logo: resources/logo.jpg
copyright: "2021 - 2025"
only_build_toc_files: true

View File

@ -39,13 +39,13 @@
"id": "funky-tamil",
"metadata": {},
"source": [
"One of the key concepts of the following chapters is what we'll call _differentiable physics_ (DP). This means that we use domain knowledge in the form of model equations, and then integrate discretized versions of these models into the training process. As implied by the name, having differentiable formulations is crucial for this process to support the training of neural networks.\n",
"One of the key concepts of the following chapters is what we'll call _differentiable physics_ (DP). This means that we use domain knowledge in the form of model equations, and then integrate discretized versions of these models into the training process. As implied by the name, having differentiable formulations and operators is crucial for this process to integrate with neural networks training.\n",
"\n",
"Let's illustrate the properties of deep learning via DP with the following example: We'd like to find an unknown function $f^*$ that generates solutions from a space $Y$, taking inputs from $X$, i.e. $f^*: X \\to Y$. In the following, we'll often denote _idealized_, and unknown functions with a $*$ superscript, in contrast to their discretized, realizable counterparts without this superscript. \n",
"\n",
"Let's additionally assume we have a generic differential equation $\\mathcal P^*: Y \\to Z$ (our _model_ equation), that encodes a property of the solutions, e.g. some real world behavior we'd like to match. Later on, $P^*$ will often represent time evolutions, but it could also be a constraint for conservation of mass (then $\\mathcal P^*$ would measure divergence). But to keep things as simple as possible here, the model we'll look at in the following is a mapping back to the input space $X$, i.e. $\\mathcal P^*: Y \\to X$.\n",
"Let's additionally assume we have a generic differential equation $\\mathcal P^*: Y \\to Z$ (our _model_ equation), that encodes a property of the solutions, e.g. some real world behavior we'd like to match. Later on, $P^*$ will often represent time evolutions, but it could also be a conservation law (e.g., conservation of mass, then $\\mathcal P^*$ would measure divergence). \n",
"\n",
"Using a neural network $f$ to learn the unknown and ideal function $f^*$, we could turn to classic _supervised_ training to obtain $f$ by collecting data. This classical setup requires a dataset by sampling $x$ from $X$ and adding the corresponding solutions $y$ from $Y$. We could obtain these, e.g., by classical numerical techniques. Then we train the NN $f$ in the usual way using this dataset. \n",
"Using a neural network $f$ to learn the unknown and ideal function $f^*$, we could turn to classic _supervised_ training to obtain $f$ by collecting data. This classical setup requires a dataset by sampling $x$ from $X$ and adding the corresponding solutions $y$ from $Y$. We could obtain these, e.g., by classical numerical techniques. Then we train the NN $f$ with classic methods using this dataset. \n",
"\n",
"In contrast to this supervised approach, employing a differentiable physics approach takes advantage of the fact that we can often use a discretized version of the physical model $\\mathcal P$ and employ it to guide the training of $f$. I.e., we want $f$ to be aware of our _simulator_ $\\mathcal P$, and to _interact_ with it. This can vastly improve the learning, as we'll illustrate below with a very simple example (more complex ones will follow later on).\n",
"\n",
@ -65,10 +65,11 @@
"id": "latest-amino",
"metadata": {},
"source": [
"To illustrate the difference of supervised and DP approaches, we consider the following simplified setting: Given the function $\\mathcal P: y\\to y^2$ for $y$ in the interval $[0,1]$, find the unknown function $f$ such that $\\mathcal P(f(x)) = x$ for all $x$ in $[0,1]$. Note: to make things a bit more interesting, we're using $y^2$ here for $\\mathcal P$ instead of the more common $x^2$ parabola, and the _discretization_ is simply given by representing the $x$ and $y$ via floating point numbers in the computer for this simple case.\n",
"To illustrate the difference of supervised and DP approaches, we consider the following simplified setting: Given the function $\\mathcal P: y\\to y^2$ for $y$ in the interval $[0,1]$, find the unknown function $f$ such that $\\mathcal P(f(x)) = x$ for all $x$ in $[0,1]$. E.g., for $x=0.5$, solutions would be $\\pm\\sqrt{0.5}$.\n",
"Note: to make things a bit more interesting, we're using $y^2$ here for $\\mathcal P$ instead of the more common $x^2$ parabola, and the _discretization_ is simply given by representing the $x$ and $y$ via floating point numbers in the computer for this simple case.\n",
"\n",
"We know that possible solutions for $f$ are the positive or negative square root function (for completeness: piecewise combinations would also be possible).\n",
"Knowing that this is not overly difficult, a solution that suggests itself is to train a neural network to approximate this inverse mapping $f$.\n",
"This sounds easy, so let's try to train a neural network to approximate this inverse mapping $f$.\n",
"Doing this in the \"classical\" supervised manner, i.e. purely based on data, is an obvious starting point. After all, this approach was shown to be a powerful tool for a variety of other applications, e.g., in computer vision."
]
},
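As a quick numerical check (not part of the original notebook, and `x=0.5` is just an illustrative value), both branches indeed satisfy $\mathcal P(f(x)) = x$:

```python
import numpy as np

x = 0.5
for y in (np.sqrt(x), -np.sqrt(x)):   # the two admissible outputs f(x)
    print(y, y**2)                    # P(y) = y**2 recovers x for both branches
```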
@ -89,7 +90,7 @@
"id": "numerous-emphasis",
"metadata": {},
"source": [
"For supervised training, we can employ our solver $\\mathcal P$ for the problem to pre-compute the solutions we need for training: We randomly choose between the positive and the negative square root. This resembles the general case, where we would gather all data available to us (e.g., using optimization techniques to compute the solutions). Such data collection typically does not favor one particular mode from multimodal solutions."
"For supervised training, we can employ our solver $\\mathcal P$ for the problem to pre-compute the solutions we need for training: We randomly choose between the positive and the negative square root. This resembles the general case, where we would gather all data beforehand, e.g., using optimization techniques to compute the solutions or even experiments. This data collection typically does not favor one particular mode from multimodal solutions."
]
},
{
@ -111,7 +112,7 @@
"metadata": {},
"outputs": [],
"source": [
"# Generation Y-Data\n",
"# Generation of Y-Data\n",
"sign = (- np.ones((N,)))**np.random.randint(2,size=N)\n",
"Y = np.sqrt(X) * sign"
]
@ -121,7 +122,7 @@
"id": "stone-science",
"metadata": {},
"source": [
"Now we can define a network, the loss, and the training configuration. We'll use a simple `keras` architecture with three hidden layers, ReLU activations."
"Now we can define a network, the loss, and the training configuration. We'll use a simple `keras` architecture with three hidden layers and ReLU activations."
]
},
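For reference, a minimal sketch of what this supervised setup could look like; the name `nn_sv`, the layer width of 10, and the epoch/batch settings are assumptions and may differ from the actual notebook cells:

```python
import numpy as np
import tensorflow as tf

# data as generated in the cells above: x in [0,1], y = +/- sqrt(x) with random sign
N = 200
X = np.random.random(N)
sign = (-np.ones((N,)))**np.random.randint(2, size=N)
Y = np.sqrt(X) * sign

# three hidden layers with ReLU activations, one linear output neuron
nn_sv = tf.keras.models.Sequential([
    tf.keras.layers.Input(shape=(1,)),
    tf.keras.layers.Dense(10, activation='relu'),
    tf.keras.layers.Dense(10, activation='relu'),
    tf.keras.layers.Dense(10, activation='relu'),
    tf.keras.layers.Dense(1),
])
nn_sv.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss=tf.keras.losses.MeanSquaredError())

# plain supervised training on (x, y) pairs; reshape to (N,1) for a single input feature
nn_sv.fit(X.reshape(-1, 1), Y.reshape(-1, 1), epochs=5, batch_size=5, verbose=0)
```

The fit call treats the problem as a plain regression from $x$ to $y$, which is exactly what leads to the averaging issue discussed below.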
{
@ -185,9 +186,7 @@
"id": "governmental-mixture",
"metadata": {},
"source": [
"As both NN and the data set are very small, the training converges very quickly. However, if we inspect the predictions of the network, we can see that it is nowhere near the solution we were hoping to find: it averages between the data points on both sides of the x-axis and therefore fails to find satisfying solutions to the problem above.\n",
"\n",
"The following plot nicely highlights this: it shows the data in light gray, and the supervised solution in red. "
"As both NN and the data set are very small, the training converges very quickly. Let's plot the solution: the following one shows the data in light gray, and the supervised solution in red. "
]
},
{
@ -226,6 +225,7 @@
"metadata": {},
"source": [
"😱 This is obviously completely wrong! The red solution is nowhere near one of the two modes of our solution shown in gray.\n",
"The training process has averaged between the data points on both sides of the x-axis and therefore fails to find satisfying solutions to the problem above.\n",
"\n",
"Note that the red line is often not perfectly at zero, which is where the two modes of the solution should average out in the continuous setting. This is caused by the relatively coarse sampling with only 200 points in this example.\n",
"<br>\n",
@ -246,7 +246,8 @@
"id": "acoustic-review",
"metadata": {},
"source": [
"Now let's apply a differentiable physics approach to find $f$: we'll directly include our discretized model $\\mathcal P$ in the training. \n",
"Now let's apply the differentiable physics idea as mentioned above to find $f$: we'll directly include our discretized model $\\mathcal P$ in the training. \n",
"Note that in this context, $\\mathcal P^*$ and $\\mathcal P$ actually provide a mapping back to the input space $X$, i.e. $\\mathcal P^*: Y \\to X$.\n",
"\n",
"There is no real data generation step; we only need to sample from the $[0,1]$ interval. We'll simply keep the same $x$ locations used in the previous case, and a new instance of a NN with the same architecture as before `nn_dp`:"
]
@ -260,7 +261,10 @@
"source": [
"# X-Data\n",
"# X = X , we can directly re-use the X from above, nothing has changed...\n",
"# Y is evaluated on the fly\n",
"\n",
"# P maps Y back to X, simply by computing a square, as y is a TF tensor input, the square operation **2 will be differentiable\n",
"def P(y):\n",
" return y**2\n",
"\n",
"# Model\n",
"nn_dp = tf.keras.models.Sequential([\n",
@ -274,9 +278,9 @@
"id": "conscious-budapest",
"metadata": {},
"source": [
"The loss function is the crucial point for training: we directly incorporate the function $f$ into the loss. In this simple case, the `loss_dp` function simply computes the square of the prediction `y_pred`. \n",
"The loss function is the crucial point for training: we directly incorporate the function to learn, $f$ called `nn_dp`, into the loss. Keras will evaluate `nn_dp` for an inpupt from `X`, and provide the output in the second argument `y_from_nn_dp`. On this output, we'll run our \"solver\" `P`, and the result should match the correct answer `y_true`. In this simple case, the `loss_dp` function simply computes the square of the prediction `y_pred`. \n",
"\n",
"Later on, a lot more could happen here: we could evaluate finite-difference stencils on the predicted solution, or compute a whole implicit time-integration step of a solver. Here we have a simple _mean-squared error_ term of the form $|y_{\\text{pred}}^2 - y_{\\text{true}}|^2$, which we are minimizing during training. It's not necessary to make it so simple: the more knowledge and numerical methods we can incorporate, the better we can guide the training process."
"Later on, a lot more could happen here: we could evaluate finite-difference stencils on the predicted solution, or compute a whole implicit time-integration step of a solver. Here we have a simple _mean-squared error_ term of the form $|\\mathcal P(y_{\\text{pred}}) - x_{\\text{true}}|^2$, which we are minimizing during training. It's not necessary to make it so simple: the more knowledge and numerical methods we can incorporate, the better we can guide the training process."
]
},
{
@ -288,8 +292,8 @@
"source": [
"#Loss\n",
"mse = tf.keras.losses.MeanSquaredError()\n",
"def loss_dp(y_true, y_pred):\n",
" return mse(y_true,y_pred**2)\n",
"def loss_dp(x_true, y_from_nn_dp):\n",
" return mse(x_true,P(y_from_nn_dp))\n",
"\n",
"optimizer_dp = tf.keras.optimizers.Adam(learning_rate=0.001)\n",
"nn_dp.compile(optimizer=optimizer_dp, loss=loss_dp)"

View File

@ -89,6 +89,7 @@ This project would not have been possible without the help of many people who co
- [Philipp Holl](https://ge.in.tum.de/about/philipp-holl/)
- [Georg Kohl](https://ge.in.tum.de/about/georg-kohl/)
- [Mario Lino](https://ge.in.tum.de/about/mario-lino/)
- [Qiang Liu](https://ge.in.tum.de/about/qiang-liu/)
- [Patrick Schnell](https://ge.in.tum.de/about/patrick-schnell/)
- [Felix Trost](https://ge.in.tum.de/about/)
- [Nils Thuerey](https://ge.in.tum.de/about/n-thuerey/)
@ -96,6 +97,7 @@ This project would not have been possible without the help of many people who co
Additional thanks go to
Li-Wei Chen,
Xin Luo,
Maximilian Mueller,
Chloe Paillard,
Kiwon Um,

View File

@ -2,10 +2,10 @@ Overview
============================
The name of this book, _Physics-Based Deep Learning_,
denotes combinations of physical modeling and numerical simulations with
methods based on artificial neural networks.
The general direction of Physics-Based Deep Learning represents a very
active, quickly growing and exciting field of research. The following chapter will
denotes combinations of physical modeling and **numerical simulations** with
methods based on **artificial intelligence**, i.e. neural networks.
The general direction of Physics-Based Deep Learning, also going under the name _Scientific Machine Learning_,
represents a very active, quickly growing and exciting field of research. The following chapter will
give a more thorough introduction to the topic and establish the basics
for following chapters.
@ -15,9 +15,9 @@ height: 240px
name: overview-pano
---
Understanding our environment, and predicting how it will evolve is one of the key challenges of humankind.
A key tool for achieving these goals are simulations, and next-gen simulations
could strongly profit from integrating deep learning components to make even
more accurate predictions about our world.
A key tool for achieving these goals is computer simulation, and the next generation of these simulations
will likely profit strongly from integrating AI and deep learning components, in order to make even
more accurate predictions about the phenomena in our environment.
```
## Motivation
@ -28,11 +28,11 @@ to the control of plasma fusion {cite}`maingi2019fesreport`,
using numerical analysis to obtain solutions for physical models has
become an integral part of science.
In recent years, machine learning technologies and _deep neural networks_ in particular,
In recent years, artificial intelligence methods driven by _deep neural networks_,
have led to impressive achievements in a variety of fields:
from image classification {cite}`krizhevsky2012` over
natural language processing {cite}`radford2019language`,
and more recently also for protein folding {cite}`alquraishi2019alphafold`.
and protein folding {cite}`alquraishi2019alphafold`, to various foundation models.
The field is very vibrant and quickly developing, with the promise of vast possibilities.
### Replacing traditional simulations?
@ -45,14 +45,17 @@ for real-world, industrial applications such as airfoil flows {cite}`chen2021hig
same time outperforming traditional solvers by orders of magnitude in terms of runtime.
Instead of relying on models that are carefully crafted
from first principles, can data collections of sufficient size
be processed to provide the correct answers?
from first principles, can sufficiently large datasets
be processed instead to provide the correct answers?
As we'll show in the next chapters, this concern is unfounded.
Rather, it is crucial for the next generation of simulation systems
to bridge both worlds: to
combine _classical numerical_ techniques with _deep learning_ methods.
combine _classical numerical_ techniques with _AI_ methods.
In addition, the latter offer exciting new possibilities in areas that
have been challenging for traditional methods, such as dealing
with complex _distributions and uncertainty_ in simulations.
One central reason for the importance of this combination is
One central reason for the importance of the combination with numerics is
that DL approaches are powerful, but at the same time strongly profit
from domain knowledge in the form of physical models.
DL techniques and NNs are novel, sometimes difficult to apply, and
@ -70,36 +73,37 @@ developed in the field of numerical mathematics, this book will
show that it is highly beneficial to use them as much as possible
when applying DL.
### Black boxes and magic?
### Black boxes?
People who are unfamiliar with DL methods often associate neural networks
with _black boxes_, and see the training processes as something that is beyond the grasp
In the past, trained neural networks and DL methods have often been associated
with _black boxes_, implying that they are something that is beyond the grasp
of human understanding. However, these viewpoints typically stem from
relying on hearsay and not dealing with the topic enough.
relying on hearsay and general skepticism about "hyped" topics.
Rather, the situation is a very common one in science: we are facing a new class of methods,
and "all the gritty details" are not yet fully worked out. This is pretty common
The situation is a very common one in science, though: we are facing a new class of methods,
and "all the gritty details" are not yet fully worked out. This is and has been pretty common
for all kinds of scientific advances.
Numerical methods themselves are a good example. Around 1950, numerical approximations
and solvers had a tough standing. E.g., to cite H. Goldstine,
numerical instabilities were considered to be a
"constant source of anxiety in the future" {cite}`goldstine1990history`.
By now we have a pretty good grasp of these instabilities, and numerical methods
are ubiquitous and well established.
are ubiquitous and well established. AI and neural networks are following the same path.
Thus, it is important to be aware of the fact that -- in a way -- there is nothing
magical or otherworldly to deep learning methods. They're simply another set of
numerical tools. That being said, they're clearly fairly new, and right now
very special or otherworldly about deep learning methods. They're simply a new set of
numerical tools. That being said, they're clearly very new, and right now
definitely the most powerful set of tools we have for non-linear problems.
Just because all the details aren't fully worked out and nicely written up,
that shouldn't stop us from including these powerful methods in our numerical toolbox.
That all the details aren't yet fully worked out and nicely written up
shouldn't stop us from including these powerful methods in our numerical toolbox.
### Reconciling DL and simulations
Taking a step back, the aim of this book is to build on all the powerful techniques that we have
at our disposal for numerical simulations, and use them wherever we can in conjunction
with deep learning.
As such, a central goal is to _reconcile_ the data-centered viewpoint with physical simulations.
As such, a central goal is to _reconcile_ the AI viewpoint with physical simulations.
```{admonition} Goals of this document
:class: tip
@ -109,8 +113,8 @@ The key aspects that we will address in the following are:
- without **discarding** our knowledge about numerical methods.
At the same time, it's worth noting what we won't be covering:
- introductions to deep learning and numerical simulations,
- we're neither aiming for a broad survey of research articles in this area.
- there's no in-depth **introduction** to deep learning and numerical simulations (other great works already take care of this),
- nor is the aim a broad survey of research articles in this area.
```
The resulting methods have a huge potential to improve
@ -118,26 +122,28 @@ what can be done with numerical methods: in scenarios
where a solver targets cases from a certain well-defined problem
domain repeatedly, it can for instance make a lot of sense to once invest
significant resources to train
a neural network that supports the repeated solves. Based on the
domain-specific specialization of this network, such a hybrid solver
could vastly outperform traditional, generic solvers. And despite
a neural network that supports the repeated solves.
The development of large so-called "foundation models" is especially
promising in this area.
Based on domain-specific specialization via fine-tuning with a smaller dataset,
a hybrid solver could vastly outperform traditional, generic solvers. And despite
the many open questions, first publications have demonstrated
that this goal is not overly far away {cite}`um2020sol,kochkov2021`.
that this goal is a realistic one {cite}`um2020sol,kochkov2021`.
Another way to look at it is that all mathematical models of our nature
are idealized approximations and contain errors. A lot of effort has been
made to obtain very good model equations, but to make the next
big step forward, DL methods offer a very powerful tool to close the
big step forward, AI and DL methods offer a very powerful tool to close the
remaining gap towards reality {cite}`akkaya2019solving`.
## Categorization
Within the area of _physics-based deep learning_,
we can distinguish a variety of different
approaches, from targeting constraints, combined methods, and
optimizations to applications. More specifically, all approaches either target
approaches, e.g., targeting constraints, combined methods,
optimizations and applications. More specifically, all approaches either target
_forward_ simulations (predicting state or temporal evolution) or _inverse_
problems (e.g., obtaining a parametrization for a physical system from
problems (e.g., obtaining a parametrization or state for a physical system from
observations).
![An overview of categories of physics-based deep learning methods](resources/physics-based-deep-learning-overview.jpg)
@ -160,17 +166,14 @@ techniques:
gradients from a PDE-based formulation. These soft constraints sometimes also go
under the name "physics-informed" training.
- _Interleaved_: the full physical simulation is interleaved and combined with
an output from a deep neural network; this requires a fully differentiable
simulator and represents the tightest coupling between the physical system and
the learning process. Interleaved differentiable physics approaches are especially important for
temporal evolutions, where they can yield an estimate of the future behavior of the
dynamics.
- _Hybrid_: the full physical simulation is interleaved and combined with
an output from a deep neural network; this usually requires a fully differentiable
simulator. It represents the tightest coupling between the physical system and
the learning process and results in a hybrid solver that combines classic techniques with AI-based ones.
Thus, methods can be categorized in terms of forward versus inverse
solve, and how tightly the physical model is integrated into the
optimization loop that trains the deep neural network. Here, especially
interleaved approaches that leverage _differentiable physics_ allow for
solve, and how tightly the physical model is integrated with the neural network.
Here, especially hybrid approaches that leverage _differentiable physics_ allow for
very tight integration of deep learning and numerical simulation methods.
@ -186,19 +189,28 @@ In contrast, we'll focus on _physical_ simulations from now on, hence the name.
When coming from other backgrounds, other names are more common however. E.g., the differentiable
physics approach is equivalent to using the adjoint method, and coupling it with a deep learning
procedure. Effectively, it is also equivalent to applying backpropagation / reverse-mode differentiation
to a numerical simulation. However, as mentioned above, motivated by the deep learning viewpoint,
to a numerical simulation.
However, as mentioned above, motivated by the deep learning viewpoint,
we'll refer to all these as "differentiable physics" approaches from now on.
The hybrid solvers that result from integrating DL with a traditional solver can also be seen
as a classic topic: in this context, the neural network has the task of _correcting_ the solver.
This correction can in turn either target numerical errors, or unresolved terms in an equation.
This is a fundamental problem in science that has been addressed under various names, e.g.,
as the _closure problem_ in fluid dynamics and turbulence, as _homogenization_ or _coarse-graining_
in material science, and _parametrization_ in climate and weather simulation. The re-invention
of this goal in the different fields points to the importance of the underlying problem,
and this text will illustrate the new ways that DL offers to tackle it.
---
## Looking ahead
_Physical simulations_ are a huge field, and we won't be able to cover all possible types of physical models and simulations.
_Physics simulations_ are a huge field, and we won't be able to cover all possible types of physical models and simulations.
```{note} Rather, the focus of this book lies on:
- _Field-based simulations_ (no Lagrangian methods)
- Dense _Field-based simulations_ (no Lagrangian methods)
- Combinations with _deep learning_ (plenty of other interesting ML techniques exist, but won't be discussed here)
- Experiments are left as an _outlook_ (i.e., replacing synthetic data with real-world observations)
```
@ -218,24 +230,17 @@ A brief look at our _notation_ in the {doc}`notation` chapter won't hurt in both
## Implementations
This text also represents an introduction to a wide range of deep learning and simulation APIs.
We'll use popular deep learning APIs such as _pytorch_ [https://pytorch.org](https://pytorch.org) and _tensorflow_ [https://www.tensorflow.org](https://www.tensorflow.org), and additionally
give introductions into the differentiable simulation framework _Φ<sub>Flow</sub> (phiflow)_ [https://github.com/tum-pbs/PhiFlow](https://github.com/tum-pbs/PhiFlow). Some examples also use _JAX_ [https://github.com/google/jax](https://github.com/google/jax). Thus after going through
these examples, you should have a good overview of what's available in current APIs, such that
This text also represents an introduction to deep learning and simulation APIs.
We'll primarily use the popular deep learning API _pytorch_ [https://pytorch.org](https://pytorch.org), but also a bit of _tensorflow_ [https://www.tensorflow.org](https://www.tensorflow.org), and additionally
give an introduction to the differentiable simulation framework _Φ<sub>Flow</sub> (phiflow)_ [https://github.com/tum-pbs/PhiFlow](https://github.com/tum-pbs/PhiFlow). Some examples also use _JAX_ [https://github.com/google/jax](https://github.com/google/jax), which provides an interesting alternative.
Thus, after going through these examples, you should have a good overview of what's available in current APIs, such that
the best one can be selected for new tasks.
As we're (in most Jupyter notebook examples) dealing with stochastic optimizations, many of the following code examples will produce slightly different results each time they're run. This is fairly common with NN training, but it's important to keep in mind when executing the code. It also means that the numbers discussed in the text might not exactly match the numbers you'll see after re-running the examples.
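If you want runs to be repeatable, one option (not something the notebooks themselves do) is to fix the random seeds of the libraries involved, for instance:

```python
import random
import numpy as np
import tensorflow as tf

SEED = 42
random.seed(SEED)
np.random.seed(SEED)
tf.random.set_seed(SEED)   # for the tensorflow examples; torch.manual_seed(SEED) for the pytorch ones
```

Even with fixed seeds, GPU kernels and parallel execution can still introduce small run-to-run differences.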
<!-- ## A brief history of PBDL in the context of Fluids
First:
Tompson, seminal...
First: Tompson, seminal...
Chu, descriptors, early but not used
Ling et al. isotropic turb, small FC, unused?
PINNs ... and more ... -->