more text

This commit is contained in:
parent 03a4c7ef29
commit 0063c71c05

_toc.yml (6 changes)
@@ -1,9 +1,9 @@
# Table of content
# Learn more at https://jupyterbook.org/customize/toc.html
# PBDL Table of content (cf https://jupyterbook.org/customize/toc.html)
#
- file: intro
- file: overview.md
  sections:
  - file: overview-equations.md
  - file: overview-burgers-forw.ipynb
  - file: overview-ns-forw.ipynb
- file: supervised
@@ -12,6 +12,7 @@
- file: physicalloss
  sections:
  - file: physicalloss-code.ipynb
  - file: physicalloss-discuss.md
- file: diffphys
  sections:
  - file: diffphys-code-gradient.ipynb
@@ -23,3 +24,4 @@
- file: markdown
- file: notebooks
- file: references
- file: notation

@@ -29,7 +29,7 @@ For the PINN representation with fully-connected networks on the other hand, we

The following table summarizes these findings:

| Method | Pro | Con |
| Method | ✅ Pro | ❌ Con |
|----------|-------------|------------|
| **PINN** | - Analytic derivatives via back-propagation | - Expensive evaluation of NN, as well as derivative calculations |
|          | - Simple to implement | - Incompatible with existing numerical methods |

intro.md (11 changes)
@@ -73,9 +73,15 @@ The contents of the following files would not have been possible without the hel
- Ms. y
- ...

% tests...



% some markdown tests follow ...

---

a b c

```{admonition} My title2
@@ -86,6 +92,7 @@ See also... Test link: {doc}`supervised`
✅ Do this, ❌ Don't do this

% ----------------

---

@@ -152,6 +159,6 @@ time series, sequence prediction?] {cite}`wiewel2019lss,bkim2019deep,wiewel2020l
_Misc jupyter book TODOs_

- Fix latex PDF output
- How to include links in references?
- How to include links to papers in the bibtex references?

@@ -1,5 +1,8 @@
Jupyter Book Reference Stuff
Old Jupyter Book Reference Stuff
=======================

There are many ways to write content in Jupyter Book. This short section
covers a few tips for how to do so.

TODO remove sometime...

notation.md (new file, 38 lines)
@@ -0,0 +1,38 @@

# Notation and Abbreviations

## Math notation:

| Symbol | Meaning |
| --- | --- |
| $A$ | matrix |
| $\eta$ | learning rate or step size |
| $\Gamma$ | boundary of computational domain $\Omega$ |
| $f()$ | approximated version of $f^{*}$ |
| $f^{*}()$ | generic function to be approximated, typically unknown |
| $\Omega$ | computational domain |
| $\mathcal P$ | physical model, PDE |
| $\theta$ | neural network params |
| $t$ | time dimension |
| $\mathbf{u}$ | vector-valued velocity |
| $x$ | neural network input or spatial coordinate |
| $y$ | neural network output |

## Summary of the most important abbreviations:

| Abbreviation | Meaning |
| --- | --- |
| CNN | Convolutional neural network |
| DL | Deep learning |
| NN | Neural network |
| PBDL | Physics-based deep learning |


% test table formatting in markdown
% | | Sentence # | Word | POS | Tag |
% |---:|:-------------|:-----------|:------|:------|
% | 1 | Sentence: 1 | They | PRP | O |
% | 2 | Sentence: 1 | marched | VBD | O |

overview-equations.md (new file, 138 lines)
@@ -0,0 +1,138 @@
Model Equations
============================

overview of PDE models to be used later on ...

domain $\Omega$, boundary $\Gamma$

continuous functions, but few assumptions about continuity for now...

```{admonition} Notation and abbreviations
:class: seealso
If unsure, please check the summary of our mathematical notation
and the abbreviations used in: {doc}`notation`, at the bottom of the left panel.
```

% \newcommand{\pde}{\mathcal{P}} % PDE ops
% \newcommand{\pdec}{\pde_{s}}
% \newcommand{\manifsrc}{\mathscr{S}} % coarse / "source"
% \newcommand{\pder}{\pde_{R}}
% \newcommand{\manifref}{\mathscr{R}}

% vc - coarse solutions
% \renewcommand{\vc}[1]{\vs_{#1}} % plain coarse state at time t
% \newcommand{\vcN}{\vs} % plain coarse state without time
% vc - coarse solutions, modified by correction
% \newcommand{\vct}[1]{\tilde{\vs}_{#1}} % modified / over time at time t
% \newcommand{\vctN}{\tilde{\vs}} % modified / over time without time
% vr - fine/reference solutions
% \renewcommand{\vr}[1]{\mathbf{r}_{#1}} % fine / reference state at time t , never modified
% \newcommand{\vrN}{\mathbf{r}} % plain fine state without time

% \newcommand{\project}{\mathcal{T}} % transfer operator fine <> coarse
% \newcommand{\loss}{\mathcal{L}} % generic loss function
% \newcommand{\nn}{f_{\theta}}
% \newcommand{\dt}{\Delta t} % timestep
% \newcommand{\corrPre}{\mathcal{C}_{\text{pre}}} % analytic correction , "pre computed"
% \newcommand{\corr}{\mathcal{C}} % just C for now...
% \newcommand{\nnfunc}{F} % {\text{NN}}

Some notation from SoL, move with parts from overview into "appendix"?

We typically solve a discretized PDE $\mathcal{P}$ by performing discrete time steps of size $\Delta t$.
Each subsequent step can depend on any number of previous steps,
$\mathbf{u}(\mathbf{x},t+\Delta t) = \mathcal{P}(\mathbf{u}(\mathbf{x},t), \mathbf{u}(\mathbf{x},t-\Delta t),...)$,
where
$\mathbf{x} \in \Omega \subseteq \mathbb{R}^d$ for the domain $\Omega$ in $d$
dimensions, and $t \in \mathbb{R}^{+}$.

Numerical methods yield approximations of a smooth function such as $\mathbf{u}$ in a discrete
setting and invariably introduce errors. These errors can be measured in terms
of the deviation from the exact analytical solution.
For discrete simulations of
PDEs, these errors are typically expressed as a function of the truncation, $O(\Delta t^k)$
for a given step size $\Delta t$ and an exponent $k$ that is discretization dependent.

The following PDEs typically work with a continuous
velocity field $\mathbf{u}$ with $d$ dimensions and components, i.e.,
$\mathbf{u}(\mathbf{x},t): \mathbb{R}^d \rightarrow \mathbb{R}^d $.
For discretized versions below, $d_{i,j}$ will denote the dimensionality
of a field such as the velocity,
with domain size $d_{x},d_{y},d_{z}$ for source and reference in 3D.

% with $i \in \{s,r\}$ denoting source/inference manifold and reference manifold, respectively.
%This yields $\vc{} \in \mathbb{R}^{d \times d_{s,x} \times d_{s,y} \times d_{s,z} }$ and $\vr{} \in \mathbb{R}^{d \times d_{r,x} \times d_{r,y} \times d_{r,z} }$
%Typically, $d_{r,i} > d_{s,i}$ and $d_{z}=1$ for $d=2$.

For all PDEs, we use non-dimensional parametrizations as outlined below,
and the components of the velocity vector are typically denoted by $x,y,z$ subscripts, i.e.,
$\mathbf{u} = (u_x,u_y,u_z)^T$ for $d=3$.

Burgers' equation in 2D represents a well-studied advection-diffusion PDE:

$\frac{\partial u_x}{\partial{t}} + \mathbf{u} \cdot \nabla u_x =
\nu \nabla\cdot \nabla u_x + g_x(t),
\\
\frac{\partial u_y}{\partial{t}} + \mathbf{u} \cdot \nabla u_y =
\nu \nabla\cdot \nabla u_y + g_y(t)
$,

where $\nu$ and $\mathbf{g}$ denote diffusion constant and external forces, respectively.

Burgers' equation in 1D without forces with $u_x = u$:
%\begin{eqnarray}
$\frac{\partial u}{\partial{t}} + u \nabla u = \nu \nabla \cdot \nabla u $ .
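
To make the time stepping above concrete, here is a minimal sketch of one explicit step $\mathcal{P}$ for this force-free 1D Burgers equation. The forward-Euler/central-difference discretization, grid size, and parameter values are illustrative assumptions, not the reference solvers used in the notebooks:

```python
import numpy as np

# One explicit step u(t+dt) = P(u(t)) for 1D Burgers on a periodic domain.
N, nu, dt = 128, 0.01, 0.001
dx = 2.0 * np.pi / N
u = np.sin(np.arange(N) * dx)  # example initial state

def burgers_step(u):
    # central differences via np.roll, which wraps around (periodic BCs)
    u_x  = (np.roll(u, -1) - np.roll(u, 1)) / (2.0 * dx)
    u_xx = (np.roll(u, -1) - 2.0 * u + np.roll(u, 1)) / dx**2
    return u + dt * (-u * u_x + nu * u_xx)  # forward-Euler update

u = burgers_step(u)  # advance the state by one step of size dt
```

A first-order scheme like this carries the $O(\Delta t^k)$ truncation error discussed above with $k=1$ in time.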

---

Later on, additional equations...


Navier-Stokes, in 2D:

$
\frac{\partial u_x}{\partial{t}} + \mathbf{u} \cdot \nabla u_x =
- \frac{1}{\rho}\nabla{p} + \nu \nabla\cdot \nabla u_x
\\
\frac{\partial u_y}{\partial{t}} + \mathbf{u} \cdot \nabla u_y =
- \frac{1}{\rho}\nabla{p} + \nu \nabla\cdot \nabla u_y
\\
\text{subject to} \quad \nabla \cdot \mathbf{u} = 0
$


Navier-Stokes, in 2D with Boussinesq:

%$\frac{\partial u_x}{\partial{t}} + \mathbf{u} \cdot \nabla u_x$
%$ -\frac{1}{\rho} \nabla p $

$
\frac{\partial u_x}{\partial{t}} + \mathbf{u} \cdot \nabla u_x = - \frac{1}{\rho} \nabla p
\\
\frac{\partial u_y}{\partial{t}} + \mathbf{u} \cdot \nabla u_y = - \frac{1}{\rho} \nabla p + \eta d
\\
\text{subject to} \quad \nabla \cdot \mathbf{u} = 0,
\\
\frac{\partial d}{\partial{t}} + \mathbf{u} \cdot \nabla d = 0
$


Navier-Stokes, in 3D:

$
\frac{\partial u_x}{\partial{t}} + \mathbf{u} \cdot \nabla u_x = - \frac{1}{\rho} \nabla p + \nu \nabla\cdot \nabla u_x
\\
\frac{\partial u_y}{\partial{t}} + \mathbf{u} \cdot \nabla u_y = - \frac{1}{\rho} \nabla p + \nu \nabla\cdot \nabla u_y
\\
\frac{\partial u_z}{\partial{t}} + \mathbf{u} \cdot \nabla u_z = - \frac{1}{\rho} \nabla p + \nu \nabla\cdot \nabla u_z
\\
\text{subject to} \quad \nabla \cdot \mathbf{u} = 0.
$

overview.md (58 changes)
@@ -1,12 +1,14 @@
Overview
============================

The following "book" of targets _"Physics-Based Deep Learning"_ techniques,
i.e., methods that combine physical modeling and numerical simulations with
deep learning (DL). Here, DL will typically refer to methods based
on artificial neural networks. The general direction of
Physics-Based Deep Learning represents a very
active, quickly growing and exciting field of research.
The following collection of digital documents, i.e. "book",
targets _Physics-Based Deep Learning_ techniques.
By that we mean combining physical modeling and numerical simulations with
methods based on artificial neural networks.
The general direction of Physics-Based Deep Learning represents a very
active, quickly growing and exciting field of research -- we want to provide
a starting point for new researchers as well as a hands-on introduction into
state-of-the-art research topics.

## Motivation

@@ -50,8 +52,8 @@ whether key phenomena are visible in the solutions or not.
:class: tip
Thus, a key aspect that we want to address in the following is:
- explain how to use DL,
- and how to combine it with existing knowledge of physics and simulations,
- **without throwing away** all existing numerical knowledeg and techniques!
- how to combine it with existing knowledge of physics and simulations,
- **without throwing away** all existing numerical knowledge and techniques!
```

Rather, we want to build on all the neat techniques that we have
@@ -112,7 +114,7 @@ starting points with code examples, and illustrate pros and cons of the
different approaches. In particular, it's important to know in which scenarios
each of the different techniques is particularly useful.

```{admonition} Skip ahead if...
```{admonition} You can skip ahead if...
:class: tip

- you're very familiar with numerical methods and PDE solvers, and want to get started with DL topics right away. The _Supervised Learning_ chapter is a good starting point then.
@@ -138,37 +140,13 @@ PINNs ... and more ...

## Deep Learning and Neural Networks

Very brief intro, basic equations... approximate $f(x)=y$ with NN ...
Very brief intro, basic equations... approximate $f^*(x)=y$ with NN $f(x;\theta)$ ...

Details in [Deep Learning book](https://www.deeplearningbook.org)
learn via GD, $\partial f / \partial \theta$

Read chapters 6 to 9 of the [Deep Learning book](https://www.deeplearningbook.org),
especially about [MLPs](https://www.deeplearningbook.org/contents/mlp.html) and
"Conv-Nets", i.e. [CNNs](https://www.deeplearningbook.org/contents/convnets.html).

## Notation and Abbreviations

Unify notation... TODO ...

Math notation:

| Symbol | Meaning |
| --- | --- |
| $x$ | NN input |
| $y$ | NN output |
| $\theta$ | NN params |

Quick summary of the most important abbreviations:

| Abbreviation | Meaning |
| --- | --- |
| CNN | Convolutional neural network |
| DL | Deep learning |
| NN | Neural network |
| PBDL | Physics-based deep learning |


test table formatting in markdown

| | Sentence # | Word | POS | Tag |
|---:|:-------------|:-----------|:------|:------|
| 1 | Sentence: 1 | They | PRP | O |
| 2 | Sentence: 1 | marched | VBD | O |

**Note:** The classic distinction between _classification_ and _regression_ problems is not so important here,
as we only deal with _regression_ problems in the following.

physicalloss-discuss.md (new file, 37 lines)
@@ -0,0 +1,37 @@
Discussion of Physical Soft-Constraints
=======================

The good news so far is that we have a DL method that can include
physical laws in the form of soft constraints by minimizing residuals.
However, as the very simple previous example illustrates, this is just a conceptual
starting point.

On the positive side, we can leverage DL frameworks with backpropagation to compute
the derivatives of the model. At the same time, this puts us at the mercy of the learned
representation regarding the reliability of these derivatives. Also, each derivative
requires backpropagation through the full network, which can be very slow. Especially so
for higher-order derivatives.

And while the setup is relatively simple, it is generally difficult to control. The NN
has flexibility to refine the solution by itself, but at the same time, tricks are necessary
when it doesn't pick the right regions of the solution.

In general, a fundamental drawback of this approach is that it does not combine well with
traditional numerical techniques. E.g., the learned representation is not suitable to be refined with
a classical iterative solver such as the conjugate gradient method. This means many
powerful techniques that were developed in the past decades cannot be used in this context.
Bringing these numerical methods back into the picture will be one of the central
goals of the next sections.

✅ Pro:
- uses physical model
- derivatives via backpropagation

❌ Con:
- slow ...
- only soft constraints
- largely incompatible with _classical_ numerical methods
- derivatives rely on learned representation

Next, let's look at how we can leverage numerical methods to improve the DL accuracy and efficiency
by making use of differentiable solvers.
physicalloss.md (208 changes)
@@ -1,134 +1,98 @@
Physical Loss Terms
=======================

The supervised setting of the previous sections can quickly
yield approximate solutions with a fairly simple training process, but what's
quite sad to see here is that we only use physical models and numerics
as an "external" tool to produce a big pile of data 😢.

Using the equations now, but no numerical methods!
## Using Physical Models

Still interesting, leverages analytic derivatives of NNs, but lots of problems

---

% \newcommand{\pde}{\mathcal{P}} % PDE ops
% \newcommand{\pdec}{\pde_{s}}
% \newcommand{\manifsrc}{\mathscr{S}} % coarse / "source"
% \newcommand{\pder}{\pde_{R}}
% \newcommand{\manifref}{\mathscr{R}}

% vc - coarse solutions
% \renewcommand{\vc}[1]{\vs_{#1}} % plain coarse state at time t
% \newcommand{\vcN}{\vs} % plain coarse state without time
% vc - coarse solutions, modified by correction
% \newcommand{\vct}[1]{\tilde{\vs}_{#1}} % modified / over time at time t
% \newcommand{\vctN}{\tilde{\vs}} % modified / over time without time
% vr - fine/reference solutions
% \renewcommand{\vr}[1]{\mathbf{r}_{#1}} % fine / reference state at time t , never modified
% \newcommand{\vrN}{\mathbf{r}} % plain fine state without time

% \newcommand{\project}{\mathcal{T}} % transfer operator fine <> coarse
% \newcommand{\loss}{\mathcal{L}} % generic loss function
% \newcommand{\nn}{f_{\theta}}
% \newcommand{\dt}{\Delta t} % timestep
% \newcommand{\corrPre}{\mathcal{C}_{\text{pre}}} % analytic correction , "pre computed"
% \newcommand{\corr}{\mathcal{C}} % just C for now...
% \newcommand{\nnfunc}{F} % {\text{NN}}

Some notation from SoL, move with parts from overview into "appendix"?

We typically solve a discretized PDE $\mathcal{P}$ by performing discrete time steps of size $\Delta t$.
Each subsequent step can depend on any number of previous steps,
$\mathbf{u}(\mathbf{x},t+\Delta t) = \mathcal{P}(\mathbf{u}(\mathbf{x},t), \mathbf{u}(\mathbf{x},t-\Delta t),...)$,
where
$\mathbf{x} \in \Omega \subseteq \mathbb{R}^d$ for the domain $\Omega$ in $d$
dimensions, and $t \in \mathbb{R}^{+}$.

Numerical methods yield approximations of a smooth function such as $\mathbf{u}$ in a discrete
setting and invariably introduce errors. These errors can be measured in terms
of the deviation from the exact analytical solution.
For discrete simulations of
PDEs, these errors are typically expressed as a function of the truncation, $O(\Delta t^k)$
for a given step size $\Delta t$ and an exponent $k$ that is discretization dependent.

The following PDEs typically work with a continuous
velocity field $\mathbf{u}$ with $d$ dimensions and components, i.e.,
$\mathbf{u}(\mathbf{x},t): \mathbb{R}^d \rightarrow \mathbb{R}^d $.
For discretized versions below, $d_{i,j}$ will denote the dimensionality
of a field such as the velocity,
with domain size $d_{x},d_{y},d_{z}$ for source and reference in 3D.

% with $i \in \{s,r\}$ denoting source/inference manifold and reference manifold, respectively.
%This yields $\vc{} \in \mathbb{R}^{d \times d_{s,x} \times d_{s,y} \times d_{s,z} }$ and $\vr{} \in \mathbb{R}^{d \times d_{r,x} \times d_{r,y} \times d_{r,z} }$
%Typically, $d_{r,i} > d_{s,i}$ and $d_{z}=1$ for $d=2$.

For all PDEs, we use non-dimensional parametrizations as outlined below,
and the components of the velocity vector are typically denoted by $x,y,z$ subscripts, i.e.,
$\mathbf{u} = (u_x,u_y,u_z)^T$ for $d=3$.

Burgers' equation in 2D. It represents a well-studied advection-diffusion PDE:

$\frac{\partial u_x}{\partial{t}} + \mathbf{u} \cdot \nabla u_x =
\nu \nabla\cdot \nabla u_x + g_x(t),
\\
\frac{\partial u_y}{\partial{t}} + \mathbf{u} \cdot \nabla u_y =
\nu \nabla\cdot \nabla u_y + g_y(t)
We can improve this setting by trying to bring the model equations (or parts thereof)
into the training process. E.g., given a PDE for $\mathbf{u}(x,t)$ with a time evolution,
we can typically express it in terms of a function $\mathcal F$ of the derivatives
of $\mathbf{u}$ via
$
\mathbf{u}_t = \mathcal F ( \mathbf{u}_{x}, \mathbf{u}_{xx}, ... \mathbf{u}_{x..x})
$,
where the $_{x}$ subscripts denote spatial derivatives of higher order.

where $\nu$ and $\mathbf{g}$ denote diffusion constant and external forces, respectively.
In this context we can employ DL by approximating the unknown $\mathbf{u}$ itself
with a NN, denoted by $\tilde{\mathbf{u}}$. If the approximation is accurate, the PDE
naturally should be satisfied, i.e., the residual $R$ should be equal to zero:
$
R = \mathbf{u}_t - \mathcal F ( \mathbf{u}_{x}, \mathbf{u}_{xx}, ... \mathbf{u}_{x..x}) = 0
$

Burgers' equation in 1D without forces with $u_x = u$:
%\begin{eqnarray}
$\frac{\partial u}{\partial{t}} + u \nabla u = \nu \nabla \cdot \nabla u $ .
This nicely integrates with the objective for training a neural network: similar to before,
we can collect sample solutions
$[x_0,y_0], ...[x_n,y_n]$ for $\mathbf{u}$ with $\mathbf{u}(x)=y$.
This is typically important, as most practical PDEs we encounter do not have unique solutions
unless initial and boundary conditions are specified. Hence, if we only consider $R$ we might
get solutions with random offset or other undesirable components. The supervised sample points
therefore help to _pin down_ the solution in certain places.
Now our training objective becomes

$\text{arg min}_{\theta} \ \alpha_0 \sum_i (f(x_i ; \theta)-y_i)^2 + \alpha_1 R(x_i) $,

where $\alpha_{0,1}$ denote hyperparameters that scale the contribution of the supervised term and
the residual term, respectively. We could of course add additional residual terms with suitable scaling factors here.
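
To make the structure of this objective explicit, a minimal sketch in Python follows; `model`, `pde_residual`, the collocation points `x_res`, and the squared residual penalty are illustrative assumptions rather than the exact setup of the later code example:

```python
import tensorflow as tf

alpha0, alpha1 = 1.0, 1.0  # illustrative weighting hyperparameters

def total_loss(model, x_sup, y_sup, x_res, pde_residual):
    # supervised term: pins the solution down at known sample points
    loss_sup = tf.reduce_sum(tf.square(model(x_sup) - y_sup))
    # residual term: penalizes PDE violations at collocation points x_res
    loss_res = tf.reduce_sum(tf.square(pde_residual(model, x_res)))
    return alpha0 * loss_sup + alpha1 * loss_res
```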

Note that, similar to the data samples used for supervised training, we have no guarantees that the
residual terms $R$ will actually reach zero during training. The non-linear optimization of the training process
will minimize the supervised and residual terms as much as possible, but in the worst case, large non-zero residual
contributions can remain. We'll look at this in more detail in the upcoming code example; for now it's important
to remember that physical constraints imposed in this way only represent _soft constraints_, without guarantees
of being minimized to zero.

## Neural network derivatives

In order to compute the residuals at training time, it would be possible to store
the unknowns of $\mathbf{u}$ on a computational mesh, e.g., a grid, and discretize the equations of
$R$ there. This has a fairly long "tradition" in DL, and was proposed by Tompson et al. {cite}`tompson2017` early on.

Instead, a more widely used variant of employing physical soft-constraints {cite}`raissi2018hiddenphys`
uses fully connected NNs to represent $\mathbf{u}$. This has some interesting pros and cons that we'll outline in the following.
Due to the popularity of this version, we'll also focus on it in the following code examples and comparisons.

The central idea here is that the general function $f$ from our learning problems
can be seen as a representation of the physical field we're after. Thus, the $\mathbf{u}(x)$ will
be turned into $\mathbf{u}(x, \theta)$, where we choose $\theta$ such that the solution to $\mathbf{u}$ is
represented as precisely as possible.

One nice side effect of this viewpoint is that NN representations inherently support the calculation of derivatives.
The derivative $\partial f / \partial \theta$ was a key building block for learning via gradient descent, as explained
in {doc}`overview`. Here, we can use the same tools to compute spatial derivatives such as $\partial \mathbf{u} / \partial x$.
Note that above for $R$ we've written this derivative in the shortened notation as $\mathbf{u}_{x}$.
For functions over time this of course also works for $\partial \mathbf{u} / \partial t$, i.e. $\mathbf{u}_{t}$ in the notation above.

Thus, for some generic $R$, made up of $\mathbf{u}_t$ and $\mathbf{u}_{x}$ terms, we can rely on the back-propagation algorithm
of DL frameworks to compute these derivatives once we have a NN that represents $\mathbf{u}$. Essentially, this gives us a
function (the NN) that receives space and time coordinates to produce a solution for $\mathbf{u}$. Hence, the input is typically
quite low-dimensional, e.g., 3+1 values for a 3D case over time, and often produces a scalar value or a spatial vector.
Due to the lack of explicit spatial sampling points, an MLP, i.e., a fully-connected NN, is the architecture of choice here.

To pick a simple example, for Burgers' equation in 1D,
$\frac{\partial u}{\partial{t}} + u \nabla u = \nu \nabla \cdot \nabla u $ , we can directly
formulate a loss term $R = \frac{\partial u}{\partial t} + u \frac{\partial u}{\partial x} - \nu \frac{\partial^2 u}{\partial x^2}$ that should be minimized as much as possible at training time. For each of the terms, e.g. $\frac{\partial u}{\partial x}$,
we can simply query the DL framework that realizes $u$ to obtain the corresponding derivative.
For higher-order derivatives, such as $\frac{\partial^2 u}{\partial x^2}$, we can typically simply query the derivative function of the framework twice. In the following section, we'll give a specific example of how that works in TensorFlow.
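
As a preview, a minimal sketch of such derivative queries with `tf.GradientTape`; the small `u_net` architecture, the input layout, and the $\nu$ value are illustrative assumptions, not the setup of the next section's notebook:

```python
import tensorflow as tf

# Illustrative network u(t, x): two inputs, one output value of u.
u_net = tf.keras.Sequential([
    tf.keras.layers.Dense(20, activation="tanh", input_shape=(2,)),
    tf.keras.layers.Dense(1),
])

def burgers_residual(t, x, nu=0.01):
    # t, x: tensors of shape (batch, 1)
    with tf.GradientTape() as tape2:
        tape2.watch(x)
        with tf.GradientTape(persistent=True) as tape1:
            tape1.watch([t, x])
            u = u_net(tf.concat([t, x], axis=1))
        u_t = tape1.gradient(u, t)    # first derivatives via backprop
        u_x = tape1.gradient(u, x)
    u_xx = tape2.gradient(u_x, x)     # second derivative: query the tape twice
    return u_t + u * u_x - nu * u_xx  # R for 1D Burgers as defined above
```

Evaluating `burgers_residual` at a batch of collocation points yields the $R$ values that enter the residual part of the training objective.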

## Summary so far

This gives us a method to include physical equations into DL learning as a soft constraint.
Typically, this setup is suitable for _inverse_ problems, where we have certain measurements or observations
for which we wish to find a solution of a model PDE. Because of the high expense of the reconstruction (to be
demonstrated in the following), the solution manifold typically shouldn't be overly complex. E.g., it is difficult
to capture a wide range of solutions, such as the previous supervised airfoil example, in this way.

```{figure} resources/placeholder.png
---

Later on, additional equations...

height: 220px
name: pinn-training
---
TODO, visual overview of PINN training
```


Navier-Stokes, in 2D:

$
\frac{\partial u_x}{\partial{t}} + \mathbf{u} \cdot \nabla u_x =
- \frac{1}{\rho}\nabla{p} + \nu \nabla\cdot \nabla u_x
\\
\frac{\partial u_y}{\partial{t}} + \mathbf{u} \cdot \nabla u_y =
- \frac{1}{\rho}\nabla{p} + \nu \nabla\cdot \nabla u_y
\\
\text{subject to} \quad \nabla \cdot \mathbf{u} = 0
$


Navier-Stokes, in 2D with Boussinesq:

%$\frac{\partial u_x}{\partial{t}} + \mathbf{u} \cdot \nabla u_x$
%$ -\frac{1}{\rho} \nabla p $

$
\frac{\partial u_x}{\partial{t}} + \mathbf{u} \cdot \nabla u_x = - \frac{1}{\rho} \nabla p
\\
\frac{\partial u_y}{\partial{t}} + \mathbf{u} \cdot \nabla u_y = - \frac{1}{\rho} \nabla p + \eta d
\\
\text{subject to} \quad \nabla \cdot \mathbf{u} = 0,
\\
\frac{\partial d}{\partial{t}} + \mathbf{u} \cdot \nabla d = 0
$


Navier-Stokes, in 3D:

$
\frac{\partial u_x}{\partial{t}} + \mathbf{u} \cdot \nabla u_x = - \frac{1}{\rho} \nabla p + \nu \nabla\cdot \nabla u_x
\\
\frac{\partial u_y}{\partial{t}} + \mathbf{u} \cdot \nabla u_y = - \frac{1}{\rho} \nabla p + \nu \nabla\cdot \nabla u_y
\\
\frac{\partial u_z}{\partial{t}} + \mathbf{u} \cdot \nabla u_z = - \frac{1}{\rho} \nabla p + \nu \nabla\cdot \nabla u_z
\\
\text{subject to} \quad \nabla \cdot \mathbf{u} = 0.
$

@@ -762,3 +762,34 @@
  PUBLISHER = {Dept. of Computer Science 10, University of Erlangen-Nuremberg}
}


% ----------------- external --------------------

@inproceedings{tompson2017,
  title = {Accelerating Eulerian Fluid Simulation With Convolutional Networks},
  booktitle = {Proceedings of Machine Learning Research},
  author = {Tompson, Jonathan and Schlachter, Kristofer and Sprechmann, Pablo and Perlin, Ken},
  year = 2017,
  pages = {3424--3433}
}

@article{raissi2018hiddenphys,
  title = {Hidden physics models: Machine learning of nonlinear partial differential equations},
  author = {Raissi, Maziar and Karniadakis, George Em},
  journal = {Journal of Computational Physics},
  volume = {357},
  pages = {125--141},
  year = {2018},
  publisher = {Elsevier}
}

resources/placeholder.png (new binary file; after: 20 KiB)
Binary file not shown.
supervised.md
@@ -1,5 +1,85 @@
Supervised Learning
=======================

Doing things the old fashioned way...
_Supervised_ here essentially means: "doing things the old fashioned way". Old fashioned in the context of
deep learning (DL), of course, so it's still fairly new. And old fashioned doesn't always mean bad.
In a way this viewpoint is a starting point for all projects one would encounter in the context of DL, and
hence is worth studying. And although it typically yields inferior results to approaches that more tightly
couple with physics, it nonetheless can be the only choice in certain application scenarios where no good
model equations exist.

## Problem Setting

For supervised learning, we're faced with an
unknown function $f^*(x)=y$; we collect lots of pairs of data $[x_0,y_0], ...[x_n,y_n]$ (the training data set)
and directly train a NN to represent an approximation of $f^*$, denoted as $f$, such
that $f(x)=y$.

The $f$ we can obtain is typically not exact;
instead, we obtain it via a minimization problem,
by adjusting the weights $\theta$ of our representation $f$ such that

$\text{arg min}_{\theta} \sum_i (f(x_i ; \theta)-y_i)^2$.

This will give us $\theta$ such that $f(x;\theta) \approx y$ as accurately as possible given
our choice of $f$ and the hyperparameters for training. Note that above we've assumed
the simplest case of an $L^2$ loss. A more general version would use an error metric $e(x,y)$
to be minimized via $\text{arg min}_{\theta} \sum_i e( f(x_i ; \theta) , y_i )$. The choice
of a suitable metric is a topic we will get back to later on.

Irrespective of our choice of metric, this formulation
gives the actual "learning" process for a supervised approach.
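
As a minimal sketch of this learning process in code (the toy target function, network size, and optimizer below are illustrative assumptions, not the setup of the airfoil example):

```python
import numpy as np
import tensorflow as tf

# Sample pairs [x_i, y_i] from a stand-in for the unknown f*(x).
x = np.random.uniform(-1.0, 1.0, size=(1000, 1)).astype(np.float32)
y = np.sin(np.pi * x)

# NN approximation f(x; theta).
f = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="tanh", input_shape=(1,)),
    tf.keras.layers.Dense(1),
])
# "mse" realizes the L2 objective sum_i (f(x_i; theta) - y_i)^2.
f.compile(optimizer="adam", loss="mse")
f.fit(x, y, epochs=100, verbose=0)  # adjusts theta via gradient descent
```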

The training data typically needs to be of substantial size, and hence it is attractive
to use numerical simulations to produce a large number of training input-output pairs.
This means that the training process uses a set of model equations, and approximates
them numerically, in order to train the NN representation $f$. This
has a bunch of advantages, e.g., we don't have measurement noise of real-world devices
and we don't need manual labour to annotate a large number of samples to get training data.

On the other hand, this approach inherits the common challenges of replacing experiments
with simulations: first, we need to ensure the chosen model has enough power to predict the
behavior of the real-world phenomena that we're interested in.
In addition, the numerical approximations have numerical errors
which need to be kept small enough for a chosen application. As these topics are studied in depth
for classical simulations, the existing knowledge can likewise be leveraged to
set up DL training tasks.

```{figure} resources/placeholder.png
---
height: 220px
name: supervised-training
---
TODO, visual overview of supervised training
```

## Applications

Let's directly look at an example with a fairly complicated context:
we have a turbulent airflow around wing profiles, and we'd like to know the average motion
and pressure distribution around this airfoil for different Reynolds numbers and angles of attack.
Thus, given an airfoil shape, Reynolds number, and angle of attack, we'd like to obtain
a velocity field $\mathbf{u}$ and a pressure field $p$ in a computational domain $\Omega$
around the airfoil in the center of $\Omega$.

This is classically approximated with _Reynolds-Averaged Navier-Stokes_ (RANS) models, and this
setting is still one of the most widely used applications of Navier-Stokes solvers in industry.
However, instead of relying on traditional numerical methods to solve the RANS equations,
we now aim to train a neural network that completely bypasses the numerical solver,
and produces the solution in terms of $\mathbf{u}$ and $p$.

## Discussion

TODO , add as separate section after code?
TODO , discuss pros / cons of supervised learning
TODO , CNNs powerful, graphs & co likewise possible

Pro:
- very fast output and training

Con:
- lots of data needed
- undesirable averaging / inaccuracies due to direct loss

Outlook: interactions with external "processes" (such as embedding into a solver) very problematic, see DP later on...