updated overview equations

This commit is contained in:
NT
2021-05-09 16:45:10 +08:00
parent 4060dc90c3
commit 094bd5e0b8
3 changed files with 55 additions and 45 deletions

View File

@@ -1,15 +1,15 @@
Models and Equations
============================
Below we'll give a brief (really _very_ brief!) intro to deep learning, primarily to introduce the notation.
In addition we'll discuss some _model equations_ below. Note that we won't use _model_ to denote trained neural networks, in contrast to some other texts. These will only be called "NNs" or "networks". A "model" will always denote a set of model equations for a physical effect, typically PDEs.
## Deep learning and neural networks
In this book we focus on the connection with physical
models, and there are lots of great introductions to deep learning.
Hence, we'll keep it short:
the goal in deep learning is to approximate an unknown function
$$
f^*(x) = y^* ,
@@ -17,7 +17,7 @@ $$ (learn-base)
where $y^*$ denotes reference or "ground truth" solutions.
$f^*(x)$ should be approximated with an NN representation $f(x;\theta)$. We typically determine $f$
with the help of some variant of an error function $e(y,y^*)$, where $y=f(x;\theta)$ is the output
of the NN.
This gives a minimization problem to find $f(x;\theta)$ such that $e$ is minimized.
In the simplest case, we can use an $L^2$ error, giving
@@ -27,11 +27,11 @@ $$
$$ (learn-l2)
We typically optimize, i.e. _train_,
with a stochastic gradient descent (SGD) optimizer of your choice, e.g., Adam {cite}`kingma2014adam`.
We'll rely on auto-diff to compute the gradient w.r.t. weights, $\partial f / \partial \theta$.
We will also assume that $e$ denotes a _scalar_ error function (also
called cost, or objective function).
It is crucial for the efficient calculation of gradients that this function is scalar.
<!-- general goal, minimize E for e(x,y) ... cf. eq. 8.1 from DLbook
introduce scalar loss, always(!) scalar... (also called *cost* or *objective* function) -->
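To make the notation concrete, here is a minimal training sketch in PyTorch. The tiny network, the synthetic target $f^*(x)=\sin(x)$, and all hyper-parameters are illustrative assumptions, not prescriptions from the text above:
```python
import torch

# hypothetical ground-truth function f*(x) = y* that we want to approximate
def f_star(x):
    return torch.sin(x)

# a small fully-connected NN f(x; theta) -- architecture chosen arbitrarily
f = torch.nn.Sequential(
    torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))

opt = torch.optim.Adam(f.parameters(), lr=1e-3)  # Adam, an SGD variant

for step in range(1000):
    x = torch.rand(64, 1) * 2.0 - 1.0   # random training inputs
    y_star = f_star(x)                  # reference solutions y*
    e = ((f(x) - y_star) ** 2).mean()   # scalar L^2 error e(y, y*)
    opt.zero_grad()
    e.backward()                        # auto-diff gradient w.r.t. theta
    opt.step()
```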
@@ -41,9 +41,10 @@ the **validation** set (from the same distribution, but different data),
and **test** data sets with _some_ different distribution than the training one.
The latter distinction is important! For the test set we want
_out of distribution_ (OOD) data to check how well our trained model generalizes.
Note that this gives a huge range of possibilities for the test data set:
from tiny changes that will certainly work,
up to completely different inputs that are essentially guaranteed to fail.
There's no gold standard, but test data should be generated with care.
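As a hedged illustration (the parameter name `nu` and all ranges below are made up for this example), one way to realize such a split is to draw a PDE parameter from the same range for training and validation, and from a shifted range for the test set:
```python
import numpy as np

rng = np.random.default_rng(seed=0)

# training and validation: same distribution, different samples
nu_train = rng.uniform(0.01, 0.1, size=1000)
nu_valid = rng.uniform(0.01, 0.1, size=100)

# test: deliberately out of distribution -- slightly larger values here,
# a "tiny change"; shifting further makes failure ever more likely
nu_test = rng.uniform(0.1, 0.2, size=100)
```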
Enough for now - if all the above wasn't totally obvious to you, we very strongly recommend
reading chapters 6 to 9 of the [Deep Learning book](https://www.deeplearningbook.org),
@@ -67,9 +68,13 @@ Also interesting: from a math standpoint ''just'' non-linear optimization ...
The following section will give a brief outlook for the model equations
we'll be using later on in the DL examples.
We typically target continuous PDEs denoted by $\mathcal P^*$
whose solution is of interest in a spatial domain $\Omega \subset \mathbb{R}^d$ in $d \in \{1,2,3\}$ dimensions.
In addition, we often consider a time evolution for a finite time interval $t \in \mathbb{R}^{+}$.
The corresponding fields are either $d$-dimensional vector fields, e.g. $\mathbf{u}: \mathbb{R}^d \times \mathbb{R}^{+} \rightarrow \mathbb{R}^d$,
or scalar fields, e.g. $p: \mathbb{R}^d \times \mathbb{R}^{+} \rightarrow \mathbb{R}$.
The components of a vector are typically denoted by $x,y,z$ subscripts, i.e.,
$\mathbf{v} = (v_x, v_y, v_z)^T$ for $d=3$, while
positions are denoted by $\mathbf{x} \in \Omega$.
To obtain unique solutions for $\mathcal P^*$ we need to specify suitable
initial conditions, typically for all quantities of interest at $t=0$,
@@ -91,7 +96,7 @@ Likewise, we typically have a temporal discretization via a time step $\Delta t$
```{admonition} Notation and abbreviations
:class: seealso
If unsure, please check the summary of our mathematical notation
and the abbreviations used in: {doc}`notation`.
```
% \newcommand{\pde}{\mathcal{P}} % PDE ops
@@ -123,18 +128,10 @@ and the abbreviations used inn: {doc}`notation`, at the bottom of the left panel
%This yields $\vc{} \in \mathbb{R}^{d \times d_{s,x} \times d_{s,y} \times d_{s,z} }$ and $\vr{} \in \mathbb{R}^{d \times d_{r,x} \times d_{r,y} \times d_{r,z} }$
%Typically, $d_{r,i} > d_{s,i}$ and $d_{z}=1$ for $d=2$.
We solve a discretized PDE $\mathcal{P}$ by performing steps of size $\Delta t$.
The solution can be expressed as a function of $\mathbf{u}$ and its derivatives:
$\mathbf{u}(\mathbf{x},t+\Delta t) =
\mathcal{P}(\mathbf{u}(\mathbf{x},t), \mathbf{u}(\mathbf{x},t)',\mathbf{u}(\mathbf{x},t)'',...)$.
For all PDEs, we will assume non-dimensional parametrizations as outlined below,
which could be re-scaled to real world quantities with suitable scaling factors.
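In code, such a discretized solver reduces to a loop that repeatedly applies $\mathcal{P}$. The operator below is a stand-in (simple exponential decay via explicit Euler), chosen only to keep the sketch self-contained:
```python
import numpy as np

dt = 0.1  # time step size

def P(u):
    # stand-in discretized PDE operator: du/dt = -u,
    # advanced by one explicit Euler step of size dt
    return u + dt * (-u)

u = np.ones(16)       # discretized quantity of interest u at t=0
for n in range(100):  # advance from t=0 to t=100*dt
    u = P(u)
```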
@@ -151,25 +148,30 @@ The following PDEs are good examples, and we'll use them later on in different s
We'll often consider Burgers' equation
in 1D or 2D as a starting point.
It represents a well-studied PDE, which (unlike Navier-Stokes)
does not include any additional constraints such as conservation of mass.
Hence, it leads to interesting shock formations.
It contains an advection term (motion / transport) and a diffusion term (dissipation due to the second law of thermodynamics).
In 2D, it is given by:
$$\begin{aligned}
\frac{\partial u_x}{\partial{t}} + \mathbf{u} \cdot \nabla u_x &=
\nu \nabla\cdot \nabla u_x + g_x,
\\
\frac{\partial u_y}{\partial{t}} + \mathbf{u} \cdot \nabla u_y &=
\nu \nabla\cdot \nabla u_y + g_y \ ,
\end{aligned}$$ (model-burgers2d)
where $\nu$ and $\mathbf{g}$ denote diffusion constant and external forces, respectively.
A simpler variant of Burgers' equation in 1D without forces,
denoting the single 1D velocity component as $u = u_x$,
is given by:
%\begin{eqnarray}
$$
\frac{\partial u}{\partial{t}} + u \nabla u = \nu \nabla \cdot \nabla u \ .
$$ (model-burgers1d)
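For illustration, a minimal explicit finite-difference solver for this 1D form could look as follows; the periodic domain, grid resolution, viscosity, and the simple central/explicit-Euler discretization are all illustrative choices rather than recommendations:
```python
import numpy as np

N, nu, dt = 128, 0.01, 0.0005
dx = 2.0 * np.pi / N
x = np.arange(N) * dx
u = np.sin(x)  # initial condition on a periodic domain [0, 2*pi)

for n in range(2000):  # advance to t = 2000*dt = 1.0
    u_x  = (np.roll(u, -1) - np.roll(u, 1)) / (2 * dx)       # central du/dx
    u_xx = (np.roll(u, -1) - 2 * u + np.roll(u, 1)) / dx**2  # d^2u/dx^2
    u = u + dt * (-u * u_x + nu * u_xx)  # explicit Euler step of 1D Burgers
```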
### Navier-Stokes
@@ -182,7 +184,7 @@ in the form of a hard-constraint for divergence free motions.
In 2D, the Navier-Stokes equations without any external forces can be written as:
$$\begin{aligned}
\frac{\partial u_x}{\partial{t}} + \mathbf{u} \cdot \nabla u_x &=
- \frac{1}{\rho}\nabla{p} + \nu \nabla\cdot \nabla u_x
\\
@@ -190,30 +192,31 @@ $\begin{aligned}
- \frac{1}{\rho}\nabla{p} + \nu \nabla\cdot \nabla u_y
\\
\text{subject to} \quad \nabla \cdot \mathbf{u} &= 0
\end{aligned}$$ (model-ns2d)
where, like before, $\nu$ denotes a diffusion constant for viscosity.
An interesting variant is obtained by including the
[Boussinesq approximation](https://en.wikipedia.org/wiki/Boussinesq_approximation_(buoyancy))
for varying densities, e.g., for simple temperature changes of the fluid.
With a marker field $v$, e.g., indicating regions of high temperature,
this yields the following set of equations:
$$\begin{aligned}
\frac{\partial u_x}{\partial{t}} + \mathbf{u} \cdot \nabla u_x &= - \frac{1}{\rho} \nabla p
\\
\frac{\partial u_y}{\partial{t}} + \mathbf{u} \cdot \nabla u_y &= - \frac{1}{\rho} \nabla p + \xi v
\\
\text{subject to} \quad \nabla \cdot \mathbf{u} &= 0,
\\
\frac{\partial v}{\partial{t}} + \mathbf{u} \cdot \nabla v &= 0
\end{aligned}$$ (model-boussinesq2d)
where $\xi$ denotes the strength of the buoyancy force.
And finally, the Navier-Stokes model in 3D gives the following set of equations:
$$
\begin{aligned}
\frac{\partial u_x}{\partial{t}} + \mathbf{u} \cdot \nabla u_x &= - \frac{1}{\rho} \nabla p + \nu \nabla\cdot \nabla u_x
\\
@@ -223,7 +226,7 @@ $
\\
\text{subject to} \quad \nabla \cdot \mathbf{u} &= 0.
\end{aligned}
$$ (model-ns3d)
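The constraint $\nabla \cdot \mathbf{u} = 0$ is typically enforced via a pressure projection. Below is a sketch of one possible variant, a spectral (FFT-based) Leray projection for a periodic 2D grid; bounded domains would instead require solving a pressure Poisson problem with boundary conditions:
```python
import numpy as np

def project_divergence_free(ux, uy):
    """Remove the divergent part of a periodic 2D velocity field (FFT based)."""
    N = ux.shape[0]
    k = np.fft.fftfreq(N) * N                 # integer wavenumbers
    kx, ky = np.meshgrid(k, k, indexing="ij")
    k2 = kx**2 + ky**2
    k2[0, 0] = 1.0                            # avoid division by zero for the mean mode
    ux_h, uy_h = np.fft.fft2(ux), np.fft.fft2(uy)
    div_h = kx * ux_h + ky * uy_h             # divergence in Fourier space (up to a factor i)
    ux_h -= kx * div_h / k2                   # subtract the gradient part per mode
    uy_h -= ky * div_h / k2
    return np.real(np.fft.ifft2(ux_h)), np.real(np.fft.ifft2(uy_h))
```
Calling this after each advection-diffusion step removes the divergent part of the velocity while leaving the divergence-free part untouched.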
## Forward Simulations

View File

@@ -108,7 +108,8 @@ observations).
No matter whether we're considering forward or inverse problems,
the most crucial differentiation for the following topics lies in the
nature of the integration between DL techniques
and the domain knowledge, typically in the form of model equations
via partial differential equations (PDEs).
Taking a global perspective, the following three categories can be
identified to categorize _physics-based deep learning_ (PBDL)
techniques:
@@ -164,7 +165,7 @@ A brief look at our _notation_ in the {doc}`notation` chapter won't hurt in both
This text also represents an introduction to a wide range of deep learning and simulation APIs.
We'll use popular deep learning APIs such as _pytorch_ [https://pytorch.org](https://pytorch.org) and _tensorflow_ [https://www.tensorflow.org](https://www.tensorflow.org), and additionally
give introductions into the differentiable simulation framework _Φ<sub>Flow</sub> (phiflow)_ [https://github.com/tum-pbs/PhiFlow](https://github.com/tum-pbs/PhiFlow). Some examples also use _JAX_ [https://github.com/google/jax](https://github.com/google/jax). Thus after going through
these examples, you should have a good overview of what's available in current APIs, such that
the best one can be selected for new tasks.

View File

@@ -855,4 +855,10 @@
year={2020},
}
@article{kingma2014adam,
title={Adam: A method for stochastic optimization},
author={Kingma, Diederik P and Ba, Jimmy},
journal={arXiv preprint arXiv:1412.6980},
year={2014}
}