more text

2021-01-12 11:50:42 +08:00
parent 03a4c7ef29
commit 0063c71c05
12 changed files with 447 additions and 169 deletions
--- a/physicalloss.md
+++ b/physicalloss.md
@@ -1,134 +1,98 @@
 Physical Loss Terms
 =======================

+The supervised setting of the previous sections can quickly 
+yield approximate solutions with a fairly simple training process, but what's
+quite sad to see here is that we only use physical models and numerics
+as an "external" tool to produce a big pile of data 😢.

-Using the equations now, but no numerical methods!
+## Using Physical Models

-Still interesting, leverages analytic derivatives of NNs, but lots of problems
+We can improve this setting by trying to bring the model equations (or parts thereof)
+into the training process. E.g., given a PDE for $\mathbf{u}(x,t)$ with a time evolution, 
+we can typically express it in terms of a function $\mathcal F$ of the derivatives 
+of $\mathbf{u}$ via  
+$
+  \mathbf{u}_t = \mathcal F ( \mathbf{u}_{x}, \mathbf{u}_{xx}, ... \mathbf{u}_{x..x})
+$,
+where the $_{x}$ subscripts denote spatial derivatives of higher order.
+
+In this context we can employ DL by approxmating the unknown $\mathbf{u}$ itself 
+with a NN, denoted by $\tilde{\mathbf{u}}$. If the approximation is accurate, the PDE
+naturally should be satisfied, i.e., the residual $R$ should be equal to zero: 
+$
+  R = \mathbf{u}_t - \mathcal F ( \mathbf{u}_{x}, \mathbf{u}_{xx}, ... \mathbf{u}_{x..x}) = 0
+$
+
+This nicely integrates with the objective for training a neural network: similar to before
+we can collect sample solutions 
+$[x_0,y_0], ...[x_n,y_n]$ for $\mathbf{u}$ with $\mathbf{u}(x)=y$. 
+This is typically important, as most practical PDEs we encounter do not have unique solutions
+unless initial and boundary conditions are specified. Hence, if we only consider $R$ we might
+get solutions with random offset or other undesirable components. Hence the supervised sample points
+help to _pin down_ the solution in certain places.
+Now our training objective becomes
+
+$\text{arg min}_{\theta} \ \alpha_0 \sum_i (f(x_i ; \theta)-y_i)^2 + \alpha_1 R(x_i) $,
+
+where $\alpha_{0,1}$ denote hyper parameters that scale the contribution of the supervised term and 
+the residual term, respectively. We could of course add additional residual terms with suitable scaling factors here.
+
+Note that, similar to the data samples used for supervised training, we have no guarantees that the
+residual terms $R$ will actually reach zero during training. The non-linear optimization of the training process
+will minimize the supervised and residual terms as much as possible, but worst case, large non-zero residual 
+contributions can remain. We'll look at this in more detail in the upcoming code example, for now it's important 
+to remember that physical constraints in this way only represent _soft-constraints_, without guarantees
+of minimizing these constraints.
+
+## Neural network derivatives
+
+In order to compute the residuals at training time, it would be possible to store 
+the unknowns of $\mathbf{u}$ on a computational mesh, e.g., a grid, and discretize the equations of
+$R$ there. This has a fairly long "tradition" in DL, and was proposed by Tompson et al. {cite}`tompson2017` early on.
+
+Instead, a more widely used variant of employing physical soft-constraints {cite}`raissi2018hiddenphys`
+uses fully connected NNs to represent $\mathbf{u}$. This has some interesting pros and cons that we'll outline in the following.
+Due to the popularity of the version, we'll also focus on it in the following code examples and comparisons.
+
+The central idea here is that the aforementioned general function $f$ that we're after in our learning problems
+can be seen as a representation of a physical field we're after. Thus, the $\mathbf{u}(x)$ will 
+be turned into $\mathbf{u}(x, \theta)$ where we choose $\theta$ such that the solution to $\mathbf{u}$ is 
+represented as precisely as possible.
+
+One nice side effect of this viewpoint is that NN representations inherently support the calculation of derivatives. 
+The derivative $\partial f / \partial \theta$ was a key building block for learning via gradient descent, as explained 
+in {doc}`overview`. Here, we can use the same tools to compute spatial derivatives such as $\partial \mathbf{u} / \partial x$,
+Note that above for $R$ we've written this derivative in the shortened notation as $\mathbf{u}_{x}$.
+For functions over time this of course also works for $\partial \mathbf{u} / \partial t$, i.e. $\mathbf{u}_{t}$ in the notation above.
+
+Thus, for some generic $R$, made up of $\mathbf{u}_t$ and $\mathbf{u}_{x}$ terms, we can rely on the back-propagation algorithm
+of DL frameworks to compute these derivatives once we have a NN that represents $\mathbf{u}$. Essentially, this gives us a 
+function (the NN) that receives space and time coordinates to produce a solution for $\mathbf{u}$. Hence, the input is typically
+quite low-dimensional, e.g., 3+1 values for a 3D case over time, and often produces a scalar value or a spatial vector.
+Due to the lack of explicit spatial sampling points, an MLP, i.e., fully-connected NN is the architecture of choice here.
+
+To pick a simple example, Burgers equation in 1D,
+$\frac{\partial u}{\partial{t}} + u \nabla u = \nu \nabla \cdot \nabla u $ , we can directly
+formulate a loss term $R = \frac{\partial u}{\partial t} + u \frac{\partial u}{\partial x} - \nu \frac{\partial^2 u}{\partial x^2} u$ that should be minimized as much as possible at training time. For each of the terms, e.g. $\frac{\partial u}{\partial x}$,
+we can simply query the DL framework that realizes $u$ to obtain the corresponding derivative. 
+For higher order derivatives, such as $\frac{\partial^2 u}{\partial x^2}$, we can typically simply query the derivative function of the framework twice. In the following section, we'll give a specific example of how that works in tensorflow.


+## Summary so far

+This gives us a method to include physical equations into DL learning as a soft-constraint.
+Typically, this setup is suitable for _inverse_ problems, where we have certain measurements or observations
+that we wish to find a solution of a model PDE for. Because of the high expense of the reconstruction (to be 
+demonstrated in the following), the solution manifold typically shouldn't be overly complex. E.g., it is difficult 
+to capture a wide range of solutions, such as the previous supervised airfoil example, in this way.
+
+```{figure} resources/placeholder.png
 ---
-
-
-% \newcommand{\pde}{\mathcal{P}}         % PDE ops
-% \newcommand{\pdec}{\pde_{s}}
-% \newcommand{\manifsrc}{\mathscr{S}}    % coarse / "source"
-% \newcommand{\pder}{\pde_{R}}
-% \newcommand{\manifref}{\mathscr{R}}
-
-% vc - coarse solutions
-% \renewcommand{\vc}[1]{\vs_{#1}}            % plain coarse state at time t
-% \newcommand{\vcN}{\vs}                     % plain coarse state without time 
-% vc - coarse solutions, modified by correction
-% \newcommand{\vct}[1]{\tilde{\vs}_{#1}}     % modified / over time at time t
-% \newcommand{\vctN}{\tilde{\vs}}            % modified / over time without time
-% vr - fine/reference solutions
-% \renewcommand{\vr}[1]{\mathbf{r}_{#1}}            % fine / reference state at time t , never modified
-% \newcommand{\vrN}{\mathbf{r}}                     % plain coarse state without time 
-
-% \newcommand{\project}{\mathcal{T}}           % transfer operator fine <> coarse
-% \newcommand{\loss}{\mathcal{L}}              % generic loss function
-% \newcommand{\nn}{f_{\theta}}
-% \newcommand{\dt}{\Delta t}                   % timestep
-% \newcommand{\corrPre}{\mathcal{C}_{\text{pre}}}            % analytic correction , "pre computed"
-% \newcommand{\corr}{\mathcal{C}}                         % just C for now...
-% \newcommand{\nnfunc}{F} % {\text{NN}}
-
-
-Some notation from SoL, move with parts from overview into "appendix"?
-
-
-
-We typically solve a discretized PDE $\mathcal{P}$ by performing discrete time steps of size $\Delta t$. 
-Each subsequent step can depend on any number of previous steps,
-$\mathbf{u}(\mathbf{x},t+\Delta t) = \mathcal{P}(\mathbf{u}(\mathbf{x},t), \mathbf{u}(\mathbf{x},t-\Delta t),...)$, 
-where
-$\mathbf{x} \in \Omega \subseteq \mathbb{R}^d$ for the domain $\Omega$ in $d$
-dimensions, and $t \in \mathbb{R}^{+}$.
-
-Numerical methods yield approximations of a smooth function such as $\mathbf{u}$ in a discrete
-setting and invariably introduce errors. These errors can be measured in terms
-of the deviation from the exact analytical solution.
-For discrete simulations of
-PDEs, these errors are typically expressed as a function of the truncation, $O(\Delta t^k)$ 
-for a given step size $\Delta t$ and an exponent $k$ that is discretization dependent.
-
-The following PDEs typically work with a continuous
-velocity field $\mathbf{u}$ with $d$ dimensions and components, i.e.,
-$\mathbf{u}(\mathbf{x},t): \mathbb{R}^d \rightarrow \mathbb{R}^d $.
-For discretized versions below, $d_{i,j}$ will denote the dimensionality
-of a field such as the velocity,
-with domain size $d_{x},d_{y},d_{z}$ for source and reference in 3D.
-
-% with $i \in \{s,r\}$ denoting source/inference manifold and reference manifold, respectively.
-%This yields $\vc{} \in \mathbb{R}^{d \times d_{s,x} \times d_{s,y} \times d_{s,z} }$ and $\vr{} \in \mathbb{R}^{d \times d_{r,x} \times d_{r,y} \times d_{r,z} }$
-%Typically, $d_{r,i} > d_{s,i}$ and $d_{z}=1$ for $d=2$.
-
-For all PDEs, we use non-dimensional parametrizations as outlined below,
-and the components of the velocity vector are typically denoted by $x,y,z$ subscripts, i.e.,
-$\mathbf{u} = (u_x,u_y,u_z)^T$ for $d=3$.
-
-Burgers' equation in 2D. It represents a well-studied advection-diffusion PDE:
-
-$\frac{\partial u_x}{\partial{t}} + \mathbf{u} \cdot \nabla u_x =
-  \nu \nabla\cdot \nabla u_x + g_x(t), 
-  \\
-  \frac{\partial u_y}{\partial{t}} + \mathbf{u} \cdot \nabla u_y =
-  \nu \nabla\cdot \nabla u_y + g_y(t)
-$, 
-
-where $\nu$ and $\mathbf{g}$ denote diffusion constant and external forces, respectively.
-
-Burgers' equation in 1D without forces with $u_x = u$:
-%\begin{eqnarray}
-$\frac{\partial u}{\partial{t}} + u \nabla u = \nu \nabla \cdot \nabla u $ .
-
+height: 220px
+name: pinn-training
 ---
-
-Later on, additional equations...
+TODO, visual overview of PINN training
+```


-
-Navier-Stokes, in 2D:
-
-$
-    \frac{\partial u_x}{\partial{t}} + \mathbf{u} \cdot \nabla u_x =
-    - \frac{1}{\rho}\nabla{p} + \nu \nabla\cdot \nabla u_x  
-    \\
-    \frac{\partial u_y}{\partial{t}} + \mathbf{u} \cdot \nabla u_y =
-    - \frac{1}{\rho}\nabla{p} + \nu \nabla\cdot \nabla u_y  
-    \\
-    \text{subject to} \quad \nabla \cdot \mathbf{u} = 0
-$
-
-
-
-Navier-Stokes, in 2D with Boussinesq:
-
-%$\frac{\partial u_x}{\partial{t}} + \mathbf{u} \cdot \nabla u_x$
-%$ -\frac{1}{\rho} \nabla p $
-
-$
-  \frac{\partial u_x}{\partial{t}} + \mathbf{u} \cdot \nabla u_x = - \frac{1}{\rho} \nabla p 
-  \\
-  \frac{\partial u_y}{\partial{t}} + \mathbf{u} \cdot \nabla u_y = - \frac{1}{\rho} \nabla p + \eta d
-  \\
-  \text{subject to} \quad \nabla \cdot \mathbf{u} = 0,
-  \\
-  \frac{\partial d}{\partial{t}} + \mathbf{u} \cdot \nabla d = 0 
-$
-
-
-
-Navier-Stokes, in 3D:
-
-$
-  \frac{\partial u_x}{\partial{t}} + \mathbf{u} \cdot \nabla u_x = - \frac{1}{\rho} \nabla p + \nu \nabla\cdot \nabla u_x 
-  \\
-  \frac{\partial u_y}{\partial{t}} + \mathbf{u} \cdot \nabla u_y = - \frac{1}{\rho} \nabla p + \nu \nabla\cdot \nabla u_y 
-  \\
-  \frac{\partial u_z}{\partial{t}} + \mathbf{u} \cdot \nabla u_z = - \frac{1}{\rho} \nabla p + \nu \nabla\cdot \nabla u_z 
-  \\
-  \text{subject to} \quad \nabla \cdot \mathbf{u} = 0.
-$