updated notation, intro control
This commit is contained in:
parent 3017836c8c
commit 42061e7d00
@ -57,9 +57,9 @@
"$\\newcommand{\\vr}[1]{\\mathbf{r}_{#1}} \\vr{t+n}$. \n",
|
||||
"This is what we will address with an NN in the following.\n",
|
||||
"\n",
|
||||
"We'll use an $L^2$-norm in the following to quantify the deviations, i.e., \n",
|
||||
"We'll use an $L^2$-norm in the following to quantify the deviations, i.e., an error function \n",
|
||||
"$\n",
|
||||
"\\newcommand{\\loss}{\\mathcal{L}} \n",
|
||||
"\\newcommand{\\loss}{e} \n",
|
||||
"\\newcommand{\\corr}{\\mathcal{C}} \n",
|
||||
"\\newcommand{\\vc}[1]{\\mathbf{s}_{#1}} \n",
|
||||
"\\newcommand{\\vr}[1]{\\mathbf{r}_{#1}} \n",
|
||||
@ -949,7 +949,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.6"
"version": "3.8.5"
}
},
"nbformat": 4,
@ -6,16 +6,66 @@
"id": "P9P3fJaa30da"
},
"source": [
"# Inverse Problems\n",
"# Deep Learning for Inverse Problems\n",
"\n",
"**TODOs**\n",
"- re-cap formulation from paper\n",
"- show targets in results\n",
"**TODOs**: **1) bottom: show targets in results; 2) move inverse prob discussion earlier? already for DP-NS example?**\n",
"\n",
"... intro ...\n",
"from {cite}`holl2019pdecontrol`\n",
"\n",
"... inverse problems ...\n",
"Inverse problems encompass a large class of practical scenarios that appear in science. In general, the goal here is not to directly compute a physical field like the velocity at a future time (this is the typical scenario for a _forward_ solve), but instead more generically compute a parameter in the model equation such that certain constraints are fulfilled. A very common goal here is to find the optimal setting for a single parameter given some constraints. E.g., this could be the global diffusion constant for an advection-diffusion model such that it fits measured data as accurately as possible. Inverse problems are encountered for any model parameter adjusted via observations, or the reconstruction of initial conditions, e.g., for particle imaging velocimetry (PIV). More complex cases aim for computing boundary geometries w.r.t. optmal conditions, e.g. to obtain a shape with minimal drag in a fluid flow.\n",
"\n",
"A key aspect demonstrated below will be that we're not aiming for solving only a _single instance_ of an inverse problem, but we'd like to use deep learning to solve a _large class_ of inverse problems. Thus, unlike the PINN example of {doc}`physicalloss-code` or the DP optimization of {doc}`diffphys-code-ns`, where we've solved an optimization problem for specific instances of inverse problems, we now aim for training an ANN that learns to solve a larger class of inverse problems. Nonetheless, we of course need to rely on a certain degree of similarity for these problems, otherwise there's nothing to learn.\n",
"\n",
"Below we will run a very challenging test case as a representative of these inverse problems: we will aim for computing a high dimensional control function that exerts forces over the full course of an incompressible fluid simulation in order to reach a desired goal state for a passively advected marker in the fluid. This means we only have very indirect constraints to be fulfilled (a single state at the end of a sequence), and a large number of degrees of freedom (the control force function is a space-time function with the same degrees of freedom as the flow field itself).\n",
"\n",
"The _long-term_ nature of the control is one of the aspects which makes this a tough inverse problem: any changes to the state of the physical system can lead to large change later on in time, and hence a controller needs to anticipate how the system will behave when it is influenced. This means an ANN also needs to learn how the underlying physics evolve and change, and this is exaclty where the gradients from the DP training come in to guide the learning task towards solution that can reach the goal.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Formulation\n",
"\n",
"With the notation from {doc}`overview-equations` this gives the minmization problem \n",
"\n",
"$\\text{arg min}_{\\theta} \\sum_m \\sum_i (f(x_{m,i} ; \\theta)-y^*_{m,i})^2$\n",
"\n",
"where $y^*_{m,i}$ denotes the samples of the target state of the marker field, \n",
"and $x_{m,i}$ denotes the simulated state of the marker density.\n",
"As before, the index $i$ samples our solution at different spatial locations (typically all grid cells), while the index $n$ here indicates a large collection of different target states.\n",
"\n",
"\n",
"Our goal is to train two networks $\\mathrm{OP}$ and $\\mathrm{CFE}$ with weights\n",
"$\\theta_{\\mathrm{OP}}$ and $\\theta_{\\mathrm{CFE}}$ such that a sequence \n",
"\n",
"$\n",
"\\newcommand{\\pde}{\\mathcal{P}}\n",
"\\newcommand{\\net}{\\mathrm{CFE}}\n",
"\\mathbf{u}_{n},d_{n} = \\pdec(\\net(\\pdec(\\net(\\cdots \\pdec(\\net( \\mathbf{u}_0,d_0 ))\\cdots)))) = (\\pdec\\net)^n ( \\mathbf{u}_0,d_0 ) .\n",
"$\n",
"\n",
"minimizes the loss above. The $\\mathrm{OP}$ network is a predictor that determines the action of the $\\mathrm{CFE}$ network given the target $d^*$, i.e., $\\mathrm{OP}(\\mathbf{u},d,d^*)=d_{OP}$,\n",
"and $\\mathrm{CFE}$ acts additively on the velocity field via\n",
"$\\mathrm{CFE}(\\mathbf{u},d,d_{OP}) = \\mathbf{u} + f_{\\mathrm{CFE}}(\\mathbf{u},d,d_{OP};\\theta_{\\mathrm{CFE}})$, \n",
"where we've used $f_{\\mathrm{CFE}}$ to denote the ANN representation of $\\mathrm{CFE}$.\n",
"\n",
|
||||
"For this problem, the model PDE $\\mathcal{P}$ contains a discretized version of the incompressible Navier-Stokes equations in two dimensions for a velocity $\\mathbf{u}$:\n",
|
||||
"\n",
|
||||
"$\\begin{aligned}\n",
" \\frac{\\partial u_x}{\\partial{t}} + \\mathbf{u} \\cdot \\nabla u_x &= - \\frac{1}{\\rho} \\frac{\\partial p}{\\partial x} \n",
" \\\\\n",
" \\frac{\\partial u_y}{\\partial{t}} + \\mathbf{u} \\cdot \\nabla u_y &= - \\frac{1}{\\rho} \\frac{\\partial p}{\\partial y} \n",
" \\\\\n",
" \\text{s.t.} \\quad \\nabla \\cdot \\mathbf{u} &= 0,\n",
"\\end{aligned}$\n",
"\n",
"with an additional transport equation for the marker density $d$:\n",
"\n",
"$\\begin{aligned}\n",
" \\frac{\\partial d}{\\partial{t}} + \\mathbf{u} \\cdot \\nabla d &= 0 .\n",
"\\end{aligned}$\n",
"\n",
"To summarize, we have a predictor $\\mathrm{OP}$ that gives us a direction, an actor $\\mathrm{CFE}$ that exerts a force on a physical model $\\mathcal{P}$. They all need to play hand in hand to reach a given target after $n$ iterations of the simulation. As apparent from this formulation, it's not a simple inverse problem, especially due to the fact that all three functions are non-linear. This is exactly why the gradients from the DP approach are so important.\n",
"\n"
]
},
@ -28,17 +78,14 @@
"This notebook will walk you through data generation, supervised network initialization and end-to-end training using our differentiable PDE solver, [Φ<sub>Flow</sub>](https://github.com/tum-pbs/PhiFlow). \n",
"(_Note: this example uses an older version of Φ<sub>Flow</sub> (1.4.1)._)\n",
"\n",
"The code below replicates the shape transitions (experiment 2 from the ICLR 2020 paper [Learning to Control PDEs with Differentiable Physics](https://ge.in.tum.de/publications/2020-iclr-holl/)). The experiment is described in detail in section D.2 of the [appendix](https://openreview.net/pdf?id=HyeSin4FPB).\n",
"\n",
"**TODO, integrate?**\n",
"If you havn't already, check out the notebook on controlling Burgers' Equation. It covers the basics in more detail.\n",
"The code below replicates the shape transitions experiment from the ICLR 2020 paper by Holl et al. {cite}`holl2019pdecontrol`, [Learning to Control PDEs with Differentiable Physics](https://ge.in.tum.de/publications/2020-iclr-holl/), further details can be found in section D.2 of the [appendix](https://openreview.net/pdf?id=HyeSin4FPB).\n",
"\n",
"First we need to load phiflow and download the control code helpers (which will end up under `./src`) and some numpy arrays with intial shapes."
]
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
@ -46,35 +93,14 @@
"id": "pwVPXx_Y30dd",
|
||||
"outputId": "17eeffaa-9651-48ea-b3a9-d479fc53adab"
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
" Building wheel for phiflow (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
"Could not load resample cuda libraries: CUDA binaries not found at /usr/local/lib/python3.7/dist-packages/phi/tf/cuda/build/resample.so. Run \"python setup.py cuda\" to compile them\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/usr/local/lib/python3.7/dist-packages/phi/tf/__init__.py:7: UserWarning: TensorFlow 2 is not fully supported by PhiFlow.\n",
" warnings.warn('TensorFlow 2 is not fully supported by PhiFlow.')\n",
"/usr/local/lib/python3.7/dist-packages/phi/tf/flow.py:14: UserWarning: TensorFlow-CUDA solver is not available. To compile it, download phiflow sources and run\n",
"$ python setup.py tf_cuda\n",
"before reinstalling phiflow.\n",
" warnings.warn(\"TensorFlow-CUDA solver is not available. To compile it, download phiflow sources and run\\n$ python setup.py tf_cuda\\nbefore reinstalling phiflow.\")\n"
]
}
],
"outputs": [],
"source": [
"!pip install --upgrade --quiet git+https://github.com/tum-pbs/PhiFlow@1.4.1\n",
"\n",
"import matplotlib.pyplot as plt\n",
"from phi.flow import *\n",
"\n",
"# this essentially copies over the code from https://github.com/holl-/PDE-Control\n",
"# this essentially copies over the data and code from https://github.com/holl-/PDE-Control\n",
"if not os.path.isfile('shapes/Shape_000000.npz'):\n",
" import urllib.request\n",
" url=\"https://ge.in.tum.de/download/2020-iclr-holl/control.zip\"\n",
@ -260,7 +286,7 @@
"\n",
"The loss for the supervised initialization is defined as the observation loss at the center frame.\n",
"\n",
"$\\boldsymbol L_o^\\textrm{sup} = \\left|\\mathrm{OP}[o(t_i),o(t_j)] - u^*\\left(\\frac{t_i+t_j}{2}\\right)\\right|^2.$\n",
"$\\boldsymbol L_o^\\textrm{sup} = \\left|\\mathrm{OP}(o(t_i),o(t_j)) - u^*\\left(\\frac{t_i+t_j}{2}\\right)\\right|^2.$\n",
"\n",
"Consequently, no sequence needs to be simulated (`sequence_class=None`) and an observation loss is required at frame $\\frac n 2$ (`obs_loss_frames=[n // 2]`).\n",
"The pretrained network checkpoints are stored in `supervised_checkpoints`.\n",
@ -6,8 +6,25 @@ to integrate full numerical simulations and the training of deep neural networks
interacting with these simulations. While we've only hinted at what could be
achieved via DP approaches, it is nonetheless a good time to summarize the pros and cons.

## Alternatives - Noise

It is worth mentioning here that other works have proposed perturbing the inputs and
the iterations at training time with noise {cite}`sanchez2020learning` (somewhat similar to
regularizers like dropout).
This can help to prevent overfitting to the training states, and hence shares similarities
with the goals of training with DP.

However, the noise is typically undirected, and hence not as accurate as training with
the actual evolutions of simulations. Hence, this noise can be a good starting point
for training that tends to overfit, but if possible, it is preferable to incorporate the
actual solver in the training loop via a DP approach.
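A minimal sketch of such a noise-based regularizer is shown below, assuming plain numpy arrays for the training states; the noise magnitude and schedule used in {cite}`sanchez2020learning` differ, so `sigma` is just a placeholder.

```python
import numpy as np

def noisy_training_inputs(states, sigma=0.01, seed=None):
    """Perturb simulation states with undirected Gaussian noise before a training step."""
    rng = np.random.default_rng(seed)
    return [s + sigma * rng.standard_normal(s.shape) for s in states]

# usage sketch: perturb a batch of states, then feed them to the regular learning step
batch = [np.zeros((32, 32)) for _ in range(4)]
batch_noisy = noisy_training_inputs(batch, sigma=0.01)
```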

## Summary

To summarize the pros and cons of training ANNs via differentiable physics:

✅ Pro:
- uses physical model and numerical methods for discretization
- efficiency of selected methods carries over to training
@ -19,12 +19,18 @@ As demonstrated with the Burgers example, the PINN solutions typically have sign
## Efficiency
The PINN approach typically perform a localized sampling and correction of the solutions, which means the corrections in the form of weight updates are likewise typically local. The fulfilment of boundary conditions in space and time can be correspondingly slow, leading to long training runs in practice.
The PINN approaches typically perform a localized sampling and correction of the solutions, which means the corrections in the form of weight updates are likewise typically local. The fulfilment of boundary conditions in space and time can be correspondingly slow, leading to long training runs in practice.

A well-chosen discretization of a DP approach can remedy this behavior, and provide an improved flow of gradient information. At the same time, the reliance on a computational grid means that solutions can be obtained very quickly. Given an interpolation scheme or set of basis functions, the solution can be sampled at any point in space or time given a very local neighborhood of the computational grid. Worst case, this can lead to slight memory overheads, e.g., by repeatedly storing mostly constant values of a solution.

For the PINN representation with fully-connected networks on the other hand, we need to make a full pass over the potentially large number of values in the whole network to obtain a sample of the solution at a single point. The network effectively needs to encode the full high-dimensional solution. Its size likewise determines the efficiency of derivative calculations.
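The difference can be made concrete with a small sketch: sampling a grid-based solution touches only a local $2 \times 2$ neighborhood, while sampling a fully-connected representation requires a pass through all of its weights. Both functions are illustrative toys, not code from the examples of this book.

```python
import numpy as np

def sample_grid_bilinear(grid, x, y):
    """Sample a discretized solution at (x,y) in [0,1]^2 from a local 2x2 neighborhood."""
    nx, ny = grid.shape
    fx, fy = x * (nx - 1), y * (ny - 1)
    i, j = int(fx), int(fy)
    i1, j1 = min(i + 1, nx - 1), min(j + 1, ny - 1)
    wx, wy = fx - i, fy - j
    return ((1 - wx) * (1 - wy) * grid[i, j] + wx * (1 - wy) * grid[i1, j] +
            (1 - wx) * wy * grid[i, j1] + wx * wy * grid[i1, j1])

def sample_mlp(weights, x, y):
    """Sample a fully-connected (PINN-style) representation: every weight participates."""
    h = np.array([x, y])
    for W, b in weights:                  # full pass through all layers
        h = np.tanh(W @ h + b)
    return h[0]

rng = np.random.default_rng(0)
grid = rng.random((128, 128))             # local lookup: only a few values touched per sample
weights = [(rng.standard_normal((64, 2)), np.zeros(64)),
           (rng.standard_normal((64, 64)), np.zeros(64)),
           (rng.standard_normal((1, 64)), np.zeros(1))]
print(sample_grid_bilinear(grid, 0.3, 0.7), sample_mlp(weights, 0.3, 0.7))
```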
## Efficiency continued
That being said, because the DP approaches can cover much larger solution manifolds, the structure of these manifolds is typically also difficult to learn. E.g., when training a network with a larger number of iterations (i.e. a long look-ahead into the future), this typically represents a signal that is more difficult to learn than a short look ahead.

As a consequence, these training runs not only take more computational resources per ANN iteration, they also need longer to converge. Regarding resources, each computation of the look-ahead potentially requires a large number of simulation steps, and typically a similar amount of resources for the backprop step. Regarding convergence, the more complex signal that should be learned can take more training iterations or even require larger ANN structures.
## Summary
The following table summarizes these findings:
@ -18,7 +18,7 @@ of the NN.
This gives a minimization problem to find $f(x;\theta)$ such that $e$ is minimized.
In the simplest case, we can use an $L^2$ error, giving

$\text{min}_{\theta} || f(x;\theta) - y^* ||_2^2$
$\text{arg min}_{\theta} | f(x;\theta) - y^* |_2^2$

We typically optimize, i.e. _train_,
with some variant of a stochastic gradient descent (SGD) optimizer.
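As a minimal illustration of such an update, the sketch below runs plain gradient descent on the $L^2$ objective for a linear toy model $f(x;\theta) = \theta \cdot x$; real training replaces the hand-written gradient with backpropagation, mini-batches, and an SGD variant such as Adam.

```python
import numpy as np

rng = np.random.default_rng(0)
x, y_star = rng.standard_normal(3), 2.0    # one data point (x, y*)
theta, lr = np.zeros(3), 0.1

for _ in range(100):
    residual = theta @ x - y_star          # f(x;theta) - y*
    grad = 2.0 * residual * x              # gradient of the squared L2 error w.r.t. theta
    theta -= lr * grad                     # gradient descent step

print(theta @ x, y_star)                   # f(x;theta) is now close to y*
```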
@ -27,7 +27,7 @@
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
@ -35,18 +35,7 @@
"id": "da1uZcDXdVcF",
"outputId": "1082dc87-796c-4b57-e72e-5790fc1444c9"
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/thuerey/miniconda3/envs/tf/lib/python3.8/_collections_abc.py:743: MatplotlibDeprecationWarning: The global colormaps dictionary is no longer considered public API.\n",
" for key in self._mapping:\n",
"/Users/thuerey/miniconda3/envs/tf/lib/python3.8/_collections_abc.py:744: MatplotlibDeprecationWarning: The global colormaps dictionary is no longer considered public API.\n",
" yield (key, self._mapping[key])\n"
]
}
],
"outputs": [],
"source": [
"#!pip install --upgrade --quiet git+https://github.com/tum-pbs/PhiFlow@develop\n",
"#!pip install --upgrade --quiet phiflow \n",
@ -139,7 +128,7 @@
{
"data": {
"text/plain": [
"<matplotlib.image.AxesImage at 0x7f9317729df0>"
"<matplotlib.image.AxesImage at 0x7fa14cffd5b0>"
]
},
"execution_count": 4,
@ -322,7 +311,7 @@
{
"data": {
"text/plain": [
"<matplotlib.image.AxesImage at 0x7f93181c4b80>"
"<matplotlib.image.AxesImage at 0x7fa148246700>"
]
},
"execution_count": 8,
@ -163,6 +163,14 @@ fundamental steps. Here are some considerations for skipping ahead to the later
A brief look at our _notation_ in the {doc}`notation` chapter won't hurt in both cases, though!
```

## Implementations

This text also represents an introduction to a wide range of deep learning and simulation APIs.
We'll use popular deep learning APIs such as _pytorch_ and _tensorflow_, and additionally
give introductions to _phiflow_ for simulations. Some examples also use _JAX_. Thus, after going through
these examples, you should have a good overview of what's available in current APIs, such that
the best one can be selected for new tasks.

---

<br>
<br>
@ -16,6 +16,32 @@
"Note that similar to the previous forward simulation example, \n",
"we will still be sampling the solution with 128 points ($n=128), but now we have a discretization via the NN. So we could also sample points inbetween without having to explicitly choose a basis function for interpolation. The discretization via the NN now internally determines how to use its degrees of freedom to construct the basis functions. So we have no direct control over the reconstruction.\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Formulation\n",
"\n",
"In terms of notation from {doc}`overview-equations` and the previous section, this means we are solving\n",
"\n",
"$\\text{arg min}_{\\theta} \\sum_i |f(x_i ; \\theta)-y^*_i|^2 + R(x_i)$ , \n",
"\n",
"where $x$ and $y^*$ are solutions at different locations in space and time, i.e. $x,y^* \\in \\mathbb{R}$.\n",
"Together, the represent two-dimensional solutions\n",
"$x(p,t)$ and $y^*(p,t)$ for a spatial coordinate $p$ and a time $t$, where the index $i$ sums over a set of $p,t$\n",
"locations. The fucntion $R$ collects additional evaluations of $f$ and its derivatives which should add up to zero (we've omitted scaling factors in the objective function for simplicity).\n",
"\n",
"Note that, effectively, we're only dealing with individual samples of a single solution here.\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Preliminaries\n",
"\n",
"Let's just load TF and phiflow for now, and initialize the random sampling. (_Note: this example uses an older version of phiflow (1.x), and TF 1.x._)\n",
@ -860,4 +860,12 @@
publisher={Oxford University Press}
}

@inproceedings{sanchez2020learning,
title={Learning to simulate complex physics with graph networks},
author={Sanchez-Gonzalez, Alvaro and Godwin, Jonathan and Pfaff, Tobias and Ying, Rex and Leskovec, Jure and Battaglia, Peter},
booktitle={International Conference on Machine Learning},
pages={8459--8468},
year={2020},
}
@ -22,7 +22,29 @@
"However, instead of relying on traditional numerical methods to solve the RANS equations,\n",
"we know aim for training a neural network that completely bypasses the numerical solver,\n",
"and produces the solution in terms of $\\mathbf{u}$ and $p$.\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Formulation\n",
"\n",
"\n",
"With the supervised formulation from {doc}`supervised`, our learning task is pretty straight-forward, and can be written as \n",
"\n",
"$\\text{arg min}_{\\theta} \\sum_i |f(x_i ; \\theta)-y^*_i|^2$ , \n",
"\n",
"where $x$ and $y^*$ each consist of a set of physical fields (pressure, x velocity, and y velocity), each with a dimension of $128^2$, i.e. $x,y^* \\in \\mathbb{R}^{3\\times128^2}$, and the index $i$ evaluates the difference across all discretization points in our data sets.\n",
"\n",
"Thus, the only point to keep in mind is that our quantities of interest contain three different physical fields. While the two velocity components are quite similar in spirit, the pressure field typically has a different behavior with an approximately squared scaling with respect to the velocity. This implies that we need to be careful with simple summation (as above), and that we should take care to normalize the data.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Code coming up...\n",
"\n",
"Let's get started with the implementation. Note that we'll skip the data generation process here. This example is adapted from [this codebase](https://github.com/thunil/Deep-Flow-Prediction), which you can check out for details. Here, we'll simply download a small set of training data generated with a Spalart-Almaras RANS simulation in [OpenFOAM](https://openfoam.org/)."