{
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 0,
|
||||
"metadata": {
|
||||
"colab": {
|
||||
"name": "SoL-karman2d.ipynb",
|
||||
"provenance": [],
|
||||
"collapsed_sections": []
|
||||
},
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"name": "python3"
|
||||
}
|
||||
},
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "qT_RWmTEugu9"
|
||||
},
|
||||
"source": [
|
||||
"# Reducing Numerical Errors with Deep Learning\n",
|
||||
"\n",
|
||||
"Next, we'll target numerical errors that arise in the discretization of a continuous PDE $\\mathcal P^*$, i.e. when we formulate $\\mathcal P$. This approach will demonstrate that, despite the lack of closed-form descriptions, discretization errors often are functions with regular and repeating structures and, thus, can be learned by a neural network. Once the network is trained, it can be evaluated locally to improve the solution of a PDE-solver, i.e., to reduce its numerical error. The resulting method is a hybrid one: it will always run (a coarse) PDE solver, and then improve if at runtime with corrections inferred by an NN.\n",
|
||||
"\n",
|
||||
"Pretty much all numerical methods contain some form of iterative process. That can be repeated updates over time for explicit solvers,or within a single update step for implicit solvers. Below we'll target iterations over time, an example for the second case could be found [here](https://github.com/tum-pbs/CG-Solver-in-the-Loop).\n",
|
||||
"\n",
|
||||
"In the context of reducing errors, it's crucial to have a _differentiable physics solver_, so that the learning process can take the reaction of the solver into account. This interaction is not possible with supervised learning or PINN training. Even small inference errors of a supervised NN can accumulate over time, and lead to a data distribution that differs from the distribution of the pre-computed data. This distribution shift can lead to sub-optimal results, or even cause blow-ups of the solver.\n",
|
||||
"\n",
|
||||
"In order to learn the error function, we'll consider two different discretizations of the same PDE $\\mathcal P^*$: \n",
|
||||
"a _reference_ version, which we assume to be accurate, with a discretized version \n",
|
||||
"$\\mathcal P_r$, and solutions $\\mathbf r \\in \\mathscr R$, where $\\mathscr R$ denotes the manifold of solution of $\\mathcal P_r$.\n",
|
||||
"In parallel to this, we have a less accurate approximation of the same PDE, which we'll refer to as the _source_ version, as this will be the solver that our NN should later on interact with. Analogously,\n",
|
||||
"we have $\\mathcal P_s$ with solutions $\\mathbf s \\in \\mathscr S$.\n",
|
||||
"And after training, we'll have a _hybrid_ solver that uses $\\mathcal P_s$ in conjunction with a trained network to obtain improved solutions, i.e., solutions that are closer to the ones produced by $\\mathcal P_r$.\n",
|
||||
"\n",
|
||||
"```{figure} resources/diffphys-sol-manifolds.jpeg\n",
|
||||
"---\n",
|
||||
"height: 280px\n",
|
||||
"name: diffphys-sol-manifolds\n",
|
||||
"---\n",
|
||||
"Visual overview of coarse and reference manifolds\n",
|
||||
"```\n"
|
||||
]
|
||||
},
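{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the hybrid-solver idea concrete before setting up the full phiflow version, here is a minimal, purely illustrative sketch in plain NumPy. The functions `coarse_solver_step` and `correction_net` are hypothetical stand-ins for $\\mathcal P_s$ and the learned correction $\\mathcal C$; the actual solver and network are defined further below.\n"
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"import numpy as np\n",
"\n",
"def coarse_solver_step(s):\n",
"    # hypothetical coarse PDE update (stand-in for P_s): simple smoothing of a 1D state\n",
"    return 0.5*s + 0.25*(np.roll(s, 1) + np.roll(s, -1))\n",
"\n",
"def correction_net(s):\n",
"    # hypothetical learned correction C(s|theta); a zero placeholder instead of a trained NN\n",
"    return np.zeros_like(s)\n",
"\n",
"def hybrid_rollout(s0, n):\n",
"    # unrolled hybrid solver: each step runs the coarse solver, then adds the inferred correction\n",
"    s = np.copy(s0)\n",
"    for _ in range(n):\n",
"        s = coarse_solver_step(s)\n",
"        s = s + correction_net(s)\n",
"    return s\n",
"\n",
"print(hybrid_rollout(np.linspace(0., 1., 8), n=4))\n"
],
"execution_count": null,
"outputs": []
},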
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "tayrJa7_ZzS_"
|
||||
},
|
||||
"source": [
|
||||
"\n",
|
||||
"explain coarse/ref setup...\n",
|
||||
"Let's assume $\\mathcal{P}$ advances a solution by a time step $\\Delta t$, and let's denote $n$ consecutive steps by a superscript:\n",
|
||||
"$\n",
|
||||
"\\newcommand{\\pde}{\\mathcal{P}}\n",
|
||||
"\\newcommand{\\pdec}{\\pde_{s}}\n",
|
||||
"\\newcommand{\\vc}[1]{\\mathbf{s}_{#1}} \n",
|
||||
"\\newcommand{\\vr}[1]{\\mathbf{r}_{#1}} \n",
|
||||
"\\newcommand{\\vcN}{\\vs} \n",
|
||||
"\\newcommand{\\project}{\\mathcal{T}} \n",
|
||||
"\\vc{t+n} = \\pdec(\\pdec(\\cdots \\pdec( \\project \\vr{t} )\\cdots)) = \\pdec^n ( \\project \\vr{t} ) .\n",
|
||||
"$\n",
|
||||
"Here we assume a mapping operator $\\mathcal{T}$ exists that transfers a reference solution to the source manifold. This could, e.g., be a simple downsampling operation.\n",
|
||||
"Especially for longer sequences, i.e. larger $n$, the source state \n",
|
||||
"$\\newcommand{\\vc}[1]{\\mathbf{s}_{#1}} \\vc{t+n}$\n",
|
||||
"will deviate from a corresponding reference state\n",
|
||||
"$\\newcommand{\\vr}[1]{\\mathbf{r}_{#1}} \\vr{t+n}$. \n",
|
||||
"This is what we will address with an NN in the following.\n",
|
||||
"\n",
|
||||
"We'll use an $L^2$-norm in the following to quantify the deviations, i.e., \n",
|
||||
"$\n",
|
||||
"\\newcommand{\\loss}{\\mathcal{L}} \n",
|
||||
"\\newcommand{\\corr}{\\mathcal{C}} \n",
|
||||
"\\newcommand{\\vc}[1]{\\mathbf{s}_{#1}} \n",
|
||||
"\\newcommand{\\vr}[1]{\\mathbf{r}_{#1}} \n",
|
||||
"\\loss (\\vc{t},\\project \\vr{t})=\\Vert\\vc{t}-\\project \\vr{t}\\Vert_2$. \n",
|
||||
"Our learning goal is to train at a correction operator $\\corr ( \\vc{} )$ such that \n",
|
||||
"a solution to which the correction is applied has a lower error than the original unmodified (source) solution: \n",
|
||||
"$\\loss ( \\pdec( \\corr (\\project \\vr{t_0}) ) , \\project \\vr{t_1}) < \\loss ( \\pdec( \\project \\vr{t_0} ), \\project \\vr{t_1})$. \n",
|
||||
"\n",
|
||||
"The correction function \n",
|
||||
"$\\newcommand{\\vcN}{\\mathbf{s}} \\newcommand{\\corr}{\\mathcal{C}} \\corr (\\vcN | \\theta)$ \n",
|
||||
"is represented as a deep neural network with weights $\\theta$\n",
|
||||
"and receives the state $\\vcN$ to infer an additive correction field with the same dimension.\n",
|
||||
"To distinguish the original states $\\vcN$ from the corrected ones, we'll denote the latter with an added tilde\n",
|
||||
"$\\newcommand{\\vctN}{\\tilde{\\mathbf{s}}} \\vctN$.\n",
|
||||
"The overall learning goal now becomes\n",
|
||||
"\n",
|
||||
"$\n",
|
||||
"\\text{argmin}_\\theta | \n",
|
||||
"( \\pdec \\corr )^n ( \\project \\vr{t} )\n",
|
||||
"- \\project \\vr{t}|^2\n",
|
||||
"$\n",
|
||||
"\n",
|
||||
"A crucial bit here that's easy to overlook is that the correction depends on the modified states, i.e.\n",
|
||||
"it is a function of\n",
|
||||
"$\\newcommand{\\vctN}{\\tilde{\\mathbf{s}}} \\vctN$, so we have \n",
|
||||
"$\\newcommand{\\vctN}{\\tilde{\\mathbf{s}}} \\newcommand{\\corr}{\\mathcal{C}} \\corr (\\vctN | \\theta)$.\n",
|
||||
"These states actually evolve over time when training. They don't exist beforehand.\n",
|
||||
"\n",
|
||||
"**TL;DR**:\n",
|
||||
"We'll train a network $\\mathcal{C}$ to reduce the numerical errors of a simulator with a more accurate reference. Here it's crucial to have the _source_ solver realized as a differential physics operator, such that it can give gradients for an improved training of $\\mathcal{C}$.\n",
|
||||
"\n",
|
||||
"\\\\\n",
|
||||
"\n",
|
||||
"---\n",
|
||||
"\n",
|
||||
"First, let's download the prepared data set (for details on generation & loading cf. https://github.com/tum-pbs/Solver-in-the-Loop), and let's get the data handling out of the way, so that we can focus on the _interesting_ parts..."
|
||||
]
|
||||
},
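{
"cell_type": "markdown",
"metadata": {},
"source": [
"Correspondingly, the unrolled learning objective from above can be sketched with the same hypothetical helpers: start from a (mapped) reference state, run $n$ corrected steps, and compare against the reference state $n$ steps later. This is only a conceptual NumPy illustration; in the actual training below the same structure is built from differentiable phiflow operations, so that gradients can flow through every solver step.\n"
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"def unrolled_loss(r_t, r_t_plus_n, n):\n",
"    # uses hybrid_rollout() from the sketch above:\n",
"    # L2 difference between the n-step corrected rollout and the reference at t+n\n",
"    s_n = hybrid_rollout(r_t, n)\n",
"    return np.sum((s_n - r_t_plus_n)**2)\n",
"\n",
"# tiny usage example with made-up \"reference\" states\n",
"r0 = np.linspace(0., 1., 8)\n",
"r4 = np.linspace(0., 1., 8)  # placeholder for the true reference after 4 steps\n",
"print(unrolled_loss(r0, r4, n=4))\n"
],
"execution_count": null,
"outputs": []
},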
|
||||
{
|
||||
"cell_type": "code",
|
||||
"metadata": {
|
||||
"colab": {
|
||||
"base_uri": "https://localhost:8080/"
|
||||
},
|
||||
"id": "JwZudtWauiGa",
|
||||
"outputId": "bd3a4a4d-706f-4210-ee4e-ca4e370b762d"
|
||||
},
|
||||
"source": [
|
||||
"import os, sys, logging, argparse, pickle, glob, random, distutils.dir_util\n",
|
||||
"\n",
|
||||
"if not os.path.isfile('data-karman2d-train.pickle'):\n",
|
||||
" import urllib.request\n",
|
||||
" url=\"https://ge.in.tum.de/download/2020-solver-in-the-loop/sol-karman-2d-data.pickle\"\n",
|
||||
" print(\"Downloading training data (73MB), this can take a moment the first time...\")\n",
|
||||
" urllib.request.urlretrieve(url, 'data-karman2d-train.pickle')\n",
|
||||
"\n",
|
||||
"with open('data-karman2d-train.pickle', 'rb') as f: dataPreloaded = pickle.load(f)\n",
|
||||
"print(\"Loaded data, {} training sims\".format(len(dataPreloaded)) )\n"
|
||||
],
|
||||
"execution_count": 1,
|
||||
"outputs": [
|
||||
{
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Downloading training data (73MB), this can take a moment the first time...\n",
|
||||
"Loaded data, 6 training sims\n"
|
||||
],
|
||||
"name": "stdout"
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "RY1F4kdWPLNG"
|
||||
},
|
||||
"source": [
|
||||
"Also let's get installing / importing all the necessary libraries out of the way. And while we're at it, we can set the random seed - obviously, 42 is the ultimate choice here 🙂"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"metadata": {
|
||||
"colab": {
|
||||
"base_uri": "https://localhost:8080/"
|
||||
},
|
||||
"id": "BGN4GqxkIueM",
|
||||
"outputId": "095adbf8-1ef6-41fd-938e-6cafcf0fdfdc"
|
||||
},
|
||||
"source": [
|
||||
"!pip install --upgrade --quiet phiflow\n",
|
||||
"%tensorflow_version 1.x\n",
|
||||
"\n",
|
||||
"from phi.tf.flow import *\n",
|
||||
"import phi.tf.util\n",
|
||||
"\n",
|
||||
"import tensorflow as tf\n",
|
||||
"from tensorflow import keras\n",
|
||||
"\n",
|
||||
"random.seed(42)\n",
|
||||
"np.random.seed(42)\n",
|
||||
"tf.compat.v1.set_random_seed(42)\n"
|
||||
],
|
||||
"execution_count": 2,
|
||||
"outputs": [
|
||||
{
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\u001b[K |████████████████████████████████| 2.7MB 4.3MB/s \n",
|
||||
"\u001b[?25h Building wheel for phiflow (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
|
||||
"TensorFlow 1.x selected.\n",
|
||||
"Could not load resample cuda libraries: CUDA binaries not found at /usr/local/lib/python3.6/dist-packages/phi/tf/cuda/build/resample.so. Run \"python setup.py cuda\" to compile them\n",
|
||||
"WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/phi/tf/util.py:119: The name tf.AUTO_REUSE is deprecated. Please use tf.compat.v1.AUTO_REUSE instead.\n",
|
||||
"\n",
|
||||
"WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/phi/tf/profiling.py:12: The name tf.RunOptions is deprecated. Please use tf.compat.v1.RunOptions instead.\n",
|
||||
"\n",
|
||||
"WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/phi/tf/profiling.py:13: The name tf.RunMetadata is deprecated. Please use tf.compat.v1.RunMetadata instead.\n",
|
||||
"\n"
|
||||
],
|
||||
"name": "stdout"
|
||||
},
|
||||
{
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"/usr/local/lib/python3.6/dist-packages/phi/viz/display.py:80: UserWarning: GUI is disabled because of missing dependencies: No module named 'dash_core_components'. To install all dependencies, run $ pip install phiflow[gui]\n",
|
||||
" warnings.warn('GUI is disabled because of missing dependencies: %s. To install all dependencies, run $ pip install phiflow[gui]' % import_error)\n",
|
||||
"/usr/local/lib/python3.6/dist-packages/phi/tf/flow.py:15: UserWarning: TensorFlow-CUDA solver is not available. To compile it, download phiflow sources and run\n",
|
||||
"$ python setup.py tf_cuda\n",
|
||||
"before reinstalling phiflow.\n",
|
||||
" warnings.warn(\"TensorFlow-CUDA solver is not available. To compile it, download phiflow sources and run\\n$ python setup.py tf_cuda\\nbefore reinstalling phiflow.\")\n"
|
||||
],
|
||||
"name": "stderr"
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "OhnzPdoww11P"
|
||||
},
|
||||
"source": [
|
||||
"Now we can set up the _source_ simulation $\\newcommand{\\pdec}{\\pde_{s}} \\pdec$. \n",
|
||||
"Note that we won't deal with \n",
|
||||
"$\\newcommand{\\pder}{\\pde_{r}} \\pder$\n",
|
||||
"below: the downsampled reference data is contained in the training data set. It was generated with a four times finer discretization. Below we're focusing on the interaction of the source solver and the NN. \n",
|
||||
"\n",
|
||||
"This code block and the next ones will define lots of functions, that will be used later on for training.\n",
|
||||
"\n",
|
||||
"The `KarmanFlow` solver below simulates a relatively standard wake flow case with a spherical obstacle in a rectangular domain, and an explicit viscosity solve to obtain different Reynolds numbers. This is the geometry of the setup:\n",
|
||||
"\n",
|
||||
"```{figure} resources/diffphys-sol-domain.png\n",
|
||||
"---\n",
|
||||
"height: 200px\n",
|
||||
"name: diffphys-sol-domain\n",
|
||||
"---\n",
|
||||
"Domain setup for the wake flow case.\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"The solver applies inflow boundary conditions for the y-velocity with a pre-multiplied mask (`velBCy,velBCyMask`), and then calls `super().step()` to run the _regular_ phiflow fluid solving step.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"metadata": {
|
||||
"id": "6WNMcdWUw4EP"
|
||||
},
|
||||
"source": [
|
||||
"class KarmanFlow(IncompressibleFlow):\n",
|
||||
" def __init__(self, pressure_solver=None, make_input_divfree=False, make_output_divfree=True):\n",
|
||||
" IncompressibleFlow.__init__(self, pressure_solver, make_input_divfree, make_output_divfree)\n",
|
||||
"\n",
|
||||
" self.infl = Inflow(box[5:10, 25:75])\n",
|
||||
" self.obst = Obstacle(Sphere([50, 50], 10))\n",
|
||||
"\n",
|
||||
" def step(self, fluid, re, res, velBCy, velBCyMask, dt=1.0, gravity=Gravity()):\n",
|
||||
" # apply viscosity\n",
|
||||
" alpha = 1.0/math.reshape(re, [fluid._batch_size, 1, 1, 1]) * dt * res * res\n",
|
||||
"\n",
|
||||
" cy = diffuse(CenteredGrid(fluid.velocity.data[0].data), alpha)\n",
|
||||
" cx = diffuse(CenteredGrid(fluid.velocity.data[1].data), alpha)\n",
|
||||
"\n",
|
||||
" # apply velocity BCs, only for y velocity for now. note: content of velBCy should be pre-multiplied\n",
|
||||
" cy = cy*(1.0 - velBCyMask) + velBCy\n",
|
||||
"\n",
|
||||
" fluid = fluid.copied_with(velocity=StaggeredGrid([cy.data, cx.data], fluid.velocity.box))\n",
|
||||
"\n",
|
||||
" return super().step(fluid=fluid, dt=dt, obstacles=[self.obst], gravity=gravity, density_effects=[self.infl], velocity_effects=())\n"
|
||||
],
|
||||
"execution_count": 3,
|
||||
"outputs": []
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "RYFUGICgxk0K"
|
||||
},
|
||||
"source": [
|
||||
"We'll also define two alternative neural networks to represent \n",
|
||||
"$\\newcommand{\\vcN}{\\mathbf{s}} \\newcommand{\\corr}{\\mathcal{C}} \\corr$: \n",
|
||||
"\n",
|
||||
"In all cases we'll use fully convolutional networks, i.e. networks without any fully-connected layers. The\n",
|
||||
"inputs are: \n",
|
||||
"- 2 fields with x,y velociy\n",
|
||||
"- plus the Reynolds number as constant channel.\n",
|
||||
"\n",
|
||||
"The output is : \n",
|
||||
"- a 2 component field containing the x,y velocty\n",
|
||||
"\n",
|
||||
"First, let's define a minimal network consisting only of three convolutional layers with ReLU activations (we're also using keras here for simplicity). The input channel dimension is defined via the `tensor_in`, then we'll go to 32 and 64 features, before reducing to 2 channels in the output. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"metadata": {
|
||||
"id": "qIrWYTy6xscA"
|
||||
},
|
||||
"source": [
|
||||
"def network_small(tensor_in):\n",
|
||||
" return keras.Sequential([\n",
|
||||
" keras.layers.Input(tensor=tensor_in),\n",
|
||||
" keras.layers.Conv2D(filters=32, kernel_size=5, padding='same', activation=tf.nn.relu),\n",
|
||||
" keras.layers.Conv2D(filters=64, kernel_size=5, padding='same', activation=tf.nn.relu),\n",
|
||||
" keras.layers.Conv2D(filters=2, kernel_size=5, padding='same', activation=None), # u, v\n",
|
||||
" ])\n"
|
||||
],
|
||||
"execution_count": 4,
|
||||
"outputs": []
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "YfHvdI7yxtdj"
|
||||
},
|
||||
"source": [
|
||||
"For flexibility (and larger-scale tests later on), let's define a _proper_ ResNet with a few more layers. This architecture is the one from the original paper, and will give a fairly good performance (`network_small` above will train faster, but give a sub-optimal performance at inference time)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"metadata": {
|
||||
"id": "TyfpA7Fbx0ro"
|
||||
},
|
||||
"source": [
|
||||
"def network_medium(tensor_in):\n",
|
||||
" l_input = keras.layers.Input(tensor=tensor_in)\n",
|
||||
" block_0 = keras.layers.Conv2D(filters=32, kernel_size=5, padding='same')(l_input)\n",
|
||||
" block_0 = keras.layers.LeakyReLU()(block_0)\n",
|
||||
"\n",
|
||||
" l_conv1 = keras.layers.Conv2D(filters=32, kernel_size=5, padding='same')(block_0)\n",
|
||||
" l_conv1 = keras.layers.LeakyReLU()(l_conv1)\n",
|
||||
" l_conv2 = keras.layers.Conv2D(filters=32, kernel_size=5, padding='same')(l_conv1)\n",
|
||||
" l_skip1 = keras.layers.add([block_0, l_conv2])\n",
|
||||
" block_1 = keras.layers.LeakyReLU()(l_skip1)\n",
|
||||
"\n",
|
||||
" l_conv3 = keras.layers.Conv2D(filters=32, kernel_size=5, padding='same')(block_1)\n",
|
||||
" l_conv3 = keras.layers.LeakyReLU()(l_conv3)\n",
|
||||
" l_conv4 = keras.layers.Conv2D(filters=32, kernel_size=5, padding='same')(l_conv3)\n",
|
||||
" l_skip2 = keras.layers.add([block_1, l_conv4])\n",
|
||||
" block_2 = keras.layers.LeakyReLU()(l_skip2)\n",
|
||||
"\n",
|
||||
" l_conv5 = keras.layers.Conv2D(filters=32, kernel_size=5, padding='same')(block_2)\n",
|
||||
" l_conv5 = keras.layers.LeakyReLU()(l_conv5)\n",
|
||||
" l_conv6 = keras.layers.Conv2D(filters=32, kernel_size=5, padding='same')(l_conv5)\n",
|
||||
" l_skip3 = keras.layers.add([block_2, l_conv6])\n",
|
||||
" block_3 = keras.layers.LeakyReLU()(l_skip3)\n",
|
||||
"\n",
|
||||
" l_conv7 = keras.layers.Conv2D(filters=32, kernel_size=5, padding='same')(block_3)\n",
|
||||
" l_conv7 = keras.layers.LeakyReLU()(l_conv7)\n",
|
||||
" l_conv8 = keras.layers.Conv2D(filters=32, kernel_size=5, padding='same')(l_conv7)\n",
|
||||
" l_skip4 = keras.layers.add([block_3, l_conv8])\n",
|
||||
" block_4 = keras.layers.LeakyReLU()(l_skip4)\n",
|
||||
"\n",
|
||||
" l_conv9 = keras.layers.Conv2D(filters=32, kernel_size=5, padding='same')(block_4)\n",
|
||||
" l_conv9 = keras.layers.LeakyReLU()(l_conv9)\n",
|
||||
" l_convA = keras.layers.Conv2D(filters=32, kernel_size=5, padding='same')(l_conv9)\n",
|
||||
" l_skip5 = keras.layers.add([block_4, l_convA])\n",
|
||||
" block_5 = keras.layers.LeakyReLU()(l_skip5)\n",
|
||||
"\n",
|
||||
" l_output = keras.layers.Conv2D(filters=2, kernel_size=5, padding='same')(block_5)\n",
|
||||
" return keras.models.Model(inputs=l_input, outputs=l_output)\n"
|
||||
],
|
||||
"execution_count": 5,
|
||||
"outputs": []
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "ew-MgPSlyLW-"
|
||||
},
|
||||
"source": [
|
||||
"Next, we're coming to two functions which are pretty important: they transform the simulation state into an input tensor for the network, and vice versa. Hence, they're the interface between _keras/tensorflow_ and _phiflow_.\n",
|
||||
"\n",
|
||||
"The `to_keras` function uses the `staggered_tensor` from phiflow (a 2 component tensor instead of 2 separate grids), from which we discard the outermost layer. We then add a constant channel via `tf.constant` that is multiplied by the desired Reynolds number.\n",
|
||||
"\n",
|
||||
"After network evaluation, we transform the output tensor back into a staggered velocity grid for phiflow. (Note: these are two _centered_ grids with different sizes, so we leave the work to the`unstack_staggered_tensor` function in `StaggeredGrid()` constructor)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"metadata": {
|
||||
"id": "hhGFpTjGyRyg"
|
||||
},
|
||||
"source": [
|
||||
"def to_keras(fluidstate, ext_const_channel):\n",
|
||||
" # drop the unused edges of the staggered velocity grid making its dim same to the centered grid's\n",
|
||||
" return math.concat(\n",
|
||||
" [\n",
|
||||
" fluidstate.velocity.staggered_tensor()[:, :-1:, :-1:, 0:2],\n",
|
||||
" tf.constant(shape=fluidstate.density.data.shape, value=1.0)*math.reshape(value=ext_const_channel, shape=[fluidstate._batch_size, 1, 1, 1]),\n",
|
||||
" ],\n",
|
||||
" axis=-1\n",
|
||||
" )\n",
|
||||
"\n",
|
||||
"def to_staggered(tensor_cen, box):\n",
|
||||
" return StaggeredGrid(math.pad(tensor_cen, ((0,0), (0,1), (0,1), (0,0))), box=box)\n"
|
||||
],
|
||||
"execution_count": 12,
|
||||
"outputs": []
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "VngMwN_9y00S"
|
||||
},
|
||||
"source": [
|
||||
"---\n",
|
||||
"\n",
|
||||
"So far so good - we also need to take care of a few more mundane tasks, e.g. the some data handling and randomization. Below we define a `Dataset` class that stores all \"ground truth\" reference data (already downsampled).\n",
|
||||
"\n",
|
||||
"We actually have a lot of data dimensions: multiple simulations, with many time steps, each with different fields. This makes the code below a bit more difficult to read.\n",
|
||||
"\n",
|
||||
"The data format for `dataPreloaded`: is `['sim_name', frame, field (dens & vel)]`, where each field has dimension `[batch-size, y-size, x-size, channels]` (this is the standard phiflow export)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"metadata": {
|
||||
"id": "tjywcdD2y20t"
|
||||
},
|
||||
"source": [
|
||||
"class Dataset():\n",
|
||||
" def __init__(self, data_preloaded, num_frames, num_sims=None, batch_size=1):\n",
|
||||
" self.epoch = None\n",
|
||||
" self.epochIdx = 0\n",
|
||||
" self.batch = None\n",
|
||||
" self.batchIdx = 0\n",
|
||||
" self.step = None\n",
|
||||
" self.stepIdx = 0\n",
|
||||
"\n",
|
||||
" self.dataPreloaded = data_preloaded\n",
|
||||
" self.batchSize = batch_size\n",
|
||||
"\n",
|
||||
" self.numSims = num_sims\n",
|
||||
" self.numBatches = num_sims//batch_size\n",
|
||||
" self.numFrames = num_frames\n",
|
||||
" self.numSteps = num_frames\n",
|
||||
"\n",
|
||||
" #with open('/Users/thuerey/temp/sol-karman-2d-data.pickle', 'rb') as f: self.dataPreloaded = pickle.load(f)\n",
|
||||
" \n",
|
||||
" self.dataSims = ['karman-fdt-hires-set/sim_%06d'%i for i in range(num_sims) ]\n",
|
||||
" self.dataFrms = [ np.arange(num_frames) for _ in self.dataSims ] \n",
|
||||
"\n",
|
||||
" # constant additional per-sim channel: Reynolds numbers from data generation\n",
|
||||
" ReNrs = [160000.0, 320000.0, 640000.0, 1280000.0, 2560000.0, 5120000.0]\n",
|
||||
" self.extConstChannelPerSim = { self.dataSims[i]:[ReNrs[i]] for i in range(num_sims) }\n",
|
||||
"\n",
|
||||
" #print(format(self.dataPreloaded[self.dataSims[0]][0][0].shape )) # debugging example: check shape of a single density field\n",
|
||||
"\n",
|
||||
" # the data has the following shape ['sim', frame, field (dens/vel)] where each field is [batch-size, y-size, x-size, channels]\n",
|
||||
" self.resolution = self.dataPreloaded[self.dataSims[0]][0][0].shape[1:3] \n",
|
||||
"\n",
|
||||
" # compute data statistics for normalization\n",
|
||||
" self.dataStats = {\n",
|
||||
" 'std': (\n",
|
||||
" np.std(np.concatenate([np.absolute(self.dataPreloaded[asim][i][0].reshape(-1)) for asim in self.dataSims for i in range(num_frames)], axis=-1)), # density\n",
|
||||
" (\n",
|
||||
" np.std(np.concatenate([np.absolute(self.dataPreloaded[asim][i][1][...,0].reshape(-1)) for asim in self.dataSims for i in range(num_frames)])), # vel[0]\n",
|
||||
" np.std(np.concatenate([np.absolute(self.dataPreloaded[asim][i][1][...,1].reshape(-1)) for asim in self.dataSims for i in range(num_frames)])), # vel[1]\n",
|
||||
" )\n",
|
||||
" ),\n",
|
||||
" 'ext.std': [ np.std([np.absolute(self.extConstChannelPerSim[asim][0]) for asim in self.dataSims]) ] # Re\n",
|
||||
" }\n",
|
||||
"\n",
|
||||
" print(\"Data stats: \"+format(self.dataStats))\n",
|
||||
"\n",
|
||||
" # re-shuffle data for next epoch\n",
|
||||
" def newEpoch(self, exclude_tail=0, shuffle_data=True):\n",
|
||||
" self.numSteps = self.numFrames - exclude_tail\n",
|
||||
" simSteps = [ (asim, self.dataFrms[i][0:(len(self.dataFrms[i])-exclude_tail)]) for i,asim in enumerate(self.dataSims) ]\n",
|
||||
" sim_step_pair = []\n",
|
||||
" for i,_ in enumerate(simSteps):\n",
|
||||
" sim_step_pair += [ (i, astep) for astep in simSteps[i][1] ] # (sim_idx, step) ...\n",
|
||||
"\n",
|
||||
" if shuffle_data: random.shuffle(sim_step_pair)\n",
|
||||
" self.epoch = [ list(sim_step_pair[i*self.numSteps:(i+1)*self.numSteps]) for i in range(self.batchSize*self.numBatches) ]\n",
|
||||
" self.epochIdx += 1\n",
|
||||
" self.batchIdx = 0\n",
|
||||
" self.stepIdx = 0\n",
|
||||
"\n",
|
||||
" def nextBatch(self): \n",
|
||||
" self.batchIdx += self.batchSize\n",
|
||||
" self.stepIdx = 0\n",
|
||||
"\n",
|
||||
" def nextStep(self):\n",
|
||||
" self.stepIdx += 1\n"
|
||||
],
|
||||
"execution_count": 7,
|
||||
"outputs": []
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "twIMJ3V0N1FX"
|
||||
},
|
||||
"source": [
|
||||
"The `nextEpoch`, `nextBatch`, and `nextStep` functions will be called at training time.\n",
|
||||
"\n",
|
||||
"Now we need one more function that compiles the data for a mini batch to train with, called `getData` below. It returns batches of the desired size in terms of marker density, velocity, and Reynolds number.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"metadata": {
|
||||
"id": "Dfwd4TnqN1Tn"
|
||||
},
|
||||
"source": [
|
||||
" # for class Dataset():\n",
|
||||
"\n",
|
||||
" # get one mini batch of data: [marker density, velocity, Reynolds number] all from ground truth\n",
|
||||
" def getData(self, consecutive_frames, with_skip=1):\n",
|
||||
" marker_dens = [\n",
|
||||
" math.concat([\n",
|
||||
" self.dataPreloaded[\n",
|
||||
" self.dataSims[self.epoch[self.batchIdx+i][self.stepIdx][0]] # sim_key\n",
|
||||
" ][\n",
|
||||
" self.epoch[self.batchIdx+i][self.stepIdx][1]+j*with_skip # steps\n",
|
||||
" ][0]\n",
|
||||
" for i in range(self.batchSize)\n",
|
||||
" ], axis=0) for j in range(consecutive_frames+1)\n",
|
||||
" ]\n",
|
||||
" velocity = [\n",
|
||||
" math.concat([\n",
|
||||
" self.dataPreloaded[\n",
|
||||
" self.dataSims[self.epoch[self.batchIdx+i][self.stepIdx][0]] # sim_key\n",
|
||||
" ][\n",
|
||||
" self.epoch[self.batchIdx+i][self.stepIdx][1]+j*with_skip # steps\n",
|
||||
" ][1]\n",
|
||||
" for i in range(self.batchSize)\n",
|
||||
" ], axis=0) for j in range(consecutive_frames+1)\n",
|
||||
" ]\n",
|
||||
" ext = [\n",
|
||||
" self.extConstChannelPerSim[\n",
|
||||
" self.dataSims[self.epoch[self.batchIdx+i][self.stepIdx][0]]\n",
|
||||
" ][0] for i in range(self.batchSize)\n",
|
||||
" ]\n",
|
||||
" return [marker_dens, velocity, ext]\n"
|
||||
],
|
||||
"execution_count": 8,
|
||||
"outputs": []
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "bIWnyPYlz8q7"
|
||||
},
|
||||
"source": [
|
||||
"After all the definitions we can finally run some code. We can define the data set with the downloaded data from the first cell:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"metadata": {
|
||||
"colab": {
|
||||
"base_uri": "https://localhost:8080/"
|
||||
},
|
||||
"id": "59EBdEdj0QR2",
|
||||
"outputId": "8043f090-4e7b-4178-d2d2-513981e3905b"
|
||||
},
|
||||
"source": [
|
||||
"output_dir = \"./\" # TODO create? , replaced params['train'] and params['tf']\n",
|
||||
"nsims = 6\n",
|
||||
"batch_size = 3\n",
|
||||
"simsteps = 500\n",
|
||||
"\n",
|
||||
"dataset = Dataset( data_preloaded=dataPreloaded, num_frames=simsteps, num_sims=nsims, batch_size=batch_size )\n",
|
||||
"#dataset.newEpoch()\n",
|
||||
"#print(format(getData(dataset,1)))\n",
|
||||
"#print(format(dataset.getData(1)))\n"
|
||||
],
|
||||
"execution_count": 9,
|
||||
"outputs": [
|
||||
{
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Data stats: {'std': (2.2194703, (0.32598782, 0.1820292)), 'ext.std': [1732512.6262166172]}\n"
|
||||
],
|
||||
"name": "stdout"
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "0N92RooWPzeA"
|
||||
},
|
||||
"source": [
|
||||
"Additionally, we can define several global variables to control the training and the simulation.\n",
|
||||
"\n",
|
||||
"The most important and interesting one is `msteps`. It defines the number of simulation steps that are unrolled at each training iteration. This directly influences the runtime of each training step, as we first have to simulation all steps forward, and then back-propagate the gradient through all `msteps` simulation steps interleaved with the NN evaluations. However, this is where we'll receive important feedback in terms of gradients how the inferred corrections actually influence a running simulation. Hence, larger `msteps` are typically better.\n",
|
||||
"\n",
|
||||
"In addition we define the `source` and `reference` simulations below (note, the reference is just a placeholder for data, only the `source` simulation is actually executed). We also define the actual NN `network`. All three are initialized with the size given in the data set (`dataset.resolution`)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"metadata": {
|
||||
"colab": {
|
||||
"base_uri": "https://localhost:8080/"
|
||||
},
|
||||
"id": "EjgkdCzKP2Ip",
|
||||
"outputId": "6b21bd54-15aa-4440-b274-c3a68ab244f8"
|
||||
},
|
||||
"source": [
|
||||
"# one of the most crucial! how many simulation steps to look into the future while training\n",
|
||||
"msteps = 4\n",
|
||||
"\n",
|
||||
"# this is the actual resolution in terms of cells\n",
|
||||
"source_res = list(dataset.resolution)\n",
|
||||
"# this is only a virtual size, in terms of abstract units for the bounding box of the domain (in practice it's important for conversions or when rescaling to physical units)\n",
|
||||
"sim_len = 100.\n",
|
||||
"\n",
|
||||
"source = Fluid(Domain(resolution=source_res, box=box[0:sim_len*2, 0:sim_len], boundaries=OPEN), buoyancy_factor=0, batch_size=batch_size)\n",
|
||||
"reference = [ Fluid(Domain(resolution=source_res, box=box[0:sim_len*2, 0:sim_len], boundaries=OPEN), buoyancy_factor=0, batch_size=batch_size) for _ in range(msteps) ]\n",
|
||||
"\n",
|
||||
"# velocity BC\n",
|
||||
"vn = np.zeros(source.velocity.data[0].data.shape) # st.velocity.data[0] is considered as the velocity field in y axis!\n",
|
||||
"vn[..., 0:2, 0:vn.shape[2]-1, 0] = 1.0\n",
|
||||
"vn[..., 0:vn.shape[1], 0:1, 0] = 1.0\n",
|
||||
"vn[..., 0:vn.shape[1], -1:, 0] = 1.0\n",
|
||||
"velBCy = vn\n",
|
||||
"velBCyMask = np.copy(vn) # warning, mask needs to be binary, 0..1, this only works if vel is also 1\n",
|
||||
"\n",
|
||||
"lr_in = tf.placeholder(tf.float32, shape=[]) # learning rate\n",
|
||||
"Re_in = tf.placeholder(tf.float32, shape=[batch_size]) # Reynolds numbers\n",
|
||||
"\n",
|
||||
"source_in = phi.tf.util.placeholder_like(source)\n",
|
||||
"reference_in = [ phi.tf.util.placeholder_like(source) for _ in range(msteps) ]\n",
|
||||
"\n",
|
||||
"network = network_small(to_keras(source_in, Re_in)) # use small network for testing\n",
|
||||
"#network = network_medium(to_keras(source_in, Re_in)) # optionally switch to larger network\n",
|
||||
"network.summary() \n",
|
||||
"\n"
|
||||
],
|
||||
"execution_count": 10,
|
||||
"outputs": [
|
||||
{
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"WARNING:tensorflow:From /tensorflow-1.15.2/python3.6/tensorflow_core/python/ops/resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.\n",
|
||||
"Instructions for updating:\n",
|
||||
"If using Keras pass *_constraint arguments to layers.\n",
|
||||
"Model: \"sequential\"\n",
|
||||
"_________________________________________________________________\n",
|
||||
"Layer (type) Output Shape Param # \n",
|
||||
"=================================================================\n",
|
||||
"conv2d (Conv2D) (3, 64, 32, 32) 2432 \n",
|
||||
"_________________________________________________________________\n",
|
||||
"conv2d_1 (Conv2D) (3, 64, 32, 64) 51264 \n",
|
||||
"_________________________________________________________________\n",
|
||||
"conv2d_2 (Conv2D) (3, 64, 32, 2) 3202 \n",
|
||||
"=================================================================\n",
|
||||
"Total params: 56,898\n",
|
||||
"Trainable params: 56,898\n",
|
||||
"Non-trainable params: 0\n",
|
||||
"_________________________________________________________________\n"
|
||||
],
|
||||
"name": "stdout"
|
||||
},
|
||||
{
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py:23: DeprecationWarning: placeholder_like may not respect the batch dimension. For State objects, use placeholder(state.shape) instead.\n",
|
||||
"/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py:24: DeprecationWarning: placeholder_like may not respect the batch dimension. For State objects, use placeholder(state.shape) instead.\n"
|
||||
],
|
||||
"name": "stderr"
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "AbpNPzplQZMF"
|
||||
},
|
||||
"source": [
|
||||
"Now comes the **most crucial** step in the whole setup: we define the chain of simulation steps and network evaluations to be used at training time. After all the work defining helper functions, it's acutally pretty simple: we loop over `msteps`, call the simulator via `KarmanFlow.step` for an input state, and afterwards evaluate the correction via `network(to_keras())`. The correction is then added to the last simultation state in the `prediction` list (we're actually simply overwriting the last simulated step `prediction[-1]` with `velocity + correction[-1]`.\n",
|
||||
"\n",
|
||||
"One other important things that's happening here is normalization: the inputs to the network are divided by the standard deviations in `dataset.dataStats`. This is slightly complicated as we have to append the scaling for the Reynolds numbers to the normalization for the velocity. After evaluating the `network`, we only have a velocity left, so we can simply multiply by the standard deviation again (`* dataset.dataStats['std'][1]`)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"metadata": {
|
||||
"id": "D5NeMcLGQaxh"
|
||||
},
|
||||
"source": [
|
||||
"prediction, correction = [], []\n",
|
||||
"for i in range(msteps):\n",
|
||||
" prediction += [\n",
|
||||
" KarmanFlow().step(\n",
|
||||
" source_in if i==0 else prediction[-1],\n",
|
||||
" re=Re_in,\n",
|
||||
" res=source_res[-1], # reference resolution is size in x direction\n",
|
||||
" velBCy=velBCy,\n",
|
||||
" velBCyMask=velBCyMask\n",
|
||||
" )\n",
|
||||
" ]\n",
|
||||
"\n",
|
||||
" correction += [\n",
|
||||
" to_staggered(\n",
|
||||
" network(\n",
|
||||
" to_keras(prediction[-1], Re_in)/[\n",
|
||||
" *(dataset.dataStats['std'][1]), # velocity\n",
|
||||
" dataset.dataStats['ext.std'][0] # Re\n",
|
||||
" ]\n",
|
||||
" ) * dataset.dataStats['std'][1],\n",
|
||||
" box=source.velocity.box\n",
|
||||
" )\n",
|
||||
" ]\n",
|
||||
"\n",
|
||||
" prediction[-1] = prediction[-1].copied_with(velocity=prediction[-1].velocity + correction[-1])\n"
|
||||
],
|
||||
"execution_count": 13,
|
||||
"outputs": []
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "c4yLlDM3QfUR"
|
||||
},
|
||||
"source": [
|
||||
"We also need to define a loss function for training. Here, we can simply use an $L^2$ loss over the whole sequence, i.e. the iteration over `msteps`:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"metadata": {
|
||||
"id": "K2JcO3-QQgC9"
|
||||
},
|
||||
"source": [
|
||||
"loss_steps = [\n",
|
||||
" tf.nn.l2_loss(\n",
|
||||
" (reference_in[i].velocity.staggered_tensor() - prediction[i].velocity.staggered_tensor())\n",
|
||||
" /dataset.dataStats['std'][1]\n",
|
||||
" )\n",
|
||||
" for i in range(msteps)\n",
|
||||
"]\n",
|
||||
"loss = tf.reduce_sum(loss_steps)/msteps\n"
|
||||
],
|
||||
"execution_count": 14,
|
||||
"outputs": []
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "E6Vly1_0QhZ1"
|
||||
},
|
||||
"source": [
|
||||
"For the training, we use a standard Adam optimizer, and only run 4 epochs by default. This could (should) be increased for the larger network or to obtain more accurate results."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"metadata": {
|
||||
"id": "PuljFamYQksW",
|
||||
"colab": {
|
||||
"base_uri": "https://localhost:8080/"
|
||||
},
|
||||
"outputId": "e71bcaae-187c-4c10-cee8-f03bb8964af0"
|
||||
},
|
||||
"source": [
|
||||
"lr = 1e-4\n",
|
||||
"adapt_lr = True\n",
|
||||
"resume = 0 # load existing network?\n",
|
||||
"epochs = 4\n",
|
||||
"\n",
|
||||
"opt = tf.compat.v1.train.AdamOptimizer(learning_rate=lr_in)\n",
|
||||
"train_step = opt.minimize(loss)\n",
|
||||
"\n",
|
||||
"tf_session = tf.Session() \n",
|
||||
"scene = Scene.create(output_dir, count=batch_size, mkdir=False, copy_calling_script=False)\n",
|
||||
"sess = Session(scene, session=tf_session)\n",
|
||||
"tf.compat.v1.keras.backend.set_session(tf_session)\n",
|
||||
"\n",
|
||||
"sess.initialize_variables()\n",
|
||||
"\n",
|
||||
"# optional, load existing network...\n",
|
||||
"if resume>0: \n",
|
||||
" ld_network = keras.models.load_model(output_dir+'/nn_epoch{:04d}.h5'.format(resume))\n",
|
||||
" network.set_weights(ld_network.get_weights())\n"
|
||||
],
|
||||
"execution_count": 15,
|
||||
"outputs": [
|
||||
{
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/phi/tf/session.py:28: The name tf.global_variables_initializer is deprecated. Please use tf.compat.v1.global_variables_initializer instead.\n",
|
||||
"\n",
|
||||
"WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/phi/tf/session.py:29: The name tf.train.Saver is deprecated. Please use tf.compat.v1.train.Saver instead.\n",
|
||||
"\n"
|
||||
],
|
||||
"name": "stdout"
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "p8hUXJDkRQST"
|
||||
},
|
||||
"source": [
|
||||
"As this setups supports an (optional) fairly accurate training with the medium sized network from above, we'll define helper function for scheduling learning rate decay. This helps to make the network converge later on in the training. The steps below (after 10,15 etc. epochs) were determined heuristically from previous runs. Feel free to experiment."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"metadata": {
|
||||
"id": "Am3hSdNgRPEh"
|
||||
},
|
||||
"source": [
|
||||
"\n",
|
||||
"def lr_schedule(epoch, current_lr):\n",
|
||||
" lr = current_lr\n",
|
||||
" if epoch == 25: lr *= 0.5\n",
|
||||
" elif epoch == 20: lr *= 1e-1\n",
|
||||
" elif epoch == 15: lr *= 1e-1\n",
|
||||
" elif epoch == 10: lr *= 1e-1\n",
|
||||
" return lr\n"
|
||||
],
|
||||
"execution_count": 16,
|
||||
"outputs": []
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "lrALctV1RWBO"
|
||||
},
|
||||
"source": [
|
||||
"Finally, we can start training! This is very straight forward now, we simply loop over the desired number of iterations, get a batch each time via `getData`, feed it into the source simulation input `source_in`, and compare it in the loss with the `reference` data for the batch.\n",
|
||||
"\n",
|
||||
"The setup above will automatically take care that the differentiable physics setup used here provides the right gradient information, and connects to the tensorflow network. Be warned: due to the complexity of the setup, this training can take a while..."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"metadata": {
|
||||
"id": "m3Nd8YyHRVFQ",
|
||||
"colab": {
|
||||
"base_uri": "https://localhost:8080/"
|
||||
},
|
||||
"outputId": "3bea702a-14d0-43a7-ebc5-25289e27c5a5"
|
||||
},
|
||||
"source": [
|
||||
"current_lr = lr\n",
|
||||
"steps = 0\n",
|
||||
"for j in range(epochs): # training\n",
|
||||
" dataset.newEpoch(exclude_tail=msteps)\n",
|
||||
" if j<resume:\n",
|
||||
" print('resume: skipping {} epoch'.format(j+1))\n",
|
||||
" steps += dataset.numSteps*dataset.numBatches\n",
|
||||
" continue\n",
|
||||
"\n",
|
||||
" current_lr = lr_schedule(j, current_lr) if adapt_lr else lr\n",
|
||||
" for ib in range(dataset.numBatches): \n",
|
||||
" for i in range(dataset.numSteps): \n",
|
||||
" batch = getData(dataset, consecutive_frames=msteps) # should be dataset.getData\n",
|
||||
" re_nr = batch[2] # Reynolds numbers\n",
|
||||
" source = source.copied_with(density=batch[0][0], velocity=batch[1][0])\n",
|
||||
" reference = [ reference[k].copied_with(density=batch[0][k+1], velocity=batch[1][k+1]) for k in range(msteps) ]\n",
|
||||
"\n",
|
||||
" my_feed_dict = { source_in: source, Re_in: re_nr, lr_in: current_lr }\n",
|
||||
" my_feed_dict.update(zip(reference_in, reference))\n",
|
||||
" _, l2 = sess.run([train_step, loss], my_feed_dict)\n",
|
||||
" steps += 1\n",
|
||||
"\n",
|
||||
" if (j==0 and i<3) or (ib==0 and i%10==0):\n",
|
||||
" print('epoch {:03d}/{:03d}, batch {:03d}/{:03d}, step {:04d}/{:04d}: loss={}'.format( j+1, epochs, ib+1, dataset.numBatches, i+1, dataset.numSteps, l2 ))\n",
|
||||
" dataset.nextStep()\n",
|
||||
"\n",
|
||||
" dataset.nextBatch()\n",
|
||||
"\n",
|
||||
" if j%10==9: network.save(output_dir+'/nn_epoch{:04d}.h5'.format(j+1))\n",
|
||||
"\n",
|
||||
"#tf_writer_tr.close()\n",
|
||||
"network.save(output_dir+'/final.h5')\n"
|
||||
],
|
||||
"execution_count": null,
|
||||
"outputs": [
|
||||
{
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"epoch 001/004, batch 001/002, step 0001/0496: loss=6816.912109375\n",
|
||||
"epoch 001/004, batch 001/002, step 0002/0496: loss=4036.171875\n",
|
||||
"epoch 001/004, batch 001/002, step 0003/0496: loss=1627.9716796875\n",
|
||||
"epoch 001/004, batch 001/002, step 0011/0496: loss=1403.9822998046875\n",
|
||||
"epoch 001/004, batch 001/002, step 0021/0496: loss=841.949951171875\n"
|
||||
],
|
||||
"name": "stdout"
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "swG7GeDpWT_Z"
|
||||
},
|
||||
"source": [
|
||||
"The loss should go down from ca. 1000 initially to around 1. This is a good sign, but of course it's even more important to see how the resulting solver fares on new inputs.\n",
|
||||
"\n",
|
||||
"Note that after training we're realized a hybrid solver, consisting of a regular _source_ simulator, and a network that was trained to specificially interact with this simulator for a chosen domain of simulation cases.\n",
|
||||
"\n",
|
||||
"Due to the size of this example, we'll refer you to the external code - the network trained here should be directly useable in [this apply script](https://github.com/tum-pbs/Solver-in-the-Loop/blob/master/karman-2d/karman_apply.py).\n",
|
||||
"\n",
|
||||
"---\n",
|
||||
"\n",
|
||||
"## Next steps\n",
|
||||
"\n",
|
||||
"* Modify the training to further reduce the training error\n",
|
||||
"\n",
|
||||
"* Export the network to the external github code, and run it on new wake flow cases. You'll see that a reduced training error not always directly correlates with an improved test performance\n",
|
||||
"\n",
|
||||
"* Turn off the differentiable physics training (by setting `msteps=1`), and compare it with the DP version\n",
|
||||
"\n",
|
||||
"* Try to train with even more `msteps`. Note that due to the recurrent nature of the training, you'll probabaly have to load a pre-trained state to stabilize the first iterations"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"metadata": {
|
||||
"id": "RumKebW_05xp"
|
||||
},
|
||||
"source": [
|
||||
""
|
||||
],
|
||||
"execution_count": null,
|
||||
"outputs": []
|
||||
}
|
||||
]
}