cleanup of dp code

NT 2021-01-16 17:10:21 +08:00
parent b68bbcfaa5
commit 60f3412751
4 changed files with 204 additions and 218 deletions


@@ -10,48 +10,14 @@
"\n",
"We target the same reconstruction task as for the PINN example. This has some immediate implications: the evolution of the system is now fully determined by our PDE formulation. Hence, the only real unknown is the initial state! We will still need to re-compute all the states between the initial and target state many times, just now we won't need an NN for this step; we can rely on our discretized model. Also, as we choose an initial discretization for the DP approach, the unknown initial state consists of the sampling points of the involved physical fields, and we can simply represent these unknowns as floating point variables. Hence, even for the initial state we do not need to set up an NN. Thus, our Burgers reconstruction problem reduces to a gradient-based optimization without any NN when solving it with DP. Nonetheless, it's a very good starting point to illustrate the process.\n",
"\n",
"First, we'll set up our discretized simulation. Here we can employ the phiflow solver, as shown in {doc}`overview-burgers-forw`. phiflow directly gives us a computational graph in tensorflow. So, as a first step, let's set up our grid with 128 points, and construct a tensorflow graph for 32 steps of the simulation. (As usual, you can ignore the warnings when loading TF and phiflow.)"
"First, we'll set up our discretized simulation. Here we can employ the phiflow solver, as shown in the overview section on _Burgers forward simulations_. phiflow directly gives us a computational graph in tensorflow. So, as a first step, let's set up our grid with 128 points, and construct a tensorflow graph for 32 steps of the simulation. (As usual, you can ignore the warnings when loading TF and phiflow.)"
]
},
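As a side note, the core idea here (the unknown initial state is just an array of floats pushed through a differentiable solver) can be sketched without phiflow or tensorflow. A minimal sketch, assuming a linear diffusion step as a stand-in for the Burgers step; `step`, `u0`, and `NU_DT` are hypothetical names and values of this sketch, not the notebook's:

```python
import numpy as np

# Minimal stand-in for the DP setup: the "solver" is a differentiable step
# function applied repeatedly, and the unknown initial state is a plain float
# array. The diffusion step below is an assumed substitute for the Burgers
# step, chosen only because its gradient is easy to reason about.
N = 128          # number of sampling points, as in the text
NU_DT = 0.2      # diffusion strength per step (hypothetical value)

def step(u):
    """One explicit diffusion step with periodic boundaries."""
    return u + NU_DT * (np.roll(u, 1) - 2.0 * u + np.roll(u, -1))

u0 = np.zeros(N)             # zero'ed initial guess for the optimization
states = [u0]
for _ in range(32):          # 32 steps, matching the simulation above
    states.append(step(states[-1]))
```

Every entry of `states` is a differentiable function of `u0`, which is exactly the property the reconstruction relies on.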
{
"cell_type": "code",
"execution_count": 1,
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Could not load resample cuda libraries: CUDA binaries not found at /Users/thuerey/Dropbox/mbaDevelSelected/phiflow-v1.5/phi/tf/cuda/build/resample.so. Run \"python setup.py cuda\" to compile them\n",
"WARNING:tensorflow:From /Users/thuerey/Dropbox/mbaDevelSelected/phiflow-v1.5/phi/tf/util.py:119: The name tf.AUTO_REUSE is deprecated. Please use tf.compat.v1.AUTO_REUSE instead.\n",
"\n",
"WARNING:tensorflow:From /Users/thuerey/Dropbox/mbaDevelSelected/phiflow-v1.5/phi/tf/profiling.py:12: The name tf.RunOptions is deprecated. Please use tf.compat.v1.RunOptions instead.\n",
"\n",
"WARNING:tensorflow:From /Users/thuerey/Dropbox/mbaDevelSelected/phiflow-v1.5/phi/tf/profiling.py:13: The name tf.RunMetadata is deprecated. Please use tf.compat.v1.RunMetadata instead.\n",
"\n",
"WARNING:tensorflow:From /Users/thuerey/Dropbox/mbaDevelSelected/phiflow-v1.5/phi/tf/util.py:30: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.\n",
"\n",
"WARNING:tensorflow:From /Users/thuerey/Dropbox/mbaDevelSelected/phiflow-v1.5/phi/tf/tf_backend.py:287: to_complex64 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n",
"Instructions for updating:\n",
"Use `tf.cast` instead.\n",
"WARNING:tensorflow:From /Users/thuerey/Dropbox/mbaDevelSelected/phiflow-v1.5/phi/tf/tf_backend.py:375: The name tf.fft is deprecated. Please use tf.signal.fft instead.\n",
"\n",
"WARNING:tensorflow:From /Users/thuerey/Dropbox/mbaDevelSelected/phiflow-v1.5/phi/tf/tf_backend.py:387: The name tf.ifft is deprecated. Please use tf.signal.ifft instead.\n",
"\n",
"WARNING:tensorflow:From /Users/thuerey/Dropbox/mbaDevelSelected/phiflow-v1.5/phi/tf/tf_backend.py:399: The name tf.real is deprecated. Please use tf.math.real instead.\n",
"\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/thuerey/Dropbox/mbaDevelSelected/phiflow-v1.5/phi/tf/flow.py:15: UserWarning: TensorFlow-CUDA solver is not available. To compile it, download phiflow sources and run\n",
"$ python setup.py tf_cuda\n",
"before reinstalling phiflow.\n",
" warnings.warn(\"TensorFlow-CUDA solver is not available. To compile it, download phiflow sources and run\\n$ python setup.py tf_cuda\\nbefore reinstalling phiflow.\")\n"
]
},
{
"name": "stdout",
"output_type": "stream",
@@ -78,19 +44,21 @@
"for i in range(32):\n",
" states.append( Burgers().step(states[-1],dt=dt) )\n",
"\n",
"print(\"Each velocity is a phiflow grid like this one: \" + format(states[-1].velocity) )"
"print(\"Note: each velocity is a phiflow grid like this one \" + format(states[-1].velocity) )"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note, a lot has happened in these first steps, but we didnt run a single calculation for our simulation. In contrast to the first example, which used numpy as phiflow backend, we now have used the tensorflow backend (the trick here was importing phi.tf.flow instad of phi.flow). This gave us a tensorflow graph, but didn't run any real simulation steps. That will only happen once we run a tensorflow session and pass some data through this graph. So, in a way _nothing_ has happened so far in terms of our simulation! Next, let's set up the loss. That's pretty simple, we want the solution at t=0.5, i.e. step number 16, to respect the data in terms of an L2 norm. Let's also directly initialize the TF variables, and see what loss we get from the initial zero'ed initialization:"
"**A lot** has happened in these first steps, but we didn't run a single calculation for our simulation. In contrast to the first example, which used numpy as phiflow backend, we have now used the tensorflow backend (the trick here was importing `phi.tf.flow` instead of `phi.flow`). This gave us a tensorflow graph, but didn't run any real simulation steps. That will only happen once we run a tensorflow session and pass some data through this graph. So, in a way, _nothing_ has happened so far in terms of our simulation! \n",
"\n",
"Next, let's set up the loss, such that we can start the optimization. That's actually pretty simple, we want the solution at $t=0.5$, i.e. step number 16, to respect the data in terms of an L2 norm. Let's also directly initialize the TF variables, and see what loss we get from the initial zero'ed initialization:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 3,
"metadata": {},
"outputs": [
{
@@ -128,12 +96,18 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Note: because we're only constraining timestep 16, we could actually omit steps 17 to 31 here, but for fairness and comparison with the PINN case, let's include them."
"Not surprisingly, because we're starting from zero, there's a significant initial error of ca. 0.38 for the 16th simulation step.\n",
"\n",
"Note that because we're only constraining timestep 16, we could actually omit steps 17 to 31 in this setup. They don't have any degrees of freedom and are not constrained in any way. For fairness and for direct comparison with the previous PINN case, let's include them.\n",
"\n",
"Now that we have our simulation graph in TF, we can use TF to compute the gradient of the loss with respect to the initial state. All we need to do is run `tf.gradients(loss, [state_in.velocity.data])`, which will give us a direction for each velocity variable. Based on a linear approximation, this direction tells us how to change each of them to increase the loss function (gradients _always_ point \"upwards\"). In the following code snippet, we're additionally saving all these gradients in a list called `grads`, such that we can visualize them later on. (Normally, we could discard each gradient after performing an update step.)\n",
"\n",
"Based on the gradient, we can now take a step in the opposite direction to bring the loss down (instead of increasing it). Below we're using a learning rate `LR=5` for this step. Afterwards, we re-evaluate the loss for the updated state to check how we did. As these updates aren't exactly fast, we'll only compute five of them here:"
]
},
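The loop described above (simulate 16 steps forward, backpropagate an L2 loss, step against the gradient) can be sketched in plain numpy. This is a toy analogue, not the notebook's code: a linear, symmetric diffusion step stands in for the Burgers solver (so the adjoint pass is simply the same step applied to the loss gradient), and the learning rate is tuned to this toy problem rather than the `LR=5` used here:

```python
import numpy as np

# Manual gradient descent on the initial state through a differentiable
# "solver". The linear diffusion step is an assumed stand-in for the Burgers
# step; because it is linear and symmetric, its adjoint is the step itself.
N, STEPS = 128, 16
NU_DT = 0.2      # diffusion strength per step (hypothetical)
LR = 0.4         # learning rate for this toy problem (not the LR=5 above)

def step(u):
    return u + NU_DT * (np.roll(u, 1) - 2.0 * u + np.roll(u, -1))

x = np.linspace(-1, 1, N, endpoint=False)
target = -np.sin(np.pi * x)            # hypothetical target at step 16

def forward(u0):
    u = u0
    for _ in range(STEPS):             # forward simulation to step 16
        u = step(u)
    return u

u0 = np.zeros(N)                       # zero'ed initial guess
losses = [float(np.sum((forward(u0) - target) ** 2))]
for _ in range(5):                     # five update steps, as in the text
    grad = 2.0 * (forward(u0) - target)   # dL/du16 for L = sum((u16 - t)^2)
    for _ in range(STEPS):                # adjoint pass (operator is symmetric)
        grad = step(grad)
    u0 = u0 - LR * grad                   # move against the gradient
    losses.append(float(np.sum((forward(u0) - target) ** 2)))
```

With these assumed values the loss decreases monotonically over the five updates, mirroring the qualitative behavior described in the text.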
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 4,
"metadata": {},
"outputs": [
{
@@ -170,23 +144,25 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's visualize the reconstruction: first \n",
"One of the reasons for the bad performance here is that we're querying states via `sess.run()` multiple times, instead of computing the update in one go. Nonetheless, this simple example already clearly shows that the loss is nicely going down: from above 0.38 to ca. 0.22. This is a good sign!\n",
"\n",
"gradient, pointing in the \"wrong\" direction, we're taking a step along the negative direction"
"Before improving on the performance, let's visualize the reconstruction.\n",
"\n",
"Starting from the zero state `state0` (shown in blue), the first gradient is shown as a red line here. If you compare it with the solution it points in the opposite direction, as expected. The solution is much larger in magnitude, so we omit it here (see the next graph).\n"
]
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[<matplotlib.lines.Line2D at 0x139d41c90>]"
"[<matplotlib.lines.Line2D at 0x13fa0ae90>]"
]
},
"execution_count": 4,
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
},
@@ -213,11 +189,6 @@
"[grad0,s0] = sess.run([grads[0],state0], feed_dict={state_in: stateN}) #.velocity.data\n",
"fig.plot(pltx, grad0[0].flatten() , lw=2, color='red') \n",
"\n",
"# # target constraint at t=0.5\n",
"# fig.plot(pltx, target.velocity.data.flatten(), lw=2, color='forestgreen') \n",
"\n",
"# # constrained state of simulation\n",
"# contrained_state = sess.run(states[16], feed_dict={state_in: stateN}).velocity.data\n",
"fig.plot(pltx, state0.velocity.data.flatten(), lw=2, color='mediumblue')"
]
},
@@ -225,21 +196,23 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"... how well the last state of the simulation matches the target"
"So the gradient looks good (feel free to plot the others from the five iterations above).\n",
"\n",
"How well does the 16th state of the simulation actually match the target after the 5 update steps? This is what the loss measures, after all. The next graph shows the constraints (i.e. the solution we'd like to obtain) in green, and the reconstructed state that results from advancing the updated initial state for 16 steps with the solver:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[<matplotlib.lines.Line2D at 0x139a7b890>]"
"[<matplotlib.lines.Line2D at 0x13fbd43d0>]"
]
},
"execution_count": 5,
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
},
@@ -258,10 +231,6 @@
],
"source": [
"fig = plt.figure().gca()\n",
"pltx = np.linspace(-1,1,n)\n",
"\n",
"# initial guess for optimization, this is where we started out...\n",
"fig.plot(pltx, initial.flatten() , lw=2, color='linen') \n",
"\n",
"# target constraint at t=0.5\n",
"fig.plot(pltx, target.velocity.data.flatten(), lw=2, color='forestgreen') \n",
@@ -275,9 +244,13 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Seems to be going in the right direction, but not there yet... It's a bit slow, and needs more iterations.\n",
"This seems to be going in the right direction! It's not there yet, but we've also only computed 5 steps so far. The two peaks with a positive velocity on the left side of the shock and the negative peak on the right side are starting to show.\n",
"\n",
"Suboptimal so far: update via gradient manually computed with multiple TF session run() calls. TF optimizers do this internally, more flexibly and more efficiently... Let's start over, and switch to a `tf.train.Optimizer()` instead."
"This is a good indicator that the backpropagation of gradients through all of our 16 simulated steps is behaving correctly, and driving the solution in the right direction. This hints at how powerful this setup is: the gradients that we obtain from each of the simulation steps (and each operation within them) can easily be chained together into more complex sequences. Here, we're backpropagating through all 16 steps of the simulation, and we could easily enlarge this \"look-ahead\" of the optimization without any additional effort.\n",
"\n",
"A simple direct reconstruction problem like this one is always a good initial test for a DP solver, e.g., before moving to more complex setups like coupling it with an NN. If the direct optimization does not converge, there's probably still something fundamentally wrong, and there's no point involving an NN. \n",
"\n",
"Our Burgers example here looks good, so before including an NN, let's address the performance issue: so far, we've updated our solution via a gradient that is \"manually\" computed with multiple TF session `run()` calls. TF's own optimizers do this internally, more flexibly and more efficiently. Let's start over, and switch to a `tf.train.Optimizer()` instead."
]
},
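To illustrate what such an optimizer adds over the manual loop, here is a hedged sketch on the same toy linear-diffusion stand-in: a hand-rolled momentum optimizer keeps a velocity accumulator, analogous to the internal slot variables that TF's optimizers maintain. The solver and all constants are assumptions of this sketch, not the notebook's values:

```python
import numpy as np

# Same toy reconstruction as before, but the parameter update is delegated to
# a stateful momentum optimizer (hand-rolled here for clarity; TF optimizers
# keep the equivalent state internally and fuse the update into the graph).
N, STEPS = 128, 16
NU_DT, LR, BETA = 0.2, 0.5, 0.5        # hypothetical toy-problem constants

def step(u):
    return u + NU_DT * (np.roll(u, 1) - 2.0 * u + np.roll(u, -1))

x = np.linspace(-1, 1, N, endpoint=False)
target = -np.sin(np.pi * x)            # hypothetical target state

def loss_and_grad(u0):
    u = u0
    for _ in range(STEPS):             # forward simulation to step 16
        u = step(u)
    grad = 2.0 * (u - target)
    for _ in range(STEPS):             # adjoint pass (operator is symmetric)
        grad = step(grad)
    return float(np.sum((u - target) ** 2)), grad

u0 = np.zeros(N)
velocity = np.zeros(N)                 # optimizer state (momentum accumulator)
losses = []
for _ in range(20):
    l, g = loss_and_grad(u0)
    losses.append(l)
    velocity = BETA * velocity - LR * g
    u0 = u0 + velocity
```

The momentum term smooths the update directions across iterations; with the assumed constants, the loss after 20 iterations is far below the initial one.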
{
@@ -304,7 +277,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.5"
"version": "3.7.6"
}
},
"nbformat": 4,

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long


@@ -21,8 +21,8 @@ into the training process. E.g., given a PDE for $\mathbf{u}(\mathbf{x},t)$ with
we can typically express it in terms of a function $\mathcal F$ of the derivatives
of $\mathbf{u}$ via
$
\mathbf{u}_t = \mathcal F ( \mathbf{u}_{x}, \mathbf{u}_{xx, ... \mathbf{u}_{xx...x} )
$,
\mathbf{u}_t = \mathcal F ( \mathbf{u}_{x}, \mathbf{u}_{xx}, ... \mathbf{u}_{xx...x} ) ,
$
where the $_{\mathbf{x}}$ subscripts denote spatial derivatives with respect to one of the spatial dimensions
of higher and higher order (this can of course also include derivatives with respect to different axes).
@@ -30,8 +30,8 @@ In this context we can employ DL by approximating the unknown $\mathbf{u}$ itsel
with a NN, denoted by $\tilde{\mathbf{u}}$. If the approximation is accurate, the PDE
naturally should be satisfied, i.e., the residual $R$ should be equal to zero:
$
R = \mathbf{u}_t - \mathcal F ( \mathbf{u}_{x}, \mathbf{u}_{xx, ... \mathbf{u}_{xx...x} ) = 0
$
R = \mathbf{u}_t - \mathcal F ( \mathbf{u}_{x}, \mathbf{u}_{xx}, ... \mathbf{u}_{xx...x} ) = 0
$.
This nicely integrates with the objective for training a neural network: similar to before
we can collect sample solutions
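As a quick sanity check of this definition, we can pick a concrete $\mathcal F$ and plug an exact solution in for $\tilde{\mathbf{u}}$; the residual should then vanish up to discretization error. A small sketch, assuming the heat equation $\mathbf{u}_t = \nu \mathbf{u}_{xx}$ with solution $u(x,t) = e^{-\nu \pi^2 t} \sin(\pi x)$, and finite differences in place of network derivatives:

```python
import numpy as np

# Numerical check of the residual R for an assumed concrete F: the heat
# equation u_t = nu * u_xx. Plugging in its exact solution
# u(x, t) = exp(-nu * pi^2 * t) * sin(pi * x), the residual should be zero
# up to finite-difference error.
NU = 0.1

def u(x, t):
    return np.exp(-NU * np.pi**2 * t) * np.sin(np.pi * x)

x = np.linspace(-1.0, 1.0, 201)
t, h = 0.3, 1e-4

u_t  = (u(x, t + h) - u(x, t - h)) / (2.0 * h)               # central difference in t
u_xx = (u(x + h, t) - 2.0 * u(x, t) + u(x - h, t)) / h**2    # central difference in x

R = u_t - NU * u_xx            # residual of u_t = nu * u_xx
max_R = float(np.max(np.abs(R)))
```

For a trained network $\tilde{\mathbf{u}}$ the derivatives would instead come from automatic differentiation, but the residual is evaluated in exactly the same way.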
@@ -90,7 +90,7 @@ For higher order derivatives, such as $\frac{\partial^2 u}{\partial x^2}$, we ca
## Summary so far
This gives us a method to include physical equations into DL learning as a soft-constraint.
The approach above gives us a method to include physical equations into DL learning as a soft-constraint.
Typically, this setup is suitable for _inverse_ problems, where we have certain measurements or observations
for which we want to find a solution of a model PDE. Because of the high expense of the reconstruction (to be
demonstrated in the following), the solution manifold typically shouldn't be overly complex. E.g., it is difficult