updates maximilian PINN chapter

parent fe0026a8ca
commit 998415f530
@@ -30,19 +30,19 @@
    "source": [
     "## Formulation\n",
     "\n",
-    "In terms of notation from {doc}`overview-equations` and the previous section, this reconstruction problem means we are solving\n",
+    "In terms of the $x,y^*$ notation from {doc}`overview-equations` and the previous section, this reconstruction problem means we are solving\n",
     "\n",
     "$$\n",
     "\\text{arg min}_{\\theta} \\sum_i |f(x_i ; \\theta)-y^*_i|^2 + R(x_i) ,\n",
     "$$\n",
     "\n",
-    "where $x$ and $y^*$ are solutions at different locations in space and time, i.e. $x,y^* \\in \\mathbb{R}$.\n",
-    "Together, they represent two-dimensional solutions\n",
-    "$x(p_i,t_i)$ and $y^*(p_i,t_i)$ for a spatial coordinate $p_i$ and a time $t_i$, where the index $i$ sums over a set of chosen $p_i,t_i$\n",
-    "locations. The residual function $R$ above collects additional evaluations of $f$ and its derivatives to formulate the residual for $\\mathcal{P}$. I.e., $R$ should simply converge to zero above, and we've omitted scaling factors in the objective function for simplicity.\n",
+    "where $x$ and $y^*$ are solutions of $u$ at different locations in space and time. As we're dealing with a 1D velocity, $x,y^* \\in \\mathbb{R}$.\n",
+    "They both represent two-dimensional solutions\n",
+    "$x(p_i,t_i)$ and $y^*(p_i,t_i)$ for a spatial coordinate $p_i$ and a time $t_i$, where the index $i$ sums over a set of chosen $p_i,t_i$ locations at which we evaluate the PDE and the approximated solutions. Thus $y^*$ denotes a reference $u$ for $\\mathcal{P}$ being Burgers equation, which $x$ should approximate as closely as possible. Thus our neural network representation of $x$ will receive $p,t$ as input to produce a velocity solution at the specified position.\n",
+    "\n",
+    "Note that, effectively, we're only dealing with individual point samples of a single solution for $\\mathcal{P}$ \n",
+    "here.\n",
+    "The residual function $R$ above collects additional evaluations of $f(;\\theta)$ and its derivatives to formulate the residual for $\\mathcal{P}$. This approach -- using derivatives of a neural network to compute a PDE residual -- is typically called a _physics-informed_ approach, yielding a _physics-informed neural network_ (PINN) to represent a solution for the inverse reconstruction problem.\n",
+    "\n",
+    "Thus, in the formulation above, $R$ should simply converge to zero above. We've omitted scaling factors in the objective function for simplicity. Note that, effectively, we're only dealing with individual point samples of a single solution $u$ for $\\mathcal{P}$ here.\n",
     "\n"
    ]
   },
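The objective in the hunk above, $\text{arg min}_{\theta} \sum_i |f(x_i;\theta)-y^*_i|^2 + R(x_i)$, can be illustrated with a short sketch. All names here (`f`, `residual`, `samples`) are hypothetical stand-ins for the network and the PDE residual term, not the chapter's actual phiflow code:

```python
# Minimal sketch of the reconstruction objective discussed above:
# sum_i |f(x_i; theta) - y*_i|^2 + R(x_i), with R squared so both
# terms are non-negative. `f` and `residual` are illustrative callables.
def objective(f, residual, samples, y_ref, theta):
    data = sum((f(p, t, theta) - y) ** 2 for (p, t), y in zip(samples, y_ref))
    phys = sum(residual(p, t, theta) ** 2 for (p, t) in samples)
    return data + phys
```

A perfect fit with a vanishing residual drives both terms, and hence the whole objective, to zero.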
@@ -54,7 +54,7 @@
    "source": [
     "## Preliminaries\n",
     "\n",
-    "Let's just load phiflow with the tensorflow backend for now, and initialize the random sampling. (_Note: this example uses an older version of phiflow (1.5.1)._)\n",
+    "Let's just load phiflow with the tensorflow backend for now, and initialize the random sampling. (_Note: this example uses an older version 1.5.1 of phiflow._)\n",
     "\n"
    ]
   },
@@ -138,7 +138,7 @@
    "source": [
     "Most importantly, we can now also construct the residual loss function `f` that we'd like to minimize in order to guide the NN to retrieve a solution for our model equation. As can be seen in the equation at the top, we need derivatives w.r.t. $t$, $x$ and a second derivative for $x$. The first three lines of `f` below do just that.\n",
     "\n",
-    "Afterwards, we simply combine the derivates according to Burgers equation:"
+    "Afterwards, we simply combine the derivates to form Burgers equation. Here we make use of phiflow's `gradient` function:"
    ]
   },
   {
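The notebook builds this residual from network derivatives via phiflow's `gradient`; as an illustrative stand-in, the same residual for Burgers equation, $u_t + u\,u_x - \nu\,u_{xx}$, can be estimated with central finite differences for any callable solution. The function name, the viscosity value, and the step size are assumptions for this sketch:

```python
def burgers_residual_fd(u_fn, x, t, nu=0.01, h=1e-4):
    """Finite-difference estimate of the Burgers residual u_t + u*u_x - nu*u_xx
    for a callable solution u_fn(x, t). Illustration only; the notebook computes
    these derivatives of the network itself via autodiff instead."""
    u = u_fn(x, t)
    u_t = (u_fn(x, t + h) - u_fn(x, t - h)) / (2 * h)   # time derivative
    u_x = (u_fn(x + h, t) - u_fn(x - h, t)) / (2 * h)   # first spatial derivative
    u_xx = (u_fn(x + h, t) - 2 * u + u_fn(x - h, t)) / h**2  # second spatial derivative
    return u_t + u * u_x - nu * u_xx
```

For an exact inviscid solution such as $u(x,t)=x/(1+t)$ the residual vanishes up to discretization error, which is exactly the behavior the training drives the network towards.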
@@ -291,7 +291,7 @@
     "\n",
     "The direct constraints are evaluated via `network(x, t)[:, 0] - u`, where `x` and `t` are the space-time location where we'd like to sample the solution, and `u` provides the corresponding ground truth value.\n",
     "\n",
-    "For the physical loss points, we have no ground truth solutions, but we'll only evaluate the PDE residual via the NN derivatives, to see whether the solution satisfies PDE model. If not, this directly gives us an error to be reduced via a update step in the optimization. The corresponding expression is of the form `f(network(x, t)[:, 0], x, t)` below. Note that for both data and physics terms the `network()[:, 0]` expressions don't remove any data from the $L^2$ evaluation, they simply discard the last size-1 dimension of the $(n,1)$ tensor returned by the network."
+    "For the physical loss points, we have no ground truth solutions, but we'll only evaluate the PDE residual via the NN derivatives, to see whether the solution satisfies the PDE model. If not, this directly gives us an error to be reduced via an update step in the optimization. The corresponding expression is of the form `f(network(x, t)[:, 0], x, t)` below. Note that for both data and physics terms the `network()[:, 0]` expressions don't remove any data from the $L^2$ evaluation, they simply discard the last size-1 dimension of the $(n,1)$ tensor returned by the network."
    ]
   },
   {
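The `[:, 0]` indexing mentioned at the end of this hunk can be demonstrated in isolation; the array below is a hypothetical stand-in for the network output:

```python
import numpy as np

# The network returns an (n, 1) tensor; indexing with [:, 0] only drops the
# trailing size-1 dimension -- no values are discarded before the L2 loss.
out = np.arange(4, dtype=float).reshape(4, 1)  # stand-in for network(x, t)
u_ref = np.array([0.0, 1.0, 2.0, 3.0])         # matching ground-truth values
data_term = np.sum((out[:, 0] - u_ref) ** 2)   # direct constraints, as in the text
```

Without the `[:, 0]`, broadcasting `(4,1) - (4,)` would silently produce a `(4,4)` difference and a wrong loss, which is why the squeeze matters.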
@@ -392,7 +392,7 @@
    "id": "5NLrym59oXa4"
   },
   "source": [
-    "This training can take a significant amount of time, around 1 minute on a typical notebook, but at least the error goes down significantly (roughly from around 0.2 to ca. 0.03), and the network seems to successfully converge to a solution.\n",
+    "The training can take a significant amount of time, around 2 minutes on a typical notebook, but at least the error goes down significantly (roughly from around 0.2 to ca. 0.03), and the network seems to successfully converge to a solution.\n",
     "\n",
     "Let's show the reconstruction of the network, by evaluating the network at the centers of a regular grid, so that we can show the solution as an image. Note that this is actually fairly expensive, we have to run through the whole network with a few thousand weights for all of the $128 \\times 32$ sampling points in the grid.\n",
     "\n",
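The grid evaluation described in this hunk can be sketched as follows. The function name and the grid extents ($x \in [-1,1]$, $t \in [0,1]$) are assumptions for illustration; the notebook's own domain and API may differ:

```python
import numpy as np

# Sketch: evaluate a trained (x, t) -> u model at the centers of a regular
# 128 x 32 grid so the reconstruction can be shown as an image.
def sample_on_grid(model, nx=128, nt=32):
    x = (np.arange(nx) + 0.5) / nx * 2.0 - 1.0  # spatial cell centers, assumed [-1, 1]
    t = (np.arange(nt) + 0.5) / nt              # temporal cell centers, assumed [0, 1]
    xx, tt = np.meshgrid(x, t, indexing="ij")
    return model(xx, tt)                        # array of shape (nx, nt)
```

Every one of the $128 \times 32$ entries requires a full forward pass through the network, which is why this visualization step is comparatively expensive.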
@@ -500,7 +500,7 @@
     "\n",
     "Let's check how well the initial state at $t=0$ was reconstructed. That's the most interesting, and toughest part of the problem (the rest basically follows from the model equation and boundary conditions, given the first state).\n",
     "\n",
-    "It turns out, the accuracy of the initial state is actually not that great: the blue curve from the PINN is quite far away from the constraints via the reference data (shown in gray)... The solution will get better with larger number of iterations, but it requires a surprisingly large number of them for this fairly simple case. \n"
+    "It turns out that the accuracy of the initial state is actually not that good: the blue curve from the PINN is quite far away from the constraints via the reference data (shown in gray)... The solution will get better with larger number of iterations, but it requires a surprisingly large number of iterations for this fairly simple case. \n"
    ]
   },
   {
@@ -861,7 +861,7 @@
     "\n",
     "The solution of the PINN setup above can also directly be improved, however. E.g., try to:\n",
     "\n",
-    "* Adjust parameters of the training to further decrease the error without making the solution diverge\n",
+    "* Adjust parameters of the training to further decrease the error without making the solution diverge.\n",
     "* Adapt the NN architecture for further improvements (keep track of the weight count, though).\n",
     "* Activate a different optimizer, and observe the change in behavior (this typically requires adjusting the learning rate). Note that the more complex optimizers don't necessarily do better in this relatively simple example.\n",
     "* Or modify the setup to make the test case more interesting: e.g., move the boundary conditions to a later point in simulation time, to give the reconstruction a larger time interval to reconstruct."
@@ -50,7 +50,7 @@ Thus, the physical soft constraints allow us to encode solutions to
 PDEs with the tools of NNs.
 An inherent drawback of this approach is that it yields single solutions,
 and that it does not combine with traditional numerical techniques well.
-E.g., learned representation is not suitable to be refined with
+E.g., the learned representation is not suitable to be refined with
 a classical iterative solver such as the conjugate gradient method.

 This means many
@@ -2,7 +2,7 @@ Physical Loss Terms
 =======================

 The supervised setting of the previous sections can quickly
-yield approximate solutions with a fairly simple training process, but what's
+yield approximate solutions with a fairly simple training process. However, what's
 quite sad to see here is that we only use physical models and numerics
 as an "external" tool to produce a big pile of data 😢.
@@ -31,9 +31,7 @@
 where the $_{\mathbf{x}}$ subscripts denote spatial derivatives with respect to one of the spatial dimensions
 of higher and higher order (this can of course also include mixed derivatives with respect to different axes).

-In this context we can employ DL by approximating the unknown $\mathbf{u}$ itself
-with an NN, denoted by $\tilde{\mathbf{u}}$. If the approximation is accurate, the PDE
-naturally should be satisfied, i.e., the residual $R$ should be equal to zero:
+In this context, we can approximate the unknown u itself with a neural network. If the approximation, which we call $\tilde{\mathbf{u}}$, is accurate, the PDE should be satisfied naturally. In other words, the residual R should be equal to zero:

 $$
 R = \mathbf{u}_t - \mathcal F ( \mathbf{u}_{x}, \mathbf{u}_{xx}, ... \mathbf{u}_{xx...x} ) = 0 .
@@ -44,8 +42,8 @@ we can collect sample solutions
 $[x_0,y_0], ...[x_n,y_n]$ for $\mathbf{u}$ with $\mathbf{u}(\mathbf{x})=y$.
 This is typically important, as most practical PDEs we encounter do not have unique solutions
 unless initial and boundary conditions are specified. Hence, if we only consider $R$ we might
-get solutions with random offset or other undesirable components. Hence the supervised sample points
-help to _pin down_ the solution in certain places.
+get solutions with random offset or other undesirable components. The supervised sample points
+therefore help to _pin down_ the solution in certain places.
 Now our training objective becomes

 $$
@@ -55,9 +53,13 @@ $$ (physloss-training)
 where $\alpha_{0,1}$ denote hyperparameters that scale the contribution of the supervised term and
 the residual term, respectively. We could of course add additional residual terms with suitable scaling factors here.

+It is instructive to note what the two different terms in equation {eq}`physloss-training` mean: The first term is a conventional, supervised L2-loss. If we were to optimize only this loss, our network would learn to approximate the training samples well, but might average multiple modes in the solutions, and do poorly in regions in between the sample points.
+If we, instead, were to optimize only the second term (the physical residual), our neural network might be able to locally satisfy the PDE, but still could produce solutions that are still 'off' from our training data. This can happen due to "null spaces" in the solutions, i.e., different solutions that all satisfy the residuals.
+Therefore, we optimize both objectives simultaneously such that, in the best case, the network learns to approximate the specific solutions of the training data while still capturing knowledge about the underlying PDE.
+
 Note that, similar to the data samples used for supervised training, we have no guarantees that the
 residual terms $R$ will actually reach zero during training. The non-linear optimization of the training process
-will minimize the supervised and residual terms as much as possible, but worst case, large non-zero residual
+will minimize the supervised and residual terms as much as possible, but there is no guarantee. Large, non-zero residual
 contributions can remain. We'll look at this in more detail in the upcoming code example, for now it's important
 to remember that physical constraints in this way only represent _soft-constraints_, without guarantees
 of minimizing these constraints.
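The weighted two-term objective discussed in this final hunk can be sketched as follows; `combined_loss` and its arguments are illustrative names, with `alpha0`/`alpha1` mirroring the hyperparameters $\alpha_{0,1}$:

```python
import numpy as np

# Sketch of the combined training objective: a supervised L2 term that pins
# down the solution at sample points, plus a weighted residual term that acts
# as a soft PDE constraint. All names are illustrative.
def combined_loss(u_pred, y_ref, residuals, alpha0=1.0, alpha1=1.0):
    supervised = alpha0 * np.sum((u_pred - y_ref) ** 2)  # data term
    physical = alpha1 * np.sum(residuals ** 2)           # PDE residual term
    return supervised + physical
```

Setting either weight to zero recovers the two degenerate cases described above: pure supervised fitting, or a residual-only optimization that can drift within the null space of the PDE.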