additional corrections Maxi
This commit is contained in:
parent
b532d7f52f
commit
0fda2c388c
@@ -36,7 +36,7 @@
"\n",
"Next, we'll target a more complex example with the Navier-Stokes equations as physical model. In line with {doc}`overview-ns-forw`, we'll target a 2D case.\n",
"\n",
"As optimization objective we'll consider a more difficult variant of the previous Burgers example: the state of the observed density $s$ should match a given target after $n=20$ steps of simulation. In contrast to before, the observed quantity (here the marker field $s$) cannot be modified in any way, but only the initial state of the velocity $\\mathbf{u}_0$ at $t=0$. This gives us a split between observable quantities for the loss formulation and quantities that we can interact with during the optimization (or later on via NNs).\n",
"As optimization objective we'll consider a more difficult variant of the previous Burgers example: the state of the observed density $s$ should match a given target after $n=20$ steps of simulation. In contrast to before, the observed quantity in the form of the marker field $s$ cannot be changed in any way. Only the initial state of the velocity $\\mathbf{u}_0$ at $t=0$ can be modified. This gives us a split between observable quantities for the loss formulation and quantities that we can interact with during the optimization (or later on via NNs).\n",
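The split described above can be sketched with a tiny NumPy toy problem (a hypothetical linear stand-in for the solver, not the notebook's simulator): only the initial velocity `u` is updated by the optimizer, while the observed marker `s0` and the target stay fixed.

```python
import numpy as np

# Toy illustration (NOT the notebook's Navier-Stokes solver): the marker is
# simply shifted by the velocity each step, so s_n = s0 + n*dt*u.
def simulate(s0, u, n=20, dt=0.1):
    return s0 + n * dt * u

s0 = np.array([0.0, 1.0, 0.0])       # observable initial marker (fixed)
target = np.array([0.5, 1.0, 0.5])   # desired marker after n steps (fixed)
u = np.zeros(3)                      # controllable initial velocity

for _ in range(200):
    residual = simulate(s0, u) - target
    grad = 2 * residual * 20 * 0.1   # analytic dL/du for this linear toy model
    u -= 0.1 * grad                  # gradient-descent update of u only

final_loss = float(np.sum((simulate(s0, u) - target) ** 2))
print(f"final loss: {final_loss:.2e}")
```

Only `u` carries gradients here; `s0` and `target` enter the loss but are never modified, mirroring the observable/controllable split in the text.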
"[[run in colab]](https://colab.research.google.com/github/tum-pbs/pbdl-book/blob/main/diffphys-code-ns.ipynb)\n",
"\n",
"## Physical Model\n",
@@ -135,7 +135,7 @@
"source": [
"## Batched simulations\n",
"\n",
"Now we can set up the simulation, which will work in line with the previous \"regular\" simulation example from the {doc}`overview-ns-forw`. However, now we'll directly include an additional dimension, in line with a mini-batch used for NN training. For this, we'll include a named dimension called `inflow_loc`. This dimension will exist \"above\" the previous spatial dimensions `y`, `x` and the channel dimensions `vector`. As indicated by the name `inflow_loc`, the main differences for this dimension will lie in different locations of the inflow, in order to obtain different flow simulations. The named dimensions in phiflow make it very convenient to broadcast information across matching dimensions in different tensors.\n",
"Now we can set up the simulation, which will work in line with the previous \"regular\" simulation example from the {doc}`overview-ns-forw`. However, now we'll directly include an additional dimension, similar to a mini-batch used for NN training. For this, we'll introduce a named dimension called `inflow_loc`. This dimension will exist \"above\" the previous spatial dimensions `y`, `x` and the channel dimension `vector`. As indicated by the name `inflow_loc`, the main differences for this dimension will lie in different locations of the inflow, in order to obtain different flow simulations. The named dimensions in phiflow make it very convenient to broadcast information across matching dimensions in different tensors.\n",
"\n",
"The `Domain` object is allocated just like before, but the `INFLOW_LOCATION` tensor now receives a string\n",
"`'inflow_loc,vector'` that indicates the names of the two dimensions. This leads to the creation of an `inflow_loc` dimension in addition to the two spatial dimensions and the channel dimension (the `vector` part).\n"
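For intuition, here is a rough positional NumPy analogy of that broadcasting (phiflow matches dimensions by *name* rather than by position; the grid shape and all inflow locations below except the reference `(16, 5)` are made-up placeholders).

```python
import numpy as np

# Hypothetical inflow locations; only (16, 5) is taken from the text,
# the other three are placeholders for illustration.
inflow_loc = np.array([(8.0, 5.0), (12.0, 5.0), (20.0, 5.0), (16.0, 5.0)])  # (inflow_loc, vector)

grid = np.zeros((32, 40, 2))                   # one simulation: (y, x, vector)
batch = np.broadcast_to(grid, (4, 32, 40, 2))  # inflow_loc dimension "above" y, x, vector

# per-batch offsets broadcast across the matching y, x dimensions
shifted = batch + inflow_loc[:, None, None, :]
print(shifted.shape)  # (4, 32, 40, 2)
```

With named dimensions, phiflow performs this alignment automatically wherever dimension names match, instead of relying on axis positions and manual `None` insertions.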
@@ -269,7 +269,7 @@
"\n",
"Let's look at how to get gradients from our simulation. The first trivial step taken care of above was to include `phi.torch.flow` to import differentiable operators from which to build our simulator.\n",
"\n",
"Now we want to optimize the initial velocities so that all simulations arrive at a final state that is similar to the simulation no the right, where the inflow is located at `(16, 5)`, i.e. centered along `x`.\n",
"Now we want to optimize the initial velocities so that all simulations arrive at a final state that is similar to the simulation on the right, where the inflow is located at `(16, 5)`, i.e. centered along `x`.\n",
"To achieve this, we record the gradients during the simulation and define a simple $L^2$ based loss function. The loss function we'll use is given by $L = | s_{t_e} - s_{t_e}^* |^2$, where $s_{t_e}$ denotes the smoke density, and $s_{t_e}^*$\n",
"denotes the reference state from the fourth simulation in our batch (both evaluated at the last time step $t_e$).\n",
"When evaluating the loss function we treat the reference state as an external constant via `field.stop_gradient()`.\n",
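As a plain NumPy sketch of this loss (the notebook itself works on phiflow fields and uses `field.stop_gradient()`; here the constant treatment of the reference is only indicated by a comment):

```python
import numpy as np

def l2_loss(s_te, s_te_ref):
    # s_te_ref plays the role of the stop_gradient'ed reference: it is treated
    # as an external constant, so dL/ds_te = 2 (s_te - s_te_ref) while no
    # gradient would be propagated into s_te_ref.
    return float(np.sum((s_te - s_te_ref) ** 2))

s_end = np.array([0.2, 0.8, 0.1])   # hypothetical final smoke density
s_ref = np.array([0.0, 1.0, 0.0])   # reference state (constant)
print(round(l2_loss(s_end, s_ref), 6))  # 0.09
```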
@@ -323,7 +323,7 @@
"source": [
"Phiflow's `field.functional_gradient()` function is the central function to compute gradients. Next, we'll use it to obtain the gradient with respect to the initial velocity. Since the velocity is the second argument of the `simulate()` function, we pass `wrt=[1]`. (Phiflow also has a `field.spatial_gradient` function which instead computes derivatives of tensors along spatial dimensions, like `x,y`.)\n",
"\n",
"`functional_gradient` in generates a gradient function, which we evaluate one term for test purposes with the initial states for smoke and velocity. The last statement prints a summary of a part of the tensor.\n"
"`functional_gradient` generates a gradient function. As a demonstration, the next cell evaluates the gradient once with the initial states for smoke and velocity. The last statement prints a summary of a part of the resulting gradient tensor.\n"
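To make the `wrt` mechanics concrete, here is a finite-difference stand-in for a functional gradient (`functional_gradient_fd` is a made-up helper, not phiflow's implementation, and it takes a single argument index rather than a list):

```python
import numpy as np

# Simplified sketch: return a function computing the gradient of the
# scalar-valued f with respect to its wrt-th argument, via finite differences.
def functional_gradient_fd(f, wrt=1, eps=1e-6):
    def grad_fn(*args):
        args = [np.asarray(a, dtype=float) for a in args]
        x = args[wrt]
        g = np.zeros_like(x)
        for i in np.ndindex(x.shape):
            x_p = x.copy()
            x_p[i] += eps
            a_p = list(args)
            a_p[wrt] = x_p
            g[i] = (f(*a_p) - f(*args)) / eps  # one-sided difference quotient
        return g
    return grad_fn

def loss(smoke, velocity):             # toy scalar objective
    return float(np.sum((smoke + velocity) ** 2))

# gradient w.r.t. the second argument (the velocity), as with wrt=[1]
g = functional_gradient_fd(loss)(np.array([1.0, 2.0]), np.array([0.5, -0.5]))
print(np.round(g, 4))  # analytic gradient is 2*(smoke+velocity) = [3, 3]
```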
]
},
{
@@ -585,7 +585,7 @@
"source": [
"Naturally, the image on the right is the same (this is the reference), and the other three simulations now exhibit a shift towards the right. As the differences are a bit subtle, let's visualize the difference between the target configuration and the different final states.\n",
"\n",
"The following images contain three sets of two: each time the original, unmodified simulation states left, and the one after optimization on its right side. Due to the difference calculation, dark regions indicate where the target should be, but isn't.\n"
"The following images contain the difference between the evolved simulated density and the target density. Hence, dark regions indicate where the target should be, but isn't. The top row shows the original states with the initial velocity being zero, while the bottom row shows the versions after the optimization has tuned the initial velocities. Thus, in each column you can compare before (top) and after (bottom):\n"
]
},
{
@@ -632,7 +632,7 @@
"id": "cuGIjhGx_trw"
},
"source": [
"These difference images clearly show that the optimization managed to align the upper region of the plumes very well. Each original image shows a clear misaligned in terms of a black halo, while the states after optimization largely overlap the target smoke configuration of the reference, and exhibit differences closer to zero for the front of each smoke cloud.\n",
"These difference images clearly show that the optimization managed to align the upper region of the plumes very well. Each original image (at the top) shows a clear misalignment in terms of a black halo, while the states after optimization largely overlap the target smoke configuration of the reference, and exhibit differences closer to zero for the front of each smoke cloud.\n",
"\n",
"Note that all three simulations need to \"work\" with a fixed inflow, hence they cannot simply \"produce\" marker density out of the blue to match the target. Also each simulation needs to take into account how the non-linear model equations change the state of the system over the course of 20 time steps. So the optimization goal is quite difficult, and it is not possible to exactly satisfy the constraints to match the reference simulation in this scenario. E.g., this is noticeable at the stems of the smoke plumes, which still show a black halo after the optimization. The optimization was not able to shift the inflow position, and hence needs to focus on aligning the upper regions of the plumes.\n"
]
@@ -1,14 +1,14 @@
Differentiable Physics versus Physics-informed Training
=======================
In the previous sections we've seen example reconstructions that used physical residuals as soft constraints, in the form of the PINNs, and reconstructions that used a differentiable physics (DP) solver. While both methods can find minimizers for the similar inverse problems, the obtained solutions differ substantially, as does the behavior of the non-linear optimization problem that we get from each formulation. In the following we discuss these differences in more detail, and we will combine conclusions drawn from the behavior of the Burgers case of {doc}`physicalloss-code` and {doc}`diffphys-code-burgers` with observations from research papers.
In the previous sections we've seen example reconstructions that used physical residuals as soft constraints, in the form of the PINNs, and reconstructions that used a differentiable physics (DP) solver. While both methods can find minimizers for similar inverse problems, the obtained solutions differ substantially, as does the behavior of the non-linear optimization problem that we get from each formulation. In the following we discuss these differences in more detail, and we will combine conclusions drawn from the behavior of the Burgers case of {doc}`physicalloss-code` and {doc}`diffphys-code-burgers` with observations from research papers.

## Compatibility with existing numerical methods
It is very obvious that the PINN implementation is quite simple, which is a positive aspect, but at the same time it differs strongly from "typical" discretizations and solution approaches that are usually to employed equations like Burgers equation. The derivatives are computed via the neural network, and hence rely on a fairly accurate representation of the solution to provide a good direction for optimization problems.
It is very obvious that the PINN implementation is quite simple, which is a positive aspect, but at the same time it differs strongly from "typical" discretizations and solution approaches that are usually employed to solve PDEs like Burgers equation. The derivatives are computed via the neural network, and hence rely on a fairly accurate representation of the solution to provide a good direction for optimization problems.
The DP version on the other hand inherently relies on a numerical solver that is tied into the learning process. As such it requires a discretization of the problem at hand, and via this discretization can employ existing, and potentially powerful numerical techniques. This means solutions and derivatives can be evaluated with known and controllable accuracy, and can be evaluated efficiently.
@@ -16,15 +16,15 @@ The DP version on the other hand inherently relies on a numerical solver that is
The reliance on a suitable discretization requires some understanding and knowledge of the problem under consideration. A sub-optimal discretization can impede the learning process or, worst case, lead to diverging training runs. However, given the large body of theory and practical realizations of stable solvers for a wide variety of physical problems, this is typically not an insurmountable obstacle.
The PINN approaches on the other hand do not require an a-priori choice of a discretization, and as such seems to be "discretization-less". This, however, is only an advantage on first sight. As they yield solutions in a computer, they naturally _have_ to discretize the problem, but they construct this discretization over the coure of the training process, in a way that is not easily controllable from the outside. Thus, the resulting accuracy is determined by how well the training manages to estimate the complexity of the problem for realistic use cases, and how well the training data approximates the unknown regions of the solution.
The PINN approaches on the other hand do not require an a-priori choice of a discretization, and as such seem to be "discretization-less". This, however, is only an advantage at first sight. As they yield solutions in a computer, they naturally _have_ to discretize the problem, but they construct this discretization over the course of the training process, in a way that is not easily controllable from the outside. Thus, the resulting accuracy is determined by how well the training manages to estimate the complexity of the problem for realistic use cases, and how well the training data approximates the unknown regions of the solution.
As demonstrated with the Burgers example, the PINN solutions typically have significant difficulties propagating information _backward_ in time. This is closely coupled to the efficiency of the method.
## Efficiency
The PINN approaches typically perform a localized sampling and correction of the solutions, which means the corrections in the form of weight updates are likewise typically local. The fulfilment of boundary conditions in space and time can be correspondingly slow, leading to long training runs in practice.
The PINN approaches typically perform a localized sampling and correction of the solutions, which means the corrections in the form of weight updates are likewise typically local. The fulfillment of boundary conditions in space and time can be correspondingly slow, leading to long training runs in practice.
A well-chosen discretization of a DP approach can remedy this behavior, and provide an improved flow of gradient information. At the same time, the reliance on a computational grid means that solutions can be obtained very quickly. Given an interpolation scheme or set of basis functions, the solution can be sampled at any point in space or time given a very local neighborhood of the computational grid. Worst case, this can lead to slight memory overheads, e.g., by repeatedly storing mostly constant values of a solution.
A well-chosen discretization of a DP approach can remedy this behavior, and provide an improved flow of gradient information. At the same time, the reliance on a computational grid means that solutions can be obtained very quickly. Given an interpolation scheme or a set of basis functions, the solution can be sampled at any point in space or time given a very local neighborhood of the computational grid. Worst case, this can lead to slight memory overheads, e.g., by repeatedly storing mostly constant values of a solution.
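A one-dimensional NumPy sketch of that locality (linear interpolation between two neighboring cells; the grid values are arbitrary):

```python
import numpy as np

# Sampling a grid-based solution at an arbitrary point only touches a local
# neighborhood of the computational grid: here, the two surrounding cells.
def sample(grid, x, dx=1.0):
    i = int(np.floor(x / dx))   # index of the left neighboring cell
    t = x / dx - i              # fractional offset within the cell
    return (1 - t) * grid[i] + t * grid[i + 1]

u = np.array([0.0, 1.0, 4.0, 9.0])   # cell values of a discrete solution
print(sample(u, 1.5))  # 2.5, computed from cells 1 and 2 only
```

A PINN representation, in contrast, would require a full network evaluation to produce the same single sample.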
For the PINN representation with fully-connected networks on the other hand, we need to make a full pass over the potentially large number of values in the whole network to obtain a sample of the solution at a single point. The network effectively needs to encode the full high-dimensional solution, and its size likewise determines the efficiency of derivative calculations.
@@ -32,7 +32,7 @@ For the PINN representation with fully-connected networks on the other hand, we
That being said, because the DP approaches can cover much larger solution manifolds, the structure of these manifolds is typically also difficult to learn. E.g., when training a network with a larger number of iterations (i.e. a long look-ahead into the future), this typically represents a signal that is more difficult to learn than a short look-ahead.
As a consequence, these training runs not only take more computational resources per NN iteration, the also need longer to converge. Regarding resources, each computation of the look-ahead potentially requires a large number of simulation steps, and typically a similar amount of resources for the backpropagation step. Regarding convergence, the more complex signal that should be learned can take more training iterations and require larger NN structures.
As a consequence, these training runs not only take more computational resources per NN iteration, they also need longer to converge. Regarding resources, each computation of the look-ahead potentially requires a large number of simulation steps, and typically a similar amount of resources for the backpropagation step. Regarding convergence, the more complex signal that should be learned can take more training iterations and require larger NN structures.

@@ -44,12 +44,12 @@ The following table summarizes these pros and cons of physics-informed (PI) and
| Method | ✅ Pro | ❌ Con |
|----------|-------------|------------|
| **PI** | - Analytic derivatives via backpropagation | - Expensive evaluation of NN, as well as derivative calculations |
| | - Simple to implement | - Incompatible with existing numerical methods |
| | | - No control of discretization |
| **PI** | - Analytic derivatives via backpropagation. | - Expensive evaluation of NN, as well as derivative calculations. |
| | - Easy to implement. | - Incompatible with existing numerical methods. |
| | | - No control of discretization. |
| | | |
| **DP** | - Leverage existing numerical methods | - More complicated to implement |
| | - Efficient evaluation of simulation and derivatives | - Require understanding of problem to choose suitable discretization |
| **DP** | - Leverage existing numerical methods. | - More complicated implementation. |
| | - Efficient evaluation of simulation and derivatives. | - Require understanding of problem to choose suitable discretization. |
| | | |
As a summary, both methods are definitely interesting, and have a lot of potential. There are numerous more complicated extensions and algorithmic modifications that change and improve on the various negative aspects we have discussed for both sides.