cleanup, unified notation NN instead of ANN

This commit is contained in:
NT 2021-03-10 12:15:50 +08:00
parent 975d67d07a
commit 8556fa6c96
15 changed files with 44 additions and 40 deletions

View File

@ -6,7 +6,7 @@
"source": [
"# Burgers Optimization with a Differentiable Physics Gradient\n",
"\n",
"To illustrate the process of computing gradients in a _differentiable physics_ setting, we target the same reconstruction task like for the PINN example. This has some immediate implications: the evolution of the system is now fully determined by our PDE formulation. Hence, the only real unknown is the initial state! We will still need to re-compute all the states betwwen the initial and target state many times, just now we won't need an NN for this step. Instead, we can rely on our discretized model. \n",
"To illustrate the process of computing gradients in a _differentiable physics_ (DP) setting, we target the same inverse problem (the reconstruction task) used for the PINN example in {doc}`physicalloss-code`. The choice of DP as a method has some immediate implications: we start with a discretized PDE, and the evolution of the system is now fully determined by the resulting numerical solver. Hence, the only real unknown is the initial state! We will still need to re-compute all the states betwwen the initial and target state many times, just now we won't need an NN for this step. Instead, we can rely on our discretized model. \n",
"\n",
"Also, as we choose an initial discretization for the DP approach, the unknown initial state consists of the sampling points of the involved physical fields, and we can simply represent these unknowns as floating point variables. Hence, even for the initial state we do not need to set up an NN. Thus, our Burgers reconstruction problem reduces to a gradient-based opitmization without any NN when solving it with DP. Nonetheless, it's a very good starting point to illustrate the process.\n",
"\n",

View File

@ -8,7 +8,7 @@
"source": [
"# Reducing Numerical Errors with Deep Learning\n",
"\n",
"Next, we'll target numerical errors that arise in the discretization of a continuous PDE $\\mathcal P^*$, i.e. when we formulate $\\mathcal P$. This approach will demonstrate that, despite the lack of closed-form descriptions, discretization errors often are functions with regular and repeating structures and, thus, can be learned by a neural network. Once the network is trained, it can be evaluated locally to improve the solution of a PDE-solver, i.e., to reduce its numerical error. The resulting method is a hybrid one: it will always run (a coarse) PDE solver, and then improve if at runtime with corrections inferred by an NN.\n",
"First, we'll target numerical errors that arise in the discretization of a continuous PDE $\\mathcal P^*$, i.e. when we formulate $\\mathcal P$. This approach will demonstrate that, despite the lack of closed-form descriptions, discretization errors often are functions with regular and repeating structures and, thus, can be learned by a neural network. Once the network is trained, it can be evaluated locally to improve the solution of a PDE-solver, i.e., to reduce its numerical error. The resulting method is a hybrid one: it will always run (a coarse) PDE solver, and then improve if at runtime with corrections inferred by an NN.\n",
" \n",
"Pretty much all numerical methods contain some form of iterative process. That can be repeated updates over time for explicit solvers,or within a single update step for implicit solvers. Below we'll target iterations over time, an example for the second case could be found [here](https://github.com/tum-pbs/CG-Solver-in-the-Loop).\n",
"\n",

View File

@ -6,18 +6,18 @@
"id": "P9P3fJaa30da"
},
"source": [
"# Deep Learning for Inverse Problems\n",
"# Solving Inverse Problems with NNs\n",
"\n",
"**TODOs**: **1) bottom: show targets in results; 2) move inverse prob discussion earlier? already for DP-NS example?**\n",
"**TODOs: 1) bottom: show targets in results;**\n",
"\n",
"\n",
"Inverse problems encompass a large class of practical scenarios that appear in science. In general, the goal here is not to directly compute a physical field like the velocity at a future time (this is the typical scenario for a _forward_ solve), but instead more generically compute a parameter in the model equation such that certain constraints are fulfilled. A very common goal here is to find the optimal setting for a single parameter given some constraints. E.g., this could be the global diffusion constant for an advection-diffusion model such that it fits measured data as accurately as possible. Inverse problems are encountered for any model parameter adjusted via observations, or the reconstruction of initial conditions, e.g., for particle imaging velocimetry (PIV). More complex cases aim for computing boundary geometries w.r.t. optmal conditions, e.g. to obtain a shape with minimal drag in a fluid flow.\n",
"Inverse problems encompass a large class of practical scenarios that appear in science. In general, the goal here is not to directly compute a physical field like the velocity at a future time (this is the typical scenario for a _forward_ solve), but instead more generically compute a parameter in the model equation such that certain constraints are fulfilled. A very common objective is to find the optimal setting for a single parameter given some constraints. E.g., this could be the global diffusion constant for an advection-diffusion model such that it fits measured data as accurately as possible. Inverse problems are encountered for any model parameter adjusted via observations, or the reconstruction of initial conditions, e.g., for particle imaging velocimetry (PIV). More complex cases aim for computing boundary geometries w.r.t. optmal conditions, e.g. to obtain a shape with minimal drag in a fluid flow.\n",
"\n",
"A key aspect demonstrated below will be that we're not aiming for solving only a _single instance_ of an inverse problem, but we'd like to use deep learning to solve a _large class_ of inverse problems. Thus, unlike the PINN example of {doc}`physicalloss-code` or the DP optimization of {doc}`diffphys-code-ns`, where we've solved an optimization problem for specific instances of inverse problems, we now aim for training an ANN that learns to solve a larger class of inverse problems. Nonetheless, we of course need to rely on a certain degree of similarity for these problems, otherwise there's nothing to learn.\n",
"A key aspect demonstrated below will be that we're not aiming for solving only a _single instance_ of an inverse problem, but we'd like to use deep learning to solve a _larger collection_ of inverse problems. Thus, unlike the PINN example of {doc}`physicalloss-code` or the DP optimization of {doc}`diffphys-code-ns`, where we've solved an optimization problem for specific instances of inverse problems, we now aim for training an NN that learns to solve a larger class of inverse problems, i.e., a whole solution manifold. Nonetheless, we of course need to rely on a certain degree of similarity for these problems, otherwise there's nothing to learn (and the implied assumption of continuity in the solution manifold breaks down).\n",
"\n",
"Below we will run a very challenging test case as a representative of these inverse problems: we will aim for computing a high dimensional control function that exerts forces over the full course of an incompressible fluid simulation in order to reach a desired goal state for a passively advected marker in the fluid. This means we only have very indirect constraints to be fulfilled (a single state at the end of a sequence), and a large number of degrees of freedom (the control force function is a space-time function with the same degrees of freedom as the flow field itself).\n",
"\n",
"The _long-term_ nature of the control is one of the aspects which makes this a tough inverse problem: any changes to the state of the physical system can lead to large change later on in time, and hence a controller needs to anticipate how the system will behave when it is influenced. This means an ANN also needs to learn how the underlying physics evolve and change, and this is exaclty where the gradients from the DP training come in to guide the learning task towards solution that can reach the goal.\n"
"The _long-term_ nature of the control is one of the aspects which makes this a tough inverse problem: any changes to the state of the physical system can lead to large change later on in time, and hence a controller needs to anticipate how the system will behave when it is influenced. This means an NN also needs to learn how the underlying physics evolve and change, and this is exaclty where the gradients from the DP training come in to guide the learning task towards solution that can reach the goal.\n"
]
},
{
@ -47,7 +47,7 @@
"minimizes the loss above. The $\\mathrm{OP}$ network is a predictor that determines the action of the $\\mathrm{CFE}$ network given the target $d^*$, i.e., $\\mathrm{OP}(\\mathbf{u},d,d^*)=d_{OP}$,\n",
"and $\\mathrm{CFE}$ acts additively on the velocity field via\n",
"$\\mathrm{CFE}(\\mathbf{u},d,d_{OP}) = \\mathbf{u} + f_{\\mathrm{OP}}(\\mathbf{u},d,d_{OP};\\theta_{\\mathrm{OP}})$, \n",
"where we've used $f_{\\mathrm{OP}}$ to denote the ANN representation of $\\mathrm{CFE}$).\n",
"where we've used $f_{\\mathrm{OP}}$ to denote the NN representation of $\\mathrm{CFE}$).\n",
"\n",
"For this problem, the model PDE $\\mathcal{P}$ contains a discretized version of the incompressible Navier-Stokes equations in two dimensions for a velocity $\\mathbf{u}$:\n",
"\n",

View File

@ -23,12 +23,12 @@ actual solver in the training loop via a DP approach.
## Summary
To summarize the pros and cons of training ANNs via differentiable physics:
To summarize the pros and cons of training NNs via differentiable physics:
✅ Pro:
- uses physical model and numerical methods for discretization
- efficiency of selected methods carries over to training
- tight coupling of physical models and ANNs possible
- tight coupling of physical models and NNs possible
❌ Con:
- not compatible with all simulators (need to provide gradients)

View File

@ -29,7 +29,7 @@ For the PINN representation with fully-connected networks on the other hand, we
That being said, because the DP approaches can cover much larger solution manifolds, the structure of these manifolds is typically also difficult to learn. E.g., when training a network with a larger number of iterations (i.e. a long look-ahead into the future), this typically represents a signal that is more difficult to learn than a short look ahead.
As a consequence, these training runs not only take more computational resources per ANN iteration, the also need longer to converge. Regarding resources, each computation of the look-ahead potentially requires a large number of simulation steps, and typically a similar amount of resources for the backprop step. Regarding convergence, the complexer signal that should be learned can take more training iterations or even require larger ANN structures.
As a consequence, these training runs not only take more computational resources per NN iteration, they also need longer to converge. Regarding resources, each computation of the look-ahead potentially requires a large number of simulation steps, and typically a similar amount of resources for the backprop step. Regarding convergence, the more complex signal that needs to be learned can require more training iterations or even larger NN structures.
## Summary

View File

@ -11,7 +11,7 @@ interact with a numerical solver. Hence, it's a prime example of
situations where it's crucial to bring the numerical solver into the
deep learning loop.
Next, we'll show a tough inverse problem, namely the long-term control
Next, we'll show how to let NNs solve tough inverse problems, namely the long-term control
of a fluid simulation, following Holl et al. {cite}`holl2019pdecontrol`.
This task requires long-term planning,
and hence needs two networks, one to _predict_ the evolution,

View File

@ -10,7 +10,7 @@ we can obtain _hybrid_ methods, that use the best numerical methods that we have
## Interaction
One key component for these hybrids to work well is to let the ANN _interact_ with the PDE solver at training time. Differentiable simulations allow a trained model to explore and experience the physical environment, and receive directed feedback regarding its interactions throughout the solver iterations. This combination nicely fits into the broader context of machine learning as _differentiable programming_.
One key component for these hybrids to work well is to let the NN _interact_ with the PDE solver at training time. Differentiable simulations allow a trained model to explore and experience the physical environment, and receive directed feedback regarding its interactions throughout the solver iterations. This combination nicely fits into the broader context of machine learning as _differentiable programming_.
## Generalization

View File

@ -10,10 +10,13 @@ The central goal of these methods is to use existing numerical solvers and equip
them with functionality to compute gradients with respect to their inputs.
Once this is realized for all operators of a simulation, we can leverage
the autodiff functionality of DL frameworks with back-propagation to let gradient
information from from a simulator into an ANN and vice versa. This has numerous
information flow from a simulator into an NN and vice versa. This has numerous
advantages such as improved learning feedback and generalization, as we'll outline below.
In contrast to physics-informed loss functions, it also enables handling more complex
solution manifolds instead of single inverse problems.
solution manifolds instead of single inverse problems. Thus, instead of using deep learning
to solve individual inverse problems, we'll show how to train NNs that solve
larger classes of inverse problems very quickly.
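Conceptually, "equipping a solver with gradients" means that each solver operation exposes a forward step and the corresponding Jacobian-vector product for the backward pass. A minimal, self-contained sketch (a hand-written diffusion step wrapped as a custom PyTorch autograd op; the constants and the upstream network are arbitrary placeholders):

```python
import torch

DT, NU, DX = 0.01, 0.1, 1.0   # NU is treated as a fixed model parameter (no gradient w.r.t. it)

def diffuse(u):
    # explicit diffusion step with periodic boundaries (stand-in solver op)
    lap = (torch.roll(u, -1) - 2 * u + torch.roll(u, 1)) / DX**2
    return u + DT * NU * lap

class DiffusionStep(torch.autograd.Function):
    """One solver step exposed to autodiff with a hand-written gradient."""
    @staticmethod
    def forward(ctx, u):
        return diffuse(u)

    @staticmethod
    def backward(ctx, grad_out):
        # the diffusion operator is linear and symmetric, so the adjoint
        # (Jacobian-transpose times vector) is the same stencil applied
        # to the incoming gradient
        return diffuse(grad_out)

# gradients now flow from a loss, through the solver op, into an upstream NN
net = torch.nn.Linear(32, 32)
u0 = net(torch.randn(32))
loss = DiffusionStep.apply(u0).pow(2).mean()
loss.backward()                     # fills net.weight.grad via the custom op
```

In practice the forward and backward calls would be provided by the actual simulator, and many such ops are chained together over the course of a simulation.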
```{figure} resources/placeholder.png
---
@ -54,9 +57,9 @@ $\partial \mathcal P_i / \partial \mathbf{u}$.
Note that we typically don't need derivatives
for all parameters of $\mathcal P$, e.g. we omit $\nu$ in the following, assuming that this is a
given model parameter, with which the ANN should not interact.
given model parameter, with which the NN should not interact.
Naturally, it can vary within the solution manifold that we're interested in,
but $\nu$ will not be the output of a ANN representation. If this is the case, we can omit
but $\nu$ will not be the output of an NN representation. If this is the case, we can omit
providing $\partial \mathcal P_i / \partial \nu$ in our solver. However, the following learning process
naturally transfers to including $\nu$ as a degree of freedom.
@ -189,7 +192,7 @@ Informally, we'd like to find a motion that deforms $d^{~0}$ into a target state
The simplest way to express this goal is via an $L^2$ loss between the two states. So we want
to minimize the loss function $F=|d(t^e) - d^{\text{target}}|^2$.
Note that as described here this is a pure optimization task, there's no ANN involved,
Note that, as described here, this is a pure optimization task: there's no NN involved,
and our goal is to obtain $\mathbf{u}$. We do not want to apply this motion to other, unseen _test data_,
as would be customary in a real learning task.
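To make the structure of this optimization concrete, here is a heavily simplified stand-in (a crude 1D "advection" in place of the full fluid solver of this chapter; grid size, time step and step size are ad-hoc choices): the space-time velocity u holds all degrees of freedom, F is the L2 difference between the final marker state and the target, and a plain gradient descent loop uses the gradient obtained by differentiating through the solver steps.

```python
import torch

N, STEPS, DT = 32, 8, 0.1
x = torch.linspace(0, 1, N)
d0       = torch.exp(-((x - 0.3) / 0.1) ** 2)    # initial marker state
d_target = torch.exp(-((x - 0.5) / 0.1) ** 2)    # desired final marker state

u = torch.zeros(STEPS, N, requires_grad=True)    # space-time velocity: all our DOFs

def advect(d, u_t):
    # crude differentiable "advection" by u_t (illustrative stand-in only)
    d_x = (d - torch.roll(d, 1)) * N
    return d - DT * u_t * d_x

for it in range(100):                            # plain gradient descent
    d = d0
    for t in range(STEPS):                       # differentiable forward simulation
        d = advect(d, u[t])
    F = ((d - d_target) ** 2).sum()              # F = |d(t^e) - d_target|^2
    grad, = torch.autograd.grad(F, u)
    with torch.no_grad():
        u -= 0.02 * grad                         # ad-hoc step size
```

The same pattern carries over when the toy advection is replaced by a differentiable Navier-Stokes solver.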
@ -204,7 +207,7 @@ We'd now like to find the minimizer for this objective by
_gradient descent_ (GD), where the
gradient is determined by the differentiable physics approach described earlier in this chapter.
Once things are working with GD, we can relatively easily switch to better optimizers or bring
an ANN into the picture, hence it's always a good starting point.
an NN into the picture, hence it's always a good starting point.
As the discretized velocity field $\mathbf{u}$ contains all our degrees of freedom,
all we need to do is update the velocity by an amount

View File

@ -84,9 +84,9 @@ See also... Test link: {doc}`supervised`
## TODOs , include
- general motivation: repeated solves in classical solvers -> potential for ML
- PINNs: often need weighting of added loss terms for different parts
- DP intro, check transpose of Jacobians in equations
- DP control, show targets at bottom?
## TODOs , Planned content

View File

@ -26,7 +26,8 @@
| CNN | Convolutional Neural Network |
| DL | Deep Learning |
| GD | (steepest) Gradient Descent|
| NN | Neural Network |
| MLP | Multi-Layer Perceptron, a neural network with fully connected layers |
| NN | Neural Network (a generic one, in contrast to, e.g., a CNN or MLP) |
| PDE | Partial Differential Equation |
| PBDL | Physics-Based Deep Learning |
| SGD | Stochastic Gradient Descent|

View File

@ -2,7 +2,7 @@ Models and Equations
============================
Below we'll give a very (really _very_!) brief intro to deep learning, primarily to introduce the notation.
In addition we'll discuss some _model equations_ below. Note that we won't use _model_ to denote trained neural networks, in contrast to some other texts. These will only be called "ANNs" or "networks". A "model" will always denote model equations for a physical effect, typically a PDE.
In addition we'll discuss some _model equations_ below. Note that we won't use _model_ to denote trained neural networks, in contrast to some other texts. These will only be called "NNs" or "networks". A "model" will always denote model equations for a physical effect, typically a PDE.
## Deep Learning and Neural Networks
@ -12,9 +12,9 @@ our goal is to approximate an unknown function
$f^*(x) = y^*$ ,
where $y^*$ denotes reference or "ground truth" solutions.
$f^*(x)$ should be approximated with an ANN representation $f(x;\theta)$. We typically determine $f$
$f^*(x)$ should be approximated with an NN representation $f(x;\theta)$. We typically determine $f$
with the help of some formulation of an error function $e(y,y^*)$, where $y=f(x;\theta)$ is the output
of the ANN.
of the NN.
This gives a minimization problem to find $f(x;\theta)$ such that $e$ is minimized.
In the simplest case, we can use an $L^2$ error, giving

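A bare-bones code version of this notation (architecture, data and training settings are arbitrary placeholders): f(x;theta) is a small fully connected network, e(y,y*) is an L2 error, and training adjusts theta to minimize e.

```python
import torch
import torch.nn as nn

f = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))   # f(x; theta)

x      = torch.rand(256, 1) * 6.28
y_star = torch.sin(x)                     # stand-in "ground truth" f*(x) = y*

opt = torch.optim.Adam(f.parameters(), lr=1e-3)
for _ in range(2000):
    e = ((f(x) - y_star) ** 2).mean()     # L2 error e(y, y*) with y = f(x; theta)
    opt.zero_grad()
    e.backward()
    opt.step()
```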
View File

@ -158,7 +158,7 @@ fundamental steps. Here are some considerations for skipping ahead to the later
- you're very familiar with numerical methods and PDE solvers, and want to get started with DL topics right away. The _Supervised Learning_ chapter is a good starting point then.
- On the other hand, if you're already deep into ANNs&Co, and you'd like to skip ahead to the research related topics, we recommend starting in the _Physical Loss Terms_ chapter, which lays the foundations for the next chapters.
- On the other hand, if you're already deep into NNs&Co, and you'd like to skip ahead to the research related topics, we recommend starting in the _Physical Loss Terms_ chapter, which lays the foundations for the next chapters.
A brief look at our _notation_ in the {doc}`notation` chapter won't hurt in either case, though!
```

View File

@ -12,7 +12,7 @@ representation regarding the reliability of these derivatives. Also, each deriva
requires backpropagation through the full network, which can be very slow, especially so
for higher-order derivatives.
And while the setup is relatively simple, it is generally difficult to control. The ANN
And while the setup is relatively simple, it is generally difficult to control. The NN
has flexibility to refine the solution by itself, but at the same time, tricks are necessary
when it doesn't pick the right regions of the solution.
@ -37,10 +37,10 @@ we deploy it into an application.
In contrast, for the PINN training as described here, we reconstruct a single solution in a known
and given space-time region. As such, any samples from this domain follow the same distribution
and hence don't really represent test or OOD sampes. As the ANN directly encodes the solution,
and hence don't really represent test or OOD samples. As the NN directly encodes the solution,
there is also little hope that it will yield different solutions, or perform well outside
of the training distribution. If we're interested in a different solution, we most likely
have to start training the ANN from scratch.
have to start training the NN from scratch.
## Summary

View File

@ -91,9 +91,9 @@ For higher order derivatives, such as $\frac{\partial^2 u}{\partial x^2}$, we ca
## Summary so far
The approach above gives us a method to include physical equations in DL training as a soft constraint.
Typically, this setup is suitable for _inverse_ problems, where we have certain measurements or observations
that we wish to find a solution of a model PDE for. Because of the high expense of the reconstruction (to be
Typically, this setup is suitable for _inverse problems_, where we have certain measurements or observations
for which we want to find a PDE solution. Because of the high cost of the reconstruction (to be
demonstrated in the following), the solution manifold typically shouldn't be overly complex. E.g., it is difficult
to capture a wide range of solutions, such as the previous supervised airfoil example, in this way.
to capture a wide range of solutions, such as in the previous supervised airfoil example, in this way.

View File

@ -28,7 +28,7 @@ and then increase the complexity of the setup.
A nice property of the supervised training is also that it's very stable.
Things won't get any better when we include more complex physical
models, or look at more complicated ANN architectures.
models, or look at more complicated NN architectures.
Thus, again, make sure you can see a nice exponential falloff in your training
loss when starting with the simple overfitting tests. This is a good
@ -42,10 +42,10 @@ rough estimate of suitable values for $\eta$.
A comment that you'll often hear when talking about DL approaches, and especially
when using relatively simple training methodologies is: "Isn't it just interpolating the data?"
Well, **yes** it is! And that's exactly what the ANN should do. In a way - there isn't
Well, **yes** it is! And that's exactly what the NN should do. In a way - there isn't
anything else to do. This is what _all_ DL approaches are about. They give us smooth
representations of the data seen at training time. Even if we'll use fancy physical
models at training time later on, the ANNs just adjust their weights to represent the signals
models at training time later on, the NNs just adjust their weights to represent the signals
they receive, and reproduce them.
Due to the hype and numerous success stories, people not familiar with DL often have
@ -54,27 +54,27 @@ and general principles in data sets (["messages from god"](https://dilbert.com/s
That's not what happens with the current state of the art. Nonetheless, it's
the most powerful tool we have to approximate complex, non-linear functions.
It is a great tool, but it's important to keep in mind that once we set up the training
correctly, all we'll get out of it is an approximation of the function the ANN
correctly, all we'll get out of it is an approximation of the function the NN
was trained for - no magic involved.
An implication of this is that you shouldn't expect the network
to work on data it has never seen. In a way, the ANNs are so good exactly
to work on data it has never seen. In a way, the NNs are so good exactly
because they can accurately adapt to the signals they receive at training time,
but in contrast to other learned representations, they're actually not very good
at extrapolation. So we can't expect an ANN to magically work with new inputs.
at extrapolation. So we can't expect an NN to magically work with new inputs.
Rather, we need to make sure that we can properly shape the input space,
e.g., by normalization and by focusing on invariants. In short, if you always train
your networks for inputs in the range $[0\dots1]$, don't expect them to work
with inputs of $[10\dots11]$. You might be able to subtract an offset of $10$ beforehand,
and re-apply it after evaluating the network.
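As a tiny, hypothetical illustration of that offset trick (both the network and the assumption that the underlying task is compatible with such a shift are made up for this example):

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(1, 16), nn.Tanh(), nn.Linear(16, 1))  # trained for inputs in [0, 1]

def eval_shifted(x, offset=10.0):
    # shift the out-of-range input back into the training range, evaluate the
    # network there, and re-apply the offset to the output; this only makes
    # sense if the learned mapping is actually invariant under such a shift
    return net(x - offset) + offset

y = eval_shifted(torch.tensor([[10.5]]))
```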
As a rule of thumb: always make sure you
actually train the ANN on the kinds of input you want to use at inference time.
actually train the NN on the kinds of input you want to use at inference time.
This is important to keep in mind during the next chapters: e.g., if we
want an ANN to work in conjunction with another solver or simulation environment,
want an NN to work in conjunction with another solver or simulation environment,
it's important to actually bring the solver into the training process, otherwise
the network might specialize on pre-computed data that differs from what is produced
when combining the ANN with the solver, i.e _distribution shift_.
when combining the NN with the solver, i.e., a _distribution shift_.
### Meshes and grids