fixing typos, unifying nomenclature
parent 73ec4d1155
commit e0dcf28064
@@ -17,19 +17,19 @@
 "\n",
 "We'll use a Navier-Stokes model with velocity $\\mathbf{u}$, no explicit viscosity term, and a smoke marker density $s$ that drives a simple Boussinesq buoyancy term $\\eta d$ adding a force along the y dimension. For the velocity this gives:\n",
 "\n",
-"$\\begin{aligned}\n",
+"$$\\begin{aligned}\n",
 " \\frac{\\partial u_x}{\\partial{t}} + \\mathbf{u} \\cdot \\nabla u_x &= - \\frac{1}{\\rho} \\nabla p \n",
 " \\\\\n",
 " \\frac{\\partial u_y}{\\partial{t}} + \\mathbf{u} \\cdot \\nabla u_y &= - \\frac{1}{\\rho} \\nabla p + \\eta d\n",
 " \\\\\n",
 " \\text{s.t.} \\quad \\nabla \\cdot \\mathbf{u} &= 0,\n",
-"\\end{aligned}$\n",
+"\\end{aligned}$$\n",
 "\n",
 "With an additional transport equation for the passively advected marker density $s$:\n",
 "\n",
-"$\\begin{aligned}\n",
+"$$\\begin{aligned}\n",
 " \\frac{\\partial s}{\\partial{t}} + \\mathbf{u} \\cdot \\nabla s &= 0 \n",
-"\\end{aligned}$\n",
+"\\end{aligned}$$\n",
 "\n"
 ]
 },
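The cell above only states the model equations; as a rough sketch of what one explicit time step for them could look like in code, here is a plain-NumPy version with first-order upwind advection on a periodic grid. The function names, the grid layout (axis 0 = y, axis 1 = x), the parameter values, and the omitted pressure projection are all assumptions for illustration, not the notebook's actual solver.

```python
import numpy as np

def advect_upwind(f, ux, uy, dx, dt):
    """First-order upwind step for  df/dt + u . grad(f) = 0  on a periodic grid."""
    dfdx_b = (f - np.roll(f, 1, axis=1)) / dx    # backward difference in x
    dfdx_f = (np.roll(f, -1, axis=1) - f) / dx   # forward difference in x
    dfdy_b = (f - np.roll(f, 1, axis=0)) / dx
    dfdy_f = (np.roll(f, -1, axis=0) - f) / dx
    dfdx = np.where(ux > 0, dfdx_b, dfdx_f)      # pick the upwind side per cell
    dfdy = np.where(uy > 0, dfdy_b, dfdy_f)
    return f - dt * (ux * dfdx + uy * dfdy)

def step(s, ux, uy, dx=1.0, dt=0.1, eta=0.5):
    uy = uy + dt * eta * s                        # Boussinesq buoyancy force along y
    s_n  = advect_upwind(s,  ux, uy, dx, dt)      # passive transport of the marker s
    ux_n = advect_upwind(ux, ux, uy, dx, dt)      # self-advection of the velocity
    uy_n = advect_upwind(uy, ux, uy, dx, dt)
    # a pressure projection enforcing div(u) = 0 would follow here (omitted)
    return s_n, ux_n, uy_n
```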
@@ -45,7 +45,7 @@
 "With the notation from {doc}`overview-equations` the inverse problem outlined above can be formulated as a minimization problem \n",
 "\n",
 "$$\n",
-"\\text{arg min}_{\\mathbf{u}_{0}} \\sum_i |f(x_{t_e,i} ; \\mathbf{u}_{0} )-y^*_{t_e,i}|^2 ,\n",
+"\\text{arg min}_{\\mathbf{u}_{0}} \\sum_i \\big( f(x_{t_e,i} ; \\mathbf{u}_{0} )-y^*_{t_e,i} \\big)^2 ,\n",
 "$$\n",
 "\n",
 "where $y^*_{t_e,i}$ are samples of the reference solution at a targeted time $t_e$,\n",
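A hedged sketch of this minimization in code: gradient descent on the initial state $\mathbf{u}_{0}$, assuming a differentiable simulator `simulate(u0)` (e.g. written with PyTorch ops) that returns the sampled values $f(x_{t_e,i} ; \mathbf{u}_{0})$. All names and hyperparameters here are illustrative placeholders, not the notebook's implementation.

```python
import torch

def optimize_initial_state(simulate, y_target, u0_init, lr=0.1, steps=100):
    u0 = u0_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([u0], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((simulate(u0) - y_target) ** 2).sum()   # the arg-min objective above
        loss.backward()          # gradient w.r.t. u0 flows back through the simulator
        opt.step()
    return u0.detach()
```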
@@ -89,7 +89,7 @@
 "$$\n",
 "\\newcommand{\\corr}{\\mathcal{C}} \n",
 "\\newcommand{\\vr}[1]{\\mathbf{r}_{#1}} \n",
-"\\text{argmin}_\\theta | ( \\mathcal{P}_{s} \\corr )^n ( \\mathcal{T} \\vr{t} ) - \\mathcal{T} \\vr{t+n}|^2\n",
+"\\text{arg min}_\\theta \\big( ( \\mathcal{P}_{s} \\corr )^n ( \\mathcal{T} \\vr{t} ) - \\mathcal{T} \\vr{t+n} \\big)^2\n",
 "$$\n",
 "\n",
 "To simplify the notation, we've dropped the sum over different samples here (the $i$ from previous versions).\n",
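One possible reading of this objective as code, with placeholder callables `source_step` for $\mathcal{P}_{s}$, `corrector` for the trained network $\mathcal{C}$, and `to_source` for the transfer operator $\mathcal{T}$; an illustrative sketch only, not the notebook's implementation.

```python
import torch

def unrolled_loss(source_step, corrector, to_source, ref_t, ref_t_plus_n, n):
    state = to_source(ref_t)                      # T r_t: reference mapped to the source space
    for _ in range(n):
        state = corrector(source_step(state))     # one combined solver + correction step
    return ((state - to_source(ref_t_plus_n)) ** 2).sum()   # compare to T r_{t+n}
```

Backpropagating this loss through the $n$ unrolled steps is what requires the source solver to be differentiable.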
@@ -43,11 +43,11 @@
 "\n",
 "Let's illustrate the properties of deep learning via DP with the following example: We'd like to find an unknown function $f^*$ that generates solutions from a space $Y$, taking inputs from $X$, i.e. $f^*: X \\to Y$. In the following, we'll often denote _idealized_, and unknown functions with a $*$ superscript, in contrast to their discretized, realizable counterparts without this superscript. \n",
 "\n",
-"Let's additionally assume we have a generic differential equation $\\mathcal P^*: Y \\to Z$ (our _model_ equation), that encodes a property of the solutions, e.g. some real world behavior we'd like to match. Later on, $P^*$ will represent time evolutions, but it could also be a constraint for conservation of mass (then $\\mathcal P^*$ would measure divergence). But to keep things as simple as possible here, the model we'll look at in the following is a mapping back to the input space $X$, i.e. $\\mathcal P^*: Y \\to X$.\n",
+"Let's additionally assume we have a generic differential equation $\\mathcal P^*: Y \\to Z$ (our _model_ equation), that encodes a property of the solutions, e.g. some real world behavior we'd like to match. Later on, $P^*$ will often represent time evolutions, but it could also be a constraint for conservation of mass (then $\\mathcal P^*$ would measure divergence). But to keep things as simple as possible here, the model we'll look at in the following is a mapping back to the input space $X$, i.e. $\\mathcal P^*: Y \\to X$.\n",
 "\n",
 "Using a neural network $f$ to learn the unknown and ideal function $f^*$, we could turn to classic _supervised_ training to obtain $f$ by collecting data. This classical setup requires a dataset by sampling $x$ from $X$ and adding the corresponding solutions $y$ from $Y$. We could obtain these, e.g., by classical numerical techniques. Then we train the NN $f$ in the usual way using this dataset. \n",
 "\n",
-"In contrast to this supervised approach, employing differentiable physics takes advantage of the fact that we can directly use a discretized version of the physical model $\\mathcal P$ and employ it to guide the training of $f$. I.e., we want $f$ to _interact_ with our _simulator_ $\\mathcal P$. This can vastly improve the learning, as we'll illustrate below with a very simple example (more complex ones will follow later on).\n",
+"In contrast to this supervised approach, employing a differentiable physics approach takes advantage of the fact that we can often use a discretized version of the physical model $\\mathcal P$ and employ it to guide the training of $f$. I.e., we want $f$ to be aware of our _simulator_ $\\mathcal P$, and to _interact_ with it. This can vastly improve the learning, as we'll illustrate below with a very simple example (more complex ones will follow later on).\n",
 "\n",
 "Note that in order for the DP approach to work, $\\mathcal P$ has to be differentiable, as implied by the name. These differentials, in the form of a gradient, are what's driving the learning process.\n"
 ]
@@ -65,7 +65,7 @@
 "id": "latest-amino",
 "metadata": {},
 "source": [
-"To illustrate these two approaches, we consider the following simplified setting: Given the function $\\mathcal P: y\\to y^2$ for $y$ in the intverval $[0,1]$, find the unknown function $f$ such that $\\mathcal P(f(x)) = x$ for all $x$ in $[0,1]$. Note: to make things a bit more interesting, we're using $y^2$ here instead of the more common $x^2$ parabola, and the _discretization_ is simply given by representing the $x$ and $y$ via floating point numbers in the computer for this simple case.\n",
+"To illustrate the difference of supervised and DP approaches, we consider the following simplified setting: Given the function $\\mathcal P: y\\to y^2$ for $y$ in the interval $[0,1]$, find the unknown function $f$ such that $\\mathcal P(f(x)) = x$ for all $x$ in $[0,1]$. Note: to make things a bit more interesting, we're using $y^2$ here for $\\mathcal P$ instead of the more common $x^2$ parabola, and the _discretization_ is simply given by representing the $x$ and $y$ via floating point numbers in the computer for this simple case.\n",
 "\n",
 "We know that possible solutions for $f$ are the positive or negative square root function (for completeness: piecewise combinations would also be possible).\n",
 "Knowing that this is not overly difficult, a solution that suggests itself is to train a neural network to approximate this inverse mapping $f$.\n",
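For reference, a compact sketch of both trainings for this setup with a small PyTorch MLP; the architecture, data generation and hyperparameters are illustrative and may differ from the notebook.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 32), nn.Tanh(), nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

x = torch.rand(200, 1)                                     # inputs in [0,1]
sign = torch.randint(0, 2, (200, 1)).float() * 2.0 - 1.0   # randomly pick a square-root branch
y_data = sign * torch.sqrt(x)                              # supervised data mixes both branches

for it in range(2000):
    opt.zero_grad()
    loss_supervised = ((net(x) - y_data) ** 2).mean()      # match the ambiguous data directly
    loss_dp = ((net(x) ** 2 - x) ** 2).mean()              # push P(f(x)) = f(x)^2 towards x
    loss_dp.backward()                                     # backprop loss_supervised instead to see the averaging failure
    opt.step()
```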
@@ -385,9 +385,11 @@
 "source": [
 "## Discussion\n",
 "\n",
-"It's a very simple example, but it very clearly shows a failure case for supervised learning. While it might seem very artificial at first sight, many practical PDEs exhibit a variety of these modes, and it's often not clear where (and how many) exist in the solution manifold we're interested in. Using supervised learning is very dangerous in such cases - we might simply and unknowingly _blur_ out these different modes.\n",
+"It's a very simple example, but it very clearly shows a failure case for supervised learning. While it might seem very artificial at first sight, many practical PDEs exhibit a variety of these modes, and it's often not clear where (and how many) exist in the solution manifold we're interested in. Using supervised learning is very dangerous in such cases. We might unknowingly get an average of these different modes.\n",
 "\n",
-"Good and obvious examples are bifurcations in fluid flow. Smoke rising above a candle will start out straight, and then, due to tiny perturbations in its motion, start oscillating in a random direction. The images below illustrate this case via _numerical perturbations_: the perfectly symmetric setup will start turning left or right, depending on how the approximation errors build up. Similarly, we'll have different modes in all our numerical solutions, and typically it's important to recover them, rather than averaging them out. Hence, we'll show how to leverage training via _differentiable physics_ in the following chapters for more practical and complex cases.\n",
+"Good and obvious examples are bifurcations in fluid flow. Smoke rising above a candle will start out straight, and then, due to tiny perturbations in its motion, start oscillating in a random direction. The images below illustrate this case via _numerical perturbations_: the perfectly symmetric setup will start turning left or right, depending on how the approximation errors build up. Averaging the two modes would give an unphysical, straight flow similar to the parabola example above.\n",
+"\n",
+"Similarly, we have different modes in many numerical solutions, and typically it's important to recover them, rather than averaging them out. Hence, we'll show how to leverage training via _differentiable physics_ in the following chapters for more practical and complex cases.\n",
 "\n",
 "```{figure} resources/intro-fluid-bifurcation.jpg\n",
 "---\n",
intro.md
@@ -7,17 +7,18 @@ name: pbdl-logo-large
 ---
 ```
 
-Welcome to the _Physics-based Deep Learning Book_ 👋
+Welcome to the _Physics-based Deep Learning Book_ (v0.1) 👋
 
 **TL;DR**:
-This document targets a practical and comprehensive introduction of everything
+This document contains a practical and comprehensive introduction of everything
 related to deep learning in the context of physical simulations.
-As much as possible, all topics come with hands-on code examples in the form of Jupyter notebooks to quickly get started.
+As much as possible, all topics come with hands-on code examples in the
+form of Jupyter notebooks to quickly get started.
 Beyond standard _supervised_ learning from data, we'll look at _physical loss_ constraints,
 more tightly coupled learning algorithms with _differentiable simulations_, as well as
 reinforcement learning and uncertainty modeling.
-We live in exciting times: these methods have a huge potential to fundamentally change what we can achieve
-with simulations.
+We live in exciting times: these methods have a huge potential to fundamentally
+change what computer simulations can achieve.
 
 ---
 
File diff suppressed because one or more lines are too long
@@ -27,7 +27,7 @@ $$
 $$ (learn-l2)
 
 We typically optimize, i.e. _train_,
-with a stochastic gradient descent (SGD) optimizer of your choice, e.g. Adam {cite}`kingma2014adam`.
+with a stochastic gradient descent (SGD) optimizer of choice, e.g. Adam {cite}`kingma2014adam`.
 We'll rely on auto-diff to compute the gradient w.r.t. weights, $\partial f / \partial \theta$,
 We will also assume that $e$ denotes a _scalar_ error function (also
 called cost, or objective function).
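As a minimal illustration of such a training step (PyTorch used for the auto-diff; the model, data and learning rate are placeholders, not code from the book):

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 1)                                 # stands in for f(x; theta)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

x, y_star = torch.randn(16, 4), torch.randn(16, 1)      # one mini-batch of (input, target) pairs
opt.zero_grad()
e = ((model(x) - y_star) ** 2).sum()                    # scalar loss e
e.backward()                                            # auto-diff: gradients w.r.t. the weights theta
opt.step()                                              # one SGD/Adam update
```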
@@ -39,7 +39,7 @@ introduce scalar loss, always(!) scalar... (also called *cost* or *objective* f
 For training we distinguish: the **training** data set drawn from some distribution,
 the **validation** set (from the same distribution, but different data),
 and **test** data sets with _some_ different distribution than the training one.
-The latter distinction is important! For the test set we want
+The latter distinction is important. For the test set we want
 _out of distribution_ (OOD) data to check how well our trained model generalizes.
 Note that this gives a huge range of possibilities for the test data set:
 from tiny changes that will certainly work,
@@ -131,7 +131,8 @@ and the abbreviations used in: {doc}`notation`.
 We solve a discretized PDE $\mathcal{P}$ by performing steps of size $\Delta t$.
 The solution can be expressed as a function of $\mathbf{u}$ and its derivatives:
 $\mathbf{u}(\mathbf{x},t+\Delta t) =
-\mathcal{P}(\mathbf{u}(\mathbf{x},t), \mathbf{u}(\mathbf{x},t)',\mathbf{u}(\mathbf{x},t)'',...)$.
+\mathcal{P}( \mathbf{u}_{x}, \mathbf{u}_{xx}, ... \mathbf{u}_{xx...x} )$, where
+$\mathbf{u}_{x}$ denotes the spatial derivatives $\partial \mathbf{u}(\mathbf{x},t) / \partial \mathbf{x}$.
 
 For all PDEs, we will assume non-dimensional parametrizations as outlined below,
 which could be re-scaled to real world quantities with suitable scaling factors.
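As one concrete, deliberately simple instance of this notation (an illustrative choice, not an equation from the text): for the diffusion equation $\partial \mathbf{u} / \partial t = \nu \, \mathbf{u}_{xx}$, an explicit Euler step of size $\Delta t$ reads

$$
\mathbf{u}(\mathbf{x},t+\Delta t) = \mathcal{P}( \mathbf{u}_{xx} ) = \mathbf{u}(\mathbf{x},t) + \Delta t \, \nu \, \mathbf{u}_{xx} .
$$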
File diff suppressed because one or more lines are too long
overview.md
@@ -66,12 +66,13 @@ as the Navier-Stokes, Maxwell's, or Schroedinger's equations.
 Seemingly trivial changes to the discretization can determine
 whether key phenomena are visible in the solutions or not.
 Rather than discarding the powerful methods that have been
-developed in the field of numerical mathematics, it
-is highly beneficial for DL to use them as much as possible.
+developed in the field of numerical mathematics, this book will
+show that it is highly beneficial to use them as much as possible
+when applying DL.
 
 ### Black boxes and magic?
 
-People who are unfamiliear with DL methods often associate neural networks
+People who are unfamiliar with DL methods often associate neural networks
 with _black boxes_, and see the training processes as something that is beyond the grasp
 of human understanding. However, these viewpoints typically stem from
 relying on hearsay and not dealing with the topic enough.
@@ -81,10 +82,10 @@ and "all the gritty details" are not yet fully worked out. However, this is pret
 for scientific advances.
 Numerical methods themselves are a good example. Around 1950, numerical approximations
 and solvers had a tough standing. E.g., to cite H. Goldstine,
-numerical instabilies were considered to be a "constant source of
+numerical instabilities were considered to be a "constant source of
 anxiety in the future" {cite}`goldstine1990history`.
 By now we have a pretty good grasp of these instabilities, and numerical methods
-are ubiquitous, and well established.
+are ubiquitous and well established.
 
 Thus, it is important to be aware of the fact that -- in a way -- there is nothing
 magical or otherworldly to deep learning methods. They're simply another set of
@@ -142,8 +143,8 @@ the most crucial differentiation for the following topics lies in the
 nature of the integration between DL techniques
 and the domain knowledge, typically in the form of model equations
 via partial differential equations (PDEs).
-Taking a global perspective, the following three categories can be
-identified to categorize _physics-based deep learning_ (PBDL)
+The following three categories can be
+identified to roughly categorize _physics-based deep learning_ (PBDL)
 techniques:
 
 - _Supervised_: the data is produced by a physical system (real or simulated),
@@ -162,12 +163,11 @@ techniques:
 temporal evolutions, where they can yield an estimate of the future behavior of the
 dynamics.
 
-Thus, methods can be roughly categorized in terms of forward versus inverse
+Thus, methods can be categorized in terms of forward versus inverse
 solve, and how tightly the physical model is integrated into the
 optimization loop that trains the deep neural network. Here, especially
-the interleaved approaches
-that leverage _differentiable physics_ allow for very tight integration
-of deep learning and numerical simulation methods.
+interleaved approaches that leverage _differentiable physics_ allow for
+very tight integration of deep learning and numerical simulation methods.
 
 
 ## Looking ahead
@@ -176,8 +176,8 @@ _Physical simulations_ are a huge field, and we won't be able to cover all possi
 
 ```{note} Rather, the focus of this book lies on:
 - _Field-based simulations_ (no Lagrangian methods)
-- Combinations with _deep learning_ (plenty of other interesting ML techniques, but not here)
-- Experiments as _outlook_ (i.e., replace synthetic data with real-world observations)
+- Combinations with _deep learning_ (plenty of other interesting ML techniques exist, but won't be discussed here)
+- Experiments are left as an _outlook_ (i.e., replacing synthetic data with real-world observations)
 ```
 
 It's also worth noting that we're starting to build the methods from some very
@@ -33,7 +33,7 @@
 "In terms of the $x,y^*$ notation from {doc}`overview-equations` and the previous section, this reconstruction problem means we are solving\n",
 "\n",
 "$$\n",
-"\\text{arg min}_{\\theta} \\sum_i |f(x_i ; \\theta)-y^*_i|^2 + R(x_i) ,\n",
+"\\text{arg min}_{\\theta} \\sum_i ( f(x_i ; \\theta)-y^*_i )^2 + R(x_i) ,\n",
 "$$\n",
 "\n",
 "where $x$ and $y^*$ are solutions of $u$ at different locations in space and time. As we're dealing with a 1D velocity, $x,y^* \\in \\mathbb{R}$.\n",
@@ -3,7 +3,7 @@ Physical Loss Terms
 
 The supervised setting of the previous sections can quickly
 yield approximate solutions with a fairly simple training process. However, what's
-quite sad to see here is that we only use physical models and numerics
+quite sad to see here is that we only use physical models and numerical methods
 as an "external" tool to produce a big pile of data 😢.
 
 We as humans have a lot of knowledge about how to describe physical processes
@@ -29,16 +29,17 @@
 $$
 
 where the $_{\mathbf{x}}$ subscripts denote spatial derivatives with respect to one of the spatial dimensions
-of higher and higher order (this can of course also include mixed derivatives with respect to different axes).
+of higher and higher order (this can of course also include mixed derivatives with respect to different axes). $\mathbf{u}_t$ denotes the changes over time.
 
-In this context, we can approximate the unknown u itself with a neural network. If the approximation, which we call $\tilde{\mathbf{u}}$, is accurate, the PDE should be satisfied naturally. In other words, the residual R should be equal to zero:
+In this context, we can approximate the unknown $\mathbf{u}$ itself with a neural network. If the approximation, which we call $\tilde{\mathbf{u}}$, is accurate, the PDE should be satisfied naturally. In other words, the residual R should be equal to zero:
 
 $$
 R = \mathbf{u}_t - \mathcal F ( \mathbf{u}_{x}, \mathbf{u}_{xx}, ... \mathbf{u}_{xx...x} ) = 0 .
 $$
 
-This nicely integrates with the objective for training a neural network: similar to before
-we can collect sample solutions
+This nicely integrates with the objective for training a neural network: we can train for
+minimizing this residual in combination with direct loss terms.
+Similar to before, we can make use of sample solutions
 $[x_0,y_0], ...[x_n,y_n]$ for $\mathbf{u}$ with $\mathbf{u}(\mathbf{x})=y$.
 This is typically important, as most practical PDEs we encounter do not have unique solutions
 unless initial and boundary conditions are specified. Hence, if we only consider $R$ we might
@@ -47,7 +48,7 @@ therefore help to _pin down_ the solution in certain places.
 Now our training objective becomes
 
 $$
-\text{arg min}_{\theta} \ \alpha_0 \sum_i (f(x_i ; \theta)-y_i)^2 + \alpha_1 R(x_i) ,
+\text{arg min}_{\theta} \ \alpha_0 \sum_i \big( f(x_i ; \theta)-y_i \big)^2 + \alpha_1 R(x_i) ,
 $$ (physloss-training)
 
 where $\alpha_{0,1}$ denote hyperparameters that scale the contribution of the supervised term and
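A hedged code sketch of this combined objective for a 1D case, using PyTorch autograd for the derivatives. The network `net`, the right-hand side `F` standing in for $\mathcal F$, and the sample/collocation points are placeholders; in this sketch the residual is squared and summed over collocation points, as is common in practice.

```python
import torch

def pde_residual(net, F, xt):
    """R = u_t - F(u_x, u_xx), evaluated at points xt with columns (x, t)."""
    xt = xt.clone().requires_grad_(True)
    u = net(xt)
    du = torch.autograd.grad(u.sum(), xt, create_graph=True)[0]
    u_x, u_t = du[:, 0:1], du[:, 1:2]
    u_xx = torch.autograd.grad(u_x.sum(), xt, create_graph=True)[0][:, 0:1]
    return u_t - F(u_x, u_xx)

def training_loss(net, F, xt_data, y_data, xt_colloc, a0=1.0, a1=1.0):
    supervised = ((net(xt_data) - y_data) ** 2).sum()        # pins the solution to known samples
    physics = (pde_residual(net, F, xt_colloc) ** 2).sum()   # penalizes the PDE residual R
    return a0 * supervised + a1 * physics
```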
@@ -36,7 +36,7 @@
 "With the supervised formulation from {doc}`supervised`, our learning task is pretty straight-forward, and can be written as \n",
 "\n",
 "$$\\begin{aligned}\n",
-"\\text{arg min}_{\\theta} \\sum_i |f(x_i ; \\theta)-y^*_i|^2 ,\n",
+"\\text{arg min}_{\\theta} \\sum_i ( f(x_i ; \\theta)-y^*_i )^2 ,\n",
 "\\end{aligned}$$\n",
 "\n",
 "where $x$ and $y^*$ each consist of a set of physical fields,\n",
@@ -92,8 +92,8 @@ models at training time later on, the NNs just adjust their weights to represent
 they receive, and reproduce it.
 
 Due to the hype and numerous success stories, people not familiar with DL often have
-the impression that DL works like a human mind, and is able to detect fundamental
-and general principles in data sets (["messages from god"](https://dilbert.com/strip/2000-01-03) anyone?).
+the impression that DL works like a human mind, and is able to extract fundamental
+and general principles from data sets (["messages from god"](https://dilbert.com/strip/2000-01-03) anyone?).
 That's not what happens with the current state of the art. Nonetheless, it's
 the most powerful tool we have to approximate complex, non-linear functions.
 It is a great tool, but it's important to keep in mind, that once we set up the training
@@ -119,8 +119,8 @@ As a rule of thumb: make sure you actually train the NN on the
 inputs that are as similar as possible to those you want to use at inference time.
 
 This is important to keep in mind during the next chapters: e.g., if we
-want an NN to work in conjunction with another solver or simulation environment,
-it's important to actually bring the solver into the training process, otherwise
+want an NN to work in conjunction with a certain simulation environment,
+it's important to actually include the simulator in the training process. Otherwise,
 the network might specialize on pre-computed data that differs from what is produced
 when combining the NN with the solver, i.e it will suffer from _distribution shift_.
 
|
@ -78,5 +78,5 @@ is a very attractive and interesting direction.
|
||||
|
||||
## Show me some code!
|
||||
|
||||
Let's directly look at an example for this: we'll replace a full solver for
|
||||
_turbulent flows around airfoils_ with a surrogate model from {cite}`thuerey2020dfp`.
|
||||
Let's finally look at a code example that trains a neural network:
|
||||
we'll replace a full solver for _turbulent flows around airfoils_ with a surrogate model from {cite}`thuerey2020dfp`.
|
||||
|