minor tweaks in intro

This commit is contained in:
NT 2021-08-28 16:52:40 +02:00
parent e24d1b3ecc
commit 8b71d57e05
3 changed files with 25 additions and 25 deletions

View File

@ -13,9 +13,9 @@
"id": "original-brave",
"metadata": {},
"source": [
"Let's start with a very reduced example that highlights some of the key capabilities of physics-based learning approaches. Let's assume our physical model is an extremely simple equation: a parabola along +x\n",
"Let's start with a very reduced example that highlights some of the key capabilities of physics-based learning approaches. Let's assume our physical model is a very simple equation: a parabola along the positive x-axis.\n",
"\n",
"Despite being very simple, for every point along there are two solutions, i.e. we have two modes, one above the other one below the x axis, as shown on the left below. If we don't take care a conventional learning approach will give us an approximation like the red one shown on the middle, which is obviously completely off. With an improved learning setup, ideally by using a discretized numerical solver, we can at least accurately represent one of the modes of the solution (shown in green on the right).\n",
"Despite being very simple, for every point along there are two solutions, i.e. we have two modes, one above the other one below the x-axis, as shown on the left below. If we don't take care a conventional learning approach will give us an approximation like the red one shown in the middle, which is obviously completely off. With an improved learning setup, ideally, by using a discretized numerical solver, we can at least accurately represent one of the modes of the solution (shown in green on the right).\n",
"\n",
"```{figure} resources/intro-teaser-side-by-side.png\n",
"---\n",
@ -49,7 +49,7 @@
"\n",
"In contrast to this supervised approach, employing differentiable physics takes advantage of the fact that we can directly use a discretized version of the physical model $\\mathcal P$ and employ it to guide the training of $f$. I.e., we want $f$ to _interact_ with our _simulator_ $\\mathcal P$. This can vastly improve the learning, as we'll illustrate below with a very simple example (more complex ones will follow later on).\n",
"\n",
"Note that in order for the DP approach to work, $\\mathcal P$ has to differentiable, as implied by the name. These differentials, in the form of a gradient, are what's driving the learning process.\n"
"Note that in order for the DP approach to work, $\\mathcal P$ has to be differentiable, as implied by the name. These differentials, in the form of a gradient, are what's driving the learning process.\n"
]
},
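To make the contrast explicit before diving into the notebook cells: writing the parabola as $\mathcal P(y) = y^2$ with observations $x = \mathcal P(y)$ (a slight formalization of the setup above, not notation used verbatim in the notebook), a purely supervised loss compares the network output to pre-computed targets, whereas the differentiable-physics loss pushes the prediction back through $\mathcal P$:

$$
L_{\text{supervised}}(\theta) = \big|f(x;\theta) - y\big|^2, \qquad
L_{\text{DP}}(\theta) = \big|\mathcal P\big(f(x;\theta)\big) - x\big|^2 .
$$

In the second case, the derivative $\partial \mathcal P / \partial y$ enters the backpropagation pass, which is exactly why $\mathcal P$ has to be differentiable.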
{
@ -69,7 +69,7 @@
"\n",
"We know that possible solutions for $f$ are the positive or negative square root function (for completeness: piecewise combinations would also be possible).\n",
"Knowing that this is not overly difficult, it's an obvious idea to try training a neural network to approximate this inverse mapping $f$.\n",
"Doing this in the \"classical\" supervised manner, i.e. purely based on data, is an obvious starting point. After all, this approach was shown to be powerful tools for a variety of other applications, e.g., in computer vision."
"Doing this in the \"classical\" supervised manner, i.e. purely based on data, is an obvious starting point. After all, this approach was shown to be a powerful tool for a variety of other applications, e.g., in computer vision."
]
},
{
@ -89,7 +89,7 @@
"id": "numerous-emphasis",
"metadata": {},
"source": [
"For supervised training, we can employ our solver $\\mathcal P$ for the problem to pre-compute the solutions we need for training: We randomly choose between the positive and the negative square root. This resembles the general case, where we would gather all data available to us (e.g., using optimization techniques to compute the solutions). Such a data collection typically does not favor one particular mode from multimodal solutions."
"For supervised training, we can employ our solver $\\mathcal P$ for the problem to pre-compute the solutions we need for training: We randomly choose between the positive and the negative square root. This resembles the general case, where we would gather all data available to us (e.g., using optimization techniques to compute the solutions). Such data collection typically does not favor one particular mode from multimodal solutions."
]
},
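As a concrete illustration of this data-collection step, here is a minimal sketch of how such a training set could be generated; the variable names, sample count, and x-range are placeholders, not necessarily those used in the notebook:

```python
import numpy as np

N = 200
np.random.seed(42)

# sample locations along the positive x-axis
X = np.random.uniform(0.0, 1.0, size=(N, 1))

# randomly pick the positive or negative square root for each sample,
# so both modes of the solution appear in the data set
sign = np.random.choice([-1.0, 1.0], size=(N, 1))
Y = sign * np.sqrt(X)
```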
{
@ -185,7 +185,7 @@
"id": "governmental-mixture",
"metadata": {},
"source": [
"As both NN and data set are very small, the training converges very quickly. However, if we inspect the predictions of the network, we can see that it is nowhere near the solution we were hoping to find: it averages between the data points on both sides of the x-axis and therefore fails to find satisfying solutions to the problem above.\n",
"As both NN and the data set are very small, the training converges very quickly. However, if we inspect the predictions of the network, we can see that it is nowhere near the solution we were hoping to find: it averages between the data points on both sides of the x-axis and therefore fails to find satisfying solutions to the problem above.\n",
"\n",
"The following plot nicely highlights this: it shows the data in light gray, and the supervised solution in red. "
]
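This averaging behavior is no accident: for a fixed $x$, the mean-squared error $\mathbb{E}\big[|f(x) - y|^2\big]$ is minimized by the conditional mean $\mathbb{E}[y \mid x]$, and since the data contains $+\sqrt{x}$ and $-\sqrt{x}$ with equal probability, this mean is simply zero:

$$
f^{*}(x) = \mathbb{E}[y \mid x] = \tfrac{1}{2}\sqrt{x} + \tfrac{1}{2}\big(-\sqrt{x}\big) = 0 .
$$

Hence the red curve staying close to the x-axis is the best a purely supervised MSE training can do on this data set.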
@ -276,7 +276,7 @@
"source": [
"The loss function is the crucial point for training: we directly incorporate the function f into the loss. In this simple case, the `loss_dp` function simply computes the square of the prediction `y_pred`. \n",
"\n",
"Later on, a lot more could happen here: we could evaluate finite difference stencils on the predicted solution, or compute a whole implicit time-integration step of a solver. Here we have a simple _mean-squared error_ term of the form $|y_{\\text{pred}}^2 - y_{\\text{true}}|^2$, which we are minimizing during training. It's not necessary to make it so simple: the more knowledge and numerical methods we can incorporate, the better we can guide the training process."
"Later on, a lot more could happen here: we could evaluate finite-difference stencils on the predicted solution, or compute a whole implicit time-integration step of a solver. Here we have a simple _mean-squared error_ term of the form $|y_{\\text{pred}}^2 - y_{\\text{true}}|^2$, which we are minimizing during training. It's not necessary to make it so simple: the more knowledge and numerical methods we can incorporate, the better we can guide the training process."
]
},
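For reference, a minimal sketch of what such a loss function could look like in a Keras-style setup, assuming the targets passed to the loss are the parabola values, as in the $|y_{\text{pred}}^2 - y_{\text{true}}|^2$ term above; this is an illustrative version, not necessarily the notebook's exact code:

```python
import tensorflow as tf

def loss_dp(y_true, y_pred):
    # apply the discretized physical model P(y) = y^2 to the network prediction,
    # then penalize its squared deviation from the observed parabola value
    return tf.reduce_mean(tf.square(y_pred * y_pred - y_true))
```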
{
@ -371,11 +371,11 @@
"\n",
"What has happened here?\n",
"\n",
"- We've prevented an undesired averaging of multiple modes in the solution by evaluating our discrete model w.r.t. current prediction of the network, rather than using a pre-computed solution. This lets us find the best single mode near the network prediction, and prevents an averaging of the modes that exist in the solution manifold.\n",
"- We've prevented an undesired averaging of multiple modes in the solution by evaluating our discrete model w.r.t. current prediction of the network, rather than using a pre-computed solution. This lets us find the best mode near the network prediction, and prevents an averaging of the modes that exist in the solution manifold.\n",
"\n",
"- We're still only getting one side of the curve! This is to be expected, because we're representing the solutions with a deterministic function. Hence, we can only represent a single mode. Interestingly, whether it's the top or bottom mode is determined by the random initialization of the weights in $f$ - run the example a couple of time to see this effect in action. To capture multiple modes we'd need to extend the NN to capture the full distribution of the outputs and parametrize it with additional dimensions.\n",
"- We're still only getting one side of the curve! This is to be expected because we're representing the solutions with a deterministic function. Hence, we can only represent a single mode. Interestingly, whether it's the top or bottom mode is determined by the random initialization of the weights in $f$ - run the example a couple of times to see this effect in action. To capture multiple modes we'd need to extend the NN to capture the full distribution of the outputs and parametrize it with additional dimensions.\n",
"\n",
"- The region with $x$ near zero is typically still off in this example. The network essentially learns a linear approximation of one half of the parabola here. This is partially caused by the weak neural network: it is very small and shallow. In addition, the evenly spread of sample points along the x axis bias the NN towards the larger y values. These contribute more to the loss, and hence the network invests most of its resources to reduce the error in this region.\n"
"- The region with $x$ near zero is typically still off in this example. The network essentially learns a linear approximation of one half of the parabola here. This is partially caused by the weak neural network: it is very small and shallow. In addition, the evenly spread of sample points along the x-axis bias the NN towards the larger y values. These contribute more to the loss, and hence the network invests most of its resources to reduce the error in this region.\n"
]
},
{
@ -385,9 +385,9 @@
"source": [
"## Discussion\n",
"\n",
"It's a very simple example, but it very clearly shows a failure case for supervised learning. While it might seem very artificial on first sight, many practical PDEs exhibit a variety of these modes, and it's often not clear where (and how many) exist in the solution manifold we're interested in. Using supervised learning is very dangerous in such cases - we might simply and unknowingly _blur_ out these different modes.\n",
"It's a very simple example, but it very clearly shows a failure case for supervised learning. While it might seem very artificial at first sight, many practical PDEs exhibit a variety of these modes, and it's often not clear where (and how many) exist in the solution manifold we're interested in. Using supervised learning is very dangerous in such cases - we might simply and unknowingly _blur_ out these different modes.\n",
"\n",
"Good and obvious example are bifurcations in fluid flows - the smoke rising above a candle will start out straight, and then, due to tiny perturbations in its motion, start oscillating in a random direction. The images below illustrate this case via _numerical perturbations_: the perfectly symmetric setup will start turning left or right, depending on how the approximation errors build up. Similarly, we'll have different modes in all our numerical solutions, and typically it's important to recover them, rather than averaging them out. Hence, we'll show how to leverage training via _differentiable physics_ in the following chapters for more practical and complex cases.\n",
"A good and obvious example are bifurcations in fluid flows. Smoke rising above a candle will start out straight, and then, due to tiny perturbations in its motion, start oscillating in a random direction. The images below illustrate this case via _numerical perturbations_: the perfectly symmetric setup will start turning left or right, depending on how the approximation errors build up. Similarly, we'll have different modes in all our numerical solutions, and typically it's important to recover them, rather than averaging them out. Hence, we'll show how to leverage training via _differentiable physics_ in the following chapters for more practical and complex cases.\n",
"\n",
"```{figure} resources/intro-fluid-bifurcation.jpg\n",
"---\n",

View File

@ -4,12 +4,12 @@ Welcome ...
Welcome to the _Physics-based Deep Learning Book_ 👋
**TL;DR**:
This document targets a practical and comprehensive introduction to the latest concepts
for combining physical simulations with deep learning.
As much as possible, the algorithms will come with hands-on code examples to quickly get started.
This document targets a practical and comprehensive introduction to everything
related to deep learning in the context of physical simulations.
As much as possible, all topics come with hands-on code examples in the form of Jupyter notebooks to quickly get started.
Beyond standard _supervised_ learning from data, we'll look at _physical loss_ constraints,
more tightly coupled learning algorithms with _differentiable simulations_, as well as extensions such
as reinforcement learning and uncertainty modeling.
more tightly coupled learning algorithms with _differentiable simulations_, as well as
reinforcement learning and uncertainty modeling.
We live in exciting times: these methods have a huge potential to fundamentally change what we can achieve
with simulations.
@ -24,7 +24,7 @@ Some visual examples of numerically simulated time sequences. In this book, we e
## Coming up
As a _sneak preview_, in the next chapters will show:
As a _sneak preview_, the next chapters will show:
- How to train networks to infer a fluid flow around shapes like airfoils, and estimate the uncertainty of the prediction. This gives a _surrogate model_ that replaces a traditional numerical simulation.
@ -32,11 +32,11 @@ As a _sneak preview_, in the next chapters will show:
- How to more tightly interact with a full simulator for _inverse problems_. E.g., we'll demonstrate how to circumvent the convergence problems of standard reinforcement learning techniques by leveraging simulators in the training loop.
Over the course of the next
chapters we will introduce different approaches for introducing physical models
Throughout this text,
we will describe different approaches for introducing physical models
into deep learning, i.e., _physics-based deep learning_ (PBDL) approaches.
These algorithmic variants will be introduced in order of increasing
tightness of the integration, and pros and cons of the different approaches
tightness of the integration, and the pros and cons of the different approaches
will be discussed. It's important to know in which scenarios each of the
different techniques is particularly useful.
@ -52,11 +52,11 @@ and we're eager to improve it. Thanks in advance 😀! Btw., we also maintain a
```{admonition} Executable code, right here, right now
:class: tip
We focus on jupyter notebooks, a key advantage of which is that all code examples
We focus on Jupyter notebooks, a key advantage of which is that all code examples
can be executed _on the spot_, from your browser. You can modify things and
immediately see what happens -- give it a try...
<br><br>
Plus, jupyter notebooks are great because they're a form of [literate programming](https://en.wikipedia.org/wiki/Literate_programming).
Plus, Jupyter notebooks are great because they're a form of [literate programming](https://en.wikipedia.org/wiki/Literate_programming).
```
@ -91,7 +91,7 @@ If you find this book useful, please cite it via:
@book{thuerey2021pbdl,
title={Physics-based Deep Learning},
author={Nils Thuerey and Philipp Holl and Maximilian Mueller and Patrick Schnell and Felix Trost and Kiwon Um},
url={http://physicsbaseddeeplearning.org},
url={https://physicsbaseddeeplearning.org},
year={2021},
publisher={WWW}
}

View File

@ -19,7 +19,7 @@ In its simplest form, the learning goal for reinforcement learning tasks can be
$$
\text{arg max}_{\theta} \mathbb{E}_{a \sim \pi(a;s,\theta_p)} \big[ \sum_t r_t \big],
$$ (learn-l2)
$$ (rl-learn-l2)
where the reward at time $t$ (denoted by $r_t$ above) is the result of an action $a$ performed by an agent.
The agents choose their actions via a neural network policy, which makes its decisions based on a set of given observations.