last round of updates from Maxi, minor fixes

This commit is contained in:
NT 2021-07-23 12:55:23 +02:00
parent 6b938e735f
commit 9cea132903
7 changed files with 35 additions and 34 deletions

View File

@ -45,7 +45,6 @@ parts:
- file: others-timeseries.md
- file: others-GANs.md
- file: others-lagrangian.md
- file: others-metrics.md
- caption: End Matter
chapters:
- file: outlook.md

View File

@ -48,6 +48,7 @@ for fnOut in fileList:
# remove TF / pytorch warnings
re1 = re.compile(r"WARNING:tensorflow:")
re2 = re.compile(r"UserWarning:")
re4 = re.compile(r"DeprecationWarning:")
# shorten data line: "0.008612174447657694, 0.02584669669548606, 0.043136357266407785"
re3 = re.compile(r"\[0.008612174447657694, 0.02584669669548606, 0.043136357266407785.+\]" )
@ -91,6 +92,7 @@ for fnOut in fileList:
nums = []
nums.append( re1.search( d[t][i]["outputs"][j]["text"][k] ) )
nums.append( re2.search( d[t][i]["outputs"][j]["text"][k] ) )
nums.append( re4.search( d[t][i]["outputs"][j]["text"][k] ) )
if (nums[0] is None) and (nums[1] is None) and (nums[2] is None):
okay = okay+1
else: # delete line "dell"
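
For context, the filtering idea of this script can be written as a self-contained sketch. The `cells`/`outputs`/`text` layout below follows the standard `.ipynb` JSON format; `clean_notebook` and `warning_res` are hypothetical names for illustration, not identifiers from the repository:

```python
import json
import re

# the same warning patterns as in the script above
warning_res = [
    re.compile(r"WARNING:tensorflow:"),
    re.compile(r"UserWarning:"),
    re.compile(r"DeprecationWarning:"),
]

def clean_notebook(fn):
    # load the notebook as plain JSON
    with open(fn) as f:
        nb = json.load(f)
    for cell in nb.get("cells", []):
        for out in cell.get("outputs", []):
            if "text" in out:
                # keep only output lines matching none of the warning patterns
                out["text"] = [line for line in out["text"]
                               if not any(r.search(line) for r in warning_res)]
    with open(fn, "w") as f:
        json.dump(nb, f, indent=1)
```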

View File

@ -74,7 +74,7 @@ $$
% | f_d( f_e(\mathbf{s};\theta_e) ;\theta_d) - \mathbf{s} |_2^2
which, as outlined above, is a standard binary cross-entropy training for the class of real samples
$\mathbf{y}$, and the generated ones $G(\mathbf{z}$. With the formulation above, the discriminator
$\mathbf{y}$, and the generated ones $G(\mathbf{z})$. With the formulation above, the discriminator
is trained to maximize the loss by producing an output of 1 for the real samples, and 0 for the generated ones.
The key for the generator loss is to employ the discriminator and produce samples that are classified as
@ -119,8 +119,8 @@ of possible high-resolution solutions that would fit the low-res input.
If a data set contains multiple such cases, and we employ supervised training,
the network will reliably learn the mean. This averaged solution usually is one
that is clearly undesirable, and unlike any of the individual solutions from which it was
computed. This situation is sometime also called _multi-modal_, i.e. the different solutions
can be seen as modes of the data. For fluids, this can, e.g., happen when
computed. This is the _multi-modality_ problem, i.e. different modes existing as
equally valid solutions to a problem. For fluids, this can, e.g., happen when
we're facing bifurcations, as discussed in {doc}`intro-teaser`.
The following image shows a clear example of how well GANs can circumvent
@ -146,7 +146,7 @@ The following example compares the time derivatives of different solutions:
---
name: GANs-tempoGAN-fig4
---
F.l.t.r., time derivatives for: a spatial GAN (i.e. not time aware), a temporally supervised learning, a spatio-temporal GAN, and a reference solution.
From left to right, time derivatives for: a spatial GAN (i.e. not time aware), a temporally supervised learning, a spatio-temporal GAN, and a reference solution.
```
As can be seen, the GAN trained with spatio-temporal self-supervision (second from right) closely matches the reference solution on the far right. In this case the discriminator receives reference solutions over time (in the form of triplets), such that it can learn to judge whether the temporal evolution of a generated solution matches that of the reference.
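
To make the roles of the two networks concrete, here is a minimal sketch of the binary cross-entropy losses described above, written with PyTorch; `disc`, `y`, and `g_z` are placeholder names, and the architectures and training loop are omitted:

```python
import torch
import torch.nn.functional as F

def d_loss(disc, y, g_z):
    """Discriminator: classify real samples y as 1, generated ones G(z) as 0."""
    logits_real = disc(y)
    logits_fake = disc(g_z.detach())  # no gradients into the generator here
    return (F.binary_cross_entropy_with_logits(logits_real, torch.ones_like(logits_real))
          + F.binary_cross_entropy_with_logits(logits_fake, torch.zeros_like(logits_fake)))

def g_loss(disc, g_z):
    """Generator: produce samples the discriminator classifies as real (1)."""
    logits_fake = disc(g_z)
    return F.binary_cross_entropy_with_logits(logits_fake, torch.ones_like(logits_fake))
```

During training one typically alternates gradient steps on `d_loss` and `g_loss`; for the spatio-temporal variant above, `y` and `g_z` would contain triplets of consecutive frames rather than single samples.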
@ -183,7 +183,7 @@ GANs are a powerful learning tool. Note that the discriminator $D$ is really "ju
loss function: we can completely discard it at inference time, once the generator is fully trained.
Hence it's also not overly crucial how much resources it needs.
However, despite being a very powerful tools, it is (given the current state-of-the-art) questionable
However, despite being very powerful tools, it is (given the current state-of-the-art) questionable
whether GANs make sense when we have access to a reasonable PDE model. If we can discretize the model
equations and include them with a differentiable physics (DP) training (cf. {doc}`diffphys`),
this will most likely give

View File

@ -2,7 +2,7 @@ Additional Topics
=======================
The next sections will give a shorter introduction to other topics that are highly
interesting in the context of physics-based deep learning. These topic (for now) do
interesting in the context of physics-based deep learning. These topics (for now) do
not come with executable notebooks, but we will still point to existing open source
implementations for each of them.
@ -17,6 +17,5 @@ More specifically, we will look at:
* Meshless methods and unstructured meshes are an important topic for classical simulations. Here, we'll look at a specific Lagrangian method that employs learning in the context of dynamic, particle-based representations.
* Finally, metrics to robustly assess the quality of similarity of measurements and results are a central topic for all numerical methods, no matter whether they employ learning or not. In the last section we will look at how DL can be used to learn specialized and improved metrics.
TODO {cite}`kohl2020lsim`
% * Finally, metrics to robustly assess the quality of similarity of measurements and results are a central topic for all numerical methods, no matter whether they employ learning or not. In the last section we will look at how DL can be used to learn specialized and improved metrics. {cite}`kohl2020lsim`

View File

@ -1,7 +1,7 @@
Meshless Methods
=======================
For all computers and based methods we need to find a suitable discrete representation.
For all computer-based methods we need to find a suitable _discrete_ representation.
While this is straight-forward for cases such as data consisting only of integers, it is more challenging
for continuously changing quantities such as the temperature in a room.
While the previous examples have focused on aspects beyond discretization
@ -52,9 +52,10 @@ amount of additional complexity in an implementation, and the arbitrary
connectivities call for _message-passing_ approaches between the nodes of a graph.
This message passing is usually realized using fully-connected layers, instead of convolutions.
Thus, in the following, we will focus on a particle-based method, which offers
the same flexibility in terms of spatial adaptivity as GNNs, but still
employs a convolution operator for learning the physical relationships.
Thus, in the following, we will focus on a particle-based method {cite}`ummenhofer2019contconv`, which offers
the same flexibility in terms of spatial adaptivity as GNNs. GNNs were previously employed for
a very similar goal {cite}`sanchez2020learning`; however, the method below
enables a real convolution operator for learning the physical relationships.
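
As a rough illustration of the message passing mentioned above, a single GNN layer with fully-connected message and update functions could look as follows; this is a generic sketch, not the continuous-convolution method of {cite}`ummenhofer2019contconv`:

```python
import torch
import torch.nn as nn

class MessagePassingLayer(nn.Module):
    """A minimal GNN message-passing step built from fully-connected layers."""
    def __init__(self, feat):
        super().__init__()
        self.msg = nn.Sequential(nn.Linear(2 * feat, feat), nn.ReLU())
        self.upd = nn.Sequential(nn.Linear(2 * feat, feat), nn.ReLU())

    def forward(self, x, edge_index):
        # x: [N, feat] node features; edge_index: [2, E] (source, target) pairs
        src, dst = edge_index
        m = self.msg(torch.cat([x[src], x[dst]], dim=-1))  # per-edge messages
        agg = torch.zeros_like(x).index_add_(0, dst, m)    # sum messages per node
        return self.upd(torch.cat([x, agg], dim=-1))       # update node states
```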
## Meshless and particle-based methods

View File

@ -5,8 +5,8 @@ An inherent challenge for many practical PDE solvers is the large dimensionality
Our model $\mathcal{P}$ is typically discretized with $\mathcal{O}(n^3)$ samples for a 3 dimensional
problem (with $n$ denoting the number of samples along one axis),
and for time-dependent phenomena we additionally have a discretization along
time. The latter typically scales in accordance to the spatial dimensions, giving an
overall number of samples on the order of $\mathcal{O}(n^4)$. Not surprisingly,
time. The latter typically scales in accordance with the spatial dimensions. This gives an
overall sample count on the order of $\mathcal{O}(n^4)$. Not surprisingly,
the workload in these situations quickly explodes for larger $n$ (and for all practical high-fidelity applications we want $n$ to be as large as possible).
One popular way to reduce the complexity is to map a spatial state of our system $\mathbf{s_t} \in \mathbb{R}^{n^3}$
@ -32,7 +32,7 @@ the time evolution with $f_t$, and then decode the full spatial information with
Reducing the dimension and complexity of computational models, often called _reduced order modeling_ (ROM) or _model reduction_, is a classic topic in the computational field. Traditional approaches often employ techniques such as principal component analysis to arrive at a basis for a chosen space of solutions. However, being linear by construction, these approaches have inherent limitations when representing complex, non-linear solution manifolds. In practice, all "interesting" solutions are highly non-linear, and hence DL has received a substantial amount of interest as a way to learn non-linear representations. Due to the non-linearity, DL representations can potentially yield a high accuracy with fewer degrees of freedom in the reduced model compared to classic approaches.
The canonical NN for reduced models is an _autoencoder_. This denotes a network whose sole task is to reconstruct a given input $x$ while passing it through a bottleneck that is typically located in or near the middle of the stack of layers of the NN. The data in the bottleneck then represents the compressed, latent space representation $\mathbf{c}$, the part of the network leading up to it the encoder $f_e$, and the part after the bottleneck the decoder $f_d$. In combination, the learning task can be written as
The canonical NN for reduced models is an _autoencoder_. This denotes a network whose sole task is to reconstruct a given input $x$ while passing it through a bottleneck that is typically located in or near the middle of the stack of layers of the NN. The data in the bottleneck then represents the compressed, latent space representation $\mathbf{c}$. The part of the network leading up to the bottleneck $\mathbf{c}$ is the encoder $f_e$, and the part after it the decoder $f_d$. In combination, the learning task can be written as
$$
\text{arg min}_{\theta_e,\theta_d} | f_d( f_e(\mathbf{s};\theta_e) ;\theta_d) - \mathbf{s} |_2^2
$$
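
A minimal sketch of this objective, assuming a flattened 2D state and fully-connected encoder/decoder stacks (practical versions would typically be convolutional; all sizes below are illustrative):

```python
import torch
import torch.nn as nn

n = 64   # samples per spatial axis (illustrative choice)
m = 32   # latent-space dimension, with m << n*n

# encoder f_e and decoder f_d as small fully-connected stacks
f_e = nn.Sequential(nn.Linear(n * n, 256), nn.ReLU(), nn.Linear(256, m))
f_d = nn.Sequential(nn.Linear(m, 256), nn.ReLU(), nn.Linear(256, n * n))

def ae_loss(s):
    # the reconstruction objective | f_d(f_e(s)) - s |_2^2 from above
    c = f_e(s)   # compressed latent code c
    return ((f_d(c) - s) ** 2).sum(dim=-1).mean()
```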
@ -54,15 +54,11 @@ would prevent using the encoder or decoder in a standalone manner. E.g., the dec
### Autoencoder variants
One popular variant of autoencoders is worth a mention here: the so-called _varational autoencoders_, or VAEs. These
autoencoders follow the structure above, but additionally employ a loss term to shape the latent space of $\mathbf{c}$.
Typically we use a normal distribution as target, which makes the latent space
an $m$ dimensional unit cube, i.e., each dimension should have a zero mean and unit standard deviation.
This approach is especially useful if the decoder should be used as a generative model. E.g., we can then produce
$\mathbf{c}$ samples directly, and decode them to obtain full states.
While this is very useful to, e.g., obtain generative models for faces or other types of natural images, it is less
crucial in a simulation setting. Here we rather want to obtain a latent space that facilitates the temporal prediction,
rather than being able to easily produce samples from it.
One popular variant of autoencoders is worth a mention here: the so-called _variational autoencoders_, or VAEs. These autoencoders follow the structure above, but additionally employ a loss term to shape the latent space of $\mathbf{c}$. Its goal is to let the latent space follow a known distribution. This makes it possible to draw samples in latent space without workarounds such as having to project samples into the latent space.
Typically we use a normal distribution as target, which makes the latent space an $m$ dimensional unit cube: each dimension should have a zero mean and unit standard deviation.
This approach is especially useful if the decoder should be used as a generative model. E.g., we can then produce $\mathbf{c}$ samples directly, and decode them to obtain full states.
While this is very useful for applications such as constructing generative models for faces or other types of natural images, it is less crucial in a simulation setting. Here we want to obtain a latent space that facilitates the temporal prediction, rather than being able to easily produce samples from it.
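
As a sketch, the extra loss term of a VAE is typically a KL divergence that pulls the predicted latent distribution towards $N(0, I)$; here `mu` and `logvar` would come from the encoder, and the weighting factor `beta` is an assumed hyperparameter:

```python
import torch

def vae_loss(s, s_recon, mu, logvar, beta=1e-3):
    """Reconstruction loss plus a KL term that pulls the latent
    distribution N(mu, sigma^2) towards the target N(0, I)."""
    recon = ((s_recon - s) ** 2).mean()
    kl = -0.5 * torch.mean(1.0 + logvar - mu.pow(2) - logvar.exp())
    return recon + beta * kl  # beta weights the latent-space regularizer
```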
## Time series
@ -100,16 +96,16 @@ store the previous history of states it has seen.
For the former variant, the prediction network $f_p$ receives more than
a single $\mathbf{c}_{t}$. For the latter variant, we can turn to algorithms
from the subfield of _recurrent neural networks_ (RNNs). A variety of architectures
have been proposed to encode and store temporal states of a sytem, the most
have been proposed to encode and store temporal states of a system, the most
popular ones being
_long short-term memory_ (LSTM) network,
_long short-term memory_ (LSTM) networks,
_gated recurrent units_ (GRUs), or
lately attenion-based _transformer_ networks.
lately attention-based _transformer_ networks.
No matter which variant is used, these approaches always work with fully-connected layers
as the latent space vectors do not exhibit any spatial structure, but typically represent
a seemingly random collection of values.
Due to the fully-connected layers, the prediction networks quickly grow in terms
of their parameter count, and thus require relatively a small latent-space dimension $m$.
of their parameter count, and thus require a relatively small latent-space dimension $m$.
Luckily, this is in line with our main goals, as outlined at the top.
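
A minimal sketch of such a latent-space prediction network $f_p$, here with an LSTM over a short history of latent codes (all sizes illustrative):

```python
import torch
import torch.nn as nn

class LatentPredictor(nn.Module):
    """f_p: predict the next latent code from a history of previous ones."""
    def __init__(self, m=32, hidden=128):
        super().__init__()
        self.rnn = nn.LSTM(m, hidden, batch_first=True)
        self.out = nn.Linear(hidden, m)

    def forward(self, c_seq):
        # c_seq: [batch, steps, m] previous latent codes c_t
        h, _ = self.rnn(c_seq)
        return self.out(h[:, -1])  # predicted c_{t+1}
```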
## End-to-end training
@ -138,7 +134,7 @@ height: 300px
name: timeseries-lss-subdiv-prediction
---
Several time frames of an example prediction from {cite}`wiewel2020lsssubdiv`, which additionally couples the
learned time evolution with an numerically solved advection step.
learned time evolution with a numerically solved advection step.
The learned prediction is shown at the top, the reference simulation at the bottom.
```

View File

@ -1,13 +1,17 @@
Outlook
=======================
Despite the lengthy discussions and numerous examples,
we've really just barely scratched the surface regarding the possibilities that arise in the context
of physics-based deep learning.
Despite the lengthy discussions and numerous examples, we've really just barely scratched the surface regarding the possibilities that arise in the context of physics-based deep learning.
Most importantly, the techniques that were explained in the previous chapters have a gigantic potential to influence all computational methods of the next decades. As demonstrated many times in the code examples, there's no magic involved, but deep learning gives us very powerful tools to represent and approximate non-linear functions. And deep learning by no means renders existing numerical methods obsolete. Rather, the two are an ideal combination.
A topic that we have not touched at all so far is that -- of course -- in the end our goal is to improve human understanding of our world. And here the view of neural networks as "black boxes" is clearly outdated. It is simply another numerical method that humans can employ, and the physical fields predicted by a network are as interpretable as the outcome of a traditional simulation.
**TODO**
![Divider](resources/divider2.jpg)
The examples with Burgers equation and Navier-Stokes solvers are non-trivial, and good examples for advection-diffusion-type PDEs. However, there's a wide variety of other potential combinations. To name just a few promising examples from other fields:
The examples with Burgers equation and Navier-Stokes solvers are clearly non-trivial, and good examples for advection-diffusion-type PDEs. However, there's a wide variety of other potential models that similar techniques could be applied to. To name just a few promising examples from other fields:
* PDEs for chemical reactions often show complex behavior due to the interactions of multiple species. Here, an especially interesting direction is to train models that quickly learn to predict the evolution of an experiment or machine, and adjust control knobs to stabilize it, i.e., an online _control_ setting.