last round of updates from Maxi, minor fixes

NT
2021-07-23 12:55:23 +02:00
parent 6b938e735f
commit 9cea132903
7 changed files with 35 additions and 34 deletions


@@ -5,8 +5,8 @@ An inherent challenge for many practical PDE solvers is the large dimensionality
Our model $\mathcal{P}$ is typically discretized with $\mathcal{O}(n^3)$ samples for a 3-dimensional
problem (with $n$ denoting the number of samples along one axis),
and for time-dependent phenomena we additionally have a discretization along
-time. The latter typically scales in accordance to the spatial dimensions, giving an
-overall number of samples on the order of $\mathcal{O}(n^4)$. Not surprisingly,
+time. The latter typically scales in accordance with the spatial dimensions. This gives an
+overall sample count on the order of $\mathcal{O}(n^4)$. Not surprisingly,
the workload in these situations quickly explodes for larger $n$ (and for all practical high-fidelity applications we want $n$ to be as large as possible).
One popular way to reduce the complexity is to map a spatial state of our system $\mathbf{s_t} \in \mathbb{R}^{n^3}$
@@ -32,7 +32,7 @@ the time evolution with $f_t$, and then decode the full spatial information with
Reducing the dimension and complexity of computational models, often called _reduced order modeling_ (ROM) or _model reduction_, is a classic topic in the computational field. Traditional approaches often employ techniques such as principal component analysis to arrive at a basis for a chosen solution space. However, being linear by construction, these approaches have inherent limitations when representing complex, non-linear solution manifolds. In practice, all "interesting" solutions are highly non-linear, and hence DL has received a substantial amount of interest as a way to learn non-linear representations. Due to the non-linearity, DL representations can potentially yield a high accuracy with fewer degrees of freedom in the reduced model compared to classic approaches.
-The canonical NN for reduced models is an _autoencoder_. This denotes a network whose sole task is to reconstruct a given input $x$ while passing it through a bottleneck that is typically located in or near the middle of the stack of layers of the NN. The data in the bottleneck then represents the compressed, latent space representation $\mathbf{c}$, the part of the network leading up to it the encoder $f_e$, and the part after the bottleneck the decoder $f_d$. In combination, the learning task can be written as
+The canonical NN for reduced models is an _autoencoder_. This denotes a network whose sole task is to reconstruct a given input $\mathbf{s}$ while passing it through a bottleneck that is typically located in or near the middle of the stack of layers of the NN. The data in the bottleneck then represents the compressed, latent space representation $\mathbf{c}$. The part of the network leading up to the bottleneck $\mathbf{c}$ is the encoder $f_e$, and the part after it the decoder $f_d$. In combination, the learning task can be written as
$$
\text{arg min}_{\theta_e,\theta_d} | f_d( f_e(\mathbf{s};\theta_e) ;\theta_d) - \mathbf{s} |_2^2
$$
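To make the objective above concrete, here is a minimal sketch of an encoder-decoder pair trained with exactly this reconstruction loss. PyTorch, the layer sizes, and the values of $n$ and $m$ are illustrative assumptions, not choices made in the text:

```python
import torch
import torch.nn as nn

n, m = 16, 32          # assumed: small n for the sketch; latent dimension m
N = n ** 3             # size of a flattened 3D state s

f_e = nn.Sequential(nn.Linear(N, 512), nn.ReLU(), nn.Linear(512, m))  # encoder
f_d = nn.Sequential(nn.Linear(m, 512), nn.ReLU(), nn.Linear(512, N))  # decoder

opt = torch.optim.Adam(list(f_e.parameters()) + list(f_d.parameters()), lr=1e-4)

def train_step(s):                   # s: batch of flattened states, shape [b, N]
    opt.zero_grad()
    loss = ((f_d(f_e(s)) - s) ** 2).sum(dim=1).mean()  # |f_d(f_e(s)) - s|_2^2
    loss.backward()
    opt.step()
    return loss.item()
```

In practice the encoder and decoder would typically be convolutional to exploit the spatial structure of $\mathbf{s}$; fully-connected layers merely keep the sketch short.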
@@ -54,15 +54,11 @@ would prevent using the encoder or decoder in a standalone manner. E.g., the dec
### Autoencoder variants
-One popular variant of autoencoders is worth a mention here: the so-called _varational autoencoders_, or VAEs. These
-autoencoders follow the structure above, but additionally employ a loss term to shape the latent space of $\mathbf{c}$.
-Typically we use a normal distribution as target, which makes the latent space
-an $m$ dimensional unit cube, i.e., each dimension should have a zero mean and unit standard deviation.
-This approach is especially useful if the decoder should be used as a generative model. E.g., we can then produce
-$\mathbf{c}$ samples directly, and decode them to obtain full states.
-While this is very useful to, e.g., obtain generative models for faces or other types of natural images, it is less
-crucial in a simulation setting. Here we rather want to obtain a latent space that facilitates the temporal prediction,
-rather than being able to easily produce samples from it.
+One popular variant of autoencoders is worth a mention here: the so-called _variational autoencoders_, or VAEs. These autoencoders follow the structure above, but additionally employ a loss term to shape the latent space of $\mathbf{c}$. Its goal is to let the latent space follow a known distribution. This makes it possible to draw samples in latent space without workarounds such as having to project existing samples into it.
+Typically we use a normal distribution as the target, which makes the latent space an $m$-dimensional unit cube: each dimension should have zero mean and unit standard deviation.
+This approach is especially useful if the decoder should be used as a generative model. E.g., we can then produce $\mathbf{c}$ samples directly, and decode them to obtain full states.
+While this is very useful for applications such as constructing generative models for faces or other types of natural images, it is less crucial in a simulation setting. Here we want to obtain a latent space that facilitates temporal prediction, rather than being able to easily produce samples from it.
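For illustration, the additional loss term of a VAE is commonly the KL divergence between the encoder's predicted latent distribution and a unit Gaussian. The sketch below assumes an encoder that outputs a mean `mu` and log-variance `log_var` per latent dimension; these names and the setup are ours, not from the text:

```python
import torch

def kl_regularizer(mu, log_var):
    # KL divergence between N(mu, sigma^2) and the unit Gaussian N(0, I),
    # summed over the m latent dimensions and averaged over the batch
    kl = -0.5 * torch.sum(1.0 + log_var - mu ** 2 - log_var.exp(), dim=1)
    return kl.mean()

def sample_latent(mu, log_var):
    # reparameterization trick: keeps the random draw differentiable
    eps = torch.randn_like(mu)
    return mu + eps * (0.5 * log_var).exp()
```

The total loss would then combine the reconstruction term from above with a weighted `kl_regularizer(mu, log_var)`.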
## Time series
@@ -100,16 +96,16 @@ store the previous history of states it has seen.
For the former variant, the prediction network $f_p$ receives more than
a single $\mathbf{c}_{t}$. For the latter variant, we can turn to algorithms
from the subfield of _recurrent neural networks_ (RNNs). A variety of architectures
-have been proposed to encode and store temporal states of a sytem, the most
+have been proposed to encode and store temporal states of a system, the most
popular ones being
-_long short-term memory_ (LSTM) network,
+_long short-term memory_ (LSTM) networks,
_gated recurrent units_ (GRUs), or
-lately attenion-based _transformer_ networks.
+lately attention-based _transformer_ networks.
No matter which variant is used, these approaches always work with fully-connected layers
as the latent space vectors do not exhibit any spatial structure, but typically represent
a seemingly random collection of values.
Due to the fully-connected layers, the prediction networks quickly grow in terms
-of their parameter count, and thus require relatively a small latent-space dimension $m$.
+of their parameter count, and thus require a relatively small latent-space dimension $m$.
Luckily, this is in line with our main goals, as outlined at the top.
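As a sketch of such a fully-connected prediction network, the following $f_p$ maps a short history of latent codes to the next one. The history length, hidden sizes, and latent dimension are assumptions made only for this example:

```python
import torch
import torch.nn as nn

m, history = 32, 3     # assumed: small latent dimension m, 3 previous states

# f_p: [c_{t-2}, c_{t-1}, c_t] -> c_{t+1}; fully-connected throughout, since
# the latent vectors carry no spatial structure that convolutions could exploit
f_p = nn.Sequential(
    nn.Linear(history * m, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, m),
)

def predict_next(cs):            # cs: latent history, shape [b, history, m]
    return f_p(cs.flatten(1))    # -> next latent code, shape [b, m]
```

Longer rollouts simply feed each prediction back into the history window.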
## End-to-end training
@@ -138,7 +134,7 @@ height: 300px
name: timeseries-lss-subdiv-prediction
---
Several time frames of an example prediction from {cite}`wiewel2020lsssubdiv`, which additionally couples the
-learned time evolution with an numerically solved advection step.
+learned time evolution with a numerically solved advection step.
The learned prediction is shown at the top, the reference simulation at the bottom.
```