integrated RL, spell check

This commit is contained in:
NT 2021-04-15 16:20:17 +08:00
parent 4b8fee4fa0
commit a8074987b6
7 changed files with 94 additions and 816 deletions

View File

@ -37,6 +37,16 @@
- file: diffphys-control.ipynb
- file: diffphys-outlook.md
- part: Reinforcement Learning
chapters:
- file: reinflearn-intro.md
- file: reinflearn-code.ipynb
- part: PBDL and Uncertainty
chapters:
- file: bayesian-intro.md
- file: bayesian-code.ipynb
- part: Physical Gradients
chapters:
- file: physgrad.md
@ -44,11 +54,6 @@
- file: physgrad-nn.md
- file: physgrad-discuss.md
- part: PBDL and Uncertainty
chapters:
- file: bayesian-intro.md
- file: bayesian-code.md
- part: Fast Forward Topics
chapters:
- file: others-intro.md

View File

@ -1,7 +1,7 @@
Introduction to Posterior Inference
=======================
We have to keep in mind that for all measurements, models, and discretizations we have uncertainties. In the former, this typically appears in the form of measurements errors, model equations usually encompass only parts of a system we're interested in, and for numerical simulations we inherently introduce discretization errors. So a very important question to ask here is how sure we can be sure that an answer we obtain is the correct one. From a statistics viewpoint, we'd like to know the probability distribution for the posterior, i.e., the outcomes.
We should keep in mind that for all measurements, models, and discretizations we have uncertainties. For the former, this typically appears in the form of measurement errors, while model equations usually encompass only parts of a system we're interested in, and for numerical simulations we inherently introduce discretization errors. So a very important question to ask here is how sure we can be that an answer we obtain is the correct one. From a statistics viewpoint, we'd like to know the probability distribution for the posterior, i.e., the different outcomes that are possible.
This admittedly becomes even more difficult in the context of machine learning:
we're typically facing the task of approximating complex and unknown functions.
@ -10,22 +10,36 @@ yields a _maximum likelihood estimation_ (MLE) for the parameters of the network
However, this MLE viewpoint does not take any of the uncertainties mentioned above into account:
for DL training, we likewise have a numerical optimization, and hence an inherent
approximation error and uncertainty regarding the learned representation.
Ideally, we could change our learning problem such that we could do _posterior inference_,
Ideally, we should reformulate our learning problem such that it enables _posterior inference_,
i.e. learn to produce the full output distribution. However, this turns out to be an
extremely difficult task.
This is where so-called _Bayesian neural network_ (BNN) approaches come into play. They
make posterior inference possible by making assumptions about the probability
distributions of individual parameters of the network. Nonetheless, the task
make a form of posterior inference possible by making assumptions about the probability
distributions of individual parameters of the network. With a distribution for the
parameters we can evaluate the network multiple times to obtain different versions
of the output, and in this way sample the distribution of the output.
Nonetheless, the task
remains very challenging. Training a BNN is typically significantly more difficult
than training a regular NN. However, this should come as no surprise, as we're trying to
learn something fundamentally different in this case: a full probability distribution
instead of a point estimate.
learn something fundamentally different here: a full probability distribution
instead of a point estimate. (All previous chapters "just" dealt with
learning such point estimates.)
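To make this sampling idea concrete, here is a conceptual sketch (not the implementation used in the later notebook): each weight is described by an assumed mean and standard deviation, and repeated forward passes with freshly drawn weights yield samples of the output distribution.

```python
import numpy as np

# Conceptual sketch: a tiny "Bayesian" linear layer where each weight has a
# mean and a standard deviation instead of a single point value (all values assumed).
rng = np.random.default_rng(42)

w_mean = rng.normal(size=(3, 1))          # assumed posterior means of the weights
w_std  = 0.1 * np.ones((3, 1))            # assumed posterior standard deviations
x = np.array([[0.2, -0.5, 1.0]])          # a single input sample

# evaluate the "network" multiple times, drawing different weights each time
samples = []
for _ in range(100):
    w = rng.normal(w_mean, w_std)         # sample weights from their distribution
    samples.append(x @ w)                 # forward pass of a minimal linear model

samples = np.array(samples)
print("output mean:", samples.mean(), " output std:", samples.std())
```

The spread of these samples is exactly the kind of output distribution that a point estimate cannot provide.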
![Divider](resources/divider5.jpg)
## Introduction to Bayesian Neural Networks
**TODO, integrate Maximilian's intro section here**
...
## A practical example
first example here with airfoils, extension from {doc}`supervised-airfoils`
As a first real example for posterior inference with BNNs, let's revisit the
case of turbulent flows around airfoils, from {doc}`supervised-airfoils`. However,
in contrast to the point estimate learned in that section, we'll now aim for
learning the full posterior.

View File

@ -62,7 +62,7 @@ given model parameter, with which the NN should not interact.
Naturally, it can vary within the solution manifold that we're interested in,
but $\nu$ will not be the output of a NN representation. If this is the case, we can omit
providing $\partial \mathcal P_i / \partial \nu$ in our solver. However, the following learning process
natuarlly transfers to including $\nu$ as a degree of freedom.
naturally transfers to including $\nu$ as a degree of freedom.
## Jacobians
@ -152,7 +152,7 @@ we could leverage the $O(n)$ runtime of multigrid solvers for matrix inversion.
The flipside of this approach is that it requires some understanding of the problem at hand,
and of the numerical methods. Also, a given solver might not provide gradient calculations out of the box.
Thus, if we want to employ DL for model equations that we don't have a proper grasp of, it might not be a good
idea to direclty go for learning via a DP approach. However, if we don't really understand our model, we probably
idea to directly go for learning via a DP approach. However, if we don't really understand our model, we probably
should go back to studying it a bit more anyway...
Also, in practice we can be _greedy_ with the derivative operators, and only
@ -191,7 +191,7 @@ Note that to simplify things, we assume that $\mathbf{u}$ is only a function in
i.e. constant over time. We'll bring back the time evolution of $\mathbf{u}$ later on.
%
Let's denote this re-formulation as $\mathcal P$. It maps a state of $d(t)$ into a
new state at an evoled time, i.e.:
new state at an evolved time, i.e.:
$$
d(t+\Delta t) = \mathcal P ( ~ d(t), \mathbf{u}, t+\Delta t)
@ -289,7 +289,7 @@ be preferable to actually constructing $A$.
As a slightly more complex example let's consider Poisson's equation $\nabla^2 a = b$, where
$a$ is the quantity of interest, and $b$ is given.
This is a very fundamental elliptic PDE that is important for
a variety of physical problems, from electrostatics to graviational fields. It also arises
a variety of physical problems, from electrostatics to gravitational fields. It also arises
in the context of fluids, where $a$ takes the role of a scalar pressure field in the fluid, and
the right hand side $b$ is given by the divergence of the fluid velocity $\mathbf{u}$.
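As a minimal illustration (a 1D sketch for this text, not the solver used later), the discrete form of such a Poisson problem can be assembled and solved directly; in practice one would avoid constructing $A$ explicitly and use an iterative or multigrid solver instead.

```python
import numpy as np

# Minimal sketch: solve a 1D Poisson problem  d^2 a / dx^2 = b  with zero
# boundary values by assembling the discrete Laplacian A and solving A a = b.
n  = 64                      # number of interior grid points (assumed)
dx = 1.0 / (n + 1)           # grid spacing on the unit interval
x  = np.linspace(dx, 1.0 - dx, n)
b  = np.sin(2.0 * np.pi * x)             # an arbitrary right-hand side for illustration

# discrete Laplacian with the standard 3-point stencil
A = (np.diag(-2.0 * np.ones(n)) +
     np.diag(np.ones(n - 1),  1) +
     np.diag(np.ones(n - 1), -1)) / dx**2

a = np.linalg.solve(A, b)    # direct solve; larger problems call for iterative solvers
```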

View File

@ -12,8 +12,16 @@ More specifically, we will look at:
This typically replaces a numerical solver, and we can make use of special techniques from the DL area that target time series.
* Generative models are likewise a topic of their own in DL, and here especially generative adversarial networks were shown to be powerful tools. They also represent a highly interesting training approach involving two separate NNs.
{cite}`xie2018tempoGan`
* Meshless methods and unstructured meshes are an important topic for classical simulations. Here, we'll look at a specific Lagrangian method that employs learning in the context of dynamic, particle-based representations.
{cite}`prantl2019tranquil`
{cite}`ummenhofer2019contconv`
* Finally, metrics to reboustly assess the quality of similarity of measurements and results are a central topic for all numerical methods, no matter whether they employ learning or not. In the last section we will look at how DL can be used to learn specialized and improved metrics.
https://github.com/intel-isl/DeepLagrangianFluids
* Finally, metrics to robustly assess the quality of similarity of measurements and results are a central topic for all numerical methods, no matter whether they employ learning or not. In the last section we will look at how DL can be used to learn specialized and improved metrics.
{cite}`kohl2020lsim`
{cite}`um2020sol`

View File

@ -1,7 +1,53 @@
Model Reduction and Time Series
=======================
model reduction? separate
An inherent challenge for many practical PDE solvers is the large dimensionality of the problem.
Our model $\mathcal{P}$ is typically discretized with $\mathcal{O}(n^3)$ samples for a 3-dimensional
problem (with $n$ denoting the number of samples along one axis),
and for time-dependent phenomena we additionally have a discretization along
time. The latter typically scales in accordance with the spatial dimensions, giving an
overall number of samples on the order of $\mathcal{O}(n^4)$. Not surprisingly,
the workload in these situations quickly explodes for larger $n$ (and for practical high-fidelity applications we want $n$ to be as large as possible).
One popular way to reduce the complexity is to map a spatial state of our system $\mathbf{s_t} \in \mathbb{R}^{n^3}$
into a much lower dimensional state $\mathbf{c_t} \in \mathbb{R}^{m}$, with $m \ll n^3$. Within this latent space,
we estimate the evolution of our system by inferring a new state $\mathbf{c_{t+1}}$, which we then decode to obtain $\mathbf{s_{t+1}}$. In order for this to work, it's crucial that we can choose $m$ large enough that it captures all important structures in our solution manifold, and that the time prediction of $\mathbf{c_{t+1}}$ can be computed efficiently, such that we obtain a gain in performance despite the additional encoding and decoding steps. In practice, due to the explosion in terms of unknowns for regular simulations (the $\mathcal{O}(n^3)$ above) coupled with a super-linear complexity for computing a new state, working with the latent space points $\mathbf{c}$ quickly pays off for small $m$.
However, it's crucial that encoder and decoder do a good job at reducing the dimensionality of the problem. This is a very good task for DL approaches. Furthermore, we then need a time evolution of the latent space states $\mathbf{c}$, and for most practical model equations, we cannot find closed form solutions to evolve $\mathbf{c}$. Hence, this likewise poses a very good problem for learning methods. To summarize, we're facing two challenges: learning a good spatial encoding and decoding, together with learning an accurate time evolution.
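To give a concrete picture of the resulting pipeline, the following sketch steps a system purely in the latent space; $f_e$, $f_t$ and $f_d$ are placeholders standing in for trained networks (the names anticipate the figure below), chosen only to show the data flow and the shapes involved.

```python
import numpy as np

# Sketch with placeholder functions for encoder, latent time step, and decoder.
n, m = 32, 16                               # full resolution and latent size (assumed)

def f_e(s):  return s.reshape(-1)[:m]       # encoder  R^{n^3} -> R^m   (placeholder)
def f_t(c):  return 0.9 * c                 # latent-space time step    (placeholder)
def f_d(c):  return np.resize(c, (n, n, n)) # decoder  R^m -> R^{n^3}   (placeholder)

s = np.random.rand(n, n, n)                 # initial full-space state s_t
c = f_e(s)                                  # encode once
for _ in range(10):                         # advance purely in the latent space
    c = f_t(c)
s_next = f_d(c)                             # decode only when the full state is needed
```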
Below, we will describe an approach to solve this problem following Wiewel et al.
{cite}`wiewel2019lss` & {cite}`wiewel2020lsssubdiv`, which in turn employs
the encoder/decoder of Kim et al. {cite}`bkim2019deep`.
```{figure} resources/timeseries-lsp-overview.jpg
---
height: 200px
name: timeseries-lsp-overview
---
For time series predictions with ROMs, we encode the state of our system with an encoder $f_e$, predict
the time evolution with $f_t$, and then decode the full spatial information with a decoder $f_d$.
```
## Reduced Order Models
Reducing the order of computational models, often called _reduced order modeling_ (ROM) or _model reduction_,
is a classic topic in the computational field. Traditional approaches often employ techniques such as principal component analysis to arrive at a basis for a chosen space of solutions. However, being linear by construction, these approaches have inherent limitations when representing complex, non-linear solution manifolds. And in practice, all "interesting" solutions are highly non-linear.
In the DL context, both the encoder $f_e: \mathbb{R}^{n^3} \rightarrow \mathbb{R}^{m}$ and the decoder $f_d: \mathbb{R}^{m} \rightarrow \mathbb{R}^{n^3}$ are instead represented by neural networks, trained to minimize the reconstruction error

$\text{arg min}_{\theta_e,\theta_d} | f_d( f_e(x;\theta_e) ;\theta_d) - x |_2^2 .$
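A minimal training sketch for this objective (sizes, architecture and hyperparameters are assumed for illustration, not those of the cited works) could look as follows:

```python
import torch
import torch.nn as nn

n3, m = 4096, 32                                   # flattened full state and latent size (assumed)
f_e = nn.Sequential(nn.Linear(n3, 256), nn.ReLU(), nn.Linear(256, m))    # encoder
f_d = nn.Sequential(nn.Linear(m, 256), nn.ReLU(), nn.Linear(256, n3))    # decoder

opt = torch.optim.Adam(list(f_e.parameters()) + list(f_d.parameters()), lr=1e-4)
x = torch.rand(16, n3)                             # a batch of placeholder states

for _ in range(100):
    opt.zero_grad()
    loss = ((f_d(f_e(x)) - x) ** 2).mean()         # reconstruction error |f_d(f_e(x)) - x|^2
    loss.backward()
    opt.step()
```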
separable model
## Time Series
...

View File

@ -1,6 +1,8 @@
Physical Gradients
=======================
**Note, this chapter is very preliminary - probably not for the first version of the book**
The next chapter will dive deeper into state-of-the-art research, and aim for an even tighter
integration of physics and learning.
The approaches explained previously all integrate physical models into deep learning algorithms,

File diff suppressed because one or more lines are too long