moved RL chapter up, added dividers
parent 68a6e22341
commit 83253b3503
_toc.yml
@@ -31,6 +31,10 @@ parts:
  - file: diffphys-code-sol.ipynb
  - file: diffphys-control.ipynb
  - file: diffphys-outlook.md
- caption: Reinforcement Learning
  chapters:
  - file: reinflearn-intro.md
  - file: reinflearn-code.ipynb
- caption: Improved Gradients
  chapters:
  - file: physgrad.md
@@ -40,10 +44,6 @@ parts:
  - file: physgrad-hig.md
  - file: physgrad-hig-code.ipynb
  - file: physgrad-discuss.md
- caption: Reinforcement Learning
  chapters:
  - file: reinflearn-intro.md
  - file: reinflearn-code.ipynb
- caption: PBDL and Uncertainty
  chapters:
  - file: bayesian-intro.md
@@ -1,38 +1,57 @@
Discussion
=======================

At this point it's a good time to take another step back and assess the different methods of the previous chapters. For deep learning applications, we can broadly distinguish three approaches: the _regular_ differentiable physics (DP) training, the training with half-inverse gradients (HIGs), and using the physical gradients (PGs). Unfortunately, we can't simply discard two of them and focus on a single approach for all future endeavours. However, discussing the pros and cons sheds light on some fundamental aspects of physics-based deep learning, so here we go...

xxx TODO update, include HIG discussion xxx
... discarded supervised, and PIs



PGs higher order, custom inverse, chain PDE & NN together

## Addressing scaling issues

HIG more generic, numerical inversion, joint physics & NN

First and foremost, a central motivation for the improved updates is the need to address the scaling issues of the learning problems. This is not a completely new problem: numerous deep learning algorithms have been proposed to address it for the training of NNs. However, the combination of NNs with physical simulations brings new challenges that also provide new angles for tackling the problem. On the negative side, we have additional, highly non-linear operators from the PDE models. On the positive side, these operators typically do not have free parameters during learning, and thus can be treated with different, tailored methods.

This is exactly where HIGs and PGs come in: instead of treating the physical simulation like the rest of the NN (this is the DP approach), they show how much can be achieved with custom inverse solvers (PGs) or a custom numerical inversion (HIGs).
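
To make this difference tangible, here is a deliberately tiny, self-contained sketch (not taken from the accompanying notebooks): it contrasts a plain gradient-descent update chained through a stiff toy "physics" function with an update that maps the step back through an analytic inverse of that function, in the spirit of the PG idea. The cubic function, constants, and update form are purely illustrative.

```python
import numpy as np

# Toy "physics": P(x) = x^3, with target y* = 0.001, so the solution is x* = 0.1.
P     = lambda x: x**3
P_inv = lambda y: np.sign(y) * np.abs(y) ** (1.0 / 3.0)   # analytic inverse solver
y_target = 0.001

x_dp, x_pg, eta = 1.0, 1.0, 0.05
for _ in range(100):
    # DP-style step: plain gradient of L = (P(x)-y*)^2, chained through the physics.
    y = P(x_dp)
    x_dp -= eta * 2.0 * (y - y_target) * 3.0 * x_dp**2
    # PG-style step: take the step in y-space, then map it back with the inverse solver.
    y = P(x_pg)
    x_pg = P_inv(y - eta * 2.0 * (y - y_target))

print(x_dp, x_pg)   # gradient descent is still far from x*=0.1, the inverse-solver update lands very close
```

The point is not the specific numbers, but that the inverse solver removes the strong dependence of the update on the local scaling of the physics, which is exactly the scaling behavior discussed above.
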
## Computational Resources

Both approaches usually lead to a more complicated and resource-intensive training. However, assuming that we can re-use a trained model many times after the training has been completed, there are many application areas where this can quickly pay off: the trained NNs, despite being identical in runtime to those obtained from other training methods, often achieve significantly improved accuracies. Achieving similar levels of accuracy with regular Adam and DP-based training can be infeasible.

When such a trained NN is used, e.g., as a surrogate model for an inverse problem, it might be executed a large number of times, and the improved accuracy can save correspondingly large amounts of computational resources in such a follow-up stage.
A good potential example is the shape optimization for drag reduction of bodies immersed in a fluid {cite}`chen2021numerical`.

In a way, learning via physical gradients provides the tightest possible coupling
of physics and NNs: the full non-linear process of the PDE model directly steers
the optimization of the NN.

Naturally, this comes at a cost - invertible simulators are more difficult to build
(and less common) than the first-order gradients from
deep learning and adjoint optimizations. Nonetheless, if they're available,
invertible simulators can speed up convergence and yield models with an inherently better performance.
Thus, once trained, these models can give a performance that we simply can't obtain
by, e.g., training longer with a simpler approach. So, if we plan to evaluate these
models often (e.g., ship them in an application), this increased one-time cost
can pay off in the long run.

|
||||
|
||||
## Summary
|
||||
|
||||
✅ Pro:
|
||||
- Very accurate "gradient" information for learning and optimization.
|
||||
## Summary
|
||||
|
||||
% DP basic, generic,
|
||||
% PGs higher order, custom inverse , chain PDE & NN together
|
||||
% HIG more generic, numerical inversion , joint physics & NN
|
||||
|
||||
xxx old xxx
|
||||
|
||||
%In a way, the learning via physical gradients provide the tightest possible coupling of physics and NNs: the full non-linear process of the PDE model directly steers the optimization of the NN.
|
||||
|
||||
PG old: Naturally, this comes at a cost - invertible simulators are more difficult to build (and less common) than the first-order gradients from deep learning and adjoint optimizations. Nonetheless, if they're available, invertible simulators can speed up convergence, and yield models that have an inherently better performance. Thus, once trained, these models can give a performance that we simply can't obtain by, e.g., training longer with a simpler approach. So, if we plan to evaluate these models often (e.g., ship them in an application), this increased one-time cost can pay off in the long run.
|
||||
|
||||
---
|
||||
|
||||
✅ Pro HIG:
|
||||
- Robustly addresses scaling issues, jointly for physical models and NN.
|
||||
- Improved convergence and model performance.
|
||||
- Tightest possible coupling of model PDEs and learning.
|
||||
|
||||
❌ Con:
|
||||
❌ Con HIG:
|
||||
- Requires SVD
|
||||
- mem req
|
||||
|
||||
---
|
||||
|
||||
✅ Pro PG:
|
||||
- Very accurate "gradient" information for physical simulations.
|
||||
- Strongly improved convergence and model performance.
|
||||
|
||||
❌ Con PG:
|
||||
- Requires inverse simulators (at least local ones).
|
||||
- Less wide-spread availability than, e.g., differentiable physics simulators.
|
||||
|
||||

@@ -87,6 +87,9 @@ To summarize, computing the HIG update requires evaluating the individual Jacobian
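
As a rough illustration of what such an update could look like in code, here is a minimal numpy sketch of a half-inverse step. It assumes the per-sample Jacobians and loss gradients have already been computed; the function name, the exponent parameter, and the truncation threshold are illustrative and not taken from the accompanying notebook.

```python
import numpy as np

def hig_update(jacobians, loss_grads, eta=1.0, beta=0.5, trunc=1e-6):
    """Sketch of a half-inverse gradient step.

    jacobians : list of per-sample Jacobians d y_i / d theta, each of shape (dim_y, dim_theta)
    loss_grads: list of per-sample loss gradients d L / d y_i, each of shape (dim_y,)
    beta      : partial-inversion exponent; 0.5 gives the "half" inverse
    """
    J = np.concatenate(jacobians, axis=0)            # stack the Jacobians over the mini-batch
    g = np.concatenate(loss_grads, axis=0)
    U, s, Vt = np.linalg.svd(J, full_matrices=False)
    s_inv = np.array([x ** (-beta) if x > trunc else 0.0 for x in s])   # truncate tiny singular values
    return -eta * (Vt.T @ (s_inv * (U.T @ g)))       # delta theta = -eta * J^(-beta) g
```

Forming and decomposing this stacked Jacobian is exactly the extra cost per update discussed in the previous sections.
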
%



## Properties Illustrated via a Toy Example

This is a good time to illustrate the properties mentioned in the previous paragraphs with a real example.

@@ -144,17 +147,18 @@ This becomes even clearer in the middle graph, showing the activations statistic

The third graph on the right side of figure {numref}`hig-toy-example-bad` shows the resulting behavior in terms of the outputs. As already indicated by the loss values, both Adam and GN do not reach the target (the black dot). Interestingly, it's also apparent that both have significantly more trouble along the $y^2$ direction, which we used to cause the bad conditioning: they both make some progress along the x-axis of the graph ($y^1$), but don't move much towards the $y^2$ target value. This illustrates the discussion above: GN gets stuck due to its saturated neurons, while Adam struggles to undo the scaling of $y^2$.

---

%We've kept the $\eta$ in here for consistency, but in practice $\eta=1$ is used for Gauss-Newton

## Summary of Half-Inverse Gradients

Note that for all examples so far, we've improved upon the _differentiable physics_ (DP) training from the previous chapters. I.e., we've focused on combinations of neural networks and PDE solving operators. The latter need to be differentiable for training with regular SGD, as well as for HIG-based training. For the physical gradients, we even need them to provide an inverse solver. Thus, the HIGs described above share more similarities with, e.g., {doc}`diffphys-code-sol` and {doc}`diffphys-code-control`, than with {doc}`physgrad-code`.
Note that for all examples so far, we've improved upon the _differentiable physics_ (DP) training from the previous chapters. I.e., we've focused on combinations of neural networks and PDE solving operators. The latter need to be differentiable for training with regular SGD, as well as for HIG-based training. For the physical gradients, we even need them to provide an inverse solver. Thus, the HIGs described above share more similarities with, e.g., {doc}`diffphys-code-sol` and {doc}`diffphys-control`, than with {doc}`physgrad-code`.

This is a good time to give a specific code example of how to train physical NNs with HIGs: we'll look at a classic case, a system of coupled oscillators.

## xxx TODO , merge into code later on xxx
## xxx TODO , merge into HIG example code later on xxx

As an example problem for the Half-Inverse Gradients (HIGs) we'll consider controlling a system of coupled oscillators. This is a classical problem in physics, and a good case to evaluate HIGs due to its small size. We're using two mass points, and thus we'll only have four degrees of freedom for the positions and velocities of both points (compared to, e.g., the $32\times32\times2$ unknowns we'd get even for "only" a small fluid simulation with 32 cells along x and y). Nonetheless, the oscillators are a highly non-trivial case: we aim to apply a control such that the initial state is reached again after a chosen time interval. Here we'll use 96 steps of a fourth-order Runge-Kutta scheme, and hence the NN has to learn how to best "nudge" the two mass points over the course of all time steps, so that they end up at the desired position with the right velocity at the right time.

@@ -166,6 +170,5 @@ $$

... which provides the basis for the RK4 time integration.
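
As a rough sketch of what this time integration could look like in code (the spring constants, coupling, time step, and initial state below are illustrative, not the notebook's exact setup), here is a classical RK4 step for two spring-coupled unit masses with an additive control force:

```python
import numpy as np

def rhs(z, u, k=1.0, kc=0.5):
    """Time derivative of the state z = (x1, v1, x2, v2) for two unit masses
    coupled by springs, with an additive control force u = (u1, u2)."""
    x1, v1, x2, v2 = z
    a1 = -k * x1 - kc * (x1 - x2) + u[0]
    a2 = -k * x2 - kc * (x2 - x1) + u[1]
    return np.array([v1, a1, v2, a2])

def rk4_step(z, u, dt):
    """One classical fourth-order Runge-Kutta step, with the control held fixed over the step."""
    k1 = rhs(z, u)
    k2 = rhs(z + 0.5 * dt * k1, u)
    k3 = rhs(z + 0.5 * dt * k2, u)
    k4 = rhs(z + dt * k3, u)
    return z + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

# Roll out 96 steps; in the learning setup the control values would come from the NN.
z = np.array([1.0, 0.0, -0.5, 0.0])
for _ in range(96):
    z = rk4_step(z, u=np.zeros(2), dt=0.1)
```

In the learning setup, the control passed in at each of the 96 steps comes from the neural network, and the loss compares the final state of the rollout with the desired one.
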

continue with notebook text ...

xxx

@@ -85,7 +85,8 @@ name: pg-training
TODO, visual overview of PG training
```

---



## Traditional optimization methods

@@ -207,8 +208,7 @@ are still a very active research topic, and hence many extensions have been prop

---



## Derivation of Physical Gradients

@@ -334,6 +334,8 @@ This effectively amounts to _smoothing the objective landscape_ of an optimizati
The equations naturally generalize to higher dimensions by replacing the integral with a path integral along any differentiable path connecting $x_0$ and $x_0 + \Delta x$, and by replacing the local gradient with the gradient in the direction of the path.
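
For reference, a one-dimensional version of this statement can be sketched as follows (the notation here is generic and chosen for illustration, not necessarily identical to the symbols used earlier in the chapter):

$$
\frac{y(x_0 + \Delta x) - y(x_0)}{\Delta x} \;=\; \frac{1}{\Delta x} \int_{x_0}^{x_0 + \Delta x} \frac{\partial y}{\partial x} \, dx ,
$$

i.e., a finite step effectively sees an average of the local gradients over the interval, which is the smoothing effect mentioned above. In higher dimensions, the same relation holds with the integral taken along a differentiable path $\gamma$ from $x_0$ to $x_0 + \Delta x$, and with the integrand being the gradient evaluated in the direction of the path.
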



### Global and local inverse functions

@@ -35,6 +35,7 @@ Value-based methods, such as _Q-Learning_, on the other hand work by optimizing

In addition, _actor-critic_ methods combine elements from both approaches. Here, the actions generated by a policy network are rated based on a corresponding change in state potential. These values are given by another neural network and approximate the expected cumulative reward from the given state. _Proximal policy optimization_ (PPO) {cite}`schulman2017proximal` is one example from this class of algorithms and is our choice for the example task of this chapter, which is controlling Burgers' equation as a physical environment.



## Proximal policy optimization

@@ -128,6 +129,8 @@ r_t^o &=
\end{cases}
\end{aligned}$$



## Implementation

In the following, we'll describe a way to implement a PPO-based RL training for physical systems. This implementation is also the basis for the notebook of the next section, i.e., {doc}`reinflearn-code`. While this notebook provides a practical example, and an evaluation in comparison to DP training, we'll first give a more generic overview below.
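
Before going into the details, here is a minimal numpy sketch of the clipped surrogate objective at the core of PPO. It is meant purely as an illustration; the function name, arguments, and clipping value are not taken from the notebook of the next section.

```python
import numpy as np

def ppo_clip_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped PPO surrogate (to be maximized) for a batch of sampled actions."""
    ratio = np.exp(logp_new - logp_old)                      # pi_new(a|s) / pi_old(a|s)
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    return np.mean(np.minimum(ratio * advantages, clipped * advantages))
```

The actor is updated to increase this objective, while the critic is trained separately to predict the expected cumulative reward that enters the advantage estimates.
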