Supervised Training
=======================
_Supervised_ here essentially means: "doing things the old-fashioned way". Old-fashioned in the context of
deep learning (DL), of course, so it's still fairly new. Also, "old-fashioned" doesn't
always mean bad - it's just that later on we'll be able to do better than with simple supervised training.

In a way, the viewpoint of "supervised training" is a starting point for all projects one encounters in the context of DL,
and hence it is worth studying. While it typically yields results inferior to approaches that couple
more tightly with physics, it can nonetheless be the only choice in certain application scenarios where no good
model equations exist.
## Problem setting
For supervised training, we are faced with an
unknown function $f^*(x)=y^*$. We collect a large number of data pairs $[x_0,y^*_0], ..., [x_n,y^*_n]$ (the training data set)
and directly train an NN to represent an approximation $f$ of $f^*$, such
that $f(x)=y \approx y^*$.

The $f$ we can obtain is typically not exact;
instead, we obtain it via a minimization problem:
we adjust the weights $\theta$ of our representation $f$ to solve
$\text{arg min}_{\theta} \sum_i (f(x_i ; \theta)-y^*_i)^2$.
This gives us $\theta$ such that $f(x;\theta) \approx y^*$ as accurately as possible given
our choice of $f$ and the hyperparameters used for training. Note that above we've assumed
the simplest case of an $L^2$ loss. A more general version would use an error metric $e(x,y)$
to be minimized via $\text{arg min}_{\theta} \sum_i e( f(x_i ; \theta) , y^*_i)$. The choice
of a suitable metric is a topic we will get back to later on.
Irrespective of our choice of metric, this formulation
gives the actual "learning" process for a supervised approach.
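
To make this concrete, here is a minimal sketch of such a minimization in PyTorch. The data, network size, and all names below are purely illustrative placeholders, not the setup used later in this book:

```python
import torch

# hypothetical training pairs [x_0, y*_0], ..., [x_n, y*_n]; random placeholders here
x = torch.rand(1000, 2)       # inputs x_i
y_star = torch.rand(1000, 3)  # reference outputs y*_i

# a small fully connected network as the representation f(x; theta)
f = torch.nn.Sequential(
    torch.nn.Linear(2, 64), torch.nn.ReLU(),
    torch.nn.Linear(64, 3),
)

optimizer = torch.optim.Adam(f.parameters(), lr=1e-3)
loss_fn = torch.nn.MSELoss()  # the L^2 loss from above; another metric e(.,.) could be used instead

for epoch in range(100):
    optimizer.zero_grad()
    loss = loss_fn(f(x), y_star)  # sum_i (f(x_i; theta) - y*_i)^2, up to a constant factor
    loss.backward()               # gradients of the loss w.r.t. the weights theta
    optimizer.step()              # adjust theta to reduce the loss
```

In practice one would of course work with mini-batches and a separate validation set, but the core of the supervised optimization is just this loop.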
The training data typically needs to be of substantial size, and hence it is attractive
to use numerical simulations to produce a large number of training input-output pairs.
This means that the training process uses a set of model equations, and approximates
them numerically, in order to train the NN representation $f$. This
has a bunch of advantages, e.g., we don't have the measurement noise of real-world devices,
and we don't need manual labour to annotate a large number of samples to get training data.
On the other hand, this approach inherits the common challenges of replacing experiments
with simulations: first, we need to ensure the chosen model has enough power to predict the
behavior of the real-world phenomena that we're interested in.
In addition, the numerical approximations introduce numerical errors,
which need to be kept small enough for the chosen application. As these topics are studied in depth
for classical simulations, the existing knowledge can likewise be leveraged to
set up DL training tasks.
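
As an illustration of this data-generation step, the sketch below uses a deliberately tiny stand-in "solver" (an explicit Euler integration of a damped oscillator, chosen purely for brevity) to produce input-output pairs. For a real application this function would be replaced by an actual numerical solver for the model equations of interest:

```python
import numpy as np

def simulate(k, d, steps=100, dt=0.01):
    """Toy stand-in for a numerical solver: explicit Euler for x'' = -k x - d x'.
    Returns the state after `steps` time steps."""
    pos, vel = 1.0, 0.0
    for _ in range(steps):
        acc = -k * pos - d * vel
        pos, vel = pos + dt * vel, vel + dt * acc
    return pos, vel

# sample input parameters x_i and compute the reference outputs y*_i numerically
rng = np.random.default_rng(0)
inputs  = rng.uniform([1.0, 0.0], [10.0, 1.0], size=(1000, 2))  # (k, d) pairs
outputs = np.array([simulate(k, d) for k, d in inputs])         # simulated (pos, vel)

# `inputs` and `outputs` now form the training data set for the supervised setup above
```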
```{figure} resources/supervised-training.jpg
---
height: 220px
name: supervised-training
---
A visual overview of supervised training. Quite simple overall, but it's good to keep this
in mind in comparison to the more complex variants we'll encounter later on.
```
## Show me some code!
Let's directly look at an implementation within a more complicated context:
_turbulent flows around airfoils_ from {cite}`thuerey2020deepFlowPred`.