Supervised Training
=======================
_Supervised_ here essentially means: "doing things the old-fashioned way". Old-fashioned in the context of
deep learning (DL), of course, so it's still fairly new.
Also, "old-fashioned" doesn't always mean bad - it's just that later on we'll discuss ways to train networks that clearly outperform approaches using supervised training.
Nonetheless, "supervised training" is a starting point for all projects one would encounter in the context of DL, and
hence it is worth studying. Also, while it typically yields inferior results to approaches that couple more tightly
with physics, it can be the only choice in certain application scenarios where no good
model equations exist.
## Problem setting
For supervised training, we're faced with an
unknown function $f^*(x)=y^*$, collect lots of pairs of data $[x_0,y^*_0], \dots, [x_n,y^*_n]$ (the training data set),
and directly train an NN to represent an approximation of $f^*$ denoted as $f$.
The $f$ we can obtain in this way is typically not exact.
Instead, it is found via a minimization problem:
by adjusting the weights $\theta$ of our NN representation of $f$ such that
$$
\text{arg min}_{\theta} \sum_i (f(x_i ; \theta)-y^*_i)^2 .
$$ (supervised-training)
This will give us $\theta$ such that $f(x;\theta) = y \approx y^*$ as accurately as possible given
our choice of $f$ and the hyperparameters for training. Note that above we've assumed
the simplest case of an $L^2$ loss. A more general version would use an error metric $e(x,y)$
to be minimized via $\text{arg min}_{\theta} \sum_i e( f(x_i ; \theta) , y^*_i)$. The choice
of a suitable metric is a topic we will get back to later on.
Irrespective of our choice of metric, this formulation
gives the actual "learning" process for a supervised approach.
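
To make this minimization concrete, below is a minimal sketch of such a supervised training loop in PyTorch. It is not the code of the airfoil example referenced at the end of this section; the data pairs come from a simple placeholder function standing in for the unknown $f^*$, and the network size, learning rate, and number of steps are arbitrary choices for illustration.

```python
import torch

# Synthetic stand-in for the unknown f*(x); in practice these pairs would
# come from measurements or from a numerical simulation.
x = torch.rand(1000, 1) * 2.0 - 1.0   # inputs x_i in [-1, 1]
y_star = torch.sin(3.0 * x)           # reference outputs y*_i

# A small fully-connected NN f(x; theta) approximating f*
f = torch.nn.Sequential(
    torch.nn.Linear(1, 32), torch.nn.ReLU(),
    torch.nn.Linear(32, 32), torch.nn.ReLU(),
    torch.nn.Linear(32, 1),
)

optimizer = torch.optim.Adam(f.parameters(), lr=1e-3)
loss_fn = torch.nn.MSELoss()          # the L2 loss from the equation above

for step in range(2000):
    optimizer.zero_grad()
    loss = loss_fn(f(x), y_star)      # mean of (f(x_i; theta) - y*_i)^2
    loss.backward()                   # gradients of the loss w.r.t. theta
    optimizer.step()                  # adjust theta to decrease the loss
```

The important part is the structure: pairs of inputs and reference outputs go in, and the $L^2$ difference between network output and reference is reduced by adjusting $\theta$; nothing else about the underlying function is used.
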
The training data typically needs to be of substantial size, and hence it is attractive
to use numerical simulations solving a physical model $\mathcal{P}$
to produce a large number of reliable input-output pairs for training.
This means that the training process uses a set of model equations, and approximates
them numerically, in order to train the NN representation $f$. This
has quite a few advantages, e.g., we don't have measurement noise of real-world devices
and we don't need manual labour to annotate a large number of samples to get training data.
On the other hand, this approach inherits the common challenges of replacing experiments
with simulations: first, we need to ensure the chosen model has enough power to predict the
behavior of real-world phenomena that we're interested in.
In addition, the numerical approximations introduce numerical errors
which need to be kept small enough for the chosen application. As these topics are studied in depth
for classical simulations, the existing knowledge can likewise be leveraged to
set up DL training tasks.
```{figure} resources/supervised-training.jpg
---
height: 220px
name: supervised-training
---
A visual overview of supervised training. Quite simple overall, but it's good to keep this
in mind in comparison to the more complex variants we'll encounter later on.
```
## Surrogate models
One of the central advantages of the supervised approach above is that
we obtain a _surrogate model_, i.e., a new function that mimics the behavior of the original $\mathcal{P}$.
The numerical approximations
of PDE models for real-world phenomena are often very expensive to compute. A trained
NN on the other hand incurs a constant cost per evaluation, and is typically trivial
to evaluate on specialized hardware such as GPUs or NN units.
Despite this, it's important to be careful:
NNs can quickly generate huge numbers of intermediate results. Consider a CNN layer with
$128$ features. If we apply it to an input of $128^2$, i.e. ca. 16k cells, we get $128^3$ intermediate values.
That's more than 2 million.
All these values need to be at least momentarily stored in memory, and processed by the next layer.
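
As a quick sanity check of these numbers, the following snippet computes the size of this single intermediate tensor, assuming 32-bit floats and a batch size of one:

```python
features = 128
cells = 128 ** 2                      # ca. 16k cells of the input
values = features * cells             # intermediate values after this one layer
print(values)                         # 2097152 -> more than 2 million
print(values * 4 / 1024 ** 2, "MB")   # ca. 8 MB for this single activation tensor
```

For deeper networks, larger inputs, or mini-batches, this multiplies quickly, so the memory needed for intermediate results is worth keeping in mind when sizing such a surrogate.
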
Nonetheless, replacing complex and expensive solvers with fast, learned approximations
is a very attractive and interesting direction.
## Show me some code!
Let's directly look at an example for this: we'll replace a full solver for
_turbulent flows around airfoils_ with a surrogate model from {cite}`thuerey2020dfp`.