additional corrections supervised chapter

NT 2021-07-10 10:50:51 +02:00
parent c8feb79fe7
commit fe0026a8ca
5 changed files with 13 additions and 12 deletions

View File

@@ -7,7 +7,7 @@ We should keep in mind that for all measurements, models, and discretizations we
This admittedly becomes even more difficult in the context of machine learning:
we're typically facing the task of approximating complex and unknown functions.
-From a probabilistic perspective, the standard process of training a NN here
+From a probabilistic perspective, the standard process of training an NN here
yields a _maximum likelihood estimation_ (MLE) for the parameters of the network.
However, this MLE viewpoint does not take any of the uncertainties mentioned above into account:
for DL training, we likewise have a numerical optimization, and hence an inherent

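To make the MLE statement above concrete, here is a short derivation sketch, under the common (but here assumed) model of i.i.d. Gaussian observation noise with fixed variance $\sigma^2$ around the network output, using the notation $f(x_i;\theta)$ for the NN and $y^*_i$ for the targets as in the supervised chapter below:

$$
\begin{aligned}
\theta_{\text{MLE}} &= \text{arg max}_{\theta} \prod_i \mathcal{N}\big(y^*_i \,;\, f(x_i;\theta), \sigma^2\big) \\
&= \text{arg max}_{\theta} \sum_i \Big( -\tfrac{1}{2\sigma^2}\big(f(x_i;\theta)-y^*_i\big)^2 - \tfrac{1}{2}\log(2\pi\sigma^2) \Big) \\
&= \text{arg min}_{\theta} \sum_i \big(f(x_i;\theta)-y^*_i\big)^2 .
\end{aligned}
$$

In other words, minimizing the usual $L^2$ training loss yields exactly such a maximum likelihood point estimate for $\theta$, with no notion of uncertainty attached to it.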
View File

@@ -63,7 +63,7 @@ for all parameters of $\mathcal P(\mathbf{x}, \nu)$, e.g.,
we omit $\nu$ in the following, assuming that this is a
given model parameter with which the NN should not interact.
Naturally, it can vary within the solution manifold that we're interested in,
-but $\nu$ will not be the output of a NN representation. If this is the case, we can omit
+but $\nu$ will not be the output of an NN representation. If this is the case, we can omit
providing $\partial \mathcal P_i / \partial \nu$ in our solver. However, the following learning process
naturally transfers to including $\nu$ as a degree of freedom.

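A minimal PyTorch sketch of this convention (the toy solver `P` below is purely hypothetical; the point is only that $\nu$ enters as a plain constant, so no $\partial \mathcal P_i / \partial \nu$ is ever required):

```python
import torch

def P(x, nu):
    # hypothetical, differentiable toy "solver step"; nu is a plain Python float,
    # so autograd only tracks the dependence on x
    return x - nu * x**3

nu = 0.01                                  # given model parameter, fixed during learning
x = torch.randn(16, requires_grad=True)    # stands in for the output of an NN
y = P(x, nu)
y.sum().backward()                         # gradients flow only through x
print(x.grad[:3])
```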
View File

@@ -32,7 +32,7 @@ where the $_{\mathbf{x}}$ subscripts denote spatial derivatives with respect to
of higher and higher order (this can of course also include mixed derivatives with respect to different axes).
In this context we can employ DL by approximating the unknown $\mathbf{u}$ itself
-with a NN, denoted by $\tilde{\mathbf{u}}$. If the approximation is accurate, the PDE
+with an NN, denoted by $\tilde{\mathbf{u}}$. If the approximation is accurate, the PDE
naturally should be satisfied, i.e., the residual $R$ should be equal to zero:
$$

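Since the specific PDE is cut off in this hunk, here is a generic PyTorch sketch of evaluating such a residual for an assumed 1D example, $R = u_t + u\,u_x - \nu u_{xx}$ (a Burgers-like form chosen purely for illustration), with the derivatives of $\tilde{\mathbf{u}}$ obtained via automatic differentiation:

```python
import torch

# small fully-connected ansatz for u~(x, t)
u_tilde = torch.nn.Sequential(
    torch.nn.Linear(2, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 1),
)

def residual(x, t, nu=0.01):
    x = x.clone().requires_grad_(True)
    t = t.clone().requires_grad_(True)
    u = u_tilde(torch.stack([x, t], dim=-1)).squeeze(-1)
    u_x, = torch.autograd.grad(u.sum(), x, create_graph=True)   # spatial derivative
    u_t, = torch.autograd.grad(u.sum(), t, create_graph=True)   # time derivative
    u_xx, = torch.autograd.grad(u_x.sum(), x, create_graph=True)
    return u_t + u * u_x - nu * u_xx      # R, should approach zero where the PDE holds

x, t = torch.rand(128), torch.rand(128)
loss = (residual(x, t) ** 2).mean()       # residual loss, minimized w.r.t. the network weights
loss.backward()
```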
View File

@@ -424,7 +424,7 @@
"id": "UNjBAvfWJMTR"
},
"source": [
"With an exponent of 3, this network has 147555 trainable parameters. As the subtle hint in the print statement indicates, this is a crucial number to always have in view when training NNs. It's easy to change settings, and get a network that has millions of parameters, and as a result probably all kinds of convergence and overfitting problems. The number of parrameters definitely has to be matched with the amount of training data, and should also scale with the depth of the network. How these three relate to each other exactly is problem dependent, though."
"With an exponent of 3, this network has 147555 trainable parameters. As the subtle hint in the print statement indicates, this is a crucial number to always have in view when training NNs. It's easy to change settings, and get a network that has millions of parameters, and as a result probably all kinds of convergence and overfitting problems. The number of parameters definitely has to be matched with the amount of training data, and should also scale with the depth of the network. How these three relate to each other exactly is problem dependent, though."
]
},
{
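The parameter count mentioned above can be checked with a one-liner; a generic PyTorch sketch (the `Sequential` model here is just a stand-in for whichever network the notebook builds):

```python
import torch

model = torch.nn.Sequential(               # stand-in; replace with the actual network
    torch.nn.Conv2d(3, 8, 3, padding=1), torch.nn.ReLU(),
    torch.nn.Conv2d(8, 3, 3, padding=1),
)
num_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters: {num_params}")
```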
@@ -435,7 +435,7 @@
"source": [
"## Training\n",
"\n",
"Finally, we can train the model. This step can take a while, as the training runs over all 320 samples 100 times, and continually evaluate the validation samples to keep track of how well the current state of the NN is doing."
"Finally, we can train the model. This step can take a while, as the training runs over all 320 samples 100 times, and continually evaluates the validation samples to keep track of how well the current state of the NN is doing."
]
},
{
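For reference, a generic sketch of such an epoch loop with validation tracking (PyTorch; `model`, `train_loader`, and `val_loader` are placeholders, not taken from the notebook):

```python
import torch

# model / train_loader / val_loader: placeholders, assumed to be defined elsewhere
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = torch.nn.MSELoss()

for epoch in range(100):                            # 100 passes over the 320 training samples
    model.train()
    for inputs, targets in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()

    model.eval()                                    # continually evaluate the validation samples
    with torch.no_grad():
        val_loss = sum(criterion(model(x), y).item() for x, y in val_loader) / len(val_loader)
    print(f"epoch {epoch:03d}, validation loss {val_loss:.3e}")
```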
@@ -806,9 +806,9 @@
"\n",
"* Experiment with learning rate, dropout, and model size to reduce the error on the test set. How small can you make it with the given training data?\n",
"\n",
"* The setup above uses normalized data [the original fields by undoing the normalization](https://github.com/thunil/Deep-Flow-Prediction), and check how well the network does w.r.t. the original \n",
"* The setup above uses normalized data. Instead you can recover [the original fields by undoing the normalization](https://github.com/thunil/Deep-Flow-Prediction) to check how well the network does w.r.t. the original quantities.\n",
"\n",
"* As you'll see, it's a bit limited here what you can get out of this dataset, head over to [the main github repo of this project](https://github.com/thunil/Deep-Flow-Prediction) to download larger data sets, or generate own data\n",
"* As you'll see, it's a bit limited here what you can get out of this dataset, head over to [the main github repo of this project](https://github.com/thunil/Deep-Flow-Prediction) to download larger data sets, or generate own data.\n",
"\n"
]
}

View File

@@ -14,7 +14,7 @@ model equations exist.
For supervised training, we're faced with an
unknown function $f^*(x)=y^*$, for which we collect lots of pairs of data $[x_0,y^*_0], ..., [x_n,y^*_n]$ (the training data set)
-and directly train a NN to represent an approximation of $f^*$ denoted as $f$.
+and directly train an NN to represent an approximation of $f^*$ denoted as $f$.
The $f$ we can obtain in this way is typically not exact,
but instead we obtain it via a minimization problem:
@@ -24,7 +24,7 @@
\text{arg min}_{\theta} \sum_i (f(x_i ; \theta)-y^*_i)^2 .
$$ (supervised-training)
-This will give us $\theta$ such that $f(x;\theta) = y \approx y$ as accurately as possible given
+This will give us $\theta$ such that $f(x;\theta) = y \approx y^*$ as accurately as possible given
our choice of $f$ and the hyperparameters for training. Note that above we've assumed
the simplest case of an $L^2$ loss. A more general version would use an error metric $e(x,y)$
to be minimized via $\text{arg min}_{\theta} \sum_i e( f(x_i ; \theta) , y^*_i )$. The choice
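In code, choosing a different error metric $e$ simply means swapping the loss function; a minimal sketch (the $L^1$ metric and the tiny linear `f` are arbitrary illustrations, not choices made in the text):

```python
import torch

f = torch.nn.Linear(4, 1)                 # placeholder for the parametric function f(x; theta)
x, y_star = torch.randn(32, 4), torch.randn(32, 1)

e = torch.nn.L1Loss()                     # generic metric e(.,.); torch.nn.MSELoss() recovers the L2 case
loss = e(f(x), y_star)
loss.backward()                           # gradients w.r.t. theta, used by the arg min over theta
```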
@@ -37,7 +37,7 @@ The training data typically needs to be of substantial size, and hence it is att
to use numerical simulations solving a physical model $\mathcal{P}$
to produce a large number of reliable input-output pairs for training.
This means that the training process uses a set of model equations, and approximates
-them numerically, in order to train the NN representation $\tilde{f}$. This
+them numerically, in order to train the NN representation $f$. This
has quite a few advantages, e.g., we don't have measurement noise of real-world devices
and we don't need manual labour to annotate a large number of samples to get training data.
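Schematically, this data-generation step looks as follows (the `simulate` function is a hypothetical stand-in for a numerical approximation of $\mathcal{P}$; names and shapes are illustrative only):

```python
import numpy as np

def simulate(x):
    # hypothetical stand-in for an expensive numerical solver evaluating P(x)
    return np.tanh(x) + 0.1 * x**2

rng = np.random.default_rng(0)
inputs = rng.uniform(-1.0, 1.0, size=(1000, 8))        # sampled simulation setups x_i
targets = np.stack([simulate(x) for x in inputs])      # reference outputs y*_i
np.savez("training_pairs.npz", inputs=inputs, targets=targets)
```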
@@ -61,7 +61,8 @@ in mind in comparison to the more complex variants we'll encounter later on.
## Surrogate models
One of the central advantages of the supervised approach above is that
-we obtain a _surrogate_ for the model $\mathcal{P}$. The numerical approximations
+we obtain a _surrogate model_, i.e., a new function that mimics the behavior of the original $\mathcal{P}$.
+The numerical approximations
of PDE models for real world phenomena are often very expensive to compute. A trained
NN on the other hand incurs a constant cost per evaluation, and is typically trivial
to evaluate on specialized hardware such as GPUs or NN units.
@@ -78,4 +79,4 @@ is a very attractive and interesting direction.
## Show me some code!
Let's directly look at an example for this: we'll replace a full solver for
-_turbulent flows around airfoils_ with a surrogate model (from {cite}`thuerey2020dfp`).
+_turbulent flows around airfoils_ with a surrogate model from {cite}`thuerey2020dfp`.