additional corrections supervised chapter
parent c8feb79fe7
commit fe0026a8ca
@@ -7,7 +7,7 @@ We should keep in mind that for all measurements, models, and discretizations we
 
 This admittedly becomes even more difficult in the context of machine learning:
 we're typically facing the task of approximating complex and unknown functions.
-From a probabilistic perspective, the standard process of training a NN here
+From a probabilistic perspective, the standard process of training an NN here
 yields a _maximum likelihood estimation_ (MLE) for the parameters of the network.
 However, this MLE viewpoint does not take any of the uncertainties mentioned above into account:
 for DL training, we likewise have a numerical optimization, and hence an inherent
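The MLE view mentioned in this passage is the standard one: assuming the training targets are perturbed by i.i.d. Gaussian noise with a fixed variance $\sigma^2$, maximizing the likelihood of the observed data is equivalent to the usual minimization of squared errors over the network parameters $\theta$:

$$
\text{arg max}_{\theta} \prod_i \mathcal{N}\big(y^*_i \,\big|\, f(x_i;\theta), \sigma^2\big)
\;=\; \text{arg min}_{\theta} \sum_i \big(f(x_i;\theta) - y^*_i\big)^2 .
$$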
@@ -63,7 +63,7 @@ for all parameters of $\mathcal P(\mathbf{x}, \nu)$, e.g.,
 we omit $\nu$ in the following, assuming that this is a
 given model parameter with which the NN should not interact.
 Naturally, it can vary within the solution manifold that we're interested in,
-but $\nu$ will not be the output of a NN representation. If this is the case, we can omit
+but $\nu$ will not be the output of an NN representation. If this is the case, we can omit
 providing $\partial \mathcal P_i / \partial \nu$ in our solver. However, the following learning process
 naturally transfers to including $\nu$ as a degree of freedom.
 
@@ -32,7 +32,7 @@ where the $_{\mathbf{x}}$ subscripts denote spatial derivatives with respect to
 of higher and higher order (this can of course also include mixed derivatives with respect to different axes).
 
 In this context we can employ DL by approximating the unknown $\mathbf{u}$ itself
-with a NN, denoted by $\tilde{\mathbf{u}}$. If the approximation is accurate, the PDE
+with an NN, denoted by $\tilde{\mathbf{u}}$. If the approximation is accurate, the PDE
 naturally should be satisfied, i.e., the residual $R$ should be equal to zero:
 
 $$
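Such a residual can be evaluated directly for an NN ansatz $\tilde{\mathbf{u}}$ via automatic differentiation. The following is only an illustrative PyTorch sketch, using a generic 1D residual $R = u_t + u\,u_x$ and an arbitrary small network, not the specific setup of the chapter:

```python
import torch

# small fully connected network as the ansatz u~(x, t); the architecture is an arbitrary choice
u_tilde = torch.nn.Sequential(
    torch.nn.Linear(2, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 1),
)

def residual(x, t):
    """Evaluate R = u_t + u * u_x for the NN ansatz via autograd."""
    x.requires_grad_(True)
    t.requires_grad_(True)
    u = u_tilde(torch.stack([x, t], dim=-1))[..., 0]
    u_x, = torch.autograd.grad(u.sum(), x, create_graph=True)
    u_t, = torch.autograd.grad(u.sum(), t, create_graph=True)
    return u_t + u * u_x

# drive the residual towards zero at randomly sampled collocation points
x, t = torch.rand(128), torch.rand(128)
loss = (residual(x, t) ** 2).mean()
loss.backward()
```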
@@ -424,7 +424,7 @@
 "id": "UNjBAvfWJMTR"
 },
 "source": [
-"With an exponent of 3, this network has 147555 trainable parameters. As the subtle hint in the print statement indicates, this is a crucial number to always have in view when training NNs. It's easy to change settings, and get a network that has millions of parameters, and as a result probably all kinds of convergence and overfitting problems. The number of parrameters definitely has to be matched with the amount of training data, and should also scale with the depth of the network. How these three relate to each other exactly is problem dependent, though."
+"With an exponent of 3, this network has 147555 trainable parameters. As the subtle hint in the print statement indicates, this is a crucial number to always have in view when training NNs. It's easy to change settings, and get a network that has millions of parameters, and as a result probably all kinds of convergence and overfitting problems. The number of parameters definitely has to be matched with the amount of training data, and should also scale with the depth of the network. How these three relate to each other exactly is problem dependent, though."
 ],
 },
 {
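The parameter count discussed in this cell is easy to check at any time; a minimal PyTorch sketch, where `net` is a placeholder for whatever network object the notebook actually builds:

```python
import torch

# `net` is a placeholder; any torch.nn.Module works the same way
net = torch.nn.Sequential(torch.nn.Linear(10, 64), torch.nn.ReLU(), torch.nn.Linear(64, 3))

# count the trainable parameters, i.e., the number the cell above refers to
num_params = sum(p.numel() for p in net.parameters() if p.requires_grad)
print(f"Trainable params: {num_params}")
```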
@@ -435,7 +435,7 @@
 "source": [
 "## Training\n",
 "\n",
-"Finally, we can train the model. This step can take a while, as the training runs over all 320 samples 100 times, and continually evaluate the validation samples to keep track of how well the current state of the NN is doing."
+"Finally, we can train the model. This step can take a while, as the training runs over all 320 samples 100 times, and continually evaluates the validation samples to keep track of how well the current state of the NN is doing."
 ],
 },
 {
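The loop described in that cell (repeated passes over the training samples, with a validation pass to monitor progress) follows the usual epoch pattern; a condensed PyTorch sketch, where the model, data, loss, and optimizer settings are illustrative stand-ins rather than the notebook's exact objects:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# stand-ins for the notebook's model and data: a small generic net and random tensors
net = torch.nn.Sequential(torch.nn.Linear(8, 64), torch.nn.ReLU(), torch.nn.Linear(64, 3))
train_loader = DataLoader(TensorDataset(torch.randn(320, 8), torch.randn(320, 3)), batch_size=10)
val_loader   = DataLoader(TensorDataset(torch.randn(80, 8),  torch.randn(80, 3)),  batch_size=10)

criterion = torch.nn.L1Loss()                              # illustrative choice of error metric
optimizer = torch.optim.Adam(net.parameters(), lr=0.0006)  # hyperparameters are placeholders

for epoch in range(100):                                   # 100 passes over the training set
    net.train()
    for inputs, targets in train_loader:
        optimizer.zero_grad()
        loss = criterion(net(inputs), targets)
        loss.backward()
        optimizer.step()

    net.eval()                                             # validation pass, no gradients needed
    with torch.no_grad():
        val_loss = sum(criterion(net(x), y).item() for x, y in val_loader) / len(val_loader)
    print(f"epoch {epoch}: validation L1 loss {val_loss:.5f}")
```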
@@ -806,9 +806,9 @@
 "\n",
 "* Experiment with learning rate, dropout, and model size to reduce the error on the test set. How small can you make it with the given training data?\n",
 "\n",
-"* The setup above uses normalized data [the original fields by undoing the normalization](https://github.com/thunil/Deep-Flow-Prediction), and check how well the network does w.r.t. the original \n",
+"* The setup above uses normalized data. Instead you can recover [the original fields by undoing the normalization](https://github.com/thunil/Deep-Flow-Prediction) to check how well the network does w.r.t. the original quantities.\n",
 "\n",
-"* As you'll see, it's a bit limited here what you can get out of this dataset, head over to [the main github repo of this project](https://github.com/thunil/Deep-Flow-Prediction) to download larger data sets, or generate own data\n",
+"* As you'll see, it's a bit limited here what you can get out of this dataset, head over to [the main github repo of this project](https://github.com/thunil/Deep-Flow-Prediction) to download larger data sets, or generate own data.\n",
 "\n"
 ]
 }
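Undoing the normalization, as suggested in the second bullet, simply means applying the inverse of whatever scaling was used when the dataset was built; a minimal sketch, where the per-channel factors are hypothetical placeholders rather than the actual values from the linked repository:

```python
import numpy as np

# hypothetical per-channel scale factors; the real ones come from the dataset generation code
norm_factors = np.array([1.0, 1.0, 1.0])

def denormalize(fields):
    """Map normalized [3, H, W] output fields back to physical units."""
    return fields * norm_factors[:, None, None]

restored = denormalize(np.random.rand(3, 128, 128))  # example call on dummy data
```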
@@ -14,7 +14,7 @@ model equations exist.
 
 For supervised training, we're faced with an
 unknown function $f^*(x)=y^*$, collect lots of pairs of data $[x_0,y^*_0], ...[x_n,y^*_n]$ (the training data set)
-and directly train a NN to represent an approximation of $f^*$ denoted as $f$.
+and directly train an NN to represent an approximation of $f^*$ denoted as $f$.
 
 The $f$ we can obtain in this way is typically not exact,
 but instead we obtain it via a minimization problem:
@@ -24,7 +24,7 @@ $$
 \text{arg min}_{\theta} \sum_i (f(x_i ; \theta)-y^*_i)^2 .
 $$ (supervised-training)
 
-This will give us $\theta$ such that $f(x;\theta) = y \approx y$ as accurately as possible given
+This will give us $\theta$ such that $f(x;\theta) = y \approx y^*$ as accurately as possible given
 our choice of $f$ and the hyperparameters for training. Note that above we've assumed
 the simplest case of an $L^2$ loss. A more general version would use an error metric $e(x,y)$
 to be minimized via $\text{arg min}_{\theta} \sum_i e( f(x_i ; \theta) , y^*_i) )$. The choice
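This minimization is exactly what a standard training loop carries out; a minimal PyTorch sketch of the $L^2$ objective above, where the network architecture, optimizer settings, and the toy target function $f^*(x)=\sin(x)$ are arbitrary illustrations:

```python
import torch

# f(x; theta): a small, generic fully connected network
f = torch.nn.Sequential(torch.nn.Linear(1, 64), torch.nn.ReLU(), torch.nn.Linear(64, 1))
optimizer = torch.optim.Adam(f.parameters(), lr=1e-3)

# training pairs [x_i, y*_i] for a toy unknown function f*(x) = sin(x)
x = torch.linspace(-3.0, 3.0, 200).unsqueeze(1)
y_star = torch.sin(x)

for step in range(2000):
    optimizer.zero_grad()
    loss = ((f(x) - y_star) ** 2).sum()   # arg min_theta sum_i (f(x_i; theta) - y*_i)^2
    loss.backward()
    optimizer.step()
```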
@@ -37,7 +37,7 @@ The training data typically needs to be of substantial size, and hence it is att
 to use numerical simulations solving a physical model $\mathcal{P}$
 to produce a large number of reliable input-output pairs for training.
 This means that the training process uses a set of model equations, and approximates
-them numerically, in order to train the NN representation $\tilde{f}$. This
+them numerically, in order to train the NN representation $f$. This
 has quite a few advantages, e.g., we don't have measurement noise of real-world devices
 and we don't need manual labour to annotate a large number of samples to get training data.
 
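Generating such input-output pairs amounts to sampling inputs, running the numerical model once per sample, and storing the results; a schematic sketch, where `solver` is merely a placeholder for an actual numerical approximation of $\mathcal{P}$:

```python
import numpy as np

def solver(x):
    """Placeholder for a numerical model P; in practice this would be a full simulation."""
    return np.tanh(x) + 0.1 * x**2

# sample inputs, run the (potentially expensive) solver per sample, and store the pairs
inputs = np.random.uniform(-2.0, 2.0, size=(320, 1))
targets = np.stack([solver(x) for x in inputs])
np.savez("training_data.npz", inputs=inputs, targets=targets)
```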
@@ -61,7 +61,8 @@ in mind in comparison to the more complex variants we'll encounter later on.
 ## Surrogate models
 
 One of the central advantages of the supervised approach above is that
-we obtain a _surrogate_ for the model $\mathcal{P}$. The numerical approximations
+we obtain a _surrogate model_, i.e., a new function that mimics the behavior of the original $\mathcal{P}$.
+The numerical approximations
 of PDE models for real world phenomena are often very expensive to compute. A trained
 NN on the other hand incurs a constant cost per evaluation, and is typically trivial
 to evaluate on specialized hardware such as GPUs or NN units.
@@ -78,4 +79,4 @@ is a very attractive and interesting direction.
 ## Show me some code!
 
 Let's directly look at an example for this: we'll replace a full solver for
-_turbulent flows around airfoils_ with a surrogate model (from {cite}`thuerey2020dfp`).
+_turbulent flows around airfoils_ with a surrogate model from {cite}`thuerey2020dfp`.