and then increase the complexity of the setup.

A nice property of the supervised training is also that it's very stable.
Things won't get any better when we include more complex physical
models, or look at more complicated NN architectures.

Thus, again, make sure you can see a nice exponential falloff in your training
loss when starting with the simple overfitting tests. This is a good
rough estimate of suitable values for $\eta$.
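Such an overfitting test can be sketched with a tiny network; the snippet below is a hypothetical stand-in (plain NumPy instead of a DL framework, made-up data), not the book's actual training code:

```python
import numpy as np

# Minimal overfitting test: a tiny two-layer network is fit to a handful of
# samples. With a correct setup, the training loss should fall off steeply
# (roughly exponentially) over the epochs.
rng = np.random.default_rng(42)
x = rng.uniform(0.0, 1.0, size=(8, 2))        # 8 training samples, 2 inputs
y = np.sin(3.0 * x[:, :1]) + x[:, 1:]         # arbitrary smooth target

w1 = rng.normal(0.0, 0.5, size=(2, 16)); b1 = np.zeros(16)
w2 = rng.normal(0.0, 0.5, size=(16, 1)); b2 = np.zeros(1)
eta = 0.1                                     # learning rate, cf. $\eta$ above

losses = []
for epoch in range(2000):
    h = np.tanh(x @ w1 + b1)                  # forward pass
    pred = h @ w2 + b2
    err = pred - y
    losses.append(float(np.mean(err ** 2)))   # MSE training loss
    # backward pass, plain gradient descent
    g_pred = 2.0 * err / len(x)
    g_w2 = h.T @ g_pred; g_b2 = g_pred.sum(0)
    g_h = g_pred @ w2.T * (1.0 - h ** 2)
    g_w1 = x.T @ g_h;   g_b1 = g_h.sum(0)
    w1 -= eta * g_w1; b1 -= eta * g_b1
    w2 -= eta * g_w2; b2 -= eta * g_b2

print(losses[0], losses[-1])
```

If the final loss has not dropped far below the initial one on such a trivially small data set, the learning rate $\eta$ or the setup is off.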
A comment that you'll often hear when talking about DL approaches, and especially
when using relatively simple training methodologies is: "Isn't it just interpolating the data?"

Well, **yes** it is! And that's exactly what the NN should do. In a way - there isn't
anything else to do. This is what _all_ DL approaches are about. They give us smooth
representations of the data seen at training time. Even if we'll use fancy physical
models at training time later on, the NNs just adjust their weights to represent the signals
they receive, and reproduce them.

Due to the hype and numerous success stories, people not familiar with DL often have
and general principles in data sets (["messages from god"](https://dilbert.com/s
That's not what happens with the current state of the art. Nonetheless, it's
the most powerful tool we have to approximate complex, non-linear functions.
It is a great tool, but it's important to keep in mind that once we set up the training
correctly, all we'll get out of it is an approximation of the function the NN
was trained for - no magic involved.

An implication of this is that you shouldn't expect the network
to work on data it has never seen. In a way, the NNs are so good exactly
because they can accurately adapt to the signals they receive at training time,
but in contrast to other learned representations, they're actually not very good
at extrapolation. So we can't expect an NN to magically work with new inputs.
Rather, we need to make sure that we can properly shape the input space,
e.g., by normalization and by focusing on invariants. In short, if you always train
your networks for inputs in the range $[0\dots1]$, don't expect them to work
with inputs of $[10\dots11]$. You might be able to subtract an offset of $10$ beforehand,
and re-apply it after evaluating the network.
As a rule of thumb: always make sure you
actually train the NN on the kinds of input you want to use at inference time.
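The offset idea can be illustrated with a small stand-in for a trained network; the polynomial fit below is purely hypothetical, and the target $\sin(2\pi x)$ is chosen because it happens to be invariant under the shift, so only the input needs to be moved back into the training range (in general, the output would need the offset re-applied, as described above):

```python
import numpy as np

# Stand-in "model": a degree-7 polynomial fitted to sin(2*pi*x) on [0,1],
# playing the role of a network trained for inputs in [0,1].
x_train = np.linspace(0.0, 1.0, 50)
coeffs = np.polyfit(x_train, np.sin(2.0 * np.pi * x_train), deg=7)
model = np.poly1d(coeffs)

x_new = np.linspace(10.0, 11.0, 50)      # inputs outside the training range

bad = model(x_new)                       # the fit blows up far from [0,1]
good = model(x_new - 10.0)               # shift inputs back into [0,1] first

target = np.sin(2.0 * np.pi * x_new)     # periodic, hence shift-invariant
err_bad = np.max(np.abs(bad - target))
err_good = np.max(np.abs(good - target))
print(err_bad, err_good)
```

The raw evaluation is off by a huge margin, while the shifted one stays within the small fitting error - the model only ever sees inputs from the range it was trained on.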

This is important to keep in mind during the next chapters: e.g., if we
want an NN to work in conjunction with another solver or simulation environment,
it's important to actually bring the solver into the training process, otherwise
the network might specialize on pre-computed data that differs from what is produced
when combining the NN with the solver, i.e., _distribution shift_.

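A toy sketch of this effect (all dynamics and numbers below are made up for illustration): the reference data is pre-computed with "true" dynamics, while the solver combined with a learned correction reproduces them only approximately. The small residual error compounds over the rollout, so the inputs the NN sees drift further and further from the pre-computed training data:

```python
def reference_step(x):
    # dynamics used to pre-compute the training data
    return 1.02 * x

def solver_with_nn(x):
    # solver combined with a learned correction; matches the reference
    # only up to a small residual error
    return 1.02 * x + 0.01 * x ** 2

# pre-computed reference rollout
ref_states, x = [], 1.0
for _ in range(30):
    ref_states.append(x)
    x = reference_step(x)

# combined rollout: measure distance to the states the NN was trained on
drift, x = [], 1.0
for x_ref in ref_states:
    drift.append(abs(x - x_ref))
    x = solver_with_nn(x)

print(drift[1], drift[-1])   # the gap keeps growing over the rollout
```

Bringing the solver into the training loop means the correction is optimized on exactly these drifted states, rather than only on the pre-computed ones.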
### Meshes and grids