spellcheck

2021-03-09 16:39:54 +08:00
parent 42061e7d00
commit c443f2bfdf
12 changed files with 55 additions and 55 deletions
--- a/supervised-discuss.md
+++ b/supervised-discuss.md
@@ -16,7 +16,7 @@ using supervised training.
 _Supervised training_ is the natural starting point for **any** DL project. It always,
 and we really mean **always** here, makes sense to start with a fully supervised
 test using as little data as possible. This will be a pure overfitting test,
-but if your network can't quicklyl converge and give a very good performance 
+but if your network can't quickly converge and give very good performance
 on a single example, then there's something fundamentally wrong
 with your code or data. Thus, there's no reason to move on to more complex
 setups that will make finding these fundamental problems more difficult.
@@ -28,7 +28,7 @@ and then increase the complexity of the setup.

 A nice property of the supervised training is also that it's very stable.
 Things won't get any better when we include more complex physical 
-models, or look at more complicated NN architectures.
+models, or look at more complicated ANN architectures.

 Thus, again, make sure you can see a nice exponential falloff in your training 
 loss when starting with the simple overfitting tests. This is a good
@@ -42,10 +42,10 @@ rough estimate of suitable values for $\eta$.
 A comment that you'll often hear when talking about DL approaches, and especially
 when using relatively simple training methodologies is: "Isn't it just interpolating the data?"

-Well, **yes** it is! And that's exactly what the NN should do. In a way - there isn't 
+Well, **yes** it is! And that's exactly what the ANN should do. In a way, there isn't
 anything else to do. This is what _all_ DL approaches are about. They give us smooth
 representations of the data seen at training time. Even if we'll use fancy physical 
-models at training time later on, the NNs just adjust their weights to represent the signals
+models at training time later on, the ANNs just adjust their weights to represent the signals
 they receive, and reproduce it.

 Due to the hype and numerous success stories, people not familiar with DL often have 
@@ -54,34 +54,34 @@ and general principles in data sets (["messages from god"](https://dilbert.com/s
 That's not what happens with the current state of the art. Nonetheless, it's
 the most powerful tool we have to approximate complex, non-linear functions.
 It is a great tool, but it's important to keep in mind, that once we set up the training
-correctly, all we'll get out of it is an approximation of the function the NN
+correctly, all we'll get out of it is an approximation of the function the ANN
 was trained for - no magic involved.

 An implication of this is that you shouldn't expect the network 
-to work on data it has never seen. In a way, the NNs are so good exactly 
+to work on data it has never seen. In a way, the ANNs are so good exactly 
 because they can accurately adapt to the signals they receive at training time,
-but in contrast to other learned representations, they're acutally not very good
-at extrapolation. So we can't expect an NN to magically work with new inputs.
+but in contrast to other learned representations, they're actually not very good
+at extrapolation. So we can't expect an ANN to magically work with new inputs.
 Rather, we need to make sure that we can properly shape the input space,
 e.g., by normalization and by focusing on invariants. In short, if you always train
 your networks for inputs in the range $[0\dots1]$, don't expect it to work
 with inputs of $[10\dots11]$. You might be able to subtract an offset of $10$ beforehand,
 and re-apply it after evaluating the network.
 As a rule of thumb: always make sure you
-acutally train the NN on the kinds of input you want to use at inference time.
+actually train the ANN on the kinds of input you want to use at inference time.

 This is important to keep in mind during the next chapters: e.g., if we
-want an NN to work in conjunction with another solver or simulation environment,
+want an ANN to work in conjunction with another solver or simulation environment,
 it's important to actually bring the solver into the training process, otherwise
 the network might specialize on pre-computed data that differs from what is produced
-when combining the NN with the solver, i.e _distribution shift_.
+when combining the ANN with the solver, i.e., a _distribution shift_.

 ### Meshes and grids

 The previous airfoil example use Cartesian grids with standard 
 convolutions. These typically give the most _bang-for-the-buck_, in terms
 of performance and stability. Nonetheless, the whole discussion here of course 
-also holds for less regular convcolutions, e.g., a less regular mesh
+also holds for less regular convolutions, e.g., a less regular mesh
 in conjunction with graph-convolutions. You will typically see reduced learning
 performance in exchange for improved stability when switching to these.