diff --git a/supervised-airfoils.ipynb b/supervised-airfoils.ipynb
index 63eeff1..09b5a91 100644
--- a/supervised-airfoils.ipynb
+++ b/supervised-airfoils.ipynb
@@ -590,7 +590,7 @@
 "_Why is the validation loss lower than the training loss_?\n",
 "The data is similar to the training data of course, but in a way it's slightly \"tougher\", because the network certainly never received any validation samples during training. It is natural that the validation loss slightly deviates from the training loss, but how can the L1 loss be _lower_ for these inputs?\n",
 "\n",
- "This is a subtlety of the training loop above: it runs a training step first, and the loss for each point in the graph is measured with the evolving state of the network in an epoch. The network is updated, and afterwards runs through the validation samples. Thus all validation samples are using a state that is definitely different (and hopefully a bit better) than the initial states of the epoch. Hence, the validation loss can be slightly lower.\n",
+ "This is caused by the way the training loop above is implemented in PyTorch: while the training loss is evaluated in training mode via `net.train()`, the evaluation takes place after a call to `net.eval()`. This switches batch normalization to its stored running statistics and would disable features like dropout (if active), which slightly changes the evaluation. In addition, the code runs a training step first, and the loss for each point in the graph is measured with the evolving state of the network within an epoch. The network is updated, and only afterwards runs through the validation samples. Thus all validation samples are evaluated with a state that is slightly different (and hopefully a bit better) than the initial states of the epoch. For both reasons, the validation loss can deviate, and in this example it is typically slightly lower.\n",
 "\n",
 "A general word of caution here: never evaluate your network with training data! That won't tell you much because overfitting is a very common problem. At least use data the network hasn't seen before, i.e. validation data, and if that looks good, try some more different (at least slightly out-of-distribution) inputs, i.e., _test data_. The next cell runs the trained network over the validation data, and displays one of them with the `showSbs` function.\n",
 "\n"
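To make the explanation in the changed cell concrete, here is a minimal sketch of an epoch structure with both effects: the `net.train()`/`net.eval()` mode switch, and the fact that the validation pass only happens after the epoch's weight updates. The names `net`, `optimizer`, `criterion`, `trainLoader`, and `valiLoader` are placeholders for illustration, not necessarily the notebook's actual variables.

```python
import torch

def run_epoch(net, optimizer, criterion, trainLoader, valiLoader):
    # Training mode: batch norm uses per-batch statistics, dropout (if any) is active.
    net.train()
    train_loss = 0.0
    for inputs, targets in trainLoader:
        optimizer.zero_grad()
        loss = criterion(net(inputs), targets)
        loss.backward()
        optimizer.step()            # weights change during the epoch...
        train_loss += loss.item()   # ...so each recorded training loss uses an evolving state

    # Eval mode: batch norm switches to running statistics, dropout is disabled.
    net.eval()
    vali_loss = 0.0
    with torch.no_grad():           # validation sees the state *after* all updates of the epoch
        for inputs, targets in valiLoader:
            vali_loss += criterion(net(inputs), targets).item()

    return train_loss / len(trainLoader), vali_loss / len(valiLoader)
```

Under this structure, the averaged training loss mixes older (worse) and newer (better) network states, while the validation loss is measured only with the final, updated state of the epoch, which is why it can come out slightly lower.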