spellcheck

This commit is contained in:
NT
2021-03-09 16:39:54 +08:00
parent 42061e7d00
commit c443f2bfdf
12 changed files with 55 additions and 55 deletions

View File

@@ -18,7 +18,7 @@ with the goals of training with DP.
However, the noise is typically undirected, and hence not as accurate as training with
the actual evolutions of simulations. Hence, this noise can be a good starting point
for training that tends to overfit, but if possible, it is preferable to incorporate the
-acutal solver in the training loop via a DP approach.
+actual solver in the training loop via a DP approach.
## Summary
@@ -28,7 +28,7 @@ To summarize the pros and cons of training ANNs via differentiable physics:
✅ Pro:
- uses physical model and numerical methods for discretization
- efficiency of selected methods carries over to training
-- tight coupling of physical models and NNs possible
+- tight coupling of physical models and ANNs possible
❌ Con:
- not compatible with all simulators (need to provide gradients)

View File

@@ -1,17 +1,17 @@
Diff. Physics versus Phys.-informed Training
=======================
-In the previous sections we've seen example reconstuctions that used physical residuals as soft constraints, in the form of the PINNs, and reconstuctions that used a differentiable physics (DP) solver. While both methods can find minimizers for the same minimization problem, the solutions the obtained differ substantially, as do the behavior of the non-linear optimization problem that we get from each formulation. In the following we discuss these differences in more detail, and we will combine conclusions drawn from the behavior of the Burgers case of the previous sections with observations from research papers.
+In the previous sections we've seen example reconstructions that used physical residuals as soft constraints, in the form of the PINNs, and reconstructions that used a differentiable physics (DP) solver. While both methods can find minimizers for the same minimization problem, the solutions they obtain differ substantially, as does the behavior of the non-linear optimization problem that we get from each formulation. In the following we discuss these differences in more detail, and we will combine conclusions drawn from the behavior of the Burgers case of the previous sections with observations from research papers.
## Compatibility with Existing Numerical Methods
-It is very obvious that the PINN implementation is quite simple, which is a positive aspect, but at the same time it differs strongly from "typical" discretziations and solution approaches that are usually to employed equations like Burgers equation. The derivatives are computed via the neural network, and hence rely on a fairly accurate representation of the solution to provide a good direction for optimizaion problems.
+It is very obvious that the PINN implementation is quite simple, which is a positive aspect, but at the same time it differs strongly from "typical" discretizations and solution approaches that are usually employed for equations like Burgers equation. The derivatives are computed via the neural network, and hence rely on a fairly accurate representation of the solution to provide a good direction for optimization problems.
The DP version on the other hand inherently relies on a numerical solver that is tied into the learning process. As such it requires a discretization of the problem at hand, and given this discretization can employ existing, and potentially powerful numerical techniques. This means solutions and derivatives can be evaluated with known and controllable accuracy, and can be evaluated efficiently.
## Discretization
-The reliance on a suitable discretization requires some understanding and knowledge of the problem under consideration. A sub-optimal discretization can impede the learning prcoess or, worst case, lead to diverging trainig runs. However, given the large body of theory and practical realizations of stable solvers for a wide variety of physical problems, this is typically not an unsurmountable obstacle.
+The reliance on a suitable discretization requires some understanding and knowledge of the problem under consideration. A sub-optimal discretization can impede the learning process or, worst case, lead to diverging training runs. However, given the large body of theory and practical realizations of stable solvers for a wide variety of physical problems, this is typically not an insurmountable obstacle.
The PINN approaches on the other hand do not require an a-priori choice of a discretization, and as such seem to be "discretization-less". This, however, is only an advantage at first sight. As they yield solutions in a computer, they naturally _have_ to discretize the problem, but they construct this discretization over the course of the training process, in a way that is not easily controllable from the outside. Thus, the resulting accuracy is determined by how well the training manages to estimate the complexity of the problem for realistic use cases, and how well the training data approximates the unknown regions of the solution.
@@ -21,7 +21,7 @@ As demonstrated with the Burgers example, the PINN solutions typically have sign
The PINN approaches typically perform a localized sampling and correction of the solutions, which means the corrections in the form of weight updates are likewise typically local. The fulfilment of boundary conditions in space and time can be correspondingly slow, leading to long training runs in practice.
-A well-chosen discretization of a DP approach can remedy this behavior, and provide an improved flow of gradient information. At the same time, the reliance on a computational grid means that solutions can be obtained very quickly. Given an interpolation scheme or set of basis functions, the solution can be sampled at any point in space or time given a very local neighborhood of the computational grid. Worst case, this can lead to slight memory overheads, e.g., by repeatedly storing mostly constand values of a solution.
+A well-chosen discretization of a DP approach can remedy this behavior, and provide an improved flow of gradient information. At the same time, the reliance on a computational grid means that solutions can be obtained very quickly. Given an interpolation scheme or set of basis functions, the solution can be sampled at any point in space or time given a very local neighborhood of the computational grid. Worst case, this can lead to slight memory overheads, e.g., by repeatedly storing mostly constant values of a solution.
For the PINN representation with fully-connected networks on the other hand, we need to make a full pass over the potentially large number of values in the whole network to obtain a sample of the solution at a single point. The network effectively needs to encode the full high-dimensional solution. Its size likewise determines the efficiency of derivative calculations.
@@ -45,6 +45,6 @@ The following table summarizes these findings:
As a summary, both methods are definitely interesting, and have a lot of potential. There are numerous more complicated extensions and algorithmic modifications that change and improve on the various negative aspects we have discussed for both sides.
-However, as of this writing, the physics-informed (PI) approach has clear limitations when it comes to performance and campatibility with existing numerical methods. Thus, when knowledge of the problem at hand is available, which typically is the case when we choose a suitable PDE model to constrain the learning process, employing a differentiable physics (DP) solver can significantly improve the training process as well as the quality of the obtained solution. Next, we will target more complex settings, i.e., fluids with Navier Stokes, to illustrate this in more detail.
+However, as of this writing, the physics-informed (PI) approach has clear limitations when it comes to performance and compatibility with existing numerical methods. Thus, when knowledge of the problem at hand is available, which typically is the case when we choose a suitable PDE model to constrain the learning process, employing a differentiable physics (DP) solver can significantly improve the training process as well as the quality of the obtained solution. Next, we will target more complex settings, i.e., fluids with Navier-Stokes, to illustrate this in more detail.

View File

@@ -4,7 +4,7 @@ Complex Examples with DP
The following two sections show code examples of two more complex cases that
demonstrate what can be achieved via differentiable physics training.
-First, we'll show a scenario that employs deep learning to learn the erros
+First, we'll show a scenario that employs deep learning to learn the errors
of a numerical simulation, following Um et al. {cite}`um2020sol`.
This is a very fundamental task, and requires the learned model to closely
interact with a numerical solver. Hence, it's a prime example of
@@ -18,6 +18,6 @@ and hence needs two networks, one to _predict_ the evolution,
and another one to _act_ to reach the desired goal.
Both cases require quite a bit more resources than the previous examples, so you
-can expect these notebooks to run longer (and it's a good idea to use the checkpointing
+can expect these notebooks to run longer (and it's a good idea to use the check-pointing
mechanisms when working with these examples).

View File

@@ -10,20 +10,20 @@ we can obtain _hybrid_ methods, that use the best numerical methods that we have
## Interaction
-One key component for these hybrids to work well is to let the NN _interact_ with the PDE solver at training time. Differentiable simulations allow a trained model to explore and experience the physical environment, and receive directed feedback regarding its interactions throughout the solver iterations. This combination nicely fits into the broader context of machine learning as _differentiable programming_.
+One key component for these hybrids to work well is to let the ANN _interact_ with the PDE solver at training time. Differentiable simulations allow a trained model to explore and experience the physical environment, and receive directed feedback regarding its interactions throughout the solver iterations. This combination nicely fits into the broader context of machine learning as _differentiable programming_.
## Generalization
-The hybrid approach also bears particular promise for simulators: it improves generalizing capabilities of the trained models by letting the PDE-solver handle large-scale changes to the data distribution such that the learned model can focus on localized structures not captured by the discretization. While physical models generalize very well, learned models often specialize in data distributions seen at training time. This was e.g. shown for the models reducing numerical errors of the previous chapter: the trained models can deal with solution manifolds with significant amounts of varying physical behavior, while simpler training variants quickly deteriorate over the course of recurrent time steps.
+The hybrid approach also bears particular promise for simulators: it improves generalizing capabilities of the trained models by letting the PDE-solver handle large-scale changes to the data distribution such that the learned model can focus on localized structures not captured by the discretization. While physical models generalize very well, learned models often specialize in data distributions seen at training time. This was, e.g., shown for the models reducing numerical errors of the previous chapter: the trained models can deal with solution manifolds with significant amounts of varying physical behavior, while simpler training variants quickly deteriorate over the course of recurrent time steps.
## Possibilities
We've just scratched the surface regarding the possibilities of this combination. The examples with Burgers equation and Navier-Stokes solvers are non-trivial, and good examples for advection-diffusion-type PDEs. However, there's a wide variety of other potential combinations, to name just a few examples:
-* PDEs for chemical reactions often show complex behavior due to the interactions of mulltiple species. Here, and especially interesting direction is to train models that quickly learn to predict the evolution of an experiment or machine, and adjust control knobs to stabilize it, i.e., an online _control_ setting.
+* PDEs for chemical reactions often show complex behavior due to the interactions of multiple species. Here, an especially interesting direction is to train models that quickly learn to predict the evolution of an experiment or machine, and adjust control knobs to stabilize it, i.e., an online _control_ setting.
-* Plasma simulations share a lot with vorticity-based formulations for fluids, but additionally introduce terms to handle electric and magnetic interactions within the material. Likwise, controllers for plasma fusion experiments and generators are an excellent topic with plenty of potential for DL with differentiable physics.
+* Plasma simulations share a lot with vorticity-based formulations for fluids, but additionally introduce terms to handle electric and magnetic interactions within the material. Likewise, controllers for plasma fusion experiments and generators are an excellent topic with plenty of potential for DL with differentiable physics.
-* Finally, weather and climate are crucial topics for humanity, and highly complex systems of fluid flows interacting with a multitude of phenomena on the surface of our planet. Accurately modeling all these interacting systems and predicting their long-term behavior shows a lot of promise with benefitting from DL approaches that can interface with numerical simulations.
+* Finally, weather and climate are crucial topics for humanity, and highly complex systems of fluid flows interacting with a multitude of phenomena on the surface of our planet. Accurately modeling all these interacting systems and predicting their long-term behavior shows a lot of promise to benefit from DL approaches that can interface with numerical simulations.
So overall, there's lots of exciting research work left to do - the next years and decades definitely won't be boring 👍

View File

@@ -10,7 +10,7 @@ The central goal of these methods is to use existing numerical solvers, and equip
them with functionality to compute gradients with respect to their inputs.
Once this is realized for all operators of a simulation, we can leverage
the autodiff functionality of DL frameworks with back-propagation to let gradient
-information from from a simulator into an NN and vice versa. This has numerous
+information flow from a simulator into an ANN and vice versa. This has numerous
advantages such as improved learning feedback and generalization, as we'll outline below.
In contrast to physics-informed loss functions, it also enables handling more complex
solution manifolds instead of single inverse problems.
@@ -54,9 +54,9 @@ $\partial \mathcal P_i / \partial \mathbf{u}$.
Note that we typically don't need derivatives
for all parameters of $\mathcal P$, e.g. we omit $\nu$ in the following, assuming that this is a
-given model parameter, with which the NN should not interact.
+given model parameter, with which the ANN should not interact.
Naturally, it can vary within the solution manifold that we're interested in,
-but $\nu$ will not be the output of a NN representation. If this is the case, we can omit
+but $\nu$ will not be the output of an ANN representation. If this is the case, we can omit
providing $\partial \mathcal P_i / \partial \nu$ in our solver. However, the following learning process
naturally transfers to including $\nu$ as a degree of freedom.
@@ -189,7 +189,7 @@ Informally, we'd like to find a motion that deforms $d^{~0}$ into a target state
The simplest way to express this goal is via an $L^2$ loss between the two states. So we want
to minimize the loss function $F=|d(t^e) - d^{\text{target}}|^2$.
-Note that as described here this is a pure optimization task, there's no NN involved,
+Note that as described here this is a pure optimization task, there's no ANN involved,
and our goal is to obtain $\mathbf{u}$. We do not want to apply this motion to other, unseen _test data_,
as would be custom in a real learning task.
@@ -204,7 +204,7 @@ We'd now like to find the minimizer for this objective by
_gradient descent_ (GD), where the
gradient is determined by the differentiable physics approach described earlier in this chapter.
Once things are working with GD, we can relatively easily switch to better optimizers or bring
-an NN into the picture, hence it's always a good starting point.
+an ANN into the picture, hence it's always a good starting point.
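Such a GD loop can be sketched in a few lines (the linear placeholder "solver" `P`, the Gaussian states, and the learning rate are illustrative assumptions here, not the actual fluid simulation; a DP framework would supply the gradient via backpropagation instead of the analytic expression):

```python
import numpy as np

# Toy gradient descent on F(u) = |P(u) - d_target|^2.
n = 32
d0 = np.exp(-((np.arange(n) - 10.0) / 3.0) ** 2)        # initial state d^0
d_target = np.exp(-((np.arange(n) - 20.0) / 3.0) ** 2)  # target state

def P(u):
    # placeholder physics: the state is simply displaced by u
    # (a real solver would advect d0 with the velocity field u)
    return d0 + u

u = np.zeros(n)                      # degrees of freedom
eta = 0.1                            # learning rate
for _ in range(100):
    grad = 2.0 * (P(u) - d_target)   # dF/du for this linear toy solver
    u -= eta * grad                  # gradient descent step

final_loss = np.sum((P(u) - d_target) ** 2)
assert final_loss < 1e-6
```

Because the toy solver is linear, the residual shrinks geometrically; the interesting part in the DP setting is only where `grad` comes from.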
As the discretized velocity field $\mathbf{u}$ contains all our degrees of freedom,
all we need to do is update the velocity by an amount
@@ -276,15 +276,15 @@ a bit more complex, matrix inversion, eg Poisson solve
don't backprop through all CG steps (available in phiflow, though);
rather, re-use the linear solver to compute the multiplication by the inverse matrix
-[note 1: essentialy yields implicit derivative, cf implicit function theorem & co]
+[note 1: essentially yields implicit derivative, cf implicit function theorem & co]
[note 2: time can be "virtual", solving for steady state
-only assumption: some iterative procedure, not just single eplicit step - then things simplify.]
+only assumption: some iterative procedure, not just a single explicit step - then things simplify.]
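Note 1 can be sketched as follows (a dense `np.linalg.solve` stands in for the iterative CG solver, and the quadratic loss is an arbitrary example): by the implicit function theorem, differentiating a loss through $x = A^{-1} b$ needs only one extra solve with $A^T$, rather than backpropagation through the solver's iterations.

```python
import numpy as np

# dL/db = A^{-T} (dL/dx): one adjoint solve instead of unrolling CG steps.
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5)) + 5.0 * np.eye(5)    # well-conditioned system
b = rng.normal(size=5)

def loss(b):
    x = np.linalg.solve(A, b)                    # forward solve
    return 0.5 * np.sum(x ** 2)                  # downstream loss L(x)

x = np.linalg.solve(A, b)
grad_b = np.linalg.solve(A.T, x)                 # adjoint solve: A^T lam = dL/dx

# verify against central finite differences
eps = 1e-6
fd = np.array([(loss(b + eps * e) - loss(b - eps * e)) / (2.0 * eps)
               for e in np.eye(5)])
assert np.allclose(grad_b, fd, atol=1e-5)
```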
## Summary of Differentiable Physics so far
To summarize, using differentiable physical simulations
-gives us a tool to include phsyical equations with a chosen discretization into DL learning.
+gives us a tool to include physical equations with a chosen discretization into DL learning.
In contrast to the residual constraints of the previous chapter,
this makes it possible to let NNs seamlessly interact with physical solvers.

View File

@@ -4,7 +4,7 @@ Welcome ...
Welcome to the Physics-based Deep Learning Book 👋
**TL;DR**: This document targets
-a veriety of combinations of physical simulations with deep learning.
+a variety of combinations of physical simulations with deep learning.
As much as possible, the algorithms will come with hands-on code examples to quickly get started.
Beyond standard _supervised_ learning from data, we'll look at loss constraints, and
more tightly coupled learning algorithms with differentiable simulations.

View File

@@ -2,7 +2,7 @@ Models and Equations
============================
Below we'll give a very (really _very_!) brief intro to deep learning, primarily to introduce the notation.
-In addition we'll discuss some _model equations_ below. Note that we won't use _model_ to denote trained neural networks, in contrast to some other texts. These will only be called "NNs" or "networks". A "model" will always denote model equations for a physical effect, typically a PDE.
+In addition we'll discuss some _model equations_ below. Note that we won't use _model_ to denote trained neural networks, in contrast to some other texts. These will only be called "ANNs" or "networks". A "model" will always denote model equations for a physical effect, typically a PDE.
## Deep Learning and Neural Networks
@@ -12,9 +12,9 @@ our goal is to approximate an unknown function
$f^*(x) = y^*$ ,
where $y^*$ denotes reference or "ground truth" solutions.
-$f^*(x)$ should be approximated with an NN representation $f(x;\theta)$. We typically determine $f$
+$f^*(x)$ should be approximated with an ANN representation $f(x;\theta)$. We typically determine $f$
with the help of some formulation of an error function $e(y,y^*)$, where $y=f(x;\theta)$ is the output
-of the NN.
+of the ANN.
This gives a minimization problem to find $f(x;\theta)$ such that $e$ is minimized.
In the simplest case, we can use an $L^2$ error, giving
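Written out with the notation above (training samples indexed by $i$), this standard $L^2$ minimization reads:

$$
\text{arg min}_{\theta} \sum_i \big( f(x_i ; \theta) - y^*_i \big)^2 .
$$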
@@ -36,7 +36,7 @@ and **test** data sets with _some_ different distribution than the training one.
The latter distinction is important! For the test set we want
_out of distribution_ (OOD) data to check how well our trained model generalizes.
Note that this gives a huge range of difficulties: from tiny changes that will certainly work
-up to completely different inputs that are essentially guaranteeed to fail. Hence,
+up to completely different inputs that are essentially guaranteed to fail. Hence,
test data should be generated with care.
Enough for now - if all the above wasn't totally obvious for you, we very strongly recommend to
@@ -81,7 +81,7 @@ This invariably introduces discretization errors, which we'd like to keep as smal
These errors can be measured in terms of the deviation from the exact analytical solution,
and for discrete simulations of PDEs, they are typically expressed as a function of the truncation error
$O( \Delta x^k )$, where $\Delta x$ denotes the spatial step size of the discretization.
-Likewise, we typically have a temporal disceretization via a time step $\Delta t$.
+Likewise, we typically have a temporal discretization via a time step $\Delta t$.
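Such a truncation order $k$ can be checked empirically (using an arbitrary smooth function and a second-order central difference as the example): halving $\Delta x$ should reduce the error by a factor of $2^k$.

```python
import numpy as np

# Measure the truncation error O(dx^k) of a central difference for f'(x):
# halving dx should reduce the error by ~2^2 = 4, i.e., observed order k ~ 2.
f, f_prime = np.sin, np.cos
x = 1.0
errors = []
for dx in (1e-2, 5e-3, 2.5e-3):
    approx = (f(x + dx) - f(x - dx)) / (2.0 * dx)
    errors.append(abs(approx - f_prime(x)))

orders = [np.log2(errors[i] / errors[i + 1]) for i in range(2)]
assert all(1.9 < k < 2.1 for k in orders)   # second-order accurate
```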
```{admonition} Notation and abbreviations
:class: seealso
@@ -119,7 +119,7 @@ and the abbreviations used in: {doc}`notation`, at the bottom of the left panel
%Typically, $d_{r,i} > d_{s,i}$ and $d_{z}=1$ for $d=2$.
We typically solve a discretized PDE $\mathcal{P}$ by performing steps of size $\Delta t$.
-For a quantitiy of interest $\mathbf{u}$, e.g., representing a velocity field
+For a quantity of interest $\mathbf{u}$, e.g., representing a velocity field
in $d$ dimensions via $\mathbf{u}(\mathbf{x},t): \mathbb{R}^d \rightarrow \mathbb{R}^d $.
The components of the velocity vector are typically denoted by $x,y,z$ subscripts, i.e.,
$\mathbf{u} = (u_x,u_y,u_z)^T$ for $d=3$.
@@ -145,7 +145,7 @@ with actual simulations and implementation examples on the next page.
We'll often consider Burgers' equation
in 1D or 2D as a starting point.
It represents a well-studied advection-diffusion PDE, which (unlike Navier-Stokes)
-does not include any additional constraits such as conservation of mass. Hence,
+does not include any additional constraints such as conservation of mass. Hence,
it leads to interesting shock formations.
In 2D, it is given by:
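For reference, a standard 2D form, with viscosity $\nu$ and an external force $\mathbf{g}$ (stated here in its common textbook form, as an assumption), is:

$$\begin{aligned}
\frac{\partial u_x}{\partial t} + \mathbf{u} \cdot \nabla u_x &= \nu \, \nabla\cdot \nabla u_x + g_x \ , \\
\frac{\partial u_y}{\partial t} + \mathbf{u} \cdot \nabla u_y &= \nu \, \nabla\cdot \nabla u_y + g_y \ .
\end{aligned}$$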

View File

@@ -8,7 +8,7 @@ methods based on artificial neural networks.
The general direction of Physics-Based Deep Learning represents a very
active, quickly growing and exciting field of research -- we want to provide
a starting point for new researchers as well as a hands-on introduction into
-state-of-the-art resarch topics.
+state-of-the-art research topics.
@@ -53,7 +53,7 @@ in physical applications are outstanding.
The proposed techniques are novel, sometimes difficult to apply, and
significant practical difficulties combining physics and DL persist.
Also, many fundamental theoretical questions remain unaddressed, most importantly
-regarding data efficienty and generalization.
+regarding data efficiency and generalization.
Over the course of the last decades,
highly specialized and accurate discretization schemes have
@@ -76,7 +76,7 @@ Thus, the key aspects that we want to address in the following are:
Thus, we want to build on all the powerful techniques that we have
at our disposal, and use them wherever we can.
I.e., our goal is to _reconcile_ the data-centered
-viewpoint and the physical simuation viewpoint.
+viewpoint and the physical simulation viewpoint.
The resulting methods have a huge potential to improve
what can be done with numerical methods: e.g., in scenarios
@@ -105,8 +105,8 @@ observations).
No matter whether we're considering a forward or an inverse problem,
the most crucial differentiation for the following topics lies in the
nature of the integration between DL techniques
-and the domain knowledge, typically in the form of model euqations.
-Looking ahead, we will particularly aim for a very tight intgration
+and the domain knowledge, typically in the form of model equations.
+Looking ahead, we will particularly aim for a very tight integration
of the two, that goes beyond soft-constraints in loss functions.
Taking a global perspective, the following three categories can be
identified to categorize _physics-based deep learning_ (PBDL)
@@ -166,7 +166,7 @@ A brief look at our _notation_ in the {doc}`notation` chapter won't hurt in both
## Implementations
This text also represents an introduction to a wide range of deep learning and simulation APIs.
-We'll use popoular deep learning APIs such as _pytorch_ and _tensorflow_, and additionally
+We'll use popular deep learning APIs such as _pytorch_ and _tensorflow_, and additionally
give introductions into _phiflow_ for simulations. Some examples also use _JAX_. Thus after going through
these examples, you should have a good overview of what's available in current APIs, such that
the best one can be selected for new tasks.

View File

@@ -12,7 +12,7 @@ representation regarding the reliability of these derivatives. Also, each deriva
requires backpropagation through the full network, which can be very slow. Especially so
for higher-order derivatives.
-And while the setup is realtively simple, it is generally difficult to control. The NN
+And while the setup is relatively simple, it is generally difficult to control. The ANN
has flexibility to refine the solution by itself, but at the same time, tricks are necessary
when it doesn't pick the right regions of the solution.
@@ -37,15 +37,15 @@ we deploy it into an application.
In contrast, for the PINN training as described here, we reconstruct a single solution in a known
and given space-time region. As such, any samples from this domain follow the same distribution
-and hence don't really represent test or OOD sampes. As the NN directly encodes the solution,
+and hence don't really represent test or OOD samples. As the ANN directly encodes the solution,
there is also little hope that it will yield different solutions, or perform well outside
of the training distribution. If we're interested in a different solution, we most likely
-have to start training the NN from scratch.
+have to start training the ANN from scratch.
## Summary
Thus, the physical soft constraints allow us to encode solutions to
-PDEs with the tools of NNs.
+PDEs with the tools of ANNs.
An inherent drawback of this approach is that it yields single solutions,
and that it does not combine with traditional numerical techniques well.
E.g., the learned representation is not suitable to be refined with
@@ -58,7 +58,7 @@ goals of the next sections.
✅ Pro:
- uses physical model
-- derivatives can be convieniently compute via backpropagation
+- derivatives can be conveniently computed via backpropagation
❌ Con:
- quite slow ...

View File

@@ -24,7 +24,7 @@ $
\mathbf{u}_t = \mathcal F ( \mathbf{u}_{x}, \mathbf{u}_{xx}, ... \mathbf{u}_{xx...x} ) ,
$
where the $_{\mathbf{x}}$ subscripts denote spatial derivatives with respect to one of the spatial dimensions
-of higher and higher order (this can of course also include derivatives with repsect to different axes).
+of higher and higher order (this can of course also include derivatives with respect to different axes).
In this context we can employ DL by approximating the unknown $\mathbf{u}$ itself
with a NN, denoted by $\tilde{\mathbf{u}}$. If the approximation is accurate, the PDE

View File

@@ -16,7 +16,7 @@ using supervised training.
_Supervised training_ is the natural starting point for **any** DL project. It always,
and we really mean **always** here, makes sense to start with a fully supervised
test using as little data as possible. This will be a pure overfitting test,
-but if your network can't quicklyl converge and give a very good performance
+but if your network can't quickly converge and give a very good performance
on a single example, then there's something fundamentally wrong
with your code or data. Thus, there's no reason to move on to more complex
setups that will make finding these fundamental problems more difficult.
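Such an overfitting test can be sketched in a few lines (a tiny hand-rolled two-layer network in plain numpy; all sizes, initializations, and rates are arbitrary illustrative choices):

```python
import numpy as np

# Pure overfitting sanity check: fit a tiny two-layer network to a single
# sample and verify the loss collapses towards zero. If this fails, something
# fundamental (data pipeline, code, learning rate) is wrong.
rng = np.random.default_rng(0)
x = rng.normal(size=(1, 4))              # one training sample
y = rng.normal(size=(1, 2))              # its target

W1 = rng.normal(size=(4, 16)) * 0.5; b1 = np.zeros(16)
W2 = rng.normal(size=(16, 2)) * 0.5; b2 = np.zeros(2)
eta = 0.02                               # learning rate

losses = []
for _ in range(2000):
    h = np.tanh(x @ W1 + b1)             # forward pass
    out = h @ W2 + b2
    diff = out - y
    losses.append(float(np.sum(diff ** 2)))
    g_out = 2.0 * diff                   # backward pass (manual backprop)
    g_W2 = h.T @ g_out; g_b2 = g_out.sum(0)
    g_pre = (g_out @ W2.T) * (1.0 - h ** 2)
    g_W1 = x.T @ g_pre; g_b1 = g_pre.sum(0)
    W2 -= eta * g_W2; b2 -= eta * g_b2   # gradient descent updates
    W1 -= eta * g_W1; b1 -= eta * g_b1

assert losses[-1] < 1e-6 and losses[-1] < losses[0]
```

If the loss curve of even this trivial setup does not collapse, larger experiments are not worth running yet.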
@@ -28,7 +28,7 @@ and then increase the complexity of the setup.
A nice property of the supervised training is also that it's very stable.
Things won't get any better when we include more complex physical
-models, or look at more complicated NN architectures.
+models, or look at more complicated ANN architectures.
Thus, again, make sure you can see a nice exponential falloff in your training
loss when starting with the simple overfitting tests. This is a good
@@ -42,10 +42,10 @@ rough estimate of suitable values for $\eta$.
A comment that you'll often hear when talking about DL approaches, and especially
when using relatively simple training methodologies is: "Isn't it just interpolating the data?"
-Well, **yes** it is! And that's exactly what the NN should do. In a way - there isn't
+Well, **yes** it is! And that's exactly what the ANN should do. In a way - there isn't
anything else to do. This is what _all_ DL approaches are about. They give us smooth
representations of the data seen at training time. Even if we'll use fancy physical
-models at training time later on, the NNs just adjust their weights to represent the signals
+models at training time later on, the ANNs just adjust their weights to represent the signals
they receive, and reproduce them.
Due to the hype and numerous success stories, people not familiar with DL often have
@@ -54,34 +54,34 @@ and general principles in data sets (["messages from god"](https://dilbert.com/s
That's not what happens with the current state of the art. Nonetheless, it's
the most powerful tool we have to approximate complex, non-linear functions.
It is a great tool, but it's important to keep in mind, that once we set up the training
-correctly, all we'll get out of it is an approximation of the function the NN
+correctly, all we'll get out of it is an approximation of the function the ANN
was trained for - no magic involved.
An implication of this is that you shouldn't expect the network
-to work on data it has never seen. In a way, the NNs are so good exactly
+to work on data it has never seen. In a way, the ANNs are so good exactly
because they can accurately adapt to the signals they receive at training time,
-but in contrast to other learned representations, they're acutally not very good
-at extrapolation. So we can't expect an NN to magically work with new inputs.
+but in contrast to other learned representations, they're actually not very good
+at extrapolation. So we can't expect an ANN to magically work with new inputs.
Rather, we need to make sure that we can properly shape the input space,
e.g., by normalization and by focusing on invariants. In short, if you always train
your networks for inputs in the range $[0\dots1]$, don't expect them to work
with inputs of $[10\dots11]$. You might be able to subtract an offset of $10$ beforehand,
and re-apply it after evaluating the network.
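A minimal sketch of that offset trick (the polynomial fit is a deliberately crude stand-in for a trained network, and the target $\sin(2\pi x)$ is periodic, so the shifted inputs have the same ground truth):

```python
import numpy as np

# A model "trained" for inputs in [0, 1] fails on inputs in [10, 11], but
# subtracting the known offset maps them back into the training range.
rng = np.random.default_rng(1)
x_train = rng.uniform(0.0, 1.0, 200)
y_train = np.sin(2.0 * np.pi * x_train)
coeffs = np.polyfit(x_train, y_train, deg=9)     # "training"
model = lambda x: np.polyval(coeffs, x)

x_new = rng.uniform(10.0, 11.0, 50)              # shifted inputs at inference
truth = np.sin(2.0 * np.pi * x_new)              # same signal, by periodicity
err_raw = np.abs(model(x_new) - truth).mean()            # extrapolation fails
err_shifted = np.abs(model(x_new - 10.0) - truth).mean() # offset re-applied
assert err_shifted < err_raw
```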
As a rule of thumb: always make sure you
-acutally train the NN on the kinds of input you want to use at inference time.
+actually train the ANN on the kinds of input you want to use at inference time.
This is important to keep in mind during the next chapters: e.g., if we
-want an NN to work in conjunction with another solver or simulation environment,
+want an ANN to work in conjunction with another solver or simulation environment,
it's important to actually bring the solver into the training process, otherwise
the network might specialize on pre-computed data that differs from what is produced
-when combining the NN with the solver, i.e _distribution shift_.
+when combining the ANN with the solver, i.e., _distribution shift_.
### Meshes and grids
The previous airfoil example uses Cartesian grids with standard
convolutions. These typically give the most _bang-for-the-buck_, in terms
of performance and stability. Nonetheless, the whole discussion here of course
-also holds for less regular convcolutions, e.g., a less regular mesh
+also holds for less regular convolutions, e.g., a less regular mesh
in conjunction with graph-convolutions. You will typically see reduced learning
performance in exchange for improved stability when switching to these.

View File

@@ -41,7 +41,7 @@ and we don't need manual labour to annotate a large number of samples to get tra
On the other hand, this approach inherits the common challenges of replacing experiments
with simulations: first, we need to ensure the chosen model has enough power to predict the
-bheavior of real-world phenomena that we're interested in.
+behavior of real-world phenomena that we're interested in.
In addition, the numerical approximations have numerical errors
which need to be kept small enough for a chosen application. As these topics are studied in depth
for classical simulations, the existing knowledge can likewise be leveraged to