diff --git a/notation.md b/notation.md index e1766fd..c26c1fb 100644 --- a/notation.md +++ b/notation.md @@ -24,15 +24,19 @@ | Abbreviation | Meaning | | --- | --- | +| AI | Mysterious buzzword popping up in all kinds of places these days | | BNN | Bayesian neural network | -| CNN | Convolutional neural network | +| CNN | Convolutional neural network (specific NN architecture) | +| DDPM | Denoising diffusion probabilistic models (diffusion modeling variant) | | DL | Deep Learning | -| GD | (steepest) Gradient Descent| +| FM | Flow matching (diffusion modeling variant) | +| FNO | Fourier neural operator (specific NN architecture) | +| GD | (steepest) Gradient Descent | | MLP | Multi-Layer Perceptron, a neural network with fully connected layers | | NN | Neural network (a generic one, in contrast to, e.g., a CNN or MLP) | | PDE | Partial Differential Equation | | PBDL | Physics-Based Deep Learning | -| SGD | Stochastic Gradient Descent| +| SGD | Stochastic Gradient Descent | diff --git a/overview-burgers-forw.ipynb b/overview-burgers-forw.ipynb index f54ca38..e237593 100644 --- a/overview-burgers-forw.ipynb +++ b/overview-burgers-forw.ipynb @@ -6,9 +6,9 @@ "source": [ "# Simple Forward Simulation of Burgers Equation with phiflow\n", "\n", - "This chapter will give an introduction for how to run _forward_, i.e., regular simulations starting with a given initial state and approximating a later state numerically, and introduce the ΦFlow framework. ΦFlow provides a set of differentiable building blocks that directly interface with deep learning frameworks, and hence is a very good basis for the topics of this book. Before going for deeper and more complicated integrations, this notebook (and the next one), will show how regular simulations can be done with ΦFlow. Later on, we'll show that these simulations can be easily coupled with neural networks.\n", + "This chapter will give an introduction to running _forward_ simulations, i.e., regular simulations starting with a given initial state and approximating a later state numerically, and introduce the ΦFlow framework (in the following \"phiflow\"). Phiflow provides a set of differentiable building blocks that directly interface with deep learning frameworks, and hence is a very good basis for the topics of this book. Before going for deeper and more complicated integrations, this notebook (and the next one) will show how regular simulations can be done with phiflow. Later on, we'll show that these simulations can be easily coupled with neural networks.\n", "\n", - "The main repository for ΦFlow (in the following \"phiflow\") is [https://github.com/tum-pbs/PhiFlow](https://github.com/tum-pbs/PhiFlow), and additional API documentation and examples can be found at [https://tum-pbs.github.io/PhiFlow/](https://tum-pbs.github.io/PhiFlow/).\n", + "The main repository for phiflow is [https://github.com/tum-pbs/PhiFlow](https://github.com/tum-pbs/PhiFlow), and additional API documentation and examples can be found at [https://tum-pbs.github.io/PhiFlow/](https://tum-pbs.github.io/PhiFlow/).\n", "\n", "For this jupyter notebook (and all following ones), you can find a _\"[run in colab]\"_ link at the end of the first paragraph (alternatively you can use the launch button at the top of the page). 
This will load the latest version from the PBDL github repo in a colab notebook that you can execute on the spot: \n", "[[run in colab]](https://colab.research.google.com/github/tum-pbs/pbdl-book/blob/main/overview-burgers-forw.ipynb)\n", @@ -31,7 +31,7 @@ "source": [ "## Importing and loading phiflow\n", "\n", - "Let's get some preliminaries out of the way: first we'll import the phiflow library, more specifically the `numpy` operators for fluid flow simulations: `phi.flow` (differentiable versions for a DL framework _X_ are loaded via `phi.X.flow` instead).\n", + "Let's get some preliminaries out of the way: first we'll import the phiflow library, more specifically the `numpy` operators for fluid flow simulations: `phi.flow` (differentiable versions for a DL framework _X_ are loaded via `phi.X.flow` instead). This makes it easy to switch between backends: the same phiflow solvers can run in PyTorch, TensorFlow, or JAX.\n", "\n", "**Note:** Below, the first command with a \"!\" prefix will install the [phiflow python package from GitHub](https://github.com/tum-pbs/PhiFlow) via `pip` in your python environment once you uncomment it. We've assumed that phiflow isn't installed, but if you have already done so, just comment out the first line (the same will hold for all following notebooks)." ] @@ -45,12 +45,12 @@ "name": "stdout", "output_type": "stream", "text": [ - "Using phiflow version: 3.1.0\n" + "Using phiflow version: 3.2.0\n" ] } ], "source": [ - "!pip install --upgrade --quiet phiflow==3.1\n", + "!pip install --upgrade --quiet phiflow==3.2\n", "from phi.flow import *\n", "\n", "from phi import __version__\n", diff --git a/overview-equations.md b/overview-equations.md index 180d28d..7867a93 100644 --- a/overview-equations.md +++ b/overview-equations.md @@ -1,25 +1,22 @@ Models and Equations ============================ -Below we'll give a brief (really _very_ brief!) intro to deep learning, primarily to introduce the notation. +Below we'll give a _very_ brief intro to deep learning, primarily to introduce the notation. In addition we'll discuss some _model equations_ below. Note that we'll avoid using _model_ to denote trained neural networks, in contrast to some other texts and APIs. These will be called "NNs" or "networks". A "model" will typically denote a set of model equations for a physical effect, usually PDEs. ## Deep learning and neural networks -In this book we focus on the connection with physical -models, and there are lots of great introductions to deep learning. -Hence, we'll keep it short: -the goal in deep learning is to approximate an unknown function +The goal in deep learning is to approximate an unknown function $$ f^*(x) = y^* , $$ (learn-base) -where $y^*$ denotes reference or "ground truth" solutions. -$f^*(x)$ should be approximated with an NN representation $f(x;\theta)$. We typically determine $f$ +where $y^*$ denotes reference or "ground truth" solutions, and +$f^*(x)$ should be approximated with an NN $f(x;\theta)$. We typically determine $f$ with the help of some variant of a loss function $L(y,y^*)$, where $y=f(x;\theta)$ is the output of the NN. -This gives a minimization problem to find $f(x;\theta)$ such that $e$ is minimized. +This gives a minimization problem to find $f(x;\theta)$ such that $L$ is minimized. In the simplest case, we can use an $L^2$ error, giving $$ @@ -28,7 +25,7 @@ $$ (learn-l2) We typically optimize, i.e. _train_, with a stochastic gradient descent (SGD) optimizer of choice, e.g. Adam {cite}`kingma2014adam`. 
-We'll rely on auto-diff to compute the gradient of a _scalar_ loss $L$ w.r.t. the weights, $\partial L / \partial \theta$. +We'll rely on auto-diff to compute the gradient of the _scalar_ loss $L$ w.r.t. the weights, $\partial L / \partial \theta$. It is crucial for the calculation of gradients that this function is scalar, and the loss function is often also called "error", "cost", or "objective" function. @@ -38,14 +35,14 @@ introduce scalar loss, always(!) scalar... (also called *cost* or *objective* f For training we distinguish: the **training** data set drawn from some distribution, the **validation** set (from the same distribution, but different data), and **test** data sets with _some_ different distribution than the training one. -The latter distinction is important. For the test set we want +The latter distinction is important. For testing, we usually want _out of distribution_ (OOD) data to check how well our trained model generalizes. Note that this gives a huge range of possibilities for the test data set: from tiny changes that will certainly work, up to completely different inputs that are essentially guaranteed to fail. There's no gold standard, but test data should be generated with care. -Enough for now - if all the above wasn't totally obvious for you, we very strongly recommend to +If the overview above wasn't obvious to you, we strongly recommend to read chapters 6 to 9 of the [Deep Learning book](https://www.deeplearningbook.org), especially the sections about [MLPs](https://www.deeplearningbook.org/contents/mlp.html) and "Conv-Nets", i.e. [CNNs](https://www.deeplearningbook.org/contents/convnets.html). @@ -53,7 +50,7 @@ ```{note} Classification vs Regression The classic ML distinction between _classification_ and _regression_ problems is not so important here: -we only deal with _regression_ problems in the following. +we only deal with _regression_ problems in the following. ``` @@ -66,8 +63,19 @@ Also interesting: from a math standpoint ''just'' non-linear optimization ... The following section will give a brief outlook for the model equations we'll be using later on in the DL examples. -We typically target continuous PDEs denoted by $\mathcal P^*$ -whose solution is of interest in a spatial domain $\Omega \subset \mathbb{R}^d$ in $d \in {1,2,3} $ dimensions. +We typically target a continuous PDE operator denoted by $\mathcal P^*$, +which maps inputs from $\mathcal U$ to outputs in $\mathcal V$, where in the most general case $\mathcal U, \mathcal V$ +are both infinite-dimensional Banach spaces, i.e. $\mathcal P^*: \mathcal U \rightarrow \mathcal V$. + +```{admonition} Learned solution operators vs traditional ones +:class: tip +Later on, the goal will be to learn $\mathcal P^*$ (or parts of it) with a neural network. A +variety of names are used in research: learned surrogates / hybrid simulators or emulators, +neural operators or solvers, autoregressive models (if timesteps are involved), to name a few. +``` + +In practice, +the solution of interest lies in a spatial domain $\Omega \subset \mathbb{R}^d$ in $d \in \{1,2,3\}$ dimensions. In addition, we often consider a time evolution for a finite time interval $t \in \mathbb{R}^{+}$. The corresponding fields are either d-dimensional vector fields, for instance $\mathbf{u}: \mathbb{R}^d \times \mathbb{R}^{+} \rightarrow \mathbb{R}^d$, or scalar $\mathbf{p}: \mathbb{R}^d \times \mathbb{R}^{+} \rightarrow \mathbb{R}$. 
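The added text above describes the basic training setup: a scalar $L^2$ loss, gradients $\partial L / \partial \theta$ obtained via auto-diff, and an SGD-style optimizer such as Adam. As a minimal sketch of that loop, assuming PyTorch as the backend; the target function, network size, and learning rate below are made-up illustrative choices, not values from the book's examples:

```python
import torch

# Hypothetical stand-in for f*(x) = y*: learn sin(x) from sampled data.
x = torch.rand(1024, 1) * 6.28
y_star = torch.sin(x)                            # reference / "ground truth" solutions y*

f = torch.nn.Sequential(                         # NN f(x; theta), a small MLP
    torch.nn.Linear(1, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 1),
)
opt = torch.optim.Adam(f.parameters(), lr=1e-3)  # SGD variant of choice

for step in range(2000):
    y = f(x)                                     # network output y = f(x; theta)
    L = torch.mean((y - y_star) ** 2)            # scalar L2 loss L(y, y*)
    opt.zero_grad()
    L.backward()                                 # auto-diff: dL/dtheta
    opt.step()                                   # update the weights theta
```

Any of the backends mentioned earlier (PyTorch, TensorFlow, JAX) could be substituted here; only the optimizer and auto-diff calls change.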
@@ -79,12 +87,11 @@ To obtain unique solutions for $\mathcal P^*$ we need to specify suitable initial conditions, typically for all quantities of interest at $t=0$, and boundary conditions for the boundary of $\Omega$, denoted by $\Gamma$ in the following. - $\mathcal P^*$ denotes -a continuous formulation, where we make mild assumptions about +a continuous formulation, where we need to make mild assumptions about its continuity, we will typically assume that first and second derivatives exist. -We can then use numerical methods to obtain approximations +Traditionally, we can use numerical methods to obtain approximations of a smooth function such as $\mathcal P^*$ via discretization. These invariably introduce discretization errors, which we'd like to keep as small as possible. These errors can be measured in terms of the deviation from the exact analytical solution, @@ -127,7 +134,7 @@ and the abbreviations used in: {doc}`notation`. %This yields $\vc{} \in \mathbb{R}^{d \times d_{s,x} \times d_{s,y} \times d_{s,z} }$ and $\vr{} \in \mathbb{R}^{d \times d_{r,x} \times d_{r,y} \times d_{r,z} }$ %Typically, $d_{r,i} > d_{s,i}$ and $d_{z}=1$ for $d=2$. -We solve a discretized PDE $\mathcal{P}$ by performing steps of size $\Delta t$. +With numerical simulations, we solve a discretized PDE $\mathcal{P}$ by performing steps of size $\Delta t$. The solution can be expressed as a function of $\mathbf{u}$ and its derivatives: $\mathbf{u}(\mathbf{x},t+\Delta t) = \mathcal{P}( \mathbf{u}_{x}, \mathbf{u}_{xx}, ... \mathbf{u}_{xx...x} )$, where diff --git a/overview-ns-forw.ipynb b/overview-ns-forw.ipynb index 052df68..0b018b0 100644 --- a/overview-ns-forw.ipynb +++ b/overview-ns-forw.ipynb @@ -8,7 +8,7 @@ "source": [ "# Navier-Stokes Forward Simulation\n", "\n", - "Now let's target a somewhat more complex example: a fluid simulation based on the Navier-Stokes equations. This is still very simple with ΦFlow (phiflow), as differentiable operators for all steps exist there. The Navier-Stokes equations (in their incompressible form) introduce an additional pressure field $p$, and a constraint for conservation of mass, as introduced in equation {eq}`model-boussinesq2d`. We're also moving a marker field, denoted by $d$ here, with the flow. It indicates regions of higher temperature, and exerts a force via a buouyancy factor $\\xi$:\n", + "Now let's target a somewhat more complex example: a fluid simulation based on the Navier-Stokes equations. This is still very simple with ΦFlow (phiflow), as differentiable operators for all steps of the simulator are already available there. The Navier-Stokes equations (in their incompressible form) introduce an additional pressure field $p$, and a constraint for conservation of mass, as introduced in equation {eq}`model-boussinesq2d`. We're also moving a marker field, denoted by $d$ here, with the flow. 
It indicates regions of higher temperature, and exerts a force via a buoyancy factor $\\xi$:\n", "\n", "$$\\begin{aligned}\n", "  \\frac{\\partial \\mathbf{u}}{\\partial{t}} + \\mathbf{u} \\cdot \\nabla \\mathbf{u} &= - \\frac{1}{\\rho} \\nabla p + \\nu \\nabla\\cdot \\nabla \\mathbf{u} + (0,1)^T \\xi d\n", @@ -589,4 +589,4 @@ }, "nbformat": 4, "nbformat_minor": 0 -} \ No newline at end of file +} diff --git a/overview-optconv.md b/overview-optconv.md index 2ec0d0e..ba51031 100644 --- a/overview-optconv.md +++ b/overview-optconv.md @@ -2,7 +2,7 @@ Optimization and Convergence ============================ This chapter will give an overview of the derivations for different optimization algorithms. -In contrast to other texts, we'll start with _the_ classic optimization algorithm, Newton's method, +In contrast to other texts, we'll start with _the most classic_ optimization algorithm, Newton's method, derive several widely used variants from it, before coming back full circle to deep learning (DL) optimizers. The main goal is the put DL into the context of these classical methods. While we'll focus on DL, we will also revisit the classical algorithms for improved learning algorithms later on in this book. Physics simulations exaggerate the difficulties caused by neural networks, which is why the topics below have a particular relevance for physics-based learning tasks. diff --git a/overview.md b/overview.md index d4ab8e6..10ef8c0 100644 --- a/overview.md +++ b/overview.md @@ -236,7 +236,7 @@ give introductions into the differentiable simulation framework _ΦFlow
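The forward-simulation notebooks touched above build on the time stepping described in overview-equations.md, $\mathbf{u}(\mathbf{x},t+\Delta t) = \mathcal{P}( \mathbf{u}_{x}, \mathbf{u}_{xx}, ... )$. As a rough sketch of what one such discretized step looks like, here is a hypothetical explicit finite-difference update for the 1D Burgers equation in plain NumPy; grid size, viscosity, and $\Delta t$ are arbitrary illustrative values, and this is not the phiflow implementation used in the notebooks:

```python
import numpy as np

# 1D Burgers equation u_t + u u_x = nu * u_xx on a periodic grid (illustrative values).
N, nu, dt = 128, 0.01, 0.001
dx = 2.0 * np.pi / N
x = np.arange(N) * dx
u = np.sin(x)                                  # initial state u(x, t=0)

def step(u):
    # One application of the discretized operator P(u_x, u_xx) via explicit Euler.
    u_x = (np.roll(u, -1) - np.roll(u, 1)) / (2.0 * dx)          # central difference
    u_xx = (np.roll(u, -1) - 2.0 * u + np.roll(u, 1)) / dx**2
    return u + dt * (-u * u_x + nu * u_xx)

for _ in range(100):                           # advance the state by 100 steps of size dt
    u = step(u)
```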