fixing typos
parent 1fa19cd8c0
commit a30cbbc557
@@ -132,7 +132,7 @@
"source": [
"## Running the simulation\n",
"\n",
"Now we're ready to run the simulation itself. To ccompute the diffusion and advection components of our model equation we can simply call the existing `diffusion` and `semi_lagrangian` operators in phiflow: `diffuse.explicit(u,...)` computes an explicit diffusion step via central differences for the term $\\nu \\nabla\\cdot \\nabla u$ of our model. Next, `advect.semi_lagrangian(f,u)` is used for a stable first-order approximation of the transport of an arbitrary field `f` by a velocity `u`. In our model we have $\\partial u / \\partial{t} + u \\nabla f$, hence we use the `semi_lagrangian` function to transport the velocity with itself in the implementation:"
"Now we're ready to run the simulation itself. To compute the diffusion and advection components of our model equation we can simply call the existing `diffusion` and `semi_lagrangian` operators in phiflow: `diffuse.explicit(u,...)` computes an explicit diffusion step via central differences for the term $\\nu \\nabla\\cdot \\nabla u$ of our model. Next, `advect.semi_lagrangian(f,u)` is used for a stable first-order approximation of the transport of an arbitrary field `f` by a velocity `u`. In our model we have $\\partial u / \\partial{t} + u \\nabla f$, hence we use the `semi_lagrangian` function to transport the velocity with itself in the implementation:"
]
},
{
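As a rough illustration of the two phiflow operators named in the corrected cell above, a single simulation step could look roughly as follows. This is a minimal sketch assuming phiflow 2.x; the resolution, viscosity, time step and noisy initial state are placeholders, not the notebook's actual values:

```python
from phi.flow import *  # phiflow 2.x

# Illustrative constants (not taken from the notebook).
N, NU, DT, STEPS = 128, 0.01, 0.01, 32

# Periodic 1D velocity field; Noise() is just a stand-in initial condition.
velocity = CenteredGrid(Noise(), extrapolation.PERIODIC, x=N)

for _ in range(STEPS):
    # explicit central-difference step for the diffusion term  nu * div(grad u)
    velocity = diffuse.explicit(velocity, NU, DT)
    # semi-Lagrangian transport of the velocity by itself, i.e. the advection term
    velocity = advect.semi_lagrangian(velocity, velocity, DT)
```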
@@ -15,7 +15,7 @@
"\\end{aligned}$$\n",
"\n",
"\n",
"Here, we're aiming for an incompressible flow (i.e., $\\rho = \\text{const}$), and use a simple buoyancy model (the Boussinesq approximation) via the term $(0,1)^T \\xi d$. This models changes in density without explicitly calculating $\\rho$, and we assume a gravity force that acts along the y direction via the vector $(0,1)^T$. \n",
"Here, we're aiming for an incompressible flow (i.e., $\\rho = \\text{const}$), and use a simple buoyancy model (the Boussinesq approximation) via the term $(0,1)^T \\xi d$. This approximates changes in density for incompressible solvers, without explicitly calculating $\\rho$. We assume a gravity force that acts along the y direction via the vector $(0,1)^T$. \n",
"We'll solve this PDE on a closed domain with Dirichlet boundary conditions $\\mathbf{u}=0$ for the velocity, and Neumann boundaries $\\frac{\\partial p}{\\partial x}=0$ for pressure, on a domain $\\Omega$ with a physical size of $100 \\times 80$ units. \n",
"[[run in colab]](https://colab.research.google.com/github/tum-pbs/pbdl-book/blob/main/overview-ns-forw.ipynb)\n",
"\n"
@@ -57,7 +57,7 @@ Then we also need the _Lagrange form_, which yields an exact solution for a $\xi
$$L(x+\Delta) = L + J \Delta + \frac{1}{2} H(\xi) \Delta^2$$
-In several instances we'll make use of the fundamental theorem of calculus, repeated here for completeness: 
+In several instances we'll make use of the fundamental theorem of calculus, repeated here for completeness:
$$f(x+\Delta) = f(x) + \int_0^1 \text{d}s ~ f'(x+s \Delta) \Delta \ . $$
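As a quick sanity check of this identity (with the `+` restored above), both sides can be evaluated numerically for any smooth test function; here a throwaway example with f = sin and made-up values for x and Delta:

```python
import numpy as np

f, fp = np.sin, np.cos        # test function and its derivative (illustrative choice)
x, delta = 0.3, 0.7

# midpoint-rule approximation of  int_0^1 f'(x + s*delta) * delta  ds
n = 100000
s = (np.arange(n) + 0.5) / n
integral = np.sum(fp(x + s * delta) * delta) / n

print(f(x + delta), f(x) + integral)   # both ~ 0.8415, so the identity holds
```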
@@ -128,7 +128,7 @@ a parabola, and a small $H$ might overshoot in undesirable ways. The far left in
To make statements about convergence, we need some fundamental assumptions: convexity and smoothness
of our loss function. Then we'll focus on showing that the loss decreases, and
-that we move along a sequence of smaller sets $\forall x ~ L(x)<L(x_n)$
+that we move along a sequence of smaller sets $\forall x ~ L(x)<L(x_n)$
with lower loss values.
First, we apply the fundamental theorem to L
@@ -249,7 +249,7 @@ for solving classical non-linear optimization problems.
Another attractive variant of Newton's method can be derived by
restricting $L$ to be a classical $L^2$ loss. This gives the _Gauss-Newton_ (GN) algorithm.
Thus, we still use $\Delta = - \frac{J^T}{H}$ , but
-rely on a squared loss of the form $L=|f|^2$ for an arbitrary $f(x)$.
+rely on a squared loss of the form $L=|f|^2$ for an arbitrary $f(x)$.
The derivatives of $f$ are denoted by $J_f, H_f$, in contrast to the
generic $J,H$ for $L$, as before.
Due to the chain rule, we have $J=2~f^T J_f$.
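To make the Gauss-Newton step concrete: dropping the second-derivative terms of $f$ gives the standard approximation $H \approx 2 J_f^T J_f$, which together with $J = 2 f^T J_f$ yields $\Delta = -(J_f^T J_f)^{-1} J_f^T f$. Below is a small self-contained sketch with a made-up two-dimensional residual $f$ and starting point (both purely illustrative):

```python
import numpy as np

def f(x):                       # toy residual f: R^2 -> R^2 (illustrative only)
    return np.array([x[0] ** 2 - x[1], x[1] - 1.0])

def J_f(x):                     # Jacobian of the residual
    return np.array([[2.0 * x[0], -1.0],
                     [0.0,         1.0]])

x = np.array([2.0, 2.0])
for _ in range(10):
    r, Jf = f(x), J_f(x)
    # Gauss-Newton step: solve (J_f^T J_f) delta = -J_f^T f instead of using the exact Hessian of L = |f|^2
    delta = np.linalg.solve(Jf.T @ Jf, -Jf.T @ r)
    x = x + delta

print(x, f(x))                  # converges to (1, 1), where the residual vanishes
```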
@@ -296,8 +296,8 @@ $$
H \approx \sqrt{\text{diag}(J^TJ)}
$$ (h-approx-adam)
-This is a very rough approximation of the true Hessian. We're simply using the squared, first derivatives here, and in general, of course, $\Big( \frac{\partial f}{\partial x} \Big)^2 \ne \frac{\partial^2 f}{\partial x^2}$.
-This only holds for the first-order approximation from Gauss-Newton, i.e., the first term ofequation {eq}`gauss-newton-approx`. Now Adam goes a step further, and only keeps the diagonal of $J^T J$. This quantity is readily available in deep learning in the form of the gradient of the weights, and makes the inversion of $H$ trivial. As a result, it at least provides some estimate of the curvature of the individual weights, but neglects their correlations.
+This is a very rough approximation of the true Hessian. We're simply using the squared, first derivatives here, and in general, of course, $\Big( \frac{\partial f}{\partial x} \Big)^2 \ne \frac{\partial^2 f}{\partial x^2}$.
+This only holds for the first-order approximation from Gauss-Newton, i.e., the first term of equation {eq}`gauss-newton-approx`. Now Adam goes a step further, and only keeps the diagonal of $J^T J$. This quantity is readily available in deep learning in the form of the gradient of the weights, and makes the inversion of $H$ trivial. As a result, it at least provides some estimate of the curvature of the individual weights, but neglects their correlations.
Interestingly, Adam does not perform a full inversion via $\text{diag}(J^T J)$, but uses the component-wise square root.
This effectively yields $\sqrt{\text{diag}(J^T J)} \approx \sqrt{\text{diag}(J^2)} \approx \text{diag}(J)$.
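A tiny numeric illustration of the two approximations discussed here, using made-up gradient values (real Adam additionally keeps exponential moving averages of $J$ and $J^2$, which are omitted in this sketch):

```python
import numpy as np

J = np.array([0.5, -2.0, 0.1])       # made-up gradient of L w.r.t. three weights

JTJ = np.outer(J, J)                 # full J^T J (rank-1, with off-diagonal correlations)
precond = np.sqrt(np.diag(JTJ))      # keep only the diagonal and take the element-wise square root
print(precond)                       # [0.5, 2.0, 0.1] == |J| component-wise

# Dividing the gradient by this preconditioner (as Adam does, up to momentum terms and epsilon)
# yields a step of roughly unit magnitude per component, i.e. essentially the sign of -J.
step = -J / (precond + 1e-8)
print(step)                          # approx. [-1.,  1., -1.]
```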
@@ -340,7 +340,7 @@ $$\begin{aligned}
% & \le - \frac{ \lambda}{2} | J|^2
% thus: $L(x+\Delta) \le L(x) -\lambda |J|^2 + \frac{ \lambda^2 \mathcal L}{2} | J|^2$
-% (Plus NNs not convex… plus ReLU not twice diff.able, H is zero everywhere and undefined in some places)
+% (Plus NNs not convex... plus ReLU not twice diff.able, H is zero everywhere and undefined in some places)
Like above for Newton's method in equation {eq}`newton-step-size-conv` we have a negative linear term
that dominates the loss for small enough $\lambda$.