fixing typos
parent 1fa19cd8c0
commit a30cbbc557
@@ -132,7 +132,7 @@
"source": [
"## Running the simulation\n",
"\n",
"Now we're ready to run the simulation itself. To ccompute the diffusion and advection components of our model equation we can simply call the existing `diffusion` and `semi_lagrangian` operators in phiflow: `diffuse.explicit(u,...)` computes an explicit diffusion step via central differences for the term $\\nu \\nabla\\cdot \\nabla u$ of our model. Next, `advect.semi_lagrangian(f,u)` is used for a stable first-order approximation of the transport of an arbitrary field `f` by a velocity `u`. In our model we have $\\partial u / \\partial{t} + u \\nabla f$, hence we use the `semi_lagrangian` function to transport the velocity with itself in the implementation:"
"Now we're ready to run the simulation itself. To compute the diffusion and advection components of our model equation we can simply call the existing `diffusion` and `semi_lagrangian` operators in phiflow: `diffuse.explicit(u,...)` computes an explicit diffusion step via central differences for the term $\\nu \\nabla\\cdot \\nabla u$ of our model. Next, `advect.semi_lagrangian(f,u)` is used for a stable first-order approximation of the transport of an arbitrary field `f` by a velocity `u`. In our model we have $\\partial u / \\partial{t} + u \\nabla f$, hence we use the `semi_lagrangian` function to transport the velocity with itself in the implementation:"
]
},
{
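As a rough illustration of the two phiflow operators named in the corrected cell above, a single simulation step could look roughly as follows. This is a minimal sketch assuming phiflow 2.x; the resolution, viscosity, time step and noisy initial state are placeholders, not the notebook's actual values:

```python
from phi.flow import *  # phiflow 2.x

# Illustrative constants (not taken from the notebook).
N, NU, DT, STEPS = 128, 0.01, 0.01, 32

# Periodic 1D velocity field; Noise() is just a stand-in initial condition.
velocity = CenteredGrid(Noise(), extrapolation.PERIODIC, x=N)

for _ in range(STEPS):
    # explicit central-difference step for the diffusion term  nu * div(grad u)
    velocity = diffuse.explicit(velocity, NU, DT)
    # semi-Lagrangian transport of the velocity by itself, i.e. the advection term
    velocity = advect.semi_lagrangian(velocity, velocity, DT)
```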
@@ -15,7 +15,7 @@
"\\end{aligned}$$\n",
"\n",
"\n",
"Here, we're aiming for an incompressible flow (i.e., $\\rho = \\text{const}$), and use a simple buoyancy model (the Boussinesq approximation) via the term $(0,1)^T \\xi d$. This models changes in density without explicitly calculating $\\rho$, and we assume a gravity force that acts along the y direction via the vector $(0,1)^T$. \n",
"Here, we're aiming for an incompressible flow (i.e., $\\rho = \\text{const}$), and use a simple buoyancy model (the Boussinesq approximation) via the term $(0,1)^T \\xi d$. This approximates changes in density for incompressible solvers, without explicitly calculating $\\rho$. We assume a gravity force that acts along the y direction via the vector $(0,1)^T$. \n",
"We'll solve this PDE on a closed domain with Dirichlet boundary conditions $\\mathbf{u}=0$ for the velocity, and Neumann boundaries $\\frac{\\partial p}{\\partial x}=0$ for pressure, on a domain $\\Omega$ with a physical size of $100 \\times 80$ units. \n",
"[[run in colab]](https://colab.research.google.com/github/tum-pbs/pbdl-book/blob/main/overview-ns-forw.ipynb)\n",
"\n"
@@ -57,7 +57,7 @@ Then we also need the _Lagrange form_, which yields an exact solution for a $\xi
$$L(x+\Delta) = L + J \Delta + \frac{1}{2} H(\xi) \Delta^2$$
-In several instances we'll make use of the fundamental theorem of calculus, repeated here for completeness: 
+In several instances we'll make use of the fundamental theorem of calculus, repeated here for completeness:
$$f(x+\Delta) = f(x) + \int_0^1 \text{d}s ~ f'(x+s \Delta) \Delta \ . $$
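As a quick sanity check of this identity (with the `+` restored above), both sides can be evaluated numerically for any smooth test function; here a throwaway example with f = sin and made-up values for x and Delta:

```python
import numpy as np

f, fp = np.sin, np.cos        # test function and its derivative (illustrative choice)
x, delta = 0.3, 0.7

# midpoint-rule approximation of  int_0^1 f'(x + s*delta) * delta  ds
n = 100000
s = (np.arange(n) + 0.5) / n
integral = np.sum(fp(x + s * delta) * delta) / n

print(f(x + delta), f(x) + integral)   # both ~ 0.8415, so the identity holds
```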
@@ -128,7 +128,7 @@ a parabola, and a small $H$ might overshoot in undesirable ways. The far left in
To make statements about convergence, we need some fundamental assumptions: convexity and smoothness
of our loss function. Then we'll focus on showing that the loss decreases, and
-that we move along a sequence of smaller sets $\forall x ~ L(x)<L(x_n)$
+that we move along a sequence of smaller sets $\forall x ~ L(x)<L(x_n)$
with lower loss values.
First, we apply the fundamental theorem to L
@@ -249,7 +249,7 @@ for solving classical non-linear optimization problems.
Another attractive variant of Newton's method can be derived by
restricting $L$ to be a classical $L^2$ loss. This gives the _Gauss-Newton_ (GN) algorithm.
Thus, we still use $\Delta = - \frac{J^T}{H}$ , but
-rely on a squared loss of the form $L=|f|^2$ for an arbitrary $f(x)$.
+rely on a squared loss of the form $L=|f|^2$ for an arbitrary $f(x)$.
The derivatives of $f$ are denoted by $J_f, H_f$, in contrast to the
generic $J,H$ for $L$, as before.
Due to the chain rule, we have $J=2~f^T J_f$.
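To make the Gauss-Newton step concrete: dropping the second-derivative terms of $f$ gives the standard approximation $H \approx 2 J_f^T J_f$, which together with $J = 2 f^T J_f$ yields $\Delta = -(J_f^T J_f)^{-1} J_f^T f$. Below is a small self-contained sketch with a made-up two-dimensional residual $f$ and starting point (both purely illustrative):

```python
import numpy as np

def f(x):                       # toy residual f: R^2 -> R^2 (illustrative only)
    return np.array([x[0] ** 2 - x[1], x[1] - 1.0])

def J_f(x):                     # Jacobian of the residual
    return np.array([[2.0 * x[0], -1.0],
                     [0.0,         1.0]])

x = np.array([2.0, 2.0])
for _ in range(10):
    r, Jf = f(x), J_f(x)
    # Gauss-Newton step: solve (J_f^T J_f) delta = -J_f^T f instead of using the exact Hessian of L = |f|^2
    delta = np.linalg.solve(Jf.T @ Jf, -Jf.T @ r)
    x = x + delta

print(x, f(x))                  # converges to (1, 1), where the residual vanishes
```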
@@ -296,8 +296,8 @@ $$
H \approx \sqrt{\text{diag}(J^TJ)}
$$ (h-approx-adam)
-This is a very rough approximation of the true Hessian. We're simply using the squared, first derivatives here, and in general, of course, $\Big( \frac{\partial f}{\partial x} \Big)^2 \ne \frac{\partial^2 f}{\partial x^2}$.
-This only holds for the first-order approximation from Gauss-Newton, i.e., the first term ofequation {eq}`gauss-newton-approx`. Now Adam goes a step further, and only keeps the diagonal of $J^T J$. This quantity is readily available in deep learning in the form of the gradient of the weights, and makes the inversion of $H$ trivial. As a result, it at least provides some estimate of the curvature of the individual weights, but neglects their correlations.
+This is a very rough approximation of the true Hessian. We're simply using the squared, first derivatives here, and in general, of course, $\Big( \frac{\partial f}{\partial x} \Big)^2 \ne \frac{\partial^2 f}{\partial x^2}$.
+This only holds for the first-order approximation from Gauss-Newton, i.e., the first term of equation {eq}`gauss-newton-approx`. Now Adam goes a step further, and only keeps the diagonal of $J^T J$. This quantity is readily available in deep learning in the form of the gradient of the weights, and makes the inversion of $H$ trivial. As a result, it at least provides some estimate of the curvature of the individual weights, but neglects their correlations.
Interestingly, Adam does not perform a full inversion via $\text{diag}(J^T J)$, but uses the component-wise square root.
This effectively yields $\sqrt{\text{diag}(J^T J)} \approx \sqrt{\text{diag}(J^2)} \approx \text{diag}(J)$.
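A tiny numeric illustration of the two approximations discussed here, using made-up gradient values (real Adam additionally keeps exponential moving averages of $J$ and $J^2$, which are omitted in this sketch):

```python
import numpy as np

J = np.array([0.5, -2.0, 0.1])       # made-up gradient of L w.r.t. three weights

JTJ = np.outer(J, J)                 # full J^T J (rank-1, with off-diagonal correlations)
precond = np.sqrt(np.diag(JTJ))      # keep only the diagonal and take the element-wise square root
print(precond)                       # [0.5, 2.0, 0.1] == |J| component-wise

# Dividing the gradient by this preconditioner (as Adam does, up to momentum terms and epsilon)
# yields a step of roughly unit magnitude per component, i.e. essentially the sign of -J.
step = -J / (precond + 1e-8)
print(step)                          # approx. [-1.,  1., -1.]
```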
@@ -340,7 +340,7 @@ $$\begin{aligned}
% & \le - \frac{ \lambda}{2} | J|^2
% thus: $L(x+\Delta) \le L(x) -\lambda |J|^2 + \frac{ \lambda^2 \mathcal L}{2} | J|^2$
-% (Plus NNs not convex… plus ReLU not twice diff.able, H is zero everywhere and undefined in some places)
+% (Plus NNs not convex... plus ReLU not twice diff.able, H is zero everywhere and undefined in some places)
Like above for Newton's method in equation {eq}`newton-step-size-conv` we have a negative linear term
that dominates the loss for small enough $\lambda$.