update of simple comparison code
parent ac4faf08bb
commit 627bd1f94f
@ -4,7 +4,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Simple Example with Physical Gradients\n",
"# Simple Example comparing Different Optimizers\n",
"\n",
"The previous section has made many comments about the advantages and disadvantages of different optimization methods. Below we'll show with a practical example how much of a difference these properties actually make.\n",
"\n",
@ -46,7 +46,7 @@
"\n",
"In order to understand the following examples, it's important to keep in mind that we're dealing with mappings between the three _spaces_ we've introduced here:\n",
"$\\mathbf{x}$, $\\mathbf{y}$ and $L$. A regular forward pass maps an\n",
"$\\mathbf{x}$ to $L$, while for the optimization we'll need to associate values\n",
"$\\mathbf{x}$ via $\\mathbf{y}$ to $L$, while for the optimization we'll need to associate values\n",
"and changes in $L$ with positions in $\\mathbf{x}$. While doing this, it will \n",
"be interesting to see how this influences the positions in $\\mathbf{y}$ that develop while searching for\n",
"the right position in $\\mathbf{x}$.\n",
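To make these three spaces concrete, here is a minimal JAX sketch of the forward mapping $\mathbf{x} \to \mathbf{y} \to L$. The quadratic `fun_y` is chosen to be consistent with the analytic inverse $[x_0 \ x_1^{1/2}]^T$ used further below, while `fun_L` is only an assumed placeholder loss; the notebook's actual definitions are not part of this diff.

```python
import jax
import jax.numpy as jnp

def fun_y(x):
    # physics function y(x) = [x_0, x_1^2], consistent with the analytic inverse below
    return jnp.array([x[0], x[1] ** 2])

def fun_L(y):
    # placeholder scalar loss acting on y (an assumption, not taken from the notebook)
    return 0.5 * jnp.sum(y ** 2)

x = jnp.array([1.0, 2.0])   # example starting point (placeholder)
L = fun_L(fun_y(x))         # forward pass maps x via y to L
print(L)
```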
@ -186,7 +186,7 @@
"J = jax.jacobian(fun_y)(x)\n",
"print( \"Jacobian y(x): \\n\" + format(J) ) \n",
"\n",
"# the following also gives error, JAX grad needs a single function object\n",
"# the code below also gives an error, as JAX grad needs a single function object\n",
"#jax.grad( fun_L(fun_y) )(x) \n",
"\n",
"print( \"\\nSanity check with inverse Jacobian of y, this should give x again: \" + format(np.linalg.solve(J, np.matmul(J,x) )) +\"\\n\")\n",
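As a side note on the commented-out line above: wrapping the composition in a single callable is enough for `jax.grad`. A small sketch, re-using the placeholder `fun_y`/`fun_L` from the earlier sketch:

```python
# wrap the composition into one function object, then jax.grad works as expected
dL_dx = jax.grad(lambda x: fun_L(fun_y(x)))(x)
print(dL_dx)
```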
@ -207,7 +207,7 @@
"\n",
"* _Newton's method_ as a representative of the second-order methods, \n",
"\n",
"* and _physical gradients_.\n"
"* and scale-invariant updates from _inverse simulators_.\n"
]
},
{
@ -224,7 +224,7 @@
"&= \n",
"- \\eta ( J_{L} J_{\\mathbf{y}} )^T \\\\\n",
"&=\n",
"- \\eta ( \\frac{\\partial L }{ \\partial \\mathbf{y} } \\frac{\\partial \\mathbf{y} }{ \\partial \\mathbf{x} } )^T\n",
"- \\eta \\big( \\frac{\\partial L }{ \\partial \\mathbf{y} } \\frac{\\partial \\mathbf{y} }{ \\partial \\mathbf{x} } \\big)^T\n",
"\\end{aligned}$$\n",
"\n",
"where $\\eta$ denotes the step size parameter.\n",
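For reference, a minimal sketch of this GD update as an iteration, again based on the placeholder `fun_y`/`fun_L` from the sketches above; the step size and iteration count are arbitrary choices, not the notebook's settings.

```python
eta = 0.01                                     # arbitrary step size
grad_fn = jax.grad(lambda x: fun_L(fun_y(x)))  # gives (J_L J_y)^T via reverse-mode AD

x_gd = jnp.array([1.0, 2.0])
for _ in range(10):
    x_gd = x_gd - eta * grad_fn(x_gd)          # Delta x = -eta (J_L J_y)^T
print(x_gd, fun_L(fun_y(x_gd)))
```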
@ -448,10 +448,10 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Physical Gradients\n",
"## Inverse simulators\n",
"\n",
"Now we also use inverse physics, i.e. the inverse of y:\n",
"$\\mathbf{y}^{-1}(\\mathbf{x}) = [x_0 \\ x_1^{1/2}]^T$, to compute the _physical gradient_. As a slight look-ahead to the next section, we'll use a Newton's step for $L$, and combine it with the inverse physics function to get an overall update. This gives an update step:\n",
"Now we also use an analytical inverse of y for the optimization:\n",
"$\\mathbf{y}^{-1}(\\mathbf{x}) = [x_0 \\ x_1^{1/2}]^T$, to compute the scale-invariant update denoted by PG below. As a slight look-ahead to the next section, we'll use a Newton step for $L$, and combine it with the inverse physics function to get an overall update. This gives an update step:\n",
"\n",
"$$\\begin{aligned}\n",
"\\Delta \\mathbf{x} &= \n",
@ -461,7 +461,7 @@
"\\right) - \\mathbf{x}\n",
"\\end{aligned}$$\n",
"\n",
"Below, we define our inverse function `fun_y_inv_analytic` (we'll come to a variant below), and then evaluate an optimization with the physical gradient for ten steps:\n"
"Below, we define our inverse function `fun_y_inv_analytic` (we'll come to a variant below), and then evaluate an optimization with the PG update for ten steps:\n"
]
},
{
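A sketch of what this combined update could look like in code, using the analytic inverse from the formula above. The Newton step for $L$ in $\mathbf{y}$-space is written for the placeholder `fun_L` from the earlier sketches; the notebook's actual `fun_y_inv_analytic` and optimization loop may differ in detail.

```python
def fun_y_inv_analytic(y):
    # analytic inverse of fun_y: [y_0, y_1^(1/2)]
    return jnp.array([y[0], jnp.sqrt(y[1])])

def pg_update(x, eta=1.0):
    y = fun_y(x)
    # Newton step for L in y-space (for the placeholder fun_L)
    dL_dy = jax.grad(fun_L)(y)
    d2L_dy2 = jax.hessian(fun_L)(y)
    dy = -eta * jnp.linalg.solve(d2L_dy2, dL_dy)
    # combine with the inverse physics: Delta x = y^{-1}(y + Delta y) - x
    return fun_y_inv_analytic(y + dy) - x

x_pg = jnp.array([1.0, 2.0])
for _ in range(10):
    x_pg = x_pg + pg_update(x_pg)
print(x_pg)
```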
@ -760,7 +760,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"These trajectories confirm the intuition outlined in the previous sections: GD in blue gives a very sub-optimal trajectory in $\\mathbf{y}$. Newton (in orange) does better, but is still clearly curved, in contrast to the straight, and diagonal red trajectory for the PG-based optimization.\n",
"These trajectories confirm the intuition outlined in the previous sections: GD in blue gives a very sub-optimal trajectory in $\\mathbf{y}$. Newton (in orange) does better, but is still clearly curved. It can't approximate the higher-order terms of this example well enough. This is in contrast to the straight, diagonal red trajectory for the optimization using the inverse simulator.\n",
"\n",
"The behavior in intermediate spaces becomes especially important when they're not only abstract latent spaces as in this example, but when they have actual physical meanings."
]
@ -771,13 +771,13 @@
"source": [
"## Conclusions \n",
"\n",
"That concludes our simple example. Despite its simplicity, it already showed surprisingly large differences between gradient descent, Newton's method, and the physical gradients.\n",
"Despite its simplicity, this example already shows surprisingly large differences between gradient descent, Newton's method, and using the _inverse simulator_.\n",
"\n",
"The main takeaways of this section are:\n",
"* GD easily yields \"unbalanced\" updates\n",
"* Newtons method does better, but is far from optimal\n",
"* PGs outperform both if an inverse function is available\n",
"* The choice of optimizer strongly affects progress in latent spaces\n",
"The main takeaways of this section are the following:\n",
"* GD easily yields \"unbalanced\" updates, and gets stuck.\n",
"* Newton's method does better, but is far from optimal.\n",
"* The higher-order information of the inverse simulator outperforms both, even if it is applied only partially (we still used Newton's method for $L$ above).\n",
"* Also, the choice of optimizer strongly affects progress in latent spaces like $\\mathbf{y}$.\n",
" \n",
"In the next sections we can build on these observations to use PGs for training NNs via invertible physical models."
]
@ -791,7 +791,7 @@
"\n",
"## Approximate inversions\n",
"\n",
"If an analytic inverse like the `fun_y_inv_analytic` above is not readily available, we can actually resort to optimization schemes like Newton's method or BFGS to approximate it numerically. This is a topic that is orthogonal to the comparison of different optimization methods, but it can be easily illustrated based on the PG example above.\n",
"If an analytic inverse like the `fun_y_inv_analytic` above is not readily available, we can actually resort to optimization schemes like Newton's method or BFGS to obtain a local inverse numerically. This is a topic that is orthogonal to the comparison of different optimization methods, but it can be easily illustrated based on the PG example above.\n",
"\n",
"Below, we'll use the BFGS variant `fmin_l_bfgs_b` from `scipy` to compute the inverse. It's not very complicated, but we'll use numpy and scipy directly here, which makes the code a bit messier than it should be."
]
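A sketch of how such a numerical inverse could look with `scipy`'s `fmin_l_bfgs_b`: we search for an $\mathbf{x}$ whose forward pass matches a given $\mathbf{y}$. The function names and the plain-numpy forward pass are illustrative assumptions; the notebook's actual variant is only referenced, not shown, in this diff.

```python
import numpy as np
from scipy.optimize import fmin_l_bfgs_b

def fun_y_np(x):
    # same forward physics as the fun_y sketch above, in plain numpy
    return np.array([x[0], x[1] ** 2])

def fun_y_inv_approx(y_target, x_init):
    # invert y numerically by minimizing |y(x) - y_target|^2 with L-BFGS-B;
    # approx_grad lets scipy estimate the gradient via finite differences
    def objective(x):
        return float(np.sum((fun_y_np(x) - y_target) ** 2))
    x_opt, f_opt, info = fmin_l_bfgs_b(objective, x_init, approx_grad=True)
    return x_opt

print(fun_y_inv_approx(np.array([1.0, 2.0]), np.array([0.5, 0.5])))
```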
physgrad.md
@ -101,6 +101,7 @@ The Jacobian $\frac{\partial L}{\partial x}$ describes how the loss reacts to sm
Surprisingly, this very widely used update has a number of undesirable properties that we'll highlight in the following. Note that we've naturally applied this update in supervised settings such as {doc}`supervised-airfoils`, but we've also used it in the differentiable physics approaches. E.g., in {doc}`diffphys-code-sol` we've computed the derivative of the fluid solver. In the latter case, we've still only updated the NN parameters, but the fluid solver Jacobian was part of {eq}`GD-update`, as shown in {eq}`loss-deriv`.

**Units** 📏

A first indicator that something is amiss with GD is that it inherently misrepresents dimensions.
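To spell out the dimensional argument hinted at here, consider a small worked example, assuming a scalar $x$ measured in meters and a dimensionless loss $L$:

$$ [x] = \text{m}, \quad [L] = 1 \quad \Rightarrow \quad \Big[\frac{\partial L}{\partial x}\Big] = \frac{1}{\text{m}} \quad \Rightarrow \quad \big[\Delta x_{\text{GD}}\big] = \Big[-\eta \frac{\partial L}{\partial x}\Big] = \frac{1}{\text{m}} \neq \text{m} $$

unless $\eta$ itself is assigned units of $\text{m}^2$, which is exactly the kind of implicit rescaling that GD leaves to the learning rate.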
@ -315,13 +316,16 @@ This expression yields a first iterative method that makes use of $\mathcal P^{-

## Summary

The update obtained with a regular gradient descent method has surprising shortcomings.
The update obtained with a regular gradient descent method has surprising shortcomings due to scaling issues.
Classical, inversion-based methods like IGs and Newton's method remove some of these shortcomings,
with the somewhat theoretical construct of the update from inverse simulators ($\Delta x_{\text{PG}}$)
including the largest number of higher-order terms.
As such, it is interesting to consider it as an "ideal" setting for improved (inverted) update steps.
It get's all of the aspect above right: units 📏, function sensitivity 🔍, compositions, and convergence near optima 💎.
It provides a _scale-invariant_ update.
This comes at the cost of requiring an expression and discretization for a local inverse solver 🎩.

In contrast to the second- and first-order approximations from Newton's method and IGs, it can potentially take highly nonlinear effects into account. This comes at the cost of requiring an expression and discretization for a local inverse solver, but the main goal of the following sections is to illustrate how much we can gain from including all the higher-order information. Note that all three methods successfully include a rescaling of the search direction via inversion, in contrast to the previously discussed GD training. All of these methods represent different forms of differentiable physics, though.
In contrast to the second- and first-order approximations from Newton's method and IGs, it can potentially take highly nonlinear effects into account. As constructing the inverse simulator can be difficult, the main goal of the following sections is to illustrate how much we can gain from including all the higher-order information. Note that all three methods successfully include a rescaling of the search direction via inversion, in contrast to the previously discussed GD training. All of these methods represent different forms of differentiable physics, though.

Before moving on to including improved updates in NN training processes, we will discuss some additional theoretical aspects,
and then illustrate the differences between these approaches with a practical example.
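As a compact, schematic recap of the three update types compared in this summary (the precise definitions and step size conventions are given in the corresponding sections above):

$$ \Delta x_{\text{GD}} = -\eta \Big(\frac{\partial L}{\partial x}\Big)^T, \qquad \Delta x_{\text{Newton}} = -\eta \Big(\frac{\partial^2 L}{\partial x^2}\Big)^{-1}\Big(\frac{\partial L}{\partial x}\Big)^T, \qquad \Delta x_{\text{PG}} = \mathcal P^{-1}\big(\mathcal P(x) + \Delta y\big) - x $$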
@ -331,7 +335,7 @@ and then illustrate the differences between these approaches with a practical ex
```{note}
The following sections will provide an in-depth look ("deep-dive") into
optimizations with inverse solvers. If you're interested in practical examples
and connections to NNs, feel free to skip ahead to {doc}`physgrad-code` or
and connections to NNs, feel free to skip ahead to {doc}`physgrad-comparison` or
{doc}`physgrad-nn`, respectively.
```