updated SIP code example
parent 203bcaa934
commit 6e36804033
_toc.yml (9 lines changed)
@@ -34,6 +34,15 @@ parts:
   chapters:
   - file: reinflearn-intro.md
   - file: reinflearn-code.ipynb
+- caption: Improved Gradients
+  chapters:
+  - file: physgrad.md
+  - file: physgrad-comparison.ipynb
+  - file: physgrad-nn.md
+  - file: physgrad-code.ipynb
+  - file: physgrad-hig.md
+  - file: physgrad-hig-code.ipynb
+  - file: physgrad-discuss.md
 - caption: PBDL and Uncertainty
   chapters:
   - file: bayesian-intro.md
json-cleanup-for-pdf.py
@@ -17,10 +17,10 @@ fileList = [
     "bayesian-code.ipynb", "supervised-airfoils.ipynb", # pytorch
     "reinflearn-code.ipynb", # phiflow
     "physgrad-comparison.ipynb", # jax
+    "physgrad-code.ipynb", # pip
 ]
 
-#fileList = [ "diffphys-code-burgers.ipynb"] # debug, only 1 file
-#fileList = [ "diffphys-code-ns.ipynb"] # debug, only 1 file
+#fileList = [ "physgrad-code.ipynb"] # debug, only 1 file
 
 
 # main
@@ -55,6 +55,10 @@ for fnOut in fileList:
     res.append( re.compile(r"Building wheel") ) # phiflow install, also gives weird unicode characters
     res.append( re.compile(r"warnings.warn") ) # phiflow warnings
     res.append( re.compile(r"WARNING:absl") ) # jax warnings
+    res.append( re.compile(r"ERROR: pip") ) # pip dependencies
+    res.append( re.compile(r"requires imgaug") ) # pip dependencies
 
     # remove all "warnings.warn" from phiflow?
 
     # shorten data line: "0.008612174447657694, 0.02584669669548606, 0.043136357266407785"
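For context, these compiled patterns are used to strip build noise from notebook outputs before the PDF build. The following is a minimal sketch of that idea, not the actual script: `clean_notebook` and the JSON handling are assumptions for illustration, only the regular expressions are taken from the hunk above.

```python
import json, re

# Patterns from the hunk above; anything they match is treated as build noise.
res = [
    re.compile(r"Building wheel"),    # phiflow install output
    re.compile(r"warnings.warn"),     # phiflow warnings
    re.compile(r"WARNING:absl"),      # jax warnings
    re.compile(r"ERROR: pip"),        # pip dependency errors
    re.compile(r"requires imgaug"),   # pip dependency errors
]

def clean_notebook(path):
    """Hypothetical helper: drop matching output lines from one .ipynb file."""
    with open(path) as f:
        nb = json.load(f)
    for cell in nb.get("cells", []):
        for out in cell.get("outputs", []):
            text = out.get("text")
            if isinstance(text, list):  # stream outputs keep a list of lines
                out["text"] = [l for l in text if not any(r.search(l) for r in res)]
    with open(path, "w") as f:
        json.dump(nb, f, indent=1)

clean_notebook("physgrad-code.ipynb")
```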
make-pdf.sh (10 lines changed)
@@ -1,16 +1,14 @@
 # source this file with "." in a shell
 
-# note this script assumes the following paths/versions
-# python3.7
-# /Users/thuerey/Library/Python/3.7/bin/jupyter-book
+# note this script assumes the following paths/versions: python3.7 , /Users/thuerey/Library/Python/3.7/bin/jupyter-book
+# do clean git checkout for changes from json-cleanup-for-pdf.py via:
+# git checkout diffphys-code-burgers.ipynb diffphys-code-ns.ipynb diffphys-code-sol.ipynb physicalloss-code.ipynb bayesian-code.ipynb supervised-airfoils.ipynb reinflearn-code.ipynb
 
 echo
 echo WARNING - still requires one manual quit of first pdf/latex pass, use shift-x to quit
 echo
 
-# do clean git checkout for changes from json-cleanup-for-pdf.py?
-# git checkout diffphys-code-burgers.ipynb diffphys-code-ns.ipynb diffphys-code-sol.ipynb physicalloss-code.ipynb bayesian-code.ipynb supervised-airfoils.ipynb reinflearn-code.ipynb
 
 # warning - modifies notebooks!
 python3.7 json-cleanup-for-pdf.py
 
physgrad-code.ipynb (1176 lines changed)
File diff suppressed because one or more lines are too long
physgrad-comparison.ipynb
@@ -7,6 +7,7 @@
     "# Simple Example comparing Different Optimizers\n",
     "\n",
     "The previous section has made many comments about the advantages and disadvantages of different optimization methods. Below we'll show with a practical example how much differences these properties actually make.\n",
+    "[[run in colab]](https://colab.research.google.com/github/tum-pbs/pbdl-book/blob/main/physgrad-comparison.ipynb)\n",
     "\n",
     "\n",
     "## Problem formulation\n",
@@ -884,7 +885,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Nice! It works, just like the PG version above. Not much point plotting this, it's basically the same, but let's measure the difference. Below, we compute the MAE, which for this simple example turns out to be on the order of our floating point accuracy."
+    "This confirms that the approximate inversion works, in line with the regular PG version above. There's not much point plotting this, as it's basically the same, but let's measure the difference. Below, we compute the MAE, which for this simple example turns out to be on the order of our floating point accuracy."
    ]
   },
   {
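The MAE mentioned in the reworded cell is simply the mean absolute difference between the two solutions. A tiny sketch, where `x_pg` and `x_approx` are placeholder arrays standing in for the notebook's actual variables:

```python
import numpy as np

# Placeholder arrays; in the notebook these would be the regular PG solution
# and the solution obtained via the approximate inversion.
x_pg     = np.array([0.25, 0.50, 0.75])
x_approx = np.array([0.25, 0.50, 0.75])

mae = np.mean(np.abs(x_pg - x_approx))
print(f"MAE: {mae:.2e}")  # expected to be on the order of float accuracy
```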
physgrad-nn.md
@@ -44,7 +44,7 @@ To update the weights $\theta$ of the NN $f$, we perform the following update st
 * Given a set of inputs $y^*$, evaluate the forward pass to compute the NN prediction $x = f(y^*; \theta)$
 * Compute $y$ via a forward simulation ($y = \mathcal P(x)$) and invoke the (local) inverse simulator $P^{-1}(y; x)$ to obtain the step $\Delta x_{\text{PG}} = \mathcal P^{-1} (y + \eta \Delta y; x)$ with $\Delta y = y^* - y$
 * Evaluate the network loss, e.g., $L = \frac 1 2 || x - \tilde x ||_2^2$ with $\tilde x = x+\Delta x_{\text{PG}}$, and perform a Newton step treating $\tilde x$ as a constant
-* Use GD (or a GD-based optimizer like Adam) to propagate the change in $x$ to the network weights $\theta$ with a learning rate $\eta_{\text{NN}}
+* Use GD (or a GD-based optimizer like Adam) to propagate the change in $x$ to the network weights $\theta$ with a learning rate $\eta_{\text{NN}}$
 
 
 ```
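Read as pseudocode, the four steps listed in this hunk correspond to a training iteration roughly like the sketch below. This is an illustration under assumptions, not code from the book: `f`, `P`, `P_inverse`, `eta`, and the optimizer are placeholders for the network, the forward simulator, its local inverse, and the step sizes.

```python
import torch

def sip_update(f, P, P_inverse, optimizer, y_star, eta=1.0):
    x = f(y_star)                                  # 1) NN prediction x = f(y*; theta)
    with torch.no_grad():                          #    keep the physics part out of the autodiff graph
        y  = P(x)                                  # 2) forward simulation y = P(x)
        dy = y_star - y                            #    Newton step for L(y) = 1/2 |y - y*|^2
        x_tilde = x + P_inverse(y + eta * dy, x)   #    local inverse simulator, as in the steps above
    loss = 0.5 * ((x - x_tilde) ** 2).sum()        # 3) proxy loss, x_tilde treated as a constant
    optimizer.zero_grad()
    loss.backward()                                # 4) propagate the change in x to theta
    optimizer.step()                               #    e.g. Adam with learning rate eta_NN
    return loss.item()
```

Here `optimizer` could be, for instance, `torch.optim.Adam(f.parameters(), lr=eta_NN)`, matching the last bullet.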
@@ -74,11 +74,11 @@ The central reason for introducing a Newton step is the improved accuracy for th
 Unlike with regular Newton or the quasi-Newton methods from equation {eq}`quasi-newton-update`, we do not need the Hessian of the full system.
 Instead, the Hessian is only needed for $L(y)$.
 This makes Newton's method attractive again.
-Even better, for many typical $L$ its computation can be completely forgone.
+Even better, for many typical $L$ the analytical form of the Newton updates is known.
 
 E.g., consider the most common supervised objective function, $L(y) = \frac 1 2 | y - y^*|_2^2$ as already put to use above. $y$ denotes the predicted, and $y^*$ the target value.
 We then have $\frac{\partial L}{\partial y} = y - y^*$ and $\frac{\partial^2 L}{\partial y^2} = 1$.
-Using equation {eq}`quasi-newton-update`, we get $\Delta y = \eta \cdot (y^* - y)$ which can be computed without evaluating the Hessian.
+Using equation {eq}`quasi-newton-update`, we get $\Delta y = \eta \cdot (y^* - y)$ which can be computed right away, without evaluating any additional Hessian matrices.
 
 Once $\Delta y$ is determined, the gradient can be backpropagated to earlier time steps using the inverse simulator $\mathcal P^{-1}$. We've already used this combination of a Newton step for the loss and an inverse simulator for the PDE in {doc}`physgrad-comparison`.
 
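For the record, the result in the last changed line follows by inserting the stated derivatives into the quasi-Newton update, assuming it has the usual form with a step size $\eta$:

$$
\Delta y = -\eta \left(\frac{\partial^2 L}{\partial y^2}\right)^{-1} \frac{\partial L}{\partial y}
         = -\eta \cdot 1 \cdot (y - y^*)
         = \eta \, (y^* - y)
$$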
@@ -87,8 +87,8 @@ It is not to be confused with a traditional supervised loss in $x$ space.
 Due to the dependency of $\mathcal P^{-1}$ on the prediction $y$, it does not average multiple modes of solutions in $x$.
 To demonstrate this, consider the case that GD is being used as solver for the inverse simulation.
 Then the total loss is purely defined in $y$ space, reducing to a regular first-order optimization.
-Hence, the proxy loss function simply connects the computational graphs of inverse physics and NN for backpropagation.
 
+Hence, the proxy loss function simply connects the computational graphs of inverse physics and NN for backpropagation.
 
 ## Iterations and time dependence
 
@@ -182,11 +182,12 @@ It provably converges when enough network updates $\Delta\theta$ are performed p
 While SIP training can find vastly more accurate solutions, there are some caveats to consider.
 %
 First, an approximately scale-invariant physics solver is required. While in low-dimensional $x$ spaces, Newton's method is a good candidate, high-dimensional spaces require some other form of inversion.
-Some equations can locally be inverted analytically but for complex problems, domain-specific knowledge may be required.
+Some equations can locally be inverted analytically but for complex problems, domain-specific knowledge may be required,
+or we can employ numerical methods (coming up).
 
-Second, SIP uses traditional first-order optimizers to determine $\Delta\theta$.
-As discussed, these solvers behave poorly in ill-conditioned settings which can also affect SIP performance when the network outputs lie on different scales.
-Some recent works address this issue and have proposed network optimization based on inversion.
+Second, SIP focuses on an accurate inversion of the physics part, but uses traditional first-order optimizers to determine $\Delta\theta$.
+As discussed, these solvers behave poorly in ill-conditioned settings which can also affect SIP performance when the network outputs lie on very different scales.
+Thus, we should keep inversion for the NN in mind as a goal.
 
 Third, while SIP training generally leads to more accurate solutions, measured in $x$ space, the same is not always true for the loss $L = \sum_i L_i$. SIP training weighs all examples equally, independent of their loss values.
 This can be useful, but it can cause problems in examples where regions with overly small or large curvatures $|\frac{\partial^2L}{\partial x^2}|$ distort the importance of samples.