smaller updates to figures and captions

commit 17389f35a3
parent 1eba53dca5
Author: NT
Date: 2021-04-01 16:53:41 +08:00
5 changed files with 5 additions and 4 deletions


@@ -49,7 +49,7 @@ In this case we compute a (potentially very long) sequence of PDE solver steps i
 ```{figure} resources/diffphys-multistep.jpg
 ---
-height: 220px
+height: 180px
 name: diffphys-mulitstep
 ---
 Time stepping with interleaved DP and NN operations for $k$ solver iterations.
@@ -59,7 +59,7 @@ Note that this picture (and the ones before) have assumed an _additive_ influenc
 DP setups with many time steps can be difficult to train: the gradients need to backpropagate through the full chain of PDE solver evaluations and NN evaluations. Typically, each of them represents a non-linear and complex function. Hence for larger numbers of steps, the vanishing and exploding gradient problem can make training difficult (see {doc}`diffphys-code-sol` for some practical tips on how to alleviate this).
 
-## Alternatives - Noise
+## Alternatives: Noise
 
 It is worth mentioning here that other works have proposed perturbing the inputs and
 the iterations at training time with noise {cite}`sanchez2020learning` (somewhat similar to
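
A short sketch can make the unrolled chain of solver and NN evaluations discussed above concrete. The following PyTorch snippet (PyTorch purely for illustration; the toy diffusion stencil, network size, and all names are placeholder assumptions, not the book's actual solver) interleaves a differentiable solver step with an additive NN correction for $k$ steps and backpropagates through the whole chain:

```python
import torch

# Hypothetical toy "solver": one explicit diffusion step on a periodic 1D grid.
def solver_step(u, nu=0.1):
    return u + nu * (torch.roll(u, 1) - 2.0 * u + torch.roll(u, -1))

# Small stand-in network that provides an additive correction per step.
net = torch.nn.Sequential(
    torch.nn.Linear(32, 32), torch.nn.Tanh(), torch.nn.Linear(32, 32)
)

u = torch.rand(32)        # initial state
target = torch.zeros(32)  # placeholder reference state
k = 20                    # number of interleaved solver/NN steps

for _ in range(k):
    u = solver_step(u) + net(u)  # additive NN influence, as in the figure above

loss = ((u - target) ** 2).mean()
loss.backward()  # gradients flow back through all k solver and NN evaluations

# The overall gradient magnitude hints at vanishing/exploding behavior.
print(sum(p.grad.norm().item() for p in net.parameters()))
```

Increasing `k` while watching the printed gradient norm is a quick way to observe how longer unrolled chains aggravate the vanishing and exploding gradient problem.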


@@ -147,7 +147,7 @@ to compute the updates (and derivatives) for these operators.
 %in practice break down into larger, monolithic components
 E.g., as this process is very similar to adjoint method optimizations, we can re-use many of the techniques
 that were developed in this field, or build on established numerical methods. For instance,
-we could leverage the $O(n)$ complexity of multigrid solvers for matrix inversion.
+we could leverage the $O(n)$ runtime of multigrid solvers for matrix inversion.
 
 The flipside of this approach is that it requires some understanding of the problem at hand
 and of the numerical methods. Also, a given solver might not provide gradient calculations out of the box.
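
On the last point, a common remedy is to wrap an existing solver and supply the adjoint as a hand-written backward pass. The following hypothetical PyTorch sketch does this for a linear solve $Au=b$; `torch.linalg.solve` merely stands in for an external routine such as a multigrid solver, and all names are illustrative assumptions:

```python
import torch

class SolverWithAdjoint(torch.autograd.Function):
    """Hypothetical wrapper around an external linear solve A u = b,
    supplying the adjoint solve as the backward pass."""

    @staticmethod
    def forward(ctx, A, b):
        u = torch.linalg.solve(A, b)  # stand-in for e.g. a multigrid solver
        ctx.save_for_backward(A, u)
        return u

    @staticmethod
    def backward(ctx, grad_u):
        A, u = ctx.saved_tensors
        # One adjoint solve with A^T instead of differentiating through
        # the solver's internal iterations.
        lam = torch.linalg.solve(A.T, grad_u)
        grad_A = -lam.unsqueeze(-1) * u.unsqueeze(-2)  # dL/dA = -lam u^T
        return grad_A, lam                             # dL/db = A^{-T} dL/du

A = torch.rand(8, 8) + 8.0 * torch.eye(8)  # well-conditioned test matrix
b = torch.rand(8, requires_grad=True)
u = SolverWithAdjoint.apply(A, b)
u.sum().backward()
print(b.grad)  # equals the solution of A^T lam = ones
```

The appeal of this pattern is that the backward pass costs a single extra solve, regardless of how many internal iterations the forward solver performed.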
@@ -161,13 +161,14 @@ never produces the parameter $\nu$ in the example above, and it doesn't appear i
 loss formulation, we will never encounter a $\partial/\partial \nu$ derivative
 in our backpropagation step.
 
+The following figure summarizes the DP-based learning approach and illustrates the sequence of operations that are typically processed within a single PDE solve. As many of the operations are non-linear in practice, this often leads to a challenging learning task for the NN:
 
 ```{figure} resources/diffphys-overview.jpg
 ---
 height: 220px
 name: diffphys-full-overview
 ---
-TODO , details...
+DP learning with a PDE solver that consists of $m$ individual operators $\mathcal P_i$. The gradient travels backward through all $m$ operators before influencing the network weights $\theta$.
 ```
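
The remark about $\nu$ can be checked directly in any autodiff framework: a parameter that never enters the forward chain of operators simply receives no gradient. A minimal sketch with hypothetical toy operators (not the book's solver):

```python
import torch

theta = torch.nn.Parameter(torch.rand(4))   # network weights
nu = torch.nn.Parameter(torch.tensor(0.1))  # physical parameter, e.g. viscosity

x = torch.rand(4)
y = x * theta                # stand-in for the NN, parameterized by theta
for _ in range(3):           # m = 3 chained solver operators P_i
    y = torch.tanh(y) * 0.9  # note: nu is never used in the chain

y.sum().backward()
print(theta.grad)  # populated: the gradient traveled through all m operators
print(nu.grad)     # None: no d/d-nu term ever enters backpropagation
```

Here `theta.grad` is populated because the loss depends on it through every chained operator, while `nu.grad` stays `None`, mirroring the missing $\partial/\partial \nu$ term above.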

Binary file not shown (image updated: 177 KiB → 235 KiB)

Binary file not shown (image updated: 101 KiB → 88 KiB)

Binary file not shown.