spellchecking notebooks

NT 2021-06-24 17:17:23 +02:00
parent 79f5aa4e33
commit d9c92c40f6
6 changed files with 28 additions and 28 deletions

View File

@@ -6,9 +6,9 @@
"source": [
"# Burgers Optimization with a Differentiable Physics Gradient\n",
"\n",
"To illustrate the process of computing gradients in a _differentiable physics_ (DP) setting, we target the same inverse problem (the reconstruction task) used for the PINN example in {doc}`physicalloss-code`. The choice of DP as a method has some immediate implications: we start with a discretized PDE, and the evolution of the system is now fully determined by the resulting numerical solver. Hence, the only real unknown is the initial state. We will still need to re-compute all the states betwwen the initial and target state many times, just now we won't need an NN for this step. Instead, we can rely on our discretized model. \n",
"To illustrate the process of computing gradients in a _differentiable physics_ (DP) setting, we target the same inverse problem (the reconstruction task) used for the PINN example in {doc}`physicalloss-code`. The choice of DP as a method has some immediate implications: we start with a discretized PDE, and the evolution of the system is now fully determined by the resulting numerical solver. Hence, the only real unknown is the initial state. We will still need to re-compute all the states between the initial and target state many times, just now we won't need an NN for this step. Instead, we can rely on our discretized model. \n",
"\n",
"Also, as we choose an initial discretization for the DP approach, the unknown initial state consists of the sampling points of the involved physical fields, and we can simply represent these unknowns as floating point variables. Hence, even for the initial state we do not need to set up an NN. Thus, our Burgers reconstruction problem reduces to a gradient-based opitmization without any NN when solving it with DP. Nonetheless, it's a very good starting point to illustrate the process.\n",
"Also, as we choose an initial discretization for the DP approach, the unknown initial state consists of the sampling points of the involved physical fields, and we can simply represent these unknowns as floating point variables. Hence, even for the initial state we do not need to set up an NN. Thus, our Burgers reconstruction problem reduces to a gradient-based optimization without any NN when solving it with DP. Nonetheless, it's a very good starting point to illustrate the process.\n",
"\n",
"First, we'll set up our discretized simulation. Here we can employ phiflow, as shown in the overview section on _Burgers forward simulations_. \n",
"[[run in colab]](https://colab.research.google.com/github/tum-pbs/pbdl-book/blob/main/diffphys-code-burgers.ipynb)\n",
@@ -19,7 +19,7 @@
"phiflow directly gives us a sequence of differentiable operations, provided that we don't use the _numpy_ backend.\n",
"The important step here is to include `phi.tf.flow` instad of `phi.flow` (for _pytorch_ you could use `phi.torch.flow`).\n",
"\n",
"So, as a first step, let's set up some constants, and intialize our domain with 128 points. We'll directly allocate a `velocity` field, and our constraint at $t=0.5$ (step 16), now as a `scalar_grid` in phiflow.\n"
"So, as a first step, let's set up some constants, and initialize our domain with 128 points. We'll directly allocate a `velocity` field, and our constraint at $t=0.5$ (step 16), now as a `scalar_grid` in phiflow.\n"
]
},
{
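For context, the setup this hunk refers to looks roughly as follows. This is a minimal sketch assuming a phiflow 2.x style API; the `Box` syntax and the viscosity value `NU` are assumptions for illustration, not the notebook's exact code:

```python
import numpy as np
from phi.tf.flow import *  # TF backend instead of phi.flow, so steps are differentiable

N = 128                  # number of sampling points for the domain
STEPS = 32               # solver steps, so the constraint at t=0.5 is step 16
DT = 1.0 / STEPS
NU = 0.01 / (N * np.pi)  # viscosity (assumed value, for illustration)

# the unknown initial state: start from zero velocity
velocity = CenteredGrid(0.0, extrapolation.PERIODIC, x=N, bounds=Box(x=(-1, 1)))
```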
@@ -123,9 +123,9 @@
"source": [
"Because we're only constraining timestep 16, we could actually omit steps 17 to 31 in this setup. They don't have any degrees of freedom and are not constrained in any way. However, for fairness regarding a comparison with the previous PINN case, we include them.\n",
"\n",
"Note that we've done a lot of calcuations here: first the 32 steps of our simulation, and then another 16 steps backwards from the loss. They were recorded by the gradient tape, and used to backpropagate the loss to the initial state of the simulation.\n",
"Note that we've done a lot of calculations here: first the 32 steps of our simulation, and then another 16 steps backwards from the loss. They were recorded by the gradient tape, and used to backpropagate the loss to the initial state of the simulation.\n",
"\n",
"Not surprisingly, because we're starting from zero, there's also a significant intial error of ca. 0.38 for the 16th simulation step.\n",
"Not surprisingly, because we're starting from zero, there's also a significant initial error of ca. 0.38 for the 16th simulation step.\n",
"\n",
"So what do we get as a gradient here? It has the same dimensions as the velocity, and we can easily visualize it:\n",
"Starting from the zero state for `velocity` (shown in blue), the first gradient is shown as a green line below. If you compare it with the solution it points in the opposite direction, as expected. The solution is much larger in magnitude, so we omit it here (see the next graph).\n"
@@ -266,7 +266,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"This seems to be going in the right direction! It's defintely not perfect, but we've only computed 5 GD update steps so far. The two peaks with a positive velocity on the left side of the shock and the negative peak on the right side are starting to show.\n",
"This seems to be going in the right direction! It's definitely not perfect, but we've only computed 5 GD update steps so far. The two peaks with a positive velocity on the left side of the shock and the negative peak on the right side are starting to show.\n",
"\n",
"This is a good indicator that the backpropagation of gradients through all of our 16 simulated steps is behaving correctly, and that it's driving the solution in the right direction. This hints at how powerful this setup is: the gradient that we obtain from each of the simulation steps (and each operation within them) can easily be chained together into more complex sequences. In the example above, we're backpropagating through all 16 steps of the simulation, and we could easily enlarge this \"look-ahead\" of the optimization with minor changes to the code.\n",
"\n"

View File

@@ -185,7 +185,7 @@
"id": "VSisaefX_trh"
},
"source": [
"The last statement verifies that our `INFLOW` grid likewise has an `inflow_loc` dimension in addition to the spatial dimenions `x` and `y`. You can test for the existence of a tensor dimension in phiflow with the `.exists` boolean, which can be evaluated for any dimension name. E.g., above `INFLOW.inflow_loc.exists` will give `True`, while `INFLOW.some_unknown_dim.exists` will give `False`. The $^b$ superscript indicates that `inflow_loc` is a batch dimension.\n",
"The last statement verifies that our `INFLOW` grid likewise has an `inflow_loc` dimension in addition to the spatial dimensions `x` and `y`. You can test for the existence of a tensor dimension in phiflow with the `.exists` boolean, which can be evaluated for any dimension name. E.g., above `INFLOW.inflow_loc.exists` will give `True`, while `INFLOW.some_unknown_dim.exists` will give `False`. The $^b$ superscript indicates that `inflow_loc` is a batch dimension.\n",
"\n",
"Phiflow tensors are automatically broadcast to new dimensions via their names, and hence typically no-reshaping operations are required. E.g., you can easily add or multiply tensors with differing dimensions. Below we'll multiply a staggered grid with a tensor of ones along the `inflow_loc` dimension to get a staggered velocity that has `x,y,inflow_loc` as dimensions via `StaggeredGrid(...) * math.ones(batch(inflow_loc=4))`."
]
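The dimension handling described in this hunk condenses into a few lines. The sketch below is built only from calls the text itself mentions (`.exists`, `math.ones`, `batch(inflow_loc=4)`), applied to a hypothetical stand-in grid rather than the notebook's actual `INFLOW`:

```python
from phi.tf.flow import *

INFLOW = CenteredGrid(0.0, extrapolation.BOUNDARY, x=32, y=40)  # hypothetical grid
print(INFLOW.some_unknown_dim.exists)    # False: this dimension does not exist

# broadcasting by name: multiplying with a ones-tensor adds the batch dimension
INFLOW = INFLOW * math.ones(batch(inflow_loc=4))
print(INFLOW.inflow_loc.exists)          # True: now a batch dimension of size 4
```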
@@ -275,7 +275,7 @@
"When evaluating the loss function we treat the reference state as an external constant via `field.stop_gradient()`.\n",
"As outlined at the top, $s$ is a function of $\\mathbf{u}$ (via the advection equation), which in turn is given by the Navier-Stokes equations. Thus, via a chain of multiple time steps $s$ depends in the initial velocity state $\\mathbf{u}_0$.\n",
"\n",
"It is important that our initial velocity has the `inflow_loc` dimension before we record the gradients, such that we have the full \"mini-batch\" of four versions of our velocity (three of which will be updated via gradients in our optimization later on). To get the approprate velocity tensor, we initialize a `StaggeredGrid` with a tensor of zeros along the `inflow_loc` batch dimension. As the staggered grid already has `y,x` and `vector` dimensions, this gives the desired four dimensions, as verified by the print statement below.\n",
"It is important that our initial velocity has the `inflow_loc` dimension before we record the gradients, such that we have the full \"mini-batch\" of four versions of our velocity (three of which will be updated via gradients in our optimization later on). To get the appropriate velocity tensor, we initialize a `StaggeredGrid` with a tensor of zeros along the `inflow_loc` batch dimension. As the staggered grid already has `y,x` and `vector` dimensions, this gives the desired four dimensions, as verified by the print statement below.\n",
"\n",
"Phiflow provides a unified API for gradients across different platforms by using functions that need to return a loss values, in addition to optional state values. It uses a loss function based interface, for which we define the `simulate` function below. `simulate` computes the $L^2$ error outlined above and returns the evolved `smoke` and `velocity` states after 20 simulation steps.\n"
]
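The `simulate` function described here has roughly the following shape. This is a schematic sketch: `step_fn` and `smoke_reference` are placeholders for the notebook's solver step and reference data, and `field.l2_loss` is an assumption for how the $L^2$ error is computed:

```python
from phi.tf.flow import *

def simulate(smoke, velocity, step_fn, smoke_reference, nsteps=20):
    """Advance the state for nsteps solver steps, then score it against the reference."""
    for _ in range(nsteps):
        smoke, velocity = step_fn(smoke, velocity)
    # the reference is an external constant, not a differentiable quantity
    loss = field.l2_loss(smoke - field.stop_gradient(smoke_reference))
    return loss, smoke, velocity  # loss first, then the optional state values
```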
@@ -383,7 +383,7 @@
"id": "J7IgFG4o_trp"
},
"source": [
"Not surprisingly, the fourth gradient on the left is zero (it's already matching the reference). The other three gradients have detected variations for the intial round inflow posiitons shown as positive and negative regions around the circular shape of the inflow. The ones for the larger distances on the left are also noticeably larger."
"Not surprisingly, the fourth gradient on the left is zero (it's already matching the reference). The other three gradients have detected variations for the initial round inflow positions shown as positive and negative regions around the circular shape of the inflow. The ones for the larger distances on the left are also noticeably larger."
]
},
{
@@ -400,7 +400,7 @@
"\n",
"This is a difficult task: the simulation is producing different dynamics due to the differing initial spatial density configuration. Our optimization should now find a single initial velocity state that gives the same state as the reference simulation at $t=20$. Thus, after 20 non-linear update steps the simulation should reproduce the desired marker density state. It would be much easier to simply change the position of the marker inflow to arrive at this goal, but -- to make things more difficult and interesting here -- the inflow is _not_ a degree of freedom. The optimizer can only change the initial velocity $\\mathbf{u}_0$.\n",
"\n",
"The following cell implements a simple steepest gradient descent optimization: it re-evalutes the gradient function, and iterates several times to optimize $\\mathbf{u}_0$ with a learning rate (step size) `LR`.\n",
"The following cell implements a simple steepest gradient descent optimization: it re-evaluates the gradient function, and iterates several times to optimize $\\mathbf{u}_0$ with a learning rate (step size) `LR`.\n",
"\n",
"`field.functional_gradient` has a parameter `get_output` that determines whether the original results of the function (`simulate()` in our case) are returned, or only the gradient. As it's interesting to track how the loss evolves over the course of the iterations, let's redefine the gradient function with `get_output=True`. \n"
]
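Schematically, the optimization loop around `field.functional_gradient` then looks like this, assuming `simulate` has been bound to its fixed inputs. The exact packing of outputs and gradients differs between phiflow versions, so treat the unpacking below as an assumption:

```python
sim_grad = field.functional_gradient(simulate, get_output=True)

LR = 1e-3                                     # hypothetical learning rate
for it in range(10):
    (loss, smoke, vel_out), dvel = sim_grad(velocity)
    velocity = velocity - LR * dvel           # steepest descent on u0
    print(f"iteration {it}, loss {loss}")
```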
@@ -645,7 +645,7 @@
"source": [
"## Conclusions\n",
"\n",
"This example illustrated how the differentiable physics approach can easily be extended towards significanlty more \n",
"This example illustrated how the differentiable physics approach can easily be extended towards significantly more \n",
"complex PDEs. Above, we've optimized for a mini-batch of 20 steps of a full Navier-Stokes solver.\n",
"\n",
"This is a powerful basis to bring NNs into the picture. As you might have noticed, our degrees of freedom were still a regular grid, and we've jointly solved a single inverse problem. There were three cases to solve as a mini-batch, of course, but nonetheless the setup still represents a direct optimization. Thus, in line with the PINN example of {doc}`physicalloss-code` we've not really dealt with a _machine learning_ task here. However, that will be the topic of the next chapters.\n"

View File

@@ -210,7 +210,7 @@
"Domain setup for the wake flow case.\n",
"```\n",
"\n",
"The solver applies inflow boundary conditions for the y-velocity with a pre-multiplied mask (`velBCy,velBCyMask`), to set the y components at the bottom of the domai. It then calls `super().step()` to run the _regular_ phiflow fluid solving step.\n"
"The solver applies inflow boundary conditions for the y-velocity with a pre-multiplied mask (`velBCy,velBCyMask`), to set the y components at the bottom of the domain. It then calls `super().step()` to run the _regular_ phiflow fluid solving step.\n"
]
},
{
@@ -648,7 +648,7 @@
"source": [
"## Interleaving simulation and NN\n",
"\n",
"Now comes the **most crucial** step in the whole setup: we define the chain of simulation steps and network evaluations to be used at training time. After all the work defining helper functions, it's acutally pretty simple: we loop over `msteps`, call the simulator via `KarmanFlow.step` for an input state, and afterwards evaluate the correction via `network(to_keras(...))`. The NN correction is then added to the last simulation state in the `prediction` list (we're actually simply overwriting the last simulated step `prediction[-1]` with `velocity + correction[-1]`.\n",
"Now comes the **most crucial** step in the whole setup: we define the chain of simulation steps and network evaluations to be used at training time. After all the work defining helper functions, it's actually pretty simple: we loop over `msteps`, call the simulator via `KarmanFlow.step` for an input state, and afterwards evaluate the correction via `network(to_keras(...))`. The NN correction is then added to the last simulation state in the `prediction` list (we're actually simply overwriting the last simulated step `prediction[-1]` with `velocity + correction[-1]`.\n",
"\n",
"One other important thing that's happening here is normalization: the inputs to the network are divided by the standard deviations in `dataset.dataStats`. This is slightly complicated as we have to append the scaling for the Reynolds numbers to the normalization for the velocity. After evaluating the `network`, we only have a velocity left, so we can simply multiply by the standard deviation of the velocity again (via `* dataset.dataStats['std'][1]`)."
]
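Structurally, the interleaved chain described here reduces to the following loop. The names (`simulator.step`, `network`, `to_keras`, `dataset.dataStats`) mirror the text, but this is pseudocode for the structure rather than the notebook's exact cell; in particular, the Reynolds-number part of the normalization is omitted:

```python
prediction = [initial_state]                  # placeholder initial state
for i in range(msteps):
    prediction.append(simulator.step(prediction[-1]))         # regular solver step
    velocity = prediction[-1]
    inp = to_keras(velocity) / dataset.dataStats['std'][1]    # normalize NN input
    correction = network(inp) * dataset.dataStats['std'][1]   # de-normalize NN output
    prediction[-1] = velocity + correction    # overwrite the last simulated step
```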
@@ -847,7 +847,7 @@
"source": [
"The loss should go down from above 1000 initially to below 10. This is a good sign, but of course it's even more important to see how the resulting solver fares on new inputs.\n",
"\n",
"Note that with this training approach we've realized a hybrid solver, consisting of a regular _source_ simulator, and a network that was trained to specificially interact with this simulator for a chosen domain of simulation cases.\n",
"Note that with this training approach we've realized a hybrid solver, consisting of a regular _source_ simulator, and a network that was trained to specifically interact with this simulator for a chosen domain of simulation cases.\n",
"\n",
"Let's see how well this works by applying it to a set of test data inputs with new Reynolds numbers that were not part of the training data.\n",
"\n",
@@ -915,7 +915,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we also need a set of new data to apply the learned solver to. Below, we'll download a new set of Reynolds numbers (inbetween the ones used for training), so that we can later on run the unmodified simulator and the DL-powered one on these cases.\n",
"Now we also need a set of new data to apply the learned solver to. Below, we'll download a new set of Reynolds numbers (in between the ones used for training), so that we can later on run the unmodified simulator and the DL-powered one on these cases.\n",
"\n",
"We're creating a new dataset object `dataset_test` here, which organizes the data. We're simply using the first batch of the unshuffled dataset, though."
]
@@ -993,7 +993,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The way the evaluation is implemented above, we can only evalute a proper source simulation (i.e. one that is not modified by the NN) for `evalsteps=1`."
"The way the evaluation is implemented above, we can only evaluate a proper source simulation (i.e. one that is not modified by the NN) for `evalsteps=1`."
]
},
{
@@ -1220,7 +1220,7 @@
"\n",
"* Turn off the differentiable physics training (by setting `msteps=1`), and compare it with the DP version.\n",
"\n",
"* Likewise, train a model with a larger `msteps` setting, e.g., 8 or 16. Note that due to the recurrent nature of the training, you'll probabaly have to load a pre-trained state to stabilize the first iterations."
"* Likewise, train a model with a larger `msteps` setting, e.g., 8 or 16. Note that due to the recurrent nature of the training, you'll probably have to load a pre-trained state to stabilize the first iterations."
]
}
],

View File

@@ -8,7 +8,7 @@
"source": [
"# Solving Inverse Problems with NNs\n",
"\n",
"Inverse problems encompass a large class of practical applications in science. In general, the goal here is not to directly compute a physical field like the velocity at a future time (this is the typical scenario for a _forward_ solve), but instead more generically compute one or more parameters in the model equations such that certain constraints are fulfilled. A very common objective is to find the optimal setting for a single parameter given some constraints. E.g., this could be the global diffusion constant for an advection-diffusion model such that it fits measured data as accurately as possible. Inverse problems are encountered for any model parameter adjusted via observations, or the reconstruction of initial conditions, e.g., for particle imaging velocimetry (PIV). More complex cases aim for computing boundary geometries w.r.t. optmal conditions, e.g. to obtain a shape with minimal drag in a fluid flow.\n",
"Inverse problems encompass a large class of practical applications in science. In general, the goal here is not to directly compute a physical field like the velocity at a future time (this is the typical scenario for a _forward_ solve), but instead more generically compute one or more parameters in the model equations such that certain constraints are fulfilled. A very common objective is to find the optimal setting for a single parameter given some constraints. E.g., this could be the global diffusion constant for an advection-diffusion model such that it fits measured data as accurately as possible. Inverse problems are encountered for any model parameter adjusted via observations, or the reconstruction of initial conditions, e.g., for particle imaging velocimetry (PIV). More complex cases aim for computing boundary geometries w.r.t. optimal conditions, e.g. to obtain a shape with minimal drag in a fluid flow.\n",
"\n",
"A key aspect below will be that we're not aiming for solving only a _single instance_ of an inverse problem, but we'd like to use deep learning to solve a _larger collection_ of inverse problems. Thus, unlike the physics-informed example of {doc}`physicalloss-code` or the differentiable physics (DP) optimization of {doc}`diffphys-code-ns`, where we've solved an optimization problem for specific instances of inverse problems, we now aim for training an NN that learns to solve a larger class of inverse problems, i.e., a whole solution manifold. Nonetheless, we of course need to rely on a certain degree of similarity for these problems, otherwise there's nothing to learn (and the implied assumption of continuity in the solution manifold breaks down).\n",
"\n",
@@ -26,7 +26,7 @@
"source": [
"## Formulation\n",
"\n",
"With the notation from {doc}`overview-equations` this gives the minmization problem \n",
"With the notation from {doc}`overview-equations` this gives the minimization problem \n",
"\n",
"$$\n",
"\\text{arg min}_{\\theta} \\sum_m \\sum_i (f(x_{m,i} ; \\theta)-y^*_{m,i})^2 , \n",
@@ -85,7 +85,7 @@
"\n",
"The code below replicates an inverse problem example (the shape transitions experiment) from [Learning to Control PDEs with Differentiable Physics](https://ge.in.tum.de/publications/2020-iclr-holl/) {cite}`holl2019pdecontrol`, further details can be found in section D.2 of the paper's [appendix](https://openreview.net/pdf?id=HyeSin4FPB).\n",
"\n",
"First we need to load phiflow and check out the _PDE-Control_ git repository, which also contains some numpy arrays with intial shapes."
"First we need to load phiflow and check out the _PDE-Control_ git repository, which also contains some numpy arrays with initial shapes."
]
},
{

View File

@@ -17,7 +17,7 @@
"the time interval $t \\in [0,1]$.\n",
"\n",
"Note that similar to the previous forward simulation example, \n",
"we will still be sampling the solution with 128 points ($n=128$), but now we have a discretization via the NN. So we could also sample points inbetween without having to explicitly choose a basis function for interpolation. The discretization via the NN now internally determines how to use its degrees of freedom to arrange the activation functions as basis functions. So we have no direct control over the reconstruction.\n",
"we will still be sampling the solution with 128 points ($n=128$), but now we have a discretization via the NN. So we could also sample points in between without having to explicitly choose a basis function for interpolation. The discretization via the NN now internally determines how to use its degrees of freedom to arrange the activation functions as basis functions. So we have no direct control over the reconstruction.\n",
"[[run in colab]](https://colab.research.google.com/github/tum-pbs/pbdl-book/blob/main/physicalloss-code.ipynb)\n",
"\n"
]
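To make "a discretization via the NN" concrete: the network is a continuous function of the space-time coordinates, so it can be evaluated anywhere. A minimal Keras stand-in follows; the layer sizes are illustrative, not the notebook's architecture:

```python
import tensorflow as tf

dense = tf.keras.Sequential([
    tf.keras.layers.Dense(20, activation='tanh', input_shape=(2,)),
    tf.keras.layers.Dense(20, activation='tanh'),
    tf.keras.layers.Dense(1),
])

def network(x, t):
    """Evaluate the ansatz u(x,t) at arbitrary locations, also between grid points."""
    return dense(tf.stack([x, t], axis=-1))

x = tf.random.uniform([5], -1.0, 1.0)  # five arbitrary sample positions
t = tf.random.uniform([5], 0.0, 1.0)
print(network(x, t).shape)             # (5, 1)
```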
@@ -287,7 +287,7 @@
"source": [
"## Loss function and training\n",
"\n",
"As objective for the learning process we can now combine the _direct_ constraints, i.e., the solution at $t=0.5$ and the Dirchlet $u=0$ boundary conditions with the loss from the PDE residuals. For both boundary constraints we'll use 100 points below, and then sample the solution in the inner region with an additional 1000 points.\n",
"As objective for the learning process we can now combine the _direct_ constraints, i.e., the solution at $t=0.5$ and the Dirichlet $u=0$ boundary conditions with the loss from the PDE residuals. For both boundary constraints we'll use 100 points below, and then sample the solution in the inner region with an additional 1000 points.\n",
"\n",
"The direct constraints are evaluated via `network(x, t)[:, 0] - u`, where `x` and `t` are the space-time location where we'd like to sample the solution, and `u` provides the corresponding ground truth value.\n",
"\n",
@@ -334,7 +334,7 @@
"source": [
"The code above just initializes the evaluation of the loss, we still didn't do any optimization steps, but we're finally in a good position to get started with this.\n",
"\n",
"Despite the simple equation, the convergence is typicaly very slow. The iterations themselves are fast to compute, but this setup needs a _lot_ of iterations. To keep the runtime in a reasonable range, we only do 10k iterations by default below (`ITERS`). You can increase this value to get better results."
"Despite the simple equation, the convergence is typically very slow. The iterations themselves are fast to compute, but this setup needs a _lot_ of iterations. To keep the runtime in a reasonable range, we only do 10k iterations by default below (`ITERS`). You can increase this value to get better results."
]
},
{
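The iteration loop itself is then the standard pattern below (a sketch: Adam with a guessed learning rate; the boundary-loss terms for the $t=0.5$ data and the $u=0$ boundaries would be added to the residual loss in the same way):

```python
ITERS = 10000                                       # default from the text
opt = tf.keras.optimizers.Adam(learning_rate=1e-3)  # hypothetical optimizer settings

for i in range(ITERS):
    with tf.GradientTape() as tape:
        loss = residual_loss(network, x_r, t_r, nu=0.01 / np.pi)
        # + boundary_loss(network, xb, tb, ub) terms for the direct constraints
    grads = tape.gradient(loss, dense.trainable_variables)
    opt.apply_gradients(zip(grads, dense.trainable_variables))
```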
@@ -559,7 +559,7 @@
"Especially the maximum / minimum at $x=\\pm 1/2$ are far off, and the boundaries at $x=\\pm 1$ are not fulfilled: the solution is not at zero.\n",
"\n",
"We have the forward simulator for this simulation, so we can use the $t=0$ solution of the network to \n",
"evaluate how well the temporal evoluation was reconstructed. This measures how well the temporal evolution of the model equation was captured via the soft constraints of the PINN loss.\n",
"evaluate how well the temporal evaluation was reconstructed. This measures how well the temporal evolution of the model equation was captured via the soft constraints of the PINN loss.\n",
"\n",
"The graph below shows the initial state in blue, and two evolved states at $t=8/32$ and $t=15/32$. Note that this is all from the simulated version, we'll show the learned version next. \n",
"\n",
@@ -816,7 +816,7 @@
"source": [
"Next, we'll store the full solution over the course of the $t=0 \\dots 1$ time interval, so that we can compare it later on to the full solution from a regular forward solve and compare it to the differential physics solution.\n",
"\n",
"Thus, stay tuned for the full evaluation and the comparison. This will follow in {doc}`diffphys-code-tf`, after we've discussed the details of how to run the differential physics optimization."
"Thus, stay tuned for the full evaluation and the comparison. This will follow in {doc}`diffphys-code-burgers`, after we've discussed the details of how to run the differential physics optimization."
]
},
{

View File

@@ -195,7 +195,7 @@
"* `step` takes an action given by the agent and iterates the environment to the next state. It returns the resulting observation, the received reward, a flag determining whether a terminal state has been reached and a dictionary for debugging and logging information. \n",
"* `render` is called to display the current environment state in a way the creator of the environment specifies. This function can be used to inspect the training results.\n",
"\n",
"`stable-baselines3` expands on the default gym environment by providing an interface for vectorized environments. This makes it possible to compute the forward pass for multiple trajectories simultaneously which can in turn increase time efficiency because of better resource utilization. In practice, this means that the methods now work on vectors of observations, actions, rewards, terminal state flags and info dictionarys. The step method is split into `step_async` and `step_wait`, making it possible to run individual instances of the environment on different threads.\n",
"`stable-baselines3` expands on the default gym environment by providing an interface for vectorized environments. This makes it possible to compute the forward pass for multiple trajectories simultaneously which can in turn increase time efficiency because of better resource utilization. In practice, this means that the methods now work on vectors of observations, actions, rewards, terminal state flags and info dictionaries. The step method is split into `step_async` and `step_wait`, making it possible to run individual instances of the environment on different threads.\n",
"\n",
"### Physics Simulation \n",
"\n",
@@ -288,7 +288,7 @@
"source": [
"## RL Evaluation\n",
"\n",
"Now that we have a trainned model, let's take a look at the results. The leftmost plot shows the results of the reinforcement learning agent. As reference, next to it are shown the ground truth, i.e. the trajectory the agent should reconstruct, and the uncontrolled simulation where the system follows its natural evolution."
"Now that we have a trained model, let's take a look at the results. The leftmost plot shows the results of the reinforcement learning agent. As reference, next to it are shown the ground truth, i.e. the trajectory the agent should reconstruct, and the uncontrolled simulation where the system follows its natural evolution."
]
},
{
@@ -542,7 +542,7 @@
"source": [
"## Comparison between RL and DP\n",
"\n",
"Next, the results of both methods are compared in terms of visual quality of the resulting trajectories as well as quantitatively via the amounf of generated forces. The latter provides insights about the performance of either approaches as both methods aspire to minimize this metric during training. This is also important as the task is trivially solved with by applying a huge force at the last time step. Rather, an ideal solution takes into account the dynamics of the PDE to apply as little forces as possible. Hence, this metric is a very good one to measure how well the network has learned about the underlying physical environment (Burgers equation in this example).\n",
"Next, the results of both methods are compared in terms of visual quality of the resulting trajectories as well as quantitatively via the amount of generated forces. The latter provides insights about the performance of either approaches as both methods aspire to minimize this metric during training. This is also important as the task is trivially solved with by applying a huge force at the last time step. Rather, an ideal solution takes into account the dynamics of the PDE to apply as little forces as possible. Hence, this metric is a very good one to measure how well the network has learned about the underlying physical environment (Burgers equation in this example).\n",
"\n",
"\n"
]