SoL updates

This commit is contained in:
NT 2021-01-26 11:20:00 +08:00
parent f39cc81873
commit b8f381b14a
3 changed files with 75 additions and 50 deletions


@ -3,7 +3,7 @@
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "SoL-karman2d.ipynb",
"name": "diffphys-code-sol.ipynb",
"provenance": [],
"collapsed_sections": []
},
@ -22,9 +22,11 @@
"# Reducing Numerical Errors with Deep Learning\n",
"\n",
"Next, we'll target numerical errors that arise in the discretization of a continuous PDE $\\mathcal P^*$, i.e. when we formulate $\\mathcal P$. This approach will demonstrate that, despite the lack of closed-form descriptions, discretization errors often are functions with regular and repeating structures and, thus, can be learned by a neural network. Once the network is trained, it can be evaluated locally to improve the solution of a PDE-solver, i.e., to reduce its numerical error. The resulting method is a hybrid one: it will always run (a coarse) PDE solver, and then improve it at runtime with corrections inferred by an NN.\n",
"\n",
" \n",
"Pretty much all numerical methods contain some form of iterative process: repeated updates over time for explicit solvers, or within a single update step for implicit solvers. Below we'll target iterations over time; an example of the second case can be found [here](https://github.com/tum-pbs/CG-Solver-in-the-Loop).\n",
"\n",
"## Problem Formulation\n",
"\n",
"In the context of reducing errors, it's crucial to have a _differentiable physics solver_, so that the learning process can take the reaction of the solver into account. This interaction is not possible with supervised learning or PINN training. Even small inference errors of a supervised NN can accumulate over time, and lead to a data distribution that differs from the distribution of the pre-computed data. This distribution shift can lead to sub-optimal results, or even cause blow-ups of the solver.\n",
"\n",
"In order to learn the error function, we'll consider two different discretizations of the same PDE $\\mathcal P^*$: \n",
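The hybrid rollout $(\mathcal P_s \mathcal C)^n$ applied to a downsampled reference state can be sketched with a toy problem. The following is a minimal illustration, not the notebook's actual solver or network: `coarse_step` is a hypothetical stand-in for the source solver $\mathcal P_s$ with a systematic error, and `correction` stands in for the learned operator $\mathcal C$.

```python
import numpy as np

def coarse_step(v):
    # hypothetical stand-in for the source solver P_s, with a systematic error term
    return 0.9 * v + 0.01

def correction(v, theta):
    # hypothetical stand-in for the learned correction C(v; theta)
    return v * (1.0 + theta)

def rollout(v0, theta, n):
    # apply (P_s C)^n: solver step, then correction, n times in a row
    v = v0
    for _ in range(n):
        v = correction(coarse_step(v), theta)
    return v

# evolve a downsampled initial reference state for n steps
state = rollout(np.ones(4), theta=0.05, n=10)
```

Training then amounts to choosing `theta` so that the result of `rollout` stays close to the projected reference trajectory.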
@ -36,7 +38,7 @@
"\n",
"```{figure} resources/diffphys-sol-manifolds.jpeg\n",
"---\n",
"height: 280px\n",
"height: 150px\n",
"name: diffphys-sol-manifolds\n",
"---\n",
"Visual overview of coarse and reference manifolds\n",
@ -88,9 +90,7 @@
"The overall learning goal now becomes\n",
"\n",
"$\n",
"\\text{argmin}_\\theta | \n",
"( \\pdec \\corr )^n ( \\project \\vr{t} )\n",
"- \\project \\vr{t}|^2\n",
"\\text{argmin}_\\theta | ( \\pdec \\corr )^n ( \\project \\vr{t} ) - \\project \\vr{t}|^2\n",
"$\n",
"\n",
"A crucial bit here that's easy to overlook is that the correction depends on the modified states, i.e.\n",
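This dependence on the modified states is exactly why the solver must be differentiable: the loss of an $n$-step rollout depends on $\theta$ through every intermediate, already-corrected state. A tiny numerical illustration (a hypothetical toy problem, using finite differences instead of backpropagation) shows that the end-to-end chain of solver and correction has a well-defined gradient:

```python
import numpy as np

def coarse_step(v):
    # toy stand-in for the source solver
    return 0.9 * v + 0.01

def unrolled_loss(theta, v0, target, n=5):
    # the correction always acts on states already modified in earlier steps
    v = v0
    for _ in range(n):
        v = coarse_step(v) * (1.0 + theta)
    return float(np.sum((v - target) ** 2))

v0, target = np.ones(4), 0.5 * np.ones(4)
eps = 1e-6
# central finite difference of the full rollout loss w.r.t. theta
grad = (unrolled_loss(0.05 + eps, v0, target)
        - unrolled_loss(0.05 - eps, v0, target)) / (2 * eps)
```

In the notebook this gradient is instead obtained by backpropagating through the differentiable physics solver, which scales to high-dimensional $\theta$.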
@ -102,10 +102,12 @@
"**TL;DR**:\n",
"We'll train a network $\\mathcal{C}$ to reduce the numerical errors of a simulator with a more accurate reference. Here it's crucial to have the _source_ solver realized as a differential physics operator, such that it can give gradients for an improved training of $\\mathcal{C}$.\n",
"\n",
"\\\\\n",
"<br>\n",
"\n",
"---\n",
"\n",
"## Getting started with the Implementation\n",
"\n",
"First, let's download the prepared data set (for details on generation & loading cf. https://github.com/tum-pbs/Solver-in-the-Loop), and let's get the data handling out of the way, so that we can focus on the _interesting_ parts..."
]
},
@ -130,7 +132,7 @@
"with open('data-karman2d-train.pickle', 'rb') as f: dataPreloaded = pickle.load(f)\n",
"print(\"Loaded data, {} training sims\".format(len(dataPreloaded)) )\n"
],
"execution_count": 1,
"execution_count": null,
"outputs": [
{
"output_type": "stream",
@ -174,7 +176,7 @@
"np.random.seed(42)\n",
"tf.compat.v1.set_random_seed(42)\n"
],
"execution_count": 2,
"execution_count": null,
"outputs": [
{
"output_type": "stream",
@ -212,6 +214,8 @@
"id": "OhnzPdoww11P"
},
"source": [
"## Simulation Setup\n",
"\n",
"Now we can set up the _source_ simulation $\\newcommand{\\pdec}{\\pde_{s}} \\pdec$. \n",
"Note that we won't deal with \n",
"$\\newcommand{\\pder}{\\pde_{r}} \\pder$\n",
@ -259,7 +263,7 @@
"\n",
" return super().step(fluid=fluid, dt=dt, obstacles=[self.obst], gravity=gravity, density_effects=[self.infl], velocity_effects=())\n"
],
"execution_count": 3,
"execution_count": null,
"outputs": []
},
{
@ -268,6 +272,8 @@
"id": "RYFUGICgxk0K"
},
"source": [
"## Network Architecture\n",
"\n",
"We'll also define two alternative neural networks to represent \n",
"$\\newcommand{\\vcN}{\\mathbf{s}} \\newcommand{\\corr}{\\mathcal{C}} \\corr$: \n",
"\n",
@ -296,7 +302,7 @@
" keras.layers.Conv2D(filters=2, kernel_size=5, padding='same', activation=None), # u, v\n",
" ])\n"
],
"execution_count": 4,
"execution_count": null,
"outputs": []
},
{
@ -352,7 +358,7 @@
" l_output = keras.layers.Conv2D(filters=2, kernel_size=5, padding='same')(block_5)\n",
" return keras.models.Model(inputs=l_input, outputs=l_output)\n"
],
"execution_count": 5,
"execution_count": null,
"outputs": []
},
{
@ -387,7 +393,7 @@
"def to_staggered(tensor_cen, box):\n",
" return StaggeredGrid(math.pad(tensor_cen, ((0,0), (0,1), (0,1), (0,0))), box=box)\n"
],
"execution_count": 12,
"execution_count": null,
"outputs": []
},
{
@ -398,6 +404,8 @@
"source": [
"---\n",
"\n",
"## Data Handling\n",
"\n",
"So far so good - we also need to take care of a few more mundane tasks, e.g., some data handling and randomization. Below we define a `Dataset` class that stores all \"ground truth\" reference data (already downsampled).\n",
"\n",
"We actually have a lot of data dimensions: multiple simulations, with many time steps, each with different fields. This makes the code below a bit more difficult to read.\n",
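A stripped-down sketch of this indexing logic may help to read the real class below. Everything here is hypothetical (names, shapes, the random data); it only mimics the structure of simulations × time steps × fields and the `nextStep`-style time advance:

```python
import numpy as np

class ToyDataset:
    # hypothetical mini version: num_sims simulations, num_steps frames each,
    # served as (current frame, next frame) pairs while the step index advances
    def __init__(self, num_sims=2, num_steps=6, shape=(8, 8, 2)):
        rng = np.random.default_rng(42)
        self.data = rng.standard_normal((num_sims, num_steps) + shape)
        self.step_idx = 0

    def get_data(self, sim):
        # current frame as network input, next frame as ground truth
        return self.data[sim, self.step_idx], self.data[sim, self.step_idx + 1]

    def next_step(self):
        self.step_idx += 1

ds = ToyDataset()
cur, nxt = ds.get_data(0)
ds.next_step()
```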
@ -477,7 +485,7 @@
" def nextStep(self):\n",
" self.stepIdx += 1\n"
],
"execution_count": 7,
"execution_count": null,
"outputs": []
},
{
@ -528,7 +536,7 @@
" ]\n",
" return [marker_dens, velocity, ext]\n"
],
"execution_count": 8,
"execution_count": null,
"outputs": []
},
{
@ -560,7 +568,7 @@
"#print(format(getData(dataset,1)))\n",
"#print(format(dataset.getData(1)))\n"
],
"execution_count": 9,
"execution_count": null,
"outputs": [
{
"output_type": "stream",
@ -624,7 +632,7 @@
"network.summary() \n",
"\n"
],
"execution_count": 10,
"execution_count": null,
"outputs": [
{
"output_type": "stream",
@ -665,6 +673,8 @@
"id": "AbpNPzplQZMF"
},
"source": [
"## Interleaving Simulation and Network\n",
"\n",
"Now comes the **most crucial** step in the whole setup: we define the chain of simulation steps and network evaluations to be used at training time. After all the work defining helper functions, it's actually pretty simple: we loop over `msteps`, call the simulator via `KarmanFlow.step` for an input state, and afterwards evaluate the correction via `network(to_keras())`. The correction is then added to the last simulation state in the `prediction` list (we're actually simply overwriting the last simulated step `prediction[-1]` with `velocity + correction[-1]`).\n",
"\n",
"One other important thing that's happening here is normalization: the inputs to the network are divided by the standard deviations in `dataset.dataStats`. This is slightly complicated, as we have to append the scaling for the Reynolds numbers to the normalization for the velocity. After evaluating the `network`, we only have a velocity left, so we can simply multiply by the standard deviation again (`* dataset.dataStats['std'][1]`)."
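The interleaving of solver, normalization, and network can be sketched as follows. This is a toy stand-in, not the notebook's implementation: `solver_step` and `network` are hypothetical placeholders, and the `STD_*` constants mimic the role of `dataset.dataStats`:

```python
import numpy as np

STD_VEL = 2.0    # stand-in for dataset.dataStats['std'][1]
STD_RE = 100.0   # assumed scale for the appended Reynolds-number channel

def solver_step(v):
    # hypothetical stand-in for KarmanFlow.step
    return 0.95 * v

def network(x):
    # hypothetical stand-in NN: consumes (u, v, Re) channels, returns a velocity correction
    return 0.01 * x[..., :2]

def hybrid_rollout(v0, reynolds, msteps=4):
    prediction = [v0]
    for _ in range(msteps):
        v = solver_step(prediction[-1])
        # normalize inputs: velocity by its std, Re appended as an extra channel
        re_chan = np.full(v.shape[:-1] + (1,), reynolds / STD_RE)
        inp = np.concatenate([v / STD_VEL, re_chan], axis=-1)
        corr = network(inp) * STD_VEL  # de-normalize the NN output
        prediction.append(v + corr)    # corrected state replaces the plain solver step
    return prediction

states = hybrid_rollout(np.ones((8, 8, 2)), reynolds=1e4)
```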
@ -702,7 +712,7 @@
"\n",
" prediction[-1] = prediction[-1].copied_with(velocity=prediction[-1].velocity + correction[-1])\n"
],
"execution_count": 13,
"execution_count": null,
"outputs": []
},
{
@ -729,7 +739,7 @@
"]\n",
"loss = tf.reduce_sum(loss_steps)/msteps\n"
],
"execution_count": 14,
"execution_count": null,
"outputs": []
},
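The loss structure above (per-step L2 differences, summed and averaged over `msteps`) can be written out as a small numpy sketch; the notebook computes the same quantity on TensorFlow tensors via `tf.reduce_sum`:

```python
import numpy as np

def multi_step_loss(predictions, targets):
    # one L2 term per unrolled step, averaged over the number of steps
    msteps = len(predictions)
    loss_steps = [np.sum((p - t) ** 2) for p, t in zip(predictions, targets)]
    return sum(loss_steps) / msteps

# toy example with 3 unrolled steps of 4 values each
preds = [np.ones(4) * k for k in range(3)]
targs = [np.zeros(4)] * 3
loss = multi_step_loss(preds, targs)
```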
{
@ -738,17 +748,15 @@
"id": "E6Vly1_0QhZ1"
},
"source": [
"## Training\n",
"\n",
"For the training, we use a standard Adam optimizer, and only run 4 epochs by default. This could (should) be increased for the larger network or to obtain more accurate results."
]
},
{
"cell_type": "code",
"metadata": {
"id": "PuljFamYQksW",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "e71bcaae-187c-4c10-cee8-f03bb8964af0"
"id": "PuljFamYQksW"
},
"source": [
"lr = 1e-4\n",
@ -771,19 +779,8 @@
" ld_network = keras.models.load_model(output_dir+'/nn_epoch{:04d}.h5'.format(resume))\n",
" network.set_weights(ld_network.get_weights())\n"
],
"execution_count": 15,
"outputs": [
{
"output_type": "stream",
"text": [
"WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/phi/tf/session.py:28: The name tf.global_variables_initializer is deprecated. Please use tf.compat.v1.global_variables_initializer instead.\n",
"\n",
"WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/phi/tf/session.py:29: The name tf.train.Saver is deprecated. Please use tf.compat.v1.train.Saver instead.\n",
"\n"
],
"name": "stdout"
}
]
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
@ -809,7 +806,7 @@
" elif epoch == 10: lr *= 1e-1\n",
" return lr\n"
],
"execution_count": 16,
"execution_count": null,
"outputs": []
},
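Only the tail of the learning-rate function is visible in the hunk above. A hedged reconstruction of such a step-decay schedule could look like this; the epoch-10 branch matches the visible fragment, while the epoch-5 threshold is purely an assumption for illustration:

```python
def decay_lr(epoch, lr):
    # drop the learning rate by 10x at selected epochs
    # (epoch 5 is assumed; only the epoch == 10 branch is visible in the diff)
    if epoch == 5:
        lr *= 1e-1
    elif epoch == 10:
        lr *= 1e-1
    return lr
```

With the default of 4 epochs this schedule never triggers; it only matters when the training duration is increased as suggested.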
{
@ -830,7 +827,7 @@
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "3bea702a-14d0-43a7-ebc5-25289e27c5a5"
"outputId": "148d951b-7070-4a95-c6d7-0fd91d29606e"
},
"source": [
"current_lr = lr\n",
@ -855,7 +852,7 @@
" _, l2 = sess.run([train_step, loss], my_feed_dict)\n",
" steps += 1\n",
"\n",
" if (j==0 and i<3) or (ib==0 and i%10==0):\n",
" if (j==0 and i<3) or (j==0 and ib==0 and i%31==0) or (ib==0 and i%124==0):\n",
" print('epoch {:03d}/{:03d}, batch {:03d}/{:03d}, step {:04d}/{:04d}: loss={}'.format( j+1, epochs, ib+1, dataset.numBatches, i+1, dataset.numSteps, l2 ))\n",
" dataset.nextStep()\n",
"\n",
@ -863,7 +860,7 @@
"\n",
" if j%10==9: network.save(output_dir+'/nn_epoch{:04d}.h5'.format(j+1))\n",
"\n",
"#tf_writer_tr.close()\n",
"# all done! save final version\n",
"network.save(output_dir+'/final.h5')\n"
],
"execution_count": null,
@ -871,11 +868,39 @@
{
"output_type": "stream",
"text": [
"epoch 001/004, batch 001/002, step 0001/0496: loss=6816.912109375\n",
"epoch 001/004, batch 001/002, step 0002/0496: loss=4036.171875\n",
"epoch 001/004, batch 001/002, step 0003/0496: loss=1627.9716796875\n",
"epoch 001/004, batch 001/002, step 0011/0496: loss=1403.9822998046875\n",
"epoch 001/004, batch 001/002, step 0021/0496: loss=841.949951171875\n"
"epoch 001/004, batch 001/002, step 0001/0496: loss=8114.626953125\n",
"epoch 001/004, batch 001/002, step 0002/0496: loss=3371.28125\n",
"epoch 001/004, batch 001/002, step 0003/0496: loss=1594.294189453125\n",
"epoch 001/004, batch 001/002, step 0032/0496: loss=261.2645263671875\n",
"epoch 001/004, batch 001/002, step 0063/0496: loss=124.70037078857422\n",
"epoch 001/004, batch 001/002, step 0094/0496: loss=86.60037231445312\n",
"epoch 001/004, batch 001/002, step 0125/0496: loss=93.21685028076172\n",
"epoch 001/004, batch 001/002, step 0156/0496: loss=64.77877807617188\n",
"epoch 001/004, batch 001/002, step 0187/0496: loss=58.933082580566406\n",
"epoch 001/004, batch 001/002, step 0218/0496: loss=51.40797805786133\n",
"epoch 001/004, batch 001/002, step 0249/0496: loss=42.819091796875\n",
"epoch 001/004, batch 001/002, step 0280/0496: loss=46.30024719238281\n",
"epoch 001/004, batch 001/002, step 0311/0496: loss=41.07358932495117\n",
"epoch 001/004, batch 001/002, step 0342/0496: loss=40.12362289428711\n",
"epoch 001/004, batch 001/002, step 0373/0496: loss=41.094932556152344\n",
"epoch 001/004, batch 001/002, step 0404/0496: loss=36.17275619506836\n",
"epoch 001/004, batch 001/002, step 0435/0496: loss=37.64105987548828\n",
"epoch 001/004, batch 001/002, step 0466/0496: loss=33.44026184082031\n",
"epoch 001/004, batch 002/002, step 0001/0496: loss=36.6204719543457\n",
"epoch 001/004, batch 002/002, step 0002/0496: loss=29.037982940673828\n",
"epoch 001/004, batch 002/002, step 0003/0496: loss=27.977163314819336\n",
"epoch 002/004, batch 001/002, step 0001/0496: loss=13.540712356567383\n",
"epoch 002/004, batch 001/002, step 0125/0496: loss=12.313040733337402\n",
"epoch 002/004, batch 001/002, step 0249/0496: loss=11.129035949707031\n",
"epoch 002/004, batch 001/002, step 0373/0496: loss=11.969249725341797\n",
"epoch 003/004, batch 001/002, step 0001/0496: loss=8.394614219665527\n",
"epoch 003/004, batch 001/002, step 0125/0496: loss=7.2177557945251465\n",
"epoch 003/004, batch 001/002, step 0249/0496: loss=8.274188041687012\n",
"epoch 003/004, batch 001/002, step 0373/0496: loss=9.177286148071289\n",
"epoch 004/004, batch 001/002, step 0001/0496: loss=6.306344985961914\n",
"epoch 004/004, batch 001/002, step 0125/0496: loss=4.158570289611816\n",
"epoch 004/004, batch 001/002, step 0249/0496: loss=4.282064437866211\n",
"epoch 004/004, batch 001/002, step 0373/0496: loss=5.2111334800720215\n"
],
"name": "stdout"
}
@ -887,7 +912,7 @@
"id": "swG7GeDpWT_Z"
},
"source": [
"The loss should go down from ca. 1000 initially to around 1. This is a good sign, but of course it's even more important to see how the resulting solver fares on new inputs.\n",
"The loss should go down from above 1000 initially to below 10. This is a good sign, but of course it's even more important to see how the resulting solver fares on new inputs.\n",
"\n",
"Note that after training we've realized a hybrid solver, consisting of a regular _source_ simulator, and a network that was trained to specifically interact with this simulator for a chosen domain of simulation cases.\n",
"\n",
@ -897,7 +922,7 @@
"\n",
"## Next steps\n",
"\n",
"* Modify the training to further reduce the training error\n",
"* Modify the training to further reduce the training error. With the medium network you should be able to get the loss down to around 1.\n",
"\n",
"* Export the network to the external github code, and run it on new wake flow cases. You'll see that a reduced training error does not always directly correlate with improved test performance.\n",
"\n",

Binary file not shown. (After: Size 17 KiB)


@ -2,8 +2,8 @@ Supervised Training
=======================
_Supervised_ here essentially means: "doing things the old fashioned way". Old fashioned in the context of
deep learning (DL), of course, so it's still fairly new. Also, "old fashioned" of course also doesn't always mean bad
- it's just that we'll be able to do better than simple supervised training later on.
deep learning (DL), of course, so it's still fairly new. Also, "old fashioned" of course also doesn't
always mean bad - it's just that we'll be able to do better than simple supervised training later on.
In a way, the viewpoint of "supervised training" is a starting point for all projects one would encounter in the context of DL, and
hence is worth studying. And although it typically yields inferior results to approaches that more tightly