SoL updates

parent f39cc81873
commit b8f381b14a
@@ -3,7 +3,7 @@
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "SoL-karman2d.ipynb",
"name": "diffphys-code-sol.ipynb",
"provenance": [],
"collapsed_sections": []
},
@@ -22,9 +22,11 @@
"# Reducing Numerical Errors with Deep Learning\n",
"\n",
"Next, we'll target numerical errors that arise in the discretization of a continuous PDE $\\mathcal P^*$, i.e. when we formulate $\\mathcal P$. This approach will demonstrate that, despite the lack of closed-form descriptions, discretization errors often are functions with regular and repeating structures and, thus, can be learned by a neural network. Once the network is trained, it can be evaluated locally to improve the solution of a PDE-solver, i.e., to reduce its numerical error. The resulting method is a hybrid one: it will always run (a coarse) PDE solver, and then improve it at runtime with corrections inferred by an NN.\n",
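Aside (not part of the notebook diff): the hybrid scheme described in the paragraph above can be summarized in a few lines of Python. All names here (`coarse_solver_step`, `correction_net`, `state`) are hypothetical placeholders, purely to illustrate the solver-then-correction structure; the key point, mirrored later in the notebook, is that the network only ever sees states that the coarse solver itself produced.

```python
# Minimal sketch of a hybrid "solver-in-the-loop" rollout (hypothetical names):
# at every step the coarse PDE solver advances the state, and a trained network
# adds a learned correction that compensates for the discretization error.
def hybrid_rollout(state, coarse_solver_step, correction_net, num_steps):
    states = [state]
    for _ in range(num_steps):
        state = coarse_solver_step(state)      # coarse, cheap PDE update
        state = state + correction_net(state)  # learned error correction
        states.append(state)
    return states
```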
"\n",
|
||||
" \n",
|
||||
"Pretty much all numerical methods contain some form of iterative process. That can be repeated updates over time for explicit solvers,or within a single update step for implicit solvers. Below we'll target iterations over time, an example for the second case could be found [here](https://github.com/tum-pbs/CG-Solver-in-the-Loop).\n",
|
||||
"\n",
|
||||
"## Problem Formulation\n",
|
||||
"\n",
|
||||
"In the context of reducing errors, it's crucial to have a _differentiable physics solver_, so that the learning process can take the reaction of the solver into account. This interaction is not possible with supervised learning or PINN training. Even small inference errors of a supervised NN can accumulate over time, and lead to a data distribution that differs from the distribution of the pre-computed data. This distribution shift can lead to sub-optimal results, or even cause blow-ups of the solver.\n",
|
||||
"\n",
|
||||
"In order to learn the error function, we'll consider two different discretizations of the same PDE $\\mathcal P^*$: \n",
|
||||
@@ -36,7 +38,7 @@
"\n",
"```{figure} resources/diffphys-sol-manifolds.jpeg\n",
"---\n",
"height: 280px\n",
"height: 150px\n",
"name: diffphys-sol-manifolds\n",
"---\n",
"Visual overview of coarse and reference manifolds\n",
@@ -88,9 +90,7 @@
"The overall learning goal now becomes\n",
"\n",
"$\n",
"\\text{argmin}_\\theta | \n",
"( \\pdec \\corr )^n ( \\project \\vr{t} )\n",
"- \\project \\vr{t}|^2\n",
"\\text{argmin}_\\theta | ( \\pdec \\corr )^n ( \\project \\vr{t} ) - \\project \\vr{t}|^2\n",
"$\n",
"\n",
"A crucial bit here that's easy to overlook is that the correction depends on the modified states, i.e.\n",
@@ -102,10 +102,12 @@
"**TL;DR**:\n",
"We'll train a network $\\mathcal{C}$ to reduce the numerical errors of a simulator with a more accurate reference. Here it's crucial to have the _source_ solver realized as a differentiable physics operator, such that it can give gradients for an improved training of $\\mathcal{C}$.\n",
"\n",
"\\\\\n",
"<br>\n",
"\n",
"---\n",
"\n",
"## Getting started with the Implementation\n",
"\n",
"First, let's download the prepared data set (for details on generation & loading cf. https://github.com/tum-pbs/Solver-in-the-Loop), and let's get the data handling out of the way, so that we can focus on the _interesting_ parts..."
]
},
@@ -130,7 +132,7 @@
"with open('data-karman2d-train.pickle', 'rb') as f: dataPreloaded = pickle.load(f)\n",
"print(\"Loaded data, {} training sims\".format(len(dataPreloaded)) )\n"
],
"execution_count": 1,
"execution_count": null,
"outputs": [
{
"output_type": "stream",
@@ -174,7 +176,7 @@
"np.random.seed(42)\n",
"tf.compat.v1.set_random_seed(42)\n"
],
"execution_count": 2,
"execution_count": null,
"outputs": [
{
"output_type": "stream",
@@ -212,6 +214,8 @@
"id": "OhnzPdoww11P"
},
"source": [
"## Simulation Setup\n",
"\n",
"Now we can set up the _source_ simulation $\\newcommand{\\pdec}{\\pde_{s}} \\pdec$. \n",
"Note that we won't deal with \n",
"$\\newcommand{\\pder}{\\pde_{r}} \\pder$\n",
@@ -259,7 +263,7 @@
"\n",
" return super().step(fluid=fluid, dt=dt, obstacles=[self.obst], gravity=gravity, density_effects=[self.infl], velocity_effects=())\n"
],
"execution_count": 3,
"execution_count": null,
"outputs": []
},
{
@@ -268,6 +272,8 @@
"id": "RYFUGICgxk0K"
},
"source": [
"## Network Architecture\n",
"\n",
"We'll also define two alternative neural networks to represent \n",
"$\\newcommand{\\vcN}{\\mathbf{s}} \\newcommand{\\corr}{\\mathcal{C}} \\corr$: \n",
"\n",
@@ -296,7 +302,7 @@
" keras.layers.Conv2D(filters=2, kernel_size=5, padding='same', activation=None), # u, v\n",
" ])\n"
],
"execution_count": 4,
"execution_count": null,
"outputs": []
},
{
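For readers skimming the diff: the cell above closes a Keras `Sequential` definition whose last layer outputs two channels (u, v). A self-contained sketch of a small fully-convolutional correction network in the same spirit could look as follows; the depth, filter counts, and input-channel count here are assumptions, not the notebook's exact values.

```python
import tensorflow as tf
from tensorflow import keras

# Illustrative sketch of a small fully-convolutional correction network
# (hypothetical sizes). Input: velocity components plus extra channels
# (e.g. a Reynolds-number channel), output: a 2-component correction (u, v).
def build_simple_corrector(in_channels=3):
    return keras.Sequential([
        keras.layers.Conv2D(32, kernel_size=5, padding='same', activation='relu',
                            input_shape=(None, None, in_channels)),
        keras.layers.Conv2D(32, kernel_size=5, padding='same', activation='relu'),
        keras.layers.Conv2D(2,  kernel_size=5, padding='same', activation=None),  # u, v
    ])
```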
@@ -352,7 +358,7 @@
" l_output = keras.layers.Conv2D(filters=2, kernel_size=5, padding='same')(block_5)\n",
" return keras.models.Model(inputs=l_input, outputs=l_output)\n"
],
"execution_count": 5,
"execution_count": null,
"outputs": []
},
{
@@ -387,7 +393,7 @@
"def to_staggered(tensor_cen, box):\n",
" return StaggeredGrid(math.pad(tensor_cen, ((0,0), (0,1), (0,1), (0,0))), box=box)\n"
],
"execution_count": 12,
"execution_count": null,
"outputs": []
},
{
@@ -398,6 +404,8 @@
"source": [
"---\n",
"\n",
"## Data Handling\n",
"\n",
"So far so good - we also need to take care of a few more mundane tasks, e.g. some data handling and randomization. Below we define a `Dataset` class that stores all \"ground truth\" reference data (already downsampled).\n",
"\n",
"We actually have a lot of data dimensions: multiple simulations, with many time steps, each with different fields. This makes the code below a bit more difficult to read.\n",
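The `Dataset` class itself is only partially visible in this diff. As a purely illustrative stand-in (all attribute and method names below are assumptions, not the notebook's API), such a wrapper essentially indexes pre-loaded reference data by simulation, time step, and field:

```python
# Hypothetical minimal stand-in for a pre-loaded reference dataset:
# data[sim][frame] is a dict of fields (e.g. 'density', 'velocity'), already downsampled.
class MiniDataset:
    def __init__(self, data, batch_size=2):
        self.data = data              # list of simulations
        self.batch_size = batch_size
        self.num_frames = len(data[0])

    def sample(self, sim_idx, frame, field):
        return self.data[sim_idx][frame][field]
```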
@@ -477,7 +485,7 @@
" def nextStep(self):\n",
" self.stepIdx += 1\n"
],
"execution_count": 7,
"execution_count": null,
"outputs": []
},
{
@@ -528,7 +536,7 @@
" ]\n",
" return [marker_dens, velocity, ext]\n"
],
"execution_count": 8,
"execution_count": null,
"outputs": []
},
{
@@ -560,7 +568,7 @@
"#print(format(getData(dataset,1)))\n",
"#print(format(dataset.getData(1)))\n"
],
"execution_count": 9,
"execution_count": null,
"outputs": [
{
"output_type": "stream",
@@ -624,7 +632,7 @@
"network.summary() \n",
"\n"
],
"execution_count": 10,
"execution_count": null,
"outputs": [
{
"output_type": "stream",
@@ -665,6 +673,8 @@
"id": "AbpNPzplQZMF"
},
"source": [
"## Interleaving Simulation and Network\n",
"\n",
"Now comes the **most crucial** step in the whole setup: we define the chain of simulation steps and network evaluations to be used at training time. After all the work defining helper functions, it's actually pretty simple: we loop over `msteps`, call the simulator via `KarmanFlow.step` for an input state, and afterwards evaluate the correction via `network(to_keras())`. The correction is then added to the last simulation state in the `prediction` list (we're actually simply overwriting the last simulated step `prediction[-1]` with `velocity + correction[-1]`).\n",
"\n",
"One other important thing that's happening here is normalization: the inputs to the network are divided by the standard deviations in `dataset.dataStats`. This is slightly complicated, as we have to append the scaling for the Reynolds numbers to the normalization for the velocity. After evaluating the `network`, we only have a velocity left, so we can simply multiply by the standard deviation again (`* dataset.dataStats['std'][1]`)."
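The unrolled chain described above is only partially visible in this hunk. The following sketch paraphrases it; `step`, `to_keras`, `to_staggered`, and `copied_with` mirror names that do appear in the diff, while the exact tensor layout and the handling of the Reynolds-number channel are simplified assumptions.

```python
def unrolled_prediction(initial_state, simulator, network, to_keras, to_staggered,
                        vel_std, box, msteps, dt=1.0):
    """Sketch of the source-solver + correction chain used at training time.

    Each step: advance the coarse simulator, evaluate the correction network on
    normalized inputs, and overwrite the last predicted state with the corrected one.
    (Normalization of the extra Reynolds-number channel is simplified here.)
    """
    prediction, correction = [initial_state], []
    for _ in range(msteps):
        prediction.append(simulator.step(prediction[-1], dt=dt))  # source solver P_s
        net_in = to_keras(prediction[-1]) / vel_std                # normalize inputs
        corr = network(net_in) * vel_std                           # un-normalize output
        correction.append(to_staggered(corr, box))                 # back to staggered grid
        prediction[-1] = prediction[-1].copied_with(
            velocity=prediction[-1].velocity + correction[-1])
    return prediction, correction
```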
@@ -702,7 +712,7 @@
"\n",
" prediction[-1] = prediction[-1].copied_with(velocity=prediction[-1].velocity + correction[-1])\n"
],
"execution_count": 13,
"execution_count": null,
"outputs": []
},
{
@@ -729,7 +739,7 @@
"]\n",
"loss = tf.reduce_sum(loss_steps)/msteps\n"
],
"execution_count": 14,
"execution_count": null,
"outputs": []
},
{
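The `loss_steps` list closed by the bracket above is not fully shown in this hunk. A plausible construction, purely as a sketch (the exact quantities compared and any per-step weighting in the notebook may differ), is one L2 term per unrolled step between the predicted and the downsampled reference velocity:

```python
import tensorflow as tf

# Sketch: one L2 loss term per unrolled step, averaged over the number of steps.
def unrolled_l2_loss(predicted_velocities, reference_velocities, msteps):
    loss_steps = [
        tf.nn.l2_loss(predicted_velocities[i] - reference_velocities[i])
        for i in range(msteps)
    ]
    return tf.reduce_sum(loss_steps) / msteps
```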
@@ -738,17 +748,15 @@
"id": "E6Vly1_0QhZ1"
},
"source": [
"## Training\n",
"\n",
"For the training, we use a standard Adam optimizer, and only run 4 epochs by default. This could (should) be increased for the larger network or to obtain more accurate results."
]
},
{
"cell_type": "code",
"metadata": {
"id": "PuljFamYQksW",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "e71bcaae-187c-4c10-cee8-f03bb8964af0"
"id": "PuljFamYQksW"
},
"source": [
"lr = 1e-4\n",
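The cell above only shows the beginning of the training setup (`lr = 1e-4`). Since the loop further down calls `sess.run([train_step, loss], my_feed_dict)`, a TF1-style, session-based Adam optimizer presumably sits in between. A hedged sketch of what such a setup could look like (not the notebook's exact code, and only meaningful in graph mode, as the phiflow 1.x session usage implies):

```python
import tensorflow as tf

# Sketch of a TF1-style Adam setup matching the session-based training loop below
# (hypothetical; a placeholder for the learning rate lets it be decayed between
# epochs via the feed dict).
lr_placeholder = tf.compat.v1.placeholder(tf.float32, shape=[], name='lr')
optimizer = tf.compat.v1.train.AdamOptimizer(learning_rate=lr_placeholder)
train_step = optimizer.minimize(loss)  # 'loss' as defined in the previous cell
```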
@@ -771,19 +779,8 @@
" ld_network = keras.models.load_model(output_dir+'/nn_epoch{:04d}.h5'.format(resume))\n",
" network.set_weights(ld_network.get_weights())\n"
],
"execution_count": 15,
"outputs": [
{
"output_type": "stream",
"text": [
"WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/phi/tf/session.py:28: The name tf.global_variables_initializer is deprecated. Please use tf.compat.v1.global_variables_initializer instead.\n",
"\n",
"WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/phi/tf/session.py:29: The name tf.train.Saver is deprecated. Please use tf.compat.v1.train.Saver instead.\n",
"\n"
],
"name": "stdout"
}
]
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
@@ -809,7 +806,7 @@
" elif epoch == 10: lr *= 1e-1\n",
" return lr\n"
],
"execution_count": 16,
"execution_count": null,
"outputs": []
},
{
@@ -830,7 +827,7 @@
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "3bea702a-14d0-43a7-ebc5-25289e27c5a5"
"outputId": "148d951b-7070-4a95-c6d7-0fd91d29606e"
},
"source": [
"current_lr = lr\n",
@@ -855,7 +852,7 @@
" _, l2 = sess.run([train_step, loss], my_feed_dict)\n",
" steps += 1\n",
"\n",
" if (j==0 and i<3) or (ib==0 and i%10==0):\n",
" if (j==0 and i<3) or (j==0 and ib==0 and i%31==0) or (ib==0 and i%124==0):\n",
" print('epoch {:03d}/{:03d}, batch {:03d}/{:03d}, step {:04d}/{:04d}: loss={}'.format( j+1, epochs, ib+1, dataset.numBatches, i+1, dataset.numSteps, l2 ))\n",
" dataset.nextStep()\n",
"\n",
@@ -863,7 +860,7 @@
"\n",
" if j%10==9: network.save(output_dir+'/nn_epoch{:04d}.h5'.format(j+1))\n",
"\n",
"#tf_writer_tr.close()\n",
"# all done! save final version\n",
"network.save(output_dir+'/final.h5')\n"
],
"execution_count": null,
@@ -871,11 +868,39 @@
{
"output_type": "stream",
"text": [
"epoch 001/004, batch 001/002, step 0001/0496: loss=6816.912109375\n",
"epoch 001/004, batch 001/002, step 0002/0496: loss=4036.171875\n",
"epoch 001/004, batch 001/002, step 0003/0496: loss=1627.9716796875\n",
"epoch 001/004, batch 001/002, step 0011/0496: loss=1403.9822998046875\n",
"epoch 001/004, batch 001/002, step 0021/0496: loss=841.949951171875\n"
"epoch 001/004, batch 001/002, step 0001/0496: loss=8114.626953125\n",
"epoch 001/004, batch 001/002, step 0002/0496: loss=3371.28125\n",
"epoch 001/004, batch 001/002, step 0003/0496: loss=1594.294189453125\n",
"epoch 001/004, batch 001/002, step 0032/0496: loss=261.2645263671875\n",
"epoch 001/004, batch 001/002, step 0063/0496: loss=124.70037078857422\n",
"epoch 001/004, batch 001/002, step 0094/0496: loss=86.60037231445312\n",
"epoch 001/004, batch 001/002, step 0125/0496: loss=93.21685028076172\n",
"epoch 001/004, batch 001/002, step 0156/0496: loss=64.77877807617188\n",
"epoch 001/004, batch 001/002, step 0187/0496: loss=58.933082580566406\n",
"epoch 001/004, batch 001/002, step 0218/0496: loss=51.40797805786133\n",
"epoch 001/004, batch 001/002, step 0249/0496: loss=42.819091796875\n",
"epoch 001/004, batch 001/002, step 0280/0496: loss=46.30024719238281\n",
"epoch 001/004, batch 001/002, step 0311/0496: loss=41.07358932495117\n",
"epoch 001/004, batch 001/002, step 0342/0496: loss=40.12362289428711\n",
"epoch 001/004, batch 001/002, step 0373/0496: loss=41.094932556152344\n",
"epoch 001/004, batch 001/002, step 0404/0496: loss=36.17275619506836\n",
"epoch 001/004, batch 001/002, step 0435/0496: loss=37.64105987548828\n",
"epoch 001/004, batch 001/002, step 0466/0496: loss=33.44026184082031\n",
"epoch 001/004, batch 002/002, step 0001/0496: loss=36.6204719543457\n",
"epoch 001/004, batch 002/002, step 0002/0496: loss=29.037982940673828\n",
"epoch 001/004, batch 002/002, step 0003/0496: loss=27.977163314819336\n",
"epoch 002/004, batch 001/002, step 0001/0496: loss=13.540712356567383\n",
"epoch 002/004, batch 001/002, step 0125/0496: loss=12.313040733337402\n",
"epoch 002/004, batch 001/002, step 0249/0496: loss=11.129035949707031\n",
"epoch 002/004, batch 001/002, step 0373/0496: loss=11.969249725341797\n",
"epoch 003/004, batch 001/002, step 0001/0496: loss=8.394614219665527\n",
"epoch 003/004, batch 001/002, step 0125/0496: loss=7.2177557945251465\n",
"epoch 003/004, batch 001/002, step 0249/0496: loss=8.274188041687012\n",
"epoch 003/004, batch 001/002, step 0373/0496: loss=9.177286148071289\n",
"epoch 004/004, batch 001/002, step 0001/0496: loss=6.306344985961914\n",
"epoch 004/004, batch 001/002, step 0125/0496: loss=4.158570289611816\n",
"epoch 004/004, batch 001/002, step 0249/0496: loss=4.282064437866211\n",
"epoch 004/004, batch 001/002, step 0373/0496: loss=5.2111334800720215\n"
],
"name": "stdout"
}
@@ -887,7 +912,7 @@
"id": "swG7GeDpWT_Z"
},
"source": [
"The loss should go down from ca. 1000 initially to around 1. This is a good sign, but of course it's even more important to see how the resulting solver fares on new inputs.\n",
"The loss should go down from above 1000 initially to below 10. This is a good sign, but of course it's even more important to see how the resulting solver fares on new inputs.\n",
"\n",
"Note that after training we've realized a hybrid solver, consisting of a regular _source_ simulator, and a network that was trained to specifically interact with this simulator for a chosen domain of simulation cases.\n",
"\n",
@@ -897,7 +922,7 @@
"\n",
"## Next steps\n",
"\n",
"* Modify the training to further reduce the training error\n",
"* Modify the training to further reduce the training error. With the medium network you should be able to get the loss down to around 1.\n",
"\n",
"* Export the network to the external github code, and run it on new wake flow cases. You'll see that a reduced training error does not always directly correlate with improved test performance\n",
"\n",
BIN resources/diffphys-sol-domain.jpeg (new file)
Binary file not shown. After Width: | Height: | Size: 17 KiB
@@ -2,8 +2,8 @@ Supervised Training
=======================

_Supervised_ here essentially means: "doing things the old fashioned way". Old fashioned in the context of
deep learning (DL), of course, so it's still fairly new. Also, "old fashioned" of course also doesn't always mean bad
- it's just that we'll be able to do better than simple supervised training later on.
deep learning (DL), of course, so it's still fairly new. Also, "old fashioned" of course also doesn't
always mean bad - it's just that we'll be able to do better than simple supervised training later on.

In a way, the viewpoint of "supervised training" is a starting point for all projects one would encounter in the context of DL, and
hence is worth studying. And although it typically yields inferior results to approaches that more tightly