updated diffusion time prediction and stability sections

commit c469ed4e14 (parent 47a51ba60c)
_config.yml

@@ -34,3 +34,12 @@ html:
  use_issues_button: true
  use_repository_button: true
  favicon: "favicon.ico"

# for $$ equations in text
parse:
  myst_dmath_double_inline: true

sphinx:
  extra_extensions:
  - sphinx_proof

_toc.yml (4 changed lines)
@@ -37,8 +37,8 @@ parts:
  chapters:
  - file: probmodels-intro.md
  - file: probmodels-ddpm-fm.ipynb
  - file: bayesian-intro.md
  - file: bayesian-code.ipynb
  - file: probmodels-time.ipynb
  - file: probmodels-uncond.md
- caption: Reinforcement Learning
  chapters:
  - file: reinflearn-intro.md

probmodels-ddpm-fm.ipynb

@@ -6,13 +6,13 @@
"id": "q4kVgOM2pyzJ"
},
"source": [
"# From DDPM to Flow Matching for Airfoil RANS Flows\n",
"# From DDPM to Flow Matching\n",
"\n",
"Ever wondered how to turn your existing _denoising diffusion code_ into a _flow matching_ approach? 🤔 Or what all the fuss regarding diffusion models was about in the first place? 🧐 That's exactly what this notebook is focusing on 😎\n",
"\n",
"We'll be using a learning task where we can reliably generate arbitrary amounts of ground truth data, to make sure we can quantify how well the target distribution was learned. Specifically, we'll focus on Reynolds-averaged Navier-Stokes simulations around airfoils, which have the interesting characteristic that typical solvers (such as OpenFoam) transition from steady solutions to oscillating ones for larger Reynolds numbers. This transition is exactly what we'll give as a task to diffusion models below. (Details can be found in our [diffusion-based flow prediction repository](https://github.com/tum-pbs/Diffusion-based-Flow-Prediction/).)\n",
"\n",
"# Intro\n",
"## Intro\n",
"\n",
"Diffusion models have been rising stars in the deep learning field in the past years, and have made it possible to train powerful generative models with surprisingly simple and robust training setups. Within this sub-field of deep learning, a very promising new development is flow-based approaches, typically going under names such as _flow matching_ {cite}`lipman2022flow` and _rectified flows_ {cite}`liu2022rect`. We'll stick to the former here for simplicity, and denote this class of models with _FM_.\n",
"\n",
@@ -27,7 +27,7 @@
"id": "32daES8LGs6v"
},
"source": [
"# Problem statement\n",
"## Problem statement\n",
"\n",
"Instead of the previous supervised learning tasks, we'll need to consider distributions. For \"classic\" supervised tasks, we have unique input-output pairs $(x,y)$ and train a model to provide $y$ given $x$ based on internal parameters $\\theta$, i.e. $y=f(x;\\theta)$.\n",
"\n",
@@ -43,7 +43,7 @@
"id": "AVmsZmUsGwkR"
},
"source": [
"# Implementation and Setup\n",
"## Implementation and Setup\n",
"\n",
"First, we need to install the required packages and clone the repository:\n"
]
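The installation cell itself is not part of this excerpt. Purely as an illustration, a setup cell could look roughly like the following; the `requirements.txt` file and the exact dependencies are assumptions, not taken from the repository:

```python
# Hypothetical setup cell -- the actual notebook cell is not shown in this diff.
# Clone the repository referenced above and install its Python dependencies.
!git clone https://github.com/tum-pbs/Diffusion-based-Flow-Prediction.git
%cd Diffusion-based-Flow-Prediction
!pip install -r requirements.txt  # assumes the repo ships a requirements file
```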
@@ -177,11 +177,14 @@
" _noise schedule_ $\\beta^t \\in (0,1)$, that gradually increases from $0$ to $\\beta^t=1$ at the end of the chain, where $t=1$.\n",
"\n",
"By choosing a Gaussian distribution, we can decouple the steps to give a Markov chain for the distribution $q$ of the form \n",
"\n",
"$$\n",
" q\\left(y^{0:T}\\right) =\n",
" q(y^0) \\prod_{t=1}^{T} q\\left(y^{t} \\mid y^{t-1}\\right) ,\n",
"$$\n",
"\n",
"where\n",
"\n",
"$$\n",
" q\\left(y^{t} \\mid y^{t-1}\\right) = \\mathcal{N}\n",
" \\left( \\sqrt{1-\\beta^{t}} y^{t-1}, \\beta^{t} \\mathbf{I}\\right) .\n",
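As a concrete illustration of a single forward step, here is a minimal PyTorch sketch; the number of steps and the schedule values are placeholders (the text only specifies that $\beta^t$ grows towards $1$), so the notebook's actual helper will differ:

```python
import torch

T = 500                                   # assumed number of diffusion steps
betas = torch.linspace(1e-4, 0.999, T)    # assumed schedule, growing towards beta^t = 1

def forward_step(y_prev, t):
    """Sample y^t ~ q(y^t | y^{t-1}) = N(sqrt(1 - beta^t) y^{t-1}, beta^t I)."""
    beta_t = betas[t]
    noise = torch.randn_like(y_prev)
    return torch.sqrt(1.0 - beta_t) * y_prev + torch.sqrt(beta_t) * noise
```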
@@ -190,6 +193,7 @@
"It's fairly obvious that we can destroy any input signal $y$ by accumulating more and more noise. What's more interesting is\n",
"the reverse process that removes the noise, i.e. the denoising. We can likewise formulate a\n",
"reverse Markov chain for the distribution $p_\\theta$. The subscript already indicates that we'll learn the transition and parameterize it by a set of parameters $\\theta$:\n",
"\n",
"$$\n",
" p_\\theta\\left(y^{0:T}\\right)\n",
" =\n",
@@ -197,7 +201,9 @@
" \\prod_{t=1}^{T}\n",
" p_\\theta \\left(y^{t-1} \\mid y^{t}\\right)\n",
"$$\n",
"\n",
"with\n",
"\n",
"$$\n",
" y^{t} = \\sqrt{\\bar{\\alpha}^t} y^{0}\n",
" +\\sqrt{\\left(1-\\bar{\\alpha}^t\\right)}\\epsilon .\n",
@@ -207,11 +213,14 @@
"$\\alpha^t = 1 - \\beta^t$ and $\\bar{\\alpha}^t = \\prod_{i=1}^t \\alpha^i$.\n",
"\n",
"Each step $p_\\theta \\left(y^{t-1} \\mid y^{t}\\right)$ along the way now has the specific form\n",
"\n",
"$$\n",
" p_\\theta \\left(y^{t-1} \\mid y^{t}\\right) =\n",
" \\mathcal{N} \\left( \\mu(f_{\\theta}) , \\sigma_{\\theta} \\right)\n",
"$$ (eq-denoising-step) \n",
"\n",
"where we're employing a neural network $f_\\theta$ to predict the noise. We could also call the network $\\epsilon_\\theta$ here, but for consistency we'll stick to $f_\\theta$. The noise inferred by our network parametrizes the mean\n",
"\n",
"$$\n",
" \\mu(\\epsilon) =\n",
" \\frac{1}{\\sqrt{\\alpha^t}}\n",
@@ -219,7 +228,9 @@
" y^t - \\frac{\\beta^t}{\\sqrt{1 - \\bar{\\alpha}^t}} \\epsilon\n",
" \\Big) .\n",
"$$\n",
"\n",
"The standard deviation interestingly does not depend on the noise (and our network), but evolves over time with\n",
"\n",
"$$\n",
" \\sigma=\n",
" \\frac{1-\\bar{\\alpha}^{t-1}}\n",
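Putting the closed-form noising and the denoising step together, a short illustrative PyTorch sketch could look as follows. Function and variable names are assumptions, and the variance uses the standard DDPM posterior choice $\sigma^2 = \frac{1-\bar{\alpha}^{t-1}}{1-\bar{\alpha}^{t}}\beta^t$, which matches the expression that is cut off at the hunk boundary above; the notebook's own helper class will differ in its details:

```python
import torch

T = 500
betas = torch.linspace(1e-4, 0.999, T)      # assumed schedule, as in the sketch above
alphas = 1.0 - betas                         # alpha^t = 1 - beta^t
alpha_bars = torch.cumprod(alphas, dim=0)    # bar(alpha)^t = prod_i alpha^i

def q_sample(y0, t, eps):
    """Closed-form noising: y^t = sqrt(abar^t) y^0 + sqrt(1 - abar^t) eps."""
    ab = alpha_bars[t]
    return torch.sqrt(ab) * y0 + torch.sqrt(1.0 - ab) * eps

def p_sample(model, y_t, t):
    """One reverse step y^{t-1} ~ p_theta(y^{t-1} | y^t); model(y_t, t) predicts the noise."""
    eps_pred = model(y_t, t)
    mean = (y_t - betas[t] / torch.sqrt(1.0 - alpha_bars[t]) * eps_pred) / torch.sqrt(alphas[t])
    if t == 0:
        return mean                          # no extra noise at the last denoising step
    var = (1.0 - alpha_bars[t - 1]) / (1.0 - alpha_bars[t]) * betas[t]
    return mean + torch.sqrt(var) * torch.randn_like(y_t)
```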
@@ -419,7 +430,7 @@
"id": "uR27s9bZc728"
},
"source": [
"# Flow Matching\n",
"## Flow Matching\n",
"\n",
"Instead of adding and removing noise,\n",
"flow matching transforms probability distributions.\n",
@@ -428,14 +439,18 @@
"samples $x_t$ from a distribution $p(x_t)$.\n",
"In short: $x_t=\\phi(x_0)$.\n",
"\n",
"Later on, $x_0$ will represent samples from a simple distribution, such as a Gaussian distribution, similar to what we used for the diffusion models previously. On the other hand, $x_1$ corresponds to samples from the target distribution, i.e., samples from our training dataset ($y$ in the notation above). For the transformation of distributions, it's convenient to consider continuously changing distributions $p(x_t)$ for varying $t$. Just keep in mind that for $t=1$, we're at $x_1$ which is identical to $y$. I.e. $p(x_t)|_{t=1} = p(x_1) = p(y)$.\n",
"Later on, $x_0$ will represent samples from a simple distribution, such as a Gaussian distribution, similar to what we used for the diffusion models previously. $x_1$, on the other hand, corresponds to samples from the target distribution, i.e., samples from our training dataset ($y$ in the notation above). \n",
"As we're going from Gaussian noise towards a target, the progression is similar to what we saw for denoising: from very noisy to no noise, despite the original flow matching formulation not necessarily aiming for this behavior.\n",
"For the transformation of distributions, it's convenient to consider continuously changing distributions $p(x_t)$ for varying $t$. Just keep in mind that for $t=1$, we're at $x_1$ which is identical to $y$. I.e. $p(x_t)|_{t=1} = p(x_1) = p(y)$.\n",
"\n",
"Flow matching learns the time derivative of this transformation, the _flow_, as a time-dependent vector field $u:[0,1]\\times \\mathbb{R}^d \\rightarrow \\mathbb{R}^d$, where\n",
"$u_t(x_t)=\\frac{d}{dt}x_t$. For a neural network $f_{\\theta}(x,t)$ the\n",
"loss function is simply an $L^2$ between predicted and target velocities:\n",
"\n",
"$$\n",
" \\mathcal{L}_{\\text{FM} }(\\theta) = \\mathbb{E}_{t \\sim [0,1], ~ x_t \\sim p(x_t)}\\| f_{\\theta}(x_t,t)-u_t(x_t) \\|^2 ,\n",
"$$\n",
"\n",
"where $p(x_t)$ denotes the intermediate distribution at time $t$ with $t \\sim [0,1]$.\n",
"\n",
"Looks surprisingly simple so far, and a lot like the loss for our noise estimation problem above. However, without additional tricks this loss function is intractable since we don't know the distributions $p(x_t)$ and the correct velocities $u_t$.\n",
@@ -471,10 +486,13 @@
"Starting with a normalized Gaussian distribution at $t=0$, we then want the standard deviations $\\sigma_t$ to linearly decrease with $1-t$, so that we're left with no randomness at $t=1$. At the same time, the mean $\\mu_t$ should change from zero to $x_1$, i.e. $\\mu_t(x_1)=t ~ x_1$.\n",
"\n",
"This gives the mapping:\n",
"\n",
"$$\n",
" \\phi_t(x_0) = \\sigma_t(x_1)x_0 + \\mu_t(x_1),\n",
"$$\n",
"\n",
"with its time derivative being the velocity:\n",
"\n",
"$$\n",
" u_t(x_t \\vert x_1)=\\frac{d}{dt}\\phi_t(x_0)=\\sigma_t'(x_1)x_0 + \\mu_t'(x_1)\n",
" ~.\n",
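In code, this conditional path and its velocity are one-liners. A small illustrative sketch (PyTorch assumed, function names are not the notebook's) for the purely linear case described above, i.e. $\sigma_t = 1-t$ and $\mu_t = t\, x_1$:

```python
import torch

def phi_t(x0, x1, t):
    """phi_t(x_0) = sigma_t x_0 + mu_t = (1 - t) x_0 + t x_1."""
    return (1.0 - t) * x0 + t * x1

def u_t(x0, x1):
    """Time derivative of phi_t along the path: constant, u = x_1 - x_0."""
    return x1 - x0
```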
@@ -496,11 +514,14 @@
"\n",
"Now we have all necessary ingredients to compute the target velocities $u_t$\n",
"as\n",
"\n",
"$$\n",
" u_t(x_t \\vert x_1) = x_1 - (1-\\sigma_\\text{min}) x_0\n",
" ~,\n",
"$$\n",
"\n",
"and we can formulate the conditional version of the loss function above:\n",
"\n",
"$$\n",
" \\mathcal{L}_{\\text{CFM}}(\\theta) = \\mathbb{E}_{t\\sim [0,1], ~ x_1\\sim p(x_1), ~ x_t \\sim p(x_t\\vert x_1)} \\big\\| f_{\\theta}(x_t,t, x_1) - u_t(x_t\\vert x_1) \\big\\|^2\n",
"$$\n",
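As a sanity check of how the pieces fit together, here is a compact, illustrative PyTorch sketch of the conditional loss for one batch. It assumes the path $\sigma_t = 1-(1-\sigma_\text{min})t$, $\mu_t = t\,x_1$, a placeholder value for $\sigma_\text{min}$, and a network `f_theta(x_t, t)`; any additional conditioning inputs (as written in the loss above) are omitted here:

```python
import torch

sigma_min = 1e-4   # placeholder value; the actual choice may differ

def cfm_loss(f_theta, x1):
    """Conditional flow matching loss for a batch of target samples x1."""
    b = x1.shape[0]
    t = torch.rand(b, *([1] * (x1.dim() - 1)))          # t ~ U[0,1], broadcastable over x1
    x0 = torch.randn_like(x1)                            # samples from the Gaussian prior
    x_t = (1.0 - (1.0 - sigma_min) * t) * x0 + t * x1    # x_t ~ p(x_t | x_1)
    target = x1 - (1.0 - sigma_min) * x0                 # u_t(x_t | x_1), constant in t
    return ((f_theta(x_t, t) - target) ** 2).mean()
```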
@@ -517,7 +538,7 @@
"id": "qnIpelg_3PR7"
},
"source": [
"# Implementing Flow Matching\n",
"## Implementing Flow Matching\n",
"\n",
"For the implementation, we'll again split the core functionality and the training code. The former is handled by the helper class `MyFlowMatcher`. It's even simpler than the previous one for denoising: `phi_t()` computes the linear forward step by interpolating two samples `x_0` and `x_1`. `u_t()` instead computes the time derivative, as explained above. Because we're aiming for a straight motion, it's constant in time, and `u_t` does not depend on `t` anymore.\n",
"\n",
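The sampling side is not visible in this excerpt. Once $f_\theta$ approximates the velocity field, generating a sample amounts to integrating $\frac{dx}{dt} = f_\theta(x,t)$ from $t=0$ to $t=1$; a hedged sketch with plain Euler steps (the notebook may well use a different integrator, step count, or tensor layout):

```python
import torch

@torch.no_grad()
def fm_sample(f_theta, shape, steps=20):
    """Integrate the learned velocity field from Gaussian noise (t=0) to a sample (t=1)."""
    x = torch.randn(shape)                            # x_0 drawn from the prior
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((shape[0], 1, 1, 1), i * dt)   # assumes 4D, image-like tensors
        x = x + dt * f_theta(x, t)                    # explicit Euler step
    return x
```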
@@ -675,7 +696,7 @@
"id": "ZCGAEnlFpyzP"
},
"source": [
"# Test Evaluation\n",
"## Test Evaluation\n",
"\n",
"To evaluate the trained models on inputs that weren't used for training, we first need to download some more data. This is what happens in the next cell. \n",
"The `scale_factor=0.25` parameter of the `read_single_file()` function below makes sure that we get fields of size $32 \\times 32$ like the ones in the training data set. However, the test set has previously unseen Reynolds numbers, so that we can check how well the model generalizes. While loading the data, the code also computes statistics for the ground truth mean and standard deviations (`mean_field_test_gd` and `std_field_test_gd`). This data will be used to quantify differences between the trained models later on.\n",
@@ -963,7 +984,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Quantified Results\n",
"## Quantified Results\n",
"\n",
"So far, we've focused on a single test case, and this could have been a \"lucky\" one for FM. Hence, below we'll repeat the evaluation for different cases across different Reynolds numbers to obtain quantified results. In total, the test set has six different Reynolds numbers, the middle four being interpolations of the training parameters, the first and last being extrapolations.\n",
"\n",
@@ -1118,8 +1139,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"------------------------\n",
"[-> Back to PBDL main page](https://www.physicsbaseddeeplearning.org/)"
"\n"
]
}
],

probmodels-uncond.md

@@ -1,4 +1,4 @@
How to Train Unconditionally Stable Autoregressive Neural Operators
Unconditional Stability
=======================

The results of the previous section, for time predictions with diffusion models, and earlier ones ({doc}`diffphys-discuss`)
@@ -25,7 +25,7 @@ For each sequences in both data sets, three training runs of each architecture a
As a first comparison, we'll train three networks with an identical U-Net architecture that use different stabilization techniques. This comparison shows that it is possible to successfully achieve the task "unconditional stability" in different ways (a short illustrative sketch of the first two techniques follows below):
- Unrolled training (_U-Net-ut_) where gradients are backpropagated through multiple time steps during training.
- Networks trained on a single prediction step with added training noise (_U-Net-tn_). This technique is known to improve stability by reducing data shift, as the added noise emulates errors that accumulate during inference.
- Autoregressive conditional diffusion models (ACDM). A denoising diffusion model is conditioned on the previous time step and iteratively refines noise to create a prediction for the next step, as shown in {doc}`probmodels-time.ipynb`.
- Autoregressive conditional diffusion models (ACDM). A denoising diffusion model is conditioned on the previous time step and iteratively refines noise to create a prediction for the next step, as shown in {doc}`probmodels-time`.

NT_DEBUG, todo, more ACDM discussion below!
images from : 2024-08-05-long-rollout-www/Long Rollouts/imgs/
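To make the first two stabilization techniques concrete, here is a minimal, illustrative PyTorch sketch of an unrolled training step with optional training noise. The network `net`, the loss, the rollout length, and the noise amplitude are assumptions for illustration and are not taken from the benchmark code:

```python
import torch

def unrolled_training_step(net, optimizer, x0, targets, noise_std=0.0):
    """One training step that unrolls an autoregressive predictor over a short rollout.

    x0:        initial state, e.g. shape (batch, channels, h, w)
    targets:   ground-truth trajectory, shape (rollout, batch, channels, h, w)
    noise_std: > 0 additionally perturbs the network inputs (U-Net-tn style);
               with a rollout length of 1 this reduces to pure training-noise stabilization.
    """
    state, loss = x0, 0.0
    for target in targets:                                   # gradients flow through all unrolled steps
        inp = state + noise_std * torch.randn_like(state)    # optional training noise
        state = net(inp)                                      # autoregressive prediction
        loss = loss + torch.mean((state - target) ** 2)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss)
```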
@@ -59,7 +59,7 @@ Figure 2 lists the percentage of stable runs for a range of ablation networks on

```{figure} resources/probmodels-uncond03-ma.png
---
height: 240px
height: 210px
name: probmodels-uncond03-ma
---
Percentage of stable runs on the Tra-ext data set for different ablations of unrolled training.
@@ -80,7 +80,7 @@ it can substantially impact the stability of autoregressive networks. This is si

```{figure} resources/probmodels-uncond04a.png
---
height: 240px
height: 210px
name: probmodels-uncond04a
---
Percentage of stable runs and training time for different combinations of rollout length and batch size for the Tra-ext data set. Grey configurations are omitted due to memory limitations (mem) or due to high computational demands (-).
@@ -88,7 +88,7 @@ Percentage of stable runs and training time for different combinations of rollou

```{figure} resources/probmodels-uncond04b.png
---
height: 240px
height: 210px
name: probmodels-uncond04b
---
Percentage of stable runs and training time for rollout length and batch size for the Inc-high data set. Grey again indicates configurations omitted due to memory limitations (mem) or high computational demands (-).
@@ -98,7 +98,7 @@ This shows that increasing the batch size is more expensive in terms of training

```{figure} resources/probmodels-uncond05.png
---
height: 240px
height: 180px
name: probmodels-uncond05
---
Training time for different combinations of rollout length and batch size on the Tra-ext data set (left) and the Inc-high data set (right). Only configurations that lead to highly stable networks (stable run percentage >= 89%) are shown.
references (BibTeX)

@@ -54,7 +54,7 @@
@article{holzschuh2023smdp,
  title={Solving Inverse Physics Problems with Score Matching},
  author={Benjamin Holzschuh and Simona Vegetti and Thuerey, Nils},
  author={Benjamin Holzschuh and Simona Vegetti and Nils Thuerey},
  journal={Advances in Neural Information Processing Systems (NeurIPS)},
  volume={36},
  year={2023}
@@ -62,7 +62,7 @@
@inproceedings{franz2023nglobt,
  title={Learning to Estimate Single-View Volumetric Flow Motions without 3D Supervision},
  author={Erik Franz, Barbara Solenthaler, and Thuerey, Nils},
  author={Erik Franz and Barbara Solenthaler and Nils Thuerey},
  booktitle={ICLR},
  year={2023},
  url={https://github.com/tum-pbs/Neural-Global-Transport},
@@ -78,7 +78,7 @@
@inproceedings{list2022piso,
  title={Learned Turbulence Modelling with Differentiable Fluid Solvers},
  author={Bjoern List and Liwei Chen and Thuerey, Nils},
  author={Bjoern List and Liwei Chen and Nils Thuerey},
  booktitle={arXiv:2202.06988},
  year={2022},
  url={https://ge.in.tum.de/publications/},
@@ -120,8 +120,8 @@
}

@article{chu2021physgan,
  author = {Chu, Mengyu and Thuerey, Nils and Seidel, Hans-Peter and Theobalt, Christian and Zayer, Rhaleb},
  title ={{Learning Meaningful Controls for Fluids}},
  author = {Chu, Mengyu and Thuerey, Nils and Seidel, Hans-Peter and Theobalt, Christian and Zayer, Rhaleb},
  journal = ACM_TOG,
  volume = {40(4)},
  year = {2021},
@@ -1081,4 +1081,11 @@
}

@article{holzschuh2024fm,
  title={Solving Inverse Physics Problems with Score Matching},
  author={Benjamin Holzschuh and Nils Thuerey},
  journal={Advances in Neural Information Processing Systems (NeurIPS)},
  volume={36},
  year={2023}
}