|
|
|
|
@@ -6,13 +6,13 @@
|
|
|
|
|
"id": "q4kVgOM2pyzJ"
|
|
|
|
|
},
|
|
|
|
|
"source": [
|
|
|
|
|
"# From DDPM to Flow Matching for Airfoil RANS Flows\n",
|
|
|
|
|
"# From DDPM to Flow Matching\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"Ever wondered how to turn your existing _denoising diffusion code_ into a _flow matching_ approach? 🤔 Or what all the fuss regarding diffusion models was about in the first place? 🧐 That's exactly what this notebook is focusing on 😎\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"We'll be using a learning task where we can reliably generate arbitrary amounts of ground truth data, to make sure we can quantify how well the target distribution was learned. Specifically, we'll focus on Reynolds-averaged Navier-Stokes simulations around airfoils, which have the interesting characteristic that typical solvers (such as OpenFoam) transition from steady solutions to oscillating ones for larger Reynolds numbers. This transition is exactly what we'll give as a task to diffusion models below. (Details can be found in our [diffuion-based flow prediction repository](https://github.com/tum-pbs/Diffusion-based-Flow-Prediction/).)\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"# Intro\n",
|
|
|
|
|
"## Intro\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"Diffusion models have been rising stars in the deep learning field in the past years, and have made it possible to train powerful generative models with surprisingly simple and robust training setups. Within this sub-field of deep learning, a very promising new development are flow-based approaches, typically going under names such as _flow matching_ {cite}`lipman2022flow` and _rectified flows_ {cite}`liu2022rect` . We'll stick to the former here for simplicity, and denote this class of models with _FM_.\n",
|
|
|
|
|
"\n",
|
|
|
|
|
@@ -27,7 +27,7 @@
|
|
|
|
|
"id": "32daES8LGs6v"
|
|
|
|
|
},
|
|
|
|
|
"source": [
|
|
|
|
|
"# Problem statement\n",
|
|
|
|
|
"## Problem statement\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"Instead of the previous supervised learning tasks, we'll need to consider distributions. For \"classic\" supervised tasks, we have unique input-output pairs $(x,y)$ and train a model to provide $y$ given $x$ based on internal parameters $\\theta$, i.e. $y=f(x;\\theta)$.\n",
|
|
|
|
|
"\n",
|
|
|
|
|
@@ -43,7 +43,7 @@
|
|
|
|
|
"id": "AVmsZmUsGwkR"
|
|
|
|
|
},
|
|
|
|
|
"source": [
|
|
|
|
|
"# Implementation and Setup\n",
|
|
|
|
|
"## Implementation and Setup\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"First, we need to install the required packages and clone the repository:\n"
|
|
|
|
|
]
|
|
|
|
|
@@ -177,11 +177,14 @@
|
|
|
|
|
" _noise schedule_ $\\beta^t \\in (0,1)$, that gradually increases from $0$ to $\\beta^t=1$ at the end of the chain, where $t=1$.\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"By choosing a Gaussian distribution, we can decouple the steps to give a Markov chain for the distribution $q$ of the form \n",
|
|
|
|
|
"\n",
|
|
|
|
|
"$$\n",
|
|
|
|
|
" q\\left(y^{0:T}\\right) =\n",
|
|
|
|
|
" q(y^0) \\prod_{t=1}^{T} q\\left(y^{t} \\mid y^{t-1}\\right) ,\n",
|
|
|
|
|
"$$\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"where\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"$$\n",
|
|
|
|
|
" q\\left(y^{t} \\mid y^{t-1}\\right) = \\mathcal{N}\n",
|
|
|
|
|
" \\left( \\sqrt{1-\\beta^{t}} y^{t-1}, \\beta^{t} \\mathbf{I}\\right) .\n",
|
|
|
|
|
@@ -190,6 +193,7 @@
|
|
|
|
|
"It's fairly obvious that we can destroy any input signal $y$ by accumulating more and more noise. What's more interesting is\n",
|
|
|
|
|
"the reverse process that removes the noise, i.e. the denoising. We can likewise formulate a\n",
|
|
|
|
|
"reverse Markov chain for the distribution $p_\\theta$. The subscript already indicates that we'll learn the transition and parameterize it by a set or parameters $\\theta$:\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"$$\n",
|
|
|
|
|
" p_\\theta\\left(y^{0:T}\\right)\n",
|
|
|
|
|
" =\n",
|
|
|
|
|
@@ -197,7 +201,9 @@
|
|
|
|
|
" \\prod_{t=1}^{T}\n",
|
|
|
|
|
" p_\\theta \\left(y^{t-1} \\mid y^{t}\\right)\n",
|
|
|
|
|
"$$\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"with\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"$$\n",
|
|
|
|
|
" y^{t} = \\sqrt{\\bar{\\alpha}^t} y^{0}\n",
|
|
|
|
|
" +\\sqrt{\\left(1-\\bar{\\alpha}^t\\right)}\\epsilon .\n",
|
|
|
|
|
@@ -207,11 +213,14 @@
|
|
|
|
|
"$\\alpha^t = 1 - \\beta^t$ and $\\bar{\\alpha}^t = \\prod_{i=1}^t \\alpha^i$.\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"Each step $p_\\theta \\left(y^{t-1} \\mid y^{t}\\right)$ along the way now has the specific form\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"$$\n",
|
|
|
|
|
" p_\\theta \\left(y^{t-1} \\mid y^{t}\\right) =\n",
|
|
|
|
|
" \\mathcal{N} \\left( \\mu(f_{\\theta}) , \\sigma_{\\theta} \\right)\n",
|
|
|
|
|
"$$ (eq-denoising-step) \n",
|
|
|
|
|
"\n",
|
|
|
|
|
"where we're employing a neural network $f_\\theta$ to predict the noise. We could also call the network $\\epsilon_\\theta$ here, but for consistency we'll stick to $f_\\theta$. The noise inferred by our network parametrizes the mean\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"$$\n",
|
|
|
|
|
" \\mu(\\epsilon) =\n",
|
|
|
|
|
" \\frac{1}{\\sqrt{\\alpha^t}}\n",
|
|
|
|
|
@@ -219,7 +228,9 @@
|
|
|
|
|
" y^t - \\frac{\\beta^t}{\\sqrt{1 - \\bar{\\alpha}^t}} \\epsilon\n",
|
|
|
|
|
" \\Big) .\n",
|
|
|
|
|
"$$\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"The standard deviation interestingly does not depend on the noise (and our network), but evolves over time with\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"$$\n",
|
|
|
|
|
" \\sigma=\n",
|
|
|
|
|
" \\frac{1-\\bar{\\alpha}^{t-1}}\n",
|
|
|
|
|
@@ -419,7 +430,7 @@
|
|
|
|
|
"id": "uR27s9bZc728"
|
|
|
|
|
},
|
|
|
|
|
"source": [
|
|
|
|
|
"# Flow Matching\n",
|
|
|
|
|
"## Flow Matching\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"Instead of adding and removing noise,\n",
|
|
|
|
|
"flow matching transforms probability distributions.\n",
|
|
|
|
|
@@ -428,14 +439,18 @@
|
|
|
|
|
"samples $x_t$ from a distribution $p(x_t)$.\n",
|
|
|
|
|
"In short: $x_t=\\phi(x_0)$.\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"Later on, $x_0$ will represent samples from a simple distribution, such as a Gaussian distribution, similar to what we used for the diffusion models previously. On the other hand, $x_1$ corresponds to samples from the target distribution, i.e., samples from our training dataset ($y$ in the notation above). For the transformation of distributions, it's convenient to consider continuously changing distributions $p(x_t)$ for varying $t$. Just keep in mind that for $t=1$, we're at $x_1$ which is identical to $y$. I.e. $p(x_t)|_{t=1} = p(x_1) = p(y)$.\n",
|
|
|
|
|
"Later on, $x_0$ will represent samples from a simple distribution, such as a Gaussian distribution, similar to what we used for the diffusion models previously. $x_1$, on the other hand, corresponds to samples from the target distribution, i.e., samples from our training dataset ($y$ in the notation above). \n",
|
|
|
|
|
"As we're going from Gaussian noise towards a target, the progression is similar to what we saw for denoising: from very noisy to no noise, despite the original flow matching formulation not necessarily aiming for this behavior.\n",
|
|
|
|
|
"For the transformation of distributions, it's convenient to consider continuously changing distributions $p(x_t)$ for varying $t$. Just keep in mind that for $t=1$, we're at $x_1$ which is identical to $y$. I.e. $p(x_t)|_{t=1} = p(x_1) = p(y)$.\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"Flow matching learns the time derivative of this transformation, the _flow_, as a time-dependent vector field $u:[0,1]\\times \\mathbb{R}^d \\rightarrow \\mathbb{R}^d$, where\n",
|
|
|
|
|
"$u_t(x_t)=\\frac{d}{dt}x_t$. For a neural network $f_{\\theta}(x,t)$ the\n",
|
|
|
|
|
"loss function is simply an $L^2$ between predicted and target velocities:\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"$$\n",
|
|
|
|
|
" \\mathcal{L}_{\\text{FM} }(\\theta) = \\mathbb{E}_{t \\sim [0,1], ~ x_t \\sim p(x_t)}\\| f_{\\theta}(x_t,t)-u_t(x_t) \\|^2 ,\n",
|
|
|
|
|
"$$\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"where $p_t$ denotes the intermediate distributions at time $t$ with $t \\sim [0,1]$.\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"Looks surprisingly simple so far, and a lot like the loss for our noise estimation problem above. However, tithout additional tricks this loss function is intractable since we don't know the distributions $p(x_t)$ and the correct velocities $u_t$.\n",
|
|
|
|
|
@@ -471,10 +486,13 @@
|
|
|
|
|
"Starting with a normalized Gaussian distribution at $t=0$, we then want the standard deviations $\\sigma_t$ to linearly decrease with $1-t$, so that we're left with no randomness at $t=1$. At the same time, the mean $\\mu_t$ should change from zero to $x_1$, i.e. $\\mu_t(x_1)=t ~ x_1$.\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"This gives the mapping:\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"$$\n",
|
|
|
|
|
" \\phi_t(x_0) = \\sigma_t(x_t)x_0 + \\mu_t(x_t),\n",
|
|
|
|
|
"$$\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"with it's time derivative being the velocity:\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"$$\n",
|
|
|
|
|
" u_t(x_t \\vert x_1)=\\frac{d}{dt}\\phi_t(x_0)=\\sigma_t'(x_1)x_0 + \\mu_t'(x_1)\n",
|
|
|
|
|
" ~.\n",
|
|
|
|
|
@@ -496,11 +514,14 @@
|
|
|
|
|
"\n",
|
|
|
|
|
"Now we have all necessary ingredients to compute the target velocities $u_t$\n",
|
|
|
|
|
"as\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"$$\n",
|
|
|
|
|
" u_t(x_t \\vert x_1) = x_1 - (1-\\sigma_\\text{min}) x_0\n",
|
|
|
|
|
" ~,\n",
|
|
|
|
|
"$$\n",
|
|
|
|
|
"\n",
|
|
|
|
|
" and we can formulate the conditional version of the loss function above:\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"$$\n",
|
|
|
|
|
" \\mathcal{L}_{\\text{CFM}}(\\theta) = \\mathbb{E}_{t\\sim [0,1], ~ x_1\\sim p(x_1), ~ x_t \\sim p(x_t\\vert x_1)} \\big\\| f_{\\theta}(x_t,t, x_1) - u_t(x_t\\vert x_1) \\big\\|^2\n",
|
|
|
|
|
"$$\n",
|
|
|
|
|
@@ -517,7 +538,7 @@
|
|
|
|
|
"id": "qnIpelg_3PR7"
|
|
|
|
|
},
|
|
|
|
|
"source": [
|
|
|
|
|
"# Implementing Flow Matching\n",
|
|
|
|
|
"## Implementing Flow Matching\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"For the implementation, we'll again split the core functionality and the training code. The former is handled by the helper class `MyFlowMatcher`. It's even simpler than the previous one for denoising: `phi_t()` computes the linear forward step by interpolating two samples `x_0` and `x_1`. `u_t()` instead computes the time derivative, as explained above. Because we're aiming for a straight motion, it's constant in time, and `u_t` does not depend on `t` anymore.\n",
|
|
|
|
|
"\n",
|
|
|
|
|
@@ -675,7 +696,7 @@
|
|
|
|
|
"id": "ZCGAEnlFpyzP"
|
|
|
|
|
},
|
|
|
|
|
"source": [
|
|
|
|
|
"# Test Evaluation\n",
|
|
|
|
|
"## Test Evaluation\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"To evaluate the trained models on inputs that weren't used for training we first need to download some more data. This is what happens in the next cell. \n",
|
|
|
|
|
"The `scale_factor=0.25` parameters of the `read_single_file()` function below make sure that we get fields of size $32 \\times 32$ like the ones in the training data set. However, the test set has previously unseen Reynolds numbers, so that we can check how well the model generalizes. While loading the data, the code also computes statistics for the ground truth mean and standard deviations (`mean_field_test_gd` and `std_field_test_gd`). This data will be used to quantify differences between the trained models later on.\n"
|
|
|
|
|
@@ -963,7 +984,7 @@
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
"metadata": {},
|
|
|
|
|
"source": [
|
|
|
|
|
"# Quantified Results\n",
|
|
|
|
|
"## Quantified Results\n",
|
|
|
|
|
"\n",
|
|
|
|
|
"So far, we've focused on a single test case, and this could have been a \"lucky\" one for FM. Hence, below we'll repeat the evaluation for different cases across different Reynolds numbers to obtain quantified results. In total, the test set has six different Reynolds numbers, the middle four being interpolations of the training parameters, the first and last being extrapolations.\n",
|
|
|
|
|
"\n",
|
|
|
|
|
@@ -1118,8 +1139,7 @@
|
|
|
|
|
"cell_type": "markdown",
|
|
|
|
|
"metadata": {},
|
|
|
|
|
"source": [
|
|
|
|
|
"------------------------\n",
|
|
|
|
|
"[-> Back to PBDL main page](https://www.physicsbaseddeeplearning.org/)"
|
|
|
|
|
"\n"
|
|
|
|
|
]
|
|
|
|
|
}
|
|
|
|
|
],
|
|
|
|
|
|