added genAI dividers

N_T 2025-03-20 13:56:40 +01:00
parent ac3586cfc1
commit 39fcd963ab
25 changed files with 52 additions and 51 deletions

View File

@@ -9,7 +9,7 @@
"# Denoising and Flow Matching Side-by-side\n",
"\n",
"To show the capabilities of **denoising diffusion** and **flow matching**, we'll be use a learning task where we can reliably generate arbitrary amounts of ground truth data. This ensures we can quantify how well the target distribution was learned. Specifically, we'll focus on Reynolds-averaged Navier-Stokes simulations around airfoils, which have the interesting characteristic that typical solvers (such as OpenFoam) transition from steady solutions to oscillating ones for larger Reynolds numbers. This transition is exactly what we'll give as a task to diffusion models below. (Details can be found in our [diffusion-based flow prediction repository](https://github.com/tum-pbs/Diffusion-based-Flow-Prediction/).) Also, to make the notebook self-contained, we'll revisit the most important concepts from the previous section.\n",
"[[run in colab]](https://colab.research.google.com/github/tum-pbs/pbdl-book/blob/main/probmodels-ddpm-fm.ipynb)\n",
"[[run in colab]](https://colab.research.google.com/github/tum-pbs/pbdl-book/blob/main/probmodels-ddpm-fm.ipynb)\n",
"\n",
"```{note} \n",
"If you're directly continuing reading from the previous chapter, note that there's an important difference: we'll deviate from the _simulation-based inference viewpoint, and for simplicity we'll apply denoising and flow-matching to a **forward** problem. We won't be aiming to recover $x$ for an observation $y$, but rather assume we have initial conditions $x$ from which we want to compute a solution $y$. So don't be surprised by the switched $x$ and $y$ below.\n",
@@ -21,7 +21,10 @@
"\n",
"For the original diffusion models, especially the _denoising_ tasks were extremely successful: a neural network learns to restore a signal from pure noise. Score functions provided an alternate viewpoint, but ultimately also resulted in denoising tasks. Instead, flow-based approaches aim for transforming distributions. The goal is to transform a known one, such as gaussian noise, into one that represents the distribution of the signal or target function we're interested in. Despite these seemingly different viewpoints, all viewpoints above effectively do the same: starting with noise, they step by step turn it into samples for our target signal. Interestingly, the FM-perspective is not only more stable at training time, it also speeds up inference by orders of magnitude thanks to yielding straighter paths. And even better: if you have a working DM setup, it's surprisingly simple to turn it into an FM one.\n",
"\n",
"Below, we'll highlight the simiarities and differences, and evaluate both methods with the RANS-based flow setup outlined above.\n"
"Below, we'll highlight the similarities and differences, and evaluate both methods with the RANS-based flow setup outlined above.\n",
"\n",
"![Divider](resources/divider-gen8.jpg)\n",
"\n"
]
},
{
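The cell above claims that a working denoising-diffusion setup is surprisingly simple to convert into a flow-matching one. As a minimal, hedged sketch (not the notebook's actual code; `net`, `x1`, and `alpha_bar` are placeholder names, and a simple linear interpolation path is assumed for the FM variant), the two training losses differ essentially only in what the network regresses:

```python
import torch

def ddpm_loss(net, x1, alpha_bar):
    # Denoising diffusion: noise the sample, train the network to predict the added noise.
    t = torch.randint(0, len(alpha_bar), (x1.shape[0],))
    eps = torch.randn_like(x1)
    a = alpha_bar[t].view(-1, *([1] * (x1.dim() - 1)))
    x_t = a.sqrt() * x1 + (1.0 - a).sqrt() * eps
    return ((net(x_t, t) - eps) ** 2).mean()

def fm_loss(net, x1):
    # Flow matching: interpolate between noise and data, regress the velocity x1 - x0.
    x0 = torch.randn_like(x1)
    t = torch.rand(x1.shape[0]).view(-1, *([1] * (x1.dim() - 1)))
    x_t = (1.0 - t) * x0 + t * x1
    return ((net(x_t, t.flatten()) - (x1 - x0)) ** 2).mean()
```

Everything around the loss (network, data loading, optimizer) can typically stay unchanged when switching between the two.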
@@ -424,7 +427,8 @@
"source": [
"Before we investigate the capabilities of this model, let's directly train a flow matching version, so that we can compare.\n",
"\n",
"---------------"
"\n",
"![Divider](resources/divider-gen9.jpg)\n"
]
},
{

View File

@@ -37,6 +37,8 @@
"with the inverted weights $\\alpha_t = 1 - \\beta_t$ and alphas accumulated for time $t$ denoted by\n",
"$\\overline{\\alpha}_t= \\prod_{s=1}^t \\alpha_s$.\n",
"\n",
"![Divider](resources/divider-gen6.jpg)\n",
"\n",
"## Latent Variable Models\n",
"\n",
"Conceptually, this formulation gives us what's called a _latent variable model_ in the ML community. Instead of the somewhat arbitrary in between states of the Annealed Langevin Dynamics above, we now have explicitly modeled _latent_ states along the diffusion time $t$. Our distribution for targets $ x_0 \\sim q(x_0) $ is of the form $ p_\\theta(x_0) = \\int p_\\theta(x_{0:T})dx_{1:T} , $ where $x_1,...,x_T$ are latents with the same dimensionality as $x_0$.\n",

View File

@@ -5,11 +5,9 @@ As the previous sections have demonstrated, probabilistic learning offers a wide
At the same time, they enable a fundamentally different way of working with simulations: they make it straightforward to handle complex distributions of solutions. This is of huge importance for inverse problems, e.g. in the context of obtaining likelihood-based estimates for _simulation-based inference_.
That being said, diffusion-based approaches show relatively few advantages for deterministic settings: they are not more accurate, and typically induce slightly larger computational costs. An interesting exception is the long-term stability, as discussed in {doc}`probmodels-uncond`.
![Divider](resources/divider-gen1.jpg)
![Divider](resources/divider1.jpg)
To summarize the key aspects of probabilistic deep learning approaches:
That being said, diffusion-based approaches show relatively few advantages for deterministic settings: they are not more accurate, and typically induce slightly larger computational costs. An interesting exception is the long-term stability, as discussed in {doc}`probmodels-uncond`. To summarize the key aspects of probabilistic deep learning approaches:
✅ Pro:
- Enable training and inference for distributions
@@ -20,9 +18,9 @@ To summarize the key aspects of probabilistic deep learning approaches:
- (Slightly) increased inference cost
- No real advantage for deterministic settings
![Divider](resources/divider7.jpg)
One more concluding recommendation: if your problem contains ambiguities, diffusion modeling in the form of _flow matching_ is the method of choice. If your data contains reliable input-output pairs, go with simpler _deterministic training_ instead.
To summarize: if your problem contains ambiguities, diffusion modeling in the form of _flow matching_ is the method of choice. If your data contains reliable input-output pairs, go with simpler _deterministic training_ instead.
![Divider](resources/divider-gen3.jpg)
Next, we can turn to a new viewpoint on learning problems: the field of _reinforcement learning_. As the next sections will point out, despite the different perspective it is actually not so far from the topics of the previous chapters.

View File

@@ -58,6 +58,8 @@
"This version is tractable and can be used for actual training runs, in contrast to the un-conditional objective from equation {eq}`eq-flow-matching`.\n",
"This means that we can train $v_\\theta(x,t)$ to regress $u_t(x)$ generating the mapping from $p_0$ to the target distribution $p_1$.\n",
"\n",
"![Divider](resources/divider-gen7.jpg)\n",
"\n",
"## Mappings and Conditioning\n",
"\n",
"Especially important: we have a lot of freedom when specifying the mapping from $p_0$ to $p_1$ via the conditioning variable $z$ and the conditional likelihoods $p_t$ in this formulation.\n",

View File

@@ -257,6 +257,9 @@
"id": "cBAyoKyZV7gc"
},
"source": [
"![Divider](resources/divider-gen6.jpg)\n",
"\n",
"\n",
"## Sample-wise Accuracy\n",
"The next cell defines a plotting function that shows the closest ground truth pressure distribution that was found in the reference data set in black next to the neural network outputs, shown in light red.\n",
"\n",
@@ -459,6 +462,9 @@
"id": "3WjCrVN1V7gd"
},
"source": [
"![Divider](resources/divider-gen2.jpg)\n",
"\n",
"\n",
"## Evaluating Distributional Accuracy\n",
"\n",
"To evaluate a large number of samples, and compute their node wise and graph-based Wasserstein distances. These quantified metrics are a good start, but it's still interesting to visualize the distributions to provide more intuition for how well or badly certain methods do. For this, we'll plot stacks of Gaussian kernel density estimates that show the distribution of pressure values along the length of the ellipses.\n",
@@ -512,7 +518,7 @@
},
{
"cell_type": "code",
"execution_count": 12,
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
@@ -679,14 +685,11 @@
"# LDGN inference\n",
"steps = dgn.nn.diffusion.DiffusionStepsGenerator('linear', DGN.diffusion_process.num_steps)(NUM_DENOISING_STEPS)\n",
"pred = LDGN.sample_n(NUM_SAMPLES, graph, steps=steps, batch_size=BATCH_SIZE).cpu().squeeze(-1)\n",
"mean = pred.mean(dim=1)\n",
"std = pred.std (dim=1)\n",
"# Compute the accuracy of the mean and std\n",
"mean = pred.mean(dim=1); std = pred.std (dim=1)\n",
"mean_r2 = dgn.metrics.r2_accuracy(mean, gt_mean)\n",
"std_r2 = dgn.metrics.r2_accuracy(std , gt_std )\n",
"print('LDGN')\n",
"print(f\"R2 of mean: {mean_r2:.4f}\", f\"R2 of std: {std_r2:.4f}\")\n",
"# Compute the Wasserstein-2 distance\n",
"print('LDGN'); print(f\"R2 of mean: {mean_r2:.4f}\", f\"R2 of std: {std_r2:.4f}\")\n",
"\n",
"w2_distance_1d = dgn.metrics.w2_distance_1d(pred, graph.target)\n",
"w2_distance_nd = dgn.metrics.w2_distance_nd(pred, graph.target)\n",
"print(f\"Wasserstein-2 distance 1d: {w2_distance_1d:.4f}\")\n",
@@ -700,14 +703,10 @@
"# FMGN inference\n",
"steps = np.linspace(0, 1, NUM_FM_STEPS )\n",
"pred = FMGN.sample_n(NUM_SAMPLES, graph, steps=steps, batch_size=BATCH_SIZE).cpu().squeeze(-1)\n",
"mean = pred.mean(dim=1)\n",
"std = pred.std (dim=1)\n",
"# Compute the accuracy of the mean and std\n",
"mean = pred.mean(dim=1); std = pred.std (dim=1)\n",
"mean_r2 = dgn.metrics.r2_accuracy(mean, gt_mean)\n",
"std_r2 = dgn.metrics.r2_accuracy(std , gt_std )\n",
"print('Flow-Matching DGN')\n",
"print(f\"R2 of mean: {mean_r2:.4f}\", f\"R2 of std: {std_r2:.4f}\")\n",
"# Compute the Wasserstein-2 distance\n",
"print('Flow-Matching DGN'); print(f\"R2 of mean: {mean_r2:.4f}\", f\"R2 of std: {std_r2:.4f}\")\n",
"w2_distance_1d = dgn.metrics.w2_distance_1d(pred, graph.target)\n",
"w2_distance_nd = dgn.metrics.w2_distance_nd(pred, graph.target)\n",
"print(f\"Wasserstein-2 distance 1d: {w2_distance_1d:.4f}\")\n",
@@ -718,13 +717,10 @@
"# LFMGN inference\n",
"steps = np.linspace(0, 1, NUM_FM_STEPS )\n",
"pred = LFMGN.sample_n(NUM_SAMPLES, graph, steps=steps, batch_size=BATCH_SIZE).cpu().squeeze(-1)\n",
"mean = pred.mean(dim=1)\n",
"std = pred.std (dim=1)\n",
"# Compute the accuracy of the mean and std\n",
"mean = pred.mean(dim=1); std = pred.std (dim=1)\n",
"mean_r2 = dgn.metrics.r2_accuracy(mean, gt_mean)\n",
"std_r2 = dgn.metrics.r2_accuracy(std , gt_std )\n",
"print('Latent Flow-Matching DGN')\n",
"print(f\"R2 of mean: {mean_r2:.4f}\", f\"R2 of std: {std_r2:.4f}\")\n",
"print('Latent Flow-Matching DGN'); print(f\"R2 of mean: {mean_r2:.4f}\", f\"R2 of std: {std_r2:.4f}\")\n",
"w2_distance_1d = dgn.metrics.w2_distance_1d(pred, graph.target)\n",
"w2_distance_nd = dgn.metrics.w2_distance_nd(pred, graph.target)\n",
"print(f\"Wasserstein-2 distance 1d: {w2_distance_1d:.4f}\")\n",
@@ -739,14 +735,10 @@
" BayesianGN.sample(graph).cpu()\n",
" )\n",
"pred = torch.concatenate(pred_list, dim=1)\n",
"mean = pred.mean(dim=1)\n",
"std = pred.std (dim=1)\n",
"# Compute the accuracy of the mean and std\n",
"mean = pred.mean(dim=1); std = pred.std (dim=1)\n",
"mean_r2 = dgn.metrics.r2_accuracy(mean, gt_mean)\n",
"std_r2 = dgn.metrics.r2_accuracy(std , gt_std )\n",
"print('Bayesian Graph Net')\n",
"print(f\"R2 of mean: {mean_r2:.4f}\", f\"R2 of std: {std_r2:.4f}\")\n",
"# Compute the Wasserstein-2 distance\n",
"print('Bayesian Graph Net'); print(f\"R2 of mean: {mean_r2:.4f}\", f\"R2 of std: {std_r2:.4f}\")\n",
"w2_distance_1d = dgn.metrics.w2_distance_1d(pred, graph.target)\n",
"w2_distance_nd = dgn.metrics.w2_distance_nd(pred, graph.target)\n",
"print(f\"Wasserstein-2 distance 1d: {w2_distance_1d:.4f}\")\n",
@@ -756,13 +748,10 @@
"\n",
"# Gaussian Mixture Graph Net inference\n",
"pred = GaussianMixGN.sample_n(NUM_SAMPLES, graph, batch_size=BATCH_SIZE).cpu().squeeze(-1)\n",
"mean = pred.mean(dim=1)\n",
"std = pred.std (dim=1)\n",
"# Compute the accuracy of the mean and std\n",
"mean = pred.mean(dim=1); std = pred.std (dim=1)\n",
"mean_r2 = dgn.metrics.r2_accuracy(mean, gt_mean)\n",
"std_r2 = dgn.metrics.r2_accuracy(std , gt_std )\n",
"print('Gaussian Mixture Graph Net')\n",
"print(f\"R2 of mean: {mean_r2:.4f}\", f\"R2 of std: {std_r2:.4f}\")\n",
"print('Gaussian Mixture Graph Net'); print(f\"R2 of mean: {mean_r2:.4f}\", f\"R2 of std: {std_r2:.4f}\")\n",
"# Compute the Wasserstein-2 distance\n",
"w2_distance_1d = dgn.metrics.w2_distance_1d(pred, graph.target)\n",
"w2_distance_nd = dgn.metrics.w2_distance_nd(pred, graph.target)\n",
@@ -773,20 +762,16 @@
"\n",
"# VGAE inference\n",
"pred = VGAE.sample_n(NUM_SAMPLES, graph, batch_size=BATCH_SIZE).cpu().squeeze(-1)\n",
"mean = pred.mean(dim=1)\n",
"std = pred.std (dim=1)\n",
"# Compute the accuracy of the mean and std\n",
"mean = pred.mean(dim=1); std = pred.std (dim=1)\n",
"mean_r2 = dgn.metrics.r2_accuracy(mean, gt_mean)\n",
"std_r2 = dgn.metrics.r2_accuracy(std , gt_std )\n",
"print('VGAE')\n",
"print(f\"R2 of mean: {mean_r2:.4f}\", f\"R2 of std: {std_r2:.4f}\")\n",
"# Compute the Wasserstein-2 distance\n",
"print('VGAE'); print(f\"R2 of mean: {mean_r2:.4f}\", f\"R2 of std: {std_r2:.4f}\")\n",
"w2_distance_1d = dgn.metrics.w2_distance_1d(pred, graph.target)\n",
"w2_distance_nd = dgn.metrics.w2_distance_nd(pred, graph.target)\n",
"print(f\"Wasserstein-2 distance 1d: {w2_distance_1d:.4f}\")\n",
"print(f\"Wasserstein-2 distance nd: {w2_distance_nd:.4f}\")\n",
"pdf(ax[curr_ax], graph.pos, pred, 'VGAE', w2_distance_1d=w2_distance_1d, w2_distance_nd=w2_distance_nd, vmin=vmin, vmax=vmax)\n",
"curr_ax += 1\n"
"curr_ax += 1"
]
},
{

View File

@@ -154,7 +154,7 @@ During inference, the condition encoder generates the conditioning features ${V}
Unlike in conventional VGAEs, the condition encoder is necessary because, at inference time, an encoding of ${V}_c$ and ${E}_c$ is needed on graph ${\mathcal{G}}^L$, where the LDGN operates. This encoding cannot be directly generated by the encoder, as it also requires ${Z}(t)$ as input, which is unavailable during inference. An alternative approach would be to define the conditions directly in the coarse representation of the system provided by ${\mathcal{G}}^L$, but this representation lacks fine-grained details, leading to sub-optimal results.
![Divider](resources/divider7.jpg)
![Divider](resources/divider-gen4.jpg)
## Turbulent Flows around Wings in 3D

View File

@@ -6,7 +6,7 @@ Samples $y \sim p(y)$ drawn from the distribution should follow this probability
To summarize, instead of individual solutions $y$ we're facing a large number of samples $y \sim p(y)$.
![Divider](resources/divider5.jpg)
![Divider](resources/divider-gen-full.jpg)
## Uncertainty
@@ -65,6 +65,8 @@ Unfortunately, this is often intractable, as $z$ can be difficult to sample, and
Some algorithms have been proposed to compute likelihoods; a popular one is Approximate Bayesian Computation (ABC). However, all of these approaches are highly expensive and require a lot of expert knowledge to set up. They suffer from the _curse of dimensionality_, i.e. they become very expensive when facing larger numbers of degrees of freedom. Thus,
obtaining good approximations of the likelihood is a topic that we'll revisit below.
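As a concrete, if simplistic, illustration of ABC, here is a hedged sketch of rejection ABC. The names `sample_prior`, `simulate`, `distance` and the tolerance `eps` are placeholders for problem-specific choices, and this naive variant is exactly what becomes infeasible for larger numbers of degrees of freedom:

```python
import numpy as np

def rejection_abc(observed, sample_prior, simulate, distance, eps, num_draws=10000):
    # Keep parameter draws whose simulated output lies within a tolerance eps of the
    # observation; the accepted draws approximate the posterior without ever
    # evaluating the likelihood explicitly.
    accepted = []
    for _ in range(num_draws):
        theta = sample_prior()
        if distance(simulate(theta), observed) < eps:
            accepted.append(theta)
    return np.array(accepted)
```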
![Divider](resources/divider-gen4.jpg)
With a function for the likelihood we can compute the
**distribution of the posterior**, the main quantity we're after,
in the following way:
@@ -104,7 +106,7 @@ We'll focus on the basics, and leave the _physics-based extensions_ (i.e. includ
<br>
![Divider](resources/divider6.jpg)
![Divider](resources/divider-gen6.jpg)
```{note} Historic Alternative: Bayesian Neural Networks

View File

@@ -41,6 +41,7 @@
"$\\mathbb{E}_{x \\sim p(x)}[- \\log q_\\theta(x)]$.\n",
"This means we can train $q_\\theta(x)$ simply by sampling from $p$, and minimizing the negative log-likelihood for $q_\\theta(x)$.\n",
"\n",
"![Divider](resources/divider-gen4.jpg)\n",
"\n",
"## From Unconditional to Conditional\n",
"\n",

View File

@@ -31,7 +31,7 @@ We'll focus on flow matching as a state-of-the-art approach next, and afterwards
![Divider](resources/divider5.jpg)
![Divider](resources/divider-genA.jpg)
@@ -133,6 +133,7 @@
The gradient backpropagation is stopped at the output of the simulator $\mathcal{P}$, as shown in {numref}`figure {number} <probphys02-control>`.
Before showing some examples of the capabilities of these two types of control, we'll discuss some of their properties.
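In PyTorch, this kind of gradient stopping is typically realized with `detach()`. The snippet below only illustrates that generic mechanism under assumed placeholder names (`network`, `simulator`); it does not reproduce how the physics-based setup above actually assembles its loss around the stopped gradients:

```python
import torch

def forward_with_stopped_gradients(network, simulator, x):
    # Detaching the simulator output removes it from the autograd graph, so gradients
    # of any quantity computed from y are not propagated back through the simulator
    # into the network: backpropagation stops at the simulator's output.
    u = network(x)              # upstream, differentiable part
    y = simulator(u).detach()   # treated as a constant by autograd from here on
    return u, y
```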
![Divider](resources/divider-genB.jpg)
### Additional Considerations
@@ -195,7 +196,7 @@ A summary of the physics-based flow matching is given by the following bullet po
![Divider](resources/divider6.jpg)
![Divider](resources/divider-gen1.jpg)

View File

@@ -223,6 +223,8 @@
"\n",
"We will first explore how to learn the score from samples of the target distribution using a neural network. Then we will introduce a first method how to use the score to sample from the target distribution.\n",
"\n",
"![Divider](resources/divider-gen5.jpg)\n",
"\n",
"## Learning the Score"
]
},

View File

@@ -20,8 +20,10 @@
"This motivates - as in the previous sections - to view the steps of a time series as a probabilistic distribution over time rather than a deterministic series of states.\n",
"A probabilistic simulator can learn to take into account the influence of the un-observed state, and infer solutions from variations of this un-observed part of the system. Worst case, if this un-observed state has a negligible influence, we should see a mean state with an variance that's effectively zero. So there's nothing to loose! \n",
"\n",
"![Divider](resources/divider-genC.jpg)\n",
"\n",
"The following notebook \n",
"[[run in colab]](https://colab.research.google.com/github/tum-pbs/pbdl-book/blob/main/probmodels-time.ipynb)\n",
"[[run in colab]](https://colab.research.google.com/github/tum-pbs/pbdl-book/blob/main/probmodels-time.ipynb)\n",
" introduces an effective, distribution-based approach for temporal predictions:\n",
"* conditional diffusion models are used to compute autoregressive rollouts to obtain a \"probabilistic simulator\"; \n",
"* it is of course highly interesting to compare this diffusion-based predictor to the deterministic baselines and neural operators from the previous chapters;\n",

View File

@@ -12,6 +12,8 @@ As errors will accumulate over time, we can expect that network size and the tot
Note that we'll focus on time steps with a **fixed length** in the following. The term "unconditional stability" refers to being stable over an arbitrary number of iterative steps. The following networks could potentially be trained for variable time step sizes as well, but we will focus on the "dimension" of stability over multiple, iterative network calls below.
![Divider](resources/divider-gen2.jpg)
## Main Considerations for an Evaluation
As shown in the previous chapter, diffusion models perform extremely well. This can be attributed to the underlying task of working with pure noise as input (e.g., for denoising or flow matching tasks). Likewise, the network architecture has only a minor influence: the network simply needs to be large enough to provide a converging iteration. For supervised or unrolled training, we can leverage a variety of discrete and continuous neural operators. CNNs, Unets, FNOs and Transformers are popular approaches here.
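Since the stability question concerns repeated application of a one-step predictor with a fixed step size, the evaluation setting can be summarized in a few lines. This is a hedged sketch with a placeholder `sample_next` (any probabilistic or deterministic one-step model), not code from the book:

```python
import torch

@torch.no_grad()
def rollout(sample_next, x0, num_steps):
    # Autoregressive rollout: the one-step model is repeatedly applied to its own
    # output, so errors can accumulate; unconditional stability means the iteration
    # stays well-behaved for an arbitrary number of such steps.
    states = [x0]
    for _ in range(num_steps):
        states.append(sample_next(states[-1]))
    return torch.stack(states)
```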

Binary file not shown (new image file, 226 KiB).

BIN resources/divider-gen1.jpg (new file, image not shown, 93 KiB)
BIN resources/divider-gen2.jpg (new file, image not shown, 90 KiB)
BIN resources/divider-gen3.jpg (new file, image not shown, 54 KiB)
BIN resources/divider-gen4.jpg (new file, image not shown, 86 KiB)
BIN resources/divider-gen5.jpg (new file, image not shown, 99 KiB)
BIN resources/divider-gen6.jpg (new file, image not shown, 80 KiB)
BIN resources/divider-gen7.jpg (new file, image not shown, 128 KiB)
BIN resources/divider-gen8.jpg (new file, image not shown, 143 KiB)
BIN resources/divider-gen9.jpg (new file, image not shown, 116 KiB)
BIN resources/divider-genA.jpg (new file, image not shown, 138 KiB)
BIN resources/divider-genB.jpg (new file, image not shown, 133 KiB)
BIN resources/divider-genC.jpg (new file, image not shown, 83 KiB)