is trained to maximize the loss by producing an output of 1 for the real samples, and 0 for the generated ones.
The key idea for the generator loss is to employ the discriminator: the generator is trained to produce
samples that the discriminator classifies as real:
$$
\text{arg min}_{\theta_g}
- \frac{1}{2}\mathbb{E} \log D(G(\mathbf{z}))
$$
Typically, this training is alternated, performing one step for $D$ and then one for $G$.
During the $G$ step, the $D$ network is kept constant, and provides a gradient to "steer" $G$ in the right direction
to produce samples that are indistinguishable from the real ones. As $D$ is likewise an NN, it is
differentiable by construction, and can provide the necessary gradients.
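To make the alternating updates concrete, below is a minimal PyTorch-style sketch of one training iteration. It assumes networks `G` and `D` (with `D` ending in a sigmoid and outputting one probability per sample), their optimizers, and a batch of real samples; all names and the latent size are illustrative placeholders rather than a prescribed setup.

```python
import torch
import torch.nn.functional as F

def gan_training_step(G, D, opt_g, opt_d, real_batch, latent_dim=64):
    """One alternating update: first for D, then for G (illustrative sketch).

    Assumes D outputs probabilities of shape (batch, 1) and G maps latent
    vectors of size latent_dim to samples with the same shape as real_batch.
    """
    n = real_batch.shape[0]
    ones, zeros = torch.ones(n, 1), torch.zeros(n, 1)

    # --- discriminator step: push D(real) -> 1 and D(G(z)) -> 0 ---
    z = torch.randn(n, latent_dim)
    fake = G(z).detach()                       # G is kept fixed in this step
    loss_d = F.binary_cross_entropy(D(real_batch), ones) + \
             F.binary_cross_entropy(D(fake), zeros)
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # --- generator step: D stays constant, but provides the gradient for G ---
    z = torch.randn(n, latent_dim)
    loss_g = F.binary_cross_entropy(D(G(z)), ones)   # non-saturating form of -log D(G(z))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

    return loss_d.item(), loss_g.item()
```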
## Regularization
Due to the coupled, alternating training, GAN training has a reputation of being finicky in practice.
Instead of a single, non-linear optimization problem, we now have two coupled ones, for which we need
to find a fragile balance. (Otherwise we'll get the dreaded _mode-collapse_ problem: once one of the two networks "collapses" to a trivial solution, the coupled training breaks down.)
To alleviate this problem, regularization is often crucial for achieving stable training. In the simplest case,
we can add an $L^1$ regularizer for the generator $G$ w.r.t. reference data, with a small coefficient. Along those lines, pre-training the generator in a supervised fashion can help to start out with a stable state for $G$. (However, $D$ then usually also needs a certain amount of pre-training to keep the balance.)
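As an illustration, the following sketch combines the adversarial generator loss with such an $L^1$ term. The weight `lambda_l1`, and the assumption that paired reference outputs `y_ref` are available for the generated samples `y_gen` (as in the conditional setups discussed below), are illustrative choices rather than part of the text above.

```python
import torch
import torch.nn.functional as F

def regularized_generator_loss(D, y_gen, y_ref, lambda_l1=0.01):
    """Adversarial generator loss plus a small L1 penalty towards reference data.

    Assumes paired reference outputs y_ref exist for the generated samples y_gen;
    lambda_l1 is a small, problem-dependent coefficient.
    """
    pred = D(y_gen)
    adv = F.binary_cross_entropy(pred, torch.ones_like(pred))  # fool the discriminator
    reg = F.l1_loss(y_gen, y_ref)                              # stay close to the reference
    return adv + lambda_l1 * reg
```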
## Conditional GANs
For physical problems, regular GANs that generate solutions from the randomized latent space
$\mathbf{z}$ above are not overly useful. Rather, we often have inputs such as parameters, boundary conditions, or partial solutions which should be used to infer an output. Such cases represent _conditional_ GANs,
which means that instead of $G(\mathbf{z})$, we now have $G(\mathbf{x})$, where $\mathbf{x}$ denotes the input data.
A good scenario for conditional GANs is super-resolution: here the network has the task of computing a high-resolution output given a sparse or low-resolution input solution.
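The sketch below illustrates one common way to set this up for super-resolution, assuming 2D fields: the generator receives the low-resolution field `x` instead of a latent vector, and the discriminator judges (input, output) pairs so that generated outputs also have to be consistent with their conditioning. Layer sizes and names are placeholders, not a prescribed architecture.

```python
import torch
import torch.nn as nn

class CondGenerator(nn.Module):
    """Maps a low-resolution input field to a high-resolution output."""
    def __init__(self, channels=1, scale=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Upsample(scale_factor=scale, mode="bilinear", align_corners=False),
            nn.Conv2d(channels, 32, 5, padding=2), nn.ReLU(),
            nn.Conv2d(32, channels, 5, padding=2),
        )
    def forward(self, x):                     # x: low-res field, not a latent vector
        return self.net(x)

class CondDiscriminator(nn.Module):
    """Classifies (low-res input, high-res candidate) pairs as real or generated."""
    def __init__(self, channels=1, scale=4):
        super().__init__()
        self.upsample = nn.Upsample(scale_factor=scale, mode="bilinear", align_corners=False)
        self.net = nn.Sequential(
            nn.Conv2d(2 * channels, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 1, 5, stride=2, padding=2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Sigmoid(),
        )
    def forward(self, x_low, y_high):
        # concatenate the conditioning input with the candidate output
        return self.net(torch.cat([self.upsample(x_low), y_high], dim=1))
```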
---
## Ambiguous solutions
One of the main advantages of GANs is that they can prevent an undesirable
averaging for ambiguous data. E.g., consider the case of super-resolution: a
low-resolution observation that serves as input typically has an infinite number
of possible high-resolution solutions that would fit the low-res input.
If a data set contains multiple such cases, and we employ supervised training,
the network will reliably learn the mean. This averaged solution is usually clearly undesirable, and unlike any of the individual solutions from which it was computed. A GAN discriminator, in contrast, can learn to reject such averaged outputs, so the generator is pushed towards producing one of the plausible individual solutions. The figure below illustrates this behavior for the time derivatives of a super-resolution task:
Figure: From left to right, time derivatives for a spatial GAN (i.e. not time aware), temporally supervised learning, a spatio-temporal GAN, and a reference solution.
As can be seen, the GAN trained with spatio-temporal self-supervision (second from right) closely matches the reference solution on the far right. In this case the discriminator receives reference solutions over time (in the form of triplets), such that it can learn to judge whether the temporal evolution of a generated solution matches that of the reference.
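One way to realize this in code, sketched below for 2D fields, is a discriminator that receives the three consecutive frames of a triplet stacked along the channel dimension, so that it judges the temporal evolution rather than individual frames. This is only an illustration of the triplet idea, not the architecture of the original work.

```python
import torch
import torch.nn as nn

class TemporalDiscriminator(nn.Module):
    """Judges triplets of consecutive frames (real vs. generated evolution)."""
    def __init__(self, channels=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 * channels, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, 1), nn.Sigmoid(),
        )

    def forward(self, frame_prev, frame_curr, frame_next):
        # stack the triplet along the channel axis and classify its evolution
        triplet = torch.cat([frame_prev, frame_curr, frame_next], dim=1)
        return self.net(triplet)
```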
## Physical generative models
As a last example, GANs have also been shown to
accurately capture solution manifolds of PDEs parametrized by physical parameters {cite}`chu2021physgan`.
In this work, Navier-Stokes solutions parametrized by varying buoyancies, vorticity content, boundary conditions,
and obstacle geometries were learned by an NN.
This is a highly challenging solution manifold, and requires an extended "cyclic" GAN approach
that pushes the discriminator to take all of the physical parameters into account.
Interestingly, the generator learns to produce realistic and accurate solutions despite
being trained purely on data, i.e. without explicit help in the form of a differentiable physics solver setup.