At this point it's a good time to take another step back and assess the different methods introduced so far. For deep learning applications, we can broadly distinguish three approaches: the _regular_ differentiable physics (DP) training, the training with half-inverse gradients (HIGs), and the training with scale-invariant physics updates (SIPs). Unfortunately, we can't simply discard two of them and focus on a single approach for all future endeavours. However, discussing their pros and cons sheds light on some fundamental aspects of physics-based deep learning.
First and foremost, a central motivation for improved updates is the need to address the scaling issues of the learning problems. This is not a completely new problem: numerous deep learning algorithms have been proposed to address such scaling issues when training NNs. However, the combination of NNs with physical simulations brings new challenges that, at the same time, provide new angles to tackle this problem. On the negative side, we have additional, highly non-linear operators from the PDE models. On the positive side, these operators typically do not have free parameters during learning, and thus can be treated with different, tailored methods.
This is exactly where HIGs and SIPs come in: instead of treating the physical simulation like the rest of the NN (this is the DP approach), they show how much can be achieved with custom inverse solvers (SIPs) or a custom numerical inversion (HIGs). Both methods make important steps towards _scale-invariant_ training.
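To make the difference tangible, here is a minimal sketch that contrasts the two update directions for a scalar toy problem. The quadratic physics function `P`, its analytic inverse `P_inverse`, and all numerical values are purely illustrative assumptions, not the solvers used in the examples of this book.

```python
import numpy as np

# Hypothetical toy physics: P squares its input, so its sensitivity
# P'(x) = 2x varies strongly with x -- the kind of scaling problem
# discussed above.
def P(x):         return x**2        # forward simulation
def P_inverse(y): return np.sqrt(y)  # analytic inverse solver (SIP ingredient)

x        = np.float64(0.1)  # current NN output (small value -> tiny local gradient)
y_target = np.float64(4.0)  # desired observation

# DP-style update: chain rule through the simulator.
y      = P(x)
dL_dy  = y - y_target       # gradient of L = 1/2 (P(x)-y*)^2 w.r.t. y
dP_dx  = 2.0 * x            # local Jacobian of the physics
dp_update = -dL_dy * dP_dx  # scales with P'(x), i.e., with the chosen units

# SIP-style update: let the inverse solver do the work. Take a step in
# output space and map it back; the resulting correction in x does not
# depend on the local scaling of P.
y_step     = y - (y - y_target)     # full step towards the target output
sip_update = P_inverse(y_step) - x  # = 1.9 here, pointing straight at the solution

print("DP update:", dp_update, "  SIP update:", sip_update)
```

Roughly speaking, in SIP training the correction obtained from the inverse simulator defines a proxy target for the network output, towards which the network is then trained with a regular optimizer.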
Both cases usually lead to more complicated and resource-intensive training. However, assuming that we can re-use a trained model many times after training has been completed, there are many areas of application where this can quickly pay off: the trained NNs, despite having the same runtime cost as those obtained from other training methods, often achieve significantly improved accuracy. Achieving similar levels of accuracy with regular Adam and DP-based training can be completely infeasible.
When such a trained NN is used, e.g., as a surrogate model for an inverse problem, it might be executed a large number of times, and the improved accuracy can save correspondingly large amounts of computational resources in such a follow-up stage.
A good potential example is the shape optimization for drag reduction of bodies immersed in a fluid {cite}`chen2021numerical`.
Even when only the physics simulation component is inverted (as with SIPs), the learning process can improve substantially. The custom inverse solvers allow us to employ higher-order information in the training.
The HIGs, on the other hand, go back to first-order information in the form of Jacobians. They show how useful the inversion can be even without any higher-order terms. At the same time, they make use of a combined inversion of NN and physics, taking all samples of a mini-batch into account to compute an optimal first-order direction.
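Schematically, a single HIG-style update could look like the following sketch. The random Jacobian `J`, the stacked output-space gradients `g`, the dimensions, the truncation threshold `eps`, and the learning rate are placeholder assumptions; the sketch only illustrates the structure of the half-inversion, i.e., singular values raised to the power -1/2, rather than a complete training loop.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, out_dim, n_params = 4, 3, 8

# J stacks the Jacobians d(output)/d(parameters) of network+physics for all
# samples of the mini-batch; g stacks the per-sample loss gradients dL/d(output).
J = rng.normal(size=(batch * out_dim, n_params))
g = rng.normal(size=(batch * out_dim,))

def half_inverse(matrix, eps=1e-6, exponent=-0.5):
    """Invert 'halfway' via SVD: raise singular values to -1/2 instead of -1
    (full inverse) or +1 (plain gradient); truncate tiny singular values."""
    U, s, Vt = np.linalg.svd(matrix, full_matrices=False)
    s_inv = np.zeros_like(s)
    mask = s > eps
    s_inv[mask] = s[mask] ** exponent
    return Vt.T @ np.diag(s_inv) @ U.T

eta = 0.1
delta_theta = -eta * half_inverse(J) @ g  # one HIG-style parameter update
print(delta_theta.shape)                  # (n_params,) -> applied to the NN weights
```

Because the whole mini-batch enters a single inversion, the update balances the contributions of all samples instead of simply averaging their gradients.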
In both cases, the resulting neural networks can yield a performance that we simply can't obtain by, e.g., training longer with a simpler DP or supervised approach. So, if we plan to evaluate these models often, e.g., shipping them in an application, this increased one-time cost will pay off in the long run.