added know your data section, minor cleanup

NT 2021-08-03 21:55:42 +02:00
parent 7910aa23e9
commit 215b5024f6
7 changed files with 46 additions and 7 deletions

@@ -74,11 +74,15 @@ This project would not have been possible without the help of many people who co
- [Nils Thuerey](https://ge.in.tum.de/about/n-thuerey/)
- [Kiwon Um](https://ge.in.tum.de/about/kiwon/)
Additional thanks go to
Georg Kohl for the nice divider images (cf. {cite}`kohl2020lsim`),
Li-Wei Chen for the airfoil data image,
and to
Chloe Paillard for proofreading parts of the document.
% future:
% - [Georg Kohl](https://ge.in.tum.de/about/georg-kohl/)
% proofreading acks:
% - Chloe Pailard
## Citation

@@ -49,6 +49,8 @@ for fnOut in fileList:
re1 = re.compile(r"WARNING:tensorflow:")
re2 = re.compile(r"UserWarning:")
re4 = re.compile(r"DeprecationWarning:")
re5 = re.compile(r"InsecureRequestWarning:") # for https download
# remove all "warnings.warn" from phiflow?
# shorten data line: "0.008612174447657694, 0.02584669669548606, 0.043136357266407785"
re3 = re.compile(r"\[0.008612174447657694, 0.02584669669548606, 0.043136357266407785.+\]" )
@@ -93,6 +95,7 @@ for fnOut in fileList:
nums.append( re1.search( d[t][i]["outputs"][j]["text"][k] ) )
nums.append( re2.search( d[t][i]["outputs"][j]["text"][k] ) )
nums.append( re4.search( d[t][i]["outputs"][j]["text"][k] ) )
nums.append( re5.search( d[t][i]["outputs"][j]["text"][k] ) )
if (nums[0] is None) and (nums[1] is None):
okay = okay+1
else: # delete line "dell"
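The filtering step in this hunk can be sketched as a standalone function. This is a minimal sketch, assuming the standard `.ipynb` JSON layout in which `d[t][i]` corresponds to `nb["cells"][i]`; the helper names `filter_output_lines` and `clean_notebook` are hypothetical and not taken from the script.

```python
import re

# Patterns from the hunk: output lines matching any of them are dropped.
WARNING_PATTERNS = [
    re.compile(r"WARNING:tensorflow:"),
    re.compile(r"UserWarning:"),
    re.compile(r"DeprecationWarning:"),
    re.compile(r"InsecureRequestWarning:"),  # for https download
]


def filter_output_lines(lines):
    """Keep only the lines that match none of the warning patterns."""
    return [
        line for line in lines
        if all(p.search(line) is None for p in WARNING_PATTERNS)
    ]


def clean_notebook(nb):
    """Remove warning lines from all outputs of a notebook dict, in place.

    Hypothetical helper; assumes the layout nb["cells"][i]["outputs"][j]["text"].
    """
    for cell in nb.get("cells", []):
        for out in cell.get("outputs", []):
            text = out.get("text")
            if text is None:
                continue
            if isinstance(text, str):  # the format also allows a single string
                text = text.splitlines(keepends=True)
            out["text"] = filter_output_lines(text)
    return nb


if __name__ == "__main__":
    nb = {"cells": [{"outputs": [{"text": ["step 1\n", "UserWarning: foo\n", "done\n"]}]}]}
    print(clean_notebook(nb)["cells"][0]["outputs"][0]["text"])  # ['step 1\n', 'done\n']
```

Note that the condition in the hunk only tests `nums[0]` and `nums[1]`, whereas this sketch requires all patterns to be absent, so lines containing `DeprecationWarning:` or `InsecureRequestWarning:` would be removed as well.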

@@ -1,4 +1,4 @@
Meshless Methods
Unstructured Meshes and Meshless Methods
=======================
For all computer-based methods we need to find a suitable _discrete_ representation.

@@ -138,6 +138,8 @@ learned time evolution with a numerically solved advection step.
The learned prediction is shown at the top, the reference simulation at the bottom.
```
To summarize, DL allows us to move from linear subspaces to non-linear manifolds, and provides a basis for performing
complex steps (such as time evolutions) in the resulting latent space.
## Source code

@@ -70,7 +70,7 @@ we'll be using later on in the DL examples.
We typically target continuous PDEs denoted by $\mathcal P^*$
whose solution is of interest in a spatial domain $\Omega \subset \mathbb{R}^d$ in $d \in \{1,2,3\}$ dimensions.
In addition, we often consider a time evolution for a finite time interval $t \in \mathbb{R}^{+}$.
The corresponding fields are either d-dimensional vector fields, e.g. $\mathbf{u}: \mathbb{R}^d \times \mathbb{R}^{+} \rightarrow \mathbb{R}^d$,
The corresponding fields are either d-dimensional vector fields, for instance $\mathbf{u}: \mathbb{R}^d \times \mathbb{R}^{+} \rightarrow \mathbb{R}^d$,
or scalar $\mathbf{p}: \mathbb{R}^d \times \mathbb{R}^{+} \rightarrow \mathbb{R}$.
The components of a vector are typically denoted by $x,y,z$ subscripts, i.e.,
$\mathbf{v} = (v_x, v_y, v_z)^T$ for $d=3$, while
@@ -203,8 +203,8 @@ in implementations, effectively computing an instantaneous pressure.
An interesting variant is obtained by including the
[Boussinesq approximation](https://en.wikipedia.org/wiki/Boussinesq_approximation_(buoyancy))
for varying densities, e.g., for simple temperature changes of the fluid.
With a marker field $v$, e.g., indicating regions of high temperature,
this yields the following set of equations:
With a marker field $v$ that indicates regions of high temperature,
it yields the following set of equations:
$$\begin{aligned}
\frac{\partial u_x}{\partial{t}} + \mathbf{u} \cdot \nabla u_x &= - \frac{\Delta t}{\rho} \nabla p

@@ -897,7 +897,7 @@
@article{schulman2015high,
title={High-dimensional continuous control using generalized advantage estimation},
author={Schulman, John and Moritz, Philipp and Levine, Sergey and Jordan, Michael and Abbeel, Pieter},
journal={arXiv preprint arXiv:1506.02438},
journal={arXiv:1506.02438},
year={2015}
}

@@ -50,6 +50,36 @@ as the most central hyperparameter.
You'll probably need to reduce it later on, but you should at least get a
rough estimate of suitable values for $\eta$.
### Know your data
All data-driven methods obey the _garbage-in-garbage-out_ principle. Because of this, it's important
to get to know the data you are dealing with. While there's no one-size-fits-all
approach for how to best achieve this, we strongly recommend tracking
a broad range of statistics of your data set. Good starting points are the
per-quantity mean, standard deviation, minimum, and maximum values.
If some of these contain unusual values, this is a first indicator of bad
samples in the data set.
The values of each quantity can
also be easily visualized as histograms to track down
unwanted outliers. A small number of such outliers
can easily skew a data set in undesirable ways.
Finally, checking the relationships between different quantities
is often a good idea to get some intuition for what's contained in the
data set. The next figure gives an example of this step.
```{figure} resources/supervised-example-plot.jpg
---
height: 300px
name: supervised-example-plot
---
An example from the airfoil case of the previous section: a visualization of a training data
set in terms of the mean u and v velocity of the 2D flow fields. It shows that there are no extreme outliers,
but there are a few entries with relatively low mean u velocity on the left side.
A second, smaller data set is shown on top in red; its samples cover the range of mean motions quite well.
```
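The statistics recommended above can be computed with a few lines of NumPy. This is a minimal sketch, assuming the data set is stored as a single array of shape `(num_samples, num_quantities, height, width)`, e.g. u and v velocity of 2D flow fields; the function names and the 4-standard-deviation threshold for flagging samples are illustrative choices, not part of the original text.

```python
import numpy as np


def quantity_stats(data, names):
    """Mean, standard deviation, min and max for each quantity (channel)."""
    stats = {}
    for c, name in enumerate(names):
        values = data[:, c].ravel()
        stats[name] = {
            "mean": float(values.mean()),
            "std": float(values.std()),
            "min": float(values.min()),
            "max": float(values.max()),
        }
    return stats


def quantity_histogram(data, channel, bins=50):
    """Histogram counts and bin edges of all values of one quantity."""
    return np.histogram(data[:, channel].ravel(), bins=bins)


def per_sample_means(data):
    """Mean of each quantity per sample, shape (num_samples, num_quantities).

    Plotting column 0 against column 1 gives a mean-u vs. mean-v scatter plot.
    """
    return data.reshape(data.shape[0], data.shape[1], -1).mean(axis=2)


def flag_outlier_samples(data, channel, num_std=4.0):
    """Indices of samples whose mean value of one quantity deviates from the
    data set average by more than num_std standard deviations (a heuristic)."""
    means = per_sample_means(data)[:, channel]
    mu, sigma = means.mean(), means.std()
    if sigma == 0.0:
        return np.array([], dtype=int)
    return np.nonzero(np.abs(means - mu) > num_std * sigma)[0]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.normal(size=(100, 2, 8, 8))  # synthetic stand-in: 100 samples of (u, v)
    data[7, 0] += 50.0                       # corrupt the u velocity of one sample
    print(quantity_stats(data, ["u", "v"])["u"])
    print(flag_outlier_samples(data, channel=0))  # [7]
```

A simple standard-deviation threshold on per-sample means is only one possible heuristic; it catches a grossly corrupted sample like the synthetic one above, but more subtle problems still call for inspecting the histograms and scatter plots by eye.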
### Where's the magic? 🦄
A comment that you'll often hear when talking about DL approaches, and especially