added know your data section, minor cleanup

2021-08-03 21:55:42 +02:00
commit 215b5024f6
parent 7910aa23e9
7 changed files with 46 additions and 7 deletions
--- a/intro.md
+++ b/intro.md
@@ -74,11 +74,15 @@ This project would not have been possible without the help of many people who co
 - [Nils Thuerey](https://ge.in.tum.de/about/n-thuerey/)
 - [Kiwon Um](https://ge.in.tum.de/about/kiwon/)

+Additional thanks go to 
+Georg Kohl for the nice divider images (cf. {cite}`kohl2020lsim`), 
+Li-Wei Chen for the airfoil data image, 
+and to 
+Chloe Paillard for proofreading parts of the document.
+
 % future:
 % - [Georg Kohl](https://ge.in.tum.de/about/georg-kohl/)

-% proofreading acks:
-% - Chloe Pailard

 ## Citation

--- a/json-cleanup-for-pdf.py
+++ b/json-cleanup-for-pdf.py
@@ -49,6 +49,8 @@ for fnOut in fileList:
 	re1 = re.compile(r"WARNING:tensorflow:")
 	re2 = re.compile(r"UserWarning:")
 	re4 = re.compile(r"DeprecationWarning:")
+	re5 = re.compile(r"InsecureRequestWarning:") # for https download
+	# remove all "warnings.warn" from phiflow?

 	# shorten data line: "0.008612174447657694, 0.02584669669548606, 0.043136357266407785"
 	re3 = re.compile(r"\[0.008612174447657694, 0.02584669669548606, 0.043136357266407785.+\]" )
@@ -93,6 +95,7 @@ for fnOut in fileList:
 						nums.append( re1.search( d[t][i]["outputs"][j]["text"][k] ) )
 						nums.append( re2.search( d[t][i]["outputs"][j]["text"][k] ) )
 						nums.append( re4.search( d[t][i]["outputs"][j]["text"][k] ) )
+						nums.append( re5.search( d[t][i]["outputs"][j]["text"][k] ) )
 						if (nums[0] is None) and (nums[1] is None):
 							okay = okay+1
 						else: # delete line "dell"
--- a/others-lagrangian.md
+++ b/others-lagrangian.md
@@ -1,4 +1,4 @@
-Meshless Methods
+Unstructured Meshes and Meshless Methods
 =======================

 For all computer-based methods we need to find a suitable _discrete_ representation.
--- a/others-timeseries.md
+++ b/others-timeseries.md
@@ -138,6 +138,8 @@ learned time evolution with a numerically solved advection step.
 The learned prediction is shown at the top, the reference simulation at the bottom.
 ```

+To summarize, DL allows us to move from linear subspaces to non-linear manifolds, and provides a basis for performing
+complex steps (such as time evolutions) in the resulting latent space.

 ## Source code

--- a/overview-equations.md
+++ b/overview-equations.md
@@ -70,7 +70,7 @@ we'll be using later on in the DL examples.
 We typically target continuous PDEs denoted by $\mathcal P^*$
 whose solution is of interest in a spatial domain $\Omega \subset \mathbb{R}^d$ in $d \in \{1,2,3\}$ dimensions.
 In addition, we often consider a time evolution for a finite time interval $t \in \mathbb{R}^{+}$.
-The corresponding fields are either d-dimensional vector fields, e.g. $\mathbf{u}: \mathbb{R}^d \times \mathbb{R}^{+} \rightarrow \mathbb{R}^d$, 
+The corresponding fields are either d-dimensional vector fields, for instance $\mathbf{u}: \mathbb{R}^d \times \mathbb{R}^{+} \rightarrow \mathbb{R}^d$, 
 or scalar $\mathbf{p}: \mathbb{R}^d \times \mathbb{R}^{+} \rightarrow \mathbb{R}$.
 The components of a vector are typically denoted by $x,y,z$ subscripts, i.e.,
 $\mathbf{v} = (v_x, v_y, v_z)^T$ for $d=3$, while
@@ -203,8 +203,8 @@ in implementations, effectively computing an instantaneous pressure.
 An interesting variant is obtained by including the 
 [Boussinesq approximation](https://en.wikipedia.org/wiki/Boussinesq_approximation_(buoyancy))
 for varying densities, e.g., for simple temperature changes of the fluid.
-With a marker field $v$, e.g., indicating regions of high temperature,
-this yields the following set of equations:
+With a marker field $v$ that indicates regions of high temperature,
+it yields the following set of equations:

 $$\begin{aligned}
  \frac{\partial u_x}{\partial{t}} + \mathbf{u} \cdot \nabla u_x &= - \frac{\Delta t}{\rho} \nabla p 
--- a/references.bib
+++ b/references.bib
@@ -897,7 +897,7 @@
@article{schulman2015high,
  title={High-dimensional continuous control using generalized advantage estimation},
  author={Schulman, John and Moritz, Philipp and Levine, Sergey and Jordan, Michael and Abbeel, Pieter},
-  journal={arXiv preprint arXiv:1506.02438},
+  journal={arXiv:1506.02438},
  year={2015}
 }

--- a/supervised-discuss.md
+++ b/supervised-discuss.md
@@ -50,6 +50,36 @@ as the most central hyperparameter.
 You'll probably need to reduce it later on, but you should at least get a 
 rough estimate of suitable values for $\eta$.

+### Know your data
+
+All data-driven methods obey the _garbage-in-garbage-out_ principle. Because of this, it's important
+to get to know the data you are dealing with. While there's no one-size-fits-all
+approach for how to best achieve this, we strongly recommend tracking
+a broad range of statistics of your data set. A good starting point is the
+per-quantity mean, standard deviation, and min and max values.
+If some of these contain unusual values, this is a first indicator of bad
+samples in the data set.
+
+These values can
+also be easily visualized as histograms to track down
+unwanted outliers. A small number of such outliers
+can easily skew a data set in undesirable ways.
+
+Finally, checking the relationships between different quantities
+is often a good way to build some intuition for what's contained in the
+data set. The next figure gives an example of this step.
+
+```{figure} resources/supervised-example-plot.jpg
+---
+height: 300px
+name: supervised-example-plot
+---
+An example from the airfoil case of the previous section: a visualization of a training data
+set in terms of the mean u and v velocities of its 2D flow fields. It shows that there are no extreme outliers,
+but that a few entries on the left side have a relatively low mean u velocity.
+A second, smaller data set is overlaid in red; its samples cover the range of mean motions quite well.
+```
+
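The statistics recommended in this section can be sketched in a few lines of NumPy. The following is a minimal sketch, not the code used for the airfoil example: the array layout `(n_samples, n_quantities, ...)`, the function names `per_quantity_stats` and `flag_outliers`, and the threshold of 5 are all assumptions made for illustration. Outliers are flagged with a robust z-score based on the median absolute deviation, which is one possible choice among many.

```python
import numpy as np

def per_quantity_stats(data, names):
    """Mean, std, min and max for each quantity (channel) of the data set.

    Assumes `data` has shape (n_samples, n_quantities, ...).
    """
    stats = {}
    for c, name in enumerate(names):
        q = data[:, c]
        stats[name] = {
            "mean": float(q.mean()),
            "std": float(q.std()),
            "min": float(q.min()),
            "max": float(q.max()),
        }
    return stats

def flag_outliers(data, channel, z_thresh=5.0):
    """Indices of samples whose mean value of one quantity is unusual.

    Uses a robust z-score: distance from the median, scaled by the
    median absolute deviation (MAD). The threshold is a hypothetical default.
    """
    sample_means = data[:, channel].reshape(len(data), -1).mean(axis=1)
    med = np.median(sample_means)
    mad = np.median(np.abs(sample_means - med))
    scale = max(1.4826 * mad, 1e-12)  # 1.4826 makes MAD comparable to a std
    z = np.abs(sample_means - med) / scale
    return np.flatnonzero(z > z_thresh)

# Synthetic stand-in for a data set with three quantities on an 8x8 grid.
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 3, 8, 8))
data[17, 0] += 50.0  # inject one bad sample into the first quantity

for name, s in per_quantity_stats(data, ["p", "u", "v"]).items():
    print(name, s)
print("suspicious samples:", flag_outliers(data, channel=0))
```

The per-sample means computed in `flag_outliers` are also the values one would pass to a histogram or scatter plot to inspect the data set visually.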
 ### Where's the magic? 🦄 

 A comment that you'll often hear when talking about DL approaches, and especially