added know your data section, minor cleanup

2021-08-03 21:55:42 +02:00 · 2021-08-03 21:55:42 +02:00 · 215b5024f6
commit 215b5024f6
parent 7910aa23e9
7 changed files with 46 additions and 7 deletions
--- a/intro.md
+++ b/intro.md
@@ -74,11 +74,15 @@ This project would not have been possible without the help of many people who co
 - [Nils Thuerey](https://ge.in.tum.de/about/n-thuerey/)
 - [Kiwon Um](https://ge.in.tum.de/about/kiwon/)
 Additional thanks go to 
 Georg Kohl for the nice divider images (cf. {cite}`kohl2020lsim`), 
 Li-Wei Chen for the airfoil data image, 
 and to 
 Chloe Paillard for proofreading parts of the document.
 % future:
 % - [Georg Kohl](https://ge.in.tum.de/about/georg-kohl/)
 % proofreading acks:
 % - Chloe Paillard
 ## Citation
--- a/json-cleanup-for-pdf.py
+++ b/json-cleanup-for-pdf.py
@@ -49,6 +49,8 @@ for fnOut in fileList:
 	re1 = re.compile(r"WARNING:tensorflow:")
 	re2 = re.compile(r"UserWarning:")
 	re4 = re.compile(r"DeprecationWarning:")
 	re5 = re.compile(r"InsecureRequestWarning:") # for https download
 	# remove all "warnings.warn" from phiflow?
 	# shorten data line: "0.008612174447657694, 0.02584669669548606, 0.043136357266407785"
 	re3 = re.compile(r"\[0.008612174447657694, 0.02584669669548606, 0.043136357266407785.+\]" )
@@ -93,6 +95,7 @@ for fnOut in fileList:
 						nums.append( re1.search( d[t][i]["outputs"][j]["text"][k] ) )
 						nums.append( re2.search( d[t][i]["outputs"][j]["text"][k] ) )
 						nums.append( re4.search( d[t][i]["outputs"][j]["text"][k] ) )
 						nums.append( re5.search( d[t][i]["outputs"][j]["text"][k] ) )
 						if all(n is None for n in nums): # keep the line only if none of the warning patterns matched
 							okay = okay+1
 						else: # delete line "dell"
--- a/others-lagrangian.md
+++ b/others-lagrangian.md
@@ -1,4 +1,4 @@
-Meshless Methods
+Unstructured Meshes and Meshless Methods
 =======================
 For all computer-based methods we need to find a suitable _discrete_ representation.
--- a/others-timeseries.md
+++ b/others-timeseries.md
@@ -138,6 +138,8 @@ learned time evolution with a numerically solved advection step.
 The learned prediction is shown at the top, the reference simulation at the bottom.
 ```
 To summarize, DL allows us to move from linear subspaces to non-linear manifolds, and provides a basis for performing
 complex steps (such as time evolutions) in the resulting latent space.
 ## Source code
--- a/overview-equations.md
+++ b/overview-equations.md
@@ -70,7 +70,7 @@ we'll be using later on in the DL examples.
 We typically target continuous PDEs denoted by $\mathcal P^*$
 whose solution is of interest in a spatial domain $\Omega \subset \mathbb{R}^d$ in $d \in \{1,2,3\}$ dimensions.
 In addition, we often consider a time evolution for a finite time interval $t \in \mathbb{R}^{+}$.
-The corresponding fields are either d-dimensional vector fields, e.g. $\mathbf{u}: \mathbb{R}^d \times \mathbb{R}^{+} \rightarrow \mathbb{R}^d$, 
+The corresponding fields are either d-dimensional vector fields, for instance $\mathbf{u}: \mathbb{R}^d \times \mathbb{R}^{+} \rightarrow \mathbb{R}^d$, 
 or scalar fields such as $\mathbf{p}: \mathbb{R}^d \times \mathbb{R}^{+} \rightarrow \mathbb{R}$.
 The components of a vector are typically denoted by $x,y,z$ subscripts, i.e.,
 $\mathbf{v} = (v_x, v_y, v_z)^T$ for $d=3$, while
@@ -203,8 +203,8 @@ in implementations, effectively computing an instantaneous pressure.
 An interesting variant is obtained by including the 
 [Boussinesq approximation](https://en.wikipedia.org/wiki/Boussinesq_approximation_(buoyancy))
 for varying densities, e.g., for simple temperature changes of the fluid.
-With a marker field $v$, e.g., indicating regions of high temperature,
+With a marker field $v$ that indicates regions of high temperature,
-this yields the following set of equations:
+it yields the following set of equations:
 $$\begin{aligned}
  \frac{\partial u_x}{\partial{t}} + \mathbf{u} \cdot \nabla u_x &= - \frac{\Delta t}{\rho} \nabla p 
--- a/references.bib
+++ b/references.bib
@@ -897,7 +897,7 @@
@article{schulman2015high,
  title={High-dimensional continuous control using generalized advantage estimation},
  author={Schulman, John and Moritz, Philipp and Levine, Sergey and Jordan, Michael and Abbeel, Pieter},
-  journal={arXiv preprint arXiv:1506.02438},
+  journal={arXiv:1506.02438},
  year={2015}
 }
--- a/supervised-discuss.md
+++ b/supervised-discuss.md
@@ -50,6 +50,36 @@ as the most central hyperparameter.
 You'll probably need to reduce it later on, but you should at least get a 
 rough estimate of suitable values for $\eta$.
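A coarse sweep over logarithmically spaced values is one simple way to obtain such a rough estimate. The following is a minimal sketch under stated assumptions: a toy least-squares problem trained with plain gradient descent stands in for the actual network training, and the helper `final_loss` is hypothetical. Values of $\eta$ that are too large make the loss blow up, while values that are too small barely reduce it within the step budget.

```python
import numpy as np

def final_loss(eta, steps=50, seed=0):
    """Hypothetical stand-in for a short training run: plain gradient descent
    on a random least-squares problem, returning the loss after `steps` updates."""
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(100, 5))
    b = A @ rng.normal(size=5)
    x = np.zeros(5)
    for _ in range(steps):
        grad = A.T @ (A @ x - b) / len(b)  # gradient of 0.5 * mean squared residual
        x = x - eta * grad
    return 0.5 * np.mean((A @ x - b) ** 2)

# coarse sweep over log-spaced learning rates: 1e-4, 1e-3, ..., 1e1
etas = 10.0 ** np.arange(-4, 2)
losses = np.array([final_loss(eta) for eta in etas])
for eta, loss in zip(etas, losses):
    print(f"eta={eta:.0e}  final loss={loss:.3e}")
best_eta = etas[np.nanargmin(losses)]
print("rough estimate for eta:", best_eta)
```

In a real setup, each call would correspond to a short training run of the network, and the sweep would then be refined around the best value found.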
 ### Know your data
 All data-driven methods obey the _garbage-in-garbage-out_ principle. Because of this, it's important
 to get to know the data you are dealing with. While there's no one-size-fits-all
 approach for how to best achieve this, we strongly recommend tracking
 a broad range of statistics of your data set. A good starting point is the
 per-quantity mean, standard deviation, minimum, and maximum.
 If some of these contain unusual values, this is a first indicator of bad
 samples in the data set.
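Such per-quantity statistics take only a few lines to compute. The sketch below assumes the data set is stored as a NumPy array of shape `(num_samples, num_quantities, height, width)`; this layout, the quantity names, and the helper functions are hypothetical, and synthetic data stands in for a real data set. The flagging heuristic, a simple z-score on the per-sample means, is one possible choice rather than a prescribed method.

```python
import numpy as np

def per_quantity_stats(data, names):
    """Mean, standard deviation, min and max for each quantity (channel) of `data`."""
    return {
        name: {"mean": float(data[:, c].mean()), "std": float(data[:, c].std()),
               "min": float(data[:, c].min()), "max": float(data[:, c].max())}
        for c, name in enumerate(names)
    }

def flag_unusual_samples(data, k=4.0):
    """Indices of samples whose per-sample mean of any quantity lies more than
    k standard deviations away from the average over all samples."""
    sample_means = data.reshape(data.shape[0], data.shape[1], -1).mean(axis=2)  # (N, C)
    mu = sample_means.mean(axis=0)
    sigma = sample_means.std(axis=0) + 1e-12  # avoid division by zero
    z = np.abs(sample_means - mu) / sigma
    return np.nonzero((z > k).any(axis=1))[0]

# synthetic stand-in data: 200 samples with quantities p, u, v on a 16x16 grid
rng = np.random.default_rng(42)
data = rng.normal(size=(200, 3, 16, 16))
data[17, 1] += 50.0  # inject one corrupted sample
stats = per_quantity_stats(data, ["p", "u", "v"])
for name, s in stats.items():
    print(name, {key: round(val, 3) for key, val in s.items()})
bad = flag_unusual_samples(data)
print("unusual samples:", bad)
```

The corrupted sample already stands out in the maximum of `u`. Since a strong outlier also inflates the standard deviation it is measured against, robust statistics such as the median absolute deviation can be preferable for heavily contaminated data.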
 These values can
 also easily be visualized as histograms to track down
 unwanted outliers. A small number of such outliers
 can easily skew a data set in undesirable ways.
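Even without a plotting library, a text histogram can reveal isolated bins. The sketch below applies `np.histogram` to hypothetical per-sample mean u velocities with a few injected outliers; the helper names and the threshold `max_count` are assumptions for illustration. Sparsely populated bins are candidates for closer inspection. The tails of a regular distribution can also end up there, so flagged samples still need to be checked by hand.

```python
import numpy as np

def text_histogram(values, bins=20, width=40):
    """Print a simple text histogram and return the bin counts and edges."""
    counts, edges = np.histogram(values, bins=bins)
    scale = width / max(counts.max(), 1)
    for c, lo, hi in zip(counts, edges[:-1], edges[1:]):
        bar = "#" * int(round(c * scale))
        print(f"{lo:8.3f} .. {hi:8.3f} | {bar:<{width}} {c}")
    return counts, edges

def samples_in_sparse_bins(values, counts, edges, max_count=2):
    """Indices of samples that land in non-empty bins holding at most `max_count` entries."""
    # digitize against the inner edges maps each value to the same bin index np.histogram uses
    idx = np.digitize(values, edges[1:-1])
    sparse_bins = np.nonzero((counts > 0) & (counts <= max_count))[0]
    return np.nonzero(np.isin(idx, sparse_bins))[0]

rng = np.random.default_rng(0)
mean_u = rng.normal(1.0, 0.1, size=500)  # hypothetical per-sample mean u velocities
mean_u[:3] = [0.2, 0.25, 3.0]            # a few injected outliers
counts, edges = text_histogram(mean_u)
flagged = samples_in_sparse_bins(mean_u, counts, edges)
print("samples in sparse bins:", flagged)
```

With matplotlib available, `plt.hist(mean_u, bins=20)` gives the graphical equivalent.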
 Finally, checking the relationships between different quantities 
 often helps to build intuition for what's contained in the
 data set. The next figure gives an example of this step.
 ```{figure} resources/supervised-example-plot.jpg
 ---
 height: 300px
 name: supervised-example-plot
 ---
 An example from the airfoil case of the previous section: a visualization of a training data 
 set in terms of the mean u and v velocities of the 2D flow fields. It shows that there are no extreme outliers,
 but there are a few entries with relatively low mean u velocity on the left side.
 A second, smaller data set is shown on top in red; its samples cover the range of mean motions quite well.
 ```
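The features in a plot like this are cheap to compute. A minimal sketch, assuming velocity fields stored as an array of shape `(num_samples, 2, height, width)` and using synthetic data in place of the airfoil data: it reduces each sample to its mean u and v, measures their correlation, and checks how well a second data set is covered by computing, for each of its samples, the distance to the nearest training sample in this two-dimensional feature space. Large distances point to samples outside the range covered by the training data. The helper names are hypothetical; passing the two feature columns to a scatter plot, e.g., matplotlib's `scatter`, yields a plot of the kind shown above.

```python
import numpy as np

def mean_velocity_features(fields):
    """Per-sample mean u and mean v; `fields` is assumed to have shape (N, 2, H, W)."""
    return fields.mean(axis=(2, 3))

def nearest_neighbor_distances(query, reference):
    """For each row of `query`, the Euclidean distance to the closest row of `reference`."""
    diff = query[:, None, :] - reference[None, :, :]  # (Q, R, 2)
    return np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)

rng = np.random.default_rng(1)
# synthetic training set: per-sample mean motion drawn from a box, plus small-scale noise
train_means = rng.uniform([0.5, -0.3], [1.5, 0.3], size=(400, 2))
train = train_means[:, :, None, None] + 0.05 * rng.normal(size=(400, 2, 8, 8))
# smaller second data set inside the same range, plus one sample far outside it
test_means = rng.uniform([0.6, -0.25], [1.4, 0.25], size=(50, 2))
test_means = np.vstack([test_means, [3.0, 0.0]])
test = test_means[:, :, None, None] + 0.05 * rng.normal(size=(51, 2, 8, 8))

train_feat = mean_velocity_features(train)
test_feat = mean_velocity_features(test)
corr = np.corrcoef(train_feat[:, 0], train_feat[:, 1])[0, 1]
print("correlation of mean u and mean v (train):", corr)
dists = nearest_neighbor_distances(test_feat, train_feat)
print("largest distances to the training set:", np.sort(dists)[-3:])
```

The brute-force distance computation is fine for a few thousand samples; for much larger data sets, a spatial search structure such as a k-d tree would be the more economical choice.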
 ### Where's the magic? 🦄 
 A comment that you'll often hear when talking about DL approaches, and especially