corrections from Nuttapong, thanks
commit 8425be8fd2
parent edd46c172c
@@ -1,7 +1,7 @@
Introduction to Posterior Inference
=======================

-We should keep in mind that for all measurements, models, and discretizations we have uncertainties. For measurements and observations, this typically appears in the form of measurement errors. Model equations equations, on the other hand, usually encompass only parts of a system we're interested in (leaving the remainder as an uncertainty), while for numerical simulations we inherently introduce discretization errors. So a very important question to ask here is how we can be sure that an answer we obtain is the correct one. From a statisticians viewpoint, we'd like to know the posterior pobability distribution, a distribution that captures possible uncertainties we have about our model or data.
+We should keep in mind that for all measurements, models, and discretizations we have uncertainties. For measurements and observations, this typically appears in the form of measurement errors. Model equations, on the other hand, usually encompass only parts of a system we're interested in (leaving the remainder as an uncertainty), while for numerical simulations we inherently introduce discretization errors. So a very important question to ask here is how we can be sure that an answer we obtain is the correct one. From a statistician's viewpoint, we'd like to know the posterior probability distribution, a distribution that captures possible uncertainties we have about our model or data.

## Uncertainty

@@ -12,7 +12,7 @@ yields a _maximum likelihood estimation_ (MLE) for the parameters of the network
However, this MLE viewpoint does not take any of the uncertainties mentioned above into account:
for DL training, we likewise have a numerical optimization, and hence an inherent
approximation error and uncertainty regarding the learned representation.
-Ideally, we should reformulate our the learning process such that it takes
+Ideally, we should reformulate the learning process such that it takes
its own uncertainties into account, and it should make
_posterior inference_ possible,
i.e. learn to produce the full output distribution. However, this turns out to be an

@@ -72,7 +72,7 @@ $$
$$

where the prediction network is denoted by $f_p$ to distinguish it from encoder and decoder, above.
-This already implies that we're facing a recurrent task: any $ith$ step is
+This already implies that we're facing a recurrent task: any $i$th step is
the result of $i$ evaluations of $f_p$, i.e. $\mathbf{c}_{t+i} = f_p^{(i)}( \mathbf{c}_{t};\theta_p)$.
As there is an inherent per-evaluation error, it is typically important to train this process
for more than a single step, such that the $f_p$ network "sees" the drift it produces in terms

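To make the multi-step training idea concrete, here is a minimal sketch of such an unrolled loss in PyTorch; the names (`f_p`, `horizon`) and sizes are illustrative assumptions, not the book's actual code. The predictor is applied repeatedly to its own output, so the gradients of the summed loss "see" the drift accumulated over the horizon.

```python
import torch
import torch.nn as nn

latent_dim, horizon = 32, 4                       # assumed sizes for this sketch
f_p = nn.Sequential(nn.Linear(latent_dim, 64), nn.Tanh(), nn.Linear(64, latent_dim))
opt = torch.optim.Adam(f_p.parameters(), lr=1e-3)

def unrolled_loss(c_t, c_future):
    """c_t: [batch, latent]; c_future: [batch, horizon, latent] reference latent states."""
    loss, c = 0.0, c_t
    for i in range(horizon):
        c = f_p(c)                                # feed the prediction back in: f_p^{(i)}
        loss = loss + torch.mean((c - c_future[:, i]) ** 2)
    return loss / horizon

# one training step on random stand-in data
c_t, c_future = torch.randn(8, latent_dim), torch.randn(8, horizon, latent_dim)
opt.zero_grad()
unrolled_loss(c_t, c_future).backward()
opt.step()
```
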
@@ -69,7 +69,7 @@ The following section will give a brief outlook for the model equations
we'll be using later on in the DL examples.
We typically target continuous PDEs denoted by $\mathcal P^*$
whose solution is of interest in a spatial domain $\Omega \subset \mathbb{R}^d$ in $d \in \{1,2,3\}$ dimensions.
-In addition, wo often consider a time evolution for a finite time interval $t \in \mathbb{R}^{+}$.
+In addition, we often consider a time evolution for a finite time interval $t \in \mathbb{R}^{+}$.
The corresponding fields are either $d$-dimensional vector fields, for instance $\mathbf{u}: \mathbb{R}^d \times \mathbb{R}^{+} \rightarrow \mathbb{R}^d$,
or scalar $\mathbf{p}: \mathbb{R}^d \times \mathbb{R}^{+} \rightarrow \mathbb{R}$.
The components of a vector are typically denoted by $x,y,z$ subscripts, i.e.,

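As a small illustration of this notation for $d=2$, a discretized vector field and scalar field are simply arrays with one value per grid point (and per component); the $32 \times 40$ resolution below is an assumed example.

```python
import numpy as np

nx, ny = 32, 40                 # assumed grid resolution
u = np.zeros((nx, ny, 2))       # vector field: u[..., 0] = u_x, u[..., 1] = u_y
p = np.zeros((nx, ny))          # scalar field, e.g. a pressure
u[..., 0] = 1.0                 # a uniform unit flow along x as a trivial example
```
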
@@ -82,9 +82,9 @@
{
"cell_type": "markdown",
"source": [
-"The inflow will be used to inject smoke into a second centered grid `smoke` that represents the marker field $d$ from above. Note that we've defined a `Box` of size $100x80$ above. This is the physical scale in terms of spatial units in our simulation, i.e., a velocity of magnitude $1$ will move the smoke density by 1 unit per 1 time unit, which may be larger or smaller than a cell in the discretized grid, depending on the settings for `x,y`. You could parametrize your simulation grid to directly resemble real-world units, or keep appropriate conversion factors in mind. \n",
+"The inflow will be used to inject smoke into a second centered grid `smoke` that represents the marker field $d$ from above. Note that we've defined a `Box` of size $100 \times 80$ above. This is the physical scale in terms of spatial units in our simulation, i.e., a velocity of magnitude $1$ will move the smoke density by 1 unit per 1 time unit, which may be larger or smaller than a cell in the discretized grid, depending on the settings for `x,y`. You could parametrize your simulation grid to directly resemble real-world units, or keep appropriate conversion factors in mind. \n",
"\n",
-"The inflow sphere above is already using the \"world\" coordinates: it is located at $x=30$ along the first axis, and $y=15$ (within the $100x80$ domain box).\n",
+"The inflow sphere above is already using the \"world\" coordinates: it is located at $x=30$ along the first axis, and $y=15$ (within the $100 \times 80$ domain box).\n",
"\n",
"Next, we create grids for the quantities we want to simulate. For this example, we require a velocity field and a smoke density field."
],

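The unit conversion mentioned in the cell above can be spelled out with a few lines of arithmetic. Only the physical box size of $100 \times 80$ is fixed by the text; the grid resolution below is an assumed example.

```python
nx, ny = 32, 40                       # assumed resolution of the discretized grid
size_x, size_y = 100.0, 80.0          # physical extent of the Box
dx, dy = size_x / nx, size_y / ny     # physical size of one cell: 3.125 x 2.0 units
cells_per_step_x = 1.0 / dx           # a unit velocity crosses 0.32 cells per time unit in x
cells_per_step_y = 1.0 / dy           # ...and 0.5 cells per time unit in y
print(dx, dy, cells_per_step_x, cells_per_step_y)
```
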
@@ -43,8 +43,6 @@ The objective of the actor inherently depends on the output of the critic network

This interdependence can promote instabilities, e.g., as strongly over- or underestimated state values can give wrong impulses during learning. Actions yielding higher rewards often also contribute to reaching states with higher informational value. As a consequence, when the - possibly incorrect - value estimates of individual samples are allowed to unrestrictedly affect the agent's behavior, the learning progress can collapse.

-DEBUG TEST t’s t’s agent’s vs's TODO remove!!!
-
PPO was introduced as a method to specifically counteract this problem. The idea is to restrict the influence that individual state value estimates can have on the change of the actor's behavior during learning. PPO is a popular choice especially when working on continuous action spaces. This can be attributed to the fact that it tends to achieve good results with a stable learning progress, while still being comparatively easy to implement.

### PPO-clip

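A minimal sketch of the clipped surrogate loss that gives PPO-clip its name (illustrative tensor names, not the book's implementation): the probability ratio between the new and old policy is clamped to $[1-\epsilon, 1+\epsilon]$, which caps how much any single, possibly misjudged, advantage estimate can change the actor.

```python
import torch

def ppo_clip_loss(log_prob_new, log_prob_old, advantages, eps=0.2):
    """Per-sample inputs of equal shape; returns the negated clipped surrogate."""
    ratio = torch.exp(log_prob_new - log_prob_old)              # pi_theta / pi_theta_old
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * advantages
    return -torch.mean(torch.min(unclipped, clipped))           # minimize the negative surrogate
```
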
@@ -196,7 +196,7 @@
"\n",
"We also set up some globals to control training parameters, maybe most importantly: the learning rate `LR`, i.e. $\\eta$ from the previous sections. When your training run doesn't converge this is the first parameter to experiment with.\n",
"\n",
-"Here, we'll keep it relatively small throughout. (Using _learning rate decay_ would be better, i.e. potentially give an improed convergence, but is omitted here for clarity.) "
+"Here, we'll keep it relatively small throughout. (Using _learning rate decay_ would be better, i.e. potentially give an improved convergence, but is omitted here for clarity.) "
]
},
{

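For reference, the learning rate decay mentioned as an omitted option could look like the following sketch; the PyTorch scheduler and the stand-in model are assumptions for illustration, not taken from the notebook.

```python
import torch

model = torch.nn.Linear(10, 10)                                  # stand-in network
LR = 1e-3                                                        # eta from the text
opt = torch.optim.Adam(model.parameters(), lr=LR)
sched = torch.optim.lr_scheduler.ExponentialLR(opt, gamma=0.95)  # shrink LR by 5% per epoch

x = torch.randn(4, 10)
for epoch in range(10):
    loss = model(x).pow(2).mean()     # stand-in loss so the loop has something to optimize
    opt.zero_grad()
    loss.backward()
    opt.step()
    sched.step()                      # decay the learning rate once per epoch
```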