Merge branch 's-ccs:main' into main

commit 3c12f1b53a

LICENSE (new file, 21 lines)
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2023 s-ccs

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
TemperatureData.jld2 (new file, BIN)
Binary file not shown.
@@ -42,7 +42,7 @@ website:
      - href: material/1_mon/rse/rse_basics_slides.qmd
        text: "📊 1 - RSE"
      - href: "material/1_mon/why_julia/page.qmd"
        text: "📊 2 - Why Julia"
      - href: "material/1_mon/firststeps/firststeps_handout.qmd"
        text: "📝 3 - First Steps: Handout"
      - href: "material/1_mon/firststeps/tasks.qmd"
@@ -69,11 +69,11 @@ website:
      contents:
      - href: material/3_wed/docs/handout.qmd
        text: "📝 1 - Docs: Handout"
-     - href: material/3_wed/docs/tasks.qmd"
+     - href: material/3_wed/docs/tasks.qmd
        text: "🛠 1 - Docs: Exercises"
      - href: material/3_wed/vis/handout.qmd
        text: "📝 2 - Visualizations: Handout"
-     - href: material/3_wed/vis/tasks.qmd"
+     - href: material/3_wed/vis/tasks.qmd
        text: "🛠 2 - Visualizations: Exercises"
      - href: material/3_wed/linalg/slides.qmd
        text: "📝 3 - LinearAlgebra"
@@ -84,7 +84,7 @@ website:
      contents:
      - href: material/4_thu/sim/slides.qmd
        text: "📝 1 - Simulation"
-     - href: material/4_thu/stats/missing.jl
+     - href: material/4_thu/bootstrap/Bootstrap.qmd
        text: "📝 2 - Bootstrapping"
      - href: material/4_thu/parallel/slides.qmd
        text: "📝 3 - Parallelization"
@@ -40,6 +40,6 @@ And that's it! You should have a nice progress bar now

1. Implement a type `StatResult` with fields for `x`, `n`, `std` and `tvalue`
2. Implement an outer constructor that can run `StatResult(2:10)` and return the full type including the calculated t-values.
-3. Implement a function `length` for `StatResult` to multiple-dispatch on
+3. Implement a function `length` for `StatResult` (using multiple dispatch) which returns the `.n` field, i.e. overload `Base.length`.
4. **Optional:** If you have time, optimize the functions so that `mean`, `sum`, `length`, `std` etc. are not calculated multiple times - you might want to rewrite your type. Note: This is a bit tricky :)
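A minimal sketch of tasks 1-3 might look as follows (the one-sample t-value against zero is an assumption; the exercise may intend a different formula):

``` julia
using Statistics

struct StatResult
    x::Vector{Float64}
    n::Int
    std::Float64
    tvalue::Float64
end

# outer constructor: derive the statistics from the raw data
function StatResult(data)
    x = collect(float.(data))
    n = length(x)
    s = std(x)
    t = mean(x) / (s / sqrt(n))   # assumed: one-sample t-value against 0
    return StatResult(x, n, s, t)
end

# multiple dispatch: overload Base.length for the new type
Base.length(r::StatResult) = r.n
```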
@@ -1,3 +1,7 @@
---
---

+::: callout
+[Link to Slides](slides.qmd)
+:::
@@ -7,6 +11,7 @@
Solve [task 1](tasks.qmd#1)

----

# Documenter.jl

### File-structure overview
@@ -44,7 +49,9 @@ makedocs(
```

### How to generate
-`julia --project=docs/ docs/make.jl` or `]activate docs/; include("make.jl")` or use `LiveServer.jl`
+- `julia --project=docs/ docs/make.jl`, or
+- `]activate docs/; include("make.jl")`, or
+- `LiveServer.jl` + `deploydocs()`

### How to write
@@ -10,17 +10,31 @@

```
docs/
├── src/
├── src/mydocs.jl
├── src/index.md
└── make.jl
```

### add some docs

2. with `mydocs.jl` containing

````{verbatim}
Show docstring of a single function `func`
```@docs
-func(x)
+func
```

Provide documentation for all docstring-equipped functions
```@autodocs
Modules = [MyDocsPackage]
```

Execute a code-block, but hide its output
```@example MyScope
x = [1,2,3]
nothing #hide
```
````

and
@@ -1,6 +1,7 @@
---
---

## Slides

The slides are available [in pptx format here](Julia_Matrices_Optimization_JuMP_Stuttgart2023.pptx). Note that there are a few extra slides in case you are motivated to learn more!
@@ -8,4 +9,22 @@ The slides are available [in pptx format here](Julia_Matrices_Optimization_JuMP_

## Exercise

The exercise is rendered [as html here](Julia_Matrices_Optimization_JuMP_Stuttgart2023.ipynb) but can also be downloaded {{< downloadthis Julia_Matrices_Optimization_JuMP_Stuttgart2023.ipynb label="Download as ipynb" >}}

Before starting the exercise, please download the files [Manifest.toml](Manifest.toml) and [Project.toml](Project.toml) and put them in the same folder as the \*.ipynb file.

Start Julia in that directory and run the following commands:

```{julia}
using Pkg; Pkg.activate("."); Pkg.instantiate()
```

Now you are ready to open the notebook in VSCode, or alternatively in the same Julia console that you used for installation by running:

```{julia}
using IJulia; notebook(dir=".")
```
@@ -44,7 +44,7 @@ $$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i=1,...,n,$$
where $\varepsilon_i \sim \mathcal{N}(0,\sigma^2)$ are independent
normally distributed errors with unknown variance $\sigma^2$.

-*Task:* Find the straight line that fits best, i.e., find the *optimal*
+*Aim:* Find the straight line that fits best, i.e., find the *optimal*
estimators for $\beta_0$ and $\beta_1$.

*Typical choice*: Least squares estimator (= maximum likelihood
@@ -152,7 +152,7 @@ Defining the *design matrix*
$$ \mathbf{X} = \left( \begin{array}{cccc}
1 & x_{11} & \ldots & x_{1p} \\
\vdots & \vdots & \ddots & \vdots \\
-1 & x_{11} & \ldots & x_{1p}
+1 & x_{n1} & \ldots & x_{np}
\end{array}\right) \qquad
(\text{size } n \times (p+1)), $$
@@ -177,7 +177,7 @@ regression model, but we provide explicit formulas now:

- estimated standard errors:

$$
-\hat s_{\beta_i} = \sqrt{([\mathbf{X}^\top \mathbf{X}]^{-1})_{ii} \frac 1 {n-p} \|\mathbf{y} - \mathbf{X} \beta\|^2}
+\hat s_{\beta_i} = \sqrt{([\mathbf{X}^\top \mathbf{X}]^{-1})_{ii} \frac 1 {n-p-1} \|\mathbf{y} - \mathbf{X} \beta\|^2}
$$

- $t$-statistics:
@@ -187,21 +187,21 @@ regression model, but we provide explicit formulas now:

- $p$-values:

$$
-p\text{-value} = \mathbb{P}(|T| > t_i), \quad \text{where } T \sim t_{n-p}
+p\text{-value} = \mathbb{P}(|T| > t_i), \quad \text{where } T \sim t_{n-p-1}
$$

::: {.callout-caution collapse="false"}
## Task 2

1. Implement functions that estimate the $\beta$-parameters, the corresponding standard errors and the $t$-statistics.
2. Test your functions with the `trees` data set and try to reproduce the output above.
:::
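A minimal sketch of the explicit formulas above (it is assumed that the design matrix `X` already carries a leading column of ones, so it has $p+1$ columns):

``` julia
using LinearAlgebra, Distributions

function ols(X, y)
    n, k = size(X)                         # k = p + 1
    β = (X' * X) \ (X' * y)                # least squares estimate
    r = y - X * β                          # residuals
    s2 = sum(abs2, r) / (n - k)            # n - p - 1 degrees of freedom
    se = sqrt.(diag(inv(X' * X)) .* s2)    # estimated standard errors
    t = β ./ se                            # t-statistics
    p = 2 .* ccdf.(TDist(n - k), abs.(t))  # two-sided p-values
    return (β = β, se = se, t = t, p = p)
end
```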
Which model is the best? For linear models, one often uses the $R^2$ characteristic. Roughly speaking, it gives the percentage (between 0 and 1) of the variance that can be explained by the linear model.

``` julia
r2(linmod1)
```

@@ -212,8 +212,11 @@ linmod3 = lm(@formula(Volume ~ Girth + Height + Girth*Height), trees)

``` julia
r2(linmod3)
```

::: callout-note
The more covariates you add, the more variance can be explained by the linear model - $R^2$ increases. In order to balance goodness-of-fit of a model and its complexity, information criteria such as `aic` are considered.
:::
## Generalized Linear Models

@@ -256,30 +259,30 @@ $$

For the models above, these are:

+----------------+-----------------+-------------------------+
| Type of Data   | Distribution    | Link Function           |
|                | Family          |                         |
+================+=================+=========================+
| continuous     | Normal          | identity:               |
|                |                 |                         |
|                |                 | $$                      |
|                |                 | g(x)=x                  |
|                |                 | $$                      |
+----------------+-----------------+-------------------------+
| count          | Poisson         | log:                    |
|                |                 |                         |
|                |                 | $$                      |
|                |                 | g(x) = \log(x)          |
|                |                 | $$                      |
+----------------+-----------------+-------------------------+
| binary         | Bernoulli       | logit:                  |
|                |                 |                         |
|                |                 | $$                      |
|                |                 | g(x) = \log\left(       |
|                |                 | \frac{x}{1-x}           |
|                |                 | \right)                 |
|                |                 | $$                      |
+----------------+-----------------+-------------------------+
In general, the parameter vector $\beta$ is estimated via maximizing the likelihood, i.e.,

@@ -311,10 +314,33 @@ model = glm(@formula(participation ~ age^2),
```

::: {.callout-caution collapse="false"}
## Task 3

1. Reproduce the results of our data analysis of the `trees` data set using a generalized linear model with normal distribution family.
2. Generate $n=20$ random covariates $\mathbf{x}$ and Poisson-distributed counting data with parameters $\beta_0 + \beta_1 x_i$. Re-estimate the parameters by a generalized linear model (a sketch follows below).
:::
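A possible sketch for part 2, assuming a log link (GLM.jl's default for `Poisson`), so the Poisson rate is $\exp(\beta_0 + \beta_1 x_i)$; the "true" parameter values are made up for illustration:

``` julia
using GLM, DataFrames, Distributions, Random

Random.seed!(1)
n = 20
β0, β1 = 0.5, 1.5                                    # assumed "true" parameters
x = rand(n)
y = [rand(Poisson(exp(β0 + β1 * xi))) for xi in x]   # counting data

df = DataFrame(x = x, y = y)
model = glm(@formula(y ~ x), df, Poisson())          # log link by default
coef(model)                                          # should be close to (β0, β1)
```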
## Outlook: Linear Mixed Models

In the linear regression models so far, we assumed that the response variable $\mathbf{y}$ depends on the design matrix of covariates $\mathbf{X}$ - which is assumed to be given/fixed - multiplied by the so-called *fixed effects* coefficients $\beta$ (the term $\mathbf{X}\beta$), plus independent errors $\varepsilon$. However, in many situations, there are also random effects on several components of the response variable. These can be included in the model by adding another design matrix $\mathbf{Z}$ multiplied by a random vector $u$, the so-called *random effects* coefficients, which are assumed to be jointly normally distributed with mean vector $0$ and variance-covariance matrix $\Sigma$ (typically *not* a diagonal matrix). In matrix notation, we have the following form:

$$
\mathbf{y} = \mathbf{X} \beta + \mathbf{Z}u + \varepsilon
$$

Maximizing the likelihood, we can estimate $\beta$ and optimally predict the random vector $u$.
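A hypothetical sketch of such a model with MixedModels.jl (not part of the course material; the simulated per-group random intercepts and noise level are assumptions):

``` julia
using MixedModels, DataFrames, Random

Random.seed!(1)
groups = repeat(string.('A':'E'), inner = 10)          # 5 groups, 10 observations each
u = Dict(g => 0.5 * randn() for g in unique(groups))   # one random intercept per group
x = rand(50)
y = 1.0 .+ 2.0 .* x .+ [u[g] for g in groups] .+ 0.1 .* randn(50)

df = DataFrame(y = y, x = x, group = groups)
fit(MixedModel, @formula(y ~ x + (1 | group)), df)     # Z holds the group indicators
```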
@@ -3,6 +3,7 @@

## Backends
Four backends:

1. `CairoMakie` - SVG
2. `GLMakie` - 2D/3D/fast interactivity
3. `WGLMakie` - Same as GLMakie, but in browser
material/3_wed/vis/nb_plutoCummean.jl (new file, 1740 lines)
File diff suppressed because it is too large.
@@ -56,11 +56,11 @@ Note that there is no `cummean` function, but clever element-wise division in co

::: {.callout-tip collapse="true"}
## click to show solution

-`cumsum(x) ./ 1:length(x)`
+`cumsum(x) ./ (1:length(x))`
:::
## 3. Plotting!
-Now for your first plot. Use a `scatter` plot to visualize the cummulative mean output, if you do not generate a `Figure()` + `ax = f[1,1] = Axis(f)` manually, you can get it back by the scatter call. `f,ax,s = scatter()`. This is helpful as we later want to extend the `Axis` and `Figure` with other plot elements
+Now for your first plot. Use a `scatter` plot^[after a `using CairoMakie`] to visualize the cumulative mean output. If you do not generate a `Figure()` + `ax = f[1,1] = Axis(f)` manually, you can get them back from the scatter call: `f,ax,s = scatter(...)`. This is helpful as we later want to extend the `Axis` and `Figure` with other plot elements (a sketch follows below).

Use `hlines!` to add a horizontal line at your "true" value
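A minimal sketch of this step (the standard-normal sample and the "true" value of 0 are assumptions):

``` julia
using CairoMakie, Statistics, Random

Random.seed!(1)
x = randn(1000)
cm = cumsum(x) ./ (1:length(x))   # cumulative mean

f, ax, s = scatter(cm)            # scatter returns Figure, Axis and the plot object
hlines!(ax, 0.0; color = :red)    # horizontal line at the "true" value
f
```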
@@ -70,7 +70,7 @@ Let's simulate 1000x datasets, each with a different seed, and take the mean ov

::: {.callout-tip collapse="true"}
## click to show tip
-An easy way to call a function many times is to broadcast it on an array e.g. `1:1000` - you could also use `map` to do it, but I don't think it is as clear :)
+An easy way to call a function many times is to broadcast it on an array created e.g. via `1:1000` - you could also use `map` to do it, but I don't think it is as clear :)
:::
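A small sketch of the broadcasting idea (the helper `simulate_mean` and its per-seed RNG are hypothetical, not part of the exercise code):

``` julia
using Random, Statistics

# hypothetical helper: one dataset of 100 draws, seeded for reproducibility
simulate_mean(seed) = mean(randn(Xoshiro(seed), 100))

means = simulate_mean.(1:1000)   # broadcast the function over all 1000 seeds
```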
material/4_thu/bootstrap/Bootstrap.qmd (new file, 153 lines)
@@ -0,0 +1,153 @@
# Resampling-Based Statistics

## Motivation: Fitting Models to Data

Situation considered yesterday: We have data and want to fit a model with certain parameters (e.g., a linear model) -- we estimate the parameters.

*Notation:*

- data: $\mathbf{x} = (x_1, \ldots, x_n)$

- model with unknown parameter $\theta$

- estimate $\widehat \theta(\mathbf{x})$

### Working example: Fitting a normal distribution

- data: $\mathbf{x} = (x_1, \ldots, x_n)$

- model: $\mathcal{N}(\theta, \sigma^2)$, i.e., a normal distribution with unknown mean $\theta$ (that we want to estimate) and variance $\sigma^2$ (that we are less interested in)

- use the empirical mean as estimator: $\widehat \theta(\mathbf{x}) = \overline{x} = \frac 1 n \sum_{i=1}^n x_i$

``` julia
using Distributions
using Statistics
using StatsPlots

d = Normal(0.0, 1.0)
n = 100
x = rand(d, n)
θ = mean(x)
```
*Problem:* The estimator [never]{.underline} gives the [exact]{.underline} result -- if the data are random, the estimate is random, too.

*Aim:* Find the distribution (or at least the variance) of the estimator $\widehat \theta$ in order to get standard errors, confidence intervals, etc.

In some easy examples, you can calculate the distribution of $\widehat \theta$ theoretically. *Example:* If $x_i$ is $\mathcal{N}(\theta,\sigma^2)$ distributed, then the distribution of $\widehat \theta(\mathbf{x})$ is $\mathcal{N}(\theta, \sigma^2/n)$. Strategy: Estimate $\sigma^2$, e.g. via the sample variance $$ \widehat \sigma^2 = \frac 1 {n-1} \sum_{i=1}^n (x_i - \overline{x})^2 $$ and take the standard error, confidence intervals, etc. of the corresponding normal distribution.

``` julia
σ = std(x)
est_d = Normal(θ, σ/sqrt(n))
plot(est_d, legend=false)

ci_bounds = quantile(est_d, [0.025,0.975])
vline!(ci_bounds)
```

*Problem:* In more complex examples, we cannot calculate the distribution.
## The 'ideal' solution: Generate *new* data

In theory, one would ideally do the following:

1. Generate new independent data $\mathbf{x}^{(1)}, \mathbf{x}^{(2)}, \ldots, \mathbf{x}^{(B)}$ (each sample of size $n$)
2. Apply the estimator separately to each sample $\leadsto$ $\widehat \theta(\mathbf{x}^{(1)}), \ldots, \widehat \theta(\mathbf{x}^{(B)})$
3. Use the empirical distribution of $\widehat \theta(\mathbf{x}^{(1)}), \ldots, \widehat \theta(\mathbf{x}^{(B)})$ as a proxy for the theoretical one.

``` julia
B = 1000
est_vector_new = zeros(B)
for i in 1:B
    x_new = rand(d, n)
    est_vector_new[i] = mean(x_new)
end
histogram(est_vector_new, legend=false)

ci_bounds_new = quantile(est_vector_new, [0.025, 0.975])
vline!(ci_bounds_new)
```

::: callout-warning
*But:* In most real-world situations, we cannot generate new data if the distribution is unknown. We have to work with the data we have ...
:::
## The practical solution: Resampling / Bootstrap

*Idea:* Use samples $\mathbf{x}^{(1)}, \mathbf{x}^{(2)}, \ldots, \mathbf{x}^{(B)}$ that are not completely new, but obtained by [resampling]{.underline} the original data $\mathbf{x}$.

*Question:* How can one obtain another sample of the same size $n$? $\leadsto$ (re-)sampling with replacement

The overall procedure is as follows:

1. Generate $B$ samples $\mathbf{x}^{(1)}, \mathbf{x}^{(2)}, \ldots, \mathbf{x}^{(B)}$ of size $n$ by independently resampling from $\mathbf{x}$ with replacement.
2. Apply the estimator separately to each sample $\leadsto$ $\widehat \theta(\mathbf{x}^{(1)}), \ldots, \widehat \theta(\mathbf{x}^{(B)})$
3. Use the empirical distribution of $\widehat \theta(\mathbf{x}^{(1)}), \ldots, \widehat \theta(\mathbf{x}^{(B)})$ as a proxy for the theoretical one.

``` julia
est_vector_bs = zeros(B)
for i in 1:B
    x_bs = rand(x, n)
    est_vector_bs[i] = mean(x_bs)
end
histogram(est_vector_bs, legend=false)

ci_bounds_bs = quantile(est_vector_bs, [0.025, 0.975])
vline!(ci_bounds_bs)
```

If the sample $\mathbf{x} = (x_1,\ldots,x_n)$ consists of independent and identically distributed data, the resampling procedure often provides a good proxy to the true (unknown) distribution of the estimator.

::: callout-note
The above resampling procedure is called bootstrap (from ''To pull oneself up by one's bootstraps.'') as it uses only data that are already available.
:::

::: {.callout-caution collapse="false"}
## Task 1

1. Reconsider the `trees` data set and the simple linear regression model `Volume ~ Girth`. Calculate a 95% confidence interval for $\beta_1$ via bootstrap and compare to the Julia output of the linear model (a sketch follows below).
2. Use bootstrap to estimate the standard error for the predicted volume of a tree with `Girth=10` from the output above.
:::
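A minimal sketch for part 1 (loading `trees` via RDatasets is an assumption; any DataFrame with `Volume` and `Girth` columns works):

``` julia
using RDatasets, DataFrames, GLM, Statistics

trees = dataset("datasets", "trees")
n = nrow(trees)

B = 1000
slopes = zeros(B)
for i in 1:B
    idx = rand(1:n, n)                           # resample rows with replacement
    m = lm(@formula(Volume ~ Girth), trees[idx, :])
    slopes[i] = coef(m)[2]                       # bootstrap estimate of β₁
end

quantile(slopes, [0.025, 0.975])                 # bootstrap 95% CI for β₁
```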
*Problem:* If the data are *not* independent, the above (i.i.d.) bootstrap samples would have a misspecified dependence structure and therefore lead to a bad uncertainty estimate. For some situations, there are specific modifications of the bootstrap procedure (e.g. block bootstrap for time series), but they tend to work well only if dependence is sufficiently weak.

## Parametric Bootstrap

There are situations where it is hardly possible to construct reasonable confidence intervals or estimate the standard error. But one could at least get a [rough guess]{.underline} of the uncertainty by the following thought experiment:

Assume that the estimated parameter value $\theta^*$ were equal to the true one. How uncertain would an estimate be in that case?

The answer is given by the following procedure, called *parametric bootstrap*:

1. Generate independent data $\mathbf{x}^{(1)}, \mathbf{x}^{(2)}, \ldots, \mathbf{x}^{(B)}$ (each sample of size $n$) from the model with parameter $\theta^*$
2. Apply the estimator separately to each sample $\leadsto$ $\widehat \theta(\mathbf{x}^{(1)}), \ldots, \widehat \theta(\mathbf{x}^{(B)})$
3. Use the empirical distribution of $\widehat \theta(\mathbf{x}^{(1)}), \ldots, \widehat \theta(\mathbf{x}^{(B)})$ as a proxy for the theoretical one.

::: {.callout-caution collapse="false"}
## Task 2

1. Consider the following function that generates $n$ correlated samples that are uniformly distributed on $[\mu-0.5,\mu+0.5]$.

``` julia
# AR(1)-style recursion: each sample depends on the previous one
myrand = function(mu, n)
    rho = 0.9
    res = zeros(n)
    res[1] = rand()
    if n > 1
        for i in 2:n
            res[i] = rho * res[i-1] + (1 - rho) * rand()
        end
    end
    res .= mu - 0.5 .+ res
    return res
end
```

The additional parameter `rho` (between 0 and 1) controls the strength of dependence, with 0 meaning independence and 1 meaning full dependence.

Write functions that estimate the standard deviation of the estimated mean via (a) generating new samples from the true unknown distribution, (b) i.i.d. bootstrap, (c) parametric bootstrap.

2. Use the functions for different values of `rho` and compare the results.
:::
material/4_thu/bootstrap/CodeSnippets.jl (new file, 104 lines)
@@ -0,0 +1,104 @@
using Distributions
using Statistics
using StatsPlots

# fit a normal distribution: estimate the mean from a sample
d = Normal(0.0, 1.0)
n = 100
x = rand(d, n)
θ = mean(x)

#---

# theoretical distribution of the estimator + 95% CI
σ = std(x)
est_d = Normal(θ, σ/sqrt(n))
plot(est_d, legend=false)

ci_bounds = quantile(est_d, [0.025,0.975])
vline!(ci_bounds)

#---

# 'ideal' solution: generate B new samples from the true distribution
B = 1000
est_vector = zeros(B)
for i in 1:B
    x_new = rand(d, n)
    est_vector[i] = mean(x_new)
end
histogram(est_vector, legend=false)

ci_bounds_new = quantile(est_vector, [0.025, 0.975])
vline!(ci_bounds_new)

#---

# i.i.d. bootstrap: resample from x with replacement
est_vector_bs = zeros(B)
for i in 1:B
    x_bs = rand(x, n)
    est_vector_bs[i] = mean(x_bs)
end
histogram(est_vector_bs, legend=false)

ci_bounds_bs = quantile(est_vector_bs, [0.025, 0.975])
vline!(ci_bounds_bs)

#---

# the following function generates a sample of (strongly)
# correlated normal data with mean μ and variance 1
myrandn = function (mu=0.0, n=1)
    rho = 0.9
    res = zeros(n)
    res[1] = randn()
    if n > 1
        for i in 2:n
            res[i] = rho*res[i-1] + sqrt(1-rho^2)*randn()
        end
    end
    res .= mu .+ res
    return res
end

## Assume we obtained the following data
n = 20
true_mu = 1.0
y = myrandn(true_mu, n)
est_mu = mean(y)

## theoretical standard error
B = 1000
est_vector = zeros(B)
for i in 1:B
    y_new = myrandn(true_mu, n)
    est_vector[i] = mean(y_new)
end
std(est_vector)

## classical i.i.d. bootstrap
est_vector_bs = zeros(B)
for i in 1:B
    y_bs = rand(y, n)
    est_vector_bs[i] = mean(y_bs)
end
std(est_vector_bs)

## parametric bootstrap
est_vector_pbs = zeros(B)
for i in 1:B
    y_pbs = myrandn(est_mu, n)
    est_vector_pbs[i] = mean(y_pbs)
end
std(est_vector_pbs)

# ---

# uniform variant used in the handout's Task 2
myrand = function(mu, n)
    rho = 0.9
    res = zeros(n)
    res[1] = rand()
    if n > 1
        for i in 2:n
            res[i] = rho*res[i-1] + (1-rho)*rand()
        end
    end
    res .= mu - 0.5 .+ res
    return res
end
social.qmd (47 lines changed)
@@ -4,22 +4,29 @@ title: Social Program

## Overview

| What | When | Where |
| ---- | ---- | ----- |
| Get-together | Sunday (8th October) 7 pm | At a restaurant called [Metzgerei](https://maps.app.goo.gl/BchdVZWqegxgtHXi8) |
| Meet & Greet | Monday (9th October) | |
| Hike to [Bärenschlössle](https://maps.app.goo.gl/xKQCuboiAR9YQksC7) + dinner | Tuesday (10th October) 6:10pm | We start at the entrance of the [SimTech building](https://maps.app.goo.gl/M8VgfDRkFiZk8wri8) |
| Get-together/Games night | Tuesday (10th October) ~9:15 pm | At a student pub called [Unithekle](https://maps.app.goo.gl/NHdeJvjHiEPfvRh59) |
| Conference dinner at [Sophies Brauhaus](https://sophies-brauhaus.de/en/) | Wednesday (11th October) 7:00pm | Either at the entrance of the SimTech building (6:20pm) or directly at Sophies Brauhaus (7:00pm) |
| Bowling night | Thursday (12th October) 7:45 pm | At the [bowling centre in Möhringen](https://maps.app.goo.gl/yi4YfoqFXtz5uEhi6) |
## Sunday

Get-together at 7pm at a restaurant called [Metzgerei](https://maps.app.goo.gl/BchdVZWqegxgtHXi8) (vegetarian and vegan options available)

- Closest U-Bahn station(s): Schwab-/Bebelstraße (U2, U29)
- Closest S-Bahn station(s): Schwabstraße (S1, S2, S3, S4, S5, S6, S60)

## Monday
@@ -41,15 +48,15 @@ Conference dinner (and Bene's birthday!)

## Thursday

On Thursday we will meet slightly before 20:00 at [bowling alley Möhringen](https://www.bowling-moehringen.de/) for a friendly tournament between groups.

[Bowling centre in Möhringen :world_map:](https://maps.app.goo.gl/yi4YfoqFXtz5uEhi6) (8pm-10pm)

- We want to start bowling at 8pm. Please be there about 10 minutes earlier to rent bowling shoes.
- Closest U-Bahn station(s): SSB-Zentrum (U3, U8, U12), Vaihinger Straße (U3, U5, U6, U8, U12)
- Closest bus stop(s): Wallgraben (N1)
- Closest S-Bahn station(s): Vaihingen (S1, S2, S3)
- 25 min walk
- We recommend taking the S-Bahn from University to Vaihingen and then changing to the U-Bahn

Beforehand you'll have time to get some food; we prepared [a map](https://www.google.com/maps/d/u/1/edit?mid=1vJQ4JtYGulsYW2M9HvyN73yKr6ESNlk&usp=sharing) with some recommendations on the way to the bowling alley, if you head there straight from campus.