Correct note on Multiple Regression.

2023-10-10 07:53:07 +02:00
parent 72e4f8da83
commit c74be9a6d7
1 changed files with 320 additions and 319 deletions
--- a/material/3_wed/regression/MultipleRegressionBasics.qmd
+++ b/material/3_wed/regression/MultipleRegressionBasics.qmd
@@ -1,319 +1,320 @@
---
+---
-editor: 
+editor: 
-  markdown: 
+  markdown: 
-    wrap: 72
+    wrap: 72
---
+---
-
+
-# Multiple Regression Basics
+# Multiple Regression Basics
-
+
-## Motivation
+## Motivation
-
+
-### Introductory Example: tree dataset from R
+### Introductory Example: tree dataset from R
-
+
-``` julia
+``` julia
-using Statistics
+using Statistics
-using Plots
+using Plots
-using RDatasets
+using RDatasets
-
+
-trees = dataset("datasets", "trees")
+trees = dataset("datasets", "trees")
-
+
-scatter(trees.Volume, trees.Girth,
+scatter(trees.Volume, trees.Girth,
-        legend=false, xlabel="Girth", ylabel="Volume")
+        legend=false, xlabel="Girth", ylabel="Volume")
-```
+```
-
+
-*Aim:* Find relationship between the *response variable* `volume` and
+*Aim:* Find relationship between the *response variable* `volume` and
-the *explanatory variable/covariate* `girth`? Can we predict the volume
+the *explanatory variable/covariate* `girth`? Can we predict the volume
-of a tree given its girth?
+of a tree given its girth?
-
+
-``` julia
+``` julia
-scatter(trees.Girth, trees.Volume,
+scatter(trees.Girth, trees.Volume,
-        legend=false, xlabel="Girth", ylabel="Volume")
+        legend=false, xlabel="Girth", ylabel="Volume")
-plot!(x -> -37 + 5*x)
+plot!(x -> -37 + 5*x)
-```
+```
-
+
-First Guess: There is a linear relation!
+First Guess: There is a linear relation!
-
+
-## Simple Linear Regression
+## Simple Linear Regression
-
+
-Main assumption: up to some error term, each measurement of the response
+Main assumption: up to some error term, each measurement of the response
-variable $y_i$ depends linearly on the corresponding value $x_i$ of the
+variable $y_i$ depends linearly on the corresponding value $x_i$ of the
-covariate
+covariate
-
+
-$\leadsto$ **(Simple) Linear Model:**
+$\leadsto$ **(Simple) Linear Model:**
-$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i,   \qquad  i=1,...,n,$$
+$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i,   \qquad  i=1,...,n,$$
-where $\varepsilon_i \sim \mathcal{N}(0,\sigma^2)$ are independent
+where $\varepsilon_i \sim \mathcal{N}(0,\sigma^2)$ are independent
-normally distributed errors with unknown variance $\sigma^2$.
+normally distributed errors with unknown variance $\sigma^2$.
-
+
-*Task:* Find the straight line that fits best, i.e., find the *optimal*
+*Task:* Find the straight line that fits best, i.e., find the *optimal*
-estimators for $\beta_0$ and $\beta_1$.
+estimators for $\beta_0$ and $\beta_1$.
-
+
-*Typical choice*: Least squares estimator (= maximum likelihood
+*Typical choice*: Least squares estimator (= maximum likelihood
-estimator for normal errors)
+estimator for normal errors)
-
+
-$$ (\hat \beta_0, \hat \beta_1) = \mathrm{argmin} \ \| \mathbf{y} - \mathbf{1} \beta_0 - \mathbf{x} \beta_1\|^2 $$
+$$ (\hat \beta_0, \hat \beta_1) = \mathrm{argmin} \ \| \mathbf{y} - \mathbf{1} \beta_0 - \mathbf{x} \beta_1\|^2 $$
-
+
-where $\mathbf{y}$ is the vector of responses, $\mathbf{x}$ is the
+where $\mathbf{y}$ is the vector of responses, $\mathbf{x}$ is the
-vector of covariates and $\mathbf{1}$ is a vector of ones.
+vector of covariates and $\mathbf{1}$ is a vector of ones.
-
+
-Written in matrix style:
+Written in matrix style:
-
+
-$$
+$$
- (\hat \beta_0, \hat \beta_1) = \mathrm{argmin} \ \left\| \mathbf{y} - (\mathbf{1},\mathbf{x}) \left( \begin{array}{c} \beta_0\\ \beta_1\end{array}\right) \right\|^2
+ (\hat \beta_0, \hat \beta_1) = \mathrm{argmin} \ \left\| \mathbf{y} - (\mathbf{1},\mathbf{x}) \left( \begin{array}{c} \beta_0\\ \beta_1\end{array}\right) \right\|^2
-$$
+$$
-
+
-Note: There is a closed-form expression for
+Note: There is a closed-form expression for
-$(\hat \beta_0, \hat \beta_1)$. We will not make use of it here, but
+$(\hat \beta_0, \hat \beta_1)$. We will not make use of it here, but
-rather use Julia to solve the problem.
+rather use Julia to solve the problem.
-
+
-``` julia
+``` julia
-lm(@formula(Volume ~ Girth), trees)
+lm(@formula(Volume ~ Girth), trees)
-```
+```
-
+
-*Interpretation of the Julia output:*
+*Interpretation of the Julia output:*
-
+
-   column `estimate` : least square estimates for $\hat \beta_0$ and
+-   column `estimate` : least square estimates for $\hat \beta_0$ and
-    $\hat \beta_1$
+    $\hat \beta_1$
-
+
-   column `Std. Error` : estimated standard deviation
+-   column `Std. Error` : estimated standard deviation
-    $\hat s_{\hat \beta_i}$ of the estimator $\hat \beta_i$
+    $\hat s_{\hat \beta_i}$ of the estimator $\hat \beta_i$
-
+
-   column `t value` : value of the $t$-statistics
+-   column `t value` : value of the $t$-statistics
-
+
-    $$ t_i = {\hat \beta_i \over \hat s_{\hat \beta_i}}, \quad i=0,1, $$
+    $$ t_i = {\hat \beta_i \over \hat s_{\hat \beta_i}}, \quad i=0,1, $$
-
+
-    Under the hypothesis $\beta_i=0$, the test statistics $t_i$ would
+    Under the hypothesis $\beta_i=0$, the test statistics $t_i$ would
-    follow a $t$-distribution.
+    follow a $t$-distribution.
-
+
-   column `Pr(>|t|)`: $p$-values for the hypotheses $\beta_i=0$ for
+-   column `Pr(>|t|)`: $p$-values for the hypotheses $\beta_i=0$ for
-    $i=0,1$
+    $i=0,1$
-
+
-::: callout-tip
+::: callout-tip
-The command `rand(n)` generates a sample of `n` "random" (i.e.,
+The command `rand(n)` generates a sample of `n` "random" (i.e.,
-uniformly distributed) random numbers. If you want to sample from
+uniformly distributed) random numbers. If you want to sample from
-another distribution, use the `Distributions` package, define an object
+another distribution, use the `Distributions` package, define an object
-being the distribution of interest, e.g. `d = Normal(0.0, 2.0)` for a
+being the distribution of interest, e.g. `d = Normal(0.0, 2.0)` for a
-normal distribution with mean 0.0 and standard deviation 2.0, and sample
+normal distribution with mean 0.0 and standard deviation 2.0, and sample
-`n` times from this distribution by `rand(d, n)`.
+`n` times from this distribution by `rand(d, n)`.
-:::
+:::
-
+
-::: {.callout-caution collapse="false"}
+::: {.callout-caution collapse="false"}
-## Task 1
+## Task 1
-
+
-1.  Generate $n=20$ covariates $\mathbf{x}$ randomly.
+1.  Generate $n=20$ covariates $\mathbf{x}$ randomly.
-2.  Given these covariates and the true parameters $\beta_0=-3$,
+2.  Given these covariates and the true parameters $\beta_0=-3$,
-    $\beta_1=2$ and $\sigma=0.5$, simulate responses from a linear model
+    $\beta_1=2$ and $\sigma=0.5$, simulate responses from a linear model
-    (with normally distributed errors) and estimate the coefficients
+    (with normally distributed errors) and estimate the coefficients
-    $\beta_0$ and $\beta_1$.
+    $\beta_0$ and $\beta_1$.
-3.  Play with different choices of the parameters above (including the
+3.  Play with different choices of the parameters above (including the
-    sample size $n$) to see the effects on the parameter estimates and
+    sample size $n$) to see the effects on the parameter estimates and
-    the $p$-values.
+    the $p$-values.
-
+:::
-::: {callout-note}
+
-The more covariates you add the more variance can be explained by the linear model - $R^2$ increases. In order to balance goodness-of-tit of a model and its complexity, information criteria such as `aic` are considered.
+## Multiple Regression Model
-:::
+
-
+*Idea*: Generalize the simple linear regression model to multiple
-## Multiple Regression Model
+covariates, w.g., predict `volume` using `girth` and \`height\`\`.
-
+
-*Idea*: Generalize the simple linear regression model to multiple
+$\leadsto$ **Linear Model:**
-covariates, w.g., predict `volume` using `girth` and \`height\`\`.
+$$y_i = \beta_0 + \beta_1 x_{i1} + \ldots + \beta_p x_{ip} + \varepsilon_i,   \qquad  i=1,...,n,$$where
-
+
-$\leadsto$ **Linear Model:**
+-   $y_i$: $i$-th measurement of the response,
-$$y_i = \beta_0 + \beta_1 x_{i1} + \ldots + \beta_p x_{ip} + \varepsilon_i,   \qquad  i=1,...,n,$$where
+
-
+-   $x_{i1}$: $i$ th value of first covariate,
-   $y_i$: $i$-th measurement of the response,
+
-
+-   ...
-   $x_{i1}$: $i$ th value of first covariate,
+
-
+-   $x_{ip}$: $i$-th value of $p$-th covariate,
-   ...
+
-
+-   $\varepsilon_i \sim \mathcal{N}(0,\sigma^2)$: independent normally
-   $x_{ip}$: $i$-th value of $p$-th covariate,
+    distributed errors with unknown variance $\sigma^2$.
-
+
-   $\varepsilon_i \sim \mathcal{N}(0,\sigma^2)$: independent normally
+*Task:* Find the *optimal* estimators for
-    distributed errors with unknown variance $\sigma^2$.
+$\mathbf{\beta} = (\beta_0, \beta_1, \ldots, \beta_p)$.
-
+
-*Task:* Find the *optimal* estimators for
+*Our choice again:* Least squares estimator (= maximum likelihood
-$\mathbf{\beta} = (\beta_0, \beta_1, \ldots, \beta_p)$.
+estimator for normal errors)
-
+
-*Our choice again:* Least squares estimator (= maximum likelihood
+$$
-estimator for normal errors)
+ \hat \beta = \mathrm{argmin} \ \| \mathbf{y} - \mathbf{1} \beta_0 - \mathbf{x}_1 \beta_1 - \ldots - \mathbf{x}_p \beta_p\|^2
-
+$$
-$$
+
- \hat \beta = \mathrm{argmin} \ \| \mathbf{y} - \mathbf{1} \beta_0 - \mathbf{x}_1 \beta_1 - \ldots - \mathbf{x}_p \beta_p\|^2
+where $\mathbf{y}$ is the vector of responses, $\mathbf{x}$\_j is the
-$$
+vector of the $j$ th covariate and $\mathbf{1}$ is a vector of ones.
-
+
-where $\mathbf{y}$ is the vector of responses, $\mathbf{x}$\_j is the
+Written in matrix style:
-vector of the $j$ th covariate and $\mathbf{1}$ is a vector of ones.
+
-
+$$
-Written in matrix style:
+\mathbf{\hat \beta} = \mathrm{argmin} \ \left\| \mathbf{y} - (\mathbf{1},\mathbf{x}_1,\ldots,\mathbf{x}_p) \left( \begin{array}{c} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p\end{array} \right) \right\|^2
-
+$$
-$$
+
-\mathbf{\hat \beta} = \mathrm{argmin} \ \left\| \mathbf{y} - (\mathbf{1},\mathbf{x}_1,\ldots,\mathbf{x}_p) \left( \begin{array}{c} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p\end{array} \right) \right\|^2
+Defining the *design matrix*
-$$
+
-
+$$ \mathbf{X} = \left( \begin{array}{cccc}
-Defining the *design matrix*
+                1 & x_{11} & \ldots & x_{1p} \\
-
+                \vdots & \vdots & \ddots & \vdots \\
-$$ \mathbf{X} = \left( \begin{array}{cccc}
+                1 & x_{11} & \ldots & x_{1p} 
-                1 & x_{11} & \ldots & x_{1p} \\
+                \end{array}\right) \qquad 
-                \vdots & \vdots & \ddots & \vdots \\
+  (\text{size } n \times (p+1)), $$
-                1 & x_{11} & \ldots & x_{1p} 
+
-                \end{array}\right) \qquad 
+we get the short form
-  (\text{size } n \times (p+1)), $$
+
-
+$$
-we get the short form
+\mathbf{\hat \beta} = \mathrm{argmin} \ \| \mathbf{y} - \mathbf{X} \mathbf{\beta}  \|^2 = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{y}
-
+$$
-$$
+
-\mathbf{\hat \beta} = \mathrm{argmin} \ \| \mathbf{y} - \mathbf{X} \mathbf{\beta}  \|^2 = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{y}
+\[use Julia code (existing package) to perform linear regression for
-$$
+`volume ~ girth + height`\]
-
+
-\[use Julia code (existing package) to perform linear regression for
+The interpretation of the Julia output is similar to the simple linear
-`volume ~ girth + height`\]
+regression model, but we provide explicit formulas now:
-
+
-The interpretation of the Julia output is similar to the simple linear
+-   parameter estimates:
-regression model, but we provide explicit formulas now:
+
-
+    $$
-   parameter estimates:
+     (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{y}
-
+    $$
-    $$
+
-     (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{y}
+-   estimated standard errors:
-    $$
+
-
+    $$
-   estimated standard errors:
+     \hat s_{\beta_i} = \sqrt{([\mathbf{X}^\top \mathbf{X}]^{-1})_{ii} \frac 1 {n-p} \|\mathbf{y} - \mathbf{X} \beta\|^2}
-
+    $$
-    $$
+
-     \hat s_{\beta_i} = \sqrt{([\mathbf{X}^\top \mathbf{X}]^{-1})_{ii} \frac 1 {n-p} \|\mathbf{y} - \mathbf{X} \beta\|^2}
+-   $t$-statistics:
-    $$
+
-
+    $$ t_i = \frac{\hat \beta_i}{\hat s_{\hat \beta_i}}, \qquad i=0,\ldots,p. $$
-   $t$-statistics:
+
-
+-   $p$-values:
-    $$ t_i = \frac{\hat \beta_i}{\hat s_{\hat \beta_i}}, \qquad i=0,\ldots,p. $$
+
-
+    $$
-   $p$-values:
+    p\text{-value} = \mathbb{P}(|T| > t_i), \quad \text{where } T \sim t_{n-p}
-
+    $$
-    $$
+
-    p\text{-value} = \mathbb{P}(|T| > t_i), \quad \text{where } T \sim t_{n-p}
+::: {.callout-caution collapse="false"}
-    $$
+
-
+## Task 2
-::: {.callout-caution collapse="false"}
+
-
+1. Implement functions that estimate the $\beta$-parameters,
-## Task 2
+the corresponding standard errors and the $t$-statistics. 
-
+2. Test your functions with the `tree' data set and try to reproduce the
-1. Implement functions that estimate the $\beta$-parameters,
+output above.
-the corresponding standard errors and the $t$-statistics. 
+:::
-2. Test your functions with the `tree' data set and try to reproduce the
+
-output above.
+Which model is the best? For linear models, one often uses the $R^2$ characteristic.
-:::
+Roughly speaking, it gives the percentage (between 0 and 1) of the variance that can be explained by the linear model.
-
+
-Which model is the best? For linear models, one often uses the $R^2$ characteristic.
+``` julia
-Roughly speaking, it gives the percentage (between 0 and 1) of the variance that can be explained by the linear model.
+r2(linmod1)
-
+r2(linmod2)
-``` julia
+
-r2(linmod1)
+linmod3 = lm(@formula(Volume ~ Girth + Height + Girth*Height), trees)
-r2(linmod2)
+
-
+r2(linmod3)
-linmod3 = lm(@formula(Volume ~ Girth + Height + Girth*Height), trees)
+```
-
+
-r2(linmod3)
+::: {callout-note}
-```
+The more covariates you add the more variance can be explained by the linear model - $R^2$ increases. In order to balance goodness-of-tit of a model and its complexity, information criteria such as `aic` are considered.
-
+:::
-## Generalized Linear Models
+
-
+## Generalized Linear Models
-Classical linear model
+
-
+Classical linear model
-$$
+
- \mathbf{y} = \mathbf{X} \beta + \varepsilon 
+$$
-$$
+ \mathbf{y} = \mathbf{X} \beta + \varepsilon 
-
+$$
-implies that
+
-$$ \mathbf{y} \mid \mathbf{X} \sim \mathcal{N}(\mathbf{X} \mathbf{\beta}, \sigma^2\mathbf{I}).$$
+implies that
-
+$$ \mathbf{y} \mid \mathbf{X} \sim \mathcal{N}(\mathbf{X} \mathbf{\beta}, \sigma^2\mathbf{I}).$$
-In particular, the conditional expectation satisfies
+
-$\mathbb{E}(\mathbf{y} \mid \mathbf{X}) = \mathbf{X} \beta$.
+In particular, the conditional expectation satisfies
-
+$\mathbb{E}(\mathbf{y} \mid \mathbf{X}) = \mathbf{X} \beta$.
-However, the assumption of a normal distribution is unrealistic for
+
-non-continuous data. Popular alternatives include:
+However, the assumption of a normal distribution is unrealistic for
-
+non-continuous data. Popular alternatives include:
-   for counting data: $$
+
-       \mathbf{y} \mid \mathbf{X} \sim \mathrm{Poisson}(\exp(\mathbf{X}\mathbf{\beta})) \qquad \leadsto  \mathbb{E}(\mathbf{y} \mid \mathbf{X}) = \exp(\mathbf{X} \beta)
+-   for counting data: $$
-      $$
+       \mathbf{y} \mid \mathbf{X} \sim \mathrm{Poisson}(\exp(\mathbf{X}\mathbf{\beta})) \qquad \leadsto  \mathbb{E}(\mathbf{y} \mid \mathbf{X}) = \exp(\mathbf{X} \beta)
-
+      $$
-    Here, the components are considered to be independent and the
+
-    exponential function is applied componentwise.
+    Here, the components are considered to be independent and the
-
+    exponential function is applied componentwise.
-   for binary data: $$
+
-       \mathbf{y} \mid \mathbf{X} \sim \mathrm{Bernoulli}\left( \frac{\exp(\mathbf{X}\mathbf{\beta})}{1 + \exp(\mathbf{X}\mathbf{\beta})} \right) \qquad \leadsto  \mathbb{E}(\mathbf{y} \mid \mathbf{X}) = \frac{\exp(\mathbf{X}\mathbf{\beta})}{1 + \exp(\mathbf{X}\mathbf{\beta})}
+-   for binary data: $$
-      $$
+       \mathbf{y} \mid \mathbf{X} \sim \mathrm{Bernoulli}\left( \frac{\exp(\mathbf{X}\mathbf{\beta})}{1 + \exp(\mathbf{X}\mathbf{\beta})} \right) \qquad \leadsto  \mathbb{E}(\mathbf{y} \mid \mathbf{X}) = \frac{\exp(\mathbf{X}\mathbf{\beta})}{1 + \exp(\mathbf{X}\mathbf{\beta})}
-
+      $$
-    Again, the components are considered to be independent and all the
+
-    operations are applied componentwise.
+    Again, the components are considered to be independent and all the
-
+    operations are applied componentwise.
-All these models are defined by the choice of a family of distributions
+
-and a function $g$ (the so-called *link function*) such that
+All these models are defined by the choice of a family of distributions
-
+and a function $g$ (the so-called *link function*) such that
-$$
+
- \mathbb{E}(\mathbf{y} \mid \mathbf{X}) = g^{-1}(\mathbf{X} \beta).
+$$
-$$
+ \mathbb{E}(\mathbf{y} \mid \mathbf{X}) = g^{-1}(\mathbf{X} \beta).
-
+$$
-For the models above, these are:
+
-
+For the models above, these are:
-+----------------+------------------+--------------------------+
+
-| Type of Data   | Distribution     | Link Function            |
+----------------+------------------+--------------------------+
-|                | Family           |                          |
+| Type of Data   | Distribution     | Link Function            |
-+================+==================+==========================+
+|                | Family           |                          |
-| continuous     | Normal           | identity:                |
+================+==================+==========================+
-|                |                  |                          |
+| continuous     | Normal           | identity:                |
-|                |                  | $$                       |
+|                |                  |                          |
-|                |                  | g(x)=x                   |
+|                |                  | $$                       |
-|                |                  | $$                       |
+|                |                  | g(x)=x                   |
-+----------------+------------------+--------------------------+
+|                |                  | $$                       |
-| count          | Poisson          | log:                     |
+----------------+------------------+--------------------------+
-|                |                  |                          |
+| count          | Poisson          | log:                     |
-|                |                  | $$                       |
+|                |                  |                          |
-|                |                  |  g(x) = \log(x)          |
+|                |                  | $$                       |
-|                |                  | $$                       |
+|                |                  |  g(x) = \log(x)          |
-+----------------+------------------+--------------------------+
+|                |                  | $$                       |
-| binary         | Bernoulli        | logit:                   |
+----------------+------------------+--------------------------+
-|                |                  |                          |
+| binary         | Bernoulli        | logit:                   |
-|                |                  | $$                       |
+|                |                  |                          |
-|                |                  | g(x) = \log\left(        |
+|                |                  | $$                       |
-|                |                  | \frac{x}{1-x}            |
+|                |                  | g(x) = \log\left(        |
-|                |                  | \right)                  |
+|                |                  | \frac{x}{1-x}            |
-|                |                  | $$                       |
+|                |                  | \right)                  |
-+----------------+------------------+--------------------------+
+|                |                  | $$                       |
-
+----------------+------------------+--------------------------+
-In general, the parameter vector $\beta$ is estimated via maximizing the
+
-likelihood, i.e.,
+In general, the parameter vector $\beta$ is estimated via maximizing the
-
+likelihood, i.e.,
-$$
+
-\hat \beta = \mathrm{argmax} \prod_{i=1}^n f(y_i \mid \mathbf{X}_{\cdot i}),
+$$
-$$
+\hat \beta = \mathrm{argmax} \prod_{i=1}^n f(y_i \mid \mathbf{X}_{\cdot i}),
-
+$$
-which is equivalent to the maximization of the log-likelihood, i.e.,
+
-
+which is equivalent to the maximization of the log-likelihood, i.e.,
-$$
+
-\hat \beta = \mathrm{argmax} \sum_{i=1}^n \log f(y_i \mid \mathbf{X}_{\cdot i}),
+$$
-$$
+\hat \beta = \mathrm{argmax} \sum_{i=1}^n \log f(y_i \mid \mathbf{X}_{\cdot i}),
-
+$$
-In the Gaussian case, the maximum likelihood estimator is identical to
+
-the least squares estimator considered above.
+In the Gaussian case, the maximum likelihood estimator is identical to
-
+the least squares estimator considered above.
-``` julia
+
-using CSV
+``` julia
-using HTTP
+using CSV
-
+using HTTP
-http_response = HTTP.get("https://vincentarelbundock.github.io/Rdatasets/csv/AER/SwissLabor.csv")
+
-SwissLabor = DataFrame(CSV.File(http_response.body))
+http_response = HTTP.get("https://vincentarelbundock.github.io/Rdatasets/csv/AER/SwissLabor.csv")
-
+SwissLabor = DataFrame(CSV.File(http_response.body))
-SwissLabor[!,"participation"] .= (SwissLabor.participation .== "yes")
+
-
+SwissLabor[!,"participation"] .= (SwissLabor.participation .== "yes")
-model = glm(@formula(participation ~ age^2), 
+
-            SwissLabor, Binomial(), ProbitLink())
+model = glm(@formula(participation ~ age^2), 
-```
+            SwissLabor, Binomial(), ProbitLink())
-
+```
-::: {.callout-caution collapse="false"}
+
-
+::: {.callout-caution collapse="false"}
-## Task 3:
+
-
+## Task 3:
-1.  Reproduce the results of our data analysis of the `tree` data set
+
-    using a generalized linear model with normal distribution family.
+1.  Reproduce the results of our data analysis of the `tree` data set
-2.  Generate $n=20$ random covariates $\mathbf{x}$ and Poisson-distributed counting      data with parameters $\beta_0 + \beta_1 x_i$. Re-estimate the parameters by a generalized linear model. 
+    using a generalized linear model with normal distribution family.
-:::
+2.  Generate $n=20$ random covariates $\mathbf{x}$ and Poisson-distributed counting      data with parameters $\beta_0 + \beta_1 x_i$. Re-estimate the parameters by a generalized linear model. 
 :::