Update Multiple Regression.
This commit is contained in:
parent
bb14fac836
commit
12e0a50946
@ -1,324 +1,316 @@
|
|||||||
---
|
---
|
||||||
editor:
|
editor:
|
||||||
markdown:
|
markdown:
|
||||||
wrap: 72
|
wrap: 72
|
||||||
---
|
---
|
||||||
|
|
||||||
# Multiple Regression Basics
|
# Multiple Regression Basics
|
||||||
|
|
||||||
## Motivation
|
## Motivation
|
||||||
|
|
||||||
### Introductory Example: tree dataset from R
|
### Introductory Example: tree dataset from R
|
||||||
|
|
||||||
``` julia
|
``` julia
|
||||||
using Statistics
|
using Statistics
|
||||||
using Plots
|
using Plots
|
||||||
using RDatasets
|
using RDatasets
|
||||||
|
|
||||||
trees = dataset("datasets", "trees")
|
trees = dataset("datasets", "trees")
|
||||||
|
|
||||||
scatter(trees.Volume, trees.Girth,
|
scatter(trees.Volume, trees.Girth,
|
||||||
legend=false, xlabel="Girth", ylabel="Volume")
|
legend=false, xlabel="Girth", ylabel="Volume")
|
||||||
```
|
```
|
||||||
|
|
||||||
*Aim:* Find relationship between the *response variable* `volume` and
|
*Aim:* Find relationship between the *response variable* `volume` and
|
||||||
the *explanatory variable/covariate* `girth`? Can we predict the volume
|
the *explanatory variable/covariate* `girth`? Can we predict the volume
|
||||||
of a tree given its girth?
|
of a tree given its girth?
|
||||||
|
|
||||||
``` julia
|
``` julia
|
||||||
scatter(trees.Girth, trees.Volume,
|
scatter(trees.Girth, trees.Volume,
|
||||||
legend=false, xlabel="Girth", ylabel="Volume")
|
legend=false, xlabel="Girth", ylabel="Volume")
|
||||||
plot!(x -> -37 + 5*x)
|
plot!(x -> -37 + 5*x)
|
||||||
```
|
```
|
||||||
|
|
||||||
First Guess: There is a linear relation!
|
First Guess: There is a linear relation!
|
||||||
|
|
||||||
## Simple Linear Regression
|
## Simple Linear Regression
|
||||||
|
|
||||||
Main assumption: up to some error term, each measurement of the response
|
Main assumption: up to some error term, each measurement of the response
|
||||||
variable $y_i$ depends linearly on the corresponding value $x_i$ of the
|
variable $y_i$ depends linearly on the corresponding value $x_i$ of the
|
||||||
covariate
|
covariate
|
||||||
|
|
||||||
$\leadsto$ **(Simple) Linear Model:**
|
$\leadsto$ **(Simple) Linear Model:**
|
||||||
$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i=1,...,n,$$
|
$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i=1,...,n,$$
|
||||||
where $\varepsilon_i \sim \mathcal{N}(0,\sigma^2)$ are independent
|
where $\varepsilon_i \sim \mathcal{N}(0,\sigma^2)$ are independent
|
||||||
normally distributed errors with unknown variance $\sigma^2$.
|
normally distributed errors with unknown variance $\sigma^2$.
|
||||||
|
|
||||||
*Task:* Find the straight line that fits best, i.e., find the *optimal*
|
*Task:* Find the straight line that fits best, i.e., find the *optimal*
|
||||||
estimators for $\beta_0$ and $\beta_1$.
|
estimators for $\beta_0$ and $\beta_1$.
|
||||||
|
|
||||||
*Typical choice*: Least squares estimator (= maximum likelihood
|
*Typical choice*: Least squares estimator (= maximum likelihood
|
||||||
estimator for normal errors)
|
estimator for normal errors)
|
||||||
|
|
||||||
$$ (\hat \beta_0, \hat \beta_1) = \mathrm{argmin} \ \| \mathbf{y} - \mathbf{1} \beta_0 - \mathbf{x} \beta_1\|^2 $$
|
$$ (\hat \beta_0, \hat \beta_1) = \mathrm{argmin} \ \| \mathbf{y} - \mathbf{1} \beta_0 - \mathbf{x} \beta_1\|^2 $$
|
||||||
|
|
||||||
where $\mathbf{y}$ is the vector of responses, $\mathbf{x}$ is the
|
where $\mathbf{y}$ is the vector of responses, $\mathbf{x}$ is the
|
||||||
vector of covariates and $\mathbf{1}$ is a vector of ones.
|
vector of covariates and $\mathbf{1}$ is a vector of ones.
|
||||||
|
|
||||||
Written in matrix style:
|
Written in matrix style:
|
||||||
|
|
||||||
$$
|
$$
|
||||||
(\hat \beta_0, \hat \beta_1) = \mathrm{argmin} \ \left\| \mathbf{y} - (\mathbf{1},\mathbf{x}) \left( \begin{array}{c} \beta_0\\ \beta_1\end{array}\right) \right\|^2
|
(\hat \beta_0, \hat \beta_1) = \mathrm{argmin} \ \left\| \mathbf{y} - (\mathbf{1},\mathbf{x}) \left( \begin{array}{c} \beta_0\\ \beta_1\end{array}\right) \right\|^2
|
||||||
$$
|
$$
|
||||||
|
|
||||||
Note: There is a closed-form expression for
|
Note: There is a closed-form expression for
|
||||||
$(\hat \beta_0, \hat \beta_1)$. We will not make use of it here, but
|
$(\hat \beta_0, \hat \beta_1)$. We will not make use of it here, but
|
||||||
rather use Julia to solve the problem.
|
rather use Julia to solve the problem.
|
||||||
|
|
||||||
``` julia
|
``` julia
|
||||||
lm(@formula(Volume ~ Girth), trees)
|
lm(@formula(Volume ~ Girth), trees)
|
||||||
```
|
```
|
||||||
|
|
||||||
*Interpretation of the Julia output:*
|
*Interpretation of the Julia output:*
|
||||||
|
|
||||||
- column `estimate` : least square estimates for $\hat \beta_0$ and
|
- column `estimate` : least square estimates for $\hat \beta_0$ and
|
||||||
$\hat \beta_1$
|
$\hat \beta_1$
|
||||||
|
|
||||||
- column `Std. Error` : estimated standard deviation
|
- column `Std. Error` : estimated standard deviation
|
||||||
$\hat s_{\hat \beta_i}$ of the estimator $\hat \beta_i$
|
$\hat s_{\hat \beta_i}$ of the estimator $\hat \beta_i$
|
||||||
|
|
||||||
- column `t value` : value of the $t$-statistics
|
- column `t value` : value of the $t$-statistics
|
||||||
|
|
||||||
$$ t_i = {\hat \beta_i \over \hat s_{\hat \beta_i}}, \quad i=0,1, $$
|
$$ t_i = {\hat \beta_i \over \hat s_{\hat \beta_i}}, \quad i=0,1, $$
|
||||||
|
|
||||||
Under the hypothesis $\beta_i=0$, the test statistics $t_i$ would
|
Under the hypothesis $\beta_i=0$, the test statistics $t_i$ would
|
||||||
follow a $t$-distribution.
|
follow a $t$-distribution.
|
||||||
|
|
||||||
- column `Pr(>|t|)`: $p$-values for the hypotheses $\beta_i=0$ for
|
- column `Pr(>|t|)`: $p$-values for the hypotheses $\beta_i=0$ for
|
||||||
$i=0,1$
|
$i=0,1$
|
||||||
|
|
||||||
::: callout-tip
|
::: callout-tip
|
||||||
The command `rand(n)` generates a sample of `n` "random" (i.e.,
|
The command `rand(n)` generates a sample of `n` "random" (i.e.,
|
||||||
uniformly distributed) random numbers. If you want to sample from
|
uniformly distributed) random numbers. If you want to sample from
|
||||||
another distribution, use the `Distributions` package, define an object
|
another distribution, use the `Distributions` package, define an object
|
||||||
being the distribution of interest, e.g. `d = Normal(0.0, 2.0)` for a
|
being the distribution of interest, e.g. `d = Normal(0.0, 2.0)` for a
|
||||||
normal distribution with mean 0.0 and standard deviation 2.0, and sample
|
normal distribution with mean 0.0 and standard deviation 2.0, and sample
|
||||||
`n` times from this distribution by `rand(d, n)`.
|
`n` times from this distribution by `rand(d, n)`.
|
||||||
:::
|
:::
|
||||||
|
|
||||||
::: {.callout-caution collapse="false"}
|
::: {.callout-caution collapse="false"}
|
||||||
## Task 1
|
## Task 1
|
||||||
|
|
||||||
1. Generate $n=20$ covariates $\mathbf{x}$ randomly.
|
1. Generate $n=20$ covariates $\mathbf{x}$ randomly.
|
||||||
2. Given these covariates and the true parameters $\beta_0=-3$,
|
2. Given these covariates and the true parameters $\beta_0=-3$,
|
||||||
$\beta_1=2$ and $\sigma=0.5$, simulate responses from a linear model
|
$\beta_1=2$ and $\sigma=0.5$, simulate responses from a linear model
|
||||||
(with normally distributed errors) and estimate the coefficients
|
(with normally distributed errors) and estimate the coefficients
|
||||||
$\beta_0$ and $\beta_1$.
|
$\beta_0$ and $\beta_1$.
|
||||||
3. Play with different choices of the parameters above (including the
|
3. Play with different choices of the parameters above (including the
|
||||||
sample size $n$) to see the effects on the parameter estimates and
|
sample size $n$) to see the effects on the parameter estimates and
|
||||||
the $p$-values.
|
the $p$-values.
|
||||||
:::
|
:::
|
||||||
|
|
||||||
## Multiple Regression Model
|
## Multiple Regression Model
|
||||||
|
|
||||||
*Idea*: Generalize the simple linear regression model to multiple
|
*Idea*: Generalize the simple linear regression model to multiple
|
||||||
covariates, w.g., predict `volume` using `girth` and \`height\`\`.
|
covariates, w.g., predict `volume` using `girth` and \`height\`\`.
|
||||||
|
|
||||||
$\leadsto$ **Linear Model:**
|
$\leadsto$ **Linear Model:**
|
||||||
$$y_i = \beta_0 + \beta_1 x_{i1} + \ldots + \beta_p x_{ip} + \varepsilon_i, \qquad i=1,...,n,$$where
|
$$y_i = \beta_0 + \beta_1 x_{i1} + \ldots + \beta_p x_{ip} + \varepsilon_i, \qquad i=1,...,n,$$where
|
||||||
|
|
||||||
- $y_i$: $i$-th measurement of the response,
|
- $y_i$: $i$-th measurement of the response,
|
||||||
|
|
||||||
- $x_{i1}$: $i$ th value of first covariate,
|
- $x_{i1}$: $i$ th value of first covariate,
|
||||||
|
|
||||||
- ...
|
- ...
|
||||||
|
|
||||||
- $x_{ip}$: $i$-th value of $p$-th covariate,
|
- $x_{ip}$: $i$-th value of $p$-th covariate,
|
||||||
|
|
||||||
- $\varepsilon_i \sim \mathcal{N}(0,\sigma^2)$: independent normally
|
- $\varepsilon_i \sim \mathcal{N}(0,\sigma^2)$: independent normally
|
||||||
distributed errors with unknown variance $\sigma^2$.
|
distributed errors with unknown variance $\sigma^2$.
|
||||||
|
|
||||||
*Task:* Find the *optimal* estimators for
|
*Task:* Find the *optimal* estimators for
|
||||||
$\mathbf{\beta} = (\beta_0, \beta_1, \ldots, \beta_p)$.
|
$\mathbf{\beta} = (\beta_0, \beta_1, \ldots, \beta_p)$.
|
||||||
|
|
||||||
*Our choice again:* Least squares estimator (= maximum likelihood
|
*Our choice again:* Least squares estimator (= maximum likelihood
|
||||||
estimator for normal errors)
|
estimator for normal errors)
|
||||||
|
|
||||||
$$
|
$$
|
||||||
\hat \beta = \mathrm{argmin} \ \| \mathbf{y} - \mathbf{1} \beta_0 - \mathbf{x}_1 \beta_1 - \ldots - \mathbf{x}_p \beta_p\|^2
|
\hat \beta = \mathrm{argmin} \ \| \mathbf{y} - \mathbf{1} \beta_0 - \mathbf{x}_1 \beta_1 - \ldots - \mathbf{x}_p \beta_p\|^2
|
||||||
$$
|
$$
|
||||||
|
|
||||||
where $\mathbf{y}$ is the vector of responses, $\mathbf{x}$\_j is the
|
where $\mathbf{y}$ is the vector of responses, $\mathbf{x}$\_j is the
|
||||||
vector of the $j$ th covariate and $\mathbf{1}$ is a vector of ones.
|
vector of the $j$ th covariate and $\mathbf{1}$ is a vector of ones.
|
||||||
|
|
||||||
Written in matrix style:
|
Written in matrix style:
|
||||||
|
|
||||||
$$
|
$$
|
||||||
\mathbf{\hat \beta} = \mathrm{argmin} \ \left\| \mathbf{y} - (\mathbf{1},\mathbf{x}_1,\ldots,\mathbf{x}_p) \left( \begin{array}{c} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p\end{array} \right) \right\|^2
|
\mathbf{\hat \beta} = \mathrm{argmin} \ \left\| \mathbf{y} - (\mathbf{1},\mathbf{x}_1,\ldots,\mathbf{x}_p) \left( \begin{array}{c} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p\end{array} \right) \right\|^2
|
||||||
$$
|
$$
|
||||||
|
|
||||||
Defining the *design matrix*
|
Defining the *design matrix*
|
||||||
|
|
||||||
$$ \mathbf{X} = \left( \begin{array}{cccc}
|
$$ \mathbf{X} = \left( \begin{array}{cccc}
|
||||||
1 & x_{11} & \ldots & x_{1p} \\
|
1 & x_{11} & \ldots & x_{1p} \\
|
||||||
\vdots & \vdots & \ddots & \vdots \\
|
\vdots & \vdots & \ddots & \vdots \\
|
||||||
1 & x_{11} & \ldots & x_{1p}
|
1 & x_{11} & \ldots & x_{1p}
|
||||||
\end{array}\right) \qquad
|
\end{array}\right) \qquad
|
||||||
(\text{size } n \times (p+1)), $$
|
(\text{size } n \times (p+1)), $$
|
||||||
|
|
||||||
we get the short form
|
we get the short form
|
||||||
|
|
||||||
$$
|
$$
|
||||||
\mathbf{\hat \beta} = \mathrm{argmin} \ \| \mathbf{y} - \mathbf{X} \mathbf{\beta} \|^2 = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{y}
|
\mathbf{\hat \beta} = \mathrm{argmin} \ \| \mathbf{y} - \mathbf{X} \mathbf{\beta} \|^2 = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{y}
|
||||||
$$
|
$$
|
||||||
|
|
||||||
\[use Julia code (existing package) to perform linear regression for
|
\[use Julia code (existing package) to perform linear regression for
|
||||||
`volume ~ girth + height`\]
|
`volume ~ girth + height`\]
|
||||||
|
|
||||||
The interpretation of the Julia output is similar to the simple linear
|
The interpretation of the Julia output is similar to the simple linear
|
||||||
regression model, but we provide explicit formulas now:
|
regression model, but we provide explicit formulas now:
|
||||||
|
|
||||||
- parameter estimates:
|
- parameter estimates:
|
||||||
|
|
||||||
$$
|
$$
|
||||||
(\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{y}
|
(\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{y}
|
||||||
$$
|
$$
|
||||||
|
|
||||||
- estimated standard errors:
|
- estimated standard errors:
|
||||||
|
|
||||||
$$
|
$$
|
||||||
\hat s_{\beta_i} = \sqrt{([\mathbf{X}^\top \mathbf{X}]^{-1})_{ii} \frac 1 {n-p} \|\mathbf{y} - \mathbf{X} \beta\|^2}
|
\hat s_{\beta_i} = \sqrt{([\mathbf{X}^\top \mathbf{X}]^{-1})_{ii} \frac 1 {n-p} \|\mathbf{y} - \mathbf{X} \beta\|^2}
|
||||||
$$
|
$$
|
||||||
|
|
||||||
- $t$-statistics:
|
- $t$-statistics:
|
||||||
|
|
||||||
$$ t_i = \frac{\hat \beta_i}{\hat s_{\hat \beta_i}}, \qquad i=0,\ldots,p. $$
|
$$ t_i = \frac{\hat \beta_i}{\hat s_{\hat \beta_i}}, \qquad i=0,\ldots,p. $$
|
||||||
|
|
||||||
- $p$-values:
|
- $p$-values:
|
||||||
|
|
||||||
$$
|
$$
|
||||||
p\text{-value} = \mathbb{P}(|T| > t_i), \quad \text{where } T \sim t_{n-p}
|
p\text{-value} = \mathbb{P}(|T| > t_i), \quad \text{where } T \sim t_{n-p}
|
||||||
$$
|
$$
|
||||||
|
|
||||||
::: {.callout-caution collapse="false"}
|
::: {.callout-caution collapse="false"}
|
||||||
|
|
||||||
## Task 2
|
## Task 2
|
||||||
|
|
||||||
1. Implement functions that estimate the $\beta$-parameters,
|
1. Implement functions that estimate the $\beta$-parameters,
|
||||||
the corresponding standard errors and the $t$-statistics.
|
the corresponding standard errors and the $t$-statistics.
|
||||||
2. Test your functions with the `tree' data set and try to reproduce the
|
2. Test your functions with the `tree' data set and try to reproduce the
|
||||||
output above.
|
output above.
|
||||||
:::
|
:::
|
||||||
|
|
||||||
Which model is the best? For linear models, one often uses the $R^2$ characteristic.
|
Which model is the best? For linear models, one often uses the $R^2$ characteristic.
|
||||||
Roughly speaking, it gives the percentage (between 0 and 1) of the variance that can be explained by the linear model.
|
Roughly speaking, it gives the percentage (between 0 and 1) of the variance that can be explained by the linear model.
|
||||||
|
|
||||||
``` julia
|
``` julia
|
||||||
r2(linmod1)
|
r2(linmod1)
|
||||||
r2(linmod2)
|
r2(linmod2)
|
||||||
|
|
||||||
linmod3 = lm(@formula(Volume ~ Girth + Height + Girth*Height), trees)
|
linmod3 = lm(@formula(Volume ~ Girth + Height + Girth*Height), trees)
|
||||||
|
|
||||||
r2(linmod3)
|
r2(linmod3)
|
||||||
```
|
```
|
||||||
|
|
||||||
## Generalized Linear Models
|
## Generalized Linear Models
|
||||||
|
|
||||||
Classical linear model
|
Classical linear model
|
||||||
|
|
||||||
$$
|
$$
|
||||||
\mathbf{y} = \mathbf{X} \beta + \varepsilon
|
\mathbf{y} = \mathbf{X} \beta + \varepsilon
|
||||||
$$
|
$$
|
||||||
|
|
||||||
implies that
|
implies that
|
||||||
$$ \mathbf{y} \mid \mathbf{X} \sim \mathcal{N}(\mathbf{X} \mathbf{\beta}, \sigma^2\mathbf{I}).$$
|
$$ \mathbf{y} \mid \mathbf{X} \sim \mathcal{N}(\mathbf{X} \mathbf{\beta}, \sigma^2\mathbf{I}).$$
|
||||||
|
|
||||||
In particular, the conditional expectation satisfies
|
In particular, the conditional expectation satisfies
|
||||||
$\mathbb{E}(\mathbf{y} \mid \mathbf{X}) = \mathbf{X} \beta$.
|
$\mathbb{E}(\mathbf{y} \mid \mathbf{X}) = \mathbf{X} \beta$.
|
||||||
|
|
||||||
However, the assumption of a normal distribution is unrealistic for
|
However, the assumption of a normal distribution is unrealistic for
|
||||||
non-continuous data. Popular alternatives include:
|
non-continuous data. Popular alternatives include:
|
||||||
|
|
||||||
- for counting data: $$
|
- for counting data: $$
|
||||||
\mathbf{y} \mid \mathbf{X} \sim \mathrm{Poisson}(\exp(\mathbf{X}\mathbf{\beta})) \qquad \leadsto \mathbb{E}(\mathbf{y} \mid \mathbf{X}) = \exp(\mathbf{X} \beta)
|
\mathbf{y} \mid \mathbf{X} \sim \mathrm{Poisson}(\exp(\mathbf{X}\mathbf{\beta})) \qquad \leadsto \mathbb{E}(\mathbf{y} \mid \mathbf{X}) = \exp(\mathbf{X} \beta)
|
||||||
$$
|
$$
|
||||||
|
|
||||||
Here, the components are considered to be independent and the
|
Here, the components are considered to be independent and the
|
||||||
exponential function is applied componentwise.
|
exponential function is applied componentwise.
|
||||||
|
|
||||||
- for binary data: $$
|
- for binary data: $$
|
||||||
\mathbf{y} \mid \mathbf{X} \sim \mathrm{Bernoulli}\left( \frac{\exp(\mathbf{X}\mathbf{\beta})}{1 + \exp(\mathbf{X}\mathbf{\beta})} \right) \qquad \leadsto \mathbb{E}(\mathbf{y} \mid \mathbf{X}) = \frac{\exp(\mathbf{X}\mathbf{\beta})}{1 + \exp(\mathbf{X}\mathbf{\beta})}
|
\mathbf{y} \mid \mathbf{X} \sim \mathrm{Bernoulli}\left( \frac{\exp(\mathbf{X}\mathbf{\beta})}{1 + \exp(\mathbf{X}\mathbf{\beta})} \right) \qquad \leadsto \mathbb{E}(\mathbf{y} \mid \mathbf{X}) = \frac{\exp(\mathbf{X}\mathbf{\beta})}{1 + \exp(\mathbf{X}\mathbf{\beta})}
|
||||||
$$
|
$$
|
||||||
|
|
||||||
Again, the components are considered to be independent and all the
|
Again, the components are considered to be independent and all the
|
||||||
operations are applied componentwise.
|
operations are applied componentwise.
|
||||||
|
|
||||||
All these models are defined by the choice of a family of distributions
|
All these models are defined by the choice of a family of distributions
|
||||||
and a function $g$ (the so-called *link function*) such that
|
and a function $g$ (the so-called *link function*) such that
|
||||||
|
|
||||||
$$
|
$$
|
||||||
\mathbb{E}(\mathbf{y} \mid \mathbf{X}) = g^{-1}(\mathbf{X} \beta).
|
\mathbb{E}(\mathbf{y} \mid \mathbf{X}) = g^{-1}(\mathbf{X} \beta).
|
||||||
$$
|
$$
|
||||||
|
|
||||||
For the models above, these are:
|
For the models above, these are:
|
||||||
|
|
||||||
+----------------+----------------+----------------+
|
+----------------+------------------+--------------------------+
|
||||||
| Type of Data | Distribution | Link Function |
|
| Type of Data | Distribution | Link Function |
|
||||||
| | Family | |
|
| | Family | |
|
||||||
+================+================+================+
|
+================+==================+==========================+
|
||||||
| continuous | Normal | identity: |
|
| continuous | Normal | identity: |
|
||||||
| | | |
|
| | | |
|
||||||
| | | $$ |
|
| | | $$ |
|
||||||
| | | g(x)=x |
|
| | | g(x)=x |
|
||||||
| | | $$ |
|
| | | $$ |
|
||||||
+----------------+----------------+----------------+
|
+----------------+------------------+--------------------------+
|
||||||
| count | Poisson | log: |
|
| count | Poisson | log: |
|
||||||
| | | |
|
| | | |
|
||||||
| | | \$\$ |
|
| | | $$ |
|
||||||
| | | |
|
| | | g(x) = \log(x) |
|
||||||
| | | g(x) = \log(x) |
|
| | | $$ |
|
||||||
| | | \$\$ |
|
+----------------+------------------+--------------------------+
|
||||||
+----------------+----------------+----------------+
|
| binary | Bernoulli | logit: |
|
||||||
| binary | Bernoulli | logit: |
|
| | | |
|
||||||
| | | |
|
| | | $$ |
|
||||||
| | | $$ |
|
| | | g(x) = \log\left( |
|
||||||
| | | g |
|
| | | \frac{x}{1-x} |
|
||||||
| | | ( |
|
| | | \right) |
|
||||||
| | | x) = \log\left |
|
| | | $$ |
|
||||||
| | | ( |
|
+----------------+------------------+--------------------------+
|
||||||
| | | \ |
|
|
||||||
| | | f |
|
In general, the parameter vector $\beta$ is estimated via maximizing the
|
||||||
| | | ra |
|
likelihood, i.e.,
|
||||||
| | | c |
|
|
||||||
| | | { |
|
$$
|
||||||
| | | x}{1-x}\right) |
|
\hat \beta = \mathrm{argmax} \prod_{i=1}^n f(y_i \mid \mathbf{X}_{\cdot i}),
|
||||||
| | | $$ |
|
$$
|
||||||
+----------------+----------------+----------------+
|
|
||||||
|
which is equivalent to the maximization of the log-likelihood, i.e.,
|
||||||
In general, the parameter vector $\beta$ is estimated via maximizing the
|
|
||||||
likelihood, i.e.,
|
$$
|
||||||
|
\hat \beta = \mathrm{argmax} \sum_{i=1}^n \log f(y_i \mid \mathbf{X}_{\cdot i}),
|
||||||
$$
|
$$
|
||||||
\hat \beta = \mathrm{argmax} \prod_{i=1}^n f(y_i \mid \mathbf{X}_{\cdot i}),
|
|
||||||
$$
|
In the Gaussian case, the maximum likelihood estimator is identical to
|
||||||
|
the least squares estimator considered above.
|
||||||
which is equivalent to the maximization of the log-likelihood, i.e.,
|
|
||||||
|
``` julia
|
||||||
$$
|
using CSV
|
||||||
\hat \beta = \mathrm{argmax} \sum_{i=1}^n \log f(y_i \mid \mathbf{X}_{\cdot i}),
|
using HTTP
|
||||||
$$
|
|
||||||
|
http_response = HTTP.get("https://vincentarelbundock.github.io/Rdatasets/csv/AER/SwissLabor.csv")
|
||||||
In the Gaussian case, the maximum likelihood estimator is identical to
|
SwissLabor = DataFrame(CSV.File(http_response.body))
|
||||||
the least squares estimator considered above.
|
|
||||||
|
SwissLabor[!,"participation"] .= (SwissLabor.participation .== "yes")
|
||||||
``` julia
|
|
||||||
using CSV
|
model = glm(@formula(participation ~ age^2),
|
||||||
using HTTP
|
SwissLabor, Binomial(), ProbitLink())
|
||||||
|
```
|
||||||
http_response = HTTP.get("https://vincentarelbundock.github.io/Rdatasets/csv/AER/SwissLabor.csv")
|
|
||||||
SwissLabor = DataFrame(CSV.File(http_response.body))
|
::: {.callout-caution collapse="false"}
|
||||||
|
|
||||||
SwissLabor[!,"participation"] .= (SwissLabor.participation .== "yes")
|
## Task 3:
|
||||||
|
|
||||||
model = glm(@formula(participation ~ age^2),
|
1. Reproduce the results of our data analysis of the `tree` data set
|
||||||
SwissLabor, Binomial(), ProbitLink())
|
using a generalized linear model with normal distribution family.
|
||||||
```
|
2. Generate $n=20$ random covariates $\mathbf{x}$ and Poisson-distributed counting data with parameters $\beta_0 + \beta_1 x_i$. Re-estimate the parameters by a generalized linear model.
|
||||||
|
:::
|
||||||
::: {.callout-caution collapse="false"}
|
|
||||||
|
|
||||||
## Task 3:
|
|
||||||
|
|
||||||
1. Reproduce the results of our data analysis of the `tree` data set
|
|
||||||
using a generalized linear model with normal distribution family.
|
|
||||||
2. Generate $n=20$ random covariates $\mathbf{x}$ and Poisson-distributed counting data with parameters $\beta_0 + \beta_1 x_i$. Re-estimate the parameters by a generalized linear model.
|
|
||||||
:::
|
|
||||||
|
Loading…
x
Reference in New Issue
Block a user