diff --git a/material/3_wed/MultipleRegressionBasics.md b/material/3_wed/MultipleRegressionBasics.md
deleted file mode 100644
index 9d610cc..0000000
--- a/material/3_wed/MultipleRegressionBasics.md
+++ /dev/null
@@ -1,100 +0,0 @@
-# Multiple Regression Basics
-
-## Motivation
-
-### Introductory Example: tree dataset from R
-
-[figure of raw data]
-
-*Aim:* Find relationship between the *response variable* `volume` and the *explanatory variable/covariate* `girth`?
-Can we predict the volume of a tree given its girth?
-
-[figure including a straight line]
-
-First Guess: There is a linear relation!
-
-
-## Simple Linear Regression
-
-Main assumption: up to some error term, each measurement of teh response variable $y_i$ depends linearly on the corresponding value $x_i$ of the covariate
-
-=> **(Simple) Linear Model**
-$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i=1,...,n,$$
-where $\varepsilon_i \sim \mathcal{N}(0,\sigma^2)$ are independent normally distributed errors with unknown variance $\sigma^2$.
-
-*Task:* Find the straight line that fits best, i.e., find the *optimal* estimators for $\beta_0$ and $\beta_1$.
-
-*Typical choice*: Least squares estimator (= maximum likelihood estimator for normal errors)
-```math
-(\hat \beta_0, \hat \beta_1) = \mathrm{argmin} \ \| \mathbf{y} - \mathbf{1} \beta_0 - \mathbf{x} \beta_1\|^2
-```
-where $\mathbf{y}$ is the vector of responses, $\mathbf{x}$ is the vector of covariates and $\mathbf{1}$ is a vector of ones.
-
-Written in matrix style:
-```math
-(\hat \beta_0, \hat \beta_1) = \mathrm{argmin} \ \| \mathbf{y} - (\mathbf{1},\mathbf{x}) \left( \begin{array}{c} \beta_0\\ \beta_1\end{array}\right) \|^2
-```
-Note: There is a closed-form expression for $(\hat \beta_0, \hat \beta_1)$. We will not make use of it here, but rather use Julia to solve the problem.
-
-[use Julia code (existing package) to perform linear regression for ``volume ~ girth``]
-
-*Interpretation of the Julia output:*
-+ column ``estimate`` : least square estimates for $\hat \beta_0$ and $\hat \beta_1$
-+ column ``Std. Error`` : estimated standard deviation $s_\beta$ of the estimators
-+ column ``t value`` : value of the $t$-statistics
-  ```math
-  t_i = \hat \beta_i \over s_{\beta_i}, \quad i=0,1,
-  ```
-  Under the hypothesis $\beta_i=0$, $t_i$ would follow a $t$-distribution.
-+ column ``Pr(>|t|)'': $p$-values for the hyptheses $\beta_i=0$ for $i=0,1$
-
-**Exercise**: Generate a random set of covariates $\mathbf{x}$. Given these covariates and true parameters $\beta_0$, $\beta_1$ and $\sigma^2$ (you can choose them)), simulate responses from a linear model
-```math
-(\hat \beta_0, \hat \beta_1) = \mathrm{argmin} \ \| \mathbf{y} - \mathbf{1} \beta_0 - \mathbf{x} \beta_1\|^2
-```
-and estimate the coefficients $\beta_0$ and $\beta_1$. Play with different choices of the parameters to see the effects on the parameter estimates and the $p$-values.
-
-## Multiple Regression Model
-
-*Idea*: Generalize the simple linear regression model to multiple covariates, w.g., predict ``volume`` using ``girth`` and `height``.
-
-=> **Linear Model**
-$$y_i = \beta_0 + \beta_1 x_{i1} + \ldots + \beta_p x_{ip} + \varepsilon_i, \qquad i=1,...,n,$$
-where
-+ $y_i$: $i$ th measurement of the response,
-+ $x_{i1}$: $i$ th value of first covariate,
-...
-+ $x_{ip}$: $i$ th value of $p$-th covariate,
-+ $\varepsilon_i \sim \mathcal{N}(0,\sigma^2)$: independent normally distributed errors with unknown variance $\sigma^2$.
-
-*Task:* Find the *optimal* estimators for $\beta_0, \beta_1, \ldots, \beta_p$.
-
-*Our choice again:* Least squares estimator (= maximum likelihood estimator for normal errors)
-```math
-(\hat \beta_0, \hat \beta_1, \ldots, \beta_p) = \mathrm{argmin} \ \| \mathbf{y} - \mathbf{1} \beta_0 - \mathbf{x}_1 \beta_1 - \mathbf{x}_p \beta_p\|^2
-```
-where $\mathbf{y}$ is the vector of responses, $\mathbf{x}$_j is the vector of the $j$ th covariate and $\mathbf{1}$ is a vector of ones.
-
-Written in matrix style:
-```math
-\mathbf{\hat \beta} = \mathrm{argmin} \ \| \mathbf{y} - (\mathbf{1},\mathbf{x}_1,\ldots,\mathbf{x}_p) \left( \begin{array}{c} \beta_0 \\ \beta_1 \\ \vdots \\ x_p\end{array} \right) \|^2
-```
-
-Defining the *design matrix* $\mathbf{X}$ (size $n \times (p+1)$), we get the short form
-```math
-\mathbf{\hat \beta} = \mathrm{argmin} \ \| \mathbf{y} - \mathbf{X} \mathbf{\beta} \|^2 = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{y}
-```
-
-[use Julia code (existing package) to perform linear regression for ``volume ~ girth + height``]
-
-Interpretation of the Julia output is similar to the simple linear regression model, but we provide explicit formulas now:
-+ parameter estimates
-+ estimated standard errors
-+ $t$-statistics
-+ $p$-values
-
- **Exercise**: Implement functions that estimate the $\beta$-parameters, the corresponding standard errors and the $t$-statistics.
- Test your functions with the ```tree''' data set and try to reproduce the output above.
-
-
- ## Potential Add-on: Multiple Regression Models with Categorical Covariates
diff --git a/material/3_wed/MultipleRegressionBasics.qmd b/material/3_wed/MultipleRegressionBasics.qmd
new file mode 100644
index 0000000..8af7871
--- /dev/null
+++ b/material/3_wed/MultipleRegressionBasics.qmd
@@ -0,0 +1,253 @@
+---
+editor:
+  markdown:
+    wrap: 72
+---
+
+# Multiple Regression Basics
+
+## Motivation
+
+### Introductory Example: tree dataset from R
+
+\[figure of raw data\]
+
+*Aim:* Find the relationship between the *response variable* `volume`
+and the *explanatory variable/covariate* `girth`. Can we predict the
+volume of a tree given its girth?
+
+\[figure including a straight line\]
+
+First Guess: There is a linear relation!
+
+## Simple Linear Regression
+
+Main assumption: up to some error term, each measurement of the response
+variable $y_i$ depends linearly on the corresponding value $x_i$ of the
+covariate.
+
+$\leadsto$ **(Simple) Linear Model:**
+$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i=1,...,n,$$
+where $\varepsilon_i \sim \mathcal{N}(0,\sigma^2)$ are independent
+normally distributed errors with unknown variance $\sigma^2$.
+
+*Task:* Find the straight line that fits best, i.e., find the *optimal*
+estimators for $\beta_0$ and $\beta_1$.
+
+*Typical choice*: Least squares estimator (= maximum likelihood
+estimator for normal errors)
+
+$$ (\hat \beta_0, \hat \beta_1) = \mathrm{argmin} \ \| \mathbf{y} - \mathbf{1} \beta_0 - \mathbf{x} \beta_1\|^2 $$
+
+where $\mathbf{y}$ is the vector of responses, $\mathbf{x}$ is the
+vector of covariates and $\mathbf{1}$ is a vector of ones.
+
+Written in matrix style:
+
+$$
+ (\hat \beta_0, \hat \beta_1) = \mathrm{argmin} \ \left\| \mathbf{y} - (\mathbf{1},\mathbf{x}) \left( \begin{array}{c} \beta_0\\ \beta_1\end{array}\right) \right\|^2
+$$
+
+Note: There is a closed-form expression for
+$(\hat \beta_0, \hat \beta_1)$. We will not make use of it here, but
+rather use Julia to solve the problem.
+
+\[use Julia code (existing package) to perform linear regression for
+`volume ~ girth`\]
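+
+A minimal sketch of what this could look like, e.g., with the packages
+GLM.jl and RDatasets.jl (the package choice and the column names
+`Girth` and `Volume` are assumptions, not fixed by the material):
+
+```julia
+# Sketch: simple linear regression volume ~ girth
+# (assumes the packages GLM and RDatasets are installed)
+using GLM, RDatasets
+
+trees = dataset("datasets", "trees")   # R's tree data set
+fit_simple = lm(@formula(Volume ~ Girth), trees)
+println(fit_simple)                    # coefficient table, interpreted below
+```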
+
+*Interpretation of the Julia output:*
+
+-   column `estimate` : least squares estimates for $\hat \beta_0$ and
+    $\hat \beta_1$
+
+-   column `Std. Error` : estimated standard deviation
+    $\hat s_{\hat \beta_i}$ of the estimator $\hat \beta_i$
+
+-   column `t value` : value of the $t$-statistic
+
+    $$ t_i = {\hat \beta_i \over \hat s_{\hat \beta_i}}, \quad i=0,1. $$
+
+    Under the null hypothesis $\beta_i=0$, the test statistic $t_i$
+    follows a $t$-distribution.
+
+-   column `Pr(>|t|)`: $p$-values for the hypotheses $\beta_i=0$ for
+    $i=0,1$
+
+**Task 1**: Generate a random set of covariates $\mathbf{x}$. Given
+these covariates and true parameters $\beta_0$, $\beta_1$ and $\sigma^2$
+(you can choose them), simulate responses from a linear model and
+estimate the coefficients $\beta_0$ and $\beta_1$. Play with different
+choices of the parameters to see the effects on the parameter estimates
+and the $p$-values.
+
+## Multiple Regression Model
+
+*Idea*: Generalize the simple linear regression model to multiple
+covariates, e.g., predict `volume` using `girth` and `height`.
+
+$\leadsto$ **Linear Model:**
+$$y_i = \beta_0 + \beta_1 x_{i1} + \ldots + \beta_p x_{ip} + \varepsilon_i, \qquad i=1,...,n,$$
+where
+
+-   $y_i$: $i$-th measurement of the response,
+
+-   $x_{i1}$: $i$-th value of the first covariate,
+
+-   ...
+
+-   $x_{ip}$: $i$-th value of the $p$-th covariate,
+
+-   $\varepsilon_i \sim \mathcal{N}(0,\sigma^2)$: independent normally
+    distributed errors with unknown variance $\sigma^2$.
+
+*Task:* Find the *optimal* estimators for
+$\mathbf{\beta} = (\beta_0, \beta_1, \ldots, \beta_p)$.
+
+*Our choice again:* Least squares estimator (= maximum likelihood
+estimator for normal errors)
+
+$$
+ \mathbf{\hat \beta} = \mathrm{argmin} \ \| \mathbf{y} - \mathbf{1} \beta_0 - \mathbf{x}_1 \beta_1 - \ldots - \mathbf{x}_p \beta_p\|^2
+$$
+
+where $\mathbf{y}$ is the vector of responses, $\mathbf{x}_j$ is the
+vector of the $j$-th covariate and $\mathbf{1}$ is a vector of ones.
+
+Written in matrix style:
+
+$$
+\mathbf{\hat \beta} = \mathrm{argmin} \ \left\| \mathbf{y} - (\mathbf{1},\mathbf{x}_1,\ldots,\mathbf{x}_p) \left( \begin{array}{c} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p\end{array} \right) \right\|^2
+$$
+
+Defining the *design matrix*
+
+$$ \mathbf{X} = \left( \begin{array}{cccc}
+    1 & x_{11} & \ldots & x_{1p} \\
+    \vdots & \vdots & \ddots & \vdots \\
+    1 & x_{n1} & \ldots & x_{np}
+    \end{array}\right) \qquad
+    (\text{size } n \times (p+1)), $$
+
+we get the short form
+
+$$
+\mathbf{\hat \beta} = \mathrm{argmin} \ \| \mathbf{y} - \mathbf{X} \mathbf{\beta} \|^2 = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{y}
+$$
+
+\[use Julia code (existing package) to perform linear regression for
+`volume ~ girth + height`\]
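+
+Again a sketch under the same assumptions as above; the last lines
+additionally cross-check the closed-form expression
+$(\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{y}$ against
+the packaged fit:
+
+```julia
+# Sketch: multiple regression volume ~ girth + height
+using GLM, RDatasets, DataFrames
+
+trees = dataset("datasets", "trees")
+fit_multi = lm(@formula(Volume ~ Girth + Height), trees)
+println(coeftable(fit_multi))
+
+# cross-check with the closed-form least squares solution
+X = hcat(ones(nrow(trees)), trees.Girth, trees.Height)  # design matrix
+y = trees.Volume
+βhat = (X' * X) \ (X' * y)        # numerically safer than inv(X' * X) * X' * y
+println(βhat ≈ coef(fit_multi))   # should print true
+```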
+
+The interpretation of the Julia output is similar to the simple linear
+regression model, but we provide explicit formulas now:
+
+-   parameter estimates:
+
+    $$
+    \mathbf{\hat \beta} = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{y}
+    $$
+
+-   estimated standard errors:
+
+    $$
+    \hat s_{\hat \beta_i} = \sqrt{([\mathbf{X}^\top \mathbf{X}]^{-1})_{ii} \, \frac 1 {n-p-1} \|\mathbf{y} - \mathbf{X} \mathbf{\hat \beta}\|^2}
+    $$
+
+-   $t$-statistics:
+
+    $$ t_i = \frac{\hat \beta_i}{\hat s_{\hat \beta_i}}, \qquad i=0,\ldots,p. $$
+
+-   $p$-values:
+
+    $$
+    p\text{-value} = \mathbb{P}(|T| > |t_i|), \quad \text{where } T \sim t_{n-p-1}
+    $$
+
+**Task 2**: Implement functions that estimate the $\beta$-parameters,
+the corresponding standard errors and the $t$-statistics. Test your
+functions with the `tree` data set and try to reproduce the output
+above.
+
+## Generalized Linear Models
+
+The classical linear model
+
+$$
+ \mathbf{y} = \mathbf{X} \mathbf{\beta} + \mathbf{\varepsilon}
+$$
+
+implies that
+$$ \mathbf{y} \mid \mathbf{X} \sim \mathcal{N}(\mathbf{X} \mathbf{\beta}, \sigma^2\mathbf{I}).$$
+
+In particular, the conditional expectation satisfies
+$\mathbb{E}(\mathbf{y} \mid \mathbf{X}) = \mathbf{X} \mathbf{\beta}$.
+
+However, the assumption of a normal distribution is unrealistic for
+non-continuous data. Popular alternatives include:
+
+-   for count data: $$
+    \mathbf{y} \mid \mathbf{X} \sim \mathrm{Poisson}(\exp(\mathbf{X}\mathbf{\beta})) \qquad \leadsto \mathbb{E}(\mathbf{y} \mid \mathbf{X}) = \exp(\mathbf{X} \mathbf{\beta})
+    $$
+
+    Here, the components are considered to be independent and the
+    exponential function is applied componentwise.
+
+-   for binary data: $$
+    \mathbf{y} \mid \mathbf{X} \sim \mathrm{Bernoulli}\left( \frac{\exp(\mathbf{X}\mathbf{\beta})}{1 + \exp(\mathbf{X}\mathbf{\beta})} \right) \qquad \leadsto \mathbb{E}(\mathbf{y} \mid \mathbf{X}) = \frac{\exp(\mathbf{X}\mathbf{\beta})}{1 + \exp(\mathbf{X}\mathbf{\beta})}
+    $$
+
+    Again, the components are considered to be independent and all
+    operations are applied componentwise.
+
+All these models are defined by the choice of a family of distributions
+and a function $g$ (the so-called *link function*) such that
+
+$$
+ \mathbb{E}(\mathbf{y} \mid \mathbf{X}) = g^{-1}(\mathbf{X} \mathbf{\beta}).
+$$
+
+For the models above, these are:
+
+| Type of Data | Distribution Family | Link Function                                  |
+|--------------|---------------------|------------------------------------------------|
+| continuous   | Normal              | identity: $g(x) = x$                           |
+| count        | Poisson             | log: $g(x) = \log(x)$                          |
+| binary       | Bernoulli           | logit: $g(x) = \log\left(\frac{x}{1-x}\right)$ |
+
+In general, the parameter vector $\mathbf{\beta}$ is estimated by
+maximizing the likelihood, i.e.,
+
+$$
+\hat \beta = \mathrm{argmax} \prod_{i=1}^n f(y_i \mid \mathbf{X}_{i \cdot}),
+$$
+
+which is equivalent to maximizing the log-likelihood:
+
+$$
+\hat \beta = \mathrm{argmax} \sum_{i=1}^n \log f(y_i \mid \mathbf{X}_{i \cdot}).
+$$
+
+In the Gaussian case, the maximum likelihood estimator is identical to
+the least squares estimator considered above.
+
+\[\[ Example in Julia: maybe `SwissLabor` \]\]
+
+**Task 3:** Reproduce the results of our data analysis of the `tree`
+data set using a generalized linear model with normal distribution
+family.
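+
+A possible sketch for Task 3 (same assumed packages as above; with a
+normal distribution family and identity link, `glm` should reproduce
+the least squares fit):
+
+```julia
+# Sketch for Task 3: linear regression as a GLM with normal family
+using GLM, RDatasets, Distributions
+
+trees = dataset("datasets", "trees")
+fit_glm = glm(@formula(Volume ~ Girth + Height), trees, Normal(), IdentityLink())
+println(coeftable(fit_glm))   # should match the least squares fit above
+```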