Fix refs again (#76)
* Ch2->Ch02
* fixed latex refs again, somehow crept back in
* fixed the page refs, formats synced
* unsynced
* executed notebook besides 10
* warnings for lasso
* allow saving of output in notebooks
* Ch10 executed
@@ -36,8 +36,8 @@ Please see the `Python` resources page on the book website [statlearning.com](ht
|
|||||||
You will need to install the `ISLP` package, which provides access to the datasets and custom-built functions that we provide.
|
You will need to install the `ISLP` package, which provides access to the datasets and custom-built functions that we provide.
|
||||||
Inside a macOS or Linux terminal type `pip install ISLP`; this also installs most other packages needed in the labs. The `Python` resources page has a link to the `ISLP` documentation website.
|
Inside a macOS or Linux terminal type `pip install ISLP`; this also installs most other packages needed in the labs. The `Python` resources page has a link to the `ISLP` documentation website.
|
||||||
|
|
||||||
To run this lab, download the file `Ch2-statlearn-lab.ipynb` from the `Python` resources page.
|
To run this lab, download the file `Ch02-statlearn-lab.ipynb` from the `Python` resources page.
|
||||||
Now run the following code at the command line: `jupyter lab Ch2-statlearn-lab.ipynb`.
|
Now run the following code at the command line: `jupyter lab Ch02-statlearn-lab.ipynb`.
|
||||||
|
|
||||||
If you're using Windows, you can use the `start menu` to access `anaconda`, and follow the links. For example, to install `ISLP` and run this lab, you can run the same code above in an `anaconda` shell.
|
If you're using Windows, you can use the `start menu` to access `anaconda`, and follow the links. For example, to install `ISLP` and run this lab, you can run the same code above in an `anaconda` shell.
|
||||||
|
|
||||||
@@ -72,7 +72,7 @@ print('fit a model with', 11, 'variables')
|
|||||||
The following command will provide information about the `print()` function.
|
The following command will provide information about the `print()` function.
|
||||||
|
|
||||||
```{python}
|
```{python}
|
||||||
print?
|
# print?
|
||||||
|
|
||||||
```
|
```
|
||||||
|
|
||||||
@@ -222,7 +222,7 @@ documentation associated with the function `fun`, if it exists.
|
|||||||
We can try this for `np.array()`.
|
We can try this for `np.array()`.
|
||||||
|
|
||||||
```{python}
|
```{python}
|
||||||
np.array?
|
# np.array?
|
||||||
|
|
||||||
```
|
```
|
||||||
This documentation indicates that we could create a floating point array by passing a `dtype` argument into `np.array()`.
|
This documentation indicates that we could create a floating point array by passing a `dtype` argument into `np.array()`.
|
||||||
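As the `np.array()` documentation suggests, passing `dtype` controls the array's element type. A minimal sketch (the input values are arbitrary):

```python
import numpy as np

# Integer inputs produce an integer array by default...
x_int = np.array([3, 1, 4])

# ...while passing dtype=float yields a floating point array.
x_float = np.array([3, 1, 4], dtype=float)

print(x_int.dtype, x_float.dtype)
```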
|
|||||||
File diff suppressed because one or more lines are too long
@@ -332,7 +332,7 @@ As mentioned above, there is an existing function to add a line to a plot --- `a
|
|||||||
|
|
||||||
|
|
||||||
Next we examine some diagnostic plots, several of which were discussed
|
Next we examine some diagnostic plots, several of which were discussed
|
||||||
in Section~\ref{Ch3:problems.sec}.
|
in Section 3.3.3.
|
||||||
We can find the fitted values and residuals
|
We can find the fitted values and residuals
|
||||||
of the fit as attributes of the `results` object.
|
of the fit as attributes of the `results` object.
|
||||||
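The fitted values and residuals can be understood directly from the least squares algebra. A numpy-only sketch on simulated data of what attributes like `results.fittedvalues` and `results.resid` contain (the data here are toy values, not from the lab):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=50)])  # intercept + one predictor
y = X @ np.array([1.0, 2.0]) + rng.normal(size=50)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # least squares coefficients
fitted = X @ beta           # analogue of results.fittedvalues
resid = y - fitted          # analogue of results.resid

# The residuals are orthogonal to the column space of X.
print(np.allclose(X.T @ resid, 0))
```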
Various influence measures describing the regression model
|
Various influence measures describing the regression model
|
||||||
@@ -429,7 +429,7 @@ We can access the individual components of `results` by name
|
|||||||
and
|
and
|
||||||
`np.sqrt(results.scale)` gives us the RSE.
|
`np.sqrt(results.scale)` gives us the RSE.
|
||||||
|
|
||||||
Variance inflation factors (section~\ref{Ch3:problems.sec}) are sometimes useful
|
Variance inflation factors (section 3.3.3) are sometimes useful
|
||||||
to assess the effect of collinearity in the model matrix of a regression model.
|
to assess the effect of collinearity in the model matrix of a regression model.
|
||||||
We will compute the VIFs in our multiple regression fit, and use the opportunity to introduce the idea of *list comprehension*.
|
We will compute the VIFs in our multiple regression fit, and use the opportunity to introduce the idea of *list comprehension*.
|
||||||
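The VIF of column $j$ is $1/(1-R^2_j)$, where $R^2_j$ comes from regressing that column on all the others. A hedged numpy sketch on simulated data (the lab itself uses a helper from `statsmodels`; the matrix `Z` and function `r_squared()` below are hypothetical), which also illustrates a list comprehension:

```python
import numpy as np

def r_squared(X, y):
    """R^2 from regressing y on X (X includes an intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

rng = np.random.default_rng(1)
Z = rng.normal(size=(200, 3))
Z[:, 2] += 0.9 * Z[:, 0]          # induce some collinearity
ones = np.ones((200, 1))

# List comprehension: one VIF per column, regressing it on the others.
vifs = [1 / (1 - r_squared(np.hstack([ones, np.delete(Z, j, axis=1)]), Z[:, j]))
        for j in range(Z.shape[1])]
print(vifs)
```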
|
|
||||||
|
|||||||
File diff suppressed because one or more lines are too long
@@ -394,7 +394,7 @@ lda.fit(X_train, L_train)
|
|||||||
|
|
||||||
```
|
```
|
||||||
Here we have used the list comprehensions introduced
|
Here we have used the list comprehensions introduced
|
||||||
in Section~\ref{Ch3-linreg-lab:multivariate-goodness-of-fit}. Looking at our first line above, we see that the right-hand side is a list
|
in Section 3.6.4. Looking at our first line above, we see that the right-hand side is a list
|
||||||
of length two. This is because the code `for M in [X_train, X_test]` iterates over a list
|
of length two. This is because the code `for M in [X_train, X_test]` iterates over a list
|
||||||
of length two. While here we loop over a list,
|
of length two. While here we loop over a list,
|
||||||
the list comprehension method works when looping over any iterable object.
|
the list comprehension method works when looping over any iterable object.
|
||||||
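A minimal sketch of the pattern being described, with placeholder matrices standing in for the lab's `X_train` and `X_test`:

```python
import numpy as np

X_train = np.zeros((70, 2))   # hypothetical train/test matrices
X_test = np.zeros((30, 2))

# The right-hand side is a list of length two: one entry per matrix iterated over.
n_obs = [M.shape[0] for M in (X_train, X_test)]
print(n_obs)  # → [70, 30]
```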
@@ -443,7 +443,7 @@ lda.scalings_
|
|||||||
|
|
||||||
```
|
```
|
||||||
|
|
||||||
These values provide the linear combination of `Lag1` and `Lag2` that are used to form the LDA decision rule. In other words, these are the multipliers of the elements of $X=x$ in (\ref{Ch4:bayes.multi}).
|
These values provide the linear combination of `Lag1` and `Lag2` that is used to form the LDA decision rule. In other words, these are the multipliers of the elements of $X=x$ in (4.24).
|
||||||
If $-0.64 \times `Lag1` - 0.51 \times `Lag2`$ is large, then the LDA classifier will predict a market increase, and if it is small, then the LDA classifier will predict a market decline.
|
If $-0.64 \times `Lag1` - 0.51 \times `Lag2`$ is large, then the LDA classifier will predict a market increase, and if it is small, then the LDA classifier will predict a market decline.
|
||||||
|
|
||||||
```{python}
|
```{python}
|
||||||
@@ -452,7 +452,7 @@ lda_pred = lda.predict(X_test)
|
|||||||
```
|
```
|
||||||
|
|
||||||
As we observed in our comparison of classification methods
|
As we observed in our comparison of classification methods
|
||||||
(Section~\ref{Ch4:comparison.sec}), the LDA and logistic
|
(Section 4.5), the LDA and logistic
|
||||||
regression predictions are almost identical.
|
regression predictions are almost identical.
|
||||||
|
|
||||||
```{python}
|
```{python}
|
||||||
@@ -511,7 +511,7 @@ The LDA classifier above is the first classifier from the
|
|||||||
`sklearn` library. We will use several other objects
|
`sklearn` library. We will use several other objects
|
||||||
from this library. The objects
|
from this library. The objects
|
||||||
follow a common structure that simplifies tasks such as cross-validation,
|
follow a common structure that simplifies tasks such as cross-validation,
|
||||||
which we will see in Chapter~\ref{Ch5:resample}. Specifically,
|
which we will see in Chapter 5. Specifically,
|
||||||
the methods first create a generic classifier without
|
the methods first create a generic classifier without
|
||||||
referring to any data. This classifier is then fit
|
referring to any data. This classifier is then fit
|
||||||
to data with the `fit()` method and predictions are
|
to data with the `fit()` method and predictions are
|
||||||
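The common `sklearn` structure described above can be sketched on toy data, assuming `scikit-learn` is installed (the data below are simulated, not the lab's `Smarket` data):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # toy labels

lda = LinearDiscriminantAnalysis()  # 1. create a generic classifier, no data yet
lda.fit(X, y)                       # 2. fit it to data with fit()
labels = lda.predict(X)             # 3. then produce predictions
print(labels.shape)
```

The same create/fit/predict sequence carries over to the other `sklearn` estimators used later.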
@@ -797,7 +797,7 @@ feature_std.std()
|
|||||||
|
|
||||||
```
|
```
|
||||||
|
|
||||||
Notice that the standard deviations are not quite $1$ here; this is again due to some procedures using the $1/n$ convention for variances (in this case `scaler()`), while others use $1/(n-1)$ (the `std()` method). See the footnote on page~\pageref{Ch4-varformula}.
|
Notice that the standard deviations are not quite $1$ here; this is again due to some procedures using the $1/n$ convention for variances (in this case `scaler()`), while others use $1/(n-1)$ (the `std()` method). See the footnote on page 183.
|
||||||
In this case it does not matter, as long as the variables are all on the same scale.
|
In this case it does not matter, as long as the variables are all on the same scale.
|
||||||
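The two variance conventions are easy to see side by side with numpy's `ddof` argument (assuming the `scaler` in question uses the $1/n$ convention, as `sklearn`'s `StandardScaler` does, while `pandas`' `std()` defaults to $1/(n-1)$):

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])

sd_n   = np.std(x)          # 1/n convention (ddof=0, numpy's default)
sd_nm1 = np.std(x, ddof=1)  # 1/(n-1) convention

print(sd_n, sd_nm1)
```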
|
|
||||||
Using the function `train_test_split()` we now split the observations into a test set,
|
Using the function `train_test_split()` we now split the observations into a test set,
|
||||||
@@ -864,7 +864,7 @@ This is double the rate that one would obtain from random guessing.
|
|||||||
The number of neighbors in KNN is a *tuning parameter*, also known as a *hyperparameter*.
|
The number of neighbors in KNN is a *tuning parameter*, also known as a *hyperparameter*.
|
||||||
We do not know *a priori* what value to use. It is therefore of interest
|
We do not know *a priori* what value to use. It is therefore of interest
|
||||||
to see how the classifier performs on test data as we vary these
|
to see how the classifier performs on test data as we vary these
|
||||||
parameters. This can be achieved with a `for` loop, described in Section~\ref{Ch2-statlearn-lab:for-loops}.
|
parameters. This can be achieved with a `for` loop, described in Section 2.3.8.
|
||||||
Here we use a for loop to look at the accuracy of our classifier in the group predicted to purchase
|
Here we use a for loop to look at the accuracy of our classifier in the group predicted to purchase
|
||||||
insurance as we vary the number of neighbors from 1 to 5:
|
insurance as we vary the number of neighbors from 1 to 5:
|
||||||
|
|
||||||
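A sketch of such a loop, assuming `scikit-learn` is available (the simulated data below stand in for the lab's insurance data, and overall accuracy stands in for the group-specific rate):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] > 0).astype(int)          # toy labels
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

# Refit the classifier for each value of the tuning parameter K = 1, ..., 5.
for K in range(1, 6):
    knn = KNeighborsClassifier(n_neighbors=K)
    knn.fit(X_train, y_train)
    acc = np.mean(knn.predict(X_test) == y_test)
    print(f'K={K}: accuracy {acc:.3f}')
```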
@@ -891,7 +891,7 @@ As a comparison, we can also fit a logistic regression model to the
|
|||||||
data. This can also be done
|
data. This can also be done
|
||||||
with `sklearn`, though by default it fits
|
with `sklearn`, though by default it fits
|
||||||
something like the *ridge regression* version
|
something like the *ridge regression* version
|
||||||
of logistic regression, which we introduce in Chapter~\ref{Ch6:varselect}. This can
|
of logistic regression, which we introduce in Chapter 6. This can
|
||||||
be modified by appropriately setting the argument `C` below. Its default
|
be modified by appropriately setting the argument `C` below. Its default
|
||||||
value is 1, but by setting it to a very large number, the algorithm converges to the same solution as the usual (unregularized)
|
value is 1, but by setting it to a very large number, the algorithm converges to the same solution as the usual (unregularized)
|
||||||
logistic regression estimator discussed above.
|
logistic regression estimator discussed above.
|
||||||
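A hedged sketch of the `C` argument's effect, on simulated data rather than the lab's dataset (assuming `scikit-learn` is installed):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] - X[:, 1] + 0.5 * rng.normal(size=200) > 0).astype(int)

# Default C=1 applies ridge-style shrinkage; a very large C effectively
# removes the penalty, approximating plain logistic regression.
logit_default = LogisticRegression(C=1.0).fit(X, y)
logit_unreg = LogisticRegression(C=1e10, max_iter=1000).fit(X, y)

print(logit_default.coef_, logit_unreg.coef_)
```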
@@ -935,7 +935,7 @@ confusion_table(logit_labels, y_test)
|
|||||||
|
|
||||||
```
|
```
|
||||||
## Linear and Poisson Regression on the Bikeshare Data
|
## Linear and Poisson Regression on the Bikeshare Data
|
||||||
Here we fit linear and Poisson regression models to the `Bikeshare` data, as described in Section~\ref{Ch4:sec:pois}.
|
Here we fit linear and Poisson regression models to the `Bikeshare` data, as described in Section 4.6.
|
||||||
The response `bikers` measures the number of bike rentals per hour
|
The response `bikers` measures the number of bike rentals per hour
|
||||||
in Washington, DC in the period 2010--2012.
|
in Washington, DC in the period 2010--2012.
|
||||||
|
|
||||||
@@ -976,7 +976,7 @@ variables constant, there are on average about 7 more riders in
|
|||||||
February than in January. Similarly there are about 16.5 more riders
|
February than in January. Similarly there are about 16.5 more riders
|
||||||
in March than in January.
|
in March than in January.
|
||||||
|
|
||||||
The results seen in Section~\ref{sec:bikeshare.linear}
|
The results seen in Section 4.6.1
|
||||||
used a slightly different coding of the variables `hr` and `mnth`, as follows:
|
used a slightly different coding of the variables `hr` and `mnth`, as follows:
|
||||||
|
|
||||||
```{python}
|
```{python}
|
||||||
@@ -1030,7 +1030,7 @@ np.allclose(M_lm.fittedvalues, M2_lm.fittedvalues)
|
|||||||
```
|
```
|
||||||
|
|
||||||
|
|
||||||
To reproduce the left-hand side of Figure~\ref{Ch4:bikeshare}
|
To reproduce the left-hand side of Figure 4.13
|
||||||
we must first obtain the coefficient estimates associated with
|
we must first obtain the coefficient estimates associated with
|
||||||
`mnth`. The coefficients for January through November can be obtained
|
`mnth`. The coefficients for January through November can be obtained
|
||||||
directly from the `M2_lm` object. The coefficient for December
|
directly from the `M2_lm` object. The coefficient for December
|
||||||
@@ -1070,7 +1070,7 @@ ax_month.set_ylabel('Coefficient', fontsize=20);
|
|||||||
|
|
||||||
```
|
```
|
||||||
|
|
||||||
Reproducing the right-hand plot in Figure~\ref{Ch4:bikeshare} follows a similar process.
|
Reproducing the right-hand plot in Figure 4.13 follows a similar process.
|
||||||
|
|
||||||
```{python}
|
```{python}
|
||||||
coef_hr = S2[S2.index.str.contains('hr')]['coef']
|
coef_hr = S2[S2.index.str.contains('hr')]['coef']
|
||||||
@@ -1105,7 +1105,7 @@ M_pois = sm.GLM(Y, X2, family=sm.families.Poisson()).fit()
|
|||||||
|
|
||||||
```
|
```
|
||||||
|
|
||||||
We can plot the coefficients associated with `mnth` and `hr`, in order to reproduce Figure~\ref{Ch4:bikeshare.pois}. We first complete these coefficients as before.
|
We can plot the coefficients associated with `mnth` and `hr`, in order to reproduce Figure 4.15. We first complete these coefficients as before.
|
||||||
|
|
||||||
```{python}
|
```{python}
|
||||||
S_pois = summarize(M_pois)
|
S_pois = summarize(M_pois)
|
||||||
|
|||||||
File diff suppressed because one or more lines are too long
@@ -226,7 +226,7 @@ for i, d in enumerate(range(1,6)):
|
|||||||
cv_error
|
cv_error
|
||||||
|
|
||||||
```
|
```
|
||||||
As in Figure~\ref{Ch5:cvplot}, we see a sharp drop in the estimated test MSE between the linear and
|
As in Figure 5.4, we see a sharp drop in the estimated test MSE between the linear and
|
||||||
quadratic fits, but then no clear improvement from using higher-degree polynomials.
|
quadratic fits, but then no clear improvement from using higher-degree polynomials.
|
||||||
|
|
||||||
Above we introduced the `outer()` method of the `np.power()`
|
Above we introduced the `outer()` method of the `np.power()`
|
||||||
@@ -267,7 +267,7 @@ cv_error
|
|||||||
Notice that the computation time is much shorter than that of LOOCV.
|
Notice that the computation time is much shorter than that of LOOCV.
|
||||||
(In principle, the computation time for LOOCV for a least squares
|
(In principle, the computation time for LOOCV for a least squares
|
||||||
linear model should be faster than for $k$-fold CV, due to the
|
linear model should be faster than for $k$-fold CV, due to the
|
||||||
availability of the formula~(\ref{Ch5:eq:LOOCVform}) for LOOCV;
|
availability of the formula (5.2) for LOOCV;
|
||||||
however, the generic `cross_validate()` function does not make
|
however, the generic `cross_validate()` function does not make
|
||||||
use of this formula.) We still see little evidence that using cubic
|
use of this formula.) We still see little evidence that using cubic
|
||||||
or higher-degree polynomial terms leads to a lower test error than simply
|
or higher-degree polynomial terms leads to a lower test error than simply
|
||||||
@@ -314,7 +314,7 @@ incurred by picking different random folds.
|
|||||||
|
|
||||||
## The Bootstrap
|
## The Bootstrap
|
||||||
We illustrate the use of the bootstrap in the simple example
|
We illustrate the use of the bootstrap in the simple example
|
||||||
{of Section~\ref{Ch5:sec:bootstrap},} as well as on an example involving
|
of Section 5.2, as well as on an example involving
|
||||||
estimating the accuracy of the linear regression model on the `Auto`
|
estimating the accuracy of the linear regression model on the `Auto`
|
||||||
data set.
|
data set.
|
||||||
### Estimating the Accuracy of a Statistic of Interest
|
### Estimating the Accuracy of a Statistic of Interest
|
||||||
@@ -329,8 +329,8 @@ in a dataframe.
|
|||||||
To illustrate the bootstrap, we
|
To illustrate the bootstrap, we
|
||||||
start with a simple example.
|
start with a simple example.
|
||||||
The `Portfolio` data set in the `ISLP` package is described
|
The `Portfolio` data set in the `ISLP` package is described
|
||||||
in Section~\ref{Ch5:sec:bootstrap}. The goal is to estimate the
|
in Section 5.2. The goal is to estimate the
|
||||||
sampling variance of the parameter $\alpha$ given in formula~(\ref{Ch5:min.var}). We will
|
sampling variance of the parameter $\alpha$ given in formula (5.7). We will
|
||||||
create a function
|
create a function
|
||||||
`alpha_func()`, which takes as input a dataframe `D` assumed
|
`alpha_func()`, which takes as input a dataframe `D` assumed
|
||||||
to have columns `X` and `Y`, as well as a
|
to have columns `X` and `Y`, as well as a
|
||||||
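The estimator in formula (5.7) is $\hat\alpha = (\hat\sigma^2_Y - \hat\sigma_{XY})/(\hat\sigma^2_X + \hat\sigma^2_Y - 2\hat\sigma_{XY})$. A numpy sketch mirroring the lab's `alpha_func()`, here on a plain dict of simulated arrays rather than the `Portfolio` dataframe (the data are hypothetical):

```python
import numpy as np

def alpha_func(D, idx):
    """Estimate alpha = (var(Y) - cov(X,Y)) / (var(X) + var(Y) - 2 cov(X,Y))
    from the observations of D indexed by idx; D is assumed to have
    entries 'X' and 'Y'."""
    X, Y = D['X'][idx], D['Y'][idx]
    cov = np.cov(X, Y)  # 2x2 sample covariance matrix
    return (cov[1, 1] - cov[0, 1]) / (cov[0, 0] + cov[1, 1] - 2 * cov[0, 1])

rng = np.random.default_rng(0)
D = {'X': rng.normal(size=100), 'Y': rng.normal(size=100)}
a = alpha_func(D, np.arange(100))   # estimate using all 100 observations
print(a)
```

Passing a bootstrap resample of indices as `idx` (e.g. `rng.choice(100, 100, replace=True)`) then gives one bootstrap replicate of $\hat\alpha$.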
@@ -349,7 +349,7 @@ def alpha_func(D, idx):
|
|||||||
```
|
```
|
||||||
This function returns an estimate for $\alpha$
|
This function returns an estimate for $\alpha$
|
||||||
based on applying the minimum
|
based on applying the minimum
|
||||||
variance formula (\ref{Ch5:min.var}) to the observations indexed by
|
variance formula (5.7) to the observations indexed by
|
||||||
the argument `idx`. For instance, the following command
|
the argument `idx`. For instance, the following command
|
||||||
estimates $\alpha$ using all 100 observations.
|
estimates $\alpha$ using all 100 observations.
|
||||||
|
|
||||||
@@ -419,7 +419,7 @@ intercept and slope terms for the linear regression model that uses
|
|||||||
`horsepower` to predict `mpg` in the `Auto` data set. We
|
`horsepower` to predict `mpg` in the `Auto` data set. We
|
||||||
will compare the estimates obtained using the bootstrap to those
|
will compare the estimates obtained using the bootstrap to those
|
||||||
obtained using the formulas for ${\rm SE}(\hat{\beta}_0)$ and
|
obtained using the formulas for ${\rm SE}(\hat{\beta}_0)$ and
|
||||||
${\rm SE}(\hat{\beta}_1)$ described in Section~\ref{Ch3:secoefsec}.
|
${\rm SE}(\hat{\beta}_1)$ described in Section 3.1.2.
|
||||||
|
|
||||||
To use our `boot_SE()` function, we must write a function (its
|
To use our `boot_SE()` function, we must write a function (its
|
||||||
first argument)
|
first argument)
|
||||||
@@ -488,7 +488,7 @@ This indicates that the bootstrap estimate for ${\rm SE}(\hat{\beta}_0)$ is
|
|||||||
0.85, and that the bootstrap
|
0.85, and that the bootstrap
|
||||||
estimate for ${\rm SE}(\hat{\beta}_1)$ is
|
estimate for ${\rm SE}(\hat{\beta}_1)$ is
|
||||||
0.0074. As discussed in
|
0.0074. As discussed in
|
||||||
Section~\ref{Ch3:secoefsec}, standard formulas can be used to compute
|
Section 3.1.2, standard formulas can be used to compute
|
||||||
the standard errors for the regression coefficients in a linear
|
the standard errors for the regression coefficients in a linear
|
||||||
model. These can be obtained using the `summarize()` function
|
model. These can be obtained using the `summarize()` function
|
||||||
from `ISLP.sm`.
|
from `ISLP.sm`.
|
||||||
@@ -502,7 +502,7 @@ model_se
|
|||||||
|
|
||||||
|
|
||||||
The standard error estimates for $\hat{\beta}_0$ and $\hat{\beta}_1$
|
The standard error estimates for $\hat{\beta}_0$ and $\hat{\beta}_1$
|
||||||
obtained using the formulas from Section~\ref{Ch3:secoefsec} are
|
obtained using the formulas from Section 3.1.2 are
|
||||||
0.717 for the
|
0.717 for the
|
||||||
intercept and
|
intercept and
|
||||||
0.006 for the
|
0.006 for the
|
||||||
@@ -510,13 +510,13 @@ slope. Interestingly, these are somewhat different from the estimates
|
|||||||
obtained using the bootstrap. Does this indicate a problem with the
|
obtained using the bootstrap. Does this indicate a problem with the
|
||||||
bootstrap? In fact, it suggests the opposite. Recall that the
|
bootstrap? In fact, it suggests the opposite. Recall that the
|
||||||
standard formulas given in
|
standard formulas given in
|
||||||
{Equation~\ref{Ch3:se.eqn} on page~\pageref{Ch3:se.eqn}}
|
Equation 3.8 on page 75
|
||||||
rely on certain assumptions. For example,
|
rely on certain assumptions. For example,
|
||||||
they depend on the unknown parameter $\sigma^2$, the noise
|
they depend on the unknown parameter $\sigma^2$, the noise
|
||||||
variance. We then estimate $\sigma^2$ using the RSS. Now although the
|
variance. We then estimate $\sigma^2$ using the RSS. Now although the
|
||||||
formulas for the standard errors do not rely on the linear model being
|
formulas for the standard errors do not rely on the linear model being
|
||||||
correct, the estimate for $\sigma^2$ does. We see
|
correct, the estimate for $\sigma^2$ does. We see
|
||||||
{in Figure~\ref{Ch3:polyplot} on page~\pageref{Ch3:polyplot}} that there is
|
in Figure 3.8 on page 99 that there is
|
||||||
a non-linear relationship in the data, and so the residuals from a
|
a non-linear relationship in the data, and so the residuals from a
|
||||||
linear fit will be inflated, and so will $\hat{\sigma}^2$. Secondly,
|
linear fit will be inflated, and so will $\hat{\sigma}^2$. Secondly,
|
||||||
the standard formulas assume (somewhat unrealistically) that the $x_i$
|
the standard formulas assume (somewhat unrealistically) that the $x_i$
|
||||||
@@ -529,7 +529,7 @@ the results from `sm.OLS`.
|
|||||||
Below we compute the bootstrap standard error estimates and the
|
Below we compute the bootstrap standard error estimates and the
|
||||||
standard linear regression estimates that result from fitting the
|
standard linear regression estimates that result from fitting the
|
||||||
quadratic model to the data. Since this model provides a good fit to
|
quadratic model to the data. Since this model provides a good fit to
|
||||||
the data (Figure~\ref{Ch3:polyplot}), there is now a better
|
the data (Figure 3.8), there is now a better
|
||||||
correspondence between the bootstrap estimates and the standard
|
correspondence between the bootstrap estimates and the standard
|
||||||
estimates of ${\rm SE}(\hat{\beta}_0)$, ${\rm SE}(\hat{\beta}_1)$ and
|
estimates of ${\rm SE}(\hat{\beta}_0)$, ${\rm SE}(\hat{\beta}_1)$ and
|
||||||
${\rm SE}(\hat{\beta}_2)$.
|
${\rm SE}(\hat{\beta}_2)$.
|
||||||
|
|||||||
@@ -5,6 +5,7 @@
|
|||||||
"id": "6dde3cef",
|
"id": "6dde3cef",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
|
"\n",
|
||||||
"# Cross-Validation and the Bootstrap\n",
|
"# Cross-Validation and the Bootstrap\n",
|
||||||
"\n",
|
"\n",
|
||||||
"<a target=\"_blank\" href=\"https://colab.research.google.com/github/intro-stat-learning/ISLP_labs/blob/v2.2.1/Ch05-resample-lab.ipynb\">\n",
|
"<a target=\"_blank\" href=\"https://colab.research.google.com/github/intro-stat-learning/ISLP_labs/blob/v2.2.1/Ch05-resample-lab.ipynb\">\n",
|
||||||
@@ -32,10 +33,10 @@
|
|||||||
"id": "f1deb5cc",
|
"id": "f1deb5cc",
|
||||||
"metadata": {
|
"metadata": {
|
||||||
"execution": {
|
"execution": {
|
||||||
"iopub.execute_input": "2026-02-02T23:39:38.213402Z",
|
"iopub.execute_input": "2026-02-05T00:47:05.730449Z",
|
||||||
"iopub.status.busy": "2026-02-02T23:39:38.212455Z",
|
"iopub.status.busy": "2026-02-05T00:47:05.730329Z",
|
||||||
"iopub.status.idle": "2026-02-02T23:39:39.121524Z",
|
"iopub.status.idle": "2026-02-05T00:47:06.513542Z",
|
||||||
"shell.execute_reply": "2026-02-02T23:39:39.121125Z"
|
"shell.execute_reply": "2026-02-05T00:47:06.513033Z"
|
||||||
},
|
},
|
||||||
"lines_to_next_cell": 2
|
"lines_to_next_cell": 2
|
||||||
},
|
},
|
||||||
@@ -64,10 +65,10 @@
|
|||||||
"id": "268c41b3",
|
"id": "268c41b3",
|
||||||
"metadata": {
|
"metadata": {
|
||||||
"execution": {
|
"execution": {
|
||||||
"iopub.execute_input": "2026-02-02T23:39:39.123447Z",
|
"iopub.execute_input": "2026-02-05T00:47:06.515215Z",
|
||||||
"iopub.status.busy": "2026-02-02T23:39:39.123256Z",
|
"iopub.status.busy": "2026-02-05T00:47:06.515043Z",
|
||||||
"iopub.status.idle": "2026-02-02T23:39:39.125378Z",
|
"iopub.status.idle": "2026-02-05T00:47:06.517293Z",
|
||||||
"shell.execute_reply": "2026-02-02T23:39:39.125152Z"
|
"shell.execute_reply": "2026-02-05T00:47:06.516837Z"
|
||||||
},
|
},
|
||||||
"lines_to_next_cell": 2
|
"lines_to_next_cell": 2
|
||||||
},
|
},
|
||||||
@@ -108,10 +109,10 @@
|
|||||||
"id": "22f44ae0",
|
"id": "22f44ae0",
|
||||||
"metadata": {
|
"metadata": {
|
||||||
"execution": {
|
"execution": {
|
||||||
"iopub.execute_input": "2026-02-02T23:39:39.126552Z",
|
"iopub.execute_input": "2026-02-05T00:47:06.518601Z",
|
||||||
"iopub.status.busy": "2026-02-02T23:39:39.126448Z",
|
"iopub.status.busy": "2026-02-05T00:47:06.518485Z",
|
||||||
"iopub.status.idle": "2026-02-02T23:39:39.136455Z",
|
"iopub.status.idle": "2026-02-05T00:47:06.524299Z",
|
||||||
"shell.execute_reply": "2026-02-02T23:39:39.135968Z"
|
"shell.execute_reply": "2026-02-05T00:47:06.523724Z"
|
||||||
}
|
}
|
||||||
},
|
},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
@@ -136,10 +137,10 @@
|
|||||||
"id": "0c32e917",
|
"id": "0c32e917",
|
||||||
"metadata": {
|
"metadata": {
|
||||||
"execution": {
|
"execution": {
|
||||||
"iopub.execute_input": "2026-02-02T23:39:39.138496Z",
|
"iopub.execute_input": "2026-02-05T00:47:06.525648Z",
|
||||||
"iopub.status.busy": "2026-02-02T23:39:39.138360Z",
|
"iopub.status.busy": "2026-02-05T00:47:06.525549Z",
|
||||||
"iopub.status.idle": "2026-02-02T23:39:39.144210Z",
|
"iopub.status.idle": "2026-02-05T00:47:06.531033Z",
|
||||||
"shell.execute_reply": "2026-02-02T23:39:39.143911Z"
|
"shell.execute_reply": "2026-02-05T00:47:06.530633Z"
|
||||||
}
|
}
|
||||||
},
|
},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
@@ -166,17 +167,17 @@
|
|||||||
"id": "86ce4f85",
|
"id": "86ce4f85",
|
||||||
"metadata": {
|
"metadata": {
|
||||||
"execution": {
|
"execution": {
|
||||||
"iopub.execute_input": "2026-02-02T23:39:39.145874Z",
|
"iopub.execute_input": "2026-02-05T00:47:06.532517Z",
|
||||||
"iopub.status.busy": "2026-02-02T23:39:39.145769Z",
|
"iopub.status.busy": "2026-02-05T00:47:06.532422Z",
|
||||||
"iopub.status.idle": "2026-02-02T23:39:39.150170Z",
|
"iopub.status.idle": "2026-02-05T00:47:06.537369Z",
|
||||||
"shell.execute_reply": "2026-02-02T23:39:39.149908Z"
|
"shell.execute_reply": "2026-02-05T00:47:06.536814Z"
|
||||||
}
|
}
|
||||||
},
|
},
|
||||||
"outputs": [
|
"outputs": [
|
||||||
{
|
{
|
||||||
"data": {
|
"data": {
|
||||||
"text/plain": [
|
"text/plain": [
|
||||||
"23.61661706966988"
|
"np.float64(23.616617069669893)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
"execution_count": 5,
|
"execution_count": 5,
|
||||||
@@ -210,10 +211,10 @@
|
|||||||
"id": "50a66a97",
|
"id": "50a66a97",
|
||||||
"metadata": {
|
"metadata": {
|
||||||
"execution": {
|
"execution": {
|
||||||
"iopub.execute_input": "2026-02-02T23:39:39.151449Z",
|
"iopub.execute_input": "2026-02-05T00:47:06.538690Z",
|
||||||
"iopub.status.busy": "2026-02-02T23:39:39.151374Z",
|
"iopub.status.busy": "2026-02-05T00:47:06.538577Z",
|
||||||
"iopub.status.idle": "2026-02-02T23:39:39.153368Z",
|
"iopub.status.idle": "2026-02-05T00:47:06.540965Z",
|
||||||
"shell.execute_reply": "2026-02-02T23:39:39.153165Z"
|
"shell.execute_reply": "2026-02-05T00:47:06.540491Z"
|
||||||
}
|
}
|
||||||
},
|
},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
@@ -253,10 +254,10 @@
|
|||||||
"id": "d49b6999",
|
"id": "d49b6999",
|
||||||
"metadata": {
|
"metadata": {
|
||||||
"execution": {
|
"execution": {
|
||||||
"iopub.execute_input": "2026-02-02T23:39:39.154485Z",
|
"iopub.execute_input": "2026-02-05T00:47:06.542270Z",
|
||||||
"iopub.status.busy": "2026-02-02T23:39:39.154414Z",
|
"iopub.status.busy": "2026-02-05T00:47:06.542165Z",
|
||||||
"iopub.status.idle": "2026-02-02T23:39:39.167468Z",
|
"iopub.status.idle": "2026-02-05T00:47:06.560011Z",
|
||||||
"shell.execute_reply": "2026-02-02T23:39:39.167243Z"
|
"shell.execute_reply": "2026-02-05T00:47:06.559510Z"
|
||||||
}
|
}
|
||||||
},
|
},
|
||||||
"outputs": [
|
"outputs": [
|
||||||
@@ -297,10 +298,10 @@
|
|||||||
"id": "dac8bd54",
|
"id": "dac8bd54",
|
||||||
"metadata": {
|
"metadata": {
|
||||||
"execution": {
|
"execution": {
|
||||||
"iopub.execute_input": "2026-02-02T23:39:39.168726Z",
|
"iopub.execute_input": "2026-02-05T00:47:06.561313Z",
|
||||||
"iopub.status.busy": "2026-02-02T23:39:39.168651Z",
|
"iopub.status.busy": "2026-02-05T00:47:06.561212Z",
|
||||||
"iopub.status.idle": "2026-02-02T23:39:39.182426Z",
|
"iopub.status.idle": "2026-02-05T00:47:06.579372Z",
|
||||||
"shell.execute_reply": "2026-02-02T23:39:39.182175Z"
|
"shell.execute_reply": "2026-02-05T00:47:06.578863Z"
|
||||||
}
|
}
|
||||||
},
|
},
|
||||||
"outputs": [
|
"outputs": [
|
||||||
@@ -380,10 +381,10 @@
|
|||||||
"id": "601ae443",
|
"id": "601ae443",
|
||||||
"metadata": {
|
"metadata": {
|
||||||
"execution": {
|
"execution": {
|
||||||
"iopub.execute_input": "2026-02-02T23:39:39.184522Z",
|
"iopub.execute_input": "2026-02-05T00:47:06.580908Z",
|
||||||
"iopub.status.busy": "2026-02-02T23:39:39.184425Z",
|
"iopub.status.busy": "2026-02-05T00:47:06.580793Z",
|
||||||
"iopub.status.idle": "2026-02-02T23:39:40.007057Z",
|
"iopub.status.idle": "2026-02-05T00:47:07.783382Z",
|
||||||
"shell.execute_reply": "2026-02-02T23:39:40.006826Z"
|
"shell.execute_reply": "2026-02-05T00:47:07.782912Z"
|
||||||
},
|
},
|
||||||
"lines_to_next_cell": 0
|
"lines_to_next_cell": 0
|
||||||
},
|
},
|
||||||
@@ -391,7 +392,7 @@
|
|||||||
{
|
{
|
||||||
"data": {
|
"data": {
|
||||||
"text/plain": [
|
"text/plain": [
|
||||||
"24.23151351792924"
|
"np.float64(24.231513517929226)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
"execution_count": 9,
|
"execution_count": 9,
|
||||||
@@ -448,10 +449,10 @@
|
|||||||
"id": "11226c85",
|
"id": "11226c85",
|
||||||
"metadata": {
|
"metadata": {
|
||||||
"execution": {
|
"execution": {
|
||||||
"iopub.execute_input": "2026-02-02T23:39:40.008355Z",
|
"iopub.execute_input": "2026-02-05T00:47:07.785034Z",
|
||||||
"iopub.status.busy": "2026-02-02T23:39:40.008281Z",
|
"iopub.status.busy": "2026-02-05T00:47:07.784918Z",
|
||||||
"iopub.status.idle": "2026-02-02T23:39:40.469400Z",
|
"iopub.status.idle": "2026-02-05T00:47:08.636881Z",
|
||||||
"shell.execute_reply": "2026-02-02T23:39:40.469130Z"
|
"shell.execute_reply": "2026-02-05T00:47:08.636274Z"
|
||||||
},
|
},
|
||||||
"lines_to_next_cell": 0
|
"lines_to_next_cell": 0
|
||||||
},
|
},
|
||||||
@@ -459,7 +460,7 @@
|
|||||||
{
|
{
|
||||||
"data": {
|
"data": {
|
||||||
"text/plain": [
|
"text/plain": [
|
||||||
"array([24.23151352, 19.24821312, 19.33498406, 19.42443033, 19.03323827])"
|
"array([24.23151352, 19.24821312, 19.33498406, 19.4244303 , 19.03321527])"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
"execution_count": 10,
|
"execution_count": 10,
|
||||||
@@ -486,7 +487,7 @@
|
|||||||
"id": "a3a920ae",
|
"id": "a3a920ae",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"As in Figure~\\ref{Ch5:cvplot}, we see a sharp drop in the estimated test MSE between the linear and\n",
|
"As in Figure 5.4, we see a sharp drop in the estimated test MSE between the linear and\n",
|
||||||
"quadratic fits, but then no clear improvement from using higher-degree polynomials.\n",
|
"quadratic fits, but then no clear improvement from using higher-degree polynomials.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Above we introduced the `outer()` method of the `np.power()`\n",
|
"Above we introduced the `outer()` method of the `np.power()`\n",
|
||||||
@@ -505,10 +506,10 @@
 "id": "64b64d97",
 "metadata": {
 "execution": {
- "iopub.execute_input": "2026-02-02T23:39:40.470650Z",
+ "iopub.execute_input": "2026-02-05T00:47:08.638904Z",
- "iopub.status.busy": "2026-02-02T23:39:40.470570Z",
+ "iopub.status.busy": "2026-02-05T00:47:08.638760Z",
- "iopub.status.idle": "2026-02-02T23:39:40.472721Z",
+ "iopub.status.idle": "2026-02-05T00:47:08.641760Z",
- "shell.execute_reply": "2026-02-02T23:39:40.472520Z"
+ "shell.execute_reply": "2026-02-05T00:47:08.641094Z"
 }
 },
 "outputs": [
@@ -547,10 +548,10 @@
 "id": "ca0f972f",
 "metadata": {
 "execution": {
- "iopub.execute_input": "2026-02-02T23:39:40.473794Z",
+ "iopub.execute_input": "2026-02-05T00:47:08.643305Z",
- "iopub.status.busy": "2026-02-02T23:39:40.473725Z",
+ "iopub.status.busy": "2026-02-05T00:47:08.643162Z",
- "iopub.status.idle": "2026-02-02T23:39:40.491806Z",
+ "iopub.status.idle": "2026-02-05T00:47:08.675141Z",
- "shell.execute_reply": "2026-02-02T23:39:40.491547Z"
+ "shell.execute_reply": "2026-02-05T00:47:08.674740Z"
 },
 "lines_to_next_cell": 0
 },
@@ -558,7 +559,7 @@
 {
 "data": {
 "text/plain": [
- "array([24.20766449, 19.18533142, 19.27626666, 19.47848403, 19.13720581])"
+ "array([24.20766449, 19.18533142, 19.27626666, 19.47848402, 19.13718373])"
 ]
 },
 "execution_count": 12,
@@ -589,7 +590,7 @@
 "Notice that the computation time is much shorter than that of LOOCV.\n",
 "(In principle, the computation time for LOOCV for a least squares\n",
 "linear model should be faster than for $k$-fold CV, due to the\n",
- "availability of the formula~(\\ref{Ch5:eq:LOOCVform}) for LOOCV;\n",
+ "availability of the formula~(5.2) for LOOCV;\n",
 "however, the generic `cross_validate()` function does not make\n",
 "use of this formula.) We still see little evidence that using cubic\n",
 "or higher-degree polynomial terms leads to a lower test error than simply\n",
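The LOOCV shortcut the changed line refers to, formula (5.2), says that for least squares one full fit suffices: $CV_{(n)} = \frac{1}{n}\sum_i \big(\frac{y_i - \hat y_i}{1 - h_i}\big)^2$, with $h_i$ the leverage. A minimal numpy sketch (illustrative only; the lab's `cross_validate()` does not use it), checked against a naive leave-one-out loop:

```python
import numpy as np

def loocv_mse(X, y):
    """LOOCV MSE for least squares via the leverage shortcut (5.2):
    mean(((y - yhat) / (1 - h))**2), h = diag of the hat matrix.
    One fit instead of n refits."""
    X1 = np.column_stack([np.ones(len(y)), X])   # add intercept column
    H = X1 @ np.linalg.solve(X1.T @ X1, X1.T)    # hat matrix X(X'X)^{-1}X'
    resid = y - H @ y                            # residuals from the single full fit
    return np.mean((resid / (1 - np.diag(H))) ** 2)

def loocv_mse_naive(X, y):
    """Brute-force check: refit n times, holding out one observation each time."""
    errs = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        X1 = np.column_stack([np.ones(keep.sum()), X[keep]])
        beta = np.linalg.lstsq(X1, y[keep], rcond=None)[0]
        pred = np.concatenate([[1.0], np.atleast_1d(X[i])]) @ beta
        errs.append((y[i] - pred) ** 2)
    return np.mean(errs)
```

On any dataset the two agree to numerical precision, which is why the shortcut is exact rather than an approximation.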
@@ -613,10 +614,10 @@
 "id": "080cdb29",
 "metadata": {
 "execution": {
- "iopub.execute_input": "2026-02-02T23:39:40.493215Z",
+ "iopub.execute_input": "2026-02-05T00:47:08.677151Z",
- "iopub.status.busy": "2026-02-02T23:39:40.493138Z",
+ "iopub.status.busy": "2026-02-05T00:47:08.676983Z",
- "iopub.status.idle": "2026-02-02T23:39:40.498978Z",
+ "iopub.status.idle": "2026-02-05T00:47:08.685777Z",
- "shell.execute_reply": "2026-02-02T23:39:40.498768Z"
+ "shell.execute_reply": "2026-02-05T00:47:08.685174Z"
 },
 "lines_to_next_cell": 2
 },
@@ -657,17 +658,17 @@
 "id": "7c46de2b",
 "metadata": {
 "execution": {
- "iopub.execute_input": "2026-02-02T23:39:40.500169Z",
+ "iopub.execute_input": "2026-02-05T00:47:08.687387Z",
- "iopub.status.busy": "2026-02-02T23:39:40.500100Z",
+ "iopub.status.busy": "2026-02-05T00:47:08.687268Z",
- "iopub.status.idle": "2026-02-02T23:39:40.525494Z",
+ "iopub.status.idle": "2026-02-05T00:47:08.726376Z",
- "shell.execute_reply": "2026-02-02T23:39:40.525258Z"
+ "shell.execute_reply": "2026-02-05T00:47:08.725862Z"
 }
 },
 "outputs": [
 {
 "data": {
 "text/plain": [
- "(23.802232661034168, 1.421845094109185)"
+ "(np.float64(23.802232661034168), np.float64(1.4218450941091916))"
 ]
 },
 "execution_count": 14,
@@ -699,7 +700,7 @@
 "\n",
 "## The Bootstrap\n",
 "We illustrate the use of the bootstrap in the simple example\n",
- " {of Section~\\ref{Ch5:sec:bootstrap},} as well as on an example involving\n",
+ " {of Section 5.2,} as well as on an example involving\n",
 "estimating the accuracy of the linear regression model on the `Auto`\n",
 "data set.\n",
 "### Estimating the Accuracy of a Statistic of Interest\n",
@@ -714,8 +715,8 @@
 "To illustrate the bootstrap, we\n",
 "start with a simple example.\n",
 "The `Portfolio` data set in the `ISLP` package is described\n",
- "in Section~\\ref{Ch5:sec:bootstrap}. The goal is to estimate the\n",
- "sampling variance of the parameter $\\alpha$ given in formula~(\\ref{Ch5:min.var}). We will\n",
+ "in Section 5.2. The goal is to estimate the\n",
+ "sampling variance of the parameter $\\alpha$ given in formula~(5.7). We will\n",
 "create a function\n",
 "`alpha_func()`, which takes as input a dataframe `D` assumed\n",
 "to have columns `X` and `Y`, as well as a\n",
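Formula (5.7) is the minimum-variance allocation $\alpha = \frac{\sigma_Y^2 - \sigma_{XY}}{\sigma_X^2 + \sigma_Y^2 - 2\sigma_{XY}}$. A sketch of an `alpha_func()` along the lines described (the `X`/`Y` column names follow the text; the exact body in the lab may differ):

```python
import numpy as np
import pandas as pd

def alpha_func(D, idx):
    """Minimum-variance allocation between two assets, formula (5.7):
    alpha = (var(Y) - cov(X, Y)) / (var(X) + var(Y) - 2 cov(X, Y)),
    computed on the rows of D selected by idx."""
    cov = np.cov(D[['X', 'Y']].loc[idx], rowvar=False)
    return (cov[1, 1] - cov[0, 1]) / (cov[0, 0] + cov[1, 1] - 2 * cov[0, 1])
```

For two independent assets with equal variance, the estimate should sit near 0.5, since neither asset is preferred.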
@@ -731,10 +732,10 @@
 "id": "a4b6d9b3",
 "metadata": {
 "execution": {
- "iopub.execute_input": "2026-02-02T23:39:40.526599Z",
+ "iopub.execute_input": "2026-02-05T00:47:08.727991Z",
- "iopub.status.busy": "2026-02-02T23:39:40.526530Z",
+ "iopub.status.busy": "2026-02-05T00:47:08.727860Z",
- "iopub.status.idle": "2026-02-02T23:39:40.529182Z",
+ "iopub.status.idle": "2026-02-05T00:47:08.731089Z",
- "shell.execute_reply": "2026-02-02T23:39:40.528979Z"
+ "shell.execute_reply": "2026-02-05T00:47:08.730582Z"
 },
 "lines_to_next_cell": 0
 },
@@ -754,7 +755,7 @@
 "source": [
 "This function returns an estimate for $\\alpha$\n",
 "based on applying the minimum\n",
- " variance formula (\\ref{Ch5:min.var}) to the observations indexed by\n",
+ " variance formula (5.7) to the observations indexed by\n",
 "the argument `idx`. For instance, the following command\n",
 "estimates $\\alpha$ using all 100 observations."
 ]
@@ -765,17 +766,17 @@
 "id": "81498a11",
 "metadata": {
 "execution": {
- "iopub.execute_input": "2026-02-02T23:39:40.530362Z",
+ "iopub.execute_input": "2026-02-05T00:47:08.732467Z",
- "iopub.status.busy": "2026-02-02T23:39:40.530277Z",
+ "iopub.status.busy": "2026-02-05T00:47:08.732368Z",
- "iopub.status.idle": "2026-02-02T23:39:40.532680Z",
+ "iopub.status.idle": "2026-02-05T00:47:08.735439Z",
- "shell.execute_reply": "2026-02-02T23:39:40.532462Z"
+ "shell.execute_reply": "2026-02-05T00:47:08.734958Z"
 }
 },
 "outputs": [
 {
 "data": {
 "text/plain": [
- "0.57583207459283"
+ "np.float64(0.57583207459283)"
 ]
 },
 "execution_count": 16,
@@ -804,10 +805,10 @@
 "id": "64fe1cb6",
 "metadata": {
 "execution": {
- "iopub.execute_input": "2026-02-02T23:39:40.533876Z",
+ "iopub.execute_input": "2026-02-05T00:47:08.736816Z",
- "iopub.status.busy": "2026-02-02T23:39:40.533810Z",
+ "iopub.status.busy": "2026-02-05T00:47:08.736715Z",
- "iopub.status.idle": "2026-02-02T23:39:40.536977Z",
+ "iopub.status.idle": "2026-02-05T00:47:08.739777Z",
- "shell.execute_reply": "2026-02-02T23:39:40.536767Z"
+ "shell.execute_reply": "2026-02-05T00:47:08.739384Z"
 },
 "lines_to_next_cell": 2
 },
@@ -815,7 +816,7 @@
 {
 "data": {
 "text/plain": [
- "0.6074452469619004"
+ "np.float64(0.6074452469619002)"
 ]
 },
 "execution_count": 17,
@@ -847,10 +848,10 @@
 "id": "dd16bbae",
 "metadata": {
 "execution": {
- "iopub.execute_input": "2026-02-02T23:39:40.538089Z",
+ "iopub.execute_input": "2026-02-05T00:47:08.741328Z",
- "iopub.status.busy": "2026-02-02T23:39:40.538010Z",
+ "iopub.status.busy": "2026-02-05T00:47:08.741207Z",
- "iopub.status.idle": "2026-02-02T23:39:40.540142Z",
+ "iopub.status.idle": "2026-02-05T00:47:08.743628Z",
- "shell.execute_reply": "2026-02-02T23:39:40.539916Z"
+ "shell.execute_reply": "2026-02-05T00:47:08.743222Z"
 },
 "lines_to_next_cell": 0
 },
@@ -892,17 +893,17 @@
 "id": "b42b4585",
 "metadata": {
 "execution": {
- "iopub.execute_input": "2026-02-02T23:39:40.541333Z",
+ "iopub.execute_input": "2026-02-05T00:47:08.744970Z",
- "iopub.status.busy": "2026-02-02T23:39:40.541248Z",
+ "iopub.status.busy": "2026-02-05T00:47:08.744866Z",
- "iopub.status.idle": "2026-02-02T23:39:40.761713Z",
+ "iopub.status.idle": "2026-02-05T00:47:09.034568Z",
- "shell.execute_reply": "2026-02-02T23:39:40.761472Z"
+ "shell.execute_reply": "2026-02-05T00:47:09.034067Z"
 }
 },
 "outputs": [
 {
 "data": {
 "text/plain": [
- "0.09118176521277699"
+ "np.float64(0.09118176521277699)"
 ]
 },
 "execution_count": 19,
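The cell above reports a bootstrap standard error of about 0.091 for $\hat\alpha$. The general pattern behind a bootstrap SE can be sketched generically (this is an illustrative helper, not the lab's `boot_SE()`, which works on dataframes and index sets):

```python
import numpy as np

def boot_se(stat, data, B=1000, seed=0):
    """Bootstrap SE of a statistic: resample rows with replacement B times,
    recompute the statistic on each resample, and take the standard
    deviation of those B replicates."""
    rng = np.random.default_rng(seed)
    n = len(data)
    reps = [stat(data[rng.integers(0, n, n)]) for _ in range(B)]
    return np.std(reps)

x = np.random.default_rng(1).normal(size=200)
se = boot_se(np.mean, x)
# for the sample mean, this should be close to the textbook SE, std(x)/sqrt(n)
assert abs(se - x.std() / np.sqrt(200)) < 0.02
```

The sanity check works because for the sample mean the bootstrap SE has a known closed form to compare against; for statistics like $\hat\alpha$ no such formula exists, which is exactly why the bootstrap is useful.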
@@ -934,7 +935,7 @@
 "`horsepower` to predict `mpg` in the `Auto` data set. We\n",
 "will compare the estimates obtained using the bootstrap to those\n",
 "obtained using the formulas for ${\\rm SE}(\\hat{\\beta}_0)$ and\n",
- "${\\rm SE}(\\hat{\\beta}_1)$ described in Section~\\ref{Ch3:secoefsec}.\n",
+ "${\\rm SE}(\\hat{\\beta}_1)$ described in Section 3.1.2.\n",
 "\n",
 "To use our `boot_SE()` function, we must write a function (its\n",
 "first argument)\n",
@@ -958,10 +959,10 @@
 "id": "6bc11784",
 "metadata": {
 "execution": {
- "iopub.execute_input": "2026-02-02T23:39:40.762911Z",
+ "iopub.execute_input": "2026-02-05T00:47:09.036186Z",
- "iopub.status.busy": "2026-02-02T23:39:40.762837Z",
+ "iopub.status.busy": "2026-02-05T00:47:09.036064Z",
- "iopub.status.idle": "2026-02-02T23:39:40.764597Z",
+ "iopub.status.idle": "2026-02-05T00:47:09.038120Z",
- "shell.execute_reply": "2026-02-02T23:39:40.764401Z"
+ "shell.execute_reply": "2026-02-05T00:47:09.037695Z"
 },
 "lines_to_next_cell": 0
 },
@@ -993,10 +994,10 @@
 "id": "740cd50c",
 "metadata": {
 "execution": {
- "iopub.execute_input": "2026-02-02T23:39:40.765800Z",
+ "iopub.execute_input": "2026-02-05T00:47:09.039473Z",
- "iopub.status.busy": "2026-02-02T23:39:40.765734Z",
+ "iopub.status.busy": "2026-02-05T00:47:09.039363Z",
- "iopub.status.idle": "2026-02-02T23:39:40.767191Z",
+ "iopub.status.idle": "2026-02-05T00:47:09.041198Z",
- "shell.execute_reply": "2026-02-02T23:39:40.766986Z"
+ "shell.execute_reply": "2026-02-05T00:47:09.040802Z"
 },
 "lines_to_next_cell": 0
 },
@@ -1026,10 +1027,10 @@
 "id": "ffb3ec50",
 "metadata": {
 "execution": {
- "iopub.execute_input": "2026-02-02T23:39:40.768266Z",
+ "iopub.execute_input": "2026-02-05T00:47:09.042630Z",
- "iopub.status.busy": "2026-02-02T23:39:40.768204Z",
+ "iopub.status.busy": "2026-02-05T00:47:09.042520Z",
- "iopub.status.idle": "2026-02-02T23:39:40.789931Z",
+ "iopub.status.idle": "2026-02-05T00:47:09.074556Z",
- "shell.execute_reply": "2026-02-02T23:39:40.789705Z"
+ "shell.execute_reply": "2026-02-05T00:47:09.074079Z"
 },
 "lines_to_next_cell": 0
 },
@@ -1077,10 +1078,10 @@
 "id": "7d561f70",
 "metadata": {
 "execution": {
- "iopub.execute_input": "2026-02-02T23:39:40.791196Z",
+ "iopub.execute_input": "2026-02-05T00:47:09.076044Z",
- "iopub.status.busy": "2026-02-02T23:39:40.791129Z",
+ "iopub.status.busy": "2026-02-05T00:47:09.075921Z",
- "iopub.status.idle": "2026-02-02T23:39:42.690473Z",
+ "iopub.status.idle": "2026-02-05T00:47:11.975218Z",
- "shell.execute_reply": "2026-02-02T23:39:42.690189Z"
+ "shell.execute_reply": "2026-02-05T00:47:11.974653Z"
 },
 "lines_to_next_cell": 2
 },
@@ -1115,7 +1116,7 @@
 "0.85, and that the bootstrap\n",
 "estimate for ${\\rm SE}(\\hat{\\beta}_1)$ is\n",
 "0.0074. As discussed in\n",
- "Section~\\ref{Ch3:secoefsec}, standard formulas can be used to compute\n",
+ "Section 3.1.2, standard formulas can be used to compute\n",
 "the standard errors for the regression coefficients in a linear\n",
 "model. These can be obtained using the `summarize()` function\n",
 "from `ISLP.sm`."
@@ -1127,10 +1128,10 @@
 "id": "3888aa0a",
 "metadata": {
 "execution": {
- "iopub.execute_input": "2026-02-02T23:39:42.692058Z",
+ "iopub.execute_input": "2026-02-05T00:47:11.976651Z",
- "iopub.status.busy": "2026-02-02T23:39:42.691967Z",
+ "iopub.status.busy": "2026-02-05T00:47:11.976530Z",
- "iopub.status.idle": "2026-02-02T23:39:42.737592Z",
+ "iopub.status.idle": "2026-02-05T00:47:11.995335Z",
- "shell.execute_reply": "2026-02-02T23:39:42.737346Z"
+ "shell.execute_reply": "2026-02-05T00:47:11.994822Z"
 },
 "lines_to_next_cell": 2
 },
@@ -1160,7 +1161,7 @@
 "metadata": {},
 "source": [
 "The standard error estimates for $\\hat{\\beta}_0$ and $\\hat{\\beta}_1$\n",
- "obtained using the formulas from Section~\\ref{Ch3:secoefsec} are\n",
+ "obtained using the formulas from Section 3.1.2 are\n",
 "0.717 for the\n",
 "intercept and\n",
 "0.006 for the\n",
@@ -1168,13 +1169,13 @@
 "obtained using the bootstrap. Does this indicate a problem with the\n",
 "bootstrap? In fact, it suggests the opposite. Recall that the\n",
 "standard formulas given in\n",
- " {Equation~\\ref{Ch3:se.eqn} on page~\\pageref{Ch3:se.eqn}}\n",
+ "Equation 3.8 on page 75\n",
 "rely on certain assumptions. For example,\n",
 "they depend on the unknown parameter $\\sigma^2$, the noise\n",
 "variance. We then estimate $\\sigma^2$ using the RSS. Now although the\n",
 "formulas for the standard errors do not rely on the linear model being\n",
 "correct, the estimate for $\\sigma^2$ does. We see\n",
- " {in Figure~\\ref{Ch3:polyplot} on page~\\pageref{Ch3:polyplot}} that there is\n",
+ "in Figure 3.8 on page 99 that there is\n",
 "a non-linear relationship in the data, and so the residuals from a\n",
 "linear fit will be inflated, and so will $\\hat{\\sigma}^2$. Secondly,\n",
 "the standard formulas assume (somewhat unrealistically) that the $x_i$\n",
@@ -1187,7 +1188,7 @@
 "Below we compute the bootstrap standard error estimates and the\n",
 "standard linear regression estimates that result from fitting the\n",
 "quadratic model to the data. Since this model provides a good fit to\n",
- "the data (Figure~\\ref{Ch3:polyplot}), there is now a better\n",
+ "the data (Figure 3.8), there is now a better\n",
 "correspondence between the bootstrap estimates and the standard\n",
 "estimates of ${\\rm SE}(\\hat{\\beta}_0)$, ${\\rm SE}(\\hat{\\beta}_1)$ and\n",
 "${\\rm SE}(\\hat{\\beta}_2)$."
@@ -1199,10 +1200,10 @@
 "id": "acc3e32c",
 "metadata": {
 "execution": {
- "iopub.execute_input": "2026-02-02T23:39:42.739037Z",
+ "iopub.execute_input": "2026-02-05T00:47:11.996827Z",
- "iopub.status.busy": "2026-02-02T23:39:42.738882Z",
+ "iopub.status.busy": "2026-02-05T00:47:11.996698Z",
- "iopub.status.idle": "2026-02-02T23:39:45.405121Z",
+ "iopub.status.idle": "2026-02-05T00:47:16.184970Z",
- "shell.execute_reply": "2026-02-02T23:39:45.404784Z"
+ "shell.execute_reply": "2026-02-05T00:47:16.184301Z"
 }
 },
 "outputs": [
@@ -1242,10 +1243,10 @@
 "id": "dca5340c",
 "metadata": {
 "execution": {
- "iopub.execute_input": "2026-02-02T23:39:45.406403Z",
+ "iopub.execute_input": "2026-02-05T00:47:16.186711Z",
- "iopub.status.busy": "2026-02-02T23:39:45.406328Z",
+ "iopub.status.busy": "2026-02-05T00:47:16.186585Z",
- "iopub.status.idle": "2026-02-02T23:39:45.416112Z",
+ "iopub.status.idle": "2026-02-05T00:47:16.197450Z",
- "shell.execute_reply": "2026-02-02T23:39:45.415874Z"
+ "shell.execute_reply": "2026-02-05T00:47:16.197112Z"
 },
 "lines_to_next_cell": 0
 },
@@ -1272,7 +1273,7 @@
 },
 {
 "cell_type": "markdown",
- "id": "e98297be",
+ "id": "d714b6a3",
 "metadata": {},
 "source": [
 "\n",
@@ -1283,8 +1284,7 @@
 "metadata": {
 "jupytext": {
 "cell_metadata_filter": "-all",
- "formats": "ipynb",
- "main_language": "python"
+ "formats": "ipynb"
 },
 "kernelspec": {
 "display_name": "Python 3 (ipykernel)",
@@ -1301,7 +1301,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
- "version": "3.11.4"
+ "version": "3.12.10"
 }
 },
 "nbformat": 4,
@@ -86,7 +86,7 @@ Hitters.shape
 ```
 
 
-We first choose the best model using forward selection based on $C_p$ (\ref{Ch6:eq:cp}). This score
+We first choose the best model using forward selection based on $C_p$ (6.2). This score
 is not built in as a metric to `sklearn`. We therefore define a function to compute it ourselves, and use
 it as a scorer. By default, `sklearn` tries to maximize a score, hence
 our scoring function computes the negative $C_p$ statistic.
@@ -111,7 +111,7 @@ sigma2 = OLS(Y,X).fit().scale
 
 ```
 
-The function `sklearn_selected()` expects a scorer with just three arguments --- the last three in the definition of `nCp()` above. We use the function `partial()` first seen in Section~\ref{Ch5-resample-lab:the-bootstrap} to freeze the first argument with our estimate of $\sigma^2$.
+The function `sklearn_selected()` expects a scorer with just three arguments --- the last three in the definition of `nCp()` above. We use the function `partial()` first seen in Section 5.3.3 to freeze the first argument with our estimate of $\sigma^2$.
 
 ```{python}
 neg_Cp = partial(nCp, sigma2)
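The `partial()` trick in the hunk above can be seen in isolation: freezing the first argument turns a four-argument function into a three-argument one. This is a standalone sketch; `nCp` here is an illustrative stand-in with a simplified body, not the lab's exact scorer:

```python
from functools import partial

def nCp(sigma2, yhat, y, d):
    """Illustrative negative C_p-style score, -(RSS + 2*d*sigma2)/n.
    (A sketch of the statistic's shape, not the lab's exact scorer.)"""
    rss = sum((yi - yhi) ** 2 for yi, yhi in zip(y, yhat))
    return -(rss + 2 * d * sigma2) / len(y)

sigma2 = 1.5
neg_Cp = partial(nCp, sigma2)  # sigma2 is now frozen as the first argument
# neg_Cp has the reduced signature (yhat, y, d)
print(neg_Cp([1.0, 2.0], [1.0, 2.5], 1))  # -(0.25 + 2*1*1.5)/2 = -1.625
```

Freezing arguments this way is exactly what lets a statistic with an extra nuisance parameter plug into an interface that expects a fixed signature.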
@@ -363,7 +363,7 @@ Since we
 standardize first, in order to find coefficient
 estimates on the original scale, we must *unstandardize*
 the coefficient estimates. The parameter
-$\lambda$ in (\ref{Ch6:ridge}) and (\ref{Ch6:LASSO}) is called `alphas` in `sklearn`. In order to
+$\lambda$ in (6.5) and (6.7) is called `alphas` in `sklearn`. In order to
 be consistent with the rest of this chapter, we use `lambdas`
 rather than `alphas` in what follows. {At the time of publication, ridge fits like the one in code chunk [23] issue unwarranted convergence warning messages; we suppressed these when we filtered the warnings above.}
 
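The *unstandardize* step mentioned in the hunk above can be made concrete with a closed-form ridge fit in numpy (a self-contained sketch; the lab itself uses `sklearn` estimators, where $\lambda$ is the `alpha` parameter): if $\hat\beta_s$ is fit on $X_s = (X - \mu)/\sigma$, then $\hat\beta = \hat\beta_s / \sigma$ and $\hat\beta_0 = \bar y - \mu^\top \hat\beta$ reproduce the same fit on the original scale.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3)) * np.array([1.0, 10.0, 0.1])  # predictors on mixed scales
y = X @ np.array([2.0, -0.3, 5.0]) + rng.normal(size=50)

# standardize, then fit ridge in closed form on the centered response
mu, sd = X.mean(0), X.std(0)
Xs = (X - mu) / sd
lam = 10.0
b_std = np.linalg.solve(Xs.T @ Xs + lam * np.eye(3), Xs.T @ (y - y.mean()))

# unstandardize: coefficients and intercept on the original scale
b_orig = b_std / sd
b0 = y.mean() - mu @ b_orig

pred_std = y.mean() + Xs @ b_std
pred_orig = b0 + X @ b_orig
assert np.allclose(pred_std, pred_orig)  # same fit, two parameterizations
```

The assertion holds by algebra, so unstandardizing changes only how the coefficients are reported, never the fitted values.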
@@ -540,7 +540,7 @@ grid.best_params_['ridge__alpha']
 grid.best_estimator_
 
 ```
-Recall we set up the `kfold` object for 5-fold cross-validation on page~\pageref{line:choos-among-models}. We now plot the cross-validated MSE as a function of $-\log(\lambda)$, which has shrinkage decreasing from left
+Recall we set up the `kfold` object for 5-fold cross-validation on page 271. We now plot the cross-validated MSE as a function of $-\log(\lambda)$, which has shrinkage decreasing from left
 to right.
 
 ```{python}
@@ -640,7 +640,7 @@ not perform variable selection!
 ### Evaluating Test Error of Cross-Validated Ridge
 Choosing $\lambda$ using cross-validation provides a single regression
 estimator, similar to fitting a linear regression model as we saw in
-Chapter~\ref{Ch3:linreg}. It is therefore reasonable to estimate what its test error
+Chapter 3. It is therefore reasonable to estimate what its test error
 is. We run into a problem here in that cross-validation will have
 *touched* all of its data in choosing $\lambda$, hence we have no
 further data to estimate test error. A compromise is to do an initial
@@ -728,7 +728,7 @@ ax.set_ylabel('Standardized coefficiients', fontsize=20);
 ```
 The smallest cross-validated error is lower than the test set MSE of the null model
 and of least squares, and very similar to the test MSE of 115526.71 of ridge
-regression (page~\pageref{page:MSECVRidge}) with $\lambda$ chosen by cross-validation.
+regression (page 278) with $\lambda$ chosen by cross-validation.
 
 ```{python}
 np.min(tuned_lasso.mse_path_.mean(1))
@@ -776,11 +776,11 @@ Principal components regression (PCR) can be performed using
 `PCA()` from the `sklearn.decomposition`
 module. We now apply PCR to the `Hitters` data, in order to
 predict `Salary`. Again, ensure that the missing values have
-been removed from the data, as described in Section~\ref{Ch6-varselect-lab:lab-1-subset-selection-methods}.
+been removed from the data, as described in Section 6.5.1.
 
 We use `LinearRegression()` to fit the regression model
 here. Note that it fits an intercept by default, unlike
-the `OLS()` function seen earlier in Section~\ref{Ch6-varselect-lab:lab-1-subset-selection-methods}.
+the `OLS()` function seen earlier in Section 6.5.1.
 
 ```{python}
 pca = PCA(n_components=2)
@@ -864,7 +864,7 @@ cv_null = skm.cross_validate(linreg,
 The `explained_variance_ratio_`
 attribute of our `PCA` object provides the *percentage of variance explained* in the predictors and in the response using
 different numbers of components. This concept is discussed in greater
-detail in Section~\ref{Ch10:sec:pca}.
+detail in Section 12.2.
 
 ```{python}
 pipe.named_steps['pca'].explained_variance_ratio_
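What `explained_variance_ratio_` reports can be sketched from first principles via the SVD of the centered data matrix (an illustrative reconstruction, not sklearn's internal code): squared singular values are proportional to component variances, so their normalized values give the per-component share.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
Xc = X - X.mean(0)                   # PCA operates on centered columns

# squared singular values are proportional to the variance captured
# by each principal component; normalizing mirrors
# PCA.explained_variance_ratio_ with all components kept
s = np.linalg.svd(Xc, compute_uv=False)
ratio = s**2 / np.sum(s**2)

assert np.all(np.diff(ratio) <= 0)   # components ordered by variance explained
assert np.isclose(ratio.sum(), 1.0)  # all components together explain everything
```

With fewer retained components (as in the PCR pipeline above), the ratios are the leading entries of this vector and sum to less than one.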
File diff suppressed because one or more lines are too long
@@ -47,7 +47,7 @@ from ISLP.pygam import (approx_lam,
 ```
 
 ## Polynomial Regression and Step Functions
-We start by demonstrating how Figure~\ref{Ch7:fig:poly} can be reproduced.
+We start by demonstrating how Figure 7.1 can be reproduced.
 Let's begin by loading the data.
 
 ```{python}
@@ -59,7 +59,7 @@ age = Wage['age']
 
 Throughout most of this lab, our response is `Wage['wage']`, which
 we have stored as `y` above.
-As in Section~\ref{Ch3-linreg-lab:non-linear-transformations-of-the-predictors}, we will use the `poly()` function to create a model matrix
+As in Section 3.6.6, we will use the `poly()` function to create a model matrix
 that will fit a $4$th degree polynomial in `age`.
 
 ```{python}
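The shape of a degree-4 model matrix can be sketched with plain numpy (a raw, non-orthogonalized basis for illustration; `ISLP`'s `poly()`/`Poly()` produce an orthogonalized version of these columns):

```python
import numpy as np

age = np.array([25.0, 40.0, 55.0, 70.0])

# raw degree-4 basis: columns age, age^2, age^3, age^4
# (np.vander with increasing=True also yields an intercept column, dropped here)
X = np.vander(age, N=5, increasing=True)[:, 1:]

assert X.shape == (4, 4)
assert np.allclose(X[:, 1], age**2)
```

Orthogonalizing these columns, as `poly()` does, leaves the fitted values unchanged but makes the coefficient estimates numerically stable and mutually uncorrelated.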
@@ -73,11 +73,11 @@ summarize(M)
 This polynomial is constructed using the function `poly()`,
 which creates
 a special *transformer* `Poly()` (using `sklearn` terminology
-for feature transformations such as `PCA()` seen in Section \ref{Ch6-varselect-lab:principal-components-regression}) which
+for feature transformations such as `PCA()` seen in Section 6.5.3) which
 allows for easy evaluation of the polynomial at new data points. Here `poly()` is referred to as a *helper* function, and sets up the transformation; `Poly()` is the actual workhorse that computes the transformation. See also
 the
 discussion of transformations on
-page~\pageref{Ch3-linreg-lab:using-transformations-fit-and-transform}.
+page 118.
 
 In the code above, the first line executes the `fit()` method
 using the dataframe
@@ -140,7 +140,7 @@ def plot_wage_fit(age_df,
 We include an argument `alpha` to `ax.scatter()`
 to add some transparency to the points. This provides a visual indication
 of density. Notice the use of the `zip()` function in the
-`for` loop above (see Section~\ref{Ch2-statlearn-lab:for-loops}).
+`for` loop above (see Section 2.3.8).
 We have three lines to plot, each with different colors and line
 types. Here `zip()` conveniently bundles these together as
 iterators in the loop. {In `Python`{} speak, an "iterator" is an object with a finite number of values, that can be iterated on, as in a loop.}
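The `zip()` idiom mentioned in this hunk can be seen in isolation; the lists here are made up for illustration:

```python
lines = ['fit', 'upper', 'lower']
colors = ['black', 'red', 'red']
styles = ['-', '--', '--']

# zip() bundles the three sequences into one iterator of triples,
# so a single loop walks them in lockstep
bundled = [(name, c, s) for name, c, s in zip(lines, colors, styles)]
print(bundled[0])  # ('fit', 'black', '-')
```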
@@ -243,7 +243,7 @@ anova_lm(*[sm.OLS(y, X_).fit() for X_ in XEs])


 As an alternative to using hypothesis tests and ANOVA, we could choose
-the polynomial degree using cross-validation, as discussed in Chapter~\ref{Ch5:resample}.
+the polynomial degree using cross-validation, as discussed in Chapter 5.

 Next we consider the task of predicting whether an individual earns
 more than $250,000 per year. We proceed much as before, except
@@ -302,7 +302,7 @@ value do not cover each other up. This type of plot is often called a
 *rug plot*.

 In order to fit a step function, as discussed in
-Section~\ref{Ch7:sec:scolstep-function}, we first use the `pd.qcut()`
+Section 7.2, we first use the `pd.qcut()`
 function to discretize `age` based on quantiles. Then we use `pd.get_dummies()` to create the
 columns of the model matrix for this categorical variable. Note that this function will
 include *all* columns for a given categorical, rather than the usual approach which drops one
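On made-up ages, the `pd.qcut()` plus `pd.get_dummies()` recipe from this hunk looks like the following; note that all four quartile columns are kept, none dropped:

```python
import pandas as pd

age = pd.Series([18, 25, 33, 41, 52, 60, 67, 74])
cut_age = pd.qcut(age, 4)          # four quantile-based bins
dummies = pd.get_dummies(cut_age)  # one indicator column per bin

print(dummies.shape)  # (8, 4)
```

Each row has exactly one nonzero entry, so these columns alone (without an intercept) form a valid model matrix for the step function.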
@@ -334,7 +334,7 @@ evaluation functions are in the `scipy.interpolate` package;
 we have simply wrapped them as transforms
 similar to `Poly()` and `PCA()`.

-In Section~\ref{Ch7:sec:scolr-splin}, we saw
+In Section 7.4, we saw
 that regression splines can be fit by constructing an appropriate
 matrix of basis functions. The `BSpline()` function generates the
 entire matrix of basis functions for splines with the specified set of
@@ -349,7 +349,7 @@ bs_age.shape
 ```
 This results in a seven-column matrix, which is what is expected for a cubic-spline basis with 3 interior knots.
 We can form this same matrix using the `bs()` object,
-which facilitates adding this to a model-matrix builder (as in `poly()` versus its workhorse `Poly()`) described in Section~\ref{Ch7-nonlin-lab:polynomial-regression-and-step-functions}.
+which facilitates adding this to a model-matrix builder (as in `poly()` versus its workhorse `Poly()`) described in Section 7.8.1.

 We now fit a cubic spline model to the `Wage` data.

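The seven-column count mentioned in this hunk (3 interior knots + degree 3 + 1) can be checked directly with `scipy.interpolate.BSpline.design_matrix()`, assuming a reasonably recent SciPy; the knot values below are made up:

```python
import numpy as np
from scipy.interpolate import BSpline

# cubic spline (degree k = 3) with 3 interior knots; for a clamped
# spline, each boundary knot is repeated k + 1 times
k = 3
interior = [33.75, 42.0, 51.0]
t = np.concatenate([[18.0] * (k + 1), interior, [80.0] * (k + 1)])

x = np.linspace(20, 70, 50)
B = BSpline.design_matrix(x, t, k).toarray()
print(B.shape)  # (50, 7): 3 interior knots + degree 3 + 1 = 7 basis columns
```

Within the boundary knots the basis functions sum to one in every row, a standard B-spline property that makes a quick sanity check.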
@@ -458,7 +458,7 @@ of a model matrix with a particular smoothing operation:
 `s` for smoothing spline; `l` for linear, and `f` for factor or categorical variables.
 The argument `0` passed to `s` below indicates that this smoother will
 apply to the first column of a feature matrix. Below, we pass it a
-matrix with a single column: `X_age`. The argument `lam` is the penalty parameter $\lambda$ as discussed in Section~\ref{Ch7:sec5.2}.
+matrix with a single column: `X_age`. The argument `lam` is the penalty parameter $\lambda$ as discussed in Section 7.5.2.

 ```{python}
 X_age = np.asarray(age).reshape((-1,1))
@@ -548,7 +548,7 @@ The strength of generalized additive models lies in their ability to fit multiva

 We now fit a GAM by hand to predict
 `wage` using natural spline functions of `year` and `age`,
-treating `education` as a qualitative predictor, as in (\ref{Ch7:nsmod}).
+treating `education` as a qualitative predictor, as in (7.16).
 Since this is just a big linear regression model
 using an appropriate choice of basis functions, we can simply do this
 using the `sm.OLS()` function.
@@ -631,9 +631,9 @@ ax.set_title('Partial dependence of year on wage', fontsize=20);

 ```

-We now fit the model (\ref{Ch7:nsmod}) using smoothing splines rather
+We now fit the model (7.16) using smoothing splines rather
 than natural splines. All of the
-terms in (\ref{Ch7:nsmod}) are fit simultaneously, taking each other
+terms in (7.16) are fit simultaneously, taking each other
 into account to explain the response. The `pygam` package only works with matrices, so we must convert
 the categorical series `education` to its array representation, which can be found
 with the `cat.codes` attribute of `education`. As `year` only has 7 unique values, we
File diff suppressed because one or more lines are too long
@@ -86,11 +86,11 @@ clf.fit(X, High)
 ```


-In our discussion of qualitative features in Section~\ref{ch3:sec3},
+In our discussion of qualitative features in Section 3.3,
 we noted that for a linear regression model such a feature could be
 represented by including a matrix of dummy variables (one-hot-encoding) in the model
 matrix, using the formula notation of `statsmodels`.
-As mentioned in Section~\ref{Ch8:decison.tree.sec}, there is a more
+As mentioned in Section 8.1, there is a more
 natural way to handle qualitative features when building a decision
 tree, that does not require such dummy variables; each split amounts to partitioning the levels into two groups.
 However,
@@ -121,7 +121,7 @@ resid_dev

 ```

-This is closely related to the *entropy*, defined in (\ref{Ch8:eq:cross-entropy}).
+This is closely related to the *entropy*, defined in (8.7).
 A small deviance indicates a
 tree that provides a good fit to the (training) data.

@@ -158,7 +158,7 @@ on these data, we must estimate the test error rather than simply
 computing the training error. We split the observations into a
 training set and a test set, build the tree using the training set,
 and evaluate its performance on the test data. This pattern is
-similar to that in Chapter~\ref{Ch6:varselect}, with the linear models
+similar to that in Chapter 6, with the linear models
 replaced here by decision trees --- the code for validation
 is almost identical. This approach leads to correct predictions
 for 68.5% of the locations in the test data set.
@@ -431,7 +431,7 @@ feature_imp.sort_values(by='importance', ascending=False)
 ```
 This
 is a relative measure of the total decrease in node impurity that results from
-splits over that variable, averaged over all trees (this was plotted in Figure~\ref{Ch8:fig:varimp} for a model fit to the `Heart` data).
+splits over that variable, averaged over all trees (this was plotted in Figure 8.9 for a model fit to the `Heart` data).

 The results indicate that across all of the trees considered in the
 random forest, the wealth level of the community (`lstat`) and the
@@ -495,7 +495,7 @@ np.mean((y_test - y_hat_boost)**2)
 The test MSE obtained is 14.48,
 similar to the test MSE for bagging. If we want to, we can
 perform boosting with a different value of the shrinkage parameter
-$\lambda$ in (\ref{Ch8:alphaboost}). The default value is 0.001, but
+$\lambda$ in (8.10). The default value is 0.001, but
 this is easily modified. Here we take $\lambda=0.2$.

 ```{python}
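In `sklearn`'s gradient boosting implementation the shrinkage parameter $\lambda$ is exposed as `learning_rate`; a sketch on synthetic data (the data, seed, and parameter values here are made up for illustration):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=200)

# lambda = 0.2 is passed as learning_rate; each of the 500 trees
# contributes a shrunken update to the fit
boost = GradientBoostingRegressor(n_estimators=500,
                                  learning_rate=0.2,
                                  max_depth=3,
                                  random_state=0)
boost.fit(X, y)
print(boost.train_score_.shape)  # (500,): one training loss per tree
```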
File diff suppressed because one or more lines are too long
@@ -255,9 +255,9 @@ kernel we use `kernel="poly"`, and to fit an SVM with a
 radial kernel we use
 `kernel="rbf"`. In the former case we also use the
 `degree` argument to specify a degree for the polynomial kernel
-(this is $d$ in (\ref{Ch9:eq:polyd})), and in the latter case we use
+(this is $d$ in (9.22)), and in the latter case we use
 `gamma` to specify a value of $\gamma$ for the radial basis
-kernel (\ref{Ch9:eq:radial}).
+kernel (9.24).

 We first generate some data with a non-linear class boundary, as follows:

@@ -377,7 +377,7 @@ classifier, the fitted value for an observation $X= (X_1, X_2, \ldots,
 X_p)^T$ takes the form $\hat{\beta}_0 + \hat{\beta}_1 X_1 +
 \hat{\beta}_2 X_2 + \ldots + \hat{\beta}_p X_p$. For an SVM with a
 non-linear kernel, the equation that yields the fitted value is given
-in (\ref{Ch9:eq:svmip}). The sign of the fitted value
+in (9.23). The sign of the fitted value
 determines on which side of the decision boundary the observation
 lies. Therefore, the relationship between the fitted value and the
 class prediction for a given observation is simple: if the fitted
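The sign rule described in this hunk can be checked on synthetic data (the data and kernel parameters below are hypothetical, not the lab's):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1).astype(int)  # non-linear boundary

svm = SVC(kernel="rbf", gamma=1.0, C=1.0).fit(X, y)
f = svm.decision_function(X)       # the fitted values
pred = (f > 0).astype(int)         # positive side -> second class

# for a binary SVC, the sign of the fitted value reproduces predict()
print((pred == svm.predict(X)).all())  # True
```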
Ch09-svm-lab.ipynb
File diff suppressed because one or more lines are too long
@@ -1,5 +1,4 @@
-

 # Deep Learning

 <a target="_blank" href="https://colab.research.google.com/github/intro-stat-learning/ISLP_labs/blob/v2.2.1/Ch10-deeplearning-lab.ipynb">
@@ -21,7 +20,8 @@ Much of our code is adapted from there, as well as the `pytorch_lightning` docum
 We start with several standard imports that we have seen before.

 ```{python}
-import numpy as np, pandas as pd
+import numpy as np
+import pandas as pd
 from matplotlib.pyplot import subplots
 from sklearn.linear_model import \
      (LinearRegression,
@@ -57,7 +57,7 @@ the `torchmetrics` package has utilities to compute
 various metrics to evaluate performance when fitting
 a model. The `torchinfo` package provides a useful
 summary of the layers of a model. We use the `read_image()`
-function when loading test images in Section~\ref{Ch13-deeplearning-lab:using-pretrained-cnn-models}.
+function when loading test images in Section 10.9.4.

 If you have not already installed the packages `torchvision`
 and `torchinfo` you can install them by running
@@ -153,17 +153,19 @@ in our example applying the `ResNet50` model
 to some of our own images.
 The `json` module will be used to load
 a JSON file for looking up classes to identify the labels of the
-pictures in the `ResNet50` example.
+pictures in the `ResNet50` example. We'll also import `warnings` to filter
+out warnings when fitting the LASSO to the IMDB data.

 ```{python}
 from glob import glob
 import json
+import warnings

 ```


 ## Single Layer Network on Hitters Data
-We start by fitting the models in Section~\ref{Ch13:sec:when-use-deep} on the `Hitters` data.
+We start by fitting the models in Section 10.6 on the `Hitters` data.

 ```{python}
 Hitters = load_data('Hitters').dropna()
@@ -217,7 +219,7 @@ np.abs(Yhat_test - Y_test).mean()

 Next we fit the lasso using `sklearn`. We are using
 mean absolute error to select and evaluate a model, rather than mean squared error.
-The specialized solver we used in Section~\ref{Ch6-varselect-lab:lab-2-ridge-regression-and-the-lasso} uses only mean squared error. So here, with a bit more work, we create a cross-validation grid and perform the cross-validation directly.
+The specialized solver we used in Section 6.5.2 uses only mean squared error. So here, with a bit more work, we create a cross-validation grid and perform the cross-validation directly.

 We encode a pipeline with two steps: we first normalize the features using a `StandardScaler()` transform,
 and then fit the lasso without further normalization.
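The two-step pipeline described here can be sketched as follows on synthetic data; the fixed `alpha` is a made-up placeholder for the lab's cross-validated grid:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
# features on wildly different scales, as motivation for the scaler
X = rng.normal(size=(100, 5)) * np.array([1, 10, 100, 1, 1])
y = X[:, 0] + rng.normal(size=100)

# step 1 standardizes each feature; step 2 fits the lasso on the
# standardized features, so no further normalization is needed
pipe = Pipeline([('scaler', StandardScaler()),
                 ('lasso', Lasso(alpha=0.1))])
pipe.fit(X, y)
print(pipe.predict(X).shape)  # (100,)
```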
@@ -439,7 +441,7 @@ hit_module = SimpleModule.regression(hit_model,
 ```

 By using the `SimpleModule.regression()` method, we indicate that we will use squared-error loss as in
-(\ref{Ch13:eq:4}).
+(10.23).
 We have also asked for mean absolute error to be tracked as well
 in the metrics that are logged.

@@ -476,7 +478,7 @@ hit_trainer = Trainer(deterministic=True,
 hit_trainer.fit(hit_module, datamodule=hit_dm)
 ```
 At each step of SGD, the algorithm randomly selects 32 training observations for
-the computation of the gradient. Recall from Section~\ref{Ch13:sec:fitt-neur-netw}
+the computation of the gradient. Recall from Section 10.7
 that an epoch amounts to the number of SGD steps required to process $n$
 observations. Since the training set has
 $n=175$, and we specified a `batch_size` of 32 in the construction of `hit_dm`, an epoch is $175/32=5.5$ SGD steps.
@@ -765,8 +767,8 @@ mnist_trainer.test(mnist_module,
                    datamodule=mnist_dm)
 ```

-Table~\ref{Ch13:tab:mnist} also reports the error rates resulting from LDA (Chapter~\ref{Ch4:classification}) and multiclass logistic
-regression. For LDA we refer the reader to Section~\ref{Ch4-classification-lab:linear-discriminant-analysis}.
+Table 10.1 also reports the error rates resulting from LDA (Chapter 4) and multiclass logistic
+regression. For LDA we refer the reader to Section 4.7.3.
 Although we could use the `sklearn` function `LogisticRegression()` to fit
 multiclass logistic regression, we are set up here to fit such a model
 with `torch`.
@@ -871,7 +873,7 @@ for idx, (X_ ,Y_) in enumerate(cifar_dm.train_dataloader()):


 Before we start, we look at some of the training images; similar code produced
-Figure~\ref{Ch13:fig:cifar100} on page \pageref{Ch13:fig:cifar100}. The example below also illustrates
+Figure 10.5 on page 406. The example below also illustrates
 that `TensorDataset` objects can be indexed with integers --- we are choosing
 random images from the training data by indexing `cifar_train`. In order to display correctly,
 we must reorder the dimensions by a call to `np.transpose()`.
@@ -894,7 +896,7 @@ for i in range(5):
 Here the `imshow()` method recognizes from the shape of its argument that it is a 3-dimensional array, with the last dimension indexing the three RGB color channels.

 We specify a moderately-sized CNN for
-demonstration purposes, similar in structure to Figure~\ref{Ch13:fig:DeepCNN}.
+demonstration purposes, similar in structure to Figure 10.8.
 We use several layers, each consisting of convolution, ReLU, and max-pooling steps.
 We first define a module that defines one of these layers. As in our
 previous examples, we overwrite the `__init__()` and `forward()` methods
@@ -1034,7 +1036,7 @@ summary_plot(cifar_results,
              ax,
              col='accuracy',
              ylabel='Accuracy')
-ax.set_xticks(np.linspace(0, 10, 6).astype(int))
+ax.set_xticks(np.linspace(0, 30, 7).astype(int))
 ax.set_ylabel('Accuracy')
 ax.set_ylim([0, 1]);
 ```
@@ -1083,7 +1085,7 @@ clauses; if it works, we get the speedup, if it fails, nothing happens.

 ## Using Pretrained CNN Models
 We now show how to use a CNN pretrained on the `imagenet` database to classify natural
-images, and demonstrate how we produced Figure~\ref{Ch13:fig:homeimages}.
+images, and demonstrate how we produced Figure 10.10.
 We copied six JPEG images from a digital photo album into the
 directory `book_images`. These images are available
 from the data section of <www.statlearning.com>, the ISLP book website. Download `book_images.zip`; when
@@ -1192,7 +1194,7 @@ del(cifar_test,


 ## IMDB Document Classification
-We now implement models for sentiment classification (Section~\ref{Ch13:sec:docum-class}) on the `IMDB`
+We now implement models for sentiment classification (Section 10.4) on the `IMDB`
 dataset. As mentioned above code block~8, we are using
 a preprocessed version of the `IMDB` dataset found in the
 `keras` package. As `keras` uses `tensorflow`, a different
@@ -1346,7 +1348,7 @@ matrix that is recognized by `sklearn.`
 ```

 Similar to what we did in
-Section~\ref{Ch13-deeplearning-lab:single-layer-network-on-hitters-data},
+Section 10.9.1,
 we construct a series of 50 values for the lasso regularization parameter $\lambda$.

 ```{python}
@@ -1369,16 +1371,20 @@ logit = LogisticRegression(penalty='l1',

 ```
 The path of 50 values takes approximately 40 seconds to run.
+As in Chapter 6, we will filter out warnings, this time using a context manager.

 ```{python}
-coefs = []
-intercepts = []
-
-for l in lam_val:
-    logit.C = 1/l
-    logit.fit(X_train, Y_train)
-    coefs.append(logit.coef_.copy())
-    intercepts.append(logit.intercept_)
+with warnings.catch_warnings():
+    warnings.simplefilter("ignore")
+
+    coefs = []
+    intercepts = []
+
+    for l in lam_val:
+        logit.C = 1/l
+        logit.fit(X_train, Y_train)
+        coefs.append(logit.coef_.copy())
+        intercepts.append(logit.intercept_)
 ```
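The context-manager pattern in this hunk works for any warning source: inside the `with` block the filter is set to ignore, and on exit the previous filter state is restored automatically. A self-contained sketch with a made-up noisy function:

```python
import warnings

def noisy_fit():
    # stand-in for a solver that emits convergence warnings
    warnings.warn("Objective did not converge", UserWarning)
    return "fitted"

with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    result = noisy_fit()   # warning suppressed inside the block

print(result)  # fitted
```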
@@ -1454,16 +1460,16 @@ del(imdb_model,

 ## Recurrent Neural Networks
 In this lab we fit the models illustrated in
-Section~\ref{Ch13:sec:recurr-neur-netw}.
+Section 10.5.


 ### Sequential Models for Document Classification
 Here we fit a simple LSTM RNN for sentiment prediction to
-the `IMDb` movie-review data, as discussed in Section~\ref{Ch13:sec:sequ-models-docum}.
+the `IMDb` movie-review data, as discussed in Section 10.5.1.
 For an RNN we use the sequence of words in a document, taking their
 order into account. We loaded the preprocessed
 data at the beginning of
-Section~\ref{Ch13-deeplearning-lab:imdb-document-classification}.
+Section 10.9.5.
 A script that details the preprocessing can be found in the
 `ISLP` library. Notably, since more than 90% of the documents
 had fewer than 500 words, we set the document length to 500. For
@@ -1578,7 +1584,7 @@ del(lstm_model,


 ### Time Series Prediction
-We now show how to fit the models in Section~\ref{Ch13:sec:time-seri-pred}
+We now show how to fit the models in Section 10.5.2
 for time series prediction.
 We first load and standardize the data.

File diff suppressed because one or more lines are too long
@@ -10,9 +10,9 @@


 In this lab, we perform survival analyses on three separate data
-sets. In Section~\ref{brain.cancer.sec} we analyze the `BrainCancer`
-data that was first described in Section~\ref{sec:KM}. In Section~\ref{time.to.pub.sec}, we examine the `Publication`
-data from Section~\ref{sec:pub}. Finally, Section~\ref{call.center.sec} explores
+sets. In Section 11.8.1 we analyze the `BrainCancer`
+data that was first described in Section 11.3. In Section 11.8.2, we examine the `Publication`
+data from Section 11.5.4. Finally, Section 11.8.3 explores
 a simulated call-center data set.

 We begin by importing some of our libraries at this top
@@ -82,7 +82,7 @@ observation. But some scientists might use the opposite coding. For
 the `BrainCancer` data set 35 patients died before the end of
 the study, so we are using the conventional coding.

-To begin the analysis, we re-create the Kaplan-Meier survival curve shown in Figure~\ref{fig:survbrain}. The main
+To begin the analysis, we re-create the Kaplan-Meier survival curve shown in Figure 11.2. The main
 package we will use for survival analysis
 is `lifelines`.
 The variable `time` corresponds to $y_i$, the time to the $i$th event (either censoring or
@@ -102,7 +102,7 @@ km_brain.plot(label='Kaplan Meier estimate', ax=ax)
 ```

 Next we create Kaplan-Meier survival curves that are stratified by
-`sex`, in order to reproduce Figure~\ref{fig:survbrain2}.
+`sex`, in order to reproduce Figure 11.3.
 We do this using the `groupby()` method of a dataframe.
 This method returns a generator that can
 be iterated over in the `for` loop. In this case,
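For intuition, the Kaplan-Meier product-limit estimate that `lifelines` computes can be sketched by hand: at each event time $t_i$, the survival curve is multiplied by $1 - d_i/n_i$, where $n_i$ is the number at risk and $d_i$ the number of deaths. This toy function ignores confidence intervals and is not the `lifelines` implementation:

```python
import numpy as np

def kaplan_meier(time, event):
    """Return [(t, S(t))] at each observed event time."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    surv, S = [], 1.0
    for t in np.unique(time[event]):
        n_at_risk = (time >= t).sum()       # n_i: still under observation
        d = ((time == t) & event).sum()     # d_i: deaths at time t
        S *= 1 - d / n_at_risk
        surv.append((float(t), S))
    return surv

# three subjects, no censoring: survival drops 2/3 -> 1/3 -> 0
print(kaplan_meier([1, 2, 3], [1, 1, 1]))
```

Censored observations enter only through the risk set: they reduce $n_i$ at later event times without contributing a drop of their own.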
@@ -130,7 +130,7 @@ for sex, df in BrainCancer.groupby('sex'):
|
|||||||
|
|
||||||
```
|
```
|
||||||
|
|
||||||
As discussed in Section~\ref{sec:logrank}, we can perform a
|
As discussed in Section 11.4, we can perform a
|
||||||
log-rank test to compare the survival of males to females. We use
|
log-rank test to compare the survival of males to females. We use
|
||||||
the `logrank_test()` function from the `lifelines.statistics` module.
|
the `logrank_test()` function from the `lifelines.statistics` module.
|
||||||
The first two arguments are the event times, with the second
|
The first two arguments are the event times, with the second
|
||||||
@@ -266,9 +266,9 @@ predicted_survival.plot(ax=ax);
|
|||||||
|
|
||||||
|
|
||||||
## Publication Data
|
## Publication Data
|
||||||
The `Publication` data presented in Section~\ref{sec:pub} can be
|
The `Publication` data presented in Section 11.5.4 can be
|
||||||
found in the `ISLP` package.
|
found in the `ISLP` package.
|
||||||
We first reproduce Figure~\ref{fig:lauersurv} by plotting the Kaplan-Meier curves
|
We first reproduce Figure 11.5 by plotting the Kaplan-Meier curves
|
||||||
stratified on the `posres` variable, which records whether the
|
stratified on the `posres` variable, which records whether the
|
||||||
study had a positive or negative result.
|
study had a positive or negative result.
|
||||||
|
|
||||||
@@ -323,7 +323,7 @@ of the study, and whether the study had positive or negative results.
 
 In this section, we will simulate survival data using the relationship
 between cumulative hazard and
-the survival function explored in Exercise \ref{ex:all3}.
+the survival function explored in Exercise 8.
 Our simulated data will represent the observed
 wait times (in seconds) for 2,000 customers who have phoned a call
 center. In this context, censoring occurs if a customer hangs up
@@ -394,7 +394,7 @@ that the risk of a call being answered at Center B is 0.74 times the
 risk that it will be answered at Center A; in other words, the wait
 times are a bit longer at Center B.
 
-Recall from Section~\ref{Ch2-statlearn-lab:loading-data} the use of `lambda`
+Recall from Section 2.3.7 the use of `lambda`
 for creating short functions on the fly.
 We use the function
 `sim_time()` from the `ISLP.survival` package. This function
File diff suppressed because one or more lines are too long
@@ -164,7 +164,7 @@ for k in range(pcaUS.components_.shape[1]):
                USArrests.columns[k])
 
 ```
-Notice that this figure is a reflection of Figure~\ref{Ch10:fig:USArrests:obs} through the $y$-axis. Recall that the
+Notice that this figure is a reflection of Figure 12.1 through the $y$-axis. Recall that the
 principal components are only unique up to a sign change, so we can
 reproduce that figure by flipping the
 signs of the second set of scores and loadings.
@@ -241,7 +241,7 @@ ax.set_xticks(ticks)
 fig
 
 ```
-The result is similar to that shown in Figure~\ref{Ch10:fig:USArrests:scree}. Note
+The result is similar to that shown in Figure 12.3. Note
 that the method `cumsum()` computes the cumulative sum of
 the elements of a numeric vector. For instance:
 
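Since the hunk above mentions `cumsum()`, a quick illustration may help; the variance-ratio numbers are invented for the example:

```python
import numpy as np

a = np.array([1, 2, 8, -3])
running = np.cumsum(a)           # running totals: [1, 3, 11, 8]

# the same idea yields the cumulative proportion of variance explained
var_ratios = np.array([0.62, 0.25, 0.09, 0.04])
cumulative = np.cumsum(var_ratios)
```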
@@ -253,15 +253,15 @@ np.cumsum(a)
 ## Matrix Completion
 
 We now re-create the analysis carried out on the `USArrests` data in
-Section~\ref{Ch10:sec:princ-comp-with}.
+Section 12.3.
 
-We saw in Section~\ref{ch10:sec2.2} that solving the optimization
-problem~(\ref{Ch10:eq:mc2}) on a centered data matrix $\bf X$ is
+We saw in Section 12.2.2 that solving the optimization
+problem~(12.6) on a centered data matrix $\bf X$ is
 equivalent to computing the first $M$ principal
 components of the data. We use our scaled
 and centered `USArrests` data as $\bf X$ below. The *singular value decomposition*
 (SVD) is a general algorithm for solving
-(\ref{Ch10:eq:mc2}).
+(12.6).
 
 ```{python}
 X = USArrests_scaled
@@ -319,9 +319,9 @@ Here the array `r_idx`
 contains 20 integers from 0 to 49; this represents the states (rows of `X`) that are selected to contain missing values. And `c_idx` contains
 20 integers from 0 to 3, representing the features (columns in `X`) that contain the missing values for each of the selected states.
 
-We now write some code to implement Algorithm~\ref{Ch10:alg:hardimpute}.
+We now write some code to implement Algorithm 12.1.
 We first write a function that takes in a matrix, and returns an approximation to the matrix using the `svd()` function.
-This will be needed in Step 2 of Algorithm~\ref{Ch10:alg:hardimpute}.
+This will be needed in Step 2 of Algorithm 12.1.
 
 ```{python}
 def low_rank(X, M=1):
@@ -330,7 +330,7 @@ def low_rank(X, M=1):
     return L.dot(V[:M])
 
 ```
-To conduct Step 1 of the algorithm, we initialize `Xhat` --- this is $\tilde{\bf X}$ in Algorithm~\ref{Ch10:alg:hardimpute} --- by replacing
+To conduct Step 1 of the algorithm, we initialize `Xhat` --- this is $\tilde{\bf X}$ in Algorithm 12.1 --- by replacing
 the missing values with the column means of the non-missing entries. These are stored in
 `Xbar` below after running `np.nanmean()` over the row axis.
 We make a copy so that when we assign values to `Xhat` below we do not also overwrite the
@@ -360,11 +360,11 @@ a given element is `True` if the corresponding matrix element is missing. The no
 because it allows us to access both the missing and non-missing entries. We store the mean of the squared non-missing elements in `mss0`.
 We store the mean squared error of the non-missing elements of the old version of `Xhat` in `mssold` (which currently
 agrees with `mss0`). We plan to store the mean squared error of the non-missing elements of the current version of `Xhat` in `mss`, and will then
-iterate Step 2 of Algorithm~\ref{Ch10:alg:hardimpute} until the *relative error*, defined as
+iterate Step 2 of Algorithm 12.1 until the *relative error*, defined as
 `(mssold - mss) / mss0`, falls below `thresh = 1e-7`.
-{Algorithm~\ref{Ch10:alg:hardimpute} tells us to iterate Step 2 until \eqref{Ch10:eq:mc6} is no longer decreasing. Determining whether \eqref{Ch10:eq:mc6} is decreasing requires us only to keep track of `mssold - mss`. However, in practice, we keep track of `(mssold - mss) / mss0` instead: this makes it so that the number of iterations required for Algorithm~\ref{Ch10:alg:hardimpute} to converge does not depend on whether we multiplied the raw data $\bf X$ by a constant factor.}
+{Algorithm 12.1 tells us to iterate Step 2 until (12.14) is no longer decreasing. Determining whether (12.14) is decreasing requires us only to keep track of `mssold - mss`. However, in practice, we keep track of `(mssold - mss) / mss0` instead: this makes it so that the number of iterations required for Algorithm 12.1 to converge does not depend on whether we multiplied the raw data $\bf X$ by a constant factor.}
 
-In Step 2(a) of Algorithm~\ref{Ch10:alg:hardimpute}, we approximate `Xhat` using `low_rank()`; we call this `Xapp`. In Step 2(b), we use `Xapp` to update the estimates for elements in `Xhat` that are missing in `Xna`. Finally, in Step 2(c), we compute the relative error. These three steps are contained in the following `while` loop:
+In Step 2(a) of Algorithm 12.1, we approximate `Xhat` using `low_rank()`; we call this `Xapp`. In Step 2(b), we use `Xapp` to update the estimates for elements in `Xhat` that are missing in `Xna`. Finally, in Step 2(c), we compute the relative error. These three steps are contained in the following `while` loop:
 
 ```{python}
 while rel_err > thresh:
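The hunks above renumber references to the hard-imputation algorithm. Putting its three steps together on synthetic data (our own toy matrix and iteration guard, not the lab's scaled `USArrests` data) gives a compact sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

def low_rank(X, M=1):
    # Step 2(a): rank-M reconstruction from the SVD
    U, D, V = np.linalg.svd(X)
    return (U[:, :M] * D[None, :M]).dot(V[:M])

# toy stand-in for the scaled data matrix, with 10 entries knocked out
X = rng.standard_normal((50, 4))
Xna = X.copy()
Xna[rng.choice(50, 10, replace=False), rng.choice(4, 10)] = np.nan

ismiss = np.isnan(Xna)
Xhat = np.where(ismiss, np.nanmean(Xna, axis=0), Xna)  # Step 1: column-mean impute

thresh, rel_err, count = 1e-7, 1.0, 0
mss0 = mssold = np.mean(Xhat[~ismiss] ** 2)
while rel_err > thresh and count < 500:
    Xapp = low_rank(Xhat, M=1)              # Step 2(a): low-rank approximation
    Xhat[ismiss] = Xapp[ismiss]             # Step 2(b): update missing entries
    mss = np.mean((Xna[~ismiss] - Xapp[~ismiss]) ** 2)
    rel_err = (mssold - mss) / mss0         # Step 2(c): relative error
    mssold = mss
    count += 1
```

The `count` cap is our own safety guard; the lab's loop relies on the objective decreasing monotonically.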
@@ -393,7 +393,7 @@ np.corrcoef(Xapp[ismiss], X[ismiss])[0,1]
 ```
 
 
-In this lab, we implemented Algorithm~\ref{Ch10:alg:hardimpute} ourselves for didactic purposes. However, a reader who wishes to apply matrix completion to their data might look to more specialized `Python`{} implementations.
+In this lab, we implemented Algorithm 12.1 ourselves for didactic purposes. However, a reader who wishes to apply matrix completion to their data might look to more specialized `Python`{} implementations.
 
 
 ## Clustering
@@ -464,7 +464,7 @@ We have used the `n_init` argument to run the $K$-means with 20
 initial cluster assignments (the default is 10). If a
 value of `n_init` greater than one is used, then $K$-means
 clustering will be performed using multiple random assignments in
-Step 1 of Algorithm~\ref{Ch10:alg:km}, and the `KMeans()`
+Step 1 of Algorithm 12.2, and the `KMeans()`
 function will report only the best results. Here we compare using
 `n_init=1` to `n_init=20`.
 
@@ -480,7 +480,7 @@ kmeans1.inertia_, kmeans20.inertia_
 ```
 Note that `kmeans.inertia_` is the total within-cluster sum
 of squares, which we seek to minimize by performing $K$-means
-clustering \eqref{Ch10:eq:kmeans}.
+clustering (12.17).
 
 We *strongly* recommend always running $K$-means clustering with
 a large value of `n_init`, such as 20 or 50, since otherwise an
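To make the `n_init` point concrete without `sklearn`, here is a minimal Lloyd's-algorithm run (our own helper and invented data): the best of 20 random starts can never have larger within-cluster sum of squares than a single start, because the single start is among the 20.

```python
import numpy as np

def kmeans_inertia(X, k, seed, n_iter=25):
    """One run of Lloyd's algorithm from a random start; returns the
    within-cluster sum of squares (what sklearn calls `inertia_`)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():          # keep old center if cluster empties
                centers[j] = X[labels == j].mean(axis=0)
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.min(axis=1).sum()

X = np.random.default_rng(1).standard_normal((100, 2))
single = kmeans_inertia(X, 3, seed=0)
best_of_20 = min(kmeans_inertia(X, 3, seed=s) for s in range(20))
```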
@@ -846,7 +846,7 @@ results in four distinct clusters. It is easy to verify that the
 resulting clusters are the same as the ones we obtained in
 `comp_cut`.
 
-We claimed earlier in Section~\ref{Ch10:subsec:hc} that
+We claimed earlier in Section 12.4.2 that
 $K$-means clustering and hierarchical clustering with the dendrogram
 cut to obtain the same number of clusters can yield very different
 results. How do these `NCI60` hierarchical clustering results compare
3004
Ch12-unsup-lab.ipynb
File diff suppressed because one or more lines are too long
@@ -91,7 +91,7 @@ truth = pd.Categorical(true_mean == 0,
 
 ```
 Since this is a simulated data set, we can create a $2 \times 2$ table
-similar to Table~\ref{Ch12:tab-hypotheses}.
+similar to Table 13.2.
 
 ```{python}
 pd.crosstab(decision,
@@ -102,7 +102,7 @@ pd.crosstab(decision,
 ```
 Therefore, at level $\alpha=0.05$, we reject 15 of the 50 false
 null hypotheses, and we incorrectly reject 5 of the true null
-hypotheses. Using the notation from Section~\ref{sec:fwer}, we have
+hypotheses. Using the notation from Section 13.3, we have
 $V=5$, $S=15$, $U=45$ and $W=35$.
 We have set $\alpha=0.05$, which means that we expect to reject around
 5% of the true null hypotheses. This is in line with the $2 \times 2$
@@ -140,12 +140,12 @@ pd.crosstab(decision,
 
 
 ## Family-Wise Error Rate
-Recall from \eqref{eq:FWER.indep} that if the null hypothesis is true
+Recall from (13.5) that if the null hypothesis is true
 for each of $m$ independent hypothesis tests, then the FWER is equal
 to $1-(1-\alpha)^m$. We can use this expression to compute the FWER
 for $m=1,\ldots, 500$ and $\alpha=0.05$, $0.01$, and $0.001$.
 We plot the FWER for these values of $\alpha$ in order to
-reproduce Figure~\ref{Ch12:fwer}.
+reproduce Figure 13.2.
 
 ```{python}
 m = np.linspace(1, 501)
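The FWER formula in the hunk above is easy to check directly; this sketch just evaluates $1-(1-\alpha)^m$ over the same grid of $m$ and $\alpha$ values:

```python
import numpy as np

def fwer(m, alpha):
    """P(at least one false rejection) among m independent tests at level alpha."""
    return 1 - (1 - alpha) ** m

m = np.arange(1, 501)
curves = {alpha: fwer(m, alpha) for alpha in (0.05, 0.01, 0.001)}
```

For a single test the FWER is just $\alpha$, and each curve increases monotonically toward 1 as $m$ grows.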
@@ -263,7 +263,7 @@ However, we decided to perform this test only after examining the data
 and noting that Managers One and Two had the highest and lowest mean
 performances. In a sense, this means that we have implicitly
 performed ${5 \choose 2} = 5(5-1)/2=10$ hypothesis tests, rather than
-just one, as discussed in Section~\ref{tukey.sec}. Hence, we use the
+just one, as discussed in Section 13.3.2. Hence, we use the
 `pairwise_tukeyhsd()` function from
 `statsmodels.stats.multicomp` to apply Tukey’s method
 in order to adjust for multiple testing. This function takes
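A one-line check of the counting argument in the hunk above (the pairwise-comparison count, not the Tukey adjustment itself):

```python
from math import comb

# picking out the best and worst of 5 managers implicitly compares all pairs
n_pairs = comb(5, 2)   # 5 * (5 - 1) / 2 = 10 implicit hypothesis tests
```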
@@ -350,7 +350,7 @@ null hypotheses!
 ```
 
 
-Figure~\ref{Ch12:fig:BonferroniBenjamini} displays the ordered
+Figure 13.6 displays the ordered
 $p$-values, $p_{(1)} \leq p_{(2)} \leq \cdots \leq p_{(2000)}$, for
 the `Fund` dataset, as well as the threshold for rejection by the
 Benjamini--Hochberg procedure. Recall that the Benjamini--Hochberg
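The Benjamini-Hochberg rule described in the hunk above can be sketched in a few lines of `numpy`; the function name and example p-values here are ours:

```python
import numpy as np

def bh_reject(pvals, q):
    """Benjamini-Hochberg: reject every p-value at or below the largest
    sorted p_(i) satisfying p_(i) <= q * i / m."""
    p = np.asarray(pvals)
    m = len(p)
    sorted_p = np.sort(p)
    ok = np.nonzero(sorted_p <= q * np.arange(1, m + 1) / m)[0]
    if len(ok) == 0:
        return np.zeros(m, dtype=bool)     # nothing clears the threshold
    return p <= sorted_p[ok.max()]

rejected = bh_reject([0.01, 0.02, 0.03, 0.5, 0.9], q=0.1)  # rejects the first three
```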
@@ -379,7 +379,7 @@ else:
 
 ```
 
-We now reproduce the middle panel of Figure~\ref{Ch12:fig:BonferroniBenjamini}.
+We now reproduce the middle panel of Figure 13.6.
 
 ```{python}
 fig, ax = plt.subplots()
@@ -398,7 +398,7 @@ ax.axline((0, 0), (1,q/m), c='k', ls='--', linewidth=3);
 ## A Re-Sampling Approach
 Here, we implement the re-sampling approach to hypothesis testing
 using the `Khan` dataset, which we investigated in
-Section~\ref{sec:permutations}. First, we merge the training and
+Section 13.5. First, we merge the training and
 testing data, which results in observations on 83 patients for
 2,308 genes.
 
@@ -464,7 +464,7 @@ for b in range(B):
 This fraction, 0.0398,
 is our re-sampling-based $p$-value.
 It is almost identical to the $p$-value of 0.0412 obtained using the theoretical null distribution.
-We can plot a histogram of the re-sampling-based test statistics in order to reproduce Figure~\ref{Ch12:fig-permp-1}.
+We can plot a histogram of the re-sampling-based test statistics in order to reproduce Figure 13.7.
 
 ```{python}
 fig, ax = plt.subplots(figsize=(8,8))
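The re-sampling p-value discussed in this part of the lab follows the generic two-sample permutation recipe; a self-contained sketch on simulated data (not the `Khan` genes, and with a function name of our choosing):

```python
import numpy as np

def perm_pvalue(x, y, B=500, seed=0):
    """Two-sample permutation p-value for a difference in means."""
    rng = np.random.default_rng(seed)
    observed = abs(x.mean() - y.mean())
    pooled = np.concatenate([x, y])
    hits = 0
    for _ in range(B):
        perm = rng.permutation(pooled)
        # count permuted statistics at least as extreme as the observed one
        hits += abs(perm[:len(x)].mean() - perm[len(x):].mean()) >= observed
    return hits / B

rng = np.random.default_rng(3)
x = rng.standard_normal(20)
y = rng.standard_normal(20) + 3.0   # strongly shifted second group
p = perm_pvalue(x, y)
```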
@@ -487,7 +487,7 @@ ax.set_xlabel("Null Distribution of Test Statistic");
 The re-sampling-based null distribution is almost identical to the theoretical null distribution, which is displayed in red.
 
 Finally, we implement the plug-in re-sampling FDR approach outlined in
-Algorithm~\ref{Ch12:alg-plugin-fdr}. Depending on the speed of your
+Algorithm 13.4. Depending on the speed of your
 computer, calculating the FDR for all 2,308 genes in the `Khan`
 dataset may take a while. Hence, we will illustrate the approach on a
 random subset of 100 genes. For each gene, we first compute the
@@ -520,7 +520,7 @@ for j in range(m):
 Next, we compute the number of rejected null hypotheses $R$, the
 estimated number of false positives $\widehat{V}$, and the estimated
 FDR, for a range of threshold values $c$ in
-Algorithm~\ref{Ch12:alg-plugin-fdr}. The threshold values are chosen
+Algorithm 13.4. The threshold values are chosen
 using the absolute values of the test statistics from the 100 genes.
 
 ```{python}
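The plug-in FDR estimate that these hunks renumber has a short generic form: for each cutoff $c$, divide the average number of permutation statistics exceeding $c$ by the number of observed statistics exceeding $c$. A sketch on fabricated statistics (not the `Khan` data; the function name is ours):

```python
import numpy as np

rng = np.random.default_rng(2)

def plugin_fdr(T, T_star, cutoffs):
    """For each cutoff c: R = #{|T_j| >= c} rejections on the real data,
    Vhat = average #{|T*_j| >= c} over the permuted data sets."""
    fdrs = []
    for c in cutoffs:
        R = (np.abs(T) >= c).sum()
        Vhat = (np.abs(T_star) >= c).sum(axis=1).mean()
        fdrs.append(min(Vhat / max(R, 1), 1.0))   # guard against R == 0
    return np.array(fdrs)

# fake statistics: 100 "genes" (10 with real signal), 50 permutations
T = rng.standard_normal(100) + np.r_[np.full(10, 3.0), np.zeros(90)]
T_star = rng.standard_normal((50, 100))
cutoffs = np.sort(np.abs(T))
FDRs = plugin_fdr(T, T_star, cutoffs)
```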
@@ -559,8 +559,8 @@ sorted(idx[np.abs(T_vals) >= cutoffs[FDRs < 0.2].min()])
 
 ```
 
-The next line generates Figure~\ref{fig:labfdr}, which is similar
-to Figure~\ref{Ch12:fig-plugin-fdr},
+The next line generates Figure 13.11, which is similar
+to Figure 13.9,
 except that it is based on only a subset of the genes.
 
 ```{python}
File diff suppressed because one or more lines are too long
@@ -23,7 +23,8 @@ def setup_env(outdir,
               uv_executable,
               timeout,
               kernel,
-              allow_errors):
+              allow_errors,
+              overwrite):
     """
     Sets up a student environment for ISLP_labs.
 
@@ -45,6 +46,8 @@ def setup_env(outdir,
         Kernel to use for running notebooks.
     allow_errors : bool
         Allow errors when running notebooks.
+    overwrite: bool
+        Save notebook outputs in place?
     """
     repo_url = 'https://github.com/intro-stat-learning/ISLP_labs.git'
 
@@ -110,7 +113,8 @@ def setup_env(outdir,
         if kernel:
             nbconvert_command.extend(['--kernel', kernel])
         nbconvert_command.append('--allow-errors')
+        if overwrite:
+            nbconvert_command.append('--inplace')
         run_command(nbconvert_command, cwd=str(outdir))
     else:
         run_command([str(uv_bin / 'pip'), 'install', 'pytest', 'nbmake'], cwd=str(outdir))
@@ -128,6 +132,9 @@ def setup_env(outdir,
                           str(nbfile)]
         if kernel:
             pytest_command.append(f'--nbmake-kernel={kernel}')
+        if overwrite:
+            pytest_command.append('--overwrite')
 
         # nbmake does not have --allow-errors in the same way as nbconvert
         # If errors are not allowed, nbmake will fail on first error naturally
 
@@ -152,7 +159,8 @@ def main():
     parser.add_argument('--uv-executable', default='uv', help='The `uv` executable to use (default: "uv")')
     parser.add_argument('--timeout', type=int, default=3600, help='Timeout for running notebooks (default: 3600)')
     parser.add_argument('--kernel', default=None, help='Kernel to use for running notebooks')
-    parser.add_argument('--allow-errors', action='store_true', help='Allow errors when running notebooks. For older commits, Ch02-statlearn-lab.ipynb raises exceptions. Since v2.2.1 these are handled by appropriate tags in the .ipynb cells.')
+    parser.add_argument('--allow-errors', action='store_true', default=False, help='Allow errors when running notebooks. For older commits, Ch02-statlearn-lab.ipynb raises exceptions. Since v2.2.1 these are handled by appropriate tags in the .ipynb cells.')
+    parser.add_argument('--overwrite', action='store_true', default=False, help='If True, store output of cells in notebook(s).')
 
     parser.add_argument('nbfiles', nargs='*', help='Optional list of notebooks to run.')
 
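The new `--overwrite` flag follows the standard `argparse` pattern, where `action='store_true'` already implies a `False` default (so `default=False` here is just explicit documentation); a minimal reproduction:

```python
import argparse

# mirrors the commit's flag definition
parser = argparse.ArgumentParser()
parser.add_argument('--overwrite', action='store_true', default=False,
                    help='If True, store output of cells in notebook(s).')
args = parser.parse_args([])              # flag absent
alt = parser.parse_args(['--overwrite'])  # flag present
```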
@@ -165,7 +173,8 @@ def main():
               args.uv_executable,
               args.timeout,
               args.kernel,
-              args.allow_errors)
+              args.allow_errors,
+              args.overwrite)
 
 if __name__ == '__main__':
     main()