Fix refs again (#76)

* Ch2->Ch02

* fixed latex refs again, somehow crept back in

* fixed the page refs, formats synced

* unsynced

* executed notebook besides 10

* warnings for lasso

* allow saving of output in notebooks

* Ch10 executed
This commit is contained in:
Jonathan Taylor
2026-02-04 17:40:52 -08:00
committed by GitHub
parent 3d9af7c4b0
commit 6bf6160a3d
25 changed files with 21872 additions and 3191 deletions


@@ -1,5 +1,4 @@
# Deep Learning
<a target="_blank" href="https://colab.research.google.com/github/intro-stat-learning/ISLP_labs/blob/v2.2.1/Ch10-deeplearning-lab.ipynb">
@@ -21,7 +20,8 @@ Much of our code is adapted from there, as well as the `pytorch_lightning` docum
We start with several standard imports that we have seen before.
```{python}
-import numpy as np, pandas as pd
+import numpy as np
+import pandas as pd
from matplotlib.pyplot import subplots
from sklearn.linear_model import \
(LinearRegression,
@@ -57,7 +57,7 @@ the `torchmetrics` package has utilities to compute
various metrics to evaluate performance when fitting
a model. The `torchinfo` package provides a useful
summary of the layers of a model. We use the `read_image()`
-function when loading test images in Section~\ref{Ch13-deeplearning-lab:using-pretrained-cnn-models}.
+function when loading test images in Section 10.9.4.
If you have not already installed the packages `torchvision`
and `torchinfo` you can install them by running
@@ -153,17 +153,19 @@ in our example applying the `ResNet50` model
to some of our own images.
The `json` module will be used to load
a JSON file for looking up classes to identify the labels of the
-pictures in the `ResNet50` example.
+pictures in the `ResNet50` example. We'll also import `warnings` to filter
+out warnings when fitting the LASSO to the IMDB data.
```{python}
from glob import glob
import json
import warnings
```
## Single Layer Network on Hitters Data
-We start by fitting the models in Section~\ref{Ch13:sec:when-use-deep} on the `Hitters` data.
+We start by fitting the models in Section 10.6 on the `Hitters` data.
```{python}
Hitters = load_data('Hitters').dropna()
@@ -217,7 +219,7 @@ np.abs(Yhat_test - Y_test).mean()
Next we fit the lasso using `sklearn`. We are using
mean absolute error to select and evaluate a model, rather than mean squared error.
-The specialized solver we used in Section~\ref{Ch6-varselect-lab:lab-2-ridge-regression-and-the-lasso} uses only mean squared error. So here, with a bit more work, we create a cross-validation grid and perform the cross-validation directly.
+The specialized solver we used in Section 6.5.2 uses only mean squared error. So here, with a bit more work, we create a cross-validation grid and perform the cross-validation directly.
We encode a pipeline with two steps: we first normalize the features using a `StandardScaler()` transform,
and then fit the lasso without further normalization.
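The two-step pipeline described here can be sketched as follows. This is a minimal illustration on toy data, not the lab's actual `Hitters` code; the grid of `alpha` values and the random data are placeholders.

```{python}
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV

# toy regression data standing in for the Hitters features
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
y = X[:, 0] - 2 * X[:, 1] + rng.standard_normal(100)

# step 1: standardize; step 2: fit the lasso with no further normalization
pipe = Pipeline([('scaler', StandardScaler()),
                 ('lasso', Lasso())])

# cross-validate over a grid of regularization values,
# scoring by (negative) mean absolute error rather than MSE
grid = GridSearchCV(pipe,
                    {'lasso__alpha': np.logspace(-3, 1, 10)},
                    scoring='neg_mean_absolute_error',
                    cv=5)
grid.fit(X, y)
best_alpha = grid.best_params_['lasso__alpha']
```

`GridSearchCV` refits the whole pipeline on each fold, so the scaler is learned only from the training portion of every split.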
@@ -439,7 +441,7 @@ hit_module = SimpleModule.regression(hit_model,
```
By using the `SimpleModule.regression()` method, we indicate that we will use squared-error loss as in
-(\ref{Ch13:eq:4}).
+(10.23).
We have also asked for mean absolute error to be tracked as well
in the metrics that are logged.
@@ -476,7 +478,7 @@ hit_trainer = Trainer(deterministic=True,
hit_trainer.fit(hit_module, datamodule=hit_dm)
```
At each step of SGD, the algorithm randomly selects 32 training observations for
-the computation of the gradient. Recall from Section~\ref{Ch13:sec:fitt-neur-netw}
+the computation of the gradient. Recall from Section 10.7
that an epoch amounts to the number of SGD steps required to process $n$
observations. Since the training set has
$n=175$, and we specified a `batch_size` of 32 in the construction of `hit_dm`, an epoch is $175/32 \approx 5.5$ SGD steps.
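The epoch arithmetic can be checked directly; `math.ceil` gives the number of gradient steps actually taken to pass over all the data, since the last batch is smaller than 32.

```{python}
import math

n, batch_size = 175, 32
steps_per_epoch = n / batch_size          # 175/32 = 5.46875, roughly 5.5
full_pass_steps = math.ceil(n / batch_size)  # 6 SGD steps to touch all 175 observations
print(round(steps_per_epoch, 2), full_pass_steps)
```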
@@ -765,8 +767,8 @@ mnist_trainer.test(mnist_module,
datamodule=mnist_dm)
```
-Table~\ref{Ch13:tab:mnist} also reports the error rates resulting from LDA (Chapter~\ref{Ch4:classification}) and multiclass logistic
-regression. For LDA we refer the reader to Section~\ref{Ch4-classification-lab:linear-discriminant-analysis}.
+Table 10.1 also reports the error rates resulting from LDA (Chapter 4) and multiclass logistic
+regression. For LDA we refer the reader to Section 4.7.3.
Although we could use the `sklearn` function `LogisticRegression()` to fit
multiclass logistic regression, we are set up here to fit such a model
with `torch`.
@@ -871,7 +873,7 @@ for idx, (X_ ,Y_) in enumerate(cifar_dm.train_dataloader()):
Before we start, we look at some of the training images; similar code produced
-Figure~\ref{Ch13:fig:cifar100} on page \pageref{Ch13:fig:cifar100}. The example below also illustrates
+Figure 10.5 on page 406. The example below also illustrates
that `TensorDataset` objects can be indexed with integers --- we are choosing
random images from the training data by indexing `cifar_train`. In order to display correctly,
we must reorder the dimensions by a call to `np.transpose()`.
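The reordering is a simple axis permutation; a sketch with a random array standing in for one CIFAR image stored channels-first, as `torch` tensors are:

```{python}
import numpy as np

# a fake 3x32x32 image in (channel, height, width) order
img_chw = np.random.rand(3, 32, 32)

# imshow() expects (height, width, channel), so move the channel axis last
img_hwc = np.transpose(img_chw, (1, 2, 0))
print(img_hwc.shape)  # (32, 32, 3)
```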
@@ -894,7 +896,7 @@ for i in range(5):
Here the `imshow()` method recognizes from the shape of its argument that it is a 3-dimensional array, with the last dimension indexing the three RGB color channels.
We specify a moderately-sized CNN for
-demonstration purposes, similar in structure to Figure~\ref{Ch13:fig:DeepCNN}.
+demonstration purposes, similar in structure to Figure 10.8.
We use several layers, each consisting of convolution, ReLU, and max-pooling steps.
We first define a module that defines one of these layers. As in our
previous examples, we overwrite the `__init__()` and `forward()` methods
@@ -1034,7 +1036,7 @@ summary_plot(cifar_results,
ax,
col='accuracy',
ylabel='Accuracy')
-ax.set_xticks(np.linspace(0, 10, 6).astype(int))
+ax.set_xticks(np.linspace(0, 30, 7).astype(int))
ax.set_ylabel('Accuracy')
ax.set_ylim([0, 1]);
```
@@ -1083,7 +1085,7 @@ clauses; if it works, we get the speedup, if it fails, nothing happens.
## Using Pretrained CNN Models
We now show how to use a CNN pretrained on the `imagenet` database to classify natural
-images, and demonstrate how we produced Figure~\ref{Ch13:fig:homeimages}.
+images, and demonstrate how we produced Figure 10.10.
We copied six JPEG images from a digital photo album into the
directory `book_images`. These images are available
from the data section of <www.statlearning.com>, the ISLP book website. Download `book_images.zip`; when
@@ -1192,7 +1194,7 @@ del(cifar_test,
## IMDB Document Classification
-We now implement models for sentiment classification (Section~\ref{Ch13:sec:docum-class}) on the `IMDB`
+We now implement models for sentiment classification (Section 10.4) on the `IMDB`
dataset. As mentioned above code block 8, we are using
a preprocessed version of the `IMDB` dataset found in the
`keras` package. As `keras` uses `tensorflow`, a different
@@ -1346,7 +1348,7 @@ matrix that is recognized by `sklearn.`
```
Similar to what we did in
-Section~\ref{Ch13-deeplearning-lab:single-layer-network-on-hitters-data},
+Section 10.9.1,
we construct a series of 50 values for the lasso regularization parameter $\lambda$.
```{python}
@@ -1369,16 +1371,20 @@ logit = LogisticRegression(penalty='l1',
```
The path of 50 values takes approximately 40 seconds to run.
As in Chapter 6, we will filter out warnings, this time using a context manager.
```{python}
-coefs = []
-intercepts = []
+with warnings.catch_warnings():
+    warnings.simplefilter("ignore")
+    coefs = []
+    intercepts = []
-for l in lam_val:
-    logit.C = 1/l
-    logit.fit(X_train, Y_train)
-    coefs.append(logit.coef_.copy())
-    intercepts.append(logit.intercept_)
+    for l in lam_val:
+        logit.C = 1/l
+        logit.fit(X_train, Y_train)
+        coefs.append(logit.coef_.copy())
+        intercepts.append(logit.intercept_)
```
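The context-manager pattern used above can be seen in isolation. A minimal sketch with a hypothetical noisy function: warnings raised inside the `with` block are suppressed, and the previous filters are restored on exit.

```{python}
import warnings

def noisy_fit():
    # stand-in for a solver that warns about convergence
    warnings.warn("objective did not converge", UserWarning)
    return "fitted"

with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    result = noisy_fit()   # warning silenced inside the block

# outside the block the original warning filters apply again
print(result)  # fitted
```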
@@ -1454,16 +1460,16 @@ del(imdb_model,
## Recurrent Neural Networks
In this lab we fit the models illustrated in
-Section~\ref{Ch13:sec:recurr-neur-netw}.
+Section 10.5.
### Sequential Models for Document Classification
Here we fit a simple LSTM RNN for sentiment prediction to
-the `IMDb` movie-review data, as discussed in Section~\ref{Ch13:sec:sequ-models-docum}.
+the `IMDb` movie-review data, as discussed in Section 10.5.1.
For an RNN we use the sequence of words in a document, taking their
order into account. We loaded the preprocessed
data at the beginning of
-Section~\ref{Ch13-deeplearning-lab:imdb-document-classification}.
+Section 10.9.5.
A script that details the preprocessing can be found in the
`ISLP` library. Notably, since more than 90% of the documents
had fewer than 500 words, we set the document length to 500. For
@@ -1578,7 +1584,7 @@ del(lstm_model,
### Time Series Prediction
-We now show how to fit the models in Section~\ref{Ch13:sec:time-seri-pred}
+We now show how to fit the models in Section 10.5.2
for time series prediction.
We first load and standardize the data.