Fix refs again (#76)

* Ch2->Ch02

* fixed latex refs again, somehow crept back in

* fixed the page refs, formats synced

* unsynced

* executed all notebooks except Ch10

* warnings for lasso

* allow saving of output in notebooks

* Ch10 executed
This commit is contained in:
Jonathan Taylor
2026-02-04 17:40:52 -08:00
committed by GitHub
parent 3d9af7c4b0
commit 6bf6160a3d
25 changed files with 21872 additions and 3191 deletions


@@ -36,8 +36,8 @@ Please see the `Python` resources page on the book website [statlearning.com](ht
You will need to install the `ISLP` package, which provides access to the datasets and custom-built functions that we provide.
Inside a macOS or Linux terminal type `pip install ISLP`; this also installs most other packages needed in the labs. The `Python` resources page has a link to the `ISLP` documentation website.
To run this lab, download the file `Ch2-statlearn-lab.ipynb` from the `Python` resources page.
Now run the following code at the command line: `jupyter lab Ch2-statlearn-lab.ipynb`.
To run this lab, download the file `Ch02-statlearn-lab.ipynb` from the `Python` resources page.
Now run the following code at the command line: `jupyter lab Ch02-statlearn-lab.ipynb`.
If you're using Windows, you can use the `start menu` to access `anaconda`, and follow the links. For example, to install `ISLP` and run this lab, you can run the same code above in an `anaconda` shell.
@@ -72,7 +72,7 @@ print('fit a model with', 11, 'variables')
The following command will provide information about the `print()` function.
```{python}
print?
# print?
```
@@ -222,7 +222,7 @@ documentation associated with the function `fun`, if it exists.
We can try this for `np.array()`.
```{python}
np.array?
# np.array?
```
This documentation indicates that we could create a floating point array by passing a `dtype` argument into `np.array()`.
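As a quick illustration of the `dtype` argument (a minimal sketch, not part of the lab itself):

```{python}
import numpy as np

# dtype=float converts the integer inputs into a floating point array
x = np.array([3, 1, 4], dtype=float)
print(x.dtype)  # float64
```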

File diff suppressed because one or more lines are too long


@@ -332,7 +332,7 @@ As mentioned above, there is an existing function to add a line to a plot --- `a
Next we examine some diagnostic plots, several of which were discussed
in Section~\ref{Ch3:problems.sec}.
in Section 3.3.3.
We can find the fitted values and residuals
of the fit as attributes of the `results` object.
Various influence measures describing the regression model
@@ -429,7 +429,7 @@ We can access the individual components of `results` by name
and
`np.sqrt(results.scale)` gives us the RSE.
Variance inflation factors (section~\ref{Ch3:problems.sec}) are sometimes useful
Variance inflation factors (section 3.3.3) are sometimes useful
to assess the effect of collinearity in the model matrix of a regression model.
We will compute the VIFs in our multiple regression fit, and use the opportunity to introduce the idea of *list comprehension*.
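As a rough sketch of what such a computation looks like (numpy-only, on simulated data rather than the lab's design matrix; the helper `vif` here is hypothetical, not the `ISLP` or `statsmodels` function):

```{python}
import numpy as np

def vif(X, j):
    # VIF of column j: regress X[:, j] on the remaining columns
    # and return 1 / (1 - R^2)
    y = X[:, j]
    Z = np.delete(X, j, axis=1)
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    r2 = 1 - resid.var() / y.var()
    return 1 / (1 - r2)

# Simulated design matrix: intercept plus two correlated predictors
rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = x1 + 0.5 * rng.normal(size=100)
X = np.column_stack([np.ones(100), x1, x2])

# List comprehension: one VIF per non-intercept column
vifs = [vif(X, j) for j in range(1, X.shape[1])]
print(vifs)
```

Because `x2` is built from `x1`, both VIFs come out well above 1, signalling collinearity.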

File diff suppressed because one or more lines are too long


@@ -394,7 +394,7 @@ lda.fit(X_train, L_train)
```
Here we have used the list comprehensions introduced
in Section~\ref{Ch3-linreg-lab:multivariate-goodness-of-fit}. Looking at our first line above, we see that the right-hand side is a list
in Section 3.6.4. Looking at our first line above, we see that the right-hand side is a list
of length two. This is because the code `for M in [X_train, X_test]` iterates over a list
of length two. While here we loop over a list,
the list comprehension method works when looping over any iterable object.
@@ -443,7 +443,7 @@ lda.scalings_
```
These values provide the linear combination of `Lag1` and `Lag2` that is used to form the LDA decision rule. In other words, these are the multipliers of the elements of $X=x$ in (\ref{Ch4:bayes.multi}).
These values provide the linear combination of `Lag1` and `Lag2` that is used to form the LDA decision rule. In other words, these are the multipliers of the elements of $X=x$ in (4.24).
If $-0.64\times `Lag1` - 0.51 \times `Lag2` $ is large, then the LDA classifier will predict a market increase, and if it is small, then the LDA classifier will predict a market decline.
```{python}
@@ -452,7 +452,7 @@ lda_pred = lda.predict(X_test)
```
As we observed in our comparison of classification methods
(Section~\ref{Ch4:comparison.sec}), the LDA and logistic
(Section 4.5), the LDA and logistic
regression predictions are almost identical.
```{python}
@@ -511,7 +511,7 @@ The LDA classifier above is the first classifier from the
`sklearn` library. We will use several other objects
from this library. The objects
follow a common structure that simplifies tasks such as cross-validation,
which we will see in Chapter~\ref{Ch5:resample}. Specifically,
which we will see in Chapter 5. Specifically,
the methods first create a generic classifier without
referring to any data. This classifier is then fit
to data with the `fit()` method and predictions are
@@ -797,7 +797,7 @@ feature_std.std()
```
Notice that the standard deviations are not quite $1$ here; this is again due to some procedures using the $1/n$ convention for variances (in this case `scaler()`), while others use $1/(n-1)$ (the `std()` method). See the footnote on page~\pageref{Ch4-varformula}.
Notice that the standard deviations are not quite $1$ here; this is again due to some procedures using the $1/n$ convention for variances (in this case `scaler()`), while others use $1/(n-1)$ (the `std()` method). See the footnote on page 183.
In this case it does not matter, as long as the variables are all on the same scale.
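The two conventions are easy to see directly in `numpy` (an aside, not part of the lab):

```{python}
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
# np.std defaults to ddof=0, the 1/n convention used by StandardScaler;
# ddof=1 gives the 1/(n-1) convention used by the std() method in pandas
print(x.std(ddof=0), x.std(ddof=1))
```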
Using the function `train_test_split()` we now split the observations into a test set,
@@ -864,7 +864,7 @@ This is double the rate that one would obtain from random guessing.
The number of neighbors in KNN is referred to as a *tuning parameter*, also known as a *hyperparameter*.
We do not know *a priori* what value to use. It is therefore of interest
to see how the classifier performs on test data as we vary these
parameters. This can be achieved with a `for` loop, described in Section~\ref{Ch2-statlearn-lab:for-loops}.
parameters. This can be achieved with a `for` loop, described in Section 2.3.8.
Here we use a for loop to look at the accuracy of our classifier in the group predicted to purchase
insurance as we vary the number of neighbors from 1 to 5:
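The shape of that tuning loop can be sketched with a self-contained numpy KNN on simulated data (not the lab's `sklearn` classifier or the `Caravan` data):

```{python}
import numpy as np

def knn_predict(X_train, y_train, X_test, K):
    # Majority vote among the K nearest training points (Euclidean distance)
    preds = []
    for x in X_test:
        dist = np.sqrt(((X_train - x) ** 2).sum(axis=1))
        nearest = y_train[np.argsort(dist)[:K]]
        preds.append(np.bincount(nearest).argmax())
    return np.array(preds)

# Two simulated classes centered at (0, 0) and (2, 2)
rng = np.random.default_rng(1)
X_train = rng.normal(size=(200, 2)) + np.repeat([[0, 0], [2, 2]], 100, axis=0)
y_train = np.repeat([0, 1], 100)
X_test = rng.normal(size=(50, 2)) + np.repeat([[0, 0], [2, 2]], 25, axis=0)
y_test = np.repeat([0, 1], 25)

# The tuning loop: refit for each K and record test accuracy
accs = {}
for K in range(1, 6):
    accs[K] = (knn_predict(X_train, y_train, X_test, K) == y_test).mean()
print(accs)
```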
@@ -891,7 +891,7 @@ As a comparison, we can also fit a logistic regression model to the
data. This can also be done
with `sklearn`, though by default it fits
something like the *ridge regression* version
of logistic regression, which we introduce in Chapter~\ref{Ch6:varselect}. This can
of logistic regression, which we introduce in Chapter 6. This can
be modified by appropriately setting the argument `C` below. Its default
value is 1 but by setting it to a very large number, the algorithm converges to the same solution as the usual (unregularized)
logistic regression estimator discussed above.
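A minimal sketch of the effect of `C`, on simulated rather than real data (variable names are illustrative):

```{python}
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = (X[:, 0] + 0.5 * rng.normal(size=500) > 0).astype(int)

# C=1 (the default) applies a ridge-style penalty; a very large C
# effectively removes it, approaching the unregularized fit
ridge_like = LogisticRegression(C=1.0, max_iter=1000).fit(X, y)
unpenalized = LogisticRegression(C=1e10, max_iter=1000).fit(X, y)
print(ridge_like.coef_, unpenalized.coef_)
```

With the penalty effectively removed, the first coefficient is no smaller than its shrunken counterpart.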
@@ -935,7 +935,7 @@ confusion_table(logit_labels, y_test)
```
## Linear and Poisson Regression on the Bikeshare Data
Here we fit linear and Poisson regression models to the `Bikeshare` data, as described in Section~\ref{Ch4:sec:pois}.
Here we fit linear and Poisson regression models to the `Bikeshare` data, as described in Section 4.6.
The response `bikers` measures the number of bike rentals per hour
in Washington, DC in the period 2010--2012.
@@ -976,7 +976,7 @@ variables constant, there are on average about 7 more riders in
February than in January. Similarly there are about 16.5 more riders
in March than in January.
The results seen in Section~\ref{sec:bikeshare.linear}
The results seen in Section 4.6.1
used a slightly different coding of the variables `hr` and `mnth`, as follows:
```{python}
@@ -1030,7 +1030,7 @@ np.allclose(M_lm.fittedvalues, M2_lm.fittedvalues)
```
To reproduce the left-hand side of Figure~\ref{Ch4:bikeshare}
To reproduce the left-hand side of Figure 4.13
we must first obtain the coefficient estimates associated with
`mnth`. The coefficients for January through November can be obtained
directly from the `M2_lm` object. The coefficient for December
@@ -1070,7 +1070,7 @@ ax_month.set_ylabel('Coefficient', fontsize=20);
```
Reproducing the right-hand plot in Figure~\ref{Ch4:bikeshare} follows a similar process.
Reproducing the right-hand plot in Figure 4.13 follows a similar process.
```{python}
coef_hr = S2[S2.index.str.contains('hr')]['coef']
@@ -1105,7 +1105,7 @@ M_pois = sm.GLM(Y, X2, family=sm.families.Poisson()).fit()
```
We can plot the coefficients associated with `mnth` and `hr`, in order to reproduce Figure~\ref{Ch4:bikeshare.pois}. We first complete these coefficients as before.
We can plot the coefficients associated with `mnth` and `hr`, in order to reproduce Figure 4.15. We first complete these coefficients as before.
```{python}
S_pois = summarize(M_pois)

File diff suppressed because one or more lines are too long


@@ -226,7 +226,7 @@ for i, d in enumerate(range(1,6)):
cv_error
```
As in Figure~\ref{Ch5:cvplot}, we see a sharp drop in the estimated test MSE between the linear and
As in Figure 5.4, we see a sharp drop in the estimated test MSE between the linear and
quadratic fits, but then no clear improvement from using higher-degree polynomials.
Above we introduced the `outer()` method of the `np.power()`
@@ -267,7 +267,7 @@ cv_error
Notice that the computation time is much shorter than that of LOOCV.
(In principle, the computation time for LOOCV for a least squares
linear model should be faster than for $k$-fold CV, due to the
availability of the formula~(\ref{Ch5:eq:LOOCVform}) for LOOCV;
availability of the formula~(5.2) for LOOCV;
however, the generic `cross_validate()` function does not make
use of this formula.) We still see little evidence that using cubic
or higher-degree polynomial terms leads to a lower test error than simply
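The shortcut formula itself is easy to verify numerically: for least squares, the LOOCV error equals the single-fit residuals inflated by leverage, so one fit suffices (a self-contained check on simulated data):

```{python}
import numpy as np

rng = np.random.default_rng(0)
n = 30
x = rng.normal(size=n)
y = 1 + 2 * x + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

# Shortcut: one fit, residuals divided by (1 - leverage)
H = X @ np.linalg.inv(X.T @ X) @ X.T
resid = y - H @ y
loocv_shortcut = np.mean((resid / (1 - np.diag(H))) ** 2)

# Explicit LOOCV: n separate fits, each leaving one observation out
errs = []
for i in range(n):
    keep = np.arange(n) != i
    beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
    errs.append((y[i] - X[i] @ beta) ** 2)
loocv_explicit = np.mean(errs)

print(loocv_shortcut, loocv_explicit)  # identical up to rounding
```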
@@ -314,7 +314,7 @@ incurred by picking different random folds.
## The Bootstrap
We illustrate the use of the bootstrap in the simple example
{of Section~\ref{Ch5:sec:bootstrap},} as well as on an example involving
{of Section 5.2,} as well as on an example involving
estimating the accuracy of the linear regression model on the `Auto`
data set.
### Estimating the Accuracy of a Statistic of Interest
@@ -329,8 +329,8 @@ in a dataframe.
To illustrate the bootstrap, we
start with a simple example.
The `Portfolio` data set in the `ISLP` package is described
in Section~\ref{Ch5:sec:bootstrap}. The goal is to estimate the
sampling variance of the parameter $\alpha$ given in formula~(\ref{Ch5:min.var}). We will
in Section 5.2. The goal is to estimate the
sampling variance of the parameter $\alpha$ given in formula~(5.7). We will
create a function
`alpha_func()`, which takes as input a dataframe `D` assumed
to have columns `X` and `Y`, as well as a
@@ -349,7 +349,7 @@ def alpha_func(D, idx):
```
This function returns an estimate for $\alpha$
based on applying the minimum
variance formula (\ref{Ch5:min.var}) to the observations indexed by
variance formula (5.7) to the observations indexed by
the argument `idx`. For instance, the following command
estimates $\alpha$ using all 100 observations.
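A numpy-only sketch of the same estimator (a dict of arrays stands in for the lab's dataframe, and the data are simulated):

```{python}
import numpy as np

def alpha_func(D, idx):
    # Minimum-variance allocation:
    # (var_Y - cov_XY) / (var_X + var_Y - 2 cov_XY)
    cov = np.cov(D['X'][idx], D['Y'][idx])
    return (cov[1, 1] - cov[0, 1]) / (cov[0, 0] + cov[1, 1] - 2 * cov[0, 1])

rng = np.random.default_rng(0)
D = {'X': rng.normal(size=100), 'Y': rng.normal(size=100)}
alpha_hat = alpha_func(D, np.arange(100))  # use all 100 observations
print(alpha_hat)
```

With independent, equal-variance returns the estimate lands near one half, as the formula suggests.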
@@ -419,7 +419,7 @@ intercept and slope terms for the linear regression model that uses
`horsepower` to predict `mpg` in the `Auto` data set. We
will compare the estimates obtained using the bootstrap to those
obtained using the formulas for ${\rm SE}(\hat{\beta}_0)$ and
${\rm SE}(\hat{\beta}_1)$ described in Section~\ref{Ch3:secoefsec}.
${\rm SE}(\hat{\beta}_1)$ described in Section 3.1.2.
To use our `boot_SE()` function, we must write a function (its
first argument)
@@ -488,7 +488,7 @@ This indicates that the bootstrap estimate for ${\rm SE}(\hat{\beta}_0)$ is
0.85, and that the bootstrap
estimate for ${\rm SE}(\hat{\beta}_1)$ is
0.0074. As discussed in
Section~\ref{Ch3:secoefsec}, standard formulas can be used to compute
Section 3.1.2, standard formulas can be used to compute
the standard errors for the regression coefficients in a linear
model. These can be obtained using the `summarize()` function
from `ISLP.sm`.
@@ -502,7 +502,7 @@ model_se
The standard error estimates for $\hat{\beta}_0$ and $\hat{\beta}_1$
obtained using the formulas from Section~\ref{Ch3:secoefsec} are
obtained using the formulas from Section 3.1.2 are
0.717 for the
intercept and
0.006 for the
@@ -510,13 +510,13 @@ slope. Interestingly, these are somewhat different from the estimates
obtained using the bootstrap. Does this indicate a problem with the
bootstrap? In fact, it suggests the opposite. Recall that the
standard formulas given in
{Equation~\ref{Ch3:se.eqn} on page~\pageref{Ch3:se.eqn}}
Equation 3.8 on page 75
rely on certain assumptions. For example,
they depend on the unknown parameter $\sigma^2$, the noise
variance. We then estimate $\sigma^2$ using the RSS. Now although the
formulas for the standard errors do not rely on the linear model being
correct, the estimate for $\sigma^2$ does. We see
{in Figure~\ref{Ch3:polyplot} on page~\pageref{Ch3:polyplot}} that there is
in Figure 3.8 on page 99 that there is
a non-linear relationship in the data, and so the residuals from a
linear fit will be inflated, and so will $\hat{\sigma}^2$. Secondly,
the standard formulas assume (somewhat unrealistically) that the $x_i$
@@ -529,7 +529,7 @@ the results from `sm.OLS`.
Below we compute the bootstrap standard error estimates and the
standard linear regression estimates that result from fitting the
quadratic model to the data. Since this model provides a good fit to
the data (Figure~\ref{Ch3:polyplot}), there is now a better
the data (Figure 3.8), there is now a better
correspondence between the bootstrap estimates and the standard
estimates of ${\rm SE}(\hat{\beta}_0)$, ${\rm SE}(\hat{\beta}_1)$ and
${\rm SE}(\hat{\beta}_2)$.


@@ -5,6 +5,7 @@
"id": "6dde3cef",
"metadata": {},
"source": [
"\n",
"# Cross-Validation and the Bootstrap\n",
"\n",
"<a target=\"_blank\" href=\"https://colab.research.google.com/github/intro-stat-learning/ISLP_labs/blob/v2.2.1/Ch05-resample-lab.ipynb\">\n",
@@ -32,10 +33,10 @@
"id": "f1deb5cc",
"metadata": {
"execution": {
"iopub.execute_input": "2026-02-02T23:39:38.213402Z",
"iopub.status.busy": "2026-02-02T23:39:38.212455Z",
"iopub.status.idle": "2026-02-02T23:39:39.121524Z",
"shell.execute_reply": "2026-02-02T23:39:39.121125Z"
"iopub.execute_input": "2026-02-05T00:47:05.730449Z",
"iopub.status.busy": "2026-02-05T00:47:05.730329Z",
"iopub.status.idle": "2026-02-05T00:47:06.513542Z",
"shell.execute_reply": "2026-02-05T00:47:06.513033Z"
},
"lines_to_next_cell": 2
},
@@ -64,10 +65,10 @@
"id": "268c41b3",
"metadata": {
"execution": {
"iopub.execute_input": "2026-02-02T23:39:39.123447Z",
"iopub.status.busy": "2026-02-02T23:39:39.123256Z",
"iopub.status.idle": "2026-02-02T23:39:39.125378Z",
"shell.execute_reply": "2026-02-02T23:39:39.125152Z"
"iopub.execute_input": "2026-02-05T00:47:06.515215Z",
"iopub.status.busy": "2026-02-05T00:47:06.515043Z",
"iopub.status.idle": "2026-02-05T00:47:06.517293Z",
"shell.execute_reply": "2026-02-05T00:47:06.516837Z"
},
"lines_to_next_cell": 2
},
@@ -108,10 +109,10 @@
"id": "22f44ae0",
"metadata": {
"execution": {
"iopub.execute_input": "2026-02-02T23:39:39.126552Z",
"iopub.status.busy": "2026-02-02T23:39:39.126448Z",
"iopub.status.idle": "2026-02-02T23:39:39.136455Z",
"shell.execute_reply": "2026-02-02T23:39:39.135968Z"
"iopub.execute_input": "2026-02-05T00:47:06.518601Z",
"iopub.status.busy": "2026-02-05T00:47:06.518485Z",
"iopub.status.idle": "2026-02-05T00:47:06.524299Z",
"shell.execute_reply": "2026-02-05T00:47:06.523724Z"
}
},
"outputs": [],
@@ -136,10 +137,10 @@
"id": "0c32e917",
"metadata": {
"execution": {
"iopub.execute_input": "2026-02-02T23:39:39.138496Z",
"iopub.status.busy": "2026-02-02T23:39:39.138360Z",
"iopub.status.idle": "2026-02-02T23:39:39.144210Z",
"shell.execute_reply": "2026-02-02T23:39:39.143911Z"
"iopub.execute_input": "2026-02-05T00:47:06.525648Z",
"iopub.status.busy": "2026-02-05T00:47:06.525549Z",
"iopub.status.idle": "2026-02-05T00:47:06.531033Z",
"shell.execute_reply": "2026-02-05T00:47:06.530633Z"
}
},
"outputs": [],
@@ -166,17 +167,17 @@
"id": "86ce4f85",
"metadata": {
"execution": {
"iopub.execute_input": "2026-02-02T23:39:39.145874Z",
"iopub.status.busy": "2026-02-02T23:39:39.145769Z",
"iopub.status.idle": "2026-02-02T23:39:39.150170Z",
"shell.execute_reply": "2026-02-02T23:39:39.149908Z"
"iopub.execute_input": "2026-02-05T00:47:06.532517Z",
"iopub.status.busy": "2026-02-05T00:47:06.532422Z",
"iopub.status.idle": "2026-02-05T00:47:06.537369Z",
"shell.execute_reply": "2026-02-05T00:47:06.536814Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"23.61661706966988"
"np.float64(23.616617069669893)"
]
},
"execution_count": 5,
@@ -210,10 +211,10 @@
"id": "50a66a97",
"metadata": {
"execution": {
"iopub.execute_input": "2026-02-02T23:39:39.151449Z",
"iopub.status.busy": "2026-02-02T23:39:39.151374Z",
"iopub.status.idle": "2026-02-02T23:39:39.153368Z",
"shell.execute_reply": "2026-02-02T23:39:39.153165Z"
"iopub.execute_input": "2026-02-05T00:47:06.538690Z",
"iopub.status.busy": "2026-02-05T00:47:06.538577Z",
"iopub.status.idle": "2026-02-05T00:47:06.540965Z",
"shell.execute_reply": "2026-02-05T00:47:06.540491Z"
}
},
"outputs": [],
@@ -253,10 +254,10 @@
"id": "d49b6999",
"metadata": {
"execution": {
"iopub.execute_input": "2026-02-02T23:39:39.154485Z",
"iopub.status.busy": "2026-02-02T23:39:39.154414Z",
"iopub.status.idle": "2026-02-02T23:39:39.167468Z",
"shell.execute_reply": "2026-02-02T23:39:39.167243Z"
"iopub.execute_input": "2026-02-05T00:47:06.542270Z",
"iopub.status.busy": "2026-02-05T00:47:06.542165Z",
"iopub.status.idle": "2026-02-05T00:47:06.560011Z",
"shell.execute_reply": "2026-02-05T00:47:06.559510Z"
}
},
"outputs": [
@@ -297,10 +298,10 @@
"id": "dac8bd54",
"metadata": {
"execution": {
"iopub.execute_input": "2026-02-02T23:39:39.168726Z",
"iopub.status.busy": "2026-02-02T23:39:39.168651Z",
"iopub.status.idle": "2026-02-02T23:39:39.182426Z",
"shell.execute_reply": "2026-02-02T23:39:39.182175Z"
"iopub.execute_input": "2026-02-05T00:47:06.561313Z",
"iopub.status.busy": "2026-02-05T00:47:06.561212Z",
"iopub.status.idle": "2026-02-05T00:47:06.579372Z",
"shell.execute_reply": "2026-02-05T00:47:06.578863Z"
}
},
"outputs": [
@@ -380,10 +381,10 @@
"id": "601ae443",
"metadata": {
"execution": {
"iopub.execute_input": "2026-02-02T23:39:39.184522Z",
"iopub.status.busy": "2026-02-02T23:39:39.184425Z",
"iopub.status.idle": "2026-02-02T23:39:40.007057Z",
"shell.execute_reply": "2026-02-02T23:39:40.006826Z"
"iopub.execute_input": "2026-02-05T00:47:06.580908Z",
"iopub.status.busy": "2026-02-05T00:47:06.580793Z",
"iopub.status.idle": "2026-02-05T00:47:07.783382Z",
"shell.execute_reply": "2026-02-05T00:47:07.782912Z"
},
"lines_to_next_cell": 0
},
@@ -391,7 +392,7 @@
{
"data": {
"text/plain": [
"24.23151351792924"
"np.float64(24.231513517929226)"
]
},
"execution_count": 9,
@@ -448,10 +449,10 @@
"id": "11226c85",
"metadata": {
"execution": {
"iopub.execute_input": "2026-02-02T23:39:40.008355Z",
"iopub.status.busy": "2026-02-02T23:39:40.008281Z",
"iopub.status.idle": "2026-02-02T23:39:40.469400Z",
"shell.execute_reply": "2026-02-02T23:39:40.469130Z"
"iopub.execute_input": "2026-02-05T00:47:07.785034Z",
"iopub.status.busy": "2026-02-05T00:47:07.784918Z",
"iopub.status.idle": "2026-02-05T00:47:08.636881Z",
"shell.execute_reply": "2026-02-05T00:47:08.636274Z"
},
"lines_to_next_cell": 0
},
@@ -459,7 +460,7 @@
{
"data": {
"text/plain": [
"array([24.23151352, 19.24821312, 19.33498406, 19.42443033, 19.03323827])"
"array([24.23151352, 19.24821312, 19.33498406, 19.4244303 , 19.03321527])"
]
},
"execution_count": 10,
@@ -486,7 +487,7 @@
"id": "a3a920ae",
"metadata": {},
"source": [
"As in Figure~\\ref{Ch5:cvplot}, we see a sharp drop in the estimated test MSE between the linear and\n",
"As in Figure 5.4, we see a sharp drop in the estimated test MSE between the linear and\n",
"quadratic fits, but then no clear improvement from using higher-degree polynomials.\n",
"\n",
"Above we introduced the `outer()` method of the `np.power()`\n",
@@ -505,10 +506,10 @@
"id": "64b64d97",
"metadata": {
"execution": {
"iopub.execute_input": "2026-02-02T23:39:40.470650Z",
"iopub.status.busy": "2026-02-02T23:39:40.470570Z",
"iopub.status.idle": "2026-02-02T23:39:40.472721Z",
"shell.execute_reply": "2026-02-02T23:39:40.472520Z"
"iopub.execute_input": "2026-02-05T00:47:08.638904Z",
"iopub.status.busy": "2026-02-05T00:47:08.638760Z",
"iopub.status.idle": "2026-02-05T00:47:08.641760Z",
"shell.execute_reply": "2026-02-05T00:47:08.641094Z"
}
},
"outputs": [
@@ -547,10 +548,10 @@
"id": "ca0f972f",
"metadata": {
"execution": {
"iopub.execute_input": "2026-02-02T23:39:40.473794Z",
"iopub.status.busy": "2026-02-02T23:39:40.473725Z",
"iopub.status.idle": "2026-02-02T23:39:40.491806Z",
"shell.execute_reply": "2026-02-02T23:39:40.491547Z"
"iopub.execute_input": "2026-02-05T00:47:08.643305Z",
"iopub.status.busy": "2026-02-05T00:47:08.643162Z",
"iopub.status.idle": "2026-02-05T00:47:08.675141Z",
"shell.execute_reply": "2026-02-05T00:47:08.674740Z"
},
"lines_to_next_cell": 0
},
@@ -558,7 +559,7 @@
{
"data": {
"text/plain": [
"array([24.20766449, 19.18533142, 19.27626666, 19.47848403, 19.13720581])"
"array([24.20766449, 19.18533142, 19.27626666, 19.47848402, 19.13718373])"
]
},
"execution_count": 12,
@@ -589,7 +590,7 @@
"Notice that the computation time is much shorter than that of LOOCV.\n",
"(In principle, the computation time for LOOCV for a least squares\n",
"linear model should be faster than for $k$-fold CV, due to the\n",
"availability of the formula~(\\ref{Ch5:eq:LOOCVform}) for LOOCV;\n",
"availability of the formula~(5.2) for LOOCV;\n",
"however, the generic `cross_validate()` function does not make\n",
"use of this formula.) We still see little evidence that using cubic\n",
"or higher-degree polynomial terms leads to a lower test error than simply\n",
@@ -613,10 +614,10 @@
"id": "080cdb29",
"metadata": {
"execution": {
"iopub.execute_input": "2026-02-02T23:39:40.493215Z",
"iopub.status.busy": "2026-02-02T23:39:40.493138Z",
"iopub.status.idle": "2026-02-02T23:39:40.498978Z",
"shell.execute_reply": "2026-02-02T23:39:40.498768Z"
"iopub.execute_input": "2026-02-05T00:47:08.677151Z",
"iopub.status.busy": "2026-02-05T00:47:08.676983Z",
"iopub.status.idle": "2026-02-05T00:47:08.685777Z",
"shell.execute_reply": "2026-02-05T00:47:08.685174Z"
},
"lines_to_next_cell": 2
},
@@ -657,17 +658,17 @@
"id": "7c46de2b",
"metadata": {
"execution": {
"iopub.execute_input": "2026-02-02T23:39:40.500169Z",
"iopub.status.busy": "2026-02-02T23:39:40.500100Z",
"iopub.status.idle": "2026-02-02T23:39:40.525494Z",
"shell.execute_reply": "2026-02-02T23:39:40.525258Z"
"iopub.execute_input": "2026-02-05T00:47:08.687387Z",
"iopub.status.busy": "2026-02-05T00:47:08.687268Z",
"iopub.status.idle": "2026-02-05T00:47:08.726376Z",
"shell.execute_reply": "2026-02-05T00:47:08.725862Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"(23.802232661034168, 1.421845094109185)"
"(np.float64(23.802232661034168), np.float64(1.4218450941091916))"
]
},
"execution_count": 14,
@@ -699,7 +700,7 @@
"\n",
"## The Bootstrap\n",
"We illustrate the use of the bootstrap in the simple example\n",
" {of Section~\\ref{Ch5:sec:bootstrap},} as well as on an example involving\n",
" {of Section 5.2,} as well as on an example involving\n",
"estimating the accuracy of the linear regression model on the `Auto`\n",
"data set.\n",
"### Estimating the Accuracy of a Statistic of Interest\n",
@@ -714,8 +715,8 @@
"To illustrate the bootstrap, we\n",
"start with a simple example.\n",
"The `Portfolio` data set in the `ISLP` package is described\n",
"in Section~\\ref{Ch5:sec:bootstrap}. The goal is to estimate the\n",
"sampling variance of the parameter $\\alpha$ given in formula~(\\ref{Ch5:min.var}). We will\n",
"in Section 5.2. The goal is to estimate the\n",
"sampling variance of the parameter $\\alpha$ given in formula~(5.7). We will\n",
"create a function\n",
"`alpha_func()`, which takes as input a dataframe `D` assumed\n",
"to have columns `X` and `Y`, as well as a\n",
@@ -731,10 +732,10 @@
"id": "a4b6d9b3",
"metadata": {
"execution": {
"iopub.execute_input": "2026-02-02T23:39:40.526599Z",
"iopub.status.busy": "2026-02-02T23:39:40.526530Z",
"iopub.status.idle": "2026-02-02T23:39:40.529182Z",
"shell.execute_reply": "2026-02-02T23:39:40.528979Z"
"iopub.execute_input": "2026-02-05T00:47:08.727991Z",
"iopub.status.busy": "2026-02-05T00:47:08.727860Z",
"iopub.status.idle": "2026-02-05T00:47:08.731089Z",
"shell.execute_reply": "2026-02-05T00:47:08.730582Z"
},
"lines_to_next_cell": 0
},
@@ -754,7 +755,7 @@
"source": [
"This function returns an estimate for $\\alpha$\n",
"based on applying the minimum\n",
" variance formula (\\ref{Ch5:min.var}) to the observations indexed by\n",
" variance formula (5.7) to the observations indexed by\n",
"the argument `idx`. For instance, the following command\n",
"estimates $\\alpha$ using all 100 observations."
]
@@ -765,17 +766,17 @@
"id": "81498a11",
"metadata": {
"execution": {
"iopub.execute_input": "2026-02-02T23:39:40.530362Z",
"iopub.status.busy": "2026-02-02T23:39:40.530277Z",
"iopub.status.idle": "2026-02-02T23:39:40.532680Z",
"shell.execute_reply": "2026-02-02T23:39:40.532462Z"
"iopub.execute_input": "2026-02-05T00:47:08.732467Z",
"iopub.status.busy": "2026-02-05T00:47:08.732368Z",
"iopub.status.idle": "2026-02-05T00:47:08.735439Z",
"shell.execute_reply": "2026-02-05T00:47:08.734958Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"0.57583207459283"
"np.float64(0.57583207459283)"
]
},
"execution_count": 16,
@@ -804,10 +805,10 @@
"id": "64fe1cb6",
"metadata": {
"execution": {
"iopub.execute_input": "2026-02-02T23:39:40.533876Z",
"iopub.status.busy": "2026-02-02T23:39:40.533810Z",
"iopub.status.idle": "2026-02-02T23:39:40.536977Z",
"shell.execute_reply": "2026-02-02T23:39:40.536767Z"
"iopub.execute_input": "2026-02-05T00:47:08.736816Z",
"iopub.status.busy": "2026-02-05T00:47:08.736715Z",
"iopub.status.idle": "2026-02-05T00:47:08.739777Z",
"shell.execute_reply": "2026-02-05T00:47:08.739384Z"
},
"lines_to_next_cell": 2
},
@@ -815,7 +816,7 @@
{
"data": {
"text/plain": [
"0.6074452469619004"
"np.float64(0.6074452469619002)"
]
},
"execution_count": 17,
@@ -847,10 +848,10 @@
"id": "dd16bbae",
"metadata": {
"execution": {
"iopub.execute_input": "2026-02-02T23:39:40.538089Z",
"iopub.status.busy": "2026-02-02T23:39:40.538010Z",
"iopub.status.idle": "2026-02-02T23:39:40.540142Z",
"shell.execute_reply": "2026-02-02T23:39:40.539916Z"
"iopub.execute_input": "2026-02-05T00:47:08.741328Z",
"iopub.status.busy": "2026-02-05T00:47:08.741207Z",
"iopub.status.idle": "2026-02-05T00:47:08.743628Z",
"shell.execute_reply": "2026-02-05T00:47:08.743222Z"
},
"lines_to_next_cell": 0
},
@@ -892,17 +893,17 @@
"id": "b42b4585",
"metadata": {
"execution": {
"iopub.execute_input": "2026-02-02T23:39:40.541333Z",
"iopub.status.busy": "2026-02-02T23:39:40.541248Z",
"iopub.status.idle": "2026-02-02T23:39:40.761713Z",
"shell.execute_reply": "2026-02-02T23:39:40.761472Z"
"iopub.execute_input": "2026-02-05T00:47:08.744970Z",
"iopub.status.busy": "2026-02-05T00:47:08.744866Z",
"iopub.status.idle": "2026-02-05T00:47:09.034568Z",
"shell.execute_reply": "2026-02-05T00:47:09.034067Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"0.09118176521277699"
"np.float64(0.09118176521277699)"
]
},
"execution_count": 19,
@@ -934,7 +935,7 @@
"`horsepower` to predict `mpg` in the `Auto` data set. We\n",
"will compare the estimates obtained using the bootstrap to those\n",
"obtained using the formulas for ${\\rm SE}(\\hat{\\beta}_0)$ and\n",
"${\\rm SE}(\\hat{\\beta}_1)$ described in Section~\\ref{Ch3:secoefsec}.\n",
"${\\rm SE}(\\hat{\\beta}_1)$ described in Section 3.1.2.\n",
"\n",
"To use our `boot_SE()` function, we must write a function (its\n",
"first argument)\n",
@@ -958,10 +959,10 @@
"id": "6bc11784",
"metadata": {
"execution": {
"iopub.execute_input": "2026-02-02T23:39:40.762911Z",
"iopub.status.busy": "2026-02-02T23:39:40.762837Z",
"iopub.status.idle": "2026-02-02T23:39:40.764597Z",
"shell.execute_reply": "2026-02-02T23:39:40.764401Z"
"iopub.execute_input": "2026-02-05T00:47:09.036186Z",
"iopub.status.busy": "2026-02-05T00:47:09.036064Z",
"iopub.status.idle": "2026-02-05T00:47:09.038120Z",
"shell.execute_reply": "2026-02-05T00:47:09.037695Z"
},
"lines_to_next_cell": 0
},
@@ -993,10 +994,10 @@
"id": "740cd50c",
"metadata": {
"execution": {
"iopub.execute_input": "2026-02-02T23:39:40.765800Z",
"iopub.status.busy": "2026-02-02T23:39:40.765734Z",
"iopub.status.idle": "2026-02-02T23:39:40.767191Z",
"shell.execute_reply": "2026-02-02T23:39:40.766986Z"
"iopub.execute_input": "2026-02-05T00:47:09.039473Z",
"iopub.status.busy": "2026-02-05T00:47:09.039363Z",
"iopub.status.idle": "2026-02-05T00:47:09.041198Z",
"shell.execute_reply": "2026-02-05T00:47:09.040802Z"
},
"lines_to_next_cell": 0
},
@@ -1026,10 +1027,10 @@
"id": "ffb3ec50",
"metadata": {
"execution": {
"iopub.execute_input": "2026-02-02T23:39:40.768266Z",
"iopub.status.busy": "2026-02-02T23:39:40.768204Z",
"iopub.status.idle": "2026-02-02T23:39:40.789931Z",
"shell.execute_reply": "2026-02-02T23:39:40.789705Z"
"iopub.execute_input": "2026-02-05T00:47:09.042630Z",
"iopub.status.busy": "2026-02-05T00:47:09.042520Z",
"iopub.status.idle": "2026-02-05T00:47:09.074556Z",
"shell.execute_reply": "2026-02-05T00:47:09.074079Z"
},
"lines_to_next_cell": 0
},
@@ -1077,10 +1078,10 @@
"id": "7d561f70",
"metadata": {
"execution": {
"iopub.execute_input": "2026-02-02T23:39:40.791196Z",
"iopub.status.busy": "2026-02-02T23:39:40.791129Z",
"iopub.status.idle": "2026-02-02T23:39:42.690473Z",
"shell.execute_reply": "2026-02-02T23:39:42.690189Z"
"iopub.execute_input": "2026-02-05T00:47:09.076044Z",
"iopub.status.busy": "2026-02-05T00:47:09.075921Z",
"iopub.status.idle": "2026-02-05T00:47:11.975218Z",
"shell.execute_reply": "2026-02-05T00:47:11.974653Z"
},
"lines_to_next_cell": 2
},
@@ -1115,7 +1116,7 @@
"0.85, and that the bootstrap\n",
"estimate for ${\\rm SE}(\\hat{\\beta}_1)$ is\n",
"0.0074. As discussed in\n",
"Section~\\ref{Ch3:secoefsec}, standard formulas can be used to compute\n",
"Section 3.1.2, standard formulas can be used to compute\n",
"the standard errors for the regression coefficients in a linear\n",
"model. These can be obtained using the `summarize()` function\n",
"from `ISLP.sm`."
@@ -1127,10 +1128,10 @@
"id": "3888aa0a",
"metadata": {
"execution": {
"iopub.execute_input": "2026-02-02T23:39:42.692058Z",
"iopub.status.busy": "2026-02-02T23:39:42.691967Z",
"iopub.status.idle": "2026-02-02T23:39:42.737592Z",
"shell.execute_reply": "2026-02-02T23:39:42.737346Z"
"iopub.execute_input": "2026-02-05T00:47:11.976651Z",
"iopub.status.busy": "2026-02-05T00:47:11.976530Z",
"iopub.status.idle": "2026-02-05T00:47:11.995335Z",
"shell.execute_reply": "2026-02-05T00:47:11.994822Z"
},
"lines_to_next_cell": 2
},
@@ -1160,7 +1161,7 @@
"metadata": {},
"source": [
"The standard error estimates for $\\hat{\\beta}_0$ and $\\hat{\\beta}_1$\n",
"obtained using the formulas from Section~\\ref{Ch3:secoefsec} are\n",
"obtained using the formulas from Section 3.1.2 are\n",
"0.717 for the\n",
"intercept and\n",
"0.006 for the\n",
@@ -1168,13 +1169,13 @@
"obtained using the bootstrap. Does this indicate a problem with the\n",
"bootstrap? In fact, it suggests the opposite. Recall that the\n",
"standard formulas given in\n",
" {Equation~\\ref{Ch3:se.eqn} on page~\\pageref{Ch3:se.eqn}}\n",
"Equation 3.8 on page 75\n",
"rely on certain assumptions. For example,\n",
"they depend on the unknown parameter $\\sigma^2$, the noise\n",
"variance. We then estimate $\\sigma^2$ using the RSS. Now although the\n",
"formulas for the standard errors do not rely on the linear model being\n",
"correct, the estimate for $\\sigma^2$ does. We see\n",
" {in Figure~\\ref{Ch3:polyplot} on page~\\pageref{Ch3:polyplot}} that there is\n",
"in Figure 3.8 on page 99 that there is\n",
"a non-linear relationship in the data, and so the residuals from a\n",
"linear fit will be inflated, and so will $\\hat{\\sigma}^2$. Secondly,\n",
"the standard formulas assume (somewhat unrealistically) that the $x_i$\n",
@@ -1187,7 +1188,7 @@
"Below we compute the bootstrap standard error estimates and the\n",
"standard linear regression estimates that result from fitting the\n",
"quadratic model to the data. Since this model provides a good fit to\n",
"the data (Figure~\\ref{Ch3:polyplot}), there is now a better\n",
"the data (Figure 3.8), there is now a better\n",
"correspondence between the bootstrap estimates and the standard\n",
"estimates of ${\\rm SE}(\\hat{\\beta}_0)$, ${\\rm SE}(\\hat{\\beta}_1)$ and\n",
"${\\rm SE}(\\hat{\\beta}_2)$."
@@ -1199,10 +1200,10 @@
"id": "acc3e32c",
"metadata": {
"execution": {
"iopub.execute_input": "2026-02-02T23:39:42.739037Z",
"iopub.status.busy": "2026-02-02T23:39:42.738882Z",
"iopub.status.idle": "2026-02-02T23:39:45.405121Z",
"shell.execute_reply": "2026-02-02T23:39:45.404784Z"
"iopub.execute_input": "2026-02-05T00:47:11.996827Z",
"iopub.status.busy": "2026-02-05T00:47:11.996698Z",
"iopub.status.idle": "2026-02-05T00:47:16.184970Z",
"shell.execute_reply": "2026-02-05T00:47:16.184301Z"
}
},
"outputs": [
@@ -1242,10 +1243,10 @@
"id": "dca5340c",
"metadata": {
"execution": {
"iopub.execute_input": "2026-02-02T23:39:45.406403Z",
"iopub.status.busy": "2026-02-02T23:39:45.406328Z",
"iopub.status.idle": "2026-02-02T23:39:45.416112Z",
"shell.execute_reply": "2026-02-02T23:39:45.415874Z"
"iopub.execute_input": "2026-02-05T00:47:16.186711Z",
"iopub.status.busy": "2026-02-05T00:47:16.186585Z",
"iopub.status.idle": "2026-02-05T00:47:16.197450Z",
"shell.execute_reply": "2026-02-05T00:47:16.197112Z"
},
"lines_to_next_cell": 0
},
@@ -1272,7 +1273,7 @@
},
{
"cell_type": "markdown",
"id": "e98297be",
"id": "d714b6a3",
"metadata": {},
"source": [
"\n",
@@ -1283,8 +1284,7 @@
"metadata": {
"jupytext": {
"cell_metadata_filter": "-all",
"formats": "ipynb",
"main_language": "python"
"formats": "ipynb"
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
@@ -1301,7 +1301,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.4"
"version": "3.12.10"
}
},
"nbformat": 4,

View File

@@ -86,7 +86,7 @@ Hitters.shape
```
We first choose the best model using forward selection based on $C_p$ (\ref{Ch6:eq:cp}). This score
We first choose the best model using forward selection based on $C_p$ (6.2). This score
is not built in as a metric to `sklearn`. We therefore define a function to compute it ourselves, and use
it as a scorer. By default, `sklearn` tries to maximize a score, hence
our scoring function computes the negative $C_p$ statistic.
@@ -111,7 +111,7 @@ sigma2 = OLS(Y,X).fit().scale
```
The function `sklearn_selected()` expects a scorer with just three arguments --- the last three in the definition of `nCp()` above. We use the function `partial()` first seen in Section~\ref{Ch5-resample-lab:the-bootstrap} to freeze the first argument with our estimate of $\sigma^2$.
The function `sklearn_selected()` expects a scorer with just three arguments --- the last three in the definition of `nCp()` above. We use the function `partial()` first seen in Section 5.3.3 to freeze the first argument with our estimate of $\sigma^2$.
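As a sketch of what this freezing accomplishes (the toy estimator and the `sigma2` value below are hypothetical, not the lab's objects):

```python
import numpy as np
from functools import partial

def nCp(sigma2, estimator, X, Y):
    # Negative Cp: -(RSS + 2 * d * sigma2) / n, so that maximizing the
    # score minimizes Cp (a sketch of the lab's scorer).
    n, d = X.shape
    Yhat = estimator.predict(X)
    RSS = np.sum((Y - Yhat) ** 2)
    return -(RSS + 2 * d * sigma2) / n

# partial() freezes sigma2, leaving the (estimator, X, Y) signature that
# sklearn-style scorers expect; 4.0 is a placeholder noise estimate.
neg_Cp = partial(nCp, 4.0)
```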
```{python}
neg_Cp = partial(nCp, sigma2)
@@ -363,7 +363,7 @@ Since we
standardize first, in order to find coefficient
estimates on the original scale, we must *unstandardize*
the coefficient estimates. The parameter
$\lambda$ in (\ref{Ch6:ridge}) and (\ref{Ch6:LASSO}) is called `alphas` in `sklearn`. In order to
$\lambda$ in (6.5) and (6.7) is called `alphas` in `sklearn`. In order to
be consistent with the rest of this chapter, we use `lambdas`
rather than `alphas` in what follows. {At the time of publication, ridge fits like the one in code chunk [23] issue unwarranted convergence warning messages; we suppressed these when we filtered the warnings above.}
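For a fixed $\lambda$ the ridge solution on a centered, scaled design has a closed form; a minimal numpy sketch (not the `sklearn` solver) makes the role of the penalty concrete:

```python
import numpy as np

def ridge_coefs(X, y, lam):
    # Minimizer of ||y - X b||^2 + lam * ||b||^2 for fixed lambda:
    # b = (X'X + lam * I)^{-1} X'y.
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
```

Setting `lam=0` recovers least squares, and the coefficients shrink toward zero as `lam` grows, mirroring the `lambdas` grid used in this section.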
@@ -540,7 +540,7 @@ grid.best_params_['ridge__alpha']
grid.best_estimator_
```
Recall we set up the `kfold` object for 5-fold cross-validation on page~\pageref{line:choos-among-models}. We now plot the cross-validated MSE as a function of $-\log(\lambda)$, which has shrinkage decreasing from left
Recall we set up the `kfold` object for 5-fold cross-validation on page 271. We now plot the cross-validated MSE as a function of $-\log(\lambda)$, which has shrinkage decreasing from left
to right.
```{python}
@@ -640,7 +640,7 @@ not perform variable selection!
### Evaluating Test Error of Cross-Validated Ridge
Choosing $\lambda$ using cross-validation provides a single regression
estimator, similar to fitting a linear regression model as we saw in
Chapter~\ref{Ch3:linreg}. It is therefore reasonable to estimate what its test error
Chapter 3. It is therefore reasonable to estimate what its test error
is. We run into a problem here in that cross-validation will have
*touched* all of its data in choosing $\lambda$, hence we have no
further data to estimate test error. A compromise is to do an initial
@@ -728,7 +728,7 @@ ax.set_ylabel('Standardized coefficiients', fontsize=20);
```
The smallest cross-validated error is lower than the test set MSE of the null model
and of least squares, and very similar to the test MSE of 115526.71 of ridge
regression (page~\pageref{page:MSECVRidge}) with $\lambda$ chosen by cross-validation.
regression (page 278) with $\lambda$ chosen by cross-validation.
```{python}
np.min(tuned_lasso.mse_path_.mean(1))
@@ -776,11 +776,11 @@ Principal components regression (PCR) can be performed using
`PCA()` from the `sklearn.decomposition`
module. We now apply PCR to the `Hitters` data, in order to
predict `Salary`. Again, ensure that the missing values have
been removed from the data, as described in Section~\ref{Ch6-varselect-lab:lab-1-subset-selection-methods}.
been removed from the data, as described in Section 6.5.1.
We use `LinearRegression()` to fit the regression model
here. Note that it fits an intercept by default, unlike
the `OLS()` function seen earlier in Section~\ref{Ch6-varselect-lab:lab-1-subset-selection-methods}.
the `OLS()` function seen earlier in Section 6.5.1.
```{python}
pca = PCA(n_components=2)
@@ -864,7 +864,7 @@ cv_null = skm.cross_validate(linreg,
The `explained_variance_ratio_`
attribute of our `PCA` object provides the *percentage of variance explained* in the predictors and in the response using
different numbers of components. This concept is discussed in greater
detail in Section~\ref{Ch10:sec:pca}.
detail in Section 12.2.
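These ratios can also be computed directly from the singular values of the centered data matrix; a sketch of what `explained_variance_ratio_` reports:

```python
import numpy as np

def explained_variance_ratio(X):
    # Each component's share of total variance: squared singular values
    # of the centered matrix, normalized to sum to one.
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)
    return s**2 / np.sum(s**2)
```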
```{python}
pipe.named_steps['pca'].explained_variance_ratio_

File diff suppressed because one or more lines are too long

View File

@@ -47,7 +47,7 @@ from ISLP.pygam import (approx_lam,
```
## Polynomial Regression and Step Functions
We start by demonstrating how Figure~\ref{Ch7:fig:poly} can be reproduced.
We start by demonstrating how Figure 7.1 can be reproduced.
Let's begin by loading the data.
```{python}
@@ -59,7 +59,7 @@ age = Wage['age']
Throughout most of this lab, our response is `Wage['wage']`, which
we have stored as `y` above.
As in Section~\ref{Ch3-linreg-lab:non-linear-transformations-of-the-predictors}, we will use the `poly()` function to create a model matrix
As in Section 3.6.6, we will use the `poly()` function to create a model matrix
that will fit a $4$th degree polynomial in `age`.
```{python}
@@ -73,11 +73,11 @@ summarize(M)
This polynomial is constructed using the function `poly()`,
which creates
a special *transformer* `Poly()` (using `sklearn` terminology
for feature transformations such as `PCA()` seen in Section \ref{Ch6-varselect-lab:principal-components-regression}) which
for feature transformations such as `PCA()` seen in Section 6.5.3) which
allows for easy evaluation of the polynomial at new data points. Here `poly()` is referred to as a *helper* function, and sets up the transformation; `Poly()` is the actual workhorse that computes the transformation. See also
the
discussion of transformations on
page~\pageref{Ch3-linreg-lab:using-transformations-fit-and-transform}.
page 118.
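For intuition, the raw (non-orthogonalized) analogue of the degree-4 model matrix can be sketched with numpy; `Poly()` additionally orthogonalizes these columns, but they span the same space of fitted values:

```python
import numpy as np

def raw_poly(x, degree=4):
    # Columns x, x^2, ..., x^degree -- the raw polynomial basis.
    x = np.asarray(x, dtype=float)
    return np.column_stack([x**d for d in range(1, degree + 1)])
```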
In the code above, the first line executes the `fit()` method
using the dataframe
@@ -140,7 +140,7 @@ def plot_wage_fit(age_df,
We include an argument `alpha` to `ax.scatter()`
to add some transparency to the points. This provides a visual indication
of density. Notice the use of the `zip()` function in the
`for` loop above (see Section~\ref{Ch2-statlearn-lab:for-loops}).
`for` loop above (see Section 2.3.8).
We have three lines to plot, each with different colors and line
types. Here `zip()` conveniently bundles these together as
iterators in the loop. {In `Python`{} speak, an "iterator" is an object with a finite number of values, that can be iterated on, as in a loop.}
@@ -243,7 +243,7 @@ anova_lm(*[sm.OLS(y, X_).fit() for X_ in XEs])
As an alternative to using hypothesis tests and ANOVA, we could choose
the polynomial degree using cross-validation, as discussed in Chapter~\ref{Ch5:resample}.
the polynomial degree using cross-validation, as discussed in Chapter 5.
Next we consider the task of predicting whether an individual earns
more than $250,000 per year. We proceed much as before, except
@@ -302,7 +302,7 @@ value do not cover each other up. This type of plot is often called a
*rug plot*.
In order to fit a step function, as discussed in
Section~\ref{Ch7:sec:scolstep-function}, we first use the `pd.qcut()`
Section 7.2, we first use the `pd.qcut()`
function to discretize `age` based on quantiles. Then we use `pd.get_dummies()` to create the
columns of the model matrix for this categorical variable. Note that this function will
include *all* columns for a given categorical, rather than the usual approach which drops one
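What `pd.qcut()` followed by `pd.get_dummies()` builds can be sketched with numpy alone (a didactic approximation, keeping all `q` indicator columns):

```python
import numpy as np

def quantile_dummies(x, q=4):
    # One-hot columns for quantile bins of x -- roughly what
    # pd.get_dummies(pd.qcut(x, q)) produces, with all q levels kept.
    x = np.asarray(x, dtype=float)
    edges = np.quantile(x, np.linspace(0, 1, q + 1))
    bins = np.clip(np.searchsorted(edges, x, side='right') - 1, 0, q - 1)
    return np.eye(q)[bins]
```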
@@ -334,7 +334,7 @@ evaluation functions are in the `scipy.interpolate` package;
we have simply wrapped them as transforms
similar to `Poly()` and `PCA()`.
In Section~\ref{Ch7:sec:scolr-splin}, we saw
In Section 7.4, we saw
that regression splines can be fit by constructing an appropriate
matrix of basis functions. The `BSpline()` function generates the
entire matrix of basis functions for splines with the specified set of
@@ -349,7 +349,7 @@ bs_age.shape
```
This results in a seven-column matrix, which is what is expected for a cubic-spline basis with 3 interior knots.
We can form this same matrix using the `bs()` object,
which facilitates adding this to a model-matrix builder (as in `poly()` versus its workhorse `Poly()`) described in Section~\ref{Ch7-nonlin-lab:polynomial-regression-and-step-functions}.
which facilitates adding this to a model-matrix builder (as in `poly()` versus its workhorse `Poly()`) described in Section 7.8.1.
We now fit a cubic spline model to the `Wage` data.
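The dimension count can be checked against the truncated power basis, an equivalent (though numerically less stable) parametrization of the same cubic spline space; a sketch:

```python
import numpy as np

def truncated_power_basis(x, knots):
    # Cubic spline basis: x, x^2, x^3, plus one (x - k)_+^3 term per knot.
    # With 3 interior knots this gives 6 columns; adding an intercept
    # matches the seven degrees of freedom noted above.
    x = np.asarray(x, dtype=float)
    cols = [x, x**2, x**3] + [np.clip(x - k, 0.0, None)**3 for k in knots]
    return np.column_stack(cols)
```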
@@ -458,7 +458,7 @@ of a model matrix with a particular smoothing operation:
`s` for smoothing spline; `l` for linear, and `f` for factor or categorical variables.
The argument `0` passed to `s` below indicates that this smoother will
apply to the first column of a feature matrix. Below, we pass it a
matrix with a single column: `X_age`. The argument `lam` is the penalty parameter $\lambda$ as discussed in Section~\ref{Ch7:sec5.2}.
matrix with a single column: `X_age`. The argument `lam` is the penalty parameter $\lambda$ as discussed in Section 7.5.2.
```{python}
X_age = np.asarray(age).reshape((-1,1))
@@ -548,7 +548,7 @@ The strength of generalized additive models lies in their ability to fit multiva
We now fit a GAM by hand to predict
`wage` using natural spline functions of `year` and `age`,
treating `education` as a qualitative predictor, as in (\ref{Ch7:nsmod}).
treating `education` as a qualitative predictor, as in (7.16).
Since this is just a big linear regression model
using an appropriate choice of basis functions, we can simply do this
using the `sm.OLS()` function.
@@ -631,9 +631,9 @@ ax.set_title('Partial dependence of year on wage', fontsize=20);
```
We now fit the model (\ref{Ch7:nsmod}) using smoothing splines rather
We now fit the model (7.16) using smoothing splines rather
than natural splines. All of the
terms in (\ref{Ch7:nsmod}) are fit simultaneously, taking each other
terms in (7.16) are fit simultaneously, taking each other
into account to explain the response. The `pygam` package only works with matrices, so we must convert
the categorical series `education` to its array representation, which can be found
with the `cat.codes` attribute of `education`. As `year` only has 7 unique values, we

File diff suppressed because one or more lines are too long

View File

@@ -86,11 +86,11 @@ clf.fit(X, High)
```
In our discussion of qualitative features in Section~\ref{ch3:sec3},
In our discussion of qualitative features in Section 3.3,
we noted that for a linear regression model such a feature could be
represented by including a matrix of dummy variables (one-hot-encoding) in the model
matrix, using the formula notation of `statsmodels`.
As mentioned in Section~\ref{Ch8:decison.tree.sec}, there is a more
As mentioned in Section 8.1, there is a more
natural way to handle qualitative features when building a decision
tree, that does not require such dummy variables; each split amounts to partitioning the levels into two groups.
However,
@@ -121,7 +121,7 @@ resid_dev
```
This is closely related to the *entropy*, defined in (\ref{Ch8:eq:cross-entropy}).
This is closely related to the *entropy*, defined in (8.7).
A small deviance indicates a
tree that provides a good fit to the (training) data.
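The entropy of a node's class counts is simple to sketch directly:

```python
import numpy as np

def entropy(counts):
    # Entropy (8.7): -sum_k p_k log(p_k) over the node's class
    # proportions, with the convention 0 * log(0) = 0.
    p = np.asarray(counts, dtype=float)
    p = p / p.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))
```

A pure node has entropy 0, while a 50/50 split of two classes gives $\log 2 \approx 0.693$; smaller values therefore indicate purer nodes, consistent with the deviance interpretation above.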
@@ -158,7 +158,7 @@ on these data, we must estimate the test error rather than simply
computing the training error. We split the observations into a
training set and a test set, build the tree using the training set,
and evaluate its performance on the test data. This pattern is
similar to that in Chapter~\ref{Ch6:varselect}, with the linear models
similar to that in Chapter 6, with the linear models
replaced here by decision trees --- the code for validation
is almost identical. This approach leads to correct predictions
for 68.5% of the locations in the test data set.
@@ -431,7 +431,7 @@ feature_imp.sort_values(by='importance', ascending=False)
```
This
is a relative measure of the total decrease in node impurity that results from
splits over that variable, averaged over all trees (this was plotted in Figure~\ref{Ch8:fig:varimp} for a model fit to the `Heart` data).
splits over that variable, averaged over all trees (this was plotted in Figure 8.9 for a model fit to the `Heart` data).
The results indicate that across all of the trees considered in the
random forest, the wealth level of the community (`lstat`) and the
@@ -495,7 +495,7 @@ np.mean((y_test - y_hat_boost)**2)
The test MSE obtained is 14.48,
similar to the test MSE for bagging. If we want to, we can
perform boosting with a different value of the shrinkage parameter
$\lambda$ in (\ref{Ch8:alphaboost}). The default value is 0.001, but
$\lambda$ in (8.10). The default value is 0.001, but
this is easily modified. Here we take $\lambda=0.2$.
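The role of the shrinkage parameter can be illustrated with a toy L2-boosting loop over one-split stumps (an illustration of the idea in (8.10), not the `sklearn` implementation):

```python
import numpy as np

def fit_stump(x, r):
    # Best single split on a 1-D feature for residuals r (squared error).
    best = None
    for t in np.unique(x)[:-1]:
        left, right = r[x <= t].mean(), r[x > t].mean()
        sse = np.sum((r - np.where(x <= t, left, right))**2)
        if best is None or sse < best[0]:
            best = (sse, t, left, right)
    return best[1:]

def l2_boost(x, y, B=100, lam=0.2):
    # Each iteration fits a stump to the current residuals and adds a
    # lam-shrunken copy of its fit; smaller lam means slower learning.
    f = np.zeros_like(y, dtype=float)
    for _ in range(B):
        t, cl, cr = fit_stump(x, y - f)
        f += lam * np.where(x <= t, cl, cr)
    return f
```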
```{python}

File diff suppressed because one or more lines are too long

View File

@@ -255,9 +255,9 @@ kernel we use `kernel="poly"`, and to fit an SVM with a
radial kernel we use
`kernel="rbf"`. In the former case we also use the
`degree` argument to specify a degree for the polynomial kernel
(this is $d$ in (\ref{Ch9:eq:polyd})), and in the latter case we use
(this is $d$ in (9.22)), and in the latter case we use
`gamma` to specify a value of $\gamma$ for the radial basis
kernel (\ref{Ch9:eq:radial}).
kernel (9.24).
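The radial kernel itself is a one-liner; a sketch:

```python
import numpy as np

def rbf(u, v, gamma=1.0):
    # Radial basis kernel (9.24): K(u, v) = exp(-gamma * ||u - v||^2).
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(np.exp(-gamma * np.sum((u - v)**2)))
```

Larger `gamma` makes the kernel more local, so training observations far from a test point contribute little to its fitted value.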
We first generate some data with a non-linear class boundary, as follows:
@@ -377,7 +377,7 @@ classifier, the fitted value for an observation $X= (X_1, X_2, \ldots,
X_p)^T$ takes the form $\hat{\beta}_0 + \hat{\beta}_1 X_1 +
\hat{\beta}_2 X_2 + \ldots + \hat{\beta}_p X_p$. For an SVM with a
non-linear kernel, the equation that yields the fitted value is given
in (\ref{Ch9:eq:svmip}). The sign of the fitted value
in (9.23). The sign of the fitted value
determines on which side of the decision boundary the observation
lies. Therefore, the relationship between the fitted value and the
class prediction for a given observation is simple: if the fitted

File diff suppressed because one or more lines are too long

View File

@@ -1,5 +1,4 @@
# Deep Learning
<a target="_blank" href="https://colab.research.google.com/github/intro-stat-learning/ISLP_labs/blob/v2.2.1/Ch10-deeplearning-lab.ipynb">
@@ -21,7 +20,8 @@ Much of our code is adapted from there, as well as the `pytorch_lightning` docum
We start with several standard imports that we have seen before.
```{python}
import numpy as np, pandas as pd
import numpy as np
import pandas as pd
from matplotlib.pyplot import subplots
from sklearn.linear_model import \
(LinearRegression,
@@ -57,7 +57,7 @@ the `torchmetrics` package has utilities to compute
various metrics to evaluate performance when fitting
a model. The `torchinfo` package provides a useful
summary of the layers of a model. We use the `read_image()`
function when loading test images in Section~\ref{Ch13-deeplearning-lab:using-pretrained-cnn-models}.
function when loading test images in Section 10.9.4.
If you have not already installed the packages `torchvision`
and `torchinfo` you can install them by running
@@ -153,17 +153,19 @@ in our example applying the `ResNet50` model
to some of our own images.
The `json` module will be used to load
a JSON file for looking up classes to identify the labels of the
pictures in the `ResNet50` example.
pictures in the `ResNet50` example. We'll also import `warnings` to filter
out warnings when fitting the LASSO to the IMDB data.
```{python}
from glob import glob
import json
import warnings
```
## Single Layer Network on Hitters Data
We start by fitting the models in Section~\ref{Ch13:sec:when-use-deep} on the `Hitters` data.
We start by fitting the models in Section 10.6 on the `Hitters` data.
```{python}
Hitters = load_data('Hitters').dropna()
@@ -217,7 +219,7 @@ np.abs(Yhat_test - Y_test).mean()
Next we fit the lasso using `sklearn`. We are using
mean absolute error to select and evaluate a model, rather than mean squared error.
The specialized solver we used in Section~\ref{Ch6-varselect-lab:lab-2-ridge-regression-and-the-lasso} uses only mean squared error. So here, with a bit more work, we create a cross-validation grid and perform the cross-validation directly.
The specialized solver we used in Section 6.5.2 uses only mean squared error. So here, with a bit more work, we create a cross-validation grid and perform the cross-validation directly.
We encode a pipeline with two steps: we first normalize the features using a `StandardScaler()` transform,
and then fit the lasso without further normalization.
@@ -439,7 +441,7 @@ hit_module = SimpleModule.regression(hit_model,
```
By using the `SimpleModule.regression()` method, we indicate that we will use squared-error loss as in
(\ref{Ch13:eq:4}).
(10.23).
We have also asked for mean absolute error to be tracked as well
in the metrics that are logged.
@@ -476,7 +478,7 @@ hit_trainer = Trainer(deterministic=True,
hit_trainer.fit(hit_module, datamodule=hit_dm)
```
At each step of SGD, the algorithm randomly selects 32 training observations for
the computation of the gradient. Recall from Section~\ref{Ch13:sec:fitt-neur-netw}
the computation of the gradient. Recall from Section 10.7
that an epoch amounts to the number of SGD steps required to process $n$
observations. Since the training set has
$n=175$, and we specified a `batch_size` of 32 in the construction of `hit_dm`, an epoch is $175/32 \approx 5.5$ SGD steps.
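Since a data loader processes whole minibatches, the step count per epoch can be computed as:

```python
import math

n, batch_size = 175, 32
steps_per_epoch = math.ceil(n / batch_size)
# 175 / 32 = 5.47: five full minibatches plus a final partial one,
# so the loader takes 6 SGD steps per epoch.
```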
@@ -765,8 +767,8 @@ mnist_trainer.test(mnist_module,
datamodule=mnist_dm)
```
Table~\ref{Ch13:tab:mnist} also reports the error rates resulting from LDA (Chapter~\ref{Ch4:classification}) and multiclass logistic
regression. For LDA we refer the reader to Section~\ref{Ch4-classification-lab:linear-discriminant-analysis}.
Table 10.1 also reports the error rates resulting from LDA (Chapter 4) and multiclass logistic
regression. For LDA we refer the reader to Section 4.7.3.
Although we could use the `sklearn` function `LogisticRegression()` to fit
multiclass logistic regression, we are set up here to fit such a model
with `torch`.
@@ -871,7 +873,7 @@ for idx, (X_ ,Y_) in enumerate(cifar_dm.train_dataloader()):
Before we start, we look at some of the training images; similar code produced
Figure~\ref{Ch13:fig:cifar100} on page \pageref{Ch13:fig:cifar100}. The example below also illustrates
Figure 10.5 on page 406. The example below also illustrates
that `TensorDataset` objects can be indexed with integers --- we are choosing
random images from the training data by indexing `cifar_train`. In order to display correctly,
we must reorder the dimensions by a call to `np.transpose()`.
@@ -894,7 +896,7 @@ for i in range(5):
Here the `imshow()` method recognizes from the shape of its argument that it is a 3-dimensional array, with the last dimension indexing the three RGB color channels.
We specify a moderately-sized CNN for
demonstration purposes, similar in structure to Figure~\ref{Ch13:fig:DeepCNN}.
demonstration purposes, similar in structure to Figure 10.8.
We use several layers, each consisting of convolution, ReLU, and max-pooling steps.
We first define a module that defines one of these layers. As in our
previous examples, we overwrite the `__init__()` and `forward()` methods
@@ -1034,7 +1036,7 @@ summary_plot(cifar_results,
ax,
col='accuracy',
ylabel='Accuracy')
ax.set_xticks(np.linspace(0, 10, 6).astype(int))
ax.set_xticks(np.linspace(0, 30, 7).astype(int))
ax.set_ylabel('Accuracy')
ax.set_ylim([0, 1]);
```
@@ -1083,7 +1085,7 @@ clauses; if it works, we get the speedup, if it fails, nothing happens.
## Using Pretrained CNN Models
We now show how to use a CNN pretrained on the `imagenet` database to classify natural
images, and demonstrate how we produced Figure~\ref{Ch13:fig:homeimages}.
images, and demonstrate how we produced Figure 10.10.
We copied six JPEG images from a digital photo album into the
directory `book_images`. These images are available
from the data section of <www.statlearning.com>, the ISLP book website. Download `book_images.zip`; when
@@ -1192,7 +1194,7 @@ del(cifar_test,
## IMDB Document Classification
We now implement models for sentiment classification (Section~\ref{Ch13:sec:docum-class}) on the `IMDB`
We now implement models for sentiment classification (Section 10.4) on the `IMDB`
dataset. As mentioned above code block 8, we are using
a preprocessed version of the `IMDB` dataset found in the
`keras` package. As `keras` uses `tensorflow`, a different
@@ -1346,7 +1348,7 @@ matrix that is recognized by `sklearn.`
```
Similar to what we did in
Section~\ref{Ch13-deeplearning-lab:single-layer-network-on-hitters-data},
Section 10.9.1,
we construct a series of 50 values for the lasso regularization parameter $\lambda$.
```{python}
@@ -1369,16 +1371,20 @@ logit = LogisticRegression(penalty='l1',
```
The path of 50 values takes approximately 40 seconds to run.
As in Chapter 6, we will filter out warnings, this time using a context manager.
```{python}
coefs = []
intercepts = []
with warnings.catch_warnings():
warnings.simplefilter("ignore")
coefs = []
intercepts = []
for l in lam_val:
logit.C = 1/l
logit.fit(X_train, Y_train)
coefs.append(logit.coef_.copy())
intercepts.append(logit.intercept_)
for l in lam_val:
logit.C = 1/l
logit.fit(X_train, Y_train)
coefs.append(logit.coef_.copy())
intercepts.append(logit.intercept_)
```
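The indentation matters here: only code executed inside the `with` block has the filter applied, and the previous filters are restored on exit. A minimal illustration:

```python
import warnings

def noisy():
    # Stand-in for a solver that emits a convergence warning.
    warnings.warn("convergence not reached")
    return 42

with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    result = noisy()  # warning suppressed inside the context manager
# outside the block, warnings behave as before
```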
@@ -1454,16 +1460,16 @@ del(imdb_model,
## Recurrent Neural Networks
In this lab we fit the models illustrated in
Section~\ref{Ch13:sec:recurr-neur-netw}.
Section 10.5.
### Sequential Models for Document Classification
Here we fit a simple LSTM RNN for sentiment prediction to
the `IMDb` movie-review data, as discussed in Section~\ref{Ch13:sec:sequ-models-docum}.
the `IMDb` movie-review data, as discussed in Section 10.5.1.
For an RNN we use the sequence of words in a document, taking their
order into account. We loaded the preprocessed
data at the beginning of
Section~\ref{Ch13-deeplearning-lab:imdb-document-classification}.
Section 10.9.5.
A script that details the preprocessing can be found in the
`ISLP` library. Notably, since more than 90% of the documents
had fewer than 500 words, we set the document length to 500. For
@@ -1578,7 +1584,7 @@ del(lstm_model,
### Time Series Prediction
We now show how to fit the models in Section~\ref{Ch13:sec:time-seri-pred}
We now show how to fit the models in Section 10.5.2
for time series prediction.
We first load and standardize the data.

File diff suppressed because one or more lines are too long

View File

@@ -10,9 +10,9 @@
In this lab, we perform survival analyses on three separate data
sets. In Section~\ref{brain.cancer.sec} we analyze the `BrainCancer`
data that was first described in Section~\ref{sec:KM}. In Section~\ref{time.to.pub.sec}, we examine the `Publication`
data from Section~\ref{sec:pub}. Finally, Section~\ref{call.center.sec} explores
sets. In Section 11.8.1 we analyze the `BrainCancer`
data that was first described in Section 11.3. In Section 11.8.2, we examine the `Publication`
data from Section 11.5.4. Finally, Section 11.8.3 explores
a simulated call-center data set.
We begin by importing some of our libraries at this top
@@ -82,7 +82,7 @@ observation. But some scientists might use the opposite coding. For
the `BrainCancer` data set 35 patients died before the end of
the study, so we are using the conventional coding.
To begin the analysis, we re-create the Kaplan-Meier survival curve shown in Figure~\ref{fig:survbrain}. The main
To begin the analysis, we re-create the Kaplan-Meier survival curve shown in Figure 11.2. The main
package we will use for survival analysis
is `lifelines`.
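The estimator itself is simple enough to sketch directly (a didactic version, not the `lifelines` implementation):

```python
import numpy as np

def kaplan_meier(time, status):
    # S(t) = prod over event times t_i <= t of (1 - d_i / n_i), where
    # d_i events occur among the n_i subjects still at risk at t_i;
    # status == 1 marks an observed event, 0 a censored observation.
    time, status = np.asarray(time, dtype=float), np.asarray(status)
    event_times = np.unique(time[status == 1])
    surv, s = [], 1.0
    for t in event_times:
        n_i = np.sum(time >= t)
        d_i = np.sum((time == t) & (status == 1))
        s *= 1.0 - d_i / n_i
        surv.append(s)
    return event_times, np.array(surv)
```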
The variable `time` corresponds to $y_i$, the time to the $i$th event (either censoring or
@@ -102,7 +102,7 @@ km_brain.plot(label='Kaplan Meier estimate', ax=ax)
```
Next we create Kaplan-Meier survival curves that are stratified by
`sex`, in order to reproduce Figure~\ref{fig:survbrain2}.
`sex`, in order to reproduce Figure 11.3.
We do this using the `groupby()` method of a dataframe.
This method returns a generator that can
be iterated over in the `for` loop. In this case,
@@ -130,7 +130,7 @@ for sex, df in BrainCancer.groupby('sex'):
```
As discussed in Section~\ref{sec:logrank}, we can perform a
As discussed in Section 11.4, we can perform a
log-rank test to compare the survival of males to females. We use
the `logrank_test()` function from the `lifelines.statistics` module.
The first two arguments are the event times, with the second
@@ -266,9 +266,9 @@ predicted_survival.plot(ax=ax);
## Publication Data
The `Publication` data presented in Section~\ref{sec:pub} can be
The `Publication` data presented in Section 11.5.4 can be
found in the `ISLP` package.
We first reproduce Figure~\ref{fig:lauersurv} by plotting the Kaplan-Meier curves
We first reproduce Figure 11.5 by plotting the Kaplan-Meier curves
stratified on the `posres` variable, which records whether the
study had a positive or negative result.
@@ -323,7 +323,7 @@ of the study, and whether the study had positive or negative results.
In this section, we will simulate survival data using the relationship
between cumulative hazard and
the survival function explored in Exercise \ref{ex:all3}.
the survival function explored in Exercise 8.
Our simulated data will represent the observed
wait times (in seconds) for 2,000 customers who have phoned a call
center. In this context, censoring occurs if a customer hangs up
@@ -394,7 +394,7 @@ that the risk of a call being answered at Center B is 0.74 times the
risk that it will be answered at Center A; in other words, the wait
times are a bit longer at Center B.
Recall from Section~\ref{Ch2-statlearn-lab:loading-data} the use of `lambda`
Recall from Section 2.3.7 the use of `lambda`
for creating short functions on the fly.
We use the function
`sim_time()` from the `ISLP.survival` package. This function

File diff suppressed because one or more lines are too long

View File

@@ -164,7 +164,7 @@ for k in range(pcaUS.components_.shape[1]):
USArrests.columns[k])
```
Notice that this figure is a reflection of Figure~\ref{Ch10:fig:USArrests:obs} through the $y$-axis. Recall that the
Notice that this figure is a reflection of Figure 12.1 through the $y$-axis. Recall that the
principal components are only unique up to a sign change, so we can
reproduce that figure by flipping the
signs of the second set of scores and loadings.
@@ -241,7 +241,7 @@ ax.set_xticks(ticks)
fig
```
The result is similar to that shown in Figure~\ref{Ch10:fig:USArrests:scree}. Note
The result is similar to that shown in Figure 12.3. Note
that the method `cumsum()` computes the cumulative sum of
the elements of a numeric vector. For instance:
@@ -253,15 +253,15 @@ np.cumsum(a)
## Matrix Completion
We now re-create the analysis carried out on the `USArrests` data in
Section~\ref{Ch10:sec:princ-comp-with}.
Section 12.3.
We saw in Section~\ref{ch10:sec2.2} that solving the optimization
problem~(\ref{Ch10:eq:mc2}) on a centered data matrix $\bf X$ is
We saw in Section 12.2.2 that solving the optimization
problem (12.6) on a centered data matrix $\bf X$ is
equivalent to computing the first $M$ principal
components of the data. We use our scaled
and centered `USArrests` data as $\bf X$ below. The *singular value decomposition*
(SVD) is a general algorithm for solving
(\ref{Ch10:eq:mc2}).
(12.6).
```{python}
X = USArrests_scaled
@@ -319,9 +319,9 @@ Here the array `r_idx`
contains 20 integers from 0 to 49; this represents the states (rows of `X`) that are selected to contain missing values. And `c_idx` contains
20 integers from 0 to 3, representing the features (columns in `X`) that contain the missing values for each of the selected states.
We now write some code to implement Algorithm~\ref{Ch10:alg:hardimpute}.
We now write some code to implement Algorithm 12.1.
We first write a function that takes in a matrix, and returns an approximation to the matrix using the `svd()` function.
This will be needed in Step 2 of Algorithm~\ref{Ch10:alg:hardimpute}.
This will be needed in Step 2 of Algorithm 12.1.
```{python}
def low_rank(X, M=1):
@@ -330,7 +330,7 @@ def low_rank(X, M=1):
return L.dot(V[:M])
```
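The diff elides the SVD lines of this helper; a complete sketch under the same signature would be:

```python
import numpy as np

def low_rank(X, M=1):
    # Rank-M approximation from the SVD: keep the leading M singular
    # values and vectors (Step 2(a) of Algorithm 12.1).
    U, D, V = np.linalg.svd(X, full_matrices=False)
    L = U[:, :M] * D[None, :M]
    return L.dot(V[:M])
```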
To conduct Step 1 of the algorithm, we initialize `Xhat` --- this is $\tilde{\bf X}$ in Algorithm~\ref{Ch10:alg:hardimpute} --- by replacing
To conduct Step 1 of the algorithm, we initialize `Xhat` --- this is $\tilde{\bf X}$ in Algorithm 12.1 --- by replacing
the missing values with the column means of the non-missing entries. These are stored in
`Xbar` below after running `np.nanmean()` over the row axis.
We make a copy so that when we assign values to `Xhat` below we do not also overwrite the
@@ -360,11 +360,11 @@ a given element is `True` if the corresponding matrix element is missing. The no
because it allows us to access both the missing and non-missing entries. We store the mean of the squared non-missing elements in `mss0`.
We store the mean squared error of the non-missing elements of the old version of `Xhat` in `mssold` (which currently
agrees with `mss0`). We plan to store the mean squared error of the non-missing elements of the current version of `Xhat` in `mss`, and will then
iterate Step 2 of Algorithm~\ref{Ch10:alg:hardimpute} until the *relative error*, defined as
iterate Step 2 of Algorithm 12.1 until the *relative error*, defined as
`(mssold - mss) / mss0`, falls below `thresh = 1e-7`.
{Algorithm~\ref{Ch10:alg:hardimpute} tells us to iterate Step 2 until \eqref{Ch10:eq:mc6} is no longer decreasing. Determining whether \eqref{Ch10:eq:mc6} is decreasing requires us only to keep track of `mssold - mss`. However, in practice, we keep track of `(mssold - mss) / mss0` instead: this makes it so that the number of iterations required for Algorithm~\ref{Ch10:alg:hardimpute} to converge does not depend on whether we multiplied the raw data $\bf X$ by a constant factor.}
{Algorithm 12.1 tells us to iterate Step 2 until (12.14) is no longer decreasing. Determining whether (12.14) is decreasing requires us only to keep track of `mssold - mss`. However, in practice, we keep track of `(mssold - mss) / mss0` instead: this makes it so that the number of iterations required for Algorithm 12.1 to converge does not depend on whether we multiplied the raw data $\bf X$ by a constant factor.}
In Step 2(a) of Algorithm~\ref{Ch10:alg:hardimpute}, we approximate `Xhat` using `low_rank()`; we call this `Xapp`. In Step 2(b), we use `Xapp` to update the estimates for elements in `Xhat` that are missing in `Xna`. Finally, in Step 2(c), we compute the relative error. These three steps are contained in the following `while` loop:
In Step 2(a) of Algorithm 12.1, we approximate `Xhat` using `low_rank()`; we call this `Xapp`. In Step 2(b), we use `Xapp` to update the estimates for elements in `Xhat` that are missing in `Xna`. Finally, in Step 2(c), we compute the relative error. These three steps are contained in the following `while` loop:
```{python}
while rel_err > thresh:
@@ -393,7 +393,7 @@ np.corrcoef(Xapp[ismiss], X[ismiss])[0,1]
```
In this lab, we implemented Algorithm~\ref{Ch10:alg:hardimpute} ourselves for didactic purposes. However, a reader who wishes to apply matrix completion to their data might look to more specialized `Python`{} implementations.
In this lab, we implemented Algorithm 12.1 ourselves for didactic purposes. However, a reader who wishes to apply matrix completion to their data might look to more specialized `Python`{} implementations.
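For readers who want a single self-contained reference, the iteration described above can be sketched as follows. The helper `low_rank_approx()` (a plain SVD truncation) stands in for the lab's `low_rank()`, whose exact signature is not shown here, and the iteration cap is an added safeguard.

```python
import numpy as np

def low_rank_approx(X, M=1):
    """Best rank-M approximation of X, via the singular value decomposition."""
    U, D, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :M] * D[:M]) @ Vt[:M]

def hard_impute(Xna, M=1, thresh=1e-7, max_iter=1000):
    """Fill the NaNs in Xna by iterating a rank-M approximation (Algorithm 12.1)."""
    ismiss = np.isnan(Xna)
    Xbar = np.nanmean(Xna, axis=0)            # column means of the observed entries
    Xhat = np.where(ismiss, Xbar, Xna)        # Step 1: mean imputation
    mss0 = mssold = np.mean(Xna[~ismiss]**2)
    for _ in range(max_iter):                 # cap iterations as a safeguard
        Xapp = low_rank_approx(Xhat, M=M)     # Step 2(a)
        Xhat[ismiss] = Xapp[ismiss]           # Step 2(b): update only missing cells
        mss = np.mean((Xna[~ismiss] - Xapp[~ismiss])**2)
        rel_err = (mssold - mss) / mss0       # Step 2(c): relative error
        mssold = mss
        if rel_err < thresh:
            break
    return Xhat
```

Restricting the assignment in Step 2(b) to the missing entries is what keeps the observed data untouched.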
## Clustering
@@ -464,7 +464,7 @@ We have used the `n_init` argument to run the $K$-means with 20
initial cluster assignments (the default is 10). If a
value of `n_init` greater than one is used, then $K$-means
clustering will be performed using multiple random assignments in
Step 1 of Algorithm~\ref{Ch10:alg:km}, and the `KMeans()`
Step 1 of Algorithm 12.2, and the `KMeans()`
function will report only the best results. Here we compare using
`n_init=1` to `n_init=20`.
@@ -480,7 +480,7 @@ kmeans1.inertia_, kmeans20.inertia_
```
Note that `kmeans.inertia_` is the total within-cluster sum
of squares, which we seek to minimize by performing $K$-means
clustering \eqref{Ch10:eq:kmeans}.
clustering 12.17.
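To see why multiple random assignments matter, here is a toy, numpy-only sketch of Lloyd's algorithm (an illustration, not `sklearn`'s `KMeans`): keeping the best of several runs can only lower the within-cluster sum of squares.

```python
import numpy as np

def lloyd(X, K, rng, n_iter=50):
    """One K-means run from a random initial assignment (Step 1 of the algorithm)."""
    labels = rng.integers(K, size=X.shape[0])
    for _ in range(n_iter):
        # Recompute centroids; re-seed any cluster that has gone empty.
        centers = np.array([X[labels == k].mean(axis=0) if (labels == k).any()
                            else X[rng.integers(X.shape[0])]
                            for k in range(K)])
        d = ((X[:, None, :] - centers[None]) ** 2).sum(axis=2)
        new = d.argmin(axis=1)
        inertia = d[np.arange(X.shape[0]), new].sum()  # within-cluster sum of squares
        if (new == labels).all():
            break
        labels = new
    return new, inertia

def best_of(X, K, n_init, seed=0):
    """Keep the run with the smallest inertia, mimicking KMeans(n_init=...)."""
    rng = np.random.default_rng(seed)
    return min((lloyd(X, K, rng) for _ in range(n_init)), key=lambda r: r[1])
```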
We *strongly* recommend always running $K$-means clustering with
a large value of `n_init`, such as 20 or 50, since otherwise an
@@ -846,7 +846,7 @@ results in four distinct clusters. It is easy to verify that the
resulting clusters are the same as the ones we obtained in
`comp_cut`.
We claimed earlier in Section~\ref{Ch10:subsec:hc} that
We claimed earlier in Section 12.4.2 that
$K$-means clustering and hierarchical clustering with the dendrogram
cut to obtain the same number of clusters can yield very different
results. How do these `NCI60` hierarchical clustering results compare


File diff suppressed because one or more lines are too long


@@ -91,7 +91,7 @@ truth = pd.Categorical(true_mean == 0,
```
Since this is a simulated data set, we can create a $2 \times 2$ table
similar to Table~\ref{Ch12:tab-hypotheses}.
similar to Table 13.2.
```{python}
pd.crosstab(decision,
@@ -102,7 +102,7 @@ pd.crosstab(decision,
```
Therefore, at level $\alpha=0.05$, we reject 15 of the 50 false
null hypotheses, and we incorrectly reject 5 of the true null
hypotheses. Using the notation from Section~\ref{sec:fwer}, we have
hypotheses. Using the notation from Section 13.3, we have
$V=5$, $S=15$, $U=45$ and $W=35$.
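The same $2 \times 2$ bookkeeping can be reproduced with synthetic label vectors chosen only to match the counts quoted above (the arrays below are illustrative, not the lab's simulated data):

```python
import numpy as np
import pandas as pd

# True = the null hypothesis is true; first 50 nulls true, last 50 false.
truth = np.array([True] * 50 + [False] * 50)
# True = reject. Among true nulls: 45 retained (U), 5 rejected (V);
# among false nulls: 35 retained (W), 15 rejected (S).
decision = np.array([False] * 45 + [True] * 5 +
                    [False] * 35 + [True] * 15)
tab = pd.crosstab(decision, truth)
print(tab)
```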
We have set $\alpha=0.05$, which means that we expect to reject around
5% of the true null hypotheses. This is in line with the $2 \times 2$
@@ -140,12 +140,12 @@ pd.crosstab(decision,
## Family-Wise Error Rate
Recall from \eqref{eq:FWER.indep} that if the null hypothesis is true
Recall from 13.5 that if the null hypothesis is true
for each of $m$ independent hypothesis tests, then the FWER is equal
to $1-(1-\alpha)^m$. We can use this expression to compute the FWER
for $m=1,\ldots, 500$ and $\alpha=0.05$, $0.01$, and $0.001$.
We plot the FWER for these values of $\alpha$ in order to
reproduce Figure~\ref{Ch12:fwer}.
reproduce Figure 13.2.
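As a quick numerical check of the formula before plotting it: at $\alpha=0.05$, the FWER already exceeds 90% by $m=50$ independent tests.

```python
import numpy as np

# FWER for m independent tests, each at level alpha: 1 - (1 - alpha)^m.
m = np.arange(1, 501)
fwer_05 = 1 - (1 - 0.05) ** m
fwer_001 = 1 - (1 - 0.001) ** m
print(f"alpha=0.05:  FWER at m=50  is {fwer_05[49]:.3f}")   # already above 0.9
print(f"alpha=0.001: FWER at m=500 is {fwer_001[499]:.3f}")
```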
```{python}
m = np.linspace(1, 501)
@@ -263,7 +263,7 @@ However, we decided to perform this test only after examining the data
and noting that Managers One and Two had the highest and lowest mean
performances. In a sense, this means that we have implicitly
performed ${5 \choose 2} = 5(5-1)/2=10$ hypothesis tests, rather than
just one, as discussed in Section~\ref{tukey.sec}. Hence, we use the
just one, as discussed in Section 13.3.2. Hence, we use the
`pairwise_tukeyhsd()` function from
`statsmodels.stats.multicomp` to apply Tukey's method
in order to adjust for multiple testing. This function takes
@@ -350,7 +350,7 @@ null hypotheses!
```
Figure~\ref{Ch12:fig:BonferroniBenjamini} displays the ordered
Figure 13.6 displays the ordered
$p$-values, $p_{(1)} \leq p_{(2)} \leq \cdots \leq p_{(2000)}$, for
the `Fund` dataset, as well as the threshold for rejection by the
Benjamini--Hochberg procedure. Recall that the Benjamini--Hochberg
@@ -379,7 +379,7 @@ else:
```
We now reproduce the middle panel of Figure~\ref{Ch12:fig:BonferroniBenjamini}.
We now reproduce the middle panel of Figure 13.6.
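For reference, the Benjamini--Hochberg decision rule being plotted can be written as a short generic function (a sketch, not the lab's code): reject the hypotheses with the $k$ smallest $p$-values, where $k$ is the largest $i$ such that $p_{(i)} \leq q\, i / m$.

```python
import numpy as np

def bh_reject(pvals, q=0.1):
    """Benjamini-Hochberg: indices of hypotheses rejected at FDR level q."""
    p = np.asarray(pvals)
    m = p.size
    order = np.argsort(p)
    below = p[order] <= q * np.arange(1, m + 1) / m
    if not below.any():
        return np.array([], dtype=int)
    k = np.nonzero(below)[0].max()        # largest i with p_(i) <= q*i/m
    return np.sort(order[:k + 1])         # reject the k+1 smallest p-values
```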
```{python}
fig, ax = plt.subplots()
@@ -398,7 +398,7 @@ ax.axline((0, 0), (1,q/m), c='k', ls='--', linewidth=3);
## A Re-Sampling Approach
Here, we implement the re-sampling approach to hypothesis testing
using the `Khan` dataset, which we investigated in
Section~\ref{sec:permutations}. First, we merge the training and
Section 13.5. First, we merge the training and
testing data, which results in observations on 83 patients for
2,308 genes.
@@ -464,7 +464,7 @@ for b in range(B):
This fraction, 0.0398,
is our re-sampling-based $p$-value.
It is almost identical to the $p$-value of 0.0412 obtained using the theoretical null distribution.
We can plot a histogram of the re-sampling-based test statistics in order to reproduce Figure~\ref{Ch12:fig-permp-1}.
We can plot a histogram of the re-sampling-based test statistics in order to reproduce Figure 13.7.
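The same re-sampling idea can be sketched generically; this toy version uses the difference in means as its statistic rather than the lab's two-sample $t$-statistic.

```python
import numpy as np

def perm_pvalue(x, y, B=1000, seed=0):
    """Two-sided permutation p-value for the difference in group means."""
    rng = np.random.default_rng(seed)
    obs = x.mean() - y.mean()
    pooled = np.concatenate([x, y])
    count = 0
    for _ in range(B):
        perm = rng.permutation(pooled)                      # shuffle group labels
        stat = perm[:x.size].mean() - perm[x.size:].mean()
        if abs(stat) >= abs(obs):
            count += 1
    return count / B
```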
```{python}
fig, ax = plt.subplots(figsize=(8,8))
@@ -487,7 +487,7 @@ ax.set_xlabel("Null Distribution of Test Statistic");
The re-sampling-based null distribution is almost identical to the theoretical null distribution, which is displayed in red.
Finally, we implement the plug-in re-sampling FDR approach outlined in
Algorithm~\ref{Ch12:alg-plugin-fdr}. Depending on the speed of your
Algorithm 13.4. Depending on the speed of your
computer, calculating the FDR for all 2,308 genes in the `Khan`
dataset may take a while. Hence, we will illustrate the approach on a
random subset of 100 genes. For each gene, we first compute the
@@ -520,7 +520,7 @@ for j in range(m):
Next, we compute the number of rejected null hypotheses $R$, the
estimated number of false positives $\widehat{V}$, and the estimated
FDR, for a range of threshold values $c$ in
Algorithm~\ref{Ch12:alg-plugin-fdr}. The threshold values are chosen
Algorithm 13.4. The threshold values are chosen
using the absolute values of the test statistics from the 100 genes.
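The quantities just described can be sketched as a generic helper (an illustration of the plug-in estimate, not the lab's exact code): for each cutoff $c$, $R(c)$ counts observed statistics with $|T_j| \geq c$, and $\widehat{V}(c)$ averages the analogous count over the $B$ sets of permuted statistics.

```python
import numpy as np

def plugin_fdr(T, T_star, cutoffs):
    """Estimated FDR at each cutoff c: Vhat(c) / R(c)."""
    T = np.abs(np.asarray(T))
    T_star = np.abs(np.asarray(T_star))    # shape (m, B): permuted statistics
    B = T_star.shape[1]
    out = []
    for c in cutoffs:
        R = max((T >= c).sum(), 1)         # number of rejections (avoid 0-division)
        Vhat = (T_star >= c).sum() / B     # estimated false positives under the null
        out.append(Vhat / R)
    return np.array(out)
```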
```{python}
@@ -559,8 +559,8 @@ sorted(idx[np.abs(T_vals) >= cutoffs[FDRs < 0.2].min()])
```
The next line generates Figure~\ref{fig:labfdr}, which is similar
to Figure~\ref{Ch12:fig-plugin-fdr},
The next line generates Figure 13.11, which is similar
to Figure 13.9,
except that it is based on only a subset of the genes.
```{python}

File diff suppressed because one or more lines are too long


@@ -23,7 +23,8 @@ def setup_env(outdir,
uv_executable,
timeout,
kernel,
allow_errors):
allow_errors,
overwrite):
"""
Sets up a student environment for ISLP_labs.
@@ -45,6 +46,8 @@ def setup_env(outdir,
Kernel to use for running notebooks.
allow_errors : bool
Allow errors when running notebooks.
overwrite : bool
Save notebook outputs in place?
"""
repo_url = 'https://github.com/intro-stat-learning/ISLP_labs.git'
@@ -110,7 +113,8 @@ def setup_env(outdir,
if kernel:
nbconvert_command.extend(['--kernel', kernel])
nbconvert_command.append('--allow-errors')
if overwrite:
nbconvert_command.append('--inplace')
run_command(nbconvert_command, cwd=str(outdir))
else:
run_command([str(uv_bin / 'pip'), 'install', 'pytest', 'nbmake'], cwd=str(outdir))
@@ -128,6 +132,9 @@ def setup_env(outdir,
str(nbfile)]
if kernel:
pytest_command.append(f'--nbmake-kernel={kernel}')
if overwrite:
pytest_command.append('--overwrite')
# nbmake does not have --allow-errors in the same way as nbconvert
# If errors are not allowed, nbmake will fail on first error naturally
@@ -152,7 +159,8 @@ def main():
parser.add_argument('--uv-executable', default='uv', help='The `uv` executable to use (default: "uv")')
parser.add_argument('--timeout', type=int, default=3600, help='Timeout for running notebooks (default: 3600)')
parser.add_argument('--kernel', default=None, help='Kernel to use for running notebooks')
parser.add_argument('--allow-errors', action='store_true', help='Allow errors when running notebooks. For older commits, Ch02-statlearn-lab.ipynb raises exceptions. Since v2.2.1 these are handled by appropriate tags in the .ipynb cells.')
parser.add_argument('--allow-errors', action='store_true', default=False, help='Allow errors when running notebooks. For older commits, Ch02-statlearn-lab.ipynb raises exceptions. Since v2.2.1 these are handled by appropriate tags in the .ipynb cells.')
parser.add_argument('--overwrite', action='store_true', default=False, help='If True, store output of cells in notebook(s).')
parser.add_argument('nbfiles', nargs='*', help='Optional list of notebooks to run.')
@@ -165,7 +173,8 @@ def main():
args.uv_executable,
args.timeout,
args.kernel,
args.allow_errors)
args.allow_errors,
args.overwrite)
if __name__ == '__main__':
main()
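The flag plumbing added here reduces to a small pure function. The sketch below mirrors the `nbconvert` branch above in simplified form (it drops the timeout and working-directory handling, and gates `--allow-errors` on its flag):

```python
def build_nbconvert_command(notebook, kernel=None,
                            allow_errors=False, overwrite=False):
    """Assemble a jupyter-nbconvert invocation to execute one notebook.

    --inplace is appended only when the caller asks for cell outputs
    to be saved back into the notebook, matching the new --overwrite flag.
    """
    cmd = ['jupyter', 'nbconvert', '--to', 'notebook', '--execute', notebook]
    if kernel:
        cmd.extend(['--kernel', kernel])
    if allow_errors:
        cmd.append('--allow-errors')
    if overwrite:
        cmd.append('--inplace')
    return cmd
```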