pairing notebooks
This commit is contained in:
@@ -1,3 +1,16 @@
---
jupyter:
  jupytext:
    cell_metadata_filter: -all
    formats: ipynb,Rmd
    main_language: python
    text_representation:
      extension: .Rmd
      format_name: rmarkdown
      format_version: '1.2'
      jupytext_version: 1.14.7
---

# Chapter 6

@@ -32,7 +45,7 @@ from ISLP.models import \
(Stepwise,
sklearn_selected,
sklearn_selection_path)
!pip install l0bnb
# !pip install l0bnb
from l0bnb import fit_path

```

@@ -61,7 +74,7 @@ Hitters = load_data('Hitters')
np.isnan(Hitters['Salary']).sum()

```

We see that `Salary` is missing for 59 players. The
`dropna()` method of data frames removes all of the rows that have missing
values in any variable (by default --- see `Hitters.dropna?`).
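
On a small toy frame (hypothetical data, not the `Hitters` set), the default behavior looks like this:

```python
import numpy as np
import pandas as pd

# Toy frame with one missing value (illustrative only, not Hitters)
df = pd.DataFrame({'Salary': [475.0, np.nan, 500.0],
                   'Hits': [81, 130, 141]})

n_missing = np.isnan(df['Salary']).sum()  # count missing salaries
clean = df.dropna()                       # drop rows with any NaN
print(n_missing, clean.shape)
```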
@@ -71,8 +84,8 @@ Hitters = Hitters.dropna();
Hitters.shape

```

We first choose the best model using forward selection based on $C_p$ (6.2). This score
is not built in as a metric to `sklearn`. We therefore define a function to compute it ourselves, and use
it as a scorer. By default, `sklearn` tries to maximize a score, hence
@@ -106,7 +119,7 @@ neg_Cp = partial(nCp, sigma2)

```
We can now use `neg_Cp()` as a scorer for model selection.
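
A minimal sketch of such a scorer (the names `nCp` and `sigma2` follow the text above, but the exact signature in the lab's code may differ):

```python
import numpy as np
from functools import partial

def nCp(sigma2, estimator, X, Y):
    """Negative Cp: -(RSS + 2*d*sigma2)/n, negated so that
    sklearn's score *maximization* selects the smallest Cp."""
    n, d = X.shape
    Yhat = estimator.predict(X)
    RSS = np.sum((Y - Yhat) ** 2)
    return -(RSS + 2 * d * sigma2) / n

# Freeze the noise estimate so the scorer has the (estimator, X, Y) signature
sigma2 = 1.0  # placeholder; the lab estimates this from the full OLS fit
neg_Cp = partial(nCp, sigma2)
```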

Along with a score we need to specify the search strategy. This is done through the object
`Stepwise()` in the `ISLP.models` package. The method `Stepwise.first_peak()`
@@ -120,7 +133,7 @@ strategy = Stepwise.first_peak(design,
max_terms=len(design.terms))

```

We now fit a linear regression model with `Salary` as outcome using forward
selection. To do so, we use the function `sklearn_selected()` from the `ISLP.models` package. This takes
a model from `statsmodels` along with a search strategy and selects a model with its
@@ -134,7 +147,7 @@ hitters_MSE.fit(Hitters, Y)
hitters_MSE.selected_state_

```

Using `neg_Cp` results in a smaller model, as expected, with just 10 variables selected.

```{python}
@@ -145,7 +158,7 @@ hitters_Cp.fit(Hitters, Y)
hitters_Cp.selected_state_

```

### Choosing Among Models Using the Validation Set Approach and Cross-Validation

As an alternative to using $C_p$, we might try cross-validation to select a model in forward selection. For this, we need a
@@ -167,7 +180,7 @@ strategy = Stepwise.fixed_steps(design,
full_path = sklearn_selection_path(OLS, strategy)

```

We now fit the full forward-selection path on the `Hitters` data and compute the fitted values.

```{python}
@@ -176,8 +189,8 @@ Yhat_in = full_path.predict(Hitters)
Yhat_in.shape

```

This gives us an array of fitted values --- 20 steps in all, including the fitted mean for the null model --- which we can use to evaluate
in-sample MSE. As expected, the in-sample MSE improves each step we take,
indicating we must use either the validation or cross-validation
@@ -266,7 +279,7 @@ ax.legend()
mse_fig

```
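
The errors above come from comparing an `(n, steps)` array of fitted values against `Y`; a minimal sketch of that broadcasting computation, on synthetic arrays rather than the `Hitters` fit:

```python
import numpy as np

rng = np.random.default_rng(0)
n, steps = 50, 20
Yhat = rng.normal(size=(n, steps))  # fitted values, one column per model size
Y = rng.normal(size=n)

# Broadcast Y against each column, then average squared errors over rows
mse_per_step = ((Yhat - Y[:, None]) ** 2).mean(0)
print(mse_per_step.shape)
```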

To repeat the above using the validation set approach, we simply change our
`cv` argument to a validation set: one random split of the data into a test and a training set. We choose a test size
of 20%, similar to the size of each test set in 5-fold cross-validation. `skm.ShuffleSplit()`
@@ -296,7 +309,7 @@ ax.legend()
mse_fig

```
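
A standalone sketch of the single-split object (the 20% test size mirrors the text; `random_state` is an arbitrary choice):

```python
import numpy as np
import sklearn.model_selection as skm

# One random 80/20 split, playing the role of the `cv` argument
validation = skm.ShuffleSplit(n_splits=1, test_size=0.2, random_state=0)

# Inspect the split it produces on 100 hypothetical observations
train_idx, test_idx = next(validation.split(np.arange(100)))
print(len(train_idx), len(test_idx))
```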

### Best Subset Selection
Forward stepwise is a *greedy* selection procedure; at each step it augments the current set by including one additional variable. We now apply best subset selection to the `Hitters`
@@ -324,7 +337,7 @@ path = fit_path(X,
max_nonzeros=X.shape[1])

```

The function `fit_path()` returns a list whose values include the fitted coefficients as `B`, an intercept as `B0`, as well as a few other attributes related to the particular path algorithm used. Such details are beyond the scope of this book.
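
Illustratively, each element of the path can be unpacked like this (a mock list stands in for `fit_path()` output; only the `B`/`B0` keys described above are assumed):

```python
import numpy as np

# Mock of the structure described above: one dict per point on the path
path = [{'B': np.zeros(3), 'B0': 535.9},
        {'B': np.array([0.0, 3.3, 0.0]), 'B0': 274.6}]

# Number of nonzero coefficients at each step of the path
nonzeros = [int((step['B'] != 0).sum()) for step in path]
print(nonzeros)
```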

```{python}
@@ -392,7 +405,7 @@ soln_path.index.name = 'negative log(lambda)'
soln_path

```

We plot the paths to get a sense of how the coefficients vary with $\lambda$.
To control the location of the legend we first set `legend` to `False` in the
plot method, adding it afterward with the `legend()` method of `ax`.
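
A minimal sketch of that pattern, with a small synthetic frame standing in for `soln_path`:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend for scripting
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

soln_demo = pd.DataFrame(np.random.default_rng(1).normal(size=(10, 3)),
                         columns=['a', 'b', 'c'])

fig, ax = plt.subplots(figsize=(8, 8))
soln_demo.plot(ax=ax, legend=False)  # suppress the automatic legend
ax.legend(loc='upper left')          # then place it explicitly
```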
@@ -416,14 +429,14 @@ beta_hat = soln_path.loc[soln_path.index[39]]
lambdas[39], beta_hat

```

Let's compute the $\ell_2$ norm of the standardized coefficients.

```{python}
np.linalg.norm(beta_hat)

```

In contrast, here is the $\ell_2$ norm when $\lambda$ is 2.44e-01.
Note the much larger $\ell_2$ norm of the
coefficients associated with this smaller value of $\lambda$.
@@ -477,7 +490,7 @@ results = skm.cross_validate(ridge,
-results['test_score']

```

The test MSE is 1.342e+05. Note
that if we had instead simply fit a model with just an intercept, we
would have predicted each test observation using the mean of the
@@ -514,7 +527,7 @@ grid.best_params_['ridge__alpha']
grid.best_estimator_

```
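
The intercept-only baseline mentioned above can be sketched as follows (hypothetical split arrays, not the lab's variables):

```python
import numpy as np

rng = np.random.default_rng(0)
Y_train = rng.normal(500, 100, size=200)  # stand-ins for the salary split
Y_test = rng.normal(500, 100, size=50)

# Null model: predict every test observation with the training mean
null_mse = np.mean((Y_test - Y_train.mean()) ** 2)
print(null_mse)
```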

Alternatively, we can use 5-fold cross-validation.

```{python}
@@ -540,7 +553,7 @@ ax.set_xlabel('$-\log(\lambda)$', fontsize=20)
ax.set_ylabel('Cross-validated MSE', fontsize=20);

```

One can cross-validate different metrics to choose a parameter. The default
metric for `skl.ElasticNet()` is test $R^2$.
Let's compare $R^2$ to MSE for cross-validation here.
@@ -552,7 +565,7 @@ grid_r2 = skm.GridSearchCV(pipe,
grid_r2.fit(X, Y)

```

Finally, let's plot the results for cross-validated $R^2$.

```{python}
@@ -564,7 +577,7 @@ ax.set_xlabel('$-\log(\lambda)$', fontsize=20)
ax.set_ylabel('Cross-validated $R^2$', fontsize=20);

```

### Fast Cross-Validation for Solution Paths
The ridge, lasso, and elastic net can be efficiently fit along a sequence of $\lambda$ values, creating what is known as a *solution path* or *regularization path*. Hence there is specialized code to fit
@@ -584,7 +597,7 @@ pipeCV = Pipeline(steps=[('scaler', scaler),
pipeCV.fit(X, Y)

```
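
A standalone sketch of this path-based cross-validation, using `sklearn.linear_model.ElasticNetCV` on synthetic data (the lab's exact $\lambda$ grid and `l1_ratio` setting are not reproduced here):

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
Y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=100)

# Cross-validate over the whole solution path in a single fit
pipeCV = Pipeline(steps=[('scaler', StandardScaler()),
                         ('enet', ElasticNetCV(n_alphas=50, cv=5))])
pipeCV.fit(X, Y)

enet = pipeCV.named_steps['enet']
print(enet.alpha_)  # the lambda minimizing cross-validated MSE
```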

Let's produce a plot again of the cross-validation error to see that
it is similar to using `skm.GridSearchCV`.

@@ -600,7 +613,7 @@ ax.set_xlabel('$-\log(\lambda)$', fontsize=20)
ax.set_ylabel('Cross-validated MSE', fontsize=20);

```

We see that the value of $\lambda$ that results in the
smallest cross-validation error is 1.19e-02, available
as the value `tuned_ridge.alpha_`. What is the test MSE
@@ -610,7 +623,7 @@ associated with this value of $\lambda$?
np.min(tuned_ridge.mse_path_.mean(1))

```

This represents a further improvement over the test MSE that we got
using $\lambda=4$. Finally, `tuned_ridge.coef_`
has the coefficients fit on the entire data set
@@ -666,7 +679,7 @@ results = skm.cross_validate(pipeCV,

```

### The Lasso
We saw that ridge regression with a wise choice of $\lambda$ can
@@ -721,7 +734,7 @@ regression (page 305) with $\lambda$ chosen by cross-validation.
np.min(tuned_lasso.mse_path_.mean(1))

```

Let's again produce a plot of the cross-validation error.

@@ -746,7 +759,7 @@ variables.
tuned_lasso.coef_

```

As in ridge regression, we could evaluate the test error
of cross-validated lasso by first splitting into
test and training sets and internally running
@@ -757,7 +770,7 @@ this as an exercise.
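
One possible outline for that exercise, on synthetic data (`LassoCV` stands in for the lab's cross-validated lasso; the split parameters are arbitrary choices):

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
Y = X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=200)

# Split first, then run cross-validation internally on the training set only
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=0)

lasso = LassoCV(cv=5).fit(X_train, Y_train)
test_mse = np.mean((Y_test - lasso.predict(X_test)) ** 2)
print(test_mse)
```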
## PCR and PLS Regression

### Principal Components Regression

Principal components regression (PCR) can be performed using
`PCA()` from the `sklearn.decomposition`
@@ -778,7 +791,7 @@ pipe.fit(X, Y)
pipe.named_steps['linreg'].coef_

```
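
A standalone sketch of the PCR pipeline on synthetic data (two components chosen arbitrarily):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
Y = X @ rng.normal(size=6) + rng.normal(scale=0.1, size=100)

# Regress Y on the first two principal components of X
pipe = Pipeline([('pca', PCA(n_components=2)),
                 ('linreg', LinearRegression())])
pipe.fit(X, Y)
print(pipe.named_steps['linreg'].coef_)  # one coefficient per component
```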

When performing PCA, the results vary depending
on whether the data has been *standardized* or not.
As in the earlier examples, this can be accomplished
@@ -792,7 +805,7 @@ pipe.fit(X, Y)
pipe.named_steps['linreg'].coef_

```

We can of course use CV to choose the number of components, by
using `skm.GridSearchCV`, in this
case fixing the parameters to vary the
@@ -807,7 +820,7 @@ grid = skm.GridSearchCV(pipe,
grid.fit(X, Y)

```

Let's plot the results as we have for other methods.

```{python}
@@ -822,7 +835,7 @@ ax.set_xticks(n_comp[::2])
ax.set_ylim([50000,250000]);

```

We see that the smallest cross-validation error occurs when
17 components are used. However, from the plot we also see that the
@@ -846,8 +859,8 @@ cv_null = skm.cross_validate(linreg,
-cv_null['test_score'].mean()

```

The `explained_variance_ratio_`
attribute of our `PCA` object provides the *percentage of variance explained* in the predictors and in the response using
different numbers of components. This concept is discussed in greater
@@ -857,7 +870,7 @@ detail in Section 12.2.
pipe.named_steps['pca'].explained_variance_ratio_

```

Briefly, we can think of
this as the amount of information about the predictors
that is captured using $M$ principal components. For example, setting
@@ -880,7 +893,7 @@ pls = PLSRegression(n_components=2,
pls.fit(X, Y)

```

As was the case in PCR, we will want to
use CV to choose the number of components.

@@ -893,7 +906,7 @@ grid = skm.GridSearchCV(pls,
grid.fit(X, Y)

```

As for our other methods, we plot the MSE.

```{python}
@@ -908,7 +921,7 @@ ax.set_xticks(n_comp[::2])
ax.set_ylim([50000,250000]);

```

CV error is minimized at 12 components,
though there is little noticeable difference between this point and a much lower number like 2 or 3 components.