fixing whitespace in Rmd so diff of errata is cleaner (#46)

* fixing whitespace in Rmd so diff of errata is cleaner

* reapply kwargs fix
Jonathan Taylor
2025-04-03 12:25:17 -07:00
parent 7f1103e140
commit 8fa98567ee
12 changed files with 392 additions and 410 deletions


@@ -1,6 +1,3 @@
# Support Vector Machines
<a target="_blank" href="https://colab.research.google.com/github/intro-stat-learning/ISLP_labs/blob/v2.2/Ch09-svm-lab.ipynb">
@@ -31,7 +28,7 @@ from ISLP.svm import plot as plot_svm
from sklearn.metrics import RocCurveDisplay
```
We will use the function `RocCurveDisplay.from_estimator()` to
produce several ROC plots, using a shorthand `roc_curve`.
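A natural way to set up such a shorthand is simply to bind the name to the class method; a minimal sketch (the lab's actual definition may differ):
```{python}
# Possible definition of the shorthand used later in the lab (an assumption)
roc_curve = RocCurveDisplay.from_estimator
```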
@@ -76,8 +73,8 @@ svm_linear = SVC(C=10, kernel='linear')
svm_linear.fit(X, y)
```
The support vector classifier with two features can
be visualized by plotting values of its *decision function*.
We have included a function for this in the `ISLP` package (inspired by a similar
@@ -91,7 +88,7 @@ plot_svm(X,
ax=ax)
```
The decision
boundary between the two classes is linear (because we used the
argument `kernel='linear'`). The support vectors are marked with `+`
@@ -118,8 +115,8 @@ coefficients of the linear decision boundary as follows:
svm_linear.coef_
```
Since the support vector machine is an estimator in `sklearn`, we
can use the usual machinery to tune it.
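As a sketch of what that tuning might look like, we could search over a grid of values of the cost parameter `C` with five-fold cross-validation (the candidate values and fold settings below are illustrative assumptions):
```{python}
# Sketch: tune the cost parameter C by cross-validation.
# The grid of C values and the 5-fold split are illustrative choices.
import sklearn.model_selection as skm
kfold = skm.KFold(5, random_state=0, shuffle=True)
grid = skm.GridSearchCV(svm_linear,
                        {'C': [0.001, 0.01, 0.1, 1, 5, 10, 100]},
                        refit=True,
                        cv=kfold,
                        scoring='accuracy')
```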
@@ -136,8 +133,8 @@ grid.fit(X, y)
grid.best_params_
```
We can easily access the cross-validation errors for each of these models
in `grid.cv_results_`. This prints out a lot of detail, so we
extract the accuracy results only.
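For instance, the mean validation accuracy for each candidate value of `C` is stored under a single key, following `sklearn`'s `cv_results_` convention:
```{python}
# Mean cross-validated accuracy for each candidate value of C
grid.cv_results_['mean_test_score']
```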
@@ -158,7 +155,7 @@ y_test = np.array([-1]*10+[1]*10)
X_test[y_test==1] += 1
```
Now we predict the class labels of these test observations. Here we
use the best model selected by cross-validation in order to make the
predictions.
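Assuming the grid search was run with `refit=True`, the selected model can be pulled out of the fitted grid object; a minimal sketch:
```{python}
# Retrieve the model refit on the full data with the best C found by CV
best_ = grid.best_estimator_
```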
@@ -169,7 +166,7 @@ y_test_hat = best_.predict(X_test)
confusion_table(y_test_hat, y_test)
```
Thus, with this value of `C`,
70% of the test
observations are correctly classified. What if we had instead used
@@ -182,7 +179,7 @@ y_test_hat = svm_.predict(X_test)
confusion_table(y_test_hat, y_test)
```
In this case 60% of test observations are correctly classified.
We now consider a situation in which the two classes are linearly
@@ -197,7 +194,7 @@ fig, ax = subplots(figsize=(8,8))
ax.scatter(X[:,0], X[:,1], c=y, cmap=cm.coolwarm);
```
Now the observations are just barely linearly separable.
```{python}
@@ -206,7 +203,7 @@ y_hat = svm_.predict(X)
confusion_table(y_hat, y)
```
We fit the
support vector classifier and plot the resulting hyperplane, using a
very large value of `C` so that no observations are
@@ -232,7 +229,7 @@ y_hat = svm_.predict(X)
confusion_table(y_hat, y)
```
Using `C=0.1`, we again do not misclassify any training observations, but we
also obtain a much wider margin and make use of twelve support
vectors. These jointly define the orientation of the decision boundary, and since there are more of them, it is more stable. It seems possible that this model will perform better on test
@@ -246,7 +243,7 @@ plot_svm(X,
ax=ax)
```
## Support Vector Machine
In order to fit an SVM using a non-linear kernel, we once again use
@@ -269,7 +266,7 @@ X[100:150] -= 2
y = np.array([1]*150+[2]*50)
```
Plotting the data makes it clear that the class boundary is indeed non-linear.
```{python}
@@ -280,8 +277,8 @@ ax.scatter(X[:,0],
cmap=cm.coolwarm);
```
The data is randomly split into training and testing groups. We then
fit the training data using the `SVC()` estimator with a
radial kernel and $\gamma=1$:
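A sketch of such a split using `skm.train_test_split()` (the 50/50 proportion and the random seed are assumptions):
```{python}
# Sketch: random 50/50 split into training and test sets (seed is illustrative)
(X_train, X_test,
 y_train, y_test) = skm.train_test_split(X, y,
                                         test_size=0.5,
                                         random_state=0)
```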
@@ -298,7 +295,7 @@ svm_rbf = SVC(kernel="rbf", gamma=1, C=1)
svm_rbf.fit(X_train, y_train)
```
The plot shows that the resulting SVM has a decidedly non-linear
boundary.
@@ -310,7 +307,7 @@ plot_svm(X_train,
ax=ax)
```
We can see from the figure that there are a fair number of training
errors in this SVM fit. If we increase the value of `C`, we
can reduce the number of training errors. However, this comes at the
@@ -327,7 +324,7 @@ plot_svm(X_train,
ax=ax)
```
We can perform cross-validation using `skm.GridSearchCV()` to select the
best choice of $\gamma$ and `C` for an SVM with a radial
kernel:
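A sketch of this two-parameter search (the candidate grids for `C` and $\gamma$ are illustrative):
```{python}
# Sketch: cross-validate over C and gamma jointly for an RBF-kernel SVM
kfold = skm.KFold(5, random_state=0, shuffle=True)
grid = skm.GridSearchCV(SVC(kernel="rbf"),
                        {'C': [0.1, 1, 10, 100, 1000],
                         'gamma': [0.5, 1, 2, 3, 4]},
                        refit=True,
                        cv=kfold,
                        scoring='accuracy')
```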
@@ -346,7 +343,7 @@ grid.fit(X_train, y_train)
grid.best_params_
```
The best choice of parameters under five-fold CV is achieved at `C=1`
and `gamma=0.5`, though several other parameter combinations achieve
the same cross-validation accuracy.
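One way to see which parameter combinations tie for the best score is to scan `grid.cv_results_` directly; a sketch using the standard result keys:
```{python}
# List every (C, gamma) combination whose mean CV accuracy ties the best value
res = grid.cv_results_
best = res['mean_test_score'].max()
[params for params, score in zip(res['params'], res['mean_test_score'])
 if score == best]
```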
@@ -363,7 +360,7 @@ y_hat_test = best_svm.predict(X_test)
confusion_table(y_hat_test, y_test)
```
With these parameters, 12% of test
observations are misclassified by this SVM.
@@ -423,7 +420,7 @@ roc_curve(svm_flex,
ax=ax);
```
However, these ROC curves are all on the training data. We are really
more interested in the level of prediction accuracy on the test
data. When we compute the ROC curves on the test data, the model with
@@ -439,7 +436,7 @@ roc_curve(svm_flex,
fig;
```
Let's look at our tuned SVM.
```{python}
@@ -458,7 +455,7 @@ for (X_, y_, c, name) in zip(
color=c)
```
## SVM with Multiple Classes
If the response is a factor containing more than two levels, then the
@@ -477,7 +474,7 @@ fig, ax = subplots(figsize=(8,8))
ax.scatter(X[:,0], X[:,1], c=y, cmap=cm.coolwarm);
```
We now fit an SVM to the data:
```{python}
@@ -513,7 +510,7 @@ Khan = load_data('Khan')
Khan['xtrain'].shape, Khan['xtest'].shape
```
This data set consists of expression measurements for 2,308
genes. The training and test sets consist of 63 and 20
observations, respectively.
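Before computing the confusion table below, a linear support vector classifier is fit to the training data; a minimal sketch of that fit (the value `C=10` is taken from the discussion at the end of this section):
```{python}
# Sketch: linear support vector classifier on the Khan gene expression data
khan_linear = SVC(kernel='linear', C=10)
khan_linear.fit(Khan['xtrain'], Khan['ytrain'])
```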
@@ -532,7 +529,7 @@ confusion_table(khan_linear.predict(Khan['xtrain']),
Khan['ytrain'])
```
We see that there are *no* training
errors. In fact, this is not surprising, because the large number of
variables relative to the number of observations implies that it is
@@ -545,7 +542,7 @@ confusion_table(khan_linear.predict(Khan['xtest']),
Khan['ytest'])
```
We see that using `C=10` yields two test set errors on these data.