fixing whitespace in Rmd so diff of errata is cleaner (#46)

* fixing whitespace in Rmd so diff of errata is cleaner

* reapply kwargs fix
Jonathan Taylor
2025-04-03 12:25:17 -07:00
parent 7f1103e140
commit 8fa98567ee
12 changed files with 392 additions and 410 deletions


@@ -1,6 +1,3 @@
# Support Vector Machines
<a target="_blank" href="https://colab.research.google.com/github/intro-stat-learning/ISLP_labs/blob/v2.2/Ch09-svm-lab.ipynb">
@@ -31,7 +28,7 @@ from ISLP.svm import plot as plot_svm
from sklearn.metrics import RocCurveDisplay
```
We will use the function `RocCurveDisplay.from_estimator()` to
produce several ROC plots, using a shorthand `roc_curve`.
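A natural way to set up such a shorthand is simply to bind the name to the class method; a minimal sketch (the lab's actual definition may differ):
```{python}
# Possible definition of the shorthand used later in the lab (an assumption)
roc_curve = RocCurveDisplay.from_estimator
```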
@@ -76,8 +73,8 @@ svm_linear = SVC(C=10, kernel='linear')
svm_linear.fit(X, y)
```
The support vector classifier with two features can
be visualized by plotting values of its *decision function*.
We have included a function for this in the `ISLP` package (inspired by a similar
@@ -91,7 +88,7 @@ plot_svm(X,
ax=ax)
```
The decision
boundary between the two classes is linear (because we used the
argument `kernel='linear'`). The support vectors are marked with `+`
@@ -118,8 +115,8 @@ coefficients of the linear decision boundary as follows:
svm_linear.coef_
```
Since the support vector machine is an estimator in `sklearn`, we
can use the usual machinery to tune it.
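As a sketch of what that tuning might look like, we could search over a grid of values of the cost parameter `C` with five-fold cross-validation (the candidate values and fold settings below are illustrative assumptions):
```{python}
# Sketch: tune the cost parameter C by cross-validation.
# The grid of C values and the 5-fold split are illustrative choices.
import sklearn.model_selection as skm
kfold = skm.KFold(5, random_state=0, shuffle=True)
grid = skm.GridSearchCV(svm_linear,
                        {'C': [0.001, 0.01, 0.1, 1, 5, 10, 100]},
                        refit=True,
                        cv=kfold,
                        scoring='accuracy')
```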
@@ -136,8 +133,8 @@ grid.fit(X, y)
grid.best_params_
```
We can easily access the cross-validation errors for each of these models
in `grid.cv_results_`. This prints out a lot of detail, so we
extract the accuracy results only.
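For instance, the mean validation accuracy for each candidate value of `C` is stored under a single key, following `sklearn`'s `cv_results_` convention:
```{python}
# Mean cross-validated accuracy for each candidate value of C
grid.cv_results_['mean_test_score']
```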
@@ -158,7 +155,7 @@ y_test = np.array([-1]*10+[1]*10)
X_test[y_test==1] += 1
```
Now we predict the class labels of these test observations. Here we
use the best model selected by cross-validation in order to make the
predictions.
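Assuming the grid search was run with `refit=True`, the selected model can be pulled out of the fitted grid object; a minimal sketch:
```{python}
# Retrieve the model refit on the full data with the best C found by CV
best_ = grid.best_estimator_
```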
@@ -169,7 +166,7 @@ y_test_hat = best_.predict(X_test)
confusion_table(y_test_hat, y_test)
```
Thus, with this value of `C`,
70% of the test
observations are correctly classified. What if we had instead used
@@ -182,7 +179,7 @@ y_test_hat = svm_.predict(X_test)
confusion_table(y_test_hat, y_test)
```
In this case 60% of test observations are correctly classified.
We now consider a situation in which the two classes are linearly
@@ -197,7 +194,7 @@ fig, ax = subplots(figsize=(8,8))
ax.scatter(X[:,0], X[:,1], c=y, cmap=cm.coolwarm);
```
Now the observations are just barely linearly separable.
```{python}
@@ -206,7 +203,7 @@ y_hat = svm_.predict(X)
confusion_table(y_hat, y)
```
We fit the
support vector classifier and plot the resulting hyperplane, using a
very large value of `C` so that no observations are
@@ -232,7 +229,7 @@ y_hat = svm_.predict(X)
confusion_table(y_hat, y)
```
Using `C=0.1`, we again do not misclassify any training observations, but we
also obtain a much wider margin and make use of twelve support
vectors. These jointly define the orientation of the decision boundary, and since there are more of them, it is more stable. It seems possible that this model will perform better on test
@@ -246,7 +243,7 @@ plot_svm(X,
ax=ax)
```
## Support Vector Machine
In order to fit an SVM using a non-linear kernel, we once again use
@@ -269,7 +266,7 @@ X[100:150] -= 2
y = np.array([1]*150+[2]*50)
```
Plotting the data makes it clear that the class boundary is indeed non-linear.
```{python}
@@ -280,8 +277,8 @@ ax.scatter(X[:,0],
cmap=cm.coolwarm);
```
The data is randomly split into training and testing groups. We then
fit the training data using the `SVC()` estimator with a
radial kernel and $\gamma=1$:
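A sketch of such a split using `skm.train_test_split()` (the 50/50 proportion and the random seed are assumptions):
```{python}
# Sketch: random 50/50 split into training and test sets (seed is illustrative)
(X_train, X_test,
 y_train, y_test) = skm.train_test_split(X, y,
                                         test_size=0.5,
                                         random_state=0)
```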
@@ -298,7 +295,7 @@ svm_rbf = SVC(kernel="rbf", gamma=1, C=1)
svm_rbf.fit(X_train, y_train)
```
The plot shows that the resulting SVM has a decidedly non-linear
boundary.
@@ -310,7 +307,7 @@ plot_svm(X_train,
ax=ax)
```
We can see from the figure that there are a fair number of training
errors in this SVM fit. If we increase the value of `C`, we
can reduce the number of training errors. However, this comes at the
@@ -327,7 +324,7 @@ plot_svm(X_train,
ax=ax)
```
We can perform cross-validation using `skm.GridSearchCV()` to select the
best choice of $\gamma$ and `C` for an SVM with a radial
kernel:
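A sketch of this two-parameter search (the candidate grids for `C` and $\gamma$ are illustrative):
```{python}
# Sketch: cross-validate over C and gamma jointly for an RBF-kernel SVM
kfold = skm.KFold(5, random_state=0, shuffle=True)
grid = skm.GridSearchCV(SVC(kernel="rbf"),
                        {'C': [0.1, 1, 10, 100, 1000],
                         'gamma': [0.5, 1, 2, 3, 4]},
                        refit=True,
                        cv=kfold,
                        scoring='accuracy')
```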
@@ -346,7 +343,7 @@ grid.fit(X_train, y_train)
grid.best_params_
```
The best choice of parameters under five-fold CV is achieved at `C=1`
and `gamma=0.5`, though several other parameter combinations achieve
the same cross-validation accuracy.
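One way to see which parameter combinations tie for the best score is to scan `grid.cv_results_` directly; a sketch using the standard result keys:
```{python}
# List every (C, gamma) combination whose mean CV accuracy ties the best value
res = grid.cv_results_
best = res['mean_test_score'].max()
[params for params, score in zip(res['params'], res['mean_test_score'])
 if score == best]
```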
@@ -363,7 +360,7 @@ y_hat_test = best_svm.predict(X_test)
confusion_table(y_hat_test, y_test)
```
With these parameters, 12% of test
observations are misclassified by this SVM.
@@ -423,7 +420,7 @@ roc_curve(svm_flex,
ax=ax);
```
However, these ROC curves are all on the training data. We are really
more interested in the level of prediction accuracy on the test
data. When we compute the ROC curves on the test data, the model with
@@ -439,7 +436,7 @@ roc_curve(svm_flex,
fig;
```
Let's look at our tuned SVM.
```{python}
@@ -458,7 +455,7 @@ for (X_, y_, c, name) in zip(
color=c)
```
## SVM with Multiple Classes
If the response is a factor containing more than two levels, then the
@@ -477,7 +474,7 @@ fig, ax = subplots(figsize=(8,8))
ax.scatter(X[:,0], X[:,1], c=y, cmap=cm.coolwarm);
```
We now fit an SVM to the data:
```{python}
@@ -513,7 +510,7 @@ Khan = load_data('Khan')
Khan['xtrain'].shape, Khan['xtest'].shape
```
This data set consists of expression measurements for 2,308
genes. The training and test sets consist of 63 and 20
observations, respectively.
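Before computing the confusion table below, a linear support vector classifier is fit to the training data; a minimal sketch of that fit (the value `C=10` is taken from the discussion at the end of this section):
```{python}
# Sketch: linear support vector classifier on the Khan gene expression data
khan_linear = SVC(kernel='linear', C=10)
khan_linear.fit(Khan['xtrain'], Khan['ytrain'])
```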
@@ -532,7 +529,7 @@ confusion_table(khan_linear.predict(Khan['xtrain']),
Khan['ytrain'])
```
We see that there are *no* training
errors. In fact, this is not surprising, because the large number of
variables relative to the number of observations implies that it is
@@ -545,7 +542,7 @@ confusion_table(khan_linear.predict(Khan['xtest']),
Khan['ytest'])
```
We see that using `C=10` yields two test set errors on these data.