fixing whitespace in Rmd so diff of errata is cleaner (#46)
* fixing whitespace in Rmd so diff of errata is cleaner
* reapply kwargs fix
@@ -1,6 +1,3 @@
# Support Vector Machines

<a target="_blank" href="https://colab.research.google.com/github/intro-stat-learning/ISLP_labs/blob/v2.2/Ch09-svm-lab.ipynb">
@@ -31,7 +28,7 @@ from ISLP.svm import plot as plot_svm
from sklearn.metrics import RocCurveDisplay

```

We will use the function `RocCurveDisplay.from_estimator()` to
produce several ROC plots, using a shorthand `roc_curve`.
@@ -76,8 +73,8 @@ svm_linear = SVC(C=10, kernel='linear')
svm_linear.fit(X, y)

```
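A fitted `SVC` exposes its *decision function* directly; a self-contained sketch on simulated two-class data (the seed and sample sizes here are arbitrary stand-ins for the lab's `X, y`):

```python
import numpy as np
from sklearn.svm import SVC

# Simulated two-class data in the style of the lab (arbitrary seed).
rng = np.random.default_rng(1)
X = rng.standard_normal((50, 2))
y = np.array([-1]*25 + [1]*25)
X[y == 1] += 1

svm_linear = SVC(C=10, kernel='linear').fit(X, y)

# decision_function returns each point's signed distance to the
# separating hyperplane; positive values map to the second class.
scores = svm_linear.decision_function(X)
```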

The support vector classifier with two features can
be visualized by plotting values of its *decision function*.
We have included a function for this in the `ISLP` package (inspired by a similar
@@ -91,7 +88,7 @@ plot_svm(X,
ax=ax)

```

The decision
boundary between the two classes is linear (because we used the
argument `kernel='linear'`). The support vectors are marked with `+`
@@ -118,8 +115,8 @@ coefficients of the linear decision boundary as follows:
svm_linear.coef_

```
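Besides `coef_`, a fitted linear `SVC` exposes the intercept and the support vectors themselves; a short sketch (simulated data with an arbitrary seed stands in for the lab's `X, y`):

```python
import numpy as np
from sklearn.svm import SVC

# Arbitrary simulated two-class data, just to illustrate the attributes.
rng = np.random.default_rng(1)
X = rng.standard_normal((50, 2))
y = np.array([-1]*25 + [1]*25)
X[y == 1] += 1

svm_linear = SVC(C=10, kernel='linear').fit(X, y)

svm_linear.intercept_   # intercept of the linear decision boundary
svm_linear.support_     # indices of the support vectors
svm_linear.n_support_   # number of support vectors per class
```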
Since the support vector machine is an estimator in `sklearn`, we
can use the usual machinery to tune it.
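The cell that builds the grid search is elided from this hunk; judging from the later `grid.fit(X, y)` and `grid.best_params_` calls, it presumably resembles the following (data setup and the grid of `C` values are assumptions):

```python
import numpy as np
import sklearn.model_selection as skm
from sklearn.svm import SVC

# Simulated data (arbitrary seed) standing in for the lab's X, y.
rng = np.random.default_rng(1)
X = rng.standard_normal((50, 2))
y = np.array([-1]*25 + [1]*25)
X[y == 1] += 1

# Five-fold CV over a range of values of the cost parameter C.
kfold = skm.KFold(5, random_state=0, shuffle=True)
grid = skm.GridSearchCV(SVC(kernel='linear'),
                        {'C': [0.001, 0.01, 0.1, 1, 5, 10, 100]},
                        refit=True, cv=kfold, scoring='accuracy')
grid.fit(X, y)
grid.best_params_
```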
@@ -136,8 +133,8 @@ grid.fit(X, y)
grid.best_params_

```

We can easily access the cross-validation errors for each of these models
in `grid.cv_results_`. This prints out a lot of detail, so we
extract the accuracy results only.
@@ -158,7 +155,7 @@ y_test = np.array([-1]*10+[1]*10)
X_test[y_test==1] += 1

```

Now we predict the class labels of these test observations. Here we
use the best model selected by cross-validation in order to make the
predictions.
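The next hunk calls `best_.predict`; `best_` is presumably the refit best estimator extracted from the grid search, roughly as follows (data generation here is an assumed stand-in):

```python
import numpy as np
import sklearn.model_selection as skm
from sklearn.svm import SVC

# Simulated training data and grid search (arbitrary seed).
rng = np.random.default_rng(1)
X = rng.standard_normal((50, 2))
y = np.array([-1]*25 + [1]*25)
X[y == 1] += 1
grid = skm.GridSearchCV(SVC(kernel='linear'),
                        {'C': [0.001, 0.01, 0.1, 1, 5, 10, 100]},
                        refit=True, cv=5, scoring='accuracy').fit(X, y)

# The model refit on all the data at the best CV parameters:
best_ = grid.best_estimator_

# Test data drawn from the same mechanism, then predicted:
X_test = rng.standard_normal((20, 2))
y_test = np.array([-1]*10 + [1]*10)
X_test[y_test == 1] += 1
y_test_hat = best_.predict(X_test)
```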
@@ -169,7 +166,7 @@ y_test_hat = best_.predict(X_test)
confusion_table(y_test_hat, y_test)

```
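To read an accuracy off such a confusion table, sum the diagonal and divide by the total. A sketch with purely hypothetical counts (the real counts come from `confusion_table()` above) for 20 test observations:

```python
import numpy as np

# Hypothetical 2x2 table (predicted x truth); only the diagonal
# total (14 of 20 correct) matches the text, the split is invented.
table = np.array([[7, 3],
                  [3, 7]])
accuracy = table.trace() / table.sum()  # fraction on the diagonal
```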
Thus, with this value of `C`,
70% of the test
observations are correctly classified. What if we had instead used
@@ -182,7 +179,7 @@ y_test_hat = svm_.predict(X_test)
confusion_table(y_test_hat, y_test)

```

In this case 60% of test observations are correctly classified.
We now consider a situation in which the two classes are linearly
@@ -197,7 +194,7 @@ fig, ax = subplots(figsize=(8,8))
ax.scatter(X[:,0], X[:,1], c=y, cmap=cm.coolwarm);

```

Now the observations are just barely linearly separable.

```{python}
@@ -206,7 +203,7 @@ y_hat = svm_.predict(X)
confusion_table(y_hat, y)

```

We fit the
support vector classifier and plot the resulting hyperplane, using a
very large value of `C` so that no observations are
@@ -232,7 +229,7 @@ y_hat = svm_.predict(X)
confusion_table(y_hat, y)

```

Using `C=0.1`, we again do not misclassify any training observations, but we
also obtain a much wider margin and make use of twelve support
vectors. These jointly define the orientation of the decision boundary, and since there are more of them, it is more stable. It seems possible that this model will perform better on test
@@ -246,7 +243,7 @@ plot_svm(X,
ax=ax)

```

## Support Vector Machine

In order to fit an SVM using a non-linear kernel, we once again use
@@ -269,7 +266,7 @@ X[100:150] -= 2
y = np.array([1]*150+[2]*50)

```

Plotting the data makes it clear that the class boundary is indeed non-linear.

```{python}
@@ -280,8 +277,8 @@ ax.scatter(X[:,0],
cmap=cm.coolwarm);

```

The data is randomly split into training and testing groups. We then
fit the training data using the `SVC()` estimator with a
radial kernel and $\gamma=1$:
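The split itself is elided by the diff; it presumably uses `sklearn.model_selection.train_test_split`, roughly as below (the seed and split sizes are guesses):

```python
import numpy as np
import sklearn.model_selection as skm

# Non-linear two-class data in the style of the surrounding hunks.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 2))
X[:100] += 2
X[100:150] -= 2
y = np.array([1]*150 + [2]*50)

# Split half-and-half into training and test sets.
(X_train, X_test,
 y_train, y_test) = skm.train_test_split(X, y, test_size=0.5,
                                         random_state=0)
```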
@@ -298,7 +295,7 @@ svm_rbf = SVC(kernel="rbf", gamma=1, C=1)
svm_rbf.fit(X_train, y_train)

```

The plot shows that the resulting SVM has a decidedly non-linear
boundary.
@@ -310,7 +307,7 @@ plot_svm(X_train,
ax=ax)

```

We can see from the figure that there are a fair number of training
errors in this SVM fit. If we increase the value of `C`, we
can reduce the number of training errors. However, this comes at the
@@ -327,7 +324,7 @@ plot_svm(X_train,
ax=ax)

```

We can perform cross-validation using `skm.GridSearchCV()` to select the
best choice of $\gamma$ and `C` for an SVM with a radial
kernel:
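The grid construction is elided; given the reported best values `C=1` and `gamma=0.5`, it presumably spans both parameters, for example (the data and the exact grids are assumptions):

```python
import numpy as np
import sklearn.model_selection as skm
from sklearn.svm import SVC

# Simulated training data standing in for the lab's (arbitrary seed).
rng = np.random.default_rng(1)
X_train = rng.standard_normal((100, 2))
X_train[:50] += 2
y_train = np.array([1]*50 + [2]*50)

# Joint five-fold CV over C and gamma for a radial-kernel SVM.
kfold = skm.KFold(5, random_state=0, shuffle=True)
grid = skm.GridSearchCV(SVC(kernel="rbf"),
                        {'C': [0.1, 1, 10, 100, 1000],
                         'gamma': [0.5, 1, 2, 3, 4]},
                        refit=True, cv=kfold, scoring='accuracy')
grid.fit(X_train, y_train)
```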
@@ -346,7 +343,7 @@ grid.fit(X_train, y_train)
grid.best_params_

```

The best choice of parameters under five-fold CV is achieved at `C=1`
and `gamma=0.5`, though several other values also achieve the same
value.
@@ -363,7 +360,7 @@ y_hat_test = best_svm.predict(X_test)
confusion_table(y_hat_test, y_test)

```

With these parameters, 12% of test
observations are misclassified by this SVM.
@@ -423,7 +420,7 @@ roc_curve(svm_flex,
ax=ax);

```

However, these ROC curves are all on the training data. We are really
more interested in the level of prediction accuracy on the test
data. When we compute the ROC curves on the test data, the model with
@@ -439,7 +436,7 @@ roc_curve(svm_flex,
fig;

```

Let’s look at our tuned SVM.

```{python}
@@ -458,7 +455,7 @@ for (X_, y_, c, name) in zip(
color=c)

```

## SVM with Multiple Classes

If the response is a factor containing more than two levels, then the
@@ -477,7 +474,7 @@ fig, ax = subplots(figsize=(8,8))
ax.scatter(X[:,0], X[:,1], c=y, cmap=cm.coolwarm);

```

We now fit an SVM to the data:

```{python}
@@ -513,7 +510,7 @@ Khan = load_data('Khan')
Khan['xtrain'].shape, Khan['xtest'].shape

```

This data set consists of expression measurements for 2,308
genes. The training and test sets consist of 63 and 20
observations, respectively.
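These dimensions matter for what follows: with far more features than observations, separating hyperplanes are easy to find. A simulated stand-in of the same shape as the Khan training set illustrates this (this is random data, not the real expression measurements):

```python
import numpy as np
from sklearn.svm import SVC

# Random stand-in with the Khan training dimensions:
# 63 observations, 2,308 features, four tumor classes.
rng = np.random.default_rng(0)
X_train = rng.standard_normal((63, 2308))
y_train = rng.integers(1, 5, 63)

# With p >> n, even arbitrary labels are (essentially always)
# linearly separable, so training accuracy is near perfect.
khan_linear = SVC(kernel='linear', C=10).fit(X_train, y_train)
train_acc = (khan_linear.predict(X_train) == y_train).mean()
```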
@@ -532,7 +529,7 @@ confusion_table(khan_linear.predict(Khan['xtrain']),
Khan['ytrain'])

```

We see that there are *no* training
errors. In fact, this is not surprising, because the large number of
variables relative to the number of observations implies that it is
@@ -545,7 +542,7 @@ confusion_table(khan_linear.predict(Khan['xtest']),
Khan['ytest'])

```

We see that using `C=10` yields two test set errors on these data.