fixing whitespace in Rmd so diff of errata is cleaner (#46)

* fixing whitespace in Rmd so diff of errata is cleaner

* reapply kwargs fix
This commit is contained in:
Jonathan Taylor
2025-04-03 12:25:17 -07:00
parent 7f1103e140
commit 8fa98567ee
12 changed files with 392 additions and 410 deletions


@@ -1,6 +1,3 @@
# Tree-Based Methods
<a target="_blank" href="https://colab.research.google.com/github/intro-stat-learning/ISLP_labs/blob/v2.2/Ch08-baggboost-lab.ipynb">
@@ -38,10 +35,10 @@ from sklearn.ensemble import \
from ISLP.bart import BART
```
## Fitting Classification Trees
We first use classification trees to analyze the `Carseats` data set.
In these data, `Sales` is a continuous variable, and so we begin
@@ -57,7 +54,7 @@ High = np.where(Carseats.Sales > 8,
"No")
```
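The recoding above can be sketched in self-contained form; the values here are made up, not the actual `Carseats` data:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the Carseats Sales column (unit sales in thousands)
sales = pd.Series([9.5, 11.2, 3.0, 7.4, 8.0])

# Recode the continuous response into a binary factor, as in the lab:
# "Yes" when Sales exceeds 8, "No" otherwise
high = np.where(sales > 8, "Yes", "No")
print(high)
```

Note that the comparison is strict, so an observation with `Sales` exactly 8 is coded `"No"`.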
We now use `DecisionTreeClassifier()` to fit a classification tree in
order to predict `High` using all variables but `Sales`.
To do so, we must form a model matrix as we did when fitting regression
@@ -85,8 +82,8 @@ clf = DTC(criterion='entropy',
clf.fit(X, High)
```
In our discussion of qualitative features in Section~\ref{ch3:sec3},
we noted that for a linear regression model such a feature could be
represented by including a matrix of dummy variables (one-hot encoding) in the model
@@ -102,8 +99,8 @@ advantage of this approach; instead it simply treats the one-hot-encoded levels
accuracy_score(High, clf.predict(X))
```
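The dummy-variable (one-hot) encoding discussed above can be sketched with pandas; the `ShelveLoc` levels here are illustrative, not the full `Carseats` column:

```python
import pandas as pd

# A qualitative feature with three levels, as in Carseats' ShelveLoc
df = pd.DataFrame({"ShelveLoc": ["Bad", "Good", "Medium", "Bad"]})

# One-hot encode: one 0/1 indicator column per level
dummies = pd.get_dummies(df["ShelveLoc"])
print(dummies.columns.tolist())
```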
With only the default arguments, the training error rate is
21%.
For classification trees, we can
@@ -121,7 +118,7 @@ resid_dev = np.sum(log_loss(High, clf.predict_proba(X)))
resid_dev
```
This is closely related to the *entropy*, defined in (\ref{Ch8:eq:cross-entropy}).
A small deviance indicates a
tree that provides a good fit to the (training) data.
@@ -153,7 +150,7 @@ print(export_text(clf,
show_weights=True))
```
In order to properly evaluate the performance of a classification tree
on these data, we must estimate the test error rather than simply
computing the training error. We split the observations into a
@@ -256,8 +253,8 @@ confusion = confusion_table(best_.predict(X_test),
confusion
```
Now 72.0% of the test observations are correctly classified, which is slightly worse than the accuracy of the full tree (with 35 leaves). So cross-validation has not helped us much here; it only pruned off 5 leaves, at the cost of a slightly higher error. These results would change if we were to change the random number seeds above; even though cross-validation gives an unbiased approach to model selection, it does have variance.
@@ -275,7 +272,7 @@ feature_names = list(D.columns)
X = np.asarray(D)
```
First, we split the data into training and test sets, and fit the tree
to the training data. Here we use 30% of the data for the test set.
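The 30% split can be sketched with sklearn's `train_test_split`; the arrays below are synthetic stand-ins for the `Boston` design matrix and response:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 100 observations, 12 predictors
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 12))
y = rng.normal(size=100)

# Hold out 30% of the observations for the test set, as in the lab
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)
print(X_train.shape, X_test.shape)
```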
@@ -290,7 +287,7 @@ to the training data. Here we use 30% of the data for the test set.
random_state=0)
```
Having formed our training and test data sets, we fit the regression tree.
```{python}
@@ -302,7 +299,7 @@ plot_tree(reg,
ax=ax);
```
The variable `lstat` measures the percentage of individuals with
lower socioeconomic status. The tree indicates that lower
values of `lstat` correspond to more expensive houses.
@@ -326,7 +323,7 @@ grid = skm.GridSearchCV(reg,
G = grid.fit(X_train, y_train)
```
In keeping with the cross-validation results, we use the pruned tree
to make predictions on the test set.
@@ -335,8 +332,8 @@ best_ = grid.best_estimator_
np.mean((y_test - best_.predict(X_test))**2)
```
In other words, the test set MSE associated with the regression tree
is 28.07. The square root of
the MSE is therefore around
@@ -359,7 +356,7 @@ plot_tree(G.best_estimator_,
## Bagging and Random Forests
Here we apply bagging and random forests to the `Boston` data, using
the `RandomForestRegressor()` from the `sklearn.ensemble` package. Recall
@@ -372,8 +369,8 @@ bag_boston = RF(max_features=X_train.shape[1], random_state=0)
bag_boston.fit(X_train, y_train)
```
The argument `max_features` indicates that all 12 predictors should
be considered for each split of the tree --- in other words, that
bagging should be done. How well does this bagged model perform on
@@ -386,7 +383,7 @@ ax.scatter(y_hat_bag, y_test)
np.mean((y_test - y_hat_bag)**2)
```
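Setting `max_features` to the full number of predictors turns a random forest into bagging; a minimal sketch on synthetic data (not the `Boston` set):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression problem with 12 predictors, signal in the first
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 12))
y = X[:, 0] + rng.normal(scale=0.5, size=200)

# max_features equal to the number of predictors => bagging
bag = RandomForestRegressor(max_features=X.shape[1], random_state=0)
bag.fit(X[:150], y[:150])
mse = np.mean((y[150:] - bag.predict(X[150:])) ** 2)
```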
The test set MSE associated with the bagged regression tree is
14.63, about half that obtained using an optimally pruned single
tree. We could change the number of trees grown from the default of
@@ -417,8 +414,8 @@ y_hat_RF = RF_boston.predict(X_test)
np.mean((y_test - y_hat_RF)**2)
```
The test set MSE is 20.04;
this indicates that random forests did somewhat worse than bagging
in this case. Extracting the `feature_importances_` values from the fitted model, we can view the
@@ -442,7 +439,7 @@ house size (`rm`) are by far the two most important variables.
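Extracting and sorting `feature_importances_` can be sketched as follows; the data and column names here are invented:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic data where the first predictor dominates
rng = np.random.default_rng(0)
cols = [f"x{i}" for i in range(5)]
X = rng.normal(size=(200, 5))
y = 3 * X[:, 0] + X[:, 1] + rng.normal(scale=0.1, size=200)

rf = RandomForestRegressor(random_state=0).fit(X, y)

# Importances sum to 1; sort to see the dominant predictors first
imp = pd.Series(rf.feature_importances_, index=cols).sort_values(ascending=False)
```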
## Boosting
Here we use `GradientBoostingRegressor()` from `sklearn.ensemble`
to fit boosted regression trees to the `Boston` data
@@ -461,7 +458,7 @@ boost_boston = GBR(n_estimators=5000,
boost_boston.fit(X_train, y_train)
```
We can see how the training error decreases with the `train_score_` attribute.
To get an idea of how the test error decreases, we can use the
`staged_predict()` method to get the predicted values along the path.
@@ -484,7 +481,7 @@ ax.plot(plot_idx,
ax.legend();
```
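The staged-prediction idea can be illustrated in isolation: after fitting, `staged_predict()` yields one prediction vector per boosting iteration, from which a test-error path can be computed (synthetic data, not `Boston`):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic nonlinear regression problem
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X[:, 0] ** 2 + rng.normal(scale=0.1, size=200)

gbr = GradientBoostingRegressor(n_estimators=50, random_state=0)
gbr.fit(X[:150], y[:150])

# Test MSE after each of the 50 boosting stages
test_mse = [np.mean((y[150:] - yhat) ** 2)
            for yhat in gbr.staged_predict(X[150:])]
```

The resulting list typically decreases with the stage index, which is what the plot in the lab visualizes.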
We now use the boosted model to predict `medv` on the test set:
```{python}
@@ -492,7 +489,7 @@ y_hat_boost = boost_boston.predict(X_test);
np.mean((y_test - y_hat_boost)**2)
```
The test MSE obtained is 14.48,
similar to the test MSE for bagging. If we want to, we can
perform boosting with a different value of the shrinkage parameter
@@ -510,8 +507,8 @@ y_hat_boost = boost_boston.predict(X_test);
np.mean((y_test - y_hat_boost)**2)
```
In this case, using $\lambda=0.2$ leads to almost the same test MSE
as when using $\lambda=0.001$.
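In sklearn the shrinkage parameter $\lambda$ corresponds to the `learning_rate` argument of `GradientBoostingRegressor()`; a sketch of fitting at two values, with synthetic data standing in for `Boston`:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X[:, 0] + rng.normal(scale=0.1, size=100)

# learning_rate plays the role of the shrinkage parameter lambda
slow = GradientBoostingRegressor(learning_rate=0.001, n_estimators=20,
                                 random_state=0).fit(X, y)
fast = GradientBoostingRegressor(learning_rate=0.2, n_estimators=20,
                                 random_state=0).fit(X, y)
```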
@@ -519,7 +516,7 @@ as when using $\lambda=0.001$.
## Bayesian Additive Regression Trees
In this section we demonstrate a `Python` implementation of BART found in the
`ISLP.bart` package. We fit a model
@@ -532,8 +529,8 @@ bart_boston = BART(random_state=0, burnin=5, ndraw=15)
bart_boston.fit(X_train, y_train)
```
On this data set, with this split into test and training, we see that the test error of BART is similar to that of the random forest.
```{python}
@@ -541,8 +538,8 @@ yhat_test = bart_boston.predict(X_test.astype(np.float32))
np.mean((y_test - yhat_test)**2)
```
We can check how many times each variable appeared in the collection of trees.
This gives a summary similar to the variable importance plot for boosting and random forests.