@@ -836,7 +836,7 @@ A[1:4:2,0:3:2]
 
 
 Why are we able to retrieve a submatrix directly using slices but not using lists?
-Its because they are different `Python` types, and
+It's because they are different `Python` types, and
 are treated differently by `numpy`.
 Slices can be used to extract objects from arbitrary sequences, such as strings, lists, and tuples, while the use of lists for indexing is more limited.
 
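The slices-versus-lists distinction in the hunk above can be checked directly. A minimal sketch, using a hypothetical 4x4 array rather than the lab's actual `A`:

```python
import numpy as np

# A toy 4x4 matrix (hypothetical stand-in for the lab's A).
A = np.arange(16).reshape(4, 4)

# Slices select a rectangular submatrix directly:
sub_slice = A[1:4:2, 0:3:2]   # rows 1 and 3, columns 0 and 2

# Integer lists are paired element-wise instead, so this picks
# the two elements (1,0) and (3,2), not a submatrix:
pairs = A[[1, 3], [0, 2]]

# To get the submatrix with lists, np.ix_ builds an open mesh:
sub_lists = A[np.ix_([1, 3], [0, 2])]
```

`np.ix_` is the usual workaround when row and column index lists should combine as a grid rather than pairwise.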
@@ -889,7 +889,8 @@ A[np.array([0,1,0,1])]
 
 ```
 
-By contrast, `keep_rows` retrieves only the second and fourth rows of `A` --- i.e. the rows for which the Boolean equals `TRUE`.
+By contrast, `keep_rows` retrieves only the second and fourth rows of `A` --- i.e. the rows for which the Boolean equals `True`.
 
+
 ```{python}
 A[keep_rows]
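The Boolean-versus-integer indexing contrast described above can be sketched on a toy array (hypothetical data, not the lab's `A`):

```python
import numpy as np

A = np.arange(16).reshape(4, 4)  # toy stand-in for the lab's A

# A Boolean mask over the rows:
keep_rows = np.array([False, True, False, True])

# Boolean indexing keeps only the rows where the mask is True
# (here the second and fourth rows):
masked = A[keep_rows]

# An integer array of 0s and 1s behaves very differently:
# it selects row 0, row 1, row 0, row 1 in that order.
indexed = A[np.array([0, 1, 0, 1])]
```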
@@ -1152,7 +1153,7 @@ Auto_re.loc[lambda df: (df['year'] > 80) & (df['mpg'] > 30),
 The symbol `&` computes an element-wise *and* operation.
 As another example, suppose that we want to retrieve all `Ford` and `Datsun`
 cars with `displacement` less than 300. We check whether each `name` entry contains either the string `ford` or `datsun` using the `str.contains()` method of the `index` attribute of
-of the dataframe:
+the dataframe:
 
 ```{python}
 Auto_re.loc[lambda df: (df['displacement'] < 300)
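The `loc`-with-lambda pattern in the hunk above can be reproduced on a tiny stand-in frame (hypothetical rows and values, not the real `Auto` data):

```python
import pandas as pd

# A small stand-in for Auto_re, with car names as the index.
Auto_re = pd.DataFrame(
    {'displacement': [307.0, 97.0, 250.0, 119.0],
     'mpg': [18.0, 31.0, 20.0, 27.5]},
    index=['chevrolet chevelle', 'datsun pl510',
           'ford torino', 'datsun 200sx'])

# str.contains() on the index tests each name for a substring;
# | combines the two Boolean arrays element-wise.
result = Auto_re.loc[lambda df: (df['displacement'] < 300)
                     & (df.index.str.contains('ford')
                        | df.index.str.contains('datsun'))]
```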

@@ -102,7 +102,7 @@ matrices (also called design matrices) using the `ModelSpec()` transform from `
 
 We will use the `Boston` housing data set, which is contained in the `ISLP` package. The `Boston` dataset records `medv` (median house value) for $506$ neighborhoods
 around Boston. We will build a regression model to predict `medv` using $13$
-predictors such as `rmvar` (average number of rooms per house),
+predictors such as `rm` (average number of rooms per house),
 `age` (proportion of owner-occupied units built prior to 1940), and `lstat` (percent of
 households with low socioeconomic status). We will use `statsmodels` for this
 task, a `Python` package that implements several commonly used
@@ -252,7 +252,7 @@ We can produce confidence intervals for the predicted values.
 new_predictions.conf_int(alpha=0.05)
 
 ```
-Prediction intervals are computing by setting `obs=True`:
+Prediction intervals are computed by setting `obs=True`:
 
 ```{python}
 new_predictions.conf_int(obs=True, alpha=0.05)
@@ -286,7 +286,7 @@ def abline(ax, b, m):
 ```
 A few things are illustrated above. First we see the syntax for defining a function:
 `def funcname(...)`. The function has arguments `ax, b, m`
-where `ax` is an axis object for an exisiting plot, `b` is the intercept and
+where `ax` is an axis object for an existing plot, `b` is the intercept and
 `m` is the slope of the desired line. Other plotting options can be passed on to
 `ax.plot` by including additional optional arguments as follows:
 
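One way to write an `abline()` consistent with the description above (a sketch; the lab's own definition may differ in detail):

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend for scripting
import matplotlib.pyplot as plt

def abline(ax, b, m, *args, **kwargs):
    """Add a line with intercept b and slope m to the axes ax.

    Extra positional/keyword arguments are forwarded to ax.plot,
    so styles like 'r--' or linewidth=3 pass straight through.
    """
    xlim = ax.get_xlim()
    ylim = [m * xlim[0] + b, m * xlim[1] + b]
    ax.plot(xlim, ylim, *args, **kwargs)

fig, ax = plt.subplots()
ax.set_xlim([0, 10])
abline(ax, 1, 2, 'r--', linewidth=3)  # y = 1 + 2x, dashed red
```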
@@ -539,7 +539,7 @@ and `lstat`.
 
 The function `anova_lm()` can take more than two nested models
 as input, in which case it compares every successive pair of models.
-That also explains why their are `NaN`s in the first row above, since
+That also explains why there are `NaN`s in the first row above, since
 there is no previous model with which to compare the first.
 
 
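The successive-pair behavior of `anova_lm()` described above, including the `NaN` first row, can be seen on simulated data (hypothetical models, not the lab's `lstat` fits):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(1)
df = pd.DataFrame({'x': rng.uniform(0, 10, 200)})
df['y'] = 1 + 2 * df['x'] + 0.3 * df['x']**2 + rng.normal(0, 1, 200)

# Three nested models of increasing polynomial degree.
m1 = smf.ols('y ~ x', data=df).fit()
m2 = smf.ols('y ~ x + I(x**2)', data=df).fit()
m3 = smf.ols('y ~ x + I(x**2) + I(x**3)', data=df).fit()

# Each row compares a model to the one before it; the first
# model has no predecessor, hence the NaN entries in row 0.
table = anova_lm(m1, m2, m3)
```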

@@ -88,7 +88,7 @@ fit is $23.62$.
 
 We can also estimate the validation error for
 higher-degree polynomial regressions. We first provide a function `evalMSE()` that takes a model string as well
-as a training and test set and returns the MSE on the test set.
+as training and test sets and returns the MSE on the test set.
 
 ```{python}
 def evalMSE(terms,
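A simplified sketch of the idea behind `evalMSE()`: the lab's version takes ISLP model terms, while this hypothetical variant takes a polynomial degree and uses plain `sklearn`:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def evalMSE(degree, x_train, y_train, x_test, y_test):
    # Fit a polynomial of the given degree on the training data
    # and return the mean squared error on the test data.
    X_train = np.power.outer(x_train, np.arange(1, degree + 1))
    X_test = np.power.outer(x_test, np.arange(1, degree + 1))
    model = LinearRegression().fit(X_train, y_train)
    return np.mean((y_test - model.predict(X_test))**2)

# Toy data with a quadratic signal.
rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, 200)
y = 1 + x - 0.5 * x**2 + rng.normal(0, 0.3, 200)

mse_lin = evalMSE(1, x[:100], y[:100], x[100:], y[100:])
mse_quad = evalMSE(2, x[:100], y[:100], x[100:], y[100:])
```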
@@ -195,7 +195,7 @@ object with the appropriate `fit()`, `predict()`,
 and `score()` methods, an
 array of features `X` and a response `Y`.
 We also included an additional argument `cv` to `cross_validate()`; specifying an integer
-$K$ results in $K$-fold cross-validation. We have provided a value
+$k$ results in $k$-fold cross-validation. We have provided a value
 corresponding to the total number of observations, which results in
 leave-one-out cross-validation (LOOCV). The `cross_validate()` function produces a dictionary with several components;
 we simply want the cross-validated test score here (MSE), which is estimated to be 24.23.
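The `cv = n` trick for LOOCV described above can be sketched with plain `sklearn` objects rather than the lab's ISLP wrappers (hypothetical simulated data):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_validate

rng = np.random.default_rng(2)
X = rng.uniform(0, 10, (50, 1))
Y = 3 + 2 * X[:, 0] + rng.normal(0, 1, 50)

# cv equal to the number of observations gives leave-one-out CV.
# We score with negative MSE (the default R^2 is undefined on
# single-observation test folds).
cv_results = cross_validate(LinearRegression(), X, Y,
                            cv=X.shape[0],
                            scoring='neg_mean_squared_error')
loocv_mse = -np.mean(cv_results['test_score'])
```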
@@ -243,8 +243,8 @@ np.add.outer(A, B)
 
 ```
 
-In the CV example above, we used $K=n$, but of course we can also use $K<n$. The code is very similar
+In the CV example above, we used $k=n$, but of course we can also use $k<n$. The code is very similar
-to the above (and is significantly faster). Here we use `KFold()` to partition the data into $K=10$ random groups. We use `random_state` to set a random seed and initialize a vector `cv_error` in which we will store the CV errors corresponding to the
+to the above (and is significantly faster). Here we use `KFold()` to partition the data into $k=10$ random groups. We use `random_state` to set a random seed and initialize a vector `cv_error` in which we will store the CV errors corresponding to the
 polynomial fits of degrees one to five.
 
 ```{python}
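The degree-one-to-five loop described above can be sketched with `KFold()` on simulated data (hypothetical; the lab uses the `Auto` data):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_validate

rng = np.random.default_rng(3)
x = rng.uniform(-2, 2, 100)
Y = 1 + x - 0.5 * x**2 + rng.normal(0, 0.5, 100)

# k=10 random folds, with a seed for reproducibility.
cv = KFold(n_splits=10, shuffle=True, random_state=0)

cv_error = np.zeros(5)
for i, d in enumerate(range(1, 6)):
    # Polynomial design matrix of degree d: columns x, x^2, ..., x^d.
    X = np.power.outer(x, np.arange(1, d + 1))
    M_CV = cross_validate(LinearRegression(), X, Y,
                          cv=cv, scoring='neg_mean_squared_error')
    cv_error[i] = -np.mean(M_CV['test_score'])
```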
@@ -264,7 +264,7 @@ cv_error
 ```
 Notice that the computation time is much shorter than that of LOOCV.
 (In principle, the computation time for LOOCV for a least squares
-linear model should be faster than for $K$-fold CV, due to the
+linear model should be faster than for $k$-fold CV, due to the
 availability of the formula~(\ref{Ch5:eq:LOOCVform}) for LOOCV;
 however, the generic `cross_validate()` function does not make
 use of this formula.) We still see little evidence that using cubic
@@ -273,8 +273,9 @@ using a quadratic fit.
 
 
 The `cross_validate()` function is flexible and can take
-different splitting mechanisms as an argument. For instance, one can use the `ShuffleSplit()` funtion to implement
-the validation set approach just as easily as K-fold cross-validation.
+different splitting mechanisms as an argument. For instance, one can use the `ShuffleSplit()`
+function to implement
+the validation set approach just as easily as $k$-fold cross-validation.
 
 ```{python}
 validation = ShuffleSplit(n_splits=1,
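A single `ShuffleSplit()` as the validation set approach, as described above, can be sketched on simulated data (hypothetical; not the lab's actual split):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import ShuffleSplit, cross_validate

rng = np.random.default_rng(4)
X = rng.uniform(0, 10, (100, 1))
Y = 2 + X[:, 0] + rng.normal(0, 1, 100)

# One random train/validation split = the validation set approach.
validation = ShuffleSplit(n_splits=1,
                          test_size=50,
                          random_state=0)
results = cross_validate(LinearRegression(), X, Y,
                         cv=validation,
                         scoring='neg_mean_squared_error')
valid_mse = -results['test_score'][0]
```

Passing a splitter object instead of an integer is what makes `cross_validate()` flexible: the same call works for $k$-fold, shuffle splits, or any other splitting scheme.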
@@ -511,7 +512,7 @@ standard formulas given in
 rely on certain assumptions. For example,
 they depend on the unknown parameter $\sigma^2$, the noise
 variance. We then estimate $\sigma^2$ using the RSS. Now although the
-formula for the standard errors do not rely on the linear model being
+formulas for the standard errors do not rely on the linear model being
 correct, the estimate for $\sigma^2$ does. We see
 {in Figure~\ref{Ch3:polyplot} on page~\pageref{Ch3:polyplot}} that there is
 a non-linear relationship in the data, and so the residuals from a

@@ -334,7 +334,7 @@ The function `fit_path()` returns a list whose values include the fitted coeffic
 path[3]
 
 ```
-In the example above, we see that at the fourth step in the path, we have two nonzero coefficients in `'B'`, corresponding to the value $0.114$ for the penalty parameter `lambda_0`.
+In the example above, we see that at the fourth step in the path, we have two nonzero coefficients in `'B'`, corresponding to the value $0.0114$ for the penalty parameter `lambda_0`.
 We could make predictions using this sequence of fits on a validation set as a function of `lambda_0`, or with more work using cross-validation.
 
 ## Ridge Regression and the Lasso
@@ -913,6 +913,6 @@ ax.set_ylim([50000,250000]);
 ```
 
 CV error is minimized at 12,
-though there is little noticable difference between this point and a much lower number like 2 or 3 components.
+though there is little noticeable difference between this point and a much lower number like 2 or 3 components.
 
 

@@ -223,7 +223,7 @@ grid.fit(X_train, High_train)
 grid.best_score_
 
 ```
-Let’s take a look at the pruned true.
+Let’s take a look at the pruned tree.
 
 ```{python}
 ax = subplots(figsize=(12, 12))[1]
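The grid search over pruning levels that produces `grid.best_score_` above can be sketched with `sklearn`'s cost-complexity pruning on toy data (hypothetical; not the lab's `Carseats` split):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)

# Candidate pruning levels from the cost-complexity path
# (clipped at 0, since the path can return tiny negative values
# due to floating-point error).
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)
alphas = np.clip(path.ccp_alphas, 0, None)

# Cross-validate over ccp_alpha to pick the pruned tree.
grid = GridSearchCV(DecisionTreeClassifier(random_state=0),
                    {'ccp_alpha': alphas},
                    cv=KFold(5, shuffle=True, random_state=1),
                    scoring='accuracy')
grid.fit(X, y)
```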
@@ -509,7 +509,7 @@ np.mean((y_test - y_hat_boost)**2)
 ```
 
 
-In this case, using $\lambda=0.2$ leads to a almost the same test MSE
+In this case, using $\lambda=0.2$ leads to almost the same test MSE
 as when using $\lambda=0.001$.
 
 
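The shrinkage comparison above can be sketched by fitting a boosted model at two learning rates (hypothetical toy data; the relative test MSEs will depend on the data, unlike the near-tie reported in the lab):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(7)
X = rng.uniform(0, 1, (300, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(0, 0.1, 300)
X_train, X_test = X[:200], X[200:]
y_train, y_test = y[:200], y[200:]

test_mse = {}
for lam in (0.001, 0.2):
    boost = GradientBoostingRegressor(n_estimators=500,
                                      learning_rate=lam,  # shrinkage lambda
                                      max_depth=3,
                                      random_state=0)
    boost.fit(X_train, y_train)
    y_hat_boost = boost.predict(X_test)
    test_mse[lam] = np.mean((y_test - y_hat_boost)**2)
```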

@@ -42,7 +42,7 @@ roc_curve = RocCurveDisplay.from_estimator # shorthand
 We now use the `SupportVectorClassifier()` function (abbreviated `SVC()`) from `sklearn` to fit the support vector
 classifier for a given value of the parameter `C`. The
 `C` argument allows us to specify the cost of a violation to
-the margin. When the `cost` argument is small, then the margins
+the margin. When the `C` argument is small, then the margins
 will be wide and many support vectors will be on the margin or will
 violate the margin. When the `C` argument is large, then the
 margins will be narrow and there will be few support vectors on the
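The relationship between `C` and the number of support vectors described above can be observed directly on toy data (hypothetical two-class sample, not the lab's dataset):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(5)
X = rng.normal(size=(50, 2))
y = np.array([-1] * 25 + [1] * 25)
X[y == 1] += 1.5  # shift one class so the classes overlap somewhat

# Small C -> wide margin, many support vectors:
svm_small = SVC(C=0.01, kernel='linear').fit(X, y)
# Large C -> narrow margin, fewer support vectors:
svm_large = SVC(C=100, kernel='linear').fit(X, y)

n_small = svm_small.support_vectors_.shape[0]
n_large = svm_large.support_vectors_.shape[0]
```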

@@ -1137,7 +1137,7 @@ img_preds = resnet_model(imgs)
 Let’s look at the predicted probabilities for each of the top 3 choices. First we compute
 the probabilities by applying the softmax to the logits in `img_preds`. Note that
 we have had to call the `detach()` method on the tensor `img_preds` in order to convert
-it to our a more familiar `ndarray`.
+it to a more familiar `ndarray`.
 
 ```{python}
 img_probs = np.exp(np.asarray(img_preds.detach()))
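The softmax-over-logits step described above (exponentiate, then normalize each row) can be sketched in plain `numpy` with hypothetical logits in place of the ResNet output:

```python
import numpy as np

# Hypothetical logits for 3 images over 5 classes.
logits = np.array([[2.0, 1.0, 0.1, -1.0, 0.5],
                   [0.0, 3.0, 1.0, 0.2, -0.5],
                   [1.5, 1.5, 0.0, 0.0, 0.0]])

# Softmax: subtract the row max for numerical stability,
# exponentiate, then normalize each row to sum to 1.
exp = np.exp(logits - logits.max(axis=1, keepdims=True))
probs = exp / exp.sum(axis=1, keepdims=True)

# Top-3 class indices per image, highest probability first.
top3 = np.argsort(probs, axis=1)[:, ::-1][:, :3]
```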

@@ -10,7 +10,7 @@
 In this lab we demonstrate PCA and clustering on several datasets.
 As in other labs, we import some of our libraries at this top
 level. This makes the code more readable, as scanning the first few
-lines of the notebook tell us what libraries are used in this
+lines of the notebook tells us what libraries are used in this
 notebook.
 
 ```{python}
@@ -837,7 +837,7 @@ ax.axhline(140, c='r', linewidth=4);
 
 ```
 
-The `axhline()` function draws a horizontal line line on top of any
+The `axhline()` function draws a horizontal line on top of any
 existing set of axes. The argument `140` plots a horizontal
 line at height 140 on the dendrogram; this is a height that
 results in four distinct clusters. It is easy to verify that the
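The claim that cutting the dendrogram at this height yields four clusters can be verified programmatically; a sketch with `scipy` on hypothetical well-separated data (not the lab's dataset, where the cut height is 140):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cut_tree

rng = np.random.default_rng(6)
# Four well-separated blobs of 20 points each.
centers = np.array([[0, 0], [10, 0], [0, 10], [10, 10]])
X = np.vstack([c + rng.normal(0, 0.5, (20, 2)) for c in centers])

HC = linkage(X, method='complete')

# Cutting the dendrogram at a height that leaves four branches is
# equivalent to asking cut_tree for four clusters.
labels = cut_tree(HC, n_clusters=4).ravel()
```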