diff --git a/Ch02-statlearn-lab.Rmd b/Ch02-statlearn-lab.Rmd
index 8cf2c9c..3f7804b 100644
--- a/Ch02-statlearn-lab.Rmd
+++ b/Ch02-statlearn-lab.Rmd
@@ -836,7 +836,7 @@ A[1:4:2,0:3:2]
 
 Why are we able to retrieve a submatrix directly using slices but not
 using lists?
-Its because they are different `Python` types, and
+It's because they are different `Python` types, and
 are treated differently by `numpy`.
 Slices can be used to extract objects from arbitrary sequences, such as strings, lists, and
 tuples, while the use of lists for indexing is more limited.
@@ -889,7 +889,8 @@ A[np.array([0,1,0,1])]
 ```
 
-By contrast, `keep_rows` retrieves only the second and fourth rows of `A` --- i.e. the rows for which the Boolean equals `TRUE`.
+By contrast, `keep_rows` retrieves only the second and fourth rows of `A` --- i.e. the rows for which the Boolean equals `True`.
+
 ```{python}
 A[keep_rows]
@@ -1152,7 +1153,7 @@ Auto_re.loc[lambda df: (df['year'] > 80) & (df['mpg'] > 30),
 The symbol `&` computes an element-wise *and* operation.
 As another example, suppose that we want to retrieve all `Ford` and `Datsun`
 cars with `displacement` less than 300. We check whether each `name` entry
 contains either the string `ford` or `datsun` using the `str.contains()`
 method of the `index` attribute of
-of the dataframe:
+the dataframe:
 ```{python}
 Auto_re.loc[lambda df: (df['displacement'] < 300)
diff --git a/Ch03-linreg-lab.Rmd b/Ch03-linreg-lab.Rmd
index 3c71a23..ec0bf89 100644
--- a/Ch03-linreg-lab.Rmd
+++ b/Ch03-linreg-lab.Rmd
@@ -102,7 +102,7 @@ matrices (also called design matrices) using the `ModelSpec()` transform from `
 We will use the `Boston` housing data set, which is contained in the `ISLP` package. The `Boston` dataset records `medv` (median house value) for $506$ neighborhoods around Boston.
 We will build a regression model to predict `medv` using $13$
-predictors such as `rmvar` (average number of rooms per house),
+predictors such as `rm` (average number of rooms per house),
 `age` (proportion of owner-occupied units built prior to 1940), and
 `lstat` (percent of households with low socioeconomic status).
 We will use `statsmodels` for this task, a `Python` package that implements several commonly used
@@ -252,7 +252,7 @@ We can produce confidence intervals for the predicted values.
 new_predictions.conf_int(alpha=0.05)
 ```
 
-Prediction intervals are computing by setting `obs=True`:
+Prediction intervals are computed by setting `obs=True`:
 ```{python}
 new_predictions.conf_int(obs=True, alpha=0.05)
 ```
@@ -286,7 +286,7 @@ def abline(ax, b, m):
 ```
 A few things are illustrated above. First we see the syntax for defining a function:
 `def funcname(...)`. The function has arguments `ax, b, m`
-where `ax` is an axis object for an exisiting plot, `b` is the intercept and
+where `ax` is an axis object for an existing plot, `b` is the intercept and
 `m` is the slope of the desired line. Other plotting options can be passed
 on to `ax.plot` by including additional optional arguments as follows:
@@ -539,7 +539,7 @@ and `lstat`.
 The function `anova_lm()` can take more than two nested models as input,
 in which case it compares every successive pair of models.
-That also explains why their are `NaN`s in the first row above, since
+That also explains why there are `NaN`s in the first row above, since
 there is no previous model with which to compare the first.
diff --git a/Ch05-resample-lab.Rmd b/Ch05-resample-lab.Rmd
index c1c2bef..21820c1 100644
--- a/Ch05-resample-lab.Rmd
+++ b/Ch05-resample-lab.Rmd
@@ -88,7 +88,7 @@ fit is $23.62$.
 We can also estimate the validation error for higher-degree polynomial regressions.
 We first provide a function `evalMSE()` that takes a model string as well
-as a training and test set and returns the MSE on the test set.
+as training and test sets and returns the MSE on the test set.
 
 ```{python}
 def evalMSE(terms,
@@ -195,7 +195,7 @@ object with the appropriate `fit()`, `predict()`, and `score()` methods,
 an array of features `X` and a response `Y`.
 We also included an additional argument `cv` to `cross_validate()`; specifying an integer
-$K$ results in $K$-fold cross-validation. We have provided a value
+$k$ results in $k$-fold cross-validation. We have provided a value
 corresponding to the total number of observations, which results in
 leave-one-out cross-validation (LOOCV).
 The `cross_validate()` function produces a dictionary with several components; we simply want the cross-validated test score here (MSE), which is estimated to be 24.23.
@@ -243,8 +243,8 @@ np.add.outer(A, B)
 ```
 
-In the CV example above, we used $K=n$, but of course we can also use $K
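The final hunk's context shows `np.add.outer(A, B)`, which builds the matrix of all pairwise sums `A[i] + B[j]`. A minimal sketch of its behavior; the arrays `A` and `B` below are illustrative stand-ins, not necessarily the ones the lab defines:

```python
import numpy as np

# Illustrative arrays (assumed; the lab defines its own A and B).
A = np.array([3, 5, 9])
B = np.array([2, 4])

# np.add.outer(A, B) returns a len(A) x len(B) matrix whose (i, j)
# entry is A[i] + B[j].
print(np.add.outer(A, B))
# -> [[ 5  7]
#     [ 7  9]
#     [11 13]]
```

Because every `numpy` ufunc exposes an `outer` method, the same pattern works with `np.subtract.outer`, `np.multiply.outer`, and so on.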