Fix refs again (#76)

* Ch2->Ch02

* fixed latex refs again, somehow crept back in

* fixed the page refs, formats synced

* unsynced

* executed notebook besides 10

* warnings for lasso

* allow saving of output in notebooks

* Ch10 executed
This commit is contained in:
Jonathan Taylor
2026-02-04 17:40:52 -08:00
committed by GitHub
parent 3d9af7c4b0
commit 6bf6160a3d
25 changed files with 21872 additions and 3191 deletions


@@ -164,7 +164,7 @@ for k in range(pcaUS.components_.shape[1]):
USArrests.columns[k])
```
-Notice that this figure is a reflection of Figure~\ref{Ch10:fig:USArrests:obs} through the $y$-axis. Recall that the
+Notice that this figure is a reflection of Figure 12.1 through the $y$-axis. Recall that the
principal components are only unique up to a sign change, so we can
reproduce that figure by flipping the
signs of the second set of scores and loadings.
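The sign indeterminacy can be verified directly: negating a component's scores together with its loadings leaves every reconstructed entry unchanged. A minimal sketch of this check (not part of the lab; uses `sklearn`'s `PCA` on random data rather than `USArrests`):

```{python}
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4))
X = X - X.mean(0)                 # center, as PCA assumes

pca = PCA(n_components=2).fit(X)
scores = pca.transform(X)         # 50 x 2 matrix of scores
loadings = pca.components_        # 2 x 4 matrix of loadings

# Flip the signs of the second component's scores and loadings:
scores2, loadings2 = scores.copy(), loadings.copy()
scores2[:, 1] *= -1
loadings2[1] *= -1

# The rank-2 reconstruction is identical either way.
print(np.allclose(scores @ loadings, scores2 @ loadings2))
```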
@@ -241,7 +241,7 @@ ax.set_xticks(ticks)
fig
```
-The result is similar to that shown in Figure~\ref{Ch10:fig:USArrests:scree}. Note
+The result is similar to that shown in Figure 12.3. Note
that the method `cumsum()` computes the cumulative sum of
the elements of a numeric vector. For instance:
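The example itself is cut off by the diff; any small array illustrates the behavior (the array below is illustrative, not necessarily the one used in the lab):

```{python}
import numpy as np

a = np.array([1, 2, 8, -3])
# Each output element is the sum of all input elements up to that position.
print(np.cumsum(a))  # [ 1  3 11  8]
```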
@@ -253,15 +253,15 @@ np.cumsum(a)
## Matrix Completion
We now re-create the analysis carried out on the `USArrests` data in
-Section~\ref{Ch10:sec:princ-comp-with}.
+Section 12.3.
-We saw in Section~\ref{ch10:sec2.2} that solving the optimization
-problem~(\ref{Ch10:eq:mc2}) on a centered data matrix $\bf X$ is
+We saw in Section 12.2.2 that solving the optimization
+problem~(12.6) on a centered data matrix $\bf X$ is
equivalent to computing the first $M$ principal
components of the data. We use our scaled
and centered `USArrests` data as $\bf X$ below. The *singular value decomposition*
(SVD) is a general algorithm for solving
-(\ref{Ch10:eq:mc2}).
+(12.6).
```{python}
X = USArrests_scaled
@@ -319,9 +319,9 @@ Here the array `r_idx`
contains 20 integers from 0 to 49; this represents the states (rows of `X`) that are selected to contain missing values. And `c_idx` contains
20 integers from 0 to 3, representing the features (columns in `X`) that contain the missing values for each of the selected states.
-We now write some code to implement Algorithm~\ref{Ch10:alg:hardimpute}.
+We now write some code to implement Algorithm 12.1.
We first write a function that takes in a matrix, and returns an approximation to the matrix using the `svd()` function.
-This will be needed in Step 2 of Algorithm~\ref{Ch10:alg:hardimpute}.
+This will be needed in Step 2 of Algorithm 12.1.
```{python}
def low_rank(X, M=1):
@@ -330,7 +330,7 @@ def low_rank(X, M=1):
return L.dot(V[:M])
```
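The body of `low_rank()` is truncated by the diff, which shows only its signature and return statement. A plausible reconstruction consistent with those two lines, computing a rank-`M` approximation via the SVD (the intermediate names `U`, `D`, `V` beyond what is shown are assumptions):

```{python}
import numpy as np

def low_rank(X, M=1):
    # SVD: X = U @ diag(D) @ V, with full_matrices=False so shapes conform
    U, D, V = np.linalg.svd(X, full_matrices=False)
    # Keep the leading M singular values and vectors
    L = U[:, :M] * D[None, :M]
    return L.dot(V[:M])
```

By construction the result has rank at most `M`, and it equals `X` whenever `M` is at least the rank of `X`.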
-To conduct Step 1 of the algorithm, we initialize `Xhat` --- this is $\tilde{\bf X}$ in Algorithm~\ref{Ch10:alg:hardimpute} --- by replacing
+To conduct Step 1 of the algorithm, we initialize `Xhat` --- this is $\tilde{\bf X}$ in Algorithm 12.1 --- by replacing
the missing values with the column means of the non-missing entries. These are stored in
`Xbar` below after running `np.nanmean()` over the row axis.
We make a copy so that when we assign values to `Xhat` below we do not also overwrite the
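The corresponding code does not appear in this hunk; the initialization step described above can be sketched as follows (the names `Xna` and `ismiss` follow the surrounding text; the small matrix is illustrative only):

```{python}
import numpy as np

# Illustrative matrix with two missing entries
Xna = np.array([[1.0, 2.0],
                [np.nan, 4.0],
                [5.0, np.nan]])
ismiss = np.isnan(Xna)

# Column means over the non-missing entries (np.nanmean over the row axis)
Xbar = np.nanmean(Xna, axis=0)

# Step 1: copy first, so assigning to Xhat does not overwrite Xna,
# then fill each missing entry with its column's mean
Xhat = Xna.copy()
Xhat[ismiss] = Xbar[np.where(ismiss)[1]]
print(Xhat)
```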
@@ -360,11 +360,11 @@ a given element is `True` if the corresponding matrix element is missing. The no
because it allows us to access both the missing and non-missing entries. We store the mean of the squared non-missing elements in `mss0`.
We store the mean squared error of the non-missing elements of the old version of `Xhat` in `mssold` (which currently
agrees with `mss0`). We plan to store the mean squared error of the non-missing elements of the current version of `Xhat` in `mss`, and will then
-iterate Step 2 of Algorithm~\ref{Ch10:alg:hardimpute} until the *relative error*, defined as
+iterate Step 2 of Algorithm 12.1 until the *relative error*, defined as
`(mssold - mss) / mss0`, falls below `thresh = 1e-7`.
-{Algorithm~\ref{Ch10:alg:hardimpute} tells us to iterate Step 2 until \eqref{Ch10:eq:mc6} is no longer decreasing. Determining whether \eqref{Ch10:eq:mc6} is decreasing requires us only to keep track of `mssold - mss`. However, in practice, we keep track of `(mssold - mss) / mss0` instead: this makes it so that the number of iterations required for Algorithm~\ref{Ch10:alg:hardimpute} to converge does not depend on whether we multiplied the raw data $\bf X$ by a constant factor.}
+{Algorithm 12.1 tells us to iterate Step 2 until 12.14 is no longer decreasing. Determining whether 12.14 is decreasing requires us only to keep track of `mssold - mss`. However, in practice, we keep track of `(mssold - mss) / mss0` instead: this makes it so that the number of iterations required for Algorithm 12.1 to converge does not depend on whether we multiplied the raw data $\bf X$ by a constant factor.}
-In Step 2(a) of Algorithm~\ref{Ch10:alg:hardimpute}, we approximate `Xhat` using `low_rank()`; we call this `Xapp`. In Step 2(b), we use `Xapp` to update the estimates for elements in `Xhat` that are missing in `Xna`. Finally, in Step 2(c), we compute the relative error. These three steps are contained in the following `while` loop:
+In Step 2(a) of Algorithm 12.1, we approximate `Xhat` using `low_rank()`; we call this `Xapp`. In Step 2(b), we use `Xapp` to update the estimates for elements in `Xhat` that are missing in `Xna`. Finally, in Step 2(c), we compute the relative error. These three steps are contained in the following `while` loop:
```{python}
while rel_err > thresh:
@@ -393,7 +393,7 @@ np.corrcoef(Xapp[ismiss], X[ismiss])[0,1]
```
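The loop body is cut off by the diff. One way to flesh out Steps 2(a)-(c) as described above, shown end-to-end on a small synthetic rank-one matrix (a sketch, not the lab's code; `low_rank` and the bookkeeping variables are reconstructed from the surrounding text):

```{python}
import numpy as np

def low_rank(X, M=1):
    # Rank-M approximation via the SVD (assumed form of the lab's helper)
    U, D, V = np.linalg.svd(X, full_matrices=False)
    return (U[:, :M] * D[None, :M]).dot(V[:M])

rng = np.random.default_rng(1)
# Exact rank-1 matrix with three entries removed
X = np.outer(rng.standard_normal(20), rng.standard_normal(4))
Xna = X.copy()
Xna[[1, 5, 9], [0, 2, 3]] = np.nan
ismiss = np.isnan(Xna)

# Step 1: initialize missing entries with column means
Xhat = Xna.copy()
Xhat[ismiss] = np.nanmean(Xna, axis=0)[np.where(ismiss)[1]]

thresh = 1e-7
mss0 = np.mean(Xna[~ismiss] ** 2)   # mean squared non-missing elements
mssold = mss0                       # currently agrees with mss0
rel_err = 1.0
while rel_err > thresh:
    # Step 2(a): low-rank approximation of the completed matrix
    Xapp = low_rank(Xhat, M=1)
    # Step 2(b): update only the entries that were missing
    Xhat[ismiss] = Xapp[ismiss]
    # Step 2(c): relative error on the non-missing entries
    mss = np.mean((Xna[~ismiss] - Xapp[~ismiss]) ** 2)
    rel_err = (mssold - mss) / mss0
    mssold = mss

# Imputed values track the held-out truth closely on rank-1 data
print(np.corrcoef(Xapp[ismiss], X[ismiss])[0, 1])
```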
-In this lab, we implemented Algorithm~\ref{Ch10:alg:hardimpute} ourselves for didactic purposes. However, a reader who wishes to apply matrix completion to their data might look to more specialized `Python`{} implementations.
+In this lab, we implemented Algorithm 12.1 ourselves for didactic purposes. However, a reader who wishes to apply matrix completion to their data might look to more specialized `Python`{} implementations.
## Clustering
@@ -464,7 +464,7 @@ We have used the `n_init` argument to run the $K$-means with 20
initial cluster assignments (the default is 10). If a
value of `n_init` greater than one is used, then $K$-means
clustering will be performed using multiple random assignments in
-Step 1 of Algorithm~\ref{Ch10:alg:km}, and the `KMeans()`
+Step 1 of Algorithm 12.2, and the `KMeans()`
function will report only the best results. Here we compare using
`n_init=1` to `n_init=20`.
@@ -480,7 +480,7 @@ kmeans1.inertia_, kmeans20.inertia_
```
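The effect of `n_init` can be reproduced on synthetic data: the best of 20 random starts can never have a larger within-cluster sum of squares than a single start. A sketch (not the lab's `NCI60` or `USArrests` data; the blob layout and seeds are arbitrary):

```{python}
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two well-separated Gaussian blobs
X = np.vstack([rng.standard_normal((50, 2)),
               rng.standard_normal((50, 2)) + 4])

kmeans1 = KMeans(n_clusters=3, n_init=1, random_state=2).fit(X)
kmeans20 = KMeans(n_clusters=3, n_init=20, random_state=2).fit(X)

# inertia_ is the objective being minimized; more starts can only help
print(kmeans1.inertia_, kmeans20.inertia_)
```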
Note that `kmeans.inertia_` is the total within-cluster sum
of squares, which we seek to minimize by performing $K$-means
-clustering \eqref{Ch10:eq:kmeans}.
+clustering 12.17.
We *strongly* recommend always running $K$-means clustering with
a large value of `n_init`, such as 20 or 50, since otherwise an
@@ -846,7 +846,7 @@ results in four distinct clusters. It is easy to verify that the
resulting clusters are the same as the ones we obtained in
`comp_cut`.
-We claimed earlier in Section~\ref{Ch10:subsec:hc} that
+We claimed earlier in Section 12.4.2 that
$K$-means clustering and hierarchical clustering with the dendrogram
cut to obtain the same number of clusters can yield very different
results. How do these `NCI60` hierarchical clustering results compare