Fix refs again (#76)

* Ch2->Ch02

* fixed latex refs again, somehow crept back in

* fixed the page refs, formats synced

* unsynced

* executed notebook besides 10

* warnings for lasso

* allow saving of output in notebooks

* Ch10 executed
This commit is contained in:
Jonathan Taylor
2026-02-04 17:40:52 -08:00
committed by GitHub
parent 3d9af7c4b0
commit 6bf6160a3d
25 changed files with 21872 additions and 3191 deletions


@@ -164,7 +164,7 @@ for k in range(pcaUS.components_.shape[1]):
USArrests.columns[k])
```
-Notice that this figure is a reflection of Figure~\ref{Ch10:fig:USArrests:obs} through the $y$-axis. Recall that the
+Notice that this figure is a reflection of Figure 12.1 through the $y$-axis. Recall that the
principal components are only unique up to a sign change, so we can
reproduce that figure by flipping the
signs of the second set of scores and loadings.
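The sign indeterminacy can be verified directly: negating a component's scores together with its loadings leaves every reconstructed entry unchanged. A minimal sketch of this check (not part of the lab; uses `sklearn`'s `PCA` on random data rather than `USArrests`):

```{python}
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4))
X = X - X.mean(0)                 # center, as PCA assumes

pca = PCA(n_components=2).fit(X)
scores = pca.transform(X)         # 50 x 2 matrix of scores
loadings = pca.components_        # 2 x 4 matrix of loadings

# Flip the signs of the second component's scores and loadings:
scores2, loadings2 = scores.copy(), loadings.copy()
scores2[:, 1] *= -1
loadings2[1] *= -1

# The rank-2 reconstruction is identical either way.
print(np.allclose(scores @ loadings, scores2 @ loadings2))
```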
@@ -241,7 +241,7 @@ ax.set_xticks(ticks)
fig
```
-The result is similar to that shown in Figure~\ref{Ch10:fig:USArrests:scree}. Note
+The result is similar to that shown in Figure 12.3. Note
that the method `cumsum()` computes the cumulative sum of
the elements of a numeric vector. For instance:
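The example itself is cut off by the diff; any small array illustrates the behavior (the array below is illustrative, not necessarily the one used in the lab):

```{python}
import numpy as np

a = np.array([1, 2, 8, -3])
# Each output element is the sum of all input elements up to that position.
print(np.cumsum(a))  # [ 1  3 11  8]
```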
@@ -253,15 +253,15 @@ np.cumsum(a)
## Matrix Completion
We now re-create the analysis carried out on the `USArrests` data in
-Section~\ref{Ch10:sec:princ-comp-with}.
+Section 12.3.
-We saw in Section~\ref{ch10:sec2.2} that solving the optimization
-problem~(\ref{Ch10:eq:mc2}) on a centered data matrix $\bf X$ is
+We saw in Section 12.2.2 that solving the optimization
+problem~(12.6) on a centered data matrix $\bf X$ is
equivalent to computing the first $M$ principal
components of the data. We use our scaled
and centered `USArrests` data as $\bf X$ below. The *singular value decomposition*
(SVD) is a general algorithm for solving
-(\ref{Ch10:eq:mc2}).
+(12.6).
```{python}
X = USArrests_scaled
@@ -319,9 +319,9 @@ Here the array `r_idx`
contains 20 integers from 0 to 49; this represents the states (rows of `X`) that are selected to contain missing values. And `c_idx` contains
20 integers from 0 to 3, representing the features (columns in `X`) that contain the missing values for each of the selected states.
-We now write some code to implement Algorithm~\ref{Ch10:alg:hardimpute}.
+We now write some code to implement Algorithm 12.1.
We first write a function that takes in a matrix, and returns an approximation to the matrix using the `svd()` function.
-This will be needed in Step 2 of Algorithm~\ref{Ch10:alg:hardimpute}.
+This will be needed in Step 2 of Algorithm 12.1.
```{python}
def low_rank(X, M=1):
@@ -330,7 +330,7 @@ def low_rank(X, M=1):
return L.dot(V[:M])
```
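The body of `low_rank()` is truncated by the diff, which shows only its signature and return statement. A plausible reconstruction consistent with those two lines, computing a rank-`M` approximation via the SVD (the intermediate names `U`, `D`, `V` beyond what is shown are assumptions):

```{python}
import numpy as np

def low_rank(X, M=1):
    # SVD: X = U @ diag(D) @ V, with full_matrices=False so shapes conform
    U, D, V = np.linalg.svd(X, full_matrices=False)
    # Keep the leading M singular values and vectors
    L = U[:, :M] * D[None, :M]
    return L.dot(V[:M])
```

By construction the result has rank at most `M`, and it equals `X` whenever `M` is at least the rank of `X`.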
-To conduct Step 1 of the algorithm, we initialize `Xhat` --- this is $\tilde{\bf X}$ in Algorithm~\ref{Ch10:alg:hardimpute} --- by replacing
+To conduct Step 1 of the algorithm, we initialize `Xhat` --- this is $\tilde{\bf X}$ in Algorithm 12.1 --- by replacing
the missing values with the column means of the non-missing entries. These are stored in
`Xbar` below after running `np.nanmean()` over the row axis.
We make a copy so that when we assign values to `Xhat` below we do not also overwrite the
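The corresponding code does not appear in this hunk; the initialization step described above can be sketched as follows (the names `Xna` and `ismiss` follow the surrounding text; the small matrix is illustrative only):

```{python}
import numpy as np

# Illustrative matrix with two missing entries
Xna = np.array([[1.0, 2.0],
                [np.nan, 4.0],
                [5.0, np.nan]])
ismiss = np.isnan(Xna)

# Column means over the non-missing entries (np.nanmean over the row axis)
Xbar = np.nanmean(Xna, axis=0)

# Step 1: copy first, so assigning to Xhat does not overwrite Xna,
# then fill each missing entry with its column's mean
Xhat = Xna.copy()
Xhat[ismiss] = Xbar[np.where(ismiss)[1]]
print(Xhat)
```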
@@ -360,11 +360,11 @@ a given element is `True` if the corresponding matrix element is missing. The no
because it allows us to access both the missing and non-missing entries. We store the mean of the squared non-missing elements in `mss0`.
We store the mean squared error of the non-missing elements of the old version of `Xhat` in `mssold` (which currently
agrees with `mss0`). We plan to store the mean squared error of the non-missing elements of the current version of `Xhat` in `mss`, and will then
-iterate Step 2 of Algorithm~\ref{Ch10:alg:hardimpute} until the *relative error*, defined as
+iterate Step 2 of Algorithm 12.1 until the *relative error*, defined as
`(mssold - mss) / mss0`, falls below `thresh = 1e-7`.
-{Algorithm~\ref{Ch10:alg:hardimpute} tells us to iterate Step 2 until \eqref{Ch10:eq:mc6} is no longer decreasing. Determining whether \eqref{Ch10:eq:mc6} is decreasing requires us only to keep track of `mssold - mss`. However, in practice, we keep track of `(mssold - mss) / mss0` instead: this makes it so that the number of iterations required for Algorithm~\ref{Ch10:alg:hardimpute} to converge does not depend on whether we multiplied the raw data $\bf X$ by a constant factor.}
+{Algorithm 12.1 tells us to iterate Step 2 until 12.14 is no longer decreasing. Determining whether 12.14 is decreasing requires us only to keep track of `mssold - mss`. However, in practice, we keep track of `(mssold - mss) / mss0` instead: this makes it so that the number of iterations required for Algorithm 12.1 to converge does not depend on whether we multiplied the raw data $\bf X$ by a constant factor.}
-In Step 2(a) of Algorithm~\ref{Ch10:alg:hardimpute}, we approximate `Xhat` using `low_rank()`; we call this `Xapp`. In Step 2(b), we use `Xapp` to update the estimates for elements in `Xhat` that are missing in `Xna`. Finally, in Step 2(c), we compute the relative error. These three steps are contained in the following `while` loop:
+In Step 2(a) of Algorithm 12.1, we approximate `Xhat` using `low_rank()`; we call this `Xapp`. In Step 2(b), we use `Xapp` to update the estimates for elements in `Xhat` that are missing in `Xna`. Finally, in Step 2(c), we compute the relative error. These three steps are contained in the following `while` loop:
```{python}
while rel_err > thresh:
@@ -393,7 +393,7 @@ np.corrcoef(Xapp[ismiss], X[ismiss])[0,1]
```
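The loop body is cut off by the diff. One way to flesh out Steps 2(a)-(c) as described above, shown end-to-end on a small synthetic rank-one matrix (a sketch, not the lab's code; `low_rank` and the bookkeeping variables are reconstructed from the surrounding text):

```{python}
import numpy as np

def low_rank(X, M=1):
    # Rank-M approximation via the SVD (assumed form of the lab's helper)
    U, D, V = np.linalg.svd(X, full_matrices=False)
    return (U[:, :M] * D[None, :M]).dot(V[:M])

rng = np.random.default_rng(1)
# Exact rank-1 matrix with three entries removed
X = np.outer(rng.standard_normal(20), rng.standard_normal(4))
Xna = X.copy()
Xna[[1, 5, 9], [0, 2, 3]] = np.nan
ismiss = np.isnan(Xna)

# Step 1: initialize missing entries with column means
Xhat = Xna.copy()
Xhat[ismiss] = np.nanmean(Xna, axis=0)[np.where(ismiss)[1]]

thresh = 1e-7
mss0 = np.mean(Xna[~ismiss] ** 2)   # mean squared non-missing elements
mssold = mss0                       # currently agrees with mss0
rel_err = 1.0
while rel_err > thresh:
    # Step 2(a): low-rank approximation of the completed matrix
    Xapp = low_rank(Xhat, M=1)
    # Step 2(b): update only the entries that were missing
    Xhat[ismiss] = Xapp[ismiss]
    # Step 2(c): relative error on the non-missing entries
    mss = np.mean((Xna[~ismiss] - Xapp[~ismiss]) ** 2)
    rel_err = (mssold - mss) / mss0
    mssold = mss

# Imputed values track the held-out truth closely on rank-1 data
print(np.corrcoef(Xapp[ismiss], X[ismiss])[0, 1])
```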
-In this lab, we implemented Algorithm~\ref{Ch10:alg:hardimpute} ourselves for didactic purposes. However, a reader who wishes to apply matrix completion to their data might look to more specialized `Python`{} implementations.
+In this lab, we implemented Algorithm 12.1 ourselves for didactic purposes. However, a reader who wishes to apply matrix completion to their data might look to more specialized `Python`{} implementations.
## Clustering
@@ -464,7 +464,7 @@ We have used the `n_init` argument to run the $K$-means with 20
initial cluster assignments (the default is 10). If a
value of `n_init` greater than one is used, then $K$-means
clustering will be performed using multiple random assignments in
-Step 1 of Algorithm~\ref{Ch10:alg:km}, and the `KMeans()`
+Step 1 of Algorithm 12.2, and the `KMeans()`
function will report only the best results. Here we compare using
`n_init=1` to `n_init=20`.
@@ -480,7 +480,7 @@ kmeans1.inertia_, kmeans20.inertia_
```
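The effect of `n_init` can be reproduced on synthetic data: the best of 20 random starts can never have a larger within-cluster sum of squares than a single start. A sketch (not the lab's `NCI60` or `USArrests` data; the blob layout and seeds are arbitrary):

```{python}
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two well-separated Gaussian blobs
X = np.vstack([rng.standard_normal((50, 2)),
               rng.standard_normal((50, 2)) + 4])

kmeans1 = KMeans(n_clusters=3, n_init=1, random_state=2).fit(X)
kmeans20 = KMeans(n_clusters=3, n_init=20, random_state=2).fit(X)

# inertia_ is the objective being minimized; more starts can only help
print(kmeans1.inertia_, kmeans20.inertia_)
```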
Note that `kmeans.inertia_` is the total within-cluster sum
of squares, which we seek to minimize by performing $K$-means
-clustering \eqref{Ch10:eq:kmeans}.
+clustering 12.17.
We *strongly* recommend always running $K$-means clustering with
a large value of `n_init`, such as 20 or 50, since otherwise an
@@ -846,7 +846,7 @@ results in four distinct clusters. It is easy to verify that the
resulting clusters are the same as the ones we obtained in
`comp_cut`.
-We claimed earlier in Section~\ref{Ch10:subsec:hc} that
+We claimed earlier in Section 12.4.2 that
$K$-means clustering and hierarchical clustering with the dendrogram
cut to obtain the same number of clusters can yield very different
results. How do these `NCI60` hierarchical clustering results compare