Fix refs again (#76)
* Ch2->Ch02
* fixed latex refs again, somehow crept back in
* fixed the page refs, formats synced
* unsynced
* executed notebook besides 10
* warnings for lasso
* allow saving of output in notebooks
* Ch10 executed
@@ -164,7 +164,7 @@ for k in range(pcaUS.components_.shape[1]):
                   USArrests.columns[k])
 
 ```
-Notice that this figure is a reflection of Figure~\ref{Ch10:fig:USArrests:obs} through the $y$-axis. Recall that the
+Notice that this figure is a reflection of Figure 12.1 through the $y$-axis. Recall that the
 principal components are only unique up to a sign change, so we can
 reproduce that figure by flipping the
 signs of the second set of scores and loadings.
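The sign indeterminacy mentioned in the hunk above can be sketched in pure NumPy (synthetic centered data stands in for the scaled `USArrests` matrix; the variable names are illustrative, not the lab's):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4))
X = X - X.mean(0)                     # column-center, as PCA requires

U, D, Vt = np.linalg.svd(X, full_matrices=False)
scores = U * D                        # principal component scores
loadings = Vt                         # rows are the loading vectors

# Flip the sign of the second component's scores *and* loadings:
# the reconstruction scores @ loadings is unchanged, since each
# component is only determined up to a sign.
scores2, loadings2 = scores.copy(), loadings.copy()
scores2[:, 1] *= -1
loadings2[1] *= -1

print(np.allclose(scores @ loadings, scores2 @ loadings2))  # True
```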
@@ -241,7 +241,7 @@ ax.set_xticks(ticks)
 fig
 
 ```
-The result is similar to that shown in Figure~\ref{Ch10:fig:USArrests:scree}. Note
+The result is similar to that shown in Figure 12.3. Note
 that the method `cumsum()` computes the cumulative sum of
 the elements of a numeric vector. For instance:
 
@@ -253,15 +253,15 @@ np.cumsum(a)
 ## Matrix Completion
 
 We now re-create the analysis carried out on the `USArrests` data in
-Section~\ref{Ch10:sec:princ-comp-with}.
+Section 12.3.
 
-We saw in Section~\ref{ch10:sec2.2} that solving the optimization
-problem~(\ref{Ch10:eq:mc2}) on a centered data matrix $\bf X$ is
+We saw in Section 12.2.2 that solving the optimization
+problem~(12.6) on a centered data matrix $\bf X$ is
 equivalent to computing the first $M$ principal
 components of the data. We use our scaled
 and centered `USArrests` data as $\bf X$ below. The *singular value decomposition*
 (SVD) is a general algorithm for solving
-(\ref{Ch10:eq:mc2}).
+(12.6).
 
 ```{python}
 X = USArrests_scaled
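The claim in the hunk above — that the truncated SVD solves the rank-$M$ least-squares approximation problem — can be checked numerically. This is a minimal sketch with synthetic centered data (not the lab's `USArrests_scaled`):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 4))
X = X - X.mean(0)                     # centered data matrix

# Rank-M truncated SVD: the Eckart–Young theorem says this is the
# best rank-M approximation of X in squared error.
U, D, Vt = np.linalg.svd(X, full_matrices=False)
M = 2
X_M = (U[:, :M] * D[:M]) @ Vt[:M]     # equals scores of first M PCs times loadings

err = ((X - X_M) ** 2).sum()

# Any other rank-M matrix should do no better:
A = rng.standard_normal((50, M)) @ rng.standard_normal((M, 4))
print(err <= ((X - A) ** 2).sum())    # True
```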
@@ -319,9 +319,9 @@ Here the array `r_idx`
 contains 20 integers from 0 to 49; this represents the states (rows of `X`) that are selected to contain missing values. And `c_idx` contains
 20 integers from 0 to 3, representing the features (columns in `X`) that contain the missing values for each of the selected states.
 
-We now write some code to implement Algorithm~\ref{Ch10:alg:hardimpute}.
+We now write some code to implement Algorithm 12.1.
 We first write a function that takes in a matrix, and returns an approximation to the matrix using the `svd()` function.
-This will be needed in Step 2 of Algorithm~\ref{Ch10:alg:hardimpute}.
+This will be needed in Step 2 of Algorithm 12.1.
 
 ```{python}
 def low_rank(X, M=1):
@@ -330,7 +330,7 @@ def low_rank(X, M=1):
     return L.dot(V[:M])
 
 ```
-To conduct Step 1 of the algorithm, we initialize `Xhat` --- this is $\tilde{\bf X}$ in Algorithm~\ref{Ch10:alg:hardimpute} --- by replacing
+To conduct Step 1 of the algorithm, we initialize `Xhat` --- this is $\tilde{\bf X}$ in Algorithm 12.1 --- by replacing
 the missing values with the column means of the non-missing entries. These are stored in
 `Xbar` below after running `np.nanmean()` over the row axis.
 We make a copy so that when we assign values to `Xhat` below we do not also overwrite the
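The mean-imputation initialization described in the hunk above can be sketched as follows. This is a self-contained toy (the data and missing positions are made up, not the lab's `r_idx`/`c_idx`):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((6, 4))
Xna = X.copy()
Xna[[0, 2], [1, 3]] = np.nan          # two hypothetical missing entries

# Column means computed over the non-missing entries only:
Xbar = np.nanmean(Xna, axis=0)

# Step 1: copy, then replace each missing entry with its column mean.
# The copy matters -- assigning into a view would overwrite Xna too.
Xhat = Xna.copy()
ismiss = np.isnan(Xna)
Xhat[ismiss] = Xbar[np.where(ismiss)[1]]

print(np.isnan(Xhat).sum())           # 0: no missing values remain
```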
@@ -360,11 +360,11 @@ a given element is `True` if the corresponding matrix element is missing. The no
 because it allows us to access both the missing and non-missing entries. We store the mean of the squared non-missing elements in `mss0`.
 We store the mean squared error of the non-missing elements of the old version of `Xhat` in `mssold` (which currently
 agrees with `mss0`). We plan to store the mean squared error of the non-missing elements of the current version of `Xhat` in `mss`, and will then
-iterate Step 2 of Algorithm~\ref{Ch10:alg:hardimpute} until the *relative error*, defined as
+iterate Step 2 of Algorithm 12.1 until the *relative error*, defined as
 `(mssold - mss) / mss0`, falls below `thresh = 1e-7`.
-{Algorithm~\ref{Ch10:alg:hardimpute} tells us to iterate Step 2 until \eqref{Ch10:eq:mc6} is no longer decreasing. Determining whether \eqref{Ch10:eq:mc6} is decreasing requires us only to keep track of `mssold - mss`. However, in practice, we keep track of `(mssold - mss) / mss0` instead: this makes it so that the number of iterations required for Algorithm~\ref{Ch10:alg:hardimpute} to converge does not depend on whether we multiplied the raw data $\bf X$ by a constant factor.}
+{Algorithm 12.1 tells us to iterate Step 2 until 12.14 is no longer decreasing. Determining whether 12.14 is decreasing requires us only to keep track of `mssold - mss`. However, in practice, we keep track of `(mssold - mss) / mss0` instead: this makes it so that the number of iterations required for Algorithm 12.1 to converge does not depend on whether we multiplied the raw data $\bf X$ by a constant factor.}
 
-In Step 2(a) of Algorithm~\ref{Ch10:alg:hardimpute}, we approximate `Xhat` using `low_rank()`; we call this `Xapp`. In Step 2(b), we use `Xapp` to update the estimates for elements in `Xhat` that are missing in `Xna`. Finally, in Step 2(c), we compute the relative error. These three steps are contained in the following `while` loop:
+In Step 2(a) of Algorithm 12.1, we approximate `Xhat` using `low_rank()`; we call this `Xapp`. In Step 2(b), we use `Xapp` to update the estimates for elements in `Xhat` that are missing in `Xna`. Finally, in Step 2(c), we compute the relative error. These three steps are contained in the following `while` loop:
 
 ```{python}
 while rel_err > thresh:
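The hunk above only shows the first line of the lab's `while` loop. An end-to-end sketch of the hard-impute iteration it describes — Steps 1, 2(a)–(c), with a rank-1 SVD approximation standing in for the lab's `low_rank()` and synthetic data standing in for `USArrests` — might look like this:

```python
import numpy as np

def low_rank(X, M=1):
    # Rank-M SVD approximation, in the spirit of the lab's helper.
    U, D, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :M] * D[:M]) @ Vt[:M]

rng = np.random.default_rng(3)
X = rng.standard_normal((20, 4))
Xna = X.copy()
Xna[np.arange(8), np.arange(8) % 4] = np.nan   # hypothetical missing entries
ismiss = np.isnan(Xna)

# Step 1: initialize with column means of the non-missing entries.
Xhat = Xna.copy()
Xbar = np.nanmean(Xna, axis=0)
Xhat[ismiss] = Xbar[np.where(ismiss)[1]]

thresh = 1e-7
rel_err = 1.0
mss0 = np.mean(Xhat[~ismiss] ** 2)             # scale for the relative error
mssold = mss0
while rel_err > thresh:
    Xapp = low_rank(Xhat, M=1)                 # Step 2(a): low-rank approximation
    Xhat[ismiss] = Xapp[ismiss]                # Step 2(b): update missing entries only
    mss = np.mean((Xna[~ismiss] - Xapp[~ismiss]) ** 2)
    rel_err = (mssold - mss) / mss0            # Step 2(c): relative error
    mssold = mss

print(rel_err <= thresh)                       # True: the loop has converged
```

Because the objective is non-increasing over iterations, `(mssold - mss)` stays non-negative and the loop terminates.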
@@ -393,7 +393,7 @@ np.corrcoef(Xapp[ismiss], X[ismiss])[0,1]
 ```
 
 
-In this lab, we implemented Algorithm~\ref{Ch10:alg:hardimpute} ourselves for didactic purposes. However, a reader who wishes to apply matrix completion to their data might look to more specialized `Python`{} implementations.
+In this lab, we implemented Algorithm 12.1 ourselves for didactic purposes. However, a reader who wishes to apply matrix completion to their data might look to more specialized `Python`{} implementations.
 
 
 ## Clustering
@@ -464,7 +464,7 @@ We have used the `n_init` argument to run the $K$-means with 20
 initial cluster assignments (the default is 10). If a
 value of `n_init` greater than one is used, then $K$-means
 clustering will be performed using multiple random assignments in
-Step 1 of Algorithm~\ref{Ch10:alg:km}, and the `KMeans()`
+Step 1 of Algorithm 12.2, and the `KMeans()`
 function will report only the best results. Here we compare using
 `n_init=1` to `n_init=20`.
 
@@ -480,7 +480,7 @@ kmeans1.inertia_, kmeans20.inertia_
 ```
 Note that `kmeans.inertia_` is the total within-cluster sum
 of squares, which we seek to minimize by performing $K$-means
-clustering \eqref{Ch10:eq:kmeans}.
+clustering 12.17.
 
 We *strongly* recommend always running $K$-means clustering with
 a large value of `n_init`, such as 20 or 50, since otherwise an
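The effect of multiple random starts described in the hunks above can be illustrated without `sklearn`. A minimal pure-NumPy Lloyd's-algorithm sketch (all names and data here are made up for illustration), where the best of `n_init` runs is kept, exactly as `KMeans()` does:

```python
import numpy as np

def kmeans(X, K, n_init=1, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns the lowest-inertia result of n_init runs."""
    rng = np.random.default_rng(seed)
    best_inertia, best_labels = np.inf, None
    for _ in range(n_init):
        centers = X[rng.choice(len(X), K, replace=False)]  # random initialization
        for _ in range(n_iter):
            # Assign each point to its nearest center, then recompute centers.
            d = ((X[:, None] - centers[None]) ** 2).sum(-1)
            labels = d.argmin(1)
            for k in range(K):
                if (labels == k).any():
                    centers[k] = X[labels == k].mean(0)
        inertia = ((X - centers[labels]) ** 2).sum()       # within-cluster SS
        if inertia < best_inertia:
            best_inertia, best_labels = inertia, labels
    return best_labels, best_inertia

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(m, 0.5, (30, 2)) for m in (0, 5, 10)])
_, inertia1 = kmeans(X, K=3, n_init=1)
_, inertia20 = kmeans(X, K=3, n_init=20)
print(inertia20 <= inertia1)   # True: more starts can only improve the objective
```

Since the first of the 20 runs reproduces the single run (same seeded generator), the best of 20 can never have higher inertia, which is why a large `n_init` is the safe default.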
@@ -846,7 +846,7 @@ results in four distinct clusters. It is easy to verify that the
 resulting clusters are the same as the ones we obtained in
 `comp_cut`.
 
-We claimed earlier in Section~\ref{Ch10:subsec:hc} that
+We claimed earlier in Section 12.4.2 that
 $K$-means clustering and hierarchical clustering with the dendrogram
 cut to obtain the same number of clusters can yield very different
 results. How do these `NCI60` hierarchical clustering results compare