fixing whitespace in Rmd so diff of errata is cleaner (#46)
* fixing whitespace in Rmd so diff of errata is cleaner
* reapply kwargs fix
@@ -8,7 +8,7 @@
We include our usual imports seen in earlier labs.
@@ -20,7 +20,7 @@ import statsmodels.api as sm
from ISLP import load_data
```

We also collect the new imports
needed for this lab.
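The import cell itself falls outside the diff context. Judging from the functions referenced later in this lab (`ttest_1samp()`, `ttest_rel()`, `ttest_ind()`, `pairwise_tukeyhsd()`, and `multipletests()` abbreviated as `mult_test()`), the new imports are presumably along these lines:

```python
# Inferred from the functions used later in the lab --
# a sketch of the import cell, not the verbatim source.
from scipy.stats import (ttest_1samp,
                         ttest_rel,
                         ttest_ind)
from statsmodels.stats.multicomp import pairwise_tukeyhsd
from statsmodels.stats.multitest import multipletests as mult_test
```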
@@ -52,7 +52,7 @@ true_mean = np.array([0.5]*50 + [0]*50)
X += true_mean[None,:]
```

To begin, we use `ttest_1samp()` from the
`scipy.stats` module to test $H_{0}: \mu_1=0$, the null
hypothesis that the first variable has mean zero.
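As a minimal self-contained illustration of the call (using a freshly simulated matrix rather than the lab's `X`):

```python
import numpy as np
from scipy.stats import ttest_1samp

rng = np.random.default_rng(0)
# 100 observations on 10 variables; every true mean is zero here
X = rng.standard_normal((100, 10))
result = ttest_1samp(X[:, 0], 0)   # test H0: mu_1 = 0
print(result.statistic, result.pvalue)
```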
@@ -62,7 +62,7 @@ result = ttest_1samp(X[:,0], 0)
result.pvalue
```

The $p$-value comes out to 0.931, which is not low enough to
reject the null hypothesis at level $\alpha=0.05$. In this case,
$\mu_1=0.5$, so the null hypothesis is false. Therefore, we have made
@@ -159,7 +159,7 @@ ax.legend()
ax.axhline(0.05, c='k', ls='--');
```
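The plotted curves follow from the closed form $\mathrm{FWER}(\alpha, m) = 1 - (1-\alpha)^m$ for $m$ independent tests. A direct computation (a sketch, with illustrative values of $m$ and $\alpha$):

```python
import numpy as np

# FWER = P(at least one false rejection) = 1 - (1 - alpha)^m
m = np.array([1, 10, 50, 500])
for alpha in [0.05, 0.01, 0.001]:
    fwer = 1 - (1 - alpha)**m
    print(alpha, np.round(fwer, 3))
```

For example, with $m=50$ and $\alpha=0.05$ the FWER is already about 0.92.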
As discussed previously, even for moderate values of $m$ such as $50$,
the FWER exceeds $0.05$ unless $\alpha$ is set to a very low value,
such as $0.001$. Of course, the problem with setting $\alpha$ to such
@@ -181,7 +181,7 @@ for i in range(5):
fund_mini_pvals
```

The $p$-values are low for Managers One and Three, and high for the
other three managers. However, we cannot simply reject $H_{0,1}$ and
$H_{0,3}$, since this would fail to account for the multiple testing
@@ -211,8 +211,8 @@ reject, bonf = mult_test(fund_mini_pvals, method = "bonferroni")[:2]
reject
```
The $p$-values `bonf` are simply the `fund_mini_pvals` multiplied by 5 and truncated to be less than
or equal to 1.
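That adjustment is easy to reproduce by hand with `numpy` (a sketch using made-up $p$-values, not the fund data):

```python
import numpy as np

pvals = np.array([0.01, 0.60, 0.02, 0.75, 0.30])  # hypothetical p-values
m = len(pvals)
bonf_manual = np.minimum(pvals * m, 1)   # Bonferroni: multiply by m, cap at 1
reject = bonf_manual <= 0.05             # FWER controlled at 0.05
print(bonf_manual, reject)
```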
@@ -220,7 +220,7 @@ or equal to 1.
bonf, np.minimum(fund_mini_pvals * 5, 1)
```

Therefore, using Bonferroni’s method, we are able to reject the null hypothesis only for Manager
One while controlling FWER at $0.05$.
@@ -232,8 +232,8 @@ hypotheses for Managers One and Three at a FWER of $0.05$.
mult_test(fund_mini_pvals, method = "holm", alpha=0.05)[:2]
```

As discussed previously, Manager One seems to perform particularly
well, whereas Manager Two has poor performance.
@@ -242,8 +242,8 @@ well, whereas Manager Two has poor performance.
fund_mini.mean()
```

Is there evidence of a meaningful difference in performance between
these two managers? We can check this by performing a paired $t$-test using the `ttest_rel()` function
from `scipy.stats`:
@@ -253,7 +253,7 @@ ttest_rel(fund_mini['Manager1'],
fund_mini['Manager2']).pvalue
```

The test results in a $p$-value of 0.038,
suggesting a statistically significant difference.
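A paired $t$-test is equivalent to a one-sample test on the within-pair differences, which is easy to verify (a sketch with simulated returns, not the fund data):

```python
import numpy as np
from scipy.stats import ttest_rel, ttest_1samp

rng = np.random.default_rng(0)
a = rng.standard_normal(50) + 0.3   # hypothetical monthly returns, manager A
b = rng.standard_normal(50)         # hypothetical monthly returns, manager B

paired = ttest_rel(a, b).pvalue
one_sample = ttest_1samp(a - b, 0).pvalue
print(paired, one_sample)           # the two p-values agree
```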
@@ -278,8 +278,8 @@ tukey = pairwise_tukeyhsd(returns, managers)
print(tukey.summary())
```

The `pairwise_tukeyhsd()` function provides confidence intervals
for the difference between each pair of managers (`lower` and
`upper`), as well as a $p$-value. All of these quantities have
@@ -309,7 +309,7 @@ for i, manager in enumerate(Fund.columns):
fund_pvalues[i] = ttest_1samp(Fund[manager], 0).pvalue
```

There are far too many managers to consider trying to control the FWER.
Instead, we focus on controlling the FDR: that is, the expected fraction of rejected null hypotheses that are actually false positives.
The `multipletests()` function (abbreviated `mult_test()`) can be used to carry out the Benjamini--Hochberg procedure.
@@ -319,7 +319,7 @@ fund_qvalues = mult_test(fund_pvalues, method = "fdr_bh")[1]
fund_qvalues[:10]
```
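Under the hood, the `fdr_bh` adjustment sorts the $p$-values, scales the $i$-th smallest by $m/i$, and enforces monotonicity with a running minimum from the top. A hand-rolled sketch (the helper `bh_qvalues()` is ours for illustration, not part of the lab):

```python
import numpy as np

def bh_qvalues(pvals):
    """Benjamini--Hochberg adjusted p-values (q-values), a sketch."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)        # p_(i) * m / i
    # running minimum from the largest p-value downward
    q_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(q_sorted, 1)
    return q

print(bh_qvalues([0.01, 0.04, 0.03, 0.5]))
```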
The *q-values* output by the
Benjamini--Hochberg procedure can be interpreted as the smallest FDR
threshold at which we would reject a particular null hypothesis. For
@@ -346,8 +346,8 @@ null hypotheses!
(fund_pvalues <= 0.1 / 2000).sum()
```

Figure~\ref{Ch12:fig:BonferroniBenjamini} displays the ordered
$p$-values, $p_{(1)} \leq p_{(2)} \leq \cdots \leq p_{(2000)}$, for
the `Fund` dataset, as well as the threshold for rejection by the
@@ -376,7 +376,7 @@ else:
sorted_set_ = []
```

We now reproduce the middle panel of Figure~\ref{Ch12:fig:BonferroniBenjamini}.

```{python}
@@ -391,7 +391,7 @@ ax.scatter(sorted_set_+1, sorted_[sorted_set_], c='r', s=20)
ax.axline((0, 0), (1,q/m), c='k', ls='--', linewidth=3);
```

## A Re-Sampling Approach
Here, we implement the re-sampling approach to hypothesis testing
@@ -407,8 +407,8 @@ D['Y'] = pd.concat([Khan['ytrain'], Khan['ytest']])
D['Y'].value_counts()
```

There are four classes of cancer. For each gene, we compare the mean
expression in the second class (rhabdomyosarcoma) to the mean
expression in the fourth class (Burkitt’s lymphoma). Performing a
@@ -428,8 +428,8 @@ observedT, pvalue = ttest_ind(D2[gene_11],
observedT, pvalue
```

However, this $p$-value relies on the assumption that under the null
hypothesis of no difference between the two groups, the test statistic
follows a $t$-distribution with $29+25-2=52$ degrees of freedom.
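Rather than relying on that distributional assumption, a null distribution can be built by repeatedly shuffling the group labels and recomputing the statistic. A generic sketch on simulated data (not the `Khan` expression values):

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
group1 = rng.standard_normal(29) + 0.8   # hypothetical expression values
group2 = rng.standard_normal(25)

observed = ttest_ind(group1, group2).statistic
pooled = np.concatenate([group1, group2])
B, n1 = 1000, len(group1)
Tnull = np.empty(B)
for b in range(B):
    perm = rng.permutation(pooled)                    # shuffle group labels
    Tnull[b] = ttest_ind(perm[:n1], perm[n1:]).statistic

# fraction of null statistics at least as extreme as the observed one
pval = (np.abs(Tnull) >= np.abs(observed)).mean()
print(pval)
```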
@@ -457,8 +457,8 @@ for b in range(B):
(np.abs(Tnull) >= np.abs(observedT)).mean()
```

This fraction, 0.0398,
is our re-sampling-based $p$-value.
It is almost identical to the $p$-value of 0.0412 obtained using the theoretical null distribution.
@@ -514,7 +514,7 @@ for j in range(m):
Tnull_vals[j,b] = ttest_.statistic
```

Next, we compute the number of rejected null hypotheses $R$, the
estimated number of false positives $\widehat{V}$, and the estimated
FDR, for a range of threshold values $c$ in
@@ -532,7 +532,7 @@ for j in range(m):
FDRs[j] = V / R
```
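The loop just shown computes the plug-in estimate $\widehat{\mathrm{FDR}}(c) = \widehat{V}/R$ at each cutoff $c$. In generic form, with hypothetical simulated statistics standing in for the lab's `T_vals` and `Tnull_vals`:

```python
import numpy as np

rng = np.random.default_rng(0)
m, B = 100, 500
T_vals = np.concatenate([rng.standard_normal(90),        # null genes
                         rng.standard_normal(10) + 4])   # genes with signal
Tnull_vals = rng.standard_normal((m, B))                 # permutation nulls

cutoffs = np.sort(np.abs(T_vals))
FDRs = np.empty(m)
for j, c in enumerate(cutoffs):
    R = max((np.abs(T_vals) >= c).sum(), 1)     # rejections at cutoff c
    V = (np.abs(Tnull_vals) >= c).sum() / B     # estimated false positives
    FDRs[j] = V / R
print(FDRs.min(), FDRs.max())
```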
Now, for any given FDR, we can find the genes that will be
rejected. For example, with FDR controlled at 0.1, we reject 15 of the
100 null hypotheses. On average, we would expect about one or two of
@@ -548,7 +548,7 @@ the genes whose estimated FDR is less than 0.1.
sorted(idx[np.abs(T_vals) >= cutoffs[FDRs < 0.1].min()])
```

At an FDR threshold of 0.2, more genes are selected, at the cost of having a higher expected
proportion of false discoveries.
@@ -556,7 +556,7 @@ proportion of false discoveries.
sorted(idx[np.abs(T_vals) >= cutoffs[FDRs < 0.2].min()])
```

The next line generates Figure~\ref{fig:labfdr}, which is similar
to Figure~\ref{Ch12:fig-plugin-fdr},
except that it is based on only a subset of the genes.