Multiple Hypothesis Testing
1. Motivation
It is important to control for false discoveries when multiple hypotheses are tested. Under the Neyman-Pearson formulation, each individual test uses a decision rule with false positive rate (FPR) at most $\alpha$ (e.g. $\alpha = 0.05$). However, if $m$ independent tests are each performed at level $\alpha$, the probability of at least one false discovery can be as high as $\min(m\alpha, 1)$. Multiple hypothesis testing correction adjusts the significance level(s) to control errors related to false discoveries. Some of the material presented is based on UC Berkeley's Data 102 course.
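To see how quickly this error compounds, a quick numerical check (plain Python, no dependencies) compares the exact probability $1 - (1-\alpha)^m$ for independent tests against the union-bound cap $\min(m\alpha, 1)$:

```python
# Probability of at least one false positive among m independent alpha-level
# tests, next to the union-bound cap min(m * alpha, 1).
alpha = 0.05
for m in [1, 10, 20, 100]:
    p_any_false = 1 - (1 - alpha) ** m   # exact under independence
    bound = min(m * alpha, 1.0)          # union bound, holds without independence
    print(f"m={m:3d}  P(at least one FP)={p_any_false:.3f}  bound={bound:.2f}")
```

Already at $m = 20$ the chance of at least one false discovery is roughly 64%.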
2. P-values
Consider $H_0: \theta = \theta_0$ versus $H_1: \theta = \theta_1$. Let $P_{\theta_0}(x)$ be the distribution of the data $X \in \mathbb{R}^p$ under the null, and let $S = \{X^{(i)}\}_{i=1}^m$ be the observed dataset. Additionally, denote by $S_0$ an unobserved dataset drawn from $P_{\theta_0}(x)$.
If the statistic $T(S_0)$ has tail cumulative distribution function (CDF) $F(t) = P_{\theta_0}(T(S_0) > t)$, then the p-value is defined as the random variable $P = F(T(S))$. A graphical illustration of the density of $T$ (short for $T(S)$) is shown below.
An important fact about the p-value $P$ is that it has a $\mathrm{Unif}(0,1)$ distribution under the null. A random variable has a $\mathrm{Unif}(0,1)$ distribution if and only if it has CDF $F(p) = p$ for $p \in [0,1]$. We now show $P$ has CDF $F(p) = p$:
$$P_{\theta_0}(P \le p) = P_{\theta_0}(F(T) \le p) = P_{\theta_0}(T > F^{-1}(p)) = F(F^{-1}(p)) = p,$$
where the first equality is by definition of $P$. For the second equality, recall that $F(\cdot)$ is one-to-one and, as the diagram above shows, decreasing in $T$; hence $F(T) \le p$ holds exactly when $T > F^{-1}(p)$. The third equality is by definition of $F(\cdot)$.
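This uniformity can also be checked empirically. The sketch below (assuming, for illustration, a one-sided z-test, so the tail CDF is the standard normal survival function) simulates statistics under the null and compares the resulting p-values to $\mathrm{Unif}(0,1)$ with a Kolmogorov-Smirnov test:

```python
import numpy as np
from scipy.stats import norm, kstest

rng = np.random.default_rng(0)
# Draw test statistics under the null H0: T ~ N(0, 1)
t_null = rng.standard_normal(10000)
# The p-value is the tail CDF (survival function) evaluated at the statistic
pvals = norm.sf(t_null)
# Under H0, the p-values should be indistinguishable from Unif(0, 1)
stat, p_ks = kstest(pvals, "uniform")
print(f"KS statistic = {stat:.4f}, KS p-value = {p_ks:.3f}")
```

A small KS statistic (and a non-small KS p-value) indicates the simulated null p-values are consistent with the uniform distribution.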
3. Bonferroni Correction
Let $V$ be the number of false positives. The probability of at least one false discovery, $P(V > 0)$, among $m$ tests (not necessarily independent) is called the family-wise error rate (FWER). The Bonferroni correction adjusts the significance level to $\alpha/m$, which controls the FWER at level at most $\alpha$. If there are $m_0 \le m$ true null hypotheses, then
$$\mathrm{FWER} = P\left(\bigcup_{i=1}^{m_0} \left\{P_i \le \frac{\alpha}{m}\right\}\right) \le \sum_{i=1}^{m_0} P\left(P_i \le \frac{\alpha}{m}\right) = \frac{m_0 \alpha}{m} \le \frac{m\alpha}{m} = \alpha,$$
where the first inequality is the union bound (Boole's inequality). In practice, the observed p-value $p_i$ is adjusted according to
$$p_i^{\mathrm{adj}} = \min\{m p_i, 1\}$$
for $i = 1, \dots, m$. Then the $i$-th null hypothesis is rejected if $p_i^{\mathrm{adj}} \le \alpha$. Let us simulate 10 p-values from $\mathrm{Unif}(0, 0.3)$ and implement Bonferroni-corrected p-values.
from statsmodels.stats.multitest import multipletests
import numpy as np

# simulation
n = 10
pvals = np.random.uniform(0, 0.3, size = n)

# implementation of Bonferroni correction
def bonferroni(pvals):
    m = len(pvals)
    # adjusted p-value: min(m * p, 1), returned in sorted order
    return sorted([round(m * p, 7) if m * p <= 1 else 1 for p in pvals])

pAdj1 = bonferroni(pvals)
_, pAdj2, _, _ = multipletests(pvals, method = "bonferroni", returnsorted = True)
print(pAdj1)
print(pAdj2)
[0.0677386, 0.1100421, 0.5649019, 0.7202574, 0.7341066, 1, 1, 1, 1, 1]
[0.06773857 0.11004206 0.56490188 0.72025737 0.7341066 1.
1. 1. 1. 1. ]
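As a sanity check on the FWER bound, the simulation below (an illustrative sketch with an assumed setup of $m = 20$ independent, truly null hypotheses) estimates the probability of at least one false discovery with and without the Bonferroni-corrected threshold:

```python
import numpy as np

rng = np.random.default_rng(1)
m, alpha, n_trials = 20, 0.05, 5000
# Each row is one experiment of m null p-values (all hypotheses truly null)
pvals = rng.uniform(size=(n_trials, m))
# FWER estimate: fraction of experiments with at least one rejection
fwer_uncorrected = np.mean((pvals <= alpha).any(axis=1))
fwer_bonferroni = np.mean((pvals <= alpha / m).any(axis=1))
print(f"uncorrected FWER ~ {fwer_uncorrected:.3f}")  # near 1 - 0.95^20 ~ 0.64
print(f"Bonferroni  FWER ~ {fwer_bonferroni:.3f}")   # at most alpha = 0.05
```

The uncorrected tests trigger a false discovery in most experiments, while the corrected threshold keeps the estimated FWER at or below $\alpha$.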
4. Benjamini-Hochberg
A major criticism of the Bonferroni correction is that it is too conservative: false positives are avoided at the expense of false negatives. The Benjamini-Hochberg (BH) procedure instead controls the false discovery rate (FDR), trading some false positives for fewer false negatives. The FDR among $m$ tests is defined as
$$\mathrm{FDR} = E\left[\frac{V}{R}\right],$$
where $R$ is the number of rejections among the $m$ tests (with the convention $V/R = 0$ when $R = 0$). The BH procedure allows progressively looser p-value cutoffs as more discoveries are made. This is graphically illustrated below.
The BH procedure is as follows:
- For each independent test, compute the p-value $p_i$. Sort the p-values from smallest to largest: $p_{(1)} < \dots < p_{(m)}$.
- Select $R = \max\left\{i : p_{(i)} < \frac{i\alpha}{m}\right\}$.
- Reject the null hypotheses with p-values $\le p_{(R)}$.
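The three steps above can be sketched directly; the following is a minimal NumPy implementation of the step-up rule (the function name `bh_reject` and the example p-values are for illustration only):

```python
import numpy as np

def bh_reject(pvals, alpha):
    """Return a boolean mask of rejected hypotheses via the BH step-up rule."""
    pvals = np.asarray(pvals)
    m = len(pvals)
    order = np.argsort(pvals)          # indices that sort the p-values
    sorted_p = pvals[order]
    # cutoff check: p_(i) < i * alpha / m for ranks i = 1, ..., m
    below = sorted_p < np.arange(1, m + 1) * alpha / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        R = np.max(np.nonzero(below)[0]) + 1   # largest rank satisfying cutoff
        reject[order[:R]] = True               # reject the R smallest p-values
    return reject

print(bh_reject([0.001, 0.008, 0.039, 0.041, 0.09, 0.7], alpha=0.05))
```

Note that $R$ is the *largest* rank satisfying the cutoff, so a p-value can be rejected even if its own rank fails the inequality, as long as some later rank passes.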
By construction, this procedure rejects exactly R hypotheses, and
$$p_{(R)} < \frac{R\alpha}{m} \;\Rightarrow\; \frac{p_{(R)}\, m}{R} < \alpha.$$
Let $m_0 \le m$ be the number of true null hypotheses, and let $X_i = \mathbf{1}(p_i \le p_{(R)})$ indicate whether hypothesis $i$ is rejected. Since $p_i \sim \mathrm{Unif}(0,1)$ under the null, $X_i \sim \mathrm{Bernoulli}(p_{(R)})$. Under the assumption that the tests are independent, $V = \sum_{i=1}^{m_0} X_i \sim \mathrm{Binomial}(m_0, p_{(R)})$. Then by definition (heuristically treating $R$ and $p_{(R)}$ as fixed),
$$\mathrm{FDR} = E\left(\frac{V}{R}\right) = \frac{1}{R} E[V] = \frac{m_0\, p_{(R)}}{R} \le \frac{m\, p_{(R)}}{R} < \alpha.$$
In practice, the observed p-value $p_i$ is adjusted according to
$$p_i^{\mathrm{adj}} = \min_{j = i, \dots, m}\left(\min\left(\frac{m}{j}\, p_{(j)},\, 1\right)\right)$$
for $i = 1, \dots, m$. The $i$-th null hypothesis is rejected if $p_i^{\mathrm{adj}} \le \alpha$. This results in exactly $R$ rejected null hypotheses. If $i \le R$, then $p_i^{\mathrm{adj}} < \alpha$, because
$$p_i^{\mathrm{adj}} = \min_{j = i, \dots, m}\left(\min\left(\frac{m}{j}\, p_{(j)},\, 1\right)\right) \le \frac{m}{R}\, p_{(R)} < \alpha.$$
The first inequality holds because the minimum is taken over a set that includes $\frac{m}{R} p_{(R)}$, and the second inequality is by construction of $R$. If $i > R$, then $p_i^{\mathrm{adj}} > \alpha$, because $p_{(R)}$ is defined as the last p-value in the sorted list satisfying $p_{(i)} < \frac{i\alpha}{m}$. Let us simulate 10 p-values from $\mathrm{Unif}(0, 0.3)$ and implement BH-corrected p-values.
# simulation
n = 10
pvals = np.random.uniform(0, 0.3, size = n)

# implementation of Benjamini-Hochberg correction
def benjamini_hochberg(pvals):
    m = len(pvals)
    pAdj = [None] * m
    pvals.sort()
    # transform sorted p-values: (m / rank) * p_(i)
    pTransform = [m * pvals[i] / (i + 1) for i in range(m)]
    for i in range(m):
        # adjusted p-value is the minimum of the transform over ranks i..m,
        # capped at 1
        minPTransform = min(pTransform[i:m])
        if minPTransform > 1:
            pAdj[i] = 1
        else:
            pAdj[i] = minPTransform
    return sorted([round(p, 7) for p in pAdj])

pAdj1 = benjamini_hochberg(pvals)
_, pAdj2, _, _ = multipletests(pvals, method = "fdr_bh", returnsorted = True)
print(pAdj1)
print(pAdj2)
[0.1788644, 0.1838926, 0.1940673, 0.1940673, 0.1940673, 0.1940673, 0.2294974, 0.2294974, 0.2294974, 0.2294974]
[0.17886437 0.18389261 0.19406731 0.19406731 0.19406731 0.19406731
0.22949745 0.22949745 0.22949745 0.22949745]
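To close, FDR control can be observed directly. The simulation below (an illustrative sketch with an assumed setup: $m_0 = 80$ true nulls with uniform p-values and 20 alternatives with stochastically small, Beta-distributed p-values) estimates the achieved FDR of the BH procedure:

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(2)
m0, m1, alpha, n_trials = 80, 20, 0.1, 500
fdp = []  # false discovery proportion in each trial
for _ in range(n_trials):
    # true nulls: Unif(0, 1) p-values; alternatives: small Beta(0.1, 1) p-values
    p_null = rng.uniform(size=m0)
    p_alt = rng.beta(0.1, 1.0, size=m1)
    pvals = np.concatenate([p_null, p_alt])
    reject, _, _, _ = multipletests(pvals, alpha=alpha, method="fdr_bh")
    n_reject = reject.sum()
    n_false = reject[:m0].sum()   # rejections among the true nulls
    fdp.append(n_false / max(n_reject, 1))
print(f"estimated FDR ~ {np.mean(fdp):.3f} (target at most {alpha:.2f})")
```

The average false discovery proportion stays below $\alpha$; in fact, for independent tests BH controls the FDR at the tighter level $\frac{m_0}{m}\alpha$.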