1. Motivation

It is important to control for false discoveries when multiple hypotheses are tested. Under the Neyman-Pearson formulation, each hypothesis test involves a decision rule with false positive rate (FPR) less than $\alpha$ (e.g., $\alpha = 0.05$). However, if there are $m$ independent $\alpha$-level tests, the probability of at least one false discovery can be as high as $\min(m\alpha, 1)$. Multiple hypothesis testing correction adjusts the significance level(s) to control errors related to false discoveries. Some of the material presented is based on UC Berkeley's Data 102 course.

2. P-values

Consider $H_0: \theta = \theta_0$ versus $H_1: \theta = \theta_1$. Let $P_{\theta_0}(x)$ be the distribution of the data $X \in \mathbb{R}^p$ under the null, and let $S = \{X^{(i)}\}_{i=1}^m$ be the observed dataset. Additionally, denote by $S_0$ an unobserved dataset drawn from $P_{\theta_0}(x)$.

If the statistic $T(S_0)$ has tail cumulative distribution function (CDF) $F(t) = P_{\theta_0}(T(S_0) > t)$, then the p-value is defined as the random variable $P = F(T(S))$. A graphical illustration of the density of $T$ (short for $T(S)$) is shown below.

(Figure: density of $T$ under the null; the p-value is the tail area beyond the observed statistic.)

An important fact about the p-value $P$ is that it has a $\mathrm{Unif}(0,1)$ distribution under the null. A random variable has a $\mathrm{Unif}(0,1)$ distribution if and only if its CDF is $F(p) = p$ for $p \in [0,1]$. We now show that $P$ has this CDF:

$$P_{\theta_0}(P \le p) = P_{\theta_0}(F(T) \le p) = P_{\theta_0}(T > F^{-1}(p)) = F(F^{-1}(p)) = p,$$

where the first equality is by the definition of $P$. For the second equality, it helps to recall that for the one-to-one function $F(\cdot)$, $F: t \mapsto u$ and $F^{-1}: u \mapsto t$; from the diagram above, $F(T)$ is decreasing in $T$, so $F(T) \le p$ is equivalent to $T > F^{-1}(p)$. The third equality follows from the definition of $F(\cdot)$.
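This uniformity can be checked empirically. The sketch below (an illustration, not from the original notes) uses a one-sided z-test as the statistic $T$, simulates many datasets under $H_0: \theta = 0$, and runs a Kolmogorov-Smirnov test on the resulting p-values:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# simulate 5000 datasets of size 30 under H0: theta = 0 (standard normal data)
n_sim, n = 5000, 30
data = rng.normal(loc=0.0, scale=1.0, size=(n_sim, n))

# one-sided z-statistic T(S) = sqrt(n) * mean(S); tail CDF F(t) = P(Z > t)
T = np.sqrt(n) * data.mean(axis=1)
pvals = stats.norm.sf(T)  # sf is the survival (tail) function, so P = F(T)

# under H0 the p-values should look Unif(0, 1)
ks = stats.kstest(pvals, "uniform")
print(round(float(ks.statistic), 4), round(float(ks.pvalue), 3))
```

A large KS p-value here indicates no evidence against uniformity, consistent with the derivation above.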

3. Bonferroni Correction

Let $V$ be the number of false positives. The probability of at least one false discovery, $V > 0$, among $m$ tests (not necessarily independent) is called the family-wise error rate (FWER). Bonferroni correction adjusts the significance level to $\alpha/m$, which controls the FWER at level at most $\alpha$. If there are $m_0 \le m$ true null hypotheses, then

$$\mathrm{FWER} = P\left(\bigcup_{i=1}^{m_0} \left\{P_i \le \frac{\alpha}{m}\right\}\right) \le \sum_{i=1}^{m_0} P\left(P_i \le \frac{\alpha}{m}\right) = \frac{m_0 \alpha}{m} \le \frac{m\alpha}{m} = \alpha,$$

where the first inequality follows from the union bound (Boole's inequality). In practice, the observed p-value $p_i$ is adjusted according to

$$p_i^{\mathrm{adj}} = \min\{m p_i,\ 1\}$$

for $i = 1, \ldots, m$. The $i$-th null hypothesis is then rejected if $p_i^{\mathrm{adj}} \le \alpha$. Let us simulate 10 p-values from $\mathrm{Unif}(0, 0.3)$ and implement Bonferroni-corrected p-values.

```python
from statsmodels.stats.multitest import multipletests
import numpy as np

# simulation
n = 10
pvals = np.random.uniform(0, 0.3, size=n)

# implementation of Bonferroni correction
def bonferroni(pvals):
    m = len(pvals)
    return sorted([round(m * p, 7) if m * p <= 1 else 1 for p in pvals])

pAdj1 = bonferroni(pvals)
_, pAdj2, _, _ = multipletests(pvals, method="bonferroni", returnsorted=True)

print(pAdj1)
print(pAdj2)
```

```
[0.0677386, 0.1100421, 0.5649019, 0.7202574, 0.7341066, 1, 1, 1, 1, 1]
[0.06773857 0.11004206 0.56490188 0.72025737 0.7341066  1.
 1.         1.         1.         1.        ]
```
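As a sanity check (an addition for illustration, with all $m$ nulls true so every rejection is a false positive), a quick Monte Carlo experiment confirms that testing at level $\alpha/m$ keeps the FWER below $\alpha$, while testing each hypothesis at level $\alpha$ does not:

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, m, n_trials = 0.05, 20, 10_000

# all m null hypotheses are true, so any rejection is a false discovery
pvals = rng.uniform(0, 1, size=(n_trials, m))

fwer_naive = (pvals <= alpha).any(axis=1).mean()      # per-test level alpha
fwer_bonf = (pvals <= alpha / m).any(axis=1).mean()   # Bonferroni level alpha/m

print(round(float(fwer_naive), 3))  # near 1 - (1 - 0.05)^20, about 0.64
print(round(float(fwer_bonf), 3))   # at most about alpha = 0.05
```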

4. Benjamini-Hochberg

A major criticism of Bonferroni correction is that it is too conservative: false positives are avoided at the expense of false negatives. The Benjamini-Hochberg (BH) procedure instead controls the false discovery rate (FDR), which admits more discoveries. The FDR among $m$ tests is defined as

$$\mathrm{FDR} = E\left[\frac{V}{R}\right]$$

where $R$ is the number of rejections among the $m$ tests. The BH procedure adapts the p-value cutoff, allowing looser cutoffs for later (larger) sorted p-values given the discoveries made before them. This is graphically illustrated below.

(Figure: sorted p-values plotted against the BH cutoff line $i\alpha/m$.)

The BH procedure is as follows:

  1. For each independent test, compute the p-value $p_i$. Sort the p-values from smallest to largest: $p_{(1)} < \cdots < p_{(m)}$.
  2. Select $R = \max\left\{i : p_{(i)} < \frac{i\alpha}{m}\right\}$.
  3. Reject the null hypotheses with p-values $\le p_{(R)}$.
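The three steps above can be sketched directly (an illustrative addition; the p-values below are made up):

```python
def bh_rejections(pvals, alpha=0.05):
    """Return the number of BH rejections R and the rejected p-values."""
    m = len(pvals)
    p_sorted = sorted(pvals)                                        # step 1
    below = [i + 1 for i in range(m) if p_sorted[i] < (i + 1) * alpha / m]
    R = max(below, default=0)                                       # step 2
    return R, p_sorted[:R]                                          # step 3

R, rejected = bh_rejections([0.041, 0.001, 0.2, 0.008, 0.039, 0.09])
print(R, rejected)  # 2 [0.001, 0.008]
```

Note that $p_{(3)} = 0.039 \ge \frac{3 \cdot 0.05}{6} = 0.025$, so only the two smallest p-values are rejected.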

By construction, this procedure rejects exactly $R$ hypotheses, and

$$p_{(R)} < \frac{R\alpha}{m} \iff \frac{p_{(R)}\, m}{R} < \alpha.$$

Let $m_0 \le m$ be the number of true null hypotheses, and let $X_i = \mathbb{1}(p_i \le p_{(R)})$ indicate whether hypothesis $i$ is rejected. Since $p_i \sim \mathrm{Unif}(0,1)$ under the null, $X_i \sim \mathrm{Bernoulli}(p_{(R)})$, treating $p_{(R)}$ as fixed. Under the assumption that the tests are independent, $V = \sum_{i=1}^{m_0} X_i \sim \mathrm{Binomial}(m_0, p_{(R)})$. Then by definition (again treating $R$ and $p_{(R)}$ as fixed, so this is a heuristic argument rather than a rigorous proof),

$$\mathrm{FDR} = E\left[\frac{V}{R}\right] = \frac{1}{R} E[V] = \frac{m_0\, p_{(R)}}{R} \le \frac{m\, p_{(R)}}{R} < \alpha.$$

In practice, the observed p-values are adjusted according to

$$p_{(i)}^{\mathrm{adj}} = \min_{j = i, \ldots, m}\left(\min\left(\frac{m}{j}\, p_{(j)},\ 1\right)\right)$$

for $i = 1, \ldots, m$. The $i$-th null hypothesis is rejected if $p_{(i)}^{\mathrm{adj}} \le \alpha$. This results in exactly $R$ rejected null hypotheses. If $i \le R$, then $p_{(i)}^{\mathrm{adj}} < \alpha$, because

$$p_{(i)}^{\mathrm{adj}} = \min_{j = i, \ldots, m}\left(\min\left(\frac{m}{j}\, p_{(j)},\ 1\right)\right) \le \frac{m}{R}\, p_{(R)} < \alpha.$$

The first inequality holds because the minimum is taken over a set that includes $\frac{m}{R}\, p_{(R)}$, and the second inequality is by construction of $R$. If $i > R$, then $p_{(i)}^{\mathrm{adj}} \ge \alpha$, because $R$ is the last index with $p_{(i)} < \frac{i\alpha}{m}$, so $\frac{m}{j}\, p_{(j)} \ge \alpha$ for every $j > R$. Let us simulate 10 p-values from $\mathrm{Unif}(0, 0.3)$ and implement BH-corrected p-values.

```python
# simulation
n = 10
pvals = np.random.uniform(0, 0.3, size=n)

# implementation of Benjamini-Hochberg correction
def benjamini_hochberg(pvals):
    m = len(pvals)
    pvals = sorted(pvals)  # sort a copy instead of mutating the input in place
    pTransform = [m * pvals[i] / (i + 1) for i in range(m)]
    # adjusted p-value: minimum of the transformed values over the tail, capped at 1
    pAdj = [min(min(pTransform[i:]), 1) for i in range(m)]
    return [round(p, 7) for p in pAdj]

pAdj1 = benjamini_hochberg(pvals)
_, pAdj2, _, _ = multipletests(pvals, method="fdr_bh", returnsorted=True)

print(pAdj1)
print(pAdj2)
```

```
[0.1788644, 0.1838926, 0.1940673, 0.1940673, 0.1940673, 0.1940673, 0.2294974, 0.2294974, 0.2294974, 0.2294974]
[0.17886437 0.18389261 0.19406731 0.19406731 0.19406731 0.19406731
 0.22949745 0.22949745 0.22949745 0.22949745]
```
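Finally, a Monte Carlo experiment with a mix of true and false nulls illustrates the trade-off (an addition for illustration; the effect size of 3 and the 40/10 split of true to false nulls are arbitrary choices): BH keeps the estimated FDR below $\alpha$ while still detecting most of the false nulls.

```python
import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(2)
alpha, m, m0, n_trials = 0.1, 50, 40, 2000   # 40 true nulls, 10 false nulls
null_mask = np.arange(m) < m0                # first m0 hypotheses are true nulls

fdp, power = [], []
for _ in range(n_trials):
    # true nulls give Unif(0,1) p-values; false nulls give p-values from a
    # z-statistic centered at 3 (an arbitrary effect size for illustration)
    pvals = np.concatenate([rng.uniform(size=m0),
                            stats.norm.sf(rng.normal(loc=3.0, size=m - m0))])
    reject, _, _, _ = multipletests(pvals, alpha=alpha, method="fdr_bh")
    R = reject.sum()
    V = (reject & null_mask).sum()           # false discoveries
    fdp.append(V / max(R, 1))                # false discovery proportion
    power.append((reject & ~null_mask).sum() / (m - m0))

print(round(float(np.mean(fdp)), 3))    # estimated FDR, below alpha = 0.1
print(round(float(np.mean(power)), 3))  # fraction of false nulls detected
```

Consistent with the derivation above, the estimated FDR comes out near $\frac{m_0}{m}\alpha = 0.08$ rather than at the nominal $\alpha = 0.1$.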