In statistics, Fisher's method,^{[1]}^{[2]} also known as Fisher's combined probability test, is a technique for data fusion or "meta-analysis" (analysis of analyses). It was developed by and named for Ronald Fisher. In its basic form, it is used to combine the results from several independent tests bearing upon the same overall hypothesis (H_{0}).
Fisher's method combines extreme value probabilities from each test, commonly known as "p-values", into one test statistic (X^{2}) using the formula

X^{2} = −2 ∑_{i=1}^{k} ln(p_{i}),
where p_{i} is the p-value for the i^{th} hypothesis test. When the p-values tend to be small, the test statistic X^{2} will be large, which suggests that the null hypotheses are not true for every test.
When all the null hypotheses are true, and the p_{i} (or their corresponding test statistics) are independent, X^{2} has a chi-square distribution with 2k degrees of freedom, where k is the number of tests being combined. This fact can be used to determine the p-value for X^{2}.
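The calculation can be sketched in Python with NumPy and SciPy (the helper name `fisher_combine` is illustrative, not a standard API; SciPy's `scipy.stats.combine_pvalues` performs the same combination with `method='fisher'`):

```python
import numpy as np
from scipy import stats

def fisher_combine(pvalues):
    """Combine independent p-values with Fisher's method.

    Returns the test statistic X^2 = -2 * sum(ln p_i) and the combined
    p-value from the chi-square distribution with 2k degrees of freedom.
    """
    p = np.asarray(pvalues, dtype=float)
    x2 = -2.0 * np.sum(np.log(p))
    k = p.size
    p_combined = stats.chi2.sf(x2, df=2 * k)  # upper-tail probability
    return x2, p_combined

# Example: three studies, each with only modest individual evidence
x2, p_comb = fisher_combine([0.10, 0.05, 0.20])
```

Note how the combined p-value (about 0.032) is smaller than any of the individual p-values: the three pieces of modest evidence reinforce each other.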
The null distribution of X^{2} is a chi-square distribution for the following reason. Under the null hypothesis for test i, the p-value p_{i} follows a uniform distribution on the interval [0,1]. The negative natural logarithm of a uniformly distributed value follows an exponential distribution. Scaling a value that follows an exponential distribution by a factor of two yields a quantity that follows a chi-square distribution with two degrees of freedom. Finally, the sum of k independent chi-square values, each with two degrees of freedom, follows a chi-square distribution with 2k degrees of freedom.
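This chain of distributional facts can be checked numerically. The sketch below simulates p-values under the null and compares the first two moments of X^{2} with those of a chi-square distribution with 2k degrees of freedom (the seed and sample sizes are arbitrary illustration settings):

```python
import numpy as np

rng = np.random.default_rng(0)
k, n_sim = 5, 100_000  # arbitrary illustration settings

# Under the null, each p-value is Uniform(0, 1)
p = rng.uniform(size=(n_sim, k))

# -ln(p_i) is Exponential(1); multiplying by two gives chi-square with
# 2 df; summing k independent such terms gives chi-square with 2k df
x2 = np.sum(-2.0 * np.log(p), axis=1)

# A chi-square variable with 2k df has mean 2k and variance 4k
mean_x2, var_x2 = x2.mean(), x2.var()
```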
Fisher's method is applied to a collection of independent test statistics, typically based on separate studies having the same null hypothesis. The meta-analysis null hypothesis is that all of the separate null hypotheses are true. The meta-analysis alternative hypothesis is that at least one of the separate alternative hypotheses is true.
In some settings, it makes sense to consider the possibility of "heterogeneity," in which the null hypothesis holds in some studies but not in others, or where different alternative hypotheses may hold in different studies. A common reason for heterogeneity is that effect sizes may differ among populations. For example, consider a collection of medical studies looking at the risk of a high glucose diet for developing type II diabetes. Due to genetic or environmental factors, the risk associated with a given level of glucose consumption may be greater in some human populations than in others.
In other settings, rejecting the null hypothesis for one study implies that the alternative hypothesis holds for all studies. For example, consider several experiments designed to test a particular physical law. When there is no heterogeneity, any discrepancies among the results from separate studies or experiments are due to chance, possibly driven by differences in power, rather than reflecting differences in the true states of the populations being investigated.
In the case of a meta-analysis using two-sided tests, it is possible to reject the meta-analysis null hypothesis even when the individual studies show strong effects in differing directions. In this case, we are rejecting the hypothesis that the null hypothesis is true in every study, but this does not imply that there is a uniform alternative hypothesis that holds across all studies. Thus, two-sided meta-analysis is particularly sensitive to heterogeneity in the alternative hypotheses. One-sided meta-analyses can detect heterogeneity in the effect magnitudes, but are insensitive to heterogeneity in the effect directions.
A closely related approach to Fisher's method is based on Z-scores rather than p-values. If we let Z_{i} = F^{−1}(1 − p_{i}), where F is the standard normal cumulative distribution function, then
Z = k^{−1/2} ∑_{i=1}^{k} Z_{i}
is a Z-score for the overall meta-analysis. This Z-score is appropriate for one-sided, right-tailed p-values; minor modifications can be made if two-sided or left-tailed p-values are being analyzed. This method is named for the sociologist Samuel A. Stouffer.
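A minimal sketch of Stouffer's method for right-tailed p-values (the helper name `stouffer_combine` is illustrative; `scipy.stats.combine_pvalues` also supports `method='stouffer'`):

```python
import numpy as np
from scipy import stats

def stouffer_combine(pvalues):
    """Combine one-sided, right-tailed p-values with Stouffer's Z-score method."""
    p = np.asarray(pvalues, dtype=float)
    z = stats.norm.isf(p)                  # Z_i = F^{-1}(1 - p_i)
    z_comb = z.sum() / np.sqrt(p.size)     # standard normal under the null
    return z_comb, stats.norm.sf(z_comb)

# Same three studies as in the Fisher example
z_comb, p_comb = stouffer_combine([0.10, 0.05, 0.20])
```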
Since Fisher's method is based on the average of the −log(p_{i}) values, and the Z-score method is based on the average of the Z_{i} values, the relationship between the two approaches follows from the relationship between z and −log(p) = −log(1 − F(z)). For the normal distribution, these two values are not perfectly linearly related, but they follow a highly linear relationship over the range of Z-values most often observed, from 1 to 5. As a result, the power of the Z-score method is nearly identical to the power of Fisher's method.
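The claimed near-linearity can be quantified with a quick numerical check over that range of Z-values (the 100-point grid is an arbitrary choice):

```python
import numpy as np
from scipy import stats

# Evaluate -log(1 - F(z)) over the commonly observed range of Z-values
z = np.linspace(1.0, 5.0, 100)
neg_log_p = -np.log(stats.norm.sf(z))  # stats.norm.sf(z) = 1 - F(z)

# Pearson correlation between z and -log(p): close to, but below, 1
r = np.corrcoef(z, neg_log_p)[0, 1]
```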
One advantage of the Z-score approach is that it is straightforward to introduce weights. If the i^{th} Z-score is weighted by w_{i}, then the meta-analysis Z-score is

Z = ∑_{i} w_{i} Z_{i} / (∑_{i} w_{i}^{2})^{1/2},
which follows a standard normal distribution under the null hypothesis. While weighted versions of Fisher's statistic can be derived, the null distribution becomes a weighted sum of independent chi-square statistics, which is less convenient to work with.
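A weighted version can be sketched as follows; dividing by the root of the sum of squared weights is what keeps Z standard normal under the null (the helper name is illustrative):

```python
import numpy as np
from scipy import stats

def stouffer_weighted(pvalues, weights):
    """Weighted Stouffer combination of one-sided, right-tailed p-values."""
    p = np.asarray(pvalues, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = stats.norm.isf(p)                              # Z_i = F^{-1}(1 - p_i)
    z_comb = np.dot(w, z) / np.sqrt(np.sum(w ** 2))    # standard normal under the null
    return z_comb, stats.norm.sf(z_comb)

# With equal weights this reduces to the unweighted Z-score method
z_eq, _ = stouffer_weighted([0.10, 0.05, 0.20], [1.0, 1.0, 1.0])
```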
In the case that the tests are not independent, the null distribution of X^{2} is more complicated. Dependence among the p_{i} does not affect the expected value of X^{2}, which continues to be 2k under the null hypothesis. If the covariance matrix of the log_{e} p_{i} values is known, then it is possible to calculate the variance of X^{2}, and from this a normal approximation can be used to form a p-value for X^{2}. Dependence among statistical tests is generally positive, which means that the p-value of X^{2} is too small (anti-conservative) if the dependence is not taken into account. Thus, if Fisher's method for independent tests is applied in a dependent setting, and the p-value is not small enough to reject the null hypothesis, then that conclusion will continue to hold even if the dependence is not properly accounted for. However, if positive dependence is not accounted for, and the meta-analysis p-value is found to be small, the evidence for the alternative hypothesis is generally overstated.
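The anti-conservative behavior under positive dependence can be illustrated by simulation. The sketch below generates positively correlated test statistics under the null via a shared-factor construction (the correlation of 0.5, k, and the simulation size are arbitrary illustration settings) and shows that the naive Fisher cutoff rejects more often than the nominal 5% level:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
k, n_sim, rho = 5, 20_000, 0.5  # arbitrary illustration settings

# Shared-factor construction: each z_i is standard normal with
# pairwise correlation rho, so the null holds marginally in every test
shared = rng.normal(size=(n_sim, 1))
noise = rng.normal(size=(n_sim, k))
z = np.sqrt(rho) * shared + np.sqrt(1 - rho) * noise

p = stats.norm.sf(z)                          # one-sided p-values
x2 = np.sum(-2.0 * np.log(p), axis=1)
reject = stats.chi2.sf(x2, df=2 * k) < 0.05   # naive independence cutoff

rate = reject.mean()  # exceeds the nominal 0.05 level under positive dependence
```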
