Bootstrapping (statistics)


In statistics, bootstrapping is a modern, computer-intensive, general-purpose approach to statistical inference, falling within a broader class of resampling methods.

Bootstrapping is the practice of estimating properties of an estimator (such as its variance) by measuring those properties when sampling from an approximating distribution. One standard choice for an approximating distribution is the empirical distribution of the observed data. In the case where a set of observations can be assumed to be from an independent and identically distributed population, this can be implemented by constructing a number of resamples of the observed dataset (and of equal size to the observed dataset), each of which is obtained by random sampling with replacement from the original dataset.
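As a minimal sketch of this resampling scheme (using hypothetical data and only the standard library), one can repeatedly resample with replacement and recompute a statistic, here to estimate the standard error of the sample mean:

```python
import random

random.seed(0)

# Hypothetical observed sample; any numeric data would do.
data = [2.1, 3.4, 1.9, 4.0, 2.8, 3.7, 2.2, 3.1]
n = len(data)

def bootstrap_resample(data):
    """Draw one resample of the same size, with replacement."""
    return [random.choice(data) for _ in range(len(data))]

# Estimate the standard error of the sample mean from B resamples.
B = 1000
means = [sum(bootstrap_resample(data)) / n for _ in range(B)]
grand = sum(means) / B
se = (sum((m - grand) ** 2 for m in means) / (B - 1)) ** 0.5
```

The spread of `means` approximates the sampling distribution of the mean, and `se` is the bootstrap estimate of its standard error.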

Bootstrapping may also be used for constructing hypothesis tests. It is often used as an alternative to inference based on parametric assumptions when those assumptions are in doubt, or where parametric inference is impossible or requires very complicated formulas for the calculation of standard errors.

The advantage of bootstrapping over analytical methods is its great simplicity: it is straightforward to apply the bootstrap to derive estimates of standard errors and confidence intervals for complex estimators of complex parameters of the distribution, such as percentile points, proportions, odds ratios, and correlation coefficients.

The disadvantage of bootstrapping is that while (under some conditions) it is asymptotically consistent, it does not provide general finite-sample guarantees, and has a tendency to be overly optimistic.[citation needed] The apparent simplicity may conceal the fact that important assumptions are being made when undertaking the bootstrap analysis (e.g. independence of samples) where these would be more formally stated in other approaches.

Informal description

Bootstrapping allows one to gather many alternative versions of the single statistic that would ordinarily be calculated from one sample. For example, suppose we are interested in the height of people worldwide. Since we cannot measure the entire population, we sample only a small part of it. From that sample only one value of each statistic can be obtained (one mean, one standard deviation, etc.), and hence we do not see how variable that statistic is. When using bootstrapping, we randomly draw a new sample of n heights out of the N sampled data, sampling with replacement, so that any given person may appear several times or not at all. By doing this many times, we create a large number of datasets that we might have seen and compute the statistic for each of these datasets. Thus we get an estimate of the sampling distribution of the statistic. The key to the strategy is to create alternative versions of data that "we might have seen".

Situations where bootstrapping procedures are useful

Adèr et al. (2008) recommend use of bootstrapping procedures for any of the following situations:

• When the theoretical distribution of a statistic is complicated or unknown. Since the bootstrapping procedure is distribution-independent, it provides an indirect method to assess the properties of the distribution underlying the sample and the parameters of interest that are derived from this distribution.
• When the sample size is insufficient for straightforward statistical inference. If the underlying distribution is well-known, bootstrapping provides a way to account for the distortions caused by the specific sample that may not be fully representative of the population.
• When power calculations have to be performed, and a small pilot sample is available. Most power and sample size calculations are heavily dependent on the standard deviation of the statistic of interest. If the estimate used is incorrect, the required sample size will also be wrong. One method to get an impression of the variation of the statistic is to use a small pilot sample and perform bootstrapping on it to get an impression of the variance.
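For the pilot-sample situation, a sketch (with hypothetical pilot data) of bootstrapping the pilot to gauge the variability of the statistic before a power calculation:

```python
import random

random.seed(1)

# Hypothetical small pilot sample.
pilot = [12.3, 11.8, 13.1, 12.9, 11.5, 12.7]
n = len(pilot)

# Bootstrap the pilot to gauge the variability of the sample mean.
B = 2000
boot_means = []
for _ in range(B):
    resample = [random.choice(pilot) for _ in range(n)]
    boot_means.append(sum(resample) / n)

center = sum(boot_means) / B
boot_se = (sum((m - center) ** 2 for m in boot_means) / (B - 1)) ** 0.5
# boot_se can then feed a standard power / sample-size formula.
```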

How many bootstrap samples are enough?

The number of bootstrap samples recommended in the literature has increased as available computing power has increased. If the results really matter, as many samples as is reasonable given available computing power and time should be used. Increasing the number of samples cannot increase the amount of information in the original data; it can only reduce the effects of random sampling error that the bootstrap procedure itself introduces.

Types of bootstrap scheme

In univariate problems, it is usually acceptable to resample the individual observations with replacement ("case resampling" below). However, in small samples, a parametric bootstrap approach might be preferred, and for some problems a smooth bootstrap will likely be preferred.

For regression problems, various other alternatives are available.

Case resampling

In regression problems, case resampling refers to the simple scheme of resampling individual cases - often rows of a data set. For regression problems, so long as the data set is fairly large, this simple scheme is often acceptable. However, the method is open to criticism[citation needed].

In regression problems, the explanatory variables are often fixed, or at least observed with more control than the response variable. Also, the range of the explanatory variables defines the information available from them. Therefore, to resample cases means that each bootstrap sample will lose some information. As such, alternative bootstrap procedures should be considered.

Smooth bootstrap

Under this scheme, a small amount of (usually normally distributed) zero-centered random noise is added on to each resampled observation. This is equivalent to sampling from a kernel density estimate of the data.
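A sketch of this scheme, with hypothetical data and an assumed bandwidth value:

```python
import random

random.seed(2)

# Hypothetical observed data.
data = [4.2, 5.1, 3.8, 4.9, 5.5, 4.4, 5.0, 4.7]
h = 0.25  # kernel bandwidth -- an assumed value, tuned to the data scale

def smooth_resample(data, h):
    # Resample with replacement, then perturb each draw with N(0, h^2)
    # noise; this is equivalent to drawing from a Gaussian kernel
    # density estimate of the data.
    return [random.choice(data) + random.gauss(0.0, h) for _ in data]

sample = smooth_resample(data, h)
```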

Parametric bootstrap

In this case a parametric model is fitted to the data, often by maximum likelihood, and samples of random numbers are drawn from this fitted model. Usually the sample drawn has the same sample size as the original data. Then the quantity, or estimate, of interest is calculated from these data. This sampling process is repeated many times as for other bootstrap methods. The use of a parametric model at the sampling stage of the bootstrap methodology leads to procedures which are different from those obtained by applying basic statistical theory to inference for the same model.
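As a sketch, assuming a normal model for hypothetical data: fit the model by maximum likelihood, then draw bootstrap samples from the fitted distribution rather than from the data:

```python
import random

random.seed(3)

# Hypothetical observed data.
data = [9.8, 10.4, 10.1, 9.5, 10.7, 10.0, 9.9, 10.3]
n = len(data)

# Fit a normal model by maximum likelihood (the ML variance uses 1/n).
mu_hat = sum(data) / n
sigma_hat = (sum((x - mu_hat) ** 2 for x in data) / n) ** 0.5

# Draw bootstrap samples from the fitted model and recompute the
# quantity of interest (here, the mean) from each simulated sample.
B = 1000
boot_mus = []
for _ in range(B):
    sim = [random.gauss(mu_hat, sigma_hat) for _ in range(n)]
    boot_mus.append(sum(sim) / n)
```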

Resampling residuals

Another approach to bootstrapping in regression problems is to resample residuals. The method proceeds as follows.

1. Fit the model and retain the fitted values $\hat y_i$ and the residuals $\hat{\epsilon}_i = y_i - \hat{y}_i, (i = 1,\dots, n)$.
2. For each pair, (xi,yi), in which xi is the (possibly multivariate) explanatory variable, add a randomly resampled residual, $\hat{\epsilon}_j$, to the fitted value $\hat{y}_i$. In other words, create synthetic response variables $y^*_i = \hat{y}_i + \hat{\epsilon}_j$ where j is selected randomly from the list $(1,\dots ,n)$ for every i.
3. Refit the model using the fictitious response variables $y^*_i$, and retain the quantities of interest (often the parameters, $\hat\mu^*_i$, estimated from the synthetic $y^*_i$).
4. Repeat steps 2 and 3 a large number of times.
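The steps above can be sketched for simple linear regression on hypothetical data, using the common formulation that attaches each resampled residual to a fitted value:

```python
import random

random.seed(4)

# Hypothetical regression data: y roughly 2 + 3x plus noise.
x = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
y = [2.1, 3.4, 5.2, 6.3, 8.4, 9.2, 11.3, 12.4]

def ols(x, y):
    """Least-squares intercept and slope for one predictor."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

# Step 1: fit the model; keep fitted values and residuals.
a_hat, b_hat = ols(x, y)
fitted = [a_hat + b_hat * xi for xi in x]
resid = [yi - fi for yi, fi in zip(y, fitted)]

# Steps 2-4: add a randomly chosen residual to each fitted value,
# refit, and collect the slope estimates.
B = 500
boot_slopes = []
for _ in range(B):
    y_star = [fi + random.choice(resid) for fi in fitted]
    boot_slopes.append(ols(x, y_star)[1])
```

The collection `boot_slopes` approximates the sampling distribution of the slope while leaving the explanatory variable fixed.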

This scheme has the advantage that it retains the information in the explanatory variables. However, a question arises as to which residuals to resample. Raw residuals are one option; another is studentized residuals (in linear regression). Whilst there are arguments in favour of using studentized residuals, in practice it often makes little difference, and it is easy to run both schemes and compare the results against each other.

Gaussian process regression bootstrap

When data are temporally correlated, straightforward bootstrapping destroys the inherent correlations. This method uses Gaussian process regression to fit a probabilistic model from which replicates may then be drawn. Gaussian processes are methods from Bayesian non-parametric statistics but are here used to construct a parametric bootstrap approach, which implicitly allows the time-dependence of the data to be taken into account.

Wild bootstrap

Each residual is randomly multiplied by 1 or -1. This method assumes that the 'true' residual distribution is symmetric and can offer advantages over simple residual sampling for smaller sample sizes.
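A sketch of one wild-bootstrap replicate, with hypothetical fitted values and residuals:

```python
import random

random.seed(5)

# Hypothetical fitted values and residuals from some fitted model.
fitted = [1.0, 2.0, 3.0, 4.0, 5.0]
resid = [0.3, -0.5, 0.1, 0.4, -0.2]

# Wild bootstrap: flip the sign of each residual independently with
# probability 1/2, keeping each residual attached to its own observation.
y_star = [f + r * random.choice([-1.0, 1.0])
          for f, r in zip(fitted, resid)]
```

Because each residual stays with its own observation, heteroscedasticity in the residuals is preserved, unlike with simple residual resampling.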

Choice of statistic - pivoting

In situations where it is essential to extract as much information as possible from a data-set, consideration needs to be given to exactly what estimate or statistic should be the subject of the bootstrapping. Suppose inference is required about the mean of some observations. Then two possibilities are:

• generate bootstrap samples of the sample mean to construct a confidence interval for the mean;
• generate bootstrap samples of the new statistic (mean divided by sample standard deviation), construct a confidence interval for this, then derive the final confidence interval for the mean by multiplying the end-points of the initial interval by the sample standard deviation of the original sample.

The results will be different, and simulation results suggest that the second approach is better. The approach may derive partly from the standard parametric approach for Normal distributions, but is rather more general. The idea is to try to make use of a pivotal quantity, or to find a derived statistic that is approximately pivotal. See also ancillary statistic.
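A sketch of the second, approximately pivotal approach on hypothetical data: bootstrap the ratio of mean to standard deviation, then rescale the resulting percentile interval by the original sample standard deviation:

```python
import random

random.seed(6)

# Hypothetical observed data.
data = [7.2, 8.1, 6.9, 7.8, 8.4, 7.5, 7.0, 8.2]
n = len(data)

def mean_sd(xs):
    m = sum(xs) / len(xs)
    s = (sum((x - m) ** 2 for x in xs) / (len(xs) - 1)) ** 0.5
    return m, s

m0, s0 = mean_sd(data)

# Bootstrap the approximately pivotal ratio mean / sd.
B = 2000
ratios = []
for _ in range(B):
    rs = [random.choice(data) for _ in range(n)]
    m, s = mean_sd(rs)
    ratios.append(m / s)
ratios.sort()

# Take a 95% percentile interval for the ratio, then rescale by the
# original sample standard deviation to get an interval for the mean.
lo, hi = ratios[int(0.025 * B)], ratios[int(0.975 * B)]
ci = (lo * s0, hi * s0)
```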

Deriving confidence intervals from the bootstrap distribution

There are several ways of using the bootstrap distribution to calculate confidence intervals for the simulated statistics, and no single method is best for all problems. The trade-off is between simplicity and generality: the various adjusted methods aim for better coverage.

The effect of bias and the lack of symmetry on bootstrap confidence intervals

Bias: When we compare the mean of the bootstrap distribution of a statistic with the corresponding statistic from the original sample, we are checking for bias. As long as the bootstrap distribution reveals no bias and its shape is symmetric, the percentile confidence interval is a reasonable estimate. Bias in the bootstrap distribution will lead to bias in the confidence interval estimate, and some of the different methods try to correct for this bias.

Lack of symmetry in the bootstrap distribution raises another issue — how should the asymmetry of the distribution be reflected in the confidence interval?

Methods for bootstrap confidence intervals

Methods for constructing bootstrap confidence intervals include:

• percentile bootstrap - the simplest method. It takes the 2.5th and 97.5th percentiles of the bootstrap distribution as the limits of the 95% confidence interval. It can be applied to any statistic and works well when the bootstrap distribution is symmetrical and centered on the observed statistic (see Efron 1982). When this does not hold, the percentile bootstrap tends to be over-optimistic (see Schenker 1985). Schenker notes that for small sample sizes (i.e. less than 50), percentile confidence intervals for (for example) the variance will be too narrow: with a sample of 20 points, a 90% confidence interval will include the true variance only 78% of the time.
• basic bootstrap - a "turned around" version of the percentile bootstrap: the percentile limits are reflected about the observed statistic, giving $(2\hat\theta - q_{0.975},\ 2\hat\theta - q_{0.025})$, where $q_p$ denotes the bootstrap percentile.
• studentized bootstrap
• bias-corrected bootstrap - adjusts for bias in the bootstrap distribution.
• The bootstrap bias-corrected and accelerated (BCa) bootstrap (also known as the accelerated bootstrap), by Efron (1987). This adjusts for both bias and skewness in the bootstrap distribution. This approach is accurate in a wide variety of settings, has reasonable computation requirements, and does not produce excessively wide intervals.[citation needed]
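As an illustration of the simplest of these, the percentile method, a sketch with hypothetical data:

```python
import random

random.seed(7)

# Hypothetical observed data.
data = [5.1, 4.8, 6.2, 5.5, 4.9, 5.8, 6.0, 5.3, 5.6, 4.7]
n = len(data)

# Percentile bootstrap: the 2.5th and 97.5th percentiles of the bootstrap
# distribution of the statistic form an approximate 95% interval.
B = 4000
boot_means = sorted(
    sum(random.choice(data) for _ in range(n)) / n for _ in range(B)
)
ci = (boot_means[int(0.025 * B)], boot_means[int(0.975 * B)])
```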

Example applications

Application to testing for mediation

Bootstrapping is becoming the most popular method of testing mediation[1] because it does not require the normality assumption to be met, and because it can be effectively utilized with smaller sample sizes (N < 20). However, mediation continues to be (perhaps inappropriately) most frequently determined using (1) the logic of Baron and Kenny[2] or (2) the Sobel test: see mediation.

Smoothed bootstrap

Newcomb's speed-of-light data are used in the book Bayesian Data Analysis by Gelman et al. and can be found online.[3] Some analysis of these data appears on the robust statistics page.

The data set contains two obvious outliers so that, as an estimate of location, the median is to be preferred over the mean. Bootstrapping is a method often employed for estimating confidence intervals for medians. However, the median is a discrete statistic, and this fact shows up in the bootstrap distribution.

In order to smooth over the discreteness of the median, we can add a small amount of N(0,σ2) random noise to each bootstrap sample. We choose $\sigma = 1/\sqrt n$ for sample size n.
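A sketch of this recipe on hypothetical data (not Newcomb's measurements), comparing the plain and smoothed bootstrap distributions of the median:

```python
import random

random.seed(8)

# Hypothetical integer-valued sample to mimic a discrete median.
data = [24, 26, 27, 27, 28, 28, 29, 30, 31, 33, 25, 27, 28, 29, 26, 30]
n = len(data)
sigma = 1 / n ** 0.5  # noise scale sigma = 1/sqrt(n), as in the text

B = 2000
plain, smooth = [], []
for _ in range(B):
    rs = [random.choice(data) for _ in range(n)]
    plain.append(sorted(rs)[n // 2])  # upper median, kept discrete
    # Smoothed version: jitter each resampled value with N(0, sigma^2).
    js = sorted(v + random.gauss(0.0, sigma) for v in rs)
    smooth.append(js[n // 2])
```

The plain bootstrap medians take only a handful of distinct values, producing the jagged histogram described above, while the smoothed medians vary continuously.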

Histograms of the bootstrap distribution and the smooth bootstrap distribution appear below. The bootstrap distribution is very jagged because there are only a small number of values that the median can take. The smoothed bootstrap distribution overcomes this jaggedness.

Although the bootstrap distribution of the median looks ugly and intuitively wrong, confidence intervals from it are not bad in this example. The simple 95% percentile interval is (26, 28.5) for the simple bootstrap and (25.98, 28.46) for the smoothed bootstrap.

Relation to other approaches to inference

Relationship to other resampling methods

The bootstrap is distinguished from:

• the jackknife procedure, used to estimate biases of sample statistics and to estimate variances, and
• cross-validation, in which the parameters (e.g., regression weights, factor loadings) that are estimated in one subsample are applied to another subsample.

For more details see bootstrap resampling.

Bootstrap aggregating (bagging) is a meta-algorithm based on averaging the results of multiple bootstrap samples.

U-statistics

In situations where an obvious statistic can be devised to measure a required characteristic using only a small number, r, of data items, a corresponding statistic based on the entire sample can be formulated. Given an r-sample statistic, one can create an n-sample statistic by something similar to bootstrapping (taking the average of the statistic over all subsamples of size r). This procedure is known to have certain good properties and the result is a U-statistic. The sample mean and sample variance are of this form, for r=1 and r=2.
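For example, the unbiased sample variance is the U-statistic obtained by averaging the kernel h(x, y) = (x − y)²/2 over all subsamples of size r = 2:

```python
from itertools import combinations

# Hypothetical data.
data = [3.0, 1.0, 4.0, 1.5, 5.0, 9.0]
n = len(data)

# Kernel of size r = 2 whose expectation is the population variance.
def h(x, y):
    return 0.5 * (x - y) ** 2

# The U-statistic averages the kernel over all size-2 subsamples;
# for this kernel it equals the usual unbiased sample variance.
pairs = list(combinations(data, 2))
u_stat = sum(h(a, b) for a, b in pairs) / len(pairs)

# The conventional formula, for comparison.
mean = sum(data) / n
s2 = sum((x - mean) ** 2 for x in data) / (n - 1)
```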

Homophony

Due to homophony, bootstrapping is also the source of many jokes, as exemplified by a cover of SAS'Discute magazine, the journal of ENSAE, the leading French school in statistics and economics, in January 2010: "Boobs trap, how it can bias your entire life".

Notes

1. ^ Preacher, K. J., & Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, and Computers, 36, 717–731 Macros for SAS and SPSS
2. ^ Mediation by David A. Kenny
3. ^ Data from examples in Bayesian Data Analysis

• Adèr, H. J., Mellenbergh G. J., & Hand, D. J. (2008). Advising on research methods: A consultant's companion. Huizen, The Netherlands: Johannes van Kessel Publishing. ISBN 9789079418015
• Chernick, Michael R. (1999). Bootstrap Methods, A practitioner's guide. Wiley Series in Probability and Statistics.
• Davison, A. C.; Hinkley, D. V. (1997). Bootstrap Methods and their Application. Cambridge: Cambridge Series in Statistical and Probabilistic Mathematics.
• Davison, A. C.; Hinkley, D. V. (2006). Bootstrap Methods and their Application (8th ed.). Cambridge: Cambridge Series in Statistical and Probabilistic Mathematics.
• Diaconis, P. & Efron, B. (May 1983). "Computer-intensive methods in statistics". Scientific American: 116–130.
• Efron, B. (1979). "Bootstrap Methods: Another Look at the Jackknife". The Annals of Statistics 7 (1): 1–26. doi:10.1214/aos/1176344552.
• Efron, B. (1981). "Nonparametric estimates of standard error: The jackknife, the bootstrap and other methods". Biometrika 68: 589–599. doi:10.1093/biomet/68.3.589.
• Efron, B. (1982). The jackknife, the bootstrap, and other resampling plans. 38. Society of Industrial and Applied Mathematics CBMS-NSF Monographs.
• Efron, B. (1987). "Better Bootstrap Confidence Intervals". Journal of the American Statistical Association 82: 171–185. JSTOR 2289144.
• Efron, B.; Tibshirani, R. (1993). An Introduction to the Bootstrap. Boca Raton, FL: Chapman & Hall/CRC.
• Edgington, E. S. (1995). Randomization tests. New York: M. Dekker.
• Kirk, P.; Stumpf, M. P. H. (2009). "Gaussian process regression bootstrapping: Exploring the effects of uncertainty in time course data". Bioinformatics 25: 1300–1306. doi:10.1093/bioinformatics/btp139.
• Hesterberg, T. C., D. S. Moore, S. Monaghan, A. Clipson, and R. Epstein (2005): Bootstrap Methods and Permutation Tests.
• Mooney, C Z & Duval, R D (1993). Bootstrapping. A Nonparametric Approach to Statistical Inference. Sage University Paper series on Quantitative Applications in the Social Sciences, 07-095. Newbury Park, CA: Sage. ISBN 080395381X
• Simon, J. L. (1997): Resampling: The New Statistics, Resampling Stats, ISBN 0534217206
• Gillies, D. (2008) Lecture notes for Intelligent Data and Probabilistic Inference (Bayesian networks), Lecture 14: Sampling and re-sampling. pg 4.
• Bluvband, Z., Peshes, L. (1993) Bootstrap Technology for RAM analysis
• Bluvband, Z., Peshes, L. (1995) Bootstrap Information Technology and Design of Intelligent Inference Machine