The Full Wiki

Standard error (statistics): Wikis

Encyclopedia

From Wikipedia, the free encyclopedia

Figure: For a value sampled with an unbiased, normally distributed error, the proportion of samples that fall within 0, 1, 2, and 3 standard errors above and below the actual value.

The standard error of a method of measurement or estimation is the standard deviation of the sampling distribution associated with the estimation method.[1] The term may also be used to refer to an estimate of that standard deviation, derived from a particular sample used to compute the estimate.

For example, the sample mean is the usual estimator of a population mean. However, different samples drawn from that same population would in general have different values of the sample mean. The standard error of the mean (i.e., of using the sample mean as a method of estimating the population mean) is the standard deviation of those sample means over all possible samples (of a given size) drawn from the population. Secondarily, the standard error of the mean can refer to an estimate of that standard deviation, computed from the sample of data being analysed at the time.
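The idea of a standard deviation taken over repeated samples can be illustrated with a small simulation. This is a minimal sketch, assuming a normally distributed population with mean 0; the function name `sd_of_sample_means`, the seed, and the values of sigma, n, and the number of trials are illustrative choices, not part of the article. The empirical spread of the simulated means should come out close to σ/√n, the expression for the standard deviation of the mean.

```python
import math
import random
import statistics

def sd_of_sample_means(sigma, n, trials, seed=0):
    """Empirical standard deviation of the sample mean over many simulated samples."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = []
    for _ in range(trials):
        # Draw one sample of size n from an assumed normal population.
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
    # Spread of the sample means across all simulated samples.
    return statistics.pstdev(means)

sigma, n = 2.0, 25                      # illustrative values
empirical = sd_of_sample_means(sigma, n, trials=20000)
theoretical = sigma / math.sqrt(n)      # sigma / sqrt(n)
print(f"empirical: {empirical:.4f}, theoretical: {theoretical:.4f}")
```

The simulation only approximates "all possible samples"; more trials bring the empirical value closer to the theoretical one.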

The term standard error derives from the fact that, as long as the estimator is unbiased, the standard deviation of the error (the difference between the estimate and the true value) equals the standard deviation of the estimates themselves. This holds because subtracting a constant, here the expected value, from a random variable does not change its standard deviation.

In practical applications, the true value of the standard deviation is usually unknown. As a result, the term standard error is often used to refer to an estimate of this unknown quantity. In such cases it is important to be clear about what has been done and to take proper account of the fact that the standard error is only an estimate. Unfortunately, this is often not possible, and it may then be better to use an approach that avoids a standard error altogether, for example maximum likelihood or a more formal approach to deriving confidence intervals. One well-known case where a proper allowance can be made is the use of Student's t-distribution to provide a confidence interval for an estimated mean or difference of means. In other cases, the standard error can still give an indication of the size of the uncertainty, but its formal or semi-formal use to provide confidence intervals or tests should be avoided unless the sample size is at least moderately large. How large is "large enough" depends on the particular quantities being analysed.

Standard error of the mean

The standard error of the mean (SEM) is the standard deviation of the sample mean estimate of a population mean. (It can also be viewed as the standard deviation of the error in the sample mean relative to the true mean, since the sample mean is an unbiased estimator.) SEM is usually estimated by the sample estimate of the population standard deviation (sample standard deviation) divided by the square root of the sample size (assuming statistical independence of the values in the sample):

SE_{\bar{x}} = \frac{s}{\sqrt{n}}

where

s is the sample standard deviation (i.e., the sample based estimate of the standard deviation of the population), and
n is the size (number of observations) of the sample.
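As a minimal sketch of this estimate, the following computes s/√n for a small sample. The function name `standard_error_of_mean` and the data values are made up for illustration, and the observations are assumed to be statistically independent.

```python
import math
import statistics

def standard_error_of_mean(sample):
    """Estimate the SEM as s / sqrt(n), with s the sample standard deviation."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    return s / math.sqrt(n)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up illustrative values
sem = standard_error_of_mean(data)
print(f"mean = {statistics.fmean(data):.3f}, SEM = {sem:.3f}")
```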

This estimate may be compared with the formula for the true standard deviation of the mean:

SD_{\bar{x}} = \frac{\sigma}{\sqrt{n}}

where

σ is the standard deviation of the population.
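
The formula follows in one line when the n observations x_1, ..., x_n are assumed to be independent and to share the variance σ², because the variance of a sum of independent variables is the sum of their variances:

\operatorname{Var}(\bar{x}) = \operatorname{Var}\left(\frac{1}{n}\sum_{i=1}^{n} x_i\right) = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(x_i) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}, \qquad SD_{\bar{x}} = \sqrt{\operatorname{Var}(\bar{x})} = \frac{\sigma}{\sqrt{n}}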

Note 1: Standard error may also be defined as the standard deviation of the residual error term.[2][3]

Note 2: For small samples, both the sample standard deviation and the standard error computed from it tend to systematically underestimate the corresponding population quantities: the estimated standard error of the mean is a biased estimator of the population standard error. For normally distributed data with n = 2, the sample standard deviation averages only about 80% of σ, so it must be inflated by about 25% to remove the bias; for n = 6 the underestimate is only about 5%. Gurland and Tripathi (1971)[4] provide a correction and equation for this effect. Sokal and Rohlf (1981)[5] give an equation of the correction factor for small samples of n < 20. See unbiased estimation of standard deviation for further discussion.
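The size of this bias can be computed directly when the data are assumed to be normally distributed. The sketch below uses the exact expected ratio E[s]/σ, conventionally written c4(n); the name `c4` and the normality assumption are not taken from the article, and this is not the approximation of Gurland and Tripathi or of Sokal and Rohlf.

```python
import math

def c4(n):
    """Expected value of s / sigma for n independent normal observations."""
    if n < 2:
        raise ValueError("need at least two observations")
    # c4(n) = sqrt(2 / (n - 1)) * Gamma(n / 2) / Gamma((n - 1) / 2);
    # lgamma keeps the ratio of gamma functions stable for larger n.
    log_ratio = math.lgamma(n / 2.0) - math.lgamma((n - 1) / 2.0)
    return math.sqrt(2.0 / (n - 1)) * math.exp(log_ratio)

for n in (2, 6, 20):
    # 1 / c4(n) is the factor by which s would be multiplied to remove the bias.
    print(n, round(c4(n), 4), round(1.0 / c4(n), 4))
```

At n = 2 the ratio is about 0.80, so the multiplicative correction 1/c4 is about 1.25; at n = 6 the ratio is about 0.95.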

A practical result: Decreasing the uncertainty in your mean value estimate by a factor of two requires that you acquire four times as many observations in your sample. Worse, decreasing standard error by a factor of ten requires a hundred times as many observations.
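The square-root relationship can be turned into a quick calculation. This is a rough sketch, assuming the sample standard deviation stays about the same as more observations are collected; the function name `observations_needed` is hypothetical.

```python
import math

def observations_needed(n_current, reduction_factor):
    """Sample size needed to shrink the SEM by reduction_factor.

    Since SEM = s / sqrt(n), dividing the SEM by k requires k**2 times
    as many observations (assuming s does not change).
    """
    if n_current < 1 or reduction_factor <= 0:
        raise ValueError("n_current must be >= 1 and reduction_factor > 0")
    return math.ceil(n_current * reduction_factor ** 2)

print(observations_needed(100, 2))   # halving the SEM
print(observations_needed(100, 10))  # reducing the SEM tenfold
```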

Assumptions and usage

If the data are assumed to be normally distributed, quantiles of the normal distribution, together with the sample mean and standard error, can be used to calculate approximate confidence intervals for the mean. The following expressions give the upper and lower 95% confidence limits, where \bar{x} is the sample mean, S_E is the standard error of the sample mean, and 1.96 is the 0.975 quantile of the standard normal distribution:

Upper 95% Limit = \bar{x} + (S_E\cdot 1.96) ,
Lower 95% Limit = \bar{x} - (S_E\cdot 1.96) .
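
The two limits can be computed as follows. This is a minimal sketch using the normal approximation only; the function name `normal_confidence_limits` and the numbers in the usage lines are illustrative, and the quantile is taken from Python's standard-library `statistics.NormalDist` rather than hard-coded as 1.96.

```python
from statistics import NormalDist

def normal_confidence_limits(mean, standard_error, level=0.95):
    """Approximate confidence limits: mean -/+ z * standard_error."""
    if not 0.0 < level < 1.0:
        raise ValueError("level must be strictly between 0 and 1")
    # For level = 0.95 this is the 0.975 quantile, approximately 1.96.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return mean - z * standard_error, mean + z * standard_error

lower, upper = normal_confidence_limits(50.0, 2.0)  # illustrative numbers
print(f"95% limits: {lower:.2f} to {upper:.2f}")
```

For small samples with an estimated standard error, a quantile of Student's t-distribution would replace the normal quantile.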

In particular, the standard error of a sample statistic (such as the sample mean) is the estimated standard deviation of the error in the process by which it was generated. In other words, it is the standard deviation of the sampling distribution of the sample statistic. The notation for standard error can be any one of SE, SEM (for standard error of measurement or mean), or S_E.

Standard errors provide simple measures of uncertainty in a value and are often used because:

Correction for finite population

The formula given above for the standard error assumes that the sample size is much smaller than the population size, so that the population can be considered to be effectively infinite in size. When the sampling fraction is large (approximately 5% or more), the estimate of the error must be corrected by multiplying by a “finite population correction” [6]

 \text{FPC} = \sqrt{\frac{N-n}{N-1}}

to account for the added precision gained by sampling a larger percentage of the population. The effect of the FPC is that the error becomes zero when the sample size n is equal to the population size N.
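
A minimal sketch of applying the correction, assuming sampling without replacement from a population of known size N; the function names and the example numbers are illustrative.

```python
import math

def finite_population_correction(N, n):
    """FPC = sqrt((N - n) / (N - 1)) for a sample of n from a population of N."""
    if N < 2 or not 1 <= n <= N:
        raise ValueError("require N >= 2 and 1 <= n <= N")
    return math.sqrt((N - n) / (N - 1))

def corrected_standard_error(s, n, N):
    """Standard error of the mean, s / sqrt(n), multiplied by the FPC."""
    return s / math.sqrt(n) * finite_population_correction(N, n)

# Illustrative numbers: a 10% sampling fraction, then a full census.
print(corrected_standard_error(s=12.0, n=100, N=1000))
print(corrected_standard_error(s=12.0, n=1000, N=1000))
```

When n equals N the correction is zero, so the corrected standard error vanishes, as described above.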

Correction for correlation in the sample

Expected error in the mean of A for a sample of n data points with sample bias coefficient ρ. The unbiased standard error plots as the ρ=0 line with log-log slope -½.

If values of the measured quantity A are not statistically independent but have been obtained from known locations in parameter space x, an unbiased estimate of error in the mean may be obtained by multiplying the standard error above by the factor f:

f= \sqrt{\frac{1+(n-1) \rho}{1-\rho}} ,

where the sample bias coefficient ρ is the average of the autocorrelation coefficients ρ_ij (each a quantity between -1 and 1) over all pairs of sample points. See unbiased estimation of standard deviation for more discussion.
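
The factor f can be evaluated directly from the formula above. This sketch assumes ρ has already been estimated as the average pairwise correlation; the function name `correlation_correction_factor` and the example values are illustrative.

```python
import math

def correlation_correction_factor(n, rho):
    """f = sqrt((1 + (n - 1) * rho) / (1 - rho)) for average correlation rho."""
    if n < 2:
        raise ValueError("need at least two observations")
    numerator = 1.0 + (n - 1) * rho
    denominator = 1.0 - rho
    if denominator <= 0.0 or numerator < 0.0:
        raise ValueError("rho must satisfy -1/(n - 1) <= rho < 1")
    return math.sqrt(numerator / denominator)

# With rho = 0 the factor is 1 and the usual standard error is unchanged.
print(correlation_correction_factor(10, 0.0))
# Positive correlation inflates the standard error.
print(correlation_correction_factor(10, 0.5))
```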

Relative Standard Error

The relative standard error (RSE) is simply the standard error divided by the mean and expressed as a percentage. For example, consider two surveys of household income that both result in a sample mean of $50,000. If one survey has a standard error of $10,000 and the other has a standard error of $5,000, then the relative standard errors are 20% and 10% respectively. Intuitively, the survey with the lower standard error would seem to be more reliable since there is less dispersion around the mean. In fact, data organizations often set reliability standards that their data must meet before publication. For example, the U.S. National Center for Health Statistics typically does not report an estimate if the relative standard error exceeds 30%. (NCHS also typically requires at least 30 observations for an estimate to be reported.)
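
The household income example can be reproduced in a few lines. This is a small sketch; the function name `relative_standard_error` is illustrative, and the 30% cut-off is used only as the reporting threshold quoted above.

```python
def relative_standard_error(standard_error, mean):
    """Relative standard error, expressed as a percentage of the mean."""
    if mean == 0:
        raise ValueError("RSE is undefined when the mean is zero")
    return 100.0 * standard_error / abs(mean)

# The two surveys from the example above, both with a mean of $50,000.
rse_a = relative_standard_error(10_000, 50_000)
rse_b = relative_standard_error(5_000, 50_000)
for label, rse in (("survey A", rse_a), ("survey B", rse_b)):
    # Compare against the 30% threshold mentioned in the text.
    print(label, f"{rse:.0f}%", "reportable" if rse <= 30.0 else "suppressed")
```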

References

  1. ^ Everitt, B.S. (2003). The Cambridge Dictionary of Statistics. CUP. ISBN 0-521-81099-X.
  2. ^ Kenney, J.; Keeping, E.S. (1963). Mathematics of Statistics. Van Nostrand. p. 187.
  3. ^ Zwillinger, D. (1995). Standard Mathematical Tables and Formulae. Chapman & Hall/CRC. ISBN 0849324793. p. 626.
  4. ^ Gurland, J.; Tripathi, R.C. (1971). "A simple approximation for unbiased estimation of the standard deviation". American Statistician 25 (4): 30–32.
  5. ^ Sokal and Rohlf (1981). Biometry: Principles and Practice of Statistics in Biological Research, 2nd ed. ISBN 0716712547. p. 53.
  6. ^ Isserlis (1918, equation (1))
Study guide

Up to date as of January 14, 2010

From Wikiversity


The standard error (SE), in statistics, is a measure of how much a statistic, such as the sample mean, varies from one sample to another. For the mean, the SE is estimated as the sample standard deviation divided by the square root of the sample size.
