# Sample (statistics): Wikis

# Encyclopedia

In statistics, a sample is a subset of a population. Typically, the population is very large, making a census or a complete enumeration of all the values in the population impractical or impossible. The sample represents a subset of manageable size. Samples are collected and statistics are calculated from the samples so that one can make inferences or extrapolations from the sample to the population. This process of collecting information from a sample is referred to as sampling.

The best way to avoid a biased or unrepresentative sample is to select a random sample, also known as a probability sample. A random sample is defined as a sample in which every individual member of the population has exactly the same probability of being selected as any other member. Common types of random samples include simple random samples, systematic samples, stratified random samples, and cluster random samples.
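As a rough illustration of the first two types, the sketch below draws a simple random sample and a systematic sample from a small labelled population. The function names, the population of 100 numbered units, and the seed are assumptions made for this example; the systematic version also assumes the population size is an exact multiple of the sample size, so that every unit has the same chance of selection.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct members; each member has the same chance n/N of inclusion."""
    rng = random.Random(seed)
    return rng.sample(population, n)


def systematic_sample(population, n, seed=None):
    """Pick a random start in the first block, then take every k-th member."""
    if len(population) % n != 0:
        # Assumption of this sketch: N must be a multiple of n.
        raise ValueError("population size must be a multiple of n")
    rng = random.Random(seed)
    k = len(population) // n          # sampling interval
    start = rng.randrange(k)          # random start in 0 .. k-1
    return [population[start + i * k] for i in range(n)]


# Hypothetical population: 100 units labelled 1..100.
population = list(range(1, 101))
srs = simple_random_sample(population, 10, seed=1)
sys_sample = systematic_sample(population, 10, seed=1)
```

In both cases each unit is included with probability 10/100, but the systematic sample can only take one of ten possible values (one per starting point), whereas the simple random sample can be any subset of size 10.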

A sample that is not random is called a nonrandom sample or a nonprobability sample. Some examples of nonrandom samples are convenience samples, judgment samples, purposive samples, quota samples, snowball samples, and quadrature nodes in quasi-Monte Carlo methods.

## Mathematical description of random sample

In mathematical terms, given a random variable X with distribution F, a random sample of length n (n = 1, 2, 3, ...) is a set of n independent, identically distributed (iid) random variables X1, ..., Xn, each with distribution F.
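One way to write out what "independent, identically distributed" means, taking F to be the cumulative distribution function of X and naming the sample variables X_1, ..., X_n (notation assumed here for illustration), is that the joint distribution factorizes into a product of identical marginals:

```latex
\[
  P(X_1 \le x_1, \dots, X_n \le x_n) \;=\; \prod_{i=1}^{n} P(X_i \le x_i) \;=\; \prod_{i=1}^{n} F(x_i)
\]
```

The first equality expresses independence; the second expresses that every variable has the same distribution F.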

A sample concretely represents n experiments in which we measure the same quantity. For example, if X represents the height of an individual and we measure n individuals, Xi will be the height of the i-th individual. Note that a sample of random variables (i.e. a set of measurable functions) must not be confused with the realizations of these variables (which are the values that these random variables take). In other words, Xi is a function representing the measurement at the i-th experiment and xi = Xi(ω) is the value we actually get when making the measurement.

The concept of a sample thus includes the process of how the data are obtained (that is, the random variables). This is necessary so that mathematical statements can be made about the sample and statistics computed from it, such as the sample mean and covariance.
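The distinction between the random variables and their realizations can be sketched in code: a function plays the role of the sampling process, and each call returns one realization of it. The normal distribution with mean 170 and standard deviation 10 (heights in centimetres), the sample size, and the seed are all assumptions chosen for this example.

```python
import random
import statistics


def draw_sample(n, rng):
    """Return one realization x_1..x_n of n iid variables (assumed Normal(170, 10))."""
    return [rng.gauss(170.0, 10.0) for _ in range(n)]


rng = random.Random(42)

# Two realizations of the same sampling process give different numbers.
first = draw_sample(50, rng)
second = draw_sample(50, rng)

# Statistics computed from each realization.
mean_first = statistics.mean(first)
mean_second = statistics.mean(second)
var_first = statistics.variance(first)   # sample variance, n - 1 denominator
```

Because the sample mean is a function of the random variables, it is itself a random variable: `mean_first` and `mean_second` are two different realizations of it.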

## Notes

1. Samuel S. Wilks, Mathematical Statistics, John Wiley, 1962, Section 8.1

# Study guide

Up to date as of January 14, 2010

# Simple English

In statistics, a sample is part of a population. The sample is carefully chosen. It should represent the whole population fairly, without bias. Samples are needed because populations may be so large that counting all the individuals is not possible or practical.

Therefore, solving a problem in statistics usually starts with sampling. Sampling is about choosing which data to take for later analysis. As an example, suppose the pollution of a lake is to be analysed for a study. Depending on where the water samples are taken, the study can give different results. As a general rule, samples need to be random. This means the chance or probability of selecting one individual is the same as the chance of selecting any other individual.

In practice, random samples are always taken by means of a well-defined procedure. A procedure is a set of rules, a sequence of steps written down on paper and followed to the letter. Even so, some bias may remain in the sample. Consider the problem of designing a sample to predict the result of an election. All known methods have their problems, and the results of an election are often different from predictions based on a sample. If you collect opinions by using telephones, or by meeting people in the street, the sample always has bias. Therefore, in cases like this a completely neutral sample is never possible. In such cases a statistician will think about how to measure the amount of bias, and there are ways to estimate this.

A similar situation occurs when scientists measure a physical property, say the weight of a piece of metal, or the speed of light. If we weigh an object with sensitive equipment we will get minutely different results. No system of measurement is ever perfect. We get a series of estimates, each one being a measurement. These are samples, with a certain degree of error. Statistics is designed to describe error, and carry out analysis on this kind of data.
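As a small worked example of describing measurement error, the sketch below summarizes repeated weighings of one object by their mean, their sample standard deviation, and the standard error of the mean (the standard deviation divided by the square root of the number of measurements). The eight readings in grams are made-up values for illustration only.

```python
import math
import statistics

# Hypothetical repeated weighings of the same piece of metal, in grams.
readings = [10.02, 9.98, 10.01, 9.99, 10.00, 10.03, 9.97, 10.00]

n = len(readings)
mean = statistics.mean(readings)          # best single estimate of the weight
spread = statistics.stdev(readings)       # sample standard deviation (n - 1 denominator)
standard_error = spread / math.sqrt(n)    # uncertainty of the mean itself

print(f"mean = {mean:.3f} g, sd = {spread:.3f} g, se = {standard_error:.4f} g")
```

The standard deviation describes how much a single reading varies; the standard error describes how much the average of all the readings would vary if the whole set of measurements were repeated.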

## Stratified sampling

If a population has obvious sub-populations, then each of the sub-populations needs to be sampled. This is called stratified sampling.

Another type of stratified sample deals with variation. Here larger samples are taken from the more variable sub-populations, so that summary statistics such as the means and standard deviations are more reliable.
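The two approaches can be compared with a short sketch of how a total sample size might be divided among strata. Proportional allocation gives each stratum a share equal to its share of the population; the variance-weighted rule used here is Neyman allocation (share proportional to stratum size times stratum standard deviation), which is one common way of sampling more heavily from the more variable strata and is named here as an assumption, not as the method the text has in mind. The stratum names, sizes, and standard deviations are hypothetical.

```python
def proportional_allocation(sizes, n):
    """Split n across strata in proportion to stratum size N_h."""
    total = sum(sizes.values())
    return {h: round(n * size / total) for h, size in sizes.items()}


def neyman_allocation(sizes, sds, n):
    """Split n across strata in proportion to N_h * S_h (size times std. deviation)."""
    weights = {h: sizes[h] * sds[h] for h in sizes}
    total = sum(weights.values())
    return {h: round(n * w / total) for h, w in weights.items()}


# Hypothetical strata: three regions with different sizes and variability.
sizes = {"north": 500, "south": 300, "east": 200}
sds = {"north": 2.0, "south": 10.0, "east": 5.0}

proportional = proportional_allocation(sizes, 100)
neyman = neyman_allocation(sizes, sds, 100)
```

With these numbers the highly variable "south" stratum receives 30 of the 100 units under proportional allocation but 60 under Neyman allocation. Rounding can make the parts sum to slightly more or less than n for other inputs, so a real design would need to adjust for that.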

## References

1. Lohr, Sharon L. 1999. Sampling: design and analysis. Duxbury.
2. Kish, Leslie. 1995. Survey sampling. Wiley, New York. ISBN 0-471-10949-5
3. Stuart, Alan. 1962. Basic ideas of scientific sampling. Hafner, New York.