From Wikipedia, the free encyclopedia
Selection bias is a statistical
bias in which there is an error in choosing the individuals or
groups to take part in a scientific study.[1] It is
sometimes referred to as the selection effect. The
term "selection bias" most often refers to the distortion of a statistical analysis,
resulting from the method of collecting samples. If the selection
bias is not taken into account then any conclusions drawn may be
wrong.
Types
There are many types of possible selection bias, including:
Sampling bias
Sampling bias
is systematic error due to a non-random sample of a population,[2]
in which some members of the population are less likely to be
included than others. The result is a biased sample,
defined as a statistical sample of a population (or non-human
factors) in which not all participants are equally balanced or
objectively represented.[3]
It is mostly classified as a subtype of selection bias,[4]
sometimes specifically termed sample selection bias,[5][6] but
some classify it as a separate type of bias.[7]
A distinction, albeit not universally accepted, is that sampling
bias undermines the external validity of a test (the
ability of its results to be generalized to the rest of the
population), while selection bias mainly addresses internal
validity for differences or similarities found in the sample at
hand. In this sense, errors occurring in the process of gathering
the sample or cohort cause sampling bias, while errors in any
process thereafter cause selection bias.
Examples include self-selection, pre-screening of trial
participants, discounting trial subjects/tests that did not run to
completion and migration bias by excluding subjects who have
recently moved into or out of the study area.
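As a rough illustration of how unequal inclusion probabilities distort an estimate, the following Python sketch compares a simple random sample with a sample in which members with larger values are more likely to be drawn. The population, the weighting scheme and the function name `compare_sampling` are assumptions made for this example and do not come from any cited study.

```python
import random
from statistics import mean


def compare_sampling(population, weights, k, seed=0):
    """Return (mean of a simple random sample, mean of a weighted sample)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    # Simple random sample: every member is equally likely to be included.
    fair = rng.sample(population, k)
    # Non-random sample: members are drawn (with replacement) in proportion
    # to `weights`, so low-weight members are less likely to be included.
    biased = rng.choices(population, weights=weights, k=k)
    return mean(fair), mean(biased)


# Hypothetical population: the values 1..10000, whose true mean is 5000.5.
population = list(range(1, 10001))
# Assumed selection mechanism: chance of inclusion proportional to the value.
fair_mean, biased_mean = compare_sampling(population, population, k=2000)
print(f"true mean:            {mean(population):.1f}")
print(f"simple random sample: {fair_mean:.1f}")
print(f"weighted sample:      {biased_mean:.1f}")
```

Under these assumptions the simple random sample stays close to the true mean, while the weighted sample is pulled well above it, because the expected value of a draw becomes the sum of squared values divided by the sum of values.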
Time interval
- Selecting end-points of a series. For example, to maximize a
claimed trend, you could start the time series at an unusually low
year, and end on a high one.
- Early termination of a trial at a time when its results support
a desired conclusion.
- A trial may be terminated early at an extreme value (often for
ethical reasons), but the extreme value is likely to be reached by
the variable with the largest variance, even if all variables have
a similar mean. As a result of early termination, the means of
variables with larger variances are overestimated.
- Analyzing the lengths of intervals by selecting intervals that
occupy randomly chosen points in time or space, a process that
favors longer intervals. This is known as length time
bias.
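The last item in the list above, length time bias, can be sketched numerically. The Python example below assumes a set of intervals laid end to end with no gaps, so that a uniformly random point falls in an interval with probability proportional to its length. The function names and the interval lengths are hypothetical.

```python
def mean_length(lengths):
    """Plain average: every interval counts once."""
    return sum(lengths) / len(lengths)


def mean_length_at_random_point(lengths):
    """Expected length of the interval that covers a uniformly random point.

    Assumes the intervals lie end to end without gaps, so an interval of
    length L is selected with probability L / total. The expectation is
    therefore sum(L * L) / total.
    """
    total = sum(lengths)
    return sum(length * length for length in lengths) / total


# Hypothetical data: three short intervals and one long one.
lengths = [1, 1, 1, 9]
print(mean_length(lengths))                  # average over intervals
print(mean_length_at_random_point(lengths))  # average seen by random points
```

With these lengths the plain average is 3, but the interval found at a random point has an expected length of 7, since the long interval occupies three quarters of the timeline.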
Exposure
- Susceptibility bias
- Clinical susceptibility bias, when one disease
predisposes for a second disease, and the treatment for the first
disease erroneously appears to predispose to the second disease. For
example, postmenopausal syndrome gives a higher likelihood of also
developing endometrial cancer, so estrogens
given for the postmenopausal syndrome may receive a higher than
actual blame for causing endometrial cancer.[8]
- Protopathic bias, when a treatment for the first
symptoms of a disease or other outcome appears to cause the outcome.
It is a potential bias when there is a lag time from the first
symptoms and start of treatment to the actual diagnosis.[8]
It can be mitigated by lagging, that is, exclusion of
exposures that occurred in a certain time period before
diagnosis.[9]
- Indication bias, a potential mix-up between cause and
effect when exposure is dependent on indication. For example, a treatment
is given to people at high risk of acquiring a disease, potentially
causing a preponderance of treated people among those acquiring the
disease. This may cause an erroneous appearance of the treatment
being a cause of the disease.[10]
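The lagging approach mentioned above for protopathic bias can be sketched as a simple filter on exposure dates. This Python example is a minimal sketch only: the function name `lagged_exposures`, the 180-day lag and the dates are hypothetical, and the choice to also drop exposures recorded after diagnosis is an assumption of this example.

```python
from datetime import date, timedelta


def lagged_exposures(exposure_dates, diagnosis_date, lag_days):
    """Keep only exposures that occurred before the lag window.

    The lag window is the `lag_days` days leading up to the diagnosis.
    Exposures inside that window may be treatment of early symptoms, so
    they are excluded. Exposures after diagnosis are dropped as well
    (an assumption of this sketch).
    """
    cutoff = diagnosis_date - timedelta(days=lag_days)
    return [d for d in exposure_dates if d < cutoff]


# Hypothetical patient record.
diagnosis = date(2020, 6, 1)
exposures = [
    date(2019, 1, 15),
    date(2019, 11, 30),
    date(2020, 3, 10),   # inside the 180-day lag window
    date(2020, 5, 20),   # inside the 180-day lag window
]
print(lagged_exposures(exposures, diagnosis, lag_days=180))
```

The appropriate lag length depends on the disease and is not something this sketch can determine.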
Data
- Partitioning data with knowledge of the contents of the
partitions, and then analyzing them with tests designed for blindly
chosen partitions.
- Rejection of "bad" data on arbitrary grounds, instead of
according to previously stated or generally agreed criteria.
- Rejection of "outliers" on statistical grounds that fail to
take into account important information that could be derived from
"wild" observations.[11]
Studies
- Selection of which studies to include in a meta-analysis (see
also combinatorial meta-analysis).
- Performing repeated experiments and reporting only the most
favourable results, perhaps relabelling lab records of other
experiments as "calibration tests", "instrumentation errors" or
"preliminary surveys".
- Presenting the most significant result of a data dredge as if
it were a single experiment (which is logically the same as the
previous item, but is seen as much less dishonest).
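To see why reporting only the most favourable of many results is misleading, consider the chance that at least one of several tests appears significant when no real effect exists. The Python sketch below assumes the tests are independent and that every null hypothesis is true; the function name and the 0.05 threshold are illustrative choices.

```python
def prob_any_false_positive(n_tests, alpha=0.05):
    """Chance that at least one of `n_tests` tests falls below `alpha`.

    Assumes independent tests with every null hypothesis true, so each
    test is a false positive with probability `alpha`.
    """
    return 1.0 - (1.0 - alpha) ** n_tests


for n in (1, 5, 20, 100):
    print(n, round(prob_any_false_positive(n), 3))
```

Under these assumptions a single test gives a false positive 5% of the time, but the best of 20 tests does so roughly 64% of the time, which is why the selected result cannot be interpreted as if it came from a single experiment.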
Attrition
Attrition bias is a kind of selection bias caused by
attrition (loss of participants),[12] that is, by
discounting trial subjects or tests that did not run to completion. It
includes dropout, nonresponse (lower response rate),
withdrawal and protocol deviators. It gives
biased results where attrition is unequal with regard to exposure and/or
outcome. For example, in a test of a dieting program, the
researcher may simply reject everyone who drops out of the trial,
but most of those who drop out are those for whom it was not
working. Differential loss of subjects in the intervention and comparison
groups may change the characteristics of these groups and their outcomes
irrespective of the studied intervention.[12]
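The dieting example can be made concrete with a small invented data set. In the Python sketch below, the weight changes and dropout flags are hypothetical, and it is assumed for illustration that the outcome of dropouts is known, which is usually not the case in a real trial.

```python
from statistics import mean

# Hypothetical weight changes in kg (negative means weight was lost).
# In this invented data set, the dropouts are those for whom the
# program was not working.
participants = [
    {"change_kg": -6.0, "dropped_out": False},
    {"change_kg": -5.0, "dropped_out": False},
    {"change_kg": -4.0, "dropped_out": False},
    {"change_kg": -5.0, "dropped_out": False},
    {"change_kg": 0.0, "dropped_out": True},
    {"change_kg": 1.0, "dropped_out": True},
    {"change_kg": -1.0, "dropped_out": True},
    {"change_kg": 0.0, "dropped_out": True},
]


def mean_change(records, completers_only):
    """Average weight change, optionally discarding dropouts."""
    kept = [r["change_kg"] for r in records
            if not (completers_only and r["dropped_out"])]
    return mean(kept)


print(mean_change(participants, completers_only=True))   # completers only
print(mean_change(participants, completers_only=False))  # everyone enrolled
```

With these numbers the completers lose 5 kg on average, while the average over everyone enrolled is only 2.5 kg, so discarding dropouts doubles the apparent effect.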
Avoidance
In the general case, selection biases cannot be overcome with
statistical analysis of existing data alone, though Heckman
correction may be used in special cases. An informal assessment
of the degree of selection bias can be made by examining
correlations between (exogenous) background variables and a
treatment indicator. However, in regression models, it is the correlation between
unobserved determinants of the outcome and
unobserved determinants of selection into the sample that
biases estimates, and this correlation between unobservables cannot
be directly assessed from the observed determinants of treatment.[13]
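The informal assessment described above can be sketched as a correlation between one background variable and a 0/1 treatment indicator. The Python example below uses invented ages and treatment assignments, and the helper `pearson` is written out by hand for self-containment. It is not an implementation of the Heckman correction, and, as noted above, it says nothing about correlation between unobserved determinants.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences.

    Assumes neither sequence is constant (otherwise the denominator is 0).
    """
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical background variable (age) and two treatment assignments.
ages = [30, 35, 40, 45, 50, 55, 60, 65]
treated_by_age = [0, 0, 0, 0, 1, 1, 1, 1]   # only older subjects treated
treated_mixed = [0, 1, 1, 0, 0, 1, 1, 0]    # treatment unrelated to age

print(round(pearson(ages, treated_by_age), 3))
print(round(pearson(ages, treated_mixed), 3))
```

A correlation far from zero, as in the first assignment, suggests that selection into treatment depended on the background variable; a value near zero, as in the second, is reassuring only with respect to that observed variable.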
Related issues
Selection bias is closely related to:
- publication bias or reporting bias, the distortion produced in
community perception or meta-analyses by not publishing
uninteresting (usually negative) results, or results which go
against the experimenter's prejudices, a sponsor's interests, or
community expectations.
- confirmation bias, the distortion
produced by experiments that are designed to seek confirmatory
evidence instead of trying to disprove the hypothesis.
- exclusion bias, which results from applying different criteria to
cases and controls with regard to eligibility for participation in a
study, or from different variables serving as the basis for exclusion.
Notes
1. Dictionary of Cancer Terms: selection bias. Retrieved on September 23, 2009.
2. Medical Dictionary: "Sampling Bias". Retrieved on September 23, 2009.
3. TheFreeDictionary: biased sample. Retrieved on September 23, 2009. Site in turn cites: Mosby's Medical Dictionary, 8th edition.
4. Dictionary of Cancer Terms: Selection Bias. Retrieved on September 23, 2009.
5. Ards S, Chung C, Myers SL Jr. "The effects of sample selection bias on racial differences in child abuse reporting". Child Abuse Negl. 1999 Dec;23(12):1209; author reply 1211-5. PMID 9504213.
6. Corinna Cortes, Mehryar Mohri, Michael Riley, and Afshin Rostamizadeh. "Sample Selection Bias Correction Theory". New York University.
7. Page 262 in: Barbara Fadem. Behavioral Science. Board Review Series. ISBN 0781782570, 9780781782579. 216 pages.
8. Feinstein AR, Horwitz RI (November 1978). "A critique of the statistical evidence associating estrogens with endometrial cancer". Cancer Res. 38 (11 Pt 2): 4001–5. PMID 698947.
9. Tamim H, Monfared AA, LeLorier J (March 2007). "Application of lag-time into exposure definitions to control for protopathic bias". Pharmacoepidemiol Drug Saf 16 (3): 250–8. doi:10.1002/pds.1360. PMID 17245804.
10. Page 159 in: Matthew R. Weir (2005). Hypertension (Key Diseases) (Acp Key Diseases Series). Philadelphia, Pa: American College of Physicians. ISBN 1-930513-58-5.
11. Kruskal, W. (1960). "Some notes on wild observations". Technometrics.
12. Jüni P, Egger M. "Empirical evidence of attrition bias in clinical trials". Int J Epidemiol. 2005 Feb;34(1):87-8.
13. Heckman, J. (1979). "Sample selection bias as a specification error". Econometrica, 47, 153–61.