Specificity (tests): Wikis

From Wikipedia, the free encyclopedia

Sensitivity and specificity are statistical measures of the performance of a binary classification test. Sensitivity (also called recall rate in some fields) measures the proportion of actual positives which are correctly identified as such (e.g. the percentage of sick people who are identified as having the condition). Specificity measures the proportion of negatives which are correctly identified (e.g. the percentage of healthy people who are identified as not having the condition). These two measures are closely related to the concepts of type I and type II errors. A theoretical, optimal prediction can achieve 100% sensitivity (i.e. predict all people from the sick group as sick) and 100% specificity (i.e. not predict anyone from the healthy group as sick).

For any test, there is usually a trade-off between the two measures. For example, in an airport security setting in which one is testing for potential threats to airline safety, we are generally willing to risk unnecessarily detaining passengers because of their belt buckles and keys (low specificity) in order to increase the chance of identifying nearly all objects that pose a threat to the aircraft and all those aboard (high sensitivity). This trade-off can be represented graphically using a ROC curve.
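
The trade-off can be illustrated with a small sketch. The scores, labels and thresholds below are hypothetical, and the function name `rates_at_threshold` is an assumption made for illustration; the idea is only that lowering the decision threshold tends to raise sensitivity at the cost of specificity, which is what a ROC curve traces out.

```python
def rates_at_threshold(scores, labels, threshold):
    """Return (sensitivity, specificity) when score >= threshold counts as positive."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y and s >= threshold)
    fn = sum(1 for s, y in pairs if y and s < threshold)
    tn = sum(1 for s, y in pairs if not y and s < threshold)
    fp = sum(1 for s, y in pairs if not y and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical screening scores; True marks an actual threat.
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.9]
labels = [False, False, False, True, False, True, True, True]

# Each threshold gives one point on the ROC curve.
for threshold in (0.8, 0.45, 0.35):
    sens, spec = rates_at_threshold(scores, labels, threshold)
    print(f"threshold={threshold}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

With these made-up numbers, the strictest threshold catches only one threat in four but raises no false alarms, while the most lenient one catches every threat and flags one harmless item in four.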

Definitions

Imagine a scenario where people are tested for a disease. The test outcome can be positive (sick) or negative (healthy), while the actual health status of the persons may be different. In that setting:

  • True positive: Sick people correctly diagnosed as sick
  • False positive: Healthy people incorrectly identified as sick
  • True negative: Healthy people correctly identified as healthy
  • False negative: Sick people incorrectly identified as healthy.
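
The four outcomes can be counted mechanically from paired observations. The following is a minimal sketch with made-up data; the function name `confusion_counts` and the boolean encoding (True = sick, or test positive) are assumptions for illustration.

```python
from collections import Counter

def confusion_counts(actual, predicted):
    """Count TP, FP, TN and FN from paired booleans (True = sick / test positive)."""
    counts = Counter()
    for is_sick, test_positive in zip(actual, predicted):
        if is_sick and test_positive:
            counts["TP"] += 1      # sick, correctly diagnosed as sick
        elif not is_sick and test_positive:
            counts["FP"] += 1      # healthy, incorrectly identified as sick
        elif not is_sick and not test_positive:
            counts["TN"] += 1      # healthy, correctly identified as healthy
        else:
            counts["FN"] += 1      # sick, incorrectly identified as healthy
    return counts

# Hypothetical data for six people.
actual    = [True, True, False, False, False, True]
predicted = [True, False, False, True, False, True]
counts = confusion_counts(actual, predicted)
print(dict(counts))
```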

Specificity

To use the example of a detection dog used by law enforcement to track drugs, one dog may be trained specifically to find cocaine, while another may be trained to find cocaine, heroin and marijuana. The second dog is looking for so many smells that it can get confused and sometimes picks out odours such as shampoo, so it will lead the law enforcement agents to innocent packages; it is therefore less specific. A much larger number of packages will be "picked up" as suspicious by the second dog, leading to what are called false positives: test results labeled as positive (drugs) that are really negative (shampoo).

The first dog does not miss any cocaine and does not pick out any shampoo; the latter is what makes it very specific. If it makes a call, it has a high chance of being right. The second dog finds more drugs (higher sensitivity) but is less specific for drugs because it also finds shampoo. It makes more calls, but each call has more chance of being wrong. Which dog to choose depends on the goal.

{\rm specificity}=\frac{\rm number\ of\ True\ Negatives}{{\rm number\ of\ True\ Negatives}+{\rm number\ of\ False\ Positives}}

A specificity of 100% means that the test recognizes all actual negatives; for example, all healthy people will be recognized as healthy. Because 100% specificity means that no negatives are erroneously tagged as positive, a positive result in a high-specificity test is used to confirm the disease. The maximum can trivially be achieved by a test that declares everybody healthy regardless of the true condition. Therefore, specificity alone does not tell us how well the test recognizes positive cases. We also need to know the sensitivity of the test.
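
A minimal sketch of this degenerate case, using a hypothetical population of 10 sick and 90 healthy people: a test that declares everybody healthy reaches perfect specificity while detecting nobody.

```python
def specificity(tn, fp):
    """True negatives divided by all actual negatives."""
    return tn / (tn + fp)

def sensitivity(tp, fn):
    """True positives divided by all actual positives."""
    return tp / (tp + fn)

# Hypothetical population: 10 sick, 90 healthy. The test calls everyone healthy,
# so every sick person is a false negative and every healthy person a true negative.
tp, fn = 0, 10
tn, fp = 90, 0

always_healthy_spec = specificity(tn, fp)
always_healthy_sens = sensitivity(tp, fn)
print(f"specificity={always_healthy_spec:.0%}, sensitivity={always_healthy_sens:.0%}")
```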

A test with a high specificity has a low type I error rate.

Specificity is sometimes confused with precision, also called the positive predictive value, which is the fraction of returned positives that are true positives. The distinction is critical when the classes are of different sizes. A test with very high specificity can have very low precision if there are far more actual negatives than actual positives, and vice versa.
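
The effect of class imbalance can be seen in a small numerical sketch. All counts below are hypothetical: 10 actual positives and 10,000 actual negatives, with a test that is 90% sensitive and 99% specific.

```python
# Hypothetical, heavily imbalanced population.
tp, fn = 9, 1          # 10 actual positives, 90% detected
tn, fp = 9900, 100     # 10,000 actual negatives, 1% falsely flagged

spec = tn / (tn + fp)        # fraction of actual negatives cleared
precision = tp / (tp + fp)   # fraction of positive calls that are correct

print(f"specificity={spec:.2%}, precision={precision:.2%}")
```

Even though only 1% of negatives are misclassified, those 100 false positives outnumber the 9 true positives, so fewer than one positive call in ten is correct.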

Sensitivity

Continuing with the example of the law enforcement tracking dog, an old dog might be retired because its nose becomes less sensitive to picking up the odor of drugs, and it begins to miss lots of drugs that it ordinarily would have sniffed out. This dog illustrates poor sensitivity, as it would give an "all clear" to not only those packages that do not contain any drugs (true negatives), but also to some packages that do contain drugs (false negatives).

{\rm sensitivity}=\frac{\rm number\ of\ True\ Positives}{{\rm number\ of\ True\ Positives}+{\rm number\ of\ False\ Negatives}}

A sensitivity of 100% means that the test recognizes all actual positives - for example, all sick people are recognized as being ill. Thus, in contrast to a high specificity test, negative results in a high sensitivity test are used to rule out the disease.

Sensitivity alone does not tell us how well the test handles the other class (that is, the negative cases). In binary classification, as illustrated above, that information is given by the corresponding specificity.

Sensitivity is not the same as the positive predictive value (ratio of true positives to combined true and false positives), which is as much a statement about the proportion of actual positives in the population being tested as it is about the test.

The calculation of sensitivity does not take into account indeterminate test results. If a test cannot be repeated, the options are to exclude indeterminate samples from analysis (but the number of exclusions should be stated when quoting sensitivity), or, alternatively, indeterminate samples can be treated as false negatives (which gives the worst-case value for sensitivity and may therefore underestimate it).
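
The two options can be compared side by side. This is a sketch under stated assumptions: the counts are hypothetical, the function names are illustrative, and `indeterminate` is taken to mean indeterminate results among people who actually have the condition.

```python
def sensitivity_excluding(tp, fn):
    """Sensitivity with indeterminate samples left out of the analysis."""
    return tp / (tp + fn)

def sensitivity_worst_case(tp, fn, indeterminate):
    """Sensitivity with every indeterminate sample counted as a false negative."""
    return tp / (tp + fn + indeterminate)

# Hypothetical counts among people who truly have the condition.
tp, fn, indeterminate = 45, 5, 10

excluded = sensitivity_excluding(tp, fn)
worst = sensitivity_worst_case(tp, fn, indeterminate)
print(f"excluding {indeterminate} indeterminate samples: {excluded:.0%}")
print(f"treating them as false negatives: {worst:.0%}")
```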

A test with a high sensitivity has a low type II error rate.

Worked example

Relationships among terms

                        Condition (as determined by "Gold standard")
                        Positive               Negative
Test       Positive     True Positive          False Positive              → Positive predictive value
outcome                                        (Type I error, P-value)
           Negative     False Negative         True Negative               → Negative predictive value
                        (Type II error)
                        ↓                      ↓
                        Sensitivity            Specificity

A worked example

The fecal occult blood (FOB) screen test was used in 203 people to look for bowel cancer:

                        Patients with bowel cancer (as confirmed on endoscopy)
                        Positive        Negative
FOB test   Positive     TP = 2          FP = 18
           Negative     FN = 1          TN = 182

  • Positive predictive value = TP / (TP + FP) = 2 / (2 + 18) = 2 / 20 = 10%
  • Negative predictive value = TN / (FN + TN) = 182 / (1 + 182) = 182 / 183 ≈ 99.5%
  • Sensitivity = TP / (TP + FN) = 2 / (2 + 1) = 2 / 3 ≈ 66.67%
  • Specificity = TN / (FP + TN) = 182 / (18 + 182) = 182 / 200 = 91%

Related calculations

  • False positive rate (α) = FP / (FP + TN) = 18 / (18 + 182) = 9% = 1 − specificity
  • False negative rate (β) = FN / (TP + FN) = 1 / (2 + 1) ≈ 33.33% = 1 − sensitivity
  • Power = sensitivity = 1 − β
  • Likelihood ratio positive = sensitivity / (1 − specificity) = 66.67% / (1 − 91%) = 7.4
  • Likelihood ratio negative = (1 − sensitivity) / specificity = (1 − 66.67%) / 91% = 0.37

Hence, with a large number of false positives and few false negatives, a positive FOB screen test is in itself poor at confirming cancer (PPV = 10%), and further investigations must be undertaken; it will, however, pick up 66.7% of all cancers (the sensitivity). As a screening test, a negative result is very good at reassuring that a patient does not have cancer (NPV = 99.5%), and this initial screen correctly identifies 91% of those who do not have cancer (the specificity).
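
The figures in this worked example can be reproduced with a few lines of arithmetic. The sketch below uses only the four counts given for the FOB test; the variable names are illustrative.

```python
# Counts from the FOB worked example.
TP, FP, FN, TN = 2, 18, 1, 182

ppv = TP / (TP + FP)                  # positive predictive value
npv = TN / (TN + FN)                  # negative predictive value
sensitivity = TP / (TP + FN)
specificity = TN / (TN + FP)
fpr = FP / (FP + TN)                  # false positive rate = 1 - specificity
fnr = FN / (TP + FN)                  # false negative rate = 1 - sensitivity
lr_positive = sensitivity / (1 - specificity)
lr_negative = (1 - sensitivity) / specificity

for name, value in [("PPV", ppv), ("NPV", npv),
                    ("sensitivity", sensitivity), ("specificity", specificity),
                    ("FPR", fpr), ("FNR", fnr)]:
    print(f"{name}: {value:.2%}")
print(f"LR+: {lr_positive:.1f}, LR-: {lr_negative:.2f}")
```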

Terminology in information retrieval

In information retrieval positive predictive value is called precision, and sensitivity is called recall.

The F-measure can be used as a single measure of performance of the test. The F-measure is the harmonic mean of precision and recall:

F = 2 \times \frac{{\rm precision} \times {\rm recall} }{ {\rm precision} + {\rm recall}}.
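
As a minimal sketch, the F-measure can be evaluated for the FOB worked example, where precision (PPV) is 2/20 and recall (sensitivity) is 2/3. The function name `f_measure` and the convention of returning zero when both inputs are zero are assumptions made for illustration.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (taken as zero if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Precision and recall from the FOB worked example.
f_fob = f_measure(precision=2 / 20, recall=2 / 3)
print(f"F-measure: {f_fob:.3f}")
```

Because the harmonic mean is pulled toward the smaller of its two inputs, the low precision dominates the result even though recall is moderate.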

In the traditional language of statistical hypothesis testing, the sensitivity of a test is called the statistical power of the test, although the word power in that context has a more general usage that is not applicable in the present context. A sensitive test will have fewer Type II errors.

Terminology and derivations from a confusion matrix

  • true positive (TP): equivalent to hit
  • true negative (TN): equivalent to correct rejection
  • false positive (FP): equivalent to false alarm, Type I error
  • false negative (FN): equivalent to miss, Type II error
  • sensitivity or true positive rate (TPR): equivalent to hit rate, recall; TPR = TP / P = TP / (TP + FN)
  • false positive rate (FPR): equivalent to false alarm rate, fall-out; FPR = FP / N = FP / (FP + TN)
  • accuracy (ACC): ACC = (TP + TN) / (P + N)
  • specificity (SPC) or true negative rate: SPC = TN / N = TN / (FP + TN) = 1 − FPR
  • positive predictive value (PPV): equivalent to precision; PPV = TP / (TP + FP)
  • negative predictive value (NPV): NPV = TN / (TN + FN)
  • false discovery rate (FDR): FDR = FP / (FP + TP)
  • Matthews correlation coefficient (MCC): MCC = (TP × TN − FP × FN) / √(P × N × P′ × N′), where P′ = TP + FP and N′ = TN + FN are the numbers of positive and negative test results

Source: Fawcett (2004).
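
The derivations above can be collected into one helper. This is a sketch, not a reference implementation: the function name `derived_metrics` is an assumption, the counts are those of the FOB worked example, and no guard is included for the case where a row or column total is zero (which would cause a division by zero).

```python
import math

def derived_metrics(tp, fp, fn, tn):
    """Compute the listed rates from the four confusion-matrix counts."""
    p, n = tp + fn, fp + tn              # actual positives and negatives (P, N)
    p_pred, n_pred = tp + fp, tn + fn    # positive and negative test results (P', N')
    return {
        "TPR": tp / p,
        "FPR": fp / n,
        "ACC": (tp + tn) / (p + n),
        "SPC": tn / n,
        "PPV": tp / p_pred,
        "NPV": tn / n_pred,
        "FDR": fp / p_pred,
        "MCC": (tp * tn - fp * fn) / math.sqrt(p * n * p_pred * n_pred),
    }

# Counts from the FOB worked example.
metrics = derived_metrics(tp=2, fp=18, fn=1, tn=182)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```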
