In the social sciences, scaling is the process of measuring or
ordering entities with respect to quantitative attributes or traits.
For example, a scaling technique might involve estimating
individuals' levels of extraversion, or the perceived quality of
products. Certain methods of scaling permit estimation of magnitudes
on a continuum, while other methods provide only for relative
ordering of the entities.
See level of measurement for an account of qualitatively different
kinds of measurement scales.
Comparative and noncomparative scaling
With comparative scaling, the items are directly compared with each
other (example: Do you prefer Pepsi or Coke?). In noncomparative
scaling, each item is scaled independently of the others (example:
How do you feel about Coke?).
Composite measures
Composite measures of variables are created by combining two or more
separate empirical indicators into a single measure. Composite
measures capture complex concepts more adequately than single
indicators, extend the range of scores available, and are more
efficient at handling multiple items.
In addition to scales, there are two other types of composite
measures. Indexes are similar to scales except multiple indicators
of a variable are combined into a single measure. The index of
consumer confidence, for example, is a combination of several
measures of consumer attitudes. A typology is similar to an index
except the variable is measured at the nominal level.
Indexes are constructed by accumulating scores assigned to
individual attributes, while scales are constructed through the
assignment of scores to patterns of attributes.
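The difference between the two constructions can be sketched in Python. This is a minimal illustration only: the indicator names, the respondent's answers, and the pattern table are hypothetical and do not come from any published instrument.

```python
# Hypothetical yes/no indicators for one respondent (1 = yes, 0 = no).
responses = {"voted": 1, "donated": 0, "attended_rally": 1}

def index_score(responses):
    """Index: accumulate the scores assigned to the individual attributes."""
    return sum(responses.values())

# Scale: a score is assigned to each whole pattern of attributes.
# This pattern table is an illustrative assumption.
PATTERN_SCORES = {
    (0, 0, 0): 0,
    (1, 0, 0): 1,
    (1, 1, 0): 2,
    (1, 1, 1): 3,
}

def scale_score(responses, order=("voted", "donated", "attended_rally")):
    """Return the score of the response pattern, or None if it is not in the table."""
    pattern = tuple(responses[name] for name in order)
    return PATTERN_SCORES.get(pattern)
```

In this sketch the index gives the respondent a score for any combination of answers, whereas the scale returns a score only when the answers form one of the recognized patterns.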
While indexes and scales provide measures of a single dimension,
typologies are often employed to examine the intersection of two or
more dimensions. Typologies are useful analytical tools and can
easily be used as independent variables. Because they are not
unidimensional, however, they are difficult to use as a dependent
variable.
Data types
The type of information collected can influence scale
construction. Different types of information are measured in
different ways.
- Some data are measured at the nominal level. That is, any numbers
used are mere labels: they express no mathematical properties.
Examples are SKU inventory codes and UPC bar codes.
- Some data are measured at the ordinal level. Numbers indicate the
relative position of items, but not the magnitude of difference. An
example is a preference ranking.
- Some data are measured at the interval level. Numbers indicate the
magnitude of difference between items, but there is no absolute zero
point. Examples are attitude scales and opinion scales.
- Some data are measured at the ratio level. Numbers indicate
magnitude of difference and there is a fixed zero point. Ratios can
be calculated. Examples include: age, income, price, costs, sales
revenue, sales volume, and market share.
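One practical consequence of the four levels is that they constrain which summary statistics are meaningful. The Python sketch below pairs each level with a conventional measure of central tendency. The pairing follows a common textbook convention and the function name `summarize` is a hypothetical choice, so treat it as an illustration rather than a rule.

```python
import statistics

def summarize(values, level):
    """Return a central-tendency summary conventionally used at the given level."""
    if level == "nominal":
        # Labels only: the most frequent category is the usual summary.
        return statistics.mode(values)
    if level == "ordinal":
        # Order but no distances: use a median that is an observed value.
        return statistics.median_low(values)
    if level == "interval":
        # Differences are meaningful: the arithmetic mean can be used.
        return statistics.mean(values)
    if level == "ratio":
        # A fixed zero makes ratios meaningful: the geometric mean can be used.
        return statistics.geometric_mean(values)
    raise ValueError(f"unknown level of measurement: {level}")
```

For example, `summarize(["A", "B", "A"], "nominal")` picks the most common label, while the same call on income data with `"ratio"` would return a geometric mean.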
Scale construction decisions
- What level of data is involved (nominal, ordinal, interval, or
ratio)?
- What will the results be used for?
- Should you use a scale, index, or typology?
- What types of statistical analysis would be useful?
- Should you use a comparative scale or a noncomparative
scale?
- How many scale divisions or categories should be used (1 to 10;
1 to 7; -3 to +3)?
- Should there be an odd or even number of divisions? (Odd gives
neutral center value; even forces respondents to take a non-neutral
position.)
- What should the nature and descriptiveness of the scale labels
be?
- What should the physical form or layout of the scale be?
(graphic, simple linear, vertical, horizontal)
- Should a response be forced or be left optional?
Comparative scaling techniques
- Pairwise comparison scale - a respondent is presented with two
items at a time and asked to select one (example: Do you prefer
Pepsi or Coke?). This is an ordinal level technique when a
measurement model is not applied. Krus and Kennedy (1977) elaborated
the paired comparison scaling within their domain-referenced model.
The Bradley-Terry-Luce (BTL) model (Bradley and Terry, 1952; Luce,
1959) can be applied in order to derive measurements provided the
data derived from paired comparisons possess an appropriate
structure. Thurstone's Law of comparative judgment can also be
applied in such contexts.
- Rasch model scaling - respondents interact with items and
comparisons are inferred between items from the responses to obtain
scale values. Respondents are subsequently also scaled based on
their responses to items given the item scale values. The Rasch
model has a close relation to the BTL model.
- Rank-order scale - a respondent is presented with several items
simultaneously and asked to rank them (example: Rank the following
advertisements from 1 to 10.). This is an ordinal level technique.
- Bogardus social distance scale - measures the degree to which a
person is willing to associate with a class or type of people. It
asks how willing the respondent is to make various associations.
The results are reduced to a single score on a scale. There are
also non-comparative versions of this scale.
- Q-Sort scale - Up to 140 items are sorted into groups based on a
rank-order procedure.
- Guttman scale - This is a procedure to determine whether a set of
items can be rank-ordered on a unidimensional scale. It utilizes the
intensity structure among several indicators of a given variable.
Statements are listed in order of importance. The rating is scaled
by summing all responses until the first negative response in the
list. The Guttman scale is related to Rasch measurement;
specifically, Rasch models bring the Guttman approach within a
probabilistic framework.
- Constant sum scale - a respondent is given a constant sum of
money, scrip, credits, or points and asked to allocate these to
various items (example: If you had 100 yen to spend on food
products, how much would you spend on product A, on product B, on
product C, etc.?). This is an ordinal level technique.
- Magnitude estimation scale - In a psychophysics procedure invented
by S. S. Stevens, people simply assign numbers to the dimension of
judgment. The geometric mean of those numbers usually produces a
power law with a characteristic exponent. In cross-modality
matching, instead of assigning numbers, people manipulate another
dimension, such as loudness or brightness, to match the items.
Typically the exponent of the psychometric function can be
predicted from the magnitude estimation exponents of each dimension.
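As an illustration of how the BTL model mentioned above turns paired-comparison counts into scale values, the following Python sketch estimates one positive strength per item so that the modelled probability of preferring item i to item j is strength[i] / (strength[i] + strength[j]). It uses a standard iterative update for this model. The function names and the win counts in the usage note are hypothetical, and the sketch assumes every item has been compared with the others and has been preferred at least once.

```python
def fit_btl(wins, iterations=200):
    """Estimate BTL strengths from a matrix of paired-comparison counts.

    wins[i][j] is the number of times item i was preferred over item j.
    Returns strengths normalized to sum to 1.
    """
    n = len(wins)
    strength = [1.0] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            # Total number of comparisons that item i won.
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            # Comparisons involving i, weighted by the current strengths.
            denom = sum(
                (wins[i][j] + wins[j][i]) / (strength[i] + strength[j])
                for j in range(n)
                if j != i
            )
            updated.append(total_wins / denom)
        total = sum(updated)
        strength = [s / total for s in updated]
    return strength

def preference_probability(strength, i, j):
    """Modelled probability that item i is preferred to item j."""
    return strength[i] / (strength[i] + strength[j])
```

With hypothetical counts such as `wins = [[0, 7, 9], [3, 0, 6], [1, 4, 0]]`, calling `fit_btl(wins)` yields three strengths on a common continuum, which is what distinguishes this from the purely ordinal reading of the raw choices.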
Non-comparative scaling techniques
- Continuous rating scale (also called the graphic rating scale) -
respondents rate items by placing a mark on a line. The line is
usually labeled at each end. Sometimes a series of numbers, called
scale points (say, from zero to 100), appears under the line.
Scoring and codification are difficult.
- Likert scale - Respondents are asked to indicate the amount of
agreement or disagreement (from strongly agree to strongly disagree)
on a five- to nine-point scale. The same format is used for multiple
questions. This categorical scaling procedure can easily be extended
to a magnitude estimation procedure that uses the full scale of
numbers rather than verbal categories.
- Phrase completion scales - Respondents are asked to complete a
phrase on an 11-point response scale in which 0 represents the
absence of the theoretical construct and 10 represents the theorized
maximum amount of the construct being measured. The same basic
format is used for multiple questions.
- Semantic differential scale - Respondents are asked to rate an
item on various attributes using a 7-point scale. Each attribute
requires a scale with bipolar terminal labels.
- Stapel scale - This is a unipolar ten-point
rating scale. It ranges from +5 to -5 and has no neutral zero
point.
- Thurstone scale - This is a
scaling technique that incorporates the intensity structure among
indicators.
- Mathematically derived scale - Researchers infer respondents'
evaluations mathematically. Two examples are multidimensional
scaling and conjoint analysis.
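To make the Likert procedure above concrete, the Python sketch below codes verbal categories as numbers and sums them across questions that share the same format. The five labels (including a neutral midpoint), the item names, and the optional reverse-keying of negatively worded items are assumptions made for illustration and are not prescribed by the description above.

```python
# Assumed five-point response format, coded 1 to 5.
LABELS = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]
CODES = {label: position + 1 for position, label in enumerate(LABELS)}

def likert_score(answers, reverse_keyed=()):
    """Sum the coded answers; items in reverse_keyed are flipped before summing."""
    total = 0
    for item, label in answers.items():
        code = CODES[label]
        if item in reverse_keyed:
            # Flip the code so that 1 becomes 5, 2 becomes 4, and so on.
            code = len(LABELS) + 1 - code
        total += code
    return total

# Hypothetical answers from one respondent.
answers = {"q1": "agree", "q2": "strongly agree", "q3": "disagree"}
```

Here `likert_score(answers)` adds 4, 5 and 2, while `likert_score(answers, reverse_keyed={"q3"})` treats the third question as negatively worded and adds 4, 5 and 4 instead.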
Scale evaluation
Scales should be tested for reliability, generalizability, and
validity. Generalizability is the ability to make inferences from a
sample to the population, given the scale you have selected.
Reliability is the extent to which a scale will
produce consistent results. Test-retest reliability checks how
similar the results are if the research is repeated under similar
circumstances. Alternative forms reliability checks how similar the
results are if the research is repeated using different forms of
the scale. Internal consistency reliability checks how well the
individual measures included in the scale are converted into a
composite measure.
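Internal consistency is often summarized with Cronbach's alpha, a statistic not named in the text above and introduced here only as one widely used option. The Python sketch below computes it from a table of item scores, under the assumptions that every respondent answered every item and that the total scores are not all identical. The function name and data layout are hypothetical.

```python
import statistics

def cronbach_alpha(rows):
    """Cronbach's alpha for rows of item scores, one row per respondent.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals)
    """
    k = len(rows[0])  # number of items in the scale
    item_variances = [
        statistics.variance([row[i] for row in rows]) for i in range(k)
    ]
    total_variance = statistics.variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)
```

Values near 1 indicate that the items vary together across respondents; for instance, two items that always receive identical scores give an alpha of 1.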
Scales and indexes have to be validated. Internal validation
checks the relation between the individual measures included in the
scale, and the composite scale itself. External validation checks
the relation between the composite scale and other indicators of
the variable, indicators not included in the scale. Content
validation (also called face validity) checks how well the scale
measures what it is supposed to measure. Criterion validation checks
how meaningful the scale criteria are relative to other possible
criteria. Construct validation checks what underlying construct is
being measured. There are three variants of construct validity:
convergent validity, discriminant validity, and nomological validity
(Campbell and Fiske, 1959; Krus and Ney, 1978). The coefficient of
reproducibility indicates how well the data from the individual
measures included in the scale can be reconstructed from the
composite scale.
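For a Guttman-type scale, the coefficient of reproducibility can be sketched as follows: each respondent's composite score is used to reconstruct the response pattern a perfectly cumulative scale would imply, and mismatches with the observed pattern are counted as errors. Several error-counting conventions exist. The one used here, like the function name and the 0/1 data layout, is an assumption for illustration.

```python
def coefficient_of_reproducibility(patterns):
    """Share of responses correctly reconstructed from the composite scores.

    patterns: one list of 0/1 responses per respondent, with items ordered
    from the most readily endorsed to the least readily endorsed.
    """
    errors = 0
    cells = 0
    for pattern in patterns:
        score = sum(pattern)
        # Pattern implied by the score under a perfectly cumulative scale.
        ideal = [1] * score + [0] * (len(pattern) - score)
        errors += sum(seen != expected for seen, expected in zip(pattern, ideal))
        cells += len(pattern)
    return 1 - errors / cells
```

A value of 1 means every response pattern can be recovered exactly from its score. Each response that departs from the cumulative pattern lowers the coefficient.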
References
- Bradley, R. A. & Terry, M. E. (1952). Rank analysis of incomplete
block designs, I. The method of paired comparisons. Biometrika, 39,
324-345.
- Campbell, D. T. & Fiske, D. W. (1959). Convergent and discriminant
validation by the multitrait-multimethod matrix. Psychological
Bulletin, 56, 81-105.
- Hodge, D. R. & Gillespie, D. F. (2003). Phrase Completions: An
alternative to Likert scales. Social Work Research, 27(1), 45-55.
- Hodge, D. R. & Gillespie, D. F. (2005). Phrase Completion Scales.
In K. Kempf-Leonard (Editor). Encyclopedia of Social Measurement.
(Vol. 3, pp. 53–62). San Diego: Academic Press.
- Krus, D. J. & Kennedy, P. H. (1977). Normal scaling of dominance
matrices: The domain-referenced model. Educational and Psychological
Measurement, 37, 189-193.
- Krus, D. J. & Ney, R. G. (1978). Convergent and discriminant
validity in item analysis. Educational and Psychological
Measurement, 38, 135-137.
- Luce, R. D. (1959). Individual Choice Behavior: A Theoretical
Analysis. New York: J. Wiley.