Factor analysis is a statistical method used to describe variability among observed variables in terms of fewer unobserved variables called factors. The observed variables are modeled as linear combinations of the factors, plus "error" terms. The information gained about the interdependencies can be used later to reduce the set of variables in a dataset. Factor analysis originated in psychometrics, and is used in behavioral sciences, social sciences, marketing, product management, operations research, and other applied sciences that deal with large quantities of data.
Factor analysis is related to principal component analysis (PCA), but the two are not identical. Because PCA performs a variance-maximizing rotation of the variable space, it takes into account all variability in the variables. In contrast, factor analysis estimates how much of the variability is due to common factors ("communality"). The two methods become essentially equivalent if the error terms in the factor analysis model (the variability not explained by common factors, see below) can be assumed to all have the same variance.
Suppose we have a set of $p$ observable random variables, $x_1, \dots, x_p$, with means $\mu_1, \dots, \mu_p$.
Suppose for some unknown constants $\ell_{ij}$ and $k$ unobserved random variables $F_j$, where $i \in \{1, \dots, p\}$ and $j \in \{1, \dots, k\}$, with $k < p$, we have

$$x_i - \mu_i = \ell_{i1} F_1 + \cdots + \ell_{ik} F_k + \varepsilon_i.$$
Here the $\varepsilon_i$ are independently distributed error terms with zero mean and finite variance, which may not be the same for all $i$. Let $\operatorname{Var}(\varepsilon_i) = \psi_i$, so that we have

$$\operatorname{Cov}(\varepsilon) = \operatorname{diag}(\psi_1, \dots, \psi_p) = \Psi, \qquad \operatorname{E}(\varepsilon) = 0.$$
In matrix terms, we have

$$x - \mu = L F + \varepsilon.$$
Also we will impose the following assumptions on $F$:

1. $F$ and $\varepsilon$ are independent.
2. $\operatorname{E}(F) = 0$.
3. $\operatorname{Cov}(F) = I$ (the factors are uncorrelated and have unit variance).
Any solution of the above set of equations satisfying the constraints on $F$ is defined as the factors, and $L$ as the loading matrix.
Suppose $\operatorname{Cov}(x) = \Sigma$. Then note that from the conditions just imposed on $F$, we have

$$\operatorname{Cov}(x - \mu) = \operatorname{Cov}(L F + \varepsilon),$$

or

$$\Sigma = L \operatorname{Cov}(F) L^{T} + \operatorname{Cov}(\varepsilon),$$

or

$$\Sigma = L L^{T} + \Psi.$$
Note that for any orthogonal matrix $Q$, if we set $L' = L Q$ and $F' = Q^{T} F$, the criteria for being factors and factor loadings still hold. Hence a set of factors and factor loadings is unique only up to an orthogonal transformation.
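The following minimal sketch (assuming NumPy; the loadings and error variances are made-up illustrative values) demonstrates this numerically: a rotated pair $L' = LQ$, $F' = Q^{T}F$ implies exactly the same covariance $\Sigma = LL^{T} + \Psi$.

```python
import numpy as np

# Made-up loadings for p = 4 variables and k = 2 factors
L = np.array([[0.9, 0.1],
              [0.8, 0.2],
              [0.2, 0.7],
              [0.1, 0.8]])
Psi = np.diag([0.18, 0.32, 0.47, 0.34])  # error variances psi_i

Sigma = L @ L.T + Psi                    # implied covariance of x

# Any orthogonal Q yields an equally valid solution L' = LQ
theta = 0.3
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
L_rot = L @ Q

print(np.allclose(Sigma, L_rot @ L_rot.T + Psi))  # True: same implied covariance
```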
The following example is a simplification for expository purposes, and should not be taken to be realistic. Suppose a psychologist proposes a theory that there are two kinds of intelligence, "verbal intelligence" and "mathematical intelligence", neither of which is directly observed. Evidence for the theory is sought in the examination scores from each of 10 different academic fields of 1000 students. If each student is chosen randomly from a large population, then each student's 10 scores are random variables. The psychologist's theory may say that for each of the 10 academic fields, the score averaged over the group of all students who share some common pair of values for verbal and mathematical "intelligences" is some constant times their level of verbal intelligence plus another constant times their level of mathematical intelligence, i.e., it is a linear combination of those two "factors". The numbers for a particular subject, by which the two kinds of intelligence are multiplied to obtain the expected score, are posited by the theory to be the same for all intelligence level pairs, and are called "factor loadings" for this subject. For example, the theory may hold that the average student's aptitude in the field of amphibiology is

$$\{10 \times \text{the student's verbal intelligence}\} + \{6 \times \text{the student's mathematical intelligence}\}.$$
The numbers 10 and 6 are the factor loadings associated with amphibiology. Other academic subjects may have different factor loadings.
Two students having identical degrees of verbal intelligence and identical degrees of mathematical intelligence may have different aptitudes in amphibiology because individual aptitudes differ from average aptitudes. That difference is called the "error" — a statistical term that means the amount by which an individual differs from what is average for his or her levels of intelligence (see errors and residuals in statistics).
The observable data that go into factor analysis would be 10 scores of each of the 1000 students, a total of 10,000 numbers. The factor loadings and levels of the two kinds of intelligence of each student must be inferred from the data.
In the example above, for $i = 1, \dots, 1000$ the $i$th student's scores are

$$\begin{aligned} x_{1,i} &= \mu_1 + \ell_{1,1} v_i + \ell_{1,2} m_i + \varepsilon_{1,i} \\ &\ \ \vdots \\ x_{10,i} &= \mu_{10} + \ell_{10,1} v_i + \ell_{10,2} m_i + \varepsilon_{10,i} \end{aligned}$$

where

- $x_{k,i}$ is the $i$th student's score for the $k$th subject,
- $\mu_k$ is the mean of the students' scores for the $k$th subject,
- $v_i$ is the $i$th student's "verbal intelligence",
- $m_i$ is the $i$th student's "mathematical intelligence", and
- $\ell_{k,j}$ are the factor loadings for the $k$th subject, for $j = 1, 2$.
In matrix notation, we have

$$X = \mu \mathbf{1} + L F + \varepsilon,$$

where

- $X$ is a $10 \times 1000$ matrix of observable random variables (the scores),
- $\mu$ is a $10 \times 1$ column vector of unobservable constants (the subject means) and $\mathbf{1}$ is a $1 \times 1000$ row vector of ones,
- $L$ is a $10 \times 2$ matrix of factor loadings,
- $F$ is a $2 \times 1000$ matrix of unobserved random variables (the two kinds of intelligence of each student), and
- $\varepsilon$ is a $10 \times 1000$ matrix of error terms.
Observe that doubling the scale on which "verbal intelligence" (the first component in each column of F) is measured, while simultaneously halving the factor loadings for verbal intelligence, makes no difference to the model. Thus, no generality is lost by assuming that the standard deviation of verbal intelligence is 1. Likewise for mathematical intelligence. Moreover, for similar reasons, no generality is lost by assuming the two factors are uncorrelated with each other. The "errors" ε are taken to be independent of each other. The variances of the "errors" associated with the 10 different subjects are not assumed to be equal.
Note that, since any rotation of a solution is also a solution, interpreting the factors is difficult. See disadvantages below. In this particular example, if we do not know beforehand that the two types of intelligence are uncorrelated, then we cannot interpret the two factors as the two different types of intelligence. Even if they are uncorrelated, we cannot tell which factor corresponds to verbal intelligence and which corresponds to mathematical intelligence without an outside argument.
The values of the loadings L, the averages μ, and the variances of the "errors" ε must be estimated given the observed data X.
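As a sketch of one standard estimation route (maximum likelihood, here via scikit-learn's FactorAnalysis; the simulated data and all parameter values are invented purely for illustration):

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n, p, k = 1000, 10, 2

# Simulate the model x = mu + L F + eps with made-up parameters
L_true = rng.uniform(-1.0, 1.0, size=(p, k))
F = rng.normal(size=(n, k))
eps = rng.normal(size=(n, p)) * 0.5
X = 5.0 + F @ L_true.T + eps

fa = FactorAnalysis(n_components=k).fit(X)
L_hat = fa.components_.T      # estimated loadings (p x k), determined only up to rotation
psi_hat = fa.noise_variance_  # estimated error variances psi_i
mu_hat = fa.mean_             # estimated means mu_i
```

Note that, per the rotation argument above, the estimated loadings can only match the true ones up to an orthogonal transformation.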
Exploratory factor analysis (EFA) is used to uncover the underlying structure of a relatively large set of variables. The researcher's a priori assumption is that any indicator may be associated with any factor. This is the most common form of factor analysis. There is no prior theory and one uses factor loadings to intuit the factor structure of the data.
Confirmatory factor analysis (CFA) seeks to determine if the number of factors and the loadings of measured (indicator) variables on them conform to what is expected on the basis of pre-established theory. Indicator variables are selected on the basis of prior theory and factor analysis is used to see if they load as predicted on the expected number of factors. The researcher's a priori assumption is that each factor (the number and labels of which may be specified a priori) is associated with a specified subset of indicator variables. A minimum requirement of confirmatory factor analysis is that one hypothesizes beforehand the number of factors in the model, but usually also the researcher will posit expectations about which variables will load on which factors. The researcher seeks to determine, for instance, if measures created to represent a latent variable really belong together.
Principal component analysis (PCA): The most common form of factor analysis, PCA seeks a linear combination of variables such that the maximum variance is extracted from the variables. It then removes this variance and seeks a second linear combination which explains the maximum proportion of the remaining variance, and so on. This is called the principal axis method and results in orthogonal (uncorrelated) factors.
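A minimal sketch of this extraction (assuming NumPy; the data are simulated only to provide a correlation matrix to work with): because the successive variance-maximizing directions are the eigenvectors of the correlation matrix, one ordered eigendecomposition yields all the components at once.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))
X[:, 3:] += X[:, :3]                     # induce correlation among variables
R = np.corrcoef(X, rowvar=False)

eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]        # components ordered by variance extracted
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Component loadings: eigenvector columns scaled by sqrt(eigenvalue)
loadings = eigvecs * np.sqrt(np.maximum(eigvals, 0.0))
print(eigvals / eigvals.sum())           # proportion of variance per component
```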
Canonical factor analysis, also called Rao's canonical factoring, is a different method of computing the same model as PCA, which uses the principal axis method. Canonical factor analysis seeks factors which have the highest canonical correlation with the observed variables, and it is unaffected by arbitrary rescaling of the data.
Common factor analysis, also called principal factor analysis (PFA) or principal axis factoring (PAF), seeks the least number of factors which can account for the common variance (correlation) of a set of variables.
Image factoring: based on the correlation matrix of predicted variables rather than actual variables, where each variable is predicted from the others using multiple regression.

Alpha factoring: based on maximizing the reliability of factors, assuming variables are randomly sampled from a universe of variables. All other methods assume cases to be sampled and variables fixed.
Factor loadings: The factor loadings, also called component loadings in PCA, are the correlation coefficients between the variables (rows) and factors (columns). Analogous to Pearson's r, the squared factor loading is the percent of variance in that indicator variable explained by the factor. To get the percent of variance in all the variables accounted for by each factor, add the sum of the squared factor loadings for that factor (column) and divide by the number of variables. (Note the number of variables equals the sum of their variances as the variance of a standardized variable is 1.) This is the same as dividing the factor's eigenvalue by the number of variables.
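As a sketch of that arithmetic (NumPy, with a made-up loading matrix for five standardized variables and two factors):

```python
import numpy as np

# Made-up loadings: variables are rows, factors are columns
loadings = np.array([[0.80, 0.10],
                     [0.75, 0.20],
                     [0.70, 0.15],
                     [0.10, 0.85],
                     [0.20, 0.80]])

eigenvalues = (loadings**2).sum(axis=0)        # sum of squared loadings per factor
pct_variance = eigenvalues / loadings.shape[0] # divide by the number of variables
print(pct_variance)                            # share of total variance per factor
```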
Interpreting factor loadings: By one rule of thumb in confirmatory factor analysis, loadings should be .7 or higher to confirm that independent variables identified a priori are represented by a particular factor, on the rationale that the .7 level corresponds to about half of the variance in the indicator being explained by the factor. However, the .7 standard is a high one and real-life data may well not meet this criterion, which is why some researchers, particularly for exploratory purposes, will use a lower level such as .4 for the central factor and .25 for other factors. Others call loadings above .6 "high" and those below .4 "low". In any event, factor loadings must be interpreted in the light of theory, not by arbitrary cutoff levels.
In oblique rotation, one gets both a pattern matrix and a structure matrix. The structure matrix is simply the factor loading matrix as in orthogonal rotation, representing the variance in a measured variable explained by a factor on both a unique and common contributions basis. The pattern matrix, in contrast, contains coefficients which just represent unique contributions. The more factors, the lower the pattern coefficients as a rule since there will be more common contributions to variance explained. For oblique rotation, the researcher looks at both the structure and pattern coefficients when attributing a label to a factor.
Communality (h²): The sum of the squared factor loadings for all factors for a given variable (row) is the variance in that variable accounted for by all the factors, and this is called the communality. The communality measures the percent of variance in a given variable explained by all the factors jointly and may be interpreted as the reliability of the indicator.

Spurious solutions: If the communality exceeds 1.0, there is a spurious solution, which may reflect too small a sample or a choice of too many or too few factors.
Uniqueness of a variable: 1 − h². That is, uniqueness is the variability of a variable minus its communality.
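Both quantities are simple row-wise operations on the loading matrix; a sketch with the same kind of made-up loadings as above:

```python
import numpy as np

loadings = np.array([[0.80, 0.10],   # variables (rows) x factors (columns)
                     [0.75, 0.20],
                     [0.10, 0.85]])

h2 = (loadings**2).sum(axis=1)  # communality: variance explained by all factors
u2 = 1.0 - h2                   # uniqueness of each standardized variable
print(h2, u2)
```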
Eigenvalues (characteristic roots): The eigenvalue for a given factor measures the variance in all the variables which is accounted for by that factor. The ratio of eigenvalues is the ratio of explanatory importance of the factors with respect to the variables. If a factor has a low eigenvalue, then it is contributing little to the explanation of variances in the variables and may be ignored as redundant with more important factors. Eigenvalues measure the amount of variation in the total sample accounted for by each factor.
Extraction sums of squared loadings: Initial eigenvalues and eigenvalues after extraction (listed by SPSS as "Extraction Sums of Squared Loadings") are the same for PCA extraction, but for other extraction methods, eigenvalues after extraction will be lower than their initial counterparts. SPSS also prints "Rotation Sums of Squared Loadings" and even for PCA, these eigenvalues will differ from initial and extraction eigenvalues, though their total will be the same.
Factor scores: Also called component scores in PCA, factor scores are the scores of each case (row) on each factor (column). To compute the factor score for a given case for a given factor, one takes the case's standardized score on each variable, multiplies by the corresponding factor loading of the variable for the given factor, and sums these products. Computing factor scores allows one to look for factor outliers. Also, factor scores may be used as variables in subsequent modeling.
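A sketch of the summed-products computation just described (NumPy, with invented raw scores; note that regression-based factor scores, common in statistical packages, are computed differently):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(loc=50.0, scale=10.0, size=(100, 3))  # made-up raw scores
loadings = np.array([[0.8],                          # 3 variables, 1 factor
                     [0.7],
                     [0.6]])

# Standardize each variable, then sum the loading-weighted standardized scores
Z = (X - X.mean(axis=0)) / X.std(axis=0)
factor_scores = Z @ loadings                         # one score per case (row)
```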
Comprehensibility: Though not a strictly mathematical criterion, there is much to be said for limiting the number of factors to those whose dimension of meaning is readily comprehensible. Often this is the first two or three. Using one or more of the methods below, the researcher determines an appropriate range of solutions to investigate. For instance, the Kaiser criterion may suggest three factors and the scree test may suggest five, so the researcher may request 3-, 4-, and 5-factor solutions and select the solution which generates the most comprehensible factor structure.
Kaiser criterion: The Kaiser rule is to drop all components with eigenvalues under 1.0. The Kaiser criterion is the default in SPSS and most computer programs but is not recommended when used as the sole cutoff criterion for estimating the number of factors.
Scree plot: The Cattell scree test plots the components as the X-axis and the corresponding eigenvalues as the Y-axis. As one moves to the right, toward later components, the eigenvalues drop. When the drop ceases and the curve makes an elbow toward less steep decline, Cattell's scree test says to drop all further components after the one starting the elbow. This rule is sometimes criticised for being amenable to researcher-controlled "fudging." That is, as picking the "elbow" can be subjective because the curve has multiple elbows or is a smooth curve, the researcher may be tempted to set the cutoff at the number of factors desired by his or her research agenda.
Variance explained criteria: Some researchers simply use the rule of keeping enough factors to account for 90% (sometimes 80%) of the variation. Where the researcher's goal emphasizes parsimony (explaining variance with as few factors as possible), the criterion could be as low as 50%.
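The sketch below (assuming NumPy and Matplotlib, with made-up eigenvalues) applies the Kaiser and variance-explained rules and draws the scree plot used for the elbow judgment:

```python
import numpy as np
import matplotlib.pyplot as plt

eigenvalues = np.array([3.1, 1.4, 0.9, 0.3, 0.2, 0.1])  # made-up, sorted

# Kaiser criterion: keep components with eigenvalue above 1.0
k_kaiser = int((eigenvalues > 1.0).sum())

# Variance-explained criterion: smallest k reaching the chosen threshold
cum = np.cumsum(eigenvalues) / eigenvalues.sum()
k_var80 = int(np.argmax(cum >= 0.80)) + 1

# Scree plot: the analyst looks for the elbow by eye
plt.plot(range(1, len(eigenvalues) + 1), eigenvalues, "o-")
plt.xlabel("Component")
plt.ylabel("Eigenvalue")
plt.show()
```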
Before dropping a factor below one's cutoff, however, the researcher should check its correlation with the dependent variable. A very small factor can have a large correlation with the dependent variable, in which case it should not be dropped.
Rotation serves to make the output more understandable and is usually necessary to facilitate the interpretation of factors.
Varimax rotation is an orthogonal rotation of the factor axes to maximize the variance of the squared loadings of a factor (column) on all the variables (rows) in a factor matrix, which has the effect of differentiating the original variables by extracted factor. Each factor will tend to have either large or small loadings of any particular variable. A varimax solution yields results which make it as easy as possible to identify each variable with a single factor. This is the most common rotation option.
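For concreteness, here is a compact NumPy sketch of the widely used SVD-based varimax iteration (a common textbook formulation, not tied to any particular package):

```python
import numpy as np

def varimax(L, gamma=1.0, max_iter=100, tol=1e-6):
    """Orthogonally rotate loading matrix L to maximize the varimax criterion."""
    p, k = L.shape
    R = np.eye(k)                  # accumulated rotation
    d = 0.0
    for _ in range(max_iter):
        Lr = L @ R
        # SVD step of the standard varimax fixed-point iteration
        u, s, vt = np.linalg.svd(
            L.T @ (Lr**3 - (gamma / p) * Lr @ np.diag((Lr**2).sum(axis=0)))
        )
        R = u @ vt
        if s.sum() < d * (1.0 + tol):
            break                  # criterion stopped improving
        d = s.sum()
    return L @ R
```

Applied to an unrotated loading matrix, `varimax(loadings)` returns loadings in which each variable tends to load strongly on a single factor, which is what makes the solution easy to interpret.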
Quartimax rotation is an orthogonal alternative which minimizes the number of factors needed to explain each variable. This type of rotation often generates a general factor on which most variables are loaded to a high or medium degree. Such a factor structure is usually not helpful to the research purpose.
Equimax rotation is a compromise between Varimax and Quartimax criteria.
Direct oblimin rotation is the standard method when one wishes a nonorthogonal (oblique) solution – that is, one in which the factors are allowed to be correlated. This will result in higher eigenvalues but diminished interpretability of the factors. See below.
Promax rotation is an alternative nonorthogonal (oblique) rotation method which is computationally faster than the direct oblimin method and therefore is sometimes used for very large datasets.
Charles Spearman spearheaded the use of factor analysis in the field of psychology and is sometimes credited with the invention of factor analysis. He discovered that school children's scores on a wide variety of seemingly unrelated subjects were positively correlated, which led him to postulate that a general mental ability, or g, underlies and shapes human cognitive performance. His postulate now enjoys broad support in the field of intelligence research, where it is known as the g theory.
Raymond Cattell expanded on Spearman’s idea of a two-factor theory of intelligence after performing his own tests and factor analysis. He used a multi-factor theory to explain intelligence. Cattell’s theory addressed alternate factors in intellectual development, including motivation and psychology. Cattell also developed several mathematical methods for adjusting psychometric graphs, such as his "scree" test and similarity coefficients. His research led to the development of his theory of fluid and crystallized intelligence, as well as his 16 Personality Factors theory of personality. Cattell was a strong advocate of factor analysis and psychometrics. He believed that all theory should be derived from research, which supports the continued use of empirical observation and objective testing to study human intelligence.
Factor analysis is used to identify "factors" that explain a variety of results on different tests. For example, intelligence research found that people who get a high score on a test of verbal ability are also good on other tests that require verbal abilities. Researchers explained this by using factor analysis to isolate one factor, often called crystallized intelligence or verbal intelligence, that represents the degree to which someone is able to solve problems involving verbal skills.
Factor analysis in psychology is most often associated with intelligence research. However, it also has been used to find factors in a broad range of domains such as personality, attitudes, beliefs, etc. It is linked to psychometrics, as it can assess the validity of an instrument by finding if the instrument indeed measures the postulated factors.
The basic steps are:

- Identify the salient attributes consumers use to evaluate products in this category.
- Use quantitative marketing research techniques (such as surveys) to collect data from a sample of potential customers concerning their ratings of all the product attributes.
- Input the data into a statistical program and run the factor analysis procedure. The program will yield a set of underlying attributes (or factors).
- Use these factors to construct perceptual maps and other product positioning devices.
The data collection stage is usually done by marketing research professionals. Survey questions ask the respondent to rate a product sample or descriptions of product concepts on a range of attributes. Anywhere from five to twenty attributes are chosen. They could include things like: ease of use, weight, accuracy, durability, colourfulness, price, or size. The attributes chosen will vary depending on the product being studied. The same question is asked about all the products in the study. The data for multiple products is coded and input into a statistical program such as R, SPSS, SAS, Stata, or SYSTAT.
The analysis will isolate the underlying factors that explain the data. Factor analysis is an interdependence technique. The complete set of interdependent relationships are examined. There is no specification of either dependent variables, independent variables, or causality. Factor analysis assumes that all the rating data on different attributes can be reduced down to a few important dimensions. This reduction is possible because the attributes are related. The rating given to any one attribute is partially the result of the influence of other attributes. The statistical algorithm deconstructs the rating (called a raw score) into its various components, and reconstructs the partial scores into underlying factor scores. The degree of correlation between the initial raw score and the final factor score is called a factor loading. There are two approaches to factor analysis: "principal component analysis" (the total variance in the data is considered); and "common factor analysis" (the common variance is considered).
Note that principal component analysis and common factor analysis differ in terms of their conceptual underpinnings. The factors produced by principal component analysis are conceptualized as being linear combinations of the variables whereas the factors produced by common factor analysis are conceptualized as being latent variables. Computationally, the only difference is that the diagonal of the relationships matrix is replaced with communalities (the variance accounted for by more than one variable) in common factor analysis. This has the result of making the factor scores indeterminate and thus differ depending on the method used to compute them whereas those produced by principal component analysis are not dependent on the method of computation. Although there have been heated debates over the merits of the two methods, a number of leading statisticians have concluded that in practice there is little difference (Velicer and Jackson, 1990) which makes sense since the computations are quite similar despite the differing conceptual bases, especially for datasets where communalities are high and/or there are many variables, reducing the influence of the diagonal of the relationship matrix on the final result (Gorsuch, 1983).
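A sketch of that computational difference (assuming NumPy and an invertible correlation matrix R): iterated principal-axis factoring replaces the 1s on the diagonal with communality estimates before each eigendecomposition, which is exactly what distinguishes it from PCA applied to R.

```python
import numpy as np

def principal_axis_factoring(R, n_factors, n_iter=50):
    """Common factor analysis of a correlation matrix R by iterated
    principal-axis factoring (PCA would eigendecompose R itself)."""
    # Initial communalities: squared multiple correlations of each variable
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    R_adj = R.copy()
    for _ in range(n_iter):
        np.fill_diagonal(R_adj, h2)       # communalities replace the 1s
        eigvals, eigvecs = np.linalg.eigh(R_adj)
        idx = np.argsort(eigvals)[::-1][:n_factors]
        L = eigvecs[:, idx] * np.sqrt(np.maximum(eigvals[idx], 0.0))
        h2 = (L**2).sum(axis=1)           # update communality estimates
    return L
```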
The use of principal components in a semantic space can vary somewhat because the components may only "predict" but not "map" to the vector space. This produces a statistical principal component use where the most salient words or themes represent the preferred basis.
Factor analysis has also been widely used in physical sciences such as geochemistry, ecology, and hydrochemistry.^{[1]}
In groundwater quality management, it is important to relate the spatial distribution of different chemical parameters to different possible sources, which have different chemical signatures. For example, a sulfide mine is likely to be associated with high levels of acidity, dissolved sulfates and transition metals. These signatures can be identified as factors through R-mode factor analysis, and the location of possible sources can be suggested by contouring the factor scores.^{[2]}
In geochemistry, different factors can correspond to different mineral associations, and thus to mineralisation.^{[3]}
Economists might use factor analysis to see whether productivity, profits and workforce can be reduced down to an underlying dimension of company growth.
Factor analysis is a family of multivariate statistical techniques for identifying related clusters of variables (factors). There are two main approaches: exploratory factor analysis (EFA) and confirmatory factor analysis (CFA), described above.
