# Computational formula for the variance



In probability theory and statistics, the computational formula for the variance Var(X) of a random variable X is the formula $\operatorname{Var}(X) = \operatorname{E}(X^2) - \operatorname{E}(X)^2\,$

where E(X) is the expected value of X. This formula can be generalized for covariance: $\operatorname{Cov}(X_i, X_j) = \operatorname{E}(X_iX_j) -\operatorname{E}(X_i)\operatorname{E}(X_j)$

as well as for the n by n covariance matrix of a random vector of length n: $\operatorname{Var}(\mathbf{X}) = \operatorname{E}(\mathbf{X X^\top}) - \operatorname{E}(\mathbf{X})\operatorname{E}(\mathbf{X})^\top$

and for the n by m cross-covariance matrix between two random vectors of length n and m: $\operatorname{Cov}(\textbf{X},\textbf{Y})= \operatorname{E}(\mathbf{X Y^\top}) - \operatorname{E}(\mathbf{X})\operatorname{E}(\mathbf{Y})^\top$

where expectations are taken element-wise and $\mathbf{X}=\{X_1,X_2,\ldots,X_n\}$ and $\mathbf{Y}=\{Y_1,Y_2,\ldots,Y_m\}$ are random vectors of respective lengths n and m.
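As an illustrative sketch (the names and data below are my own, not from the article), the matrix identity can be checked entry by entry for a random vector of length 2, treating a small set of draws as a uniform empirical distribution; entry (i, j) of $\operatorname{E}(\mathbf{X X^\top}) - \operatorname{E}(\mathbf{X})\operatorname{E}(\mathbf{X})^\top$ is just $\operatorname{Cov}(X_i, X_j)$, with the variances on the diagonal:

```python
# Element-wise check of Var(X) = E(X X^T) - E(X) E(X)^T for a random
# vector of length 2 (names and data are illustrative, not from the article).

samples = [(1.0, 2.0), (3.0, 1.0), (2.0, 4.0), (4.0, 3.0)]  # draws of (X1, X2)
n = len(samples)

def E(f):
    """Expectation of f(sample) under the uniform distribution on the draws."""
    return sum(f(s) for s in samples) / n

mu = [E(lambda s, i=i: s[i]) for i in range(2)]  # E(X), component-wise

# Entry (i, j) is Cov(X_i, X_j) = E(X_i X_j) - E(X_i) E(X_j);
# the diagonal entries are the variances Var(X_i).
cov = [[E(lambda s, i=i, j=j: s[i] * s[j]) - mu[i] * mu[j]
        for j in range(2)] for i in range(2)]

print(cov)  # → [[1.25, 0.0], [0.0, 1.25]]
```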

A closely related identity can be used to calculate the sample variance, which is often used as an unbiased estimate of the population variance: $\hat{\sigma}^2 \equiv \frac{1}{N-1}\sum_{i=1}^N(x_i-\bar{x})^2 = \frac{N}{N-1}\big(\frac{1}{N}\sum_{i=1}^Nx_i^2 - \bar{x}^2\big)$
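The two sides of this identity can be confirmed numerically; this sketch uses a made-up data set:

```python
# Numerical sketch of the sample-variance identity: the two-pass
# definition equals N/(N-1) times (mean of squares minus squared mean).
# The data set is made up for illustration.

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
N = len(data)
xbar = sum(data) / N

# Left side: center first, then sum the squared deviations.
two_pass = sum((x - xbar) ** 2 for x in data) / (N - 1)

# Right side: N/(N-1) * (mean of the squares minus the squared mean).
one_pass = (N / (N - 1)) * (sum(x * x for x in data) / N - xbar ** 2)

print(two_pass, one_pass)  # the two sides agree
```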

These results are often used in practice to calculate the variance when it is inconvenient to center a random variable by subtracting its expected value or to center a set of data by subtracting the sample mean. However, in some cases it is easier to carry out the centering first and then apply the definition of the variance directly.
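As a sketch of the first use case, the computational formula lets the variance be accumulated in a single pass over a data stream, keeping only running sums and without knowing the mean in advance (the function name and data are illustrative; note that this form can lose floating-point precision when the mean is large relative to the spread):

```python
# One-pass population variance via E(X^2) - E(X)^2: only the count,
# the running sum of x, and the running sum of x^2 are stored.

def one_pass_population_variance(stream):
    n, s, ss = 0, 0.0, 0.0
    for x in stream:
        n += 1
        s += x       # running sum of x
        ss += x * x  # running sum of x^2
    mean = s / n
    return ss / n - mean ** 2  # E(X^2) - E(X)^2

print(one_pass_population_variance(iter([1.0, 2.0, 4.0, 7.0])))  # → 5.25
```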

## Proof

The computational formula for the population variance follows in a straightforward manner from the linearity of expected values and the definition of variance: $\begin{align} \operatorname{Var}(X) &= \operatorname{E}\left[(X - \operatorname{E}(X))^2\right] \\ &= \operatorname{E}\left[X^2 - 2X\operatorname{E}(X) + \operatorname{E}(X)^2\right] \\ &= \operatorname{E}(X^2) - 2\operatorname{E}(X)\operatorname{E}(X) + \operatorname{E}(X)^2 \\ &= \operatorname{E}(X^2) - \operatorname{E}(X)^2. \end{align}$

To prove the result for the sample variance $\hat{\sigma}^2 = \frac{1}{N-1}\sum_{i=1}^N(X_i-\bar{X})^2,$

note that the sample variance can be expressed as $\hat{\sigma}^2 = \frac{N}{N-1}\operatorname{Var}(X^*)$

where $X^*$ is sampled uniformly with replacement from the observed data $X_1, \ldots, X_N$ and the variance on the right side is a population variance. Therefore, the computational formula for the sample variance follows directly from the computational formula for the population variance. Alternatively, the result can be derived by a direct algebraic calculation using the identity: $\begin{align} \sum_{i=1}^N (x_i - \overline{x})^2 &= \sum_{i=1}^N (x_i^2 - 2 x_i\overline{x} + \overline{x}^2) \\ &= \left(\sum_{i=1}^N x_i^2\right) - 2 \overline{x} \sum_{i=1}^N x_i + N\overline{x}^2 \\ &= \left(\sum_{i=1}^N x_i^2\right) - 2 \overline{x}(N\overline{x}) + N\overline{x}^2 \\ &= \left(\sum_{i=1}^N x_i^2\right) - 2N\overline{x}^2 + N\overline{x}^2 \\ &= \left(\sum_{i=1}^N x_i^2\right) - N\overline{x}^2. \end{align}$
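The algebraic identity is easy to verify on a concrete data set (the numbers below are made up for illustration):

```python
# Quick numeric check of the identity
# sum_i (x_i - xbar)^2 = (sum_i x_i^2) - N * xbar^2.

xs = [3.0, 5.0, 8.0, 12.0]
N = len(xs)
xbar = sum(xs) / N

lhs = sum((x - xbar) ** 2 for x in xs)       # centered sum of squares
rhs = sum(x * x for x in xs) - N * xbar ** 2  # computational form

print(lhs, rhs)  # → 46.0 46.0
```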

## Applications

Applications of the computational formula in systolic geometry include the proof of Loewner's torus inequality.