# Omitted-variable bias

In statistics, omitted-variable bias (OVB) is the bias that appears in parameter estimates in a regression analysis when the assumed specification is incorrect because it omits an independent variable (possibly one that has not been identified) that belongs in the model.

## Omitted-variable bias in linear regression

Two conditions must hold true for omitted-variable bias to exist in linear regression:

• the omitted variable must be a determinant of the dependent variable (i.e., its true regression coefficient is not zero); and
• the omitted variable must be correlated with one or more of the included independent variables.
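The two conditions can be checked with a small simulation. The sketch below is illustrative only: the function name `short_regression_slope`, the true slope `beta=2.0`, and the Gaussian data-generating process are assumptions made for this example, not part of the source. It regresses y on x alone (no intercept) while the omitted variable z has coefficient `delta` and correlation `rho` with x.

```python
import random

def short_regression_slope(delta, rho, beta=2.0, n=50_000, seed=0):
    """Slope from regressing y on x alone when y = beta*x + delta*z + u.

    delta: true coefficient on the omitted variable z (condition 1).
    rho:   correlation between x and z (condition 2).
    """
    rng = random.Random(seed)
    sxx = 0.0
    sxy = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        # z has unit variance and correlation rho with x
        z = rho * x + (1.0 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
        u = rng.gauss(0.0, 1.0)
        y = beta * x + delta * z + u
        sxx += x * x
        sxy += x * y
    # No-intercept least squares slope: sum(x*y) / sum(x*x)
    return sxy / sxx

for delta, rho in [(0.0, 0.8), (1.5, 0.0), (1.5, 0.8)]:
    print(delta, rho, round(short_regression_slope(delta, rho), 3))
```

Under these assumptions the estimated slope should stay near 2.0 in the first two cases, where one of the conditions fails, and move toward 2.0 + 1.5 × 0.8 only when both conditions hold.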

As an example, consider a linear model of the form

$y_i = x_i \beta + z_i \delta + u_i,\qquad i = 1,\dots,n$

where

• $x_i$ is a $1 \times p$ row vector and is part of the observed data;
• $\beta$ is a $p \times 1$ column vector of unobservable parameters to be estimated;
• $z_i$ is a scalar and is part of the observed data;
• $\delta$ is a scalar and is an unobservable parameter to be estimated;
• the error terms $u_i$ are unobservable random variables having expected value 0 (conditionally on $x_i$ and $z_i$);
• the dependent variables $y_i$ are part of the observed data.

We let

$X = \left[ \begin{array}{c} x_1 \\ \vdots \\ x_n \end{array} \right] \in \mathbb{R}^{n\times p},$

and

$Y = \left[ \begin{array}{c} y_1 \\ \vdots \\ y_n \end{array} \right],\quad Z = \left[ \begin{array}{c} z_1 \\ \vdots \\ z_n \end{array} \right],\quad U = \left[ \begin{array}{c} u_1 \\ \vdots \\ u_n \end{array} \right] \in \mathbb{R}^{n\times 1}.$

Then, by the usual least squares calculation, the estimated parameter vector $\hat{\beta}$, based only on the observed x-values and omitting the observed z-values, is given by:

$\hat{\beta} = (X'X)^{-1}X'Y\,$

(where the "prime" notation means the transpose of a matrix).
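As a rough illustration of this formula, the following sketch computes the least squares estimate with NumPy. The helper name `ols` and the random data are assumptions for the example. It solves the normal equations $X'X\,b = X'Y$ rather than forming the inverse explicitly, and cross-checks the result against `numpy.linalg.lstsq`.

```python
import numpy as np

def ols(X, Y):
    """Least squares coefficients (X'X)^{-1} X'Y via the normal equations."""
    # Solving X'X b = X'Y avoids forming the explicit inverse.
    return np.linalg.solve(X.T @ X, X.T @ Y)

rng = np.random.default_rng(0)
n, p = 200, 3
X = rng.normal(size=(n, p))   # arbitrary illustrative regressors
Y = rng.normal(size=n)        # arbitrary illustrative responses

beta_hat = ols(X, Y)

# Cross-check against NumPy's general least squares routine.
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(np.allclose(beta_hat, beta_lstsq))
```

Solving the normal equations is adequate for small, well-conditioned problems like this one; for ill-conditioned $X$, a QR- or SVD-based routine such as `lstsq` is generally the safer choice.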

Substituting for Y based on the assumed linear model,

\begin{align} \hat{\beta} & = (X'X)^{-1}X'(X\beta+Z\delta+U) \\ & =(X'X)^{-1}X'X\beta + (X'X)^{-1}X'Z\delta + (X'X)^{-1}X'U \\ & =\beta + (X'X)^{-1}X'Z\delta + (X'X)^{-1}X'U. \end{align}

Taking expectations (conditional on the observed X and Z), the final term

$(X'X)^{-1}X'U$

falls out by the assumption that U has conditional expectation zero. Simplifying the remaining terms:

\begin{align} E[ \hat{\beta} ] & = \beta + (X'X)^{-1}X'Z\delta \\ & = \beta + \text{bias}. \end{align}

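The second term above is the omitted-variable bias in this case. Note that $(X'X)^{-1}X'Z$ is the coefficient vector from a least squares regression of $Z$ on $X$, so the bias equals the portion of $z_i$ that is "explained" by $x_i$, scaled by $\delta$. It vanishes when $\delta = 0$ or when $X'Z = 0$, matching the two conditions listed earlier.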
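The decomposition of $\hat{\beta}$ derived above is an exact algebraic identity in any sample, which can be confirmed numerically. The sketch below is one way to do so with NumPy. All numeric values (`beta`, `delta`, the 0.7 loading of z on the first regressor) and the helper `ols` are hypothetical choices for illustration.

```python
import numpy as np

def ols(A, b):
    """Least squares coefficients (A'A)^{-1} A'b."""
    return np.linalg.solve(A.T @ A, A.T @ b)

rng = np.random.default_rng(1)
n, p = 500, 2
beta = np.array([1.0, -0.5])   # illustrative true coefficients on X
delta = 2.0                    # illustrative coefficient on the omitted z

X = rng.normal(size=(n, p))
# z is constructed to be correlated with the first column of X only
Z = 0.7 * X[:, 0] + rng.normal(size=n)
U = rng.normal(size=n)
Y = X @ beta + Z * delta + U

beta_hat = ols(X, Y)   # short regression: z omitted
gamma = ols(X, Z)      # (X'X)^{-1} X'Z, regression of Z on X
noise = ols(X, U)      # (X'X)^{-1} X'U, zero in expectation

# Exact in-sample identity: beta_hat = beta + gamma*delta + noise
assert np.allclose(beta_hat, beta + gamma * delta + noise)
print("bias term:", gamma * delta)
```

Because z loads only on the first regressor in this setup, the bias term should be close to 0.7 × 2.0 for the first coefficient and close to zero for the second, up to sampling noise.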

## References

• Greene, W. H. (1993). Econometric Analysis, 2nd ed. Macmillan. pp. 245–246.