A power law is a special kind of mathematical relationship between two quantities. When the number or frequency of an object or event varies as a power of some attribute of that object (e.g., its size), the number or frequency is said to follow a power law. For instance, the number of cities having a certain population size is found to vary as a power of the size of the population, and hence follows a power law. Power laws govern a wide variety of natural and manmade phenomena, including frequencies of words in most languages, frequencies of family names, sizes of craters on the moon and of solar flares, the sizes of power outages, earthquakes, and wars, the popularity of books and music, and many other quantities.
A power law is any polynomial relationship that exhibits the property of scale invariance. The most common power laws relate two variables and have the form
f(x) = a x^{k} + o(x^{k}),
where a and k are constants, and o(x^{k}) is an asymptotically small function of x^{k}. Here, k is typically called the scaling exponent, where the word "scaling" denotes the fact that a power-law function satisfies f(cx) \propto f(x), where c is a constant. Thus, a rescaling of the function's argument changes the constant of proportionality but preserves the shape of the function itself. This point becomes clearer if we take the logarithm of both sides (ignoring the o(x^{k}) term):
\log f(x) = k \log x + \log a.
Notice that this expression has the form of a linear relationship with slope k. Rescaling the argument produces a linear shift of the function up or down but leaves both the basic form and the slope k unchanged.
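The rescaling property and the linear log-log form can be checked numerically. The following is a minimal Python sketch, not part of the original discussion: the function names `power_law` and `loglog_slope` and the constants a = 2, k = −1.5 and c = 10 are arbitrary choices made for illustration, and the o(x^{k}) term is assumed to be zero.

```python
import math

def power_law(x, a=2.0, k=-1.5):
    """Pure power law f(x) = a * x**k (the o(x^k) term is taken to be zero)."""
    return a * x ** k

def loglog_slope(f, x1, x2):
    """Slope of log f(x) against log x between the two points x1 and x2."""
    return (math.log(f(x2)) - math.log(f(x1))) / (math.log(x2) - math.log(x1))

c, k = 10.0, -1.5
for x in (1.0, 3.0, 50.0):
    # Rescaling the argument multiplies f by the constant c**k, independent of x.
    ratio = power_law(c * x) / power_law(x)
    assert math.isclose(ratio, c ** k)

# The slope on log-log axes recovers the exponent k, whichever interval is used,
# and it is unchanged when the argument is rescaled by c.
print(loglog_slope(power_law, 1.0, 100.0))
print(loglog_slope(lambda x: power_law(c * x), 2.0, 7.0))
```

Both printed slopes should agree with k up to floating-point rounding; only the intercept of the line changes under rescaling.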
Power-law relations characterize a staggering number of naturally occurring phenomena, and this is one of the principal reasons why they have attracted such wide interest. For instance, inverse-square laws, such as gravitation and the Coulomb force, are power laws, as are many common mathematical formulae such as the quadratic law of area of the circle. However, much of the recent interest in power laws comes from the study of probability distributions: it is now known that the distributions of a wide variety of quantities seem to follow the power-law form, at least in their upper tail (large events). The behavior of these large events connects these quantities to the theory of large deviations and to extreme value theory, which consider the frequency of extremely rare events like stock market crashes and large natural disasters. It is primarily in the study of statistical distributions that the name "power law" is used; in other areas the power-law functional form is more often referred to simply as a polynomial form or polynomial function.
Scientific interest in power-law relations stems partly from the ease with which certain general classes of mechanisms generate them. The demonstration of a power-law relation in some data can point to specific kinds of mechanisms that might underlie the natural phenomenon in question, and can indicate a deep connection with other, seemingly unrelated systems (see the reference by Simon and the subsection on universality below). The ubiquity of power-law relations in physics is partly due to dimensional constraints, while in complex systems, power laws are often thought to be signatures of hierarchy or of specific stochastic processes. A few notable examples of power laws are the Gutenberg–Richter law for earthquake sizes, Pareto's law of income distribution, structural self-similarity of fractals, and scaling laws in biological systems. Research on the origins of power-law relations, and efforts to observe and validate them in the real world, are active in many fields of science, including physics, computer science, linguistics, geophysics, sociology, economics and more.
The main property of power laws that makes them interesting is their scale invariance. Given a relation f(x) = ax^{k}, scaling the argument x by a constant factor c causes only a proportionate scaling of the function itself. That is,
f(cx) = a (cx)^{k} = c^{k} f(x) \propto f(x).
In other words, scaling the argument by a constant c simply multiplies the original power-law relation by the constant c^{k}. Thus, it follows that all power laws with a particular scaling exponent are equivalent up to constant factors, since each is simply a scaled version of the others. This behavior is what produces the linear relationship when logarithms are taken of both f(x) and x, and the straight line on the log-log plot is often called the signature of a power law. Notably, however, with real data, such straightness is a necessary but not a sufficient condition for the data following a power-law relation. In fact, there are many ways to generate finite amounts of data that mimic this signature behavior, but, in their asymptotic limit, are not true power laws. Thus, accurately fitting and validating power-law models is an active area of research in statistics.
The equivalence of power laws with a particular scaling exponent can have a deeper origin in the dynamical processes that generate the power-law relation. In physics, for example, phase transitions in thermodynamic systems are associated with the emergence of power-law distributions of certain quantities, whose exponents are referred to as the critical exponents of the system. Diverse systems with the same critical exponents (that is, systems that display identical scaling behavior as they approach criticality) can be shown, via renormalization group theory, to share the same fundamental dynamics. For instance, the behavior of water and CO_{2} at their critical points falls in the same universality class because they have identical critical exponents. In fact, almost all material phase transitions are described by a small set of universality classes. Similar observations have been made, though not as comprehensively, for various self-organized critical systems, where the critical point of the system is an attractor. Formally, this sharing of dynamics is referred to as universality, and systems with precisely the same critical exponents are said to belong to the same universality class.
The general power-law function follows the polynomial form given above, and is a ubiquitous form throughout mathematics and science. Notably, however, not all polynomial functions are power laws because not all polynomials exhibit the property of scale invariance. Typically, power-law functions are polynomials in a single variable, and are explicitly used to model the scaling behavior of natural processes. For instance, allometric scaling laws for the relation of biological variables are some of the best known power-law functions in nature. In this context, the o(x^{k}) term is most typically replaced by a deviation term ε, which can represent uncertainty in the observed values (perhaps measurement or sampling errors) or provide a simple way for observations to deviate from the power-law function (perhaps for stochastic reasons):
y = a x^{k} + \varepsilon.
A power-law distribution is any distribution that, in the most general sense, has the form
p(x) \propto L(x) x^{-\alpha},
where α > 1, and L(x) is a slowly varying function, which is any function that satisfies L(tx) / L(x) \to 1 as x \to \infty, with t constant. This property of L(x) follows directly from the requirement that p(x) be asymptotically scale invariant; thus, the form of L(x) only controls the shape and finite extent of the lower tail. For instance, if L(x) is the constant function, then we have a power law that holds for all values of x. In many cases, it is convenient to assume a lower bound x_{min} from which the law holds. Combining these two cases, and where x is a continuous variable, the power law has the form
p(x) = \frac{\alpha - 1}{x_{min}} \left( \frac{x}{x_{min}} \right)^{-\alpha},
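where the prefactor \frac{\alpha - 1}{x_{min}} is the normalizing constant. We can now consider several properties of this distribution. For instance, its moments are given by
\langle x^{m} \rangle = \int_{x_{min}}^{\infty} x^{m} p(x) \, dx = \frac{\alpha - 1}{\alpha - 1 - m} x_{min}^{m},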
which is only well defined for m < α − 1. That is, all moments with m ≥ α − 1 diverge: when α < 2, the average and all higher-order moments are infinite; when 2 < α < 3, the mean exists, but the variance and higher-order moments are infinite, etc. For finite-size samples drawn from such a distribution, this behavior implies that the central moment estimators (like the mean and the variance) for diverging moments will never converge: as more data are accumulated, they continue to grow.
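The moment formula and its divergence condition can be written as a small helper. This is an illustrative Python sketch under the continuous form with lower bound x_{min}; the function name `power_law_moment` is hypothetical, and returning `math.inf` for divergent moments is a convention chosen here, not something stated in the text.

```python
import math

def power_law_moment(m, alpha, x_min=1.0):
    """m-th moment of p(x) = (alpha-1)/x_min * (x/x_min)**(-alpha), x >= x_min.

    Returns math.inf when m >= alpha - 1, where the defining integral diverges.
    """
    if alpha <= 1:
        raise ValueError("alpha must exceed 1 for the density to be normalizable")
    if m >= alpha - 1:
        return math.inf
    return (alpha - 1) / (alpha - 1 - m) * x_min ** m

# alpha < 2: no finite mean; 2 < alpha < 3: finite mean, infinite second moment;
# alpha > 3: both are finite.
for alpha in (1.5, 2.5, 3.5):
    mean = power_law_moment(1, alpha)
    second = power_law_moment(2, alpha)
    print(f"alpha={alpha}: mean={mean}, second moment={second}")
```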
Another kind of power-law distribution, which does not satisfy the general form above, is the power law with an exponential cutoff
p(x) \propto L(x) x^{-\alpha} e^{-\lambda x}.
In this distribution, the exponential decay term e^{−λx} eventually overwhelms the power-law behavior at very large values of x. This distribution does not scale and is thus not asymptotically a power law; however, it does approximately scale over a finite region before the cutoff. (Note that the pure form above is a subset of this family, with λ = 0.) This distribution is a common alternative to the asymptotic power-law distribution because it naturally captures finite-size effects. For instance, although the Gutenberg–Richter law is commonly cited as an example of a power-law distribution, the distribution of earthquake magnitudes cannot scale as a power law in the limit because there is a finite amount of energy in the Earth's crust and thus there must be some maximum size to an earthquake. As the scaling behavior approaches this size, it must taper off.
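One way to see the approximate scaling region is to look at the local slope on log-log axes, which for x^{−α} e^{−λx} equals −α − λx: close to −α while x is much smaller than 1/λ, and increasingly steep beyond it. The sketch below is illustrative only; it assumes L(x) = 1, and the names `cutoff_density` and `local_slope` and the values α = 2, λ = 0.01 are hypothetical choices.

```python
import math

def cutoff_density(x, alpha=2.0, lam=0.01):
    """Unnormalized p(x) = x**(-alpha) * exp(-lam * x), taking L(x) = 1."""
    return x ** (-alpha) * math.exp(-lam * x)

def local_slope(f, x, h=1e-4):
    """Central-difference estimate of d ln f / d ln x at the point x."""
    up, down = x * math.exp(h), x * math.exp(-h)
    return (math.log(f(up)) - math.log(f(down))) / (2.0 * h)

# The slope stays near -alpha for x << 1/lam and steepens once x approaches 1/lam.
for x in (1.0, 10.0, 100.0, 1000.0):
    print(x, local_slope(cutoff_density, x))
```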
In general, power-law distributions are plotted on doubly logarithmic axes, which emphasizes the upper tail region. The most convenient way to do this is via the (complementary) cumulative distribution (cdf), P(x) = Pr(X > x),
P(x) = \Pr(X > x) = \int_{x}^{\infty} p(X) \, dX = \left( \frac{x}{x_{min}} \right)^{-(\alpha - 1)}.
Note that the cdf is also a power-law function, but with a smaller scaling exponent (α − 1 rather than α). For data, an equivalent form of the cdf is the rank-frequency approach, in which we first sort the n observed values in ascending order, and plot them against the vector [1, (n−1)/n, (n−2)/n, ..., 1/n].
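The rank-frequency construction takes only a few lines. The following Python sketch is one possible reading of it: the function name `empirical_ccdf` is hypothetical, and tied values are simply given consecutive ranks, which is an assumption not addressed in the text.

```python
def empirical_ccdf(values):
    """Return (xs, ps): the sorted data and the rank-frequency vector.

    The i-th smallest of n values (i = 0, ..., n-1) is paired with (n - i) / n,
    i.e. the vector [1, (n-1)/n, ..., 1/n], the fraction of observations that
    are at least as large as that value.
    """
    xs = sorted(values)
    n = len(xs)
    ps = [(n - i) / n for i in range(n)]
    return xs, ps

xs, ps = empirical_ccdf([5.0, 1.0, 2.5, 40.0])
for x, p in zip(xs, ps):
    print(x, p)  # these pairs are what would be drawn on doubly logarithmic axes
```

No binning or smoothing is involved, so every observation appears as its own point on the plot.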
Although it can be convenient to log-bin the data, or otherwise smooth the probability density (mass) function directly, these methods introduce an implicit bias in the representation of the data, and thus should be avoided. The cdf, on the other hand, introduces no bias in the data and preserves the linear signature on doubly logarithmic axes.
There are many ways of estimating the value of the scaling exponent for a power-law tail; however, not all of them yield unbiased and consistent answers. The most reliable techniques are often based on the method of maximum likelihood. Alternative methods are often based on making a linear regression on either the log-log probability, the log-log cumulative distribution function, or on log-binned data, but these approaches should be avoided as they can all lead to highly biased estimates of the scaling exponent (see the Clauset et al. reference below).
For real-valued data, we fit a power-law distribution of the form
p(x) = \frac{\alpha - 1}{x_{min}} \left( \frac{x}{x_{min}} \right)^{-\alpha}
to the data x ≥ x_{min}. Given a choice for x_{min}, a simple derivation by the method of maximum likelihood yields the estimator equation
\hat{\alpha} = 1 + n \left[ \sum_{i=1}^{n} \ln \frac{x_{i}}{x_{min}} \right]^{-1},
where {x_{i}} are the n data points x_{i} ≥ x_{min}. (For a more detailed derivation, see Hall or Newman below.) This estimator exhibits a small finite sample-size bias of order O(n^{−1}), which is small when n > 100. Further, the uncertainty in the estimation can be derived from the maximum likelihood argument, and has the form \sigma = \frac{\hat{\alpha} - 1}{\sqrt{n}}. This estimator is equivalent to the popular Hill estimator from quantitative finance and extreme value theory.
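The continuous estimator and its uncertainty can be sketched in a few lines of Python. The names `fit_continuous_alpha` and `sample_power_law` are hypothetical, as are the sample size, seed and true exponent used for the demonstration; the synthetic data are drawn by inverse-transform sampling from the complementary cumulative distribution, which is an added convenience rather than part of the estimator.

```python
import math
import random

def fit_continuous_alpha(data, x_min):
    """Maximum likelihood exponent for a continuous power-law tail.

    Uses alpha_hat = 1 + n / sum(ln(x_i / x_min)) over the points x_i >= x_min,
    and returns (alpha_hat, sigma) with sigma = (alpha_hat - 1) / sqrt(n).
    """
    tail = [x for x in data if x >= x_min]
    n = len(tail)
    log_sum = sum(math.log(x / x_min) for x in tail)
    if n == 0 or log_sum <= 0:
        raise ValueError("need at least one observation strictly above x_min")
    alpha_hat = 1.0 + n / log_sum
    sigma = (alpha_hat - 1.0) / math.sqrt(n)
    return alpha_hat, sigma

def sample_power_law(n, alpha, x_min, rng):
    """Inverse-transform sampling: x = x_min * (1 - u)**(-1 / (alpha - 1))."""
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]

rng = random.Random(0)  # fixed seed, purely for reproducibility
data = sample_power_law(10_000, alpha=2.5, x_min=1.0, rng=rng)
alpha_hat, sigma = fit_continuous_alpha(data, x_min=1.0)
print(alpha_hat, sigma)  # alpha_hat should land near the true value 2.5
```

With 10,000 points the quoted uncertainty is about 0.015, so the estimate is expected to fall within a few hundredths of the true exponent.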
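For a set of n integer-valued data points {x_{i}}, again where each x_{i} ≥ x_{min}, the maximum likelihood exponent is the solution to the transcendental equation
\frac{\zeta'(\hat{\alpha}, x_{min})}{\zeta(\hat{\alpha}, x_{min})} = -\frac{1}{n} \sum_{i=1}^{n} \ln x_{i},
where ζ(α,x_{min}) is the incomplete (Hurwitz) zeta function, \zeta(\alpha, x_{min}) = \sum_{k=0}^{\infty} (k + x_{min})^{-\alpha}, and the prime denotes differentiation with respect to α. The uncertainty in this estimate follows the same formula as for the continuous equation. However, the two equations for \hat{\alpha} are not equivalent, and the continuous version should not be applied to discrete data, nor vice versa.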
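The transcendental equation has no closed-form solution, but it is the stationarity condition of the log-likelihood −n ln ζ(α, x_{min}) − α Σ ln x_{i}, so the exponent can be found by maximizing that function numerically. The sketch below is one way to do this in plain Python and is not taken from the references: the names `hurwitz_zeta` and `fit_discrete_alpha`, the truncated-sum approximation of the zeta function, the golden-section search and the search interval [1.01, 6] are all assumptions made for illustration.

```python
import math

def hurwitz_zeta(alpha, q, terms=1000):
    """Approximate zeta(alpha, q) = sum_{k>=0} (k + q)**(-alpha) for alpha > 1.

    Sums the first `terms` terms directly and approximates the remainder with
    the leading Euler-Maclaurin corrections (integral plus half the next term).
    """
    head = sum((k + q) ** (-alpha) for k in range(terms))
    b = q + terms
    return head + b ** (1.0 - alpha) / (alpha - 1.0) + 0.5 * b ** (-alpha)

def fit_discrete_alpha(data, x_min=1, lo=1.01, hi=6.0, tol=1e-6):
    """Maximize the discrete log-likelihood over alpha in [lo, hi].

    Returns (alpha_hat, sigma) with sigma = (alpha_hat - 1) / sqrt(n).
    """
    tail = [x for x in data if x >= x_min]
    n = len(tail)
    if n == 0:
        raise ValueError("no observations at or above x_min")
    log_sum = sum(math.log(x) for x in tail)

    def log_likelihood(alpha):
        return -n * math.log(hurwitz_zeta(alpha, x_min)) - alpha * log_sum

    # Golden-section search; the log-likelihood is concave in alpha.
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if log_likelihood(c) > log_likelihood(d):
            b = d
        else:
            a = c
    alpha_hat = 0.5 * (a + b)
    return alpha_hat, (alpha_hat - 1.0) / math.sqrt(n)

# Synthetic integer data whose frequencies are proportional to x**(-2.5).
data = []
for x in range(1, 200):
    data.extend([x] * round(50_000 * x ** -2.5))
alpha_hat, sigma = fit_discrete_alpha(data, x_min=1)
print(alpha_hat, sigma)  # alpha_hat should come out close to 2.5
```

Because the synthetic counts are rounded and the largest values are dropped, the recovered exponent is only expected to match 2.5 approximately.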
Further, both of these estimators require the choice of x_{min}. For distributions with a non-trivial L(x), choosing x_{min} too small produces a significant bias in \hat{\alpha}, while choosing it too large increases the uncertainty in \hat{\alpha}, and reduces the statistical power of our model. In general, the best choice of x_{min} depends strongly on the particular form of the lower tail, represented by L(x) above.
More about these methods, and the conditions under which they can be used, can be found in the Clauset et al. reference below. Further, this comprehensive review article provides usable code (Matlab and R) for estimation and testing routines for powerlaw distributions.
A great many power-law distributions have been conjectured in recent years. For instance, power laws are thought to characterize the behavior of the upper tails for the popularity of websites, number of species per genus, the popularity of given names, the size of financial returns, and many others. However, much debate remains as to which of these tails are actually power-law distributed and which are not. For instance, it is commonly accepted now that the famous Gutenberg–Richter law decays more rapidly than a pure power-law tail because of a finite exponential cutoff in the upper tail.
Although power-law relations are attractive for many theoretical reasons, demonstrating that data do indeed follow a power-law relation requires more than simply fitting such a model to the data. In general, many alternative functional forms can appear to follow a power-law form over some range. Thus, the preferred method for validation of power-law relations is by testing many orthogonal predictions of a particular generative mechanism against data, and not simply fitting a power-law relation to a particular kind of data. As such, the validation of power-law claims remains a very active field of research in many areas of modern science.
