Cover's Theorem is a statement in computational learning theory and is one of the primary theoretical motivations for the use of nonlinear kernel methods in machine learning applications. The theorem states that given a set of training data that is not linearly separable, one can with high probability transform it into a training set that is linearly separable by projecting it into a higher dimensional space via some nonlinear transformation.
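The idea can be illustrated on the XOR problem, the standard example of a training set that is not linearly separable in its original input space. The feature map used below, φ(x₁, x₂) = (x₁, x₂, x₁x₂), is one illustrative choice of nonlinear transformation (an assumption for this sketch, not a construction from Cover's paper); in the lifted three-dimensional space a separating hyperplane exists.

```python
# Sketch: XOR is not linearly separable in 2-D, but becomes separable
# after a nonlinear lift into 3-D. The map phi and the hyperplane (w, b)
# below are chosen by hand for this tiny example.

def phi(x1, x2):
    """Nonlinear map into a higher-dimensional feature space."""
    return (x1, x2, x1 * x2)

# XOR training set: label 1 iff exactly one input is 1.
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

# A hyperplane w . z + b = 0 that separates the lifted points
# (found by inspection; no such hyperplane exists in the original 2-D space).
w, b = (1.0, 1.0, -2.0), -0.5

def predict(x):
    z = phi(*x)
    return 1 if sum(wi * zi for wi, zi in zip(w, z)) + b > 0 else 0

assert all(predict(x) == y for x, y in data)
print("XOR correctly separated in the lifted space")
```

This is the same mechanism exploited by kernel methods, where the lift into the higher-dimensional space is performed implicitly through a kernel function rather than by computing φ explicitly.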
A complex pattern-classification problem, cast in a high-dimensional space nonlinearly, is more likely to be linearly separable than in a low-dimensional space, provided that the space is not densely populated.

— Cover, T. M., "Geometrical and Statistical Properties of Systems of Linear Inequalities with Applications in Pattern Recognition", 1965
Haykin, Simon (2009). Neural Networks and Learning Machines (Third ed.). Upper Saddle River, New Jersey: Pearson Education Inc. pp. 232–236. ISBN 978-0-13-147139-9.

Cover, T. M. (1965). "Geometrical and Statistical Properties of Systems of Linear Inequalities with Applications in Pattern Recognition". IEEE Transactions on Electronic Computers. EC-14: 326–334.
