Curse of dimensionality

The curse of dimensionality is a term introduced by Richard Bellman to describe the rapid growth in volume as further dimensions are added to a mathematical space.

Leo Breiman gives as an example the fact that 100 observations cover the one-dimensional space of real numbers between 0 and 1 quite well; from these observations a histogram can be computed, for example, and conclusions can be drawn. If instead 100 samples are collected in a 10-dimensional space of the same kind (where each dimension can take values between 0 and 1), they are isolated points that do not come close to covering the space. To achieve coverage similar to that in the one-dimensional space, one would have to draw 10^20 samples, which would be, at the very least, an extremely laborious undertaking.
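
Breiman's example can be made concrete with a small simulation (not from the article; bin count and sample size are illustrative choices): divide each axis into 10 bins and measure what fraction of the resulting grid cells contain at least one of 100 uniform random samples.

```python
import random

random.seed(0)

def coverage(n_samples, dim, bins=10):
    """Fraction of the bins**dim grid cells that contain at least one sample."""
    cells = set()
    for _ in range(n_samples):
        point = tuple(random.random() for _ in range(dim))
        # Map each coordinate to its bin index; clamp 1.0 into the last bin.
        cells.add(tuple(min(int(x * bins), bins - 1) for x in point))
    return len(cells) / bins ** dim

# In 1D, 100 samples touch almost all of the 10 bins ...
print(coverage(100, 1))
# ... but in 10D the same 100 samples occupy at most 100 of 10**10 cells.
print(coverage(100, 10))
```

In one dimension the coverage is close to 1, while in ten dimensions it cannot exceed 10^-8, illustrating why exponentially more samples are needed.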

Impact on distance functions

An often-cited formulation of the "curse" states that, for various types of random distributions of the data sets, the difference between the smallest and the largest distance between data sets becomes arbitrarily small compared to the smallest distance as the dimensionality grows; for such distributions, the results of distance functions (and of algorithms based on them) are therefore no longer useful in high-dimensional spaces. This can be formalized as

    lim_{d→∞} (dist_max − dist_min) / dist_min → 0
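
This concentration of distances can be observed directly (a small sketch, not from the article; the point counts and dimensions are arbitrary choices): for uniform random points in the unit cube, the relative spread of their distances from the origin shrinks as the dimension grows.

```python
import math
import random

random.seed(1)

def distance_contrast(dim, n_points=200):
    """Relative spread (d_max - d_min) / d_min of the distances from the
    origin for n_points uniform random points in [0, 1]**dim."""
    dists = []
    for _ in range(n_points):
        p = [random.random() for _ in range(dim)]
        dists.append(math.sqrt(sum(x * x for x in p)))
    return (max(dists) - min(dists)) / min(dists)

# The contrast shrinks as the dimension grows.
for dim in (2, 10, 100, 1000):
    print(dim, distance_contrast(dim))
```

In low dimensions the nearest and farthest points differ by a large factor; in 1000 dimensions all distances cluster tightly around their mean, so "nearest neighbor" loses its discriminating power.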

More recent research results indicate, however, that it is not the sheer number of dimensions that is decisive, since additional relevant information can separate the data better; only additional dimensions that are "irrelevant" for the separation cause the effect. While the exact distance values then become more similar, the resulting ordering remains stable, so cluster analysis with suitable methods is still possible.

Machine Learning

The curse of dimensionality is a serious obstacle in machine learning problems that have to learn the structure of a high-dimensional space from only a few sample elements.

Index Structures

Many index structures are either distance-based or attempt to partition the space dimension by dimension. These methods usually run into problems with high-dimensional data, either because the distance values are no longer meaningful enough or because the number of dimensions, and the resulting partitioning possibilities, cause difficulties.

Numerical Integration

In numerical integration the curse of dimensionality also plays a major role, since the number of required function evaluations for simple algorithms such as the trapezoidal rule grows exponentially with the number of dimensions; as a result, the rate of convergence deteriorates dramatically. For simple problems this can be circumvented with Monte Carlo methods, since their accuracy does not depend on the dimensionality. Otherwise, hierarchical decomposition methods are used.
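
The contrast can be sketched as follows (an illustrative example, not from the article; the test integrand sum(x_i) is an arbitrary choice): a grid rule with n points per axis needs n^d evaluations in total, while a plain Monte Carlo estimate uses a fixed sample budget in any dimension.

```python
import random

random.seed(2)

# A grid-based rule needs n points per axis, hence n**dim evaluations.
n = 10
for dim in (1, 2, 5, 10):
    print(dim, n ** dim)  # grows exponentially with dim

def mc_integrate(f, dim, n_samples=100_000):
    """Plain Monte Carlo estimate of the integral of f over [0, 1]**dim."""
    total = 0.0
    for _ in range(n_samples):
        total += f([random.random() for _ in range(dim)])
    return total / n_samples

# The integral of sum(x_i) over [0, 1]**dim equals dim / 2.
est = mc_integrate(sum, 10)
print(est)  # close to 5.0
```

With 10 points per axis, a 10-dimensional grid already requires 10^10 evaluations, while the Monte Carlo estimate above uses 10^5 samples and its statistical error shrinks as 1/sqrt(n_samples) regardless of the dimension.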
