Mahalanobis distance

The Mahalanobis distance (named after Prasanta Chandra Mahalanobis) is a distance measure between points in a multidimensional vector space. It is used in particular in statistics, for example in multivariate methods.

For multivariate distributions, the m coordinates of a point are represented as an m-dimensional column vector, which is regarded as a realization of a random vector X with covariance matrix S.

The distance between two points x and y distributed in this way is then given by the Mahalanobis distance

$$ d(x, y) = \sqrt{(x - y)^{\top} S^{-1} (x - y)} $$
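As a minimal sketch of the usual definition d(x, y) = sqrt((x - y)^T S^{-1} (x - y)), the following pure-Python code computes the distance for vectors of any dimension. The function names `solve` and `mahalanobis` and the example matrix are illustrative assumptions, not part of the text above; the linear system is solved by Gaussian elimination rather than by forming the inverse explicitly.

```python
import math

def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - s) / m[r][r]
    return z

def mahalanobis(x, y, cov):
    """Mahalanobis distance between x and y for covariance matrix cov."""
    diff = [xi - yi for xi, yi in zip(x, y)]
    z = solve(cov, diff)  # z = S^{-1} (x - y)
    return math.sqrt(sum(d * zi for d, zi in zip(diff, z)))

# Illustrative covariance matrix: variance 4 along the first axis, 1 along the second.
S = [[4.0, 0.0], [0.0, 1.0]]
print(mahalanobis([2.0, 0.0], [0.0, 0.0], S))
```

In this example the first coordinate has standard deviation 2, so a displacement of 2 along that axis counts as one unit of Mahalanobis distance.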

The Mahalanobis distance is scale-invariant and translation-invariant.
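Both invariances can be checked directly. As a sketch, assume an invertible matrix A and a shift vector b (both symbols are introduced here for illustration only). Under the affine map x ↦ Ax + b the covariance matrix S becomes A S A^T, and the squared distance d'² computed in the new coordinates is

```latex
\begin{aligned}
d'(Ax + b,\, Ay + b)^2
  &= \bigl(A(x - y)\bigr)^{\top} \bigl(A S A^{\top}\bigr)^{-1} \bigl(A(x - y)\bigr) \\
  &= (x - y)^{\top} A^{\top} A^{-\top} S^{-1} A^{-1} A \,(x - y) \\
  &= (x - y)^{\top} S^{-1} (x - y) \;=\; d(x, y)^2 .
\end{aligned}
```

The shift b cancels in the difference, which gives translation invariance; choosing A as a diagonal matrix of positive scale factors gives scale invariance.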

Graphically, in two dimensions the points of equal Mahalanobis distance from a center form an ellipse (whose axes do not necessarily point in the direction of the coordinate axes), whereas for the Euclidean distance they form a circle. If the covariance matrix is the identity matrix (this is the case exactly when the individual components of the random vector X are pairwise uncorrelated and each has variance 1), the Mahalanobis distance coincides with the Euclidean distance. The separating surfaces consisting of points at equal distance from two given points can be arbitrary conic sections under the Mahalanobis distance, provided a different covariance matrix is used for each of the two points; with a common covariance matrix the separating surface is a line (in general a hyperplane).
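The geometric picture can be illustrated with a small sketch in two dimensions. The covariance matrix and the points below are made-up illustrative values, and `mahalanobis_2d` is a hypothetical helper that uses the closed-form inverse of a 2x2 matrix. The two test points lie on the principal axes of the ellipse, which here point along (1, 1) and (1, -1) rather than along the coordinate axes.

```python
import math

def mahalanobis_2d(x, y, cov):
    """Mahalanobis distance in two dimensions for a 2x2 covariance matrix."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = x[0] - y[0], x[1] - y[1]
    # Quadratic form (x - y)^T S^{-1} (x - y) via the closed-form 2x2 inverse.
    q = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.sqrt(q)

center = (0.0, 0.0)
cov = [[2.0, 1.0], [1.0, 2.0]]  # illustrative correlated covariance matrix

# One point on the long axis and one on the short axis of the same ellipse.
p_long = (math.sqrt(1.5), math.sqrt(1.5))
p_short = (math.sqrt(0.5), -math.sqrt(0.5))

for p in (p_long, p_short):
    d_mahal = mahalanobis_2d(p, center, cov)
    d_eucl = math.hypot(p[0] - center[0], p[1] - center[1])
    print(p, "Mahalanobis:", round(d_mahal, 6), "Euclidean:", round(d_eucl, 6))

# With the identity matrix the Mahalanobis distance reduces to the Euclidean one.
identity = [[1.0, 0.0], [0.0, 1.0]]
print(mahalanobis_2d((3.0, 4.0), center, identity), math.hypot(3.0, 4.0))
```

Both points have the same Mahalanobis distance from the center (up to rounding) although their Euclidean distances differ.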

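Mathematically, the Mahalanobis distance can be derived from the m-dimensional normal distribution with mean vector μ and covariance matrix S. This distribution has the density

$$ f(x) = \frac{1}{\sqrt{(2\pi)^m \det S}} \exp\left( -\tfrac{1}{2} (x - \mu)^{\top} S^{-1} (x - \mu) \right) $$

Taking the logarithm of this expression gives

$$ \ln f(x) = c - \tfrac{1}{2} (x - \mu)^{\top} S^{-1} (x - \mu) = c - \tfrac{1}{2}\, d(x, \mu)^2 $$

for a constant c. Apart from the missing square root, the prefactor -1/2 and the additive constant c, the log-density therefore corresponds to the Mahalanobis distance d(x, μ) of x from the mean vector.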
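The relation between the log-density and the squared Mahalanobis distance can be checked numerically. The sketch below assumes the standard bivariate normal density f(x) = exp(-q/2) / sqrt((2π)² det S), where q is the squared Mahalanobis distance of x from the mean; the mean, covariance matrix and test points are made-up values, and the function names are hypothetical.

```python
import math

def squared_mahalanobis_2d(x, mu, cov):
    """Squared Mahalanobis distance of x from mu for a 2x2 covariance matrix."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    return (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det

def normal_density_2d(x, mu, cov):
    """Density of the bivariate normal distribution at x."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    norm = 1.0 / math.sqrt((2.0 * math.pi) ** 2 * det)
    return norm * math.exp(-0.5 * squared_mahalanobis_2d(x, mu, cov))

mu = (1.0, -1.0)
cov = [[2.0, 0.5], [0.5, 1.0]]  # illustrative values
det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]

# The constant in the log-density depends only on the dimension and det S.
const = -0.5 * math.log((2.0 * math.pi) ** 2 * det)

for x in [(1.0, -1.0), (2.0, 0.0), (-1.0, 3.0)]:
    lhs = math.log(normal_density_2d(x, mu, cov))
    rhs = const - 0.5 * squared_mahalanobis_2d(x, mu, cov)
    print(x, lhs, rhs)  # the two values agree up to floating-point error
```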

Applications

In discriminant analysis, the assignment of a point to one of several given populations is determined, among other things, by means of the Mahalanobis distance. Another application is the detection of outliers, where the point y is replaced by a (robust) location parameter. It should be noted critically that both the covariance matrix and the location parameter can themselves be distorted by outliers. In most cases they are therefore estimated by robust methods, such as the MCD estimator. Furthermore, when the Mahalanobis distance is used as a distance classifier, two cases can be distinguished:

The decision for one of the alternatives has to be justified by empirical analysis.
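The outlier-detection use described above can be sketched as follows for two-dimensional data. This is only an illustration under stated assumptions: the location parameter and covariance matrix are estimated here with the ordinary sample mean and sample covariance from a small made-up reference sample, not with a robust method such as the MCD estimator, and the cutoff `threshold=3.0` is an arbitrary value chosen for the example. All function names are hypothetical.

```python
import math

def sample_mean_cov_2d(points):
    """Sample mean and unbiased sample covariance matrix (not robust to outliers)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), [[sxx, sxy], [sxy, syy]]

def mahalanobis_2d(x, center, cov):
    """Mahalanobis distance of x from the location parameter `center`."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = x[0] - center[0], x[1] - center[1]
    return math.sqrt((d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det)

def flag_outliers(points, center, cov, threshold):
    """Return the indices of points whose distance from center exceeds threshold."""
    return [i for i, p in enumerate(points)
            if mahalanobis_2d(p, center, cov) > threshold]

# Made-up reference sample used to estimate location and covariance.
reference = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
center, cov = sample_mean_cov_2d(reference)

# New observations to be checked against the reference estimates.
new_points = [(1.0, 1.5), (1.0, 6.0)]
print(flag_outliers(new_points, center, cov, threshold=3.0))
```

Because `center` and `cov` are passed in as arguments, estimates from a robust procedure could be substituted without changing `flag_outliers`.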
