Multivariate normal distribution

The multidimensional or multivariate normal distribution is a type of multivariate probability distribution and generalizes the (one-dimensional) normal distribution to several dimensions.

A multivariate normal distribution is determined by two parameters: the vector $\mu$ of the expected values of the one-dimensional components and the covariance matrix $\Sigma$. These correspond to the parameters $\mu$ and $\sigma^2$ of the one-dimensional normal distribution.

Multivariate normal random variables occur as limits of certain sums of independent multidimensional random variables. This is the generalization of the central limit theorem to the multidimensional case.

Because they arise wherever multidimensional random quantities can be viewed as a superposition of many independent individual effects, they are of great practical importance.

Owing to the so-called reproductive property of the multivariate normal distribution, the distribution of sums (and linear combinations) of multivariate normal random variables can be stated explicitly, which plays a role in the field of multivariate statistics.

The multivariate normal distribution: general case

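A $p$-dimensional real random variable $X$ is normally distributed with mean vector $\mu$ and (positive definite) covariance matrix $\Sigma$ if it has a density function of the form

$$f_X(x) = \frac{1}{\sqrt{(2\pi)^p \det(\Sigma)}} \exp\left(-\frac{1}{2}(x-\mu)^\top \Sigma^{-1} (x-\mu)\right).$$

Here $\det(\Sigma)$ denotes the determinant of the covariance matrix $\Sigma$.

We write

$$X \sim \mathcal{N}_p(\mu, \Sigma).$$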

For the corresponding distribution function, there is no closed formula. The corresponding integrals have to be calculated numerically.
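As an illustration, the density of a $p$-dimensional normal distribution can be evaluated numerically. The sketch below assumes the standard density $f(x) = \left((2\pi)^p \det\Sigma\right)^{-1/2} \exp\left(-\tfrac{1}{2}(x-\mu)^\top \Sigma^{-1}(x-\mu)\right)$ and uses NumPy; the function name `mvn_pdf` is a hypothetical choice for this example. It works with a Cholesky factor instead of an explicit inverse, which is one common way to keep the computation stable.

```python
import numpy as np

def mvn_pdf(x, mu, sigma):
    """Density of N_p(mu, sigma) at the point x (sigma must be positive definite)."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    p = mu.shape[0]
    # Cholesky factor L with sigma = L @ L.T; raises LinAlgError if sigma
    # is not positive definite, i.e. if no density exists.
    L = np.linalg.cholesky(sigma)
    # Solve L z = (x - mu), so that z @ z = (x - mu)^T sigma^{-1} (x - mu).
    z = np.linalg.solve(L, x - mu)
    # log det(sigma) = 2 * sum(log(diag(L)))
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    log_pdf = -0.5 * (p * np.log(2.0 * np.pi) + log_det + z @ z)
    return float(np.exp(log_pdf))
```

For example, `mvn_pdf([0.0, 0.0], [0.0, 0.0], np.eye(2))` evaluates the two-dimensional standard normal density at the origin, which equals $1/(2\pi)$.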

The multivariate normal distribution has the following features:

  • If the components of $X$ are pairwise uncorrelated, they are also statistically independent.
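  • The linear transformation $Y = AX + b$ with a matrix $A \in \mathbb{R}^{q \times p}$ (with $q \le p$) and $b \in \mathbb{R}^q$ is $q$-dimensionally normally distributed: $Y \sim \mathcal{N}_q(A\mu + b, A \Sigma A^\top)$. Under the definition given here, this holds only if $A \Sigma A^\top$ is non-singular, i.e. has a non-zero determinant.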
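  • The linear transformation $Y = \Sigma^{-1/2}(X - \mu)$ standardizes the random vector $X$: $Y \sim \mathcal{N}_p(0, I_p)$, where $I_p$ is the $p \times p$ identity matrix.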
  • $X$ may also have a singular covariance matrix. One then speaks of a degenerate or singular multivariate normal distribution. In this case, no density function exists.
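  • Conditional distribution given partial knowledge of the random vector: if a multivariate normal random vector is conditioned on a sub-vector, the result is itself multivariate normally distributed. For $X = (X_1, X_2)^\top \sim \mathcal{N}_p(\mu, \Sigma)$ with $\mu = (\mu_1, \mu_2)^\top$ and $\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}$ we have $X_1 \mid X_2 = x_2 \sim \mathcal{N}\left(\mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2),\ \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\right)$.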
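
The conditional-distribution property can be made concrete with a short NumPy sketch. It assumes the usual partitioned formulas, namely conditional mean $\mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2)$ and conditional covariance $\Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$. The function name `conditional_mvn` and its argument layout are hypothetical choices for this example.

```python
import numpy as np

def conditional_mvn(mu, sigma, idx_known, x_known):
    """Parameters of the distribution of the free components given the known ones."""
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    idx_known = np.asarray(idx_known)
    # Indices of the components that are not observed.
    idx_free = np.setdiff1d(np.arange(mu.shape[0]), idx_known)
    s11 = sigma[np.ix_(idx_free, idx_free)]
    s12 = sigma[np.ix_(idx_free, idx_known)]
    s22 = sigma[np.ix_(idx_known, idx_known)]
    # K = s12 @ inv(s22), obtained from a linear solve (s22 is symmetric).
    K = np.linalg.solve(s22, s12.T).T
    cond_mu = mu[idx_free] + K @ (np.asarray(x_known, dtype=float) - mu[idx_known])
    cond_sigma = s11 - K @ s12.T
    return cond_mu, cond_sigma
```

For a bivariate distribution with zero means, unit variances and correlation 0.5, conditioning on the second component being 2 gives conditional mean 1 and conditional variance 0.75.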

The marginal distribution of the multivariate normal distribution

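Let $X \sim \mathcal{N}_p(\mu, \Sigma)$ be multivariate normally distributed. For an arbitrary partition $X = (X_1, X_2)$ with $X_1 \in \mathbb{R}^k$ and $X_2 \in \mathbb{R}^{p-k}$, $1 \le k < p$, the marginal distributions of $X_1$ and $X_2$ are (multivariate) normal distributions.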

However, the converse is not true, as the following example shows:

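Let $X_1 \sim \mathcal{N}(0, 1)$ and let $X_2$ be defined by

$$X_2 := W X_1,$$

where $W$ is independent of $X_1$ with $P(W = 1) = p$ and $P(W = -1) = 1 - p$. Then likewise $X_2 \sim \mathcal{N}(0, 1)$ and

$$\operatorname{Cov}(X_1, X_2) = E(W)\,E(X_1^2) = 2p - 1.$$

Thus the covariance (and hence the correlation coefficient) of $X_1$ and $X_2$ is zero if and only if $p = 1/2$. But $X_1$ and $X_2$ are not independent by construction, since $|X_2|$ always equals $|X_1|$. For $p = 1/2$, $(X_1, X_2)$ is therefore in particular not multivariate normally distributed, because uncorrelated components of a multivariate normal vector would have to be independent.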
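
A small simulation can illustrate such a counterexample. The sketch below assumes one standard construction, which may differ in detail from the one intended above: $X_1$ is standard normal and $X_2 = W X_1$ with a random sign $W$ that is independent of $X_1$ and takes the values $\pm 1$ with probability 1/2 each. Sample size and seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400_000

x1 = rng.standard_normal(n)
# Random sign, independent of x1, with P(w = 1) = P(w = -1) = 1/2.
w = rng.choice([-1.0, 1.0], size=n)
x2 = w * x1

# The sample correlation should be close to zero ...
corr = np.corrcoef(x1, x2)[0, 1]
# ... although |x2| equals |x1| in every draw, so the variables are dependent.
same_magnitude = np.array_equal(np.abs(x1), np.abs(x2))
# x1 + x2 is exactly zero whenever w = -1, i.e. in about half of the draws;
# a linear combination of a non-degenerate jointly normal pair could not
# have such a point mass, so (x1, x2) is not bivariate normal.
frac_zero_sum = np.mean(x1 + x2 == 0.0)
```

Both marginals are standard normal here, yet the pair is not jointly normal, which is the point of the counterexample.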

The p-dimensional standard normal distribution

The probability measure on $\mathbb{R}^p$ defined by the density function

$$f(x) = \frac{1}{(2\pi)^{p/2}} \exp\left(-\frac{1}{2} x^\top x\right)$$

is called the standard normal distribution of dimension $p$. Apart from translations (i.e. expected value $\mu \neq 0$) and a common scaling of all components, the $p$-dimensional standard normal distribution is the only multivariate distribution whose components are statistically independent and whose density is at the same time rotationally symmetric.

Density of the two-dimensional normal distribution

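The density function of the two-dimensional normal distribution with mean $(0, 0)$, unit variances $\sigma_1 = \sigma_2 = 1$ and correlation coefficient $\rho$ is

$$f(x_1, x_2) = \frac{1}{2\pi\sqrt{1-\rho^2}} \exp\left(-\frac{x_1^2 - 2\rho x_1 x_2 + x_2^2}{2(1-\rho^2)}\right).$$

In the more general case with two-dimensional mean $(0, 0)$ and arbitrary variances $\sigma_1^2$, $\sigma_2^2$, the density function is

$$f(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\left(-\frac{1}{2(1-\rho^2)}\left[\frac{x_1^2}{\sigma_1^2} - \frac{2\rho x_1 x_2}{\sigma_1\sigma_2} + \frac{x_2^2}{\sigma_2^2}\right]\right).$$

The most general case with mean $(\mu_1, \mu_2)$ is obtained by translation (replace $x_1$ by $x_1 - \mu_1$ and $x_2$ by $x_2 - \mu_2$).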
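
The two-dimensional density can be written out directly in code. The sketch below assumes the standard bivariate formula with means $\mu_1, \mu_2$, standard deviations $\sigma_1, \sigma_2$ and correlation coefficient $\rho$ with $|\rho| < 1$; the function name and parameter names are hypothetical choices for this example.

```python
import math

def bivariate_normal_pdf(x1, x2, mu1=0.0, mu2=0.0, sigma1=1.0, sigma2=1.0, rho=0.0):
    """Density of the two-dimensional normal distribution at (x1, x2)."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    if sigma1 <= 0.0 or sigma2 <= 0.0:
        raise ValueError("standard deviations must be positive")
    # Translation and scaling: standardized coordinates.
    z1 = (x1 - mu1) / sigma1
    z2 = (x2 - mu2) / sigma2
    one_minus_rho2 = 1.0 - rho * rho
    # Quadratic form in the exponent.
    q = (z1 * z1 - 2.0 * rho * z1 * z2 + z2 * z2) / one_minus_rho2
    norm = 2.0 * math.pi * sigma1 * sigma2 * math.sqrt(one_minus_rho2)
    return math.exp(-0.5 * q) / norm
```

With the default arguments the function reduces to the two-dimensional standard normal density, and for `rho=0.0` it factorizes into the product of two one-dimensional normal densities.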

Example of a multivariate normal distribution

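Consider an apple orchard with many apple trees of the same age, i.e. comparable trees. We are interested in the characteristics height of the trees, number of leaves and yield. The following random variables are therefore defined:

$X_1$: height of a tree [m]; $X_2$: yield [100 kg]; $X_3$: number of leaves [1000 pieces].

The variables are each normally distributed, $X_j \sim \mathcal{N}(\mu_j, \sigma_j^2)$ for $j = 1, 2, 3$.

Most trees are about 4 ± 1 m tall; very small or very large trees are rare. A tall tree tends to have a higher yield than a small tree, but of course every now and then there is a tall tree with little yield. Yield and height are correlated; the covariance is $\operatorname{Cov}(X_1, X_2) = \sigma_{12}$ and the correlation coefficient is $\rho_{12} = \sigma_{12} / (\sigma_1 \sigma_2)$.

Likewise, $\operatorname{Cov}(X_1, X_3) = \sigma_{13}$ with correlation coefficient $\rho_{13}$, and $\operatorname{Cov}(X_2, X_3) = \sigma_{23}$ with correlation coefficient $\rho_{23}$.

If the three random variables are combined into the random vector $X = (X_1, X_2, X_3)^\top$, then $X$ is multivariate normally distributed. This does not hold in general (cf. The marginal distribution of the multivariate normal distribution). In the present case, the joint distribution of $X$ is

$$X \sim \mathcal{N}_3(\mu, \Sigma), \qquad \mu = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \sigma_1^2 & \sigma_{12} & \sigma_{13} \\ \sigma_{12} & \sigma_2^2 & \sigma_{23} \\ \sigma_{13} & \sigma_{23} & \sigma_3^2 \end{pmatrix}.$$

The corresponding correlation matrix is

$$R = \begin{pmatrix} 1 & \rho_{12} & \rho_{13} \\ \rho_{12} & 1 & \rho_{23} \\ \rho_{13} & \rho_{23} & 1 \end{pmatrix}.$$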

Samples in multivariate distributions

In practice, the distribution parameters of a multivariate distribution are usually not known. These parameters must therefore be estimated.

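One draws a sample of size $n$. Each realization $x_i$, $i = 1, \dots, n$, of the random vector can be regarded as a point in a $p$-dimensional space. This gives the $(n \times p)$ data matrix

$$\mathbf{X} = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{pmatrix},$$

in which each row contains the coordinates of one point.

The expected value vector is estimated by the mean vector of the column-wise arithmetic means

$$\bar{x} = (\bar{x}_1, \bar{x}_2, \dots, \bar{x}_p)^\top$$

with the components

$$\bar{x}_j = \frac{1}{n} \sum_{i=1}^{n} x_{ij}.$$

For estimating the covariance matrix, the data matrix $\mathbf{X}^*$ centered with respect to the arithmetic means proves useful. It is calculated as

$$\mathbf{X}^* = \mathbf{X} - \mathbf{1}\,\bar{x}^\top$$

with the elements $x^*_{ij} = x_{ij} - \bar{x}_j$, where $\mathbf{1}$ denotes the ones vector, a column vector of length $n$ whose entries are all ones.

The estimated covariance matrix $S$ has the components

$$s_{jk} = \frac{1}{n-1} \sum_{i=1}^{n} x^*_{ij} x^*_{ik}.$$

It is calculated as

$$S = \frac{1}{n-1}\, \mathbf{X}^{*\top} \mathbf{X}^*.$$

The correlation matrix $R$ is estimated by the pairwise correlation coefficients

$$r_{jk} = \frac{s_{jk}}{\sqrt{s_{jj}\, s_{kk}}}.$$

The entries on its main diagonal are all ones.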
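
These estimation steps can be sketched with NumPy. The sketch assumes the usual unbiased estimator with divisor $n - 1$; the function name `estimate_moments` is hypothetical, and the small data matrix is made up purely for illustration.

```python
import numpy as np

def estimate_moments(data):
    """Return mean vector, centered data matrix, covariance and correlation matrix."""
    data = np.asarray(data, dtype=float)   # n rows (observations), p columns
    n = data.shape[0]
    mean = data.mean(axis=0)               # column-wise arithmetic means
    centered = data - mean                 # subtracts the mean from every row
    cov = centered.T @ centered / (n - 1)  # covariance estimate with divisor n - 1
    std = np.sqrt(np.diag(cov))
    corr = cov / np.outer(std, std)        # r_jk = s_jk / sqrt(s_jj * s_kk)
    return mean, centered, cov, corr

# Made-up data: 5 observations of 3 variables (illustration only).
data = np.array([
    [1.0, 2.0, 0.5],
    [2.0, 1.0, 1.5],
    [3.0, 4.0, 1.0],
    [4.0, 3.0, 3.0],
    [5.0, 6.0, 2.0],
])
mean, centered, cov, corr = estimate_moments(data)
```

The results agree with NumPy's built-in `np.cov(data, rowvar=False)` and `np.corrcoef(data, rowvar=False)`, which use the same divisor by default.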

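Example for samples

Ten apple trees were selected at random and three properties were measured on each: $X_1$: height of a tree [m]; $X_2$: yield [100 kg]; $X_3$: number of leaves [1000 pieces]. These $n = 10$ observations are summarized in a data matrix with 10 rows and 3 columns, whose entry $x_{ij}$ is the value of property $j$ for tree $i$.

The mean values are calculated, as shown here for $\bar{x}_1$, as

$$\bar{x}_1 = \frac{1}{10} \sum_{i=1}^{10} x_{i1}.$$

They yield the mean vector

$$\bar{x} = (\bar{x}_1, \bar{x}_2, \bar{x}_3)^\top.$$

For the centered data matrix, the centered observations are obtained by subtracting the corresponding column mean from each entry:

$$x^*_{ij} = x_{ij} - \bar{x}_j.$$

For the covariance matrix, the covariances are calculated as, for example,

$$s_{12} = \frac{1}{9} \sum_{i=1}^{10} x^*_{i1} x^*_{i2},$$

and correspondingly the variances, for example

$$s_{11} = \frac{1}{9} \sum_{i=1}^{10} \left(x^*_{i1}\right)^2,$$

so that the covariance matrix $S = (s_{jk})_{j,k = 1,2,3}$ results.

Accordingly, we obtain for the correlation matrix, for example,

$$r_{12} = \frac{s_{12}}{\sqrt{s_{11}\, s_{22}}},$$

or in total the matrix $R = (r_{jk})_{j,k = 1,2,3}$.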

Generation of multi-dimensional, normally distributed random numbers

A common method for generating a random vector X from an N-dimensional normal distribution with mean vector μ and (symmetric and positive definite) covariance matrix Σ can be stated as follows:
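
One widely used approach of this kind, assumed here to be the method meant, is based on the Cholesky decomposition: factorize Σ = A Aᵀ with a lower triangular matrix A, draw a vector Z of N independent standard normal random numbers, and set X = μ + A Z. The NumPy sketch below follows these steps; the function name `sample_mvn` and its signature are hypothetical choices for this example.

```python
import numpy as np

def sample_mvn(mu, sigma, size, rng):
    """Draw `size` random vectors from N(mu, sigma) via Cholesky decomposition."""
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    # Step 1: lower triangular A with sigma = A @ A.T
    # (raises LinAlgError if sigma is not positive definite).
    A = np.linalg.cholesky(sigma)
    # Step 2: independent standard normal numbers, one row per sample.
    z = rng.standard_normal((size, mu.shape[0]))
    # Step 3: x = mu + A z for every row (written as z @ A.T for all rows at once).
    return mu + z @ A.T

rng = np.random.default_rng(42)
mu = np.array([1.0, -2.0])
sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
samples = sample_mvn(mu, sigma, 200_000, rng)
```

The sample mean `samples.mean(axis=0)` and the sample covariance `np.cov(samples, rowvar=False)` should then lie close to `mu` and `sigma`. NumPy also ships a ready-made `Generator.multivariate_normal` method for the same purpose.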