Multivariate normal distribution
The multidimensional or multivariate normal distribution is a multivariate probability distribution that generalizes the (one-dimensional) normal distribution to several dimensions.
A multivariate normal distribution is determined by two parameters: the vector $\mu$ of the expected values of the one-dimensional components and the covariance matrix $\Sigma$. These correspond to the parameters $\mu$ and $\sigma^2$ of the one-dimensional normal distribution.
Multivariate normal random variables arise as limits of certain sums of independent multidimensional random variables. This is the extension of the central limit theorem to the multidimensional case.
Because they arise wherever multidimensional random quantities can be regarded as a superposition of many mutually independent individual effects, they are of great practical importance.
Owing to the so-called reproductive property of the multivariate normal distribution, the distribution of sums (and linear combinations) of multivariate normal random variables can be stated explicitly, which plays a role in multivariate statistics.
The multivariate normal distribution: general case
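A $p$-dimensional real random variable $X$ is normally distributed with mean vector $\mu$ and (positive definite) covariance matrix $\Sigma$ if it has a density function of the form
$$f_X(x) = \frac{1}{\sqrt{(2\pi)^p \det(\Sigma)}} \exp\left(-\frac{1}{2}(x-\mu)^\top \Sigma^{-1} (x-\mu)\right).$$
Here $\det(\Sigma)$ is the determinant of the covariance matrix.
We write $X \sim \mathcal{N}_p(\mu, \Sigma)$.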
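As an illustration of the density of a $p$-dimensional normal distribution, the following is a minimal pure-Python sketch. The function names `cholesky` and `mvn_pdf` are chosen here for illustration and do not come from the text. The sketch assumes the covariance matrix is symmetric and positive definite and uses a Cholesky factor to obtain both the determinant and the quadratic form in the exponent.

```python
import math

def cholesky(sigma):
    """Return lower-triangular L with L L^T = sigma (sigma symmetric, positive definite)."""
    p = len(sigma)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = sigma[i][i] - s
                if d <= 0.0:
                    raise ValueError("covariance matrix is not positive definite")
                L[i][j] = math.sqrt(d)
            else:
                L[i][j] = (sigma[i][j] - s) / L[j][j]
    return L

def mvn_pdf(x, mu, sigma):
    """Density of the p-dimensional normal distribution N_p(mu, sigma) at the point x."""
    p = len(mu)
    L = cholesky(sigma)
    # Forward substitution solves L y = (x - mu); then y.y equals the
    # quadratic form (x - mu)^T sigma^{-1} (x - mu).
    y = []
    for i in range(p):
        s = sum(L[i][k] * y[k] for k in range(i))
        y.append((x[i] - mu[i] - s) / L[i][i])
    quad = sum(v * v for v in y)
    # det(sigma) is the squared product of the diagonal entries of L.
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(p))
    return math.exp(-0.5 * (p * math.log(2.0 * math.pi) + log_det + quad))
```

Working with the logarithm of the density until the last step is a design choice that avoids overflow or underflow in the intermediate terms for larger $p$.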
For the corresponding distribution function there is no closed-form expression. The corresponding integrals have to be evaluated numerically.
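One simple, though not especially efficient, way to approximate such an integral is Monte Carlo simulation. The sketch below estimates the distribution function $P(X \le x, Y \le y)$ for the two-dimensional case with zero means, unit variances and correlation $\rho$. The function name `bivariate_cdf_mc`, the sample size and the seed are assumptions made for this illustration.

```python
import math
import random

def bivariate_cdf_mc(x, y, rho, n=100_000, seed=0):
    """Monte Carlo estimate of P(X <= x, Y <= y) for standard normal X, Y with correlation rho."""
    rng = random.Random(seed)
    c = math.sqrt(1.0 - rho * rho)
    hits = 0
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        # The pair (z1, rho*z1 + c*z2) has unit variances and correlation rho.
        if z1 <= x and rho * z1 + c * z2 <= y:
            hits += 1
    return hits / n
```

For the special point $x = y = 0$ an exact value is known, $P(X \le 0, Y \le 0) = \tfrac{1}{4} + \arcsin(\rho)/(2\pi)$, which can serve as a plausibility check for the estimate.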
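The multivariate normal distribution has the following properties:
- If the components of $X$ are pairwise uncorrelated, they are also stochastically independent.
- The linear transformation $Y = AX + b$ with a matrix $A \in \mathbb{R}^{q \times p}$ (with $q \le p$) and $b \in \mathbb{R}^q$ is $q$-dimensional normally distributed: $Y \sim \mathcal{N}_q(A\mu + b, A \Sigma A^\top)$. Under the definition given here, this holds only if $A \Sigma A^\top$ is non-singular, i.e. has a non-zero determinant.
- The linear transformation $Y = \Sigma^{-1/2}(X - \mu)$ standardizes the random vector $X$: $Y \sim \mathcal{N}_p(0, I)$, where $I$ is the identity matrix.
- $X$ may also have a singular covariance matrix. One then speaks of a degenerate or singular multivariate normal distribution. In this case no density function exists.
- Conditional distribution given partial knowledge of the random vector: if a multivariate normal random vector is conditioned on a sub-vector, the result is itself multivariate normally distributed. For $X = (X_1, X_2)$ with $\mu = (\mu_1, \mu_2)$ and $\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}$, we have $X_1 \mid X_2 = x_2 \sim \mathcal{N}\left(\mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2),\ \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\right)$.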
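The conditional distribution can be made concrete in the two-dimensional case, where all blocks of the covariance matrix are scalars. The following sketch assumes a 2 × 2 covariance matrix with positive variance of $X_2$. The function name `conditional_normal_2d` is hypothetical.

```python
def conditional_normal_2d(mu, sigma, x2):
    """Mean and variance of X1 given X2 = x2 for (X1, X2) ~ N_2(mu, sigma).

    mu is a pair (mu1, mu2); sigma is a 2 x 2 covariance matrix given as nested lists.
    """
    (s11, s12), (s21, s22) = sigma
    if s22 <= 0.0:
        raise ValueError("Var(X2) must be positive")
    # Conditional mean: mu1 + s12 / s22 * (x2 - mu2)
    cond_mean = mu[0] + s12 / s22 * (x2 - mu[1])
    # Conditional variance: s11 - s12 * s21 / s22 (does not depend on x2)
    cond_var = s11 - s12 * s21 / s22
    return cond_mean, cond_var
```

With illustrative values (zero means, unit variances, covariance 0.5), observing $X_2 = 2$ shifts the mean of $X_1$ to 1 and reduces its variance to 0.75.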
The marginal distribution of the multivariate normal distribution
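Let $X \sim \mathcal{N}_p(\mu, \Sigma)$ be multivariate normally distributed. For an arbitrary partition $X = (X_1, X_2)$ with $X_1 \in \mathbb{R}^k$ and $X_2 \in \mathbb{R}^{p-k}$, the marginal distributions of $X_1$ and $X_2$ are (multivariate) normal distributions.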
However, the converse is not true, as the following example shows:
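Let $X_1 \sim \mathcal{N}(0, 1)$ and let $X_2$ be defined by $X_2 = a X_1$, where $a$ is a random sign independent of $X_1$ with $P(a = +1) = p$ and $P(a = -1) = 1 - p$.
Then likewise $X_2 \sim \mathcal{N}(0, 1)$ and
$$\operatorname{Cov}(X_1, X_2) = E(X_1 X_2) = E(a)\,E(X_1^2) = 2p - 1.$$
Thus the covariance (and hence the correlation) of $X_1$ and $X_2$ equals zero if and only if $p = 1/2$. But $X_1$ and $X_2$ are not independent by construction, since $|X_2|$ always equals $|X_1|$. Therefore $(X_1, X_2)$ is in particular not multivariate normally distributed, because uncorrelated components of a multivariate normal vector would have to be independent.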
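The counterexample can be checked by simulation. The sketch below assumes the construction $X_2 = a X_1$ with $X_1 \sim \mathcal{N}(0,1)$ and a random sign $a$ that is independent of $X_1$ and equals $+1$ with probability $p$. The function names, sample size and seed are chosen for illustration only.

```python
import random

def simulate_counterexample(n=20_000, p=0.5, seed=1):
    """Draw n pairs (X1, X2) with X1 ~ N(0, 1) and X2 = a * X1, P(a = +1) = p."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)
        a = 1.0 if rng.random() < p else -1.0
        pairs.append((x1, a * x1))
    return pairs

def sample_covariance(pairs):
    """Sample covariance (divisor n - 1) of a list of (u, v) pairs."""
    n = len(pairs)
    m1 = sum(u for u, _ in pairs) / n
    m2 = sum(v for _, v in pairs) / n
    return sum((u - m1) * (v - m2) for u, v in pairs) / (n - 1)

pairs = simulate_counterexample()
# |X2| == |X1| holds for every draw, so the components are clearly dependent,
# while the sample covariance for p = 0.5 should be close to zero.
always_same_magnitude = all(abs(u) == abs(v) for u, v in pairs)
cov = sample_covariance(pairs)
```

A scatter plot of such pairs would show all points on the two diagonals $x_2 = \pm x_1$, which is incompatible with the elliptical contours of a two-dimensional normal density.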
The p-dimensional standard normal distribution
The probability measure on $\mathbb{R}^p$ defined by the density function
$$f(x) = \frac{1}{(2\pi)^{p/2}} \exp\left(-\frac{1}{2}\sum_{i=1}^{p} x_i^2\right)$$
is called the standard normal distribution of dimension $p$. Apart from translations (i.e., an expected value $\mu \neq 0$) and a common rescaling of all components, the $p$-dimensional standard normal distribution is the only multivariate distribution whose components are stochastically independent and whose density is at the same time rotationally symmetric.
Density of the two-dimensional normal distribution
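The density function of the two-dimensional normal distribution with mean $\mu = (0, 0)$, unit variances, and correlation coefficient $\rho$ is
$$f(x, y) = \frac{1}{2\pi\sqrt{1-\rho^2}} \exp\left(-\frac{x^2 - 2\rho x y + y^2}{2(1-\rho^2)}\right).$$
In the more general two-dimensional case with mean $\mu = (0, 0)$ and arbitrary variances $\sigma_1^2$ and $\sigma_2^2$, the density function is
$$f(x, y) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\left(-\frac{1}{2(1-\rho^2)}\left[\frac{x^2}{\sigma_1^2} - \frac{2\rho x y}{\sigma_1\sigma_2} + \frac{y^2}{\sigma_2^2}\right]\right).$$
The most general case with mean $\mu = (\mu_1, \mu_2)$ is obtained by translation (replace $x$ by $x - \mu_1$ and $y$ by $y - \mu_2$).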
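The two-dimensional density, including the translation to a general mean, can be sketched in a few lines of Python. The function name `bivariate_normal_pdf` and its parameter names are assumptions made for this illustration. The sketch requires positive standard deviations and $|\rho| < 1$.

```python
import math

def bivariate_normal_pdf(x, y, mu1=0.0, mu2=0.0, sigma1=1.0, sigma2=1.0, rho=0.0):
    """Density of the two-dimensional normal distribution at the point (x, y)."""
    if sigma1 <= 0.0 or sigma2 <= 0.0 or not -1.0 < rho < 1.0:
        raise ValueError("need sigma1 > 0, sigma2 > 0 and -1 < rho < 1")
    # Translate by the mean and scale by the standard deviations.
    u = (x - mu1) / sigma1
    v = (y - mu2) / sigma2
    one_minus_rho2 = 1.0 - rho * rho
    quad = (u * u - 2.0 * rho * u * v + v * v) / one_minus_rho2
    norm = 2.0 * math.pi * sigma1 * sigma2 * math.sqrt(one_minus_rho2)
    return math.exp(-0.5 * quad) / norm
```

At the mean the exponent vanishes, so the density there equals $1/(2\pi\sigma_1\sigma_2\sqrt{1-\rho^2})$, which gives a quick check of an implementation.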
Example of a multivariate normal distribution
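Consider an apple orchard with many apple trees of the same age, i.e. comparable trees. We are interested in the characteristics height of the trees, number of leaves, and yield. The following random variables are therefore defined:
$X_1$: height of a tree [m]; $X_2$: yield [100 kg]; $X_3$: number of leaves [1000].
Each of the variables is normally distributed, $X_j \sim \mathcal{N}(\mu_j, \sigma_j^2)$ for $j = 1, 2, 3$.
Most of the trees are thus around 4 ± 1 m tall; very small or very large trees are rare. A tall tree tends to have a higher yield than a small tree, but now and then there is of course a tall tree with little yield. Yield and height are correlated, with covariance $\operatorname{Cov}(X_1, X_2) = \sigma_{12}$ and correlation coefficient $\rho_{12} = \sigma_{12}/(\sigma_1\sigma_2)$.
Likewise $\operatorname{Cov}(X_1, X_3) = \sigma_{13}$ with correlation coefficient $\rho_{13}$, and $\operatorname{Cov}(X_2, X_3) = \sigma_{23}$ with correlation coefficient $\rho_{23}$.
If the three random variables are combined into the random vector $X = (X_1, X_2, X_3)^\top$, then $X$ is multivariate normally distributed. This does not hold in general (cf. The marginal distribution of the multivariate normal distribution). In the present case, the joint distribution of $X$ has the parameters
$$\mu = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \end{pmatrix}$$
and
$$\Sigma = \begin{pmatrix} \sigma_1^2 & \sigma_{12} & \sigma_{13} \\ \sigma_{12} & \sigma_2^2 & \sigma_{23} \\ \sigma_{13} & \sigma_{23} & \sigma_3^2 \end{pmatrix}.$$
The corresponding correlation matrix is
$$R = \begin{pmatrix} 1 & \rho_{12} & \rho_{13} \\ \rho_{12} & 1 & \rho_{23} \\ \rho_{13} & \rho_{23} & 1 \end{pmatrix}.$$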
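The step from a covariance matrix to the corresponding correlation matrix can be sketched as follows. The numerical values in `sigma_example` are purely hypothetical and are not the values of the orchard example. The function name `covariance_to_correlation` is likewise chosen for illustration.

```python
import math

def covariance_to_correlation(sigma):
    """Divide each covariance by the product of the two standard deviations."""
    p = len(sigma)
    sd = [math.sqrt(sigma[j][j]) for j in range(p)]
    return [[sigma[j][k] / (sd[j] * sd[k]) for k in range(p)] for j in range(p)]

# Hypothetical covariance matrix for (height, yield, number of leaves);
# the numbers are illustrative only.
sigma_example = [
    [1.0, 4.0, 2.0],
    [4.0, 25.0, 12.0],
    [2.0, 12.0, 9.0],
]
R = covariance_to_correlation(sigma_example)
```

With these illustrative numbers the standard deviations are 1, 5 and 3, so the correlation between the first two components is 4 / (1 · 5) = 0.8.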
Samples from multivariate distributions
In reality, the parameters of a multivariate distribution are unknown in most cases. These parameters must therefore be estimated.
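Draw a sample of size $n$. Each realization $x_i$ ($i = 1, \ldots, n$) of the random vector can be regarded as a point in a $p$-dimensional hyperspace. This gives the $(n \times p)$ data matrix $X$ as
$$X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{pmatrix},$$
in which each row contains the coordinates of one point.
The expected value vector is estimated by the mean vector of the arithmetic column means,
$$\bar{x} = (\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_p)^\top,$$
with the components
$$\bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}.$$
For the estimation of the covariance matrix, the data matrix $X^*$ centered with respect to the arithmetic means proves useful. It is calculated as
$$X^* = X - \mathbf{1}\,\bar{x}^\top,$$
with the elements $x^*_{ij} = x_{ij} - \bar{x}_j$, where $\mathbf{1}$ denotes the ones vector, a column vector of all ones of length $n$.
The estimated covariance matrix $S$ has the components
$$s_{jk} = \frac{1}{n-1}\sum_{i=1}^{n} x^*_{ij}\,x^*_{ik}.$$
It is calculated as
$$S = \frac{1}{n-1}\,X^{*\top} X^*.$$
The correlation matrix $R$ is estimated by the pairwise correlation coefficients
$$r_{jk} = \frac{s_{jk}}{\sqrt{s_{jj}\,s_{kk}}}.$$
On its main diagonal are all ones.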
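The estimation steps (column means, centering, covariance matrix, correlation matrix) can be sketched in plain Python as follows. The function names are chosen for illustration, and the sketch assumes the divisor $n - 1$ for the covariance estimate and a data matrix given as a list of rows.

```python
import math

def column_means(data):
    """Arithmetic mean of each column of the data matrix (list of rows)."""
    n = len(data)
    p = len(data[0])
    return [sum(row[j] for row in data) / n for j in range(p)]

def center(data):
    """Subtract the column mean from every entry of the corresponding column."""
    means = column_means(data)
    return [[row[j] - means[j] for j in range(len(means))] for row in data]

def sample_covariance_matrix(data):
    """Estimated covariance matrix with divisor n - 1."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    xc = center(data)
    p = len(xc[0])
    return [[sum(r[j] * r[k] for r in xc) / (n - 1) for k in range(p)]
            for j in range(p)]

def sample_correlation_matrix(data):
    """Pairwise correlation coefficients derived from the covariance matrix."""
    s = sample_covariance_matrix(data)
    p = len(s)
    return [[s[j][k] / math.sqrt(s[j][j] * s[k][k]) for k in range(p)]
            for j in range(p)]
```

For a tiny made-up data matrix such as `[[1, 2], [2, 4], [3, 6]]`, the second column is exactly twice the first, so the estimated correlation between the two columns is 1.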
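Example for samples
Ten apple trees were selected at random, and three properties were measured on each: $X_1$: height of a tree [m]; $X_2$: yield [100 kg]; $X_3$: number of leaves [1000]. These $n = 10$ observations are collected in the $10 \times 3$ data matrix $X = (x_{ij})$, in which row $i$ holds the three measurements of tree $i$.
The mean values are calculated, as shown here for $\bar{x}_1$, as
$$\bar{x}_1 = \frac{1}{10}\sum_{i=1}^{10} x_{i1}.$$
They yield the mean vector $\bar{x} = (\bar{x}_1, \bar{x}_2, \bar{x}_3)^\top$.
For the centered data matrix $X^*$, the centered observations are obtained by subtracting the corresponding mean from each column:
$$x^*_{ij} = x_{ij} - \bar{x}_j.$$
For the covariance matrix, the covariances are calculated, for example
$$s_{12} = \frac{1}{9}\sum_{i=1}^{10} x^*_{i1}\,x^*_{i2},$$
and correspondingly the variances, for example
$$s_{11} = \frac{1}{9}\sum_{i=1}^{10} \left(x^*_{i1}\right)^2,$$
so that the covariance matrix $S = \frac{1}{9}\,X^{*\top} X^*$ results.
Accordingly, for the correlation matrix one obtains, for example,
$$r_{12} = \frac{s_{12}}{\sqrt{s_{11}\,s_{22}}},$$
or in total the $3 \times 3$ matrix $R = (r_{jk})$.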
Generation of multi-dimensional, normally distributed random numbers
A common method for generating a random vector $X$ from an $N$-dimensional normal distribution with mean vector $\mu$ and (symmetric and positive definite) covariance matrix $\Sigma$ is as follows:
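A minimal sketch of one such method, assuming the widely used Cholesky approach: factor $\Sigma = A A^\top$ with a lower-triangular matrix $A$, draw a vector $z$ of $N$ independent standard normal numbers, and return $\mu + A z$. The function names `cholesky` and `sample_mvn` are chosen for illustration. The source does not confirm that this is the exact method it goes on to describe.

```python
import math
import random

def cholesky(sigma):
    """Return lower-triangular A with A A^T = sigma (sigma symmetric, positive definite)."""
    n = len(sigma)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(A[i][k] * A[j][k] for k in range(j))
            if i == j:
                d = sigma[i][i] - s
                if d <= 0.0:
                    raise ValueError("covariance matrix is not positive definite")
                A[i][j] = math.sqrt(d)
            else:
                A[i][j] = (sigma[i][j] - s) / A[j][j]
    return A

def sample_mvn(mu, sigma, rng):
    """Draw one random vector from N(mu, sigma) using x = mu + A z."""
    A = cholesky(sigma)
    n = len(mu)
    # z holds n independent standard normal random numbers.
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # A is lower triangular, so row i only needs z[0..i].
    return [mu[i] + sum(A[i][k] * z[k] for k in range(i + 1)) for i in range(n)]
```

Since $z$ has covariance matrix $I$, the vector $\mu + A z$ has mean $\mu$ and covariance $A I A^\top = \Sigma$. When many vectors are drawn, computing the factor $A$ once and reusing it would avoid repeating the factorization in every call.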