Maximum likelihood

The maximum likelihood method (English: maximum likelihood) is a parametric estimation method in statistics. Put simply, the procedure is to select as the estimate that parameter under whose distribution the realization of the observed data appears most plausible.

In the case of a probability function depending on a parameter $\vartheta$,

$$p_\vartheta(x) = P(X = x \mid \vartheta), \qquad x \in \mathcal{X},\ \vartheta \in \Theta,$$

one therefore considers, for an observed outcome $x$, the following likelihood function for varying parameters:

$$L(\vartheta) = p_\vartheta(x).$$

Here $\mathcal{X}$ denotes the space of all outcomes and $\Theta$ the space of all possible parameter values.

For a particular value of the parameter $\vartheta$, the likelihood $L(\vartheta)$ is the probability of observing the outcome $x$. The maximum likelihood estimate is the value $\hat\vartheta$ at which the likelihood function attains its maximum. For continuous distributions an analogous definition applies; the probability function is simply replaced by the corresponding probability density.


Motivation

In simple terms, the maximum likelihood method means the following: in statistical analyses one generally examines a sample of a certain size drawn from a population. Since investigating the entire population is in most cases impossible for reasons of cost and effort, the important characteristics of the population are unknown. Such characteristics are, for example, the expected value or the standard deviation. Since these characteristics are nevertheless required for the statistical calculations one wants to carry out, the unknown parameters of the population must be estimated on the basis of the known sample.

The maximum likelihood method is used in situations where the elements of the population can be interpreted as realizations of a random experiment that depends on an unknown parameter but is otherwise uniquely determined and known. The characteristics of interest then depend exclusively on this unknown parameter, i.e. they can be represented as functions of it. The maximum likelihood estimator is the parameter value that maximizes the probability of obtaining the observed sample.

Because of its advantages over other estimation techniques (such as the least squares method and the method of moments), the maximum likelihood method is the most important principle for constructing estimators of the parameters of a distribution.

A heuristic derivation

Consider the following example: there is an urn with a large number of balls that are either black or red. Since examining all the balls appears practically impossible, a sample of ten balls is drawn (say, with replacement). Suppose this sample contains one red and nine black balls. Based on this single sample, the true probability of a red ball in the overall population (the urn) is to be estimated.

The maximum likelihood method tries to choose this estimate so that the occurrence of our sample becomes most likely. To this end, one could "play around" with different estimates and check for which of them the probability of our sample result is largest.

" Try ", for example, 0.2 as an estimate of the probability p of a red ball, so you can use the binomial distribution B (10, 0.2, 1) calculate the probability of the observed result ( exactly one red ball ): the result is 0.2684.

" Try " is it with 0.1 as an estimate of p, ie calculated B (10, 0.1, 1) for the probability that exactly one red ball is drawn, the result is 0.3874.

The probability that the observed result (exactly one red ball in the sample) arises from a population with red-ball probability p = 0.1 is thus greater than with p = 0.2. According to the maximum likelihood method, 0.1 is therefore the better estimate for the fraction p of red balls in the population. It turns out that for p = 0.1 the probability of the observed outcome is largest. Therefore 0.1 is the maximum likelihood estimate of p. One can show that, in general, with k red balls in the sample the maximum likelihood estimate of p is k/10.
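
The following short Python sketch (not part of the original article; the grid of candidate values is chosen purely for illustration) reproduces this comparison numerically with scipy's binomial distribution and confirms that the likelihood of observing k = 1 red ball in 10 draws is maximized at p = 0.1:

```python
import numpy as np
from scipy.stats import binom

n_draws, k_red = 10, 1          # sample size and observed number of red balls

# Likelihood of the observed result for a grid of candidate probabilities p
p_grid = np.linspace(0.01, 0.99, 99)
likelihood = binom.pmf(k_red, n_draws, p_grid)

print(binom.pmf(k_red, n_draws, 0.2))   # ~0.2684
print(binom.pmf(k_red, n_draws, 0.1))   # ~0.3874
print(p_grid[np.argmax(likelihood)])    # ~0.1 = k/10, the ML estimate
```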

Definition

The maximum likelihood method assumes random variables whose probability density or probability function $f(x \mid \vartheta)$ depends on an unknown parameter $\vartheta$. For a simple random sample with $n$ independent and identically distributed realizations $x_1, \dots, x_n$, the joint density or probability function factorizes as follows:

$$f(x_1, \dots, x_n \mid \vartheta) = \prod_{i=1}^{n} f(x_i \mid \vartheta).$$

Rather than evaluating this density for arbitrary values of $x_1, \dots, x_n$ with the parameter held fixed, it can conversely be regarded as a function of $\vartheta$ for observed, and therefore fixed, realizations $x_1, \dots, x_n$. This leads to the likelihood function

$$L(\vartheta) = \prod_{i=1}^{n} f(x_i \mid \vartheta).$$

If this function is maximized as a function of $\vartheta$, one obtains the maximum likelihood estimate of $\vartheta$. In other words, one looks for the value of $\vartheta$ at which the sample values have the greatest density or probability. In this sense the maximum likelihood estimator is the most plausible parameter value for the realizations of the random variables. If the function is differentiable, the maximum is found by taking the first derivative with respect to $\vartheta$ and setting it to zero. Because this can be very laborious for density functions with complicated exponent expressions, the logarithmized likelihood function (log-likelihood function for short)

$$\ell(\vartheta) = \log L(\vartheta) = \sum_{i=1}^{n} \log f(x_i \mid \vartheta)$$

is often used instead: thanks to the monotonicity of the logarithm it has its maximum at the same position as the non-logarithmized likelihood function, but it is easier to handle.
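
As a minimal sketch of this recipe (not from the article; the exponential model and the data values are illustrative assumptions), the log-likelihood of an assumed model can also be maximized numerically when no closed form is convenient:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import expon

# Illustrative data, assumed to come from an exponential distribution
x = np.array([0.8, 1.3, 0.2, 2.1, 0.9])

def neg_log_likelihood(lam):
    # negative log-likelihood of an Exp(lam) model: minus the sum of log densities
    return -np.sum(expon.logpdf(x, scale=1.0 / lam))

res = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 100.0), method="bounded")
print(res.x)           # numerical ML estimate of lambda
print(1.0 / x.mean())  # closed-form MLE for comparison: 1 / sample mean
```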

Examples

Discrete distribution, continuous parameter space

The number of calls received by two telephone operators in one hour in a call center can be modeled by a Poisson distribution

$$P_\lambda(X = n) = \frac{\lambda^{n}}{n!}\,e^{-\lambda}.$$

Suppose the first operator receives three and the second five calls per hour, independently of each other. The likelihood function for the unknown parameter $\lambda$ is then

$$L(\lambda) = P_\lambda(X = 3)\,P_\lambda(X = 5).$$

Substituting the values into the probability function, it follows that

$$L(\lambda) = \frac{\lambda^{3}}{3!}e^{-\lambda}\cdot\frac{\lambda^{5}}{5!}e^{-\lambda} = \frac{\lambda^{8}}{3!\,5!}\,e^{-2\lambda}.$$

The first derivative of the likelihood function is

$$L'(\lambda) = \frac{1}{3!\,5!}\left(8\lambda^{7} - 2\lambda^{8}\right)e^{-2\lambda},$$

with zeros at $\lambda = 0$ and $\lambda = 4$ (and $L(\lambda) \to 0$ as $\lambda \to \infty$). Only at $\lambda = 4$ does the likelihood function have a maximum, and this is the maximum likelihood estimate.

In the general case with $m$ telephone operators, each of whom receives $n_i$ calls per hour, the likelihood function is

$$L(\lambda) = \prod_{i=1}^{m} \frac{\lambda^{n_i}}{n_i!}\,e^{-\lambda}$$

and the log-likelihood function is

$$\ell(\lambda) = \sum_{i=1}^{m}\left(n_i \ln\lambda - \lambda - \ln(n_i!)\right).$$

The derivative with respect to $\lambda$ yields

$$\ell'(\lambda) = \sum_{i=1}^{m}\left(\frac{n_i}{\lambda} - 1\right),$$

and setting it to zero gives the maximum likelihood estimate

$$\hat\lambda = \frac{1}{m}\sum_{i=1}^{m} n_i$$

and the corresponding estimator

$$\hat\lambda = \frac{1}{m}\sum_{i=1}^{m} N_i,$$

i.e. the sample mean of the observed call counts.
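
A brief numerical check of this example (not part of the original; scipy's Poisson pmf and the evaluation grid are used only for illustration) evaluates the likelihood on a grid of candidate values and compares the maximizer with the closed-form estimate, the sample mean:

```python
import numpy as np
from scipy.stats import poisson

calls = np.array([3, 5])                     # observed calls per operator

lam_grid = np.linspace(0.1, 10.0, 1000)
# joint likelihood of both observations for each candidate lambda
likelihood = np.array([poisson.pmf(calls, lam).prod() for lam in lam_grid])

print(lam_grid[np.argmax(likelihood)])       # close to 4.0
print(calls.mean())                          # closed-form MLE: 4.0
```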

Discrete distribution, finite parameter space

An urn contains balls that are either red or black. The exact number of red balls is not known. Balls are drawn one after the other and each is placed back into the urn. Observed are: first ball red, second ball red, third ball black and fourth ball red.

We now look for the composition of balls in the urn that is most plausible according to the maximum likelihood principle.

In each draw the probability of drawing a red ball is the same; call it $p$. Because the draws are independent, the probability of the observed outcome, and hence the likelihood function as a function of the unknown parameter $p$, is given by

$$L(p) = p \cdot p \cdot (1 - p) \cdot p = p^{3}(1 - p).$$

Evaluating this function for the finitely many values of $p$ that correspond to the possible compositions of the urn and comparing the results shows for which composition the likelihood is largest. That composition is the most plausible parameter value for the realization of three red balls in four draws, and therefore the estimate according to the maximum likelihood method.
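
Since the table of function values is not reproduced here, the following sketch simply assumes, purely for illustration, an urn with three balls in total, so that the admissible values of p are 0, 1/3, 2/3 and 1, and picks the composition maximizing L(p) = p^3(1 - p):

```python
# Hypothetical finite parameter space: an urn assumed to hold 3 balls,
# r of which are red, so p = r / 3 for r = 0, 1, 2, 3.
def likelihood(p):
    # probability of observing red, red, black, red in independent draws
    return p**3 * (1 - p)

candidates = {r: likelihood(r / 3) for r in range(4)}
for r, L in candidates.items():
    print(f"r = {r} red balls: L = {L:.4f}")

best = max(candidates, key=candidates.get)
print("most plausible composition:", best, "red balls")   # r = 2 under this assumption
```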

Continuous distribution, continuous parameter space

For the normal distribution $\mathcal{N}(\mu, \sigma^{2})$ with probability density

$$f(x \mid \mu, \sigma^{2}) = \frac{1}{\sqrt{2\pi\sigma^{2}}}\exp\!\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right),$$

the likelihood function of a sample of size $n$ is

$$L(\mu, \sigma^{2}) = \prod_{i=1}^{n}\frac{1}{\sqrt{2\pi\sigma^{2}}}\exp\!\left(-\frac{(x_i-\mu)^{2}}{2\sigma^{2}}\right).$$

This family of distributions has two parameters: $\mu$ and $\sigma^{2}$.

The log-likelihood function is

$$\ell(\mu, \sigma^{2}) = -\frac{n}{2}\ln(2\pi\sigma^{2}) - \frac{1}{2\sigma^{2}}\sum_{i=1}^{n}(x_i - \mu)^{2}.$$

Its partial derivatives with respect to $\mu$ and $\sigma^{2}$ are

$$\frac{\partial \ell}{\partial \mu} = \frac{1}{\sigma^{2}}\sum_{i=1}^{n}(x_i - \mu)$$

and

$$\frac{\partial \ell}{\partial \sigma^{2}} = -\frac{n}{2\sigma^{2}} + \frac{1}{2\sigma^{4}}\sum_{i=1}^{n}(x_i - \mu)^{2}.$$

Setting both expressions equal to zero and solving the resulting nonlinear system of equations gives

$$\hat\mu = \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

and

$$\hat\sigma^{2} = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^{2}.$$

Indeed, the function attains its maximum at this point.

For the expected value of $\hat\mu$ one obtains

$$E(\hat\mu) = \mu,$$

that is, the maximum likelihood estimator of $\mu$ is unbiased.

However, as shown in the article on the sample variance,

$$E(\hat\sigma^{2}) = \frac{n-1}{n}\,\sigma^{2},$$

so the estimator of $\sigma^{2}$ is not unbiased.
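
A small Monte Carlo sketch (illustrative only; the true parameters and the small sample size are arbitrary choices) makes this bias visible: the average of the ML variance estimates over many samples comes out close to (n-1)/n times the true variance.

```python
import numpy as np

rng = np.random.default_rng(0)
mu_true, sigma2_true, n = 2.0, 4.0, 5      # assumed true parameters, small n

mu_hats, var_hats = [], []
for _ in range(100_000):
    x = rng.normal(mu_true, np.sqrt(sigma2_true), size=n)
    mu_hats.append(x.mean())                      # ML estimate of mu
    var_hats.append(((x - x.mean())**2).mean())   # ML estimate of sigma^2 (divides by n)

print(np.mean(mu_hats))    # ~2.0, unbiased
print(np.mean(var_hats))   # ~ (n-1)/n * 4 = 3.2, biased downward
```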

Historical development

The maximum likelihood method goes back to R. A. Fisher, who developed it, initially in relative ignorance of preliminary work by Gauss, in papers from 1912, 1921 and finally 1922 under the name by which it later became known. The main results had also been derived by F. Y. Edgeworth in 1908.

Maximum likelihood estimate

In statistics, a maximum likelihood estimate is a parameter estimate computed according to the maximum likelihood method. In the English literature the abbreviation MLE (maximum likelihood estimate or maximum likelihood estimator) is very common. An estimate that incorporates prior knowledge in the form of an a priori probability is called a maximum a posteriori estimate (MAP).

Characteristics of the maximum likelihood estimators

The special quality of maximum likelihood estimators lies in the fact that they are, as a rule, the most efficient method for estimating certain parameters.

Existence

Under certain regularity assumptions one can prove that maximum likelihood estimators exist, which is not obvious given their implicit definition as the unique maximum point of a probability function that is not further specified. The conditions required for this proof consist essentially of assumptions about the interchangeability of integration and differentiation, which are satisfied in most models considered.

Asymptotic properties

If maximum likelihood estimators exist, they are asymptotically efficient, i.e. they converge in distribution to a normally distributed random variable whose variance is the inverse of the Fisher information. Formally, let $\hat\vartheta_n$ be the maximum likelihood estimator for a parameter $\vartheta$ and let $I(\vartheta)$ be the Fisher information matrix. Then the following convergence statement holds:

$$\sqrt{n}\,(\hat\vartheta_n - \vartheta) \xrightarrow{\;d\;} \mathcal{N}\!\left(0,\, I(\vartheta)^{-1}\right).$$

This limit theorem is so important in particular because, as a consequence of the Cramér-Rao inequality, the inverse of the Fisher information is the best achievable variance for the estimation of an unknown parameter. In this respect the maximum likelihood method is asymptotically optimal.
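
As an illustrative sketch (not from the article; the Poisson model and the parameter values are assumptions), this can be checked numerically: for the Poisson distribution the Fisher information is 1/λ, so √n(λ̂_n − λ) should have variance close to λ, the inverse Fisher information, for large n.

```python
import numpy as np

rng = np.random.default_rng(1)
lam_true, n = 4.0, 2000

# Repeatedly draw Poisson samples and record the scaled estimation error
errors = [np.sqrt(n) * (rng.poisson(lam_true, size=n).mean() - lam_true)
          for _ in range(20_000)]

print(np.var(errors))   # close to lam_true = 4.0, the inverse Fisher information
```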

General tests

The convergence of the maximum likelihood estimator to a normal distribution allows the derivation of general tests for checking models and coefficients: the likelihood ratio test, the Wald test, and the score test.

Intuitively, the likelihood ratio test compares the values of the likelihood functions with each other, the Wald test checks the distance between the estimated parameter and the hypothesized parameter, and the score test checks whether the derivative of the likelihood function is zero.

Since these tests are only asymptotically valid, tests with better optimality properties often exist for "small" sample sizes.

Likelihood-ratio test

The likelihood-ratio test checks whether two nested models differ significantly from each other. Let $\vartheta$ be a parameter vector, let $\Theta_0 \subset \Theta$ be the two parameter spaces (reduced model, full model) and let $L(\vartheta)$ be the likelihood function. Then, under the null hypothesis $H_0\colon \vartheta \in \Theta_0$ (versus $H_1\colon \vartheta \in \Theta$),

$$-2\,\ln\!\left(\frac{\sup_{\vartheta \in \Theta_0} L(\vartheta)}{\sup_{\vartheta \in \Theta} L(\vartheta)}\right) \xrightarrow{\;d\;} \chi^{2}_{q},$$

where $q$ is the difference in the number of free parameters between the two models.

A rejection of the null hypothesis means that the "full" model provides a significantly better explanation than the "reduced" model.
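
The following sketch (illustrative assumptions throughout: a Poisson model, made-up count data, and a reduced model that fixes λ = 3) shows the mechanics of the test using scipy's chi-square distribution:

```python
import numpy as np
from scipy.stats import poisson, chi2

x = np.array([3, 5, 4, 6, 2, 4])        # illustrative count data

# Full model: lambda estimated by ML (the sample mean).
loglik_full = poisson.logpmf(x, x.mean()).sum()
# Reduced model: lambda fixed at a hypothesized value (no free parameter).
loglik_reduced = poisson.logpmf(x, 3.0).sum()

lr_stat = -2.0 * (loglik_reduced - loglik_full)
p_value = chi2.sf(lr_stat, df=1)        # one parameter difference between the models
print(lr_stat, p_value)
```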

Wald test

While the likelihood ratio test compares models, the Wald test targets individual coefficients (univariate case) or groups of coefficients (multivariate case). From the asymptotic properties it follows that, under the null hypothesis $H_0\colon \vartheta = \vartheta_0$ (versus $H_1\colon \vartheta \neq \vartheta_0$),

$$\frac{\hat\vartheta_n - \vartheta_0}{\widehat{\operatorname{se}}(\hat\vartheta_n)} \xrightarrow{\;d\;} \mathcal{N}(0, 1),$$

where $\widehat{\operatorname{se}}(\hat\vartheta_n)$ denotes the estimated standard error of the maximum likelihood estimator.
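
Again purely as a sketch (the Poisson data and the null value below are assumptions), the Wald statistic divides the distance between the ML estimate and the hypothesized value by the estimated standard error, here obtained from the inverse Fisher information:

```python
import numpy as np
from scipy.stats import norm

x = np.array([3, 5, 4, 6, 2, 4])       # illustrative count data (Poisson model)
lam_hat = x.mean()                     # ML estimate of lambda
lam_0 = 3.0                            # hypothesized value under H0

# For the Poisson model the Fisher information is 1/lambda per observation,
# so the estimated standard error of lam_hat is sqrt(lam_hat / n).
se_hat = np.sqrt(lam_hat / len(x))

wald = (lam_hat - lam_0) / se_hat
p_value = 2 * norm.sf(abs(wald))       # two-sided p-value
print(wald, p_value)
```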

Akaike information criterion

The maximum likelihood method is also closely linked to the Akaike information criterion (AIC). Hirotsugu Akaike showed that the maximum of the likelihood function is a biased estimator of the Kullback-Leibler divergence, the distance between the true model and the maximum likelihood model. The larger the value of the maximum likelihood, the closer the ML model is to the true model; one therefore selects the model with the smallest AIC. The asymptotic bias is precisely the number of parameters to be estimated. In contrast to the likelihood ratio, Wald and score tests, the AIC also allows the comparison of non-nested ML models.
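
A minimal sketch (the data and the two candidate models are illustrative assumptions) of the usual AIC formula, AIC = 2k − 2 ln L̂, comparing two non-nested models fitted by maximum likelihood:

```python
import numpy as np
from scipy.stats import norm, expon

x = np.array([0.9, 1.7, 0.4, 2.3, 1.1, 0.6])   # illustrative positive data

# Model 1: normal distribution, k = 2 parameters (mu, sigma) fitted by ML
mu_hat, sigma_hat = x.mean(), x.std()
aic_normal = 2 * 2 - 2 * norm.logpdf(x, mu_hat, sigma_hat).sum()

# Model 2: exponential distribution, k = 1 parameter (scale) fitted by ML
aic_expon = 2 * 1 - 2 * expon.logpdf(x, scale=x.mean()).sum()

print(aic_normal, aic_expon)   # the smaller AIC indicates the preferred model
```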

Disadvantage

These desirable properties of the maximum likelihood approach rest, however, on the crucial assumption about the data-generating process, i.e. on the assumed density function of the random variable under investigation. The disadvantage of the maximum likelihood method is that a specific assumption about the entire distribution of the random variable has to be made. If this assumption is violated, the maximum likelihood estimator may be inconsistent. Only in some cases is it irrelevant whether the random variable actually follows the assumed distribution; this does not hold in general. Maximum likelihood estimators that remain consistent even when the underlying distributional assumption is violated are called pseudo maximum likelihood estimators. These estimators can have efficiency problems in small samples.

Example: Maximum likelihood in molecular phylogenetics

The maximum likelihood criterion is regarded as one of the standard methods for computing phylogenetic trees, which are used to explore the relationships between organisms, mostly on the basis of DNA or protein sequences. As an explicit method, maximum likelihood allows the use of different evolutionary models, which enter the tree calculations in the form of substitution matrices. Either empirical models are used (for protein sequences), or the probabilities of the various point mutations between the nucleotides are estimated from the data set and optimized with regard to the likelihood value (for DNA sequences). Overall, ML is considered the most reliable and least artifact-prone of the phylogenetic tree construction methods. However, it requires careful taxon "sampling" and usually a complex model of evolution.
