Akaike information criterion

An information criterion is a criterion for model selection in statistics. It follows the idea of Occam's razor that a model should not be unnecessarily complex, and it balances the goodness of fit of the estimated model to the available empirical data (the sample) against the model's complexity, measured by the number of parameters. The number of parameters enters as a penalty, since otherwise complex models with many parameters would be preferred. In this sense, the adjusted coefficient of determination, which goes back to Henri Theil (1970), is a forerunner of the information criteria known today.

All information criteria used today have in common that they appear in two different formulations: the goodness of fit is expressed either as the maximized likelihood or as the minimum variance of the residuals. This gives rise to different interpretations. In the former case, the "best" model is the one for which the information criterion attains the highest value (the penalty for the number of parameters is subtracted); in the latter, it is the model with the lowest value of the information criterion (the penalty for the number of parameters is added).

Akaike information criterion

The historically oldest criterion was proposed by Hirotugu Akaike in 1973 as "an information criterion" and is now known as the Akaike information criterion (AIC).

In the population, a variable follows a distribution with unknown density function f. In maximum likelihood (ML) estimation, it is assumed that the distribution in the population belongs to a known family with unknown parameter(s) θ, so that the density function can be written as g(·|θ). The Kullback–Leibler divergence is used as a distance measure between f and g(·|θ̂), where θ̂ is the parameter estimated by the maximum likelihood method. The better the ML model, the smaller the Kullback–Leibler divergence.
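As an illustration of this distance measure, the following minimal sketch (a hypothetical example: in practice the true density f is unknown) numerically approximates the Kullback–Leibler divergence between a "true" normal density f and a fitted normal density g:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical example: true density f = N(0, 1),
# model density g = N(0.3, 1.2) standing in for an ML fit.
f = norm(loc=0.0, scale=1.0)
g = norm(loc=0.3, scale=1.2)

# Approximate D(f || g) = E_f[ln f(X) - ln g(X)] on a fine grid
# using the trapezoidal rule.
x = np.linspace(-10.0, 10.0, 10_001)
kl = np.trapz(f.pdf(x) * (f.logpdf(x) - g.logpdf(x)), x)
print(f"Kullback-Leibler divergence D(f || g) = {kl:.4f}")
```

The closer g is to f, the smaller this value; a perfect model gives a divergence of zero.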

Akaike showed that the negative log-likelihood is a biased estimator of the Kullback–Leibler divergence, and that the bias converges asymptotically (as the sample size tends to infinity) to k, the number of parameters to be estimated. The AIC is therefore obtained from the log-likelihood function as

AIC = −2 ln L(θ̂) + 2k.
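A minimal sketch of this definition, assuming an illustrative normal model fitted by maximum likelihood (the data, model family, and variable names are assumptions, not prescribed by the text):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=200)  # illustrative sample

# ML estimates for a normal model: sample mean and the
# biased (ddof=0) standard deviation.
mu_hat = data.mean()
sigma_hat = data.std()

log_lik = norm.logpdf(data, loc=mu_hat, scale=sigma_hat).sum()
k = 2  # number of estimated parameters (mu, sigma)

aic = -2.0 * log_lik + 2.0 * k  # AIC = -2 ln L(theta_hat) + 2k
print(f"AIC = {aic:.2f}")
```

Among several candidate models fitted to the same data, the one with the smallest AIC would be preferred under this formulation.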

In the classical linear regression model with normally distributed errors, the negative log-likelihood can be expressed, up to additive constants, through the estimated variance of the error terms, which gives

AIC = n ln(σ̂²) + 2k,

where n is the sample size and σ̂² = (1/n) Σᵢ ε̂ᵢ² is the variance of the disturbances estimated from the residuals ε̂ᵢ of the regression model.
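In this regression form the criterion can be computed directly from the residuals, as the following sketch with an illustrative ordinary least squares fit shows (data and variable names are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x = rng.uniform(0.0, 10.0, size=n)
y = 1.5 + 0.8 * x + rng.normal(scale=1.0, size=n)  # illustrative data

# OLS fit with intercept: design matrix X has columns [1, x].
X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta_hat

# ML estimate of the error variance from the residuals.
sigma2_hat = (residuals ** 2).sum() / n

k = X.shape[1]  # number of estimated regression parameters
aic = n * np.log(sigma2_hat) + 2 * k  # AIC = n ln(sigma^2_hat) + 2k
print(f"AIC = {aic:.2f}")
```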

Bayesian Information Criterion

A disadvantage of the Akaike information criterion is that its penalty term is independent of the sample size. In large samples, improvements in the log-likelihood or reductions in the residual variance are "easier" to achieve, which is why the criterion tends to favor models with relatively many parameters in large samples. The Bayesian information criterion (BIC), also called the Schwarz Bayesian criterion (SBC), proposed by Gideon Schwarz in 1978, is therefore recommended:

BIC = −2 ln L(θ̂) + k ln(n)

or, in the regression formulation,

BIC = n ln(σ̂²) + k ln(n).

For this criterion, the factor in front of the penalty term grows logarithmically with the number of observations n. From as few as eight observations (ln 8 = 2.07944 > 2), the BIC penalizes additional parameters more sharply than the AIC.
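The crossover at eight observations can be checked directly; the sketch below (a hypothetical continuation of the regression example above) compares the per-parameter penalties and computes the BIC in its regression formulation:

```python
import numpy as np

# Per-parameter penalty: 2 for the AIC, ln(n) for the BIC.
for n in (4, 7, 8, 100, 10_000):
    print(f"n = {n:6d}: AIC penalty = 2, BIC penalty = ln(n) = {np.log(n):.5f}")
# From n = 8 onward, ln(n) = 2.07944... > 2, so the BIC penalizes
# each additional parameter more sharply than the AIC.

def bic_regression(n, sigma2_hat, k):
    """BIC in the regression formulation: n ln(sigma^2_hat) + k ln(n)."""
    return n * np.log(sigma2_hat) + k * np.log(n)
```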

The latter criterion is frequently used, especially in sociology. Kuha (2004) draws attention to the differing objectives of the two criteria: while the BIC is meant to identify the true model, the AIC does not presuppose the existence of a true model and aims at the best possible prediction.

Further information criteria

In addition, there are other, less frequently used information criteria, such as:

  • The Hannan–Quinn information criterion (HQC)
  • The deviance information criterion, DIC (Spiegelhalter, Best, Carlin and van der Linde (2002))
  • EIC (Ishiguro, Sakamoto, and Kitagawa (1997))
  • FIC (Wei (1992)), GIC (Nishii (1984))
  • NIC (Murata, Yoshizawa and Amari (1991))
  • TIC (Takeuchi (1976))

A statistical test based on information criteria is the Vuong test.
