Bernstein–von Mises theorem


The Bernstein–von Mises theorem establishes an important link between Bayesian and frequentist statistics. In parametric models, the posterior distribution asymptotically (that is, for a large number of observations) concentrates around the true parameter, regardless of the prior distribution (consistency of the Bayesian estimator).

By the Bernstein–von Mises theorem, the suitably centered and scaled posterior distribution is moreover asymptotically a normal distribution with the inverse Fisher information matrix as covariance matrix (asymptotic efficiency of the Bayesian estimator). Optimal frequentist and Bayesian approaches therefore lead to qualitatively the same asymptotic results in parametric models.
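A common formal statement (the notation is not from this article and is given under standard regularity assumptions): if \(\theta_0\) denotes the true parameter, \(\hat\theta_n\) an efficient estimator such as the maximum likelihood estimator based on \(n\) observations, and \(I(\theta_0)\) the Fisher information matrix, then

\[
\left\| P\left(\theta \in \cdot \mid X_1, \dots, X_n\right) \;-\; \mathcal{N}\!\left(\hat\theta_n,\ \tfrac{1}{n}\, I(\theta_0)^{-1}\right) \right\|_{\mathrm{TV}} \;\longrightarrow\; 0
\]

in \(P_{\theta_0}\)-probability, i.e. the total variation distance between the posterior and the indicated normal distribution vanishes asymptotically.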

In this sense, the posterior distribution of the unknown quantities in a problem becomes independent of the prior distribution as soon as the amount of information contributed by the sample is large enough.

Example of use

In the following, the application of the theorem and the typical approach of Bayesian inference are illustrated by a simple example. A random variable \(X\) is observed through its realization \(x\), a set of measured data from the sample space. These data are to be described by a stochastic model with an unknown parameter \(\theta\), which may also be vector-valued. Before the data are collected, both their values and that of the parameter are uncertain, so a joint stochastic model for \((X, \theta)\) is natural. In this interpretation the parameter \(\theta\) is itself a random variable with a prior distribution \(\pi(\theta)\). This prior is of course unknown before the actual measurement, and a "reasonable" prior assumption must be made about it. After the data have been observed, the assessment of the parameter is updated; all available information is then described by the posterior distribution. By Bayes' theorem it is given by

\[
\pi(\theta \mid x) \;=\; \frac{p(x \mid \theta)\,\pi(\theta)}{p(x)},
\]

where \(p(x \mid \theta)\) is the so-called likelihood function, which describes the distribution of the data for a given parameter. One hopes that the posterior distribution permits better and more precise statements about \(\theta\) than the original, naive prior distribution. This update step is usually referred to as Bayesian learning and is an essential step in learning in neural networks. If we now take this posterior distribution as the new prior distribution, collect an additional data set and repeat the above procedure, we obtain a further updated posterior distribution after another Bayesian learning step. It incorporates the information from two data sets and should therefore permit even better and more precise conclusions about \(\theta\). That the repeated application of such Bayesian learning steps successfully approximates the actual distribution of \(\theta\) is the content of the Bernstein–von Mises theorem. Under certain conditions this procedure converges almost surely to the actual distribution of \(\theta\), independently of the chosen prior distribution.
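The following sketch is a hypothetical illustration, not taken from the article; the true value theta_true, the batch sizes and the two Beta priors are chosen freely. It shows such repeated Bayesian learning for a Bernoulli model with a conjugate Beta prior: two very different priors end up with essentially the same concentrated posterior, whose spread matches the \(\sqrt{I(\hat\theta)^{-1}/n}\) scale suggested by the theorem.

```python
import numpy as np

# Hypothetical sketch (not from the article): repeated Bayesian learning for a
# Bernoulli model with a conjugate Beta prior. The posterior concentrates around
# the true parameter for very different priors, and its spread agrees with the
# normal approximation sqrt(I(theta)^{-1} / n) suggested by the theorem.

rng = np.random.default_rng(0)
theta_true = 0.3                      # assumed true success probability
batch_size, n_batches = 200, 5        # five "Bayesian learning" steps

priors = {
    "flat prior Beta(1, 1)":    (1.0, 1.0),
    "skewed prior Beta(20, 2)": (20.0, 2.0),
}

for label, (alpha0, beta0) in priors.items():
    a, b = alpha0, beta0
    for _ in range(n_batches):
        data = rng.binomial(1, theta_true, size=batch_size)  # new batch of 0/1 data
        a += data.sum()               # conjugate update: add observed successes
        b += batch_size - data.sum()  # conjugate update: add observed failures

    n = n_batches * batch_size
    post_mean = a / (a + b)
    post_sd = np.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))  # Beta posterior sd
    theta_hat = (a - alpha0) / n      # maximum likelihood estimate from the data
    bvm_sd = np.sqrt(theta_hat * (1 - theta_hat) / n)        # sqrt(I(theta_hat)^-1 / n)
    print(f"{label}: posterior {post_mean:.3f} +/- {post_sd:.3f}, "
          f"BvM normal approx sd {bvm_sd:.4f}")
```

With 1000 observations both runs should report a posterior mean near 0.3 and a posterior standard deviation close to the normal-approximation value of about 0.014, despite the strongly biased second prior.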

History

The theorem is named after Richard von Mises and Sergei Natanowitsch Bernstein, although the first rigorous proof was given by Joseph L. Doob in 1949 for random variables with finite probability spaces. Later, Lucien Le Cam, his PhD student Lorraine Schwartz, the mathematician David A. Freedman and Persi Diaconis generalized the theorem and its prerequisites. A remarkable result by David A. Freedman from 1965 deserves mention: the Bernstein–von Mises theorem is almost surely "not applicable" if the random variable lives in a countably infinite probability space. In other words, for almost all initial prior distributions the procedure does not converge to the true distribution in this case. The intuitive reason is that the information gained in any particular Bayesian learning step has measure zero. A negative consequence of this already shows up in high-dimensional but finite problems, as Persi Diaconis and David A. Freedman note in the last sentence of the abstract of their 1986 publication:

"Unfortunately, in high - dimensional problems, arbitrary details of the prior can really matter; . Indeed, the prior can swamp the data, no matter how much data you have That is what our examples suggest, and why we did is advise against the mechanical use of Bayesian nonparametric techniques ".

"Unfortunately, the exact details of the prior distribution are in high -dimensional problems is really important. Because actually, the prior of the data" how much data is in the wrong direction relegate "regardless of whatever is available. This is what our examples suggest, and why we advise against it, simply mechanically apply the Bayesian non-parametric techniques. "

The well-known statistician A. W. F. Edwards once remarked in a similar vein: "It is sometimes said, in defence of the Bayesian approach, that the choice of prior distribution is irrelevant in practice, because it hardly influences the posterior distribution when there is enough data. The less said about this 'defence', the better."
