Kernel density estimation

Kernel density estimation (also called the Parzen window method; KDE) is a method for estimating the probability distribution of a random variable.

In classical statistics it is assumed that statistical phenomena follow a given probability distribution and that this distribution is realized in samples. Non-parametric statistics develops methods to identify the underlying distribution from the realization of a sample. A well-known technique is the histogram. A disadvantage of this method is that the resulting histogram is not continuous. In many cases, however, the underlying distribution can be assumed to be continuous, for example the distribution of waiting times in a queue or the return of shares.

The kernel density estimator described below, by contrast, is a method that yields a continuous estimate of the unknown distribution. More precisely, a kernel density estimator is a uniformly consistent, continuous estimate of the Lebesgue density of an unknown probability measure by a sequence of densities.

Example

In the following example, the density of a standard normal distribution (black dashed line) is estimated by kernel density estimation. In a concrete estimation situation this curve is of course unknown and is to be approximated by the kernel density estimate. A sample of size 100 was generated, distributed according to the standard normal distribution, and a kernel density estimation was then performed with different bandwidths. One can clearly see that the quality of the kernel density estimator depends on the chosen bandwidth: too small a bandwidth makes the estimate appear jittery, while too large a bandwidth makes it appear overly smoothed.
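The experiment described above can be sketched as follows, assuming numpy; the seed, evaluation grid, and bandwidth values are illustrative choices and not taken from the original figure:

```python
import numpy as np

rng = np.random.default_rng(0)        # fixed seed so the sketch is reproducible
sample = rng.standard_normal(100)     # sample of size 100 from the standard normal

def gaussian_kernel(u):
    """Density of the standard normal distribution, used here as the kernel."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def kde(t, data, h):
    """Evaluate the kernel density estimate at the points t with bandwidth h."""
    u = (np.asarray(t)[:, None] - data[None, :]) / h
    return gaussian_kernel(u).sum(axis=1) / (len(data) * h)

grid = np.linspace(-8.0, 8.0, 801)
# one estimate per bandwidth: too small, moderate, too large
estimates = {h: kde(grid, sample, h) for h in (0.05, 0.4, 2.0)}
```

Plotting each estimate against the true density exp(-t^2/2)/sqrt(2*pi) reproduces the bandwidth effect described above.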

Kernels

A kernel is a continuous Lebesgue density of an almost arbitrarily chosen probability measure. Possible kernels include:

  • Gaussian kernel
  • Cauchy kernel
  • Picard kernel
  • Epanechnikov kernel

These kernels are densities of a shape similar to the Cauchy kernel shown. The kernel density estimator is a superposition, in the form of a sum, of appropriately scaled kernels that are positioned according to the sample realization; a scaling factor in front ensures that the resulting sum is again the density of a probability measure. The following figure is based on a sample of size 10, shown as black circles. The Cauchy kernels (green dashed lines) are drawn, and their superposition yields the kernel density estimator (red curve).
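The superposition just described can be sketched as follows, assuming numpy; the sample, seed, and bandwidth are invented for the illustration and not taken from the figure:

```python
import numpy as np

def cauchy_kernel(u):
    """Density of the standard Cauchy distribution."""
    return 1.0 / (np.pi * (1.0 + u ** 2))

rng = np.random.default_rng(42)
sample = rng.standard_normal(10)      # stand-in for the sample of size 10
h = 0.5                               # illustrative bandwidth

grid = np.linspace(-6.0, 6.0, 301)
# one scaled kernel per observation, centered at that observation ...
components = cauchy_kernel((grid[:, None] - sample[None, :]) / h) / (sample.size * h)
# ... and the estimator is their superposition (the pointwise sum);
# the factor 1/(n*h) above makes the sum a probability density again
estimate = components.sum(axis=1)
```

Plotting each column of `components` and then `estimate` reproduces the green dashed kernels and the red curve of the figure, up to the invented sample.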

The Epanechnikov kernel is the kernel that minimizes the mean squared deviation of the associated kernel density estimator among all kernels.
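The Epanechnikov kernel has the closed form K(u) = 3/4 (1 - u^2) for |u| <= 1 and 0 otherwise; a minimal sketch, assuming numpy:

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) on [-1, 1], zero outside."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)
```

Like every kernel, it integrates to one and is therefore itself a probability density.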

The kernel density estimator

If x_1, \dots, x_n is a sample and K a kernel, then the kernel density estimator with bandwidth h > 0 is defined as

\hat{f}_n(t) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{t - x_i}{h}\right).
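This defining formula translates directly into code; a minimal sketch assuming numpy, where the kernel K is passed in as any function implementing a density:

```python
import numpy as np

def kernel_density_estimate(t, sample, K, h):
    """f_hat(t) = 1/(n*h) * sum_i K((t - x_i) / h), evaluated at each point of t."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    x = np.asarray(sample, dtype=float)
    u = (t[:, None] - x[None, :]) / h          # pairwise scaled distances
    return K(u).sum(axis=1) / (x.size * h)

# usage with the Gaussian kernel; sample values and bandwidth are invented
gaussian = lambda u: np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
values = kernel_density_estimate([-1.0, 0.0, 1.0], [0.2, -0.4, 0.1], gaussian, 0.8)
```

Any of the kernels listed above can be substituted for `gaussian` without changing the estimator itself.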

The choice of the bandwidth is crucial for the quality of the approximation. With a bandwidth chosen appropriately as a function of the sample size, the sequence of kernel density estimators converges almost surely uniformly to the density of the unknown probability measure. This statement is made precise by the following theorem of Nadaraya.

Theorem of Nadaraya

The theorem states that, with an appropriately chosen bandwidth, an arbitrarily good estimate of the unknown distribution is possible by choosing a sufficiently large sample.

Let K be a kernel of bounded variation, and let the density f of the probability measure be uniformly continuous. For constants c > 0 and 0 < \alpha < 1/2, define the bandwidths by h_n := c \, n^{-\alpha}. Then the sequence of kernel density estimators converges uniformly to f with probability 1, i.e.

P\left( \lim_{n \to \infty} \sup_t \left| \hat{f}_n(t) - f(t) \right| = 0 \right) = 1.

Application

Kernel density estimation has been used by statisticians since about 1950, and since the method found its way into ecology in the 1990s it has commonly been used there to describe the home range of an animal. In this way the probability with which an animal stays in a particular spatial region can be calculated. Home-range estimates (e.g. contour lines) are represented by colored lines.
