Kernel regression

Kernel regression is a family of nonparametric statistical methods in which the dependence of a random response variable on covariates is estimated by means of kernel density estimation. In contrast to linear regression, the functional form of the regression curve is not restricted to be linear; the advantage is a better fit to the data in the case of nonlinear relationships. Depending on whether the covariates are themselves random or not, one distinguishes between random design and fixed design approaches. The basic procedure was proposed independently in 1964 by Geoffrey Watson and Elizbar Nadaraya.

Univariate kernel regression

Kernel density estimator

A kernel density estimator with bandwidth $h$ is an estimate of the unknown density function $f$ of a random variable. If $x_1, \ldots, x_n$ is a sample and $K$ a kernel, the kernel density estimate is defined as

$$\hat f_h(x) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right).$$

As the chart illustrates, the choice of the bandwidth $h$ is crucial for the quality of the approximation.
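A minimal Python sketch of this estimator, assuming a Gaussian kernel; the function name kde and the example data are illustrative, not part of the source:

    import numpy as np

    def kde(x_grid, sample, h):
        """Kernel density estimate f_h(x) = 1/(n h) * sum_i K((x - x_i) / h), Gaussian kernel."""
        u = (np.asarray(x_grid)[:, None] - np.asarray(sample)[None, :]) / h
        k = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)   # K((x - x_i) / h)
        return k.sum(axis=1) / (len(sample) * h)

    # Example: a small bandwidth gives a wiggly estimate, a large one a smooth estimate.
    rng = np.random.default_rng(0)
    sample = rng.normal(size=200)
    grid = np.linspace(-3, 3, 121)
    density_rough = kde(grid, sample, h=0.1)
    density_smooth = kde(grid, sample, h=0.8)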

Nadaraya-Watson estimator

The Nadaraya-Watson estimator estimates the unknown regression function $m$ from the observed data $(x_i, y_i)$, $i = 1, \ldots, n$, as

$$\hat m_h(x) = \frac{\sum_{i=1}^{n} K_h(x - x_i)\, y_i}{\sum_{i=1}^{n} K_h(x - x_i)}, \qquad K_h(u) = \frac{1}{h}\, K\!\left(\frac{u}{h}\right),$$

with a kernel $K$ and a bandwidth $h$. The kernel is a function that assigns a large weight to observations near $x$ and a small weight to observations far from $x$. The bandwidth determines the range within which observations receive a large weight.

While the kernel can usually be chosen quite freely, the choice of the bandwidth has a major impact on the smoothness of the estimate. The chart shows that a large bandwidth (green) results in a smoother estimate than a small bandwidth (blue).
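The following Python sketch implements the Nadaraya-Watson estimator, again assuming a Gaussian kernel; the function name nadaraya_watson and the example data are illustrative:

    import numpy as np

    def nadaraya_watson(x_grid, x, y, h):
        """m_h(x) = sum_i K_h(x - x_i) * y_i / sum_i K_h(x - x_i), Gaussian kernel."""
        u = (np.asarray(x_grid)[:, None] - np.asarray(x)[None, :]) / h
        w = np.exp(-0.5 * u ** 2)   # unnormalised kernel weights; the factor 1/(h sqrt(2 pi)) cancels
        return (w * np.asarray(y)[None, :]).sum(axis=1) / w.sum(axis=1)

    # Example: noisy sine curve, smoothed with a small and a large bandwidth.
    rng = np.random.default_rng(1)
    x = np.sort(rng.uniform(0, 10, 100))
    y = np.sin(x) + 0.3 * rng.normal(size=100)
    grid = np.linspace(0, 10, 200)
    fit_rough = nadaraya_watson(grid, x, y, h=0.2)
    fit_smooth = nadaraya_watson(grid, x, y, h=1.5)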

Derivation

The idea of the Nadaraya-Watson estimator is based on representing the unknown regression function by means of the conditional expectation,

$$m(x) = E(Y \mid X = x) = \int y\, \frac{f(x, y)}{f_X(x)}\, dy,$$

with the joint density $f(x, y)$ and the marginal density $f_X(x)$.

The unknown densities $f(x, y)$ and $f_X(x)$ are estimated by kernel density estimates. For the joint density of the observations $(x_i, y_i)$ a bivariate kernel density estimator with product kernel and bandwidths $h$ and $g$ is used:

$$\hat f_{h,g}(x, y) = \frac{1}{n} \sum_{i=1}^{n} K_h(x - x_i)\, K_g(y - y_i).$$

For a kernel with mean zero it follows that

$$\int y\, \hat f_{h,g}(x, y)\, dy = \frac{1}{n} \sum_{i=1}^{n} K_h(x - x_i)\, y_i,$$

and together with the kernel density estimate $\hat f_h(x)$ of the marginal density this yields the Nadaraya-Watson estimator

$$\hat m_h(x) = \frac{\int y\, \hat f_{h,g}(x, y)\, dy}{\hat f_h(x)} = \frac{\sum_{i=1}^{n} K_h(x - x_i)\, y_i}{\sum_{i=1}^{n} K_h(x - x_i)}.$$

Properties

1. As in the case of linear regression, the Nadaraya-Watson estimator can be written as a linear combination of the observations with weighting functions:

$$\hat m_h(x) = \frac{1}{n} \sum_{i=1}^{n} W_{hi}(x)\, y_i, \qquad W_{hi}(x) = \frac{K_h(x - x_i)}{\frac{1}{n} \sum_{j=1}^{n} K_h(x - x_j)}.$$

Thus the Nadaraya-Watson estimator is a (local) weighted average of the observed values $y_i$; in particular $\frac{1}{n} \sum_{i=1}^{n} W_{hi}(x) = 1$.

The chart shows the weights $W_{hi}(x)$ for different values of $x$ (blue, green, red). The dot plot below zero shows the data of the explanatory variable. The larger the bandwidth (solid vs. dashed line), the more observations have a weight unequal to zero. The fewer data are available (on the right), the more strongly the available observations must be weighted.

2. The mean squared error can be approximated as

$$\mathrm{MSE}\!\left(\hat m_h(x)\right) \approx \frac{1}{nh}\, c_1 + h^4\, c_2,$$

with $c_1$ and $c_2$ independent of $n$ and $h$. The convergence is therefore slower than in linear regression, i.e. with the same number of observations the predicted value can be estimated more precisely with linear regression than with the Nadaraya-Watson estimator.
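The rate behind this claim can be made explicit by minimizing the approximation over $h$; this is a standard calculation, sketched here for completeness, with $c_1$ and $c_2$ the constants from the approximation above:

$$\frac{d}{dh}\left( \frac{c_1}{nh} + c_2\, h^4 \right) = -\frac{c_1}{n h^2} + 4\, c_2\, h^3 = 0 \quad\Longrightarrow\quad h_{opt} = \left( \frac{c_1}{4\, c_2\, n} \right)^{1/5} \propto n^{-1/5}.$$

Plugging $h_{opt}$ back in gives a mean squared error of order $n^{-4/5}$, i.e. a convergence rate of $n^{-2/5}$ for the estimator, compared with the parametric rate $n^{-1/2}$ of linear regression.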

The squared bias of the Nadaraya-Watson estimator is approximately

$$h^4 \left( \frac{\mu_2(K)}{2} \left( m''(x) + 2\, \frac{m'(x)\, f'(x)}{f(x)} \right) \right)^2,$$

with $\mu_2(K) = \int u^2 K(u)\, du$, $m'$ and $m''$ the first and second derivatives of the unknown regression function, and $f'$ the first derivative of the density.

The variance of the estimator is approximately

$$\frac{1}{nh}\, \frac{\sigma^2(x)\, \|K\|_2^2}{f(x)},$$

with $\sigma^2(x) = \operatorname{Var}(Y \mid X = x)$ and $\|K\|_2^2 = \int K(u)^2\, du$.

Bandwidth choice

The main problem of kernel regression is the choice of an appropriate bandwidth $h$. The basis is the minimization of the mean squared error or of its approximation. However, the approximation contains the second derivative of the unknown regression function as well as the unknown density function and its derivative. Instead, the data-based averaged squared error

$$ASE(h) = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat m_h(x_i) \right)^2$$

is minimized. Since $y_i$ itself is used in the estimate $\hat m_h(x_i)$, minimizing this criterion directly favours bandwidths that merely reproduce the data (a resubstitution estimate). Therefore a leave-one-out cross-validation is performed, i.e. the estimate $\hat m_{h,-i}(x_i)$ is computed from all observations except the $i$-th:

$$CV(h) = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat m_{h,-i}(x_i) \right)^2.$$

This is computed for different bandwidths $h$, and the bandwidth that yields the minimum is then used to estimate the unknown regression function.
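A minimal Python sketch of this leave-one-out cross-validation for the Gaussian-kernel Nadaraya-Watson estimator; the function names and the candidate grid are illustrative:

    import numpy as np

    def loo_cv_score(x, y, h):
        """Leave-one-out CV criterion CV(h) for the Nadaraya-Watson estimator."""
        u = (x[:, None] - x[None, :]) / h
        w = np.exp(-0.5 * u ** 2)     # Gaussian kernel weights
        np.fill_diagonal(w, 0.0)      # drop the i-th observation from its own estimate
        m_loo = (w * y[None, :]).sum(axis=1) / w.sum(axis=1)
        return np.mean((y - m_loo) ** 2)

    def select_bandwidth(x, y, candidates):
        """Return the candidate bandwidth with the smallest CV score."""
        scores = [loo_cv_score(x, y, h) for h in candidates]
        return candidates[int(np.argmin(scores))]

    # Example: search over a coarse grid of bandwidths.
    # h_opt = select_bandwidth(x, y, np.linspace(0.1, 2.0, 20))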

Confidence bands

After estimating the regression function, the question arises how far it deviates from the true function. The work of Bickel and Rosenblatt (1973) provides two theorems for pointwise confidence intervals and uniform confidence bands.

Besides quantifying this deviation, the confidence bands also provide an indication of whether a candidate parametric regression model, e.g. a linear regression, fits the data. If the estimated curve of the parametric regression model lies outside the confidence bands, this is an indication that the parametric model does not fit the data. A formal test is possible by means of bootstrap methods.

Pointwise confidence bands: Under certain conditions,

$$\sqrt{nh}\, \left( \hat m_h(x) - m(x) \right) \;\xrightarrow{D}\; N\!\left( b_x,\, v_x^2 \right)$$

converges in distribution, with an asymptotic bias $b_x$ (driven by $m''(x)$ and $m'(x) f'(x) / f(x)$) and asymptotic variance $v_x^2 = \sigma^2(x)\, \|K\|_2^2 / f(x)$.

If the bandwidth is small enough, the asymptotic bias can be neglected in comparison with the asymptotic variance. Approximate confidence intervals can then be calculated as

$$\hat m_h(x) \pm z_{1-\alpha/2}\, \sqrt{ \frac{ \hat\sigma^2(x)\, \|K\|_2^2 }{ nh\, \hat f_h(x) } },$$

with $z_{1-\alpha/2}$ the corresponding quantile of the standard normal distribution. The unknown density $f(x)$ is estimated with a kernel density estimate $\hat f_h(x)$, and

$$\hat\sigma^2(x) = \frac{1}{n} \sum_{i=1}^{n} W_{hi}(x) \left( y_i - \hat m_h(x) \right)^2.$$
The chart shows the Nadaraya-Watson estimator with pointwise 95% confidence bands (red lines). The black linear regression line lies clearly outside the confidence band in several regions. This is an indication that a linear regression model is not appropriate here.
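A sketch of these approximate pointwise intervals in Python, assuming a Gaussian kernel (for which $\|K\|_2^2 = 1/(2\sqrt{\pi})$); the function name is illustrative:

    import numpy as np
    from scipy.stats import norm

    def nw_confidence_band(x_grid, x, y, h, alpha=0.05):
        """Nadaraya-Watson fit with approximate pointwise (1 - alpha) confidence limits."""
        n = len(x)
        u = (x_grid[:, None] - x[None, :]) / h
        k = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)   # K((x - x_i) / h)
        f_hat = k.sum(axis=1) / (n * h)                  # kernel density estimate
        m_hat = (k * y[None, :]).sum(axis=1) / k.sum(axis=1)
        W = k / k.mean(axis=1, keepdims=True)            # weights W_hi(x)
        sigma2 = (W * (y[None, :] - m_hat[:, None]) ** 2).mean(axis=1)
        k_norm2 = 1.0 / (2.0 * np.sqrt(np.pi))           # ||K||_2^2 for the Gaussian kernel
        half_width = norm.ppf(1 - alpha / 2) * np.sqrt(sigma2 * k_norm2 / (n * h * f_hat))
        return m_hat, m_hat - half_width, m_hat + half_width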

Uniform confidence bands: Under somewhat stronger conditions than before, with a bandwidth of the form $h = n^{-\delta}$ for suitable $\delta$ and for kernels with compact support, the suitably normalized maximal deviation

$$\sup_{x \in [0,1]} \left| \hat m_h(x) - m(x) \right|$$

converges in distribution, and from this limit uniform confidence bands over the interval $[0, 1]$ can be constructed.

The restriction to the interval $[0, 1]$ is not a limitation, since the data can simply be transformed to this interval. The confidence band is then calculated and transformed back to the original data.

Gasser-Müller estimator

In the fixed design case the density $f(x)$ of the design points is known, so it does not have to be estimated. This simplifies both the calculations and the mathematical treatment of the estimator. For this case the Gasser-Müller estimator was defined as

$$\hat m_h(x) = \sum_{i=1}^{n} W_{hi}(x)\, y_i$$

with

$$W_{hi}(x) = \int_{s_{i-1}}^{s_i} K_h(x - u)\, du$$

and $s_0 = -\infty$, $s_i = \frac{x_i + x_{i+1}}{2}$ for $i = 1, \ldots, n - 1$, and $s_n = +\infty$.
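A minimal Python sketch of the Gasser-Müller estimator with a Gaussian kernel, for which the weight integrals reduce to differences of the normal distribution function; the function name is illustrative:

    import numpy as np
    from scipy.stats import norm

    def gasser_mueller(x_grid, x, y, h):
        """m_h(x) = sum_i y_i * integral over [s_{i-1}, s_i] of K_h(x - u) du."""
        order = np.argsort(x)
        xs, ys = x[order], y[order]
        s = np.concatenate(([-np.inf], (xs[:-1] + xs[1:]) / 2, [np.inf]))   # bounds s_0, ..., s_n
        # For a Gaussian kernel: integral_{s_{i-1}}^{s_i} K_h(x - u) du
        #   = Phi((x - s_{i-1}) / h) - Phi((x - s_i) / h).
        upper = norm.cdf((x_grid[:, None] - s[None, :-1]) / h)
        lower = norm.cdf((x_grid[:, None] - s[None, 1:]) / h)
        return ((upper - lower) * ys[None, :]).sum(axis=1)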

Properties

1. Like the Nadaraya-Watson estimator, the Gasser-Müller estimator is a linear estimator, and the sum of its weighting functions equals one.

2. The mean squared error results approximately as

$$\mathrm{MSE}\!\left(\hat m_h(x)\right) \approx \frac{1}{nh}\, \sigma^2\, \|K\|_2^2 + \frac{h^4}{4}\, \mu_2(K)^2 \left( m''(x) \right)^2.$$

Local polynomial kernel regression

The Nadaraya-Watson estimator can be written as the solution of the following local minimization problem:

$$\hat m_h(x) = \arg\min_{c} \sum_{i=1}^{n} K_h(x - x_i) \left( y_i - c \right)^2,$$

i.e. for each $x$ a locally constant value $c$ is determined, which is equal to the value of the Nadaraya-Watson estimate at $x$.

Instead of a local constant, a local polynomial can also be used:

$$\hat\beta(x) = \arg\min_{\beta} \sum_{i=1}^{n} K_h(x - x_i) \left( y_i - \sum_{j=0}^{p} \beta_j\, (x_i - x)^j \right)^2,$$

i.e. the unknown regression function is approximated locally by a polynomial of degree $p$. The local polynomial kernel regression estimate at any point $x$ is then given by

$$\hat m_h(x) = \hat\beta_0(x).$$

The chart shows the local polynomials used at selected locations. The Nadaraya-Watson estimator (red) uses locally constant functions, while the locally linear kernel regression (blue) uses locally linear functions. The selected locations coincide with data points in the graph. The vertical grey lines connect the local polynomials with the corresponding x-value (data point); the intersections with the red and blue polynomials yield the estimates at that position for the Nadaraya-Watson estimator and the locally linear kernel regression, respectively.

Advantages and Features

Compared to the Nadaraya-Watson estimator, local polynomial kernel regression offers several advantages:

  • In general, the locally constant Nadaraya-Watson estimator is influenced by observed values lying both to the left and to the right of $x$. At the boundaries of the data this is not possible, which leads to boundary effects. Local polynomial kernel regression, by contrast, approximates locally with a polynomial and can avoid this problem.
  • To estimate the $j$-th derivative of the regression function, one could simply differentiate the Nadaraya-Watson estimator repeatedly. Local polynomial kernel regression offers a considerably more elegant way: $\hat m^{(j)}(x) = j!\, \hat\beta_j(x)$.
  • As in the case of linear regression and the Nadaraya-Watson estimator, local polynomial kernel regression can be written as a linear combination of the observations with weight functions.

Estimation of the beta coefficients

If we define the matrices

$$X = \begin{pmatrix} 1 & (x_1 - x) & \cdots & (x_1 - x)^p \\ \vdots & \vdots & & \vdots \\ 1 & (x_n - x) & \cdots & (x_n - x)^p \end{pmatrix}, \qquad W = \operatorname{diag}\!\left( K_h(x - x_1), \ldots, K_h(x - x_n) \right)$$

and

$$y = (y_1, \ldots, y_n)^T,$$

then the estimates of the beta coefficients are obtained as the weighted least squares solution

$$\hat\beta(x) = \left( X^T W X \right)^{-1} X^T W y.$$

The coefficients needed for the derivatives of the regression function are thus calculated automatically in the estimation process.

In order to carry out the estimation in practice, one computes the entries

$$\left( X^T W X \right)_{jk} = \sum_{i=1}^{n} K_h(x - x_i)\, (x_i - x)^{j+k}, \qquad j, k = 0, \ldots, p,$$

and

$$\left( X^T W y \right)_{j} = \sum_{i=1}^{n} K_h(x - x_i)\, (x_i - x)^{j}\, y_i, \qquad j = 0, \ldots, p,$$

and solves the resulting linear system.
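A sketch of this weighted least squares computation in Python, assuming a Gaussian kernel; the function name local_polynomial is illustrative:

    import numpy as np

    def local_polynomial(x0, x, y, h, p=1):
        """Estimate beta(x0) = (X' W X)^{-1} X' W y for a local polynomial of degree p.
        beta[0] estimates m(x0); j! * beta[j] estimates the j-th derivative at x0."""
        d = x - x0
        X = np.vander(d, N=p + 1, increasing=True)   # columns 1, (x_i - x0), ..., (x_i - x0)^p
        w = np.exp(-0.5 * (d / h) ** 2)              # Gaussian kernel weights (constants cancel)
        XtW = X.T * w                                # X' W without forming the diagonal matrix
        return np.linalg.solve(XtW @ X, XtW @ y)

    # Locally linear fit (p = 1): beta[0] is the regression estimate at each grid point.
    # fit = np.array([local_polynomial(x0, x, y, h=0.5, p=1)[0] for x0 in grid])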

Local linear kernel regression

One of the best-known locally linear regression methods ($p = 1$) is LOESS, or LOWESS (short for locally weighted scatterplot smoothing). However, LOWESS is not a locally linear kernel regression in the above sense, because

  • the regression weights are estimated robustly, and
  • the bandwidth varies with $x$.

The chart shows two different methods of kernel regression: locally constant (red, Nadaraya-Watson) and locally linear (blue). Especially at the boundaries, the locally linear kernel regression approximates the data somewhat better.

The locally linear kernel regression is calculated as

$$\hat m_{h,1}(x) = \frac{ \sum_{i=1}^{n} K_h(x - x_i) \left( S_2(x) - (x_i - x)\, S_1(x) \right) y_i }{ S_0(x)\, S_2(x) - S_1(x)^2 }, \qquad S_j(x) = \sum_{i=1}^{n} K_h(x - x_i)\, (x_i - x)^j.$$

The mean squared error of the locally linear kernel regression results, as for the Nadaraya-Watson estimator, as

$$\mathrm{MSE}\!\left(\hat m_{h,1}(x)\right) \approx \frac{1}{nh}\, \frac{\sigma^2(x)\, \|K\|_2^2}{f(x)} + \frac{h^4}{4}\, \mu_2(K)^2 \left( m''(x) \right)^2$$

with squared bias

$$\frac{h^4}{4}\, \mu_2(K)^2 \left( m''(x) \right)^2,$$

and the variance is identical to the variance of the Nadaraya-Watson estimator. The simpler form of the bias, which involves neither the design density nor the first derivative of the regression function, makes the locally linear kernel regression attractive for practical purposes.
