Heteroscedasticity

Heteroscedasticity (also (residual) variance heterogeneity; from Greek σκεδαστός, skedastós, "scattered, distributed, dispersible") means in statistics that the scatter within a set of measurements is not constant. If the variance of the residuals (and hence the variance of the explained variable itself) does not differ significantly across all values of the other (predictor) variables, there is homoscedasticity ((residual) homogeneity of variance). The term plays an important role particularly in econometrics and empirical research.

Occurrence

In statistics, methods are often used in which several similar quantities play a role. In regression analysis, for example, one considers a set of data points through which a straight line is fitted as precisely as possible. The deviations of the data points from the line are called residuals or error terms; in probabilistic terms, each of them is a random variable. If all of these error terms have the same variance, there is homoscedasticity.

If these error terms do not all have the same variance, however, the ordinary least-squares method is no longer efficient: the resulting estimates of the regression coefficients do not have the smallest possible variance. In addition, the t-test can no longer be applied naively, because the usual t-values are no longer reliable. In many cases a suitable transformation of the data helps: if heteroscedasticity is present, it can be useful to transform the data with the logarithm or the square root in order to achieve homoscedasticity. The Gauss–Markov theorem, whose assumptions include homoscedastic errors, can then be applied correctly.
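To illustrate the transformation remedy, the following minimal sketch assumes a hypothetical data-generating process with a multiplicative error, y = exp(1 + 0.5x + 0.3ε). It fits a straight line by OLS once to y and once to log(y), and compares the residual standard deviation in the upper half of x with that in the lower half; a ratio near 1 suggests roughly constant scatter. The helper names `ols_fit` and `spread_ratio` are illustrative and not taken from any library.

```python
import numpy as np

def ols_fit(x, y):
    """Fit y = b0 + b1*x by ordinary least squares; return coefficients and residuals."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta

def spread_ratio(x, resid):
    """Residual std. dev. for x above the median divided by that for x at or below it."""
    upper = x > np.median(x)
    return resid[upper].std() / resid[~upper].std()

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 4.0, size=2000)
# Hypothetical process with a multiplicative error: the scatter of y grows
# with its level, so a straight line fitted to the raw data is heteroscedastic.
y = np.exp(1.0 + 0.5 * x + 0.3 * rng.standard_normal(x.size))

_, resid_raw = ols_fit(x, y)          # straight line fitted to the raw data
_, resid_log = ols_fit(x, np.log(y))  # straight line fitted to log(y)

ratio_raw = spread_ratio(x, resid_raw)
ratio_log = spread_ratio(x, resid_log)
print(f"spread ratio raw: {ratio_raw:.2f}, after log: {ratio_log:.2f}")
```

Under these assumptions the raw fit should give a ratio clearly above 1, while the log fit should give a ratio close to 1. Whether a logarithm, a square root or no transformation is appropriate depends on how the scatter actually grows with the level of the data.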

Consequences of heteroskedasticity in linear regression

  • The first OLS assumption is still fulfilled, i.e., the exogenous variable is not correlated with the residual.
  • The error terms (and hence the endogenous variable for given values of the exogenous variables) are not identically distributed. As a result, the OLS estimators are no longer efficient, and the usual standard errors of the coefficients are biased and inconsistent (which is why, under heteroscedasticity, the standard errors must be calculated differently).
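One common way to calculate the standard errors differently is White's heteroscedasticity-consistent "sandwich" estimator, often called HC0. The sketch below, using NumPy only, computes both the classical OLS standard errors and the HC0 ones for the same fit. The simulated data, in which the error standard deviation equals |x|, are a hypothetical example; refinements such as HC1 to HC3 exist and are not shown.

```python
import numpy as np

def ols_with_standard_errors(X, y):
    """OLS coefficients with classical and White (HC0) robust standard errors.

    X is an (n, k) design matrix that already contains a column of ones.
    """
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta

    # Classical formula: assumes one common error variance sigma^2 for all observations.
    sigma2 = resid @ resid / (n - k)
    se_classical = np.sqrt(np.diag(sigma2 * XtX_inv))

    # White/HC0 sandwich: (X'X)^-1 [sum_i e_i^2 x_i x_i'] (X'X)^-1,
    # which allows each observation its own error variance.
    meat = X.T @ (X * resid[:, None] ** 2)
    se_white = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))
    return beta, se_classical, se_white

# Hypothetical heteroscedastic data: the error s.d. equals |x|.
rng = np.random.default_rng(1)
x = rng.uniform(-5.0, 5.0, size=5000)
y = 1.0 + 2.0 * x + np.abs(x) * rng.standard_normal(x.size)
X = np.column_stack([np.ones_like(x), x])

beta, se_classical, se_white = ols_with_standard_errors(X, y)
print("slope:", beta[1], "classical SE:", se_classical[1], "White SE:", se_white[1])
```

In this setup the error variance is largest where x is far from its mean, so the classical formula understates the slope's uncertainty and the White standard error comes out noticeably larger.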

Examples

Heteroscedasticity in time series

A typical example of heteroscedasticity is a time series whose deviations from the trend line grow as time progresses (e.g., the accuracy of weather forecasts: the farther into the future, the less likely an accurate forecast). In addition, time series without constant variance can exhibit characteristic anomalies such as volatility clusters. Volatility models therefore attempt to give the evolution of the variance a systematic explanation.
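As one example of a volatility model (chosen here as an illustration, not named in the text), the GARCH(1,1) model makes the conditional variance depend on the previous squared value and the previous variance: sigma_t^2 = omega + alpha * r_{t-1}^2 + beta * sigma_{t-1}^2. The following sketch simulates such a series; the parameter values omega = 0.05, alpha = 0.1, beta = 0.85 are arbitrary assumptions chosen so that alpha + beta < 1. A large shock raises the variance for several subsequent periods, which produces volatility clusters.

```python
import numpy as np

def simulate_garch11(n, omega, alpha, beta, seed=0):
    """Simulate r_t = sigma_t * z_t, z_t ~ N(0, 1), with GARCH(1,1) variance
    sigma_t^2 = omega + alpha * r_{t-1}^2 + beta * sigma_{t-1}^2."""
    if alpha + beta >= 1:
        raise ValueError("alpha + beta must be < 1 for a finite unconditional variance")
    rng = np.random.default_rng(seed)
    r = np.empty(n)
    sigma2 = np.empty(n)
    sigma2[0] = omega / (1.0 - alpha - beta)  # start at the unconditional variance
    r[0] = np.sqrt(sigma2[0]) * rng.standard_normal()
    for t in range(1, n):
        # Today's variance reacts to yesterday's squared value and variance.
        sigma2[t] = omega + alpha * r[t - 1] ** 2 + beta * sigma2[t - 1]
        r[t] = np.sqrt(sigma2[t]) * rng.standard_normal()
    return r, sigma2

r, sigma2 = simulate_garch11(5000, omega=0.05, alpha=0.1, beta=0.85)
print("min / max conditional variance:", sigma2.min(), sigma2.max())
```

Because sigma2 changes over time, the simulated series has no constant variance, which is exactly the kind of heteroscedasticity described above.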

Heteroscedasticity in the linear regression

Heteroscedasticity can occur in linear regression. This is a problem because classical linear regression analysis assumes homoscedastic residuals. The graph below shows the variables average number of rooms per house (X) and median purchase price per house (Y) for (almost) every district of Boston (Boston Housing data). The graph "Linear regression" shows the relationship between the two variables. The red line shows the residual for the rightmost observation, i.e., the difference between the observed value (circle) and the estimated value on the regression line.

The graph "Heteroscedastic residuals" shows the residuals of all observations. The dispersion of the residuals in the range of 4 to 5 rooms, and in the range above 7.5 rooms, appears smaller than the dispersion in the range of 5 to 7.5 rooms. Since the dispersion of the residuals differs between these ranges, the residuals are heteroscedastic. If the dispersion of the residuals were the same in all ranges, they would be homoscedastic.
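The visual comparison by ranges can also be done numerically by computing the residual standard deviation separately for each range of x. The sketch below does this with NumPy on hypothetical simulated data (not the Boston Housing data), in which the error scatter is deliberately made larger for 5 ≤ x < 7.5 than elsewhere; the helper `residual_spread_by_bin` is illustrative.

```python
import numpy as np

def residual_spread_by_bin(x, resid, edges):
    """Sample std. dev. of the residuals for x in each bin [edges[i], edges[i+1])."""
    spreads = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (x >= lo) & (x < hi)
        # At least two points are needed for a sample standard deviation.
        spreads.append(resid[in_bin].std(ddof=1) if in_bin.sum() > 1 else np.nan)
    return np.array(spreads)

rng = np.random.default_rng(2)
x = rng.uniform(4.0, 9.0, size=3000)
# Hypothetical error scatter: larger in the middle range than at the edges.
noise_sd = np.where((x >= 5.0) & (x < 7.5), 3.0, 1.0)
y = 2.0 + 3.0 * x + noise_sd * rng.standard_normal(x.size)

slope, intercept = np.polyfit(x, y, 1)  # straight-line least-squares fit
resid = y - (intercept + slope * x)
spreads = residual_spread_by_bin(x, resid, [4.0, 5.0, 7.5, np.inf])
print("residual s.d. per range:", np.round(spreads, 2))
```

Clearly unequal values across the ranges point to heteroscedasticity; roughly equal values are consistent with homoscedasticity. A formal test is still needed to judge whether the differences are significant.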

Test method

Well-known methods for testing the null hypothesis of homoscedasticity are the Goldfeld–Quandt test, the White test, the Levene test, the Glejser test and the Breusch–Pagan test.
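As an illustration of one of these tests, the sketch below implements the Breusch–Pagan test in Koenker's studentized form for a single regressor: the squared OLS residuals are regressed on the regressor, and LM = n·R² of this auxiliary regression is compared with a chi-square distribution with one degree of freedom. It uses only NumPy and the standard library; the function name and the simulated data are hypothetical.

```python
import math
import numpy as np

def breusch_pagan_single_regressor(x, y):
    """Koenker's studentized Breusch-Pagan test for y = b0 + b1*x + u.

    Under H0 (homoscedasticity), LM = n * R^2 from regressing the squared
    OLS residuals on [1, x] is asymptotically chi-square with 1 d.f.
    """
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    u2 = (y - X @ beta) ** 2                         # squared OLS residuals
    gamma, *_ = np.linalg.lstsq(X, u2, rcond=None)   # auxiliary regression
    fitted = X @ gamma
    r2 = 1.0 - np.sum((u2 - fitted) ** 2) / np.sum((u2 - u2.mean()) ** 2)
    lm = max(0.0, float(x.size * r2))                # guard against tiny negative round-off
    # Survival function of chi-square(1): P(Z^2 > lm) = erfc(sqrt(lm / 2)).
    p_value = math.erfc(math.sqrt(lm / 2.0))
    return lm, p_value

rng = np.random.default_rng(3)
x = rng.uniform(1.0, 10.0, size=1000)
y_hom = 1.0 + 2.0 * x + rng.standard_normal(x.size)      # constant error variance
y_het = 1.0 + 2.0 * x + x * rng.standard_normal(x.size)  # error s.d. grows with x

lm_hom, p_hom = breusch_pagan_single_regressor(x, y_hom)
lm_het, p_het = breusch_pagan_single_regressor(x, y_het)
print(f"homoscedastic: LM={lm_hom:.2f}, p={p_hom:.3f}")
print(f"heteroscedastic: LM={lm_het:.2f}, p={p_het:.3g}")
```

A small p-value leads to rejecting the null hypothesis of homoscedasticity. With several regressors, the auxiliary regression would include all of them and the degrees of freedom would equal their number.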

See also

  • Regression analysis