Spurious relationship

Spurious correlation, or spurious relationship, denotes a correlation between two variables that is not based on a causal link between them. This is often the case when so-called intervening variables are not taken into account. The phenomenon has been known since the early days of statistics; the term spurious correlation was coined in 1954 by H. A. Simon. The German term (Scheinkorrelation, literally "apparent correlation") is misleading, because the correlation is not merely apparent: it really exists, but it does not reflect causality (see correlation for the distinction between the two concepts). Spurious correlation is the statistical counterpart of the fallacy known in philosophy as cum hoc ergo propter hoc (joint occurrence does not imply causation).

Example

A well-known example in statistics is the correlation between the number of births and the number of stork pairs in different regions. Although the two quantities are correlated, there is no direct causal relationship between them. Instead, both are causally related to a third (intervening) variable: the rurality of the region. The more rural a region is, the higher the number of births and the greater the number of stork pairs. This produces the correlation between births and stork pairs.
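The stork example can be illustrated with a small simulation. The following is a minimal sketch, not real data: the variable names, the coefficients and the noise levels are all assumptions chosen only for illustration. Births and stork pairs are each generated from a rurality score and never from each other, so any correlation between them arises solely through the intervening variable.

```python
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def residuals(y, z):
    """Residuals of a least-squares regression of y on z (with intercept)."""
    n = len(y)
    mz, my = sum(z) / n, sum(y) / n
    slope = (sum((a - mz) * (b - my) for a, b in zip(z, y))
             / sum((a - mz) ** 2 for a in z))
    return [b - my - slope * (a - mz) for a, b in zip(z, y)]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 2000                # number of hypothetical regions

# Assumed data-generating process: rurality drives both quantities.
rurality = [rng.random() for _ in range(n)]
births = [50 + 100 * r + rng.gauss(0, 10) for r in rurality]
storks = [5 + 20 * r + rng.gauss(0, 2) for r in rurality]

# Correlation that ignores the intervening variable.
raw_corr = pearson(births, storks)

# Partial correlation: correlate what is left after removing rurality.
partial_corr = pearson(residuals(births, rurality),
                       residuals(storks, rurality))

print(f"raw correlation:     {raw_corr:.3f}")
print(f"partial correlation: {partial_corr:.3f}")
```

Under these assumptions the raw correlation should be strongly positive, while the partial correlation, computed after rurality has been regressed out of both variables, should be close to zero.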

Spurious regression

Spurious regression is a special case of regression in which a statistically significant relationship between a dependent variable and an explanatory variable is found, although the relationship has no substantive justification. Spurious regressions are due to a common trend in the variables involved. An indication of a spurious regression is a high coefficient of determination combined with a Durbin-Watson statistic close to zero (high positive first-order autocorrelation). In addition, a Dickey-Fuller test that identifies the series as non-stationary time series is an indication of a spurious regression.
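The two descriptive diagnostics named above, the coefficient of determination and the Durbin-Watson statistic, can be computed directly from the residuals of a simple regression. The sketch below is illustrative only: the helper name `ols_diagnostics`, the series length and the seed are assumptions, and the Dickey-Fuller test is deliberately left out. It compares a regression of one simulated random walk on another, independent one with a regression between two independent white-noise series.

```python
import random

def ols_diagnostics(x, y):
    """Fit y = a + b*x by least squares; return (R squared, Durbin-Watson)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    syy = sum((b - my) ** 2 for b in y)
    slope = sxy / sxx
    intercept = my - slope * mx
    resid = [b - intercept - slope * a for a, b in zip(x, y)]
    ssr = sum(e * e for e in resid)
    r_squared = 1.0 - ssr / syy
    # Durbin-Watson: squared successive residual differences over SSR.
    # Values near 2 mean no first-order autocorrelation, values near 0
    # mean strong positive autocorrelation.
    dw = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, n)) / ssr
    return r_squared, dw

def white_noise(n, rng):
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

def random_walk(n, rng):
    """Cumulative sum of white noise, i.e. a process with a unit root."""
    level, path = 0.0, []
    for shock in white_noise(n, rng):
        level += shock
        path.append(level)
    return path

rng = random.Random(42)  # assumed seed, for reproducibility only
n = 500                  # assumed series length

r2_walk, dw_walk = ols_diagnostics(random_walk(n, rng), random_walk(n, rng))
r2_noise, dw_noise = ols_diagnostics(white_noise(n, rng), white_noise(n, rng))

print(f"random walks: R^2 = {r2_walk:.3f}, Durbin-Watson = {dw_walk:.3f}")
print(f"white noise:  R^2 = {r2_noise:.3f}, Durbin-Watson = {dw_noise:.3f}")
```

For the white-noise pair the Durbin-Watson statistic should lie near 2, whereas for the random-walk pair it should lie close to zero. The coefficient of determination of the random-walk regression varies considerably from one seed to the next, so a single run shows the diagnostic rather than proving anything about it.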

An example from applied work is the spurious regression problem in econometrics, pointed out by Granger and Newbold in 1974: two independent random walks without a deterministic trend component (or other stochastic processes with a unit root) also appear correlated, even though they are stochastically independent. More precisely, the violations of the regression model's assumptions caused by autocorrelation have the effect that, for example, the test statistic for the hypothesis that the slope of the regression line equals zero (the t-statistic) diverges as the amount of data increases. If enough data is collected, a relationship is therefore always found.
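The behaviour of the t-statistic can be explored with a small Monte Carlo experiment. The following is a hedged sketch rather than a reproduction of Granger and Newbold's study: the sample sizes, the number of replications, the seed and the conventional critical value of 1.96 are assumptions made for illustration. For each sample size it regresses one simulated random walk on another, independent one and records the absolute t-statistic of the slope.

```python
import random

def random_walk(n, rng):
    """Random walk without drift: cumulative sum of standard normal shocks."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path

def slope_t_statistic(x, y):
    """t-statistic for H0: slope = 0 in the regression y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ssr = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    std_error = (ssr / (n - 2) / sxx) ** 0.5
    return slope / std_error

def simulate(n, replications, rng):
    """Return (mean |t|, share of |t| > 1.96) over independent walk pairs."""
    t_values = [
        abs(slope_t_statistic(random_walk(n, rng), random_walk(n, rng)))
        for _ in range(replications)
    ]
    mean_abs_t = sum(t_values) / replications
    rejection_rate = sum(t > 1.96 for t in t_values) / replications
    return mean_abs_t, rejection_rate

rng = random.Random(1974)  # assumed seed, for reproducibility only
results = {n: simulate(n, 200, rng) for n in (50, 200, 800)}

for n, (mean_abs_t, rejection_rate) in results.items():
    print(f"n = {n:4d}: mean |t| = {mean_abs_t:6.2f}, "
          f"share with |t| > 1.96 = {rejection_rate:.2f}")
```

If the test behaved as intended, roughly 5 percent of the regressions would exceed the critical value at every sample size. In this setting the share should be far higher, and the average absolute t-statistic should grow as the series get longer.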
