Anscombe's quartet

Anscombe's quartet consists of four sets of data points that have nearly identical simple statistical properties but look very different when plotted. Each of the four sets consists of eleven (x, y) points. The four sets were constructed in 1973 by the English statistician Francis Anscombe to highlight the importance of graphical data analysis and to demonstrate the effect of outliers.

Representation

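The following holds for all four sets of points: the mean of the x-values is 9 and their sample variance is 11; the mean of the y-values is about 7.50 and their sample variance is about 4.125; the correlation coefficient between x and y is 0.816; and the linear regression line is approximately y = 3.00 + 0.500x.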

The first scatter plot (top left) appears to show a simple linear relationship: the two variables are correlated and could be normally distributed. The second scatter plot (top right) shows no normal distribution; a relationship between the variables is evident, but it is not linear, so the conditions for the Pearson correlation are not met. In the third scatter plot (bottom left) the relationship is still linear, but a single outlier reduces the correlation coefficient from 1 to 0.816. The fourth scatter plot (bottom right) also contains an outlier, which produces a high correlation coefficient even though there is no linear relationship among the remaining data points.
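The effect of the outlier in the third set can be checked numerically. The following is a minimal Python sketch, assuming the data values as commonly published for the quartet (they are not reproduced in the text above); the helper name pearson is introduced here purely for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Third set of the quartet (assumed: values as commonly published).
x3 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]

# Correlation with all eleven points, outlier included.
r_all = pearson(x3, y3)

# Drop the single outlier (13, 12.74) and recompute.
kept = [(x, y) for x, y in zip(x3, y3) if (x, y) != (13, 12.74)]
r_trimmed = pearson([x for x, _ in kept], [y for _, y in kept])

# Fourth set: without the point at x = 19, every remaining x-value is the same.
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
x4_rest = [x for x in x4 if x != 19]

print(f"set III, all points:     r = {r_all:.3f}")
print(f"set III, outlier removed: r = {r_trimmed:.5f}")
print(f"set IV, distinct x-values without outlier: {sorted(set(x4_rest))}")
```

In the fourth set the remaining x-values do not vary at all, so a correlation coefficient cannot even be computed for them; the high value for the full set is due entirely to the single outlying point.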

Anscombe's quartet is used to illustrate the importance of graphical data analysis, which should be carried out before beginning any analysis that rests on an assumption about the statistical properties of the data. It also shows that simple summary statistics are not always sufficient to describe data.

The four sets of data points are summarized in the table below. The x-values are the same for the first three sets.

Evolutionary algorithms can now be used to automatically generate data sets whose main summary statistics are identical but whose graphical representations show completely different properties.
