Scatter plot

A scatter plot (also scattergram or scatter diagram) is a graphical representation of observed pairs of values of two statistical variables. These pairs of values are plotted in a Cartesian coordinate system, producing a point cloud. The points can be drawn with various small symbols.

Application

The pattern of the points in a scatter plot is expected to reveal information about the dependence structure of the two variables represented by the coordinates.

The adjacent example diagram contains points that represent the two variables "length" and "width" of various artillery ships. The ships are divided into four classes, each of which is assigned a different color (destroyer, light cruiser, heavy cruiser, battleship). Such a scatter plot makes different relationships in the recorded data visible at a glance.

Frequently occurring patterns are clusters and linear structures. Clusters can be examined using cluster analysis. Linear relationships can be quantified using regression analysis.
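As a minimal sketch of how a linear structure might be quantified, the following computes the Pearson correlation coefficient and an ordinary least-squares line for paired values. The function names and the sample values are hypothetical and made up for illustration; they are not the ship data from the example diagram.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def least_squares_line(xs, ys):
    """Slope and intercept of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical (length, width) values, made up for illustration only.
lengths = [100.0, 120.0, 150.0, 180.0, 200.0]
widths = [10.5, 11.8, 15.2, 17.9, 20.1]

slope, intercept = least_squares_line(lengths, widths)
r = pearson_r(lengths, widths)
print(f"slope={slope:.4f} intercept={intercept:.3f} r={r:.3f}")
```

A value of r close to 1 or -1 indicates that the point cloud lies near a straight line; the fitted line can then be drawn over the scatter plot.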

If a third (metric) variable is also to be shown, it can be encoded in the size of the points (or circles). The resulting chart type is called a bubble chart.
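One way to encode the third variable is sketched below. It assumes that the circle area, rather than the radius, should be proportional to the value; this is a common convention but an assumption here, since the text only says that the size of the points is used. The function name and the sample values are hypothetical.

```python
import math


def bubble_radii(values, max_radius=20.0):
    """Map non-negative values to radii so that circle AREA is proportional to the value.

    The largest value receives max_radius; all others are scaled by the
    square root of their share of the largest value.
    """
    if any(v < 0 for v in values):
        raise ValueError("values must be non-negative")
    largest = max(values)
    if largest == 0:
        raise ValueError("need at least one positive value")
    return [max_radius * math.sqrt(v / largest) for v in values]


# Hypothetical third variable, made up for illustration only.
third_variable = [1.0, 4.0, 9.0, 16.0]
radii = bubble_radii(third_variable, max_radius=20.0)
print(radii)
```

Scaling the radius directly would make large values look disproportionately big, because the visible area grows with the square of the radius.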

Scatter plots with ordinal variables

Scatter plots are generally suitable only for continuously distributed data. Observations of discretely distributed variables overlap when the same values occur multiple times. However, there are ways to represent ordinal variables in scatter plots:

  • With "sunflowers": a circle is drawn at each coordinate pair. The number of value pairs that lie on this point is represented by strokes on the circle, which produces a stylized sunflower.
  • With a "jittered scatter plot": small random numbers are added to the data, so that the values are pulled slightly apart and form a point cloud. However, one must keep in mind that the values actually lie on top of one another. The result could be called "pseudo-metric".
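Both approaches can be sketched in a few lines. The example below is only an illustration under stated assumptions: the function names are hypothetical, the ratings are made up, and the jitter is drawn from a uniform distribution, which is one possible choice of "small random numbers".

```python
import random
from collections import Counter


def sunflower_counts(pairs):
    """Count how many observations fall on each distinct (x, y) pair.

    Each count is the number of strokes to draw on the circle at that point.
    """
    return Counter(pairs)


def jitter(pairs, amount=0.2, seed=0):
    """Add uniform noise from [-amount, amount] to both coordinates."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [
        (x + rng.uniform(-amount, amount), y + rng.uniform(-amount, amount))
        for x, y in pairs
    ]


# Hypothetical ordinal ratings on a 1-4 scale, made up for illustration only.
ratings = [(1, 1), (1, 1), (2, 1), (2, 2), (2, 2), (2, 2), (3, 4)]

strokes = sunflower_counts(ratings)
jittered = jitter(ratings, amount=0.2)

print(strokes[(2, 2)])  # number of observations at the point (2, 2)
print(jittered[:2])
```

Keeping the jitter amount below half the distance between neighbouring categories ensures that each jittered point can still be attributed to its original value pair.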

The accompanying example shows both approaches, based on the evaluation of a statistics lecture in which the variables "material is understandable" and "overall grade for statistics" (grades 1-4) were plotted in a scatter plot.

Dot plot

A dot plot is also called a one-dimensional scatter plot. In it, a single variable is plotted along either the x-axis or the y-axis (as shown in the diagrams). Depending on how many distinct values the observations of the variable take, the problem arises that only one data point is visible although (many) further observations may be hidden behind it.

As in the sunflower scatter plot, symbols can be used to represent the number of points. In the graph on the left, a larger circle indicates that more observations are hidden behind it than behind a smaller circle. In the graph on the right, a circle is drawn for each data point; if the same value occurs repeatedly, further circles are drawn to the right of the first circle.

Another possibility is to plot the observed value on one axis and a random value, for example one drawn from a uniform distribution, on the other axis. A density estimate can also be drawn.
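The two layouts described above, stacking repeated values side by side and spreading the points with a uniform random coordinate, might be computed as follows. This is a sketch with hypothetical function names and made-up data, not a reconstruction of the diagrams.

```python
import random
from collections import defaultdict


def stacked_positions(values):
    """Pair each value with its occurrence index (0 for the first time it appears).

    The index can be used as the offset along the other axis, so that
    repeated values are drawn next to the first circle.
    """
    seen = defaultdict(int)
    positions = []
    for v in values:
        positions.append((v, seen[v]))
        seen[v] += 1
    return positions


def uniform_offsets(values, seed=0):
    """Pair each value with a coordinate drawn uniformly from [0, 1)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [(v, rng.random()) for v in values]


# Hypothetical observations of one variable, made up for illustration only.
observations = [3, 1, 3, 2, 3, 1]

stacked = stacked_positions(observations)
spread = uniform_offsets(observations)

print(stacked)
print(spread[:2])
```

In the stacked layout the length of each row of circles shows directly how often a value occurs; in the random layout, dense regions appear as dense bands of points.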

The dot plot gives insight into the distribution of a variable, e.g. where the observations are particularly dense or whether the observations are spread over only a few values.

Scatterplot matrix

In a scatterplot matrix, a scatter plot is drawn for each pair of variables of a multivariate data set. The scatter plots in the upper right differ from those in the lower left only in which variable is mapped to the x-axis and which to the y-axis; that is, the corresponding point clouds are merely mirrored about the 45-degree line. In variations of the scatterplot matrix, other information, such as correlation coefficients or regression functions, is shown in place of the mirrored point clouds.

On the diagonal, only the variable names are entered in the graph on the right. Here, too, there are a number of variations, for example showing further information (box plots, density estimates) about the respective variable.
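The panel arrangement described above can be sketched as a small layout function. It assumes the convention that panel (i, j) plots variable j on the x-axis against variable i on the y-axis; the function name, the `upper` parameter, and the variable names are hypothetical and serve only as illustration.

```python
def matrix_layout(names, upper="scatter"):
    """Describe each panel of a scatterplot matrix as a tuple.

    Diagonal panels are ("label", name). Off-diagonal panels are
    (kind, x_variable, y_variable). `upper` selects what the panels above
    the diagonal show: "scatter" for the mirrored point cloud, or another
    kind such as "correlation".
    """
    layout = []
    for i, row_name in enumerate(names):
        row = []
        for j, col_name in enumerate(names):
            if i == j:
                row.append(("label", row_name))
            elif i > j or upper == "scatter":
                row.append(("scatter", col_name, row_name))
            else:
                row.append((upper, col_name, row_name))
        layout.append(row)
    return layout


# Hypothetical variable names, made up for illustration only.
names = ["length", "width", "draught"]

grid = matrix_layout(names)
variant = matrix_layout(names, upper="correlation")

for row in grid:
    print(row)
```

With p variables the plain layout contains p(p-1) scatter panels, of which only p(p-1)/2 show distinct point clouds; this is why the upper triangle is often reused for other information.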

The scatterplot matrix has some drawbacks:

  • The number of variables shown should not be too large, since otherwise the area available for each scatter plot becomes too small and the display becomes confusing. John and Paul Tukey therefore proposed measures, collectively called scagnostics, that characterize properties of the point cloud.
  • The scatterplot matrix shows only the projections of the multivariate data onto pairs of variables. An interesting data structure need not be visible in these projections. In that case, one should resort to the Grand Tour or the Projection Pursuit method.