Multidimensional scaling

Multidimensional scaling (also known as similarity structure analysis, abbreviated MDS) is a family of multivariate statistical techniques. Its formal goal is to arrange the objects in a space in such a way that the distances between the objects in that space match the collected dissimilarities (or similarities) as closely as possible: the farther apart two objects lie, the more dissimilar they are, and the closer together they lie, the more similar they are. Information about pairs of objects is thus collected in order to derive metric information about the objects from it.

The solution of a multidimensional scaling, the so-called configuration, is usually estimated in two or three dimensions, which makes it easier to interpret. In principle, however, the configuration can be determined in a space of any dimension. In addition to the spatial configuration of the objects, multidimensional scaling provides a number of indices (e.g. Stress1, S-Stress, ALSCAL, R², etc.) that assess the quality of the configuration.

Multidimensional scaling goes back to the psychologist Warren S. Torgerson (publications from 1952 to 1968). The most important statistical methods are metric and non-metric multidimensional scaling according to Kruskal.

A typical application of multidimensional scaling is property fitting in marketing.


Various methods of MDS

The various methods of MDS can be broadly divided into those for square matrices and those for rectangular matrices. For matrix-conditional data, the values within an entire matrix may be compared with one another; for row-conditional data, only values within the same row may be compared.

Three model variants are distinguished:

  • Simple MDS: one matrix and one configuration (a perceptual space common to all subjects is assumed; this assumption is not checked by the model).
  • Replicated MDS: more than one matrix and one configuration (the same assumption as for simple MDS, but here it is checked by the model).
  • INDSCAL: more than one matrix and more than one configuration; more precisely, each individual matrix is assigned compression or stretching factors for each dimension, which are applied to a common configuration. A perceptual space common to all subjects is assumed, whose dimensions, however, are weighted individually by each subject; this assumption is checked by the procedure.

The methods for row-conditional data include:

  • Anchor-point method: one object serves as a reference point for all other objects. The resulting matrix is square, but asymmetric and therefore row-conditional.
  • Multidimensional unfolding (MDU): not an object but each individual subject is interpreted as an anchor point.

Metric multidimensional scaling

The aim of metric multidimensional scaling is to arrange objects whose distances are given in a high-dimensional space in a lower-dimensional space in such a way that the Euclidean distances in this lower-dimensional space resemble the given distances as closely as possible. A configuration obtained with the Euclidean metric is easy to interpret, since distances between objects correspond to their straight-line distance.

In addition to Euclidean distance measures, the metrics used in factor analysis are also employed. In discrete models, the Manhattan metric, among others, is used.

If similarity measures $s_{ij}$ between objects are given as starting values instead of distances, they can be converted into distances via the transformation

$$d_{ij} = \sqrt{s_{ii} - 2 s_{ij} + s_{jj}}.$$
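As a minimal sketch of this conversion in Python (the function name is illustrative; a symmetric similarity matrix with the self-similarities on its diagonal is assumed):

```python
import numpy as np

def similarities_to_distances(S):
    """Convert a symmetric similarity matrix S into distances via
    d_ij = sqrt(s_ii - 2*s_ij + s_jj) (sketch; assumes s_ii >= s_ij)."""
    s = np.diag(S)
    D_squared = s[:, None] - 2.0 * S + s[None, :]
    return np.sqrt(np.clip(D_squared, 0.0, None))  # clip guards against rounding noise
```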

Algorithm

The procedure of metric (classical) multidimensional scaling can be described in four steps:

  • Set up the matrix of squared distances $D^{(2)} = (d_{ij}^2)$.
  • Apply double centring, $B = -\tfrac{1}{2} J D^{(2)} J$ with the centring matrix $J = I - \tfrac{1}{n}\mathbf{1}\mathbf{1}^{T}$.
  • Compute the eigendecomposition $B = Q \Lambda Q^{T}$.
  • Read off the coordinates from the $q$ largest eigenvalues as $X = Q_q \Lambda_q^{1/2}$.
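A minimal Python sketch of these four steps (the function name classical_mds and the default of two dimensions are illustrative choices, not part of the text above):

```python
import numpy as np

def classical_mds(D, q=2):
    """Classical (metric) MDS sketch: embed a distance matrix D in q dimensions."""
    n = D.shape[0]
    D2 = D ** 2                                   # step 1: squared distances
    J = np.eye(n) - np.ones((n, n)) / n           # centring matrix
    B = -0.5 * J @ D2 @ J                         # step 2: double centring
    eigvals, eigvecs = np.linalg.eigh(B)          # step 3: eigendecomposition
    order = np.argsort(eigvals)[::-1][:q]         # q largest eigenvalues
    L = np.clip(eigvals[order], 0.0, None)        # negative eigenvalues are set to 0
    X = eigvecs[:, order] * np.sqrt(L)            # step 4: coordinates
    return X
```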

Example

Given are the distances of the fastest road connections between various cities; sought are the coordinates of the cities.

Carrying out metric multidimensional scaling with a statistical software package yields a configuration in two dimensions.

The configuration found is unique only up to rotation and scaling (this is illustrated by the sketch following the list):

  • Any rotated solution yields the same (Euclidean) distances between the cities, so all rotated solutions are equivalent.
  • Because of the standardization in the algorithm, a uniform multiplication of the cities' distances from the origin yields the same coordinates for the cities.
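To illustrate the uniqueness up to rotation, the classical_mds sketch from above can be applied to a small, purely illustrative distance matrix (the values below are invented and are not the city distances referred to in the example):

```python
import numpy as np

# Hypothetical symmetric distance matrix for four objects (illustrative values only).
D = np.array([
    [0.0, 2.0, 3.6, 5.0],
    [2.0, 0.0, 2.2, 3.4],
    [3.6, 2.2, 0.0, 1.5],
    [5.0, 3.4, 1.5, 0.0],
])

X = classical_mds(D, q=2)          # configuration in two dimensions
D_hat = np.sqrt(((X[:, None] - X[None, :]) ** 2).sum(-1))

# Rotating the configuration leaves all pairwise distances unchanged.
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
D_rot = np.sqrt((((X @ R)[:, None] - (X @ R)[None, :]) ** 2).sum(-1))
print(np.allclose(D_hat, D_rot))   # True: rotated solutions are equivalent
```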

Non-metric multidimensional scaling

Non-metric multidimensional scaling extends metric multidimensional scaling in two respects: on the one hand, only ordinal information about the dissimilarities is required; on the other hand, the dissimilarities need only be related to the distances by a weakly monotone transformation.

If the dissimilarities $\delta_{ij}$ are linked to the distances $d_{ij}$ by a function $f$, this function must be weakly monotone: if $\delta_{ij} < \delta_{kl}$, then $d_{ij} \le d_{kl}$ must also hold.

The pairs of dissimilarities are therefore brought into a rank order

$$\delta_{i_1 j_1} \le \delta_{i_2 j_2} \le \dots \le \delta_{i_K j_K},$$

and one obtains the monotonicity condition

$$d_{i_1 j_1} \le d_{i_2 j_2} \le \dots \le d_{i_K j_K}.$$
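A minimal sketch of this ordering step (the helper name rank_order_pairs is illustrative): the pairs of the upper triangle are sorted by increasing dissimilarity, and the monotonicity condition then refers to the distances taken in exactly this order.

```python
import numpy as np

def rank_order_pairs(delta):
    """Return the index pairs (i, j), i < j, sorted by increasing dissimilarity."""
    i, j = np.triu_indices_from(delta, k=1)
    order = np.argsort(delta[i, j])
    return list(zip(i[order], j[order]))
```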

Shepard-Kruskal algorithm

The Shepard-Kruskal algorithm determines the configuration iteratively:

  • Choose a starting configuration for the objects (for example at random).
  • Compute the distances between the objects in the current configuration.
  • Determine the disparities by monotone regression (Pool Adjacent Violators algorithm).
  • Check a termination criterion (for example the STRESS value); if it is not yet satisfied, compute new positions for the objects and continue with the second step.

Pool Adjacent Violators algorithm

  • If the monotonicity condition is not violated between two adjacent points, the distance itself is used as the corresponding disparity.
  • If the monotonicity condition is violated between two or more adjacent points, the average of the corresponding distances is used as their common disparity (a small sketch follows this list).
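A minimal Python sketch of the Pool Adjacent Violators step (the function name is illustrative); it expects the distances already brought into the rank order of the corresponding dissimilarities, for example via the rank_order_pairs helper above, and returns the disparities:

```python
import numpy as np

def pool_adjacent_violators(values):
    """Monotone (isotonic) regression by pooling adjacent violators.

    values: distances ordered by the rank of the corresponding dissimilarities.
    Returns the disparities, i.e. the best weakly increasing fit to the sequence.
    """
    values = np.asarray(values, dtype=float)
    blocks = []  # each block stores [sum of pooled values, number of pooled values]
    for v in values:
        blocks.append([v, 1])
        # Merge blocks while the mean of the last block is smaller than the
        # mean of the block before it (a monotonicity violation).
        while len(blocks) > 1 and blocks[-1][0] / blocks[-1][1] < blocks[-2][0] / blocks[-2][1]:
            s, n = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += n
    # Expand the block means back to the original length.
    disparities = []
    for s, n in blocks:
        disparities.extend([s / n] * n)
    return np.array(disparities)
```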

Which transformations are permitted in the calculation of the disparities depends on the scale level of the raw data. The distances in the perceptual space may, however, have a different scale level. To what extent an increase in scale level is permissible is assessed by means of the compression ratio Q = number of (dis)similarities / (number of objects × number of dimensions). In "simple" MDS the raw data are already in aggregated form; they usually represent the mean values of the respondents' answers.

Calculation of the new positions

The new position of object $k$ at step $t+1$ is calculated as

$$x_k(t+1) = x_k(t) + \alpha \sum_{j \neq k} \left(1 - \frac{d_{kj}}{\hat d_{kj}}\right)\bigl(x_k(t) - x_j(t)\bigr),$$

where $x_k(t)$ denotes the position of object $k$ at step $t$, $d_{kj}$ the current distance, $\hat d_{kj}$ the disparity, and $\alpha$ a weighting factor (it should not be chosen too large, since the stress value can otherwise deteriorate; usually $\alpha \approx 0.2$).

If two objects lie farther apart than their disparity indicates ($d_{kj}/\hat d_{kj}$ greater than 1, so that the expression in the first bracket becomes negative), they are moved towards each other; the direction of the shift is given by the difference vector in the second bracket. Conversely, two dissimilar objects that lie too close together are moved away from each other. This usually lowers the stress value, and the iteration is continued with the second step, whereby the stress value generally decreases further.
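A minimal sketch of this update rule, under the bracket structure given above (the function name and the eps guard against division by zero are illustrative):

```python
import numpy as np

def update_positions(X, disparities, alpha=0.2, eps=1e-12):
    """One position update in the spirit of the rule above (sketch).

    X           : (n, p) array of current object coordinates x_k(t).
    disparities : (n, n) symmetric matrix of disparities d_hat_kj.
    alpha       : weighting factor; the text recommends about 0.2.
    """
    n = X.shape[0]
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))  # current distances
    X_new = X.copy()
    for k in range(n):
        shift = np.zeros(X.shape[1])
        for j in range(n):
            if j == k:
                continue
            # Negative if the objects are farther apart than the disparity
            # indicates (they then move towards each other), positive if
            # they are too close together (they then move apart).
            factor = 1.0 - D[k, j] / max(disparities[k, j], eps)
            shift += factor * (X[k] - X[j])
        X_new[k] = X[k] + alpha * shift
    return X_new
```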

Example

Based on the example above, a rank order of the distances can be set up and the monotonicity condition formulated:

A random configuration was chosen as the starting point:

This results in:

The calculated Euclidean distances show that the monotonicity condition is violated in two regions:

The disparities in these regions are therefore computed as the mean values (1.7546 and 1.9447) of the corresponding distances. With these disparities, the point positions can now be shifted. This process is iterated and leads to the final configuration; a compact sketch of the complete iteration follows below.
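Putting the pieces together, a compact and purely illustrative version of the whole iteration could look as follows; it reuses the rank_order_pairs, pool_adjacent_violators and update_positions sketches from the previous subsections and stops when the stress value no longer improves noticeably:

```python
import numpy as np

def nonmetric_mds(delta, q=2, alpha=0.2, max_iter=100, tol=1e-4, seed=0):
    """Tiny Shepard-Kruskal style sketch, not a production implementation."""
    rng = np.random.default_rng(seed)
    n = delta.shape[0]
    X = rng.normal(size=(n, q))                  # random starting configuration
    pairs = rank_order_pairs(delta)              # pairs ordered by dissimilarity
    last_stress = np.inf
    for _ in range(max_iter):
        D = np.sqrt(((X[:, None] - X[None, :]) ** 2).sum(-1))
        d_sorted = np.array([D[i, j] for i, j in pairs])
        disp_sorted = pool_adjacent_violators(d_sorted)   # monotone regression
        D_hat = np.zeros_like(D)
        for (i, j), d_hat in zip(pairs, disp_sorted):
            D_hat[i, j] = D_hat[j, i] = d_hat
        stress = np.sqrt(((d_sorted - disp_sorted) ** 2).sum() / (d_sorted ** 2).sum())
        if last_stress - stress < tol:           # no noticeable improvement
            break
        last_stress = stress
        X = update_positions(X, D_hat, alpha=alpha)   # shift the points
    return X, last_stress
```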

Termination and quality criteria

The aim of the procedure is an optimal fit of the MDS solution to the raw data and thus the lowest possible STRESS or energy value, or the largest possible R². These values are based on the differences between disparities and distances. If the values no longer change, or change only slightly, the iteration is terminated.

STRESS measures

The STRESS (according to Kruskal) is calculated as the square root of the sum of the squared deviations of the disparities from the distances, divided by the sum of the squared distances; STRESS is thus a normalized variance:

$$\mathrm{STRESS}_1 = \sqrt{\frac{\sum_{i<j}\bigl(d_{ij} - \hat d_{ij}\bigr)^2}{\sum_{i<j} d_{ij}^2}}.$$

An alternative measure is STRESS2:

$$\mathrm{STRESS}_2 = \sqrt{\frac{\sum_{i<j}\bigl(d_{ij} - \hat d_{ij}\bigr)^2}{\sum_{i<j}\bigl(d_{ij} - \bar d\bigr)^2}},$$

with $\bar d$ the mean of all distances.
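Both measures can be computed from the upper triangles of the distance and disparity matrices, as in the following sketch (function names are illustrative):

```python
import numpy as np

def stress1(D, D_hat):
    """Kruskal's STRESS-1: normalized by the sum of squared distances."""
    i, j = np.triu_indices_from(D, k=1)          # count each pair once
    num = ((D[i, j] - D_hat[i, j]) ** 2).sum()
    den = (D[i, j] ** 2).sum()
    return np.sqrt(num / den)

def stress2(D, D_hat):
    """STRESS-2: normalized by the squared deviations from the mean distance."""
    i, j = np.triu_indices_from(D, k=1)
    num = ((D[i, j] - D_hat[i, j]) ** 2).sum()
    den = ((D[i, j] - D[i, j].mean()) ** 2).sum()
    return np.sqrt(num / den)
```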

In principle, there are no exact rules for which STRESS value is still acceptable and which can be described as "good". "In order to have any norm at all, the 'nullest of all null hypotheses' was examined: thousands of scalings of random data were computed by MDS, and the stress values that arose were recorded" (cf. BORG/STAUFENBIEL 1989). Kruskal established reference values for the STRESS value which one can use for orientation.

R²

In addition to the basic STRESS criteria, the measure R² is used as an alternative quality criterion for the fit of the configuration to the raw data. R² is the squared correlation of the distances with the disparities and can be interpreted as the degree of the linear fit of the disparities to the distances. In practice, values of R² greater than 0.9 are considered acceptable.
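A corresponding sketch for R² as the squared correlation of distances and disparities (the function name is illustrative):

```python
import numpy as np

def r_squared(D, D_hat):
    """Squared Pearson correlation between distances and disparities (sketch)."""
    i, j = np.triu_indices_from(D, k=1)
    r = np.corrcoef(D[i, j], D_hat[i, j])[0, 1]
    return r ** 2
```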

Energy

Weighting the summands in the STRESS formula leads to so-called energy measures.

Software

In statistics packages such as SPSS, MDS can be performed automatically. In R, the function cmdscale performs an MDS. The same holds for MATLAB, which provides MDS via the function mdscale.
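As a further illustration (not mentioned in the text above), a comparable analysis can be run in Python with scikit-learn's MDS class, which supports both the metric and the non-metric variant on a precomputed dissimilarity matrix:

```python
import numpy as np
from sklearn.manifold import MDS

# Small illustrative dissimilarity matrix (not real data).
D = np.array([
    [0.0, 2.0, 3.6, 5.0],
    [2.0, 0.0, 2.2, 3.4],
    [3.6, 2.2, 0.0, 1.5],
    [5.0, 3.4, 1.5, 0.0],
])

# metric=False selects the non-metric (Kruskal) variant.
mds = MDS(n_components=2, dissimilarity='precomputed', metric=False, random_state=0)
coords = mds.fit_transform(D)

print(coords.shape)   # (4, 2): configuration in two dimensions
print(mds.stress_)    # stress value of the fitted configuration
```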
