RANSAC

RANSAC (random sample consensus, roughly "agreement with a random sample") is an algorithm for estimating a model from a series of measurements containing outliers and gross errors. Because of its robustness, it is mainly used for the evaluation of automatic measurements, primarily in the field of machine vision. Here RANSAC supports adjustment methods such as the method of least squares, which usually fail given a large number of outliers, by computing an outlier-adjusted subset of the data, the so-called consensus set.

Introduction and Principle

Measurements often yield data points that represent physical quantities such as pressure, distance or temperature, economic quantities, or the like. A parameter-dependent model curve that fits the data as closely as possible is to be placed through these points. More data points are available than are needed to determine the parameters, so the model is overdetermined. The measured values may contain outliers, i.e. values that do not fit into the expected series of measurements. Since measurements were mostly carried out manually before the advent of digital technology, checking by the operator ensured that the proportion of outliers was usually small. The adjustment algorithms in use at the time, such as the method of least squares, are well suited to analyzing such data sets with few outliers: first the model is determined using all data points, and then an attempt is made to detect the outliers.

With the development of digital technology from the early 1980s onward, this changed. The new possibilities led to the increasing use of automatic measurement methods, mainly in the field of machine vision. As a result, a large number of values is often available, usually containing many outliers. The traditional methods assume a normal distribution of the errors and deliver partly meaningless results when the data contain many outliers. This is illustrated in the adjacent figure: a straight line (the model) is to be fitted to the points (the measured values). On the one hand, the single outlier among the 20 data points cannot be excluded by traditional methods before the line is determined. On the other hand, because of its location it influences the fitted line disproportionately (it acts as a so-called leverage point).

The RANSAC algorithm is based on a new, iterative approach. Instead of adjusting all measured values jointly, only as many randomly chosen values are used as are needed to compute the model parameters (two points in the case of a straight line). It is initially assumed that the chosen values are not outliers. This assumption is tested by first computing the model from the randomly chosen values and then determining the distance of all measured values (not only the originally chosen ones) from this model. If the distance of a measured value from the model is smaller than a predefined threshold, then that value is not a gross error with respect to the computed model; it supports the model. The more measured values support the model, the more likely it is that the randomly chosen values contained no outliers. These three steps (random selection of measured values, computation of the model parameters, and determination of the support) are repeated several times independently. In each iteration it is recorded which measured values support the model; this set is called the consensus set. From the largest consensus set, which in the ideal case contains no outliers, the solution is finally determined with a traditional adjustment method.
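The following minimal Python sketch illustrates these steps for fitting a straight line. The function and variable names are ours, and the final adjustment step uses NumPy's least-squares polynomial fit, which assumes the line is not vertical.

```python
import numpy as np

rng = np.random.default_rng(0)

def ransac_line(points, n_iterations, threshold):
    """Minimal RANSAC sketch for a 2D line (points: N x 2 array)."""
    best_consensus = points[:0]
    for _ in range(n_iterations):
        # 1. Randomly select as many points as the model needs (here: two).
        p1, p2 = points[rng.choice(len(points), size=2, replace=False)]
        # 2. Compute the model: unit normal of the line through p1 and p2.
        d = p2 - p1
        normal = np.array([-d[1], d[0]], dtype=float)
        if np.linalg.norm(normal) == 0:        # degenerate sample, retry
            continue
        normal /= np.linalg.norm(normal)
        # 3. Consensus set: all points closer to the line than the threshold.
        distances = np.abs((points - p1) @ normal)
        consensus = points[distances < threshold]
        if len(consensus) > len(best_consensus):
            best_consensus = consensus
    # Final adjustment (least squares) on the largest consensus set.
    return np.polyfit(best_consensus[:, 0], best_consensus[:, 1], 1)
```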

RANSAC was officially introduced in 1981 by Martin A. Fischler and Robert C. Bolles in the Communications of the ACM. An internal presentation at SRI International, where both authors worked, had already taken place in March 1980. An alternative to RANSAC are M-estimators, which, compared with other estimators such as maximum likelihood estimators, are more robust against outliers.

Applications

As mentioned, many outliers occur especially in automatic measurements. These are common in the field of machine vision, so RANSAC is particularly widespread there. Some applications are presented below.

In image processing, RANSAC is used for determining homologous points between two camera images. Two pixels are homologous if they are generated by the same object point in both images. The task of assigning homologous points is called the correspondence problem. The result of an automatic analysis usually contains a large number of wrong assignments. Methods that build on the results of the correspondence analysis use RANSAC to eliminate these mismatches. One example of this approach is the creation of a panoramic image from several smaller individual images (stitching). Another is the computation of the epipolar geometry, a model of the geometric relations between different camera images of the same object. Here RANSAC is used to determine the fundamental matrix, which describes the geometric relationship between the images.
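As an illustration, OpenCV's correspondence routines accept RANSAC flags so that mismatches are rejected automatically. The point arrays below are placeholders; in practice they would come from a feature-matching step.

```python
import numpy as np
import cv2

# Placeholder homologous points from a correspondence analysis (N x 2).
pts1 = (np.random.rand(50, 2) * 640).astype(np.float32)
pts2 = pts1 + np.float32([5.0, 3.0])  # placeholder: shifted copies of pts1

# Homography for stitching, with RANSAC and a 3-pixel error bound;
# mask marks the consensus set (inliers).
H, mask = cv2.findHomography(pts1, pts2, cv2.RANSAC, 3.0)

# Fundamental matrix with RANSAC (error bound 1 pixel, confidence 99 %).
F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 1.0, 0.99)
```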

In the DARPA Grand Challenge, a competition for autonomous land vehicles, RANSAC was used to determine the road plane and to reconstruct the motion of the vehicle.

The algorithm is also used to fit geometric bodies such as cylinders to noisy three-dimensional point sets, or to segment point clouds automatically. Here, all points that do not belong to the current segment are regarded as outliers. After the dominant body in the point cloud has been estimated, all points belonging to it are removed so that further bodies can be detected in the next step. This process is repeated until all bodies in the point set have been found.
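Such a sequential extraction could be sketched as follows; fit_dominant_shape is a hypothetical RANSAC fitter that returns the model of the dominant body and a boolean inlier mask.

```python
def segment_point_cloud(points, fit_dominant_shape, min_support):
    """Repeatedly extract the dominant body (sketch): fit, remove its
    points, and continue until no body has enough support left."""
    segments = []
    while len(points) >= min_support:
        model, inliers = fit_dominant_shape(points)   # hypothetical fitter
        if inliers.sum() < min_support:               # nothing left to find
            break
        segments.append((model, points[inliers]))
        points = points[~inliers]                     # remove body, repeat
    return segments
```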

Procedure and parameters

A prerequisite for RANSAC is that more data points are available than are necessary to determine the model parameters. The algorithm consists of the following steps:

1. Randomly select as many points from the data as are needed to compute the model parameters.
2. Determine the model parameters from the chosen points.
3. Determine the subset of measured values whose distance from the model is smaller than a given error bound; this subset is the consensus set.

After a number of iterations, the subset containing the most points (the largest consensus set) is selected. The model parameters are then computed with one of the conventional adjustment methods using only this subset. An alternative variant of the algorithm terminates the iterations early if, in step 3, enough points support the model. This variant is called preemptive RANSAC, i.e. RANSAC that terminates prematurely. With this procedure, the approximate proportion of outliers must be known in advance, so that it can be judged whether enough measured values support the model.
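A sketch of the early-terminating variant, with caller-supplied (hypothetical) fit_model and error functions:

```python
import random

def preemptive_ransac(points, sample_size, fit_model, error,
                      threshold, min_support, max_iterations):
    """Sketch of preemptive RANSAC: stop as soon as a model is supported
    by at least min_support points."""
    best_model, best_consensus = None, []
    for _ in range(max_iterations):
        model = fit_model(random.sample(points, sample_size))
        consensus = [q for q in points if error(model, q) < threshold]
        if len(consensus) > len(best_consensus):
            best_model, best_consensus = model, consensus
        if len(consensus) >= min_support:    # enough support: stop early
            break
    return best_model, best_consensus
```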

The algorithm depends mainly on three parameters:

Maximum distance of a data point from the model

This parameter is fundamental for the success of the algorithm. In contrast to the other two parameters, it must always be specified (the check of the consensus set can be omitted, and the number of iterations can be chosen almost arbitrarily high). If the value is too large or too small, the algorithm may fail. This is illustrated in the following three figures, each showing one iteration step. The two randomly selected points from which the green model line was computed are colored blue. The error bounds are drawn as black lines; a point must lie within these lines to support the model line. If the error bound is too large, too many outliers go undetected, so that a wrong model can have as many supporters as a correct one (figures 1a and 1b). If it is set too low, a model computed from an outlier-free set of values may be supported by too few of the other values that are not outliers (figure 2).

  • Problems with a too large or too small choice of the error bound
    1b. A second, wrong solution with the same number of outliers as in 1a.
    2. The error bound is too small.

Despite these problems, this value must usually be determined empirically. Only if the standard deviation of the measured values is known can the error bound be computed using the rules of probability theory.

Number of iterations

The number of repetitions can be chosen so that, with a certain probability $p$, at least one outlier-free subset is selected from the data points. If $m$ is the number of data points needed to compute the model and $\varepsilon$ the relative proportion of outliers in the data, then the probability of selecting at least one outlier in each of $k$ repetitions is

$$\left(1-(1-\varepsilon)^m\right)^k.$$

To ensure that this is at most $1-p$, $k$ must be chosen large enough. More precisely, at least

$$k = \left\lceil \frac{\ln(1-p)}{\ln\!\left(1-(1-\varepsilon)^m\right)} \right\rceil$$

repetitions are needed. The number of repetitions thus depends only on the proportion of outliers, the number of parameters of the model function, and the prescribed probability of selecting at least one outlier-free subset. It is independent of the total number of measured values.
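This bound is easy to evaluate; a small Python helper (the names are ours) computes it for given values:

```python
import math

def ransac_iterations(outlier_share, sample_size, p=0.99):
    """Smallest k such that, with probability p, at least one of the k
    random samples of size sample_size is outlier-free."""
    inlier_prob = (1.0 - outlier_share) ** sample_size
    return math.ceil(math.log(1.0 - p) / math.log(1.0 - inlier_prob))

# Line model (2 points), 50 % outliers: 17 iterations suffice at p = 0.99.
print(ransac_iterations(0.5, 2))  # -> 17
```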

The following table shows the necessary number of repetitions as a function of the share of outliers and the number of points required to determine the model parameters; the values follow directly from the formula above. The probability of selecting at least one outlier-free subset from all data points is set to 99 %.

  Points needed | 10 % | 20 % | 30 % | 40 % | 50 %  (outlier share)
  1             |  2   |  3   |  4   |  6   |  7
  2             |  3   |  5   |  7   | 11   | 17
  3             |  4   |  7   | 11   | 19   | 35
  4             |  5   |  9   | 17   | 34   | 72

Size of the consensus set

In the general version of the algorithm, this value does not necessarily have to be known, since in the absence of a plausibility check the largest consensus set found can simply be used at the end. Knowing it is, however, necessary for the early-terminating variant, where the minimum size of the consensus set is usually determined analytically or experimentally. A good approximation is the total number of measured values minus the share of outliers suspected in the data: for $n$ data points and an outlier share $\varepsilon$, the minimum size is approximately $n\,(1-\varepsilon)$. For example, with 12 data points and 20 % outliers, the minimum size is approximately 10.

Adaptive determination of the parameters

The share of outliers in the total set of data points is often unknown, so neither the required number of iterations nor the minimum size of a consensus set can be determined in advance. In this case the algorithm is initialized with a worst-case assumption of, for example, a 50 % outlier share, and the number of iterations and the size of the consensus set are computed accordingly. After each iteration, both values are adjusted whenever a more consistent set has been found. If, for instance, the algorithm starts with an assumed outlier share of 50 % but the computed consensus set contains 80 % of all data points, the improved estimate of the outlier share is 20 %. The number of iterations and the required size of the consensus set are then recalculated.
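The recalculation of the iteration count can be sketched as follows (the function name is ours):

```python
import math

def update_iteration_bound(consensus_size, n_points, sample_size, p=0.99):
    """Recompute the required number of iterations after a larger
    consensus set has been found (adaptive variant)."""
    eps = 1.0 - consensus_size / n_points        # improved outlier estimate
    inlier_prob = (1.0 - eps) ** sample_size
    if inlier_prob <= 0.0:
        return float("inf")                      # no usable estimate yet
    if inlier_prob >= 1.0:
        return 0                                 # no outliers remain
    return math.ceil(math.log(1.0 - p) / math.log(1.0 - inlier_prob))

# Worst-case start (50 % outliers, line model): 17 iterations. A consensus
# set with 80 % of the points lowers the estimate to 20 %: 5 iterations.
print(update_iteration_bound(80, 100, 2))  # -> 5
```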

Example

A straight line is to be fitted to a set of points in the plane. The points are shown in the first image. The second image shows the result of several iterations; red points support the line, while points whose distance from the line is greater than the error bound are colored blue. The third image shows the solution found after 1000 iterations.

  • RANSAC for fitting a straight line to data points
    2. Animation of several iterations.
    3. Result after 1000 iterations.

Developments

There are several extensions of RANSAC; two important ones are presented here.

LO-RANSAC

Experiments have shown that usually more iterations are needed than the theoretically sufficient number: if a model is computed from an outlier-free set of points, not all of the other values that are not outliers necessarily support this model. The problem is illustrated in the adjacent figure: although the line was computed from two error-free values (black dots), some other obviously correct points at the top right of the image are classified as outliers (blue stars).

For this reason, LO-RANSAC (locally optimized RANSAC) extends step 3 of the original algorithm. First, the subset of points that are not outliers is determined as usual. Then the model is recomputed from this subset with an arbitrary adjustment method, such as the method of least squares. Finally, the subset of points whose distance from this optimized model is smaller than the error bound is determined, and this improved subset is stored as the consensus set.
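For the line example, the local optimization step could look as follows (a sketch; NumPy's polyfit stands in for an arbitrary adjustment method):

```python
import numpy as np

def locally_optimize(points, consensus, threshold):
    """LO-RANSAC step (sketch): refit the model to the consensus set by
    least squares, then recompute the consensus set for the new model."""
    # Least-squares line y = a*x + b through the current consensus set.
    a, b = np.polyfit(consensus[:, 0], consensus[:, 1], 1)
    # Distance of all points (not only the consensus set) to the new line.
    distances = np.abs(a * points[:, 0] - points[:, 1] + b) / np.hypot(a, 1.0)
    return (a, b), points[distances < threshold]
```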

MSAC

In RANSAC, the model is selected that is supported by the most measured values. This corresponds to minimizing a sum in which all error-free values enter with 0 and all outliers with a constant value:

$$C = \sum_i \rho(e_i), \qquad \rho(e) = \begin{cases} 0 & \text{if } e^2 < T^2 \\ \text{const} & \text{otherwise,} \end{cases}$$

where $e_i$ is the distance of the $i$-th measured value from the model and $T$ is the error bound.

The computed model can be very inaccurate if the error bound is set too high: the larger it is, the more solutions have the same value of $C$. In the extreme case, all the errors of the measured values lie below the error bound, so that $C$ is always 0. Then no outliers can be detected and RANSAC delivers a poor estimate.

MSAC (M-estimator sample consensus), a generalization of RANSAC, therefore minimizes the sum with a modified function:

$$\rho(e) = \begin{cases} e^2 & \text{if } e^2 < T^2 \\ T^2 & \text{otherwise.} \end{cases}$$

With this function, outliers still receive a fixed penalty of $T^2$, while values below the error bound enter the sum with their squared error instead of 0. This eliminates the problem described above, since a point contributes less to the sum the better it fits the model.
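The two cost functions differ only in how supporters are weighted; a short sketch (the function names are ours):

```python
import numpy as np

def ransac_cost(errors, T):
    """RANSAC cost: 0 for supporters, constant penalty T^2 for outliers."""
    return np.where(errors**2 < T**2, 0.0, T**2).sum()

def msac_cost(errors, T):
    """MSAC cost: squared error below the bound, capped at T^2 above it."""
    return np.minimum(errors**2, T**2).sum()
```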
