Least squares

The method of least squares is the standard mathematical method for curve fitting. A curve is sought that passes as closely as possible through a cloud of data points.

The data may represent physical measurements, economic quantities, or the like, while the curve comes from a parameter-dependent family of functions chosen to suit the problem. The method of least squares then determines the curve parameters such that the sum of the squared deviations of the curve from the observed points is minimized. These deviations are called residuals.

In the example graphic, data points are plotted. In a first step a class of functions is selected that should suit the problem and the data, here a logistic function. Its parameters are then determined so that the sum of the squared deviations e of the observations y from the values of the function is minimized. In the graphic, the deviation e at the point x can be seen as the vertical distance between the observation y and the curve.

In stochastics, the method of least squares is mostly used as an estimation method in regression analysis, where it is referred to as least-squares estimation. Applied as system identification, the method of least squares in connection with model tests offers engineers, for example, a way out of the paradoxical situation of having to determine model parameters for unknown laws.


History

On New Year's Day 1801, the Italian astronomer Giuseppe Piazzi discovered the dwarf planet Ceres. He was able to follow its path for 40 days before Ceres disappeared behind the Sun. Over the course of the year, many scientists tried unsuccessfully to calculate the orbit on the basis of Piazzi's observations, assuming a circular orbit, because only for such orbits could the orbital elements be determined mathematically from the observed sky positions. The 24-year-old Gauss, however, was able to compute elliptical orbits from three individual observations. Since considerably more path points were available, he applied his method of least squares in order to increase the accuracy. When Franz Xaver von Zach and Heinrich Wilhelm Olbers found the minor planet again in December 1801 at exactly the place predicted by Gauss, this was not only a great success for Gauss's method: Piazzi's reputation, which had suffered greatly because his path points could not be reconciled with a circular orbit, was also restored.

Gauss had already developed the foundations of his method in 1795 at the age of 18. It built on an idea of Pierre-Simon Laplace to sum the absolute values of the errors so that the errors add up to zero. Gauss instead took the squares of the errors and could thereby drop the artificial additional requirement on the errors. Independently, the Frenchman Adrien-Marie Legendre developed the same method, first in 1805 at the end of a small work on the calculation of cometary orbits, and published a second treatise on it in 1810. From him comes the name Méthode des moindres carrés (method of least squares).

In 1809 Gauss then published, in the second volume of his celestial mechanics work Theoria motus corporum coelestium in sectionibus conicis solem ambientium (Theory of the motion of the celestial bodies moving around the Sun in conic sections), his procedure, including the normal equations and Gaussian elimination. He mentioned that he had discovered and used it before Legendre, which resulted in a priority dispute between the two. The method of least squares quickly became the standard procedure for the treatment of astronomical or geodetic data sets.

Gauss then used the method intensively in his survey of the Kingdom of Hanover by triangulation. In 1821 and 1823 the two-part work appeared, and in 1826 a supplement, the Theoria combinationis observationum erroribus minimis obnoxiae (Theory of the combination of observations subject to the smallest errors), in which Gauss was able to justify why his procedure was so successful compared with the others: the method of least squares is optimal in a broad respect, that is, better than other methods. The precise statement is known as the Gauss–Markov theorem, since Gauss's work attracted little attention and was eventually rediscovered and popularized in the 20th century by Andrei Markov. The Theoria Combinationis also contains significant advances in efficiently solving the linear systems of equations that arise, such as the Gauss–Seidel method and the LU decomposition. In the 1840s the mathematics teacher Ferdinand Clemens also dealt with the method.

The French survey officer André-Louis Cholesky developed the Cholesky decomposition during the First World War, which again represented a considerable gain in efficiency compared with Gauss's solution method. In the 1960s, Gene Golub developed the idea of solving the linear systems of equations that arise by means of a QR decomposition.

The procedure

Requirements

One considers a dependent variable $y$ that is influenced by one variable or by several variables. For example, the elongation of a spring depends only on the applied force, whereas the profit of a company depends on several factors such as sales, various costs, or equity. To simplify the notation, the presentation below is restricted to one variable $x$. The relationship between $x$ and $y$ is modeled by a model function, for example a parabola or an exponential function,
$$ y \approx f(x; \alpha_1, \dots, \alpha_m), $$

which depends on $x$ as well as on $m$ functional parameters $\alpha_j$. This function can come from the prior knowledge of the user; in the case of the spring, this is Hooke's law and thus a linear function with the spring constant as the only parameter. In more difficult cases, such as that of the company, the choice of the function type is preceded by a possibly complex modeling process, and possibly several different model functions must be applied and the results compared.

To obtain information about the parameters and thus about the concrete nature of the relationship, corresponding observation values $y_i$ are collected for each of $n$ given values $x_i$ of the independent variable. The parameters serve to fit the chosen function type to these observed values. The goal is now to choose the parameters such that the model function approximates the data as well as possible.

Gauss and Legendre had the idea of making distributional assumptions about the measurement errors of these observation values. They should on average be zero, have constant variance, and be stochastically independent of every other measurement error. This requires that the measurement errors contain no systematic information, i.e. that they fluctuate purely randomly around zero. In addition, the measurement errors should be normally distributed, which on the one hand has probabilistic advantages and on the other hand guarantees that outliers are practically excluded.

To determine the parameters under these assumptions, it is generally necessary that significantly more data points than parameters are available, so $n > m$ must hold.

Minimizing the sum of squares

The criterion for determining the approximation should be chosen such that large deviations of the model function from the data are penalized more strongly than small ones. If no solution without any deviations is possible, then the compromise with the smallest overall penalty is the best generally valid criterion.

For this purpose the sum of squares, also known as the sum of squared errors, is defined as the sum of the squared differences between the values of the model curve and the data. With parameters $\alpha = (\alpha_1, \dots, \alpha_m)$ and data points $(x_i, y_i)$, in formula notation this gives
$$ S(\alpha) = \sum_{i=1}^{n} \bigl(y_i - f(x_i; \alpha)\bigr)^2. $$

Those parameters should then be selected for which this sum of squared errors becomes minimal:
$$ \min_{\alpha} \sum_{i=1}^{n} \bigl(y_i - f(x_i; \alpha)\bigr)^2. $$

How exactly this minimization problem is solved depends on the type of the model function.
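For illustration, the sum of squared errors can be evaluated directly for any candidate parameter set. The following short NumPy sketch does this for a hypothetical straight-line model; model and data are only placeholders, not part of the examples discussed below.

```python
import numpy as np

def sum_of_squares(params, f, x, y):
    """S(params) = sum_i (y_i - f(x_i; params))^2, the quantity to be minimized."""
    residuals = y - f(x, *params)
    return float(np.sum(residuals ** 2))

# Hypothetical model function: a straight line f(x) = a0 + a1*x
def line(x, a0, a1):
    return a0 + a1 * x

# Illustrative data points (x_i, y_i)
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.1, 0.9, 2.1, 2.9])

print(sum_of_squares((0.0, 1.0), line, x, y))  # penalty of the candidate a0=0, a1=1
```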

Linear model function

Linear model functions are linear combinations of arbitrary, generally non-linear basis functions. For such model functions, the minimization problem can be solved analytically via an extreme-value approach, without iterative approximation steps. First, some simple special cases and examples are shown.

Special case of a simple linear best-fit line

A simple model with two linearly entering parameters is the first-order polynomial
$$ f(x) = \alpha_0 + \alpha_1 x. $$

For given measured values $(x_1, y_1), \dots, (x_n, y_n)$, the coefficients $\alpha_0$ and $\alpha_1$ of the best-fitting line are sought. The deviations
$$ e_i = y_i - \alpha_0 - \alpha_1 x_i $$
between the sought line and the respective measured values are called fitting errors or residuals. Sought are now the coefficients $\alpha_0$ and $\alpha_1$ with the smallest sum of squared errors
$$ \min_{\alpha_0, \alpha_1} \sum_{i=1}^{n} \bigl(y_i - \alpha_0 - \alpha_1 x_i\bigr)^2. $$

The great advantage of the approach with the squares of the errors becomes visible when this minimization is carried out mathematically: the sum function is regarded as a function of the two variables $\alpha_0$ and $\alpha_1$ (the measured values entering it are numerical constants), the derivatives (more precisely, the partial derivatives) of the function with respect to these variables are formed, and finally the common zero of these derivatives is sought. This results in the linear system of equations
$$ n\,\alpha_0 + \alpha_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i, \qquad \alpha_0 \sum_{i=1}^{n} x_i + \alpha_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i $$

with the solution
$$ \alpha_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \alpha_0 = \bar{y} - \alpha_1 \bar{x}. $$

Here $\bar{x}$ denotes the arithmetic mean of the $x$ values and $\bar{y}$ correspondingly that of the $y$ values. Using the shift theorem, the solution for $\alpha_1$ can also be written as
$$ \alpha_1 = \frac{\sum_{i=1}^{n} x_i y_i - n\,\bar{x}\,\bar{y}}{\sum_{i=1}^{n} x_i^2 - n\,\bar{x}^2}. $$

These results can also be derived with functions of one real variable, i.e. without partial derivatives.
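The closed-form solution above translates directly into a few lines of NumPy; the data below are purely illustrative.

```python
import numpy as np

def fit_line(x, y):
    """Best-fit line y ≈ a0 + a1*x from the closed-form least-squares solution."""
    x_bar, y_bar = x.mean(), y.mean()
    a1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    a0 = y_bar - a1 * x_bar
    return a0, a1

# Illustrative measurements
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])
a0, a1 = fit_line(x, y)
print(a0, a1)
```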

Example with a best-fit line

The following example is intended to show the determination of a best-fit line. Ten warships were randomly selected and analyzed with respect to several characteristics, including length (m) and width (m). It is to be examined whether the width of a warship is perhaps in a fixed relation to the length.

The scatter plot suggests a clearly linear relationship between the length and the width of a ship. A best-fit line is therefore chosen as the model function and determined using the method of least squares. Analogously to the case above, the slope $\alpha_1$ is first obtained from the sums of the measured values, and from it the intercept $\alpha_0$.

In the following table the data are listed together with the intermediate results.

The slope is determined to be about $0.16$, so that one can say that with every meter of length a warship grows on average by about 16 centimeters in width. The constant term $\alpha_0$ is calculated correspondingly.

The fit of the points is quite good. On average, the error between the width predicted from the characteristic length and the observed width is 2.1 m. Also, the coefficient of determination, as a normalized coefficient, gives a value of about 92 % (100 % would correspond to a mean deviation of 0 m); for its calculation, see the coefficient of determination.

Simple polynomial fit curves

More general than a linear best-fit line are fitting polynomials
$$ f(x) = \alpha_0 + \alpha_1 x + \alpha_2 x^2 + \dots + \alpha_k x^k, $$
which will now be illustrated by an example.

As results of the micro-census survey conducted by the Federal Statistical Office, the average weights of men are given by age group (source: Federal Statistical Office, Wiesbaden 2009). For the analysis, the age groups were replaced by their class midpoints. The dependence of the variable weight (y) on the variable age (x) is to be analyzed.

The scatter plot suggests an approximately parabolic relationship between x and y, which can often be approximated well by a polynomial. A polynomial ansatz of the form
$$ f(x) = \alpha_0 + \alpha_1 x + \alpha_2 x^2 + \alpha_3 x^3 + \alpha_4 x^4 $$
is tried. Solving the associated least-squares problem yields the coefficients of this fourth-degree polynomial; the measurement points deviate from the model function by 0.19 kg on average (standard deviation). If the degree of the polynomial is reduced to 3, one obtains a solution with a mean deviation of 0.22 kg, and with polynomial degree 2 a solution with a mean deviation of 0.42 kg. As can be seen, omitting the higher terms changes the coefficients of the lower terms. The method tries to make the best of every situation: the missing higher terms are compensated as well as possible by the lower ones until the mathematical optimum is reached. With the second-degree polynomial (parabola), the shape of the measurement points is still described very well (see figure).
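Polynomial fits of different degrees, as compared above, can be computed for instance with NumPy's polyfit; the data below are only illustrative placeholders, not the micro-census values.

```python
import numpy as np

# Illustrative (age, weight) pairs with a roughly parabolic shape
x = np.array([20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0])
y = np.array([71.0, 78.0, 83.0, 85.0, 84.0, 80.0, 74.0])

for degree in (2, 3, 4):
    coeffs = np.polyfit(x, y, degree)        # least-squares polynomial coefficients
    residuals = y - np.polyval(coeffs, x)
    mean_dev = np.sqrt(np.mean(residuals ** 2))
    print(degree, np.round(coeffs, 6), round(mean_dev, 3))
```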

Special case of a linear regression function with multiple variables

If the model function is a multi-dimensional first-order polynomial, i.e. if it has several independent model variables $x_1, \dots, x_k$ instead of only one, one obtains a linear function of the form
$$ f(x_1, \dots, x_k) = \alpha_0 + \alpha_1 x_1 + \dots + \alpha_k x_k, $$

which leads to the residuals
$$ e_i = y_i - \alpha_0 - \alpha_1 x_{i1} - \dots - \alpha_k x_{ik} $$

and can be solved via the minimization approach
$$ \min_{\alpha_0, \dots, \alpha_k} \sum_{i=1}^{n} \bigl(y_i - \alpha_0 - \alpha_1 x_{i1} - \dots - \alpha_k x_{ik}\bigr)^2. $$

The general linear case

In the following, the general case of arbitrary linear model functions with arbitrary dimension will be presented. For a given measured quantity $y$ depending on independent variables $x = (x_1, \dots, x_r)$, an optimally fitted linear model function
$$ f(x; \alpha_1, \dots, \alpha_m) = \sum_{j=1}^{m} \alpha_j \varphi_j(x) $$

is sought whose squared deviation from the observations is to be minimal. Here the $\alpha_j$ are the function parameters to be determined, which enter linearly, and the $\varphi_j$ are arbitrary linearly independent functions chosen to suit the problem.

For given measurement points $(x_i, y_i)$, $i = 1, \dots, n$, one obtains the fitting errors
$$ e_i = y_i - \sum_{j=1}^{m} \alpha_j \varphi_j(x_i), $$

or in matrix notation
$$ e = y - A\alpha, $$

where the vector $e$ collects the errors $e_i$, the matrix $A$ the basis-function values $A_{ij} = \varphi_j(x_i)$, the parameter vector $\alpha$ the parameters $\alpha_j$, and the vector $y$ the observations $y_i$.

The minimization problem can then be written in the form
$$ \min_{\alpha} \| A\alpha - y \|_2^2. $$

Solution of the minimization problem

The minimization problem, as shown in the general linear case, takes the form
$$ \min_{\alpha} \| A\alpha - y \|_2^2. $$

This problem is always solvable. If the matrix $A$ has full rank, the solution is even unique. Taking the partial derivatives with respect to the $\alpha_j$ and setting them to zero to determine an extremum yields the linear system of normal equations (also called Gaussian normal equations)
$$ A^{\mathsf T} A\,\alpha = A^{\mathsf T} y, $$

which provides the solution of the minimization problem and in general must be solved numerically. The matrix $A^{\mathsf T} A$ is positive definite, so the extremum found is indeed a minimum. Thus, solving the minimization problem for linear model functions reduces to solving a system of equations. In the simple case of a best-fit line, the solution can, as was shown, even be given directly as a simple formula.
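A minimal NumPy sketch of this reduction: set up the design matrix from the chosen basis functions and solve the normal equations. Basis functions and data here are illustrative.

```python
import numpy as np

# Illustrative basis functions phi_1(x)=1, phi_2(x)=x, phi_3(x)=x^2
def design_matrix(x):
    return np.column_stack([np.ones_like(x), x, x ** 2])

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 0.9, 1.8, 4.2, 8.9])

A = design_matrix(x)
# Normal equations: (A^T A) alpha = A^T y
alpha = np.linalg.solve(A.T @ A, A.T @ y)
print(alpha)
```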

Alternatively, the normal equations can be written in the representation
$$ \sum_{j=1}^{m} \alpha_j \langle \varphi_j, \varphi_k \rangle = \langle y, \varphi_k \rangle, \qquad k = 1, \dots, m, $$

where $\langle \cdot, \cdot \rangle$ denotes the standard scalar product. The basis functions $\varphi_j$ are here to be read as vectors $(\varphi_j(x_1), \dots, \varphi_j(x_n))$, sampled at the discrete points of the observations.

Furthermore, the minimization problem can be analyzed well with a singular value decomposition. This also motivated the notion of the pseudoinverse, a generalization of the ordinary inverse of a matrix. It provides a perspective on non-square systems of linear equations that permits a notion of solution which is not stochastically but algebraically motivated.

Numerical treatment of the solution

There are two ways to solve the problem numerically. On the one hand, the normal equations
$$ A^{\mathsf T} A\,\alpha = A^{\mathsf T} y $$

can be solved; they are uniquely solvable if the matrix $A$ has full rank. Moreover, the system matrix $A^{\mathsf T} A$ has the property of being positive definite, so its eigenvalues are all positive. Together with its symmetry, this can be exploited when choosing a numerical solution method: for example the Cholesky decomposition or the CG method. Since both methods are strongly affected by the condition number of the matrix, this is sometimes not a recommended approach: if $A$ is already ill-conditioned, then $A^{\mathsf T} A$ is ill-conditioned to the square of that. The consequence is that rounding errors can be amplified so much that they make the result unusable.

On the other hand, the original minimization problem offers a more stable alternative, since for a small value of the minimum it has a condition number of the order of the condition number of $A$, while for large values it has the square of the condition number of $A$. To compute the solution, a QR decomposition is used, which is generated by Householder transformations or Givens rotations. The basic idea is that orthogonal transformations do not change the Euclidean norm of a vector. Hence
$$ \| A\alpha - y \|_2 = \| Q^{\mathsf T}(A\alpha - y) \|_2 $$

for every orthogonal matrix $Q$. To solve the problem, a QR decomposition of $A$ is therefore computed, and the right-hand side is transformed along with it. This leads to a form
$$ \min_{\alpha} \left\| \begin{pmatrix} R \\ 0 \end{pmatrix} \alpha - \begin{pmatrix} \tilde y_1 \\ \tilde y_2 \end{pmatrix} \right\|_2 $$

where $R$ is an upper triangular matrix. The solution is then obtained by solving the system of equations
$$ R\,\alpha = \tilde y_1. $$

The norm of the minimum then results from the remaining components $\tilde y_2$ of the transformed right-hand side, because the corresponding equations can never be fulfilled due to the zero rows.
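The QR approach can be sketched as follows in NumPy: compute the reduced QR decomposition, transform the right-hand side, and solve the triangular system. Matrix and data are illustrative.

```python
import numpy as np

A = np.column_stack([np.ones(5), np.arange(5.0)])   # illustrative design matrix
y = np.array([0.9, 2.1, 2.9, 4.2, 4.8])

Q, R = np.linalg.qr(A)                  # reduced QR decomposition, R upper triangular
alpha = np.linalg.solve(R, Q.T @ y)     # solve R alpha = Q^T y
print(alpha, np.linalg.norm(A @ alpha - y))   # solution and norm of the minimum
```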

In statistical regression analysis, with several given variables one speaks of multiple regression. The approach is also known as OLS (ordinary least squares), in contrast to GLS (generalized least squares), the generalized regression model, in which the residuals deviate from the distributional assumptions of uncorrelatedness and homoscedasticity. In multivariate regression, by contrast, many values are present for each observation, so that instead of a vector there is a matrix. Linear regression models have been investigated intensively in probability-based statistics. Particularly in econometrics, complex recursively defined linear structural equation systems are analyzed to model economic systems.

Problems with constraints

Frequently, additional information about the parameters is known, which is formulated through constraints, which then take the form of equations or inequalities. Equations appear, for example, when certain data points are to be interpolated exactly. Inequalities appear more frequently, generally in the form of intervals for individual parameters. In the introduction the spring constant was mentioned; it is always greater than zero and, for the concrete case under consideration, can always be bounded from above.

In the case of equality constraints, for a sensibly posed problem these can be used to transform the original minimization problem into one of lower dimension whose solution automatically satisfies the constraints.

The inequality case is more difficult. With linear inequalities, the problem
$$ \min_{\alpha} \| A\alpha - y \|_2 \quad \text{with} \quad l \le \alpha \le u $$

arises, where the inequalities are meant componentwise. As a convex optimization problem, this problem is uniquely solvable and can, for example, be tackled with methods for solving such problems.
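A sketch of such a componentwise-bounded problem using SciPy's lsq_linear; design matrix, data, and the bounds chosen here are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import lsq_linear

A = np.column_stack([np.ones(5), np.arange(5.0)])   # illustrative design matrix
y = np.array([0.9, 2.2, 2.8, 4.3, 4.7])

# Componentwise bounds, e.g. a slope known to lie between 0 and 2
result = lsq_linear(A, y, bounds=([-np.inf, 0.0], [np.inf, 2.0]))
print(result.x)
```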

Quadratic inequalities arise, for example, when a Tikhonov regularization is used for solving integral equations. Solvability is not always given here. The numerical solution can be carried out, for example, with special QR decompositions.

Nonlinear model functions

With the advent of powerful computers, nonlinear regression in particular is gaining in importance. Here the parameters enter the function non-linearly. Nonlinear modeling in principle allows data to be fitted to any equation of the form $y = f(x; \alpha)$. Since these equations define curves, the terms nonlinear regression and "curve fitting" are usually used synonymously.

Some nonlinear problems can be transformed into linear ones by suitable substitution and then solved as above. A multiplicative model, for example, can be converted into an additive system by taking logarithms. This approach is applied, among others, in growth theory.
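As a sketch, assuming for illustration a single-variable power model $y = \alpha_0 x^{\alpha_1}$ (the concrete model form of the original example is not reproduced here), taking logarithms reduces the fit to an ordinary linear least-squares problem:

```python
import numpy as np

# Assumed multiplicative model y = a0 * x**a1; in log form: log y = log a0 + a1 * log x
x = np.array([1.0, 2.0, 4.0, 8.0, 16.0])      # illustrative data
y = np.array([3.1, 4.9, 8.2, 12.8, 20.5])

a1, log_a0 = np.polyfit(np.log(x), np.log(y), 1)   # linear fit in the log-log plane
a0 = np.exp(log_a0)
print(a0, a1)
```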

In general, nonlinear model functions lead to a problem of the form
$$ \min_{\alpha} \sum_{i=1}^{n} \bigl(y_i - f(x_i; \alpha)\bigr)^2 $$

with a non-linear model function $f$. Partial differentiation then yields a system of normal equations that can no longer be solved analytically. A numerical solution can be obtained iteratively, for example with the Gauss–Newton method. That method, however, has the problem that its convergence is not guaranteed.
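A compact sketch of the Gauss–Newton iteration for such a problem; the model (an exponential decay), its Jacobian, and the data are illustrative, and in practice a step-size control would be added precisely because convergence is not guaranteed.

```python
import numpy as np

def gauss_newton(model, jacobian, alpha0, x, y, iterations=20):
    """Minimize sum_i (y_i - model(x_i; alpha))^2 with the Gauss-Newton iteration."""
    alpha = np.asarray(alpha0, dtype=float)
    for _ in range(iterations):
        r = y - model(x, alpha)                     # residual vector
        J = jacobian(x, alpha)                      # Jacobian of the model, shape (n, m)
        delta = np.linalg.solve(J.T @ J, J.T @ r)   # linearized normal equations
        alpha = alpha + delta
    return alpha

# Illustrative model: exponential decay f(x; a, b) = a * exp(-b x)
model = lambda x, a: a[0] * np.exp(-a[1] * x)
jacobian = lambda x, a: np.column_stack([np.exp(-a[1] * x),
                                         -a[0] * x * np.exp(-a[1] * x)])

x = np.linspace(0.0, 4.0, 20)
y = 2.0 * np.exp(-0.7 * x)                # synthetic, noise-free observations
print(gauss_newton(model, jacobian, [1.0, 0.5], x, y))
```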

Current programs often work with a variant of the Levenberg–Marquardt algorithm. In this method, convergence is likewise not guaranteed, but a regularization ensures the monotonicity of the sequence of approximations. Moreover, the method is more tolerant of larger deviations in the initial estimates than the original method. Both methods are related to Newton's method and usually converge quadratically, i.e. the number of correct decimal places doubles in each step.

If differentiation is too costly due to the complexity of the objective function, a number of other methods are available as alternatives that do not require derivatives; see the methods of local nonlinear optimization.

Example from enzyme kinetics of a model function that cannot be linearized

An example of regression models that are fully nonlinear is enzyme kinetics. Here it is required that only $y$ (the reaction rate) and not $x$ (the substrate concentration) is subject to error and can thus be used as a variable. The Lineweaver–Burk relation is indeed an algebraically correct transformation of the Michaelis–Menten equation $y = \frac{V_{\max}\, x}{K_m + x}$, but its application only delivers correct results if the measured values are free of error. This arises from the fact that reality can only be described with an extended Michaelis–Menten relation
$$ y = \frac{V_{\max}\, x}{K_m + x} + e $$

with $e$ as an error parameter. This equation can no longer be linearized, so the solution must be determined iteratively here.
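Such an iterative fit can be sketched with SciPy's curve_fit, which minimizes the sum of squared residuals (by default with a Levenberg–Marquardt method for unconstrained problems). The measurements below are synthetic, generated from assumed parameters, and only stand in for real laboratory data.

```python
import numpy as np
from scipy.optimize import curve_fit

def michaelis_menten(x, vmax, km):
    """Reaction rate as a function of the substrate concentration."""
    return vmax * x / (km + x)

# Synthetic measurements from assumed parameters vmax=3.0, km=2.5 plus noise
rng = np.random.default_rng(0)
x = np.array([0.5, 1.0, 2.0, 5.0, 10.0, 20.0])
y = michaelis_menten(x, 3.0, 2.5) + rng.normal(0.0, 0.05, size=x.size)

params, cov = curve_fit(michaelis_menten, x, y, p0=[1.0, 1.0])
print(params)     # estimates for vmax and km
```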

Misbehavior when the requirements are not fulfilled

Under certain conditions, the method of least squares makes it possible to calculate the most probable of all model parameters. For this, a correct model must have been chosen, a sufficient number of measured values must be available, and the deviations of the measured values from the model system must form a normal distribution. In practice, however, the method can also be used for various purposes when these conditions are not fulfilled. Nevertheless, it should be noted that under certain unfavorable conditions the method of least squares can deliver completely undesirable results. For example, no outliers should be present in the measured values, since they distort the estimation result. In addition, multicollinearity among the parameters to be estimated is unfavorable, since it causes numerical problems. Regressors that lie far away from the others can also strongly influence the results of the adjustment calculation. One speaks here of values with great leverage (high leverage value).

Multicollinearity

Multicollinearity arises when the series of measurements of two given variables $x_i$ and $x_j$ are very highly correlated, i.e. almost linearly dependent. In the linear case this means that the determinant of the normal-equation matrix is very small and, conversely, the norm of the inverse is very large, so the condition of the matrix is severely impaired. The normal equations are then difficult to solve numerically. The solution values can become implausibly large, and small changes in the observations cause large changes in the estimates.

Outliers

Outliers are defined as data values that "do not fit into a series of measurements". These values strongly influence the calculation of the parameters and distort the result. To avoid this, the data must be examined for faulty observations. The detected outliers can, for example, be eliminated from the series of measurements, or alternative outlier-resistant calculation methods such as weighted regression or the three-group method can be applied.

In the first case, after the first computation of the estimates, statistical tests are used to check whether outliers are present in individual measured values. These measured values are then removed and the estimates are calculated again. This approach is suitable when only a few outliers are present.

In weighted regression, the dependent variables are weighted as a function of their residuals. Outliers, i.e. observations with large residuals, receive a small weight, which can be graduated according to the size of the residual. In the algorithm of Mosteller and Tukey (1977), referred to as "biweighting", unproblematic values are weighted with 1 and outliers with 0, which causes the outliers to be suppressed. In weighted regression, several iterations are usually required until the set of detected outliers no longer changes.
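A sketch of such an iteratively reweighted fit for a straight line, using a Tukey-biweight-style weighting; the cutoff constant and the robust scale estimate are common choices rather than values prescribed by the text, and the data contain one artificial outlier.

```python
import numpy as np

def biweight(residuals, c=4.685):
    """Tukey biweight: weight near 1 for small residuals, 0 beyond c times a robust scale."""
    scale = np.median(np.abs(residuals)) / 0.6745 + 1e-12
    u = residuals / (c * scale)
    w = np.zeros_like(u)
    inside = np.abs(u) < 1.0
    w[inside] = (1.0 - u[inside] ** 2) ** 2
    return w

def robust_line_fit(x, y, iterations=10):
    """Iteratively reweighted least squares for y ≈ a0 + a1*x."""
    A = np.column_stack([np.ones_like(x), x])
    w = np.ones_like(y)
    for _ in range(iterations):
        W = np.diag(w)
        alpha = np.linalg.solve(A.T @ W @ A, A.T @ W @ y)   # weighted normal equations
        w = biweight(y - A @ alpha)                         # reweight by residual size
    return alpha

x = np.arange(10.0)
y = 1.0 + 2.0 * x
y[7] += 20.0                    # artificial outlier
print(robust_line_fit(x, y))    # close to the underlying line despite the outlier
```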

Generalized least squares models

Main article: regression analysis.

If the strong requirements on the observation errors in the procedure are weakened, one obtains so-called generalized least-squares approaches. Important special cases then have their own names, such as weighted least squares, in which the errors are still assumed to be uncorrelated, but no longer to have the same variance. This leads to a problem of the form
$$ \min_{\alpha} \| D (A\alpha - y) \|_2^2, $$

where $D$ is a diagonal matrix. If the variances vary greatly, the corresponding normal equations have a very large condition number, which is why the problem should be solved directly.
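A direct solution of the weighted problem (without forming the normal equations) can be sketched as follows; design matrix, observations, and standard deviations are illustrative.

```python
import numpy as np

A = np.column_stack([np.ones(5), np.arange(5.0)])    # illustrative design matrix
y = np.array([0.9, 2.2, 2.8, 4.3, 4.7])
sigma = np.array([0.1, 0.1, 0.5, 0.1, 0.1])          # assumed per-observation standard deviations

D = np.diag(1.0 / sigma)                             # weight matrix D = diag(1/sigma_i)
alpha, *_ = np.linalg.lstsq(D @ A, D @ y, rcond=None)   # solve the scaled problem directly
print(alpha)
```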

If one further assumes that errors are also to be taken into account in the measurement data entering the model function, the result is "total least squares" in the form
$$ \min_{E,\,r} \| (E, r) \|_F \quad \text{with} \quad (A + E)\,\alpha = y + r, $$

where $E$ is the error in the model and $r$ the error in the data.

Finally, there is also the possibility of not assuming a normal distribution. This corresponds, for example, to minimizing not in the Euclidean norm but in the sum norm. Such models are the subject of regression analysis.
