Nelder–Mead method

The downhill simplex method or Nelder–Mead method is, in contrast to its namesake for linear problems (the simplex algorithm), a method for optimizing nonlinear functions of several parameters. It falls into the category of hill-climbing or downhill search methods. It can be applied, for example, to curve fitting.

It was introduced by John Nelder and Roger Mead in 1965.

Basics

A special feature of this simplex algorithm is that it does not require derivatives of the function with respect to the parameters, which for some problems cannot be computed with reasonable effort. Instead, the trend of the function values and the direction toward the optimum are approximated, with step-size control, by comparing the function values at several points in parameter space, somewhat as in the regula falsi. The method does not converge particularly fast, but it is simple and relatively robust.

The algorithm is based on a simplex, the simplest volume in the N-dimensional parameter space, which is spanned by N+1 points: a line segment in one dimension (on the number line), a triangle in two dimensions, and so on. Each point corresponds to one (coordinate) set of parameters (or variables), and at each point a function value can be computed, which may be viewed, for example, as an error or cost value. Among these N+1 points one then determines the "worst" and the "best". From one iteration to the next, normally only the "worst" of these points is replaced by a new one that is hopefully "better". The "best" point always holds the best solution found so far.
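As an illustration (not part of the original description; identifiers are chosen freely here), such a simplex can be represented in C as N+1 points, each holding its parameter values and the associated function value, among which the worst and the best points are then sought:

#include <stddef.h>

#define N 2                      /* number of parameters (example value) */

typedef struct {
    double par[N];               /* one set of parameters (coordinates) */
    double f;                    /* function value at this point, e.g. an error or cost value */
} Point;

typedef struct {
    Point p[N + 1];              /* the N+1 points that span the simplex */
} Simplex;

/* Find the indices of the "worst" (largest f) and "best" (smallest f) points. */
static void find_worst_best(const Simplex *s, size_t *worst, size_t *best)
{
    *worst = *best = 0;
    for (size_t i = 1; i <= N; ++i) {
        if (s->p[i].f > s->p[*worst].f) *worst = i;
        if (s->p[i].f < s->p[*best].f)  *best  = i;
    }
}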

In a two-dimensional visualization with contour lines of the function to be optimized, one can see (in the pictures on the right) a triangle that approaches the optimum by flipping over one of its sides, stretching, or, in the vicinity of the optimum, contracting around it. See also the illustration at the bottom.

Because no derivatives are used, various traps such as pathological behavior of those derivatives at discontinuities are avoided; in exchange, the method generally converges more slowly (namely linearly) than optimization methods that use derivatives, but it is much more robust. The potential pitfalls of the simplex method are similar to those of most optimization methods: it may converge to an unexpected local secondary minimum (or optimum), or it may contract prematurely to a constant value in one or more parameters.

The algorithm itself

The downhill simplex method finds the optimum of a function of N variables (parameters). The method consists of three steps:

  • Construct an initial simplex of N+1 points in parameter space and compute the function value at each of its points.
  • Determine the worst and the best of these points from their function values and check a convergence criterion.
  • Form a new simplex by replacing the worst point, then continue with the second step.

The core of the method of Nelder and Mead is the strategy by which, in the third step, the new simplex is formed. Nelder and Mead adopted the following rules: Always reflect the worst point at the (centroid of the) remaining simplex (reflection, using the factor alpha).

a) If this point is better than all previous ones, take a further, even larger step (factor gamma) in the same direction (expansion) and compute the associated function value.

b) Whichever of the two, the reflected or the expanded point, is better then replaces the previously worst point (see the pseudocode below).

c) If the reflected point is not better than all previous ones, check whether it is at least better than the second-worst point of the simplex: if so, it replaces the worst point; if not, a contracted point between the previously worst point and the centroid is computed instead (factor beta), and should even that bring no improvement, the whole simplex is compressed around the best point found so far (see the pseudocode below).

The coordinate changes of the parameters during these steps are computed by means of the Nelder/Mead parameters alpha, beta and gamma, which are usually taken as 1 (reflection), 1/2 (contraction and compression) and 2 (expansion).
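With P_w denoting the worst point, P‾ the centroid of the remaining simplex and P_b the best point (notation chosen here only for illustration), these steps correspond to the formulas that also appear in the pseudocode below:

\begin{aligned}
P^{*}  &= (1+\alpha)\,\bar{P} - \alpha\,P_{w}    && \text{(reflection, } \alpha = 1\text{)} \\
P^{**} &= (1+\gamma)\,P^{*} - \gamma\,\bar{P}    && \text{(expansion, } \gamma = 2\text{)} \\
P^{**} &= \beta\,P_{w} + (1-\beta)\,\bar{P}      && \text{(contraction, } \beta = \tfrac{1}{2}\text{)} \\
P_{j}  &\leftarrow \tfrac{1}{2}\,(P_{j} + P_{b}) && \text{(compression around } P_{b}\text{)}
\end{aligned}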

Following these rules, the iteration is continued until a convergence criterion is met. The simplex thereby shifts toward the optimum and finally contracts around it.
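A minimal sketch of one possible convergence criterion, the one also used in the pseudocode below (identifiers are illustrative, not prescribed by the method): stop when every parameter differs between the currently best and currently worst point by at most a relative tolerance, e.g. 0.001.

#include <math.h>
#include <stdbool.h>

/* true once no parameter varies between best and worst point by more than tol (relative). */
static bool converged(const double *best, const double *worst, int n, double tol)
{
    for (int i = 0; i < n; ++i)
        if (fabs(worst[i] - best[i]) > tol * fabs(best[i]))
            return false;        /* this parameter still varies too much */
    return true;
}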

Extensions for emergencies

As with other methods, there is always the danger of converging to a secondary (local) optimum. This risk can be mitigated by several measures:

  • After a certain number of steps, a restart is made. For this, the best point found so far is kept and a new start simplex is built around it by varying, for each additional point, (only) the coordinate of one parameter, for example by 5 % (see the sketch after this list).
  • As before, except that the additional simplex points are varied even more flexibly, by random amounts.
  • Instead of taking such measures after a fixed number of steps, their timing can be made dependent on the current state of the iteration. For this, of course, the problem must be known well enough that one can be sure of such a decision.
  • Alternatively, the time of the restart itself can be varied by a random number.
  • An additional check could catch the case that only one of the parameters has collapsed prematurely; that parameter could then be given special treatment.
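A minimal sketch of the first of these measures, the restart around the best point found so far (identifiers are illustrative, and the 5 % step is only an example; a parameter whose best value is exactly 0 would need an absolute offset instead):

#define N 2   /* number of parameters (example value) */

/* Keep the best point found so far and build a new start simplex around it:
   point 0 is the best point itself; point i+1 differs from it only in
   parameter i, which is varied by 5 %. */
static void restart_simplex(double simplex[N + 1][N], const double best[N])
{
    for (int p = 0; p <= N; ++p)
        for (int i = 0; i < N; ++i)
            simplex[p][i] = best[i];     /* start every point at the best point */
    for (int i = 0; i < N; ++i)
        simplex[i + 1][i] *= 1.05;       /* vary one parameter per extra point by 5 % */
}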

All of these options require intensive engagement with the specific problem at hand; unfortunately there are no general recipes for this.

Extension to curve fitting

If the task is not to optimize a single analytic function of N parameters (coordinates) but to fit a theory function to a measured curve, only a small additional step is necessary: the function to be optimized is often taken to be the error sum of squares, i.e. the sum of the squared differences between the individual measured values (curve points) and the values of the theory function at the same positions, the latter depending on the x-coordinate of the curve and additionally on the N parameters. Whether one also takes the square root of the error sum of squares is irrelevant for the simplex method. See also the method of least squares.
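A minimal sketch of such an error sum of squares as the function to be minimized (identifiers and the function-pointer interface are illustrative, not prescribed by the method):

/* Error sum of squares for fitting a theory function ycalc(x, par) with the
   parameter set par to m measured points (x[k], y[k]). */
double sum_of_squares(const double x[], const double y[], int m,
                      const double par[],
                      double (*ycalc)(double x, const double par[]))
{
    double s = 0.0;
    for (int k = 0; k < m; ++k) {
        double d = ycalc(x[k], par) - y[k];   /* deviation at measurement point k */
        s += d * d;
    }
    return s;   /* taking sqrt(s) instead would not change the location of the minimum */
}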

However, the downhill simplex method is very flexible in this respect (unlike, say, the Levenberg–Marquardt algorithm), so depending on the situation other functions can be used for the optimization:

  • The data scatter strongly: instead of the sum of the squared deviations, minimize the sum of the absolute deviations. Outliers then affect the result less.
  • Curve fitting typically works with the assumption that the error of the dependent variable (Δy = ycalc − yobs) is independent of the associated independent variable (homoscedasticity). In some cases, however, Δy is a function of x. This situation occurs in particular when x and y extend over several orders of magnitude. If one were to minimize the error sum of squares in such cases, one would get a good fit at large x/y values but a very poor one at small values, which contribute hardly anything to the error sum of squares. In such cases one minimizes χ², i.e. the sum of the squared relative deviations Σ(Δy/y)². Both of these alternative objective functions are sketched after this list.
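Minimal sketches of these two alternative objective functions (identifiers are illustrative; the second variant assumes that no measured value y[k] is zero):

#include <math.h>

/* Sum of absolute deviations: less sensitive to outliers than the sum of squares. */
double sum_abs_dev(const double x[], const double y[], int m,
                   const double par[],
                   double (*ycalc)(double x, const double par[]))
{
    double s = 0.0;
    for (int k = 0; k < m; ++k)
        s += fabs(ycalc(x[k], par) - y[k]);   /* |Δy| instead of Δy² */
    return s;
}

/* Sum of squared relative deviations, for data spanning several orders of magnitude. */
double sum_rel_sq_dev(const double x[], const double y[], int m,
                      const double par[],
                      double (*ycalc)(double x, const double par[]))
{
    double s = 0.0;
    for (int k = 0; k < m; ++k) {
        double r = (ycalc(x[k], par) - y[k]) / y[k];   /* relative deviation Δy/y */
        s += r * r;
    }
    return s;
}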

Sample implementation

Pseudocode (C-like):

// Declarations
// user specifications, to be entered:
int NP = ...;             // number of parameters that enter FQS(),
                          // also the "dimension of the parameter space"
double parp[NP] = {...};  // primary parameter set
double pars[NP] = {...};  // secondary parameter set, provides the initial variation
double FQS(int pf)        // function to be minimized; depends on the
   {                      // parameters par0[pf][...];
   ...                    // must be written for the specific problem,
   }                      // with pf as the number of the parameter set to use;
                          // the result is stored at the end in par0[pf] (see below)
// end of user specifications

// Initialization
// simplex parameters
double NA = 1, NB = .5, NC = 2;   // alpha, beta, gamma
// parameter sets: NP+1 of them for the simplex itself, plus 3 more
double par0[NP+4][NP+1];
// first index:  0..NP   the (NP+1) parameter sets of the simplex;
//                       position 0 always holds the worst point of the simplex,
//                       position 1 always holds the best point of the simplex
//               NP+1    point P*, see below
//               NP+2    point P**, see below
//               NP+3    centroid P‾ of the simplex, see below
// second index: 0       error value of this set
//               1..NP   the individual parameter values of this set

// initialize the simplex from the primary and secondary values
assign the primary values to parameter sets 0 to NP;
in parameter sets 1 to NP (i), assign the secondary value to parameter no. i;
// compute the initial error sums
for parameter sets 0 to NP:  feh = FQS(i);   // stores the result in par0[i], see above

// loop for the iteration, the main loop
int NT = 0;                // step counter
bool done = false;
while (!done)
   NT++;                   // simplex step no. NT

   // determine the worst and best point (highest resp. lowest error value)
   preassign fmin and fmax with the error of set no. 0;
   for all parameter sets from 1 to NP:
      if the error of set no. i is greater than fmax:
         new worst point; set fmax to this value and swap the set with point no. 0
      if the error of set no. i is smaller than fmin:
         new best point; set fmin to this value and swap the set with point no. 1

   // convergence test: are we finished?
   // this, too, must be written individually for the specific problem;
   // as an example, the condition that all parameters vary by no more than 1 ‰:
   done = true;
   for all parameters of the sets 0 to NP:
      if for even one parameter the relative deviation of its value
         between the worst and the best point is greater than 0.001:
            done = false;   // not finished yet
   // once done, finish with the best point par0[1][...] as the result of the algorithm

   // centroid P‾ of the simplex, at index NP+3
   compute the averages over sets 1 to NP (i.e. without the worst point no. 0!)
      and store them as set no. NP+3;

   // point P* by reflection of the worst point at the centroid P‾
   // (with Nelder/Mead parameter alpha = NA), at index NP+1
   within a parameter set:
      par0[NP+1][i] = (1 + NA)*par0[NP+3][i] - NA*par0[0][i];
   fs = par0[NP+1][0] = FQS(NP+1);   // and compute the error

   // case distinction: was an improvement achieved?
   if (fs < fmin)                    // new point P* compared with the best so far
      // yes, better!
      // further step to point P** in the same direction, at index NP+2
      // (with Nelder/Mead parameter gamma = NC)
      within a parameter set:
         par0[NP+2][i] = (1 + NC)*par0[NP+1][i] - NC*par0[NP+3][i];
      fs = par0[NP+2][0] = FQS(NP+2);   // and compute the error

      if (fs >= fmin)                // again compared with the best point so far
         // no further improvement: discard P**, i.e. replace the worst point
         // at index 0 with the new (slightly better) value P*
         copy parameter set no. NP+1 to no. 0;
      else                           // fs < fmin
         // yes, another improvement!
         // even better than P*?
         if (fs > par0[NP+1][0])
            // no, keep P* and replace the worst point so far at index 0
            copy parameter set no. NP+1 to no. 0;
         else
            // yes, keep P** and replace the worst point so far at index 0
            copy parameter set no. NP+2 to no. 0;
   else                              // of (fs < fmin)
      // not better than the best so far; now check whether P* is at least some improvement
      if fs of P* is smaller than the error of at least one of the points 0..NP:
         // yes, better than at least one; replace the worst point so far at index 0 with P*
         copy parameter set no. NP+1 to no. 0;
      else                           // fs of P*
         // no, still worse than the other points
         // additional test: even worse than fmax, the worst point so far?
         if (fs > fmax)              // worse than the worst so far?
            // yes, do nothing for now
         else
            // no, so replace the worst point so far with it
            copy parameter set no. NP+1 to no. 0;
         // new point P** by contraction between the worst point so far (index 0)
         // and the centroid P‾ (index NP+3), at index NP+2 (with Nelder/Mead parameter beta = NB)
         within a parameter set:
            par0[NP+2][i] = NB*par0[0][i] + (1 - NB)*par0[NP+3][i];
         fs = par0[NP+2][0] = FQS(NP+2);   // and compute the error
         // is P** better than the worst point so far?
         if (fs < fmax)              // better than the worst so far?
            // yes, replace the worst point so far with P**
            copy parameter set no. NP+2 to no. 0;
         else                        // fs >= fmax
            // no, no improvement: emergency measure
            // general compression by a factor of 2 toward the best point so far
            for all parameter sets except no. 1 (the best so far):   // i.e. j = 0 and 2..NP
               within parameter set j:
                  par0[j][i] = (par0[j][i] + par0[1][i]) / 2;
               fs = par0[j][0] = FQS(j);   // and compute the error
// end of the loop

A directly executable implementation in QBasic can be found below.

Graphical representation of an example run

The picture shows the result of a program run. The red and blue lines represent the gradient lines for the valleys and peaks, respectively. The function is a circular trench in an inclined plane; of interest here is the local minimum inside the circular trench. For the simplex iteration, the minimum lies at the intersection of the red lines. In the x and y directions the graphic ranges from 0 to 4.
