Conjugate gradient method

The CG method (from English: conjugate gradients, also conjugate gradient method) is an efficient numerical method for solving large linear systems of equations of the form $Ax = b$ with a symmetric, positive definite system matrix $A$.

In exact arithmetic the method delivers the exact solution after at most $n$ steps, where $n$ is the size of the square matrix. It is, however, of particular interest as an iterative method, since the error decreases monotonically. The CG method belongs to the class of Krylov subspace methods.

It was first proposed in 1952 by Eduard Stiefel and Magnus Hestenes. An equivalent method for certain systems of equations was also suggested by Cornelius Lanczos in the early 1950s with the Lanczos method.

Idea of the CG method

The idea of the CG method is that minimizing the quadratic form

$$E(x) = \tfrac{1}{2}\,\langle x, Ax \rangle - \langle b, x \rangle$$

is equivalent to solving $Ax = b$. Here $\langle \cdot, \cdot \rangle$ denotes the standard scalar product.

The gradient of $E$ at the point $x$ is $\nabla E(x) = Ax - b$ and can therefore be computed quickly and accurately even for large, sparse matrices. The idea of the CG method is now, instead of minimizing in the direction of the residual $r = b - Ax$ as in the gradient method, to minimize the function over a subspace along different directions $d_k$. The directions $d_k$ are all $A$-conjugate, that is,

$$\langle d_i, A d_j \rangle = 0 \quad \text{for } i \neq j.$$

The iterates $x_k$ of the CG method are then chosen so that they form the minimum of $E$ in the affine space spanned by the vectors $d_1, \dots, d_k$ and shifted by $x_0$:

$$x_k = \operatorname*{argmin}_{x \in x_0 + V_k} E(x), \qquad V_k = \operatorname{span}\{d_1, \dots, d_k\}.$$

It can be shown that also

$$V_k = \operatorname{span}\{d_1, \dots, d_k\} = \operatorname{span}\{r_0, A r_0, \dots, A^{k-1} r_0\}.$$

The last equality shows that the search directions span the Krylov subspace of $A$ and $r_0$. The CG method can therefore alternatively be defined directly as a Krylov subspace method.

Since the vectors $d_k$ are all $A$-conjugate, the dimension of $V_k$ is exactly $k$. So if $A$ is an $n \times n$ matrix, the procedure terminates after at most $n$ steps when computed exactly. Numerical errors can be eliminated by further iterations. To monitor them, one considers the gradient $\nabla E(x_k) = A x_k - b$, whose negative is the residual $r_k = b - A x_k$. If the norm of this residual falls below a certain threshold, the process is stopped.

The method thus successively builds an $A$-orthogonal basis of $\mathbb{R}^n$ and minimizes as well as possible in each corresponding direction.

The problem with such an iterative process is finding the optimal step size. Determining the quality of a point requires a full matrix-vector multiplication, which incidentally also yields a new gradient. If the step size along a given gradient were determined only approximately, the corresponding method would rather resemble a simple hill-climbing algorithm.
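For the quadratic form $E$ defined above, this step size can in fact be written in closed form; the following short derivation (added here for illustration, in the notation of the algorithm below) sets the directional derivative to zero:

$$\frac{\mathrm{d}}{\mathrm{d}\alpha}\, E(x_k + \alpha d_k) = \langle d_k,\, A(x_k + \alpha d_k) - b \rangle = \alpha\, \langle d_k, A d_k \rangle - \langle d_k, r_k \rangle \overset{!}{=} 0 \quad \Longrightarrow \quad \alpha_k = \frac{\langle d_k, r_k \rangle}{\langle d_k, A d_k \rangle}.$$

Because the search direction $d_k$ is built from the residuals and $r_k$ is orthogonal to the previous direction, $\langle d_k, r_k \rangle = \langle r_k, r_k \rangle$, which is exactly the step size used in the algorithm below.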

CG method without preconditioning

First, choose an arbitrary $x_0 \in \mathbb{R}^n$ and compute:

$$r_0 = b - A x_0, \qquad d_0 = r_0.$$

For $k = 0, 1, 2, \dots$ one sets:

  • $z = A d_k$ (store the matrix-vector product in order to compute it only once)
  • $\alpha_k = \dfrac{r_k^T r_k}{d_k^T z}, \quad x_{k+1} = x_k + \alpha_k d_k, \quad r_{k+1} = r_k - \alpha_k z$ (find in the direction $d_k$ the location $x_{k+1}$ of the minimum of the function $E$ and update the gradient, i.e. the residual)
  • $\beta_k = \dfrac{r_{k+1}^T r_{k+1}}{r_k^T r_k}, \quad d_{k+1} = r_{k+1} + \beta_k d_k$ (correct the search direction $d_{k+1}$ by means of $r_{k+1}$ and $\beta_k$)

until the residual norm is smaller than a tolerance ($\| r_{k+1} \| < \varepsilon$).
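The following minimal NumPy sketch illustrates the iteration listed above. It is a sketch only: the function name cg as well as the tolerance and iteration-limit defaults are choices made here, not part of the original description.

```python
import numpy as np

def cg(A, b, x0=None, tol=1e-8, max_iter=None):
    """Plain conjugate gradient sketch for A x = b with A symmetric positive definite."""
    n = b.shape[0]
    x = np.zeros(n) if x0 is None else np.array(x0, dtype=float)
    max_iter = n if max_iter is None else max_iter

    r = b - A @ x                   # residual r_0 = b - A x_0
    d = r.copy()                    # first search direction d_0 = r_0
    rs_old = r @ r

    for _ in range(max_iter):
        if np.sqrt(rs_old) < tol:   # stop once ||r_k|| falls below the tolerance
            break
        z = A @ d                   # store the matrix-vector product (one per step)
        alpha = rs_old / (d @ z)    # exact line minimum of E along d_k
        x = x + alpha * d           # x_{k+1} = x_k + alpha_k d_k
        r = r - alpha * z           # r_{k+1} = r_k - alpha_k A d_k
        rs_new = r @ r
        beta = rs_new / rs_old      # beta_k = (r_{k+1}^T r_{k+1}) / (r_k^T r_k)
        d = r + beta * d            # d_{k+1} = r_{k+1} + beta_k d_k
        rs_old = rs_new
    return x

# Usage example on a small symmetric positive definite matrix
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(cg(A, b))   # approximates the solution of A x = b
```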

Variants

Different variants of the method exist: besides the first one by Roger Fletcher and Colin Reeves, for example those of Hestenes and Stiefel, of Davidon, Fletcher and Powell, or of Polak and Ribière. For quadratic forms (as defined above) these variants are identical, since the additional terms vanish due to the orthogonality of the residuals. If, however, the CG method is used to minimize a function that is only approximated by a quadratic form, these variants often show better convergence behavior than the original formulation of Fletcher and Reeves.

  • $\beta_k = \dfrac{r_{k+1}^T r_{k+1}}{r_k^T r_k}$ (Fletcher-Reeves)
  • $\beta_k = \dfrac{r_{k+1}^T (r_{k+1} - r_k)}{r_k^T r_k}$ (Polak-Ribière)
  • $\beta_k = \dfrac{r_{k+1}^T (r_{k+1} - r_k)}{d_k^T (r_k - r_{k+1})}$ (Hestenes-Stiefel)
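As an illustration, the three formulas can be written as small helper functions. The function names and argument order are choices made here; r_old and r_new stand for $r_k$ and $r_{k+1}$, and d for $d_k$.

```python
import numpy as np

def beta_fletcher_reeves(r_new, r_old, d=None):
    # beta_k = (r_{k+1}^T r_{k+1}) / (r_k^T r_k)
    return (r_new @ r_new) / (r_old @ r_old)

def beta_polak_ribiere(r_new, r_old, d=None):
    # beta_k = (r_{k+1}^T (r_{k+1} - r_k)) / (r_k^T r_k)
    return (r_new @ (r_new - r_old)) / (r_old @ r_old)

def beta_hestenes_stiefel(r_new, r_old, d):
    # beta_k = (r_{k+1}^T (r_{k+1} - r_k)) / (d_k^T (r_k - r_{k+1}))
    return (r_new @ (r_new - r_old)) / (d @ (r_old - r_new))
```

For an exactly quadratic objective the residuals are mutually orthogonal, so $r_{k+1}^T r_k = 0$ and all three expressions coincide.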

CG method with symmetric preconditioning (PCG method)

The convergence of the CG method is guaranteed only for symmetric, positive definite matrices; a preconditioner must take this into account. In symmetric preconditioning, the system of equations is transformed with the help of a preconditioner $P^{-1} = K K^T \approx A^{-1}$ into

$$K^T A K\, y = K^T b \qquad \text{with } x = K y,$$

and the CG method is applied to this transformed system.

The matrix $K^T A K$ is symmetric, since $A$ is symmetric. It is also positive definite, since by Sylvester's law of inertia $A$ and $K^T A K$ have the same numbers of positive and negative eigenvalues.

The resulting method is called the PCG method (from English: preconditioned conjugate gradient):

First, choose an arbitrary $x_0 \in \mathbb{R}^n$ and compute:

$$r_0 = b - A x_0, \qquad h_0 = P^{-1} r_0, \qquad d_0 = h_0.$$

For $k = 0, 1, 2, \dots$ one sets:

  • $z = A d_k$ (store the matrix-vector product in order to compute it only once)
  • $\alpha_k = \dfrac{r_k^T h_k}{d_k^T z}, \quad x_{k+1} = x_k + \alpha_k d_k, \quad r_{k+1} = r_k - \alpha_k z, \quad h_{k+1} = P^{-1} r_{k+1}$ (find in the direction $d_k$ the minimum and update the gradient and the preconditioned gradient)
  • $\beta_k = \dfrac{r_{k+1}^T h_{k+1}}{r_k^T h_k}, \quad d_{k+1} = h_{k+1} + \beta_k d_k$ (correct the search direction)

until the residual norm is smaller than a tolerance ($\| r_{k+1} \| < \varepsilon$).
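A minimal NumPy sketch of this preconditioned iteration follows. The callable apply_Pinv, which applies $P^{-1}$ to a vector, as well as the function name and defaults are assumptions of this sketch.

```python
import numpy as np

def pcg(A, b, apply_Pinv, x0=None, tol=1e-8, max_iter=None):
    """Preconditioned CG sketch: apply_Pinv(r) must return P^{-1} r for an SPD preconditioner P."""
    n = b.shape[0]
    x = np.zeros(n) if x0 is None else np.array(x0, dtype=float)
    max_iter = n if max_iter is None else max_iter

    r = b - A @ x                     # residual r_0
    h = apply_Pinv(r)                 # preconditioned residual h_0 = P^{-1} r_0
    d = h.copy()                      # first search direction d_0 = h_0
    rh_old = r @ h

    for _ in range(max_iter):
        if np.linalg.norm(r) < tol:   # stop once ||r_k|| falls below the tolerance
            break
        z = A @ d                     # store the matrix-vector product (one per step)
        alpha = rh_old / (d @ z)      # alpha_k = (r_k^T h_k) / (d_k^T A d_k)
        x = x + alpha * d
        r = r - alpha * z             # update the residual (gradient)
        h = apply_Pinv(r)             # update the preconditioned gradient
        rh_new = r @ h
        beta = rh_new / rh_old        # beta_k = (r_{k+1}^T h_{k+1}) / (r_k^T h_k)
        d = h + beta * d              # correct the search direction
        rh_old = rh_new
    return x
```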

A common preconditioner in the context of CG is the incomplete Cholesky decomposition. This combination is also referred to as ICCG and was introduced in the 1970s by van der Vorst and Meijerink.

Two further preconditioners admissible for the PCG method are the Jacobi preconditioner $P = D$, the main diagonal of $A$, and the SSOR preconditioner

$$P = \left( \frac{D}{\omega} + L \right) \frac{\omega}{2 - \omega}\, D^{-1} \left( \frac{D}{\omega} + L \right)^T,$$

where $D$ is the main diagonal and $L$ the strict lower triangular part of $A$, and $0 < \omega < 2$ is a relaxation parameter.
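For example, the Jacobi preconditioner only needs the main diagonal of $A$. A small sketch that produces an apply_Pinv callable for the pcg sketch above (the names are illustrative):

```python
import numpy as np

def jacobi_apply_Pinv(A):
    """Return a function r -> D^{-1} r, where D = diag(A) is the Jacobi preconditioner P."""
    D = np.diag(A).copy()
    return lambda r: r / D

# usage together with the pcg sketch above:
# x = pcg(A, b, jacobi_apply_Pinv(A))
```

An SSOR preconditioner can be applied analogously; since $P$ then factors into triangular matrices, applying $P^{-1}$ amounts to a forward and a backward triangular solve instead of a division by the diagonal.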

Convergence rate of the CG method

It can be shown that the convergence speed of the CG method is described by

$$\| x_k - x \|_A \le 2 \left( \frac{\sqrt{\kappa(A)} - 1}{\sqrt{\kappa(A)} + 1} \right)^{k} \| x_0 - x \|_A.$$

Here $\kappa(A)$ is the condition number of the matrix $A$ with respect to the spectral norm, i.e. the matrix norm induced by the Euclidean norm, and $\| y \|_A = \sqrt{y^T A y}$ is the energy norm of $A$. The factor $\frac{\sqrt{\kappa(A)} - 1}{\sqrt{\kappa(A)} + 1}$ is non-negative, since the condition number of a matrix (with respect to a matrix norm induced by a vector norm) is always greater than or equal to 1. Since $A$ is symmetric and positive definite,

$$\kappa(A) = \frac{\lambda_{\max}(A)}{\lambda_{\min}(A)}.$$
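As a rough, purely illustrative reading of this bound (the numbers are only an example): for $\kappa(A) = 100$ the reduction factor per step is

$$\frac{\sqrt{\kappa(A)} - 1}{\sqrt{\kappa(A)} + 1} = \frac{10 - 1}{10 + 1} = \frac{9}{11} \approx 0.82,$$

so, ignoring the constant factor 2, the bound predicts an error reduction by a factor of 10 roughly every $\ln 10 / \ln(11/9) \approx 11.5$ iterations in the energy norm.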

From the minimization property one can also derive that

$$\| x_k - x \|_A \le \min_{\substack{p \in \Pi_k \\ p(0) = 1}} \; \max_{\lambda \in \sigma(A)} |p(\lambda)| \; \| x_0 - x \|_A,$$

where $\Pi_k$ denotes the polynomials of degree $k$ and $x$ is the solution. Here $\sigma(A)$ is the spectrum, i.e. the set of eigenvalues of the matrix $A$. It follows that CG solves a system with a matrix having only $k$ distinct eigenvalues in $k$ steps, and that CG converges rapidly for systems whose eigenvalues are concentrated in a few small clusters. This in turn gives an indication of what makes a meaningful preconditioner: a preconditioner is good if it ensures that the eigenvalues become concentrated.
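This finite-termination behaviour can be observed numerically. The following sketch uses scipy.sparse.linalg.cg on an illustrative diagonal matrix with three distinct eigenvalues and simply counts the iterations via the callback:

```python
import numpy as np
from scipy.sparse.linalg import cg

# Diagonal test matrix with only the three distinct eigenvalues 1, 2 and 5
A = np.diag([1.0, 1.0, 2.0, 2.0, 5.0, 5.0, 5.0])
b = np.ones(7)

iterations = []
x, info = cg(A, b, callback=lambda xk: iterations.append(1))

# In exact arithmetic CG terminates after 3 steps for this matrix;
# in floating point the iteration count should be close to 3.
print(len(iterations), np.linalg.norm(A @ x - b))
```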

Extension to unsymmetric matrices

If the system matrix $A$ is unsymmetric but regular, the CG method can be applied to the normal equations

$$A^T A\, x = A^T b,$$

since $A^T A$ is symmetric and positive definite for a regular matrix $A$. This procedure is also called CGNR, since in this approach the norm of the residual $b - A x$ is minimized. Alternatively, there is the CGNE method, which solves

$$A A^T y = b, \qquad x = A^T y.$$

Here the error is minimized.

Both methods have the disadvantage that, on the one hand, $A^T$ must be available, which is not always the case, and on the other hand the condition number of $A$ is squared by this approach, which can lead to slow convergence.
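A minimal sketch of the CGNR idea, here simply forming $A^T A$ explicitly and handing it to a CG routine (scipy.sparse.linalg.cg is used so the block stands alone; for large problems one would instead apply $A$ and $A^T$ as operators, precisely because of the squared condition number mentioned above):

```python
import numpy as np
from scipy.sparse.linalg import cg

def cgnr(A, b):
    """CGNR sketch: apply CG to the normal equations A^T A x = A^T b."""
    AtA = A.T @ A          # symmetric positive definite for regular A
    Atb = A.T @ b
    x, info = cg(AtA, Atb)
    return x

# Illustrative usage on a small unsymmetric but regular matrix
A = np.array([[3.0, 1.0], [0.0, 2.0]])
b = np.array([1.0, 4.0])
print(cgnr(A, b))   # approximates the solution of A x = b
```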
