Hirschberg's algorithm

The Hirschberg algorithm computes a pairwise sequence alignment with memory requirements that are linear in the length of the input. Developed in the 1970s by Dan Hirschberg, the algorithm uses dynamic programming and the divide-and-conquer principle.

General

The Hirschberg algorithm is a general-purpose, optimal algorithm for finding a sequence alignment. The well-known BLAST and FASTA algorithms, by contrast, are only suboptimal heuristics. Compared with the Needleman-Wunsch algorithm, the Hirschberg algorithm is less a completely new algorithm than a clever strategy that uses the Needleman-Wunsch algorithm in a way that makes memory consumption linear. What is distinctive about it is that computing a sequence alignment needs only linear space, so its space complexity is O(n). For the calculation of an alignment between two strings x and y of lengths m and n, the algorithm has a running time of O(mn) and a memory consumption of O(min(m, n)).
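A short estimate shows why the divide-and-conquer strategy does not worsen the asymptotic running time. The following is a sketch under stated assumptions, not a full proof: the strings have lengths m and n (symbols chosen here for illustration), T(m, n) counts the matrix cells that are computed, the algorithm halves one string and splits the other at some position k, and rounding and base cases are ignored. One forward and one backward pass together touch about mn cells, and the two subproblems together cover only half of that area:

```latex
T(m, n) \;\le\; mn + T\!\left(\tfrac{m}{2}, k\right) + T\!\left(\tfrac{m}{2}, n - k\right)
\quad\Longrightarrow\quad
T(m, n) \;\le\; mn \sum_{d \ge 0} 2^{-d} \;=\; 2mn
```

Under these assumptions the total work is at most about twice that of a single full pass over the matrix.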

The algorithm is applied, for example, in bioinformatics to align different DNA or protein sequences.

In a slightly modified form, Hirschberg's algorithm is also used to compute the connected components of a graph on parallel processors.

Calculating the Levenshtein distance in linear memory

To understand the Hirschberg algorithm, it is first important to understand that the Levenshtein distance can be calculated in linear memory space:

01 A[0] := 0
02 for j in 1 .. n loop
03    A[j] := A[j-1] + 1
04 end loop
05 for i in 1 .. m loop
06    s := A[0]
07    A[0] := A[0] + 1
08    c := A[0]
09    for j in 1 .. n loop
10       c := min(s + diff(x[i], y[j]), A[j] + 1, c + 1)
11       s := A[j]
12       A[j] := c
13    end loop
14 end loop

Here x and y are the two strings, of lengths m and n, and diff(a, b) is 0 if a = b and 1 otherwise. In lines 1-4, the one-dimensional array A with n+1 elements is initialized. At line 6, the initial value of the first element is saved in s. Thereafter, A[0] and c are initialized with the starting value for the next row. The following figure shows a snapshot of a row pass. In the inner loop, c always holds the most recently calculated result, while s saves the result from the previous row that is still required. After line 14, the Levenshtein distance is available as the result in A[n].
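The single-array computation can also be written as a short runnable sketch. The version below is a minimal illustration in Python, assuming unit costs for insertion, deletion and substitution; the function name `levenshtein_linear` and the variable names are chosen here for illustration and do not come from the original pseudocode.

```python
def levenshtein_linear(x: str, y: str) -> int:
    """Levenshtein distance of x and y using one array of len(y) + 1 cells."""
    n = len(y)
    # Row 0 of the imaginary matrix: distance from the empty prefix of x to y[:j].
    a = list(range(n + 1))
    for i in range(1, len(x) + 1):
        s = a[0]        # value diagonally up-left of the next cell to compute
        a[0] += 1       # distance from x[:i] to the empty prefix of y
        c = a[0]        # most recently computed value in the current row
        for j in range(1, n + 1):
            substitution = s + (x[i - 1] != y[j - 1])
            deletion = a[j] + 1     # a[j] still holds the previous row here
            insertion = c + 1
            c = min(substitution, deletion, insertion)
            s = a[j]    # save the previous row's value before overwriting it
            a[j] = c
    return a[n]


if __name__ == "__main__":
    print(levenshtein_linear("kitten", "sitting"))
```

Only the array `a` and the two scalars `s` and `c` are kept between iterations, so the memory use grows with the length of `y` alone.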

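[Figure: snapshot of a row pass. The first row of the imaginary matrix, for the empty prefix ε, holds 0 1 2 3 ...; at the start of the next row, s = 0 and c = 1.]

It should be clear that these calculations can also be performed in reverse. The imaginary matrix is then traversed not from top left to bottom right, but from bottom right to top left: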

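01 A[n] := 0
02 for j in reverse 0 .. n-1 loop
03    A[j] := A[j+1] + 1
04 end loop
05 for i in reverse 0 .. m-1 loop
06    s := A[n]
07    A[n] := A[n] + 1
08    c := A[n]
09    for j in reverse 0 .. n-1 loop
10       c := min(s + diff(x[i+1], y[j+1]), A[j] + 1, c + 1)
11       s := A[j]
12       A[j] := c
13    end loop
14 end loop

The array A, the strings x (length m) and y (length n), and diff(a, b), which is 0 if a = b and 1 otherwise, have the same meaning as in the forward pass.

Calculating the alignment in linear space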

Hirschberg's divide-and-conquer algorithm calculates an alignment of the strings x and y by combining a forward and a backward pass (line numbers refer to the accompanying pseudocode):

1. If x or y consists of at most one character, the alignment problem is trivial (lines 14-22): a string consisting of only one character must be aligned to another string, and that alignment is returned. Otherwise, proceed to step 2.

2. A forward pass calculates an alignment of y and the first half of x (lines 27-40). The result of the forward pass is an array whose elements give the cost of a path from the top-left corner of the alignment matrix to position j of the middle row (with 0 ≤ j ≤ n, where n is the length of y).

3. A backward pass calculates an alignment of y with the second half of x (lines 42-55). The result is another array whose elements give the cost of a path from position j of the middle row back to the bottom-right corner.

4. The last element of the forward array and the first element of the backward array hold the two Levenshtein distances for global alignments of y and the respective halves of x. The remaining elements of the forward array hold the distances from the first half of x to all prefixes of y. Accordingly, the remaining elements of the backward array hold the distances from the second half of x to all suffixes of y.

5. The Levenshtein distance from x to y can now be calculated by running along the middle row of the alignment matrix and searching for the smallest sum of corresponding forward and backward elements. Once such a minimum sum is found, one has found a position in the middle row at which the optimal alignment crosses the middle row, that is, the middle of x. At this position y is split into two parts, and thereby the alignment problem is also broken into two smaller alignment problems (lines 59-65).

6. Step 1 is called recursively on the two parts of x and y. The two recursive calls return partial alignments, which are concatenated into a single alignment. This alignment is returned (lines 68 and 69).
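The steps above can be sketched in runnable form. The following Python version is a minimal illustration, not the pseudocode the step descriptions cite, so its lines do not correspond to the line numbers given in the steps. It assumes unit costs, uses "-" as the gap symbol (so the inputs must not contain "-"), and all function names (`forward_row`, `backward_row`, `hirschberg`, `alignment_cost`) are chosen here for illustration.

```python
def forward_row(x: str, y: str) -> list[int]:
    """a[j] = Levenshtein distance between all of x and the prefix y[:j]."""
    n = len(y)
    a = list(range(n + 1))
    for i in range(len(x)):
        s = a[0]
        a[0] += 1
        c = a[0]
        for j in range(1, n + 1):
            c = min(s + (x[i] != y[j - 1]), a[j] + 1, c + 1)
            s, a[j] = a[j], c   # keep the old a[j] as the next diagonal value
    return a


def backward_row(x: str, y: str) -> list[int]:
    """a[j] = Levenshtein distance between all of x and the suffix y[j:]."""
    n = len(y)
    a = list(range(n, -1, -1))  # a[j] = n - j for the empty suffix of x
    for i in range(len(x) - 1, -1, -1):
        s = a[n]
        a[n] += 1
        c = a[n]
        for j in range(n - 1, -1, -1):
            c = min(s + (x[i] != y[j]), a[j] + 1, c + 1)
            s, a[j] = a[j], c
    return a


def hirschberg(x: str, y: str) -> tuple[str, str]:
    """Return an optimal alignment of x and y as two gapped strings."""
    m, n = len(x), len(y)
    # Step 1: trivial cases with at most one character in x or y.
    if m == 0:
        return "-" * n, y
    if n == 0:
        return x, "-" * m
    if m == 1:
        k = max(y.find(x), 0)   # align with a matching character if one exists
        return "-" * k + x + "-" * (n - 1 - k), y
    if n == 1:
        k = max(x.find(y), 0)
        return x, "-" * k + y + "-" * (m - 1 - k)
    # Steps 2 and 3: forward pass on the first half, backward on the second.
    mid = m // 2
    fwd = forward_row(x[:mid], y)
    bwd = backward_row(x[mid:], y)
    # Step 5: position in the middle row with the smallest combined cost.
    split = min(range(n + 1), key=lambda j: fwd[j] + bwd[j])
    # Step 6: solve both halves recursively and concatenate the results.
    left_x, left_y = hirschberg(x[:mid], y[:split])
    right_x, right_y = hirschberg(x[mid:], y[split:])
    return left_x + right_x, left_y + right_y


def alignment_cost(ax: str, ay: str) -> int:
    """Unit cost of an alignment: columns whose two symbols differ."""
    return sum(p != q for p, q in zip(ax, ay))


if __name__ == "__main__":
    ax, ay = hirschberg("AGTACGCA", "TATGC")
    print(ax)
    print(ay)
    print(alignment_cost(ax, ay))
```

For brevity the sketch slices the strings, which copies substrings; an index-based variant would avoid those copies while following the same steps. The cost of the returned alignment can be checked against the last element of `forward_row(x, y)`, which is the Levenshtein distance of the full strings.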

  • D. S. Hirschberg: A linear space algorithm for computing maximal common subsequences. In: Communications of the ACM. 18, No. 6, 1975, pp. 341-343.
  • K.-M. Chao, R. C. Hardison, W. Miller: Recent developments in linear-space alignment methods: a survey. In: Journal of Computational Biology. 1, No. 4, 1994, pp. 271-291.