Consensus sequence

A consensus sequence is the sequence of nucleotides or amino acids that, in total, deviates least from a given set of corresponding pattern sequences. The exact form of this sequence can vary depending on the choice of distance measure, such as Hamming or Levenshtein distance.
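The idea of deviating least "in total" can be illustrated with a minimal sketch that sums the Hamming distance from a candidate to every pattern sequence. The function names `hamming` and `total_distance` and the example sequences are assumptions made for illustration; they are not taken from the article.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def total_distance(candidate: str, patterns: list[str]) -> int:
    """Sum of Hamming distances from the candidate to every pattern sequence."""
    return sum(hamming(candidate, p) for p in patterns)


# Hypothetical pattern sequences, chosen only for illustration.
patterns = ["ACGT", "ACGA", "TCGT"]

print(total_distance("ACGT", patterns))  # 2
print(total_distance("TCGA", patterns))  # 4
```

Under this measure, the candidate with the smaller total ("ACGT" here) is the better consensus. With a different measure, such as Levenshtein distance, the ranking of candidates may differ.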

The creation of a consensus sequence is usually based on the assumption that the given sequences have a common evolutionary origin or represent a sequence motif that performs a particular biological task. It can often also be useful to formulate ambiguous consensus sequences.

For nucleic acids, the base symbols of the nucleic acid nomenclature can be used for this purpose: in addition to the unambiguous base symbols A, C, G, T and U, there are, for example, R for any purine, Y for any pyrimidine and N for any nucleotide.
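As a minimal sketch, the ambiguity symbols can be modelled as a lookup from a set of observed bases to a single symbol. The table below follows the standard IUPAC nucleotide codes for DNA; the names `IUPAC_DNA` and `ambiguity_symbol` are assumptions introduced for this example.

```python
# IUPAC nucleotide codes for DNA, keyed by the set of bases each symbol stands for.
IUPAC_DNA = {
    frozenset("A"): "A",
    frozenset("C"): "C",
    frozenset("G"): "G",
    frozenset("T"): "T",
    frozenset("AG"): "R",    # any purine
    frozenset("CT"): "Y",    # any pyrimidine
    frozenset("CG"): "S",
    frozenset("AT"): "W",
    frozenset("GT"): "K",
    frozenset("AC"): "M",
    frozenset("CGT"): "B",
    frozenset("AGT"): "D",
    frozenset("ACT"): "H",
    frozenset("ACG"): "V",
    frozenset("ACGT"): "N",  # any nucleotide
}


def ambiguity_symbol(bases: str) -> str:
    """Return the single symbol that represents the given set of DNA bases."""
    key = frozenset(bases.upper())
    if key not in IUPAC_DNA:
        raise ValueError(f"no IUPAC DNA symbol for bases {sorted(key)}")
    return IUPAC_DNA[key]


print(ambiguity_symbol("AG"))    # R
print(ambiguity_symbol("TC"))    # Y
print(ambiguity_symbol("ACGT"))  # N
```

For RNA, U would take the place of T in the same table.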

In general, consensus sequences are created heuristically from a multiple sequence alignment (MSA). In the simplest case, the element that occurs most frequently in the corresponding column of the MSA is included in the consensus sequence.
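The simplest case, a column-wise majority vote, can be sketched as follows. This is an illustration under stated assumptions rather than a reference implementation: the MSA is given as equal-length strings, gap characters are counted like any other symbol, and ties are broken alphabetically. The function name `majority_consensus` and the example alignment are hypothetical.

```python
from collections import Counter


def majority_consensus(alignment: list[str]) -> str:
    """Pick the most frequent symbol in each column of an aligned set of sequences."""
    if not alignment:
        raise ValueError("alignment must contain at least one sequence")
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")

    consensus = []
    for column in zip(*alignment):  # iterate over the columns of the MSA
        counts = Counter(column)
        # Highest count wins; ties are broken alphabetically (an assumption).
        symbol = min(counts, key=lambda s: (-counts[s], s))
        consensus.append(symbol)
    return "".join(consensus)


# Hypothetical alignment, chosen only for illustration.
msa = ["ACGTA", "ACGAA", "TCGTA", "ACCTG"]
print(majority_consensus(msa))  # ACGTA
```

A stricter variant could emit an ambiguity symbol whenever no single element dominates a column, instead of always choosing one winner.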

Hamming distance
