Sequence analysis

In molecular biology and bioinformatics, DNA sequence analysis is the automated, computer-assisted identification of characteristic regions, in particular genes, in a DNA sequence. It examines the information obtained by sequencing the DNA, namely the order and position of the base pairs. The results of this activity are also called annotations, although sequence analysis is not limited to annotation methods.

The analysis of DNA sequences was driven by the availability of large quantities of genomic data and the need to interpret them. Many of the methods developed for nucleotide sequences can also be applied, unchanged or with minor modifications, to amino acid sequences, that is, to the primary structure of proteins. Most of these methods belong to the class of so-called string algorithms. If the biology-specific constraints are disregarded, they can even be applied to arbitrary symbol sequences.

Sequence analysis can be motivated by the following problems:

  • Analogous genes, that is, genes whose protein products have similar functions, can exhibit similar patterns in various ways, and homologous genes may diverge over the course of evolution. Can unknown genes in humans be located using knowledge of the homologous genes in the mouse? How genetically distant are the organisms from each other? How much time has passed since they separated in the phylogenetic tree?
  • Introns and exons differ in their patterns and statistics, and gene regulatory regions are often highly conserved. Can these regions be distinguished automatically by pattern matching and statistical analysis of n-tuple frequencies alone?
  • A large portion of genomic DNA consists of noncoding DNA, which is characterized by relatively short units that are very often repeated (repeats). How can these be filtered out so that search algorithms do not produce false-positive or misleading results?
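The n-tuple statistics mentioned above can be computed by counting all overlapping substrings of length k (k-mers) in a sequence. The following is a minimal sketch, not a method taken from this article: the function names are hypothetical, and the repeat heuristic assumes a uniform baseline in which all 4^k k-mers are equally likely, with an arbitrary threshold factor of 3. Dedicated repeat-masking tools rely on libraries of known repeat families and more careful statistics.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count all overlapping k-mers (n-tuples) in a DNA sequence."""
    seq = seq.upper()
    # A sequence of length L contains L - k + 1 overlapping windows of length k.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_frequencies(seq: str, k: int) -> dict[str, float]:
    """Relative k-mer frequencies, so sequences of different length can be compared."""
    counts = kmer_counts(seq, k)
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


def overrepresented(seq: str, k: int, factor: float = 3.0) -> list[str]:
    """Hypothetical repeat heuristic: k-mers occurring more than `factor` times
    the count expected if all 4**k k-mers were equally likely."""
    counts = kmer_counts(seq, k)
    total = sum(counts.values())
    expected = total / 4 ** k
    return sorted(kmer for kmer, n in counts.items() if n > factor * expected)


if __name__ == "__main__":
    print(kmer_counts("ACACAC", 2))
    print(overrepresented("ACACACACAC", 2))
```

Comparing such frequency profiles between regions (for example, exons versus introns) is one simple way to turn the statistical question into something computable; a short tandem repeat such as `ACACAC...` shows up immediately as a few heavily over-represented dinucleotides.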

Algorithms

String algorithms

One of the most common problems is searching a database for specific subsequences. One can search either for all exact matches (string-matching algorithms) or for approximate matches within a specified Levenshtein distance of the search string. In English, such matchings of two strings are called sequence alignments, and this term in turn gave its name to the entire family of alignment algorithms. The English term is increasingly used untranslated in German as well. By far the best-known alignment methods are the Needleman-Wunsch algorithm (global alignment), the Smith-Waterman algorithm (local alignment) and BLAST (heuristic pairwise alignment).
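The two dynamic-programming algorithms named above differ in only a few details, which the following minimal sketch tries to make visible. It assumes a simple scoring scheme chosen purely for illustration (match +1, mismatch -1, linear gap penalty -2); real tools typically use substitution matrices and affine gap penalties. The function name `align_score` is hypothetical, only the optimal score is computed (no traceback of the alignment itself), and BLAST is not shown because it is a heuristic built on top of such scoring rather than an exhaustive dynamic program.

```python
def align_score(a: str, b: str, match: int = 1, mismatch: int = -1,
                gap: int = -2, local: bool = False) -> int:
    """Optimal pairwise alignment score with a linear gap penalty.

    local=False: Needleman-Wunsch (global alignment of the full strings).
    local=True:  Smith-Waterman (best-scoring pair of substrings).
    """
    n, m = len(a), len(b)
    # H[i][j]: best score for a[:i] vs b[:j] (global), or best local
    # alignment ending exactly at a[i-1], b[j-1] (local).
    H = [[0] * (m + 1) for _ in range(n + 1)]
    if not local:
        # Global: a prefix aligned against nothing costs one gap per symbol.
        for i in range(1, n + 1):
            H[i][0] = i * gap
        for j in range(1, m + 1):
            H[0][j] = j * gap
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap      # gap in b
            left = H[i][j - 1] + gap    # gap in a
            score = max(diag, up, left)
            if local:
                # Local: never go below 0, i.e. allow restarting anywhere.
                score = max(score, 0)
                best = max(best, score)
            H[i][j] = score
    return best if local else H[n][m]


if __name__ == "__main__":
    print(align_score("TTTACGTTT", "GGACGGG", local=True))
    # With match 0 and unit costs, the global score is minus the Levenshtein distance.
    print(align_score("kitten", "sitting", match=0, mismatch=-1, gap=-1))
```

The last call illustrates the link to the Levenshtein distance mentioned above: with a match score of 0 and a cost of 1 for every substitution, insertion or deletion, the global alignment score of "kitten" and "sitting" is -3, the negative of their edit distance.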
