Substitution matrix

In bioinformatics, the entries of a substitution matrix describe the relative rate at which one amino acid mutates into another over the course of evolution (in the case of a protein matrix). The entry M(i, j) indicates the relative rate at which amino acid i is mutated to amino acid j. Some matrices are symmetric, so that M(i, j) = M(j, i). A substitution matrix is often used to assign a score to a given sequence alignment and thus to determine how good the alignment is. Frequently used substitution matrices are BLOSUM and the Point Accepted Mutation matrix (PAM matrix). Algorithms such as BLAST or FASTA use a substitution matrix when searching a database for similar proteins.
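As a rough sketch of how a substitution matrix can assign a score to an alignment, the following Python example sums the matrix entries over all aligned pairs. The three-letter matrix, its scores, and the helper names `pair_score` and `alignment_score` are illustrative assumptions, not part of any particular tool; gaps are not handled.

```python
# Illustrative scores for a three-letter alphabet (an assumption for this
# sketch); real analyses would load a published matrix such as BLOSUM62.
TOY_MATRIX = {
    ("A", "A"): 4, ("A", "G"): 0, ("A", "W"): -3,
    ("G", "G"): 6, ("G", "W"): -2,
    ("W", "W"): 11,
}


def pair_score(a, b, matrix=TOY_MATRIX):
    """Look up the score of one aligned pair, treating the matrix as symmetric."""
    if (a, b) in matrix:
        return matrix[(a, b)]
    return matrix[(b, a)]


def alignment_score(seq1, seq2, matrix=TOY_MATRIX):
    """Sum the pairwise scores of an ungapped alignment."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have equal length")
    return sum(pair_score(a, b, matrix) for a, b in zip(seq1, seq2))


print(alignment_score("AGWA", "AGGA"))  # 4 + 6 + (-2) + 4 = 12
```

Only one triangle of the matrix is stored here because the toy matrix is symmetric; an asymmetric matrix would need every ordered pair.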

Types of substitution matrices

There are various types of substitution matrices:

  • Identity matrix
  • Based on the genetic code
  • Based on the chemical properties of the amino acids
  • Based on empirical data (PAM and BLOSUM, as well as VT, MD BlastP and OPTIMA)

The last three types of matrices take into account that certain mutations occur more frequently (are more likely) than others. In practice, however, usually only matrices based on empirical data are in widespread use; the BLOSUM (blocks substitution matrix) and PAM (percent accepted mutation, or point accepted mutation) matrices are the best known.

Identity matrix

The simplest substitution matrix is the identity matrix, in which all non-identical letters get the value 0 and all identical letters the value 1. The score obtained with this matrix, divided by the alignment length, therefore equals the fraction of identical positions, that is, the percent identity of the two sequences.
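The relationship between the identity-matrix score and percent identity can be sketched as follows. The function names `identity_matrix` and `percent_identity` and the DNA alphabet are assumptions chosen for illustration.

```python
def identity_matrix(alphabet):
    """Build a matrix that scores 1 for identical letters and 0 otherwise."""
    return {(a, b): 1 if a == b else 0 for a in alphabet for b in alphabet}


def percent_identity(seq1, seq2, matrix):
    """Identity-matrix score divided by alignment length, as a percentage."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("aligned sequences must be non-empty and equally long")
    score = sum(matrix[(a, b)] for a, b in zip(seq1, seq2))
    return 100.0 * score / len(seq1)


dna_identity = identity_matrix("ACGT")
print(percent_identity("ACGT", "ACGA", dna_identity))  # 3 of 4 positions match: 75.0
```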

This matrix would be very ill-suited to comparing two evolutionarily distant amino acid sequences. For comparing nucleotide sequences (DNA), however, in which all mutations are similarly likely, such a matrix is often used.

Empirical matrices

BLOSUM matrix

The BLOSUM matrices were calculated in 1992 by Henikoff and Henikoff. There are several such matrices, distinguished only by the number that follows the name. The most commonly used BLOSUM matrix is BLOSUM62. To calculate the BLOSUM62 matrix, related protein sequences that were at most 62% identical were compared. This comparison yields a table that represents the relative mutation rate (log odds).
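A log-odds entry compares how often a pair of amino acids is observed in aligned sequences with how often it would be expected by chance. The sketch below is a minimal illustration: the frequencies are made-up toy numbers, the function name `log_odds_score` is hypothetical, and the scale factor of 2 (scores in half-bit units, as commonly reported for BLOSUM62) is an assumption of this example.

```python
import math


def log_odds_score(observed_pair_freq, freq_a, freq_b, scale=2.0):
    """Rounded log-odds score: scale * log2(observed / expected).

    observed_pair_freq: frequency of the ordered pair (a, b) in alignments
    freq_a, freq_b:     background frequencies of the two amino acids
    """
    expected = freq_a * freq_b  # chance of the pair if residues were independent
    return round(scale * math.log2(observed_pair_freq / expected))


# Toy numbers (assumptions): the pair is seen twice as often as expected,
# which gives a positive score.
print(log_odds_score(0.02, 0.1, 0.1))    # 2
# A pair seen four times less often than expected gets a negative score.
print(log_odds_score(0.0025, 0.1, 0.1))  # -4
```

Positive entries thus mark substitutions that occur more often than chance would predict, negative entries those that occur less often.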

PAM matrix

The PAM matrix was one of the first amino acid substitution matrices. It was developed in the 1970s by Margaret Dayhoff.

The matrix is calculated by observing the differences between closely related proteins.

The PAM1 matrix indicates the rate at which a substitution would be expected if 1% of the amino acids had changed, and thus corresponds to a similarity of 99%. The highest level in common use is PAM250, which corresponds to a sequence similarity of about 20%. Higher levels are not used in practice, since one can no longer speak of similarity when it falls below 20%.
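In Dayhoff's model, as it is commonly described, matrices for larger evolutionary distances are obtained by multiplying the PAM1 probability matrix by itself: PAM250 is PAM1 raised to the 250th power. The sketch below illustrates this with a made-up two-letter matrix; `TOY_PAM1`, `mat_mul`, and `mat_pow` are hypothetical names, and the numbers are not real amino acid data.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, power):
    """Raise a square matrix to a non-negative integer power."""
    size = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(power):
        result = mat_mul(result, m)
    return result


# Toy two-letter "PAM1-like" matrix (an assumption for illustration).
# Column j holds the probabilities that letter j is replaced by each letter,
# so every column sums to 1.
TOY_PAM1 = [[0.99, 0.02],
            [0.01, 0.98]]

toy_pam250 = mat_pow(TOY_PAM1, 250)
for row in toy_pam250:
    print([round(p, 4) for p in row])
```

After many steps the diagonal entries shrink well below their PAM1 values, which mirrors how the large diagonal of a PAM1 matrix fades in a PAM250 matrix.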

For clarity, the probabilities in a PAM matrix are multiplied by 10000. For example, in the PAM1 matrix below, the entry for glutamic acid (E) being replaced by alanine (A) is 17, which corresponds to a probability of 0.0017 or 0.17%.

Not quite correct, but easy to remember, is to read PAM as "percent accepted mutations".

Example of a PAM1 matrix

      A    R    N    D    C    Q    E    G    H    I    L    K    M    F    P    S    T    W    Y    V
A  9867    2    9   10    3    8   17   21    2    6    4    2    6    2   22   35   32    0    2   18
R     1 9913    1    0    1   10    0    0   10    3    1   19    4    1    4    6    1    8    0    1
N     4    1 9822   36    0    4    6    6   21    3    1   13    0    1    2   20    9    1    4    1
D     6    0   42 9859    0    6   53    6    4    1    0    3    0    0    1    5    3    0    0    1
C     1    1    0    0 9973    0    0    0    1    1    0    0    0    0    1    5    1    0    3    2
Q     3    9    4    5    0 9876   27    1   23    1    3    6    4    0    6    2    2    0    0    1
E    10    0    7   56    0   35 9865    4    2    3    1    4    1    0    3    4    2    0    1    2
G    21    1   12   11    1    3    7 9935    1    0    1    2    1    1    3   21    3    0    0    5
H     1    8   18    3    1   20    1    0 9912    0    1    1    0    2    3    1    1    1    4    1
I     2    2    3    1    2    1    2    0    0 9872    9    2   12    7    0    1    7    0    1   33
L     3    1    3    0    0    6    1    1    4   22 9947    2   45   13    3    1    3    4    2   15
K     2   37   25    6    0   12    7    2    2    4    1 9926   20    0    3    8   11    0    1    1
M     1    1    0    0    0    2    0    0    0    5    8    4 9874    1    0    1    2    0    0    4
F     1    1    1    0    0    0    0    1    2    8    6    0    4 9946    0    2    1    3   28    0
P    13    5    2    1    1    8    3    2    5    1    2    2    1    1 9926   12    4    0    0    2
S    28   11   34    7   11    4    6   16    2    2    1    7    4    3   17 9840   38    5    2    2
T    22    2   13    4    1    3    2    2    1   11    2    8    6    1    5   32 9871    0    2    9
W     0    2    0    0    0    0    0    0    0    0    0    0    0    1    0    1    0 9976    1    0
Y     1    0    3    0    3    0    1    0    4    1    1    0    0   21    0    1    1    2 9945    1
V    13    2    1    1    3    2    2    3    3   57   11    1   17    1    3    2   10    0    2 9901

Columns (horizontal): original amino acid. Rows (vertical): mutated amino acid.

Example of a PAM250 matrix

     A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T   W   Y   V
A   13   6   9   9   5   8   9  12   6   8   6   7   7   4  11  11  11   2   4   9
R    3  17   4   3   2   5   3   2   6   3   2   9   4   1   4   4   3   7   2   2
N    4   4   6   7   2   5   6   4   6   3   2   5   3   2   4   5   4   2   3   3
D    5   4   8  11   1   7  10   5   6   3   2   5   1   3   4   5   5   1   2   3
C    2   1   1   1  52   1   1   2   2   2   1   1   1   1   2   3   2   1   4   2
Q    3   5   5   6   1  10   7   3   7   2   3   5   3   1   4   3   3   1   2   3
E    5   4   7  11   1   9  12   5   6   3   2   5   1   3   4   5   5   1   2   3
G   12   5  10  10   4   7   9  27   5   5   4   6   5   3   8  11   9   2   3   7
H    2   5   5   4   2   7   4   2  15   2   2   3   2   2   3   3   2   2   3   2
I    3   2   2   2   2   2   2   2   2  10   6   2   6   5   2   3   4   1   3   9
L    6   4   4   3   2   6   4   3   5  15  34   4  20  13   5   4   6   6   7  13
K    6  18  10   8   2  10   8   5   8   5   4  24   9   2   6   8   8   4   3   5
M    1   1   1   1   0   1   1   1   1   2   3   2   6   2   1   1   1   1   1   2
F    2   1   2   1   1   1   1   1   3   5   6   1   4  32   1   2   2   4  20   3
P    7   5   5   4   3   5   4   5   5   3   3   4   3   2  20   6   5   1   2   4
S    9   6   8   7   7   6   7   9   6   5   4   7   5   3   9  10   9   4   4   6
T    8   5   6   6   4   5   5   6   4   6   4   6   5   3   6   8  11   2   3   6
W    0   2   0   0   0   0   0   0   1   0   1   0   0   1   0   1   0  55   1   0
Y    1   1   2   1   3   1   1   1   3   2   2   1   2  15   1   2   2   3  31   2
V    7   4   4   4   4   4   4   5   4  15  10   4  10   5   5   5   7   2   4  17

Columns (horizontal): original amino acid. Rows (vertical): mutated amino acid.
