Genetic code

The genetic code is the rule by which groups of three consecutive nucleobases in nucleic acids, called triplets or codons, are translated into amino acids. This conversion, called translation, takes place when amino acids are bound to the different transfer ribonucleic acids (tRNAs), which prepares, or activates, the amino acids for assembly into proteins.

The transfer ribonucleic acids differ in a nucleotide triplet located at a prominent position in the molecule. It consists of three nucleotides that are complementary to the nucleotides of a particular codon and thus form a three-membered anticodon. Codon and anticodon fit each other specifically, and the genetic code associates them with a particular amino acid. Each transfer ribonucleic acid is loaded with the amino acid that belongs to the codon matching its anticodon. In this way, through the specific binding of an amino acid to a tRNA carrying the anticodon prescribed by the genetic code, the codon, that is, the sign for a particular amino acid, is translated into the amino acid that the genetic code assigns to it. Strictly speaking, the translation is already contained in the structure of the different tRNA species: each tRNA molecule has an amino acid binding site structured such that only the amino acid matching the tRNA's anticodon under the genetic code can be bound to it.
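The complementarity between codon and anticodon can be illustrated with a minimal Python sketch. It assumes strict Watson-Crick pairing (A with U, G with C) and ignores wobble pairing; the names `PAIR` and `anticodon` are hypothetical and chosen only for this illustration. Because the two strands pair antiparallel, the anticodon written in the 5' to 3' direction is the reverse complement of the codon.

```python
# Watson-Crick pairing between RNA bases (assumption: wobble pairing is ignored).
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon(codon: str) -> str:
    """Return the anticodon for an mRNA codon, both written 5'->3'."""
    codon = codon.upper()
    if len(codon) != 3 or any(base not in PAIR for base in codon):
        raise ValueError(f"not an RNA codon: {codon!r}")
    # Strands pair antiparallel, so complement the bases in reverse order.
    return "".join(PAIR[base] for base in reversed(codon))


if __name__ == "__main__":
    print(anticodon("AUG"))  # CAU
```

Read in the opposite direction (3' to 5'), the same anticodon is UAC, which lines up base by base with the codon AUG.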

After the amino acids have been bound to their tRNAs, synthesis of the protein specified by the codon sequence of a deoxyribonucleic acid (DNA) can begin. As a precondition for this synthesis, the DNA segment of a gene is first rewritten into a messenger ribonucleic acid (mRNA) (transcription); in eukaryotes, certain parts of this mRNA are then selectively removed (splicing). Finally, the amino acids carried by the tRNAs that match the codons are linked together to form a polypeptide chain.

All steps that follow the synthesis and splicing of the mRNA, up to and including the synthesis of the protein, are referred to as translation, because it is in protein synthesis that the conversion of the DNA triplet sequence into an amino acid sequence becomes apparent. However, the actual application of the genetic code, the translation of a codon into an amino acid, takes place when the amino acid is bound to its tRNA, that is, during the preparation of the amino acids for assembly into proteins. Some base triplets do not encode any amino acid. These are referred to as nonsense codons. They cause translation to stop, which ends protein synthesis (stop codons).

All living things use the same genetic code in its basic features. The most commonly used version is given in the following tables. They show which amino acids are encoded by the 4³ = 64 possible codons, and which codons encode each of the 20 canonical amino acids used in translation. For example, the codon GAU codes for the amino acid Asp (aspartic acid), and Cys (cysteine) is encoded by the codons UGU and UGC. The bases used in the table are adenine, guanine, cytosine and uracil of the mRNA; in DNA, thymine is used instead of uracil.
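The codon lookup described above can be sketched in Python. The sketch builds the standard code from a 64-letter string of one-letter amino acid symbols ordered by the bases U, C, A, G at each codon position, with "*" marking a stop codon. The names `CODON_TABLE` and `translate` are hypothetical, and the function is a simplification: it reads from the first base in a single reading frame, stops at the first stop codon, and does not search for a start codon.

```python
from itertools import product

BASES = "UCAG"
# One-letter amino acid symbols for UUU, UUC, UUA, UUG, UCU, ... ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(sequence: str) -> str:
    """Translate an mRNA (or coding-strand DNA) sequence codon by codon."""
    # DNA uses thymine where mRNA uses uracil.
    mrna = sequence.upper().replace("T", "U")
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon ends translation
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(CODON_TABLE["GAU"])         # D (Asp)
    print(translate("AUGGAUUGUUGA"))  # MDC
```

The two examples from the text can be checked directly: `CODON_TABLE["GAU"]` gives D (aspartic acid), and both `UGU` and `UGC` give C (cysteine).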

History of discovery

In the first half of the 1960s, biochemists competed to be the first to understand the genetic code. On May 27, 1961, at 3 o'clock in the morning, the German biochemist Heinrich Matthaei, working in the laboratory of Marshall Nirenberg, achieved the decisive breakthrough with the poly-U experiment: the decoding of the codon UUU for the amino acid phenylalanine. Some geneticists refer to this experiment as the most important of the 20th century. By 1966, five years after the first codon had been deciphered, the genetic code had been decoded completely, with all 64 base triplets.

Codon

A codon is the sequence of three nucleobases (base triplet) of the mRNA that encodes an amino acid in the genetic code. In total there are 4³ = 64 possible codons. Three of them are nonsense codons that serve to terminate translation; the remaining 61 encode a total of 20 canonical proteinogenic amino acids. Many amino acids therefore have several different encodings. Coding by triplets is nevertheless necessary, because doublet coding would yield only 4² = 16 possible codons, too few to cover all 20 canonical amino acids.
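The counting argument can be checked with a few lines of Python. This is only an arithmetic sketch; the function names `codon_count` and `min_codon_length` are hypothetical.

```python
def codon_count(alphabet_size: int, length: int) -> int:
    """Number of distinct words of the given length over the alphabet."""
    return alphabet_size ** length


def min_codon_length(alphabet_size: int, n_meanings: int) -> int:
    """Shortest word length that offers at least n_meanings distinct words."""
    length = 1
    while codon_count(alphabet_size, length) < n_meanings:
        length += 1
    return length


if __name__ == "__main__":
    print(codon_count(4, 2))        # 16 doublets: too few for 20 amino acids
    print(codon_count(4, 3))        # 64 triplets
    print(min_codon_length(4, 20))  # 3
```

With four bases, three is the smallest codon length that provides at least 20 distinct codons, which is why the surplus of 64 over 20 leads to multiple encodings for many amino acids.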

Translation begins with a start codon, but this alone is not sufficient to start the process. Certain initiation sequences near the start codon are also needed to bring about the binding of the mRNA to the ribosome. The main start codon is AUG, which also codes for methionine. CUG and UUG, as well as GUG and AUU in prokaryotes, also work, but with much lower efficiency. The first amino acid, however, is always methionine.

Translation ends with one of three stop codons, which are also referred to as nonsense codons. These codons were originally given names: UAG is amber, UGA is opal, and UAA is ochre. The name amber is a pun on the name of its discoverer, Harris Bernstein (Bernstein is the German word for amber).

While the codon UGA is usually read as a stop, it can, rarely and only under certain conditions, stand for a 21st amino acid: selenocysteine (Sec). The biosynthesis of selenocysteine and the mechanism of its incorporation into proteins differ greatly from those of all other amino acids: its insertion requires a novel translation step in which a UGA is interpreted differently within a particular sequence context and together with certain cofactors. This requires a structurally unique tRNA specific to selenocysteine (tRNASec), which in vertebrates can be loaded with three different but related amino acids: serine, selenocysteine and phosphoserine.

Some archaea and bacteria use the stop codon UAG for a 22nd proteinogenic amino acid: pyrrolysine.

Degeneracy and fault tolerance

If a codon is decoded incorrectly during translation (an incorrect amino acid is used), the structure of the resulting protein is no longer correct and it no longer functions as intended. It therefore evidently proved helpful very early in evolutionary history that the genetic code has a certain fault tolerance: it is a so-called degenerate code, that is, one semantic unit is encoded by several different syntactic symbols. Excluding the stop codons, 61 different codons are available, but only 20 amino acids need to be encoded. As can be seen in the table above, several codons are used for some amino acids. These generally differ in only one of the three bases. (They thus have a minimal distance in code space; see Hamming distance.) If one of the bases is read incorrectly, the probability that the correct amino acid is nevertheless selected is still 60%. In most cases, the codons concerned differ in the third ("wobble") base, which is also the base most often misread during translation.
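The degeneracy of the standard code can be explored with a short Python sketch. It counts how many codons encode each amino acid and, per codon position, the fraction of single-base changes that leave the amino acid unchanged. The assumptions are loud ones: every substitution is treated as equally likely, all 61 sense codons are weighted equally, and a change into a stop codon counts as a changed meaning. The resulting fractions depend on these assumptions and are not meant to reproduce the percentage quoted above, whose underlying model is not stated. The names `DEGENERACY` and `synonymous_fraction` are hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# One-letter amino acid symbols for UUU, UUC, UUA, UUG, UCU, ... ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of codons per amino acid; the stop signal "*" is excluded.
DEGENERACY = Counter(aa for aa in CODON_TABLE.values() if aa != "*")


def synonymous_fraction(position: int) -> float:
    """Fraction of single-base changes at `position` (0, 1 or 2) that keep
    the encoded amino acid, taken over all sense codons."""
    same = total = 0
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue
        for base in BASES:
            if base == codon[position]:
                continue
            mutant = codon[:position] + base + codon[position + 1:]
            total += 1
            same += CODON_TABLE[mutant] == aa
    return same / total


if __name__ == "__main__":
    print(sorted(DEGENERACY.items(), key=lambda item: -item[1]))
    for pos in range(3):
        print(f"position {pos + 1}: {synonymous_fraction(pos):.2f}")
```

Under these assumptions the third position tolerates by far the most substitutions and the second position the fewest, which matches the qualitative argument in the text.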

Furthermore, amino acids that occur more frequently in proteins than others tend to have more codons encoding them.

It is noteworthy that the character of an amino acid is largely determined by the middle position of a triplet:

  • U (RNA) / T (DNA) - hydrophobic
  • C - polar to neutral
  • A - charged
  • G - charged, neutral to polar

It follows that radical substitutions (exchange for an amino acid of a different character) result primarily from mutations in the second position. Mutations in the first position, and especially in the third ("wobble") position, by contrast often preserve the amino acid or at least its character ("conservative substitution"). Considering further that transitions (conversion of one purine into another or of one pyrimidine into another) occur more frequently for mechanistic reasons than transversions (conversion of a purine into a pyrimidine or vice versa; this process usually requires a preceding depurination), there is a further explanation for the conservative character of the code.
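The distinction between transitions and transversions can be made concrete with a small Python sketch. It only classifies a single-base substitution by the chemical class of the two bases; the name `substitution_type` is hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "U", "T"}  # U in RNA, T in DNA


def substitution_type(old: str, new: str) -> str:
    """Classify a single-base substitution as transition or transversion."""
    old, new = old.upper(), new.upper()
    for base in (old, new):
        if base not in PURINES | PYRIMIDINES:
            raise ValueError(f"unknown base: {base!r}")
    if old == new:
        raise ValueError("no substitution: both bases are identical")
    # Transition: the base stays within its class (purine or pyrimidine).
    same_class = (old in PURINES) == (new in PURINES)
    return "transition" if same_class else "transversion"


if __name__ == "__main__":
    print(substitution_type("A", "G"))  # transition
    print(substitution_type("A", "C"))  # transversion
```

For each base there is exactly one possible transition and two possible transversions, so the higher frequency of transitions is a property of the mutation process, not of the number of options.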

Origin of the genetic code

The use of the word "code" goes back to Erwin Schrödinger, who used the terms "hereditary code-script", "chromosome code" and "miniature code" in a series of lectures in 1943, which he compiled in 1944 and used as the basis for his book "What is Life?", published in 1944. The exact seat of this code was still unclear at that time.

It was previously believed that the genetic code arose by chance. As late as 1968, Francis Crick described it as a "frozen accident". Studies from 2004 indicate, however, that it is the result of rigorous optimization with respect to fault tolerance. Errors are particularly serious for the spatial structure of a protein when the hydrophobicity of a wrongly incorporated amino acid differs significantly from that of the original. In a statistical analysis, only 100 out of one million random codes prove better than the actual code in this respect. When additional factors corresponding to typical patterns of mutations and misreadings are taken into account in the calculation of fault tolerance, this number falls to 1 in 1 million.

Universality of the code

Basic principle

It is noteworthy that the genetic code is, with a few exceptions, the same in principle for all living things, so that all organisms use the same "genetic language". Because a particular codon always stands for the same amino acid, it is possible in genetic engineering, for example, to introduce the gene for human insulin into bacteria so that they produce insulin. This principle is called the "universality of the code". Evolution explains it in that the genetic code took shape very early in the history of life and was then passed on by all species as they evolved. This generalization does not rule out that the frequency of the different code words may differ between organisms (so-called codon usage).

Exceptions

There are also exceptions to the universality of the genetic code. Mitochondria (the energy-converting organelles of the cell), which probably descend from symbiotic bacteria (endosymbiont theory) and contain their own genetic material in addition to the DNA of the nucleus, use a slightly modified form of the code.

The ciliates show deviations from the standard code: UAG, and often also UAA, code for glutamine; this deviation is also found in some green algae. UGA also sometimes stands for cysteine. Another variation is found in the yeast Candida, where CUG codes for serine.

Furthermore, there are several variant amino acids that are used by bacteria (Bacteria) and archaea (Archaea); as described above, the stop codon UGA can encode selenocysteine and UAG can encode pyrrolysine. It cannot be ruled out that further coding variants exist that have not yet been discovered.
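Deviations of this kind can be modelled as small overrides of the standard table, as in the following Python sketch. The tables `CILIATE` and `CANDIDA` are hypothetical simplifications that contain only the reassignments named in the text; they are not complete descriptions of the codes used by these organisms, and the helper names `variant` and `differences` are likewise assumptions for illustration.

```python
from itertools import product

BASES = "UCAG"
# One-letter amino acid symbols for UUU, UUC, UUA, UUG, UCU, ... ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def variant(**reassignments: str) -> dict:
    """Return a copy of the standard code with the given codons reassigned."""
    unknown = set(reassignments) - set(STANDARD)
    if unknown:
        raise ValueError(f"not codons: {sorted(unknown)}")
    return {**STANDARD, **reassignments}


def differences(code: dict) -> dict:
    """Map each deviating codon to (standard meaning, variant meaning)."""
    return {c: (STANDARD[c], aa) for c, aa in code.items() if STANDARD[c] != aa}


# Illustrative variants based only on the deviations named in the text.
CILIATE = variant(UAG="Q", UAA="Q")  # stop codons read as glutamine
CANDIDA = variant(CUG="S")           # CUG read as serine instead of leucine

if __name__ == "__main__":
    print(differences(CILIATE))  # {'UAA': ('*', 'Q'), 'UAG': ('*', 'Q')}
    print(differences(CANDIDA))  # {'CUG': ('L', 'S')}
```

Keeping each variant as a difference against the standard code reflects how few codons actually change meaning.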

Currently, 16 deviations from the standard code in the assignment of an amino acid to a codon (base triplet of the mRNA) are known.
