GC-content

The GC content is a characteristic of DNA molecules: it is the proportion of guanine and cytosine among all bases (guanine, cytosine, adenine and thymine), expressed in percent, i.e. (G + C) / (A + T + G + C) × 100 %.
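The definition above can be sketched as a short Python function. This is a minimal illustration, not a reference implementation: the function name `gc_content` is chosen here, and the decision to ignore characters other than A, C, G and T (such as the ambiguity code N) is an assumption of this sketch.

```python
def gc_content(sequence: str) -> float:
    """Return the GC content of a DNA sequence in percent."""
    seq = sequence.upper()
    # Count only the four standard bases; any other character
    # (for example N) is ignored. This is an assumption of the sketch.
    counts = {base: seq.count(base) for base in "ACGT"}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100.0 * (counts["G"] + counts["C"]) / total


# Four of the six bases are G or C, so the result is about 66.7 %.
print(round(gc_content("ATGCGC"), 1))
```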

GC-content and stability of the DNA double helix

In the double-stranded DNA molecule, the complementary bases (A with T, G with C) are connected to each other by hydrogen bonds. Adenine-thymine pairs always form two hydrogen bonds, guanine-cytosine pairs three. Contrary to what was originally assumed, the energy gained from these hydrogen bonds is negligible, because the unpaired bases can form similarly good hydrogen bonds with the surrounding water. The hydrogen bonds of a GC base pair contribute only minimally to the stability of the double helix, while those of an AT base pair even have a destabilizing effect. Stacking interactions, by contrast, occur only in the double helix, between successive base pairs: a dipole-induced dipole interaction arises between the aromatic ring systems of the heterocyclic bases, and this is energetically favorable. The formation of the first base pair is therefore quite unfavorable, owing to the small energy gain and the loss of entropy, whereas the elongation (extension) of the helix is energetically favorable, because each additional stacked base pair releases energy.

The stacking interactions are sequence-dependent, however: they are energetically most favorable for stacked GC-GC pairs and less favorable for stacked AT-AT pairs. These differences in stacking are the main reason why GC-rich DNA segments are thermodynamically more stable than AT-rich ones; hydrogen bonding plays only a minor role.

Determination of GC content

The GC content of DNA can be determined experimentally by several methods. The simplest is to measure the so-called "melting temperature" of the DNA double helix with a photometer. DNA absorbs ultraviolet light at a wavelength of 260 nm. When the double strand is denatured ("melted") into two single strands by heating, the light absorption increases by about 40 %. This effect is known as hyperchromicity. The melting temperature Tm of a DNA double strand depends directly on the GC content and is defined as the temperature at which 50 % of the double helix is present in the denatured (i.e. single-stranded) state. From the photometrically determined melting temperature, the GC content in percent can be calculated with the empirical formula (Tm - 69.4) × 2.44.
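As a worked example of the empirical formula, the following Python sketch converts a measured melting temperature into an estimated GC content. The function name `gc_from_tm` is hypothetical, and the sketch assumes that Tm is given in degrees Celsius; the text does not state the buffer conditions under which the constants 69.4 and 2.44 apply, so the result should be read as a rough estimate only.

```python
def gc_from_tm(tm_celsius: float) -> float:
    """Estimate the GC content in percent from the melting temperature.

    Uses the empirical relation (Tm - 69.4) * 2.44 quoted in the text.
    Assumption of this sketch: Tm is given in degrees Celsius.
    """
    return (tm_celsius - 69.4) * 2.44


# A duplex that melts at 85 degrees Celsius:
# (85.0 - 69.4) * 2.44 = 15.6 * 2.44, i.e. about 38.1 % GC.
print(round(gc_from_tm(85.0), 1))
```

Because the relation is linear, it returns values below 0 % or above 100 % for melting temperatures far outside the range it was fitted to; such results indicate that the formula does not apply, not a real GC content.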

Determining the GC content by gas chromatography is considerably more accurate. If the sequence of the DNA molecule is known, however, the GC content can simply be calculated from its definition by counting the bases.

GC content and taxonomy

The GC content of the genome is used as a taxonomic feature for the classification of organisms, particularly bacteria. The values here range from about 20 % to nearly 80 %. Bacteria with a high GC content are found especially among the Actinobacteria, but Deltaproteobacteria such as the myxobacteria are also GC-rich. Thermophilic organisms likewise show an increased GC content, which is generally attributed to the greater stability of GC-rich DNA.

GC contents of some model organisms:

By comparison, the average GC content in humans is 41 % (see also CpG island). Because of the structure of the genetic code, it is virtually impossible for an organism to build its genome from only two bases (G and C, or A and T) and thus reach a GC content of 100 % or 0 %: in a two-base code, the number of possible codons would not be sufficient to encode all of the amino acids.
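The codon argument can be checked with a little arithmetic. The sketch below enumerates all triplets over a given alphabet; the function name `codon_count` and the figure of 20 standard amino acids are introduced here for illustration and are not taken from the text.

```python
from itertools import product


def codon_count(alphabet: str, codon_length: int = 3) -> int:
    """Count the distinct codons of the given length over an alphabet."""
    return sum(1 for _ in product(alphabet, repeat=codon_length))


# Assumption of this sketch: the 20 standard proteinogenic amino acids.
AMINO_ACIDS = 20

print(codon_count("GC"))    # 2**3 = 8 codons, fewer than 20 amino acids
print(codon_count("ACGT"))  # 4**3 = 64 codons, enough for 20 amino acids
```

With only two bases there are 8 possible triplets, and some of these would also be needed as stop signals, so a two-base genome could not encode the full set of amino acids.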

GC content of a DNA section

The proportion of GC and AT base pairs also varies within a genome. AT-rich (and therefore GC-poor) regions are often found at places where the double helix needs to be separated easily, for example at the points where replication of the DNA molecule begins. Human chromosomes also contain regions whose GC content differs significantly from 50 %. These sections are usually involved in maintaining the spatial structure of the chromosomes. Furthermore, the GC content of DNA segments that encode a gene is often higher than in other regions (such as introns or regulatory sequences). This property is exploited when searching sequenced genomes for the actual genes: at first, a genome sequence consists only of a string of millions of bases. The annotation of the actual genes (i.e. their start and end points in the genome) is done with the help of computer programs (e.g. MICA), which find GC-rich regions and identify them as candidate genes. If the study of an organism turns up functional genes whose GC content differs markedly from that of the other genes, this is often taken as an indication that these genes were acquired recently by horizontal gene transfer or originate from a retrovirus.
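The idea of scanning a genome for GC-rich regions can be illustrated with a sliding window. The sketch below is a toy example only and does not reproduce how MICA or any other annotation program works; the function names, the window size, the step size and the 60 % threshold are all assumptions chosen for illustration.

```python
def windowed_gc(sequence: str, window: int = 100, step: int = 50):
    """Yield (start, gc_percent) for each full window along the sequence."""
    seq = sequence.upper()
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        # The window length is the denominator, so characters other
        # than G and C (including N) all count as non-GC positions.
        yield start, 100.0 * gc / window


def gc_rich_windows(sequence: str, threshold: float = 60.0,
                    window: int = 100, step: int = 50):
    """Return the windows whose GC content reaches the threshold."""
    return [(start, gc) for start, gc in windowed_gc(sequence, window, step)
            if gc >= threshold]


# Toy sequence: a GC-only stretch of 100 bases flanked by AT-only stretches.
toy = "AT" * 50 + "GC" * 50 + "AT" * 50
print(gc_rich_windows(toy))  # [(100, 100.0)]
```

The same window profile could also be used the other way round, to flag windows whose GC content deviates strongly from the genome-wide average, as a first hint of regions with a different origin.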
