Cheminformatics

Cheminformatics (also chemoinformatics, chemical informatics or chemiinformatics) is a branch of science that combines chemistry with methods of computer science, with the aim of developing and applying methods for calculating molecular properties. Its founding figures include Paul Demain, Johann Gasteiger, Jure Zupan and Ivar Ugi.

The term "cheminformatics" is relatively young, whereas the older terms computational chemistry and chemical graph theory refer to the same field (Lit.: Bonchev / Rouvray, 1990). Today, computational chemistry is understood rather as a branch of theoretical chemistry and quantum chemistry.

Basics

Cheminformatics deals with calculations on digital representations of molecular structures. Molecular structures can be regarded as graphs. For many applications, the so-called connection table, which records the bonds between the individual atoms of a molecule, is already a sufficient representation. Only for more detailed considerations is it necessary to include two-dimensional (2-D) or three-dimensional (3-D) coordinates. 3-D coordinates are needed in particular when, for example in medicinal chemistry, interactions with biomolecules such as proteins are to be investigated.
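Viewing a molecule as a graph means that standard graph algorithms apply directly to a connection table. The following Python sketch assumes a connection table given as a list of bonded atom pairs with 0-based indices (an assumption of this example, not a file format), builds an adjacency list, and derives the number of independent rings from the cyclomatic number (bonds - atoms + connected fragments). The carbon skeleton of naphthalene, hydrogens omitted, serves as input; all function names are hypothetical.

```python
from typing import Dict, List, Tuple


def build_adjacency(n_atoms: int, bonds: List[Tuple[int, int]]) -> Dict[int, List[int]]:
    """Turn a connection table (list of bonded atom pairs) into an adjacency list."""
    adjacency: Dict[int, List[int]] = {i: [] for i in range(n_atoms)}
    for a, b in bonds:
        adjacency[a].append(b)
        adjacency[b].append(a)
    return adjacency


def count_components(adjacency: Dict[int, List[int]]) -> int:
    """Number of connected fragments, found by iterative depth-first search."""
    seen = set()
    components = 0
    for start in adjacency:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:
            atom = stack.pop()
            for neighbor in adjacency[atom]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
    return components


def ring_count(n_atoms: int, bonds: List[Tuple[int, int]]) -> int:
    """Cyclomatic number: bonds - atoms + connected fragments."""
    adjacency = build_adjacency(n_atoms, bonds)
    return len(bonds) - n_atoms + count_components(adjacency)


# Naphthalene carbon skeleton: two fused six-membered rings sharing atoms 4 and 9.
NAPHTHALENE_BONDS = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6),
                     (6, 7), (7, 8), (8, 9), (9, 0), (4, 9)]

if __name__ == "__main__":
    adj = build_adjacency(10, NAPHTHALENE_BONDS)
    print("neighbors of atom 4:", adj[4])
    print("independent rings:", ring_count(10, NAPHTHALENE_BONDS))
```

The cyclomatic number equals the size of the smallest set of smallest rings, so it counts rings without having to enumerate them.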

The size of the total theoretical chemical space, which comprises all possible (virtual) molecular structures, is estimated at approximately 10^62 molecules, far more than the number of molecules actually synthesized to date (Lit.: Lahana, 1999). With computer-based methods, however, many millions of molecules can already be analyzed theoretically (in silico) without first having to synthesize them for laboratory measurements.

Representation of chemical structures

The representation of chemical structures is one of the fundamental questions of the field. For most applications, representation as a connection table, based on valence structure theory, has become established. As an example, the connection table of acesulfame is given below in the MDL Molfile standard format. Lines 5-14 contain the x, y and z coordinates and the element symbols of the atoms; lines 15-24 contain the connection table proper, listing the start atom, end atom and bond type of each bond. The columns filled with zeros hold further optional fields.

Acesulfame
  ISIS    05070815372D

 10 10  0  0  0  0  0  0  0  0999 V2000
    3.2283   -1.4806    0.0000 S   0  0  3  0  0  0  0  0  0  0  0  0
    2.5154   -1.8944    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
    3.2283   -0.6538    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    4.0544   -1.4806    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    3.6448   -2.1935    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    1.7990   -1.4806    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.5154   -0.2406    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.7990   -0.6538    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.0826   -1.8944    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    2.5154    0.5855    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  1  3  1  0  0  0  0
  1  4  2  0  0  0  0
  1  5  2  0  0  0  0
  2  6  1  0  0  0  0
  3  7  1  0  0  0  0
  6  8  1  0  0  0  0
  6  9  2  0  0  0  0
  7 10  1  0  0  0  0
  7  8  2  0  0  0  0
M  END

In addition to the connection table, 3-D coordinates of real, existing molecules can be determined by X-ray structure analysis. Where this is not possible, or where a molecule does not physically exist, 3-D coordinates can be generated, at least approximately, directly from the connection table by iterative energy-minimization calculations over different conformations of the molecule. 2-D coordinates are generally used only to depict a molecule and therefore mainly have to satisfy aesthetic requirements. They are also calculated directly from the connection table according to generally accepted rules for drawing chemical structures and only in the rarest cases reflect the actual spatial arrangement of a molecule.
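The atom and bond blocks described above can be read with a few lines of code. The Python sketch below is a hypothetical minimal reader, not a full implementation of the CTfile specification: it takes the atom and bond counts from the fixed-width counts line (line 4) and then splits the atom lines (5-14) and bond lines (15-24) on whitespace. Whitespace splitting works for a small molecule such as acesulfame, but fixed-width fields can run together in larger files, and the optional zero columns and property lines after the bond block are ignored.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Atom:
    symbol: str
    x: float
    y: float
    z: float


@dataclass
class Bond:
    begin: int  # 1-based atom index, as written in the file
    end: int    # 1-based atom index
    order: int  # 1 = single, 2 = double, 3 = triple


def read_v2000(text: str) -> Tuple[str, List[Atom], List[Bond]]:
    """Parse name, atom block and bond block of a V2000 Molfile (sketch only)."""
    lines = text.splitlines()
    name = lines[0].strip()
    counts = lines[3]  # line 4: fixed-width counts line
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])
    atom_lines = lines[4:4 + n_atoms]
    bond_lines = lines[4 + n_atoms:4 + n_atoms + n_bonds]
    atoms = []
    for line in atom_lines:
        x, y, z, symbol = line.split()[:4]
        atoms.append(Atom(symbol, float(x), float(y), float(z)))
    bonds = []
    for line in bond_lines:
        begin, end, order = (int(field) for field in line.split()[:3])
        bonds.append(Bond(begin, end, order))
    return name, atoms, bonds


ACESULFAME = """Acesulfame
  ISIS    05070815372D

 10 10  0  0  0  0  0  0  0  0999 V2000
    3.2283   -1.4806    0.0000 S   0  0  3  0  0  0  0  0  0  0  0  0
    2.5154   -1.8944    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
    3.2283   -0.6538    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    4.0544   -1.4806    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    3.6448   -2.1935    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    1.7990   -1.4806    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.5154   -0.2406    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.7990   -0.6538    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.0826   -1.8944    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    2.5154    0.5855    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  1  3  1  0  0  0  0
  1  4  2  0  0  0  0
  1  5  2  0  0  0  0
  2  6  1  0  0  0  0
  3  7  1  0  0  0  0
  6  8  1  0  0  0  0
  6  9  2  0  0  0  0
  7 10  1  0  0  0  0
  7  8  2  0  0  0  0
M  END
"""

if __name__ == "__main__":
    name, atoms, bonds = read_v2000(ACESULFAME)
    print(name, len(atoms), "atoms,", len(bonds), "bonds")
```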

Methods

Methods that require no empirical parameters are called ab initio methods. Semi-empirical methods incorporate empirical quantities and further semi-empirical parameters, which were determined by theoretical approaches but are no longer related to measurable quantities. In principle, ab initio methods are suitable for small molecules, whereas semi-empirical methods show their strengths for medium-sized molecules (around 100 atoms). Examples of semi-empirical methods are MNDO and AM1.

Ab initio methods

The quality with which ab initio methods calculate the properties of molecules depends essentially on the basis set of the atoms, that is, on how well and with how many individual functions the atomic orbitals are represented. Ab initio methods that take electron correlation into account are considerably more expensive but provide the best results. A compromise is to include electron correlation approximately. Examples of such methods are Møller-Plesset perturbation theory, CI (configuration interaction), CC (coupled cluster) and MCSCF (multi-configuration self-consistent field).
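How well a finite set of functions represents an atomic orbital can be made concrete with the classic STO-nG basis sets, in which a Slater-type 1s orbital is approximated by a fixed combination of n Gaussian functions. The Python sketch below computes the overlap between the normalized Slater 1s orbital (exponent ζ = 1) and its STO-1G, STO-2G and STO-3G approximations by numerical integration (Simpson's rule); an overlap closer to 1 means a more faithful representation. The exponents and contraction coefficients are the values commonly tabulated in textbooks for ζ = 1 and are included here as assumptions; check them against a basis-set library before reuse.

```python
import math

# STO-nG fits to a Slater 1s orbital with zeta = 1.0: (Gaussian exponent, coefficient).
# Values as commonly tabulated in textbooks; treat them as assumptions.
STO_NG = {
    1: [(0.270950, 1.0)],
    2: [(0.151623, 0.678914), (0.851819, 0.430129)],
    3: [(0.109818, 0.444635), (0.405771, 0.535328), (2.227660, 0.154329)],
}


def slater_1s(r: float, zeta: float = 1.0) -> float:
    """Normalized Slater-type 1s orbital."""
    return math.sqrt(zeta ** 3 / math.pi) * math.exp(-zeta * r)


def contracted_gaussian(r: float, primitives) -> float:
    """Fixed linear combination of normalized s-type Gaussian primitives."""
    return sum(c * (2.0 * a / math.pi) ** 0.75 * math.exp(-a * r * r)
               for a, c in primitives)


def overlap_with_slater(primitives, r_max: float = 30.0, n: int = 6000) -> float:
    """Radial integral of 4*pi*r^2 * STO(r) * CGF(r) by Simpson's rule (n even)."""
    h = r_max / n
    total = 0.0
    for i in range(n + 1):
        r = i * h
        f = 4.0 * math.pi * r * r * slater_1s(r) * contracted_gaussian(r, primitives)
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += weight * f
    return total * h / 3.0


if __name__ == "__main__":
    for n_gauss, prims in STO_NG.items():
        print(f"STO-{n_gauss}G overlap with Slater 1s: {overlap_with_slater(prims):.4f}")
```

Each added Gaussian brings the overlap closer to 1, which illustrates the trade-off named above: a larger basis set represents the orbitals better but makes every subsequent integral evaluation more expensive.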

Semi-empirical methods

In semi-empirical methods, a large proportion of the integrals of the Hartree-Fock formalism is neglected, and others are approximated by spectroscopic values, parameters or parameterized functions. The reason for these approximations was the low computing capacity available in earlier times: to still be able to apply the theoretical findings to chemical problems, the existing formalism had to be simplified.

The Hückel approximation is the simplest semi-empirical approach, since it computes no integrals at all. However, it is applicable only to π-electron systems. The theory was later extended to σ-electron systems as well (extended Hückel theory, EHT).
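Because no integrals are computed, a Hückel calculation reduces to diagonalizing a matrix built from the molecular graph: with the Coulomb integral α on the diagonal and the resonance integral β for each pair of bonded π centers, H = αI + βA, so the orbital energies are α + xβ, where x runs over the eigenvalues of the adjacency matrix A. The pure-Python sketch below uses a textbook cyclic Jacobi eigenvalue routine (chosen only to avoid external dependencies) and fills the orbitals with two electrons each to obtain the total π energy in units of β. Function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def jacobi_eigenvalues(matrix: Sequence[Sequence[float]], tol: float = 1e-20,
                       max_sweeps: int = 100) -> List[float]:
    """Eigenvalues of a real symmetric matrix by cyclic Jacobi rotations."""
    a = [list(map(float, row)) for row in matrix]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                sign = 1.0 if theta >= 0 else -1.0
                t = sign / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (update columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # A <- P^T A (update rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return [a[i][i] for i in range(n)]


def huckel_x_values(n_atoms: int, bonds: List[Tuple[int, int]]) -> List[float]:
    """x in E = alpha + x*beta for each pi orbital, most bonding first (beta < 0)."""
    adjacency = [[0.0] * n_atoms for _ in range(n_atoms)]
    for i, j in bonds:
        adjacency[i][j] = adjacency[j][i] = 1.0
    return sorted(jacobi_eigenvalues(adjacency), reverse=True)


def pi_energy_in_beta(n_atoms: int, bonds: List[Tuple[int, int]], n_pi_electrons: int) -> float:
    """Total pi energy minus n_pi_electrons * alpha, in units of beta."""
    remaining = n_pi_electrons
    total = 0.0
    for x in huckel_x_values(n_atoms, bonds):
        occupation = min(2, remaining)
        total += occupation * x
        remaining -= occupation
        if remaining == 0:
            break
    return total


BUTADIENE = [(0, 1), (1, 2), (2, 3)]
BENZENE = [(i, (i + 1) % 6) for i in range(6)]

if __name__ == "__main__":
    print("butadiene x values:", huckel_x_values(4, BUTADIENE))
    print("benzene pi energy / beta:", pi_energy_in_beta(6, BENZENE, 6))
```

For butadiene this reproduces the familiar result E_π = 4α + 2√5 β, and for benzene E_π = 6α + 8β.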

Established methods that are still frequently used today belong to the class of NDDO approximations (Neglect of Diatomic Differential Overlap): MNDO (Modified Neglect of Diatomic Overlap), AM1 (Austin Model 1) and PM3 (Parametric Method 3). For demanding calculations, semi-empirical methods have been combined with CI and MCSCF. Such methods (MNDO/CI, MNDO/MCSCF) can then predict, for example, reaction barriers and complete energy profiles of complex reactions.

The limits of semi-empirical methods lie in their parameterization: in practice, the methods can only be expected to give reliable results for systems similar to those present in the parameterization data set.

Molecular mechanics methods

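Force field programs use a classical mechanical approach: the bond between two atoms A and B is approximated as a spring and described by a harmonic potential (Hooke's law):

$$E_{AB} = \tfrac{1}{2}\, k_{AB} \left( r_{AB} - r_{AB}^{0} \right)^{2}$$

Here $k_{AB}$ is the force constant, $r_{AB}$ the current distance between A and B and $r_{AB}^{0}$ the equilibrium distance; some force fields absorb the factor 1/2 into $k_{AB}$.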

Since a double bond between two carbon atoms has a different strength and equilibrium length than a single bond, different parameter sets (force constant and equilibrium distance) are required. Atoms are therefore no longer identified simply by their element but by atom types. Bond angles and torsion angles are treated with similar approaches. Electrostatic (Coulomb) and van der Waals interactions are referred to as non-bonded interactions. Force field methods must be parameterized against empirical or quantum-mechanically calculated data, so a force field is characterized by two things: its energy function and its parameter set.
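The split into an energy function and a parameter set can be shown in a few lines. The Python sketch below evaluates a harmonic bond term whose parameters are looked up by atom type, so a C-C single bond and a C=C double bond get different force constants and equilibrium lengths, plus Lennard-Jones and Coulomb terms for non-bonded pairs. All parameter values and atom-type names are hypothetical and are not taken from any published force field; angle and torsion terms are omitted, and only directly bonded pairs are excluded from the non-bonded sum (real force fields also exclude or scale 1-3 and 1-4 pairs).

```python
import math
from typing import Dict, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]

# Hypothetical parameter set (illustration only): keyed by a sorted pair of atom types.
BOND_PARAMS: Dict[Tuple[str, str], Tuple[float, float]] = {
    ("C_sp3", "C_sp3"): (600.0, 1.54),   # (k in kcal/(mol*A^2), r0 in A): single bond
    ("C_sp2", "C_sp2"): (1100.0, 1.34),  # double bond: stiffer and shorter
}
# Hypothetical Lennard-Jones (epsilon in kcal/mol, sigma in A) per atom type.
LJ_PARAMS: Dict[str, Tuple[float, float]] = {"C_sp3": (0.10, 3.4), "C_sp2": (0.09, 3.4)}
COULOMB_K = 332.06  # converts q_i*q_j/r (e^2/A) to kcal/mol


def bond_energy(types: Sequence[str], coords: Sequence[Vec3],
                bonds: List[Tuple[int, int]]) -> float:
    """Sum of harmonic (Hooke's law) bond terms 0.5*k*(r - r0)^2."""
    energy = 0.0
    for i, j in bonds:
        k, r0 = BOND_PARAMS[tuple(sorted((types[i], types[j])))]
        r = math.dist(coords[i], coords[j])
        energy += 0.5 * k * (r - r0) ** 2
    return energy


def nonbonded_energy(types: Sequence[str], charges: Sequence[float],
                     coords: Sequence[Vec3], bonds: List[Tuple[int, int]]) -> float:
    """Lennard-Jones + Coulomb over all pairs that are not directly bonded."""
    bonded = {frozenset(b) for b in bonds}
    energy = 0.0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if frozenset((i, j)) in bonded:
                continue
            r = math.dist(coords[i], coords[j])
            eps = math.sqrt(LJ_PARAMS[types[i]][0] * LJ_PARAMS[types[j]][0])
            sigma = 0.5 * (LJ_PARAMS[types[i]][1] + LJ_PARAMS[types[j]][1])
            energy += 4.0 * eps * ((sigma / r) ** 12 - (sigma / r) ** 6)
            energy += COULOMB_K * charges[i] * charges[j] / r
    return energy


if __name__ == "__main__":
    stretched = [(0.0, 0.0, 0.0), (1.64, 0.0, 0.0)]
    print("bond energy:", bond_energy(["C_sp3", "C_sp3"], stretched, [(0, 1)]))
```

Swapping in a different parameter dictionary changes the force field without touching the energy function, which is the separation described in the paragraph above.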

Force fields allow the geometry optimization of very large (bio)molecules (e.g., proteins) and are mainly used for molecular dynamics or Monte Carlo simulations.
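Molecular dynamics integrates Newton's equations of motion on such a force-field energy surface. As a minimal, hypothetical illustration, the Python sketch below propagates the bond length of a single diatomic under the harmonic bond force F = -k (r - r0) with the velocity Verlet integrator, one widely used integration scheme; all quantities are in arbitrary units chosen for the example. Checking that the total energy stays nearly constant is a common sanity test for such an integrator.

```python
from typing import List, Tuple


def velocity_verlet_bond(k: float, r0: float, mu: float, r_init: float, v_init: float,
                         dt: float, n_steps: int) -> List[Tuple[float, float]]:
    """Integrate the bond length r of a diatomic with reduced mass mu under
    F = -k (r - r0); returns (r, total energy) after every step."""
    r, v = r_init, v_init
    a = -k * (r - r0) / mu
    trajectory = []
    for _ in range(n_steps):
        r += v * dt + 0.5 * a * dt * dt       # position update
        a_new = -k * (r - r0) / mu            # force at the new position
        v += 0.5 * (a + a_new) * dt           # velocity update with averaged acceleration
        a = a_new
        energy = 0.5 * mu * v * v + 0.5 * k * (r - r0) ** 2
        trajectory.append((r, energy))
    return trajectory


if __name__ == "__main__":
    traj = velocity_verlet_bond(k=1.0, r0=1.0, mu=1.0, r_init=1.1, v_init=0.0,
                                dt=0.01, n_steps=1000)
    energies = [e for _, e in traj]
    print("energy drift:", max(energies) - min(energies))
```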

Applications

There are several important topics within the field; a selection:

  • The computer representation of molecules and the quantum-mechanical calculation of their properties
  • Storing and searching chemical structures (databases)
  • Methods for understanding the systematic relationships between molecular structure and the properties of substances (QSPR)
  • Force field calculations for the geometry optimization of large molecules
  • Molecular dynamics for calculating the binding thermodynamics of enzymes
  • Computer-aided synthesis planning
  • Computer-aided prediction of the efficacy of drugs

Some selected applications are presented in more detail below.

Quantitative structure-activity relationship

Using suitable algorithms, codes (descriptors) are developed for molecules. By induction, new models can be derived for molecular properties such as bioavailability, or for the ability of a substance to inhibit or enhance the function of a particular protein in the body (see also: QSAR).
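In the simplest case such a model is a linear regression of a measured activity on a single descriptor, as in classical Hansch-type QSAR. The Python sketch below fits activity = slope · descriptor + intercept by ordinary least squares and reports r². The function names are hypothetical, and the usage example uses synthetic values generated from a known line, not measured data; practical QSAR models typically use many descriptors and require external validation.

```python
from typing import List, Tuple


def fit_linear_qsar(x: List[float], y: List[float]) -> Tuple[float, float, float]:
    """Ordinary least squares for y = slope * x + intercept; returns (slope, intercept, r^2)."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need at least two paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


def predict(slope: float, intercept: float, descriptor: float) -> float:
    """Apply the fitted model to a new (e.g. not yet synthesized) molecule."""
    return slope * descriptor + intercept


if __name__ == "__main__":
    # Synthetic illustration: activities generated from 0.8 * descriptor + 1.5.
    descriptors = [0.5, 1.0, 2.0, 3.5]
    activities = [0.8 * d + 1.5 for d in descriptors]
    slope, intercept, r2 = fit_linear_qsar(descriptors, activities)
    print(slope, intercept, r2, predict(slope, intercept, 2.5))
```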

Lead optimization

By means of suitable chemical and biological hypotheses, the chemical space can be narrowed down to a few candidates, which are then synthesized in the laboratory and tested clinically. For this reason, cheminformatics plays an important role in the optimization of lead compounds in pharmaceutical and medicinal chemistry.
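One widely used example of such a hypothesis is Lipinski's rule of five for oral bioavailability (molecular weight at most 500, logP at most 5, at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors). The Python sketch below applies it as a filter, allowing at most one violation as is commonly done. The candidate names and property values are invented placeholders, and the rule is only one of many possible filters, not necessarily the one used in any particular lead-optimization workflow.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    name: str
    mol_weight: float  # g/mol
    logp: float        # octanol-water partition coefficient (log scale)
    h_donors: int
    h_acceptors: int


def lipinski_violations(c: Candidate) -> int:
    """Count how many of the four rule-of-five criteria are violated."""
    return sum([c.mol_weight > 500, c.logp > 5, c.h_donors > 5, c.h_acceptors > 10])


def filter_candidates(candidates: List[Candidate], max_violations: int = 1) -> List[Candidate]:
    """Keep candidates with at most max_violations rule-of-five violations."""
    return [c for c in candidates if lipinski_violations(c) <= max_violations]


if __name__ == "__main__":
    # Hypothetical placeholder candidates, not real compounds.
    pool = [
        Candidate("cand_A", 350.0, 2.1, 2, 5),
        Candidate("cand_B", 620.0, 6.3, 1, 8),
        Candidate("cand_C", 520.0, 3.0, 2, 6),
    ]
    print([c.name for c in filter_candidates(pool)])
```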

Thermodynamics

In chemical engineering, group contribution methods are used to estimate physical properties such as normal boiling points, critical data, surface tensions and more.
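As an illustration, the Joback method estimates the normal boiling point as a constant plus a sum of contributions from the structural groups in the molecule, T_b = 198 K + Σ n_i ΔT_b,i. The Python sketch below includes only three groups; the contribution values are those commonly tabulated for the Joback method and are included as assumptions to verify before use. For ethanol (one -CH3, one -CH2-, one alcohol -OH) it gives about 337 K, compared with an experimental boiling point of about 351 K, which shows the typical accuracy limits of such estimates.

```python
from typing import Dict

# Boiling-point contributions in K, as commonly tabulated for the Joback method
# (assumption: verify against the original tables before use). Only three groups here.
JOBACK_TB: Dict[str, float] = {
    "-CH3": 23.58,
    "-CH2-": 22.88,
    "-OH (alcohol)": 92.88,
}


def joback_boiling_point(groups: Dict[str, int]) -> float:
    """Estimated normal boiling point in K: 198 + sum of n_i * contribution_i."""
    return 198.0 + sum(JOBACK_TB[group] * count for group, count in groups.items())


if __name__ == "__main__":
    ethanol = {"-CH3": 1, "-CH2-": 1, "-OH (alcohol)": 1}
    n_hexane = {"-CH3": 2, "-CH2-": 4}
    print("ethanol:", joback_boiling_point(ethanol), "K")
    print("n-hexane:", joback_boiling_point(n_hexane), "K")
```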

Molecular Modeling

Molecular modeling deals, for example, with creating models of unknown macromolecules using similar, known molecules as templates (homology modeling), with the interaction between small and large molecules (receptor docking), which makes QSAR possible, with molecular dynamics, and with generating energy-minimized 3-D structures of molecules (hill-climbing algorithms, simulated annealing, molecular mechanics, etc.). The aim is to derive models of unknown structures from known ones, so as to enable QSAR.
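The difference between hill climbing and simulated annealing can be seen on a single torsion angle. The Python sketch below uses a hypothetical torsional potential with a global minimum at φ = π (anti) and higher local minima near the gauche-like positions; a pure hill-climbing search started in such a local minimum would stay there, whereas simulated annealing can escape because it occasionally accepts uphill moves (Metropolis criterion) while the temperature is gradually lowered. The potential, cooling schedule and step size are illustrative assumptions, not parameters of any real molecule or program.

```python
import math
import random
from typing import Callable, Tuple


def torsion_energy(phi: float) -> float:
    """Hypothetical torsional potential: global minimum 0 at phi = pi,
    higher local minima near pi/3 and 5*pi/3."""
    return (1.0 + math.cos(3.0 * phi)) + 1.5 * (1.0 + math.cos(phi))


def simulated_annealing(energy: Callable[[float], float], x0: float,
                        t_start: float = 2.0, t_end: float = 0.01,
                        n_steps: int = 20000, step: float = 0.3,
                        seed: int = 0) -> Tuple[float, float]:
    """Minimize a periodic 1-D energy; returns the best (angle, energy) found."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for i in range(n_steps):
        t = t_start * (t_end / t_start) ** (i / (n_steps - 1))  # geometric cooling
        x_new = (x + rng.uniform(-step, step)) % (2.0 * math.pi)
        e_new = energy(x_new)
        # Metropolis criterion: always accept downhill, accept uphill with exp(-dE/T).
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / t):
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


if __name__ == "__main__":
    phi, e = simulated_annealing(torsion_energy, math.pi / 3)
    print(f"best phi = {math.degrees(phi):.1f} deg, energy = {e:.4f}")
```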

Related areas

There is a strong relationship to analytical chemistry and chemometrics, in which structure-property relationships (for example, correlation spectra) play a central role. Because of similar working methods, there is also a close relationship to computational physics, and a clear separation is often not possible.

Software packages

Computational chemistry programs are based on various quantum chemical methods for solving the molecular Schrödinger equation. Two basic approaches can be distinguished: semi-empirical methods and ab initio methods.

All the procedures and methods described are available in popular software packages, for example ACES, GAUSSIAN, GAMESS, MOLPRO, Spartan, TURBOMOLE, Cerius2 and Jaguar. ArgusLab is suitable as a freely available program for getting started with computational chemistry.

The challenge for the user of this software is to find the most suitable model for the problem at hand and to interpret the results within the range of validity of the models.
