TRANSFAC

TRANSFAC (TRANScription FACtor database) is a manually curated database of eukaryotic transcription factors, their genomic binding sites and DNA-binding profiles. With appropriate software, its contents can be used to predict potential transcription factor binding sites (TFBS).

History

The data collection underlying the database was first published in 1988 by Edgar Wingender. It essentially comprised three tables: one on transcription factor binding sites (TFBS) in genes, one on transcription factors (TF), and one on DNA-binding zinc finger domains of a particular type. From this, a locally executable database named TRANSFAC was initially created. As part of one of the first publicly funded bioinformatics projects in Germany, at the former Society for Biotechnological Research (GBF, now the Helmholtz Centre for Infection Research, HZI) in Braunschweig, it was developed into a resource available on the Internet. To secure long-term funding, the database was transferred in 1997 to a company founded for this purpose (BIOBASE). Older versions of the database nevertheless remain freely accessible to non-commercial users.

Content and structure of the database

The focus of the database is the relationship between transcription factors (TF) and their DNA binding sites (TFBS). For each TF, its structural and functional properties are documented as far as they have been described in the scientific literature. Based on the properties of their DNA-binding domains, TFs are grouped into families, classes and superclasses, which yields a classification scheme for transcription factors. The binding of a TF to a specific site in a gene is documented together with the precise location of the binding site, its nucleotide sequence, and the method by which it was detected. The binding sites of a TF (or of a group of closely related TFs) are aligned and summarized as nucleotide distribution matrices (count matrices, position-specific scoring matrices, PSSMs). Many matrices in the TRANSFAC matrix library were created by the annotation team; others were taken from the scientific literature.
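To illustrate how a count matrix relates to a position-specific scoring matrix, the following is a minimal Python sketch that turns per-position nucleotide counts into log2-odds scores using pseudocounts and a background model. The matrix values, the pseudocount of 0.5, the uniform background and the function names `counts_to_pssm` and `score_site` are all assumptions made for illustration; TRANSFAC's own matrices and the scoring schemes of the tools built on them may differ.

```python
import math

# Hypothetical count matrix for a 4-bp site (invented numbers, not TRANSFAC data).
# Each dict records how often each nucleotide occurs at that position
# across a set of aligned binding sites.
COUNTS = [
    {"A": 8, "C": 0, "G": 1, "T": 1},
    {"A": 0, "C": 9, "G": 0, "T": 1},
    {"A": 1, "C": 0, "G": 9, "T": 0},
    {"A": 0, "C": 1, "G": 0, "T": 9},
]

# Assumed uniform background nucleotide composition.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def counts_to_pssm(counts, background=BACKGROUND, pseudocount=0.5):
    """Turn a count matrix into a log2-odds position-specific scoring matrix."""
    pssm = []
    for column in counts:
        # Pseudocounts keep unobserved bases from producing log(0).
        total = sum(column.values()) + pseudocount * len(column)
        pssm.append({
            base: math.log2(((n + pseudocount) / total) / background[base])
            for base, n in column.items()
        })
    return pssm


def score_site(pssm, site):
    """Sum the per-position log-odds scores of a site as long as the matrix."""
    if len(site) != len(pssm):
        raise ValueError("site length must match matrix width")
    return sum(column[base] for column, base in zip(pssm, site.upper()))


pssm = counts_to_pssm(COUNTS)
print(round(score_site(pssm, "ACGT"), 2))  # consensus-like site: positive score
print(round(score_site(pssm, "TGCA"), 2))  # mismatching site: negative score
```

Positive scores mean the site resembles the aligned binding sites more than random background sequence does; negative scores mean the opposite.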

Areas of application

Among other things, the TRANSFAC database is used as an encyclopedia of eukaryotic transcription factors. For each TF, its target sequences and regulated genes can be listed, and comprehensive sets of binding sequences can be compiled for individual TFs, for example as test or training sets for TFBS recognition algorithms. The TF classification makes it possible to analyze such data with respect to the properties of the DNA-binding domain. Conversely, for a given gene, the TFs that regulate it can be retrieved wherever TFBS in that gene are documented. The TF-target gene relationships documented in TRANSFAC have been used to construct transcriptional regulatory networks in systems biology studies and analyses. By far the most common use of TRANSFAC, however, is the computer-based prediction of potential transcription factor binding sites. Different algorithms draw either on the individual documented TF binding sites or on the matrix library.
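Matrix-based prediction generally slides a scoring matrix along a sequence and reports windows whose score exceeds a cutoff. The sketch below shows that general idea on both DNA strands, normalizing each raw score to a 0-1 range between the matrix's minimum and maximum possible scores. The matrix values, the 0.85 threshold and all function names are hypothetical, and this is not the algorithm of any specific tool listed below.

```python
# Hypothetical log2-odds PSSM for a 4-bp motif (one dict per position).
PSSM = [
    {"A": 1.5, "C": -2.6, "G": -1.0, "T": -1.0},
    {"A": -2.6, "C": 1.7, "G": -2.6, "T": -1.0},
    {"A": -1.0, "C": -2.6, "G": 1.7, "T": -2.6},
    {"A": -2.6, "C": -1.0, "G": -2.6, "T": 1.7},
]

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase ACGT sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def relative_score(pssm, window):
    """Scale a raw matrix score to [0, 1] between the worst and best possible scores."""
    raw = sum(col[b] for col, b in zip(pssm, window))
    lo = sum(min(col.values()) for col in pssm)
    hi = sum(max(col.values()) for col in pssm)
    return (raw - lo) / (hi - lo)


def scan(pssm, seq, threshold=0.85):
    """Return (position, strand, relative score) for windows scoring >= threshold."""
    seq = seq.upper()
    width = len(pssm)
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if set(window) - set("ACGT"):
            continue  # skip windows containing N or other ambiguity codes
        for strand, site in (("+", window), ("-", reverse_complement(window))):
            s = relative_score(pssm, site)
            if s >= threshold:
                hits.append((i, strand, round(s, 3)))
    return hits


print(scan(PSSM, "TTACGTTT"))
```

In this example the motif ACGT is its own reverse complement, so the single match at position 2 is reported once per strand.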

Tools for the prediction of transcription factor binding sites that are based on TRANSFAC content include:

  • Patch - analyzes sequence similarity to binding sites documented in TRANSFAC; distributed together with the database.
  • SiteSeer - analyzes sequence similarity to binding sites documented in TRANSFAC.
  • Match - identifies potential TFBS using the matrix library; distributed together with the database.
  • TESS (Transcription Element Search System) - analyzes sequence similarity to binding sites from TRANSFAC and identifies potential binding sites using the matrix libraries of TRANSFAC and three other sources. TESS also provides a program, based on the TRANSFAC matrices, for identifying cis-regulatory modules (CRMs, characteristic combinations of TFBS).
  • PROMO - matrix-based TFBS prediction using the commercial version of the database
  • TFM Explorer - identification of common potential TFBS in a set of genes
  • MotifMogul - matrix-based sequence analysis using different algorithms
  • PMS (Poly Matrix Search) - matrix-based sequence analysis in conserved promoter regions
  • ModuleMaster - identification of putative regulatory transcription factors for arbitrary genes and subsequent identification of cis-regulatory modules (CRMs).
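Similarity-based tools such as Patch and SiteSeer compare a query sequence with binding-site sequences documented in the database rather than with matrices. The following is a minimal sketch of that general idea, assuming similarity is measured simply as the number of mismatches (Hamming distance) on the forward strand only. The site dictionary, its identifiers and the function names are hypothetical, and this is not the actual algorithm used by Patch or SiteSeer.

```python
# Hypothetical "documented" binding sites, chosen for illustration;
# these are not TRANSFAC entries.
KNOWN_SITES = {
    "site_1": "TGACTCA",
    "site_2": "CACGTG",
}


def hamming(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_similar_sites(seq, known_sites, max_mismatches=1):
    """Return sorted (position, site_id, mismatches) for windows close to a known site.

    Only the forward strand is searched in this sketch.
    """
    seq = seq.upper()
    hits = []
    for site_id, site in known_sites.items():
        site = site.upper()
        width = len(site)
        for i in range(len(seq) - width + 1):
            d = hamming(seq[i:i + width], site)
            if d <= max_mismatches:
                hits.append((i, site_id, d))
    return sorted(hits)


print(find_similar_sites("GGTGACTCAGGCACGTTT", KNOWN_SITES))
```

Allowing one mismatch reports the near-match to site_2 at position 11 alongside the exact match to site_1 at position 2; raising or lowering `max_mismatches` trades sensitivity against false positives.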

Comparison of matrices with those of the matrix libraries from TRANSFAC and other sources:

  • T-Reg Comparator - comparison of individual matrices or groups of matrices with those of TRANSFAC or other matrix libraries.
  • MACO (Poly Matrix Search) - Matrix comparison with matrix libraries
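One simple way to compare two matrices, sketched here only to illustrate the task, is to slide one matrix along the other and average the Pearson correlation of the aligned nucleotide-frequency columns, keeping the offset with the highest mean. The choice of Pearson correlation, the minimum overlap of three columns, the example matrices and the function names are assumptions; T-Reg Comparator and MACO may use different similarity measures.

```python
import math

BASES = "ACGT"


def to_freqs(counts):
    """Normalize each count column to a list of A, C, G, T frequencies."""
    return [[col[b] / sum(col[x] for x in BASES) for b in BASES] for col in counts]


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def best_alignment(m1, m2, min_overlap=3):
    """Slide m2 along m1; return (mean column correlation, offset) of the best overlap.

    Column i of m1 is paired with column i - offset of m2.
    """
    f1, f2 = to_freqs(m1), to_freqs(m2)
    best = (-1.0, None)
    for offset in range(-(len(f2) - min_overlap), len(f1) - min_overlap + 1):
        pairs = [(f1[i], f2[i - offset]) for i in range(len(f1))
                 if 0 <= i - offset < len(f2)]
        if len(pairs) < min_overlap:
            continue
        s = sum(pearson(a, b) for a, b in pairs) / len(pairs)
        if s > best[0]:
            best = (s, offset)
    return best


# Hypothetical example: the second matrix is the first one without its first column.
EXAMPLE_1 = [
    {"A": 8, "C": 0, "G": 1, "T": 1},
    {"A": 0, "C": 9, "G": 0, "T": 1},
    {"A": 1, "C": 0, "G": 9, "T": 0},
    {"A": 0, "C": 1, "G": 0, "T": 9},
]
EXAMPLE_2 = EXAMPLE_1[1:]
print(best_alignment(EXAMPLE_1, EXAMPLE_2))
```

Here the best alignment places the shorter matrix one column into the longer one with a mean correlation of essentially 1, reflecting that it is a truncated copy.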

Genomic annotations precomputed with TRANSFAC are provided by various servers.

Related data sources

The following sources provide content that is related to, or overlaps with, parts of the TRANSFAC database:

  • JASPAR - collection of transcription factor binding profiles (matrices) and sequence analysis software
  • PLACE - cis-regulatory DNA elements in plants (until February 2007)
  • PlantCARE - cis-regulatory elements and transcription factors in plants (2002)
  • PRODORIC - a concept similar to TRANSFAC, but for prokaryotes
  • RegulonDB - focus on the bacterium Escherichia coli
  • SCPD - data and tool collection specific to yeast (Saccharomyces cerevisiae) (1998)
  • TFE - The transcription factor encyclopedia
  • TRRD - Transcription Regulatory Regions Database, mainly covering regulatory regions and TF binding sites