UniProt (Universal Protein Resource) is the largest bioinformatics database for proteins of all living organisms and viruses. It contains information about protein function and structure, as well as links to other relevant databases. It combines the data from Swiss-Prot, TrEMBL and the Protein Information Resource (PIR) and is released at regular intervals.
What does UniProt consist of?
UniProt is maintained by a consortium that was formed in 2002 by the following institutions:
- The European Bioinformatics Institute (EBI)
- The Swiss Institute of Bioinformatics (SIB)
- The Protein Information Resource (PIR)
The EBI hosts a large collection of bioinformatics data. The SIB houses the ExPASy (Expert Protein Analysis System) server, which provides essential resources for proteomics. PIR, which is operated by the National Biomedical Research Foundation (NBRF), is derived from the oldest protein sequence database, Margaret Oakley Dayhoff's "Atlas of Protein Sequence and Structure".
The UniProt databases
Each member of the UniProt consortium maintains and curates databases. Until recently (source: UniProt Supporting Information), EBI and SIB jointly produced Swiss-Prot and TrEMBL, while PIR provided the PIR-PSD (Protein Sequence Database).
Swiss-Prot is probably the best-known protein database, owing to its extensive cross-references, literature citations, integration with other databases and minimal redundancy. TrEMBL (Translated EMBL Nucleotide Sequence Data Library) is a computer-annotated supplement to Swiss-Prot that contains the translations of all EMBL nucleotide entries that have not yet been integrated into Swiss-Prot. This allows new data to be made available quickly.
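The idea of translating nucleotide entries into protein sequences can be illustrated with a minimal Python sketch. It assumes a single coding sequence that starts in frame and uses the standard genetic code; the function name `translate` and the example sequence are hypothetical, and the sketch is not a description of the actual TrEMBL pipeline, which also has to handle annotated coding regions and alternative genetic codes.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(coding_sequence):
    """Translate an in-frame DNA coding sequence up to the first stop codon.

    Hypothetical helper for illustration only; incomplete trailing codons
    are ignored and unknown codons are rendered as 'X'.
    """
    dna = coding_sequence.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[i:i + 3], "X")
        if amino_acid == "*":  # stop codon ends the protein
            break
        protein.append(amino_acid)
    return "".join(protein)


# Invented example sequence: ATG GCC AAA TAA
print(translate("ATGGCCAAATAA"))
```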
Organization of the UniProt databases
UniProt has three components, each specialized for a specific use:
- The UniProt Knowledgebase (UniProtKB) is the central database of protein sequences. It provides information on the function and classification of proteins, along with cross-references.
- The UniProt Archive (UniParc) stores all publicly available protein sequence data.
- The UniProt Reference Clusters (UniRef) are databases that speed up searches by preventing redundant sequences from appearing in the results. Among other things, identical sequences and sequence fragments (from different organisms) are merged into a single entry.
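The merging of identical sequences and fragments described for UniRef can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the actual UniRef procedure: a sequence counts as a fragment if it occurs as an exact substring of a longer sequence, and the function name `cluster_identical_and_fragments` as well as the identifiers and sequences are invented for the example.

```python
def cluster_identical_and_fragments(records):
    """Group sequence IDs so that identical sequences and exact fragments
    share one cluster.

    records: dict mapping a (hypothetical) sequence ID to its sequence.
    Returns a list of clusters; each cluster is a list of IDs whose first
    element is the longest sequence, used as the representative.
    """
    clusters = []  # list of (representative_sequence, [member_ids])
    # Process the longest sequences first so that fragments are always
    # compared against a representative that is at least as long.
    ordered = sorted(records.items(), key=lambda item: (-len(item[1]), item[0]))
    for seq_id, seq in ordered:
        for representative, members in clusters:
            if seq in representative:  # identical or exact sub-fragment
                members.append(seq_id)
                break
        else:
            clusters.append((seq, [seq_id]))
    return [members for _, members in clusters]


# Invented example data
example = {
    "P1": "MKTAYIAKQR",
    "P2": "MKTAYIAKQR",  # identical to P1
    "P3": "TAYIAK",      # fragment of P1
    "P4": "MSSHHHH",     # unrelated sequence
}
print(cluster_identical_and_fragments(example))
```

With the example data, four input sequences collapse into two entries, which is the effect that reduces redundant hits in a search.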