Protein Data Bank
From PDBWiki
The Protein Data Bank (PDB) is the worldwide repository of biological macromolecular structural data.
Overview
For simplicity, one can think of the PDB as a database of protein and nucleic acid structures. However, it is important to keep in mind that PDB entries are actually descriptions of structure determination experiments and their results. While such an experiment usually aims to determine the structure of a protein or other macromolecule, its result is only a model of the physical molecule.
The PDB was one of the first central repositories for biological data, preceding similar databases such as GenBank for genomic sequences and PubMed for biomedical literature. It was also one of the first cases in which authors were required to deposit their experimental data in a central database before publication [5]. The PDB is now the single worldwide archive of structural data of biological macromolecules [3].
As of 22 April 2008, the PDB contains 50,480 entries [1].
Data model
The PDB data model is based on the mmCIF data dictionary [8], a standard originally developed to describe X-ray crystallography experiments and later extended to other types of structure determination, notably nuclear magnetic resonance (NMR) spectroscopy. Formally, the PDB is the set of flat files available via FTP. However, a relational data model can be derived from the mmCIF data dictionary, and the websites provided by the wwPDB partners are based on such a relational database [5].
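To illustrate how a flat mmCIF file maps onto relational tables, here is a minimal sketch in Python. It assumes a heavily simplified subset of the format: one item per line, whitespace-separated values, simple quoting, and no multi-line text fields. The function name parse_mmcif and the sample entry DEMO are hypothetical and only serve the illustration; real files should be read with a full mmCIF parser.

```python
import shlex
from collections import defaultdict

# Hypothetical toy entry; the item names follow the mmCIF "_category.item" pattern.
SAMPLE = """\
data_DEMO
_entry.id DEMO
_exptl.method 'X-RAY DIFFRACTION'
loop_
_atom_site.id
_atom_site.label_atom_id
_atom_site.label_comp_id
_atom_site.label_seq_id
1 N MET 1
2 CA MET 1
3 N ALA 2
"""

def parse_mmcif(text):
    """Return {category: [row_dict, ...]} for a simplified mmCIF subset.

    Each category plays the role of a table, each item that of a column.
    """
    tables = defaultdict(list)
    loop_items = None   # column names while inside a loop_ block, else None
    in_header = False   # True while the loop's column names are being read
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or line.startswith("data_"):
            continue
        if line == "loop_":
            loop_items, in_header = [], True
            continue
        if line.startswith("_"):
            tokens = shlex.split(line)
            if loop_items is not None and in_header and len(tokens) == 1:
                loop_items.append(tokens[0])      # another column of the loop
                continue
            loop_items = None                     # a plain key-value pair ends any loop
            if len(tokens) < 2:
                continue                          # value on a later line: not handled here
            category, item = tokens[0][1:].split(".", 1)
            if not tables[category]:
                tables[category].append({})       # single-row table
            tables[category][0][item] = tokens[1]
            continue
        if loop_items:
            in_header = False
            values = shlex.split(line)
            category = loop_items[0][1:].split(".", 1)[0]
            row = {name.split(".", 1)[1]: value
                   for name, value in zip(loop_items, values)}
            tables[category].append(row)
    return dict(tables)

tables = parse_mmcif(SAMPLE)
for row in tables["atom_site"]:
    print(row["id"], row["label_comp_id"], row["label_atom_id"])
```

In this sketch the category (for example atom_site) corresponds to a table and each item (for example label_atom_id) to a column, which is the sense in which a relational model can be derived from the dictionary.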
History
The PDB was initiated in 1971 by the Brookhaven National Laboratory in New York [2]. Some authors still refer to it as the Brookhaven Protein Data Bank, even though it has not been associated with Brookhaven since 1999. Since then, the PDB has been maintained by the Research Collaboratory for Structural Bioinformatics (RCSB). The RCSB, together with the MSD group at the European Bioinformatics Institute (EBI), PDBj in Japan and the BMRB in the USA, now forms the Worldwide Protein Data Bank (wwPDB) organization, which was announced in 2003 [4]. The wwPDB members maintain and develop the PDB as the single, publicly available archive of macromolecular structural data.
Recent developments
The PDB data was recently cleaned up as part of the PDB remediation project [7], and the remediated files were officially released in August 2007. The major goals of the project were to verify and update annotations, to standardize the naming of chemical components and to fix inconsistencies in residue numbering.
Criticism
Despite the recent remediation effort, PDB files still contain numerous inconsistencies and errors, which was one of the motivations for creating PDBWiki. Typical problems include violations of the PDB format specification, inconsistent residue numbering and missing values for experimental parameters. While most of these errors were probably introduced during data submission, the data model itself has also been criticized [6]. Finally, many authors have pointed out problems with the experimental data in the PDB or with its interpretation by the submitters [9,10].
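As an illustration of one such problem, the following Python sketch flags places where residue numbers in the ATOM records of a PDB-format file do not run consecutively within a chain. It relies on the fixed-column layout of ATOM records (chain identifier in column 22, residue number in columns 23-26). The function names and the sample records are hypothetical, and insertion codes are ignored. A flagged gap is not necessarily an error, since it may simply correspond to residues that were not observed in the experiment.

```python
def atom_line(serial, name, resname, chain, resseq):
    """Build a simplified ATOM record with dummy coordinates (illustration only)."""
    return (f"ATOM  {serial:>5} {name:<4} {resname:>3} {chain}{resseq:>4}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}")

def residue_numbering_issues(pdb_lines):
    """Return (chain, previous, current, kind) tuples for suspicious numbering."""
    issues = []
    last_seen = {}                      # chain ID -> last residue number seen
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        chain = line[21]                # column 22
        field = line[22:26]             # columns 23-26
        try:
            resseq = int(field)
        except ValueError:
            issues.append((chain, None, field, "unparseable"))
            continue
        previous = last_seen.get(chain)
        if previous is not None and resseq not in (previous, previous + 1):
            kind = "gap" if resseq > previous + 1 else "decrease"
            issues.append((chain, previous, resseq, kind))
        last_seen[chain] = resseq
    return issues

# Hypothetical chain A numbered 1, 2, 5, 4
sample = [
    atom_line(1, "N", "MET", "A", 1),
    atom_line(2, "CA", "MET", "A", 1),
    atom_line(3, "N", "ALA", "A", 2),
    atom_line(4, "N", "GLY", "A", 5),
    atom_line(5, "N", "SER", "A", 4),
]
for issue in residue_numbering_issues(sample):
    print(issue)
```

A check of this kind only reports candidates for manual inspection; deciding whether a given gap or decrease is an actual annotation error requires looking at the entry itself.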
References
[1] http://www.pdb.org/
[2] F.C. Bernstein, T.F. Koetzle, G.J.B. Williams, E.F. Meyer Jr, M.D. Brice, J.R. Rodgers, O. Kennard, T. Shimanouchi, M. Tasumi: The Protein Data Bank: a computer-based archival file for macromolecular structures. J. Mol. Biol. 112, pp. 535-542 (1977).
[3] H.M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T.N. Bhat, H. Weissig, I.N. Shindyalov, P.E. Bourne: The Protein Data Bank. Nucleic Acids Research 28, pp. 235-242 (2000).
[4] H.M. Berman, K. Henrick, H. Nakamura: Announcing the worldwide Protein Data Bank. Nature Structural Biology 10 (12), p. 980 (2003).
[5] H.M. Berman, K. Henrick, H. Nakamura, J. Markley, P.E. Bourne, J. Westbrook: Realism about PDB. Nature Biotechnology 25, pp. 845-846 (2007).
[6] A.C. Schierz, L.N. Soldatova, R.D. King: Overhauling the PDB. Nature Biotechnology 25 (4), pp. 437-442 (2007).
[7] http://remediation.wwpdb.org
[8] P.E. Bourne, H.M. Berman, B. McMahon, K.D. Watenpaugh, J. Westbrook, P.M.D. Fitzgerald: The Macromolecular Crystallographic Information File (mmCIF). Meth. Enzymol. 277, pp. 571-590 (1997).
[9] R.W. Hooft, G. Vriend, C. Sander, E.E. Abola: Errors in protein structures. Nature 381 (6580), p. 272 (1996).
[10] C.X. Weichenberger, M.J. Sippl: Recognition and Correction of Erroneous Asparagine and Glutamine Side Chain Rotamers in Protein Structures. Nucleic Acids Res. 35, pp. W403-W406 (2007).
See also
- PDB FAQ: frequently asked questions about the PDB and working with macromolecular models
- The Protein Data Bank article in Wikipedia
- wwPDB FAQ
