The Pfam protein families database

Alex Bateman

Lachlan Coin

Richard Durbin

Robert Finn

Simon Moxon and 4 other authors

Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK; Center for Genomics and Bioinformatics, Karolinska Institutet, S-171 77 Stockholm, Sweden; and Howard Hughes Medical Institute and Department of Genetics, Washington University School of Medicine, St Louis, MO 63110, USA

To whom correspondence should be addressed. Tel: +44 1223 494950; Fax: +44 1223 494919; Email: agb@sanger.ac.uk

Received 2003 Sep 17; Accepted 2003 Oct 20.

Abstract

Pfam is a large collection of protein families and domains. Over the past 2 years the number of families in Pfam has doubled and now stands at 6190 (version 10.0). We describe methodological improvements for searching the Pfam collection both locally and via the web. Other recent innovations include the modelling of discontinuous domains, which brings Pfam domain definitions closer to those found in structure databases. Pfam is available on the web in the UK (http://www.sanger.ac.uk/Software/Pfam/), the USA (http://pfam.wustl.edu/), France (http://pfam.jouy.inra.fr/) and Sweden (http://Pfam.cgb.ki.se/).
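To illustrate why allowing discontinuous domains can bring sequence-based domain definitions closer to structural ones, the Python sketch below assumes that a domain is described simply as a list of inclusive residue ranges. This representation, the made-up coordinates and the choice of the Jaccard index as an agreement measure are all illustrative assumptions, not Pfam's own data format or method. A structure-style domain is interrupted by an inserted domain at residues 51-119; a model forced to be one contiguous span absorbs the insert, while a two-segment model does not.

```python
from typing import List, Set, Tuple

# Hypothetical representation: a domain is a list of inclusive
# (start, end) residue ranges on a single protein sequence.
Segment = Tuple[int, int]


def residues(segments: List[Segment]) -> Set[int]:
    """Return the set of residue positions covered by the segments."""
    covered: Set[int] = set()
    for start, end in segments:
        covered.update(range(start, end + 1))
    return covered


def jaccard(a: List[Segment], b: List[Segment]) -> float:
    """Shared residues divided by the union of residues (Jaccard index)."""
    ra, rb = residues(a), residues(b)
    union = ra | rb
    return len(ra & rb) / len(union) if union else 1.0


# Made-up coordinates: a structural domain split by an inserted
# domain that occupies residues 51-119.
structural = [(10, 50), (120, 200)]
contiguous_model = [(10, 200)]                  # single span, includes the insert
discontinuous_model = [(10, 50), (120, 200)]    # two segments, insert excluded

print(round(jaccard(structural, contiguous_model), 3))   # → 0.639
print(jaccard(structural, discontinuous_model))          # → 1.0
```

In this toy case the contiguous model shares 122 of 191 residues with the structural definition, whereas the two-segment model matches it exactly.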

ACKNOWLEDGEMENTS

We would like to thank William Mifsud, Nicola Kerrison, David Waterfield and Ben Vella Briffa for adding many of the new families to Pfam. We are grateful to Kevin Howe for useful discussions and advice. We would also like to thank Timo Lassmann and Markus Wistrand for help maintaining the Swedish Pfam website and Lorenzo Cerutti for maintaining the French Pfam website. This work was funded by The Wellcome Trust and an MRC (UK) E-science grant.
