Gene ontology: tool for the unification of biology. The Gene Ontology Consortium.
Journal: 2000/June - Nature Genetics
ISSN: 1061-4036
Nat Genet 25(1): 25-29

Gene Ontology: tool for the unification of biology

FlyBase
Berkeley Drosophila Genome Project
Saccharomyces Genome Database
Mouse Genome Database and Gene Expression Database
Correspondence should be addressed to J.M.C. (cherry@stanford.edu) and D.B. (botstein@genome.stanford.edu), Department of Genetics, Stanford University School of Medicine, Stanford, California, USA.


Genomic sequencing has made it clear that a large fraction of the genes specifying the core biological functions are shared by all eukaryotes. Knowledge of the biological role of such shared proteins in one organism can often be transferred to other organisms. The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing. To this end, three independent ontologies accessible on the World-Wide Web are being constructed: biological process, molecular function and cellular component.


The accelerating availability of molecular sequences, particularly the sequences of entire genomes, has transformed both the theory and practice of experimental biology. Where once biochemists characterized proteins by their diverse activities and abundances, and geneticists characterized genes by the phenotypes of their mutations, all biologists now acknowledge that there is likely to be a single limited universe of genes and proteins, many of which are conserved in most or all living cells. This recognition has fuelled a grand unification of biology; the information about the shared genes and proteins contributes to our understanding of all the diverse organisms that share them. Knowledge of the biological role of such a shared protein in one organism can certainly illuminate, and often provide strong inference of, its role in other organisms.

Progress in the way that biologists describe and conceptualize the shared biological elements has not kept pace with sequencing. For the most part, the current systems of nomenclature for genes and their products remain divergent even when the experts appreciate the underlying similarities. Interoperability of genomic databases is limited by this lack of progress, and it is this major obstacle that the Gene Ontology (GO) Consortium was formed to address.

