Gene Ontology: tool for the unification of biology
Genomic sequencing has made it clear that a large fraction of the genes specifying the core biological functions are shared by all eukaryotes. Knowledge of the biological role of such shared proteins in one organism can often be transferred to other organisms. The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing. To this end, three independent ontologies accessible on the World-Wide Web (http://www.geneontology.org) are being constructed: biological process, molecular function and cellular component.
The accelerating availability of molecular sequences, particularly the sequences of entire genomes, has transformed both the theory and practice of experimental biology. Where once biochemists characterized proteins by their diverse activities and abundances, and geneticists characterized genes by the phenotypes of their mutations, all biologists now acknowledge that there is likely to be a single limited universe of genes and proteins, many of which are conserved in most or all living cells. This recognition has fuelled a grand unification of biology; the information about the shared genes and proteins contributes to our understanding of all the diverse organisms that share them. Knowledge of the biological role of such a shared protein in one organism can certainly illuminate, and often provide strong inference of, its role in other organisms.
Progress in the way that biologists describe and conceptualize the shared biological elements has not kept pace with sequencing. For the most part, the current systems of nomenclature for genes and their products remain divergent even when the experts appreciate the underlying similarities. Interoperability of genomic databases is limited by this lack of progress, and it is this major obstacle that the Gene Ontology (GO) Consortium was formed to address.