Large-scale analysis of the human and mouse transcriptomes
Abstract
High-throughput gene expression profiling has become an important tool for investigating transcriptional activity in a variety of biological samples. To date, the vast majority of these experiments have focused on specific biological processes and perturbations. Here, we have generated and analyzed gene expression from a set of samples spanning a broad range of biological conditions. Specifically, we profiled gene expression from 91 human and mouse samples across a diverse array of tissues, organs, and cell lines. Because these samples predominantly come from the normal physiological state in the human and mouse, this dataset represents a preliminary, but substantial, description of the normal mammalian transcriptome. We have used this dataset to illustrate methods of mining these data, and to reveal insights into molecular and physiological gene function, mechanisms of transcriptional regulation, disease etiology, and comparative genomics. Finally, to allow the scientific community to use this resource, we have built a free and publicly accessible website (http://expression.gnf.org) that integrates data visualization and curation of current gene annotations.
The sequence of the first mammalian genome represents a landmark in modern biology and opens new avenues for global approaches to understanding gene function and its relationship to human physiology (1, 2). The raw genome sequence and the accompanying gene predictions provide a starting point for understanding the function of genes, the complexity of their interactions, and their roles in promoting cellular and organismal phenotypes. The most common approach to global gene annotation uses primary amino acid sequence analysis tools (e.g., BLAST and HMMER) and sequence databases (e.g., GenBank and Pfam; refs. 3–6). These powerful tools are used to annotate genes of unknown function under the premise that proteins of similar structure usually have similar function (e.g., kinases contain kinase domains).
Whereas primary sequence analysis frequently indicates the molecular function of a gene and can point to relevant biochemical assays for future study, it does not suggest the cellular or physiological role for proteins. To attempt to gain a more complete picture of a novel gene's function, researchers often perform multiple-tissue Northern blots to look at its expression in a panel of tissues or organs. However, this experiment can be laborious and time-consuming, and availability of a representative number of tissue samples is an important factor for interpretation of the results.
High-throughput gene expression analysis has allowed us to construct the equivalent of a multiple-tissue Northern blot for thousands of genes at once. We have constructed such a resource by profiling 46 human and 45 mouse samples from diverse tissue origins. Whereas several recent studies have also described high-throughput gene expression measurements on diverse tissue sets (7–9), previous analyses of physiological gene function have been limited to identification of housekeeping genes, and clustering of genes involved in metabolic pathways and development of the central nervous system. The analysis of the data described in the current work has a significantly different and expanded scope. Here, we use mRNA expression patterns to augment the annotation of genes with no known physiological function. Furthermore, we extend this analysis to investigate mechanisms of transcriptional regulation, to discover candidate disease markers, and to compare transcriptional profiles of gene orthologs in mouse and human. Finally, we have constructed a web resource that allows users to easily perform common queries on the data. Because these data are generated from a non-ratiometric and standardized genomic technology, expansion of this dataset in our continuing effort toward elucidating the transcriptome will easily allow inclusion of additional gene expression data from internal samples as well as those contributed by external collaborators.
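The idea of a multiple-tissue Northern blot for many genes at once can be pictured as a gene-by-tissue table of expression values. The following is only a minimal sketch of two common queries on such a table, not the analysis used in this work: the gene names, tissue names, and numbers are hypothetical placeholders, and the threshold of 3-fold over the median across tissues is an assumption chosen for illustration.

```python
from statistics import median

# Hypothetical toy table: expression (arbitrary units) per gene per tissue.
# Names and values are placeholders for illustration, not measurements.
EXPRESSION = {
    "geneA": {"liver": 900.0, "brain": 50.0, "heart": 60.0, "testis": 40.0},
    "geneB": {"liver": 210.0, "brain": 200.0, "heart": 190.0, "testis": 205.0},
    "geneC": {"liver": 30.0, "brain": 35.0, "heart": 25.0, "testis": 600.0},
}


def tissue_profile(gene):
    """Return (tissue, value) pairs for one gene, highest expression first.

    This is the analogue of reading one gene's lanes on a multiple-tissue blot.
    """
    return sorted(EXPRESSION[gene].items(), key=lambda kv: kv[1], reverse=True)


def enriched_genes(tissue, fold=3.0):
    """Return genes whose value in `tissue` is at least `fold` times
    their own median across all tissues (assumed, illustrative criterion)."""
    hits = []
    for gene, values in EXPRESSION.items():
        baseline = median(values.values())
        if baseline > 0 and values[tissue] >= fold * baseline:
            hits.append(gene)
    return sorted(hits)


if __name__ == "__main__":
    print(tissue_profile("geneA"))
    print(enriched_genes("liver"))
```

In this toy table, a gene with roughly uniform expression (such as the placeholder `geneB`) is never reported as enriched, which mirrors the intuition that broadly expressed genes carry little tissue-specific information.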
Acknowledgments
We thank David Lockhart and Lisa Wodicka for helpful discussions, and Jennifer Villasenor for excellent technical assistance. We also thank Cheng Li and Wing Hung Wong for statistical advice, and Martha Bulyk for helpful comments and suggestions. A.I.S. acknowledges the Achievement Rewards for College Scientists (ARCS) Foundation of San Diego and the La Jolla Interfaces in Science Program for predoctoral support.
Abbreviations
| AD | average difference |
| GPCR | G protein-coupled receptor |
References
- 1. Lander E S, Linton L M, Birren B, Nusbaum C, Zody M C, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, et al. Nature (London). 2001;409:860–921.
- 2. Venter J C, Adams M D, Myers E W, Li P W, Mural R J, Sutton G G, Smith H O, Yandell M, Evans C A, Holt R A, et al. Science. 2001;291:1304–1351.
- 3. Altschul S F, Gish W, Miller W, Myers E W, Lipman D J. J Mol Biol. 1990;215:403–410.
- 4. Eddy S R, Mitchison G, Durbin R. J Comput Biol. 1995;2:9–23.
- 5. Burks C, Fickett J W, Goad W B, Kanehisa M, Lewitter F I, Rindone W P, Swindell C D, Tung C S, Bilofsky H S. Comput Appl Biosci. 1985;1:225–233.
- 6. Sonnhammer E L, Eddy S R, Durbin R. Proteins. 1997;28:405–420.
- 7. Warrington J A, Nair A, Mahadevappa M, Tsyganskaya M. Physiol Genomics. 2000;2:143–147.
- 8. Miki R, Kadota K, Bono H, Mizuno Y, Tomaru Y, Carninci P, Itoh M, Shibata K, Kawai J, Konno H, et al. Proc Natl Acad Sci USA. 2001;98:2199–2204.
- 9. Penn S G, Rank D R, Hanzel D K, Barker D L. Nat Genet. 2000;26:315–318.
- 10. Lockhart D J, Dong H, Byrne M C, Follettie M T, Gallo M V, Chee M S, Mittmann M, Wang C, Kobayashi M, Horton H, Brown E L. Nat Biotechnol. 1996;14:1675–1680.
- 11. Wodicka L, Dong H, Mittmann M, Ho M H, Lockhart D J. Nat Biotechnol. 1997;15:1359–1367.
- 12. Sandberg R, Yasuda R, Pankratz D G, Carter T A, Del Rio J A, Wodicka L, Mayford M, Lockhart D J, Barlow C. Proc Natl Acad Sci USA. 2000;97:11038–11043.
- 13. Hughes J D, Estep P W, Tavazoie S, Church G M. J Mol Biol. 2000;296:1205–1214.
- 14. Welsh J B, Sapinoso L M, Su A I, Kern S G, Wang-Rodriguez J, Moskaluk C A, Frierson H F, Jr, Hampton G M. Cancer Res. 2001;61:5974–5978.
- 15. Nagase T, Ishikawa K, Suyama M, Kikuno R, Miyajima N, Tanaka A, Kotani H, Nomura N, Ohara O. DNA Res. 1998;5:277–286.
- 16. Sallese M, Mariggio S, Collodel G, Moretti E, Piomboni P, Baccetti B, De Blasi A. J Biol Chem. 1997;272:10188–10195.
- 17. Rosenfeld M G, Briata P, Dasen J, Gleiberman A S, Kioussi C, Lin C, O'Connell S M, Ryan A, Szeto D P, Treier M. Recent Prog Horm Res. 2000;55:1–13.
- 18. McGuire A M, Hughes J D, Church G M. Genome Res. 2000;10:744–757.
- 19. Harmer S L, Hogenesch J B, Straume M, Chang H S, Han B, Zhu T, Wang X, Kreps J A, Kay S A. Science. 2000;290:2110–2113.
- 20. Scully K M, Jacobson E M, Jepsen K, Lunyak V, Viadiu H, Carriere C, Rose D W, Hooshmand F, Aggarwal A K, Rosenfeld M G. Science. 2000;290:1127–1131.
- 21. Mauro M J, Druker B J. Curr Opin Oncol. 2001;13:3–7.
- 22. Lin B, Ferguson C, White J T, Wang S, Vessella R, True L D, Hood L, Nelson P S. Cancer Res. 1999;59:4180–4184.
- 23. Eklund L, Muona A, Lietard J, Pihlajaniemi T. Matrix Biol. 2000;19:489–500.
- 24. Eklund L, Piuhola J, Komulainen J, Sormunen R, Ongvarrasopone C, Fassler R, Muona A, Ilves M, Ruskoaho H, Takala T E, Pihlajaniemi T. Proc Natl Acad Sci USA. 2001;98:1194–1199.
- 25. Eisen M B, Spellman P T, Brown P O, Botstein D. Proc Natl Acad Sci USA. 1998;95:14863–14868.




