Multiclass cancer diagnosis using tumor gene expression signatures.
Journal: 2002/January - Proceedings of the National Academy of Sciences of the United States of America
ISSN: 0027-8424
Abstract:
The optimal treatment of patients with cancer depends on establishing accurate diagnoses by using a complex combination of clinical and histopathological data. In some instances, this task is difficult or impossible because of atypical clinical presentation or histopathology. To determine whether the diagnosis of multiple common adult malignancies could be achieved purely by molecular classification, we subjected 218 tumor samples, spanning 14 common tumor types, and 90 normal tissue samples to oligonucleotide microarray gene expression analysis. The expression levels of 16,063 genes and expressed sequence tags were used to evaluate the accuracy of a multiclass classifier based on a support vector machine algorithm. Overall classification accuracy was 78%, far exceeding the accuracy of random classification (9%). Poorly differentiated cancers resulted in low-confidence predictions and could not be accurately classified according to their tissue of origin, indicating that they are molecularly distinct entities with dramatically different gene expression patterns compared with their well differentiated counterparts. Taken together, these results demonstrate the feasibility of accurate, multiclass molecular cancer classification and suggest a strategy for future clinical implementation of molecular cancer diagnostics.
Proc Natl Acad Sci U S A 98(26): 15149-15154

+6 authors
Whitehead Institute/Massachusetts Institute of Technology Center for Genome Research, Cambridge, MA 02138; Departments of Adult and Pediatric Oncology, Dana–Farber Cancer Institute/Harvard Medical School, Boston, MA 02115; Department of Pathology, Brigham and Women's Hospital, Boston, MA 02115; Department of Pathology, Memorial Sloan–Kettering Cancer Center, New York, NY 10021; and Departments of Biology, McGovern Institute, Center for Brain and Computational Learning, and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139
To whom reprint requests should be addressed at: Dana–Farber Cancer Institute, 44 Binney Street, Dana 640, Boston, MA 02115. E-mail: golub@genome.wi.mit.edu.
Contributed by Eric S. Lander
Accepted 2001 Oct 23.

Cancer classification relies on the subjective interpretation of both clinical and histopathological information with an eye toward placing tumors in currently accepted categories based on the tissue of origin of the tumor. However, clinical information can be incomplete or misleading. In addition, there is a wide spectrum in cancer morphology and many tumors are atypical or lack morphologic features that are useful for differential diagnosis (1). These difficulties can result in diagnostic confusion, prompting calls for mandatory second opinions in all surgical pathology cases (2). In the aggregate, these are significant limitations that may hinder patient care, add expense, and confound the results of clinical trials.

Molecular diagnostics offer the promise of precise, objective, and systematic human cancer classification, but these tests are not widely applied because characteristic molecular markers for most solid tumors have yet to be identified (3). Recently, DNA microarray-based tumor gene expression profiles have been used for cancer diagnosis. However, studies have been limited to a few cancer types and have spanned multiple technology platforms, complicating comparison among different datasets (4–10). The feasibility of cancer diagnosis across all of the common malignancies based on a single reference database has not been explored. In addition, comprehensive gene expression databases have yet to be developed, and there are no established analytical methods capable of solving complex, multiclass, gene expression-based classification problems.

To address these challenges, we created a gene expression database containing the expression profiles of 218 tumor samples representing 14 common human cancer classes. By using an innovative analytical method, we demonstrate that accurate multiclass cancer classification is indeed possible, suggesting the feasibility of molecular cancer diagnosis by means of comparison with a comprehensive and commonly accessible catalog of gene expression profiles.
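
The abstract describes a multiclass classifier built on support vector machines, and the abbreviations list names a one vs. all (OVA) scheme. The following is a minimal Python sketch of a generic OVA decision rule, not the authors' implementation. It assumes each binary "class vs. rest" classifier is a linear decision function f(x) = w·x + b, which is the form a linear SVM produces. The class labels, weights, biases, and three-gene samples are invented for illustration, and treating the winning score as a confidence value is an assumption of this sketch.

```python
from typing import Dict, List, Tuple

# One hypothetical linear decision function per class: f_c(x) = w_c . x + b_c.
# All labels, weights, and biases below are illustrative, not from the study.
Classifier = Tuple[List[float], float]


def decision_value(clf: Classifier, x: List[float]) -> float:
    """Signed score of one binary 'class vs. rest' classifier."""
    w, b = clf
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def ova_predict(classifiers: Dict[str, Classifier],
                x: List[float]) -> Tuple[str, float]:
    """Return the class whose classifier scores highest, plus that score."""
    scores = {label: decision_value(clf, x) for label, clf in classifiers.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


classifiers = {
    "type_A": ([1.0, -0.5, 0.0], -0.2),
    "type_B": ([-0.5, 1.0, 0.0], -0.2),
    "type_C": ([0.0, -0.5, 1.0], -0.2),
}

# A sample that one classifier claims clearly (large positive score).
clear_label, clear_score = ova_predict(classifiers, [0.2, 1.5, 0.1])

# A sample that no classifier claims strongly (winning score near zero).
weak_label, weak_score = ova_predict(classifiers, [0.3, 0.2, 0.1])

print(clear_label, round(clear_score, 2))
print(weak_label, round(weak_score, 2))
```

In this toy setup the first sample receives a clearly positive winning score, whereas the second wins with a score close to zero. A near-zero winning score is one simple way to flag a low-confidence call of the kind the abstract reports for poorly differentiated tumors.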

Acknowledgments

We thank Scott Pomeroy, Margaret Shipp, Raphael Bueno, Kevin Loughlin, and Phil Febbo for contributing tumor samples to this study. We thank David Waltregny for initial review of pathology, Christine Huard and Michelle Gaasenbeek for expert technical assistance, and Leslie Gaffney for insightful editorial review. We are also indebted to members of the Cancer Genomics Group (Whitehead/Massachusetts Institute of Technology Center for Genome Research) and the Golub Laboratory (Dana–Farber Cancer Institute) for many valuable discussions. This work was supported in part by a Harvard/National Institutes of Health training grant in Molecular Hematology (S.R.) and by grants from Affymetrix, Millennium Pharmaceuticals (Cambridge, MA), and Bristol-Myers Squibb (E.S.L.).

Abbreviations

SVM: support vector machine
OVA: one vs. all
S2N: signal to noise
Note Added in Proof.

Recently, Su et al. (30) also reported using human tumor gene expression profiles to distinguish a number of carcinoma classes.

References

  • 1. Ramaswamy S, Osteen R T, Shulman L N. In: Clinical Oncology. Lenhard R E, Osteen R T, Gansler T, editors. Atlanta: Am. Cancer Soc.; 2001. pp. 711–719.
  • 2. Tomaszewski J E, LiVolsi V A. Cancer. 1999;86:2198–2200.
  • 3. Connolly J L, Schnitt S J, Wang H H, Dvorak A M, Dvorak H F. In: Cancer Medicine. Holland J F, Frei E, Bast R C, Kufe D W, Morton D L, Weichselbaum R R, editors. Baltimore: Williams & Wilkins; 1997. pp. 533–555.
  • 4. Golub T R, Slonim D K, Tamayo P, Huard C, Gaasenbeek M, Mesirov J P, Coller H, Loh M L, Downing J R, Caligiuri M A, et al. Science. 1999;286:531–537.
  • 5. Alizadeh A A, Eisen M B, Davis R E, Ma C, Lossos I S, Rosenwald A, Boldrick J C, Sabet H, Tran T, Yu X, et al. Nature (London). 2000;403:503–511.
  • 6. Bittner M, Meltzer P, Chen Y, Jiang Y, Seftor E, Hendrix M, Radmacher M, Simon R, Yakhini Z, Ben-Dor A, et al. Nature (London). 2000;406:536–540.
  • 7. Perou C M, Sorlie T, Eisen M B, van de Rijn M, Jeffrey S S, Rees C A, Pollack J R, Ross D T, Johnsen H, Akslen L A, et al. Nature (London). 2000;406:747–752.
  • 8. Hedenfalk I, Duggan D, Chen Y, Radmacher M, Simon R, Meltzer P, Gusterson B, Esteller M, Kallioniemi O P, Wilfond B, et al. N Engl J Med. 2001;344:539–548.
  • 9. Khan J, Wei J S, Ringner M, Saal L H, Ladanyi M, Westermann F, Berthold F, Schwab M, Antonescu C R, Peterson C, et al. Nat Med. 2001;7:673–679.
  • 10. Dhanasekaran S M, Barrette T R, Ghosh D, Shah R, Varambally S, Kurachi K, Pienta K J, Rubin M A, Chinnaiyan A M. Nature. 2001;412:822–826.
  • 11. Eisen M B, Spellman P T, Brown P O, Botstein D. Proc Natl Acad Sci USA. 1998;95:14863–14868.
  • 12. Tamayo P, Slonim D, Mesirov J, Zhu Q, Kitareewan S, Dmitrovsky E, Lander E S, Golub T R. Proc Natl Acad Sci USA. 1999;96:2907–2912.
  • 13. Guyon I, Weston J, Barnhill S, Vapnik V. Mach Learn. 2002; in press.
  • 14. Hair J F, Anderson R E, Tatham R L, Black W C. Multivariate Data Analysis. Englewood Cliffs, NJ: Prentice–Hall; 1998.
  • 15. Slonim D K. Proceedings of the Fourth Annual International Conference on Computational Molecular Biology. Tokyo: Universal Acad. Press; 2000. pp. 263–272.
  • 16. Dasarathy V B. NN Pattern Classification Techniques. Los Alamitos, CA: IEEE Comp. Soc. Press; 1991.
  • 17. Brown M P, Grundy W N, Lin D, Christianini N, Sugnet C W, Furey T S, Ares M, Haussler D. Proc Natl Acad Sci USA. 2000;97:262–267.
  • 18. Furey T, Christianini N, Duffy N, Bednarski D W, Schummer M, Haussler D. Bioinformatics. 2000;16:906–914.
  • 19. Vapnik V N. Statistical Learning Theory. New York: Wiley; 1998.
  • 20. Evgeniou T, Pontil M, Poggio T. Adv Comput Math. 2000;13:1–50.
  • 21. Hainsworth J D, Greco F A. N Engl J Med. 1993;329:257–263.
  • 22. Chapelle O, Vapnik V, Bousquet O, Mukherjee S. Mach Learn. 2002; in press.
  • 23. Hastie T, Tibshirani R, Eisen M B, Alizadeh A, Levy R, Staudt L, Chan W C, Botstein D, Brown P. Genome Biol. 2000;1:RESEARCH003.
  • 24. Taipale J, Beachy P A. Nature (London). 2001;411:349–354.
  • 25. Lickert H, Domon C, Huls G, Wehrle C, Duluc I, Clevers H, Meyer B I, Freund J N, Kemler R. Development (Cambridge, UK). 2000;127:3805–3813.
  • 26. Ziemer L T, Pennica D, Levine A J. Mol Cell Biol. 2001;21:562–574.
  • 27. Bienz M, Clevers H. Cell. 2000;103:311–320.
  • 28. Scherf U, Ross D T, Waltham M, Smith L H, Lee J K, Tanabe L, Kohn K W, Reinhold W C, Myers T G, Andrews D T, et al. Nat Genet. 2000;24:236–244.
  • 29. Staunton J E, Slonim D K, Coller H A, Tamayo P, Angelo M J, Park J, Scherf U, Lee J K, Reinhold W O, Weinstein J N, et al. Proc Natl Acad Sci USA. 2001;98:10787–10792.
  • 30. Su A I, Welsh J B, Sapinoso L M, Kern S G, Dimitrov P, Lapp H, Schultz P G, Powell S M, Moskaluk C A, Frierson H F Jr, Hampton G M. Cancer Res. 2001;61:7388–7393.