A Study of Biomedical Concept Identification: MetaMap vs. People
Abstract
Although huge amounts of unstructured text are available as a rich source of biomedical knowledge, processing this unstructured knowledge requires tools that identify concepts in free-form text. MetaMap is one tool that system developers in biomedicine have commonly used for this task, but few have studied how well it performs in general. In this paper, we report on a study that compares MetaMap’s performance against that of six people. Such studies are challenging because the task is inherently subjective and establishing consensus is difficult. Nonetheless, for the concepts that subjects generally agreed on, MetaMap identified most of them, provided they were represented in the UMLS. However, MetaMap also identified many concepts that people did not. We also report on our analysis of the types of failures that MetaMap exhibited and on trends in the way people chose to identify concepts.
Acknowledgments
We thank the physicians and nurses who participated in this study. Thanks also to Lelia Arnheim for the data entry and consistency checking. This work was supported by a grant from the National Science Foundation.

