Usability-driven pruning of large ontologies: the case of SNOMED CT

López-García, Pablo; Boeker, Martin; Illarramendi, Arantza; Schulz, Stefan

Published in

Oxford University Press, JAMIA: A Scholarly Journal of Informatics in Health and Biomedicine, e1(19), p. e102-e109, 2012

DOI: 10.1136/amiajnl-2011-000503


Journal article published in 2012 by Pablo López-García, Martin Boeker, Arantza Illarramendi, Stefan Schulz

This paper is made freely available by the publisher.

Preprint: archiving allowed

Postprint: archiving restricted

Published version: archiving forbidden

Abstract

OBJECTIVES: To study ontology modularization techniques when applied to SNOMED CT in a scenario in which no previous corpus of information exists and to examine if frequency-based filtering using MEDLINE can reduce subset size without discarding relevant concepts.

MATERIALS AND METHODS: Subsets were first extracted using four graph-traversal heuristics and one logic-based technique, and were subsequently filtered with frequency information from MEDLINE. Twenty manually coded discharge summaries from cardiology patients were used as signatures and test sets. The coverage, size, and precision of extracted subsets were measured.

RESULTS: Graph-traversal heuristics provided high coverage (71-96% of terms in the test sets of discharge summaries) at the expense of subset size (17-51% of the size of SNOMED CT). Pre-computed subsets and logic-based techniques extracted small subsets (1%), but coverage was limited (24-55%). Filtering reduced the size of large subsets to 10% while still providing 80% coverage.

DISCUSSION: Extracting subsets to annotate discharge summaries is challenging when no previous corpus exists. Ontology modularization provides valuable techniques, but the resulting modules grow as signatures spread across subhierarchies, yielding a very low precision.

CONCLUSION: Graph-traversal strategies and frequency data from an authoritative source can prune large biomedical ontologies and produce useful subsets that still exhibit acceptable coverage. However, a clinical corpus closer to the specific use case is preferred when available.
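The evaluation measures and the filtering step named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define the measures formally, so the definitions below are assumptions (coverage as the share of test-set concepts found in the subset, precision as the share of subset concepts that occur in the test set, size relative to the whole ontology). The function names, the `min_count` threshold, and the toy concept identifiers and counts are all hypothetical.

```python
def coverage(subset, test_concepts):
    """Assumed definition: fraction of test-set concepts present in the subset."""
    test = set(test_concepts)
    return len(test & set(subset)) / len(test) if test else 0.0


def precision(subset, test_concepts):
    """Assumed definition: fraction of subset concepts that occur in the test set."""
    sub = set(subset)
    return len(sub & set(test_concepts)) / len(sub) if sub else 0.0


def relative_size(subset, ontology_size):
    """Subset size as a fraction of the full ontology size."""
    return len(set(subset)) / ontology_size


def filter_by_frequency(subset, frequency, min_count):
    """Keep concepts whose corpus frequency reaches a hypothetical threshold."""
    return {c for c in subset if frequency.get(c, 0) >= min_count}


# Toy data only: identifiers and counts are invented for illustration.
subset = {"A", "B", "C", "D", "E"}          # concepts extracted by some heuristic
test_concepts = {"A", "B", "F", "G"}        # concepts coded in a held-out document
frequency = {"A": 120, "B": 45, "C": 3, "D": 0}  # hypothetical corpus counts

filtered = filter_by_frequency(subset, frequency, min_count=10)

print("coverage before/after:", coverage(subset, test_concepts),
      coverage(filtered, test_concepts))
print("precision before/after:", precision(subset, test_concepts),
      precision(filtered, test_concepts))
print("relative size after filtering:", relative_size(filtered, ontology_size=10))
```

In this toy case the filter shrinks the subset while coverage stays the same, which mirrors the trade-off the abstract reports only qualitatively; real figures depend on the ontology, the corpus, and the threshold chosen.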
