Document Clustering of Clinical Narratives: a Systematic Study of Clinical Sublanguages

Patterson, Olga; Hurdle, John F.


Journal article published in 2011 by Olga Patterson, John F. Hurdle


Abstract

It is widely believed that different clinical domains use their own sublanguage in clinical notes, complicating natural language processing, but this has never been demonstrated on a broad selection of note types. Starting from formal sublanguage theory, we constructed a feature space based on vocabulary and semantic types used in 17 different clinical domains by three author types (physicians, nurses, and social workers) in both the inpatient and outpatient settings. We supplied the resulting vectors to CLUTO, a robust clustering tool suitable for this high-dimensional space. Our results confirm that note types with a broad clinical scope, e.g., History & Physicals and Discharge Summaries, cluster together, while note types with a narrow clinical scope form surprisingly pure, disjoint sublanguages. A reasonable conclusion from this study is that any tool relying on term statistics or semantics trained on one clinical note type may not work well on any other.
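The general idea of comparing note types through vocabulary vectors can be illustrated with a minimal sketch. It is not the authors' pipeline and does not use CLUTO: it assumes plain bag-of-words counts, cosine similarity, and a simple purity score. The function names (`term_vector`, `cosine`, `purity`) and the toy notes are hypothetical, invented only for illustration.

```python
from collections import Counter
import math

def term_vector(text):
    """Bag-of-words term counts for one note (lowercased, whitespace split)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def purity(cluster_ids, note_types):
    """Fraction of notes whose type is the majority type of their cluster."""
    by_cluster = {}
    for c, t in zip(cluster_ids, note_types):
        by_cluster.setdefault(c, Counter())[t] += 1
    majority = sum(max(counts.values()) for counts in by_cluster.values())
    return majority / len(note_types)

# Hypothetical toy notes, invented for illustration only (not study data).
notes = [
    ("cardiology", "ecg shows sinus rhythm ejection fraction normal"),
    ("cardiology", "ecg normal ejection fraction preserved"),
    ("social_work", "patient lives alone needs housing assistance"),
]
vectors = [term_vector(text) for _, text in notes]
same = cosine(vectors[0], vectors[1])  # two notes of the same type
diff = cosine(vectors[0], vectors[2])  # notes of different types
```

In this toy setting the two cardiology notes share several terms while the social work note shares none with them, which is the kind of vocabulary separation a clustering tool could exploit. A purity of 1.0 would mean every cluster contains a single note type.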
