Multi-label Text Classification of German Language Medical Documents.

Spat, Stephan; Cadonna, Bruno; Rakovac, Ivo; Gütl, Christian; Leitner, Hubert; Stark, Günther; Beck, Peter

Proceedings article published in 2007 by Stephan Spat, Bruno Cadonna, Ivo Rakovac, Christian Gütl, Hubert Leitner, Günther Stark, Peter Beck

Abstract

At nearly every patient visit, medical documents are produced and stored in a medical record, often in unstructured form as free text. The growing number of stored documents increases the need for effective and timely information retrieval. We developed a multi-label text classification system that categorizes free-text medical documents written in German (e.g. discharge letters, clinical findings, reports) into predefined classes. A random sample of 1,500 free-text medical documents was retrieved from a general hospital information system, and a domain expert manually assigned each document to between 1 and 8 categories. This sample was used to train and evaluate four classification schemes: Naïve Bayes, k-NN, SVM, and J48. We additionally tested the effect of text preprocessing. In our study, preprocessing improved performance, and the best results were obtained with J48 classification.
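To make the multi-label setting concrete, the following is a minimal sketch and not the authors' system. The abstract does not say how the multi-label problem was decomposed or which preprocessing steps were used, so the sketch assumes binary relevance (one independent yes/no classifier per category) with a multinomial Naïve Bayes model, and treats lowercasing and tokenization as a stand-in for preprocessing. The toy documents, the category names, and the names `tokenize` and `BinaryRelevanceNB` are all invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Assumed preprocessing: lowercase, keep runs of letters (incl. German umlauts and ß)."""
    return re.findall(r"[a-zäöüß]+", text.lower())


class BinaryRelevanceNB:
    """Hypothetical helper: one two-class multinomial Naive Bayes model per category."""

    def fit(self, docs, label_sets):
        self.labels = sorted(set().union(*label_sets))
        self.vocab = {tok for doc in docs for tok in tokenize(doc)}
        self.models = {}
        for label in self.labels:
            # Token counts and document counts for "has label" (True) and "lacks label" (False).
            counts = {True: Counter(), False: Counter()}
            n_docs = {True: 0, False: 0}
            for doc, labels in zip(docs, label_sets):
                has_label = label in labels
                counts[has_label].update(tokenize(doc))
                n_docs[has_label] += 1
            self.models[label] = (counts, n_docs)
        return self

    def _log_score(self, tokens, counts, n_docs, cls):
        # Laplace-smoothed class prior plus Laplace-smoothed token likelihoods.
        total_docs = n_docs[True] + n_docs[False]
        score = math.log((n_docs[cls] + 1) / (total_docs + 2))
        denom = sum(counts[cls].values()) + len(self.vocab)
        for tok in tokens:
            if tok in self.vocab:  # tokens unseen in training are ignored
                score += math.log((counts[cls][tok] + 1) / denom)
        return score

    def predict(self, doc):
        """Return the set of categories whose 'yes' model outscores its 'no' model."""
        tokens = tokenize(doc)
        return {
            label
            for label, (counts, n_docs) in self.models.items()
            if self._log_score(tokens, counts, n_docs, True)
            > self._log_score(tokens, counts, n_docs, False)
        }


# Invented toy data; each document carries one or more categories.
docs = [
    "Entlassungsbrief Diabetes mellitus Insulin Therapie",
    "Befund Röntgen Thorax unauffällig",
    "Entlassungsbrief Röntgen Thorax Pneumonie",
    "Diabetes Insulin Kontrolle Blutzucker",
]
label_sets = [
    {"discharge_letter", "diabetes"},
    {"radiology"},
    {"discharge_letter", "radiology"},
    {"diabetes"},
]

clf = BinaryRelevanceNB().fit(docs, label_sets)
print(clf.predict("Entlassungsbrief Diabetes Insulin"))
print(clf.predict("Röntgen Thorax"))
```

Binary relevance is only one possible decomposition. It ignores correlations between categories, and any of the four classifier types named in the abstract could take the place of the Naïve Bayes model used here.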
