Linking glycemic dysregulation in diabetes to symptoms, comorbidities, and genetics through EHR data mining

Kirk, Isa Kristina; Simon, Christian; Banasik, Karina; Holm, Peter Christoffer; Haue, Amalie Dahl; Jensen, Peter Bjødstrup; Juhl Jensen, Lars; Rodríguez, Cristina Leal; Pedersen, Mette Krogh; Eriksson, Robert; Andersen, Henrik Ullits; Almdal, Thomas; Bork-Jensen, Jette; Grarup, Niels; Borch-Johnsen, Knut; Pedersen, Oluf; Pociot, Flemming; Hansen, Torben; Bergholdt, Regine; Rossing, Peter; Brunak, Søren

Published in

eLife Sciences Publications, eLife, (8), 2019

DOI: 10.7554/elife.44941



This paper is made freely available by the publisher.


Abstract

Diabetes is a diverse and complex disease, with considerable variation in phenotypic manifestation and severity. This variation hampers the study of etiological differences and reduces the statistical power of analyses of associations with genetics, treatment outcomes, and complications. We address these issues through deep, fine-grained phenotypic stratification of a diabetes cohort. By text mining the electronic health records of 14,017 patients, we matched two controlled vocabularies (ICD-10 and a custom vocabulary developed at the clinical center Steno Diabetes Center Copenhagen) to clinical narratives spanning a 19-year period. Together, the two matched vocabularies comprise over 20,000 medical terms describing symptoms, other diagnoses, and lifestyle factors. The cohort is genetically homogeneous (Caucasian diabetes patients from Denmark), so the resulting stratification is not driven by ethnic differences but rather by inherently dissimilar progression patterns and lifestyle-related risk factors. Using unsupervised Markov clustering, we defined 71 clusters of at least 50 individuals within the diabetes spectrum. The clusters display both distinct and shared longitudinal patterns of glycemic dysregulation, temporal co-occurrences of comorbidities, and associations with single nucleotide polymorphisms in or near genes relevant for diabetes comorbidities.
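To make the clustering step concrete, the following is a minimal, illustrative sketch of generic Markov clustering (alternating expansion and inflation of a column-stochastic matrix) on a toy patient-similarity graph. It is not the authors' pipeline: the abstract does not state how patient similarity was computed, which implementation was used, or which parameters were chosen. The function name `markov_cluster`, the inflation value of 2.0, the convergence tolerance, and the toy graph are all assumptions made for illustration.

```python
def normalize_columns(matrix):
    """Scale each column so that it sums to 1 (column-stochastic)."""
    n = len(matrix)
    col_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [[matrix[i][j] / col_sums[j] for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain square-matrix product, kept dependency-free for the sketch."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def markov_cluster(adjacency, inflation=2.0, max_iter=100, tol=1e-9):
    """Cluster an undirected graph given as a square adjacency matrix.

    Hypothetical helper: parameter defaults are illustrative assumptions,
    not values reported in the abstract.
    """
    n = len(adjacency)
    # Add self-loops, then normalize so each column is a probability vector.
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(max_iter):
        expanded = matmul(m, m)  # expansion: take two random-walk steps
        # Inflation: raise entries to a power and renormalize, which
        # strengthens strong flows and weakens weak ones.
        inflated = normalize_columns(
            [[x ** inflation for x in row] for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Rows that retain non-negligible mass act as attractors; the columns
    # they cover form one cluster. Identical clusters are deduplicated.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    # Toy graph: two triangles of "patients" joined by a single edge (2-3).
    toy = [
        [0, 1, 1, 0, 0, 0],
        [1, 0, 1, 0, 0, 0],
        [1, 1, 0, 1, 0, 0],
        [0, 0, 1, 0, 1, 1],
        [0, 0, 0, 1, 0, 1],
        [0, 0, 0, 1, 1, 0],
    ]
    print(markov_cluster(toy))
```

In a setting like the one described, the nodes would be patients and the edge weights some measure of phenotypic similarity derived from the matched vocabulary terms; a minimum cluster size (here, 50 individuals) would be applied as a filter after clustering.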
