Published in

BMJ Publishing Group, BMJ Open, 8(10), p. e037719, 2020

DOI: 10.1136/bmjopen-2020-037719


What are the implications of using individual and combined sources of routinely collected data to identify and characterise incident site-specific cancers? a concordance and validation study using linked English electronic health records data

Journal article published in 2020 by Helen Strongman, Rachael Williams, Krishnan Bhaskaran
This paper is made freely available by the publisher.


Preprint: archiving allowed
Postprint: archiving allowed
Published version: archiving allowed
Data provided by SHERPA/RoMEO

Abstract

Objectives: To describe the benefits and limitations of using individual and combinations of linked English electronic health data to identify incident cancers.

Design and setting: Our descriptive study uses linked English Clinical Practice Research Datalink primary care, cancer registration, hospitalisation and death registration data.

Participants and measures: We implemented case definitions to identify first site-specific cancers at the 20 most common sites, based on the first ever cancer diagnosis recorded in each individual or commonly used combination of data sources between 2000 and 2014. We calculated positive predictive values and sensitivities of each definition, compared with a gold standard algorithm that used information from all linked data sets to identify first cancers. We described completeness of grade and stage information in the cancer registration data set.

Results: A total of 165 953 gold standard cancers were identified. Positive predictive values of all case definitions were ≥80%, and ≥94% for the four most common cancers (breast, lung, colorectal and prostate). Sensitivity for case definitions that used cancer registration alone or in combination was ≥92% for the four most common cancers and ≥80% across all cancer sites except bladder cancer (65% using cancer registration alone). For case definitions using linked primary care, hospitalisation and death registration data, sensitivity was ≥89% for the four most common cancers, and ≥80% for all cancer sites except kidney (69%), oral cavity (76%) and ovarian cancer (78%). When primary care or hospitalisation data were used alone, sensitivities were generally lower and diagnosis dates were delayed. Completeness of staging data in cancer registration data was high from 2012 (minimum 76.0% in 2012 and 86.4% in 2014 for the four most common cancers).

Conclusions: Ascertainment of incident cancers was good when using cancer registration data alone or in combination with other data sets, and for the majority of cancers when using a combination of primary care, hospitalisation and death registration data.
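
As an illustration of the two measures reported in the abstract, the following minimal sketch computes a positive predictive value and a sensitivity for one case definition against a gold standard. It is not the authors' code: the function name `ppv_and_sensitivity`, the representation of each cancer as a `(patient_id, site)` pair, and the toy records are all assumptions made for illustration, and the study's actual matching rules (for example, how diagnosis dates were handled) are not described here.

```python
import math


def ppv_and_sensitivity(case_definition, gold_standard):
    """Return (PPV, sensitivity) for a case definition versus a gold standard.

    Both arguments are sets of hypothetical (patient_id, cancer_site) pairs.
    PPV         = true positives / all cancers found by the case definition
    Sensitivity = true positives / all gold standard cancers
    """
    true_positives = len(case_definition & gold_standard)
    ppv = true_positives / len(case_definition) if case_definition else math.nan
    sensitivity = true_positives / len(gold_standard) if gold_standard else math.nan
    return ppv, sensitivity


# Toy data, invented for illustration only (not study data).
gold = {(1, "breast"), (2, "lung"), (3, "bladder"), (4, "bladder"), (6, "kidney")}
registry_only = {(1, "breast"), (2, "lung"), (3, "bladder"), (5, "lung")}

ppv, sens = ppv_and_sensitivity(registry_only, gold)
print(f"PPV = {ppv:.2f}, sensitivity = {sens:.2f}")
```

In this toy example, three of the four cancers found by the case definition are in the gold standard, and three of the five gold standard cancers are found, so the two measures differ even though they share the same numerator.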