Automated Coding of Job Descriptions From a General Population Study: Overview of Existing Tools, Their Application and Comparison

Wan, Wenxin; Ge, Calvin B.; Friesen, Melissa C.; Locke, Sarah J.; Russ, Daniel E.; Burstyn, Igor; Baker, Christopher J. O.; Adisesh, Anil; Lan, Qing; Rothman, Nathaniel; Huss, Anke; van Tongeren, Martie; Vermeulen, Roel; Peters, Susan

Published in: Annals of Work Exposures and Health (Oxford University Press), 2023

DOI: 10.1093/annweh/wxad002

This paper is made freely available by the publisher.

Abstract

Objectives: Automatic job coding tools were developed to reduce the laborious task of manually assigning job codes to free-text job descriptions in census and survey data sources, including large occupational health studies. The objective of this study is to provide a case study comparing the performance of existing coding tools, in terms of agreement in both job coding and JEM (Job-Exposure Matrix)-assigned exposures.

Methods: We compared three automatic job coding tools [AUTONOC, CASCOT (Computer-Assisted Structured Coding Tool), and LabourR], which were selected based on availability, coding of English free-text into coding systems closely related to the 1988 version of the International Standard Classification of Occupations (ISCO-88), and capability to perform batch coding. To assess the tools' performance, we used manually coded job histories from the AsiaLymph case-control study, which were translated into English prior to auto-coding. We applied two general population JEMs to assess agreement at exposure level. Percent agreement and PABAK (Prevalence-Adjusted Bias-Adjusted Kappa) were used to compare the results from manual coders with those from the automatic coding tools.

Results: The coding per cent agreement among the three tools ranged from 17.7 to 26.0% for exact matches at the most detailed 4-digit ISCO-88 level. Agreement was better at a more general level of job coding (e.g. 43.8–58.1% at the 1-digit ISCO-88 level) and in exposure assignments (median values of the PABAK coefficient ranging 0.69–0.78 across 12 JEM-assigned exposures). Based on our testing data, CASCOT outperformed the other tools, with better agreement in both job coding (26% 4-digit agreement) and exposure assignment (median kappa 0.61).

Conclusions: In this study, we observed that agreement on job coding was generally low for the three tools, but noted a higher degree of agreement in assigned exposures. The results indicate the need for study-specific evaluations of the tools before they are applied automatically in general population studies, as well as for improvements in the evaluated automatic coding tools.
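To make the two agreement measures concrete, the following is a minimal Python sketch, not the authors' code. It assumes that ISCO-88 codes are stored as 4-character strings, so that a coarser level is simply a shorter prefix, and that each JEM-assigned exposure is a binary exposed/unexposed category, in which case PABAK reduces to 2 × (observed agreement) − 1. The function names and the example codes are hypothetical and chosen only for illustration.

```python
def agreement_at_level(manual, auto, digits):
    """Proportion of jobs whose manual and automatic codes share
    the same first `digits` characters (1 = most general, 4 = exact)."""
    if len(manual) != len(auto) or not manual:
        raise ValueError("code lists must be non-empty and of equal length")
    matches = sum(m[:digits] == a[:digits] for m, a in zip(manual, auto))
    return matches / len(manual)


def pabak(observed_agreement, categories=2):
    """Prevalence-adjusted bias-adjusted kappa for `categories` rating
    categories; with two categories this equals 2 * Po - 1."""
    return (categories * observed_agreement - 1) / (categories - 1)


# Hypothetical 4-digit codes, kept as strings so any leading zero survives.
manual = ["7212", "2331", "5122", "9211"]
auto = ["7212", "2320", "5123", "6111"]

for digits in (4, 3, 2, 1):
    po = agreement_at_level(manual, auto, digits)
    print(f"{digits}-digit agreement: {po:.1%}")
```

In this toy data the exact 4-digit agreement is lower than the 1-digit agreement, which mirrors the direction of the pattern reported above; the numbers themselves are illustrative and unrelated to the study's results.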
