Accuracy of ChatGPT‐Generated Information on Head and Neck and Oromaxillofacial Surgery: A Multicenter Collaborative Analysis

Vaira, Luigi Angelo; Lechien, Jerome R.; Abbate, Vincenzo; Allevi, Fabiana; Audino, Giovanni; Beltramini, Giada Anna; Bergonzani, Michela; Bolzoni, Alessandro; Committeri, Umberto; Crimi, Salvatore; Gabriele, Guido; Lonardi, Fabio; Maglitto, Fabio; Petrocelli, Marzia; Pucci, Resi; Saponaro, Gianmarco; Tel, Alessandro; Vellone, Valentino; Chiesa‐Estomba, Carlos Miguel; Boscolo‐Rizzo, Paolo; Salzano, Giovanni; De Riu, Giacomo

Published in

SAGE Publications, Otolaryngology - Head and Neck Surgery, 2023

DOI: 10.1002/ohn.489




Abstract

Objective: To investigate the accuracy of the Chat-Based Generative Pre-trained Transformer (ChatGPT) in answering questions and solving clinical scenarios in head and neck surgery.

Study Design: Observational and evaluative study.

Setting: Eighteen surgeons from 14 Italian head and neck surgery units.

Methods: A total of 144 clinical questions spanning the different subspecialties of head and neck surgery and 15 comprehensive clinical scenarios were developed. The questions and scenarios were entered into ChatGPT-4, and the researchers rated the resulting answers on Likert scales for accuracy (range 1-6), completeness (range 1-3), and reference quality.

Results: The overall median score for open-ended questions was 6 (interquartile range [IQR]: 5-6) for accuracy and 3 (IQR: 2-3) for completeness. Overall, the reviewers rated the answers as entirely or nearly entirely correct in 87.2% of cases and as comprehensive, covering all aspects of the question, in 73% of cases. The artificial intelligence (AI) model answered 84.7% of the closed-ended questions correctly (11 wrong answers). For the clinical scenarios, ChatGPT provided a fully or nearly fully correct diagnosis in 81.7% of cases, and its proposed diagnostic or therapeutic procedure was judged complete in 56.7% of cases. The overall quality of the bibliographic references was poor, and the cited sources were nonexistent in 46.4% of cases.

Conclusion: The results generally demonstrate a good level of accuracy in the AI's answers. The AI's ability to resolve complex clinical scenarios is promising, but it still falls short of being a reliable support for decision-making by head and neck surgeons.
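The abstract reports the closed-ended result as a percentage plus a count of wrong answers, but not the number of closed-ended questions. A short Python sketch can recover the counts consistent with both figures. It assumes that the 84.7% is simply correct answers divided by all closed-ended questions, rounded to one decimal place; the function name `implied_totals` and the search bound `max_n` are illustrative choices, not part of the study.

```python
def implied_totals(wrong: int, pct_correct: float, max_n: int = 500) -> list[int]:
    """Return every question count n (up to max_n) for which
    (n - wrong) / n, expressed as a percentage and rounded to one
    decimal place, equals pct_correct.

    Assumption (not stated in the abstract): the reported percentage is
    correct answers / total closed-ended questions, rounded to 0.1%.
    """
    matches = []
    for n in range(wrong + 1, max_n + 1):
        pct = 100 * (n - wrong) / n  # share of correct answers, in percent
        if round(pct, 1) == pct_correct:
            matches.append(n)
    return matches


if __name__ == "__main__":
    # 11 wrong answers and 84.7% correct, as reported in the abstract
    print(implied_totals(11, 84.7))
```

Under that assumption the only count up to 500 is 72 (61 of 72 correct), which would be half of the 144 questions if the closed-ended items are included in that total; the abstract itself does not state the split.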
