Abstract

Objective: Large language models (LLMs) have demonstrated mixed results in their ability to pass various specialist medical examinations, and their performance within the field of emergency medicine remains unknown.

Methods: We explored the performance of three prevalent LLMs (OpenAI's GPT series, Google's Bard, and Microsoft's Bing Chat) on a practice ACEM primary examination.

Results: All LLMs achieved a passing score, with GPT-4.0 outperforming the average candidate.

Conclusion: By passing the ACEM primary examination, large language models show potential as tools for medical education and practice. However, limitations exist and are discussed.