INTRODUCTION
Exponential advancements in AI have catalyzed the emergence of LLMs, which are gaining interest for their abilities to synthesize large swaths of information and produce comprehensible answers to most queries[1,2]. ChatGPT has been shown to answer complex medical queries, which could revolutionize healthcare for patients with limited access to medical professionals[1]. Hand trauma, specifically nerve lacerations, is common and can cause severe impairment if mismanaged[3]. Timely and accurate information is imperative but can be limited, particularly in rural and low-resource settings[4]. AI-powered language models could fill this gap as potential providers of medical guidance.
ChatGPT has dominated the literature assessing the suitability of LLMs for medical practice. With the launch of other AI tools, specifically Google's AI BARD and Bing AI, their comparative performance should be evaluated. Therefore, this study evaluated Google's AI BARD, Bing AI, and ChatGPT on their ability to provide accurate and relevant information to patients presenting with hand trauma nerve lacerations. Using objective and subjective metrics, we assessed contextual understanding and the suitability of recommendations. This comparison highlights the strengths and weaknesses of each model, informing improvements in AI-driven guidance for hand trauma care.
CASE REPORT
The authors evaluated the suitability of three LLMs (Google's AI BARD, Bing AI, and ChatGPT-4) by their capacity to interpret medical literature, extract relevant data, and produce precise, intelligible, and contextually suitable clinical advice. Their responses were also compared against established clinical guidelines. Additionally, the analysis encompassed the efficiency, dependability, potential biases, and ethical implications associated with each LLM within the realm of nerve injury management.
A set of simulated patient-perspective queries concerning digital nerve injury diagnosis and management was presented to ChatGPT, BARD, and Bing AI. The responses generated were compared with existing clinical guidelines and literature. Additionally, a panel of plastic surgery residents and board-certified plastic surgeons with extensive peripheral nerve injury expertise evaluated the responses using a Likert scale. Assessment criteria included accuracy, comprehensiveness, and provision of relevant information sources.
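To illustrate how panel ratings of this kind might be summarized, the following is a minimal sketch (not the authors' analysis) that averages hypothetical 1-5 Likert ratings per model and criterion; all model names, criteria labels, and values are placeholders, not the study's results.

```python
# Minimal illustrative sketch: averaging hypothetical 1-5 Likert ratings
# per model and criterion. All values are placeholders, not the study's data.
from statistics import mean

# {model: {criterion: [one rating per panel member]}}
ratings = {
    "ChatGPT-4": {"accuracy": [5, 4, 5], "comprehensiveness": [4, 5, 4], "sources": [3, 4, 3]},
    "BARD": {"accuracy": [4, 3, 4], "comprehensiveness": [4, 3, 4], "sources": [3, 2, 3]},
    "Bing AI": {"accuracy": [4, 4, 3], "comprehensiveness": [3, 4, 3], "sources": [4, 3, 4]},
}

for model, criteria in ratings.items():
    summary = {criterion: round(mean(scores), 2) for criterion, scores in criteria.items()}
    print(model, summary)
```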
To maintain consistency and precision, the same author (BL) documented the initial response each LLM
provided for every question, avoiding further clarifications or alterations. Questions were crafted to avoid
grammatical errors or ambiguity and were simultaneously inputted using separate accounts for OpenAI,
Google, and Microsoft, granting access to ChatGPT-4, BARD, and Bing AI, respectively.
We compiled a dataset of hand trauma nerve laceration cases, encompassing diverse scenarios, symptoms,
and treatment alternatives. Each language model's efficacy was assessed on a Likert scale [Table 1] based on
its capacity to offer patient guidance accurately and effectively. Evaluation criteria comprised the following elements:
Accuracy: the correctness and dependability of the information supplied by the language models.
Comprehensibility: the patients' ability to readily understand the provided information and directions.
Empathy and tone: the language models' capacity to convey empathy and sustain a suitable tone.