
Page 2 of 11               Lim et al. Plast Aesthet Res 2023;10:43  https://dx.doi.org/10.20517/2347-9264.2023.70

INTRODUCTION
Exponential advancements in AI have catalyzed the emergence of LLMs, which are gaining interest for their ability to synthesize large swaths of information and produce comprehensible answers to most queries[1,2]. ChatGPT has proven able to answer complex medical queries, which could revolutionize healthcare for patients with limited access to medical professionals[1]. Hand trauma, specifically nerve lacerations, is common and can cause severe impairment if mismanaged[3]. Timely and accurate information is imperative but can be limited, particularly in rural and low-resource settings[4]. AI-powered language models could opportunely fill this gap as potential providers of medical guidance.
ChatGPT has dominated the literature assessing the suitability of LLMs for medical practice. With the launch of other AI tools, specifically Google's BARD and Bing AI, their comparative performance should be evaluated. Therefore, this study evaluated Google's BARD, Bing AI, and ChatGPT for their ability to provide accurate and relevant information to patients presenting with nerve lacerations from hand trauma. Using objective and subjective metrics, we assessed contextual understanding and the suitability of recommendations. This comparison will highlight strengths and weaknesses, informing improvements in AI-driven guidance for hand trauma care.

               CASE REPORT
The authors evaluated the suitability of three LLMs (Google's BARD, Bing AI, and ChatGPT-4) by their capacity to interpret medical literature, extract relevant data, and produce precise, intelligible, and contextually suitable clinical advice. Their responses were also compared against established clinical guidelines. Additionally, the analysis encompassed the efficiency, dependability, potential biases, and ethical implications associated with each LLM within the realm of nerve injury management.

A set of simulated patient-perspective queries concerning the diagnosis and management of digital nerve injuries was presented to ChatGPT, BARD, and Bing AI. The responses generated were compared with existing clinical guidelines and literature. Additionally, a panel of plastic surgery residents and board-certified plastic surgeons with extensive expertise in peripheral nerve injuries evaluated the responses on a Likert scale. Assessment criteria included accuracy, comprehensiveness, and the provision of relevant information sources.


To maintain consistency and precision, the same author (BL) documented the initial response each LLM provided for every question, avoiding further clarifications or alterations. Questions were crafted to avoid grammatical errors or ambiguity and were entered simultaneously using separate accounts for OpenAI, Google, and Microsoft, granting access to ChatGPT-4, BARD, and Bing AI, respectively.

We compiled a dataset of hand trauma nerve laceration cases encompassing diverse scenarios, symptoms, and treatment alternatives. Each language model's efficacy was assessed on a Likert scale [Table 1] based on its capacity to offer patient guidance accurately and effectively. Evaluation criteria comprised the following elements:

               Accuracy: the correctness and dependability of the information supplied by the language models.

               Comprehensibility: the patients' ability to readily understand the provided information and directions.

               Empathy and tone: the language models' capacity to convey empathy and sustain a suitable tone.
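The panel ratings described above can be summarized per model and criterion. The following is a minimal sketch of that aggregation, assuming 1-5 Likert ratings; the `ratings` values and the `mean_scores` helper are illustrative, not the authors' actual data or analysis code.

```python
# Sketch: aggregating panel Likert ratings (1-5) into a mean score
# per model and per evaluation criterion. All data below is hypothetical.
from statistics import mean

# Each entry: (model, criterion, rating) -- illustrative values only.
ratings = [
    ("ChatGPT-4", "accuracy", 5), ("ChatGPT-4", "accuracy", 4),
    ("BARD", "accuracy", 3), ("BARD", "accuracy", 4),
    ("Bing AI", "accuracy", 4), ("Bing AI", "accuracy", 4),
    ("ChatGPT-4", "comprehensibility", 5), ("BARD", "comprehensibility", 4),
    ("Bing AI", "comprehensibility", 3),
    ("ChatGPT-4", "empathy_tone", 4), ("BARD", "empathy_tone", 4),
    ("Bing AI", "empathy_tone", 3),
]

def mean_scores(ratings):
    """Return {model: {criterion: mean rating}} from (model, criterion, rating) tuples."""
    grouped = {}
    for model, criterion, score in ratings:
        grouped.setdefault((model, criterion), []).append(score)
    result = {}
    for (model, criterion), scores in grouped.items():
        result.setdefault(model, {})[criterion] = mean(scores)
    return result

for model, crits in mean_scores(ratings).items():
    print(model, {c: round(v, 2) for c, v in crits.items()})
```

A nested mapping keyed first by model keeps the per-criterion means easy to compare side by side when tabulating results such as those in Table 1.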