
Lim et al. Plast Aesthet Res 2023;10:43  https://dx.doi.org/10.20517/2347-9264.2023.70  Page 9 of 11

               medical issues.


               Regarding user-friendliness and comprehensibility, ChatGPT and BARD exhibited comparable success.
               ChatGPT’s responses were succinct and devoid of technical jargon, rendering them advantageous for
               individuals with limited expertise, such as junior medical staff or patients. Additionally, ChatGPT
               acknowledged the user’s occupation and hand dominance, personalizing the interaction. BARD displayed a
similar syntactical approach and empathized with the user, offering greetings and expressing sympathy for
               the user’s injury. Although both models consistently advised consulting a doctor, BARD conveyed a
               warmer, more welcoming tone. Conversely, Bing AI manifested the least personable demeanor, utilizing
               cold, clinical language and frequent third-person pronouns. Although Table 2 shows Bing AI outperforming
               BARD in the Flesch-Kincaid Grade Level and Coleman-Liau Index, this is due to BARD's failure to answer
prompt 3, which adversely affected its scores. Because half of the comparisons were not statistically
significant, the authors suggest further investigation to obtain more robust results.
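The Flesch-Kincaid Grade Level and Coleman-Liau Index referenced above follow standard published formulas. A minimal sketch of both (the naive vowel-group syllable counter is an illustrative assumption, not the method used by the readability software in this study):

```python
import re

def count_syllables(word: str) -> int:
    """Crude vowel-group heuristic; production tools use pronunciation dictionaries."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and n > 1:
        n -= 1  # drop a likely silent trailing 'e'
    return max(n, 1)

def readability(text: str) -> dict:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    letters = sum(len(w) for w in words)
    syllables = sum(count_syllables(w) for w in words)
    w, s = len(words), max(len(sentences), 1)
    # Flesch-Kincaid Grade Level: 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59
    fkgl = 0.39 * (w / s) + 11.8 * (syllables / w) - 15.59
    # Coleman-Liau Index: 0.0588*L - 0.296*S - 15.8,
    # where L = letters per 100 words and S = sentences per 100 words
    cli = 0.0588 * (letters / w * 100) - 0.296 * (s / w * 100) - 15.8
    return {"fkgl": round(fkgl, 2), "cli": round(cli, 2)}
```

Both indices estimate the US school-grade level needed to understand a passage, so a lower score indicates more accessible text; note that an unanswered prompt (as with BARD's prompt 3) leaves no response text to score and can distort aggregate comparisons.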


               Deploying AI chatbots in clinical settings engenders ethical concerns, encompassing potential patient
               confidentiality breaches and inaccuracies due to error-prone public health data. Such issues may protract
               diagnosis, compromise patient safety, and entail legal consequences. To establish reliable and accountable
               AI systems, prioritizing transparency, explainability, and adherence to regulations and privacy policies is
               imperative. Addressing data and algorithmic biases and ensuring ongoing monitoring, evaluation, and
               ethical guideline adherence is essential for developers and users.


This study's principal limitation was its reliance on a specialized cohort of certified plastic surgeons and
trainees to assess LLMs' coherence, comprehensibility, and usability, which may limit generalizability and
               introduce subjectivity and biases. Nonetheless, this research constitutes a preliminary exploration, guiding
               future inquiries with varied clinician samples to appraise LLMs’ utility in healthcare settings.


               This study showed that ChatGPT consistently provided more reliable, evidence-based clinical advice than
               BARD and Bing AI. However, LLMs generally lack depth and specificity, limiting their use in individualized
               decision-making. Healthcare professionals are crucial in interpreting and contextualizing LLM responses,
               especially for complex cases requiring multidisciplinary input. Future research should enhance LLM
               performance by incorporating specialized databases and expert knowledge, ensuring traceability and
               credibility of AI-generated content, and integrating LLMs with human expertise to advance nerve injury
               management and support patient-centered care.

               DECLARATIONS
               Authors’ contributions
               Made substantial contributions to the conception and design of the study and performed data analysis and
               interpretation: Lim B, Seth I, Bulloch, Xie Y
               Performed data acquisition, as well as provided administrative, technical, and material support: Lim B,
               Seth I
               Supervision, validation: Hunter-Smith DJ, Rozen WM


               Availability of data and materials
               Not applicable.