Lim et al. Plast Aesthet Res 2023;10:43 https://dx.doi.org/10.20517/2347-9264.2023.70 Page 9 of 11
medical issues.
Regarding user-friendliness and comprehensibility, ChatGPT and BARD exhibited comparable success.
ChatGPT’s responses were succinct and devoid of technical jargon, rendering them advantageous for
individuals with limited expertise, such as junior medical staff or patients. Additionally, ChatGPT
acknowledged the user’s occupation and hand dominance, personalizing the interaction. BARD displayed a
similar syntactical approach and empathized with the user, offering greetings and expressing sympathy for
the user’s injury. Although both models consistently advised consulting a doctor, BARD conveyed a
warmer, more welcoming tone. Conversely, Bing AI manifested the least personable demeanor, utilizing
cold, clinical language and frequent third-person pronouns. Although Table 2 shows Bing AI outperforming
BARD on the Flesch-Kincaid Grade Level and Coleman-Liau Index, this is because BARD failed to answer
prompt 3, which adversely affected its scores. Because half of the comparisons were not statistically
significant, the authors suggest further investigation to obtain more robust results.
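The Flesch-Kincaid Grade Level and Coleman-Liau Index cited above are simple functions of sentence, word, letter, and syllable counts. A minimal sketch of both metrics follows, using the standard published formulas; the syllable counter is a crude vowel-group heuristic (an assumption for illustration, not the tokenizer any particular readability tool uses):

```python
import re

def _count_syllables(word):
    # Heuristic: count runs of consecutive vowels (y included); at least 1.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text):
    # FKGL = 0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z]+", text)
    n = max(1, len(words))
    syllables = sum(_count_syllables(w) for w in words)
    return 0.39 * (n / sentences) + 11.8 * (syllables / n) - 15.59

def coleman_liau_index(text):
    # CLI = 0.0588 * L - 0.296 * S - 15.8, where L is letters per 100 words
    # and S is sentences per 100 words.
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z]+", text)
    n = max(1, len(words))
    letters = sum(len(w) for w in words)
    L = letters / n * 100
    S = sentences / n * 100
    return 0.0588 * L - 0.296 * S - 15.8
```

Because both scores are averages over the words and sentences a model actually produced, a missing or truncated response can shift a model's aggregate score substantially, which is why BARD's unanswered prompt distorts the comparison in Table 2.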
Deploying AI chatbots in clinical settings engenders ethical concerns, encompassing potential patient
confidentiality breaches and inaccuracies due to error-prone public health data. Such issues may protract
diagnosis, compromise patient safety, and entail legal consequences. To establish reliable and accountable
AI systems, prioritizing transparency, explainability, and adherence to regulations and privacy policies is
imperative. Developers and users must also address data and algorithmic biases and ensure ongoing
monitoring, evaluation, and adherence to ethical guidelines.
This study's principal limitation was its reliance on a specialized cohort of certified plastic surgeons and
trainees to assess LLMs' coherence, comprehensibility, and usability, which may impede generalizability and
introduce subjectivity and biases. Nonetheless, this research constitutes a preliminary exploration, guiding
future inquiries with varied clinician samples to appraise LLMs’ utility in healthcare settings.
This study showed that ChatGPT consistently provided more reliable, evidence-based clinical advice than
BARD and Bing AI. However, LLMs generally lack depth and specificity, limiting their use in individualized
decision-making. Healthcare professionals are crucial in interpreting and contextualizing LLM responses,
especially for complex cases requiring multidisciplinary input. Future research should enhance LLM
performance by incorporating specialized databases and expert knowledge, ensuring traceability and
credibility of AI-generated content, and integrating LLMs with human expertise to advance nerve injury
management and support patient-centered care.
DECLARATIONS
Authors’ contributions
Made substantial contributions to the conception and design of the study and performed data analysis and
interpretation: Lim B, Seth I, Bulloch, Xie Y
Performed data acquisition, as well as provided administrative, technical, and material support: Lim B,
Seth I
Supervision, validation: Hunter-Smith DJ, Rozen WM
Availability of data and materials
Not applicable.