Treger et al. Art Int Surg. 2025;5:126-32 https://dx.doi.org/10.20517/ais.2024.66 Page 128
their research found that nearly 60% of Americans would not want AI-powered robots to be used for
surgery or in the formation of diagnoses or treatment plans. Much of this mistrust may stem from a lack
of understanding of how these AI models devise their proposed solutions. It may also stem from the
observation that these models can be unpredictable at times and may “confabulate” when they do not
have a clear answer to a question, rather than simply stating that they lack sufficient information to give
a reliable answer. Extensive progress must be made so that patients can trust the use of
AI in matters affecting their health, particularly a greater understanding of the inner workings of AI.
Furthermore, patients and clinicians alike want confidence that AI will act in humans’ best interest, which is
why reliable, human-centric ethical standards are essential for guiding the operation of these models.
ACCURACY IN DECISION MAKING
The most accessible way to test AI’s medical decision making in plastic surgery is through standardized
exams. These are crafted to have definitive, correct answers. Preliminary studies assessing ChatGPT’s
performance on the Plastic Surgery In-Service Exams, which are annual exams taken by American plastic
surgery residents to assess their knowledge base, demonstrate that ChatGPT is able to perform at a level
higher than early trainees. Interestingly, however, it falls short of trainees in their last years of plastic surgery
residency[8]. When comparing subsequent versions of ChatGPT, it was found that the AI model rapidly
evolved in its performance and accuracy on the in-service exams, progressing toward the level of human
performance. Ultimately, it was found that ChatGPT struggled with clinical scenarios where multiple
correct recommendations were available, but only one was preferred by the question writers. This may be
ascribed to the AI’s lack of real-world experience, missing the intuition derived from time spent in clinical
practice.
An important finding from this research, though, is that models like ChatGPT currently do not meet the
performance of those closer to attending plastic surgeon status. This suggests that AI models may lack the
clinical and subjective insight to fully grasp patient scenarios. While AI may be well-equipped to follow
algorithmic approaches, it currently lacks certain reasoning capabilities, particularly when situations are not
so clearly defined. Real-life patient care in plastic surgery is filled with nuance, unlike standardized exams
that are crafted to have definitive answers. Today, we still cannot confidently rely on AI to make accurate
and comprehensive judgments regarding diagnosis and treatment plans.
PATIENT HEALTH LITERACY
Health literacy and education play an essential role in enabling patients to effectively manage and advocate
for their healthcare needs. Medical jargon and complexity can act as a daunting barrier to understanding
one’s own health. Those with decreased health literacy are more likely to be hospitalized, visit the
emergency room, underuse their prescribed medications, and suffer from higher morbidity and mortality[9].
Therefore, enhancing patients’ health literacy, both for general healthcare and in the specific context of
plastic and reconstructive surgery, would have immediate and obvious benefits.
The implementation of AI as an accessible plastic surgery consultant has the opportunity to enhance patient
satisfaction and outcomes. For this reason, researchers are currently exploring the utility of large language
models such as ChatGPT in answering patient questions regarding their plastic surgery needs[10]. Beyond
answering patient questions, AI models can effectively simplify medical jargon for patients. This facilitates
an improved understanding of their medical condition(s). Ayre et al. demonstrated that ChatGPT was able
to bring medical jargon from a grade 12.8 reading level down to a revised grade 11 level[11]. While this
remains above the average reading level for Americans, this study shows the potential for AI to make
complex text easier to understand for patients without compromising the integrity of its content.