Evaluation of ChatGPT-generated patient-facing information
Currently, 80% of Americans use the Internet for medical information, and a recent study found that 78.4% of patients are open to utilizing ChatGPT for medical diagnoses[20,29,30]. Ensuring the quality of ChatGPT-generated information is therefore critical for patient safety in an age when people increasingly consult the Internet for healthcare information. In an effort to develop safe and accurate patient-facing information, ChatGPT-generated responses have been evaluated across various divisions of plastic surgery: microsurgery, breast surgery, rhinoplasty, and cleft lip and palate surgery[20-24]. Additionally, to properly determine the quality of ChatGPT-generated information, material currently available from academic and professional sources is often compared against newly created ChatGPT medical information[20-24]. Grading scales and tests frequently utilized by researchers to assess the quality of PRS information generated by ChatGPT against online resources include Likert scales, the EQIP scale, and readability tests[18,19,21]. In one
study by Berry et al., ChatGPT-generated responses to frequently asked microsurgery medical questions
were compared against information currently provided by the American Society of Reconstructive
Microsurgery (ASRM) utilizing paired t-tests[20]. Similar to Wang et al.[28], a value of P < 0.05 indicated statistical significance. Six plastic surgeons were tasked with assessing the comprehensiveness and clarity of the two sources’ responses and selecting the source that provided the highest-quality patient-facing information[20]. Thirty non-medical individuals indicated only their preferred source. Surprisingly, plastic surgeons scored ChatGPT information significantly higher in terms of comprehensiveness (P < 0.001) and clarity (P < 0.05)[20]. Plastic surgeons and non-medical individuals also chose ChatGPT as the source providing the highest-quality microsurgical information 70.7% and 55.9% of the time, respectively. Interestingly, the readability scores of ChatGPT responses were considerably worse than those of the ASRM according to the following readability tests: Flesch-Kincaid Grade Level (P < 0.0001), Flesch Reading Ease (P < 0.001), Gunning Fog Index (P < 0.0001), Simple Measure of Gobbledygook Index (P < 0.0001), Coleman-Liau Index (P < 0.001), Linsear Write Formula (P < 0.0001), and Automated Readability Index (P < 0.0001)[20]. Therefore, even though ChatGPT has been shown to create accurate, comprehensive, and clear microsurgical medical information, it may struggle to produce medical information at the desired 6th-grade reading level when not explicitly prompted to do so.
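As a rough illustration of this methodology, the sketch below scores paired answers on one of the readability tests listed above and applies a paired t-test at the same P < 0.05 threshold. The sample answers are invented placeholders, and the use of the Python textstat and SciPy libraries is an assumption made for illustration; this is not the pipeline Berry et al. actually used.

```python
# Minimal sketch: compare readability of paired ChatGPT vs. ASRM answers.
# Assumptions: Python with `textstat` and `scipy` installed; the answer
# texts below are hypothetical placeholders, not data from the study.
import textstat
from scipy.stats import ttest_rel

# Paired answers to the same three hypothetical patient FAQs.
chatgpt_answers = [
    "Microsurgical free tissue transfer relocates vascularized tissue and "
    "requires anastomosis of vessels under an operating microscope.",
    "Postoperative flap monitoring assesses perfusion at scheduled intervals "
    "to detect venous congestion or arterial insufficiency early.",
    "Recovery timelines vary; most patients resume light activity within "
    "several weeks, contingent on flap site and comorbidities.",
]
asrm_answers = [
    "Free flap surgery moves living tissue from one part of your body to "
    "another and reconnects its blood vessels.",
    "After surgery, your care team will check blood flow to the new tissue "
    "often so problems can be caught early.",
    "Most people can return to light activity in a few weeks, depending on "
    "where the tissue came from and their overall health.",
]

# Flesch-Kincaid Grade Level, one of the seven tests reported in the study;
# textstat also provides flesch_reading_ease, gunning_fog, smog_index,
# coleman_liau_index, linsear_write_formula, and automated_readability_index.
gpt_scores = [textstat.flesch_kincaid_grade(t) for t in chatgpt_answers]
asrm_scores = [textstat.flesch_kincaid_grade(t) for t in asrm_answers]

# Paired t-test on per-question scores; P < 0.05 treated as significant.
stat, p = ttest_rel(gpt_scores, asrm_scores)
print(f"mean grade level: ChatGPT {sum(gpt_scores)/3:.1f} "
      f"vs ASRM {sum(asrm_scores)/3:.1f} (p = {p:.3f})")
```

A grade level near 6 would meet the 6th-grade target mentioned above; higher values indicate harder-to-read text.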
Similarly, in a study by Grippaudo et al., ten plastic surgery residents used the EQIP scale to analyze the quality of ChatGPT-generated breast plastic surgery information for three frequently performed procedures: breast reduction, breast reconstruction, and augmentation mammoplasty[21]. The EQIP scale is made up of 36 yes-or-no questions divided into three sections: Content data (Questions 1-18), Identification data (Questions 19-24), and Structure data (Questions 25-36)[21]. Each question is worth a single point, and a score above 18 is considered high. ChatGPT was shown to create quality breast surgery information. Regarding “Structure data”, ChatGPT excelled at providing clear and comprehensive information for patients. However, one limitation identified by the researchers was that ChatGPT-generated medical information performed poorly on the “Identification data” questions, often lacking proper validation or bibliographic references. Despite this limitation, ChatGPT produced quality PRS patient-facing information with regard to breast reconstruction, breast reduction, and augmentation mammoplasty.
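The EQIP scoring rubric described above reduces to simple arithmetic; the sketch below shows how the three sections combine into a total out of 36, with totals above 18 treated as high. The item groupings follow the scale as described in the text, while the function name and the example ratings are hypothetical illustrations, not the study’s instrument or data.

```python
# Minimal sketch of EQIP scoring: 36 yes/no items, one point per "yes",
# grouped into Content (1-18), Identification (19-24), Structure (25-36).
# The ratings below are invented for illustration only.
from typing import Dict

SECTIONS = {
    "Content data": range(1, 19),          # Questions 1-18
    "Identification data": range(19, 25),  # Questions 19-24
    "Structure data": range(25, 37),       # Questions 25-36
}

def eqip_score(answers: Dict[int, bool]) -> None:
    """Print per-section and total EQIP scores for one rated document."""
    assert set(answers) == set(range(1, 37)), "expects all 36 items"
    total = 0
    for name, items in SECTIONS.items():
        section = sum(answers[i] for i in items)
        total += section
        print(f"{name}: {section}/{len(items)}")
    # A total above 18 is considered a high score.
    print(f"Total: {total}/36 ({'high' if total > 18 else 'low'} score)")

# Example: a hypothetical document meeting 24 of the 36 criteria.
ratings = {i: (i % 3 != 0) for i in range(1, 37)}
eqip_score(ratings)
```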
Additionally, in a study by Seth et al., three specialist plastic and reconstructive surgeons qualitatively assessed ChatGPT-generated responses to six breast augmentation questions in order to evaluate ChatGPT’s ability to create safe and high-quality breast augmentation material. The researchers also performed a literature search to assess the accessibility, informativeness, and accuracy of the responses[22]. ChatGPT was found to provide comprehensive and grammatically accurate responses but lacked personalized advice[22]. Xie et al. discovered similar results to those of Seth et al. when investigating the use of ChatGPT to generate responses to rhinoplasty questions from the American Society of Plastic Surgeons (ASPS) website[23]. Responses were evaluated qualitatively by four plastic surgeons for accuracy, informativeness, and accessibility[23]. Surgeons determined that ChatGPT provided