Yoseph et al. Art Int Surg 2024;4:267-77 https://dx.doi.org/10.20517/ais.2024.38 Page 273
Figure 3. (A) Aggregate clarity and (B) completeness ratings, expressed in percentages, from all study participants (n = 10) comparing
LLM vs. physician-generated responses. *P < 0.05. LLM: Large language model.
Figure 4. (A) Median clarity and (B) completeness ratings for individual questions from all study participants (n = 10) comparing LLM vs.
physician-generated responses. Error bars represent IQR from the 25th through the 75th percentile. LLM: Large language model; IQR:
interquartile range.
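As an illustration of the summary statistics named in the Figure 4 caption, the following minimal sketch computes a median and the 25th and 75th percentiles for one question's ratings. The function name `median_and_iqr`, the example ratings, and the choice of the "inclusive" quantile method are assumptions made for illustration; the article does not state which percentile convention or software was used.

```python
from statistics import quantiles


def median_and_iqr(ratings):
    """Return (median, q1, q3) for a list of numeric ratings.

    Uses the 'inclusive' quartile method, which is an assumption:
    the source does not say which percentile convention was applied.
    """
    q1, med, q3 = quantiles(ratings, n=4, method="inclusive")
    return med, q1, q3


# Hypothetical ratings from ten participants for a single question
# (made-up values, not data from the study).
example_ratings = [3, 4, 4, 5, 5, 5, 4, 3, 5, 4]

med, q1, q3 = median_and_iqr(example_ratings)

# Asymmetric error bar lengths around the median, as they would be
# drawn for an IQR error bar (lower arm, upper arm).
lower_arm = med - q1
upper_arm = q3 - med
print(f"median={med}, IQR=[{q1}, {q3}], arms=({lower_arm}, {upper_arm})")
```

Because the error bars span the 25th through the 75th percentile, they are generally asymmetric around the median, which is why the sketch reports the lower and upper arms separately.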
were no significant differences between patients and controls on completeness ratings for ChatGPT (H =
5.36, P = 0.206) or Gemini separately (H = 1.61, P = 0.204) [Supplementary Figure 1A and B].