Yoseph et al. Art Int Surg 2024;4:267-77 https://dx.doi.org/10.20517/ais.2024.38 Page 273

Figure 3. (A) Aggregate clarity and (B) completeness ratings, expressed as percentages, from all study participants (n = 10) comparing
LLM vs. physician-generated responses. *P < 0.05. LLM: Large language model.

Figure 4. (A) Median clarity and (B) completeness ratings for individual questions from all study participants (n = 10) comparing LLM vs.
physician-generated responses. Error bars represent IQR from the 25th through the 75th percentile. LLM: Large language model; IQR:
interquartile range.

were no significant differences between patients and controls on completeness ratings for ChatGPT (H =
5.36, P = 0.206) or Gemini separately (H = 1.61, P = 0.204) [Supplementary Figure 1A and B].
