
Yoseph et al. Art Int Surg 2024;4:267-77  https://dx.doi.org/10.20517/ais.2024.38                                                        Page 273



























                Figure 3. (A) Aggregate clarity and (B) completeness ratings, expressed as percentages, from all study participants (n = 10) comparing
                LLM vs. physician-generated responses. *P < 0.05. LLM: Large language model.







































                Figure 4. (A) Median clarity and (B) completeness ratings for individual questions from all study participants (n = 10) comparing LLM vs.
                physician-generated responses. Error bars represent IQR from the 25th through the 75th percentile. LLM: Large language model; IQR:
                interquartile range.
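
As an illustration of the summary statistics named in the Figure 4 caption, the following minimal sketch computes a median and the 25th and 75th percentiles for one question's ratings. The function name `median_iqr`, the example ratings, and the 1-to-5 scale are hypothetical and are not study data; the "inclusive" quantile method is also an assumption, since the caption does not state how percentiles were calculated.

```python
from statistics import median, quantiles


def median_iqr(ratings):
    """Return (median, q1, q3) for a list of numeric ratings.

    q1 and q3 are the 25th and 75th percentiles, i.e. the bounds an
    IQR error bar would span. The "inclusive" method treats the data
    as the full population of raters (an assumption for this sketch).
    """
    q1, _, q3 = quantiles(ratings, n=4, method="inclusive")
    return median(ratings), q1, q3


# Hypothetical ratings from 10 raters for a single question
# (illustrative values on an assumed 1-to-5 scale, not study data).
example_ratings = [3, 4, 4, 5, 5, 5, 4, 3, 5, 4]

med, q1, q3 = median_iqr(example_ratings)
print(f"median={med}, IQR=[{q1}, {q3}]")
```

Repeating this per question and per response source (LLM or physician) would yield the kind of per-question medians and error-bar bounds that the figure describes.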


               were no significant differences between patients and controls on completeness ratings for ChatGPT (H =
               5.36, P = 0.206) or Gemini separately (H = 1.61, P = 0.204) [Supplementary Figure 1A and B].