Page 8 of 11 Lim et al. Plast Aesthet Res 2023;10:43 https://dx.doi.org/10.20517/2347-9264.2023.70
exhibited similar comprehensibility.
DISCUSSION
Overall, ChatGPT provided the most comprehensive and comprehensible information, generating an
extensive array of management solutions while incorporating contextual data, all in a reader-friendly
manner. BARD’s answers were similar in structure to ChatGPT’s, albeit lacking the same level of detail and
clarity. Bing AI struggled to attain similar levels of ChatGPT and BARD’s comprehensibility and easiness of
understanding, often utilizing technical language with frequent statistical references and lacking succinct
summaries which came across as unempathetic and impersonal, as reflected in its Flesch Reading Ease
Score. However, its consistent provision of sources compensates for its shortcomings, resulting in its
comparable DISCERN score to ChatGPT’s.
ChatGPT and BARD demonstrated superior comprehensibility and user-centricity to Bing AI, making
them more suitable for improving public comprehension of nerve injury management. Overall, the accuracy
of the three LLMs was insufficient for use as automated diagnostic tools supporting healthcare
professionals, as they neglected to reference key peripheral nerve injury guidelines or high-quality
research[22]. Additionally, the LLMs omitted experimental treatments like stem cell therapy and
photochemical tissue bonding, indicating a limitation in their algorithms' capacity for generating innovative
solutions[23]. Despite this, as a clinical practice tool, they could ensure that patients are not misled or
provided with an unfeasible option.
In terms of information depth, ChatGPT provided supplementary data and enhanced its primary response
when asked, outperforming BARD. Although BARD's comprehensiveness was comparable to ChatGPT's, it
failed to respond to the third prompt, severely impacting its readability and reliability scores. Moreover,
ChatGPT presented the most thorough rationale for each suggestion, bolstering its credibility. BARD closely
followed, but its explanations lacked ChatGPT's depth. Bing AI delivered the least detailed responses, which
sometimes fell short of the gold standard. For example, it suggested exercise as an alternative treatment but
neglected to specify that aerobic exercise is the optimal form for addressing nerve injuries[24].
Nevertheless, Bing AI offered the most diverse range of treatment alternatives. Therefore, while Bing AI
lacked depth, ChatGPT and BARD were limited in breadth. Notably, all three LLMs concentrated on
layperson first aid, offering limited information for healthcare professionals and academics. They omitted
management algorithms for nerve injuries; for example, Bhandari (2019) recommends immediate
surgery for penetrating trauma with neurological symptoms but conservative management for blunt
trauma. The LLMs also neglected to address Seddon and Sunderland's classifications of peripheral nerve
injuries, which impact management[8,25,26]. This oversight may be attributed to the phrasing of the queries, as
the LLMs could be presuming the authors are non-medical professionals, consequently yielding less
scholarly and comprehensive replies. Considering this is the first study to compare these LLMs on this
topic, further research should seek to rectify these deficiencies.
Consistency with the literature has been flagged by many past studies investigating ChatGPT, so the
comparative performance in generating references was pertinent to this study. Bing AI demonstrated superior
consistency compared to BARD and ChatGPT, often supplying relevant hyperlinks to fact-check its claims.
Meanwhile, BARD failed to produce high-level references, and ChatGPT only recommended databases for
users to search or generated aberrant references. Despite this, Bing AI primarily cited health websites over
scholarly articles and directed users to irrelevant web pages, resulting in a higher DISCERN score than
BARD's but one that failed to surpass ChatGPT's. Despite these constraints, LLMs' rapid information retrieval and
summarization capacity make them attractive for patients who are gathering information about emerging