
Dababneh et al. Art Int Surg 2024;4:214-32  https://dx.doi.org/10.20517/ais.2024.50                                                    Page 216

               Table 1. Inclusion and exclusion criteria of the systematic review
                Inclusion criteria                                            Exclusion criteria
                Articles on the integration of AI in hand and wrist surgery   Letters to the editor
                Articles published between 2014-2024                          Systematic reviews
                                                                              Articles not related to hand or wrist surgery
                                                                              Languages other than English
                                                                              Articles related to prosthetic hand or arm

               AI: Artificial intelligence.
               terms for AI were included to ensure comprehensive coverage of ML-related articles in the field of hand
               surgery.

               In total, 1,228 articles were identified and screened using the Covidence platform. Two reviewers initially
               screened the articles for relevance based on their titles and abstracts. All articles that did not refer to the
               application of AI to specific concepts related to hand or wrist surgery were excluded. Two hundred and
               twenty-five articles advanced to full-text screening. The inclusion and exclusion criteria applied are
               listed in Table 1. Articles were excluded if they were duplicates, letters to the editor, systematic reviews,
               non-English publications, unrelated to hand or wrist surgery, or published more than a decade ago. A
               full-text review conducted by a single reviewer led to the selection of 90 articles, which were subsequently
               confirmed by a second reviewer. The reference list of each included study was then reviewed manually to
               identify other relevant studies that were not captured by the primary search. This process led to the
               inclusion of eight additional articles, bringing the total number of articles covered in this review to 98.
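
               The screening counts reported above can be cross-checked with a short tally. This is a minimal
               sketch rather than part of the review's methodology: the four input numbers come from the text,
               while the variable names and the derived exclusion counts are illustrative assumptions (they
               assume every record not advanced to the next stage was excluded at that stage).

```python
# Counts stated in the text. Variable names are illustrative, not from the source.
identified = 1228           # articles identified and screened on title/abstract
full_text_screened = 225    # articles advanced to full-text screening
retained = 90               # articles retained after full-text review
from_bibliographies = 8     # articles added by manual bibliographic review

# Derived values (assumption: anything not advanced was excluded at that stage).
excluded_title_abstract = identified - full_text_screened
excluded_full_text = full_text_screened - retained
total_included = retained + from_bibliographies

print(excluded_title_abstract, excluded_full_text, total_included)
# prints: 1003 135 98
```

               The final value matches the total of 98 articles reported for this review.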


               The focus of this study was to explore the application of ML in the diagnosis and management of various
               hand conditions, including hand and wrist fractures, peripheral nerve injuries, carpal tunnel syndrome
               (CTS), osteoarthritis (OA), and triangular fibrocartilage complex (TFCC) disorders. By examining these
               innovative technologies, this study seeks to assist hand surgeons in integrating ML into their practice.
               Therefore, an emphasis is placed on evaluating the performance of AI as well as its potential to enhance
               resident training and improve patient communication. However, this research does not address
               rehabilitation or the use of prosthetic arms and hands following nerve injury or amputations [Figure 1].


               RESULTS
               Use of large language models in AI
               This review identified ten articles that focused on AI’s performance in executing various tasks relating to
               hand surgery.


               A common area in which AI’s performance was evaluated was answering hand surgery multiple-choice
               exam questions. Thibaut et al. compared ChatGPT-3.5’s performance to that of Google’s Bard Chatbot[16]. Both
               large language models (LLMs) were tasked with answering 18 questions from the European Board of Hand
               Surgery (EBHS). This study showed that both platforms failed to obtain a passing score and did not adapt
               their responses even after the authors provided the correct answer. A similar study carried out in 2024 tasked
               ChatGPT-3.5 and ChatGPT-4 with answering the 2021 and 2022 Self-Assessment Examinations (SAE) of
               the American Society for Surgery of the Hand (ASSH)[17]. ChatGPT-4 performed significantly better, with an
               overall score of 68.9%, compared to ChatGPT-3.5’s 58.0%. These findings align with Ghanem et al.’s study,
               which reported that ChatGPT-4 achieved an overall passing score of 61.98% in the ASSH 2019 exam[18].
               Despite ChatGPT’s improvement with newer versions, most studies highlighted that it remains limited by its
               inability to take clinical context into consideration[17-20].