
Choksi et al. Art Int Surg. 2025;5:160-9  https://dx.doi.org/10.20517/ais.2024.84                                                           Page 164

Table 1. Participant demographics

                                                                All (n = 27)   Trainees (n = 20)   Attendings (n = 7)
Female, n (%)                                                   10 (37%)       9 (45%)             1 (14%)
Right-hand dominant, n (%)                                      24 (88.9%)     18 (90%)            6 (86%)
Robotic case experience, median +/- SD                          50 +/- 156     45 +/- 32           186 +/- 231
PGY level (residents), median +/- SD                            N/A            3 +/- 1.27          N/A
Years of experience after training (attendings), median +/- SD  N/A            N/A                 8.7 +/- 6.01

PGY: Postgraduate year; N/A: not applicable.

Table 2. Performance of each model in the system, using 3-fold cross-validation across held-out surgeons. For the technical score
prediction models, we assume that the type of suturing exercise (backhand or railroad) is known a priori and apply the corresponding model.

Technique                               Model                      Avg. weighted F1   Avg. macro F1   Avg. accuracy
Sub-stitch classification               Video Swin Transformer     0.6452             0.6400          0.7023
Technical score prediction (backhand)   Video Swin Transformer     0.7185             0.7155          0.7259
Technical score prediction (railroad)   Video Swin Transformer     0.6430             0.6364          0.6411
Surgeon proficiency prediction          Random forest classifier   0.6266             0.5805          0.6665
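The evaluation behind Table 2 combines two ideas: folds are split by surgeon, so no surgeon's videos appear in both the training and the test data, and performance is reported with both support-weighted and unweighted (macro) F1. The plain-Python sketch below illustrates both under stated assumptions; it is not the authors' pipeline. The round-robin surgeon-to-fold assignment and the `fit`/`predict` callables are hypothetical stand-ins for the actual Video Swin Transformer and random forest models, which are not reproduced here.

```python
from collections import Counter


def surgeon_folds(surgeon_ids, k=3):
    """Split sample indices into k folds so that all samples from one surgeon share a fold.

    Assumption: surgeons are assigned to folds round-robin in sorted ID order;
    any assignment works as long as no surgeon is split across folds.
    """
    surgeons = sorted(set(surgeon_ids))
    if len(surgeons) < k:
        raise ValueError("need at least k distinct surgeons")
    fold_of = {s: i % k for i, s in enumerate(surgeons)}
    folds = [[] for _ in range(k)]
    for idx, s in enumerate(surgeon_ids):
        folds[fold_of[s]].append(idx)
    return folds


def classification_metrics(y_true, y_pred):
    """Return (weighted F1, macro F1, accuracy); assumes non-empty, equal-length inputs."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)
    f1 = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2PR / (P + R) = 2TP / (2TP + FP + FN); set to 0 when undefined
        f1[c] = 2 * tp / denom if denom else 0.0
    macro = sum(f1.values()) / len(labels)  # every class counts equally
    weighted = sum(f1[c] * support[c] for c in labels) / len(y_true)  # weight by true-class frequency
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return weighted, macro, accuracy


def cross_validate(X, y, surgeon_ids, fit, predict, k=3):
    """Average the three metrics over k folds, each holding out a disjoint set of surgeons.

    `fit(X_train, y_train) -> model` and `predict(model, X_test) -> labels` are
    hypothetical callables standing in for the actual models.
    """
    scores = []
    for test_idx in surgeon_folds(surgeon_ids, k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = predict(model, [X[i] for i in test_idx])
        scores.append(classification_metrics([y[i] for i in test_idx], preds))
    return tuple(sum(s[j] for s in scores) / len(scores) for j in range(3))
```

For example, with true labels `["a", "a", "a", "b"]` and predictions `["a", "a", "b", "b"]`, `classification_metrics` gives a weighted F1 of 23/30 (about 0.767), a macro F1 of 11/15 (about 0.733), and an accuracy of 0.75. Macro F1 is lower here because the less frequent class `b` is predicted less well, and macro averaging gives it the same weight as the majority class.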


This study is one of the first to use AI to automatically assess surgical trainees on a specific robotic surgery
task. It underlines the potential of AI-assisted assessment tools to serve as effective educational tools for
surgical trainees, identifying their proficiency and potentially providing feedback. Ma et al. developed the
first AI-based video feedback tool for robotic suturing, but their participants had no robotic surgical
experience, so that study focused on improving task performance rather than determining proficiency[16].
Our model can provide feedback while also determining the skill level of the trainee.

Suturing is a fundamental surgical skill, and proficiency in it implies mastery of many technical elements,
such as needle angulation, insertion point, depth, and tissue manipulation. Breaking the suturing task down
into four sub-stitches (needle positioning, needle targeting, needle driving, and needle withdrawal) shows
trainees which specific needle movements they need to practice while maintaining a standardized taxonomy.
Because this suturing taxonomy is based on prior research, it gives us and future researchers a reproducible
methodology for automated, supervised-learning-based suturing assessment[15,17].
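One way to make the four-part taxonomy concrete in code is an enumeration plus a helper that collapses per-clip sub-stitch predictions into contiguous segments, so that feedback can point to a specific phase of the stitch. This is a hypothetical sketch: the `SubStitch` and `to_segments` names, the clip-level input, and the segment format are assumptions for illustration, not the study's implementation.

```python
from enum import Enum
from itertools import groupby


class SubStitch(Enum):
    """The four sub-stitches named in the text, in the order they are listed there."""
    NEEDLE_POSITIONING = 0
    NEEDLE_TARGETING = 1
    NEEDLE_DRIVING = 2
    NEEDLE_WITHDRAWAL = 3


def to_segments(clip_labels):
    """Collapse consecutive identical labels into (sub_stitch, first_clip, last_clip) runs.

    Hypothetical input: one SubStitch label per fixed-length video clip,
    e.g. the output of a clip-level classifier.
    """
    segments = []
    start = 0
    for label, run in groupby(clip_labels):
        length = sum(1 for _ in run)  # number of consecutive clips with this label
        segments.append((label, start, start + length - 1))
        start += length
    return segments
```

For instance, after `P, T, D, W = SubStitch`, calling `to_segments([P, P, T, D, D, D, W])` returns `[(P, 0, 1), (T, 2, 2), (D, 3, 5), (W, 6, 6)]`, which makes it easy to attach a score or comment to, say, the needle-driving segment alone.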

Practicing these specific movements in a controlled setting is vital for surgical trainees before they operate
on a patient. A robotic proficiency score also allows surgical attendings to gain trust in their trainees before
the trainees operate in a real clinical setting.

This preliminary study demonstrates only the feasibility of building this type of model to assess a trainee's
skill level, with an accuracy of 66.7%. Although the model needs to be improved, its current accuracy already
allows it to identify residents who need extra practice: only six participants were false negatives in our
study [Figure 3]. Even at this accuracy, having residents practice more and then attempt the dry lab again can
only improve their skills, so the model can be used to pick out trainees who need more practice. However, we
aim to improve our accuracy in the future with a larger