
Page 163                                                           Choksi et al. Art Int Surg. 2025;5:160-9  https://dx.doi.org/10.20517/ais.2024.84

                Figure 1. (Top-left to Bottom-right) Example frames for Needle Positioning, Needle Targeting, Needle Driving, and Needle Withdrawal
                sub-stitch actions from the backhand suturing task.

               classification accuracy by ~11% and technical score prediction by ~5%.


               RESULTS
               Twenty-seven surgeons participated in the suturing tasks [Table 1]. The surgeons ranged from
               postgraduate year (PGY) 1 to attendings with more than 10 years of experience. The average robotic
               case experience was 50 ± 156 cases. A total of 102 videos (51 of the backhand suturing task and 51
               of the railroad suturing task), spanning 891,384 frames, were evaluated. A total of 862,038 frames
               were annotated. We employed 3-fold cross-validation across the surgeons and averaged the results
               from the held-out surgeons across the folds. Performance was assessed on sub-stitch classification
               accuracy, technical score prediction accuracy, and surgeon proficiency prediction accuracy. The
               clip-based Video Swin Transformer models achieved an average accuracy of 70.23% for sub-stitch
               classification and 68.4% for technical score prediction on the test folds [Table 2]. Confusion
               matrices for sub-stitch classification and technical score prediction across all videos are
               presented in Figure 3; darker cells represent more samples, and each cell is labeled with its
               proportion of the dataset. Combining the model outputs, the Random Forest Classifier achieved an
               average accuracy of 66.7% in predicting surgeon proficiency [Figure 4]. The importance of the input
               features to the Random Forest Classifier was analyzed using mean decrease in impurity (MDI), which
               showed that the Needle Driving feature with a technical score of “Ideal” was the most important
               feature in determining surgeon proficiency [Figure 5].
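
               To make the evaluation protocol concrete, the following is a minimal Python sketch, not the
               authors' code. It illustrates two ideas from the paragraph above: splitting videos into folds by
               surgeon, so that a held-out surgeon never contributes training videos, and the weighted impurity
               decrease of a single tree split, which is the quantity MDI accumulates per feature. The function
               names, the round-robin fold assignment, and the choice of Gini impurity are assumptions made for
               illustration; the paper does not state how surgeons were assigned to folds or which impurity
               criterion was used.

```python
import random
from collections import Counter


def surgeon_folds(video_to_surgeon, k=3, seed=0):
    """Split videos into k folds so that no surgeon appears in two folds.

    Assumption: surgeons are shuffled and dealt round-robin into folds.
    """
    surgeons = sorted(set(video_to_surgeon.values()))
    random.Random(seed).shuffle(surgeons)
    fold_of = {s: i % k for i, s in enumerate(surgeons)}
    folds = [[] for _ in range(k)]
    for video, surgeon in sorted(video_to_surgeon.items()):
        folds[fold_of[surgeon]].append(video)
    return folds


def cross_validate(video_to_surgeon, train_and_score, k=3, seed=0):
    """Average a held-out score over k surgeon-wise folds.

    train_and_score(train_videos, test_videos) is a hypothetical callback
    that fits a model on the training videos and returns test accuracy.
    """
    folds = surgeon_folds(video_to_surgeon, k, seed)
    scores = []
    for i, test in enumerate(folds):
        train = [v for j, fold in enumerate(folds) if j != i for v in fold]
        scores.append(train_and_score(train, test))
    return sum(scores) / len(scores)


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def impurity_decrease(parent, left, right, n_total):
    """Weighted Gini decrease of one split of `parent` into `left`/`right`.

    MDI for a feature sums this quantity over every node that splits on
    the feature and averages the sums over the trees of the forest.
    """
    n = len(parent)
    children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return (n / n_total) * (gini(parent) - children)


if __name__ == "__main__":
    # Hypothetical toy data: 6 videos from 3 surgeons.
    toy = {"v1": "A", "v2": "A", "v3": "B", "v4": "B", "v5": "C", "v6": "C"}
    print(surgeon_folds(toy, k=3))
    print(impurity_decrease(["novice", "novice", "expert", "expert"],
                            ["novice", "novice"], ["expert", "expert"], 4))
```

               Grouping by surgeon rather than by video matters here because each surgeon contributed several
               videos; a per-video split could place the same surgeon in both training and test sets and inflate
               the held-out accuracy.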


               DISCUSSION
               This study shows the feasibility of creating a dry lab model and a DL-based automatic assessment
               tool for robotic-assisted surgery. Our results show that our CV algorithm can assess the proficiency
               level of a surgical trainee from surgical videos with an accuracy of 66.7%.