Figure 1. (Top-left to bottom-right) Example frames for Needle Positioning, Needle Targeting, Needle Driving, and Needle Withdrawal sub-stitch actions from the backhand suturing task.
classification accuracy by ~11% and technical score prediction by ~5%.
RESULTS
Twenty-seven surgeons participated in the suturing tasks [Table 1]. The surgeons ranged from postgraduate year (PGY) 1 to attendings with greater than 10 years' experience. The average robotic case experience was 50 ± 156 cases. A total of 102 videos (51 for the backhand suturing task and 51 for the railroad suturing task), spanning 891,384 frames, were evaluated; 862,038 of these frames were annotated. We employed 3-fold cross-validation across the surgeons and averaged
the results from the held-out surgeons across the folds. Performance was assessed on sub-stitch
classification accuracy, technical score accuracy, and surgeon proficiency prediction. The clip-based Video
Swin Transformer models achieved an average accuracy of 70.23% for sub-stitch classification and 68.4% for
technical score prediction on the test folds [Table 2]. Confusion matrices for sub-stitch classification and
technical score prediction across all videos are presented in Figure 3, with darker cells representing more
samples and each cell labeled with its proportion of the dataset. Using the combined model outputs as input
features, the Random Forest Classifier achieved an average accuracy of 66.7% in predicting surgeon
proficiency [Figure 4]. The importance of the input features to the Random Forest Classifier was analyzed
using mean decrease in impurity (MDI), which revealed that the Needle Driving feature with a technical
score of "Ideal" was the most important in determining surgeon proficiency [Figure 5].
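
To make the proficiency-prediction stage concrete, the sketch below illustrates surgeon-grouped 3-fold cross-validation, a Random Forest Classifier trained on the combined model outputs, and MDI feature importance. This is a minimal illustration, not the authors' code: the representation of each video as counts of predicted (sub-stitch, technical score) pairs, the synthetic data, and all parameter values are assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(0)

# Hypothetical per-video features: counts of each (sub-stitch, technical score)
# prediction produced by the clip-level models, e.g., "Needle Driving / Ideal".
sub_stitches = ["Needle Positioning", "Needle Targeting", "Needle Driving", "Needle Withdrawal"]
scores = ["Ideal", "Non-Ideal"]
feature_names = [f"{s} / {t}" for s in sub_stitches for t in scores]

n_videos = 102
X = rng.integers(0, 20, size=(n_videos, len(feature_names))).astype(float)
y = rng.integers(0, 2, size=n_videos)            # proficiency label (e.g., trainee vs. expert)
surgeon_id = rng.integers(0, 27, size=n_videos)  # group label so no surgeon spans train and test

# Surgeon-grouped 3-fold cross-validation of the proficiency classifier.
fold_acc = []
for train_idx, test_idx in GroupKFold(n_splits=3).split(X, y, groups=surgeon_id):
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(X[train_idx], y[train_idx])
    fold_acc.append(accuracy_score(y[test_idx], clf.predict(X[test_idx])))
print(f"mean held-out proficiency accuracy: {np.mean(fold_acc):.3f}")

# Mean decrease in impurity (MDI) importance, read directly from the fitted forest.
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
for name, importance in sorted(zip(feature_names, clf.feature_importances_),
                               key=lambda pair: -pair[1]):
    print(f"{name}: {importance:.3f}")
```

With real labels, the ranked MDI list above would surface which sub-stitch/technical-score combination most influences the proficiency prediction, analogous to the "Needle Driving / Ideal" finding reported in Figure 5.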
DISCUSSION
This study demonstrates the feasibility of creating a dry lab model and a DL-based automatic assessment tool for
robotic-assisted surgery. Our results show that the CV algorithm can assess the proficiency level of a surgical
trainee from surgical videos with an accuracy of 66.7%.

