
In our study, we also aim to design a dry lab model for basic robotic suturing skills and to create DL CV models capable of automatically assessing a trainee's performance on suturing tasks. We further aim to determine whether participants are proficient at robotic suturing or need more practice.


               METHODS
               Study design and participants
Twenty-seven surgeons were recorded while completing two repetitions of two robotic tasks, backhand suturing and railroad suturing, on a bench-top model created by our lab. The bench-top model consisted of artificial skin padded with packaging foam and taped to a wooden block, as shown in Figure 1. This was placed inside a robotic simulation abdominal cavity.


The railroad suturing task consisted of performing a running stitch by driving the needle from one side of the wound to the other and then re-entering next to where it exited. The backhand suturing exercise involved performing a continuous stitch by guiding the needle from the side closest to the operator to the opposite side.

               Videos were recorded at 30 frames per second (FPS) and downsampled to 15 FPS for the study. This
               prospective cohort study was approved by the Institutional Review Board of Northwell Health (IRB 23-069).
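
As an illustrative preprocessing sketch (assuming OpenCV; the file paths and mp4v codec are placeholders, not details reported in the study), halving the frame rate can be done by keeping every other frame:

```python
import cv2  # OpenCV for video I/O


def downsample_video(src_path: str, dst_path: str, keep_every: int = 2) -> None:
    """Keep every `keep_every`-th frame, e.g. 30 FPS -> 15 FPS with keep_every=2."""
    cap = cv2.VideoCapture(src_path)
    src_fps = cap.get(cv2.CAP_PROP_FPS)
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fourcc = cv2.VideoWriter_fourcc(*"mp4v")  # placeholder codec choice
    out = cv2.VideoWriter(dst_path, fourcc, src_fps / keep_every, (width, height))
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % keep_every == 0:  # retain frames 0, 2, 4, ...
            out.write(frame)
        idx += 1
    cap.release()
    out.release()
```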


               Video segmentation and data labeling
The video of each suturing task was broken down into a sequence of four sub-stitch phases: needle positioning, needle targeting, needle driving, and needle withdrawal [Figure 1]. Using Encord (Cord Technologies Limited, London, UK), every video was first temporally labeled into the four sub-stitch phases, and each sub-stitch was then annotated with a binary technical score (ideal or non-ideal) based on a previously validated model[15,17]. The ideal/non-ideal classification reflects the operator's skill while performing the suturing action. The annotators were all surgical residents trained by a senior surgical resident. The annotators and engineers were blinded to participant identities and experience levels when performing the annotations and building the model. Sub-stitch annotations (labels) were mapped into frame-level annotations.
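
For illustration, one way to expand segment-level labels into per-frame labels (the data layout and field names here are our assumptions, not Encord's export schema):

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# The four sub-stitch phases plus a background (no action) class.
PHASES = ["background", "positioning", "targeting", "driving", "withdrawal"]


@dataclass
class SubStitch:
    start_frame: int  # inclusive
    end_frame: int    # exclusive
    phase: str        # one of PHASES[1:]
    ideal: bool       # binary technical score


def to_frame_labels(
    segments: List[SubStitch], n_frames: int
) -> Tuple[List[str], List[Optional[bool]]]:
    """Expand segment-level annotations into one (phase, score) label per frame.
    Frames outside every segment default to the background class with no score."""
    phase: List[str] = ["background"] * n_frames
    score: List[Optional[bool]] = [None] * n_frames
    for seg in segments:
        for f in range(seg.start_frame, min(seg.end_frame, n_frames)):
            phase[f] = seg.phase
            score[f] = seg.ideal
    return phase, score
```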

               CV model
Our proposed system leverages a multi-model approach to surgical skill assessment. We employ two distinct video DL models trained on overlapping 16-frame clips extracted from the videos with a stride of one frame: the first model classifies each clip into one of the four sub-stitch phases or a background (no action) class, while the second model assigns each clip a binary technical score. Both models use the Video Swin Transformer architecture to capture spatiotemporal features within the surgical workflow. To generate a comprehensive skill assessment, the individual model predictions are aggregated and fed into a Random Forest classifier, which produces a final classification of trainee skill level, categorized as either “Proficient” or “Trainee”. Proficiency in suturing was defined by a case experience of 50 robotic cases and was then confirmed by video review by two minimally invasive fellowship-trained attendings. This multi-faceted approach aims to provide a robust and objective evaluation of surgical competency in robotic surgery training [Figure 2].
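
The paper does not spell out the exact aggregation features, so the following is only a plausible sketch of the final stage, assuming per-clip softmax outputs from the two models are summarized per video and passed to a scikit-learn Random Forest:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

N_PHASES = 5  # four sub-stitch phases + background


def video_features(phase_probs: np.ndarray, score_probs: np.ndarray) -> np.ndarray:
    """Aggregate per-clip predictions into one fixed-length vector per video.

    phase_probs: (n_clips, N_PHASES) softmax outputs of the phase model.
    score_probs: (n_clips, 2) softmax outputs of the ideal/non-ideal model.
    The chosen summary statistics are illustrative, not the paper's feature set.
    """
    return np.concatenate([
        phase_probs.mean(axis=0),      # average phase distribution
        score_probs.mean(axis=0),      # average technical-score distribution
        [score_probs[:, 1].std()],     # variability of the "ideal" probability
    ])


# X: stacked feature vectors for all videos; y: 1 = "Proficient", 0 = "Trainee".
clf = RandomForestClassifier(n_estimators=200, random_state=0)
# clf.fit(X, y); clf.predict(video_features(p, s).reshape(1, -1))
```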


At 15 FPS, 16 frames correspond to only 1.067 s of video, a very short window for the models to identify which suturing sub-stitch is taking place and whether any mistakes are made during the sub-stitch. Rather than naively increasing the clip size, which would enlarge the memory footprint, we introduced a dilation (frame skip) of 15 frames between consecutive frames of the clip, resulting in an effective clip length of ~17 s. We employ dilation in both the train and test phases, which improved sub-stitch
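
To make the sampling concrete, here is a minimal sketch of such a dilated clip (the indexing convention and function name are our own, not from the paper):

```python
from typing import List


def dilated_clip(start: int, clip_len: int = 16, skip: int = 15) -> List[int]:
    """Frame indices for one clip with `skip` frames omitted between samples.

    With clip_len=16 and skip=15, consecutive samples are 16 source frames
    apart, so the clip spans roughly 16 * 16 / 15 ≈ 17 s at 15 FPS, versus
    ~1.07 s for a dense 16-frame clip.
    """
    step = skip + 1  # skipping 15 frames means sampling every 16th frame
    return [start + i * step for i in range(clip_len)]


# Example: the clip starting at frame 0 samples frames 0, 16, 32, ..., 240.
print(dilated_clip(0))
```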