

               from operating room videos. Model ablation studies suggest that action recognition performance is enhanced by
               composing human mesh representations with lower arm, pelvic, and cranial joints.

               Conclusion: Our work presents promising opportunities for OR video review programs to study human behavior in
               a systematic, scalable manner.

               Keywords: Action recognition, human mesh recovery, operating room, surgery, artificial intelligence, computer
               vision, deep learning



               INTRODUCTION
               In recent years, video review programs in hospitals have grown rapidly, particularly within critical settings
such as intensive care units, trauma bays, and operating rooms (OR)[1]. These programs use video
               recordings, which serve as a veritable source of truth and offer insights into case challenges, systematic flaws
               in operating workflows, and opportunities for improvement. Healthcare providers can leverage these
               insights to design safer and more efficient interventions. In environments where mere seconds can
               significantly alter a patient’s life course, the integration of video review programs holds enormous potential
               to improve patient outcomes.


               Realizing this potential in a scalable, efficient way requires a systematic approach to video review that
               enables the granular analysis of movements, spatial dynamics, and actions made by human subjects.
               However, this vision is untenable with the manual methods that dominate modern programs. Manual
               review of OR videos is labor-intensive and difficult to perform systematically. Previous studies focusing
               exclusively on the analysis of OR movements required several mobility experts to review videos individually,
discuss observations, and consolidate findings[2-5]. Extrapolating objective insights on team performance
               presents even more substantial challenges, as communications and team dynamics can be subtle despite
their overwhelming importance to a successful operation[6]. While human analysis falls short, artificial
               intelligence (AI) algorithms are equipped to identify such subtle human motions in an efficient, scalable
manner. Conventional AI-centered approaches process videos or images holistically[7]. However, visual cues
differ between ORs across various institutions and specialties, potentially leading to model overfitting in
low-data regimes. Instead of processing videos directly, we therefore leverage sequences of human meshes to
analyze movements and actions in the OR.
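
To make this concrete, the following is a minimal sketch (not the authors' architecture) of action recognition over a sequence of per-frame 3D joints derived from human meshes; the joint count, clip length, and number of action classes are illustrative assumptions.

```python
# Minimal sketch: action recognition from a sequence of per-frame 3D joints
# recovered by an HMR model. Joint count, sequence length, and action classes
# are illustrative assumptions, not the configuration used in this work.
import torch
import torch.nn as nn


class JointSequenceClassifier(nn.Module):
    """Classify an OR action from a (frames, joints, 3) joint trajectory."""

    def __init__(self, num_joints: int = 24, hidden_size: int = 128, num_actions: int = 5):
        super().__init__()
        self.gru = nn.GRU(input_size=num_joints * 3, hidden_size=hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, num_actions)

    def forward(self, joints: torch.Tensor) -> torch.Tensor:
        # joints: (batch, frames, num_joints, 3) -> flatten joints per frame
        batch, frames, num_joints, _ = joints.shape
        feats = joints.reshape(batch, frames, num_joints * 3)
        _, last_hidden = self.gru(feats)            # last_hidden: (1, batch, hidden)
        return self.head(last_hidden.squeeze(0))    # (batch, num_actions) logits


if __name__ == "__main__":
    model = JointSequenceClassifier()
    dummy_clip = torch.randn(2, 30, 24, 3)  # 2 clips, 30 frames, 24 joints each
    print(model(dummy_clip).shape)          # torch.Size([2, 5])
```

A recurrent model over joint coordinates is only one possible choice; the key point is that the classifier operates on pose trajectories rather than raw pixels, making it less sensitive to the OR-specific visual appearance that can drive overfitting.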

               Human mesh recovery (HMR) is a rapidly emerging technique for estimating detailed 3D human body
               meshes from 2D images. HMR harnesses deep learning architectures and parametric human body models
               to capture the shape and pose of a person. Recent increases in available high-resolution 3D motion capture
data, alongside significant advances in HMR methods[8-11], present a compelling opportunity for the
               systematic analysis of OR videos. Resulting shape and pose estimates can be used to derive high-fidelity
human meshes, providing a basis for studying underlying human behaviors through changes in an
individual’s mannerisms and associated poses over time. For example, a common prelude to a
               human greeting may involve the extension of one’s hand, the quick turn of one’s neck, or the opening of
one’s upper arms for an embrace. All of these actions can be interpreted clearly from the progression of arm
and neck joints in estimated human meshes. Previous studies have applied HMR to diverse simulated
and real-world settings, such as the analysis of striking techniques in sports, the reconstruction of clothed
human bodies, and the modeling of avatars in virtual reality environments[12].
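
As a small illustration of how such joint progressions can be read programmatically, the sketch below computes an elbow-flexion trajectory from per-frame joint positions and flags a clip that resembles an arm extension; the joint indices and angle threshold are assumptions and depend on the joint convention of the HMR model used.

```python
# Minimal sketch: interpreting a gesture from the progression of estimated
# joints over time. Joint indices follow a common SMPL-style convention here
# but are assumptions; real indices depend on the HMR model in use.
import numpy as np

SHOULDER, ELBOW, WRIST = 17, 19, 21  # assumed right-arm joint indices


def elbow_flexion(joints: np.ndarray) -> np.ndarray:
    """Per-frame elbow angle (radians) from (frames, joints, 3) positions."""
    upper = joints[:, SHOULDER] - joints[:, ELBOW]   # elbow -> shoulder vector
    lower = joints[:, WRIST] - joints[:, ELBOW]      # elbow -> wrist vector
    cos = np.sum(upper * lower, axis=-1) / (
        np.linalg.norm(upper, axis=-1) * np.linalg.norm(lower, axis=-1) + 1e-8
    )
    return np.arccos(np.clip(cos, -1.0, 1.0))


def looks_like_arm_extension(joints: np.ndarray, threshold: float = 2.6) -> bool:
    """Flag a clip whose elbow straightens from a bent start (~150 degrees+)."""
    angles = elbow_flexion(joints)
    return bool(angles.max() > threshold and angles[0] < threshold)


if __name__ == "__main__":
    clip = np.random.randn(30, 24, 3)  # placeholder for HMR joint output
    print(looks_like_arm_extension(clip))
```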