Page 93 Liu et al. Art Int Surg 2024;4:92-108 https://dx.doi.org/10.20517/ais.2024.19
from operating room videos. Model ablation studies suggest that action recognition performance is enhanced by
composing human mesh representations with lower arm, pelvic, and cranial joints.
Conclusion: Our work presents promising opportunities for OR video review programs to study human behavior in
a systematic, scalable manner.
Keywords: Action recognition, human mesh recovery, operating room, surgery, artificial intelligence, computer
vision, deep learning
INTRODUCTION
In recent years, video review programs in hospitals have grown rapidly, particularly within critical settings
such as intensive care units, trauma bays, and operating rooms (OR)[1]. These programs use video
recordings, which serve as a veritable source of truth and offer insights into case challenges, systematic flaws
in operating workflows, and opportunities for improvement. Healthcare providers can leverage these
insights to design safer and more efficient interventions. In environments where mere seconds can
significantly alter a patient’s life course, the integration of video review programs holds enormous potential
to improve patient outcomes.
Realizing this potential in a scalable, efficient way requires a systematic approach to video review that
enables the granular analysis of movements, spatial dynamics, and actions made by human subjects.
However, this vision is untenable with the manual methods that dominate modern programs. Manual
review of OR videos is labor-intensive and difficult to perform systematically. Previous studies focusing
exclusively on the analysis of OR movements required several mobility experts to review videos individually,
discuss observations, and consolidate findings[2-5]. Extrapolating objective insights on team performance
presents even more substantial challenges, as communications and team dynamics can be subtle despite
their overwhelming importance to a successful operation[6]. While human analysis falls short, artificial
intelligence (AI) algorithms are equipped to identify such subtle human motions in an efficient, scalable
manner. Conventional AI-centered approaches process videos or images holistically[7]. However, visual cues
differ between ORs across various institutions and specialties, potentially leading to model overfitting in low
data regimes. Instead of processing videos directly, we thus leverage human meshes in sequence to analyze
movements and actions in the OR.
Human mesh recovery (HMR) is a rapidly emerging technique for estimating detailed 3D human body
meshes from 2D images. HMR harnesses deep learning architectures and parametric human body models
to capture the shape and pose of a person. Recent increases in available high-resolution 3D motion capture
data, alongside significant advances in HMR methods[8-11], present a compelling opportunity for the
systematic analysis of OR videos. Resulting shape and pose estimates can be used to derive high-fidelity
human meshes, providing a basis for studying underlying human behaviors based on how an
individual’s mannerisms and associated poses change over time. For example, a common prelude to a
human greeting may involve the extension of one’s hand, the quick turn of one’s neck, or the opening of
one’s upper arms for an embrace. All these actions can be interpreted clearly from the progression of arm
and neck joints from estimated human meshes. Previous studies have applied HMR to diverse simulated
and real-world settings, such as the analysis of striking techniques in sports, the reconstruction of clothed
human bodies, and the modeling of avatars in virtual reality environments[12].
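To make the joint-progression idea concrete, the sketch below shows one minimal way such motion cues could be extracted from HMR output. It assumes only that an HMR model yields per-frame 3D joint positions in a fixed layout; the joint indices, function names, and speed threshold are illustrative assumptions, not part of any particular HMR library.

```python
import numpy as np

# Hypothetical input: per-frame 3D joint positions recovered by an HMR model,
# shaped (T frames, J joints, 3 coordinates). The indices below assume an
# SMPL-like joint layout and are purely illustrative.
NECK, L_ELBOW, R_ELBOW = 12, 18, 19  # assumed joint indices

def joint_speeds(joints, fps=30.0):
    """Per-joint speed (distance units per second) between consecutive frames."""
    disp = np.diff(joints, axis=0)                # (T-1, J, 3) displacements
    return np.linalg.norm(disp, axis=-1) * fps    # (T-1, J) speeds

def motion_events(joints, joint_ids, thresh=0.5, fps=30.0):
    """Frame indices where any tracked joint exceeds a speed threshold."""
    speeds = joint_speeds(joints, fps)
    return np.where(speeds[:, joint_ids].max(axis=1) > thresh)[0]

# Toy sequence: 5 static frames, then the right elbow sweeps outward,
# mimicking the arm-extension prelude to a greeting described above.
T, J = 10, 24
seq = np.zeros((T, J, 3))
seq[5:, R_ELBOW, 0] = np.linspace(0.1, 0.5, 5)   # lateral extension
print(motion_events(seq, [NECK, L_ELBOW, R_ELBOW]))  # → [4 5 6 7 8]
```

Thresholded joint speeds are of course far simpler than the learned action-recognition models the paper studies, but they illustrate why mesh-derived joint trajectories, unlike raw pixels, carry OR-independent motion information.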

