Liu et al. Art Int Surg 2024;4:92-108
DOI: 10.20517/ais.2024.19
Artificial Intelligence Surgery
Original Article | Open Access
A human mesh-centered approach to action
recognition in the operating room
Benjamin Liu 1, Gilles Soenens 2, Joshua Villarreal 3, Jeffrey Jopling 4, Isabelle Van Herzeele 2, Anita Rau 5,#, Serena Yeung-Levy 5,#
1 Department of Computer Science, Stanford University, Stanford, CA 94305, USA.
2 Department of Thoracic and Vascular Surgery, Ghent University Hospital, Gent 9000, Belgium.
3 Department of Surgery, Stanford University, Stanford, CA 94305, USA.
4 Department of Surgery, The Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA.
5 Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA.
# Authors contributed equally.
Correspondence to: Benjamin Liu, Department of Computer Science, Stanford University, 353 Serra Mall, Stanford, CA 94305,
USA. E-mail: bencliu@cs.stanford.edu
How to cite this article: Liu B, Soenens G, Villarreal J, Jopling J, Van Herzeele I, Rau A, Yeung-Levy S. A human mesh-centered
approach to action recognition in the operating room. Art Int Surg 2024;4:92-108. https://dx.doi.org/10.20517/ais.2024.19
Received: 14 Mar 2024 First Decision: 15 May 2024 Revised: 25 May 2024 Accepted: 11 Jun 2024 Published: 30 Jun 2024
Academic Editor: Andrew A. Gumbs Copy Editor: Dong-Li Li Production Editor: Dong-Li Li
Abstract
Aim: Video review programs in hospitals play a crucial role in optimizing operating room workflows. In scenarios
where split seconds can change the outcome of a surgery, the potential of such programs to improve safety and
efficiency is profound. However, realizing this potential requires systematic and automated analysis of human
actions. Existing approaches rely predominantly on manual review, which is labor-intensive, inconsistent, and
difficult to scale. Here, we present an AI-based approach to systematically analyze the behavior and actions of
individuals in operating room (OR) videos.
Methods: We designed a novel framework for human mesh recovery from long-duration surgical videos by
integrating existing human detection, tracking, and mesh recovery models. We then trained an action recognition
model to predict surgical actions from the predicted temporal mesh sequences. To train and evaluate our
approach, we annotated an in-house dataset of 864 five-second clips from simulated surgical videos with their
corresponding actions.
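As a rough illustration of this pipeline, the sketch below chains placeholder stubs for the detection, tracking, mesh recovery, and action recognition stages. The function names, mesh dimensions, frame rate, and action labels are assumptions chosen for exposition, not the authors' implementation.

```python
"""Hypothetical sketch of a mesh-centered action recognition pipeline.

Every component is a stub standing in for a real model (a person
detector, a tracker, a mesh-recovery network, an action classifier).
"""
import numpy as np

N_FRAMES = 125                 # assumed: a 5-second clip at 25 fps
N_VERTS = 6890                 # assumed: SMPL-sized body mesh
ACTIONS = ["suturing", "cutting", "idle"]  # illustrative labels only

def detect_people(frame):
    """Stub detector: one bounding box (x1, y1, x2, y2) per person."""
    return np.array([[10.0, 20.0, 110.0, 220.0]])

def track(boxes_per_frame):
    """Stub tracker: assigns a persistent ID to each detection."""
    return [{0: boxes[0]} for boxes in boxes_per_frame]  # one person, ID 0

def recover_mesh(frame, box):
    """Stub mesh-recovery model: per-frame body-mesh vertices."""
    return np.zeros((N_VERTS, 3))

def classify_action(mesh_sequence):
    """Stub action-recognition model over a temporal mesh sequence."""
    return ACTIONS[0]

# Stages 1-2: detect and track every person across the clip.
frames = [np.zeros((240, 320, 3), dtype=np.uint8) for _ in range(N_FRAMES)]
tracks = track([detect_people(f) for f in frames])

# Stage 3: recover a mesh for the tracked person in every frame.
mesh_seq = np.stack([recover_mesh(f, t[0]) for f, t in zip(frames, tracks)])

# Stage 4: predict the action from the (N_FRAMES, N_VERTS, 3) sequence.
print(classify_action(mesh_seq))  # -> "suturing"
```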
Results: Our best model achieves an F1 score of 0.81 and an area under the precision-recall curve (AUPRC) of
0.85, demonstrating that human mesh sequences can be successfully used to recover surgical actions
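For reference, these two metrics can be computed as follows with scikit-learn. The labels and scores here are synthetic, and macro averaging over action classes is an assumption about the evaluation protocol, not a detail stated above.

```python
import numpy as np
from sklearn.metrics import f1_score, average_precision_score
from sklearn.preprocessing import label_binarize

# Synthetic illustration only -- not the paper's data.
classes = [0, 1, 2]                        # e.g., three action classes
y_true = np.array([0, 2, 1, 0, 2, 1, 0])   # ground-truth clip labels
y_prob = np.random.default_rng(0).dirichlet(np.ones(3), size=len(y_true))
y_pred = y_prob.argmax(axis=1)             # hard predictions for F1

# Macro-averaged F1 over all action classes.
f1 = f1_score(y_true, y_pred, average="macro")

# AUPRC via macro-averaged average precision, one-vs-rest
# over the per-class probability scores.
y_true_onehot = label_binarize(y_true, classes=classes)
auprc = average_precision_score(y_true_onehot, y_prob, average="macro")

print(f"F1 (macro): {f1:.2f}  AUPRC (macro): {auprc:.2f}")
```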
© The Author(s) 2024. Open Access This article is licensed under a Creative Commons Attribution 4.0
International License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, sharing,
adaptation, distribution and reproduction in any medium or format, for any purpose, even commercially, as
long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and
indicate if changes were made.

