Ding et al. Art Int Surg 2024;4:109-38 https://dx.doi.org/10.20517/ais.2024.16 Page 123
neural radiance combined with a time-dependent neural displacement field. Advancing this further,
EndoSurf[249] employs three neural fields to model the surgical dynamics, shape, and texture.
Pose estimation
Pose estimation aims to estimate the geometric relationship between an image and a prior model, which can
take several forms[254]. These include rigid surface models, dense deformable surfaces, point-based skeletons,
and robot kinematic models[254,255]. Many of the same techniques and variations employed in depth
estimation and 3D reconstruction tasks also apply here, including feature-based matching with the
perspective-n-point problem[256] and end-to-end learning-based approaches[255]. For point-based skeletons,
such as the human body, pose estimation relied on handcrafted features[257] before the advent of deep neural
networks[258]. In the context of surgical data science, pose estimation is highly relevant given the amount of a
priori information going into any surgery, which can be leveraged to create 3D models. These include
surgical tool models, robot models, and patient images. The 6DoF pose estimation of surgical tools, for
example, in relation to patient anatomy, can enable algorithms that anticipate surgical errors and mitigate
the risk of injuries[254]. By identifying tools’ proximity to critical structures, pose estimation technologies can
ensure safer operations[259]. This is further advanced by precise 6DoF pose estimation of both instruments
and tissue.
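To make the proximity idea concrete, the following is a minimal sketch, not the method of any cited work. It assumes the 6DoF tool pose is already available as a rotation R and translation t mapping tool-frame coordinates into the patient (or camera) frame, and that the critical structure is represented as a list of 3D points in that same frame. The function names, the point-based structure representation, and the millimetre threshold are all hypothetical choices for illustration.

```python
import math


def transform_point(R, t, p):
    """Apply the rigid transform x -> R x + t.

    R is a 3x3 nested list, t and p are length-3 sequences.
    """
    return [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]


def tip_to_structure_distance(R, t, tip_in_tool, structure_points):
    """Smallest Euclidean distance from the tool tip to a point-sampled structure.

    tip_in_tool is the tip position in the tool's own frame (an assumption:
    it would come from the tool's CAD model). structure_points are 3D points
    expressed in the frame that (R, t) maps into.
    """
    tip = transform_point(R, t, tip_in_tool)
    return min(math.dist(tip, q) for q in structure_points)


def is_too_close(R, t, tip_in_tool, structure_points, threshold_mm):
    """Flag the pose when the tip comes within threshold_mm of the structure."""
    return tip_to_structure_distance(R, t, tip_in_tool, structure_points) < threshold_mm
```

A real system would also have to account for pose uncertainty and tissue deformation, which this sketch ignores.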
Deep neural networks have shown promising results for object pose estimation in
RGB images[254,260-263]. Modern approaches often train models to regress 2D key points instead of
directly estimating the object pose. These key points are then used to recover the 6DoF object pose
through the perspective-n-point (PnP) algorithm, an approach that has shown robust performance even in
scenarios with occlusions[260].
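The second stage of this pipeline, recovering a 6DoF pose from 2D-3D key point correspondences, can be sketched as follows. This is a plain direct linear transform (DLT) solution of the PnP problem written with NumPy, chosen for brevity. It is not necessarily the solver used in the cited works, and practical systems typically rely on a library solver (for example OpenCV's solvePnP) combined with outlier rejection. The function names are hypothetical, the 2D key points are assumed to have already been regressed by a network, and the camera is assumed to follow a pinhole model with known intrinsics K.

```python
import numpy as np


def project(object_points, R, t, K):
    """Project 3D model points into pixels with a pinhole camera."""
    cam = np.asarray(object_points, dtype=float) @ R.T + t
    pix = cam @ K.T
    return pix[:, :2] / pix[:, 2:3]


def pnp_dlt(object_points, image_points, K):
    """Estimate (R, t) such that x_cam = R @ X + t, using a plain DLT.

    object_points: (N, 3) model key points, N >= 6 and not all coplanar.
    image_points:  (N, 2) pixel locations of the same key points.
    K:             (3, 3) pinhole intrinsic matrix.
    """
    X = np.asarray(object_points, dtype=float)
    uv = np.asarray(image_points, dtype=float)
    n = X.shape[0]
    if n < 6:
        raise ValueError("DLT needs at least 6 correspondences")

    # Remove the intrinsics: normalized coordinates x = K^-1 [u, v, 1]^T.
    uv1 = np.hstack([uv, np.ones((n, 1))])
    xn = np.linalg.solve(K, uv1.T).T
    xn = xn[:, :2] / xn[:, 2:3]

    # Each correspondence gives two linear equations in the 12 entries of [R|t].
    Xh = np.hstack([X, np.ones((n, 1))])
    A = np.zeros((2 * n, 12))
    A[0::2, 0:4] = Xh
    A[0::2, 8:12] = -xn[:, 0:1] * Xh
    A[1::2, 4:8] = Xh
    A[1::2, 8:12] = -xn[:, 1:2] * Xh

    # Up to scale, the solution is the right singular vector that belongs
    # to the smallest singular value of A.
    _, _, Vt = np.linalg.svd(A)
    P = Vt[-1].reshape(3, 4)

    # Project the left 3x3 block onto a rotation, then undo scale and sign.
    U, S, Vt_r = np.linalg.svd(P[:, :3])
    R = U @ Vt_r
    t = P[:, 3] / S.mean()
    if np.linalg.det(R) < 0:
        R, t = -R, -t
    return R, t
```

Calling pnp_dlt with the model key points, their detected pixel locations, and K returns the rotation and translation of the object in the camera frame. The DLT is sensitive to noise and mismatched key points, which is one reason robust variants are preferred in practice.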
Hand pose estimation also benefits from these technological advancements, with several methods proposed
for deducing hand configurations from single-frame RGB images[264]. This capability is crucial for
understanding the interactions between surgical tools and the operating environment, offering insights into
the precise manipulation of instruments.
Beyond tool and hand pose estimation, human pose estimation can be applied for a broad spectrum of
clinical applications, including surgical workflow analysis, radiation safety monitoring, and enhancing
human-robot cooperation[265,266]. By leveraging videos from ceiling-mounted cameras, which capture both
personnel and equipment in the operating room, human pose estimation can identify the finer activities
within surgical phases, such as interactions between clinicians, staff, and medical equipment. The feasibility
of estimating the poses of the individuals in an operating room, utilizing color images, depth images, or a
combination of both, opens possibilities for real-time analysis of clinical environments[267].
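As an illustration of how color and depth can be combined, 2D joint detections from a color image can be lifted to 3D by back-projecting each joint with the depth measured at its pixel. The sketch below assumes a pinhole camera with known intrinsics (fx, fy, cx, cy), a depth image registered to the color image, and depth measured along the optical axis. The function names and the depth_lookup callable are hypothetical, and the cited works may combine the two modalities differently.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Pinhole back-projection of pixel (u, v) at the given depth.

    Returns the 3D point in the camera frame, in the same unit as depth.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def lift_skeleton(joints_2d, depth_lookup, fx, fy, cx, cy):
    """Lift a 2D skeleton to 3D.

    joints_2d:    dict mapping joint name -> (u, v) pixel coordinates.
    depth_lookup: callable (u, v) -> depth, or None / non-positive if missing.
    Joints without a valid depth reading are left out of the result.
    """
    joints_3d = {}
    for name, (u, v) in joints_2d.items():
        d = depth_lookup(u, v)
        if d is None or d <= 0:
            continue  # no usable depth at this pixel (occlusion, sensor dropout)
        joints_3d[name] = backproject(u, v, d, fx, fy, cx, cy)
    return joints_3d
```

Skipping joints with missing depth keeps the sketch simple. A fuller pipeline would need a strategy for occluded joints, which are common in crowded operating rooms.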
APPLICATIONS OF GEOMETRIC SCENE UNDERSTANDING EMPOWERED DIGITAL TWINS
Geometric scene understanding plays a pivotal role in developing the DT framework by enabling the
creation and real-time refinement of digital models based on real-world observations. Geometric
information processing is crucial here for precise representation, visualization, and model interaction.
Section “GEOMETRIC SCENE UNDERSTANDING TASKS” outlined the methods for processing this
information, which are critical for navigating the complex geometry of surgical settings, including identifying
the shapes, positions, and movements of anatomical features and tools. This section delves into the integration of geometric scene
understanding within the DT framework, emphasizing its successful applications. It offers valuable insights
that could be leveraged or specifically adapted to further the development of DT technologies in surgery.

