Ding et al. Art Int Surg 2024;4:109-38 https://dx.doi.org/10.20517/ais.2024.16 Page 111
causal relation with the surgical task, should ensure superior generalizability and interpretability of surgical
data science research.
The fundamental component of the DT paradigm is the building and updating of the digital model from
real-world observations. In this process, geometric information processing plays a central role in the
representation, visualization, and interaction of the digital model. Thus, geometric scene understanding
(i.e., perceiving geometric information from the target scene) is vital to realizing the DT
and to further DT-based research in surgical data science. In this work, we focus on reviewing the geometric
representations, geometric scene understanding techniques, and their successful application for building
primitive DT frameworks. The design of geometric representations must balance trade-offs among
accuracy, representation ability, complexity, interactivity, and interpretability, which correspond respectively to the
accuracy, applicability, efficiency, interactivity, and reliability of the DT framework. The extraction and
encoding of different target representations and their status in the digital world lead to the establishment
and development of various geometric scene understanding tasks, including segmentation, detection, depth
estimation, 3D reconstruction, and pose estimation [Figure 1]. Although various sensors in a surgical setup,
such as optical trackers, depth sensors, and robotics meters, are important for geometric understanding and
DT instantiation in some procedures[7], we focus on reviewing methods based on visible light imaging, as it
is the primary real-time observation source in most surgeries, especially in minimally invasive surgeries
(MIS) due to their limited operational space. We further select methods that achieve superior benchmark
performance for geometric scene understanding, which can be applied for accurate DT construction and
updating. The integration of geometric scene understanding within the DT framework has led to successful
applications, including simulator-empowered DT models and procedure-specific DT models.
The paper is organized as follows: Section “GEOMETRIC REPRESENTATIONS” provides an overview of
the existing digital representations for geometric understanding. Section “GEOMETRIC SCENE
UNDERSTANDING TASKS” investigates existing datasets and various algorithms used to extract
geometric understanding, assessing their effectiveness and limitations in terms of benchmark performance.
Section “APPLICATIONS OF GEOMETRIC SCENE UNDERSTANDING EMPOWERED DIGITAL
TWINS” explores the successful attempts to apply geometric scene understanding techniques in DT.
Concluding the paper, Section “DISCUSSION” offers an in-depth discussion on the present landscape,
challenges, and future directions in the field of geometric understanding within surgical data science.
GEOMETRIC REPRESENTATIONS
This section introduces various geometric representation categories and analyzes their advantages and
disadvantages in relation to DT requirements. The taxonomy, along with some examples and summarized
takeaways, is shown in Figure 2. We first present direct grid-based and point-based representations. Then,
we discuss latent space representations generated via rule-based or neural encoding of the preceding
representations or other modalities. Finally, we cover functional representations, which are generated
through meticulous mathematical derivation or estimation based on observation.
Grid-based representation
The grid-based representation divides 2D/3D space into discrete cells, which are usually stored as
multidimensional arrays. Each cell holds values for attributes such as density, color, and semantic
class. For example, a segmentation mask represents shapes of interest in a rectangular space
uniformly divided into a 2D grid of cells, where each cell is called a pixel. Some 3D shapes are likewise stored in
uniformly divided 3D grids of cells, called voxels, which usually hold a density or occupancy
value. While these simplified representations enable efficient data queries for specific locations, they come
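As an illustrative sketch of the array-backed storage described in this subsection, the Python example below stores a 2D segmentation mask and a 3D occupancy grid and queries each by location. It is not drawn from any reviewed method: the names `label_at` and `VoxelGrid`, the class labels, and the grid dimensions are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# 2D case: a segmentation mask is a grid of pixels, one class label per cell.
# Labels are illustrative only: 0 = background, 1 = instrument, 2 = tissue.
mask = [
    [0, 0, 1, 1],
    [0, 2, 1, 1],
    [2, 2, 2, 0],
]


def label_at(mask: List[List[int]], row: int, col: int) -> int:
    """Constant-time lookup of the class label stored at one pixel."""
    return mask[row][col]


@dataclass
class VoxelGrid:
    """Uniform 3D occupancy grid (hypothetical helper for illustration)."""
    origin: Tuple[float, float, float]  # world coordinates of the grid corner
    voxel_size: float                   # edge length of one cubic cell
    shape: Tuple[int, int, int]         # number of cells along x, y, z
    cells: List[float] = field(default_factory=list)  # flat occupancy storage

    def __post_init__(self) -> None:
        nx, ny, nz = self.shape
        if not self.cells:
            self.cells = [0.0] * (nx * ny * nz)

    def index_of(self, point: Tuple[float, float, float]) -> Tuple[int, int, int]:
        """Map a world-space point to the integer indices of its voxel."""
        idx = tuple(
            int((p - o) // self.voxel_size) for p, o in zip(point, self.origin)
        )
        if any(i < 0 or i >= n for i, n in zip(idx, self.shape)):
            raise IndexError(f"point {point} lies outside the grid")
        return idx

    def _flat(self, idx: Tuple[int, int, int]) -> int:
        """Convert (i, j, k) indices to a position in the flat cell list."""
        i, j, k = idx
        _, ny, nz = self.shape
        return (i * ny + j) * nz + k

    def set_occupancy(self, point: Tuple[float, float, float], value: float) -> None:
        self.cells[self._flat(self.index_of(point))] = value

    def occupancy(self, point: Tuple[float, float, float]) -> float:
        return self.cells[self._flat(self.index_of(point))]


grid = VoxelGrid(origin=(0.0, 0.0, 0.0), voxel_size=0.5, shape=(4, 4, 4))
grid.set_occupancy((1.2, 0.3, 0.9), 1.0)

print(label_at(mask, 1, 1))              # class label of one pixel
print(grid.index_of((1.2, 0.3, 0.9)))    # voxel indices containing the point
print(grid.occupancy((1.4, 0.1, 0.6)))   # a nearby point in the same voxel
print(len(grid.cells))                   # dense storage: nx * ny * nz cells
```

In this sketch a location query reduces to an index computation and one array access, which is what makes such lookups cheap. The dense cell list also shows that storage grows with the product of the grid dimensions, regardless of how much of the space is occupied.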

