Ding et al. Art Int Surg 2024;4:109-38  https://dx.doi.org/10.20517/ais.2024.16     Page 111

               causal relation with the surgical task, should ensure superior generalizability and interpretability of surgical
               data science research.

               The fundamental component of the DT paradigm is the building and updating of the digital model from
               real-world observations. In this process, geometric information processing plays a central role in the
               representation, visualization, and interaction of the digital model. Thus, geometric scene understanding
               (i.e., perceiving geometric information from the target scene) is vital to realizing the DT
               and to further DT-based research in surgical data science. In this work, we focus on reviewing the geometric
               representations, geometric scene understanding techniques, and their successful application for building
               primitive DT frameworks. The design of geometric representations needs to consider the trade-off among
               accuracy, representation ability, complexity, interactivity, and interpretability, as they correspond, respectively, to the
               accuracy, applicability, efficiency, interactivity, and reliability of the DT framework. The extraction and
               encoding of different target representations and their status in the digital world lead to the establishment
               and development of various geometric scene understanding tasks, including segmentation, detection, depth
               estimation, 3D reconstruction, and pose estimation [Figure 1]. Although various sensors in a surgical setup,
               such as optical trackers, depth sensors, and robotics meters, are important for geometric understanding and
               DT instantiation in some procedures[7], we focus on reviewing methods based on visible light imaging, as it
               is the primary real-time observation source in most surgeries, especially in minimally invasive surgeries
               (MIS) due to their limited operational space. We further select methods that achieve superior benchmark
               performance for geometric scene understanding, which can be applied for accurate DT construction and
               updating. The integration of geometric scene understanding within the DT framework has led to successful
               applications, including simulator-empowered DT models and procedure-specific DT models.


               The paper is organized as follows: Section “GEOMETRIC REPRESENTATIONS” provides an overview of
               the existing digital representations for geometric understanding. Section “GEOMETRIC SCENE
               UNDERSTANDING TASKS” investigates existing datasets and various algorithms used to extract
               geometric understanding, assessing their effectiveness and limitations in terms of benchmark performance.
               Section “APPLICATIONS OF GEOMETRIC SCENE UNDERSTANDING EMPOWERED DIGITAL
               TWINS” explores the successful attempts to apply geometric scene understanding techniques in DT.
               Concluding the paper, Section “DISCUSSION” offers an in-depth discussion on the present landscape,
               challenges, and future directions in the field of geometric understanding within surgical data science.


               GEOMETRIC REPRESENTATIONS
               This section introduces various geometric representation categories and analyzes their advantages and
               disadvantages in relation to DT requirements. The taxonomy, along with some examples and summarized
               takeaways, is shown in Figure 2. We first present direct grid-based and point-based representations. Then,
               we discuss latent space representations generated via rule-based or neural encoding of the preceding
               representations or other modalities. Finally, we cover functional representations, which are generated
               through meticulous mathematical derivation or estimation based on observation.

               Grid-based representation
               The grid-based representation divides the 2D/3D space into discrete cells and is usually stored as a
               multidimensional array. Each cell holds values for various attributes, including density, color, semantic
               classes, and others. For example, segmentation masks represent shapes of interest in a rectangular space
               uniformly divided into a 2D grid of cells, where each cell is called a pixel. Some 3D shapes are also stored in
               uniformly divided 3D grids of cells. These cells are called voxels and usually hold the density/occupancy
               value. While these simplified representations enable efficient data queries for specific locations, they come