Ding et al. Art Int Surg 2024;4:109-38  https://dx.doi.org/10.20517/ais.2024.16     Page 127

               cannot currently be processed in real time for updating geometric information and require auxiliary
               constraints like stereo matching for plausible output. The lack of observability due to limited operational
               space is also a major challenge that leads to a paucity of features for geometric scene understanding.
               Incorporating other modalities like robot kinematics [2] and temporal constraints [123] can be a complement
               in this situation. The emergence of foundation models also offers an alternative approach: harnessing
               models trained on enormous collections of natural images to address the aforementioned challenges
               in the surgical domain [146,145]. However, the domain gap may hinder the optimal extraction of precise
               features [147], and further work may be required to extend these models to the surgical domain.
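
To illustrate the kind of geometric constraint that stereo matching can supply, the following is a minimal sketch, assuming a rectified stereo pair with known focal length and baseline. It applies the standard relation Z = f * B / d to turn a disparity map into depth. The function names and the numeric values are illustrative assumptions and are not taken from the reviewed works.

```python
import math


def disparity_to_depth(disparity_px, focal_px, baseline_mm, min_disp=1e-6):
    """Depth (mm) of one pixel from its stereo disparity (pixels).

    Uses Z = f * B / d for a rectified pair. Pixels with no usable
    disparity (d <= min_disp) are returned as NaN rather than a guess.
    """
    if disparity_px <= min_disp:
        return math.nan
    return focal_px * baseline_mm / disparity_px


def depth_map(disparities, focal_px, baseline_mm):
    """Apply disparity_to_depth to every entry of a 2D list of disparities."""
    return [
        [disparity_to_depth(d, focal_px, baseline_mm) for d in row]
        for row in disparities
    ]


# Hypothetical endoscope parameters: 500 px focal length, 4 mm baseline.
depths = depth_map([[10.0, 20.0], [0.0, 5.0]], focal_px=500.0, baseline_mm=4.0)
```

A depth map obtained this way could serve as a supervisory prior for a reconstruction method, with NaN entries marking pixels where matching failed and the prior should be ignored.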
               CONCLUSION
               Surgical data science has benefited from the advent of end-to-end deep learning architectures but is also
               hindered by their lack of reliability and interoperability. The DT paradigm is envisioned to advance the surgical data
               science domain further, introducing new avenues of research in surgical planning, execution, training, and
               postoperative analysis by providing a universal digital representation that enables robust and interpretable
               surgical data science research. Geometric scene understanding is the core building block of DT and plays a
               pivotal role in building and updating digital models. In this review, we find that the existing geometric
               representation and well-established tasks provide fundamental materials and tools to implement the DT
               framework and have led to the emergence of successful applications. However, challenges remain in
               employing more advanced but data-hungry methods, especially for segmentation, detection, and
               monocular depth estimation in the surgical domain, due to a lack of annotations and a gap in the scale
               of available data. The complexity of the surgical scene, with its large portion of dynamic and deformable
               tissues, and the lack of observability due to limited operational space are also common factors that hinder
               the development of geometric scene understanding tasks, especially 3D reconstruction, which demands
               multi-view observations. To address these challenges, numerous approaches, including synthetic image
               generation, sim-to-real generalization, auxiliary data incorporation, and foundational model adaptation, are
               being explored. Among these methods, auxiliary data incorporation and foundation models show the
               most promise. Because auxiliary data is not always available and the exploration of foundation models
               in surgical data science is still preliminary, we expect further advances in this direction that improve
               geometric scene understanding performance and further promote DT research. Developing an accurate,
               efficient, interactive, and reliable DT requires robust and efficient holistic geometric representations,
               combined with effective geometric scene understanding methods, to build and update digital model
               pipelines in real time.

               DECLARATIONS
               Authors’ contributions
               Initial writing of the majority of the paper and coordination of the collaboration among authors: Ding H
               Initial writing of the 3D reconstruction part, integration, and revision of the paper: Seenivasan L
               Initial writing of the depth estimation and pose estimation parts and revision of the paper: Killeen BD
               Initial writing of the pose estimation and application parts: Cho SM
               The main idea of the paper, overall structure, and revision of the paper: Unberath M


               Availability of data and materials
               See Section “Availability of data and materials” in the main text.


               Financial support and sponsorship
               This research is in part supported by (1) the collaborative research agreement with the Multi-Scale Medical
               Robotics Center at The Chinese University of Hong Kong; (2) the Link Foundation Fellowship for
               Modeling, Training, and Simulation; and (3) NIH R01EB030511 and Johns Hopkins University Internal