Page 110 Ding et al. Art Int Surg 2024;4:109-38 https://dx.doi.org/10.20517/ais.2024.16
INTRODUCTION
Surgical data science is an emerging interdisciplinary research domain that has the potential to transform
the future of surgery. Capitalizing on the pre- and intraoperative surgical data, the research efforts in
surgical data science are dedicated to enhancing the quality, safety, and efficacy of interventional
healthcare [1]. With the advent of powerful machine learning algorithms for surgical image and video
analysis, surgical data science has witnessed a significant thrust, enabling solutions for problems that were
once considered exceptionally difficult. These advances range from low-level vision tasks,
such as surgical instrument segmentation or “critical view of safety” classification [2], to high-level and
downstream challenges, such as intraoperative guidance [3-7], intelligent assistance systems [8-10], surgical phase
recognition [11-18], gesture classification [14,19-21], and skills analysis [20,22,23]. While end-to-end deep learning
models have been the backbone of recent advancements in surgical data science, the high-level surgical
analysis derived from these models raises concerns about reliability due to the lack of interpretability and
explainability. As an alternative to these end-to-end approaches, the emerging digital twin (DT) paradigm, a virtual
equivalent of the real world, allows interpretable high-level surgical scene analysis on enriched digital data
generated from low-level tasks.
End-to-end deep learning models have been the standard approach to surgical data science in both low- and
high-level tasks. These models either focus on specific tasks or are used as foundational models, solving
multiple downstream tasks. This somewhat straightforward approach, inspired by deep learning best
practices, has historically excelled in task-specific performances due to deep learning’s powerful
representation learning capabilities. However, we argue that this approach, despite its recent successes, is
ripe for innovation [1,24,25]. End-to-end deep learning models exhibit strong tendencies to learn, exploit, or
give in to non-causal relationships, or shortcuts, in the data [26-28]. Because it is impossible at worst or very
difficult at best to distinguish low-level vision from high-level surgical data science components in end-to-end
deep neural networks, it generally remains unclear how reliable these solutions are under various
domain shifts and whether they associate the correct input signals with the resulting prediction [29]. These
uncertainties and unreliability hinder further development in the surgical data science domain and the
clinical translation of the current achievements [1]. While explainable machine learning, among other
techniques, seeks to develop methods that may assert adequate model behavior [30], by and large, this
limitation poses a challenge that we believe is not easily remedied with explanation-like constructs of
similarly end-to-end deep learning origin.
The DT paradigm offers an alternative to task-specific end-to-end machine learning-based approaches for
current surgical data science research. It provides a clear framework to separate low-level processing from
high-level analysis. As a virtual equivalent of the real environment (surgical field in surgical data science),
the DT models the real-world dynamics and properties using the data obtained from sensor-rich
environments through low-level processing [7,31-33]. The resulting DT is ready for high-level complex analysis
since all relevant quantities are known precisely and in a computationally accessible form. Unlike end-to-
end deep learning paradigms that rely on data fitting, the DT paradigm employs data to construct a DT
model. While the low-level processing algorithms that enable the DT are not immune to non-causal
learning and environmental influences during the machine learning process, which might compromise
robustness or performance, their impact is mostly confined to the accuracy of digital model construction
and update. The resulting digital model in the DT paradigm can provide not only visual guidance, as in
mixed reality, but also, more importantly, a platform for more comprehensive surgical data science research,
such as data generation, high-level surgical analysis (e.g., surgical phase recognition and skill assessment), and
autonomous agent training. DT’s uniform representation of causal factors, including geometric and physical
attributes of the subjects and tools, surgery-related prior knowledge, and user input, along with their clear

