Shi et al. Art Int Surg 2024;4:247-57
DOI: 10.20517/ais.2024.17
Artificial Intelligence Surgery
Original Article Open Access
Long-term reprojection loss for self-supervised
monocular depth estimation in endoscopic surgery
Xiaowei Shi¹, Beilei Cui², Matthew J. Clarkson¹, Mobarakol Islam¹

¹Wellcome/EPSRC Centre for Interventional and Surgical Sciences (WEISS) and Department of Medical Physics and Biomedical Engineering, University College London, London WC1E 6BT, UK.
²Department of Electronic Engineering, The Chinese University of Hong Kong, Hong Kong 999077, China.
Correspondence to: Xiaowei Shi, Wellcome/EPSRC Centre for Interventional and Surgical Sciences (WEISS) and Department of
Medical Physics and Biomedical Engineering, University College London, Gower Street, London WC1E 6BT, UK. E-mail:
xiaowei.shi.22@alumni.ucl.ac.uk
How to cite this article: Shi X, Cui B, Clarkson MJ, Islam M. Long-term reprojection loss for self-supervised monocular depth
estimation in endoscopic surgery. Art Int Surg 2024;4:247-57. https://dx.doi.org/10.20517/ais.2024.17
Received: 1 Mar 2024 First Decision: 12 Jul 2024 Revised: 6 Aug 2024 Accepted: 2 Sep 2024 Published: 10 Sep 2024
Academic Editors: Luca Milone, Andrew A. Gumbs Copy Editor: Pei-Yun Wang Production Editor: Pei-Yun Wang
Abstract
Aim: Depth information plays a key role in enhancing perception and interaction in image-guided surgery. However, it is difficult to obtain depth information in monocular endoscopic surgery due to the lack of reliable cues for perceiving depth. Although reprojection loss-based self-supervised learning techniques exist for estimating depth and pose, the temporal information from adjacent frames is not efficiently utilized to handle occlusion in surgery.
Methods: We design a long-term reprojection loss (LT-RL) technique for self-supervised monocular depth estimation by integrating longer temporal sequences into the reprojection, learning better perception and addressing occlusion artifacts in image-guided laparoscopic and robotic surgery. For this purpose, we exploit four temporally adjacent source frames before and after the target frame, whereas the conventional reprojection loss uses only the two immediately adjacent frames. Pixels that are visible in the target frame but occluded in the two immediately adjacent frames produce inaccurate depth, yet have a higher chance of appearing in the four adjacent frames during the calculation of the minimum reprojection loss.
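To make the idea concrete, the following is a minimal PyTorch sketch (not the authors' released code) of a per-pixel minimum reprojection loss taken over several warped source frames. It uses a plain L1 photometric error for brevity, whereas losses of this family typically combine SSIM with L1, and it assumes each source frame has already been warped into the target view using the predicted depth and relative camera pose.

import torch

def min_reprojection_loss(target, warped_sources):
    """Per-pixel minimum reprojection loss over several warped source frames.

    target:         (B, 3, H, W) target frame I_t.
    warped_sources: list of (B, 3, H, W) source frames I_{t+k},
                    with k in {-2, -1, +1, +2} for the long-term variant,
                    each already warped into the target view using the
                    predicted depth and relative camera pose.
    """
    # L1 photometric error for each warped source, each of shape (B, 1, H, W).
    errors = [(w - target).abs().mean(dim=1, keepdim=True)
              for w in warped_sources]
    # Per-pixel minimum across sources: a pixel occluded in one source
    # frame can still be matched in a farther frame where it is visible.
    per_pixel_min, _ = torch.cat(errors, dim=1).min(dim=1)
    return per_pixel_min.mean()

Taking the minimum per pixel, rather than averaging, is what lets the wider temporal window help: an occluded pixel only needs to be visible in one of the four source frames to receive a low-error supervision signal.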
Results: We validate LT-RL on the benchmark surgical datasets Stereo Correspondence and Reconstruction of Endoscopic Data (SCARED) and Hamlyn to compare its performance with other state-of-the-art depth estimation methods. The experimental results show that our proposed technique yields a 2%-4% improvement in root-mean-squared error (RMSE).