
Page 256                                                                Shi et al. Art Int Surg 2024;4:247-57  https://dx.doi.org/10.20517/ais.2024.17

               Table 3. Ablation study on fewer or more frames
                Frames      Abs Rel (% ↓)         Sq Rel (% ↓)       RMSE (↓)        RMSE Log (↓)
                2           0.062                 0.513              5.289           0.094
                4           0.058                 0.452              5.014           0.083
                6           0.611                 0.448              5.209           0.091

               Quantitative comparison of 2, 4 and 6 consecutive frames on the SCARED dataset. The unit of each metric (% or millimeters, mm)
               is indicated in brackets. The best results are in black bold. Abs Rel: Absolute relative error; Sq Rel: square relative error; RMSE:
               root-mean-squared error; RMSE Log: root-mean-square logarithmic error; SCARED: stereo correspondence and reconstruction of endoscopic data.
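
               The four error metrics in Table 3 can be sketched as follows. This is a minimal illustration using the
               standard definitions of these metrics in the depth-estimation literature, not the authors' evaluation
               code. The function name `depth_metrics` is hypothetical, and the inputs are assumed to be flat lists of
               predicted and ground-truth depths (in mm) already restricted to valid pixels with positive depth.

```python
import math


def depth_metrics(pred, gt):
    """Return (abs_rel, sq_rel, rmse, rmse_log) for paired depth values.

    Assumption: pred and gt are equal-length lists of positive depths (mm),
    already masked to pixels with valid ground truth.
    """
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and of equal length")
    n = len(pred)
    pairs = list(zip(pred, gt))
    # Absolute relative error: mean of |d - d*| / d*
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    # Square relative error: mean of (d - d*)^2 / d*
    sq_rel = sum((p - g) ** 2 / g for p, g in pairs) / n
    # Root-mean-squared error: sqrt of mean (d - d*)^2
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    # Root-mean-square logarithmic error: sqrt of mean (log d - log d*)^2
    rmse_log = math.sqrt(
        sum((math.log(p) - math.log(g)) ** 2 for p, g in pairs) / n
    )
    return abs_rel, sq_rel, rmse, rmse_log
```

               All four metrics are errors, so lower values are better, as the arrows in the table header indicate.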


               regions, and handle complex anatomical structures. Ablation studies underscore the importance of utilizing
               an optimal number of consecutive frames (in this case, 4) to maximize depth estimation performance while
               mitigating occlusion. While LT-RL does not affect the inference phase, its requirement for additional frames
               during training increases the computational overhead. Additionally, although our method demonstrates
               excellent generalization on the Hamlyn dataset, the specificity of our validation datasets suggests that
               further research is needed to fully understand LT-RL’s performance across a broader range of endoscopic
               and surgical scenarios.
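
               The trade-off between occlusion handling and training cost discussed above can be illustrated with a
               toy sketch. It assumes a Monodepth2-style per-pixel minimum of an L1 photometric error over several
               source frames that have already been warped into the target view; this combination rule, the function
               names, and the flat-list image representation are assumptions made for illustration and are not taken
               from the LT-RL implementation.

```python
def photometric_l1(target, warped):
    """Per-pixel absolute intensity difference between two flat images."""
    return [abs(t - w) for t, w in zip(target, warped)]


def multi_frame_reprojection_loss(target, warped_sources):
    """Mean over pixels of the minimum L1 error across warped source frames.

    Assumption: each entry of warped_sources is a source frame already
    warped into the target view using predicted depth and pose.
    """
    if not warped_sources:
        raise ValueError("at least one warped source frame is required")
    # One error map per source frame: cost grows with the number of frames.
    errors = [photometric_l1(target, w) for w in warped_sources]
    # A pixel occluded in one frame can still be explained by another frame.
    per_pixel_min = [min(px) for px in zip(*errors)]
    return sum(per_pixel_min) / len(per_pixel_min)
```

               In this sketch, each extra source frame adds one more error map to compute during training, while a
               pixel that is occluded in one frame can take its error from a frame where it is visible. Nothing in the
               function is needed at inference time, where only the depth network is run.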


               In conclusion, we present LT-RL, which integrates longer temporal information to tackle occlusion artifacts in
               endoscopic surgery. Our extensive validation and comparison demonstrate that it is crucial to
               consider small camera pose changes in endoscopic surgery, and that the proposed LT-RL addresses this issue
               successfully. External validation on the Hamlyn dataset demonstrates the improved robustness and
               generalization of the proposed method. Although LT-RL requires extra computation for the additional
               frames during training, the inference phase is unaffected because no loss needs to be calculated in
               deployment. Our self-supervised loss is simple, flexible and easy to adapt to any convolutional or
               recent transformer-based network architecture. The excellent 3D reconstruction reflects the better depth
               and pose learning and prediction of LT-RL compared with other models. Future work should
               investigate the reliability of LT-RL relative to the vanilla reprojection loss. Computational efficiency could also be
               improved by using a shared encoder and an equal number of input frames for both the depth and pose
               estimation tasks.


               DECLARATIONS
               Authors’ contributions
               Conceptualization, investigation, methodology, validation, visualization, writing - original draft, writing -
               review and editing: Shi X
               Conceptualization, methodology, visualization, writing - original draft, writing - review and editing:
               Islam M
               Conceptualization, methodology, writing - review and editing: Clarkson MJ
               Conceptualization, validation, visualization, writing - review and editing: Cui B


               Availability of data and materials
               Our code is available at https://github.com/xiaowshi/Long-Term_Reprojection_Loss.


               Financial support and sponsorship
               This work was part-funded by the EPSRC grant [EP/W00805X/1].