Wei et al. Art Int Surg 2024;4:187-98 | http://dx.doi.org/10.20517/ais.2024.12 Page 193
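Table 1. Depth evaluation metrics

Metric      Definition
Abs Rel     $\frac{1}{|D|} \sum_{d \in D} |d - d^*| / d^*$
Sq Rel      $\frac{1}{|D|} \sum_{d \in D} |d - d^*|^2 / d^*$
RMSE        $\sqrt{\frac{1}{|D|} \sum_{d \in D} |d^* - d|^2}$
RMSE log    $\sqrt{\frac{1}{|D|} \sum_{d \in D} |\log d^* - \log d|^2}$
$\delta$    $\frac{1}{|D|} \left| \{ d \in D \mid \max(d/d^*, d^*/d) < \delta \} \right| \times 100\%$

$d$ and $d^*$ represent the estimated depth value and the corresponding ground truth. $D$ corresponds to the estimated depth map. RMSE: Root mean square error.
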
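The metrics in Table 1 can be illustrated with a minimal sketch, which is not the authors' evaluation code. The function name `depth_metrics`, the flat-list input format, and the default threshold of 1.25 (a common convention in the depth estimation literature) are assumptions made for illustration.

```python
import math


def depth_metrics(pred, gt, delta=1.25):
    """Compute the Table 1 metrics over paired depth values.

    pred, gt: equal-length sequences of estimated and ground-truth depths.
              Only pixels with valid, positive depth should be passed in.
    delta:    accuracy threshold (1.25 is an assumed, conventional value).
    """
    if len(pred) != len(gt) or len(pred) == 0:
        raise ValueError("pred and gt must be non-empty and of equal length")
    n = len(pred)
    pairs = list(zip(pred, gt))

    abs_rel = sum(abs(d - g) / g for d, g in pairs) / n
    sq_rel = sum((d - g) ** 2 / g for d, g in pairs) / n
    rmse = math.sqrt(sum((d - g) ** 2 for d, g in pairs) / n)
    rmse_log = math.sqrt(
        sum((math.log(g) - math.log(d)) ** 2 for d, g in pairs) / n
    )
    # Percentage of pixels whose ratio to ground truth is within the threshold.
    within = sum(1 for d, g in pairs if max(d / g, g / d) < delta)
    delta_pct = 100.0 * within / n

    return {
        "abs_rel": abs_rel,
        "sq_rel": sq_rel,
        "rmse": rmse,
        "rmse_log": rmse_log,
        "delta_pct": delta_pct,
    }
```

For the first four metrics lower is better, while for the threshold accuracy higher is better.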
3.3 Evaluation on scale-aware depth estimation
We compare the accuracy of depth estimation using the KV-EndoNeRF method with several other deep
learning-based approaches and the SfM method, specifically COLMAP [7].
• COLMAP [7] is a general-purpose SfM pipeline used for 3D point cloud reconstruction from
ordered and unordered image collections. In our study, we apply it to monocular surgical scene
reconstruction. The recovered points are then projected onto each image plane to obtain the sparse depth maps for
evaluation.
• EndoSLAM [30] is an unsupervised relative monocular depth estimation method specifically designed for
gastrointestinal tract organs. It combines residual networks with a spatial attention module to focus on
highly textured tissue regions. We fine-tune the depth model using the SCARED data for comparison.
• AF-SfMLearner [10] is a novel self-supervised network for estimating monocular depth in endoscopic scenes.
It is trained on the SCARED datasets, which contain severe brightness fluctuations induced by illumination
variations, non-Lambertian reflections, and inter-reflections.
• DS-NeRF [17] is a general depth-supervised NeRF method that utilizes sparse reconstruction from the SfM
to recover dense 3D structures. We apply DS-NeRF to estimate dense depth maps for each endoscopic
image.
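The projection step described for the COLMAP baseline, in which sparse 3D points are projected onto each image plane, can be sketched as follows. This is an illustrative pinhole-camera example and not the authors' code. The function name `project_to_sparse_depth`, the dictionary output, and the assumption of an undistorted camera with a world-to-camera pose (x_cam = R X + t) are all hypothetical choices made here.

```python
def project_to_sparse_depth(points_world, R, t, fx, fy, cx, cy, width, height):
    """Project 3D points into one image and return a sparse depth map.

    points_world: iterable of (X, Y, Z) points in world coordinates.
    R, t:         3x3 rotation (nested lists) and 3-vector translation of the
                  assumed world-to-camera pose, x_cam = R @ X + t.
    fx, fy, cx, cy: pinhole intrinsics in pixels (lens distortion is ignored).
    Returns a dict mapping (row, col) -> depth, keeping the nearest point
    when several points fall on the same pixel.
    """
    depth = {}
    for X in points_world:
        # Transform the point into the camera frame.
        xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
        z = xc[2]
        if z <= 0:
            continue  # point lies behind the camera
        # Perspective projection to pixel coordinates.
        u = int(round(fx * xc[0] / z + cx))
        v = int(round(fy * xc[1] / z + cy))
        if 0 <= u < width and 0 <= v < height:
            key = (v, u)
            if key not in depth or z < depth[key]:
                depth[key] = z
    return depth
```

Only pixels that receive a projected point carry a depth value, which is why the resulting maps are sparse and the evaluation is restricted to those pixels.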
We present the quantitative depth comparison results on the SCARED data in Table 2, where the predictions are
rescaled using ground-truth median scaling. In addition to the standard depth evaluation metrics, we
calculate the means and standard errors of the rescaling factors to demonstrate the scale-awareness ability.
KV-EndoNeRF achieves the best up-to-scale performance with respect to five metrics and ranks second
best for the other two metrics. Notably, KV-EndoNeRF also achieves nearly perfect absolute scale estimation.
These quantitative results show that our proposed method effectively extracts absolute scale information from
kinematics and integrates it into NeRF for further depth optimization, resulting in accurate absolute depth
estimation.
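Ground-truth median scaling and the statistics of the resulting rescaling factors can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the factor is taken per frame as the ratio of the ground-truth median to the predicted median, and the standard error is computed as the sample standard deviation divided by the square root of the number of frames.

```python
import math
import statistics


def median_scale_factor(pred, gt):
    """Per-frame rescaling factor: median(gt) / median(pred)."""
    return statistics.median(gt) / statistics.median(pred)


def rescale(pred, scale):
    """Apply a rescaling factor to every predicted depth value."""
    return [scale * d for d in pred]


def scale_statistics(factors):
    """Mean and standard error of the per-frame rescaling factors.

    The standard error is assumed to be sample_std / sqrt(n); it is
    reported as 0.0 when only one frame is available.
    """
    mean = statistics.fmean(factors)
    if len(factors) < 2:
        return mean, 0.0
    sem = statistics.stdev(factors) / math.sqrt(len(factors))
    return mean, sem
```

Under this convention, a mean rescaling factor close to 1 with a small standard error indicates that the predictions are already in metric scale before any alignment to the ground truth.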
Furthermore, we select four representative images from the SCARED dataset for qualitative depth comparison.
As shown in Figure 3, our method with NeRF-based optimization produces depth predictions with sharp
boundaries and fine-grained details, outperforming other approaches in terms of absolute depth estimation.
However, COLMAP could only recover sparse depth maps without the entire 3D geometry of the tissue surface.
While EndoSLAM and AF-SfMLearner are capable of generating reasonable 3D structures of tissues, they lose
many details in tissues with complex geometries and edges. Lastly, the estimated depth values from DS-NeRF
contain significant noise, which could affect the surgeons’ observations of complicated tissue surfaces.

