In general, the image photometric loss combines the structural similarity metric (SSIM) [26] with an $L_1$ regularization loss. Here, the wavelet SSIM loss is used to replace the SSIM loss in the photometric term. Therefore, the photometric error between two images $I_a$ and $I_b$ is defined as

$pe(I_a, I_b) = \alpha \, \frac{1 - \mathrm{WSSIM}(I_a, I_b)}{2} + (1 - \alpha) \, \lVert I_a - I_b \rVert_1$   (10)

where we empirically set $\alpha = 0.85$.
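As a concrete illustration, the following PyTorch sketch computes the per-pixel error of Equation (10). Because the wavelet SSIM variant is not detailed here, a standard 3 × 3 mean-pooled SSIM is substituted in its place; the function names and the (N, C, H, W) tensor layout are our own assumptions, not the authors' implementation.

import torch
import torch.nn.functional as F

def ssim(x, y):
    # Simplified 3x3 mean-pooled SSIM (a stand-in for the paper's wavelet SSIM).
    C1, C2 = 0.01 ** 2, 0.03 ** 2
    mu_x = F.avg_pool2d(x, 3, 1, 1)
    mu_y = F.avg_pool2d(y, 3, 1, 1)
    sigma_x = F.avg_pool2d(x * x, 3, 1, 1) - mu_x ** 2
    sigma_y = F.avg_pool2d(y * y, 3, 1, 1) - mu_y ** 2
    sigma_xy = F.avg_pool2d(x * y, 3, 1, 1) - mu_x * mu_y
    num = (2 * mu_x * mu_y + C1) * (2 * sigma_xy + C2)
    den = (mu_x ** 2 + mu_y ** 2 + C1) * (sigma_x + sigma_y + C2)
    return torch.clamp(num / den, 0, 1)

def photometric_error(target, reconstructed, alpha=0.85):
    # Eq. (10): alpha * (1 - SSIM)/2 + (1 - alpha) * L1, computed per pixel.
    ssim_term = (1.0 - ssim(target, reconstructed)) / 2.0
    l1_term = (target - reconstructed).abs()
    return alpha * ssim_term + (1.0 - alpha) * l1_term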
When computing the photometric loss from multiple source images, most previous approaches average the photometric losses over all available source images. However, the second assumption requires that each pixel in the target image is also visible in the source images, and this assumption is easily broken: moving objects and occlusions inevitably exist in the scene, so some pixels that are visible in one image are not visible in the next. This causes inaccurate pixel reconstruction and spurious photometric error. Following the work in [22], the minimum photometric loss at each pixel of the target image is computed instead of the average. Note that this method can only reduce the erroneous photometric loss, not eliminate it. Therefore, the final per-pixel photometric loss is
$L_p = \min_s \, pe(I_t, I_{s \to t})$   (11)

where $I_t$ is the target image and $I_{s \to t}$ is the source image $I_s$ warped into the target view.
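A minimal sketch of Equation (11), reusing the `photometric_error` helper above; averaging over color channels before taking the per-pixel minimum is an assumption.

def min_photometric_loss(target, warped_sources, alpha=0.85):
    # Eq. (11): per-pixel minimum of the photometric error over all
    # source images warped into the target view.
    errors = [photometric_error(target, w, alpha).mean(1, keepdim=True)
              for w in warped_sources]
    return torch.cat(errors, dim=1).min(dim=1, keepdim=True).values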
In addition, the performance of the depth network suffers from the influence of moving objects in the image. These moving pixels should not be involved in computing the photometric loss. Therefore, the binary per-pixel mask from [22] is applied to automatically distinguish moving pixels ($\mu = 0$) from static pixels ($\mu = 1$). The mask keeps only those pixels whose photometric error against the reconstructed image is lower than their error between the target image $I_t$ and the raw source image $I_s$. The mask is defined as

$\mu = \left[ \, \min_s pe(I_t, I_s) > \min_s pe(I_t, I_{s \to t}) \, \right]$   (12)

where $[\,\cdot\,]$ is the Iverson bracket. The auto-masking photometric loss [22] is
$L_{ph} = \mu \, L_p$   (13)
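The masking of Equations (12) and (13) can be sketched as follows, again reusing the helpers above; `sources` holds the raw source frames and `warped_sources` their reconstructions in the target view (names are illustrative).

def automasked_photometric_loss(target, sources, warped_sources, alpha=0.85):
    # Eq. (12): mask out pixels where a raw, unwarped source already matches
    # the target better than the warped reconstruction does (e.g., static
    # scenes or objects moving at the same speed as the camera).
    warped_err = min_photometric_loss(target, warped_sources, alpha)
    identity_err = min_photometric_loss(target, sources, alpha)
    mu = (identity_err > warped_err).float()  # Iverson bracket of Eq. (12)
    # Eq. (13): apply the binary mask to the per-pixel minimum loss.
    return mu * warped_err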
The second-order gradients of the depth map are used to encourage smoothness. Because edges and corners in the depth map should be less smooth than flat regions, the gradient of the depth map should be locally smooth rather than uniformly smooth. Therefore, a Laplacian [23] is applied to automatically perceive the position of each pixel. Different from the method in [23], it is applied at every scale instead of a single specific scale. The Laplacian template is a second-order difference over the four-neighborhood; it reinforces object edges and suppresses regions of slowly varying intensity. A pixel's smoothness loss receives a lower weight when its Laplacian response is higher. The smoothness loss is defined as follows:
$L_s = e^{-\nabla^2 (I_t)} \left( \left| \partial_x^2 d_t \right| + \left| \partial_{xy}^2 d_t \right| + \left| \partial_y^2 d_t \right| \right)$   (14)

$\nabla^2 = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}$   (15)

where $\nabla^2$ is the Laplacian operator, $I_t$ is the target image, and $d_t$ is the predicted depth map.
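A sketch of Equations (14) and (15). It assumes the Laplacian edge weight is computed on the intensity of the target image and that the second-order depth gradients are taken by finite differences; taking the absolute value of the Laplacian inside the exponent is a stabilizing assumption on our part.

def laplacian_smoothness_loss(disp, img):
    # Eq. (15): four-neighborhood Laplacian template (second-order differencing).
    kernel = torch.tensor([[0., 1., 0.],
                           [1., -4., 1.],
                           [0., 1., 0.]]).view(1, 1, 3, 3).to(img.device)
    gray = img.mean(1, keepdim=True)            # crude intensity image
    lap = F.conv2d(gray, kernel, padding=1)
    weight = torch.exp(-lap.abs())              # lower weight at strong edges

    # Eq. (14): second-order disparity gradients via finite differences.
    dxx = disp[:, :, :, 2:] - 2 * disp[:, :, :, 1:-1] + disp[:, :, :, :-2]
    dyy = disp[:, :, 2:, :] - 2 * disp[:, :, 1:-1, :] + disp[:, :, :-2, :]
    dxy = (disp[:, :, 1:, 1:] - disp[:, :, 1:, :-1]
           - disp[:, :, :-1, 1:] + disp[:, :, :-1, :-1])

    # Crop the weight map to match each gradient's valid interior region.
    return (weight[:, :, :, 1:-1] * dxx.abs()).mean() \
         + (weight[:, :, 1:-1, :] * dyy.abs()).mean() \
         + (weight[:, :, 1:, 1:] * dxy.abs()).mean()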
Therefore, the total loss function is

$L = L_{ph} + L_s$   (16)
The final total loss is averaged per pixel, batch, and scale.
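Putting the terms together, a sketch of Equation (16) with the per-pixel, per-batch, per-scale averaging described above; upsampling lower-scale disparities to the input resolution before computing the losses is a common choice assumed here, not stated by the text.

def total_loss(target, sources, warped_per_scale, disps, alpha=0.85):
    # Eq. (16): masked photometric loss plus smoothness loss, averaged
    # over pixels and batch (via .mean()) and over scales (final division).
    per_scale = []
    for disp, warped in zip(disps, warped_per_scale):
        disp_up = F.interpolate(disp, size=target.shape[-2:],
                                mode="bilinear", align_corners=False)
        ph = automasked_photometric_loss(target, sources, warped, alpha).mean()
        sm = laplacian_smoothness_loss(disp_up, target)
        per_scale.append(ph + sm)
    return sum(per_scale) / len(per_scale)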