
Li et al. Intell Robot 2021;1(1):84-98  I http://dx.doi.org/10.20517/ir.2021.06       Page 92


In general, the image photometric loss combines the structural similarity metric (SSIM) [26] with an L1 regularization loss. Here, a wavelet SSIM loss replaces the standard SSIM loss in the photometric term. The image photometric loss is therefore defined as

    pe(I_t, I_{s→t}) = (α/2) (1 − WSSIM(I_t, I_{s→t})) + (1 − α) ‖I_t − I_{s→t}‖_1      (10)

where we empirically set α = 0.85.
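As a concrete sketch, Eq. (10) can be written as a small NumPy function. The function name is ours, and the per-pixel (wavelet) SSIM map is assumed to be computed elsewhere and passed in:

```python
import numpy as np

ALPHA = 0.85  # empirical weight between the (W)SSIM and L1 terms, Eq. (10)

def photometric_error(ssim_map, target, recon):
    """Per-pixel photometric error pe(I_t, I_{s->t}) from Eq. (10).

    ssim_map : per-pixel (wavelet) SSIM between target and recon,
               assumed precomputed elsewhere.
    target   : target image I_t.
    recon    : reconstructed (warped) image I_{s->t}.
    """
    l1 = np.abs(target - recon)                       # L1 regularization term
    return ALPHA * (1.0 - ssim_map) / 2.0 + (1.0 - ALPHA) * l1
```

With identical images and a perfect SSIM map of 1, the error is zero, as expected from Eq. (10).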

When computing the photometric loss from multiple source images, most previous approaches average the per-source photometric losses over all available source images. However, the second assumption requires that every pixel in the target image also be visible in the source image, and this assumption is easily broken: moving objects and occlusions inevitably appear in the scene, so some pixels visible in one image are unavailable in the next, causing inaccurate pixel reconstruction and spurious photometric error. Following the work in [22], the minimum photometric loss at each pixel of the target image is computed instead of the average. Note that this method can only correct the photometric loss, not eliminate it. The final per-pixel photometric loss is therefore

    L_p = min_s pe(I_t, I_{s→t})      (11)
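A minimal sketch of the per-pixel minimum in Eq. (11); the function name and the convention of stacking one error map per source image along the first axis are our assumptions:

```python
import numpy as np

def min_reprojection_loss(per_source_errors):
    """Per-pixel minimum over source-image errors, Eq. (11).

    per_source_errors : array of shape (S, H, W), one photometric-error
                        map pe(I_t, I_{s->t}) per source image s.
    Returns the (H, W) map of per-pixel minima.
    """
    return per_source_errors.min(axis=0)
```

Taking the minimum rather than the mean lets a pixel occluded in one source image still be scored against the source image where it is visible.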
In addition, the performance of the depth network suffers from moving objects in the image. These moving pixels should not be involved in computing the photometric loss. Therefore, a binary per-pixel mask μ from [22] is applied to automatically recognize moving pixels (μ = 0) and static pixels (μ = 1). The mask μ keeps only those pixels for which the photometric error of the reconstructed image I_{s→t} is lower than that between the target image I_t and the source image I_s. The mask μ is defined as

    μ = [ min_s pe(I_t, I_s) > min_s pe(I_t, I_{s→t}) ]      (12)

where [·] is the Iverson bracket. The auto-masking photometric loss [22] is

    L_p^μ = μ L_p      (13)
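Eqs. (12) and (13) can be sketched as follows; the function names and the array layout (one error map per source image along the first axis) are our assumptions:

```python
import numpy as np

def auto_mask(err_warped, err_identity):
    """Binary mask mu from Eq. (12).

    err_warped   : (S, H, W) errors pe(I_t, I_{s->t}) against warped images.
    err_identity : (S, H, W) errors pe(I_t, I_s) against unwarped sources.
    A pixel is kept (mu = 1) only where warping actually reduces the error;
    otherwise it is treated as moving/occluded (mu = 0).
    """
    return (err_identity.min(axis=0) > err_warped.min(axis=0)).astype(np.float32)

def masked_photometric_loss(err_warped, err_identity):
    """Auto-masked per-pixel loss mu * L_p, Eq. (13)."""
    mu = auto_mask(err_warped, err_identity)
    return mu * err_warped.min(axis=0)
```

Pixels that match the unwarped source better than the warped one (typical of objects moving at camera speed, or static frames) are zeroed out of the loss.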
The second-order gradients of the depth map are used to encourage smoothness. Because edges and corners in the depth map should be less smooth than flat regions, the depth gradient should be locally rather than globally smooth. A Laplacian [23] is therefore applied to automatically perceive the position of each pixel; unlike the method in [23], it is applied at every scale rather than at one specific scale. The Laplacian template is a second-order difference over four neighborhoods: it reinforces object edges and suppresses regions of slowly varying intensity. A pixel's smoothness loss receives a lower weight where the Laplacian response is higher. The smoothness loss is defined as follows:

    L_s = e^{−|∇² I_t|} ( |∂²_{xx} d_t| + |∂²_{xy} d_t| + |∂²_{yy} d_t| )      (14)

    ∇² = ∂²/∂x² + ∂²/∂y²      (15)

where ∇² is the Laplacian operator.
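A sketch of Eqs. (14) and (15) using plain NumPy finite differences; the function names and the boundary handling (second differences only on the valid interior, zeros elsewhere) are our choices, not the paper's:

```python
import numpy as np

def laplacian(img):
    """Four-neighborhood second-order difference, Eq. (15), interior only."""
    out = np.zeros_like(img)
    out[1:-1, 1:-1] = (img[:-2, 1:-1] + img[2:, 1:-1]
                       + img[1:-1, :-2] + img[1:-1, 2:]
                       - 4.0 * img[1:-1, 1:-1])
    return out

def smoothness_loss(depth, image):
    """Edge-aware second-order smoothness, a sketch of Eq. (14).

    Pixels where the image Laplacian is large (edges, corners) receive an
    exponentially smaller smoothness weight.
    """
    dxx = np.zeros_like(depth)
    dyy = np.zeros_like(depth)
    dxy = np.zeros_like(depth)
    dxx[:, 1:-1] = depth[:, :-2] - 2.0 * depth[:, 1:-1] + depth[:, 2:]
    dyy[1:-1, :] = depth[:-2, :] - 2.0 * depth[1:-1, :] + depth[2:, :]
    dxy[1:, 1:] = ((depth[1:, 1:] - depth[1:, :-1])
                   - (depth[:-1, 1:] - depth[:-1, :-1]))
    weight = np.exp(-np.abs(laplacian(image)))  # lower weight at high Laplacian
    return (weight * (np.abs(dxx) + np.abs(dyy) + np.abs(dxy))).mean()
```

A linearly ramping depth map has vanishing second-order differences, so its smoothness loss is zero regardless of the image content.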

Therefore, the total loss function is

    L_total = L_p^μ + L_s      (16)

               The final total loss is averaged per pixel, batch, and scale.