
Li et al. Intell Robot 2021;1(1):84-98                      Intelligence & Robotics
               DOI: 10.20517/ir.2021.06



               Research Article                                                              Open Access




               Unsupervised monocular depth estimation with aggregating
               image features and wavelet SSIM (Structural SIMilarity) loss


               Bingen Li 1, Hao Zhang 1, Zhuping Wang 1, Chun Liu 2, Huaicheng Yan 3,4, Lingling Hu 1
               1 Department of Control Science and Engineering, Tongji University, Shanghai 200000, China.
               2 College of Surveying and Geo-informatics, Tongji University, Shanghai 200000, China.
               3 East China University of Science and Technology, Shanghai 200000, China.
               4 College of Mechatronics and Control Engineering, Hubei Normal University, Huangshi 435000, China.
               Correspondence to: Dr. Hao Zhang, Department of Control Science and Engineering, Tongji University, Shanghai 200000, China.
               E-mail: zhang_hao@tongji.edu.cn
               How to cite this article: Li B, Zhang H, Wang Z, Liu C, Yan H, Hu L. Unsupervised monocular depth estimation with aggregating
               image features and wavelet SSIM (Structural SIMilarity) loss. Intell Robot 2021;1(1):84-98. http://dx.doi.org/10.20517/ir.2021.06
               Received: 27 Aug 2021  First Decision: 4 Sep 2021 Revised: 14 Sep 2021 Accepted: 15 Sep 2021 Published: 12 Oct 2021
               Academic Editor: Simon X. Yang Copy Editor: Xi-Jun Chen  Production Editor: Xi-Jun Chen




               Abstract
               Unsupervised learning has been shown to be effective for image depth prediction. However, its accuracy is
               limited by moving objects, whose motion is uncertain, and by the lack of other proper constraints. This paper
               focuses on how to improve the accuracy of depth prediction without increasing the computational burden of the
               depth network. Aggregated residual transformations are embedded in the depth network to extract
               high-dimensional image features, so that a more accurate mapping between the feature map and the depth map
               can be built without adding computational burden to the network. Additionally, the 2D discrete wavelet
               transform is applied to the structural similarity (SSIM) loss to reduce the photometric loss effectively; the
               transform divides the entire image into various patches and captures high-quality image information. Finally,
               the effectiveness of the proposed method is demonstrated: the trained model improves the performance of the
               depth network on the KITTI dataset and decreases the domain gap on the Make3D dataset.


               Keywords: Unsupervised depth estimation, computational complexity, aggregated residual transformations,
               2D discrete wavelet transform
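
As a rough illustration of the wavelet SSIM idea summarized in the abstract, the sketch below applies one level of a 2D Haar wavelet transform to two images and averages (1 - SSIM) over the resulting subbands. It is a minimal sketch under stated assumptions, not the formulation used in the paper: the Haar wavelet, the single decomposition level, the equal subband weights, the global (non-windowed) SSIM, and the names `haar_dwt2`, `ssim_global` and `wavelet_ssim_loss` are all illustrative choices made here.

```python
from statistics import fmean


def haar_dwt2(img):
    """One level of the orthonormal 2D Haar DWT on an even-sized 2D list.

    Returns four half-resolution subbands: a low-pass approximation and
    three detail bands (subband naming conventions vary between libraries).
    """
    approx, detail_a, detail_b, detail_c = [], [], [], []
    for i in range(0, len(img), 2):
        row = ([], [], [], [])
        for j in range(0, len(img[0]), 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            row[0].append((a + b + c + d) / 2)  # local average (low-pass)
            row[1].append((a - b + c - d) / 2)  # differences between columns
            row[2].append((a + b - c - d) / 2)  # differences between rows
            row[3].append((a - b - c + d) / 2)  # diagonal differences
        approx.append(row[0])
        detail_a.append(row[1])
        detail_b.append(row[2])
        detail_c.append(row[3])
    return approx, detail_a, detail_b, detail_c


def ssim_global(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """SSIM computed from global statistics of two equally sized 2D lists.

    Assumption: intensities lie in [0, 1]; practical SSIM implementations
    usually use local sliding windows instead of global statistics.
    """
    xs = [v for r in x for v in r]
    ys = [v for r in y for v in r]
    mx, my = fmean(xs), fmean(ys)
    vx = fmean((v - mx) * (v - mx) for v in xs)
    vy = fmean((v - my) * (v - my) for v in ys)
    cov = fmean((p - mx) * (q - my) for p, q in zip(xs, ys))
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx * mx + my * my + c1) * (vx + vy + c2)
    )


def wavelet_ssim_loss(x, y):
    """Mean of (1 - SSIM) over the four Haar subbands (equal weights assumed)."""
    pairs = zip(haar_dwt2(x), haar_dwt2(y))
    return fmean(1.0 - ssim_global(bx, by) for bx, by in pairs)


# Toy 4x4 "images" with intensities in [0, 1].
target = [
    [0.1, 0.2, 0.3, 0.4],
    [0.5, 0.6, 0.7, 0.8],
    [0.2, 0.1, 0.4, 0.3],
    [0.9, 0.8, 0.7, 0.6],
]
flat = [[0.5] * 4 for _ in range(4)]

print(wavelet_ssim_loss(target, target))  # identical images
print(wavelet_ssim_loss(target, flat))    # structured image vs. flat image
```

Because each subband is compared separately, a mismatch confined to fine detail affects the detail-band terms even when the low-pass approximations agree. A multi-level variant would apply `haar_dwt2` again to the approximation band.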






                           © The Author(s) 2021. Open Access This article is licensed under a Creative Commons Attribution 4.0
                           International License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, sharing,
                adaptation, distribution and reproduction in any medium or format, for any purpose, even commercially, as long as you
                give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate
                if changes were made.



                                                                                            www.intellrobot.com