
               1. INTRODUCTION
With the rapid development of computer vision and the low cost of visual sensors, visual simultaneous localization and mapping (VSLAM) [1–3] methods have received particular attention for localization and navigation applications such as unmanned cars and automated guided vehicles (AGVs), and have matured into a relatively well-established theoretical system. As VSLAM methods become increasingly mature, most researchers have shifted their focus to deploying simultaneous localization and mapping (SLAM) in more complex scenarios with longer lifecycles to cope with more challenging issues. For large-scale [4,5] and lifelong [6,7] VSLAM systems, however, the computational complexity and time cost are too high, so processing falls out of step with the real-time input data stream. It is therefore crucial to increase the operating speed of the VSLAM system while guaranteeing its accuracy.

With the development of the compute unified device architecture (CUDA) [8], graphics processors [9,10] are no longer limited to graphics tasks, and more researchers have begun to study the use of the graphics processing unit (GPU) for general-purpose computing. Leveraging computing performance that can be tens or even hundreds of times higher than that of the central processing unit (CPU), together with the friendly programming environment provided by the CUDA architecture, GPU parallel acceleration has been successfully applied in many fields such as data mining, weather prediction, and behavior recognition [11]. Similarly, building VSLAM systems on heterogeneous computing [12] has become a hot research topic for solving the real-time problem of VSLAM.
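
The appeal of this model is that per-pixel image work maps naturally onto thousands of GPU threads. As a minimal illustrative sketch (not code from any of the cited systems), the CUDA kernel below assigns one thread to each pixel of a hypothetical 1920x1080 grayscale image and brightens it; the same one-thread-per-pixel pattern underlies GPU-accelerated image processing in VSLAM front ends.

    #include <cuda_runtime.h>
    #include <cstdio>
    #include <vector>

    // Each thread processes exactly one pixel: the basic data-parallel
    // pattern behind GPU acceleration of per-pixel image operations.
    __global__ void brightenKernel(unsigned char* img, int width, int height, int delta) {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x < width && y < height) {
            int idx = y * width + x;
            int v = img[idx] + delta;
            img[idx] = static_cast<unsigned char>(v > 255 ? 255 : v);
        }
    }

    int main() {
        const int width = 1920, height = 1080;                 // hypothetical image size
        std::vector<unsigned char> host(width * height, 100);  // dummy gray image

        unsigned char* dev = nullptr;
        cudaMalloc(&dev, host.size());
        cudaMemcpy(dev, host.data(), host.size(), cudaMemcpyHostToDevice);

        // One thread per pixel, grouped into 16x16 thread blocks.
        dim3 block(16, 16);
        dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
        brightenKernel<<<grid, block>>>(dev, width, height, 20);
        cudaDeviceSynchronize();

        cudaMemcpy(host.data(), dev, host.size(), cudaMemcpyDeviceToHost);
        cudaFree(dev);
        printf("first pixel after kernel: %d\n", host[0]);
        return 0;
    }

Because every pixel is independent, the kernel scales with the number of GPU cores rather than with image resolution on a single CPU thread, which is precisely the property exploited by the heterogeneous VSLAM systems discussed below.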


               For VSLAM development
VSLAM generally includes front-end visual odometry [13], back-end optimization [14] and mapping [15], and loop closing [16]. In 2017, Oriented FAST and Rotated BRIEF SLAM (ORB-SLAM2) [17] was proposed by Mur-Artal and Tardós; it is a classical VSLAM system based entirely on feature points. The system incorporates a loop-closure detection method based on a bag-of-words (BoW) model and can generate sparse 3D maps with centimeter-level accuracy. In contrast to [17], which estimated the camera pose with the Efficient Perspective-n-Point (EPnP) algorithm, ORB-SLAM3 [18] employed a pose estimation method based on the maximum likelihood PnP (MLPnP) algorithm. Although the latter offers some improvement in accuracy and robustness, its real-time performance is still difficult to guarantee in large-scale scenarios.
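
To make the pose estimation step concrete, the host-side sketch below calls OpenCV's EPnP solver on a handful of made-up 3D-2D correspondences with assumed pinhole intrinsics; it illustrates the EPnP formulation in general and is not code from ORB-SLAM2/3.

    #include <opencv2/core.hpp>
    #include <opencv2/calib3d.hpp>
    #include <vector>

    int main() {
        // Hypothetical map points (world frame) and their observed projections (pixels).
        std::vector<cv::Point3f> objectPoints = { {0, 0, 0}, {1, 0, 0}, {0, 1, 0},
                                                  {1, 1, 0}, {0.5f, 0.5f, 1} };
        std::vector<cv::Point2f> imagePoints  = { {320, 240}, {420, 240}, {320, 340},
                                                  {420, 340}, {370, 300} };

        // Assumed pinhole intrinsics; a real system would use calibrated values.
        cv::Mat K = (cv::Mat_<double>(3, 3) << 500, 0, 320,
                                               0, 500, 240,
                                               0,   0,   1);
        cv::Mat dist = cv::Mat::zeros(4, 1, CV_64F);

        // EPnP: non-iterative O(n) pose estimation from n >= 4 correspondences.
        cv::Mat rvec, tvec;
        bool ok = cv::solvePnP(objectPoints, imagePoints, K, dist, rvec, tvec,
                               false, cv::SOLVEPNP_EPNP);
        return ok ? 0 : 1;
    }

In a full SLAM pipeline this solver is typically wrapped in a RANSAC loop and followed by nonlinear refinement, which is where much of the accuracy and runtime trade-off discussed above arises.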

               For VSLAM parallel acceleration development
In large-scale scenes, a higher-resolution camera with a wide-angle lens is often used to obtain wide-view, clearer images. When the image size is too large, the running speed of VSLAM becomes extremely slow because of the large number of pixels processed in the feature detection and matching module. Mohammadi and Rezaeian used CUDA to accelerate Scale-Invariant Feature Transform (SIFT) feature extraction, which not only preserved accuracy but also improved speed by nearly 30 times through an optimal combination of kernels [19]. Nevertheless, it still cannot be applied in real-time SLAM applications. Parker et al. designed a parallel acceleration scheme for feature extraction and matching based on Learned Arrangement of Three Patches (LATCH) descriptors [20]. The scheme was also applied to structure from motion (SFM), running nearly ten times faster than SIFT-based feature description while maintaining accuracy. Urban and Hinz extended and improved a state-of-the-art SLAM system with the MultiCol model so that it can be rigidly coupled with multi-camera systems [21]. On this basis, Li et al. proposed a CPU-based multi-threading method for parallel image reading, feature extraction, and tracking, which solved the load-imbalance problem and further improved computational efficiency [22]. Moreover, feature extraction was implemented via OpenMP on the CPU, while feature matching was implemented on the GPU, significantly