
               1. INTRODUCTION
With the rapid development of computer vision and the low cost of visual sensors, visual simultaneous localization and mapping (VSLAM) [1–3] methods have received particular attention for localization and navigation applications such as unmanned cars and automated guided vehicles (AGVs), and have matured into a relatively well-established theoretical system. As VSLAM methods become increasingly mature, most researchers have shifted their focus to deploying simultaneous localization and mapping (SLAM) in more complex scenarios with longer lifecycles to cope with more challenging issues. For large-scale [4,5] and lifelong [6,7] VSLAM systems, however, the computational complexity and time cost are too high, so processing falls out of step with the real-time input data stream. It is therefore crucial to increase the operating speed of the VSLAM system while guaranteeing its accuracy.

With the development of the compute unified device architecture (CUDA) [8], graphics processors [9,10] are no longer limited to graphics tasks, and more researchers have begun to study the use of the graphics processing unit (GPU) for general-purpose computing. Leveraging computing performance that can be tens or even hundreds of times higher than that of the central processing unit (CPU), together with the friendly programming environment provided by the CUDA architecture, GPU parallel acceleration has been successfully applied in many fields such as data mining, weather prediction, and behavior recognition [11]. Similarly, building VSLAM systems on heterogeneous computing [12] has become a hot research topic for solving the real-time problem of VSLAM.
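
The appeal of this model is that per-pixel image work maps naturally onto thousands of GPU threads. As a minimal illustrative sketch (not code from any of the cited systems), the CUDA kernel below assigns one thread to each pixel of a hypothetical 1920x1080 grayscale image and brightens it; the same one-thread-per-pixel pattern underlies GPU-accelerated image processing in VSLAM front ends.

    #include <cuda_runtime.h>
    #include <cstdio>
    #include <vector>

    // Each thread processes exactly one pixel: the basic data-parallel
    // pattern behind GPU acceleration of per-pixel image operations.
    __global__ void brightenKernel(unsigned char* img, int width, int height, int delta) {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x < width && y < height) {
            int idx = y * width + x;
            int v = img[idx] + delta;
            img[idx] = static_cast<unsigned char>(v > 255 ? 255 : v);
        }
    }

    int main() {
        const int width = 1920, height = 1080;                 // hypothetical image size
        std::vector<unsigned char> host(width * height, 100);  // dummy gray image

        unsigned char* dev = nullptr;
        cudaMalloc(&dev, host.size());
        cudaMemcpy(dev, host.data(), host.size(), cudaMemcpyHostToDevice);

        // One thread per pixel, grouped into 16x16 thread blocks.
        dim3 block(16, 16);
        dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
        brightenKernel<<<grid, block>>>(dev, width, height, 20);
        cudaDeviceSynchronize();

        cudaMemcpy(host.data(), dev, host.size(), cudaMemcpyDeviceToHost);
        cudaFree(dev);
        printf("first pixel after kernel: %d\n", host[0]);
        return 0;
    }

Because every pixel is independent, the kernel scales with the number of GPU cores rather than with image resolution on a single CPU thread, which is precisely the property exploited by the heterogeneous VSLAM systems discussed below.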


               For VSLAM development
VSLAM generally includes front-end visual odometry [13], back-end optimization [14] and mapping [15], and loop closing [16]. In 2017, Oriented FAST and Rotated BRIEF SLAM (ORB-SLAM2) [17] was proposed by Mur-Artal and Tardós; it is a classical VSLAM system based entirely on feature points. The system incorporates a loop-closure detection method based on a bag-of-words (BoW) model and can generate sparse 3D maps with centimeter-level accuracy. In contrast to [17], which estimated the camera pose with the Efficient Perspective-n-Point (EPnP) algorithm, ORB-SLAM3 [18] employed a pose estimation method based on the maximum likelihood PnP (MLPnP) algorithm. Although the latter offers some improvement in accuracy and robustness, its real-time performance is still difficult to guarantee in large-scale scenarios.
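
To make the pose estimation step concrete, the host-side sketch below calls OpenCV's EPnP solver on a handful of made-up 3D-2D correspondences with assumed pinhole intrinsics; it illustrates the EPnP formulation in general and is not code from ORB-SLAM2/3.

    #include <opencv2/core.hpp>
    #include <opencv2/calib3d.hpp>
    #include <vector>

    int main() {
        // Hypothetical map points (world frame) and their observed projections (pixels).
        std::vector<cv::Point3f> objectPoints = { {0, 0, 0}, {1, 0, 0}, {0, 1, 0},
                                                  {1, 1, 0}, {0.5f, 0.5f, 1} };
        std::vector<cv::Point2f> imagePoints  = { {320, 240}, {420, 240}, {320, 340},
                                                  {420, 340}, {370, 300} };

        // Assumed pinhole intrinsics; a real system would use calibrated values.
        cv::Mat K = (cv::Mat_<double>(3, 3) << 500, 0, 320,
                                               0, 500, 240,
                                               0,   0,   1);
        cv::Mat dist = cv::Mat::zeros(4, 1, CV_64F);

        // EPnP: non-iterative O(n) pose estimation from n >= 4 correspondences.
        cv::Mat rvec, tvec;
        bool ok = cv::solvePnP(objectPoints, imagePoints, K, dist, rvec, tvec,
                               false, cv::SOLVEPNP_EPNP);
        return ok ? 0 : 1;
    }

In a full SLAM pipeline this solver is typically wrapped in a RANSAC loop and followed by nonlinear refinement, which is where much of the accuracy and runtime trade-off discussed above arises.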

               For VSLAM parallel acceleration development
In large-scale scenes, a higher-resolution camera with a wide-angle lens is often used to obtain wide-view, clearer images. When the image size is too large, the running speed of VSLAM becomes extremely slow because of the large number of pixels processed in the feature detection and matching module. Mohammadi and Rezaeian used CUDA to accelerate Scale-Invariant Feature Transform (SIFT) feature extraction, which not only preserved accuracy but also improved speed by nearly 30 times through an optimal combination of kernels [19]. Nevertheless, it still cannot be applied in real-time SLAM applications. Parker et al. designed a parallel acceleration scheme for feature extraction and matching based on Learned Arrangement of Three Patches (LATCH) descriptors [20]. The scheme was also applied to structure from motion (SFM), running nearly ten times faster than SIFT-based feature description while maintaining accuracy. Urban and Hinz extended and improved a state-of-the-art SLAM system with the MultiCol model so that it can be rigidly coupled with multi-camera systems [21]. On this basis, Li et al. proposed a CPU-based multi-threading method for parallel image reading, feature extraction, and tracking, which solved the load-imbalance problem and further improved computational efficiency [22]. Moreover, feature extraction was implemented via OpenMP on the CPU, while feature matching was implemented on the GPU, significantly