

Table 8. Accuracy and runtime comparison between OpenVSLAM and CUDA-SLAM

    Project                     OpenVSLAM   CUDA-SLAM1   CUDA-SLAM2   CUDA-SLAM3
    Keyframe increase scale     0%          0%           50%          50%
    Number of feature points    2,000       4,000        2,000        4,000
    RMSE (APE)                  0.684 m     0.611 m      0.550 m      0.327 m
    RMSE (RPE)                  0.013 m     0.011 m      0.010 m      0.008 m
    Tracking time               52 ms       16 ms        12 ms        16 ms
    Optimization time           144 ms      41 ms        53 ms        53 ms

The best-performing data are bolded in the table. VSLAM: visual simultaneous localization and mapping; CUDA: compute unified device architecture; SLAM: simultaneous localization and mapping; RMSE: root mean square error; RPE: relative pose error; APE: absolute pose error.
Table 9. Runtime comparison of the tracking module

                ORB-SLAM2             OpenVSLAM             CUDA-SLAM
    Sequence    Median     Mean       Median     Mean       Median     Mean
    T1          0.0288 s   0.0291 s   0.0451 s   0.0469 s   0.0124 s   0.0191 s
    T2          0.0296 s   0.0298 s   0.0527 s   0.0498 s   0.0119 s   0.0192 s
    T3          0.0294 s   0.0295 s   0.0457 s   0.0467 s   0.0118 s   0.0186 s
    T4          0.0295 s   0.0297 s   0.0563 s   0.0575 s   0.0125 s   0.0198 s
    T5          0.0290 s   0.0291 s   0.0543 s   0.0547 s   0.0126 s   0.0228 s
    T6          0.0288 s   0.0289 s   0.0520 s   0.0509 s   0.0134 s   0.0219 s
    Average     0.0292 s   0.0293 s   0.0507 s   0.0512 s   0.0124 s   0.0202 s

SLAM: simultaneous localization and mapping; VSLAM: visual simultaneous localization and mapping; CUDA: compute unified device architecture.


To expand the data scale, we increase the number of feature points to 4,000 and the number of keyframes by 50%, and then compare the tracking time and trajectory accuracy of OpenVSLAM and CUDA-SLAM. The results are shown in Table 8.

As shown in Table 8, CUDA-SLAM achieves significantly better tracking accuracy after the data scale is expanded, while still consuming far less time than OpenVSLAM. This further verifies that the parallel SLAM algorithm proposed in this paper offers significant advantages in both accuracy and runtime. In practical applications, therefore, the best performance of a VSLAM system can be achieved by flexibly adjusting the number of feature points and keyframes.
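The two knobs varied in Table 8 translate directly into configuration parameters. As a rough illustration (the structure and names below are hypothetical, not taken from the CUDA-SLAM or OpenVSLAM sources):

    // Hypothetical tuning knobs mirroring the experiments in Table 8.
    struct VSlamTuning {
        int   numFeaturePoints = 2000;   // ORB features per frame (2,000 or 4,000 in Table 8)
        float keyframeIncrease = 0.0f;   // fraction of additional keyframes (0.5f = +50%)
    };

    // Example: the best-performing CUDA-SLAM3 setting from Table 8.
    // VSlamTuning tuning{4000, 0.5f};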


To better reflect practical application conditions, an embedded GPU, the NVIDIA Jetson AGX Orin, is introduced. Further experiments are conducted on this hardware platform with the TUM datasets to verify the effectiveness of the parallel tracking module more comprehensively. The runtime comparison of the tracking module is shown in Table 9. Under the same test environment, the proposed tracking algorithm improves performance by about 31% compared with the traditional algorithm. These results show that our algorithm outperforms traditional algorithms on a variety of open datasets and runs stably on embedded GPU hardware.
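For reference, the 31% figure follows from the average mean tracking times in Table 9, taking ORB-SLAM2 as the traditional baseline:

    (0.0293 s - 0.0202 s) / 0.0293 s ≈ 0.31,

i.e., the mean per-frame tracking time of CUDA-SLAM is roughly 31% lower than that of ORB-SLAM2.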



               5. CONCLUSION
In this paper, we propose a CUDA-based parallel scheme for the key modules of a VSLAM system, targeting the large-scale, high-complexity computational tasks in its tracking and optimization stages. By improving the feature extraction and matching modules as well as BA, the parallelization of the algorithm is achieved on the GPU. Compared with traditional sequential execution methods, the speedups of feature extraction, feature matching, and BA are 10-20, 5-13, and 10 times, respectively, while accuracy is maintained. Finally, the proposed front-end and back-end parallel algorithms are migrated to OpenVSLAM. The results show that the tracking accuracy of CUDA-SLAM is essentially identical to that of state-of-the-art methods under the same settings, while the running speed is significantly improved.
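Although the paper's kernels are not reproduced here, the flavor of the front-end parallelization can be conveyed by a minimal CUDA sketch of brute-force ORB descriptor matching, in which each GPU thread matches one query descriptor against all candidates by Hamming distance. All names below are illustrative assumptions, not CUDA-SLAM's actual API.

    // Minimal CUDA sketch of data-parallel ORB matching (illustrative only).
    // ORB descriptors are 256 bits, stored as 8 x 32-bit words per descriptor.
    __global__ void matchOrbDescriptors(const unsigned int* query,  // nQuery * 8 words
                                        const unsigned int* train,  // nTrain * 8 words
                                        int nQuery, int nTrain,
                                        int* bestIdx, int* bestDist)
    {
        int q = blockIdx.x * blockDim.x + threadIdx.x;  // one thread per query descriptor
        if (q >= nQuery) return;

        int minDist = 257;  // larger than the maximum Hamming distance (256)
        int minIdx = -1;
        for (int t = 0; t < nTrain; ++t) {
            int dist = 0;
            for (int w = 0; w < 8; ++w)
                dist += __popc(query[q * 8 + w] ^ train[t * 8 + w]);  // per-word popcount
            if (dist < minDist) { minDist = dist; minIdx = t; }
        }
        bestIdx[q] = minIdx;
        bestDist[q] = minDist;
    }

    // Launch with one thread per query descriptor, e.g.:
    // matchOrbDescriptors<<<(nQuery + 255) / 256, 256>>>(dQuery, dTrain,
    //                                                    nQuery, nTrain,
    //                                                    dBestIdx, dBestDist);

Replacing the sequential loop over all query-train pairs with nQuery concurrent threads is the kind of data parallelism behind the 5-13 times matching speedup reported above.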