Page 267                          Liu et al. Intell Robot 2024;4(3):256-75 | http://dx.doi.org/10.20517/ir.2024.17

               Table 1. Image processor parameters

               Property                 NVIDIA GeForce RTX 2080Ti   NVIDIA GeForce RTX 1050Ti   NVIDIA Jetson AGX Orin
               Number of CUDA cores     4,352                       768                         2,048
               Video memory capacity    11 GB                       4 GB                        32 GB
               Video memory bandwidth   616 GB/s                    112 GB/s                    204.8 GB/s
               Video memory bus width   352 bit                     128 bit                     256 bit
               Video memory type        GDDR6                       GDDR5                       LPDDR5
               Core architecture        Turing                      Pascal                      Ampere

               CUDA: Compute Unified Device Architecture.
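
The bandwidth values in Table 1 are consistent with the usual relation between bus width and per-pin data rate: bandwidth (GB/s) = bus width (bit) × data rate (Gbit/s per pin) / 8. The following is a minimal Python sketch of that check. The per-pin data rates (14, 7, and 6.4 Gbit/s) are not given in the paper; they are assumed values for these memory configurations, and the function and dictionary names are illustrative only.

```python
def memory_bandwidth_gb_s(bus_width_bits: int, data_rate_gbit_s: float) -> float:
    """Theoretical peak memory bandwidth in GB/s.

    bus_width_bits   -- memory bus width in bits (from Table 1)
    data_rate_gbit_s -- effective per-pin data rate in Gbit/s (assumed, not from the paper)
    """
    return bus_width_bits * data_rate_gbit_s / 8.0


# (bus width from Table 1, ASSUMED per-pin data rate in Gbit/s)
ASSUMED_CONFIGS = {
    "RTX 2080Ti": (352, 14.0),
    "RTX 1050Ti": (128, 7.0),
    "Jetson AGX Orin": (256, 6.4),
}

if __name__ == "__main__":
    for name, (width, rate) in ASSUMED_CONFIGS.items():
        print(f"{name}: {memory_bandwidth_gb_s(width, rate):.1f} GB/s")
```

Under these assumed data rates, the computed values agree with the bandwidth row of Table 1, which suggests that row lists theoretical peak figures rather than measured throughput.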



























                                  Figure 6. The runtime comparison of key functions in parallel feature extraction.


               strating that the CUDA parallel feature extraction algorithm proposed in this paper is effective. The results
               also show that the acceleration achieved on the two types of graphics cards differs significantly, owing to
               their different numbers of CUDA cores and different video memory bandwidths. The speedup increases with
               image size, but its rate of growth slows as the image size increases, which is consistent with the
               characteristics of heterogeneous acceleration. In the improved acceleration algorithm, the whole image must
               first be copied to the graphics card. The higher the image resolution, the more data must be copied to the
               device; this transfer takes considerable time, which is counted in the pyramid part. To reduce the influence
               of random variation, images of each resolution are processed 50 times and the average execution time is taken
               as the statistical result. The execution time of each key function in feature extraction is shown in Figure 6.
               Pyramid generation consumes the most time in the front-end, and the overall time decreases significantly as
               the performance of the graphics card increases.
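
The measurement protocol described above (50 repetitions per image size, mean execution time, speedup relative to a baseline) can be sketched in Python as follows. This is only an illustration of the averaging procedure, not the authors' benchmarking code: `average_runtime`, `speedup`, and the placeholder workloads are hypothetical names introduced here.

```python
import time
from statistics import mean
from typing import Callable


def average_runtime(fn: Callable[[], object], runs: int = 50) -> float:
    """Call fn `runs` times and return the mean wall-clock time in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return mean(samples)


def speedup(baseline_s: float, accelerated_s: float) -> float:
    """Speedup of the accelerated version relative to the baseline."""
    return baseline_s / accelerated_s


if __name__ == "__main__":
    # Placeholder workloads standing in for the serial and parallel extractors.
    serial = lambda: sum(i * i for i in range(20000))
    parallel = lambda: sum(i * i for i in range(2000))
    t_serial = average_runtime(serial)
    t_parallel = average_runtime(parallel)
    print(f"speedup: {speedup(t_serial, t_parallel):.2f}x")
```

When timing real GPU code in this way, the timed function should include the host-to-device copy and wait for the device to finish before the timer stops, since kernel launches return asynchronously.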

               The performance of feature matching is evaluated mainly by two indicators:

               (1) Accuracy: the number of successfully matched point pairs whose Hamming distance is below a given
               threshold.
               (2) Running speed: the average runtime required to match a given number of feature points.
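
The accuracy indicator above can be illustrated with a small sequential sketch: each query descriptor is paired with its nearest neighbour by Hamming distance, and the pair counts as a successful match only if that distance is below a threshold. This is a plain CPU illustration under stated assumptions, not the parallel CUDA matcher proposed in the paper; the byte-string descriptor representation, the function names, and the threshold value are all hypothetical.

```python
from typing import List, Tuple


def hamming_distance(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length binary descriptors."""
    if len(a) != len(b):
        raise ValueError("descriptors must have the same length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def match_descriptors(
    query: List[bytes], train: List[bytes], max_distance: int
) -> List[Tuple[int, int, int]]:
    """Brute-force nearest-neighbour matching.

    Returns (query_index, train_index, distance) for every query descriptor
    whose closest train descriptor has a distance strictly below max_distance.
    """
    matches = []
    for qi, q in enumerate(query):
        best_ti, best_d = -1, None
        for ti, t in enumerate(train):
            d = hamming_distance(q, t)
            if best_d is None or d < best_d:
                best_ti, best_d = ti, d
        if best_d is not None and best_d < max_distance:
            matches.append((qi, best_ti, best_d))
    return matches


if __name__ == "__main__":
    query = [bytes([0b11110000]), bytes([0b00000000])]
    train = [bytes([0b11110001]), bytes([0b11111111])]
    # max_distance=3 is an arbitrary illustrative threshold.
    print(match_descriptors(query, train, max_distance=3))
```

The number of entries returned corresponds to the accuracy indicator; wrapping the call in a timer and averaging over repeated runs would give the running-speed indicator.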

               Figure 7 shows the results of the proposed parallel matching algorithm. It can be seen that the successful
               matching rate is high. In order to further verify the accuracy of the algorithm, we calculate the matching