

               as in:
                                                               
m_{10} = \sum_{x=-r}^{r} \sum_{y=-r}^{r} x \, I(x, y),

m_{01} = \sum_{x=-r}^{r} \sum_{y=-r}^{r} y \, I(x, y),                    (4)

m_{00} = \sum_{x=-r}^{r} \sum_{y=-r}^{r} I(x, y),
where r is the radius of the image block, I(x, y) is the pixel intensity, m_{00} is the zeroth-order image moment, and the centroid coordinates of the image block are defined as follows:

\bar{x} = \frac{m_{10}}{m_{00}}, \qquad \bar{y} = \frac{m_{01}}{m_{00}}.                    (5)


Then, the principal direction of the feature point is obtained by connecting the geometric center of the image block with the centroid:
\theta = \arctan\left(\frac{\bar{y}}{\bar{x}}\right) = \arctan\left(\frac{m_{01}}{m_{10}}\right).                    (6)
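For concreteness, a minimal CPU-side sketch of Eqs. (4)-(6) could look as follows; it uses a square window instead of the circular area used in practice, and the function name and interface are illustrative assumptions rather than the paper's implementation.

```cpp
// Minimal sketch: orientation of a feature point at (cx, cy) from the image
// moments of a square window of radius r (Eqs. (4)-(6)). 'img' is a row-major
// grayscale image of width 'width'; bounds checking is omitted for brevity.
#include <cmath>
#include <cstdint>

float patchOrientation(const std::uint8_t* img, int width, int cx, int cy, int r)
{
    float m10 = 0.f, m01 = 0.f;             // m00 cancels in the ratio m01 / m10
    for (int y = -r; y <= r; ++y)
        for (int x = -r; x <= r; ++x) {
            float I = img[(cy + y) * width + (cx + x)];
            m10 += x * I;                   // Eq. (4): sum of x * I(x, y)
            m01 += y * I;                   // Eq. (4): sum of y * I(x, y)
        }
    // Eq. (6): theta = arctan(m01 / m10); atan2 keeps the correct quadrant.
    return std::atan2(m01, m10);
}
```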

In the GPU, the warp is the basic scheduling unit. All threads within a warp execute the same instruction at the same time and share the same block of memory with low transfer latency. Since the direction is generally calculated over a circular area with a diameter of 32 pixels, one warp is used to process one feature point. Each thread in the warp is therefore assigned to process one row of pixels, which improves the execution efficiency. During the execution, 32 calculation units are required for the center row, while the number of calculations for each of the remaining rows is a fixed value of less than 32. A one-dimensional 32-element array is constructed to record the number of pixels to be processed in each row, and it is stored in constant memory.
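As a rough illustration of this warp-level layout, the following CUDA sketch assigns one warp (launched as a 32-thread block) to each feature point and lets each lane accumulate the partial moments of one pixel row before a warp-level reduction. The kernel name, the constant array c_rowSpan (standing in for the 32-element per-row count described above), and the patch geometry are assumptions for illustration, not the authors' code.

```cpp
// One warp per feature point; lane i handles pixel row i of the patch.
__constant__ int c_rowSpan[32];   // half-width of each pixel row (assumed layout)

__global__ void computeOrientationKernel(const unsigned char* img, int pitch,
                                          const short2* keypoints, float* angles,
                                          int numKeypoints)
{
    int kpIdx = blockIdx.x;           // one 32-thread block (one warp) per feature point
    int lane  = threadIdx.x;          // 0..31, one lane per pixel row
    if (kpIdx >= numKeypoints) return;

    short2 kp = keypoints[kpIdx];
    int y     = lane - 16;            // row offset within the patch
    int span  = c_rowSpan[lane];      // pixels to visit on each side of the center column

    float m10 = 0.f, m01 = 0.f;       // partial moments accumulated by this lane
    for (int x = -span; x <= span; ++x) {
        float I = img[(kp.y + y) * pitch + (kp.x + x)];
        m10 += x * I;                 // contribution to Eq. (4)
        m01 += y * I;
    }

    // Combine the per-row partial sums across the warp.
    for (int offset = 16; offset > 0; offset >>= 1) {
        m10 += __shfl_down_sync(0xffffffffu, m10, offset);
        m01 += __shfl_down_sync(0xffffffffu, m01, offset);
    }
    if (lane == 0)
        angles[kpIdx] = atan2f(m01, m10);   // Eq. (6)
}
```

A launch such as computeOrientationKernel<<<numKeypoints, 32>>>(...) then processes all feature points in parallel, one warp each.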


In image matching, the BRIEF descriptor is generally used to represent and describe the detected feature points because of its faster calculation speed and higher matching quality compared with the SIFT and Speeded-Up Robust Features (SURF) descriptors. It is determined by 256 random point pairs within the neighborhood window around the feature point, where the random pairs can be obtained by Gaussian sampling.
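As an illustration of how such a test pattern could be generated offline, the sketch below draws 256 point pairs by Gaussian sampling inside the neighborhood window; the window radius, the standard deviation, and all names are assumptions rather than values taken from the paper.

```cpp
// Illustrative generation of the 256 BRIEF test pairs by Gaussian sampling
// within a (2 * patchRadius + 1)^2 neighborhood window.
#include <algorithm>
#include <cmath>
#include <random>
#include <vector>

struct TestPair { int x1, y1, x2, y2; };   // one binary intensity test

std::vector<TestPair> makeBriefPattern(int patchRadius = 15, unsigned seed = 42)
{
    std::mt19937 rng(seed);
    // Offsets drawn from N(0, (patchRadius / 2)^2) and clamped to the window.
    std::normal_distribution<float> gauss(0.f, patchRadius / 2.f);
    auto sample = [&]() {
        int v = static_cast<int>(std::lround(gauss(rng)));
        return std::clamp(v, -patchRadius, patchRadius);
    };

    std::vector<TestPair> pairs(256);
    for (auto& p : pairs)
        p = { sample(), sample(), sample(), sample() };
    return pairs;
}
```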

Since the descriptor calculations for different feature points are mutually independent and each 256-bit descriptor occupies 32 bytes of storage, 32 threads are allocated to each feature point to improve the running efficiency. In order to speed up the data fetches, “cudaMemcpyToSymbol()” is utilized to store the point pairs in constant memory.
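A hedged sketch of this layout is given below: the point pairs are uploaded once to constant memory with cudaMemcpyToSymbol(), and each feature point is processed by 32 threads, one per descriptor byte (8 binary tests each). For brevity the sketch omits rotating the pattern by the orientation computed above, and all kernel and symbol names are illustrative assumptions.

```cpp
#include <cuda_runtime.h>

__constant__ short4 c_pattern[256];          // (x1, y1, x2, y2) of each binary test

void uploadPattern(const short4* hostPattern)
{
    // One-time copy into cached constant memory, broadcast to all threads.
    cudaMemcpyToSymbol(c_pattern, hostPattern, 256 * sizeof(short4));
}

// Block layout assumption: blockDim.x = 32 (one thread per descriptor byte),
// blockDim.y = number of feature points handled per block.
__global__ void computeBriefKernel(const unsigned char* img, int pitch,
                                   const short2* keypoints, unsigned char* desc,
                                   int numKeypoints)
{
    int kpIdx   = blockIdx.x * blockDim.y + threadIdx.y;
    int byteIdx = threadIdx.x;               // 0..31: which byte of the descriptor
    if (kpIdx >= numKeypoints) return;

    short2 kp = keypoints[kpIdx];
    unsigned char byte = 0;
    for (int bit = 0; bit < 8; ++bit) {
        short4 t = c_pattern[byteIdx * 8 + bit];
        unsigned char I1 = img[(kp.y + t.y) * pitch + (kp.x + t.x)];
        unsigned char I2 = img[(kp.y + t.w) * pitch + (kp.x + t.z)];
        byte |= (I1 < I2) << bit;            // binary intensity comparison
    }
    desc[kpIdx * 32 + byteIdx] = byte;       // 32 bytes = 256 bits per descriptor
}
```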

               2.5 Feature matching
During initialization, the number of extracted feature points is generally several times larger than that during tracking, because the feature matching needs more accurate map points and a more accurate initial pose estimation. Hence, the feature matching is very time-consuming. Meanwhile, it is also difficult to guarantee real-time matching even if a local matching method is employed within a window, since there is no a priori information such as the known motion between camera poses. Therefore, the paper implements the parallelization of feature matching by resorting to the OpenCV CUDA module, which mainly includes searching for point pairs based on the Hamming distance and culling false matches based on the RANSAC algorithm.
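Under this design, the matching stage can be sketched with the OpenCV CUDA module roughly as follows; the RANSAC threshold and the use of cv::findFundamentalMat for culling false matches are assumptions about the configuration, not parameters reported in the paper.

```cpp
// Hedged sketch: GPU brute-force Hamming matching followed by RANSAC culling.
#include <vector>
#include <opencv2/core.hpp>
#include <opencv2/calib3d.hpp>
#include <opencv2/cudafeatures2d.hpp>

void matchFeatures(const cv::cuda::GpuMat& desc1, const cv::cuda::GpuMat& desc2,
                   const std::vector<cv::KeyPoint>& kps1,
                   const std::vector<cv::KeyPoint>& kps2)
{
    // Brute-force matcher using the Hamming distance for binary descriptors.
    auto matcher = cv::cuda::DescriptorMatcher::createBFMatcher(cv::NORM_HAMMING);
    std::vector<cv::DMatch> matches;
    matcher->match(desc1, desc2, matches);

    // Collect the matched point coordinates for RANSAC.
    std::vector<cv::Point2f> pts1, pts2;
    for (const auto& m : matches) {
        pts1.push_back(kps1[m.queryIdx].pt);
        pts2.push_back(kps2[m.trainIdx].pt);
    }

    // Cull false matches with RANSAC; inlierMask marks the surviving pairs.
    std::vector<uchar> inlierMask;
    cv::findFundamentalMat(pts1, pts2, cv::FM_RANSAC, 3.0, 0.99, inlierMask);
}
```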

For coarse feature matching, the Brute Force algorithm based on the Hamming distance is applied, taking the pair with the minimum distance between the two sets of feature vectors as a matching point pair. Since the size of the feature vectors is fixed, the thread block size is set to a fixed value accordingly; when performing the initial feature matching, dimBlock.x = dimBlock.y are set to that value. The dimension of the thread grid is set to one dimension, and the parameters are given in