Liu et al. Intell Robot 2024;4(3):256-75 I http://dx.doi.org/10.20517/ir.2024.17 Page 260
Figure 2. The process of the parallel algorithm. (A) The parallel algorithm for feature extraction and matching; (B) The parallel algorithm for optimization. CPU: Central processing unit; GPU: graphics processing unit.
time compared to data transfer through the Peripheral Component Interconnect Express (PCI-e) bus. From the second layer onward, one thread is assigned to process one pixel. Each layer is generated by downsampling the image of the previous layer. To support a variety of scale factors, the proposed method computes pixel values by bilinear interpolation. During downsampling, non-integer scale factors are more likely than integer ones to map a target pixel to non-integer source coordinates. Bilinear interpolation therefore computes the value at such a coordinate from the four nearest source pixels, weighted by their distance to it, which reduces visual distortion to some extent. In addition, to extract higher-quality feature points, the image is first smoothed with a Gaussian filter before the image pyramid is generated; this step corresponds to the pyramid pool initialization in Figure 2A. Image filtering applies the same convolution independently to every pixel of an image, so it is well suited to parallel execution on the GPU. Since the Gaussian template is a fixed two-dimensional array, it can be stored in the constant memory of the CUDA architecture to avoid duplicate memory copies.
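The preprocessing pipeline described above (Gaussian smoothing, then level-by-level bilinear downsampling) can be sketched in plain Python. This is a minimal CPU-only illustration under stated assumptions, not the authors' CUDA implementation: each call to the hypothetical helpers `blur_pixel` and `bilinear_sample` stands for the work of one GPU thread, and the list comprehensions stand in for the CUDA thread grid. The kernel size (`radius=2`), `sigma=1.0`, clamp-to-edge borders, the scale factor 1.2, and the corner-aligned coordinate mapping `(x*scale, y*scale)` are all assumptions chosen for illustration.

```python
import math

# Hypothetical helper names; a CPU sketch of the per-pixel GPU work, not the authors' code.


def gaussian_kernel_2d(sigma, radius):
    """Fixed (2r+1) x (2r+1) Gaussian template, normalized to sum to 1."""
    g = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(g) ** 2
    return [[a * b / total for b in g] for a in g]


def blur_pixel(img, kernel, x, y):
    """One thread's work: weighted sum over the neighborhood of (x, y).
    Out-of-range neighbors are clamped to the nearest edge pixel (assumed border policy)."""
    h, w = len(img), len(img[0])
    r = len(kernel) // 2
    acc = 0.0
    for dy in range(-r, r + 1):
        yy = min(max(y + dy, 0), h - 1)
        for dx in range(-r, r + 1):
            xx = min(max(x + dx, 0), w - 1)
            acc += kernel[dy + r][dx + r] * img[yy][xx]
    return acc


def gaussian_blur(img, sigma=1.0, radius=2):
    # The template never changes, which is why a GPU version can keep it in constant memory.
    kernel = gaussian_kernel_2d(sigma, radius)
    return [[blur_pixel(img, kernel, x, y) for x in range(len(img[0]))]
            for y in range(len(img))]


def bilinear_sample(img, fx, fy):
    """Interpolate img at non-integer (fx, fy) from its four nearest pixels."""
    h, w = len(img), len(img[0])
    x0 = min(int(math.floor(fx)), w - 1)
    y0 = min(int(math.floor(fy)), h - 1)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = fx - x0, fy - y0  # fractional offsets act as the interpolation weights
    top = (1.0 - ax) * img[y0][x0] + ax * img[y0][x1]
    bottom = (1.0 - ax) * img[y1][x0] + ax * img[y1][x1]
    return (1.0 - ay) * top + ay * bottom


def downsample(img, scale):
    """Next pyramid level from the previous one; output (x, y) reads source (x*scale, y*scale)."""
    h, w = len(img), len(img[0])
    nh, nw = int(h / scale), int(w / scale)
    return [[bilinear_sample(img, x * scale, y * scale) for x in range(nw)]
            for y in range(nh)]


def build_pyramid(img, levels=4, scale=1.2, sigma=1.0, radius=2):
    """Gaussian-smooth the input first, then downsample each level from the previous one."""
    pyramid = [gaussian_blur(img, sigma, radius)]
    for _ in range(1, levels):
        pyramid.append(downsample(pyramid[-1], scale))
    return pyramid
```

With these assumptions, a 20x20 input and four levels give layers of 20, 16, 13 and 10 pixels per side. Because every output pixel depends only on read-only input data, each per-pixel function maps directly onto one thread per pixel.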
2.2 FAST detection
A FAST corner is determined by comparing the corner candidate with the sixteen pixels on a circle of radius 3 pixels around it. Each of these surrounding pixels is labeled with one of three states according to
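$$
S_{x} =
\begin{cases}
\text{darker}, & I_{x} < I_{p} - t \\
\text{similar}, & I_{p} - t \le I_{x} \le I_{p} + t \\
\text{brighter}, & I_{x} > I_{p} + t
\end{cases}
\tag{1}
$$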
where $S_x$ is the label of a surrounding pixel $x$, $I_x$ is its intensity, $I_p$ is the intensity of the corner candidate $p$, and $t$ is the detection threshold. If $N$ contiguous pixels on the circle are all labeled “darker” or all labeled “brighter”, the candidate is regarded as a corner. Since the test for each candidate depends only on its own neighborhood, one thread is again assigned to each pixel to parallelize detection.
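The segment test built on Eq. (1) can be sketched per pixel as follows. This is a minimal Python illustration, not the authors' CUDA kernel: the hypothetical function `is_fast_corner` plays the role of one thread's work, and `detect` loops over all pixels in place of the thread grid. The circle ordering, the run length `n=9` used for $N$ (the common FAST-9 variant, assumed here), and skipping a 3-pixel image border are assumptions.

```python
# Hypothetical names; a CPU sketch of the per-pixel FAST segment test, not the authors' kernel.

# (dx, dy) offsets of the 16 pixels on a Bresenham circle of radius 3, in clockwise order.
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]

DARKER, SIMILAR, BRIGHTER = -1, 0, 1


def label(ix, ip, t):
    """Three-state label of Eq. (1)."""
    if ix < ip - t:
        return DARKER
    if ix > ip + t:
        return BRIGHTER
    return SIMILAR


def is_fast_corner(img, x, y, t, n=9):
    """One thread's work. Assumes 3 <= x < width - 3 and 3 <= y < height - 3."""
    ip = img[y][x]
    labels = [label(img[y + dy][x + dx], ip, t) for dx, dy in CIRCLE]
    for target in (DARKER, BRIGHTER):
        run = 0
        # Scanning the labels twice counts arcs that wrap past the last circle position.
        for s in labels + labels:
            run = run + 1 if s == target else 0
            if run >= n:
                return True
    return False


def detect(img, t, n=9):
    """Apply the test at every pixel far enough from the border; each call is independent."""
    h, w = len(img), len(img[0])
    return [(x, y) for y in range(3, h - 3) for x in range(3, w - 3)
            if is_fast_corner(img, x, y, t, n)]
```

For example, on an 11x11 image with a bright quadrant starting at (5, 5), the pixel (5, 5) has an arc of 11 contiguous “darker” labels that wraps past the end of `CIRCLE`, so it passes with `n=9` but fails with `n=12`.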

