
While the above methods learn point-wise features through multi-layer perceptrons, other works adopt 3D convolutional kernels to design convolutional neural networks for point clouds, which can preserve more of the spatial information in point clouds. A typical network is PointConv [36], which uses a permutation-invariant convolution operation. PointConv extends traditional image convolution: the weight functions and density functions of a given point are learned by an MLP and by kernel density estimation, respectively. Boulch et al. [37] built a generalization of discrete convolutions for point clouds by replacing the discrete kernels used for grid-sampled data with continuous ones. The relation-shape convolutional neural network (RS-CNN) [38] is a hierarchical architecture that leverages relation-shape convolution (RS-Conv) to learn the geometric topology constraints among points from their relations, yielding an inductive local representation. Inspired by the dense connection pattern, Liu et al. [39] introduced DensePoint, a framework that aggregates the outputs of all previous layers through a generalized convolution operator in order to learn a densely contextual representation of point clouds from multi-level and multi-scale semantics. Apart from continuous convolutional kernels, discrete convolutional kernels also play a role in deep learning for point clouds. ShellNet [29], a convolutional network built on an effective operator called ShellConv, achieves a balance between high performance and short run time: ShellConv partitions the domain into concentric spherical shells and performs the convolution on this discrete partition. Mao et al. [40] proposed InterpConv for object classification, whose key parts are spatially discrete kernel weights, a normalization term, and an interpolation function. Rao et al. [41] introduced the spherical fractal convolutional neural network, in which point clouds are adaptively projected onto a discrete fractal spherical structure. Unlike other CNN methods, the annular convolutional neural network (A-CNN) [30] proposes an operator that convolves annularly on point clouds, leading to higher performance: through specified regular and dilated rings together with a constraint-based k-NN search, the annular convolution orders neighboring points and captures the relationships among the ordered points. DRINet [42] develops a dual representation (i.e., voxel-point and point-voxel) to propagate features between the two representations, achieving state-of-the-art accuracy on the ModelNet40 dataset with high runtime efficiency.
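To make the continuous-kernel idea concrete, the following is a minimal PyTorch sketch of a PointConv-style convolution in which the per-neighbor kernel weights are predicted by an MLP from relative coordinates; the class name, layer sizes, and tensor layout are illustrative assumptions, and the density term of the full PointConv operator is omitted.

```python
import torch
import torch.nn as nn

class PointMLPConv(nn.Module):
    """Minimal continuous point convolution: kernel weights are predicted
    by an MLP from each neighbor's relative coordinates, and aggregation
    is a sum, so the operator is permutation-invariant over neighbors."""
    def __init__(self, in_channels: int, out_channels: int, hidden: int = 32):
        super().__init__()
        # Weight function: maps a 3D offset to a per-neighbor kernel vector.
        self.weight_net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, in_channels),
        )
        self.linear = nn.Linear(in_channels, out_channels)

    def forward(self, rel_xyz: torch.Tensor, feats: torch.Tensor) -> torch.Tensor:
        # rel_xyz: (B, N, K, 3) offsets of K neighbors around N center points
        # feats:   (B, N, K, C) features of those neighbors
        w = self.weight_net(rel_xyz)   # (B, N, K, C) learned kernel weights
        agg = (w * feats).sum(dim=2)   # symmetric sum over the K neighbors
        return self.linear(agg)        # (B, N, out_channels)

# Toy usage: 2 clouds, 128 centers, 16 neighbors each, 8 input channels.
conv = PointMLPConv(in_channels=8, out_channels=64)
out = conv(torch.randn(2, 128, 16, 3), torch.randn(2, 128, 16, 8))
print(out.shape)  # torch.Size([2, 128, 64])
```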


3.2. LiDAR-fusion classification
Sensor-fusion architectures have become an emerging topic due to their balance among compatibility with application scenarios, complementarity of perception information, and cost. LiDAR is fused with other sensors to deal with specific tasks in autonomous driving. For instance, point clouds and images are fused to accomplish 2D object detection [43,44], and the fusion of LiDAR and radar is applied to localize and track objects more precisely for 3D object detection [4,45]. However, carrying out point-cloud-based object classification as a single task with fused methods remains desirable for real-world self-driving cars. Generally, 3D classification is implemented as a branch of a 3D object detection architecture to classify the targets of a proposal region and help predict the bounding box. Moreover, since PointNet [1] was proposed in 2017, many studies dealing directly with raw point clouds have been inspired by it. For the 3D classification task, the overall accuracy reaches 93.6% [16] on the generic benchmark ModelNet40, which satisfies the demands of autonomous-driving applications, so 3D classification is often not regarded as an independent task. On the other hand, LiDAR-based fusion methods for object category prediction are often not feasible due to the lack of image datasets aligned with existing point cloud datasets. Only a few works concentrate on fusion methods specifically for 3D classification in the field of autonomous driving. Therefore, this section focuses on classifiers integrated into LiDAR-fusion 3D detectors or segmentation networks.


Depending on the stage at which sensor data are fused, fusion methods can be divided into early fusion and late fusion. In early fusion, features from different data sources are fused at the input stage by concatenating the individual features into a unified representation, which is then sent to a single network to produce the final outputs. In late fusion, the prediction results from the individual uni-modal streams are fused to output the final prediction; in the simplest cases, late fusion merges results by summation or averaging. Compared with early fusion, late fusion lacks the ability to exploit cross-correlations among multi-modal data.
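As a minimal sketch of this distinction (assuming PyTorch; the feature dimensions, class names, and equal averaging weights are illustrative assumptions, not any specific method from the surveyed works):

```python
import torch
import torch.nn as nn

# Hypothetical feature sizes for two modalities and a toy label space.
lidar_dim, img_dim, n_classes = 256, 512, 10

class EarlyFusionClassifier(nn.Module):
    """Early fusion: concatenate modality features into one unified
    representation, then let a single network produce the outputs."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(lidar_dim + img_dim, 256), nn.ReLU(),
            nn.Linear(256, n_classes),
        )

    def forward(self, f_lidar, f_img):
        return self.net(torch.cat([f_lidar, f_img], dim=-1))

class LateFusionClassifier(nn.Module):
    """Late fusion: classify each uni-modal stream independently and
    merge the predictions (here, by simple averaging)."""
    def __init__(self):
        super().__init__()
        self.lidar_head = nn.Linear(lidar_dim, n_classes)
        self.img_head = nn.Linear(img_dim, n_classes)

    def forward(self, f_lidar, f_img):
        return 0.5 * (self.lidar_head(f_lidar) + self.img_head(f_img))

f_lidar, f_img = torch.randn(4, lidar_dim), torch.randn(4, img_dim)
print(EarlyFusionClassifier()(f_lidar, f_img).shape)  # torch.Size([4, 10])
print(LateFusionClassifier()(f_lidar, f_img).shape)   # torch.Size([4, 10])
```

Because the early-fusion network sees both feature vectors jointly, it can model cross-modal interactions that the independent late-fusion heads cannot.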