While the above methods learn point-wise features through multi-layer perceptrons, some other works adopt
3D convolutional kernels to design convolutional neural networks for point clouds, which can preserve more
spatial information. One of the typical networks is PointConv [36], which uses a permutation-invariant convolution operation. As an extension of traditional image convolution, the weight functions and the density functions of a given point in PointConv are learned by an MLP and obtained from kernel density estimation, respectively. Boulch et al. [37] built a generalization of discrete convolutions for point clouds by replacing the discrete kernels for grid-sampled data with continuous ones. Relation-shape convolutional neural network (RS-CNN) [38] is a hierarchical architecture that leverages relation-shape convolution (RS-Conv) to learn the geometric topology constraint among points from their relations, yielding an inductive local representation.
Inspired by dense connection mode, Liu et al. [39] introduced DensePoint, a framework that aggregates outputs
of all previous layers through a generalized convolutional operator in order to learn a densely contextual rep-
resentation of point clouds from multi-level and multi-scale semantics. Apart from continuous convolutional
kernels, discrete convolutional kernels play a role in deep learning for point clouds as well. ShellNet [29], a convolution network that utilizes an effective convolution operator called ShellConv, achieves a balance of high
performance and short run time. ShellConv partitions the domain into concentric spherical shells and con-
ducts convolutional operation based on this discrete definition. Mao et al. [40] proposed InterpConv for object
classification, whose key parts are spatially-discrete kernel weights, a normalization term and an interpolation
function. Rao et al. [41] introduced an architecture named spherical fractal convolutional neural network, in
which point clouds are projected into a discrete fractal spherical structure in an adaptive way. Unlike other CNN methods, the convolution operator proposed in [30] convolves annularly on point clouds and is applied in an annular convolutional neural network (A-CNN), leading to higher performance. Through specified regular and dilated rings along with constraint-based K-NN search, the annular convolution can order neighboring points and capture the relationship between the ordered points. DRINet [42] develops a dual representation (i.e., voxel-point and point-voxel) to propagate features between these two representations, achieving state-of-the-art performance on the ModelNet40 dataset with high runtime efficiency.
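To make the continuous-kernel idea behind PointConv-style operators concrete, the minimal sketch below (assuming a PyTorch environment; tensor shapes, layer sizes, and the name ContinuousPointConv are illustrative choices rather than the original implementation) produces per-neighbor kernel weights from an MLP over relative coordinates and optionally rescales them with an inverse-density term before aggregating neighbor features.

```python
# Minimal sketch of a PointConv-style continuous point convolution.
# Assumes neighbor indices have already been gathered (e.g., by k-NN);
# shapes and layer sizes are illustrative, not the original implementation.
import torch
import torch.nn as nn


class ContinuousPointConv(nn.Module):
    def __init__(self, in_channels: int, out_channels: int, hidden: int = 32):
        super().__init__()
        # Weight function: maps a relative 3D offset to per-channel kernel weights.
        self.weight_mlp = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, in_channels),
        )
        self.linear = nn.Linear(in_channels, out_channels)

    def forward(self, center_xyz, neighbor_xyz, neighbor_feat, inv_density=None):
        # center_xyz:    (B, N, 3)     query point coordinates
        # neighbor_xyz:  (B, N, K, 3)  coordinates of the K neighbors of each query
        # neighbor_feat: (B, N, K, C)  features of those neighbors
        # inv_density:   (B, N, K, 1)  optional inverse-density weights (e.g., from KDE)
        rel = neighbor_xyz - center_xyz.unsqueeze(2)   # relative offsets
        w = self.weight_mlp(rel)                       # (B, N, K, C) learned kernel weights
        if inv_density is not None:
            w = w * inv_density                        # density re-weighting
        aggregated = (w * neighbor_feat).sum(dim=2)    # sum over the K neighbors
        return self.linear(aggregated)                 # (B, N, out_channels)
```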
3.2. LiDAR-fusion classification
Sensor-fusion architectures have become an emerging topic due to their balance among the compatibility with
application scenarios, the complementarity of perception information, and the cost. LiDAR is fused with other
sensors to deal with specific tasks for autonomous driving. For instance, point clouds and images are fused in
order to accomplish 2D object detection [43,44], and the fusion of LiDAR and radar is applied to localize and track objects more precisely for 3D object detection [4,45]. However, it is still desirable to carry out point-cloud-based object classification as a single task with fusion methods in the field of real-world self-driving cars.
Generally, 3D classification is implemented as a branch of 3D object detection architecture to classify targets
of a proposal region and help predict the bounding box. Moreover, since PointNet [1] was proposed in 2017, many studies dealing directly with raw point clouds have been inspired by it. For the 3D classification task, the overall accuracy can reach 93.6% [16] on the generic benchmark ModelNet40, which satisfies the demand of autonomous-driving applications, so 3D classification is not regarded as an independent task. On the other hand, LiDAR-based fusion methods for object category prediction are not feasible due to the lack of image datasets aligned with existing point cloud datasets. Only a few works concentrate on fusion methods specifically for 3D classification in the field of autonomous driving. Therefore, this section focuses on the classifiers integrated into LiDAR-fusion 3D detectors or segmentors.
According to the different stages in which sensor data are fused, fusion methods can be divided into early
fusion and late fusion. For early fusion, features from different data sources are fused in the input stage by
concatenating each individual feature into a unified representation. This representation is sent to a network
to get final outputs. For late fusion, the prediction results from the individual uni-modal streams are fused
to output the final prediction. Late fusion merges results by summation or averaging in the simplest cases.
Compared with early fusion, late fusion lacks the ability to exploit cross correlations among multi-modal data.
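The toy sketch below contrasts the two schemes (assuming per-modality features have already been extracted, e.g., by a point-cloud backbone and an image backbone; the module names and feature dimensions are hypothetical): early fusion concatenates the modalities into a unified representation before a shared classifier, while late fusion merges the uni-modal predictions, here by simple averaging.

```python
# Toy contrast of early vs. late fusion; the feature extractors are assumed to
# exist upstream, and all names and dimensions here are illustrative only.
import torch
import torch.nn as nn


class EarlyFusionClassifier(nn.Module):
    """Concatenate per-modality features at the input stage, then classify jointly."""
    def __init__(self, lidar_dim: int, image_dim: int, num_classes: int):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(lidar_dim + image_dim, 128), nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, lidar_feat, image_feat):
        fused = torch.cat([lidar_feat, image_feat], dim=-1)  # unified representation
        return self.head(fused)


class LateFusionClassifier(nn.Module):
    """Run each modality through its own stream and merge the predictions."""
    def __init__(self, lidar_dim: int, image_dim: int, num_classes: int):
        super().__init__()
        self.lidar_head = nn.Linear(lidar_dim, num_classes)
        self.image_head = nn.Linear(image_dim, num_classes)

    def forward(self, lidar_feat, image_feat):
        # Simplest late fusion: average the uni-modal logits.
        return 0.5 * (self.lidar_head(lidar_feat) + self.image_head(image_feat))
```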