Table 3. Summary of 3D object detection methods. Here "I", "mvPC", "vPC", "pPC", and "RaPC" stand for image, multiple views of point cloud, voxelized point cloud, point cloud, and Radar point cloud, respectively.

| Detector | Category | Model | Modality & Representation | Novelty |
|---|---|---|---|---|
| Two-stage Detection | LiDAR-Only | IPOD [50] | pPC | a novel point-based proposal generation |
| | | STD [51] | pPC | proposal generation (from point-based spherical anchors) + PointsPool |
| | | PointRGCN [53] | pPC | RPN + R-GCN + C-GCN |
| | | SRN [34] | pPC | structural relation network (geometric and locational features + MLP) |
| | | Part-A2 [54] | pPC | intra-object part prediction + RoI-aware point cloud pooling |
| | | HVNet [55] | vPC | multi-scale voxelization + hybrid voxel feature extraction |
| | | LiDAR R-CNN [56] | pPC | R-CNN-style second-stage detector (size-aware point features) |
| | LiDAR-Fusion | 3D-CVF [64] | I & vPC | CVF (auto-calibrated projection) + adaptive gated fusion network |
| | | RoarNet [65] | I & pPC | RoarNet 2D (geometric agreement search) + RoarNet 3D (RPN + BRN) |
| | | MV3D [12] | I & mvPC | 3D proposal network + region-based fusion network |
| | | SCANet [46] | I & mvPC | multi-level fusion + spatial-channel attention + extension spatial upsample |
| | | MMF [47] | I & mvPC | point-wise fusion + ROI feature fusion |
| | | PointPainting [66] | I & pPC | image-based semantics network + appended (painted) point cloud |
| | | CM3D [67] | I & pPC | point-wise feature fusion + proposal generation + ROI-wise feature fusion |
| | | MVDNet [28] | RaPC & mvPC | two-stage deep fusion (region-wise feature fusion) |
| One-stage Detection | LiDAR-Only | VoxelNet [57] | vPC | voxel feature encoding + 3D convolutional middle layers + RPN |
| | | PointPillars [58] | pillar points | pillar feature net + backbone (2D CNN) + SSD detection head |
| | | SA-SSD [59] | pPC | backbone (SECOND) + auxiliary network + PS Warp |
| | | TANet [61] | vPC | triple attention module (channel-wise, point-wise, and voxel-wise attention) |
| | | SE-SSD [62] | pPC | teacher and student SSDs + shape-aware augmentation + consistency loss |
| | | 3D Auto Label [63] | mvPC | motion state classification + static and dynamic object auto labeling |
| | LiDAR-Fusion | ImVoteNet [48] | I & pPC | lifts 2D image votes, semantic and texture cues to the 3D seed points |
| | | EPNet [68] | I & pPC | two-stream RPN + LI-Fusion module + refinement network |
| | | CLOCs [69] | I & vPC | a late fusion architecture with any pair of pre-trained 2D and 3D detectors |
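Table 3 distinguishes detectors largely by input representation: raw points (pPC), voxels (vPC), or pillars. To make the pPC-to-vPC step concrete, the sketch below groups a raw point cloud into a fixed voxel grid, the preprocessing that VoxelNet-style detectors [57] perform before feature encoding. It is a minimal illustration, not the implementation of any cited method; the detection range and voxel size are assumed values chosen to resemble a typical KITTI car setting.

```python
import numpy as np

def voxelize(points, voxel_size=(0.2, 0.2, 0.4),
             pc_range=(0.0, -40.0, -3.0, 70.4, 40.0, 1.0)):
    """Group raw LiDAR points (pPC) into voxels (vPC).

    points: (N, 4) array of x, y, z, reflectance.
    Returns a dict mapping voxel index (ix, iy, iz) -> list of points inside it.
    """
    x_min, y_min, z_min, x_max, y_max, z_max = pc_range
    # Keep only points inside the detection range.
    mask = ((points[:, 0] >= x_min) & (points[:, 0] < x_max) &
            (points[:, 1] >= y_min) & (points[:, 1] < y_max) &
            (points[:, 2] >= z_min) & (points[:, 2] < z_max))
    points = points[mask]
    # Integer voxel coordinates for each remaining point.
    coords = ((points[:, :3] - np.array([x_min, y_min, z_min]))
              / np.array(voxel_size)).astype(np.int32)
    voxels = {}
    for point, coord in zip(points, map(tuple, coords)):
        voxels.setdefault(coord, []).append(point)
    return voxels
```

The pillar representation used by PointPillars [58] is the special case of a single voxel along the vertical axis, which is what allows its backbone to run as a 2D CNN.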
Table 4. Experimental results of 3D object detection methods on the KITTI test 3D object detection benchmark. Average Precision (AP) is shown for car with IoU threshold 0.7, and for pedestrian and cyclist with IoU threshold 0.5. "-" means the result is not available.

| Model | Car Easy | Car Moderate | Car Hard | Pedestrian Easy | Pedestrian Moderate | Pedestrian Hard | Cyclist Easy | Cyclist Moderate | Cyclist Hard |
|---|---|---|---|---|---|---|---|---|---|
| IPOD [50] | 79.75% | 72.57% | 66.33% | 56.92% | 44.68% | 42.39% | 71.40% | 53.46% | 48.34% |
| STD [51] | 87.95% | 79.71% | 75.09% | 53.29% | 42.47% | 38.35% | 78.69% | 61.59% | 55.30% |
| PointRGCN [53] | 85.97% | 75.73% | 70.60% | - | - | - | - | - | - |
| Part-A2 [54] | 85.94% | 77.86% | 72.00% | 89.52% | 84.76% | 81.47% | 54.49% | 44.50% | 42.36% |
| LiDAR R-CNN [56] | 85.97% | 74.21% | 69.18% | - | - | - | - | - | - |
| 3D-CVF [64] | 89.20% | 80.05% | 73.11% | - | - | - | - | - | - |
| RoarNet [65] | 83.71% | 73.04% | 59.16% | - | - | - | - | - | - |
| MV3D [12] | 71.09% | 62.35% | 55.12% | - | - | - | - | - | - |
| SCANet [46] | 76.09% | 66.30% | 58.68% | - | - | - | - | - | - |
| MMF [47] | 86.81% | 76.75% | 68.41% | - | - | - | - | - | - |
| CM3D [67] | 87.22% | 77.28% | 72.04% | - | - | - | - | - | - |
| VoxelNet [57] | 77.47% | 65.11% | 57.73% | 39.48% | 33.69% | 31.51% | 61.22% | 48.36% | 44.37% |
| PointPillars [58] | 79.05% | 74.99% | 68.30% | 52.08% | 43.53% | 41.49% | 75.78% | 59.07% | 52.92% |
| SA-SSD [59] | 88.75% | 79.79% | 74.16% | - | - | - | - | - | - |
| TANet [61] | 84.81% | 75.38% | 67.66% | 54.92% | 46.67% | 42.42% | 73.84% | 59.86% | 53.46% |
| SE-SSD [62] | 91.49% | 82.54% | 77.15% | - | - | - | - | - | - |
| EPNet [68] | 89.81% | 79.28% | 74.59% | - | - | - | - | - | - |
| CLOCs [69] | 88.94% | 80.67% | 77.15% | - | - | - | - | - | - |
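For reference, the AP columns in Table 4 count a detection as a true positive only when its 3D IoU with a ground-truth box exceeds the class threshold (0.7 for car, 0.5 for pedestrian and cyclist). The sketch below computes IoU for axis-aligned 3D boxes; it is only illustrative, since KITTI boxes additionally carry a yaw angle that the official evaluation handles with rotated-box overlap.

```python
def iou_3d(box_a, box_b):
    """IoU of two axis-aligned 3D boxes given as (x1, y1, z1, x2, y2, z2)."""
    # Overlap along each axis (zero if the boxes are disjoint).
    dx = max(0.0, min(box_a[3], box_b[3]) - max(box_a[0], box_b[0]))
    dy = max(0.0, min(box_a[4], box_b[4]) - max(box_a[1], box_b[1]))
    dz = max(0.0, min(box_a[5], box_b[5]) - max(box_a[2], box_b[2]))
    inter = dx * dy * dz
    vol_a = (box_a[3] - box_a[0]) * (box_a[4] - box_a[1]) * (box_a[5] - box_a[2])
    vol_b = (box_b[3] - box_b[0]) * (box_b[4] - box_b[1]) * (box_b[5] - box_b[2])
    return inter / (vol_a + vol_b - inter)

# A perfectly matching car detection reaches IoU 1.0, well above the 0.7 threshold.
assert iou_3d((0, 0, 0, 4, 2, 2), (0, 0, 0, 4, 2, 2)) == 1.0
```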
4.2. LiDAR-fusion detection
LiDAR-fusion detection enriches the input with additional data sources to improve detection performance at low cost. The auxiliary inputs include RGB images, inertial measurements (angular velocity and acceleration), depth images, and so on.
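As an example of such enrichment, PointPainting [66] in Table 3 appends per-pixel semantic scores from an RGB image to each LiDAR point. The sketch below illustrates the idea; the projection matrix and segmentation score map are assumed inputs, and the code is a simplified reading of the approach rather than the authors' implementation.

```python
import numpy as np

def paint_points(points, seg_scores, proj):
    """Append image semantics to LiDAR points (PointPainting-style sketch).

    points:     (N, 3) LiDAR xyz in the sensor frame.
    seg_scores: (H, W, C) per-pixel class scores from an image segmentation net.
    proj:       (3, 4) camera projection matrix mapping LiDAR xyz to pixels.
    Returns (M, 3 + C) painted points that project inside the image.
    """
    homo = np.hstack([points, np.ones((len(points), 1))])  # homogeneous coords (N, 4)
    uvw = homo @ proj.T                                    # camera-plane coords (N, 3)
    u = (uvw[:, 0] / uvw[:, 2]).astype(int)
    v = (uvw[:, 1] / uvw[:, 2]).astype(int)
    h, w, _ = seg_scores.shape
    # Keep points in front of the camera that land inside the image bounds.
    valid = (uvw[:, 2] > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    return np.hstack([points[valid], seg_scores[v[valid], u[valid]]])
```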
4.2.1. Two-stage detection
The input data of a LiDAR-fusion detector come from diverse sensing modalities that differ in sampling frequency and data representation. Hence, simple summation or multiplication at the source side contributes little to detection performance.
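One concrete obstacle is the sampling-frequency mismatch: a LiDAR typically sweeps at around 10 Hz while a camera captures at 30 Hz or more, so the streams must first be associated in time before any meaningful fusion. A minimal sketch, assuming sorted timestamp arrays and these hypothetical rates:

```python
import numpy as np

def nearest_sync(lidar_ts, camera_ts):
    """Match each LiDAR sweep to the camera frame nearest in time.

    lidar_ts:  sorted (N,) LiDAR timestamps, e.g. a 10 Hz stream.
    camera_ts: sorted (M,) camera timestamps, e.g. a 30 Hz stream.
    Returns (N,) indices into camera_ts.
    """
    idx = np.searchsorted(camera_ts, lidar_ts)
    idx = np.clip(idx, 1, len(camera_ts) - 1)
    left = camera_ts[idx - 1]
    right = camera_ts[idx]
    # Pick whichever neighbouring camera frame is closer in time.
    return np.where(lidar_ts - left < right - lidar_ts, idx - 1, idx)
```

Even after temporal alignment, the representations still differ (a dense pixel grid versus sparse, unordered points), which is why the methods in Table 3 fuse features after projection or region pooling rather than combining the raw sources directly.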