
               Table 3. Summary of 3D object detection methods. Here "I", "mvPC", "vPC", "pPC", and "RaPC" stand for image, multiple
               views of point cloud, voxelized point cloud, point cloud, and radar point cloud, respectively

                Detector    Category      Model               Modality &       Novelty
                                                              Representation
                Two-stage   LiDAR-only    IPOD [50]           pPC              a novel point-based proposal generation
                detection                 STD [51]            pPC              proposal generation (from point-based spherical anchors) + PointsPool
                                          PointRGCN [53]      pPC              RPN + R-GCN + C-GCN
                                          SRN [34]            pPC              structural relation network (geometric and locational features + MLP)
                                          Part-A2 [54]        pPC              intra-object part prediction + RoI-aware point cloud pooling
                                          HVNet [55]          vPC              multi-scale voxelization + hybrid voxel feature extraction
                                          LiDAR R-CNN [56]    pPC              R-CNN-style second-stage detector (size-aware point features)
                            LiDAR-fusion  3D-CVF [64]         I & vPC          CVF (auto-calibrated projection) + adaptive gated fusion network
                                          RoarNet [65]        I & pPC          RoarNet 2D (geometric agreement search) + RoarNet 3D (RPN + BRN)
                                          MV3D [12]           I & mvPC         3D proposal network + region-based fusion network
                                          SCANet [46]         I & mvPC         multi-level fusion + spatial-channel attention + extension spatial upsampling
                                          MMF [47]            I & mvPC         point-wise fusion + RoI feature fusion
                                          PointPainting [66]  I & pPC          image-based semantics network + appended (painted) point cloud
                                          CM3D [67]           I & pPC          point-wise feature fusion + proposal generation + RoI-wise feature fusion
                                          MVDNet [28]         RaPC & mvPC      two-stage deep fusion (region-wise feature fusion)
                One-stage   LiDAR-only    VoxelNet [57]       vPC              voxel feature encoding + 3D convolutional middle layers + RPN
                detection                 PointPillars [58]   pillar points    pillar feature net + 2D CNN backbone + SSD detection head
                                          SASSD [59]          pPC              SECOND backbone + auxiliary network + PSWarp
                                          TANet [61]          vPC              triple attention module (channel-wise, point-wise, and voxel-wise attention)
                                          SE-SSD [62]         pPC              teacher and student SSDs + shape-aware augmentation + consistency loss
                                          3D Auto Label [63]  mvPC             motion state classification + static and dynamic object auto labeling
                            LiDAR-fusion  ImVoteNet [48]      I & pPC          lifting of 2D image votes, semantic and texture cues to the 3D seed points
                                          EPNet [68]          I & pPC          two-stream RPN + LI-Fusion module + refinement network
                                          CLOCs [69]          I & vPC          a late fusion architecture usable with any pair of pre-trained 2D and 3D detectors
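               The modality column of Table 3 turns on how the raw point cloud is represented before feature extraction. As a concrete
               illustration, the following minimal Python sketch groups a raw point cloud (pPC) into the voxel grid (vPC) that
               VoxelNet-style detectors consume; the detection range and voxel size are illustrative assumptions, not values taken from
               any of the surveyed papers.

               import numpy as np

               def voxelize(points, voxel_size=(0.2, 0.2, 0.4),
                            pc_range=(0.0, -40.0, -3.0, 70.4, 40.0, 1.0)):
                   """Group an (N, 3) point cloud into voxels; returns a dict mapping
                   integer voxel coordinates (ix, iy, iz) to the points inside."""
                   lo, hi = np.array(pc_range[:3]), np.array(pc_range[3:])
                   size = np.array(voxel_size)
                   # Keep only points inside the (assumed) detection range.
                   pts = points[np.all((points >= lo) & (points < hi), axis=1)]
                   # Integer voxel coordinates for each remaining point.
                   coords = ((pts - lo) / size).astype(np.int64)
                   voxels = {}
                   for c, p in zip(map(tuple, coords), pts):
                       voxels.setdefault(c, []).append(p)
                   return {c: np.stack(v) for c, v in voxels.items()}

               # Toy usage: 1,000 random points over the assumed range.
               cloud = np.random.uniform([0, -40, -3], [70.4, 40, 1], size=(1000, 3))
               print(len(voxelize(cloud)), "non-empty voxels")

               Each non-empty voxel would next pass through a voxel feature encoding layer; point-based detectors such as IPOD [50] or
               STD [51] skip this discretization and operate on the points directly.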


               Table 4. Experimental results of 3D object detection methods on the KITTI test 3D object detection benchmark. Average
               precision (AP) is shown for the car class at IoU threshold 0.7 and for the pedestrian and cyclist classes at IoU threshold
               0.5. "-" means the result is not available

                                              Car                         Pedestrian                  Cyclist
                     Model            Easy    Moderate  Hard      Easy    Moderate  Hard      Easy    Moderate  Hard
                     IPOD [50]        79.75%  72.57%    66.33%    56.92%  44.68%    42.39%    71.40%  53.46%    48.34%
                     STD [51]         87.95%  79.71%    75.09%    53.29%  42.47%    38.35%    78.69%  61.59%    55.30%
                     PointRGCN [53]   85.97%  75.73%    70.60%    -       -         -         -       -         -
                     Part-A2 [54]     85.94%  77.86%    72.00%    89.52%  84.76%    81.47%    54.49%  44.50%    42.36%
                     LiDAR R-CNN [56] 85.97%  74.21%    69.18%    -       -         -         -       -         -
                     3D-CVF [64]      89.20%  80.05%    73.11%    -       -         -         -       -         -
                     RoarNet [65]     83.71%  73.04%    59.16%    -       -         -         -       -         -
                     MV3D [12]        71.09%  62.35%    55.12%    -       -         -         -       -         -
                     SCANet [46]      76.09%  66.30%    58.68%    -       -         -         -       -         -
                     MMF [47]         86.81%  76.75%    68.41%    -       -         -         -       -         -
                     CM3D [67]        87.22%  77.28%    72.04%    -       -         -         -       -         -
                     VoxelNet [57]    77.47%  65.11%    57.73%    39.48%  33.69%    31.51%    61.22%  48.36%    44.37%
                     PointPillars [58] 79.05% 74.99%    68.30%    52.08%  43.53%    41.49%    75.78%  59.07%    52.92%
                     SASSD [59]       88.75%  79.79%    74.16%    -       -         -         -       -         -
                     TANet [61]       84.81%  75.38%    67.66%    54.92%  46.67%    42.42%    73.84%  59.86%    53.46%
                     SE-SSD [62]      91.49%  82.54%    77.15%    -       -         -         -       -         -
                     EPNet [68]       89.81%  79.28%    74.59%    -       -         -         -       -         -
                     CLOCs [69]       88.94%  80.67%    77.15%    -       -         -         -       -         -
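               For reference, the AP values in Table 4 come from interpolated precision-recall evaluation: detections are matched to
               ground truth at the class-specific IoU threshold, and precision is averaged over fixed recall levels. The sketch below
               shows the classic 11-point interpolation as an illustration; the official KITTI benchmark has since moved to a 40-point
               variant, and the toy curve is made up, not taken from the table.

               import numpy as np

               def interpolated_ap(recall, precision, num_points=11):
                   """AP = mean over fixed recall levels r of the maximum precision
                   achieved at any recall >= r (0 if that recall is never reached)."""
                   recall, precision = np.asarray(recall), np.asarray(precision)
                   ap = 0.0
                   for r in np.linspace(0.0, 1.0, num_points):
                       mask = recall >= r
                       ap += precision[mask].max() if mask.any() else 0.0
                   return ap / num_points

               # Toy precision-recall curve (made-up numbers).
               recall = [0.1, 0.3, 0.5, 0.7, 0.9]
               precision = [0.95, 0.90, 0.85, 0.70, 0.40]
               print(f"11-point AP = {interpolated_ap(recall, precision):.3f}")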


               4.2. LiDAR-fusion detection
               LiDAR-fusion detection enriches the available information by adding complementary data sources, improving detection
               performance at low cost. Typical auxiliary inputs include RGB images, angular velocity (acceleration) measurements, depth
               images, and so on.
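               As a rough illustration of how an RGB image can augment LiDAR points, the sketch below mimics the "painting" step of
               PointPainting [66]: each point is projected into the image with the camera matrix, and the per-pixel semantic scores at
               its projection are appended to it. The projection matrix, score map, and helper name are hypothetical placeholders, not
               KITTI calibration or any paper's actual code.

               import numpy as np

               def paint_points(points_cam, scores, P):
                   """points_cam: (N, 3) LiDAR points already in the camera frame.
                   scores: (H, W, C) per-pixel class scores from an image network.
                   P: (3, 4) camera projection matrix.
                   Returns (M, 3 + C) points decorated with the scores they hit."""
                   H, W, _ = scores.shape
                   hom = np.hstack([points_cam, np.ones((len(points_cam), 1))])
                   uvw = hom @ P.T                     # project to the image plane
                   z = uvw[:, 2]
                   u = (uvw[:, 0] / z).astype(int)     # pixel column
                   v = (uvw[:, 1] / z).astype(int)     # pixel row
                   ok = (z > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
                   return np.hstack([points_cam[ok], scores[v[ok], u[ok]]])

               # Toy usage with hypothetical pinhole intrinsics and a random score map.
               K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
               P = np.hstack([K, np.zeros((3, 1))])
               pts = np.random.uniform([-2, -1, 2], [2, 1, 20], size=(500, 3))
               sem = np.random.rand(480, 640, 4)       # 4 hypothetical classes
               print(paint_points(pts, sem, P).shape)  # (M, 7)

               A real pipeline would first transform the points from the LiDAR frame to the camera frame with the extrinsic calibration;
               the painted points then feed a standard LiDAR-only detector.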



               4.2.1. Two-stage detection
               The input data of LiDAR-fusion detectors come from diverse sources that differ in sampling frequency and data
               representation. Hence, simple summation or multiplication at the source side contributes little to the