

Table 6. Summary of 3D semantic segmentation. "I", "mvPC", "vPC", "pPC", and "rPC" stand for image, point cloud in multi-view-based representation, point cloud in voxel-based representation, point cloud in point-based representation, and point cloud in range-map representation, respectively

| Category | Model | Modality & Representation | Architecture |
|----------|-------|---------------------------|--------------|
| LiDAR-Only | PointNet [1] | pPC | Point-wise MLP + T-Net + global max pooling |
| LiDAR-Only | PointNet++ [2] | pPC | Set abstraction (sampling, grouping, feature learning) + interpolation + skip-link concatenation |
| LiDAR-Only | KWYND [92] | pPC | Feature network + neighbor definition + regional descriptors |
| LiDAR-Only | MPC [93] | pPC | PointNet++-like network + Gumbel subset sampling |
| LiDAR-Only | 3D-MiniNet [97] | pPC | Fast 3D point neighbor search + 3D-MiniNet + post-processing |
| LiDAR-Only | LU-Net [100] | pPC & vPC | U-Net for point clouds |
| LiDAR-Only | SceneEncoder [101] | pPC | Multi-hot scene descriptor + region similarity loss |
| LiDAR-Only | RPVNet [13] | rPC & pPC & vPC | Range-point-voxel fusion network (deep fusion + gated fusion module) |
| LiDAR-Only | SqueezeSeg [102] | mvPC | SqueezeNet + conditional random field |
| LiDAR-Only | PointSeg [103] | mvPC | SqueezeNet + new feature extraction layers |
| LiDAR-Only | Pointwise [105] | pPC | Pointwise convolution operator |
| LiDAR-Only | Dilated [106] | pPC | Dilated point convolutions |
| LiDAR-Fusion | 3DMV [107] | I & vPC | End-to-end network with a differentiable back-projection layer |
| LiDAR-Fusion | SuperSensor [95] | I & mvPC | Associate architecture + 360-degree sensor configuration |
| LiDAR-Fusion | MVPNet [108] | I & mvPC | Multi-view point regression network + geometric loss |
| LiDAR-Fusion | FuseSeg [3] | I & rPC | Point correspondence + feature-level fusion |
| LiDAR-Fusion | PMF [109] | I & mvPC | Perspective projection + two-stream network (fusion part) + perception-aware loss |
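As a point of reference for the architecture descriptions in Table 6, the snippet below sketches the "point-wise MLP + global max pooling" pattern shared by PointNet-style networks. It is a minimal illustration in PyTorch: the layer widths are arbitrary, the T-Net input transform is omitted, and `PointNetSegSketch` is a hypothetical name rather than the released implementation of any cited model.

```python
# Minimal PointNet-style semantic segmentation sketch (illustrative only).
# Layer widths are arbitrary; the T-Net input transform is omitted for brevity.
import torch
import torch.nn as nn

class PointNetSegSketch(nn.Module):
    def __init__(self, num_classes: int, in_dim: int = 3):
        super().__init__()
        # Shared point-wise MLP applied independently to every point.
        self.local_mlp = nn.Sequential(nn.Conv1d(in_dim, 64, 1), nn.ReLU())
        self.global_mlp = nn.Sequential(
            nn.Conv1d(in_dim, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.ReLU(),
            nn.Conv1d(128, 1024, 1), nn.ReLU(),
        )
        # Per-point classifier on the concatenated [local, global] feature.
        self.seg_head = nn.Sequential(
            nn.Conv1d(1024 + 64, 256, 1), nn.ReLU(),
            nn.Conv1d(256, num_classes, 1),
        )

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        # points: (B, 3, N), i.e., a point cloud in point-based representation (pPC).
        local_feat = self.local_mlp(points)                          # (B, 64, N)
        global_feat = self.global_mlp(points)                        # (B, 1024, N)
        global_feat = global_feat.max(dim=2, keepdim=True).values    # symmetric max pooling
        global_feat = global_feat.expand(-1, -1, points.shape[2])    # broadcast to every point
        fused = torch.cat([local_feat, global_feat], dim=1)
        return self.seg_head(fused)                                  # (B, num_classes, N) per-point logits

# Example: a 4k-point cloud labeled into 20 semantic classes.
logits = PointNetSegSketch(num_classes=20)(torch.randn(1, 3, 4096))
```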

Table 7. Summary of 3D instance segmentation. "I", "mvPC", "vPC", "pPC", and "FPC" stand for image, point cloud in multi-view-based representation, point cloud in voxel-based representation, point cloud in point-based representation, and point cloud in frustum representation, respectively

| Category | Model | Modality & Representation | Architecture |
|----------|-------|---------------------------|--------------|
| LiDAR-Only | GSPN [111] | pPC | Region-based PointNet (generative shape proposal network + Point RoIAlign) |
| LiDAR-Only | 3D-BoNet [112] | pPC | Instance-level bounding box prediction + point-level mask prediction |
| LiDAR-Only | Joint [113] | pPC | Spatial embedding object proposal + local bounding box refinement |
| LiDAR-Only | SqueezeSeg [102] | mvPC | SqueezeNet + conditional random field |
| LiDAR-Only | SqueezeSegV2 [114] | mvPC | SqueezeSeg-like network + context aggregation module |
| LiDAR-Only | 3D-BEVIS [118] | mvPC | 2D-3D deep model (2D instance features + 3D feature propagation) |
| LiDAR-Fusion | PanopticFusion [116] | I & vPC | Pixel-wise panoptic labels + fully connected conditional random field |
| LiDAR-Fusion | Frustum PointNets [117] | I & FPC | Frustum proposal + 3D instance segmentation (PointNet) |

               segmentation results through clustering.
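Several of the methods summarized above obtain instances by grouping learned per-point features. The snippet below sketches that final grouping step with DBSCAN, assuming hypothetical per-point `embeddings` and `semantic_labels` produced by an upstream network; the clustering parameters are illustrative and not taken from any cited paper.

```python
# Illustrative final step: group per-point embeddings into object instances.
# `embeddings` is a hypothetical (N, D) feature array from an upstream network;
# eps/min_samples are example values, not parameters of any cited method.
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_instances(embeddings: np.ndarray,
                      semantic_labels: np.ndarray,
                      eps: float = 0.5,
                      min_samples: int = 20) -> np.ndarray:
    """Assign an instance id to every point; -1 marks unclustered points."""
    instance_ids = np.full(len(embeddings), -1, dtype=np.int64)
    next_id = 0
    # Cluster each semantic class separately so instances never mix classes.
    for cls in np.unique(semantic_labels):
        mask = semantic_labels == cls
        labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(embeddings[mask])
        labels[labels >= 0] += next_id            # offset so ids stay globally unique
        instance_ids[mask] = labels
        next_id = instance_ids.max() + 1
    return instance_ids

# Example with random stand-in data: 10k points, 8-D embeddings, 3 semantic classes.
ids = cluster_instances(np.random.rand(10000, 8), np.random.randint(0, 3, 10000))
```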




               7. DISCUSSION
As the upstream and key module of an autonomous vehicle, the perception system outputs its results to downstream modules (e.g., the decision and planning modules). The performance and reliability of the perception system therefore determine how well downstream tasks can be carried out, and thus affect the performance of the whole autonomous system. Although sensor fusion (Table 8 summarizes the LiDAR fusion architectures covered in this paper) can compensate for the shortcomings of a single LiDAR in bad weather and other respects, a large gap remains between algorithm design and practical applications in the real world. For this reason, it is necessary to be aware of the existing open challenges and to identify possible directions toward solutions. This section discusses the challenges and possible solutions for LiDAR-based 3D perception.

                • Dealing with large-scale point clouds and high-resolution images. The need for higher accuracy has
  prompted researchers to consider larger-scale point clouds and higher-resolution images. Most of the existing
  algorithms [2,29,36,119] are designed for small 3D point clouds (e.g., 4k points or 1 m × 1 m blocks) and do not
  extend well to larger point clouds (e.g., millions of points covering up to 200 m × 200 m). However,
                  larger point clouds come with a higher computational cost that is hard to afford for self-driving cars with
                  limited computational processing ability. Several recent studies have focused on this problem and proposed
                  some solutions. A deep learning framework for large-scale point clouds named SPG [120]  partitions point