
data from other sensors) as input, and then outputs the category of the target object (3D shape classification); a 3D bounding box implying the location, height, length, and width, together with the category of the target object (3D object detection); a track ID in a continuous sequence (3D object tracking); a segmented label for each point (3D segmentation); etc.¹ In addition, 3D point cloud registration, 3D reconstruction, 3D point cloud generation, and 6-DOF pose estimation are also tasks worth researching.
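
As a hedged illustration of these task outputs, the sketch below models them as simple data structures; all class and field names are hypothetical, not drawn from this survey or from any particular benchmark.

```python
# Hypothetical output structures for the 3D perception tasks above;
# names and fields are illustrative, not from a specific benchmark.
from dataclasses import dataclass
import numpy as np

@dataclass
class Box3D:
    center: np.ndarray      # (x, y, z) location in meters
    size: np.ndarray        # (height, length, width) in meters
    yaw: float              # heading angle (radians); commonly included
                            # by detection benchmarks alongside the size

@dataclass
class Detection3D:          # one 3D object detection result
    category: str           # e.g., "car" or "pedestrian"
    box: Box3D
    score: float            # detection confidence in [0, 1]

@dataclass
class Track3D:              # one 3D object tracking result
    detection: Detection3D
    track_id: int           # identity kept consistent across a sequence

# 3D shape classification outputs a single category per cloud, while 3D
# segmentation outputs one label per point: an integer array of shape (N,)
# for an (N, 3) point cloud.
def segment(points: np.ndarray) -> np.ndarray:
    """Placeholder segmenter: returns one semantic label per point."""
    return np.zeros(len(points), dtype=np.int64)
```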


Previous related surveys reviewed deep learning methods on LiDAR point clouds published before 2021 [5–8]. This paper reviews the latest deep learning methods on not only LiDAR point clouds but also LiDAR point cloud fusion (with image and radar). Compared with multi-modality fusion surveys [9–11], which cover a wide range of sensors, this paper provides a more detailed and comprehensive review of each related 3D perception task (3D shape classification, 3D object detection, 3D object tracking, and 3D segmentation). The contributions of this paper are summarized as follows:

1. This paper is a survey that focuses on deep learning algorithms with only LiDAR point clouds and LiDAR-based fusion data (especially LiDAR point clouds fused with camera images) as input in the field of autonomous driving. The work is structured around four representative 3D perception tasks, namely 3D shape classification, 3D object detection, 3D object tracking, and 3D segmentation.
2. This paper reviews methods organized by whether fusion data are utilized as their input. Moreover, the studies and algorithms reviewed here were published in the last decade, which ensures the timeliness and reference value of the survey.
3. This paper puts forward some open challenges and possible research directions to serve as a reference and stimulate future work.

The remainder of this paper is structured as follows. Section 2 provides background knowledge about LiDAR point clouds, including representations and characteristics of LiDAR point clouds, existing LiDAR-based benchmark datasets, and corresponding evaluation metrics. The following four sections review representative LiDAR-only and LiDAR-fusion methods for four 3D perception tasks: Section 3 for 3D shape classification, Section 4 for 3D object detection, Section 5 for 3D object tracking, and Section 6 for 3D semantic segmentation and instance segmentation. Section 7 discusses overlooked challenges and promising directions. Finally, Section 8 concludes the paper.



2. BACKGROUND
Point clouds in the field of autonomous driving are generally generated by the on-board LiDAR. Existing mainstream LiDARs emit laser wavelengths of 905 and 1550 nm, which are focused and do not disperse over long distances. When a laser beam of the LiDAR hits the surface of an object, the reflected laser carries information about the target object such as location and distance. By scanning the laser beam along a certain trajectory, the information of the reflected laser points is recorded. Since LiDAR scanning is extremely fine, many laser points can be obtained, and thus a LiDAR point cloud becomes available. The LiDAR point cloud (point clouds mentioned in this paper refer to LiDAR point clouds) is an unordered sparse point set representing the spatial distribution of targets and the characteristics of the target surface under the same spatial reference system. Deep learning-based methods basically implement three approaches to process a LiDAR point cloud into network input: (1) multi-view-based methods; (2) volumetric-based methods; and (3) point-based methods. Multi-view-based methods represent the point cloud as 2D views by projecting it onto 2D grid-based feature maps, which can leverage existing 2D convolution methods and mature network architectures.
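
To make the three input representations concrete, below is a minimal, self-contained sketch; the grid resolutions, ranges, and variable names are our own illustrative assumptions rather than settings from any surveyed method.

```python
# Sketch of the three input representations; grid sizes and ranges are
# illustrative assumptions, not taken from any specific method.
import numpy as np

rng = np.random.default_rng(0)
points = rng.random((10_000, 4))           # stand-in cloud: x, y, z, intensity
points[:, :3] = points[:, :3] * 100 - 50   # spread points over [-50, 50) m

# (1) Multi-view: project the cloud onto a 2D bird's-eye-view (BEV) grid,
#     so mature 2D convolutional networks can be applied to the result.
res = 0.5                                   # meters per BEV cell
xi = ((points[:, 0] + 50) / res).astype(int)
yi = ((points[:, 1] + 50) / res).astype(int)
bev = np.full((200, 200), -np.inf, dtype=np.float32)
np.maximum.at(bev, (xi, yi), points[:, 2])  # max-height feature map
bev[np.isinf(bev)] = 0.0                    # cells that received no points

# (2) Volumetric: quantize the cloud into a 3D occupancy grid so that
#     3D (or sparse 3D) convolutions can be applied.
voxel = 1.0                                 # meters per voxel edge
vi = np.floor((points[:, :3] + 50) / voxel).astype(int)
occupancy = np.zeros((100, 100, 100), dtype=bool)
occupancy[vi[:, 0], vi[:, 1], vi[:, 2]] = True

# (3) Point-based: feed the raw, unordered (N, 4) array directly to a
#     network that is invariant to point order (e.g., PointNet-style
#     per-point MLPs followed by a symmetric pooling operation).
```

The BEV and voxel grids trade some geometric fidelity for the regular structure that convolutions require, while point-based methods keep the raw set but need order-invariant operators.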



¹ Here, we use the term 3D to narrowly describe tasks that take 3D point clouds or 3D point cloud-based fusion data as input and output information about the object in real 3D space (i.e., category, 3D bounding box, and semantic labels of objects). Broadly speaking, some other works interpret 3D tasks as tasks inferring information about the object in real 3D space from any kind of input data.