
Page 106    Wu et al. Intell Robot 2022;2(2):105-29  |  http://dx.doi.org/10.20517/ir.2021.20


               1. INTRODUCTION
The perception system is crucial for autonomous driving: it enables the autonomous car to understand
the surrounding environment, including the location, velocity, and future state of pedestrians, obstacles, and
other traffic participants. It provides basic and essential information for the downstream tasks of autonomous
driving (i.e., the decision-making, planning, and control systems). Thus, a precise perception system is vital,
and it depends on breakthroughs in both hardware and software, i.e., 2D and 3D acquisition technology and
perception algorithms.

Sensors in the perception system generally include 2D cameras, RGB-D cameras, radar, and LiDAR.
With advantages such as high angular resolution, clear detail recognition, and long-range detection, LiDAR
has become indispensable for autonomous driving at Level 3 and above. LiDAR uses pulses of light to trans-
late the physical world into a 3D point cloud in real time with a high level of confidence. By measuring the
propagation distance between the LiDAR emitter and the target object and analyzing the reflected energy mag-
nitude, amplitude, frequency, and phase of the reflected wave spectrum from the surface of the target object,
LiDAR can capture the precise 3D structure of the target object with centimeter-level accuracy. According
to the scanning mechanism, LiDAR can be divided into three categories: standard spindle-type LiDAR, solid-state
(MEMS) LiDAR, and flash LiDAR. Compared with the standard spindle type, solid-state and flash LiDAR
address its high material and mass-production costs; therefore, standard spindle-type LiDAR is expected to
be gradually replaced. The application of LiDAR in autonomous cars is gaining market attention: according
to Sullivan's statistics and forecasts, the LiDAR market in the automotive segment is expected to reach $8
billion by 2025, accounting for 60% of the total market.
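The pulse-based ranging described above can be sketched as a simple time-of-flight computation: the sensor measures the round-trip time of a light pulse and converts it to distance. This is an illustrative sketch, not the paper's method; the function name and the example timing value are assumptions for demonstration.

```python
# Minimal time-of-flight sketch: distance = c * round_trip_time / 2.
# The factor of 2 accounts for the pulse traveling to the target and back.
C = 299_792_458.0  # speed of light in vacuum, m/s

def tof_range_m(round_trip_s: float) -> float:
    """Distance (m) to the target from a pulse's round-trip time (s)."""
    return C * round_trip_s / 2.0

# An echo arriving ~667 ns after emission corresponds to roughly 100 m,
# which is why LiDAR timing electronics must resolve sub-nanosecond
# intervals to achieve the centimeter-level accuracy mentioned above.
print(tof_range_m(667e-9))
```

Note that a 1 ns timing error already corresponds to about 15 cm of range error, which motivates the high-precision timing hardware in automotive LiDAR.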

In recent decades, deep learning has attracted extensive attention from computer vision researchers due
to its outstanding ability to deal with massive, unstructured data, which has stimulated the growth of envi-
ronment perception algorithms for autonomous driving. Depending on whether the algorithm concerns the
position and pose of the object in real 3D space or just the position of the object in the projected plane (i.e.,
the image plane), deep learning-based perception algorithms can be divided into 3D and 2D perception. While
deep learning-based 2D perception has achieved great progress and thus become a mature branch of
computer vision, 3D perception is an emerging and still under-investigated topic. In comparison, 3D perception
outputs richer information, i.e., the height, length, width, and semantic label of each 3D object, to restore the
real state of the object in three-dimensional space. In general, the input data of 3D perception tasks include
RGB-D images from depth cameras; images from monocular, binocular, and multi-camera systems;
and point clouds from LiDAR scanning. Among them, data from LiDAR and multi-camera-based stereo-
vision systems achieve higher accuracy in 3D inference. Unlike images from stereo-vision systems, LiDAR
point clouds are a relatively new data structure: they are unordered, exhibit interactions among points, and
are invariant under transformations. These characteristics make deep learning on LiDAR point clouds more
challenging. The publication of the pioneering frameworks PointNet [1] and PointNet++ [2] has inspired
plenty of work on deep learning for LiDAR point clouds, which promotes the development of autonomous
driving perception systems. Hence, this work reviews 3D perception algorithms based on deep learn-
ing for LiDAR point clouds. However, in real-world applications, a single LiDAR sensor often struggles in
heavy weather, with color-related detection, and under disturbed lighting conditions, which does not meet
the requirement that autonomous cars perceive their surroundings accurately and robustly under all variable
and complex conditions. To overcome the shortcomings of a single LiDAR, LiDAR-based fusion [3,4] has
emerged, offering improved perception accuracy, reliability, and robustness. Among LiDAR-fusion methods,
the fusion of LiDAR sensors with cameras, including visual and thermal cameras, is the most widely used in
robotics and autonomous driving perception. Hence, this paper also reviews deep learning-based fusion
methods for LiDAR.
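The unordered nature of point clouds mentioned above is what PointNet-style architectures address with a symmetric aggregation function: per-point features followed by a channel-wise max give the same result for any ordering of the input points. The sketch below is an illustrative toy, not the actual PointNet architecture; the per-point "feature" function stands in for the learned per-point MLP.

```python
# Toy illustration of permutation invariance via symmetric pooling.
def point_feature(p):
    # Stand-in for a learned per-point MLP: a few fixed features.
    x, y, z = p
    return (x + y + z, x * y * z, max(x, y, z))

def global_feature(points):
    feats = [point_feature(p) for p in points]
    # Channel-wise max over all points: the result does not depend
    # on the order in which the points are listed.
    return tuple(max(f[i] for f in feats) for i in range(3))

cloud = [(1.0, 0.0, 2.0), (0.5, 0.5, 0.5), (2.0, 1.0, 0.0)]
shuffled = [cloud[2], cloud[0], cloud[1]]
assert global_feature(cloud) == global_feature(shuffled)
```

Because max pooling is symmetric in its arguments, the network's global descriptor is identical for any permutation of the same points, which is exactly the property an image-based CNN does not need but a point-cloud network must have.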

               LiDAR-based 3D perception tasks take a LiDAR point cloud (or a LiDAR point cloud fused with images or