1. INTRODUCTION
The perception system is crucial for autonomous driving: it enables the autonomous car to understand the surrounding environment, including the location, velocity, and future state of pedestrians, obstacles, and other traffic participants. It provides basic and essential information for the downstream tasks of autonomous driving (i.e., decision-making, planning, and control). A precise perception system is thus vital, and it depends on breakthroughs in both hardware and software, i.e., 2D and 3D acquisition technology and perception algorithms.
The sensors of a perception system generally include 2D cameras, RGB-D cameras, radar, and LiDAR. With advantages such as high angular resolution, fine detail recognition, and long-range detection, LiDAR has become indispensable for autonomous driving above the L3 level. LiDAR uses pulses of light to translate the physical world into a 3D point cloud in real time with a high level of confidence. By measuring the propagation distance between the LiDAR emitter and the target object and analyzing the magnitude, amplitude, frequency, and phase of the energy reflected from the object's surface, LiDAR can capture the precise 3D structure of the target object at the centimeter level. According to the scanning mechanism, LiDAR can be divided into three categories: standard spindle-type LiDAR, solid-state (MEMS) LiDAR, and flash LiDAR. Compared with the standard spindle type, solid-state and flash LiDAR address the high material and mass-production costs, so the standard spindle type is expected to be replaced gradually. The application of LiDAR in autonomous cars is also gaining market attention: according to Sullivan's statistics and forecasts, the LiDAR market in the automotive segment is expected to reach $8 billion by 2025, accounting for 60% of the total LiDAR market.
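As a concrete illustration of the time-of-flight ranging principle described above, the following minimal sketch shows how a single pulse's round-trip time yields a range, and how that range plus the beam angles becomes one Cartesian point of the cloud. The function names and the spherical-coordinate convention are illustrative assumptions, not any vendor's API.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def tof_range(round_trip_time_s: float) -> float:
    """Range from a time-of-flight pulse: the pulse travels out and back,
    so the one-way distance is half the round-trip path length."""
    return C * round_trip_time_s / 2.0

def spherical_to_cartesian(r: float, azimuth_rad: float, elevation_rad: float):
    """Convert one LiDAR return (range plus beam angles) to an x, y, z point."""
    x = r * math.cos(elevation_rad) * math.cos(azimuth_rad)
    y = r * math.cos(elevation_rad) * math.sin(azimuth_rad)
    z = r * math.sin(elevation_rad)
    return x, y, z

# Example: a return after ~667 ns corresponds to a target roughly 100 m away.
r = tof_range(667e-9)
point = spherical_to_cartesian(r, math.radians(30.0), math.radians(2.0))
```

A real scanner emits many such pulses per sweep and typically records a reflected intensity alongside each point, which is how the raw point cloud used by the methods below is formed.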
In recent decades, deep learning has attracted extensive attention from computer vision researchers due to its outstanding ability to deal with massive, unstructured data, which has stimulated the growth of environment perception algorithms for autonomous driving. Depending on whether the algorithm concerns the position and pose of an object in real 3D space or only its position in the projected image plane, deep learning-based perception algorithms can be divided into 3D and 2D perception. While deep learning-based 2D perception has made great progress and become a mature branch of computer vision, 3D perception is an emerging and still under-investigated topic. In contrast to 2D perception, 3D perception outputs richer information, i.e., the height, length, width, and semantic label of each 3D object, restoring the real state of the object in three-dimensional space. In general, the input data of 3D perception tasks include
RGB-D images from depth cameras, images from monocular cameras, binocular cameras, and multi-cameras,
and point clouds from LiDAR scanning. Among them, data from LiDAR and multi-camera-based stereo-
vision systems achieve higher accuracy in 3D inference. Unlike images from stereo-vision systems, LiDAR point clouds are a relatively new data structure: they are unordered sets, their points interact with neighboring points, and they are invariant under transformations. These characteristics make deep learning on LiDAR point clouds more
challenging. The publication of the pioneering framework PointNet [1], together with PointNet++ [2], has inspired numerous works on deep learning for LiDAR point clouds, which promote the development of autonomous driving perception systems. Hence, this work reviews deep learning-based 3D perception algorithms for LiDAR point clouds. However, in real-world applications, a single LiDAR sensor struggles in adverse weather, in color-related detection tasks, and under light interference, and thus cannot fulfill the need of autonomous cars to perceive their surroundings accurately and robustly in all variable and complex conditions. To overcome the shortcomings of a single LiDAR, LiDAR-based fusion [3,4] has emerged, with improved perception accuracy, reliability, and robustness. Among LiDAR-fusion methods, the fusion of LiDAR sensors and cameras, including visual and thermal cameras, is the most widely used in robotics and autonomous driving perception. Accordingly, this paper also reviews deep learning-based fusion methods for LiDAR.
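To make the order-invariance point concrete, here is a minimal NumPy sketch of the symmetric-aggregation idea behind PointNet [1]: a shared per-point transform followed by max-pooling, so the resulting global feature does not depend on the ordering of the points. The single linear layer stands in for PointNet's stacked MLPs and transformation networks; the function and its parameters are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def pointnet_style_feature(points: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Shared per-point transform (one linear layer + ReLU) followed by a
    symmetric max-pool over points, yielding a permutation-invariant feature.
    points:  (N, 3) array of xyz coordinates
    weights: (3, D) projection applied identically to every point
    returns: (D,) global feature vector
    """
    per_point = np.maximum(points @ weights, 0.0)  # shared transform + ReLU
    return per_point.max(axis=0)                   # order-independent pooling

rng = np.random.default_rng(0)
pts = rng.normal(size=(1024, 3))
w = rng.normal(size=(3, 64))
feat = pointnet_style_feature(pts, w)
# Shuffling the points leaves the global feature unchanged:
assert np.allclose(feat, pointnet_style_feature(rng.permutation(pts), w))
```

Any symmetric function (max, sum, mean) would give the same invariance; PointNet adopts max-pooling, and follow-up works such as PointNet++ [2] add hierarchical grouping to capture the local interactions among points that the plain global pooling ignores.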
LiDAR-based 3D perception tasks take a LiDAR point cloud (or a LiDAR point cloud fused with images or