Figure 1. Several typical quadrupedal locomotion studies based on DRL algorithms: (A) recovering from a fall [6]; (B) a radically robust controller for quadrupedal locomotion in challenging natural environments [7]; (C) learning agile locomotion skills by imitating real-world animals [10]; (D) producing adaptive behaviors in response to changing situations [9]; (E) coupling vision and proprioception for navigation tasks [11]; (F) integrating exteroceptive and proprioceptive perception for quadrupedal locomotion in a variety of challenging natural and urban environments over multiple seasons [8]; (G) utilizing prior knowledge of human and animal movement to learn reusable locomotion and dribbling skills [12]; and (H) leveraging both proprioceptive states and visual observations for locomotion control [13].
reach and rapidly change the kinematic state according to the environment. To further study quadrupedal locomotion on uneven terrain, the complexity of traditional control methods has gradually increased as more scenarios are considered [1–4]. As a result, the associated development and maintenance become rather time-consuming and labor-intensive, and the resulting controllers remain vulnerable to extreme situations.
With the rapid development of the artificial intelligence field, deep reinforcement learning (DRL) has recently
emerged as an alternative method for developing legged motor skills. The core idea of DRL is that the control policy learns to make decisions that maximize the cumulative reward received from the environment [5]. DRL has been used to simplify the design of locomotion controllers, automate parts of the design process, and learn behaviors that previous control methods could not achieve [6–9]. Research on DRL
algorithms for legged robots has gained wide attention in recent years. Meanwhile, several well-known re-
search institutions and companies have publicly revealed their implementations of DRL-based legged robots,
as shown in Figure 1.
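For concreteness, this core idea can be written as the standard RL objective (a textbook formulation rather than one taken from the surveyed works): a policy $\pi_\theta$ with parameters $\theta$ is trained to maximize the expected discounted return

$$J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\left[\sum_{t=0}^{T} \gamma^{t}\, r(s_t, a_t)\right],$$

where $\tau = (s_0, a_0, s_1, a_1, \ldots)$ is a trajectory generated by running the policy in the environment, $r(s_t, a_t)$ is the reward received at step $t$, and $\gamma \in [0, 1)$ is the discount factor. For a quadruped, $s_t$ typically stacks proprioceptive (and possibly exteroceptive) observations, $a_t$ specifies joint position or torque targets, and $r$ encodes the desired locomotion behavior.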
Currently, there are several reviews on applying DRL algorithms to robots. One work summarizes the types of DRL algorithms and their deployment on several kinds of robots, such as robotic arms, bipeds, and quadrupeds [14]. It discusses in detail the theoretical background and advanced learning algorithms of DRL, and presents key current challenges in this field along with ideas for future research directions to stimulate new research interest. Another work summarizes case studies of robotic DRL and some open problems [15]. Based on these case studies, the authors discuss common challenges in DRL and how each study addresses them. They also provide an overview of other prominent challenges, many of which are unique to real-world robotics settings.
Furthermore, a common paradigm for applying DRL algorithms to robotics is to train policies in simulation and then deploy them on real machines. This gives rise to the reality gap [16] (also known as the sim-to-real gap) problem, which is surveyed for the robotic arm in [17]. That review introduces the basic background behind sim-to-real transfer in DRL and outlines the main methods currently used: domain randomization, domain adaptation, imitation learning, meta-learning, and knowledge distillation. It categorizes some of the most relevant recent works and outlines the main application scenarios, while also discussing the main opportunities and challenges of the different approaches and pointing out the most promising directions.
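To make the first of these methods concrete, the sketch below shows a minimal form of domain randomization in Python: a handful of physics parameters are resampled at the start of every training episode, so the learned policy cannot overfit to a single simulator configuration. The parameter names and ranges, as well as the `env.set_physics_param`, `policy.act`, and `policy.record` interfaces, are hypothetical stand-ins rather than the API of any surveyed work.

```python
import random

# Hypothetical ranges for a few simulator physics parameters; in practice
# they are tuned per robot and per simulator around nominal values.
PARAM_RANGES = {
    "ground_friction": (0.4, 1.2),   # terrain friction coefficient
    "base_mass_kg":    (9.0, 13.0),  # base mass, models payload error
    "motor_gain":      (0.8, 1.2),   # actuator strength multiplier
    "latency_s":       (0.00, 0.04), # sensing/actuation delay in seconds
}

def randomize_physics(env):
    """Resample each physics parameter uniformly within its range.

    `env.set_physics_param` is a stand-in for whatever interface the
    chosen simulator exposes for changing dynamics parameters.
    """
    for name, (low, high) in PARAM_RANGES.items():
        env.set_physics_param(name, random.uniform(low, high))

def train(env, policy, num_episodes):
    """Generic training loop: fresh random dynamics before every episode."""
    for _ in range(num_episodes):
        randomize_physics(env)  # the step that distinguishes the method
        obs = env.reset()
        done = False
        while not done:
            action = policy.act(obs)
            obs, reward, done, info = env.step(action)
            policy.record(obs, action, reward, done)  # e.g., fill a buffer
```

Because the policy must perform well under many sampled dynamics at once, it tends to transfer better when deployed on hardware whose true parameters fall inside the randomized ranges.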
The work closest to our review surveys only current research on motor skill learning via DRL algorithms [18],