


[Figure 1 image: a 2019–2022 timeline of representative DRL-based quadrupedal locomotion studies; panels (A)–(H) are described in the caption below.]

Figure 1. Several typical quadrupedal locomotion studies based on DRL algorithms: (A) recovering from a fall [6]; (B) a robust controller for quadrupedal locomotion in challenging natural environments [7]; (C) learning agile locomotion skills by imitating real-world animals [10]; (D) producing adaptive behaviors in response to changing situations [9]; (E) coupling vision and proprioception for navigation tasks [11]; (F) integrating exteroceptive and proprioceptive perception for quadrupedal locomotion in a variety of challenging natural and urban environments over multiple seasons [8]; (G) utilizing prior knowledge of human and animal movement to learn reusable locomotion and dribbling skills [12]; and (H) leveraging both proprioceptive states and visual observations for locomotion control [13].


reach and rapidly change their kinematic state according to the environment. To further study quadrupedal locomotion on uneven terrain, the complexity of traditional control methods has gradually increased as more scenarios are considered [1–4]. As a result, their development and maintenance become time-consuming and labor-intensive, and the resulting controllers remain vulnerable to extreme situations.

With the rapid development of artificial intelligence, deep reinforcement learning (DRL) has recently emerged as an alternative method for developing legged motor skills. The core idea of DRL is that a control policy learns to make decisions that maximize the cumulative reward received from the environment [5]. DRL has been used to simplify the design of locomotion controllers, automate parts of the design process, and learn behaviors that previous control methods could not achieve [6–9]. Research on DRL algorithms for legged robots has gained wide attention in recent years. Meanwhile, several well-known research institutions and companies have publicly revealed their implementations of DRL-based legged robots, as shown in Figure 1.
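
Concretely, this core idea corresponds to the standard reinforcement learning objective of maximizing the expected discounted return; the formula below uses conventional notation and is included for reference rather than drawn from the surveyed works:

$$ J(\pi) = \mathbb{E}_{\tau \sim \pi}\left[ \sum_{t=0}^{T} \gamma^{t} r_{t} \right], $$

where $\tau$ is a trajectory generated by executing the policy $\pi$, $r_t$ is the reward received from the environment at step $t$, $\gamma \in [0, 1)$ is the discount factor, and $T$ is the episode horizon.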

Currently, there are several reviews on applying DRL algorithms to robots. Some works summarize the types of DRL algorithms and their deployment on several robots, such as robotic arms, bipeds, and quadrupeds [14]. They discuss in detail the theoretical background and advanced learning algorithms of DRL, present key current challenges in the field, and offer ideas for future research directions to stimulate new research interests. Another work summarizes case studies involving robotic DRL and some open problems [15]. Based on these case studies, it discusses common challenges in DRL and how the surveyed works address them, and it also provides an overview of other prominent challenges, many of which are unique to real-world robotics settings.
Furthermore, a common paradigm for DRL algorithms applied to robotics is to train policies in simulation and then deploy them on real machines. This can lead to the reality-gap problem [16] (also known as the sim-to-real gap), which is summarized for robotic arms in [17]. These reviews introduce the basic background behind sim-to-real transfer in DRL and outline the main methods currently used: domain randomization, domain adaptation, imitation learning, meta-learning, and knowledge distillation. They categorize some of the most relevant recent works, outline the main application scenarios, and discuss the main opportunities and challenges of the different approaches while pointing out the most promising directions.
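
As a brief illustration of the first of these methods, the sketch below shows a common form of domain randomization: physics parameters are resampled at the start of each training episode so that the learned policy cannot overfit to a single dynamics model. This is a minimal sketch under assumed interfaces; the environment methods (set_physics, reset, step), the policy hooks (act, observe), and the parameter ranges are hypothetical placeholders, not APIs from any of the surveyed works.

    import random

    # Hypothetical ranges for physics parameters that are commonly
    # randomized in sim-to-real work; the values are illustrative only.
    PARAM_RANGES = {
        "ground_friction": (0.4, 1.2),    # terrain friction coefficient
        "body_mass_scale": (0.8, 1.2),    # multiplicative perturbation of link masses
        "motor_strength":  (0.85, 1.15),  # scaling of actuator torque limits
        "latency_s":       (0.0, 0.04),   # simulated observation/action delay
    }

    def sample_physics():
        """Draw one random dynamics model from the ranges above."""
        return {name: random.uniform(lo, hi)
                for name, (lo, hi) in PARAM_RANGES.items()}

    def train_with_domain_randomization(env, policy, episodes=1000):
        """Outer training loop: each episode runs in a freshly randomized
        simulator, so the policy must work across the whole range."""
        for _ in range(episodes):
            env.set_physics(sample_physics())  # hypothetical simulator setter
            obs = env.reset()
            done = False
            while not done:
                action = policy.act(obs)
                obs, reward, done, _ = env.step(action)
                policy.observe(obs, reward, done)  # hypothetical learning update

Episode-level resampling is the simplest variant; related schemes perturb parameters within an episode or adapt the ranges automatically during training.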
The closest work to our review simply surveys current research on motor skill learning via DRL algorithms [18],