
               theoretical justification, and empirical performance [20–22] .


Like other on-policy algorithms, PPO (TRPO) has been criticized for its sample inefficiency; thus, more efficient
model-free algorithms (ARS [23], SAC [24], V-MPO [25], etc.) are sometimes considered. Some researchers have
also recently used advanced algorithms for more challenging tasks. For example, the multi-objective variant of
the V-MPO algorithm (MO-VMPO) [26] has been utilized to train a policy to track planned trajectories [27].
Others have introduced the guided constrained policy optimization (GCPO) method for tracking base
velocity commands while following defined constraints [28]. Moreover, for more efficient real-world fine-tuning
and to avoid overestimation problems, REDQ, an off-policy algorithm [29], has been used on real robots [30].
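
To make the on-policy/off-policy distinction concrete, the sketch below trains both families on the same task. It is only an illustration and assumes the Stable-Baselines3 library with a generic continuous-control environment standing in for a quadruped; none of these choices are taken from the works cited above.

```python
from stable_baselines3 import PPO, SAC

# Any registered continuous-control task stands in for a quadruped
# environment here; Stable-Baselines3 builds the environment from its id.
ENV_ID = "Pendulum-v1"

# On-policy: PPO collects a fresh batch of rollouts for every update and
# discards it afterwards, which is the root of its sample inefficiency.
ppo = PPO("MlpPolicy", ENV_ID, n_steps=2048, verbose=0)
ppo.learn(total_timesteps=50_000)

# Off-policy: SAC stores transitions in a replay buffer and reuses them
# across many gradient steps, so it typically needs fewer environment samples.
sac = SAC("MlpPolicy", ENV_ID, buffer_size=100_000, verbose=0)
sac.learn(total_timesteps=50_000)
```

The practical difference is that the off-policy replay buffer lets each environment transition be reused across many gradient updates, which is the main source of the sample-efficiency gains mentioned above.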


3.2. Simulator
A robot simulator should realistically reproduce the dynamics of the robot itself and efficiently resolve the
collisions generated when the robot interacts with its environment. Over the past few years, the PyBullet [31]
and RaiSim [32] simulation platforms have been the choice of most researchers. However, current robotic
simulators in academia are still relatively simple, and their fidelity falls far short of that of game engines.
Without an accurate and realistic simulator, it is difficult for robots to directly realize end-to-end decision
making from perception to control. Common robotic simulators such as PyBullet and RaiSim can only handle
control-level simulation and are stretched thin when it comes to simulating realistic, real-world scenes. They
were developed to run on CPUs and offer limited parallelism. On the other hand, while MuJoCo [33] is a
popular simulator for verifying DRL algorithms, it is rarely used as a deployment and testing platform for
real-world quadrupedal locomotion algorithms. A possible explanation is that MuJoCo is highly encapsulated,
which makes it difficult for researchers to extend.
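
As a concrete example of the control-level simulation these CPU-based tools provide, the following minimal sketch uses PyBullet to step a quadruped under simple joint position control. The Laikago URDF path refers to the model bundled with pybullet_data and may differ between versions; the zero position targets are purely illustrative.

```python
import pybullet as p
import pybullet_data

# Headless physics server running on the CPU.
p.connect(p.DIRECT)
p.setAdditionalSearchPath(pybullet_data.getDataPath())
p.setGravity(0, 0, -9.81)
p.setTimeStep(1.0 / 240.0)

p.loadURDF("plane.urdf")
# pybullet_data ships a Laikago model; the exact path may vary by version.
robot = p.loadURDF("laikago/laikago.urdf", basePosition=[0, 0, 0.5])

# Control-level loop: command joint position targets, then step the physics.
for _ in range(1000):
    for joint in range(p.getNumJoints(robot)):
        p.setJointMotorControl2(robot, joint, p.POSITION_CONTROL,
                                targetPosition=0.0)
    p.stepSimulation()

p.disconnect()
```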

Recently, NVIDIA released a new simulator, Isaac Gym [34], which simulates the environment with much higher
accuracy than the aforementioned simulators and can simulate and train directly on GPUs. This simulator is
scalable and can simulate a large number of scenarios in parallel, so researchers can use DRL algorithms for
large-scale training. It can also build large-scale, realistic, complex scenes, and its underlying PhysX engine can
accurately and realistically model and simulate the motion of objects. Therefore, more researchers have begun
to use Isaac Gym as the implementation and verification platform for DRL algorithms [35–38].
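
A rough sketch of how this parallelism is set up through the Isaac Gym Python API (gymapi) is shown below; the asset path, environment count, and device indices are placeholder assumptions rather than values taken from the cited works.

```python
from isaacgym import gymapi

gym = gymapi.acquire_gym()

# PhysX simulation; device index 0 is assumed for a single-GPU machine.
sim_params = gymapi.SimParams()
sim_params.physx.use_gpu = True
sim = gym.create_sim(0, 0, gymapi.SIM_PHYSX, sim_params)

# Load a quadruped description (asset root and file name are placeholders).
asset_options = gymapi.AssetOptions()
asset = gym.load_asset(sim, "./assets", "a1.urdf", asset_options)

# Thousands of environments are laid out on a grid and simulated together,
# which is what makes large-scale DRL training practical on one GPU.
num_envs = 4096
spacing = 2.0
lower = gymapi.Vec3(-spacing, -spacing, 0.0)
upper = gymapi.Vec3(spacing, spacing, spacing)
envs_per_row = int(num_envs ** 0.5)

for i in range(num_envs):
    env = gym.create_env(sim, lower, upper, envs_per_row)
    pose = gymapi.Transform()
    pose.p = gymapi.Vec3(0.0, 0.0, 0.4)
    gym.create_actor(env, asset, pose, "quadruped", i, 1)

# One physics step advances every environment at once.
for _ in range(100):
    gym.simulate(sim)
    gym.fetch_results(sim, True)
```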


               3.3. Hardware platform
In the early research stage, Minitaur [39], with only eight degrees of freedom, was used to verify the feasibility
of DRL algorithms in simple experimental scenarios. To accomplish more complex tasks, robots with 12 or
more degrees of freedom (Unitree Laikago¹, Unitree A1², ANYmal [40], etc.) are used by researchers. While
the ANYmal series robots are well known for their high hardware costs, low-cost robots such as the Unitree A1
are a more popular choice among researchers. Lower-cost hardware platforms allow DRL algorithms to be more
widely used. More recently, a wheel-legged quadruped robot [38] demonstrated skills learned from existing
DRL controllers and trajectory optimization, such as ducking and walking, as well as new skills, such as
switching between quadrupedal and humanoid configurations.

3.4. Publisher
Currently, DRL-based quadrupedal locomotion research is an emerging and promising field, and many papers
have not yet been formally published. The published papers appear mainly in journals and conferences in the
field of robotics, and four outstanding works [6–9] have been published in Science Robotics. It is worth noting
that the field is actually an intersection of several fields, and some excellent studies have been published at




                  1 https://www.unitree.com/products/laikago/
                  2 https://www.unitree.com/products/a1/