Page 97 - Read Online
P. 97

Zhang et al. Intell Robot 2022;2(3):275­97   I http://dx.doi.org/10.20517/ir.2022.20                              Page 289




                                 Robot    Minitaur   Minitaur        ANYmal          Minitaur   Minitaur     Minitaur     Stoch        ANYmal






                                 Simulator  Pybullet  Pybullet       RaiSim          Pybullet   Pybullet     Pybullet     Pybullet     RaiSim





                                      Po-  Direc-  Veloci-       Torque,  and  and  Acceler-  Roll          Moving      Con-      Ori-  Phases.  Joint  Ve-
                                      Base  and      Acutual).  Velocity,  Impulse  Slippage  Base  Orienta-            Energy  Torques  and  Slip,  Base

                                      Previous  Running  Torques  Linear  Orientation,  Body  Foot  Collision,  Joint  and  Base  Angle,  Motor  Distance  Stance  and  Torques,  Deviation.

                                 Function  and  Desired  Motor  SpeedGap(Desiredvs.  and  State,  Joint  Slippage,  Self  Gap.  Distance,  Angles,  and  and  Gap,  Change,  and  Velocities.  Goal  Work,  Target  and  Angular

                           research  Reward  Current  sitions,  tion,  ties.  Angular  Height,  Power,  and  Clearance,  Action  Walking  ations  Angle.  Speed  tion.  Steering  Distance.  Position  sumption,  and  Planner:  entation,  Controller:  Velocities  locities,



                           locomotion  Space  Pose.  Leg  Angles  Swing  Fre- Extensions,  Amplitude,  Height.  Joint  Leg Swing Angles and  Angles  Swing  and  Extensions,  Scales.  Com-  Duration.  Motor  PMTG  Posi-  End-Point  Candidate  Controller:  Joint






                           quadrupedal  Action  Desired  Leg  and  quency,  Walking  An-  Desired  State  Positions.  Extensions.  Leg  and  Phase  High-Level:  and  mand,  Low-Level:  Position,  Param.  Leg  and  tions.  Base  Planner:  Phase.  Con-  Desired  Positions.





                           DRL-based   Angu-  Base   (control  Velocity  TG.  the  of  Base  Gravity,  Positions,  Joint  Standing  History.  Velocity,  Locomo-  Height,  Base  Up).  Orienta-  (8-dim),  Angular  Base  Positions.  Position  PMTG  Low-Level:  Angu-  Base  Com-  Latent  Velocities  Orientation.  Elevation,  Torques,  Joint  Variables.  (Current&Next),  Velocities.





                           in    Space  (2-dim), lar Velocities (2-dim), and Motor  (8-dim).  Orientation, Base Angular Veloc-  Desired  Phase  and  Velocity,  and  Linear  (Self-Righting).  Command,  (Standing  Space  Angles  and  (2-dim),  (2-dim).  BaseLinearVelocity, Orientation  Motor  and  Base  Orientation,  and  Velocities,  Angles,  and  Terrain  Velocity,  Phase  Phase  and  Angles
                           publications  State  Orientation  Angles  ities,  input),  Self-Righting:  gular  Velocities  Base  Up:  Space  tion:  State  Motor  tion  Velocities  (3-dim),  High-Level:  Orientation.  State,  lar  mand.  Joint  Torques,  Planner:  and  State  State,  Feet  troller:  Joint





                           relevant  Algorithm  PPO  PPO            +  TRPO  GAE [22]  SAC      CEM  +       ARS          PPO         TRPO,  GAE



                           most                                                                 MPC                                   PPO,


                           the         loco-  algo-  be-  pro-    to  us-                                  au-              in        com-
                           of          agile  DRL  learning  that  knowl-  approach  maneuvers  Entropy  minimal  neural  framework  from  to  complex  learn  to  behaviors  terrain-  which
                           Classification  design  leveraging  for  PMTG  using  prior  and  DRL  recovery  controller.  Max.  requiring  learn  to  tuning  policies.  RL  locomotion only4.5minutesofdatacollected  robot.  framework  decompose  tasks.  effective quadrupedal walking using DRL,  these  training  for  locomotion, binesModel-BasedPlanningand






                           1.    Description  to  system  by  architecture  by  memory  model-free  hierarchical  a  sample-efficient  algorithm  model-based  learning  quadruped  hierarchical  tomatically  locomotion  is  realize  technique
                           Table       A  motion  rithms.  An  haviors  vides  edge.  A  control  ing  A  RL  per-task  network  A  for  a  on  A  kMPs  and  Stoch.  A  aware  RL.




                                 Year     2018       2018            2019            2019       2019         2019         2019         2019
                                 Pub.     RSS        CoRL            ArXiv           RSS        CoRL         IROS         ICRA         ICRA



                                        Locomo-     Gener-        a  for  Rein-    Reinforce-               Learning    Loco-  Kinematic  of  Control  Rein-



                                        Agile  Robots [39]  Trajectory  Controller  Deep  using  Deep          Locomotion [46]  Quadruped  through  (kMPs) [55]  and  Deep  using

                                        Learning  Quadruped  Modulating  Recovery  Robot  Learning [48]  via  Walk  DataEfficientReinforcementLearning  Robots [99]  Reinforcement  Learned  Behaviors  Primitives  Planning  Gaits  Learning [100]
                        APPENDIX  Publication  Sim-to-Real:  For  tion  Policies  ators [49]  Robust  Quadrupedal  forcement  to  Learning  Learning [98]  ment  Legged  for  Hierarchical  Quadruped  for  Realizing  motion  Motion  DeepGait:  Quadrupedal  forcement
   92   93   94   95   96   97   98   99   100   101   102