
Zhang et al. Intell Robot 2022;2(3):275-97  |  http://dx.doi.org/10.20517/ir.2022.20  Page 283

               problem to a certain extent, there have been recent DRL studies based on motion priors [86–90], which have
               been successfully applied to quadrupedal locomotion tasks [12,56,91]. However, the variety of motion priors in
               these studies is limited, and the resulting robot behavior is neither agile nor natural, which makes it difficult
               for robots to cope with complex and unstructured natural environments. Improving the diversity of motion priors
               is therefore another promising direction in quadrupedal locomotion research. In addition, there is currently a
               lack of general real-world datasets and benchmarks for legged motion skills, which would be of significant value
               for DRL-based quadrupedal locomotion research. If a large amount of real-world data were available, we could
               study and verify offline RL [92] algorithms for quadrupedal locomotion. The main feature of offline RL algorithms
               is that the robot does not need to interact with the environment during the training phase, so training directly
               on real-world data can bypass the notorious reality gap problem.
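
The offline setting described above can be illustrated with a minimal sketch. It is not an algorithm from [92]: it assumes a hypothetical 4-state chain task, a randomly generated log standing in for recorded real-world data, and plain tabular Q-learning sweeps over that fixed log. All function and variable names are illustrative. The point is only that the training loop reads stored transitions and never queries the environment.

```python
import random

def collect_logged_data(num_steps=2000, seed=0):
    """Stand-in for a pre-recorded dataset: random behavior on a toy 4-state chain."""
    rng = random.Random(seed)
    data, s = [], 0
    for _ in range(num_steps):
        a = rng.randrange(2)                      # 0 = step left, 1 = step right
        s_next = max(0, s - 1) if a == 0 else s + 1
        done = s_next == 3                        # state 3 is the terminal goal
        r = 1.0 if done else 0.0
        data.append((s, a, r, s_next, done))
        s = 0 if done else s_next                 # restart the episode at state 0
    return data

def offline_q_learning(data, n_states=4, n_actions=2, gamma=0.9, lr=0.1, epochs=50):
    """Q-learning sweeps over a fixed dataset; the environment is never queried."""
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(epochs):
        for s, a, r, s_next, done in data:
            target = r if done else r + gamma * max(q[s_next])
            q[s][a] += lr * (target - q[s][a])
    return q

dataset = collect_logged_data()                   # assumed to exist before training
q_table = offline_q_learning(dataset)
# Greedy action in each non-terminal state, read off the learned table.
greedy_policy = [max(range(2), key=lambda a: q_table[s][a]) for s in range(3)]
```

This toy log covers every state-action pair, which is why naive Q-learning suffices here. Practical offline RL methods typically add some form of conservatism or policy constraint, because real datasets rarely cover the actions a learned policy would like to try.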


               4.2.3. Large-scale pre-training of DRL models
               The pre-training and fine-tuning paradigm for new tasks has emerged as a simple yet effective solution in
               supervised and self-supervised learning. Pre-trained DRL-based models enable robots to rapidly and efficiently
               acquire new skills and respond to non-stationary complex environments. Meta-learning methods are a popular
               approach for improving the generalization (adaptation) performance of robots in new environments.
               However, current meta-reinforcement learning algorithms are limited to simple environments with narrow
               task distributions [93–96]. A recent study showed that multi-task pre-training with fine-tuning on new tasks
               performs as well as or better than meta-pre-training with meta test-time adaptation [97]. Research on
               large-scale pre-trained models for quadrupedal locomotion is still in its infancy and needs further
               exploration. Furthermore, this direction is inseparable from the motor skills dataset mentioned above, but it
               focuses more on large-scale pre-training of DRL-based models and online fine-tuning for downstream tasks.
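
The pre-train-then-fine-tune idea can be made concrete with a deliberately small sketch. It is a supervised one-parameter regression toy, not a DRL model and not the procedure studied in [97]. The task slopes, step budget, and function names are all assumptions chosen for illustration. It shows only that a parameter pre-trained across several related tasks needs fewer gradient steps on a new, similar task than a parameter trained from scratch.

```python
def sgd_steps(w, slope, steps, lr=0.05, xs=(0.5, 1.0, 1.5)):
    """Gradient descent on mean squared error for the 1-D model y = w * x."""
    for _ in range(steps):
        grad = sum(2.0 * (w * x - slope * x) * x for x in xs) / len(xs)
        w -= lr * grad
    return w

def multitask_pretrain(task_slopes, rounds=200):
    """Share one parameter across tasks, taking one gradient step per task in turn."""
    w = 0.0
    for _ in range(rounds):
        for slope in task_slopes:
            w = sgd_steps(w, slope, steps=1)
    return w

pretrain_tasks = [1.8, 2.0, 2.2]   # hypothetical pre-training tasks (target slopes)
new_task = 2.1                     # hypothetical downstream task
budget = 5                         # small fine-tuning budget (gradient steps)

w_pre = multitask_pretrain(pretrain_tasks)
w_finetuned = sgd_steps(w_pre, new_task, steps=budget)   # warm start
w_scratch = sgd_steps(0.0, new_task, steps=budget)       # cold start

err_finetuned = abs(w_finetuned - new_task)
err_scratch = abs(w_scratch - new_task)
```

With the same step budget, the warm-started parameter ends closer to the new task's slope than the cold-started one. The benefit depends on the new task lying near the pre-training task distribution, which is the same caveat the text raises about narrow task distributions.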



               5. CONCLUSIONS
               In the past few years, there have been some breakthroughs in quadrupedal locomotion research. However, due
               to the limitations of algorithms and hardware, the behavior of robots is still not agile or intelligent. This review
               provides a comprehensive survey of several DRL algorithms in this field. We first introduce basic concepts and
               formulations, and then condense open problems in the literature. Subsequently, we organize previous work and
               summarize the algorithm design and core components in detail, including DRL algorithms, simulators,
               hardware platforms, observation and action space design, reward function design, prior knowledge, and
               solutions to the reality gap problem. While this review considers as many factors as possible in systematically
               collating the relevant literature, there are still many subtle factors that may affect the performance of
               DRL-based control policies in real-world robotics tasks. Finally, we point out future research directions
               around open questions to drive important research forward.



               DECLARATIONS

               Authors’ contributions
               Made substantial contributions to conception and design of the study and performed data analysis and
               interpretation: Zhang H, Wang D
               Performed data acquisition, as well as provided administrative, technical, and material support: He L


               Availability of data and materials
               Please refer to Table 1 and Table 2 in the appendix.