Zhang et al. Intell Robot 2022;2(3):275–97 | http://dx.doi.org/10.20517/ir.2022.20 Page 283

problem to a certain extent, there have been recent DRL studies based on motion priors [86–90], which have
been successfully applied to quadrupedal locomotion tasks [12,56,91]. However, the variety of motion priors in
these studies is insufficient, and the resulting robot behavior is neither agile nor natural. This makes it difficult for robots
to cope with complex and unstructured natural environments. Improving the diversity of motion priors is
also an interesting direction in quadrupedal locomotion research. On the other hand, there is currently a lack
of general real-world legged motion skills datasets and benchmarks, which would have significant value for
DRL-based quadrupedal locomotion research. If sufficient real-world data were available, we could study and
verify offline RL [92] algorithms for quadrupedal locomotion. The main feature of offline RL algorithms is that
the robot does not need to interact with the environment during the training phase, so training on logged
real-world data lets us bypass the notorious reality gap problem.
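
To make the offline setting concrete, the following is a minimal sketch of tabular Q-learning run on a fixed, pre-collected dataset of transitions. It is not taken from any of the cited works: the five-state chain environment, the random behavior policy, and all names (step, collect_dataset, offline_q_learning) are hypothetical and chosen only for illustration. Once the dataset has been logged, learning never queries the environment again. Practical offline RL algorithms typically add some form of conservatism to handle state-action pairs that the dataset does not cover, which this toy example omits.

```python
import random

# Hypothetical toy problem: a 1-D chain with states 0..4.
# Action 0 moves left, action 1 moves right; reaching state 4 yields reward 1.
N_STATES, N_ACTIONS, GAMMA = 5, 2, 0.9


def step(state, action):
    """Toy chain dynamics, used only while logging the dataset."""
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def collect_dataset(n_episodes=200, max_steps=20, seed=0):
    """Log transitions (s, a, r, s', done) with a uniformly random behavior policy."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_episodes):
        state = 0
        for _ in range(max_steps):
            action = rng.randrange(N_ACTIONS)
            next_state, reward, done = step(state, action)
            data.append((state, action, reward, next_state, done))
            if done:
                break
            state = next_state
    return data


def offline_q_learning(dataset, n_sweeps=100, lr=0.5):
    """Q-learning updates computed from the fixed dataset only (no environment calls)."""
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(n_sweeps):
        for state, action, reward, next_state, done in dataset:
            target = reward if done else reward + GAMMA * max(q[next_state])
            q[state][action] += lr * (target - q[state][action])
    return q


dataset = collect_dataset()            # data collection happens once, up front
q_table = offline_q_learning(dataset)  # training uses the logged data only
greedy_policy = [
    max(range(N_ACTIONS), key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)
]
print(greedy_policy)
```

In this sketch the greedy policy is read off the learned table for the non-terminal states; in a real quadrupedal setting the table would be replaced by a function approximator and the dataset by logged robot trajectories.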

4.2.3. Large-scale pre-training of DRL models
The pre-training and fine-tuning paradigms for new tasks have emerged as simple yet effective solutions in
supervised and self-supervised learning. Pre-trained DRL-based models enable robots to rapidly and efficiently
acquire new skills and respond to non-stationary complex environments. Meta-learning methods seem to be
a popular solution for improving the generalization (adaptation) performance of robots to new environments.
However, current meta-reinforcement learning algorithms are limited to simple environments with narrow
task distributions [93–96]. A recent study showed that multi-task pre-training with fine-tuning on new tasks
performs as well as or better than meta-pre-training with meta test-time adaptation [97]. Work on
large-scale pre-trained models for quadrupedal locomotion is still in its infancy and needs further
exploration. Furthermore, this direction is inseparable from the motor skills dataset mentioned above, but it
focuses more on large-scale pre-training of DRL-based models and online fine-tuning for downstream tasks.
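
The pre-train-then-fine-tune recipe can be illustrated with a deliberately small supervised sketch. This is an assumption-laden toy, not a DRL model and not the method of any cited study: each "task" is a one-parameter linear regression with a hypothetical slope, a shared weight is pre-trained on data pooled from several tasks, and the same weight is then fine-tuned for a few gradient steps on an unseen task. All names (make_task, gradient_descent, mse) are illustrative.

```python
# Inputs shared by every hypothetical task.
XS = [-1.0, -0.5, 0.5, 1.0]


def make_task(slope):
    """A task is a small dataset of (x, y) pairs with y = slope * x."""
    return [(x, slope * x) for x in XS]


def mse(w, data):
    """Mean squared error of the one-parameter model y_hat = w * x."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def gradient_descent(w, data, steps, lr=0.1):
    """Full-batch gradient descent on the mean squared error."""
    for _ in range(steps):
        grad = sum(2.0 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


# Multi-task pre-training: fit one shared weight on data pooled over tasks.
pretrain_tasks = [make_task(slope) for slope in (1.8, 2.0, 2.2)]
pooled = [pair for task in pretrain_tasks for pair in task]
w_pretrained = gradient_descent(0.0, pooled, steps=200)

# Adaptation to an unseen task with a small budget of three gradient steps.
new_task = make_task(2.1)
w_finetuned = gradient_descent(w_pretrained, new_task, steps=3)
w_scratch = gradient_descent(0.0, new_task, steps=3)

loss_finetuned = mse(w_finetuned, new_task)
loss_scratch = mse(w_scratch, new_task)
print(loss_finetuned < loss_scratch)
```

With the same three-step budget, the fine-tuned weight starts close to the new task's slope and ends with a lower error than the weight trained from scratch. The benefit relies on the new task being similar to the pre-training tasks, which is the assumption behind pre-training in general.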

5. CONCLUSIONS
In the past few years, there have been some breakthroughs in quadrupedal locomotion research. However, due
to the limitations of algorithms and hardware, the behavior of robots is still not agile and intelligent. This review
provides a comprehensive survey of several DRL algorithms in this field. We first introduce basic concepts and
formulations, and then condense open problems in the literature. Subsequently, we sort out previous works and
summarize the algorithm design and core components in detail, which includes DRL algorithms, simulators,
hardware platforms, observation and action space design, reward function design, prior knowledge, solution
of reality gap problems, etc. While this review considers as many factors as possible in systematically collating
the relevant literature, there are still many imperceptible factors that may affect the performance of DRL-based
control policies in real-world robotics tasks. Finally, we point out future research directions around open
questions to drive important research forward.

DECLARATIONS

Authors’ contributions
Made substantial contributions to conception and design of the study and performed data analysis and
interpretation: Zhang H, Wang D
Performed data acquisition, as well as provided administrative, technical, and material support: He L

Availability of data and materials
Please refer to Table 1 and Table 2 in the appendix.
