Figure 4. The key components of DRL-based controller design, derived from the classification of the most relevant publications. Tables 1 and 2 in the Appendix provide a complete summary.


               conferences in the machine learning field.

               3.5. State, action, reward, and others
State, action, and reward are the integral components of controller training, and their design directly affects the performance of the resulting controller. However, there is no fully unified standard or method for designing them.


For the design of the state space, on the one hand, considering too few observations can leave the controller only partially observed; on the other hand, providing all available sensor readings results in a brittle controller that is overfitted to the simulation environment. Both degrade the controller's performance on the real machine, so researchers must make trade-offs based on the practical problem at hand. In current research, for simple tasks (walking or turning on flat ground, etc.), proprioception alone (base orientation, angular velocity, joint positions and velocities, etc.) is sufficient [10,39,41]. For more complex tasks (walking on uneven ground, climbing stairs or slopes, avoiding obstacles, etc.), exteroception, such as visual information, needs to be introduced [8,13,42]. Adding such sensors alleviates the partial observability issue to some extent.


Most researchers use desired joint positions (or residuals) as the action space and compute the torques that drive the robot's locomotion through a PD controller. Early studies [43] demonstrated experimentally that controllers with such an action space achieve better performance. However, recent studies have also attempted to use lower-level control commands, i.e., commanding torques directly without a PD controller, in order to obtain highly dynamic motion behavior [44]. Although current DRL-based controllers have achieved outstanding performance [6–8], their stability is still not as good as that of conventional control methods such as MPC [45]. The force–position hybrid control adopted by MPC is worthy of reference and further research. Furthermore, in some studies based on hierarchical DRL, latent commands serve as the action space of the high-level policy to guide the behavior of low-level policies [46,47].
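As a concrete illustration of this most common choice, the sketch below shows how a policy action, interpreted as desired joint-position residuals around a nominal pose, is converted to joint torques by a PD controller; the gains, torque limit, and interfaces are assumed values for illustration, not parameters from the surveyed papers.

```python
import numpy as np

KP, KD = 40.0, 1.0      # assumed PD gains; tuned per robot and per joint in practice
TORQUE_LIMIT = 30.0     # assumed actuator limit in N*m

def action_to_torque(action, q, dq, q_nominal):
    """Convert a policy action (joint-position residuals) into joint torques.

    action    : residuals output by the policy, shape (num_joints,)
    q, dq     : measured joint positions and velocities
    q_nominal : nominal standing pose around which residuals are applied
    """
    q_desired = q_nominal + action            # target joint positions
    tau = KP * (q_desired - q) - KD * dq      # PD law with zero desired velocity
    return np.clip(tau, -TORQUE_LIMIT, TORQUE_LIMIT)
```

In practice the PD loop typically runs at a much higher frequency than the policy itself, which is part of why this action space tends to be more forgiving than direct torque control.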


In general, the design of the reward function is fairly laborious, especially for complex systems such as robots. Small changes in the reward-function hyperparameters can have a large impact on the final performance of the controller. In order for the robot to complete more complex tasks, the reward function must be designed in sufficient detail [6–8,48]. Some specific factors include the desired direction, base orientation, angular velocity, base linear velocity, joint position and velocity, foot contact states, policy output, and motor