
Figure 9. Universal model of RL. RL: Reinforcement learning.

the other hand, is well known for online system identification and control. Adaptive control, however, is not necessarily optimal and may not be appropriate for applications such as humanoid and service robots, where optimality is essential. Furthermore, robots that will be employed in human settings must be able to learn over time and converge to the best biomechanical and robotic solutions possible while coping with changing dynamics. Optimality in robotics might be defined as using the least amount of energy, or applying the least amount of force to the environment during physical contact. Safety aspects, such as joint or actuator limits, can also be included in the cost function.
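
As a minimal sketch of what such a cost function could look like (the variable names, weights, and quadratic penalties below are illustrative assumptions, not taken from the paper), a per-step cost can combine an energy term, a contact-force term, and a soft joint-limit penalty:

    import numpy as np

    def step_cost(torques, joint_pos, joint_limits, contact_force,
                  w_energy=1e-3, w_force=1e-2, w_limit=10.0):
        """Illustrative per-step cost: energy use, contact force, and joint-limit safety."""
        energy = np.sum(np.square(torques))       # proxy for energy expenditure
        force = np.square(contact_force)          # force applied to the environment on contact
        lower, upper = joint_limits               # per-joint position limits
        violation = (np.maximum(joint_pos - upper, 0.0)
                     + np.maximum(lower - joint_pos, 0.0))
        limit_pen = np.sum(np.square(violation))  # soft penalty for leaving the safe range
        return w_energy * energy + w_force * force + w_limit * limit_pen

An RL or optimal-control formulation would then minimize the accumulated cost over a trajectory; the relative weights encode how optimality trades off against safety.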


4.1. Reinforcement learning for robotic control
The reinforcement learning (RL) domain of robotics differs significantly from the majority of well-studied RL benchmark problems. In robotics, it is typically impractical to assume that the true state is fully observable and noise-free. The learning system has no way of knowing exactly which state it is in, and even very dissimilar states may produce similar observations. As a result, RL in robotics is frequently modeled as a partially observed system, and the learning system must approximate the true state using filters. Experience with an actual physical system is time-consuming, costly, and difficult to reproduce. Because each trial run is expensive, such applications force us to concentrate on issues that do not surface as frequently in traditional RL benchmark problems. Appropriate approximations of the state, policy, value function, and/or system dynamics must be introduced in order to learn within a tolerable time. While real-world experience is costly, it typically cannot be replaced entirely by learning in simulation: even small modeling flaws in analytical or learned models of the system can lead to significantly divergent behavior, at least for highly dynamic tasks. As a result, algorithms must be robust to under-modeling and uncertainty.
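
As a sketch of the state-filtering step mentioned above (a generic linear Kalman filter, assumed here purely for illustration; the paper does not specify a particular estimator), the policy would act on a filtered state estimate rather than on the raw, noisy observation:

    import numpy as np

    class KalmanStateEstimator:
        """Linear Kalman filter that approximates the true state from noisy observations."""
        def __init__(self, A, C, Q, R, x0, P0):
            self.A, self.C = A, C      # state-transition and observation models
            self.Q, self.R = Q, R      # process and measurement noise covariances
            self.x, self.P = x0, P0    # current state estimate and its covariance

        def update(self, y):
            # Predict the next state from the dynamics model.
            x_pred = self.A @ self.x
            P_pred = self.A @ self.P @ self.A.T + self.Q
            # Correct the prediction with the noisy observation y.
            S = self.C @ P_pred @ self.C.T + self.R
            K = P_pred @ self.C.T @ np.linalg.inv(S)
            self.x = x_pred + K @ (y - self.C @ x_pred)
            self.P = (np.eye(self.x.shape[0]) - K @ self.C) @ P_pred
            return self.x              # the policy acts on this estimate

In a partially observed robotic task, this estimate (or a learned recurrent state) stands in for the unobservable true state when the policy and value function are queried.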

Another issue that arises frequently in robotic RL is generating appropriate reward functions. To cope with the expense of real-world experience, rewards that steer the learning system quickly toward success are required. This problem is known as reward shaping, and it requires a significant amount of manual effort[148]. In robotics, defining good reward functions demands a substantial degree of domain expertise and can be difficult in practice.
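
One widely used shaping scheme is potential-based shaping, which adds gamma*phi(s') - phi(s) to the environment reward and preserves the optimal policy; the distance-to-goal potential below is an illustrative assumption, not the specific scheme of reference [148]:

    import numpy as np

    def shaped_reward(reward, state, next_state, goal, gamma=0.99):
        """Potential-based reward shaping: add gamma*phi(s') - phi(s) to the raw reward."""
        def phi(s):
            # Illustrative potential: negative distance to a goal configuration.
            return -np.linalg.norm(np.asarray(s) - np.asarray(goal))
        return reward + gamma * phi(next_state) - phi(state)

The denser shaped signal guides exploration toward the goal, which is precisely what reduces the number of costly real-world trials.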


Not all RL methods are equally appropriate for robotics. Indeed, many of the methods used to solve complex problems thus far have been model-based, and robot learning systems frequently use policy search