Figure 9. Universal model of RL. RL: Reinforcement learning.
the other hand, is well known for online system identification and control. Adaptive control, however, is not necessarily optimal and may not be appropriate for applications such as humanoid or service robots, where optimality is essential. Furthermore, robots that operate in human environments must be able to learn over time and arrive at the best possible biomechanical and robotic solutions while coping with changing dynamics. Optimality in robotics might be defined as using the least energy, or applying the least force to the environment during physical contact. Safety aspects, such as joint or actuator limits, can also be included in the cost function.
4.1. Reinforcement learning for robotic control
The reinforcement learning (RL) problems encountered in robotics differ significantly from most well-studied RL benchmark problems. In robotics, assuming that the true state is fully observable and noise-free is typically impractical. The learning system has no way of knowing exactly which state it is in, and even very dissimilar states may produce very similar observations. As a result, RL in robotics is frequently modeled as a partially observable problem, and the learning system must approximate the true state using filters.
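A common way to realize such filtering is to maintain a state estimate and let the policy act on the estimate rather than on the unobservable true state. The sketch below is a minimal, generic Kalman-filter update for a linear-Gaussian model; the matrices A, B, C, Q, and R are assumptions standing in for whatever dynamics and sensor model a given robot uses, not part of the original text.

```python
import numpy as np

def kalman_update(x_est, P, u, z, A, B, C, Q, R):
    """One predict/correct cycle of a linear Kalman filter.
    The policy then acts on x_est instead of the true (hidden) state."""
    # Predict: propagate the estimate through the assumed dynamics.
    x_pred = A @ x_est + B @ u
    P_pred = A @ P @ A.T + Q
    # Correct: fuse the noisy observation z.
    S = C @ P_pred @ C.T + R
    K = P_pred @ C.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (z - C @ x_pred)
    P_new = (np.eye(P.shape[0]) - K @ C) @ P_pred
    return x_new, P_new
```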
Experience with an actual physical system is time-consuming, costly, and difficult to reproduce. Because each trial run is expensive, such applications force us to concentrate on issues that do not arise as frequently in classical RL benchmarks. Appropriate approximations of the state, policy, value function, and/or system dynamics must be introduced in order to learn within a tolerable time. While real-world experience is costly, it typically cannot be replaced by simulation alone. Even small modeling errors in analytical or learned models of the system can result in significantly divergent behavior, at least for highly dynamic tasks. As a result, algorithms must be robust to under-modeling and uncertainty.
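One standard way to keep learning tractable with limited real-robot data is to approximate the value function rather than store it exactly. The sketch below shows a single TD(0) update with a linear function approximator over hand-chosen features; the feature map, step size, and discount factor are illustrative assumptions rather than a prescription from the article.

```python
import numpy as np

def td0_update(w, features, reward, next_features, alpha=0.01, gamma=0.99):
    """One TD(0) update of a linear value-function approximator
    V(s) ~= w . phi(s), so values never have to be stored per state
    (important when real-robot experience is scarce)."""
    td_error = reward + gamma * (w @ next_features) - (w @ features)
    return w + alpha * td_error * features
```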
Another issue that arises frequently in robotic RL is the design of appropriate reward functions. To cope with the expense of real-world experience, rewards that quickly steer the learning system toward success are required. This problem is known as reward shaping, and it requires a significant amount of manual effort[148]. In robotics, defining good reward functions demands a substantial degree of domain expertise and can be difficult in practice.
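One widely used formulation of reward shaping is potential-based shaping, which adds F(s, s') = γΦ(s') − Φ(s) to the environment reward and leaves the optimal policy unchanged. The sketch below illustrates this with a hypothetical distance-to-goal potential; the goal variable and discount factor are assumptions introduced only for the example.

```python
import numpy as np

def shaped_reward(env_reward, state, next_state, goal, gamma=0.99):
    """Potential-based reward shaping: add gamma*Phi(s') - Phi(s) to the
    environment reward. This steers learning toward the goal faster
    while preserving the optimal policy."""
    # Hypothetical potential: negative distance to a goal configuration.
    phi = lambda s: -np.linalg.norm(np.asarray(s) - np.asarray(goal))
    return env_reward + gamma * phi(next_state) - phi(state)
```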
Not all RL methods are equally suitable for robotics. Indeed, many of the methods used to solve complex problems thus far have been model-based, and robot learning systems frequently use policy search