RL into robotics and manipulation. Kober et al.[149] conducted a comprehensive review of RL in robotics in 2013, providing a broad overview of "real" robotic RL and highlighting the most innovative studies, organized by their significant findings.
In the last 15 years or so, the use of RL in robots has risen continuously. An overview of RL-based implementations in robot control is shown in Table 5[150-172], where each of the cited papers has been categorized based on the nature of its approach.
A stacked Q-learning technique for a robot interacting with its surroundings was introduced by Digney[150]. In an inverted pole-balancing problem, Schaal[151] employed RL for robot learning. For compliance tasks, Kuan and Young[152] developed an RL-based mechanism combined with a robust sliding-mode impedance controller, which they evaluated in simulation; they applied the RL-based method to cope with the variation across different compliance tasks. Bucak and Zohdy[153,154] proposed RL-based control strategies for one- and two-link robots in 1999 and 2001. Althoefer et al.[155] used RL to achieve motion and obstacle avoidance in a fuzzy rule-based system for a robot manipulator. Q-learning for robot control was investigated by Gaskett[156]. For a mobile robot navigation challenge, Smart and Kaelbling[157] also opted for an RL-based approach. For optimal control of a musculoskeletal-type robot arm with two joints and six muscles, Izawa et al.[158] used an RL actor-critic framework.
They employed the proposed technique for an optimal reaching task. RL approaches in humanoid robots are characterized by Peters et al.[159] as greedy methods, "vanilla" policy gradient methods, and natural gradient methods. They strongly encourage the adoption of a natural gradient approach for controlling humanoid robots, because natural actor-critic (NAC) structures converge quickly and are better suited to high-dimensional systems such as humanoid robots. They have proposed a number of different ways to design RL-based control systems for humanoid robots. An expansion of this study was given in 2009 by Bhatnagar et al.[160]. Theodorou et al.[161] employed RL for optimal control of arm kinematics. NAC applications in robotics were presented by Peters and Schaal[162]; the NAC employs the natural gradient approach for its policy-gradient estimate.
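For context, the natural-gradient update at the heart of NAC can be sketched in standard textbook notation (a generic form, not reproduced from [162]):

$$\theta_{k+1} = \theta_k + \alpha\, F(\theta_k)^{-1}\, \nabla_\theta J(\theta_k),$$

where $\theta$ are the policy parameters, $J(\theta)$ is the expected return, $F(\theta)$ is the Fisher information matrix of the policy, and $\alpha$ is the step size. Premultiplying the ordinary ("vanilla") policy gradient by $F^{-1}$ makes the update largely insensitive to the policy parameterization, which is one reason natural gradient methods tend to converge faster for high-dimensional policies.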
Other works presented here[163-165] go into greater depth on actor-critic-based RL in robots. Buchli et al.[166] propose RL for variable impedance control based on policy improvement with a path integral approach; only simulations were used to illustrate the efficiency of the suggested method. Theodorou et al.[167] used a robot dog to evaluate RL based on policy improvement with path integrals. RL-based control for robot manipulators in uncertain circumstances was presented by Shah and Gopal[168,169]. Kim et al.[170,171] applied an RL-based method to determine acceptable compliance for various scenarios through interaction with the environment; the usefulness of their RL-based impedance learning technique has been demonstrated in simulations.
For robot goalkeeper and inverted pendulum examples, Adam et al.[172] presented a very interesting article on the experimental implementation of experience-replay Q-learning and experience-replay SARSA approaches. In this form of RL scheme, the data obtained during the online learning process are saved and continuously fed back to the RL system[172]. The results are encouraging, albeit the implementation method may not be appropriate for all real systems, as the exploration phase exhibits very irregular, nearly unstable behavior, which might harm a more delicate plant.
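To make the replay mechanism concrete, the following minimal sketch shows tabular Q-learning with experience replay: each real transition is stored in a buffer and repeatedly replayed for additional updates. The discretization, buffer size, and learning parameters here are illustrative assumptions, not the configuration used by Adam et al.[172].

```python
import random
from collections import defaultdict, deque

# Illustrative parameters (assumed values, not taken from [172]).
ALPHA, GAMMA = 0.1, 0.95        # learning rate and discount factor
REPLAYS_PER_STEP = 10           # stored transitions replayed after each real step

Q = defaultdict(float)          # Q[(state, action)] -> estimated return
buffer = deque(maxlen=10_000)   # experience buffer of (s, a, r, s_next) tuples
ACTIONS = [0, 1, 2]             # placeholder discrete action set

def q_update(s, a, r, s_next):
    """One standard Q-learning backup on a single transition."""
    best_next = max(Q[(s_next, a2)] for a2 in ACTIONS)
    Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])

def learn_from_step(s, a, r, s_next):
    """Called once per real interaction with the plant."""
    buffer.append((s, a, r, s_next))   # save the new experience
    q_update(s, a, r, s_next)          # learn from it immediately
    # Experience replay: reuse previously collected transitions to extract
    # more learning from the same (expensive) robot interactions.
    for _ in range(min(REPLAYS_PER_STEP, len(buffer))):
        q_update(*random.choice(buffer))
```

The same buffer can also drive the SARSA variant, provided each stored tuple additionally records the next action actually taken.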
It is worth noting that several of the RL systems outlined above are conceptually well developed, with convergence proofs available. However, there is still much work to be done on RL, and real-time implementation of most of these systems remains a major difficulty. Furthermore, adequate benchmark challenges are required to test newly created or improved RL algorithms[173].