
RL into robotics and manipulation. Kober et al.[149] conducted a comprehensive review of RL in robotics in 2013. They provide a reasonably comprehensive overview of "real" robotic RL and highlight the most innovative studies, organized by their significant findings.


In the last 15 years or so, the use of RL in robots has risen continuously. An overview of RL-based implementations in robot control is shown in Table 5[150-172], where each of the cited papers is categorized according to the nature of its approach.


A stacked Q-learning technique for a robot interacting with its surroundings was introduced by Digney[150]. Schaal[151] employed RL for robot learning in an inverted pole-balancing problem. For compliance tasks, Kuan and Young[152] developed an RL-based mechanism in conjunction with a robust sliding-mode impedance controller, which they evaluated in simulation; the RL-based method allows the controller to cope with the variation across different compliance tasks. Bucak and Zohdy[153,154] proposed RL-based control strategies for one- and two-link robots in 1999 and 2001. Althoefer et al.[155] used RL in a fuzzy rule-based system to achieve motion and obstacle avoidance for a robot manipulator. Q-learning for robot control was investigated by Gaskett[156]. Smart and Kaelbling[157] also opted for an RL-based approach for a mobile robot navigation challenge. Izawa et al.[158] used an RL actor-critic framework for optimal control of a musculoskeletal-type robot arm with two joints and six muscles, applying the proposed technique to an optimal reaching task.

RL approaches in humanoid robots are characterized by Peters et al.[159] as greedy methods, "vanilla" policy gradient methods, and natural gradient methods. They strongly encourage the adoption of a natural gradient approach for controlling humanoid robots, because natural actor-critic (NAC) structures converge quickly and are better suited to high-dimensional systems such as humanoid robots, and they propose a number of different ways to design RL-based control systems for humanoid robots. An expansion of this study was given in 2009 by Bhatnagar et al.[160]. Theodorou et al.[161] employed RL for optimal control of arm kinematics. NAC applications in robotics were presented by Peters and Schaal[162]; the NAC employs the natural gradient approach for its gradient estimate.
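As a rough illustration of the actor-critic machinery behind these works (not the exact NAC algorithm of Peters et al.[159,162]), the sketch below implements a plain TD actor-critic with a linear softmax policy; a natural actor-critic would additionally precondition the policy gradient with the inverse Fisher information matrix, which is what yields the faster convergence noted above. Problem sizes and step sizes are assumptions made here for illustration.

```python
import numpy as np

# Minimal TD actor-critic sketch (illustrative only). A natural actor-critic
# would replace the "vanilla" gradient step at the bottom with a step along
# F^{-1} g, where F is the Fisher information matrix of the policy.

n_features, n_actions = 4, 2                 # assumed toy problem size
theta = np.zeros((n_features, n_actions))    # actor (policy) parameters
w = np.zeros(n_features)                     # critic (value-function) parameters
alpha_actor, alpha_critic, gamma = 0.01, 0.1, 0.99

def policy(phi):
    """Softmax policy over discrete actions given state features phi."""
    prefs = phi @ theta
    p = np.exp(prefs - prefs.max())
    return p / p.sum()

def update(phi, a, r, phi_next):
    """One actor-critic update from a single transition (phi, a, r, phi_next)."""
    global theta, w
    delta = r + gamma * (phi_next @ w) - (phi @ w)   # TD error
    w += alpha_critic * delta * phi                  # critic: TD(0) update
    p = policy(phi)
    grad_log = -np.outer(phi, p)                     # d log pi(a|phi) / d theta
    grad_log[:, a] += phi
    theta += alpha_actor * delta * grad_log          # vanilla policy-gradient step

# Usage during interaction: a = np.random.choice(n_actions, p=policy(phi)),
# then observe r and phi_next and call update(phi, a, r, phi_next).
```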
Other works[163-165] go into greater depth on actor-critic-based RL in robots. Buchli et al.[166] proposed RL for variable impedance control based on policy improvement with path integrals, although only simulations were used to illustrate the efficiency of the suggested method. Theodorou et al.[167] used a robot dog to evaluate RL based on policy improvement with path integrals[168].
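To make the idea of policy improvement with path integrals more concrete, the sketch below shows a simplified update of that flavor rather than the exact algorithm of the cited works: K noisy rollouts of a parameterized policy are scored by their trajectory costs, and the parameters are updated with the cost-weighted (softmax) average of the exploration noise. The cost function and dimensions are placeholders.

```python
import numpy as np

# Schematic path-integral-style policy improvement (simplified): perturb the
# policy parameters, roll out each perturbation, then average the
# perturbations weighted by exp(-cost / lam). rollout_cost() is a
# hypothetical stand-in for executing the policy and returning its cost.

def pi2_update(theta, rollout_cost, n_rollouts=20, noise_std=0.1, lam=1.0):
    """One policy-improvement step on the parameter vector theta."""
    eps = np.random.randn(n_rollouts, theta.size) * noise_std   # exploration noise
    costs = np.array([rollout_cost(theta + e) for e in eps])    # score each rollout
    # Softmax weighting: low-cost rollouts dominate the update.
    weights = np.exp(-(costs - costs.min()) / lam)
    weights /= weights.sum()
    return theta + weights @ eps                                # cost-weighted noise average

# Example with a toy quadratic cost standing in for a real rollout:
theta = np.zeros(5)
for _ in range(100):
    theta = pi2_update(theta, lambda th: np.sum((th - 1.0) ** 2))
```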
RL-based control for robot manipulators in uncertain circumstances was given by Shah and Gopal[169]. Kim et al.[170,171] applied an RL-based method to determine suitable compliance for various scenarios through interaction with the environment; the usefulness of their RL-based impedance learning technique has been demonstrated in simulations.

For a robot goalkeeper and an inverted pendulum example, Adam et al.[172] presented a very interesting article on the experimental implementation of experience-replay Q-learning and experience-replay SARSA approaches. In this form of RL scheme, the data obtained during the online learning process are saved and fed back to the RL system continuously[172]. The results are encouraging, although the implementation method may not be appropriate for all real systems, as the exploration phase exhibits very irregular, nearly unstable behavior, which might harm a more delicate plant.
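The sketch below illustrates the general idea of experience-replay Q-learning described above, not Adam et al.'s[172] specific implementation: transitions gathered online are stored in a buffer and repeatedly replayed to update a tabular Q-function. The environment interface, buffer size, and hyperparameters are assumptions made for illustration.

```python
import random
from collections import deque

# Illustrative experience-replay Q-learning: online transitions are stored
# and replayed to the learner repeatedly. Sizes and step sizes are assumed.

buffer = deque(maxlen=10_000)     # stored (s, a, r, s_next) transitions
Q = {}                            # tabular Q-values, keyed by (state, action)
alpha, gamma, epsilon, n_replay = 0.1, 0.95, 0.1, 32

def q(s, a):
    return Q.get((s, a), 0.0)

def act(s, actions):
    """Epsilon-greedy action selection during online interaction."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: q(s, a))

def replay(actions):
    """Replay a random minibatch of stored transitions with Q-learning updates."""
    batch = random.sample(buffer, min(n_replay, len(buffer)))
    for s, a, r, s_next in batch:
        target = r + gamma * max(q(s_next, b) for b in actions)
        Q[(s, a)] = q(s, a) + alpha * (target - q(s, a))

# Online loop (schematic): choose a = act(s, actions), apply it to the plant,
# observe (r, s_next), append (s, a, r, s_next) to buffer, then call replay(actions).
```

An experience-replay SARSA variant would additionally store the next action taken and bootstrap from q(s_next, a_next) instead of the maximum over actions.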


It is worth noting that several of the RL schemes outlined above are conceptually well developed, with convergence proofs available. However, there is still much work to be done on RL, and real-time implementation of most of these schemes remains a great difficulty. Furthermore, adequate benchmark challenges[173] are required to test newly created or improved RL algorithms.