
Table 5. RL-based control in robotic control - an overview

Approach                               Employed by
Q-learning                             Digney [150] (1996), Gaskett [156] (2002), Shah and Gopal [169] (2009) and Adam et al. [172] (2012)
Optimal control/bio-mimetic learning   Izawa et al. [158] (2002) and Theodorou et al. [161] (2007)
NAC                                    Atkeson and Schaal [163] (1997), Peters et al. [159] (2003), Peters and Schaal [162] (2008), Hoffmann et al. [164] (2008) and Peters and Schaal [165] (2008)
Inverted pole-balancing                Schaal [151] (1996) and Adam et al. [172] (2012)
Impedance control                      Kuan and Young [152] (1998) and Buchli et al. [166] (2010)
Fuzzy rule-based system                Althoefer et al. [155] (2001)
Navigation challenge                   Smart and Kaelbling [157] (2002)
Route integral control                 Buchli et al. [166] (2010)
Path integral                          Theodorou et al. [167] (2010)

RL: Reinforcement learning; NAC: natural-actor-critic.

               4.2. Deep reinforcement learning for robotic manipulation control
In 2012, deep learning (DL) achieved its first major breakthrough with a CNN for classification [174]. Such a network iteratively trains its parameters through loss computation and backpropagation (BP) over hundreds of thousands of data-label pairs. Although this approach has developed steadily since its inception and is currently one of the most widely used DL structures, it is not ideal for robotic manipulation control, because collecting the large number of images labeled with the corresponding joint angles needed to train the model is too time-consuming. CNNs have nonetheless been used in several studies to learn the motor torques required to drive a robot from raw RGB video images [175]. However, as we will see later, employing deep reinforcement learning (DRL) is a more promising and appealing approach.

In the context of robotic manipulation control, the purpose of DRL is to train a deep policy NN, such as the one shown in Figure 10, to discover the best command sequence for completing the task. The input, as shown in Figure 11, is the current state, which can comprise the angles of the manipulator's joints, the location of the end effector, and their derivatives, such as velocities and accelerations. Furthermore, the current pose of target objects, as well as the readings of any relevant sensors present in the surroundings, can also be included in the current state. The policy network's output is an action that specifies which control commands, such as torques or velocity commands, should be applied to each actuator. A positive reward is produced when the robotic manipulator completes the task. From these delayed and weak signals, the algorithm is expected to discover a successful control policy for robotic manipulation.
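To make the input-output interface described above concrete, the following is a minimal sketch of such a policy network in PyTorch. It is not taken from the paper: the state and action dimensions, the layer sizes, and the choice of torque commands squashed to [-1, 1] are illustrative assumptions.

import torch
import torch.nn as nn

class ManipulationPolicy(nn.Module):
    """Maps the current state to one control command per actuator (a sketch)."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh(),  # squash commands to [-1, 1]
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

# Hypothetical state: 7 joint angles + 7 joint velocities + 3-D end-effector
# position + 3-D target-object position = 20 inputs; 7 torque commands out.
policy = ManipulationPolicy(state_dim=20, action_dim=7)
action = policy(torch.zeros(1, 20))  # placeholder state; a real input would come from the robot's sensors

During training, a DRL algorithm adjusts the network weights so that the sparse, delayed rewards obtained at task completion are maximized.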


For supervised deep learning, the study of sample efficiency determines the scale of the training set required for learning. Analogously, although the problem is more challenging than in supervised deep learning, the study of sample efficiency for DRL in robotic control determines how much interaction data is needed to build an optimal policy. The first demonstration of DRL on a real robot came in 2015, when Levine et al. [176] combined trajectory optimization techniques and policy search methods with NNs to accomplish practical, sample-efficient learning. They employed a recently developed policy search approach to learn a variety of dynamic manipulation behaviors with very general policy representations, without requiring known models or example demonstrations. This method uses repeatedly refitted time-varying linear models to train a collection of trajectories for the desired motion skill and then unifies these trajectories into a single control policy that can generalize to new scenarios. Some modifications were needed to lower the sample count and automate parameter selection so that the technique could run on a real robot. Finally, this approach showed that learning robust controllers for complex behaviors is possible, achieving compound tasks such as stacking tight-fitting Lego blocks and putting together a toy airplane.
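As a rough illustration of the first ingredient of this method, the sketch below refits a time-varying linear dynamics model x_{t+1} ≈ A_t x_t + B_t u_t + c_t to a batch of sampled trajectories by least squares. The function name, array layout, and the plain least-squares fit are assumptions made for illustration; Levine et al.'s actual procedure additionally improves the local trajectory controllers and distills them into the final neural-network policy.

import numpy as np

def fit_time_varying_linear_dynamics(states, actions):
    """states: (N, T+1, dx) sampled states; actions: (N, T, du) sampled controls.
    Returns per-timestep (A_t, B_t, c_t) fitted by least squares (a sketch)."""
    N, T, du = actions.shape
    dx = states.shape[2]
    A, B, c = [], [], []
    for t in range(T):
        # Regress the next state on [state, action, 1] across the N rollouts.
        X = np.hstack([states[:, t], actions[:, t], np.ones((N, 1))])  # (N, dx + du + 1)
        Y = states[:, t + 1]                                           # (N, dx)
        W, _, _, _ = np.linalg.lstsq(X, Y, rcond=None)                 # (dx + du + 1, dx)
        A.append(W[:dx].T)          # (dx, dx)
        B.append(W[dx:dx + du].T)   # (dx, du)
        c.append(W[-1])             # (dx,)
    return A, B, c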