
Table 5. RL-based control in robotic control - an overview

Approach                               Employed by
Q-learning                             Digney [150] (1996), Gaskett [156] (2002), Shah and Gopal [169] (2009) and Adam et al. [172] (2012)
Optimal control/bio-mimetic learning   Izawa et al. [158] (2002) and Theodorou et al. [161] (2007)
NAC                                    Atkeson and Schaal [163] (1997), Peters et al. [159] (2003), Peters and Schaal [162] (2008), Hoffmann et al. [164] (2008) and Peters and Schaal [165] (2008)
Inverted pole-balancing                Schaal [151] (1996) and Adam et al. [172] (2012)
Impedance control                      Kuan and Young [152] (1998) and Buchli et al. [166] (2010)
Fuzzy rule-based system                Althoefer et al. [155] (2001)
Navigation challenge                   Smart and Kaelbling [157] (2002)
Route integral control                 Buchli et al. [166] (2010)
Path integral                          Theodorou et al. [167] (2010)

RL: Reinforcement learning; NAC: natural-actor-critic.

               4.2. Deep reinforcement learning for robotic manipulation control
In 2012, deep learning (DL) achieved its first major breakthrough with a CNN for classification [174]. Such a network iteratively trains its parameters through loss computation and backpropagation (BP) over hundreds of thousands of data-label pairs. Although this approach has developed steadily since its inception and is currently one of the most widely used DL structures, it is not ideal for robotic manipulation control, because collecting the large number of images labeled with the corresponding joint angles needed to train the model is too time-consuming. CNNs have nonetheless been used in several studies to learn the motor torques required to drive a robot from raw RGB video images [175]. However, as we will see later, employing deep reinforcement learning (DRL) is a more promising and appealing approach.

In the context of robotic manipulation control, the purpose of DRL is to train a deep policy NN, such as the one shown in Figure 10, to discover the best command sequence for completing the task. The input, as shown in Figure 11, is the current state, which can comprise the angles of the manipulator's joints, the location of the end effector, and their derivatives, such as velocities and accelerations. Furthermore, the current pose of target objects, as well as the readings of any relevant sensors present in the surroundings, can also be included in the current state. The policy network's output is an action that specifies which control commands, such as torques or velocity commands, should be applied to each actuator. A positive reward is produced when the robotic manipulator completes the task. From these delayed and weak signals, the algorithm is expected to discover a successful control policy for robotic manipulation.
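To make the input-output interface described above concrete, the following is a minimal sketch of such a policy network in PyTorch. It is not taken from the paper: the state and action dimensions, the layer sizes, and the choice of torque commands squashed to [-1, 1] are illustrative assumptions.

import torch
import torch.nn as nn

class ManipulationPolicy(nn.Module):
    """Maps the current state to one control command per actuator (a sketch)."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh(),  # squash commands to [-1, 1]
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

# Hypothetical state: 7 joint angles + 7 joint velocities + 3-D end-effector
# position + 3-D target-object position = 20 inputs; 7 torque commands out.
policy = ManipulationPolicy(state_dim=20, action_dim=7)
action = policy(torch.zeros(1, 20))  # placeholder state; a real input would come from the robot's sensors

During training, a DRL algorithm adjusts the network weights so that the sparse, delayed rewards obtained at task completion are maximized.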


For supervised deep learning, the study of sample efficiency determines the scale of the training set required for learning. Analogously, although the problem is more challenging than in supervised deep learning, the study of sample efficiency for DRL in robotic control determines how much interaction data is needed to build an optimal policy. The first demonstration of DRL on a real robot came in 2015, when Levine et al. [176] combined trajectory optimization techniques and policy search methods with NNs to accomplish practical, sample-efficient learning. They employed a recently developed policy search approach to learn a variety of dynamic manipulation behaviors with very general policy representations, without requiring known models or example demonstrations. This method uses repeatedly refitted time-varying linear models to train a collection of trajectories for the desired motion skill and then unifies these trajectories into a single control policy that can generalize to new scenarios. Some modifications were needed to lower the sample count and automate parameter selection so that the technique could run on a real robot. Finally, this approach showed that learning robust controllers for complex behaviors is possible, achieving compound tasks such as stacking tight-fitting Lego blocks and putting together a toy airplane.
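As a rough illustration of the first ingredient of this method, the sketch below refits a time-varying linear dynamics model x_{t+1} ≈ A_t x_t + B_t u_t + c_t to a batch of sampled trajectories by least squares. The function name, array layout, and the plain least-squares fit are assumptions made for illustration; Levine et al.'s actual procedure additionally improves the local trajectory controllers and distills them into the final neural-network policy.

import numpy as np

def fit_time_varying_linear_dynamics(states, actions):
    """states: (N, T+1, dx) sampled states; actions: (N, T, du) sampled controls.
    Returns per-timestep (A_t, B_t, c_t) fitted by least squares (a sketch)."""
    N, T, du = actions.shape
    dx = states.shape[2]
    A, B, c = [], [], []
    for t in range(T):
        # Regress the next state on [state, action, 1] across the N rollouts.
        X = np.hstack([states[:, t], actions[:, t], np.ones((N, 1))])  # (N, dx + du + 1)
        Y = states[:, t + 1]                                           # (N, dx)
        W, _, _, _ = np.linalg.lstsq(X, Y, rcond=None)                 # (dx + du + 1, dx)
        A.append(W[:dx].T)          # (dx, dx)
        B.append(W[dx:dx + du].T)   # (dx, du)
        c.append(W[-1])             # (dx,)
    return A, B, c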