Figure 11. DRL scheme for robotic manipulation control. DRL: deep reinforcement learning.

minutes of interaction time.


The concept of imitation learning became very popular for robotic manipulation, since learning from trial and error alone requires a significant amount of system interaction time when based solely on DRL approaches [177]. In 2018, an interesting approach was proposed by Vecerik et al. [178] that combines imitation learning with task-reward-based learning, improving the agent's abilities in simulation. The approach is an extension of the Deep Deterministic Policy Gradient (DDPG) algorithm for tasks with sparse rewards. Unfortunately, in real robot experiments the location of the object, as well as explicit joint states such as position and velocity, must be specified, which limits the approach's applicability to high-dimensional data [179].
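
To make the idea concrete, the following minimal Python sketch shows how a replay buffer can be seeded with demonstration transitions so that every DDPG training batch mixes demonstration and agent experience, keeping the rare nonzero rewards of a sparse task represented. The class and parameter names (MixedReplayBuffer, demo_fraction) are illustrative assumptions rather than the interface of Vecerik et al., and the full method additionally uses refinements such as prioritized replay and n-step returns, which are omitted here.

```python
import random
from collections import deque

class MixedReplayBuffer:
    """Replay buffer seeded with demonstrations (illustrative sketch).

    In the spirit of DDPG-from-demonstrations, demonstration
    transitions are kept permanently, while agent experience is
    evicted first-in-first-out; every batch mixes the two sources.
    """

    def __init__(self, capacity, demo_transitions):
        self.demos = list(demo_transitions)   # never evicted
        self.agent = deque(maxlen=capacity)   # FIFO agent experience

    def add(self, transition):
        self.agent.append(transition)

    def sample(self, batch_size, demo_fraction=0.25):
        # Draw a fixed fraction of each batch from the demonstrations.
        n_demo = min(int(batch_size * demo_fraction), len(self.demos))
        batch = random.sample(self.demos, n_demo)
        n_agent = min(batch_size - n_demo, len(self.agent))
        batch += random.sample(list(self.agent), n_agent)
        return batch
```

The standard DDPG actor and critic updates then run unchanged on these mixed batches; only the sampling distribution differs from vanilla DDPG.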

In 2017, Andrychowicz et al. [180] proposed Hindsight Experience Replay (HER), a novel technique that enables sample-efficient learning from sparse and binary rewards, avoiding the need for complex reward engineering. It may be used in conjunction with any off-policy RL algorithm to create an implicit curriculum.
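
The core of HER is a goal-relabeling step: transitions from a failed episode are stored a second time with a goal the agent actually achieved later in the same episode, so that some stored transitions succeed by construction and the sparse reward becomes informative. The Python sketch below illustrates the "future" relabeling strategy; the per-step dictionary layout and the reward_fn signature are assumptions made for illustration, not the paper's interface.

```python
import numpy as np

def her_relabel(episode, reward_fn, k=4, rng=None):
    """Hindsight relabeling with the 'future' strategy (sketch).

    episode   : list of steps, each a dict with keys 'obs', 'action',
                'next_obs', and 'achieved_goal' (the goal state
                actually reached at next_obs).
    reward_fn : sparse reward, e.g. 0.0 if the achieved goal matches
                the desired goal and -1.0 otherwise.
    k         : number of hindsight goals sampled per step.
    """
    rng = rng or np.random.default_rng()
    extra = []
    for t in range(len(episode) - 1):
        step = episode[t]
        for _ in range(k):
            # Pick a strictly later step and reuse its achieved goal
            # as the desired goal for this transition.
            future = int(rng.integers(t + 1, len(episode)))
            new_goal = episode[future]['achieved_goal']
            reward = reward_fn(step['achieved_goal'], new_goal)
            extra.append((step['obs'], step['action'],
                          step['next_obs'], new_goal, reward))
    return extra
```

The relabeled transitions are simply appended to the replay buffer alongside the originals, which is why HER composes with any off-policy algorithm such as DDPG.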

In October 2021, AI researchers at Stanford University presented a new technique called deep evolutionary reinforcement learning, or DERL [181]. The method employs a sophisticated virtual environment together with RL to develop virtual agents that can change their physical form as well as their learning abilities. The discoveries might have far-reaching ramifications for AI research in general and robotics research in particular. Each agent in the DERL architecture employs DRL to gain the abilities it needs to achieve its objectives over the course of its existence. The researchers built their framework on MuJoCo, a virtual environment that enables very accurate rigid-body physics modeling. Their design space, called Universal Animal (UNIMAL), aims to construct morphologies that can master locomotion and item-manipulation tasks in a range of terrains. The evolved agents were put through their paces in eight different tasks, including patrolling, fleeing, manipulating items, and exploring. Their findings
reveal that AI agents that developed in more varied terrains learn and perform better than agents that developed in flat terrain.
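
The two-level structure of DERL, evolutionary search over morphologies on the outside and DRL-based lifetime learning on the inside, can be summarized in a short sketch. The generational loop below is a simplification (DERL itself uses an asynchronous, tournament-based scheme), and mutate_morphology and train_with_rl are hypothetical placeholders for the UNIMAL mutation operators and the inner DRL training, respectively.

```python
import random

def derl_search(init_population, mutate_morphology, train_with_rl,
                generations=10):
    """Simplified generational sketch of DERL's nested loops."""
    population = list(init_population)
    for _ in range(generations):
        # Inner loop ("lifetime learning"): train a policy for each
        # morphology with DRL; its final task reward is the fitness.
        scored = sorted(((train_with_rl(m), m) for m in population),
                        key=lambda fm: fm[0], reverse=True)
        # Outer loop: keep the fitter half of the morphologies and
        # refill the population with mutated copies of survivors.
        survivors = [m for _, m in scored[:max(1, len(scored) // 2)]]
        children = [mutate_morphology(random.choice(survivors))
                    for _ in range(len(population) - len(survivors))]
        population = survivors + children
    return population
```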