Figure 11. DRL scheme for robotic manipulation control. DRL: Deep reinforcement learning.
minutes of interaction time.
The concept of imitation learning became very popular for robotic manipulation, since relying on learning from trial and error requires a significant amount of system interaction time if based solely on DRL approaches[177]. In 2018, an interesting approach was proposed by Vecerik et al.[178], combining imitation learning with task-reward-based learning, which improved the agent's abilities in simulation. The approach was based on an extension of the Deep Deterministic Policy Gradient (DDPG) algorithm for tasks with sparse rewards. Unfortunately, in real-robot experiments the location of the object, as well as explicit joint states such as position and velocity, must be specified, which limits the approach's applicability to high-dimensional data[179].
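The core idea of such demonstration-augmented DDPG variants is to keep expert transitions alongside the agent's own experience in the replay buffer, so that every minibatch contains some successful behavior even under sparse rewards. The snippet below is a minimal sketch of one way such a buffer can be organized; the class name, the demo_fraction parameter, and the fixed demonstration/agent sampling split are illustrative assumptions rather than the authors' exact mechanism, which relies on prioritized replay.

```python
import random
from collections import deque

class DemoReplayBuffer:
    """Replay buffer that keeps demonstrations permanently and mixes
    them with agent transitions when sampling minibatches (sketch)."""

    def __init__(self, capacity=100_000, demo_fraction=0.1):
        self.demos = []                      # expert data, never evicted
        self.agent = deque(maxlen=capacity)  # FIFO agent experience
        self.demo_fraction = demo_fraction   # assumed fixed mixing ratio

    def add_demo(self, transition):
        self.demos.append(transition)

    def add(self, transition):
        self.agent.append(transition)

    def sample(self, batch_size):
        # Assumes both stores already hold enough transitions.
        n_demo = min(int(batch_size * self.demo_fraction), len(self.demos))
        batch = random.sample(self.demos, n_demo)
        batch += random.sample(list(self.agent), batch_size - n_demo)
        return batch
```

Each sampled minibatch then feeds the standard DDPG critic and actor updates, so the demonstrations shape the value estimates without any change to the underlying algorithm.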
In 2017, Andrychowicz et al.[180] proposed Hindsight Experience Replay (HER), a novel technique that enables sample-efficient learning from sparse and binary rewards, avoiding the need for complex reward engineering. It can be used in conjunction with any off-policy RL algorithm and can be viewed as a form of implicit curriculum.
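HER's key step is goal relabeling: when a trajectory fails to reach the intended goal, copies of its transitions are stored as if a state actually reached later in the episode had been the goal, turning failures into useful successes. Below is a minimal sketch of the common "future" relabeling strategy, assuming goal-conditioned transitions of the form (state, action, goal, next_state) and a sparse reward function reward_fn; all names are illustrative.

```python
import random

def her_relabel(episode, reward_fn, k=4):
    """Return the episode's transitions plus k hindsight copies per
    step, each relabeled with a goal achieved later in the episode."""
    relabeled = []
    for t, (state, action, goal, next_state) in enumerate(episode):
        # Original transition with the environment's sparse reward.
        relabeled.append(
            (state, action, goal, reward_fn(next_state, goal), next_state))
        for _ in range(k):
            # "Future" strategy: pick a state achieved from step t onward.
            future = random.randint(t, len(episode) - 1)
            new_goal = episode[future][3]
            relabeled.append((state, action, new_goal,
                              reward_fn(next_state, new_goal), next_state))
    return relabeled
```

Every relabeled transition is simply pushed into the off-policy algorithm's replay buffer, which is why HER composes cleanly with DDPG and similar methods.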
In October 2021, AI researchers at Stanford University presented a new technique called deep evolutionary reinforcement learning (DERL)[181]. The method employs a sophisticated virtual environment together with RL to develop virtual agents that can change their physical form as well as their learning abilities, and the discoveries might have far-reaching ramifications for future research in AI in general and robotics in particular. In the DERL architecture, each agent employs DRL to acquire the abilities it needs to achieve its objectives over the course of its lifetime. The researchers built their framework on MuJoCo, a virtual environment that enables highly accurate rigid-body physics simulation.
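At a high level, DERL couples an evolutionary outer loop over body morphologies with a DRL inner loop over each agent's lifetime, using the reward attained after training as the evolutionary fitness. The sketch below captures only that structure; train_with_rl and mutate are hypothetical placeholders for the per-lifetime DRL run and the morphology mutation operator, and the actual system runs asynchronously at far larger scale.

```python
import random

def derl_outer_loop(init_morphologies, train_with_rl, mutate,
                    generations=10, tournament_size=4):
    """Evolve morphologies: the fitness of a body is the task reward
    its controller reaches after a DRL 'lifetime' of training (sketch)."""
    # Assumes len(init_morphologies) >= tournament_size.
    population = [(m, train_with_rl(m)) for m in init_morphologies]
    for _ in range(generations):
        children = []
        for _ in range(len(population)):
            # Tournament selection on fitness, then mutate the winner.
            contenders = random.sample(population, tournament_size)
            parent = max(contenders, key=lambda mf: mf[1])[0]
            child = mutate(parent)
            children.append((child, train_with_rl(child)))
        population = children
    return max(population, key=lambda mf: mf[1])[0]
```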
Universal Animal (UNIMAL) is their design space, and the objective is to construct morphologies that can master locomotion and object manipulation tasks in a range of terrains. The evolved agents were evaluated on eight different tasks, including patrolling, fleeing, manipulating objects, and exploring. Their findings reveal that AI agents that developed in varied terrains learn and perform better than agents that