have only seen flat terrain.


Table 6 presents an overview connecting the above-mentioned works. Several basic problems are listed, and each paper's approach is categorized by observation and action space, reward shaping, and algorithm type.


Although DRL-based robotic manipulation control algorithms have proliferated in recent years, the challenge of acquiring robust and diverse manipulation skills with DRL has yet to be properly overcome for real-world applications.

4.3. Summary
Over the last several years, the robotics community has progressively adopted RL- and DRL-based algorithms to control complicated robots or multi-robot systems, as well as to learn end-to-end policies from perception to control. Since both families of algorithms acquire their knowledge by trial and error, they naturally require a large number of episodes, which limits learning in terms of time and experience variability in real-world scenarios. In addition, real-world experience must account for the potential dangers or unexpected behaviors of the robot, especially in safety-critical applications. Even though there are some successful real-world applications of DRL in robotics, particularly in tasks involving object manipulation [182,183], the success of these algorithms beyond simulated worlds is fairly limited. Transferring DRL policies from simulation environments to reality, referred to as "sim-to-real", is a necessary step toward more complex robotic systems with DL-defined controllers. This has led to an increase in research on "sim-to-real" transfer, which has resulted in many publications over the past few years.
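
As one concrete illustration, the minimal sketch below implements domain randomization, a common sim-to-real technique in which the simulator's physical parameters are resampled every episode so that the learned behavior cannot overfit a single configuration. The 1-D point-mass plant, parameter ranges, and random-search tuning are hypothetical choices made purely for this example, not taken from the surveyed works.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_dynamics():
    """Resample physical parameters for each episode (hypothetical ranges)."""
    return {"mass": rng.uniform(0.8, 1.2),       # kg, +/-20% around nominal
            "friction": rng.uniform(0.0, 0.2)}   # viscous friction coefficient

def rollout(gain, mass, friction, dt=0.02, steps=200):
    """Roll out a fixed-structure controller on a 1-D point mass; return cost."""
    pos, vel, cost = 1.0, 0.0, 0.0
    for _ in range(steps):
        force = -gain * pos - 0.5 * vel          # simple PD-style policy
        acc = (force - friction * vel) / mass
        vel += acc * dt
        pos += vel * dt
        cost += pos ** 2 * dt                    # penalize distance from origin
    return cost

# Crude "training" by random search: every evaluation uses freshly randomized
# dynamics, so the selected gain must perform well across the whole parameter
# range -- the core idea behind domain randomization for sim-to-real transfer.
gains = np.linspace(0.5, 10.0, 20)
scores = [np.mean([rollout(g, **sample_dynamics()) for _ in range(10)])
          for g in gains]
print("most robust gain:", gains[int(np.argmin(scores))])
```

The same resampling idea carries over to full robot simulators, where masses, friction, latencies, and sensor noise are randomized instead of the toy parameters used here.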

Another angle we see as crucial for robotics applications is local versus global learning. For instance, when humans learn a new task, such as walking, they automatically build upon previously learned skills, so that learning a related task, such as running, becomes significantly easier. It is therefore essential to reuse locally learned information from past data sets. For robot RL/DRL, publicly available data sets covering many skills, accessible to everyone in robotics research, would be a huge asset. Regarding reward shaping, RL approaches have benefited significantly from rewards that convey closeness to the goal rather than relying only on binary success or failure. Designing such rewards for robotics is challenging; hence, it would be ideal if the reward shaping were physically motivated, for instance, minimizing the joint torques while achieving a task, as sketched below.
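
A minimal sketch of such a physically motivated shaped reward follows; the weighting coefficients, success radius, and function signature are hypothetical choices for illustration.

```python
import numpy as np

def shaped_reward(ee_pos, goal_pos, torques,
                  w_dist=1.0, w_torque=1e-3, success_radius=0.02):
    """Dense, physically motivated reward (all weights are hypothetical).

    Rather than a binary success/failure signal, the reward conveys
    closeness to the goal and penalizes actuation effort (joint torques).
    """
    dist = np.linalg.norm(np.asarray(ee_pos) - np.asarray(goal_pos))
    reward = -w_dist * dist                                  # dense closeness term
    reward -= w_torque * float(np.sum(np.square(torques)))  # torque penalty
    if dist < success_radius:                                # small bonus on success
        reward += 1.0
    return reward

# Example: end-effector 10 cm from the goal, moderate joint torques
print(shaped_reward([0.1, 0.0, 0.0], [0.0, 0.0, 0.0], [2.0, -1.5, 0.5]))
```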

5. CONCLUSION
In this review paper, we have surveyed the evolution of adaptive learning for nonlinear dynamic systems. As an initial step, after introducing adaptive controllers and the modification techniques used to overcome bounded disturbances, we concluded that adaptive controllers have proven their effectiveness, especially for processes that can be modeled linearly and whose parameters vary slowly relative to the system's dynamics. However, they do not guarantee stability for systems whose parameter dynamics are of at least the same order of magnitude as the system's dynamics.
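
To make these modification techniques concrete, the sketch below simulates σ-modification, one classical leakage-based fix against bounded disturbances, in a scalar model-reference adaptive controller; the plant coefficient, gains, disturbance, and reference signal are hypothetical values chosen for illustration.

```python
import numpy as np

# Scalar model-reference adaptive control (MRAC) with sigma-modification.
#   Plant:            x'  = a*x + u + d(t)   (a unknown, d bounded)
#   Reference model:  xm' = -am*xm + r
#   Control law:      u   = -k*x + r         (ideal gain k* = a + am)
a_true, am, gamma, sigma, dt = 2.0, 3.0, 10.0, 0.1, 1e-3
x = xm = k = 0.0
for step in range(int(30.0 / dt)):
    t = step * dt
    r = np.sign(np.sin(0.5 * t))     # square-wave reference
    d = 0.2 * np.sin(5.0 * t)        # bounded disturbance
    e = x - xm                       # tracking error
    u = -k * x + r
    # sigma-modification: the leakage term -sigma*k keeps the adapted gain
    # bounded; without it, d(t) can cause slow parameter drift
    k += dt * gamma * (e * x - sigma * k)
    x += dt * (a_true * x + u + d)
    xm += dt * (-am * xm + r)
print(f"adapted gain k = {k:.2f} (ideal k* = {a_true + am:.2f}), |e| = {abs(e):.3f}")
```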

In an evolutionary manner, AI-based techniques have emerged to improve controller robustness. Newer methods, such as fuzzy logic and NNs, were introduced. Essentially, these methods approximate a nonlinear function and provide a good representation of the unknown nonlinear plant, although they are typically used as model-free controllers. The plant is treated as a "black box" whose input and output data are gathered and trained on. After the training phase, the AI framework captures the plant's model and can handle the plant with practically no need for a mathematical model. It is feasible to build the complete algorithm using AI.
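
As a toy illustration of this black-box workflow, the sketch below gathers input/output data from an unknown nonlinear plant and trains a small NN to approximate it; the plant function, network architecture, and training settings are hypothetical and chosen only for the example.

```python
import torch
import torch.nn as nn

# "Black box" illustration: fit a small NN to input/output data gathered
# from an unknown nonlinear plant, with no mathematical model involved.
# The plant function below is hypothetical and serves only to generate data.
def plant(u):                                     # unknown nonlinear map
    return torch.sin(u) + 0.5 * u ** 2

u = torch.linspace(-2.0, 2.0, 256).unsqueeze(1)   # gathered input data
y = plant(u) + 0.01 * torch.randn_like(u)         # noisy measured outputs

net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-2)

for _ in range(2000):                             # supervised training on I/O pairs
    opt.zero_grad()
    loss = nn.functional.mse_loss(net(u), y)
    loss.backward()
    opt.step()

print(f"approximation MSE: {loss.item():.2e}")    # NN now stands in for the plant
```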