have only seen flat terrain.


Table 6 presents an overview connecting the above-mentioned works. Several basic problems are listed, and each paper's approach is categorized by observation and action space, reward shaping, and algorithm type.


Although DRL-based robotic manipulation control algorithms have proliferated in recent years, the challenge of acquiring robust and diverse manipulation skills with DRL has yet to be properly overcome for real-world applications.

4.3. Summary
Over the last several years, the robotics community has progressively adopted RL- and DRL-based algorithms to control complicated robots or multi-robot systems, as well as to learn end-to-end policies from perception to control. Since both families of algorithms acquire their knowledge by trial and error, they naturally require a large number of episodes, which limits learning in terms of time and experience variability in real-world scenarios. In addition, real-world experience must account for the potential dangers or unexpected behaviors of the robot, especially in safety-critical applications. Even though there are some successful real-world applications of DRL in robotics, particularly in tasks involving object manipulation [182,183], the success of these algorithms beyond simulated worlds is fairly limited. Transferring DRL policies from simulation environments to reality, referred to as "sim-to-real", is a necessary step toward more complex robotic systems with DL-defined controllers. This has led to an increase in research on "sim-to-real" transfer, which has resulted in many publications over the past few years.
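
As one concrete illustration, the minimal sketch below implements domain randomization, a common sim-to-real technique in which the simulator's physical parameters are resampled every episode so that the learned behavior cannot overfit a single configuration. The 1-D point-mass plant, parameter ranges, and random-search tuning are hypothetical choices made purely for this example, not taken from the surveyed works.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_dynamics():
    """Resample physical parameters for each episode (hypothetical ranges)."""
    return {"mass": rng.uniform(0.8, 1.2),       # kg, +/-20% around nominal
            "friction": rng.uniform(0.0, 0.2)}   # viscous friction coefficient

def rollout(gain, mass, friction, dt=0.02, steps=200):
    """Roll out a fixed-structure controller on a 1-D point mass; return cost."""
    pos, vel, cost = 1.0, 0.0, 0.0
    for _ in range(steps):
        force = -gain * pos - 0.5 * vel          # simple PD-style policy
        acc = (force - friction * vel) / mass
        vel += acc * dt
        pos += vel * dt
        cost += pos ** 2 * dt                    # penalize distance from origin
    return cost

# Crude "training" by random search: every evaluation uses freshly randomized
# dynamics, so the selected gain must perform well across the whole parameter
# range -- the core idea behind domain randomization for sim-to-real transfer.
gains = np.linspace(0.5, 10.0, 20)
scores = [np.mean([rollout(g, **sample_dynamics()) for _ in range(10)])
          for g in gains]
print("most robust gain:", gains[int(np.argmin(scores))])
```

The same resampling idea carries over to full robot simulators, where masses, friction, latencies, and sensor noise are randomized instead of the toy parameters used here.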

Another angle we see as crucial for robotics applications is local versus global learning. For instance, when humans learn a new task, such as walking, they automatically build upon previously learned skills, so that learning a related task, such as running, becomes significantly easier. It is therefore essential to reuse locally learned information from past data sets. For robot RL/DRL, publicly available data sets covering many skills, accessible to everyone in robotics research, would be a huge asset. Regarding reward shaping, RL approaches have benefited significantly from rewards that convey closeness to the goal rather than relying only on binary success or failure. Designing such rewards for robotics is challenging; hence, it would be ideal if the reward shaping were physically motivated, for instance, minimizing the joint torques while achieving a task, as sketched below.
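
A minimal sketch of such a physically motivated shaped reward follows; the weighting coefficients, success radius, and function signature are hypothetical choices for illustration.

```python
import numpy as np

def shaped_reward(ee_pos, goal_pos, torques,
                  w_dist=1.0, w_torque=1e-3, success_radius=0.02):
    """Dense, physically motivated reward (all weights are hypothetical).

    Rather than a binary success/failure signal, the reward conveys
    closeness to the goal and penalizes actuation effort (joint torques).
    """
    dist = np.linalg.norm(np.asarray(ee_pos) - np.asarray(goal_pos))
    reward = -w_dist * dist                                  # dense closeness term
    reward -= w_torque * float(np.sum(np.square(torques)))  # torque penalty
    if dist < success_radius:                                # small bonus on success
        reward += 1.0
    return reward

# Example: end-effector 10 cm from the goal, moderate joint torques
print(shaped_reward([0.1, 0.0, 0.0], [0.0, 0.0, 0.0], [2.0, -1.5, 0.5]))
```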

5. CONCLUSION
In this review paper, we have surveyed the evolution of adaptive learning for nonlinear dynamic systems. As an initial step, after introducing adaptive controllers and the modification techniques used to overcome bounded disturbances, we concluded that adaptive controllers have proven their effectiveness, especially for processes that can be modeled linearly and whose parameters vary slowly relative to the system's dynamics. However, they do not guarantee stability for systems whose parameter dynamics are of at least the same order of magnitude as the system's dynamics.
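
To make these modification techniques concrete, the sketch below simulates σ-modification, one classical leakage-based fix against bounded disturbances, in a scalar model-reference adaptive controller; the plant coefficient, gains, disturbance, and reference signal are hypothetical values chosen for illustration.

```python
import numpy as np

# Scalar model-reference adaptive control (MRAC) with sigma-modification.
#   Plant:            x'  = a*x + u + d(t)   (a unknown, d bounded)
#   Reference model:  xm' = -am*xm + r
#   Control law:      u   = -k*x + r         (ideal gain k* = a + am)
a_true, am, gamma, sigma, dt = 2.0, 3.0, 10.0, 0.1, 1e-3
x = xm = k = 0.0
for step in range(int(30.0 / dt)):
    t = step * dt
    r = np.sign(np.sin(0.5 * t))     # square-wave reference
    d = 0.2 * np.sin(5.0 * t)        # bounded disturbance
    e = x - xm                       # tracking error
    u = -k * x + r
    # sigma-modification: the leakage term -sigma*k keeps the adapted gain
    # bounded; without it, d(t) can cause slow parameter drift
    k += dt * gamma * (e * x - sigma * k)
    x += dt * (a_true * x + u + d)
    xm += dt * (-am * xm + r)
print(f"adapted gain k = {k:.2f} (ideal k* = {a_true + am:.2f}), |e| = {abs(e):.3f}")
```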

In an evolutionary manner, AI-based techniques have emerged to improve controller robustness. Newer methods, such as fuzzy logic and NNs, were introduced. Essentially, these methods approximate a nonlinear function and provide a good representation of the unknown nonlinear plant, although they are typically used as model-free controllers. The plant is treated as a "black box" whose input and output data are gathered and trained on. After the training phase, the AI framework captures the plant's model and can handle the plant with practically no need for a mathematical model. It is feasible to build the complete algorithm using AI.
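
As a toy illustration of this black-box workflow, the sketch below gathers input/output data from an unknown nonlinear plant and trains a small NN to approximate it; the plant function, network architecture, and training settings are hypothetical and chosen only for the example.

```python
import torch
import torch.nn as nn

# "Black box" illustration: fit a small NN to input/output data gathered
# from an unknown nonlinear plant, with no mathematical model involved.
# The plant function below is hypothetical and serves only to generate data.
def plant(u):                                     # unknown nonlinear map
    return torch.sin(u) + 0.5 * u ** 2

u = torch.linspace(-2.0, 2.0, 256).unsqueeze(1)   # gathered input data
y = plant(u) + 0.01 * torch.randn_like(u)         # noisy measured outputs

net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-2)

for _ in range(2000):                             # supervised training on I/O pairs
    opt.zero_grad()
    loss = nn.functional.mse_loss(net(u), y)
    loss.backward()
    opt.step()

print(f"approximation MSE: {loss.item():.2e}")    # NN now stands in for the plant
```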