
Page 40                                                               Harib et al. Intell Robot 2022;2(1):37-71  https://dx.doi.org/10.20517/ir.2021.19

               which are represented by action values. Q-learning is a popular type of RL in which the best policy is
               learned implicitly as a Q-function. There have been several publications on the use of RL in robotic systems.

               Besides this introduction and a concluding section, this review is organized into three sections.
               Section 2 discusses the limitations of adaptive control for nonlinear systems and the drawbacks likely to
               arise in the presence of disturbances, and presents the main modifications proposed in the 1980s to
               overcome these limitations. Section 3 presents work that implemented NNs in nonlinear dynamical
               systems, particularly in robotics, and cites some work that addressed the inverted pendulum control
               problem using NNs. Finally, Section 4 reviews previous research on RL and Deep RL (DRL) based control
               problems and their implementation in robotic manipulation, while highlighting some of their major
               drawbacks in the field.


               2. ADAPTIVE CONTROL LIMITATIONS - BOUNDED DISTURBANCES
               Given a system/plant with an uncertainty set, the control objective can intuitively be achieved through
               identification, robust control, or a combination of both, as in adaptation. Identification is the
               capability to acquire information that reduces uncertainty. This problem has been analyzed rigorously
               over a long period of time and can become very challenging. In 2001, Wang and Zhang [12] explored some
               fundamental limitations of robust and adaptive control, using a basic first-order linear time-varying
               system as a vehicle. It can be seen that robust control cannot deal with much uncertainty, whereas
               adaptive control shows a much better capability of dealing with uncertain parameters and provides better
               robustness. However, adaptive control requires additional information on parameter evolution and is
               fundamentally limited to slowly time-varying systems. Furthermore, adaptation is not capable of
               achieving performance close to the nominal one when the variation rate is near zero.
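
               The dependence on the parameter variation rate can be illustrated with a minimal sketch. It is not
               the analysis of [12]; it assumes a hypothetical discrete-time scalar model y = theta * u with a
               sinusoidally varying parameter, a constant input and a plain gradient estimator. The function name
               tracking_rms, the step size mu and the two frequencies are illustrative choices only.

```python
import math

def tracking_rms(omega, mu=0.1, steps=20000):
    """RMS parameter error of a gradient estimator tracking theta_k = sin(omega * k).

    Assumed toy model (not taken from the cited work): y_k = theta_k * u_k.
    """
    theta_hat = 0.0
    sq_err = 0.0
    for k in range(steps):
        theta = math.sin(omega * k)   # true, time-varying parameter
        u = 1.0                       # constant input, so the regressor never vanishes
        y = theta * u                 # noise-free measurement
        err = theta - theta_hat       # parameter error before the update
        sq_err += err * err
        # Gradient update driven by the prediction error y - theta_hat * u
        theta_hat += mu * (y - theta_hat * u) * u
    return math.sqrt(sq_err / steps)

rms_slow = tracking_rms(omega=0.001)  # slowly varying parameter
rms_fast = tracking_rms(omega=0.5)    # rapidly varying parameter
print(f"RMS error, slow variation: {rms_slow:.4f}")
print(f"RMS error, fast variation: {rms_fast:.4f}")
```

               Under these assumptions the estimator behaves like a first-order low-pass filter on the true
               parameter, so the tracking error stays small for the slow variation and becomes comparable to the
               parameter amplitude for the fast one.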


               2.1. Problem statement
               The design of adaptive control laws always assumes that the system dynamics are exactly specified by
               models. Hence, when the true plant dynamics are not perfectly described by any model, as is expected
               in practice, the actual behavior of the controller is open to question. The robust stability that any
               adaptive controller needs for its algorithms to be practically applicable can be provided only when
               the modeling error is sufficiently “small”. Unfortunately, stability alone cannot guarantee
               robustness, since the modeling error appears as a disturbance and usually causes divergence of the
               adaptive process.


               While one of the fundamental fields of application of adaptive control is systems with unknown
               time-varying parameters, the algorithms have been proved robust, in the presence of noise and bounded
               disturbances, only for systems with constant parameters [13]. Ideally, when there are no disturbances
               or noise and the parameters are constant, adaptation shows smooth convergence and stability
               properties. On the other hand, the adaptive laws are not robust in the presence of bounded
               disturbances, noise and time-varying parameters.
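
               A minimal numerical sketch of this lack of robustness is given here for a textbook-style scalar
               case, not for the MIMO system considered in this section. It assumes a hypothetical plant
               x' = a*x + u + d with unknown a, the adaptive regulator u = -k*x and the adaptive law
               k' = gamma*x^2, integrated with forward Euler. All names and numerical values (simulate, a, gamma,
               d, dt, t_end) are assumptions made for illustration.

```python
def simulate(d, a=1.0, gamma=1.0, x0=1.0, dt=0.01, t_end=200.0):
    """Forward-Euler simulation of an assumed scalar adaptive regulator.

    Plant:        x' = a*x + u + d   (a unknown to the controller, d a constant disturbance)
    Control:      u  = -k*x
    Adaptive law: k' = gamma * x**2
    Returns the history of the adaptive gain k.
    """
    steps = int(round(t_end / dt))
    x, k = x0, 0.0
    k_hist = [k]
    for _ in range(steps):
        u = -k * x
        x_dot = a * x + u + d
        k_dot = gamma * x * x   # never negative, so k can only increase
        x += dt * x_dot
        k += dt * k_dot
        k_hist.append(k)
    return k_hist

k_clean = simulate(d=0.0)   # no disturbance
k_dist = simulate(d=0.5)    # bounded (constant) disturbance
half = len(k_clean) // 2

print(f"no disturbance:   k(T/2) = {k_clean[half]:.3f}, k(T) = {k_clean[-1]:.3f}")
print(f"with disturbance: k(T/2) = {k_dist[half]:.3f}, k(T) = {k_dist[-1]:.3f}")
```

               In this sketch the gain settles to a constant when d = 0, because the state converges to zero and
               the adaptation stops. With the bounded disturbance the state cannot remain at zero, so the adaptive
               gain keeps increasing over the whole simulation horizon, which is the parameter drift behavior that
               motivates the robust modifications of the adaptive laws.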


               In order to mathematically state the problem of non-robustness of adaptive control to bounded
               disturbances, let us start by considering a MIMO system in the form [14,15],