which are represented by action values. Q-learning is a popular type of RL in which the best policy is
learned implicitly as a Q-function. There have been several publications on the use of RL in robotic systems.
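To make the idea concrete, the sketch below shows tabular Q-learning in Python. It is purely illustrative: the environment interface (`step`), the hyperparameters, and all names are our assumptions rather than anything prescribed in the works reviewed here. The learned table encodes the policy implicitly, since acting greedily with respect to Q(s, a) recovers the (approximately) best policy.

```python
# Minimal tabular Q-learning sketch (illustrative only): the environment
# interface `step(s, a) -> (s_next, reward, done)`, the hyperparameters,
# and all names are assumptions, not taken from the reviewed works.
import numpy as np

def q_learning(n_states, n_actions, step, n_episodes=500,
               alpha=0.1, gamma=0.99, epsilon=0.1, max_steps=200, seed=0):
    """Learn action values Q(s, a); the best policy is implicit in Q."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    for _ in range(n_episodes):
        s = 0  # assume episodes start in state 0 for simplicity
        for _ in range(max_steps):
            # epsilon-greedy exploration over the current action values
            a = rng.integers(n_actions) if rng.random() < epsilon else int(np.argmax(Q[s]))
            s_next, r, done = step(s, a)  # user-supplied environment transition
            # off-policy temporal-difference update toward the greedy target
            Q[s, a] += alpha * (r + gamma * (0.0 if done else np.max(Q[s_next])) - Q[s, a])
            s = s_next
            if done:
                break
    return Q  # greedy policy: pi(s) = argmax_a Q[s, a]
```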
This review is organized into three sections besides the present introductory section and a concluding section. In Section 2, we discuss the limitations of adaptive control for nonlinear systems and introduce the drawbacks that arise in the presence of disturbances. We also present the main modifications proposed in the 1980s to overcome these limitations. Section 3 focuses on the work that implemented NNs in nonlinear dynamical systems, particularly in robotics, and cites some work that targeted the inverted pendulum control problem using NNs. Finally, Section 4 covers previous research on RL- and deep RL (DRL)-based control problems and their implementation in robotic manipulation, while highlighting some of their major drawbacks in the field.
2. ADAPTIVE CONTROL LIMITATIONS - BOUNDED DISTURBANCES
Given a system/plant with an uncertainty set, the control objective can intuitively be achieved through identification, robust control, or a combination of both, as in adaptation. Identification is the capability to acquire information that reduces uncertainty. This problem characterization has seen rigorous analysis over a long period of time and can become very challenging. In 2001, Wang and Zhang explored some fundamental limitations of robust and adaptive control by employing a basic first-order linear time-varying system as a vehicle[12]. One can notice that robust control cannot deal with much uncertainty, while adaptive control shows a much better capability of dealing with uncertain parameters and provides better robustness. However, adaptive control requires additional information on parameter evolution and is fundamentally limited to slowly time-varying systems. Furthermore, adaptation is not capable of achieving proximity to the nominal performance even under near-zero variation rates.
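A scalar simulation can illustrate this limitation. The sketch below is our own toy example, not taken from the cited work[12]: it applies a gradient (Lyapunov-based) adaptive law to the plant x_dot = a(t)*x + u with reference model xm_dot = -xm + r, where a(t), the gains, and the reference are assumptions chosen for demonstration. Tracking remains tight when a(t) varies slowly and is noticeably worse when a(t) varies quickly.

```python
# Illustrative only: first-order adaptive tracking with an unknown,
# time-varying plant parameter a(t).  The plant, reference model, gains,
# and parameter trajectories are assumptions chosen for demonstration.
import numpy as np

def adaptive_tracking(a_of_t, t_end=100.0, dt=1e-3, gamma=10.0):
    """Plant: x_dot = a(t)*x + u; reference model: xm_dot = -xm + r;
    control: u = -(a_hat + 1)*x + r; adaptation: a_hat_dot = gamma*e*x."""
    n = int(t_end / dt)
    x = xm = a_hat = 0.0
    err = np.empty(n)
    for k in range(n):
        t = k * dt
        r = np.sin(t)                       # persistently exciting reference
        a = a_of_t(t)                       # unknown, time-varying parameter
        e = x - xm                          # tracking error
        u = -(a_hat + 1.0) * x + r          # certainty-equivalence control law
        a_hat += dt * gamma * e * x         # gradient (Lyapunov-based) adaptation
        x, xm = x + dt * (a * x + u), xm + dt * (-xm + r)  # forward Euler step
        err[k] = e
    return float(np.sqrt(np.mean(err[n // 2:] ** 2)))  # RMS error after transient

print("RMS error, slowly varying a(t): ", adaptive_tracking(lambda t: 1 + 0.5 * np.sin(0.05 * t)))
print("RMS error, rapidly varying a(t):", adaptive_tracking(lambda t: 1 + 0.5 * np.sin(5.0 * t)))
```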
2.1. Problem statement
The design of adaptive control laws always rests on the assumption that the system dynamics are exactly specified by models. Hence, when the true plant dynamics are not perfectly described by any model, as is to be expected in practice, one can only question the real behavior of the controller. The robust stability required for any adaptive control algorithm to be practically applicable can be guaranteed only when the modeling error is sufficiently “small”. Unfortunately, stability alone cannot guarantee robustness, since the modeling error appears as a disturbance and usually causes divergence of the adaptive process.
While one of the fundamental fields of application of adaptive control is systems with unknown time-varying parameters, the algorithms have been proven robust, in the presence of noise and bounded disturbances, only for systems with constant parameters[13]. Ideally, when there are no disturbances or noise and the parameters are constant, adaptation shows smooth convergence and stability properties. On the other hand, the adaptive laws are not robust in the presence of bounded disturbances, noise, and time-varying parameters.
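This phenomenon can be reproduced in a few lines. The sketch below is our own toy example, with all constants assumed: it regulates the scalar plant x_dot = a*x + u + d, where d is a small bounded disturbance, using the unmodified gradient law a_hat_dot = gamma*x^2. The state stays bounded, but the parameter estimate drifts without bound; adding a leakage (sigma-modification) term, one of the classical robustness fixes of the type discussed later, keeps the estimate bounded. In the presence of unmodeled dynamics, such drift is what typically ends in bursting or instability.

```python
# Illustrative only: parameter drift of an unmodified adaptive law under a
# small bounded disturbance d, and its suppression by sigma-modification.
# The plant, gains, and disturbance level are assumptions for demonstration.
import numpy as np

def regulate(sigma=0.0, d=0.2, a=1.0, gamma=2.0, t_end=300.0, dt=1e-3, x0=0.5):
    """Plant: x_dot = a*x + u + d; control: u = -(a_hat + 1)*x;
    adaptation: a_hat_dot = gamma*(x**2 - sigma*a_hat) (sigma = 0: no leakage)."""
    n = int(t_end / dt)
    x, a_hat = x0, 0.0
    for _ in range(n):
        u = -(a_hat + 1.0) * x                         # certainty-equivalence control
        a_hat += dt * gamma * (x * x - sigma * a_hat)  # adaptation (+ optional leakage)
        x += dt * (a * x + u + d)                      # plant with bounded disturbance
    return x, a_hat

print("no modification:     x = %.3f, a_hat = %.2f" % regulate(sigma=0.0))
print("sigma-modification:  x = %.3f, a_hat = %.2f" % regulate(sigma=0.1))
```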
In order to mathematically state the problem of non-robustness of adaptive control to bounded disturbances, let us start by considering a MIMO system of the form[14,15],