A novel optimal trajectory tracking scheme is introduced for nonlinear continuous-time systems in strict feedback form with uncertain dynamics by using neural networks (NNs). The method employs an actor-critic-based NN backstepping technique for minimizing a discounted value function, along with an identifier to approximate the unknown system dynamics, which are expressed in augmented form. Novel online weight update laws for the actor and critic NNs are derived by using both the NN identifier and the Hamilton-Jacobi-Bellman residual error. A new continual lifelong learning technique, utilizing the Fisher Information Matrix computed via the Hamilton-Jacobi-Bellman residual error, is introduced to obtain the significance of the weights online and overcome the issue of catastrophic forgetting in NNs, and closed-loop stability is analyzed and demonstrated. The effectiveness of the proposed method is shown in simulation by contrasting it with a recent method from the literature on an underactuated unmanned aerial vehicle, covering both its translational and attitude dynamics.

Optimal control of nonlinear dynamical systems with known and uncertain dynamics is an important field of study due to numerous practical applications. Traditional optimal control methods^{[1,2]} for nonlinear continuous-time (CT) systems with known dynamics often require the solution to a partial differential equation, referred to as the Hamilton-Jacobi-Bellman (HJB) equation, which cannot be solved analytically. To address this challenge, actor-critic designs (ACDs) combined with approximate dynamic programming (ADP) have been proposed as an online method^{[3,4]}. Numerous optimal adaptive control (OAC) techniques for nonlinear CT systems with a strict-feedback structure have emerged, leveraging the backstepping design outlined in^{[5,6]}. These approaches, however, require predefined knowledge of the system dynamics. In real-world industrial settings, where system dynamics might be partially or completely unknown, neural network (NN)-based optimal tracking for uncertain nonlinear CT systems in strict feedback form has been demonstrated in^{[5,7]}, utilizing the policy/value iterations associated with ADP. However, these policy/value iteration methods often require an extensive number of iterations within each sampling period to solve the HJB equation and ascertain the optimal control input, leading to a significant computational challenge.
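For reference, the discounted value function and HJB equation discussed throughout take the standard form below; the notation here is generic (assumed dynamics, cost, and discount factor) and may differ from the paper's exact symbols.

```latex
% Discounted value function and HJB equation in generic notation
% (assumed dynamics \dot{x} = f(x) + g(x)u, cost r(x,u) = Q(x) + u^{\top}R u,
% discount rate \gamma > 0):
\[
V^{*}(x(t)) \;=\; \min_{u}\int_{t}^{\infty} e^{-\gamma(\tau - t)}\,
  r\bigl(x(\tau),u(\tau)\bigr)\,\mathrm{d}\tau ,
\]
\[
\gamma V^{*}(x) \;=\; \min_{u}\Bigl[\, r(x,u)
  + \bigl(\nabla V^{*}(x)\bigr)^{\top}\bigl(f(x)+g(x)u\bigr) \Bigr],
\qquad
u^{*}(x) \;=\; -\tfrac{1}{2}\,R^{-1} g(x)^{\top}\nabla V^{*}(x).
\]
```

The minimization over u in the second equation yields the closed-form optimal control on the right, which is why solving the HJB equation (or approximating its solution online) is the central computational task.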

The optimal trajectory tracking of nonlinear CT systems involves obtaining a time-varying feedforward term to ensure precise tracking and a feedback term to stabilize the system dynamics. Recent optimal tracking efforts^{[7,8]} have utilized a backstepping-based approach with completely known or partially unknown system dynamics, but the design of the feedforward term while minimizing a cost function has not been addressed; instead, a linear term is used to design the control input. A more recent study^{[8,9]} employed a positive function for obtaining simple weight update laws for the actor and critic NNs, which also relaxes the persistency of excitation (PE) condition. However, finding such a function for the time-varying trajectory tracking problem of a nonlinear CT system is challenging when using an explicitly time-dependent value function and HJB equation at each stage of the backstepping design, since the Hamiltonian is nonzero along the optimal trajectory^{[10]}. In^{[8,11,12]}, simplified and optimized backstepping control schemes were developed for a class of nonlinear strict feedback systems. These approaches differ from the one proposed in^{[5]}; however, they either require complete knowledge of the system dynamics or assume the dynamics are only partially unknown.

Moreover, all control techniques rooted in NN-based learning, whether aimed at regulation or tracking, routinely face the issue of catastrophic forgetting^{[13]}, understood as the system's tendency to lose previously acquired knowledge while assimilating new information^{[13,14]}. Continual lifelong learning (CLL) is the sustained ability of a nonlinear system to acquire, assimilate, and retain knowledge over prolonged periods without the interference of catastrophic forgetting. This concept is particularly critical for online NN control strategies for nonlinear CT systems, as these systems are often tasked with navigating and managing complex processes within dynamic and varying environments and conditions. Nonetheless, the lifelong learning (LL) methodologies shown in^{[13,15]} operate in an offline mode and have not yet been applied to real-time NN control scenarios. This offers a promising direction to leverage the advantage of LL in online control systems, addressing catastrophic forgetting and thus progressively enhancing the efficacy of the control system. Implementing LL-oriented strategies in online NN control enables persistent learning and adaptation without discarding prior knowledge, thereby improving overall performance. By developing an LL-based NN trajectory tracking scheme, it is possible to continuously learn and track trajectories of interest without losing information about previous tasks.

This paper presents an optimal backstepping control approach that incorporates reinforcement learning (RL) to design the controller. The proposed method utilizes an augmented system to address the tracking problem, incorporating both feedforward and feedback controls, which sets it apart from prior work such as^{[8,16]}. This approach uses a trajectory generator to generate the reference trajectories and hence handles the non-stationarity in the HJB equation that arises in optimal tracking problems due to the time-varying reference trajectory. In addition, the proposed weight update laws are directly error driven, obtained using the Hamiltonian and the control input error, in contrast to^{[8,16]}, where the weight update laws are obtained using certain positive functions. Furthermore, the control scheme incorporates an identifier, whose approximation error is bounded above by the system states, to approximate the unknown system dynamics, as opposed to prior work, such as^{[8,16]}, where the system dynamics are either completely or partially known. Additionally, the utilization of an HJB equation at each step of the backstepping process is intended to ensure that the entire sequence of steps is optimized.

The paper also examines the impacts of LL and catastrophic forgetting on control systems and proposes strategies for addressing these challenges in control system-based applications. Specifically, the proposed method employs a weight velocity attenuation (WVA)-based LL scheme in an online manner, in contrast to prior work, such as^{[13,15]}, which utilizes offline learning. Additionally, the proposed method demonstrates the stability of the LL scheme via Lyapunov analysis, in contrast to offline-based learning^{[13,15]}, where weight convergence is not addressed. To validate the effectiveness of the proposed method, an unmanned aerial vehicle (UAV) application is considered, and the proposed method is contrasted with an existing approach. Lyapunov stability analysis shows the uniform ultimate boundedness (UUB) of the overall closed-loop continual lifelong RL (LRL) scheme.

The contributions include

(1) A novel optimal trajectory tracking control formulation is presented, utilizing an augmented system approach for nonlinear strict-feedback systems within an ADP-based framework, offering a novel perspective.

(2) An NN-based identifier is employed, wherein the reconstruction error is presumed to be upper-bounded by the norm of the state vector, providing an enhanced approximation of the system dynamics. The new weight update laws are introduced, incorporating Hamiltonian and the NN identifier within an actor-critic framework at each step of the backstepping process.

(3) An online LL method is developed in the critic NN weight update law, mitigating both catastrophic forgetting and gradient explosion, with the significance of weights for NN layers obtained using Fisher Information Matrix (FIM) determined by the Bellman error, as opposed to offline LL-based methods with targets.

(4) Lyapunov stability analysis is undertaken for the entire closed-loop tracking system, involving the identifier NN and the LL-based actor-critic NN framework to show the UUB of the closed-loop system.

In this section, we provide the problem formulation and the development of our proposed LRL approach for uncertain nonlinear CT systems in strict feedback form.

Consider the following strict feedback system

where



Next, the LRL control design is introduced. The goal of the LRL control scheme is to achieve satisfactory tracking and maintain the boundedness of all closed-loop system signals while minimizing the control effort and addressing the issue of catastrophic forgetting.

The design of the control system begins by implementing an optimal backstepping approach using augmented system-based actor-critic architecture and then using an online LL to mitigate catastrophic forgetting.

To develop optimal control using the backstepping technique, first, a new augmented system is expressed in terms of tracking error as follows. Define the tracking error as

where

In order to get both the feedforward and feedback part of the controller, the tracking problem is changed to a regulation problem by defining a new augmented state as

where
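A common version of this augmented construction, sketched in generic notation that may differ from the paper's exact definitions: with tracking error e = x - x_d and a reference generator for the desired trajectory, the augmented state stacks the error and the reference.

```latex
% Generic augmented reformulation: tracking error e = x - x_d, reference
% generator \dot{x}_d = h(x_d), augmented state X = [e^{\top}, x_d^{\top}]^{\top}
\[
\dot{X} \;=\;
\begin{bmatrix} f(e + x_d) - h(x_d) \\[2pt] h(x_d) \end{bmatrix}
+
\begin{bmatrix} g(e + x_d) \\[2pt] 0 \end{bmatrix} u
\;\triangleq\; F(X) + G(X)\,u .
\]
```

Regulating e within this autonomous augmented system recovers tracking for x, and the time-varying reference no longer appears explicitly in the HJB equation.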

Step 1: For the first backstepping step, let

where

By taking the time derivative on both sides of the optimal performance function (4), the tracking Bellman equation is obtained as

By noting that the first term of (5) is

Therefore, the tracking HJB equation is generated as

where

It is well known that NNs have universal function approximation abilities and can approximate a nonlinear continuous function
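As a minimal numerical sketch of this single-layer approximation property, f(x) ≈ Wᵀσ(x) + ε(x), the snippet below fits fixed random tanh features to sin(x) by least squares; the feature count, scales, and target function are illustrative choices, not from the paper.

```python
# Single-layer (random-feature) function approximation sketch:
# f(x) ~= W^T sigma(x) + eps(x), with a bounded reconstruction error.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-np.pi, np.pi, 200)        # sample points
f = np.sin(x)                              # "unknown" nonlinear continuous function

# Fixed random hidden layer: sigma(x) = tanh(a*x + b)
a = rng.normal(0.0, 2.0, size=50)
b = rng.uniform(-np.pi, np.pi, size=50)
Sigma = np.tanh(np.outer(x, a) + b)        # (200, 50) feature matrix

# Output weights by least squares (the "ideal" weights of the approximation)
W, *_ = np.linalg.lstsq(Sigma, f, rcond=None)
f_hat = Sigma @ W
approx_error = np.max(np.abs(f - f_hat))   # reconstruction error over the grid
```

With only the output layer trained, the approximation is linear in the unknown weights, which is what makes the analytic weight update laws in the sequel tractable.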

Since

where

where the optimal Hamiltonian function

where

where the estimated Hamiltonian function

where

where

Therefore,

where

where

where

since the optimal Hamiltonian value is zero. Notice that the estimated Hamiltonian,

Since

where

where

The HJB equation for step 2 is given by

where

Since

Therefore,

where

where

where

Since

where

where

where

Similarly, the actor NN will be designed to estimate the optimal control as

where

since the optimal Hamiltonian for the second step is zero, notice that the estimated Hamiltonian,

Next, an NN identifier will be used to approximate the unknown dynamics given by (3) and (22).

A single-layer NN is used to approximate both the nonlinear functions^{[17]}. Then, by using

where


^{[18]}, where

Define the dynamics of the NN identifier as

where

where
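A minimal sketch of such an identifier, assuming a scalar toy plant ẋ = f(x) + u with f unknown; the observer gain k, learning rate, basis σ, and plant below are all illustrative choices rather than the paper's design.

```python
# Online NN identifier sketch: xhat_dot = What^T sigma(x) + u + k*(x - xhat),
# with an error-driven gradient-style weight update.
import numpy as np

def sigma(x):                      # simple illustrative basis vector
    return np.array([x, np.tanh(x), 1.0])

dt, k, gamma = 1e-3, 10.0, 5.0     # step size, observer gain, adaptation gain
x, xhat = 1.0, 0.0                 # plant state and identifier state
What = np.zeros(3)                 # identifier NN weights

for i in range(int(20.0 / dt)):
    t = i * dt
    u = np.sin(t)                          # exciting input
    f = -2.0 * x + np.sin(x)               # "unknown" drift dynamics
    xtil = x - xhat                        # identification error
    xhat += dt * (What @ sigma(x) + u + k * xtil)
    What += dt * gamma * sigma(x) * xtil   # error-driven weight update
    x += dt * (f + u)                      # plant (Euler integration)

final_error = abs(x - xhat)
```

The high-gain correction term k(x - xhat) keeps the estimation error small while the weights adapt, mirroring the role of the identifier in supplying the unknown dynamics to the actor-critic updates.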

In this section, the actor-critic weight update laws are obtained by applying the gradient descent method to the Hamiltonian-based performance function. The following lemma is stated.

By using the gradient descent algorithm, the weight update law can be obtained as

On simplifying (42), we obtain the weight update law for the critic NN, as shown in Lemma 1. The weight update law for the actor NN is obtained by defining the performance function as

By using the gradient descent approach, the weight update law for the actor NN is obtained as

On further solving and adding the stabilization terms, we obtain the weight update law shown in Lemma 1.
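To illustrate the gradient-descent mechanism on the Hamiltonian residual, consider a hypothetical scalar linear-quadratic problem (not the paper's backstepping design), where the converged critic weight can be checked against the analytic Riccati value.

```python
# Toy Hamiltonian-residual gradient descent. Plant: xdot = a*x + b*u,
# cost integrand x^2 + u^2, critic V = w*x^2, actor tied to critic: u = -b*w*x.
# At a sample x the HJB residual is
#   delta = x^2 + u^2 + dV/dx*(a*x + b*u) = x^2*(1 + 2*a*w - b^2*w^2),
# and descending delta^2 drives w to the Riccati value (a + sqrt(a^2+b^2))/b^2.
import math

a, b, lr = -1.0, 1.0, 0.01
w = 0.0
for _ in range(5000):
    x = 1.0                                       # fixed sample state (one weight)
    delta = x**2 * (1.0 + 2.0*a*w - (b*w)**2)     # HJB (Hamiltonian) residual
    ddelta_dw = x**2 * (2.0*a - 2.0*b**2*w)       # gradient of residual in w
    w -= lr * 2.0 * delta * ddelta_dw             # descend delta^2

p_star = (a + math.sqrt(a**2 + b**2)) / b**2      # analytic critic weight
```

In the paper's setting the same idea is applied per backstepping step with NN critics, an identifier in place of the known model, and added stabilization terms.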

The following assumption is stated next.


Next, the following theorem is stated.

Next, an online regularization-based approach to LL is introduced.

To mitigate the issue of catastrophic forgetting^{[13]}, a technique called WVA was proposed in^{[15]}. However, WVA has only been used in an offline manner and therefore cannot be applied directly to online NN-based techniques.

In contrast, this study introduces a new online LL technique that can be integrated into an online NN-based trajectory tracking control scheme by identifying and safeguarding the most critical parameters during the optimization process. To achieve this, the proposed technique employs a performance function given by

where

where

where

where

Subsequently, leveraging normalized gradient descent allows us to formulate an additional term in the critic weight update law. This term is derived as follows

For LL, the terms from (49) are combined with the terms from the previously defined update law that is given in Theorem 1. Next, the following theorem is stated.
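A scalar sketch of this importance-weighted protection (EWC/WVA-style; the losses, gains, and λ below are illustrative): an importance value is accumulated online from squared residual gradients during task 1, then penalizes drift of that weight while task 2 is learned.

```python
# Regularization-based continual learning sketch: a Fisher-style importance,
# accumulated online as a running squared gradient, anchors weights learned
# on task 1 while training on task 2.

def train(w, target, steps, lr, fisher=0.0, anchor=0.0, lam=0.0):
    F = 0.0
    for _ in range(steps):
        g = 2.0 * (w - target)                  # gradient of task loss (w - target)^2
        g += 2.0 * lam * fisher * (w - anchor)  # importance-weighted LL penalty
        w -= lr * g
        F += (2.0 * (w - target)) ** 2          # online FIM proxy (squared gradient)
    return w, F / steps

# Task 1: learn w ~ 1 and record its importance
w1, F1 = train(0.0, target=1.0, steps=500, lr=0.05)

# Task 2 without LL: w drifts fully to the new target (forgetting)
w_plain, _ = train(w1, target=3.0, steps=500, lr=0.05)

# Task 2 with the online penalty anchored at the task-1 weight
w_ll, _ = train(w1, target=3.0, steps=500, lr=0.05, fisher=F1, anchor=w1, lam=50.0)

drift_plain = abs(w_plain - w1)
drift_ll = abs(w_ll - w1)
```

In the paper, the same penalty idea enters the critic weight update law through normalized gradient descent, with the importance derived from the Bellman residual rather than a supervised loss.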

This section delineates the outcomes of optimal tracking control founded on LRL, applied to an underactuated UAV system.

Consider the UAV model depicted in

Quadrotor UAV.

The quadrotor dynamics can be modeled by two distinct sets of equations: (1) translational and (2) rotational. These equations are coupled through the rotation matrix, yielding two interconnected subsystems. A holistic control strategy involves both outer and inner loop controls, corresponding to the two subsystems. The outer loop aims to execute positional control by managing the state variables of

Define

Given the underactuated nature of UAV translational dynamics, an intermediate control vector

is introduced for optimal position control derivation, and the translational dynamic can thus be reformulated as

Solving yields the control
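One common form of this intermediate-control construction for quadrotor translational dynamics is sketched below in standard notation; sign and axis conventions vary, and the symbols here may differ from the paper's.

```latex
% Translational dynamics with total thrust T along the body z-axis R e_3:
\[
m\,\ddot{p} \;=\; -m g\, e_3 + T\, R\, e_3 ,
\qquad
u_p \;\triangleq\; \frac{T}{m}\,R\,e_3 - g\,e_3
\;\;\Longrightarrow\;\; \ddot{p} = u_p .
\]
% Solving back for the physical inputs gives the thrust magnitude and the
% desired thrust direction (hence the commanded attitude):
\[
T \;=\; m\,\lVert u_p + g\,e_3 \rVert ,
\qquad
(R\,e_3)_{d} \;=\; \frac{u_p + g\,e_3}{\lVert u_p + g\,e_3 \rVert}.
\]
```

The intermediate vector u_p renders the position loop fully actuated, while the recovered desired thrust direction supplies the reference for the inner attitude loop.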

Using the reference trajectory vector

For the coordinate transformation of rotational dynamic, the transformation relationship between rotational velocity

with
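The transformation referenced here is commonly written in the standard Euler-rate form below, with Φ = [φ, θ, ψ]ᵀ the roll, pitch, and yaw angles and ω the body angular velocity; this is the textbook matrix and may differ from the paper's convention.

```latex
% Euler-rate kinematics (valid away from the singularity \theta = \pm\pi/2):
\[
\dot{\Phi} \;=\; W(\Phi)\,\omega ,
\qquad
W(\Phi) \;=\;
\begin{bmatrix}
1 & \sin\phi\tan\theta & \cos\phi\tan\theta \\
0 & \cos\phi & -\sin\phi \\
0 & \sin\phi/\cos\theta & \cos\phi/\cos\theta
\end{bmatrix}.
\]
```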

Differentiating both sides with respect to time yields the attitude dynamics as

The function

So, the attitude dynamic can be rephrased in strict feedback form as

Reference signals are denoted as

Tracking error variables are designated as

Therefore, using the control law (34) and the weight update laws shown in Theorems 1 and 2 for translational and attitude dynamics will drive the UAV system to track the reference trajectory, as shown in the simulations.

The desired position trajectory for

We consider two task scenarios in which the reference trajectory changes in each task, as if the UAV were moving along a different path or in a different environment. In the simulations, task 1 is shown again to demonstrate that when the UAV returns to task 1, the LL-based control helps mitigate catastrophic forgetting. The proposed method is able to drive the UAV to track the reference trajectory accurately, even across changing tasks. The comparison method^{[9]} is shown in green.

Actual and reference trajectories using the proposed LL-based method.

Tracking performance of position and attitude subsystems using LRL and recent literature (r-lit)^{[9]} methods.

Position and attitude tracking errors using proposed LRL and recent literature (r-lit)^{[9]} methods.

Both the system state tracking plots and the positional error plots show that the method from^{[9]} exhibits higher error, thus demonstrating the need for LL. The literature method, shown in green, has a higher error than the proposed method. Notably, the total average error shown in

Torque inputs and cumulative cost using proposed LRL and recent literature (r-lit)^{[9]} methods.

This paper proposed an innovative LL tracking control technique for uncertain nonlinear CT systems in strict feedback form. The method combined an augmented system, a trajectory generator, and an optimal backstepping approach to design both the feedforward and feedback terms of the tracking scheme. By utilizing a combination of actor-critic NNs and an identifier NN, the method effectively approximated the solution to the HJB equations with unknown nonlinear functions. The use of RL at each step of the backstepping process allowed the development of virtual and actual optimal controllers that can effectively handle the challenges posed by uncertain strict feedback systems. The proposed work highlighted the significance of considering catastrophic forgetting in online controller design and developed a new method to address this issue. Simulation results on a UAV tracking a desired trajectory show acceptable performance. The proposed approach can be extended by using deep NNs for better approximation. In addition, an integral RL (IRL)-based approach can relax the need for knowledge of the drift dynamics. Dynamic surface control can be included to minimize the number of NNs used.

Made substantial contributions to the conception and design of the study: Ganie I, Jagannathan S

Made contributions in writing, reviewing, editing, and methodology: Ganie I, Jagannathan S

Not applicable.

The project or effort undertaken was or is sponsored by the Office of Naval Research Grant N00014-21-1-2232 and Army Research Office Cooperative Agreements W911NF-21-2-0260 and W911NF-22-2-0185.

Both authors declared that there are no conflicts of interest.

Not applicable.

Not applicable.

© The Author(s) 2024.

The time derivative of

where

Considering the first term of (51), we can write it as

Substituting (3), (9) in (52) gives

Substituting the value of

Using

Separating the terms in (55) w.r.t. the actual NN weights and the terms w.r.t. the weight estimation error gives

Substituting

One can further simplify (57), as follows

Using (11), we have

which can be further written as

where

On substituting the value of

which can be further simplified by using the cyclic property of traces as

To simplify, we have

Consider the fourth term of (52),

Therefore, one can further simplify (65) by using Young's inequality in cross-product terms as follows

Consider the fifth term of (52)

Using

Considering the last term, we can write

Using Young's inequality in the cross product terms, we can write

Combining (60), (64), (66) and (69) and simplifying, we have

where

where

This demonstrates that the overall closed-loop system is bounded. Since

This is the final step. Consider the Lyapunov function as follows

The time derivative of

Let

Considering the second term of (74), substituting (3) in

which on further solving leads to

Using

Separating the terms in (77) w.r.t. the actual NN weights and the terms w.r.t. the weight estimation error gives

Substituting

One can further simplify (79), as follows

Using (11), we have

which can be further written as

where

Consider the second and third terms of (52)

On substituting the value of

which can be further simplified by using the cyclic property of traces as

To simplify, we have

Consider the fourth term of (52),

Therefore, one can further simplify (87) by using Young's inequality in cross-product terms as follows

Consider the fifth term of (52)

Using

Considering the last term, we can write

Using Young's inequality in the cross product terms, we can write

Combining (82), (86), (88) and (91) and simplifying, we have

where

This demonstrates that the overall closed-loop system is bounded. Since

The convergence of weights for Task 1 remains in alignment with Theorem 1. For Task 2, an additional term emerges in the Lyapunov proof (92) due to the regularization penalty, denoted as

where

Substituting

Employing Young's inequality to the first and third terms of (95), we get

Substituting

Thus, the integration of this term into the proof solely modifies the error bound to

The aggregate contribution to the error bounds is calculated by adding

McLain TW, Beard RW. Successive Galerkin approximations to the nonlinear optimal control of an underwater robotic vehicle. In: Proceedings of the 1998 IEEE International Conference on Robotics and Automation (Cat. No. 98CH36146); Leuven, Belgium; 20 May 1998.

10.1109/ROBOT.1998.677069

Bryson AE. Applied optimal control: optimization, estimation and control. New York: Routledge; 1975. p. 496.

10.1201/9781315137667

Kutalev A, Lapina A. Stabilizing elastic weight consolidation method in practical ML tasks and using weight importances for neural network pruning.