
platoon problem. However, despite the DDPG algorithm's success in the literature, the algorithm still faces instability challenges, along with a time-consuming hyper-parameter tuning process to account for the minute differences in vehicle models/dynamics amongst platoons. As previously discussed, FRL provides advantages in these areas: information sharing can accelerate training and improve the performance of the system as a whole. In addition, the ability to share experience across like models has been shown to allow fast convergence, which further improves the performance of DDPG when applied to AV platoons [5].


1.2. Contributions
To the best of our knowledge, no works existed at the time of this study covering the specific topic of FRL applied to platoon control. Many of the existing works on FRL demonstrate its benefits, namely an increased rate of convergence and improved overall system performance, in distributed networks, edge caching and communications [16–19]. Of the works cited in this study, those most closely related to FRL for platoon control are Liang et al. [15] and Peake et al. [23]. In contrast to Liang et al., where FedAvg is applied successfully to control the steering angle of a single vehicle, we apply FRL to an AV platooning problem in which the positions and spacing of multiple vehicles must be controlled [15]. Peake et al. explore multi-agent reinforcement learning and its ability to improve the performance of AV platoons experiencing communication delays [23]; although their approach is also successful, their paper makes no specific reference to FRL. In addition, existing works on FRL typically aggregate either gradients or model weights. This study explores how both aggregation methods can benefit the AV platooning problem and, most importantly, which provides the better result. Finally, this study further distinguishes its approach from existing literature by defining two possible ways to apply FRL to AV platooning (a sketch of the aggregation schemes follows the list below):
1. Intra-FRL: multi-vehicle platoons share data during training to increase the performance of vehicles within the same platoon.
2. Inter-FRL: multi-vehicle platoons share data during training across platoons, amongst vehicles in the same platoon position, to increase performance.
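
To make the distinction concrete, the following minimal sketch (hypothetical data layout and function names, assuming each vehicle's model is reduced to a flattened parameter vector) illustrates FedAvg-style aggregation and the Inter-FRL grouping, in which vehicles occupying the same position in different platoons are averaged together:

    import numpy as np

    def fedavg(param_list):
        # Element-wise mean of parameter vectors; the same operation
        # applies whether the vectors hold model weights or gradients.
        return np.mean(param_list, axis=0)

    # Hypothetical layout: params[k][i] is the parameter vector of
    # vehicle i in platoon k.
    def inter_frl_aggregate(params):
        """Inter-FRL: average across platoons, amongst vehicles
        occupying the same platoon position."""
        n_positions = len(params[0])
        return [fedavg([platoon[i] for platoon in params])
                for i in range(n_positions)]

    # Example: 2 platoons of 3 vehicles, 4 parameters per model.
    params = [[np.random.rand(4) for _ in range(3)] for _ in range(2)]
    shared = inter_frl_aggregate(params)  # one vector per position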

In contrast to existing literature, where it is common to average the parameters across each model in the system, for Intra-FRL we propose a directional averaging in which follower vehicles incorporate the preceding vehicles' parameters in the computation of their gradients or weights. Thus, in Intra-FRL, the leading vehicle trains independently of those following it. The AV platoon provides a unique environment for evaluating the suitability of FRL as a means to improve a system's convergence rate and overall performance.
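
A minimal sketch of the directional averaging described above (reusing the hypothetical fedavg helper from the previous sketch; we read "preceding vehicles' parameters" here as all vehicles ahead of a follower, which is one possible realization):

    def intra_frl_directional(platoon_params):
        """Directional Intra-FRL averaging within a single platoon.
        platoon_params[0] belongs to the leader, which trains
        independently; each follower averages its own parameters with
        those of the vehicles ahead of it."""
        aggregated = [platoon_params[0]]  # leader: no aggregation
        for i in range(1, len(platoon_params)):
            # Follower i incorporates vehicles 0..i-1 plus its own model.
            aggregated.append(fedavg(platoon_params[:i + 1]))
        return aggregated

Unlike plain FedAvg, information flows only from front to back, so a follower's updates never perturb the leader's policy.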





2. PROPOSED FRAMEWORK
In this section, a state space model is formulated for the AV platooning problem. Next, the MDP model is presented, outlining the platoon system's state space, action space and reward function. Lastly, the design of the FRL DDPG algorithm and its application to AV platooning are described.



2.1. CACC CTHP model formulation
Consider a platoon $P$ of vehicles $V = \{v_1, v_2, \dots, v_n\}$, where the leader of the platoon is $v_1$.
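
Before deriving the state space model, it may help to recall the constant time headway policy (CTHP) that underlies CACC spacing control. In a generic form (illustrative notation only, not necessarily the exact symbols this paper adopts):

$$ d_{r,i}(t) = r_i + h_i\,v_i(t), \qquad e_{p,i}(t) = d_i(t) - d_{r,i}(t), $$

where $d_i(t)$ is the actual gap between vehicle $i$ and its predecessor, $r_i$ is the standstill distance, $h_i$ is the time headway, $v_i(t)$ is the velocity of vehicle $i$, and $e_{p,i}(t)$ is the spacing error the controller drives to zero.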