
platoon problem. However, despite the DDPG algorithm's success in the literature, the algorithm still faces instability challenges, along with a time-consuming hyper-parameter tuning process to account for the minute differences in vehicle models/dynamics amongst platoons. As previously discussed, FRL provides advantages in these areas: information sharing can accelerate training and improve the performance of the system as a whole. In addition, the ability to share experience across like models has been shown to allow fast convergence, which further improves the performance of DDPG when applied to AV platoons [5].


1.2. Contributions
To the best of our knowledge, no works existed at the time of this study covering the specific topic of FRL applied to platoon control. Many of the existing works on FRL demonstrate its benefits, namely an increased rate of convergence and improved overall system performance, in distributed networks, edge caching and communications [16–19]. Of the works cited in this study, those most closely related to FRL for platoon control are Liang et al. [15] and Peake et al. [23]. In contrast to Liang et al., where FedAvg is applied successfully to control the steering angle of a single vehicle, we apply FRL to an AV platooning problem in which the positions and spacing of multiple vehicles must be controlled [15]. Peake et al. explore multi-agent reinforcement learning and its ability to improve the performance of AV platoons experiencing communication delays [23]; although their approach is also successful, their paper makes no specific reference to FRL. In addition, existing works on FRL typically aggregate either gradients or model weights. This study explores how both aggregation methods can benefit the AV platooning problem and, most importantly, which provides the better result. Finally, this study further distinguishes its approach from existing literature by defining two possible ways to apply FRL to AV platooning (a sketch of the aggregation schemes follows the list below):
1. Intra-FRL: multi-vehicle platoons share data during training to increase the performance of vehicles within the same platoon.
2. Inter-FRL: multi-vehicle platoons share data during training across platoons, amongst vehicles in the same platoon position, to increase performance.
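
To make the distinction concrete, the following minimal sketch (hypothetical data layout and function names, assuming each vehicle's model is reduced to a flattened parameter vector) illustrates FedAvg-style aggregation and the Inter-FRL grouping, in which vehicles occupying the same position in different platoons are averaged together:

    import numpy as np

    def fedavg(param_list):
        # Element-wise mean of parameter vectors; the same operation
        # applies whether the vectors hold model weights or gradients.
        return np.mean(param_list, axis=0)

    # Hypothetical layout: params[k][i] is the parameter vector of
    # vehicle i in platoon k.
    def inter_frl_aggregate(params):
        """Inter-FRL: average across platoons, amongst vehicles
        occupying the same platoon position."""
        n_positions = len(params[0])
        return [fedavg([platoon[i] for platoon in params])
                for i in range(n_positions)]

    # Example: 2 platoons of 3 vehicles, 4 parameters per model.
    params = [[np.random.rand(4) for _ in range(3)] for _ in range(2)]
    shared = inter_frl_aggregate(params)  # one vector per position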

In contrast to existing literature, where it is common to average the parameters across each model in the system, for Intra-FRL we propose a directional averaging in which follower vehicles incorporate the preceding vehicles' parameters in the computation of their gradients or weights. Thus, in Intra-FRL, the leading vehicle trains independently of those following it. The AV platoon provides a unique environment for evaluating the suitability of FRL as a means to improve a system's convergence rate and overall performance.
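
A minimal sketch of the directional averaging described above (reusing the hypothetical fedavg helper from the previous sketch; we read "preceding vehicles' parameters" here as all vehicles ahead of a follower, which is one possible realization):

    def intra_frl_directional(platoon_params):
        """Directional Intra-FRL averaging within a single platoon.
        platoon_params[0] belongs to the leader, which trains
        independently; each follower averages its own parameters with
        those of the vehicles ahead of it."""
        aggregated = [platoon_params[0]]  # leader: no aggregation
        for i in range(1, len(platoon_params)):
            # Follower i incorporates vehicles 0..i-1 plus its own model.
            aggregated.append(fedavg(platoon_params[:i + 1]))
        return aggregated

Unlike plain FedAvg, information flows only from front to back, so a follower's updates never perturb the leader's policy.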





2. PROPOSED FRAMEWORK
In this section, a state space model is formulated for the AV platooning problem. Next, the MDP model is presented, outlining the platoon system's state space, action space and reward function. Lastly, the design of the FRL DDPG algorithm and its application to AV platooning are described.



2.1. CACC CTHP model formulation
Consider a platoon $P$ of vehicles $V = \{v_1, v_2, \dots, v_n\}$, where the leader of the platoon is $v_1$.
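
Before deriving the state space model, it may help to recall the constant time headway policy (CTHP) that underlies CACC spacing control. In a generic form (illustrative notation only, not necessarily the exact symbols this paper adopts):

$$ d_{r,i}(t) = r_i + h_i\,v_i(t), \qquad e_{p,i}(t) = d_i(t) - d_{r,i}(t), $$

where $d_i(t)$ is the actual gap between vehicle $i$ and its predecessor, $r_i$ is the standstill distance, $h_i$ is the time headway, $v_i(t)$ is the velocity of vehicle $i$, and $e_{p,i}(t)$ is the spacing error the controller drives to zero.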