

               1. INTRODUCTION
In recent years, federated learning (FL) and its extension, federated reinforcement learning (FRL), have become a popular topic of discussion in the artificial intelligence (AI) community. The concept of FL was first proposed by Google with the development of the federated averaging (FedAvg) aggregation method [1]. FedAvg provided an increase in the performance of distributed systems while also providing privacy advantages when compared to centralized architectures for supervised machine learning (ML) tasks [1–3]. FL's core ideology was initially motivated by the need to train ML models from distributed data sets across mobile devices while minimizing data leakage and network usage [1].
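
For concreteness, a minimal sketch of the FedAvg aggregation step is given below: each client's locally trained parameters are averaged, weighted by local data-set size, to form the next global model. The function name and data layout are illustrative assumptions on our part, not the reference implementation of [1].

    import numpy as np

    def fedavg(client_weights, client_sizes):
        # client_weights: one list of per-layer parameter arrays per client.
        # client_sizes: number of local training samples per client, used
        # as the aggregation weights (the FedAvg weighting scheme).
        total = sum(client_sizes)
        coeffs = [n / total for n in client_sizes]
        # Element-wise weighted sum of each layer across all clients.
        return [
            sum(c * layers[i] for c, layers in zip(coeffs, client_weights))
            for i in range(len(client_weights[0]))
        ]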

Research on the topics of reinforcement learning (RL) and deep reinforcement learning (DRL) has made great progress over the years; however, there remain important challenges for ensuring the stable performance of DRL algorithms in the real world. DRL processes are often sensitive to small changes in the model space or hyper-parameter space, and as such the application of a single trained model across similar systems often leads to control inaccuracies or instability [4,5]. To overcome the stability challenges that DRL poses, a model must often be manually customized to accommodate the subtle differences amongst similar agents in a distributed system. FRL aims to overcome these issues by allowing agents to share private information in a secure way. By utilizing an aggregation method such as FedAvg [1], systems with many agents can achieve decreased training times and increased performance.


Despite the popularity of FL and FRL, to the best of our knowledge at the time of this study, there are no works applying FRL to platoon control. In general, there are two types of "models" for AV decision making: vehicle-following modeling and lane-changing modeling [6]. For the purposes of this study, the vehicle-following approach known as co-operative adaptive cruise control (CACC) is explored. Vehicle-following models are based on following a vehicle on a single-lane road with respect to a leading vehicle's actions [7]. CACC is a multi-vehicle control strategy where vehicles follow one another in a line known as a platoon, while simultaneously transmitting vehicle data amongst each other [8]. CACC platoons have been proven to improve traffic flow stability, throughput and safety for occupants [8,9]. Traditionally controlled vehicle-following models have limited accuracy, poor generalization from a lack of data, and a lack of adaptive updating [7].
We are motivated by the current state-of-the-art for CACC AV platoons, along with previous works related to FRL, to apply FRL to the AV platooning problem and observe the performance benefits it may have on the system. We propose an FRL framework built atop a custom AV platooning environment in order to analyse FRL's suitability for improving AV platoon performance. In addition, two approaches are proposed for applying FRL amongst AV platoons. The first proposed method is inter-platoon FRL (Inter-FRL), where FRL is applied to AVs across different platoons. The second proposed method is intra-platoon FRL (Intra-FRL), where FRL is applied to AVs within the same platoon. We investigate the possibility of Inter-FRL and Intra-FRL as a means to increase performance using two aggregation methods: averaging model weights and averaging gradients. Furthermore, the performance of Inter-FRL and Intra-FRL using both aggregation methods is studied relative to platooning environments trained without FRL (no-FRL). Finally, we compare the performance of Intra-FRL with weight averaging (Intra-FRLWA) against a platooning environment trained without FRL for platoons of length 3, 4 and 5 vehicles.
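
As a rough illustration of the two aggregation variants, the sketch below averages either the agents' model weights or their gradients. The function names, the plain gradient-descent update, and the synchronous aggregation schedule are our own simplifying assumptions, not the implementation used in this paper.

    import numpy as np

    def aggregate_weights(agent_params):
        # Weight averaging: element-wise mean of each layer's parameters
        # across agents; the result replaces every agent's local model.
        return [np.mean(np.stack(layers), axis=0)
                for layers in zip(*agent_params)]

    def aggregate_gradients(params, agent_grads, lr=1e-3):
        # Gradient averaging: agents' gradients are averaged per layer,
        # then one descent step is applied to a shared parameter set.
        mean_grads = [np.mean(np.stack(g), axis=0) for g in zip(*agent_grads)]
        return [p - lr * g for p, g in zip(params, mean_grads)]

    # Toy usage: three agents, each with one weight matrix and one bias.
    rng = np.random.default_rng(0)
    agent_params = [[rng.normal(size=(4, 2)), rng.normal(size=2)]
                    for _ in range(3)]
    shared = aggregate_weights(agent_params)           # broadcast to all agents
    agent_grads = [[rng.normal(size=(4, 2)), rng.normal(size=2)]
                   for _ in range(3)]
    shared = aggregate_gradients(shared, agent_grads)  # one averaged update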



1.1. Related works
In this subsection, the current state-of-the-art is presented for FRL and DRL applied to AVs. In addition, the contributions of this paper are presented.