1. INTRODUCTION
In recent years, federated learning (FL) and its extension, federated reinforcement learning (FRL), have become a popular topic of discussion in the artificial intelligence (AI) community. The concept of FL was first proposed by Google with the development of the federated averaging (FedAvg) aggregation method [1]. FedAvg improved the performance of distributed systems while also providing privacy advantages compared to centralized architectures for supervised machine learning (ML) tasks [1-3]. FL's core ideology was initially motivated by the need to train ML models on data sets distributed across mobile devices while minimizing data leakage and network usage [1].
Research on reinforcement learning (RL) and deep reinforcement learning (DRL) has made great progress over the years; however, important challenges remain in ensuring the stable performance of DRL algorithms in the real world. DRL processes are often sensitive to small changes in the model space or hyper-parameter space, and as such, applying a single trained model across similar systems often leads to control inaccuracies or instability [4,5]. To overcome the stability challenges that DRL poses, a model must often be manually customized to accommodate the subtle differences amongst similar agents in a distributed system. FRL aims to overcome these issues by allowing agents to share private information in a secure way. By utilizing an aggregation method such as FedAvg [1], systems with many agents can achieve decreased training times and increased performance.
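To make the aggregation step concrete, the following is a minimal sketch of FedAvg-style weight averaging in Python. The function name, the layer-list representation of a model, and the optional sample-count weighting are illustrative assumptions, not the exact formulation used in this paper.

```python
import numpy as np

def fedavg(agent_weights, sample_counts=None):
    """FedAvg-style aggregation: average each layer's parameters
    across agents, optionally weighted by local sample counts.

    agent_weights: list of per-agent parameter lists (one ndarray per layer).
    sample_counts: optional list of local data set sizes; if omitted,
    a plain uniform average is used (an illustrative simplification).
    """
    n_agents = len(agent_weights)
    if sample_counts is None:
        coeffs = [1.0 / n_agents] * n_agents
    else:
        total = float(sum(sample_counts))
        coeffs = [c / total for c in sample_counts]
    # Weighted sum, layer by layer, producing the new global model.
    return [
        sum(coeff * layers[i] for coeff, layers in zip(coeffs, agent_weights))
        for i in range(len(agent_weights[0]))
    ]

# Example: three agents, each holding a two-layer model.
agents = [[np.random.randn(4, 4), np.random.randn(4)] for _ in range(3)]
global_model = fedavg(agents, sample_counts=[100, 50, 50])
```

Each agent then continues training from the averaged global model, so no raw local data ever leaves the device.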
Despite the popularity of FL and FRL, to the best of our knowledge at the time of this study, there are no works applying FRL to platoon control. In general, there are two types of "models" for AV decision making: vehicle-following modeling and lane-changing modeling [6]. For the purposes of this study, the vehicle-following approach known as co-operative adaptive cruise control (CACC) is explored. Vehicle-following models are based on following a vehicle on a single-lane road with respect to a leading vehicle's actions [7]. CACC is a multi-vehicle control strategy in which vehicles follow one another in a line, known as a platoon, while simultaneously transmitting vehicle data amongst each other [8]. CACC platoons have been proven to improve traffic flow stability, throughput, and safety for occupants [8,9]. Traditionally controlled vehicle-following models have limited accuracy, generalize poorly due to a lack of data, and lack adaptive updating [7].
We are motivated by the current state-of-the-art for CACC AV platoons, along with previous works related to FRL, to apply FRL to the AV platooning problem and observe the performance benefits it may provide. We propose an FRL framework built atop a custom AV platooning environment in order to analyse FRL's suitability for improving AV platoon performance. In addition, two approaches are proposed for applying FRL amongst AV platoons. The first proposed method is inter-platoon FRL (Inter-FRL), where FRL is applied to AVs across different platoons. The second proposed method is intra-platoon FRL (Intra-FRL), where FRL is applied to AVs within the same platoon. We investigate the potential of Inter-FRL and Intra-FRL to increase performance using two aggregation methods: averaging model weights and averaging gradients (see the sketch below). Furthermore, the performance of Inter-FRL and Intra-FRL using both aggregation methods is studied relative to platooning environments trained without FRL (no-FRL). Finally, we compare the performance of Intra-FRL with weight averaging (Intra-FRLWA) against a platooning environment trained without FRL for platoons of length 3, 4 and 5 vehicles.
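For intuition on the two aggregation variants, the sketch below contrasts them on a shared parameter vector. The plain gradient-descent update and the learning rate are illustrative assumptions, not the exact training loop used in our experiments.

```python
import numpy as np

def aggregate_weights(local_weights):
    # Weight averaging: each agent trains locally for some interval,
    # then the global model is set to the element-wise mean of the
    # local parameter vectors.
    return np.mean(local_weights, axis=0)

def aggregate_gradients(global_weights, local_grads, lr=0.01):
    # Gradient averaging: agents report gradients instead of weights;
    # the global model takes one descent step along the mean gradient.
    return global_weights - lr * np.mean(local_grads, axis=0)

# Toy usage with 3 agents and a 5-parameter model.
rng = np.random.default_rng(0)
local_models = [rng.normal(size=5) for _ in range(3)]
local_grads = [rng.normal(size=5) for _ in range(3)]
w_global = aggregate_weights(local_models)
w_global = aggregate_gradients(w_global, local_grads)
```

The key design difference is the communication granularity: weight averaging synchronizes trained parameters at coarser intervals, while gradient averaging couples the agents at every update step.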
1.1. Related works
In this subsection, the current state-of-the-art is presented for FRL and DRL applied to AVs. In addition, the contributions of this paper are presented.