mainly concerned with privacy leakage and is employed by RADC and GADC. Note that the DDPG algorithm
is applied for local anomaly detection model training.
5.5. FRL for other applications
Owing to its advantages in training efficiency and privacy protection, many researchers are exploring possible applications of FRL.
FRL has been applied to realize distributed energy management in IoT applications. In the smart home revolution, smart meters are deployed in the advanced metering infrastructure (AMI) to monitor and analyze the
energy consumption of users in real time. As an example [110], an FRL-based approach is proposed for the
energy management of multiple smart homes with solar PVs, home appliances, and energy storage. Multiple
local home energy management systems (LHEMSs) and a global server (GS) make up the FRL architecture of
the smart home. DRL agents for LHEMSs construct and upload local models to the GS by using energy
consumption data. The GS updates the global model based on local models of LHEMSs using the federated
stochastic gradient descent (FedSGD) algorithm. Under heterogeneous home environments, simulation results indicate that the proposed approach outperforms others in terms of convergence speed, appliance energy consumption, and the number of agents.
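To make the aggregation step concrete, the following is a minimal sketch of a FedSGD-style update at the GS, assuming each LHEMS reports the gradient of its local DRL loss together with its local sample count; the function name fedsgd_step, the learning rate, and the toy numbers are illustrative assumptions and are not taken from [110].

```python
import numpy as np

def fedsgd_step(global_weights, local_gradients, sample_counts, lr=0.01):
    """One FedSGD round at the global server (GS).

    Each LHEMS k sends the gradient g_k of its local DRL loss, computed on
    n_k energy-consumption samples. The GS forms the sample-weighted average
    gradient and applies a single gradient-descent update to the global model.
    """
    total = float(sum(sample_counts))
    # Weighted average of local gradients, weight of LHEMS k proportional to n_k.
    avg_grad = sum((n / total) * g for g, n in zip(local_gradients, sample_counts))
    return global_weights - lr * avg_grad

# Illustrative round with three LHEMSs and a 4-parameter global model.
w_global = np.zeros(4)
grads = [np.array([0.2, -0.1, 0.0, 0.3]),
         np.array([0.1,  0.0, 0.2, 0.1]),
         np.array([0.3, -0.2, 0.1, 0.0])]
counts = [120, 80, 200]          # energy-consumption samples per home (hypothetical)
w_global = fedsgd_step(w_global, grads, counts)
```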
Moreover, FRL offers an alternative way to share information with low latency and privacy preservation. The col-
laborative perception of vehicles provided by IoV can greatly enhance the ability to sense things beyond their
line of sight, which is important for autonomous driving. Region quadtrees have been proposed as a storage
and communication resource-saving solution for sharing perception information [111] . It is challenging to tai-
lor the number and resolution of transmitted quadtree blocks to bandwidth availability. In the framework of
FRL, Mohamed et al. [112] present a quadtree-based point cloud compression mechanism to select cooperative perception messages. Specifically, over a period of time, each vehicle covered by an RSU shares its latest network weights with the RSU, which then averages all of the received model parameters and broadcasts the result back to the vehicles. Each vehicle's main objective is to select the optimal sensory information to transmit (i.e., quadtree blocks) and the appropriate resolution level for a given vehicle pair. The dueling and branching
concepts are also applied to overcome the vast action space inherent in the formulation of the RL problem.
Simulation results show that the learned policies achieve higher vehicular satisfaction and that the training process is enhanced by FRL.
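As a concrete illustration of the region-quadtree representation in [111], the following is a minimal sketch that summarizes a vehicle's occupancy grid as a small set of homogeneous blocks, which is what makes the representation storage- and communication-efficient; the grid size, the block tuple format, and the function name quadtree_blocks are illustrative assumptions and do not reproduce the exact encoding or message-selection policy of [112].

```python
import numpy as np

def quadtree_blocks(grid, x0, y0, size, max_depth, depth=0):
    """Recursively split a square occupancy grid into homogeneous blocks.

    Returns a list of (x, y, size, occupied) tuples; mixed regions are split
    until max_depth, so empty areas are summarized by a few large blocks.
    """
    region = grid[y0:y0 + size, x0:x0 + size]
    if depth == max_depth or region.min() == region.max():
        return [(x0, y0, size, bool(region.max()))]
    half = size // 2
    blocks = []
    for dx, dy in [(0, 0), (half, 0), (0, half), (half, half)]:
        blocks += quadtree_blocks(grid, x0 + dx, y0 + dy, half,
                                  max_depth, depth + 1)
    return blocks

# Toy 8x8 occupancy grid derived from a vehicle's point cloud projection.
grid = np.zeros((8, 8), dtype=np.uint8)
grid[0:2, 0:2] = 1                      # a detected object in one corner
blocks = quadtree_blocks(grid, 0, 0, 8, max_depth=3)
print(len(blocks), "blocks instead of", grid.size, "cells")
```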
5.6. Lessons learned
In the following, we summarize the major lessons learned from this survey in order to provide a comprehensive understanding of current research on FRL applications.
5.6.1. Lessons learned from the aggregation algorithms
The existing FRL literature usually uses classical DRL algorithms, such as DQN and DDPG, on the participant side,
while the gradients or parameters of the critic and/or actor networks are periodically reported synchronously
or asynchronously by the participants to the coordinator. The coordinator then aggregates the parameters
or gradients and sends the updated values to the participants. In order to meet the challenges presented by
different scenarios, the aggregation algorithms have been designed as a key feature of FRL. In the original
FedAvg algorithm [12] , the number of samples in a participant’s dataset determines its influence on the global
model. In accordance with this idea, several papers propose different methods to calculate the weights in the
aggregation algorithms according to the requirements of the application. In the study from Lim et al. [106], the aggregation weight is derived from the average of the cumulative rewards of the last ten episodes, so that greater weights are placed on the models of those participants with higher rewards.
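The following is a minimal sketch of such reward-weighted aggregation in the spirit of [106], assuming each participant reports its parameter vector and its cumulative rewards over the last ten episodes; the way the reward scores are normalized into weights (shifting by the minimum and dividing by the sum) is an illustrative assumption rather than the exact scheme of [106].

```python
import numpy as np

def reward_weighted_aggregate(local_models, recent_returns):
    """Aggregate participants' parameters with reward-derived weights.

    local_models   : list of 1-D parameter vectors, one per participant
    recent_returns : list of per-participant cumulative rewards over the
                     last ten episodes
    Participants with higher average returns get larger aggregation weights.
    """
    scores = np.array([np.mean(r) for r in recent_returns], dtype=float)
    scores = scores - scores.min()          # keep the weights non-negative
    if scores.sum() > 0:
        weights = scores / scores.sum()
    else:                                   # all equal: fall back to uniform weights
        weights = np.full(len(scores), 1.0 / len(scores))
    return sum(w * m for w, m in zip(weights, local_models))

# Three participants: the third performed best recently, so it dominates.
models = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([0.5, 0.5])]
returns = [[2, 3, 1, 2, 2, 3, 2, 2, 1, 2],
           [4, 5, 4, 4, 5, 5, 4, 4, 5, 4],
           [9, 8, 9, 9, 8, 9, 9, 8, 9, 9]]
global_model = reward_weighted_aggregate(models, returns)
```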
In contrast to this positive correlation with reward, Huang et al. [96] take the error rate of actions as an essential factor when assigning weights to participants in the global model training. In D2D-assisted edge caching, Wang et al. [89] use the reward and some