
Page 47                                                                  Qi et al. Intell Robot 2021;1(1):18-57  I

               mainly concerned with privacy leakage and is employed by RADC and GADC. Note that the DDPG algorithm
               is applied for local anomaly detection model training.

               5.5. FRL for other applications
               Owing to its strong training efficiency and privacy protection, many researchers are exploring the
               possible applications of FRL.

               FL has been applied to realize distributed energy management in IoT applications. In the revolution of smart
               home, smart meters are deployed in the advanced metering infrastructure (AMI) to monitor and analyze the
               energy consumption of users in real time. For example, an FRL-based approach [110] is proposed for the
               energy management of multiple smart homes with solar PVs, home appliances, and energy storage. Multiple
               local home energy management systems (LHEMSs) and a global server (GS) make up the FRL architecture of
               the smart home. DRL agents for LHEMSs construct and upload local models to the GS by using energy
               consumption data. The GS updates the global model based on local models of LHEMSs using the federated
               stochastic gradient descent (FedSGD) algorithm. Under heterogeneous home environments, simulation results
               indicate that the proposed approach outperforms others when it comes to convergence speed, appliance energy
               consumption, and the number of agents.
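The server-side update described above can be sketched as a single FedSGD round. This is a minimal illustration, not the implementation of [110]: it assumes each LHEMS uploads a gradient vector together with its local sample count, and the function name `fedsgd_step`, the learning rate, and the sample-count weighting are all hypothetical choices made for the example.

```python
from typing import List

Vector = List[float]


def fedsgd_step(
    global_weights: Vector,
    local_gradients: List[Vector],
    sample_counts: List[int],
    lr: float = 0.1,
) -> Vector:
    """One FedSGD round at the global server (illustrative sketch).

    The server averages the uploaded gradients, weighting each participant
    by its share of the total samples (an assumption of this sketch), and
    then takes one gradient-descent step on the global model.
    """
    if not local_gradients or len(local_gradients) != len(sample_counts):
        raise ValueError("need one sample count per uploaded gradient")
    total = sum(sample_counts)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    dim = len(global_weights)
    avg_grad = [0.0] * dim
    for grad, n in zip(local_gradients, sample_counts):
        if len(grad) != dim:
            raise ValueError("gradient and model dimensions differ")
        for i, g in enumerate(grad):
            avg_grad[i] += (n / total) * g

    # Single descent step on the aggregated gradient.
    return [w - lr * g for w, g in zip(global_weights, avg_grad)]


if __name__ == "__main__":
    # Two hypothetical homes with 1 and 3 local samples.
    new_global = fedsgd_step(
        [1.0, 2.0], [[1.0, 0.0], [0.0, 2.0]], [1, 3], lr=0.5
    )
    print(new_global)
```

In this sketch, participants with more local data pull the global model further; an unweighted mean would be obtained by passing equal sample counts.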

               Moreover, FRL offers an alternative to share information with low latency and privacy preservation. The col-
               laborative perception of vehicles provided by IoV can greatly enhance the ability to sense things beyond their
               line of sight, which is important for autonomous driving. Region quadtrees have been proposed as a storage
               and communication resource-saving solution for sharing perception information [111] . It is challenging to tai-
               lor the number and resolution of transmitted quadtree blocks to bandwidth availability. In the framework of
               FRL, Mohamed et al. [112] present a quadtree-based point cloud compression mechanism to select cooperative
               perception messages. Specifically, over a period of time, each vehicle covered by an RSU transfers its latest
               network weights to the RSU, which then averages all of the received model parameters and broadcasts the
               result back to the vehicles. Optimal sensory information transmission (i.e., quadtree blocks) and appropri-
               ate resolution levels for a given vehicle pair are the main objectives of a vehicle. The dueling and branching
               concepts are also applied to overcome the vast action space inherent in the formulation of the RL problem.
               Simulation results show that the learned policies achieve higher vehicular satisfaction and the training process
               is enhanced by FRL.
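The RSU-side step described above (average the received parameters, then broadcast the result) can be sketched as follows. This is an illustration under stated assumptions rather than the code of [112]: models are represented as dictionaries of flat parameter lists, the average is an unweighted element-wise mean, and the names `rsu_average` and `broadcast` are hypothetical.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]


def rsu_average(vehicle_weights: List[Weights]) -> Weights:
    """Element-wise mean of the models received from the covered vehicles."""
    if not vehicle_weights:
        raise ValueError("no vehicle models received")

    averaged: Weights = {}
    for key in vehicle_weights[0]:
        layers = [w[key] for w in vehicle_weights]
        length = len(layers[0])
        if any(len(layer) != length for layer in layers):
            raise ValueError(f"shape mismatch for parameter '{key}'")
        # zip(*layers) groups the same parameter position across vehicles.
        averaged[key] = [sum(vals) / len(layers) for vals in zip(*layers)]
    return averaged


def broadcast(global_weights: Weights, num_vehicles: int) -> List[Weights]:
    """Return an independent copy of the averaged model for each vehicle."""
    return [
        {key: list(values) for key, values in global_weights.items()}
        for _ in range(num_vehicles)
    ]


if __name__ == "__main__":
    # Two hypothetical vehicles, each with one parameter group "q".
    received = [{"q": [1.0, 3.0]}, {"q": [3.0, 5.0]}]
    global_model = rsu_average(received)
    copies = broadcast(global_model, num_vehicles=2)
    print(global_model, len(copies))
```

Each vehicle would then continue local training from its copy until the next aggregation period.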

               5.6. Lessons Learned
               In the following, we summarize the major lessons learned from this survey in order to provide a comprehensive
               understanding of current research on FRL applications.

               5.6.1. Lessons learned from the aggregation algorithms
               The existing FRL literature usually uses classical DRL algorithms, such as DQN and DDPG, at the participants,
               while the gradients or parameters of the critic and/or actor networks are periodically reported synchronously
               or asynchronously by the participants to the coordinator. The coordinator then aggregates the parameters
               or gradients and sends the updated values to the participants. In order to meet the challenges presented by
               different scenarios, the aggregation algorithms have been designed as a key feature of FRL. In the original
               FedAvg algorithm [12] , the number of samples in a participant’s dataset determines its influence on the global
               model. In accordance with this idea, several papers propose different methods to calculate the weights in the
               aggregation algorithms according to the requirement of application. In the study from Lim et al. [106], the
               aggregation weight is derived from the average of the cumulative rewards of the last ten episodes. Greater weights
               are placed on the models of those participants with higher rewards. In contrast to the positive correlation
               of reward, Huang et al. [96] take the error rate of action as an essential factor to assign weights for
               participating in the global model training. In D2D-assisted edge caching, Wang et al. [89] use the reward and some