device-related indicators as the measurement to evaluate the local model's contribution to the global model. Moreover, the existing FRL methods based on off-policy DRL algorithms, such as DQN and DDPG, usually use experience replay. Sampling random batches from the replay memory breaks the correlations between consecutive transition tuples and accelerates the training process. To arrive at an accurate evaluation of the participants, the paper [102] calculates the aggregation weight based on the size of the training batch used in each iteration.
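To make the batch-size-based weighting concrete, the sketch below illustrates one plausible server-side aggregation step in the spirit of [102]: each participant reports its local network parameters together with the number of transitions it sampled from its replay buffer in the current round, and the server averages the parameters with weights proportional to those batch sizes. The function and variable names are illustrative assumptions, not taken from the cited work.

```python
import numpy as np

def aggregate_by_batch_size(local_params, batch_sizes):
    """Weighted average of participants' parameters.

    local_params: list of dicts mapping layer name -> np.ndarray
    batch_sizes:  list of ints, the number of transitions each participant
                  sampled from its replay buffer in this round.
    """
    total = float(sum(batch_sizes))
    weights = [b / total for b in batch_sizes]

    global_params = {}
    for name in local_params[0]:
        # Weight each participant's layer by its share of sampled transitions.
        global_params[name] = sum(w * p[name] for w, p in zip(weights, local_params))
    return global_params

# Example: two participants with different replay-batch sizes.
p1 = {"fc": np.ones((2, 2))}
p2 = {"fc": np.zeros((2, 2))}
print(aggregate_by_batch_size([p1, p2], batch_sizes=[96, 32])["fc"])
```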


The above aggregation methods can effectively deal with data imbalance and performance discrepancies between participants, but they offer little help when participants face subtle environmental differences. According to the paper [105], as soon as a participant reaches the predefined criteria in its own environment, it should stop learning and send its model parameters as a reference to the remaining individuals. Exchanging mature network models (those satisfying the terminal conditions) can help other participants complete their training quickly. Participants in other, similar environments can continue to use FRL to further update their parameters until the desired model performance is achieved in their individual environments. Liu et al. [57] also suggest that the shared global model in the cloud is not the final policy model for local participants; effective transfer learning should be applied to resolve the structural difference between the shared network and the private network.
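A rough sketch of the early-stop-and-share behaviour described in [105] is given below: each participant trains until a predefined return threshold is met in its own environment, then freezes and broadcasts its parameters so that the remaining participants can warm-start from the mature model. The threshold, the `LocalAgent` interface, and the broadcast mechanism are illustrative assumptions, not details from the cited paper.

```python
from typing import Dict, List

class LocalAgent:
    """Placeholder for a participant's RL agent (e.g., a DQN learner)."""
    def __init__(self, env_id: str):
        self.env_id = env_id
        self.params: Dict[str, float] = {"w": 0.0}
        self.mature = False

    def train_one_round(self) -> float:
        # ... run local RL updates and return an evaluation return ...
        self.params["w"] += 0.1          # stand-in for a real update
        return self.params["w"] * 100    # stand-in for an episode return

def federated_round(agents: List[LocalAgent], return_threshold: float = 50.0):
    """One communication round with per-participant early stopping."""
    references = []
    for agent in agents:
        if agent.mature:
            continue                      # mature agents stop learning
        avg_return = agent.train_one_round()
        if avg_return >= return_threshold:
            agent.mature = True
            # Broadcast the mature model as a reference for the others.
            references.append((agent.env_id, dict(agent.params)))
    for env_id, params in references:
        for agent in agents:
            if not agent.mature:
                agent.params.update(params)   # warm start from the mature model
    return references
```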

               5.6.2. Lessons learned from the relationship between FL and RL
In most of the literature on FRL, FL is used to improve the performance of RL. With FL, the learning experience can be shared among multiple decentralized parties while ensuring privacy and scalability, without requiring direct data offloading to servers or third parties. Therefore, FL can expand the scope and enhance the security of RL. Among the applications of FRL, most researchers focus on communication network systems due to their stringent security requirements, advanced distributed architecture, and variety of decision-making tasks. Data offloading [93] and caching [89] solutions powered by distributed AI are available from FRL. In addition, with the ability to detect a wide range of attacks and support defense solutions, FRL has emerged as a strong alternative for performing distributed learning in security-sensitive scenarios. Enabled by its privacy-enhancing and cooperative features, detection and defense solutions can be learned quickly when multiple participants join to build a federated model [107,109]. FRL can also provide viable solutions to realize intelligence for control systems in many applied domains, such as robotics [57] and autonomous driving [104], without data exchange and privacy leakage. The data owners (robots or vehicles) may not trust a third-party server and therefore hesitate to upload their private information to potentially insecure learning systems. Each participant of FRL runs a separate RL model to determine its own control policy and gains experience by sharing model parameters, gradients, or losses.
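The participant-side workflow just described can be summarised in a short sketch: each data owner keeps its trajectories local, trains its own policy, and exchanges only model parameters (or gradients/losses) with the aggregator. The class and method names below are illustrative and do not correspond to any specific system surveyed here.

```python
import copy
from typing import Dict

class FRLParticipant:
    """A data owner (e.g., a robot or a vehicle) in an FRL system."""

    def __init__(self, policy_params: Dict[str, float]):
        self.policy_params = policy_params   # private policy-network weights

    def local_update(self, num_episodes: int) -> Dict[str, float]:
        # Collect trajectories and run RL updates locally;
        # raw observations never leave the device.
        for _ in range(num_episodes):
            self.policy_params["w"] += 0.01  # stand-in for a gradient step
        # Only the parameters (not the data) are shared with the server.
        return copy.deepcopy(self.policy_params)

    def receive_global(self, global_params: Dict[str, float]) -> None:
        # Overwrite (or selectively transfer) local weights with the aggregate.
        self.policy_params.update(global_params)
```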

Meanwhile, RL also has the potential to optimize FL schemes and improve training efficiency. Due to unstable network connectivity, it is not practical for FL to update and aggregate models simultaneously across all participants. Therefore, Wang et al. [113] propose an RL-based control framework that intelligently selects the participants for each round of FL with the aim of speeding up convergence. Similarly, Zhang et al. [114] apply RL to pre-select a set of candidate edge participants and then determine reliable edge participants through social attribute perception. In IoT or IoV scenarios, the heterogeneous nature of the participating devices means that different computing and communication resources are available to them. RL can speed up training by coordinating the allocation of resources among participants. Zhan et al. [115] define the L4L (Learning for Learning) concept, i.e., using RL to improve FL. Exploiting the heterogeneity of participants and the dynamics of network connections, this paper investigates a computational resource control problem for FL that simultaneously considers learning time and energy efficiency. An experience-driven resource control approach based on RL is presented to derive a near-optimal strategy using only the participants' bandwidth information from previous training rounds.
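As a concrete illustration of RL-driven participant selection in the spirit of [113], the sketch below uses a simple epsilon-greedy bandit: the controller picks a subset of clients each round and is rewarded by the resulting improvement in global validation accuracy, so clients that historically help convergence are chosen more often. This is a simplified stand-in; the cited work uses a more elaborate DRL formulation, and the class names and reward definition here are assumptions.

```python
import random
from collections import defaultdict

class BanditClientSelector:
    """Epsilon-greedy selection of FL participants, rewarded by accuracy gain."""

    def __init__(self, client_ids, clients_per_round, epsilon=0.2):
        self.client_ids = list(client_ids)
        self.k = clients_per_round
        self.epsilon = epsilon
        self.value = defaultdict(float)   # estimated reward per client
        self.count = defaultdict(int)

    def select(self):
        # Explore with probability epsilon, otherwise pick the top-valued clients.
        if random.random() < self.epsilon:
            return random.sample(self.client_ids, self.k)
        ranked = sorted(self.client_ids, key=lambda c: self.value[c], reverse=True)
        return ranked[: self.k]

    def update(self, chosen, accuracy_gain):
        # Credit the round's accuracy improvement to every chosen client.
        for c in chosen:
            self.count[c] += 1
            self.value[c] += (accuracy_gain - self.value[c]) / self.count[c]

# Usage inside a federated training loop (with stand-in metrics):
selector = BanditClientSelector(range(20), clients_per_round=5)
for round_idx in range(3):
    chosen = selector.select()
    accuracy_gain = 0.01 * len(chosen)    # stand-in for the measured improvement
    selector.update(chosen, accuracy_gain)
```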
In addition, as with any other ML algorithm, FL algorithms are vulnerable to malicious attacks. RL has been studied to defend against attacks in various scenarios, and it can also enhance the security of FL. The paper [116] proposes a reputation-aware RL (RA-RL) based selection