device-related indicators as the measurement to evaluate the local model's contribution to the global model. Moreover, the existing FRL methods based on off-policy DRL algorithms, such as DQN and DDPG, usually use experience replay. Sampling random batches from the replay memory breaks the correlations between consecutive transition tuples and accelerates the training process. To arrive at an accurate evaluation of the participants, the paper [102] calculates the aggregation weight based on the size of the training batch used in each iteration.
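To make the batch-size-based weighting concrete, the sketch below illustrates one plausible server-side aggregation step in the spirit of [102]: each participant reports its local network parameters together with the number of transitions it sampled from its replay buffer in the current round, and the server averages the parameters with weights proportional to those batch sizes. The function and variable names are illustrative assumptions, not taken from the cited work.

```python
import numpy as np

def aggregate_by_batch_size(local_params, batch_sizes):
    """Weighted average of participants' parameters.

    local_params: list of dicts mapping layer name -> np.ndarray
    batch_sizes:  list of ints, the number of transitions each participant
                  sampled from its replay buffer in this round.
    """
    total = float(sum(batch_sizes))
    weights = [b / total for b in batch_sizes]

    global_params = {}
    for name in local_params[0]:
        # Weight each participant's layer by its share of sampled transitions.
        global_params[name] = sum(w * p[name] for w, p in zip(weights, local_params))
    return global_params

# Example: two participants with different replay-batch sizes.
p1 = {"fc": np.ones((2, 2))}
p2 = {"fc": np.zeros((2, 2))}
print(aggregate_by_batch_size([p1, p2], batch_sizes=[96, 32])["fc"])
```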


The above aggregation methods can effectively deal with data imbalance and performance discrepancies between participants, but they offer little help when participants face subtle environmental differences. According to the paper [105], as soon as a participant reaches the predefined criteria in its own environment, it should stop learning and send its model parameters as a reference to the remaining individuals. Exchanging mature network models (those satisfying the terminal conditions) can help other participants complete their training quickly. Participants in other, similar environments can continue to use FRL to further update their parameters until the desired model performance is achieved in their individual environments. Liu et al. [57] also suggest that the shared global model in the cloud is not the final policy model for local participants; effective transfer learning should be applied to resolve the structural difference between the shared network and the private network.
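A rough sketch of the early-stop-and-share behaviour described in [105] is given below: each participant trains until a predefined return threshold is met in its own environment, then freezes and broadcasts its parameters so that the remaining participants can warm-start from the mature model. The threshold, the `LocalAgent` interface, and the broadcast mechanism are illustrative assumptions, not details from the cited paper.

```python
from typing import Dict, List

class LocalAgent:
    """Placeholder for a participant's RL agent (e.g., a DQN learner)."""
    def __init__(self, env_id: str):
        self.env_id = env_id
        self.params: Dict[str, float] = {"w": 0.0}
        self.mature = False

    def train_one_round(self) -> float:
        # ... run local RL updates and return an evaluation return ...
        self.params["w"] += 0.1          # stand-in for a real update
        return self.params["w"] * 100    # stand-in for an episode return

def federated_round(agents: List[LocalAgent], return_threshold: float = 50.0):
    """One communication round with per-participant early stopping."""
    references = []
    for agent in agents:
        if agent.mature:
            continue                      # mature agents stop learning
        avg_return = agent.train_one_round()
        if avg_return >= return_threshold:
            agent.mature = True
            # Broadcast the mature model as a reference for the others.
            references.append((agent.env_id, dict(agent.params)))
    for env_id, params in references:
        for agent in agents:
            if not agent.mature:
                agent.params.update(params)   # warm start from the mature model
    return references
```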

               5.6.2. Lessons learned from the relationship between FL and RL
In most of the literature on FRL, FL is used to improve the performance of RL. With FL, the learning experience can be shared among multiple decentralized parties while ensuring privacy and scalability, without requiring direct data offloading to servers or third parties. Therefore, FL can expand the scope and enhance the security of RL. Among the applications of FRL, most researchers focus on communication network systems due to their stringent security requirements, advanced distributed architecture, and variety of decision-making tasks. Data offloading [93] and caching [89] solutions powered by distributed AI are available from FRL. In addition, with the ability to detect a wide range of attacks and support defense solutions, FRL has emerged as a strong alternative for performing distributed learning in security-sensitive scenarios. Enabled by its privacy-enhancing and cooperative features, detection and defense solutions can be learned quickly when multiple participants join to build a federated model [107,109]. FRL can also provide viable solutions to realize intelligence for control systems in many applied domains, such as robotics [57] and autonomous driving [104], without data exchange and privacy leakage. The data owners (robots or vehicles) may not trust a third-party server and therefore hesitate to upload their private information to potentially insecure learning systems. Each participant of FRL runs a separate RL model to determine its own control policy and gains experience by sharing model parameters, gradients, or losses.
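The participant-side workflow just described can be summarised in a short sketch: each data owner keeps its trajectories local, trains its own policy, and exchanges only model parameters (or gradients/losses) with the aggregator. The class and method names below are illustrative and do not correspond to any specific system surveyed here.

```python
import copy
from typing import Dict

class FRLParticipant:
    """A data owner (e.g., a robot or a vehicle) in an FRL system."""

    def __init__(self, policy_params: Dict[str, float]):
        self.policy_params = policy_params   # private policy-network weights

    def local_update(self, num_episodes: int) -> Dict[str, float]:
        # Collect trajectories and run RL updates locally;
        # raw observations never leave the device.
        for _ in range(num_episodes):
            self.policy_params["w"] += 0.01  # stand-in for a gradient step
        # Only the parameters (not the data) are shared with the server.
        return copy.deepcopy(self.policy_params)

    def receive_global(self, global_params: Dict[str, float]) -> None:
        # Overwrite (or selectively transfer) local weights with the aggregate.
        self.policy_params.update(global_params)
```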

Meanwhile, RL also has the potential to optimize FL schemes and improve training efficiency. Due to unstable network connectivity, it is not practical for FL to update and aggregate models simultaneously across all participants. Therefore, Wang et al. [113] propose an RL-based control framework that intelligently selects the participants for each round of FL with the aim of speeding up convergence. Similarly, Zhang et al. [114] apply RL to pre-select a set of candidate edge participants and then determine reliable edge participants through social attribute perception. In IoT or IoV scenarios, the heterogeneous nature of the participating devices means that different computing and communication resources are available to them. RL can speed up training by coordinating the allocation of resources among participants. Zhan et al. [115] define the L4L (Learning for Learning) concept, i.e., using RL to improve FL. Exploiting the heterogeneity of participants and the dynamics of network connections, this paper investigates a computational resource control problem for FL that simultaneously considers learning time and energy efficiency. An experience-driven resource control approach based on RL is presented to derive a near-optimal strategy using only the participants' bandwidth information from previous training rounds.
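As a concrete illustration of RL-driven participant selection in the spirit of [113], the sketch below uses a simple epsilon-greedy bandit: the controller picks a subset of clients each round and is rewarded by the resulting improvement in global validation accuracy, so clients that historically help convergence are chosen more often. This is a simplified stand-in; the cited work uses a more elaborate DRL formulation, and the class names and reward definition here are assumptions.

```python
import random
from collections import defaultdict

class BanditClientSelector:
    """Epsilon-greedy selection of FL participants, rewarded by accuracy gain."""

    def __init__(self, client_ids, clients_per_round, epsilon=0.2):
        self.client_ids = list(client_ids)
        self.k = clients_per_round
        self.epsilon = epsilon
        self.value = defaultdict(float)   # estimated reward per client
        self.count = defaultdict(int)

    def select(self):
        # Explore with probability epsilon, otherwise pick the top-valued clients.
        if random.random() < self.epsilon:
            return random.sample(self.client_ids, self.k)
        ranked = sorted(self.client_ids, key=lambda c: self.value[c], reverse=True)
        return ranked[: self.k]

    def update(self, chosen, accuracy_gain):
        # Credit the round's accuracy improvement to every chosen client.
        for c in chosen:
            self.count[c] += 1
            self.value[c] += (accuracy_gain - self.value[c]) / self.count[c]

# Usage inside a federated training loop (with stand-in metrics):
selector = BanditClientSelector(range(20), clients_per_round=5)
for round_idx in range(3):
    chosen = selector.select()
    accuracy_gain = 0.01 * len(chosen)    # stand-in for the measured improvement
    selector.update(chosen, accuracy_gain)
```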
In addition, as with any other ML algorithm, FL algorithms are vulnerable to malicious attacks. RL has been studied to defend against attacks in various scenarios, and it can also enhance the security of FL. The paper [116] proposes a reputation-aware RL (RA-RL) based selection