
Page 51                             Qi et al. Intell Robot 2021;1(1):18-57  I

               a respective label. The attacker flips the labels on training examples in one category onto another while the
               features of the examples are kept unchanged, misguiding the establishment of a target model [124]. However, in
               the decision-making task of FRL, the training data is continuously generated from the interaction between the
               agent and the environment. As a result, the data poisoning attack is implemented in another way. For example,
               a malicious agent may tamper with the reward, which causes the evaluative function to shift. One countermeasure is to
               conduct regular safety assessments for all participants. Participants whose evaluation indicator falls below a
               threshold are penalized to reduce their impact on the global model [125]. Apart from the insider attacks which
               are launched by the agents in the FRL system, there may be various outsider attacks which are launched by
               intruders or eavesdroppers. Intruders may hide in the environment where the agent is and manipulate the
               transitions of the environment to achieve specific goals. In addition, by listening to the communication between
               the coordinator and the agent, the eavesdropper may infer sensitive information from the exchanged parameters
               and gradients [126]. Therefore, the development of technology that detects and protects against attacks and
               privacy threats does have great potential and is urgently needed.
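
               The threshold-based safety assessment described above can be illustrated with a minimal sketch. The source
               does not specify how low-scoring participants are punished; down-weighting their updates during aggregation
               is an assumption made here for illustration, and the names `aggregate_with_safety_check`, `scores`,
               `threshold`, and `penalty` are hypothetical.

```python
from typing import List, Sequence


def aggregate_with_safety_check(
    updates: Sequence[Sequence[float]],
    scores: Sequence[float],
    threshold: float = 0.5,
    penalty: float = 0.1,
) -> List[float]:
    """Weighted average of participant updates.

    Assumption (not from the source): a participant whose evaluation
    indicator `scores[i]` falls below `threshold` is "punished" by
    reducing its aggregation weight to `penalty`; others keep weight 1.0.
    """
    if len(updates) != len(scores):
        raise ValueError("one score is required per update")

    # Participants below the threshold contribute with a reduced weight.
    weights = [penalty if s < threshold else 1.0 for s in scores]
    total = sum(weights)
    if total <= 0:
        raise ValueError("no participant has a positive weight")

    dim = len(updates[0])
    return [
        sum(w * u[i] for w, u in zip(weights, updates)) / total
        for i in range(dim)
    ]


# The third participant reports a suspicious update and has a low score.
updates = [[1.0, 1.0], [1.0, 1.0], [10.0, 10.0]]
scores = [0.9, 0.8, 0.1]
global_update = aggregate_with_safety_check(updates, scores, penalty=0.0)
```

               Setting `penalty` to zero excludes a flagged participant entirely, whereas a small positive value only limits
               its influence; which choice is appropriate depends on how reliable the evaluation indicator is.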

               6.5. Join and exit mechanisms design
               One overlooked aspect of FRL-based research is the join and exit process of participants. In practice, the
               management of participants is essential to the normal progression of cooperation. As mentioned earlier in
               the security issue, the penetration of malicious participants severely impacts the performance of the cooperative
               model and the speed of training. The joining mechanism provides participants with the legal status to
               engage in federated cooperation. It is the first line of defense against malicious attackers. In contrast, the exit
               mechanism signifies the cancellation of the permission for cooperation. Participant-driven or enforced exit
               mechanisms are both possible. In particular, for synchronous algorithms, ignoring the exit mechanism can
               negatively impact learning efficiency. This is because the coordinator needs to wait for all participants to submit
               their information. If any participant is offline or compromised and unable to upload, the
               duration of one training round can grow indefinitely. To address this bottleneck, a few studies consider
               updating the global model using selected models from a subset of participants [113,127]. Unfortunately, there
               is no comprehensive consideration of the exit mechanism, and the communication of participants is typically
               assumed to be reliable. Therefore, research gaps remain in the joining and exiting mechanisms of FRL. It is
               expected that the coordinator or monitoring system, upon discovering a failure, disconnection, or malicious
               participant, will use the exit mechanism to reduce its impact on the global model or even eliminate it.
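
               As a rough illustration of the ideas in this subsection, the sketch below combines subset-based aggregation
               with an enforced exit rule. It is not taken from the cited studies: the function `run_round`, the plain
               averaging step, and the `max_missed` rule (a participant is removed after missing a fixed number of
               consecutive rounds) are all assumptions chosen to keep the example small.

```python
from typing import Dict, List, Optional, Set, Tuple


def run_round(
    global_model: List[float],
    submissions: Dict[str, Optional[List[float]]],
    active: Set[str],
    missed: Dict[str, int],
    max_missed: int = 2,
) -> Tuple[List[float], Set[str], Set[str]]:
    """One synchronous round that does not wait for silent participants.

    `submissions[pid]` is the model uploaded before the round deadline,
    or None (or absent) if nothing arrived. `missed` counts consecutive
    missed rounds per participant and is updated in place.
    Returns (new_global_model, remaining_participants, expelled).
    """
    received = []
    for pid in sorted(active):
        update = submissions.get(pid)
        if update is None:
            missed[pid] = missed.get(pid, 0) + 1
        else:
            missed[pid] = 0
            received.append(update)

    # Enforced exit (assumed rule): too many consecutive missed rounds.
    expelled = {pid for pid in active if missed[pid] >= max_missed}
    remaining = set(active) - expelled

    # Aggregate only the subset that actually reported in this round.
    if received:
        dim = len(global_model)
        global_model = [
            sum(u[i] for u in received) / len(received) for i in range(dim)
        ]
    return global_model, remaining, expelled


model, active, missed = [0.0, 0.0], {"a", "b", "c"}, {}
uploads = {"a": [1.0, 1.0], "b": [3.0, 3.0], "c": None}  # "c" is offline
model, active, expelled = run_round(model, uploads, active, missed)
model, active, expelled = run_round(model, uploads, active, missed)
```

               In this example, participant "c" is tolerated for one round and removed after the second, so the round time is
               bounded by the deadline rather than by the slowest or absent participant. A participant-driven exit could be
               modeled in the same way by removing the identifier from `active` on request.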

               6.6. Incentive mechanisms
               For most studies, the agents taking part in the FRL process are assumed to be honest and voluntary. Each agent
               provides assistance for the establishment of the cooperation model following the rules and freely shares the
               masked experience through encrypted parameters or gradients. An agent’s motivation for participation may
               come from regulation or incentive mechanisms. The FRL process within an organization is usually governed
               by regulations. For example, BSs belonging to the same company establish a joint model for offloading and
               caching. Nevertheless, because participants may be members of different organizations or use disparate equipment,
               it is difficult for regulation to force all parties to share information learned from their own data in the
               same manner. If there are no regulatory measures, participants prone to selfish behavior will only benefit from
               the cooperation model but not submit local updates. Therefore, the cooperation of multiple parties, organizations,
               or individuals requires a fair and efficient incentive mechanism to encourage their active participation.
               In this way, agents providing more contributions can benefit more and selfish agents unwilling to share their
               learning experience will receive less benefit. As an example, Google Keyboard [128] users can choose whether
               or not to allow Google to use their data, but if they do, they can benefit from more accurate word prediction.
               Although Yu et al. [129] propose a context-aware incentive mechanism among data owners, it is not
               suitable for RL problems. There is still no clear plan of action regarding how
               the FRL-based application can be designed to create a reasonable incentive mechanism for inspiring agents to
               participate in collaborative learning. To be successful, future research needs to propose a quantitative standard