a respective label. The attacker flips the labels of training examples from one category to another while keeping the features unchanged, thereby misguiding the training of the target model [124]. However, in
the decision-making task of FRL, the training data is continuously generated from the interaction between the
agent and the environment. As a result, data poisoning is carried out differently: for example, a malicious agent tampers with the reward signal, which causes the learned value function to shift. One option is to
conduct regular safety assessments for all participants. Participants whose evaluation indicator falls below the
threshold are punished to reduce their impact on the global model [125]. Apart from insider attacks launched by agents within the FRL system, there may also be various outsider attacks launched by intruders or eavesdroppers. Intruders may hide in the environment in which an agent operates and manipulate environment transitions to achieve specific goals. In addition, by listening to the communication between the coordinator and the agents, an eavesdropper may infer sensitive information from the exchanged parameters and gradients [126]. Therefore, technology that detects and protects against such attacks and privacy threats holds great potential and is urgently needed.
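To make the threshold-based defense described above concrete, the following minimal sketch (in Python, with hypothetical names such as safety_assessment and a toy evaluation indicator, none of which come from the cited work) shows a coordinator that scores each submitted update and aggregates only those whose indicator stays above a threshold.

```python
import numpy as np

def safety_assessment(updates, evaluate_fn, threshold):
    """Score every participant's update and keep only those above the threshold.

    `updates` maps a participant id to its flattened parameter update;
    `evaluate_fn` stands in for the coordinator's evaluation indicator
    (e.g., the validation return of the global model patched with the update).
    """
    scores = {pid: evaluate_fn(delta) for pid, delta in updates.items()}
    trusted = {pid: updates[pid] for pid, s in scores.items() if s >= threshold}
    suspected = [pid for pid in updates if pid not in trusted]
    return trusted, suspected, scores

def aggregate(global_params, trusted_updates):
    """Average only the updates that passed the assessment (FedAvg-style)."""
    if not trusted_updates:
        return global_params              # nothing trustworthy this round
    return global_params + np.mean(list(trusted_updates.values()), axis=0)

# Toy round: one agent submits a reward-poisoned (outlier) update.
rng = np.random.default_rng(0)
updates = {f"agent{i}": rng.normal(0.0, 0.1, size=4) for i in range(4)}
updates["agent3"] += 5.0                          # poisoned contribution
indicator = lambda d: -float(np.linalg.norm(d))   # toy indicator: outliers score low
trusted, suspected, _ = safety_assessment(updates, indicator, threshold=-1.0)
new_params = aggregate(np.zeros(4), trusted)      # suspected == ["agent3"]
```

In a real FRL deployment the indicator would be a policy-level evaluation rather than a norm check, but the control flow, assess, exclude, then aggregate, stays the same.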
6.5. Join and exit mechanism design
One overlooked aspect of FRL-based research is the join and exit process of participants. In practice, the
management of participants is essential to the normal progression of cooperation. As mentioned earlier in
the security issue, the penetration of malicious participants severely impacts the performance of the cooper-
ative model and the speed of training. The joining mechanism provides participants with the legal status to
engage in federated cooperation. It is the first line of defense against malicious attackers. In contrast, the exit
mechanism signifies the cancellation of the permission for cooperation. Participant-driven or enforced exit
mechanisms are both possible. In particular, for synchronous algorithms, ignoring the exit mechanism can
negatively impact learning efficiency. This is because the coordinator needs to wait for all participants to sub-
mit their information. If any participant goes offline or is compromised and unable to upload, a single round of training can be prolonged indefinitely. To address this bottleneck, a few studies consider updating the global model using models selected from a subset of participants [113,127]. Unfortunately, the exit mechanism has not been considered comprehensively, and participant communication is typically assumed to be reliable. Therefore, research gaps remain in the join and exit mechanisms of FRL. It is
expected that the coordinator or monitoring system, upon discovering a failure, disconnection, or malicious
participant, will use the exit mechanism to reduce its impact on the global model or even eliminate it.
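As a rough illustration of how a coordinator might combine subset-based aggregation with an enforced exit mechanism, the minimal sketch below (hypothetical class and method names, not taken from the cited studies) drops any participant that misses several consecutive rounds and updates the global model from the responsive subset only.

```python
import numpy as np

class Coordinator:
    """Toy synchronous FRL coordinator with join/exit handling (illustrative only)."""

    def __init__(self, dim, max_missed_rounds=3):
        self.global_params = np.zeros(dim)
        self.max_missed_rounds = max_missed_rounds
        self.missed = {}                      # participant id -> consecutive misses

    def join(self, pid):
        """Join mechanism: grant a participant permission to cooperate."""
        self.missed[pid] = 0

    def exit(self, pid):
        """Exit mechanism: revoke permission (failure, disconnection, or attack)."""
        self.missed.pop(pid, None)

    def run_round(self, collect_update):
        """One synchronous round; `collect_update(pid)` returns the participant's
        update as an array, or None if it timed out or is offline."""
        received = {}
        for pid in list(self.missed):
            delta = collect_update(pid)
            if delta is None:
                self.missed[pid] += 1
                if self.missed[pid] >= self.max_missed_rounds:
                    self.exit(pid)            # enforced exit after repeated misses
            else:
                self.missed[pid] = 0
                received[pid] = delta
        if received:                          # aggregate only the responsive subset
            self.global_params += np.mean(list(received.values()), axis=0)
        return received

# Example: agent "c" stays offline; after three missed rounds it is removed.
coord = Coordinator(dim=2)
for pid in ("a", "b", "c"):
    coord.join(pid)
for _ in range(3):
    coord.run_round(lambda pid: None if pid == "c" else np.ones(2) * 0.1)
assert "c" not in coord.missed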
6.6. Incentive mechanisms
For most studies, the agents taking part in the FRL process are assumed to be honest and voluntary. Each agent
contributes to establishing the cooperative model according to the agreed rules and freely shares its masked experience through encrypted parameters or gradients. An agent's motivation for participation may
come from regulation or incentive mechanisms. The FRL process within an organization is usually governed
by regulations. For example, BSs belonging to the same company establish a joint model for offloading and
caching. Nevertheless, because participants may be members of different organizations or use disparate equip-
ment, it is difficult for regulation to force all parties to share information learned from their own data in the
same manner. Without regulatory measures, participants prone to selfish behavior may benefit from the cooperative model without submitting local updates. Therefore, the cooperation of multiple parties, organiza-
tions, or individuals requires a fair and efficient incentive mechanism to encourage their active participation.
In this way, agents that contribute more can benefit more, while selfish agents unwilling to share their
learning experience will receive less benefit. As an example, Google Keyboard [128] users can choose whether
or not to allow Google to use their data, but if they do, they can benefit from more accurate word prediction.
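A minimal sketch of one possible contribution-proportional scheme is given below; the function name and the way contributions are scored are assumptions made for illustration, not a mechanism proposed in the literature surveyed here.

```python
def allocate_incentives(contributions, budget):
    """Split a reward budget in proportion to each agent's measured contribution.

    `contributions` maps an agent id to a non-negative score, e.g., the marginal
    improvement of the global model attributed to that agent's updates (how to
    score contributions fairly is the open design question discussed here).
    """
    total = sum(contributions.values())
    if total <= 0:
        return {pid: 0.0 for pid in contributions}
    return {pid: budget * score / total for pid, score in contributions.items()}

# A free-riding agent that contributed nothing receives nothing.
payouts = allocate_incentives({"a": 3.0, "b": 1.0, "free_rider": 0.0}, budget=100.0)
# payouts == {"a": 75.0, "b": 25.0, "free_rider": 0.0}
```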
Although a context-aware incentive mechanism among data owners has been proposed by Yu et al. [129], it is not suitable for RL problems. There is still no clear approach to designing a reasonable incentive mechanism for FRL-based applications that inspires agents to
participate in collaborative learning. To be successful, future research needs to propose a quantitative standard