in partial information by adding a proximal term [117]. The local updates submitted by agents are constrained by this tunable term and thus have different effects on the global parameters. In addition, a probabilistic agent selection scheme can be implemented to select the agents whose local FL models have significant effects on the global model, so as to minimize the FL convergence time and training loss [118]. Another open problem is the theoretical analysis of convergence bounds. Although some existing studies have addressed this problem [119], convergence can only be guaranteed when the loss function is convex. How to analyze and evaluate non-convex loss functions in HFRL is therefore an important topic for future research.
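As a rough illustration of the proximal-term idea, the following sketch adds a squared-distance penalty between the local and global parameters to the local loss before each gradient step. It assumes a PyTorch-style network; the names local_net, global_params, and the coefficient mu are illustrative and not taken from [117].

```python
import torch

def local_update_step(local_net, global_params, batch_loss, optimizer, mu=0.01):
    """One local gradient step whose update is pulled toward the current global model."""
    # Proximal term: squared distance between local parameters and the (frozen) global ones.
    proximal = sum(torch.sum((w - w_g.detach()) ** 2)
                   for w, w_g in zip(local_net.parameters(), global_params))
    # mu tunes how tightly local updates are constrained toward the global parameters.
    loss = batch_loss + (mu / 2.0) * proximal
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss.detach())
```

A larger mu keeps each local model closer to the global model, which is the constraining effect on local updates described above.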
6.2. Agents without rewards in VFRL
In most existing works, all the RL agents have the ability to take part in full interaction with the environment
and can generate their own actions and rewards. Even though some MARL agents may not participate in the
policy decision, they still generate their own rewards for evaluation. In some scenarios, however, special agents in VFRL take the role of providing assistance to other agents: they can only observe the environment and pass on the knowledge of their observations, so as to help other agents make more effective decisions. Such agents therefore have no actions or rewards of their own. Traditional RL models cannot effectively handle this problem. Many algorithms either directly treat the states of such agents as public knowledge in the system model or design artificial actions and rewards for them, which may serve only the convenience of calculation and have no practical significance. These approaches cannot fundamentally overcome the challenge, especially
when privacy protection is also an essential requirement. Although the FedRL algorithm [65] has been proposed to deal with this problem and has demonstrated good performance, it still has some limitations. First, the number of agents in both the experiments and the algorithm itself is limited to two, which means the scalability of the algorithm is low; VFRL algorithms for a large number of agents remain to be designed. Second, the algorithm uses a Q-network as the federated model, which is a relatively simple architecture. How to design VFRL models based on other, more complex and flexible networks therefore remains an open issue.
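To make the setting concrete, the sketch below illustrates one possible structure for a reward-free helper agent: it only encodes its observation, and the decision-making agent's Q-network consumes that embedding together with its own state. This is only a minimal sketch of the general idea, not the FedRL design of [65]; the module names, layer sizes, and dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class HelperEncoder(nn.Module):
    """Observation-only agent: produces an embedding, never an action or a reward."""
    def __init__(self, obs_dim, embed_dim=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 32), nn.ReLU(),
                                 nn.Linear(32, embed_dim))

    def forward(self, obs):
        return self.net(obs)

class DecisionQNetwork(nn.Module):
    """Decision-making agent: Q-values conditioned on its own state plus the shared embedding."""
    def __init__(self, state_dim, embed_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim + embed_dim, 64), nn.ReLU(),
                                 nn.Linear(64, n_actions))

    def forward(self, state, helper_embedding):
        return self.net(torch.cat([state, helper_embedding], dim=-1))

# Example: the helper shares only its embedding, not raw observations, actions, or rewards.
helper = HelperEncoder(obs_dim=8)
q_net = DecisionQNetwork(state_dim=4, embed_dim=16, n_actions=3)
q_values = q_net(torch.zeros(1, 4), helper(torch.zeros(1, 8)))
```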
6.3. Communications
In FRL, the agents need to exchange the model parameters, gradients, intermediate results, etc., between them-
selves or with a central server. Due to the limited communication resources and battery capacity, the commu-
nication cost is an important consideration when implementing these applications. With an increased number
of participants, the coordinator has to bear more network workload within the client-server FRL model [120] .
This is because each participant needs to upload and download model updates through the coordinator. Al-
though the distributed peer-to-peer model does not require a central coordinator, each agent may have to
exchange information with other participants more frequently. In current research on distributed models, there are no effective model exchange protocols to determine when experiences should be shared and with which agents. In
addition, DRL involves updating parameters in deep neural networks. Several popular DRL algorithms, such
as DQN [121] and DDPG [122] , consist of multiple layers or multiple networks. Model updates contain millions
of parameters, which is not feasible in scenarios with limited communication resources. The research directions for the above issues can be divided into three categories. First, it is necessary to design a dynamic update
mechanism for participants to optimize the number of model exchanges. A second research direction is to use
model compression algorithms to reduce the amount of communication data. Finally, aggregation algorithms
that allow participants to submit only the important parts of the local model should be studied further.
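As a simple example related to the latter two directions, top-k sparsification transmits only the largest-magnitude entries of a model update. The sketch below assumes PyTorch tensors; k_ratio, sparsify_update, and apply_sparse_update are hypothetical names introduced for illustration, not an API from the cited works.

```python
import torch

def sparsify_update(update: torch.Tensor, k_ratio: float = 0.01):
    """Keep only the top-k entries (by magnitude) of a flattened model update."""
    flat = update.flatten()
    k = max(1, int(k_ratio * flat.numel()))
    _, indices = torch.topk(flat.abs(), k)
    # Only these (index, value) pairs need to be transmitted to the coordinator.
    return indices, flat[indices]

def apply_sparse_update(global_flat: torch.Tensor, indices, values, weight=1.0):
    """Coordinator side: merge a participant's sparse update into the flattened global model."""
    global_flat[indices] += weight * values
    return global_flat
```

With a k_ratio of 0.01, roughly 99% of the update entries are never transmitted, at the cost of an approximation whose effect on convergence would need to be analyzed for the FRL setting.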
6.4. Privacy and security
Although FL provides privacy protection that allows the agents to exchange information in a secure manner
during the learning process, it still has several privacy and security vulnerabilities associated with communica-
tion and attack [123] . As FRL is implemented based on FL algorithms, these problems also exist in FRL in the
same or variant forms. It is important to note that data poisoning attacks take a different form in FL than in FRL. In the existing classification tasks of FL, each piece of training data in the dataset corresponds to