
Qi et al. Intell Robot 2021;1(1):18-57 | Page 50

               in partial information by adding a proximal term [117]. The local updates submitted by agents are constrained by
               the tunable term and have a different effect on the global parameters. In addition, a probabilistic agent selection
               scheme can be implemented to select the agents whose local FL models have significant effects on the global
               model to minimize the FL convergence time and the FL training loss [118]. Another problem is the theoretical
               analysis of the convergence bounds. Although some existing studies have addressed this problem [119],
               convergence can be guaranteed only when the loss function is convex. How to analyze and evaluate
               non-convex loss functions in HFRL is also an important topic for future research.
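
The effect of the proximal term discussed above can be illustrated with a minimal sketch. It assumes the commonly used form in which each agent minimizes its local loss plus (mu/2) * ||w - w_global||^2; the exact formulation in [117] is not reproduced here. The quadratic local loss and the names `local_update`, `quadratic_grad`, `mu`, and `lr` are hypothetical and chosen only for illustration.

```python
def local_update(w_global, grad_fn, mu=0.1, lr=0.1, steps=50):
    """Gradient descent on F_k(w) + (mu / 2) * ||w - w_global||^2 (assumed form)."""
    w = list(w_global)
    for _ in range(steps):
        g = grad_fn(w)
        # The gradient of the proximal term is mu * (w - w_global).
        w = [wi - lr * (gi + mu * (wi - wgi))
             for wi, gi, wgi in zip(w, g, w_global)]
    return w


def quadratic_grad(target):
    """Gradient of a hypothetical local loss 0.5 * ||w - target||^2."""
    return lambda w: [wi - ti for wi, ti in zip(w, target)]


w_global = [0.0, 0.0]
local_optimum = [4.0, -2.0]  # illustrative optimum of one agent's local loss

# mu = 0: the agent drifts all the way to its own local optimum.
w_free = local_update(w_global, quadratic_grad(local_optimum), mu=0.0)
# mu = 1: the update is pulled back toward the global parameters.
w_prox = local_update(w_global, quadratic_grad(local_optimum), mu=1.0)
```

For this quadratic loss the constrained update settles near `local_optimum / (1 + mu)`, so a larger `mu` keeps the submitted update closer to the global parameters.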

               6.2. Agents without rewards in VFRL
               In most existing works, all the RL agents have the ability to take part in full interaction with the environment
               and can generate their own actions and rewards. Even though some MARL agents may not participate in the
               policy decision, they still generate their own reward for evaluation. In some scenarios, special agents in VFRL
               take the role of providing assistance to other agents. They can only observe the environment and pass on the
               knowledge of their observation, so as to help other agents make more effective decisions. Therefore, such agents
               do not have their own actions and rewards. The traditional RL models cannot effectively deal with this thorny
               problem. Many algorithms either directly use the states of such agents as public knowledge in the system model
               or design corresponding action and reward for such agents, which may be only for convenience of calculation
               and have no practical significance. These approaches cannot fundamentally overcome the challenge, especially
               when privacy protection is also an essential objective. Although the FedRL algorithm [65],
               proposed to deal with the above problem, has demonstrated good performance, it still has some
               limitations. First, the number of agents used in its experiments and algorithms is limited to two, which
               means that its scalability is low and that VFRL algorithms for a large number of agents still need
               to be designed. Second, this algorithm uses a Q-network as the federated model, which is a relatively simple
               model. Therefore, how to design VFRL models based on other, more complex and varied networks
               remains an open issue.
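
The setting described above, in which one agent only observes and another acts and receives the reward, can be sketched with a toy example. This is not the FedRL algorithm [65]; it is a simplified one-step (bandit-style) illustration under stated assumptions: the observer shares only the output of a hypothetical `encode` function rather than its raw observation, and the acting agent keeps a small Q-table keyed by that shared feature. All names and parameters are illustrative.

```python
import random


def encode(observation):
    """Hypothetical encoder run by the observer agent; only its output is shared."""
    return observation % 2  # stands in for a learned feature extractor


def train(use_shared_feature, episodes=2000, lr=0.1, eps=0.2, seed=0):
    rng = random.Random(seed)
    q = {}  # Q-table of the acting agent: (feature, action) -> value
    for _ in range(episodes):
        hidden = rng.randrange(2)  # seen only by the observer agent
        feature = encode(hidden) if use_shared_feature else None
        if rng.random() < eps:  # epsilon-greedy exploration
            action = rng.randrange(2)
        else:
            action = max((0, 1), key=lambda a: q.get((feature, a), 0.0))
        # Only the acting agent has an action and a reward.
        reward = 1.0 if action == hidden else 0.0
        old = q.get((feature, action), 0.0)
        q[(feature, action)] = old + lr * (reward - old)
    return q


def greedy(q, feature):
    """Greedy action of the acting agent for a given shared feature."""
    return max((0, 1), key=lambda a: q.get((feature, a), 0.0))


q_shared = train(use_shared_feature=True)
q_alone = train(use_shared_feature=False)
```

With the shared feature, the acting agent can learn a different action for each value of the hidden observation; without it, a single Q-value per action has to cover both cases.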

               6.3. Communications
               In FRL, the agents need to exchange the model parameters, gradients, intermediate results, etc., between them-
               selves or with a central server. Due to the limited communication resources and battery capacity, the commu-
               nication cost is an important consideration when implementing these applications. With an increased number
               of participants, the coordinator has to bear more network workload within the client-server FRL model [120] .
               This is because each participant needs to upload and download model updates through the coordinator. Al-
               though the distributed peer-to-peer model does not require a central coordinator, each agent may have to
               exchange information with other participants more frequently. In current research for distributed models,
               there are no effective model exchange protocols to determine when to share experiences with which agents. In
               addition, DRL involves updating parameters in deep neural networks. Several popular DRL algorithms, such
               as DQN [121] and DDPG [122], consist of multiple layers or multiple networks. Model updates can contain millions
               of parameters, which makes exchanging them infeasible in scenarios with limited communication resources. The research
               directions for the above issues can be divided into three categories. First, it is necessary to design a dynamic update
               mechanism for participants to optimize the number of model exchanges. A second research direction is to use
               model compression algorithms to reduce the amount of communication data. Finally, aggregation algorithms
               that allow participants to submit only the important parts of the local model should be studied further.
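
As one possible illustration of the last direction, the sketch below uses top-k magnitude sparsification, a common technique that is assumed here as a stand-in for "the important parts" of an update; the source does not prescribe a specific method. Each participant sends only its k largest-magnitude entries as (index, value) pairs, and the coordinator averages them. The function names and the example vectors are hypothetical.

```python
def top_k_sparsify(update, k):
    """Keep the k largest-magnitude entries of a flat update as (index, value) pairs."""
    ranked = sorted(range(len(update)), key=lambda i: abs(update[i]), reverse=True)
    return sorted((i, update[i]) for i in ranked[:k])


def aggregate(sparse_updates, size):
    """Coordinator side: average sparse updates, treating omitted entries as zero."""
    total = [0.0] * size
    for sparse in sparse_updates:
        for i, v in sparse:
            total[i] += v
    return [t / len(sparse_updates) for t in total]


# Two illustrative local updates with six parameters each.
update_a = [0.5, -0.01, 0.02, -0.9, 0.0, 0.03]
update_b = [0.4, 0.02, -0.7, 0.01, 0.0, -0.05]

sparse_a = top_k_sparsify(update_a, k=2)  # only 2 of 6 entries are transmitted
sparse_b = top_k_sparsify(update_b, k=2)
global_delta = aggregate([sparse_a, sparse_b], size=6)
```

Here each participant transmits two entries instead of six. The trade-off is that small entries are dropped, so practical schemes often accumulate the discarded residual locally and add it to the next update.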

               6.4. Privacy and Security
               Although FL provides privacy protection that allows the agents to exchange information in a secure manner
               during the learning process, it still has several privacy and security vulnerabilities associated with communica-
               tion and attack [123] . As FRL is implemented based on FL algorithms, these problems also exist in FRL in the
               same or variant form. It is important to note that data poisoning attacks take different forms in
               FL and FRL. In the existing classification tasks of FL, each piece of training data in the dataset corresponds to