
Page 43                             Qi et al. Intell Robot 2021;1(1):18-57  I http://dx.doi.org/10.20517/ir.2021.02


               federated learning to speed up the convergence rate and reduce communication overhead, while a DRL task is
               employed to optimize the cooperative caching policy between RSUs of vehicular networks. Simulation results
               show that the proposed algorithm can achieve a high hit rate, good adaptability and fast convergence in a
               complex environment.


               Apart from caching services, FRL has demonstrated its strong ability to facilitate resource allocation in edge
               computing. In the study from Zhu et al. [93], the authors specifically focus on the data offloading task for mobile
               edge computing (MEC) systems. To achieve joint collaboration, the heterogeneous multi-agent actor-critic (H-
               MAAC) framework is proposed, in which edge devices independently learn the interactive strategies through
               their own observations. The problem is formulated as a multi-agent MDP for modeling edge devices’ data
               allocation strategies, i.e., moving the data, locally executing or offloading to a cloud server. The corresponding
               joint cooperation algorithm that combines the edge federated model with the multi-agent actor-critic RL is also
               presented. Dual lightweight neural networks are built, comprising original actor/critic networks and target
               actor/critic networks.
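
As a rough illustration of how a target network can track its original counterpart, the sketch below applies a soft (Polyak) update to a flat parameter vector. The source does not state which update rule H-MAAC uses, so the soft-update rule, the function name `soft_update` and the rate `tau` are assumptions borrowed from common actor-critic practice, not the authors' method.

```python
from typing import List


def soft_update(target: List[float], online: List[float], tau: float = 0.01) -> List[float]:
    """Return target parameters moved a fraction tau toward the online parameters.

    Assumption: parameters are flattened into plain lists of floats; a real
    actor or critic network would apply this per tensor.
    """
    if len(target) != len(online):
        raise ValueError("parameter vectors must have the same length")
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must lie in [0, 1]")
    return [(1.0 - tau) * t + tau * o for t, o in zip(target, online)]


if __name__ == "__main__":
    target_actor = [0.0, 1.0]
    online_actor = [1.0, 1.0]
    # With tau = 0.5 the target moves halfway toward the online parameters.
    print(soft_update(target_actor, online_actor, tau=0.5))
```

A small `tau` keeps the target networks changing slowly, which is the usual reason for maintaining them alongside the original networks.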


               Blockchain technology has also attracted a lot of attention from researchers in edge computing, since it can
               provide reliable data management across massive numbers of distributed edge nodes. In the study from Yu et al. [94],
               the intelligent ultra-dense edge computing (I-UDEC) framework is proposed, integrating blockchain and
               RL technologies into 5G ultra-dense edge computing networks. In order to achieve low-overhead computation
               offloading decisions and resource allocation strategies, the authors design a two-timescale deep reinforcement
               learning (2Ts-DRL) approach, which consists of a fast-timescale and a slow-timescale learning process. The
               target model can be trained in a distributed manner via FL architecture, protecting the privacy of edge devices.
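
To make the distributed training step concrete, the following minimal sketch shows a FedAvg-style aggregation in which a coordinator averages locally trained parameters weighted by each device's sample count. The source only says the model is trained via an FL architecture; the weighting scheme and the names `federated_average`, `local_params` and `num_samples` are illustrative assumptions.

```python
from typing import List, Sequence


def federated_average(
    local_params: Sequence[Sequence[float]],
    num_samples: Sequence[int],
) -> List[float]:
    """Average local parameter vectors, weighted by local sample counts.

    Only parameters are exchanged; raw device data never leaves the device.
    """
    if not local_params or len(local_params) != len(num_samples):
        raise ValueError("need one sample count per local parameter vector")
    total = sum(num_samples)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(local_params[0])
    if any(len(p) != dim for p in local_params):
        raise ValueError("all parameter vectors must have the same length")
    return [
        sum(p[i] * n for p, n in zip(local_params, num_samples)) / total
        for i in range(dim)
    ]


if __name__ == "__main__":
    # Two hypothetical edge devices holding 1 and 3 samples respectively.
    print(federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3]))
```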


               Additionally, to deal with different types of optimization tasks, variants of FRL are being studied. Zhu et
               al. [95] present a resource allocation method for edge computing systems, called concurrent federated
               reinforcement learning (CFRL). The edge node continuously receives tasks from serviced IoT devices and stores
               those tasks in a queue. Depending on its own resource allocation status, the node determines the scheduling
               strategy so that all tasks are completed as soon as possible. In case the edge host does not have enough available
               resources for the task, the task can be offloaded to the server. Contrary to the definition of the central server
               in the basic FRL, the aim of the central server in CFRL is to complete the tasks that the edge nodes cannot handle
               instead of aggregating local models. Therefore, the server needs to train a special resource allocation model
               based on its own resource status, forwarded tasks and unique rewards. The main idea of CFRL is that edge
               nodes and the server cooperatively participate in all task processing in order to reduce total computing time
               and provide a degree of privacy protection.
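
The task flow described above (queue at the edge node, local execution when resources allow, forwarding to the server otherwise) can be sketched as follows. This is a simplified illustration: CFRL learns its scheduling strategy with RL, whereas the fixed first-come greedy rule here is a stand-in, and `Task`, `demand`, `edge_capacity` and `dispatch` are hypothetical names.

```python
from collections import deque
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Task:
    task_id: int
    demand: int  # hypothetical resource units required by the task


def dispatch(tasks: Iterable[Task], edge_capacity: int) -> Tuple[List[int], List[int]]:
    """Split queued tasks into locally executed and server-offloaded task IDs.

    Assumption: a greedy first-come rule replaces the learned policy, and
    resources are not released while the queue is being drained.
    """
    queue = deque(tasks)          # the edge node stores incoming tasks in a queue
    available = edge_capacity
    local: List[int] = []
    offloaded: List[int] = []
    while queue:
        task = queue.popleft()
        if task.demand <= available:
            available -= task.demand
            local.append(task.task_id)
        else:
            # Not enough edge resources: forward the task to the server.
            offloaded.append(task.task_id)
    return local, offloaded


if __name__ == "__main__":
    incoming = [Task(0, 2), Task(1, 5), Task(2, 1)]
    print(dispatch(incoming, edge_capacity=4))
```

In the scheme described in the text, both the edge node and the server would replace this fixed rule with their own learned allocation models.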

               5.2. FRL for communication networks
               In parallel with the continuous evolution of communication technology, a number of heterogeneous
               communication systems are being developed to adapt to different scenarios. Many researchers are also working
               toward intelligent management of communication systems. Traditional ML-based management methods
               are often inefficient due to their centralized data processing architecture and the risk of privacy leakage [5]. FRL
               can play an important role in service slicing and access control, replacing centralized ML methods.


               In communication network services, network function virtualization (NFV) is a critical component in
               achieving scalability and flexibility. Huang et al. [96] propose a novel scalable service function chain (SFC) orchestration
               (SSCO) scheme for NFV-enabled networks via FRL. In this work, a federated-learning-based framework for
               training a global model, along with time-variant local model exploration, is designed for scalable SFC
               orchestration. It prevents data sharing among stakeholders and enables quick convergence of the global model. To
               reduce communication costs, SSCO allows the parameters of local models to be updated just at the beginning
               and end of each episode through distributed clients and the cloud server. A DRL approach is used to map