Qi et al. Intell Robot 2021;1(1):18-57 | http://dx.doi.org/10.20517/ir.2021.02
federated learning to speed up the convergence rate and reduce communication overhead, while DRL is
employed to optimize the cooperative caching policy between RSUs of vehicular networks. Simulation results
show that the proposed algorithm can achieve a high hit rate, good adaptability and fast convergence in a
complex environment.
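The federated-averaging step that accelerates convergence across RSUs can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the two-layer policy shapes and the sample counts are assumptions for the example.

```python
import numpy as np

def fedavg(local_weights, sample_counts):
    """Aggregate per-RSU policy weights, weighted by each RSU's sample count."""
    total = sum(sample_counts)
    return [
        sum(w[i] * n / total for w, n in zip(local_weights, sample_counts))
        for i in range(len(local_weights[0]))
    ]

# Three RSUs, each holding a toy two-layer caching policy (weights as arrays).
rsu_weights = [[np.full((2, 2), v), np.full(2, v)] for v in (1.0, 2.0, 3.0)]
global_w = fedavg(rsu_weights, sample_counts=[100, 100, 200])
# Weighted mean of each layer: (1*100 + 2*100 + 3*200) / 400 = 2.25
```

Each RSU would then continue its local DRL training from `global_w`, which is how federated aggregation shares caching experience without exchanging raw request traces.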
Apart from caching services, FRL has demonstrated its strong ability to facilitate resource allocation in edge
computing. InthestudyfromZhuetal. [93] , theauthorsspecificallyfocusonthedataoffloadingtaskformobile
edge computing (MEC) systems. To achieve joint collaboration, the heterogeneous multi-agent actor-critic (H-
MAAC) framework is proposed, in which edge devices independently learn the interactive strategies through
their own observations. The problem is formulated as a multi-agent MDP for modeling edge devices’ data
allocation strategies, i.e., moving the data, locally executing or offloading to a cloud server. The corresponding
joint cooperation algorithm that combines the edge federated model with the multi-agent actor-critic RL is also
presented. Dual lightweight neural networks are built, comprising original actor/critic networks and target
actor/critic networks.
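A minimal sketch of the dual-network layout, assuming the standard Polyak (soft) target-update rule commonly used in actor-critic methods; the parameter shapes, the dict representation, and `tau` are illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_net(obs_dim=8, n_actions=3):
    """Toy actor/critic parameters for one edge device.
    The 3 actions mirror the MDP above: move data / execute locally / offload."""
    return {
        "actor_W": rng.normal(size=(obs_dim, n_actions)),
        "critic_W": rng.normal(size=(obs_dim, 1)),
    }

def soft_update(target, online, tau=0.01):
    """Polyak-average online parameters into the slow-moving target copy."""
    for k in target:
        target[k] = (1 - tau) * target[k] + tau * online[k]
    return target

online = make_net()
target = {k: v.copy() for k, v in online.items()}  # target nets start as copies
# After a (simulated) gradient step changes the online actor, the target
# drifts toward it slowly, which stabilizes the critic's bootstrap targets:
online["actor_W"] += 1.0
target = soft_update(target, online, tau=0.1)
```

The target copies lag the online networks, so the temporal-difference targets change slowly even while the online actor/critic are updated every step.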
Blockchain technology has also attracted considerable attention from researchers in the edge computing field, since it can provide reliable data management across massively distributed edge nodes. In the study from Yu et al. [94],
the intelligent ultra-dense edge computing (I-UDEC) framework is proposed, integrating blockchain and
RL technologies into 5G ultra-dense edge computing networks. In order to achieve low overhead computation
offloading decisions and resource allocation strategies, the authors design a two-timescale deep reinforcement
learning (2Ts-DRL) approach, which consists of a fast-timescale and a slow-timescale learning process. The
target model can be trained in a distributed manner via FL architecture, protecting the privacy of edge devices.
Additionally, to deal with the different types of optimization tasks, variants of FRL are being studied. Zhu et
al. [95] present a resource allocation method for edge computing systems, called concurrent federated reinforcement learning (CFRL). The edge node continuously receives tasks from serviced IoT devices and stores
those tasks in a queue. Depending on its own resource allocation status, the node determines the scheduling
strategy so that all tasks are completed as soon as possible. In case the edge host does not have enough available
resources for the task, the task can be offloaded to the server. Contrary to the definition of the central server
in basic FRL, the aim of the central server in CFRL is to complete the tasks that the edge nodes cannot handle
instead of aggregating local models. Therefore, the server needs to train a special resource allocation model
based on its own resource status, forwarded tasks and unique rewards. The main idea of CFRL is that edge
nodes and the server cooperatively participate in all task processing in order to reduce total computing time
and provide a degree of privacy protection.
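The edge/server division of labor in CFRL can be sketched with a greedy admission rule: the edge node admits queued tasks while it has capacity and forwards the remainder to the central server, which in CFRL executes overflow tasks rather than aggregating models. The task costs, the capacity model, and the greedy rule itself are illustrative assumptions.

```python
from collections import deque

def schedule(tasks, capacity):
    """Split queued tasks between local execution and server offload.
    `tasks` are resource costs; `capacity` is the node's available budget."""
    queue = deque(tasks)
    local, offloaded = [], []
    while queue:
        task = queue.popleft()
        if task <= capacity:        # enough local resources: run on the edge
            capacity -= task
            local.append(task)
        else:                       # otherwise forward to the central server
            offloaded.append(task)
    return local, offloaded

local, offloaded = schedule([2, 5, 3, 4], capacity=6)
# → local = [2, 3], offloaded = [5, 4]
```

In the actual CFRL scheme this split is learned rather than greedy: both the edge nodes and the server train their own resource allocation models, with the server's model driven by its own resource status, the forwarded tasks, and its own rewards.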
5.2. FRL for communication networks
In parallel with the continuous evolution of communication technology, a number of heterogeneous commu-
nication systems are also being developed to adapt to different scenarios. Many researchers are also working
toward intelligent management of communication systems. The traditional ML-based management methods
are often inefficient due to their centralized data processing architecture and the risk of privacy leakage [5]. FRL
can play an important role in service slicing and access control, replacing centralized ML methods.
In communication network services, network function virtualization (NFV) is a critical component for achieving scalability and flexibility. Huang et al. [96] propose a novel scalable service function chain orchestration
(SSCO) scheme for NFV-enabled networks via FRL. In this work, a federated-learning-based framework for training the global model, along with time-variant local model exploration, is designed for scalable SFC orchestration. It prevents data sharing among stakeholders and enables quick convergence of the global model. To
reduce communication costs, SSCO allows the parameters of local models to be updated just at the beginning
and end of each episode through distributed clients and the cloud server. A DRL approach is used to map