Qi et al. Intell Robot 2021;1(1):18-57 | http://dx.doi.org/10.20517/ir.2021.02 Page 46

private information. Lim et al. [105] propose an FRL architecture that allows agents working on independent
IoT devices to share their learning experiences with each other and to transfer policy model parameters to
other agents. The aim is to effectively control multiple IoT devices of the same type but with slightly different
dynamics. Whenever an agent meets the predefined criteria, the server shares its mature model
with all other agents still in training. These agents continue training from the shared model until the local model
converges in the respective environment. The actor-critic proximal policy optimization (Actor-Critic PPO)
algorithm is integrated into the control of multiple rotary inverted pendulum (RIP) devices. The results show
that the proposed architecture facilitates the learning process and that the learning speed improves as more
agents participate. In addition, Lim et al. [106] use an FRL architecture based on a multi-agent environment to
address the problems and limitations of applying RL to real-world problems. The proposed federation
policy allows multiple agents to share their learning experiences to achieve better learning efficacy. The proposed
scheme adopts the Actor-Critic PPO algorithm for four types of RL simulation environments from OpenAI Gym
as well as for RIP devices in real control systems. Compared to a previous real-environment study, the scheme enhances
learning performance by approximately 1.2 times.
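
The model-sharing step described for [105] can be illustrated with a minimal sketch. It is not the authors' implementation: the maturity criterion (mean of recent episode returns reaching a threshold), the `Agent` class, and the function `share_mature_model` are hypothetical names introduced here only for illustration, and policy parameters are represented as a plain list of floats.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Agent:
    """One learner controlling one device (hypothetical structure)."""
    name: str
    params: List[float]                      # stand-in for policy parameters
    recent_returns: List[float] = field(default_factory=list)

    def is_mature(self, threshold: float, window: int) -> bool:
        # Assumed criterion: the mean of the last `window` returns
        # reaches `threshold`. The source only says "predefined criteria".
        if len(self.recent_returns) < window:
            return False
        tail = self.recent_returns[-window:]
        return sum(tail) / window >= threshold


def share_mature_model(agents: List[Agent], threshold: float,
                       window: int) -> Optional[str]:
    """Server step: copy the first mature agent's parameters to all others.

    Returns the name of the agent whose model was shared, or None if no
    agent currently satisfies the criterion.
    """
    for source in agents:
        if source.is_mature(threshold, window):
            for other in agents:
                if other is not source:
                    # Copy so later local training does not alias the source.
                    other.params = list(source.params)
            return source.name
    return None
```

After a call that returns a name, every other agent would resume local training from the shared parameters, which matches the "continue training based on the shared model" step in the text.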

5.4. FRL for attack detection
With the heterogeneity of services and the sophistication of threats, it is challenging to detect attacks
using traditional methods or centralized ML-based methods, which have a high false alarm rate and do not
take privacy into account. FRL offers a powerful alternative for detecting attacks and provides support for
network defense in different scenarios.

Because of various constraints, IoT applications have become a primary target for malicious adversaries that
can disrupt normal operations or steal confidential information. In order to address the security issues in flying
ad-hoc networks (FANETs), Mowla et al. [107] propose an adaptive FRL-based jamming attack defense strategy
for unmanned aerial vehicles (UAVs). A model-free Q-learning mechanism is developed and deployed on
distributed UAVs to cooperatively learn detection models for jamming attacks. According to the results, the
average accuracy of the federated jamming detection mechanism employed in the proposed defense strategy
is 39.9% higher than that of the distributed mechanism when verified with the CRAWDAD standard dataset and the ns-3
simulated FANET jamming attack dataset.
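
As a rough illustration of how model-free Q-learning can be combined with federation, the sketch below pairs the standard tabular Q-learning update with a simple element-wise average of the Q-tables held by several learners. The averaging rule is an assumption in the spirit of federated averaging; the aggregation actually used in [107] is not described in this section. The state and action labels, as well as the function names `q_update` and `federated_average`, are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# A Q-table maps (state, action) pairs to estimated values.
QTable = Dict[Tuple[str, str], float]


def q_update(q: QTable, state: str, action: str, reward: float,
             next_state: str, actions: Sequence[str],
             alpha: float = 0.1, gamma: float = 0.9) -> None:
    """Standard tabular Q-learning update, applied in place.

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    Missing entries are treated as 0.0.
    """
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)


def federated_average(tables: List[QTable]) -> QTable:
    """Assumed aggregation: average each entry over all local Q-tables.

    Entries missing from a table count as 0.0 for that table.
    """
    keys = set().union(*(t.keys() for t in tables))
    return {k: sum(t.get(k, 0.0) for t in tables) / len(tables) for k in keys}
```

In such a scheme, each UAV would run `q_update` on its own observations and periodically replace its local table with the aggregated one, so that only table values, not raw observations, leave the device.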

An efficient traffic monitoring framework, known as DeepMonitor, is presented in the study of Nguyen et
al. [108] to provide fine-grained traffic analysis capability at the edge of software-defined network (SDN)-based
IoT networks. The agents deployed in edge nodes consider the different granularity-level requirements and
their maximum flow-table capacity to achieve the optimal flow rule match-field strategy. The control optimization
problem is formulated as an MDP, and a federated DDQN algorithm is developed to improve the learning
performance of agents. The results show that the proposed monitoring framework performs reliably
at all levels of traffic granularity and substantially mitigates the issue of flow-table overflows. In addition,
the distributed denial of service (DDoS) attack detection performance of an intrusion detection system
can be enhanced by up to 22.83% by using DeepMonitor instead of FlowStat.
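
Assuming that DDQN here denotes a double deep Q-network, the core of the learning rule is the double Q-learning target, in which the online network selects the next action and the target network evaluates it. The sketch below computes that target for a single transition from plain lists of Q-values; it is a generic illustration, not the DeepMonitor implementation, and the function name `double_dqn_target` and its parameters are hypothetical.

```python
from typing import Sequence


def double_dqn_target(reward: float,
                      next_q_online: Sequence[float],
                      next_q_target: Sequence[float],
                      gamma: float = 0.99,
                      done: bool = False) -> float:
    """Double Q-learning target for one transition.

    y = r                                        if the episode ended
    y = r + gamma * Q_target(s', argmax_a Q_online(s', a))   otherwise

    `next_q_online` and `next_q_target` hold the two networks' Q-values
    for every action in the next state, in the same action order.
    """
    if done:
        return reward
    # Action selection uses the online network ...
    best_action = max(range(len(next_q_online)),
                      key=lambda a: next_q_online[a])
    # ... while its value is read from the target network.
    return reward + gamma * next_q_target[best_action]
```

Decoupling selection from evaluation in this way is what distinguishes the double variant from a plain DQN target, which would take the maximum over `next_q_target` directly. In a federated setting, each edge agent would compute such targets locally and exchange only network parameters.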

In order to reduce manufacturing costs and improve production efficiency, the industrial internet of things
(IIoT) is proposed as a potentially promising research direction. It is a challenge to implement anomaly detection
mechanisms in IIoT applications with data privacy protection. Wang et al. [109] propose a reliable
anomaly detection strategy for IIoT using FRL techniques. In the system framework, there are four entities
involved in establishing the detection model, i.e., the Global Anomaly Detection Center (GADC), the Local
Anomaly Detection Center (LADC), the Regional Anomaly Detection Center (RADC), and the users.
Anomaly detection is implemented in two phases: anomaly detection for RADCs and anomaly detection for
users. In particular, the GADC can build global RADC anomaly detection models based on local models trained
by LADCs. Different from RADC anomaly detection based on action deviations, user anomaly detection is
