
Qi et al. Intell Robot 2021;1(1):18-57 | http://dx.doi.org/10.20517/ir.2021.02      Page 46


               private information. Lim et al. [105] propose an FRL architecture that allows agents working on independent
               IoT devices to share their learning experiences with each other and to transfer policy model parameters to
               other agents. The aim is to effectively control multiple IoT devices of the same type but with slightly different
               dynamics. Whenever an agent meets the predefined criteria, its mature model is shared by the server
               with all other agents in training. The agents continue training from the shared model until each local model
               converges in its respective environment. The actor-critic proximal policy optimization (Actor-Critic PPO)
               algorithm is integrated into the control of multiple rotary inverted pendulum (RIP) devices. The results show
               that the proposed architecture facilitates the learning process and that the learning speed improves as more
               agents participate. In addition, Lim et al. [106] use an FRL architecture based on a multi-agent environment to
               address the problems and limitations of applying RL to real-world problems. The proposed federation
               policy allows multiple agents to share their learning experiences to achieve better learning efficacy. The proposed
               scheme adopts the Actor-Critic PPO algorithm for four types of RL simulation environments from OpenAI Gym
               as well as for RIP devices in real control systems. Compared to a previous real-environment study, the scheme
               enhances learning performance by approximately 1.2 times.
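
The sharing rule described for [105] (a server distributes the model of an agent that meets a predefined criterion to all other agents in training) can be illustrated with a minimal sketch. The names `Agent`, `recent_return`, `is_mature` and `share_mature_model`, as well as the return-threshold criterion, are hypothetical choices made for illustration; the text does not specify the criterion or the data structures used by Lim et al.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Agent:
    """One learner controlling its own device (hypothetical structure)."""
    name: str
    params: List[float]          # flattened policy parameters
    recent_return: float = 0.0   # assumed maturity signal, not from the paper


def is_mature(agent: Agent, threshold: float) -> bool:
    # Assumption: "predefined criteria" is modeled as a return threshold.
    return agent.recent_return >= threshold


def share_mature_model(agents: List[Agent], threshold: float) -> Optional[str]:
    """Copy the parameters of the first mature agent to every other agent.

    Returns the donor's name, or None if no agent is mature yet.
    """
    for donor in agents:
        if is_mature(donor, threshold):
            for other in agents:
                if other is not donor:
                    # Copy the list so later local training stays independent.
                    other.params = list(donor.params)
            return donor.name
    return None
```

After the call, each receiving agent would continue local training from the shared parameters in its own environment, which is the step the paragraph above describes.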


               5.4. FRL for attack detection
               With the heterogeneity of services and the sophistication of threats, it is challenging to detect attacks
               using traditional methods or centralized ML-based methods, which have a high false alarm rate and do not
               take privacy into account. FRL offers a powerful alternative for detecting attacks and provides support for
               network defense in different scenarios.


               Because of various constraints, IoT applications have become a primary target for malicious adversaries that
               can disrupt normal operations or steal confidential information. In order to address the security issues in flying
               ad-hoc networks (FANETs), Mowla et al. [107] propose an adaptive FRL-based jamming attack defense strategy
               for unmanned aerial vehicles (UAVs). A model-free Q-learning mechanism is developed and deployed on
               distributed UAVs to cooperatively learn detection models for jamming attacks. According to the results, the
               average accuracy of the federated jamming detection mechanism employed in the proposed defense strategy
               is 39.9% higher than that of the distributed mechanism when verified with the standard CRAWDAD dataset and
               the ns-3 simulated FANET jamming attack dataset.
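
As a rough illustration of how model-free Q-learning on distributed nodes can be combined cooperatively, the sketch below pairs the standard tabular Q-learning update with an element-wise average of the local Q-tables. The averaging step and the function names `q_update` and `federated_average` are assumptions made for illustration; the aggregation actually used in [107] is not described in this section.

```python
from typing import List

QTable = List[List[float]]  # q[state][action]


def q_update(q: QTable, s: int, a: int, r: float, s_next: int,
             alpha: float = 0.1, gamma: float = 0.9) -> None:
    """Standard tabular Q-learning step, applied in place on one node."""
    td_target = r + gamma * max(q[s_next])
    q[s][a] += alpha * (td_target - q[s][a])


def federated_average(tables: List[QTable]) -> QTable:
    """Element-wise mean of equally shaped local Q-tables (assumed aggregation)."""
    n = len(tables)
    rows, cols = len(tables[0]), len(tables[0][0])
    return [[sum(t[i][j] for t in tables) / n for j in range(cols)]
            for i in range(rows)]
```

In such a scheme only the Q-tables would leave each node, while the locally observed traffic samples stay on the device.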

               An efficient traffic monitoring framework, known as DeepMonitor, is presented by Nguyen et
               al. [108] to provide fine-grained traffic analysis capability at the edge of software-defined network (SDN) based
               IoT networks. The agents deployed in edge nodes consider the different granularity-level requirements and
               their maximum flow-table capacity to achieve the optimal flow rule match-field strategy. The control optimization
               problem is formulated as an MDP, and a federated DDQN algorithm is developed to improve the learning
               performance of agents. The results show that the proposed monitoring framework can reliably provide the
               required traffic granularity at all granularity levels and substantially mitigate the issue of flow-table overflows.
               In addition, the distributed denial of service (DDoS) attack detection performance of an intrusion detection system
               can be enhanced by up to 22.83% by using DeepMonitor instead of FlowStat.
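
Assuming DDQN here denotes the double deep Q-network, the learning target each agent regresses toward can be sketched as follows: the online network selects the next action and the target network evaluates it. The function name and the list-based inputs are hypothetical; how [108] federates the network weights is not detailed in this section and is not shown.

```python
from typing import Sequence


def double_dqn_target(reward: float, gamma: float,
                      q_online_next: Sequence[float],
                      q_target_next: Sequence[float],
                      done: bool) -> float:
    """Double DQN target for a single transition.

    q_online_next: Q-values of the next state from the online network.
    q_target_next: Q-values of the next state from the target network.
    """
    if done:
        # No bootstrapping beyond a terminal state.
        return reward
    # Action selection uses the online network ...
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    # ... while its value is read from the target network.
    return reward + gamma * q_target_next[best_action]
```

Decoupling selection from evaluation in this way is the standard device for reducing the overestimation bias of plain deep Q-learning.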


               In order to reduce manufacturing costs and improve production efficiency, the industrial internet of things
               (IIoT) has been proposed as a promising research direction. It is challenging to implement anomaly detection
               mechanisms in IIoT applications while protecting data privacy. Wang et al. [109] propose a reliable
               anomaly detection strategy for IIoT using FRL techniques. In the system framework, four entities are
               involved in establishing the detection model, i.e., the Global Anomaly Detection Center (GADC), the Local
               Anomaly Detection Center (LADC), the Regional Anomaly Detection Center (RADC), and the users. The
               anomaly detection is implemented in two phases: anomaly detection for RADCs and anomaly detection for
               users. In particular, the GADC can build global RADC anomaly detection models based on local models trained
               by LADCs. Different from RADC anomaly detection based on action deviations, user anomaly detection is