
               Corizzo et al. J Surveill Secur Saf 2020;1:140-50
               DOI: 10.20517/jsss.2020.15
               Journal of Surveillance, Security and Safety




               Original Article                                                              Open Access


               Feature extraction based on word embedding
               models for intrusion detection in network traffic



               Roberto Corizzo¹, Eftim Zdravevski², Myles Russell¹, Andrew Vagliano³, Nathalie Japkowicz¹
               1 Department of Computer Science, American University, Washington, DC 20016, USA.
               2 Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University, Skopje 1000, North Macedonia.
               3 Department of Computer Science, Northwestern University, Evanston, IL 60208, USA.
               Correspondence to: Dr. Roberto Corizzo, Department of Computer Science, American University, 4400 Massachusetts Avenue
               NW, Washington, DC 20016, USA. E-mail: rcorizzo@american.edu
               How to cite this article: Corizzo R, Zdravevski E, Russell M, Vagliano A, Japkowicz N. Feature extraction based on word embedding
               models for intrusion detection in network traffic. J Surveill Secur Saf 2020;1:140-50. http://dx.doi.org/10.20517/jsss.2020.15
               Received: 30 Apr 2020    First Decision: 15 Jun 2020    Revised: 27 Jun 2020    Accepted: 17 Jul 2020    Available online: 28 Dec 2020

               Academic Editor: Xiaofeng Chen    Copy Editor: Cai-Hong Wang    Production Editor: Jing Yu



               Abstract
               Aim: The analysis of network traffic plays a crucial role in modern organizations since it can provide defense
               mechanisms against cyberattacks. In this context, machine learning algorithms can be fruitfully adopted to identify
               malicious patterns in network sessions. However, they cannot be directly applied to a raw data representation
               of network traffic. An active thread of research focuses on the design and implementation of feature extraction
               techniques that aim at mapping raw data representations of network traffic sessions to a new representation that
               can be processed by machine learning algorithms.

               Methods: In this paper, we propose a feature extraction approach based on word embedding models. The
               proposed approach extracts semantic features characterized by contextual information that is hidden in the raw
               data representation.

               Results: Our experiments on three datasets show that the proposed feature extraction approach can improve the
               classification performance of conventional machine learning algorithms applied to intrusion detection, and
               that it is competitive with state-of-the-art feature extraction baselines.


               Conclusion: This study shows that word embedding models can be used to carry out intrusion detection tasks
               accurately. Feature extraction based on word embedding models requires more computation time than
               simpler techniques but yields higher accuracy, which is important for identifying complex attacks.

               © The Author(s) 2020. Open Access This article is licensed under a Creative Commons Attribution 4.0
               International License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use,
               sharing, adaptation, distribution and reproduction in any medium or format, for any purpose, even
               commercially, as long as you give appropriate credit to the original author(s) and the source, provide
               a link to the Creative Commons license, and indicate if changes were made.


                                                                                                                                                  www.jsssjournal.com