Page 46 - Read Online
P. 46
Corizzo et al. J Surveill Secur Saf 2020;1:140-50 I http://dx.doi.org/10.20517/jsss.2020.15 Page 147
Table 2. Classification performance of extremely randomized trees models with different feature extraction techniques
using different intrusion detection datasets
Dataset Feature extraction Precision Recall F-score Accuracy AUC F-score improvement
technique (Macro) (Macro) (Macro) over baseline (%)
ADFA-LD [16] Bag of System Calls 0.9603 0.9244 0.9414 0.9752 0.9904 2.12%
Subsequence Vector 0.9402 0.9053 0.9218 0.9670 0.9846 /
Word2Vec 0.9376 0.8862 0.9096 0.9626 0.9791 -1.32%
Word2Vec + TF-IDF 0.9246 0.8702 0.8948 0.9568 0.9762 -2.92%
Doc2Vec 0.9006 0.5158 0.4985 0.8783 0.7457 -45.92%
NGIDS-DS [17] Bag of System Calls 0.9689 0.9691 0.9690 0.9690 0.9937 1.38%
Subsequence Vector 0.9557 0.9560 0.9558 0.9558 0.9899 /
Word2Vec 0.9999 0.9999 0.9999 0.9999 0.9999 4.61%
Word2Vec + TF-IDF 1.0000 1.0000 1.0000 1.0000 1.0000 4.62%
Doc2Vec 0.7398 0.6560 0.6289 0.6648 0.7462 -34.20%
WWW2019 [18] Bag of System Calls 0.9568 0.9108 0.9303 0.9457 0.9823 13.68%
Subsequence Vector 0.9830 0.8281 0.8183 0.9476 0.9048 /
Word2Vec 0.9971 0.9929 0.9950 0.9959 0.9999 21.59%
Word2Vec + TF-IDF 0.9990 0.9992 0.9991 0.9999 0.9999 22.09%
Doc2Vec 0.8894 0.6478 0.6662 0.7981 0.7179 -18.58%
Results with three datasets: Australian Defence Force Academy Linux (ADFA-LD), Next-Generation Intrusion Detection System
(NGIDS-DS), and Web Conference 2019 (WWW2019). Best results in terms of macro F-score are marked in bold. TF-IDF: Term-
Frequency and Inverse Document Frequency
Table 3. Training and prediction execution time of the different feature extraction techniques using different intrusion
detection datasets
Dataset Feature extraction technique Training time (min) Prediction time (s)
ADFA-LD [16] Bag of System Calls 0.13 0.25
Subsequence Vector 0.15 0.28
Word2Vec 1.95 0.36
Word2Vec + TF-IDF 80.43 0.45
Doc2Vec 0.88 0.30
NGIDS-DS [17] Bag of System Calls 0.86 0.82
Subsequence Vector 0.88 0.84
Word2Vec 26.03 1.05
Word2Vec + TF-IDF 1060.3 1.32
Doc2Vec 4.05 0.88
WWW2019 [18] Bag of System Calls 1.23 1.18
Subsequence Vector 1.21 1.16
Word2Vec 18 1.45
Word2Vec + TF-IDF 529.3 1.82
Doc2Vec 26.08 1.22
Results with three datasets: Australian Defence Force Academy Linux (ADFA-LD), Next-Generation Intrusion Detection System
(NGIDS-DS), and Web Conference 2019 (WWW2019). TF-IDF: Term-Frequency and Inverse Document Frequency
detection is that represented by system calls processed separately and aggregated using the compositionality
property, rather than the whole trace represented directly as a vector.
In Table 3 we report the average execution times observed with the different feature extraction techniques.
The execution was performed on a workstation equipped with an AMD Ryzen 5 1600 Processor (3200 MHz,
6 cores, 12 logical processors) with 32 GB of DDR4 RAM. The results show that frequency-based methods
appear very efficient, even if they lead to sub-optimal results in terms of accuracy, as discussed before.
More sophisticated feature extraction techniques are computationally more intensive, and in particular
Word2Vec in combination with TF-IDF exhibits the highest execution time among all the techniques tested
in this study. However, the leading time of the method is motivated by the training time of the TF-IDF
dictionary which, in our experiments, is performed from scratch at every execution. In practice, there
is the possibility to reduce this cost drastically by incrementally updating the TF-IDF model. Moreover,
once models are trained and deployed, their prediction time appears similar for all of them, on the order
of milliseconds. We argue that, in a production setting, training models from scratch is not required