https://doi.org/10.31449/inf.v47i6.3714 Informatica 47 (2023) 97–104 97 
Towards an Efficient Approach Using Graph-Based Evolutionary 
Algorithm for IoT Botnet Detection  
Quoc-Dung Ngo¹, Huy-Trung Nguyen²
¹Posts and Telecommunications Institute of Technology, Hanoi, 10000, Vietnam
²People’s Security Academy, Hanoi, 10000, Vietnam
E-mails: dungnq@ptit.edu.vn, huytrung.nguyen.hvan@gmail.com 
Keywords: IoT botnet, evolutionary algorithm, IoT security, PSI graph 
Received:  September 1, 2021 
In recent years, a large number of Internet of Things (IoT) devices have come into everyday use, many of
which are vulnerable to attack. Botnet malware is one of the main threats to IoT devices; hence, detecting
IoT botnets is one of the most important challenges for IoT security. This paper proposes an IoT botnet
detection approach based on PSI graph analysis combined with an evolutionary algorithm-based technique.
It applies the bacterial evolutionary algorithm (BEA) in the training process on multi-architecture IoT
botnet PSI graph data. The PSI graphs are extracted from executable files and transformed into vectors
that are fed into classical machine learning classifiers. The results of the classifiers are then combined
using a soft voting method with BEA. The proposed method achieves good experimental results (accuracy of
95.30% and F1 of 96.15%), together with a relatively low false-positive rate of 4.59%.
Povzetek (Summary): An approach for detecting IoT botnets using PSI graphs and an evolutionary
algorithm is proposed.
 
1 Introduction 
The fourth industrial revolution has driven rapid global
growth of the Internet of Things (IoT). For instance,
Statista [1] forecast that the number of connected devices
would reach 75.4 billion by 2025. This means that IoT
applications and devices are increasingly present in daily
activities. Nevertheless, this popularization has exposed
many important information security issues, such as data
breaches and privacy violations. Among these problems,
malicious code has become increasingly prevalent. There
are several categories of malware, of which ransomware
and botnets are two types with distinctive behaviors.
A botnet is a group of internet-connected devices
infected by malware that allows cyber-criminals to control
them. Botnets carry out many malicious behaviors such as
data theft, unauthorized access, credential leaks and
distributed denial-of-service (DDoS) attacks [2], [3].
Along with the immense growth of Internet of Things
applications, there have been countless botnet attacks
originating from IoT devices. For instance, the notorious
DDoS attack that took half of the internet down for several
hours in 2016 was launched by the Mirai botnet from about
1.2 million infected devices [4]. In addition, successors of
Mirai such as Reaper and Hajime also infect IoT devices
and turn them into bots for DDoS purposes.
To mitigate the damage of IoT botnet attacks,
security researchers have been continually investigating
state-of-the-art malware detection techniques. Notable
efforts include applying rule-based methods to analyze
abnormal traffic [5] and applying machine learning
classifiers to engineered feature sets such as opcodes [6]
and processor contexts [7]. From a security researcher's
point of view, malware detection techniques can be divided
into two categories: static and dynamic analysis.
Dynamic analysis [8] requires an isolated, supervised
environment in which suspicious executables are executed
and monitored to record their footprints, including system
calls, network traffic and register values. The most
challenging aspect of dynamic analysis is designing and
constructing an appropriate virtual machine that can lure
the malware into exhibiting all of its characteristics.
Furthermore, IoT malware can operate on multiple
architectures, such as SPARC, ARM, MIPS, x86 and
PowerPC. Hence, virtualizing an environment that
satisfies all the activation conditions of an IoT botnet is
expensive. In other words, the most critical drawback of
applying dynamic analysis to IoT malware is the technical
difficulty of building a suitable environment that fully
activates each malicious sample.
In contrast, static analysis [9] leverages a wide range
of techniques to identify malicious characteristics
without execution. Features evaluated in static analysis
include printable string information, grayscale images,
control flow graphs, opcodes, etc. The advantages of this
method include not only the ability to depict the structure
and functionality of multi-architecture malware but also
reduced computational cost, since it does not require a
supervised environment. In addition, because samples are
never executed, static analysis keeps the analysis system
safe and respects ethical constraints [10]. Although static
analysis has drawbacks in handling obfuscated files, many
proposals address this problem with satisfactory results.
In brief, static analysis is a feasible solution for detecting
IoT malware [11].
In the related study that introduced the PSI graph [9] as
a novel feature for detecting IoT botnets, Nguyen et al.
focused only on the overall structure of the PSI graph.
According to their hypothesis, a PSI graph contains a huge
number of execution paths of an executable file, including
both normal and abnormal paths. However, graph
exploration is an expensive operation whose cost grows
with the number of vertices and the interconnections
between them. Therefore, if the necessary routes that
characterize the original PSI graph could be extracted
efficiently, the computational complexity of the entire
botnet detection process would be greatly reduced.
This paper extends the results of [9] by introducing an
evolutionary algorithm into the ensemble process, aiming
at an effective method for detecting IoT botnets. In
summary, the key contributions of this work are:
(1) Proposing an IoT botnet detection model based on
graph data combined with an evolutionary algorithm.
(2) Evaluating the proposed method on a large IoT
botnet dataset, where it yields higher accuracy than
standard voting methods for ensembling weak learners.
The rest of the paper is structured as follows: Section 2
presents related works in the research field; Section 3
describes the proposed method in detail; Section 4
describes the experimental dataset and evaluation criteria
and analyzes the experimental results; finally, Section 5
presents the conclusions.
2 Related works  
Malware analysis can be categorized into static and
dynamic analysis. In general, static analysis can depict the
structure and maliciousness of a malware sample without
executing it [9]. On the other hand, dynamic analysis aims
to investigate the behavior of malware by activating a
sample in a supervised environment [8]. Furthermore,
hybrid analysis [12] combines the two and inherits the
advantages of both dynamic and static techniques.
A distinguishing characteristic of IoT botnets is the
diversity of architectures they run on, such as x86, MIPS,
ARM and PowerPC [13]. Given the requirements of
dynamic analysis, it is costly to simulate an entire
environment even for a single architecture. Therefore,
when investigating IoT botnets, static analysis methods
allow researchers to handle multi-architecture issues and
mitigate the limitations of dynamic analysis.
In recent years, the number, complexity and notoriety
of malware have skyrocketed. While signature-based
classifiers [14] are almost useless for detecting novel types
of malware, security researchers often leverage machine
learning algorithms as an alternative yet effective solution
for dealing with unseen malware [15]. Besides,
evolutionary algorithms and their variants are another
noteworthy technique for dealing with the rapid mutation
of unseen malware [16], [17], [18], [19].
An overview of the general application of evolutionary
algorithms to rule-based systems was given by Shafiq et al.
in [17]. This comparative study extracted static features
from executables, selected five well-known evolutionary
algorithms (XCS, GAssist-ADI, UCS, SLAVE and
GAssist-Intervalar) and benchmarked them against five
non-evolutionary algorithms in classifying malicious
executables. The experimental dataset consisted of 11,786
Windows PE files, of which 1,447 were benign and 10,339
were malicious; the malicious files came from the VX
Heavens Virus Collection and were divided into eight
major classes. The accuracy of the evolutionary models is
promising: the lowest value was 0.95, and most ranged
from 0.98 to 0.99. However, considering all four suggested
performance metrics, namely (1) classification accuracy,
(2) number of rules, (3) comprehensibility of the rules and
(4) processing overhead, the paper stated that
non-evolutionary rule learning algorithms clearly
outperform evolutionary ones on every metric. It also
noted that the processing cost and comprehensibility of
evolutionary rule learning algorithms can be improved by
borrowing concepts from non-evolutionary rule learning
algorithms.
Another work, by Rafique et al., combined dynamic
analysis with evolutionary algorithms to automatically
classify malware families and their polymorphic variants
[18]. By using protocol-aware modeling to handle formal
protocol traffic and state-space modeling to handle
unknown protocol traffic, this solution extracted features
from network behaviors collected from PCAP files after
executing and monitoring malware samples in a supervised
environment. In the evaluation phase, four evolutionary
algorithms (GAssist-ADI, SLAVE, UCS, XCS) were
compared against four classical non-evolutionary
classifiers (C4.5, C-SVM, kNN, Naïve Bayes). The
experimental dataset contained 6000 binaries from 20
recent malware families, most of which were obtained
from the MALICIA dataset. The results showed poor
performance for the evolutionary classifiers except UCS,
which outperformed all the others with roughly 99.7%
accuracy on the entire dataset and 85.28% per malware
family. Another notable downside of the examined
evolutionary classifiers was their testing time, which was
mostly slower than that of the non-evolutionary
candidates. The paper presented state-space modeling as a
promising technique for extracting the network behaviors
of unknown protocols. However, this approach still needs
to be examined further and compared with other network
feature extractors. In addition, the evolutionary algorithms
in this research were used without any modification or
improvement over their original proposals.
A notable study by Manavi et al. [16] used static
analysis to extract OpCodes from executables and then
used an evolutionary classifier to detect malicious samples
according to a predefined list of 9 malware families. In
this work, after the disassembly phase, an OpCode graph
was constructed for the executable file. The proposed
evolutionary classifier then generated the graph most
similar to the target. Finally, using a Euclidean distance
fitness function, the most similar graph among the results
determined the maliciousness of the sample. The
experimental dataset was quite diverse, comprising 3
sub-datasets: 1600 malware and 1600 benign samples from
the VX Heavens dataset; 4000 APKs with a 50-50
benign-to-malware ratio from the Drebin dataset; and 2042
samples from 9 malware families from the Microsoft
Kaggle malware classification challenge.
On the first two datasets, the results of the proposed
method were as good as those of the related studies by
Hashemi et al. [20] and Santos et al. [21], which also used
OpCodes as a feature. On the third (Microsoft) dataset, the
evolutionary classifier outperformed the others, but its
accuracy was limited to 87.67%. Nevertheless, although
this research exploited static features, it did not suggest
any in-depth solution for obfuscated malware. In addition,
the runtime analysis of the proposed evolutionary
classifier was omitted. Last but not least, although the
dataset was quite varied, it lacked botnet samples,
especially IoT botnets.
An efficient combination of genetic algorithms and
neural networks for botnet detection, called the Genetic
Neural Network (GNN), was proposed in [22]. This work
combined the strong global search capability of the genetic
algorithm with the precise local search of backpropagation
to improve the initial weights of a feed-forward neural
network. With 7 features extracted from network flow
data, the proposed GNN proved to be a promising model,
with better accuracy (95.7%) than either the
backpropagation neural network or the genetic algorithm
alone. However, this work specified neither a deterministic
method for feature selection nor a description of the
experimental dataset.
Nevertheless, to the best of our knowledge, no existing
research detects IoT botnets by combining an evolutionary
algorithm with the novel PSI graph [9] as a feature.
3 Methodology 
We enhance the performance of weak classifiers in
detecting IoT botnets based on PSI graphs generated from
ELF files by applying the bacterial evolutionary algorithm
in the ensemble process of these classifiers. This section
explains our approach in detail, including the PSI graph
extraction process and the evolutionary voting process
used to detect IoT botnets from these graphs.
3.1 Overview 
The main components of our method are presented in
Figure 1. The method comprises 3 main processes:
extracting PSI graphs from ELF files, training weak
classifiers, and applying the bacterial evolutionary
algorithm in the ensemble process of the weak classifiers.
 
Figure 1: The overview of proposed method. 
First, we analyze the ELF files of malware and benign
samples to generate a PSI graph for each file. Next, we
preprocess the graphs using the graph2vec [23] algorithm,
which embeds graphs of similar structure close together in
the feature space. We then use classical machine learning
classifiers to classify the graph vectors generated by
graph2vec, and apply different voting strategies, namely
hard voting and soft voting, to ensemble the weak
classifiers. The bacterial evolutionary algorithm is applied
in the soft voting phase to improve voting accuracy.
Finally, we compare the classification results of each
classifier and each ensemble method to decide whether the
method is effective.
3.2 PSI graph extraction 
Printable String Information (PSI) is a set of strings that
usually contain important semantic information reflecting
the attacker’s intent. PSI has been used in static analysis to
identify malicious ELF files. However, that line of work
pays little attention to the linkages between PSI elements,
which give more information about context and could
greatly improve the results. In our work, we reuse the PSI
graph dataset generated in our previous research by
Nguyen et al. [9] and adopt its way of representing an IoT
executable file as a PSI graph.
Definition 1: A PSI graph is a directed graph defined as
$G = (V, E)$, where $V$ is a set of vertices called PSI
elements and $E$ is a set of edges representing function calls.
 
Figure 2: An example of PSI graph. 
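To make Definition 1 concrete, the following minimal sketch stores a PSI graph as a set of vertices plus an adjacency map of directed call edges. The class name `PSIGraph`, its methods, and the example PSI elements are hypothetical and chosen only for illustration; they are not taken from the dataset or from the extraction tool of [9].

```python
from collections import defaultdict


class PSIGraph:
    """Directed graph G = (V, E): vertices are PSI elements, edges are function calls."""

    def __init__(self):
        self.vertices = set()
        self.edges = defaultdict(set)  # caller -> set of callees

    def add_call(self, caller, callee):
        """Add the directed edge caller -> callee, registering both vertices."""
        self.vertices.update((caller, callee))
        self.edges[caller].add(callee)

    def successors(self, vertex):
        """Return the PSI elements reachable from `vertex` by a single call."""
        return sorted(self.edges.get(vertex, ()))

    def num_edges(self):
        """Count the directed edges in E."""
        return sum(len(callees) for callees in self.edges.values())


def build_example():
    """Build a tiny graph from made-up PSI elements (illustrative only)."""
    graph = PSIGraph()
    calls = [
        ("main", "connect_cnc"),
        ("main", "scan_telnet"),
        ("scan_telnet", "try_login"),
        ("try_login", "connect_cnc"),
    ]
    for caller, callee in calls:
        graph.add_call(caller, callee)
    return graph
```

An adjacency map keeps edge insertion and successor lookup cheap, which matters because graph exploration cost grows with the number of vertices and their interconnections.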
3.3 Training weak classifiers
After obtaining the PSI graphs, we convert the graph data
into input for machine learning classifiers. Using the
graph2vec algorithm, we turn the PSI graph data into
vectors such that graphs with similar structure are
embedded close together in the feature space. We then
standardize the feature vectors to aid convergence,
rescaling features with large values so that all features lie
in a comparable range. Standardization is applied using
the formula:

$$x_{standardized} = \frac{x - \mu}{\sigma}$$

where $\mu$ and $\sigma$ are the mean and the standard deviation
of the original data, respectively.
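The standardization step can be sketched as follows. This is a minimal illustration that assumes per-feature (column-wise) statistics and the population standard deviation; the function name `standardize_columns` and the guard for constant columns are assumptions of this sketch, not details given in the paper.

```python
from statistics import fmean, pstdev


def standardize_columns(rows):
    """Apply x' = (x - mu) / sigma independently to each feature column."""
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    # Assumption: a constant column (sigma = 0) is left unscaled to avoid division by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [
        [(x - m) / s for x, m, s in zip(row, means, stds)]
        for row in rows
    ]
```

After this transformation every non-constant feature has zero mean and unit standard deviation, so distance-based classifiers such as KNN are not dominated by features with large raw values.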
We then feed the standardized graph vectors into
classical machine learning classifiers to detect IoT botnet
samples. K-nearest neighbors (KNN), Support Vector
Machine (SVM), Logistic Regression, Gradient Boosting
and Random Forest are the algorithms chosen for the
individual classifiers. These classifiers are well known for
their effectiveness in classification problems and have
been used by many researchers for intrusion detection. In
the training phase, we use k-fold cross-validation
combined with hyperparameter tuning over a grid of
parameter values. The model with the best
hyperparameters is trained and then evaluated on the test
set. To combine the predictions of the different classifiers,
an ensemble method is required. Here we use 2 voting
methods: hard voting and soft voting. Hard voting is the
more straightforward of the two: it collects the predicted
labels of all classifiers and takes the majority label as the
final prediction. Soft voting sums the class probabilities
predicted by each classifier and takes the class with the
largest summed probability as the final prediction. An
ensemble usually yields higher accuracy on classification
tasks because it aggregates the predictions of all the
classifiers involved; this is why the individual classifiers
are referred to as “weak classifiers”.
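The difference between the two voting schemes can be sketched for the binary case (0 = benign, 1 = botnet). The function names and the optional `weights` argument are assumptions made for illustration; the weights correspond to the per-classifier weights that the evolutionary algorithm later tunes.

```python
def hard_vote(labels):
    """Majority vote over the predicted labels (0 = benign, 1 = botnet)."""
    return 1 if 2 * sum(labels) > len(labels) else 0


def soft_vote(probas, weights=None):
    """Weighted soft vote over each classifier's predicted P(botnet).

    The class with the larger weighted sum of probabilities wins.
    With weights=None every classifier gets weight 1 (equal weights).
    """
    weights = weights or [1.0] * len(probas)
    p_botnet = sum(w * p for w, p in zip(weights, probas))
    p_benign = sum(w * (1.0 - p) for w, p in zip(weights, probas))
    return 1 if p_botnet > p_benign else 0
```

For example, with five classifiers reporting P(botnet) of 0.45, 0.45, 0.40, 0.95 and 0.90, three of the five labels are benign, so hard voting predicts benign, whereas equal-weight soft voting predicts botnet because the two confident classifiers dominate the summed probability. Changing the weights can flip the soft-voting decision, which is why tuning them is worthwhile.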
3.4 Bacterial evolutionary algorithm 
The bacterial evolutionary algorithm (BEA) is an
evolutionary algorithm inspired by bacteria, and its
properties are similar to those of the genetic algorithm
(GA): it is also a global optimization technique and
provides a near-optimal, approximate solution to the
problem. It is useful even if the objective function is
non-linear, non-continuous, multimodal or
high-dimensional. BEA does not use derivatives of the
objective function, so it is not a problem if they are
unknown or do not exist [24].
In our approach, BEA is used in the ensemble phase to
improve the soft voting strategy, as depicted in Figure 1.
BEA has 3 main steps: population generation; cloning,
mutation and selection; and gene transfer. The details of
BEA in our approach are as follows:
Generate population: we create the initial population
with a population size of N_POP = 100. Each chromosome
in the population is one bacterium containing N_GENES =
5 genes, which represent the weights of the 5 weak
classifiers in the soft voting process. Gene values are
drawn from a Gaussian distribution with mean 1 and
standard deviation 0.2. Using a Gaussian distribution
keeps the weights close to 1, which avoids the situation
where one classifier has a very large weight while the
others are too small to matter.
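The population generation step can be sketched as below, using the parameter values stated above. The function name and the use of an explicit `random.Random` instance (for reproducibility) are assumptions of this sketch.

```python
import random

N_POP = 100     # number of bacteria in the population
N_GENES = 5     # one voting weight per weak classifier
MEAN, STD = 1.0, 0.2


def generate_population(rng, n_pop=N_POP, n_genes=N_GENES):
    """Create n_pop bacteria, each a list of n_genes Gaussian-distributed weights."""
    return [
        [rng.gauss(MEAN, STD) for _ in range(n_genes)]
        for _ in range(n_pop)
    ]
```

With a standard deviation of 0.2, almost all initial weights fall roughly between 0.4 and 1.6, so no single classifier dominates the vote at the start.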
Clone, mutate and select: At the beginning of a
generation, each bacterium creates N_CLONE = 20 clones
of itself. In each mutation step, one gene position is
selected at random for all clones, and each clone mutates
by replacing the chosen gene with a random value drawn
from the distribution mentioned above. The fitness score
of each clone is then computed as the average 10-fold
cross-validation accuracy on the training set when the
clone's genes are used as voting weights. If a clone has a
higher fitness score than the original, it is selected to
replace all the other clones. The mutation process repeats
N_MUTATE = 10 times, so that each of the N_GENES = 5
genes has the opportunity to be mutated.
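The cloning, mutation and selection step for a single bacterium can be sketched as follows. In the paper the fitness is the average 10-fold cross-validation accuracy of weighted soft voting; here `toy_fitness` is a hypothetical stand-in (closeness to a made-up target weight vector) so that the sketch runs on its own, and the function names are assumptions.

```python
import random

N_CLONE = 20    # clones per bacterium
N_MUTATE = 10   # mutation steps per generation
MEAN, STD = 1.0, 0.2


def bacterial_mutation(bacterium, fitness, rng,
                       n_clone=N_CLONE, n_mutate=N_MUTATE):
    """Return the improved bacterium and its fitness after n_mutate steps."""
    best, best_fit = list(bacterium), fitness(bacterium)
    for _ in range(n_mutate):
        gene = rng.randrange(len(best))  # one gene position shared by all clones
        for _ in range(n_clone):
            clone = list(best)
            clone[gene] = rng.gauss(MEAN, STD)  # resample only the chosen gene
            clone_fit = fitness(clone)
            if clone_fit > best_fit:  # keep a clone only if it beats the current best
                best, best_fit = clone, clone_fit
    return best, best_fit


def toy_fitness(weights):
    """Hypothetical fitness: higher when weights are closer to a made-up target."""
    target = [1.2, 0.9, 1.0, 1.1, 0.8]
    return -sum((w - t) ** 2 for w, t in zip(weights, target))
```

Because a clone is accepted only when it improves on the current best, the fitness of a bacterium never decreases during this step.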
Gene transfer: After the mutation process, all bacteria
are sorted by fitness score and the population is split into
2 halves. We then select 2 random bacteria, one from the
upper half and one from the lower half, and copy one or
several random genes from the upper-half bacterium to the
lower-half bacterium. The population is then re-sorted, so
that every lower-half bacterium has a chance to move into
the upper half while the upper half always contains the
fitter bacteria. This process repeats N_TRANS = 50 times,
which ends a generation.
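The gene transfer step can be sketched as below. The fitness function is passed in as a parameter; the function name, the choice to copy the input population, and the way the number of transferred genes is drawn are assumptions of this sketch.

```python
import random

N_TRANS = 50  # number of gene transfers per generation


def gene_transfer(population, fitness, rng, n_trans=N_TRANS):
    """Copy random genes from fitter bacteria to weaker ones, re-sorting each time."""
    # Work on copies so the caller's population is left untouched.
    pop = sorted((list(b) for b in population), key=fitness, reverse=True)
    half = len(pop) // 2
    for _ in range(n_trans):
        donor = pop[rng.randrange(half)]               # random upper-half bacterium
        receiver = pop[rng.randrange(half, len(pop))]  # random lower-half bacterium
        n_copy = rng.randint(1, len(donor))            # one or several genes
        for g in rng.sample(range(len(donor)), n_copy):
            receiver[g] = donor[g]
        # Re-sort so an improved receiver can move into the upper half.
        pop.sort(key=fitness, reverse=True)
    return pop
```

Only lower-half bacteria are ever overwritten, so the best bacterium found so far is always preserved.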
The cloning and gene transfer processes are repeated for
N_GENERATIONS = 10 generations. In our experiments,
the algorithm converged quickly, so no additional local
optimization (memetic algorithm) was needed.
4 Experiments and evaluation
This section describes our experimental environment, the
dataset used, the evaluation metrics and the results,
followed by a discussion.
4.1 Dataset description 
We inherit the PSI graph dataset from previous research
on PSI graphs. The dataset consists of 10010 PSI graph
samples, with a fairly balanced split of 3845 IoT botnet
samples and 6165 benign samples. The IoT botnet samples
belong mainly to two typical botnet families, Gafgyt and
Mirai, together with less common malware such as
Tsunami and Aida, as shown in Figure 3.
 
 
Figure 3: Distribution of the botnet samples in the dataset.
Samples in the dataset come from multiple CPU
architectures, including ARM, MIPS, Intel 80386, x86-64,
PowerPC, Motorola, SPARC and SuperH. The number of
IoT botnet samples for each CPU architecture is shown in
Figure 4.
 
Figure 4: Number of botnet samples for each CPU
architecture in the dataset.
The experiments were conducted with the following
configuration: Ubuntu 16.04 LTS 64-bit, Intel Xeon, 8 GB
RAM. The experiments were implemented in Python.
4.2 Evaluation metric and results 
The following terms are used to evaluate the effectiveness
of the proposed method.
- True positive (TP): the number of malicious samples
that are correctly recognised as malicious
- True negative (TN): the number of benign samples
that are correctly recognised as benign
- False positive (FP): the number of benign samples
that are incorrectly identified as malicious
- False negative (FN): the number of malicious samples
that are incorrectly identified as benign
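The four terms above can be counted from true and predicted labels as in the following minimal sketch, which assumes the label encoding 1 = malicious and 0 = benign; the function name is hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels with 1 = malicious, 0 = benign."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # malicious, flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign, passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # malicious, missed
    return tp, tn, fp, fn
```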
The following metrics are used to evaluate the
effectiveness of the proposed method:
- True positive rate (TPR), also called Sensitivity or
Recall: the number of malware samples correctly classified
as malicious divided by the total number of malware
samples. This metric shows the probability of detecting
malware samples.

$$TPR = \frac{TP}{P} = \frac{TP}{TP + FN}$$

- False positive rate (FPR), or Fall-out: the number of
benign samples falsely marked as malicious divided by the
total number of benign samples. This metric shows the
probability of a false alarm.

$$FPR = \frac{FP}{N} = \frac{FP}{FP + TN}$$

- Accuracy (ACC): the ratio of the number of correctly
classified samples to the total number of malware and
benign samples. However, accuracy is not reliable on
imbalanced datasets.

$$ACC = \frac{TP + TN}{P + N} = \frac{TP + TN}{TP + TN + FP + FN}$$

- F1-score: the harmonic mean of Precision and Recall
(TPR), where $Precision = TP / (TP + FP)$. F1-score is a
combined metric for estimating overall model performance
and is defined as follows:

$$F_1 = 2 \cdot \frac{Precision \cdot Recall}{Precision + Recall}$$
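The metric definitions above translate directly into code. The sketch below is illustrative only: the function name and the example counts are hypothetical and are not results from the paper.

```python
def metrics(tp, tn, fp, fn):
    """Compute TPR, FPR, ACC, Precision and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # identical to TPR
    return {
        "TPR": recall,
        "FPR": fp / (fp + tn),
        "ACC": (tp + tn) / (tp + tn + fp + fn),
        "Precision": precision,
        "F1": 2 * precision * recall / (precision + recall),
    }
```

For instance, with the made-up counts TP = 90, TN = 80, FP = 20 and FN = 10, TPR is 0.9, FPR is 0.2 and ACC is 0.85. The sketch assumes every denominator is non-zero, i.e. the evaluation set contains both classes and at least one sample is predicted malicious.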
 
We ran experiments that train the weak classifiers,
perform the ensemble process, and improve the ensemble
process using the bacterial evolutionary algorithm; the
results are shown in Table 1.
 
Table 1. Experimental results of the proposed method with different classifiers.

| Classifier | Accuracy (%) | Precision (%) | Recall (%) | F1 (%) | FPR (%) | Average 10-fold CV accuracy on training set (%) |
|---|---|---|---|---|---|---|
| Best weak estimator (KNN) | 94.54 | 95.17 | 96.00 | 95.58 | 7.80 | 94.25 |
| Hard voting | 95.07 | 96.96 | 94.97 | 95.96 | 4.77 | - |
| Soft voting (equal weights) | 95.14 | 96.76 | 95.29 | 96.02 | 5.11 | 94.82 |
| Soft voting (BEA) | 95.30 | 97.08 | 95.24 | 96.15 | 4.59 | 95.08 |
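As a consistency check, the F1 column of Table 1 can be recomputed from the Precision and Recall columns using the harmonic-mean definition. The values below are copied from Table 1; the dictionary and function names are hypothetical. Small differences are expected because the published precision and recall are already rounded to two decimals.

```python
# (Precision %, Recall %, reported F1 %) copied from Table 1.
TABLE_1 = {
    "Best weak estimator (KNN)": (95.17, 96.00, 95.58),
    "Hard voting": (96.96, 94.97, 95.96),
    "Soft voting (equal weights)": (96.76, 95.29, 96.02),
    "Soft voting (BEA)": (97.08, 95.24, 96.15),
}


def f1_from(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2 * precision * recall / (precision + recall)


def max_f1_gap(table=TABLE_1):
    """Largest absolute gap between the recomputed and the reported F1."""
    return max(abs(f1_from(p, r) - f1) for p, r, f1 in table.values())
```

The recomputed values agree with the reported F1 scores to within rounding of the inputs.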
 
Figure 5: Bacterial evolution algorithm training accuracy 
for soft voting. 
The results show that the best individual classifier is
KNN, which achieves 94.54% accuracy in detecting IoT
botnets. KNN achieves the highest classification rate
among the individual classifiers because graph2vec was
used to convert the graph data into vectors: graphs with
similar structure usually have vectors embedded near each
other, so KNN can group these graphs more easily, which
results in a higher classification rate. The results also show
that soft voting with BEA performs better than standard
hard voting and soft voting, with 95.30% accuracy and
4.59% FPR. Figure 5 also shows that applying BEA in the
soft voting process increases the overall performance of
the model.
The authors of [16] also represent malware as a graph,
using an OpCode graph and an evolutionary algorithm for
classification. Our study yields a significantly higher
detection rate than the work of Manavi et al. [16] (95.30%
compared to 85.8%–87.67%). HaddadPajouh et al. [25]
used a deep recurrent neural network to classify
ARM-based IoT botnets. Our results reach accuracy
equivalent to that of [25] (94% accuracy), but their
research used a smaller dataset and focused only on
ARM-based IoT botnets. The same can be said of the study
by Su et al. [26], which used malware images and a CNN
(94% accuracy). These results show that applying an
evolutionary algorithm in the training process on PSI
graph data can improve the detection of IoT botnets.
5 Conclusion and future works 
In this research, we applied the bacterial evolutionary
algorithm (BEA) in the training process on
multi-architecture IoT botnet PSI graph data to detect IoT
botnets. The PSI graphs were extracted from executable
files and transformed into vectors that were fed into
classical machine learning classifiers. The results of the
classifiers were then combined using a soft voting method
with BEA. The results show that our method achieves
higher accuracy than other research using graphs as input,
while being evaluated on a much larger dataset. In the
future, we hope to improve our graph method and modify
the algorithm to achieve higher accuracy.
 
References 
[1] Statista Research Department, “Internet of Things -
Number of connected devices worldwide 2015-2025,”
2019. [Online]. Available:
https://www.statista.com/statistics/471264/iot-number-of-connected-devices-worldwide/
[2] Al-Hadhrami, Y. and Hussain, F.K., “DDoS attacks
in IoT networks: a comprehensive systematic
literature review,” World Wide Web, vol. 24, no. 3,
pp. 971–1001, 2021. [Online]. Available:
https://doi.org/10.1007/s11280-020-00855-2
[3] Silva, S.S.C., Silva, R.M.P., Pinto, R.C.G. and
Salles, R.M., “Botnets: A survey,” J. Comput. Netw.
Elsevier, vol. 57, no. 2, pp. 378–403, 2013. [Online].
Available:
https://doi.org/10.1016/j.comnet.2012.07.021
[4] Bertino, E. and Islam, N., “Botnets and internet of 
things security,” Computer, vol. 50, no. 2, pp. 76–
79, 2017. [Online]. Available: 
https://doi.org/10.1109/mc.2017.62 
[5] Ozawa, S., Ban, T., Hashimoto, N., Nakazato, J.
and Shimamura, J., “A study of IoT malware
activities using association rule learning for darknet
sensor data,” International Journal of Information
Security, vol. 19, no. 1, pp. 83–92, 2020. [Online].
Available: https://doi.org/10.1007/s10207-019-00439-w
[6] Peters, W., Dehghantanha, A., Parizi, R.M. and
Srivastava, G., “A comparison of state-of-the-art
machine learning models for OpCode-based IoT
malware detection,” in Handbook of Big Data
Privacy, Springer, Cham, pp. 109–120, 2020.
[Online]. Available:
https://doi.org/10.1007/978-3-030-38557-6_6
[7] Takase, H., Kobayashi, R., Kato, M. and Ohmura,
R., “A prototype implementation and evaluation of
the malware detection mechanism for IoT devices
using the processor information,” International
Journal of Information Security, vol. 19, no. 1,
pp. 71–81, 2020. [Online]. Available:
https://doi.org/10.1007/s10207-019-00437-y
[8] Le, H.V. and Ngo, Q.D., “V-Sandbox for Dynamic 
Analysis IoT Botnet,” IEEE Access, vol. 8, pp. 
145768–145786, 2020. [Online]. Available: 
https://doi.org/10.1109/access.2020.3014891 
[9] Nguyen, H.T., Ngo, Q.D. and Le, V.H., “A novel
graph-based approach for IoT botnet detection,” Int.
J. Inf. Secur., vol. 19, no. 5, pp. 567–577, 2020.
[Online]. Available:
https://doi.org/10.1007/s10207-019-00475-6
[10] Ma, W., Duan, P., Liu, S., Gu, G. and Liu, J.C.,
“Shadow attacks: automatically evading system-
call-behavior based malware detection,” J. Comput.
Virol., vol. 8, no. 1, pp. 1–13, 2012. [Online].
Available: https://doi.org/10.1007/s11416-011-0157-5
[11] Ngo, Q.D., Nguyen, H.T., et al., “A survey of IoT
malware and detection methods based on static
features,” ICT Express, vol. 6, no. 4, pp. 280–286,
2020. [Online]. Available:
https://doi.org/10.1016/j.icte.2020.04.005
[12] Ngo, Q.D., Nguyen, H.T., Tran, H.A. and Nguyen,
D.H., “IoT Botnet detection based on the integration
of static and dynamic vector features,” in 2020 IEEE
Eighth International Conference on Communications
and Electronics (ICCE), pp. 540–545, 2021.
[Online]. Available:
https://doi.org/10.1109/icce48956.2021.9352145
[13] Xiao, L., Wan, X., Lu, X., Zhang, Y. and Wu, D.,
“IoT security techniques based on machine learning:
How do IoT devices use AI to enhance security?,”
IEEE Signal Processing Magazine, vol. 35, no. 5,
pp. 41–49, 2018. [Online]. Available:
https://doi.org/10.1109/msp.2018.2825478
[14] Borello, J.M. and Mé, L., “Code obfuscation
techniques for metamorphic viruses,” Journal in
Computer Virology, vol. 4, no. 3, pp. 211–220, 2008.
[Online]. Available:
https://doi.org/10.1007/s11416-008-0084-2
[15] Souri, A. and Hosseini, R., “A state-of-the-art
survey of malware detection approaches using data
mining techniques,” Human-centric Computing and
Information Sciences, vol. 8, no. 1, pp. 1–22, 2018.
[Online]. Available:
https://doi.org/10.1186/s13673-018-0125-x
[16] Manavi, F. and Hamzeh, A., “A new approach for 
malware detection based on evolutionary 
algorithm,” 2019, pp. 1619–1624. [Online]. 
Available: https://doi.org/10.1145/3319619.3326811 
[17] Shafiq, M.Z., Tabish, S.M. and Farooq, M., “On the 
appropriateness of evolutionary rule learning 
algorithms for malware detection,” 2009, pp. 2609–
2616. [Online]. Available: 
https://doi.org/10.1145/1570256.1570370 
[18] Rafique, M.Z., Chen, P., Huygens, C. and Joosen, 
W., “Evolutionary algorithms for classification of 
malware families through different network 
behaviors,” 2014, pp. 1167–1174. [Online]. 
Available: https://doi.org/10.1145/2576768.2598238 
[19] Lysenko, S., Bobrovnikova, K., Shchuka, R. and
Savenko, O., “A cyberattacks detection technique
based on evolutionary algorithms,” in 2020 IEEE
11th International Conference on Dependable
Systems, Services and Technologies (DESSERT),
pp. 127–132, 2020. [Online]. Available:
https://doi.org/10.1109/dessert50317.2020.9125016
[20] Hashemi, H., Azmoodeh, A., Hamzeh, A. and
Hashemi, S., “Graph embedding as a new approach
for unknown malware detection,” Journal of
Computer Virology and Hacking Techniques,
vol. 13, no. 3, pp. 153–166, 2017. [Online].
Available: https://doi.org/10.1007/s11416-016-0278-y
[21] Santos, I., Brezo, F., Nieves, J., Penya, Y.K., Sanz,
B., Laorden, C. and Bringas, P.G., “Idea: Opcode-
sequence-based malware detection,” 2010, pp. 35–
43. [Online]. Available:
https://doi.org/10.1007/978-3-642-11747-3_3
[22] Yin, C., Awlla, A.H., Yin, Z. and Wang, J., “Botnet
detection based on genetic neural network,” Int. J.
Secur. Its Appl., vol. 9, no. 11, pp. 97–104, 2015.
[Online]. Available:
https://doi.org/10.14257/ijsia.2015.9.11.10
[23] A. Narayanan, M. Chandramohan, R. Venkatesan, 
L. Chen, Y. Liu, and S. Jaiswal, “graph2vec: 
Learning distributed representations of graphs,” 
ArXiv Prepr. ArXiv170705005, 2017. 
[24] F. Hatwágner and A. Horváth, “Maintaining genetic 
diversity in bacterial evolutionary algorithm,” Ann. 
Univ Sci Bp. Sec Comp, vol. 37, pp. 175–194, 2012. 
[25] H. HaddadPajouh, A. Dehghantanha, R. Khayami, 
and K.-K. R. Choo, “A deep recurrent neural 
network based approach for internet of things 
malware threat hunting,” Future Gener. Comput. 
Syst., vol. 85, pp. 88–96, 2018. [Online]. Available: 
https://doi.org/10.1016/j.future.2018.03.007 
[26] J. Su, D. V. Vasconcellos, S. Prasad, D. Sgandurra,
Y. Feng, and K. Sakurai, “Lightweight
classification of IoT malware based on image
recognition,” 2018, vol. 2, pp. 664–669. [Online].
Available: https://doi.org/10.1109/COMPSAC.2018.10315