https://doi.org/10.31449/inf.v48i14.5998 Informatica 48 (2024) 83–96 83 
Detecting and Tracking Rumours in Social Media Based on Deep 
Learning Algorithm 
Chunyan Han (1,2,*), Ling Lin (2)
(1) School of Journalism and Communication, Shandong University, Jinan, Shandong 251000, China
(2) Department of Scientific Research, Changji University, Changji, Xinjiang 831100, China
E-mail: 13415980058@163.com  
*Corresponding author 
Keywords: deep learning algorithms, detection and tracking, social media, rumours  
Received: April 9, 2024
Online rumours are becoming more widespread and their impact is growing. In the era of social media, more and more network users share photos taken by themselves or others, actively express their views and opinions, and interact and communicate with one another, which forms online public opinion. Automatic rumour detection technology can purify the network environment and help avoid social chaos or turbulence. This paper therefore proposes a deep learning algorithm to detect and track rumours in social media. After some feature types are eliminated for real-time detection on the data stream, when the time index reaches 40 the real-time accuracy of the data mining algorithm is 57.23% and that of the ant colony algorithm is 53.45%. The effectiveness of the proposed method is also confirmed. In terms of technique, the approach breaks through the limitations of traditional skills under the deep learning algorithm, using traditional painting and involving devices, ready-made materials, popular symbols, digital technology and other means.
Povzetek: Razvit je nov algoritem globokega učenja za zaznavanje in sledenje govoricam na družbenih 
omrežjih. 
 
1 Introduction 
Rumour is an important phenomenon of information dissemination in human society. It has been one of the focuses of interest in social psychology and in journalism and communication for decades. Online rumours have become more widespread and have a wider impact. In the era of social media, more and more network users reprint or share photos, videos and text messages produced by themselves or others on the Internet, actively express their views and opinions, and interact and communicate with others, which forms online public opinion [1-2]. While providing abundant information, social media has become a hotbed for rumours. Some scholars believe that social media provides a platform for public communication and allows rumours to dissolve through discussion and debate [3]. Whether the spread of rumours reflects an illusion of excess information or a degradation of self-immunity, and whether social media contributes to the spread of rumours or strengthens the self-purification function of the network, remain difficult questions for scholars [4-5]. Owing to the non-disclosure or untimely disclosure of information and to the lack of attention or trust involving the government, opinion leaders and network promoters, online public opinion easily gives rise to all kinds of online rumours. The release and dissemination of information on social media are extremely convenient. For rumour stance classification, the goal is a four-class task: each text is labelled as support, deny, query or comment, referred to as the SDQC task for short. At the same time, the Sina Weibo platform has opened a Weibo Biyao account that regularly releases confirmed rumours for users to browse [6].
Automatic rumour detection technology has important research value and social significance for purifying the network environment, avoiding social chaos or turbulence and eliminating threats to the country. Therefore, this paper proposes a deep learning algorithm to detect and track rumours in social media [7]. As deep learning algorithms have developed and matured, they have been applied successfully in different fields and have shown excellent performance, for example in emotional information extraction in text mining tasks. Such a model can predict whether the input rumour data is true or false in combination with auxiliary information. The task can also be modelled as a three-class problem, that is, true, false and unverifiable. It is therefore necessary to rely on computers for automatic rumour detection and to apply the rumour detection model to early rumour detection so as to curb the spread of rumours effectively [8-9]. A deep learning algorithm can also capture the complex relationships within data from many sources and map the different learned feature vectors into the same latent space to obtain a unified data representation. On this basis, the deep feature representation of the data is fused into the traditional recommendation algorithm to fuse multi-source data effectively.
The deep learning algorithm assumes that users' social communication in social networks affects the modelling of their preferences. Because graph convolutional networks can model both graph structure and node information, this paper uses a convolutional neural network [10]. Traditional methods often struggle when historical data is lacking, whereas deep learning models can make reasonable recommendations for new users or products by learning the common features of users and products. In addition, some deep learning techniques, such as transfer learning and meta-learning, can transfer knowledge from other tasks or fields to recommendation systems to alleviate cold-start problems. With the rapid development of platforms such as e-commerce and social media, the number of users and products keeps increasing and the scale of data keeps growing. Traditional methods are often inefficient at processing such large-scale data, whereas deep learning models, especially those built on distributed deep learning frameworks, can handle dynamic, large-scale data efficiently. Through technologies such as parallel computing and distributed storage, deep learning models can complete training in a short period of time and update recommendation results in real time. Following different research ideas on rumour detection, early work usually built classifiers on different types of hand-crafted features through supervised learning [11-12]. The reading, dissemination and expression of information have become very easy: the audience can follow and forward the news they are interested in. In addition, owing to the low threshold and low cost of social media, the "gatekeeper" role of the media is weakened, and it is difficult for it to work accurately and quickly. In terms of techniques, we break through the limitations of traditional skills under the deep learning algorithm, using traditional painting and involving devices, finished materials, popular symbols, digital technology and other means [13].
This paper makes the following contributions:
(1) An LSTM model is constructed. In the LSTM model, three gates are set up to determine whether the input value from the previous layer is important enough to be remembered and output. Each gate is controlled by a sigmoid function unit; if the value generated by the input gate is close to zero, the value is blocked and does not pass to the next level.
(2) Rumour detection is discussed and described. According to the different characteristics of rumour data, rumour detection is divided into two sub-tasks: rumour stance classification and rumour authenticity prediction. The rumour stance classification task is oriented to tree-structured data sets, whose general structure is a source text and different users' replies to this text. The classification goal is a four-class task in which each text is labelled as support, deny, query or comment, referred to as the SDQC task for short. At the same time, the Sina Weibo platform has opened a Weibo Biyao account that regularly publishes confirmed rumours for users to browse.
The second chapter describes the research status of rumours at home and abroad, the research work of this paper and its significance. The third chapter discusses the implementation of the deep learning model. The fourth chapter describes the data set and presents experiments and analysis on rumour detection in social media. The fifth chapter summarizes the full text.
2 Related work 
2.1 Research status of rumours at home and abroad
Since the "school fever", people have never stopped studying social media. Most researchers summarize its characteristics from five angles. The first is to analyse, from a cultural perspective, the cultural oscillation caused by social media. The rise of new media inevitably leads to a new media culture. As a new branch of popular culture, the Internet influences the lifestyle and values of today's netizens.
The model considers the daily cycle of rumours, such as holidays, and the impact of external shock cycles, such as the quadrennial general election, and shows that rumours are likely to fluctuate with the time cycle [14]. Aieha B et al. argued that the Internet makes the spread of rumours even more powerful. In a study of the rumour phenomenon in Internet communication, Professor Chao Naipeng used methods from psychology and sociology to demonstrate the propagation mode and characteristics of Internet rumours and pointed out that "Internet rumours are more harmful and more difficult to control than oral rumours" [15]. Wu L et al. proposed automatic rumour detection for social media, which aims to use information related to a suspected rumour, such as text content, comments, forwarding mode and the publisher's personal data, to identify whether a message published on social media is a rumour [16].
Chua et al., in elaborating the value and communication characteristics of social networks, pointed out that people's use of social network services answers a need for social communication, and that "weak connections" enable people to obtain the valuable social capital they need [17]. Bai N et al. proposed to analyse and detect rumours on Chinese Sina Weibo. The article pointed out that, at that time, Sina Weibo had eight times as many users as Twitter and offered many functions that Twitter did not, one of which was an official rumour reporting and publishing platform [18]. Wang Z et al. proposed exploring the characteristics and dissemination subjects of social media rumours in order to deal scientifically with the generation and dissemination of online rumours, which is important for the state and the government in controlling online public opinion [19]. Lan Y et al. put forward a formula for rumour intensity: rumour intensity = rumour importance × rumour ambiguity. This formula was later revised by Koros as rumour intensity = rumour importance × rumour ambiguity / public criticism. Later, the French scholar Kapfrey and the American scholar CASS Sanstein studied and explained rumours from the perspectives of sociology and politics [20]. Srinivasan S et al. pointed out that widely spread rumours are often confusing, and that social media users cannot identify rumours effectively because of limits on their professional knowledge, time and space. Moreover, the scale of social media information is huge, and it is impossible to invite experts to identify rumours exhaustively. Many news organizations and social media service providers are trying to build rumour reporting platforms [21]. Sheng L et al. proposed a rumour detection algorithm for Twitter. Its main idea is to select comments containing common sense knowledge and investigative news and to treat them as disputes over the authenticity of the relevant information, in order to judge whether that information is a rumour [22]. Pinheiro A et al. proposed reproducing the social media communication model from the perspective of Neo Confucianism. Such research requires professional science and engineering knowledge and a mastery of network topology. Restoring network communication modes helps us understand the various complex communication behaviours in social media [23].
 
Table 1: Summary table

| Researchers | Method/Model | Data source | Feature | Evaluation indicators | Contribution/Discovery |
|---|---|---|---|---|---|
| Aieha B et al. | Psychological and sociological methods | Online dissemination | Rumour dissemination patterns and characteristics | Not specifically mentioned | Online rumours are more harmful and more difficult to control than verbal rumours |
| Wu L et al. | Automatic detection of social media rumours | Social media | Text content, comments, forwarding mode, publisher data | Accuracy, recall rate, F1 score (expected) | Using multi-source information to identify rumours |
| Chua et al. | Social network value | Social network services | "Weakly connected" social capital | Not specifically mentioned | Social network services meet the needs of social communication |
| Bai et al. | Analysis of rumours on Sina Weibo | Sina Weibo | Official rumour reporting platform | Not specifically mentioned | The rumour characteristics of Sina Weibo and its official platform |
| Wang Z et al. | Characteristics of social media rumours | Social media | Characteristics of rumours and disseminators | Not specifically mentioned | The importance of a scientific response to Internet rumours |
| Lan Y et al. | Rumour intensity relationship | Not specifically mentioned | The importance and ambiguity of rumours | Not specifically mentioned | The relationship between rumour intensity, importance and ambiguity |
| Kapfrey & CASS Sanstein | Perspectives of sociology and political science | Not specifically mentioned | The social and political impact of rumours | Not specifically mentioned | The social and political interpretation of rumours |
| Srinivasan et al. | Identification and dissemination of rumours | Social media | Difficulty in identifying rumours | Not specifically mentioned | The limitations of social media users in identifying rumours |
| Sheng L et al. | Twitter rumour detection algorithm | Twitter | Common sense knowledge and investigative news commentary | Accuracy, recall rate, F1 score (expected) | Using controversial comments to detect rumours |
 
2.2 The research work of this paper and its 
significance 
Building on previous research on rumours in social media, this paper puts forward a deep learning algorithm to study rumours in social media. As a main communication tool of the new era, social media has created hot topics and enriched people's everyday conversation. However, everything has two sides. In social media, anyone can make any comment through any terminal, anytime and anywhere, which undoubtedly brings convenience to people's lives but also creates the conditions for the emergence and spread of rumours. In recent years, Weibo has developed rapidly. Once a public crisis happens in society, people turn first to the information on Weibo, and a large amount of information gathers there, forming a hot topic or trend that then attracts the traditional media to follow up. The spread of rumours in social media also follows this pattern. Rumour detection methods based on classification features have achieved initial results [24]. However, the manual design of features is time-consuming and labour-intensive, and the designed features are often limited to specific scenarios, so their generalization ability is poor. The rumour authenticity prediction task is oriented to data sets of single texts, and its classification goal is a binary classification task. Based on the analysis and extraction of rumour characteristics with a deep learning algorithm, this paper constructs a real-time rumour detection model that predicts, as messages arrive, the probability of each message being a rumour, so that supervisors can quickly obtain suspected rumours and actively identify and track them. As for the spread of rumours in social media, this paper mainly analyses some typical rumours to explore the related factors, spread patterns and characteristics of rumours, paving the way for the later discussion of the spread effect and control of rumours.
3 Implementation of deep learning 
model 
3.1 Principle and algorithm of deep 
learning 
Once the feature vectors are determined, we can use deep 
learning models for training and extraction. During the 
training phase, we use labeled data (i.e., samples known to 
be Botnet or non-Botnet) to train the model, enabling it to 
learn the ability to distinguish between Botnet and non-
Botnet. In the extraction stage, we apply the trained model 
to new, unlabeled data to predict whether they belong to 
Botnet. In the specific application of Botnet recognition, 
deep learning models can perform well because they can 
capture complex patterns that traditional methods find 
difficult to detect. For example, deep neural networks 
(DNN) or recurrent neural networks (RNN) can be used to 
analyze time series data, such as network traffic logs, to 
detect abnormal patterns related to Botnet behavior. In addition, convolutional neural networks (CNNs) are very effective in processing image data, such as screenshots or network topologies, and can recognize visual features related to Botnet activities [25]. Subsequently, new deep-structure model algorithms were proposed, setting off a wave of research on deep learning. The roughly ten years over which deep learning has developed is not a long time, and most models are built on a few basic core models.
The advantage of LSTM lies in adding a forget gate, an input gate and an output gate, which give it variable recurrent weights. The integration scale can therefore change dynamically at different times even when the parameters are fixed, which addresses the problems of exploding or vanishing gradients [26]. As the internal structure diagram of LSTM shows, the difference between LSTM and RNN is that three gates are set up in the LSTM model to determine whether the input value from the previous layer is important enough to be remembered and output. Each gate is controlled by a sigmoid function unit: if the value generated by the input gate is close to zero, the value is blocked and does not pass to the next level; if the value generated by the forget gate is close to zero, the remembered value is forgotten. The model takes a piece of Chinese text as input, which enters the input layer and then passes through a multi-layer network structure formed by several bidirectional LSTMs. Finally, the output is the probability distribution of this piece of text over the different classification results. The structure diagram of the LSTM model is shown in Figure 1.
LSTM (Long Short-Term Memory) networks and RBMs (Restricted Boltzmann Machines) are two different neural network models, each with its own structure and computational approach. The core concepts of each are explained first and then expanded on. An LSTM network is a special type of recurrent neural network (RNN) that can learn and remember dependencies in long sequences. Each LSTM unit contains three gates (input, forget and output) as well as a memory cell state. Together, these gates and the memory cell determine the flow of information in the network.
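
As a rough illustration of how the three gates and the memory cell interact, the following minimal Python sketch performs a single LSTM time step. The function name lstm_step, the parameter layout and the candidate-state term are illustrative assumptions based on the standard LSTM formulation, not the implementation used in this paper.

```python
import math


def sigmoid(x):
    """Logistic function used by every gate."""
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, params):
    """One time step of a standard LSTM cell (illustrative sketch).

    x, h_prev and c_prev are plain lists of floats.
    params maps a gate name to (W, U, b): W multiplies the input x,
    U multiplies the previous hidden state, b is the bias.
    """
    def affine(name):
        W, U, b = params[name]
        return [
            sum(w * xi for w, xi in zip(W[k], x))
            + sum(u * hi for u, hi in zip(U[k], h_prev))
            + b[k]
            for k in range(len(b))
        ]

    i = [sigmoid(z) for z in affine("input")]        # how much new content to write
    f = [sigmoid(z) for z in affine("forget")]       # how much old memory to keep
    o = [sigmoid(z) for z in affine("output")]       # how much memory to expose
    g = [math.tanh(z) for z in affine("candidate")]  # proposed new content

    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


# Toy usage with one input feature and one hidden unit (all values are made up).
zero_gate = ([[0.0]], [[0.0]], [0.0])
toy_params = {name: zero_gate for name in ("input", "forget", "output", "candidate")}
h, c = lstm_step([1.0], [0.0], [2.0], toy_params)
```

With all weights at zero every gate outputs 0.5, so half of the previous cell state is kept. Giving the forget gate a strongly negative bias drives its output towards zero and the old cell state is discarded, which is the blocking behaviour described above.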
 
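For an RBM with $n$ visible neurons $v$ and $m$ hidden neurons $h$, the energy function is

$$E(v, h \mid \theta) = -\sum_{i=1}^{n} a_i v_i - \sum_{j=1}^{m} b_j h_j - \sum_{i=1}^{n} \sum_{j=1}^{m} v_i w_{ij} h_j \tag{1}$$

In the above formula, $a_i$ represents the bias of visible neuron $i$, $b_j$ represents the bias of hidden neuron $j$, $w_{ij}$ is the connection weight between them, and $\theta = \{w_{ij}, a_i, b_j\}$ is the set of RBM parameters. When the parameters are determined, we can get the joint probability distribution of $(v, h)$: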
 
$$P(v, h \mid \theta) = \frac{e^{-E(v, h \mid \theta)}}{Z(\theta)} \tag{2}$$

where $Z(\theta)$ is the partition function. The distribution $P(v \mid \theta)$ of the observed data $v$ in the RBM is the marginal distribution of the joint probability distribution $P(v, h \mid \theta)$:
 
$$P(v \mid \theta) = \frac{1}{Z(\theta)} \sum_{h} e^{-E(v, h \mid \theta)} \tag{3}$$
 
When the state of the visible neurons is given, the activation states of the hidden neurons are conditionally independent. The activation probability of the $j$-th hidden neuron is

$$P(h_j = 1 \mid v, \theta) = \sigma\Big(b_j + \sum_{i} v_i w_{ij}\Big) \tag{4}$$
 
where $\sigma(x) = \frac{1}{1 + \exp(-x)}$ is the sigmoid activation function. According to the symmetry of the RBM structure, when the state of the hidden neurons is given, the activation probability of the $i$-th visible neuron is:

$$P(v_i = 1 \mid h, \theta) = \sigma\Big(a_i + \sum_{j} w_{ij} h_j\Big) \tag{5}$$
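
The RBM quantities in formulas (1) to (5) can be checked numerically with a short sketch. The code below assumes binary units and the standard RBM energy with a pairwise term $v_i w_{ij} h_j$; the function names and the toy parameter values are hypothetical. It compares the closed-form conditional of formula (4) with the value obtained by summing $e^{-E}$ over all hidden states.

```python
import itertools
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def energy(v, h, W, a, b):
    """RBM energy; W[i][j] links visible unit i to hidden unit j."""
    visible_term = sum(a[i] * v[i] for i in range(len(v)))
    hidden_term = sum(b[j] * h[j] for j in range(len(h)))
    pair_term = sum(v[i] * W[i][j] * h[j]
                    for i in range(len(v)) for j in range(len(h)))
    return -(visible_term + hidden_term + pair_term)


def p_hidden_given_visible(v, W, b):
    """P(h_j = 1 | v) for every hidden unit j, as in formula (4)."""
    return [sigmoid(b[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(b))]


def p_visible_given_hidden(h, W, a):
    """P(v_i = 1 | h) for every visible unit i, by symmetry."""
    return [sigmoid(a[i] + sum(W[i][j] * h[j] for j in range(len(h))))
            for i in range(len(a))]


def p_hidden_by_enumeration(v, W, a, b, j):
    """The same conditional, obtained by summing exp(-E) over all h."""
    states = list(itertools.product([0, 1], repeat=len(b)))
    weights = [math.exp(-energy(v, h, W, a, b)) for h in states]
    on = sum(w for h, w in zip(states, weights) if h[j] == 1)
    return on / sum(weights)


# Toy parameters (made up): 2 visible units and 2 hidden units.
W = [[0.5, -1.0], [0.3, 0.8]]
a = [0.1, -0.2]
b = [0.0, 0.4]
v = [1, 0]
closed_form = p_hidden_given_visible(v, W, b)
brute_force = [p_hidden_by_enumeration(v, W, a, b, j) for j in range(len(b))]
```

The two lists agree up to floating-point error, which reflects the conditional independence of the hidden neurons given the visible layer.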
 
LSTM is a special type of recurrent neural network (RNN) that uses gating mechanisms to overcome the vanishing and exploding gradient problems that traditional RNNs encounter when processing long sequences [27]. The specific calculation process of LSTM is shown in Figure 2.
Because the feature extraction ability of LSTM alone is not ideal, we use an LSTM model with an embedded attention mechanism to detect common sense rumours. The specific steps are as follows:
 
Figure 1: Structure diagram of the LSTM model 
 
Figure 2: Schematic diagram of LSTM mechanism calculation 
(1) First, use the Word2Vec model and the Adam optimizer to vectorize the input text. The corpus p of this paper consists of n sentences, and each sentence consists of m words.
(2) Before text is input into the LSTM model, it is usually vectorized, for example by using a word embedding method to convert words into fixed-dimensional vectors. These vectors not only carry the semantic information of words but also let the model handle variable-length sequences. Once the text is vectorized, the attention mechanism evaluates the importance of these vectors (i.e. the local features of the text) at each time step. This is done by computing a weight vector in which each entry is associated with one position in the input sequence and reflects the importance of that position to the model output [28].
(3) Finally, local features and global features are fused, and the classification results are output by a classifier.
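
Steps (2) and (3) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the scoring function is a plain dot product with a hypothetical context vector, the global feature is taken to be the last hidden state, and fusion is done by concatenation. None of these choices is specified in the text.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, context):
    """Weight each time step by its relevance to a context vector.

    hidden_states: one vector per position in the input sequence.
    context: hypothetical learned query vector of the same dimension.
    Returns the attention weights and the weighted sum (local feature).
    """
    scores = [sum(hk * ck for hk, ck in zip(h, context)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    pooled = [sum(w * h[k] for w, h in zip(weights, hidden_states))
              for k in range(dim)]
    return weights, pooled


def fuse(local_feature, global_feature):
    """Fuse local and global features by concatenation (an assumption)."""
    return list(local_feature) + list(global_feature)


# Toy usage: two time steps with 2-dimensional hidden states.
hidden_states = [[1.0, 0.0], [0.0, 1.0]]
weights, local_feature = attention_pool(hidden_states, context=[0.0, 0.0])
fused = fuse(local_feature, global_feature=hidden_states[-1])
```

With a zero context vector every position receives the same weight, so the local feature is simply the mean of the hidden states; a trained context vector would shift the weights towards the more informative positions.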
 
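The text feature $H$ is converted into a key $H_K$ and a value $H_V$:

$$H_K = \mathrm{Linear}(H; \theta_{HK}) \tag{6}$$
$$H_V = \mathrm{Linear}(H; \theta_{HV}) \tag{7}$$

where $\theta_{HK}$ and $\theta_{HV}$ are the trainable parameters of fully connected layers without activation functions, and $dim$ is the converted feature dimension of $H_K$ and $H_V$.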
The visual feature is converted through a fully connected layer into a feature $M$ with the same feature dimension as the text feature $H$, and the converted image feature is then converted into a query $M_Q$. The calculation of $M_Q$ is shown in formula (8):

$$M_Q = \mathrm{Linear}(M; \theta_{MQ}) \tag{8}$$

where $\theta_{MQ}$ is the trainable parameter, $M_Q \in \mathbb{R}^{m \times dim}$, and $m$ is the number of candidate frames.
The multi-head attention mechanism is a key component of the Transformer model. It allows the model to focus simultaneously on different parts of the input sequence in different representation subspaces. Multi-head attention divides the input into multiple "heads"; each head independently learns a set of attention weights, and the outputs of the heads are then concatenated and linearly transformed to obtain the final output. We define some symbols to describe the multi-head attention mechanism:
 
$$\mathrm{Atten}^{MH}_{i} = \mathrm{softmax}(\cdot) \tag{9}$$
$$M_{update} = \mathrm{Atten}^{MH} \times H_V \tag{10}$$
$$M = \mathrm{Linear}(M, M_{update}; \theta_M) \tag{11}$$

where $\theta_M$ is the training parameter. Finally, the visual feature $M$ is average-pooled to obtain the final visual feature $M$.
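
The multi-head update of the visual feature can be illustrated with the sketch below. It assumes the standard Transformer form of attention, in which each head applies a softmax to scaled dot products between queries and keys; the scaling by the square root of the head dimension and the equal split of the feature dimension across heads are assumptions, and the learned linear projections are omitted for brevity. The names M_Q, H_K and H_V follow the symbols in the text.

```python
import math


def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def split_heads(X, num_heads):
    """Split the feature dimension of every row into num_heads equal slices."""
    d_head = len(X[0]) // num_heads
    return [[row[k * d_head:(k + 1) * d_head] for row in X]
            for k in range(num_heads)]


def scaled_dot_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d)) V for one head (standard form, assumed)."""
    scale = math.sqrt(len(Q[0]))
    out = []
    for q in Q:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in K])
        out.append([sum(w * v[c] for w, v in zip(weights, V))
                    for c in range(len(V[0]))])
    return out


def multi_head_attention(M_Q, H_K, H_V, num_heads):
    """Run every head independently and concatenate the head outputs."""
    heads = [scaled_dot_attention(q, k, v)
             for q, k, v in zip(split_heads(M_Q, num_heads),
                                split_heads(H_K, num_heads),
                                split_heads(H_V, num_heads))]
    return [sum((head[r] for head in heads), []) for r in range(len(M_Q))]


# Toy usage: one candidate frame attends over two text positions, 2 heads.
M_Q = [[0.0, 0.0, 0.0, 0.0]]
H_K = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]]
H_V = [[1.0, 2.0, 3.0, 4.0], [3.0, 4.0, 5.0, 6.0]]
M_update = multi_head_attention(M_Q, H_K, H_V, num_heads=2)
```

Because the toy query is all zeros, each head attends uniformly and the update is the average of the two value rows. In a trained model the queries differ per frame, so each candidate frame draws on different words.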
The updated visual features and text features are fused by attention weights, and the calculation process is shown in formula (12):

$$\mathrm{Atten}^{HM} = \mathrm{softmax}\left(W_{2}^{HM} + b_{1}^{HM}\right) \tag{12}$$

where $W^{HM}$ is the weight matrix and $b^{HM}$ is the bias.
By optimizing the spread graph, we can build a news credibility prediction model, find the spread patterns that distinguish rumours based on heterogeneous user representation and modelling methods, and fuse images, text embedded in images and text content to carry out rumour detection. This usually requires vectors with an intersecting semantic representation. However, identifying these websites generally depends on manual review, which not only wastes human resources but also introduces a certain error rate because people differ in knowledge level and identification ability.
3.2 Rumour detection 
According to the different characteristics of rumour data, rumour detection is divided into two sub-tasks: rumour stance classification and rumour authenticity prediction. The rumour stance classification task is oriented to tree-structured data sets, whose general structure is a source text together with the replies of different users to that text. The classification goal is a four-class task: each text is labelled as support, deny, query or comment, referred to as the SDQC task for short. At the same time, the Sina Weibo platform has opened a Weibo Biyao account that regularly releases confirmed rumours for users to browse. However, identifying them generally depends on manual review, which not only wastes human resources but also introduces a certain error rate because reviewers differ in knowledge level and identification ability. Rumour detection methods based on classification features have achieved initial results. The rumour authenticity prediction task is oriented to data sets of single texts, and its classification goal is a binary classification task: predicting whether the input rumour data is true or false in combination with auxiliary information. It can also be modelled as a three-class problem, that is, true, false and unverifiable. It is therefore necessary to rely on computers for automatic rumour detection and to apply the rumour detection model to early rumour detection so as to curb the spread of rumours effectively.
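
The tree-structured input of the stance classification task can be pictured with a small data structure. The sketch below is illustrative only: the class names Stance and Post and the helper stance_counts are hypothetical, and the example texts are invented.

```python
from collections import Counter
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Stance(Enum):
    """The four SDQC labels."""
    SUPPORT = "S"
    DENY = "D"
    QUERY = "Q"
    COMMENT = "C"


@dataclass
class Post:
    """A source text or a reply; replies form a tree under the source."""
    text: str
    stance: Optional[Stance] = None  # stance towards the source rumour
    replies: List["Post"] = field(default_factory=list)


def stance_counts(post):
    """Count the stance labels of all replies below a post, recursively."""
    counts = Counter()
    for reply in post.replies:
        if reply.stance is not None:
            counts[reply.stance] += 1
        counts.update(stance_counts(reply))
    return counts


# Toy thread: a source claim with two direct replies and one nested reply.
thread = Post("source claim", replies=[
    Post("I saw it too", Stance.SUPPORT, [Post("any evidence?", Stance.QUERY)]),
    Post("this is false", Stance.DENY),
])
counts = stance_counts(thread)
```

Aggregated stance counts of this kind are one simple way to turn a reply tree into features for the authenticity prediction task.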
Following different research ideas on rumour detection, early work usually built classifiers on different types of hand-crafted features through supervised learning. For example, features are extracted from texts and user profiles, and classifiers such as support vector machines and decision trees are used to predict the credibility of tweets. Weibo, however, is an open platform in which anyone can participate. This high participation undoubtedly makes rumours spread more rapidly. At the same time, it gives more people the opportunity to evaluate rumours and express their views on them. However, combining features from different sources only increases the amount and types of information available to the model and does not pay enough attention to early detection, so such methods cannot find rumours as early as possible in practice. By optimizing the spread graph, we can build a news credibility prediction model, find the spread patterns that distinguish rumours based on heterogeneous user representation and modelling methods, and fuse images, text embedded in images and text content to carry out rumour detection. Users' evaluations of a rumour can objectively describe its correctness and provide effective classification features from a different angle, which is undoubtedly more convincing than the simple features used before. It is found that, compared with rumours, rumour corrections have a clearer content definition, more reliable information sources and less emotional language. It is also found that the followers of opinion leaders can moderate the relationship between emotional characteristics and the other two types of characteristics; that is, if opinion leaders make overly emotional remarks, their followers supplement them, thus enhancing the clarity of content and the reliability of sources.
4 Experimental results and analysis 
4.1 Data set 
To analyse and evaluate the classification performance on rumour and non-rumour Weibo posts in depth, we obtained a detailed dataset from the Sina Community Management Center. It contains confirmed rumours on Weibo that have been officially reviewed and confirmed. For comparison, we also randomly selected from the Sina Weibo platform a number of non-rumour posts comparable to the number of rumour posts. These posts were confirmed as non-rumours manually or by automated tools, such as models based on content and user behaviour.
HTML tags, special characters, URL links and @-mentioned usernames were removed from the Weibo posts, and only plain text content was retained. A Chinese word segmentation tool (such as Jieba) was used to segment the Weibo text. Common Chinese stop words such as "de", "shi" and "zai" were removed. Words that appear in different forms were stemmed so that they are represented uniformly. To train the model, we used 70% of the dataset as the training set. To tune the hyperparameters of the model and avoid overfitting, we used 15% of the data as the validation set. Finally, we used the remaining 15% of the data as the test set to evaluate the performance of the model. To obtain more microblog data, we obtained a group of known rumours from the Sina community management centre, which reports all kinds of rumours. We also collected a considerable number of non-rumour microblogs. The detailed statistics of the two data sets are shown in Table 1.
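
The cleaning and splitting steps described above might look roughly like the following Python sketch. The regular expressions, the function names clean_post and split_dataset, and the fixed random seed are assumptions made for illustration; word segmentation and stop-word removal are left out because they depend on an external tool.

```python
import random
import re


def clean_post(text):
    """Keep only plain text: drop HTML tags, URLs, @mentions and symbols."""
    text = re.sub(r"<[^>]+>", " ", text)                             # HTML tags
    text = re.sub(r"https?://[A-Za-z0-9./?=&%_#~:+-]+", " ", text)   # URL links
    text = re.sub(r"@[\w\-]+", " ", text)                            # @usernames
    text = re.sub(r"[^\w\s]", " ", text)                             # special characters
    return re.sub(r"\s+", " ", text).strip()


def split_dataset(samples, train_pct=70, val_pct=15, seed=0):
    """Shuffle and split into training, validation and test sets.

    The test set receives whatever remains after the first two parts.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * train_pct // 100
    n_val = len(shuffled) * val_pct // 100
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


# Toy usage on an invented post and 100 dummy sample ids.
cleaned = clean_post("<b>Breaking</b> @user123 see http://t.cn/abc now!!")
train_set, val_set, test_set = split_dataset(range(100))
```

Fixing the seed keeps the 70/15/15 partition reproducible between runs, so that validation and test results remain comparable when hyperparameters change.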
We choose the following four representative methods for performance comparison. A convolutional neural network (CNN) is used to represent the original Weibo post of a suspected rumour and classify it. All text information of the forwarding sequence is processed into a TF-IDF vector representation (Salton et al., 1988), and a support vector machine classifier is trained to classify rumours. A rumour classifier is also trained using a 2-layer gated recurrent unit (GRU) network (Choe et al., 2014). We evaluate the performance of these models using evaluation indicators such as accuracy and recall. The detailed results of the different methods under the various evaluation indicators are shown in Table 2.
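
For reference, the TF-IDF representation and the evaluation indicators can be computed as in the sketch below. It uses one common TF-IDF variant (term frequency normalised by document length, and the logarithm of the inverse document frequency) and treats the rumour class as the positive class; both choices are assumptions, since the exact variant used in the experiments is not stated. Documents are assumed to be non-empty token lists.

```python
import math
from collections import Counter


def tfidf_vectors(tokenized_docs):
    """Return one {term: weight} dictionary per tokenized document."""
    n_docs = len(tokenized_docs)
    doc_freq = Counter(term for doc in tokenized_docs for term in set(doc))
    vectors = []
    for doc in tokenized_docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy usage with invented tokens and labels (1 = rumour, 0 = non-rumour).
docs = [["rumour", "spreads", "fast"], ["rumour", "denied"]]
vectors = tfidf_vectors(docs)
metrics = classification_metrics([1, 1, 0, 0], [1, 0, 0, 0])
```

A term that occurs in every document, such as "rumour" in the toy corpus, receives a weight of zero, which is why TF-IDF emphasises the words that distinguish one forwarding sequence from another.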
Before text is fed into the LSTM model, it usually has to
be vectorized, for example by using a word embedding
method to convert words into fixed-dimensional vectors.
These vectors not only carry the semantic information of
words but also let the model handle variable-length
sequences. Once the text is vectorized, an attention
mechanism can evaluate the importance of these vectors
(i.e. local features of the text) at each time step. This is
achieved by calculating a weight vector whose entries are
associated with the positions of the input sequence and
reflect the importance of each position to the model
output. In the design of the CED-CNN model, the CNN
(Convolutional Neural Network) plays a crucial role. A
CNN can automatically learn and extract spatial features
from input data, which is very effective for processing
text and image information on social media. By
combining the CNN with the early detection strategy, the
CED-CNN model can capture the features of rumors more
accurately and recognize them at an early stage.
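The weight vector described above can be sketched with a simple dot-product attention. The scoring function and the `query` vector are assumptions made for illustration, since the paper does not specify how the scores are computed; in a trained model the query would be a learned parameter.

```python
import math


def attention_pool(hidden_states, query):
    """Weight each time step by softmax(dot(h_t, query)) and pool.

    hidden_states: list of equal-length vectors, one per time step.
    Returns (weights, context), where context is the weighted sum.
    """
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query))
              for h in hidden_states]
    max_score = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(hidden_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, hidden_states))
               for d in range(dim)]
    return weights, context
```

The weights sum to one, so each entry can be read as the share of the model's attention given to the corresponding position in the input sequence.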
The experiment was conducted on the data set. We used
Twitter's streaming API to collect data for 4 months, and
the total amount of text was about 63 GB. These messages
are randomly sampled from all messages and pushed by
Twitter. The data set contains about 60 million tweets in
total. Feature importance was analysed with the
InfoGainAttributeEval and GainRatioAttributeEval
methods, and the experimental results are shown in
Table 3 and Table 4, respectively.
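The two attribute evaluators score a feature by how much it reduces the entropy of the class label. The sketch below shows the standard definitions of information gain and gain ratio for discrete features; the function names are illustrative, and which target label the study used is not stated, so `labels` stands for whatever class was evaluated.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())


def info_gain(feature, labels):
    """H(labels) - H(labels | feature)."""
    n = len(labels)
    groups = {}
    for f, y in zip(feature, labels):
        groups.setdefault(f, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def gain_ratio(feature, labels):
    """Information gain normalised by the entropy of the feature itself."""
    split_info = entropy(feature)
    return info_gain(feature, labels) / split_info if split_info > 0 else 0.0
```

Gain ratio penalises features with many distinct values, which is one reason the two methods can rank the same attributes differently, as in Tables 3 and 4.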
Table 1: Data set statistics
Weibo-all                  Rumour     Non-rumour   All samples
Number of samples          3850       4198         8051
Number of posts            2572046    2450822      5022867
Average number of posts    667        583          623
Table 2: Experimental results of the Weibo data set
          Weibo-stan                                 Weibo-all
Methods   Accuracy rate  Recall rate  F1     ER      Accuracy rate  Recall rate  F1     ER
CNN       0.808          0.828        0.803  100%    0.886          0.882        0.882  100%
TF-IDF    0.858          0.798        0.867  100%    0.820          0.913        0.778  100%
GRU       0.921          0.925        0.912  100%    0.905          0.922        0.902  100%
Table 3: Analysis results of feature importance of info gain attribute Eval
Factor       Importance   Factor           Importance   Factor             Importance
utc_offset   0.19621      Location         0.0096       IsGeo_enabled      0.0016
Language     0.1167       Follower_count   0.00376      Tag Num            0
time_zone    0.06436      Listed_count     0.00368      Statuses_count     0.00245
created_at   0.02392      Friends_count    0.00351      Favourites_count   0
Table 4: Analysis results of gain ratio attribute Eval feature importance
Factor           Ranking   Factor        Ranking   Factor             Ranking
Language         1         Is Verified   5         IsGeo_enabled      9
Follower_count   2         Utc_offset    6         Tag Num            10
Listed_count     3         Location      7         Statuses_count     11
Friends_count    4         Created_at    8         Favourites_count   12
 
 
As seen from Tables 3 and 4, the combined results of the
two methods indicate that region (utc_offset, time_zone
and location) is an important factor influencing social
relationships, consistent with the principle of homophily.
The importance of language is self-evident. In addition,
the numbers of followers, friends and lists are also
important. The common feature of these attributes is that
they reflect users' activity. The length of the
self-introduction, verification status, registration time
and other factors are related to the quality of the content
users publish.
4.2 Analysis and discussion of rumour 
detection in social media 
In this experiment, the remaining features are computed
for the data mining algorithm, the ant colony algorithm
and the deep learning algorithm in this paper, and the
rumour model is trained. The real-time detection error
rate of the rumour model is then tested, and the
experimental results are shown in Figure 3.
 
Figure 3: Real-time error rate of detection under different algorithms 
Figure 3 shows the real-time detection error rate under
the different algorithms. Although data mining
algorithms and ant colony algorithms can achieve good
performance in certain situations, they may have
limitations when dealing with real-time data streams and
new-topic detection tasks. Data mining algorithms
typically rely on manually designed features and rules,
which may not fully cover all data patterns and
variations. As a heuristic search algorithm, the ant
colony algorithm can solve optimization problems to a
certain extent, but it may not be flexible and efficient
enough when dealing with complex real-time data streams.
In terms of topic coverage experiments (as shown in 
Figure 4), we can further analyze the ability of different 
algorithms to place texts discussing the same topic into the 
same Weibo topic cluster. Due to its powerful feature 
learning and representation capabilities, deep learning 
algorithms may be able to more accurately recognize and 
classify relevant texts, thereby generating more accurate 
and consistent topic clusters. Although data mining 
algorithms and ant colony algorithms can achieve the 
same goals to some extent, they may be slightly inferior in 
accuracy and consistency. 
Figure 4 shows the topic coverage of the different
algorithms, that is, the degree of information coincidence
between any two media data streams. In this paper, the
degree of information coincidence is the proportion of
topics that can be successfully aligned with each other in
the two data streams among the total number of topics.
The topic coverage of the deep learning algorithm is
38.8%, which is the highest among the compared
algorithms.
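The coverage measure can be illustrated with a short sketch. Two points are assumptions here, because the paper does not define them precisely: a topic counts as aligned when the same topic identifier appears in both streams, and the total number of topics is taken as the number of distinct topics across both streams.

```python
def topic_coverage(topics_a, topics_b):
    """Share of topics found in both streams among all distinct topics."""
    a, b = set(topics_a), set(topics_b)
    total = a | b
    return len(a & b) / len(total) if total else 0.0
```

For example, two streams with topics {t1, t2, t3} and {t2, t3, t4, t5} share two of five distinct topics, giving a coverage of 0.4 under these assumptions.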
During the training of CED, the "credible detection
point" of each suspected rumour is continually moved
earlier; at test time, this point is determined by a
threshold-based strategy.
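A threshold-based test-time strategy of this kind can be sketched as follows. The symmetric threshold, its default value of 0.9, the fallback rule and the function name are assumptions for illustration; the input is assumed to be a non-empty list of rumour probabilities that a trained model outputs after each successive part of the repost sequence.

```python
def early_detect(probabilities, threshold=0.9):
    """Stop at the first step where the prediction is confident either way.

    Returns (label, steps_used, fraction_of_sequence_used).
    """
    n = len(probabilities)
    for step, p in enumerate(probabilities, start=1):
        if p >= threshold:
            return "rumour", step, step / n
        if p <= 1 - threshold:
            return "non-rumour", step, step / n
    # No confident point was reached: fall back to the final prediction.
    label = "rumour" if probabilities[-1] >= 0.5 else "non-rumour"
    return label, n, 1.0
```

The third return value corresponds to the share of forwarding information consumed before a decision, which is the quantity an early-detection curve tracks.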
 
Figure 4: Change of topic coverage under different algorithms 
 
Figure 5: Change curve of advance rate in the microblog data set 
As can be seen from Figure 5, when less than 9% of the
forwarding information is used, CED, CED-OM and
CED-CNN can detect about 40%, 50% and 60% of
microblogs, respectively. This verifies the advantages of
considering the original microblog information and of
modelling with a convolutional neural network. At the
same time, the three CED methods we proposed can make
effective early detection of rumours. The proportion has a
local peak where the whole post sequence is used for
detection. Such cases are rare for CED-CNN (less than
8%), indicating that CED-CNN needs less post
information for detection and makes efficient use of post
information.
After eliminating some feature types to ensure real-time
detection on the data stream, this study computes the
remaining features for the three methods (the data mining
algorithm, the ant colony algorithm and the deep learning
algorithm in this paper), trains the rumour model and then
tests its real-time detection accuracy. The experimental
results are shown in Figure 6.
As can be seen from Figure 6, after eliminating some
feature types for the real-time detection of data flow,
when the time index reaches 40, the real-time accuracy of
the data mining algorithm is 57.23% and that of the ant
colony algorithm is 53.45%. The effectiveness of the
method in this paper is also confirmed.
A rumour about AIDS has been widely forwarded on the
Internet. The original reads: "Don't eat out recently,
especially barbecue, cold dishes, Lanzhou ramen, etc. A
group of people infected with AIDS have dripped their
blood into food in some cities across the country. It has
been confirmed that people have been infected." Although
similar legends have circulated for a long time, suddenly
seeing such news still causes panic and uneasiness.
Zhihu, as a representative online forum, and Sina Weibo,
as a representative social networking site, were selected
to count the relevant posts; the comparison is shown in
Tables 5 and 6.
 
Figure 6: Real-time detection accuracy under different algorithms 
Table 5: Statistics on the interaction of rumour-refuting posts in Zhihu Forum
Topic post name                                                                 Views   Number of replies
AIDS is a typical rumour spread by dripping blood                               86      6
"Dripping blood" spreads rumours of AIDS                                        42      3
"AIDS people poisoned" speech disseminators were punished by public security    44      8
You are responsible for your online comments                                    16      3
Total                                                                           188     20
Table 6: Statistics of Sina Weibo rumour refutation post interaction
Total number of selected topic posts   Forwarding volume   Comment volume
2137                                   2411                977
 
As can be seen from Table 5 and Table 6, there are four
topic posts on the Zhihu Forum, with 188 page views and
20 replies in total, which means that, at most, only 188
people learned of the refutation from this forum. On Sina
Weibo, a total of 2,137 original topic posts were found.
The number of first-level reposts is 2,411, and the number
of comments is 977. The number would be staggering if
second-level and third-level reposts were counted. Adding
the 22% of lurkers who read silently but do not interact,
the number of page views would be larger still. By
comparison, it can be concluded that, to a certain extent,
the spread of social media has made the life of rumours
short and fragile, whereas in the traditional media age it
took much more time and manpower to clarify similar
rumours.
In this paper, the detection performance of the deep
learning algorithm is analysed more comprehensively by
observing the false positive and false negative rates under
all threshold settings. The optimal threshold can be
selected through the trade-off between the two. In
addition, the throughput per second of the feature
calculation is measured over a large number of messages.
The performance of the real-time detection method in this
paper under the full range of threshold settings is shown
in Figure 7.
As seen in Figure 7, observing the false positive and false
negative rates under all threshold settings gives a more
comprehensive picture of the detection performance. In
addition, by measuring the throughput per second of the
feature calculation over a large number of messages, this
paper also tests the efficiency of the feature calculation.
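The threshold analysis can be sketched as a sweep that records the false positive rate (FPR) and false negative rate (FNR) at each setting. The selection criterion used here, minimising FPR + FNR, is only one possible trade-off and is an assumption, as are the function names; `labels` is assumed to be a list with 1 for rumour and 0 for non-rumour.

```python
def error_rates(scores, labels, threshold):
    """Return (FPR, FNR) when predicting rumour for score >= threshold."""
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    negatives = labels.count(0)
    positives = labels.count(1)
    fpr = fp / negatives if negatives else 0.0
    fnr = fn / positives if positives else 0.0
    return fpr, fnr


def sweep_thresholds(scores, labels, thresholds):
    """List of (threshold, FPR, FNR) for every threshold setting."""
    return [(t, *error_rates(scores, labels, t)) for t in thresholds]


def best_threshold(scores, labels, thresholds):
    """Threshold with the smallest FPR + FNR (hypothetical criterion)."""
    rows = sweep_thresholds(scores, labels, thresholds)
    return min(rows, key=lambda row: row[1] + row[2])[0]
```

A different weighting of the two error types would be appropriate if missing a rumour is considered more costly than flagging a harmless post.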
According to the GainRatioAttributeEval method, the most
important user characteristics in this experiment are
language, number of followers, number of lists, number of
friends, time zone, verification status, time zone offset,
location, etc. The results of the CfsSubsetEval method are
shown in Table 7.
As can be seen from Table 7, according to
CfsSubsetEval, the most important features are location,
language, time zone, time zone offset, etc. It should be
noted that Location is the city name entered in the text
box when the user registers, and that time_zone and
utc_offset are closely related to the location.
To demonstrate the efficiency of calculating the
implication and pseudo feedback features, this paper
constructs a rumour detection system and processes
20,000 Weibo posts to test the system's throughput. The
system was run and tested on a single core. The average
throughput on an otherwise idle machine was obtained
over six repeated runs. The experimental results are
shown in Figure 8. As can be seen from Figure 8, the
detection method studied in this paper can process up to
7000 microblogs per second, which allows efficient
real-time rumour detection without overloading the
system. In addition, the number of news reports is very
small compared with the number of social media
messages. On this basis, this paper finds no significant
difference in performance between using k-term entry and
vector distance to calculate the implication feature.
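The throughput measurement can be sketched as below: a single-threaded loop is timed over a fixed batch and the rate is averaged over six runs, mirroring the setup described above. The function name and the `process` callback are placeholders for the actual feature calculation, which is not shown in the paper.

```python
import time


def measure_throughput(process, items, repeats=6):
    """Average number of items processed per second over `repeats` runs."""
    rates = []
    for _ in range(repeats):
        start = time.perf_counter()
        for item in items:
            process(item)
        elapsed = time.perf_counter() - start
        rates.append(len(items) / elapsed if elapsed > 0 else float("inf"))
    return sum(rates) / len(rates)
```

A call such as `measure_throughput(extract_features, posts)` with 20,000 posts would give a figure comparable in kind to the one reported, where `extract_features` is a hypothetical stand-in for the system's feature pipeline.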
 
Figure 7: Performance of real-time detection method of deep learning algorithm under full threshold setting 
 
Figure 8: Detection throughput, deep learning algorithm and the average value of Sina Weibo flow 
Table 7: Results of characteristic importance analysis of CFS subset eval model
Factor    Language         Location             Time_zone
Ranking   1                2                    3
Factor    Follower_count   Description length   Utc_offset
Ranking   4                5                    6
5 Discussion 
In this study, we propose a real-time rumor detection 
method based on deep learning algorithms and emerging 
feature types. In order to comprehensively evaluate our 
method, we compared it in detail with the relevant work 
listed in Table 1. These works cover various methods from 
psychology and sociology to automated rumor detection 
systems. 
Firstly, in terms of performance indicators, our 
method performs excellently in both processing speed and 
accuracy. As shown in Figure 8, our system can process 
up to 7000 Weibo posts per second, which is very 
important in real-time rumor detection scenarios. In 
addition, the effectiveness of LSTM models in hotspot 
tracking tasks has been validated under deep learning 
algorithms, especially when the dimensionality of feature 
vectors is low. This result demonstrates the efficiency and 
practicality of our method. 
Secondly, in terms of model architecture and feature
selection, our method adopts implication features and
pseudo feedback features, which are rarely mentioned in existing
work. The introduction of these emerging feature types 
enables our model to better capture the complexity and 
diversity of rumors on social media. In addition, we also 
utilized traditional text features and demonstrated better 
performance in rumor detection tasks by combining 
multiple feature types. 
Finally, in terms of the dataset used, we chose Sina 
Weibo, a representative social media platform. Sina 
Weibo has a large user base, fast information 
dissemination speed, and frequent rumors. By processing 
and analyzing data from Sina Weibo, our method can more 
accurately reflect the spread patterns and characteristics of 
rumors on social media. 
6 Conclusion 
In the research on real-time rumour detection in social
media, this study proposes a real-time rumour detection
method based on a deep learning algorithm. The method
relies mainly on two emerging feature types, the
implication feature and the pseudo feedback feature,
together with traditional text features. Some feature
types are eliminated to allow real-time detection on the
data flow. The experimental results on real data sets show
that the LSTM model under the deep learning algorithm is
effective in hot spot tracking tasks. Especially when the
dimension of the feature vector is relatively low, it
performs much better than the other models. It can also be
concluded that we cannot rely completely on the
self-purification function of social media and simply let
things take their course. The government should also
establish corresponding regulatory mechanisms to
monitor the spread of rumours. Only by giving full play to
the self-purification role of social media while applying
multi-pronged and appropriate regulation can we ensure a
healthy and orderly network environment.
Competing interests
The authors declare no competing interests.
Authorship contribution statement 
Chunyan Han: Writing-Original draft preparation, 
Conceptualization, Supervision, Project administration. 
Ling Lin: Language review, Methodology, Software. 
Data availability 
On Request 
Declarations 
Not applicable 
R efer ence s 
[1] S. Vosoughi, M. ‘Neo’ Mohsenvand, and D. Roy, 
“Rumor gauge: Predicting the veracity of rumors 
on Twitter,” ACM transactions on knowledge 
discovery from data (TKDD), 11(4): 1–36, 
2017.https://doi.org/10.1145/3070644 
[2] M. Mirbabaie, I. Amojo, and S. Stieglitz, 
“Affording Twitter in Emergency Situations: The 
Occurrence of Rumor Sense-Making,” Journal 
of Database Management (JDM), 32(2): 50–66, 
2021. DOI: 10.4018/JDM.2021040104 
[3] M. Guo, Z. Xu, L. Liu, M. Guo, and Y. Zhang, 
“An adaptive deep transfer learning model for 
rumor detection without sufficient identified 
rumors,” Math Probl Eng, 2020, 
2020.https://doi.org/10.1155/2020/7562567 
[4] T. Ma, H. Zhou, Y. Tian, and N. Al-Nabhan, “A 
novel rumor detection algorithm based on entity 
recognition, sentence reconfiguration, and 
ordinary differential equation network,” 
Neurocomputing, 447: 224–234, 
2021.https://doi.org/10.1016/j.neucom.2021.03.
055 
[5] Y. Cheng and L. Zhao, “Dynamical behaviors 
and control measures of rumor-spreading model 
in consideration of the infected media and time 
delay,” Inf Sci (N Y), 564: 237–253, 
2021.https://doi.org/10.1016/j.ins.2021.02.047 
[6] M. Huang, G. Zou, B. Zhang, Y. Gan, S. Jiang, 
and K. Jiang, “Identifying influential individuals 
in microblogging networks using graph 
partitioning,” Expert Syst Appl, 102: 70–82, 
2018.https://doi.org/10.1016/j.eswa.2018.02.021 
[7] M. A. Al-Garadi et al., “Analysis of online social 
network connections for identification of 
influential users: Survey and open research 
issues,” ACM Computing Surveys (CSUR), 
51(1): 1–37, 2018. 
https://doi.org/10.1145/3155897 
[8] F. P. Boogaard, K. S. A. H. Rongen, and G. W. 
Kootstra, “Robust node detection and tracking in 
fruit-vegetable crops using deep learning and 
multi-view imaging,” Biosyst Eng, 192: 117–
132, 2020. 
https://doi.org/10.1016/j.biosystemseng.2020.01
.023 
[9] V. Chandrakanth, V. S. N. Murthy, and S. S. 
Channappayya, “UAV-based autonomous 
detection and tracking of beyond visual range 
(BVR) non-stationary targets using deep 
learning,” J Real Time Image Process, 1–17, 
2022.https://link.springer.com/article/10.1007/s
11554-021-01185-w 
[10] Y. Li, X. Zhang, H. Li, Q. Zhou, X. Cao, and Z. 
Xiao, “Object detection and tracking under 
complex environment using deep learning‐based 
LPM,” IET computer vision, 13(2): 157–164, 
2019. https://doi.org/10.1049/iet-cvi.2018.5129 
[11] Y. Zou, R. Lan, X. Wei, and J. Chen, “Robust 
seam tracking via a deep learning framework 
combining tracking and detection,” Appl Opt, 
59(14): 4321–4331, 2020. 
https://doi.org/10.1364/AO.389730 
[12] N. Shlezinger, N. Farsad, Y. C. Eldar, and A. J. 
Goldsmith, “ViterbiNet: A deep learning based 
Viterbi algorithm for symbol detection,” IEEE 
Trans Wirel Commun, 19(5): 3319–3331, 2020. 
https://doi.org/10.1109/TWC.2020.2972352 
[13] J. Zhang, S. Jiang, Y. Zhang, X. Liu, D. Wang, 
and F. Qiu, “Long-term tracking algorithm using 
deep features and a single shot multibox 
detector,” J Electron Imaging, 27(5): 53019, 
2018.https://doi.org/10.1117/1.JEI.27.5.053019 
[14] V. Indu and S. M. Thampi, “A psychologically-
inspired fuzzy-based approach for user 
personality prediction in rumor propagation 
across social networks,” Journal of Intelligent & 
Fuzzy Systems, 41(5): 5425–5439, 2021. DOI: 
10.3233/JIFS-189864 
[15] A. I. E. Hosni, K. Li, and S. Ahmad, “Minimizing 
rumor influence in multiplex online social 
networks based on human individual and social 
behaviors,” Inf Sci (N Y), 512: 1458–1480, 
2020.https://doi.org/10.1016/j.ins.2019.10.063 
[16] L. Wu, Y. Rao, H. Yu, Y. Wang, and N. 
Ambreen, “A multi‐semantics classification 
method based on deep learning for incredible 
messages on social media,” Chinese Journal of 
Electronics, 28(4): 754–763, 
2019.https://doi.org/10.1049/cje.2019.05.002 
[17] A. Y. K. Chua and S. Banerjee, “To share or not 
to share: The role of epistemic belief in online 
health rumors,” Int J Med Inform, 108: 36–41, 
2017. 
https://doi.org/10.1016/j.ijmedinf.2017.08.010 
[18] N. Bai, F. Meng, X. Rui, and Z. Wang, “Rumor 
detection based on a source-replies conversation 
tree convolutional neural net,” Computing, 1–17, 
2022. https://doi.org/10.1007/s00607-021-
01034-5 
[19] Z. Wang and A. Chen, “On ISRC rumor 
spreading model for scale-free networks with 
self-purification mechanism,” Complexity, 2021: 
1–9, 2021. 
https://doi.org/10.1155/2021/6685306 
96 Informatica 48 (2024) 83–96 C. Han et al.  
 
[20] L. Yang, Z. Li, and A. Giua, “Containment of 
rumor spread in complex social networks,” Inf 
Sci (N Y), 506: 113–130, 2020. 
https://doi.org/10.1016/j.ins.2019.07.055 
[21] S. Srinivasan and D. B. LD, “A neuro-fuzzy 
approach to detect rumors in online social 
networks,” International Journal of Web 
Services Research (IJWSR), 17(1): 64–82, 2020. 
DOI: 10.4018/IJWSR.2020010104 
[22] L. Sheng, X. Guang, and X. Ma, “The Spread of 
Rumors and Positive Energy in Social Network,” 
Journal of Internet Technology, 19(5): 1515–
1524, 2018. 
https://jit.ndhu.edu.tw/article/view/1771/0 
[23] A. Pinheiro, C. Cappelli, and C. Maciel, 
“Designing auditability in social networks to 
prevent the spread of false information,” IEEE 
Latin America Transactions, 15(12): 2282–2289, 
2017. 
https://doi.org/10.1109/TLA.2017.8071089 
[24] A. Pal, A. Y. K. Chua, and D. H.-L. Goh, 
“Debunking rumors on social media: The use of 
denials,” Comput Human Behav, 96: 110–122, 
2019. https://doi.org/10.1016/j.chb.2019.02.022 
[25] H. J. Oh and H. Lee, “When do people verify and 
share health rumors on social media? The effects 
of message importance, health anxiety, and 
health literacy,” J Health Commun, 24(11): 837–
847, 2019. 
https://doi.org/10.1080/10810730.2019.1677824 
[26] O. Oh, P. Gupta, M. Agrawal, and H. R. Rao, 
“ICT mediated rumor beliefs and resulting user 
actions during a community crisis,” Gov Inf Q, 
35(2): 243–258, 2018. 
https://doi.org/10.1016/j.giq.2018.03.006 
[27] W.-H. Tsai and Z.-W. Lin, “Social 
Constructionism and the Significance of Political 
Rumors in Contemporary China,” Asian Surv, 
59(5): 870–888, 2019. 
https://www.jstor.org/stable/26848407 
[28] S. Luna, “Affective atmospheres of terror on the 
Mexico–US border: Rumors of violence in 
Reynosa’s prostitution zone,” Cultural 
Anthropology, 33(1): 58–84, 
2018.https://doi.org/10.14506/ca33.1.03