https://doi.org/10.31449/inf.v48i14.5998 Informatica 48 (2024) 83–96

Detecting and Tracking Rumours in Social Media Based on Deep Learning Algorithm

Chunyan Han 1,2,*, Ling Lin 2
1 School of Journalism and Communication, Shandong University, Jinan, Shandong 251000, China
2 Department of Scientific Research, Changji University, Changji, Xinjiang 831100, China
E-mail: 13415980058@163.com
*Corresponding author

Keywords: deep learning algorithms, detection and tracking, social media, rumours

Received: April 9, 2024

Online rumours have become more widespread and have a wider impact. In the era of social media, more and more network users take photos of themselves or others, actively express their views and opinions, and even interact and communicate with others, which forms online public opinion. Automatic rumour detection technology can purify the network environment and help avoid chaos or turbulence in society. Therefore, this paper proposes a deep-learning algorithm to detect and track rumours in social media. After eliminating some feature types for the real-time detection of data flow, when the time index reaches 40, the real-time accuracy of the data mining algorithm is 57.23% and that of the ant colony algorithm is 53.45%. The effectiveness of this method is also confirmed. The approach breaks through the limitations of traditional skills under the deep learning algorithm, using traditional painting and involving devices, ready-made materials, popular symbols, digital technology and other means.

Povzetek: A new deep learning algorithm has been developed for detecting and tracking rumours on social media.

1 Introduction

Rumour is an important phenomenon of information dissemination in human society. It has been one of the focuses of interest in social psychology and in journalism and communication for decades. Online rumours have become more widespread and have a wider impact.
In the era of social media, more and more network users reprint or share photos, videos and text messages taken or published by themselves or others on the Internet, actively express their views and opinions, and even interact and communicate with others, which forms online public opinion [1-2]. While providing abundant information, social media has become a hotbed for rumours. Some scholars believe that social media provides a platform for public communication and allows rumours to dissolve in discussion and debate [3]. Whether the spread of rumours reflects the illusion of excess information or the degradation of self-immunity, and whether social media contributes to the spread of rumours or strengthens the self-purification function of the network, remain difficult problems for scholars [4-5]. Because of the non-disclosure or untimely disclosure of information and the lack of attention to or trust in the government, opinion leaders and network promoters, network public opinion very easily induces all kinds of network rumours. The release and dissemination of information on social media are extremely convenient. The classification goal is a four-class task: each text is labelled as support, opposition (deny), doubt (query) or comment, referred to as the SDQC task for short. At the same time, the Sina Weibo platform has also opened a Weibo Biyao account to regularly release confirmed rumours for users to browse [6]. Automatic rumour detection technology has important research value and social significance for purifying the network environment, avoiding social chaos or turbulence and eliminating threats to the country. Therefore, this paper proposes a deep-learning algorithm to detect and track rumours in social media [7]. With their continuous development and maturity, deep learning algorithms have been successfully applied in different fields and have shown excellent performance, for example in emotional information extraction in text mining tasks.
It can predict whether the input rumour data is true or false in combination with auxiliary information. It can also be modelled as a three-class problem, that is, true, false and unverifiable. Therefore, it is necessary to rely on computers for automatic rumour detection, and the rumour detection model is applied to this scenario for early rumour detection to effectively curb the spread of rumours [8-9]. The deep learning algorithm can also capture the complex relationships within the data from many data sources and map the learned feature vectors to the same latent space to obtain a unified data representation. On this basis, the deep feature representation of the data is fused into the traditional recommendation algorithm to effectively fuse multi-source data. The deep learning algorithm assumes that users' social communication in social networks affects the modelling of users' preferences. Because graph convolution networks can model graph structure and node information, this paper uses a convolutional neural network [10]. Traditional methods often find it difficult to handle situations lacking historical data. However, deep learning models can make reasonable recommendations for new users or products by learning the common features of users and products. In addition, some deep learning techniques, such as transfer learning and meta-learning, can also transfer knowledge from other tasks or fields to recommendation systems to alleviate cold-start problems. With the rapid development of platforms such as e-commerce and social media, the number of users and products is constantly increasing, and the scale of data is also becoming larger and larger. Traditional methods often have low efficiency in processing such large-scale data.
However, deep learning models, especially distributed deep learning frameworks, can efficiently handle these dynamic and large-scale data. Through technologies such as parallel computing and distributed storage, deep learning models can complete training in a short period of time and update recommendation results in real time. According to different research ideas of rumour detection, early work usually builds classifiers based on different types of manual features through supervised learning [11-12]. The reading, dissemination and expression of information have become very easy. The audience can follow and forward the news they are interested in. In addition, due to the low threshold and low cost of social media, the "gatekeeper" role of the media is weakened, and it is difficult for it to work accurately and quickly. In terms of techniques, we break through the limitations of traditional skills under the deep learning algorithm, using traditional painting and involving devices, finished materials, popular symbols, digital technology and other means [13]. This paper puts forward the following innovations:

(1) The LSTM model is constructed in this paper. In the LSTM model, three gates are set up to determine whether the input value of the upper layer is important enough to be remembered and output. Each gate is controlled by a sigmoid function unit; if the value generated by the input gate is close to zero, the input is blocked and does not pass to the next level.

(2) Rumour detection is discussed and described. According to the different characteristics of rumour data, rumour detection is divided into two sub-tasks: rumour position classification and rumour authenticity prediction. The rumour position classification task is oriented to tree-structured data sets, whose general structure is a source text and different users' replies to this text. The classification goal is a four-class task in which each text is labelled as support, opposition (deny), questioning (query) or comment, referred to as the SDQC task for short.
At the same time, the Sina Weibo platform has also opened the Weibo Biyao account, which regularly publishes confirmed rumours for users to browse. The second chapter mainly describes the research status of rumours at home and abroad, this paper's research work, and its significance. The third chapter discusses the realization of the deep learning model. The fourth chapter studies the data set and carries out experiments and analyses on rumour detection in social media. The fifth chapter is a summary of the full text.

2 Related work

2.1 Research status of rumours at home and abroad

Since the "school fever", people have never stopped studying social media. Most researchers summarize its characteristics from five angles. The first is to analyze the cultural oscillation caused by social media from a cultural perspective. The rise of new media will inevitably lead to a new media culture. As a new branch of popular culture, the Internet influences the lifestyle and values of current netizens. The model considers the daily cycle of rumours, such as holidays, and the impact of external shock cycles, such as the quadrennial general election, and shows that rumours are likely to fluctuate with the time cycle [14]. Aieha B, et al. put forward that the Internet makes the spread of rumours even more powerful: Professor Chao Naipeng used various methods from psychology and sociology to demonstrate the propagation mode and characteristics of Internet rumours in a study of the rumour phenomenon in Internet communication and pointed out that "Internet rumours are more harmful and more difficult to control than oral rumours" [15]. Wu L, et al. proposed automatic rumour detection for social media, which aims to use the relevant information of suspected rumours, such as text content, comment information, forwarding mode and the publisher's personal data, to identify whether a message published on social media is a rumour [16]. Chua et al., in elaborating the value and communication characteristics of social networks, pointed out that people's use of social network services is a social communication demand, and that "weak connections" enable people to obtain the valuable social capital they need [17]. Bai N, et al. proposed to analyze and detect rumours on Chinese Sina Weibo. The article pointed out that, at that time, the number of users of Sina Weibo was eight times that of Twitter, and that it had many functions different from Twitter. One of them was that Sina Weibo had an official rumour reporting and publishing platform [18]. Wang Z, et al. proposed to explore the characteristics and dissemination subjects of social media rumours in order to deal scientifically with the generation and dissemination of online rumours, which is an important part of how the state and the government control online public opinion [19]. Lan Y, et al. put forward the relationship formula of rumour intensity: rumour intensity = rumour importance × rumour ambiguity. This formula was later revised by Chorus as rumour intensity = rumour importance × rumour ambiguity / the public's critical ability. Later, the French scholar Kapferer and the American scholar Cass Sunstein studied and explained rumours from the perspective of sociology and politics [20]. Srinivasan S, et al. pointed out that widely spread rumours are often confusing, and that social media users cannot effectively identify rumours owing to limits of professional knowledge, time or space. Moreover, the scale of social media information is huge, and it is impossible to invite experts to identify rumours exhaustively. Many news organizations and social media service providers are trying to build rumour reporting platforms [21]. Sheng L, et al. proposed a rumour detection algorithm for Twitter.
Its main idea is to select comments containing common sense knowledge and investigative news and to treat them as disputes over the authenticity of the relevant information in order to judge whether that information is a rumour [22]. Pinheiro A, et al. proposed to reproduce the social media communication model from the perspective of Neo Confucianism. Such research requires professional science and engineering knowledge and mastery of network topology. Restoring network communication modes helps us understand various complex communication behaviours in social media [23].

Table 1: Summary table

| Researchers | Method/Model | Data source | Feature | Evaluation indicators | Contribution/Discovery |
|---|---|---|---|---|---|
| Aieha B, et al. | Psychological and sociological methods | Online dissemination | Rumour dissemination patterns and characteristics | Not specifically mentioned | Online rumours are more harmful and more difficult to control than verbal rumours |
| Wu L, et al. | Automatic detection of social media rumours | Social media | Text content, comments, forwarding mode, publisher data | Accuracy, recall rate, F1 score (expected) | Using multi-source information to identify rumours |
| Chua et al. | Social network value | Social network services | "Weakly connected" social capital | Not specifically mentioned | Social network services meet the needs of social communication |
| Bai et al. | Analysis of rumours on Sina Weibo | Sina Weibo | Official rumour reporting platform | Not specifically mentioned | The rumour characteristics of Sina Weibo and its official platform |
| Wang Z, et al. | Characteristics of social media rumours | Social media | Characteristics of rumours and disseminators | Not specifically mentioned | The importance of a scientific response to Internet rumours |
| Lan Y, et al. | Rumour intensity relationship | Not specifically mentioned | The importance and ambiguity of rumours | Not specifically mentioned | The relationship between rumour intensity, importance and ambiguity |
| Kapferer & Cass Sunstein | Perspectives of sociology and political science | Not specifically mentioned | The social and political impact of rumours | Not specifically mentioned | The social and political interpretation of rumours |
| Srinivasan et al. | Identification and dissemination of rumours | Social media | Difficulty in identifying rumours | Not specifically mentioned | The limitations of social media users in identifying rumours |
| Sheng L, et al. | Twitter rumour detection algorithm | Twitter | Common sense knowledge and investigative news commentary | Accuracy, recall rate, F1 score (expected) | Using controversial comments to detect rumours |

2.2 The research work of this paper and its significance

Building on previous research on rumours in social media, this paper puts forward a deep learning algorithm to study rumours in social media. As a major communication tool of the new era, social media has created hot topics and enriched people's after-dinner conversation. However, everything has two sides. In social media, anyone can make any comment through any terminal anytime and anywhere, which undoubtedly makes people's lives more convenient but also creates conditions for the emergence and spread of rumours. In recent years, Weibo has developed rapidly. Once a public crisis happens in society, people first turn to Weibo for information, and a large amount of information gathers there, forming a hot topic or trend that then attracts the traditional media to follow up. The spread of rumours in social media also conforms to this rule. Rumour detection methods based on classification features have achieved initial results [24]. However, the manual design of features is time-consuming and labour-intensive, and the designed features are often limited to specific scenes, so their generalization ability is poor. The rumour authenticity prediction task is oriented to data sets of single texts. The classification goal is a binary classification task.
Based on the analysis and extraction of rumour characteristics with a deep learning algorithm, this paper constructs a real-time rumour detection model that predicts the possibility of each message becoming a rumour, so that supervisors can quickly obtain suspected rumours and actively identify and track them. As for the spread process of rumours in social media, this paper mainly analyses some typical rumours to explore the related factors, spread patterns and characteristics of rumours, paving the way for the later discussion of the spread effect and control of rumours.

3 Implementation of deep learning model

3.1 Principle and algorithm of deep learning

Once the feature vectors are determined, we can use deep learning models for training and extraction. During the training phase, we use labelled data (i.e., samples known to be Botnet or non-Botnet) to train the model, enabling it to learn to distinguish between Botnet and non-Botnet. In the extraction stage, we apply the trained model to new, unlabelled data to predict whether they belong to a Botnet. In the specific application of Botnet recognition, deep learning models can perform well because they can capture complex patterns that traditional methods find difficult to detect. For example, deep neural networks (DNN) or recurrent neural networks (RNN) can be used to analyze time series data, such as network traffic logs, to detect abnormal patterns related to Botnet behaviour. In addition, convolutional neural networks (CNNs) are also very effective in processing image data, such as screenshots or network topologies, and can recognize visual features related to Botnet activities [25]. Subsequently, new deep structure model algorithms have been proposed, setting off a wave of research on deep learning. The roughly ten years of deep learning development is not a long time, and most of the models are based on a few basic core models.
The advantage of LSTM lies in adding a forget gate, an input gate and an output gate so that the weight of its self-loop becomes variable. As a result, the time scale of integration can change dynamically at different moments even when the model parameters are fixed, which alleviates the problems of gradient explosion and gradient vanishing [26]. The internal structure diagram of LSTM shows that the difference between LSTM and RNN is that three gates are set up in the LSTM model to determine whether the input value of the upper layer is important enough to be remembered and whether it can be output. Each gate is controlled by a sigmoid function unit: if the value generated by the input gate is close to zero, the input is blocked and does not pass to the next level; if the value generated by the forget gate is close to zero, the value stored in the memory cell is forgotten. The model takes a piece of Chinese text as input; the text enters the input layer and then passes through a multi-layer network formed by several bidirectional LSTMs. Finally, the corresponding output is the probability distribution of this piece of text over the different classification results. The structure diagram of the LSTM model is shown in Figure 1. First, we need to clarify that the LSTM (Long Short-Term Memory) network and the RBM (Restricted Boltzmann Machine) are two different neural network models, each with its own structure and computational approach. We first explain the core concepts of the two separately and then expand and analyze them based on these concepts. The LSTM network is a special type of recurrent neural network (RNN) that can learn and remember dependencies in long sequences. Each LSTM unit contains three gates (input gate, forget gate and output gate) as well as a memory cell state. These gates and the memory cell together determine the flow of information in the network.
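The gating behaviour described above can be illustrated with a minimal sketch of a single LSTM time step. This is the standard textbook formulation reduced to a hidden size of one for readability, not the configuration used in this paper; the function name `lstm_step` and the `params` layout are hypothetical and introduced only for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function that controls each of the three gates."""
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, params):
    """One scalar LSTM step (hidden size 1, for readability).

    `params` maps each of "i", "f", "o", "g" to a (w_x, w_h, b) triple.
    The layout is an illustrative assumption, not taken from the paper.
    """
    def pre(name):
        w_x, w_h, b = params[name]
        return w_x * x + w_h * h_prev + b

    i = sigmoid(pre("i"))      # input gate: near 0 blocks the new input
    f = sigmoid(pre("f"))      # forget gate: near 0 erases the old memory
    o = sigmoid(pre("o"))      # output gate: near 0 hides the cell state
    g = math.tanh(pre("g"))    # candidate value that could be written
    c = f * c_prev + i * g     # updated memory cell
    h = o * math.tanh(c)       # hidden state passed to the next level
    return h, c


# Input gate closed, forget gate open: the old memory passes through unchanged.
closed = {"i": (0.0, 0.0, -50.0), "f": (0.0, 0.0, 50.0),
          "o": (0.0, 0.0, 50.0), "g": (1.0, 0.0, 0.0)}
h, c = lstm_step(x=3.0, h_prev=0.0, c_prev=0.7, params=closed)
```

With the input gate driven towards zero, the new input has essentially no effect on the cell state, which matches the blocking behaviour described in the text; swapping the biases of the input and forget gates makes the cell overwrite its memory with the candidate value instead.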
For an RBM with visible units $v$ and hidden units $h$, the energy function is

$$E(v, h \mid \theta) = -\sum_{i=1}^{n} a_i v_i - \sum_{j=1}^{m} b_j h_j - \sum_{i=1}^{n} \sum_{j=1}^{m} v_i w_{ij} h_j \tag{1}$$

In the above formula, $a_i$ represents the bias of visible neuron $i$, $b_j$ represents the bias of hidden neuron $j$, $w_{ij}$ is the connection weight between visible neuron $i$ and hidden neuron $j$, and $\theta = \{w_{ij}, a_i, b_j\}$ are the parameters of the RBM. When the parameters are determined, we can get the joint probability distribution of $(v, h)$:

$$P(v, h \mid \theta) = \frac{e^{-E(v, h \mid \theta)}}{Z(\theta)} \tag{2}$$

where $Z(\theta)$ is the partition function. The distribution $P(v \mid \theta)$ of the observed data $v$ in the RBM is the marginal distribution of the joint probability distribution $P(v, h \mid \theta)$:

$$P(v \mid \theta) = \frac{1}{Z(\theta)} \sum_{h} e^{-E(v, h \mid \theta)} \tag{3}$$

When the state of the visible neurons is given, the activation states of the hidden neurons are conditionally independent. The activation probability of the $j$-th hidden neuron is

$$P(h_j = 1 \mid v, \theta) = \sigma\Big(b_j + \sum_{i} v_i w_{ij}\Big) \tag{4}$$

where $\sigma(x) = \dfrac{1}{1 + \exp(-x)}$ is the sigmoid activation function. According to the symmetry of the RBM structure, when the state of the hidden neurons is given, the activation probability of the $i$-th visible neuron is

$$P(v_i = 1 \mid h, \theta) = \sigma\Big(a_i + \sum_{j} w_{ij} h_j\Big) \tag{5}$$

LSTM is a special type of recurrent neural network (RNN) that overcomes the gradient vanishing and exploding problems encountered by traditional RNNs when processing long sequences by introducing gating mechanisms [27]. The specific calculation process of LSTM is shown in Figure 2. Because LSTM's feature extraction ability is not ideal, we use the LSTM model embedded with the Attention mechanism to detect common sense rumours.

Figure 1: Structure diagram of the LSTM model

Figure 2: Schematic diagram of LSTM mechanism calculation

The specific steps are as follows:

(1) First, use the Word2Vec model and Adam optimizer to realize the vectorization of input text. The corpus P of this paper consists of n sentences, and each sentence consists of m words.
(2) Before inputting text into the LSTM model, it is usually necessary to vectorize the text, for example by using a word embedding method to convert words into fixed-dimensional vectors. These vectors not only contain the semantic information of words but also enable the model to handle variable-length sequences. Once the text is vectorized, attention mechanisms can evaluate the importance of these vectors (i.e. local features of the text) at each time step. This can be achieved by calculating a weight vector that is associated with each position in the input sequence and reflects the importance of that position to the model output [28].

(3) Finally, local features and global features are fused, and the classification results are output by a classifier.

$$H_K = \mathrm{Linear}(H; \theta_{HK}) \tag{6}$$

$$H_V = \mathrm{Linear}(H; \theta_{HV}) \tag{7}$$

where $\theta_{HK}$ and $\theta_{HV}$ are the training parameters of fully connected layers without activation functions, $H_K, H_V \in \mathbb{R}$, and $dim$ is the converted feature dimension. The visual feature is converted into the same feature dimension, $M \in \mathbb{R}$, as the text feature $H$ through a fully connected layer, and then the converted image feature is converted into the query. The calculation of $M_Q$ is shown in the formula:

$$M_Q = \mathrm{Linear}(M; \theta_{MQ}) \tag{8}$$

where $\theta_{MQ}$ is the training parameter, $M_Q \in \mathbb{R}$, and $m$ represents the number of candidate frames. The multi-head attention mechanism is a key component of the Transformer model; it allows the model to focus simultaneously on different parts of the input sequence in different representation subspaces. Multi-head attention divides the input into multiple "heads"; each head independently learns a set of attention weights, and the outputs of these heads are then concatenated and linearly transformed to obtain the final output. We define some symbols to better understand the multi-head attention mechanism.
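Before turning to the formulas, the mechanism just described can be illustrated with a small sketch. It assumes the standard scaled dot-product form of attention from the Transformer literature, with one query vector (playing the role of $M_Q$) attending over a sequence of keys and values (playing the roles of $H_K$ and $H_V$). The learned linear projections and the final output projection are omitted, and the function names are hypothetical, so this is an illustration of the general technique rather than the paper's exact computation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention of one query over a sequence."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)              # one weight per sequence position
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


def multi_head_attend(query, keys, values, num_heads):
    """Split the feature dimension into heads, attend per head, concatenate.

    Assumes the feature dimension is divisible by num_heads and that keys
    and values share that dimension (simplifications for this sketch).
    """
    head_dim = len(query) // num_heads
    out = []
    for h in range(num_heads):
        lo, hi = h * head_dim, (h + 1) * head_dim
        out += attend(query[lo:hi],
                      [k[lo:hi] for k in keys],
                      [v[lo:hi] for v in values])
    return out


# Two text positions with identical keys receive equal attention weights,
# so the result is the per-dimension average of the two value vectors.
result = multi_head_attend(
    query=[1.0, 0.0, 0.0, 1.0],
    keys=[[1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0]],
    values=[[0.0, 2.0, 4.0, 6.0], [2.0, 4.0, 6.0, 8.0]],
    num_heads=2,
)
```

Each head sees only its own slice of the feature dimension, which is what lets different heads weight the sequence positions differently when the keys are not identical.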
$$\mathrm{Atten}_i^{MH} = \mathrm{softmax}(\,\cdot\,) \tag{9}$$

$$M_{update} = \mathrm{Atten}^{MH} \times H_V \tag{10}$$

$$M = \mathrm{Linear}(M, M_{update}; \theta_M) \tag{11}$$

where $\theta_M$ is the training parameter. Finally, the visual feature $M$ is average-pooled to get the final visual feature $M \in \mathbb{R}$. The updated visual features and text features are fused by attention weight, and the calculation process is shown in the formula:

$$\mathrm{Atten}^{HM} = \mathrm{softmax}\big(W_2^{HM} + b_1^{HM}\big) \tag{12}$$

where $W^{HM}$ is the weight matrix and $b^{HM}$ is the bias. By optimizing the spread graph, we can build a news credibility prediction model, find the spread patterns that distinguish rumours based on heterogeneous user representation and modelling methods, and fuse images, embedded text in images and text content to carry out rumour detection. It is usually necessary to use vectors with an intersecting semantic representation to carry out the research. However, identifying these websites generally depends on manual review, which not only wastes human resources but also causes a certain error rate due to differences in people's knowledge level and identification ability.

3.2 Rumour detection

According to the different characteristics of rumour data, rumour detection is divided into two sub-tasks: rumour position classification and rumour authenticity prediction. The rumour position classification task is oriented to tree-structured data sets, whose general structure is a source text and the replies of different users to that text. The classification goal is a four-class task: each text is labelled as support, opposition (deny), doubt (query) or comment, referred to as the SDQC task for short. At the same time, the Sina Weibo platform has also opened a Weibo Biyao account to regularly release confirmed rumours for users to browse.
Rumour detection methods based on classification features have achieved initial results. The rumour authenticity prediction task is oriented to data sets of single texts. The classification goal is a binary classification task: the model predicts whether the input rumour data is true or false in combination with auxiliary information. It can also be modelled as a three-class problem, that is, true, false and unverifiable. Therefore, it is necessary to rely on computers for automatic rumour detection, and the rumour detection model is applied to this scenario for early rumour detection to curb the spread of rumours effectively. According to different research ideas of rumour detection, early work usually builds classifiers based on different types of manual features through supervised learning. For example, features are extracted from texts and user profiles, and classifiers such as support vector machines and decision trees are used to predict the credibility of tweets. However, Weibo is an open platform in which anyone can participate. This high level of participation undoubtedly makes the spread of rumours more rapid. At the same time, it gives more people the opportunity to evaluate rumours and express their views on them. However, combining features from different sources only increases the amount and types of information that the model can use and does not pay enough attention to early detection. In practice, such methods cannot find rumours as early as possible.
By optimizing the spread graph, we can build a news credibility prediction model, find the spread patterns that distinguish rumours based on heterogeneous user representation and modelling methods, and fuse images, embedded text in images and text content to carry out rumour detection. Users' evaluations of rumours can objectively describe the correctness of the rumours and provide effective features for classification from a different angle. Compared with the simple features used before, they are undoubtedly more convincing. It is found that, compared with rumours, rumour corrections have a clearer content definition, more reliable information sources and less emotional language. At the same time, it is found that the followers of opinion leaders can moderate the relationship between emotional characteristics and the other two types of characteristics; that is, if opinion leaders make overly emotional remarks, these will be supplemented by their followers, thus enhancing the clarity of content and the reliability of sources.

4 Experimental results and analysis

4.1 Data set

To analyze and evaluate the classification performance on rumour and non-rumour Weibo posts in depth, we obtained a detailed dataset of confirmed rumours from the Sina Community Management Center; these rumours have been officially reviewed and confirmed. For comparison, we also randomly selected from the Sina Weibo platform non-rumour Weibo posts comparable in quantity to the rumour posts. These Weibo posts have been confirmed as non-rumours manually or by automated tools, such as models based on content and user behaviour. HTML tags, special characters, URL links and usernames mentioned with @ were removed from the Weibo posts, and only plain text content was retained.
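The cleaning step described above could be implemented along the following lines. This is a minimal sketch using only the Python standard library; the regular expressions and the set of punctuation marks that are kept are assumptions made for illustration, since the paper does not specify them, and `clean_post` is a hypothetical helper name.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")                          # HTML tags
# ASCII-only URL characters, so adjacent Chinese text is not swallowed.
URL_RE = re.compile(r"https?://[A-Za-z0-9./?=&_%#:~+-]+")
MENTION_RE = re.compile(r"@[\w\-]+")                     # @username mentions
# Keep word characters (including CJK), whitespace and basic punctuation.
SPECIAL_RE = re.compile(r"[^\w\s,.!?;:，。！？；：]")


def clean_post(raw: str) -> str:
    """Strip markup, links, mentions and special characters from one post."""
    text = html.unescape(raw)            # decode entities such as &amp;
    text = TAG_RE.sub(" ", text)
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = SPECIAL_RE.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()


cleaned = clean_post('<a href="x">转发</a> @user_01 看这里 http://t.cn/abc #谣言# 真的吗？')
```

One known limitation of this sketch is that a mention immediately followed by Chinese text without a separator is removed together with that text, because `\w` also matches CJK characters; a production pipeline would need a platform-specific rule for where a username ends.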
Chinese word segmentation tools (such as Jieba) were used to segment the Weibo text. Common Chinese stop words such as "de" (的), "shi" (是) and "zai" (在) were removed. For different forms of certain words, such as "running" and "run", we performed stemming to represent them uniformly. To train our model, we used 70% of the dataset as the training set. To adjust the hyperparameters of the model and avoid overfitting, we used 15% of the data as the validation set. Finally, we used the remaining 15% of the data as the test set to evaluate the performance of the model. To obtain more microblog data, we obtained a group of known rumours from the Sina community management centre, which reports all kinds of rumours. We also collected a considerable number of non-rumour microblogs. The detailed statistics of the two data sets are shown in Table 1. We choose the following representative methods for performance comparison. A convolutional neural network (CNN) is used to represent the original Weibo posts of suspected rumours and classify them. All text information of the forwarding sequence is processed into a TF-IDF vector representation (Salton et al., 1988), and a support vector machine classifier is trained to classify rumours. A rumour classifier is trained using a 2-layer gated recurrent unit network, GRU (Cho et al., 2014). We evaluate the performance of these models using evaluation indicators such as accuracy and recall. The detailed results of the different methods under the various evaluation indicators are shown in Table 2.

In the design of the CED-CNN model, the introduction of the CNN (convolutional neural network) may play a crucial role. A CNN can automatically learn and extract spatial features from input data, which is very effective for processing text and image information on social media. By combining the CNN with the early detection strategy, the CED-CNN model can capture the features of rumours more accurately and recognize them in the early stages. The experiment was conducted on the data set. For the InfoGainAttributeEval and GainRatioAttributeEval methods, we used Twitter's streaming interface to obtain data for 4 months; the total amount of text obtained was about 63 GB. These messages are randomly sampled from all messages and pushed by Twitter. The data set contains about 60 million tweets in total. The experimental results are shown in Table 3 and Table 4, respectively.

Table 1: Data set statistics (Weibo-all)

| | Rumour | Non-rumour | All samples |
|---|---|---|---|
| Number of samples | 3850 | 4198 | 8051 |
| Number of posts | 2572046 | 2450822 | 5022867 |
| Average number of posts | 667 | 583 | 623 |

Table 2: Experimental results of the Weibo data set

| Methods | Weibo-stan Accuracy rate | Weibo-stan Recall rate | Weibo-stan F1 | Weibo-stan ER | Weibo-all Accuracy rate | Weibo-all Recall rate | Weibo-all F1 | Weibo-all ER |
|---|---|---|---|---|---|---|---|---|
| CNN | 0.808 | 0.828 | 0.803 | 100% | 0.886 | 0.882 | 0.882 | 100% |
| TF-IDF | 0.858 | 0.798 | 0.867 | 100% | 0.820 | 0.913 | 0.778 | 100% |
| GRU | 0.921 | 0.925 | 0.912 | 100% | 0.905 | 0.922 | 0.902 | 100% |

Table 3: Analysis results of feature importance of InfoGainAttributeEval

| Factor | Importance | Factor | Importance | Factor | Importance |
|---|---|---|---|---|---|
| utc_offset | 0.19621 | Location | 0.0096 | IsGeo_enabled | 0.0016 |
| Language | 0.1167 | Follower_count | 0.00376 | Tag Num | 0 |
| time_zone | 0.06436 | Listed_count | 0.00368 | Statuses_count | 0.00245 |
| created_at | 0.02392 | Friends_count | 0.00351 | Favourites_count | 0 |

Table 4: Analysis results of feature importance of GainRatioAttributeEval

| Factor | Ranking | Factor | Ranking | Factor | Ranking |
|---|---|---|---|---|---|
| Language | 1 | Is Verified | 5 | IsGeo_enabled | 9 |
| Follower_count | 2 | Utc_offset | 6 | Tag Num | 10 |
| Listed_count | 3 | Location | 7 | Statuses_count | 11 |
| Friends_count | 4 | Created_at | 8 | Favourites_count | 12 |

As seen from Tables 3 and 4, according to the combined results of the two methods, region is an important factor influencing social relationships, consistent with the principle of homogeneity. The importance of language is self-evident. In addition, the number of followers, the number of friends, the number of lists, etc., are also important. The common feature of these attributes is that they represent users' activity. The length of the self-introduction, certification, registration time and other factors are related to the quality of the content users publish.

4.2 Analysis and discussion of rumour detection in social media

In this experiment, other features of the data mining algorithm, the ant colony algorithm and the deep learning algorithm in this paper are calculated, and the rumour model is trained. Then the real-time detection error rate of the rumour model is tested, and the experimental results are shown in Figure 3.

Figure 3: Real-time error rate of detection under different algorithms

Figure 3 shows the real-time error rate of detection under the different algorithms. Although data mining algorithms and ant colony algorithms can achieve good performance in certain situations, they may have limitations when dealing with real-time data streams and new topic detection tasks. Data mining algorithms typically rely on manually designed features and rules, which may not fully cover all data patterns and variations.
As a heuristic search algorithm, the ant colony algorithm can solve optimization problems to a certain extent, but it may not be flexible or efficient enough when dealing with complex real-time data streams.

In the topic coverage experiments (Figure 4), we further analyse the ability of the different algorithms to place texts that discuss the same topic into the same Weibo topic cluster. Owing to their powerful feature learning and representation capabilities, deep learning algorithms may recognize and classify relevant texts more accurately, and thereby generate more accurate and consistent topic clusters. Although data mining algorithms and ant colony algorithms can achieve the same goals to some extent, they may be slightly inferior in accuracy and consistency. Figure 4 shows the coverage of the different algorithms, that is, the degree of information coincidence between any two media data streams. The degree of information coincidence recorded in this paper is the proportion of topics that can be successfully aligned between the two data streams out of the total number of topics. The topic coverage of the deep learning algorithm is 38.8%, which is the highest among the compared algorithms. During the training of CED, the "credible detection point" of each suspected rumour is continually moved earlier, and during testing the decision is made according to a threshold-based strategy.

Figure 4: Change of topic coverage under different algorithms

Figure 5: Change curve of advance rate in the microblog data set

As can be seen from Figure 5, when less than 9% of the forwarding information is used, CED, CED-OM and CED-CNN can detect about 40%, 50% and 60% of the microblogs, respectively.
This verifies the advantages of considering the original microblog information and of modelling with a convolutional neural network. At the same time, the three CED methods we proposed can all detect rumours effectively at an early stage. The proportion has a local peak when the whole post sequence is used for detection. Such cases are rare for CED-CNN (less than 8%), indicating that CED-CNN needs less post information for detection and makes efficient use of the post information.

After eliminating some feature types to ensure real-time detection on the data flow, this study computes the remaining features for the three methods (the data mining algorithm, the ant colony algorithm and the deep learning algorithm of this paper), trains a rumour model with each, and then tests the real-time detection accuracy of the rumour models. The experimental results are shown in Figure 6. As can be seen from Figure 6, when the time index reaches 40, the real-time accuracy of the data mining algorithm is 57.23% and that of the ant colony algorithm is 53.45%. The effectiveness of the proposed method is thereby also confirmed.

A rumour about AIDS has been widely forwarded on the Internet. The original text reads: "Don't eat outside recently, especially barbecue, cold dishes, Lanzhou ramen, etc. A group of people infected with AIDS have used their blood to drop into food in some cities across the country. It has been confirmed that people have been infected." Although similar legends have circulated for a long time, suddenly seeing such a message still causes panic and unease. Zhihu, as a representative online forum, and Sina Weibo, as a representative social networking site, were selected to count the relevant posts and to compile the comparison shown in Tables 5 and 6.
Figure 6: Real-time detection accuracy under different algorithms

Table 5: Statistics on the interaction of rumour-refuting posts in Zhihu Forum

| Topic post name | Views | Number of replies |
|---|---|---|
| AIDS is a typical rumour spread by dripping blood | 86 | 6 |
| "Dripping blood" spreads rumours of AIDS | 42 | 3 |
| "AIDS people poisoned" speech disseminators were punished by public security | 44 | 8 |
| You are responsible for your online comments | 16 | 3 |
| Total | 188 | 20 |

Table 6: Statistics of Sina Weibo rumour refutation post interaction

| Total number of selected topic posts | Forwarding volume | Comment volume |
|---|---|---|
| 2137 | 2411 | 977 |

As can be seen from Table 5 and Table 6, there are four topic posts on the Zhihu Forum, with 188 page views and 20 replies in total, which means that at most 188 people learned about the rumour from this forum. On Sina Weibo, a total of 2,411 original topic posts were found. The number of first-level reposts is 2411, and the number of comments is 977. The figure would be far larger if second-level and third-level reposts were counted. Adding the 22% of lurkers who read silently but do not interact, the number of page views would be larger still. By comparison, it can be concluded that, to a certain extent, the spread of social media has made the life of a rumour short and fragile, because in the traditional media age it took much more time and manpower to clarify similar rumours.

In this paper, the detection performance of the deep learning algorithm is analysed more comprehensively by observing the false positive and false negative rates under all threshold settings. The optimal threshold can be selected through the trade-off between the two. In addition, the throughput per second is measured for the computation of a large number of message features. The performance of the real-time detection method of this paper under the full range of threshold settings is shown in Figure 7.
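The threshold trade-off described above can be illustrated with a small sketch. It assumes that the detector outputs a rumour score in [0, 1] for each post and that a post is flagged when the score reaches the threshold; the function names, the toy scores and the equal weighting of the two error rates are illustrative assumptions, not the configuration used in the paper.

```python
from typing import List, Tuple


def error_rates(scores: List[float], labels: List[int], threshold: float) -> Tuple[float, float]:
    """Return (false positive rate, false negative rate) at one threshold.

    A post is flagged as a rumour when its score is >= threshold.
    Labels: 1 = rumour, 0 = non-rumour.
    """
    false_pos = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    false_neg = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    negatives = labels.count(0)
    positives = labels.count(1)
    fpr = false_pos / negatives if negatives else 0.0
    fnr = false_neg / positives if positives else 0.0
    return fpr, fnr


def sweep(scores: List[float], labels: List[int], steps: int = 100):
    """Evaluate both error rates on an evenly spaced grid of thresholds in [0, 1]."""
    return [(i / steps, *error_rates(scores, labels, i / steps)) for i in range(steps + 1)]


def pick_threshold(results, fp_cost: float = 1.0, fn_cost: float = 1.0) -> float:
    """Choose the threshold with the lowest weighted sum of the two error rates."""
    best = min(results, key=lambda r: fp_cost * r[1] + fn_cost * r[2])
    return best[0]


# Hypothetical scores and ground-truth labels, for illustration only.
scores = [0.95, 0.80, 0.65, 0.40, 0.30, 0.10]
labels = [1, 1, 0, 1, 0, 0]

results = sweep(scores, labels)
best = pick_threshold(results)
print("chosen threshold:", best, "error rates:", error_rates(scores, labels, best))
```

Raising `fn_cost` relative to `fp_cost` moves the chosen threshold lower, which suits a setting where missing a rumour is considered worse than raising a false alarm.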
As seen in Figure 7, this method allows a more comprehensive analysis of the detection performance by observing the false positive and false negative rates under all threshold settings. In addition, by measuring the throughput per second for the computation of a large number of message features, this paper also tests the efficiency of feature computation. According to the GainRatioAttributeEval method, the most important user characteristics in this experiment are language, follower count, listed count, friend count, time zone, verification status, time zone offset, location, etc. The experimental results are shown in Table 7. As can be seen from Table 7, according to CfsSubsetEval, the most important features are location, language, time zone, time zone offset, etc. It should be noted that Location is the city name that the user enters in a text box at registration, and that time_zone and utc_offset are closely related to the location.

To demonstrate how efficiently the implication and pseudo feedback features can be computed, this paper constructs a rumour detection system and processes 20,000 Weibo posts to test the system's throughput. The test runs on a single core. The average throughput on an idle machine was obtained from six repeated runs. The experimental results are shown in Figure 8. As can be seen from Figure 8, the detection method studied in this paper can process up to 7000 microblogs per second, which allows efficient real-time rumour detection without strain. In addition, the number of news reports is very small compared with the number of social media messages. On this basis, this paper finds no significant difference in performance between using k-term entries and using vector distance to compute the implication feature.

Figure 7: Performance of real-time detection method of deep learning algorithm under full threshold setting
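The gain-ratio ranking mentioned above can be made concrete with a short sketch. The evaluator names in the text match attribute evaluators of the Weka toolkit, but whether Weka was actually used is an assumption here. The code below is a plain reimplementation of the textbook definition for categorical features, gain ratio = (H(class) - H(class | feature)) / H(feature), and the toy feature values are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a list of categorical values."""
    total = len(values)
    counts = Counter(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def info_gain(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Information gain H(labels) - H(labels | feature)."""
    total = len(labels)
    conditional = 0.0
    for value in set(feature):
        subset: List[Hashable] = [y for x, y in zip(feature, labels) if x == value]
        conditional += len(subset) / total * entropy(subset)
    return entropy(labels) - conditional


def gain_ratio(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Information gain normalised by the entropy of the feature itself."""
    split_info = entropy(feature)
    if split_info == 0.0:  # a constant feature carries no information
        return 0.0
    return info_gain(feature, labels) / split_info


# Hypothetical user attributes and rumour labels (1 = rumour), illustration only.
labels = [1, 1, 0, 0]
language = ["zh", "zh", "en", "en"]    # separates the two classes perfectly
geo_enabled = ["y", "n", "y", "n"]     # unrelated to the class

ranking = sorted(
    {"language": gain_ratio(language, labels), "geo_enabled": gain_ratio(geo_enabled, labels)}.items(),
    key=lambda item: item[1],
    reverse=True,
)
print(ranking)
```

Normalising by the entropy of the feature penalises attributes with many distinct values (such as a free-text location), which would otherwise receive inflated information-gain scores.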
Figure 8: Detection throughput of the deep learning algorithm and the average volume of the Sina Weibo stream

Table 7: Results of feature importance analysis with CfsSubsetEval

| Factor | Ranking |
|---|---|
| Language | 1 |
| Location | 2 |
| Time_zone | 3 |
| Follower_count | 4 |
| Description length | 5 |
| Utc_offset | 6 |

5 Discussion

In this study, we propose a real-time rumour detection method based on deep learning algorithms and emerging feature types. To evaluate our method comprehensively, we compared it in detail with the relevant work listed in Table 1. These works range from psychological and sociological approaches to automated rumour detection systems.

Firstly, in terms of performance indicators, our method performs well in both processing speed and accuracy. As shown in Figure 8, our system can process up to 7000 Weibo posts per second, which is very important in real-time rumour detection scenarios. In addition, the effectiveness of the LSTM model in hotspot tracking tasks has been validated, especially when the dimensionality of the feature vectors is low. This result demonstrates the efficiency and practicality of our method.

Secondly, in terms of model architecture and feature selection, our method adopts implication features and pseudo feedback features, which are rarely mentioned in existing work. The introduction of these emerging feature types enables our model to better capture the complexity and diversity of rumours on social media. In addition, we also used traditional text features, and combining multiple feature types gave better performance in rumour detection tasks.

Finally, in terms of the data set used, we chose Sina Weibo, a representative social media platform. Sina Weibo has a large user base, fast information dissemination and frequent rumours. By processing and analysing data from Sina Weibo, our method can more accurately reflect the spread patterns and characteristics of rumours on social media.
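The throughput figure cited in this discussion was obtained by processing 20,000 posts on a single core and averaging six runs. A minimal timing sketch of such a measurement is given below; `extract_features` is a hypothetical stand-in for the feature computation of the paper, so the number it prints says nothing about the reported 7000 posts per second.

```python
import time
from statistics import mean
from typing import Dict, List


def extract_features(post: str) -> Dict[str, object]:
    """Hypothetical placeholder for the per-post feature computation."""
    tokens = post.split()
    return {
        "length": len(post),
        "tokens": len(tokens),
        "has_question": "?" in post,
    }


def measure_throughput(posts: List[str], runs: int = 6) -> float:
    """Average posts processed per second over several single-threaded runs."""
    rates = []
    for _ in range(runs):
        start = time.perf_counter()
        for post in posts:
            extract_features(post)
        elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero interval
        rates.append(len(posts) / elapsed)
    return mean(rates)


# 20,000 synthetic posts, mirroring the batch size used in the experiment.
posts = ["is this rumour true ? please verify"] * 20000
throughput = measure_throughput(posts)
print(f"average throughput: {throughput:.0f} posts/second")
```

Using `time.perf_counter` rather than wall-clock time avoids distortion from system clock adjustments, and averaging over repeated runs smooths out interference from other processes on the machine.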
6 Conclusion

In its research on real-time rumour detection for social media, this study proposes a real-time rumour detection method based on a deep learning algorithm. The method relies mainly on two emerging feature types, the implication feature and the pseudo feedback feature, together with traditional text features. After some feature types are eliminated to allow real-time detection on the data flow, the experimental results on real data sets show that the LSTM model under the deep learning algorithm is effective in hot spot tracking tasks. Especially when the dimension of the feature vector is relatively low, it performs much better than the other models. We therefore conclude that we cannot rely completely on the self-purification function of social media and simply let rumours run their course. The government should also establish corresponding regulatory mechanisms to monitor the spread of rumours. Only by giving full play to the self-purification role of social media while applying multi-pronged and appropriate regulation can we ensure a healthy and orderly network environment.

Competing interests

The authors declare no competing interests.

Authorship contribution statement

Chunyan Han: Writing-Original draft preparation, Conceptualization, Supervision, Project administration. Ling Lin: Language review, Methodology, Software.

Data availability

On request

Declarations

Not applicable

References

[1] S. Vosoughi, M. ‘Neo’ Mohsenvand, and D. Roy, “Rumor gauge: Predicting the veracity of rumors on Twitter,” ACM Transactions on Knowledge Discovery from Data (TKDD), 11(4): 1–36, 2017. https://doi.org/10.1145/3070644
[2] M. Mirbabaie, I. Amojo, and S. Stieglitz, “Affording Twitter in Emergency Situations: The Occurrence of Rumor Sense-Making,” Journal of Database Management (JDM), 32(2): 50–66, 2021. https://doi.org/10.4018/JDM.2021040104
[3] M. Guo, Z.
Xu, L. Liu, M. Guo, and Y. Zhang, “An adaptive deep transfer learning model for rumor detection without sufficient identified rumors,” Math Probl Eng, 2020, 2020. https://doi.org/10.1155/2020/7562567
[4] T. Ma, H. Zhou, Y. Tian, and N. Al-Nabhan, “A novel rumor detection algorithm based on entity recognition, sentence reconfiguration, and ordinary differential equation network,” Neurocomputing, 447: 224–234, 2021. https://doi.org/10.1016/j.neucom.2021.03.055
[5] Y. Cheng and L. Zhao, “Dynamical behaviors and control measures of rumor-spreading model in consideration of the infected media and time delay,” Inf Sci (N Y), 564: 237–253, 2021. https://doi.org/10.1016/j.ins.2021.02.047
[6] M. Huang, G. Zou, B. Zhang, Y. Gan, S. Jiang, and K. Jiang, “Identifying influential individuals in microblogging networks using graph partitioning,” Expert Syst Appl, 102: 70–82, 2018. https://doi.org/10.1016/j.eswa.2018.02.021
[7] M. A. Al-Garadi et al., “Analysis of online social network connections for identification of influential users: Survey and open research issues,” ACM Computing Surveys (CSUR), 51(1): 1–37, 2018. https://doi.org/10.1145/3155897
[8] F. P. Boogaard, K. S. A. H. Rongen, and G. W. Kootstra, “Robust node detection and tracking in fruit-vegetable crops using deep learning and multi-view imaging,” Biosyst Eng, 192: 117–132, 2020. https://doi.org/10.1016/j.biosystemseng.2020.01.023
[9] V. Chandrakanth, V. S. N. Murthy, and S. S. Channappayya, “UAV-based autonomous detection and tracking of beyond visual range (BVR) non-stationary targets using deep learning,” J Real Time Image Process, 1–17, 2022. https://link.springer.com/article/10.1007/s11554-021-01185-w
[10] Y. Li, X. Zhang, H. Li, Q. Zhou, X. Cao, and Z. Xiao, “Object detection and tracking under complex environment using deep learning‐based LPM,” IET Computer Vision, 13(2): 157–164, 2019. https://doi.org/10.1049/iet-cvi.2018.5129
[11] Y. Zou, R. Lan, X. Wei, and J.
Chen, “Robust seam tracking via a deep learning framework combining tracking and detection,” Appl Opt, 59(14): 4321–4331, 2020. https://doi.org/10.1364/AO.389730
[12] N. Shlezinger, N. Farsad, Y. C. Eldar, and A. J. Goldsmith, “ViterbiNet: A deep learning based Viterbi algorithm for symbol detection,” IEEE Trans Wirel Commun, 19(5): 3319–3331, 2020. https://doi.org/10.1109/TWC.2020.2972352
[13] J. Zhang, S. Jiang, Y. Zhang, X. Liu, D. Wang, and F. Qiu, “Long-term tracking algorithm using deep features and a single shot multibox detector,” J Electron Imaging, 27(5): 53019, 2018. https://doi.org/10.1117/1.JEI.27.5.053019
[14] V. Indu and S. M. Thampi, “A psychologically-inspired fuzzy-based approach for user personality prediction in rumor propagation across social networks,” Journal of Intelligent & Fuzzy Systems, 41(5): 5425–5439, 2021. https://doi.org/10.3233/JIFS-189864
[15] A. I. E. Hosni, K. Li, and S. Ahmad, “Minimizing rumor influence in multiplex online social networks based on human individual and social behaviors,” Inf Sci (N Y), 512: 1458–1480, 2020. https://doi.org/10.1016/j.ins.2019.10.063
[16] L. Wu, Y. Rao, H. Yu, Y. Wang, and N. Ambreen, “A multi‐semantics classification method based on deep learning for incredible messages on social media,” Chinese Journal of Electronics, 28(4): 754–763, 2019. https://doi.org/10.1049/cje.2019.05.002
[17] A. Y. K. Chua and S. Banerjee, “To share or not to share: The role of epistemic belief in online health rumors,” Int J Med Inform, 108: 36–41, 2017. https://doi.org/10.1016/j.ijmedinf.2017.08.010
[18] N. Bai, F. Meng, X. Rui, and Z. Wang, “Rumor detection based on a source-replies conversation tree convolutional neural net,” Computing, 1–17, 2022. https://doi.org/10.1007/s00607-021-01034-5
[19] Z. Wang and A. Chen, “On ISRC rumor spreading model for scale-free networks with self-purification mechanism,” Complexity, 2021: 1–9, 2021. https://doi.org/10.1155/2021/6685306
[20] L.
Yang, Z. Li, and A. Giua, “Containment of rumor spread in complex social networks,” Inf Sci (N Y), 506: 113–130, 2020. https://doi.org/10.1016/j.ins.2019.07.055
[21] S. Srinivasan and D. B. LD, “A neuro-fuzzy approach to detect rumors in online social networks,” International Journal of Web Services Research (IJWSR), 17(1): 64–82, 2020. https://doi.org/10.4018/IJWSR.2020010104
[22] L. Sheng, X. Guang, and X. Ma, “The Spread of Rumors and Positive Energy in Social Network,” Journal of Internet Technology, 19(5): 1515–1524, 2018. https://jit.ndhu.edu.tw/article/view/1771/0
[23] A. Pinheiro, C. Cappelli, and C. Maciel, “Designing auditability in social networks to prevent the spread of false information,” IEEE Latin America Transactions, 15(12): 2282–2289, 2017. https://doi.org/10.1109/TLA.2017.8071089
[24] A. Pal, A. Y. K. Chua, and D. H.-L. Goh, “Debunking rumors on social media: The use of denials,” Comput Human Behav, 96: 110–122, 2019. https://doi.org/10.1016/j.chb.2019.02.022
[25] H. J. Oh and H. Lee, “When do people verify and share health rumors on social media? The effects of message importance, health anxiety, and health literacy,” J Health Commun, 24(11): 837–847, 2019. https://doi.org/10.1080/10810730.2019.1677824
[26] O. Oh, P. Gupta, M. Agrawal, and H. R. Rao, “ICT mediated rumor beliefs and resulting user actions during a community crisis,” Gov Inf Q, 35(2): 243–258, 2018. https://doi.org/10.1016/j.giq.2018.03.006
[27] W.-H. Tsai and Z.-W. Lin, “Social Constructionism and the Significance of Political Rumors in Contemporary China,” Asian Surv, 59(5): 870–888, 2019. https://www.jstor.org/stable/26848407
[28] S. Luna, “Affective atmospheres of terror on the Mexico–US border: Rumors of violence in Reynosa’s prostitution zone,” Cultural Anthropology, 33(1): 58–84, 2018. https://doi.org/10.14506/ca33.1.03