https://doi.org/10.31449/inf.v47i4.3911

Informatica 47 (2023) 523–536

CoBiAt: A Sentiment Classification Model Using Hybrid ConvNet-Dual-LSTM with Attention Techniques

Roop Ranjan and A K Daniel
Department of Computer Science and Engineering, Madan Mohan Malaviya University of Technology, Gorakhpur, UP, India

Keywords: deep learning, Dual-LSTM, Keras, fastText, attention, emotion analysis

Received: January 14, 2022

Many researchers have recently turned their attention to emotion analysis as a result of the growing number of social reviews of various services. This wealth of data makes it possible to understand user behaviour better and to work toward enhancing quality of service (QoS). A critical area of research in language processing is text categorization, which places unorganised data into relevant categories. In several natural language processing (NLP) applications, LSTM and CNN are utilised for text classification; CNN models use convolution techniques to obtain top-level features. In this study, an attention-based model using Dual-LSTM and ConvNet is proposed. To verify its effectiveness, the model is trained on two distinct datasets. The proposed hybrid model demonstrates a significant performance gain over previous deep learning techniques, and it yields outcomes with a higher level of accuracy than traditional machine learning models.

Povzetek: The CoBiAt method for sentiment classification using Dual-LSTM and ConvNet is presented.

1 Introduction

The widespread usage of social media allows people to provide comments on events, situations, services, and product qualities [1]. These comments are frequently based on user experiences, which may include good or bad opinions about items or services. Such feedback assists firms in improving their services, allowing them to generate greater profit. As a result, it is critical to assess user input gathered from social networking websites. Sentiment analysis is useful for expressing users' opinions (positive, neutral, or negative) about services through text data [2]. It has also been observed that scholars are becoming more interested in social media platforms such as Facebook and Twitter. Businesses embrace public opinion analysis because it describes human activity and behaviour, as well as how individuals are influenced by the viewpoints of others. In the recent era, the widespread application of deep learning has led to advancements in image processing, natural language processing, and speech recognition. Deep learning is superior to machine learning for sentiment classification problems owing to the availability of large datasets and the inexpensive mass production of capable Graphics Processing Units (GPUs) [5]. Deep learning is extensively utilised in models based on natural language processing (NLP) [6], including emotion analysis, because of its autonomous learning characteristics. The two most often used deep learning algorithms in sentiment analysis of reviews are the CNN and the Recurrent Neural Network (RNN). Gradient vanishing and exploding problems plague RNNs to a large extent, making them challenging to train for long-distance correlations in a given sequence. The RNN model underpins BiLSTM, which has shown promising results in text-based sentiment analysis. Such a model features channels for two-way communication to help the network comprehend its context.
The forward and backward layers of BiLSTM allow the network to access the sequence's prior and subsequent contexts [7]. In the past few years, many CNN- or RNN-based methods for classifying text have been proposed [8, 9]. CNNs can learn local behaviour from temporal data, but not sequential correlations. RNNs are specialised for sequential modelling, as opposed to CNNs, but are unable to extract features in parallel. Traditional RNNs, moreover, suffer from exploding and vanishing gradients on extended data sequences. These vanishing gradient and gradient explosion problems are successfully solved by long short-term memory (LSTM) [10], a type of RNN architecture with LSTM units as hidden units. The attention-based mechanism helps enhance the performance of deep learning models in sentence summarization and reading comprehension, as demonstrated in machine translation [11, 12]. The majority of deep learning implementations for text analysis use word embedding techniques to produce feature vectors from the dataset. BiLSTM is unable to prioritise the critical information in the data and merely collects contextual information from the features. In contrast to BiLSTM, CNN has a convolutional layer that extracts and shrinks vector features.

In this research we aim to construct a novel text classification model utilising a hybrid, attention-oriented, optimised deep learning architecture that integrates the strong characteristics of CNN with BiLSTM employing an attention mechanism in order to handle the aforementioned problem. By adding a convolutional layer with attention features, the attention-based Conv-BiLSTM mechanism, a unique approach, aims to solve the shortcomings of BiLSTM. In the suggested model's overall methodology, the input data is trained using the Keras skip-gram model and then passed to the convolution layer, which draws out the data's basic semantic information. The feature vectors obtained by the convolution layer are fed to the BiLSTM layer, which applies the attention-based approach to identify which characteristics are significantly associated with semantics and should be employed for the final classification. The training and performance evaluation of the proposed model is performed on two datasets; the first is a collection of tweets about Indian Railways (hereafter the IR dataset) from 1 October 2019 to 31 October 2019 [13], and the second is the IMDB reviews dataset. Word embedding is performed using two prominent word embedding algorithms, fastText and Keras embedding. Experiments demonstrated that the self-attention-based Conv-BiLSTM model outperformed other deep learning models and conventional machine learning methodologies.

The following are the key contributions of the research: (1) Two distinct word embedding approaches, fastText and Keras embedding, were used to render tweets as word vectors. Both strategies make use of pre-trained, supervised word vectors that are able to capture the semantics of individual words and are trained on a large corpus. The effectiveness of the proposed CoBiAt model is evaluated using these two word-vector models. (2) A ConvNet coupled with BiLSTM and the self-attention methodology is offered for classifying the reviews.
The ConvNet module collects local features via word embedding, the self-attention-based BiLSTM module extracts long-distance associations, and the selected features are then categorised to give the classification result. (3) The experimental results are compared with other popular deep learning-based techniques and standard machine learning approaches to prove the potential of the proposed optimised model.

There are several drawbacks in deep learning models such as CNN, RNN and LSTM. The main issue with CNN is that it does not clearly encode the orientation and position of content, whereas RNN suffers from gradient exploding and vanishing issues. The LSTM technique takes a longer time to train on the dataset, and implementing dropout in LSTM is a very tedious task. All these issues with deep learning methods motivated us to design an approach that overcomes them. Therefore, we performed rigorous experiments and propose a hybridized model that combines the strong features of CNN with BiLSTM using an efficient attention mechanism to further optimise the performance and accuracy of the classification provided by the model.

The structure of this research paper is as follows: The background and important research for text sentiment categorization using ConvNet and attention-based Dual-LSTM are discussed in Section 2. Section 3 describes in detail how the proposed model works. Section 4 describes the environment setup for executing the CoBiAt model. Section 5 discusses in depth the experimental outcomes of the research and a comparison with other models. Section 6 finishes with a conclusion and suggestions for future research.

2 Related work

Deep learning techniques have shown remarkable accomplishments in the area of natural language processing in recent years. Deep learning is a subfield of machine learning that seeks to represent high-level abstractions in the given data. This is accomplished through the use of model architectures with complex structures or those built of several nonlinear transformations [14]. The Convolutional Neural Network (CNN) learns complicated, high-dimensional, and non-linear mapping relationships by fully utilising the structure of the multi-layer perceptron. It has been frequently utilised and has produced good results in image identification and speech recognition applications [15]. CNN was proposed for use in natural language processing by [16], who also constructed a dynamic Convolutional Neural Network (DCNN) technique to analyse inputs of varying lengths. In [17], the authors developed a method for analysing opinions regarding health services. They amassed 2,026 tweets using Twitter hashtags to compile their dataset. They compared DNN and CNN with word2vec embeddings as two deep learning models; the CNN model was the most accurate. However, the model was trained on a fairly limited dataset, and neither model addressed the negation problem. In [18], five different combinations of LSTM models were utilised to analyse tweets. To train the models, they utilised both dynamic and static CBOW word embeddings. The results demonstrated that the integrated LSTM model trained using dynamic CBOW performed better than the other models. Many neural network-based techniques have recently been demonstrated to be effective in a number of sentiment analysis applications.
Given the attention-based mechanism's tremendous success in natural language processing, its application in sentiment analysis tasks has gained prominence. The paper [19] proposed using a deep attention mechanism to analyse user evaluations and produced better results than a recurrent neural network (RNN). The work in [20] suggested a position-aware bidirectional attention network (PBAN) based on a bidirectional gated recurrent unit (bi-GRU). To tackle this difficulty, the LSTM model [20] was suggested, which had the capacity to maintain sequence information and produced good results on several sequence modeling tasks. A powerful tool for capturing the relationship between context and aspect at the next level is the attention mechanism. The authors in [21] devised a multi-grained attention network (MGAN) that uses both fine-grained and coarse-grained attention mechanisms to collect information about how aspect and context interact with one another. Another study [22] used feed-forward networks and multi-head attention (MHA) to effectively extract the hidden representations of context and aspect embeddings. In the paper [23], movie reviews from IMDb were analysed for sentiment. Pre-processing steps were taken to remove characters, symbols, repeated words, and stop words; then CountVectorizer was used to extract features. The authors proposed a CNN model and compared it with several traditional models; the proposed CNN model achieved 99.33% accuracy on a dataset of 3,000 reviews labelled as either good or bad. Meng et al. [24] used a CNN to obtain a higher-level feature representation from the word-embedding layer. After a BiLSTM takes in local and global semantic information, an attention layer is used to focus on important term characteristics of aspects. Furthermore, Ma et al. created an attention-based LSTM that uses the common-sense knowledge of sentiment-related concepts in SenticNet to incorporate external knowledge [25]. In contrast to conventional machine learning classification algorithms, LSTM has demonstrated its effectiveness in achieving high classification accuracy [26]. The article [27] implemented a deep CNN and BiLSTM: the deep CNN operated on character-level embeddings to enrich the word embeddings' information, and a bidirectional LSTM was then used to classify the sentences according to their sentiment. The article emphasises data standardisation; to obtain high performance, the researchers created a tweet processor model to remove unnecessary terms from tweets. The research in [28] offers a hybrid of two Convolutional Neural Networks and a Bidirectional LSTM, a further variant of hybridized deep learning architecture for sentiment classification. Two CNN layers and a bidirectional LSTM layer are employed. Seven datasets were evaluated using three pre-trained word vectors, GloVe, Word2Vec, and fastText; Word2Vec was observed to be more efficient than the other two word vectors. In another study [29], the authors suggested a deep memory network for sentiment classification. The proposed method may simultaneously record both user and product information. This memory network is composed of inference-based memory components and a large long-term memory that also serves as a knowledge base. The model architecture has two parts.
Each document is represented by an LSTM, and its rating is predicted using a deep memory network with several levels (hops), all of which are content-based attention models. The researchers in [30] used a Dual-LSTM and Keras word embedding to classify traveller attitudes. Their proposed model performed word embedding using two distinct word embedding algorithms, Word2Vec and Keras embedding; Keras embedding provided better results than Word2Vec. The authors in [31] devised a hierarchy-based attention (HA) technique for capturing the hierarchical structure of documents at the sentence and text levels, where information of varying significance was given special treatment while generating document representations. Because most prior approaches focused only on localized content information and ignored global user preferences and product quality, [32] proposed a sentiment classification model using an attention method over product information for global users. For text sentiment analysis, [33] recommended combining CNN with three distinct attention mechanisms: LSTM attention, vector attention, and pooling attention. [34] used self-attention with a sparse technique for determining text emotion polarity by capturing the significance of each word. In another work, [35] looked at the challenge of classifying personality traits based on textual content. The authors used a hybrid CNN+LSTM model to classify text into different personality traits. Their research shows that CNN is a powerful method for selecting the best characteristics to improve prediction accuracy, while the LSTM model retains earlier context information, making it easier to use important context information from the start of a phrase. Their proposed model proved significantly superior to other models; however, the lack of an attention model was observed in their results. In innovative research in the field of bio-medical engineering [36], researchers presented a data analytical system for EEG utilising a multi-layer Gated Recurrent Unit (GRU) for anomaly identification. The proposed model consists of four stages, from data collection to model performance evaluation. Using a publicly accessible EEG dataset, the suggested model achieved accuracy of 96.91 percent, sensitivity of 97.95 percent, specificity of 96.16 percent, and an F1 score of 96.39 percent. In [37], Wang et al. demonstrated the concept of weak tagging for sentiment classification. The authors presented a BiLSTM emotion categorization model with multiple stages of training and an emotion classification model with weak-tagging data denoising. The model based on weak-tagging information denoising had the best classification performance of all the experimental groups, but it was also observed to be the most time-consuming. Deep learning has also shown promising results in the field of medical imaging. In [38], the authors demonstrated a model based on Transfer Learning (TL). Using a state-of-the-art CNN for fundus image processing, they customised TL for mild and multi-class DED (diabetic eye disease) cases; with fine-tuning and optimization, the researchers achieved 88.3% accuracy. In another article [39], a unique hybrid CNN-BiLSTM deep learning strategy for four-level facial pain recognition is proposed.
To estimate pain intensity satisfactorily, the fully connected layer of VGG-Face was optimized for this task by the addition of a fully connected layer, and the dimensions of the extracted features were reduced using PCA (Principal Component Analysis) to improve the algorithm's overall computational efficiency. The improved algorithm achieved an AUC of 98.4% and a test precision of 90%. Using convex hull and convolutional neural network algorithms, [40] proposes a model for fault detection in wireless sensor networks. The authors performed several experiments and found CNN with Naïve Bayes to be better and more efficient. The authors in [41] presented a model using blockchain with deep learning. They described all the fundamental ideas involved in the management and security of such data, and offered a novel solution to handle a hospital's big data utilising Deep Learning and Blockchain technology to ensure its safety.

Since RNN can preserve a sequence of information over time, it is a helpful supplement to CNN; nonetheless, RNN is severely affected by the gradient explosion problem described by [42]. Because of these issues, distance correlation in a sequence is difficult to train with RNN. Bi-LSTM is an RNN model that has reportedly shown promising outcomes in the analysis of sentiment in text. It contains two LSTMs to allow the network a better understanding of the contexts provided; the backward and forward LSTM layers grant access to the sequence's preceding and subsequent contexts. Text sentiment classification, on the other hand, uses vectors to represent the text, often in a high-dimensional space. When the Bi-LSTM retrieves relevant knowledge from the obtained features, it is unable to place a premium on the most important data [43]. Considering the above issues in deep learning methods for text classification, the CoBiAt model proposed here combines the strong features of CNN and Bi-LSTM with an attention model. The performance of the model indicates that it has the potential to tackle the above issues in deep learning models.

3 Proposed method

The proposed attention-based method has the following layers:
3.1 Input and Pre-Processing
3.2 Word Vector Matrix
3.3 ConvNet Layer
3.4 Dual-LSTM Layer
3.5 Attention Layer

3.1 Input and pre-processing layer

In this part, the words are standardized and cleansed by converting them from human language to a text format that eliminates superfluous elements. This phase assists classifiers in achieving high performance and rapid sentiment classification. In this work, the steps included tokenization of sentences into their words using NLTK (Natural Language Toolkit), a popular Python library; conversion of upper case to lower case; duplicate text removal; removal of special characters, multiple spaces, hashtags, URLs, mentions, punctuation and other stopwords; and lemmatization of the texts in the dataset.
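As a rough illustration of this pre-processing stage, a minimal Python sketch could look as follows (the helper name clean_tweet is hypothetical, and NLTK's punkt, stopwords, and wordnet resources are assumed to have been downloaded):

import re
import string
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()
stop_words = set(stopwords.words('english'))

def clean_tweet(text):
    # Lower-case, then strip URLs, mentions, hashtags and punctuation
    text = text.lower()
    text = re.sub(r'https?://\S+|www\.\S+', ' ', text)
    text = re.sub(r'[@#]\w+', ' ', text)
    text = text.translate(str.maketrans('', '', string.punctuation))
    # Collapse the multiple spaces left behind by the removals
    text = re.sub(r'\s+', ' ', text).strip()
    # Tokenize, drop stopwords, and lemmatize the remaining tokens
    return [lemmatizer.lemmatize(tok) for tok in word_tokenize(text)
            if tok not in stop_words]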
3.2 Word vector matrix layer

The Word Vector Matrix layer embeds the pre-processed input tokens. The token representation reveals hidden ties between words that often appear together. The input dataset was trained using the Keras skip-gram model. The text from the input dataset has $T$ words, and the vector of the $t$-th word is represented by $w_t$ with $t \in [1, T]$. Given the text as words $w_t$, the words are embedded into vectors using an embedding layer $W_p$. The vector representation $x_t$ of $w_t$ is given by Eq. (1):

$x_t = W_p w_t$  (1)

The model was trained with the skip-gram approach by optimising the average log-likelihood. The method trains semantic embeddings by predicting the target word based on context and detecting semantic links between words.

3.3 ConvNet layer

The ConvNet layer accomplishes the task of selecting features from the input data. ConvNet layers are employed with a view to reducing the number of dimensions in the source text. Several 1-dimensional ConvNet kernels are employed to conduct convolution over the input vectors in this study. Equation (2) creates a sequential vector of the text by concatenating the component vectors of the word embedding:

$X_{1:V} = [x_1, x_2, x_3, x_4, \cdots, x_V]$,  (2)

where $V$ represents the total number of tokens in the given text. Convolution kernels of different sizes are applied to $X_{1:V}$ to capture the n-gram characteristics (where n = 1, 2, 3) of the given text using a one-dimensional convolution that captures the intrinsic features. When a window of $r$ words spanning $v : v+r$ is input during the $v$-th convolution, the ConvNet process creates a feature set for that window as below:

$h_{r,v} = \tanh(W_r x_{v:v+r-1} + d_r)$,  (3)

where $x_{v:v+r-1}$ are the embedded vectors of the words in the given window, $W_r$ is a learnable weight matrix, and $d_r$ is the bias used in the embedding. Since every filter is applied to different parts of the text, the filter's feature map for convolution size $r$ is:

$h_r = [h_{r,1}, h_{r,2}, h_{r,3}, h_{r,4}, \cdots, h_{r,V-r+1}]$.  (4)

It is advantageous to utilise ConvNet kernels of varying sizes in order to capture hidden connections between adjacent words. The most essential benefit of using a CNN for text-based feature retrieval is that it minimises the number of learnable parameters through the max-pooling-based process of feature learning. Multiple convolution channels act on the input, with each channel holding data at distinct timestamps. Consequently, the output of each ConvNet channel during the max-pooling operation is the largest value over all timestamps for that channel. Max pooling is applied here to the feature maps of convolution size $r$ for each convolution kernel to produce:

$s_r = \max_v (h_{r,1}, h_{r,2}, h_{r,3}, h_{r,4}, \cdots, h_{r,V-r+1})$  (5)

To obtain the final feature map of the window, $s_r$ is combined for every filter size $r = 1, 2, 3$, extracting the hidden n-gram features (where n = 1, 2, 3):

$s = [s_1, s_2, s_3]$.  (6)
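As a concrete reading of Eqs. (2)–(6), a minimal Keras sketch of this multi-kernel convolution and pooling step might look as follows (the function name is hypothetical and the filter count of 32 is taken from Table 3; this is a stand-alone version, whereas in the full model the convolution output feeds the Dual-LSTM layer):

from tensorflow.keras import layers

def conv_ngram_features(embedded_seq, num_filters=32):
    # One 1-D convolution per kernel size r = 1, 2, 3 (Eqs. 3-4),
    # each followed by max pooling over all timestamps (Eq. 5)
    pooled = []
    for r in (1, 2, 3):
        h_r = layers.Conv1D(num_filters, kernel_size=r,
                            activation='tanh', padding='same')(embedded_seq)
        s_r = layers.GlobalMaxPooling1D()(h_r)
        pooled.append(s_r)
    # Concatenate the per-kernel maxima into the final feature map (Eq. 6)
    return layers.Concatenate()(pooled)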
3.4 BiLSTM layer

The Dual-LSTM layer receives attributes as input from the ConvNet layer and extracts the final hidden state to produce features. The prior and subsequent context information is accessible to the Dual-LSTM module, and the data collected by the BiLSTM can be seen in two distinct text-based directions. The ConvNet feature sets are input into the Dual-LSTM model, which generates a sequential representation. By aggregating information about words in both directions (forward and backward), the Dual-LSTM obtains word annotations that include contextual information. The forward LSTM, represented as $\overrightarrow{L_1}$, reads the sequence of features from first to last, whereas the reverse LSTM, represented as $\overleftarrow{L_2}$, reads from last to first. The word annotation received from $\overrightarrow{L_1}$ is represented by $L_{forward}$ and that from $\overleftarrow{L_2}$ by $L_{backward}$; the bidirectional process is iterated and the final feature representation output $L$ is obtained as follows:

$L = \{L_{forward}, L_{backward}\}$  (7)

3.5 Attention layer

The output representation from the Dual-LSTM layer is delivered to the attention layer, which assesses which features are highly interconnected and should be used for the final categorization. The attention mechanism is a fully connected layer with a softmax function that focuses on the qualities of selected words to reduce the impact of less significant words on the text's sentiment. The attention layer proceeds as follows. The word annotation $L_{forward}$ is initially supplied to a single perceptron to obtain $\overrightarrow{U}_{forward}$, an uncovered form of $L_{forward}$:

$\overrightarrow{U}_{forward} = \tanh(f \cdot L_{forward} + b)$  (8)

where the weight and bias of the neuron are represented by $f$ and $b$ respectively, and the hyperbolic tangent function is represented by $\tanh()$. The layer calculates the significance of each word based on the similarity between $\overrightarrow{U}_{forward}$ and a text-level context vector $\overrightarrow{C}_{forward}$ that measures the importance of each text. The softmax function is then utilized by the layer to calculate the normalized weight $\tilde{Z}_{fwd}$ of each word as follows:

$\tilde{Z}_{fwd} = \dfrac{\exp(\overrightarrow{U}_{forward} \cdot \overrightarrow{C}_{forward})}{\sum_{i=1}^{M} \exp(\overrightarrow{U}_{forward} \cdot \overrightarrow{C}_{forward})}$  (9)

Here, the total number of texts in a particular set of texts is denoted by $M$. The text-level context vector $\overrightarrow{C}_{forward}$ is a high-level representation of the descriptive words of the word sequences; it is initialized randomly and learned jointly during the training phase. The forward context representation $F_c$ is then produced as a weighted sum of the word annotations read in the forward direction, on the basis of the weight parameter $\tilde{Z}_{fwd}$. $F_c$ is one component of the attention layer's output and may be represented as:

$F_c = \sum (\tilde{Z}_{fwd} \cdot L_{forward})$  (10)

Similar to $\tilde{Z}_{fwd}$, $\tilde{Z}_{bwd}$ is calculated with the help of the backward-direction hidden state $L_{backward}$. $H_c$, like $F_c$, is the backward context representation that forms the other part of the attention layer's output, and it is represented as:

$H_c = \sum (\tilde{Z}_{bwd} \cdot L_{backward})$  (11)

The forward context representation $F_c$ is concatenated with the backward context representation $H_c$; the BiLSTM thus obtains an interpretation for a particular sequence of features and finally delivers the classification results. By this method, the attention layer improves the accuracy of prediction and decreases the size of the learned weights required for prediction.
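Putting Sections 3.2–3.5 together, a compact Keras sketch of the overall architecture could be written as below. This is a simplified reading rather than the authors' exact implementation: the WordAttention layer condenses Eqs. (8)–(11) into a single word-level attention over the Dual-LSTM outputs, and the vocabulary size, sequence length, and class count are illustrative assumptions (the embedding dimension, kernel size, filter size, and Dual-LSTM output size follow Table 3):

import tensorflow as tf
from tensorflow.keras import layers, models

class WordAttention(layers.Layer):
    # Single-perceptron attention: tanh projection (Eq. 8), softmax
    # weights against a learned context vector (Eq. 9), and a weighted
    # sum of the annotations (Eqs. 10-11).
    def build(self, input_shape):
        d = int(input_shape[-1])
        self.W = self.add_weight(name='W', shape=(d, d))
        self.b = self.add_weight(name='b', shape=(d,))
        self.context = self.add_weight(name='context', shape=(d, 1))

    def call(self, h):
        u = tf.tanh(tf.matmul(h, self.W) + self.b)
        z = tf.nn.softmax(tf.matmul(u, self.context), axis=1)
        return tf.reduce_sum(z * h, axis=1)

def build_cobiat(vocab_size=20000, seq_len=100, num_classes=3):
    inputs = layers.Input(shape=(seq_len,))
    x = layers.Embedding(vocab_size, 300)(inputs)        # word vector matrix
    x = layers.Conv1D(32, 5, padding='same',
                      activation='relu')(x)              # local features
    x = layers.MaxPooling1D(pool_size=2)(x)
    x = layers.Bidirectional(
        layers.LSTM(32, return_sequences=True))(x)       # Dual-LSTM
    x = WordAttention()(x)                               # attention layer
    outputs = layers.Dense(num_classes,
                           activation='softmax')(x)      # classification
    return models.Model(inputs, outputs)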
4 Experiments

4.1 Dataset

Experiments are carried out in this part to evaluate the performance of the proposed model for text categorization on two distinct datasets. Using Roop Ranjan et al. [30] as a starting point, we have 25,000 tweets from people who used Indian Railways services on various days in October 2019. Table 1 breaks these tweets down into three different categories: positive, neutral, and negative. The binary-labeled set of IMDB movie reviews is the second dataset used; it contains 50,000 reviews, categorized as shown in Table 2. The IMDB movie dataset was also utilised to compare the model to prior sentiment categorization research. It can be obtained from https://www.kaggle.com/datasets/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews/metadata.

Figure 1: Proposed architecture

Table 1: Categorization details of tweets dataset

  Size    Positive   Negative   Neutral
  25000   10695      8953       5352

Table 2: Categorization details of IMDB movie reviews dataset

  Size    Positive   Negative   Neutral
  50000   18324      20625      11051

Further, both datasets are divided into training, validation and testing sets using a Gaussian distribution, in the ratio 60:20:20 for training, validation and testing respectively.

4.2 Experimental setup

Because of its GPU support, Google Colab is used with Keras, with TensorFlow as the Keras backend. Computing-intensive machine learning methods can be trained in shorter amounts of time when running on GPUs. In a GPU context, greater computational power is available, allowing for more training iterations while fine-tuning the machine learning models.

4.3 Setting of hyper-parameters

High model performance can only be achieved if hyper-parameter tuning is implemented. The randomised search method is utilized to optimise the hyper-parameters and improve accuracy. Using random combinations of hyper-parameters, randomised search determines the optimal configuration for the model. Because grid search performs poorly when there are a large number of dimensions, random search is preferred over grid search. Table 3 lists the hyper-parameter values found by randomised search for the proposed model.

Table 3: Setting of hyper-parameters

  Parameter                  Value
  Dimension (Embedding)      Keras (300)
  Size of Kernel             5
  Output Size (Dual-LSTM)    32
  Filter Size                32
  Regularization Function    L2
  Activation                 SoftMax
  Weight Constraints         Kernel constraint (max norm 3)
  Epochs Count               100
  Batch Size                 32
  Batch Normalization        Yes
  Learning Rate (LR)         0.03
  Optimization               Adam
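A minimal sketch of this randomised search and the final training run, assuming the build_cobiat helper from the earlier sketch and pre-vectorised arrays x_train, y_train, x_val, y_val (the search space shown is an illustrative subset of the tunable parameters):

import random
from tensorflow.keras.optimizers import Adam

search_space = {'learning_rate': [0.001, 0.01, 0.03, 0.1],
                'batch_size': [16, 32, 64]}

best_acc, best_params = 0.0, None
for _ in range(10):
    # Sample one random combination of hyper-parameters per trial
    params = {k: random.choice(v) for k, v in search_space.items()}
    model = build_cobiat()
    model.compile(optimizer=Adam(learning_rate=params['learning_rate']),
                  loss='categorical_crossentropy', metrics=['accuracy'])
    history = model.fit(x_train, y_train, epochs=5,  # short search runs
                        batch_size=params['batch_size'],
                        validation_data=(x_val, y_val), verbose=0)
    val_acc = max(history.history['val_accuracy'])
    if val_acc > best_acc:
        best_acc, best_params = val_acc, params

# Retrain with the winning combination for the full 100 epochs of Table 3
final_model = build_cobiat()
final_model.compile(optimizer=Adam(learning_rate=best_params['learning_rate']),
                    loss='categorical_crossentropy', metrics=['accuracy'])
final_model.fit(x_train, y_train, epochs=100,
                batch_size=best_params['batch_size'],
                validation_data=(x_val, y_val))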
5 Results and discussion

5.1 Performance comparison

Using optimal hyper-parameter values, the proposed model was compared to CNN, LSTM, CNN-LSTM, and BiLSTM, which are all deep learning-based models. The IR dataset and the IMDB movie reviews dataset were used for the comparison. Figure 2a shows the results of comparing the proposed model's overall accuracy with other deep learning models using fastText embedding. On the IR dataset, CNN, LSTM, and BiLSTM all did better with fastText embedding than with Keras embedding; however, CNN-LSTM and CNN-BiLSTM with Keras embedding (Fig. 2b) gave more accurate results. This observation shows that the proposed model is much better than the other methods. For the IR dataset, the proposed model is 96.32 percent accurate with fastText and 95.98 percent accurate with Keras.

Figure 2: Accuracy of different models for the IR dataset

Figure 3: Accuracy of different models for the IMDB dataset

Figures 3a and 3b compare the overall accuracy of the proposed architecture with other deep learning methods using Keras embedding. The observations show that fastText embedding performs better than Keras embedding on the IMDB dataset, with an improvement of 1.12%.

5.2 Evaluation of performance

The proposed system's performance is assessed using the standard evaluation matrix illustrated in Figure 4.

Figure 4: Standard evaluation parameters

The standard validation parameters are described below:
• True Negative (TN) – accurately forecasted negative outcomes, where both the actual class and the anticipated class are negative, i.e. correct prediction of negative classes.
• True Positive (TP) – observed positives that are accurately predicted, where both the actual class and the anticipated class are positive, i.e. correct prediction of positive classes. False negatives and false positives occur when the actual class differs from the anticipated class.
• False Positive (FP) – the anticipated class is positive while the actual class is negative, i.e. incorrect prediction of positive classes.
• False Negative (FN) – the actual class is positive while the projected class is negative, i.e. incorrect prediction of negative classes.

Using these standard parameters, the following measures are used to evaluate the effectiveness of the proposed hybrid model:

Precision (P): the proportion of correctly anticipated positive outcomes to the total projected positive outcomes.

$P = \dfrac{TP}{TP + FP}$  (12)

Recall (R): the proportion of properly forecasted positive outcomes to all observations in the positive class.

$R = \dfrac{TP}{TP + FN}$  (13)

F-Measure (F): the harmonic mean of Recall and Precision; the resulting score takes into account both false negatives and false positives.

$F = \dfrac{2 \times (P \times R)}{(P + R)}$  (14)

Accuracy (A): the most important performance parameter; the proportion of correctly predicted observations to all observations.

$A = \dfrac{TP + TN}{TN + TP + FN + FP}$  (15)
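For instance, the four measures of Eqs. (12)–(15) can be computed directly from label lists in a few lines of Python (y_true and y_pred are hypothetical binary label arrays, with 1 denoting the positive class):

def classification_metrics(y_true, y_pred):
    # Count the four confusion-matrix cells
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp)                                   # Eq. 12
    recall = tp / (tp + fn)                                      # Eq. 13
    f_measure = 2 * (precision * recall) / (precision + recall)  # Eq. 14
    accuracy = (tp + tn) / (tp + tn + fp + fn)                   # Eq. 15
    return precision, recall, f_measure, accuracy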
Figure 5 illustrates the overall performance on the two independent datasets and the two distinct word embedding procedures using a variety of deep learning techniques. The suggested model outperformed the competing strategies for both the IR and IMDB datasets. The overall precision on the IR dataset with fastText embedding was observed to be 96.32%, which is 3.07% more than the CNN-BiLSTM model, and it outperformed the three other models by a large margin. The recall value improved by 2.12% over the nearest best performing model, BiLSTM. The overall performance of the proposed model showed an F-measure of 96.16% and an accuracy of 96.32% with fastText embedding. The proposed model showed somewhat lower performance when Keras word embedding was applied to the different deep learning techniques and the proposed model; even so, with Keras embedding it displays a promising improvement for the classification, with 96.01% precision, 96.32% recall, 96.16% F-measure and 95.98% accuracy. This was observed because CNN and LSTM lacked proper information about the forthcoming context of the network's huge corpus of words. For the IMDB dataset, the study showed that the proposed model improves on the other techniques. The proposed model with fastText shows an improvement of 3.03% over the CNN-BiLSTM model in positive class prediction; for recall the improvement is 3.06%, and for F-measure it is 2.56%. The performance of the proposed model over the other studied techniques with Keras embedding is also impressive, with 94.63% precision and 94.56% accuracy.

Figure 5: Performance evaluation on fastText and Keras word embedding

For the US Airlines dataset, the study showed that the proposed model improves on the other techniques. The proposed model with Word2Vec shows an improvement of 7.34% over the BiLSTM model in positive class prediction; for recall, the improvement is 7.96%, and for F-measure it is 7.65%. The performance of the proposed model over the other studied techniques with Keras embedding is also impressive, with an improvement of 9.39% in precision and 9.84% in accuracy.

Figure 6: Model performance using Word2Vec word embedding for the IR dataset

Figure 6 shows the performance on the different evaluation parameters using the proposed model with Word2Vec embedding on the IR dataset. Experiments were also performed with other deep learning models on the same dataset. Word2Vec effectively initializes word vectors for the IR dataset, as shown by the high level of correctness of the experimental outcomes. It is clear that the proposed self-attention-based classification model provides better results on all evaluation parameters compared to the other techniques with Word2Vec. The precision of CNN was observed at 83.16%, the LSTM model achieved a precision of 85.65%, and CNN-LSTM achieved 76.32%. The BiLSTM deep learning model performed much better than the other three, with a precision of 88.62% and 88.15% classification accuracy. The proposed model outperformed the other deep learning models with a precision of 95.96%. The other performance parameters were also evaluated for the different models; the proposed self-attention-based model showed an impressive improvement over the other deep learning models, with recall of 95.32%, F-measure of 95.64%, and accuracy of 95.35%. The accuracy of CNN alone in the study was merely 82.30%, while the accuracy of BiLSTM was 87.99% on the IR dataset, indicating that utilizing CNN and BiLSTM separately to conduct sentiment analysis did not yield useful results. When the features of BiLSTM and CNN were combined to optimize accuracy, the approach showed better efficiency than CNN and BiLSTM alone, with an accuracy of 91.60% on the IR dataset.

Figure 7: Model performance using Keras word embedding for the IR dataset

Different deep learning models were also implemented on the same IR dataset using Keras embedding. Figure 7 displays the model performance with this embedding. On each of the performance evaluation parameters, the proposed model showed significant improvement over the others. The overall accuracy of the proposed optimized model was 9.84% better than the BiLSTM method, and precision, recall, and F-measure were 9.39%, 11.18%, and 10.29% higher than the BiLSTM model. This clearly indicates that CNN and BiLSTM cannot offer great results on their own, since CNN cannot learn the correlation sequence for long-term dependencies and BiLSTM cannot extract local characteristics. When the combination of CNN and BiLSTM is merged with self-attention, the model is able to learn each word in the tweets more effectively, since it contains enough word context information based on previous and future context.
Figure 8: Model performance using Word2Vec word embedding for the US Airlines dataset

Since the proposed model performed much better on the IR dataset using the different word embeddings Word2Vec and Keras, further experiments were conducted to test the validity of the proposed model's performance. The proposed model was then implemented on the US Airlines dataset using both word embeddings that were used for the IR dataset. The overall performance of the proposed model using Word2Vec embedding is shown in Figure 8, and that using Keras word embedding is represented in Figure 9. On the US Airlines dataset, the performance of the given self-attention-based model is observed to be lower than on the IR dataset, but the proposed model still outperforms the other deep learning techniques, with 93.89% and 94.63% precision for Word2Vec and Keras embeddings respectively. The classification accuracy on the US Airlines dataset was also impressive and better than the other models, with 93.24% for Word2Vec and 94.56% for Keras embedding.

Figure 9: Model performance using Keras embedding for the US Airlines dataset

Figure 10: Accuracy of the proposed model over traditional models

Extensive experiments with classic learning techniques were performed on the US Airlines dataset to validate the effectiveness of the model for classification. Figure 10 demonstrates the accuracy levels of the models. The proposed model also outperformed the traditional models, with a very high level of accuracy compared to the others. Gaussian Naïve Bayes shows the weakest performance of all, with a classification accuracy of 66.60%. The Decision Tree method performed better than Gaussian Naïve Bayes, with an accuracy of 73.5%. The KNN, SVM, and Random Forest methods achieved accuracies of 74.01%, 80%, and 84.5% respectively. The proposed model achieved the best accuracy, 94.56%.

5.3 Performance comparison with other state-of-the-art models

The experimental findings were compared to earlier work in text sentiment classification methodologies to verify the model's performance.

Table 5: Experimental findings for sentiment classification accuracy in %

  Model                             Accuracy   Reported by
  CNN-BiLSTM                        90.66      Rhanou et al. [45]
  CNN-BiLSTM                        94.20      Zi-xian et al. [46]
  BiLSTM with Self-Attention        86.24      Jun Xie et al. [47]
  BiLSTM with Multi-Head Attention  92.11      Fei et al. [48]
  Conv-LSTM-Conv                    89.02      Ghorbani et al. [49]
  Text-CNN                          91.50      Chuantao et al. [50]
  CNN-BiLSTM with Keras             95.98      Proposed Model

Rhanou et al. [45] suggested a model that combines CNN with BiLSTM models using Doc2vec embedding for long-text emotion analysis. The composite neural network model presented in [46] is comprised of two parts: a convolutional neural network that extracts local features from the text vectors, and a BiLSTM that extracts globalized features linked to the context of the text, with a fusion of the attributes collected by the two complementary models. The sentences are automatically classified by the trained neural-network-based hybrid model. The results of the experiments reveal that the accuracy rate of text classification is 94.2%, with a total of 10 iterations.
For polarity classification of fine-grained sentiment in short texts, a self-attention-based BiLSTM model using aspect-term information is presented by Jun Xie et al. [47]. A word-encoding layer, a Dual-LSTM module, an attention-based module, and a softmax function module are the primary constituents of the model. The hidden-feature vectors and the aspect vectors are merged by insertion into the BiLSTM module, with the self-attention module reducing the computational complexity imposed by direct vector division. The model [47] achieved 86.24% accuracy, which is lower than that of the proposed model. The model described in [48] investigates sentiment analysis for Chinese text on social media by merging the Multi-Head Attention (MHAT) mechanism with BiLSTM networks to address the shortcomings of classic sentiment analysis. The researchers' goal was to add influence weights to the generated sequence of text, exploiting the MHAT mechanism's ability to learn important information from distinct representation subspaces utilising numerous dispersed computations. The model presented in [48] provided 92.11% accuracy. Ghorbani et al. [49] suggested a ConvNet with BiLSTM model that classifies features using CNN, learns context information using BiLSTM, and then reuses the results in a CNN to produce an abstract feature before applying the final dense layer. The model achieved a good accuracy of 89.02 percent. The proposed model, on the other hand, is simpler and requires less complexity analysis, yet it achieves an accuracy 6.96 percentage points greater than [49]. Chuantao et al. [50] presented a BiLSTM deep learning model with two weak-tagging stages. The suggested approach employed weak tagging to train the model, lowering the detrimental influence of noisy samples in the weak tagging on the sentiment model's classification performance and increasing the accuracy of the sentiment categorization approach, which achieves 91.50 percent accuracy. In comparison, the suggested model with Keras embeddings outperformed the model in [50]. Based on the comparison with these previous models, it is evident that the proposed model with Keras embeddings outperforms the other models and achieves much better accuracy.

6 Conclusion

The research was performed on two different datasets. Word2Vec and Keras word embedding methods were applied for training and evaluation of the model on both datasets. The proposed model integrated the features of CNN and BiLSTM with the self-attention mechanism. The ConvNet collects text characteristics and passes text context information to the BiLSTM. The attention mechanism improved the classification accuracy, as it extracted the context of the sentence more accurately. Hyper-parameter tuning was performed to optimize the model. As a result, the proposed model performed classification with improved accuracy and efficiency. The model improved accuracy by 7.20% using Word2Vec embedding on the IR dataset and by 9.84% using Keras embedding. The model also performed effectively on the US Airlines dataset, with 8.77% higher accuracy with Word2Vec embedding and 8.03% higher accuracy with Keras embedding. The proposed model outperformed the other traditional models on the US Airlines dataset.
References

[1] Liu B, Sentiment Analysis and Opinion Mining (Synthesis Lectures on Human Language Technologies), vol. 5, no. 1. San Rafael, CA, USA: Morgan & Claypool, 2012, pp. 1–167. Accessed: Nov. 1, 2020, doi: 10.2200/S00416ED1V01Y201204HLT016.
[2] Tang HF, Tan SB and Cheng XQ, Research on sentiment classification of Chinese reviews based on supervised machine learning techniques. Chin. Inf. Process., vol. 21, no. 6, pp. 88–126, 2007.
[3] Liu Y, Bi JW and Fan ZP, A method for multi-class sentiment classification based on an improved one-vs-one (OVO) strategy and the support vector machine (SVM) algorithm. Inf. Sci., vols. 394–395, pp. 38–52, Jul. 2017.
[4] Zhang J and Zong C, Deep neural networks in machine translation: An overview. IEEE Intell. Syst., vol. 30, no. 5, pp. 16–25, Sep./Oct. 2015.
[5] Yin W, Schütze H, Xiang B, and Zhou B, ABCNN: Attention-based convolutional neural network for modeling sentence pairs. Trans. Assoc. Comput. Linguistics, vol. 4, pp. 259–272, Dec. 2016.
[6] Ansari A, Maknojia M, and Shaikh A, Intelligent question answering system based on artificial neural network, in Proc. IEEE ICETECH, Coimbatore, India, Mar. 2016, pp. 758–763.
[7] Collobert R, Weston J, Bottou L, Karlen M, Kavukcuoglu K, and Kuksa P, Natural language processing from scratch. J. Mach. Learn. Res., vol. 12, pp. 2493–2537, Aug. 2011.
[8] Levy O, Goldberg Y, and Dagan I, Improving distributional similarity with lessons learned from word embeddings. Trans. Assoc. Comput. Linguistics, vol. 3, pp. 211–225, May 2015.
[9] Liu G and Guo J, Bidirectional LSTM with attention mechanism and convolutional layer for text classification. Neurocomputing, vol. 337, pp. 325–338, Apr. 2019.
[10] Bahdanau D, Cho K, and Bengio Y, Neural machine translation by jointly learning to align and translate. 2014, arXiv:1409.0473. [Online]. Available: https://arxiv.org/abs/1409.0473
[11] Rush A M, Chopra S, and Weston J, A neural attention model for abstractive sentence summarization. 2015, arXiv:1509.00685. [Online]. Available: https://arxiv.org/abs/1509.00685
[12] Hermann K M, Kocisky T, Grefenstette E, Espeholt L, Kay W, Suleyman M, and Blunsom P, Teaching machines to read and comprehend, in Proc. Adv. Neural Inf. Process. Syst., 2015, pp. 1684–1692.
[13] Ranjan R and Daniel A K (2021), Intelligent Sentiments Information Systems Using Fuzzy Logic, in Information and Communication Technology for Intelligent Systems. ICTIS 2020. Smart Innovation, Systems and Technologies, vol. 195. Springer, Singapore. https://doi.org/10.1007/978-981-157078-0_55
[14] Jiang L, Yu M, Zhou M, Liu X, and Zhao T, Target-dependent Twitter sentiment classification, in Proc. ACL, 2011, pp. 151–160.
[15] Perez-Rosas V, Banea C, and Mihalcea R, Learning sentiment lexicons in Spanish. LREC, vol. 12, p. 73, May 2012.
[16] Zhang M, Zhang Y, and Vo D T, Gated neural networks for targeted sentiment analysis, in Proc. AAAI Conf. Artif. Intell., 2016, pp. 3087–3093.
[17] Tang D, Qin B, and Liu T, Aspect level sentiment classification with deep memory network, in Proc. Conf. Empirical Methods Natural Lang. Process., 2016, pp. 214–224.
[18] Tang D, Qin B, Feng X, and Liu T, Effective LSTMs for target-dependent sentiment classification, in Proc. 26th Int. Conf. Comput. Linguistics, Tech. Papers (COLING), 2016, pp. 3298–3307.
[19] Ren Y, Zhang Y, Zhang M, and Ji D, Context-sensitive Twitter sentiment classification using neural network, in Proc. AAAI, Phoenix, AZ, USA, Feb. 2016, pp. 215–221.
[20] Rosenthal S, Farra N, and Nakov P, SemEval-2017 task 4: Sentiment analysis in Twitter, in Proc. SemEval, Vancouver, BC, Canada, Aug. 2017, pp. 502–518.
[21] Fan F, Feng Y, and Zhao D, Multi-grained attention network for aspect-level sentiment classification, in Proc. Conf. Empirical Methods Natural Lang. Process., 2018, pp. 3422–3433.
[22] Zhang Q and Lu R, A multi-attention network for aspect-level sentiment analysis. Future Internet, vol. 11, no. 7, p. 157, Jul. 2019.
[23] Xu Q, Zhu L, Dai T, and Yan C, Aspect-based sentiment classification with multi-attention network. Neurocomputing, vol. 388, pp. 135–143, May 2020.
[24] Meng W, Wei Y, Liu P, Zhu Z, and Yin H, Aspect based sentiment analysis with feature enhanced attention CNN-BiLSTM. IEEE Access, vol. 7, pp. 167240–167249, 2019.
[25] Park H J, Song M, and Shin K S, Deep learning models and datasets for aspect term sentiment classification: Implementing holistic recurrent attention on target-dependent memories. Knowl.-Based Syst., vol. 187, Jan. 2020, Art. no. 104825.
[26] Lin Z, Feng M, Santos C N D, Yu M, Xiang B, Zhou B, and Bengio Y, A structured self-attentive sentence embedding. 2017, arXiv:1703.03130.
[27] Chen H, Sun M, Tu C, Lin Y, and Liu Z, Neural sentiment classification with user and product attention, in Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, TX, USA, 2016, pp. 1650–1659.
[28] Fu X, Yang J, Li J, Fang M, and Wang H, Lexicon-enhanced LSTM with attention for general sentiment analysis. IEEE Access, vol. 6, pp. 71884–71891, 2018.
[29] Dou Z Y, Capturing user and product information for document level sentiment analysis with deep memory network, in Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark, 2017, pp. 521–526.
[30] Ranjan R and Daniel A K, A deep learning model for extracting consumer sentiments using recurrent neural network techniques. Int. Jour. of Com. Sci. and Net. Sec., vol. 21, no. 8, pp. 238–246, 2021.
[31] Yang Z, Yang D, Dyer C, He X, Smola A, and Hovy E, Hierarchical attention networks for document classification, in Proc. Conf. North Amer. Chapter Assoc. Comput. Linguistics, Hum. Lang. Technol., 2016, pp. 1480–1489.
[32] Chen H, Sun M, Tu C, Lin Y, and Liu Z, Neural sentiment classification with user and product attention, in Proc. Conf. Empirical Methods Natural Lang. Process., 2016, pp. 1650–1659.
[33] Zhang Z, Zou Y, and Gan C, Textual sentiment analysis via three different attention convolutional neural networks and cross-modality consistent regression. Neurocomputing, vol. 275, pp. 1407–1415, Jan. 2018.
[34] Deng D, Jing L, Yu J, and Su S, Sparse self-attention LSTM for sentiment lexicon construction. IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 27, no. 11, pp. 1777–1790, Nov. 2019.
[35] Ahmad H, Asghar M U, Asghar M Z, Khan A, and Mosavi A H, A hybrid deep learning technique for personality trait classification from text. IEEE Access, vol. 9, pp. 146214–146232, 2021, doi: 10.1109/ACCESS.2021.3121791.
[36] Cheng J, Sadiq M, Kalugina O A, Nafees S A, and Umer Q, Convolutional neural network based approval prediction of enhancement reports. IEEE Access, vol. 9, pp. 122412–122424, 2021, doi: 10.1109/ACCESS.2021.3108624.
[37] Wang C, Yang X, and Ding L, Deep learning sentiment classification based on weak tagging information. IEEE Access, vol. 9, pp. 66509–66518, 2021, doi: 10.1109/ACCESS.2021.3077059.
[38] Xu F, Pan Z, and Xia R, E-commerce product review sentiment classification based on a naïve Bayes continuous learning framework. Inf. Process. Manage., vol. 57, no. 5, Sep. 2020, Art. no. 102221.
[39] Preethi G, Krishna P V, Obaidat M S, Saritha V, and Yenduri S, Application of deep learning to sentiment analysis for recommender system on cloud, in International Conference on Computer, Information and Telecommunication Systems, 2017, pp. 93–97.
[40] Peters M E, Neumann M, Iyyer M, Gardner M, Clark C, Lee K, and Zettlemoyer L, Deep contextualized word representations. J. Assoc. Comput. Linguistics, vol. 1, pp. 2227–2237, Mar. 2018.
[41] LeCun Y, Bottou L, Bengio Y, and Haffner P, Gradient-based learning applied to document recognition. Proc. IEEE, vol. 86, no. 11, pp. 2278–2324, Nov. 1998.
[43] Xu G, Meng Y, Qiu X, Yu Z, and Wu X, Sentiment analysis of comment texts based on BiLSTM. IEEE Access, vol. 7, pp. 51522–51532, 2019.
[44] Zhou F Y, Jin L P, and Dong J, Review of convolutional neural network. Chin. J. Comput., vol. 1, pp. 35–38, Jan. 2017.
[45] Rhanou M, Mikram M, Yousfi S, and Barzali S, A CNN-BiLSTM model for document-level sentiment analysis. Mach. Learn. Knowl. Extr., vol. 1, pp. 832–847, 2019, doi: 10.3390/make1030048.
[46] Liu Z, Zhang D, Luo G, Lian M, and Liu B, A new method of emotional analysis based on CNN–BiLSTM hybrid neural network. Cluster Computing, 2019, https://doi.org/10.1007/s10586-020-03055-94
[47] Xie J, Chen B, Gu X, Liang F, and Xu X, Self-attention-based BiLSTM model for short text fine-grained sentiment classification. IEEE Access, 2019, doi: 10.1109/ACCESS.2019.2957510.
[48] Long F, Zhou K, and Ou W, Sentiment analysis of text based on bidirectional LSTM with multi-head attention. IEEE Access, 2019, doi: 10.1109/ACCESS.2019.2942614.
[49] Ghorbani M, Bahaghighat M, Xin Q, and Özen F, ConvLSTMConv network: A deep learning approach for sentiment analysis in cloud computing. J. Cloud Comput., vol. 9, no. 1, pp. 9–16, Dec. 2020, doi: 10.1186/s13677-020-00162-1.
[50] Wang C, Yang X, and Ding L, Deep learning sentiment classification based on weak tagging information. IEEE Access, 2021, doi: 10.1109/ACCESS.2021.3077059.