https://doi.org/10.31449/inf.v43i3.2615 Informatica 43 (2019) 381–386 381 Twitter-based Opinion Mining for Flight Service Utilizing Machine Learning Prayag Tiwari Department of Information Engineering, University of Padova, Italy E-mail: prayagforms@gmail.com Hari Mohan Pandey Department of Computer Science, Edge Hill University, Ormskirk, UK E-mail: pandeyh@edgehill.ac.uk Aditya Khamparia School of Computer Science and Engineering, Lovely Professional University, Phagwara, India E-mail: aditya.khamparia88@gmail.com Sachin Kumar Department of System Programming, South Ural State University, Chelyabinsk, Russia E-mail: sachinagnihotri16@gmail.com Keywords: sentiment analysis, random forest, logistic regression Received: December 12, 2018 Twitter is one of the most prominent social networking platforms so far. Millions of users utilize Twitter to share their thoughts and views on various topics of interest every day resulting a huge amount of data. This data could be considered to have a rich source of useful information hidden inside. Using machine learning to this data may give rise to effective recommender frameworks for individuals to manage their lives in a much more convenient way. In this paper, we propose a machine learning approach to classify the passenger’s tweets regarding the airplane services to understand the pattern of emotions. We adopt Random Forest (RF) and Logistic Regression (LR) to classify each tweet into positive, negative and neutral sentiment. The evaluation of the collected real data demonstrates that these two methods are able to achieve an accuracy ≈80%. Povzetek: Z metodami strojnega učenja so analiziranji tviti (čivki) letalskih potnikov o letalskih storitvah. 1 Introduction At present, large scale companies are investing plenty of time, resources and energy to enhance the consumer’s loyalty. It may explore more opportunities for the interaction between companies and consumers to get their feedback and suggestion about the products and services with an aspect of customer satisfaction and product quality improvement. This may increase the both the economic and social development of the company. A crucial but challenging step is to automatically analyze the customer feedback by extracting useful information from the huge data of customer feedbacks [1]. Customer feedback data is very important in addressing several issues and sentiment and opinion analysis is one of the important issue among them. Extracted patterns from the data may be utilized by company experts to understand the polarity of the opinion towards different products and services. In general the polarity of opinion may be positive, negative or neutral. Companies may use these polarity of opinions in order to improve their quality of products and/or services. Sentiment analysis/opinion mining assists in answering different question about products and services by understanding the emotions in the feedbacks [2]. Present world is utilizing the natural language processing (NLP) and text classification techniques to map the sentiments within the text into positive, negative and neutral classes [3]. The sentiments can be seen as an indirect publicity of a company’s products and services in the world that provide a direct impact on other customer’s. For travelers, the most popular and convenient platform for sharing their opinion is Twitter [4]. Each travel journey on different carriers may bring different comfort levels i.e. good, average or poor level of comfort. These comfort levels are conveyed to the social media i.e. Twitter etc. by the travelers in terms of tweets. If a traveler enjoyed the trip, the respective tweet would demonstrate the happiness or positive emotions towards the travel carrier otherwise negative emotions may be conveyed. Figure 1 depicts a furious tweet by a passenger on British Airways flight. As a result, the company considered it very urgent and important and settled the issue at the earliest. In another scenario (Figure 2), a sarcasm tweet for Indigo Airways 382 Informatica 43 (2019) 381–386 P. Tiwari et al. was fired because the baggage of passenger was transferred to a different location (Hyderabad) other than the traveler (Calcutta). The tweet in figure 2 seem to be negative from a human perspective whereas it is difficult to put this into negative class for the machine because of the complex words used in the tweet. Also, tweets on tweets may not contain more than 140 characters at once. Therefore, it is useless to expect the detailed information inside the tweet. However, a general understanding about polarity of emotions can be developed using machine learning methods. Further, tweets in the categories may be analyzed to get insights or possible reasons for these sentiments [5]. Every day more than a million of people are travelling around the world and tweeting their views with respect to the journey. It results in a huge amount of data available for analysis every day. Hence, machine learning techniques can be considered as a solution for such analysis. Machine learning techniques are efficient to handle huge data with large dimensions [6, 7]. Figure 1: A negative tweet illustrating loss of luggage. Figure 2: A tweet illustrating wrong transfer of luggage in sarcastic way. The main motivation behind this work is to provide a better analysis for classification of sentiment from the tweet data in order to assist the airline companies to improve customer satisfaction and improve the quality of service. The organization of the paper is as follows: Section 2 provides the state of art literature review. In section 3, proposed work is discussed. Section 4 presents the experimental results and discussion which is followed by conclusion in section 5. 2 Literature review Kusen et al. [8] analyzed a twitter data set consisting of 343645 tweets about 2016 Austrian presidential election. This analysis amalgamated approaches from sentiment analysis, network science, and bot detection. It was shown that the immediate relationship between the winners of the 2016 Austrian presidential races was more famous and had a high impact on Twitter than other rivals. Ahmed et al. [9] have demonstrated how the first time twitter utilized as a campaign tool in the Indian election 2014 by different parties. They demonstrated computer- aided and multi-level manual analysis of 98363 tweet messages by 11 parties during the campaign. It had a high impact on twitter of winning party than other parties. Stigleitz et al. [10] examined whether opinion persisting in online networking content is related to a client's data sharing coordination. They conducted an examination with regards to political correspondence on Twitter. On the basis of two dataset collections of about 165,000 tweets altogether, they found out that candidly charged Twitter messages had a tendency to be retweeted all the more regularly and more immediately contrasted with biased ones. As a general suggestion, organizations should give careful consideration to the examination of opinion identified with their brands and items in social networking correspondence, in addition to planning promoting content that triggers emotions. Gunarathne et al. [11] investigated the objection resolution experience of passengers of U.S. aircraft, by utilizing an interesting data collection amalgamating both customers– brand cooperation’s on Twitter and how clients felt toward the end of these associations. They located that objection Customer who is more dominant in online networking communities will probably be fulfilled. Customers who have beforehand objection to the brand via social networking media and customers who grumble about process-related instead of result related issues are less inclined to feel better at last. To the best of our insight, this examination is the first to recognize the key factors that shape client sentiments toward their brand– client communications via social networking media. Their outcomes give useful direction to effectively settling clients' objection using social networking field that expects exponential development in the coming decade. Seunghyun et al. [12] showed social networking examination utilizing Twitter data alluding to cruise travel. This examination likewise incorporated an inside and out an investigation on tweets by three kinds of group users: private, commercial and blogs. The outcomes demonstrated that not exclusively were words identified with industry, travel, emotions, and destination most often utilized as a part of organizing tweets, but also proficient bloggers, cruise lines, celebrities and travel organizations really drove significant subgroups on cruise themes on Twitter. On the basis of such outcomes, this examination gives attainable marketing approach. 3 Proposed work In this section, our proposed model consists of several steps like preprocessing, feature extraction etc. in order to train the model and use the test dataset to check the evaluation metric on the test dataset. Precision, F1- measure, and Recall are used as an evaluation metric. 3.1 System architecture Proposed architecture can be seen in the figure no. 3 that how flow started of our model from the dataset, text pre- processing, feature extraction, a division of dataset into training and testing set, the trained model then tested on the test dataset. Twitter-based Opinion Mining for Flight Service... Informatica 43 (2019) 381–386 383 3.2 Text preprocessing As a pre-processing step, we do a basic statistical analysis on the collected data. The statistics include the number of words (denoted as word_counts), the number of hashtags (denoted as hashtag_counts), and counts for other punctuation marks. Figure 3: Architecture of Proposed Sentiment Analysis Model. The distribution of those textual variables over the three sentiment classes is shown in Fig 4. Figure 4: Distribution of Class Labels We then remove the hashtags, mentions, URLs etc= to make text data more clean for further analysis. We also removed punctuations, stop words and digits. Finally, we stem words and convert them to lowercase. This is the standard procedure for pre-processing textual data. The examples of tweets after pre-processing can be seen in Figure 5. Figure 5: A sample of preprocessed tweets. 3.3 Random forest Decision Trees are the most widely used machine learning methods. Random Forest provides an effective way of averaging several decision trees, trained in different segments of the same training dataset with the aim to deteriorate the variance and provide a stable and accurate prediction. Random forest could be an ensemble learning procedure for regression, classification, and elective undertakings, which is achieved by building a large group of decision trees at training phase and provoking the classes which are the model for the mean prediction (regression) or classifications (classes) of the distinctive trees. In a distinct computation, classification is implemented recursively until every leaf is pure. The aim is to dynamically predict the best decision tree until it catches up the adaptability, precision, and balance. There are three measures to split the node are shown in Eq. 1-3. Entropy = ∑ 𝑃 𝑗 log 2 𝑃 𝑗 𝑗 (1) Gini=1-∑ 𝑃 𝑗 2 𝑗 (2) Classification Error= 1-max P j (3) Where P j is the probability of class j. The algorithms starts as follows: we pick a bootstrap observation from the S in which S (i) represents the i th bootstraps for every tree in the given forest. Then train the decision tree utilizing a revised decision tree algorithm. The revised decision tree algorithms as follows: in contrast of analyzing all feasible feature split, some random features f ⊆ F, at every node of the tree where F is the feature sets. The given node split on the top features in f comparably than selecting F. In this, f is much more compact and smaller than F. The most challenging task is to choosing on which feature to split in the decision tree learning that is why making narrow the feature set makes faster learning. The pseudocode is given as follows: 3.4 Logistic regression Logistic Regression is a statistical method for investigating a dataset in which there are at least one or more than one independent variables that decide a result. The result is estimated with a dichotomous variable (in which there are just two conceivable results). The objective of logistic regression is to locate the best fitting model to depict the connection between the dichotomous feature and the set of independent factors. Our Hypothesis function can be written like as given below, Y = W T X (4) 384 Informatica 43 (2019) 381–386 P. Tiwari et al. A sigmoid function is implemented across the notable hypothesis function to keep into the range of (0, 1). The sigmoid function can be described as, sg(y) = 1/(1+𝑒 −𝑦 ) (5) So our new hypothesis is sg(y) =sg(W T X) =1/ (1+e -WTX ) (6) Boundary Estimation: Our new hypothesis function provides us the values in between 0 and 1so it can be clarified probability of y would be 1 for given X and this can be written in this form, sg(y) = P(y = 1|x, W) (7) Cost Function: Taking a square error function does not work from the transformed hypothesis function so we make a new form of cost function which is as follows: E(sg( W, x), y) = -log(1-sg(W, x)) if y = 0 E(sg( W, x), y) = -log(sg(W, x)) if y = 1 Therefore, the mean of cost function will be as follows, H(W) = 1 𝑚 ∑ 𝐸 (𝑠𝑔 ( 𝑊 , 𝑥 𝑖 ), 𝑦𝑖 ) 𝑚 𝑖 =1 (8) Parameter Estimation: We utilize an iterative approach known as Gradient Descent to enhance the parameters across every step and reduce the cost function to the most feasible value. Gradient Descent requires a convex cost function to avoid getting stuck in a local minimum at the optimization stage. We begin with irregular parameter values and update their values at every stage to reduce the cost function to some extent until we reach the lowest point or equivalently there are not any changes to the value of the target function. The gradient descent step is as follows, β (i+1) = β i – p 𝛿𝐻 (𝑊 ) 𝛿𝛽 𝑖 (9) For every i =1, 2, 3…, n and p is the learning rate controlling the speed that it moves across the slope on the curve to reduce the cost function. Above process can be shown in the pseudocode for logistic regression with L1 regularization. The procedure starts with providing input dataset D with corresponding labels and iteration numbers. In this, w h is the temporary variable. Our algorithm start working as mentioned in the pseudocode. 3.5 Evaluation metric In order to measure the accuracy of classification [13], we used different parameters such as Recall, Precision, and F- measure [12]. Recall can be regarded as the measure of completeness whereas Precision can be seen as a measure of exactness. Formally, precision can be defined as the ratio of correctly classified instances of one class and a total number of instances classified in the same class, whereas recall is the ratio of correctly classified instances of one class and overall instances of the same class. Both precision and Recall can be calculated using the confusion matrix. Confusion matrix represents the number of correctly classified and incorrectly classified instances of all classes. Using the confusion matrix, all performance evaluation measures can be calculated. For a twitter dataset with a binary classification problem, if the total 600 tweets are classified to one class, among which 500 of them are correctly classified, and the total number of tweets in this class are 700. Then, the precision of the classifier is 500/600= 83.3%, and the recall of the classifier is 500/700=71.4%. The Recall and Precision are integrated to develop a new measure known as F-measure or F-score. The formula to calculate F-measure is given in Equation 12. Precision= 𝑇𝑃 𝑇𝑃 +𝐹𝑃 (10) Recall= 𝑇𝑃 𝑇𝑃 +𝐹𝑁 (11) F-measure=2( 𝑃𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛 ∗𝑅𝑒𝑐𝑎𝑙𝑙 𝑃𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛 +𝑅𝑒𝑐𝑎𝑙𝑙 ) (12) Where TP is True Positive, TN is True Negative, FN is False Negative and FP is False Positive. 4 Experiments 4.1 Data preparation In this study, we experiment on the US Airlines 2016, which contains 14500 passenger tweets. Since the number of original features is too large, we manually select the textual based features, because are easily accessed by passengers. As can be seen from Figure 6, the class labels are highly unbalanced. The dataset is available for public use [14]. After the preprocessing step, we identified the top 30 frequent words in the dataset, which is shown in Figure 6. 4.2 Experimental analysis For further evaluation, it is necessary to have test data that could be helpful to evaluate several measures of our model. Data was divided into 70 percent train and 30 percent test set Text count variable has been combined with cleaned data to create a data frame. Twitter-based Opinion Mining for Flight Service... Informatica 43 (2019) 381–386 385 For opting better parameters, it is needed to assess on a different validation from training. By utilizing just a single validation set one might not deliver reliable validation result. To get a more precise estimation, cross- validation is performed. In this study, we conduct k-fold validation on the data at hand and utilize GridSearchCV to search for the best- performed parameter combination. We select precision as the metric for optimization for both logistic regression and Random Forest classifiers. In order for bag-of-word features to be properly fed into classifiers, we use CountVectorizer to transform words into vectors. The word cloud in Figure 10 gives a decent visual depiction of the word recurrence for each kind of opinion, in which the left one corresponds to the positive opinion and the right one the negative. The span of the word relates to its recurrence across all tweets. This figure gives us a rough idea of what passengers are discussing. For instance, for negative opinion, passengers appear to gripe about delayed of flight, cancellation of flights, the low-quality of the flight service, the hours holding up and etc. Be that as it may, for positive opinion, passengers are thankful and they discuss extraordinary administration/flight. A cloud of the word has been mentioned in Figure 10 to visualize those positive and negative tweets more properly. Several other approaches have been used but Logistic Regression and Random Forest gave better result on train and test dataset. The main advantage of using Random Forest for text classification is that it ensemble multiple and different kinds of decision trees and utilize an assortment of the different trees to improve the result of the model. 4.3 Results and discussion Our proposed model provided this result on the test dataset. As it can be seen that in the case of positive, negative or neutral categories, our proposed model can classify with high precision, recall and f-measures. After applying logistic regression and random forest on the dataset, the performance values are recorded in table 1 and table 2. As from above tables, it can be seen that both classifiers performed very well, but Random Forest works better as compared to logistic regression, with a consistent higher value in Precision, Recall, and F-score than logistic regression. The 82 % accuracy value on the test data is superior to our predefined target, which is to the maximum value we can achieve by setting the prediction labels for all samples to be the dominant class. The precision is also high for all the three classes and the recall rate is relatively low for the neutral classes. For better illustrating the effectiveness of our proposed models, we also present examples of some negative and positive tweets classified by our proposed approaches. Model-predicted accurately like Negative, Negative in the first column and Positive, Positive for the second column based on the test set. Sentiment Class Precision Recall F1-Score Positive 0.80 0.74 0.77 Negative 0.73 0.53 0.62 Neutral 0.83 0.93 0.88 Table 1: Evaluation Metric of Logistic Regression. Sentiment Class Precision Recall F1-Score Positive 0.82 0.74 0.78 Negative 0.75 0.60 0.65 Neutral 0.84 0.95 0.90 Table 2: Evaluation Metric of Random Forest. Figure 6: Top 30 most frequent words. Figure 7: Distribution of Text Variables. 386 Informatica 43 (2019) 381–386 P. Tiwari et al. 5 Conclusion and future scope This study tackles the sentiment classification problem by utilizing two machine learning models. On the collected data, we achieve an accuracy of 82%. This study has impacts on the aviation industry in that it provides an effective and efficient way to monitor the passengers’ sentiments for aviation companies to improve their service. For future work, we would like to conduct a deeper analysis of the data and extract more useful information for providing recommendations for several airplane organization and passengers. It would be also used to use a bigger dataset than the used dataset because a larger dataset may provide some better result than used one. The author would like to use also deep learning models and especially focus on how to identify the sarcasm because there are several sentences seems positive but their meaning is negative so this is a really big issue to sort out and at present, existing models are not efficient to sort it out effectively. 5.1 Acknowledgement Prayag Tiwari has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No 721321. Sachin Kumar has received financial support by the Ministry of Education and Science of Russian Federation (Government Order 2.7905.2017/8.9). 6 References [1] Kumar S and M Nezhurina. An ensemble classification approach for prediction of user’s next location based on Twitter data. Journal of Ambient Intelligence and Humanized Computing. 2018. https://doi.org/10.1007/s12652-018-1134-3. [2] Kumar S and M Zymbler. A machine learning approach to analyze customer satisfaction from airline tweets. Journal of Big Data, 6(1):62, 2019. DOI: 10.1186/s40537-019-0224-1 [3] Yee L and P Tan. Gaining customer knowledge in low cost airlines through text mining. Industrial Management & Data Systems, 114(9): 1344-1359, 2014. https://doi.org/10.1108/IMDS-07-2014-0225 [4] Twitter: www.twitter.com access on 11.02.2019 [5] Zhang L, Y Sun and T Luo. A framework for evaluating customer satisfaction. In Proc: International Conference on Software, Knowledge, Information Management and Applications (SKIMA), IEEE, Chengdu, China, 15-17 December, 2016. https://doi.org/10.1007/s11263-007-0056-x. [6] Jaiswal AK, P Tiwari, S Kumar, D Gupta, A Khanna, JJPC Rodrigues. Identifying pneumonia in chest x- rays: a deep learning approach. Measurement, 145: 511-518, 2019. https://doi.org/10.1016/j.measurement.2019.05.076 [7] Tiwari P and M Melucci. Towards a Quantum Inspired Binary Classifier. IEEE Access, 7:42354- 42372, 2019. DOI: 10.1109/ACCESS.2019.2904624 [8] Kusen E and M Strembech. An analysis of tweeter discussion on the 2016 Austrian presidential election, arXiv preprint arXiv: 1707.09939, 2017. [9] Ahmed S, K Jaidka and J Cho. The 2014 Indian elections on Twitter: a comparison of campaign strategies of political parties. Telematics and Informatics, 33 (4):1071-1087, 2016. https://doi.org/10.1016/j.tele.2016.03.002 [10] Stieglitz S and L Dang-Xuan. Emotions and information diffusion in social media - sentiment of microblogs and sharing behavior. Journal of management information systems, 29 (4):217-248, 2013. https://doi.org/10.2753/MIS0742-1222290408 [11] Gunarathne P, H Rui and A Seidmann. Whose and what social media complaints have happier resolutions? Evidence from Twitter. Journal of Management Information Systems 34 (2):314-340, 2017. https://doi.org/10.1080/07421222.2017.1334465 [12] Seunghyun BP, C Ok, B Chae. Using Twitter Data for Cruise Tourism Marketing and Research. Journal of Travel & Tourism Marketing, 33(6):885-898, 2016. https://doi.org/10.1080/10548408.2015.1071688 [13] Gräbner D, M Zanker, G Fliedl and M Fuchs. Classification of customer reviews based on sentiment analysis. In: Fuchs M, Ricci F, Cantoni L (eds) Information and Communication Technologies in Tourism 2012, Springer, Vienna, pp. 460-470, 2012. Doi: 10.1007/978-3-7091-1142-0_40 [14] Data-set:https://data.world/crowdflower/airline- twitter-sentiment accessed on 12.11.2018. Negative Tweets Positive Tweets “ @united It's a shame choosing #United may be the difference between reuniting with aging friends and never seeing them again #PoorService”, “@united Big thanks to Ms. Winston for assisting me over the phone with a baggage claim issue today. She really went the extra mile!” “@united flight attendant doesn‚Äôt understand not understanding English doesn‚Äôt mean they are deaf. Stop yelling English slowly at them” “@United THANK U! Secured room for the night Thx to VERY helpful customer service rep N. Dorns. I thanked her. Can u 2? #goodenoughmother” Table 3: Sample of the classified data into positive and negative tweets.