https://doi.org/10.31449/inf.v45i3.3453 Informatica 45 (2021) 367–380 367

A Classifier Ensemble Approach for Prediction of Rice Yield Based on Climatic Variability for Coastal Odisha Region of India

Subhadra Mishra, Department of Computer Science and Application, CPGS, Odisha University of Agriculture and Technology, Bhubaneswar, Odisha, India. E-mail: mishra.subhadra@gmail.com
Debahuti Mishra, Department of Computer Science and Engineering, Siksha ’O’ Anusandhan Deemed to be University, Bhubaneswar, Odisha, India. E-mail: debahutimishra@soa.ac.in
Pradeep Kumar Mallick, School of Computer Engineering, KIIT Deemed to be University, Bhubaneswar, Odisha, India. E-mail: pradeep.mallickfcs@kiit.ac.in
Gour Hari Santra, Department of Soil Science and Agricultural Chemistry, IAS, Siksha ’O’ Anusandhan Deemed to be University, Bhubaneswar, Odisha, India. E-mail: santragh@gmail.com
Sachin Kumar (Corresponding Author), Department of Computer Science, South Ural State University, Chelyabinsk, Russia. E-mail: sachinagnihotri16@gmail.com

Keywords: crop prediction, classifier ensemble, support vector machine, k-nearest neighbour, naive bayesian, decision tree, linear discriminant analysis

Received: February 23, 2021

Agriculture is the backbone of the Indian economy, especially rice production, but for several reasons the expected rice yields are not achieved. Rice production mainly depends on climatic parameters such as rainfall, temperature, humidity and wind speed. If farmers can get timely advice on variations in climatic conditions, they can take appropriate action to increase rice production. This motivates us to prepare a computational model for farmers and, ultimately, for society. The main contribution of this work is a classifier ensemble based prediction model built on original rice yield and climatic datasets of the coastal districts of Odisha, namely Balasore, Cuttack and Puri, for the period 1983 to 2014 for the Rabi and Kharif seasons.
This ensemble method uses five diverse classifiers: Support Vector Machine, k-Nearest Neighbour, Naive Bayesian, Decision Tree, and Linear Discriminant Analysis. It is an iterative approach: at each iteration one classifier acts as the main classifier and the other four are used as base classifiers, whose outputs are combined by majority voting. For the Balasore district datasets, specificity improves from 95.38% to 98.10% and from 95.38% to 98.10%, sensitivity and precision improve from 88.48% to 96.25% and from 83.60% to 94.81%, and AUC improves from 91.78% to 97.17% and from 74.48% to 88.59%, for the Rabi and Kharif seasons respectively; similar improvements are observed for the Puri and Cuttack districts. The average classification accuracy is found to be above 96%.

Povzetek: An ensemble method for predicting rice yield in India is described.

1 Introduction

Agriculture is the pivot of the Indian economy. Around 58% of rural households depend on agriculture as their major means of livelihood. However, the share of agriculture has changed considerably in the past 50 years: in 1950, 55% of GDP came from agriculture, while in 2009 it was 18.5% and in the financial year 2015-2016 it was 16.85% [1]. Indian agriculture has made great progress in ensuring food security for its huge population, with food grain production reaching a record level of 236 million tonnes in 2013-2014, while the amounts required for 2030 and 2050 are 345 and 494 million tonnes respectively. In India rice is grown in different agro-climatic zones and at different altitudes, extending from 8 to 35°N latitude and from sea level to 3000 metres. Rice requires a hot and humid climate and is well suited to areas with high humidity, long sunshine hours and a sufficient water supply. The average temperature required for the crop is 21 to 36°C. It is predicted that the demand for rice will grow faster than that for other crops.
There are various challenges to achieving higher productivity in the face of climate change and its repercussions. In tropical areas, higher temperature is one of the important environmental factors limiting rice production. Different parts of the country are affected differently by climate change. For example, by 2080 the number of rain days is expected to decrease, along with a modest rise of 7-10% in annual rainfall, which will lead to high-intensity storms. Moreover, while monsoon rainfall over the country is expected to rise by 10-15%, winter rainfall is expected to fall by 5-26%, and seasonal variability would be further compounded [2]. Cereal production is expected to fall by 10-40% by 2100 due to rising temperature, growing water scarcity and a decreasing number of rain days, with higher losses predicted for Rabi crops [3]. Rice productivity may decline by 6 percent for every 1°C rise in temperature [4]. In general, changing climate trends will lead to an overall decline in agricultural yield. Simulation analysis projects that, on an all-India basis, the consequence of climate change for productivity in the 2030s ranges from -2.5 to -12% for crops such as rice, wheat, maize, sorghum, mustard and potato [5, 6]. Climate is the sum of the total variation in temperature, humidity, rainfall and other meteorological factors in a particular area over a period of at least 25 years [1]. Odisha’s climate has also undergone appreciable changes as a result of various factors. The previous six seasons of the year have effectively collapsed into two, summer and rainy. The deviation in day temperature and annual precipitation is mainly restricted to 4 months in a year, and the number of rain days has decreased from 120 to 90, apart from becoming irregular. In addition, the mean temperature is increasing and the minimum temperature has increased by about 25% [2, 3, 4, 5].
Such climate-change-related adversity is adversely affecting the productivity and production of food grains. Agriculture is the backbone of the Indian economy, but for several reasons the expected crop yields are not achieved. Production mainly depends on climatic parameters such as rainfall, temperature, humidity and wind speed, so farmers should know about timely variations in climatic conditions: if they receive timely advice, they can increase production. Before the development of this technology, farmers could predict production only from previous experience with a particular crop. But data gradually accumulate, and the weather changes due to environmental factors, so this vast amount of data can be used for prediction of rice production. To assure uniform growth and development in agriculture (the current rate is 2.8% per annum), an exhaustive appraisal of the vulnerability of agricultural production to the predicted weather changes is necessary. The main aim of this paper is to create an ensemble model for predicting the effect of climatic variability on rice yield for coastal Odisha. Weather parameters such as rainfall, temperature and humidity are considered because they affect 95% of rice crop production. Additionally, the validity of the classifiers’ accuracy has been measured using specificity, sensitivity/recall, precision, Negative Predictive Value (NPV), False Positive Rate (FPR), False Negative Rate (FNR), False Discovery Rate (FDR) and the probabilistic measures F-Score, G-Mean, Matthews Correlation Coefficient (MCC) and J-Statistics. This paper is organized as follows: section 2 describes the related work; the materials and methods used for experimentation are described in section 3; the framework of the proposed prediction model is given in section 4; and section 5 deals with experimentation and model evaluation.
The result analysis, discussion and conclusion are given in sections 6, 7 and 8 respectively.

2 Related work

While undertaking this work, the existing literature was followed during every phase of the research with the intention of clearly representing machine learning based prediction models. Various approaches were explored and addressed in designing the ensemble based rice production model driven by climatic variability. This section describes a few recent works in this area which motivated us to develop an ensemble based model. Narayan Balkrishnan [7] proposed the ensemble models AdaSVM and AdaNaive for projecting crop production, and compared them with the Support Vector Machine (SVM) and Naive Bayes (NB) methods. Two parameters, accuracy and classification error, were used to assess the predictions, and it was observed that AdaSVM and AdaNaive are better than SVM and NB. B. Narayanan [8] also compared SVM and NB with AdaSVM and AdaNaive and concluded that the latter two are better than the first two methods. Sadegh Bafandeh [9] studied the detailed historical background of the k-Nearest Neighbour (K-NN) method and its applications in various areas. If the distribution of the data is not known, the K-NN method can be applied as a classification technique [10, 11, 12]. In the feature space, objects can be classified on the basis of the closest training examples. It is an instance-based (lazy) learning method, where computation is deferred until classification and the function is approximated locally [13, 14]. A Bayesian network (also called a Bayes network, belief network, Bayesian model or probabilistic directed acyclic graphical model) is a type of statistical model. A belief network to assess the effect of climate change on potato production was formulated by Yiqun Gu et al. [15].
The authors presented a belief network combining the uncertainty of future climate change, the variability of current weather parameters such as temperature, radiation and rainfall, and knowledge about potato development. They argued that their network gives support to policy makers in agriculture. They tested their model using synthetic weather scenarios, compared the results with a conventional mathematical model, and concluded that the belief network is more efficient. There are various factors influencing the prediction. Uno Y et al. [16] used agronomic variables, nitrogen application and weed control with machine learning algorithms such as artificial neural networks (ANNs) and Decision Trees (DT) to develop yield mapping and forecast yield, concluding that high prediction accuracies are obtained using ANNs. Veenadhari S et al. [17] described soybean productivity modelling using DT algorithms. The authors collected climate data for the Bhopal district for the period 1984-2003, considered climatic factors such as evaporation, maximum temperature, maximum relative humidity and rainfall against soybean yield, and applied the Interactive Dichotomizer 3 (ID3) algorithm, an information-based method resting on two assumptions. Induction tree analysis showed that relative humidity is the major parameter influencing soybean crop yield. A DT was formed for the influence of climatic factors on soybean yield and converted into classification rules in if-then-else form. Relative humidity strongly affects soybean production, and some of the generated rules help in predicting low and high soybean yield. One drawback was that only low or high yield could be predicted; the amount of yield could not.
Due to the diversity of climate in India, agricultural crops have performed poorly over the past two decades. Forecasting crop production and yield in advance can help policy makers and farmers take timely decisions. Forecasting also helps industries plan and coordinate their business around the climatic components. A software tool titled ‘Crop Advisor’ was developed by Veenadhari et al. [18]; it is user friendly and can forecast crop yields under the effect of weather parameters. The C4.5 algorithm is applied to ascertain the most influential climatic parameter on the crop yields of specified crops in selected districts of Madhya Pradesh. The software is helpful for advising on the effect of various weather parameters on crop yield. Other agro-input parameters responsible for crop yield are not accommodated in this tool, since the application of these input parameters differs between individual fields in space and time. Alexander Brenning et al. [19] compared several classifiers, including Linear Discriminant Analysis (LDA), for crop identification based on a multi-temporal land dataset and concluded that stabilized LDA performed well, mainly in field-wise classification. Minggang Du et al. [20] used LDA for plant classification and concluded that LDA with Principal Component Analysis is effective and feasible for this task. Renrang Liao [21] classified fruit tree crops using penalized LDA and found that LDA may not be able to deal with collinear high-dimensional data. It has been observed that most of the literature uses a single classification model to predict crop yield, which increases misclassification through data biasing; this motivated us to formulate a multi-classifier model known as a classifier ensemble [22].
This ensemble technique helps to reduce the classification error by combining the outputs of different classifiers through majority voting [23, 24]. In this paper we consider the impact of changing weather scenarios in the Odisha context on the yield of rice, one of the main staple foods, using machine learning methods such as SVM, K-NN, NB, DT and LDA [25, 26].

3 Materials and methods

This section briefly describes the machine learning techniques and tools used to develop the ensemble based crop prediction model.

3.1 Support Vector Machine (SVM)

SVM is a supervised machine learning technique, also known as support vector networks, which analyses data mainly for classification and regression. From a set of labelled training data it produces input-output mapping functions [27]. SVM can be used for both linear and non-linear datasets: the original training data are transformed into a higher dimension by a non-linear mapping, and SVM then searches this new dimension for the linear optimal separating hyperplane, forming a decision boundary that separates the classes from one another [28]. When SVM is used for prediction of crop yield it is known as support vector regression. The main objective of SVM is to find a non-linear function by the use of a kernel, such as a linear or polynomial function [29, 30]; the radial basis function and the polynomial function are the most widely used kernels. For large input sample spaces, the difficulty of using a linear function can be avoided by using SVM, and through optimization a complex problem can be converted into a simple linear function optimization [32].

3.2 K-Nearest Neighbour (K-NN)

K-NN [33] is one of the simplest supervised learning methods, used for both classification and prediction [34, 35].
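The higher-dimensional mapping described for SVM in Section 3.1 can be illustrated with a small sketch. This uses scikit-learn, a tooling choice made here for illustration only (the paper's experiments were run in MATLAB), and a synthetic two-moons dataset that no single hyperplane can separate in the input space:

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Two interleaved half-moons: not separable by a single hyperplane
# in the original feature space.
X, y = make_moons(n_samples=200, noise=0.0, random_state=0)

# Linear kernel: searches for a separating hyperplane in the input space.
linear_svm = SVC(kernel="linear").fit(X, y)

# RBF kernel: implicitly maps the data into a higher-dimensional space
# where a linear separating hyperplane exists (cf. Section 3.1).
rbf_svm = SVC(kernel="rbf").fit(X, y)

print("linear kernel accuracy:", linear_svm.score(X, y))
print("RBF kernel accuracy:   ", rbf_svm.score(X, y))
```

On this noiseless dataset the RBF kernel fits the training data almost perfectly while the linear kernel cannot, which is exactly why the transformation to a higher dimension matters.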
Using K-NN, an unknown sample can be classified into predefined classes based on the training data. It requires more computation than other techniques, but is better for dynamic data that change or are updated quickly. To classify a new sample, K-NN computes the distance between the new sample and all samples in the training data; the Euclidean distance is commonly used. The samples with the smallest distance to the new sample are its K nearest neighbours [36]. The main idea behind K-NN is to base the estimate on a fixed number of observations closest to the desired output. It can be used for both discrete and continuous decision making, i.e. classification and regression: for classification the most frequent class among the neighbours is selected, and for prediction or regression the average of the k neighbours is calculated. Besides the Euclidean distance, the Manhattan and Minkowski distances are also used in K-NN [37].

3.3 Naive Bayesian Classifier (NB)

The NB classification technique is based on Bayes’ theorem. It is most suitable when the input dimensionality is very high, i.e. when the dataset is very large. Bayes classifiers are also called simple Bayes or idiot Bayes [38]. The Naive Bayes classifier is a simple probabilistic classifier with strong independence assumptions. The classifier can be trained depending on the nature of the probability model, and works well in many complex real-world situations. Its main advantage is that it requires only a small quantity of training data to estimate the parameters essential for classification. Bayes’ theorem rests on probabilistic belief and on the mathematical manipulation of conditional probability; hence its important characteristics can be computed using the rules of probability, more specifically conditional probability [39].
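The neighbour search with the three distance metrics named above, and the NB classifier, can be sketched together. This is a minimal scikit-learn illustration (a tooling assumption; the paper used MATLAB) in which the Minkowski order parameter p selects the metric — p=2 is Euclidean, p=1 is Manhattan:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB

# Tiny illustrative training set: two climatic features, two yield classes.
X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],   # class 0
              [6.0, 6.0], [6.5, 7.0], [7.0, 6.5]])  # class 1
y = np.array([0, 0, 0, 1, 1, 1])
x_new = np.array([[6.2, 6.4]])  # sample to classify

# Minkowski distance of order p: p=2 is Euclidean, p=1 is Manhattan,
# other p values give further Minkowski metrics.
for p in (2, 1, 3):
    knn = KNeighborsClassifier(n_neighbors=3, p=p).fit(X, y)
    print(f"p={p} -> predicted class:", knn.predict(x_new)[0])

# Gaussian Naive Bayes: per-class Gaussians under the independence assumption.
nb = GaussianNB().fit(X, y)
print("NB predicted class:", nb.predict(x_new)[0])
```

Here the new sample lies near the class-1 cluster, so all three metrics and the NB classifier agree on class 1; with samples near the class boundary the choice of metric can change the neighbour set and hence the vote.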
3.4 Decision Tree (DT)

DTs present a very encouraging technique for automating much of the data mining and predictive modelling process. They embed automated solutions for issues such as overfitting and missing data. The models built by DTs can be easily viewed as a tree of simple decisions and provide well-integrated solutions with high accuracy. A DT, also known as a classification tree, is a tree-like structure which recursively partitions the dataset in terms of its features. Each interior node of such a tree is labelled with a test function. The best known DT algorithms are C4.5 and ID3 [40]. Figure 1 illustrates an example of a DT in IF ... THEN ... ELSE rule form.

Figure 1: Decision Tree with IF ... THEN ... ELSE rule form

3.5 Linear Discriminant Analysis (LDA)

Discriminant analysis is a multivariate classification method. It is similar to regression analysis except that the dependent variable is categorical rather than continuous; the intent is to predict the class membership of individual observations based on a set of predictor variables. LDA attempts to find linear combinations of the predictor variables that best separate the groups of observations; these combinations are called discriminant functions. It is also a dimensionality reduction method, used in preprocessing for pattern classification and machine learning applications. To avoid overfitting, LDA can be applied to the dataset to obtain good class separability at reduced computational cost [41]. LDA uses linear combinations of the predictors to model the degree to which an observation belongs to each class; a discriminant function is used and a threshold is applied for classification [42].

3.6 Majority voting

Majority voting is one of the voting based ensemble learning algorithms.
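Returning to Section 3.4: the IF ... THEN ... ELSE rule view of a decision tree (Figure 1) can be reproduced directly from a fitted tree. A minimal sketch with scikit-learn (a tooling assumption, as the paper used MATLAB; the feature and threshold are illustrative, not taken from the paper's data):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy data: one hypothetical climatic feature (relative humidity)
# against a binary yield class.
X = np.array([[30.0], [40.0], [55.0], [70.0], [80.0], [90.0]])
y = np.array([0, 0, 0, 1, 1, 1])  # 0 = low yield, 1 = high yield

tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# Print the learned tree as nested IF ... ELSE rules, as in Figure 1.
print(export_text(tree, feature_names=["humidity"]))
```

The printed rules take the form "humidity <= threshold -> class", which is exactly the if-then-else formulation the text describes for converting a DT into classification rules.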
Majority voting is appropriate when each classifier cl can produce class-probability estimates rather than a simple classification decision. A class-probability estimate for a data point x is the probability that the true class is m: P(f(x) = m | cl), for m = 1, ..., M. The class probabilities of all the hypotheses can be combined so that the class probability of the ensemble can be found [43]. Sarwesh Site et al. described how merging two or more classifiers by voting, known as an ensemble classifier, yields better performance and better prediction; they described various ensemble techniques for both binary and multi-class classification [44]. Xueyi Wang et al. prepared a model to estimate the accuracies of majority voting ensembles using 32 datasets from the UCI repository. They split their data into core, outlier and boundary subsets and found that, for a good ensemble method with high accuracy, the weak individual classifiers should be partly diverse [45].

3.7 Performance measures

This section discusses the basics of specificity, sensitivity/recall, precision, NPV, FPR, FNR, FDR, F-Score, G-Mean, MCC and J-Statistics. These measure the extent to which a test measures what it is supposed to measure; in other words, the accuracy or validity of the test, computed from a confusion matrix, i.e. a two-by-two matrix. The four elements of a confusion matrix, True Positives (TP), False Positives (FP), False Negatives (FN) and True Negatives (TN), are represented in the a, b, c and d cells of the matrix respectively. Specificity is computed as d(TN)/(b(FP) + d(TN)) and sensitivity as a(TP)/(a(TP) + c(FN)). Sensitivity and specificity are inversely related: as the sensitivity increases, the specificity decreases and vice versa.
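The majority voting rule of Section 3.6 — both the hard-vote form and the combination of class-probability estimates P(f(x) = m | cl) — can be sketched in a few lines of plain Python (an illustration in our own code, not the paper's MATLAB implementation):

```python
from collections import Counter

def majority_vote(predictions):
    """Class label chosen by the majority of classifiers.
    Ties are broken in favour of the label seen first."""
    return Counter(predictions).most_common(1)[0][0]

# Hard votes from five classifiers (e.g. SVM, K-NN, NB, DT, LDA):
votes = ["high", "high", "low", "high", "low"]
print(majority_vote(votes))  # -> high

# Soft variant: average the class-probability estimates P(f(x)=m | cl)
# over the classifiers and take the argmax, as described above.
probs = [[0.7, 0.3], [0.6, 0.4], [0.1, 0.9]]  # one row per classifier
avg = [sum(col) / len(probs) for col in zip(*probs)]
print(avg.index(max(avg)))  # -> 1
```

Note how the two rules can disagree: the first two classifiers above favour class 0, but the third is confident enough about class 1 that the averaged probabilities tip the soft vote the other way.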
Precision tells how many of the test positives are true positives; if this number is high (close to 100%), the new test is doing as well as the defined standard. It is computed as a(TP)/(a(TP) + b(FP)). NPV tells how many of the test negatives are true negatives; again, a value approaching 100% suggests that the new test is doing as well as the defined standard. It is computed as d(TN)/(c(FN) + d(TN)). Assuming all other factors remain constant, the PPV increases with increasing prevalence, while the NPV decreases with increasing prevalence. A false positive error, or fall-out, is a result that indicates that a given condition has been fulfilled when it actually has not, i.e. a positive effect has been assumed erroneously. In other words, the FPR is the proportion of all negatives that still yield positive test outcomes, i.e. the conditional probability of a positive test result given an event that was not present; it is computed as b(FP)/(b(FP) + d(TN)), or 1 - Specificity. A false negative is a test result that indicates that a condition failed when it actually succeeded, i.e. no effect has been assumed erroneously. In other words, the FNR is the proportion of events being tested for that yield negative test outcomes, i.e. the conditional probability of a negative test result given that the event being looked for has taken place; it is computed as c(FN)/(a(TP) + c(FN)), or 1 - Sensitivity. FDR is a way of conceptualizing the rate of type I errors in null hypothesis testing when conducting multiple comparisons. FDR-controlling procedures are designed to control the expected proportion of rejected null hypotheses that were incorrect rejections (false discoveries); it is computed as b(FP)/(b(FP) + a(TP)), or 1 - PPV. The F-Score considers both the precision and the recall of the test to compute the score.
It can be interpreted as a weighted average of precision and recall, reaching its best value at 1 and its worst at 0, and is computed as 2 × ((Precision × Recall)/(Precision + Recall)). MCC is used to measure the quality of binary classification. It takes into account true and false positives and negatives and is generally regarded as a balanced measure that can be used even on imbalanced datasets. It is a correlation coefficient between the observed and predicted binary classifications. While there is no perfect way of describing the confusion matrix of true and false positives and negatives by a single number, MCC is generally regarded as one of the best such measures and is computed as ((a × d) - (b × c)) / sqrt((a + b)(a + c)(d + b)(d + c)). The accuracy determined for a classifier may not be an adequate performance measure when the number of negative cases is much greater than the number of positive cases, i.e. for imbalanced classes. Suppose there are 1000 cases, 995 negative and 5 positive; if the system classifies them all as negative, the accuracy would be 99.5% even though the classifier missed all positive cases. In such cases the G-mean is used. The G-mean has its maximum value when sensitivity and specificity are equal and is computed here as sqrt(Precision × Recall). Youden’s J-Statistic is a way of summarizing the performance of a diagnostic test: for a test with poor diagnostic accuracy Youden’s index equals 0, and for a perfect test it equals 1. The index gives equal weight to false positive and false negative values, so all tests with the same index value give the same proportion of total misclassified results. It is computed as Sensitivity + Specificity - 1.

4 Structural and functional representation of proposed ensemble based prediction model

The schematic representation of the proposed model is shown in Figure 2.
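The measures of Section 3.7 can be collected into one small function. A plain-Python sketch using the a, b, c, d cell notation from the text (a = TP, b = FP, c = FN, d = TN); the G-mean here follows the text's formula sqrt(Precision × Recall), though sqrt(Sensitivity × Specificity) is the more common definition:

```python
import math

def confusion_metrics(a, b, c, d):
    """Performance measures from confusion-matrix cells:
    a = TP, b = FP, c = FN, d = TN."""
    sens = a / (a + c)                     # sensitivity / recall
    spec = d / (b + d)                     # specificity
    prec = a / (a + b)                     # precision / PPV
    return {
        "sensitivity": sens,
        "specificity": spec,
        "precision": prec,
        "NPV": d / (c + d),
        "FPR": 1 - spec,
        "FNR": 1 - sens,
        "FDR": 1 - prec,
        "F-score": 2 * prec * sens / (prec + sens),
        "G-mean": math.sqrt(prec * sens),  # as defined in the text
        "MCC": (a * d - b * c) /
               math.sqrt((a + b) * (a + c) * (d + b) * (d + c)),
        "J": sens + spec - 1,              # Youden's J statistic
    }

m = confusion_metrics(a=40, b=10, c=10, d=40)
print(m["sensitivity"], m["specificity"], m["MCC"])
```

For this symmetric example every rate is 0.8 and MCC = (1600 - 100)/2500 = 0.6, which matches the hand computation from the formulas above.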
First, the datasets for three coastal districts of Odisha and the different parameters are collected from the Odisha Agriculture Statistics, Director of Agriculture and Food Production, Govt. of Odisha, Bhubaneswar; then the datasets are pre-processed. The proposed methodology is based on the classifier ensemble method. The intention is to predict the rice yield for the two seasons, Rabi and Kharif, with respect to the climatic variability of coastal Odisha. The model uses five classifiers, where four act as base classifiers and one acts as the main classifier; the classifiers used are SVM, K-NN, DT, NB and LDA. Experiments are conducted by considering each classifier once as the main classifier and the remaining four as base classifiers, using MATLAB 10 on the Windows OS. This gives five different predicted outputs for rice production. Each classifier is built according to the basic algorithm defined in the literature [26] [31] [36] [38] [40] [43]. Let B = {b_1, ..., b_4} be the four base classifiers, and C = {c_1, ..., c_4} the outputs of those four base classifiers. The output of each classifier is passed through a conversion function f to retrieve the production, denoted Ŝ, which acts as input to the main classifier:

Ŝ_l = f(c_i)    (1)

where f can be computed using equation (2):

f(c_i) = N / |N|    (2)

where N is the sum of the S_i which belong to class c_i. Hence, the main classifier receives the input vector D = {dataset, Ŝ_1, Ŝ_2, Ŝ_3, Ŝ_4}. The result obtained after processing D by the main classifier is compared with the expected output (y). Equation (2) is again used to compute the production based on the predicted class labels. The final prediction is made by applying majority voting to the class labels predicted with each classifier as the main classifier (Figure 3). Throughout the paper the # symbol is placed before a classifier's name to distinguish the main classifier from the base classifiers.
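One possible reading of this scheme can be sketched as a simplified stacking-style loop. This is an interpretation, not the paper's MATLAB pipeline: the base classifiers' predicted labels are simply appended to the dataset as extra features for the main classifier, the conversion function f of equations (1)-(2) is omitted for brevity, and the data are synthetic:

```python
from collections import Counter

import numpy as np
from sklearn.base import clone
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))             # four climatic features (toy data)
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # hypothetical yield class

classifiers = {
    "SVM": SVC(),
    "K-NN": KNeighborsClassifier(n_neighbors=3),
    "NB": GaussianNB(),
    "DT": DecisionTreeClassifier(random_state=0),
    "LDA": LinearDiscriminantAnalysis(),
}

main_outputs = []
for main_name in classifiers:
    # The other four classifiers act as base classifiers b_1..b_4.
    base_preds = [clone(classifiers[name]).fit(X, y).predict(X)
                  for name in classifiers if name != main_name]
    # Main classifier input D = {dataset, S^_1, ..., S^_4}.
    D = np.column_stack([X] + base_preds)
    main_outputs.append(clone(classifiers[main_name]).fit(D, y).predict(D))

# Final prediction: majority vote over the five main-classifier outputs.
votes = np.array(main_outputs)
final = np.array([Counter(votes[:, i]).most_common(1)[0][0]
                  for i in range(votes.shape[1])])
print("ensemble training accuracy:", (final == y).mean())
```

The outer loop mirrors the paper's iteration — each classifier serves once as #main while the other four vote in as base classifiers — and the final line applies the majority voting of Figure 3 across the five resulting outputs.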
Figure 2: Schematic representation of the proposed ensemble based prediction model

Figure 3: Majority voting applied to the main classifiers

5 Experimentation and model evaluation

This section elaborates the experimentation process, starting from the chosen datasets and their description, through a step-wise presentation of the working principle of the proposed method; the results are then analysed with respect to the average classification accuracy and the predictive performance measures used to validate the model.

5.1 Dataset description

The real dataset D is collected from three coastal districts of Odisha: Balasore, Puri and Cuttack. Let d_i ∈ D, ∀i = 1, ..., 31, where |d_i| = 25 represents the number of attributes of the dataset. The parameters collected from the Odisha Agriculture Statistics, Director of Agriculture and Food Production, Govt. of Odisha, Bhubaneswar [46] are p = {max temperature, min temperature, rainfall, humidity}, which affect rice production. Since there are two rice production seasons, Rabi and Kharif, produced between the months January-June and July-December respectively, each p_i is collected over a range of six months, giving 24 attributes; the 25th attribute is the production in hectares of the crop for the particular year. The rice production for the three coastal areas of Odisha from 1983 to 2014 is shown in Figures 4a and 4b for the Rabi and Kharif seasons respectively. A detailed description of the datasets, with standard deviations (Std. Dev), for the three areas is given in Table 1.

5.2 Construction of dataset for classification

The raw data collected have some missing values and no class labels. One way to deal with a missing value is simply to replace it with the most negligible positive real number. For classification, D must be in the form D = {d, y}, where d_i refers to the features and y_i to the class label. In order to predict the production of the rice crop, one needs to properly define the class label.
One way is to use clustering, allocating each feature vector a class label equal to its cluster number; but the arbitrary cluster indices formed make it difficult to build a consistent class labelling for the features. Hence, in this work we propose a range based class label formation. Let S denote the production column vector of dataset D; then y_i can be formulated using equation (3):

y_i = { u,  s_i
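The pre-processing of Section 5.2 — replacing missing values with a negligibly small positive number and then assigning range based class labels to the production vector S — can be sketched as follows. Both the reading of "most negligible positive real number" as the smallest positive float and the bin boundaries are hypothetical illustrations, not the ranges specified by equation (3) in the paper:

```python
import numpy as np

# Toy production vector S with a missing value (NaN).
S = np.array([210.0, np.nan, 155.0, 300.0, 95.0, 260.0])

# Replace missing values with the smallest positive float
# (one possible reading of "most negligible positive real number").
S_filled = np.where(np.isnan(S), np.finfo(float).tiny, S)

# Range based class labels in the spirit of equation (3); these
# thresholds are purely illustrative, not the paper's actual ranges.
bins = [100.0, 200.0]            # below 100 -> low, 100-200 -> medium, above -> high
y = np.digitize(S_filled, bins)  # 0 = low, 1 = medium, 2 = high
print(y.tolist())  # -> [2, 0, 1, 2, 0, 2]
```

Unlike cluster indices, these labels are stable across runs and carry a fixed meaning (low/medium/high production), which is the advantage of the range based formation the text argues for.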