https://doi.org/10.31449/inf.v45i3.3041
Informatica 45 (2021) 433–440

Developing an Efficient Predictive Model Based on ML and DL Approaches to Detect Diabetes

Said Gadri
Laboratory of Informatics and its Applications of M’sila (LIAM), Department of Computer Science, Faculty of Mathematics and Informatics, University Mohamed Boudiaf of M’sila, M’sila, 28000, Algeria
E-mail: Said.kadri@univ-msila.dz

Keywords: diabetes classification, machine learning, deep learning

Received: January 26, 2020

During the last decade, important progress has been made in machine learning (ML), especially with the emergence of a new subfield called deep learning (DL) and of convolutional neural networks (CNNs). This trend has produced much more sophisticated algorithms that achieve high performance in many disciplines, such as pattern recognition, image classification, and computer vision, as well as other supervised and unsupervised classification tasks. In this work, we developed an automatic classifier that classifies a large number of diabetic patients based on some blood characteristics using ML and DL approaches. We first performed the classification task using several ML algorithms. We then proposed a simple DNN model composed of many layers. Finally, we compared the ML and DL algorithms with each other, and our model with other existing models. For the programming task, we used Python, TensorFlow, and Keras, which are among the most widely used tools in the field.

Povzetek (Summary): In this work, we developed an automatic classifier that enables the classification of multiple patients with diabetes based on certain blood characteristics using ML and DL approaches.

1 Introduction

Diabetes is a chronic disease that causes an increase in blood sugar. Many serious complications can occur within the patient’s body if diabetes remains untreated or unidentified.
The usual way to identify diabetes is to visit a diabetes diagnostic center or to consult a specialist doctor. Advances in machine learning (ML) and deep learning (DL) can help solve this critical problem efficiently. The main task when using these approaches is to design a predictive model that predicts the presence of diabetes in patients with maximum accuracy [1, 2].

In simple words, machine learning is based on a set of methods that allow a machine to learn meaningful features (patterns) from data without human intervention. In recent years, many AI tasks such as pattern recognition, image classification, and computer vision have relied essentially on machine learning approaches, which give high performance. Several algorithms have been developed in this area, including logistic regression, k-nearest neighbors, naïve Bayes, decision trees, support vector machines, and artificial neural networks. Research on artificial neural networks (ANNs) can be considered the oldest discipline in ML and AI; it dates back to W. McCulloch and W. Pitts in 1943, but this research was quickly interrupted because of its high requirements in terms of hardware, software, and running time. Many years later, it was revived in parallel with the emergence of a new subfield called deep learning (DL). This approach greatly simplifies the feature engineering process in many vital areas, such as medical imaging. Among the DL methods, CNNs are of special importance because they exploit local connectivity patterns efficiently, as is the case for those used in the ImageNet competition [3]. Many works apply CNNs to image analysis [4, 5] using a variety of methods such as the rectified linear unit [6] and deep residual learning [7].

The present paper is organized as follows. Section 1 is a short introduction presenting the area of our work and its advantages and benefits.
Section 2 presents the background of deep learning as a subfield of ML and AI. Section 3 is a detailed overview of related work in the same area. Section 4 describes our proposed model. Section 5 presents the experimental work done to validate the proposed model. Section 6 illustrates the results obtained when applying the new model. Section 7 discusses these results. Section 8 gives a quick comparison between the ML algorithms and the proposed DNN model. Section 9 compares our model with existing models. The last section summarizes the work and suggests some perspectives for future research.

2 Deep learning background

Deep learning (DL) has become one of the most popular subfields of AI and ML, especially in speech recognition, computer vision, and some other interesting topics [8]. Its success is motivated by three factors: the increased amount of available data, especially with massive connectivity to the internet; the improvement and lower cost of hardware and software [6, 7, 9, 8, 10]; and increased processing capabilities (e.g., GPUs) [11]. DL is based essentially on the use of ANNs with two or more hidden layers. The convolutional neural network (CNN) is a specialized feedforward neural network developed to process multidimensional data, such as images. Its origins go back to the neocognitron proposed by Fukushima in 1980 [12]. The first CNN model was proposed by LeCun et al. [13] in 1998 for character recognition. Many other ANN architectures were developed later, including recurrent neural networks (RNNs), autoencoders, and stochastic networks [14, 15, 16].
Deep learning also provides an efficient solution to the problem of input data representation, which is considered the most critical phase in ML, especially when the problem is complex, as in image or speech recognition [17], where it is difficult to define the best features to process. DNNs can learn high-level feature representations of their inputs through their multiple hidden layers. The first DNNs appeared in the 1960s but were then abandoned for a long time in favor of classic ML approaches because of difficulties in training and inadequate performance [18]. In 1986, Rumelhart et al. [19] proposed the backpropagation method, which efficiently updates neural network weights using the gradient of the loss function through multiple layers. Despite the promising results given by DNNs in the late 1980s [12] and 1990s [20], they were abandoned because of many problems. In 2006, research in DL was revived when researchers developed new methods for sensibly initializing DNN weights using an unsupervised layer-wise pretraining procedure [21, 22, 23]. This is the case of deep belief networks (DBNs), which worked very well through supervised learning and gave high accuracy in image and speech tasks in 2009 and 2012 [11]. In 2012, Krizhevsky et al. [18] proposed a deep CNN for the Large Scale Visual Recognition Challenge (LSVRC) [24], significantly reducing the error rate. This CNN was implemented on multiple graphics processing units (GPUs) for the first time. Training on GPUs has since become common practice in DL work because it allows training on large datasets and significantly increases processing speed. Furthermore, the use of the ReLU (rectified linear unit) activation function helped mitigate the vanishing gradient problem and allowed faster training.
The dropout technique is also used as a regularization method to decrease overfitting in large networks with many layers. All these improvements led the leading technology companies to increase their research efforts, producing many other advances in the field. Many DL frameworks using tensor computation [14, 15, 16, 17] and GPU-compatible libraries [18] have been developed and made available to researchers through open-source software [23] and cloud services [24, 25].

On the other hand, many companies have met the challenges of big data when exploring large amounts of data to inform valuable decisions [26]. The notion of big data refers to data that exceeds the capability of standard data storage and data processing systems [27]. This large volume of data also requires high-performance hardware and very efficient analysis tools [28]. Other ML challenges appeared with big data, including high dimensionality, distributed infrastructures, and real-time requirements. Najafabadi et al. [26] discuss the use of DL to solve big data challenges and show that the capacity of DNNs to extract meaningful features from large datasets is extremely important. The automatic extraction of features from heterogeneous data (e.g., images, text, audio) is very useful but difficult; this task becomes easier with DL methods. Other tasks, including semantic-based information retrieval systems (IRS) such as semantic indexing and hashing [27, 28], also became possible with these high-level features. Furthermore, DL is used to tag incoming data streams, which allows fast-moving data to be classified [26]. In general, deep networks are suitable for learning from the large volumes of data issued by big data sources. In conclusion, DL is currently growing faster than ever.

3 Related work

Diabetes detection is one of the most important topics in the health care field. Many studies have been done in this field with the common goal of improving the accuracy as well as the speed of detection.
Among these studies, we note the following. Kayaer and Yıldırım [29] proposed a general regression neural network to detect diabetes and achieved an accuracy of 80.21%. Goncalves et al. [30] developed a system based on a new hierarchical neuro-fuzzy binary space partitioning (BSP) method; they achieved an accuracy of 80.08% on the training set and 78.26% on the test set. Polat and Gunes [31] used generalized discriminant analysis (GDA) and the least squares support vector machine (LS-SVM) and reported an accuracy of 79.16%. Kim et al. [32] applied SVM with CPONs to obtain the best classification accuracy on the following datasets: Wisconsin Breast Cancer, Pima Indians Diabetes, BUPA Liver Disorder, Ionosphere, and the MNIST digit dataset from the UCI machine learning repository. They obtained an accuracy of 83.11% on the Pima Indians Diabetes (PID) dataset. Caliskan et al. [33] developed a simple training strategy for deep neural network classifiers using the L-BFGS algorithm on many datasets, including the PID dataset, and achieved an accuracy of 77.09%. Dwivedi [34] used machine learning algorithms to predict diabetes mellitus, including SVM, ANN, LR, DT, KNN, and NB. The highest accuracy obtained was 82%, using the naïve Bayes algorithm. Vijayashree et al. [35] proposed a new system based on recursive feature elimination combined with an analysis component to predict diabetes. Two architectures were used: a deep neural network, which gave an accuracy of 82.67%, and an artificial neural network, which gave an accuracy of 78.62%. Ashiquzzaman et al. [36] proposed a prediction framework for diabetes mellitus using a deep learning approach with reduced overfitting. They used the Pima Indians Diabetes dataset and obtained an accuracy of 88.41%. Cheruki et al. [37] proposed an expert system based on PCA and an adaptive neuro-fuzzy inference system (ANFIS) on the PID dataset for diabetes diagnosis and obtained an accuracy of 89.47%. Kannadasan et al. [38] proposed a new predictive model for type 2 diabetes classification using stacked autoencoders cascaded with a softmax classifier and achieved an accuracy of 86.26%. Kowsher et al. [39] developed a deep ANN and ML classifier, using performance measures such as accuracy and precision to select the best DNN model, and obtained an accuracy of 95.14% on the training dataset. Many other studies have been done, but their classification accuracy ranged between 59.5% and 77.7%.

4 The proposed approach

In the present work, we developed an automatic binary classifier that identifies people affected by diabetes based on some characteristics (features). It is thus a classification into two classes (1: the patient has diabetes, 0: the patient does not have diabetes). All of the input variables that describe each patient are numerical. To build the best predictive model, we used two approaches: the classic ML approach and the DL approach. We first performed the classification task using several ML algorithms: LR, LDA, KNN, CART, NB, and SVM. We then proposed a DNN model composed of many simple layers: one input layer (180 neurons), eight hidden layers (150, 120, 80, 50, 30, 18, 8, and 4 neurons), and finally an output layer (1 neuron). In the last stage, we made a first comparison between the different algorithms and a second comparison between our model and existing models. As programming tools, we used Python, TensorFlow, and Keras, which are among the most widely used in this field. Fig 1 summarizes the classification task based on the two approaches, while Fig 2 presents the detailed architecture of the proposed DNN model, designed to improve the performance of the classification task.
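The layer sizes given above can be written down directly. The following is a minimal sketch and not the author's code: the names `N_INPUTS`, `LAYER_SIZES`, `count_dense_parameters`, and `build_model` are hypothetical, the activations, initializer, loss, and optimizer are assumptions taken from the description in Section 6, and TensorFlow is imported lazily so that the parameter count can be checked without it.

```python
# Layer widths as described in the text: 8 input variables feed a stack of
# fully connected layers with 180, 150, ..., 4 units and a 1-unit output.
N_INPUTS = 8
LAYER_SIZES = [180, 150, 120, 80, 50, 30, 18, 8, 4, 1]


def count_dense_parameters(n_inputs, layer_sizes):
    """Return the number of weights and biases in a stack of dense layers."""
    total = 0
    fan_in = n_inputs
    for units in layer_sizes:
        total += (fan_in + 1) * units  # fan_in weights plus one bias per unit
        fan_in = units
    return total


def build_model():
    """Build and compile the network with Keras (requires TensorFlow)."""
    from tensorflow import keras  # imported here so the rest runs without TF

    model = keras.Sequential()
    model.add(keras.Input(shape=(N_INPUTS,)))
    # ReLU on every layer except the last one (assumption from Section 6).
    for units in LAYER_SIZES[:-1]:
        model.add(keras.layers.Dense(units,
                                     kernel_initializer="random_uniform",
                                     activation="relu"))
    # Sigmoid output so the prediction lies between 0 and 1.
    model.add(keras.layers.Dense(LAYER_SIZES[-1], activation="sigmoid"))
    model.compile(loss="binary_crossentropy", optimizer="adam",
                  metrics=["accuracy"])
    return model


if __name__ == "__main__":
    print(count_dense_parameters(N_INPUTS, LAYER_SIZES))
```

Under these assumptions the stack has 62,901 trainable parameters; calling `build_model().summary()` in an environment with TensorFlow installed should report the same total.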
Figure 1: Diabetes classification process (input patient data; classic ML approach or DL approach; predicted class).
Figure 2: Architecture of the proposed DNN model (input layer with 8 variables, hidden layers 1 to 9, output layer).

5 Experimental work

5.1 Used dataset

In our experiments with the scikit-learn toolkit and DNNs, we used the Pima Indians onset of diabetes dataset, a standard machine learning dataset from the UCI Machine Learning Repository. It describes patient medical record data for Pima Indians and whether they had an onset of diabetes within five years. Our problem is thus a binary classification problem (onset of diabetes as 1, or not as 0). All of the input variables that describe each patient are numerical, which makes the data easy to use directly with neural networks that expect numerical input and output values, and well suited to neural networks built with Python, TensorFlow, and Keras. The dataset includes data from 768 women described by 8 characteristics and a class label:

1. Number of times pregnant
2. Plasma glucose concentration at 2 hours in an oral glucose tolerance test
3. Diastolic blood pressure (mm Hg)
4. Triceps skinfold thickness (mm)
5. 2-hour serum insulin (mu U/ml)
6. Body mass index (weight in kg/(height in m)^2)
7. Diabetes pedigree function
8. Age (years)
9. Class label (last column of the dataset): whether the person has been diagnosed with diabetes (1) or not (0)

5.2 Programming tools

Python: Python is currently one of the most popular languages for scientific applications. Its high-level interactive nature and rich collection of scientific libraries make it a good choice for algorithmic development and exploratory data analysis. It is increasingly used in academic institutions and in industry.
It has a well-known module, scikit-learn, which integrates a large number of ML algorithms for supervised and unsupervised problems, such as decision trees, logistic regression, naïve Bayes, KNN, and ANN. This package makes ML accessible to non-specialists working on general-purpose tasks.

TensorFlow: TensorFlow is a multipurpose open-source library for numerical computation using data flow graphs. It offers APIs for beginners and experts to develop for desktop, mobile, web, and cloud. TensorFlow can be used from many programming languages, such as Python, C++, Java, and R, and runs on a variety of platforms, including Unix, Windows, iOS, and Android. TensorFlow can run on single machines (CPU, GPU, TPU) or on distributed machines with hundreds of GPU cards.

Keras: Keras is the official high-level API of TensorFlow. Its main characteristics are the following: it is a minimalist, highly modular neural network library written in Python; it can run on top of either TensorFlow or Theano; it is widely adopted in industry and in the research community; it makes models easy to bring to production; it supports both convolutional and recurrent networks and combinations of the two; it supports arbitrary connectivity schemes (including multi-input and multi-output training); and it runs seamlessly on CPU and GPU.

5.3 Evaluation

To validate the different ML algorithms and obtain the best model, we used 10-fold cross-validation: the dataset is split into 10 parts, the model is trained on 9 parts and tested on the remaining one, and this is repeated for all train/test splits. For the DNN model, we used two measures: the loss value and the accuracy metric. The measures used in this work are defined as follows.

1. Accuracy: the number of correctly predicted instances divided by the total number of instances in the dataset, multiplied by 100 to give a percentage (e.g., 90% accurate).
2. Loss value: used to optimize an ML algorithm or DL model. It must be calculated on the training and validation datasets. It is interpreted as how well the ML algorithm or DL model is doing on these two datasets, and it gives the sum of the errors made for each example in the training or validation set.
3. Precision: the number of true positive results divided by the total number of positive results predicted by the classifier.
4. Recall: the number of true positive results divided by the number of all relevant samples in the dataset (all samples that should have been identified as positive).
5. F1-score: the harmonic mean of precision and recall, with range [0, 1]. It indicates how precise the classifier is (how many instances it classifies correctly) as well as how robust it is.
6. Confusion matrix: one of the simplest tools for assessing the correctness and accuracy of a model. It is used for classification problems where the output can take two or more classes, and it gives the correctness for each class.

6 Illustration of obtained results

To build the best predictive model, we performed two tasks:

1. Applying several ML algorithms: Logistic Regression (LR), Linear Discriminant Analysis (LDA), K-Nearest Neighbors (KNN), Decision Tree (CART variant), Gaussian Naïve Bayes (NB), and Support Vector Machine (SVM). For this purpose, we used the scikit-learn library of Python, which contains the best-known learning algorithms.
2. Designing a DNN (deep neural network) model according to the following process:
▪ We proposed a model composed of ten (10) fully connected layers: layer 1 (180 neurons, expecting 8 input variables), layer 2 (150 neurons), layer 3 (120 neurons), layer 4 (80 neurons), layer 5 (50 neurons), layer 6 (30 neurons), layer 7 (18 neurons), layer 8 (8 neurons), layer 9 (4 neurons), and finally layer 10, the output layer, with 1 neuron to predict the class (onset of diabetes or not).
▪ The ten (10) fully connected layers are defined using the Dense class of Keras, which takes the number of neurons in the layer as the first argument, the initialization method as the second argument, and the activation function through the activation argument.
▪ We initialize the network weights to small random numbers generated from a uniform distribution (‘uniform’), in this case between -0.05 and 0.05, which is the default uniform weight initialization in Keras. An alternative is ‘normal’, for small random numbers generated from a Gaussian distribution.
▪ We use the rectifier (‘relu’) activation function on all layers except the output layer, where we use a sigmoid function to ensure that the network output is between 0 and 1 and is easy to map to a probability of class 1 or class 0.
▪ We compile the model, which relies on the efficient numerical libraries that Keras uses under the covers (the so-called backend), such as TensorFlow. The backend automatically chooses the best way to represent the network for training and making predictions on the available hardware (we used a CPU in our application).
▪ When compiling, we must specify some additional properties required for training the network. Training a network means finding the best set of weights to make predictions for this problem.
▪ Specifically, we must specify the loss function used to evaluate a set of weights, the optimizer used to search through different weights of the network, and any optional metrics we would like to collect and report during training. Since our problem is a binary classification, we used the logarithmic loss, which is defined in Keras as “binary_crossentropy”.
▪ We also use the gradient descent algorithm “adam” because it is an efficient default.
▪ Finally, since it is a classification problem, we report the classification accuracy as the performance metric.
▪ We then execute the model on some data.
▪ We train (fit) the model on the loaded data by calling the fit() function on the model. The training process runs for a fixed number of iterations through the dataset, called epochs, which we specify using the epochs argument (nb_epoch in Keras 1). We can also set the number of instances that are evaluated before a weight update is performed, called the batch size, using the batch_size argument. In our case, we fixed the following values: 350 epochs and a batch size of 10. These values were chosen experimentally by trial and error.
▪ We trained our DNN on the entire dataset (training set) and evaluated its performance on a part of the same dataset (test set) using the evaluate() function. This generates a prediction for each input and output pair and collects scores, including the average loss and any configured metrics, such as accuracy.

The following tables summarize the results obtained with the ML algorithms and the DNN model.

7 Discussion

Table 1 summarizes the results obtained when applying the different ML algorithms: LR, LDA, KNN, DT (CART), NB, and SVM. LR, LDA, KNN, CART, and NB give a classification accuracy above 70%, while SVM gives a relatively low accuracy (65%) compared to the other algorithms. Table 2 gives the performance of KNN with the Minkowski metric by class, in terms of three measures: precision, recall, and F1-score. Class 0 (no diabetes) has high values for the three measures, but class 1 (onset of diabetes) does not, which indicates that the classifier has difficulty detecting diabetic patients. Finally, Table 2 reports the micro average, the macro average, and the weighted average of the different performance measures.
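The relation between the per-class values and the averages in Table 2 can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names `f1_from_precision_recall` and `macro_average` are hypothetical, the precision and recall values are copied from Table 2, and the micro and weighted averages are left out because they require the per-class sample counts, which are not reported.

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_average(values):
    """Unweighted arithmetic mean of the per-class values."""
    return sum(values) / len(values)


# Per-class precision and recall as reported in Table 2 (class 0, class 1).
precision = [0.75, 0.62]
recall = [0.84, 0.49]

f1_per_class = [f1_from_precision_recall(p, r)
                for p, r in zip(precision, recall)]
macro_precision = macro_average(precision)
macro_recall = macro_average(recall)
macro_f1 = macro_average(f1_per_class)

if __name__ == "__main__":
    print([round(f, 2) for f in f1_per_class])
    print(round(macro_f1, 2))
```

With the Table 2 inputs, the per-class F1-scores round to 0.79 and 0.55 and the macro F1-score rounds to 0.67, which agrees with the reported values.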
These averages summarize the per-class values: the macro average is the unweighted mean over the classes, the weighted average weights each class by its number of samples, and the micro average is computed over all samples together.

Table 3 presents the results obtained when applying the proposed DNN model to the training set and the test set. Two performance measures are considered: the loss value, which sums the errors after training the model, and the accuracy, which gives the rate of correct predictions. The loss is very low and the accuracy is very high, and both depend on the size of the set used, which is why the accuracy on the training set is higher than on the test set. Figure 5 shows the evolution of the training loss and the validation loss over the epochs. The training loss begins very high and ends very low because of the large number of samples, whereas the validation loss varies slowly and appears relatively stable. Similarly, Figure 6 plots the evolution of the training accuracy and the validation accuracy over the epochs. Contrary to the loss, the accuracy starts very low and ends very high. This behavior is clearer for the training set because of its large size.

Algorithm                        Accuracy
LR                               77.09%
LDA                              77.61%
KNN (k=5, metric = Minkowski)    71.36%
CART                             70.49%
NB                               75.87%
SVM                              65.27%
Table 1: Average accuracy after applying the different ML algorithms.

Class          Precision   Recall   F1-score
0              0.75        0.84     0.79
1              0.62        0.49     0.55
Micro Avg      0.71        0.71     0.71
Macro Avg      0.69        0.66     0.67
Weighted Avg   0.70        0.71     0.70
Table 2: Performance report for KNN (k=5, metric = Minkowski).

DNN (180, 150, 120, 80, 50, 30, 18, 8, 4, 1)
Set            Loss      Accuracy
Training set   0.0763    97.39%
Test set       0.26      94.27%
Table 3: Loss and accuracy values obtained when applying the proposed model.

We can also see in Figures 5 and 6, which show the loss and the accuracy respectively, that the curves are not smooth but contain some peaks. This is because the loss and the accuracy do not progress monotonically over the epochs but increase and decrease until they stabilize around a specific value.

8 Comparison between the ML and DNN approaches

In this section, we compare the different ML algorithms with the DNN model. The result of this comparison is given in Table 4. According to Table 4 and Figure 3, the comparison favors the DNN model over the ML algorithms. We can confirm that the DL approach always gives the best accuracy if a suitable architecture is chosen.

Algorithm                        Accuracy rate
LR                               77.09%
LDA                              77.61%
KNN (k=5, metric = Minkowski)    71.36%
CART                             70.49%
NB                               75.87%
SVM                              65.27%
DNN model                        Train-Acc: 97.39%, Test-Acc: 94.27%
Table 4: Comparison between the ML approach and the DL approach.

Figure 3: Comparison between different algorithms (bar chart of the accuracy rate, on a scale from 0 to 100, for LR, LDA, KNN, CART, NB, SVM, and the DNN model).

9 Comparison with other works

In this section, we compare our proposed model with other state-of-the-art models, as given in Table 5 and shown in Figure 4.

Author                      Year   Accuracy
Kayaer and Yıldırım [29]    2003   80.21%
Goncalves et al. [30]       2006   80.08%
Polat and Gunes [31]        2007   79.16%
Kim et al. [32]             2015   83.11%
Dwivedi [34]                2017   82%
Vijayashree et al. [35]     2017   82.67%
Ashiquzzaman et al. [36]    2017   88.41%
Cheruki et al. [37]         2017   89.47%
Caliskan et al. [33]        2018   77.09%
Kannadasan et al. [38]      2018   86.26%
Kowsher et al. [39]         2020   95.14%
Proposed model              2021   97.39%
Table 5: Comparison with other existing models.

Figure 4: Comparison with other existing models (bar chart on a scale from 0 to 100).
Figure 5: Training loss vs. validation loss of the DNN model.
Figure 6: Training accuracy vs. validation accuracy of the DNN model.

10 Conclusion and future suggestions

In recent years, object recognition has relied essentially on the ML approach, which gives high performance. More recently, important progress has been made in ML, especially with the emergence of a new subfield called deep learning. It is mainly based on the use of neural networks with many simple interconnected units to extract meaningful patterns from a large amount of data in order to solve complex problems such as medical image classification, fraud detection, and character recognition. Currently, we can use larger datasets to learn powerful models, and better techniques to avoid overfitting and underfitting. To date, the results obtained in this area of research are remarkable in different domains, with very high accuracy values that often exceed 90%. For example, the accuracy rate on the digits set is over 97%. In the present paper, we performed a classification task on the Pima Indians Diabetes (PID) dataset. In the first stage, we used several ML algorithms: LR, LDA, KNN, DT (CART), NB, and SVM. We obtained good accuracy, especially with LR, LDA, and KNN. In the second stage, we built a DNN model to perform the same classification task. The achieved performance is very high. We concluded our work with a broad comparison between the different algorithms, which favored the DL approach through the DNN model we built. Furthermore, we compared our model with existing models; our proposed model gives the best performance (the highest accuracy).
As a perspective of this work, we propose to improve these results by improving the architecture of the DNN model: studying other ANN architectures and changing model parameters such as the number of layers, the number of neurons in each layer, the number of training epochs, and the batch size. Another suggestion is to use other types of DNN, or to combine CNNs with recurrent neural networks (RNNs).

References

[1] Ali A., Fakhreldeen A. S. (2021). A comparative analysis of machine learning algorithms to build a predictive model for detecting diabetes complications. Informatica journal, vol 45(1), pp. 117-125, https://doi.org/10.31449/inf.v45i1.3111
[2] Gjoreski M. (2021). A method for combining classical and deep machine learning for mobile health and behavior monitoring. Informatica journal, vol 45(1), pp. 169-170, https://doi.org/10.31449/inf.v45i1.3482
[3] Lee H., Grosse R., Ranganath R., and Ng A.Y. (2009). Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 609–616. ACM. https://doi.org/10.1145/1553374.1553453
[4] Pinto N., Doukhan D., DiCarlo J.J., and Cox D.D. (2009). A high-throughput screening approach to discovering good forms of biologically inspired visual representation. PLoS computational biology, 5(11):e1000579. https://doi.org/10.1371/journal.pcbi.1000579
[5] Turaga S.C., Murray J.F., Jain V., Roth F., Helmstaedter M., Briggman K., Denk W., and Seung H.S. (2010). Convolutional networks can learn to generate affinity graphs for image segmentation. Neural Computation, 22(2):511–538.
https://doi.org/10.1162/neco.2009.10-08-881
[6] Abadi M, Agarwal A, Barham P, Brevdo E, Chen Z, Citro C, Corrado GS, Davis A, Dean J, Devin M, Ghemawat S, Goodfellow I, Harp A, Irving G, Isard M, Jia Y, Jozefowicz R, Kaiser L, Kudlur M, Levenberg J, Mané D, Monga R, Moore S, Murray D, Olah C, Schuster M, Shlens J, Steiner B, Sutskever I, Talwar K, Tucker P, Vanhoucke V, Vasudevan V, Viégas F, Vinyals O, Warden P, Wattenberg M, Wicke M, Yu Y, Zheng X. (2018). TensorFlow: large-scale machine learning on heterogeneous systems 2015. http://tensorflow.org/. Accessed 1 Nov 2018. https://doi.org/10.1145/3190508.3190551
Theano Development Team. (2016). Theano: a Python framework for fast computation of mathematical expressions. arXiv e-prints arXiv:1605.02688. https://doi.org/10.1016/b978-2-294-71225-8.00029-9
Sivaslioglu S, Ozgur F., Şahinbaş K. (2021). A generative model based adversarial security of deep learning and linear classifier models. Informatica journal, vol 45(1), pp. 33-64, https://doi.org/10.31449/inf.v45i1.3234
[7] Chollet F, et al. (2015). Keras. Retrieved from https://keras.io. Accessed 1 Nov 2018.
[8] Chetlur S, Woolley C, Vandermersch P, Cohen J, Tran J, Catanzaro B, Shelhamer E. (2014). cuDNN: Efficient primitives for deep learning. arXiv:1410.0759 [cs.NE]
Krizhevsky A, Sutskever I, Hinton GE (2012). ImageNet classification with deep convolutional neural networks. Neural information processing systems. p. 25. https://doi.org/10.1145/3065386
[9] Fukushima K. (1980). Neocognitron: a self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biol Cybern. 1980;36(4):193–202. https://doi.org/10.1007/bf00344251
[10] LeCun Y, Bottou L, Bengio Y, Haffner P (1998). Gradient-based learning applied to document recognition. Proc IEEE. 86(11):2278–324. https://doi.org/10.1109/5.726791
[11] Witten IH, Frank E, Hall MA, Pal CJ (2016). Data mining, Fourth Edition: Practical machine learning tools and techniques. 4th ed. San Francisco: Morgan Kaufmann Publishers Inc. https://doi.org/10.1186/1475-925x-5-51
[12] Goodfellow I, Bengio Y, Courville A (2016). Deep learning. Cambridge: The MIT Press; 2016. ISBN 9780262035613
[13] Minar MR, Naher J. (2018). Recent advances in deep learning: an overview. arXiv:1807.08169 [cs.LG] http://dx.doi.org/10.13140/RG.2.2.24831.10403
[14] LeCun Y, Bengio Y, Hinton G (2015). Deep learning. Nature;521:436. https://doi.org/10.1038/nature14539
[15] Schmidhuber J. (2015). Deep learning in neural networks: an overview. Neural Networks; 61:85–117. https://doi.org/10.1016/j.neunet.2014.09.003
[16] Rumelhart DE, Hinton GE, Williams RJ. (1986). Learning representations by back-propagating errors. Nature;323:533–536. https://doi.org/10.1038/323533a0
[17] Le Cun Y., Boser B., Denker J.S., Henderson D., Howard R.E., Hubbard W., Jackel L.D. (1990). Handwritten digit recognition with a back-propagation network. In Advances in neural information processing systems. https://doi.org/10.1109/ijcnn.1990.137801
[18] LeCun Y, Boser B, Denker JS, Henderson D, Howard RE, Hubbard W, Jackel LD. (1989). Backpropagation applied to handwritten zip code recognition. Neural Comput;1(4):541–51. https://doi.org/10.1162/neco.1989.1.4.541
[19] Hinton GE, Osindero S, Teh Y-W. (2006). A fast learning algorithm for deep belief nets. Neural Comput;18(7):1527–54. https://doi.org/10.1162/neco.2006.18.7.1527
[20] Bengio Y, Lamblin P, Popovici D, Larochelle H. (2007). Greedy layer-wise training of deep networks. In: Proceedings of the 19th international conference on neural information processing systems. NIPS’06. MIT Press, Cambridge, MA, USA. p. 153–60.
[21] Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M, Berg AC, Fei-Fei L. (2015). ImageNet large scale visual recognition challenge. Int J Comput Vision (IJCV);115(3):211–52. https://doi.org/10.1007/s11263-015-0816-y
[22] Kumar M. (2016). An incorporation of artificial intelligence capabilities in cloud computing. Int J Eng Comput Sci. https://doi.org/10.18535/ijecs/v5i11.63
[23] Saiyeda A, Mir MA. (2017). Cloud computing for deep learning analytics: a survey of current trends and challenges. Int J Adv Res Comput Sci;8(2):68–72. https://doi.org/10.26483/ijarcs.v8i2.2931
Dumbill E. (2012). What is big data?: an introduction to the big data landscape. http://radar.oreilly.com/2012/01/what-is-big-data.html
[24] Hongbiao Ni. (2020). Face recognition based on deep learning under the background of big data. Informatica journal, vol 44(4), pp. 491-495, https://doi.org/10.31449/inf.v44i4.3390
[25] Najafabadi MM, Villanustre F, Khoshgoftaar TM, Seliya N, Wald R, Muharemagic E. (2015). Deep learning applications and challenges in big data analytics. J Big Data; 2(1):1. https://doi.org/10.1186/s40537-014-0007-7
[26] Hinton G, Salakhutdinov R. (2011). Discovering binary codes for documents by learning deep generative models. Top Cogn Sci.;3(1):74–91. https://doi.org/10.1111/j.1756-8765.2010.01109.x
[27] Salakhutdinov R, Hinton G. (2009). Semantic hashing. Int J Approx Reason;50(7):969–78. doi:10.1016/j.ijar.2008.11.006
[28] Kayaer K, Yıldırım T. (2003). Medical diagnosis on pima indian diabetes using general regression neural networks. Proceedings of the International Conference on Artificial Neural Networks and Neural Information Processing (ICANN/ICONIP):181–184, Istanbul, Turkey, June 26–29. https://doi.org/10.1007/3-540-44989-2_127
[29] Goncalves L. B and Bernardes M.M. (2006). “Inverted Hierarchical Neuro-Fuzzy BSP System: A Novel Neuro-Fuzzy Model for Pattern Classification and Rule Extraction in Databases,” in IEEE Transactions on Systems, Man, and Cybernetics, vol. 36, no. 2, pp. 236-248, Mar. 2006. https://doi.org/10.1109/tsmcc.2004.843220
[30] Polat K, Gunes S. (2007).
An expert system approach based on principal component analysis and adaptive neuro-fuzzy inference system to diagnosis of diabetes disease. Digit Signal Process, 17(4):702–710 https://doi.org/10.1016/j.dsp.2006.09.005. [31] Kim S, Yu Z, Kil R, Lee M. (2015). Deep learning of support vector machines with class probability output networks. Neural Network. 64, 19–28. https://doi.org/10.1016/j.neunet.2014.09.007 [32] Caliskan A, Yuksel ME, Badem H, Basturk A (2018). Performance improvement of deep neural network classifiers by a simple training strategy. Eng Appl Artif Intell; 67:14–23. https://doi.org/10.1016/j.engappai.2017.09.002. [33] Kumar A Dwivedi. (2017). Analysis of computational intelligence techniques for diabetes mellitus prediction,” Neural Comput. Appl., vol. 13, no. 3, pp. 1–9. DOI:10.1007/s00521-017-2969-9 [34] Vijayashree J and Jayashree, J. (2017). An Expert System for the Diagnosis of Diabetic Patients using Deep Neural Networks and Recursive Feature Elimination,” International Journal of Civil Engineering and Technology, vol. 8, pp. 633-641. ISSN Print: 0976-6308 and ISSN Online: 0976-6316 [35] Ashiquzzaman A, Tushar A. K., Islam M,. Kim J.-M et al., ``Reduction of overfitting in diabetes prediction using deep learning neural network,'' arXiv:1707.08386 [cs.CV]. DOI: 10.5815/ijieeb.2019.02.03 [36] Cheruku R, Edla and DR, Kuppili V, Sm-ruleminer. (2017). Spider monkey-based rule miner using novel fitness function for diabetes classification. Comput Biol Med;81:79–92 https://doi.org/10.1016/j.compbiomed.2016.12.009 [37] Kannadasan K, Edla D.R., Kuppili V. (2018). Type 2 diabetes data classification using stacked autoencoders in deep neural networks. Clin. Epidemiol. Glob. Health. 7(4), 530–535. https://doi.org/10.1016/j.cegh.2018.12.004 [38] Kowsher M., Turaba M.Y., Sajed T et al., (2020). 
Prognosis and treatment prediction of type-2 diabetes using deep neural network and machine learning classifiers, in International Conference on Computer and Information Technology (ICCIT). http://dx.doi.org/10.1109/ICCIT48885.2019.903857 4