https://doi.org/10.31449/inf.v45i7.3774
Informatica 45 (2021) 115–124

ResNet-34/DR: A Residual Convolutional Neural Network for the Diagnosis of Diabetic Retinopathy

Noor M. Al-Moosawi
College of Computer Science & Information Technology, University of Basrah, Iraq
E-mail: almoosawinoor2@gmail.com

Raidah S. Khudeyer
College of Computer Science & Information Technology, University of Basrah, Iraq
E-mail: raidah.khudayer@uobasrah.edu.iq

Keywords: Convolutional Neural Networks (CNN), Deep Learning (DL), Diabetic Retinopathy (DR), ResNet-34, Transfer Learning (TL)

Received: October 10, 2021

Diabetic retinopathy (DR) is an eye complication associated with diabetes, resulting in blurred vision or blindness. The early diagnosis and treatment of DR can dramatically decrease the risk of vision loss. However, such diagnosis is a tedious and complicated task due to the variability of retinal changes across the stages of the disease, and due to the high number of undiagnosed and untreated DR cases. In this paper, we develop a computationally efficient and scalable deep learning model, based on convolutional neural networks (CNNs), for diagnosing DR automatically. Various preprocessing algorithms are utilized to improve accuracy, and a transfer learning strategy is adopted to speed up training. Our experiments used the fundus image sets available in online Kaggle datasets. Across the applicable performance metrics, our model achieved a relatively high F1 score of 93.2% for stage-based DR classification.

Povzetek: Opisana je metoda globokih nevronskih mrež za diagnozo težav vida zaradi sladkorne bolezni.

1 Introduction

Diabetic retinopathy (DR) is a disease that affects the eye as a complication of diabetes, causing impaired vision as a result of damage to the retina, the light-sensitive tissue at the back of the eye that is required for vision [1], [2]. Diabetes harms the blood vessels in the retina, and the longer a person has diabetes, the more likely that person is to develop DR. According to the World Health Organization (WHO), the global population of DR patients is expected to increase to 592 million by 2025 [1].

DR develops through several stages of increasing severity and, if left untreated, can lead to blindness [3]. DR is mainly classified into non-proliferative (NPDR) and proliferative (PDR); NPDR can further be classified as mild, moderate, or severe. Figure 1 shows examples of the different stages. The DR stages are as follows [3]-[5]:

a) No DR: the eye is healthy.
b) Mild NPDR: small swellings appear in the retinal blood vessels.
c) Moderate NPDR: as the disease progresses, some retinal blood vessels become blocked.
d) Severe NPDR: more blood vessels are blocked, depriving the retina of oxygen and nutrients.
e) PDR: at this stage, the growth (proliferation) of new blood vessels is stimulated. However, such new blood vessels have an abnormal appearance and very thin, fragile walls. When these vessels bleed, they can cause severe vision loss and even blindness.

Figure 1: Examples of fundus images for DR stages according to disease severity.

The early detection of the disease helps avoid complications and improves the chances of recovery; more than 90% of patients can avoid vision loss through early detection and treatment [3]. Typically, an ophthalmologist diagnoses DR by manually interpreting and analyzing fundus photographs. However, DR diagnosis is a tedious and complicated task due to the variability of retinal changes across the stages of the disease, and due to the high number of undiagnosed and untreated DR cases.
Human competency is prone to error, and novel computational techniques are being pursued in an attempt to overcome this problem. Diagnosis can be more reliable if it is based on extracted features that are highly discriminative and resistant to specific conditions, such as lighting changes. Deep learning and CNNs are the most recent methods for extracting features, and CNN-extracted features have a high discriminative capacity [6]. Previous methods that relied on very deep CNN models, such as GoogLeNet, VggNet, and ResNet, for DR diagnosis achieved good accuracy rates. However, it is still possible to further optimize the outputs by making some improvements to the models, as follows:

a) We propose an efficient CNN model, based on ResNet-34 with transfer learning, for DR diagnosis with higher accuracy.
b) The current APTOS 2019 and IDRID datasets are analyzed to evaluate the performance of the proposed ResNet-34/DR model.
c) Transfer learning with pre-trained deep CNNs and hyperparameter tuning are critical components of the training process and have proven highly beneficial in medical image analysis. We initialize the weights with those learned on the ImageNet dataset instead of initializing them randomly.
d) Compared with the accuracy rates of the modified GoogLeNet and VggNet models, ResNet-34/DR yielded better classification performance.

This paper is structured as follows. Section 2 provides an overview of deep learning, CNNs for image classification, and the need for Transfer Learning (TL) in our task. The methodology used in this paper is described in detail in Section 3. Section 4 explains the preprocessing of the dataset and the training process, with the hyperparameters for all experiments. The final results of the experiments are summarized in Section 5. In Section 6, the performance of our work is discussed against previous studies. Finally, Section 7 concludes the paper and outlines future work.

2 Background

Numerous previous studies have employed a variety of methods to handle the problem of diagnosing DR. We highlight them in this section. Table 1 summarizes the previous studies discussed here.

2.1 Deep learning methods

Recent developments in artificial intelligence (AI) have paved the road for big advances in automatic diagnosis in various medical fields as compared with manual methods. Computer-Aided Diagnosis (CAD) systems offer benefits such as the reduction of human error, support for medical decisions, and improved patient care. The diagnosis of DR essentially relies on image processing techniques using the latest AI technologies, particularly machine learning (ML) and deep learning (DL), where DL is a special type of ML involving a deeper level of data analysis and, hence, deeper learning [7]. DL has quickly established itself as a valuable technique for the analysis and classification of medical images [3]-[5]. Previous studies that relied on machine learning methods and feature extraction have produced excellent work for the diagnosis of DR, as characterized by hard exudates, red lesions, microaneurysms, and blood vessels [8].
The classifiers used to accomplish this task include neural networks, random forests, sparse representation classifiers, linear discriminant analysis (LDA), support vector machines (SVM), and the K-nearest neighbors (KNN) algorithm [3], [9]. Such techniques assemble fundus images of healthy and infected eyes for analysis. DL methods for DR diagnosis also make it possible to inspect which features the corresponding architecture treats as diagnostic indicators, which helps researchers identify the most significant areas in the images [10], for example by adding a global average pooling layer to the CNN instead of a fully-connected layer. Convolutional neural networks (CNNs) are discussed in the next section as a new and inspiring DL method for providing a more accurate, more detailed, and hence more useful diagnosis of DR.

2.2 CNN overview

The major limitations of the majority of the aforementioned techniques are that: a) they merely give a binary result, "DR" or "no DR", which is, practically, mere detection rather than a full-scale classification; and b) most models have been trained on small samples, limiting the generalizability of their findings. Therefore, such automatic diagnosis systems are limited [11].

The development of CNN layers has provided a greater ability to classify images and detect patterns, objects, and other distinguishing features in a picture [12]. These are multiple computational layers that involve the application of image analysis filters in the form of convolutions [13]. By convolving multiple filters over an image within a layer, a feature map is generated and used as the input to the next layer, enabling the processing of images as pixels in order to generate the required classification (in our case, a diagnosis) as an output. Such a classification approach within a single classifier replaces the multiple steps of previous image analysis methods [14], thus enabling a faster and more efficient image interpretation process. CNNs have been used widely in computer vision in general, and medical imaging in particular, thanks to their great ability to handle and process images, and they have become a state-of-the-art technique in various medical fields.

CNNs generally consist of three types of layers [15]: i) convolutional layers, where a number of filters are applied to identify a certain feature or pattern in the input image; a stack of filters is used in order to extract various features, and the values of such filters are tuned during training to extract the attributes associated with the disease (disease indicators); ii) pooling layers, which reduce the feature map extracted by the filters in order to reduce the necessary computation while retaining the best values of the attributes resulting from the convolutional layers; and iii) fully-connected (FC) layers, where the final classification process takes place, as every neuron is connected to the neurons of the preceding layer. In addition to such layers, an activation function is employed. The number and sequence of layers vary depending on the complexity of the corresponding problem; the sketch below illustrates the three layer types.
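As a minimal illustration only, and not one of the paper's models, the following PyTorch sketch wires the three layer types together; the channel counts, the 224 x 224 input, and the two-class output are arbitrary choices for the example:

```python
# A minimal sketch of the three CNN layer types described above:
# convolutional layers, pooling layers, and a fully-connected classifier.
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),  # filters extract local patterns
            nn.ReLU(),                                   # activation function
            nn.MaxPool2d(2),                             # pooling shrinks the feature map
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # FC layer: every neuron connects to the whole preceding feature map
        self.classifier = nn.Linear(32 * 56 * 56, num_classes)

    def forward(self, x):
        x = self.features(x)       # stacked feature maps
        x = torch.flatten(x, 1)
        return self.classifier(x)  # final classification

logits = TinyCNN()(torch.randn(1, 3, 224, 224))  # one 224 x 224 RGB image
```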
Since 2015, researchers have relied on CNNs as a powerful tool in the field of computer-aided diagnosis. For example, in [16] a CNN was utilized for the diagnosis of DR in fundus images as one of two classes, normal and abnormal. The proposed architecture relied on linking three stacks of convolutional filters in parallel, with the output obtained through global max pooling. The RGB layers of each image were separated in order to use the green layer only, as it is the layer that demonstrates the attributes of the disease in the clearest and most distinguishable manner. This architecture helped reduce the number of parameters and avoid overfitting, yielding a final accuracy of 81% when experimented on 12,000 images. In another work [12], a methodology was proposed for the further classification of the image database (APTOS 2019) into three stages: a) no DR, b) moderate DR, and c) severe DR. The architecture consisted of 18 convolutional layers and 3 fully-connected layers, in addition to max pooling and the preprocessing techniques of image resizing and data augmentation. This architecture yielded an accuracy of 88%.

2.3 The need for Transfer Learning

As implied above, DL requires a huge quantity of data for the efficient training of a CNN. This is not usually possible in the field of ophthalmology, as the available real data are relatively limited and unbalanced. Therefore, researchers rely intensively on transfer learning (TL) to overcome the obstacles of computational time and the need for ongoing training. TL is a method of overcoming the limitedness of data by leveraging knowledge from another domain [15]. CNNs and TL are the two main methods of automatic DR diagnosis using DL techniques [9], [12], [14]. TL has proven itself a very effective technique, especially when handling domains of limited data [16]. Instead of completely training a blank network from scratch, the weights of the lower layers, which have already been optimized to identify the structures that are generally found in images, can be fixed, while the weights of the upper layers are retrained with backpropagation. This enables the model to identify the unique features of a given set of images, such as fundus images, with much less time, fewer training examples, and less computational power [14], [19]. Data are analyzed in different ways based on the complexity of the problem and its similarity to, or difference from, the data on which the neural network was originally trained. Figure 2 presents the relationship between data similarity, data size, and the required tuning.

Figure 2: The relationship between data similarity, data size, and network tuning.

TL methods include feature extraction, copying the architecture of a pre-trained model, and freezing some layers while training the others; the sketch below shows this freeze-and-retrain recipe.
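As a hedged sketch of that recipe, using torchvision: the choice of ResNet-34 as the backbone, and of unfreezing only the topmost residual group, are illustrative assumptions rather than the paper's exact configuration.

```python
# A sketch of transfer learning by freezing: keep the ImageNet-optimized
# lower layers fixed and retrain only the upper layers and the new head.
import torch.nn as nn
from torchvision import models

model = models.resnet34(pretrained=True)       # weights learned on ImageNet

for param in model.parameters():               # freeze everything first
    param.requires_grad = False

for param in model.layer4.parameters():        # unfreeze the topmost residual group
    param.requires_grad = True

model.fc = nn.Linear(model.fc.in_features, 4)  # new head; fresh layers train by default
```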
An experiment was conducted on three TL models [20], namely Vgg-16, Vgg-19, and InceptionV3, in addition to the techniques of data augmentation and local average coloring for the removal of camera noise. The data were classified once binarily and once into five classes. The experiment demonstrated that the achieved accuracy is directly associated with the number of convolution and pooling layers in the model. Vgg-19 achieved accuracies of up to 80% and 76.9% in the binary and five-class classifications, respectively. In [21], an architecture was deployed for DR diagnosis using the Messidor-1 dataset and the pre-trained GoogLeNet and AlexNet architectures; GoogLeNet achieved the highest accuracy of 66% on that dataset. In continuation of such efforts, we seek herein to devise an effective transfer learning algorithm for processing fundus images that provides a faster and more accurate identification of the distinguishing pathological features of every eye image.

Research | Year | DR stages | Method | Accuracy
[21] | 2018 | Four-stage | TL with GoogLeNet, using the Adam optimizer and dropout regularization; CLAHE filtering for preprocessing | 66%
[16] | 2019 | Binary (normal/abnormal) | CNN architecture with three stacked convolutional filter paths in parallel and a global max pooling layer, with several preprocessing steps | 81%
[20] | 2019 | Binary; five-stage | TL models Vgg-16, Vgg-19, InceptionV3 with data augmentation and local average coloring | 80%; 76.9%
[12] | 2020 | Three-stage (normal/moderate/severe) | CNN with 18 Conv layers and 3 FC layers, with preprocessing and data augmentation techniques | 88%

Table 1: A summary of previous research on the diagnosis of DR using various methods.

2.4 ResNet-34

As neural networks are inspired by the human brain and how it thinks, it is quite natural that the solution of complex problems that require deeper thinking is simulated by deeper networks. The main problem facing deep networks is that of vanishing gradients [22]. ResNet (residual network) [23] is a type of neural network that alleviates this problem by using skip connections to "skip" a number of convolutional layers in every basic block of the network, which provides alternative paths for original and derived data, rendering training faster and more feasible. Such skip connections add the outputs of the prior blocks to the following ones, as expressed by the following equation:

y = F(x) + x    (1)

where x is the input, y is the output, and F is the residual function. Each basic block consists of two 3 x 3 convolution layers, each followed by batch normalization (BN) and a ReLU activation function. Figure 3 shows a learning block of residual learning.

Figure 3: A learning block of residual learning.

Using ResNet has greatly improved the performance of neural networks, as such networks can be stacked with more layers to create a deeper architecture and, hence, deeper learning, in contrast with shallower learning. ResNet-34 [23] (ResNet with 34 layers) consists of 33 convolution layers, a max-pooling layer (3 x 3) at the stem, and an average pooling layer, followed by a fully connected layer. Table 2 shows the architecture of ResNet-34.

Layer name | Output size | 34-layer
Conv1 | 112 x 112 | 7 x 7, 64, stride 2; then 3 x 3 max pool, stride 2
Conv2-x | 56 x 56 | [3 x 3, 64; 3 x 3, 64] x 3
Conv3-x | 28 x 28 | [3 x 3, 128; 3 x 3, 128] x 4
Conv4-x | 14 x 14 | [3 x 3, 256; 3 x 3, 256] x 6
Conv5-x | 7 x 7 | [3 x 3, 512; 3 x 3, 512] x 3
Output | 1 x 1 | Average pool, 1000-d fc, Softmax

Table 2: ResNet-34 architecture.
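The following PyTorch sketch shows the basic block of equation (1) in its identity-mapping form; real ResNet blocks that change resolution between the groups in Table 2 would additionally need a strided projection on the skip path, which is omitted here for brevity.

```python
# A sketch of the residual basic block: two 3x3 convolutions, each with
# batch normalization, plus an identity skip connection (y = F(x) + x).
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        residual = x                          # the skip path carries x unchanged
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + residual)      # y = F(x) + x, equation (1)

out = BasicBlock(64)(torch.randn(1, 64, 56, 56))  # matches the Conv2-x group size
```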
3 Methodology

In order to develop the model with the best performance, we used pre-trained CNN models that were originally trained and tested on the ImageNet dataset. On our dataset, each of VggNet (Vgg-19), GoogLeNet (Xception), and ResNet (ResNet-34) was trained and tested with a number of refinements per model. The hyperparameters were tuned to enhance the networks' ability to capture complex patterns in DR images. Several preprocessing and data augmentation techniques were applied uniformly to the dataset's fundus images in all experiments in order to obtain comparable results.

3.1 Relevant approaches

Vgg-19

The Visual Geometry Group (VGG) model presented in [24], consisting of 19 weighted layers divided into 5 blocks, was the first one trained. Each block consists of 2-4 convolution (Conv) layers, followed by a pooling layer for the reduction of dimensions. The top of the architecture includes 3 fully connected (FC) layers. Figure 4 shows the proposed modified Vgg-19 model. The FC layers were omitted from the original architecture and replaced with our custom classifier. Global average pooling (GAP) was added to convert the output of the convolutional layers (n x n x d) into a 1-dimensional vector (1 x 1 x d) as an input to the required FC layers. The dropout (0.25) regularization technique was used to reduce overfitting, and ReLU (y = max(0, x)) was used as the activation function in the FC layer. The prediction layer has 4 nodes and a Softmax activation function for predicting the four stages of DR:

F(x_i) = e^{x_i} / \sum_{j=0}^{k} e^{x_j}    (2)

Figure 4: The proposed modified Vgg-19 model.

Xception Model

The next model used was Xception [25], a CNN architecture that relies on depthwise separable convolutions, which contribute effectively to reducing the computational cost and the required memory size. A depthwise separable convolution is an independent spatial convolution for each channel, followed by a pointwise (1 x 1) convolution across the channels. This can be thought of as first looking for correlations in a 2D space and then looking for correlations in a 1D space; this 2D + 1D mapping appears to be easier to learn than a complete 3D mapping. The model, shown in Figure 5, consists mainly of 36 Conv layers distributed within 14 units, including linear residual connections. It was likewise used as the feature extractor, while a fully connected layer replaced the top of the architecture: GAP was added to receive the output of the Conv layers, dropout was used as a regularization technique, and an FC layer with 4 nodes was used instead of the 1000-node layer of the original architecture, with Softmax as the activation function. A sketch of this shared classifier head follows below.

Figure 5: The proposed modified Xception model.
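The following sketch shows how such a replacement head (GAP, dropout, 4-way softmax) might look in PyTorch; the assumption of d = 512 backbone channels is illustrative, and in practice the pre-softmax logits would be fed to the training loss.

```python
# A sketch of the custom classifier head described above: global average
# pooling, dropout(0.25), and a 4-way softmax prediction layer (eq. 2).
import torch
import torch.nn as nn

head = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),   # GAP: (n x n x d) feature maps -> (1 x 1 x d)
    nn.Flatten(),
    nn.Dropout(p=0.25),        # regularization against overfitting
    nn.Linear(512, 4),         # assumes d = 512 channels from the backbone
    nn.Softmax(dim=1),         # probabilities over the four DR stages
)

probs = head(torch.randn(1, 512, 7, 7))  # illustrative backbone output shape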
Cascaded CNN Model

Various CNN sub-models discover nonlinear discriminant features and semantic image descriptions at multiple levels of analysis [26]; as a result, a cascaded CNN model can be highly generalized and helpful. In order to take advantage of CNN networks and their ability to extract features, the two aforementioned architectures (Vgg-19 and Xception) were concatenated as two different sources of knowledge, extracting characteristics from an image in two different ways so that the model can achieve maximum learning of features from a given dataset. Thereafter, the outputs of each model are passed through GAP to reduce the dimensionality, and a merging layer is added on top of the two branches to combine the features deduced from them. Each branch's features are then concatenated and reformed into a vector:

x_{mrg} = Merge(x_1, x_2)    (3)

where x_1 and x_2 represent the outputs of the first and second branches, respectively. The merged vector then passes through four FC layers added on top of the merging layer, with batch normalization and dropout in order to speed up processing and counter the overfitting to which this network proved greatly prone. The activation function used in the FC layers is LeakyReLU, and the topmost FC layer uses a Softmax activation function for prediction. Figure 6 illustrates the cascaded CNN model.

Figure 6: The cascaded CNN model.

3.2 ResNet-34/DR Architecture

The proposed architecture is illustrated in Figure 7, where it can be divided into two parts: a feature extraction part and a classifier part. The feature extraction part relies on ResNet-34, with its parameters kept trainable. ResNet-34/DR consists of 16 basic units, each consisting of 2 Conv layers (16 x 2 = 32 Conv layers in these blocks). ResNet-34/DR is composed of five convolutional groups, in each of which the output of one or more Conv layers passes through a BN layer and ReLU in sequence (Conv → BN → ReLU), as demonstrated in Section 2.4. The first layer in ResNet-34/DR is a Conv layer with a 7 x 7 filter, followed by a MaxPooling layer with 3 x 3 filters and a stride of 2. The second to fifth groups consist of multiple identical residual units: Conv2-x, Conv3-x, Conv4-x, and Conv5-x, respectively. ImageNet weights are used to initialize the first 33 layers (1 + 16 x 2 = 33 Conv layers in ResNet-34/DR). The classifier part is then represented by an FC layer followed by a Softmax activation function, added to ResNet-34/DR to conform to the DR dataset's category labels.

Figure 7: The proposed ResNet-34/DR architecture.
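Under our reading of this section, assembling ResNet-34/DR with torchvision could look like the following sketch; it is an interpretation, not the authors' published code.

```python
# A sketch of assembling ResNet-34/DR: the ResNet-34 feature extractor
# initialized from ImageNet, with the 1000-way FC top replaced by a
# 4-way DR-stage head (softmax is applied in the loss or at inference).
import torch.nn as nn
from torchvision import models

def build_resnet34_dr(num_classes: int = 4) -> nn.Module:
    model = models.resnet34(pretrained=True)                 # ImageNet initialization
    model.fc = nn.Linear(model.fc.in_features, num_classes)  # classifier part for DR stages
    return model
```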
3.3 Selected Datasets

Our research relied on the APTOS 2019 blindness detection dataset and the Indian Diabetic Retinopathy Image Dataset (IDRID), both available on Kaggle and used extensively by researchers. APTOS 2019, organized by the Asia Pacific Tele-Ophthalmology Society [27], provides a high-quality fundus image dataset (3662 images) taken by various cameras and exhibiting various effects, such as camera flashing, low contrast, and images that are out of focus. IDRID, part of a DR grading challenge [28], includes 512 fundus images; we used only the training part of this dataset (413 images) because it contains the labels (the DR stage) needed for the classification process. All of its images have a resolution of 4288 x 2848 pixels. The two datasets were combined in order to increase the volume of data available for training and because the categorized distribution of data is extremely unbalanced, as demonstrated in Figure 8.

Figure 8: The unbalanced distribution of data within the categories, with the majority of data belonging to the normal class.

Both datasets contain fundus images accompanied by labels indicating one of the five DR stages: none (class 0), mild (class 1), moderate (class 2), severe (class 3), or proliferative (class 4). The distribution of images among the classes is shown in Table 3.

Class index | DR stage | APTOS 2019 | IDRID | Our dataset
0 | No DR | 1805 | 134 | 1939
1 | Mild | 370 | 20 | 390
2 | Moderate | 999 | 136 | 1135
3 | Severe | 193 | 74 | 267
4 | Proliferative | 295 | 49 | 344

Table 3: The distribution of images among classes within the various datasets.

As class 1 is almost healthy, its images being nearly indistinguishable from those of class 0, and following accustomed medical practice, the images of class 1 were merged with those of class 0 in the remainder of our experiments.

4 Implementation

Preprocessing was carried out in a Python 3.7.9 environment. The deep-learning CNN models were trained on Google Colab [29], which provides a free GPU Jupyter environment for implementation via the cloud, using the Keras and PyTorch deep learning frameworks and the Scikit-learn, NumPy, Pandas, and Matplotlib Python packages.

4.1 Image preprocessing

Numerous preprocessing steps were used in our experiments to enhance and highlight disease-related features in the fundus images and to prepare the data for DL tasks, as follows.

Image Filtering

The images are raw data taken with cameras of different resolutions and sizes and containing many artifacts. These observations were taken into account when processing the images to remove noise and increase the robustness of our model. Specifically, we adopted the following preprocessing steps: I) Gaussian blurring, II) adding weight, III) masking, and IV) cropping and resizing. Firstly, the fundus images are blurred using the Gaussian function:

G(x, y) = \frac{1}{2\pi\sigma^2} e^{-(x^2 + y^2)/(2\sigma^2)}    (4)

where σ is the standard deviation of the distribution; in our experiment, σ equals 30. This processing method was inspired by and adapted from Ben Graham's approach [30]. It is similar to median filtering, but it employs a different kernel, producing a Gaussian (bell-shaped) hump; this is done to eliminate noise from the images. In the next step, the output image of the previous step is combined with the original image using the equation:

I_c = \alpha I + \beta G(p) * I + \gamma    (5)

where * denotes convolution, I denotes the input image, G(p) denotes a Gaussian filter with standard deviation p, and α, β, γ are predefined parameters. We then took care to distinguish the fundus area from the background, enriching the images with circular masks and a dark background. In the last step, about 10% of each image's outer borders on both sides, which do not contain any helpful information, are cut off. This sequence of preprocessing steps transformed every image in our dataset from a differently-sized image into a square-shaped image with a similar background, which was then resized to 500 x 500. Figure 9 shows the preprocessing steps of an eye fundus image within the filtering stage.

Figure 9: Preprocessing an eye fundus image (filtering): original image, Gaussian blurring, adding weight, image masking, image cropping.

After preprocessing, 10% of the data was isolated as the testing set. The remaining data were randomly divided at a ratio of 75:25 into training and validation sets, yielding 2487, 829, and 368 images for the training, validation, and testing phases, respectively.

Image Normalization

This step is crucial in DL because it accelerates the convergence of the gradient descent algorithm, thereby increasing the model's efficiency. A straightforward and effective method was used, dividing each pixel value in the image by 255.
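A sketch of this filtering and normalization pipeline with OpenCV follows; the blending weights (4, -4, 128) are illustrative defaults in the spirit of Ben Graham's approach rather than the exact parameter assignment reported in Section 4.2, and the synthetic input image is a placeholder.

```python
# A sketch of the preprocessing pipeline described above: Gaussian
# blurring (eq. 4), weighted blending with the original image (eq. 5),
# circular masking, border cropping, resizing, and [0, 1] normalization.
import cv2
import numpy as np

def preprocess_fundus(img: np.ndarray, size: int = 500, sigma: int = 30) -> np.ndarray:
    # eq. (5): I_c = alpha*I + beta*(G(p) * I) + gamma, via addWeighted
    blurred = cv2.GaussianBlur(img, (0, 0), sigmaX=sigma)
    img = cv2.addWeighted(img, 4, blurred, -4, 128)   # illustrative alpha, beta, gamma
    # circular mask to isolate the fundus area on a dark background
    h, w = img.shape[:2]
    mask = np.zeros((h, w), dtype=np.uint8)
    cv2.circle(mask, (w // 2, h // 2), min(h, w) // 2, 255, thickness=-1)
    img = cv2.bitwise_and(img, img, mask=mask)
    # cut roughly 10% of the outer borders, then resize to a square
    dy, dx = int(0.05 * h), int(0.05 * w)
    img = img[dy:h - dy, dx:w - dx]
    img = cv2.resize(img, (size, size))
    return img.astype(np.float32) / 255.0             # normalization step

demo = preprocess_fundus((np.random.rand(800, 1200, 3) * 255).astype(np.uint8))
```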
Data Augmentation

We used data augmentation to expand the training dataset artificially, as training deep learning models on additional data can result in more skilled models. Augmentation techniques generate image variations that can improve the fitted models' ability to generalize to new images and avoid overfitting. Image augmentation produces artificial training images through various processing methods or combinations thereof, such as random rotation, resizing, mirroring, shearing, and flipping. As depicted in Figure 10, several augmentation techniques were applied to the training set, including zooming (10%), horizontal flipping, and random rotations between -45 and +45 degrees.

Figure 10: Our data augmentation techniques: original image, 45-degree rotation, image zooming, horizontal flip.

4.2 Training

After dividing the dataset for the training phase into training and validation sets (75:25), the validation set was used to evaluate the improvement of model performance over time and to select the best parameters. Instead of generating random initial weights for all models, we relied on the advantages of transfer learning and used the ImageNet pre-trained weights as initial weights. This contributed significantly to speeding up the training process, because the imported models already have sufficient knowledge of images. The input image size for each model was adjusted to reduce training time, avoid depleting resources, and fit the input layer of each model. All models used the Adam optimizer [31] to update the initial weights iteratively on the training set so that each model could adapt to the current problem (DR classification). The error was calculated using categorical cross-entropy. An early stopping strategy was used to stop training if accuracy had not improved for ten consecutive epochs or performance on the validation set worsened; this strategy also helps reduce model overfitting. The optimal weights are saved in case training is halted prematurely and the validation error does not improve. The preprocessing parameters α, β, γ, p were set to -4, 30, 4, and 128, respectively. Each model was trained with a different number of epochs, batch size, and learning rate to achieve the optimal result. The final hyper-parameters of our experiments are presented in Table 4.

Parameter | Vgg-19 | Xception | Cascaded CNN | ResNet-34/DR
Input image size | 224 x 224 | 299 x 299 | 300 x 300 | 224 x 224
Batch size | 32 | 32 | 32 | 32
Learning rate | 1 x 10^{-5} | 5 x 10^{-5} | 1 x 10^{-5} | 5 x 10^{-5}
Epochs | 30 | 50 | 50 | 30

Table 4: Hyper-parameter values.
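Putting the Table 4 settings together, a training loop in this spirit might look like the following sketch; the data loaders are assumed placeholders standing in for the real preprocessed datasets, and the loop simplifies the paper's exact early-stopping criterion.

```python
# A sketch of the training configuration described above: ImageNet
# initialization, Adam at the Table 4 learning rate, categorical
# cross-entropy, and early stopping with a patience of ten epochs.
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet34(pretrained=True)            # ImageNet initial weights
model.fc = nn.Linear(model.fc.in_features, 4)       # 4 DR stages after class merging

optimizer = torch.optim.Adam(model.parameters(), lr=5e-5)  # Table 4, ResNet-34/DR
criterion = nn.CrossEntropyLoss()                   # categorical cross-entropy

# Placeholder loaders standing in for the real preprocessed datasets.
train_loader = val_loader = [(torch.randn(4, 3, 224, 224), torch.randint(0, 4, (4,)))]

best_val, bad_epochs, patience = float("inf"), 0, 10
for epoch in range(30):                             # Table 4: 30 epochs
    model.train()
    for images, labels in train_loader:
        optimizer.zero_grad()
        criterion(model(images), labels).backward()
        optimizer.step()
    model.eval()
    with torch.no_grad():
        val_loss = sum(criterion(model(x), y).item() for x, y in val_loader)
    if val_loss < best_val:                         # keep the optimal weights
        best_val, bad_epochs = val_loss, 0
        torch.save(model.state_dict(), "resnet34_dr_best.pt")
    else:
        bad_epochs += 1
        if bad_epochs >= patience:                  # early stopping
            break
```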
5 Results

5.1 Validation analysis

Several experiments were carried out in order to determine the solution with the best performance and to gain a better understanding of the behavior of the various networks. Plotting training curves for both the training and validation data enabled us to monitor the performance of the various networks during the training phase. Figure 11 demonstrates the accuracy and loss of the three models (Vgg-19, Xception, cascaded CNN) on the training and validation datasets. These deep models were prone to overfitting, resulting in good training but poor validation performance: over several epochs, the models reached their maximum degree of generalization, and the validation loss increased while the training loss continued to decrease. The Vgg-19 network required an excessive amount of time to train, limiting the number of training epochs, whereas Xception was faster to train than its competitors. Our experiments with the various networks are described in detail below.

Figure 11: Experimented CNN models' performance during the training phase: (a) Vgg-19, (b) Xception, (c) cascaded CNN.

As previously stated, all three models had high loss values, and none of them could detect and learn valuable patterns. The cascaded model was the most prone to overfitting, and attempts were made to mitigate this using batch normalization and dropout; however, we observed that increasing the complexity of the model did not produce a satisfactory result. The Xception and cascaded CNN models suffered from overfitting while still failing to capture the critical features of the retinal landmarks. The performance of the ResNet-34/DR model, which relies on the ResNet-34 architecture for feature extraction, was superior to that of the previous experiments, as shown in Table 5 and Figure 12. Thanks to the advantages of the residual neural network, the architecture improved the model by providing greater network depth with a lower error rate, which contributed to the network's ability to extract more accurate patterns. The best results were clearly obtained with ResNet-34/DR; we therefore selected it as our final model for the testing phase.

Model | Accuracy | Loss
Vgg-19 | 83% | 0.44
Xception | 83% | 0.43
Cascaded CNN | 85% | 0.45
ResNet-34/DR | 95% | 0.1

Table 5: Models' performance comparison; the best result (ResNet-34/DR) was selected for the testing phase.

Figure 12: ResNet-34/DR learning curves during the training phase (30 epochs): (a) training accuracy, (b) training loss.

5.2 Testing & evaluation

After obtaining the best model with the highest accuracy, the performance of ResNet-34/DR was evaluated on the testing set (unseen data constituting 10% of the entire dataset used for this experiment) based on accuracy, sensitivity, specificity, precision, and F1 score as performance metrics (PMs). For a given number of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN), the performance metrics are defined as follows:

Accuracy = (TP + TN) / (TP + TN + FP + FN)    (6)
Specificity = TN / (TN + FP)    (7)
Precision = TP / (TP + FP)    (8)
Sensitivity = TP / (TP + FN)    (9)
F1 Score = 2 x (Precision x Sensitivity) / (Precision + Sensitivity)    (10)

We know that our data is unbalanced, as shown in Figure 8. Therefore, accuracy can be deceptive and does not necessarily reflect the model's quality, as it only represents the proportion of correctly evaluated examples out of the total. In this case, precision, recall (i.e., sensitivity), and F1 score were chosen as the performance criteria for the model, due to the high cost of false negatives and false positives in medical diagnosis. The F1 score is the harmonic mean of the precision and recall metrics, describing the model's capability to detect class defects [32]; thus, both precision and recall contribute equally to the F1 score. Based on the calculations performed using equations (7)-(10) above, the confusion matrix summarizing our testing results is shown in Figure 13.

Figure 13: The confusion matrix of the ResNet-34/DR testing phase.
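These quantities can be computed from the predictions with scikit-learn, which the paper lists among its packages; in the sketch below, y_true and y_pred are illustrative label arrays, not the paper's actual test outputs.

```python
# A sketch of computing the confusion matrix and the metrics of
# equations (6), (8), (9), and (10) from predicted stage labels.
import numpy as np
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support

y_true = np.array([0, 0, 1, 2, 3, 1])   # illustrative labels only
y_pred = np.array([0, 0, 1, 2, 1, 1])

cm = confusion_matrix(y_true, y_pred)    # rows: true class, columns: predicted class
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0
)                                        # eqs. (8), (9), (10), averaged over classes
accuracy = np.trace(cm) / cm.sum()       # eq. (6)
print(cm, precision, recall, f1, accuracy)
```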
Sensitivity (equation 9) indicates the proportion of fundus images with diabetic retinopathy that the model identified as infected. Our model's average sensitivity was 0.95, meaning that the model correctly identified around 95% of the infections in the testing data. Precision (equation 8) indicates the percentage of images identified as infected that actually are infected; the value of 0.92 indicates that more than 92% of the images classified as infected are really infected. Finally, the overall accuracy (equation 6) is 0.949, indicating that the model correctly recognizes more than 94% of all images (infected and uninfected). As shown in Table 6, ResNet-34/DR achieved average accuracy, sensitivity, precision, specificity, and F1 score rates of 94.9%, 95.0%, 92.0%, 98.5%, and 93.2%, respectively.

Class (type) | Specificity (%) | Precision (%) | Sensitivity (%) | F1 score (%)
Class 0 | 96.9 | 97 | 99.6 | 98
Class 2 | 100 | 100 | 88 | 93.6
Class 3 | 97.5 | 76.2 | 97 | 85.3
Class 4 | 99 | 95.2 | 95.2 | 95.2
Average | 98.5 | 92.0 | 95.0 | 93.2

Table 6: ResNet-34/DR classification performance metrics.

6 Discussion

In this paper, the suitability and efficacy of our proposed work for the diagnosis of DR using fundus images were demonstrated. In comparison to the other published research [12], [16], [20], [21] summarized in Table 1, our proposed ResNet-34/DR model achieved the highest accuracy, 95%. Our work emphasized diagnosing DR at multiple stages, which is critical for detecting DR early and avoiding progression and vision loss. Early-stage diagnosis contributes to reducing the limitations of human error and allows the development of the disease to be monitored, which helps doctors improve medical treatment. In contrast, previous studies [16] and [20] relied solely on diagnosing DR as normal or abnormal. In addition, both studies [12] and [16] constructed their CNN architectures from scratch, whereas our developed model relied on pre-trained TL models, which are more efficient and allow the performance of different networks to be compared to determine the best architecture for our problem. The authors of [21] paid little attention to data preprocessing, which can improve image quality and thus diagnostic accuracy, whereas our work evaluated these steps.

7 Conclusion

This paper has proposed the new ResNet-34/DR architecture, based on a deep CNN with the utilization of transfer learning, which can effectively classify diabetic retinopathy into four classes using publicly available Kaggle datasets (APTOS 2019, IDRID). As initial attempts, two well-known architectures (Vgg-19 and Xception) were employed to classify DR stages. They suffered from high loss values due to overfitting, despite several attempts to reduce it by adding dropout and batch normalization techniques and by augmenting the data. We believe the primary reason for this is that the data is highly skewed, with the vast majority of images falling within the healthy category, which is a significant impediment to networks extracting features, especially when the classes are few. Transfer learning and fine-tuning of the pre-trained ResNet-34 network proved extremely effective for our color fundus image dataset, yielding optimal performance metrics. Preprocessing provided a significant improvement in the color contrast of the input images, and data augmentation helped increase the training samples, especially for the smaller classes. The training technique employed in this paper has achieved a relative advancement in DR classification results. As future work, we look forward to compiling a dataset of authentic images from Iraqi ophthalmologists.
Additionally, we would like to build a new deep learning model from scratch and compare it with modified pre-trained models. Finally, we may consider different preprocessing techniques based on a semantic segmentation output that highlights the details of DR features, and investigate how these changes affect the classification of DR stages, particularly the early stages.

References

[1] World Health Organization (2020), Diabetic retinopathy screening: a short guide. World Health Organization, Regional Office for Europe.
[2] M. T. Islam, H. R. H. Al-Absi, E. A. Ruagh, and T. Alam (2021), "DiaNet: A deep learning based architecture to diagnose diabetes using retinal images only," IEEE Access, vol. 9, pp. 15686–15695, doi: 10.1109/ACCESS.2021.3052477.
[3] R. Sarki, K. Ahmed, H. Wang, and Y. Zhang (2020), "Automatic detection of diabetic eye disease through deep learning using fundus images: A survey," IEEE Access, vol. 8, pp. 151133–151149, doi: 10.1109/ACCESS.2020.3015258.
[4] J. J. Gómez-Valverde et al. (2019), "Automatic glaucoma classification using color fundus images based on convolutional neural networks and transfer learning," Biomed. Opt. Express, vol. 10, no. 2, p. 892, doi: 10.1364/boe.10.000892.
[5] X. Ma et al. (2021), "Understanding adversarial attacks on deep learning based medical image analysis systems," Pattern Recognit., vol. 110, doi: 10.1016/j.patcog.2020.107332.
[6] A. Feizi (2019), "Convolutional gating network for object tracking," International Journal of Engineering, vol. 32, no. 7, pp. 931–939.
[7] A. Sezavar, H. Farsi, and S. Mohamadzadeh (2019), "A modified grasshopper optimization algorithm combined with CNN for content based image retrieval," International Journal of Engineering, vol. 32, no. 7, pp. 924–930.
[8] S. Long, J. Chen, A. Hu, H. Liu, Z. Chen, and D. Zheng (2020), "Microaneurysms detection in color fundus images using machine learning based on directional local contrast," Biomed. Eng. Online, vol. 19, no. 1, pp. 1–23, doi: 10.1186/s12938-020-00766-3.
[9] Y. Tong, W. Lu, Y. Yu, and Y. Shen (2020), "Application of machine learning in ophthalmic imaging modalities," Eye Vis., vol. 7, no. 1, pp. 1–15, doi: 10.1186/s40662-020-00183-6.
[10] Z. Wang and J. Yang (2017), "Diabetic retinopathy detection via deep convolutional networks for discriminative localization and visual explanation," [Online]. Available: http://arxiv.org/abs/1703.10757.
[11] M. T. Hagos and S. Kant (2019), "Transfer learning based detection of diabetic retinopathy from small dataset," arXiv.
[12] M. Shaban et al. (2020), "A convolutional neural network for the screening and staging of diabetic retinopathy," PLoS One, vol. 15, no. 6, pp. 1–13, doi: 10.1371/journal.pone.0233514.
[13] I. Namatēvs (2018), "Deep convolutional neural networks: Structure, feature extraction and training," Inf. Technol. Manag. Sci., vol. 20, no. 1, pp. 40–47, doi: 10.1515/itms-2017-0007.
[14] M. Al-Smadi, M. Hammad, Q. B. Baker, and S. A. Al-Zboon (2021), "A transfer learning with deep neural network approach for diabetic retinopathy classification," Int. J. Electr. Comput. Eng., vol. 11, no. 4, pp. 3492–3501, doi: 10.11591/ijece.v11i4.pp3492-3501.
[15] B. K. Triwijoyo, B. S. Sabarguna, W. Budiharto, and E. Abdurachman (2020), "Deep learning approach for classification of eye diseases based on color fundus images," Elsevier Inc.
[16] S. N. Chakravarthy, H. Singhal, and N. R. P. Yadav (2019), "DR-NET: A stacked convolutional classifier framework for detection of diabetic retinopathy," in Proceedings of the International Joint Conference on Neural Networks (IJCNN), doi: 10.1109/IJCNN.2019.8852011.
[17] P. K. V. Warkar (2021), "A survey on multiclass image classification based on Inception-v3 transfer learning model," Int. J. Res. Appl. Sci. Eng. Technol., vol. 9, no. 2, pp. 169–172, doi: 10.22214/ijraset.2021.33018.
[18] M. A. Morid, A. Borjali, and G. Del Fiol (2021), "A scoping review of transfer learning research on medical image analysis using ImageNet," Comput. Biol. Med., vol. 128, doi: 10.1016/j.compbiomed.2020.104115.
[19] A. K. Gangwar and V. Ravi (2021), "Diabetic retinopathy detection using transfer learning and deep learning," vol. 1176, Springer Singapore.
[20] A. Jain, A. Jalui, J. Jasani, Y. Lahoti, and R. Karani (2019), "Deep learning for detection and severity classification of diabetic retinopathy," in Proc. 1st Int. Conf. Innov. Inf. Commun. Technol. (ICIICT), pp. 1–6, doi: 10.1109/ICIICT1.2019.8741456.
[21] C. Lam, D. Yi, M. Guo, and T. Lindsey (2018), "Automated detection of diabetic retinopathy using deep learning," AMIA Summits Transl. Sci. Proc. 2017, p. 147.
[22] M. Gao, D. Qi, H. Mu, and J. Chen (2021), "A transfer residual neural network based on ResNet-34 for detection of wood knot defects."
[23] S. Wu, S. Zhong, and Y. Liu (2017), "Deep residual learning for image recognition," Multimed. Tools Appl., pp. 1–17, doi: 10.1007/s11042-017-4440-4.
[24] K. Simonyan and A. Zisserman (2014), "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556.
[25] F. Chollet (2017), "Xception: Deep learning with depthwise separable convolutions," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), pp. 1251–1258.
[26] Y. Chai, H. Liu, and J. Xu (2018), "Glaucoma diagnosis based on both hidden features and domain knowledge through deep learning models," Knowledge-Based Syst., vol. 161, pp. 147–156, doi: 10.1016/j.knosys.2018.07.043.
[27] "APTOS 2019 Blindness Detection." [Online]. Available: https://www.kaggle.com/c/aptos2019-blindness-detection/.
[28] P. Prasanna, P. Samiksha, K. Ravi, K. Manesh, D. Girish, S. Vivek, and M. Fabrice (2018), Indian Diabetic Retinopathy Image Dataset (IDRiD), doi: 10.21227/H25W98.
[29] Google Colaboratory. [Online]. Available: https://colab.research.google.com/notebooks/welcome.ipynb.
[30] B. Graham (2015), "Kaggle Diabetic Retinopathy Detection competition report," pp. 1–9.
[31] D. P. Kingma and J. L. Ba (2015), "Adam: A method for stochastic optimization," pp. 1–15.
[32] I. Konovalenko, P. Maruschak, V. Brevus, and O. Prentkovskis (2021), "Recognition of scratches and abrasions on metal surfaces using a classifier based on a convolutional neural network," Metals, vol. 11, no. 4, p. 549.