https://doi.org/10.31449/inf.v48i12.5762 Informatica 48 (2024) 41–54

Cocoa-Net: Performance Analysis on Classification of Cocoa Beans Using Structural Image Feature

Chandrajit Pal (1), Samikshan Das (2), Amitava Akuli (3), Sudip Kumar Adhikari (4), Aniruddha Dey (5, *)
(1) Department of Electrical Engineering, IIT, Hyderabad, India
(2) Capgemini, India
(3) Centre for Development of Advanced Computing, Kolkata, India
(4) Department of Computer Science & Engineering, CGEC, Cooch Behar, India
(5) Department of Computer Science & Engineering, MSIT, Kolkata, India
E-mail: palchandrajit@gmail.com, samikshandas5@gmail.com, amitava.akuli@gmail.com, sudipadhikari@ieee.org, anidey007@gmail.com
* Corresponding author

Keywords: KNN, decision tree, SVM, random forest, Cocoa-Net, cocoa beans

Received: July 29, 2024

Abstract. Cocoa hybridization produces new varieties that resist a number of plant diseases and have distinct chemical properties that affect chocolate manufacturing. Image analysis is a valuable tool for visually differentiating cocoa beans, and deep neural networks (DNNs) play a pivotal role in implementing it. In this manuscript, we compare machine learning and deep learning models, both of which have been applied to images of a wide range of agricultural products. Specifically, we extract features from images using a series of image processing techniques and then use both traditional machine learning methods (KNN, decision tree, SVM, and random forest) and convolutional neural networks (the proposed Cocoa-Net and ResNet50) to classify cocoa beans into four categories: large, medium, small, and rejected. All of these methods were chosen because each offers strong classification performance and has potential for food classification. A dataset of 200 samples of fragmented images was used to test these methods, and comparative studies of similar approaches were also carried out.
Two feature optimization techniques, Univariate Selection and Feature Importance, are used to optimize the extracted features before the learning models are trained. The Adam optimizer is used to train the proposed Cocoa-Net model. K-fold cross-validation is used to assess the trained models, and mean cross-validation scores are computed for performance analysis. The empirical results show that the proposed Cocoa-Net model achieves the highest classification accuracy score, 0.85.

Povzetek (summary): The study developed the Cocoa-Net model for classifying cocoa beans, which achieved the highest classification accuracy of 0.85. The introduced feature optimization techniques further improved model performance compared with traditional machine learning methods.

1 Introduction
Cocoa beans are the seeds found in the fruit pods of the cacao tree (Theobroma cacao). Almost two thirds of the world's cocoa is grown in West Africa [1]. Ghana produces more than twenty percent of the world's total, making it the second largest producer. Raw cocoa has an unpleasant, astringent flavour, so it must be fermented, dried, and roasted to develop the characteristic flavour and aroma of cocoa [2]. After harvesting, the cocoa pods are opened, and a variety of microorganisms naturally colonise the pulp surrounding the beans. These microorganisms convert the pulp's sugars into lactic acid and ethanol, the first stage of the fermentation required for chocolate production. Acetic acid bacteria then convert part of the ethanol into acetic acid in an exothermic process [3]. This heats the fermenting mass to about 50 °C, and the ethanol and acetic acid diffuse through the testa (seed coat) into the bean. The heat and acid kill the bean's germ and break down its cell walls, starting the reactions that yield well-fermented beans [4]. Ghana is divided into two agro-ecological zones: the southern forest and the northern savannah.
The savannah zone comprises the Sudan savannah, the Guinea savannah, and the coastal savannah. The semi-deciduous, transitional, and rainforest zones are all found in the southern forest region [5]. In [6], models were built using two regression techniques: partial least squares regression (PLSR) and principal component regression (PCR). Ghosh et al. proposed an entropy-based feature extraction technique that preserves the core data while reducing the volume of data to be processed [7]. Feature extraction has also been applied in smart farming to predict cocoa bean classes from digital images [8]. Nazir et al. investigated deep convolutional neural networks for detecting mispronunciation of Arabic phonemes [9]. Texture is a crucial component of image processing. In texture analysis, the sum and difference histogram technique has been used in place of the co-occurrence matrix for texture classification [10]. Because texture is defined in part by the spatial distribution of grayscale values, many machine vision studies recommend applying a Gaussian function [11]. By extracting significant features directly from the data at multiple levels of abstraction, deep learning (DL) increases the predictive power of machine learning [12]. The predictive capabilities of computer vision and machine learning (ML) offer promising solutions for a variety of agricultural products [13]. Pereira et al. predicted the ripening of papaya fruit using digital imaging and random forests [14]. Tian et al. discussed the use of computer vision technology in agricultural automation [15]. Boosting agricultural productivity, particularly in terms of quality and competitiveness, requires the use of technology [16].
The accessibility of technological advances such as deep learning and machine learning is also essential for improving farmer welfare and for interesting the younger generation in developing diverse derivative business opportunities [17–19]. Smart farming is one example of progress in information technology; it gives farmers more reliable control over production. Srikanth et al. implemented an ANN with 35 input nodes to classify four classes of cocoa beans: whole, broken, fractions, and skin-damaged beans [19]. Numerous investigators have developed crop classification and grading systems [20, 21]. To meet quality control requirements, Dey et al. used image processing and an SVM classifier [23, 24]. Deep learning techniques have been applied to classify fruit in supermarkets by species and price class [25]. Traditional machine learning techniques have also been used for automation in agriculture [26]. Tan et al. carried out a study to help farmers measure the degree of fermentation and guarantee the quality of their cocoa beans [27]. During classification, the processing factory can distinguish good beans from regular ones, so that the farmer receives a higher quality grade and a more suitable payment [28]. Real-time image processing can substantially reduce the time needed to fuse images, and the fused images can then be processed for further analysis [29]. Image fusion combined with machine learning is widely used to combine hybrid data [30]. Recent machine learning based cocoa classification techniques are summarized in Table 1.
Table 1: Recent studies (2021-2022) of cocoa classification and related methods
| Authors, year, reference | Summary |
|---|---|
| Das et al., 2022 [31] | Machine vision approach for morphologically categorizing cocoa beans |
| Tercan and Meisen, 2022 [32] | A comprehensive review of predictive quality in manufacturing using machine learning and deep learning |
| Kim et al., 2021 [33] | A deep learning-based framework for product quality inspection |
| Lopes et al., 2021 [34] | Compares a traditional Computer Vision System (CVS) and a Deep Computer Vision System (DCVS) for classifying cocoa beans into varieties |
| Anggraini et al., 2021 [35] | Machine learning model to differentiate between fermented and unfermented cocoa |
| Abu et al., 2021 [36] | Identifying cocoa plantations in Ghana and Côte d'Ivoire and their effects on protected areas |
| Oliveira et al., 2021 [37] | Quick and accurate classification of cocoa beans into four fermentation categories using computer vision and Random Forest |

Applying computer vision techniques to automation provides an effective, high-quality control measure [38]. In [39, 40], features are extracted during image processing to train machine learning classifiers based on D-S theory. The proposed methods process and evaluate images to provide the user with helpful information. Structural features such as size, shape, and texture are extracted from each image. These features are then optimized with two feature optimization techniques, Univariate Selection and Feature Importance, to eliminate unnecessary and redundant features. The main contributions of the proposed work are:
• Creation of a database of cocoa bean images.
• A study of whether the size, shape, and texture of cocoa beans can be used to assess their quality, which is the primary goal of this work.
• A comparative analysis on the cocoa bean test dataset of four traditional classifiers (KNN, Decision Tree, SVM, Random Forest) and two convolutional neural networks (the proposed Cocoa-Net and ResNet50) for assessing cocoa bean quality. The proposed Cocoa-Net and ResNet50 achieve overall mean accuracy scores of 0.85 and 0.84, respectively.

The remainder of the manuscript is organised as follows. Section 2 describes the materials and methods. Section 3 describes the proposed Cocoa-Net architecture. Section 4 presents the experimental results on the cocoa bean database. Finally, Section 5 concludes the paper.

2 Materials and methods
Our algorithm uses structural image features to discriminate between cocoa beans. The Indian samples were obtained from the market, and data were collected as digital photographs of cocoa beans. Beans are placed on a white background before capture, with 25 beans per image, which gave the best results in [31]. The images are taken with a digital image capture setup consisting of a digital colour camera and a controlled lighting system housed inside a closed cabinet. The e-Cocoa Vision system acquires images from this input device and grades cocoa samples according to predetermined criteria [31]. Figure 1 shows the system configuration for capturing cocoa bean images. The portable capture system has twenty LEDs evenly spaced across the cabinet's roof, and a Logitech C920 webcam takes the pictures [31]. The cabinet is built from aluminium sheets and painted black to prevent unwanted reflections.
For the study, digital photos of the cocoa beans were acquired from South Sulawesi, Indonesia, following the sampling procedure of [36]. The cocoa bean samples fall into the following categories: (i) whole beans are cocoa beans whose seed skin covers every part of the bean and that show no fractures; (ii) broken beans are cocoa beans with a missing part of half (1/2) of the full bean or less; (iii) bean fractions are pieces of cocoa bean smaller than half (1/2) of the full bean; (iv) skin-damaged beans are beans missing part of the shell, half the size of the full bean or less; (v) fermented beans are cocoa beans that have completed the curing process and may be washed or left unwashed before drying; (vi) unfermented beans are cocoa beans in which half or more of the cut surface appears greyish (slaty) or dirty white; (vii) moldy beans are cocoa beans with mold inside, where the fungus is visible to the naked eye when the bean is split open.
Figure 1: System structure for cocoa bean image capturing [31]
The ultimate objective of gathering these digital photos of cocoa beans at the factory was to reduce the amount of classification work that must be done there. The cocoa bean images were acquired with a small digital camera [31], as shown in Figure 2.
Figure 2: Cocoa beans on white paper [31]
The digital images include four classes of cocoa beans. Three classes consist of whole beans, classified as (i) large, (ii) medium, and (iii) small beans; the remaining, fragmented beans form class (iv), rejected beans. Images of 220 beans were captured for the experiment; of the 220 images, 70% were used for model training and 30% for testing [31].
Figure 3 shows the workflow of the system using the proposed Cocoa-Net model and the machine learning models.

2.1 Data Pre-processing
After data collection, the images are processed to improve quality by removing the background, using the following steps.
• Grayscale conversion: The 24-bit RGB input image is converted to an 8-bit grayscale image. Working in grayscale helps remove the white background [31].
• Image segmentation: Otsu's global thresholding method is applied, producing a binary image [31].
• Smoothing with a Gaussian filter: A Gaussian smoothing filter with a kernel size of 3 removes high-frequency noise from the image [31].
• Object identification: Erosion is used to locate and eliminate tiny particles near the image boundaries. Objects are then identified by the area of the particles in the image [31].
• Particle analysis: Finally, a set of 23 relevant image features is extracted from the images by particle analysis over all pixel measurements [31].
Figure 3: Workflow diagram of the proposed method
For the convolutional neural network model, the image dataset is prepared by placing the images of each class in a separate folder named after that class. To increase the number of images, data augmentation is performed with the ‘ImageDataGenerator’ class available in the Keras library; augmentation also reduces the chance of overfitting. Augmentation with rotation range = 45, width shift range = 0.2, height shift range = 0.2, and horizontal flip enabled generated 900 images in total across all four classes.
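The grayscale conversion and Otsu thresholding steps above can be sketched in NumPy as follows. This is a minimal sketch, not the implementation from [31]. The BT.601 luma weights and the assumption that beans are darker than the white background are ours. The Gaussian smoothing, erosion, and particle-analysis steps are omitted.

```python
import numpy as np


def rgb_to_gray(rgb: np.ndarray) -> np.ndarray:
    """Convert a 24-bit RGB image (H, W, 3) to an 8-bit grayscale image (H, W)."""
    # ITU-R BT.601 luma weights (assumption: the paper does not state its weights)
    weights = np.array([0.299, 0.587, 0.114])
    gray = rgb[..., :3].astype(np.float64) @ weights
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)


def otsu_threshold(gray: np.ndarray) -> int:
    """Return the global threshold t that maximises Otsu's between-class variance."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                  # P(pixel <= t) for every t
    mu = np.cumsum(prob * np.arange(256))    # cumulative mean up to t
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        between = np.where(denom > 0, (mu_total * omega - mu) ** 2 / denom, 0.0)
    return int(np.argmax(between))


def segment_beans(rgb: np.ndarray) -> np.ndarray:
    """Binary mask of bean pixels (True) against a white background."""
    gray = rgb_to_gray(rgb)
    t = otsu_threshold(gray)
    # Assumption: beans are darker than the background, so they fall in the
    # lower class (gray <= t) of Otsu's split.
    return gray <= t


if __name__ == "__main__":
    img = np.full((20, 20, 3), 255, dtype=np.uint8)  # white background
    img[5:10, 5:12] = 60                              # one dark 5x7 "bean"
    mask = segment_beans(img)
    print(otsu_threshold(rgb_to_gray(img)), int(mask.sum()))  # 60 35
```

In practice, a library routine such as OpenCV's Otsu mode of `cv2.threshold` would normally be used. The explicit version shows what the "global thresholding method utilising OTSU" computes.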
2.2 Feature extraction
A set of twenty-three image features is extracted from the images to train the proposed models [31]. These include area, convex hull area, maximum Feret diameter, equivalent ellipse major axis, equivalent ellipse minor axis, equivalent rectangle long side, equivalent rectangle short side, equivalent rectangle diagonal, hydraulic radius, elongation factor, compactness factor, Heywood circularity factor, and seven Hu moment features.

2.3 Feature optimization
In most machine learning problems, not every independent variable in the dataset has the same influence on the dependent variable [31], and some features may have little effect. Feature optimisation removes such superfluous features to improve machine learning models [7]. It preserves accuracy while reducing model complexity and training time. This study uses two feature optimisation strategies: univariate analysis and feature importance.

2.4 Feature scaling
In our approach, feature scaling is applied to the cocoa bean images [31]. Let I_min be the minimum pixel value of the image matrix I, which is subtracted from each pixel of I. Similarly, let I_max be the maximum pixel value of I. We define an ImageFactor I_f as follows:
$$I_f = \frac{I - I_{min}}{I_{max} - I_{min}} \qquad (1)$$
Each pixel value of the image matrix I_S is then updated by dividing it by I_f, giving the new values:
$$\mathrm{ImageFactor}(I|_S) = \frac{I_S}{I_f} \qquad (2)$$

3 Details of proposed Cocoa-Net
In this section, we describe the proposed Cocoa-Net model, a deep learning approach with the potential for high accuracy in cocoa bean recognition.
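Equation (1) is the standard min-max normalisation. The univariate selection step of Section 2.3 ranks each feature independently against the class label. The sketch below shows both on NumPy arrays and is illustrative only. The paper does not name its univariate score, so we assume a one-way ANOVA F-test, the score that scikit-learn's `f_classif` computes. The function names are hypothetical.

```python
import numpy as np


def min_max_scale(image: np.ndarray) -> np.ndarray:
    """Eq. (1): map pixel values linearly onto [0, 1]."""
    image = image.astype(np.float64)
    lo, hi = image.min(), image.max()
    if hi == lo:                      # constant image: avoid division by zero
        return np.zeros_like(image)
    return (image - lo) / (hi - lo)


def anova_f_scores(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """One-way ANOVA F statistic of each feature (column of X) against labels y."""
    X = np.asarray(X, dtype=np.float64)
    y = np.asarray(y)
    classes = np.unique(y)
    n, k = X.shape[0], classes.size
    grand_mean = X.mean(axis=0)
    ss_between = np.zeros(X.shape[1])
    ss_within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        ss_between += Xc.shape[0] * (mean_c - grand_mean) ** 2
        ss_within += ((Xc - mean_c) ** 2).sum(axis=0)
    with np.errstate(divide="ignore", invalid="ignore"):
        return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_k_best(X: np.ndarray, y: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k features with the highest F scores (univariate selection)."""
    scores = np.nan_to_num(anova_f_scores(X, y), nan=0.0)
    return np.argsort(scores)[::-1][:k]
```

For example, with a 23-column feature table `X` and class labels `y`, `select_k_best(X, y, 10)` would keep the ten features whose class means differ most relative to their within-class spread. The value 10 is arbitrary.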
Cocoa-Net consists of a series of convolutional layers followed by pooling layers, a flatten layer, fully connected layers, and an output layer. The first convolutional layer uses 6 filters, and subsequent layers use 16, 64, or more filters, with strides and batch normalization. The architecture employs max pooling, ReLU activation functions, and a final softmax activation for classification. By comparison, AlexNet consists of five convolutional layers, some followed by max-pooling layers, and three fully connected layers at the end; its first convolutional layer uses 96 filters, and subsequent layers use up to 384 filters. Cocoa-Net uses kernel sizes and filter counts tuned specifically to detect features relevant to cocoa beans, such as edges and curved contours. Each layer of the CNN has a distinct function. The proposed Cocoa-Net structure contains convolutional, pooling, flatten, fully connected, and output layers. Figure 4 shows the proposed Cocoa-Net structure with the input and output shapes of each layer.
a. Convolutional layer: This layer computes a dot product between its weights and small patches of the input data to produce a feature map. It is called a convolutional layer because sliding the weights over the input in this way is a convolution operation. For example, the first convolutional layer uses a 5×5 kernel with 6 filters, while subsequent layers adjust the kernel and filter sizes to optimize feature extraction. Cocoa-Net is tailored to cocoa bean recognition through its architectural choices, filter configurations, and optimization techniques, whereas AlexNet is a general-purpose CNN known for its broad application in image recognition. The second convolutional layer uses a 3 × 3 kernel, 16 filters, and a stride of one.
This is followed by a max-pooling layer with a 2 × 2 kernel. The final convolutional layer uses a 3 × 3 kernel, 64 filters, a stride of one, batch normalisation, and dropout.
b. Pooling layer: This layer down-samples the feature map by taking the maximum or average of a set of adjacent values. Pooling reduces the spatial size of the data, which lowers the computational cost and helps reduce overfitting. The pooling operation follows the convolutional layer: it reduces the amount of information in each feature extracted by the preceding layer while preserving the most essential data, thereby reducing dimensionality. Max pooling with a 2 × 2 filter and a stride of 2 is used here.
c. ReLU layer: This layer applies the Rectified Linear Unit (ReLU) activation function to the output of the previous layer. ReLU replaces all negative values with zeros, which introduces non-linearity and allows the network to model non-linear relationships between input and output. It is the most widely used activation function in neural networks, particularly convolutional neural networks (CNNs) and multilayer perceptrons.
d. Fully connected layer: This layer performs the classification or regression task. It takes the output of the previous layer, flattens it into a vector, and multiplies it by a weight matrix to produce the final output.
The experiment used a batch size of 8, a learning rate of 0.001, 50 training epochs, and a cross-entropy loss function. ReLU activation is applied after each convolution and pooling layer, and softmax activation is used at the output layer. Input images are resized to 48 × 48 pixels.
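The three operations in items a to c can be illustrated on a single-channel feature map with NumPy. This is a teaching sketch under our own simplifications (one input channel, one filter, 'valid' padding, no bias), not the Cocoa-Net code. Like most deep learning libraries, it computes cross-correlation, which is what CNN layers call convolution.

```python
import numpy as np


def conv2d_valid(x: np.ndarray, kernel: np.ndarray, stride: int = 1) -> np.ndarray:
    """Slide `kernel` over `x`; each output is the dot product of the kernel with one patch."""
    kh, kw = kernel.shape
    out_h = (x.shape[0] - kh) // stride + 1
    out_w = (x.shape[1] - kw) // stride + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)
    return out


def relu(x: np.ndarray) -> np.ndarray:
    """Replace negative values with zero."""
    return np.maximum(x, 0.0)


def max_pool(x: np.ndarray, size: int = 2, stride: int = 2) -> np.ndarray:
    """Keep the maximum of each size x size window (2x2, stride 2 halves H and W)."""
    out_h = (x.shape[0] - size) // stride + 1
    out_w = (x.shape[1] - size) // stride + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = x[i * stride:i * stride + size, j * stride:j * stride + size].max()
    return out


if __name__ == "__main__":
    x = np.arange(36, dtype=float).reshape(6, 6)
    edge = np.array([[-1.0, 0.0, 1.0]] * 3)   # simple vertical-edge kernel
    fmap = relu(conv2d_valid(x, edge))        # 4x4 feature map
    print(max_pool(fmap).shape)               # (2, 2)
```

A 3 × 3 kernel over a 6 × 6 input gives a 4 × 4 map, and 2 × 2 pooling with stride 2 halves that to 2 × 2. This is the down-sampling effect described in item b.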
The layers are stacked sequentially, and the output size after each layer is shown in Figure 4. Table 2 reports a case study of Cocoa-Net configurations trained with the sparse categorical cross-entropy loss function. Cocoa-Net is designed with an optimized number of layers and filters, which lets it extract significant features from the dataset more efficiently than the deeper ResNet-50 architecture. Cocoa-Net is tailored specifically to cocoa bean classification. Its layers and filters are tuned to capture the nuances of cocoa bean images, which the more general-purpose ResNet-50 may not capture as efficiently. The training techniques used for Cocoa-Net, including the choice of optimizer (Adam), the hyperparameters, and the data augmentation strategy, also contribute to its performance. Stratified K-fold cross-validation gives a reliable estimate of how well the model generalizes and helps detect overfitting. Together, these factors enable Cocoa-Net to outperform ResNet-50 on cocoa bean classification despite its relatively simpler architecture. The parameters of the proposed Cocoa-Net are described in Table 3.
Figure 4: Proposed Cocoa-Net architecture
Table 2: Case study of Cocoa-Net with the sparse categorical cross-entropy loss function
| No. of conv layers | Optimizer | Filter distribution (no. of filters in each layer) | Epochs | Test accuracy | Test loss |
|---|---|---|---|---|---|
| 4 | RMSprop | (16, 32, 64, 128, 256) | 50 | 0.76 | 1.4 |
| 4 | Adam | (16, 32, 64, 128, 256) | 30 | 0.84 | 0.5 |
| 4 | Adam | (16, 32, 64, 128, 256) | 50 | 0.82 | 0.98 |
| 4 | Adadelta | (16, 32, 64, 128, 256) | 30 | 0.38 | 1.35 |
| 4 | Adam | (16, 32, 64, 128, 512) | 30 | 0.81 | 0.68 |
| 4 | Adam | (16, 32, 64, 128, 512) | 40 | 0.83 | 0.88 |
| 5 | Adam | (16, 32, 64, 128, 256, 512) | 30 | 0.86 | 0.44 |
| 5 | Adam | (16, 32, 64, 128, 256, 512) | 40 | 0.84 | 0.44 |
| 5 | Adam | (16, 32, 64, 128, 256, 512) | 50 | 0.86 | 0.63 |
| 5 | Adagrad | (16, 32, 64, 128, 256, 512) | 30 | 0.28 | 1.44 |
| 5 | Adamax | (16, 32, 64, 128, 256, 512) | 30 | 0.84 | 0.48 |
| 5 | Adamax | (16, 32, 64, 128, 256, 512) | 40 | 0.86 | 0.50 |
| 5 | Adamax | (16, 32, 64, 128, 256, 512) | 50 | 0.86 | 0.42 |
| 5 | Nadam | (16, 32, 64, 128, 256, 512) | 30 | 0.81 | 0.78 |
| 5 | Nadam | (16, 32, 64, 128, 256, 512) | 40 | 0.7 | 0.92 |
| 5 | Nadam | (16, 32, 64, 128, 256, 512) | 50 | 0.76 | 0.89 |
Justification for kernel size: The initial layer is designed to capture low-level features such as edges and textures. A larger 5×5 kernel helps detect these fundamental features over a slightly broader area of the image, which is crucial for effective feature extraction from the outset. Smaller 3×3 kernels are used in subsequent layers to capture more detailed, localized features. This size is common in CNNs because it lets the network learn intricate patterns and hierarchical features while keeping the computational load manageable. Smaller kernels maintain spatial resolution, and when stacked in deeper layers they are sufficient to detect complex features.
Justification for number of filters: Starting with a small number of filters lets the model learn basic features without being overwhelmed by parameters, so that it captures essential features before more complex patterns. The number of filters then grows as the network deepens (from 16 in the second layer to 64 in the third). This progression lets the network gradually learn more complex, higher-level features from the input data. Each subsequent layer detects more sophisticated patterns and combinations of the features found by previous layers, improving the model's ability to capture the fine-grained details needed for accurate classification.
In this study, the ‘Sequential’ class of the Keras library is used to build the proposed Cocoa-Net model with five convolutional layers of (3, 3) filters, with the number of filters increasing as the model goes deeper.
The numbers of filters in these five consecutive layers are 16, 32, 64, 128, and 256, respectively. ReLU is the activation function at each convolutional layer because it is cheap to compute and, unlike the sigmoid, does not saturate for positive inputs, which mitigates the vanishing-gradient problem. A flatten layer then converts the two-dimensional pooled feature maps into a single one-dimensional vector, forming the transition from the convolutional layers to the fully connected layers. Two fully connected dense layers follow. The first has 512 neurons with ReLU activation. The second, which is also the output layer, has 4 neurons representing the four output classes. The output layer uses the softmax activation function [25] to produce a multinomial probability distribution over the four classes. Several optimization algorithms were tested with different hyperparameters: Adaptive Moment Estimation (Adam), Root Mean Square Propagation (RMSprop), the Adaptive Gradient optimizer (Adagrad), Adamax, and Nadam. The Adam optimizer gave the maximum accuracy. The training images and class labels are fitted to the model with the number of epochs set to 30. These hyperparameters were chosen after training the model with different hyperparameter sets and evaluating it each time with the accuracy metric and the sparse categorical cross-entropy loss function. The maximum accuracy achieved by the CNN model is 0.86, with a loss of 0.44. Adam is the default optimizer for this experiment, as it is a common first choice for training deep learning networks.
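To make the five-layer description concrete, the pure-Python sketch below propagates feature-map shapes and counts trainable parameters. Several details are our assumptions, not stated in the text: a 48 × 48 × 3 RGB input, 'same' padding on every convolution, and a 2 × 2 max-pooling layer with stride 2 after each convolution. With different choices, the numbers change.

```python
def conv_out(size: int, kernel: int, stride: int, padding: str) -> int:
    """Spatial output size of a convolution or pooling layer (Keras-style padding)."""
    if padding == "same":
        return -(-size // stride)                 # ceil(size / stride)
    return (size - kernel) // stride + 1          # 'valid'


def walk_cocoa_net(input_hw=48, in_ch=3, filters=(16, 32, 64, 128, 256),
                   dense_units=512, n_classes=4):
    """Print each layer's output shape and parameter count; return total parameters."""
    h, c, total = input_hw, in_ch, 0
    for f in filters:
        params = (3 * 3 * c + 1) * f              # 3x3 kernels plus one bias per filter
        h = conv_out(h, 3, 1, "same")             # stride-1 'same' conv keeps H and W
        print(f"Conv2D {f:>3}: {h}x{h}x{f}, params={params}")
        total += params
        h = conv_out(h, 2, 2, "valid")            # assumed 2x2 max pool, stride 2
        print(f"MaxPool    : {h}x{h}x{f}")
        c = f
    flat = h * h * c
    print(f"Flatten    : {flat}")
    for units in (dense_units, n_classes):
        params = (flat + 1) * units               # weights plus one bias per unit
        print(f"Dense {units:>4}: params={params}")
        total += params
        flat = units
    print(f"Total trainable params: {total}")
    return total


if __name__ == "__main__":
    walk_cocoa_net()
```

Under these assumptions, the 48 × 48 input shrinks to 1 × 1 × 256 after the fifth pooling layer, and the model has 526,244 trainable parameters. Most of them sit in the last convolution (295,168) and the 512-unit dense layer (131,584).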
The flatten layer that follows takes the output of the preceding layers and reduces it to a single one-dimensional vector that serves as input to the next stage. The fully connected layer provides the final probability associated with each class. Its neurons take the 1D data as input and compute a dot product of this input with their weights, producing one value per neuron. Applying the softmax function to these values gives the likelihood of each class. The class with the highest probability is taken as the prediction for the input image. We used dropout and regularization to cope with the limited size of the dataset. Figure 5 shows the model summary.
Table 3: Description of proposed Cocoa-Net architecture
| Layer name | Network parameters |
|---|---|
| Input layer | Image |
| Convolution | Kernel size = 5×5, number of kernels = 6, stride = 1×1, padding = same, activation = ReLU |
| Pooling | Pool size = 2×2, stride = 1×1, padding = same, pooling = max pool |
| Convolution | Kernel size = 5×5, number of kernels = 16, stride = 1×1, padding = same, activation = ReLU |
| Pooling | Pool size = 2×2, stride = 1×1, padding = same, pooling = max pool |
| Convolution | Kernel size = 3×3, number of kernels = 64, stride = 1×1, padding = same, activation = ReLU |
| Pooling | Pool size = 2×2, stride = 1×1, padding = same, pooling = max pool |
| Flatten | - |
| Fully connected layer | Number of neurons = 128 |
| Output layer | Number of neurons = 4, activation = softmax |
Training parameters: loss function = categorical cross-entropy, optimizer = Adam, number of epochs = 50, learning rate = 0.001
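The classification head described above (flatten, dense layer, softmax, arg-max) and the sparse categorical cross-entropy loss named in Table 2 can be sketched with NumPy. The weights are random placeholders and the shapes are illustrative. This is not the trained Cocoa-Net.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def sparse_categorical_cross_entropy(probs: np.ndarray, labels: np.ndarray) -> float:
    """Mean of -log p(true class); labels are integer class indices, not one-hot."""
    eps = 1e-12                                   # guards against log(0)
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(picked + eps)))


def classify(feature_maps: np.ndarray, weights: np.ndarray, bias: np.ndarray):
    """Flatten a batch of feature maps, apply a dense layer, return (probs, predicted class)."""
    flat = feature_maps.reshape(feature_maps.shape[0], -1)   # Flatten layer
    probs = softmax(flat @ weights + bias)                   # Dense + softmax
    return probs, probs.argmax(axis=1)                       # highest probability wins


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    maps = rng.normal(size=(2, 3, 3, 8))          # batch of 2 feature maps (hypothetical)
    w = rng.normal(size=(3 * 3 * 8, 4))           # 4 output classes
    probs, pred = classify(maps, w, np.zeros(4))
    print(pred, sparse_categorical_cross_entropy(probs, np.array([0, 1])))
```

"Sparse" only means that labels are given as integers such as 2 rather than one-hot vectors such as [0, 0, 1, 0]. The loss value is the same as categorical cross-entropy on the equivalent one-hot labels.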
4 Simulation results and discussion
The performance of the proposed method has been tested on the cocoa bean testing dataset. The main evaluation metric is the accuracy score: the sum of true positives (TP) and true negatives (TN) divided by the sum of TP, TN, false positives (FP), and false negatives (FN). Accuracy and the F_β score are defined as follows:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (7)$$
$$F_\beta = \frac{(1 + \beta^2) \times (\mathrm{Precision} \times \mathrm{Recall})}{\beta^2 \times \mathrm{Precision} + \mathrm{Recall}} \qquad (8)$$
Precision is the ratio of correctly predicted positive observations to all predicted positive observations. Recall is the ratio of correctly predicted positive observations to all observations in the actual positive class. The F1 score is the F_β score with β = 1.
$$\mathrm{Precision} = \frac{TP}{TP + FP} \qquad (9)$$
$$\mathrm{Recall} = \frac{TP}{TP + FN} \qquad (10)$$
K-Nearest Neighbour (KNN) and Support Vector Machine (SVM) are the two distance-based classification algorithms used [23, 24]. KNN categorizes cases by their similarity, as measured by a distance metric; cases that are close to one another are "neighbours". An unknown data point is assigned the majority class label among its neighbours. SVM, in contrast, handles non-linearity in the dataset by mapping the data into a higher-dimensional space via a kernel function. Random Forest and Decision Tree are the two tree-based classifiers used. In a decision tree, the internal nodes represent features of the sample dataset, the branches represent decision rules, and the leaf nodes represent the final output, i.e. the class labels. The tree divides the training records into segments by recursive partitioning, reducing the impurity at each stage.
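The impurity reduction that drives recursive partitioning can be made concrete with the entropy criterion used for both tree-based models. The sketch below computes the entropy of a node and the information gain of a candidate split in plain NumPy. The toy labels are illustrative.

```python
import numpy as np


def entropy(labels: np.ndarray) -> float:
    """Shannon entropy (bits) of the class distribution in a node."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))


def information_gain(parent: np.ndarray, left: np.ndarray, right: np.ndarray) -> float:
    """Entropy of the parent minus the size-weighted entropy of the two children."""
    n = len(parent)
    children = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - children
```

A node holding equal numbers of all four bean classes has entropy log2(4) = 2 bits. A split that sends two classes left and two right would lower the weighted child entropy to 1 bit, a gain of 1 bit. The splitter picks the feature and threshold with the largest gain.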
A fully grown decision tree, however, has low bias but high variance. It overfits the training dataset and is likely to make many errors on new test data. Instead of a single decision tree, the Random Forest classifier combines multiple high-variance decision trees, each generated from a random subset of the main dataset. Combining the trees by majority vote converts the high variance into low variance. Furthermore, because rows and columns are randomly sampled for each tree, a change to part of the data affects only some of the trees, so adding new data or changing existing data has limited impact on the forest's predictions. Entropy is the criterion function chosen for both tree-based algorithms to select the root and internal nodes at the various levels of the tree. The aim is to find the tree whose nodes have the lowest entropy.
Figure 5: Our proposed Cocoa-Net model summary
The maximum depth of the decision tree was set to 4 after trying values from 1 to 10, giving the maximum accuracy score of 0.74 and an F1 score of 0.73. The Decision Tree model splits with the "entropy" criterion and the "best" splitter strategy. The Random Forest classifier is trained with the same criterion function, 150 decision trees, and a maximum depth of 4, giving a maximum accuracy score of 0.74 and an F1 score of 0.71. Table 4 reports the stratified K-fold cross-validation performance of the KNN, SVM, Decision Tree, Random Forest, proposed Cocoa-Net, and ResNet50 algorithms. Table 5 reports the overall performance evaluation of the same algorithms.
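Eqs. (7) to (10) can be computed directly from true and predicted labels. For the four-class problem, precision, recall, and F_β are computed per class, with one class treated as positive and the rest as negative, and then macro-averaged. The averaging scheme is our assumption, since the paper does not state which one it used. Multi-class accuracy is taken as the fraction of correct predictions, which reduces to Eq. (7) in the binary case.

```python
import numpy as np


def accuracy(y_true, y_pred) -> float:
    """Fraction of correct predictions (Eq. (7) in the binary case)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))


def per_class_precision_recall(y_true, y_pred, cls):
    """Eqs. (9)-(10) with `cls` as the positive class and all others as negative."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == cls) & (y_true == cls))
    fp = np.sum((y_pred == cls) & (y_true != cls))
    fn = np.sum((y_pred != cls) & (y_true == cls))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return float(precision), float(recall)


def macro_f_beta(y_true, y_pred, beta: float = 1.0) -> float:
    """Eq. (8) per class, averaged with equal weight over the classes in y_true."""
    scores = []
    for cls in np.unique(y_true):
        p, r = per_class_precision_recall(y_true, y_pred, cls)
        denom = beta ** 2 * p + r
        scores.append((1 + beta ** 2) * p * r / denom if denom else 0.0)
    return float(np.mean(scores))
```

For example, with true labels [0, 0, 1, 1, 2, 2, 3, 3] and predictions [0, 1, 1, 1, 2, 2, 3, 0], accuracy is 0.75. The per-class F1 scores are 0.5, 0.8, 1.0, and 2/3, for a macro F1 of about 0.742.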
For the KNN classification model, K values between 1 and 20 were tried on the training dataset. Based on the testing dataset, K = 10 is the smallest value for which the model reaches its maximum accuracy score of 0.76, with an F1 score of 0.71. Cocoa-Net outperforms the other models for the following reasons:
• Cocoa-Net employs a series of convolutional and pooling layers to extract features from the input images effectively. Each convolutional layer applies filters that detect different features, such as edges, textures, and shapes, which are crucial for accurate classification of cocoa beans. The network's structure allows it to capture fine-grained details and hierarchical features, enhancing its ability to differentiate between types of cocoa beans.
• The first convolutional layer uses a 5×5 kernel with six filters, which helps capture basic features such as edges. This larger kernel gives the initial layer a broader receptive field, enabling the network to gather more contextual information early on.
• The following layers use 3×3 kernels, which are standard in many state-of-the-art CNN architectures because they balance computational efficiency against the ability to capture fine details. The number of filters grows fourfold, from 16 in the second convolutional layer to 64 in the third, allowing the network to learn more complex features as depth increases.
• Pooling layers with a stride of 2 down-sample the feature maps, reducing their spatial dimensions while preserving the most significant features. This reduces the computational load and helps mitigate overfitting by providing a form of translational invariance.
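The effect of the kernel sizes, filter counts, and stride-2 pooling described in the bullets above can be traced with simple arithmetic. The sketch below is hypothetical: the 128×128 RGB input, "valid" (zero) padding, 2×2 pooling windows, one pooling layer after each convolution, and convolution stride 1 are all assumptions, since the exact layer configuration appears only in the model summary of Figure 5. It applies the standard output-size formula floor((n + 2p − k)/s) + 1 and counts weights plus biases for each convolution.

```python
"""Trace feature-map shapes and conv parameter counts for a hypothetical Cocoa-Net-like stack."""

# (name, kernel, stride, filters); filters=None marks a pooling layer.
# The layer list is an illustrative assumption reconstructed from the text.
LAYERS = [
    ("conv 5x5", 5, 1, 6),
    ("pool 2x2", 2, 2, None),
    ("conv 3x3", 3, 1, 16),
    ("pool 2x2", 2, 2, None),
    ("conv 3x3", 3, 1, 64),
    ("pool 2x2", 2, 2, None),
]


def out_size(n, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling window."""
    return (n + 2 * padding - kernel) // stride + 1


def conv_params(kernel, in_channels, filters):
    """Weights (k*k*C_in*C_out) plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters


def trace(input_size=128, in_channels=3):
    """Return a list of (name, height, width, channels, params) per layer."""
    size, channels, rows = input_size, in_channels, []
    for name, kernel, stride, filters in LAYERS:
        size = out_size(size, kernel, stride)
        params = 0
        if filters is not None:  # convolution: channels change, weights are learned
            params = conv_params(kernel, channels, filters)
            channels = filters
        rows.append((name, size, size, channels, params))
    return rows


if __name__ == "__main__":
    for row in trace():
        print(row)
```

Under these assumptions each stride-1 "valid" convolution trims only k − 1 pixels from each side length, while each stride-2 pooling layer halves it, so the 128×128 input ends as a 14×14×64 map. The three convolutions hold about 10.6 thousand parameters in total, which illustrates why such a shallow stack is far cheaper than ResNet-50 (roughly 25 million parameters).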
Analysis of the design choices and performance metrics of Cocoa-Net suggests that its architecture extracts relevant features efficiently with fewer layers, yielding better performance than the much deeper ResNet-50 on this dataset (accuracy of 0.85 versus 0.84 in Table 5). The choice of kernel sizes, the growth in the number of filters, and the placement of the pooling layers contribute to the model's ability to outperform the other models while remaining computationally efficient. Figure 6 shows the training and validation performance of the proposed Cocoa-Net and ResNet50 over the training iterations on the cocoa beans dataset. Figure 7 shows the classification reports produced for each of the five classification models. Since the dataset is not perfectly balanced, stratified K-fold cross validation with 10 folds is used to evaluate the classification models; it ensures that the original data, the training data, and the testing data all contain the same proportion of samples from each class. The stride configuration of Cocoa-Net is crucial for understanding its down-sampling process:
• In the absence of specific information, the stride of the convolutional layers is assumed to be 1. This is common practice in many CNN architectures where detailed stride information is not specified, and it preserves the resolution of the feature maps until the pooling layers down-sample them.
• The stride of a convolutional layer directly affects the spatial dimensions of its output feature maps. With a stride of 1 (and "same" padding), the feature maps retain their spatial dimensions, while a larger stride reduces them more aggressively. This control over dimensions can be crucial for capturing fine details in the early layers and gradually abstracting information in deeper layers.
• Using a stride of 1 in the convolutional layers balances the computational load by leaving down-sampling primarily to the pooling layers. This lets the network learn detailed spatial hierarchies before significant dimensionality reduction occurs, potentially improving feature learning.
• Keeping a stride of 1 in the convolutional layers before pooling operations with a larger stride (e.g., a stride of 2) helps retain more detailed features. This approach benefits models like Cocoa-Net, which aims to outperform deeper networks like ResNet-50 with a more efficient architecture.

Table 4: Performance evaluation of K-fold cross validation of KNN, SVM, Decision Tree, Random Forest, proposed Cocoa-Net, and ResNet50.
Fold  KNN    SVM    Decision Tree  Random Forest  Proposed Cocoa-Net  ResNet50
1     0.727  0.727  0.682          0.773          0.708               0.705
2     0.591  0.682  0.727          0.682          0.971               0.931
3     0.727  0.773  0.727          0.773          0.883               0.863
4     0.682  0.773  0.682          0.727          0.783               0.783
5     0.636  0.591  0.636          0.682          0.723               0.723
6     0.773  0.773  0.773          0.818          0.758               0.748
7     0.773  0.773  0.727          0.773          0.767               0.767
8     0.864  0.864  0.818          0.818          0.792               0.792
9     0.591  0.591  0.682          0.636          0.967               0.947
10    0.727  0.727  0.773          0.772          0.917               0.907
Max   0.861  0.861  0.821          0.821          0.971               0.947
Min   0.591  0.591  0.640          0.640          0.708               0.705
Mean  0.713  0.731  0.721          0.751          0.832               0.812

Table 5: Performance evaluation of KNN, SVM, Decision Tree, Random Forest, proposed Cocoa-Net, and ResNet50.
(KNN, SVM, Decision Tree, and Random Forest are machine learning techniques; the proposed Cocoa-Net and ResNet50 are deep learning models.)
Metric     KNN    SVM    Decision Tree  Random Forest  Proposed Cocoa-Net  ResNet50
Accuracy   0.760  0.730  0.740          0.740          0.850               0.840
Precision  0.568  0.788  0.740          0.850          0.852               0.841
Recall     0.585  0.610  0.600          0.580          0.850               0.841
F Score    0.570  0.645  0.640          0.610          0.850               0.835

Figure 6: Training and validation performance analysis on the cocoa bean dataset: (a) proposed Cocoa-Net, (b) ResNet50.
Figure 7: Classification accuracy scores of KNN, SVM, Decision Tree, Random Forest [31], and the proposed Cocoa-Net.

5 Conclusion
Pattern recognition is a complex task because numerous aspects of an image must be examined to achieve precise results. This study compared four conventional machine learning algorithms (KNN, SVM, Decision Tree, and Random Forest) with the proposed Cocoa-Net and ResNet50 for cocoa bean classification. The significance of the extracted features was examined by visualizing key image regions. The proposed Cocoa-Net performed better, with accuracy ranging from 0.72 to 0.97 and loss values ranging from 0.59 to 0.88. The accuracy and F1 scores of the four ML models range from 0.71 to 0.75 and from 0.68 to 0.73, respectively. According to the K-fold cross validation results, the Random Forest classifier has the highest mean accuracy score among the four ML models, 0.75.
The proposed Cocoa-Net model and ResNet50 achieve overall accuracy scores of 0.85 and 0.84, respectively. Through visualization techniques, the proposed Cocoa-Net offers pertinent information for future research on learning algorithms applied to the food industry. As a result, the Cocoa-Net approach may be applied as a quick and impartial way to distinguish among various types of cocoa beans in the food business. Additionally, the food industry can enhance supply chain product tracking by using visualization techniques.
Conflict of interest
The authors declare that they have no conflict of interest.
Data availability
All data analysed are included in this paper.
References
[1] R. Essah, D. Anand, and S. Singh (2022) An intelligent cocoa quality testing framework based on deep learning techniques. Measurement: Sensors 24: 100466. https://doi.org/10.1016/j.measen.2022.100466
[2] R. Hayati, Z. Zulfahrizal, and A. A. Munawar (2021) Robust prediction performance of inner quality attributes in intact cocoa beans using near infrared spectroscopy and multivariate analysis. Heliyon 7 (2): 1–7. https://doi.org/10.1016/j.heliyon.2021.e06286
[3] M. S. Farooq, S. Riaz, A. Abid, K. Abid, and M. A. Naeem (2019) A survey on the role of IoT in agriculture for the implementation of smart farming. IEEE Access 7: 156237–156271. https://doi.org/10.1109/ACCESS.2019.2949703
[4] C. Yoon, M. Huh, S. G. Kang, J. Park, and C. Lee (2018) Implement smart farm with IoT technology. In: 20th International Conference on Advanced Communication Technology (ICACT), pp. 749–752. https://doi.org/10.23919/ICACT.2018.8323908
[5] I. Abdulai, P. Vaast, M. P. Hoffmann et al. (2018) Cocoa agroforestry is less resilient to sub-optimal and extreme climate than cocoa in full sun. Global Change Biol. 24 (1): 273–286. https://doi.org/10.1111/gcb.13885
[6] D. N. de Oliveira, A. C. B. Camargo, C. F. O. R. Melo et al. (2018) A fast semiquantitative screening for cocoa content in chocolates using MALDI-MSI. Food Res. Int. 103: 8–11. https://doi.org/10.1016/j.foodres.2017.10.035
[7] M. Ghosh and A. Dey (2023) Fractional-weighted Entropy-based Fuzzy G-2DLDA Algorithm: A New Facial Feature Extraction method. Multimedia Tools and Applications 82 (2): 2689–2707. https://doi.org/10.1007/s11042-022-13328-7
[8] Y. Adhitya, S. W. Prakosa, M. Köppen et al. (2020) Feature Extraction for Cocoa Bean Digital Image Classification Prediction for Smart Farming Application. Agronomy 10 (11): 1642. https://doi.org/10.3390/agronomy10111642
[9] F. Nazir, M. N. Majeed, M. A. Ghazanfar et al. (2019) Mispronunciation detection using deep convolutional neural network features and transfer learning-based model for Arabic phonemes. IEEE Access 7: 52589–52608. https://doi.org/10.1109/ACCESS.2019.2912648
[10] M. Mukhopadhyay, A. Dey, A. Ghosh et al. (2022) Facial emotion recognition based on Textural pattern and Histogram of Oriented Gradient. Proceedings of the ICACIS 2022, pp. 111–119. https://doi.org/10.1007/978-3-031-25088-0_9
[11] J. K. Sing, A. Dey, and M. Ghosh (2019) Confidence Factor Weighted Gaussian Function Induced Parallel Fuzzy Rank Level Fusion for Inference and its Application to Face Recognition. Information Fusion 47: 60–71. https://doi.org/10.1016/j.inffus.2018.07.005
[12] A. Z. da Costa, H. E. Figueroa, and J. A. Fracarolli (2020) Computer vision-based detection of external defects on tomatoes using deep learning. Biosyst. Eng. 190: 131–144. https://doi.org/10.1016/j.biosystemseng.2019.12.003
[13] A. Bhargava and A. Bansal (2020) Quality evaluation of mono & bi-colored apples with computer vision and multispectral imaging. Multimedia Tools and Applications 79: 7857–7874. https://doi.org/10.1007/s11042-019-08564-3
[14] L. F. S. Pereira, S. Barbon, N. A. Valous et al. (2018) Predicting the ripening of papaya fruit with digital imaging and random forests. Comput. Electron. Agric. 145: 76–82. https://doi.org/10.1016/j.compag.2017.12.029
[15] H. Tian, T. Wang, Y. Liu et al. (2020) Computer vision technology in agricultural automation: a review. Inf. Process. Agric. 7 (1): 1–19. https://doi.org/10.1016/j.inpa.2019.09.006
[16] S. Navulur, A. S. C. S. Sastry, and M. N. G. Prasad (2017) Agricultural management through wireless sensors and Internet of Things. Int. J. Electr. Comput. Eng. 7: 3492–3499. https://doi.org/10.11591/ijece.v7i6.pp3492-3499
[17] S. K. Behera, A. K. Rath, A. Mahapatra et al. (2020) Identification, classification & grading of fruits using machine learning & computer intelligence: A review. J. Ambient Intell. Human. Comput. 11: 1–11. https://doi.org/10.1007/s12652-020-01865-8
[18] K. G. Liakos, P. Busato, D. Moshou et al. (2018) Machine Learning in Agriculture: A Review. Sensors 18 (8): 2674. https://doi.org/10.3390/s18082674
[19] V. Srikanth, G. K. Rajesh, A. Kothakota et al. (2020) Modeling and optimization of developed cocoa beans extractor parameters using box behnken design and artificial neural network. Computers and Electronics in Agriculture 177: 105715. https://doi.org/10.1016/j.compag.2020.105715
[20] O. Saha Mandal, A. Dey, A. Ghosh, and R. N. Shaw (2022) Fruit-Net: Fruits recognition system using Convolution Neural Network. Proceedings of the ICACIS 2022, pp. 120–133. https://doi.org/10.1007/978-3-031-25088-0_10
[21] H. S. Gill and B. S. Khehra (2021) Hybrid classifier model for fruit classification. Multimed. Tools Appl. 80: 27495–27530. https://doi.org/10.1007/s11042-021-10772-9
[22] G. Ashiagbor, O. A. Asare-Ansah, E. Boakye Amoah et al. (2023) Assessment of machine learning classifiers in mapping the cocoa-forest mosaic landscape of Ghana. Scientific African 20: e01718. https://doi.org/10.1016/j.sciaf.2023.e01718
[23] A. Dey, S. Chowdhury, and M. Ghosh (2017) Face Recognition using Ensemble Support Vector Machine. Proceedings of the ICRCICN 2017, pp. 46–50. https://doi.org/10.1109/ICRCICN.2017.8234479
[24] A. Dey and S. Chowdhury (2020) Probabilistic Weighted induced Multi-Class Support Vector Machines for Face Recognition. Informatica 44 (4): 345–353. https://doi.org/10.31449/inf.v44i4.3142
[25] M. S. Hossain, M. Al-Hammadi, and G. Muhammad (2018) Automatic Fruits Classification Using Deep Learning for Industrial Applications. IEEE Trans. Ind. Inform. 15: 1027–1034. https://doi.org/10.1109/TII.2018.2875149
[26] B. Dhiman, Y. Kumar, and M. Kumar (2022) Fruit quality evaluation using machine learning techniques: review, motivation and future perspectives. Multimedia Tools and Applications 81 (12): 16255–16277. https://doi.org/10.1007/s11042-022-12652-2
[27] J. Tan, B. Balasubramanian, D. A. Sukha et al. (2019) Sensing fermentation degree of cocoa (Theobroma cacao L.) beans by machine learning classification models based electronic nose system. J. Food Process Eng. 42 (4): e13175. https://doi.org/10.1111/jfpe.13175
[28] J. Cruz-Tirado, J. A. Fernandez Pierna, H. Rogez et al. (2020) Authentication of cocoa (Theobroma cacao) bean hybrids by NIR-hyperspectral imaging and chemometrics. Food Control 118: 107445. https://doi.org/10.1016/j.foodcont.2020.107445
[29] A. Dey, S. Chowdhury, and J. K. Sing (2018) Performance Evaluation on Image Fusion Techniques for face recognition. International Journal of Computational Vision and Robotics 8 (5): 455–475. https://doi.org/10.1504/IJCVR.2018.095000
[30] A. Dey, S. Chowdhury, M. Ghosh, and S. Kahali (2023) T2-Fuzzy Multi-Fused Facial Image Fusion (T2FMIF): An Efficient Face Recognition. Journal of Intelligent & Fuzzy Systems 45 (1): 743–761. https://doi.org/10.3233/JIFS-224288
[31] S. Das, A. Akuli, S. Biswas et al. (2022) Discrimination of Cocoa Beans using Structural Image Features: An Experimental Analysis. IEEE IAS Global Conference on Emerging Technologies (GlobConET), pp. 1138–1142. https://doi.org/10.1109/GlobConET53749.2022.9872329
[32] H. Tercan and T. Meisen (2022) Machine learning and deep learning based predictive quality in manufacturing: a systematic review. J. Intell. Manuf. 33: 1879–1905. https://doi.org/10.1007/s10845-022-01963-8
[33] T. H. E. Kim, H. R. Kim, and Y. J. Cho (2021) Product inspection methodology via deep learning: an overview. Sensors 21 (15): 5039. https://doi.org/10.3390/s21155039
[34] J. F. Lopes, V. G. T. da Costa, F. D. Barbin et al. (2022) Deep computer vision system for cocoa classification. Multimed. Tools Appl. 81: 41059–41077. https://doi.org/10.1007/s11042-022-13097-3
[35] C. D. Anggraini, A. W. Putranto, Z. Iqbal et al. (2021) Preliminary study on development of cocoa beans fermentation level measurement based on computer vision and artificial intelligence. IOP Conference Series: Earth and Environmental Science 924 (1): 012019. https://doi.org/10.1088/1755-1315/924/1/012019
[36] I. O. Abu, Z. Szantoi, A. Brink et al. (2021) Detecting cocoa plantations in Côte d'Ivoire and Ghana and their implications on protected areas. Ecol. Indic. 129: 107863. https://doi.org/10.1016/j.ecolind.2021.107863
[37] M. M. Oliveira, B. V. Cerqueira, S. Barbon et al. (2021) Classification of fermented cocoa beans (cut test) using computer vision. J. Food Compos. Anal. 97: 103771. https://doi.org/10.1016/j.jfca.2020.103771
[38] M. Mukhopadhyay, A. Dey, and S. Kahali (2023) A Deep-Learning-Based Facial Expression Recognition Method Using Textural Features. Neural Computing and Applications 35 (9): 6499–6514. https://doi.org/10.1007/s00521-022-08005-7
[39] M. Ghosh, A. Dey, and S. Kahali (2022) Type-2 fuzzy blended improved D-S evidence theory-based decision fusion for face recognition. Appl. Soft Comput. 125: 109179. https://doi.org/10.1016/j.asoc.2022.109179
[40] M. Ghosh, A. Dey, and S. Kahali (2024) A weighted fuzzy belief factor-based D-S evidence theory of sensor data fusion method and its application to face recognition. Multimedia Tools and Applications 83: 10637–10659. https://doi.org/10.1007/s11042-023-16037-x