*Corr. Author’s Address: School of Mechanical Engineering and Mechanics, Xiangtan University, Xiangtan, China, zhouyouhang@xtu.edu.cn
Strojniški vestnik - Journal of Mechanical Engineering 70(2024)11-12, 554-568 Received for review: 2023-12-18
© 2024 The Authors. CC BY 4.0 Int. Licensee: SV-JME Received revised form: 2024-10-08
DOI:10.5545/sv-jme.2023.900 Original Scientific Paper Accepted for publication: 2024-10-09
Improving the Efficiency of Steel Plate Surface Defect 
Classification by Reducing the Labelling Cost  
Using Deep Active Learning
Yang, W. – Zhou, Y. – Meng, G. – Li, Y. – Gong, T.
Wenjia Yang¹ – Youhang Zhou¹,²,* – Gaolei Meng¹ – Yuze Li¹ – Tianyu Gong¹
¹ Xiangtan University, School of Mechanical Engineering and Mechanics, China
² Xiangtan University, Engineering Research Center of Complex Tracks Processing Technology and Equipment of Ministry of Education, China
Efficient surface defect classification is one of the research hotspots in steel plate defect recognition. Compared with traditional methods, 
deep learning methods have been effective in improving classification accuracy and efficiency, but require a large amount of labeled data, 
resulting in limited improvement of detection efficiency. To reduce the labeling effort under the premise of satisfying the classification 
accuracy, a deep active learning method is proposed for steel plate surface defects classification. Firstly, a lightweight convolutional neural 
network is designed, which speeds up the training process and enhances the model regularization. Secondly, a novel uncertainty-based 
sampling strategy, which calculates Kullback-Leibler (KL) divergence between two kinds of distributions, is used as an uncertainty measure 
to select new samples for labeling. Finally, the performance of the proposed method is validated using the steel surface defects dataset from 
Northeastern University (NEU-CLS) and the milling steel surface defects dataset from a local laboratory. The proposed global pooling-based 
classifier with global average pooling (GAPC) network model combined with the Kullback-Leibler divergence sampling (KLS) strategy has the 
best performance in the classification of steel plate surface defects. This method achieves 97 % classification accuracy with 44 % labeled 
data on the NEU-CLS dataset and 92.3 % classification accuracy with 50 % labeled data on the milling steel surface defects dataset. The 
experimental results show that the proposed method can achieve steel surface defects classification accuracy of not less than 92 % with no 
more than 50 % of the dataset to be labeled, which indicates that this method has potential application in surface defect classification of 
industrial products.
Keywords: surface defect classification, convolutional neural network, active learning, global pooling
Highlights
• 	 A deep active learning method for improving the efficiency of steel plate surface defect classification by reducing labeling cost 
is proposed.
• 	 Proposed GAPC-based CNN model can speed up the training process and enhance the model regularization.
• 	 The KLS uncertainty sampling strategy can effectively reduce the amount of label data required.
• 	 The proposed method can achieve classification accuracy of not less than 92 % with no more than 50 % of the dataset 
requiring labeling.
0  INTRODUCTION
Steel plate is widely utilized in aerospace production 
[1] and [2], architecture industry [3] and [4], and 
machinery manufacturing [5] and [6]. Its surface 
defects are the key factors affecting the quality of 
steel products. However, different categories of 
steel surface defects often occur during production. 
Surface defects of the steel plate, such as rolled-in 
scale, patches, crazing, pitted surface, inclusion, and 
scratches, are unavoidable. The defects not only affect 
production quality but also incur economic losses and 
give rise to safety concerns. An efficient classification 
of steel plate surface defects can contribute to a better 
understanding of the causes of defect formation, 
optimize production processes, enhance product 
quality and improve economic efficiency. Therefore, 
efficient and accurate classification of surface defects 
has become an indispensable function in the iron and 
steel industry.
The traditional methods of steel plate surface 
defect inspection mainly include manual visual 
inspection, eddy current inspection [7] and magnetic 
flux leakage testing [8]. Owing to the influence 
of subjective factors and a high inspection error rate, 
these methods have been unable to meet the current 
inspection requirements of the iron and steel industry. 
In recent years, with the development of science and 
technology, the inspection technology based on deep 
learning and machine vision, as a kind of non-contact 
inspection method, has become a research hotspot in 
the field of surface defect inspection [9] to [11]. As 
a kind of deep learning model, convolutional neural 
network (CNN) [12] has outstanding performance in 
many classification tasks in industry. The success of 
CNN models for classification tasks brings a rapid 
development of CNN-based steel defect classification 
methods. Zhou et al. [13] proposed a CNN model for 
effective and robust classification of surface defects 
in hot rolled steel sheets. This model achieved a 
classification accuracy of over 97 % through 500 
iterations using 60 % of the dataset as labeled data 
for training. He et al. [14] proposed a new method for 
defects detection and classification of low carbon steel 
wire arc additive manufacturing (WAAM) products 
using an improved cost-sensitive convolutional 
neural network. This method achieved a classification 
accuracy of over 92 % through 800 iterations using 
75 % of the dataset as labeled data for training. 
However, although the above studies can obtain ideal 
classification accuracy, they need more than 60 % of 
the dataset as labeled data for training, which will 
inevitably produce high labeling cost and affect the 
classification efficiency.
Currently, among the methods that can 
effectively alleviate the labeling cost, four of them 
have shown great potential: transfer learning [15], 
data augmentation by generative models [16] and 
[17], semi-supervised learning [18] to [20], and active 
learning [21]. 
Transfer learning is developed on the assumption 
that earlier layers in the convolutional base learn 
generic, reusable local patterns like curves and edges, 
while higher layers learn task-specific features. Hence 
the lower layers in an existing model trained on one 
big dataset can be reused on a relatively small-sized 
target dataset to improve the generalization ability of 
the model. Both Fu [22] and Yang [23] adopted pre-
trained SqueezeNet [24] as backbone architecture. 
Although the high classification accuracy had been 
achieved, all available data still needs to be labeled. 
In addition, surface defects have different image 
contexts compared to most large datasets, so it is hard 
to find the right number of layers to reuse [25]. This 
means the training time will be inevitably extended. 
As shown in [22], the model was trained on the NEU-CLS 
dataset [26] in 20 minutes using an NVIDIA TITAN 
X GPU (12 GB memory). 
Generative models, like variational autoencoders 
(VAE) [27], generative adversarial networks (GAN) 
[28], and their variants, provide a different way to 
solve the problem. Instead of manually collecting 
more training data, the existing samples can be used 
to guide the generation of new artificial samples. Yun 
et al. [29] used conditional convolutional VAE [30] 
to generate images for each kind of defect and then 
used a CNN-based model for classification. Tang et al. 
[31] took a similar approach to classify photovoltaic 
module defects. However, the generative model 
they adopted was GAN. This kind of method has 
the disadvantage of generating many samples with 
less information because the generation process does 
not take sample informativeness into account [32]. 
Consequently, these methods may prolong the training 
time and waste computational resources. 
Semi-supervised learning uses both labeled and 
unlabeled data for model training. Gao et al. [33] 
proposed a combined pseudo-label CNN (PLCNN). 
However, PLCNN abandoned the unsupervised 
pretraining process that plays an essential role 
in the original paper. This may harm the model’s 
classification ability. The accuracy of PLCNN on 
NEU-CLS dataset is 90.7 %, which is inferior to 
other methods. He et al. [34] and He et al. [35] both 
utilized semi-supervised GAN (SGAN) to perform 
defects classification. The major difference between 
their works is the former used a trained convolutional 
autoencoder [36] to initialize the discriminator in 
SGAN with identical topology, whereas the latter 
trained another residual network and combined it 
with SGAN to form a multi-training algorithm. 
Using generative models may help to learn the latent 
structure of defects, but it will take more time and 
computational resources to complete the training.
Compared with the above three methods, the 
uncertainty-based active learning method is an 
effective approach to reduce both labeling cost and 
computing resource, where the most informative 
samples are incrementally selected for labeling 
to improve the model classification ability at low 
labeling budgets. Yang et al. [37] presented a new 
framework that combines a fully convolutional 
network and an uncertainty method in active learning 
to reduce biomedical image analysis annotation effort 
by making judicious suggestions on the most effective 
annotation areas. This method can achieve state-of-
the-art segmentation performance using 50 % of the 
training data. There are three widely used uncertainty 
sampling strategies, namely least confident (LC) [38], 
margin sampling (MS) [39] and entropy (EN) [40]. 
These strategies assume that the model’s predictions on 
the unlabeled data pool reflect the model’s uncertainty 
about the unlabeled data. By applying different 
uncertainty measures, the most informative samples 
can be selected for labeling. However, these methods 
only utilize the model predictions on the unlabeled 
data, ignoring the uncertainty information of the 
model on the labeled data, which is considered useful. 
By taking both types of uncertainty into consideration, 
the uncertainty of the model can be better measured, and 
the most informative samples can be screened out for 
labeling.
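The three traditional uncertainty measures named above can be illustrated with a minimal NumPy sketch. The function names and the toy probability array are hypothetical and serve only as an illustration; the formulas follow the usual textbook definitions of LC, MS and EN rather than any implementation given in this paper.

```python
import numpy as np

def least_confident(probs):
    """LC: 1 minus the highest class probability (larger = more uncertain)."""
    return 1.0 - probs.max(axis=1)

def margin_score(probs):
    """MS: negative gap between the two largest probabilities."""
    top2 = np.sort(probs, axis=1)[:, -2:]
    return -(top2[:, 1] - top2[:, 0])

def entropy_score(probs, eps=1e-12):
    """EN: Shannon entropy of the predicted distribution."""
    return -np.sum(probs * np.log(probs + eps), axis=1)

def select_most_uncertain(scores, k):
    """Indices of the k samples with the highest uncertainty score."""
    return np.argsort(scores)[::-1][:k]

# Toy predictions for three unlabeled samples over three classes (made up).
probs = np.array([
    [0.90, 0.05, 0.05],   # confident prediction
    [0.40, 0.35, 0.25],   # ambiguous prediction
    [0.60, 0.30, 0.10],
])

for name, fn in [("LC", least_confident), ("MS", margin_score), ("EN", entropy_score)]:
    print(name, select_most_uncertain(fn(probs), k=1))
```

All three measures use only the predictions on the unlabeled samples, which is the limitation discussed above.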
In this work, a deep active learning method for 
steel plate defect classification is proposed. To enhance 
the learning efficiency and reduce the computational 
cost, a simplified convolutional network is designed 
based on simple features of hot rolled steel plate 
surface defects to expedite the training process. A 
global pooling layer is adopted to improve the model’s 
generalization ability. Experiments are carried out on 
both global average pooling and global max pooling 
to find which is more suitable for active learning. 
Then, the average probability distribution over classes 
(PDC) calculated from labeled data for a specific class 
is considered as the best performance of the model 
on this class to integrate two kinds of uncertainty. 
By quantifying the difference between the PDC of an 
unlabeled sample and the optimal model performance 
on the predicted sample label, a new uncertainty 
index is obtained to guide sample selection. Based 
on experiment results on the NEU-CLS dataset 
and milling steel plate surface defects dataset, the 
proposed method can achieve superior classification 
results with less labeled data. 
1  OVERALL FRAMEWORK
The framework (Fig. 1) consists of two key 
components: a convolutional neural network for 
model training and a sampling strategy for data 
collection. A small portion of the existing dataset is 
randomly selected for labeling. The selected samples 
form the labeled data pool, denoted as $D_L$, 
and the rest of the samples compose the unlabeled 
data pool, denoted as $D_U$. As shown in Fig. 1, 
the model is first trained on the initial labeled 
data pool. Then the trained model is used to predict 
the labeled and unlabeled data respectively, and 
defect images are selected from the unlabeled data 
pool for labeling according to the proposed sampling 
strategy. At this point, the two data pools are updated. 
The training process is restarted and repeated until the 
model classification performance is satisfactory.
2  METHOD
2.1  Model Design
2.1.1  Feature Extractor
Steel plate surface defects are not as complex as human 
faces or other objects with lots of features. There is 
no need to use a big and complex convolutional base 
which can be hard to train. Hence, a shallow network 
is utilized to reduce training time. The convolutional 
base adopted in this paper can be considered as a 
shallow version of the visual geometry group (VGG) 
network [41] and [42], where only four convolutional 
layers are kept, and every two convolutional layers 
are followed by a max-pooling layer. All batch 
normalization layers are removed to speed up training.
Fig. 1.  Overview of the proposed method framework
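As a rough check of the feature extractor described above, the following Python sketch traces the spatial size of a 200×200×1 input through the four convolutional and two max-pooling layers, using the kernel sizes and output channels listed in Table 1. Several points are assumptions rather than statements from the paper: 'same' padding for the convolutions, and a pooling stride of 2 (Table 1 lists stride 1 for pooling, while its output sizes halve, so stride 2 is assumed here to reproduce those sizes). The names `LAYERS`, `trace_shapes`, `conv_params` and `total_params` are illustrative.

```python
import math

# (name, kind, stride, output channels); conv kernels are 3x3 as in Table 1.
LAYERS = [
    ("conv1", "conv", 2, 32),
    ("conv2", "conv", 2, 32),
    ("pool1", "pool", 2, None),   # stride 2 is an assumption, see lead-in
    ("conv3", "conv", 2, 64),
    ("conv4", "conv", 1, 64),
    ("pool2", "pool", 2, None),
]

def trace_shapes(size=200, channels=1):
    """Return [(name, height_or_width, channels)] after each layer."""
    shapes = []
    for name, kind, stride, out_ch in LAYERS:
        if kind == "conv":
            size = math.ceil(size / stride)   # 'same' padding (assumed)
            channels = out_ch
        else:
            size = size // stride             # 2x2 pooling without padding (assumed)
        shapes.append((name, size, channels))
    return shapes

def conv_params(kernel, c_in, c_out):
    """Weights plus biases of one convolutional layer."""
    return kernel * kernel * c_in * c_out + c_out

def total_params(n_classes=6, in_channels=1):
    """Trainable parameters of the conv base plus the final dense layer."""
    total, c_in = 0, in_channels
    for _, kind, _, out_ch in LAYERS:
        if kind == "conv":
            total += conv_params(3, c_in, out_ch)
            c_in = out_ch
    return total + c_in * n_classes + n_classes   # FC layer after global pooling

for name, size, ch in trace_shapes():
    print(f"{name}: {size}x{size}x{ch}")
print("parameters:", total_params())
```

Under these assumptions the traced sizes agree with the output sizes in Table 1, which suggests the network is small enough to train quickly on a modest labeled pool.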
2.1.2  Global Pooling as Structural Regularizer
The traditional classifier (TRC), which follows the 
convolutional base, is composed of two hidden layers 
and a dense output layer [43] and [44]. In recent years, 
a new type of classifier has emerged. Scholars [45] and 
[46] have replaced the two hidden layers in the traditional 
classifier with a global average pooling (GAP) layer. 
Szegedy claimed that this replacement has boosted 
the top-1 accuracy by about 0.6 %. The outputs of the 
feature extractor are multiple feature maps. In the TRC 
setting, the feature maps need to go through the flatten 
layer to be expanded into a one-dimensional feature 
vector before they can be passed into the classifier. 
In contrast, GAP and global max pooling (GMP) 
output the average and the maximum value of each 
feature map, respectively. Fig. 2 shows the difference 
between flatten and global pooling.
Fig. 2.  The difference between flatten and global pooling
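The difference between flattening and global pooling can be made concrete with a small NumPy sketch. The 6×6×64 feature-map size is taken from Table 1; the random values and the 6-way output layer used for the weight count are illustrative assumptions, and the hidden layers of TRC (which would add further weights) are deliberately left out.

```python
import numpy as np

rng = np.random.default_rng(0)
feature_maps = rng.random((6, 6, 64))    # height x width x channels

flat = feature_maps.reshape(-1)          # flatten: 6 * 6 * 64 = 2304 values
gap = feature_maps.mean(axis=(0, 1))     # GAP: one average per feature map
gmp = feature_maps.max(axis=(0, 1))      # GMP: one maximum per feature map

print(flat.shape, gap.shape, gmp.shape)

# Weights and biases of a 6-way output layer attached directly to each vector.
n_classes = 6
dense_after_flatten = flat.size * n_classes + n_classes
dense_after_pooling = gap.size * n_classes + n_classes
print(dense_after_flatten, dense_after_pooling)
```

Because global pooling reduces every feature map to a single number, the classifier that follows has far fewer weights to fit, which is one way to read its role as a structural regularizer.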
To explore the performance of classifiers based 
on global pooling, this paper carried out a comparative 
analysis of global pooling-based classifier (GPC) 
and TRC based on commonly used regularization 
methods, and clarified which classifier is the most 
effective in the subsequent experimental results. The 
designed CNN model structure is shown in Fig. 3. 
Compared with traditional CNN, the main difference 
is the use of global pooling to replace the hidden 
layers of the final densely connected classifier.
2.2  Sampling Strategy
The labeled data pool $D_L$ can not only be used to train 
the model, but also contains the model’s uncertainty 
information about the dataset. Unlike traditional 
uncertainty-based sampling strategies, which only 
utilize model predictions on the unlabeled data, the 
Kullback-Leibler divergence sampling (KLS) strategy is 
proposed to consider the uncertainty of the model on 
the labeled data and incorporate it into the sampling 
process. 
A single sample in the labeled data pool $D_L$ 
or the unlabeled data pool $D_U$ is denoted as $x$, and 
the corresponding label is $y$. The initial model is 
trained on $D_L$. After predicting every sample in $D_U$, 
a prediction array is available, each row of which is 
the PDC for a specific sample. For sample $x_u \in D_U$, its 
PDC is denoted as: 
$$p(y = y' \mid x_u, W) = \left[\, p(y_1 \mid x_u, W),\; p(y_2 \mid x_u, W),\; \ldots,\; p(y_C \mid x_u, W) \,\right]_{1 \times C}, \tag{1}$$
where $p(y_c \mid x_u, W)$, $c \in \{1, \ldots, C\}$, represents the 
probability that the label of sample $x_u$ is $y_c$, with $C$ 
being the number of classes. $W$ represents the model 
parameters. $y'$ is the predicted label, which can be 
calculated by:
$$y' = \arg\max_{y} \; p(y \mid x_u, W). \tag{2}$$
Fig. 3.  Architecture of the designed network
The generated prediction array can be split 
into separate PDC subgroups based on the sample’s 
predicted label. Suppose $y' = y_c$; the PDC of $x_u$ will then be 
contained in group $y_c$, which is: 
$$p(y = y_c \mid x_u, W) = \left[\, p(y_1 \mid x_u, W),\; p(y_2 \mid x_u, W),\; \ldots,\; p(y_C \mid x_u, W) \,\right]_{1 \times C}. \tag{3}$$
The next step is to predict the labeled data. A 
trained model with strong regularization is 
optimized for the training data pool $D_L$. Therefore, its 
predictions on $D_L$ are considered as its best 
performance. For sample $x_l \in D_L$, its PDC and 
predicted label can be calculated by Eqs. (1) and (2). 
For label $y_c$, the average PDC $p(y = y_c \mid X_c^l, W)_{avg}$ can 
be calculated by:  
$$p(y = y_c \mid X_c^l, W)_{avg} = \frac{1}{N} \left[\, \sum_{n=1}^{N} p(y_1 \mid x_n^l, W),\; \sum_{n=1}^{N} p(y_2 \mid x_n^l, W),\; \ldots,\; \sum_{n=1}^{N} p(y_C \mid x_n^l, W) \,\right], \tag{4}$$
where $X_c^l$ represents the set of labeled samples whose 
predicted labels are $y_c$, $x_n^l$ is the $n$-th sample of this set, 
and $N$ is the size of $X_c^l$. 
$p(y = y_c \mid X_c^l, W)_{avg}$ is taken as the best 
performance of the model on label $y_c$. In the PDC group 
belonging to $y_c$, any normal sample’s PDC should be 
close to this average distribution. If $p(y = y_c \mid x_u, W)$ 
diverges too far from $p(y = y_c \mid X_c^l, W)_{avg}$, the sample $x_u$ 
is considered as abnormal. In other words, the model 
is uncertain about the class of sample $x_u$.
To measure the difference between two 
distributions, the Kullback-Leibler (KL) divergence 
[47] is introduced:
$$KL(P \,\|\, Q) = \sum_{i} P(i) \log \frac{P(i)}{Q(i)}. \tag{5}$$
The KL divergence between these two 
distributions (kls), taken as the model’s uncertainty about 
sample $x_u$, can be calculated by: 
$$kls_u = KL\left( p(y = y_c \mid X_c^l, W)_{avg} \,\|\, p(y = y_c \mid x_u, W) \right). \tag{6}$$
$p(y = y_c \mid X_c^l, W)_{avg}$ is calculated for every label $y_c$ 
as the performance baseline. Then, for every sample’s 
PDC in every PDC subgroup, the corresponding kls 
value is calculated. In every subgroup, the top $k$ 
samples with the highest kls values are selected for 
labeling. This strategy naturally guarantees 
diversity in each selected batch by sampling an equal 
number of samples from every subgroup. This means 
that the total number of selected samples will be $k \times C$.
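The selection step built on Eqs. (5) and (6) can be sketched as follows. The sketch assumes that the per-class baselines of Eq. (4) are already available as a (C, C) array with one valid row per class; the function names, the clipping constant `eps` and the toy numbers are illustrative assumptions.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) for two discrete distributions (Eq. (5))."""
    p = np.clip(p, eps, 1.0)   # clipping avoids log(0); an implementation choice
    q = np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))

def kls_select(probs_unlabeled, baselines, k):
    """Pick up to k unlabeled samples per predicted class with the largest kls.

    probs_unlabeled: (N_u, C) predicted distributions on D_U.
    baselines: (C, C) array, row c = average PDC on D_L for label y_c.
    Returns a sorted list of indices into probs_unlabeled.
    """
    predicted = probs_unlabeled.argmax(axis=1)
    selected = []
    for c in range(baselines.shape[0]):
        members = np.flatnonzero(predicted == c)        # PDC subgroup of y_c
        if members.size == 0:
            continue
        scores = np.array(
            [kl_divergence(baselines[c], probs_unlabeled[i]) for i in members]
        )                                               # Eq. (6)
        top = np.argsort(scores)[::-1][:k]
        selected.extend(members[top].tolist())
    return sorted(selected)

# Toy two-class example (made-up numbers).
baselines = np.array([[0.9, 0.1],
                      [0.1, 0.9]])
probs_unlabeled = np.array([
    [0.95, 0.05],   # close to the class-0 baseline
    [0.55, 0.45],   # far from the class-0 baseline
    [0.20, 0.80],   # close to the class-1 baseline
    [0.48, 0.52],   # far from the class-1 baseline
])
print(kls_select(probs_unlabeled, baselines, k=1))
```

In the toy example the two samples whose distributions lie farthest from their class baseline are selected, one per predicted class, which mirrors the per-subgroup sampling described above.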
2.3  Stopping Criterion
Considering that active learning is iterative, a stopping 
criterion is needed to stop the training process when 
the target model performance is reached. Therefore, the 
stopping criterion should be closely connected to the 
model performance. As the designed model is strongly 
regularized, validation accuracy (VA) is adopted as the 
criterion. According to the experiments, the validation 
accuracy of the network is always less than or equal to 
the test accuracy. 
The original dataset is split into three parts: 
$D_L$ for training, $D_V$ for validation, and $D_U$ for sampling. 
Therefore, the total labeling cost consists of the 
labeling of $D_L$, $D_V$ and the sampled data. The pseudo-
code of the proposed method is shown in Algorithm I. 
The implementation process of this method is shown 
in Fig. 4.
Algorithm I: Deep active learning method for steel 
plate surface defect classification 
Input: initial training data pool $D_L$, validation data 
pool $D_V$, unlabeled data pool $D_U$, randomly initialized 
model parameters $W'$, stopping criterion VA, number 
of sampled images for each class $k$. 
Output: optimized model parameters $W$.
1 Repeat:
2 Make predictions on $D_U$;
3 The predicted labels are calculated by Eq. (2);
4 The PDCs of all samples are collected and grouped 
according to the predicted label;
5 Make predictions on $D_L$;
6 The performance benchmark of the model on 
each label is calculated according to Eq. (4);
7 The KLS value corresponding to each PDC is 
calculated by Eqs. (5) and (6);
8 In each PDC group, the $k$ samples with the highest 
KLS values are selected;
9 The samples are labeled, and $D_L$ and $D_U$ are updated;
10 The model is trained with $D_L$;
11 $D_V$ is used to verify the performance of the model 
and VA' is calculated;
12 Until VA' > VA.
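The loop of Algorithm I can be sketched in Python as below. To keep the example self-contained, the CNN is replaced by a toy nearest-centroid classifier on synthetic 2-D data, and the true labels of the pool play the role of the human annotator; both are stand-ins introduced here, not part of the proposed method. The sketch also assumes an initial training pass on $D_L$ before the first round and that every class is present in $D_L$. All names (`CentroidModel`, `kls_select`, `active_learning`, `blobs`) are hypothetical.

```python
import numpy as np

class CentroidModel:
    """Toy stand-in for the CNN: softmax over negative centroid distances."""
    def __init__(self, n_classes):
        self.n_classes = n_classes
        self.centroids = None

    def fit(self, x, y):
        # Assumes every class has at least one labeled sample.
        self.centroids = np.stack(
            [x[y == c].mean(axis=0) for c in range(self.n_classes)])

    def predict_proba(self, x):
        d = np.linalg.norm(x[:, None, :] - self.centroids[None, :, :], axis=2)
        e = np.exp(-d)
        return e / e.sum(axis=1, keepdims=True)

def kls_select(probs_u, probs_l, k, eps=1e-12):
    """Steps 3-8: per predicted class, pick the k samples with the largest kls."""
    pred_u, pred_l = probs_u.argmax(axis=1), probs_l.argmax(axis=1)
    picked = []
    for c in range(probs_u.shape[1]):
        members = np.flatnonzero(pred_u == c)
        group_l = probs_l[pred_l == c]
        if members.size == 0 or len(group_l) == 0:
            continue                                   # no subgroup or no baseline
        base = np.clip(group_l.mean(axis=0), eps, 1.0)           # Eq. (4)
        q = np.clip(probs_u[members], eps, 1.0)
        kls = np.sum(base * np.log(base / q), axis=1)            # Eqs. (5), (6)
        picked.extend(members[np.argsort(kls)[::-1][:k]].tolist())
    return np.array(picked, dtype=int)

def active_learning(x_l, y_l, x_u, y_u, x_v, y_v, n_classes, k, target_va,
                    max_rounds=20):
    model = CentroidModel(n_classes)
    model.fit(x_l, y_l)                    # initial training on D_L (assumed)
    va = 0.0
    for _ in range(max_rounds):
        picked = kls_select(model.predict_proba(x_u),            # steps 2-8
                            model.predict_proba(x_l), k)
        if picked.size == 0:
            break                                                # pool exhausted
        x_l = np.concatenate([x_l, x_u[picked]])                 # step 9
        y_l = np.concatenate([y_l, y_u[picked]])                 # oracle labels
        x_u, y_u = np.delete(x_u, picked, axis=0), np.delete(y_u, picked)
        model.fit(x_l, y_l)                                      # step 10
        va = float(np.mean(model.predict_proba(x_v).argmax(axis=1) == y_v))
        if va > target_va:                                       # steps 11-12
            break
    return model, len(y_l), va

# Synthetic three-class data (purely illustrative).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])

def blobs(n_per_class):
    x = np.concatenate([c + rng.normal(size=(n_per_class, 2)) for c in centers])
    return x, np.repeat(np.arange(3), n_per_class)

x_l, y_l = blobs(2)
x_u, y_u = blobs(40)
x_v, y_v = blobs(20)
model, n_labeled, va = active_learning(x_l, y_l, x_u, y_u, x_v, y_v,
                                       n_classes=3, k=2, target_va=0.95)
print(n_labeled, round(va, 3))
```

The `max_rounds` guard is an addition for safety in the sketch; Algorithm I itself stops only on the validation-accuracy criterion.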
3  EXPERIMENTAL SETUP
The public hot-rolled steel strip database from 
Northeastern University (NEU-CLS) [26] is used to 
evaluate the effectiveness and applicability of the 
proposed method for the defect classification of a steel 
plate surface.
3.1 NEU-CLS Dataset
NEU-CLS is a standard high-quality database 
collecting the typical defects of hot-rolled steel 
plate surfaces. This dataset includes six types of 
defects: rolled-in scale, patch, crazing, pitted surface, 
inclusion, and scratches. There are 300 grayscale 
images for each type of defect (with a total number 
of 1800 images). The resolution of each image is 
200×200 pixels. A selection of defect images is shown 
in Fig. 5.
3.2 Implementation Details
The experiments are performed on a work 
computer with an Intel(R) Core i5-11400F CPU, 16 GB 
memory and an NVIDIA GeForce RTX 3060 GPU (12 GB 
memory). The NEU-CLS dataset is divided into 
three parts: 60 % for training, 20 % for 
validation, and 20 % for testing. Then 5 % of the training 
data (9 images per class) are sampled at random as the 
initial labeled data pool $D_L$. The rest of the training 
data make up the unlabeled data pool $D_U$. The batch size is 
set to 8 and the SGD optimizer is used with a momentum 
of 0.9. Early stopping with a patience of 10 is used 
for all experiments to stop the training process when 
overfitting begins. The results of the experiments are 
averaged over 5 runs. The detailed information of the 
designed network is shown in Table 1. The network 
takes a 200×200×1 image as input and outputs 6 class 
predictions. 
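The pool sizes implied by this split can be checked with a few lines of Python. The dataset figures (6 classes, 300 images per class, 60/20/20 split, 5 % initial labeled data) come from the text; the helper names are hypothetical, and the value k = 9 used in the usage lines is an illustrative assumption, since the number of images sampled per class and round is not stated in this section. Following Section 2.3, the labeled fraction counts $D_L$, $D_V$ and the sampled data.

```python
def neu_cls_split(n_classes=6, per_class=300, train=0.6, val=0.2, init_frac=0.05):
    """Pool sizes for the split described in Section 3.2."""
    total = n_classes * per_class
    n_train = round(total * train)
    n_val = round(total * val)
    n_init = round(n_train * init_frac)
    return {
        "total": total,
        "train": n_train,
        "val": n_val,
        "test": total - n_train - n_val,
        "initial_labeled": n_init,
        "initial_per_class": n_init // n_classes,
        "unlabeled_pool": n_train - n_init,
    }

def labeled_fraction(split, rounds, k, n_classes=6):
    """Share of the whole dataset labeled after `rounds` sampling rounds,
    counting D_L, D_V and k * C newly labeled images per round."""
    labeled = split["initial_labeled"] + split["val"] + rounds * k * n_classes
    return labeled / split["total"]

split = neu_cls_split()
print(split)
# k = 9 is a hypothetical value chosen only for illustration.
print(labeled_fraction(split, rounds=0, k=9))
print(labeled_fraction(split, rounds=3, k=9))
```

With these numbers the initial labeling effort (initial $D_L$ plus $D_V$) already amounts to 23 % of the dataset before any sample is selected by the sampling strategy.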
The experiment is mainly carried out in two parts. 
1. The regularization effect of GPC is tested in 
a supervised learning condition. All training 
data are used to train the model and the 
learning rate is set to 0.005. TRC is combined with 
dropout and geometric transformation-based 
data augmentation to form four methods for 
comparison. The dropout rate is set to 0.5. The 
geometric transformations include horizontal 
flipping, −30˚ to 30˚ rotations, random shifting, 
shearing and zooming. In the shifting operation, 
the image is shifted at random within 0.1 of its 
width and height. The shearing range is set to 0.2 and the 
zooming range is set to [0.8, 1.2]. The GPCs with 
global average pooling and global max pooling 
are named GAPC and GMPC in the following 
experiments, respectively. 
Fig. 4.  The implementation process of this method
Fig. 5.  Images in NEU-CLS dataset; a) crazing, b) inclusion,  
c) patch, d) pitted surface, e) rolled-in scale, and f) scratch
Table 1.  Detailed configuration of the designed network architecture
| Layer | Kernel size / Stride | Output size |
|---|---|---|
| Convolution + ReLU | 3×3/2 | 100×100×32 |
| Convolution + ReLU | 3×3/2 | 50×50×32 |
| Maxpool | 2×2/1 | 25×25×32 |
| Convolution + ReLU | 3×3/2 | 13×13×64 |
| Convolution + ReLU | 3×3/1 | 13×13×64 |
| Maxpool | 2×2/1 | 6×6×64 |
| Global average/Maxpool | | 1×1×64 |
| Dropout (50 %) | | 1×1×64 |
| FC + Softmax | | 1×1×6 |
2. The effect of the proposed KLS sampling 
strategy in active learning is compared with the 
traditional uncertainty-based sampling methods, 
including LC, MS, and EN. Random sampling 
(RS) is used as a performance baseline, which 
discards all uncertainty strategies and randomly 
selects samples from $D_U$ in each training cycle. 
The stopping criterion VA is set to 0.95. In the 
experiment, GPC is combined with dropout to 
further strengthen the regularization effect of the 
model. Meanwhile, inverse time decay is used to 
gradually decrease the learning rate. The decay 
strategy of the learning rate is defined as:
 
$$l_r = l_{ini} \cdot \frac{1}{1 + d_r \dfrac{N_{epoch}}{d_s}}, \tag{7}$$
where $l_{ini}$ is the initial learning rate, $d_r$ is the decay 
rate and its value is set to 0.96, and $d_s$ is the decay step 
and its value is set to 162. $N_{epoch}$ is the epoch number 
of the current training iteration, which is reset at the 
beginning of each iteration. By using learning rate 
decay, the initial learning rate can be set to a large 
value to speed up the model training and avoid local 
minima [48]. Therefore, in this part of the experiments, 
$l_{ini}$ is set to 0.01.
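The inverse time decay schedule of Eq. (7) can be evaluated with a short Python function. The parameter values are those given in the text; the function name and the sample epochs printed below are illustrative.

```python
def inverse_time_decay(l_ini, d_r, d_s, n_epoch):
    """Learning rate after n_epoch epochs according to Eq. (7)."""
    return l_ini / (1.0 + d_r * n_epoch / d_s)

# Values from the text: l_ini = 0.01, d_r = 0.96, d_s = 162.
for n_epoch in (0, 50, 162, 324):
    print(n_epoch, inverse_time_decay(0.01, 0.96, 162, n_epoch))
```

With these settings the learning rate falls to roughly half of its initial value after 162 epochs, so the decay is gentle over a single training iteration.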
The accuracy, precision, recall, and F1-score 
are used as the metrics to evaluate the classification 
performance of the proposed method comprehensively. 
After the model prediction, the defect image will be 
defined as one of four cases: true positive (TP), false 
positive (FP), true negative (TN) and false negative 
(FN). The aforementioned metrics are defined as:
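$$\mathrm{Accuracy} = \frac{Num_{TP} + Num_{TN}}{Num_{TP} + Num_{FP} + Num_{TN} + Num_{FN}}, \tag{8}$$
$$\mathrm{Precision} = \frac{Num_{TP}}{Num_{TP} + Num_{FP}}, \tag{9}$$
$$\mathrm{Recall} = \frac{Num_{TP}}{Num_{TP} + Num_{FN}}, \tag{10}$$
$$F_1 = \frac{2}{\dfrac{1}{\mathrm{Precision}} + \dfrac{1}{\mathrm{Recall}}}, \tag{11}$$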
where $Num_{TP}$, $Num_{FP}$, $Num_{TN}$ and $Num_{FN}$ represent 
the number of defects that are defined as TP, FP, TN and 
FN, respectively.
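Eqs. (8) to (11) translate directly into a small Python helper. The counts in the usage lines are made-up numbers for a single class treated one-vs-rest; how the per-class values are averaged over the six classes is not stated in the text, so no averaging is shown. The function name is hypothetical, and non-zero denominators are assumed.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall and F1-score from Eqs. (8)-(11).

    Assumes all denominators are non-zero.
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 / (1 / precision + 1 / recall)
    return accuracy, precision, recall, f1

# Hypothetical counts for one defect class on a 360-image test set.
accuracy, precision, recall, f1 = classification_metrics(tp=55, fp=5, tn=295, fn=5)
print(accuracy, precision, recall, f1)
```

Writing the F1-score as the harmonic mean of precision and recall, as in Eq. (11), makes it clear that a low value of either metric pulls the score down.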
4  RESULTS AND DISCUSSION
4.1  Comparison of Regularization Methods
As shown in Table 2, TRC without any regularization 
method has a short training time, but its classification 
accuracy is relatively low: only 91.4 % of the test 
samples are correctly classified. The method of 
combining TRC and data augmentation has the 
highest classification accuracy, reaching 96.4 %, but 
its training time is also relatively long, indicating that 
the augmentation process and the enlarged data set 
have a great impact on the model training time. The 
combination of TRC and dropout reduces the training 
time by about 20 % compared with data augmentation, 
but its classification accuracy is the lowest, only 90.8 
%. The classification accuracy of the GPC (GAPC and 
GMPC) method proposed in this paper reaches 96.2 
%, which is only 0.2 % lower than that of the data 
augmentation method, but the GPC method greatly 
reduces the training time. Especially, the GMPC 
method reduces the model training time by about 50 % 
compared with the data augmentation method, which 
greatly improves the training efficiency and enhances 
the generalization ability of the model in a short time.
Table 2.  The performance of GPC and TRC in supervised learning setting
| Methods | Accuracy [%] | Precision [%] | Recall [%] | F1-score [%] | Training time [s] |
|---|---|---|---|---|---|
| TRC | 91.4 | 91.6 | 91.4 | 91.2 | 69.78 |
| TRC + Augmentation | 96.4 | 96.6 | 96.4 | 96.4 | 102.89 |
| TRC + Dropout | 90.8 | 91.2 | 90.8 | 90.8 | 85.07 |
| TRC + Augmentation + Dropout | 95.6 | 95.6 | 95.6 | 95.6 | 142.37 |
| GMPC | 96.2 | 96.2 | 96.2 | 96.2 | 51.83 |
| GAPC | 96.2 | 96.2 | 96.2 | 96.2 | 62.31 |
4.2  Comparison of Sampling Strategies
As shown in Fig. 6, under the GAPC-based classifier, 
the classification performance of RS is better than 
traditional methods by more than 5 % in a low label 
budget (<35 % of the dataset to be labeled). When 
the number of labeled data increases to more than 35 
%, the performance of RS stagnates, and traditional 
methods start to catch up and finally outperform 
RS. Fig. 7 shows that traditional methods perform 
similarly under the GMPC-based classifier, and the 
classification performance of RS is the worst. 
However, the proposed KLS method consistently 
outperforms RS and traditional methods in both 
classifier architectures. Taking the performance of the GAPC-
based classifier as an example, the KLS sampling strategy 
can achieve 91.8 % classification accuracy with 32 
% of the dataset labeled, which reduces 
the labeling cost by more than 3 % compared with the 
traditional methods. Moreover, KLS can achieve 97 
% classification accuracy with 44 % of the dataset 
labeled. Compared with the traditional uncertainty 
sampling methods, the KLS sampling strategy makes more 
efficient use of labeled data. 
Fig. 6.  The performance of different sample strategies with GAPC
4.3  Comparison of Classifier Performance
To determine the best classifier, the performances 
of GPC-based classifiers (GAPC and GMPC) are 
compared and analyzed. Fig. 8 shows that the gap 
between validation loss and training loss of the 
proposed model does not increase with the training 
time, indicating that the regularization effect of GPC is 
significant. Specifically, the accuracy and loss curves 
of the GMPC-based method converge faster in the first 
150 epochs, but after 200 epochs, the performance 
of the GMPC-based method stagnates and starts to 
fluctuate in a wide range. The above situation does not 
appear in the GAPC-based method. Since the number 
of labeled samples increases with time in active 
learning, it can be inferred that the GMPC-based 
method is easy to converge only under the condition 
of a few labeled samples. When the number of labeled 
samples increases to a certain extent, the convergence 
becomes more difficult. The GMPC-based method is 
therefore more sensitive to the number of labeled samples. As 
shown in Table 3, under the premise of achieving the 
same classification accuracy (90 %), the GAPC-based 
classifiers require 3 % to 15 % less labeled data than 
the GMPC-based classifiers except for EN. With 44 % 
of the dataset to be labeled (Table 4), the classification 
performance metrics of GAPC-based classifiers 
are 3 % to 14 % higher than that of GMPC-based 
classifiers. Therefore, considering the accuracy and 
model stability, the GAPC-based classifier is more 
suitable than the GMPC-based classifier for steel plate 
surface defect classification. 
The accuracy and amount of labeled data on 
the NEU-CLS dataset of the proposed model and other 
deep-learning-based approaches are shown in Table 5. 
Compared with the end-to-end (ETE) method [49] and the 
supervised learning method mentioned in Section 3.2, 
the proposed method achieves 2 % and 1.4 % higher 
accuracy on the NEU-CLS dataset, respectively, and 
the data that need to be labeled is reduced by 16 % 
and 36 %, respectively. Although the classification 
accuracy of SDC-SN-ELF+MRF method [13] is 0.3 % 
higher than that of the proposed method, the amount 
of labeled data required by this method is significantly 
higher (36 % higher than that of the proposed 
method). The results confirm that the proposed model 
can obtain good accuracy with less labeled data.
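The comparison drawn from Table 5 can be reproduced as follows. This is a small sketch that assumes, consistent with the table, that labeled data is the sum of the training and validation shares and that a "-" entry means no separate validation share is counted; the names are illustrative.

```python
# Rows of Table 5: (accuracy [%], training share [%], validation share [%] or None).
methods = {
    "ETE [49]": (95.8, 60, None),
    "SDC-SN-ELF + MRF [13]": (98.1, 80, None),
    "Supervised (section 3.2)": (96.4, 60, 20),
    "Ours": (97.8, 24, 20),
}


def labeled_share(training, validation):
    """Labeled data = training share + validation share (0 if not reported)."""
    return training + (validation or 0)


ours_acc, ours_train, ours_val = methods["Ours"]
ours_labeled = labeled_share(ours_train, ours_val)

# Differences relative to the proposed method, in percentage points.
comparison = {}
for name, (acc, train, val) in methods.items():
    if name == "Ours":
        continue
    comparison[name] = {
        "accuracy_gain": round(ours_acc - acc, 1),
        "label_reduction": labeled_share(train, val) - ours_labeled,
    }

for name, row in comparison.items():
    print(name, row)
```

A negative `accuracy_gain` marks a method that is more accurate than the proposed one, as is the case for SDC-SN-ELF + MRF.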
Additionally, the average training time of the proposed method (omitting the labeling time) is 180 s, and the final model size is 578.7 KB, which indicates its potential for improving the efficiency of steel plate surface defect classification.
Fig. 7.  The performance of different sample strategies with GMPC
Fig. 8.  a) The loss, and b) accuracy of proposed GPC-based method over time

Table 3.  Percentage of labeled samples [%] needed to reach 90 % on the corresponding performance metric

| Metric | GAPC KLS | GAPC LC | GAPC MS | GAPC EN | GAPC RS | GMPC KLS | GMPC LC | GMPC MS | GMPC EN | GMPC RS |
|---|---|---|---|---|---|---|---|---|---|---|
| Accuracy | 32 | 35 | 35 | 41 | 35 | 35 | 50 | 38 | 41 | 44 |
| Precision | 32 | 35 | 35 | 41 | 32 | 35 | 50 | 38 | 41 | 44 |
| Recall | 32 | 35 | 35 | 41 | 35 | 35 | 50 | 38 | 41 | 44 |
| F1-score | 32 | 35 | 35 | 41 | 35 | 35 | 50 | 38 | 41 | 44 |

Table 4.  Performance score [%] achieved using labeled data that account for 44 % of the dataset

| Metric | GAPC KLS | GAPC LC | GAPC MS | GAPC EN | GAPC RS | GMPC KLS | GMPC LC | GMPC MS | GMPC EN | GMPC RS |
|---|---|---|---|---|---|---|---|---|---|---|
| Accuracy | 97.8 | 94.8 | 95.5 | 93.3 | 93.8 | 93.8 | 88.0 | 91.8 | 80.6 | 90.2 |
| Precision | 97.9 | 94.8 | 95.8 | 93.3 | 94.0 | 93.8 | 88.8 | 92.4 | 83.4 | 91.0 |
| Recall | 97.8 | 94.8 | 95.5 | 93.3 | 93.8 | 93.8 | 88.0 | 91.8 | 80.6 | 90.2 |
| F1-score | 97.8 | 94.8 | 95.5 | 93.3 | 93.8 | 93.8 | 87.8 | 91.8 | 80.2 | 90.0 |

Table 5.  The accuracy and amount of labeled data on the NEU-CLS dataset of proposed model and other deep-learning-based approaches

| Methods | Accuracy [%] | Training data [% of dataset] | Validation data [% of dataset] | Labeled data [% of dataset] |
|---|---|---|---|---|
| ETE [49] | 95.8 | 60 | - | 60 |
| SDC-SN-ELF + MRF [13] | 98.1 | 80 | - | 80 |
| Supervised learning method mentioned in section 3.2 | 96.4 | 60 | 20 | 80 |
| Ours | 97.8 | 24 | 20 | 44 |

4.4  Visualization of Defect Area Identification
Class activation mapping (CAM) [50] and [51] is used to improve the interpretability of the proposed network model. The model is taken from the previous experiment: the network structure is the GAPC-based neural network, and the training method is the active learning method based on KLS. As shown in Fig. 9, the significant features that play a decisive role in the prediction of the network model are visualized in the heatmap. The warmer the color of an area in the heatmap, the more that area contributes to the model prediction. Superimposing the heatmap on the input image shows that the network focuses especially on the discriminative parts of the input images, which supports the effectiveness of the proposed network model and learning method.
Fig. 9.  Defect area identification; a) crazing, b) patch, c) pitted surface, d) scratch
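As a rough illustration of how such a heatmap can be produced, the sketch below computes a class activation map as the weighted sum of the final convolutional feature maps. It assumes that global average pooling is followed by a single dense classification layer. The function names, array shapes, clipping of negative values, and nearest-neighbor upsampling are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def class_activation_map(feature_maps, class_weights):
    """Weighted sum of the last convolutional feature maps for one class.

    feature_maps: array of shape (H, W, K), output of the last conv layer.
    class_weights: array of shape (K,), dense-layer weights linking the K
        globally average-pooled channels to the chosen class score.
    Returns an (H, W) map scaled to [0, 1] for display.
    """
    cam = np.tensordot(feature_maps, class_weights, axes=([2], [0]))
    cam = np.maximum(cam, 0.0)  # display choice: keep positive evidence only
    peak = cam.max()
    return cam / peak if peak > 0 else cam


def upsample_nearest(cam, factor):
    """Enlarge the map by an integer factor so it can overlay the input image."""
    return np.kron(cam, np.ones((factor, factor)))


# Hypothetical example: 25x25 feature maps with 8 channels, 200x200 input image.
rng = np.random.default_rng(0)
features = rng.random((25, 25, 8))
weights = rng.normal(size=8)
heatmap = upsample_nearest(class_activation_map(features, weights), 8)
print(heatmap.shape)
```

The resulting array can be rendered with a warm-to-cold colormap and blended with the input image, which corresponds to the superimposition described above.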
5  EXPERIMENTAL VERIFICATION
5.1  Dataset Preparation
As one of the raw materials commonly used in mechanical manufacturing, steel plate needs to be processed before it can be put into use. For example, in the manufacture of a linear guide plane, the steel plate usually needs to be milled. Defects on the surface of the steel plate after milling may affect the positioning accuracy and service life of the linear guide plane. However, unlike hot-rolled steel plates, processed steel plates have a relatively low probability of defects. The number of defect samples is therefore usually limited, and the resulting dataset is small. This reflects the sample shortage currently faced in industry.
Therefore, to verify the applicability of the proposed method in industry, a steel plate milling experiment is carried out to obtain a surface defect dataset of processed steel plates. The milling parameters are shown in Table 6. The defect images are collected on the image acquisition platform (VMC220) (Fig. 10). The defects observed on the surface of the steel plate after milling are pitted surface, scratch, and patch (Fig. 11). The numbers of collected images of pitted surface, scratch, and patch defects are 200, 200, and 100, respectively. Because the number of original images is small, data augmentation by geometric transformation is used to expand the image data, resulting in a total of 900 images, 300 for each type of defect. The resolution of each image is 200×200 pixels.
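The paper does not list which geometric transformations were applied. The sketch below is one possible way to bring a class up to 300 images, assuming flips and 90-degree rotations; the function names and the random selection of source images are hypothetical.

```python
import numpy as np


def geometric_variants(image):
    """Return simple geometric transforms of a square image (assumed set)."""
    return [
        np.fliplr(image),       # horizontal flip
        np.flipud(image),       # vertical flip
        np.rot90(image, k=1),   # 90 degree rotation
        np.rot90(image, k=2),   # 180 degree rotation
        np.rot90(image, k=3),   # 270 degree rotation
    ]


def augment_to_target(images, target, rng):
    """Append transformed copies of randomly chosen originals until
    the class contains `target` images."""
    augmented = list(images)
    while len(augmented) < target:
        source = images[rng.integers(len(images))]
        variants = geometric_variants(source)
        augmented.append(variants[rng.integers(len(variants))])
    return augmented


# Placeholder data standing in for the 100 original patch images (200x200 pixels).
rng = np.random.default_rng(0)
patch_images = [rng.integers(0, 256, size=(200, 200), dtype=np.uint8) for _ in range(100)]
patch_augmented = augment_to_target(patch_images, target=300, rng=rng)
print(len(patch_augmented))
```

Applying the same routine with a target of 300 to the 200 pitted-surface and 200 scratch images would give the balanced total of 900 images described above.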
Table 6.  Experimental configurations

| Configurations | Parameters |
|---|---|
| Machine tool | VMC-C30 |
| Workpiece | Steel S45C; Steel S15C |
| Milling cutter | Kennametal 40A03RS45SE14EG |
| Spindle speed [r/min] | 2000; 2500; 3000 |
| Cutting depth [mm] | 0.2; 0.15; 0.1 |
| Feed per tooth [mm] | 0.005; 0.01; 0.015 |
Fig. 10.  Image acquisition platform
Fig. 11.  Images in milling steel surface defect dataset; a) pitted surface, b) scratch, and c) patch
5.2  Experimental Setup
The main experimental settings are described in section 3.2. In this experiment, 60 % of the dataset is used for training, 20 % for validation, and 20 % for testing. Then 10 % of the training data (18 images per class) are sampled at random as the initial labeled data pool D_L. The rest of the training data make up the unlabeled data pool D_U.
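The split sizes can be verified with a short sketch. It assumes a class-stratified split, which is consistent with the stated 18 images per class but is not spelled out in the text; the function name and pool keys are illustrative.

```python
import numpy as np


def split_indices(labels, rng, train=0.6, val=0.2, initial=0.1):
    """Per-class 60/20/20 split, with an initial labeled pool drawn
    from the training portion of each class."""
    pools = {"labeled": [], "unlabeled": [], "val": [], "test": []}
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        n_train = round(train * len(idx))
        n_val = round(val * len(idx))
        n_init = round(initial * n_train)
        pools["labeled"].extend(idx[:n_init])            # initial pool D_L
        pools["unlabeled"].extend(idx[n_init:n_train])   # unlabeled pool D_U
        pools["val"].extend(idx[n_train:n_train + n_val])
        pools["test"].extend(idx[n_train + n_val:])
    return {name: np.array(values) for name, values in pools.items()}


# 900 images, 300 per defect class, as in the milling dataset.
labels = np.repeat([0, 1, 2], 300)
pools = split_indices(labels, np.random.default_rng(0))
print({name: len(values) for name, values in pools.items()})
```

With 300 images per class this gives 54 initially labeled images (18 per class), 486 unlabeled training images, and 180 images each for validation and testing.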
5.3  Results and Discussion
Fig. 12 shows that the proposed GAPC-based network model still performs stably on this dataset, with a good regularization effect. As shown in Table 7 and Fig. 13, when the labeled data used by the model amount to 50 % of the dataset, the proposed KLS sampling method achieves 92.3 % classification accuracy, while the highest accuracy among the other sampling methods is only 80.4 % (LC).
Hence, the experimental results indicate that the proposed method can still achieve more than 90 % classification accuracy when the number of samples is small, which reflects its potential for industrial application.
Table 7.  Comparison results of different sample strategies with GAPC

| Metric | Ours [%] | LC [%] | MS [%] | EN [%] |
|---|---|---|---|---|
| Accuracy | 92.3 | 80.4 | 75.0 | 72.5 |
| Precision | 91.5 | 85.2 | 81.2 | 79.0 |
| Recall | 91.0 | 80.0 | 74.6 | 72.5 |
| F1-score | 91.3 | 79.8 | 74.3 | 72.5 |
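For orientation, the baseline strategies in Table 7 can be sketched under the assumption that LC, MS, and EN denote the commonly used least-confidence, margin-sampling, and entropy criteria. The KLS score is defined earlier in the paper and is not reproduced here. All function names are illustrative.

```python
import numpy as np


def least_confidence(probs):
    """LC: a low top-class probability means high uncertainty."""
    return 1.0 - probs.max(axis=1)


def negative_margin(probs):
    """MS: a small gap between the two most likely classes means high
    uncertainty, so the negated margin is returned as the score."""
    top_two = np.sort(probs, axis=1)[:, -2:]
    return -(top_two[:, 1] - top_two[:, 0])


def entropy(probs, eps=1e-12):
    """EN: Shannon entropy of the predicted class distribution."""
    return -(probs * np.log(probs + eps)).sum(axis=1)


def select_for_labeling(probs, score_fn, k):
    """Indices of the k unlabeled samples with the highest uncertainty score."""
    return np.argsort(score_fn(probs))[::-1][:k]


# Hypothetical softmax outputs for three unlabeled images and three classes.
probs = np.array([
    [0.98, 0.01, 0.01],   # confident prediction
    [0.40, 0.35, 0.25],   # ambiguous prediction
    [0.70, 0.20, 0.10],
])
picks = {
    "LC": select_for_labeling(probs, least_confidence, 2),
    "MS": select_for_labeling(probs, negative_margin, 2),
    "EN": select_for_labeling(probs, entropy, 2),
}
print(picks)
```

In this toy example all three criteria rank the ambiguous second sample first; on real data they can disagree, which is one reason their results differ in Table 7.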
Fig. 12.  a) The loss, and b) accuracy of proposed method over time
Fig. 13.  The performance of different sample strategies with GAPC
6  CONCLUSIONS
To reduce the labeling cost of steel plate surface defect classification in industrial production, a lightweight CNN model with strong regularization ability is designed, and an efficient deep active learning method is proposed by combining it with the KLS strategy. The specific conclusions are as follows:
1. The GPC-based classifier can greatly reduce 
the training time while maintaining the same 
performance as the traditional classifier in steel 
plate surface defect classification.
2. A GPC-based lightweight convolutional neural 
network model is proposed. The result indicates 
that the performance of the GAPC-based network 
model is more stable than that of the GMPC-
based network model.
3. The labeling cost can be significantly reduced by using the KLS strategy as the uncertainty sampling method. Comparative analysis shows that the GAPC-KLS model needs only 44 % of the dataset to be labeled to achieve 97.8 % classification accuracy on NEU-CLS, the best result among the compared sampling strategies. Meanwhile, this model still achieves 92.3 % classification accuracy with 50 % labeled data on the milling steel surface defect dataset. Therefore, the proposed method can achieve high classification accuracy (≥92 %) with limited labeled data (≤50 % of the dataset) on both the NEU-CLS and milling steel surface defect datasets.
To further improve the classification efficiency, 
the subsequent research will focus on the optimization 
of the convolutional base to reduce the training time 
and improve the training efficiency while ensuring the 
quality of feature extraction. The proposed method 
can provide a reference for steel plate production 
enterprises to reduce the cost of surface defect image 
annotation. In addition, this method may provide a 
new idea for efficient classification of other surface 
defects in industry.
7  ACKNOWLEDGEMENTS
This work is supported by the National Natural Science Foundation of China (52175254) and the Postgraduate Scientific Research Innovation Project of Hunan Province (CX20220603, CX20230550).
8  REFERENCES
[1] Zhu, M., Lu, X., Li, H., Cao, H., Wu, F. (2023). Applicability 
analysis of nickel steel plate friction coefficient model based 
on fractal theory. Coatings, vol. 13, 1096, DOI:10.3390/
coatings13061096.
[2] Mohtaram, Y.F., Kahnamouei, J.T., Shariati, M., Behjat, B. 
(2012). Experimental and numerical investigation of buckling 
in rectangular steel plates with groove-shaped cutouts. 
Journal of Zhejiang University SCIENCE A, vol. 13, p. 469-480, 
DOI:10.1631/jzus.A1100226.
[3] Wang, Y., Shen, X.L., Wu, K., Huang, M.Q. (2022). Corrosion 
grade recognition for weathering steel plate based on a 
convolutional neural network. Measurement Science and 
Technology, vol. 33, 095014, DOI:10.1088/1361-6501/
ac7034.
[4] Dung, C.V., Sekiya, H., Hirano, S., Okatani, T., Miki, C. (2019). 
A vision-based method for crack detection in gusset plate 
welded joints of steel bridges using deep convolutional neural 
networks. Automation in Construction, vol. 102, p. 217-229, 
DOI:10.1016/j.autcon.2019.02.013.
[5] Jiménez-Peña, C., Goulas, C., Preußner, J., Debruyne, D. 
(2020). Failure mechanisms of mechanically and thermally 
produced holes in high-strength low-alloy steel plates 
subjected to fatigue loading. Metals, vol. 10, no. 3, 318, 
DOI:10.3390/met10030318.
[6] Park, C.Y., Kim, J.W., Kim, B., Lee, J. (2020). Prediction for 
manufacturing factors in a steel plate rolling smart factory 
using data clustering-based machine learning. IEEE Access, 
vol. 8, p. 60890-60905, DOI:10.1109/ACCESS.2020.2983188.
[7] Yoshioka, S., Fujii, A., Tohara, M., Gotoh, Y. (2021). Proposed 
inspection method for opposite-side defect in steel plate 
using synthetic magnetic field with high and low excitation 
frequencies. Sensors and Materials, vol. 33, no. 7, p. 2511-
2520, DOI:10.18494/SAM.2021.3380.
[8] Wang, G., Xiao, Q., Gao, Z.H., Li, W.H., Jia, L., Liang, C., Yu, 
X. (2022). Multifrequency AC magnetic flux leakage testing 
for the detection of surface and backside defects in thick 
steel plates. IEEE Magnetics Letters, vol. 13, 8102105, 
DOI:10.1109/LMAG.2022.3142717.
[9] Zheng, X., Zheng, S., Kong, Y.G., Chen, J. (2021). Recent 
advances in surface defect inspection of industrial products 
using deep learning techniques. The International Journal 
of Advanced Manufacturing Technology, vol. 113, p. 35-58, 
DOI:10.1007/s00170-021-06592-8.
[10] Bhatt, P.M., Malhan, R.K., Rajendran, P., Shah, B.C., 
Gupta, S.K. (2021). Image-based surface defect detection 
using deep learning: a review. Journal of Computing and 
Information Science in Engineering, vol. 21, no. 4, 040801, 
DOI:10.1115/1.4049535.
[11] Chen, Y.J., Ding, Y.Y., Zhao, F., Zhang, E.H., Wu, Z.N., Shao, 
L. (2021). Surface defect detection methods for industrial 
products: a review. Applied Sciences, vol. 11, no. 16, 7657, 
DOI:10.3390/app11167657.
[12] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P. (1998). 
Gradient-based learning applied to document recognition. 
Proceedings of the IEEE, vol. 86, no. 11, p. 2278-2324, 
DOI:10.1109/5.726791.
[13] Zhou, S.Y., Chen, Y.P., Zhang, D.L., Xie, J.M., Zhou, Y.F. 
(2017). Classification of surface defects on steel sheet using 
convolutional neural networks. Materials and Technologies, 
vol. 51, no. 1, p. 123-131, DOI:10.17222/mit.2015.335.
[14] He, X., Wang, T.Q., Wu, K.X., Liu, H.H. (2021). Automatic 
defects detection and classification of low carbon steel 
WAAM products using improved remanence/magneto-
optical imaging and cost-sensitive convolutional neural 
network. Measurement, vol. 173, 108633, DOI:10.1016/j.
measurement.2020.108633.
[15] Pan, S.J.L., Yang, Q. (2010). A survey on transfer learning. IEEE 
Transactions on Knowledge and Data Engineering, vol. 22, no. 
10, p. 1345-1359, DOI:10.1109/TKDE.2009.191.
[16] Dhamala, J., Bajracharya, P., Arevalo, H.J., Sapp, J.L., 
Horáček, B.M., Wu, K.C., Trayanova, N.A., Wang, L.W. (2020). 
Embedding high-dimensional Bayesian optimization via 
generative modeling: parameter personalization of cardiac 
electro-physiological models. Medical Image Analysis, vol. 62, 
101670, DOI:10.1016/j.media.2020.101670.
[17] Wang, Q.X., Yang, R.H., Wu, C.J., Liu, Y. (2021). An effective 
defect detection method based on improved generative 
adversarial networks (iGAN) for machined surfaces. Journal of 
Manufacturing Processes, vol. 65, p. 373-381, DOI:10.1016/j.
jmapro.2021.03.053.
[18] Lee, D.H. (2013). Pseudo-label: The simple and efficient semi-
supervised learning method for deep neural networks. ICML 
2013 Workshop: Challenges in Representation Learning, p. 
1-7.
[19] Odena, A. (2016). Semi-supervised learning with generative 
adversarial networks. arXiv:1606.01583, DOI:10.48550/
arXiv.1606.01583.
[20] Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, 
A., Chen, X. (2016). Improved techniques for training GANs. 
arXiv:1606.03498, DOI:10.48550/arXiv.1606.03498.
[21] Weigl, E., Heidl, W., Lughofer, E., Radauer, T., Eitzinger, C. 
(2016). On improving performance of surface inspection 
systems by online active learning and flexible classifier 
updates. Machine Vision and Applications, vol. 27, p. 103-
127, DOI:10.1007/s00138-015-0731-9.
[22] Fu, G.Z., Sun, P.Z., Zhu, W.B., Yang, J.X., Cao, Y.L., Yang, M.Y., 
Cao, Y.P. (2019). A deep-learning-based approach for fast 
and robust steel surface defects classification. Optics and 
Lasers in Engineering, vol. 121, p. 397-405, DOI:10.1016/j.
optlaseng.2019.05.005.
[23] Yang, Y.T., Yang, R.Z., Pan, L.H., Ma, J.X., Zhu, Y.S., Diao, T., 
Zhang, L. (2020). A lightweight deep learning algorithm 
for inspection of laser welding defects on safety vent of 
power battery. Computers in Industry, vol. 123, 103306, 
DOI:10.1016/j.compind.2020.103306.
[24] Iandola, F.N., Han, S., Moskewicz, M.W., Ashraf, K., Dally, 
W.J., Keutzer, K. (2016). SqueezeNet: AlexNet-level accuracy 
with 50x fewer parameters and <0.5 MB model size. 
arXiv:1602.07360, DOI:10.48550/arXiv.1602.07360.
[25] He, D., Xu, K., Wang, D.D. (2019). Design of multi-scale 
receptive field convolutional neural network for surface 
inspection of hot rolled steels. Image and Vision Computing, 
vol. 89, p. 12-20, DOI:10.1016/j.imavis.2019.06.008.
[26] Song, K.C., Yan, Y.H. (2013). A noise robust method based 
on completed local binary patterns for hot-rolled steel strip 
surface defects. Applied Surface Science, vol. 285, p. 858-
864, DOI:10.1016/j.apsusc.2013.09.002.
[27] Kingma, D.P., Welling, M. (2013). Auto-encoding variational 
Bayes. arXiv:1312.6114, DOI:10.48550/arXiv.1312.6114.
[28] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-
Farley, D., Ozair, S., Courville, A., Bengio, Y. (2014). Generative 
adversarial networks. arXiv:1406.2661, DOI:10.48550/
arXiv.1406.2661.
[29] Yun, J.P., Shin, W.C., Koo, G., Kim, M.S., Lee, C., Lee, S.J. 
(2020). Automated defect inspection system for metal 
surfaces based on deep learning and data augmentation. 
Journal of Manufacturing Systems, vol. 55, p. 317-324, 
DOI:10.1016/j.jmsy.2020.03.009.
[30] Sohn, K., Lee, H., Yan, X.C. (2015). Learning structured output 
representation using deep conditional generative models. 
Advances in Neural Information Processing Systems, vol. 28, 
p. 3483-3491.
[31] Tang, W.Q., Yang, Q., Xiong, K.X., Yan, W.J. (2020). Deep 
learning based automatic defect identification of photovoltaic 
module using electroluminescence images. Solar Energy, vol. 
201, p. 453-460, DOI:10.1016/j.solener.2020.03.049.
[32] Tran, T., Do, T., Reid, I., Carneiro, G. (2019). Bayesian 
generative active deep learning. International Conference on 
Machine Learning, p. 6295-6304.
[33] Gao, Y.P., Gao, L., Li, X.Y., Yan, X.G. (2020). A semi-
supervised convolutional neural network-based method for 
steel surface defect recognition. Robotics and Computer 
Integrated Manufacturing, vol. 61, 101825, DOI:10.1016/j.
rcim.2019.101825.
[34] He, D., Xu, K., Zhou, P., Zhou, D.D. (2019). Surface defect 
classification of steels with a new semi-supervised learning 
method. Optics and Lasers in Engineering, vol. 117, p. 40-48, 
DOI:10.1016/j.optlaseng.2019.01.011.
[35] He, Y., Song, K.C., Dong, H.W., Yan, Y.H. (2019). Semi-
supervised defect classification of steel surface based on 
multi-training and generative adversarial network. Optics and 
Lasers in Engineering, vol. 122, p. 294-302, DOI:10.1016/j.
optlaseng.2019.06.020.
[36] Masci, J., Meier, U., Cireşan, D., Schmidhuber, J. (2011). 
Stacked convolutional auto-encoders for hierarchical feature 
extraction. International Conference on Artificial Neural 
Networks, p. 52-59, DOI:10.1007/978-3-642-21735-7_7.
[37] Yang, L., Zhang, Y.Z., Chen, J.X., Zhang, S.Y., Chen, D.Z. (2017). 
Suggestive annotation: A deep active learning framework for 
biomedical image segmentation. Medical Image Computing 
and Computer-Assisted Intervention, Lecture Notes in 
Computer Science, p. 399-407, DOI:10.1007/978-3-319-
66179-7_46.
[38] Huang, Z.J., Li, F.M., Luan, X.D., Cai, Z.W. (2020). A weakly 
supervised method for mud detection in ores based on deep 
active learning. Mathematical Problems in Engineering, vol. 
2020, no. 1, 3510313, DOI:10.1155/2020/3510313.
[39] Lv, X.M., Duan, F.J., Jiang, J.J., Fu, X., Gan, L. (2020). Deep 
active learning for surface defect detection. Sensors, vol. 20, 
no. 6, 1650, DOI:10.3390/s20061650.
[40] Wang, K.Z., Zhang, D.Y., Li, Y., Zhang, R.M., Lin, L. (2017). 
Cost-effective active learning for deep image classification. 
IEEE Transactions on Circuits and Systems for Video 
Technology, vol. 27, no. 12, p. 2591-2600, DOI:10.1109/
TCSVT.2016.2589879.
[41] Luo, C., Yu, L.J., Yan, J.X., Li, Z.W., Ren, P., Bai, X., Yang, E.F., Liu, 
Y.H. (2021). Autonomous detection of damage to multiple steel 
surfaces from 360° panoramas using deep neural networks. 
Computer-Aided Civil and Infrastructure Engineering, vol. 36, 
no. 12, p. 1585-1599, DOI:10.1111/mice.12686.
[42] Mao, W.S., Li, L.S., Tao, Y.F., Zhou, W.Y. (2023). Surface defect 
image classification of lithium battery pole piece based 
on deep learning. IEICE Transactions on Information and 
Systems, vol. E106.D, no. 9, p. 1546-1555, DOI:10.1587/
transinf.2023EDP7058.
[43] Krizhevsky, A., Sutskever, I., Hinton, G.E. (2012). ImageNet 
classification with deep convolutional neural networks. 
Advances in Neural Information Processing Systems, p. 1097-
1105.
[44] Simonyan, K., Zisserman, A. (2015). Very deep convolutional 
networks for large-scale image recognition. arXiv:1409.1556, 
DOI:10.48550/arXiv.1409.1556.
[45] Szegedy, C., Liu, W., Jia, Y.Q., Sermanet, P., Reed, S., Anguelov, 
D., Erhan, D., Vanhoucke, V., Rabinovich, A. (2015). Going 
deeper with convolutions. IEEE Conference on Computer 
Vision and Pattern Recognition, p. 1-9, DOI:10.1109/
CVPR.2015.7298594.
[46] He, K.M., Zhang, X.Y., Ren, S.Q., Sun, J. (2016). Deep residual 
learning for image recognition. IEEE Conference on Computer 
Vision and Pattern Recognition, p. 770-778, DOI:10.1109/
CVPR.2016.90.
[47] Ponti, M., Kittler, J., Riva, M., Campos, T.D., Zor, C. 
(2017). A decision cognizant Kullback-Leibler divergence. 
Pattern Recognition, vol. 61, p. 470-478, DOI:10.1016/j.
patcog.2016.08.018.
[48] You, K.C., Long, M.S., Wang, J.M., Jordan, M.I. (2019). How 
does learning rate decay help modern neural networks? 
arXiv:1908.01878, DOI:10.48550/arXiv.1908.01878.
[49] Yi, L., Li, G.Y., Jiang, M.M. (2017). An end-to-end steel strip 
surface defects recognition system based on convolutional 
neural networks. Steel Research International, vol. 88, no. 2, 
1600068, DOI:10.1002/srin.201600068.
[50] Wang, X.Q., Gu, Y. (2022). Classification of macular 
abnormalities using a lightweight CNN-SVM framework. 
Measurement Science and Technology, vol. 33, 065702, 
DOI:10.1088/1361-6501/ac5876.
[51] Lee, S.Y., Tama, B.A., Moon, S.J., Lee, S. (2019). Steel surface 
defect diagnostics using deep convolutional neural network 
and class activation map. Applied Sciences, vol. 9, no. 24, 
5449, DOI:10.3390/app9245449.