https://doi.org/10.31449/inf.v48i11.5996 Informatica 48 (2024) 147–166 147 
 
Application of Improved K-means Algorithm in E-commerce Data 
Processing 
 
Wenwei Chen*, Qindi Wang 
School of Business Management, Hangzhou Polytechnic, Hangzhou 311402, China 
E-mail: chenwenwei1979@163.com 
*
Corresponding author 
 
Keywords: k-means; e-commerce goods; genetic algorithm; recommender system; singular value decomposition 
Received: April 8, 2024 
Accurate recommendation processing for a large number of e-commerce products can play a role in 
increasing e-commerce sales and improving the user's consumption experience. This study uses 
genetic algorithm, coefficient of variation method to design an improved k-means algorithm, and 
design an improved singular value decomposition ++ algorithm, so as to construct an e-commerce 
product data recommendation model. The model uses the improved singular value decomposition 
++ algorithm to extract the hidden features of the data, and the improved k-means algorithm to 
realize the recommendation of the products. The performance test results revealed that when the 
number of recommendations was 15, the area under the recommended precision, recall, and 
receiver operating characteristic curves of the designed recommendation model were 85%, 87%, 
and 0.83, respectively, which were higher than all the ablation experiment comparison models and 
advanced recommendation models. The average computational time consumption of 
ISVD++_I_k-means, RLRA, TRRA, and CF models were 54.2s, 73.8s, 83.3s, and 58.7s, respectively. 
Among them, ISVD++_I_k-means consumed less time, but the computational memory consumption 
of the designed model was in the worse level among all the comparison models. The test results 
demonstrate that, although there are certain drawbacks in terms of resource consumption, the 
recommendation model developed in this study can successfully increase the efficiency and quality 
of recommendations. The research results are beneficial to provide reference for e-commerce 
platforms to design more efficient recommendation models. 
Povzetek: Študija uporablja izboljšan algoritem razvrščanja z voditelji (k-means) za obdelavo 
podatkov v e-trgovini z uporabo genetskega algoritma in metode koeficienta variacije, kar 
izboljšuje točnost in učinkovitost priporočil za izdelke.
1 Introduction 
In the current era, people can buy most of the items 
needed for life in e-commerce platform (ECP) [1]. 
However, too much product information can also make 
people uncertain about their purchasing choices, thus 
wasting users' shopping time and degrading the shopping 
experience [2]. Massive volumes of data about online 
shopping are produced as the e-commerce sector grows 
and matures [3]. Effective utilization of online user 
behavior data in e-commerce industry is the key to 
enhance user's consumption experience, so recommender 
system (RS) has gradually become the focus of academic 
research [4]. RS is an information filtering approach that 
aims to help users find more suitable content in a huge 
ocean of information [5-6]. The fields of content-based 
recommendation, collaborative filtering, and hybrid 
recommendation have all evolved since the 1990s, when 
RS was first proposed. During the course of their gradual 
evolution, the corresponding recommendation algorithm 
models have also grown more intelligent and effective, 
however issues like data sparsity and cold start persist [7]. 
Since RS can find the objects that users may be interested 
in from the huge amount of product data by means of 
precise data analysis, it is beneficial to improve the user's 
stickiness and shopping experience, and thus RS is a key 
element in the development of ECP. However, 
considering the shortcomings of the traditional RS model 
mentioned above, it is necessary to improve it. 
K-means algorithm (KMA) is a division based 
clustering algorithm that finds several cluster centers 
(C-C) in an iterative manner and divides the points to the 
nearest C-Cs to cluster the data [8]. This algorithm has 
demonstrated excellent performance in areas such as 
market segmentation, social networking, image 
processing, etc. Since the k samples closest to a particular 
sample in the feature space can be obtained in KMA, it 
naturally has the potential to be applied to 
recommendation work, but KMA also still has some 
problems in clustering quality and stability [9]. 
Specifically, the traditional KMA obtains the initial 
cluster centers (ICC) by random extraction or weighted 
random extraction, which brings large uncertainty to the 
operation results of the algorithm and weakens the 
stability of the algorithm. If the ICCs are in some special 
148   Informatica 48 (2024) 147–166                                                           W. W. Chen et al. 
positions may cause the algorithm to converge very 
slowly or even fail to converge [10]. In addition, 
traditional KMA often uses equal weights or simple 
weighting when determining the nearest neighbor (N-N) 
list, which cannot reflect the difference in the importance 
of different features for different items, thus leading to 
lower quality of recommendation results. Many 
improvement strategies, based on sample weights 
according to the distribution density of the samples in the 
feature space, have been proposed in academia and 
industry to address the aforementioned shortcomings. 
These strategies help to mitigate the negative effects of 
the random initialization of the C-Cs, but they are not 
able to address the core issues with KMA. To create a 
recommendation model (RM) for the e-commerce goods 
(ECG) data recommendation task, the goal of this 
research is to enhance KMA. This should enable ECP 
users to receive recommendations for goods of a higher 
caliber.  
The following sections comprise the primary 
material of this research. The first part provides 
background information on the evolution of the 
e-commerce business and the resulting rise in demand for 
product recommendations, which is what made this 
research necessary and intended. The second part designs 
the ECG RM based on improved singular value 
decomposition++ (SVD++), genetic algorithm (GA) to 
improve KMA. The third part of the study focuses on 
designing two experiments for testing the 
recommendation accuracy, recommendation efficiency, 
resource consumption and other metrics. The content of 
the last part is to summarize the research content and 
findings of the whole study, and to elaborate and analyze 
the limitations of this research and the future research 
directions. 
2 Related works 
Many computer experts and academics have studied RS 
because it has great application value in retail business 
scenarios with a lot of information. Gwadabe and Liu’s 
research team found that if there is no user archived 
information, e-commerce websites can improve 
recommendation results by using interaction 
transformations across conversations, a method known as 
session-based recommendation. However, this advice was 
limited because of the scant data and the erratic nature of 
user activity. Consequently, the study created a RM using 
a recurrent neural network that was based on 
session-based recommendation. Then the study tested the 
model for recommendations and got the following 
outcomes. The model outperformed common RMs on 
Yoochoose and Diginetica datasets [11]. Roozbahani et al. 
proposed an integrated RM based on a multilayer 
network, which also introduces a semi-supervised module 
that allows it to achieve better recommendation results 
even with insufficient training data. Test results revealed 
that the designed model significantly outperformed the 
pre-improvement model in the social network data 
recommendation task [12]. Choudhary et al. designed a 
deep neural network-based RM using an integrated 
approach which is capable of analyzing both ratings and 
reviews data and sentiment analysis of reviews for 
subsequent processing data. Moreover, the model had a 
hidden layer structure, which could improve the overall 
learning ability. Test outcomes revealed that the Top-5 
recommendation accuracy of the RM designed by the 
authors was 76.3% higher than the pre-improvement 
model and the collaborative filtering RM [13]. Rabiu et al. 
research team found that the historical rating data used 
for recommendation in collaborative filtering RS is 
generally sparse or unbalanced, and the combination of 
user comments and ratings can better capture user 
sentiment and thus help users make high-quality 
recommendations. In light of the aforementioned 
considerations, the team responsible for developing this 
RM opted to construct it on the basis of long- and 
short-term memory neural networks, with the objective of 
capturing the emotional variations between user ratings 
and ratings. The outcomes of the tests based on Amazon 
ECP revealed that the RM designed by the authors is able 
to output high-quality recommendations for cell phones, 
babies, and fine foods for users, which has certain 
application potential [14]. 
KMA is simple in principle, fast convergence, only 
one parameter k needs to be changed when tuning the 
parameters, good interpretability, this is one of the most 
important reasons why it is used most in the industry. For 
the liver and liver tumor segmentation challenge, V. N. 
Pattwakkar et al. suggested a segmentation model based 
on SegNet deep neural network and KMA. The 
experimental results showed that the Dice coefficient of 
96.46 ± 0.48% and the Jaccard index of 93.16 ± 
0.89% were superior to other models [15]. For the 
short-term traffic flow (TF) prediction problem, Sun et al. 
suggested a prediction model based on KMA and gated 
recursive unit prediction. The historical TF data were 
clustered using KMA by the model, and the N-N 
classification approach was utilized to identify the 
historical TF pattern that most closely resembled the TF 
trend on the forecast date. The model enhanced prediction 
accuracy and took into account the diversity of TF 
patterns, according to experimental data [16]. In reaction 
to the grey wolf optimization (GWO) method's 
propensity to enter a local optimum, Mohammed et al. 
suggested changing the algorithm by employing KMA. 
By segmenting the population into distinct groups, the 
K-means clustering algorithm (KMCA) in the modified 
GWO algorithm will improve the performance of the 
original GWO. According to experimental data, 
KM-GWO outperforms the other algorithms in terms of 
significant values. Furthermore, pressure vessel design 
issues were successfully resolved using KM-GWO [17]. 
For feature selection in supervised situations, Ziabari et al. 
presented an effective infinite feature selection technique 
based on K-means (KM). The process began with 
Application of Improved K-means Algorithm in E-commerce Data… Informatica 48 (2024) 147–166 149 
clustering the feature space into a predefined number of 
subspaces. Next, the features in each subspace were 
ranked using the Inf-FS approach. Lastly, the resulting 
subclassifications were combined using a measure of 
information theory and cluster size. The accuracy, 
runtime, and memory consumption of this method are 
better than those of infinite feature selection, according to 
experimental data [18]. Natarajan and Rebekka proposed 
an optimization strategy for SC switching algorithm 
based on KMA and dynamic loading for the problem of 
optimizing the energy efficiency of topological network 
systems with small base stations and auxiliary macro base 
stations [19]. The specific research results of various 
scholars are shown in Table 1. 
Table 1: Specific research achievements of scholars 
Lead author Method Result Disadvantage 
Gwadabe and Liu [11] 
Improving graph neural networks 
based on session recommendation 
systems through non sequential 
interaction 
More than 10% higher 
performance than other 
state-of-the-art models on 
the Yoochoose and 
Diginetica datasets 
Relatively 
dependent on 
session-based 
recommendation 
models 
Roozbahani et al. [12] 
Integrated model based on 
multi-layer networks 
This model can prevent 
information loss in the 
network 
Social network data 
is dynamically 
changing, and there 
may be some errors 
in the results 
Choudhary et al. [13] 
Recommendation model based on 
integrated deep learning methods 
The model performs well 
in user preference 
recognition 
The two outputs in 
the integrated 
network must be 
within the same 
range 
Rabiu et al. [14] 
recommendation system based on 
adaptive long short term memory 
network 
This model is superior to 
existing static and dynamic 
models 
There may be some 
errors in simulating 
user and project 
characteristics 
Pattwakka et al. [15] 
A liver tumor segmentation model 
based on K-means clustering and 
segNet deep neural network 
The model performs well 
in liver tumor 
segmentation 
False positives can 
affect the accuracy 
of liver tumor 
segmentation 
Sun et al. [16] 
Short term traffic flow prediction 
model combining K-means 
clustering and doorstep recursive 
units 
This model considers the 
diversity of traffic flow 
patterns and improves 
prediction accuracy 
This model can only 
predict short-term 
traffic flow and 
cannot achieve 
coverage of the 
entire road network 
Mohammed et al. [17] 
Engineering problem solving 
method combining grey wolf 
optimization algorithm and 
K-means clustering 
This method effectively 
solves the problem of 
pressure vessel design and 
has better performance 
than other algorithms 
May fall into local 
optima 
Ziabari et al. [18] 
Infinite feature selection method 
based on K-means clustering 
On six benchmark 
datasets, this method 
outperforms the infinite 
feature selection method in 
terms of accuracy, runtime, 
and memory consumption 
The algorithm needs 
to set the number of 
subspaces to be 
partitioned in 
advance, which may 
affect the feature 
selection results 
Natarajan and Rebekka 
[19] 
Small cell handover algorithm 
based on K-means clustering and 
dynamic load 
Compared with traditional 
K-means clustering 
methods, this method 
improves energy efficiency 
by 20% and system 
throughput by 16% 
The setting of 
dynamic load 
thresholds has a 
certain impact on 
switching decisions 
150   Informatica 48 (2024) 147–166                                                           W. W. Chen et al. 
 
 
 
In summary, although previous researchers have 
conducted a lot of studies to improve the 
recommendation accuracy and quality of RS, most of the 
studies still choose to use traditional recommendation 
algorithms. Due to the issues with these RMs —such as 
their excessive reliance on data and cold start —this study 
avoids using them and instead builds RMs using KMA, 
which is unaffected by the aforementioned issues. 
 
3 E-Commerce data processing 
model based on improved SVD++ 
with improved K-Means 
With the development of e-commerce, consumers of 
ECPs face a large number of choices when shopping. In 
addition, KMA is a method that can cluster the samples 
according to the similarity principle based on the data 
features [20-22]. Therefore, applying KMA to ECG RMs 
can achieve better interpretability than neural networks 
and machine learning algorithms [23]. However, the 
performance of this approach in RMs is impacted by the 
typical KMA's shortcomings, which include 
unpredictability in the selection of initial clustering 
centers, inappropriate feature weighting calculations, and 
inadequate attention to hidden features (HFs) in the data 
[24]. Therefore this study improves KMA so as to 
construct a RM for ECG data. 
 
3.1 Hybrid improved SVD++ and K-Means 
recommendation model for E-Commerce 
Goods 
The classic SVD++ algorithm's flow is depicted in Figure 
1. The SVD++ algorithm is chosen to be fused with the 
KM model in order to enhance the latter's capacity for 
data processing because of its benefits in HF calculation. 
The SVD++ algorithm has the following problems. First, 
the SVD++ algorithm calculates the implicit feedback by 
training the implicit scores through multiple iterations 
[25]. Second, this SVD++ algorithm does not take 
enough consideration of realistic factors, such as the 
frequency of users' ratings versus the frequency of items 
being rated, which can affect the overall computational 
accuracy of the SVD++ algorithm [26]. A new and 
improved SVD++ algorithm is currently suggested to 
tackle the aforementioned issues. This work modifies the 
implicit feedback calculation to address the SVD++ 
algorithm's low computing efficiency drawback. The 
SVD++ technique is fused with KMA based on the 
principle of N-N in order to increase its computational 
accuracy. This improvement brings the recommendation 
results closer to the user's tastes. 
 
 
Start
Set parameters and initialize metrics 
calculated based on the dataset
Traverse the training set sequentially
Calculate the set and length of 
movies watched by the current user
Calculate implicit feedback
Calculate the deviation of the 
current sample
Update processing metrics based 
on datasets, such as hidden 
features of users and items
Output the data calculated from 
the last iteration
End
N
Y
Calculate global average score
Have 
all training set samples been 
processed?
 
Figure 1: Running process of traditional SVD++algorithm 
 
The computational inefficiency of the SVD++ 
algorithm is due to the fact that implicit feedback can 
only be obtained through implicit scoring training, so this 
study proposes to avoid this approach to obtain implicit 
feedback [27-28]. Implicit feedback also exists for items 
from the perspective of the recommended item (RI), but 
this feedback exists in a passive form due to the 
reciprocal nature of the interaction behavior between the 
item and the user [29]. The user's own implicit feedback 
behavior and the strength of the item's audience can be 
indirectly reflected when the user takes action on the item 
[30]. Therefore, in this study, the map is chose to the 
Application of Improved K-means Algorithm in E-commerce Data… Informatica 48 (2024) 147–166 151 
user's implicit feature vector 
u
p to implicit feedback 
and superimpose it to the item's implicit feature 
i
q . After 
improving the implicit feedback calculation method, the 
SVD++ algorithm also needs to be improved to 
accommodate more elements that affect the quality of 
recommendation. When applied to the work with 
suggestions, the typical SVD++ algorithm frequently 
ignores the user's rating count and the number of times 
the item has been rated. In addition, these two points will 
bring obvious influence on the user's preference judgment. 
As a result, the SVD++ algorithm's cost function needs to 
be revised; Equation (1) displays the updated cost 
function. 
( )
( )
( )
* * *
2
,,
,
2 2 2 2
[]
min
T
ui
ui u i i u
all all
p q b
u i trainset
u i u i
LL
r b b q ppp i p
LL
p q b b




− − − − −  + + 


+ + + +
 (1) 
 
In Equation (1), 
[] ppp i
 denotes the hotness of the 
corresponding item i on the HF, 
i
L denotes the total 
users who have rated item i . 
u
L is the total items rated 
by user 
u
, and 
all
L is the total samples in the training 
dataset. 
u
all
L
L
 and 
i
all
L
L
 represent the activity of the 
corresponding user and the total heat of the 
corresponding item, respectively. 
u
p is the user's HF 
and 
i
q is the item's HF. 
u
b and 
i
b denote the 
deviation of the user and the deviation of the item, 
respectively. 
ui
r is the rating of user 
u
 on item i 
calculated in Equation (2). 
T
ui i u
r q p =           (2) 
In Equation (1), both 
[] ppp i
 and 
i
all
L
L
 can be 
interpreted as the hotness of the item, the former 
represents the hotness of the corresponding item on the 
HF, which can also be interpreted as the feature hotness. 
For example, a certain item has a high artistic component, 
and also many users go to buy and evaluate it, which 
leads to a higher heat of the item on the artistic 
component. That is to say, feature heat represents the 
length of the item in a certain feature dimension that 
makes the user favorite. 
i
all
L
L
 can be interpreted as the 
overall heat of the item, indicating that when an item is 
rated more times than the rest of the items, the item is 
able to appeal to more than one group of users on 
multiple feature dimensions. For example, in the field of 
commodity recommendation, a commodity contains 
artistic components, practicality, value preservation and 
so on at the same time, which can satisfy the needs of 
different consumer users. Furthermore, an item's rating 
count may be a good indicator of a user's general interest 
in it. If an item has no selling point or is difficult to 
arouse interest, its total number of ratings will be very 
small. Therefore, the total number of ratings of an item 
can be used to describe the overall hotness of the item.  
So far the computational method of the HF 
generation module of ECG RM based on SVD++ 
algorithm with KMA, i.e. ISVD++ algorithm is designed. 
The computational flow of this algorithm is shown in 
Figure 2. As shown in Figure 2, the ISVD++ algorithm 
first needs to input the user's rating data for ECGs and 
construct rating matrices based on these rating data. The 
aforementioned rating matrices will then be divided in 
order to calculate the overall heat matrix of the item, the 
implicit feedback obtained from the user's implicit 
features, and the user's activity data. Furthermore, the 
high-end implicit feedback will be transformed into the 
feature specificity of the item. These data will be stored 
in a database for subsequent KMCA modules. 
 
152   Informatica 48 (2024) 147–166                                                           W. W. Chen et al. 
User rating data 
input
Building an 
item and user 
rating matrix
Calculate the overall 
heat of the item
Transforming user 
implicit features into 
implicit feedback
Calculate user 
activity
Mapping Implicit 
Feedback to the Feature 
Characteristics of Items
Store the processed 
data in the database 
for future use
Splitting 
training and 
testing sets
 
Figure 2: Calculation process of ISVD++algorithm in e-commerce product recommendation 
 
Specifically, the first step is to construct a user-item 
rating matrix using the input data, i.e., the user's rating 
data, which will be used as input to the ISVD++ 
algorithm. But first, it's also required to separate the data 
into a training set and a test set before feeding the matrix 
into the ISVD++ algorithm. In the third step i.e. to start 
the model training, firstly 
all
L , 
i
L and 
u
L need to be 
calculated based on the training set. Then 
u
L is mapped 
as the user's activity metrics and BB is mapped as the 
item's overall hotness by using Equation (1). After the 
mapping computation is completed, iterative computation 
is started with the aim of transforming 
u
p into implicit 
feedback, and then finally obtaining the characteristic 
hotness of the item. Considering that the N-N users of the 
current user need to be obtained for ECG 
recommendation as a reference for predicting the interest 
of the current user, KMA constructed according to the 
N-N idea is chosen to design the e-commerce RM. 
Furthermore, KMA is better suited for the e-commerce 
recommendation task due to its straightforward 
implementation and high interpretability. 
 
3.2 Improved design of KMA in 
recommendation modeling 
The KMA in the ECG RM designed in this research has 
shortcomings such as unreasonable calculation of 
indicator weights, so the KMA is now improved. The 
coefficient of variation method is essentially an objective 
assignment method, which calculates the corresponding 
weight coefficients through the information in the 
indicators. This time, the coefficient of variation method 
is used to calculate the variable weights of the 
recommended commodities in the RM, which can play a 
role of reflecting the differences between items more 
objectively. In the case of the recommended commodities, 
the variable's significance for the present commodity 
increases with its difference; hence, the coefficient of 
variation approach will assign this variable a higher 
weight. Now the process of calculating the weights of 
item variables for the coefficient of variation method is 
specifically designed in a variable parameterized manner. 
Let a dataset contains 
n
 samples, each sample contains 
r variables, so the dataset 
data
N can be described 
according to Equation (3). 
11 22 1
21 22 2
12
...
...
... ... ... ...
...
r
r
data
n n nr
x x x
x x x
N
x x x



=



        (3) 
In Equation (3), 
np
x
 represents the r th variable 
data for the 
n
th sample. The mean 
j
x
of indicator 
j
 
can be described by Equation (4). 
1
k
ij
i
j
x
x
n
=
=

          (4) 
Similarly, the standardized mean deviation 
j

 of 
indicator 
j
 can be described by Equation (5). 
( )
12
1
1
1
k
j ij j
i
xx
n

=

=−

−


      (5) 
Therefore, the coefficient of variation 
j
V
 for 
indicator 
j
 can be calculated using Equation (6). 
j j j
Vx  =
           (6) 
On the basis of Equation (6), the weight coefficients 
i
W for each indicator of sample i can be calculated by 
Application of Improved K-means Algorithm in E-commerce Data… Informatica 48 (2024) 147–166 153 
Equation (7). 
1
r
i j j
j
W V V
=
=
         (7) 
KMA also has some pitfalls in the ICC selection. In 
this study, the ICC selection and updating method are 
first analyzed from the traditional KMA point of view, as 
shown in Figure 3. The quality of the current clustering 
results in Figure 3 is evaluated according to the index P 
calculated in Equation (8). 
2
11
nC
ij
ij
P x c
==
=−

          (8) 
In Equation (8), C represents the total cluster 
classes for the current data, 
i
x is the i th data within 
the 
j
th cluster, and 
j
c
 is the 
j
th cluster class 
centroid. In Figure 3, the number of clusters is first 
determined and then the cluster centroids are randomly 
initialized. Subsequently, the sample points are assigned 
by computing the distance of each sample from the C-C 
point. The average value of each cluster is determined 
and used as the new cluster clustering center once all the 
samples have been assigned to the closest C-C. The 
sample points are then redistributed using the previously 
described sample point allocation procedure. The ICCs 
are typically selected at random or weighted randomly 
based on the density of the data distribution; however, 
there is a degree of randomness in both approaches, 
which reduces the stability of the clustering outcomes. 
 
 
Figure 3: Traditional k-means algorithm C-C initialization and update method 
 
GA as a type of heuristic intelligence algorithm, is 
extensively employed in the field of mathematical model 
and algorithm parameter optimization due to its superior 
capacity for automated computing and global 
optimization seeking. The system is based on the 
principle of natural selection and focuses on the encoding 
of decision variables as its primary operational objective. 
The algorithm does not require an understanding of the 
problem itself. Rather, it utilizes the fitness value of 
chromosomes as search information, retaining 
high-fitness chromosomes to eliminate low-fitness 
chromosomes. Ultimately, the optimal or quasi-optimal 
solution is obtained through population iteration. In 
Figure 4, the GA computational flow is displayed. 
Select fitness 
function
Population 
initialization
Construct 
fitness function
Perform selection 
operations
Performing 
breeding 
operations
Repeat the above steps 
until the population 
evolution converges
Select the best 
individual
Complete all 
breeding 
operations
 
154   Informatica 48 (2024) 147–166                                                           W. W. Chen et al. 
Figure 4: GA calculation process 
 
Therefore, this study uses GA to optimize the ICC 
calculation of KMA. The optimal ICC can be identified 
through the use of the global search capability of the GA. 
This is followed by the implementation of the improved 
KMCA, which enhances the stability of the algorithm and 
yields satisfactory clustering results. The chromosome 
encoding method of GA has numerical encoding and 
binary encoding, but considering that the e-commerce 
data is more complex, it is more appropriate to choose 
binary encoding, as shown in Equation (8). 
( )
( ) 1
1
,2
21
t
kl j tj kk
l k t
j
uv
g x k u x
l
+ −
=
 −
= + 

−


(8) 
In Equation (8), 
t
l
x represents the t th 
chromosome calculated to the l th generation. k is a 
real number parameter, 
k
u and 
k
v are the upper and 
lower limits of parameter k , respectively. ( )
,
t
l
g x k 
represents the gene position of 
t
l
x chromosome under 
the condition of parameter k . l represents the number 
of gene loci of the chromosome. 
The first step of KM clustering using the GA is to 
binary encode the centroids and then select the 
appropriate ICCs according to the fitness function output 
by the GA. Since the research topic of this study is ECG 
recommendation, each set of similar e-commerce data 
can be regarded as a cluster. Suppose the dataset obtained 
by the RM from the database is  
1,2,...,
j
D d j n == , 
and 
j
d
 is the 
j
th data among them. Then let the 
number of C-Cs in D be  
1,2,...,
ki
N C i k
=
== and 
i
C be the i th cluster. Cluster optimization is performed 
according to the standard deviation sum as a metric, 
which is shown in Equation (9). 
( )
2
1
i
k
ki
i P C
f P P m
=
=−
      (9) 
In Equation (9), ( )
k
fP represents the sum of 
standard deviations between clusters in the data set. 
i
m 
is the mean value of the 
i
C clusters and is calculated in 
Equation (10). 
1
i
i
pC i
mp
t

=
           (10) 
In addition, the similarity between the data is 
calculated according to Euler's formula as shown in 
Equation (11). 
( ) ( )
1
2
2
1
,
d
i j it jt
i
d x m x m
=

=−



     (11) 
In Equation (11), 
it
x and 
jt
m
 are the 
i
x data 
and cluster 
j
C
 means at the t th calculation, 
respectively. 
So far the hybrid GA improved KMCA can be 
obtained and its computational flow is shown in Figure 5. 
Application of Improved K-means Algorithm in E-commerce Data… Informatica 48 (2024) 147–166 155 
Enter e-commerce data
Calculate the distance between the current 
sample and each cluster center
Classify samples based on their distance from 
the centers of each current cluster
Calculate the mean of the sample
Update the next cluster center
Is there a
 difference between the current cluster 
center and the previous generation cluster 
center?
Output the latest clustering results
N
Y
Using GA algorithm to output initial cluster 
centers
Have all samples been calculated?
Traverse the 
next sample
Y
N
 
Figure 5: Calculation steps of improved k-means clustering algorithm for hybrid GA 
 
In Figure 5, the clustering is excellent when 
Equation (9) is calculated to obtain the minimum value of 
the function, and this study follows this way of 
calculating the fitness, as shown in Equation (12). 
( ) ( )
max ii
f R E E R =−      (12) 
Equation (12) in which 
max
E represents the 
maximum value of chromosomal variance in population 
R and ( )
i
ER represents chromosomal variance in 
population 
i
R of the i th population. ( )
i
fR 
represents the calculated comfort level. In addition, after 
incorporating the GA, the selection probability is 
calculated in KM according to Equation (13). 
( ) ( )
1
n
i i i
i
P f R f R
=
=

       (13) 
In Equation (13), 1,2,..., nn = , 
i
P represents the 
selection probability 
i
P for the i th data. 
Now the GA is combined with the hybrid SVD++ 
designed above with the improved KM of the coefficient 
of variation method to construct the ECG-oriented RM. 
In addition, Figure 6 depicts the particular model 
structure. The first step of the model computation is to 
train the ISVD++ model, which aims to provide input 
data for the subsequent computation boards. In the first 
step, the cost function is optimized according to the 
stochastic gradient descent method, and the HF matrix of 
items and users can be output. Moreover, in the first step, 
the sparse rating matrix needs to be complemented and 
the descending process is performed according to the 
ratings to get the recommendation list of all users. The 
second step of the model is to cluster the users using 
improved KMA. The elbow technique is chosen in this 
case to identify the clusters or user groups K due to the 
sizeable user population in the dataset and the presence of 
variances depending on gender, position, age, interests, 
and other variables. Subsequently, the specific user 
u
 
can be identified where the cluster, as determined by the 
KMA, can be located in the N-N of the user B. This 
serves as the basis for determining the user 
u
's 
preferences. Processing the user recommendation list and 
removing the items' current user ratings is the third step. 
Because these items are already known to the user, do not 
need to do a repeat of the recommendation, the rest of the 
list to intercept the first h items that is the user's 
recommendation of the product program. The fourth step 
is to measure whether an item needs to be recommended 
to the corresponding user according to the set strategy. 
The precise procedure is as follows: Determine the 
average score and the total frequency of identical items in 
156   Informatica 48 (2024) 147–166                                                           W. W. Chen et al. 
the suggested data. Additionally, the mean score and the 
frequency of comparable occurrences can be determined. 
Ultimately, the final list of recommendations is obtained 
by sorting the things in descending order of frequency. 
ISVD++algorithm
User behavior and item 
data input
User Hidden Features
User recommendation 
list
Hidden Feature Matrix of Items
Improved k-means clustering of 
users using hybrid GA
Output nearest neighbor list data 
for each user
Recommendation list
Using the coefficient of variation 
method to weight items
Improved k-means clustering of 
items using hybrid GA
Filter out rated items
Calculate the number of 
occurrences and average 
score of the same item
Calculate the cluster of 
items and the number 
and average score of 
similar items appearing
Process in descending 
order of frequency and 
then in descending order 
of scores
Obtain the final 
recommendation result
Population initialization
Item characteristics
Recommendation result 
processing
 
Figure 6: E-commerce product recommendation model structure of hybrid ISVD++and improved k-means algorithm 
 
 
 
 
4 E-Commerce data processing 
model testing based on improved 
SVD++ with improved K-Means 
To compare the effectiveness of the ECG data RM based 
on Improved SVD++ and Improved KM proposed in this 
research in the recommendation task, a test experiment is 
now being planned and carried out. The creation of the 
experimental protocol, the analysis of the ablation 
experiment's results, and the analysis of the control 
experiment's results comprise the test experiment. 
4.1 Testing of experimental programs 
The dataset chosen for the test experiments is obtained 
from Taobao, a Chinese online shopping platform, which 
contains a total of 5842 users and 6447 items. There are 
827,384 user rating records. The experimental setup and 
parameters are as follows, taking into account the size of 
the dataset and the computational load of the 
recommendation task (Table 1). The parameters in Table 
1 are specifically determined in a larger range of values 
according to the grid analysis method. 
Table 1: Experimental environment and parameters 
Type Name Number Set results 
Model 
parameter 
Population size #01 158 
Cross probability #02 0.7 
Mutation probability #03 0.05 
Selection method #04 Tournament selection 
Chromosome length #05 89 
GA iterations #06 300 
Application of Improved K-means Algorithm in E-commerce Data… Informatica 48 (2024) 147–166 157 
K-mean algorithm convergence 
threshold 
#11 0.001 
Distance measurement method #12 Euclidean distance 
Hardware Central Processing Unit #21 i7-7560U 
Hard disk #22 Solid state drive, 512GB 
Memory #23 16GB 
Graphics processing unit #24 GTX1060 
Software 
environment 
Simulation software  #31 Jupyter Notebook 
Development environment #32 Python3.7 
Operating system #33 Windows 10 professional edition 
 
The comparison algorithms SVD++, KM, and 
improved k-means with hybrid GA (GA-KM) are 
selected for the ablation experiments to construct the RM. 
Collaborative filtering (CF) is selected for the control 
experiments as well as the advanced temporal-aware 
recommendation algorithm (TRRA), reinforcement 
learning-based recommendation algorithm (RLRA) to 
construct the comparison model. The dataset requires 
result preprocessing with the aim of removing null values, 
illegal data, and useless features. The dataset is split in a 
7:3 ratio between the training and test sets. Precision, 
recall, area under curve (AUC) of receiver operating 
characteristic curve (ROC), computation time consumed, 
and memory consumption of TOP-N are chosen as the 
evaluation indexes of model performance in the test. 
 
 
 
4.2 Analysis of model ablation experiments 
Initially, it examines the data from the ablation 
experiment. In Figure 7, the statistical findings for each 
comparison model's precision and recall throughout the 
training phase are displayed. When the number of 
suggestions is set to 30, Figure 7 displays the results of 
the computation. The model "ISVD++_I_KM" is created 
specifically for this study. As training times and quality 
increase, each model's Precision and Recall start to rise 
upward and then gradually stabilize. When more than 200 
iterations are completed, each model's recommendation 
accuracy index fluctuates less. It can be assumed that the 
training of each model is completed when the number of 
iterations is 400, at which time the Precision and Recall 
of ISVD++_I_KM, GA-KM, SVD++, and KM models 
are 93.5%, 91.2%, 87.2%, and 82.9% versus 92.4%, 
91.3%, and 86.4%, respectively, 81.7%. 
158   Informatica 48 (2024) 147–166                                                           W. W. Chen et al. 
Precision/%
Iterations
0 50 100 150 200 250 300 350 400
30
40
50
60
70
80
90
100
ISVD++_I_k-means
GA-k-means
SVD++
K-means
Recall/%
Iterations
0 50 100 150 200 250 300 350 400
30
40
50
60
70
80
90
100
ISVD++_I_k-means
GA-k-means
SVD++
K-means
(a) Precision
(b) Recall
 
Figure 7: Precision and recall of each model during the training phase 
 
Figure 8 displays the statistical findings of Precision 
and Recall for each model under various numbers of 
recommendations in the test set. The number of RIs in the 
output is shown by the values of each axis in Figures 8(a) 
and 8(b), respectively. The recommendation accuracy of 
each model exhibits an overall monotonically growing 
change trend when the number of RIs drops, regardless of 
Precision or Recall, while the growth rate is gradually 
decreasing. When the number of recommendations is 15, 
the Precision of ISVD++_I_KM, GA-KM, SVD++, and 
KM models are 87.9%, 85.3%, 82.9%, and 76.4%, 
respectively, and when the number of recommendations 
is 30, ISVD++_I_KM, GA-KM, SVD++, and KM 
models have Precision of 93.1%, 91.90%, 86.7%, and 
83.0%, respectively. It can be noticed that after the 
number of RIs grows from 15 to 30, the recommendation 
accuracy does not show a large increase, and then 
considering that recommending too many items will 
increase the difficulty of the user's choice, so the 
recommendation number parameter of the subsequent 
ablation experiments is fixed at 15. 
Application of Improved K-means Algorithm in E-commerce Data… Informatica 48 (2024) 147–166 159 
(a) Precision (b) Recall
50.0%
60.0%
70.0%
80.0%
90.0%
100.0%
5
10
15
20
25
30
ISVD++_I_k-means GA-k-means
SVD++ K-means
50.0%
60.0%
70.0%
80.0%
90.0%
100.0%
5
10
15
20
25
30
ISVD++_I_k-means GA-k-means
SVD++ K-means
 
Figure 8: Precision and recall of ablation experimental models under different recommendation numbers 
 
Finally, under the condition that the number of RIs 
is 15, all the metrics of the statistical ablation 
experimental model are shown in Table 2. The 
ISVD++_I_KM model has a greater accuracy and AUC 
than all of the comparative models combined. However, 
the computational time consumption with average 
memory consumption and maximum memory 
consumption of ISVD++_I_KM model is also higher than 
the comparison models. 
 
Table 2: All performance indicators of the ablation experimental model under 15 recommended item conditions 
Index ISVD++_I_k-means GA-k-means SVD++ K-means 
Top_N_Pre /% 85.4 85.3 82.9 76.4 
Top_N_Rec /% 87.9 84.9 81.9 74.4 
AUC 0.83 0.79 0.74 0.65 
Average calculation time/s 54.2 47.3 36.9 45.1 
Average memory consumption/MB 16.8 11.2 6.5 9.4 
Maximum memory consumption/MB 17.1 11.8 8.9 13.5 
 
4.3 Analysis of Model-Controlled 
experiments 
The statistical results of the recommendation metrics of 
the ISVD++_I_KM model and the rest of the 
state-of-the-art models for each number of 
recommendations are shown in Figure 9. The 
recommendation accuracy of each model increases with 
the increase in the number of RIs. In addition, the 
Top_N_Pre and Top_N_Rec of ISVD++_I_KM model 
are always higher than the rest of the comparison models. 
When the number of RIs is 15, the Precision and Recall 
of ISVD++_I_KM, RLRA, TRRA, and CF models are 
85%, 73%, 70%, and 59% versus 87%, 75%, 69%, and 
58%, respectively. 
 
160   Informatica 48 (2024) 147–166                                                           W. W. Chen et al. 
0.59
Recommended product quantity
Top_N_Pre
0 5 10 15 20 25 30 35 40
0.00
0.10
0.20
0.30
0.40
0.50
0.60
0.70
0.80
0.90
1.00
ISVD++_I_k-means
RLRA
CF
TRRA
0.70
0.73
0.85
0.58
Recommended product quantity
Top_N_Rec
0 5 10 15 20 25 30 35 40
0.00
0.10
0.20
0.30
0.40
0.50
0.60
0.70
0.80
0.90
1.00
0.69
0.75
0.87
(a) Precision (b) Recall
ISVD++_I_k-means
RLRA
CF
TRRA
 
Figure 9: Comparison of recommendation accuracy among different models in the comparative experiment 
 
The comparison of the ROC of each model and its 
AUC is shown in Figure 10. The ROC of ISVD++_I_KM, 
RLRA, TRRA, and CF models corresponds to an AUC of 
0.83, 0.81, 0.78, and 0.76, respectively, and the former is 
significantly higher than the latter three. 
True positive rate
False positive rate
0.00 0.20 0.40 0.60 0.80 1.00
0.00
0.20
0.40
0.60
0.80
1.00
(a) ISVD++_I_k-means
True positive rate
False positive rate
0.00 0.20 0.40 0.60 0.80 1.00
0.00
0.20
0.40
0.60
0.80
1.00
(b) RLRA
0.83
True positive rate
False positive rate
0.00 0.20 0.40 0.60 0.80 1.00
0.00
0.20
0.40
0.60
0.80
1.00
(c) TRRA
True positive rate
False positive rate
0.00 0.20 0.40 0.60 0.80 1.00
0.00
0.20
0.40
0.60
0.80
1.00
(d) CF
0.78
0.81
0.76
 
Figure 10: Comparison of ROC and AUC of various models 
 
The computational elapsed time of each model is 
then counted and shown in Figure 11. The computational 
elapsed time of the ISVD++_I_KM model is only longer 
than the CF algorithmic model in different genders and 
different age groups. As a whole, the average 
computational elapsed time of ISVD++_I_KM, RLRA, 
TRRA, and CF models are 54.2s, 73.8s, 83.3s, and 58.7s, 
respectively. In terms of age segments, the computational 
elapsed time of recommendation for adult users is higher 
than that of minors, which is due to the fact that adults 
have a greater need for shopping and more relevant data. 
The reason why the recommendation consumption time 
Application of Improved K-means Algorithm in E-commerce Data… Informatica 48 (2024) 147–166 161 
of female users is higher than that of males is also the same. 
68.2
41.3
85.1
62.5
93.7
72.9
37.2
21.5
0
20
40
60
80
100
120
Adult
Minor
Average calculation time/s
Object
ISVD++_I_k-means
RLRA
TRRA
CF
38.1
70.5
59.4
86.7
68.3
96.5
24.7
40.1
0
20
40
60
80
100
120
Man Woman
Average calculation time/s
(a) Age group classification (b) Gender classification
54.2
73.8
83.3
58.7
0
20
40
60
80
100
Overall
Average calculation time/s
ISVD++_I_k-means RLRA
(c) Overall statistics
ISVD++_I_k-means
RLRA
TRRA
CF
Object
TRRA CF
 
Figure 11: Comparison of computational time for various models 
 
The computational memory consumption of each 
model is shown in Figure 12. To enhance the reliability of 
the statistical results, each experiment is repeated on 
multiple occasions. The data presented in Figure 12 
represents the outcome of a single run. The median 
memory consumption of the ISVD++_I_KM, RLRA, 
TRRA, and CF models is 17.0MB, 12.6MB, 13.9MB, 
and 6.8MB, respectively. From the perspective of 
stability, the standard deviation of memory consumption 
for the ISVD++_I_KM, RLRA, TRRA, and CF models in 
Figure 12 are 3.24MB, 5.96MB, 4.88MB, and 4.57MB, 
respectively. This indicates that the models developed in 
this work operate with more stability, and that a higher 
level of redundancy in a commercialized product is not 
required to guarantee a smooth functioning of the models. 
 
162   Informatica 48 (2024) 147–166                                                           W. W. Chen et al. 
Model name
Memory consumption/MB
ISVD++_I_k-means
0.0
3.0
6.0
12.0
15.0
18.0
RLRA TRRA
9.0
21.0
24.0
CF
 
Figure 12: Comparison of computational memory consumption among different models 
 
Next, the data of all performance evaluation indexes 
of each model in the control experiment are counted, and 
Table 3 is still counted in the way that the number of RIs 
is 15. Overall, the recommendation accuracy of the 
ISVD++_I_KM model developed in this work is 
substantially higher than that of the control model, and its 
computational time is slower than that of the more 
sophisticated RLRA and TRRA models. But the 
computational memory consumption is higher, with an 
average memory consumption of 16.8 MB, which is 
higher than that of all the comparison models. 
 
Table 3: Comparison of all performance evaluation indicators of each model in the control experiment 
Index ISVD++_I_k-means RLRA TRRA CF 
Top_N_Pre /% 85.4 73.1 69.8 59.2 
Top_N_Rec /% 87.9 74.8 69.3 58.0 
AUC 0.83 0.81 0.78 0.76 
Average calculation time/s 54.2 73.8 83.3 58.7 
Average memory consumption/MB 16.8 12.5 13.6 6.7 
Maximum memory consumption/MB 17.1 16.6 20.5 13.9 
 
Finally, to further validate the recommendation 
performance of the model, a comparison of ISVD++I_k 
means, RLRA, and TRRA models is conducted through 
A/B testing. A/B testing is a random testing method that 
allows for the comparison of hypotheses between two 
different objects. In A/B testing, two RMs are first 
deployed to the recommendation terminal simultaneously. 
Subsequently, user traffic from the spare parts platform 
website is allocated to the two RMs with a 50% 
probability for recommendation. Then, user behavior 
information is recorded on the recommendation results. 
In the final step, the recommendation indicators of the 
two models must be compared and analyzed based on the 
collected user feedback results. In the comparison 
between ISVD++ and RLRA, after parsing user 
recommendation logs, 70 successful recommendation 
records and 340 recommendation failure data are 
obtained. In the comparison between ISVD++I_k means 
and TRRA, 50 successful recommendation records and 
310 recommendation failure data are obtained. The 
results are presented in Figure 13. 
Application of Improved K-means Algorithm in E-commerce Data… Informatica 48 (2024) 147–166 163 
(a) Comparison between ISVD++I_k means and RLRA
Reason 1 Reason 2 Reason 3 Reason 4 Reason 5
0
50
100
150
200
Frequency
Recommended Results
58
13
147
192
(b) Comparison between ISVD++I_k means and TRRA
Reason 1 Reason 2 Reason 3 Reason 4 Reason 5
0
50
100
150
200
Frequency
Recommended Results
65
19
140
186
 
Figure 13: A/B test results of different models 
 
 
Figure 13 (a) illustrates that the ISVD++I_k means 
model has 58 successful recommendation records, with a 
recommendation success rate of 28.8%. The RLRA 
model yielded 13 successful recommendation records, 
with a recommendation success rate of 6.34%. Figure 13 
(b) illustrates that the ISVD++I_k means model has a 
successful recommendation record of 65, with a 
recommendation success rate of 31.7%. The RLRA 
model achieved a successful recommendation record of 
19, with a recommendation success rate of 9.27%. In 
comparison to the other two models, the success 
recommendation rate of the ISVD++I_k means model has 
demonstrably improved, thereby corroborating the 
hypothesis that users are more satisfied with the model. 
 
5 Discussion 
The exponential growth of e-commerce has led to the 
accumulation of a vast quantity of intricate data, 
rendering it challenging for users to swiftly and 
accurately identify the information they are seeking 
amidst this deluge of information. The development of 
algorithms capable of providing personalized 
recommendations to users, assisting them in discovering 
products of interest and enhancing purchase rates and 
user satisfaction, has emerged as a pivotal research topic 
among professionals in this field. Although traditional CF 
algorithms and content-based recommendation 
algorithms have demonstrated improvements in 
recommendation performance, these algorithms often 
encounter limitations due to sparsity and cold start issues, 
164   Informatica 48 (2024) 147–166                                                           W. W. Chen et al. 
which result in average recommendation performance. In 
recent years, the application of deep learning algorithms 
in the domain of personalized recommendations has 
become increasingly prevalent. Conventional algorithms 
typically necessitate a substantial quantity of training data 
and exhibit high computational complexity. Consequently, 
the study integrates the conventional RM SVD++ with a 
straightforward and expedient unsupervised learning 
approach, K-means clustering, and optimizes them 
individually to develop an e-commerce data processing 
model, ISVD++I_KM, based on enhanced SVD++ and 
enhanced k-means. 
The experimental results demonstrated that the 
accuracy, recall, and AUC of the ISVD++I_KM model 
are significantly superior to those of benchmark models 
such as SVD++ and k-means. This suggested that the 
ISVD++I_KM model is more effective than traditional 
methods in identifying complex interaction relationships 
between users and products on ECPs. In comparison to 
the session-based graph neural network RM proposed by 
Gwadabe and Liu [11], the ISVD++I_KM model was 
capable of directly modelling user-product interaction 
relationships, thus effectively circumventing issues that 
may arise due to data sparsity or model instability. In 
comparison to the ensemble model based on deep 
learning proposed by Choudhary et al. [13], the 
ISVD++I_KM model was not constrained by the 
requirement of ensuring that two outputs are within the 
same range. This demonstrated that it is more flexible and 
efficient. In terms of computational efficiency, the 
ISVD++I_KM model exhibited slightly higher time and 
space overhead than the benchmark model, yet it 
outperformed complex deep learning recommendation 
algorithms such as RLRA and TRRA. Furthermore, a 
comparison was made between the ISVD++I_KM model 
and the adaptive long short-term memory network 
designed by Rabiu et al. It was found that the 
ISVD++I_KM model does not require complex feature 
extraction processes, thus having significant advantages 
in computational efficiency. 
In conclusion, although the ISVD++I_KM model 
exhibits some limitations in terms of memory usage, it 
effectively incorporates the strengths of its predecessors. 
The combination of the SVD++ and KMAs has led to 
significant improvements in key indicators such as 
accuracy and computational efficiency. These advances 
have the potential to expand the applicability of the 
model in the field of e-commerce recommendation. 
6 Conclusion 
This work used the SVD++ algorithm to create ECG RM, 
which was based on enhanced KMA. The ablation 
experiments' findings showed that while the growth rate 
was gradually slowing down, each model's suggestion 
accuracy generally displayed a monotonically improving 
trend as the number of RIs decreased. After the number 
of RIs increased from 15 to 30, the recommendation 
accuracy did not show a large increase. Under the 
condition that the number of RIs was 15, the accuracy 
and AUC of the ISVD++_I_KM model were higher than 
those of all the comparison models. The results of the 
control experiments revealed that when the RIs is 15, the 
Precision and Recall of the ISVD++_I_KM, RLRA, 
TRRA, and CF models are 85%, 73%, 70%, and 59% 
versus 87%, 75%, 69%, and 58%, respectively. The 
corresponding AUC of ROC for ISVD++_I_KM, RLRA, 
TRRA, and CF models were 0.83, 0.81, 0.78, and 0.76, 
respectively, and the former was significantly higher than 
the latter three. The average computation time for 
ISVD++_I_KM, RLRA, TRRA, and CF models was 
54.2s, 73.8s, 83.3 s, and 58.7s. The median memory 
consumption of ISVD++_I_KM, RLRA, TRRA, and CF 
models was 17.0MB, 12.6MB, 13.9MB, and 6.8MB, 
respectively. On the whole, the ISVD++_I_KM model 
designed in this study was significantly better than the 
control model in terms of recommendation accuracy, and 
the computational speed was lower than the advanced 
RLRA and TRRA models, but the computational memory 
consumption was higher. The inability to develop 
commercial RS using the intended model and evaluate 
the system's effectiveness in actual application scenarios 
is the research's main shortcoming; this is an area that has 
to be explored more in the future. 
References
 
[1] M. Ravakhah, M. Jalali, Y. Forghani, and R. Sheibani, 
"Balanced hierarchical max margin matrix 
factorization for recommendation system," Expert 
Systems, vol. 39, no. 4, pp. e12911.1-e12911.14, 
2021. https://doi.org/10.1111/exsy.12911 
[2] N. Jiang, L. Gao, F. Duan, J. Wen, T. Wan, and H. 
Chen, "SAN: Attention-based social aggregation 
neural networks for recommendation system," 
International Journal of Intelligent Systems, vol. 37, 
no. 6, pp. 3373-3393, 2021. DOI: 
https://doi.org/10.1002/int.22694 
[3] A. Da'U, N. Salim, and R. Idris, "An adaptive deep 
learning method for item recommendation system," 
Knowledge-Based Systems, vol. 213, no. 8, pp. 
106681.1-106681.12, 2021. 
https://doi.org/10.1016/j.knosys.2020.106681 
[4] U. Yadav, N. Duhan, and K. K. Bhatia, "Dealing with 
pure new user cold-start problem in 
recommendation system based on linked open data 
and social network features," Mobile Information 
Systems, vol. 2020, no. 4, pp. 
8912065.1-8912065.20, 2020. 
https://doi.org/10.1155/2020/8912065 
[5] K. Benabbes, K. Housni, A. E. Mezouary, and A. 
Zellou, "Recommendation system issues, 
approaches and challenges based on user reviews," 
Journal of Web Engineering, vol. 21, no. 4, pp. 
1017-1054, 2022. 
https://doi.org/10.13052/jwe1540-9589.2143 
Application of Improved K-means Algorithm in E-commerce Data… Informatica 48 (2024) 147–166 165 
[6] N. Mohammadi and A. Rasoolzadegan, "A two-stage 
location-sensitive and user preference-aware 
recommendation system," Expert Systems with 
Applications, vol. 191, no. 4, pp. 
116188.1-116188.25, 2022. 
https://doi.org/10.1016/j.eswa.2021.116188 
[7] L. Li, Z. Zhang, and S. Zhang. "Hybrid algorithm 
based on content and collaborative filtering in 
recommendation system optimization and 
simulation," Scientific Programming, vol. 2021, no. 
3, pp. 742709.1-742709.11, 2021. 
https://doi.org/10.1155/2021/7427409 
[8] Y. Cui, "Intelligent recommendation system based on 
mathematical modeling in personalized data 
mining," Mathematical Problems in Engineering, 
vol. 2021, no. 9, pp. 6672036.1-6672036.11, 2021. 
https://doi.org/10.1155/2021/6672036 
[9] M. C. Chiu, J. H. Huang, S. Gupta, and G. Akman, 
"Developing a personalized recommendation system 
in a smart product service system based on 
unsupervised learning model," Computers in 
Industry, vol. 128, no. 10, pp. 103421.1-103421.19, 
2021. 
https://doi.org/10.1016/j.compind.2021.103421 
[10] R. Shaw and B. K. Patra, "Cognitive-aware lecture 
video recommendation system using brain signal in 
flipped learning pedagogy," Expert Systems with 
Applications, vol. 207, no. 11, pp. 
118057.1-118057.10, 2022. 
https://doi.org/10.1016/j.eswa.2022.118057 
[11] T. R. Gwadabe and Y. Liu, "Improving graph neural 
network for session-based recommendation system 
via non-sequential interactions," Neurocomputing, 
vol. 468, no. 1, pp. 111-122, 2022. 
https://doi.org/10.1016/j.neucom.2021.10.034 
[12] Z. Roozbahani, J. Rezaeenour, A. Katanforoush, and 
A. J. Bidgoly, "Personalization of the collaborator 
recommendation system in multi-layer scientific 
social networks: A case study of ResearchGate," 
Expert Systems, vol. 39, no. 5, pp. 
e12932.1-e12932.18, 2021. 
https://doi.org/10.1111/exsy.12932 
[13] C. Choudhary, I. Singh, and M. Kumar, "SARWAS: 
Deep ensemble learning techniques for sentiment 
based recommendation system," Expert Systems 
with Applications, vol. 216, no. 4, pp. 
119420.1-119420.8, 2023. 
https://doi.org/10.1016/j.eswa.2023.119420 
[14] I. Rabiu, N. Salim, A. Da'U, and M. Nasser, 
"Modeling sentimental bias and temporal dynamics 
for adaptive deep recommendation system," Expert 
Systems with Applications, vol. 191, no. 4, pp. 
116262.1-116262.15, 2022. 
https://doi.org/10.1016/j.eswa.2021.116262 
[15] V. N. Pattwakkar, S. Kamath, M. K. Nanjundappa, 
and R. Kadavigere, "Automatic liver tumor 
segmentation on multiphase computed tomography 
volume using segnet deep neural network and 
k-means clustering," International Journal of 
Imaging Systems and Technology, vol. 33, no. 2, pp. 
729-745, 2023. https://doi.org/10.1002/ima.22816 
[16] Z. Sun, Y. Hu, W. Li, S. Feng, and L. Pei, 
"Prediction model for short-term traffic flow based 
on a k-means-gated recurrent unit combination", 
IET Intelligent Transport Systems, vol. 16, no. 5, pp. 
675-690, 2022. https://doi.org/10.1049/itr2.12165 
[17] H. M. Mohammed, Z. K. Abdul, T. A. Rashid, A. 
Alsadoon, and N. Bacanin, "A new k means grey 
wolf algorithm for engineering problems," World 
Journal of Engineering, vol. 18, no. 4, pp. 630-638, 
2021. https://doi.org/10.1108/WJE-10-2020-0527 
[18] S. F. H. Ziabari, S. Eskandari, and M. Salahi, 
"Clnf-fs_s: An efficient infinite feature selection 
method using k-means clustering to partition large 
feature spaces," Pattern Analysis and Applications, 
vol. 26, no. 4, pp. 1631-1639, 2023. 
https://doi.org/10.1007/s10044-023-01189-1 
[19] J. Natarajan and B. Rebekka, "An energy efficient 
dynamic small cell on/off switching with enhanced 
k-means clustering algorithm for 5g hetnets," 
International Journal of Communication Networks 
and Distributed Systems, vol. 29, no. 2, pp. 209-237, 
2023. https://doi.org/10.1504/IJCNDS 
[20] L. Wang, Z. You, D. Huang, and J. Li, "MGRCDA: 
Metagraph recommendation method for predicting 
circrna-disease association," IEEE Transactions on 
Cybernetics, vol. 53, no. 1, pp. 67-75, 2021. 
https://doi.org/10.1109/TCYB.2021.3090756 
[21] S. N. Amin, P. Shivakumara, T. X. Jun, K. Y. Chong, 
D. Zan, and R. Rahavendra, "An augmented 
reality-based approach for designing interactive 
food menu of restaurant using android," Artificial 
Intelligence and Applications, vol. 1, no. 1, pp. 
26-34, 2023. 
https://doi.org/10.47852/bonviewAIA2202354 
[22] C. Fan, "Evaluating employee performance with an 
improved clustering algorithm," Informatica, vol. 46, 
no. 5, pp. 123-128, 2022. 
https://doi.org/10.31449/inf.v46i5.4079 
[23] T. Chen, "A fuzzy ubiquitous traveler clustering and 
hotel recommendation system by differentiating 
travelers' decision-making behaviors - 
ScienceDirect," Applied Soft Computing, vol. 96, 
no. 1, pp. 106585.1-106585.10, 2020. 
https://doi.org/10.1016/j.asoc.2020.106585 
[24] Z. Abbasi-Moud, H. Vahdat-Nejad, and J. Sadri, 
"Tourism recommendation system based on 
semantic clustering and sentiment analysis," Expert 
Systems with Applications, vol. 167, no. 4, pp. 
114324.1-114324.10, 2021. 
https://doi.org/10.1016/j.eswa.2020.114324 
[25] P. Mazumdar, B. K. Patra, and K. S. Babu, 
"Cold-start point-of-interest recommendation 
through crowdsourcing," ACM Transactions on the 
Web (TWEB), vol. 14, no. 4, pp. 19.1-19.36, 2020. 
https://doi.org/10.1145/3407182 
166   Informatica 48 (2024) 147–166                                                           W. W. Chen et al. 
[26] S. Bin and G. Sun, "Matrix factorization 
recommendation algorithm based on multiple social 
relationships," Mathematical Problems in 
Engineering, vol. 2021, no. 9, pp. 
6610645.1-6610645.8, 2021. 
https://doi.org/10.1155/2021/6610645 
[27] Q. Liang, X. Zheng, Y. Wang, and M. Y. Zhu, 
"O3ERS: An explainable recommendation system 
with online learning, online recommendation, and 
online explanation," Information Sciences, vol. 562, 
no. 7, pp. 94-115, 2021. 
https://doi.org/10.1016/j.ins.2020.12.070 
[28] Y. Guo, Z. Mustafaoglu, and D. Koundal, "Spam 
detection using bidirectional transformers and 
machine learning classifier algorithms," Journal of 
Computational and Cognitive Engineering, vol. 2, 
no. 1, pp. 5-9, 2022. 
https://doi.org/10.47852/bonviewJCCE2202192 
[29] B. Chen, L. Zhu, D. Wang, and J. Cheng, "Research 
on the design of mass recommendation system 
based on lambda architecture," Journal of Web 
Engineering. vol. 20, no. 6, pp. 1971-1990, 2021. 
https://doi.org/10.13052/jwe1540-9589.20614 
[30] Q. Zhu, "Network course recommendation system 
based on double-layer attention mechanism," 
Scientific Programming, vol. 2021, no. 13, pp. 
7613511.1-7613511.9, 2021. 
https://doi.org/10.1155/2021/7613511. 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Application of Improved K-means Algorithm in E-commerce Data… Informatica 48 (2024) 147–166 167