https://doi.org/10.31449/inf.v48i8.5480 Informatica 48 (2024) 49–62

Closed Itemset Mining: A Graph Theory Perspective

Fatima Zohra Lebbah 1,2, Moussa Larbani 3,4, Abdellatif Rahmoun 5
1 Higher School of Electrical and Energetic Engineering of Oran (ESG2E), Vicinal Road N9, Oran, 31000, Algeria
2 Laboratory of Research in Computer Science-SBA, Computational Intelligence and Soft Computing Team (CISCO), Higher School of Computer Science, BP 73, Bureau de poste EL WIAM, Sidi-Belabbes, 22016, Algeria
3 School of Mathematics and Statistics, Carleton University, 1125 Colonel By Drive, Ottawa, Ontario, K1S 5B6, Canada
4 National Higher School of Statistics and Applied Economics (ENSSEA), Pôle Universitaire de Koléa, Tipaza, Algiers, 42003, Algeria
5 Laboratory of Research in Computer Science-SBA, Computational Intelligence and Soft Computing Team (CISCO), Higher School of Computer Science, BP 73, Bureau de poste EL WIAM, Sidi-Belabbes, 22016, Algeria
E-mail: fz_lebbah@yahoo.fr, fz.lebbah@esgee-oran.dz, larbani61@hotmail.com, a.rahmoun@esi-sba.dz

Keywords: dataset, closed frequent itemset, labeled graph, maximal clique

Received: November 28, 2023

Data mining is a field that focuses on extracting and analyzing usable data from large databases. This paper concentrates on the problem of finding closed frequent itemsets, which is extensively studied in the field. Previous graph-theoretic techniques have relied on tree structures and recursive algorithms, which have limitations. In this paper, we propose a scalable modeling approach that represents a transaction dataset as an undirected, labeled graph, where the edge labels are computed and assigned so as to encode the transactions. We also introduce an exact, polynomial-time algorithm based on the clique notion in graph theory to compute all the closed frequent itemsets.
Our initial experimental results demonstrate the efficiency of our algorithm in terms of CPU time and memory usage compared to recent methods in the literature. Additionally, our graph model can easily be extended when the dataset is updated. We implement our graph with linear structures holding boolean values and employ an exact algorithm with polynomial complexity. These aspects of our approach provide strong foundations for investigating more challenging issues related to the problem at hand.

Povzetek: The paper addresses scalable modeling of transaction datasets using undirected, labeled graphs for mining closed frequent itemsets. A new algorithm with polynomial complexity, based on clique theory, is proposed.

1 Introduction

Nowadays, Data Mining is the science that allows us to work with a dataset in line with our perspectives [18, 16, 17, 19]. It involves the process of finding patterns and knowledge in a large amount of data. However, extracting the desired and specific information poses a significant challenge for many researchers. This has led to the development of various competing models and techniques, such as those in [27, 3, 2, 4, 7].

Itemset Mining is an appealing field within the realm of Data Mining, primarily because of the need to manage large-scale data. As the size of the data increases, the traditional methods and techniques used to address itemset problems tend to become slower and less efficient. Researchers are currently focused on discovering and proposing new techniques and algorithms to model and solve problems related to Itemset Mining, including computing frequent, closed, and maximal itemsets.

The process of extracting frequent sets of items bought by customers, known as finding frequent itemsets in a transaction database [23, 15, 3], has been addressed by various techniques, including Constraint Programming [32, 6, 5, 25, 8] and Graph Theory [24, 28, 29, 30, 26].
Graph theory-based approaches have already been proposed to find frequent itemsets. In this article, however, we present a new approach to the problem of Closed Frequent Itemsets (CFIs). We show that a transaction database can be effectively modeled through an undirected and labeled graph using the concept of a clique.

We introduce a clique-based algorithm for finding CFIs, which we refer to as CfCi. This approach uses linear structures and a polynomial algorithm. Specifically, we show how our graph model enables the step-by-step computation of all the CFIs.

The article is structured as follows. Section 2 introduces the basic concepts of the CFI mining problem, along with a review of related works. Section 3 presents the graph theory notions used to transform a transaction database into an undirected and labeled graph. Section 4 describes the equivalence between (closed) frequent itemsets and (maximal) cliques, and how to deduce a CFI from a maximal clique. Section 5 depicts our CfCi algorithm and the functions it calls; the complexity analysis and a worked example are also given there. In Section 6, we compare CfCi to the latest existing techniques in the literature to highlight its contribution; this section also reports experimental results on various datasets and their analysis. In Section 7, we discuss, describe, analyze, and interpret our findings. Finally, Section 8 concludes the paper.

2 Problem definition and related works

In this section, we begin by introducing the problem using a market basket example. Then, we provide basic concepts and definitions.

For instance, let's consider the market baskets illustrated in Figure 1, where we analyze five shopping baskets. The aim is to study customer shopping habits and identify associations and correlations between different items.
In this paper, we focus on computing itemsets that are frequently purchased together by customers. Specifically, we address the problem of CFIs.

Figure 1: Market basket analysis

Table 1 represents the market baskets as transactions and items. Each row corresponds to a customer transaction, and each column represents an item. The letters m, b, e, j, and s denote specific items: milk, bread, egg, jam, and sugar, respectively.

According to the table, the itemset {m, j} has a frequency of three, showing that three customers purchase both milk and jam. Similarly, the itemset {b, e} has a frequency of two, showing that two customers purchase both bread and egg.

Table 1: The itemset database, as shown on the left side of Figure 1, and its binary matrix notation, shown on the right side.

T     Itemset
t1    {m, b, e, j}
t2    {m, s, e, j}
t3    {b, e}
t4    {m, j}

      m  s  b  e  j
t1    1  0  1  1  1
t2    1  1  0  1  1
t3    0  0  1  1  0
t4    1  0  0  0  1

Therefore, an itemset database problem is defined by a set of items I = {m, s, b, e, j} and a set of transactions T = {t1, t2, t3, t4}. The relationship between these items through the transactions is represented by a boolean matrix D(n × m), called the transaction database (see Definition 1). Here, n = |T| is the number of transactions and m = |I| is the number of items.

Definition 1 (transaction database). Let T = {t1, t2, ..., tn} be the set of transactions and I = {i1, i2, ..., im} the set of items.
A transaction database is defined through the boolean database matrix D(n × m):

D[p, j] = 1 if the item i_j occurs in the transaction t_p, and 0 otherwise   (1)

where n = |T| is the number of transactions and m = |I| is the number of items.

Based on Figure 1 and Table 1, the analyzed shopping gives the following transaction database D(4 × 5), shown in Matrix 2:

D =
[ 1 0 1 1 1 ]
[ 1 1 0 1 1 ]
[ 0 0 1 1 0 ]
[ 1 0 0 0 1 ]   (2)

2.1 Frequent itemset mining

Before delving into the topic of closed frequent itemsets, it is important to introduce the concept of support, as given in Definition 2.

Definition 2 (support measure). Let D be a transaction database, I be the set of items, and T be the set of transactions. The support measure of an itemset I ⊆ I, denoted Supp(I), is defined as:

Supp(I) = Σ_{t ∈ T} Π_{i ∈ I} D[t, i]   (3)

Supp(I) is the number of transactions containing all the items belonging to I.

Definition 3 (frequent itemset mining). Let D be a transaction database and θ be the minimum support threshold, where θ > 0. An itemset I is considered frequent if its support Supp(I) is greater than or equal to θ. Conversely, I is considered infrequent if Supp(I) < θ.

For example, consider the transaction database presented in Matrix 2. The support values for I1 = {i2}, I2 = {i3, i4} and I3 = {i1, i5} are 1, 2 and 3, respectively.

Property 1 (downward closure). Every subset of a frequent itemset is also frequent. In other words,

I is frequent ⇒ ∀ I_i ⊂ I : I_i is frequent   (4)

or, equivalently:

I is infrequent ⇒ ∀ I_i ⊃ I : I_i is infrequent   (5)

Property 1 [2], also known as the Apriori principle, is related to Properties 2 and 3 [1, 21], which introduce the anti-monotonicity of the support measure and the monotonicity of the frequency property with respect to itemset inclusion, respectively.
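The support measure of Definition 2 can be made concrete with a minimal executable sketch on the database of Matrix 2 (the helper name `support` is ours, not part of the paper's algorithms):

```python
# A minimal sketch of the support measure (Definition 2) on the
# transaction database of Table 1 / Matrix 2.
def support(D, itemset):
    """Number of transactions containing every item of `itemset`
    (items are column indices of the boolean matrix D)."""
    return sum(1 for row in D if all(row[j] == 1 for j in itemset))

# Matrix 2: rows are t1..t4, columns are m, s, b, e, j.
D = [
    [1, 0, 1, 1, 1],  # t1
    [1, 1, 0, 1, 1],  # t2
    [0, 0, 1, 1, 0],  # t3
    [1, 0, 0, 0, 1],  # t4
]
m, s, b, e, j = range(5)
print(support(D, {s}))     # -> 1
print(support(D, {b, e}))  # -> 2
print(support(D, {m, j}))  # -> 3
```

This reproduces the supports 1, 2 and 3 reported above for I1 = {i2}, I2 = {i3, i4} and I3 = {i1, i5}.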
Property 2 (anti-monotonicity) [20]. Let I1 and I2 be two itemsets such that I1 ⊆ I2. It follows that Supp(I1) ≥ Supp(I2):

∀ I_i, I_j ⊆ I, I_i ⊆ I_j : Supp(I_i) ≥ Supp(I_j)   (6)

Property 3 (monotonicity). Let I1 and I2 be two itemsets such that I1 ⊆ I2. If Supp(I2) ≥ θ then Supp(I1) ≥ θ:

∀ I_i, I_j ⊆ I, I_i ⊆ I_j : Supp(I_j) ≥ θ ⇒ Supp(I_i) ≥ θ   (7)

In other words, anti-monotonicity (Property 2) implies that the subsets of a frequent itemset are also frequent, and the frequency property is monotone with respect to set inclusion, as stated in Property 3.

2.2 Closed itemset mining

A closed itemset, denoted I_c, is a frequent itemset whose support Supp(I_c) is at least the fixed minimum support threshold θ, and whose items appear together in exactly the set of transactions T_{I_c}, whose size equals Supp(I_c) (see Definition 4).

Definition 4 (closed itemset). Let I_c be a frequent itemset and Supp(I_c) its support. The itemset I_c is closed if and only if it is not included in a larger frequent itemset I_i that has the same support and appears in exactly the same transactions:

I_c is closed ⇔ ∄ I_i ⊃ I_c : I_i is frequent ∧ Supp(I_c) = Supp(I_i) ∧ T_{I_c} = T_{I_i}   (8)

where T_{I_c} and T_{I_i} are the sets of transactions containing the items of I_c and I_i, respectively.

Note that our aim is to identify all the closed itemsets. Figure 2 depicts the first part of the Hasse diagram for the database represented in Matrix 2. An edge is drawn from itemset I1 to itemset I2 if and only if I1 ⊂ I2 and |I2| = |I1| + 1. In this figure, the blue ellipses represent the frequent itemsets, while the green ellipses represent the closed itemsets. The support of each closed itemset is mentioned on the left side of the corresponding ellipse.
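Definition 4 can be checked mechanically: an itemset is closed exactly when it equals the set of items common to all transactions that contain it. A small sketch on the database of Matrix 2 (the helper names `cover` and `is_closed` are ours):

```python
# Sketch of a closedness test (Definition 4): an itemset is closed iff
# it equals the intersection of all the transactions containing it.
def cover(D, items):
    """Indices of the transactions containing every item of `items`."""
    return [t for t, row in enumerate(D) if all(row[j] for j in items)]

def is_closed(D, items, theta):
    rows = cover(D, items)
    if len(rows) < theta:          # not even frequent
        return False
    closure = {j for j in range(len(D[0]))
               if all(D[t][j] for t in rows)}
    return closure == set(items)   # no equal-support superset exists

D = [
    [1, 0, 1, 1, 1],  # t1   columns: m, s, b, e, j
    [1, 1, 0, 1, 1],  # t2
    [0, 0, 1, 1, 0],  # t3
    [1, 0, 0, 0, 1],  # t4
]
m, s, b, e, j = range(5)
print(is_closed(D, {b, e}, theta=2))  # -> True  (support 2)
print(is_closed(D, {b}, theta=2))     # -> False (every transaction with b also has e)
print(is_closed(D, {m, j}, theta=2))  # -> True  (support 3)
```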
For instance, as shown in Figure 2, if we set the minimum support threshold θ to 2, the support of {e} is 3, while the support of its superset {b, e} is 2. Similarly, the support of {m, j} is 3. This illustrates the anti-monotonicity property (Property 2): for {e} ⊂ {b, e}, the supports are 3 and 2, respectively.

2.3 Related works

So far, several algorithms have been developed for mining closed frequent itemsets, including AprioriTID Close [20], LCM [22], CHARM [9], FP-Close [12], FCILINK [31], NAFCP [10], EMFCI [14], NECLAT [13] and GrAFCI+ [11]. However, most of these methods rely on tree structures and recursive algorithms.

One primary challenge in the field of data mining is to efficiently find all the closed itemsets while optimizing CPU time and memory usage. To address this challenge, we propose a graph model based on a linear structure that minimizes memory usage. Additionally, we introduce the exact algorithm CfCi to find all the closed frequent itemsets.

To emphasize the contribution of the proposed approach, we compare the results of the CfCi algorithm with the most recent algorithms, namely NAFCP, NECLAT and GrAFCI+; this comparison is presented in Section 6. We consider not only the novelty of the chosen algorithms but also the availability of corresponding implementations and the level of detail of their results. Table 2 presents the key characteristics of the algorithms considered in the experiments section.

Table 2: Characteristics of the tested algorithms and their used structures

Approach   Used structure     Used algorithm
CfCi       linear structure   iterative, exact
NAFCP      tree structure     recursive, approximate
NECLAT     tree structure     recursive, approximate
GrAFCI+    tree structure     recursive, approximate

3 Itemset mining vs undirected graph

Firstly, we propose an undirected and labeled graph to represent a dataset.
This model is based on the idea that connecting the items appearing in the same transaction forms a clique, i.e., a sub-graph in which every pair of distinct vertices is adjacent. Therefore, each transaction is represented by a clique. However, since a bond may be shared by multiple cliques, we introduce a labeling approach that ensures that a bond shared by different cliques (as defined in Definition 9) is accurately represented in the graph model.

[Figure 2: The search space of frequent itemset mining for the database of Table 1 and θ = 2. Blue ellipses mark frequent itemsets, green ellipses mark closed itemsets, and each itemset x is annotated with its support Supp(x).]

In our graph model, we assign a binary value, called a sub-label, to each transaction. Specifically, as stated in Definition 5, the bond (j, k) that represents the occurrence of items i_j and i_k in the same transaction t_p is characterized by the sub-label L_p(j, k) = 10^(n−p−1), where n is the number of transactions.

Definition 5 (sub-label). Consider a database D(n × m) and a transaction t_p ∈ T. The occurrence of items i_j and i_k in the same transaction t_p, represented by D[p, j] = 1 and D[p, k] = 1, is modeled by the bond (j, k) labeled with the binary value L_p(j, k) = 10^(n−p−1). This sub-label is a part of the overall label L(j, k) that represents the relationship between the items i_j and i_k.

Each time the pair of items i_j and i_k appears in a transaction, a new sub-label is added to the label of the bond (i_j, i_k). In other words, the label of the bond (i_j, i_k), denoted L(i_j, i_k), is the sum of all the sub-labels associated with that bond, as expressed by Equation 9 in Definition 6.

Definition 6 (bond label). Let G = ⟨X, U, L⟩ be the graph model of a transaction database.
The label of the bond (i, j) ∈ U is given by:

L(i, j) = Σ_{t_p ∈ T} L_p(i, j)   (9)

In Definition 7, we introduce the notion of label size, denoted HW(L(i, j)), which is the number of ones, i.e., of sub-labels, included in L(i, j).

Definition 7 (label size). Let G = ⟨X, U, L⟩ be the graph model of a transaction database. The size of the label L(i, j), denoted HW(L(i, j)), is the Hamming weight of L(i, j), i.e., the number of included sub-labels.

Definition 8 formalizes the idea that the items appearing in a transaction t are represented by a clique. To differentiate one transaction (clique) from another, we assign the corresponding sub-label L_p(i, j) to each bond (i, j).

Definition 8 (transaction vs clique). Let D(n × m) be a transaction database defined on the set T of transactions and the set I of items. Suppose that the transaction t_p ∈ T includes all the items of the itemset I ⊂ I. The transaction t_p is represented by the clique C_p, where:

C_p = { (i_j, i_k) : i_j, i_k ∈ I, L_p(i_j, i_k) = 10^(n−p−1) }   (10)

In Definition 9, we depict how we model each transaction and its items by a labeled clique in the corresponding undirected and labeled graph, called the dataset graph.

Definition 9 (dataset vs labeled graph). Let D(n × m) be a transaction database, where n = |T| represents the number of transactions and m = |I| represents the number of items.
We associate with D the undirected and labeled graph G = ⟨X, U, L⟩, where:

X: the set of vertices, representing the set of items, with |X| = |I| = m;

U: the set of bonds, defined as follows: (i_j, i_k) ∈ U if ∃ t_p ∈ T : D[p, j] = 1 ∧ D[p, k] = 1, and (i_j, i_k) ∉ U otherwise, with

|U| = m + Σ_{p=1..n} Σ_{j=1..m} D[p, j] − n   (11)

L: the labeling that assigns to each bond (i_j, i_k) ∈ U the binary value:

L(i_j, i_k) = Σ_{p=1..n} D[p, j] · D[p, k] · 10^(n−p−1)   (12)

For instance, consider the itemset mining problem [6] described below:

      i1  i2  i3  i4
t1    1   0   1   1
t2    1   1   0   1
t3    0   0   1   1

In Figures 3a, 3b, and 3c, we depict the cliques generated from the transactions t1, t2, and t3, respectively. Our graph model G = ⟨X, U, L⟩ is formed by combining all the cliques extracted from D, and the corresponding adjacency matrix A is a symmetric matrix of binary-digit labels, as given in Definition 10.

Definition 10 (items adjacency matrix). Let G = ⟨X, U, L⟩ be the graph modeling the database D(n × m). The adjacency matrix A corresponding to G is defined as follows:

a_{jk} = L(i_j, i_k) if (i_j, i_k) ∈ U, and 0 otherwise   (13)

where L(i_j, i_k) = Σ_{p=1..n} L_p(i_j, i_k).

[Figure 3: Transactions vs cliques. (a) t1 is modeled by the clique C1 on {i1, i3, i4}, whose bonds carry the sub-label 100; (b) t2 by the clique C2 on {i1, i2, i4}, with sub-label 10; (c) t3 by the clique C3 on {i3, i4}, with sub-label 1.]

We illustrate through Figures 4, 5 and 6 the sub-graphs G1 = ⟨X, U, L1⟩, G2 = ⟨X, U, L2⟩ and G3 = ⟨X, U, L3⟩, which model the transactions t1, t2 and t3, respectively. To the right of each sub-graph G_p, we give the corresponding adjacency matrix L_p containing the bond sub-labels.
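The label construction of Equation 12 can be sketched directly (the function name `adjacency_labels` is ours; the dataset is the 3-transaction example above):

```python
# Sketch of the bond-label construction (Equation 12): each entry of the
# items adjacency matrix sums the sub-labels 10^(n-p-1) of the
# transactions in which both items occur.
def adjacency_labels(D):
    n, m = len(D), len(D[0])
    A = [[0] * m for _ in range(m)]
    for j in range(m):
        for k in range(m):
            A[j][k] = sum(D[p][j] * D[p][k] * 10 ** (n - p - 1)
                          for p in range(n))
    return A

# The 3-transaction example: t1 = {i1,i3,i4}, t2 = {i1,i2,i4}, t3 = {i3,i4}.
D = [
    [1, 0, 1, 1],  # t1 -> sub-label 100
    [1, 1, 0, 1],  # t2 -> sub-label 10
    [0, 0, 1, 1],  # t3 -> sub-label 1
]
A = adjacency_labels(D)
print(A[0][3])  # -> 110 (bond (i1, i4) appears in t1 and t2)
print(A[2][3])  # -> 101 (bond (i3, i4) appears in t1 and t3)
print(A[3][3])  # -> 111 (i4 occurs in all three transactions)
```

The printed values match the entries of the adjacency matrix A_D of Figure 7.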
Figure 4: G1 = ⟨X, U, L1⟩ models the transaction t1.

L1 =
[ 100 000 100 100 ]
[ 000 000 000 000 ]
[ 100 000 100 100 ]
[ 100 000 100 100 ]

Figure 5: G2 = ⟨X, U, L2⟩ models the transaction t2.

L2 =
[ 010 010 000 010 ]
[ 010 010 000 010 ]
[ 000 000 000 000 ]
[ 010 010 000 010 ]

Figure 6: G3 = ⟨X, U, L3⟩ models the transaction t3.

L3 =
[ 000 000 000 000 ]
[ 000 000 000 000 ]
[ 000 000 001 001 ]
[ 000 000 001 001 ]

As illustrated in Figure 7, according to Definition 10, the combination of the sub-graphs G1, G2 and G3 provides the labeled and undirected graph G = ⟨X, U, L⟩. The corresponding adjacency matrix A_D = L1 + L2 + L3 is given on the right side of the figure:

A_D =
[ 110 010 100 110 ]
[ 010 010 000 010 ]
[ 100 000 101 101 ]
[ 110 010 101 111 ]

Figure 7: G = ⟨X, U, L⟩ models the database of transactions t1, t2 and t3.

4 Frequent itemset vs clique

In this section, we present the utilization of the maximal clique principle in our approach to find the closed itemsets. The concept of a frequent itemset is characterized in Definition 11 using the concept of a clique in graph theory. In other words, a clique in the dataset graph G = ⟨X, U, L⟩ models a frequent itemset, where the contained vertices correspond to items.

Definition 11 (frequent itemset vs clique). Consider the adjacency matrix A(m × m), the set of items I, the set of transactions T and the minimum support θ. An itemset I is frequent if and only if its items are the vertices of a clique C in G such that all the labels of the corresponding bonds share at least θ sub-labels:

| ∩_{i,j ∈ I} SL(i, j) | ⩾ θ   (14)

where SL(i, j) is the set of the sub-labels whose sum equals A[i, j].
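The frequency test of Definition 11 can be sketched by keeping labels as fixed-width digit strings, so that each position corresponds to one transaction (the helper names are ours):

```python
# Sketch of the frequency test of Definition 11: the shared sub-labels of
# a set of bonds are the positions holding '1' in every label, and their
# count must reach the minimum support θ.
def sublabels(label, n):
    """Transaction positions encoded by the 1-digits of an n-digit label."""
    return {p for p, d in enumerate(str(label).zfill(n)) if d == '1'}

def shared_support(labels, n):
    """Size of the intersection of the labels' sub-label sets."""
    return len(set.intersection(*(sublabels(l, n) for l in labels)))

# Bond labels of the clique {i1, i3, i4} in Figure 7 (3 transactions):
# (i1,i3) = 100, (i1,i4) = 110, (i3,i4) = 101.
print(shared_support([100, 110, 101], n=3))  # -> 1: only t1 is shared
# Single bond (i1, i4) of the clique {i1, i4}:
print(shared_support([110], n=3))            # -> 2: t1 and t2
```

Hence, for θ = 2, the clique {i1, i4} passes the test while {i1, i3, i4} only survives for θ = 1.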
Figure 8: Example of an itemset mining problem with 5 items and 4 transactions.

      i1  i2  i3  i4  i5
t1    1   0   1   1   0
t2    1   1   0   1   0
t3    0   0   1   1   0
t4    0   0   0   0   1

D =
[ 1 0 1 1 0 ]
[ 1 1 0 1 0 ]
[ 0 0 1 1 0 ]
[ 0 0 0 0 1 ]

To compute the CFIs, the infrequent items (see Property 1) and the empty transactions should be removed. For instance, consider the dataset given in Figure 8. As illustrated in Figure 9, if the minimum support is set to θ = 2, the items i2 and i5 are infrequent and should be removed. Thus, the dataset D is replaced by D′ (see sub-Figure 9a), which now contains the empty transaction t4. At this point, t4 is deleted, leading to D′′ (see sub-Figure 9b):

D′ =
[ 1 1 1 ]
[ 1 0 1 ]
[ 0 1 1 ]
[ 0 0 0 ]

D′′ =
[ 1 1 1 ]
[ 1 0 1 ]
[ 0 1 1 ]

Figure 9: Removing infrequent items and empty transactions: (a) D′, (b) D′′, (c) the graph G_{D′′} = ⟨X_{D′′}, U_{D′′}, L_{D′′}⟩ on the vertices i1, i3, i4.

The graph corresponding to D′′, called G′′ = ⟨X′′, U′′, L⟩ (see sub-Figure 9c), will be the input of our proposed algorithms and will be denoted G = ⟨X, U, L⟩.

Every pair i_j, i_k ∈ I_c, where I_c is a closed itemset, is frequent, as stated in Definition 4. Therefore, projecting this concept onto the graph model G = ⟨X, U, L⟩, the label of each bond (i, j) ∈ U must contain at least θ ones. Stated differently, every bond of the graph needs to be consistent (Definition 12).

Definition 12 (consistent bond). Let G = ⟨X, U, L⟩ be the graph model of the dataset D′′. Suppose that L(i, j) is the label of the bond (i, j) and HW(L(i, j)) is the number of ones included in L(i, j). If HW(L(i, j)) ⩾ θ then (i, j) is consistent; otherwise, (i, j) is inconsistent.

Consequently, as given in Definition 13, a graph is consistent if all its bonds are consistent.

Definition 13 (consistent data mining graph).
Let G_c = ⟨X_c, U_c, L⟩ be the dataset graph. G_c is consistent if and only if all its bonds are consistent:

G_c = ⟨X_c, U_c, L⟩ is consistent ⇔ ∀ (i, j) ∈ U_c : (i, j) is consistent.   (15)

Thus, the input graph of our algorithm should be a consistent graph, whose inconsistent bonds are omitted. As illustrated in Figure 10, G_c = ⟨X_c, U_c, L_c⟩ is the consistent version of G (see Figure 7). The corresponding adjacency matrix A(3 × 3) is shown on the right side of the figure:

A =
[ 110 000 110 ]
[ 000 101 101 ]
[ 110 101 111 ]

Figure 10: The consistent graph G_c = ⟨X_c, U_c, L_c⟩ (see sub-Figure 9c).

5 CfCi exact algorithm for complete enumeration

In Definition 15, we introduce the closed itemset notion in terms of graph theory. A CFI corresponds to a maximal clique (see Definition 14) whose bonds share at least θ sub-labels.

Definition 14 (maximal clique). Let G = ⟨X, U⟩ be an undirected graph. A clique C is maximal if it cannot be extended into a larger clique. In other words, C is not a subset of a larger clique.

Definition 15 (closed itemset vs dataset graph). Let A be the adjacency matrix of the consistent graph G = ⟨X, U, L⟩. An itemset I is closed if its items are the vertices of a maximal clique CQ such that all the labels of the corresponding bonds share at least θ sub-labels.

In graph theory, finding maximal cliques is a hard problem, and it is difficult to find a polynomial algorithm that produces this kind of clique. However, in this paper we propose Algorithm 2, called CfCi. This algorithm, based on the bonds' sub-labels, is exact and has polynomial complexity.

Consider, for example, the graph given in Figure 10. The corresponding structures are given below.
D =
[ 1 1 1 ]
[ 1 0 1 ]
[ 0 1 1 ]

A =
[ 110 000 110 ]
[ 000 101 101 ]
[ 110 101 111 ]

Labs = [110, 101, 111]
supp = [2, 2, 3]
IndVect = [1, 2, 3]

Algorithm 1, with the polynomial complexity O(nbcons × (nbcons − 1)/2) (see Proposition 1), provides the CFIs corresponding to the initial labels belonging to Labs.

Proposition 1. Algorithm 1 has the worst-case complexity ≈ O((nbcons² − nbcons)/2), where nbcons is the number of consistent vertices.

Proof. The algorithm contains two nested loops. At most, nbcons vertices are treated in the outer loop and (nbcons − 1)/2 in the inner one. Thus, the complexity of BasicCFI is ≈ O((nbcons² − nbcons)/2).

Let's consider, for example, the graph G = ⟨X, U, L⟩ illustrated in Figure 11.

Figure 11: G = ⟨X, U, L⟩ models the dataset given in Figure 8 (vertices i1 to i5; the vertex and bond labels are the 4-digit labels computed by Definition 9).

We suppose that θ = 1, which makes the graph consistent (see Definition 13) and a valid input for our algorithm.

Figure 12 shows the different steps of Algorithm 1 applied to the graph. The sub-figures contain arrows in green, blue and red, corresponding to the cases where the intersection between the first label and the second generates a label equal to the first one, a label with support ⩾ θ, and a label with support < θ, respectively.

Algorithm 2, called CfCi, is the proposed algorithm, of polynomial complexity ≈ O(((l³ − l²)/2)(v − 1)), where l is the number of labels and v the number of vertices (see Proposition 2). It provides all the CFIs of the tackled dataset, since the second loop of the algorithm computes all the possible label intersections. In other words, each newly generated label whose support is ⩾ θ leads to a new CFI.
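The two label operations the pseudocode relies on, DistLabs and comp2Labs, can be sketched under our reading of the text: the "distance" counts the transaction positions where both labels hold a 1, and the intersection keeps only those positions (fixed-width digit strings stand in for the binary labels):

```python
# Sketch of the label operations used by BasicCFI and CfCi, under our
# reading of the pseudocode (these are illustrative reimplementations,
# not the authors' code).
def dist_labs(l1, l2, n):
    """Number of transaction positions where both n-digit labels hold a 1."""
    a, b = str(l1).zfill(n), str(l2).zfill(n)
    return sum(1 for x, y in zip(a, b) if x == y == '1')

def comp_2_labs(l1, l2, n):
    """Digit-wise intersection of two n-digit labels."""
    a, b = str(l1).zfill(n), str(l2).zfill(n)
    return int(''.join('1' if x == y == '1' else '0' for x, y in zip(a, b)))

# Labels from Figure 11 (4 transactions):
print(dist_labs(1100, 1110, 4))    # -> 2
print(comp_2_labs(1010, 1110, 4))  # -> 1010
print(comp_2_labs(1100, 1010, 4))  # -> 1000, the new label of Figure 13c
```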
Figure 12: CFIs generated by Algorithm 1, step by step: (a) I1 = {i5}; (b) I2 = {i2, i1, i4}; (c) I3 = {i3, i4}; (d) I4 = {i1, i4}; (e) I5 = {i4}. IndVect and SuppVect contain the vertices at the origin of the corresponding labels and their supports, respectively.

To find a new maximal clique, our algorithm calls the function NewClique (see Algorithm 3). NewClique(r) computes the vertices incident to IndVect[r] that share at least θ of the sub-labels contained in Labs[r].

Proposition 2. Algorithm 2 has the worst-case complexity ≈ O(((l³ − l²)/2)(v − 1)), where l is the number of labels and v is the number of vertices.

Proof. The algorithm contains three nested loops. Let l be the number of labels and v the number of vertices. At most, l labels are treated in the outer loop, and l and (l − 1)/2 iterations are processed in the first and second inner loops, respectively. In addition, at most v − 1 successors are handled in the function NewClique. Thus, CfCi's complexity is ≈ O(((l³ − l²)/2)(v − 1)).

Algorithm 1 BasicCFI(M)
Require: θ: threshold; M: adjacency matrix of the cleaned graph; nbcons: number of consistent vertices; ConsItem: vector of consistent vertices
1: VertexSet ← ∅
2: for i = 0..nbcons do  ▷ the vertices are covered, according to their labels, from the smallest to the biggest one
3:   VertexSet ← VertexSet ∪ {i}
4:   IndVect ← IndVect ∪ {i}; SuppVect ← SuppVect ∪ {supp[i]}
5:   for j = i + 1..nbcons do
6:     if DistLabs(i, j) = supp[i] then  ▷ DistLabs returns the distance between the labels of i and j
7:       VertexSet ← VertexSet ∪ {j}
8:     end if
9:   end for
10:  if VertexSet ≠ ∅ then
11:    FCI ← FCI ∪ {VertexSet}
12:  end if
13:  VertexSet ← ∅
14: end for
15: return FCI

Algorithm 2 CfCi()
Require: θ: threshold; M: adjacency matrix of the cleaned graph; nbcons: number of consistent vertices; ConsItem: vector of consistent vertices; Labs: set of the labels generated by the BasicCFI algorithm
1: beg ← 0; end ← |Labs|; e ← |Labs|
2: for k = 0..|Labs| do
3:   bg ← beg[k]; ed ← end[k]
4:   for i = bg..ed do
5:     b ← 0
6:     for j = i + 1..ed do
7:       cl ← 0; dst ← DistLabs(Labs[i], Labs[j])
8:       if dst ≠ SuppVect[i] ∧ dst ≥ θ then
9:         cl ← comp2Labs(Labs[i], Labs[j])  ▷ assign to cl the intersection of the labels i and j
10:        if cl ∉ Labs then
11:          Labs ← Labs ∪ {cl}; IndVect ← IndVect ∪ {i}
12:          SuppVect ← SuppVect ∪ {dst}
13:          VertexSet ← NewClique(|Labs| − 1)
14:          if VertexSet ≠ ∅ then
15:            VertexSet ← VertexSet ∪ {ind}; FCI ← FCI ∪ {VertexSet}
16:            VertexSet ← ∅
17:          end if
18:          b ← b + 1
19:        end if
20:      end if
21:    end for
22:    if b ≥ 1 then
23:      beg ← beg + e; end ← end + ed + b; e ← e + b
24:    end if
25:  end for
26: end for

In Figure 13, we illustrate the evolution of the used structures Labs, FCI, IndVect and SuppVect. As given in the proof of Proposition 2, Algorithm 2 executes nbcons outer loops, where nbcons is the number of consistent vertices. Thus, since nbcons = 5, our algorithm executes five outer loops. Sub-Figures 13a, 13b, 13c, 13d and 13e illustrate the details of the different loops. We use a green arrow to express that the generated label is new and consistent, a blue arrow to show that the generated label already exists, and a red arrow when the support of the generated label is less than θ.
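The first pass (Algorithm 1) can be sketched executably under our reading of the pseudocode: vertex j joins vertex i's itemset when their labels share exactly supp[i] ones, i.e. when label_i is contained in label_j, with vertices sorted by increasing label as in Figure 12 (this is an illustrative reconstruction, not the authors' implementation):

```python
# Executable sketch of the first pass (Algorithm 1, BasicCFI) on the
# vertex labels of Figure 11; helper names are ours.
def hw(label, n):
    """Hamming weight of an n-digit label (Definition 7)."""
    return str(label).zfill(n).count('1')

def dist_labs(l1, l2, n):
    """Number of positions where both n-digit labels hold a 1."""
    a, b = str(l1).zfill(n), str(l2).zfill(n)
    return sum(1 for x, y in zip(a, b) if x == y == '1')

def basic_cfi(vertices, labels, n):
    order = sorted(range(len(labels)), key=lambda i: labels[i])
    fci = []
    for pos, i in enumerate(order):
        group = [vertices[i]]
        for k in order[pos + 1:]:
            if dist_labs(labels[i], labels[k], n) == hw(labels[i], n):
                group.append(vertices[k])
        fci.append(group)
    return fci

# Vertex labels of Figure 11 (4 transactions): i1..i5.
vertices = ['i1', 'i2', 'i3', 'i4', 'i5']
labels = [1100, 100, 1010, 1110, 1]
for itemset in basic_cfi(vertices, labels, 4):
    print(itemset)
```

The printed itemsets are [i5], [i2, i1, i4], [i3, i4], [i1, i4] and [i4], matching sub-Figures 12a to 12e.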
Algorithm 3 NewClique
1: function NewClique(r)
2:   VertexSet ← ∅
3:   ind ← IndVect[r]
4:   for all v ∈ succ[ind] do
5:     dst ← DistLabs(r, v)
6:     if dst = SuppVect[r] then
7:       VertexSet ← VertexSet ∪ {v}
8:     end if
9:   end for
10:  return VertexSet
11: end function

Figure 13: CFIs generated by Algorithm CfCi: (a) no new label is generated; (b) no new label is generated; (c) the new label 1000 is generated, giving I6 = {i3, i1, i4}; (d) no new label is generated; (e) no new label is generated. IndVect and SuppVect contain the vertices at the origin of the corresponding labels and their supports, respectively.

Thus, Algorithm 2, applied to the graph, provides six maximal cliques, as given in Figure 14. Each clique C_i induces a closed itemset I_i. The maximal cliques resulting from our algorithm, C1, C2, C3, C4, C5 and C6, are illustrated in sub-Figures 14a to 14f, respectively.
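As an independent cross-check of the six CFIs (not the authors' CfCi implementation), one can enumerate closed frequent itemsets by brute force, using the standard fact that a closed itemset equals the intersection of all transactions containing it:

```python
from itertools import combinations

# Brute-force cross-check: a closed frequent itemset equals the
# intersection of all transactions containing it. Applied to the
# dataset of Figure 8 with θ = 1.
def closed_frequent_itemsets(D, theta):
    m = len(D[0])
    result = set()
    for r in range(1, m + 1):
        for items in combinations(range(m), r):
            rows = [row for row in D if all(row[j] for j in items)]
            if len(rows) < theta:
                continue  # infrequent: skip
            closure = frozenset(j for j in range(m)
                                if all(row[j] for row in rows))
            result.add(closure)
    return result

D = [
    [1, 0, 1, 1, 0],  # t1
    [1, 1, 0, 1, 0],  # t2
    [0, 0, 1, 1, 0],  # t3
    [0, 0, 0, 0, 1],  # t4
]
cfis = closed_frequent_itemsets(D, theta=1)
print(len(cfis))  # -> 6
for I in sorted(cfis, key=sorted):
    print(sorted(j + 1 for j in I))  # 1-based item indices
```

The six itemsets found, {i5}, {i2, i1, i4}, {i3, i4}, {i1, i4}, {i4} and {i3, i1, i4}, coincide with I1 to I6 above.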
Figure 14: Maximal cliques vs CFIs: (a) C1 models the CFI I1 = {i5}; (b) C2 models I2 = {i2, i1, i4}; (c) C3 models I3 = {i3, i4}; (d) C4 models I4 = {i1, i4}; (e) C5 models I5 = {i4}; (f) C6 models I6 = {i3, i1, i4}.

6 Experiments

As stated in Section 2.3, besides CfCi, we consider in our experiments the algorithms NAFCP [10], NECLAT [13] and GrAFCI+ [11]. In addition, to obtain credible results, we chose the datasets according to their varied densities. More precisely, the chosen datasets Mushroom, Lymph and Soybean (downloaded from https://dtai.cs.kuleuven.be/CP4IM/datasets/) have densities of 18%, 40% and 32%, respectively (see Table 3).

In Table 3, we introduce the characteristics of each tested dataset DName(nbItem, nbTr, dns), where DName is the dataset's name, and nbItem, nbTr and dns are, respectively, the corresponding number of items, number of transactions and density. The information given in the table is:
– freq: the frequency adopted by the algorithm,
– θ: the support corresponding to freq,
– dns: the density of the dataset according to freq,
Mushroom(119, 8124, 18%)
freq    θ       dns    nvr    nbd
5%      406     30%    67     1168
25%     2031    51%    31     229
50%     4062    74%    12     52
80%     6499    94%    5      14

Lymph(68, 148, 40%)
freq    θ       dns    nvr    nbd
5%      7       45%    59     1369
25%     37      60%    40     549
50%     74      76%    24     193
80%     118     91%    11     58

Soybean(50, 630, 32%)
freq    θ       dns    nvr    nbd
5%      32      36%    44     601
25%     158     56%    23     157
50%     315     75%    12     55
80%     504     93%    5      13

The results provided by the tested algorithms are analyzed with respect to CPU-time, the number of computed CFIs (denoted ncl) and memory usage (denoted Mem). The computer used has the following configuration: Linux 20.04.1 on a notebook with an Intel Core i7 and 8 GB of RAM. The experiments are implemented in Java.

In sub-Figures 15a, 15b and 15c, we present the behavior of CfCi, NAFCP, NECLAT and GrAFCI+ applied to Mushroom, in terms of CPU-time, ncl (the number of closed itemsets) and Mem (the memory usage), respectively. We notice, via sub-Figure 15a, that the applied methods have approximately the same CPU time down to the frequency of 25%, where CfCi and GrAFCI+ become slower. At a frequency of 5%, CfCi and NECLAT are the fastest, whereas NAFCP is considerably the slowest. Sub-Figure 15b shows that, at frequencies of 80%, 50% and 25%, all the methods provide the same number of CFIs, whereas at a frequency of 5%, our algorithm provides more CFIs than NECLAT and GrAFCI+, and fewer than NAFCP. In terms of memory usage, sub-Figure 15c highlights the efficiency of our algorithm compared to the others.

[Figure 15: MUSHROOM dataset. (a) CPU-time; (b) number of CFIs; (c) memory usage.]

In Figure 16, we present three diagrams that show the behavior of the adopted methods, applied to the Lymph dataset, in terms of CPU-time, number of CFIs and memory usage. As shown in sub-Figure 16a, at frequencies of 80% and 50%, all the methods are equally fast, whereas CfCi becomes faster at a frequency of 25%.
In addition, we notice that at a frequency of 5%, NAFCP becomes the slowest, NECLAT the fastest, and CfCi comes close to NECLAT.

[Figure 16: LYMPH dataset. (a) CPU-time; (b) number of CFIs; (c) memory usage.]

Sub-Figure 16b shows, in terms of the number of CFIs, that at frequencies of 80%, 50% and 25%, all the methods provide the same results except GrAFCI+, which gives more CFIs at a frequency of 80%. At a frequency of 5%, the number of CFIs provided differs from algorithm to algorithm, and our algorithm provides more CFIs than NAFCP and NECLAT and fewer than GrAFCI+.

Figure 17 contains SoyBean's diagrams, which reveal the behavior of the tested methods in terms of CPU-time, number of CFIs and memory usage.

[Figure 17: SOYBEAN dataset. (a) CPU-time; (b) number of CFIs; (c) memory usage.]

We notice through sub-Figure 17a that CfCi and NECLAT start as the fastest at freq = 80%; then NECLAT remains the most efficient. On the other hand, as the frequency decreases, CfCi and GrAFCI+ become slower. Regarding the number of CFIs, all the methods provide the same number at each frequency, except GrAFCI+, which gives one more CFI.

Sub-Figure 17c shows the efficiency of our algorithm in terms of memory usage, contrary to GrAFCI+ and NECLAT, which are really costly.

As stated in Section 5, our algorithm is a complete approach designed to provide the exact number of CFIs.
We observed through the diagrams given above that NAFCP, GrAFCI+ and NECLAT may yield fewer or more CFIs in certain situations. In addition, our algorithm is the most efficient in terms of memory usage for all datasets, except Lymph with freq = 5%. Thus, CfCi has average CPU-time results and the best results in terms of the number of CFIs and memory usage.

We have to point out the case where CfCi, applied to Lymph, provides its worst results in terms of memory usage. This result can be explained by the size of the corresponding graph: according to Table 3, the graph that models the Lymph dataset has the largest values, namely 118 vertices and 91 bonds, compared to the others.

7 Discussion

There are advantages and disadvantages to each of the previous data mining techniques. Most of them are based on tree structures (see Table 2), namely the NList, Gr-tree and FP-tree in NAFCP, GrAFCI+ and NECLAT, respectively. This is because of the recursive relationship indirectly expressed in Property 1.

In addition, the authors of the tested techniques proposed approximate approaches and employed recursive algorithms to construct and explore the corresponding tree structures. Using recursive structures and approximate algorithms in most SOTA approaches means that memory usage is costly and that finding all closed itemsets is not guaranteed. Moreover, updating tree structures when the corresponding dataset is modified is generally a complex task.

In this article, we developed a new boolean, linear structure that can be easily updated when necessary, together with iterative algorithms, in order to optimize memory usage. Since our algorithm CfCi is exact and polynomial, it provides all the CFIs and shows good CPU-time. In certain instances, when the frequency is 5%, CfCi slows down compared to NAFCP, GrAFCI+ and NECLAT. This is because of the step that verifies that a computed CFI does not already belong to the current set of CFIs.
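The duplicate-verification step mentioned above can be sketched as a membership test over canonical itemset representations. This is an illustrative sketch only (the paper does not specify the actual data structure used): hashing a frozenset makes each check constant time on average, at the price of keeping all generated CFIs in memory.

```python
def add_cfi(cfi_set, itemset):
    """Add a candidate closed itemset only if it was not generated before.
    Canonicalizing to a frozenset makes the order of items irrelevant."""
    key = frozenset(itemset)
    if key in cfi_set:
        return False  # duplicate: already produced from another clique
    cfi_set.add(key)
    return True

cfis = set()
print(add_cfi(cfis, ["i1", "i4"]))  # → True  (new CFI, kept)
print(add_cfi(cfis, ["i4", "i1"]))  # → False (same itemset, rejected)
```

Even with constant-time membership tests, every generated clique pays this check, which is consistent with the slowdown observed at freq = 5%, where the number of candidates is largest.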
Most of the time, the number of CFIs, ncl, provided by our algorithm is the same as for the other techniques. Sometimes we get fewer or more CFIs, particularly when freq = 5%. Our algorithm computes all the intersections between the bonds' labels and prevents duplicated CFIs. This reveals the incompleteness of the algorithms NAFCP, NECLAT and GrAFCI+ compared to CfCi. Our approach is the best choice for optimizing memory usage, except on the Lymph dataset when freq = 5%.

8 Conclusion and future works

This paper has introduced new modeling and solving approaches to find all the CFIs. We have presented an efficient graph modeling approach based on the clique notion in graph theory. Implementing our model using a linear structure makes our graph model more flexible to update when transactions or items are added or removed. Our model is based on the fact that a frequent itemset corresponds to a clique in an undirected graph. Thus, CfCi, a new algorithm based on the maximal-clique principle, has been introduced to tackle the closed itemset mining problem.

Our first experiments have shown the efficiency of CfCi in finding all the CFIs compared to recent algorithms in the literature. This is because the proposed algorithm uses boolean structures and is designed to find all CFIs through systematic searches. Thus, our algorithm is more efficient than the tested methods in terms of the number of CFIs and memory usage, and approximately as efficient in terms of CPU time.

To summarize, the structure used in our approach is linear, and the proposed algorithm is iterative, exact, and polynomial. That is the origin of the promising results presented and analyzed in this article. However, there are some cases where our approach is less efficient than the newest methods, particularly in terms of CPU time. This is because of the CfCi step that handles duplicate CFIs.
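The clique correspondence underlying the approach can be illustrated on a toy transaction set: the items of a frequent itemset pairwise co-occur, so their vertices are pairwise adjacent in a co-occurrence graph. A minimal sketch, assuming a plain co-occurrence graph (the graph construction is illustrative, not the paper's exact labeling scheme):

```python
from itertools import combinations

transactions = [{"i1", "i4"}, {"i1", "i3", "i4"}, {"i2", "i1", "i4"}, {"i5"}]

def cooccurrence_edges(transactions, min_supp=1):
    """Edge (a, b) exists when a and b appear together in >= min_supp transactions."""
    edges = set()
    items = set().union(*transactions)
    for a, b in combinations(sorted(items), 2):
        supp = sum(1 for t in transactions if {a, b} <= t)
        if supp >= min_supp:
            edges.add((a, b))
    return edges

def is_clique(itemset, edges):
    """An itemset is a clique iff every pair of its items is adjacent."""
    return all((min(a, b), max(a, b)) in edges for a, b in combinations(itemset, 2))

edges = cooccurrence_edges(transactions)
print(is_clique({"i1", "i3", "i4"}, edges))  # → True
```

Here {i1, i3, i4} co-occurs pairwise (all three pairs share transaction support), so it forms a clique, whereas {i2, i3} never co-occurs and therefore induces no edge.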
As with any research work, a new version that inherits the advantages and avoids the disadvantages of our approach can follow. Exploring opportunities for the practical implementation of the proposed algorithm will be the future direction of our research. We consider adapting our dataset graph model and the CfCi algorithm to tackle classification and clustering problems.

References

[1] Fournier-Viger, Philippe and Chun-Wei Lin, Jerry and Truong-Chi, Tin and Nkambou, Roger (2019) A Survey of High Utility Itemset Mining, High-Utility Pattern Mining: Theory, Algorithms and Applications, Springer International Publishing, pp. 1–45. https://doi.org/10.1007/978-3-030-04921-8_1

[2] Charu C. Aggarwal (2015) Data Mining: The Textbook, Springer International Publishing. https://doi.org/10.1007/978-3-319-14142-8

[3] Han, Jiawei and Kamber, Micheline and Pei, Jian (2011) Data Mining: Concepts and Techniques, Morgan Kaufmann. http://www.sciencedirect.com/science/book/9780123814791

[4] Truong-Chi, Tin and Fournier-Viger, Philippe (2019) A Survey of High Utility Sequential Pattern Mining, High-Utility Pattern Mining, Springer International Publishing, pp. 97–129. http://dx.doi.org/10.1007/978-3-030-04921-8_4

[5] Carson Kai-Sang Leung (2009) Frequent Itemset Mining with Constraints, Encyclopedia of Database Systems, Springer International Publishing, pp. 1179–1183. https://doi.org/10.1007/978-0-387-39940-9_170

[6] Tias Guns and Siegfried Nijssen and Luc De Raedt (2011) Itemset mining: A constraint programming perspective, Artificial Intelligence, Elsevier BV, 175(12-13), pp. 1951–1983. https://doi.org/10.1016/j.artint.2011.05.002

[7] Wensheng Gan and Jerry Chun-Wei Lin and Philippe Fournier-Viger and Han-Chieh Chao and Justin Zhan (2017) Mining of frequent patterns with multiple minimum supports, Engineering Applications of Artificial Intelligence, Elsevier BV, 60(C), pp. 83–96. https://doi.org/10.1016/j.engappai.2017.01.009

[8] Fatima-Zahra El Mazouri and Said Jabbour and Badran Raddaoui and Lakhdar Sais and Mohammed Chaouki Abounaima and Khalid Zenkouar (2019) Breaking Symmetries in Association Rules, Procedia Computer Science, Elsevier BV, 148(C), pp. 283–290. https://www.sciencedirect.com/science/article/pii/S1877050919300523

[9] Zaki, M.J. and Hsiao, C.-J. (2005) Efficient algorithms for mining closed itemsets and their lattice structure, Transactions on Knowledge and Data Engineering, Institute of Electrical and Electronics Engineers (IEEE), 17(4), pp. 462–478. http://dx.doi.org/10.1109/tkde.2005.60

[10] Tuong Le and Bay Vo (2015) An N-list-based algorithm for mining frequent closed patterns, Expert Systems with Applications, Elsevier BV, 42(19), pp. 6648–6657. https://doi.org/10.1016/j.eswa.2015.04.048

[11] Ledmi, Makhlouf and Zidat, Samir and Hamdi-Cherif, Aboubekeur (2021) GrAFCI+: A Fast Generator-Based Algorithm for Mining Frequent Closed Itemsets, Knowledge and Information Systems, Springer Science and Business Media LLC, 63(7), pp. 1873–1908. https://doi.org/10.1007/s10115-021-01575-3

[12] Grahne, G. and Zhu, J. (2005) Fast algorithms for frequent itemset mining using FP-trees, Transactions on Knowledge and Data Engineering, Institute of Electrical and Electronics Engineers (IEEE), 17(10), pp. 1347–1362. http://dx.doi.org/10.1109/tkde.2005.166

[13] Nader Aryabarzan and Behrouz Minaei-Bidgoli (2021) NEclatClosed: A vertical algorithm for mining frequent closed itemsets, Expert Systems with Applications, Elsevier BV, 174(C), p. 114738.
https://doi.org/10.1016/j.eswa.2021.114738

[14] Gang Fang and Yue Wu and Ming Li and Jia Chen (2015) An Efficient Algorithm for Mining Frequent Closed Itemsets, Informatica (Slovenia), Slovenian Society Informatika, 39(1), pp. 87–98. https://api.semanticscholar.org/CorpusID:214751872

[15] Renata Iváncsy and István Vajk (2005) Fast Discovery of Frequent Itemsets: a Cubic Structure-Based Approach, Informatica (Slovenia), Slovenian Society Informatika, 29(1), pp. 71–78. http://www.informatica.si/index.php/informatica/article/view/19

[16] Alkenani, Jawad and Kheerallah, Yousif Abdulwahab (2023) A New Method Based on Machine Learning to Increase Efficiency in Wireless Sensor Networks, Informatica (Slovenia), Slovenian Society Informatika, 46(9), pp. 45–52. http://dx.doi.org/10.31449/inf.v46i9.4396

[17] Al-Jammali, Karrar (2023) Prediction of Heart Diseases Using Data Mining Algorithms, Informatica (Slovenia), Slovenian Association Informatika, 47(5). http://dx.doi.org/10.31449/inf.v47i5.4467

[18] Karna, Hrvoje (2020) Data Mining Approach to Effort Modeling On Agile Software Projects, Informatica (Slovenia), Slovenian Association Informatika, 44(2), pp. 231–139. http://dx.doi.org/10.31449/inf.v44i2.2759

[19] Awad, Fouad Hammadi and Hamad, Murtadha M. (2023) Big Data Clustering Techniques Challenged and Perspectives: Review, Informatica (Slovenia), Slovenian Association Informatika, 47(6), pp. 203–218. http://dx.doi.org/10.31449/inf.v47i6.4445

[20] Agrawal, Rakesh and Srikant, Ramakrishnan (1994) Fast Algorithms for Mining Association Rules in Large Databases, Proceedings of the 20th International Conference on Very Large Data Bases, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, pp. 487–499.
https://dl.acm.org/doi/10.5555/645920.672836

[21] Saïd Jabbour and Mehdi Khiari and Lakhdar Sais and Yakoub Salhi and Karim Tabia (2013) Symmetry-Based Pruning in Itemset Mining, 25th IEEE International Conference on Tools with Artificial Intelligence (ICTAI), IEEE Computer Society, Los Alamitos, CA, USA, pp. 483–490. https://doi.org/10.1109/ICTAI.2013.78

[22] Takeaki Uno and Masashi Kiyomi and Hiroki Arimura (2004) LCM ver. 2: Efficient Mining Algorithms for Frequent/Closed/Maximal Itemsets, FIMI '04, Proceedings of the IEEE ICDM Workshop on Frequent Itemset Mining Implementations, CEUR-WS.org, Brighton, UK, pp. 1–11. https://ceur-ws.org/Vol-126/uno.pdf

[23] Agrawal, Rakesh and Imieliński, Tomasz and Swami, Arun (1993) Mining association rules between sets of items in large databases, International Conference on Management of Data, Association for Computing Machinery, Washington, D.C., USA, pp. 207–216. https://doi.org/10.1145/170035.170072

[24] Yun Chi and Haixun Wang and Yu, P.S. and Muntz, R.R. (2004) Moment: Maintaining Closed Frequent Itemsets over a Stream Sliding Window, Fourth International Conference on Data Mining (ICDM'04), IEEE, Brighton, UK, pp. 59–66. http://dx.doi.org/10.1109/icdm.2004.10084

[25] Belaid, Mohamed-Bachir and Bessiere, Christian and Lazaar, Nadjib (2019) Constraint Programming for Mining Borders of Frequent Itemsets, Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-19, International Joint Conferences on Artificial Intelligence Organization, Macao, China, pp. 1064–1070. https://doi.org/10.24963/ijcai.2019/149

[26] AlZoubi, Wael (2015) An Improved Graph Based Method for Extracting Association Rules, International Journal of Software Engineering and Applications, Academy and Industry Research Collaboration Center (AIRCC), pp. 1–10.
http://dx.doi.org/10.5121/ijsea.2015.6301

[27] Raedt, Luc De and Guns, Tias and Nijssen, Siegfried (2010) Constraint programming for data mining and machine learning, Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence, AAAI Press, Atlanta, Georgia, pp. 1671–1675. https://dl.acm.org/doi/abs/10.5555/2898607.2898874

[28] Schlegel, Benjamin and Gemulla, Rainer and Lehner, Wolfgang (2011) Memory-efficient frequent-itemset mining, Proceedings of the 14th International Conference on Extending Database Technology, Association for Computing Machinery, Uppsala, Sweden, pp. 461–472. https://doi.org/10.1145/1951365.1951420

[29] Tiwari, Vivek and Tiwari, Vipin and Gupta, Shailendra and Tiwari, Renu (2010) Association rule mining: A graph based approach for mining frequent itemsets, 2010 International Conference on Networking and Information Technology, IEEE, Manila, Philippines, pp. 309–313. http://dx.doi.org/10.1109/icnit.2010.5508505

[30] Gouda, K. and Zaki, M.J. (2001) Efficiently mining maximal frequent itemsets, Proceedings 2001 IEEE International Conference on Data Mining, IEEE, San Jose, CA, USA, pp. 163–170. http://dx.doi.org/10.1109/icdm.2001.989514

[31] Han, Kyong Rok and Kim, Jae Yearn (2005) FCI-LINK: Mining Frequent Closed Itemsets Based on a Link Structure between Transactions, Journal of Information & Knowledge Management, World Scientific Pub Co Pte Lt, pp. 257–267. https://doi.org/10.1142/S0219649205001213

[32] De Raedt, Luc and Guns, Tias and Nijssen, Siegfried (2008) Constraint Programming for Itemset Mining, Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Association for Computing Machinery, Las Vegas, Nevada, USA, pp. 204–212. https://doi.org/10.1145/1401890.1401919