https://doi.org/10.31449/inf.v48i8.5480 Informatica 48 (2024) 49–62

Closed Itemset Mining: A Graph Theory Perspective

Fatima Zohra Lebbah 1,2, Moussa Larbani 3,4, Abdellatif Rahmoun 5
1 Higher School of Electrical and Energetic Engineering of Oran (ESG2E), Vicinal Road N9, Oran, 31000, Algeria
2 Laboratory of Research in Computer Science-SBA, Computational Intelligence and Soft Computing Team (CISCO), Higher School of Computer Science, BP 73, Bureau de poste EL WIAM, Sidi-Belabbes, 22016, Algeria
3 School of Mathematics and Statistics, Carleton University, 1125 Colonel By Drive, Ottawa, Ontario, K1S 5B6, Canada
4 National Higher School of Statistics and Applied Economics (ENSSEA), Pôle Universitaire de Koléa, Tipaza, Algiers, 42003, Algeria
5 Laboratory of Research in Computer Science-SBA, Computational Intelligence and Soft Computing Team (CISCO), Higher School of Computer Science, BP 73, Bureau de poste EL WIAM, Sidi-Belabbes, 22016, Algeria
E-mail: fz_lebbah@yahoo.fr, fz.lebbah@esgee-oran.dz, larbani61@hotmail.com, a.rahmoun@esi-sba.dz

Keywords: dataset, closed frequent itemset, labeled graph, maximal clique

Received: November 28, 2023

Data mining is a field that focuses on extracting and analyzing usable data from large databases. This paper concentrates on the problem of finding closed frequent itemsets, which is extensively studied in the field. Previous graph-theoretic techniques have relied on tree structures and recursive algorithms, which have limitations. In this paper, we propose a scalable modeling approach that represents a transaction dataset as an undirected, labeled graph, where the edge labels are computed and assigned so as to encode the transactions. We also introduce an exact, polynomial-time algorithm based on the clique notion in graph theory to compute all the closed frequent itemsets.
Our initial experimental results demonstrate the efficiency of our algorithm in terms of CPU time and memory usage compared to recent methods in the literature. Additionally, our graph model can easily be extended when the dataset is updated. We implement our graph with linear structures holding boolean values and employ an exact algorithm with polynomial complexity. These aspects of our approach provide strong foundations for investigating more challenging issues related to the problem at hand.

Povzetek: The paper addresses scalable modeling of transaction datasets using undirected, labeled graphs for mining closed frequent itemsets. A new algorithm with polynomial complexity, based on clique theory, is proposed.

1 Introduction

Nowadays, Data Mining is the science that allows us to work with a dataset in line with our perspectives [18, 16, 17, 19]. It involves the process of finding patterns and knowledge in a large amount of data. However, extracting the desired and specific information poses a significant challenge for many researchers. This has led to the development of various competing models and techniques, such as those in [27, 3, 2, 4, 7].

Itemset Mining is an appealing field within the realm of Data Mining, primarily because of the need to manage large-scale data. As the size of the data increases, the traditional methods and techniques used to address itemset problems tend to become slower and less efficient. Researchers are currently focused on discovering and proposing new techniques and algorithms to model and solve problems related to Itemset Mining, including computing frequent, closed, and maximal itemsets.

The process of extracting frequent sets of items bought by customers, known as finding frequent itemsets in a transaction database [23, 15, 3], has been addressed by various techniques, including Constraint Programming [32, 6, 5, 25, 8] and Graph Theory [24, 28, 29, 30, 26].
Graph theory-based approaches have already been proposed to find frequent itemsets. In this article, however, we present a new approach to the problem of Closed Frequent Itemsets (CFIs). We show that a transaction database can be effectively modeled through an undirected and labeled graph using the concept of a clique.

We introduce a clique-based algorithm for finding CFIs, which we refer to as CfCi. This approach uses linear structures and a polynomial algorithm. Specifically, we show how our graph model enables the step-by-step computation of all the CFIs.

The article is structured as follows. Section 2 introduces the basic concepts of the CFI mining problem, along with a review of related works. Section 3 presents the graph theory notions used to transform a transaction database into an undirected and labeled graph. Section 4 describes the equivalence between (closed) frequent itemsets and (maximal) cliques, and how to deduce a CFI from a maximal clique. Section 5 depicts our CfCi algorithm and the functions it calls; the complexity analysis and a worked example are also given there. In Section 6, we compare CfCi to the latest existing techniques in the literature to highlight its contribution; this section also reports experimental results on various datasets and their analysis. In Section 7, we discuss, describe, analyze, and interpret our findings. Finally, Section 8 concludes the paper.

2 Problem definition and related works

In this section, we begin by introducing the problem using a market basket example. Then, we provide basic concepts and definitions.

For instance, let's consider the market baskets illustrated in Figure 1, where we analyze five shopping baskets. The aim is to study customer shopping habits and identify associations and correlations between different items.
In this paper, we focus on computing itemsets that are frequently purchased together by customers. Specifically, we address the problem of CFIs.

Figure 1: Market basket analysis

Table 1 represents the market baskets as transactions and items. Each row corresponds to a customer transaction, and each column represents an item. The letters m, b, e, j, and s denote specific items: milk, bread, egg, jam, and sugar, respectively.

According to the table, the itemset {m, j} has a frequency of three, showing that three customers purchase both milk and jam. Similarly, the itemset {b, e} has a frequency of two, showing that two customers purchase both bread and egg.

Table 1: The itemset database, as shown on the left side of Figure 1, and its binary matrix notation, shown on the right side.

T     Itemset
t1    {m, b, e, j}
t2    {m, s, e, j}
t3    {b, e}
t4    {m, j}

      m  s  b  e  j
t1    1  0  1  1  1
t2    1  1  0  1  1
t3    0  0  1  1  0
t4    1  0  0  0  1

Therefore, an itemset database problem is defined by a set of items I = {m, s, b, e, j} and a set of transactions T = {t1, t2, t3, t4}. The relationship between these items through the transactions is represented by a boolean matrix D(n × m), called the transaction database (see Definition 1). Here, n = |T| is the number of transactions and m = |I| is the number of items.

Definition 1 (transaction database). Let T = {t1, t2, ..., tn} be the set of transactions and I = {i1, i2, ..., im} the set of items.
A transaction database is defined through the boolean database matrix D(n × m):

D[p, j] = 1 if the item i_j occurs in the transaction t_p, and 0 otherwise   (1)

where n = |T| is the number of transactions and m = |I| is the number of items.

Based on Figure 1 and Table 1, the analyzed shopping gives the following transaction database D(4 × 5), shown in Matrix 2:

D =
[ 1 0 1 1 1 ]
[ 1 1 0 1 1 ]
[ 0 0 1 1 0 ]
[ 1 0 0 0 1 ]   (2)

2.1 Frequent itemset mining

Before delving into the topic of closed frequent itemsets, it is important to introduce the concept of support, as given in Definition 2.

Definition 2 (support measure). Let D be a transaction database, I be the set of items, and T be the set of transactions. The support measure of an itemset I ⊆ I, denoted Supp(I), is defined as:

Supp(I) = Σ_{t ∈ T} Π_{i ∈ I} D[t, i]   (3)

Supp(I) is the number of transactions containing all the items belonging to I.

Definition 3 (frequent itemset mining). Let D be a transaction database and θ be the minimum support threshold, where θ > 0. An itemset I is considered frequent if its support Supp(I) is greater than or equal to θ. Conversely, I is considered infrequent if Supp(I) < θ.

For example, consider the transaction database presented in Matrix 2. The support values for I1 = {i2}, I2 = {i3, i4} and I3 = {i1, i5} are 1, 2 and 3, respectively.

Property 1 (downward closure). Every subset of a frequent itemset is also frequent. In other words,

I is frequent ⇒ ∀ I_i ⊂ I : I_i is frequent   (4)

or, equivalently:

I is infrequent ⇒ ∀ I_i ⊃ I : I_i is infrequent   (5)

Property 1 [2], also known as the Apriori principle, is related to Properties 2 and 3 [1, 21], which introduce the anti-monotonicity of the support measure and the monotonicity of the frequency property with respect to itemset inclusion, respectively.
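The support measure of Definition 2 can be made concrete with a minimal executable sketch on the database of Matrix 2 (the helper name `support` is ours, not part of the paper's algorithms):

```python
# A minimal sketch of the support measure (Definition 2) on the
# transaction database of Table 1 / Matrix 2.
def support(D, itemset):
    """Number of transactions containing every item of `itemset`
    (items are column indices of the boolean matrix D)."""
    return sum(1 for row in D if all(row[j] == 1 for j in itemset))

# Matrix 2: rows are t1..t4, columns are m, s, b, e, j.
D = [
    [1, 0, 1, 1, 1],  # t1
    [1, 1, 0, 1, 1],  # t2
    [0, 0, 1, 1, 0],  # t3
    [1, 0, 0, 0, 1],  # t4
]
m, s, b, e, j = range(5)
print(support(D, {s}))     # -> 1
print(support(D, {b, e}))  # -> 2
print(support(D, {m, j}))  # -> 3
```

This reproduces the supports 1, 2 and 3 reported above for I1 = {i2}, I2 = {i3, i4} and I3 = {i1, i5}.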
Property 2 (anti-monotonicity) [20]. Let I1 and I2 be two itemsets such that I1 ⊆ I2. It follows that Supp(I1) ≥ Supp(I2):

∀ I_i, I_j ⊆ I, I_i ⊆ I_j : Supp(I_i) ≥ Supp(I_j)   (6)

Property 3 (monotonicity). Let I1 and I2 be two itemsets such that I1 ⊆ I2. If Supp(I2) ≥ θ then Supp(I1) ≥ θ:

∀ I_i, I_j ⊆ I, I_i ⊆ I_j : Supp(I_j) ≥ θ ⇒ Supp(I_i) ≥ θ   (7)

In other words, anti-monotonicity (Property 2) implies that the subsets of a frequent itemset are also frequent, and the frequency property is monotone with respect to set inclusion, as stated in Property 3.

2.2 Closed itemset mining

A closed itemset, denoted I_c, is a frequent itemset whose support Supp(I_c) is at least the fixed minimum support threshold θ, and whose items appear together in exactly the set of transactions T_{I_c}, whose size equals Supp(I_c) (see Definition 4).

Definition 4 (closed itemset). Let I_c be a frequent itemset and Supp(I_c) its support. The itemset I_c is closed if and only if it is not included in a larger frequent itemset I_i that has the same support and appears in exactly the same transactions:

I_c is closed ⇔ ∄ I_i ⊃ I_c : I_i is frequent ∧ Supp(I_c) = Supp(I_i) ∧ T_{I_c} = T_{I_i}   (8)

where T_{I_c} and T_{I_i} are the sets of transactions containing the items of I_c and I_i, respectively.

Note that our aim is to identify all the closed itemsets. Figure 2 depicts the first part of the Hasse diagram for the database represented in Matrix 2. An edge is drawn from itemset I1 to itemset I2 if and only if I1 ⊂ I2 and |I2| = |I1| + 1. In this figure, the blue ellipses represent the frequent itemsets, while the green ellipses represent the closed itemsets. The support of each closed itemset is mentioned on the left side of the corresponding ellipse.
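Definition 4 can be checked mechanically: an itemset is closed exactly when it equals the set of items common to all transactions that contain it. A small sketch on the database of Matrix 2 (the helper names `cover` and `is_closed` are ours):

```python
# Sketch of a closedness test (Definition 4): an itemset is closed iff
# it equals the intersection of all the transactions containing it.
def cover(D, items):
    """Indices of the transactions containing every item of `items`."""
    return [t for t, row in enumerate(D) if all(row[j] for j in items)]

def is_closed(D, items, theta):
    rows = cover(D, items)
    if len(rows) < theta:          # not even frequent
        return False
    closure = {j for j in range(len(D[0]))
               if all(D[t][j] for t in rows)}
    return closure == set(items)   # no equal-support superset exists

D = [
    [1, 0, 1, 1, 1],  # t1   columns: m, s, b, e, j
    [1, 1, 0, 1, 1],  # t2
    [0, 0, 1, 1, 0],  # t3
    [1, 0, 0, 0, 1],  # t4
]
m, s, b, e, j = range(5)
print(is_closed(D, {b, e}, theta=2))  # -> True  (support 2)
print(is_closed(D, {b}, theta=2))     # -> False (every transaction with b also has e)
print(is_closed(D, {m, j}, theta=2))  # -> True  (support 3)
```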
For instance, as shown in Figure 2, if we set the minimum support threshold θ to 2, the support of {e} is 3, while the support of its superset {b, e} is 2. Similarly, the support of {m, j} is 3. This illustrates the anti-monotonicity property (Property 2): for {e} ⊂ {b, e}, the supports are 3 and 2, respectively.

2.3 Related works

So far, several algorithms have been developed for mining closed frequent itemsets, including AprioriTID Close [20], LCM [22], CHARM [9], FP-Close [12], FCILINK [31], NAFCP [10], EMFCI [14], NECLAT [13] and GrAFCI+ [11]. However, most of these methods rely on tree structures and recursive algorithms.

One primary challenge in the field of data mining is to efficiently find all the closed itemsets while optimizing CPU time and memory usage. To address this challenge, we propose a graph model based on a linear structure that minimizes memory usage. Additionally, we introduce the exact algorithm CfCi to find all the closed frequent itemsets.

To emphasize the contribution of the proposed approach, we compare the results of the CfCi algorithm with the most recent algorithms, namely NAFCP, NECLAT and GrAFCI+; this comparison is presented in Section 6. We consider not only the novelty of the chosen algorithms but also the availability of corresponding implementations and the level of detail of their results. Table 2 presents the key characteristics of the algorithms considered in the experiments section.

Table 2: Characteristics of the tested algorithms and their used structures

Approach   Used structure     Used algorithm
CfCi       linear structure   iterative, exact
NAFCP      tree structure     recursive, approximate
NECLAT     tree structure     recursive, approximate
GrAFCI+    tree structure     recursive, approximate

3 Itemset mining vs undirected graph

Firstly, we propose an undirected and labeled graph to represent a dataset.
This model is based on the idea that connecting the items appearing in the same transaction forms a clique, i.e., a sub-graph in which every pair of distinct vertices is adjacent. Therefore, each transaction is represented by a clique. However, since a bond may be shared by multiple cliques, we introduce a labeling approach that ensures that a bond shared by different cliques (as defined in Definition 9) is accurately represented in the graph model.

[Figure 2: The search space of frequent itemset mining for the database of Table 1 and θ = 2. Blue ellipses mark frequent itemsets, green ellipses mark closed itemsets, and each itemset x is annotated with its support Supp(x).]

In our graph model, we assign a binary value, called a sub-label, to each transaction. Specifically, as stated in Definition 5, the bond (j, k) that represents the occurrence of items i_j and i_k in the same transaction t_p is characterized by the sub-label L_p(j, k) = 10^(n−p−1), where n is the number of transactions.

Definition 5 (sub-label). Consider a database D(n × m) and a transaction t_p ∈ T. The occurrence of items i_j and i_k in the same transaction t_p, represented by D[p, j] = 1 and D[p, k] = 1, is modeled by the bond (j, k) labeled with the binary value L_p(j, k) = 10^(n−p−1). This sub-label is a part of the overall label L(j, k) that represents the relationship between the items i_j and i_k.

Each time the pair of items i_j and i_k appears in a transaction, a new sub-label is added to the label of the bond (i_j, i_k). In other words, the label of the bond (i_j, i_k), denoted L(i_j, i_k), is the sum of all the sub-labels associated with that bond, as expressed by Equation 9 in Definition 6.

Definition 6 (bond label). Let G = ⟨X, U, L⟩ be the graph model of a transaction database.
The label of the bond (i, j) ∈ U is given by:

L(i, j) = Σ_{t_p ∈ T} L_p(i, j)   (9)

In Definition 7, we introduce the notion of label size, denoted HW(L(i, j)), which is the number of ones, i.e., of sub-labels, included in L(i, j).

Definition 7 (label size). Let G = ⟨X, U, L⟩ be the graph model of a transaction database. The size of the label L(i, j), denoted HW(L(i, j)), is the Hamming weight of L(i, j), i.e., the number of included sub-labels.

Definition 8 formalizes the idea that the items appearing in a transaction t are represented by a clique. To differentiate one transaction (clique) from another, we assign the corresponding sub-label L_p(i, j) to each bond (i, j).

Definition 8 (transaction vs clique). Let D(n × m) be a transaction database defined on the set T of transactions and the set I of items. Suppose that the transaction t_p ∈ T includes all the items of the itemset I ⊂ I. The transaction t_p is represented by the clique C_p, where:

C_p = { (i_j, i_k) : i_j, i_k ∈ I, L_p(i_j, i_k) = 10^(n−p−1) }   (10)

In Definition 9, we depict how we model each transaction and its items by a labeled clique in the corresponding undirected and labeled graph, called the dataset graph.

Definition 9 (dataset vs labeled graph). Let D(n × m) be a transaction database, where n = |T| represents the number of transactions and m = |I| represents the number of items.
We associate with D the undirected and labeled graph G = ⟨X, U, L⟩, where:

X: the set of vertices, representing the set of items, with |X| = |I| = m;

U: the set of bonds, defined as follows: (i_j, i_k) ∈ U if ∃ t_p ∈ T : D[p, j] = 1 ∧ D[p, k] = 1, and (i_j, i_k) ∉ U otherwise, with

|U| = m + Σ_{p=1..n} Σ_{j=1..m} D[p, j] − n   (11)

L: the labeling that assigns to each bond (i_j, i_k) ∈ U the binary value:

L(i_j, i_k) = Σ_{p=1..n} D[p, j] · D[p, k] · 10^(n−p−1)   (12)

For instance, consider the itemset mining problem [6] described below:

      i1  i2  i3  i4
t1    1   0   1   1
t2    1   1   0   1
t3    0   0   1   1

In Figures 3a, 3b, and 3c, we depict the cliques generated from the transactions t1, t2, and t3, respectively. Our graph model G = ⟨X, U, L⟩ is formed by combining all the cliques extracted from D, and the corresponding adjacency matrix A is a symmetric matrix of binary-digit labels, as given in Definition 10.

Definition 10 (items adjacency matrix). Let G = ⟨X, U, L⟩ be the graph modeling the database D(n × m). The adjacency matrix A corresponding to G is defined as follows:

a_{jk} = L(i_j, i_k) if (i_j, i_k) ∈ U, and 0 otherwise   (13)

where L(i_j, i_k) = Σ_{p=1..n} L_p(i_j, i_k).

[Figure 3: Transactions vs cliques. (a) t1 is modeled by the clique C1 on {i1, i3, i4}, whose bonds carry the sub-label 100; (b) t2 by the clique C2 on {i1, i2, i4}, with sub-label 10; (c) t3 by the clique C3 on {i3, i4}, with sub-label 1.]

We illustrate through Figures 4, 5 and 6 the sub-graphs G1 = ⟨X, U, L1⟩, G2 = ⟨X, U, L2⟩ and G3 = ⟨X, U, L3⟩, which model the transactions t1, t2 and t3, respectively. To the right of each sub-graph G_p, we give the corresponding adjacency matrix L_p containing the bond sub-labels.
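The label construction of Equation 12 can be sketched directly (the function name `adjacency_labels` is ours; the dataset is the 3-transaction example above):

```python
# Sketch of the bond-label construction (Equation 12): each entry of the
# items adjacency matrix sums the sub-labels 10^(n-p-1) of the
# transactions in which both items occur.
def adjacency_labels(D):
    n, m = len(D), len(D[0])
    A = [[0] * m for _ in range(m)]
    for j in range(m):
        for k in range(m):
            A[j][k] = sum(D[p][j] * D[p][k] * 10 ** (n - p - 1)
                          for p in range(n))
    return A

# The 3-transaction example: t1 = {i1,i3,i4}, t2 = {i1,i2,i4}, t3 = {i3,i4}.
D = [
    [1, 0, 1, 1],  # t1 -> sub-label 100
    [1, 1, 0, 1],  # t2 -> sub-label 10
    [0, 0, 1, 1],  # t3 -> sub-label 1
]
A = adjacency_labels(D)
print(A[0][3])  # -> 110 (bond (i1, i4) appears in t1 and t2)
print(A[2][3])  # -> 101 (bond (i3, i4) appears in t1 and t3)
print(A[3][3])  # -> 111 (i4 occurs in all three transactions)
```

The printed values match the entries of the adjacency matrix A_D of Figure 7.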
Figure 4: G1 = ⟨X, U, L1⟩ models the transaction t1.

L1 =
[ 100 000 100 100 ]
[ 000 000 000 000 ]
[ 100 000 100 100 ]
[ 100 000 100 100 ]

Figure 5: G2 = ⟨X, U, L2⟩ models the transaction t2.

L2 =
[ 010 010 000 010 ]
[ 010 010 000 010 ]
[ 000 000 000 000 ]
[ 010 010 000 010 ]

Figure 6: G3 = ⟨X, U, L3⟩ models the transaction t3.

L3 =
[ 000 000 000 000 ]
[ 000 000 000 000 ]
[ 000 000 001 001 ]
[ 000 000 001 001 ]

As illustrated in Figure 7, according to Definition 10, the combination of the sub-graphs G1, G2 and G3 provides the labeled and undirected graph G = ⟨X, U, L⟩. The corresponding adjacency matrix A_D = L1 + L2 + L3 is given on the right side of the figure:

A_D =
[ 110 010 100 110 ]
[ 010 010 000 010 ]
[ 100 000 101 101 ]
[ 110 010 101 111 ]

Figure 7: G = ⟨X, U, L⟩ models the database of transactions t1, t2 and t3.

4 Frequent itemset vs clique

In this section, we present the utilization of the maximal clique principle in our approach to find the closed itemsets. The concept of a frequent itemset is characterized in Definition 11 using the concept of a clique in graph theory. In other words, a clique in the dataset graph G = ⟨X, U, L⟩ models a frequent itemset, where the contained vertices correspond to items.

Definition 11 (frequent itemset vs clique). Consider the adjacency matrix A(m × m), the set of items I, the set of transactions T and the minimum support θ. An itemset I is frequent if and only if its items are the vertices of a clique C in G such that all the labels of the corresponding bonds share at least θ sub-labels:

| ∩_{i,j ∈ I} SL(i, j) | ⩾ θ   (14)

where SL(i, j) is the set of the sub-labels whose sum equals A[i, j].
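The frequency test of Definition 11 can be sketched by keeping labels as fixed-width digit strings, so that each position corresponds to one transaction (the helper names are ours):

```python
# Sketch of the frequency test of Definition 11: the shared sub-labels of
# a set of bonds are the positions holding '1' in every label, and their
# count must reach the minimum support θ.
def sublabels(label, n):
    """Transaction positions encoded by the 1-digits of an n-digit label."""
    return {p for p, d in enumerate(str(label).zfill(n)) if d == '1'}

def shared_support(labels, n):
    """Size of the intersection of the labels' sub-label sets."""
    return len(set.intersection(*(sublabels(l, n) for l in labels)))

# Bond labels of the clique {i1, i3, i4} in Figure 7 (3 transactions):
# (i1,i3) = 100, (i1,i4) = 110, (i3,i4) = 101.
print(shared_support([100, 110, 101], n=3))  # -> 1: only t1 is shared
# Single bond (i1, i4) of the clique {i1, i4}:
print(shared_support([110], n=3))            # -> 2: t1 and t2
```

Hence, for θ = 2, the clique {i1, i4} passes the test while {i1, i3, i4} only survives for θ = 1.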
Figure 8: Example of an itemset mining problem with 5 items and 4 transactions.

      i1  i2  i3  i4  i5
t1    1   0   1   1   0
t2    1   1   0   1   0
t3    0   0   1   1   0
t4    0   0   0   0   1

D =
[ 1 0 1 1 0 ]
[ 1 1 0 1 0 ]
[ 0 0 1 1 0 ]
[ 0 0 0 0 1 ]

To compute the CFIs, the infrequent items (see Property 1) and the empty transactions should be removed. For instance, consider the dataset given in Figure 8. As illustrated in Figure 9, if the minimum support is set to θ = 2, the items i2 and i5 are infrequent and should be removed. Thus, the dataset D is replaced by D′ (see sub-Figure 9a), which now contains the empty transaction t4. At this point, t4 is deleted, leading to D′′ (see sub-Figure 9b):

D′ =
[ 1 1 1 ]
[ 1 0 1 ]
[ 0 1 1 ]
[ 0 0 0 ]

D′′ =
[ 1 1 1 ]
[ 1 0 1 ]
[ 0 1 1 ]

Figure 9: Removing infrequent items and empty transactions: (a) D′, (b) D′′, (c) the graph G_{D′′} = ⟨X_{D′′}, U_{D′′}, L_{D′′}⟩ on the vertices i1, i3, i4.

The graph corresponding to D′′, called G′′ = ⟨X′′, U′′, L⟩ (see sub-Figure 9c), will be the input of our proposed algorithms and will be denoted G = ⟨X, U, L⟩.

Every pair i_j, i_k ∈ I_c, where I_c is a closed itemset, is frequent, as stated in Definition 4. Therefore, projecting this concept onto the graph model G = ⟨X, U, L⟩, the label of each bond (i, j) ∈ U must contain at least θ ones. Stated differently, every bond of the graph needs to be consistent (Definition 12).

Definition 12 (consistent bond). Let G = ⟨X, U, L⟩ be the graph model of the dataset D′′. Suppose that L(i, j) is the label of the bond (i, j) and HW(L(i, j)) is the number of ones included in L(i, j). If HW(L(i, j)) ⩾ θ then (i, j) is consistent; otherwise, (i, j) is inconsistent.

Consequently, as given in Definition 13, a graph is consistent if all its bonds are consistent.

Definition 13 (consistent data mining graph).
Let G_c = ⟨X_c, U_c, L⟩ be the dataset graph. G_c is consistent if and only if all its bonds are consistent:

G_c = ⟨X_c, U_c, L⟩ is consistent ⇔ ∀ (i, j) ∈ U_c : (i, j) is consistent.   (15)

Thus, the input graph of our algorithm should be a consistent graph, whose inconsistent bonds are omitted. As illustrated in Figure 10, G_c = ⟨X_c, U_c, L_c⟩ is the consistent version of G (see Figure 7). The corresponding adjacency matrix A(3 × 3) is shown on the right side of the figure:

A =
[ 110 000 110 ]
[ 000 101 101 ]
[ 110 101 111 ]

Figure 10: The consistent graph G_c = ⟨X_c, U_c, L_c⟩ (see sub-Figure 9c).

5 CfCi exact algorithm for complete enumeration

In Definition 15, we introduce the closed itemset notion in terms of graph theory. A CFI corresponds to a maximal clique (see Definition 14) whose bonds share at least θ sub-labels.

Definition 14 (maximal clique). Let G = ⟨X, U⟩ be an undirected graph. A clique C is maximal if it cannot be extended into a larger clique. In other words, C is not a subset of a larger clique.

Definition 15 (closed itemset vs dataset graph). Let A be the adjacency matrix of the consistent graph G = ⟨X, U, L⟩. An itemset I is closed if its items are the vertices of a maximal clique CQ such that all the labels of the corresponding bonds share at least θ sub-labels.

In graph theory, finding maximal cliques is a hard problem, and it is difficult to find a polynomial algorithm that produces this kind of clique. However, in this paper we propose Algorithm 2, called CfCi. This algorithm, based on the bonds' sub-labels, is exact and has polynomial complexity.

Consider, for example, the graph given in Figure 10. The corresponding structures are given below.
D =
[ 1 1 1 ]
[ 1 0 1 ]
[ 0 1 1 ]

A =
[ 110 000 110 ]
[ 000 101 101 ]
[ 110 101 111 ]

Labs = [110, 101, 111]
supp = [2, 2, 3]
IndVect = [1, 2, 3]

Algorithm 1, with the polynomial complexity O(nbcons × (nbcons − 1)/2) (see Proposition 1), provides the CFIs corresponding to the initial labels belonging to Labs.

Proposition 1. Algorithm 1 has the worst-case complexity ≈ O((nbcons² − nbcons)/2), where nbcons is the number of consistent vertices.

Proof. The algorithm contains two nested loops. At most, nbcons vertices are treated in the outer loop and (nbcons − 1)/2 in the inner one. Thus, the complexity of BasicCFI is ≈ O((nbcons² − nbcons)/2).

Let's consider, for example, the graph G = ⟨X, U, L⟩ illustrated in Figure 11.

Figure 11: G = ⟨X, U, L⟩ models the dataset given in Figure 8 (vertices i1 to i5; the vertex and bond labels are the 4-digit labels computed by Definition 9).

We suppose that θ = 1, which makes the graph consistent (see Definition 13) and a valid input for our algorithm.

Figure 12 shows the different steps of Algorithm 1 applied to the graph. The sub-figures contain arrows in green, blue and red, corresponding to the cases where the intersection between the first label and the second generates a label equal to the first one, a label with support ⩾ θ, and a label with support < θ, respectively.

Algorithm 2, called CfCi, is the proposed algorithm, of polynomial complexity ≈ O(((l³ − l²)/2)(v − 1)), where l is the number of labels and v the number of vertices (see Proposition 2). It provides all the CFIs of the tackled dataset, since the second loop of the algorithm computes all the possible label intersections. In other words, each newly generated label whose support is ⩾ θ leads to a new CFI.
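The two label operations the pseudocode relies on, DistLabs and comp2Labs, can be sketched under our reading of the text: the "distance" counts the transaction positions where both labels hold a 1, and the intersection keeps only those positions (fixed-width digit strings stand in for the binary labels):

```python
# Sketch of the label operations used by BasicCFI and CfCi, under our
# reading of the pseudocode (these are illustrative reimplementations,
# not the authors' code).
def dist_labs(l1, l2, n):
    """Number of transaction positions where both n-digit labels hold a 1."""
    a, b = str(l1).zfill(n), str(l2).zfill(n)
    return sum(1 for x, y in zip(a, b) if x == y == '1')

def comp_2_labs(l1, l2, n):
    """Digit-wise intersection of two n-digit labels."""
    a, b = str(l1).zfill(n), str(l2).zfill(n)
    return int(''.join('1' if x == y == '1' else '0' for x, y in zip(a, b)))

# Labels from Figure 11 (4 transactions):
print(dist_labs(1100, 1110, 4))    # -> 2
print(comp_2_labs(1010, 1110, 4))  # -> 1010
print(comp_2_labs(1100, 1010, 4))  # -> 1000, the new label of Figure 13c
```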
Figure 12: CFIs generated by Algorithm 1, step by step: (a) I1 = {i5}; (b) I2 = {i2, i1, i4}; (c) I3 = {i3, i4}; (d) I4 = {i1, i4}; (e) I5 = {i4}. IndVect and SuppVect contain the vertices at the origin of the corresponding labels and their supports, respectively.

To find a new maximal clique, our algorithm calls the function NewClique (see Algorithm 3). NewClique(r) computes the vertices incident to IndVect[r] that share at least θ of the sub-labels contained in Labs[r].

Proposition 2. Algorithm 2 has the worst-case complexity ≈ O(((l³ − l²)/2)(v − 1)), where l is the number of labels and v is the number of vertices.

Proof. The algorithm contains three nested loops. Let l be the number of labels and v the number of vertices. At most, l labels are treated in the outer loop, and l and (l − 1)/2 iterations are processed in the first and second inner loops, respectively. In addition, at most v − 1 successors are handled in the function NewClique. Thus, CfCi's complexity is ≈ O(((l³ − l²)/2)(v − 1)).

Algorithm 1 BasicCFI(M)
Require: θ: threshold; M: adjacency matrix of the cleaned graph; nbcons: number of consistent vertices; ConsItem: vector of consistent vertices
1: VertexSet ← ∅
2: for i = 0..nbcons do  ▷ the vertices are covered, according to their labels, from the smallest to the biggest one
3:   VertexSet ← VertexSet ∪ {i}
4:   IndVect ← IndVect ∪ {i}; SuppVect ← SuppVect ∪ {supp[i]}
5:   for j = i + 1..nbcons do
6:     if DistLabs(i, j) = supp[i] then  ▷ DistLabs returns the distance between the labels of i and j
7:       VertexSet ← VertexSet ∪ {j}
8:     end if
9:   end for
10:  if VertexSet ≠ ∅ then
11:    FCI ← FCI ∪ {VertexSet}
12:  end if
13:  VertexSet ← ∅
14: end for
15: return FCI

Algorithm 2 CfCi()
Require: θ: threshold; M: adjacency matrix of the cleaned graph; nbcons: number of consistent vertices; ConsItem: vector of consistent vertices; Labs: set of the labels generated by the BasicCFI algorithm
1: beg ← 0; end ← |Labs|; e ← |Labs|
2: for k = 0..|Labs| do
3:   bg ← beg[k]; ed ← end[k]
4:   for i = bg..ed do
5:     b ← 0
6:     for j = i + 1..ed do
7:       cl ← 0; dst ← DistLabs(Labs[i], Labs[j])
8:       if dst ≠ SuppVect[i] ∧ dst ≥ θ then
9:         cl ← comp2Labs(Labs[i], Labs[j])  ▷ assign to cl the intersection of the labels i and j
10:        if cl ∉ Labs then
11:          Labs ← Labs ∪ {cl}; IndVect ← IndVect ∪ {i}
12:          SuppVect ← SuppVect ∪ {dst}
13:          VertexSet ← NewClique(|Labs| − 1)
14:          if VertexSet ≠ ∅ then
15:            VertexSet ← VertexSet ∪ {ind}; FCI ← FCI ∪ {VertexSet}
16:            VertexSet ← ∅
17:          end if
18:          b ← b + 1
19:        end if
20:      end if
21:    end for
22:    if b ≥ 1 then
23:      beg ← beg + e; end ← end + ed + b; e ← e + b
24:    end if
25:  end for
26: end for

In Figure 13, we illustrate the evolution of the used structures Labs, FCI, IndVect and SuppVect. As given in the proof of Proposition 2, Algorithm 2 executes nbcons outer loops, where nbcons is the number of consistent vertices. Thus, since nbcons = 5, our algorithm executes five outer loops. Sub-Figures 13a, 13b, 13c, 13d and 13e illustrate the details of the different loops. We use a green arrow to express that the generated label is new and consistent, a blue arrow to show that the generated label already exists, and a red arrow when the support of the generated label is less than θ.
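The first pass (Algorithm 1) can be sketched executably under our reading of the pseudocode: vertex j joins vertex i's itemset when their labels share exactly supp[i] ones, i.e. when label_i is contained in label_j, with vertices sorted by increasing label as in Figure 12 (this is an illustrative reconstruction, not the authors' implementation):

```python
# Executable sketch of the first pass (Algorithm 1, BasicCFI) on the
# vertex labels of Figure 11; helper names are ours.
def hw(label, n):
    """Hamming weight of an n-digit label (Definition 7)."""
    return str(label).zfill(n).count('1')

def dist_labs(l1, l2, n):
    """Number of positions where both n-digit labels hold a 1."""
    a, b = str(l1).zfill(n), str(l2).zfill(n)
    return sum(1 for x, y in zip(a, b) if x == y == '1')

def basic_cfi(vertices, labels, n):
    order = sorted(range(len(labels)), key=lambda i: labels[i])
    fci = []
    for pos, i in enumerate(order):
        group = [vertices[i]]
        for k in order[pos + 1:]:
            if dist_labs(labels[i], labels[k], n) == hw(labels[i], n):
                group.append(vertices[k])
        fci.append(group)
    return fci

# Vertex labels of Figure 11 (4 transactions): i1..i5.
vertices = ['i1', 'i2', 'i3', 'i4', 'i5']
labels = [1100, 100, 1010, 1110, 1]
for itemset in basic_cfi(vertices, labels, 4):
    print(itemset)
```

The printed itemsets are [i5], [i2, i1, i4], [i3, i4], [i1, i4] and [i4], matching sub-Figures 12a to 12e.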
Algorithm 3 NewClique
1: function NewClique(r)
2:   VertexSet ← ∅
3:   ind ← IndVect[r]
4:   for all v ∈ succ[ind] do
5:     dst ← DistLabs(r, v)
6:     if dst = SuppVect[r] then
7:       VertexSet ← VertexSet ∪ {v}
8:     end if
9:   end for
10:  return VertexSet
11: end function

Figure 13: CFIs generated by Algorithm CfCi: (a) no new label is generated; (b) no new label is generated; (c) the new label 1000 is generated, giving I6 = {i3, i1, i4}; (d) no new label is generated; (e) no new label is generated. IndVect and SuppVect contain the vertices at the origin of the corresponding labels and their supports, respectively.

Thus, Algorithm 2, applied to the graph, provides six maximal cliques, as given in Figure 14. Each clique C_i induces a closed itemset I_i. The maximal cliques resulting from our algorithm, C1, C2, C3, C4, C5 and C6, are illustrated in sub-Figures 14a to 14f, respectively.
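As an independent cross-check of the six CFIs (not the authors' CfCi implementation), one can enumerate closed frequent itemsets by brute force, using the standard fact that a closed itemset equals the intersection of all transactions containing it:

```python
from itertools import combinations

# Brute-force cross-check: a closed frequent itemset equals the
# intersection of all transactions containing it. Applied to the
# dataset of Figure 8 with θ = 1.
def closed_frequent_itemsets(D, theta):
    m = len(D[0])
    result = set()
    for r in range(1, m + 1):
        for items in combinations(range(m), r):
            rows = [row for row in D if all(row[j] for j in items)]
            if len(rows) < theta:
                continue  # infrequent: skip
            closure = frozenset(j for j in range(m)
                                if all(row[j] for row in rows))
            result.add(closure)
    return result

D = [
    [1, 0, 1, 1, 0],  # t1
    [1, 1, 0, 1, 0],  # t2
    [0, 0, 1, 1, 0],  # t3
    [0, 0, 0, 0, 1],  # t4
]
cfis = closed_frequent_itemsets(D, theta=1)
print(len(cfis))  # -> 6
for I in sorted(cfis, key=sorted):
    print(sorted(j + 1 for j in I))  # 1-based item indices
```

The six itemsets found, {i5}, {i2, i1, i4}, {i3, i4}, {i1, i4}, {i4} and {i3, i1, i4}, coincide with I1 to I6 above.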
Figure 14: Maximal cliques vs CFIs: (a) C1 models the CFI I1 = {i5}; (b) C2 models I2 = {i2, i1, i4}; (c) C3 models I3 = {i3, i4}; (d) C4 models I4 = {i1, i4}; (e) C5 models I5 = {i4}; (f) C6 models I6 = {i3, i1, i4}.

6 Experiments

As stated in Section 2.3, besides CfCi, we consider in our experiments the algorithms NAFCP [10], NECLAT [13] and GrAFCI+ [11]. In addition, to obtain credible results, we chose the datasets according to their varied densities. More precisely, the chosen datasets Mushroom, Lymph and Soybean (downloaded from https://dtai.cs.kuleuven.be/CP4IM/datasets/) have densities of 18%, 40% and 32%, respectively (see Table 3).

In Table 3, we introduce the characteristics of each tested dataset DName(nbItem, nbTr, dns), where DName is the dataset's name, and nbItem, nbTr and dns are, respectively, the corresponding number of items, number of transactions and density. The information given in the table is:
– freq: the frequency adopted by the algorithm,
– θ: the support corresponding to freq,
– dns: the density of the dataset according to freq,
Mushroom(119, 8124, 18%)
freq    θ       dns    nvr    nbd
5%      406     30%    67     1168
25%     2031    51%    31     229
50%     4062    74%    12     52
80%     6499    94%    5      14

Lymph(68, 148, 40%)
freq    θ       dns    nvr    nbd
5%      7       45%    59     1369
25%     37      60%    40     549
50%     74      76%    24     193
80%     118     91%    11     58

Soybean(50, 630, 32%)
freq    θ       dns    nvr    nbd
5%      32      36%    44     601
25%     158     56%    23     157
50%     315     75%    12     55
80%     504     93%    5      13

The results provided by the tested algorithms are analyzed with respect to CPU-time, the number of computed CFIs (denoted ncl) and memory usage (denoted Mem). The computer used has the following configuration: Linux 20.04.1 on a notebook with an Intel Core i7 and 8 GB of RAM. The experiments are implemented in Java.

In sub-Figures 15a, 15b and 15c, we present the behavior of CfCi, NAFCP, NECLAT and GrAFCI+ applied to Mushroom, in terms of CPU-time, ncl (the number of closed itemsets) and Mem (the memory usage), respectively. We notice, via sub-Figure 15a, that the applied methods have approximately the same CPU time down to the frequency of 25%, where CfCi and GrAFCI+ become slower. At a frequency of 5%, CfCi and NECLAT are the fastest, whereas NAFCP is considerably the slowest. Sub-Figure 15b shows that, at frequencies of 80%, 50% and 25%, all the methods provide the same number of CFIs, whereas at a frequency of 5%, our algorithm provides more CFIs than NECLAT and GrAFCI+, and fewer than NAFCP. In terms of memory usage, sub-Figure 15c highlights the efficiency of our algorithm compared to the others.

[Figure 15: MUSHROOM dataset. (a) CPU-time; (b) number of CFIs; (c) memory usage.]

In Figure 16, we present three diagrams that show the behavior of the adopted methods, applied to the Lymph dataset, in terms of CPU-time, number of CFIs and memory usage. As shown in sub-Figure 16a, at frequencies of 80% and 50%, all the methods are equally fast, whereas CfCi becomes faster at a frequency of 25%.
In addition, we notice that at a frequency of 5%, NAFCP becomes the slowest, NECLAT the fastest, and CfCi comes close to NECLAT.

[Figure 16: LYMPH dataset. (a) CPU-time; (b) number of CFIs; (c) memory usage.]

Sub-Figure 16b shows, in terms of the number of CFIs, that at frequencies of 80%, 50% and 25%, all the methods provide the same results except GrAFCI+, which gives more CFIs at a frequency of 80%. At a frequency of 5%, the number of CFIs provided differs from algorithm to algorithm, and our algorithm provides more CFIs than NAFCP and NECLAT and fewer than GrAFCI+.

Figure 17 contains SoyBean's diagrams, which reveal the behavior of the tested methods in terms of CPU-time, number of CFIs and memory usage.

[Figure 17: SOYBEAN dataset. (a) CPU-time; (b) number of CFIs; (c) memory usage.]

We notice through sub-Figure 17a that CfCi and NECLAT start as the fastest at freq = 80%; then NECLAT remains the most efficient. On the other hand, as the frequency decreases, CfCi and GrAFCI+ become slower. Regarding the number of CFIs, all the methods provide the same number at each frequency, except GrAFCI+, which gives one more CFI.

Sub-Figure 17c shows the efficiency of our algorithm in terms of memory usage, contrary to GrAFCI+ and NECLAT, which are really costly.

As stated in Section 5, our algorithm is a complete approach designed to provide the exact number of CFIs.
We observed through the diagrams given above that NAFCP, GrAFCI+ and NECLAT may yield fewer or more CFIs in certain situations. In addition, our algorithm is the most efficient in terms of memory usage for all datasets, except Lymph with freq = 5%. Thus, CfCi has average CPU-time results and the best results in terms of the number of CFIs and memory usage.

We have to point out the case where CfCi, applied to Lymph, provides its worst results in terms of memory usage. This result can be explained by the size of the corresponding graph: according to Table 3, the graph that models the Lymph dataset has the largest values, namely 118 vertices and 91 bonds, compared to the others.

7 Discussion

There are advantages and disadvantages to each of the previous data mining techniques. Most of them are based on tree structures (see Table 2), namely the NList, Gr-tree and FP-tree in NAFCP, GrAFCI+ and NECLAT, respectively. This is because of the recursive relationship indirectly expressed in Property 1.

In addition, the authors of the tested techniques proposed approximate approaches and employed recursive algorithms to construct and explore the corresponding tree structures. Using recursive structures and approximate algorithms in most SOTA approaches means that memory usage is costly and that finding all closed itemsets is not guaranteed. Moreover, updating tree structures when the corresponding dataset is modified is generally a complex task.

In this article, we developed a new boolean, linear structure that can be easily updated when necessary, together with iterative algorithms, in order to optimize memory usage. Since our algorithm CfCi is exact and polynomial, it provides all the CFIs and shows good CPU-time. In certain instances, when the frequency is 5%, CfCi slows down compared to NAFCP, GrAFCI+ and NECLAT. This is because of the step that verifies that a computed CFI does not already belong to the current set of CFIs.
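The duplicate-verification step mentioned above can be sketched as a membership test over canonical itemset representations. This is an illustrative sketch only (the paper does not specify the actual data structure used): hashing a frozenset makes each check constant time on average, at the price of keeping all generated CFIs in memory.

```python
def add_cfi(cfi_set, itemset):
    """Add a candidate closed itemset only if it was not generated before.
    Canonicalizing to a frozenset makes the order of items irrelevant."""
    key = frozenset(itemset)
    if key in cfi_set:
        return False  # duplicate: already produced from another clique
    cfi_set.add(key)
    return True

cfis = set()
print(add_cfi(cfis, ["i1", "i4"]))  # → True  (new CFI, kept)
print(add_cfi(cfis, ["i4", "i1"]))  # → False (same itemset, rejected)
```

Even with constant-time membership tests, every generated clique pays this check, which is consistent with the slowdown observed at freq = 5%, where the number of candidates is largest.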
Most of the time, the number of CFIs, ncl, provided by our algorithm is the same as for the other techniques. Sometimes we get fewer or more CFIs, particularly when freq = 5%. Our algorithm computes all the intersections between the bonds' labels and prevents duplicated CFIs. This reveals the incompleteness of the algorithms NAFCP, NECLAT and GrAFCI+ compared to CfCi. Our approach is the best choice for optimizing memory usage, except on the Lymph dataset when freq = 5%.

8 Conclusion and future works

This paper has introduced new modeling and solving approaches to find all the CFIs. We have presented an efficient graph modeling approach based on the clique notion in graph theory. Implementing our model using a linear structure makes our graph model more flexible to update when transactions or items are added or removed. Our model is based on the fact that a frequent itemset corresponds to a clique in an undirected graph. Thus, CfCi, a new algorithm based on the maximal-clique principle, has been introduced to tackle the closed itemset mining problem.

Our first experiments have shown the efficiency of CfCi in finding all the CFIs compared to recent algorithms in the literature. This is because the proposed algorithm uses boolean structures and is designed to find all CFIs through systematic searches. Thus, our algorithm is more efficient than the tested methods in terms of the number of CFIs and memory usage, and approximately as efficient in terms of CPU time.

To summarize, the structure used in our approach is linear, and the proposed algorithm is iterative, exact, and polynomial. That is the origin of the promising results presented and analyzed in this article. However, there are some cases where our approach is less efficient than the newest methods, particularly in terms of CPU time. This is because of the CfCi step that handles duplicate CFIs.
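The clique correspondence underlying the approach can be illustrated on a toy transaction set: the items of a frequent itemset pairwise co-occur, so their vertices are pairwise adjacent in a co-occurrence graph. A minimal sketch, assuming a plain co-occurrence graph (the graph construction is illustrative, not the paper's exact labeling scheme):

```python
from itertools import combinations

transactions = [{"i1", "i4"}, {"i1", "i3", "i4"}, {"i2", "i1", "i4"}, {"i5"}]

def cooccurrence_edges(transactions, min_supp=1):
    """Edge (a, b) exists when a and b appear together in >= min_supp transactions."""
    edges = set()
    items = set().union(*transactions)
    for a, b in combinations(sorted(items), 2):
        supp = sum(1 for t in transactions if {a, b} <= t)
        if supp >= min_supp:
            edges.add((a, b))
    return edges

def is_clique(itemset, edges):
    """An itemset is a clique iff every pair of its items is adjacent."""
    return all((min(a, b), max(a, b)) in edges for a, b in combinations(itemset, 2))

edges = cooccurrence_edges(transactions)
print(is_clique({"i1", "i3", "i4"}, edges))  # → True
```

Here {i1, i3, i4} co-occurs pairwise (all three pairs share transaction support), so it forms a clique, whereas {i2, i3} never co-occurs and therefore induces no edge.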
As with any research work, a new version that inherits the advantages and avoids the disadvantages of our approach can follow. Exploring opportunities for the practical implementation of the proposed algorithm will be the future direction of our research. We consider adapting our dataset graph model and the CfCi algorithm to tackle classification and clustering problems.

References

[1] Fournier-Viger, Philippe and Chun-Wei Lin, Jerry and Truong-Chi, Tin and Nkambou, Roger (2019) A Survey of High Utility Itemset Mining, High-Utility Pattern Mining: Theory, Algorithms and Applications, Springer International Publishing, pp. 1–45. https://doi.org/10.1007/978-3-030-04921-8_1

[2] Charu C. Aggarwal (2015) Data Mining: The Textbook, Springer International Publishing. https://doi.org/10.1007/978-3-319-14142-8

[3] Han, Jiawei and Kamber, Micheline and Pei, Jian (2011) Data Mining: Concepts and Techniques, Morgan Kaufmann. http://www.sciencedirect.com/science/book/9780123814791

[4] Truong-Chi, Tin and Fournier-Viger, Philippe (2019) A Survey of High Utility Sequential Pattern Mining, High-Utility Pattern Mining, Springer International Publishing, pp. 97–129. http://dx.doi.org/10.1007/978-3-030-04921-8_4

[5] Carson Kai-Sang Leung (2009) Frequent Itemset Mining with Constraints, Encyclopedia of Database Systems, Springer International Publishing, pp. 1179–1183. https://doi.org/10.1007/978-0-387-39940-9_170

[6] Tias Guns and Siegfried Nijssen and Luc De Raedt (2011) Itemset mining: A constraint programming perspective, Artificial Intelligence, Elsevier BV, 175(12-13), pp. 1951–1983. https://doi.org/10.1016/j.artint.2011.05.002

[7] Wensheng Gan and Jerry Chun-Wei Lin and Philippe Fournier-Viger and Han-Chieh Chao and Justin Zhan (2017) Mining of frequent patterns with multiple minimum supports, Engineering Applications of Artificial Intelligence, Elsevier BV, 60(C), pp. 83–96. https://doi.org/10.1016/j.engappai.2017.01.009

[8] Fatima-Zahra El Mazouri and Said Jabbour and Badran Raddaoui and Lakhdar Sais and Mohammed Chaouki Abounaima and Khalid Zenkouar (2019) Breaking Symmetries in Association Rules, Procedia Computer Science, Elsevier BV, 148(C), pp. 283–290. https://www.sciencedirect.com/science/article/pii/S1877050919300523

[9] Zaki, M.J. and Hsiao, C.-J. (2005) Efficient algorithms for mining closed itemsets and their lattice structure, Transactions on Knowledge and Data Engineering, Institute of Electrical and Electronics Engineers (IEEE), 17(4), pp. 462–478. http://dx.doi.org/10.1109/tkde.2005.60

[10] Tuong Le and Bay Vo (2015) An N-list-based algorithm for mining frequent closed patterns, Expert Systems with Applications, Elsevier BV, 42(19), pp. 6648–6657. https://doi.org/10.1016/j.eswa.2015.04.048

[11] Ledmi, Makhlouf and Zidat, Samir and Hamdi-Cherif, Aboubekeur (2021) GrAFCI+: A Fast Generator-Based Algorithm for Mining Frequent Closed Itemsets, Knowledge and Information Systems, Springer Science and Business Media LLC, 63(7), pp. 1873–1908. https://doi.org/10.1007/s10115-021-01575-3

[12] Grahne, G. and Zhu, J. (2005) Fast algorithms for frequent itemset mining using FP-trees, Transactions on Knowledge and Data Engineering, Institute of Electrical and Electronics Engineers (IEEE), 17(10), pp. 1347–1362. http://dx.doi.org/10.1109/tkde.2005.166

[13] Nader Aryabarzan and Behrouz Minaei-Bidgoli (2021) NEclatClosed: A vertical algorithm for mining frequent closed itemsets, Expert Systems with Applications, Elsevier BV, 174(C), p. 114738.
https://doi.org/10.1016/j.eswa.2021.114738

[14] Gang Fang and Yue Wu and Ming Li and Jia Chen (2015) An Efficient Algorithm for Mining Frequent Closed Itemsets, Informatica (Slovenia), Slovenian Society Informatika, 39(1), pp. 87–98. https://api.semanticscholar.org/CorpusID:214751872

[15] Renata Iváncsy and István Vajk (2005) Fast Discovery of Frequent Itemsets: a Cubic Structure-Based Approach, Informatica (Slovenia), Slovenian Society Informatika, 29(1), pp. 71–78. http://www.informatica.si/index.php/informatica/article/view/19

[16] Alkenani, Jawad and Kheerallah, Yousif Abdulwahab (2023) A New Method Based on Machine Learning to Increase Efficiency in Wireless Sensor Networks, Informatica (Slovenia), Slovenian Society Informatika, 46(9), pp. 45–52. http://dx.doi.org/10.31449/inf.v46i9.4396

[17] Al-Jammali, Karrar (2023) Prediction of Heart Diseases Using Data Mining Algorithms, Informatica (Slovenia), Slovenian Association Informatika, 47(5). http://dx.doi.org/10.31449/inf.v47i5.4467

[18] Karna, Hrvoje (2020) Data Mining Approach to Effort Modeling On Agile Software Projects, Informatica (Slovenia), Slovenian Association Informatika, 44(2), pp. 231–139. http://dx.doi.org/10.31449/inf.v44i2.2759

[19] Awad, Fouad Hammadi and Hamad, Murtadha M. (2023) Big Data Clustering Techniques Challenged and Perspectives: Review, Informatica (Slovenia), Slovenian Association Informatika, 47(6), pp. 203–218. http://dx.doi.org/10.31449/inf.v47i6.4445

[20] Agrawal, Rakesh and Srikant, Ramakrishnan (1994) Fast Algorithms for Mining Association Rules in Large Databases, Proceedings of the 20th International Conference on Very Large Data Bases, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, pp. 487–499.
https://dl.acm.org/doi/10.5555/645920.672836

[21] Saïd Jabbour and Mehdi Khiari and Lakhdar Sais and Yakoub Salhi and Karim Tabia (2013) Symmetry-Based Pruning in Itemset Mining, 25th IEEE International Conference on Tools with Artificial Intelligence (ICTAI), IEEE Computer Society, Los Alamitos, CA, USA, pp. 483–490. https://doi.org/10.1109/ICTAI.2013.78

[22] Takeaki Uno and Masashi Kiyomi and Hiroki Arimura (2004) LCM ver. 2: Efficient Mining Algorithms for Frequent/Closed/Maximal Itemsets, FIMI '04, Proceedings of the IEEE ICDM Workshop on Frequent Itemset Mining Implementations, CEUR-WS.org, Brighton, UK, pp. 1–11. https://ceur-ws.org/Vol-126/uno.pdf

[23] Agrawal, Rakesh and Imieliński, Tomasz and Swami, Arun (1993) Mining association rules between sets of items in large databases, International Conference on Management of Data, Association for Computing Machinery, Washington, D.C., USA, pp. 207–216. https://doi.org/10.1145/170035.170072

[24] Yun Chi and Haixun Wang and Yu, P.S. and Muntz, R.R. (2004) Moment: Maintaining Closed Frequent Itemsets over a Stream Sliding Window, Fourth International Conference on Data Mining (ICDM'04), IEEE, Brighton, UK, pp. 59–66. http://dx.doi.org/10.1109/icdm.2004.10084

[25] Belaid, Mohamed-Bachir and Bessiere, Christian and Lazaar, Nadjib (2019) Constraint Programming for Mining Borders of Frequent Itemsets, Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-19, International Joint Conferences on Artificial Intelligence Organization, Macao, China, pp. 1064–1070. https://doi.org/10.24963/ijcai.2019/149

[26] AlZoubi, Wael (2015) An Improved Graph Based Method for Extracting Association Rules, International Journal of Software Engineering and Applications, Academy and Industry Research Collaboration Center (AIRCC), pp. 1–10.
http://dx.doi.org/10.5121/ijsea.2015.6301

[27] Raedt, Luc De and Guns, Tias and Nijssen, Siegfried (2010) Constraint programming for data mining and machine learning, Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence, AAAI Press, Atlanta, Georgia, pp. 1671–1675. https://dl.acm.org/doi/abs/10.5555/2898607.2898874

[28] Schlegel, Benjamin and Gemulla, Rainer and Lehner, Wolfgang (2011) Memory-efficient frequent-itemset mining, Proceedings of the 14th International Conference on Extending Database Technology, Association for Computing Machinery, Uppsala, Sweden, pp. 461–472. https://doi.org/10.1145/1951365.1951420

[29] Tiwari, Vivek and Tiwari, Vipin and Gupta, Shailendra and Tiwari, Renu (2010) Association rule mining: A graph based approach for mining frequent itemsets, 2010 International Conference on Networking and Information Technology, IEEE, Manila, Philippines, pp. 309–313. http://dx.doi.org/10.1109/icnit.2010.5508505

[30] Gouda, K. and Zaki, M.J. (2001) Efficiently mining maximal frequent itemsets, Proceedings 2001 IEEE International Conference on Data Mining, IEEE, San Jose, CA, USA, pp. 163–170. http://dx.doi.org/10.1109/icdm.2001.989514

[31] Han, Kyong Rok and Kim, Jae Yearn (2005) FCI-LINK: Mining Frequent Closed Itemsets Based on a Link Structure between Transactions, Journal of Information & Knowledge Management, World Scientific Pub Co Pte Lt, pp. 257–267. https://doi.org/10.1142/S0219649205001213

[32] De Raedt, Luc and Guns, Tias and Nijssen, Siegfried (2008) Constraint Programming for Itemset Mining, Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Association for Computing Machinery, Las Vegas, Nevada, USA, pp. 204–212. https://doi.org/10.1145/1401890.1401919