Advances in Methodology and Statistics / Metodološki zvezki, Vol. 18, No. 1, 2021, 17–30
https://doi.org/10.51936/kcpy2449

Comparing different methods for one-mode homogeneity blockmodeling according to structural equivalence on binary networks

Aleš Žiberna
University of Ljubljana, Faculty of Social Sciences, Ljubljana, Slovenia

Abstract

One-mode homogeneity blockmodeling is an approach to clustering networks that searches for partitions of units in a network so that the resulting blocks are as homogeneous as possible. A block is a part of the network that contains (possible) ties from the units of one cluster to the units of another cluster (or ties within a cluster). Typically, the sum of squared deviations from the mean is taken as the measure of variability (non-homogeneity). The paper presents the results of a simulation study that applied several methods for this problem to binary networks generated according to structural equivalence. Several versions of homogeneity generalized blockmodeling (using a relocation algorithm), a k-means-based algorithm, and an indirect approach are compared. Since all of the methods being compared try to optimize the same criterion function, this and the Adjusted Rand Index are the main criteria for the comparison. All methods (except the indirect approach, which is not iterative) were given the same amount of time to find the best possible solution. The overall conclusion is that the k-means approach is advised in most cases, except when smaller networks (200 units) are being partitioned into a larger number of clusters, in which case homogeneity generalized blockmodeling is preferred.

Keywords: blockmodeling, sum of squares, generalized blockmodeling, k-means approach, indirect approach

1. Introduction

One-mode homogeneity blockmodeling (Žiberna, 2007) is an approach for clustering networks. It searches for partitions of units in a network so that the resulting blocks, namely, the parts of the network that contain (possible) ties from the units of one group to the units of another group (or ties within a group), are as homogeneous as possible (where some measure of their variability is as low as possible).
∗ Corresponding author. Email address: ales.ziberna@fdv.uni-lj.si (Aleš Žiberna). ORCID iD: 0000-0003-1534-6971 (Aleš Žiberna).

Typically, the sum of squared deviations from the mean (sum of squares for short) is taken as the measure of variability; only this measure is considered here, since only then can k-means-like (Žiberna, 2020) approaches be used. The aim of this paper is to investigate which strategies are best for finding the optimal partition according to this criterion. Here, k-means and generalized blockmodeling approaches are considered, together with various strategies for certain aspects of the relocation algorithm used within generalized blockmodeling and the possibility of combining the two approaches. In addition, an indirect approach to blockmodeling is examined. Results of a similar simulation to the one presented here, but based on linked networks and without the indirect approach, have already been presented (Žiberna, 2020). However, given that most users of homogeneity blockmodeling approaches analyze one-mode networks, I believe the present results are more relevant for most cases, while they also incorporate the use of an indirect approach. In this paper, only "simple" one-mode binary directed networks without loops (self-ties) are examined, that is, those which contain only one set of units and only one directed relation.

2. Blockmodeling criteria and approaches

Blockmodeling is a method for clustering units in a network. Its aim is to partition network units into clusters and, at the same time, to partition the set of ties into blocks (Doreian et al., 2005, p. 29). Blockmodeling may also be "[v]iewed as a method of data reduction, [...] a valuable technique in which redundant elements in an observed system are reduced to yield a simplified model of relationships among types of elements (units)" (Borgatti & Everett, 1992). Several approaches to blockmodeling exist. One possible division distinguishes stochastic approaches, namely stochastic blockmodeling (Anderson et al., 1992; Holland et al., 1983; Snijders & Nowicki, 1997), from deterministic approaches.
Only selected deterministic approaches are considered in this text because the aim is to only look at approaches that minimize the sum of squares within blocks. Deterministic approaches may be further split into indirect approaches¹ (e.g. Breiger et al., 1975; Burt, 1976; see Doreian et al., 2005, pp. 25–26 for a definition) and direct approaches. In this paper, one indirect and two direct approaches are examined. The indirect approach used relies on the corrected squared Euclidean distance (Batagelj et al., 1992) and Ward's hierarchical clustering (Ward, 1963). The direct approaches used are generalized blockmodeling (Doreian et al., 1994, 2005; Žiberna, 2007) and k-means-based blockmodeling approaches (Brusco & Doreian, 2015a, 2015b; Žiberna, 2020). In the subsections below, the sum of squares criterion is presented first, followed by the approaches being considered.

2.1. Sum of squares criterion

In this paper, the methods aim to minimize the within-block sum of squared deviations from the mean. They thereby aim to obtain the partition that induces blocks whose values are as homogeneous as possible.

For the binary networks under study here, ideal blocks (those leading to a value of 0 of the criterion function) include complete blocks (all possible ties are present) and null blocks (no ties are present). However, such situations are rarely found in real networks unless the number of clusters is large relative to the number of units. Therefore, for binary networks, this criterion essentially means we are looking for blocks where the densities of the rows, and likewise of the columns, are as similar as possible to each other.

More precisely, the criterion function that is optimized is

f(P) = ∑_{k=1}^K ∑_{l=1}^K v_kl,

where v_kl is the sum of squared deviations from the mean for block kl. The notation is explained in Table 1.
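As a concrete illustration, the criterion can be computed directly from a binary network and a partition. The following is a minimal Python/NumPy sketch written for this summary (the function name blockmodeling_error is my own, not taken from the paper's software); it excludes the diagonal, matching the no-loops setting of the paper:

```python
import numpy as np

def blockmodeling_error(net, clu, k):
    """Sum of squares criterion f(P): for every block (pair of clusters),
    add the squared deviations of its tie values from the block mean.
    Diagonal cells (loops/self-ties) are excluded."""
    off_diag = ~np.eye(len(net), dtype=bool)   # mask that drops self-ties
    total = 0.0
    for r in range(k):
        for c in range(k):
            block = np.ix_(clu == r, clu == c)
            vals = net[block][off_diag[block]]  # block values without the diagonal
            if vals.size:
                total += ((vals - vals.mean()) ** 2).sum()
    return total
```

For an ideal structure (only complete and null blocks) the criterion is 0; for example, a single tie in an otherwise null block with 4 cells contributes (1 − 1/4)² + 3 · (1/4)² = 0.75.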
Table 1: Nomenclature

n — number of units
U — a set of n units
X — a real or binary matrix with elements x_ij and with dimensions n × n
K — number of clusters
P — a partition of all units, P = {C_1, ..., C_K}, ∪_{i=1}^K C_i = U
Π — a set of all possible partitions
x̄ — the grand mean of X, x̄ = (∑_{i=1}^n ∑_{j=1, j≠i}^n x_ij) / (n(n−1)), assuming no loops (self-ties)
X̄ — a matrix with dimensions K × K with elements x̄_kl representing the mean of the block from cluster C_k to cluster C_l, where x̄_kl = (∑_{i∈C_k} ∑_{j∈C_l \ {i}} x_ij) / (|C_k| · (|C_l| − δ(k,l)))
δ(k,l) — a function that returns 1 if its arguments are equal, and 0 otherwise
V — a matrix with dimensions K × K with elements v_kl = ∑_{i∈C_k} ∑_{j∈C_l \ {i}} (x_ij − x̄_kl)², the within-block sum of squares for the block from cluster C_k to cluster C_l

2.2. Indirect blockmodeling

As mentioned, the indirect approach used is Ward's hierarchical clustering (Ward, 1963) applied to the corrected squared Euclidean distance (Batagelj et al., 1992) since, at least on classical data, this combination is often seen as a (usually inferior) alternative to homogeneity approaches based on local optimization of sum of squares criteria, e.g. the k-means approach (Steinley, 2003). Some other combinations of distance (using the "unsquared" corrected Euclidean distance) and other hierarchical methods (complete, single linkage) are also used for comparison purposes.

This approach is not the focus of this research, but is included to also check whether, in the case of homogeneity criteria, the partitions obtained from indirect approaches are inferior to those from direct ones, as was observed for the criterion function counting the number of inconsistencies (Batagelj et al., 1992).

2.3. Generalized blockmodeling

It was noted that the first direct approach to be used is generalized blockmodeling (Doreian et al., 2005). This is a very flexible approach to blockmodeling, as it can not only cluster units according to one type of equivalence, such as structural (Lorrain & White, 1971) or regular (White & Reitz, 1983) equivalence, but can be used with any equivalence that can be expressed by specifying a set of allowed block types.
While initially designed only for binary one-mode networks (Doreian et al., 1994), it was later extended to various other network types (Doreian et al., 2004; Doreian & Mrvar, 2009). For this paper, the extension to valued networks called sum of squares homogeneity blockmodeling (Žiberna, 2007) is especially important, because it minimizes the within-block sum of squares, which is the criterion used in this paper.

For the purpose of optimizing the sum of squares criterion introduced in Subsection 2.1, this approach basically amounts to using a relocation algorithm. Since this is only a locally-optimal algorithm, the whole procedure is repeated (restarted) with several (typically randomly selected) starting points. The implementation of the relocation algorithm used here involves two possible "relocations":

1. Transfer: transferring a unit from one cluster to another (if the unit is not the only unit in its cluster);
2. Exchange: exchanging two units from different clusters.

The first option (transfer) is always used because it is relatively fast. The second option (exchange), on the other hand, uses much more time. For binary two-mode blockmodeling, it has been shown (Brusco & Steinley, 2011) that it is better to use more restarts (repeat the algorithm with a new random starting point) without exchanges than to try to better optimize each partition using this option.

Another implementation detail is whether all allowed relocations (so either just transfers, or transfers and exchanges) are first evaluated and the partition that improves the criterion function the most is selected as the partition for use in the next iteration. The other possibility is that, as soon as the first partition which improves the criterion function is found, the algorithm moves to this new partition. When this option is used, the order in which the relocations are performed is very important. To prevent bias and allow as diverse a search as possible, the order of relocations was random.¹

2.4. k-means blockmodeling

k-means blockmodeling has its roots in the k-means algorithm (Hartigan, 1975; Hartigan & Wong, 1979; MacQueen, 1967), a well-known algorithm for partitioning units in "classical" data (units by variables) by minimizing the within-cluster sum of squares. The algorithm is locally-optimal. A thorough review is given by e.g. Steinley (2006). The original k-means algorithm therefore partitions one mode of a two-mode valued matrix. The method was first adapted to the two-mode clustering of two-mode matrices or networks (Baier et al., 1997; Brusco & Doreian, 2015a, 2015b; Van Mechelen et al., 2004; van Rosmalen et al., 2009; Vichi, 2001) and recently to the one-mode clustering of one-mode matrices/networks (Žiberna, 2020). This last variant of the algorithm is evaluated in this paper. The k-means algorithm can be initialized either with a random partition or with random centroids. The first variant is relied on here.

¹ For faster performance, the order was not completely random. In each iteration, the program goes through all clusters in random order. Within each cluster, it goes through all units from this cluster in random order. For each unit, the program then goes through all remaining clusters in random order. For each cluster, it first checks if the transfer of the selected unit to this cluster improves the criterion function. If it does not, and if exchanges are allowed and the second cluster has a higher id than the first one (to avoid examining the same exchange twice), it goes through all units in this cluster and evaluates whether the exchange of these two units (from different clusters) improves the criterion function. Whenever an improvement is found, the transfer or exchange is executed and the new partition is taken as the starting partition for the next iteration. All "random orders" are different in each iteration.

2.5. k-means plus relocation

Späth (1980, as cited in Steinley, 2006, p. 3) noted that the results of the original k-means algorithm can sometimes (although presumably rarely in practice) be improved by relocating a single unit from one cluster to another.
A similar result was found for the algorithm for the two-mode partitioning of two-mode networks (Brusco & Steinley, 2007). As sum of squares homogeneity blockmodeling optimizes, using a relocation algorithm, the same criterion function as the k-means approach for one-mode networks, this relocation-based algorithm is used here in an attempt to improve the k-means solution. The fact that the generalized blockmodeling algorithm is relatively slow is not as problematic here, because in most cases optimizing a single good partition is relatively quick.

3. Simulation study design

The aim of this simulation study is to compare approaches to one-mode homogeneity blockmodeling on binary networks. The approaches under comparison, described in the subsections of Section 2, are indirect blockmodeling based on the corrected squared Euclidean distance and Ward's hierarchical clustering, homogeneity sum of squares blockmodeling, and k-means blockmodeling. Given that what mainly differentiates k-means-based algorithms from homogeneity (sum of squares) generalized blockmodeling algorithms (when used with structural equivalence and no pre-specification) is the optimization algorithm, generalized blockmodeling is in this section typically called a relocation-based algorithm.

3.1. Design of the study

In this simulation study, random networks were generated using the following generating procedure. First, the partition of units into the desired number of clusters was randomly generated with the constraint that each cluster had to have at least three units. For each network, a random image matrix (a matrix representing ties among clusters) was generated based on three constraints:

1. Only two types of blocks were generated, named 'null' (very sparse) and 'complete' (less sparse);
2. The expected number of 'complete' blocks in any row (or column) of the image matrix was 1.5 (regardless of the number of clusters);
3. The image matrix had to be such that no two clusters had the same pattern of ties (i.e., no two clusters were structurally equivalent in the image matrix/network).

The final network was generated based on such an image matrix.
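A generating procedure of this kind can be sketched as follows. This is an illustrative Python/NumPy version written for this summary, not the code used in the study; the tie probabilities p_null and p_complete are parameters (the values actually used are stated below), and the exact way the structural-equivalence constraint on the image matrix is enforced here is my own guess:

```python
import numpy as np

def generate_network(n, k, p_null=0.05, p_complete=0.16, seed=1):
    """Sketch of the generating procedure: a random partition with at least
    three units per cluster, a random binary image matrix with on average
    1.5 'complete' blocks per row and no two structurally equivalent
    clusters, then independent Bernoulli ties per block type."""
    rng = np.random.default_rng(seed)
    # Random partition: guarantee 3 units per cluster, assign the rest randomly.
    clu = np.concatenate([np.repeat(np.arange(k), 3),
                          rng.integers(0, k, n - 3 * k)])
    rng.shuffle(clu)
    # Random image matrix; resample until no two clusters share the same
    # pattern of out- and in-ties (not structurally equivalent in the image).
    while True:
        image = rng.random((k, k)) < 1.5 / k   # E[#'complete' per row] = 1.5
        patterns = np.concatenate([image, image.T], axis=1)
        if len({tuple(row) for row in patterns}) == k:
            break
    # Tie probability of each dyad depends on the type of its block.
    p = np.where(image[np.ix_(clu, clu)], p_complete, p_null)
    net = (rng.random((n, n)) < p).astype(int)
    np.fill_diagonal(net, 0)                   # no loops (self-ties)
    return net, clu, image
```

The constraint checks both rows and columns of the image matrix, since two clusters are structurally equivalent in the image only if both their out-tie and in-tie patterns coincide.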
The probability of a tie in 'null' blocks was set to 0.05, while the probability of a tie in 'complete' blocks was set to one of five possible values: 0.08, 0.12, 0.16, 0.20, 0.24. The 'null' block density was chosen based on some real networks, which are usually sparse. The 'complete' block densities were chosen so that the recovery of the correct partition, at least for the best methods in most scenarios, runs from terrible to good. Some examples of generated networks are presented in Figure 1.

The design of the simulation allowed all possible combinations of the factors listed below (their levels) to be tested. For each combination, 10 random networks were generated and analyzed. The factors that varied were (the names in square brackets are used in the figures and tables):

1. total number of units [n]: 200, 400, 800, 1600;
2. the number of clusters [k]: 2, 4, 8, 16;
3. the probability of a tie in 'complete' blocks [Prob. of a tie in com. blocks]: 0.08, 0.12, 0.16, 0.20, 0.24.

Since all combinations of factors were tested and 10 random networks were generated for each combination, 800 (= 4 × 4 × 5 × 10) networks were generated. These networks were then analyzed with the algorithms that all try to minimize the sum of squares criterion, that is, the sum of squared deviations from the block means.

The following methods were used (the names in square brackets are used in the figures and tables):

• indirect approaches, described in Subsection 2.2:
– hierarchical clustering, Ward's method applied to corrected squared Euclidean distances [HclustWard2]
– hierarchical clustering, Ward's method applied to corrected Euclidean distances [HclustWard]
– hierarchical clustering, the complete linkage method applied to corrected Euclidean distances [HclustComp]
– hierarchical clustering, the single linkage method applied to corrected Euclidean distances [HclustSingle]
• relocation-based algorithms: homogeneity sum of squares generalized blockmodeling according to structural equivalence, as described in Subsection 2.3 and implemented in the blockmodelingTest package (Žiberna, 2019a).
The following versions were used:
– with only transfers allowed; the transfers are tried in random order and, as soon as an improvement is found, the algorithm moves to the improved partition [RL moves]
– the same as above, except that both transfers and exchanges were tried [RL moves & ex.]
– with only transfers allowed; all possible transfers are evaluated and the algorithm moves to the partition associated with the biggest improvement [RL moves all]
– the same as above, except that both all possible transfers and all possible exchanges are evaluated [RL moves & ex. all]
• the k-means-based algorithm described in Subsection 2.4, implemented in the package kmBlocks (Žiberna, 2019b) for the R programming language:
– only k-means [k-means]
• the k-means-based algorithm followed by the relocation algorithm described in Subsection 2.5:
– k-means followed by "RL move": 5/6 of the time was allocated to k-means, while the remaining 1/6 was given to "RL move" for optimizing the best partition found by k-means [k-means + RL move]
– k-means followed by "RL move & ex.": 5/6 of the time was allocated to k-means, while the remaining 1/6 was given to "RL move & ex." for optimizing the best partition found by k-means [k-means + RL move & ex.].

In addition, the error of the original partition, namely, the partition used in generating the network, and the error for the partition where all units are in the same (one) cluster, were computed. The results for these partitions are not presented, but were only used to compute one of the performance measures: relative error (see the next subsection).

3.2. Implementation and evaluation measures

The simulation study was run on a desktop computer with an Intel® Core™ i7-7700 3.6 GHz CPU with 16 GB of RAM. All algorithms were run through the R statistical software version
23 3 6 8 9 12 13 14 16 17 21 22 24 26 29 32 33 34 35 41 43 45 46 47 48 49 51 52 53 57 58 59 61 62 64 69 70 71 74 76 77 78 79 82 84 85 87 88 89 91 92 97 98 100 101 105 106 109 110 114 115 118 119 120 122 123 124 125 128 130 132 134 137 138 139 141 147 150 151 152 154 155 156 158 159 160 161 162 163 165 167 168 171 173 178 183 184 187 189 191 192 193 194 197 198 199 200 1 2 4 5 7 10 11 15 18 19 20 23 25 27 28 30 31 36 37 38 39 40 42 44 50 54 55 56 60 63 65 66 67 68 72 73 75 80 81 83 86 90 93 94 95 96 99 102 103 104 107 108 111 112 113 116 117 121 126 127 129 131 133 135 136 140 142 143 144 145 146 148 149 153 157 164 166 169 170 172 174 175 176 177 179 180 181 182 185 186 188 190 195 196 3 6 8 9 12 13 14 16 17 21 22 24 26 29 32 33 34 35 41 43 45 46 47 48 49 51 52 53 57 58 59 61 62 64 69 70 71 74 76 77 78 79 82 84 85 87 88 89 91 92 97 98 100 101 105 106 109 110 114 115 118 119 120 122 123 124 125 128 130 132 134 137 138 139 141 147 150 151 152 154 155 156 158 159 160 161 162 163 165 167 168 171 173 178 183 184 187 189 191 192 193 194 197 198 199 200 1 2 4 5 7 10 11 15 18 19 20 23 25 27 28 30 31 36 37 38 39 40 42 44 50 54 55 56 60 63 65 66 67 68 72 73 75 80 81 83 86 90 93 94 95 96 99 102 103 104 107 108 111 112 113 116 117 121 126 127 129 131 133 135 136 140 142 143 144 145 146 148 149 153 157 164 166 169 170 172 174 175 176 177 179 180 181 182 185 186 188 190 195 196 k = 2, Prob. of a tie in com. 
blocks = 0.08 3 6 8 9 12 13 14 16 17 21 22 24 26 29 32 33 34 35 41 43 45 46 47 48 49 51 52 53 57 58 59 61 62 64 69 70 71 74 76 77 78 79 82 84 85 87 88 89 91 92 97 98 100 101 105 106 109 110 114 115 118 119 120 122 123 124 125 128 130 132 134 137 138 139 141 147 150 151 152 154 155 156 158 159 160 161 162 163 165 167 168 171 173 178 183 184 187 189 191 192 193 194 197 198 199 200 1 2 4 5 7 10 11 15 18 19 20 23 25 27 28 30 31 36 37 38 39 40 42 44 50 54 55 56 60 63 65 66 67 68 72 73 75 80 81 83 86 90 93 94 95 96 99 102 103 104 107 108 111 112 113 116 117 121 126 127 129 131 133 135 136 140 142 143 144 145 146 148 149 153 157 164 166 169 170 172 174 175 176 177 179 180 181 182 185 186 188 190 195 196 3 6 8 9 12 13 14 16 17 21 22 24 26 29 32 33 34 35 41 43 45 46 47 48 49 51 52 53 57 58 59 61 62 64 69 70 71 74 76 77 78 79 82 84 85 87 88 89 91 92 97 98 100 101 105 106 109 110 114 115 118 119 120 122 123 124 125 128 130 132 134 137 138 139 141 147 150 151 152 154 155 156 158 159 160 161 162 163 165 167 168 171 173 178 183 184 187 189 191 192 193 194 197 198 199 200 1 2 4 5 7 10 11 15 18 19 20 23 25 27 28 30 31 36 37 38 39 40 42 44 50 54 55 56 60 63 65 66 67 68 72 73 75 80 81 83 86 90 93 94 95 96 99 102 103 104 107 108 111 112 113 116 117 121 126 127 129 131 133 135 136 140 142 143 144 145 146 148 149 153 157 164 166 169 170 172 174 175 176 177 179 180 181 182 185 186 188 190 195 196 k = 2, Prob. of a tie in com. 
blocks = 0.16 3 6 8 9 12 13 14 16 17 21 22 24 26 29 32 33 34 35 41 43 45 46 47 48 49 51 52 53 57 58 59 61 62 64 69 70 71 74 76 77 78 79 82 84 85 87 88 89 91 92 97 98 100 101 105 106 109 110 114 115 118 119 120 122 123 124 125 128 130 132 134 137 138 139 141 147 150 151 152 154 155 156 158 159 160 161 162 163 165 167 168 171 173 178 183 184 187 189 191 192 193 194 197 198 199 200 1 2 4 5 7 10 11 15 18 19 20 23 25 27 28 30 31 36 37 38 39 40 42 44 50 54 55 56 60 63 65 66 67 68 72 73 75 80 81 83 86 90 93 94 95 96 99 102 103 104 107 108 111 112 113 116 117 121 126 127 129 131 133 135 136 140 142 143 144 145 146 148 149 153 157 164 166 169 170 172 174 175 176 177 179 180 181 182 185 186 188 190 195 196 3 6 8 9 12 13 14 16 17 21 22 24 26 29 32 33 34 35 41 43 45 46 47 48 49 51 52 53 57 58 59 61 62 64 69 70 71 74 76 77 78 79 82 84 85 87 88 89 91 92 97 98 100 101 105 106 109 110 114 115 118 119 120 122 123 124 125 128 130 132 134 137 138 139 141 147 150 151 152 154 155 156 158 159 160 161 162 163 165 167 168 171 173 178 183 184 187 189 191 192 193 194 197 198 199 200 1 2 4 5 7 10 11 15 18 19 20 23 25 27 28 30 31 36 37 38 39 40 42 44 50 54 55 56 60 63 65 66 67 68 72 73 75 80 81 83 86 90 93 94 95 96 99 102 103 104 107 108 111 112 113 116 117 121 126 127 129 131 133 135 136 140 142 143 144 145 146 148 149 153 157 164 166 169 170 172 174 175 176 177 179 180 181 182 185 186 188 190 195 196 k = 2, Prob. of a tie in com. 
blocks = 0.24 4 5 6 9 13 14 25 26 27 33 35 38 41 44 49 54 56 61 62 66 68 76 77 79 81 83 84 90 93 98 106 111 116 120 122 126 142 143 144 147 148 152 155 159 160 163 170 176 183 184 192 193 198 2 3 11 15 17 20 28 30 34 36 42 48 52 59 64 72 75 85 91 100 103 104 113 118 119 121 138 141 145 158 161 162 164 166 169 174 180 187 1 8 16 18 21 24 37 39 40 43 45 50 51 53 63 69 70 71 74 80 89 92 97 101 102 107 110 112 114 115 117 124 129 130 131 133 139 146 150 151 153 154 157 165 175 179 181 185 186 189 190 191 195 197 199 200 7 10 12 19 22 23 29 31 32 46 47 55 57 58 60 65 67 73 78 82 86 87 88 94 95 96 99 105 108 109 123 125 127 128 132 134 135 136 137 140 149 156 167 168 171 172 173 177 178 182 188 194 196 4 5 6 9 13 14 25 26 27 33 35 38 41 44 49 54 56 61 62 66 68 76 77 79 81 83 84 90 93 98 106 111 116 120 122 126 142 143 144 147 148 152 155 159 160 163 170 176 183 184 192 193 198 2 3 11 15 17 20 28 30 34 36 42 48 52 59 64 72 75 85 91 100 103 104 113 118 119 121 138 141 145 158 161 162 164 166 169 174 180 187 1 8 16 18 21 24 37 39 40 43 45 50 51 53 63 69 70 71 74 80 89 92 97 101 102 107 110 112 114 115 117 124 129 130 131 133 139 146 150 151 153 154 157 165 175 179 181 185 186 189 190 191 195 197 199 200 7 10 12 19 22 23 29 31 32 46 47 55 57 58 60 65 67 73 78 82 86 87 88 94 95 96 99 105 108 109 123 125 127 128 132 134 135 136 137 140 149 156 167 168 171 172 173 177 178 182 188 194 196 k = 4, Prob. of a tie in com. 
blocks = 0.08 4 5 6 9 13 14 25 26 27 33 35 38 41 44 49 54 56 61 62 66 68 76 77 79 81 83 84 90 93 98 106 111 116 120 122 126 142 143 144 147 148 152 155 159 160 163 170 176 183 184 192 193 198 2 3 11 15 17 20 28 30 34 36 42 48 52 59 64 72 75 85 91 100 103 104 113 118 119 121 138 141 145 158 161 162 164 166 169 174 180 187 1 8 16 18 21 24 37 39 40 43 45 50 51 53 63 69 70 71 74 80 89 92 97 101 102 107 110 112 114 115 117 124 129 130 131 133 139 146 150 151 153 154 157 165 175 179 181 185 186 189 190 191 195 197 199 200 7 10 12 19 22 23 29 31 32 46 47 55 57 58 60 65 67 73 78 82 86 87 88 94 95 96 99 105 108 109 123 125 127 128 132 134 135 136 137 140 149 156 167 168 171 172 173 177 178 182 188 194 196 4 5 6 9 13 14 25 26 27 33 35 38 41 44 49 54 56 61 62 66 68 76 77 79 81 83 84 90 93 98 106 111 116 120 122 126 142 143 144 147 148 152 155 159 160 163 170 176 183 184 192 193 198 2 3 11 15 17 20 28 30 34 36 42 48 52 59 64 72 75 85 91 100 103 104 113 118 119 121 138 141 145 158 161 162 164 166 169 174 180 187 1 8 16 18 21 24 37 39 40 43 45 50 51 53 63 69 70 71 74 80 89 92 97 101 102 107 110 112 114 115 117 124 129 130 131 133 139 146 150 151 153 154 157 165 175 179 181 185 186 189 190 191 195 197 199 200 7 10 12 19 22 23 29 31 32 46 47 55 57 58 60 65 67 73 78 82 86 87 88 94 95 96 99 105 108 109 123 125 127 128 132 134 135 136 137 140 149 156 167 168 171 172 173 177 178 182 188 194 196 k = 4, Prob. of a tie in com. 
blocks = 0.16 4 5 6 9 13 14 25 26 27 33 35 38 41 44 49 54 56 61 62 66 68 76 77 79 81 83 84 90 93 98 106 111 116 120 122 126 142 143 144 147 148 152 155 159 160 163 170 176 183 184 192 193 198 2 3 11 15 17 20 28 30 34 36 42 48 52 59 64 72 75 85 91 100 103 104 113 118 119 121 138 141 145 158 161 162 164 166 169 174 180 187 1 8 16 18 21 24 37 39 40 43 45 50 51 53 63 69 70 71 74 80 89 92 97 101 102 107 110 112 114 115 117 124 129 130 131 133 139 146 150 151 153 154 157 165 175 179 181 185 186 189 190 191 195 197 199 200 7 10 12 19 22 23 29 31 32 46 47 55 57 58 60 65 67 73 78 82 86 87 88 94 95 96 99 105 108 109 123 125 127 128 132 134 135 136 137 140 149 156 167 168 171 172 173 177 178 182 188 194 196 4 5 6 9 13 14 25 26 27 33 35 38 41 44 49 54 56 61 62 66 68 76 77 79 81 83 84 90 93 98 106 111 116 120 122 126 142 143 144 147 148 152 155 159 160 163 170 176 183 184 192 193 198 2 3 11 15 17 20 28 30 34 36 42 48 52 59 64 72 75 85 91 100 103 104 113 118 119 121 138 141 145 158 161 162 164 166 169 174 180 187 1 8 16 18 21 24 37 39 40 43 45 50 51 53 63 69 70 71 74 80 89 92 97 101 102 107 110 112 114 115 117 124 129 130 131 133 139 146 150 151 153 154 157 165 175 179 181 185 186 189 190 191 195 197 199 200 7 10 12 19 22 23 29 31 32 46 47 55 57 58 60 65 67 73 78 82 86 87 88 94 95 96 99 105 108 109 123 125 127 128 132 134 135 136 137 140 149 156 167 168 171 172 173 177 178 182 188 194 196 k = 4, Prob. of a tie in com. 
blocks = 0.24 8 13 14 31 42 45 50 72 74 94 95 96 115 128 135 136 145 150 158 168 171 190 191 200 4 16 24 27 37 43 55 65 73 90 93 113 114 116 121 126 157 161 175 177 187 192 196 2 15 21 23 44 53 69 76 83 91 102 105 142 143 149 151 160 162 170 172 186 189 7 12 17 19 30 38 39 48 57 60 61 75 77 79 84 86 88 92 101 119 120 123 130 146 154 164 173 178 180 184 193 195 197 1 6 18 20 28 29 33 35 36 58 63 68 78 99 100 104 107 111 112 122 144 153 159 163 165 169 176 185 11 52 56 70 71 97 110 118 132 139 155 166 174 179 3 5 22 26 32 41 49 54 59 62 64 66 67 81 82 85 98 103 106 109 117 127 131 133 137 138 141 147 152 188 194 199 9 10 25 34 40 46 47 51 80 87 89 108 124 125 129 134 140 148 156 167 181 182 183 198 8 13 14 31 42 45 50 72 74 94 95 96 115 128 135 136 145 150 158 168 171 190 191 200 4 16 24 27 37 43 55 65 73 90 93 113 114 116 121 126 157 161 175 177 187 192 196 2 15 21 23 44 53 69 76 83 91 102 105 142 143 149 151 160 162 170 172 186 189 7 12 17 19 30 38 39 48 57 60 61 75 77 79 84 86 88 92 101 119 120 123 130 146 154 164 173 178 180 184 193 195 197 1 6 18 20 28 29 33 35 36 58 63 68 78 99 100 104 107 111 112 122 144 153 159 163 165 169 176 185 11 52 56 70 71 97 110 118 132 139 155 166 174 179 3 5 22 26 32 41 49 54 59 62 64 66 67 81 82 85 98 103 106 109 117 127 131 133 137 138 141 147 152 188 194 199 9 10 25 34 40 46 47 51 80 87 89 108 124 125 129 134 140 148 156 167 181 182 183 198 k = 8, Prob. of a tie in com. 
blocks = 0.08 8 13 14 31 42 45 50 72 74 94 95 96 115 128 135 136 145 150 158 168 171 190 191 200 4 16 24 27 37 43 55 65 73 90 93 113 114 116 121 126 157 161 175 177 187 192 196 2 15 21 23 44 53 69 76 83 91 102 105 142 143 149 151 160 162 170 172 186 189 7 12 17 19 30 38 39 48 57 60 61 75 77 79 84 86 88 92 101 119 120 123 130 146 154 164 173 178 180 184 193 195 197 1 6 18 20 28 29 33 35 36 58 63 68 78 99 100 104 107 111 112 122 144 153 159 163 165 169 176 185 11 52 56 70 71 97 110 118 132 139 155 166 174 179 3 5 22 26 32 41 49 54 59 62 64 66 67 81 82 85 98 103 106 109 117 127 131 133 137 138 141 147 152 188 194 199 9 10 25 34 40 46 47 51 80 87 89 108 124 125 129 134 140 148 156 167 181 182 183 198 8 13 14 31 42 45 50 72 74 94 95 96 115 128 135 136 145 150 158 168 171 190 191 200 4 16 24 27 37 43 55 65 73 90 93 113 114 116 121 126 157 161 175 177 187 192 196 2 15 21 23 44 53 69 76 83 91 102 105 142 143 149 151 160 162 170 172 186 189 7 12 17 19 30 38 39 48 57 60 61 75 77 79 84 86 88 92 101 119 120 123 130 146 154 164 173 178 180 184 193 195 197 1 6 18 20 28 29 33 35 36 58 63 68 78 99 100 104 107 111 112 122 144 153 159 163 165 169 176 185 11 52 56 70 71 97 110 118 132 139 155 166 174 179 3 5 22 26 32 41 49 54 59 62 64 66 67 81 82 85 98 103 106 109 117 127 131 133 137 138 141 147 152 188 194 199 9 10 25 34 40 46 47 51 80 87 89 108 124 125 129 134 140 148 156 167 181 182 183 198 k = 8, Prob. of a tie in com. 
blocks = 0.16 8 13 14 31 42 45 50 72 74 94 95 96 115 128 135 136 145 150 158 168 171 190 191 200 4 16 24 27 37 43 55 65 73 90 93 113 114 116 121 126 157 161 175 177 187 192 196 2 15 21 23 44 53 69 76 83 91 102 105 142 143 149 151 160 162 170 172 186 189 7 12 17 19 30 38 39 48 57 60 61 75 77 79 84 86 88 92 101 119 120 123 130 146 154 164 173 178 180 184 193 195 197 1 6 18 20 28 29 33 35 36 58 63 68 78 99 100 104 107 111 112 122 144 153 159 163 165 169 176 185 11 52 56 70 71 97 110 118 132 139 155 166 174 179 3 5 22 26 32 41 49 54 59 62 64 66 67 81 82 85 98 103 106 109 117 127 131 133 137 138 141 147 152 188 194 199 9 10 25 34 40 46 47 51 80 87 89 108 124 125 129 134 140 148 156 167 181 182 183 198 8 13 14 31 42 45 50 72 74 94 95 96 115 128 135 136 145 150 158 168 171 190 191 200 4 16 24 27 37 43 55 65 73 90 93 113 114 116 121 126 157 161 175 177 187 192 196 2 15 21 23 44 53 69 76 83 91 102 105 142 143 149 151 160 162 170 172 186 189 7 12 17 19 30 38 39 48 57 60 61 75 77 79 84 86 88 92 101 119 120 123 130 146 154 164 173 178 180 184 193 195 197 1 6 18 20 28 29 33 35 36 58 63 68 78 99 100 104 107 111 112 122 144 153 159 163 165 169 176 185 11 52 56 70 71 97 110 118 132 139 155 166 174 179 3 5 22 26 32 41 49 54 59 62 64 66 67 81 82 85 98 103 106 109 117 127 131 133 137 138 141 147 152 188 194 199 9 10 25 34 40 46 47 51 80 87 89 108 124 125 129 134 140 148 156 167 181 182 183 198 k = 8, Prob. of a tie in com. 
blocks = 0.24 19 37 45 65 95 131 139 163 169 179 182 22 31 55 59 63 85 106 112 154 165 199 9 29 39 86 110 121 135 141 143 172 1 3 5 36 123 130 152 181 191 197 10 20 21 38 53 57 71 82 83 90 100 101 103 111 113 125 190 43 109 120 158 162 171 7 15 16 18 24 25 40 44 89 92 96 122 129 144 147 151 168 170 176 185 33 47 74 75 88 133 136 148 153 164 186 8 14 58 66 70 80 102 104 107 115 118 128 142 157 180 189 192 193 200 4 32 35 41 46 73 78 79 98 105 114 124 119 145 175 187 28 52 56 68 91 132 137 140 150 160 167 196 198 13 17 49 54 67 69 72 76 94 126 149 166 195 12 23 27 42 50 61 64 97 108 116 146 173 178 183 184 2 11 34 51 60 62 81 87 117 134 138 156 159 161 174 177 194 6 26 30 48 77 84 93 99 127 155 188 19 37 45 65 95 131 139 163 169 179 182 22 31 55 59 63 85 106 112 154 165 199 9 29 39 86 110 121 135 141 143 172 1 3 5 36 123 130 152 181 191 197 10 20 21 38 53 57 71 82 83 90 100 101 103 111 113 125 190 43 109 120 158 162 171 7 15 16 18 24 25 40 44 89 92 96 122 129 144 147 151 168 170 176 185 33 47 74 75 88 133 136 148 153 164 186 8 14 58 66 70 80 102 104 107 115 118 128 142 157 180 189 192 193 200 4 32 35 41 46 73 78 79 98 105 114 124 119 145 175 187 28 52 56 68 91 132 137 140 150 160 167 196 198 13 17 49 54 67 69 72 76 94 126 149 166 195 12 23 27 42 50 61 64 97 108 116 146 173 178 183 184 2 11 34 51 60 62 81 87 117 134 138 156 159 161 174 177 194 6 26 30 48 77 84 93 99 127 155 188 k = 16, Prob. of a tie in com. 
blocks = 0.08 19 37 45 65 95 131 139 163 169 179 182 22 31 55 59 63 85 106 112 154 165 199 9 29 39 86 110 121 135 141 143 172 1 3 5 36 123 130 152 181 191 197 10 20 21 38 53 57 71 82 83 90 100 101 103 111 113 125 190 43 109 120 158 162 171 7 15 16 18 24 25 40 44 89 92 96 122 129 144 147 151 168 170 176 185 33 47 74 75 88 133 136 148 153 164 186 8 14 58 66 70 80 102 104 107 115 118 128 142 157 180 189 192 193 200 4 32 35 41 46 73 78 79 98 105 114 124 119 145 175 187 28 52 56 68 91 132 137 140 150 160 167 196 198 13 17 49 54 67 69 72 76 94 126 149 166 195 12 23 27 42 50 61 64 97 108 116 146 173 178 183 184 2 11 34 51 60 62 81 87 117 134 138 156 159 161 174 177 194 6 26 30 48 77 84 93 99 127 155 188 19 37 45 65 95 131 139 163 169 179 182 22 31 55 59 63 85 106 112 154 165 199 9 29 39 86 110 121 135 141 143 172 1 3 5 36 123 130 152 181 191 197 10 20 21 38 53 57 71 82 83 90 100 101 103 111 113 125 190 43 109 120 158 162 171 7 15 16 18 24 25 40 44 89 92 96 122 129 144 147 151 168 170 176 185 33 47 74 75 88 133 136 148 153 164 186 8 14 58 66 70 80 102 104 107 115 118 128 142 157 180 189 192 193 200 4 32 35 41 46 73 78 79 98 105 114 124 119 145 175 187 28 52 56 68 91 132 137 140 150 160 167 196 198 13 17 49 54 67 69 72 76 94 126 149 166 195 12 23 27 42 50 61 64 97 108 116 146 173 178 183 184 2 11 34 51 60 62 81 87 117 134 138 156 159 161 174 177 194 6 26 30 48 77 84 93 99 127 155 188 k = 16, Prob. of a tie in com. 
Figure 1: Examples of a generated network based on the original partition for networks with 200 units (n = 200), with all possible numbers of clusters (k = 2, 4, 8, 16) and with the probability of a tie in 'complete' blocks [Prob. of a tie in com. blocks] equaling 0.08, 0.16, or 0.24. The image matrix is the same for all densities with a given number of clusters.

The analyses were run in R version 3.5.3 (64-bit version). The indirect approach was conducted by first computing the corrected Euclidean distance using the sedist() function from the blockmodelingTest R package (Žiberna, 2019a).
On this, the hierarchical clustering was computed with the hclust() function (method ward.D2) from the stats package (R Core Team, 2019).²

For the relocation algorithm (homogeneity generalized blockmodeling), the implementation in the blockmodelingTest R package version 0.3.5.9000 was used (Žiberna, 2019a). The most time-consuming parts of the algorithm (refinement of the partitions) were programmed in C. For the k-means algorithm, the kmBlock package was used (Žiberna, 2019b). The same part (refinement of partitions) was programmed in C++ using the Rcpp (Eddelbuettel & Francois, 2011) and RcppArmadillo (Eddelbuettel & Sanderson, 2014) packages. The implementation of the other parts (generating random partitions, choosing the best result, etc.) was written in R and was practically the same for both approaches, to make the results of these two approaches as comparable as possible. It should be mentioned that the focus was more on the comparability of these two implementations (relocation algorithm and k-means) than on the efficiency of the two algorithms.

The analysis of all networks with all algorithms for a single combination of factors was run on a single thread, although multiple threads were used to run simulations for several factor combinations simultaneously.

Each iterative method was allocated at most 3 minutes for analyzing each network. The iterative methods (k-means and generalized blockmodeling) were implemented so that they finished after the time limit had been reached. Accordingly, if the time limit was reached, the refinement stopped and the current partition (and error) were saved together with those of already finished restarts (if any).³ The methods "k-means + RL move" and "k-means + RL move & ex." often finished (substantially) sooner because the second phase (relocation algorithm) finished much faster than allowed.

This excludes the indirect approach, which is not an iterative one and where the execution time was not limited. The results show this approach needed much less time on networks with 800 units or less, although it also needed about 5 minutes on networks with 1600 units.
However, since this method is not the main focus of the paper, this is not regarded as problematic.

The results of the methods were assessed based on two main measures: error (sum of squared deviations from the block means) and the Adjusted Rand Index (Hubert & Arabie, 1985). Especially in graphs (to make scales more comparable), relative error is used instead of the original error. Some other measures based on these two (for ease of interpretation) and some auxiliary statistics were also computed or recorded. The following statistics were therefore saved (the names in square brackets indicate the names used in the supplementary materials):

• Adjusted Rand Index [ARI]

² The ward.D method was also tested. ward.D2 internally squares the distances and is therefore equivalent to using squared Euclidean distances, while ward.D uses original Euclidean distances. The results were very similar. The single linkage and complete linkage methods were also tested but, as expected, their results were inferior and are therefore not presented.

³ As the check for the time limit is only performed at certain points in the code, it could happen that the total time was exceeded. In this case, the results of the last restart were used with probability proportional to the time left for execution before the last restart relative to the time used by the last restart. This was implemented so that methods with faster restarts would not be favored (for bigger problems). In rare cases where this resulted in no restarts being used, the worst-case scenario, that is, when we assume only a single cluster, was taken as the solution for this method.

• Is the method the best method for a given network based on the ARI? [Is best ARI]
• Error [Error] – sum of squared deviations from the block means
• Relative error [Relative error]. The relative error compares the error of each method with the error of the original partition (used to generate the network) and the error that we would obtain if we had assumed only one cluster, so that the whole network would represent one block (all reasonable methods should do better than that).
It represents how much the error of a given method is worse than that of the original partition, relative to how much the error of the original is better than that in the case of only one cluster. It is computed as:

relative error = (err − err_org) / (err_one − err_org)

where err is the error of the evaluated method, err_org is the error of the original partition, and err_one is the error in the case of only one cluster. Lower values are of course better, although values below 0 might be interpreted as overfitting.
• Is the method the best method for a given network based on the error? [Is best error]
• Number of restarts achieved⁴ [Number of restarts]
• Elapsed time [Elapsed time]; the time used by the algorithm.

4. Results

Due to the magnitude of the results, not all are presented in this paper, although all results may be found in the interactive table and chart in the supplementary materials. In the paper, only the results for two measures, relative error and the Adjusted Rand Index, are presented.

For better readability, solely the results for a selection of methods are presented. At least one method is presented for each group of methods (indirect approaches, generalized blockmodeling/relocation-based, k-means, combination). Among the indirect approaches, only the results for Ward's hierarchical clustering based on corrected squared Euclidean distance are presented. Here the selection was mainly based on theory (see Subsection 2.2). The results of using Ward's method on "unsquared" corrected Euclidean distance were in some cases slightly better, yet the difference is negligible. Other versions of the indirect approach performed worse.

In addition, the versions of the relocation algorithm where all possible transfers, or transfers and exchanges, are evaluated before selecting the best one ("RL move all" and "RL move & ex. all") are omitted. As may be seen in the supplementary materials, these methods practically never performed better than the other versions of the relocation algorithm and usually (especially in the case of larger networks) performed much worse.

The results for mean relative error are presented in Figure 2, while the results for the mean Adjusted Rand Index (ARI) are shown in Figure 3.
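As a concrete illustration of the two error measures just defined, the following is a minimal Python sketch (not the implementation used in the study; the function names are chosen here for illustration, `A` is an adjacency matrix, and `partition` is a vector of cluster labels; self-ties are included for brevity, although the simulated networks have no loops):

```python
import numpy as np

def blockmodel_error(A, partition):
    """Sum of squared deviations from the block means (the optimized criterion)."""
    A = np.asarray(A, dtype=float)
    partition = np.asarray(partition)
    err = 0.0
    for r in np.unique(partition):
        for c in np.unique(partition):
            # block of (possible) ties from cluster r to cluster c
            block = A[np.ix_(partition == r, partition == c)]
            if block.size:
                err += np.sum((block - block.mean()) ** 2)
    return err

def relative_error(err, err_org, err_one):
    """(err - err_org) / (err_one - err_org), as defined above."""
    return (err - err_org) / (err_one - err_org)
```

For example, for the 2 × 2 network [[0, 1], [1, 0]] with each unit in its own cluster, every block is a single cell and the error is 0; with a single cluster the error is 1, so the two-cluster partition has a relative error of 0.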
Based on the relative error (Figure 2), we may conclude that in most cases studied with only 2 groups the difference among the various approaches is minimal. For most other cases, hierarchical clustering is shown to perform worse than the other methods. The difference is in most cases bigger when there is a lower probability of a tie in the 'complete' blocks. However, in the case of a large number of units (800 and especially 1600) and a relatively high probability of a tie in the 'complete' blocks, the hierarchical approach performs better than the relocation algorithm,⁵ while the k-means-based approaches are clearly superior.

As the number of units rises, the differences between the approaches that include the k-means algorithm and those using only the relocation algorithm increase, where the algorithms which include the k-means algorithm are the clear winners. The differences are slightly bigger as the probability of a tie in the 'complete' blocks increases. Only in cases with a relatively small number of units and a large number of clusters do the approaches based on the relocation algorithm perform slightly better than those based on k-means. These are also the cases where using the relocation algorithm after k-means is obviously beneficial. In most other cases, no clear suggestions regarding this can be given.

⁴ The last restart is taken into account (if not deleted as explained in the previous footnote) even if it ended prematurely.
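The "RL move" strategy compared above can be illustrated as a first-improvement local search: single units are moved between clusters, and a move is kept as soon as it lowers the sum-of-squares criterion. The following is a simplified sketch under those assumptions, not the blockmodelingTest implementation (moves that would empty a cluster are simply skipped, and the function names are illustrative):

```python
import numpy as np

def block_ss_error(A, part):
    """Sum of squared deviations from the block means."""
    err = 0.0
    for r in np.unique(part):
        for c in np.unique(part):
            block = A[np.ix_(part == r, part == c)]
            if block.size:
                err += np.sum((block - block.mean()) ** 2)
    return err

def relocate_moves(A, part, max_sweeps=100):
    """First-improvement relocation: move single units between clusters,
    keeping a move as soon as it lowers the criterion."""
    A = np.asarray(A, dtype=float)
    part = np.asarray(part).copy()
    best = block_ss_error(A, part)
    for _ in range(max_sweeps):
        improved = False
        for i in range(len(part)):
            if np.sum(part == part[i]) == 1:
                continue  # do not empty a cluster
            old = part[i]
            for g in np.unique(part):
                if g == old:
                    continue
                part[i] = g
                err = block_ss_error(A, part)
                if err < best - 1e-12:
                    best, improved = err, True
                    break  # keep the first improving move
                part[i] = old  # undo the move
        if not improved:
            break  # local optimum reached
    return part, best
```

On a small planted two-block network, starting from a partition with one misplaced unit, a single sweep is enough to recover the planted structure with zero error.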
Figure 2: Values of the mean relative error for the different methods. Each line represents the results of a single method. The panels are determined by the values of the variables number of groups (k) and number of units (n) indicated (variables and their values) in the gray rectangles on the top and right sides of the figure.

In some cases, we can observe that the relative error is lower than 0, indicating that the algorithms found a partition with an error lower than the error of the partition that was used to generate the network. This is a case of overfitting and happens where there is not enough data to recover the true structure. That is, this occurs in cases where the difference between the probability of a tie in 'null' blocks and that in 'complete' blocks is too low compared to the expected size of the blocks. As may also be observed in Figure 1, random tie generation causes block densities to deviate from the probability of a tie in a given block.

Another interesting observation is that while for the relocation algorithms the relative error more often than not increases as the number of units increases, this is not the case with the k-means-based methods. Larger networks on the one hand mean larger blocks which are, especially at the relatively low densities used in this simulation, easier to find, while on the other hand they also increase the computational burden. In settings with 800 units or more, k-means-based algorithms clearly outperform the relocation algorithms.

⁵ It should however be noted that on networks with 1600 units the hierarchical approach on average needed 5 minutes to complete, which is more time than was given to the other approaches.
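For comparison with the relocation strategy, one assignment sweep of a k-means-like blockmodeling algorithm in the spirit of Žiberna (2020) can be sketched as follows: the block means (the image matrix) are computed for the current partition, and each unit is then reassigned to the cluster whose rows and columns of block means best fit its outgoing and incoming ties. This is an illustrative sketch under those assumptions, not the kmBlock implementation, and the function names are hypothetical:

```python
import numpy as np

def image_matrix(A, part, k):
    """Block means (densities) for clusters labeled 0..k-1."""
    M = np.zeros((k, k))
    for r in range(k):
        for c in range(k):
            block = A[np.ix_(part == r, part == c)]
            if block.size:
                M[r, c] = block.mean()
    return M

def kmeans_assignment_sweep(A, part, k):
    """Reassign each unit to the cluster whose block means best fit
    its outgoing and incoming ties (one k-means-like sweep)."""
    A = np.asarray(A, dtype=float)
    part = np.asarray(part)
    M = image_matrix(A, part, k)
    new = part.copy()
    for i in range(len(part)):
        costs = np.empty(k)
        for g in range(k):
            # squared deviation of unit i's row ties from cluster g's row of block means
            out_cost = sum(np.sum((A[i, part == c] - M[g, c]) ** 2) for c in range(k))
            # and of its column ties from cluster g's column of block means
            in_cost = sum(np.sum((A[part == r, i] - M[r, g]) ** 2) for r in range(k))
            costs[g] = out_cost + in_cost
        new[i] = int(np.argmin(costs))
    return new
```

On the same planted two-block example as above, a single sweep moves a misplaced unit back to its planted cluster. In a full algorithm, sweeps would be repeated (recomputing the image matrix each time) until the partition stops changing.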
Figure 3: Values of the mean Adjusted Rand Index for the different methods. Each line represents the results of a single method. The panels are determined by the values of the variables number of groups (k) and number of units (n) indicated (variables and their values) in the gray rectangles on the top and right sides of the figure.

Results based on the mean Adjusted Rand Index (Figure 3) are even more favorable for the k-means-based approaches compared with the relocation algorithm approaches. In almost all cases, one of the k-means-based approaches is the best and the differences among them are negligible. Only in large k, small n cases (a large number of clusters and a small number of units) are the approaches using the relocation algorithm slightly better; however, the ARI is relatively low here for all methods. As seen in the previous graph, the indirect approach performs well only in the case of a large number of units (800 and especially 1600) and a relatively high probability of a tie in the 'complete' blocks. As expected, for all methods the ARI rises as the probability of a tie in 'complete' blocks rises and when the number of clusters decreases (due to larger blocks). The effect of network size on the ARI depends on the methods used. In general, it has a positive effect on the k-means-based approaches and the indirect approach, and generally a negative effect on the relocation-based approaches (with some exceptions).
As discussed above with respect to relative error, larger networks mean larger blocks which are, especially at the relatively low densities used in this simulation, easier to find, yet they also increase the computational burden. It seems that k-means-based approaches are fast enough to exploit the advantages of larger networks (of the sizes analyzed here), that is, they are not hampered that much by the greater computational burden.

5. Conclusion

One can find several methods for one-mode blockmodeling of networks based on the minimum sum of squares criterion. This paper aimed to compare these methods using a simulation. The main aim was to compare approaches based on the relocation algorithm (homogeneity generalized blockmodeling (Žiberna, 2007)) and its variants with the k-means algorithm (Žiberna, 2020) and an approach where the relocation algorithm is used to optimize the partition produced by k-means. In addition, an indirect approach was considered.

The overall finding is that in most cases, the k-means-based approaches, namely either k-means alone or k-means followed by the relocation algorithm, perform best both in terms of the optimized criterion and the similarity between the original partition (the one used for generating the network) and the one found by the algorithm. In addition, this approach is never much worse than the best approach and thus generally seems to be the safest choice. The difference is negligible for networks with 400 units or fewer, but becomes very big for large networks (1600 units), especially when there are 4 clusters or more (in the case of large networks and a clear structure, the hierarchical approach can produce similar results as the k-means-based approaches). Only in the case of relatively small networks and a large number of clusters do pure relocation algorithms usually perform better. As expected, the use of indirect approaches is never the best choice, at least not in terms of the optimized criterion.

The results on whether it is better to use just k-means or k-means followed by relocation algorithms are inconclusive.
The latter approach is clearly superior only for small networks with a relatively large number of clusters, where, however, the use of the pure relocation algorithm is generally preferred.

With respect to relocation approaches, the use of versions of the algorithm that move to a new partition as soon as an improvement is found is recommended, because evaluating all possible transfers (or transfers and exchanges) is prohibitive for larger networks.

Of course, this study has its own limitations, mainly stemming from the fact that only binary networks of selected sizes (from 200 to 1600 units), numbers of clusters (2 to 16) and densities were evaluated. The study also assumed that the number of clusters is known. A number of unsettled questions remain that could be answered using simulation studies, for example, ways of determining the correct number of clusters and the best way of initializing the k-means algorithm, to name just a few.

References

Anderson, C. J., Wasserman, S., & Faust, K. (1992). Building stochastic blockmodels. Social Networks, 14(1–2), 137–161. https://doi.org/10.1016/0378-8733(92)90017-2

Baier, D., Gaul, W., & Schader, M. (1997). Two-mode overlapping clustering with applications to simultaneous benefit segmentation and market structuring. In Classification and knowledge organization (pp. 557–566). Springer. https://doi.org/10.1007/978-3-642-59051-1_58

Batagelj, V., Ferligoj, A., & Doreian, P. (1992). Direct and indirect methods for structural equivalence. Social Networks, 14(1–2), 63–90. https://doi.org/10.1016/0378-8733(92)90014-X

Borgatti, S. P., & Everett, M. G. (1992). Regular blockmodels of multiway, multimode matrices. Social Networks, 14(1–2), 91–120. https://doi.org/10.1016/0378-8733(92)90015-Y

Breiger, R. L., Boorman, S. A., & Arabie, P. (1975). An algorithm for clustering relational data with applications to social network analysis and comparison with multidimensional scaling. Journal of Mathematical Psychology, 12(3), 328–383. https://doi.org/10.1016/0022-2496(75)90028-0

Brusco, M. J., & Doreian, P. (2015a).
An exact algorithm for the two-mode KL-means partitioning problem. Journal of Classification, 32(3), 481–515. https://doi.org/10.1007/s00357-015-9185-z

Brusco, M. J., & Doreian, P. (2015b). A real-coded genetic algorithm for two-mode KL-means partitioning with application to homogeneity blockmodeling. Social Networks, 41, 26–35. https://doi.org/10.1016/j.socnet.2014.11.007

Brusco, M. J., & Steinley, D. (2007). A variable neighborhood search method for generalized blockmodeling of two-mode binary matrices. Journal of Mathematical Psychology, 51, 325–338. https://doi.org/10.1016/j.jmp.2007.07.001

Brusco, M. J., & Steinley, D. (2011). A tabu-search heuristic for deterministic two-mode blockmodeling of binary network matrices. Psychometrika, 76(4), 612–633. https://doi.org/10.1007/s11336-011-9221-9

Burt, R. S. (1976). Positions in networks. Social Forces, 55(1), 93–122. https://doi.org/10.2307/2577097

Doreian, P., Batagelj, V., & Ferligoj, A. (1994). Partitioning networks based on generalized concepts of equivalence. The Journal of Mathematical Sociology, 19(1), 1–27. https://doi.org/10.1080/0022250X.1994.9990133

Doreian, P., Batagelj, V., & Ferligoj, A. (2004). Generalized blockmodeling of two-mode network data. Social Networks, 26, 29–53. https://doi.org/10.1016/j.socnet.2004.01.002

Doreian, P., Batagelj, V., & Ferligoj, A. (2005). Generalized blockmodeling. Cambridge University Press.

Doreian, P., & Mrvar, A. (2009). Partitioning signed social networks. Social Networks, 31(1), 1–11. https://doi.org/10.1016/j.socnet.2008.08.001

Eddelbuettel, D., & Francois, R. (2011). Rcpp: Seamless R and C++ integration. Journal of Statistical Software, 40(8), 1–18. https://doi.org/10.18637/jss.v040.i08

Eddelbuettel, D., & Sanderson, C. (2014). RcppArmadillo: Accelerating R with high-performance C++ linear algebra. Computational Statistics & Data Analysis, 71, 1054–1063. https://doi.org/10.1016/j.csda.2013.02.005

Hartigan, J. A. (1975). Clustering algorithms. Wiley.

Hartigan, J. A., & Wong, M. A. (1979). Algorithm AS 136: A K-means clustering algorithm. Journal of the Royal Statistical Society.
Series C (Applied Statistics), 28(1), 100–108. https://doi.org/10.2307/2346830

Holland, P. W., Laskey, K. B., & Leinhardt, S. (1983). Stochastic blockmodels: First steps. Social Networks, 5(2), 109–137. https://doi.org/10.1016/0378-8733(83)90021-7

Hubert, L., & Arabie, P. (1985). Comparing partitions. Journal of Classification, 2(1), 193–218. https://doi.org/10.1007/BF01908075

Lorrain, F., & White, H. C. (1971). Structural equivalence of individuals in social networks. The Journal of Mathematical Sociology, 1(1), 49–80. https://doi.org/10.1080/0022250X.1971.9989788

MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. In L. LeCam & J. Neyman (Eds.), Proceedings of the Fifth Berkeley symposium on mathematical statistics and probability (pp. 281–297).

R Core Team. (2019). R: A language and environment for statistical computing (Version 3.5.3) [Computer software]. R Foundation for Statistical Computing. https://www.r-project.org/

Snijders, T. A. B., & Nowicki, K. (1997). Estimation and prediction for stochastic blockmodels for graphs with latent block structure. Journal of Classification, 14, 75–100.

Steinley, D. (2003). Local optima in K-means clustering: What you don't know may hurt you. Psychological Methods, 8(3), 294–304. https://doi.org/10.1037/1082-989X.8.3.294

Steinley, D. (2006). K-means clustering: A half-century synthesis. British Journal of Mathematical and Statistical Psychology, 59(1), 1–34. https://doi.org/10.1348/000711005X48266

Van Mechelen, I., Bock, H.-H., & De Boeck, P. (2004). Two-mode clustering methods: A structured overview. Statistical Methods in Medical Research, 13(5), 363–394.

van Rosmalen, J., Groenen, P. J. F., Trejos, J., & Castillo, W. (2009). Optimization strategies for two-mode partitioning. Journal of Classification, 26, 155–181. https://doi.org/10.1007/s00357-009-9031-2

Vichi, M. (2001). Double k-means clustering for simultaneous classification of objects and variables. In S. Borra, R. Rocci, M. Vichi, & M.
Schader (Eds.), Advances in classification and data analysis (pp. 43–52). Springer. https://doi.org/10.1007/978-3-642-59471-7_6

Ward, J. H. (1963). Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301), 236–244. https://doi.org/10.2307/2282967

White, D. R., & Reitz, K. P. (1983). Graph and semigroup homomorphisms on networks of relations. Social Networks, 5(2), 193–234. https://doi.org/10.1016/0378-8733(83)90025-4

Žiberna, A. (2007). Generalized blockmodeling of valued networks. Social Networks, 29, 105–126. https://doi.org/10.1016/j.socnet.2006.04.002

Žiberna, A. (2019a). blockmodelingTest: An R package for generalized and classical blockmodeling of valued networks (Version 0.3.5.9000) [Computer software]. R-Forge. https://rdrr.io/rforge/blockmodelingTest/

Žiberna, A. (2019b). kmBlock: k-means like blockmodeling of one-mode and linked networks (Version 0.0.1.9102) [Computer software]. R-Forge. https://rdrr.io/rforge/kmBlock/

Žiberna, A. (2020). k-means-based algorithm for blockmodeling linked networks. Social Networks, 61, 153–169. https://doi.org/10.1016/j.socnet.2019.10.006