Advances in Methodology and Statistics / Metodološki zvezki, Vol. 18, No. 1, 2021, 17–30
https://doi.org/10.51936/kcpy2449

Comparing different methods for one-mode homogeneity blockmodeling according to structural equivalence on binary networks

Aleš Žiberna
University of Ljubljana, Faculty of Social Sciences, Ljubljana, Slovenia

Abstract

One-mode homogeneity blockmodeling is an approach to clustering networks that searches for partitions of units in a network so that the resulting blocks are as homogeneous as possible. A block is a part of the network that contains (possible) ties from the units of one cluster to the units of another cluster (or ties within a cluster). Typically, the sum of squared deviations from the mean is taken as the measure of variability (non-homogeneity). The paper presents the results of a simulation study that applied several methods for this problem to binary networks generated according to structural equivalence. Several versions of homogeneity generalized blockmodeling (using a relocation algorithm), a k-means-based algorithm, and an indirect approach are compared. Since all of the methods being compared try to optimize the same criterion function, this and the Adjusted Rand Index are the main criteria for the comparison. All methods (except the indirect approach, which is not iterative) were given the same amount of time to find the best possible solution. The overall conclusion is that the k-means approach is advised in most cases, except when smaller networks (200 units) are being partitioned into a larger number of clusters, in which case homogeneity generalized blockmodeling is preferred.

Keywords: blockmodeling, sum of squares, generalized blockmodeling, k-means approach, indirect approach

1. Introduction

One-mode homogeneity blockmodeling (Žiberna, 2007) is an approach for clustering networks. It searches for partitions of units in a network so that the resulting blocks, namely, the parts of the network that contain (possible) ties from the units of one group to the units of another group (or ties within a group), are as homogeneous as possible (where some measure of their variability is as low as possible).
∗ Corresponding author. Email address: ales.ziberna@fdv.uni-lj.si (Aleš Žiberna). ORCID iD: 0000-0003-1534-6971 (Aleš Žiberna).

Typically, the sum of squared deviations from the mean (sum of squares for short) is taken as the measure of variability; only this measure is considered here, since only then can k-means-like (Žiberna, 2020) approaches be used. The aim of this paper is to investigate which strategies are best for finding the optimal partition according to this criterion. Here, k-means and generalized blockmodeling approaches are considered, together with various strategies for certain aspects of the relocation algorithm used within generalized blockmodeling and the possibility of combining the two approaches. In addition, an indirect approach to blockmodeling is examined. Results of a similar simulation to the one presented here, but based on linked networks and without the indirect approach, have already been presented (Žiberna, 2020). However, given that most users of homogeneity blockmodeling approaches analyze one-mode networks, I believe the present results are more relevant for most cases, while they also incorporate the use of an indirect approach. In this paper, only "simple" one-mode binary directed networks without loops (self-ties) are examined, that is, those which contain only one set of units and only one directed relation.

2. Blockmodeling criteria and approaches

Blockmodeling is a method for clustering units in a network. Its aim is to partition network units into clusters and, at the same time, to partition the set of ties into blocks (Doreian et al., 2005, p. 29). Blockmodeling may also be "[v]iewed as a method of data reduction, [...] a valuable technique in which redundant elements in an observed system are reduced to yield a simplified model of relationships among types of elements (units)" (Borgatti & Everett, 1992). Several approaches to blockmodeling exist. One possible division distinguishes stochastic approaches, namely stochastic blockmodeling (Anderson et al., 1992; Holland et al., 1983; Snijders & Nowicki, 1997), from deterministic approaches.
Only selected deterministic approaches are considered in this text because the aim is to only look at approaches that minimize the sum of squares within blocks. Deterministic approaches may be further split into indirect approaches¹ (e.g. Breiger et al., 1975; Burt, 1976; see Doreian et al., 2005, pp. 25–26 for a definition) and direct approaches. In this paper, one indirect and two direct approaches are examined. The indirect approach used relies on the corrected squared Euclidean distance (Batagelj et al., 1992) and Ward's hierarchical clustering (Ward, 1963). The direct approaches used are generalized blockmodeling (Doreian et al., 1994, 2005; Žiberna, 2007) and k-means-based blockmodeling approaches (Brusco & Doreian, 2015a, 2015b; Žiberna, 2020). In the subsections below, the sum of squares criterion is presented first, followed by the approaches being considered.

2.1. Sum of squares criterion

In this paper, the methods aim to minimize the within-block sum of squared deviations from the mean. They thereby aim to obtain the partition that induces blocks whose values are as homogeneous as possible.

For the binary networks under study here, ideal blocks (those leading to a value of 0 of the criterion function) include complete blocks (all possible ties are present) and null blocks (no ties are present). However, such situations are rarely found in real networks unless the number of clusters is large relative to the number of units. Therefore, for binary networks, this criterion essentially means we are looking for blocks where the densities of the rows, and likewise of the columns, are as similar as possible to each other.

More precisely, the criterion function that is optimized is

f(P) = ∑_{k=1}^K ∑_{l=1}^K v_kl,

where v_kl is the sum of squared deviations from the mean for block kl. The notation is explained in Table 1.
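As a concrete illustration, the criterion can be computed directly from a binary network and a partition. The following is a minimal Python/NumPy sketch written for this summary (the function name blockmodeling_error is my own, not taken from the paper's software); it excludes the diagonal, matching the no-loops setting of the paper:

```python
import numpy as np

def blockmodeling_error(net, clu, k):
    """Sum of squares criterion f(P): for every block (pair of clusters),
    add the squared deviations of its tie values from the block mean.
    Diagonal cells (loops/self-ties) are excluded."""
    off_diag = ~np.eye(len(net), dtype=bool)   # mask that drops self-ties
    total = 0.0
    for r in range(k):
        for c in range(k):
            block = np.ix_(clu == r, clu == c)
            vals = net[block][off_diag[block]]  # block values without the diagonal
            if vals.size:
                total += ((vals - vals.mean()) ** 2).sum()
    return total
```

For an ideal structure (only complete and null blocks) the criterion is 0; for example, a single tie in an otherwise null block with 4 cells contributes (1 − 1/4)² + 3 · (1/4)² = 0.75.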
Table 1: Nomenclature

n — number of units
U — a set of n units
X — a real or binary matrix with elements x_ij and with dimensions n × n
K — number of clusters
P — a partition of all units, P = {C_1, ..., C_K}, ∪_{i=1}^K C_i = U
Π — a set of all possible partitions
x̄ — the grand mean of X, x̄ = (∑_{i=1}^n ∑_{j=1, j≠i}^n x_ij) / (n(n−1)), assuming no loops (self-ties)
X̄ — a matrix with dimensions K × K with elements x̄_kl representing the mean of the block from cluster C_k to cluster C_l, where x̄_kl = (∑_{i∈C_k} ∑_{j∈C_l \ {i}} x_ij) / (|C_k| · (|C_l| − δ(k,l)))
δ(k,l) — a function that returns 1 if its arguments are equal, and 0 otherwise
V — a matrix with dimensions K × K with elements v_kl = ∑_{i∈C_k} ∑_{j∈C_l \ {i}} (x_ij − x̄_kl)², the within-block sum of squares for the block from cluster C_k to cluster C_l

2.2. Indirect blockmodeling

As mentioned, the indirect approach used is Ward's hierarchical clustering (Ward, 1963) applied to the corrected squared Euclidean distance (Batagelj et al., 1992) since, at least on classical data, this combination is often seen as a (usually inferior) alternative to homogeneity approaches based on local optimization of sum of squares criteria, e.g. the k-means approach (Steinley, 2003). Some other combinations of distance (using the "unsquared" corrected Euclidean distance) and other hierarchical methods (complete, single linkage) are also used for comparison purposes.

This approach is not the focus of this research, but is included to also check whether, in the case of homogeneity criteria, the partitions obtained from indirect approaches are inferior to those from direct ones, as was observed for the criterion function counting the number of inconsistencies (Batagelj et al., 1992).

2.3. Generalized blockmodeling

It was noted that the first direct approach to be used is generalized blockmodeling (Doreian et al., 2005). This is a very flexible approach to blockmodeling, as it can not only cluster units according to one type of equivalence, such as structural (Lorrain & White, 1971) or regular (White & Reitz, 1983) equivalence, but can be used with any equivalence that can be expressed by specifying a set of allowed block types.
While initially designed only for binary one-mode networks (Doreian et al., 1994), it was later extended to various other network types (Doreian et al., 2004; Doreian & Mrvar, 2009). For this paper, the extension to valued networks called sum of squares homogeneity blockmodeling (Žiberna, 2007) is especially important, because it minimizes the within-block sum of squares, which is the criterion used in this paper.

For the purpose of optimizing the sum of squares criterion introduced in Subsection 2.1, this approach basically amounts to using a relocation algorithm. Since this is only a locally-optimal algorithm, the whole procedure is repeated (restarted) with several (typically randomly selected) starting points. The implementation of the relocation algorithm used here involves two possible "relocations":

1. Transfer: transferring a unit from one cluster to another (if the unit is not the only unit in its cluster);
2. Exchange: exchanging two units from different clusters.

The first option (transfer) is always used because it is relatively fast. The second option (exchange), on the other hand, uses much more time. For binary two-mode blockmodeling, it has been shown (Brusco & Steinley, 2011) that it is better to use more restarts (repeat the algorithm with a new random starting point) without exchanges than to try to better optimize each partition using this option.

Another implementation detail is whether all allowed relocations (so either just transfers, or transfers and exchanges) are first evaluated and the partition that improves the criterion function the most is selected as the partition for use in the next iteration. The other possibility is that, as soon as the first partition which improves the criterion function is found, the algorithm moves to this new partition. When this option is used, the order in which the relocations are performed is very important. To prevent bias and allow as diverse a search as possible, the order of relocations was random.¹

2.4. k-means blockmodeling

k-means blockmodeling has its roots in the k-means algorithm (Hartigan, 1975; Hartigan & Wong, 1979; MacQueen, 1967), a well-known algorithm for partitioning units in "classical" data (units by variables) by minimizing the within-cluster sum of squares. The algorithm is locally-optimal. A thorough review is given by e.g. Steinley (2006). The original k-means algorithm therefore partitions one mode of a two-mode valued matrix. The method was first adapted to the two-mode clustering of two-mode matrices or networks (Baier et al., 1997; Brusco & Doreian, 2015a, 2015b; Van Mechelen et al., 2004; van Rosmalen et al., 2009; Vichi, 2001) and recently to the one-mode clustering of one-mode matrices/networks (Žiberna, 2020). This last variant of the algorithm is evaluated in this paper. The k-means algorithm can be initialized either with a random partition or with random centroids. The first variant is relied on here.

¹ For faster performance, the order was not completely random. In each iteration, the program goes through all clusters in random order. Within each cluster, it goes through all units from this cluster in random order. For each unit, the program then goes through all remaining clusters in random order. For each cluster, it first checks if the transfer of the selected unit to this cluster improves the criterion function. If it does not, and if exchanges are allowed and the second cluster has a higher id than the first one (to avoid examining the same exchange twice), it goes through all units in this cluster and evaluates whether the exchange of these two units (from different clusters) improves the criterion function. Whenever an improvement is found, the transfer or exchange is executed and the new partition is taken as the starting partition for the next iteration. All "random orders" are different in each iteration.

2.5. k-means plus relocation

Späth (1980, as cited in Steinley, 2006, p. 3) noted that the results of the original k-means algorithm can sometimes (although presumably rarely in practice) be improved by relocating a single unit from one cluster to another.
A similar result was found for the algorithm for the two-mode partitioning of two-mode networks (Brusco & Steinley, 2007). As sum of squares homogeneity blockmodeling optimizes, using a relocation algorithm, the same criterion function as the k-means approach for one-mode networks, this relocation-based algorithm is used here in an attempt to improve the k-means solution. The fact that the generalized blockmodeling algorithm is relatively slow is not as problematic here, because in most cases optimizing a single good partition is relatively quick.

3. Simulation study design

The aim of this simulation study is to compare approaches to one-mode homogeneity blockmodeling on binary networks. The approaches under comparison, described in the subsections of Section 2, are indirect blockmodeling based on the corrected squared Euclidean distance and Ward's hierarchical clustering, homogeneity sum of squares blockmodeling, and k-means blockmodeling. Given that what mainly differentiates k-means-based algorithms from homogeneity (sum of squares) generalized blockmodeling algorithms (when used with structural equivalence and no pre-specification) is the optimization algorithm, generalized blockmodeling is in this section typically called a relocation-based algorithm.

3.1. Design of the study

In this simulation study, random networks were generated using the following generating procedure. First, the partition of units into the desired number of clusters was randomly generated with the constraint that each cluster had to have at least three units. For each network, a random image matrix (a matrix representing ties among clusters) was generated based on three constraints:

1. Only two types of blocks were generated, named 'null' (very sparse) and 'complete' (less sparse);
2. The expected number of 'complete' blocks in any row (or column) of the image matrix was 1.5 (regardless of the number of clusters);
3. The image matrix had to be such that no two clusters had the same pattern of ties (i.e., no two clusters were structurally equivalent in the image matrix/network).

The final network was generated based on such an image matrix.
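A generating procedure of this kind can be sketched as follows. This is an illustrative Python/NumPy version written for this summary, not the code used in the study; the tie probabilities p_null and p_complete are parameters (the values actually used are stated below), and the exact way the structural-equivalence constraint on the image matrix is enforced here is my own guess:

```python
import numpy as np

def generate_network(n, k, p_null=0.05, p_complete=0.16, seed=1):
    """Sketch of the generating procedure: a random partition with at least
    three units per cluster, a random binary image matrix with on average
    1.5 'complete' blocks per row and no two structurally equivalent
    clusters, then independent Bernoulli ties per block type."""
    rng = np.random.default_rng(seed)
    # Random partition: guarantee 3 units per cluster, assign the rest randomly.
    clu = np.concatenate([np.repeat(np.arange(k), 3),
                          rng.integers(0, k, n - 3 * k)])
    rng.shuffle(clu)
    # Random image matrix; resample until no two clusters share the same
    # pattern of out- and in-ties (not structurally equivalent in the image).
    while True:
        image = rng.random((k, k)) < 1.5 / k   # E[#'complete' per row] = 1.5
        patterns = np.concatenate([image, image.T], axis=1)
        if len({tuple(row) for row in patterns}) == k:
            break
    # Tie probability of each dyad depends on the type of its block.
    p = np.where(image[np.ix_(clu, clu)], p_complete, p_null)
    net = (rng.random((n, n)) < p).astype(int)
    np.fill_diagonal(net, 0)                   # no loops (self-ties)
    return net, clu, image
```

The constraint checks both rows and columns of the image matrix, since two clusters are structurally equivalent in the image only if both their out-tie and in-tie patterns coincide.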
The probability of a tie in 'null' blocks was set to 0.05, while the probability of a tie in 'complete' blocks was set to one of five possible values: 0.08, 0.12, 0.16, 0.20, 0.24. The 'null' block density was chosen based on some real networks, which are usually sparse. The 'complete' block densities were chosen so that the recovery of the correct partition, at least for the best methods in most scenarios, runs from terrible to good. Some examples of generated networks are presented in Figure 1.

The design of the simulation allowed all possible combinations of the factors listed below (their levels) to be tested. For each combination, 10 random networks were generated and analyzed. The factors that varied were (the names in square brackets are used in the figures and tables):

1. total number of units [n]: 200, 400, 800, 1600;
2. the number of clusters [k]: 2, 4, 8, 16;
3. the probability of a tie in 'complete' blocks [Prob. of a tie in com. blocks]: 0.08, 0.12, 0.16, 0.20, 0.24.

Since all combinations of factors were tested and 10 random networks were generated for each combination, 800 (= 4 × 4 × 5 × 10) networks were generated. These networks were then analyzed with the algorithms that all try to minimize the sum of squares criterion, that is, the sum of squared deviations from the block means.

The following methods were used (the names in square brackets are used in the figures and tables):

• indirect approaches, described in Subsection 2.2:
– hierarchical clustering, Ward's method applied to corrected squared Euclidean distances [HclustWard2]
– hierarchical clustering, Ward's method applied to corrected Euclidean distances [HclustWard]
– hierarchical clustering, the complete linkage method applied to corrected Euclidean distances [HclustComp]
– hierarchical clustering, the single linkage method applied to corrected Euclidean distances [HclustSingle]
• relocation-based algorithms: homogeneity sum of squares generalized blockmodeling according to structural equivalence, as described in Subsection 2.3 and implemented in the blockmodelingTest package (Žiberna, 2019a).
The following versions were used:
– with only transfers allowed; the transfers are tried in random order and, as soon as an improvement is found, the algorithm moves to the improved partition [RL moves]
– the same as above, except that both transfers and exchanges were tried [RL moves & ex.]
– with only transfers allowed; all possible transfers are evaluated and the algorithm moves to the partition associated with the biggest improvement [RL moves all]
– the same as above, except that both all possible transfers and all possible exchanges are evaluated [RL moves & ex. all]
• the k-means-based algorithm described in Subsection 2.4, implemented in the package kmBlocks (Žiberna, 2019b) for the R programming language:
– only k-means [k-means]
• the k-means-based algorithm followed by the relocation algorithm described in Subsection 2.5:
– k-means followed by "RL move": 5/6 of the time was allocated to k-means, while the remaining 1/6 was given to "RL move" for optimizing the best partition found by k-means [k-means + RL move]
– k-means followed by "RL move & ex.": 5/6 of the time was allocated to k-means, while the remaining 1/6 was given to "RL move & ex." for optimizing the best partition found by k-means [k-means + RL move & ex.].

In addition, the error of the original partition, namely, the partition used in generating the network, and the error for the partition where all units are in the same (one) cluster, were computed. The results for these partitions are not presented, but were only used to compute one of the performance measures: relative error (see the next subsection).

3.2. Implementation and evaluation measures

The simulation study was run on a desktop computer with an Intel® Core™ i7-7700 3.6 GHz CPU with 16 GB of RAM. All algorithms were run through the R statistical software version
23 3 6 8 9 12 13 14 16 17 21 22 24 26 29 32 33 34 35 41 43 45 46 47 48 49 51 52 53 57 58 59 61 62 64 69 70 71 74 76 77 78 79 82 84 85 87 88 89 91 92 97 98 100 101 105 106 109 110 114 115 118 119 120 122 123 124 125 128 130 132 134 137 138 139 141 147 150 151 152 154 155 156 158 159 160 161 162 163 165 167 168 171 173 178 183 184 187 189 191 192 193 194 197 198 199 200 1 2 4 5 7 10 11 15 18 19 20 23 25 27 28 30 31 36 37 38 39 40 42 44 50 54 55 56 60 63 65 66 67 68 72 73 75 80 81 83 86 90 93 94 95 96 99 102 103 104 107 108 111 112 113 116 117 121 126 127 129 131 133 135 136 140 142 143 144 145 146 148 149 153 157 164 166 169 170 172 174 175 176 177 179 180 181 182 185 186 188 190 195 196 3 6 8 9 12 13 14 16 17 21 22 24 26 29 32 33 34 35 41 43 45 46 47 48 49 51 52 53 57 58 59 61 62 64 69 70 71 74 76 77 78 79 82 84 85 87 88 89 91 92 97 98 100 101 105 106 109 110 114 115 118 119 120 122 123 124 125 128 130 132 134 137 138 139 141 147 150 151 152 154 155 156 158 159 160 161 162 163 165 167 168 171 173 178 183 184 187 189 191 192 193 194 197 198 199 200 1 2 4 5 7 10 11 15 18 19 20 23 25 27 28 30 31 36 37 38 39 40 42 44 50 54 55 56 60 63 65 66 67 68 72 73 75 80 81 83 86 90 93 94 95 96 99 102 103 104 107 108 111 112 113 116 117 121 126 127 129 131 133 135 136 140 142 143 144 145 146 148 149 153 157 164 166 169 170 172 174 175 176 177 179 180 181 182 185 186 188 190 195 196 k = 2, Prob. of a tie in com. 
blocks = 0.08 3 6 8 9 12 13 14 16 17 21 22 24 26 29 32 33 34 35 41 43 45 46 47 48 49 51 52 53 57 58 59 61 62 64 69 70 71 74 76 77 78 79 82 84 85 87 88 89 91 92 97 98 100 101 105 106 109 110 114 115 118 119 120 122 123 124 125 128 130 132 134 137 138 139 141 147 150 151 152 154 155 156 158 159 160 161 162 163 165 167 168 171 173 178 183 184 187 189 191 192 193 194 197 198 199 200 1 2 4 5 7 10 11 15 18 19 20 23 25 27 28 30 31 36 37 38 39 40 42 44 50 54 55 56 60 63 65 66 67 68 72 73 75 80 81 83 86 90 93 94 95 96 99 102 103 104 107 108 111 112 113 116 117 121 126 127 129 131 133 135 136 140 142 143 144 145 146 148 149 153 157 164 166 169 170 172 174 175 176 177 179 180 181 182 185 186 188 190 195 196 3 6 8 9 12 13 14 16 17 21 22 24 26 29 32 33 34 35 41 43 45 46 47 48 49 51 52 53 57 58 59 61 62 64 69 70 71 74 76 77 78 79 82 84 85 87 88 89 91 92 97 98 100 101 105 106 109 110 114 115 118 119 120 122 123 124 125 128 130 132 134 137 138 139 141 147 150 151 152 154 155 156 158 159 160 161 162 163 165 167 168 171 173 178 183 184 187 189 191 192 193 194 197 198 199 200 1 2 4 5 7 10 11 15 18 19 20 23 25 27 28 30 31 36 37 38 39 40 42 44 50 54 55 56 60 63 65 66 67 68 72 73 75 80 81 83 86 90 93 94 95 96 99 102 103 104 107 108 111 112 113 116 117 121 126 127 129 131 133 135 136 140 142 143 144 145 146 148 149 153 157 164 166 169 170 172 174 175 176 177 179 180 181 182 185 186 188 190 195 196 k = 2, Prob. of a tie in com. 
blocks = 0.16 3 6 8 9 12 13 14 16 17 21 22 24 26 29 32 33 34 35 41 43 45 46 47 48 49 51 52 53 57 58 59 61 62 64 69 70 71 74 76 77 78 79 82 84 85 87 88 89 91 92 97 98 100 101 105 106 109 110 114 115 118 119 120 122 123 124 125 128 130 132 134 137 138 139 141 147 150 151 152 154 155 156 158 159 160 161 162 163 165 167 168 171 173 178 183 184 187 189 191 192 193 194 197 198 199 200 1 2 4 5 7 10 11 15 18 19 20 23 25 27 28 30 31 36 37 38 39 40 42 44 50 54 55 56 60 63 65 66 67 68 72 73 75 80 81 83 86 90 93 94 95 96 99 102 103 104 107 108 111 112 113 116 117 121 126 127 129 131 133 135 136 140 142 143 144 145 146 148 149 153 157 164 166 169 170 172 174 175 176 177 179 180 181 182 185 186 188 190 195 196 3 6 8 9 12 13 14 16 17 21 22 24 26 29 32 33 34 35 41 43 45 46 47 48 49 51 52 53 57 58 59 61 62 64 69 70 71 74 76 77 78 79 82 84 85 87 88 89 91 92 97 98 100 101 105 106 109 110 114 115 118 119 120 122 123 124 125 128 130 132 134 137 138 139 141 147 150 151 152 154 155 156 158 159 160 161 162 163 165 167 168 171 173 178 183 184 187 189 191 192 193 194 197 198 199 200 1 2 4 5 7 10 11 15 18 19 20 23 25 27 28 30 31 36 37 38 39 40 42 44 50 54 55 56 60 63 65 66 67 68 72 73 75 80 81 83 86 90 93 94 95 96 99 102 103 104 107 108 111 112 113 116 117 121 126 127 129 131 133 135 136 140 142 143 144 145 146 148 149 153 157 164 166 169 170 172 174 175 176 177 179 180 181 182 185 186 188 190 195 196 k = 2, Prob. of a tie in com. 
blocks = 0.24 4 5 6 9 13 14 25 26 27 33 35 38 41 44 49 54 56 61 62 66 68 76 77 79 81 83 84 90 93 98 106 111 116 120 122 126 142 143 144 147 148 152 155 159 160 163 170 176 183 184 192 193 198 2 3 11 15 17 20 28 30 34 36 42 48 52 59 64 72 75 85 91 100 103 104 113 118 119 121 138 141 145 158 161 162 164 166 169 174 180 187 1 8 16 18 21 24 37 39 40 43 45 50 51 53 63 69 70 71 74 80 89 92 97 101 102 107 110 112 114 115 117 124 129 130 131 133 139 146 150 151 153 154 157 165 175 179 181 185 186 189 190 191 195 197 199 200 7 10 12 19 22 23 29 31 32 46 47 55 57 58 60 65 67 73 78 82 86 87 88 94 95 96 99 105 108 109 123 125 127 128 132 134 135 136 137 140 149 156 167 168 171 172 173 177 178 182 188 194 196 4 5 6 9 13 14 25 26 27 33 35 38 41 44 49 54 56 61 62 66 68 76 77 79 81 83 84 90 93 98 106 111 116 120 122 126 142 143 144 147 148 152 155 159 160 163 170 176 183 184 192 193 198 2 3 11 15 17 20 28 30 34 36 42 48 52 59 64 72 75 85 91 100 103 104 113 118 119 121 138 141 145 158 161 162 164 166 169 174 180 187 1 8 16 18 21 24 37 39 40 43 45 50 51 53 63 69 70 71 74 80 89 92 97 101 102 107 110 112 114 115 117 124 129 130 131 133 139 146 150 151 153 154 157 165 175 179 181 185 186 189 190 191 195 197 199 200 7 10 12 19 22 23 29 31 32 46 47 55 57 58 60 65 67 73 78 82 86 87 88 94 95 96 99 105 108 109 123 125 127 128 132 134 135 136 137 140 149 156 167 168 171 172 173 177 178 182 188 194 196 k = 4, Prob. of a tie in com. 
blocks = 0.08 4 5 6 9 13 14 25 26 27 33 35 38 41 44 49 54 56 61 62 66 68 76 77 79 81 83 84 90 93 98 106 111 116 120 122 126 142 143 144 147 148 152 155 159 160 163 170 176 183 184 192 193 198 2 3 11 15 17 20 28 30 34 36 42 48 52 59 64 72 75 85 91 100 103 104 113 118 119 121 138 141 145 158 161 162 164 166 169 174 180 187 1 8 16 18 21 24 37 39 40 43 45 50 51 53 63 69 70 71 74 80 89 92 97 101 102 107 110 112 114 115 117 124 129 130 131 133 139 146 150 151 153 154 157 165 175 179 181 185 186 189 190 191 195 197 199 200 7 10 12 19 22 23 29 31 32 46 47 55 57 58 60 65 67 73 78 82 86 87 88 94 95 96 99 105 108 109 123 125 127 128 132 134 135 136 137 140 149 156 167 168 171 172 173 177 178 182 188 194 196 4 5 6 9 13 14 25 26 27 33 35 38 41 44 49 54 56 61 62 66 68 76 77 79 81 83 84 90 93 98 106 111 116 120 122 126 142 143 144 147 148 152 155 159 160 163 170 176 183 184 192 193 198 2 3 11 15 17 20 28 30 34 36 42 48 52 59 64 72 75 85 91 100 103 104 113 118 119 121 138 141 145 158 161 162 164 166 169 174 180 187 1 8 16 18 21 24 37 39 40 43 45 50 51 53 63 69 70 71 74 80 89 92 97 101 102 107 110 112 114 115 117 124 129 130 131 133 139 146 150 151 153 154 157 165 175 179 181 185 186 189 190 191 195 197 199 200 7 10 12 19 22 23 29 31 32 46 47 55 57 58 60 65 67 73 78 82 86 87 88 94 95 96 99 105 108 109 123 125 127 128 132 134 135 136 137 140 149 156 167 168 171 172 173 177 178 182 188 194 196 k = 4, Prob. of a tie in com. 
blocks = 0.16 4 5 6 9 13 14 25 26 27 33 35 38 41 44 49 54 56 61 62 66 68 76 77 79 81 83 84 90 93 98 106 111 116 120 122 126 142 143 144 147 148 152 155 159 160 163 170 176 183 184 192 193 198 2 3 11 15 17 20 28 30 34 36 42 48 52 59 64 72 75 85 91 100 103 104 113 118 119 121 138 141 145 158 161 162 164 166 169 174 180 187 1 8 16 18 21 24 37 39 40 43 45 50 51 53 63 69 70 71 74 80 89 92 97 101 102 107 110 112 114 115 117 124 129 130 131 133 139 146 150 151 153 154 157 165 175 179 181 185 186 189 190 191 195 197 199 200 7 10 12 19 22 23 29 31 32 46 47 55 57 58 60 65 67 73 78 82 86 87 88 94 95 96 99 105 108 109 123 125 127 128 132 134 135 136 137 140 149 156 167 168 171 172 173 177 178 182 188 194 196 4 5 6 9 13 14 25 26 27 33 35 38 41 44 49 54 56 61 62 66 68 76 77 79 81 83 84 90 93 98 106 111 116 120 122 126 142 143 144 147 148 152 155 159 160 163 170 176 183 184 192 193 198 2 3 11 15 17 20 28 30 34 36 42 48 52 59 64 72 75 85 91 100 103 104 113 118 119 121 138 141 145 158 161 162 164 166 169 174 180 187 1 8 16 18 21 24 37 39 40 43 45 50 51 53 63 69 70 71 74 80 89 92 97 101 102 107 110 112 114 115 117 124 129 130 131 133 139 146 150 151 153 154 157 165 175 179 181 185 186 189 190 191 195 197 199 200 7 10 12 19 22 23 29 31 32 46 47 55 57 58 60 65 67 73 78 82 86 87 88 94 95 96 99 105 108 109 123 125 127 128 132 134 135 136 137 140 149 156 167 168 171 172 173 177 178 182 188 194 196 k = 4, Prob. of a tie in com. 
blocks = 0.24 8 13 14 31 42 45 50 72 74 94 95 96 115 128 135 136 145 150 158 168 171 190 191 200 4 16 24 27 37 43 55 65 73 90 93 113 114 116 121 126 157 161 175 177 187 192 196 2 15 21 23 44 53 69 76 83 91 102 105 142 143 149 151 160 162 170 172 186 189 7 12 17 19 30 38 39 48 57 60 61 75 77 79 84 86 88 92 101 119 120 123 130 146 154 164 173 178 180 184 193 195 197 1 6 18 20 28 29 33 35 36 58 63 68 78 99 100 104 107 111 112 122 144 153 159 163 165 169 176 185 11 52 56 70 71 97 110 118 132 139 155 166 174 179 3 5 22 26 32 41 49 54 59 62 64 66 67 81 82 85 98 103 106 109 117 127 131 133 137 138 141 147 152 188 194 199 9 10 25 34 40 46 47 51 80 87 89 108 124 125 129 134 140 148 156 167 181 182 183 198 8 13 14 31 42 45 50 72 74 94 95 96 115 128 135 136 145 150 158 168 171 190 191 200 4 16 24 27 37 43 55 65 73 90 93 113 114 116 121 126 157 161 175 177 187 192 196 2 15 21 23 44 53 69 76 83 91 102 105 142 143 149 151 160 162 170 172 186 189 7 12 17 19 30 38 39 48 57 60 61 75 77 79 84 86 88 92 101 119 120 123 130 146 154 164 173 178 180 184 193 195 197 1 6 18 20 28 29 33 35 36 58 63 68 78 99 100 104 107 111 112 122 144 153 159 163 165 169 176 185 11 52 56 70 71 97 110 118 132 139 155 166 174 179 3 5 22 26 32 41 49 54 59 62 64 66 67 81 82 85 98 103 106 109 117 127 131 133 137 138 141 147 152 188 194 199 9 10 25 34 40 46 47 51 80 87 89 108 124 125 129 134 140 148 156 167 181 182 183 198 k = 8, Prob. of a tie in com. 
blocks = 0.08 8 13 14 31 42 45 50 72 74 94 95 96 115 128 135 136 145 150 158 168 171 190 191 200 4 16 24 27 37 43 55 65 73 90 93 113 114 116 121 126 157 161 175 177 187 192 196 2 15 21 23 44 53 69 76 83 91 102 105 142 143 149 151 160 162 170 172 186 189 7 12 17 19 30 38 39 48 57 60 61 75 77 79 84 86 88 92 101 119 120 123 130 146 154 164 173 178 180 184 193 195 197 1 6 18 20 28 29 33 35 36 58 63 68 78 99 100 104 107 111 112 122 144 153 159 163 165 169 176 185 11 52 56 70 71 97 110 118 132 139 155 166 174 179 3 5 22 26 32 41 49 54 59 62 64 66 67 81 82 85 98 103 106 109 117 127 131 133 137 138 141 147 152 188 194 199 9 10 25 34 40 46 47 51 80 87 89 108 124 125 129 134 140 148 156 167 181 182 183 198 8 13 14 31 42 45 50 72 74 94 95 96 115 128 135 136 145 150 158 168 171 190 191 200 4 16 24 27 37 43 55 65 73 90 93 113 114 116 121 126 157 161 175 177 187 192 196 2 15 21 23 44 53 69 76 83 91 102 105 142 143 149 151 160 162 170 172 186 189 7 12 17 19 30 38 39 48 57 60 61 75 77 79 84 86 88 92 101 119 120 123 130 146 154 164 173 178 180 184 193 195 197 1 6 18 20 28 29 33 35 36 58 63 68 78 99 100 104 107 111 112 122 144 153 159 163 165 169 176 185 11 52 56 70 71 97 110 118 132 139 155 166 174 179 3 5 22 26 32 41 49 54 59 62 64 66 67 81 82 85 98 103 106 109 117 127 131 133 137 138 141 147 152 188 194 199 9 10 25 34 40 46 47 51 80 87 89 108 124 125 129 134 140 148 156 167 181 182 183 198 k = 8, Prob. of a tie in com. 
blocks = 0.16 8 13 14 31 42 45 50 72 74 94 95 96 115 128 135 136 145 150 158 168 171 190 191 200 4 16 24 27 37 43 55 65 73 90 93 113 114 116 121 126 157 161 175 177 187 192 196 2 15 21 23 44 53 69 76 83 91 102 105 142 143 149 151 160 162 170 172 186 189 7 12 17 19 30 38 39 48 57 60 61 75 77 79 84 86 88 92 101 119 120 123 130 146 154 164 173 178 180 184 193 195 197 1 6 18 20 28 29 33 35 36 58 63 68 78 99 100 104 107 111 112 122 144 153 159 163 165 169 176 185 11 52 56 70 71 97 110 118 132 139 155 166 174 179 3 5 22 26 32 41 49 54 59 62 64 66 67 81 82 85 98 103 106 109 117 127 131 133 137 138 141 147 152 188 194 199 9 10 25 34 40 46 47 51 80 87 89 108 124 125 129 134 140 148 156 167 181 182 183 198 8 13 14 31 42 45 50 72 74 94 95 96 115 128 135 136 145 150 158 168 171 190 191 200 4 16 24 27 37 43 55 65 73 90 93 113 114 116 121 126 157 161 175 177 187 192 196 2 15 21 23 44 53 69 76 83 91 102 105 142 143 149 151 160 162 170 172 186 189 7 12 17 19 30 38 39 48 57 60 61 75 77 79 84 86 88 92 101 119 120 123 130 146 154 164 173 178 180 184 193 195 197 1 6 18 20 28 29 33 35 36 58 63 68 78 99 100 104 107 111 112 122 144 153 159 163 165 169 176 185 11 52 56 70 71 97 110 118 132 139 155 166 174 179 3 5 22 26 32 41 49 54 59 62 64 66 67 81 82 85 98 103 106 109 117 127 131 133 137 138 141 147 152 188 194 199 9 10 25 34 40 46 47 51 80 87 89 108 124 125 129 134 140 148 156 167 181 182 183 198 k = 8, Prob. of a tie in com. 
blocks = 0.24 19 37 45 65 95 131 139 163 169 179 182 22 31 55 59 63 85 106 112 154 165 199 9 29 39 86 110 121 135 141 143 172 1 3 5 36 123 130 152 181 191 197 10 20 21 38 53 57 71 82 83 90 100 101 103 111 113 125 190 43 109 120 158 162 171 7 15 16 18 24 25 40 44 89 92 96 122 129 144 147 151 168 170 176 185 33 47 74 75 88 133 136 148 153 164 186 8 14 58 66 70 80 102 104 107 115 118 128 142 157 180 189 192 193 200 4 32 35 41 46 73 78 79 98 105 114 124 119 145 175 187 28 52 56 68 91 132 137 140 150 160 167 196 198 13 17 49 54 67 69 72 76 94 126 149 166 195 12 23 27 42 50 61 64 97 108 116 146 173 178 183 184 2 11 34 51 60 62 81 87 117 134 138 156 159 161 174 177 194 6 26 30 48 77 84 93 99 127 155 188 19 37 45 65 95 131 139 163 169 179 182 22 31 55 59 63 85 106 112 154 165 199 9 29 39 86 110 121 135 141 143 172 1 3 5 36 123 130 152 181 191 197 10 20 21 38 53 57 71 82 83 90 100 101 103 111 113 125 190 43 109 120 158 162 171 7 15 16 18 24 25 40 44 89 92 96 122 129 144 147 151 168 170 176 185 33 47 74 75 88 133 136 148 153 164 186 8 14 58 66 70 80 102 104 107 115 118 128 142 157 180 189 192 193 200 4 32 35 41 46 73 78 79 98 105 114 124 119 145 175 187 28 52 56 68 91 132 137 140 150 160 167 196 198 13 17 49 54 67 69 72 76 94 126 149 166 195 12 23 27 42 50 61 64 97 108 116 146 173 178 183 184 2 11 34 51 60 62 81 87 117 134 138 156 159 161 174 177 194 6 26 30 48 77 84 93 99 127 155 188 k = 16, Prob. of a tie in com. 
blocks = 0.08 19 37 45 65 95 131 139 163 169 179 182 22 31 55 59 63 85 106 112 154 165 199 9 29 39 86 110 121 135 141 143 172 1 3 5 36 123 130 152 181 191 197 10 20 21 38 53 57 71 82 83 90 100 101 103 111 113 125 190 43 109 120 158 162 171 7 15 16 18 24 25 40 44 89 92 96 122 129 144 147 151 168 170 176 185 33 47 74 75 88 133 136 148 153 164 186 8 14 58 66 70 80 102 104 107 115 118 128 142 157 180 189 192 193 200 4 32 35 41 46 73 78 79 98 105 114 124 119 145 175 187 28 52 56 68 91 132 137 140 150 160 167 196 198 13 17 49 54 67 69 72 76 94 126 149 166 195 12 23 27 42 50 61 64 97 108 116 146 173 178 183 184 2 11 34 51 60 62 81 87 117 134 138 156 159 161 174 177 194 6 26 30 48 77 84 93 99 127 155 188 19 37 45 65 95 131 139 163 169 179 182 22 31 55 59 63 85 106 112 154 165 199 9 29 39 86 110 121 135 141 143 172 1 3 5 36 123 130 152 181 191 197 10 20 21 38 53 57 71 82 83 90 100 101 103 111 113 125 190 43 109 120 158 162 171 7 15 16 18 24 25 40 44 89 92 96 122 129 144 147 151 168 170 176 185 33 47 74 75 88 133 136 148 153 164 186 8 14 58 66 70 80 102 104 107 115 118 128 142 157 180 189 192 193 200 4 32 35 41 46 73 78 79 98 105 114 124 119 145 175 187 28 52 56 68 91 132 137 140 150 160 167 196 198 13 17 49 54 67 69 72 76 94 126 149 166 195 12 23 27 42 50 61 64 97 108 116 146 173 178 183 184 2 11 34 51 60 62 81 87 117 134 138 156 159 161 174 177 194 6 26 30 48 77 84 93 99 127 155 188 k = 16, Prob. of a tie in com. 
Figure 1: Examples of a generated network based on the original partition for networks with 200 units (n = 200), with all possible numbers of clusters (k = 2, 4, 8, 16) and with the probability of a tie in 'complete' blocks [Prob. of a tie in com. blocks] equaling 0.08, 0.16, or 0.24. The image matrix is the same for all densities with a given number of clusters.

The analyses were run in R version 3.5.3 (64-bit version). The indirect approach was conducted by first computing the corrected Euclidean distance using the sedist() function from the blockmodelingTest R package (Žiberna, 2019a).
On this, the hierarchical clustering was computed with the hclust() function (method ward.D2) from the stats package (R Core Team, 2019).²

For the relocation algorithm (homogeneity generalized blockmodeling), the implementation in the blockmodelingTest R package version 0.3.5.9000 was used (Žiberna, 2019a). The most time-consuming parts of the algorithm (refinement of the partitions) were programmed in C. For the k-means algorithm, the kmBlock package was used (Žiberna, 2019b). The same part (refinement of partitions) was programmed in C++ using the Rcpp (Eddelbuettel & Francois, 2011) and RcppArmadillo (Eddelbuettel & Sanderson, 2014) packages. The implementation of the other parts (generating random partitions, choosing the best result, etc.) was written in R and was practically the same for both approaches, to make the results of these two approaches as comparable as possible. It should be mentioned that the focus was more on the comparability of these two implementations (relocation algorithm and k-means) than on the efficiency of the two algorithms.

The analysis of all networks with all algorithms for a single combination of factors was run on a single thread, although multiple threads were used to run simulations for several factor combinations simultaneously.

Each iterative method was allocated at most 3 minutes for analyzing each network. The iterative methods (k-means and generalized blockmodeling) were implemented so that they finished after the time limit had been reached. Accordingly, if the time limit was reached, the refinement stopped and the current partition (and error) were saved together with those of already finished restarts (if any).³ The methods "k-means + RL move" and "k-means + RL move & ex." often finished (substantially) sooner because the second phase (relocation algorithm) finished much faster than allowed.

This excludes the indirect approach, which is not an iterative one and where the execution time was not limited. The results show this approach needed much less time on networks with 800 units or less, although it also needed about 5 minutes on networks with 1600 units.
However, since this method is not the main focus of the paper, this is not regarded as problematic.

The results of the methods were assessed based on two main measures: error (sum of squared deviations from the block means) and the Adjusted Rand Index (Hubert & Arabie, 1985). Especially in graphs (to make scales more comparable), relative error is used instead of the original error. Some other measures based on these two (for ease of interpretation) and some auxiliary statistics were also computed or recorded. The following statistics were therefore saved (the names in square brackets indicate the names used in the supplementary materials):

• Adjusted Rand Index [ARI]

² The ward.D method was also tested. ward.D2 internally squares the distances and is therefore equivalent to using squared Euclidean distances, while ward.D uses original Euclidean distances. The results were very similar. The single linkage and complete linkage methods were also tested but, as expected, their results were inferior and are therefore not presented.

³ As the check for the time limit is only performed at certain points in the code, it could happen that the total time was exceeded. In this case, the results of the last restart were used with probability proportional to the time left for execution before the last restart relative to the time used by the last restart. This was implemented so that methods with faster restarts would not be favored (for bigger problems). In rare cases where this resulted in no restarts being used, the worst-case scenario, that is, when we assume only a single cluster, was taken as the solution for this method.

• Is the method the best method for a given network based on the ARI? [Is best ARI]
• Error [Error] – sum of squared deviations from the block means
• Relative error [Relative error]. The relative error compares the error of each method with the error of the original partition (used to generate the network) and the error that we would obtain if we had assumed only one cluster, so that the whole network would represent one block (all reasonable methods should do better than that).
It represents how much the error of a given method is worse than that of the original partition, relative to how much the error of the original is better than that in the case of only one cluster. It is computed as:

relative error = (err − err_org) / (err_one − err_org)

where err is the error of the evaluated method, err_org is the error of the original partition, and err_one is the error in the case of only one cluster. Lower values are of course better, although values below 0 might be interpreted as overfitting.
• Is the method the best method for a given network based on the error? [Is best error]
• Number of restarts achieved⁴ [Number of restarts]
• Elapsed time [Elapsed time]; the time used by the algorithm.

4. Results

Due to the magnitude of the results, not all are presented in this paper, although all results may be found in the interactive table and chart in the supplementary materials. In the paper, only the results for two measures, relative error and the Adjusted Rand Index, are presented.

For better readability, solely the results for a selection of methods are presented. At least one method is presented for each group of methods (indirect approaches, generalized blockmodeling/relocation-based, k-means, combination). Among the indirect approaches, only the results for Ward's hierarchical clustering based on corrected squared Euclidean distance are presented. Here the selection was mainly based on theory (see Subsection 2.2). The results of using Ward's method on "unsquared" corrected Euclidean distance were in some cases slightly better, yet the difference is negligible. Other versions of the indirect approach performed worse.

In addition, the versions of the relocation algorithm where all possible transfers, or transfers and exchanges, are evaluated before selecting the best one ("RL move all" and "RL move & ex. all") are omitted. As may be seen in the supplementary materials, these methods practically never performed better than the other versions of the relocation algorithm and usually (especially in the case of larger networks) performed much worse.

The results for mean relative error are presented in Figure 2, while the results for the mean Adjusted Rand Index (ARI) are shown in Figure 3.
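As a concrete illustration of the two error measures just defined, the following is a minimal Python sketch (not the implementation used in the study; the function names are chosen here for illustration, `A` is an adjacency matrix, and `partition` is a vector of cluster labels; self-ties are included for brevity, although the simulated networks have no loops):

```python
import numpy as np

def blockmodel_error(A, partition):
    """Sum of squared deviations from the block means (the optimized criterion)."""
    A = np.asarray(A, dtype=float)
    partition = np.asarray(partition)
    err = 0.0
    for r in np.unique(partition):
        for c in np.unique(partition):
            # block of (possible) ties from cluster r to cluster c
            block = A[np.ix_(partition == r, partition == c)]
            if block.size:
                err += np.sum((block - block.mean()) ** 2)
    return err

def relative_error(err, err_org, err_one):
    """(err - err_org) / (err_one - err_org), as defined above."""
    return (err - err_org) / (err_one - err_org)
```

For example, for the 2 × 2 network [[0, 1], [1, 0]] with each unit in its own cluster, every block is a single cell and the error is 0; with a single cluster the error is 1, so the two-cluster partition has a relative error of 0.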
Based on the relative error (Figure 2), we may conclude that in most cases studied with only 2 groups the difference among the various approaches is minimal. For most other cases, hierarchical clustering is shown to perform worse than the other methods. The difference is in most cases bigger when there is a lower probability of a tie in the 'complete' blocks. However, in the case of a large number of units (800 and especially 1600) and a relatively high probability of a tie in the 'complete' blocks, the hierarchical approach performs better than the relocation algorithm,⁵ while the k-means-based approaches are clearly superior.

As the number of units rises, the differences between the approaches that include the k-means algorithm and those using only the relocation algorithm increase, where the algorithms which include the k-means algorithm are the clear winners. The differences are slightly bigger as the probability of a tie in the 'complete' blocks increases. Only in cases with a relatively small number of units and a large number of clusters do the approaches based on the relocation algorithm perform slightly better than those based on k-means. These are also the cases where using the relocation algorithm after k-means is obviously beneficial. In most other cases, no clear suggestions regarding this can be given.

⁴ The last restart is taken into account (if not deleted as explained in the previous footnote) even if it ended prematurely.
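The "RL move" strategy compared above can be illustrated as a first-improvement local search: single units are moved between clusters, and a move is kept as soon as it lowers the sum-of-squares criterion. The following is a simplified sketch under those assumptions, not the blockmodelingTest implementation (moves that would empty a cluster are simply skipped, and the function names are illustrative):

```python
import numpy as np

def block_ss_error(A, part):
    """Sum of squared deviations from the block means."""
    err = 0.0
    for r in np.unique(part):
        for c in np.unique(part):
            block = A[np.ix_(part == r, part == c)]
            if block.size:
                err += np.sum((block - block.mean()) ** 2)
    return err

def relocate_moves(A, part, max_sweeps=100):
    """First-improvement relocation: move single units between clusters,
    keeping a move as soon as it lowers the criterion."""
    A = np.asarray(A, dtype=float)
    part = np.asarray(part).copy()
    best = block_ss_error(A, part)
    for _ in range(max_sweeps):
        improved = False
        for i in range(len(part)):
            if np.sum(part == part[i]) == 1:
                continue  # do not empty a cluster
            old = part[i]
            for g in np.unique(part):
                if g == old:
                    continue
                part[i] = g
                err = block_ss_error(A, part)
                if err < best - 1e-12:
                    best, improved = err, True
                    break  # keep the first improving move
                part[i] = old  # undo the move
        if not improved:
            break  # local optimum reached
    return part, best
```

On a small planted two-block network, starting from a partition with one misplaced unit, a single sweep is enough to recover the planted structure with zero error.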
Figure 2: Values of the mean relative error for the different methods. Each line represents the results of a single method. The panels are determined by the values of the variables number of groups (k) and number of units (n) indicated (variables and their values) in the gray rectangles on the top and right sides of the figure.

In some cases, we can observe that the relative error is lower than 0, indicating that the algorithms found a partition with an error lower than the error of the partition that was used to generate the network. This is a case of overfitting and happens where there is not enough data to recover the true structure. That is, this occurs in cases where the difference between the probability of a tie in 'null' blocks and that in 'complete' blocks is too low compared to the expected size of the blocks. As may also be observed in Figure 1, random tie generation causes block densities to deviate from the probability of a tie in a given block.

Another interesting observation is that while for the relocation algorithms the relative error more often than not increases as the number of units increases, this is not the case with the k-means-based methods. Larger networks on the one hand mean larger blocks which are, especially at the relatively low densities used in this simulation, easier to find, while on the other hand they also increase the computational burden. In settings with 800 units or more, k-means-based algorithms clearly outperform the relocation algorithms.

⁵ It should however be noted that on networks with 1600 units the hierarchical approach on average needed 5 minutes to complete, which is more time than was given to the other approaches.
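For comparison with the relocation strategy, one assignment sweep of a k-means-like blockmodeling algorithm in the spirit of Žiberna (2020) can be sketched as follows: the block means (the image matrix) are computed for the current partition, and each unit is then reassigned to the cluster whose rows and columns of block means best fit its outgoing and incoming ties. This is an illustrative sketch under those assumptions, not the kmBlock implementation, and the function names are hypothetical:

```python
import numpy as np

def image_matrix(A, part, k):
    """Block means (densities) for clusters labeled 0..k-1."""
    M = np.zeros((k, k))
    for r in range(k):
        for c in range(k):
            block = A[np.ix_(part == r, part == c)]
            if block.size:
                M[r, c] = block.mean()
    return M

def kmeans_assignment_sweep(A, part, k):
    """Reassign each unit to the cluster whose block means best fit
    its outgoing and incoming ties (one k-means-like sweep)."""
    A = np.asarray(A, dtype=float)
    part = np.asarray(part)
    M = image_matrix(A, part, k)
    new = part.copy()
    for i in range(len(part)):
        costs = np.empty(k)
        for g in range(k):
            # squared deviation of unit i's row ties from cluster g's row of block means
            out_cost = sum(np.sum((A[i, part == c] - M[g, c]) ** 2) for c in range(k))
            # and of its column ties from cluster g's column of block means
            in_cost = sum(np.sum((A[part == r, i] - M[r, g]) ** 2) for r in range(k))
            costs[g] = out_cost + in_cost
        new[i] = int(np.argmin(costs))
    return new
```

On the same planted two-block example as above, a single sweep moves a misplaced unit back to its planted cluster. In a full algorithm, sweeps would be repeated (recomputing the image matrix each time) until the partition stops changing.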
Figure 3: Values of the mean Adjusted Rand Index for the different methods. Each line represents the results of a single method. The panels are determined by the values of the variables number of groups (k) and number of units (n) indicated (variables and their values) in the gray rectangles on the top and right sides of the figure.

Results based on the mean Adjusted Rand Index (Figure 3) are even more favorable for the k-means-based approaches compared with the relocation algorithm approaches. In almost all cases, one of the k-means-based approaches is the best and the differences among them are negligible. Only in large k, small n cases (a large number of clusters and a small number of units) are the approaches using the relocation algorithm slightly better; however, the ARI is relatively low here for all methods. As seen in the previous graph, the indirect approach performs well only in the case of a large number of units (800 and especially 1600) and a relatively high probability of a tie in the 'complete' blocks. As expected, for all methods the ARI rises as the probability of a tie in 'complete' blocks rises and when the number of clusters decreases (due to larger blocks). The effect of network size on the ARI depends on the methods used. In general, it has a positive effect on the k-means-based approaches and the indirect approach, and generally a negative effect on the relocation-based approaches (with some exceptions).
As discussed above with respect to relative error, larger networks mean larger blocks which are, especially at the relatively low densities used in this simulation, easier to find, yet they also increase the computational burden. It seems that k-means-based approaches are fast enough to exploit the advantages of larger networks (of the sizes analyzed here), that is, they are not hampered that much by the greater computational burden.

5. Conclusion

One can find several methods for one-mode blockmodeling of networks based on the minimum sum of squares criterion. This paper aimed to compare these methods using a simulation. The main aim was to compare approaches based on the relocation algorithm (homogeneity generalized blockmodeling (Žiberna, 2007)) and its variants with the k-means algorithm (Žiberna, 2020) and an approach where the relocation algorithm is used to optimize the partition produced by k-means. In addition, an indirect approach was considered.

The overall finding is that in most cases, the k-means-based approaches, namely either k-means alone or k-means followed by the relocation algorithm, perform best both in terms of the optimized criterion and the similarity between the original partition (the one used for generating the network) and the one found by the algorithm. In addition, this approach is never much worse than the best approach and thus generally seems to be the safest choice. The difference is negligible for networks with 400 units or fewer, but becomes very big for large networks (1600 units), especially when there are 4 clusters or more (in the case of large networks and a clear structure, the hierarchical approach can produce similar results as the k-means-based approaches). Only in the case of relatively small networks and a large number of clusters do pure relocation algorithms usually perform better. As expected, the use of indirect approaches is never the best choice, at least not in terms of the optimized criterion.

The results on whether it is better to use just k-means or k-means followed by relocation algorithms are inconclusive.
The latter approach is clearly superior only for small networks with a relatively large number of clusters, where, however, the use of the pure relocation algorithm is generally preferred.

With respect to relocation approaches, the use of versions of the algorithm that move to a new partition as soon as an improvement is found is recommended, because evaluating all possible transfers (or transfers and exchanges) is prohibitive for larger networks.

Of course, this study has its own limitations, mainly stemming from the fact that only binary networks of selected sizes (from 200 to 1600 units), numbers of clusters (2 to 16) and densities were evaluated. The study also assumed that the number of clusters is known. A number of unsettled questions remain that could be answered using simulation studies, for example, ways of determining the correct number of clusters and the best way of initializing the k-means algorithm, to name just a few.

References

Anderson, C. J., Wasserman, S., & Faust, K. (1992). Building stochastic blockmodels. Social Networks, 14(1–2), 137–161. https://doi.org/10.1016/0378-8733(92)90017-2

Baier, D., Gaul, W., & Schader, M. (1997). Two-mode overlapping clustering with applications to simultaneous benefit segmentation and market structuring. In Classification and knowledge organization (pp. 557–566). Springer. https://doi.org/10.1007/978-3-642-59051-1_58

Batagelj, V., Ferligoj, A., & Doreian, P. (1992). Direct and indirect methods for structural equivalence. Social Networks, 14(1–2), 63–90. https://doi.org/10.1016/0378-8733(92)90014-X

Borgatti, S. P., & Everett, M. G. (1992). Regular blockmodels of multiway, multimode matrices. Social Networks, 14(1–2), 91–120. https://doi.org/10.1016/0378-8733(92)90015-Y

Breiger, R. L., Boorman, S. A., & Arabie, P. (1975). An algorithm for clustering relational data with applications to social network analysis and comparison with multidimensional scaling. Journal of Mathematical Psychology, 12(3), 328–383. https://doi.org/10.1016/0022-2496(75)90028-0

Brusco, M. J., & Doreian, P. (2015a).
An exact algorithm for the two-mode KL-means partitioning problem. Journal of Classification, 32(3), 481–515. https://doi.org/10.1007/s00357-015-9185-z

Brusco, M. J., & Doreian, P. (2015b). A real-coded genetic algorithm for two-mode KL-means partitioning with application to homogeneity blockmodeling. Social Networks, 41, 26–35. https://doi.org/10.1016/j.socnet.2014.11.007

Brusco, M. J., & Steinley, D. (2007). A variable neighborhood search method for generalized blockmodeling of two-mode binary matrices. Journal of Mathematical Psychology, 51, 325–338. https://doi.org/10.1016/j.jmp.2007.07.001

Brusco, M. J., & Steinley, D. (2011). A tabu-search heuristic for deterministic two-mode blockmodeling of binary network matrices. Psychometrika, 76(4), 612–633. https://doi.org/10.1007/s11336-011-9221-9

Burt, R. S. (1976). Positions in networks. Social Forces, 55(1), 93–122. https://doi.org/10.2307/2577097

Doreian, P., Batagelj, V., & Ferligoj, A. (1994). Partitioning networks based on generalized concepts of equivalence. The Journal of Mathematical Sociology, 19(1), 1–27. https://doi.org/10.1080/0022250X.1994.9990133

Doreian, P., Batagelj, V., & Ferligoj, A. (2004). Generalized blockmodeling of two-mode network data. Social Networks, 26, 29–53. https://doi.org/10.1016/j.socnet.2004.01.002

Doreian, P., Batagelj, V., & Ferligoj, A. (2005). Generalized blockmodeling. Cambridge University Press.

Doreian, P., & Mrvar, A. (2009). Partitioning signed social networks. Social Networks, 31(1), 1–11. https://doi.org/10.1016/j.socnet.2008.08.001

Eddelbuettel, D., & Francois, R. (2011). Rcpp: Seamless R and C++ integration. Journal of Statistical Software, 40(8), 1–18. https://doi.org/10.18637/jss.v040.i08

Eddelbuettel, D., & Sanderson, C. (2014). RcppArmadillo: Accelerating R with high-performance C++ linear algebra. Computational Statistics & Data Analysis, 71, 1054–1063. https://doi.org/10.1016/j.csda.2013.02.005

Hartigan, J. A. (1975). Clustering algorithms. Wiley.

Hartigan, J. A., & Wong, M. A. (1979). Algorithm AS 136: A K-means clustering algorithm. Journal of the Royal Statistical Society.
Series C (Applied Statistics), 28(1), 100–108. https://doi.org/10.2307/2346830

Holland, P. W., Laskey, K. B., & Leinhardt, S. (1983). Stochastic blockmodels: First steps. Social Networks, 5(2), 109–137. https://doi.org/10.1016/0378-8733(83)90021-7

Hubert, L., & Arabie, P. (1985). Comparing partitions. Journal of Classification, 2(1), 193–218. https://doi.org/10.1007/BF01908075

Lorrain, F., & White, H. C. (1971). Structural equivalence of individuals in social networks. The Journal of Mathematical Sociology, 1(1), 49–80. https://doi.org/10.1080/0022250X.1971.9989788

MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. In L. LeCam & J. Neyman (Eds.), Proceedings of the Fifth Berkeley symposium on mathematical statistics and probability (pp. 281–297).

R Core Team. (2019). R: A language and environment for statistical computing (Version 3.5.3) [Computer software]. R Foundation for Statistical Computing. https://www.r-project.org/

Snijders, T. A. B., & Nowicki, K. (1997). Estimation and prediction for stochastic blockmodels for graphs with latent block structure. Journal of Classification, 14, 75–100.

Steinley, D. (2003). Local optima in K-means clustering: What you don't know may hurt you. Psychological Methods, 8(3), 294–304. https://doi.org/10.1037/1082-989X.8.3.294

Steinley, D. (2006). K-means clustering: A half-century synthesis. British Journal of Mathematical and Statistical Psychology, 59(1), 1–34. https://doi.org/10.1348/000711005X48266

Van Mechelen, I., Bock, H.-H., & De Boeck, P. (2004). Two-mode clustering methods: A structured overview. Statistical Methods in Medical Research, 13(5), 363–394.

van Rosmalen, J., Groenen, P. J. F., Trejos, J., & Castillo, W. (2009). Optimization strategies for two-mode partitioning. Journal of Classification, 26, 155–181. https://doi.org/10.1007/s00357-009-9031-2

Vichi, M. (2001). Double k-means clustering for simultaneous classification of objects and variables. In S. Borra, R. Rocci, M. Vichi, & M.
Schader (Eds.), Advances in classification and data analysis (pp. 43–52). Springer. https://doi.org/10.1007/978-3-642-59471-7_6

Ward, J. H. (1963). Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301), 236–244. https://doi.org/10.2307/2282967

White, D. R., & Reitz, K. P. (1983). Graph and semigroup homomorphisms on networks of relations. Social Networks, 5(2), 193–234. https://doi.org/10.1016/0378-8733(83)90025-4

Žiberna, A. (2007). Generalized blockmodeling of valued networks. Social Networks, 29, 105–126. https://doi.org/10.1016/j.socnet.2006.04.002

Žiberna, A. (2019a). blockmodelingTest: An R package for generalized and classical blockmodeling of valued networks (Version 0.3.5.9000) [Computer software]. R-Forge. https://rdrr.io/rforge/blockmodelingTest/

Žiberna, A. (2019b). kmBlock: k-means like blockmodeling of one-mode and linked networks (Version 0.0.1.9102) [Computer software]. R-Forge. https://rdrr.io/rforge/kmBlock/

Žiberna, A. (2020). k-means-based algorithm for blockmodeling linked networks. Social Networks, 61, 153–169. https://doi.org/10.1016/j.socnet.2019.10.006