Metodološki zvezki, Vol. 17, No. 2, 2020, 49–66
blockmodeling: An R package for generalized blockmodeling
Miha Matjašič, Marjan Cugmas, Aleš Žiberna∗
University of Ljubljana, Faculty of Social Sciences, Ljubljana, Slovenia
Abstract
This paper presents the R package blockmodeling, which is primarily meant as an implementation of generalized blockmodeling (more broadly, blockmodeling) for valued networks where the values of the ties are assumed to be measured on at least an interval scale. Blockmodeling is one of the most commonly used approaches in the analysis of (social) networks, which deals with the analysis of relationships, or connections, between the units studied (e.g., people, organizations, journals). The R package blockmodeling implements several approaches for the generalized blockmodeling of binary and valued networks. Generalized blockmodeling is commonly used to cluster nodes in a network with regard to the structure of their links. The theoretical foundations of generalized blockmodeling for binary and valued networks are summarized in the paper, while the use of the R package blockmodeling is illustrated by applying it to an empirical data set.
1. Introduction
The aim of this paper is to present the R package blockmodeling, which is primarily meant as an implementation of generalized blockmodeling for valued networks.
A network is defined by a set of nodes (also called vertices, units or actors) and a set of links among the nodes. These two sets determine a graph that describes the network's structure. In the social sciences, for example, the nodes often represent individuals and the links among them represent a selected (social) relationship among individuals. Additional data can be assigned to the nodes (e.g., gender or age) and to the links (e.g., the number of contacts) to describe their properties (also called attributes) (Batagelj et al., 2004).
Since real-world networks may be large and complex, researchers try to simplify them to smaller and more understandable structures that are easier to interpret.
A common way of accomplishing this goal is a blockmodeling approach, which partitions the nodes of a network and determines the ties among the (obtained) clusters of nodes (Batagelj et al., 2004). In the social sciences, blockmodeling is also a very important explanatory tool for studying social roles because it is assumed that the way a cluster of nodes is embedded in the network structure is closely associated with the nodes' social role(s) (Borgatti and Everett, 1992).
∗ Corresponding author. Email addresses: miha.matjasic@fdv.uni-lj.si (Miha Matjašič), marjan.cugmas@fdv.uni-lj.si (Marjan Cugmas), ales.ziberna@fdv.uni-lj.si (Aleš Žiberna)
While blockmodeling may entail several different methods, the focus of this paper is on the generalized blockmodeling of binary and valued networks using the blockmodeling package for the R programming language (R Core Team, 2018).
The structure of this paper is as follows: in Section 2, we describe blockmodeling; in Section 3, we describe the R package blockmodeling; in Section 4, we provide examples of the package's use; and in Section 5, we summarise the main functionalities of the package.
2. Blockmodeling
Blockmodeling is a set of approaches for partitioning nodes into clusters (also called positions) and simultaneously partitioning the links into blocks, which are defined by the obtained clusters (Lorrain and White, 1971; Batagelj et al., 2004; Žiberna, 2007). A block is a submatrix showing the links between nodes from the same or different clusters.
The concept of blockmodeling is presented in Figure 1, where the illustrative valued network is shown both in matrix form and as a graph (Figure 1a). The units are ordered in rows and columns according to their names (n1, n2, n3, ...). The units are then partitioned by considering the weights, such that units with similar patterns of links are placed in the same clusters.
The network is then represented in matrix form in accordance with the clusters obtained, so that units from the same cluster are placed next to each other and different clusters are separated by blue lines (Figure 1b). In the corresponding graph visualization, nodes of the same cluster are drawn in the same colour. Then the nodes from the same clusters are collapsed and represented as nodes of a blockmodel, which is shown both as a matrix and as a graph (Figure 1c). The matrix contains the block densities, which summarize the strength of the relationships within and between the clusters. It can be seen that one core cluster and two cohesive clusters were identified. The core cluster is linked with both cohesive clusters (and vice versa), whereas the cohesive clusters are not linked to each other.
The nodes are clustered according to some notion of equivalence (Wasserman and Faust, 1994). The most commonly used are structural equivalence (Lorrain and White, 1971) and regular equivalence (White and Reitz, 1983), both originally defined for binary networks (Žiberna, 2007). Two nodes are structurally equivalent if they are identically linked to the rest of the network (and to themselves), while nodes are regularly equivalent if they are connected in the same way to equivalent others. Regular equivalence is a generalization of structural equivalence. When analysing valued networks, regular equivalence should be replaced by f-regular equivalence, where f refers to any function, such as sum, max or mean (Žiberna, 2007).
In practice, structural equivalence is probably the most commonly used type of equivalence (Žnidaršič, Ferligoj and Doreian, 2012). At the same time, regular equivalence has never achieved widespread use (Žiberna, 2013), especially because it is rarely present in empirical data (Boyd and Jonas, 2001) and is very sensitive to small changes in the network (Žnidaršič, Ferligoj and Doreian, 2012).
Concerns have also been voiced about the applicability of regular equivalence to social theory (Boyd, 2002).
In terms of generalized blockmodeling, a chosen type of equivalence defines the possible block types (and vice versa, i.e., the allowed block types in generalized blockmodeling imply the type of equivalence). For example, when binary networks are analysed and structural equivalence is used, only null (ideally there are no links) and complete (ideally there are all possible links) blocks are possible, while with regular equivalence null, complete and regular
Table 1: Characterisations of ideal blocks (Žiberna, 2007)
Ideal block | Binary blockmodeling | Valued blockmodeling | Homogeneity blockmodeling
null | all 0 (a) | all 0 (b) | all 0 (c)
complete | all 1 (d) | all values at least m (d) | all equal (e)
row-dominant | an all-1 row exists (d) | a row where all values are at least m exists (d) | a row exists where all values are equal (c)
col-dominant | an all-1 column exists (d) | a column where all values are at least m exists (d) | a column exists where all values are equal (c)
row(-f)-regular | at least one 1 in each row (d) | the f over each row is at least m (d) | f over all rows equal
column(-f)-regular | at least one 1 in each column | the f over each column is at least m | f over all columns equal
(f-)regular | at least one 1 in each row and each column | the f over each row and each column is at least m | f over all rows and over all columns separately equal
row-functional | exactly one 1 in each row | exactly one tie with value at least m in each row, all others 0 | max over all rows equal, all other values 0
column-functional | exactly one 1 in each column | exactly one tie with value at least m in each column, all others 0 | max over all columns equal, all other values 0
(a) An exception may be cells on the diagonal; their values should all be equal to 1.
(b) An exception may be cells on the diagonal; their values should all be at least m.
(c) An exception may be cells on the diagonal; their values should be equal.
(d) An exception may be cells on the diagonal; their values should all be equal to 0.
(e) Cells on the diagonal may be treated separately: their values should all be equal, but they may differ from the values of the off-diagonal cells.
of the direct blockmodeling approach. Use of the blockmodeling package is demonstrated for (direct) generalized blockmodeling only.
Some non-generalized direct blockmodeling approaches (for unsigned and signed networks) are implemented in the dBlockmodeling package (Brusco, 2020), while generalized blockmodeling for binary networks and some direct approaches for signed networks are implemented in Pajek (Batagelj et al., 2004).
2.1.1. Conventional blockmodeling
Conventional blockmodeling (Doreian et al., 2005) is an indirect approach involving two steps: (i) obtaining a dissimilarity matrix on the nodes using a dissimilarity measure that is consistent with the type of equivalence selected (e.g., the corrected Euclidean distance (Batagelj, Ferligoj and Doreian, 1992) for structural equivalence); and (ii) clustering the nodes with a hierarchical clustering method (e.g., Ward's agglomerative clustering method (Ward, 1963)) based on the obtained dissimilarity matrix. Since the second step is well supported by other R packages, the blockmodeling package only provides functions for computing (dis)similarity matrices according to structural equivalence (the sedist function) and regular equivalence (the REGE function and other functions).
2.1.2. Generalized blockmodeling
With generalized blockmodeling, a blockmodel is obtained directly from the network data by optimizing a criterion function, typically with a relocation algorithm (Batagelj et al., 1992). Different types of equivalences and/or block types can be specified.
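To make the binary and valued columns of Table 1 more concrete, the following base-R sketch computes the inconsistency of a single block with the ideal null and complete blocks. It is a simplified illustration, not the package's implementation: the function names are hypothetical, tie values are assumed to be non-negative, and the special treatment of diagonal cells described in the table footnotes is ignored. In generalized blockmodeling, block inconsistencies of this kind are summed over all blocks to form the criterion function that is optimized.

```r
# Hypothetical helpers illustrating the binary and valued columns of
# Table 1 for null and complete blocks. Diagonal cells are treated like
# any other cell, i.e. the exceptions in footnotes a, b and d are ignored.

# Binary blockmodeling: count the cells that deviate from the ideal block.
bin_null <- function(B) sum(B != 0)  # a null block should contain only 0s
bin_com  <- function(B) sum(B != 1)  # a complete block should contain only 1s

# Valued blockmodeling (non-negative tie values assumed): deviations from 0
# for a null block, and shortfalls below m for a complete block.
val_null <- function(B) sum(B)
val_com  <- function(B, m) sum(pmax(0, m - B))

# A block of citation counts with two weak and two strong ties,
# and its binarized version (values > 0 recoded to 1).
B  <- matrix(c(1, 1, 20, 20), nrow = 2)
Bb <- (B > 0) * 1

c(bin_null = bin_null(Bb), bin_com = bin_com(Bb))        # 4 and 0
c(val_null = val_null(B), val_com = val_com(B, m = 13))  # 42 and 24
```

After binarization the block is a perfect complete block, whereas the valued criterion with m = 13 still records the shortfall of the two weak ties (2 × 12 = 24), which is the kind of information that binarization discards.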
Generalized blockmodeling holds several advantages over conventional blockmodeling (Doreian, 2006; Doreian et al., 2005; Batagelj et al., 2006): (i) since the direct approach already includes the criterion function in the process of optimizing partitions, at least a locally optimal solution will be obtained with the generalized approach; (ii) the partitions obtained by generalized blockmodeling frequently outperform those obtained with the conventional approach (at least in the case of structural and regular equivalence); and (iii) conventional blockmodeling has mainly been used in an inductive way, meaning that researchers have accepted what was delineated through the clustering procedure. Yet researchers often possess some prior knowledge about the global network structure, and in generalized blockmodeling this knowledge can be included in the blockmodel's specification.
Examples of the use of generalized blockmodeling are found in Doreian et al. (2005), Mrvar and Doreian (2009), Cugmas, Ferligoj and Kronegger (2016) and Cugmas et al. (2020). These examples include social relations in work settings, classroom networks, political unit networks, scientific collaboration and citation networks, sport networks and other types of networks.
In this paper, the following types of generalized blockmodeling are considered:
(i) Generalized binary blockmodeling, which is intended for analysing binary networks. The binary blockmodeling concept is presented thoroughly in Doreian et al. (2005).
(ii) Generalized valued blockmodeling, which was developed because earlier researchers converted valued networks into binary networks and analysed them as binary networks. The binarization was accomplished by recoding values above (or equal to) a certain threshold (often 1) into 1s and the others into 0s (see Doreian et al., 2005), which, however, caused the loss of a considerable amount of information. The valued blockmodeling approach reduces the amount of information lost, although some loss may still occur.
Valued blockmodeling may be seen as an extension of binary blockmodeling.
It extends the equivalence relations, and thereby the definitions of possible block types, by replacing the stipulations for 1 with analogous stipulations for the value m (the minimal value that characterizes the tie between a unit and either a cluster or another unit such that this tie satisfies the condition of the block). Therefore, the criterion function used in valued blockmodeling measures block inconsistencies as the deviation of appropriate values from either 0 or m (Žiberna, 2007).
(iii) Generalized homogeneity blockmodeling, which is based on the idea that blocks should be as homogeneous as possible with respect to some property. Accordingly, the inconsistencies of an empirical block with respect to its ideal block are measured by the within-block variability of appropriate values. One of two variability criteria can be used: the sum of squared deviations from the mean or the sum of absolute deviations from the median (Žiberna, 2007).
2.2. Prespecified blockmodeling
A researcher can take prior knowledge concerning the ties among the clusters into account while conducting blockmodeling (Doreian et al., 2005). This may be done by specifying not only the number of clusters and the allowed block types (the same for all blocks), but also the allowed block types for each block separately. Typically, only one block type is specified as allowed for at least some blocks.
3. Package description
A stable version of the R package blockmodeling¹ is available from the Comprehensive R Archive Network (CRAN) at https://cran.r-project.org/web/packages/blockmodeling, while test versions are available from R-Forge at https://r-forge.r-project.org/R/?group_id=203. The package has been around since 2007 and is currently written in the programming languages R, C and Fortran. In this paper, version 1.0.0 is used.
The package supports generalized and indirect blockmodeling. For generalized blockmodeling, one-mode, two-mode and multilevel networks (also linked networks (Žiberna, 2020)) with one or more relations are supported.
However, for the purpose of clarity and simplicity, this paper is limited to the generalized blockmodeling of one-mode, single-relational networks.
¹ The blockmodeling package leverages functions from a variety of other packages. Key computations use stats (R Core Team, 2019a), methods (R Core Team, 2019b), Matrix (Bates and Maechler, 2019), parallel (R Core Team, 2019c) and others.
To obtain a generalized blockmodeling solution, a researcher might want to use the function optRandomParC, which optimizes a specified number of randomly generated partitions based on the selected criterion function. (To optimize a single partition, a researcher can use the function optParC, which optimizes only the one supplied partition; this is not recommended, however, because the optimization may end in a local minimum.) The main arguments of the function are:
• M: an adjacency matrix representing the (usually valued) network.
• k: the number of clusters.
• approach: the chosen generalized blockmodeling approach; 'bin' for generalized binary blockmodeling, 'val' for generalized valued blockmodeling and 'hom' for homogeneity blockmodeling.
• regFun: the function f that specifies the regular block types (e.g., a max-regular block when regFun = 'max'). The function is only relevant when f-regular blocks are specified by the argument blocks.
• blocks: a vector with the names of the allowed block types. At least two must be specified for binary and valued blockmodeling. Possible types are: null ('nul'), complete ('com'), regular ('reg'), column-(f-)regular ('cre') and row-(f-)regular ('rre'). In the case of binary and valued blockmodeling, a researcher can also specify a column-functional block ('cfn') and a row-functional block ('rfn'), and with valued blockmodeling a researcher can also specify an average block ('avg'). The option "do not care" ('dnc') is also available. When pre-specification is used, the argument is given in the form of an array, as shown in the examples section.
• rep: the number of different starting partitions.
• nCores: the number of physical CPU cores to be used. All available physical CPU cores but one are used when nCores = 0.
• preSpecM: the value m, which must be specified only in the case of generalized valued blockmodeling.
To calculate only the value of a criterion function, a researcher can use the function critFunC. The same arguments apply as for the function optRandomParC, except that k is replaced by a partition (a vector) clu and the arguments rep and nCores are omitted.
Once a blockmodel and partition have been obtained with either optParC or optRandomParC, a researcher can use the function IM to extract the blockmodel, the function clu to extract the obtained partition and the function err to extract the value of the criterion function.
The package contains some other handy functions, such as funByBlocks (which computes the value of a function, the mean by default, over the blocks of a matrix defined by a partition), plotMat (which plots a network in matrix form, taking the corresponding partition into account) and functions for computing the adjusted and original Rand Index (e.g., crand2). A plot method that internally calls plotMat is available for the S3 classes returned by optParC and optRandomParC.
4. Demonstration of the package use
The use of various generalized blockmodeling approaches is illustrated using the Baker citation network data (Baker, 1992). Here, the nodes represent journals from the field of social work (the 20 journals listed in Table 2). There is an arc from journal i to journal j if journal i cited journal j. The values on the arcs correspond to the number of citations in 1985. The data can be loaded from the package blockmodeling using data('baker'). The diagonal values, which represent citations of a journal by papers in the same journal, are replaced with 0s (diag(baker) <- 0). The network can be visualized with the function plotMat. Since the partition has not yet been obtained, the clu argument is not set.
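The package functions used in this section take the network as a (valued) adjacency matrix, such as baker. If one's own citation data are stored as an edge list instead, such a matrix can be built in base R, as in the following minimal sketch. The journals J1, J2 and J3 and their citation counts are invented for illustration and are not part of the Baker data; the sketch also assumes that each citing/cited pair occurs only once in the edge list.

```r
# Hypothetical toy edge list: one row per (citing journal, cited journal)
# pair with the number of citations. Labels and counts are made up.
edges <- data.frame(
  from  = c("J1", "J1", "J2", "J3", "J3"),
  to    = c("J2", "J1", "J3", "J1", "J2"),
  count = c(4, 7, 2, 5, 1),
  stringsAsFactors = FALSE
)

# Square, labelled valued adjacency matrix: arc i -> j if i cited j.
nodes <- sort(unique(c(edges$from, edges$to)))
A <- matrix(0, nrow = length(nodes), ncol = length(nodes),
            dimnames = list(nodes, nodes))

# Index by (row name, column name) pairs; a repeated pair would overwrite
# the earlier value, so duplicates should be aggregated beforehand.
A[cbind(edges$from, edges$to)] <- edges$count

# Remove self-citations, analogous to diag(baker) <- 0 for the Baker data.
diag(A) <- 0
A
```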
Table 2: Journals in the social work citation network
Label | Journal
AMH | Administration in Mental Health
ASW | Administration in Social Work
BJSW | British Journal of Social Work
CAN | Child Abuse and Neglect
CCQ | Child Care Quarterly
CW | Child Welfare
CYSR | Children and Youth Services Review
CSWJ | Clinical Social Work Journal
FR | Family Relations
IJSW | Indian Journal of Social Work
JGSW | Journal of Gerontological Social Work
JSP | Journal of Social Policy
JSWE | Journal of Social Work Education
PW | Public Welfare
SCW | Social Casework
SSR | Social Service Review
SW | Social Work
SWG | Social Work with Groups
SWHC | Social Work in Health Care
SWRA | Social Work Research and Abstracts
library(blockmodeling)
data('baker')
diag(baker) <- 0
plotMat(baker, main = 'Baker Network Data', mar = c(1, 1, 3, 1), title.line = 2)
Figure 2 is obtained with the function plotMat. To make the plot easier to read, the cell values are automatically multiplied by a factor (in this case 0.1) that, by default, places their absolute values in the range [0, 100). The factor is selected automatically and reported below the plot.
It is immediately apparent from Figure 2 that the network is relatively sparse, meaning that the journals did not tend to cite each other. However, the highest numbers of citations extend from SCW to SW and from SSR to SW. The latter journal (SW) is also the one that cited the highest number of other journals.
4.1. Binary blockmodeling
To analyse valued networks using generalized blockmodeling for binary networks, a researcher must binarize the valued network, which can be done in one of several ways, such as keeping all arcs with values greater than 0:
bakerBinar <- baker
bakerBinar[bakerBinar > 0] <- 1
In all of the following examples, the function optRandomParC is used. The number of clusters is set to 2 or 3. The number of clusters is chosen somewhat arbitrarily, after examining multiple partitions with different numbers of clusters (2 or 3 seem to be the most appropriate; see also Doreian et al., 2005). In each case, 1000 randomly generated partitions are optimized and multiple cores are used. For binary blockmodeling, the approach argument must be set to 'bin'.
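To make the idea of optimizing many random starting partitions more concrete, the base-R sketch below implements a simple relocation algorithm with random restarts for binary blockmodeling with null and complete blocks (structural equivalence). It is an illustration under stated assumptions, not the algorithm implemented in optRandomParC: it only moves single nodes between clusters (no exchanges), treats diagonal cells like all other cells and runs in a single process. The function names bin_crit, relocate and random_restarts are hypothetical.

```r
# Hypothetical sketch: relocation with random restarts for binary
# blockmodeling with null and complete blocks (structural equivalence).

# Criterion: for each block, the smaller of the null error (number of 1s)
# and the complete error (number of 0s), summed over all k * k blocks.
bin_crit <- function(M, clu, k) {
  total <- 0
  for (r in seq_len(k)) {
    for (s in seq_len(k)) {
      B <- M[clu == r, clu == s, drop = FALSE]
      ties <- sum(B)
      total <- total + min(ties, length(B) - ties)
    }
  }
  total
}

# Move single nodes to other clusters as long as this lowers the criterion.
relocate <- function(M, clu, k) {
  best <- bin_crit(M, clu, k)
  improved <- TRUE
  while (improved) {
    improved <- FALSE
    for (i in seq_len(nrow(M))) {
      for (g in setdiff(seq_len(k), clu[i])) {
        cand <- clu
        cand[i] <- g
        if (any(tabulate(cand, nbins = k) == 0)) next  # keep k non-empty clusters
        e <- bin_crit(M, cand, k)
        if (e < best) {
          clu <- cand
          best <- e
          improved <- TRUE
        }
      }
    }
  }
  list(clu = clu, err = best)
}

# Optimize n_starts random starting partitions and keep the best result.
random_restarts <- function(M, k, n_starts = 10, seed = 1) {
  set.seed(seed)
  best <- NULL
  for (r in seq_len(n_starts)) {
    # Random permutation of 1..k labels; all clusters used if nrow(M) >= k.
    start <- sample(rep_len(seq_len(k), nrow(M)))
    res <- relocate(M, start, k)
    if (is.null(best) || res$err < best$err) best <- res
  }
  best
}

# Toy directed network: nodes 4-6 cite nodes 1-3 and there are no other ties.
M <- matrix(0, 6, 6)
M[4:6, 1:3] <- 1
res <- random_restarts(M, k = 2, n_starts = 10)
res$err
res$clu
```

Each relocation run stops at a partition where no single move lowers the criterion, which is only a local optimum; optimizing many random starting partitions, as optRandomParC does with its rep argument, increases the chance of reaching a good solution.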
Figure 2: Baker network data in matrix form (panel title: Baker Network Data; all values in cells were multiplied by 0.1)
4.1.1. Structural equivalence
If structural equivalence is used, only null and complete block types are possible. Therefore, the vector c('nul', 'com') is provided to blocks (structural equivalence is thus set through the vector of allowed block types).
resBinStr <- optRandomParC(M = bakerBinar, k = 3, rep = 1000, nCores = 0,
  blocks = c('nul', 'com'), approach = 'bin')
The number of errors (inconsistencies) of the obtained blockmodel is 47 (accessed via the function err). The obtained partition can be accessed with the function clu, while the blockmodel can be seen in the form of an image matrix using the function IM. The image matrix specifies the type of each block. The function IM shows the image matrix obtained with blockmodeling (not the specified one).
IM(resBinStr)
     [,1]  [,2]  [,3]
[1,] "nul" "nul" "com"
[2,] "nul" "com" "com"
[3,] "nul" "com" "com"
The image matrix shows that the journals in cluster 2 and cluster 3 cite each other both within and between the clusters. Journals in cluster 1 generally do not cite each other, but they cite journals in cluster 3. Cluster 3 can be identified as the most central cluster, and cluster 1 as a peripheral cluster because the journals in this cluster are generally not cited much by other journals.
The block densities can be calculated with the function funByBlocks and visualized with the function plotMat. Finally, the empirical network can be visualized in matrix form in line with the obtained blockmodeling solution. When using the function plotMat, the obtained partition has to be provided to the function through the argument clu. This is not necessary when using the function plot (an S3 method exists for the optMorePar class that is returned by the optRandomParC function), as shown below. The obtained clusters of journals are separated by lines in Figure 3.
plot(resBinStr, main = 'Baker Network Data', mar = c(1, 2, 3, 1), title.line = 2)
Figure 3: Matrix representation of the network of journals partitioned into 3 clusters using binary blockmodeling with structural equivalence and the corresponding block densities (panels: Baker Network Data; Block densities, with values multiplied by 10)
It can be seen in Figure 3 that the block densities are lowest in the null blocks, as expected. Among the null blocks, however, the density is highest in the block corresponding to the links (citations) from cluster 3 to cluster 1, which reflects a tendency towards reciprocity.
The most central cluster (cluster 3) consists of only two journals, SCW and SW, while cluster 2 contains the following journals: ASW, CW, JSWE, SSR and SWRA. All other journals are located in the peripheral cluster.
4.1.2. Regular equivalence
Here, regular equivalence is used and the number of clusters is set to 2. Regular equivalence is specified in the function optRandomParC by adding a regular block type to the possible block types.
resBinReg <- optRandomParC(M = bakerBinar, k = 2, rep = 1000, nCores = 0,
  blocks = c('nul', 'com', 'reg'), approach = 'bin')
The partitioned matrix in Figure 4 shows that there is a small cluster (cluster 1) of journals that are not cited by any journal. The three journals in this cluster are AMH, IJSW and JSP. The first two journals cited SW, while JSP cited SSR (all of the cited journals are in cluster 2).
These citations represent inconsistent links (err(resBinReg)).
The similarity of the obtained partitions can be measured with the Adjusted Rand Index (Rand, 1971; Hubert and Arabie, 1985), whose expected value is 0 for two random partitions and whose maximum value is 1 (for two identical partitions).
crand2(clu1 = clu(resBinStr), clu2 = clu(resBinReg))
Figure 4: The network of journals partitioned into 2 clusters using binary blockmodeling with regular equivalence and the corresponding block densities (panels: Baker Network Data; Block densities, with values multiplied by 100)
The value −0.12 confirms what is seen when comparing Figure 3 and Figure 4, i.e., that the partitions obtained using structural equivalence and regular equivalence are very different.
4.2. Valued blockmodeling
The main dilemma in valued blockmodeling is how to determine the most appropriate value of m. The best approach is to choose a value based on prior knowledge about how high the value of a tie should be for it to be considered strong or relevant. In the absence of such prior knowledge, a researcher may refer to one of the guidelines provided by Žiberna (2007) or select the most appropriate m based on the distribution of all tie values (Figure 5).
Figure 5: Distribution of the number of citations among the journals (histogram; x-axis: Number of citations; y-axis: Frequency)
Here, m is set to the median of the tie values (only values greater than 0 are taken into account), which is 13.
4.2.1. Structural equivalence
To apply blockmodeling of valued networks, a researcher must set approach = 'val' and specify the value m by setting the argument preSpecM. In addition, the allowed block
types (and thereby, indirectly, the type of equivalence) and the number of clusters must be specified. The number of clusters is set to three.
resValStr <- optRandomParC(M = baker, k = 3, rep = 1000, preSpecM = 13,
  approach = 'val', blocks = c('nul', 'com'), nCores = 0)
It can be seen in Figure 6 that cluster 1 and cluster 3 form a very clear symmetric core-periphery global network structure, since the journals in cluster 1 (the core: JSWE, SCW, SSR and SW) mutually cited each other and also cited those in cluster 3 (the periphery: ASW, CW, CSWJ, SWG, SWHC and SWRA). There is another, internally non-linked cluster (cluster 2) of journals. Some journals in this cluster cited the journals in the core cluster.
Figure 6: The network of journals partitioned into 3 clusters using valued blockmodeling (m = 13) with structural equivalence and the corresponding block densities (panels: Baker Network Data, with cell values multiplied by 0.1; Block densities)
4.2.2. Regular equivalence
In the case of valued blockmodeling with regular equivalence, a researcher must select the function f to specify the type of f-regular blocks. This is set to max by default, although it can also be set to sum, mean or other functions. Here, the max-regular block type is to be allowed and therefore the argument regFun of the function optRandomParC is set to 'max'. A regular block type is added to the vector of allowed block types. The number of clusters is arbitrarily set to two.
resValReg <- optRandomParC(M = baker, k = 2, rep = 1000, preSpecM = 13,
  approach = 'val', blocks = c('nul', 'com', 'reg'), nCores = 0, regFun = 'max')
The blockmodel (image matrix) that is obtained is the same as that obtained by binary blockmodeling with regular equivalence, but the cluster sizes and the obtained partitions are different, with an Adjusted Rand Index of 0.1. There are more links in null blocks (compared to binary blockmodeling with structural equivalence), but the corresponding link values are relatively low. Consequently, the cluster with the journals that are cited less (and cite less) is bigger.
Figure 7: The network of journals partitioned into 2 clusters using valued blockmodeling (m = 13) with max-regular equivalence and the corresponding block densities (panels: Baker Network Data, with cell values multiplied by 0.1; Block densities)
4.3. Homogeneity blockmodeling
The advantage of the homogeneity blockmodeling approach over the binary and valued approaches is that no parameters (such as the binarization threshold or the parameter m) need to be set. Therefore, it is very well suited as a preliminary or as the main approach to valued networks when no prior knowledge about these values is available. Homogeneity blockmodeling emphasizes the similarity of tie strengths within blocks over the pattern of ties.
4.3.1. Structural equivalence
To use homogeneity blockmodeling, the approach argument must be set to 'hom'.
To apply sum of squares homogeneity blockmodeling, the homFun argument of the optRandomParC function must be set to 'ss', while to apply absolute deviations blockmodeling, it must be set to 'ad'.
Because the computation of inconsistencies is very similar for sum of squares and absolute deviations blockmodeling, only the application of the first approach is shown here.
resHomSSStr <- optRandomParC(M = baker, k = 2, rep = 1000, approach = 'hom',
  homFun = 'ss', blocks = c('nul', 'com'), nCores = 0)
Usually, the image matrix is not of interest in the case of homogeneity blockmodeling, because null blocks are a special case of complete blocks and are thus only classified as null when the mean of the block is exactly 0, which rarely happens in practice. Instead, blocks with low block means are interpreted as null blocks (see Žiberna (2013) for another way of identifying null blocks).
The results shown in Figure 8 suggest that the global network structure of the journal citation network can be characterized as a symmetric core-periphery structure. Here, the core cluster is cluster 2, because the corresponding journals (SCW, SSR and SW) not only cited each other, but also cited and were cited by other journals (according to the block densities, the peripheral journals cited the core journals more often than the other way around). All other journals are located in the peripheral cluster, with very few citations found within the cluster.
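The remark that null blocks are a special case of complete blocks can be checked with a small base-R sketch of the two homogeneity criteria described in Section 2.1.2, applied to a single block with the null and complete ideal blocks (deviations from 0 versus deviations from the block mean or median). The helper names are hypothetical and diagonal cells are not treated separately; this illustrates the criteria, not the package's code.

```r
# Hypothetical helpers for homogeneity blockmodeling of null and complete blocks.
ss_com  <- function(B) sum((B - mean(B))^2)     # squared deviations from the block mean
ss_null <- function(B) sum(B^2)                 # squared deviations from 0
ad_com  <- function(B) sum(abs(B - median(B)))  # absolute deviations from the block median
ad_null <- function(B) sum(abs(B))              # absolute deviations from 0

B <- matrix(c(0, 2, 4, 6), nrow = 2)
c(ss_com = ss_com(B), ss_null = ss_null(B))     # 20 and 56
c(ad_com = ad_com(B), ad_null = ad_null(B))     # 8 and 12

# The gap between the two sums of squares is length(B) * mean(B)^2, so a
# null block fits at least as well as a complete block only if the mean is 0.
isTRUE(all.equal(ss_null(B) - ss_com(B), length(B) * mean(B)^2))  # TRUE
```

Because the mean (or median) minimizes the squared (or absolute) deviations, the complete-block inconsistency can never exceed the null-block inconsistency, which is why low block means rather than the image matrix are used to identify null blocks.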
Figure 8: The network of journals partitioned into 2 clusters using homogeneity blockmodeling (sum of squares) with structural equivalence and the corresponding block densities (panels: Baker Network Data, with cell values multiplied by 0.1; Block densities)
4.3.2. Regular equivalence
To apply homogeneity blockmodeling with regular equivalence, the regular block type must be added to the vector of possible block types in the function optRandomParC, and the function f must be defined (e.g., 'max') through the argument regFun.
resHomSSReg <- optRandomParC(M = baker, k = 2, rep = 1000, approach = 'hom',
  homFun = 'ss', blocks = c('nul', 'com', 'reg'), regFun = 'max', nCores = 0)
Given that the obtained partition and blockmodel are the same as those in the case of structural equivalence, they are not interpreted further.
4.4. Pre-specified blockmodeling
A blockmodel can be fully or partially specified (see Section 2.2, Prespecified blockmodeling). The following gives an example of the use of pre-specified blockmodels.
In the case of a journal citation network, a researcher might possess prior knowledge that the global network structure is symmetric core-periphery, i.e., there are some journals (the core) that are cited by most journals, while other journals (the periphery) cite journals in the core and not those in their own cluster. Therefore, the pre-specified blockmodel may be represented by the following image matrix:
preImageReg <- rbind(c('com', 'reg'), c('reg', 'nul'))
Here, the blocks connecting the core and the periphery are of the regular type. Alternatively, a researcher can assume that these blocks can be of either the regular or the complete type. When this is the case, the image matrix must be specified as an array.
preImageRegCom <- array(NA, dim = c(2, 2, 2)) preImageRegCom[1,,] <- rbind(c('com', 'reg'), c('reg', 'nul')) preImageRegCom[2,,] <- rbind(c('com', 'com'), c('com', 'nul')) To apply pre-specified blockmodeling, the above matrix or array must be provided as the argument to blocks within the function optRandomParC. To apply valued blockmod- eling with m = 13, the approach and preSpecM arguments must be set to 'val' and 13, respectively. blockmodeling: AnRpackageforgeneralizedblockmodeling 63 resValPre <- optRandomParC(M = baker, k = 2, rep = 1000, preSpecM = 13, approach = 'val', blocks = preImageRegCom, nCores = 0) Theobtainedimagematrix(blockmodel)isthefollowingone, IM(resValPre) [,1] [,2] [1,] "com" "reg" [2,] "reg" "nul" indicating that the journals CW, SCW, SSR and SW are all part of a closely connected core (cluster 1) while other journals are classified in the periphery (cluster 2). The core and the periphery are connected with max-regular links and the density (Figure 9) is higher within the block that links periphery to the core than within the block that links the core to theperiphery. CW SCW SSR SW AMH ASW BJSW CAN CCQ CYSR CSWJ FR IJSW JGSW JSP JSWE PW SWG SWHC SWRA CW SCW SSR SW AMH ASW BJSW CAN CCQ CYSR CSWJ FR IJSW JGSW JSP JSWE PW SWG SWHC SWRA Baker Network Data 2 2 5 1 1 7 0 1 3 3 6 1 0 1 5 2 2 2 1 2 1 1 4 5 2 1 2 1 2 1 4 6 12 11 0 7 2 1 3 4 1 0 2 6 4 3 4 1 1 2 1 0 1 1 1 1 0 1 1 1 1 0 1 1 2 1 3 2 1 1 2 1 2 1 1 1 2 1 4 1 1 2 2 1 1 0 2 * all values in cells were multiplied by 0.1 1 2 1 2 Block densities 49 12 5 1 Figure9: Thenetworkofjournalspartitionedinto2clustersusinghomogeneityblockmodeling (sumofsquares)max-regularequivalenceandthecorrespondingblockdensities 5. Conclusion Generalized blockmodeling is an approach for finding clusters of equivalent units in a network and for determining the ties among these units. As such, it is used to study global network structures and the (social) positions of the units. 
While generalized blockmodeling is also implemented in the Pajek software (Batagelj et al., 2004), the implementation in the blockmodeling package for the R programming language presented in this paper is the only one that also supports the blockmodeling of valued networks and the generalized blockmodeling of more complex networks (e.g., multilevel and multi-relational networks). In addition, it supports some blockmodeling approaches other than generalized blockmodeling (e.g., the indirect approach).

This paper demonstrates the use of the blockmodeling package for the generalized blockmodeling of binary and valued one-mode networks on a real network data set, namely Baker's data set on citations among social work journals (Baker, 1992). The examples show that blockmodeling solutions can vary across blockmodeling approaches. This underlines that prior knowledge about the analysed network is crucial, not only for choosing the most appropriate blockmodeling approach, but also for interpreting the obtained results.

Ultimately, this paper, together with the package documentation, can serve as a basis for analysing more complex networks and for further exploring the package's capabilities.

Acknowledgment

This research was financially supported by the Slovenian Research Agency (http://www.arrs.si) within the research program P5-0168 and the research project J7-8279 (Blockmodeling multilevel and temporal networks).

References

[1] Baker, D. (1992): A structural analysis of the social work journal network: 1985–1986. Journal of Social Service Research, 15, 153–167.
[2] Batagelj, V., Bock, H.-H., Ferligoj, A., and Žiberna, A. (2006): Data science and classification. Berlin: Springer.
[3] Batagelj, V., Doreian, P., Ferligoj, A., and Kejžar, N. (2014): Understanding large temporal networks and spatial networks: Exploration, pattern searching, visualization and network evolution. New York, NY: John Wiley & Sons.
[4] Batagelj, V., Ferligoj, A., and Doreian, P. (1992): Direct and indirect methods for structural equivalence. Social Networks, 14, 63–90.
[5] Batagelj, V., Mrvar, A., Ferligoj, A., and Doreian, P. (2004): Generalized blockmodeling with Pajek. Metodološki zvezki, 1, 455–467.
[6] Bates, D. and Maechler, M. (2019): Matrix: Sparse and dense matrix classes and methods. R package version 1.2-17. Retrieved from https://cran.r-project.org/web/packages/Matrix/index.html.
[7] Borgatti, S.P. and Everett, M.G. (1992): Notions of position in social network analysis. Sociological Methodology, 22, 1–35.
[8] Boyd, J.P. (2002): Finding and testing regular equivalence. Social Networks, 24, 315–331.
[9] Boyd, J.P. and Jonas, K.J. (2001): Are social equivalences ever regular? Permutation and exact tests. Social Networks, 23, 87–123.
[10] Brusco, M. (2020): dBlockmodeling: Deterministic blockmodeling of signed, one-mode and two-mode networks. R package version 0.2.0. Retrieved from https://CRAN.R-project.org/package=dBlockmodeling.
[11] Cugmas, M., DeLay, D., Žiberna, A., and Ferligoj, A. (2020): Symmetric core-cohesive blockmodel in preschool children's interaction networks. PLOS ONE, 15, e0226801.
[12] Cugmas, M., Ferligoj, A., and Kronegger, L. (2016): The stability of co-authorship structures. Scientometrics, 106, 163–186.
[13] Doreian, P. (2006): Some open problem sets for generalized blockmodeling. In V. Batagelj, H.-H. Bock, A. Ferligoj, A. Žiberna (eds.): Data science and classification, 119–130. Berlin: Springer.
[14] Doreian, P., Batagelj, V., and Ferligoj, A. (2005): Generalized blockmodeling. Structural analysis in the social sciences. New York, NY: Cambridge University Press.
[15] Funke, T. and Becker, T. (2019): Stochastic block models: A comparison of variants and inference methods. PLOS ONE, 14, e0215296.
[16] Holland, P.W., Laskey, K.B., and Leinhardt, S. (1983): Stochastic blockmodels: First steps. Social Networks, 5, 109–137.
[17] Hubert, L. and Arabie, P. (1985): Comparing partitions. Journal of Classification, 2, 193–218.
[18] INRA, L. J. B. (2015): blockmodels: Latent and stochastic block model estimation by a 'V-EM' algorithm. R package version 1.1.1. Retrieved from https://CRAN.R-project.org/package=blockmodels.
[19] Lorrain, F. and White, H. C. (1971): Structural equivalence of individuals in social networks. Journal of Mathematical Sociology, 1, 49–80.
[20] Matias, C. and Miele, V. (2020): dynsbm: Dynamic stochastic block models. R package version 0.7. Retrieved from https://CRAN.R-project.org/package=dynsbm.
[21] Mrvar, A. and Doreian, P. (2009): Partitioning signed two-mode networks. Journal of Mathematical Sociology, 33, 196–221.
[22] Peixoto, T. (2020): Bayesian stochastic blockmodeling. In P. Doreian, V. Batagelj, A. Ferligoj (eds.): Advances in network clustering and blockmodeling, 289–332. New York, NY: Wiley.
[23] R Core Team (2018): R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. Retrieved from https://www.R-project.org.
[24] R Core Team (2019a): R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. Retrieved from https://www.R-project.org.
[25] R Core Team (2019b): R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. Retrieved from https://www.R-project.org.
[26] R Core Team (2019c): R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. Retrieved from https://www.R-project.org.
[27] Rand, W. M. (1971): Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association, 66, 846–850.
[28] Snijders, T. and Nowicki, K. (1997): Estimation and prediction for stochastic blockmodels for graphs with latent block structure. Journal of Classification, 14, 75–100.
[29] Ward, J. H. (1963): Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58, 236–244.
[30] Wasserman, S. and Faust, K. (1994): Social network analysis: Methods and applications. Cambridge: Cambridge University Press.
[31] White, D. R. and Reitz, K. P. (1983): Graph and semigroup homomorphisms on networks of relations. Social Networks, 5, 193–234.
[32] Žiberna, A. (2007): Generalized blockmodeling of valued networks. Social Networks, 29, 105–126.
[33] Žiberna, A. (2013): Generalized blockmodeling of sparse networks. Metodološki zvezki, 10, 99–119.
[34] Žiberna, A. (2020): Blockmodeling linked networks. In P. Doreian, V. Batagelj, A. Ferligoj (eds.): Advances in network clustering and blockmodeling, 267–287. New York, NY: John Wiley & Sons.
[35] Žnidaršič, A., Ferligoj, A., and Doreian, P. (2012): Non-response in social networks: The impact of different non-response treatments on the stability of blockmodels. Social Networks, 34, 438–450.