Arheološki vestnik (Arh. vest.) 44, 1993, pp. 269-293

The KOR Seriation Program and its Applicability in Archaeological Research

Andrej PLETERSKI and Tomaž ZWITTER

Abstract

Here the authors present the results of the first experiments with the KOR seriation program. In both of its versions (KOR50, KOR62) the program is computationally considerably more complicated than comparable programs, yet it is still sufficiently fast. In addition, the program permits a range of settings with which the user can influence the type and intensity of calculation. We hold that every program of this kind must be tested on an actual archaeological sample, and this was done.
The first results, which are presented here, give a rough indication of the potential of such seriation methods. Both seriation programs assemble groups of carriers with identical attributes. KOR62 is the faster of the two, and its groups are more clearly visible. It offers several "good" options, from which the most suitable may be selected. In addition, it can arrange the groups in stratigraphic order if material is available from a site where such relationships existed. The program for the definition of groups is thus only a tool; ultimate success depends on the preparation of the initial data set, on our ability to interpret the results achieved and on our work steps. However, it cannot be ruled out that the reliability of the program will grow substantially as experience is gained through the seriation of further sites.

INTRODUCTION

Seriation methods are a very popular theme in the use of computers in archaeology. It is not our purpose here to present the history and state of research, as up to 1988 alone over 117 bibliographic units had been published (Herzog, Scollar 1988, 53). Concise histories are given, for example, by Stadler (Stadler 1984) and by Legoux and Périn (Legoux, Périn 1990).

The authors present here the results of the first experiments with the locally developed seriation program KOR. In both of its variants (KOR50, KOR62) it is computationally much more complicated than comparable foreign programs, although it is still sufficiently fast. In addition, the program permits a range of settings with which the user can influence the type and intensity of calculation. We hold that every program of this kind must be tested on an actual archaeological sample, and this was done.
The first results, which are presented here, give a general indication of the potential of such seriation methods, although it cannot be ruled out that the reliability of the program will grow substantially as more experience is gained in the seriation of further sites.

SOME PREVIOUS EXPERIENCE OF OTHER AUTHORS

A user needs to know what he can expect from a seriation program and how far he can rely on it. Opinions about this amongst archaeologists are still frequently completely opposed. Some swear by the objectivity of the program with which they work; others deny that such programs are applicable at all. Eggert, Kurz and Wotzka (Eggert, Kurz, Wotzka 1980, 140) took a middle line more than a dozen years ago, when they studied the applicability of a seriation program to chronological seriation. They showed that a mathematically good result is not necessarily also historically useful. They warned that the program seriates the material by groups. These are usually correctly ordered in relation to each other with regard to the chronological order sought, but the order within a group is arbitrary and differs with each experiment. In addition, "jumpers" ("Springer") exist, which connect first with one group and then with another. They appear for at least two reasons: either as attributes that occur only occasionally, or as carriers with only a few attributes. A certain blurring can be expected on the boundaries of groups due to the (in)stability of the seriation. The groups can be unclearly defined, particularly when the material changes only slightly over time (ibidem, 137 ss).

Certain dangers in seriation are known to arise from the attributes in the initial material. Too weak a linkage between attributes can result in the inversion of groups (Legoux, Périn 1990).
Even where the attributes are linked, however, a chronological order from earliest to latest and the opposite order, from latest to earliest, are equally possible. Further, it is advisable to analyse material from male and female graves separately, so that the groups do not form by sex or in a mixed chronological order (pers. comm. P. Stadler in: Daim, Lippert 1984, 69). There is likewise an awareness that groups in the data can form for largely non-chronological reasons, e.g. groups of looted graves, technological and sociological differences etc. (Daim 1987, 41; Theune 1988, 12; similar: Beinhauer 1985, 155 ss).

The majority of authors use seriation to locate chronological groups, although it is also used for specific artefact types (Legoux, Périn 1990) and, with caution, for the analysis of other site data, e.g. different forms of burial (Daim 1987, 28). They normally analyse cleaned data: they remove all attributes that appear only once, all carriers with only one attribute and usually also all attributes that appear throughout the entire data set, so that these cannot disturb the final picture. Extremely severe cleaning can leave almost nothing to analyse (Daim 1987, 41).

Only rarely do the authors speak of how they define groups and their boundaries. All define them with the aid of the figures which they receive at the end of the computer or "hand" seriation. The boundaries of the groups are defined by the places where groups of attributes begin or disappear (well presented in: Périn 1980, Fig. 73, 74). In the division into groups they nevertheless assign all of the analysed material. Is this perhaps because the material has already been "cleaned" and it is hoped that the "jumpers" have been removed? Or does it rest on the logical assumption that every item must belong somewhere? A clarification of this part of the process was not found in the papers by these authors.

Fig. 1: Sedlo on Bled Castle. Sex of the carriers. The numbers of the carriers and attributes are equal to the sequence of ranking.

DESCRIPTION OF THE FUNCTION OF THE PROGRAM

The purpose of every correlation program is to seek a logical sequence of attributes and their carriers. In a table with an ordered sequence of attributes and carriers on its axes, the occurrences of attributes are arranged in an orderly manner. They combine into groups, which follow one another in chronological or some other sequence. The size of the groups usually varies. The interface between two sequential groups can be linked or stepped. In addition, attributes that appear unselectively in different groups insert themselves between attributes that appear only in specific groups. This means that a bell-shaped, Gaussian-like distribution of the density of occurrences within each group cannot be expected. Yet an assumption of such a distribution is implicitly present in the majority of existing correlation programs.
The presence of sudden cut-off points in the development of individual biological and cultural-sociological environments shows that such programs are not always ideal for the description of real samples.

The decisive question, of course, is how to define the optimal order of the attributes and their carriers. In this article, the authors describe two different approaches to this problem. In the first, "diagonal" approach (KOR50), the sample is considered ordered when the sum of the relative distances from the diagonal of the table of the points that mark the occurrence of attributes in carriers is minimal. Individual attributes may be given different weights, so that a diagonal position of a "heavy" attribute is valued more highly than that of a less important, "light" attribute. The program also offers the additional option of designating the "quarter" of the table in which selected attributes should be placed.

Such a minimum distance from the diagonal cannot possibly be sought directly, as a list of N attributes can be arranged in N x (N-1) x ... x 3 x 2 x 1 = N! ways. It is necessary to resort to a statistical approach, for which an analogy from nature can be used. If molten matter is cooled to a crystalline state, something similar happens: the chaotic distribution of atoms in the molten state gradually transforms into the ordered lattice of the crystal. If different atoms are present in the melt, some of which are heavier than others, then the heavier ones will stabilize and "freeze" in their places in the growing crystal before the others.

It is also possible to speak of the "temperature" of the sample in the correlation program. At the start of the ordering process, the temperature of the sample is high. The sample is chaotic, and the high temperature permits arbitrary exchanges in the order of the attributes and their carriers. "Cooling" follows, which is undertaken in sequential steps.
Each step is 10% lower in temperature than the preceding step. Within a step, the program carries out the following operations:
a) it selects two attributes and two carriers at random,
b) it calculates the current distance from the diagonal of the filled points in these two attributes and carriers,
c) it temporarily exchanges these two attributes and these two carriers with each other,
d) it calculates the new distances, as under (b),
e) it calculates the value of the exponential function whose argument is the difference between the old and new distances, divided by the current temperature,
f) if a number selected at random from the interval between 0 and 1 is less than the value of the exponential function, it carries out the actual exchange; otherwise it retains the old order,
g) it repeats operations (a) to (f) M times,
h) if none of the M exchanges was successful, it concludes that the sample has frozen and ends the ordering process; otherwise it lowers the temperature by 10% and repeats operations (a) to (g) within the next temperature step.

The scheme described has some important advantages. In the first phase of calculation, practically all attempts at exchange are successful, so that the influence of the order in which the data were entered is quickly cancelled. Gradual cooling also permits some exchanges that are disadvantageous to the current ranking of the sample, and in this way it prevents the ordering from ending at one of the local maxima of ordering and thus missing the globally most advantageous solution.

Fig. 2: Sedlo on Bled Castle. Generational membership.

As well as the advantages, two disadvantages must be mentioned. The first is the calculation time. If N is the number of attributes or of their carriers, then the calculation time increases approximately with the fourth power of N.
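The cooling scheme in steps (a) to (h) can be illustrated with a minimal Python sketch. It is not the KOR50 code: the function names, the parameters (`t_start`, `cooling`, `trials`, `max_steps`) and the exact form of the distance measure are assumptions made for illustration. For simplicity the sketch recomputes the whole weighted distance after each trial exchange instead of only the rows and columns affected, and it remembers the best order seen so far.

```python
import math
import random


def diagonal_cost(cells, row_pos, col_pos, weights):
    """Weighted sum of relative distances of filled cells from the diagonal."""
    row_scale = max(len(row_pos) - 1, 1)
    col_scale = max(len(col_pos) - 1, 1)
    total = 0.0
    for carrier, attribute in cells:
        distance = abs(row_pos[carrier] / row_scale - col_pos[attribute] / col_scale)
        total += weights.get(attribute, 1) * distance
    return total


def swap(positions, first, second):
    """Exchange the table positions of two carriers or two attributes."""
    positions[first], positions[second] = positions[second], positions[first]


def anneal(cells, carriers, attributes, weights=None, t_start=1.0,
           cooling=0.9, trials=200, max_steps=200, seed=0):
    """Order carriers and attributes by simulated annealing (illustrative only)."""
    rng = random.Random(seed)
    weights = weights or {}
    row_pos = {c: i for i, c in enumerate(carriers)}
    col_pos = {a: i for i, a in enumerate(attributes)}
    cost = diagonal_cost(cells, row_pos, col_pos, weights)
    best = (cost, dict(row_pos), dict(col_pos))
    temperature = t_start
    # max_steps is an assumed safety cap: zero-cost exchanges always count
    # as successful, so a sample need not freeze on its own.
    for _ in range(max_steps):
        accepted = 0
        for _ in range(trials):                      # step (g): M trials
            c1, c2 = rng.sample(carriers, 2)         # step (a)
            a1, a2 = rng.sample(attributes, 2)
            swap(row_pos, c1, c2)                    # step (c)
            swap(col_pos, a1, a2)
            new_cost = diagonal_cost(cells, row_pos, col_pos, weights)
            delta = new_cost - cost                  # steps (b), (d)
            # steps (e), (f): accept if a random number is below exp(-delta/T)
            if delta <= 0 or rng.random() < math.exp(-delta / temperature):
                cost = new_cost
                accepted += 1
                if cost < best[0]:
                    best = (cost, dict(row_pos), dict(col_pos))
            else:
                swap(row_pos, c1, c2)                # undo the trial exchange
                swap(col_pos, a1, a2)
        if accepted == 0:                            # step (h): frozen
            break
        temperature *= cooling                       # 10% lower temperature
    best_cost, best_rows, best_cols = best
    return (sorted(carriers, key=best_rows.get),
            sorted(attributes, key=best_cols.get), best_cost)
```

Called as `anneal(cells, carriers, attributes)`, where `cells` is a list of (carrier, attribute) pairs marking occurrences, the sketch returns the ordered carriers, the ordered attributes and the weighted distance of that order. Recomputing the full distance for every trial is the simplest choice, not the fastest one.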
In practice, this growth in calculation time means that it is not possible to order a table with more than a few hundred members on a MicroVAX computer. We later improved the program further. The candidates for exchange (step a) were no longer selected exclusively at random; instead, precedence was given to those which "sat" in their place in the table at a given temperature. The temperature steps were also changed: the difference between successive temperatures was made smaller during the phase in which the structure of the sample stabilizes. As a result of these changes the calculation became about 10 times faster. A MicroVAX can now deal with a sample of 400 attributes and as many carriers in a few hours of CPU time.

The other disadvantage is more a matter of principle. The idea that the points of the ordered sample should collect along a diagonal is admittedly attractive, both aesthetically and mathematically, but frequently does not correspond to reality. As the sizes of the sequential groups differ from one another, the spine of the ranked sample runs between the two opposite corners of the table in the form of an arc or, more accurately, of a twisted curve, and not naturally along the diagonal. Of course, the form that this spine will take is not known beforehand, so that it pays to experiment with an alternative method of seriation.

This alternative method (the KOR62 program) is also described as "ranked". In contrast to KOR50, the table is constructed here in a single pass. Firstly, the initial carrier is selected. The carrier whose attributes best fit those of the first carrier is placed in the second position. The third is the carrier that correlates best with the first two, and so on, until all of the carriers are ranked. The carrier that best correlates with the already ranked attributes is the one with the greatest dynamic weight.
This weight is defined as the sum of the weights of those attributes which are present in the carrier and have already been ranked. Recently ranked attributes count for more than less recently ranked ones. What counts as a recently ranked attribute is determined by the expected size of the group, a parameter that must be set before the start of calculation. Finally, the sum for the already ranked attributes is decreased by a proportion of the attributes which are present in the carrier but have not yet been ranked in the table.

The program also permits the user to define in the initial data which carriers it should rank before others and which after. By specifying such before:after pairs it is possible, for example, to have earlier graves ranked before those of a later period.

Fig. 3: Sedlo on Bled Castle. Groups and sex of carriers. The numbers of the carriers and attributes are equal to the sequence of seriation.

The advantage of this method is twofold.
Firstly, the ranking of a specific carrier is influenced only by its connections with already ranked carriers. This is more logical than the "diagonal" criterion (the KOR50 program), which suits an ideally ordered sample resembling a perfect crystal, something that actual archaeological samples are not. Secondly, the computational demands are substantially lower and the speed of calculation is much greater than with the KOR50 program. The calculation can be repeated with different initial carriers and the results compared with each other. The program can also estimate independently which of the results is the more logical. At present it chooses the result with the greatest sum of the dynamic weights that the individual carriers had when they were ranked in the table. This criterion is logical, although possibly not optimal. The authors intend in the near future to experiment with other options for selecting the optimal seriation result. It is hoped that this will increase the quality of the results achieved.

Fig. 4: Sedlo on Bled Castle. Groups and membership of generations.

ENTRY DESIGNATIONS, ENTRY AND EXIT DATA BASES

The entry data base for KOR50 is composed of three parts. The first is the list of carriers with the attributes which they contain. The second is the list of attributes with their weights from 1 to 100 (whole numbers), which the user designates himself. Here it is also possible to specify the quarter of the table in which a specific attribute should be placed. The third part is the list of values of individual distances from the axes. These must be whole, positive numbers. The best results are obtained with values that fall linearly in steps of one. The number of values must be at least as great as the number of entries of the more numerous type (carriers or attributes). With KOR62, this third part is replaced by a list of carriers which stand in before:after relationships.
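The ranked approach with before:after pairs can be illustrated with a minimal Python sketch. It is not the KOR62 code. The function names, the `penalty` proportion for unranked attributes and the rule that an attribute counts fully only within `group_size` steps of being ranked (and at half weight afterwards) are assumptions chosen for illustration.

```python
def dynamic_weight(attrs, ranked_step, step, weights, group_size, penalty):
    """Score a candidate carrier against the attributes ranked so far."""
    score = 0.0
    for attribute in attrs:
        weight = weights.get(attribute, 1.0)
        if attribute in ranked_step:
            # Assumed recency rule: full weight inside the expected group
            # size, half weight for attributes ranked longer ago.
            recent = step - ranked_step[attribute] <= group_size
            score += weight if recent else 0.5 * weight
        else:
            # Assumed penalty for attributes not yet ranked in the table.
            score -= penalty * weight
    return score


def rank_carriers(contents, first, weights=None, group_size=3,
                  penalty=0.5, before_after=()):
    """Rank carriers one by one, starting from the carrier `first`.

    contents maps each carrier to the set of its attributes;
    before_after holds (earlier, later) pairs of carriers.
    """
    weights = weights or {}
    must_precede = {}
    for earlier, later in before_after:
        must_precede.setdefault(later, set()).add(earlier)
    if must_precede.get(first):
        raise ValueError("initial carrier must follow another carrier")
    order, total = [first], 0.0
    ranked_step = {attribute: 0 for attribute in contents[first]}
    while len(order) < len(contents):
        step = len(order)
        placed = set(order)
        candidates = [c for c in sorted(contents)
                      if c not in placed and must_precede.get(c, set()) <= placed]
        if not candidates:
            raise ValueError("before:after pairs cannot be satisfied")
        scores = {c: dynamic_weight(contents[c], ranked_step, step,
                                    weights, group_size, penalty)
                  for c in candidates}
        best = max(candidates, key=scores.get)
        total += scores[best]
        order.append(best)
        for attribute in contents[best]:
            ranked_step.setdefault(attribute, step)
    return order, total


def best_ranking(contents, **options):
    """Try every admissible initial carrier; keep the highest total weight."""
    results = []
    for first in sorted(contents):
        try:
            results.append(rank_carriers(contents, first, **options))
        except ValueError:
            continue
    return max(results, key=lambda result: result[1])
```

For example, `rank_carriers({"g1": {"a", "b"}, "g2": {"b", "c"}, "g3": {"c", "d"}}, "g1")` ranks three hypothetical graves starting from `g1`, and `best_ranking(contents, before_after=[("g4", "g1")])` returns only an order in which `g4` precedes `g1`. Ties between candidates are broken alphabetically here, which is an arbitrary choice.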
The intention of this list of before:after pairs is to include data about the layering of site structures.

Both forms of the program permit the user to set some of the entry designations for computation. With KOR50, the intensity of mixing can be defined, that is, how many movements of a given attribute the program tests on average at each value of temperature. Raising the intensity by a given factor lengthens the calculation time by the same factor; however, the intensity must not be too small, as the sample will then cool too quickly. Further, it is possible to designate how widely the distance between the carriers chosen for a trial exchange should be spread around the average value. The best values lie between 2.0 and half of the value of the intensity of mixing. We have two options for decreasing the calculation time: we can begin with a lower starting temperature or we can increase the speed of cooling. However, each of these types of acceleration gives an inferior result.

Fig. 5: Sedlo on Bled Castle. Group construction by sex and chronological order.

Fig. 6: Sedlo on Bled Castle. Groups and sex of the carriers. The numbers of the carriers and attributes are equal to the sequence of seriation.

With KOR62, the entry designations are naturally different. It is possible to define a scale for the size of the group. Different types of weighting can be chosen. The weights of the attributes can be all the same, regardless of the number of appearances; equal to the number of appearances; the inverse of the number of appearances; or as designated by the user in the entry data base. At the moment, the best weighting seems to be that which is equal to the number of appearances: attributes which appear more frequently are heavier than infrequent ones. However, the different weights influence above all the choice of the initial carrier. Namely, we have the option of designating the initial carrier ourselves and carrying out one seriation only, or we can leave the choice to the computer and tell it to test all of