ON a o\ o\ a\ [f^j [«s S i I 0 Informatica An International Journal of Computing and Informatics Special Issue: Information Society and Inteligent Systems Guest Editors: Cene Bavec, Matjaž Gams The Slovene Society Informatika, Ljubljana, Slovenia X Informatica An International Journal of Computing and Informatics Basic info about Informatica and back issues may be FTP'ed from ftp. arnes. si in magazines/informatica ID: anonymous PASSWORD: FTP archive may be also accessed with WWW (worldwide web) clients with URL:http ://www2.ij s.si/"mezi/informatica.html Subscription Information Informatica (ISSN 0350-5596) is published four times a year in Spring, Summer, Autumn, and Winter (4 issues per year) by the Slovene Society Informatika, Vožarski pot 12, 1000 Ljubljana, Slovenia. The subscription rate for 1999 (Volume 23) is - DEM 100 (US$ 70) for institutions, - DEM 50 (US$ 34) for individuals, and - DEM 20 (US$ 14) for students plus the mail charge DEM 10 (USS 7). Claims for missing issues will be honored free of charge within six months after the publication date of the issue. Tech. Support: Borut Žnidar, Kranj, Slovenia. Lectorship: Fergus F. Smith, AMIDAS d.o.o., Cankarjevo nabrežje 11, Ljubljana, Slovenia. Printed by Biro M, d.o.o., Žibertova 1,1000 Ljubljana, Slovenia. Orders for subscription may be placed by telephone or fax using any major credit card. Please call Mr. R. Mum, Jožef Stefan Institute: Tel (+386) 61 1773 900, Fax (+386) 61 219 385, or send checks or VISA card number or use the bank account number 900-27620-5159/4 Nova Ljubljanska Banka d.d. Slovenia (LB 50101-678-51841 for domestic subscribers only). J i According to the opinion of the Ministry for Informing (number 23/216-92 of March 27, 1992), the scientific ; journal Informatica is a product of informative matter (point 13 of the tariff number 3), for which the tax of traffic \ amounts to 5%. t i f f Informatica is published in cooperation with the following societies (and contact persons): Robotics Society of Slovenia (Jadran Lenarčič) Slovene Society for Pattern Recognition (Franjo Pemuš) Slovenian Artificial Intelligence Society; Cognitive Science Society (Matjaž Gams) Slovenian Society of Mathematicians, Physicists and Astronomers (Bojan Mohar) Automatic Control Society of Slovenia (Borut Zupančič) Slovenian Association of Technical and Natural Sciences (Janez Peklenik) Informatica is surveyed by: AI and Robotic Abstracts, AI References, ACM Computing Surveys, ACM Digital Library, Applied Science & Techn. Index, COMPENDEX*PLUS, Computer ASAP, Computer Literature Index, Cur. Cont. & Comp. & Math. Sear., Current Mathematical Publications, Engineering Index, INSPEC, Mathematical Reviews, MathSci, Sociological Abstracts, Uncover, Zentralblatt für Mathematik, Linguistics and Language Behaviour Abstracts, Cybemetica Newsletter The issuing of the Inforwaticajoumal is financially supported by the Ministry for Science and Technology, Slovenska 50, 1000 Ljubljana, Slovenia. Post tax payed at post 1102 Ljubljana. Slovenia taxe Percue. Introduction: Information Society and Intelligent Systems This "Information Society" special issue of Informatica consists of selected papers, presented at the "Information society - IS'99" international conference held in Ljubljana, Slovenia. The conference papers were modified and typically extended for around 30%. The multiconference included five independent conferences: Information Society and Intelligent Systems, Education in Information Society, Data Mining and Warehouses, Development and Reengineering of Information Systems, Biology and Cognitive Sciences. All of them are related to information society. Most of the papers are dealing with information society in general and with information society in Slovenia. No doubt we are rushing into the information society. Information, the Internet and intelligent systems characterize this, the most developed era of human civilization. In the information age, everybody has to adapt constantly. Even the mighty Microsoft, the undisputed king of SW-tools generation, is trembling in new circumstances. While big companies are intensively searching for new approaches and markets, an Internet society of millionaires is emerging in developed countries. These nouveau richies do not need enormous investments or decades of hard work. What they need is a good idea to be implemented on the Internet. Risks may be high, but incomes are enormous. Internet business is growing by 70-80% each year. One year ago we had to observe Europe lagging behind USA. This year, Europe is speeding into information society with the same rate as America. Finland and some European countries are among most developed information countries in the world. Nokia, Finnish telecommunications and information giant, controls over 50% of the mobile telephone market. Slovenia faces a kind of stagnation in 1999, opposite to the year before. Previous successful introduction of the Internet and information society was strongly correlated with the growth of national providers such as ARNES ("Academic and Research Network of Slovenia") for science and education, and CVI ("Government Centre for Informatics") for governmental institutions. Mobile telephony in Slovenia is growing rapidly because of the end of state monopoly. In other telecommunication areas, state monopoly remains the greatest obstacle for further progress. This in turn decreases the Internet and information society growth in Slovenia. Institutions, politicians and leading managers in Slovenia spend much of their energy fighting for their positions. Instead, they should take a clear position towards progress of information society and its introduction into private enterprises and governmental institutions. For example, unlike many Eastern European countries we have not managed to establish our national ACM chapter or national Information society Forum. Next year we are going to try again. Information society is the only possibility of development if Slovenia wants to catch up with the most highly developed countries in Europe and elsewhere. The process of becoming a member of the European Union is a unique opportunity for Slovenia to decide clearly in favor for development strategy, and thus to become prepared to enter into the third millennium and into the information society. The special issue consists of contributions grouped into general papers and technical applications. The first five papers in the special issue deal with information society in general, and often also with its introduction in Slovenia: The first paper by M. Gams "Information Society and the Intelligent Systems Generation" describes general properties of information society and intelligent systems. It is claimed that information society promotes a primitive network intelligence displayed as connected autonomous intelligent agents. A short history introducing intelligent systems and agents in Slovenia is presented. The "A New Perspective in Comparative Analysis of Information Society Indicators" paper by P. Sicherl analyses number of hosts in European countries and in particular, relations to Slovenia. Although Slovenia's position is relatively good - above European average, recent years indicate certain stagnation in comparison to most developed European countries like Finland. V. Vehovar and M. Kovačič similarly to Sicherl conclude that Slovenia is generally on European average with respect to the penetration of the information technologies in society. In the paper "Measuring Information Society: Some Methodological Problems" they analyse different technical indicators and attitudinal measures and not just the number of hosts. Another modelling and visualisation of information society development is presented in "Modelling of an Information Society in Transition - Slovenia's Position in the CE Countries" by M. Krisper and T. Zrimec. Different techniques are used, from clustering, portfolio, diagramming etc. Six Central European countries associated to the European Union are compared. Results show that Slovenia is often most closely associated with EU. K. H. lizuka and M. Wada in "Customer Satisfaction of Information System Integration Business in Japan" describe customer satisfaction as one of the major motivating factors in information society. They analyse and propose elements that affect total customer satisfaction. The next group of papers describes information society and an additional technical subject, e.g. digital signatures; First of these papers is the "Digital Signatures Infrastructure" by T. Klobučar and B. Jerman Blažič. They present an infrastructure for the use of digital signatures, technical aspects, and a short overview of several existing legal frameworks. E. Jereb and B. Šmitek in "Using an Electronic Book in Distance Education" present experience they obtained by forming an electronic book in distant education, and student opinion studying in the new Distance learning centre. In "Multi-Attribute Decision Modeling: Industrial Applications of DEX" M. Bohanec and V. Rajkovič describe application of a decision-support system DEX. M. Ankerst, C. Elsen, M. Ester and H.-P. Kriegel describe algorithms and system for data visualisation in the "Perception-Based Classification" paper. Large networks can be reduced into a smaller comprehensible structure that can be easier interpreted as proposed by V. Batagelj, A. Ferligoj, and P. Doreian in "Generalized Blockmodeling". Clustering methods are described in the "Adapted Methods For Clustering Large Data-Sets Of Mixed Units" by S. Korenjak Čeme. I. Nančovska, L. Todorovski, A. Jeglič and D. Fefer describe an application of two systems - an equation discovery system and a neural network in "Equation Discovery System and Neural Networks for Short-term DC Voltage Prediction". Soft computing and neural network methods are applied in "Adaptive On-line ANN Learning Algorithm and Application to Identification of Non-linear System" as presented by D. Sha and V.B. Bajič. We hope that the special issue will be a contribution to information-society related efforts and that it will, from an independent, scientific aspect, give answers to a number of practical questions appearing in economy and in policy. Information society is based on the civil society of free-thinking individuals and of academic, economic and other groups offering the possibility to release the intellectual potential which will enlighten societies and countries, including Slovenia. Cene Bavec, Matjaž Gams Information Society And The Intelligent Systems Generation Matjaž Gams Jožef Stefan Institute, Jamova 39, Ljubljana, Slovenia Phone:+38661 1773900; Fax:+386 61 1251038 matjaz.gams@ijs.si, http://www2.ijs.si/~mezi/matjaz.hlml Keywords: information age, Internet, intelligent agents, overview paper, viewpoint Edited by: Cene Bavec Received: October 12, 1999 Revised: December 8, 1999 Accepted: December 19, 1999 In this overview paper we analyze basic laws and properties of the information society in general, and its introduction in Slovenia. It is claimed that information society initiated the emergence of primitive network intelligence demonstrated through intelligent assistants on the Internet. One of the key reasons for emergence of the new software generation is the growth of the Internet, and the other information overload. The introduction of intelligent systems, and particularly intelligent agents in Slovenia is analyzed. Finally, the EMA employment agent, one of important intelligent agent applications in Central Europe, is described in detail. 1 Introduction Information society is often seen as another step in the progress of human civilization. We are moving from post-industrial society and economy into information society and information-technology dominated economy. Changes essentially inlluence the way we work and live. By 2002, it is predicted that over 80 million Europeans will have access to the Internet, and that 5 % of EU gross domestic product will be affected by the use of digital systems. Great trends are expected in the Internet commerce. In 1998, nearly $8 billion sales were generated in USA by around 9 million American households. In comparison, only $1.2 billion were accounted for online shopping in Europe. Europe was and is significantly lagging behind USA. But in 1999, Europe progressed much faster. Forecasts predict that 500.000 e-commerce-related jobs will be created within the next few years. Each year, Web sales to consumers are expected to grow by 70%. This growth is expected to be exponential for a couple of forthcoming years. The technological basis of the information age is the Internet with its constant growth (Etzioni 1996; http://www.cio.com/WebMaster/metcallel.html). Another important improvement is the emergence of network intelligence that represents a natural step in the computational evolution heading towards more helpful, adaptive and creative programs. These programs are essential for humans because without intelligent assistants we can not cope with information overload. At the same time, the pace of progress is so quick and unpredictable in details that we can not determine future in any detail. What we can do is to recognize major information society laws (Lewis 1998; Metcalfe 1997) as described in Section 2. These laws are related to electronics, informatics, and the Internet. In Section 3 and 4 we analyze Slovenian introduction of information society and intelligent systems. The first major application of intelligent agents in Slovenia, the EMA employment agent, is described in Section 5. 2 Global Information Society Information society is by definition global, however, its implementation is to a large extend dependent on the GNP of a particular country. Therefore, while the state and the pace of progress depend on each country, the basic information society laws stay valid for the global world. Moore's Law (http://www.whatis.com/mooresla.htm) describes a constant trend in chip properties. The chip capacity doubles in a time span from L5 to 2 years depending on the type of particular performance of a chip. The formula is: Perlbrmance(new) = Performance(old) * 1.5 time where "lime" is the number of years: The basic property of the law - the constant exponential growth - remains unchanged over several decades (Moore 1975, Hamilton 1999). Metcalfe's Law (http://www.cuug.ab.ca/~branderr/csce/ metcalfe.html) says that the value of a network is proportional to the square of the number of nodes, connected by the network: Value =: K * nodes' In other words, the bigger the net, the square bigger the value. "K" is a constant. Sidgemore's Law determines the growth of traffic over nets. The law says that the traffic doubles every three months: Traffic(new) = Traffic(old) * 2'"'""' Andreesen's Law says that the cost of bandwidth is dropping exponentially and inversely proportional to Sidgemore's law: Cost(new) = Cost(old) * 1/2 (4 lime) Lewis/Flemig's Law describes the network type of capitalism. It denotes nearly "friction-free economy" in the sense that there is small marginal cost and a huge shelf space. The exponential growth indicates that a genuine new market idea will get awarded by huge profits. But in addition to quick rise, an exponential decline is expected when new, more advanced systems appear on the market. The equation describing the law is: MarketShare(time) = 1/(1+ K * B * time) where "K" is a constant. The "B" parameter denotes the learning parameter. Rules of the thumb: Put on the Internet all your information and information activities. This law means that it is cheaper to put information and information activities on the Web sooner than later (Petrie et al. 1998). Not only it is indeed cheaper and more cost-effective than when done in a standard way, it is also the only way to go along with competition. The cyber-world doubles fortune. Besides the material world we actually live in, the cyber-copy of our world matures. Since the introduction of the cyber-world in effect tends to double activities and money in circulation, stories of reach youngsters or rich Internet population in the devèloped countries are well grounded by a general trend. It also guarantees further growth of the developed world despite saturation in other human activities, which are related to classical material world. Another important trend is that our information systems on computers are becoming more and more a cyber-copy of ourselves. Side-effect of information society is information overload. In infosphere we have to cope with more and more information from one month to another in order just to stay competitive. As a consequence, the information overload causes disappearance of free time, it causes the brain overload and decrease in classical human social life. Information society demands intensive information knowledge for successful leadership. It is commonly accepted that there is a huge gap between existing knowledge of top executives, politicians and other leaders, and the desired knowledge for successful managing and leadership (O'Leary 1997). The gap is higher in Europe than in USA, and higher in Slovenia than in EU. Information society belongs to all of us. In a democratic society there are several institutions cooperating in the process of governing and creating strategic directions. Among essential institutions of democratic societies are civil institutions (Borenstein 1998). Information society is by definition a civil society although governmental institutions typically implement it. An example would be Clinton's advocating of information highway or several governmental information society projects in Europe; e.g. Bangemann's reports (http://europa.eu.int/comm/ dg03/speechba.htm). In countries like Slovenia, lacking richness of civil society structures developed in decades of Western democracy, the introduction of Internet is a major inhibitor of faster progress. The Internet is the most democratic and free media in the world. This was legally established with the American Supreme Court decisions about pornography and free speech on the Internet. In the simplest way it can be observed as a fact that pornography (inside "reasonable" limits) is allowed on the Internet and not on TV. It also means that anarchy and even criminal organizations can exploit this freedom, but the freedom of speech is accompanied by the fact that in such a case sooner or later we are going to hear things we don't like. Whatever the case, the Internet is the most democratic media humans ever had. While countries differ in their social and economic order, the Internet enhances democracy and civil society regardless of their previous level. The Internet and information society are our hope for the future. There have been many technological innovations that spurred human progress. For example, we speak of the "iron age" historic period. These days we speak of the "information age" or of the "information society" age. Not only new technology changes the way we live in the technical sense; the changes are essential also in the way society functions (Negroponte 1998). At the same time, the world of computer systems we use is rapidly changing due to the massive introduction of information activities. The trend is towards more user-friendly intelligent systems. 3 Slovenia and IS The introduction of information society and intelligent systems in Slovenia was accompanied with problems of all kinds. Slovenia is one of independent countries that emerged from former Yugoslavia. In those turbulent transitions, funds for science continued to decrease for at last 5 years. Having in mind that there are only a couple of computer R&D institutions in the country with sufficient critical mass of educated stuff, the introduction of information society looi^ed bleak. Surprisingly, the progress in turbulent times was faster than anticipated. There might be at least two reasons: first, Slovenia lacked state institutions and by not being burdened by old institutions, new ones could be more up to date. Second, due to the inevitable conflict in the independence days the Internet played substantial role in helping to inform and thus motivate public opinion in the West. The introduction of information society started with the introduction of the Internet at the Jozef Stefan Institute (http://www.iis.si) as result of a long-term cooperation with the Cern European Laboratory for Particle Physics (www.cern.ch/CERN/Technology/ index.html). Soon, the use spread through universities and schools. The major Internet provider was (and still is) ARNES (http://www.arnes.si/). After transformation into a market society, the economy trends changed to positive, but several state firms still faced substantial problems. Foreign investitures often bought firms and soon afterwards transformed research departments into production units. As a result, 45% of researchers and developers in the Slovenian R&D sector moved into other sectors. Universities and research institutions were not as heavily stressed because of transition problems. University stuff even increased while for example the major research institute, the Jozef Stefan Institute with over 900 employees declined to 700. The initial fast growth of the Internet hosts was slowed down after majority of governmental institutions got connected. Private institutions and individuals followed cautiously. Another important indicator is the number of people earning money through the Internet. While Bill Gates and CEOs of major computer companies dominate the list of world's richest, Slovenian computer professionals seldom appear on the list of nation richest. While in the developed countries an Internet generation of rich youngsters emerges, this phenomenon is substantially less present in Slovenia. Unlike in the developed countries, Slovenian political and partially business leaders often belong to the computer hardly literate generation. As a consequence, the real progress of information society is not as fast as it could be. Overall, unemployment rate in Slovenia has grown up substantially in- the last 10 years, especially in the first transition years. In recent years the unemployment in Slovenia is stable - with 125.000 unemployed and 2.000.000 inhabitants the unemployment rate is close to European average. On the other hand, Slovenia is currently one of the most perspective candidates for joining the European Union ba.sed on political stability and economic parameters. It borders Italy, Austria, Hungary and Croatia (see Figure I). GNP is roughly 10.000 US $. The number of computer connections to the Internet per capita is close to average in Western Europe. In recent years, all economic trends tend to be positive (with the exception of 7% inflation and growing depth). Even science and research, while not as supported and worshiped as they should be, are facing better times. Figure I: Slovenia's position in Europe. Today, practically all schools from universities to elementary schools, all research and development institutions, and large majority of other governmental institutions are connected to the Internet. Several R&D conferences are more or less strongly related to the introduction of IS in Slovenia: "Information society" http://ai.iis.si/i.s/indexa99.html. "Electronic commerce". Infos, BRK, Informatika etc. In 1999, the major improvement has been in the mobile phone area. The number of mobile phones is expected to soon overcome the number of classical telephones. 4 Intelligent Systems Intelligent systems are computer systems aimed at developing advanced user-friendly systems that work in real-life environments (Goonatilake, Treleaven 1995; Bielawski, Lewland 1991). The Internet is the media enabling substantial advantages for intelligent systems (Etzioni 1996; Etzioni 1997). Intelligent systems use a wide variety of artificial intelligence techniques typically implemented on top of classical modules (Bratko, Muggleton 1995): rule-based systems, production systems, expert systems, fuzzy logic systems, neural networks, memory-based reasoning. Advanced systems often combine various methods into one hybrid or integrated system (Gams et al., 1996). The emphasis of intelligent systems design is on combination of AI methods and engineering techniques enabling construction of systems performing practical tasks better than classical systems. Intelligent agents (Bradshaw 1997; Mueller 1996) arc a special branch of intelligent systems, capable of learning. 452 Informatica 23 ( 1999) 449-454 M. Gams adapting to the environment, to each specific user, and to each specific situation as much as possible. According to Pattie Maes (Maes at al. 1999) intelligent agents are an important step ahead in humanising computers. Intelligent agents represent personal assistants collaborating with the user in the same environment (Maes, 1994; Minsky, 1987; 1991). Intelligent agents are basically intelligent interfaces providing specific utilities of the system while the core of the system is typically an Internet based query or database system. Unlike passive query languages, agents and humans both initiate communications, monitor events and perform tasks. The essential properties of agents are autonomy and sociability (Bradshaw 1997; Jennings, Wooldridge 1995; 1997; Etzioni, Weld 1995). Intelligent systems have been developed in a couple of SW centers in Slovenia, among others in the Department of intelligent systems (headed by Prof. Ivan Bratko) at the Jozef Stefan institute. One example is an intelligent system for controlling quality of the Sendzimir rolling mill emulsion (Gams et al. 1996). Practically all national production of rolling steel is manufactured through this machine. The application represents one of major national intelligent system in regular industrial use. In addition to this application, the department has in the last ten years designed around 10 intelligent systems now available on the Internet http://turing.ijs.si/Ales/katalog-a/KATALOG-A.html. 5 The EMA Employment Agent In 1993, the first agent was designed in Slovenia - lOI, an Intelligent Operating Interface (Hribovsek 1994; Gams, Hribovsek 1996). The basic task of lOI was correcting typing errors and providing help for users communicating with the VAX/VMS operating systems. lOI is an intelligent agent able to learn, adapt, and communicate in a relatively complex environment with human users, Its most important property is self-learning through observing the user performing tasks in the environment. Later, lOI uses accumulated knowledge through user experience to advise new users. The system thus performs a task similar to MS Office Assistant with the essential difference that knowledge in Assistant is coded in advance while lOI constructs most of its knowledge through user observation. lOI is implemented as a 2000-line program in Pascal with parts of it written in the VAX/VMS command language. The lOI agent was implemented as a research prototype only, however, its flexibility and adaptability as a personal intelligent agent have shown reasonable improvements over classical systems. The most positive properties as observed by users in the testing period, are; lOI is easy to use, it does not demand specific knowledge, is easy to learn and use, and is very transparent. These favorable properties are typical intelligent-agent properties. Two other major intelligent agents developed in Slovenia are Personal WebWatcher (Mladenič 1998) and Ema (Gams et al. 1998). The EMA project (see Figure 2) started seven years ago as an R&D project "An Integrated Information System for Employment in Slovenia" to provide help regarding unemployment problems. The project was partly funded by the Slovenian Ministry of Science and Technology and partly by the Employment Service of Slovenia (ESS). The system consists of two parts; one is applied at the Employment Service of Slovenia (http://www.ess.gov.si/English/elementi-okviriev/F-Introduction.htm) where one gets basic information about employment activities in Slovenia, about ESS, and about interesting employment functions. The top part of the system is the EMA agent. For the last three years, the system was further developed as part of the INCO-Copernicus Project: 960154, cooperative research in information infrastructure, CRII (http://www-ai.ijs.si/~ema/proj.html). The intelligent system/agent EMA with a natural language interface consists of several modules. ■ '-Inl^ ...g „ŽS., & - iLlV'lployniciit Agent ■ ^"H'l^P'PB'rt^W'^ C" ^tt-oKtiuiij aiai o CvcQT 2-•'000 free jobs. »tj.-" (r..-iuant>j or ■ o, >t. I VP I. le m điff "üaidc Eouj' hb««. 1. ' 'V"^ «Uli. u 2." B.tgisii'e«! uoera t5T>« u!iei Figure 2: The first version of the EMÀ employment agent was among the first in the world to offer substantial amount of nationally available jobs on the Internet. A user has to identify with a username (note that here security is not very relevant) or through a favorite/bookmark list. EMA has four basic functional modules: storing patterns and ordering mails regarding vacant jobs, available workers, it enables storing and observing interesting WEB sites chosen by users, and enables matching jobs and workers. EMA is a "classical" agent providing user-friendly information upon demand or when it notices relevant information for each particular user. The system is a 15.000 lines program written mainly in C, partially in other languages. Together with text and data it occupies 30 M on a disk. EMA receives data as limited Slovenian text (with the exception of bulletin boards with language independent free input) and translates it into English. The translation is based on a dictionary consisting of up to four words observed before in the employment data. New combinations are in the worst case translated as direct word-by-word translation and stored for further overview by humans. Stored combinations are sorted by frequency and translated by humans if reasonable. In addition, the translation system looks into the morphology dictionary to capture different forms of the same words. Finally, a spell-checker submodule corrects spelling errors. The translation is currently not yet at the level performed by systems translating between larger European languages, however, it is sufficiently good to enable understanding since the syntax is quite limited. In the next stage, the text is transformed into appropriate computer readable forms and HTML forms as outputs. Two speech modules transform the data into speech. The English speech system is based on the Microsoft agents. We have designed our own Slovenian speech module (Sef et al. 1998). The EMA agent was and still is among most successful applications of intelligent agents in Slovenia. In the first year of its implementation, our country was the third in Europe to offer national employment information through the Internet. At that time, we were the first country in the world to provide over 90% of all nationally available jobs on the Internet. 6 Conclusion Without any doubt human civilization further evolves into information society. We are able to establish certain laws and rules of this development while specific details and further progress remain enigma to all of us. Intelligent systems and agents through the Internet and partially through PCs form the new software generation, the intelligent systems generation. While this generation still lacks true human intelligence and consciousness, the primitive network intelligence emerges consisting of intelligent assistants capable of autonomous and social activities (Munindar 1997; Mylopoulos 1997). In Slovenia, one of the countries wishing to join European Union, information society is perceived as a global phenomenon and as a major technological field which can bring us fortune or stagnation. The essential question is whether the existing or at least the forthcoming generation of political and business leaders will fully embrace the information-age rules of the game. Acknowledgement: Financial support for the EMA project was provided by an international project INCO-Copernicus 960154, Cooperative Research in Information Infrastructure, CRII; by the Ministry of science and technology in Slovenia; and by ESS. We would like to thank the CEO of the Employment Service of Slovenia, Mr. J. Glazer. 7 References [1] L. Bielawski, R. Lewland, Intelligent Systems Design; Integrating Expert systems. Hypermedia and Database Technologies, Wiley, 1991. [2] N. S. Borenstein, "Whose Net is it Anyway", Communications of the ACM, April 1998, pp. 1921. [3] M. Bradshaw (ed.). Software Agents, AAAI Press/The MIT Press, 1997. [4] I. Bratko, S. Muggleton, "Applications of Inductive Logic Programming", Communications of the ACM, Vol. 38, No. I I, 1995, pp. 65-70. [5] O. Etzioni, D.S. Weld, "Intelligent Agents on the Internet: Fact, Fiction, and Forecast", IEEE EXPERT, Intelligent Systems & Their Applications, Vol. 10, No. 4, 1995, pp. 44-49. [6] O. Etzioni, "The WWW: Quagmire or Gold Mine?" Communications of the ACM, Vol. 39, No. 11, 1996, pp. 65-68. [7] O. Etzioni, "Moving Up the Information Food Chain", AI Magazine, Vol. 18, No. 2, 1997, pp. IIIS. [8] M. Gams, B. Hribovšek, "Intelligent-Personal- Agent Interface for Operating Systems", Applied Artificial Intelligence, Vol. 10, 1996, pp. 353-383. [9] M. Gams, M. Drobnič, N. Karba, "Average-Case Improvements when Integrating ML and KA", Applied Intelligence 6, No. 2, 1996, pp. 87-99. [10] M. Gams, A. Karalič, M. Drobnič, V. Križman, "EMA - An Intelligent Employment Agent", Proc. of the Forth World Congress on Expert Systems, Mexico, 1998, pp. 57-64. [11] S. Goonatilake, P. Treleaven (eds.), Intelligent Systems for Finance and Business, Wiley, 1995. [12] S. Hamilton, "Taking Moore's Law into the Next Century", IEEE Computer, Januar 1999, pp. 43-48. [13] B. Hribovsek: Intelligent Interface for VAX/VMS, M: Sc. Thesis (in Slovene). [14] N.R. Jennings, M. Wooldridge, "Intelligent Agents and Multi-Agent Systems", Applied Artificial Intelligence, An International Journal, Vol. 9, 1995, pp. 357-369. [15] N.R. Jennings, M. Wooldridge, Agent Technology, Springer, 1997. [16] M. Lewis, "Designing for Human-Agent Interaction", AI Magazine, Vol. 19, No. 2, 1998, pp. 67-78. [17] P. Maes, "Agents that Reduce Work and Information Overload", Communications of the ACM, 37, 1994, pp, 31-40. [18] P. Maes, R.H. Guttman, A. G. Moukas, "Agents That Buy and Sell", Communications of the ACM, Vol. 42, No. 3, 1999, pp. 81-91. [19] B. Metcalfe, "What's Wrong with the Internet", IEEE Internet Computing, 1997, pp. 6-8. [20] M. Minsky, The Society ol' Mind, Simon and Scliuster, New York, 1987. [21] M. Minsky, "Society of mind: a response to four reviews", Artificial Intelligence 48, 1991, pp. 371396. [22] D. Mladenič, "Turning Yahoo into an Automatic Web-Page Classifier", in Proceedings of the 13th European Conference on Artificial Intelligence ECAI^S, 1998, pp. 473-474. [23] G. E. Moore, "Progress in digital integrated electronics". Technical Digest of 1975 International Electronic Devices Meeting 11, 1975. [24] J.P. Mueller, The Design of Intelligent Agents, Springer, 1996. [25] P.S. Munindar, "Agent Communication Languages: Rethink the Principles", Computer, December 1997, pp. 40-47. [26] J. Mylopoulos, "Cooperative Information Systems", IEEE EXPERT, Intelligent Systems & Their Applications, Vol. 12, No. 5, 1997, pp. 28-30. [27] N. Negroponte, "A Wired Worldview", EU RTD info, March 1998, pp. 28-30. [28] D. E. OLeary, "A Lack of Knowledge at the Top", IEEE Expert, November 1997, p. 2. [29] C. J. Petrie, A. M. Rutkovski, M. Zacks etc., "Dimensioning the Internet", IEEE Internet Computing, April 1998, pp. 8-9. [30] T. Sef, A. Dobnikar and M. Gams, "Improvements in Slovene Text-to-Speech Synthesis", Proceedings of the 5th International Conference on Spoken Language Processing (ICSLP'98), pp. 2027-2030, 1998, Sydney. A New Perspective in Comparative Analysis of Information Society Indicators Pavle Sicherl Law School, University of Ljubljana and SICENTER, Brajnikova 19, 1000 Ljubljana, Slovenia Tel:+386 61 1501510; fax:+386 61 1501514 Pavle.Sicherl@sicenter.si Keywords: time distance, S-distance, two-dimensional comparison in time and indicator space Edited by: Cene Bavec and Matjaž Gams Received: October 8, 1999 Revised: November 23, 1999 Accepted: December 12, 1999 The analysis of information society indicators can he enriched by supplying a new view of data that can provide new insight from existing data. The slowdown of growth of Internet hosts per 10000 inhabitants in Slovenia after mid-1997 increased the time lag of Slovenia behind leading Finland from 3 years at the end of 1996 to nearly 5 years by August 1999. Time distance methodology is used as a presentation and communication tool to raise awareness of the problem and its consequences in simple understandable terms and to signal the need for an in-depth analysis and action. 1 Introduction Problem: In Slovenia, after a very high rate of growth in the indicator Internet hosts per 10000 inhabitants until mid-1997, such growth slowed down substantially. One can describe the facts in various ways and with various measures. Objective: To make the government, other agents and general public aware of these developments and signal the need for immediate action to correct them. Method: Time distance will be used as a presentation and communication tool to raise awareness of the problem and its consequences in simple widely understandable terms. Since this method can be a useful addition to existing methods of analysing differences between compared units in many fields, a further illustration is provided for the case when the benchmark for comparison is the average value of the analysed indicator for EU15. 2 Methodology: time distance concept and statistical measure S-distance The time perspective, which no doubt exists in human perception when comparing different situations, is systematically introduced both as a concept and as a quantifiable measure. Since events are dated in time, in time series comparisons, regressions, models, forecasting and monitoring, the notion of time distance always existed as a "hidden" dimension. In order to systematise and formalise the approach and defme an appropriate statistical measure for operational use, amendments to the present state-of-the-art are needed on two levels: conceptual and analytical. First, a broader theoretical framework is required. The conventional approach does not realise that, in addition to the disparity (difference, distance) in the indicator space at a given point in time, in principle there exist a theoretically equally universal disparity (difference, distance) in time when a. certain level of the indicator is attained by the two compared units. Second, a statistical measure S-distance has been defined to suggest a possibility how the broader concept and reference framework can be measured in operational terms. The aim is to provide new insights from existing data due to an added dimension of analysis and thus to complement conventional statistical measures. Time distance in general means the difference in time when two events occurred. We define a special category of time distance, which is related to the level of the analysed indicator. The suggested statistical measure S-distance measures the distance (proximity) in time between the points in time when the two compared series reach a specified level of the indicator X. The observed distance in time (the number of years, quarters, months, days, minutes, etc.) is used as a dynamic (temporal) measure of disparity between the two series in the same way as the observed difference (absolute or relative) at a given point in time is used as a static measure of disparity [1,2,3]. For a given level of Xl, Xl = Xi(ti) = Xj(tj), and the S-distance, the time separating unit (i) and unit (j) for the level Xl, will be written as Sij(XL) = AT(XL) = tiCXJ - ti(X,J where T is determined by Xl. In special cases T can be a function of the level of the indicator Xl, while in general it can be expected to take more values when the same level is attained at more points in time, i.e. it is a vector which can in addition to the level Xl be related to time. Three subscripts are needed to indicate the specific value of S-distance: (1 and 2) between which two units is the time distance measured and (3) for which level of the indicator (in the same way as the time subscript is used to identify the static measures). In the general case also the fourth subscript would be necessary to indicate to which point in time it is related (T|,T2,...,T|,). The sign of the time distance comparing two units is important to distinguish whether it is a time lead (-) or time lag (+) (in a statistical sense and not as a functional relationship): Sìì(XL) = -SÌÌ(XL) . Using the comparison between two units it can be shown that the generic concept of time distance goes together very naturally with the existing concepts of static disparity at a given point in time and the notion of the growth rate over time. Table 1 provides a schematic example for such comparisons for a given indicator. Row one is the most frequently used type of comparative analysis; levels of the indicator at a given point in time are compared. In such comparison two points are used, for each of them we have three elements of information: (i) the respective level of the indicator, (ii) to which unit it belongs, and (iii) at what time it happened. In this case unit as well as time (since it is constant for static comparison) serve as identifiers, while the levels are used to calculate the static difference. Row two compares two levels of the indicator for each unit at two points in time, separately for each unit, which means that one calculation indicated in row two refers to unit 1, and another to unit 2. The simplest example would be growth rate for unit 1 and growth rate for unit 2. Here the unit is the identifier, while the numerical values on levels and time are used in calculating this measure. These two steps are standard procedures. The first one represents the static type of comparison; the second one measures the dynamic properties of the indicator for each unit separately. Following the same logic, for the novel statistical measure S-distance in row three level is the same, level and unit serve as identifiers, and time is used for calculating time distance. It is remarkable that the notion of time distance, which can be in principle developed from the same information used in steps one and two, has not been developed theoretically and as a standard statistical measure. TIME UNIT LEVEL Measure TIME same 2 2 static difference UNIT 2 same 2 change over time LEVEL 2 2 same time distance Table 1. Points of comparison for static difference, change over time and time distance (two units) While there may be different problems involved in the calculation of these three types of measures, in terms of availability and comparability of data, in principle these three types of measure can be integrated into a formally consistent analytical framework. There are alternative ways of doing this, following from the distinction between backward looking (ex post) and forward looking (ex ante) time distances. They relate to different periods, past and future, the first belongs to the domain of statistical measures based on known facts, the second is important for describing the time distance outcomes of the results of alternative policy scenarios for the future. Looking backwards, ex post or historical time distance indicates how many years ago the more developed unit experienced a specified level of the indicator of the less developed unit at a given point in time [3]. A very important relationship shows that, ceteris paribus, time distance is a decreasing function of the magnitude of the growth rate of the indicator. This conclusion shows that the S-distance as a dynamic (temporal) measure of disparity offers a perspective which may be quite distinct from that provided by static measures. This new view of the information is using level(s) of the variables as identifiers and time as a focus of comparison and numeraire. This approach and the broad range of its possible applications is much more complex and general, but the time distance is the priority choice because of its intuitive nature, and the importance of the time dimension in semantics of describing various situations in real life and forming our perceptions about them. In this paper only the application to comparison of one indicator between several units will be used. However, the approach has been generalised to complement conventional measures in time series comparisons, regressions, models, forecasting and monitoring, and to analysis of single time series [3] and lo variables other than time [4], In all such applications it can provide from existing data new insights due to an added dimension of analysis. 3 Data and results for Slovenia, EU15 countries and candidate countries Data on Internet hosts per 10000 inhabitants used relate to the period end of 1993-August 1999 [5,6,7], At present is the measurement and empirical analysis of information society indicators beset with problems. It is stated that the single most important obstacle to effective data collection is the lack of standardised definitions of information technology and the exclusion of important costs associated with its use, like personnel and training expenses. A further weakness is the relative absence of systematic information how information technology is actually being u.sed [8]. In addition to these general obstacles there may be also some specific reasons that the slowdown of the increase in Internet hosts per capita in Slovenia in the last two years shown in RIPE data may have been exaggerated [9], We shall proceed by analysing the available RIPE data, yet there should be an appropriate caution about possible inaccuracy in the available data. Comparative analysis of the differences among countries can be presented in two dimensions. The conventional static differences at a given point in time are in this paper complemented by the time distance dimension. Time distance in Table 3 is for practical reasons calculated for the levels of the indicator for those countries, which are behind Slovenia, and for the level of Slovenia for the countries, which are ahead of Slovenia. 1993 1994 1995 1996 1997 1998 Aug. 1999 LUX lA 12.5 46.0 85.2 113.4 182.3 218.8 DAN 16.1 35.4 96.9 203.3 321.1 571.1 608.3 BEL 7.0 17.3 30.2 64.0 104.8 202.4 307.5 AUT 18.9 34.0 66.3 110.2 134.4 214.0 235,7 DEU 13.7 24.4 58.0 84.4 137.7 177.0 186,3 FRA 9.3 14.4 26.0 40.6 60.7 84.2 106.4 NED 28.6 55.8 110.8 173.4 249.2 395.1 481.9 ITA 2.9 5.0 13.1 25.8 44.2 64,5 96.4 SVE 47.0 84.8 164.1 269.0 394.0 429.9 569.5 UK 19.1 38.7 75.1 122.4 167.3 247.4 272.2 FIN 65.2 133.9 416.7 612.1 945.8 902.6 930.5 IRL 6.5 15.3 37.3 74.2 109.3 155.6 181.7 ESP 3.6 7.0 13.1 28.8 49.9 78.2 94.2 PRT 3.6 5.1 11.9 23.6 42.7 56.3 67.4 GRE 1.7 3.4- 7.4 16.0 26.7 47.1 63.5 SLO 3.1 8.2 28.3 69.5 98.2 1 15.3 116.3 CZE 4.3 10.1 21.1 39.6 55.2 83.6 101.8 SVK 0.7 2.6 5.6 14.8 27.0 41.0 48,3 HUN 3.0 6.6 15.4 29.2 66.7 87.8 106.2 POL 1.3 2.8 6.0 13.7 22.9 32,4 42.9 EST 2.9 7.7 24.1 54.3 108.4 151.2 . 180.8 ROM 0.0 0.2 0.8 3.5 6.0 9,9 14.1 LIT 0.3 1.2 4.7 10.9 26,0 32.7 LAT 0.2 2.0 5.2 23.1 28.6 54,4 63.7 BG 0.0 0.2 1.3 4.0 8.2 12.2 18.3 EU15 12.2 23.6 50.5 78.6 124.3 171,1 199.3 Table 2: Data on Internet host density per 10000 inhabitants Source: Internationai Telecommunication Union Database, Geneva 1998 for 1993-1997 [5]; RIPE [6] in RIS [7] for 1998 and 1999. In Tables 2 and 3 the countries are sorted by the level of GDP per capita (at purchasing power standards) in 1997. Obviously, the Internet hosts per capita are not firmly correlated with GDP per capita. In 1996 Slovenia was occupying a comfortable comparative position in terms of Internet hosts per capita: it was lagging less than 3 years behind Finland as the leading country, and was ahead of several EU countries, i.e. Belgium, France, Italy, Spain, Portugal and Greece. The last four mentioned countries had substantially lower values than Slovenia. The slowdown of growth rate in this indicator for Slovenia after mid-1997 led to a quick deterioration of the comparative situation of Slovenia. By August 1999 the lag behind Finland increased to nearly 5 years. Namely, in case of indicators with high rates of growth the situation can change very quickly, as distinct from the fields where the rate of change is slow. Figure 1 provides visualization of these changes. Tables 2 and 3, and Figure 1 compare Slovenia with EU 15 countries and the nine candidate countries from Central and Eastern Europe. One could also speculate what would be the situation if the rate of growth for the period 1997-August 1999 would continue until the end of 2000 (this should not be interpreted as projections). 1994 1995 1996 1997 1998 Aug. 1999 LUX -0.9 -0.5 -0.4 -0.5 -1,0 -1.5 DAN #N/A -1.4 -1.5 -2.0 -2.8 -3.4 BEL -0.9 -0.2 0.1 -0.2 -0.9 -1.5 AUT #N/A -1.4 -0.9 -1.3 -1.8 -2.3 DEU #N/A .-0.9 -0.6 -0.7 -1.4 -2.0 FRA #N/A 0.1 0,7 1.2 1.5 1.1 NED #N/A #N/A -1.8 -2.2 -2.9 -3.5 ITA 0.6 0.8 1.1 1.6 2.1 1.7 SVE #N/A #N/A -2.4 -2,8 -3.6 -4.2 UK #N/A -1.5 -1.2 -1.5 -2.2 -2,7 FIN #N/A #N/A -2.9 -3.5 -4.3 -4.8 IRL -0.8 -0.4 -0.1 -0.3 -0.9 -1.4 ESP 0.2 0.8 1.0 1.5 1.7 1.7 PRT 0.6 0.8 1.2 1.7 2.3 2.6 GRE 0.9 1.2 1.6 2.1 2.5 2.7 SLO 0.0 0.0 0.0 0.0 0.0 0.0 CZE -0.3 0.4 0.7 1.4 1.5 1.4 SVK #N/A 1.5 1.7 2.1 2.7 3.1 HUN 0.3 0.6 1.0 1.1 1.4 LI POL #N/A 1.4 1.7 2.3 2.9 3.2 EST O.l 0.2 0.4 -0.2 -0.8 -1.4 ROM #N/A #N/A 2.9 3.4 3.9 4.3 LIT #N/A #N/A 2,7 2.9 3.1 3.5 LAT #N/A 1.6 1.3 2.0 2,4 2.7 BG #N/A #N/A 2.8 3.0 3,8 4.1 EU15 #N/A 0.8 0.3 0.6 1,2 1.8 Table 3: Time distance between compared countries and Slovenia, S-distance in years: - time lead, + time lag, Slovenia=0 Source: Own calculation based on data in Table I. If no action would be taken and such slowdown would continue until the end of 2000, a further deterioration of the relative position of Slovenia for this indicator would take place. Slovenia would within a period of only a few years move from a comfortable position near the EU15 average in 1996 (despite being more than 30 per cent below the average EU 15 level of GDP per capita) to a position where the lag behind the foreruntier Finland would be already 6 years. The lag behind Sweden, Denmark and Netherlands would be around 5 years, France, Italy, Spain and Greece would surpass or catch up with Slovenia, and only Portugal out of the EU15 countries would be still behind it. Time distance seems to be an excellent way of presenting the danger of a rapidly deteriorating situation, which everybody can understand, and to signal that an in-depth analysis and corresponding actions are necessary. Some other conventional measures may not provide such warning. E. g., static comparison showed that in 1996 Finland had 8.8 times the number of Internet host per capita in Slovenia, and in 2000 it would be 6.6 times. Time distance adds a qualitatively different conclusion. Similar consequences can be seen from comparison with selected Central and Eastern European countries. In 1996 Slovenia was with Estonia a clear leader in the region for the indicator Internet hosts per capita. In the meantime Estonia moved ahead, and the gap would widen if the present trends would continue. By August 1999 Slovenia is lagging behind Estonia for more than I year. The quality of time distance measure, being transparent and easy to perceive and understand, can be even more appreciated when a larger set of indicators is analysed, involving more issues and different fields of concern. For instance, in 1997 Italy was 18,3 years ahead of Slovenia for GDP per capita at purchasing power parity, while Slovenia was 1.6 years ahead of Italy for Internet hosts per capita. Some of these indicators can change very quickly, some others, like some demographic variables and some other characteristics of human factor, very slowly. Time distances will be different, smaller for those indicators that are more dynamic by their nature, more conducive to policy measures and given higher priority in decision-making process. _____X ____«— "if -- J ------ -^LUX -B-DAN .....i,- BEL .....*......AUT -«-DEU -•-FRA —t—NED -ITA .. .g UK FIN RL .....*.....ESP .....» PRT —•—SLO ---------CZE -SVK POL ■~A-EST -«-ROM -*-LIT -«-LAT -+-BG Time Figure 1. Time distance for Internet host density per 10000 inhabitants, EU and candidate countries, Slovenia=0 Finkwl I IKii _ Austria | Be emhoijrcj | :ui Germany "Ül Ireland Estonia jHungaiv j Czech R, _llaly ■ Spain gWiliBWiiti^Sjiliti^M ■lili J ' ' I Lrilvi.n ]]cireijM= _ akia Foland i'iiUiHhtitM Llllnkiiir.i ' \ I Bulgaria 11 Roman -6 -5 -4 -3-2-1012 S-distance (In years): - time lead, + time lag Figure 2. Differences from EU15 average for Internet hosts per capita expressed in time (August 1999) Figure 2 is an illustration of application of time distance presentation in a similar case of comparative analysis. In this example the average value of Internet hosts per capita for EU 15 is the benchmark for comparison. The dispersion of situations in this respect for EU 15 countries and Central and Eastern European candidate countries can be presented in various ways, like ratios, percentages, absolute value and absolute differences, etc. Furthermore, various summary measures of dispersion could be calculated. Absolute values of the indicator are presented in Table 2. A widely used conventional measure would be indeces or percentage differences. For instance, in August 1999 the index for forerunner Finland would be 467, for Portugal and Greece about 33 as the lowest value for EU 15 countries, and 9 for Bulgaria and 7 for Romania (EU 15= 100). Figure 2 presents another conaplementary view of this set of data. Time distances are calculated in cases of above the average countries for the level of EU15 average, and in cases of below the average countries for the level of indicator in these countries in August 1999. Finland had a lead of about 4.5 years ahead of the EU 15 average, Portugal and Greece were laging the EÜ15 average for about 3 years, and Bulgaria and Romania for more than 4 years. Time distances alow for a distinct new insight that can help to form a richer perception of the situation. Since time distance is expressed in units of time, which everybody understands from ministers, managers to general public, it possesses one of the ideal characteristics of a presentation and communication instrument. It is expected that the analysis of and discussion about time distances will have considerable influence on how people will form their perception about a situation and on public opinion. For instance, in the EU the consideration of economic and social cohesion is an important goal. A series of presentation of results like Figure 2 for a number of relevant indicators would without any doubt provide a new additional insight to a complex multidimensional problem. Similarly, it would be very useful if the results in Table 3 and in Figures I and 2 would be provided for a broad selection of information society indicators. This offers improved semantics for analysis and policy debate, and can in many cases lead to qualitatively different conclusions from those reached in a static conceptual and analytical framework. By analogy, there is a wide-open possibility to apply this methodology to numerous business problems at the micro, corporate and sector levels. Another important advantage of this approach is that the results and conclusions based on the two-dimensional analysis add new information and new insight, while none of the earlier results are lost or replaced. 4 Conclusions In empirical research the art of handling and understanding of different views of data is crucial for discovering the relevant patterns. The time distance approach (with associated statistical measure S-distance) is useful at least in two domains: it offers a new view of data that is exceptionally easy to understand and communicate, and it may allow for developing and exploring new hypotheses and perspectives that cannot be adequately dealt without the new concept. The generic nature of the time distance concept and the S-distance measure leads to the conclusion that the methodology can be usefully applied as an important analytical and presentation tool in numerous applications in a wide variety of substantive fields. Especially in the field of information technology indicators, which is characterised by great speed of change, it would be of great interest to complement rather than replace the conventional measurement of differences between countries or other units with this new perspective of the situation. 5 References [1] P. Sicherl. A Novel Methodology for Comparisons in Time and Space. East European Series No. 45. Vienna Institute for Advanced Studies. Vienna. 1997. [2] P. Sicherl. Time Distance in Economics and Statistics; Concept, Statistical Measure and Examples. In A. Ferligoj (ed.). Advances in Methodology, Data Analysis, and Statistics. Metodološki zvezki. 14. FDV. Ljubljana. 1998. [3] P. Sicherl. The Time Dimension of Disparities in the World. XIIth World Congress of the International Economic Association. Buenos Aires. August 1999. [4] P. Sicherl. Measuring disparities in two dimensions: proximity in time and proximity in indicator space. 70''' International Conferene on Socio-economics, Vienna, Austria 13-16 July 1998. [5] International Telecommunication Union. Database. Geneva. 1998. [6] http://www.ripe.net [7] http://www.ris.org [8] National Science Board, Science & Engineering Indicators - 1998. Arlington, VA: National Science Foundation, 1998. [9] V. Vehovar, Spremljanje informacijske družbe, in P. Sicherl, A. Vahcic (eds.). Model indikatorjev za podporo odločanju o razvojni politiki in za spremljanje izvajanja SGRS, Sicenter, Ljubljana, oktober 1999. Measuring Information Society: Some Methodological Problems Vasja Vehovar, Matej Kovačič Faculty of Social Sciences, Kardeljeva ploščad 5, 1000 Ljubljana, Slovenia Phone: +386 61 1805 100; Fax: +386 61 1805 102 vasja.vehovar®uni-lj.si, matej.kovacic@uni-lj.si Keywords: information society, Internet indicators, survey research, electronic commerce Edited by: Cene Bavec and Matjaž Gams Received: October 10, 1999 Revised: November 30, 1999 Accepted: December 20, 1999 The paper addresses methodological problem of measuring information society. Both, technical indicators and attitudinal measurements for Slovenia are discussed in this context. In particular, the results related to the interest for information society services are presented. The comparison between Slovenia and European Union - despite some methodologiccd problems - shows that the interest for these services is extremely high in Slovenia. Other figures also confirm that Slovenian households and businesses are generally on European average with respect to the penetration of the basic information technologies. However, certain discrepancies with other sources of data call for more efforts in performing these kind of analysis. 1 Preface The concept of the information society has already been around for many decades. Nevertheless, its definition is relatively unclear, and the same is true for the corresponding indicators. There do exist some ad-hock measures from as early as sixties, particularly in the area of services, the information professions and the extent of the business information sector. However, it is extremely hard to establish official indicators for the phenomena in such a dynamic field. The Internet, in particular, has brought even more complications in these measurements. Even in United States, the official estimates about the scope of the electronic commerce will be available only in year 2000, after the electronic commerce transactions have already reached hundreds of billions of USD. The first official estimated will thus arrive after several years of extreme variety in the estimates from numerous consulting agencies. In addition, there are considerable discrepancies also in other measurements of information society. The paper presents an overview of most frequent divergences that arise from the interpretation of indicators of information society. The methodological misunderstanding is also an important reason that unnecessarily hinders the understanding of the position of Slovenia. 2 Quantitative measurement Quantitative indicators of the information society usually refer to numerical Figures — expressed in numbers/percentages of users, or, in the form of financial totals — which most often relate to the use and penetration of modem technologies, especially the Internet, mobile telephony and electronic commerce. However, we have to be extremely careful when interpreting these data. We can, for instance, classify the use of electronic payment orders of the Slovenian Agency for Payments as a form of electronic commerce. Many experts claim that this is also a specific form o Electronic Data Interchange (EDI). In this case the amount of electronic commerce in Slovenia can be measured in hundreds of billions of USD. But if we talk about the transactions where searching, ordering, billing and payment procedures are all performed in the electronic forms — without any paper recording — it is clear that the amount of electronic commerce is only a fraction of this amount, e.g. only around few millions USD. Therefore, we have to be very precise when stating such observations. It is not surprising that the estimates of leading consulting agencies on electronic commerce have varied at rates 1:10 in the past years, and at present they still vary at the rate 1:2. Often, the source of the problems is not even in the statistical methodology or in the definitions, but in a simple fact that the methodological framework is not properly reported. Recent international efforts for standardised measurement on electronic commerce, particularly those at OECD (1999) already brought some results and we can perform certain international comparison of the Internet and electronic commerce usage among companies. The available comparisons with Singapore, Scandinavian countries and Australia shows that with respect to PC usage, Internet penetration and Web site penetration among companies there is, as for 1998, no significant lagging for Slovenia. However, there is a certain time lag in the adoption of electronic commerce applications. Unfortunately, the international comparisons in the area of electronic commerce are much more complicated. Recent experiments in RIS 1999 survey of companies clearly demonstrated the sensitivity in these kinds of measurements. It has been shown that when electronic commerce was defined as any business transaction performed over the computer networks the percentage of companies claiming to use electronic commerce was 10% higher compared to the definition that was restricted only to the transactions that lead to the purchase (RIS, 1999). A similar problem of definition is the estimate of the number of Internet users. There exist at least five categories of Internet users (Vehovar et al. 1998). This prompts to a need for an exact definition of the term "Internet user". For instance, the estimate of EITO (1999) talks about 60.000 Internet users in Slovenia in 1998, however, the definition of Internet user and description of how the eslimate was obtained is not available there. In addition, this estimate differs from all other estimates for Slovenia. Even the number of users with personal e-mail accounts in 1998 is much higher than 60.000. Of course, EITO's estimate could refer to some specific group of intensive users. When measuring information society phenomena we are also faced with divergence, which originates from the methodology of data collection. The IDC corporation, for instance, provides estimates ba.sed on distribution channels and one of the figures states that 65.000 purchases (shipments) of new personal computers were made in Slovenia in 1998, thereof 17% in households. This suggests that households buy around 10.000 personal computers yearly. Such an estimate also matches the ESIS (1999) figures stating that Slovenian households possess 100.000 personal computers (with a processor 486 or more). However, this does not match the survey estimates that Slovenian households have more than 200.000 personal computers, which is a result of practically all surveys (Statistical office. Mediana, Slovenian public opinion, RIS...). Survey estimates consistently show that the number of personal computers has surpassed 200.000 also in business use, what suggests that Slovenia is highly ranked by number of personal computers per 100 residents. There are more than 25 personal computers per 100 habitants in Slovenia. This is surprisingly high, however, as the usage of information technology is rather complex, the criticisms regarding low technology penetration in Slovenian economy may still hold true (The World Competitiveness Yearbook 1999, IMD Lausanne). One of the most exposed indicators of the Internet and information society is the statistics on the Internet hosts (Vehovar, 1998). This indicator shows extremely inconvenient trends for Slovenia: the growth of hosts in the last two years has almost entirely stopped while all other countries rapidly progress. However, the number of hosts is a typical example of an indicator that is more complex than a casual observer might think. For instance — all the hosts which are not included in *.si domain are excepted from Slovenian host cunts. This does not happen so often in larger countries or in countries with more liberal legislation for assigning domains. In Slovenia, non-domestic domains are very frequent, even among the most visited sites and among the largest Internet access providers: siol.net, s.net, s5.net, amis.net. It seems that the large majority of commercial dial-up modems/ hosts is registered under domains *.net. The high usage of dial-up access in Slovenia also presents a problem for itself and contributes to a low host counts, because each host/modem serves many dial-up users. Additional problems can present the multiple IP numbers - e.g. virtual hosts - located on one computer. This is more often the case in countries such as Estonia than in Slovenia. We have to understand that the "host" does not necessarily mean a computer connected to the Internet, but only an IP number. In Slovenia, additional problems are also the computers that are connected to many large local networks with full access to the Internet but without an IP number. The problem of host counting is getting even more complex because of the technical problems of measurement procedures, which are becoming increasingly more difficult due to fire-walls and other forms of security protections. This forced Network Wizards to change entirely the methodology and broke with the time series. The data about host numbers from RIPE (http://www.ripe.net) and Network Wizard (hltp://www.nw.com) thus vary considerably. The RIPE host count often shows a clear monthly a recession in the number of hosts for some countries (Italy, for instance) what is not realistic. All the above arguments may explain the situation for Slovenia, where the host counts in the last two years show less than 10% yearly growth (Picture 1, Picture 3), but all other indicators (number of registered domains, number of companies connected to the Internet, number of households with access to the Internet, number of Internet users) demonstrated more than 50% growth (Picture 2). Number of hosts (Dec. 92 - Nov. 99) Source: RIPE 25000 20000 15000 10000 5000 number of domains - sum(dec.92 - nov.99) {Stxjrce: ARNES) 6000 5000 4000 3000 2000 1000 dst;.92 das.93 d6c.94 dec.95 dsc.SS > > > > Indicators on some areas are not correlated with economical power of the countries; Absolute investments are significantly lower then EU average; Administrative structure is still developing; Legal and organisational framework is rapidly improving; Level of capitalisation in telecommunication is improving although some monopolies still exits; ICT advanced development is beyond legal, organisational and administrative progress. IliiHlB'^I^^BB E<1.1-^OUP |>«T .................... ■I lllÄSlIHlwtilfl • IB^BÌPBIBIIB illilBiiiWIii^pli^Élll ■■■I 8' c 7.1. „1si h tTIP ni 1'19? i™ Figure 6: Portfolio showing GDP/pc and Telecommunication growth rate in 1997/1998 The proposed approach to data analysis and the developed tools are not only good for comprehensible modelling and presentation of the results, but also lor permanent monitoring and studying of Slovenia and other CE countries' progress. Figure 7 shows a six dimensional diagram for CE countries comparison, based on the total values of six indicator's groups. Very clearly can be seen the leading position of Slovenia (SLO-white). This diagram and the diagram in the figure 6 were generated using our ISIT system. ■jll^pllliiM ^^^^liiil^HliiiBilBilBIH lEWiiBiiiir liplSlilSiliBUiflii I 1 I- ..................null l>-IIIILlll|nt .l,il L I I I >.011 'IUI' I .KMliKilll'lli i.il M I I hitodinliriiii I hiiMt lyfol II I I I I riiil iiiil. 11 iiiif ill! iiilfi.iiii.'A'iikt ir il r'[ I iMni. Ill K|.'ii'.=E >=E =E ::i:|:Tiü':ii.i;cir'n.i'-- .■.-li-.i: ;:i .iL-.e.rr.'. rr ..iiJ i Jun .n- , -.^'.-ii'i,: ^ ■ ..I ■'i.' i;. ...J • - : ---------------------....a------------------ Figure I : Example of electronic book screen At the end of our inquiry we reckoned up the arithmetic means of answers for each statement so that we could see weather the students had a positive or a negative relation to the specific statement and so to the specific element of studying process. The results of inquiry are shortly described below: • Students were satisfied with a new way of study, they meant that the learning quality was grooving by using the electronic book. • They said that the chapter separation in the electronic book was good and that all parts were well linked. • Opinions about the learning speed were divided. • Students' opinion weather concrete examples contribute to better understanding of material or not were also divided. • Students were satisfied with the contents of the electronic book and they meant that the extent of the learning material is appropriate. • They showed a positive statement about methodological point of view of the electronic book. • Students were satisfied with the pretentiousness of the electronic book, with the systematic work, with learning goals and with the acquired knowledge. • They could not tell weather the learning material mediates only facts or not. These could be the consequence of not knowing the learning material in full. Maybe the answers would have been different if the inquiry had been curried out after the final exam, when the students would have been more acquainted with learning material, and not immediately after the first work with the electronic book. 6 Conclusion In the article the use of an electronic book as a new method of distance learning is represented. The results of the research, which was carried out among the students of Faculty of Organisational Sciences who were using an electronic book by their study, are shown. Students are satisfied with this kind of study and are looking forward to using electronic books for other subjects as well. The use of an electronic book variegates the study and increases the individual work and motivation of students. The positive experiences of using an electronic book are pointing out that the introducing of distance learning would probably also have a good response among the students. 7 References [1] Batagelj V., Rajkovič V.: Information Technology Project in Slovenia Schools, Proc. of 1st Euro Education Conference, Aalborg, 9.-15. [2] Boud D.: The Challenge of Problem Based Learning, Kogan Page, London, 1992. [3] Collins J.: Computers in Classroom and College, Computer Education, June 1994. [4] Dhanarajam G. ed.: Economics of Distance Education: Recent Experience, Open Learning Institute, Hongkong, 1994. [5] Hedberg, J.: Converging Technologies in Education: Interactive Multimedia and Online Learning; The University of Wollongong, New South Wales, Australia, 1996. [6] Jereb J., Jug J.: Učna sredstva v izobraževanju, Moderna organizacija, Kranj, 1987. [7] Jereb J.: Računalnik v izobraževanju, Mc&Boss, Kranj, 1991. [8] Jereb J.: Strokovno izobraževanje in razvoj kadrov, Moderna organizacija, Kranj, 1989. [9] Jerram P.; Gosney M.: Multimedia Power Tools, Verbum Inc. and Gosney Company, 1995. [10] Jereb J, Jug J. et.al.: Izobraževanje odraslih, poročilo o raziskovalni nalogi. Fakulteta za organizacijske vede, Kranj, 1992. [11] Keegan, D.: The Study of Distance Education: Terminology, Definition and Field of Study in Research and Distance Education, Peter Lang, Frankfurt am Main, 1991. [12] Laurillard, D.: Rethinking University Teaching: A Framework for the Effective Use of Educational Technology, Routlcdge, London, 1993. [13] Marentič - Požarnik B.: Prispevek k visokošolski didaktiki, DZS, Ljubljana, 1978. [14] Rowntree, D.: Teaching through Self Instruction , How to develop Open Learning materials, Kogan Page, London, 1991. [15] Rowntree, D.: Exploring Opem and Distance Learning, Kogan Page, London, 1992. [16] Rowntree D.: Preparing Materials for Open, Distance and Flexible Learning, Kogan Page, 1994. [17] Strmčnik F.: Sodobna šola v luči programiranega pouka, DDU Univerzum, Ljubljana, 1978. [18] Van der Brande, L.: Flexible and Distance Learning, John Wiley & Sony, Chichester, 1993 [19] Zorman L: Sestava testov znanja in njihova uporaba v šoli. Zavod za šolstvo, Ljubljana, 1974. Multi-Attribute Decision Modeling: Industrial Applications of DEX Marko Bohanec', Vladislav Rajkovič ' Jožef Stefan Institute, Jamova 39, SI-1000 Ljubljana, Slovenia Phone:+386 61 1773 309, Fax:+386 61 1258 058 ^University of Maribor, Faculty of Organisational Sciences, Kranj, Slovenia {marko.bohanec, vladislav.rajkovic} @ijs.si Keywords: decision support, multi-attribute decision making, qualitative decision models Edited by: Cene Bavec and Matjaž Gams Received: October 2, 1999 Revised: November 20, 1999 Accepted: December 12, 1999 DEX is an expert system shell for qualitative multi-attrihute decision modeling and support. During the last decade, it has been applied over fifty times in complex real-world decision problems. In this article we advocate for the applicability and great potential of this approach for industrial decision-making. The approach is illustrated by a typical industrial application in land use planning, and supplemented by an overview of some other completed industrial applications. The learned lessons indicate the suitability of the qualitative DEX methodology particularly for "soft", i.e., less structured and less formalized, decision problems. Practical experience also indicates the importance of methods that facilitate the analysis, simulation, and explanation of decisions. 1 Introduction In complex decision-making processes, it is often necessary to deal with the problem of choice (Simon, 1977). Given a set of options (or alternatives), which typically represent some objects or actions, the goal is (1) to choose an option that best satisfies the aims or goals of decision maker, or (2) to rank the options from the best to the worst one. One of the approaches to such problems, which is well known and commonly employed within Decision Support Systems (Andriole, 1989), is based on evaluation models (Figure I). The idea is to develop a model that evaluates options giving an estimate of their worthiness (utility) for the decision-maker. Based on this estimate, the options are ranked and/or the best one is identified. Usually, a decision model is designed in an interaction between the decision maker and decision analyst. An important feature of evaluation models is that they can be, in addition to the sole evaluation of options, used for various analyses and simulations, which may contribute to a better justification and explanation of decisions. For example, a what-if analysis can provide a better insight into a causal relation between problem parameters and outcomes. Another example is a sensitivity analysis that can assess the sensitivity of model with respect to small changes of options. An evaluation model can be developed in many ways. The approach that prevails in decision practice is based on multi-attribute decomposition (Chankong and Haimes, 1983; Saaty, 1993; Buede and Maxwell, 1995): OPTIONS UTILITY EVALUATION ..EVALUATION MODEL ANALYSIS Figure 1: Evaluation-based decision modeling CARS UTILITY _j|„.,; »I buying mainr PRICE safety | "Sil_ CAR f doors I ,,T-1, , .. -K^ I TECH I pers I^COMF 1 protjioni decomposition Figure 2: Multi-attribute decision modeling we take a complex decision problem and decompose it into smaller and less complex subproblems. The result of such development is a decision model that consists of attributes, each of which represents a decision subproblem. Attributes are organized hierarchically and connected by utility functions that evaluate them with respect to their immediate descendants in the hierarchy. Figure 2 illustrates this basic principle of multi-attribute modeling by showing a simple hierarchy of attributes for the evaluation of cars. Real-life applications of multi-attribute methods, which were conducted at Jožef Stefan Institute in Ljubljana, were all based on DEX (Bohanec and Rajkovič, 1990). This is an expert system shell for multi-attribute decision making that combines the "traditional" multiattribute decision making with soine elements of Expert Systems and Machine Learning. The distinguishing characteristic of DEX is its capability to deal with qualitative models. Instead of numerical variables, which typically constitute traditional quantitative models, DEX uses qualitative variables; their values are usually represented by words rather than numbers, for example "low", "appropriate", "unacceptable", etc. Furthermore, to represent and evaluate utility functions, DEX uses if-then decision rules. In contrast, this is traditionally carried out in a numerical way, using weights or similar indicators of attributes' importance. An important additional feature of DEX is its capability to deal with inaccurate, uncertain or even missing data about options. In such cases, DEX represents options by distributions of qualitative values, and evaluates them by methods based on probabilistic and/or fuzzy propagation of uncertainty. During the last decade, DEX was used in more than fifty real-life decision problems. The aim of this article is to advocate for the wide applicability of DEX to complex decision problems that occur in industry. In the next section, we first illustrate the approach by a typical industrial application in land use planning. This is followed by an overview of several other completed industrial applications in performance evaluation of companies, evaluation of products, projects and investments, ecology, and loan allocation. Finally, we summarize the lessons learned in these applications, and propose some future directions for the development of underlying methodology. 2 A Real-World Case One of the most typical applications of DEX occurred with Goriške opekarne, a company located near the Slovenian city of Nova Gorica. The company is engaged in a very traditional business: production of bricks and tiles. Decades ago, they had built a factory near a suitable clay pit that was then providing raw material for their production. Until 1993, however, the clay pit has become almost completely exhausted, so the company was faced with a critical strategic decision of how to survive and continue with this type of production. Their only option was to find a new appropriate clay-pit location. An exploratory study revealed three possible candidate locations. Unfortunatelly, none of them was really appropriate as numerous difficult problems were foreseen, ranging from technological, transportational and financial to environmental and socio-psychological. The latter two problems seemed particularly important as the project was inevitably going to affect the environment, leading to a possible rejection of local inhabitants. For these reasons, a group of experts was formed to thorougly analyze the problem and propose alternative solutions (Bohanec, et al., 1993). ■ ATTRACT j VULNERAB ' SOC-PSYCH j| ECONOM „.develop;,; 'site char, land attr. • pollution • valuation • land use ' Figure 3: Topmost levels of clay-pit evaluation model In the first stage, the experts developed the structure of multi-attribute model for the evaluation of clay-pit locations.. Two primary evaluation dimensions were taken into account: Environmental impact and Feasibility of the project. For each of these, the most relevant attributes were identified and organized into a hierarchical structure (Figure 3). Note that only topmost levels of the model are shown in the figure. In total, the model contained 49 attributes: 29 basic (tenninal nodes) and 20 aggregate (internal nodes). Table 1: Decision rules for Site suitability ENVIRONMENT FEASIBILITY SITE 1 * unacc unacc 2 unacc * unacc 3 less-acc less-acc marg-acc 4 > acc less-acc less-acc 5 less-acc acc less-acc 6 acc acc acc 7 good acc good The second stage involved the definition of decision rules. Basically, these are simple if-then rules that for each of the 20 internal nodes in the model determine its evaluation with respect to its lower-level descendants in the hierarchy. Usually, they are represented in a tabular form. For example. Table I shows decision rules that were defined by the experts for the topmost node Site suitability. In the table, an asterisk '*' represents any value, and '>' means 'belter or equal'. In the third stage of the decision-making process, the options are identified and described by the values of basic attributes. In our case, there were three clay-pit locations, each of which was represented by 29 data items that corresponded to basic attributes of the model. Furthermore, as some of these items, such as Social-psychological feasibility, were inherently inaccurate or difficult to obtain, several variations of the descriptions were formed, anticipating either an "optimistic" or "pessimistic" development of the project. Effectively, this increased the number of considered options to eight (Figure 4) and provided a foundation for subsequent what-if analysis. which clearly indicate the wide applicability of DEX for a variety of decision problems. The description of some other early industrial applications can also be found in (Urbančič, et al., 1991). 3.1 Performance Evaluation of Companies Here, the general task is that a company or agency develops an evaluation model that assesses the performance of some other companies. The aim is, for example, to find a suitable business partner. The work with DEX in this area began in 1987, where a number of such models were developed in collaboration with the International Center for Public Enterprises (Bohanec and Rajkovič, 1990). An example hierarchy of attributes that was used to assess the performance of 54 public enterprises in Pakistan, is shown in Figure 5. This work culminated in 1989 with the development of models that were used in the privatization of Peruvian public enterprises. »latjglnica o UNE Fit S0C-PS3 Figure 4: Visualization of clay-pit evaluation results In the last stage, the model was utilized to evaluate the clay-pit locations. As shown in Figure 4, the best location was the one called Marjetnica, which was evaluated as "acceptable", but only in its "optimistic" instance. On the other hand, all the "pessimistic" instances were unacceptable, indicating the great sensitivity of decision. Therefore, thorough what-if and sensitivity analyses were performed for each location. The most important result was achieved by comparing "optimistic" and "pessimistic" options with respect to basic attributes. The outcome of this comparison was a comprehensive list of possible problems that could occur with each location. On this basis, the expert team not only was able to find the best location, but also to foresee potential pitfalls and suggest how to avoid them. 3 Other Applications In about ten years time, DEX was used in more than fifty real-life decision problems in various areas. About one half of the problems can be classified as industrial, while the remaining were conducted in the fields such as education or medicine and health care (Bohanec, et al., 1999). Some of the industrial problems were very difficult and involved substantial financial and other risks for decision-making organizations. In what follows we briefly outline five representative application areas. Figure 5: Topmost levels of the model for performance evaluation of public enterprises 3.2 Product portfolio evaluation The problem is to assess the quality of products made by a company or production unit. This assessment is vital for the formation of strategies-. The approach with DEX was based on the so-called portfolio method (Krisper, et al., 1991), which evaluates products using two primary evaluation dimensions: market attractiveness and competitive ability. Several practical cases were analyzed in this way, including the products of some well-known Slovenian companies Fructal, Radenska, SRC, and DZS. 3.3 Evaluation of projects and investments The evaluation of projects or investment strategies is an industrial application context in which DEX has got the largest number of applications. The most typical investments included various software, hardware and technology, such as data base management systems, production control software, meteorological radar equipment, or a production line. The decision problems were often related to various investment proposals and tenders. An example of such applications, which is documented quite in detail, is a model for the evaluation of research and development projects (Bohanec, et al., 1995). 3.4 RemediatioM of dumpsites This is a recent application in the field of environmental care. In order to alleviate the problem of illegal dumpsites in Slovenia, an expert system was developed that assesses the environmental impact of dumpsites and suggests activities for their remediation (Spendi, 1998). The environmental impact of dumpsites is assessed by a qualitative DEX model (Figure 6), which is embedded in the expert system. Figure 6: Model for the assessment of dumpsile's environmental impact (topmost levels only) 3.5 Housing loae alllocatioini This is an example of a repetitive decision-making task being supported by a DEX model. The model is a part of a management decision support system that is used since 1991 by the Housing Fund of the Republic of Slovenia for the allocation of housing loans with favorable terms to citizens (Bohanec, et al., 1996). Until 1999, the Fund has issued 16 floats of loans, i.e., about two per year, and approved almost 20 thousand loans. The amount requested by applicants in a float typically far exceeds the available funds. Thus, the applicants must be ranked into a priority order. The procedure is required to be fast, reliable, transparent, and fair for all applicants. The request for transparency asks for effective explanations of loan priority order, which have to be provided to both the decision-making committee and a large number (usually, several thousands) of applicants. In the Fund's system, these requirements were fulfilled by a qualitative model that ranks the applications into five priority classes and provides a foundation for various explanations, which are obtained by analyses and simulations of application data and the model itself. 4 Experieece Some important lessons have been learned in the applications of DEX. Here, we present some findings related to the duration of model development processes, difficulty of development stages, and categories of decision problems that seem to be particularly well suited for the application of DEX. The time needed to develop a DEX model turns out to be extremely problem-dependent: it may take from few hours to several months. Most typically, however, the development requires about two working days for the development of model structure, from one to two days to define decision rules, and from one to several days to coiled data about options, to evaluate, and analyze them. Therefore, the process most typically lasts from two to ten working days. The most difficult stage of the process is its first one, in which the relevant attributes must be identified and appropriately organized into a hierarchical structure. This stage heavily relies on knowledge and experience of decision-makers and experts, and requires a deep understanding of the decision problem. It can still be considered more art than science. The remaining stages have been found much less problematic. Therefore, an appropriate identification of model structure mostly determines the success of the decision-making process. DEX with its qualitative modeling and ability to handle inaccurate and/or incomplete data about options appears particularly well suited for decision problems that involve qualitative concepts and a great deal of expert judgement. Also, it seems that the usefulness of DEX increases with the increasing difficulty, or "complexity", of the decision problem. So far, the best results were achieved in problems that required large models, consisting of at least 15 attributes, and/or involving a large number of options, i.e., from about 10 to several hundreds of options. On the other hand, DEX turned out to be unsuitable for problems that require exact formal modeling, numerical simulation and/or optimization. 5 Ferther Work Currently, there are three limitations of the DEX approach that, we believe, can be greatly improved by appropriate extensions of the methodology. First, the difficult stage of model structure development could be additionally supported by a machine learning method that would develop (or at least suggest) model structure using decision examples taken either from an existing database of past decisions, or provided explicitly by the decisionmaker. A considerable progress in this direction has already been made by the development of a learning method called HINT (Zupan, et al., 1999). Given training examples, HINT develops a hierarchical multi-attribute evaluation model that explains and possibly generalizes the examples. The structure of the models developed by HINT is essentially the same as the structure of models developed "manually" using DEX. The HINT'S model development is based on function decomposition, an approach that was originally developed for the design of digital circuits. Another limitation of DEX is that it is strictly limited to qualitative decision models; it cannot use numerical variables nor analytically represented utility functions that are commonly used in traditional quantitative models. This is sometimes advantageous in comparison with other decision modeling systems, which exclusively rely on quantitative models. However, many real-life decision problems require both qualitative and quantitative attributes, so the integration of these two may have a great practical impact: it may increase the flexibility of the method and extend the range of decision problems that can be successfully approached. Methodologically, such integration appears quite difficult and requires more research. In the context of DEX, we consider it a long-term goal. Last but not least, the major part of DEX software has been developed about ten years ago and currently appears quite outdated. Therefore, an overall redesign and renewal of software is planned for the near future. Currently, we are developing a program called DEXi, an educational subset of DEX to be used by students and teachers in secondary schools and faculties. We plan to follow this by the development of a functionally complete state-of-the-art DEX system. 6 Conclusion The DEX system effectively integrates two methodologies: multi-attribute decision making and expert systems. To a limited extent, it also includes some elements of machine learning and fuzzy logic. By this, it facilitates a structured and systematic approach to complex decision problems. So far, DEX has been successfully used in over fifty real-life decision problems in industry, medicine, health care and education, which all speak in favor for its wide applicability and flexibility. From the practical viewpoint, the most important characteristics of DEX are: 1. Qualitative (symbolic) decision modeling, which is particularly well suited for "soft" decision problems, i.e., less structured and less formalized problems, which involve a great deal of expert judgement. 2. Focus on the explanation and analysis of options, which lead to better-understood and justified decisions. 3. Active support of the decision-maker in the acquisition of decision rules, which speeds up model development and reduces the number of errors. The goals of further research and development related to DEX are twofold. First, we wish to improve the support in the difficult stage of model structure development, and propose to use machine learning methods, such as HINT, for that purpose. To further improve the flexibility and general applicability of the approach, we suggest further research towards an integration of qualitative and quantitative decision models. 7 References [1] S.J. Andriole: Handbook of Decision Support Systems. TAB Books, 1989. [2] M. Bohanec, V. Rajkovič, V.: DEX: An expert system shell for decision support. Sistemica I, 145157, 1990. [3] M. Bohanec, B. Kontić, D. Kos, J. Marušič, S., Polič, J. Rakovec, B. Sedej, et al.: Comparison of clay-pit locations Okroglica, Bukovnik, Marjetnica with respect to environmental protection (in Slovenian). Ljubljana: Jožef Stefan Institute, Report DP-6742, 1993. [4] M. Bohanec, V. Rajkovič, B. Semolič, A. Pogačnik: Knowledge-based portfolio analysis for project evaluation. Information & Management 28, 293302, 1995. [5] M. Bohanec, B. Cestnik, V. Rajkovič: A management decision support system for allocating housing loans. Implementing Systems for Supporting Management Decision (eds. P. Humphreys, L. Bannon, A. McCosh, Migliarese, J.-C. Pomerol). Chapman & Hall, 1996. [6] M. Bohanec, B. Zupan, V. Rajkovič: Hierarchical multi-attribute decision models and their application in health care. Proc. Medical Informatics Europe 99 (eds. P. Kokol, B. Zupan, J. Stare, M. Premik, R. Engelbrecht), Amsterdam: lOS Press, 670-675, 1999. [7] D.M. Buede, D.T. Maxwell: Rank disagreement: A comparison of multi-criteria methodologies. Journal of Multi-Criteria Decision Analysis 4, 1-21, 1995. [8] V. Chankong, Y.Y, Haimes: Multiobjective Decision Making: Theory and Methodology. North-Holland, 1983. [9] M. Krisper, V. Bukvič, V. Rajkovič, T. Sagadin: Strategic planning with expert system based portfolio analysis. EXPERSYS-9I: Expert system applications (eds. J. Hasemi, J.G. Gouardères, J.P. Marciano). IITT International, 1991. [10]T.L. Saaty: Multicriteria Decision Making: The Analytic Hierarchy Process. RWS Publications, 1993. [11]A.H. Simon: The New Science of Management Decw/o«. Prentice-Hall, 1977. [12]R. Spendi: An expert system for the evaluation of environmental impact and remediation of illegal dumpsites (in Slovenian). M.Sc. Thesis. University of Ljubljana, Faculty of Information and Computer Science, 1998. [13]T. Urbančič, I. Kononenko, V. Križman: Review of Applications by Ljubljana Artificial Intelligence Laboratories. Ljubljana: Jožef Stefan Institute, Report DP-6218,"l991. [14]B. Zupan, M. Bohanec, J. Demšar, I. Bratko, L: Learning by discovering concept hierarchies. Artificial Intelligence 109,211-242, 1999. Perception-Based Classification Mihael Ankerst, Christian Elsen, Martin Ester, Hans-Peter Kriegel Institute for Computer Science, University of Munich Oettingenstr. 67, D-80538 München, Germany (ankerst I ester I kriegel} @dbs.informatik.uni-muenchen.de, c.elsen@elsen.net Keywords: classification, decision tree, data mining, visualization Edited by: Cene Bavec and Matjaž Gams Received: October 2, 1999 Revised: December 2, 1999 Accepted: December 19, 1999 Classification is an important problem in the emerging field of data mining. Given a training database of records, each tagged with a class label, the goal of classification is to build a concise model that can be used to predict the class label of future, unlabeled records. A very popular class of classifiers are decision trees because they sati.sfy the basic requirements of accuracy and understandability. Instead of constructing the decision tree by a sophisticated algorithm, we introduce a fully interactive method based on a midtidimensional visualÌ7.ation technique and appropriate interaction capabilities. Thus, domain knowledge of an expert can he profitably included in the tree construction phase. Furthermore, after the interactive construction of a decision tree, the user has a much deeper understanding of the data than just knowing the decision tree generated by an arbitrary algorithm. The interactive approach also overcomes the limitation of most decision trees which are fixed to binary splits for numeric attributes and which do not allow to backtrack in the tree construction phase. Our performance evaluation with several well-known datasets demonstrates that even users with no a priori knowledge of the data construct a decision tree with an accuracy similar to the tree generated by state of the art algorithms. Additionally, visual interactive classification significantly reduces the tree size and improves the understandibility of the resulting decision tree. 1 Introduction The success of computerized data management has resulted in the accumulation of huge amounts of data in several organizations. There is a growing perception that analyses of these large databases can turn this "passive" data into useful information. The term Data Mining refers to the discovery of non-trivial, previously unknown, and potentially useful patterns embedded in databases. Classification is one of the major tasks of data mining. The goal of classification is to assign a new object to a class from a given set of classes based on the attribute values of this object. Different methods [12] have been proposed for the task of classification, for instance decision tree classifiers which have become very popular. Decision tree classifiers are primarily aimed at attributes with a categorical domain, that is a small set of discrete values. Numeric attributes, however, play a dominant role in application domains such as astronomy, earth sciences and inolecular biology where the attribute values are obtained by automatic equipment such as radio telescopes, earth observation satellites and X-ray cristallographs. [6] discusses an approach that splits numeric attributes into multiple intervals rather than just two intervals. The well-known algorithms, however, perform a binary split of the form for a numeric attribute a and a real number v. The SPRINT decision tree classifier [3] processes numeric attributes as follows. There are n - 1 possible splits for n distinct values of a. The gini index is calculated at each of these n - I points and the attribute value yielding the minimum gini index is chosen as the split point. CLOUDS [4] draws a sample from the set of all attribute values and evaluates the gini index only for this sample thus improving the efficiency. A commercial system for interactive decision tree construction is SPSS CHAID [15] which - in contrast to our approach - does not visualize the training data but only the decision tree. Furthermore, the interaction happens only before the tree construction yielding user defined values for global parameters such as maximum tree depth or minimum support for a node of the decision tree. Visual representation of data as a basis for the human-computer interface has evolved rapidly in recent years. [8] gives a comprehensive overview over existing visualization techniques for large amounts of multidimensional data. Recently, several techniques of visual data mining have been introduced. [5] presents the technique of Independence Diagrams for visualizing dependencies between two attributes. The brightness of a cell in the two-dimensional grid is set proportional to the density of corresponding data objects. This is one of the few techniques which does not visualize the discovered knowledge but the underlying data. However, the proposed technique is limited to two attributes. [10] presents a decision table classifier and a mechanism to visualize the resulting decision tables. It is argued that the visualization is appropriate for business users not familiar with machine learning concepts. In contrast to well-known decision tree classifiers, our novel interactive approach enables arbitrary split points for numeric attributes, the use of domain knowledge in the tree construction phase and backtracking. In this paper, we introduce a novel interactive decision tree classifier based on a multidimensional visualization of the training data. Our approach allows to integrate the domain knowledge of an expert in the tree construction phase and it overcomes the limitation of binary splits for numeric attributes. The rest of this paper is organized as follows. In section 2 we introduce our technique for visualizing the training data. The support for interactively constructing a decision tree - which we have implemented in the Perception-Based Classification (PBC) system - is discussed in section 3. Section 4 reports the results of an extensive experimental evaluation on several well-known datasets. Section 5 summarizes this paper and outlines several issues for future research. 2 In our approach, we visualize the training data in order to support interactive decision tree construction. We introduce a novel method for visualizing multidimensional data with a class label such that their degree of impurity with respect to class membership can be easily perceived by a user. Our pixel-oriented method maps the classes to colors in an appropriate way. The basic idea of pixel-oriented visualization techniques [8] is to map each attribute value v; of each data object to one colored pixel and to represent the values belonging to different attributes in separate subwindows. The proposed techniques [9] differ in the arrangement of pixels within a subwindow. Circle Segments [2] is a recent pixel-oriented technique which was introduced for a more intuitive visualization of high-dimensional data. The Circle Segments technique maps d-dimensional objects to a circle which is partitioned into d segments representing one attribute each. Figure I illustrates the partitioning of the circle as well as the arrangement. Within each segment, the arrangement starts in the middle of the circle and continues to the outer border of the corresponding segment in a line-by-line fashion. These lines upon which the pixels are arranged are orthogonal to the segment halving lines. An extension of this technique has been applied in the context of cluster analysis [1] . While most approaches of visual data mining visualize the discovered knowledge, our approach is to visualize the training data in order to support interactive decision tree construction. attr. 8 attr. ^^^atóbute 1 ^^^^^attr. 2 attr. attr^l^ ^^^^attr. 3 attr. 4 Figure l.Illustration of the Circle Segments technique for 8-dimensionall data objects We introduce a novel method for visualizing multidimensional data with a class label such that their degree of impurity with respect to class membership can be easily perceived by a user. Our method performs pixel-oriented visualization and maps the classes to colors in an appropriate way. Let D be a set of data objects consisting of d attributes y4|, . . ., /4j and having a unique class label from the set of Classes = (c'l, ci, . . . c^}. For each attribute Aj, let a total order < be defined, for example the <-order for numeric attributes or the lexicographic order for string attributes. To map each attribute value of D to a unique pixel, we follow the idea of the Circle Segments technique, i.e. we represent all values of one attribute in a segment of a circle with the proposed arrangement inside a segment. We do not use, however, the overall distance from a query to determine the pixel position of an attribute value. Instead, we sort each attribute separately and use the induced order for the arrangement in the corresponding circle segment. The color of a pixel is determined by the class label of the object to which the attribute value belongs. In the following, we introduce our technique for mapping classes to colors. Let Colors be the set of all different colors which can be represented in a given color model such as the RGB model, denoted as Colors - {coli, cok, . . . col„,}, m {^...mj- maps class indices to color indices as follows: map(i) = 1 if i=i map{i-])+ dist{c-_^, f ■ ) total - class - disi ■X(m-\) else Note that (m-1) is the maximum difference between the indices of two elements from Colors and X denotes the smallest integer i with i> x . Finally, we define the function visualize:Classes —> Colors mapping classes to colors as follows: visualize(Ci) = cok,,,^-,^^) Several color scales satisfying these requirements have been proposed [11]. These color scales are appropriate when a total or partial order is defined for the classes. For the purpose of comparability of the results, the experiments reported in this paper have been performed on several datasets where no semantics about the classes is known. If no order of the classes is given, we do not need the first requirement to preserve the order of the attribute values. Furthermore, the second requirement is weakened such that each pair of colors co/j and co/| is perceived as being different, i.e. ,/-' - \1 else represented in a 374x374 window and 10.000 objects with 20 attributes fit into a 516x516 window. We have developed a color scale for class labels based on the HSI color model [7] , a variation of the HSV model. The HSI model represents each color by a triple (hue, saturation, intensity). In our experiments, we observed the most distinctly perceived colors for the following parameter settings: For col 1 we set hue = 2.5 and intensity = saturation = 1.0, for col m we set hue = 0.5 and intensity = saturation = 1.0, and all other colors were obtained by partitioning the hue scale into equidistant intervals. Our approach of visualizing the training data also considers attributes having a low number of distinct values. In that case, there are many objects sharing the same attribute value and their relative order is not uniquely defined. Depending on the chosen order, we might create homogeneous (with respect to the class label) areas within the same attribute value. To avoid the creation of artificial homogeneous areas, we use the technique of shuffling-, for a set of data objects sharing the same attribute value the required order for the arrangement is determined randomly, i.e. their class labels are distributed randomly. 3 Perception-based classification dist(col-, col The amount of training data that can be visualized at one time is approximately determined by the product of the number of attributes and the number of data objects. For example, 2.000 data objects with 50 attributes can be Figure 2. A model for interactive classification 496 Informatica 23 ( 1999) 493-499 Ankerst et al. hie roois Oper«ions oijtioiis Vievj Help ~n-> 3 rawbluBTtnean ISplIlC- 0.31-38.11- 7 1 which determine a network M = {E,RuR2,... ,Rr) In the following we restrict our discussion to a single relation R described by a corresponding binary matrix R = ' zjjnxn where _ r 1 XiRXj \ 0 otherwi otherwise In some applications ry can be a nonnegative real number expressing the strength of the relation R between units Xi andXj. 1.1.1 Example: Student Government In Table 1 and Figure 1 the Student Government network is presented. It consists of communication interactions among twelve members and advisors of the Student Government at the University in Ljubljana (Hlebec, 1993). The results of the measurement are not real interactions among actors but cognition about communication interactions. Data were collected with face to face interviews, conducted in May 1992. Figure 1 : Network graph: Student Government - discussion, recall Table 1 : Student Government matrix m p m m m m m m a a a ! 2 3 4 5 6 7 8 9 10 11 minister 1 1 0 1 1 0 0 1 0 0 0 0 0 p.minister 2 0 0 0 0 0 0 0 1 0 0 0 minister 2 3 1 1 0 1 0 1 1 1 0 0 0 minister 3 4 0 0 0 0 0 0 1 1 0 0 0 minister 4 5 0 1 0 1 0 1 1 1 0 0 0 minister 5 6 0 1 0 1 1 0 1 1 0 0 0 minister 6 7 0 0 0 1 0 0 0 1 1 0 1 minister 7 8 0 1 0 1 0 0 1 0 0 0 1 adviser 1 9 0 0 0 1 0 0 1 1 0 0 1 adviser 2 10 1 0 1 I 1 0 0 0 0 0 0 adviser 3 11 0 0 0 0 0 1 0 I 1 0 0 Communication flow among actors was identified by the following question: Of the members and advisors of the Student Government, whom do you (most often) talk with? The content of the communication flow was limited to the matters of the Student Government. The time frame was also defined: the question was referred to the six months period. One respondent refiised to cooperate in the experiment. As he was not considered in the analysis, the network consists of eleven actors. 1.2 Cluster amd ClmsterMg One of the main procedural goals of blockmodeling is to identify, in a given network, clusters (classes) of units that share structural characteristics defined in terms of R. The units within a cluster have the same or similar cormection patterns to other units. They form a clustering C = {CijCa,... ,Cfc} which is a partition of the set E: {}^Ci= E and i ^ j Cif] C j = 0. Each partition determines an equivalence relation (and vice versa). 1.3 Block A clustering C partitions also the relation R into blocks R{Ci,Cj) = RnCixCj Each such block consists of units belonging to clusters Ci and C j and all arcs leading from cluster C» to cluster Cj. If i = j, a block iž(Cj, Ci) is called a diagonal block. 1.4 Blockmodd amd Blockmodeling The goal of blockmodeling is to reduce a large, potentially incoherent network to a smaller comprehensible structure that can be interpreted more readily. Blockmodeling, as an empirical procedure, is based on the idea that units in a network can be grouped according to the extent to which they if O i/ >o Figure 2: Blockmodeling scheme. complete col-dominant row-functional col-functional Figure 3 : Types of connection between two sets; the left set is the ego-set. are equivalent, according to some meaningful definition of equivalence. A blockmodel consists of structures obtained by identifying all units fi-om the same cluster of the clustering C. For an exact definition of a blockmodel (see Figure 2) we have to be precise also about which blocks produce an arc in the reduced graph and which do not, and of what type. Some types of connections are presented in Figure 3. A block is symmetric if V(X,F) e Ci X Cj : {XRY YRX) Note that for nondiagonal blocks this condition involves a pair of blocks Cj) and R{Cj,Ci). Table 2: Block types and 1 1 1 1 1 1 0 0 1 1 1 1 0 1 0 1 1 1 1 1 0 0 1 0 1 1 1 1 1 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 1 0 1 1 0 0 0 0 1 1 0 1 0 0 0 0 1 1 1 0 Ci C2 complete regular C2 null complete Q I é) The reduced graph can be presented by relational matrix, called also image matrix (see Table 2). A clustering and the induced blockmodel of the Student Government is presented in Figure 4. Figure 4: Blockmodeling example. 2 Blockmodeling - Formalization Let C/ be a set of positions or images of clusters of units. Let /X : £ C/ denote a mapping which maps each unit to its position. The cluster of units C{t) with the same position ^ G 17 is Therefore C{ii) = {C{t) :teU} is a partition (clustering) of the set of units E. A blockmodel is an ordered sextuple M = (Ì7,r, <5, TT, a) where: - C/ is a set of positions (types of units); - JT C X [/ is a set of connections', - T is a set of predicates used to describe the types of connections between different clusters in a network. We assume that nul 6 T. - a mapping -k : K ^ T \ {nul} assigns predicates to connections; - Q is a set of averaging rules. A mapping a : K Q determines rules for computing values of connections. A (surjective) mapping p, : E U determines a blockmodel M of network Af iff it satisfies the conditions: G K ■.Tr{t,w){C{t),C{w)) and \/{t,w) eUxU\K: nul{Cit),C{w)). 2.1 Equivalences Let « be an equivalence relation over E and [JT] — {Ye jS : X « y}. We say that w is compatible with T over a network M iff "iX^Y € E 3T £ T ■. T {\X],[Y]). It is easy to verify that the notion of compatibility for T = {nul, reg} reduces to the usual definition of regular equivalence (White and Reitz 1983). Similarly, compatibility for T = {nul, com} reduces to structural equivalence (Lorrain and White 19 71 ). For a compatible equivalence k the mapping /i : X i-> [A'] determines a blockmodel with U = EJ 3 Optimization 3.1 Criterion Function The problem of establishing a partition of units in a network in terms of a selected type of equivalence is a special case of clustering problem that can be formulated as an optimization problem: determine the clustering C* for which P(0=minP(C) Table 3: Characterizations of Types of Blocks. null nul all 0* complete com all r row-regular rre each row is 1-covered col-regular ere each column is 1 -covered row-dominant rdo 3 all 1 row* col-dominant cdo 3 all 1 column* regular reg 1-covered rows and 1-covered columns non-null one 3 at least one 1 * except may be diagonal where C is a clustering of a given set of units E, $ is the set of all feasible clusterings and P : $ IR the criterion function. One of the possible ways of constructing a criterion function that directly reflects the considered equivalence is to measure the fit of a clustering to an ideal one with perfect relations within each cluster and between clusters according to the considered equivalence. Given a set of types of connection T we can introduce the set of ideal blocks for a given type T e T by ß{Ci,Cf,T) = {BCCixCr.T{B)} Using Table 3 we can efficiently test whether the block R{Ci,Cj) is of the type T; and define the deviation 5{Ci,Cj]T) of a block R{Ci,Cj) from the nearest ideal block. For example <5(C,,Q;reg) = |Ci| ■ - c,) + |C7,-| • - n) where cj is the number of non-zero columns, and r^ is the number of non-zero rows in the block R{Ci,Cj). For details see (Batagelj 1997). For the proposed types all deviations are sensitive 6{Ci,Cy,T) = Q <^T{R{Ci,Cj)). Therefore a block R{Ci, Cj) is of a type T exactly when the corresponding deviation 5{Ci,Cj\ T) is 0. In the deviation 5 we can also incorporate values of lines v. Based on deviation S(Ci, C j ; T) we introduce the block-error e{C u C f,T) of R(Ci,Cj) for typer. An example of block-error is £iCi,Cf,T) = w(T)SiCi,Cj-,T) where w{T) > 0 is a weight of type T. We extend the block-error to the set of feasible types T by defining and siCuCj^T) = min B{Ci, Cf,T) TT{fx{Ci),ßiCj)) = aigmiaj^^j-e{Ci,Cj-,T) To make tt well-defined, we order (priorities) the set T and select the first type from T which minimizes e. We combine block-errors into a total error - blockmodeling criterion function P{c{ßy,r)= Y. e{c{t),c{w)-,r). (t,w)euxu For criterion fiinction P we have P{C{ß)) = 0 /li is an exact blockmodeling The obtained optimization problem can be solved by local optimization. Once a partitioning ^ and types of connection TT are determined, we can also compute the values of connections by using averaging rules. 3.2 Local Optimization For solving the blockmodeling problem we use a local optimization procedure (relocation algorithm): Determine the initial clustering C; repeat: if in the neighborhood of the current clustering C there exists a clustering C such that P(C') < P{C) then move to clustering C . The neighborhood in this local optimization procedure is determined by the following two transformations: - moving a unit Xk from cluster Cp to cluster Cq {transition)-, - interchanginguniis Xu Xy from different clusters Cp and Cq {transposition). 3.3 Benefits from Optimization Approach - ordinary / inductive blockmodeling: Given a network M and set of types of connection T, determine M, i.e.,/It, TT and a; - evaluation of the quality of a model, comparing different models, analyzing the evolution of a network (Sampson data, Doreian and Mrvar 1996): Given a network tV, a model M, and blockmodeling /i, compute the corresponding criterion fiinction; - model fitting / deductive blockmodeling: Given a network Af, set of types T, and a model M, determine ß which minimizes the criterion fiinction (Batagelj, Feriigoj, Doreian, 1998). - we can fit the network to a partial model and analyze the residual afterward; - we can also introduce different constraints on the model, for example: units X and Y are of the same type; or, types of units X and Y are not connected; 4 Pre-Specified Blockmodels Figure 5: Symmetric acyclic blockmodel of Student Gov-ermnent. The pre-specified blockmodeling starts with a blockmodel specified, in terms of substance,to an analysis. Given a network, a set of ideal blocks is selected, a reduced model is formulated, and partitions are established by minimizing the criterion function. The pre-specified blockmodeling is supported by the program MODEL 2 (Batagelj, 1996). As an example of pre-specified blockmodel we present in Figure 5 a symmetric acyclic blockmodel of Student Government. The obtained clustering in 4 clusters is almost exact - acyclic model with symmetric clusters. The only error is produced by the arc (a3, m5). 5 Final Remarks The current, local optimization based, programs for generalized blockmodeling can deal only with networks with at most some hundreds of units. What to do with larger networks is an open question. For some specialized problems also procedures for (very) large networks can be developed (Doreian, Batagelj, Ferligoj, 1998). Another interesting problem is the development of blockmodeling of valued networks. MODEL 2 and related programs and data can be obtained from http://vlado.fmf.uni-Ij.si/ pub/networks/stran/ Acknowledgment: This work was supported by the Ministry of Science and Technology of Slovenia, Project Jl-8532. References [1] Batagelj, V. (1991): STRAN - STRucture ANalysis. Manual, Ljubljana. [2] Batagelj, V. (1997): Notes on Blockmodeling. Social Networks, 19, 143-155. Also in: Abstracts and Short Versions of Papers, 3rd European Conference on Social Network Analysis, München, 1993: DJI, 1-9. [3] Batagelj, V., R Doreian, and A. Ferligoj (1992): An optimizational approach to regular equivalence. Social Networks, 14:121-135. [4] Batagelj, V., A. Ferligoj, and R Doreian (1992): Direct and indirect methods for structural equivalence. Social Networks, 14:63-90. [5] Batagelj, V., Ferligoj, A., and Doreian, R (1998): Fitting Pre-Specified Blockmodels, in Data Science, Classification, and Related Methods, Eds., C. Hayashi, N. Ohsumi, K. Yajima, Y. Tanaka, H. H. Bock, and Y. Baba, Springer-Verlag, Tokyo, p.p. 199206. [6] Borgatti, S.R and M.G. Everett (1989): The class of all regular equivalences: Algebraic structure and computation. Social Networks, 11:65-88. [7] Doreian, R, V. Batagelj, and A. Ferligoj (1994): Partitioning Networks on Generalized Concepts of Equivalence. Journal of Mathematical Sociology, 19/1:127. [8] Doreian, R, V. Batagelj, and A. Ferligoj (1998): Symmetric-Acyclic Decompositions of Networks. To appear in Journal of Classification. [9] Doreian, P. and A. Mrvar (1996) A Partitioning Approach to Structural Balance. Social Networks 18:149-168. [10] Faust, K. (1988): Comparison of methods for positional analysis: Structural and general equivalences. Social Networks, 10:313-341. [11] A. Feriigoj, V. Batagelj, and Doreian, R (1994): On Connecting Network Analysis and Cluster Analysis. In Contributions to Mathematical Psychology, Psy-chometrics, and Methodology {G.U. Fischer, D. Laming Eds.), New York: Springer. [12] Lorrain, F. and H.C. White (1971): Structural equivalence of individuals in social networks. Journal of Mathematical Sociology, 1:49-80. [13] Hlebec, V. (1993): Recall versus recognition: Comparison of two alternative procedures for collecting social network data. Developments in Statistics and Methodology. (A. Ferligoj and A. Kramberger, editors) Metodološki zvezki 9, Ljubljana: FDV, 121-128. [14] White, D.R. and K.R Reitz (1983): Graph and semigroup homomorphisms on networks of relations. Social Networks, 5:193-234. Adapted Methods For Clustering Large Datasets Of Mixed Units Simona Korenjak-Čeme IMFM Ljubljana, Dept. of TCS, Jadranska 19, 1 000 Ljubljana, Slovenia E-mail: simona.korenjak@finf.uni-lj.si Keywords: clustering, large datasets, mixed units, hierarchical clustering, cluster description compatible with merging of clusters, leaders method, adding clustering method Edited by: Cene Bavec and Matjaž Gams Received: October 17, 1999 Revised: October 30, 1999 Accepted: December 11, 1999 The proposed clustering methods are based on the recoding of the original mixed units and their clusters into a uniform representation. The description of a cluster consists for each variable of the frequencies of the variable values over its range partition. The proposed representation can be used also for clustering symbolic data. On the basis of this representation the adapted version of the leaders method and adding clustering method were implemented. We describe both approaches, which were successfully applied on several large datasets. 1 Introduction Abstraction is the main tool to deal with large amounts of data. The first step is to identify groups of similar units -clusters. In data analysis this is a task of clustering methods. The most popular are hierarchical clustering methods. Because they usually use a similarity/dissimilarity matrix they are appropriate only for clustering datasets of a moderate size (some hundreds of units). On the other hand well known nonhierarchical methods are mostly implemented for datasets of variables measured in the same scale type (such as for example 'k-means method'). Because of these limits we are searching for new clustering methods or at least trying to adapt known methods to be appropriate for clustering large datasets of mixed units, where variables (properties) of the units are measured in different scales. Let £? be a finite set of units. A nonempty subset C C E is called a cluster. A set of clusters C = {Ci} forms a clustering. In this paper we shall require that every clustering C is a partition of E. The clustering problem can be formulated as an optimization problem: Determine the clustering C* 6 for which P(C*) = minP(C) ^ ' Ce«' IR+ where $ is a set of feasible clusterings and P is a criterion function. In many clustering methods the criterion function measures the deviation of units from representatives {leaders) of corresponding clusters. In our method we select the criterion function in one of the most frequent form E E cecx€c where Rc is a representative of cluster C and d is a dissimilarity. The cluster representatives usually consist of variable-wise summaries of variable values over the cluster. For homogeneous units with only numerical variables their means are usually selected as representatives of clusters. For mixed (nonhomogeneous) units a new description has to be selected. In this paper we investigate a description satisfying two additional requirements: 1. it should require a fixed space per variable; 2. it should be compatible with merging of clusters -knowing the description of two disjoint clusters we can, without additional information, produce the description of their union. Note that only some of the cluster descriptions are compatible with merging. For example mean (as sum and number of units) for numerical variables and (min, max) intervals for ordinal variables. 2 A description of a cluster For our adaptation of clustering methods to be appropriate for clustering large datasets of mixed units, we choose a cluster description based on frequencies. For this purpose, the ranges of the variables are partitioned into selected number of classes. Let {Vi, i = 1,... ,k{V)} be a partition of the range of values of variable V (the number of classes k{V) depends on variable). Then we can define for a cluster C the sets Q{i, C-,V) = {X^C-. ViX) € Fi}, i = 1,... , k{V) where V{X) denotes the value of variable V on unit X. In the case of an ordinal variable V (numerical scales are a special case of ordinal scales) the partition {Vi, i = 1,... , k{V)} usually consists of intervals determined by selected threshold values to < h < t2 < ts <■■■ < h(v)-\ < h(v), to = inf V, hiv) = sup V. For nominal variables we can obtain the partition, for example, by selecting k{V) — 1 values ti,t2,t3,... , tk(v)-i from the range of variable V (usually the most frequent values on jS) and setting Vj = {ti}, i = l,... ,A;(y)-l;and putting all the remaining values in class Vk{v) • Units are not necessarily represented with single value for each variable, but they can also be represented with frequencies over the classes of variables ranges. Using classes of ranges we get frequencies q{i,C-,V) = caxàQ{i,C-,V) and relative frequencies q{i,C-,V) p{i,C-,V) = card C Note that k(V) i=\ AVhen only a single unit is in the cluster C we get pV'C-.yi'io] " itx eQ(i,c-,v) otherwise We can add, for each variable, a new class for a missing value and treat it as a special value, or we can also consider a missing value on V for a unit X by setting p{i, {xy, V) = i = 1,... , k{V) (or by some other distribution). It is easy to see that such a description is compatible with merging, because for two disjoint clusters Ci and C2 we have Q{i, Ci U C2; y) = Q{i, Ci; y) U Q{i, C2-,V), q{i, Ci U C2; V) = q(i, Ci;V) + q(i,C2;V). The threshold values are usually determined in such a way that, for the given set of units E (or the space of units Q, it holds ±at p(i,E;V) « = 1,... ,k(V). As a compatible description or nominal variable over a cluster C also its range V(C) can be used, since we have V(CiUC2) = V(Ci)UV(C2). Example: Recoding of flags dataset Original data are taken from the address f tp://ftp.ics.uci.edu/pub/ machine-learning-databases/flags (Flags from Collins Gem Guide to Flags, donated by Richard S. Forsyth.) Let us consider the following three variables; - population (in round millions), - mainhue (predominant color in the flag (tie-breaks decided by taking the topmost hue, if that fails then the most central hue, and if that fails the leftmost hue)), - text (1 if any letters or writing on the flag (e.g., a motto or slogan), 0 otherwise). The range of the variable population is divided into 5 classes with approximately the same number of units in each of them. The ranges of the others variables are so small that we put for discretization of them each possible value in a separate class: var= population var= mainhue var= text map 1 = {0} 2 = (0,4] 3 = (4,18] 4 = (18,158] map 1 = {red} 2 = {green} 3 = {blue} 4 = {gold} map 1 = {0} 2 = {1} 5 = (158,1100] 5 = {white} 6 = {black} 7 = {orange} ORIGINAL DATA unit ID population mainhue text Austria 8 red 0 New-Zealand 2 blue 0 Saudi-Arabia 9 green 1 Switzerland 6 red 0 USA 231 white 0 RECODED DATA Austria New-Zealand Saudi-Arabia Switzerland USA 3 2 3 3 5 1 1 3 1 2 1 1 In our case for each variable a unit is represented with index of the appropriate class. The description of a cluster Ce (only for considered variables) obtained with the leaders method is qiCe] population) 8 1 ICQ qiCei mainhue) 1 0 9 0 0 q{C6',text) . 6 4 0 0 >From this description we can see that in eight countries the population is less than a million, in one country is between 1 and 4 millions and in one country population is between 4 and 18 millions. This cluster is one of the seven clusters obtained with the adapted version of the leaders method with maximal allowed dissimilarity between a unit and its nearest leader 0.5. In one of the countries flags red is a dominant color and all of the remaining units have blue mainhue. In six units some text is presented and four units inside cluster Ce have no text in their description. For better understanding, cluster Cg consists of: Bermuda, Brit. Virg. Isles, Cayman Islands, Falklands Malvi, Fiji, HongKong, Montserrat, St. Helena, Turks Cocos Islands and Tuvalu. 3 Dissimilarity between clusters Let us return to our approach to clustering problem as an optimization problem. After deciding to use the uniform representation of units and clusters, we have to define a measure of dissimilarity between clusters (a unit is a special case of a cluster with only one element). First the dissimilarity between clusters for individual variable V is defined as , k(v) i=l We shall use the abbreviation d{X,C-,V) = di{X},C;V). In both cases it can be shown that 1. d{Ci, C2 ; F) is a semidistance on clusters; i.e. (a) d{Ci,C2;V)>0 (b) diC,C-,V)==Q (c) ,C2-,V)+ d{C2, C3-,V)> d{Ci, C3; F) 2. d(Ci,C2;F)e[0,l] and for the representation of a single unit also X € Q(i, E- V) ^ d{X, C; y) = 1 - p{i, C; V) The semidistances on clusters for individual variable can be combined into a semidistance on clusters for complete descriptions by m j=i where m is the number of variables and aj are weights {a j > 0 and a j = 1); often a j = i. We can use weights to consider dependencies among variables or to tune the dissimilarity to a given learning set in AI applications. 4 Clustering procedures In the proposed approach the original nonhomogeneous data are first recoded to a uniform representation. For the recoded data efficient clustering procedures can be built by adapting leaders method (Hartigati, 1975) or adding clustering method (Zupan 1982, Jambu and Lebeaux 1983, Batagelj andMandelj 1993). 4.1 The adapted version of the leaders method The adapted version of the leaders method is a variant of a dynamic clustering method (Diday 1979, Batagelj 1985). To describe the dynamic clustering method for solving the clustering problem let us denote: A a set oirepresentatives', L C A a representation-, $ a set offeasible representations', P : # IRq criterion function', G : ^ ^ a representation function', F : $ ^ $ a clusteringfiinction and suppose that the functions G and F tend to improve (diminish) the value of the criterion function P. Then a simple version of the dynamic clustering method can be described by the scheme: L := Lo; repeat C := F(L) L := G(C) until the leaders stabilize We begin with the initial representation and then repeat to assign each unit to the nearest leader and after that select leaders for each (new) cluster until we reach the minimum of the criterion function or until the leaders don't change any more (local minimum). Let us assume the following model C = {Cj}ig/, L = {Li}iei, M^) = Li : X Ci (the nearest leader to die unit X), L = [L{Vi),... , L(Kr»)], L{V) = [s{l,L'V),...,s{k{V),L',V)], E-Z's{j,L',V) = 1 (the description of a leader has the same form as the description of a cluster) and , k{V) diC,L',V) = -J2\pU,C;V)-sij,L',V)\. j=i For selected criterion function P(C) = ^ d{X,L{X)) = xeE iei where p{C,L) = ^ diX,L) X€C we define F(L) = {C'i) with X E Cl: i = minArgmin{cf(X,Lj) : Lj G L}. 3 This means that each unit is assigned to the (first) nearest leader. We define G(C) = {LJ} with L'i = argminp(C, L). Let The unique symmetric optimal solution of this optimization problem is i; ifjGM otherwise where M — {j : q(j, C; V) = maxi q(i, C; and t = card M. The representative (leader) of a cluster is obtained from the most frequent range(s) of values of variables on this cluster. Example: Leader of a cluster For the description of a cluster Ce q(C6;population) 8 1 qiCß-, mainhue) 1 0 q{Ce;text) 6 4 the optimal leader Le is q{Ce\ population) 1 0 q^Ce; mainhue) 0 0 q{C6-,text) 1 0 1 0 0 9 0 0 0 0 0 0 0 1 0 0 0 0 The characteristics of the cluster are population = less than a million 80 % mainhue in the flag = blue 90 % text in the flag = no 60 % For example, 80% of all countries in the cluster Cq have less than a million inhabitants, 90% of all countries flags have blue mainhue and 60% of the flags in the cluster have no text in their descriptions. Properties of the leaders method The main properties of the adapted version of the leaders method are: 1. Selection of the leaders and formation of new clusters diminish the value of the criterion function. 2. The program always stops (converges). The number of iterations is usually less than 10. 3. The program is suitable for clustering (very) large datasets. 4. The leaders descriptions provide us with simple interpretations of clustering results. 4.2 The adapted adding method The adding clustering method is a hierarchical clustering method in which a new unit is added in a clustering tree. Each vertex corresponds to a cluster. For large datasets usually only the upper part of the hierarchy is maintained, the lower levels subtrees are replaced by 'bags' containing all units from a subtree. We shall use the same description of a cluster (vertex) and the same definition of a dissimilarity as in the leaders method. Every time we add a unit in a cluster (vertex) the frequencies are recalculated. There are two possible ways how to add a new unit: a) To maximize the dissimilarity between clusters (sons) of the current vertex or, b) To minimize the dissimilarity from clusters (sons) of the current vertex. In the first case (see Figure 1) the dissimilarities between both sons of a current vertex are calculated. Because of greedy approach the case with the biggest dissimilarity is chosen: ma,x{d(Cp U {X},Cg),d{Cp,Cg U {X}),diC,{X})}. C A CU{X} A CpU{x} c, Cp c, u{a:} Cp c. Figure 1 : Maximize the dissimilarity between clusters c A ■X Cp c, ifd(Cp,X) < d(C„JV)then CU{X} A CpU{X} c. else CU{X} A Cp C, u {X} Figure 2: Minimize the dissimilarity from clusters In the second case (see Figure 2) the dissimilarities from each of the sons of current vertex are calculated and the unit is added to the nearest one: mm{d{Cp,{X}),d{Cg,{X})}. The proposed approaches can also be extended on non-binary trees. The adding clustering method has some advantages: 1. Presentation of the result with a tree. 2. It can be used for classification. 3. Speed up - if the tree has many (hundreds of) leaves which represent the leaders, it is more efficient adding unit into the tree with this method than to calculate the dissimilarities to each of the leaders. A drawback of the adding method is that the result strongly depends on the ordering of the input sequence of units. A possible way to avoid this problem is to select a 'good' initial tree. We are suggesting to built the initial tree with some agglomerative hierarchical clustering method on leaders obtained with the leaders method. The other possibility is to include balancing of the tree in the process of adding new unit. Both possibilities are still under the development. 5 Conclusion We successfully applied the proposed approach on the dataset of types of cars (1 349 units, 26 variables), on the ISSP data (45 784 units, 21 variables) and also on some large datasets from AI collection http: //www. ics .uci. edu/~inlearn/ MLRepos i tory.html The first version of the program ClaMix (based on the adapted version of the leaders method) and some of the results are available at http://www.educa.fmf.uni -1j.si/datana/ Acknowledgment: This work was supported by the Ministry of Science and Technology of Slovenia, Project Jl-8532. [5] Diday, E. (1979) Optimisation en classification au-tomatique. Tome l.,2. INRIA, Rocquencourt, (in French). [6] Diday, E. (1997) Extracting Information fi-om Extensive datasets by Symbolic Data Analysis. Indo-French Workshop on Symbolic Data Analysis and its Applications, Paris, 23-24. September 1997, Paris IX, Dauphine, p. 3-12. [7] Hartigan, J.A. (1975) Clustering Algorithms. Wiley, New York. [8] Jambu, M. & Lebeaux, M.O. (1983) Cluster Analysis and Data Analysis. North-Holland Publishing Company. [9] Korenjak-Čeme, S. & Batagelj, V. (1998). Clustering large datasets of mixed units. Advances in Data Science and Classification. Rizzi, A., Vichi, M. and Bock, H.-H. (Eds.), Springer, Berlin, 1998, p. 43-48. [10] Tukey, J.W. (1977) Exploratory Data Analysis. Addison-Wesley, Reading, MA. [11] Zupan, J. (1982) Clustering of Large Data Sets. Research Studies Press, John Wiley & Sons LTD. [12] Flags from Collins Gem Guide to Flags. Collins Publishers (1986). Donated by Richard S. Forsyth. ftp://ftp.ics.uci.edu/pub/ machine-learning-databases/flags References [1] Batagelj, V. (1985) Notes on the dynamic clusters method. Proceedings of the IV conference on applied mathematics. Split, May 28-30, 1984. University of Split, Split, p. 139-146. [2] Batagelj, V. & Bren, M. (1995) Comparing Resemblance Measures. Journal of Classification, 12, 1, p. 7390. [3] Batagelj, V. & Mandelj, M. (1993) Adding Clustering Algorithm Based on L-W-J Formula. Paper presented at: /FC5'Pi,Paris, 31.aug-4.sep 1993. [4] Brucker, P. (1978) On the complexity of clustering problems. Lecture Notes in Economics and Mathematical Systems J 75, in: Optimization and Operations Research, Proceedings, Bonn. Henn,R., Korte,B., Oettli,W. (Eds.), Springer-Verlag, Beriin 1978. Equation Discovery System And Neural Networks For Short-Term Dc Voltage Prediction Irena Nančovska, Anton Jeglič and Dušan Fefer Faculty of Electrical Engineering, Tržaška 25, Ljubljana, Slovenia Phone: +386 61 1768 216, Fax: +386 61 1768 214 E-mail: {Irena.Nancovska,Anton.Jeglic,Dusan.Fefer}@fe.uni-lj.si AND Ljupčo Todorovski, Jozef Stefan Institute, Jamova 39, Ljubljana, Slovenia Phone: +386 61 1773 307, Fax: +386 61 1258 058 E-mail: Ljupco.Todorovski@ijs.si Keywords: neural networks, equation discovery, machine learning Edited by: Cene Bavec and Matjaž Gams Received: October 12, 1999 Revised: November 25,1999 Accepted: December 15, 1999 The aim of the paper is to compare the predictive abiUties of the novel method for time series prediction that is based on equation discovery with neural networks. Both methods are used for short-term (one-step ahead) prediction and have the abihty to learn from examples. With purpose to validate the predictive models, they are applied to several data sets. The successful predictive models could be used for voltage monitoring in a high precision solid-state DC voltage reference source (DCVRS) without presence of a high level standard, and further for voltage correction as a segment in the software controlled voltage reference elements (VRE). 1 Introduction some important characteristics: universal approximation (input-output mapping), ability to leam from and adapt to Measured time series could be described as mixtures of dy- their environment and the ability to evoke weak assump-namic, deterministic part which drives the process and ob- tion about the underlying physical system which gener-servational noise which is added in the measurement pro- ates the input data (Haykin 1998). In the paper we use cess, but does not influence the future behavior of the sys- three types of neural networks. The first one is a su-tem. Many up-to-date scientific researches on predicting pervised multilayer feedforward network, which is trained the future behavior of system are based on modelling of the with back-propagation learning algorithm (Haykin 1998, deterministic part. Examples ranges from the irregularity iii Nielsen 1990, Pham 1995). The second type emphasizes the annual number of sunspots to the changes of currency the role of time as an essential dimension of learning. It exchange rates. To make a forecast if the underlying deter- is a natural extension of the first type, replacing the or-ministic equations of the observed system are not known, dinary synaptic weights with finite-duration impulse re-one must find out both the rules governing system dynam- sponse (FIR) filters (Haykin 1998, Gershenfeld & Weigend ics and the present state of the system (Gershenfeld & 1992). The third type of network a recurrent structure Weigend 1992). Mainstream statistical techniques for pre- with a hidden neurons which introduce time in the network dieting include variations of the auto-regressive technique processing by virtue of the built-in feedback loop (Alippi that Yule invented in 1927. The technique uses weighted 1996, Haykin 1998). sum of previous observations of the series to predict the Equation discovery systems explore the hypothesis next value. However, there are a number of cases for which space of all equations that can be constructed given a set of this paradigm is inadequate because of the non-linearity of arithmetical operators, fiinctions and variables, searching the underlying model (Gershenfeld & Weigend 1992). In for an equation that fits the input data best. In the paper, the paper we present two different paradigms for forecast- we present an equation discovery system Lagramge that ing: neural networks and equation discovery. Neural net- uses context free grammars for restrictmg the hypothesis works used for prediction are characterized as black-box space of equations. The hypothesis space of lagramge models whereas models obtained with equation discovery is a set of equations, such that the expressions on their systems are transparent (white-box). right hand sides can be derived from a given context free Neural networks represent an emerging technology with grammar. For the purpose of time series prediction, we use difference equations, that predicts the present value of the time series. Three different grammars for Hnear, quadratic and piecewise linear equations are used. In order to compare the predictive abilities of two described paradigms we performed experiments in two synthetic and three real world time series prediction problems. The domains used in the experiments present models with different amounts of non-linear dynamics (determinism) and noise (randomness). The predictive models obtained in the experiment with reference voltage domain can be used in to improve the metrological characteristics of a DCVRS in two different manners: voltage monitoring and voltage correction. For the purpose of voltage monitoring, predictive models could be used during the inter-calibration period without presence of a high level standard while the predictors are obtained during the calibration period by using a high precision instrument. Further, the models could be used for voltage correction, as a segment in a software controlled VRE. By implementation of a control loop for voltage correction, based on the obtained predictors, the sensitivity of the reference source could be reduced, which contributes to enhancement of the robustness of the system and thereby the stability of the reference voltage (Nancov-ska 1997). The paper is organized as follows. First two sections describe the techniques used for time series predicting. In Section 2 a brief description of used neural networks is given and Section 3 gives overview of the equation discovery system Lagramge. The results of applying both techniques on five time series data sets are presented in Section 4, Finally, Section 5 concludes with a summary of the results and directions for further work. 2 Neuairal Neitworks The time series a;(l),a;(2),a;(3)..., which describes the system is given. From the series we generate vectors x(n) = [x(n - 1), x{n - 2),... , x{n - p)]^, which describe the last p values of the phenomenon until time n -1. We are trying to find a map F(x(n)) = x{n) such that the predicted value x(n) in time n is the most similar to the original signal value x{n) in time n. For accomplishment of F we use three different types of neural networks (Ger-shenfeld & Weigend 1992, Narendra 1990, Pham 1995). 2.1 Mmltilayer perceptrom (MP) We use a general multilayer feed-forward network (Lippmann 1987) whose learning algorithm is generalized S-rule or back-propagation (BP). The user interface provides regulation of the following parameters: number of layers, number of neurons in each layer, learning rate t] and momentum term a. Parameters rj and a could be changed during the training. Neurons in input layer act as buffers for distributing the input signals x(n) to neurons in the hidden layer (Haykin 1998, Lippmann 1987, Nielsen 1990, Pham 1995). MP is usually used as pattern recognition tool, but from a systems theoretic point of view it can be also used for approximation of non-linear maps (Narendra 1990). 2.2 FIR mmWilayer perceptron (FIR-MP) In order to allow time to be represented by the effect it has on signal processing or to make the network to be dynamic, time delays are introduced into the synaptic structure of the network and their values are adjusted during the learning phase (Haykin 1998). In fact each synapse is represented by a finite-duration impulse response (FIR) filter (Figure 1). FIR-MP network is a vector generalization of the MP and its learning algorithm is a vector generalization of the standard BP algorithm, called temporal BP (TBP). The basic form of TBP is non-causal because the computation of weights requires knowledge of future values of weights' changes 6-s and weights w-s. It could be made causal by adding a finite number of delay operators on the feedback connections so that only present and past values of S-s and w-s are used. We hypothesize that by introducing tapped delay feedbacks the NN performance on problems involving time dependencies could be improved. In (Lin 1996) FIR-MP is compared to the NARX recurrent network by its computational power. NARX is computationally as strong as fully connected recurrent network thus is Turing machine equivalent (Siegelmann 1995, Siegelmann & Sontag 1995). 2.3 ReciuirreEt metwork im real time (EN) The net (Haykin 1998, Pham 1995) consists of connected input-output layers and processing layer. RN has ability to connect the external time-varying input with its previous output by using delay operator. The learning algorithm used is real time recurrent learning (RTLL) (Haykin 1998), which is gradient-descent learning algorithm and minimizes the error function by changing the weights of all visible neurons. This architecture is capable of representation of arbitrary non-linear dynamical system and it is Turing equivalent (Alippi 1996, Siegelmann 1995, Siegelmann & Sontag 1995). However, learning simple behavior can be quite difficult by using gradient descent'. RTRL is not guaranteed to follow the negative gradient of the error function. This is a consequence of the feedback connection and it can be improved by slow changing of weights. Although RN has difficulty capturing the global behavior (Lin 1996) it is usefial for learning short-term dependencies and thus can be used for short-term predictions. ' For example, even it is Turing equivalent, it has been difficult to get it successfully learn finite-state machines from example strings encoded as sequences. )-»-j ^/p-l) ) > , I 1-»-0 Figure 1 : Signal-flow graph of a synaptic FIR filter 3 Equation Discovery The problem of equation discovery, as addressed by La- gramge, can be defined as follows. Given are - a context free grammar G = (N, T, P, S) (see next section) and - input data D = {V,Vd, M), where - V = {vi,v2,-. -Vm} is a set of domain variables, - Wd G is the dependent variable and - M is a set of one or more measurements. Each measurement is a table of measured values of the domain variables at successive time points: time Vl V2 Vm to Vlfi V2,0 ■ ■ Vm.O h fl,l V2,l ■ ■ Vm^i t2 Vl,2 V2,2 . ■ Vm,2 tN Vl,N V2,N ■ ■ Vm,N Find an equation for expressing the dependent variable Vd in terms of variables in V. This equation is expected to minimize the discrepancy between the measured and calculated values of the dependent variable. The equation can be: - differential, i.e. of the form dvd/dt = va = E, or - ordinary, i.e. of the form Vd = E, where £ is an expression that can be derived from the context free grammar G. 3.1 Restricting the space of possible equations The syntax of the expressions on the right hand side of the equation is prescribed with a context free grammar (Hopcroft & Ullman 1979). A context free grammar contains a finite set of variables (also called nonterminals or syntactic categories) each of which represents expressions or phrases in a language (in equation discovery, nonterminals represent sets of expressions that can appear in the equations). The expressions represented by the nonterminals are described in terms of nonterminals and primitive symbols called terminals. The rules relating the nonterminals among themselves and to terminals are called productions. The original motivation for the development of context free grammars was the description of natural languages. For example, a simple grammar for deriving sentences consists of the productions sentence noun verb, noun network, noun equation, and verb predicts. Here sentence, noun and verb are nonterminals, while words that actually appears in sentences (i.e. network, predicts) are terminals. The sentences networkpredicts and equation predicts can be derived with this grammar. We denote a context free grammar as a tuple G = (iV, T, P, S), where N and T are finite disjoint sets of nonterminals and terminals, respectively. P is a finite set of productions; each production is of the form A a, where ^ is a nonterminal and a is a string of symbols From NUT. We use the notation A ^ ai \ a^ \ ... | a^ for a set of productions for the nonterminal A: A ai, A a2,..., A ^ ak- Finally, 5 is a special nonterminal called starting symbol. Grammars used in equation discovery system La-gramge have several symbols with special meanings. The terminal const e T is used to denote a constant parameter in an equation that has to be fitted to the input data. The terminals Vi are used to denote variables from the input domain D. Finally, the nonterminal v e N denotes any variable from the input domain. Productions connecting this nonterminal symbol to the terminals Uj are attached to v automatically, i.e., \/vi ■. v Vi e P. The only restriction on the grammar G is that the right sides of the productions in P have to be expressions that are legal in the C programming language. This means that we can use all C built-in operators and functions in the grammar. Additional functions, representing background knowledge about the domain at hand can be used, as long as they are defined in conjunction with the grammar. Note that the derived equations may be non-linear in both the constant parameters and the system variables. Expressions can be derived by grammar G from the nonterminal symbol S by applying productions from P. We start with the string w consisting of S only. At each step, we replace the leftmost nonterminal symbol A in string w with a, according to some production A a from P. When w consists solely of terminal symbols, the derivation process is over. 3.2 Lagramge - the algorithm Expressions generated by the context free grammar G contain one or more special terminal symbols const. A nonlinear fitting method is applied to determine the values of these parameters. The fitting method minimizes the value of the error function Error{c), i.e. if c is the vector of constant parameters in expression E, then the result of the fitting algorithm is a vector of parameter values c*, such that .Brror(c*) = mincgfl-^ {£'rror(c)}. The error function Error is a sum of squared errors function, defined in the following manner: - for a differential equation of the form dvd/dt — E: Error{c) = Eilo - (vifi + //; Eic,vu.. , and - for an ordinary equation of the form v d = E: Error {c) — EZoi^d.i - E(C, Vi^i, . . . Vd-i,i, Vd+i,i, . . . where N is the size of the measurement table and vj^i the value of the system variable v j at time ti. Note that in the case of calculating the error fiinction for differential equations we use the integral of the expression on the right hand side of the equation instead of the derivative of the dependent variable. This is done because the error of algorithms for numerical integration is in general smaller than the error involved of numerical derivation. We use a simple trapezoid formula for numerical integration with the same step size as the time step between successive measurements in the measurement table. The downhill simplex and Levenberg-Marquardt algorithms (Press et al. 1986) can be used to minimize the error function. Furthermore, the value of a heuristic fiinction for the expression is evaluated. It is equal to the sum of squared errors value SSE calculated by the fitting method (SSE{E) = Error{c*)). An alternative heuristic function MDL (minimal description length) can be used, that takes into account the length I of expression E: MDL{E) = SSE{E) + I 10-L where l^ax is the length of the largest expression generated by the grammar and a^^ is the standard deviation of the dependent variable v^. The length is measured as the number of terminals in the expression. The MDL heuristic function prefers shorter equations. A context free grammar can in principle derive an infinite number of expressions (equations). Lagramge thus uses a bound on the complexity (depth) of the derivation used to produce the equation (Todorovski & Džeroski 1997). The Lagramge algorithm exhaustively or heuris-tically searches for the best equation (according to the selected heuristic function) within the allowed complexity (depth) limits. 3.3 Time series prediction with equation discovery We reformulate the problem of time series prediction mto the equation discovery problem in the following way. Given a time series a;(l), x{2),x(3),..., we choose a constant p and build matrix M as follows: time Vi V2 .. Vp+l to x(l) . x{p+l) h x{2) x(3) .. . x{p + 2) ti x(3) x(4) .. . x(p + 3) Now the input domain for equation discovery problem equivalent to the problem of time series prediction is D = ({vi,v2,... ,Vp+i},Vp+i,M). We search for ordinary equation of the form Vp+i = F{vi ,V2,... Vp). The obtained equation can be interpreted as difference equation for predicting the next value of the time series x(n) = F{x{n-l),x{n-2),... ,x{n-p)). The form of function F on the right-hand side of the equation is biased with a context free grammar G. We used three different context free grammars for restricting the space of possible equation in time series prediction domains. First grammar is used to produce linear models: E —> const I const *v\E + const * v The second grammar generates quadratic multivariate polynomials: E const I const * F \ E+ const * F F —^ u I u * u Finally, the third grammar used in the experiments generates piecewise linear models. The breakpoint is set to 0.5, which is the middle of the interval of normalized values of time series: double If(double v, double el, double e2) { return((v < 0.5) ? el : e2); } IfE E\lf{v,E,E) E const I const * u | £ const * v 4 Experiments 4.1 Data sets descriptions We applied the techniques described in the previous two sections to two synthetic and three real worid data sets: Lorenz system Model of the Lorenz attractor is one of the most frequently used examples of the deterministic chaos system. It is described with the following differential equations: X = a{y - x) y = x{R-z)-y ž = xy — bz The values of the constant parameters were chosen to be: a = 16.0, iž = 451992,0 = 4.0. For initial state a;(0) = 0.06735,2/(0) = 1.8841,2(0) = 15.7734 the system is well-conditioned. The equations were simulated for 2000 time steps of length h = 0.001. For the prediction task we use the time series for variable z, because it clearly reflects non-linear dynamics of the system. Reference voltage The observed time series are generated by solid-state VRE-s, based on 7V zener diodes LTZIOOO of a group DCVRS. They are produced by measuring the absolute voltage values of VRE, which is controlled by PC-computer. The PC communicates with DCVRS via serial port RS232. Measuring instrument is digital voltmeter HP3458A. The time series present 2000 samples taken in time intervals of 15 minutes during 500 hour measurement. Fractional Brownian motions or 1//noises FBM is a random function provided by Mandelbrot and Van Ness (FBM). The most important feature of FBM is that its increments [Bnit + T) - Bnit)] = h~"[BHÌt + hT) - Bnit)] are stationary, statistically self-similar and have Gaussian distribution with a standard deviation ChT'^ , where C h is constant. This is usually called T" law of FBM. The parameter H is directly related to the fractal (Hausdorff) dimension D. For generation of the FBM signals we use the method of spectral synthesis (Nancovska 1997). Lorenz-like chaos in NH3-FIR lasers Far infrared lasers have been proposed as examples of a physical realization of the Lorenz model, mentioned earlier (Hubner et al. 1992). However, the actual laser systems are more complex then simple coherently coupled three-level systems. The data set was chosen to obtain several of the important quantities pertinent in comparison to the parameters of numerical data sets obtained by the integration of Lorenz equations. We took into consideration first 2000 time points of the time series. Sunspots The data set is standard benchmark test for various techniques for time series prediction. It contains the observation of the number of annual sunspots for 280 years. 4.2 Experimental setting The following methodology of the experiments with the time series prediction data sets was used. Each data set was divided in two parts of equal sizes (1000 time points per set, expect for the Sunspot data set which has only 280 time points). The first part was used as input for the learning system in the training phase, and the second one was used for testing the performance of the obtained predictive model. Furthermore, in the traming phase 80% of the training set was used directly for learning and the rest is used for error evaluation only (the evaluation applies the estimation of the performance of the predictor learned so far). The length of the input vector p varied between 1 and 6. The criterion for choosing the best predictor was the root mean square error (RMSE). In the experiments with neural networks, the value of parameter rj is gradually decreased from 0.95 to 0.003 to avoid local minima of the error surface. When training the recurrent network 77 is set to the lower value from the be-girming to make the time scale of the weight changes small enough to allow the learning algorithm to follow the negative gradient of error function. We used beam search strategy in equation discovery system Lagramge with beam width set to 50 and both heuristic functions (SSE and MDL) with downhill simplex method for constant parameters fitting. The depth complexity parameter was set to 10 and three different context free grammars (from the previous section) were used. The best equation was then chosen that minimizes the RMSE on the test training set. 4.3 Results The results of the experiments for neural networks and equation discovery system Lagramge are given in Table 1 and Table 2, respectively. The architecture of the MP neural network is represented with x — y — z, where x, y and z denote numbers of neurons in first, second and third layer, respectively, l^y^l denotes FIR-MP neural network architecture with p and q time operators (taps) between corresponding layers and p equals the length of the input vector. The architecture of the recurrent neural network is represented with x y - I, where x denotes the length of input vector and y the number of feedback connections. Recurrent neural networks has the best performance for the Reference voltage and FBM data sets. Both data sets represent time series with very fast changing values without long-term trend. The recurrent neural network has worst performance for the time series with trend. In that case the MP and FIR-MP networks better identify the underlying system, as we can see from the results of the experiments for Lorenz and Sunspots data sets. For Lorenz-like chaos data set the best performance is surprisingly achieved with MP network^. For all data sets, expect the Lorenz-like, the prediction performance of different types of neural networks are comparable. In the experiments with Lorenz-like chaos MP is significantly better then other two types. ^Finding a simple representation for a complex signal might require looking for relationship among input variables. In the case of Lorenz-like chaos input vectors are representative enough to allow the MP to tind the "simple" regression model which is good enough for local description of the model. Data set Winning NN RMSE Type Architecture P Training Testing Lorenz FIR-MP 5 9.9-10-^ Ref voltage Ree. 6<--4-l 6 0.0646 0.0749 FBM Ree. 5 •M 4- 1 5 0.0895 0.0823 Lorenz-like MP 6-3-1 6 0.0153 0.0260 Sunspots FIR-MP iHil 5 0.09795 - Table 1: Results of the experiments with neural networks Data set Winning equation RMSE Type P Training Testing Lorenz piecewise linear 6 1.919-10-« 2.465-10-« Ref. voltage piecewise linear 6 0.06375 0.07405 FBM linear 6 0.08846 0.0827 Lorenz-like quadratic 3 0.03889 0.05939 Sunspots quadratic 4 0.103 - Table 2: Results of the experiments with equation discovery In the experiments with equation discovery for die Lomez, Reference voltage and FBM data sets the discovered equations are linear. Although the Lorenz data set is obtained with simulating three non-linear differential equations, the interval enclosed in the data set do not expose the non-linearity of the underlying equations (due to the stability of the numerical integration a small time step was chosen). For Lorenz-like and Sunspots data sets quadratic equations were discovered, which was expected because of their non-linearity. The parameter p (number of previous values used for prediction) is significantly smaller in cases where quadratic equations were discovered. As in the experiments with neural networks, for all data sets, expect the Lorenz-like, the prediction performances of different types of equations are comparable. In the experiments with Lorenz-like chaos quadratic equations are significantly better then other two types. Both methods manifest comparable performance on three data sets. Equation discovery outperforms neural networks for the Lorenz data set, which was expected because of the determinism of the underlying model. Neural networks have better performance on the Lorenz-like and Sunspots data sets where quadratic equations were discovered by Lagramge. Figure 2 shows the performance of the obtained predictors for different types of neural networks and equations. 5 Discussion In the paper, we presented the equation discovery system Lagramge that uses context free grammars for restricting the hypothesis space of equations. Background knowledge from the domain of use in the form of function definitions can be used along with common arithmetical operators and functions built in the C programming language. The hypothesis space of Lagramge is a set of equations, such that the expressions on their right hand sides can be derived from a given context free grammar. In contrast with system identification methods, where the structure of the model has to be provided explicitly by the human expert, Lagramge can use a more sophisticated form of representing the expert's theoretical knowledge about the domain at hand. A context free grammar can be used to specify a whole range of possible equation structures that make sense from the expert's point of view. Therefore, the discovered equations are in comprehensible form and can give domain experts better or even new insight into the measured data. This also distinguishes Lagramge from other system identification methods like neural networks, which can be used for obtaining blackbox models, i.e., models with incomprehensible structure. On the equation discovery side, the presented work is related to equation discovery systems, such as Bacon (Langley et al. 1987), EF (Zembowitz & Zytkow 1992), E* (Schaffer 1993), LaGRANGE (Dzeroski & Todorovski 1993) and GoldHorn (Krizman et al. 1995). However, none of them was applied to the task of time series prediction. Various architectures of neural networks have already been used for system identification and prediction. Some of them are closely related to the architectures used in this paper (Haykin 1998, Lippmann 1987, Narendra 1990, Pham 1995), and others are different, such as radial basis function (RBF) neural network (Haykin 1998) and Group-Method-of-Data-Handling (Pham 1995). The NARX recurrent neural networks (Alippi 1996, Lin 1996) architecture is suitable for learning long-term dependencies in time series. For the task of short-term prediction, addressed in this paper, learning the local structure is good enough (Narendra Referenco Voltoge Lorenz- „ Ukochao. Data sat NNtypa Figure 2: RMSE of the predictors for different types of neural networks and equations 1990). Support vector machine (SVM) for non-linear regression (Haykin 1998), which is approximate implementation of the method of structural risk minimization, could be also used. Finally, SVM may be implemented in the form of a polynomial learning machine, RBF network or MP. References [1] Alippi, C., and Piuri, V. (1996) Experimental Neural Network for Prediction and Identification, In IEEE Transactions on Instrumentation and Measurement, Vol. 45, No. 2, 1996, pages 670-676. The outcomes of the experiments confirm the potential applicability of both paradigms for time series prediction. For each domain, the performances of the predictors, obtained with both methods, are comparable. The best predictors were obtained for the deterministic Lorenz-z time series. The efficiency of the predictors for real world domains (Reference voltage and Lorenz-like chaos) are comparable. The predictors with worst efficiency were obtained for FBM and Sunspots time series, due to the randomness of the underlying model (FBM) and the small number of measurements available (Sunspots). The predictive models could be used as a segment of software controlled voltage reference element (VRE), which consists of three main parts: measuring, predictive and control part. Stability of a reference voltage source could be enhanced by implementation of voltage control, which includes a function of correction (feedback loop). This could be done by correction of the current voltage by using a prediction based on measurements made before. It is anticipated for a solid-state voltage reference source to achieve the stability better than Ippm/lOOOh (Nancovska 1997). First step towards the further work will be the comparison of the methods described in the paper with mainstream statistical methods for time series prediction, such as ARIMA or exponential smoothing. It will be of great interest to apply these methods to the variety of data sets from different domains, where some background knowledge from the concrete domain can be used for restricting the equation space. Alternative types of neural networks architectures (NARX recurrent network, SVM for non-linear regression (Haykin 1998)) could be also implemented. [2] Džeroski, S., and Todorovski, Lj. (1993) Discovering dynamics In Proc. Tenth International Conference on Machine Learning, pages 97-103. Morgan Kaufmann, San Mateo, CA. [3] Gershenfeld, N. A., and Weigend, A. S. (1992) The future of time series: learning and understanding. In Time series prediction: Forecasting the future and understanding the past. Proceedings of the NATO Advanced Research Workshop on Comparative Time Series Analysis held in Santa Fe, New Mexico, May 14-17, 1992, pages 1-70. Addison-Wesley Publishing Company, Reading, MA. [4] Haykin, S. (1998) Neural Network - A Comprehensive Foundation, Second Edition, Macmillan College Publishing Company, Inc. [5] Hopcroft J. E., and Ullman, J. D. (1979) Introduction to automata theory, languages, and computation. Addison-Wesley, Reading, MA. [6] Hübner, U., and Weiss, C. O., and Abraham, N. B., and Dingyuan T. (1992) Lorenz-like chaos in NHz-FIR lasers (Data Set A). In Time series prediction: Forecasting the future and understanding the past. Proceedings of the NATO Advanced Research Workshop on Comparative Time Series Analysis held in Santa Fe, New Mexico, May 14-17,1992, pages 1-70. Addison-Wesley Publishing Company, Reading, MA. [7] Hecht-Nielsen R. (1990) Introduction to back-propagation, In Neurocomputing, HNC, Inc. and University of California, San Diego, Addison-Wesley Publishing Company. [8] Križman, V., andDžeroski, S., and Kompare, B, (1995) Discovering dynamics from measured data. Electrotech-nicalReview, 62: 191-198. [9] Langley, P., and Simon, H., and Bradshaw, G. (1987) Heuristics for empirical discovery. In Bole, L., editor, Computational Models of Learning. Springer, Berlin. [10] Lin, T., and Home, B. G., and Tino, P., and Giles, C. L. (1996) Learning Long-Term Dependences in NARX Recurrent Neural Networks. In IEEE Transactions on Neural Networks, Vol. 7, No. 6, November 1996, pages 1329- 1338. [11] Lippmann, R. P., (1987) An Introduction to Computing with Neural Nets, In IEEE ASSP Magazine, April 1987, pages 4-22. [12] Mandelbrot, B., and Van Ness, J. W., (1968) Fractional Brownian Noises and Applications, In 5X4MRev. 10(4), 1968, pages 422-436. [13] Nančovska, L, and Kranjec, P., and Fefer, D., and Jeglič, D. (1998) Case Study of the Predictive Models Used for Improvement of the Stability of the DC Voltage Reference Source, In IEEE Transactions on Instrumentation and Measurement, vol.47, no. 6, 1998, pages 1487- 1491. [14] Narendra, K. S., and Parthasarathy, K. ( 1990) Identification and Control of Dynamical Systems Using Neural Networks. In IEEE Transactions on Neural Networks, Vol. 1, No. 1., March 1990, pages 4 - 27. [15] Siegelmann, H. T., and Home, B. G., and Giles, C. L. (1995) Computational capabilities of recurrent NARX neural network, In Tech. Rep. UMIACS-TR-95-12 nad CS-TR-3408, Inst, of Adv. Comp. Stud., Univ. of Maryland. [16] Siegelmann, H. T., and Sontag E. D. (1995) On the Computational Power on Neural Networks. In Journal of Comp. Systems in Science, Vol. 50, No. 1, pages 132 - 150, 1995. [17] Pham, D.T., and Liu, X. (1995) Neural Networks for Identification, Prediction and Control. Springer-Verlag, London, GB. [18] Press, W. H., and Flannety, B. P, and Teukolsky, S. A., and Vetterling, W. T. (1986) Numerical Recipes. Cambridge University Press, Cambridge, MA. [19] Schaffer, C. (1993) Divariate scientific fiinction finding in a sampled, real-data testbed. Machine Learning, 12: 167-183. [20] Todorovski, Lj., and Džeroski, S. (1997) Declarative bias in equation discovery. In Machine learning. Proceedings of the 14th international conference (ICML '9 7), pages 376-384. Morgan Kaufmann publishers, San Francisco, CA. [21] Zembowitz, R., and Zytkow, J. (1992) Discovery of equations: experimental evaluation of convergence. In Proc. Tenth National Conference on Artificial Intelligence. Morgan Kaufmann, San Mateo, CA. Adaptive On-line ANN Learning Algorithm and Application to Identification of Non-linear Systems Daohang Sha and Vladimir B. Bajić Centre for Engineering Research, Technikon Natal, P.O.Box 953, Durban 4000, South Africa dhsha@hotmail.com, bajic.v@umfolozi.ntech.ac.za tel/fax: +27-31-2042560, http://nsys.ntech.ac.za Keywords: Soft computing, neural networks, gradient descent method, real-time algorithms, Input/Output modelling Edited by: Cene Bavec and Matjaž Gams Received: October 13, 1999 Revised: November 10, 1999 Accepted: Decembers, 1999 Anew on-line adaptive learning rate algorithm for I/O identification based on two ANNs is proposed. The algorithm is derived from the convergence analysis of the conventional gradient descent method. Simulation experiments are given to illustrate the advantages of the proposed algorithm in its application to an identification problem of some non-linear dynamic systems. 1 Introduction Feedback linearization can be used for controlling a broad class of non-linear processes. To perform feedback linearization it is necessary that the process allows description by a particular model structure which iits into the affine non-linear form (see [l]-[4]). When the process is unknown we can let two multilayer perceptron (MLP) models approximate the linear relationship between the input and output data of the process. In this paper, we will develop an on-line variable rate training algorithm for this double neural network system. Determination of the fixed learning rate of the conventional back-propagation (BP) algorithm for feedforeward neural networks (FNNs) [5] has to be made with care. If the learning rate is large, learning may occur quickly, but it may also become unstable. To ensure stable learning, the learning rate must be sufficiently small. However, with a small learning rate, an ANN, for example an MLP, may adapt reliably, but it may take quite a long time. It is thus difficult to select a suitable fixed learning rate for different initial values of the parameters of the ANNs and for different structures that ANNs may have. Moreover, a good fixed learning rate for one system is not necessarily good for another. These are characteristics of the basic ANN learning rule that relies on the gradient descent (GD) method and the chain rule [6]. For the convergence of such algorithms see [7] and [8]. The GD method is known for its slowness and its tendency to become trapped in local minima. To reduce these shortcomings, a number of faster ANN training algorithms have been developed, such as different adaptive learning algorithms (see [9], [10] and [11]), and other modified algorithms (see [12]-[23]). In spite of their improved convergence, these methods are not based on the optimal instantaneous learning rates of the GD approach. One may use second-order nonlinear optimizing methods to acceler- ate the MLP learning, such as the conjugate gradient algorithm (see [24]) or the Levenberg-Marquardt based method (see [25]). The critical drawbacks of these methods, however, are the ill conditioning of the Hessian matrix in many applications and the computational complexity related to the Hessian. In addition, most of these algorithms are developed only for off-line training of the ANNs. Recently, the layer-by-layer optimizing algorithm was proposed in which each layer of MLP's is decomposed into both a linear part and a nonlinear part [26]-[33]. The linear part of each layer is solved via the least squares problem formulation. Although these algorithms show fast convergence with much less computational complexity than those of the conjugate gradient or Newton methods, they result in an unavoidable problem caused by target assignments at hidden nodes. When the targets for a hidden layer cannot be linearly separated, it is impossible to reduce the MSE sufficiently at both the hidden layer and the output layer. Another class of fast learning algorithms is the one based on the extended Kaiman filter (EKF) technique for training of a multilayer FNN. It has received considerable attention recently (see [34]-[38]). These algorithm improved the convergence rate considerably and exhibited good performance. However, their numerical stability is not guaranteed. This may degrade learning convergence, increase training time and, generally, can make on-line implementation questionable. In this paper, we develop an on-line variable rate learning algorithm for double neural network system which can speed up the learning process substantially and can simultaneously provide stability of the learning process. In [39], we proposed a variable rate algorithm for the on-line training of an MLP. Here we extend this solution to online training of a double neural network system for I/O modelling of SISO time-invariant non-linear systems. As ANNs are widely used for the identification of non-linear 522 Infomiatica 23 ( 1999) 521-529 D. Sha et al. systems (see [40]-[61]), as well as for prediction of their behavior (see [62]-[64]), we will test this algorithm in the on-line identification and prediction of behavior of three non-linear systems. ©Linear Nods •Nonlinear Node 9 ' On-line Training Multiplier INk, k = and HNk, k = f,g, denote the number of nodes of the hidden layer. Then, starting from (1) one can consider a double three-layers-forward neural network as an identification model for a non-linear plant (Fig.l), where the network model is governed by / Figure 1 : Double neural network system for identification of non-linear plant 2 Modeling Non-linear Plants by ANNS 2.1 Problem Description Consider a SISO time-invariant non-linear system for which we will attempt to obtain an I/O model in the form of y{k + l) = fMk)]+g[^{k)].uik). Here cp(k) = [-yik) ...-y{k-n + l) - 1)... u{k -m)]^. Integer parameter n may be associated with the system order; m is a non-negative integer. Let the functions / and g of the above model be unknown. We will use two neural networks to model / and g, in order to obtain their approximations / and g, respectively. The assumption is that these networks are governed by yNN{k + l\