Informatica: An International Journal of Computing and Informatics

Special Issue: Multistrategy Learning
Guest Editor: Gheorghe Tecuci
The Slovene Society Informatika, Ljubljana, Slovenia

Subscription Information: Informatica (ISSN 0350-5596) is published four times a year in Spring, Summer, Autumn, and Winter (4 issues per year) by the Slovene Society Informatika, Vožarski pot 12, 61000 Ljubljana, Slovenia. The subscription rate for 1993 (Volume 17) is DEM 50 (US$ 34) for institutions, DEM 25 (US$ 17) for individuals, and DEM 10 (US$ 4) for students, plus the mail charge of DEM 10 (US$ 4). Claims for missing issues will be honored free of charge within six months after the publication date of the issue.

LaTeX Technical Support: Borut Žnidar, DALCOM d.o.o., Stegne 27, 61000 Ljubljana, Slovenia. Lectorship: Fergus F. Smith, AMIDAS d.o.o., Cankarjevo nabrežje 11, Ljubljana, Slovenia. Printed by Biro M, d.o.o., Žibertova 1, 61000 Ljubljana, Slovenia.

Orders for subscription may be placed by telephone or fax using any major credit card. Please call Mr. R. Murn, Department for Computer Science, Jožef Stefan Institute: Tel (+386) 61 1259 199, Fax (+386) 61 219 385, or use the bank account number 50100-620-133-27620-5159/4 (LB 50101-678-51841 for domestic subscribers only).

According to the opinion of the Ministry for Informing (number 23/216-92 of March 27, 1992), the scientific journal Informatica is a product of informative matter (point 13 of tariff number 3), for which the turnover tax amounts to 5%.

Informatica is published in cooperation with the following societies (and contact persons): Robotics Society of Slovenia (Jadran Lenarčič), Slovene Society for Pattern Recognition (Franjo Pernuš), Slovenian Artificial Intelligence Society (Matjaž Gams), and Slovenian Society of Mathematicians, Physicists and Astronomers (Bojan Mohar).

Referees: D. Abel, M. Bohanec, M. Bonac, M. Drobnič, B. Filipič, J. Fürnkranz, G.K. Gupta, M.A. Jenkins, A. Karalič, M. Kljajić, M. Kovačič, J. Lenarčič, B. Likar, B. Litow, D. Mladenić, I. Mozetič, M. Presern, M. Rihar, B. Robič, I. Savnik, I. Sega, T. Szuba, M. Tamiz, G. Trinidad, O. de Vel.

The issuing of the Informatica journal is financially supported by the Ministry for Science and Technology, Slovenska 50, 61000 Ljubljana, Slovenia.

PROFILE

The readers of Informatica will probably notice that the choice of profiles is quite diverse with respect to the positions and careers of our editors. I must confess that my and the editors' choice of certain profiles is intuitive and at the same time spontaneous. We search for excellence in individuals. In this choice, international (geographical), generational, and disciplinary balance may play a significant role. Some of our editors are well situated, but some are provided with neither basic research infrastructure nor suitable living conditions. As researchers they persist in their profession irrespective of the obstacles and the misunderstanding of their research communities. In Eastern Europe, and also in the West, some of them are left to themselves, without public support. But the only important issue is their scientific achievement, which may already impact the present generations of research and technology, or will have an impact in the future. In this respect, the intuition involved in choosing individuals for this column of Informatica is very challenging.
Professor Gheorghe Tecuci belongs to the younger generation of researchers, and his professional position is in harmony with his professional achievements. He is already a member of his country's academy of sciences, a cosmopolitan, and holds a leading position in the field of machine learning and knowledge acquisition. In this respect he might be a real model for the coming research generations, particularly in the fields concerning modern artificial intelligence, knowledge methodologies and technologies, and, even more, the postmodern science and technology of the informational. It is a great pleasure to present the profile of Professor Tecuci to the readers of Informatica, together with his introductory article entitled "Multistrategy Approaches to Learning: Why and How" and a group of five articles edited by him. In this manner, the readers of Informatica will obtain the best insight into and comprehension of the field Professor Tecuci is leading.

In this issue of Informatica, for the first time, so-called thematic groups of articles are presented. In the next volume of Informatica, Volume 18 (1994), several other thematic groups will be published, concerning modern fields of logic, computation, artificial intelligence and, hopefully, also the critique (and crisis) of computer science and the informational treatment of knowledge. In this respect, I believe, the readers and contributors of Informatica will not be disappointed. Informatica will also proceed from quarterly to bimonthly publication. All this will require greater activity and help from our editors.

Gheorghe Tecuci

Gheorghe Tecuci, a full member of the Romanian Academy since September 1993, is Associate Professor of Computer Science at George Mason University in Fairfax, Virginia, U.S.A., and Director of the Romanian Academy Center for Machine Learning, Natural Language Processing and Conceptual Modeling. He received the M.S. degree in Computer Science from the Polytechnic Institute of Bucharest in 1979, graduating first among all the Computer Science students at the Polytechnic Universities of Romania. He received two Ph.D. degrees in Computer Science, one from the University of Paris-South in July 1988, and the other from the Polytechnic Institute of Bucharest in December 1988.

Between 1979 and 1990 Tecuci was a researcher and project leader at the Research Institute for Informatics in Bucharest, Romania, where he developed many of the skills that are essential for a successful researcher. During the summers of 1986-1990 he worked at LRI, University of Paris-South, as a research director of a joint research program of the Romanian Academy and the French National Research Center, and in Fall 1990 he joined the faculty of the Computer Science Department of George Mason University. In March 1991, Tecuci was elected corresponding member of the Romanian Academy, and he became a full member in September 1993.

Tecuci has published over 60 scientific papers. He is one of the founders of the field of multistrategy learning, which represents a new stage in the evolution of machine learning. Tecuci developed one of the first multistrategy learning systems, DISCIPLE, which synergistically integrates explanation-based learning, learning by analogy, empirical inductive learning, and learning by questioning the user. Many important concepts in machine learning and knowledge acquisition have originated from research on DISCIPLE [Ph.D. Thesis, University of Paris-South, 1988].
Besides multistrategy learning, one could also mention adaptive learning (illustrated by DISCIPLE's ability to adapt its behavior to the features of the learning task), the usefulness of plausible explanations and over-generalizations in learning and knowledge acquisition, and plausible version spaces as a symbolic representation of uncertainty. For his work on DISCIPLE, Tecuci received the "Traian Vuia" prize of the Romanian Academy in 1987. A paper on DISCIPLE [International Journal of Expert Systems, 1 (1): 39-66, (1987)] has also been included in "Readings in Machine Learning and Knowledge Acquisition", a book from the well-known Morgan Kaufmann series which collects the best papers from each field of artificial intelligence.

More recently, Tecuci has introduced the concept of a plausible justification tree as a generalization of the concept of explanation and as a framework for the integration of different types of inference. He has also introduced the concept of inference-based generalization, showing that there is a special type of generalization associated with each type of inference. Based on these new concepts, Tecuci has developed a general framework for multistrategy learning [Machine Learning, 11 (2-3): 129-153, (1993)] which dynamically integrates the elementary building blocks of the single-strategy learning methods (i.e., deduction, analogy, abduction, generalization, specialization, abstraction, concretion, etc.). The learning method MTL-JT (Multistrategy Task-adaptive Learning by Justification Trees), developed in this framework, deeply and dynamically integrates explanation-based learning, determination-based analogy, empirical induction, constructive induction, and abduction, depending on the features of the learning task. It also behaves as any of these single-strategy learning methods whenever the applicability conditions of such a method are satisfied.

Tecuci has also made important contributions to the integration of machine learning and knowledge acquisition. He has developed NeoDISCIPLE, which represents a general approach to the automation of knowledge acquisition. This approach is based on several powerful ideas such as understanding-based knowledge extension, knowledge acquisition through plausible inference and multistrategy learning, consistency-driven concept elicitation, and plausible version spaces [IEEE Transactions on Systems, Man and Cybernetics, 22 (6): 1444-1460, (1992); Knowledge Acquisition, 4 (4), (1993)]. Other research contributions of Tecuci are in the areas of expert systems, advanced robotics, compiler generation, and the application of graph theory to vision and to the analysis of electrical circuits.

Tecuci is the co-editor of "Machine Learning: A Multistrategy Approach", which is the first comprehensive book on this topic, as well as the fourth volume in the series of classical books "Machine Learning". Previous volumes were co-edited by R.S. Michalski, J.G. Carbonell, T.M. Mitchell and Y. Kodratoff. Tecuci was the program chairman and a co-organizer of several workshops on new research directions in artificial intelligence: the First International Workshop on Multistrategy Learning (MSL-91, Harpers Ferry, West Virginia), the Second International Workshop on Multistrategy Learning (MSL-93, Harpers Ferry, West Virginia), and the IJCAI-93 workshop Machine Learning and Knowledge Acquisition: Common Issues, Contrasting Methods and Integrated Approaches (Chambéry, France).
Tecuci was also a tutorial speaker on Multistrategy Learning at the two most important artificial intelligence conferences, the International Joint Conference on Artificial Intelligence (Chambéry, France, 1993) and the National Conference of the American Association for Artificial Intelligence (Washington D.C., 1993). He is the guest editor of this special issue of Informatica on multistrategy learning, and of a special issue of the Knowledge Acquisition Journal on the integration of machine learning and knowledge acquisition. He has served the scientific community as program chairman, program committee member, organizer, member of the editorial board, or reviewer for several conferences, journals, and publishers. He speaks four languages: Romanian, English, French, and Italian.

Edited by A.P. Železnikar

MULTISTRATEGY APPROACHES TO LEARNING: WHY AND HOW

Gheorghe Tecuci
Center for Artificial Intelligence, Department of Computer Science, George Mason University, Fairfax, VA 22030, U.S.A., and Center for Machine Learning, Natural Language Processing and Conceptual Modeling, Romanian Academy, Bucharest, Romania
tecuci@cs.gmu.edu

Keywords: case-based reasoning, machine learning, multistrategy approach, single-strategy learning

Machine Learning is concerned with developing systems that are able to improve themselves by learning from an external source of information (e.g., a teacher or a collection of data) or from their own experience. Research in this area can be traced back to the beginning of Artificial Intelligence. However, it is in the last several years that Machine Learning has greatly expanded and diversified. Most machine learning research has been concerned with single-strategy learning methods, which illustrate basic forms of learning. Examples of single-strategy learning are:

- empirical inductive learning from examples, which consists of learning the definition of a concept by comparing positive and negative examples of the concept in terms of their similarities and differences, and inductively creating a generalized description of the similarities of the positive examples;
- explanation-based learning, which consists of learning an operational definition of a concept by proving that an example is an instance of the concept and by deductively generalizing the proof;
- learning by analogy, which consists of learning new knowledge about an entity by transferring it from a similar entity;
- case-based reasoning and learning, which consists of using past experiences to deal with new situations and then extending and improving the descriptions of these experiences;
- abductive learning, which consists of hypothesizing causes based on observed effects;
- reinforcement learning, which consists of updating the knowledge, based on feedback from the environment;
- genetic algorithm-based learning, which consists of evolving a population of individuals over a sequence of generations, based on models of heredity and evolution;
- neural network learning, which consists of evolving a network of simple units to achieve an input-output behavior, based on a simplified model of the brain's dendrites and axons.

Single-strategy learning methods have been successfully used to build adaptive intelligent agents. However, each such method, taken separately, has limited applicability, because it requires a special type of input and background knowledge, and it only learns a specific type of knowledge.
For instance, an empirical inductive learning system needs many input examples in order to learn, but does not require much prior knowledge. On the other hand, an explanation-based learning system can learn from only one example, but it needs complete prior knowledge. Many real-world learning situations are such that the learner has more than one example, but not enough to successfully apply an empirical inductive learning strategy. However, the learner usually has more prior knowledge than needed by empirical induction, but not enough for applying an explanation-based learning strategy.

This comparison can easily be extended to the other single-strategy approaches to learning. For instance, a learning by analogy system could learn additional features of some input entity only if it has a similar entity in the knowledge base. Similarly, an abductive learning system could learn new knowledge about an input situation only if it has causal knowledge that could explain the situation. Moreover, each of these single-strategy learners produces a different (and often complementary) result.

The complementary nature of the requirements and results of the single-strategy learning methods naturally suggests that by properly integrating them, one could obtain a synergistic effect in which different strategies mutually support each other and compensate for each other's weaknesses. As a result, one may build a multistrategy learning system that is applicable to a wider spectrum of problems.

There are several general frameworks for the design of multistrategy learning systems. One such framework consists of a global control module and a toolbox of single-strategy learning modules, all using the same knowledge base. The control module analyzes the relationship between the input and the knowledge base and decides which learning module to activate. Another framework consists of a cascade of single-strategy learning modules, in which the output of one module is an input to the next module. Yet another framework consists of integrating the elementary inferences (like deduction, analogy, abduction, generalization, specialization, abstraction, concretion, etc.) that generate the individual learning strategies, thus achieving a deep integration of these strategies.

Early multistrategy learning research has often been limited to the integration of empirical inductive learning and explanation-based learning. Besides integrating a greater range of learning strategies, current research in multistrategy learning addresses the utilization of learning goals and the adaptation of the learner to the learning task.

This special issue of Informatica contains updated versions of papers that have been selected from among those presented at the Second International Workshop on Multistrategy Learning, organized by George Mason University in May 1993. The papers are representative of the current research of leading researchers in multistrategy learning.

Gordon and Subramanian present a multistrategy learning method for designing and refining knowledge for autonomous agents. The method combines two complementary strategies: deductive concretion (a form of explanation-based learning) and inductive refinement (by a genetic algorithm). First, high-level advice provided by a human is deductively operationalized in all possible ways, resulting in a set of operational rules for the autonomous agent. Then these rules are inductively refined by a genetic algorithm, producing a set of better rules for the agent.
The two main advantages of this multistrategy learning method are a high convergence rate (due to the operationalized advice, which biases the genetic algorithm into a favorable region of the search space) and robustness (due to the genetic algorithm-based learning).

Ram and Santamaria present a multistrategy learning system for robot navigation (called SINS), which combines case-based reasoning and reinforcement learning. The case-based reasoning component of SINS perceives the robot's environment and retrieves an appropriate case, which recommends parameter values for the control system of the robot. The reinforcement learning component of SINS updates the content of a case to reflect the current experience. Extensive experiments show that SINS is able to achieve the same level of performance as non-adaptive systems acting in environments for which they have been optimized. Moreover, SINS is able to achieve this level of performance autonomously, and can cope with sudden changes in the characteristics of the environment.

Widmer presents a multistrategy learning method for predicting numeric values. The method, which is implemented in the system IBL-SMART, combines multistrategy symbolic concept learning and numeric instance-based learning (which is a form of case-based reasoning and learning). The multistrategy symbolic concept learner uses deductive and inductive specialization operators to learn rules for partitioning a set of training examples into different instance spaces. The prediction of a numeric value associated with a new instance takes place in two stages. In the first stage, IBL-SMART determines the symbolic rule which covers the new instance, and hence the corresponding instance space. In the second stage, the instance-based learner predicts the numeric value of the new instance based on the value of the nearest neighbor in the instance space. The most important advantage of this multistrategy approach is that it provides a way of using qualitative domain knowledge for predicting precise numeric values.

Baffes and Mooney address the general problem of theory refinement, which consists of improving an incomplete and/or incorrect rule base of a classification system so that it correctly classifies a set of input training examples. The system EITHER, which combines deduction, abduction and empirical induction, showed that multistrategy learning from theory (the rule base) and data (the input examples) gives better results than single-strategy learning from data alone (i.e., learning the rule base only from examples). This paper presents NEITHER, which is a major revision of EITHER. NEITHER extends the representation of the rule base to include M-of-N rules, and also significantly improves the speed of learning.

Finally, Hieb and Michalski present a new knowledge representation formalism, dynamic interlaced hierarchies (DIH), that is specially developed to facilitate the performance of the basic inferential learning strategies: deduction, induction and analogy. The idea is to represent knowledge in the form of concept hierarchies and traces that link nodes of different hierarchies. Within this representation, the basic inferential operations such as generalization, abstraction and similization, as well as their opposites, specialization, concretion and dissimilation, can be performed by modifying the traces in different ways.
The goal of DIH design is to provide a foundation for developing multistrategy learning systems that could adapt their learning strategy to the learning task, characterized by a learner's input, knowledge and goal.

These five papers illustrate only a few representative approaches to multistrategy learning. Other approaches are presented in the papers from the following bibliography. Besides developing multistrategy learning methods and systems, research in multistrategy learning is also concerned with the analysis and comparison of different learning strategies (which identifies their capabilities and limits) and with human learning (which is intrinsically multistrategy). Through its research goals, multistrategy learning contributes to the unification of the entire field of machine learning, and is one of the field's most significant new developments.

References

[1] Buchanan B., and Wilkins D. (Eds.), Readings in Knowledge Acquisition and Learning: Automating the Construction and the Improvement of Programs. Morgan Kaufmann, San Mateo, CA, (1992).
[2] Kodratoff Y., and Michalski R.S. (Eds.), Machine Learning: An Artificial Intelligence Approach, III. Morgan Kaufmann, San Mateo, CA, (1990).
[3] Michalski R.S., and Tecuci G. (Eds.), Proceedings of the First International Workshop on Multistrategy Learning. Organized by George Mason University, Harpers Ferry, WV, November 7-9, (1991).
[4] Michalski R.S., and Tecuci G. (Eds.), Proceedings of the Second International Workshop on Multistrategy Learning. Organized by George Mason University, Harpers Ferry, WV, May 26-29, (1993).
[5] Michalski R.S., and Tecuci G. (Eds.), Machine Learning: A Multistrategy Approach, IV. Morgan Kaufmann, San Mateo, CA, (1993).
[6] Shavlik J., and Dietterich T. (Eds.), Readings in Machine Learning. Morgan Kaufmann, San Mateo, CA, (1990).
[7] Tecuci G., DISCIPLE: A Theory, Methodology and System for Learning Expert Knowledge. Thèse de Docteur en Science, Université de Paris-Sud, France, (1988).
[8] Tecuci G., Plausible Justification Trees: a Framework for the Deep and Dynamic Integration of Learning Strategies. Machine Learning Journal, vol. 11, pp. 237-261, (1993).

A MULTISTRATEGY LEARNING SCHEME FOR AGENT KNOWLEDGE ACQUISITION

Diana Gordon
Naval Research Laboratory, Code 5514, Washington D.C. 20375
gordon@aic.nrl.navy.mil
and
Devika Subramanian
Department of Computer Science, Cornell University, Ithaca, NY 14853
devika@cs.cornell.edu

Keywords: multistrategy learning, advice taking, compilation, operationalization, genetic algorithms

Edited by: Gheorghe Tecuci
Received: May 26, 1993    Revised: October 8, 1993    Accepted: October 15, 1993

The problem of designing and refining task-level strategies in an embedded multiagent setting is an important unsolved question. To address this problem, we have developed a multistrategy system that combines two learning methods: operationalization of high-level advice provided by a human, and incremental refinement by a genetic algorithm. The first method generates seed rules for finer-grained refinements by the genetic algorithm. Our multistrategy learning system is evaluated on two complex simulated domains as well as with a Nomad 200 robot.

1 Introduction
The problem of designing and refining task-level strategies in an embedded multi-agent setting is an important unsolved question. To address this problem, we have developed a multistrategy learning system that combines two learning methods: operationalization of high-level advice provided by a human, and incremental refinement by a genetic algorithm (GA). We define advice as a recommendation to achieve a goal under certain conditions. Advice is considered to be operationalized when it is translated into stimulus-response rules directly usable by the agent. Operationalization generates seed rules for finer-grained refinements by a GA.

The long term goal of the work proposed here is to develop task-directed agents capable of acting, planning, and learning in worlds about which they do not possess complete information. These agents refine factual knowledge of the world they inhabit, as well as strategic knowledge for achieving their tasks, by interacting with the environment.

Agent knowledge acquisition is desirable for the same reasons that knowledge acquisition for expert systems is: it is easier for a user to provide high-level knowledge about the world and the task than to provide knowledge at a lower level of detail. The latter is well known to be a costly, tedious, and error-prone process. Although agent knowledge acquisition is desirable, it is very difficult (for the agent), as is knowledge acquisition for expert systems. The additional challenge for agent knowledge acquisition comes from the fact that the knowledge must be dynamically updated by the agent through its interactions with the environment.

There are two basic approaches to constructing agents for dynamic environments. The first decomposes the design into stages: a parametric design followed by refinement of the parameter values using feedback from the world in the context of the task. Several refinement strategies have been studied in the literature: GAs [21], neural-net learning [3], statistical learning [13], and reinforcement learning [14]. The second, more ambitious, approach [7, 30] is to acquire the agent knowledge directly from example interactions with the environment. The success of this approach is tied to the efficacy of the credit assignment procedures, and to whether or not it is possible to obtain good training runs with a knowledge-impoverished agent.

We have adopted the first approach. The direction we pursue is to compile an initial parametric agent using high-level strategic knowledge (e.g., advice) input by the user, as well as a body of general (not domain-specific) spatial knowledge in the form of a Spatial Knowledge Base (SKB). The SKB contains qualitative rules about movement in space. Example rules in our SKB are "If something is on my side, and I turn to the other side, I will not be facing it" and "If I move toward something it will get closer". This SKB is portable because it is applicable to a variety of domains where qualitative spatial knowledge is important. A similar qualitative knowledge base was constructed by [18] for the task of pushing objects in a plane. Since the knowledge provided to our agent is imperfect (incomplete and incorrect), our agent refines the knowledge using a GA by directly interacting with the world.
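The paper states the SKB rules only in English. As a purely illustrative sketch (the predicate vocabulary and the rule structure below are our assumptions, not the authors' syntax), such qualitative movement rules might be encoded as condition/action/goal triples:

```python
# Hypothetical encoding of two qualitative SKB movement rules.
# Predicate names and the SKBRule structure are illustrative assumptions;
# the paper does not commit to a concrete syntax at this point.
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class SKBRule:
    conds: List[Tuple]        # qualitative conditions on the state
    action: Optional[Tuple]   # movement action, if any
    goal: Tuple               # qualitative effect that will then hold

# "If something is on my side, and I turn to the other side,
#  I will not be facing it."
rule_turn_away = SKBRule(
    conds=[("bearing", "agent", "X", "right")],
    action=("turn", "agent", "left"),
    goal=("heading", "X", "agent", "not-headon"),
)

# "If I move toward something it will get closer."
rule_approach = SKBRule(
    conds=[("bearing", "agent", "X", "headon"), ("moving", "agent")],
    action=("move", "agent", "forward"),
    goal=("range", "agent", "X", "closer"),
)
```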
First we describe our deductive advice operationalization process and the nature of the parameterization adopted for our agent. Then we describe the inductive (GA) refinement stage and compare our multistrategy approach with one that is purely inductive.

Before we present the details of the method, we characterize the class of environments and tasks for which we have found this decomposition of an agent design into an initial parametric stage and a subsequent refinement stage to be effective.

- Environment characteristics: Complete models of the dynamics of the environment, in the form of differential equations or difference equations, or discrete models like STRIPS operators, are unavailable. An analytical design that maps the percepts of an agent to its actions (e.g., using differential game theory or control theory) in these domains is thus not possible. Even if a model were available, standard analytical methods for deriving agents are extensional and involve exploration of the entire state space. They are inapplicable here because the domains we consider have on the order of a hundred million states.

- Task characteristics: Tasks are sequential decision problems: payoff is obtained at the end of a sequence of actions and not after individual actions. Examples are pursuit-evasion in a single or multi-pursuer setting and navigating in a world with moving obstacles. The tasks are typically multi-objective in nature: for instance, for pursuit-evasion, the agent needs to minimize energy consumption while maximizing the time till capture by the pursuers.

- Agent characteristics: The agent has imperfect sensors. Imperfections occur in the form of noise, as well as incompleteness (all aspects of the state of the world cannot be sensed by our agent, a problem called perceptual aliasing in [32]). Stochastic differential game theory has methods for deriving agents with noisy sensors, but it requires detailed models of the noise as well as a detailed model of the environment and agent dynamics. The action set of the agents and the values taken on by sensors are discrete and can be grouped into equivalence classes. This is the basis for the design of the parametric agent. A similar intuition underlies the design of fuzzy controllers, which divide the space of sensor values into a small set of classes described by linguistic variables.

In domains with characteristics such as those just described, human designers typically derive an initial solution by hand and use numerical methods (usually very dependent on the initial solution) to refine their solution. Our ultimate objective is to automate the derivation of good initial solutions by using general knowledge about the environment, task, and agent characteristics, and thus provide a better starting point for the refinement process. We begin with the SKB and advice.

2 Compiling Advice

Our operationalization method compiles high-level domain-specific knowledge (e.g., advice) and spatial knowledge (SKB) into low-level reactive rules directly usable by the agent. The compilation performs deductive concretion [16] because it deductively converts abstract goals and other knowledge into concrete actions. An important question is why we adopt a deductive procedure for the operationalization of advice. At this time, we are able to generate good parametric designs with deductive inference alone. We expect that as we expand our experimental studies to cover more domains, the incompleteness of the SKB will force us to adopt more powerful operationalization methods.
The advice compilation problem can be stated as follows:

Given:

- Strategic advice of the form Set of states → achieve(Goal), which recommends that Goal be achieved (be made to hold in the world) whenever the agent finds itself in a state that matches the antecedent of the advice. Note that there is no requirement that the set of states in the antecedent be directly perceivable. The agent may have to plan to acquire the information needed to determine whether or not it is in the specified set of states.

- Qualitative factual knowledge about the domain in the form of a description of a set of states.

- A qualitative theory with information of the following types. (1) Domain-specific rules: (a) a set of domain-specific terminological mappings that define terms needed for interpreting the advice; these mappings are of the form Set of states → Set of states and are vocabulary conversion rules; (b) a set of state transition rules that represent the dynamics of the world; these mappings are of the form Set of states × A → Set of states, where A is the set of actions; (c) a set of rules for predicting other agents' actions; these mappings are of the form Set of states × A → A. These rules are special to multi-agent domains. (2) A general qualitative theory of movement (SKB). This is a common subpart of the domains that we consider here; therefore it need not be reintroduced by the user for each new domain. The SKB is of the form Set of states × A_M → Set of states, where A_M is the subset of actions A that involve movement.

- The operational sensors P and actions A of the agent. A detailed agent design involves finding a mapping from the sense history P* of the agent to A. This is usually simplified to a map of the form P → A, so that only the current percepts determine the next action. The map need not be deterministic. Indeed, in the GA architecture, which is our target architecture, the agent map is non-deterministic.

Find: An operational mapping P → A from the sensors to the effectors of the agent such that the map implements the provided strategic advice. By this we mean that the map tests the conditions under which the recommended goal is to be achieved, and the action sequence it generates in the world places the agent in a state in which the goal is achieved.

In summary, advice tells the agent to achieve a goal under specified conditions. Facts, domain-specific rules, a general theory of movement, and a list of what is operational are all required for interpreting this advice and compiling it into an operational form.
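To fix ideas, the forms of knowledge above can be given illustrative type signatures; all names below are ours, not the paper's, and states, goals, and actions are treated as abstract tokens:

```python
# Illustrative signatures for the knowledge forms in the problem statement.
from typing import Callable, FrozenSet, Tuple

State = Tuple                         # a qualitative state description
Goal = Tuple                          # a goal to be achieved
Action = str
StateSet = FrozenSet[State]

Advice = Tuple[StateSet, Goal]        # Set of states -> achieve(Goal)
VocabRule = Callable[[StateSet], StateSet]                # terminological mapping
TransitionRule = Callable[[StateSet, Action], StateSet]   # world dynamics
PredictionRule = Callable[[StateSet, Action], Action]     # other agents' actions
AgentMap = Callable[[State], Action]  # the operational P -> A map to be found
```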
As an example, consider the problem of an agent that tries to evade a pursuing adversary. (For a more detailed description of this problem, see Section 4.3.) We assume the adversary is initially faster than the agent, though it is less maneuverable. Both the adversary and the agent lose speed as they turn, and the adversary's speed loss is permanent. The agent and the adversary can sense each other's instantaneous position and velocity to within some specified accuracy. Although these are facts of the problem, they are initially unknown to the agent. The strategic advice offered by a human expert for this problem is the following: if the adversary's speed is high, then try to slow down the adversary; if the adversary's speed is low, then avoid the adversary. In this example the preconditions of the advice involve predicates that the agent can sense; however, it needs to pin down the semantics of high and low.

The goals to be achieved are: slow down the adversary in the one case, and avoid the adversary in the other. These are not directly implementable using a single action from A. We need to devise a plan to realize these goals using knowledge about the domain, including knowledge about movements. These plans form the initial parametric design, with high and low being two of the parameters whose ranges need to be learned by interaction with the environment. An example SKB rule that is used to compile our advice is:

IF bearing(agent,adversary) = right AND turn(agent) = left
THEN heading(adversary,agent) ≠ headon.

Heading(X,Y) refers to the direction of motion of Y relative to X, and bearing(X,Y) refers to the direction of Y relative to X. This SKB rule states that an agent no longer faces an adversary when it turns away from it. Turn is an operational action. To compile the advice, we also need terminological mappings that define terms such as avoid. An example definition needed to compile our advice is:

IF range(X,Y) ≠ close AND heading(Y,X) ≠ headon
THEN avoids(X,Y).

Avoids(X,Y) means X avoids Y. Factual knowledge is also required for compilation, e.g., Moving(agent), Moving(adversary).

From our advice, the set of operational sensors and effectors, and our qualitative theory, compilation results in a set of operational sensor-effector mappings. The set of all these operational sensor-effector mappings is what we consider to be a reactive plan (agent map). A portion of the reactive plan that implements our advice, and is the product of our compilation procedure, is shown below.

IF speed(adversary) = high AND range(agent,adversary) = close
THEN turn(agent) = hard

IF speed(adversary) = high AND range(agent,adversary) ≠ close AND bearing(agent,adversary) = left
THEN turn(agent) = left

IF speed(adversary) = high AND range(agent,adversary) ≠ close AND bearing(agent,adversary) = right
THEN turn(agent) = right

IF speed(adversary) = low AND bearing(agent,adversary) = right
THEN turn(agent) = left

IF speed(adversary) = low AND bearing(agent,adversary) = left
THEN turn(agent) = right

This is a fairly complex plan. It implements the strategic advice in the following manner: when the adversary's speed is high, the agent moves toward the adversary and makes a hard turn when it is close enough. Since the adversary is chasing the agent, this plan will cause the adversary to turn hard, which then causes the adversary to permanently lose speed. If the adversary is moving slowly, the agent simply stays out of its way.

In this section we present an algorithm that generates plans of this form starting from high-level advice and a qualitative domain theory. Our compilation algorithm is tailored for situations in which the execution of plans, but not the learning of plans, is highly time-critical. We assume the advice provided is operationalized immediately, though it need not be applied immediately. We precompile all high-level knowledge into an operational form because the agent will apply it in a time-critical situation. In our approach, advice and SKB rules have nonoperational elements, and the compilation process results in rules that are fully operational.

The compilation algorithm, shown in Figure 1, uses two stacks: a GoalStack and an (operational condition) OpCondStack. Four types of knowledge are initially given to the compiler: advice, facts, nonoperational rules (abbreviated nonop rules), which include domain-specific rules and the SKB, and the set of operational sensors and actions.
The output from compilation is a set of op rules directly usable by the agent, i.e., operational.

Advice has the form: <IF cond AND ... AND cond THEN> ACHIEVE goal.
Facts have the form: Predicate(X1, ..., Xn).
Nonop rules have the form: IF cond <AND ... AND cond> <AND action> THEN goal.

The set of operational sensors and actions is presented to the agent as a list. Anything in angle brackets is optional. The portion preceding the THEN is the rule antecedent, and the portion following the THEN is the rule consequent. A nonop rule consequent is a single goal. The syntax for a goal is function(X1,...,Xn) = value (or ≠ value) or predicate(X1,...,Xn). Each Xi is an object (e.g., an agent). The syntax for a cond (condition) or an action in the rule antecedent is the same as for goals. Although advice has a similar syntax to nonop rules, its interpretation differs. Advice recommends achieving the given goal under the given conds. A nonop rule, which is needed to compile this advice, states that the given goal will be achieved if the given conds (and action) occur. Compilation results in stimulus-response op rules of the form: IF cond AND ... AND cond THEN action. The conditions of an op rule are sensor values detectable by the agent. The action can be performed by the agent.

Our compilation algorithm, shown in Figure 1, takes advice and backchains through the SKB and user-provided nonop rules until an operational action is found. Once an operational action is found, it pops back up the levels of recursion, attaching operational conditions along the way, to form a new reactive agent op rule. To prevent cycles, the last nonop rule used in step 3 is marked as "used" so that it will not be used again.

Push advice on GoalStack: goal followed by conditions.
Initialize OpCondStack to be empty and invoke Compile(GoalStack, OpCondStack).

Procedure Compile(GoalStack, OpCondStack)
  if GoalStack is not empty
  then g <- pop(GoalStack);
    case g:
      1. g is an operational condition:
           Push(g, OpCondStack);
           Compile(GoalStack, OpCondStack);
      2. g matches a fact:
           Compile(GoalStack, OpCondStack);
      3. g is nonoperational:
           foreach nonop rule Ri whose consequent matches g do
             Push(antecedent(Ri), GoalStack);
             Compile(GoalStack, OpCondStack);
      4. g is an operational action:
           Form a new op rule from the contents of OpCondStack and g;
           Clear OpCondStack;
  else Clear OpCondStack;

Figure 1: Algorithm for operationalizing advice
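For readers who prefer running code, here is a toy Python reconstruction of the Figure 1 procedure. Tuples stand in for goals and exact equality stands in for unification; the published pseudocode also leaves stack restoration on backtracking implicit, so this sketch copies both stacks at each choice point. It illustrates the control flow and is not the authors' implementation.

```python
# Toy reconstruction of the Figure 1 compilation procedure.
def compile_advice(advice_goal, advice_conds, nonop_rules, facts,
                   op_conds, op_actions):
    """Return the op rules compiled from one piece of advice."""
    op_rules = []

    def compile_(goal_stack, opcond_stack, used):
        if not goal_stack:
            return
        g, rest = goal_stack[-1], goal_stack[:-1]      # pop the top goal
        if g in op_conds:                 # case 1: operational condition
            compile_(rest, opcond_stack + [g], used)
        elif g in facts:                  # case 2: g matches a fact
            compile_(rest, opcond_stack, used)
        elif g in op_actions:             # case 4: operational action found
            op_rules.append((tuple(opcond_stack), g))  # emit a new op rule
        else:                             # case 3: backchain on nonop rules
            for i, rule in enumerate(nonop_rules):
                if i not in used and rule["goal"] == g:
                    ante = rule["conds"] + rule.get("actions", [])
                    # push the antecedent so its conditions pop first
                    compile_(rest + list(reversed(ante)),
                             list(opcond_stack),
                             used | {i})  # mark the rule used: no cycles

    # push the goal first, then the conditions, so conditions pop first
    compile_([advice_goal] + list(advice_conds), [], frozenset())
    return op_rules
```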
Let us examine a simple example of how this algorithm operates, as shown in Figures 2 and 3. Consider the advice that advocates avoiding the adversary when the adversary's speed is low:

IF speed(adversary) = low
THEN ACHIEVE heading(adversary,agent) ≠ headon

Figure 2 shows how SKB nonop rules match this advice for backchaining, thereby creating an "and" tree. Anything preceded by a "*" is operational.

(Figure 2: Graph of the example "and" tree; A is short for agent, V for adversary, br for bearing, and hd for heading.)

Figure 3 shows this algorithm in operation. Note that stacks grow downward. The algorithm begins by pushing the advice goal, followed by the advice condition, on the GoalStack. It then calls procedure Compile, which moves the advice condition to the OpCondStack because it is operational. The advice goal is not operational. In our example, the advice goal can be unified with the goal of SKB RULE1, which states

IF bearing(agent,adversary) = right AND turn(agent) = left
THEN heading(adversary,agent) ≠ headon.

The condition and action of RULE1 are pushed on the GoalStack. Because the condition of RULE1 is operational, it is moved to the OpCondStack. Note that if this condition had not been operational, our compilation algorithm would have searched for all nonop rules whose consequents unify with this nonoperational condition. Further backchaining would have continued until operational conditions were found. At this point, the action of RULE1 is at the top of the GoalStack, and it is operational, so we can create an op rule. The conditions from the OpCondStack are added to the action. This creates an op rule that states

IF speed(adversary) = low AND bearing(agent,adversary) = right
THEN turn(agent) = left

Both stacks are cleared. The algorithm (see Figure 2) continues similarly to generate a second op rule from SKB RULE2 that states

IF speed(adversary) = low AND bearing(agent,adversary) = left
THEN turn(agent) = right

(Figure 3: Example of the compilation algorithm, showing the contents of the GoalStack and OpCondStack at each of the five steps.)

Next, we apply a conversion from qualitative to quantitative op rules. The rules are given default quantitative ranges. For example, if speed has two values, slow and fast, we bisect the range of all possible values into two subranges. Then we allow the system to improve this initial choice of quantitative ranges by using a GA to refine the initial ranges while interacting with the environment.

3 Executing and Refining Advice

The system we use to refine and apply the op rules derived from our compiled advice is the SAMUEL reactive planner [7]. We have chosen SAMUEL because this system has already proven to be highly effective for refining rules on complex domains [7, 25]. SAMUEL adopts the role of an agent in a multiagent environment in which it senses and acts. This system has two major components: a performance module and a learning module. Section 4.2 explains how performance interleaves with learning in our experiments.

The performance module, called the Competitive Production System (CPS), interacts with a simulated or real world by reading sensors, setting effector values, and receiving payoff from a critic. CPS performs matching and conflict resolution on the set of op rules. This performance module follows the match/conflict-resolution/act cycle of traditional production systems. Time is divided into episodes; the choice of what constitutes an episode is domain-specific. Episodes begin with random initialization and end when a critic provides payoff. At each time step within an episode, CPS selects an action using a probabilistic voting scheme based on rule strengths. All rules that match (or partially match; see [7]) the current state bid to have their actions fire. The actions of rules with higher strengths are more likely to fire. If the world is being simulated, then after an action fires, the world model is advanced one simulation step and sensor readings are updated. CPS assigns credit to individual rules based on feedback from the critic. At the end of each episode, all rules that suggested actions taken during this episode have their strengths incrementally adjusted to reflect the current payoff. Over time, rule strengths reflect the degree of usefulness of the rules.
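In sketch form, the decision cycle might look as follows. The OpRule structure, the all-or-nothing matching, and the move-toward-payoff strength update are our simplifying assumptions; SAMUEL's actual partial matching and credit-assignment machinery (see [7]) are more elaborate.

```python
# Strength-weighted voting among matching rules, plus a simple
# end-of-episode credit step. Both are simplified assumptions.
import random
from dataclasses import dataclass

@dataclass
class OpRule:
    conds: dict          # sensor name -> (lo, hi) quantitative range
    action: str
    strength: float = 1.0   # strengths are assumed positive

    def matches(self, state):
        return all(lo <= state[s] <= hi for s, (lo, hi) in self.conds.items())

def select_action(rules, state):
    """Probabilistic vote: stronger matching rules are more likely to fire."""
    matching = [r for r in rules if r.matches(state)]
    if not matching:
        return None
    return random.choices(matching, weights=[r.strength for r in matching])[0]

def assign_credit(fired_rules, payoff, rate=0.1):
    """Nudge the strength of every rule that fired toward the episode payoff."""
    for r in fired_rules:
        r.strength += rate * (payoff - r.strength)
```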
SAMUEL's learning module is a genetic algorithm. GAs are motivated by simplified models of heredity and evolution in the field of population genetics [8]. GAs evolve a population of individuals over a sequence of generations. Each individual acts as an alternative solution to the problem at hand, and its fitness (i.e., its potential worth as a solution) is regularly evaluated. During a generation, individuals create offspring (new individuals). The fitness of an individual probabilistically determines how many offspring it can have. Genetic operators, such as crossover and mutation, are applied to produce offspring. Crossover combines elements of two individuals to form two new individuals; mutation randomly alters elements of a single individual. In SAMUEL, an individual is a set of op rules, i.e., a reactive plan. In addition to genetic operators, this system also applies non-genetic knowledge refinement operators, such as generalize and specialize, to op rules within a rule set.

The interface between our compilation algorithm and the SAMUEL system is straightforward. The output of our compilation algorithm is a set of op rules for the SAMUEL agent. Because the op rules may be incomplete, a random rule is added to this rule set. The random rule recommends performing a random action under any conditions. The final rule set, along with CPS and the GA learning component for improving the rules, is our initial agent.
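Schematically, one generation of such a GA over rule sets might look like this. The fitness-proportionate selection, the one-cut crossover over rule lists, and the mutate_rule hook are our simplifications; SAMUEL's operators, including the non-genetic generalize/specialize operators just mentioned, are richer.

```python
# One generation of a GA whose individuals are whole rule sets (plans).
# The caller supplies fitness(plan) -> float (assumed non-negative) and
# mutate_rule(rule) -> rule; both are hypothetical stand-ins.
import random

def next_generation(population, fitness, mutate_rule,
                    cx_rate=0.8, mut_rate=0.1):
    scores = [fitness(plan) + 1e-9 for plan in population]  # avoid all-zero

    def pick():                        # fitness-proportionate selection
        return random.choices(population, weights=scores)[0]

    offspring = []
    while len(offspring) < len(population):
        a, b = list(pick()), list(pick())
        if random.random() < cx_rate and min(len(a), len(b)) > 1:
            cut = random.randrange(1, min(len(a), len(b)))
            a[:cut], b[:cut] = b[:cut], a[:cut]   # crossover: swap rule blocks
        for plan in (a, b):
            if plan and random.random() < mut_rate:
                i = random.randrange(len(plan))
                plan[i] = mutate_rule(plan[i])    # mutation: perturb one rule
        offspring.extend([a, b])
    return offspring[:len(population)]
```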
4 Evaluation

We have not yet analyzed the cost of our compilation algorithm. The worst-case cost appears to be exponential in the sizes of the inputs to the algorithm, because the STRIPS planning problem can be reduced to it. In the future, we plan to investigate methods to reduce this cost for complex realistic problems. Potential methods include: (1) attaching a likelihood of occurrence to advice, which enables the agent to prioritize which advice to compile first if time is limited; (2) tailoring the levels of generality and abstraction of the advice to suit the time available for compilation (e.g., less abstract advice is closer to being operational and therefore requires less compilation time); and (3) generating a parallel version of the algorithm.

We have evaluated our multistrategy approach empirically. We focus on answering the following questions:

1. Will our advice compilation method be effective for a reactive agent on complex domains?

2. Will the coordination of multiple learning techniques lead to improved performance over using any one learning method? In particular, we want the GA to increase the success rate of the compiled advice, and the advice to reduce the convergence rate of the GA. Success and failure have domain-specific definitions (see Sections 4.3 and 4.4). The convergence rate is defined as the time taken to ascend to a given success rate. A reduced convergence rate is useful when learning time is limited.

3. Can we construct a portable SKB?

4.1 Domain characteristics

To address our questions, we have run experiments on two complex problems: Evasion and Navigation. Our choice of domains is motivated by the results of Schultz and Grefenstette, who have obtained large performance improvements by initializing the GA component of SAMUEL with hand-coded op rules in these domains [25]. Their success has inspired this work: our objective is to automate their tedious manual task, and the work described here is one step toward that goal. Both problems are two-dimensional simulations of realistic tactical problems. However, our simulations include several features that make these two problems sufficiently complex to cause difficulties for more traditional control-theoretic or game-theoretic approaches [7]:

- A weak domain model: The learner has no initial model of other agents or objects in the domain. Most control-theoretic and game-theoretic models make worst-case assumptions about adversaries. This yields poor designs in the worlds we consider, because we have statistical rather than worst-case adversaries.

- Incomplete state information: The sensors are discrete, which causes perceptual aliasing: a many-to-one mapping from actual states to perceived states.

- A large state space: The discretization of the state space makes the learning problem combinatorial. In the Evasion domain, for instance, over 25 million distinct feature vectors are observed.

- Delayed payoff: The critic only provides payoff at the end of an episode. Therefore a credit assignment scheme is required.

- Noisy sensors: Gaussian noise is added to all sensor readings. Noise consists of a random draw from a normal distribution with mean 0.0 and standard deviation equal to 5% of the legal range for the corresponding sensor. The value that results is discretized according to the defined granularity of the sensor. A 5% noise level is sufficient to slightly degrade SAMUEL's performance. (A sketch of this noise model follows the list.)
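The noise model is simple enough to state in a few lines; clamping the noisy value to the legal range is our own assumption, the rest follows the description above.

```python
# Sensor noise as described in Section 4.1: zero-mean Gaussian noise with
# sigma = 5% of the sensor's legal range, then discretization to the
# sensor's granularity. Clamping to [lo, hi] is our assumption.
import random

def noisy_reading(true_value, lo, hi, granularity, noise_frac=0.05):
    sigma = noise_frac * (hi - lo)
    v = true_value + random.gauss(0.0, sigma)
    v = max(lo, min(hi, v))
    return lo + round((v - lo) / granularity) * granularity
```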
4.2 Experimental design

Two sets of experiments are performed on each of the two domains. Perception is noise-free for the first set, but noisy for the second. The primary purpose of the first set is to address our question about the effectiveness of our advice compilation method alone, without GA refinement. Facts, nonop rules, advice, and the set of operational sensors and actions are given to the compiler, and the output is a set of op rules. The random rule is added to the op rules, and this rule set is given to SAMUEL's CPS module to be applied within the simulated world model. The baseline performance with which these rules are compared is the random rule alone. These experiments measure how the success rate of the compiled rules compares with that of the baseline as problem complexity increases, where the success rate is an average over 1000 episodes. Statistical significance of the differences between the curves with and without advice is presented. Significance is measured using the large-sample test for the difference between two means.

The primary purpose of the second set of experiments is to address our question about the effectiveness of the multistrategy approach (compilation followed by GA refinement). We used the same set of op rules (i.e., the output of the compiler) for this second set of experiments as was used for the first set. This rule set, plus the random rule, becomes every individual in SAMUEL's initial GA population, i.e., it seeds the GA with initial knowledge. The baseline performance with which these rules are compared is SAMUEL initialized with every individual equal to just the random rule. In either case, GA learning evolves this initial population. In other words, we compare the performance of advice seeding the GA with GA learning alone (i.e., random seeding). Random seeding produces an initially unbiased GA search; advice initially biases the GA search, hopefully into favorable regions of the search space.

In this second set of experiments, performance of the agent in the environment interleaves with GA refinement. SAMUEL runs for 100 generations using a population size of 100 rule sets. Every 5 generations, the best (in terms of success rate) 10% of the current population are evaluated over 100 episodes to choose a single plan to represent the population. This plan is evaluated on 1000 randomly chosen episodes and the success rate is calculated. This entire process is repeated 10 times, and the final success rate, averaged over all 10 trials, is found. The curves in our graphs plot these averages. For this set of experiments, statistical significance is measured using the two-sample t-test, with adjustments as required whenever the F statistic indicates unequal variances. We add sensor noise, as defined in Section 4.1, for this second set of experiments because GAs can learn robustly in the presence of noise [7]. Two performance measures are used: the success rate and the convergence rate. The convergence rate is defined as the number of GA generations required to achieve and maintain an n% success rate, where n is different for each of the two domains. The value of n is set empirically.

4.3 Evaluation on the Evasion problem

Our simulation of the Evasion problem is partially inspired by [4]. This problem was introduced in Section 2. The problem consists of an agent that moves in a two-dimensional world with a single adversary pursuing the agent. The agent is controlled by SAMUEL, and the adversary is controlled by a simple set of rules. The agent's objective is to avoid contact with the adversary for a bounded length of time. Contact implies the agent is captured by the adversary. The problem is divided into episodes that begin with the adversary approaching the agent from a random direction. The adversary initially travels faster than the agent, but is less maneuverable (i.e., it has a greater turning radius). Both the agent and the adversary gradually lose speed when maneuvering, but only the adversary's loss is permanent. An episode ends when either the adversary captures the agent (failure) or the agent evades the adversary (success). At the end of each episode, a critic provides full payoff for successful evasion and partial payoff otherwise, proportional to the amount of time before the agent is captured. The strengths of op rules that fired are updated in proportion to the payoff.

The agent has the following operational sensors: time, last agent turning rate, adversary speed, adversary range, adversary bearing, and adversary heading. The agent has one operational action: it can control its own turning rate. For further detail, see [7]. In our experiments, we provide the following domain-specific knowledge to the agent:

FACTS:
Chased-by(agent,adversary). Moving(agent). Moving(adversary).

DOMAIN-SPECIFIC NONOP RULES:
IF chased-by(X,Y) AND range(X,Y) = close AND turn(X) = Z THEN turn(Y) = Z.
IF range(X,Y) ≠ close AND heading(Y,X) ≠ headon THEN avoids(X,Y).
IF turn(adversary) = hard THEN decelerates(adversary).

ADVICE:
IF speed(adversary) = high THEN ACHIEVE decelerates(adversary).
IF speed(adversary) = low THEN ACHIEVE avoids(agent,adversary).

We also include knowledge of the agent's operational sensors and actions as facts. Ten SKB nonop rules are used (they are instantiations of the two rules described in English in the introduction of this paper). Although room does not permit listing them all, some examples are:

IF bearing(X,Y) = right AND turn(X) = left THEN heading(Y,X) ≠ headon.
IF bearing(X,Y) = left AND moving(X) AND turn(X) = left THEN range(X,Y) = close.
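For concreteness, the low-speed advice and the two SKB rules from the Section 2 example could be fed to the toy compile_advice sketch given earlier roughly as follows. The tuple encoding is our illustrative assumption, and since the sketch has no unification, the variables X and Y are instantiated by hand.

```python
# Ground (hand-instantiated) encoding of the low-speed Evasion advice and
# two SKB rules, in the format assumed by the compile_advice sketch above.
facts = {("chased-by", "agent", "adversary"),
         ("moving", "agent"), ("moving", "adversary")}

op_conds = {("speed", "adversary", "low"),
            ("bearing", "agent", "adversary", "left"),
            ("bearing", "agent", "adversary", "right")}
op_actions = {("turn", "agent", "left"), ("turn", "agent", "right")}

nonop_rules = [
    {"conds": [("bearing", "agent", "adversary", "right")],
     "actions": [("turn", "agent", "left")],
     "goal": ("heading", "adversary", "agent", "not-headon")},   # SKB RULE1
    {"conds": [("bearing", "agent", "adversary", "left")],
     "actions": [("turn", "agent", "right")],
     "goal": ("heading", "adversary", "agent", "not-headon")},   # SKB RULE2
]

rules = compile_advice(
    advice_goal=("heading", "adversary", "agent", "not-headon"),
    advice_conds=[("speed", "adversary", "low")],
    nonop_rules=nonop_rules, facts=facts,
    op_conds=op_conds, op_actions=op_actions)
# Yields the two op rules derived in the text:
#   (speed=low, bearing=right) => turn(agent)=left
#   (speed=low, bearing=left)  => turn(agent)=right
```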
From our input and our SKB rules, the compilation method of Section 2 generates op rules. The sensor values of these rules are translated from qualitative values to default quantitative ranges. For example, bearing = left is translated into bearing = [6..12], where the numbers correspond to a clock face, e.g., 6 means 6 o'clock. Every new rule is given an initial strength of 1.0 (the maximum). The final op rule set includes rules such as:

IF speed(adversary) = [700..1000] AND range(agent,adversary) = [0..750]
THEN turn(agent) = hard-left.

(To generate a few of these rules, we used a variant of our compilation algorithm; we omit a description of this variation for the sake of clarity. See [6] for details.) The total number of op rules generated from our advice is 16.

We begin our experiments by addressing the first question, which concerns the effectiveness of our advice-taking method. We do not use the GA. Problem difficulty is varied by adjusting a safety envelope around the agent. The safety envelope is the distance at which the adversary can be from the agent before the agent is considered captured by the adversary. Figure 4 shows how the performance (averaged over 1000 episodes) of these op rules compares with that of just the random rule.

(Figure 4: Results on the Evasion domain without GA refinement. The y axis denotes the success rate; the x axis denotes increasing problem difficulty.)

All of the differences between the means are statistically significant (using significance level α = 0.05). From Figure 4 we see that from difficulty levels 80 to 120, the agent is approximately twice as successful with advice as without it. This is a 100% performance advantage. Furthermore, for levels 130 to 160, the agent is about four times more effective with advice. For levels 160 to 200, the agent is an order of magnitude more effective with advice. We conclude that as the difficulty of this problem increases, the advice becomes more helpful relative to the baseline. These results answer our first question: our advice compilation method is effective on this domain.

We address the second question, about multistrategy effectiveness, by combining the compiled advice with GA refinement. Figure 5 shows the results of comparing the performance of the GA with and without advice. The safety envelope is fixed at 100 (chosen arbitrarily) and noise is added to the sensors. For this domain, the convergence rate is the number of GA generations required to maintain a 60% success rate.

(Figure 5: Results on the Evasion domain with the GA. The y axis denotes the success rate; the x axis denotes GA generations.)

Figure 5 shows that in a small amount of time (approximately 10 generations), the GA doubles the success rate of the advice rules. Furthermore, the addition of advice produces a 3.5-fold improvement in the convergence rate over using a GA with random initialization. The differences between the means are statistically significant for the first 65 generations, but not afterwards (α = 0.05). (These results differ from the results presented in [6]; the discrepancy results from an improvement in the initial rule strengths. Future work will focus on studies analyzing system sensitivity to parameter variations. Our initial studies indicate that performance is more robust with respect to changes in some parameters, e.g., variations in the qualitative-to-quantitative mappings, than to changes in others, e.g., the initial rule strengths.) Apparently, the advice initially biases the GA into a favorable region of the search space, which improves the convergence rate over random initialization.
The convergence value, however, does not appear to be higher with advice than without it, because eventually the performance of the randomly initialized GA is roughly equivalent to that of the GA seeded with advice.

4.4 Evaluation on the Navigation problem

In the Navigation domain, our agent is again controlled by SAMUEL in a two-dimensional simulated world. The agent's objective is to avoid obstacles and navigate to a stationary target with which it must rendezvous before exhausting its fuel (implemented as a bounded length of time for motion). Each episode begins with the agent centered in front of a randomly generated field of obstacles with a specified density. An episode ends with either a rendezvous at the target location (success) or the exhaustion of the agent's fuel or a collision with an obstacle (failure). At the end of an episode, a critic provides full payoff if the agent reaches the target, and partial payoff otherwise, depending on the agent's distance to the goal.

The agent has the following operational sensors: time, the bearing of the target, the bearing and range of an obstacle, and the range of the target. The agent has two operational actions: it can control its own turning rate and its speed. For further detail, see [24]. We provide the following domain-specific knowledge (in addition to a list of operational sensors and actions):

FACTS:
Moving(agent).

DOMAIN-SPECIFIC NONOP RULES:
IF range(X,Y) ≠ close AND heading ≠ headon THEN avoids(X,Y).

ADVICE:
IF range(agent,obstacle) ≠ close THEN ACHIEVE range(agent,target) = close.
IF range(agent,obstacle) = close THEN ACHIEVE avoids(agent,obstacle).
ACHIEVE speed(agent) = high.

The same 10 SKB nonop rules used for Evasion are again used for this domain and are successful in the compilation procedure. This confirms the portability of our SKB for these two domains, thus addressing our third question. Also, although the definition of avoids was intended to be domain-specific, it actually applies to both of our domains. A total of 42 op rules are generated. (We were able to decrease the number of op rules to 9 by making one careful qualitative to quantitative mapping choice.)

Again, we address the first question by using SAMUEL without the GA. We increase the problem difficulty on this domain by increasing the number of obstacles. The success rate is an average over 1000 episodes. Without advice, the success rate is 0% because this is a very difficult domain. Figure 6 shows how we improve the success rate to as much as 90% by using our advice on this domain. At all but the last few points, the differences between the means are statistically significant (α = 0.05).

Figure 6: Results on the Navigation domain without GA refinement. The X axis denotes increasing problem difficulty (0 to 100); the Y axis denotes the success rate.
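Because the qualitative-to-quantitative mapping choice can change the size of the compiled rule set substantially (42 versus 9 op rules, as noted above), the mapping itself is worth making concrete. The following minimal sketch reuses the clock-based bearing translation and the numeric ranges shown in the sample Evasion op rule of the previous subsection; all table entries are illustrative assumptions, not SAMUEL's actual defaults.

DEFAULT_RANGES = {
    ("bearing", "left"): (6, 12),     # clock positions; 6 means 6 o'clock
    ("range", "close"): (0, 750),     # from the sample Evasion op rule
    ("speed", "high"): (700, 1000),   # from the sample Evasion op rule
}

def translate(sensor, qualitative_value):
    # Map a qualitative sensor value to its default quantitative range.
    return DEFAULT_RANGES[(sensor, qualitative_value)]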
When we vary the number of obstacles, performance follows a different trend than for the Evasion domain. By far the greatest benefit of the advice occurs when there are few obstacles. The success rate is 90% with advice and 0% without advice when there is only one obstacle, for example. The advantage drops as the problem complexity increases. After difficulty level 80, the particular advice we gave the agent no longer offers any benefit. For this level of problem difficulty, different advice is probably more appropriate. Our experiments on both domains confirm that our advice compiler can be effective; however, they also indicate that the usefulness of advice may be restricted to a particular range of situations. Another learning task, which we are currently exploring, would be to identify this range and add additional conditions to the advice.

We address the second question by comparing the performance of the GA with and without advice. Noise is added. Figure 7 shows the results. All differences between the means are statistically significant (α = 0.05). Here, the number of obstacles is fixed at five (chosen arbitrarily). For this domain, the convergence rate is the number of GA generations required to maintain a 95% success rate.

Figure 7: Results on the Navigation domain with the GA. The X axis denotes GA generations; the Y axis denotes the success rate.

Figure 7 shows that the addition of advice yields an enormous performance advantage on this domain. Figure 7 also shows that given a moderate amount of time (10 generations), the GA provides a 10% increase in the success rate. Furthermore, the addition of advice produces an 18-fold improvement in the convergence rate over using GAs alone. Not only does advice improve the convergence rate, but it also appears to improve the level of convergence: after 80 generations, the GA with advice holds a 99% or above success rate, whereas after all 100 generations the GA without advice still cannot get above a 97% success rate. (On both this domain and the one described in the previous section, our results are for 100 GA generations. In future work we plan to run the GA longer to determine whether our conclusions hold over a longer time period.)

To further test our compilation method, we have recompiled our Navigation advice into op rules for a Nomad 200 mobile robot that is equipped with very noisy sonar and infrared sensors and can adjust its turning rate and speed. The sensors are so noisy that the robot sometimes mistakes two boxes four feet apart for a wall. The op rules that result from compilation have not been refined by the GA to develop a tolerance to noise; therefore, this noise poses a severe challenge. The op rules are linked to a vendor-provided interface that translates the language of the SAMUEL rules (e.g., IF asonar4 [17..85] THEN SET turn -400) into joint velocity and servo motor commands. From high-level advice to avoid obstacles and rendezvous with a goal point, our method has compiled rules that enable the robot to succeed approximately a third of the time in avoiding three large boxes and reaching a goal point on the other side of a room. The same SKB rules are used for compilation. With the random rule alone, the robot is extremely unlikely to complete this task successfully. Our next step will be to refine these robot rules using GAs within a simulation of this actual domain. (We wish to use SAMUEL both to handle the noise and because we had to manually refine the qualitative to quantitative mappings somewhat; SAMUEL could automate this.)

In conclusion, our multistrategy system offers two advantages. First, it provides an initial boost from seeding with high-level knowledge.
On both simulated domains we see a significant improvement in the convergence rate: an order of magnitude on the Navigation domain. Second, the multistrategy system provides the robustness and improvement gained from GA refinement. Refinement yields a 10% increase in the success rate on Navigation and a 100% increase on Evasion. The fact that on both domains the advice biases the GA into a favorable region and significantly improves its convergence rate is very important in complex domains where a GA approach alone may not be feasible or may be very inefficient. Finally, the effectiveness of our advice-taking method has been confirmed on a robot with poor sensory capabilities. To determine whether the presence of noise influences our conclusions, we have repeated our experiments on the two simulated domains with sensor readings that are not noisy. Although the success rates differ slightly, our conclusions still hold.

5 Related Work

This work relates most strongly to the following topics in machine learning: advice taking, combining projective and reactive planning, methods for compiling high-level goals into reactive rules, learning in fuzzy controllers, and multistrategy learning. This work also relates to research in differential game theory. We discuss each in turn.

5.1 Machine Learning

Advice taking has been considered by McCarthy as early as 1958 [15] and later by Mostow [19] and others. To date, research on assimilating advice in embedded agents has been limited but encouraging. Previous research has focused mainly on providing low-level knowledge. For example, Laird et al. [11] and Clouse and Utgoff [3] have had good success providing agents with information about which action to take. Chapman gives his agent high-level advice [2]. Our advice taker differs from Chapman's because it can operationalize advice long before the advice is applied and because it refines the advice with a GA. Most important of all, our advice-taking method is unique because it involves a multistrategy approach that couples a knowledge-intensive deductive precompilation phase with an empirical inductive refinement phase.

We assume that high-level knowledge is operationalized but not applied immediately. Methods for operationalizing advice that will be applied immediately include STRIPS-like planners [20] and explanation-based learning (EBL) planners, e.g., [26]. A closely related system is Mitchell's [17], which combines EBL projective planning with reactive planning. Our method for compiling goals is similar to that of EBL because it uses the notion of operationality. It differs because we do not assume that the advice will be applied immediately, and therefore our compilation method has no current state on which to focus plan generation. All of the above-mentioned methods create a projective plan to achieve a goal from the current state; we precompile advice for multiple possible states. Because our method precompiles plans from possible states rather than from a current state, it is very similar to the methods of Schoppers [23] and Kaelbling [9] for compiling high-level goals into low-level reactive rules. Our method differs from those of Schoppers and Kaelbling because it includes the EBL notion of operationality. Also unlike Schoppers and Kaelbling, we use a refinement method following compilation.
Considerable prior work has focused on knowledge refinement. Others have used GAs to refine qualitative to quantitative mappings. For example, Karr uses GAs to select fuzzy membership functions for a fuzzy controller [10]. Lin, Mahadevan and Connell, and Singh initialize their systems with modular agent architectures and then refine them with reinforcement learning [12], [14], and [27]. Lin trains a robot by giving it advice in the form of a sequence of desired actions. Mahadevan and Connell initialize their reinforcement learner with a prespecified subsumption architecture, and Singh guides his reinforcement learner by giving it abstract actions to decompose.

One of the approaches most similar to ours is that of Towell and Shavlik [31]. They also couple rule-based input with a refinement method; however, their refinement method is neural networks. This multistrategy system converts rules into a network topology. The content of each rule is preserved; therefore, the transformation is syntactic. Our multistrategy system, on the other hand, focuses primarily on semantic transformations that use qualitative knowledge about movements in space to convert abstract goals into concrete actions. Ram and Santamaria [22] present a multistrategy refinement scheme for a parametric agent performing a navigation task. In contrast with our genetic-algorithm-based approach, their system acquires values for the defined parameters by using reinforcement learning. Different parameter value combinations are acquired for distinct environment subclasses. The interesting feature of their system is the explicit identification of different environmental configurations using case-based reasoning; in the genetic algorithms approach, the identification of regularities in the environment occurs implicitly. The deductive compilation scheme (but not the refinement) is in common with Mitchell et al.'s derivation of a strategy for pushing objects in a tray using a qualitative theory of the process [18].

5.2 Differential game theory

Differential game theory is a branch of mathematical optimal control theory. It assumes that the behavior of the controlled system can be modeled as a system of ordinary differential equations (ODEs). The evasion problem considered in this paper is a typical example of a differential game. In particular, the problem is a two-person zero-sum differential game with a constant terminal time. Both the pursuer and the evader move in a bounded rectangle in two dimensions. The evader has to avoid getting to within a certain distance of the pursuer for a certain length of time. In the minimax formulation of the problem, the optimal strategy of the evader is one that achieves its objective under the least favorable assumptions on the motion of the pursuer. Differential games are formulated mathematically by specifying the motion equations of the pursuer and evader, the class of admissible controls for both systems (which identifies the way in which the pursuer and evader can change their motions), and the target or goal functional. A classic reference for this is [1]. The focus of work in differential game theory is to identify conditions under which optimal strategies for the evader can be derived. This assumes complete knowledge of the dynamics of the evader and pursuer, both of which are unavailable to us.
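As a concrete, deliberately simplified illustration of such a formulation (not the exact model of our simulation): with positions $(x_i, y_i)$, headings $\theta_i$, and speeds $v_i$ for the pursuer ($i = P$) and evader ($i = E$), and turning-rate controls $u_i$ bounded by maximum rates $\omega_i$, the motion equations and admissible controls are

$$\dot{x}_i = v_i \cos\theta_i, \qquad \dot{y}_i = v_i \sin\theta_i, \qquad \dot{\theta}_i = u_i, \qquad |u_i| \le \omega_i, \qquad i \in \{P, E\},$$

and the evader's goal functional requires keeping the separation above a capture radius $d$ until the terminal time $T$:

$$\sqrt{(x_P(t) - x_E(t))^2 + (y_P(t) - y_E(t))^2} > d \quad \text{for all } t \in [0, T].$$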
The theory would be more useful to us if it had a qualitative counterpart which allowed us to determine the existence of solutions to the evader's problem from partial knowledge of the evader's and pursuer's dynamics.

6 Discussion

We have presented a novel multistrategy learning method for operationalizing and refining high-level advice into low-level rules to be used by a reactive agent. Operationalization uses a portable SKB. An implementation of this method has been tested on two complex domains and a Nomad 200 robot. We have learned the following lessons:

1. Our advice compiler can be effective on complex domains, and it will be important to identify the regions of greatest effectiveness for advice,

2. A portable SKB appears feasible, and

3. Coordinating a deductive learning strategy (advice compilation) with an inductive learning strategy (GA refinement) can lead to a substantial performance improvement over either method alone.

This success, however, depends on how the advice biases the GA search. Future work will focus on identifying those characteristics of advice that bias this search favorably. We will also focus on further addressing our questions about performance using different advice and alternative domains (e.g., [28]).

Many other interesting directions are suggested by our experimental results. At present we do not consider the cost of incorporating advice. For larger scale problems and situations where advice is provided more frequently, the agent has to reason about the costs and benefits of compiling advice at a given point in time. Classical issues in trading off deliberation time for action time are relevant here. Another issue for future study is the problem of operationalization theory incompleteness. For example, our SKB was sufficiently complete for both domains, but had it not been, we would have been faced with the problem of supplementing this knowledge base with additional rules. We are considering an approach like that used in the DISCIPLE system [29] to elicit advice and learn rules for operationalizing the advice.

We have chosen the GA method for refinement because it was readily available to us. A comparison of a neural network and other approaches with our current GA approach, on the problems studied here, will provide valuable insights into the tradeoffs between different refinement strategies. We believe that multistrategy learning systems of the future must have a bank of operationalization and refinement methods at their disposal and have fast methods for selecting them. We have chosen a specific breakdown of effort between the advice compilation and refinement phases. How this coordinates with our choice of problem domains and refinement schemes is another question for future study.

Acknowledgements

We appreciate the helpful comments and suggestions provided by Bill Spears, Alan Schultz, Connie Ramsey, John Grefenstette, and Alan Meyrowitz. We would especially like to thank John Grefenstette for generously allowing us to use the SAMUEL system, upon which our system is built.

References

[1] Basar, T. and G. Olsder, Dynamic Noncooperative Game Theory. New York: Academic Press, 1982.

[2] Chapman, D., Vision, instruction and action. Ph.D. thesis, MIT, 1990.

[3] Clouse, J. and P. Utgoff, A teaching method for reinforcement learning. In Proc. of the Ninth International Workshop on Machine Learning, 1992.

[4] Erickson, M. and J. Zytkow, Utilizing experience for improving the tactical manager. In Proc.
of the Fifth International Workshop on Machine Learning, 1988.

[5] Gordon, D. and D. Subramanian, A multistrategy learning scheme for assimilating advice in embedded agents. In Proc. of the Second International Workshop on Multistrategy Learning, 1993.

[6] Gordon, D. and D. Subramanian, Assimilating advice in embedded agents. Unpublished manuscript, 1993.

[7] Grefenstette, J., Ramsey, C., and A. Schultz, Learning sequential decision rules using simulation models and competition. Machine Learning, Volume 5, Number 4, 1990.

[8] Holland, J., Adaptation in Natural and Artificial Systems. University of Michigan Press: Ann Arbor, 1975.

[9] Kaelbling, L., Goals as parallel program specifications. In Proc. of the Seventh National Conference on Artificial Intelligence, 1988.

[10] Karr, C., Design of an adaptive fuzzy logic controller using a genetic algorithm. In Proc. of the Fourth International Conference on Genetic Algorithms, 1991.

[11] Laird, J., Hucka, M., Yager, E., and C. Tuck, Correcting and extending domain knowledge using outside guidance. In Proc. of the Seventh International Conference on Machine Learning, 1990.

[12] Lin, L., Programming robots using reinforcement learning and teaching. In Proc. of the Ninth National Conference on Artificial Intelligence, 1991.

[13] Maes, P. and R. Brooks, Learning to coordinate behaviors. In Proc. of the Eighth National Conference on Artificial Intelligence, 1990.

[14] Mahadevan, S. and J. Connell, Automatic programming of behavior-based robots using reinforcement learning. In Proc. of the Ninth National Conference on Artificial Intelligence, 1991.

[15] McCarthy, J., Mechanisation of thought processes. In Proc. Symposium, Volume 1, 1958.

[16] Michalski, R., Inferential learning theory as a basis for multistrategy task-adaptive learning. In Proc. of the First International Workshop on Multistrategy Learning, 1991.

[17] Mitchell, T., Becoming increasingly reactive. In Proc. of the Eighth National Conference on Artificial Intelligence, 1990.

[18] Mitchell, T., Mason, M., and A. Christiansen, Toward a learning robot. CMU Technical Report, 1987.

[19] Mostow, D. J., Machine transformation of advice into a heuristic search procedure. In R. Michalski, J. Carbonell, and T. Mitchell (Eds.), Machine Learning: An Artificial Intelligence Approach (Vol. 1). Tioga Publishing Co., Palo Alto, CA, 1983.

[20] Nilsson, N., Principles of Artificial Intelligence. Tioga: Palo Alto, 1980.

[21] Odetayo, M. and D. McGregor, Genetic algorithm for inducing control rules for a dynamic system. In Proc. of the Third International Conference on Genetic Algorithms, 1989.

[22] Ram, A. and J.C. Santamaria, A multistrategy case-based and reinforcement learning approach to self-improving reactive control systems for autonomous robotic navigation. In Proc. of the Second International Workshop on Multistrategy Learning, 1993.

[23] Schoppers, M., Universal plans for reactive robots in unpredictable environments. In Proc. of the Sixth National Conference on Artificial Intelligence, 1987.

[24] Schultz, A. and J. Grefenstette, Using a genetic algorithm to learn behaviors for autonomous vehicles. In Proc. of the Navigation and Control Conference, 1992.

[25] Schultz, A. and J. Grefenstette, Improving tactical plans with genetic algorithms. In Proc. of the IEEE Conference on Tools for AI, 1990.

[26] Segre, A., Machine Learning of Robot Assembly Plans. Kluwer: Boston, 1988.

[27] Singh, S., Scaling reinforcement learning algorithms by learning variable temporal resolution models. In Proc.
of the Ninth International Workshop on Machine Learning, 1992.

[28] Subramanian, D. and S. Hunter, Some preliminary studies in agent design in simulated environments. Unpublished manuscript, 1993.

[29] Tecuci, G., Cooperation in knowledge base refinement. In Proc. of the Ninth International Workshop on Machine Learning, 1992.

[30] Tesauro, G., Temporal difference learning of backgammon strategy. In Proc. of the Ninth International Workshop on Machine Learning, 1992.

[31] Towell, G. and J. Shavlik, Refining symbolic knowledge using neural networks. In Proc. of the Eighth Workshop on Machine Learning, 1991.

[32] Whitehead, S. and D. Ballard, Learning to perceive and act by trial and error. Machine Learning, Volume 7, Number 1, 1991.

MULTISTRATEGY LEARNING IN REACTIVE CONTROL SYSTEMS FOR AUTONOMOUS ROBOTIC NAVIGATION

Ashwin Ram and Juan Carlos Santamaría
College of Computing, Georgia Institute of Technology, Atlanta, Georgia 30332-0280, U.S.A.

Keywords: Robot navigation, reactive control, case-based reasoning, reinforcement learning, adaptive control

Edited by: Gheorghe Tecuci

Received: October 15, 1993    Revised: November 6, 1993    Accepted: November 14, 1993

This paper presents a self-improving reactive control system for autonomous robotic navigation. The navigation module uses a schema-based reactive control system to perform the navigation task. The learning module combines case-based reasoning and reinforcement learning to continuously tune the navigation system through experience. The case-based reasoning component perceives and characterizes the system's environment, retrieves an appropriate case, and uses the recommendations of the case to tune the parameters of the reactive control system. The reinforcement learning component refines the content of the cases based on the current experience. Together, the learning components perform on-line adaptation, resulting in improved performance as the reactive control system tunes itself to the environment, as well as on-line case learning, resulting in an improved library of cases that capture environmental regularities necessary to perform on-line adaptation. The system is extensively evaluated through simulation studies using several performance metrics and system configurations.

1 Introduction

Autonomous robotic navigation is defined as the task of finding a path along which a robot can move safely from a source point to a destination point in an obstacle-ridden terrain, and executing the actions to carry out the movement in a real or simulated world. Several methods have been proposed for this task, ranging from high-level planning methods to reactive control methods.

High-level planning methods use extensive world knowledge and inferences about the environment they interact with (Fikes, Hart & Nilsson, 1972; Sacerdoti, 1975; Georgeff, 1987; Maes, 1990).
Knowledge about available actions and their consequences is used to formulate a detailed plan before the actions are actually executed in the world. Such systems can successfully perform the path-finding required by the navigation task, but only if an accurate and complete representation of the world is available to the system. Considerable high-level knowledge is also needed to learn from planning experiences (e.g., Mostow & Bhatnagar, 1987; Minton, 1988; Segre, 1988; Hammond, 1989). Such a representation is usually not available in real-world environments, which are complex and dynamic in nature. To build the necessary representations, a fast and accurate perception process is required to reliably map sensory inputs to high-level representations of the world. A second problem with high-level planning is the large amount of processing time required, resulting in significant slowdown and the inability to respond immediately to unexpected situations.

Situated or reactive control methods have been proposed as an alternative to high-level planning methods (Brooks, 1986; Kaelbling, 1986; Payton, 1986; Arkin, 1989). In these methods, no planning is performed; instead, a simple sensory representation of the environment is used to select the next action that should be performed. Actions are represented as simple behaviors, which can be selected and executed rapidly, often in real-time. These methods can cope with unknown and dynamic environmental configurations, but only those that lie within the scope of predetermined behaviors. Furthermore, such methods cannot modify or improve their behaviors through experience, since they do not have any predictive capability that could account for future consequences of their actions, nor a higher-level formalism in which to represent and reason about the knowledge necessary for such analysis.

We propose a self-improving navigation system that uses reactive control for fast performance, augmented with multistrategy learning methods that allow the system to adapt to novel environments and to learn from its experiences. The system autonomously and progressively constructs representational structures that aid the navigation task by supplying the predictive capability that standard reactive systems lack. The representations are constructed using a hybrid case-based and reinforcement learning method without extensive high-level reasoning. The system is very robust and can perform successfully in (and learn from) novel environments, yet it compares favorably with traditional reactive methods in terms of speed and performance. A further advantage of the method is that the system designers do not need to foresee and represent all the possibilities that might occur, since the system develops its own "understanding" of the world and its actions. Through experience, the system is able to adapt to, and perform well in, a wide range of environments without any user intervention or supervisory input. This is a primary characteristic that autonomous agents must have to interact with real-world environments.

This paper is organized as follows. Section 2 presents a technical description of the system, including the schema-based reactive control component, the case-based and reinforcement learning methods, and the system-environment model representations, and places it in the context of related work in the area. Section 3 presents several experiments that evaluate the system. The results shown provide empirical validation of our approach. Section 4 concludes with a discussion of the lessons learned from this research and suggests directions for future research.

2 Technical Details

2.1 System Description

The Self-Improving Navigation System (SINS) consists of a navigation module, which uses schema-based reactive control methods, and an on-line adaptation and learning module, which uses case-based reasoning and reinforcement learning methods. The navigation module is responsible for moving the robot through the environment from the starting location to the desired goal location while avoiding obstacles along the way. The adaptation and learning module has two responsibilities.
The adaptation sub-module performs on-line adaptation of the reactive control parameters being used in the navigation module to get the best performance. The adaptation is based on recommendations from cases that capture and model the interaction of the system with its environment. With such a model, SINS is able to predict future consequences of its actions and act accordingly. The learning sub-module monitors the progress of the system and incrementally modifies the case representations through experience. Figure 1 shows the SINS functional architecture.

Figure 1: System architecture.

The main objective of the learning module is to construct a model of the continuous sensorimotor interaction of the system with its environment, that is, a mapping from sensory inputs to appropriate behavioral (schema) parameters used for reactive control. This model allows the adaptation module to control the behavior of the navigation module by selecting and adapting schema parameters in different environments. To learn a mapping in this context is to discover environment configurations that are relevant to the navigation task and corresponding schema parameters that improve the navigational performance of the system. The learning method is unsupervised and uses a reward that depends on the similarity of the observed mapping in the current environment to the mapping represented in the model, combined with a more traditional punishment signal. This causes the system to converge towards those mappings that are consistent and beneficial over a set of experiences.

The representations used by SINS to model its interaction with the environment are initially under-constrained and generic; they contain very little useful information for the navigation task. As the system interacts with the environment, the learning module gradually modifies the content of the representations until they become useful and provide reliable information for adapting the navigation system to the particular environment at hand. The learning and navigation modules function in an integrated manner. The learning module is always trying to find a better model of the interaction of the system with its environment so that it can tune the navigation module to perform its function better. The navigation module provides feedback to the learning module so it can build a better model of this interaction. The behavior of the system is then the result of an equilibrium point established by the learning module, which is trying to refine the model, and the environment, which is complex and dynamic in nature. This equilibrium may shift and need to be re-established if the environment changes drastically; however, the model is generic enough at any point to be able to deal with a very wide range of environments. We now present the reactive module, the representations used by the system, and the methods used by the learning module in more detail.

2.2 The Navigation Module

The navigation module, which uses schema-based reactive control methods, is based on the AuRA architecture (Arkin, 1989). The module consists of a set of motor schemas that represent the individual motor behaviors available to the system. Each schema reacts to sensory information from the environment and produces a velocity vector representing the direction and speed at which the robot is to move given current environmental conditions. The velocity vectors produced by all the schemas are then combined to produce a potential field that directs the actual movement of the robot.
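A minimal sketch of this combination step follows: each schema is a function from sensory input to a velocity vector, and the gain-scaled sum of the vectors yields the movement command. The schema and gain names anticipate those defined in the next paragraphs; the vector arithmetic and field shapes are illustrative assumptions, not the AuRA implementation (noise persistence is omitted for brevity).

import math
import random

def avoid_static_obstacle(pos, obstacles, gain):
    # Repulsive vector pointing away from each detected obstacle,
    # falling off with the square of the distance.
    vx = vy = 0.0
    for ox, oy in obstacles:
        dx, dy = pos[0] - ox, pos[1] - oy
        d = math.hypot(dx, dy) or 1e-6
        vx += gain * dx / (d * d)
        vy += gain * dy / (d * d)
    return vx, vy

def move_to_goal(pos, goal, gain):
    # Attractive unit vector toward the goal, scaled by the gain.
    dx, dy = goal[0] - pos[0], goal[1] - pos[1]
    d = math.hypot(dx, dy) or 1e-6
    return gain * dx / d, gain * dy / d

def noise(gain):
    # Random vector used to escape local minima.
    angle = random.uniform(0.0, 2.0 * math.pi)
    return gain * math.cos(angle), gain * math.sin(angle)

def combined_velocity(pos, obstacles, goal, params):
    # The movement command is the sum of all schema output vectors.
    vectors = [
        avoid_static_obstacle(pos, obstacles, params["Obstacle-Gain"]),
        move_to_goal(pos, goal, params["Goal-Gain"]),
        noise(params["Noise-Gain"]),
    ]
    return (sum(v[0] for v in vectors), sum(v[1] for v in vectors))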
Simple behaviors, such as wandering, obstacle avoidance, and goal following, can combine to produce complex emergent behaviors in a particular environment. Different emergent behaviors can be obtained by modifying the simple behaviors. This allows the system to interact successfully in different environmental configurations requiring different navigational "strategies" (Ram, Arkin, Moorman & Clark, 1992). A detailed description of schema-based reactive control methods can be found in Arkin (1989).

In this research, we used three motor schemas: AVOID-STATIC-OBSTACLE, MOVE-TO-GOAL, and NOISE. AVOID-STATIC-OBSTACLE directs the system to move itself away from detected obstacles. The MOVE-TO-GOAL schema directs the system to move towards a particular point in the terrain. The NOISE schema makes the system move in a random direction; it is used to escape from local minima and, in conjunction with other schemas, to produce wandering behaviors. Each motor schema has a set of parameters that control the potential field generated by the motor schema. In this research, we used the following parameters: Obstacle-Gain, associated with AVOID-STATIC-OBSTACLE, determines the magnitude of the repulsive potential field generated by the obstacles perceived by the system; Goal-Gain, associated with MOVE-TO-GOAL, determines the magnitude of the attractive potential field generated by the goal; Noise-Gain, associated with NOISE, determines the magnitude of the noise; and Noise-Persistence, also associated with NOISE, determines the duration for which a noise value is allowed to persist.

Different combinations of schema parameters cause different behaviors to be exhibited by the system (see figure 2). Traditionally, parameters are fixed and determined ahead of time by the system designer. However, on-line selection and modification of the appropriate parameters based on the current environment can enhance navigational performance, as in the ACBARR system (Ram, Arkin, Moorman & Clark, 1992). SINS adopts this approach by allowing schema parameters to be modified dynamically. However, in ACBARR, schema modification information is supplied by the designer using hand-coded cases. Our system, in contrast, can learn and modify its own cases through experience. The representation of our cases is also considerably different and is designed to support reinforcement learning.

2.3 The System-Environment Model Representation

The navigation module in SINS can be adapted to exhibit many different behaviors. SINS improves its performance by learning how and when to tune the navigation module. In this way, the system can use the appropriate behavior in each environmental configuration encountered. The learning module, therefore, must learn about and discriminate between different environments, and associate with each the appropriate adaptations to be performed on the motor schemas. This requires a representational scheme to model not just the environment, but the interaction between the system and the environment. However, to ensure that the system does not get bogged down in extensive high-level reasoning, the knowledge represented in the model must be based on perceptual and motor information easily available at the reactive level. SINS uses a model consisting of associations between sensory inputs and schema parameter values. Each set of associations is represented as a case.
Sensory inputs provide information about the configuration of the environment, and schema parameter information specifies how to adapt the navigation module in the environments to which the case is applicable. Each type of information is represented as a vector of analog values. Each analog value corresponds to an estimate of a quantitative variable (a sensory input or a schema parameter) over a window of time. A vector represents the trend or recent history of such estimates of a variable. A case models an association between sensory inputs and schema parameters by grouping their respective vectors together. Figure 3 shows an example of this representation.

This representation has three essential properties. First, the representation is capable of capturing a wide range of possible associations between sensory inputs and schema parameters. Second, it permits continuous progressive refinement of the associations. Finally, the representation captures trends or patterns of input and output values over time. This allows the system to detect patterns over larger time windows rather than having to make a decision based only on instantaneous values of perceptual inputs.

In this research, we used four input vectors to characterize the environment and discriminate among different environment configurations: Obstacle-Density provides a measure of the occupied areas that impede navigation; Absolute-Motion measures the activity of the system; Relative-Motion represents the change in motion activity; and Motion-Towards-Goal specifies how much progress the system has actually made towards the goal. These input vectors are constantly updated with the information received from the sensors. We also used four output vectors to represent the schema parameter values used to adapt the navigation module, one for each of the schema parameters (Obstacle-Gain, Goal-Gain, Noise-Gain, and Noise-Persistence).

On each control interval, the retrieve_best_case function matches the vectors $E_{\mathrm{input}_j}$, $E_{\mathrm{output}_k}$ from the perceive step against the corresponding input and output vectors $C^n_{\mathrm{input}_j}$, $C^n_{\mathrm{output}_k}$ of the cases $C^n$ in the system's memory (see figure 4). The best matching case $C^{n_{best}}$ and the position of the best match $p_{best}$ are handed to the adapt_schemas function, which modifies the schema parameter values currently in use based on the recommendations $C^{n_{best}}_{\mathrm{output}_k}(p_{best}+1)$ from the output vectors of the case. Finally, the learning and adaptation module decides how to utilize information from the current experience with the best case in order to improve its case library. The system learns in three different ways: by improving the content of the case that was just used in order to make it more reliable, by creating a new case whenever the best case retrieved is not good enough, or by extending the length of the case in order to build up longer sequences of associations. The contents of a case are improved by the reinforce_schemas function, which reinforces the suggestions of the case if these suggestions led to a favorable outcome over the last control interval, and by the explore_schemas function, which uses random exploration to try out other schema parameter values if the suggested set of values did not prove useful. The outcome is evaluated by monitoring the behavior of the robot over the last control interval; collisions are undesirable, as is lack of movement.
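Putting the pieces named above together, one control interval can be sketched schematically as follows. The function names (perceive, retrieve_best_case, adapt_schemas, reinforce_schemas, explore_schemas) are those used in the text; run_reactive_control and the outcome record are assumptions introduced here for illustration, and the real steps are specified in Section 2.4.

def control_interval(env, case_library, ops):
    # `ops` supplies the operations named in the text plus the assumed
    # run_reactive_control step; only the control flow is the point here.
    E = ops["perceive"](env)                   # update input/output vectors
    best_case, p_best = ops["retrieve_best_case"](E, case_library)
    params = ops["adapt_schemas"](best_case, p_best)
    outcome = ops["run_reactive_control"](env, params)   # navigate T steps
    if outcome["collisions"] == 0 and outcome["moved"]:  # favorable outcome
        ops["reinforce_schemas"](best_case, p_best, E)
    else:
        ops["explore_schemas"](best_case, p_best)        # random exploration
    return params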
One difference between our methods and traditional reinforcement learning is that SINS is trying to maximize consistency in "useful" behaviors as determined by a reward signal, whereas traditional reinforcement learning tries to maximize the expected utility the system is going to receive in the future as determined by the reward signal (cf. Watkins, 1989; Whitehead & Ballard, 1990). In schema-based reactive control navigation, it is inherently a good idea to modify schema parameters in an on-line fashion; however, not all modifications are equally good, since some may cause the robot to collide with obstacles or not to move at all. SINS uses the reward signal to decide whether to reinforce a behavior or to explore alternative behaviors; reinforcement, when chosen, is used to reinforce behaviors that are consistent across experiences. Thus, in addition to external outcome, consistency is used as an "internal" reward signal for the reinforcement learning method.

Figure 4: Schematic representation of the match process. Each graph in the case (below) is matched against the corresponding graph in the current environment (above) to determine the best match, after which the remaining part of the case is used to guide navigation (shown as dashed lines).

Furthermore, traditional reinforcement learning assumes that the outcomes of the system's actions are known; it learns which actions to execute to maximize a reward. In SINS, the outcomes of what corresponds to "actions" (the adaptations to be performed on the navigation module) are not known; part of the learning task is to discover which sequences of environmental situations are likely to result from a given sequence of adaptations and, in turn, which adaptations are appropriate in different situations. Thus, SINS is learning a model of its sensorimotor interaction with the environment (represented as a set of cases) at the same time as it is learning to improve its navigational performance through on-line adaptation of its reactive control schemas.

In addition to modifying its cases, SINS can also extend its cases and learn new cases. In order to decide which kind of learning to perform in a given situation, SINS uses a relative similarity criterion to judge the appropriateness of the best matching case in the current situation. This determination is based on statistical information about the quality of match in prior applications of the case as compared to the quality of match in the current situation. If the best matching case is not as similar to the current environment situation as it has been in previous situations, the case is probably inappropriate for this situation; thus, it is better to learn a new case to represent what is probably a new class of situations. If this occurs, SINS uses the create_new_case function to create a new case based on the current experience and add it to the case library. To determine whether to create a new case, SINS compares the current match with the mean match plus the standard deviation of the matches over the past utilizations of the case.
This ensures that new sequences of associations are created only when the available sequences of associations already captured in the case library do not fit the current environment. The third kind of learning is carried out by the extend_case_size function, which extends the length of a case whenever the best case makes an accurate prediction of the next environment situation and there are no more associations in the sequence. This allows the system to increase the length of the sequence of associations only when it is confident that the sequence of the case accurately predicts how the environment changes if the suggested schema parameters are used. To estimate this confidence, the predicted values are matched with the actual environmental parameters that result; if this match is better than the mean match, the case is extended. Intuitively (as before), if the case predicts the current situation better than it predicted the previous situations that it was used in, it is likely that the current situation involves the very regularities that the case is beginning to capture; thus, it is worthwhile extending the case so as to incorporate the current situation. Alternatively, if the match is not quite as good, the case should not be modified, because doing so would take it away from the regularities it has been converging towards.

Since the reinforcement formulae are based on a relative similarity criterion, the overall effect of the learning process is to cause the cases to converge on stable associations between environment configurations and schema parameters. Stable associations represent regularities in the world that have been identified by the system through its experience, and provide the predictive power necessary to navigate in future situations. The assumption behind this method is that the interaction between the system and the environment can be characterized by a finite set of causal patterns or associations between the sensory inputs and the actions performed by the system. The method allows the system to learn only the causal patterns that the reward utility identifies as useful and to use them to modify its actions by updating its schema parameters as appropriate. Useful causal patterns are those that do not cause robot collisions or do not cause the robot to stop.

Genetic algorithms may also be used to modify schema parameters in a given environment (Pearce, Arkin, & Ram, 1992). However, while this approach is useful in the initial design of the navigation system, it cannot change schema parameters during navigation when the system faces environments that are significantly different from the environments used in the training phase of the genetic algorithm. Another approach to self-organizing adaptive control is that of Verschure, Krose, and Pfeifer (1992), in which a neural network is used to learn how to associate conditional stimuli with unconditional responses. Although their system and ours are both self-improving navigation systems, there is a fundamental difference in how the performance of the navigation task is improved. Their system improves its navigation performance by learning how to incorporate new input data (i.e., conditional stimuli) into an already working navigation system, while SINS improves its navigation performance by learning how to adapt the system (i.e., the navigation module) itself. Our system does not rely on new sensory input, but on patterns or regularities detected in the perceived environment.
Our learning methods are also similar to those of Sutton (1990), whose system uses a trial-and-error reinforcement learning strategy to develop a world model and to plan optimal routes using the evolving world model. Unlike this system, however, SINS does not need to be trained on the same world many times, nor are the results of its learning specific to a particular world, initial location, or destination location. We now present a detailed description of, and the mathematical formulas used in, the perception, matching, adaptation, and learning tasks.

2.4.1 Perception

The objective of the perceive function is to generate an accurate description of the current environment situation. It performs this task by shifting the previous values in each input and output vector one position back in time (this is implemented using a circular buffer, which does not require copying each of the values from one cell to the next) and then calculating the current values for each input vector $E_{\mathrm{input}_j}(0), j = 1, \ldots, J$, and output vector $E_{\mathrm{output}_k}(0), k = 1, \ldots, K$, where 0 is the current position in time. The current values for the input vectors are based on the robot's sensors, and the current values for the output vectors are just the respective values of the schema parameters suggested in the previous control interval. The vectors are updated at the end of each control interval of time $T$.

To update the input vectors, the system monitors the values of the robot's sensors $\mathrm{Sensor}_j$ corresponding to each input vector $E_{\mathrm{input}_j}$. The sensors are monitored at each time step over the past control interval; these sensor readings are then averaged to yield the new value for the corresponding input vectors. Thus, the input vectors in the environment representation are updated using the following formula (note that $i$ counts back in time, i.e., $i = 0$ is the current time and $i > 0$ is the recent past):

$$E_{\mathrm{input}_j}(i) = \begin{cases} E_{\mathrm{input}_j}(i-1) & \text{if } i > 0 \\ \dfrac{1}{T} \displaystyle\sum_{t=1}^{T} \mathrm{Sensor}_j(t) & \text{if } i = 0 \end{cases}$$

where $\mathrm{Sensor}_j(t)$ is the sensory input that corresponds to the input vector $E_{\mathrm{input}_j}$ (sensed obstacles for Obstacle-Density, distance traveled for Absolute-Motion, relative position for Relative-Motion, and normal relative position for Motion-Towards-Goal), and $t$ ranges over each robot step since the last control interval.
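A small sketch of this update follows, with the circular buffer of the implementation note realized as a deque; the vector class and its per-step recording interface are assumptions introduced for illustration.

from collections import deque

class InputVector:
    # Recent history of per-control-interval sensor averages; index 0 is
    # the current time, and higher indices reach back into the recent past.
    def __init__(self, length):
        self.history = deque([0.0] * length, maxlen=length)  # circular buffer
        self.readings = []

    def record(self, sensor_value):
        # Called once per robot step during the control interval.
        self.readings.append(sensor_value)

    def end_control_interval(self):
        # E_input(0) is the average of the T readings; appendleft shifts the
        # older values back in time without copying cell by cell.
        self.history.appendleft(sum(self.readings) / len(self.readings))
        self.readings = []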
2.4.2 Retrieval and Matching

The function retrieve_best_case is responsible for selecting a case from the case library that best matches the current environment situation. The case similarity metric is based on the mean squared difference between each of the vector values of the case, over a trending window, and the vector values of the environment. The best match window is calculated using a reverse sweep over the time axis $p$, similar to a convolution process, to find the relative position that matches best. Each case $C^n$ in the case library is matched against the current environment using exhaustive search, which returns the best matching case $C^{n_{best}}$ along with the relative position $p_{best}$ of the match (see figure 4). After retrieving the best case, the mean and variance of the case's statistical match history are updated; these will be used later to calculate the relative similarity criterion during learning.

The case similarity metric $SM$ of a case $C$ at position $p$ relative to the environment $E$ is a value that indicates the similarity of the sequence of associations encoded in the case to the sequence of associations in the current environment situation starting at position $p$. The lower the value of the case similarity metric, the more similar the sequences of associations. The case similarity metric formula calculates a weighted sum of the squared differences between the corresponding vectors of the case and the environment; we used $w_j = w_k = 1.0$ (i.e., input and output vectors all contribute equally in the similarity metric). For the $SM$ to be valid, $p$ must lie between 0 and $l_C$:

$$SM(E,C,p) = \frac{1}{p+1} \left[ \sum_{j=1}^{J} w_j \sum_{t=0}^{\min(p,\,l_E)} \bigl( E_{\mathrm{input}_j}(t) - C_{\mathrm{input}_j}(p-t) \bigr)^2 + \sum_{k=1}^{K} w_k \sum_{t=0}^{\min(p,\,l_E)} \bigl( E_{\mathrm{output}_k}(t) - C_{\mathrm{output}_k}(p-t) \bigr)^2 \right]$$

The best case is obtained by matching each case $C^n$ in the case library at all the positions $p$ and selecting the pair $(n_{best}, p_{best})$ that yields the lowest $SM$. Formally, this can be expressed as:

$$(n_{best}, p_{best}) = \arg\min_{n,\,p}\, SM(E, C^n, p), \qquad \forall n,\; 0 \le p \le l_{C^n}$$

Each case $C$ maintains a statistical record of the similarity metrics it has produced in the past, which is updated every time the case is retrieved as the best case. The mean ($C_{SM_{mean}}$) and variance ($C_{SM_{var}}$) of the case similarity metric, as well as the number of times the case has been used ($C_{used}$), are updated using standard formulae in descriptive statistics:

$$C_{SM_{mean}}^{new} = \frac{C_{used}\, C_{SM_{mean}} + SM}{C_{used} + 1}$$

$$C_{SM_{var}}^{new} = \frac{(C_{used} - 1)\, C_{SM_{var}} + \bigl( SM - C_{SM_{mean}}^{new} \bigr)^2}{C_{used}}$$

$$C_{used}^{new} = C_{used} + 1$$

2.4.3 Adaptation

The best matching case $C^{n_{best}}$ is used to adapt the schema parameter values currently in use by the reactive control module. The values of the output vectors for the next association $C^{n_{best}}_{\mathrm{output}_k}(p_{best}+1)$ after position $p_{best}$ are used to determine the new set of schema parameter values $\mathrm{Parameter}_k$ until the next control interval. Since learning tends to reinforce those associations that are consistently observed over several experiences, the new set of schema parameters can be expected to cause the robot to move safely, and the next environment configuration that results from the movement can be expected to be the one predicted by the association. Since output vectors directly represent schema parameters, adaptation is a straightforward operation:

$$\mathrm{Parameter}_k = C^{n_{best}}_{\mathrm{output}_k}(p_{best}+1), \qquad \forall k = 1, \ldots, K$$
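A compact sketch of the retrieval and adaptation steps just defined follows: $SM$ as the weighted, normalized squared difference between corresponding vectors, an exhaustive sweep over every case and position, and the next association's outputs taken as the new schema parameters. The data layout (dicts of lists, with the environment index $t$ counting back in time) is an assumption.

def similarity(env, case, p, weights, l_E):
    # SM(E, C, p): weighted, normalized squared difference over the match
    # window; input and output vectors are treated alike.
    total = 0.0
    for v, w in weights.items():
        for t in range(min(p, l_E) + 1):
            diff = env[v][t] - case["vectors"][v][p - t]
            total += w * diff * diff
    return total / (p + 1)

def retrieve_best_case(env, library, weights, l_E):
    # Exhaustive sweep over every case and every valid position p.
    best = None
    for case in library:
        for p in range(case["length"]):
            sm = similarity(env, case, p, weights, l_E)
            if best is None or sm < best[2]:
                best = (case, p, sm)
    return best  # (best case, p_best, lowest SM)

def adapt_schemas(best_case, p_best, output_names):
    # The next association's output values become the new parameters.
    return {k: best_case["vectors"][k][p_best + 1] for k in output_names}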
2.4.4 Learning

In addition to perceiving the environment, retrieving the best matching case, and adapting the schema parameters being used by the reactive control module, SINS must also learn by updating its case library based on its current experience. Three types of learning are possible: modification of the associations contained in a case, creation of a new case based on the current experience, and extension of the size of a case to yield associations over larger time windows. Modification of case contents, in turn, can be of two types: reinforcement of the associations contained in the case based on a successful experience, and exploration of alternative associations based on an unsuccessful experience.

SINS decides which kind of learning to perform using a relative similarity criterion, which determines the quality of the best match. The match value of the best case, based on the case similarity metric, is compared with the match values of the case in previous situations in which it was used. If the current match is worse than the mean match value by more than a standard deviation, the case (although still the best match) is considered to be too different from the current situation, since it has been a better match to other situations in the past. In this case, the create_new_case function is invoked to create a new case containing a sequence of associations formed by copying the values of the sequence of associations in the current environmental representation.

If, on the other hand, the best case matches the current situation well, it is likely that the current situation is representative of the class of situations that the case is beginning to converge towards. If the case provides good recommendations for action, its recommendations should be reinforced; if not, its recommendations should be modified. In SINS, collisions with obstacles and lack of movement are undesirable by definition of the navigation task. A set of schema parameters is considered beneficial if using it does not lead to an undesirable outcome. The objective of the learning functions is to improve the accuracy of prediction of the system's cases and, in turn, to discover those schema parameter values that result in environmental situations that are beneficial for the robot.

If the best case recommends a set of schema parameters that are not beneficial to the robot, the explore_schemas function is used to modify the case such that it suggests a different set of schema parameters in similar circumstances in the future. Specifically, the output vectors $C^{n_{best}}_{\mathrm{output}_k}(p_{best}+1)$ associated with the environment situation following the best match position $p_{best}$ are modified in a random manner, since the current values are not useful to the system. The small random changes allow the system to explore the space of possible schema parameters in a controlled manner. These changes are defined by the following formulae:

$$\rho = \min(1,\; \alpha \cdot \mathrm{collisions} + \beta \cdot \mathrm{immobility})$$

$$C^{n_{best}}_{\mathrm{output}_k}(p_{best}+1) = (1-\rho)\, C^{n_{best}}_{\mathrm{output}_k}(p_{best}+1) + \rho \cdot \mathrm{random}\bigl( \min C_{\mathrm{output}_k},\, \max C_{\mathrm{output}_k} \bigr), \qquad \forall k = 1, \ldots, K$$

where $\rho$ is a "reject" value that determines the extent to which the current recommendations should be taken into account when determining the modified values, and $\min C_{\mathrm{output}_k}$ and $\max C_{\mathrm{output}_k}$ bound the allowable range of schema parameter $k$. A value of $\rho = 0$ specifies that the value of the output vector should be left unchanged, and a value of $\rho = 1$ specifies that the value of the output vector should be replaced completely by a new random value in the allowable range. In any given learning cycle, the value of $\rho$ depends on $\alpha$ and $\beta$, which represent the importance of avoiding collisions and moving, respectively. In this paper, we used $\alpha = 0.5$ and $\beta = 1.0$.

If, on the other hand, the schema parameters suggested by the best matching case produce desirable results, the reinforce_schemas function is invoked. This function updates the case by making it more like the current environmental situation, so as to produce the same recommendations in similar situations in the future. This reinforcement is done using the following formulae:

$$C_{\mathrm{input}_j}(p_{best}-i) = \lambda\, C_{\mathrm{input}_j}(p_{best}-i) + (1-\lambda)\, E_{\mathrm{input}_j}(i), \qquad \forall i = 0, \ldots, p_{best},\; \forall j = 1, \ldots, J$$

$$C_{\mathrm{output}_k}(p_{best}-i) = \lambda\, C_{\mathrm{output}_k}(p_{best}-i) + (1-\lambda)\, E_{\mathrm{output}_k}(i), \qquad \forall i = 0, \ldots, p_{best},\; \forall k = 1, \ldots, K$$

where $\lambda$ determines the learning rate (0.9 in the current version of SINS).

Finally, the extend_case function extends the sequence of associations contained in a case. The decision to extend a case is also based on a statistical relative similarity criterion. If the case's predictions $C^{n_{best}}_{\mathrm{output}_k}(p_{best}+1)$ are similar to the resulting environment situation within a standard deviation of the mean predictive similarity, and the case does not have more associations in the sequence (that is, it cannot provide a next set of schema parameters), then the case is extended by duplicating the last association of the case:

$$C_{\mathrm{input}_j}(l_C+1) = C_{\mathrm{input}_j}(l_C), \qquad C_{\mathrm{output}_k}(l_C+1) = C_{\mathrm{output}_k}(l_C)$$
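A sketch of the explore_schemas update given above follows, with the blend controlled by the reject value ρ. The collision count and immobility measure are assumed here to be normalized to [0, 1], and the allowable parameter ranges are illustrative, not the values used in SINS.

import random

ALLOWABLE_RANGES = {                      # illustrative parameter bounds
    "Obstacle-Gain": (0.0, 2.0),
    "Goal-Gain": (0.0, 2.0),
    "Noise-Gain": (0.0, 2.0),
    "Noise-Persistence": (1.0, 10.0),
}

def explore_schemas(case, p_best, collisions, immobility,
                    alpha=0.5, beta=1.0):
    # rho = 0 keeps the old recommendation unchanged; rho = 1 replaces it
    # entirely with a random value in the allowable range.
    rho = min(1.0, alpha * collisions + beta * immobility)
    for k, (lo, hi) in ALLOWABLE_RANGES.items():
        old = case["vectors"][k][p_best + 1]
        case["vectors"][k][p_best + 1] = (
            (1.0 - rho) * old + rho * random.uniform(lo, hi))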
The net result of these learning procedures is to cause the cases in the system's case library to converge towards regularities in the system's interactions with its environment. The system learns useful sequences of schema parameters for different environment situations; these are used to guide navigation and, in turn, are updated based on navigational outcomes so as to improve the reliability of their predictions in similar situations in the future.

3 Evaluation

We evaluated the methods presented above using extensive simulations across a variety of different types of environment, performance criteria, and system configurations. The objective of these experiments was to measure the qualitative and quantitative improvement in the navigation performance of SINS (denoted "sins" in the figures), and to compare this performance against several non-learning schema-based reactive systems (the "static" systems) that do not change schema parameters and a system that changes its schema parameter values randomly after every control interval (the "random" system). Furthermore, rather than simply measure improvement in performance in SINS by some given metric such as "speedup," we were interested in systematically evaluating the effects of various design decisions on the performance of the system using several different metrics. To achieve this, we designed several experiments, which can be grouped into four sets as discussed below.

3.1 Experiment Design

The systems were tested on randomly generated environments consisting of rectangular bounded worlds. Each environment contains circular obstacles, a start location, and a destination location, as shown in figure 2. Figure 5 shows example runs of two of the static systems, the random system, and the SINS system on a typical randomly generated world. The location, number, and radius of the obstacles were randomly determined to create environments of varying amounts of clutter, defined as the ratio of free space to occupied space. 15% clutter corresponded to relatively easy worlds and 50% clutter to very difficult worlds.

We tested the effect of three different design parameters in the SINS system: max-cases, the maximum number of cases that SINS is allowed to create; max-size, $l_C$, representing the maximum number of associations in a case; and control-interval, $T$, which determines how often the schema parameters in the reactive control module are adapted.

We used three non-adaptive systems for comparison. These systems were identical to the underlying navigation module in SINS, but used different fixed sets of schema parameter values. The "staticH" system used a hand-coded set that we designed manually by watching the behavior of the system on both 15% and 50% cluttered worlds. The "static15" and "static50" systems used the schema parameter values reported in Ram, Arkin, Moorman, and Clark (1992), which were hand-optimized through extensive trial and error for 15% and 50% cluttered worlds, respectively. As discussed below, SINS performed as well as the static15 system on 15% cluttered worlds and the static50 system on the 50% cluttered worlds; furthermore, SINS reached this level of performance autonomously and, unlike the static systems, which were optimized only for a specific clutter, was robust across sudden changes in clutter.

We used six estimators to evaluate the navigation performance of the systems.
These metrics were computed using a cumulative average over several hundred test worlds to factor out the intrinsic differences in difficulty of different worlds. Percentage of worlds solved indicates in how many of the worlds posed to it the system actually found a path to the goal location; the optimum value is 100%, since this would indicate that every world presented was successfully solved. Average steps per world indicates the average number of steps that the robot takes to terminate each world; smaller values indicate better performance. Average distance per world indicates the total distance traveled per world on average; again, smaller values indicate better performance. Average actual/optimal distance per world indicates the ratio of the total distance traveled to the Euclidean distance between the start and end points, averaged over the solved worlds; the optimal value is 1, but this is only possible in a world without obstacles. Average virtual collisions per world indicates the total number of times the robot came within a pre-defined distance of an obstacle. Finally, average time per world indicates the total time the system takes to execute a world on average.

The data for the estimators were obtained after the systems terminated each world. This was to ensure that we were consistently measuring the effect of learning across experiences rather than within a single experience (which is less significant on worlds of this size anyway). The execution is terminated when the navigation system reaches its destination or when the number of steps reaches an upper limit (1000 in the current evaluation). The latter condition guarantees termination since some worlds are unsolvable by one or both systems.

Figure 5: Sample runs of the static15 (top left), static50 (top right), random (bottom left), and SINS (bottom right) systems on a randomly generated world. The robot starts at the black box (towards the lower right side of the world) and tries to navigate to the white box. The graphs near the top of each figure show the values of each schema parameter over time.

In this paper, we discuss the results from the following sets of experiments:

- Experiment set 1: Evaluation of the effect of our multistrategy case-based and reinforcement learning method by comparing the performance of the SINS system against the static and random systems.

- Experiment set 2: Evaluation of the effect of the parameters of the case-based reasoning component of the multistrategy learning system.

- Experiment set 3: Evaluation of the effect of the control-interval parameter, which determines how often the adaptation and learning module modifies the schema parameters of the reactive control module.
- Experiment set 4: Evaluation of the effect of changing environmental characteristics, and evaluation of the ability of the systems to adapt to new environments and learn new regularities.

3.2 Discussion of Experimental Results

The results in figures 6 through 10 show that SINS does indeed perform significantly better than its non-learning counterparts. To obtain a more detailed insight into the nature of the improvement, let us discuss the experimental results in more detail.

Experiment set 1: Effect of the multistrategy learning method. We first evaluated the effect of our multistrategy case-based and reinforcement learning method by comparing the performance of the SINS system against the static and random systems. SINS was allowed to learn up to 10 cases (max-cases = 10), each of max-size = 10, with adaptation occurring every control-interval = 4 steps; this was the best configuration of SINS and was selected for the comparison. Figure 6 shows the results obtained for each estimator over 200 randomly generated worlds with 50% clutter. Each graph compares the performance of the five systems discussed above (SINS, staticH, static15, static50, and random) on one estimator. As shown in the graphs, SINS performed as well as or better than the other systems with respect to five out of the six estimators.

Tables 2 and 3 show the final performance of the systems after all the worlds with 15% and 50% clutter, respectively. For example, table 3 shows that SINS successfully navigates 100% of the worlds, the same as the static50 system optimized for 50% cluttered worlds, with 27% fewer virtual collisions. Although the non-learning system was 85% faster in time, the paths it found required the same number of steps. On average, SINS' solution paths were about the same length as those of the static50 system; however, it should be noted that the static50 system was customized for 50% cluttered worlds, whereas SINS improved its performance regardless of the type of environment it was dealing with. The static50 system did not perform as well in 15% cluttered worlds; although the static15 system did better, SINS compared favorably with that system in those worlds. Another important result is that SINS improved its performance independently of the initial values for the schema parameters; for example, when initialized with the same schema parameters as the staticH system, it was able to achieve performance far superior to the staticH system, and comparable with or better than the static15 system in 15% cluttered worlds and the static50 system in 50% cluttered worlds.

The average time per world was the only estimator in which the self-improving system performed worse. The reason for this behavior is that the case retrieval process in SINS is very time consuming. However, since in the physical world the time required for physical execution of a motor action outweighs the time required to select the action, the time estimator is less critical than the distance, steps, and solved worlds estimators. Furthermore, as discussed below, better case organization methods should reduce the time overhead significantly.
The experiments also demonstrate our assumption about the utility of adapting schema parameter values in reactive systems: the number of worlds solved by the navigation system is increased by changing the values of the schema parameters even in a random fashion, although the random changes lead to greater distances traveled. This may be due to the fact that random changes can get the system out of "local minima" situations in which the current settings of its parameters are inadequate. However, consistent changes (i.e., those that follow the regularities captured by our method) lead to better performance than random changes alone.

Figure 6: Cumulative performance results on 50% cluttered worlds.

Figure 7: Effect of max-cases and max-size on 15% cluttered worlds.

Figure 8: Effect of max-cases and max-size on 50% cluttered worlds.

Figure 9: Effect of control-interval on 50% cluttered worlds with max-cases=20 and max-size=15.

                                     static15   static50   staticH   random   SINS
    Percentage of worlds solved      99.5%      100%       98.5%     99%      99.5%
    Average steps per world          69.1       106.4      108.5     73.8     68.9
    Average distance per world       49.9       71.8       49.8      56.5     51.5
    Average actual/optimal distance  1.23       1.76       1.22      1.39     1.27
    Average virtual collisions       0.85       1.53       3.99      2.66     0.62
    Average time per world, ms       57         90         210       94       513

Table 2: Final performance results for 15% cluttered worlds.

Experiment set 2: Effect of case parameters. This set of experiments evaluated the effect of two parameters of the case-based reasoning component of the multistrategy learning system, namely max-cases and max-size. The parameter control-interval was held constant at 4, while max-cases was set to 10, 20, 30 and 40, and max-size was set to 5, 10, 15 and 20. All these configurations of SINS, and the static and random systems, were evaluated using all six estimators on 200 randomly generated worlds of 15% and 50% clutter. Figures 7 and 8 show the results for two of the estimators, average steps and average virtual collisions. All configurations of the SINS system navigated successfully in a comparable or larger percentage of the test worlds than the static systems.
Regardless of the max-cases and max-size parameters, SINS could solve most of the 15% and 50% cluttered worlds. As before, the graphs show that SINS' performance was comparable to that of the static15 system in 15% cluttered worlds, and to the static50 system in 50% cluttered worlds. Thus, even if SINS is initialized with a poor set of schema parameter values, it can discover a good set of values and improve upon its initial performance. Our experiments revealed that, in both 15% and 50% cluttered worlds, SINS needed about 40 worlds to learn enough to be able to perform successfully thereafter using 10 or 20 cases. However, with higher numbers of cases (30 and 40), it took more trials to learn the regularities in the environment. It appears that larger numbers of cases require more trials to train through trial-and-error reinforcement learning methods, and furthermore there is no appreciable improvement in later performance. The max-size parameter has an ap-

Figure 10: Effect of a sudden change in environment (after the 200th world).

                                   static15   static50   staticH   random   SINS
    Percentage of worlds solved    85.5%      100%       82%       100%     100%
    Average steps per world        272.7      143.6      315.5     141      141.9
    Average distance per world     127.1      103.8      126.2     113.7    107.2

Table 3: Final performance results for 50% cluttered worlds.

Figure 1: Integration of knowledge-based symbolic learning and instance-based numeric learning in IBL-SMART.

The search heuristic combines weak, imprecise background knowledge with empirical information from the training data, producing hypotheses that tend to correspond to the background knowledge as much as the data permits and overriding the background knowledge if the data is in conflict with the knowledge. The search algorithm thus realizes a tight and fine-grained integration of deduction and more or less plausible inference. In a sense, the search tree can be regarded as a plausible explanation of the training instances: it relates the training examples to the target concept via the system's domain knowledge and, where that is missing or inappropriate, via empirical arguments.

4.3 The numeric instance-based component

The symbolic learning component produces a concept hypothesis for a discrete, qualitative sub-concept T_i in the form of a DNF expression, where each conjunct describes one particular subtype of the sub-concept. For each of these conjuncts, i.e., for each leaf of the search tree, the instance-based learner collects all the training instances covered by the conjunct and builds an instance store in the form of an interpolation table, using these examples. The output dimension (the dependent variable) is the value of the numeric target function V = T(X). The input dimensions (the independent variables) are chosen to be all the numeric attributes (⊆ P_N) that are shared by all instances assigned to the instance store.
(The reason for this is that in our musical application, not all instances may have defined values for all numeric attributes.) When given a new instance for which to predict the value of the target variable, the system matches the instance against all learned rules, retrieves those instance spaces whose associated rules are matched, and computes a value for the instance's target variable by linear interpolation between the two nearest neighbors in each of the retrieved spaces. As all the dimensions of the tables are numeric, Euclidean distance is used as the similarity measure. If the instance matches more than one rule, and thus target values are computed in several spaces, the target values are simply averaged. Figure 1 summarizes the basic structure of IBL-SMART.

5 An Application of IBL-SMART: Learning Expressive Interpretation

Expressive interpretation is the art of shaping a piece of music by playing it not exactly as given in the written score, but continuously varying tempo, loudness, and other parameters during a performance. Actually, when played exactly as written, most pieces of music would sound utterly mechanical and lifeless. When viewed as a learning problem, the task is to learn, from listening to recordings of known pieces as played by some performer and comparing them to the score of the pieces as actually written, general rules that allow one to decide how to play new melodies, i.e., when to increase or decrease the loudness, when to speed up or slow down, etc.

IBL-SMART was applied to two dimensions of this learning problem, namely, dynamics (loudness variations) and rubato (tempo variations). In the following presentation, we will concentrate on the dimension of dynamics. The level on which examples and target concepts are defined in our application is the level of the individual note: each note in a piece is an example, and the goal is to learn general rules to decide how loud to play any note in some new piece. Obviously, this concept is inherently numeric, as the task is to decide not just whether or not to play some note louder or softer, but exactly by how much. But there are two discrete, qualitative sub-concepts that can naturally be distinguished: crescendo(Note) and decrescendo(Note), which are defined here to mean that a note is to be played louder or softer, respectively, than some standard level.3 These are the target concepts T_i for the symbolic learning component. The precise amounts by which the loudness is to be varied are numeric multiplication factors that are to be learned by the instance-based component.

Training instances are derived from actual performances of piano pieces recorded on an electronic piano via a MIDI interface. At the moment, we are restricting ourselves to single line melodies (with additional information about the underlying harmonic structure of the piece). That is, the input is a sequence of notes, described in terms of various predicates, symbolic and numeric, and accompanied by explicit information about the degree of crescendo or decrescendo that was applied to it by the performer (the target variable). The description language consists of predicates that describe various features of a note and structural features of its surroundings. There are currently 41 operational predicates, of which 21 are symbolic (like followed_by_rest(Note)) and 20 are numeric (like duration(Note,X)).
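The prediction scheme of Section 4.3 admits a compact sketch. The following is a minimal illustration under our own assumptions, not the actual IBL-SMART code: rules are hypothetical objects with a matches method, each store holds (attribute-vector, target) pairs with at least two entries, and the interpolation is rendered as inverse-distance weighting of the two nearest neighbors, which is one plausible reading of the linear interpolation described above.

    import math

    def euclidean(a, b):
        return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

    def predict(instance, rules, tables):
        """Match `instance` against all learned rules, interpolate in every
        retrieved interpolation table, and average the per-table values."""
        predictions = []
        for rule in rules:
            if not rule.matches(instance):      # hypothetical matcher
                continue
            dims, store = tables[rule]          # numeric dims shared by this store
            x = [instance[d] for d in dims]
            # two nearest stored examples under Euclidean distance
            (d1, _, y1), (d2, _, y2) = sorted(
                (euclidean(x, ex), ex, y) for ex, y in store)[:2]
            if d1 + d2 == 0:                    # exact duplicates of the query
                predictions.append((y1 + y2) / 2)
            else:                               # linear inverse-distance weighting
                predictions.append((d2 * y1 + d1 * y2) / (d1 + d2))
        return sum(predictions) / len(predictions) if predictions else None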
Some of these predicates are computed by a preprocessing component which performs a music-theoretic analysis of the given piece in terms of some relevant musical structures (e.g., phrases and various types of patterns such as linear melodic lines (ascending or descending), rhythmic patterns, etc.). Many numeric attributes then describe the relative position of a note in a phrase or in other types of patterns. Note that the number of attributes defined for a given note varies: some notes occur in many patterns, others only in some. So not all numeric attributes are defined for every note.

3 In standard music terminology, crescendo means an increase of loudness with respect to the previous note. For various reasons, we have adopted a somewhat different definition for our musical system. To the learner this should make no difference.

The domain theory we have developed for this problem is based in part on accepted theories of music, and in part represents our personal hypothesis concerning the relation between the structure of music and its plausible interpretations.4 For our presentation here, it suffices to say that the domain theory is mainly in the form of directed and undirected qualitative dependency statements. It is a hierarchy of such dependencies and some crisp rules. The top level of the theory relates the phenomenon of loudness variations to some abstract musical notions by a set of dependencies like

    depends_on(crescendo(Note,X), [stability(Note,Y)]).
    depends_on(crescendo(Note,X), [goal_directedness(Note,Y)]).
    depends_on(crescendo(Note,X), [closure(Note,Y)]).

The first of these can be paraphrased as "Whether crescendo should be applied to a note (and if so, the exact amount X) depends, among other things, on the musical stability Y of the note," and analogously for the other ones. Abstract notions like stability, goal_directedness, and closure are then again related to lower-level musical effects, all the way down to some surface features of training instances, for example:

    q+( metrical_strength(Note,X), stability(Note,Y)).
    q+( harmonic_stability(Note,X), stability(Note,Y)).

"The degree of stability Y of a note is positively proportionally related to (among other things) the metrical strength X of the note," etc., where metrical_strength is a numeric and harmonic_stability is a symbolic attribute (with a discrete, ordered domain of qualitative values). Both are defined as operational.

4 This project also has a strong music-theoretic component which cannot be discussed here. Readers interested in the musicological aspects are referred to [39].

Given this domain theory and some played pieces, the plausible explanation component learns mixed symbolic/numeric rules that discriminate various types of situations where a crescendo or a decrescendo occurs. Each rule describes a particular class of crescendo/decrescendo situations. The IBL component then creates a numeric interpolation table for each such rule and stores the instances associated with the rule. The set of all numeric attributes shared by all the instances covered by a rule defines the dimensions of the respective interpolation table.

6 Two experiments

The system has been tested with pieces from various musical epochs and styles (Bach minuets, Chopin waltzes, even jazz standards). The following sections present two small experiments that show the typical behavior of IBL-SMART.
6.1 Bach minuets

In this experiment, we chose three well-known minuets from J.S. Bach's Notenbüchlein für Anna Magdalena Bach as training and test pieces. The beginnings of the three minuets are shown in Figure 2. All three pieces consist of two parts. The second part of each piece was used for training: they were played on an electronic piano by the author, and recorded through a MIDI interface. After learning, the system was tested on the first parts of the same pieces. In this way, we combined some variation in the training data (three different pieces) with some uniformity in style (three pieces from the same period and with similar characteristics; test data from the same pieces as training data, though different).

Figure 2: Beginnings of three little minuets by J.S. Bach.

The training input consisted of 212 examples (notes), of which 79 were examples of crescendo, and 120 were examples of decrescendo (the rest were played in a neutral way). The system learned 14 rules and, correspondingly, 14 interpolation tables characterizing crescendo situations, and 15 rules for decrescendo. Quite a number of instances were covered by more than one rule. For illustration, here is a simple rule for crescendo:

    crescendo(Note,X) :-
        metrical_strength(Note,S), S > 4.0,
        harmony_stability(Note,high),
        previous_interval(Note,I1), direction(I1,up),
        next_interval(Note,I2), direction(I2,down).

"Apply some crescendo to the current note if the metrical strength of the note is > 4 and the underlying harmony is stable and the direction of the melody from the previous to the current note is up and the direction of the melody from the current note to the next is down."

The quality of the learning results is not easy to measure, as there is no precise criterion to decide whether some performance is right or wrong. Judging the correctness is a matter of listening. Unfortunately, we cannot attach a recording to this article so that the reader can appreciate the results. Instead, Figure 3 depicts a part of one of the training pieces (the second part of the first minuet in G major as played by the author), and also shows the performance created by the system for a test piece (the first part of the same minuet) after learning. The figures plot the relative loudness with which the individual notes were played. A level of 1.0 would be neutral, values above 1.0 represent crescendo (increased loudness), values below 1.0 decrescendo.

Figure 3: Parts of a training piece as played by teacher (top) and test piece as played by learner after learning (bottom).

Figure 4: Part of test piece as played after instance-based learning only.

The reader familiar with standard music notation may appreciate that there are strong similarities in the way similar types of phrases are played by the human teacher and the learner. Note, for instance, the crescendo in lines rising by stepwise motion, and the decrescendo patterns in measures with three quarter notes. Note also the consistent pattern of accents (loud notes) at the beginnings of measures. Given the limited amount of training data, the degree of generalization achieved is quite remarkable. In addition, an inspection of
the symbolic rules learned in this experiment revealed that the system had re-discovered some expression principles that had been formulated years ago by music theorists (for more details, see [39]). Results of similar quality were obtained in experiments with music from other eras and styles (again, see [39]).

6.2 The effect of knowledge

In order to verify the importance of the domain theory and generally the superiority of the complete IBL-SMART system, experiments on the same data were also performed with a system restricted to instance-based learning only, i.e., without the symbolic explanation component. In these experiments, the system simply used all available training examples with all their attributes (numeric and symbolic) to predict values (decide how to play new melodies) by nearest-neighbor interpolation. The similarity metric mapped all the numeric attributes to a uniform scale between 0 and 1, and for symbolic attributes, the distance was defined to be 0 in the case of a match and 1 otherwise (a sketch of such a metric is given at the end of this section). As not all training instances share all numeric dimensions, the system learned as many interpolation tables as there were combinations of numeric attributes occurring in the training data.

Figure 4 shows the result (the system's performance on the same test piece) after learning from the Bach minuets in this way. In this case, 18 rules for crescendo and 12 for decrescendo were learned. There is a marked deterioration in the resulting performance from learning with knowledge (figure 3) to learning without knowledge (figure 4). The differences are easily audible (the performance sounds irregular and unfocused, with no clear stress patterns except in measures 3 and 4), but can also be appreciated by looking at the plots: the performance pattern of figure 4 is rather blurred; it lacks the regularity of figure 3. Obviously, the pure IBL learner does not distinguish as well between relevant types of situations, and generalization does not always seem to focus on musically sensible patterns. Similar results were obtained in experiments with other musical data sets.

These results highlight the role of the domain knowledge: the learned symbolic rules have a focusing effect; they cluster those examples (predictors) together in distinct instance spaces that are not just coincidentally similar, but can be explained in similar ways and thus seem to represent significant sub-classes. The effect is more specialized prediction, structured according to domain principles.

7 Discussion and Related Work

In this paper, we have described a general class of numeric learning problems and a learning model that deals with these problems. The model is composed of two parts: a symbolic, knowledge-intensive search algorithm and a numeric, instance-based prediction method. Applying the model to a learning task requires defining abstract sub-concepts T_i, which introduces a natural distinction between a symbolic and a numeric learning problem, and also produces negative instances for the symbolic learner. All the available (qualitative) background knowledge can then be brought to bear on the symbolic learning problem.
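For concreteness, the similarity metric of the knowledge-free variant in Section 6.2 can be sketched as follows. The combination of the rescaled numeric terms and the 0/1 symbolic terms into a single Euclidean-style distance is our assumption, as are all names; the paper itself only specifies the per-attribute treatment.

    def distance(a, b, numeric_attrs, symbolic_attrs, ranges):
        """Mixed-attribute distance: numeric attributes are rescaled to
        [0, 1] via their observed (min, max) ranges; symbolic attributes
        contribute 0 on a match and 1 otherwise."""
        d = 0.0
        for attr in numeric_attrs:
            lo, hi = ranges[attr]
            d += ((a[attr] - b[attr]) / (hi - lo)) ** 2
        for attr in symbolic_attrs:
            d += 0.0 if a[attr] == b[attr] else 1.0
        return d ** 0.5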
The most important advantages of our method are (1) that it can effectively use qualitative domain knowledge for learning precise numeric target functions; (2) that learning symbolic rules for discrete sub-concepts partitions the space for the instance-based prediction component; this results in specialized instance stores and very specific prediction behavior; and (3) as a side-effect, that learning rules for discrete sub-concepts clusters the examples around meaningful abstractions, which may be useful for other tasks.

There is an interesting parallel between our method and the symbolic-numeric clustering algorithm CSNAP that has recently been proposed by Whitehall and Sirag [35]. CSNAP also learns to predict numeric target attributes. It first uses statistical criteria to cluster the examples into disjoint groups, and then searches for explanations of these clusters in terms of symbolic descriptions. In that process, instances may be moved between clusters so as to permit more concise or more plausible descriptions. The CSNAP approach is somewhat orthogonal to ours, but is guided by the same motivation: to group the learning examples into clusters that are specialized predictors and at the same time are similar with respect to given background knowledge (or, more generally, amenable to concise symbolic description).

The IBL-SMART learning method is related to a number of other systems described in the literature. An important source of influence was the work of G. DeJong [9, 10] on learning and plausible reasoning. He had presented a system that combined a very weak notion of plausible inference over single cases with interpolation over numeric target variables. Our approach departs from his, among other things, in the variety of types of background knowledge and in the use of a heuristically guided, search-based, multi-instance explanation algorithm that permits much more control over the learning process. Not only does this search introduce a strong notion of empirical plausibility by taking into account the distribution of instances; the use of an explicit search heuristic also makes it possible to exploit the qualitative knowledge contained in qualitative dependencies (q+, q-) to compute the relative plausibility of arguments. The best-first search is likely to find explanations that are most plausible overall, both with respect to the knowledge and the data. DeJong's system, on the other hand, simply assumed that the syntactically simplest explanation was also the most plausible one. Another way of exploiting qualitative relationships similar to q+ and q- (though in a more process-oriented framework) has recently been proposed in [5], where the known direction of the qualitative dependencies is used to severely constrain the search space for an inductive learner.

There is also a close relation between IBL-SMART and the work on plausible justification trees by Tecuci [30]. In both cases, the goal is a fine-grained integration of various types of inference in the framework of creating plausible explanations. IBL-SMART takes a more data-driven approach, in that it learns from all the instances at once and uses the statistical information contained in the data (in addition to the domain knowledge) to guide a heuristic search. As noted, the basic search framework is related to Bergadano and Giordana's method [3]. Generally, we see two main advantages in constructing plausible explanations from whole sets of examples at once.
The first is, of course, the empirical support afforded by the data. All plausible arguments have an implicit inductive component, so it is really the actual observations that lend more or less support to such arguments and the generalizations that are drawn from them. A second advantage of the multi-instance explanation approach is that it suggests a natural way to deal with certain types of noise in the training data. IBL-SMART implements a simple noise handling mechanism. The search algorithm incorporates two thresholds: the evaluation function accepts only nodes (conjuncts) that cover some minimum number of positive instances, and the termination criterion allows the search to halt when a certain percentage (usually less than 100%) of positive instances are covered. Thus, the system can ignore rare instances that look like exceptions, but are really the result of noise. By varying these thresholds, the system can be tuned to the characteristics of different application domains.5

The main disadvantage of the multi-instance explanation approach is its non-incrementality. Whether that is a problem depends on the particular application. Also, it is possible that, using techniques similar to those in [37], IBL-SMART could be made to learn incrementally without losing too much in effectiveness.

Finally, remember that the system IBL-SMART presented here is just one particular incarnation of a more general approach. We have found it convenient to use a best-first search algorithm as the basis for the plausible explanation component, as it constructs an explicit search tree and allows us to integrate various sources of knowledge into the learning process via the search heuristic. However, with appropriate modifications and extensions, other symbolic learners capable of utilizing incomplete and inconsistent knowledge - for instance, FOCL [26] - might be used just as well in this framework. Similarly, more elaborate strategies could be used in the instance-based component. Aha et al. [1, 17] have described a number of instance-based learning methods, both for symbolic classification and for numeric prediction tasks, that could be applied within a framework such as ours. Also, available domain knowledge about the relative degree of relevance of numeric attributes or about the domains and typical values of numeric variables could be used to devise more sophisticated similarity metrics, tailored to the particular application. Given this generality, we may expect the IBL-SMART approach to be applicable to a wide variety of domains.

5 In fact, the musical experiments described in the previous section were characterized very strongly by noise in the data, originating from the author's imperfect piano technique, from the imprecise boundaries between the abstract sub-concepts crescendo and decrescendo, and from imprecision inherent in the domain itself (there are simply no 100% laws as to how some passage must and will be played; variation will invariably happen). The parameter settings in the experiments reported in section 6 were as follows: each leaf of the tree had to cover at least 5% of the positive training examples, and the search terminated when 80% of the examples were covered.

Acknowledgments

I would like to thank Robert Trappl for his support of this work. This research is sponsored in part by the Austrian Fonds zur Förderung der wissenschaftlichen Forschung (FWF) under grant P8756-TEC.
Financial support for the Austrian Research Institute for Artificial Intelligence is provided by the Austrian Federal Ministry for Science and Research.

References

[1] D.W. Aha, D. Kibler and M.K. Albert, Instance-Based Learning Algorithms. Machine Learning 6(1), pp. 37-66 (1991).

[2] P.T. Baffes and R.J. Mooney, Symbolic Revision of Theories with M-of-N Rules. In Proceedings of the Thirteenth International Joint Conference on Artificial Intelligence (IJCAI-93), Chambéry, France, pp. 1135-1142 (1993).

[3] F. Bergadano and A. Giordana, A Knowledge Intensive Approach to Concept Induction. In Proceedings of the Fifth International Conference on Machine Learning, Ann Arbor, MI, pp. 305-317 (1988).

[4] F. Bergadano, A. Giordana and S. Ponsero, Deduction in Top-Down Inductive Learning. In Proceedings of the Sixth International Workshop on Machine Learning, Ithaca, N.Y., pp. 23-25 (1989).

[5] P. Clark and S. Matwin, Using Qualitative Models to Guide Inductive Learning. In Proceedings of the 10th International Conference on Machine Learning, Amherst, MA (1993).

[6] A. Collins and R.S. Michalski, The Logic of Plausible Reasoning: A Core Theory. Cognitive Science 13(1), pp. 1-49 (1989).

[7] A.P. Danyluk, Finding New Rules for Incomplete Theories: Explicit Biases for Induction with Contextual Information. In Proceedings of the Sixth International Workshop on Machine Learning, Ithaca, N.Y., pp. 34-36 (1989).

[8] A.P. Danyluk, An Integration of Analytical and Empirical Learning. In Proceedings of the First International Workshop on Multistrategy Learning (MSL-91), Harpers Ferry, WV, pp. 191-206 (1991).

[9] G. DeJong, Explanation-Based Learning with Plausible Inferencing. In Proceedings of the Fourth European Working Session on Learning (EWSL-89), Montpellier, France, pp. 1-10 (1989).

[10] G. DeJong, Explanation-Based Control: An Approach to Reactive Planning in Continuous Domains. In Proceedings of the DARPA Workshop on Innovative Approaches to Planning, Scheduling, and Control, San Diego, CA, pp. 325-336 (1990).

[11] G. DeJong and R.J. Mooney, Explanation-Based Learning: An Alternative View. Machine Learning 1, pp. 145-176 (1986).

[12] T.E. Fawcett, Learning from Plausible Explanations. In Proceedings of the Sixth International Workshop on Machine Learning, Ithaca, N.Y., pp. 37-39 (1989).

[13] U. Fayyad and K. Irani, On the Handling of Continuous-Valued Attributes in Decision Tree Generation. Machine Learning 8(1), pp. 87-102 (1992).

[14] N.S. Flann and T.G. Dietterich, A Study of Explanation-Based Methods for Inductive Learning. Machine Learning 4(2), pp. 187-226 (1989).

[15] K.D. Forbus, Qualitative Process Theory. Artificial Intelligence 24(1-3), pp. 85-169 (1984).

[16] J. Genest, S. Matwin, and B. Plante, Explanation-Based Learning with Incomplete Theories: A Three-Step Approach. In Proceedings of the Seventh International Conference on Machine Learning, Austin, Texas, pp. 286-294 (1990).

[17] D. Kibler, D.W. Aha and M.K. Albert, Instance-based Prediction of Real-valued Attributes. Computational Intelligence 5, pp. 51-57 (1989).

[18] S. Mahadevan, Using Determinations in EBL: A Solution to the Incomplete Theory Problem. In Proceedings of the Sixth International Workshop on Machine Learning, Ithaca, N.Y., pp. 320-325 (1989).

[19] S. Mahadevan and P. Tadepalli, On the Tractability of Learning from Incomplete Theories. In Proceedings of the Fifth International Conference on Machine Learning, Ann Arbor, MI, pp. 235-241 (1988).
[20] R.S. Michalski, A Theory and Methodology of Inductive Learning. In R.S. Michalski, J.G. Carbonell and T.M. Mitchell (Eds.), Machine Learning: An Artificial Intelligence Approach, vol. I. Palo Alto, CA: Tioga (1983).

[21] R.S. Michalski, Inferential Learning Theory as a Conceptual Basis for Multistrategy Learning. Machine Learning 11(2/3), Special Issue on Multistrategy Learning, pp. 111-151 (1993).

[22] T.M. Mitchell, R.M. Keller and S.T. Kedar-Cabelli, Explanation-Based Generalization: A Unifying View. Machine Learning 1(1), pp. 47-80 (1986).

[23] R.J. Mooney, Induction over the Unexplained: Using Overly-General Domain Theories to Aid Concept Learning. Machine Learning 10(1), pp. 79-110 (1993).

[24] R.J. Mooney and D. Ourston, A Multistrategy Approach to Theory Refinement. In Proceedings of the First International Workshop on Multistrategy Learning (MSL-91), Harpers Ferry, WV, pp. 115-130 (1991).

[25] M.J. Pazzani, Integrated Learning with Incorrect and Incomplete Theories. In Proceedings of the Fifth International Conference on Machine Learning, Ann Arbor, MI, pp. 291-297 (1988).

[26] M. Pazzani and D. Kibler, The Utility of Knowledge in Inductive Learning. Machine Learning 9(1), pp. 57-94 (1992).

[27] S. Rajamoney and G.F. DeJong, The Classification, Detection and Handling of Imperfect Theory Problems. In Proceedings of the 10th International Joint Conference on Artificial Intelligence (IJCAI-87), Milano, Italy, pp. 205-207 (1987).

[28] S.J. Russell, Analogical and Inductive Reasoning. Ph.D. thesis, Report STAN-CS-87-1150, Stanford University, Stanford, CA (1987).

[29] G. Tecuci, Learning as Understanding the External World. In Proceedings of the First International Workshop on Multistrategy Learning (MSL-91), Harpers Ferry, WV, pp. 49-64 (1991).

[30] G. Tecuci, Plausible Justification Trees: A Framework for Deep and Dynamic Integration of Learning Strategies. Machine Learning 11(2/3), Special Issue on Multistrategy Learning, pp. 237-261 (1993).

[31] G. Tecuci and Y. Kodratoff, Apprenticeship Learning in Imperfect Domain Theories. In Y. Kodratoff and R.S. Michalski (Eds.), Machine Learning: An Artificial Intelligence Approach, vol. III. San Mateo, CA: Morgan Kaufmann, pp. 514-552 (1990).

[32] G. Tecuci and R.S. Michalski, A Method for Multistrategy Task-adaptive Learning Based on Plausible Justifications. In Proceedings of the Eighth International Workshop on Machine Learning, Evanston, IL, pp. 549-553 (1991).

[33] K. VanLehn, Learning One Subprocedure per Lesson. Artificial Intelligence 31(1), pp. 1-41 (1987).

[34] B.L. Whitehall, S.C. Lu and R.E. Stepp, Theory Completion Using Knowledge-Based Learning. In Proceedings of the First International Workshop on Multistrategy Learning (MSL-91), Harpers Ferry, WV, pp. 144-159 (1991).

[35] B.L. Whitehall and D.J. Sirag, Conceptual Clustering of Events Using Statistical Split Criteria. In Proceedings of the Second International Workshop on Multistrategy Learning, Harpers Ferry, WV, pp. 166-179 (1993).

[36] G. Widmer, A Tight Integration of Deductive and Inductive Learning. In Proceedings of the Sixth International Workshop on Machine Learning, Ithaca, N.Y., pp. 11-13 (1989).

[37] G. Widmer, An Incremental Version of Bergadano & Giordana's Integrated Learning Strategy. In Proceedings of the Fourth European Working Session on Learning (EWSL-89), Montpellier, France, pp. 227-238 (1989).
[38] G. Widmer, Learning with a Qualitative Domain Theory by Means of Plausible Explanations. In R.S. Michalski and G. Tecuci (Eds.), Machine Learning: A Multistrategy Approach, vol. IV. Los Altos, CA: Morgan Kaufmann (1993).

[39] G. Widmer, Modeling the Rational Basis of Musical Expression. Computer Music Journal (in press).

[40] D.C. Wilkins, Knowledge Base Refinement as Improving an Incorrect and Incomplete Domain Theory. In Y. Kodratoff and R.S. Michalski (Eds.), Machine Learning: An Artificial Intelligence Approach, vol. III. San Mateo, CA: Morgan Kaufmann, pp. 493-513 (1990).

EXTENDING THEORY REFINEMENT TO M-of-N RULES

Paul T. Baffes and Raymond J. Mooney
Department of Computer Sciences
University of Texas
Austin, Texas 78712-1188 USA

Keywords: artificial intelligence, multistrategy learning, theory refinement

Edited by: Gheorghe Tecuci

Received: May 26, 1993   Revised: October 15, 1993   Accepted: October 15, 1993

In recent years, machine learning research has started addressing a problem known as theory refinement. The goal of a theory refinement learner is to modify an incomplete or incorrect rule base, representing a domain theory, to make it consistent with a set of input training examples. This paper presents a major revision of the EITHER propositional theory refinement system. Two issues are discussed. First, we show how run time efficiency can be greatly improved by changing from an exhaustive scheme for computing repairs to an iterative greedy method. Second, we show how to extend EITHER to refine M-of-N rules. The resulting algorithm, NEITHER (New EITHER), is more than an order of magnitude faster and produces significantly more accurate results with theories that fit the M-of-N format. To demonstrate the advantages of NEITHER, we present experimental results from two real-world domains.

1 Introduction

Recently, a number of machine learning systems have been developed that use examples to revise an approximate (incomplete and/or incorrect) domain theory [4, 11, 18, 3, 21, 7]. Most of these systems revise theories composed of strict if-then rules (Horn clauses). However, many concepts are best represented using some form of partial matching or evidence summing, such as M-of-N concepts, which are true if at least M of a set of N specified features are present in an example. There has been some work on the induction of M-of-N rules demonstrating the advantages of this representation [17, 9]. Other work has focused on revising rules that have real-valued weights [19, 6]. However, revising theories with simple M-of-N rules has not previously been addressed. Since M-of-N rules are more constrained than rules with real-valued weights, they provide a stronger bias and are easier to comprehend.

This paper presents a major revision of the EITHER propositional theory refinement system [11, 12] that is significantly more efficient and is also capable of revising theories with M-of-N rules. EITHER is inefficient because it computes a potentially exponential number of repairs for each failing example. The new version, NEITHER (New EITHER), computes only the single best repair for each example, and is therefore much more efficient. Also, because it was restricted to strict Horn-clause theories, the old EITHER algorithm could not produce as accurate results as a neural-network revision system called KBANN on a domain known as the DNA promoter problem [18, 19].
Essentially, this is because some aspects of the promoter concept fit the M-of-N format. Specifically, there are several potential sites where hydrogen bonds can form between the DNA and a protein; if enough of these bonds form, promoter activity can occur. EITHER attempts to learn this concept by forming a separate rule for each potential configuration by deleting different combinations of antecedents from the initial rules. Since a combinatorial number of such rules is needed to accurately model an M-of-N concept, the generality of the resulting theory is impaired. The new NEITHER algorithm, however, includes the ability to modify a theory by changing thresholds of M-of-N rules.

    a <- b & c        d <- h & i & j
    b <- d            e <- k & l
    b <- e            f <- m & n
    c <- f            g <- o & p & q
    c <- g

    example: i, l, n, q true; h, j, k, m, o, p false.

Figure 1: Partial proofs for unprovable positive example. Unprovable antecedents are shown with dotted lines.

Including threshold changes as an alternative method for covering misclassified examples was easily incorporated within the basic EITHER framework. To demonstrate the advantages of NEITHER, we present experimental results comparing it to EITHER and various other systems on refining the DNA promoter domain theory. NEITHER runs more than an order of magnitude faster than EITHER and produces a significantly more accurate theory with minor revisions that are easy to understand. We also present results showing NEITHER's ability to repair faults in a theory used to teach the diagnosis of shock to novice nursing students. We show that NEITHER is able to restore significantly damaged theories to near perfect accuracy.

2 Theory Revision Algorithm

2.1 The EITHER Algorithm

The original EITHER theory refinement algorithm has been presented in various levels of detail in [11, 12, 10]. It was designed to revise propositional Horn-clause theories. For EITHER, a theory is a set of propositional Horn-clause rules such as those shown in the top half of Figure 1. Each theory is assumed to function as a classification system whereby examples are labeled as belonging to one of a given set of categories. Examples are vectors of feature-value pairs listing the value corresponding to each feature, as well as the category into which the example should be classified. As an illustration, one might imagine a diagnostic theory for determining whether or not a given product coming off an assembly line passes inspection. The categories for such a rule base might be "pass" and "fail" and the examples would consist of whatever measurements could be made on the product as part of the inspection test.

In revising an incorrect theory, note that EITHER can fix either overly-general or overly-specific rules through a combination of deductive, abductive and inductive techniques, as shown in Figure 2. An overly-general theory is one that causes an example (called a failing negative) to be classified in categories other than its own. EITHER specializes existing antecedents, adds new antecedents, and retracts rules to fix these problems. An overly-specific theory causes an example (called a failing positive) not to be classified in its own category. EITHER retracts and generalizes existing antecedents and learns new rules to fix these problems.
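To make the representation concrete, the following sketch (ours, not EITHER's actual code) encodes the Figure 1 theory as a Python dictionary and evaluates it over the example's observable features; the optional m_of_n map anticipates the threshold rules introduced in Section 2.3, and all names are illustrative.

    THEORY = {         # Figure 1, top half: consequent -> alternative rule bodies
        "a": [["b", "c"]],
        "b": [["d"], ["e"]],
        "c": [["f"], ["g"]],
        "d": [["h", "i", "j"]],
        "e": [["k", "l"]],
        "f": [["m", "n"]],
        "g": [["o", "p", "q"]],
    }
    EXAMPLE = {"i", "l", "n", "q"}      # observable features that are true

    def provable(prop, theory, facts, m_of_n=None):
        """A proposition holds if it is an observed fact, or if for some
        rule at least M of its N antecedents hold.  For strict Horn clauses
        M = N; m_of_n optionally maps (consequent, rule index) to a lower
        threshold M."""
        if prop in facts:
            return True
        for idx, body in enumerate(theory.get(prop, [])):
            m = (m_of_n or {}).get((prop, idx), len(body))
            if sum(provable(ant, theory, facts, m_of_n) for ant in body) >= m:
                return True
        return False

    print(provable("a", THEORY, EXAMPLE))          # False: a failing positive
    print(provable("a", THEORY, EXAMPLE,           # True: e <- k & l with M = 1
                   m_of_n={("e", 0): 1, ("f", 0): 1}))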
Unlike other theory revision systems that perform hill-climbing (and are therefore subject to local maxima), EITHER is guaranteed to fix any arbitrarily incorrect propositional Horn-clause theory [10]. The basic algorithm used by EITHER for both generalization and specialization is shown in the top half of Figure 3. There are three steps. First, all possible repairs for each failing example are computed. Next, EITHER enters a loop to compute a subset of these repairs that can be applied to the theory to fix all of the failing examples. This subset is called a cover. Repairs are ranked according to a benefit-to-cost ratio that trades off the number of examples covered against the size of the repair and the number of new failing examples it creates. The best repair is added to the cover on each iteration. Lastly, the repairs in the cover are applied to the theory. If the application of a repair over-compensates by creating new failing examples, EITHER passes the covered examples and the new failing examples to an induction component.1 The results of the induction are added as a new rule when generalizing or as additional antecedents when specializing.

1 EITHER uses a version of ID3 [13] for its induction.

Figure 2: Block diagram of EITHER.

    EITHER Main Loop
        Compute all repairs for each example
        While some examples remain uncovered
            Add best repair to cover set
            Remove examples covered by repair
        end
        Apply repairs in cover set to theory

    NEITHER Main Loop
        While some examples remain
            Compute a single repair for each example
            Apply best repair to theory
            Remove examples fixed by repair
        end

Figure 3: Comparison of EITHER and NEITHER algorithms.

The time consuming part of this algorithm is the first step, where all repairs for a given failing example are found. Figure 1 illustrates this process for theory generalization, where EITHER is searching for leaf-rule antecedent deletions to correct failing positive examples. A leaf rule is a rule whose antecedents include an observable or an intermediate concept that is not the consequent of any existing rule. The upper half of the diagram shows an input theory both as rules (on the left) and as an AND-OR graph. The lower half of the diagram shows a hypothetical failing positive example and its partial proofs. A partial proof is one in which some antecedents cannot be satisfied. From these proofs there are four possible repairs which will fix the example, corresponding to the four partial proofs. In each repair, the dotted lines represent antecedents which cannot be proved and must therefore be removed from the given rule(s). Thus, for the leftmost partial proof, h and j cannot be proved in the rule d <- h & i & j, and m cannot be proved for rule f <- m & n, so the repair for this partial proof is: delete (h, j, m) from their respective rules. Likewise, the three other repairs are: delete (h, j, o, p), delete (k, m) and delete (k, o, p). Theory specialization follows a similar process to return sets of leaf-rule rule deletions which fix individual failing negative examples.

2.2 Speeding Up EITHER

We have recently implemented a new version of EITHER (NEITHER) that takes a different approach, as shown in the bottom half of Figure 3. Two new algorithms form the basis for the difference between EITHER and NEITHER. First, calculation of repairs is now achieved in linear time. Second, all searches through the theory (for deduction, antecedent deletion and rule deletion) are optimized in NEITHER to operate in linear time by marking the theory to avoid redundant subproofs.
NEITHER abandons the notion of searching for all partial proofs in favor of a greedy approach which rapidly selects a single best repair for each example. The three steps of the old EITHER algorithm can then be integrated into a single loop (see Figure 3). Rather than computing all partial proofs, NEITHER works bottom-up, constructing a single set of deletions. When multiple options exist, NEITHER alternates between returning the smallest option and returning the union of the options, depending on whether the choice involves an AND or an OR node. For generalization, deletions are unioned at AND nodes because all unprovable antecedents must be removed to make the rule provable. At OR nodes, only the smallest set of deletions is kept since only one rule need be provable. For specialization, these choices are reversed. Results are unioned at OR nodes to disable all rules which fire for a faulty concept. At AND nodes, the smallest set of rule deletions is selected since any single failure will disable a rule.

To illustrate how repairs are computed in linear time, refer again to Figure 1. The antecedent deletion calculations for this example would begin at the root of the graph, recursively calling nodes b and c. Deletion for node b would then recurse on nodes d and e. Since h, j and k are false, node d returns (h,j) and node e returns (k). When the recursion returns back to node b, a choice must be made between the results from nodes d and e because the theory is being generalized and node b is an OR node. Since node e requires fewer deletions, its deletions are chosen as the return value for node b. Recursion for node c follows a similar pattern: node f returns (m), node g returns (o,p), and node c chooses the smaller result from node f as its return value. Finally, nodes b and c return their values to node a. Now, since node a is an AND node and the theory is being generalized, the results from b and c are combined. The final repair returned from node a is delete (k,m). Thus the rule e <- k & l is generalized to e <- l, and the rule f <- m & n is generalized to f <- n; a compact sketch of this computation appears below.

Note that this algorithm is linear in the size of the theory. No node is visited more than once, and the computation for choosing among potential deletions must traverse the length of each rule at most once. The final repair is also minimum with respect to the various choices made along the way; it is not possible to find a smaller repair that will satisfy the example with the given set of rules. Of course, once a repair is applied to the theory it will affect subsequent repair calculations because the theory will change. Thus, although each repair is minimum with respect to the state of the theory from which it was calculated, the total sum of all repairs may not be minimum due to ordering effects. The only way to reach such a global minimum is to do an exhaustive search, which is exponential in the size of the theory. This new algorithm trades the complete information available in the partial proofs for speed in computation.

2.3 Adding M-of-N Rules to NEITHER

Expanding NEITHER to handle M-of-N rules involves both a change in the syntax and interpretation of rules as well as a modification to the types of revisions which can be made to a given theory. With M-of-N rules, there are six types of revisions. As before, antecedents may be deleted or rules may be added to generalize the theory, and antecedents may be added or rules deleted to specialize the theory. The two new revisions are to increase or decrease the threshold: decreasing generalizes a rule and increasing specializes it.
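Before turning to those changes, the bottom-up deletion computation of Section 2.2 can be stated compactly. The following is our own illustrative reconstruction (generalization case only; specialization reverses the AND/OR choices), reusing THEORY and EXAMPLE from the earlier sketch; with memoization added, each node would be visited only once, matching the linear bound claimed above.

    def deletions_to_prove(prop, theory, facts):
        """Smallest set of antecedent deletions that makes `prop` provable.
        AND (rule body): union the deletions needed by its antecedents.
        OR (alternative rules): keep the alternative needing fewest deletions."""
        if prop in facts:
            return set()
        rules = theory.get(prop, [])
        if not rules:                      # unprovable leaf: delete it from its rule
            return {prop}
        best = None
        for body in rules:                 # OR node
            needed = set()
            for ant in body:               # AND node
                needed |= deletions_to_prove(ant, theory, facts)
            if best is None or len(needed) < len(best):
                best = needed
        return best

    print(deletions_to_prove("a", THEORY, EXAMPLE))    # {'k', 'm'}, as in the text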
The two new revisions are to increase or decrease the threshold: decreasing generalizes a rule and increasing specializes it. To incorporate these two new revisions, NEITHER must be changed in four places. First, the computation of a repair for each failing example must take thresholds into account. For generalization, one need only delete enough antecedents to make the rule provable; there is no need to delete all false antecedents if the rule has a threshold. For example, if the rule for e in Figure 1 had a threshold of 1 there would be no need to delete k to prove this rule. A similar accounting for thresholds is required for computing rule deletions for specialization. Note that during generalization the threshold of each rule from which antecedents are deleted must be decreased by the number of antecedents deleted to account for the smaller size of the rule. Second, NEITHER must compute threshold repairs. Calculating threshold changes can be done in conjunction with the computation of antecedent and rule deletion repairs since it is directly related to how many of antecedents of a rule are provable. For generalization, we change the threshold to the number of antecedents which are provable. In specialization, we set the threshold to one more than the number of provable antecedents. Third, a mechanism must be provided for se- Generalization Specialization change resulting rule b c be change resulting rule b c be orig. rule a <- 2 of (b.c) N N Y orig. rule a *- 1 of (b,c) Y Y Y threshold -1 a <- 1 of (b.c) Y Y Y threshold +1 a <- 2 of (b,c) N N Y delete b a <— c N Y Y delete rule none N N N Table 1: Comparison of Revisions. lecting between a threshold change and a deletion. Effectively, this amounts to deciding which type of revision to try first. The philosophy used in NEITHER is to try the most aggressive changes initially in the hopes that the resulting repair will cover more examples. If the repair creates new failing examples, the less ambitious repairs are tried in turn with induction used as a last resort. During generalization, more radical repairs are those which create more general rules (i.e., rules which can prove more examples). In specialization, the opposite is true. As with EITHER, if all changes result in new failing examples, the algorithm falls back to induction to learn new rules or add new antecedents. Table 1 compares equivalent threshold and deletion changes for generalization and specialization. The columns labeled with b, c and be indicate whether the corresponding rule will conclude a when just b, just c or both b and c are true. Note that in both cases, the threshold change results in a more general rule. This means that threshold changes should be tried before antecedent deletions during generalization, but tried after rule deletions during specialization. This is because during generalization, the most aggressive changes are those which generalize the most, but during specialization, the most aggressive changes are those which generalize the least. Fourth and finally, the induction component of NEITHER must be altered slightly to accommodate threshold rules. When the application of a repair causes new failing examples to occur, NEITHER resorts to induction as did EITHER. The result of the induction cannot, however, simply be added to the theory as before. Table 2 illustrates the problem. 
The original rule shown can be used to prove both the positive and negative examples, and deleting this rule or incrementing its threshold only prevents the positive example from being proved. Assume that induction returns a new feature, d, which can be used to distinguish the two examples (i.e., d is true for the positive example but false for the negative example). Because the original rule has a threshold, adding d directly will still allow both examples to prove the rule. This problem remains even if one tries to increment the threshold in addition to adding d. Instead, the rule must be split by renaming the consequent of the original rule, and creating a new rule with the renamed consequent and the results of induction as the new rule's antecedent list.

                          pos. example   neg. example
  features                b, ¬c, d       b, c, ¬d
  orig. rule
    a ← 1 of (b,c)        Y              Y
  add to rule
    a ← 1 of (b,c,d)      Y              Y
  split rule
    x ← 1 of (b,c)        Y              Y
    a ← x, d              Y              N

  Table 2: Induced Antecedent Addition.

3 Experimental Results

In the experiments which follow, there are two versions of the NEITHER algorithm. The first has only the speedup changes and is termed simply NEITHER. The second includes the M-of-N refinements and is termed NEITHER-MOFN.

3.1 The DNA Promoter Domain

3.1.1 Experimental Design

We tested both NEITHER and NEITHER-MOFN against other classification algorithms using the DNA promoter sequences data set [20]. This data set involves 57 features, 106 examples, and 2 categories. The theory provided with the data set is supposed to recognize promoters in strings of nucleotides. A promoter is a genetic region which initiates the first step (transcription) in the expression of an adjacent gene. However, the original theory has an initial classification accuracy of only 50%. We selected this particular data set because the original EITHER algorithm was outperformed by other systems on this data, due to the M-of-N qualities required to reason correctly in this domain. In addition to testing EITHER, NEITHER and NEITHER-MOFN, we ran experiments using ID3 [13], backpropagation [16] and RAPTURE [6] (a revision system based on certainty factors). We also included data on the performance of KBANN on this data set as reported in [20].

The experiments proceeded as follows. Each data set was divided into training and test sets. Training sets were further divided into subsets, so that the algorithms could be evaluated with varying amounts of training data. After training, each system's accuracy was recorded on the test set. To reduce statistical fluctuations, the results of this process of dividing the examples, training, and testing were averaged over 25 runs. The random seeds for the backpropagation algorithm were reset for each run. Training time and test set accuracy were recorded for each run. Statistical significance was measured using a Student t-test for paired difference of means at the 0.05 level of confidence (i.e., 95% certainty that the differences are not due to random chance).

3.1.2 Results

Figure 4: DNA Test Set Accuracy.

The results of our experiments are shown in the three graphs of Figures 4, 5 and 6.
Figure 4 compares the learning curves of the systems tested, showing how predictive accuracy on the test set changes as a function of the number of training examples. As can be seen, NEITHER-MOFN's performance was significantly better than all other systems except RAPTURE and KBANN. (Technically, the last difference between backpropagation and NEITHER-MOFN was only significant at the 0.1 level.) RAPTURE outperformed NEITHER-MOFN with small numbers of training examples, but their accuracy was comparable with larger inputs. NEITHER's accuracy was on par with backpropagation, but was lower than EITHER for small training sets and higher than EITHER for large training sets. Note that Figure 4 is not a direct comparison of NEITHER and KBANN, since the reported KBANN results were compiled from different subsets of the DNA promoter sequences data set. ID3 had significantly lower accuracy than the other systems.

Figure 5 shows a comparison of training times. Both NEITHER-MOFN and NEITHER were more than an order of magnitude faster than backpropagation and EITHER. Only ID3 ran faster than NEITHER-MOFN.

We also collected data on the average complexity of the revised theories produced by both NEITHER and NEITHER-MOFN. Complexity was measured as the total size, i.e., the total number of literals in the theory. The results are shown in Figure 6. As can be seen from this graph, NEITHER-MOFN not only produces less complex resulting theories but also produces theories closer in size to the original.

Figure 5: DNA Training Time Comparison (90 examples).
Figure 6: DNA Concept Complexity.

3.1.3 Discussion

Many of our expectations were borne out by the experimental results. Both NEITHER and NEITHER-MOFN ran more than an order of magnitude faster than EITHER, due to the optimized algorithms discussed in Section 2. NEITHER-MOFN's increase in accuracy was also expected, since the new algorithm is able to concentrate on making M-of-N revisions directly. Also, the fact that NEITHER-MOFN generates less complex theories is not surprising, again because it can directly modify threshold values rather than create new rules. In short, by adding one more operator to the generalization and specialization processes, NEITHER-MOFN is able to accurately revise a theory known to be difficult for symbolic systems, without having to sacrifice the efficiency of a symbolic approach. Finally, the most comparable learning-curve results from [20] indicate that KBANN's accuracy in the promoter domain is about the same as NEITHER-MOFN's.

The most surprising result of the experiments was the difference in accuracy between the original EITHER algorithm and NEITHER. As stated above, EITHER was more accurate with fewer training examples, but its accuracy dropped off relative to NEITHER as the number of examples increased. One possible explanation for this behavior lies in the difference between how the two systems compare potential revisions (see Figure 3). Recall that EITHER computes multiple repairs for each example, but does so only once. NEITHER, by contrast, computes one repair per example each time through its main loop.
As a result, with fewer training examples, EITHER has more potential revisions to examine, apparently giving it an edge over NEITHER. Even though NEITHER computes new repairs each time it iterates, there may not be enough iterations to generate as rich a set of deletions as is produced in one step by EITHER. On the other hand, as the number of training examples grows, NEITHER undergoes many more iterations, each computing new repairs in light of any previous revisions. By contrast, EITHER computes its repairs for each example independently, missing out on any interactions which might occur when the revisions are applied to the theory in a particular order. Capturing these interactions may be one reason NEITHER outperforms EITHER with large numbers of examples.

3.2 The Shock Diagnosis Domain

A second set of experiments was run to test NEITHER's ability to repair faulty theories. The data for this experiment was borrowed from a separate research project designed to test nursing students' retention of concepts for determining whether a patient is suffering from shock [8]. Each patient can be labeled in one of four ways: as suffering from hypovolemic, cardiogenic, or vascular-tone shock, or as not in shock. A theory for diagnosing shock was written using the definitions and examples presented to the students, together with consultations with a medical expert. The final theory is shown in Figure 7.

  hypovolemic ← shock & disrupted-blood-volume
  cardiogenic ← shock & ineffective-pumping-action
  vascular-tone ← shock & disrupted-vascular-tone
  shock ← 3 of ((pulse rising) (respiration rising)
                (blood-pressure falling) (urine-output low)
                (mental-status strained) skin-abnormal)
  shock ← (patient pregnant) & (symptoms blood-loss) &
          (mental-status strained) & skin-abnormal
  skin-abnormal ← (skin cool-clammy)
  skin-abnormal ← (skin hot-flushed)
  disrupted-blood-volume ← (symptoms fluid-loss)
  disrupted-blood-volume ← (symptoms blood-loss)
  ineffective-pumping-action ← (symptoms cardiac)
  disrupted-vascular-tone ← (symptoms infection)
  disrupted-vascular-tone ← (symptoms allergy)
  disrupted-vascular-tone ← (symptoms neural)

  Figure 7: Shock Domain Theory.

This data set was chosen for two reasons. First, it represents another real-world domain which has an M-of-N flavor (the "shock" concept in the theory is represented using a threshold rule). Second, NEITHER's ability to refine theories in this domain is the centerpiece of another of our research efforts in student modeling [1]. In short, a student modeling algorithm must be able to recover from a variety of deviations from the correct theory in order to be useful to a variety of students.

3.2.1 Experimental Design

The basic design of this experiment was to introduce faults into the correct shock theory and test how well NEITHER-MOFN could refine the result. The following five alterations were injected into the theory with equal probability: antecedent deletions, antecedent additions, rule deletions, rule additions, and threshold changes. A modification factor passed to the algorithm indicated the percentage of antecedents in the theory to be changed. Consequently, a modification factor of 0.1 indicated that roughly 10% of the antecedents in the theory would be modified.

Data for training and testing was drawn from a pool of 150 examples equally representative of the three categories of shock. These 150 examples were randomly generated using the correct theory. For each category, 40 positive examples and 10 near-miss negative examples were created.
Thus, of the 150 total cases, 120 examples belonged to one of the three categories of shock and 30 were non-shock examples.

A three-phase experiment was run. In each experiment, the 150 examples were first split randomly into 100 training examples and 50 test examples. The original theory was then subjected to three independent modifications of 0.1, 0.2 and 0.3. Each resulting theory was refined by NEITHER-MOFN using the same 100 training examples. The theories were tested both before and after refinement using the same 50 test examples. This entire process was repeated 10 times, and the results averaged. For comparison purposes, we also ran the same training data through a propositional version of the FOIL inductive learner [14] and tested the results using the same test data, averaging the results of the 10 trials.

3.2.2 Results

Table 3 shows the results of the recovery experiments. Note that the results for FOIL are identical for each experiment, since induction does not make use of an input theory. In each of the three experiments, NEITHER-MOFN was able to reconstruct a theory to perfect or near-perfect accuracy on the test data. NEITHER-MOFN was also able to create more accurate theories than induction alone. Tests were also run using the non-threshold version of NEITHER, but the results were nearly identical to those reported for NEITHER-MOFN (there was no statistically significant difference between NEITHER and NEITHER-MOFN on this data).

                      0.1 modified   0.2 modified   0.3 modified
                      theory         theory         theory
  before refinement   68.25          50.25          43.5
  after refinement    100.0          94.0           94.0
  induction           80.4           80.4           80.4

  Table 3: Shock Test Set Accuracy.

3.2.3 Discussion

The results of Table 3 are largely what one would expect of a good theory refinement algorithm. As the theory deviated more and more from the original, the performance of the altered theory on the test data continued to decline. Likewise, the ability of NEITHER-MOFN to repair the theory declined with increasingly altered theories, though only slightly. Yet in all cases, NEITHER-MOFN was able to refine a wide variety of damaged theories to a high level of performance on novel test data. Additionally, NEITHER-MOFN was able to create more accurate theories than induction alone by taking advantage of input theories which were at least partly correct. Finally, though NEITHER-MOFN was unable to exactly duplicate the original theory in all cases, the refinements made seemed reasonable in light of the alterations made in the modified theories.

4 Related Work

Several researchers have developed methods for inducing M-of-N concepts from scratch. CRLS [17] learns M-of-N rules and outperformed standard rule induction in several medical domains. ID2-of-3 [9] incorporates M-of-N tests in decision-tree learning and outperformed standard decision-tree induction in a number of domains. Both projects clearly demonstrate the advantages of M-of-N rules. SEEK2 [5] includes operators for refining M-of-N rules; however, its revision process is heuristic, and it is not guaranteed to produce a revised theory that is consistent with all of the training examples. NEITHER uses a greedy covering approach to guarantee that it finds a set of revisions that fix all of the misclassified examples in the training set. Also, unlike NEITHER, SEEK2 cannot learn new rules or add new antecedents to existing rules.

KBANN [19] revises a theory by translating it into a neural network, using backpropagation to refine the weights, and then retranslating the result back into symbolic rules.
NEITHER's symbolic revision process is much more direct and, from all indications, significantly faster. Although KBANN's results are referred to as M-of-N rules, they actually contain real-valued antecedent weights and therefore are not strictly M-of-N. In addition, KBANN's revised theories for the promoter problem are more complex in terms of the number of antecedents than the initial theory [20], while NEITHER actually produces a slight reduction. Therefore, NEITHER's revised theories are less complex and presumably easier to understand. Finally, unlike KBANN, NEITHER is guaranteed to converge to 100% accuracy on the training data.

RAPTURE [6] uses a combination of symbolic and neural-network learning methods to revise a certainty-factor rule base [2]. Consequently, it lies somewhere between NEITHER and KBANN on the symbolic-connectionist dimension. As illustrated in the results, its accuracy on the promoter problem is only slightly superior to NEITHER's. However, its real-valued certainty factors make its rules more complex.

5 Future Work

The current version of NEITHER needs to be enhanced to handle a number of issues. We need to incorporate a number of advanced features from the original EITHER algorithm, such as constructive induction, modification of higher-level rules, and the ability to handle numerical features and noisy data. Also, we could extend our methods to handle negation as failure, and incorporate the ability to handle M-of-N rules into first-order theory revision [15]. The inductive component of NEITHER should be modified to produce threshold rules directly, rather than symbolic rules. Finally, we need to perform a more comprehensive experimental evaluation of the system.

6 Conclusions

This paper has presented an efficient propositional theory refinement system that is capable of revising M-of-N rules. The basic framework is a modification of EITHER [11]; however, the construction of partial proofs has been reduced from exponential to linear time, and a method for revising the thresholds of M-of-N rules has been incorporated. The resulting system runs more than an order of magnitude faster and produces significantly more accurate results in domains requiring partial matching, such as the problem of recognizing promoters in DNA.

Acknowledgments

This research was supported by the NASA Graduate Student Researchers Program under grant number NGT-50732, the National Science Foundation under grant IRI-9102926, and the Texas Advanced Research Program under grant 003658144. We would like to thank Jude Shavlik, Geoff Towell, and Marilyn Murphy for generously providing the DNA and shock data sets. Special thanks also to Chris Whatley for his help implementing NEITHER and to Dr. Thomas Baffes for his invaluable help in developing the rules for the shock domain.

References

[1] P. Baffes and R. J. Mooney. Using theory revision to model students and acquire stereotypical errors. In Proceedings of the Thirteenth Annual Conference of the Cognitive Science Society, pages 617-622, Bloomington, IN, 1992.

[2] B. G. Buchanan and E. H. Shortliffe, editors. Rule-Based Expert Systems: The MYCIN Experiments of the Stanford Heuristic Programming Project. Addison-Wesley, Reading, MA, 1984.

[3] A. D. Danyluk. Gemini: An integration of analytical and empirical learning. In Proceedings of the International Workshop on Multistrategy Learning, pages 191-206, Harper's Ferry, W.Va., Nov. 1991.

[4] A. Ginsberg. Theory reduction, theory revision, and retranslation.
In Proceedings of the Eighth National Conference on Artificial Intelligence, pages 777-782, Detroit, MI, July 1990.

[5] A. Ginsberg, S. M. Weiss, and P. Politakis. Automatic knowledge base refinement for classification systems. Artificial Intelligence, 35:197-226, 1988.

[6] J. J. Mahoney and R. J. Mooney. Combining neural and symbolic learning to revise probabilistic rule bases. In S. J. Hanson, J. D. Cowan, and C. L. Giles, editors, Advances in Neural Information Processing Systems, Vol. 5, pages 107-114, San Mateo, CA, 1993. Morgan Kaufmann.

[7] S. Matwin and B. Plante. A deductive-inductive method for theory revision. In Proceedings of the International Workshop on Multistrategy Learning, pages 160-174, Harper's Ferry, W.Va., Nov. 1991.

[8] M. A. Murphy and G. V. Davidson. Computer-based adaptive instruction: Effects of learner control on concept learning. Journal of Computer-Based Instruction, 18(2):51-56, 1991.

[9] P. M. Murphy and M. J. Pazzani. ID2-of-3: Constructive induction of M-of-N concepts for discriminators in decision trees. In Proceedings of the Eighth International Workshop on Machine Learning, pages 183-187, Evanston, IL, June 1991.

[10] D. Ourston. Using Explanation-Based and Empirical Methods in Theory Revision. PhD thesis, University of Texas, Austin, TX, August 1991. Also appears as Artificial Intelligence Laboratory Technical Report AI 91-164.

[11] D. Ourston and R. Mooney. Changing the rules: A comprehensive approach to theory refinement. In Proceedings of the Eighth National Conference on Artificial Intelligence, pages 815-820, Detroit, MI, July 1990.

[12] D. Ourston and R. J. Mooney. Theory refinement combining analytical and empirical methods. Artificial Intelligence, in press.

[13] J. R. Quinlan. Induction of decision trees. Machine Learning, 1(1):81-106, 1986.

[14] J. R. Quinlan. Learning logical definitions from relations. Machine Learning, 5(3):239-266, 1990.

[15] B. Richards and R. Mooney. First-order theory revision. In Proceedings of the Eighth International Workshop on Machine Learning, pages 447-451, Evanston, IL, June 1991.

[16] D. E. Rumelhart, G. E. Hinton, and R. J. Williams. Learning internal representations by error propagation. In D. E. Rumelhart and J. L. McClelland, editors, Parallel Distributed Processing, Vol. I, pages 318-362. MIT Press, Cambridge, MA, 1986.

[17] K. A. Spackman. Learning categorical decision criteria in biomedical domains. In Proceedings of the Fifth International Conference on Machine Learning, pages 36-46, Ann Arbor, MI, June 1988.

[18] G. Towell and J. Shavlik. Refining symbolic knowledge using neural networks. In Proceedings of the International Workshop on Multistrategy Learning, pages 257-272, Harper's Ferry, W.Va., Nov. 1991.

[19] G. Towell and J. Shavlik. Interpretation of artificial neural networks: Mapping knowledge-based neural networks into rules. In R. Lippmann, J. Moody, and D. Touretzky, editors, Advances in Neural Information Processing Systems, volume 4. Morgan Kaufmann, 1992.

[20] G. G. Towell. Symbolic Knowledge and Neural Networks: Insertion, Refinement, and Extraction. PhD thesis, University of Wisconsin, Madison, WI, 1991.

[21] B. L. Whitehall, S. C. Lu, and R. E. Stepp. Theory completion using knowledge-based learning. In Proceedings of the International Workshop on Multistrategy Learning, pages 144-159, Harper's Ferry, W.Va., Nov. 1991.

MULTITYPE INFERENCE IN MULTISTRATEGY TASK-ADAPTIVE LEARNING: DYNAMIC INTERLACED HIERARCHIES

Michael R. Hieb and Ryszard S. Michalski
Center for Artificial Intelligence, George Mason University, Fairfax, VA
hieb@gmu.edu and michalski@gmu.edu

Keywords: multistrategy learning, inferential theory of learning, knowledge transmutation, generalization, abstraction, similization

Edited by: Gheorghe Tecuci

Received: April 10, 1993    Revised: May 26, 1993    Accepted: October 16, 1993

Research on multistrategy task-adaptive learning aims at integrating all basic inferential learning strategies: learning by deduction, induction and analogy. The implementation of such a learning system requires a knowledge representation that facilitates performing multitype inference in a seamlessly integrated fashion. This paper presents an approach to implementing such multitype inference based on a novel knowledge representation, called Dynamic Interlaced Hierarchies (DIH). DIH integrates ideas from our research on cognitive modeling of human plausible reasoning, the Inferential Theory of Learning, and knowledge visualization. In DIH, knowledge is partitioned into a "static" part that represents relatively stable knowledge, and a "dynamic" part that represents knowledge that changes relatively frequently. The static part is organized into type, part, or precedence hierarchies, while the dynamic part consists of traces that link nodes of different hierarchies. By modifying traces in different ways, the system can perform different knowledge transmutations (patterns of inference), such as generalization, abstraction, similization, and their opposites, specialization, concretion and dissimilization, respectively.

1 Introduction

The development of multistrategy learning systems requires a powerful and easily modifiable knowledge representation that facilitates multitype inference. This is particularly true in the case of multistrategy task-adaptive learning (MTL) systems that integrate a whole range of inferential strategies, such as empirical induction, abduction, deduction, plausible deduction, abstraction, and analogy [11, 12, 16, 17]. An MTL system adapts a strategy or a combination of strategies to the learning task, defined by the available input knowledge, the learner's background knowledge and the learning goal. A theoretical framework for the development of MTL systems has been presented in [13].

This paper presents basic ideas underlying a knowledge representation proposed for the implementation of an MTL system and its use for implementing multitype inference. This representation, called Dynamic Interlaced Hierarchies (DIH), integrates ideas from our research on modeling human plausible inference, the Inferential Theory of Learning and the visualization of knowledge. DIH encompasses many different forms of knowledge (facts, rules, dependencies, etc.) and facilitates the knowledge transmutations described in the Inferential Theory of Learning (ITL) [13]. This paper shows how DIH supports several basic patterns of knowledge change (transmutations), such as generalization, abstraction, similization, and their opposites, specialization, concretion and dissimilization, respectively. These operations are performed on DIH traces, which correspond to well-formed predicate logic expressions associated with a degree of belief. While our previous work has focused on the visualization of attribute-based representations for empirical induction [19], DIH allows the visualization of structural (attributional and relational) representations.
The underlying assumption is that the syntactic structure for representing any knowledge should reflect as closely as possible the semantic relationships among the knowledge components, and should facilitate the knowledge modifications that correspond to the most frequently performed inferences. An early implementation of this idea was in the ADVISE system, which used three forms of knowledge representation: relational tables, networks and rules [14].

The DIH approach assumes that a large part of human conceptual knowledge is organized into various hierarchies, primarily type, part and precedence hierarchies (see Section 3 for an explanation). Such hierarchies reflect frequently occurring relationships among knowledge components, and make it easy to perform basic forms of inference. The initial idea for DIH stems from the core theory of human plausible reasoning [5, 4]. The theory presents a formal representation of various plausible inference patterns observed in human reasoning. DIH is more fully described in [8].

2 Relevant Research

The core theory of Plausible Reasoning presents a system that formalizes various plausible inference patterns and the "merit parameters" that affect the certainty of these inferences. This system combines structural aspects of reasoning (determined by knowledge structures) with parametric aspects that represent quantitative belief and other measures affecting the reasoning process. Various components of the "Logic of Plausible Reasoning" have been implemented in several systems [1, 7, 9]. These implementations used various subsets of the inferences ("statement transforms") described in the core theory to investigate the parametric aspects of the theory, and demonstrated how the core theory of plausible reasoning can be applied to various domains. DIH specifies a broader set of knowledge transmutations in a general and well-defined knowledge representation. These transmutations are part of a framework for both reasoning and learning.

The organization of concepts into various hierarchies was proposed as a plausible structure for human semantic memory quite early [6]. The WordNet project at Princeton University, directed by George Miller, concerns the implementation of an electronic thesaurus using such a memory structure [3]. WordNet is a very large lexical database with approximately 50,000 different word forms. WordNet divides the lexicon into various categories, including nouns, verbs, and modifiers (adjectives and adverbs). Significantly, the nouns are stored in topical hierarchies (both type and part hierarchies), lending support to the DIH representation. However, while WordNet can be used as a source of DIH hierarchies, it does not provide any inferential facilities.

Other relevant research includes the development of the Common Knowledge Representation Language (CKRL), done as part of an ESPRIT project [15]. CKRL offers a language in which knowledge can be exchanged between machine learning tools, and it uses the set of most common representation structures and operators. While CKRL seeks to integrate the various representations employed by several different learning programs so that knowledge can be communicated between machine learning tools, our aim is to develop a representation that facilitates an integration of learning and inference processes.
Semantic network knowledge representation systems, such as the KL-ONE family [2], utilize a large network of relationships between concepts, intermixing different relationships. The hierarchies they use are tangled, in that a concept can have more than one parent. As a consequence, implementing knowledge transmutations, e.g., generalization, is not as easy as in DIH. DIH facilitates such transmutations because it uses only single-parent hierarchies, each representing a structuring of a set of entities from a certain viewpoint. In DIH, a concept can belong to different hierarchies, reflecting the fact that a given concept (or object) can usually be classified from several different viewpoints. The design of semantic networks is primarily oriented toward facilitating deductive inference, and is not usually concerned with knowledge visualization. The design of DIH is oriented toward facilitating multitype inference and providing a basis for the visual presentation of knowledge. DIH also utilizes a hierarchy of merit parameters to represent probabilistic factors associated with plausible reasoning.

3 Basic Components of DIH

The theory of plausible reasoning postulates that there are recurring patterns of human plausible inference. To adequately represent these patterns, one needs a proper knowledge representation. The DIH approach partitions knowledge into a "static" part and a "dynamic" part. The static part represents knowledge that is relatively stable (such as established hierarchies of concepts), and the dynamic part represents knowledge that changes relatively frequently (such as statements representing new observations or results of reasoning). The static part is organized into type hierarchies (TH), part hierarchies (PH) and precedence hierarchies. Precedence hierarchies include several subclasses, specifically measure hierarchies (MH), quantification hierarchies (QH) and schema hierarchies (SH). The dynamic part consists of traces that represent knowledge involving concepts from different hierarchies. Each trace links nodes of two or more hierarchies and is assigned a degree of belief.

These hierarchies are composed of nodes representing abstract or physical entities, and links representing certain basic relationships among the entities, such as "type-of", "part-of" or "precedes". In the "pure" form, these hierarchies are single-parent, that is, no node can have more than one parent. The root node is assigned the name of the class of entities that are organized into the hierarchy from a given viewpoint.

A type (or generalization) hierarchy organizes concepts in a given class according to the "type-of" relation (also called a "generalization" or "kind-of" relation). For example, different types of "animals" can be organized into a type hierarchy. A part hierarchy organizes entities according to a "part-of" relationship. For example, the world, viewed as a system of continents, geographical regions, countries, etc., can be organized into a part hierarchy. While properties of a parent node in a type hierarchy are inherited by children nodes, this does not necessarily hold for a part hierarchy. There are several different part relationships, which include part-component, part-member, part-location and part-substance [18].

To represent relationships among elements of ordered or partially ordered sets, a class of precedence hierarchies is introduced.
Hierarchies in this class represent hierarchical structures of concepts ordered according to some precedence relation, such as "A precedes B", "A is greater than B", "A has higher rank than B", etc. There are several subclasses of precedence hierarchies. One subclass is a measure hierarchy, in which leaves stand for values of some physical measurement, for example weight, length, width, etc., and the parent nodes are symbolic labels characterizing ranges of these values, such as "low", "medium", "high", etc. Figure 1 shows a measure hierarchy of possible values of people's height. Dotted lines indicate a continuity of values between nodes. Arrows indicate the precedence order of the nodes. Another subclass is a belief hierarchy, in which nodes represent degrees of an agent's belief in some knowledge represented by a trace. Other subclasses of precedence hierarchies include a rank hierarchy and a quantification hierarchy. A rank hierarchy consists of values representing the "rank" of an entity in some structure, e.g., an administrative or military hierarchy. A quantification hierarchy consists of nodes that represent different quantifiers for a set (an example is shown in Figure 2). A quantification hierarchy that is frequently used in commonsense reasoning includes such nodes as "one", "some" (both corresponding to the existential quantifier), "most", and "all" (corresponding to the universal quantifier).

Each hierarchy has a heading that specifies its kind (TH, PH, MH, QH or SH) and the underlying concept (or viewpoint) used for the creation of the hierarchy. In addition, the type and part hierarchies also have a top node, which in a type hierarchy stands for the class of all entities in the hierarchical structure, and in a part hierarchy for the complete object.

Figure 1: A measure hierarchy of values characterizing people's height.

Schema hierarchies (or schemas) are structures that indicate which hierarchies are connected in order to express multi-argument concepts or relationships. For example, the schema hierarchy for the concept of "physical-object" states that an attribute "shape" applies to any object that is a "physical-object" (a node in the "physical-object" hierarchy) and produces a shape value, which is a node in the "shape" hierarchy. The schema hierarchy for the concept of "giving" states that this concept involves an agent that gives, an agent that receives, an object that is being given, and the time when the "giving" occurs. The agents, object and time are elements of their respective hierarchies.

DIH also makes a distinction between structural and parametric knowledge. The structural knowledge is represented by hierarchies and traces that link nodes of different hierarchies. Parametric knowledge consists of numeric quantities characterizing structural elements of knowledge. In DIH, this knowledge is represented via precedence hierarchies of merit parameters. The basic merit parameter is a belief measure that characterizes the "truth" relationship of a given component of knowledge representation (a trace), as estimated by the reasoning agent. Other merit parameters include the forward and backward strength of a dependency, frequency, dominance, etc. [5, 13]. In this paper, we will consider only one merit parameter, namely the belief measure.
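To make the static part concrete, the following is a minimal sketch, under our own assumptions, of single-parent hierarchies of the kinds just listed. The class names and fields are invented for exposition and are not taken from the DIH implementation; a precedence hierarchy is also simplified here to a single ordered chain.

```python
# Illustrative sketch only: not the DIH system's data structures.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    name: str
    parent: Optional["Node"] = None               # single parent ("pure" form)
    children: List["Node"] = field(default_factory=list)

    def add_child(self, name: str) -> "Node":
        child = Node(name, parent=self)
        self.children.append(child)
        return child

@dataclass
class Hierarchy:
    kind: str      # "TH", "PH", "MH", "QH" or "SH"
    heading: str   # the viewpoint concept, e.g. "Quantification"
    root: Node

# The commonsense quantification hierarchy: moving toward the root
# ("all") generalizes a statement; moving toward the leaf specializes it.
all_node = Node("all")
most = all_node.add_child("most")
some = most.add_child("some")
one = some.add_child("one")
QH = Hierarchy("QH", "Quantification", all_node)
```

Because each node has at most one parent, upward movement (and hence the generalization-style transmutations discussed in Section 5) is unambiguous; a tangled hierarchy would force a choice among parents.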
The theory of human plausible reasoning [5] postulates that people rely primarily on structural knowledge, and resort to parametric knowledge when "structural" reasoning does not produce a unique result. People resist performing uncertain inferences based on parametric knowledge alone, and they are not good at assigning a degree of certainty to a statement based only on the combination of the certainties of its constituents, without taking into consideration the meaning of the whole sentence. A reason for this may be that there does not exist a normative model for reasoning under uncertainty that is independent of the structural aspects of knowledge, i.e., its meaning. Plausible reasoning about a problem or question typically involves both structural and parametric knowledge components.

Nodes of a hierarchy are elementary units of the DIH representation. Each node represents some real or abstract entity: a concept, an object, a process, etc. A given entity can be a node in multiple hierarchies, where each hierarchy structures a set of entities from a different viewpoint. The relevant viewpoint is determined by the context of the discourse. As mentioned earlier, the basic structures in the DIH representation are hierarchies, nodes, traces and schemas. Our research on DIH demonstrates that these structures provide a very natural environment for performing basic types of inference on statements. The subsequent sections show how these inferences are performed using the DIH representation.

4 DIH Traces

To describe the DIH knowledge representation, let us start by representing the following statement: "It is certain that some power plants in New York have mechanical failures." Figure 2 presents this statement as a trace connecting nodes of five hierarchies: "Process Plants" and "Failure", both type hierarchies; "Quantification", the quantification hierarchy; "Location", a part hierarchy; and "Belief Measure", a measure hierarchy. The interpretation of the trace is done on the basis of the schema hierarchy shown in Figure 3. The schema defines the universe of sentences that can be generated using concepts of these hierarchies, ordered according to the schema.

Figure 2: A DIH trace representing the sentence "It is certain that some power plants in New York have mechanical failures." The trace consists of nodes linked by dotted lines. The arrows in the trace indicate the argument (reference set) that is being described by the sentence. The interpretation of the trace is given by schema hierarchy SH1 in Figure 3.

The convention for the direction of arrows in a trace is that they point from the nodes denoting descriptive concepts to the argument node that stands for the set (or individual) being described, called a reference set. In this example, the set being described is "Power plant" in the hierarchy of Process Plants, so the node representing it is the argument node. Other nodes linked by the trace represent descriptive concepts for the argument node. The belief measure takes values from a belief hierarchy, and refers to the entire trace rather than a single node, which is indicated by the schema.
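As a concrete picture of this convention, the Figure 2 trace might be encoded as follows. This is a hedged sketch with invented field names, not the DIH data structures:

```python
# Illustrative sketch only: a trace links an argument node to descriptor
# nodes in other hierarchies and carries a belief merit parameter.
from dataclasses import dataclass
from typing import List

@dataclass
class Link:
    hierarchy: str   # heading of the hierarchy the linked node belongs to
    node: str        # the node selected in that hierarchy

@dataclass
class Trace:
    argument: Link           # the reference set being described
    descriptors: List[Link]  # descriptive concepts pointing at the argument
    belief: float            # merit parameter attached to the whole trace

# "It is certain that some power plants in New York have mechanical failures."
trace = Trace(
    argument=Link("TH - Process Plants", "power plant"),
    descriptors=[
        Link("QH - Quantification", "some"),
        Link("PH - Location", "New York"),
        Link("TH - Failure", "mechanical"),
    ],
    belief=1.0,  # "certain" on the belief-measure hierarchy
)
```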
Using the formalism of the annotated predicate logic [9], this trace can be interpreted as: "(Some)x, [type(x) = Power plant] & [location(x) = New York] & [failure(x) = mechanical]: Belief = 1.0." This statement is a quantified conjunction of several elementary statements. An elementary statement expresses one property of the reference node (set), for example, "location(Power plant) = New York." In a formal expression of an elementary statement, the reference set ("Power plant") is called an argument, the predicate ("location") is called a descriptor, and the value of the descriptor ("New York") is called the referent. Thus, an elementary statement is formally expressed in the form "descriptor(argument) = referent".

In Figure 2, the square boxes contain the heading of each hierarchy. The concept specified in the heading is the general descriptor for the hierarchy. The nodes in the hierarchy are possible values of this descriptor. The schema hierarchy SH1 in Figure 3 is used for the interpretation of the trace represented in Figure 2. The heading indicates the kind of hierarchy (SH: schema hierarchy) and the reference set of the trace. Since the schema hierarchy is a precedence hierarchy, a valid interpretation of the schema requires each of the descriptors in order. Thus the first element of the trace must be from the quantification hierarchy, the second from the failure hierarchy, the third from the location hierarchy and the last from the hierarchy of belief measures. This schema hierarchy is also utilized for the examples in Section 6.

Figure 3: Schema hierarchy SH1.

Adding knowledge to the DIH representation is done by creating hierarchies and specifying traces that express statements involving nodes of different hierarchies. To allow proper interpretation of a trace, the schema is also specified by indicating the relevant descriptors and their order. DIH allows one to represent complex forms of knowledge, involving different kinds of quantifiers, multi-argument predicates, and different types of logical operations on them, and to associate degrees of belief with individual statements. A more complete description of the DIH representation system is given in [8].

5 Multitype Inference in DIH

The core theory of plausible reasoning introduced in [5] gives four knowledge transmutation operators (also called transforms): generalization, specialization, similization and dissimilization. The Inferential Theory of Learning [13] specifies several additional operators, of which abstraction and concretion are incorporated into DIH. (In [5] the abstraction and concretion transmutations were called referent generalization and referent specialization, respectively.)

Generalization (specialization) transmutations extend (contract) the reference set. They are done either by argument generalization (specialization) or by quantification generalization (specialization). Argument generalization is accomplished by moving above the node representing the reference set in a type hierarchy. Quantification generalization is accomplished by moving up the quantification hierarchy. Abstraction (concretion) transmutations decrease (increase) the amount of information about the reference set. A way to accomplish such a transmutation is by moving above the node in the type or part hierarchy that corresponds to a value of some descriptor in the sentence represented by the trace.
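Reusing the illustrative Node sketch from Section 3, the upward and downward movements just described amount to replacing a trace node by its parent or by one of its children. The following is a minimal sketch under those assumptions, not the DIH implementation:

```python
def generalize(node: "Node") -> "Node":
    """Upward movement: replace a trace node by its parent.
    Applied to the argument, 'power plant' -> 'process plant' (cf. Figure 7);
    applied to a referent, 'mechanical' -> the whole failure class
    (abstraction, cf. Figure 9)."""
    return node.parent if node.parent is not None else node

def specialize(node: "Node", i: int = 0) -> "Node":
    """Downward movement: replace a trace node by one of its children.
    E.g., 'power plant' -> 'nuclear power plant' (cf. Figure 8)."""
    return node.children[i] if node.children else node
```

On the quantification chain built earlier, generalize(some) yields "most" and a further step yields "all", which is quantification generalization as in Figure 5.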
Similization (dissimilization) transmutation is done by replacing a node corresponding to the reference set (argument) or a descriptor value (referent) by a node at the same level of the hierarchy which corresponds to a similar (dissimilar) concept within the context of the given hierarchy. In the case of dissimilization, the resulting trace is linked with a negation node, because the generated inference is a negation of the original sentence [13]. These transmutations can be given a simple conceptual interpretation if one assumes that nodes at each level of a hierarchy are ordered by the relation of similarity, that is, nodes that correspond to similar concepts (in the context of the given hierarchy) are located near each other, and nodes that correspond to dissimilar concepts are placed far away from each other. Such an arrangement is natural for precedence hierarchies. In sum, similization and dissimilization transmutations are performed by sideways node movements, while generalization (specialization) and abstraction (concretion) are performed by upward (downward) node movements.

Table 1 lists all the above knowledge transmutations, specifying their abbreviated name, the relevant hierarchies, and the underlying inference type. The relevant hierarchies are the kinds of hierarchies for which the transmutations are valid. The various kinds of part hierarchies are not shown, but are distinguished in DIH. Additional constraints are necessary in some kinds of part hierarchies to maintain the validity of the transmutation.

  Transmutation                   Symbol   Relevant Hierarchies      Inference Type
  Argument Generalization         AGen     Type, Part                Deductive
  Argument Specialization         ASpec    Type, Part                Inductive
  Quantification Generalization   QGen     Quantification            Inductive
  Quantification Specialization   QSpec    Quantification            Deductive
  Abstraction                     Abs      Type, Part, Precedence    Deductive
  Concretion                      Con      Type, Part, Precedence    Inductive
  Argument Similization           ASim     Type, Part                Analogical
  Argument Dissimilization        ADis     Type, Part                Analogical
  Referent Similization           RSim     Type, Part, Precedence    Analogical
  Referent Dissimilization        RDis     Type, Part, Precedence    Analogical

  Table 1: Basic knowledge generation transmutations.

Figure 4 presents a schematic diagram illustrating how knowledge transmutations modify a trace. A dotted line represents a link in a trace. An arrow means that the trace is moving to a new node in the indicated direction by performing the indicated transmutation. The quantification transmutations operate over the entire trace, rather than on a single node, as do the transformations involving the merit parameters. One form of generalization transmutation moves a node in the quantification hierarchy upward; another form moves a node (argument) in the type hierarchy upward. A "+" indicates a strengthening of a merit parameter, or the movement of the link to a node that is "higher" in the particular merit-parameter measure hierarchy. A "-" indicates a weakening of the merit parameter, or the movement of the link down in the hierarchy.

Moving a node in a trace in a manner that corresponds to a deductive inference (Table 1) produces a new trace (statement) with the same truth status as the original trace. In the case of node movement that corresponds to inductive or analogical inference, the smaller the node movement ("perturbation"), the more plausible the resulting inference. The Argument Generalization transmutation represents a deductive inference. The abstraction operation is also deductive.
In contrast, Argument Specialization, Quantification Generalization and Concretion are inductive, because they produce traces (statements) that logically entail the original traces (statements). The above transmutations can usually be done in a number of different ways, by moving to different alternative nodes. The plausibility of the generated statements depends on additional merit parameters, such as dominance, typicality, multiplicity, similarity, frequency, etc. [5]. These issues will be the subject of future research.

6 Visualizing DIH-Based Inference

This section illustrates several basic transmutations through a series of self-explanatory examples. These examples involve the same original statement, represented as a trace in Figure 2. Given the original statement, these transmutations generate new statements, illustrated by DIH traces in Figures 5 through 12. A legend (input to transmutation, output from transmutation, direction of transmutation) is used for interpreting the figures. The input statement is the same as that of Figure 2, without the belief measure hierarchy. All of the examples are interpreted according to the schema SH1 shown in Figure 3. There are two referents in the input statement. The resulting statements (output) show the results of the given transmutation, assuming that there are no merit parameters that assist in the specialization or concretion, and that the similization operator finds a single "most similar" node using the descriptors given. The Background Knowledge (BK) is the learner's prior knowledge that is relevant to the learning process.

Figure 4: Diagram of knowledge transmutations in DIH. (Legend: A = argument, standing for the set being described, i.e., the reference set; R = referent, a value of the descriptor that characterizes the argument; D = descriptor; Q = quantification; MP = merit parameter; a dotted line is a link in a trace; moving a node in the direction of an arrow performs the indicated transmutation.)

7 MTL-DIH System

The research on DIH aims at developing a representation that will facilitate all basic inferential strategies and knowledge transmutations to be implemented in the multistrategy task-adaptive learning system MTL-DIH. Although issues related to the implementation of an MTL system are beyond the scope of this paper, we will briefly outline the basic ideas. We have been pursuing two approaches: MTL-JT, which builds a plausible justification tree to "understand" a user's input [17], and a second one, MTL-DIH, based on DIH.

In the MTL-DIH approach, a learning strategy is determined by analyzing the learning task. This analysis relates the input information to the learner's background knowledge and the learning goal. The input information to the system is assumed to be given in the form of logic statements. It can be concept examples, concept descriptions, rules or a combination of the above. The system re-represents the input as a trace, or set of traces. Background knowledge is the part of the learner's prior knowledge that is relevant to the input and the learning goal. The learning goal specifies criteria characterizing the knowledge to be learned. There are different kinds of learning goals, such as to predict new information, to explain the input, to classify a fact or concept instance, to create an abstract description from an operational one (or conversely), or to create a problem solution or a plan.
It is assumed that the learning goal is determined by a teacher or by the control module of the system. The learning process involves determining the type of relationship between the given input and the background knowledge, and performing a sequence of knowledge transmutations, involving input and background knowledge, to produce knowledge satisfying the learning goal.

In each of Figures 5 through 12, the input statement is "Some power plants in New York have mechanical failures" and the background knowledge is the indicated hierarchies. The outputs are as follows:

Figure 5: Inductive generalization based on quantification. (Output: "All power plants in New York have mechanical failures.")
Figure 6: Deductive specialization based on quantification. (Output: "One power plant in New York has a mechanical failure.")
Figure 7: Deductive generalization based on the argument. (Output: "Some process plants in New York have mechanical failures.")
Figure 8: Inductive specialization based on the argument. (Output: "Some nuclear power plants in New York have mechanical failures.")
Figure 9: Abstraction transmutation. (Output: "Some power plants in New York have failures.")
Figure 10: Concretion transmutation. (Output: "Some power plants in New York have component failures.")
Figure 11: Argument similization transmutation. (Output: "Some chemical plants in New York have mechanical failures.")
Figure 12: Referent similization transmutation. (Output: "Some power plants in California have mechanical failures.")

8 Summary and Future Research

The DIH knowledge representation presented here serves as the basis for implementing multistrategy task-adaptive learning. It builds upon ideas of the Inferential Theory of Learning and the core theory of plausible reasoning. Although it is closely related to the semantic network representation, it represents a significantly different approach, and contains many new ideas that make it particularly useful for representing multitype inference. These include the idea of dividing the knowledge representation into a static part and a dynamic part, the organization of knowledge in which basic forms of inference can be performed via simple trace perturbations, and the introduction of various precedence hierarchies, such as the schema hierarchy, the measure hierarchy, and the quantification hierarchy.

The primary purpose of this paper was to demonstrate how DIH supports several basic knowledge generation transmutations, specifically generalization, specialization, abstraction, concretion, similization and dissimilization. The first version of DIH has been implemented in Smalltalk, and used as a tool for investigating the interactive display and modification of traces in hierarchies. The visual display of inference is particularly useful in situations that involve traces connecting only a few hierarchies (that is, representing short sentences). To facilitate knowledge visualization, the system has an option to present traces with only a limited number of neighboring nodes, rather than connecting complete hierarchies. In DIH, the more knowledge structures there are in background knowledge, the easier it is to assimilate new knowledge, or to plausibly explain input statements. DIH is an efficient representation, because most knowledge modifications consist of forming or changing traces, without affecting the established hierarchies.

Many issues remain to be addressed in future research. Among these issues are the representation of more complex forms of knowledge (mutual implications, various types of dependencies, and temporal and spatial knowledge), and the development of methods for determining the effect of merit parameters on the reasoning process.

Acknowledgments

The authors thank Eric Bloedorn, Mark Burstein, David Hille and Ken Kaufman for their thoughtful comments on this paper. This research was conducted in the Center for Artificial Intelligence at George Mason University. The Center's research is supported in part by the National Science Foundation under grant No. IRI-9020266, in part by the Advanced Research Projects Agency under grant No. N00014-91-J-1854, administered by the Office of Naval Research, and grant No. F49620-92-J-0549, administered by the Air Force Office of Scientific Research, and in part by the Office of Naval Research under grant No. N00014-91-J-1351.

References

[1] M. Baker, M. H. Burstein, and A. M. Collins, Implementing a Model of Human Plausible Reasoning. In Proceedings of the Tenth International Joint Conference on Artificial Intelligence, pp. 185-188, Morgan Kaufmann, Los Altos, CA (1987).

[2] R. J. Brachman, D. L. McGuinness, P. F. Patel-Schneider, L. A. Resnick, and A. Borgida, Living with CLASSIC: When and How to Use a KL-ONE-Like Language. In Principles of Semantic Networks: Explorations in the Representation of Knowledge, edited by J. F. Sowa, Morgan Kaufmann, Los Altos, CA (1991).
[3] R. Beckwith, C. Fellbaum, D. Gross, and G. A. Miller, WordNet: A Lexical Database Organized on Psycholinguistic Principles. In Using On-line Resources to Build a Lexicon, edited by U. Zernik, Erlbaum, Hillsdale, NJ (1991).

[4] D. Boehm-Davis, K. Dontas, and R. S. Michalski, A Validation and Exploration of the Collins-Michalski Theory of Plausible Reasoning. Reports of the Machine Learning and Inference Laboratory, MLI 90-5, Center for Artificial Intelligence, George Mason University, Fairfax, VA (1990).

[5] A. Collins and R. S. Michalski, The Logic of Plausible Reasoning: A Core Theory. Cognitive Science, 13, pp. 1-49 (1989).

[6] A. Collins and M. R. Quillian, How To Make A Language User. In Organization of Memory, edited by E. Tulving and W. Donaldson, Academic Press, New York (1972).

[7] K. Dontas and M. Zemankova, APPLAUSE: An Implementation of the Collins-Michalski Theory of Plausible Reasoning. In Proceedings of the Third International Symposium on Methodologies for Intelligent Systems, Torino, Italy (1988).

[8] M. R. Hieb and R. S. Michalski, A Knowledge Representation System Based on Dynamic Interlaced Hierarchies: Basic Ideas and Examples. Reports of the Machine Learning and Inference Laboratory, MLI 93-5, Center for Artificial Intelligence, George Mason University, Fairfax, VA (1993).

[9] J. Kelly, PRS: A System for Plausible Reasoning. M.S. Thesis, Department of Computer Science, University of Illinois, Urbana (1988).

[10] R. S. Michalski, A Theory and Methodology of Inductive Learning. In Machine Learning: An Artificial Intelligence Approach, edited by R. S. Michalski, J. Carbonell and T. Mitchell, Tioga Publishing Co., Palo Alto, CA, pp. 83-134 (1983).

[11] R. S. Michalski, Towards a Unified Theory of Learning: Multistrategy Task-adaptive Learning. Reports of the Machine Learning and Inference Laboratory, MLI 90-1, Center for Artificial Intelligence, George Mason University, Fairfax, VA (1990).

[12] R. S. Michalski, Inferential Learning Theory as a Basis for Multistrategy Task-Adaptive Learning. In First International Workshop on Multistrategy Learning, pp. 3-18, Harpers Ferry, West Virginia (1991).

[13] R. S. Michalski, Inferential Theory of Learning: Developing Foundations for Multistrategy Learning. In Machine Learning: A Multistrategy Approach, Volume 4, edited by R. S. Michalski and G. Tecuci, Morgan Kaufmann, Los Altos, CA (1993).

[14] R. S. Michalski, A. B. Baskin, C. Uhrik, T. Channic, S. Borodkin, A. G. Boulanger, L. Rodewald and R. E. Reinke, A Technical Description of the ADVISE.1 Meta-Expert System that Integrates Multiple Knowledge Representations and Learning Capabilities. ISG 86-8, Department of Computer Science, University of Illinois, Urbana (1986).

[15] K. Morik, K. Causse, and R. Boswell, A Common Knowledge Representation Integrating Learning Tools. In First International Workshop on Multistrategy Learning, pp. 81-96, Harpers Ferry, West Virginia (1991).

[16] G. Tecuci and R. S. Michalski, A Method for Multistrategy Task-adaptive Learning Based on Plausible Justifications. In Machine Learning: Proceedings of the Eighth International Workshop, edited by L. Birnbaum and G. Collins, Morgan Kaufmann, Los Altos, CA (1991).

[17] G. Tecuci, An Inference-Based Framework for Multistrategy Learning. In Machine Learning: A Multistrategy Approach, Volume 4, edited by R. S. Michalski and G. Tecuci, Morgan Kaufmann, Los Altos, CA (1993).

[18] M. E. Winston, R. Chaffin and D. Herrmann, A Taxonomy of Part-Whole Relations. Cognitive Science, 11, pp. 417-444 (1987).

[19] J. Wnek and R. S. Michalski,
Call for Papers

Distributed and Parallel Real-Time Systems
Special Issue of INFORMATICA

Guest Editors: Marcin Paprzycki, Janusz Zalewski, University of Texas-Permian Basin

We would like to invite papers for the Special Issue of INFORMATICA, An International Journal of Computing and Informatics, published in English by the Slovene Society Informatika and the Jožef Stefan Institute in Ljubljana, Slovenia.

The scope of the volume will encompass a variety of issues associated with recent developments in the area of distributed and parallel real-time computing. Papers related to both hardware and software aspects of concurrency will be considered. Their focus should be on the timeliness and responsiveness aspects (bounded response time) of the respective solutions. Sample topics include:
- multiprocessor buses and architectures
- real-time aspects of local area networks
- message scheduling in distributed systems
- distributed and parallel operating systems
- task allocation and load balancing in real time
- interprocess synchronization and communication for real time
- specification and programming languages
- formal methods in specification and design
- debugging of distributed real-time systems
- designing parallel and distributed applications
- distributed real-time databases
- dependability, reliability and safety in distributed real-time systems
- standardization.

Only previously unpublished work will be accepted for the volume. All papers will be refereed.

Due dates:
* February 15, 1994: Submission deadline
* May 1, 1994: Notification of the authors
* June 1, 1994: Camera-ready versions due

All correspondence and requests for sample copies of INFORMATICA should be addressed to the Guest Editors at the following address:

Dr. Marcin Paprzycki and Dr. Janusz Zalewski
Dept. of Computer Science
University of Texas-Permian Basin
4901 E. University Blvd
Odessa, TX 79762-0001, USA
Phone: (915) 367-2310
Fax: (915) 367-2115
Email: paprzycki_m@gusher.pb.utexas.edu
       zalewski.j@utpb.pb.utexas.edu

Call for Papers

18th German Annual Conference on Artificial Intelligence (KI-94)
Saarbrücken, September 18-23, 1994

General Chairs:
Jörg Siekmann, DFKI, Univ. Saarbrücken
Hans-Jürgen Bürckert, DFKI Saarbrücken

The scientific as well as the economic importance of Artificial Intelligence research has grown steadily, even if at times more slowly than had been hoped for. AI research in Germany had a belated start, but has improved steadily in recent years. Today the field is well established, and some areas even play an outstanding role internationally. This positive development is reflected in the forthcoming German Conference on AI, which is especially broadly based, aiming at researchers, university students, users and practitioners in industry alike. The German Conference on Artificial Intelligence is traditionally organised once a year by the AI section of the German Society for Computer Science (GI). This is one of the four subdivisions of the GI, and today the AI section has more than four thousand members.
In 1994, the conference will be carried out at the University of Saarland in Saarbrücken by the German Research Center for AI (DFKI), the Computer Science Department (FBI), the Institute of Computational Linguistics (CoLi) and the Max Planck Institute for Computer Science (MPI). It comprises a Scientific Conference, from the 18th to the 22nd of September 1994, and an Industrial Congress on the 22nd and 23rd of September 1994. Tutorials will be offered on Sunday the 18th, scientific talks from Tuesday to Thursday in the morning, and workshops in the afternoon. On Monday a symposium will be held covering a selected topic, the "International Symposium on Logics in Artificial Intelligence", chaired by Dov Gabbay (Imperial College London) and Jörg Siekmann (DFKI, University of Saarland). Invited speakers will give lectures on current issues pertaining to this context. This is prompted by the publication of the multivolume "Handbook of Logic in Artificial Intelligence and Logic Programming".

The Industrial Congress offers a multifaceted programme which demonstrates how AI is put into industrial practice: talks on areas of application of AI, consulting service offers from the German AI Institutes' association (AKI), and an AI exhibition. Contributions to the Scientific Conference and the Industrial Congress will be published in a set of Springer-Verlag conference publications. KI-94 will be supported by an advisory board consisting of the previous and future chairpersons, as well as representatives from industry and university.

Local Arrangement Committee:
Susanne Biundo, DFKI (chair)
Michael Kohlhase, FBI
Brigitte Krenn, CoLi
Christoph Weidenbach, MPI

Conference Office
Coordinator: Reinhard Karger
DFKI Saarbrücken
Stuhlsatzenhausweg 3
D-66123 Saarbrücken
Phone: +49 681 302-4444, Fax: +49 681 302-5341
e-mail: ki-94