A CRITIQUE ON PARALLEL COMPUTER ARCHITECTURES

INFORMATICA 2/88

UDK 681.3.001.519.6

L. M. Patnaik, R. Govindarajan, M. Špegel* and J. Šilc*
Indian Institute of Science, Bangalore, and Jožef Stefan Institute*

The need for parallel processing is felt in many disciplines. In this paper we discuss the essential issues involved in these systems. Various architectural proposals reported in the literature are classified based on the execution order and the parallelism exploited. The design and salient features of some architectures which have gained importance are highlighted. Architectural features of non-von Neumann systems such as data-driven, demand-driven, and neural computers, which open the horizon for research into new models of computation, are presented in this paper. Principles and requirements of programming languages and operating systems for parallel architectures are also reviewed.

Contents

1. Introduction
2. Issues in Parallel Processing Systems
2.1. Models of Computation
     Control Flow, Data-Driven Model, Demand-Driven Model
2.2. Concurrency
     Temporal, Spatial and Asynchronous Parallelism, Granularity of Parallelism
2.3. Scheduling
2.4. Synchronization
3. Von Neumann Parallel Processing Systems
3.1. Pipeline and Vector Processing
     Pipeline Architecture, Vector Processing
3.2. SIMD Machines
3.3. MIMD Architecture
     Interconnection Networks, Tightly Coupled Multiprocessors, Loosely Coupled Multiprocessors
3.4. Reconfigurable Architecture
     Connection Machine
3.5. VLSI Systems
     Systolic Arrays, Wavefront Array Processors
3.6. Critique on von Neumann Systems
4. Non-von Neumann Architectures
4.1. Data-Driven Model
4.2. Demand-Driven Systems
4.3. Neural Computers
5. Software Issues Related to Multiprocessing Systems
5.1. Parallel Programming Languages
     Conventional Parallel Languages, Non-von Neumann Languages
5.2. Multiprocessor Operating Systems
6. Conclusions
7. References

1. Introduction

There has been an ever-increasing need for more and more computing power. In a real-time environment, the demands are much higher. While computers built around a single processor cannot stretch their processing speed beyond a few million floating-point operations per second (FLOPS), an operating speed of a few giga FLOPS is required in many applications. Parallel processing techniques offer a promising scope for these applications. Several applications, such as computer graphics, computer aided design (CAD) of electronic and mechanical systems, weather modeling and robotics, demand very high computing power. To quote an example, ray tracing techniques [32] used to display 3-dimensional objects in computer graphics have to process 5×10^6 rays, each ray intersecting 10 to 20 surfaces, and each intersection computation requiring 20 to 30 floating-point operations on an average. In order to provide a fast, interactive and flicker-free display, each frame has to be computed 30 times per second. Thus, the total computation power required is about 60 giga FLOPS. Comparing this with the execution speed of present-day supercomputers -- whose expected/measured performance is about 100 million FLOPS [53] -- we observe that the demand exceeds the available performance by more than two orders of magnitude. The need for parallel processing is conspicuous from the above illustration. The recent advances witnessed in VLSI technology and the consequent decline in hardware cost further encourage the construction of massively parallel processing systems.
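As a quick check on the 60 giga FLOPS figure quoted in the ray-tracing example, take mid-range values from the stated ranges (15 intersections per ray and 25 floating-point operations per intersection; these mid-points are assumed here only for the tally):

$$
5\times 10^{6}\ \tfrac{\text{rays}}{\text{frame}} \;\times\; 15\ \tfrac{\text{intersections}}{\text{ray}} \;\times\; 25\ \tfrac{\text{FLOP}}{\text{intersection}} \;\times\; 30\ \tfrac{\text{frames}}{\text{s}} \;\approx\; 5.6\times 10^{10}\ \text{FLOPS},
$$

i.e. roughly the 60 giga FLOPS mentioned above.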
The paper is organized in the following manner. In Section 2, we point out the major issues associated with multiprocessing systems, with a brief discussion on each of these items. We classify parallel processing architectures broadly into two classes. The first of them, the von Neumann type, includes all multiprocessing systems that adopt the principles of the von Neumann computation model. Pipeline and vector processing, SIMD machines, MIMD machines and VLSI systems, which fall under this category, are surveyed in Section 3. Supercomputers employ one or more of these techniques for achieving high performance. More emphasis is given to current and recent developments in these areas. Under non-von Neumann architectures, we include data-driven, demand-driven and neural computers. Their methodology of computation is fundamentally different from the ones discussed earlier. An overview of these architectural configurations is presented in Section 4. The programming language and operating system support required for working with these complex parallel processing systems is reviewed in detail in the subsequent section. Parallel programming languages are classified into two categories: the conventional von Neumann type (which adopt the imperative style of programming) and the non-von Neumann languages. Case studies of some of the popular languages and their salient features are also reported. We remark that an exhaustive coverage of all contributions and research work reported in this fascinating area of parallel processing systems is beyond the scope of this paper, and we do not attempt that. Rather, we concentrate on the basic principles behind the design of the various architectures. We pay more attention to some specific architectures and models of computation which have gained importance.

2. Issues in Parallel Processing Systems

In this section we discuss the various issues associated with multiprocessing systems. First, it is interesting to look at the models of computation and the way one has evolved from the other, leading to myriad architectural configurations and machines.

2.1. Models of Computation

Depending on the instruction execution order, computer systems can be classified as control-driven, data-driven or demand-driven machines.

Control Flow

Conventional von Neumann uniprocessing and multiprocessing systems have an explicit execution order specified by the programmer. This hinders the extent of parallelism that can be exploited. However, these control flow models perform well for structured data items such as arrays [34]. Also, the three decades of programming experience we have had with these models still make them strong candidates for future generation computers.

Data-Driven Model

In this model, the execution order is specified by the data dependency alone [26]. Instructions are executed as soon as all their input arguments become available to them. The data flow model is suitable for expression evaluation [34]. Since only data dependency determines the execution order, and the granularity of parallelism exploited is as low as a single instruction, this model of computation exploits maximum parallelism. However, because of its eagerness in evaluating functions, it executes an instruction whenever its operands are available, irrespective of whether or not that instruction is required for the computation of the final result.
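The firing rule just described can be made concrete with a minimal sketch. The following Python fragment (an illustrative scheduler of our own, not the organization of any particular data flow machine) evaluates the graph of (a + b) * (a - b) by firing each node as soon as both of its operands have arrived:

    # Minimal data-driven (data flow) evaluator: a node fires as soon as
    # all of its input operands are available.  Illustrative sketch only.
    import operator

    # Graph for the expression (a + b) * (a - b):
    # each node lists its operator, its arity, and where its result goes.
    nodes = {
        "add": {"op": operator.add, "inputs": {}, "arity": 2, "dest": ("mul", "x")},
        "sub": {"op": operator.sub, "inputs": {}, "arity": 2, "dest": ("mul", "y")},
        "mul": {"op": operator.mul, "inputs": {}, "arity": 2, "dest": None},
    }

    def send(node_name, port, value, ready):
        """Deliver a token to a node; enable the node once all operands are in."""
        node = nodes[node_name]
        node["inputs"][port] = value
        if len(node["inputs"]) == node["arity"]:
            ready.append(node_name)

    def run(a, b):
        ready = []                      # nodes whose operands are all available
        for name in ("add", "sub"):     # initial tokens: a and b go to add and sub
            send(name, "x", a, ready)
            send(name, "y", b, ready)
        result = None
        while ready:                    # fire enabled nodes in any order
            name = ready.pop()
            node = nodes[name]
            value = node["op"](node["inputs"]["x"], node["inputs"]["y"])
            if node["dest"] is None:
                result = value
            else:
                send(node["dest"][0], node["dest"][1], value, ready)
        return result

    print(run(7, 3))   # prints 40 = (7 + 3) * (7 - 3)

The eagerness mentioned above is visible in the sketch: both the add and the sub node fire as soon as their operands arrive, whether or not their results are ultimately demanded.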
Demand-Driven Model

In demand-driven (also called reduction) execution, pioneered by Berkling [15] and Backus [9], the demand for a result triggers its execution, which in turn triggers the evaluation of its arguments, and so on. This demand propagation continues until constants are encountered; then a value is returned to the demanding node and execution proceeds in the opposite direction. Since a demand-driven computer performs only those computations required to obtain the results, it will, in general, perform fewer computations than a data-driven computer. As the computation model is 'lazy' in evaluating expressions, instructions accept partially filled data structures. This makes it possible for the demand-driven model to support infinite data structures. Such a facility is not available in the other models of computation. However, this model suffers overheads in terms of execution time due to the additional propagation of demands.

(Fig. 12: Neural net modules -- lowest-level computing modules, intermediate command modules and output command modules.)

The benefits of neural nets include high computation rates, provided by massive parallelism, and a greater degree of fault tolerance, since there are many processing nodes, each with primarily local connections. Designing artificial neural nets to solve problems and studying real biological nets may also change the way we think about problems and lead us to new insights and algorithmic improvements.

5. Software Issues Related to Multiprocessing Systems

Having discussed the various multiprocessing systems, it is worth probing further into two aspects of parallel processing: the languages used to program these complex systems and the operating system support needed to handle them.

5.1. Parallel Programming Languages

One of the motivations behind the development of concurrent languages has been the structuring of software -- in particular, operating systems -- by means of high-level language constructs. The need for liberating the production of real-time applications from assembly language has been another driving force. In the following discussion, we classify the concurrent high-level languages into two groups: 'traditional' languages of the von Neumann type (based on the imperative style of programming) and the unconventional languages, such as data flow, functional and logic-based languages. A quick review of some of these languages is presented below.

Conventional Parallel Languages

Based on the imperative style of programming, these languages are just extensions of their sequential counterparts. A concurrent language should allow programmers to define a set of sequential activities to be executed in parallel, to initiate their execution and to specify their interaction (refer to [4] for an excellent survey of the concepts and notations for parallel programming). One important point concerns the 'granularity of parallelism', i.e. the size of the units that can be processed in parallel. Some languages specify concurrency at the statement level, and certain others at the task level. Constructs for specifying inter-activity interaction are probably the most critical linguistic aspects of concurrency. Language constructs ensuring mutual exclusion are called synchronization primitives. Some of the best-known and landmark solutions that have been adopted to solve these problems are semaphores [27], mailboxes, monitors [49] and remote procedure calls [46]. Research in this direction is aimed at designing new mechanisms for interprocess communication, such as ordered ports [13]. In the following discussion we restrict ourselves to a few parallel programming languages and their salient features.
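Before turning to the individual languages, the idea behind one of these primitives, the monitor, can be illustrated with a small sketch. The Python fragment below is a generic illustration of our own, tied to none of the cited languages (the class and method names are hypothetical): a single lock provides mutual exclusion, and condition variables coordinate producers and consumers of a bounded buffer.

    # A monitor-style bounded buffer: mutual exclusion plus condition
    # synchronization, in the spirit of [49].  Illustrative sketch only.
    import threading
    from collections import deque

    class BoundedBuffer:
        def __init__(self, capacity):
            self._items = deque()
            self._capacity = capacity
            self._lock = threading.Lock()              # mutual exclusion
            self._not_full = threading.Condition(self._lock)
            self._not_empty = threading.Condition(self._lock)

        def put(self, item):
            with self._not_full:                       # enter the monitor
                while len(self._items) >= self._capacity:
                    self._not_full.wait()              # producer blocks when full
                self._items.append(item)
                self._not_empty.notify()               # wake one waiting consumer

        def get(self):
            with self._not_empty:
                while not self._items:
                    self._not_empty.wait()             # consumer blocks when empty
                item = self._items.popleft()
                self._not_full.notify()                # wake one waiting producer
                return item

    if __name__ == "__main__":
        buf = BoundedBuffer(capacity=2)
        consumer = threading.Thread(target=lambda: [print(buf.get()) for _ in range(3)])
        consumer.start()
        for i in range(3):
            buf.put(i)
        consumer.join()

Semaphores could be used to the same effect; the appeal of the monitor is that it packages the shared data and its synchronization into one construct, which is precisely what makes it attractive for structuring operating systems.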
Communicating Sequential Processes (CSP) [50] is a language designed especially for distributed architectures. In CSP, activities communicate via input/output commands. Communication requires both participating processes to issue their commands. CSP also achieves process synchronization using the input/output commands. Another interesting feature of this language is its ability to express non-determinism using guarded commands. An implementation of a subset of CSP has been successfully attempted by the authors' research group [73]. Their work on the design of an architecture to execute CSP is reported in [79].

Distributed Processes (DP) [46], developed by Hansen, is proposed for real-time applications controlled by microcomputer networks with distributed storage. In DP, a program consists of a fixed number of processes that are started simultaneously and exist forever. A process can call common procedures defined within other processes. These procedures are executed when the other processes are waiting for some conditions to become true. This is the only means of communication among the processes. Processes are synchronized by means of non-deterministic guarded commands.

The Occam language [55], which is based on CSP, has also been designed to support concurrent applications by using concurrent processes running on a distributed architecture. These processes communicate through channels. The Transputer [56], also developed by Inmos Corporation, supports the direct execution of this language.

Languages which adopt a monitor-based solution for synchronization are oriented towards architectures with shared memory. Examples of these languages are Concurrent Pascal [45], Ada [5] and Modula [99]. These languages, in general, support strong type checking and separate compilation, and express concurrent actions using explicit constructs.

Concurrent Pascal [45] extends the sequential programming language Pascal with the concurrent programming tools of processes and monitors. The main contribution of this language is extending the concept of the monitor with an explicit hierarchy of access rights to shared data structures that can be stated in the program text and checked by a compiler.

The programming language Modula [99] is primarily intended for programming dedicated computer systems. This language borrows many ideas from Pascal, but in addition to the conventional block structure it introduces a so-called module structure. A module is a set of procedures, data types and variables, where the programmer has precise control over the names that are imported from and exported to the environment [99]. Modula includes general multiprocessing facilities such as processes, interface modules and signals.

In Ada [5], we only have active components, the tasks. Information may be exchanged among tasks via entries. An entry is very similar to a procedure. The call to an entry is like a procedure call; parameters should be passed if the called entry requires them. A rendezvous is said to occur when the caller is in the call state and the called task is in the accept state. After executing the entry subprogram, both tasks resume their parallel execution. Ada provides specific and elaborate protocols for task termination. Ada is designed to support reliable and efficient real-time programming.
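The channel communication of CSP and Occam and the rendezvous of Ada share one idea: the sender and the receiver synchronize on every exchange. The Python sketch below is a loose, language-neutral approximation of that handshake (the Channel class and its methods are our own illustration, not the semantics of any of these languages): send() does not return until a partner has taken the value.

    # A synchronous channel: send() does not return until a receiver has
    # taken the value, mimicking the CSP/Occam handshake.  Illustrative only.
    import queue
    import threading

    class Channel:
        def __init__(self):
            self._data = queue.Queue(maxsize=1)
            self._ack = queue.Queue(maxsize=1)

        def send(self, value):
            self._data.put(value)      # offer the value ...
            self._ack.get()            # ... and wait until it has been received

        def receive(self):
            value = self._data.get()   # take the value ...
            self._ack.put(None)        # ... and release the waiting sender
            return value

    def producer(ch):
        for i in range(3):
            ch.send(i)                 # blocks until the consumer is ready
            print("sent", i)

    def consumer(ch):
        for _ in range(3):
            print("received", ch.receive())

    if __name__ == "__main__":
        ch = Channel()
        t = threading.Thread(target=producer, args=(ch,))
        t.start()
        consumer(ch)
        t.join()

In Occam the two sides of such an exchange would be written as ch ! i and ch ? x on the sending and receiving processes respectively.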
Non-von Neumann Languages

Conventional languages imitate the von Neumann computer. The dependency of these languages on the basic organization of the von Neumann machine is essentially a limitation on expressing and exploiting parallelism [10]. These imperative languages perform a task by repeatedly changing the state of the system rather than by operating on the data directly. In parallel processing applications, it makes more sense to use a language with a non-sequential semantic base. Various paradigms have been adopted and new programming languages based on these approaches have evolved. We restrict ourselves in this paper to two such paradigms, namely the applicative and the non-procedural styles of programming, and the resulting parallel versions of the languages that adopt these approaches.

Applicative languages (also referred to as functional languages) avoid side-effects, such as those caused by an assignment statement. The lack of side-effects accounts, at least partially, for the well-known Church-Rosser property, which essentially states that no matter what order of computation is chosen in executing a program, the program is guaranteed to give the same result (assuming termination). This marvellous determinacy property is invaluable in parallel systems. Another key point is that in functional languages the parallelism is implicit and supported by their underlying semantics. The system of Functional Programming languages (FP) [10] and Lisp [68] are two major outcomes of the applicative style of programming. Languages for data flow architectures, which avoid side-effects and encourage single assignment, are also included in the set of applicative languages. Dennis' Value-oriented Algorithmic Language (VAL) [1], Arvind's Irvine Dataflow language (Id) [8], and Keller's Flow Graph Language (FGL) [57] are candidate examples in this category.

Considerable work has been done by us in the area of applicative programming languages. A high-level language for data flow computers, called Data Flow Language (DFL) [72], has been proposed by us, and a compiler to convert programs written in this language into data flow graphs has been implemented. The concepts borrowed from CSP and DP, when embedded into data flow systems, result in two new languages for distributed processing, namely Communicating Data Flow Channels (CDFC) and Distributed Data Flow (DDF), respectively [75]. Communication and non-determinism features have been added to FP by us [40,41] to strengthen its power as a programming language. We have also proposed that FP can be used as a language for program specification [41].

Although parallelism in a program is expressed naturally by functional languages, its automatic detection and mapping onto processors do not result in optimal performance. It is desirable to provide the user with the ability to explicitly express parallelism and mapping, while retaining the functional style of programming. Languages which allow the programmer to annotate the parallelism and the mapping scheme for the target architecture can lead to optimal performance on a particular machine. Two languages developed with this motivation are ParAlfl (the para-functional language) [52] and Multilisp [44]. Efforts have been made to exploit the advantages offered by functional languages to the maximum extent by developing new machines based on non-von Neumann architectures (refer to [95] for a recent survey).
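The determinacy property mentioned above is easy to demonstrate on a toy example. In the Python sketch below (our own illustration; the function f is hypothetical and chosen only because it is free of side-effects), the same mapping is evaluated once sequentially and once by a pool of four worker processes, and the two results necessarily agree, whatever order the workers happen to follow:

    # Side-effect-free functions give the same answer under any evaluation
    # order, so the mapping below may be farmed out to several processors.
    from multiprocessing import Pool

    def f(x):
        # a pure function: the result depends only on the argument
        return x * x + 1

    if __name__ == "__main__":
        data = list(range(16))
        sequential = [f(x) for x in data]          # one fixed evaluation order
        with Pool(processes=4) as pool:
            parallel = pool.map(f, data)           # applications run concurrently
        assert sequential == parallel              # identical results (determinacy)
        print(parallel)

It is this order-independence that allows a compiler or run-time system for an applicative language to farm out sub-expressions freely.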
Applications such as the design of knowledge-based systems and natural language processing have revealed the inadequacy of conventional programming languages for offering elegant solutions. The use of predicate logic, which is a high-level, human-oriented language for describing problems and problem-solving methods to computers, promises great scope for these applications. Logic programming languages combine simplicity with a number of powerful features. They separate logic and control [59], the two major components of an algorithm, and thus enable the programmer to write programs that are more correct, more easily improved and more readily adapted. The powerful features of these languages also include the declarative style, the unification mechanism for parameter passing, and the execution strategy offered by a non-deterministic computation rule. The powerful execution mechanism provided by these languages is due to the non-procedural paradigm. An outcome of the research carried out in this area with these motivations is the design of the language Prolog [19]. Logic programming languages offer three kinds of parallelism, namely 'AND', 'OR' and 'argument' parallelism [21,43]. The inability of the von Neumann architecture to execute logic programming languages (which, in essence, support the non-procedural paradigm) efficiently has led to the design of many parallel logic programming machines [16,43,70,94,100]. Further research on these languages has led to the design of three parallel logic programming languages: PARLOG (PARallel LOGic programming) [18], P-Prolog [103] and Concurrent Prolog [84].

With the increasing size and complexity of parallel processing systems, it becomes essential to design efficient operating systems, without which handling such systems would be impossible. The general principles and requirements of multiprocessor operating systems are discussed in the following section.

5.2. Multiprocessor Operating Systems

The basic goals of an operating system are to provide a programmer interface to the machine, manage resources, provide mechanisms to implement policies, and facilitate matching applications to the machine. There is conceptually little difference between the operating system requirements of a multiprocessor and those of a large computer system with multiprogramming. The operating system for a multiprocessor should, however, be able to support multiple asynchronous tasks which execute concurrently, and hence is more complex. The functional capabilities of a multiprocessor operating system include resource allocation and management schemes, memory and data-set protection, prevention of system deadlock, handling of abnormal process termination (exception handling), and processor load balancing. Also, the operating system should be capable of providing system reconfiguration schemes to support graceful degradation of performance in the event of a failure. In the following discussion, we briefly introduce the three basic configurations, namely the master-slave, separate-supervisor and floating-supervisor systems [53].

In a 'master-slave' configuration, one processor, called the master, maintains the status of all processors in the system and apportions the work to all the slave processors. Service calls from the slave processors are sent to the master for executive service. Only one processor (the master) uses the supervisor and its associated procedures. The merit of this configuration is its simplicity. However, a parallel processing system with the master-slave configuration is susceptible to catastrophic failures, and a low utilization of the slave processors may result if the master cannot dispatch processes fast enough. Cyber 170 and DEC System 10 use this mode of operation. The master-slave configuration is most effective for special applications where the work load is well defined.
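A minimal sketch of this division of labour is given below, with operating-system processes standing in for the slave processors (the task of squaring numbers and all names used are our own illustrative choices, not drawn from any of the systems cited):

    # Master-slave organization: the master holds the work queue and hands
    # out tasks; slaves only execute what the master apportions to them.
    import multiprocessing as mp

    def slave(task_q, result_q):
        while True:
            task = task_q.get()            # wait for work from the master
            if task is None:               # sentinel: the master has no more work
                break
            result_q.put((task, task * task))   # report the result back

    if __name__ == "__main__":
        task_q, result_q = mp.Queue(), mp.Queue()
        slaves = [mp.Process(target=slave, args=(task_q, result_q)) for _ in range(4)]
        for p in slaves:
            p.start()
        for t in range(10):                # the master apportions the work ...
            task_q.put(t)
        for _ in slaves:
            task_q.put(None)               # ... and finally dismisses the slaves
        results = dict(result_q.get() for _ in range(10))
        for p in slaves:
            p.join()
        print(results)

The sketch also makes the weaknesses noted above visible: all work and all results funnel through the master's queues, so a failed or slow master idles every slave.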
In a 'separate supervisor' system, each processor contains a copy of a basic kernel. Each processor services its own needs. However, since there is some interaction among the processors, it is necessary for some of the supervisory code to be reentrant, unlike in the master-slave mode. The separate supervisor mode is more reliable than the master-slave mode, but the replication of the kernel in all the processors causes an under-utilization of memory.

The 'floating supervisor' scheme treats all the processors, as well as the other resources, symmetrically, as an anonymous pool of resources. In this mode, the supervisor routine floats from one processor to another, although several of the processors may be executing service routines simultaneously. Conflicts in service requests are resolved by priorities. Table access should be carefully controlled to maintain the system integrity. The floating supervisor mode of operation has the advantages of providing graceful degradation and better availability of a reduced-capacity system. Furthermore, it is flexible, provides true redundancy and makes the most efficient use of the available resources. Examples of operating systems that execute in this mode are MVS and VM on the IBM 3081 and Hydra [102] on the C.mmp.

6. Conclusions

In this survey, we have identified the various issues involved in parallel processing systems. Approaches followed to solve the associated problems have also been discussed and their relative merits put forth. The principles and requirements of language and operating system support for complex multiprocessing systems are described in detail. For the wide spectrum of architectures proposed in the literature, the design principles and salient features are brought out in a comparative manner.

While the envisaged potential offers a promising scope for parallel processing systems in many applications, only a few systems have been commercialized. The main reason for this is the lack of good software support for these systems. The design of intelligent compilers which can identify parallel subtasks in a program (written in a sequential language), schedule the subtasks onto the processing elements, and manage communication among the scheduled tasks is a step towards this end. Although there are many existing proposals along this line, none of them seems to achieve all three goals in an integrated manner, relieving the user of the burden completely.

Another question that remains unanswered is whether or not to continue with the von Neumann approach for building complex parallel processing machines. While familiarity and past experience with the control flow model make it a strong candidate, its inherent inefficiencies, such as the explicit specification of control and the global updatable memory, limit its capabilities. Although data-driven and demand-driven computers exploit maximum parallelism in a program, their complex structure and inadequate software support force the designer to have second thoughts about these approaches. With the advent of VLSI technology and RISC design, dedicated architectures are becoming more and more popular.
However, the inapplicability of these systems to a variety of applications is a serious concern. At the other end of the spectrum, we have general-purpose parallel processing systems, which give degraded performance due to the mismatch between the architecture and the algorithm, and the reconfigurable machines. Considerable work is going on in designing efficient algorithms (for general-purpose computing systems) which will bridge the gap between the application program and the architecture. Finally, the research on neural computers and molecular machines is in its infancy. The modeling of neural circuits and the understanding of the functioning of the human brain have to be considerably refined before one can make use of them for building high-speed computing systems. The vastness of this fascinating area, in which active research is under way, and the innumerable problems that remain to be solved are themselves standing evidence of the promising future of parallel processing. With the ever-growing demand for very high computing speeds, and with the inability of switching devices to cope with this need, parallel processing techniques seem to be the only alternative.

7. References

1. W.B. Ackerman and J.B. Dennis, "VAL - A Value Oriented Algorithmic Language, Preliminary Reference Manual", Tech. Rep. TR-218, Lab. for Computer Science, M.I.T., Cambridge, MA, June 1979.
2. J.S. Albus, Brains, Behavior and Robotics, Byte Books, McGraw-Hill, New York, NY, 1981.
3. M. Amamiya, R. Hasegawa and H. Mikami, "List Processing and Data Flow Machines", Lecture Note Series, No. 436, Research Institute for Mathematical Sciences, Kyoto University, September 1980.
4. G.R. Andrews and F.B. Schneider, "Concepts and Notations for Concurrent Programming", Computing Surveys, Vol. 15, No. 1, pp. 3-43, March 1983.
5. ANSI/MIL-STD 1815A, Reference Manual for the Ada Programming Language, 1983.
6. C.N. Arnold, "Performance Evaluation of Three Automatic Vectorizer Packages", Proc. Int'l Conf. Parallel Processing, pp. 235-243, 1982.
7. Arvind and K.P. Gostelow, "A Computer Capable of Exchanging Processors for Time", Proc. IFIP Congress, pp. 849-854, 1977.
8. Arvind, K.P. Gostelow and W. Plouffe, "An Asynchronous Programming Language and Computing Machine", Tech. Rep. TR-114a, Dept. of Information and Computer Science, University of California, Irvine, CA, December 1978.
9. J. Backus, "Reduction Languages and Variable Free Programming", Rep. RJ 1010, IBM T.J. Watson Research Center, Yorktown Heights, NY, April 1972.
10. J. Backus, "Can Programming be Liberated from the von Neumann Style? A Functional Style and its Algebra of Programs", Communications of the ACM, Vol. 21, No. 8, pp. 613-641, August 1978.
11. W.L. Bain Jr. and S.R. Ahuja, "Performance Analysis of High-Speed Digital Busses for Multiprocessing Systems", Proc. 8th Ann. Symp. Computer Architecture, pp. 107-131, May 1981.
12. G.H. Barnes, R.M. Brown, M. Kato, D.J. Kuck, D.L. Slotnick and R.A. Stokes, "The ILLIAC IV Computer", IEEE Trans. on Computers, Vol. C-17, No. 8, pp. 746-757, August 1968.
13. J. Basu, L.M. Patnaik and A.K. Goswami, "Ordered Ports - A Language Construct for High Level Distributed Programming", The Computer Journal, Vol. 30, 1987.
14. K.E. Batcher, "Design of a Massively Parallel Processor", IEEE Trans. on Computers, Vol. C-29, No. 9, pp. 836-840, September 1980.
15. K. Berkling, "Reduction Languages for Reduction Machines", Proc. 2nd Int'l Symp. Computer Architecture, January 1975.
16. L. Bic, "A Data-Driven Model for Parallel Implementation of Logic Programs", Dept. of Information and Computer Science, University of California, Irvine, CA, January 1984.
17. R.P. Case and A. Padegs, "Architecture of the IBM System/370", Communications of the ACM, Vol. 21, No. 1, pp. 73-96, January 1978.
18. K. Clark and S. Gregory, "PARLOG: PARallel Programming in LOGic", ACM Trans. on Programming Languages and Systems, Vol. 8, No. 1, pp. 1-49, January 1986.
19. W.F. Clocksin and C.S. Mellish, Programming in Prolog, Springer-Verlag, New York, NY, 1984.
20. D. Comte, A. Durrieu, O. Gelly, A. Plas and J.C. Syre, "TEAU 9/7 SYSTEME LAU - Summary in English", CERT Tech. Rep. #1/3059, Centre d'Etudes et de Recherches de Toulouse, October 1976.
21. J.S. Conery and D.F. Kibler, "AND Parallelism and Nondeterminism in Logic Programs", New Generation Computing, Vol. 3, No. 1, pp. 43-70, 1985.
22. M. Cornish, "The TI Data Flow Architectures: The Power of Concurrency for Avionics", Proc. 3rd Conf. Digital Avionics Systems, pp. 19-25, November 1979.
23. A.L. Davis, "The Architecture and System Method of DDM1: A Recursively Structured Data Driven Machine", Proc. 5th Ann. Symp. Computer Architecture, pp. 210-215, April 1978.
24. A.L. Davis and R.M. Keller, "Data Flow Program Graphs", Computer, Vol. 15, No. 2, pp. 26-41, February 1982.
25. Denelcor, Inc., Heterogeneous Element Processor: Principles of Operation, April 1981.
26. J.B. Dennis and D.P. Misunas, "A Preliminary Architecture for a Basic Data Flow Processor", Proc. 2nd Int'l Symp. Computer Architecture, pp. 126-132, January 1975.
27. E.W. Dijkstra, A Discipline of Programming, Prentice-Hall, Englewood Cliffs, NJ, 1976.
28. B.L. Drake, F.T. Luk, J.M. Speiser and J.J. Symanski, "SLAPP: A Systolic Linear Algebra Parallel Processor", Computer, Vol. 20, No. 7, pp. 45-49, July 1987.
29. M.J. Duff, "CLIP-4", in Special Computer Architectures for Pattern Recognition, K.S. Fu and T. Ichikawa (Eds.), CRC Press, 1982.
30. T.Y. Feng, "A Survey of Interconnection Networks", Computer, Vol. 14, No. 12, pp. 12-27, December 1981.
31. A.L. Fisher, H.T. Kung, L.M. Monier, H. Walker and Y. Dohi, "Architecture of the PSC: A Programmable Systolic Chip", Proc. 10th Ann. Symp. Computer Architecture, 1983.
32. J.D. Foley and A. van Dam, Fundamentals of Interactive Computer Graphics, Addison-Wesley, Reading, MA, 1982.
33. J.A.B. Fortes and B.W. Wah, "Systolic Arrays - From Concept to Implementation", Guest Editors' Introduction, Computer, Vol. 20, No. 7, pp. 12-17, July 1987.
34. D.D. Gajski and J. Peir, "Essential Issues in Multiprocessor Systems", Computer, Vol. 18, No. 6, pp. 9-27, June 1985.
35. C. Ghezzi, "Concurrency in Programming Languages: A Survey", Parallel Computing, Vol. 2, pp. 229-241, 1985.
36. D. Ghosal and L.M. Patnaik, "Performance Analysis of Hidden Surface Removal Algorithms on MIMD Computers", Proc. IEEE Int'l Conf. on Computers, Systems and Signal Processing, pp. 1666-1675, December 1984.
37. D. Ghosal and L.M. Patnaik, "Parallel Polygon Scan Conversion Algorithms: Performance Evaluation on a Shared Bus Architecture", Computers and Graphics, Vol. 10, No. 1, pp. 7-25, 1986.
38. D. Ghosal and L.M. Patnaik, "SHAMP: An Experimental Multiprocessor System for Performance Evaluation of Parallel Algorithms", Microprocessing and Microprogramming, Vol. 19, No. 3, pp. 179-192, 1987.
39. K.P. Gostelow and R.E. Thomas, "Performance of a Data Flow Computer", Tech. Rep. 127a, Dept. of Information and Computer Science, University of California, Irvine, CA, October 1979.
Patnaik, "An Algebraic Approach Touards Formal Development of Functional Programs", accepted for publication, New Generation Computing. 41. A.K. Goswarai, "Non-Determinism and Communication in Functional Programming Systems: A Study in Formal Program Development", Ph.D. Dissertation, Dept. Coraputer Science and Automation, Indian Institute of "Science, Bangalore, India, October 1985. 42. A. Gottlieb, R. Grishman, C.P. Krushkal, K.P. McAaliffe, L. Randolph and M. Snir, "The NYU Ultracomputer -- Designing an MIMD Shared Memory Parallel Computer", IEEE Trans. on Computers, Vol.C-32, No.2, p.p. 175-189, February 1983. 43. Z. Halim, "A Data-Driven Machine for OR Parallel Evaluation of Logic Programs", 63 New Generation Computing, Vol.4, No.l, pp.5-33, 1986. 44. R.H. Halstead Jr., "Multilisp: A Language for Concurrent Symbolic Computation", ACM Trans. on Programming Languages and Systems, October 1985. 45. P.B. Hansen, "The Programming Language Concurrent Pascal", IEEE Trans. on Softuare Engineering, Vol.SE-1, Nb. 2, pp.199-209, June 1975. 46. P.B. Hansen, "Distributed Processes: A Concurrent Programming Concept", Communications of the ACM, Vol.21, No.ll, pp.934-941, November 1978. 47. J.P. Hayes, R. Jain, W.R. Martin, T.N. Mudge, L.R. Scott, K.G. Shin and Q.F. Stout, "Hypercube Computer Research at the University of Michigan", Proo. Second Conf. Hypercube Multiprocessors, September/October 1986. 48. W.D. Hillis, The Connection Machine. M.I.T. Press, Cambridge, MA, 1986. 49. C.A.R. Hoare, "Monitors: An Operating System Structuring Concept", Communication of the ACM, Vol.17, No.10, pp.549-557, October 1974. 50. C.A.R. Hoare, "Communicating Sequential Processes", Comraunication of the ACM, Vol.21, No.8, pp.666-677, Auguat 1978. 51. J.J. Hopfieed and D.W. Tank, "Computing with Neural Circuits", Science, pp.625- 633, August 1986. 52. P. Hudak, "Para-Functional Progratnming", Computer, Vol.19, No.8, pp.60-70, August 1986. 53. K. Hwang and F.A. Briggs, Computer Architecture and Parallel Processing. McGraw-Hill Book Company, New York, NY, 1984. 54. A.K. Jones and E.F. Gehringer, "Cm« Multiprocessor Project: A Research Review", Tech. Rep. CMU-CS-80-131, Carnegie-Mellon University, July 1980. 55. Inmos Ltd., Occam Reference Manual. 1986. 56. Inmos Ltd., transputer Reference Manual. Ref. No. 72TRN 04802, 1986. 57. R.M. Keller, S. Patil and G. Lindstrom, "A Loosely Coupled Applicative Multiprocessing System", Proc. Nat. Computer Conf., AFIPS, pp.861-870, 1979. 58. W.E. Kluge and H. Schlutter, "An Architecture for the Direct Execution of Reduction Languages", Proc. Int'l Workshop High-Level Language Computer Architecture, pp.174-180, May 1980. 59. R.A. Koualski, ' "Algorithm = Logic + Control", Communication of . the ACM, Vol.22, No.7, pp.424-435, July 1979. 60. D. Krishnan and L.M. Patnaik, "Systolic Architecture for Boolean Operations oh Polygons and Polyhedra", Eurographics, The Europian Association for Computer Graphics, Vol.6, No.3, July 1987. 61. C.P. Kruskal and A. Weiss, "Allocating Independent Subtasks on Parallel Processors", Proc. Int'l Conf. Parallel Processing, pp.236-240, August 1984. 62. H.T. Kung and S.W. Song, "A Systolic 2D Convplution Chip", Dept. Computer Science, Carnegie Mellon University, Tech. Rep. CMU-CS-81-110, March 1981. 63. H.T. Kung, "Why Systolic Architectures?", Computer, Vol.15, No.l, pp.37-46, January 1982. 64. S.Y. Kung, S.C. Lo, S.N. Jean and J.N. Hwang, "Wavefront Array Processors Concept to Implementation", Computer, Vol.20, No.7, pp.18-33, July 1987. 65. R.P. 
Lippmann, "An Introduction to Computing with Neural Nets", IEEE ASSP, pp.4-20, April 1987. 66. G.A. Mago, "A Network of Multiprocessors to Execute Reduction Languages", Int'1 J. Computer and Information Science, Part 1: Vol.8, No.5, pp.349-385, Part 2: Vol.8, No.6, pp.435-571, 1979. 67. P.C. Mathias and L.M. Patnaik, "A Systolic Evaluation of Linear, Quadratic and Cubic Expressions", accepted for publication, Journal of Parallel and Distributed' Computing. 68. J. McCarthy, et.al., LISP 1.5 Programmer's Reference Manual. M.I.T. Press, Cambridge, MA, 1965. 69. C.A. Mead and L.A. Conway, Introduction to VLSI Systems, Addison Wesley, Reading, MA, 1980. 70. R. Onai, M. Aso, H. Shimizu, K. Masuda and A. Matsumoto, "Architecture of a Reduction Based Parallel Inference Machine: PIM-P", New Generation Computing, Vol.3, No.2, pp.197-228, 1985. 71. D.A. Padua, D.J. Kuck and D.L. Lavrrrie, "High Speed Multiprocessor and Compilation Techniques", IEEE Trans. on Computers, Vol.C-29, No.9, pp.763-776, September 1980. 72. L.M. Patnaik, P. Bhattacharya and R. Ganesh, "DFL: A Data Flow Language", Computer Languages, Vol.9, No.2, pp.97- 106, 1984. 73. L.M. Patnaik and B.R. Badrinath, "Implementation of CSP-S for Description of Distributed Algorithms", Computer Languages, Vol.9, No.3/4, pp.193-202, 1984. 74. L.M. Patnaik, R. Govindarajan and N.S. Ramadoss, "Design and Performance Evaluation of EXMAN: An EXtended MANchester Data Flow Computer", IEEE Trans. on Computers, Vol.C-35, No.3, pp.229-244, March 1986. 75. L.M. Patnaik and J. Basu, "Two Tools for Interprocess Communication in Distributed Data Flow Systems", The Computer Journal, Vol.29, No.6, pp.506-521, December 1986. 76. A. Plas, et.al., "LAU Syatem Architecture: A Parallel Data Driven Processor Based on Single Assignment", Proc. 1976 Int'1 Conf. Parallel Processing, pp.293-302, August 1976. 77. C.V. Ramamoorthy and H.F. Li, "Pipeline Architectures", Computing Surveys, Vol.9, No.l, pp.61-102, March 1977. 78. C.P. Ravikumar and L.M. Patnaik, "Parallel Placement Based on Simulated Annealing", 64 IEEE Int'l Conf. Computer Design: VLSI in Computers and Processors, October 1987. 79. C.P. Ravikumar and L.M. Patnaik, "An Architecture for CSP and its Simulation", Int'l Conf. Parallel Processing, 1987. 80. C. Rieger, "ZMOB: Doing it in Parallel", Proc. Workshop Coraputer Architecture for PAIDM, pp.119-125, 1981. 81. A. Rosenfield, "Parallel Image Processing Using Cellular Arrays", Coraputer, Vol.16, No.l, pp.14-20, Janyary 1983. 82. R.M. Russel, "The Cray-1 Computer System", Communications of the ACM, Vol.21, No.1, pp.63-72, January 1978. 83. C.L. Seitz, "The Cosmic Cube", Communication of the ACM, Vol.28, No.l, pp.22-33, January 1985. 84. E.Y. Shapiro, "Concurrent Prolog: A Progress Report", Computer, Vol.19, No.8, pp.44-58, August 1986. 85. L. Snyder, "Introduction to Configurable Highly Parallel Computer", Computer, Vol.15, No.l, pp.47-64, January 1982. 86. J.S. Squire and S.M. Palais, "ProKrammintf and Design Conaiderations for a llighly Parallel Computer", Proc. AFIP Conf., Vol.23, pp.395-400, 1963. 87. J. Silc and B. Robič, "Efficient Static Dataflow Architecture for Specialized Computatins", Proc. 12'" IMACS World Congress on Scientific Computing, Paris, France, July 1988. 88. K.R. Traub, "A Compiler for M.I.T. Tagged- Token Data Flow Architecture", M.S. Thesis, M.I.T., Cambridge, MA, 1986. 89. P.C. Treleaven, "Principle Coraponents of a Data Flow Computer", Proc. 1978 Euromicro Symp., pp.366-374, October 1978. 90. P.C. Treleaven and G.F. 
90. P.C. Treleaven and G.F. Mole, "A Multiprocessor Reduction Machine for User-Defined Reduction Languages", Proc. 7th Int'l Symp. Computer Architecture, pp. 121-130, 1980.
91. P.C. Treleaven, D.R. Brownbridge and R.P. Hopkins, "Data-Driven and Demand-Driven Computer Architecture", Computing Surveys, Vol. 14, No. 1, pp. 93-143, March 1982.
92. D.A. Turner, "A New Implementation Technique for Applicative Languages", Software Practice and Experience, Vol. 9, No. 5, pp. 31-49, September 1979.
93. J.D. Ullman, Computational Aspects of VLSI, Computer Science Press, 1984.
94. S. Umeyama and K. Tamura, "Parallel Execution of Logic Programs", Proc. 10th Ann. Symp. Computer Architecture, pp. 349-355, 1983.
95. S.R. Vegdahl, "A Survey of Proposed Architectures for the Execution of Functional Languages", IEEE Trans. on Computers, Vol. C-33, No. 12, pp. 1050-1071, December 1984.
96. A.G. Venkataramana and L.M. Patnaik, "Design and Performance Evaluation of a Systolic Architecture for Hidden Surface Removal", accepted for publication, Computers and Graphics.
97. A.G. Venkataramana and L.M. Patnaik, "Systolic Architecture for B-Spline Surfaces", accepted for publication, International Journal of Parallel Programming.
98. I. Watson and J. Gurd, "A Prototype Data Flow Computer with Token Labelling", Proc. Nat. Computer Conf., New York, AFIPS Press, pp. 623-628, 1979.
99. N. Wirth, "Modula: A Programming Language for Modular Multiprogramming", Software Practice and Experience, Vol. 7, No. 1, pp. 3-35, January 1977.
100. M.J. Wise, "A Parallel PROLOG: The Construction of a Data-Driven Model", Proc. ACM Conf. LISP and Functional Programming, 1982.
101. W.A. Wulf and C.G. Bell, "C.mmp - A Multi-Mini-Processor", Proc. AFIPS Fall Joint Computer Conf., Vol. 41, pp. 765-777, 1972.
102. W.A. Wulf et al., "Overview of the Hydra Operating System", Proc. 5th Symp. Operating System Principles, pp. 122-131, November 1975.
103. R. Yang and H. Aiso, "P-Prolog: A Parallel Logic Language Based on Exclusive Relation", New Generation Computing, Vol. 5, No. 1, pp. 79-95, 1987.