ABSTRACT

A computer usually consists of four major components: the arithmetic-logic unit (ALU), the main memory

unit (MU), the input/output unit (I/O), and the control unit (CU). Such a computer is known as a

uniprocessor since the processing is achieved by operating on one word or word pair at a time. In order to

increase the computer performance, we may improve the device technology to reduce the switching (gate

delay) time. Indeed, for the past half century we have seen switching speeds improve from 200 to 300 ms for

relays to present-day subnanosecond very large scale integration (VLSI) circuits. As the switching speeds of

computer devices approach a limit, however, any further significant improvement in performance is more

likely to be in increasing the number of words or word pairs that can be processed simultaneously. For

example, we may use the one ALU of a uniprocessor N times to add N pairs of operands, or we may design a

computer system with N ALUs to add all N pairs at once. Conceptually, such a computer system may still consist

of the four major components mentioned previously except that there are N ALUs. An organization with

multiple ALUs under the control of a single CU is called a parallel processor. To make a parallel processor

more efficient and cost-effective, a fifth major component, called the interconnection network, is usually

required to facilitate interprocessor and processor-memory communications. In addition, each ALU

requires not only its own registers but also network interfaces; the expanded ALU is then called a processing

element (PE). Figure 17.1 shows a block diagram of a parallel processor.
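
The contrast between one ALU used N times and N ALUs used once can be sketched in a few lines of Python. This is only an illustrative model, not a description of any particular machine: the function names, the `num_alus` parameter, and the notion of a "step" (one add instruction issued by the control unit) are assumptions introduced here, and the interconnection network is not modeled.

```python
def uniprocessor_add(a, b):
    """Model of one ALU: N additions take N sequential steps."""
    results = []
    steps = 0
    for x, y in zip(a, b):
        results.append(x + y)  # the single ALU handles one word pair
        steps += 1
    return results, steps


def parallel_processor_add(a, b, num_alus):
    """Model of num_alus ALUs under a single control unit.

    In each step the (hypothetical) control unit issues one add, and up
    to num_alus ALUs apply it to their own word pairs at the same time.
    """
    results = []
    steps = 0
    for start in range(0, len(a), num_alus):
        pairs = zip(a[start:start + num_alus], b[start:start + num_alus])
        results.extend(x + y for x, y in pairs)  # conceptually simultaneous
        steps += 1
    return results, steps


a = [1, 2, 3, 4]
b = [10, 20, 30, 40]
serial_sums, serial_steps = uniprocessor_add(a, b)
parallel_sums, parallel_steps = parallel_processor_add(a, b, num_alus=len(a))
```

With N = 4 word pairs, the uniprocessor model takes four steps and the model with four ALUs takes one, while both produce the same sums. When there are fewer ALUs than word pairs, the step count in this sketch falls between the two extremes.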