Guide to Computer Architecture and System Design--CO-PROCESSORS SCENARIO








1. INTRODUCTION

The growth of hardware functionality, combined with the pipeline strategy, paved the way for CISC (complex instruction set computer) systems. RISC (reduced instruction set computer) designs notably increased on-processor storage and met the constraints of parallel processing with a synchronized approach on microprogrammed machines. In either case, the computational power obtained is far below what is expected. Applications therefore call for additional processors, of similar or different organization, that co-operate to obtain optimum performance figures.

A coprocessor can be an I/O (input-output) processor or an arithmetic processor, depending on the demands of the application. Thus, a computer that employs two or more processors of the same or different nature, under a well co-ordinated operating system, can be referred to as a co-processing environment. This is an altogether different approach from SIMD (single instruction, multiple data) or RISC machines. A communication processor assists data flow on batch machines, whereas an arithmetic coprocessor reduces the turnaround time of dedicated computing tasks. Co-processing avoids bus bottleneck problems, improves the utilization of data bandwidth and prompts newer configuration management techniques. An interactive database management system has to take care of deadlocks and livelocks. A critical region in a program is a section of code that is executed with exclusive access to shared data. Livelock is a situation in which processes remain active but the computation makes no advancement. Software engineering for parallelism aims to avoid deadlock and livelock, to prevent unwanted race conditions, to restrict the creation of too many parallel constructs, and to detect program termination.
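The notion of a critical region can be illustrated with a minimal Python sketch. The function name `run_counter` and its parameters are hypothetical, chosen only for this illustration; the lock stands in for whatever exclusion mechanism a real co-processing system provides.

```python
import threading


def run_counter(num_threads=4, increments=10_000):
    """Increment a shared counter from several threads inside a critical region."""
    counter = 0
    lock = threading.Lock()  # guards the shared counter

    def worker():
        nonlocal counter
        for _ in range(increments):
            with lock:  # critical region: exclusive access to the shared data
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # waiting for every thread is how termination is detected here
    return counter


if __name__ == "__main__":
    print(run_counter())
```

Because every update happens while the lock is held, the final count equals the number of threads times the number of increments. Only one lock is involved, so this sketch cannot deadlock; with several locks, acquiring them in one fixed order is a common way to keep that property.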

In a parallel programming environment, more emphasis must be placed on writing code that is correct, because of the tremendous difficulties that may creep in at debugging time.

Program organization in terms of pure mathematical functions enhances possibilities for correctness and automatic compiler checking.

The chapter is organised as follows. The trends in VLSI needs are followed by examples of networking configurations. The abilities of the 8087 coprocessor are then touched upon.

The TMS 320 family of digital signal processors, which has a specific orientation, is discussed later.

Fault-tolerant computing in the CMOS domain is then presented, and the chapter concludes with local area networks.

2. HARDWARE REPLACEMENT FACT

The two basic hardware schemes for implementing software functions are Read Only Memories and the hard-wired logic gates using MSI (Medium scale integration) circuits or the field programmed PLAs (programmable logic arrays). See fig. 1.


Fig. 1 High speed PLA using Schottky TTL or ECL.
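The two-level structure of a PLA (an AND plane that forms product terms, feeding an OR plane that forms the outputs) can be sketched in Python as follows. The data layout, the function name `pla_eval` and the full-adder personality are assumptions made for illustration; they do not describe any particular device.

```python
def pla_eval(inputs, and_plane, or_plane):
    """Evaluate a PLA described by its AND plane and OR plane.

    inputs:    tuple of 0/1 input values
    and_plane: list of product terms; each term maps input index -> required value
               (inputs missing from a term are "don't care")
    or_plane:  list of outputs; each output lists the product terms it ORs together
    """
    products = [all(inputs[i] == v for i, v in term.items()) for term in and_plane]
    return tuple(int(any(products[p] for p in terms)) for terms in or_plane)


# Hypothetical personality: a 1-bit full adder, inputs (a, b, cin) -> (sum, carry)
FULL_ADDER_AND = [
    {0: 1, 1: 0, 2: 0},  # a AND NOT b AND NOT cin
    {0: 0, 1: 1, 2: 0},  # NOT a AND b AND NOT cin
    {0: 0, 1: 0, 2: 1},  # NOT a AND NOT b AND cin
    {0: 1, 1: 1, 2: 1},  # a AND b AND cin
    {0: 1, 1: 1},        # a AND b
    {0: 1, 2: 1},        # a AND cin
    {1: 1, 2: 1},        # b AND cin
]
FULL_ADDER_OR = [
    [0, 1, 2, 3],  # sum
    [4, 5, 6],     # carry
]

if __name__ == "__main__":
    print(pla_eval((1, 1, 0), FULL_ADDER_AND, FULL_ADDER_OR))  # (0, 1)
```

Changing the two tables changes the function realised, which mirrors the way a field-programmed PLA replaces a software routine with a fixed hardware personality.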

Programmable logic arrays are suitable for sorting, Fast Fourier Transforms (FFT), and floating-point arithmetic. The Intersil IM 5200 or Signetics 82S100 is used for random logic functions that interface an external LSI unit or bulk-storage memory (disk or tape) to the microprocessor. Research on computer architecture has been stimulated by research on software engineering and by innovations in VLSI technology. The hardware must support portability, which is an important feature for parallel architectures such as data flow machines.

Existing supercomputers use techniques such as pipelined processing (TI ASC and CDC STAR-100), vector processing (CRAY-1 and Cyber-205), and array processing (Illiac IV, BSP and MPP) to provide high performance and achieve processing rates of hundreds of megaflops.

3. SAMPLE SYSTEMS

BBN Butterfly parallel processor:

Initially developed as a high-speed network switch, the Butterfly has been used in a variety of applications like fluid dynamics, image understanding and data communications.

Each processor node consists of a Motorola MC 68020 CPU and MC 68881 floating-point coprocessor; 1 to 4 Mbytes of local memory; a processor node controller (PNC); an I/O bus; and a Butterfly switching interface. The number of switching elements for an n-processor system increases at a rate of n log4 n (logarithm to base 4), which is significantly better than the n^2 complexity of a crossbar switch. The development environment (compilers, libraries, etc.) resides on the host. Programs for the Butterfly are written on the front-end computer, either a DEC VAX or a SUN workstation, running UNIX 4.2 BSD. It is as of now a single-user system of conventional RISC architectures. OCCAM is an interesting language that should evolve to support massively parallel architectures, particularly architectures without global storage, as in the transputer domain.
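The difference between the two growth rates quoted above can be made concrete with a small Python sketch. The function names are hypothetical, and the values are growth terms only, not exact switching-element counts for any real Butterfly configuration.

```python
def log4_exact(n):
    """Return log base 4 of n, for n an exact power of 4."""
    k = 0
    m = n
    while m > 1:
        if m % 4:
            raise ValueError("n must be a power of 4")
        m //= 4
        k += 1
    return k


def butterfly_growth(n):
    """Growth term n * log4(n) for a Butterfly-style switching network."""
    return n * log4_exact(n)


def crossbar_growth(n):
    """Growth term n^2 for a crossbar switch."""
    return n * n


if __name__ == "__main__":
    for n in (16, 64, 256):
        print(n, butterfly_growth(n), crossbar_growth(n))
```

For 256 processors the terms are 1024 against 65536, a ratio of 64, and the gap keeps widening as n grows.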

iPSC (Intel Personal SuperComputer):

Hypercube

For any hypercube of dimension d, each node has d nearest neighbors and the cube has 2^d nodes; for example, d = 4 implies 16 nodes.

The average distance between any two nodes is d/2, and the maximum distance is d.
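These hypercube properties can be checked with a short Python sketch, assuming the usual labelling in which each node is a d-bit number and two nodes are neighbors when their labels differ in exactly one bit. The function names are hypothetical. The average is taken over all ordered pairs of nodes, including a node paired with itself, which is the convention under which it comes out to exactly d/2.

```python
from itertools import product


def neighbors(node, d):
    """Nodes whose d-bit label differs from `node` in exactly one bit."""
    return [node ^ (1 << bit) for bit in range(d)]


def distance(a, b):
    """Number of hops between two nodes: the Hamming distance of their labels."""
    return bin(a ^ b).count("1")


def hypercube_stats(d):
    nodes = range(2 ** d)
    dists = [distance(a, b) for a, b in product(nodes, repeat=2)]
    return {
        "nodes": 2 ** d,
        "neighbors_per_node": len(neighbors(0, d)),
        "max_distance": max(dists),
        "avg_distance": sum(dists) / len(dists),
    }


if __name__ == "__main__":
    print(hypercube_stats(4))
```

For d = 4 this reports 16 nodes, 4 neighbors per node, a maximum distance of 4 and an average distance of 2.0. If self-pairs are excluded, the average is slightly higher than d/2.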

I/O (input-output) subsystem design attains great significance in multimedia applications. Research has turned to techniques for designing I/O systems that can handle the real-time demands of multimedia computing.

The IBM 360 third-generation systems are mainly employed for batch input in a single-language environment at any point of time. To achieve the targeted throughput, they employ communication (input-output) processors, often called channels, for data flow. Byte multiplexing and block multiplexing, using the direct memory access method, are used together on these systems to address resource utilization and portability.

4. 8087 CO-PROCESSOR

The Intel 8087 is a math coprocessor (a numeric processing unit) that works in conjunction with either an Intel 8086 or 8088 microprocessor to support floating-point arithmetic and so enhance the computing capacity of 80x86 systems. The 8086 programmer enjoys the capabilities of the 8087 coprocessor as an extension of the 8086 instruction set. Parallelism is obtained through pipelining and memory segmentation, ensuring fast arithmetic and good throughput. This configuration is better described as co-operative processing, meant to relieve bus bottlenecks in computer organizations.

The (proposed) IEEE binary floating-point standard and the 8087 numeric processor are very closely related. One important numeric application is graphics. Robotics also requires rapid responses involving matrix multiplications. Speech recognition, image processing and echo cancellation in telecommunications require moderate accuracy but very fast arithmetic. The 8087 can perform operations on 16-, 32- and 64-bit integer data and can also accommodate an 80-bit packed decimal format consisting of a sign bit and 18 decimal digits.
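The 80-bit decimal format can be sketched in Python as follows. Eighteen digits at four bits each occupy 72 bits (nine bytes), and the tenth byte carries the sign in its top bit. The function names are hypothetical, and the byte order used here (least significant digit pair first, sign byte last) is an assumption made for this illustration.

```python
def to_packed_decimal(value):
    """Encode an integer as 10 bytes: 18 BCD digits plus a sign byte."""
    if abs(value) >= 10 ** 18:
        raise ValueError("magnitude needs more than 18 decimal digits")
    digits = f"{abs(value):018d}"  # most significant digit first
    out = bytearray(10)
    # Bytes 0..8 hold two digits each, least significant pair in byte 0 (assumed order).
    for i in range(9):
        pair = digits[16 - 2 * i: 18 - 2 * i]
        out[i] = (int(pair[0]) << 4) | int(pair[1])
    # Byte 9: sign in the top bit, the remaining seven bits unused.
    out[9] = 0x80 if value < 0 else 0x00
    return bytes(out)


def from_packed_decimal(raw):
    """Decode the 10-byte form produced by to_packed_decimal."""
    magnitude = 0
    for byte in reversed(raw[:9]):
        magnitude = magnitude * 100 + (byte >> 4) * 10 + (byte & 0x0F)
    return -magnitude if raw[9] & 0x80 else magnitude


if __name__ == "__main__":
    print(to_packed_decimal(-1234).hex())  # 34120000000000000080
```

Decoding the encoded bytes returns the original integer, and any value with more than 18 digits is rejected because it cannot be represented.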

Along with the sign bit there are seven unused bits. The COBOL language standard (COBOL being the predominant language in data processing applications) requires only 18 decimal digits. Real numbers are represented by a sign bit, followed by the exponent field, followed by the significand field. The sign bit specifies the sign of the significand. The value of a floating-point number is

(-1)^sgn * S * 2^E

where sgn is the value of the sign bit, S is the value of the significand (also called the mantissa), and E is the value of the exponent (also called the characteristic). The 8087 has the same number of address/data pins as the 8086. The co-processing approach in a way aims at pipelined parallelism, which a CISC (complex instruction set computer) can achieve only at the cost of more space complexity. The 8087 has eight arithmetic registers, each 80 bits wide. These registers are used as a stack, and the environment contains a 3-bit field ST (stack top) that indicates the register currently at the top. The assembler mnemonics for all 8087 instructions start with an F (for floating point) so that they can be readily distinguished from 8086 instructions. Although a stack architecture appears compact to access, book-keeping becomes a crucial issue for programmers. The 8087 supports special arithmetic instructions such as square root and partial tangent.
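The value formula for a floating-point number, (-1)^sgn * S * 2^E, can be exercised with a small Python sketch. It works on Python's 64-bit floats rather than the 80-bit registers of the 8087, and the normalisation 1 <= S < 2 is an assumption chosen for illustration; the function names are hypothetical.

```python
import math


def decompose(x):
    """Split x into (sgn, S, E) such that x == (-1)**sgn * S * 2**E.

    For non-zero x the significand is normalised to 1 <= S < 2.
    """
    sgn = 1 if math.copysign(1.0, x) < 0 else 0
    m, e = math.frexp(abs(x))  # abs(x) == m * 2**e with 0.5 <= m < 1
    if m == 0.0:
        return sgn, 0.0, 0
    return sgn, m * 2.0, e - 1  # rescale so that 1 <= S < 2


def compose(sgn, S, E):
    """Rebuild the value from sign, significand and exponent."""
    return (-1) ** sgn * math.ldexp(S, E)


if __name__ == "__main__":
    print(decompose(-6.5))  # (1, 1.625, 2)
```

Here -6.5 is recovered as (-1)^1 * 1.625 * 2^2, and composing the three fields again gives back the original number.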

Trigonometric, logarithmic and exponential functions are supported to the extent of its ability. FORTRAN compilers for the 8087 provide intrinsic functions to load and store the control word. Intel's new Pentium processor supports both fast computing and database applications, maintaining software compatibility while delivering high performance figures.

The T90 series of CRAY systems (formerly code-named Triton), the first-ever wireless supercomputers, carry 1 to 32 processors and provide up to 60 billion calculations per second of peak computing power.

5. DIGITAL SIGNAL PROCESSING

Digital Signal Processing (DSP) involves the representation, transmission, and manipulation of signals using numerical techniques and digital processors. Digital communications and computing offer better reliability and efficiency than their analog counterparts in the field of signal processing. Thus the very fast computing capabilities of the TMS 320 series of processors, with software support for emulation and simulation, have spread into widely varying applications. These include speech processing, telecommunications, defense research, bio-medical engineering and graphic workstations.

The TMS 320 products are 16/32-bit single-chip microcomputers that apply the array processing concept with tremendous I/O (input-output) strength. The TMS 32020 macro assembler accepts TMS 32010 source code, giving upward compatibility, and the compact set of 109 instructions eases software development. The TMS 32020 is fabricated in a 4 µm NMOS technology and has a chip area of 119k square mils. It is produced in a 68-pin grid array package and has a typical power consumption of 1.2 W. The maximum clock frequency is 20.5 MHz, for an instruction rate of five million instructions per second. The TMS 320C25, produced in CMOS, offers a faster instruction time of 100 ns. The development tools range from very inexpensive evaluation modules to assembler/linkers and software simulators.

Some common digital signal processing routines and interface circuits are frequently used. For example, the same structure of a digital filter used for audio signal processing may also be used for a modem in data communications. A Fast Fourier Transform (FFT) routine can be used for analyzing signals both in instrumentation and speech coding.

Digital filters can meet tight specifications on magnitude and phase characteristics and eliminate the voltage drift, temperature drift, and noise problems associated with analog filter components. The two filter types implemented on the TMS 320 family of digital signal processors are finite impulse response (FIR) and infinite impulse response (IIR) filters.
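The difference between the two filter types can be seen in a minimal Python sketch. The function names and the particular first-order IIR form are assumptions for illustration and are not taken from any TMS 320 library. An FIR output depends only on a finite window of input samples, whereas an IIR output also feeds back earlier outputs.

```python
def fir_filter(coeffs, samples):
    """FIR filter: y[n] = sum over k of coeffs[k] * x[n-k]."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * samples[n - k]  # one multiply-accumulate per tap
        out.append(acc)
    return out


def iir_first_order(a, samples):
    """First-order IIR filter: y[n] = (1 - a) * x[n] + a * y[n-1]."""
    out = []
    prev = 0.0
    for x in samples:
        prev = (1.0 - a) * x + a * prev  # feedback of the previous output
        out.append(prev)
    return out


if __name__ == "__main__":
    impulse = [1.0, 0.0, 0.0, 0.0]
    print(fir_filter([0.5, 0.5], impulse))  # [0.5, 0.5, 0.0, 0.0]
    print(iir_first_order(0.5, impulse))    # [0.5, 0.25, 0.125, 0.0625]
```

Fed a unit impulse, the two-tap FIR filter falls silent after two samples, while the IIR response keeps halving and never reaches exactly zero, which is where the names come from.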

A high-speed numeric processor, such as the TMS 32020 digital signal processor, may serve as a coprocessor to a slower yet capable host in a computer system. The TMS 32020 can perform numeric functions, such as a multiply-accumulate, in a single cycle (200 ns). Other 16-bit processors, such as the Motorola MC 68000, have qualities such as "supervisor and user modes" that suit them to be host processors. The applications of the MC 68000 - TMS 32020 interface include speech processing, spectrum analysis and graphics libraries. The design of a full-duplex 2400 bit-per-second vocoder implementing an LPC (linear predictive coding) algorithm in real time on the TMS 32010 is discussed in published proceedings.

The ADSP-2100 family of processors are programmable single-chip microcomputers optimized for digital signal processing (DSP) and other high-speed numeric processing applications. They support serial interfaces for use with personal computers. The ADSP 2181 is noteworthy, as it has found its way onto laboratory workbenches. The program memory is organised with a 24-bit word length. The data address generators (DAGs) provide memory addresses when memory data is transferred to or from the input/output registers. With two independent DAGs, the processor can generate two addresses simultaneously for dual operand fetches. ADSP cards are now becoming available for in-circuit emulation in portable environments. In addition, the ADSP development system assists microcomputer diagnostics, especially for run-time events (debuggers). The C compiler reads ANSI C source and outputs ADSP assembly language mnemonics that are ready to be assembled. It also supports inline assembler code. Signal processors demand fast and flexible arithmetic, extended dynamic range, hardware circular buffers as used in filter algorithms, and zero overhead in looping and branching operations.
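The circular buffers mentioned above can be sketched in Python as a delay line with modulo addressing. On a signal processor the wrap-around is done by the address-generation hardware at no instruction cost; here it is an explicit `%` operation. The class name and its methods are hypothetical.

```python
class CircularDelayLine:
    """Fixed-length delay line using modulo (circular) addressing."""

    def __init__(self, length):
        self.buf = [0.0] * length
        self.index = 0  # next write position, playing the role of an address register

    def push(self, sample):
        """Store a new sample, overwriting the oldest one."""
        self.buf[self.index] = sample
        self.index = (self.index + 1) % len(self.buf)  # wrap around at the end

    def tap(self, delay):
        """Return the sample pushed `delay` steps ago (0 = most recent)."""
        return self.buf[(self.index - 1 - delay) % len(self.buf)]


if __name__ == "__main__":
    line = CircularDelayLine(3)
    for s in (1.0, 2.0, 3.0, 4.0):
        line.push(s)
    print(line.tap(0), line.tap(1), line.tap(2))  # 4.0 3.0 2.0
```

No samples are ever moved in memory: only the index advances, which is why filter algorithms that need the last few inputs favour this structure.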

6. CMOS AND FAULT-TOLERANCE

Complementary Metal Oxide Semiconductor (CMOS) technology has become popular because of its low power requirement and high density. The complexity of testing increases with circuit density. Testing, in the context of digital systems, is defined as the process by which a defect can be exposed. Permanent (stuck-at) faults are desirable candidates for design for testability during the manufacturing process. Error-detecting codes are widely employed in fault-tolerant computer systems. In particular, the inputs and outputs of a self-checking circuit are assumed to be encoded with a suitable error-detecting code. The choice of the code depends on which errors are most likely.
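As one simple instance of an error-detecting code, the following Python sketch uses a single even-parity bit. The choice of parity and the function names are assumptions made for illustration; the text above does not single out any particular code.

```python
def encode_even_parity(bits):
    """Append one parity bit so that the codeword has an even number of 1s."""
    return list(bits) + [sum(bits) % 2]


def check_even_parity(word):
    """Return True if the codeword still has even parity (no error detected)."""
    return sum(word) % 2 == 0


if __name__ == "__main__":
    word = encode_even_parity([1, 0, 1, 1])
    print(word, check_even_parity(word))  # [1, 0, 1, 1, 1] True
    word[2] ^= 1                          # inject a single-bit error
    print(word, check_even_parity(word))  # [1, 0, 0, 1, 1] False
```

A single parity bit detects any odd number of bit errors but misses an even number of them, which illustrates why the choice of code depends on which errors are most likely.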

Programmable logic arrays (PLAs) should be thoroughly tested, since they offer flexibility in combinational circuit synthesis. At times, system failures are detectable by software means, which is important in fail-safe systems. Several factors contribute to reliability measures in software engineering.
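Testing for stuck-at faults can be illustrated on a very small two-level circuit. The Python sketch below assumes a hypothetical circuit f = (a AND b) OR c with named lines, and finds the input vectors that expose a given fault by comparing the faulty output against the fault-free one. The circuit, line names and function names are all illustrative.

```python
from itertools import product


def circuit(a, b, c, fault=None):
    """f = (a AND b) OR c, with an optional stuck-at fault on one line.

    fault is a (line_name, stuck_value) pair such as ("ab", 0),
    where line_name is one of "a", "b", "c", "ab" or "f".
    """
    def line(name, value):
        # A faulty line ignores its driven value and returns the stuck value.
        return fault[1] if fault and fault[0] == name else value

    a, b, c = line("a", a), line("b", b), line("c", c)
    ab = line("ab", a & b)
    return line("f", ab | c)


def detecting_vectors(fault):
    """Input vectors on which the faulty circuit differs from the good one."""
    return [v for v in product((0, 1), repeat=3)
            if circuit(*v) != circuit(*v, fault=fault)]


if __name__ == "__main__":
    print(detecting_vectors(("ab", 0)))  # [(1, 1, 0)]
    print(detecting_vectors(("c", 1)))   # [(0, 0, 0), (0, 1, 0), (1, 0, 0)]
```

Exhaustive comparison is feasible only for tiny circuits; it is shown here to make the idea of a fault-detecting test vector concrete, not as a practical test-generation method.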

7. LOCAL AREA NETWORKS (LAN)

Computers have become affordable owing to the development of LANs, which allow the sharing of peripherals and of costly resources, that is, information. The growth of LANs clearly indicates improved practices in digital communications. Creating backups, providing security and establishing standards are the potential problems for network users.

Data communications make extensive use of graph theory for network connectivity and routing algorithms. Thus, co-processors may take a dominant role in the present age of information technology in all of its forms. It is hoped that the information provided in this book reaches many aspiring computer learners.

TERMS

Co-processing, Livelock; Occam, Hypercube, multimedia computing; Intel 8087; VOCODER; PLAs.

QUIZ

1. Explain what is meant by a coprocessor in at least two ways.

2. Describe, in brief, the Butterfly parallel processor.

3. Describe, in detail, the 8087 coprocessor for 80x86 systems.

4. Explain the importance of FIR and IIR filters for digital signal processing applications.

5. Discuss in detail design for test of VLSI with respect to testability and fault coverage.

6. What is a computer network?

7. Write notes on the TCP/IP high-level protocol for LAN environments.

RESOURCES:

Andrew Holck and Wallace Anderson, "A single-processor LPC vocoder", ICASSP '84 Proceedings, San Diego, CA, March 19-21, 1984.

Baer, Jean-Loup, "Computer Systems Architecture", Computer Science Press, Inc., 1980.

Balzer, R.M., "EXDAMS, Extendable Debugging and Monitoring System", Proceedings of the Spring Joint Computer Conference (AFIPS Press, 1969), pp. 567-80.

Conte, G. and Del Corso, D., "Multi-microprocessor systems for real-time applications", Dordrecht, the Netherlands: D. Reidel, 1985.

Kuck, David J., "Structure of Computers and Computations", John Wiley, New York, 1978.

Dias, D.M., and Jump, J.R., "Packet switching interconnection networks for modular systems", IEEE Comp., Dec. 1981, pp. 43-53.

Ellingson, G.E., "Computer program and change control", Record of 1973 IEEE Symposium on Computer Software Reliability, 1973, pp. 82-89.

Feitelson, Dror G., "Optical Computing - A Survey for Computer Scientists", The MIT Press, Cambridge, 1988.

Hedlund, K.S., "Wafer scale integration of parallel processors", Ph.D. Thesis, Comp. Science Dept., Purdue Univ., Ind., 1982.

Kim, M.Y., "Synchronised Disk Interleaving", IEEE Trans. on Computers, Vol. 35, No. 11, Nov. 1986, pp. 978-988.

Palmer, J., "The Intel 8087 Numeric Data Processor", Proceedings of the Seventh Annual International ACM Symposium on Computer Architecture, 1980.

Palmer, John F., and Stephen P. Morse, "The 8087 Primer", John Wiley & Sons, Inc., 1984.

Slotnick, D.L., Borck, W.C., and McReynolds, R.C., "The Solomon Computer", Proc. of AFIPS Fall Joint Comp. Conf., Wash. D.C., 1962, pp. 97-107.

Wakerly, J., "Error detecting codes, Self checking circuits and applications", Elsevier North-Holland, Inc., New York, 1978.

Willis, Neil, "Computer Architecture and Communications", Paradigm Publishing Ltd., 1986.


Updated: Saturday, March 11, 2017 11:13 PST