ETA Systems Hardware Technologies (1983-88)

Preface

ETA Systems Inc. was spun off as the Supercomputer subsidiary of a struggling Control Data Corporation (CDC). The objective was to develop and manufacture High Performance Computers, commonly called simply Supercomputers in the 80’s and 90’s. Cray Research Inc. dominated this market during this time frame, and CDC held a minor market position, having introduced the STAR-100 followed by the CYBER-203 and CYBER-205 systems. Novel architecture (fast scalar performance and the efficient use of vectors), innovative software, the highest performance integrated circuits (resulting in the fastest clock period) and innovative packaging (to optimize device spacing and thermal management) differentiated Supercomputers from conventional computer systems during this period. To be "fair and balanced", it must be stated that Supercomputers also had the highest price tags and demanded the largest memories and the highest performance peripherals and system bandwidths. Systems dominating the market during the 80’s were the Cray-1, Cray X-MP and CYBER-205. NEC, Fujitsu and Hitachi also developed systems in this market. The word Supercomputer was applied to other products as well; omitting them here is not intended to dismiss them.

The following overview will not enter into the decisions to separate ETA Systems from Control Data Corporation organizationally, although that topic is interesting as well. Nor will it discuss software innovations at ETA Systems, and there were many. Architecture had a role in dictating the technology in terms of the number of logic circuits that were serial per clock cycle. Architecture also demanded that large, high performance registers (temporary storage devices) be included, which also dictated the performance (clock cycle) of the system. Other architecture features (instructions) dictated the number of functions that constituted a processor (gates / CPU) that, in turn, determined technology selection in terms of preferred Gates per Chip and Ports per Chip. Proximity of chips to each other was crucial to processor design during this time period, since a CPU could not reside within the boundary of a single chip as it easily does today. Bandwidth, i.e., the number of bytes per unit of time that could be moved between functions within the CPU and between the CPU and its associated memory, is key and places demands on pins or logic paths between functions that usually require compromise in each and every design. Those reading this now will find this humorous, I am sure, with multiprocessing units (multiple CPUs) now residing within the boundaries of a single chip or IC die. In the 80’s and well into the mid 90’s, however, partitioning the necessary logic or Boolean functions of a CPU across multiple integrated circuit chips (usually multiple hundreds of chips) and multiple complex printed circuit boards (2 to 8) was an integral part of determining the overall performance, power consumption, cost and reliability of the system.

Introduction

ETA Systems technology was selected in 1980 (the organization was the Advanced Design Laboratory of Control Data Corporation at the inception) with the following objectives:

1. The highest performance Supercomputer at the time of product delivery

2. The most cost effective technology available

3. The lowest possible power consumption while meeting other objectives

4. The largest product diversity with a single design

5. The highest possible reliability. Pins and interconnects usually dictated reliability, since by that time Integrated Circuit technology had reached a very high reliability for both logic and storage devices

6. Leverage as much of the technology as possible to the follow-on computer generation. This usually fell by the wayside until ECAD and MCAD technologies were introduced into the design.

7. Utilize only standard IC technology processes being developed for other markets

8. Demonstrate the prototype of the product in less than four years

Digging deeper into objectives

The highest performance Supercomputer at the time of product delivery
Simply stated, the highest performance processor solved the largest problems most effectively. Performance was usually measured in clock cycle time, which was unfortunate since a variable amount of calculation could be performed per clock cycle. This single parameter carried the bragging rights, although Gigaflops later became the stated parameter, and that also did not necessarily reflect the true performance of a Supercomputer. The “king of the hill” in any given cycle (usually 2 to 4 years) held the largest market share.

The most cost effective technology available
The most “bang” for the buck applied to Supercomputers as well as to other markets. The customer was willing to treat the highest performance in solving his particular challenges as the higher priority, provided that the performance clearly exceeded that of lower cost alternatives. A legend of the Supercomputer industry, Jim Thornton, once described the requirement as getting through an intersection without having an accident. Since a Supercomputer required so many components and interconnects, it was bound to fail sooner than a small computer. So, Jim surmised, the faster the computer, the more problems could be solved before something went wrong. Go through an intersection as fast as possible, not at a slow rate, and you have a better chance of getting through safely.
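Thornton's intersection argument can be put in rough numbers. The sketch below is an illustration only: it assumes failures arrive at a constant rate (an exponential model) and that the faster machine is no less reliable than the slower one. The mean time between failures and the job lengths are hypothetical values, not ETA or CDC data.

```python
import math


def completion_probability(job_hours: float, mtbf_hours: float) -> float:
    """Chance that a job finishes before the first failure.

    Assumes a constant failure rate (exponential model). This is an
    assumption made for illustration, not a claim from the text.
    """
    return math.exp(-job_hours / mtbf_hours)


# Hypothetical values: same hardware reliability, two machine speeds.
MTBF_HOURS = 100.0
SLOW_JOB_HOURS = 50.0                  # job length on the slower machine
FAST_JOB_HOURS = SLOW_JOB_HOURS / 2.0  # same job on a machine twice as fast

p_slow = completion_probability(SLOW_JOB_HOURS, MTBF_HOURS)
p_fast = completion_probability(FAST_JOB_HOURS, MTBF_HOURS)

print(f"slow machine finishes with probability {p_slow:.2f}")
print(f"fast machine finishes with probability {p_fast:.2f}")
```

With these made-up inputs the faster machine completes the job more often (roughly 0.78 versus 0.61), which is the point of the analogy: less time in the intersection means less exposure to a failure.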

The lowest possible power consumption while meeting other objectives
Each follow-on generation of Supercomputer products witnessed increased power per processor, which was justified by the resultant performance. The lower the RC (resistance-capacitance) time constant, the faster the computer clock cycle, and since the lower the R, the higher the power, rising power was the trend. As the number of processors per system increased, power consumption became a major issue: for the largest users because of site related “wall plug” power capacity limits, and for the small users because of basic life-of-system cost concerns (mainly power consumption, cooling and system reliability).
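The speed versus power trade can be sketched with two first-order formulas: the time constant tau = R * C, and the steady-state dissipation P = V^2 / R of a resistive load at a fixed supply voltage. Treating a logic stage this simply is an assumption, and the voltage, capacitance and resistance values below are hypothetical, chosen only to show the direction of the trend.

```python
def rc_time_constant(r_ohms: float, c_farads: float) -> float:
    """First-order switching time constant, tau = R * C, in seconds."""
    return r_ohms * c_farads


def resistive_power(v_volts: float, r_ohms: float) -> float:
    """Steady-state power in a resistive load, P = V^2 / R, in watts."""
    return v_volts ** 2 / r_ohms


# Hypothetical illustration values, not taken from any real circuit.
SUPPLY_VOLTS = 5.0
LOAD_FARADS = 10e-12

rows = []
for r_ohms in (1000.0, 500.0, 250.0):
    tau_ns = rc_time_constant(r_ohms, LOAD_FARADS) * 1e9
    power_mw = resistive_power(SUPPLY_VOLTS, r_ohms) * 1e3
    rows.append((r_ohms, tau_ns, power_mw))
    print(f"R = {r_ohms:6.0f} ohm  tau = {tau_ns:5.2f} ns  P = {power_mw:6.1f} mW")
```

In this simplified model each halving of R halves the time constant and doubles the power, which is the pattern the paragraph describes for successive bipolar generations.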

The largest product diversity with a single design
Design cycles were three to five years in duration, and large teams were assembled to complete a single design. Development costs per product were significant. Product cost ($2M to $40M per system) and performance ranges (greater than 20:1) for each generation of Supercomputers were increasing. Since optimized cost points for each product were technology dependent, this required multiple design teams, each design utilizing a unique technology. Utilizing a single total design (packaging, IC selection, manufacturing tooling, and associated boards, connectors, etc.) was therefore desirable for a myriad of obvious reasons. In fact, most companies focused on only a portion of the computer market and were dedicated to only a small portion of the product "bandwidth". Cray Research focused on the high end; IBM, CDC, Unisys and others did an admirable job in the middle; and companies like DEC and HP were at the lower end. There were others across the world too; these companies are only examples.

The highest possible reliability
Supercomputers required significant bandwidth to pass data between processing units and between processors and memory. Bandwidth is a key differentiator that separates true Supercomputers from conventional computer systems. Interconnects were, and still are, the dominant reliability concern in large systems. Thermal management is also significant. Large systems, by the nature of the design, require a large quantity of components to be simultaneously functional. Each interconnect and each active logic device (integrated circuit) requires the highest reliability to permit large user problems to be solved using Supercomputers. Simply stated, the time a Supercomputer could operate without failure had to exceed the run time of the largest customer problem.
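Why part count dominates can be shown with a standard series-system estimate: if every part must work, and parts fail independently at the same constant rate, the system's mean time between failures is the part MTBF divided by the number of parts. The model, the part MTBF and the job length below are all assumptions made for illustration, not figures from ETA Systems.

```python
def series_system_mtbf(part_mtbf_hours: float, part_count: int) -> float:
    """MTBF of a system in which every part must work.

    Assumes identical, independent parts with constant failure rates,
    so the failure rates simply add up.
    """
    return part_mtbf_hours / part_count


# Hypothetical values chosen only to show the scaling.
PART_MTBF_HOURS = 1e9
LARGEST_JOB_HOURS = 24.0

for part_count in (100_000, 1_000_000, 100_000_000):
    mtbf = series_system_mtbf(PART_MTBF_HOURS, part_count)
    verdict = "exceeds" if mtbf > LARGEST_JOB_HOURS else "falls short of"
    print(f"{part_count:>11,} parts: system MTBF {mtbf:,.0f} h "
          f"{verdict} a {LARGEST_JOB_HOURS:.0f} h job")
```

Even with very reliable individual parts, the estimate drops in direct proportion to the number of interconnects and devices, which is why pin and connector counts received so much attention.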

Leverage as much of the technology as possible to the follow-on generation
The significant cost of developing a given technology “kit” had to be leveraged for as long a period as possible. It was a “given” that most integrated circuits would be developed anew for each generation of computer. What about interconnects (connectors)? What about printed circuit boards? What about support technologies like simulation tools, assembly tooling and basic packaging? Can any of these technologies extend to the next generation? And are there any “mid life kickers” that could be inserted into a successful product to extend its market life? IBM did exceptional work in taking hardware across product boundaries and generations of new products. The initial tooling for packaging was expensive, but results in later products appeared to pay dividends. Cray Research Inc. and Control Data, by contrast, generated new packaging and connector technology with each new generation of product.

Utilize only standard IC technology processes being developed for other markets
This addresses two major issues: cost and access to technology. Cost, since dedicated IC processing lines with unique processes for low volume products, even if they could be realized, would not allow effective amortization of the costs of process development and manufacturing. Access to technology, since advanced, popular (high volume) processes had to accommodate unique system designs. Innovation in the IC industry was and is applied to the markets with the highest return on investment. The “trick” was to apply this “standard” and most innovative technology to low volume Supercomputer applications.

Demonstrate operating product prototypes in less than four years
This requires discipline as well as good management and leadership. Due to the complexity of Supercomputers, we felt that tools had to be upgraded significantly and that checkout and diagnostics had to be improved as well. At any given time, improvements are being made in Supercomputer technologies. Allowing each incremental development to be incorporated extends the product development cycle significantly. Selecting only known (proven) technologies at the time of product development initiation results in a non-competitive product. Risk must be taken. Where to take risk (on technologies that are not available at product initiation but look the most promising), and the return on that risk, must be carefully evaluated with the factors clearly understood. Where to invest in new technology development must be understood, as must the return on that investment and the leverage available where other markets are interested in common technologies. Back-up alternatives should be identified. The CEO of Cray Research, John Rollwagen, defined the challenge as "How many 3-point shots should each project take?" Missing market cycles is costly. Under-reaching (being too conservative) and over-reaching (bad choices in “betting on the come”) were also costly and prohibitive. All of these factors were carefully evaluated. It might be added, using the same basketball comparison, that a project cannot have too many “lay ups” (sure things already developed) either!

Results

Before getting to the details of how decisions were made and how the ETA Systems technology “kit” was selected and developed, here is a list of noteworthy accomplishments:

First Industry competitive CMOS Supercomputer CPU
From 1995 to the present (beginning 12 years after the technology selection by ETA Systems, I might add), ALL HPC (High Performance Computers) have been developed and manufactured using CMOS IC technology. Until as late as 2000, bipolar technology (higher power, more costly to manufacture and lower gate count per chip) dominated high performance computers throughout the world.

First Industry Single Board CPU
The chip density (gates per chip) allowed by advanced CMOS, the use of computer aided design tools for optimum layout and simulation, the successful design of an advanced 45 layer printed circuit board (you read it right, 45 layers) and innovative chip attachment and cooling permitted a single processor containing nearly 3 million gates to be packaged on a single board.

First Industry system to be designed with self-test
CPU processing units (≈3 million gates each) were validated for functionality and performance in less than 4 hours. Any interconnect errors were recorded, allowing chip-to-chip replacement to occur in minimal time. Other CPUs of this same period required weeks to months to check out and validate. Incoming testing of the logic IC chips (function and performance) also used the same self-test innovations.

First Industry production Liquid Nitrogen CPU
The ETA Systems CPU was immersed in liquid nitrogen (77 degrees Kelvin) to improve performance to more than twice that of CMOS technology operated at room temperature (300 degrees Kelvin).

First system at CDC to fully utilize Computer Design Software to design Chips, boards, validate Logic design and Auto Diagnostic test the system with Synergistic tools

This permitted checkout of a CPU to be completed in less than 4 hours. Manufacturing costs were greatly reduced. The technique was also used at the IC supplier and greatly reduced the probe test hardware and software required.

First Industry system to have multiple cost designs from single design effort

The performance range of the ETA Systems products was greater than 24:1 (from an 8 processor system operating at a 7 nanosecond clock period down to a single processor system operating at 24 nanoseconds). Processors were manufactured, tested and validated on a single manufacturing line using identical components (IC chips were performance sorted using auto self-test). Product differences began at the system packaging level.
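The quoted range can be checked with simple arithmetic. The sketch below assumes, as a simplification, that throughput scales linearly with processor count and inversely with clock period, and that every processor does the same work per cycle; real multiprocessor scaling would be somewhat lower. The function name is illustrative only.

```python
def relative_throughput(processors: int, clock_ns: float) -> float:
    """Idealized throughput: processors times clock rate (cycles per ns).

    Assumes perfectly linear scaling across processors, which is a
    simplification made for this estimate.
    """
    return processors / clock_ns


top_end = relative_throughput(processors=8, clock_ns=7.0)
low_end = relative_throughput(processors=1, clock_ns=24.0)
performance_range = top_end / low_end

print(f"idealized performance range: {performance_range:.1f}:1")
```

Under these assumptions the ratio is 8 × 24 / 7, or about 27.4:1, which is consistent with the "greater than 24:1" figure given for the product line.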