First-Hand:Cryo CMOS and 40+ layer PC Boards - How Crazy is this?

From ETHW
Revision as of 16:38, 16 April 2009 by Vacca

How it started

It was in the early 80's. Control Data (CDC) had just launched the CYBER-205 with modest success, and the team was now focused on the next generation machine, the 2XX as I recall. Speed, cost and meeting the schedule were all key objectives. Speed, because Cray Research, under the guidance of Seymour Cray, was setting milestones for Supercomputers with the Cray 1 and then the Cray 2. Cost, since Supercomputers were extremely expensive. Schedule, since the CYBER-205 had established patience records as a machine that might never get out the door, and this must not be repeated.

A conventional evolutionary approach for Integrated Circuit (IC) logic was initially selected. Motorola, with some prodding, agreed to launch an 8,000 gate equivalent ECL gate array (emitter-coupled logic, the circuitry of choice for high performance processing units), provided that Control Data did the actual circuit development. There were insufficient customers for Motorola to commit its own resources to this lofty development. Motorola did, however, commit its advanced ECL processes to CDC, and a joint team was formed with the two companies.

Logic designers at the CDC Advanced Design Laboratory were given preliminary design rules based on computer device models and estimates of gate per chip densities. There was a natural follow-up of grumbling by the logic design team, led by very experienced and innovative folks (Ray Kort, Maurice Hudson and Dave Hill, to name three), but circuit designers had learned to accept this, since logic designers always found the circuits to be too slow and the quantity of gates and pins (I/O ports) per die to be insufficient. There was a lot of cooperation too. Basic building blocks were defined by the logic designers: gate functionality, register functionality, etc. From this set of preliminary rules, function blocks were defined and the capacity per reasonably-sized Printed Circuit (PC) board was determined. The initial design using the CYBER-205 based architecture was launched.


In parallel with this effort, and in the same design group (i.e., circuit, packaging, PC board and the newly formed CAD group, which provided tools for layout and design of chips and boards), the chief chip design engineer, Randy Bach, was assigned to develop an advanced CMOS chip for the Canadian Computer Development organization. At this time, the early 80's, CMOS was in its infancy, being used for memory devices, low performance peripherals and low performance microprocessors (5 to 10 MHz clock speeds). The design contained 5,000 gates plus appropriate input and output communication devices. Gate arrays for CMOS were also nearly non-existent, so Randy and his small team of two assistants developed a cell library and worked closely with the Canadian Development team to meet their objectives as well.

This effort was completely separate from the ECL based gate array to be used for the next generation Supercomputer.  The product was developed for a low cost application.


It was customary for Neil Lincoln (chief architect), Dale Handy (manufacturing manager) and me to go off to lunch every 8 to 10 days to discuss status at either Arthur Treacher's Fish & Chips or Zantigo's (high class - NOT - fast food restaurants). As a side note, both of these fast food places disappeared during ETA Systems' brief existence. Zantigo's has returned (I think because they know it is safe now that the three of us cannot visit together any longer - Neil unfortunately passed on a few years ago).

At one of these meetings, Neil had "news" for me. Simply stated, the gate array in co-development with Motorola had unacceptable goals. The chip had too few I/O pins, consumed too much power and had insufficient gates. He had determined that the CPU (some 3 million gates) had to be assembled on a single board. "It was time for this to be done". He had also reached the conclusion that the logic design required at least 15,000 gates per chip to meet these goals.

The logic designers had gotten to him I surmised. Schedules, Neil reminded us, could not be altered - and that was that.  To soften the blow he bought lunch that day, three Cokes and three orders of fish and chips - Neil's was a large order.

The trip back to the lab was pretty quiet and, fortunately, short, since our eating places were all very close to the lab.


That afternoon, I assembled the key folks. I might miss one or two, but Randy Bach, Doug Carlson, Dave Resnick and John Ketzler were four that I recall now. Doug was a mechanical engineer to whom I had assigned the Motorola project because of his management skills (something he probably never forgave me for), John was the key circuit engineer on the Motorola project, and Dave was and still is a very versatile and perceptive engineer.

Doug and I would inform Motorola of the decision not to continue.  The team would package up what was accomplished and turn it over to Motorola to carry the ball forward if they wished.  As a side note, Motorola and Cray did continue the design.  It was the circuit design used in the Cray C90, a very successful computer.

The meeting turned to what were the next steps.

The key challenges that emerged were: 

  • IC Technology that could meet the new lofty goals
  • The PC board technology required to meet a single board CPU
  • Packaging and interconnect technology required to support the two above requirements
  • Computer Aided Design (CAD) technology necessary to accurately design IC and PCB technologies
  • Suppliers for all - do they exist?
  • What additional internal resources were required to achieve objectives
  • System packaging beyond a single CPU. (Memory, peripherals, I/O, etc.)
  • Testing of complex IC technology and complex PCB technology

== Summary of IC technology accomplishments ==


ETA Systems Hardware Technology 1980 – 1989

Preface:

To restate the challenge: ETA Systems Inc. was spun off as the Supercomputer subsidiary of a struggling Control Data Corporation (CDC). The objective was to develop and manufacture High Performance Computers or, as they were commonly called in the 80’s and 90’s, simply Supercomputers. Cray Research Inc. dominated this market during this time frame, and CDC had a minor market position, having introduced the Star-100 followed by the CYBER-203 and CYBER-205 systems. Novel architecture (fast scalar performance and the efficient use of vectors), innovative software, the highest performance integrated circuits (resulting in the fastest clock period) and innovative packaging (to optimize device spacing and thermal management) differentiated Supercomputers from conventional computer systems during this period. To be "fair and balanced", it must be stated that Supercomputers also had the highest price tags and demanded the largest memories and the highest performance peripherals and system bandwidths. Systems dominating the market during the 80’s were the Cray-1, Cray X-MP and CYBER-205. NEC, Fujitsu and Hitachi also developed systems in this market. The word Supercomputer was applied to other products as well; there is no intent to dismiss them here.

The following overview will not go into the decisions to separate ETA Systems organizationally from Control Data Corporation, although that topic is interesting as well. Nor will it discuss software innovations at ETA Systems, and there were many.

Architecture had a role in dictating the technology in terms of the number of logic circuits that were in series per clock cycle. Architecture also demanded that large, high performance registers (temporary storage devices) be included, which also dictated the performance (clock cycle) of the system. Other architecture features (instructions) dictated the number of functions that constituted a processor (gates per CPU), which in turn determined technology selection in terms of preferred Gates per Chip and Ports per Chip. Proximity of chips to each other was crucial for processor design during this time period, since a CPU could not reside within the boundary of a single chip as it easily does today. Bandwidth, i.e., the number of bytes per unit of time that could be moved between functions within the CPU and between the CPU and its associated memory, is key; it places demands on pins or logic paths between functions that usually require compromise in each and every design.

Those reading this now will find this humorous, I am sure, with multiprocessing units (multiple CPUs) now residing within the boundaries of a single chip or IC die. In the 80’s and well into the mid 90’s, however, partitioning the necessary logic or Boolean functions of a CPU across multiple integrated circuit chips (usually multiple hundreds of chips) and multiple complex printed circuit boards (2 to 8) was an integral part of determining the overall performance, power consumption, cost and reliability of the system.


Introduction

ETA Systems technology was selected in 1980 (the organization was the Advanced Design Laboratory of Control Data Corporation at its inception) with the following objectives:

  • The highest performance Supercomputer at the time of product delivery
  • The most cost effective technology available
  • The lowest possible power consumption while meeting other objectives
  • The largest product diversity with a single design
  • The highest possible reliability. Pins and interconnects usually dictated the reliability since by that time Integrated Circuit technology reached a very high reliability for both logic and storage devices.
  • Leverage as much of the technology as possible to the follow-on computer generation. This usually fell by the wayside until ECAD and MCAD technologies were introduced into the design.
  • Utilize only standard IC technology processes being developed for other markets.
  • Demonstrate the prototype of the product in less than four years.

Digging deeper into objectives:

==== The highest performance Supercomputer at the time of product delivery ====

Simply stated, the highest performance processor solved the largest problems most effectively. Performance was usually measured by clock cycle, which was unfortunate since a variable amount of calculation could be done per clock cycle. This single parameter conferred the bragging rights, although later Gigaflops became the stated parameter, and that also did not necessarily reflect the true performance of a supercomputer. The “king of the hill” in any given cycle (usually 2 to 4 years) held the largest market share.

==== The most cost effective technology available ====

The most “bang” for the buck applied to Supercomputers as well as to other markets. The customer put the highest performance on his particular challenges first, provided that the performance clearly exceeded that of lower cost alternatives. A legend of the Supercomputer industry, Jim Thornton, once described the requirement as getting through an intersection without having an accident. Since a Supercomputer required so many components and interconnects, it was bound to fail more often than a small computer. So, Jim surmised, the faster the computer, the more problems could be solved before something went wrong. Go through an intersection as fast as possible, not at a slow rate, and you have a better chance of getting through safely.

==== The lowest possible power consumption while meeting other objectives ====

Each follow-on generation of Supercomputer products witnessed increased power per processor, which was justified by the resultant performance realized. (The lower the RC (resistance-capacitance) time constant, the faster the computer clock cycle.) Since the lower the R, the higher the power, this was a trend. As the number of processors per system increased, power consumption became a major issue: for the largest users because of site-related “wall plug” power capacity limits, and for the small users because of basic life-of-system cost concerns (mainly power consumption, cooling and system reliability).
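
The trade-off between R, speed and power can be illustrated with a first-order model. The sketch below assumes a gate whose delay scales with the RC time constant and whose static power scales as V²/R for a fixed voltage swing, as in a resistively loaded bipolar gate. The function names and all component values are invented for illustration; they are not CDC or ETA Systems figures.

```python
def time_constant_ns(resistance_ohms, capacitance_pf):
    """RC time constant in nanoseconds (1 ohm * 1 pF = 1e-12 s = 1e-3 ns)."""
    return resistance_ohms * capacitance_pf * 1e-3


def static_power_mw(swing_volts, resistance_ohms):
    """Power dissipated in the load resistor for a fixed voltage swing, in mW."""
    return swing_volts ** 2 / resistance_ohms * 1e3


swing_v = 0.8   # assumed logic swing, volts
load_pf = 2.0   # assumed load capacitance, picofarads

# Halving R halves the time constant but doubles the power.
for r in (400.0, 200.0, 100.0):
    tau = time_constant_ns(r, load_pf)
    power = static_power_mw(swing_v, r)
    print(f"R = {r:5.0f} ohm: tau = {tau:.2f} ns, power = {power:.2f} mW")
```

Under these assumptions the product of power and delay stays constant, which is why each step toward a faster clock came with a proportional increase in power.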

==== The largest product diversity with a single design ====

Design cycles were three to five years in duration and large teams were assembled to complete a single design. Development costs per product were significant. Product cost ($2M to $40M per system) and performance ranges (greater than 20:1) for each generation of Supercomputers were increasing. Since optimized cost points for each product were technology dependent, covering the range required multiple design teams, each design utilizing a unique technology. A single total design (i.e., packaging, IC selection, manufacturing tooling, and associated boards, connectors, etc.) was therefore desirable for a myriad of obvious reasons. In fact, most companies were focused on only a portion of the computer market and dedicated to only a small portion of the product "bandwidth". Cray Research focused on the high end; IBM, CDC, Unisys and others did an admirable job in the middle; and companies like DEC and HP were at the lower end. There were others across the world too; these companies are only examples.

==== The highest possible reliability ====

Supercomputers required significant bandwidth to pass data between processing units and between processors and memory. Bandwidth is a key differentiator that separates true Supercomputers from conventional computer systems. Interconnects were and still are the dominant reliability concern in large systems. Thermal management is also significant. Large systems, by the nature of the design, require a large quantity of components to be simultaneously functional. Each interconnect and each active logic device (integrated circuit) requires the highest reliability to permit large user problems to be solved using Supercomputers. Simply stated: the Supercomputer's operational time between failures had to exceed the time needed to run the largest customer problem.
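
The link between component count, component reliability and the longest job a system can finish can be sketched with a simple series-reliability model. This is only an illustration: it assumes independent components with constant failure rates (an exponential model), and the component counts and failure rates below are invented for the example rather than taken from ETA Systems data.

```python
import math


def system_mtbf_hours(component_counts, failure_rates_per_hour):
    """MTBF of a series system: every part must work, so failure rates add."""
    total_rate = sum(
        n * rate for n, rate in zip(component_counts, failure_rates_per_hour)
    )
    return 1.0 / total_rate


def probability_job_completes(job_hours, mtbf_hours):
    """Chance of running job_hours without a failure (exponential model)."""
    return math.exp(-job_hours / mtbf_hours)


# Hypothetical system: 200 logic chips and 50,000 interconnects.
counts = [200, 50_000]
rates = [1e-7, 1e-8]  # assumed failures per hour per item

mtbf = system_mtbf_hours(counts, rates)
print(f"system MTBF = {mtbf:.0f} hours")
for job_hours in (1, 24, 240):
    p = probability_job_completes(job_hours, mtbf)
    print(f"{job_hours:4d} h job: P(complete) = {p:.4f}")
```

With these made-up numbers the interconnects, not the chips, contribute most of the total failure rate, which mirrors the point that interconnects dominate reliability in large systems.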

==== Leverage as much of the technology as possible to the follow-on generation ====

The significant cost of developing a given technology “kit” had to be leveraged for as long a period as possible. It was a “given” that most integrated circuits would be developed anew for each generation of computer. What about interconnects (connectors)? What about printed circuit boards? What about support technologies like simulation tools, assembly tooling and basic packaging? Can any of these technologies extend to the next generation? And are there any “mid life kickers” that could be inserted into a successful product to extend its market life? IBM did exceptional work in taking hardware across product boundaries and generations of new products. The initial tooling for packaging was expensive, but the results in later products appeared to pay dividends. Cray Research Inc. and Control Data, by contrast, generated new packaging and connector technology with each new generation of product.

==== Utilize only standard IC technology processes being developed for other markets ====

This addresses two major issues: cost and access to technology. Cost, since dedicated IC processing lines with unique processes for low volume products, even if they could be realized, would not allow effective amortization of the costs of process development and manufacturing. Access to technology, since it means adapting advanced, popular (high volume) processes to accommodate unique system designs. Innovation in the IC industry was and is applied to the markets with the highest return on investment. The “trick” was to apply this “standard” and most innovative technology to low volume Supercomputer applications.

==== Demonstrate operating product prototypes in less than four years ====

This requires discipline as well as good management and leadership. Due to the complexity of Supercomputers, we felt that tools had to be upgraded significantly and that checkout and diagnostics had to be improved as well. Improvements in Supercomputer technologies are being made at any given time, and allowing each incremental development to be incorporated extends the product development cycle significantly. On the other hand, selecting only known (proven) technologies at the time of product development initiation results in a non-competitive product. Risk must be taken. Where to take risk (on technologies that are not available at product initiation but look the most promising), and the return on that risk, must be carefully evaluated with the factors clearly understood. Where to invest in new technology development, the return on that investment, and the leverage available where other markets are interested in common technologies must also be understood. Back-up alternatives should be identified. The CEO of Cray Research, John Rollwagen, defined the challenge as "How many 3-point shots should each project take?" Missing market cycles is costly. Under-reaching (being conservative) and over-reaching (bad choices in “betting on the come”) were also costly and prohibitive. All of these factors were carefully evaluated. It might be added, using the same basketball comparison, that one cannot have too many “lay ups” (sure things already developed) either!




Results

Before getting to the details of how decisions were made and how the ETA Systems technology “kit” was selected and developed, here is a list of noteworthy accomplishments:

==== First Industry competitive Supercomputer to be developed using CMOS (Complementary Metal Oxide Semiconductor) integrated circuit technology ====

From 1995 to the present (beginning 12 years after the technology selection by ETA Systems, I might add), ALL HPC (High Performance Computers) have been developed and manufactured using CMOS IC technology. Until as late as 2000, bipolar technology (higher power, more costly to manufacture and lower gate count per chip) dominated high performance computers throughout the world.

==== First Industry Single Board (not single chip) Supercomputer Processor ====

The chip density (gates per chip) allowed by advanced CMOS, the use of Computer Aided Design tools for optimum layout and simulation, the successful design of an advanced 45 layer Printed Circuit board (you read it right, 45 layers) and innovative chip attachment and cooling permitted a single processor containing nearly 3 million gates to be packaged on a single board.

==== First Industry system to be auto-tested for mechanical continuity, functionality AND performance using on-chip self-test circuitry ====

CPU processing units (≈3 million gates each) were validated for functionality and performance in less than 4 hours. Any interconnect errors were recorded, which allowed chip-to-chip replacement to occur in minimal time. Checkout and validation of other CPUs during this same period required weeks to months per processing unit. Incoming testing of the logic IC chips (function and performance) also used the same self-test innovations.

==== First Industry production system to utilize Liquid Nitrogen (cryogenic) cooling ====

The ETA Systems CPU was immersed in Liquid Nitrogen (77 kelvin) to improve performance to more than twice that of CMOS technology operated at room temperature (300 kelvin).

==== First system at CDC to fully utilize Computer Design Software to design chips and boards, validate logic design and auto-diagnostic test the system with synergistic tools ====

This permitted checkout of a CPU to be completed in less than 4 hours. Manufacturing costs were greatly reduced. The same technique was also used at the IC supplier and greatly reduced the need for probe test hardware and software.

==== First Industry system to have systems selling from less than $1M to greater than $20M from a single design ====

The performance range of the ETA Systems products was greater than 24:1 (from an 8 processor system operating at a 7 nanosecond clock period down to a single processor system operating at 24 nanoseconds). Processors were manufactured, tested and validated on a single manufacturing line using identical components. (IC chips were performance sorted using the auto self-test circuitry embedded on each chip.) Product differences began at the system packaging level.
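
As a rough check on the quoted range, peak throughput can be treated as proportional to the number of processors divided by the clock period. That is an idealized assumption (real workloads rarely scale perfectly across processors), and the function name below is purely illustrative.

```python
def relative_throughput(processors, clock_period_ns):
    """Idealized peak throughput: processors per nanosecond of clock period."""
    return processors / clock_period_ns


top_end = relative_throughput(8, 7.0)    # 8 processors at a 7 ns clock
entry = relative_throughput(1, 24.0)     # 1 processor at a 24 ns clock

ratio = top_end / entry
print(f"performance range ~ {ratio:.1f}:1")  # prints "performance range ~ 27.4:1"
```

Under this assumption the ratio is 8 × 24 / 7, which is consistent with the "greater than 24:1" figure.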
