IEEE

First-Hand:Cryo CMOS and 40+ layer PC Boards - How Crazy is this?

From GHN

Revision as of 17:55, 15 April 2009 by Vacca (Talk | contribs)

How it started

It was in the early 80's. Control Data (CDC) had just launched the CYBER 205 with modest success, and the team was now focused on the next generation machine, the 2XX as I recall. Speed, cost and meeting the schedule were all key objectives. Speed, because Cray Research under the guidance of Seymour Cray was setting milestones for Supercomputers with the Cray 1 and then the Cray 2. Cost, since Supercomputers were extremely expensive. Schedule, since the CYBER 205 had set patience records as a machine that might never get out the door, and this must not be repeated.

A conventional, evolutionary approach for Integrated Circuit (IC) logic was initially selected. Motorola, with some prodding, agreed to launch an 8,000 gate equivalent ECL (emitter-coupled logic - the circuitry of choice for high performance processing units) gate array, provided that Control Data do the actual circuit development. There were insufficient customers for Motorola to commit their resources to this lofty development. Motorola did, however, commit their advanced ECL processes to CDC, and a joint team was formed between the two companies.

Logic designers at the CDC Advanced Design Laboratory were given preliminary design rules based on computer device models and estimates of gates-per-chip densities. There was a natural follow-up of grumbling from the logic design team, led by very experienced and innovative folks (Ray Kort, Maurice Hudson and Dave Hill to name three), but circuit designers had learned to accept this, since logic designers always found the circuits too slow and the quantities of gates and pins (I/O ports) per die insufficient. There was a lot of cooperation too. Basic building blocks were defined by the logic designers - gate functionality, register functionality, etc. From this set of preliminary rules, function blocks were defined and the capacity per reasonably-sized Printed Circuit (PC) board was established. The initial design, using the CYBER 205 based architecture, was launched.


In parallel with this effort, and in the same design group (i.e., circuit, packaging, PC board and the newly formed CAD group - tools for layout and design of chips and boards), chief chip design engineer Randy Bach was assigned to develop an advanced CMOS chip for the Canadian Computer Development organization. At this time, the early 80's, CMOS was in its infancy, being used for memory devices, low performance peripherals and low performance microprocessors (5 to 10 MHz clock speeds). The design contained 5,000 gates plus appropriate input and output communication devices. Gate arrays for CMOS were also nearly non-existent, so Randy and his small team of two assistants developed a cell library and worked closely with the Canadian Development team to meet their objectives as well.

This effort was completely separate from the ECL based gate array to be used for the next generation Supercomputer.  The product was developed for a low cost application.


It was customary for Neil Lincoln - chief architect, Dale Handy - manufacturing manager, and me to go off to lunch every 8 to 10 days to discuss status at either Arthur Treacher's Fish & Chips or Zantigo's (high class - NOT - fast food restaurants). As a side note, both of these fast food places disappeared during ETA Systems' brief existence. Zantigo's has returned (I think because they know it is safe now that the three of us can no longer visit together - Neil unfortunately passed on a few years ago).

At one of these meetings, Neil had "news" for me. Simply stated, the gate array in co-development with Motorola could not meet acceptable goals. The chip had too few I/O pins, consumed too much power and contained too few gates. He had determined that the CPU (some 3 million gates) had to be assembled on a single board. "It was time for this to be done." He had also reached the conclusion that the logic design required at least 15,000 gates per chip to meet these goals.

The logic designers had gotten to him, I surmised. Schedules, Neil reminded us, could not be altered - and that was that. To soften the blow he bought lunch that day, three Cokes and three orders of fish and chips - Neil's was a large order.

The trip back to the lab was pretty quiet, fortunately short since our eating places were all very close to the lab.


That afternoon, I assembled the key folks - I might be missing one or two, but Randy Bach, Doug Carlson, Dave Resnick and John Ketzler were four that I recall now. Doug was a mechanical engineer to whom I assigned the Motorola project because of his management skills - something he probably never forgave me for - John was the key circuit engineer on the Motorola project, and Dave was and still is a very versatile and perceptive engineer.

Doug and I would inform Motorola of the decision not to continue.  The team would package up what was accomplished and turn it over to Motorola to carry the ball forward if they wished.  As a side note, Motorola and Cray did continue the design.  It was the circuit design used in the Cray C90, a very successful computer.

The meeting turned to what were the next steps.

The key challenges that emerged were: 

  • IC Technology that could meet the new lofty goals
  • The PC board technology required to meet a single board CPU
  • Packaging and interconnect technology required to support the two above requirements
  • Computer Aided Design (CAD) technology necessary to accurately design IC and PCB technologies
  • Suppliers for all - do they exist?
  • What additional internal resources were required to achieve objectives
  • System packaging beyond a single CPU. (Memory, peripherals, I/O, etc.)
  • Testing of complex IC technology and complex PCB technology

Summary of IC technology accomplishments




ETA Systems Hardware Technology

1980 – 1989

Preface:

To restate the challenge: ETA Systems Inc. was spun off as the Supercomputer subsidiary. The objective was to develop and manufacture High Performance Computers, commonly called in the 80's and 90's simply Supercomputers. Cray Research Inc. dominated this market during this time frame, and CDC (Control Data Corporation) had a minor role, introducing the Star-100, CYBER-203 and CYBER-205 systems. Novel architecture (fast scalar performance and the efficient use of vectors), innovative software, the highest performance integrated circuits (resulting in the fastest clock period) and associated innovative packaging (to optimize device spacing and thermal management) differentiated Supercomputers from conventional computer systems during this period. It must be stated, to be "fair and balanced", that Supercomputers also had the highest price tags and demanded the largest memories and the highest performance peripherals and system bandwidths. Systems dominating the market during the 80's were the Cray-1, Cray X-MP and CYBER-205. NEC, Fujitsu and Hitachi also developed systems for this market. The word Supercomputer was applied to other products as well; no dismissal of their achievements is intended.

The following overview will not enter into the decisions to separate ETA Systems from Control Data Corporation organizationally, although that topic is interesting as well.  Nor will the following discuss software innovations at ETA Systems – and there were many.

Architecture had a role in dictating the technology in terms of the number of logic circuits in series per clock cycle. Architecture also demanded that high performance, large registers be included, which likewise dictated the performance (clock cycle) of the system. Other architectural features dictated the number of functions that constituted a processor (gates per CPU), which in turn determined technology selection from the standpoint of preferred gates per chip and ports per chip. Proximity of chips to each other was crucial for processor design during this time period, since a CPU could not reside within the boundary of a single chip as it easily does today. Bandwidth, i.e., the number of bytes per unit of time that could be moved between functions within the CPU, and between the CPU and associated memory, is key; it places demands on pins or logic paths between functions that usually require compromise in each and every design.
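The bandwidth arithmetic behind that constraint is simple; a minimal sketch (all pin counts and signaling rates below are hypothetical, chosen for easy arithmetic, not ETA Systems figures) shows why pins between functions were a perpetual compromise:

```python
# Illustrative only: peak bandwidth across a pin-limited interface is the
# number of signal pins times the per-pin signaling rate. All numbers
# here are hypothetical.

def bandwidth_gbytes_per_s(signal_pins: int, rate_mhz: float) -> float:
    """Peak transfer rate in gigabytes per second."""
    bits_per_second = signal_pins * rate_mhz * 1e6
    return bits_per_second / 8 / 1e9

# 200 pins toggling at 40 MHz move 1.0 GB/s; doubling either factor
# doubles the result, which is why pin count was fought over so fiercely.
bw = bandwidth_gbytes_per_s(200, 40.0)
```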

Those reading this now will find this humorous, I am sure, with multiprocessing units now residing within the boundaries of a single chip or IC die. In the 80's and well into the mid 90's, however, partitioning a CPU's necessary logic or Boolean functions across multiple integrated circuit chips (usually multiple hundreds of chips) and multiple complex printed circuit boards (2 to 8) was an integral part of determining the overall performance, power consumption, cost and reliability of the system.



Introduction 

ETA Systems technology was selected in 1980 (the organization was the Advanced Design Laboratory of Control Data Corporation at the inception) with the following objectives:

•    The highest performance Supercomputer at the time of product delivery

•    The most cost effective technology available

•    The lowest possible power consumption while meeting other objectives

•    The largest product diversity with a single design

•    The highest possible reliability. Pins and interconnects usually dictated the reliability, since by that time Integrated Circuit technology had reached very high reliability for both logic and storage devices.

•    Leverage as much of the technology as possible to the follow-on computer generation. This usually fell by the wayside until ECAD and MCAD technologies were introduced into the design.

•    Utilize only standard IC technology processes being developed for other markets.

•    Demonstrate the prototype of the product in less than four years.

Digging deeper into objectives:

•    The highest performance Supercomputer at the time of product delivery. Simply stated, the highest performance processor solved the largest problems most effectively. Performance was usually measured in clock period, which was unfortunate since variable amounts of calculation could be done per clock cycle. This single parameter carried the bragging rights, although later Gigaflops became the stated parameter, and that also did not necessarily reflect the true performance of a supercomputer. The "king of the hill" at any given cycle (usually 2 to 4 years) held the largest market share.
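The point about clock period being a misleading yardstick can be shown with a line of arithmetic (the pipe counts below are hypothetical, not figures for any particular machine):

```python
# Illustrative: peak floating-point rate depends on results delivered per
# clock as well as on the clock period itself.

def peak_mflops(clock_period_ns: float, results_per_clock: int) -> float:
    """Peak MFLOPS for a given clock period and per-clock result count."""
    return results_per_clock / (clock_period_ns * 1e-9) / 1e6

fast_clock = peak_mflops(7.0, 1)    # fast 7 ns clock, one result per clock
wide_pipes = peak_mflops(24.0, 4)   # "slow" 24 ns clock, four results per clock
# The slower-clocked machine has the higher peak rate.
```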

•    The most cost effective technology available. The most bang for the buck applied to Supercomputers as well as other markets. The customer was willing to pay for the highest performance when solving his particular challenges, provided that the performance clearly exceeded lower cost alternatives. A legend of the Supercomputer industry - Jim Thornton - once described the requirement as getting through an intersection without having an accident. Since a Supercomputer required so many components and interconnects, something was bound to fail more rapidly than in small computers. So, Jim surmised, the faster the computer, the more things that could be solved before something went wrong. Go through an intersection as fast as possible - not at a slow rate - and you have a better chance of getting through safely.

•    The lowest possible power consumption while meeting other objectives. Each follow-on generation of Supercomputer products witnessed increased power per processor, which was justified by the resultant performance realized. The lower the RC time constant (resistance times capacitance), the faster the computer clock cycle; and since lower R means higher power, this was the trend. As multi-processor units per system increased, power consumption became a major issue: for the largest users because of site-related "wall plug" power capacity limits, and for the small users because of basic life-of-system cost concerns (mainly power consumption, cooling and system reliability).
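The R-versus-power trade mentioned above can be sketched in a few lines (component values are hypothetical; this is the general relation, not ETA circuit data): gate delay tracks the RC product, while static dissipation of a resistively loaded circuit goes as V²/R, so halving R halves the delay but doubles the power.

```python
# Illustrative RC trade-off: delay ~ R*C, static power ~ V^2 / R.
# All component values below are hypothetical.

def gate_delay_ns(r_ohms: float, c_farads: float) -> float:
    """Approximate propagation delay as the RC time constant, in ns."""
    return r_ohms * c_farads * 1e9

def static_power_mw(v_volts: float, r_ohms: float) -> float:
    """Static dissipation of a resistive load, P = V^2 / R, in mW."""
    return (v_volts ** 2 / r_ohms) * 1e3

delay_hi_r = gate_delay_ns(500.0, 2e-12)   # 1.0 ns with a 500 ohm load
delay_lo_r = gate_delay_ns(250.0, 2e-12)   # halving R halves the delay...
power_hi_r = static_power_mw(1.2, 500.0)
power_lo_r = static_power_mw(1.2, 250.0)   # ...but doubles the power
```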

•    The largest product diversity with a single design. Design cycles were three to five years in duration, and large teams were assembled to complete a single design. Development costs per product were significant. Product cost ($2M to $40M per system) and performance ranges (greater than 20:1) for each generation of Supercomputers were increasing. Since optimized cost points for each product were technology dependent, this normally required multiple design teams - each design utilizing a unique technology. A single total design - i.e., packaging, IC selection, manufacturing tooling, and associated boards, connectors, etc. - was desirable, therefore, for a myriad of obvious reasons. In fact, most companies focused on only a portion of the computer market and dedicated themselves to only a small portion of the product "bandwidth". Cray Research focused on the high end, IBM did the best job in the middle, and companies like DEC and HP were at the lower end. There were others too across the world, but these companies are only examples.

•    The highest possible reliability. Supercomputers required significant bandwidth to pass data between processing units, and between processors and memory. Bandwidth is a key differentiator that separates true Supercomputers from conventional computer systems. Interconnects were, and still are, dominant reliability concerns in large systems. Thermal management is also significant. Large systems, by the nature of the design, require a large quantity of components to be simultaneously functional. Each interconnect and each active logic device (integrated circuit) requires the highest reliability to permit large user problems to be solved using Supercomputers. Simply stated: Supercomputer operational time had to exceed the run time of the largest customer problem.

•    Leverage as much of the technology as possible to the follow-on generation. The significant cost of developing a given technology "kit" had to be leveraged as long as possible. It was a "given" that most integrated circuits would be developed anew for each generation of computer. What about interconnects (connectors)? What about printed circuit boards? What about support technologies like simulation tools, assembly tooling and basic packaging? Can any of these technologies extend to the next generation? And are there any "mid-life kickers" that could be inserted into a successful product to extend its market life? IBM did exceptional work in carrying hardware across product boundaries and generations of new products. The initial tooling for packaging was expensive, but the results in later products appeared to pay dividends. Cray Research Inc. and Control Data, by contrast, generated new packaging and connector technology with each new generation of product.

•    Utilize only standard IC technology processes being developed for other markets. This addresses two major issues, cost and access to technology. Cost, since dedicated IC processing lines with unique processes for low volume products - even if they could be realized - would not allow effective amortization of the costs of process development and manufacturing. Access to technology means using advanced, popular (high volume) processes to accommodate unique system designs. Innovation in the IC industry was, and is, applied to the highest return-on-investment markets. The "trick" was to apply this "standard" and most innovative technology to low volume Supercomputer applications.

•    Demonstrate operating product prototypes in less than four years. This requires discipline as well as good management and leadership. At any given time improvements are being made in the technologies used in supercomputers. Allowing each incremental development to be incorporated extends the product development cycle significantly; selecting only technologies known at product initiation results in a non-competitive product. Risk must be taken. Which areas to take risk in (technologies not available at product initiation), and what the return on that risk is, must be carefully evaluated with the factors clearly understood. Where to invest in new technology development must be understood, as well as the return on investment and the leverage gained where other markets are interested in common technologies. Back-up alternatives should be identified. The CEO of Cray Research - John Rollwagen - defined the challenge as "How many 3-point shots should each project take?" Missing market cycles is costly. Under-reaching (being conservative) and over-reaching (bad choices in "betting on the come") were also costly and prohibitive. All of these factors were carefully evaluated. It might be added, using the same basketball comparison, that a project cannot consist of too many "lay ups" (sure things already developed) either!

Results

Before getting into the details of how decisions were made and how the ETA Systems technology "kit" was selected and developed, here is a list of noteworthy accomplishments:

•    First industry-competitive Supercomputer to be developed using CMOS (Complementary Metal Oxide Semiconductor) integrated circuit technology. From 1995 to the present (beginning 12 years after the technology selection by ETA Systems, I might add), ALL HPC (High Performance Computing) systems have been developed and manufactured using CMOS IC technology. Until as late as 2000, bipolar technology (higher power, more costly to manufacture and lower gate count per chip) dominated high performance computers throughout the world.

•    First industry single board (not single chip) Supercomputer processor. The chip density (gates per chip) allowed by advanced CMOS, the use of computer aided design tools for optimum layout and simulation, the successful design of a 45-layer advanced Printed Circuit board (you read it right - 45 layers) and innovative chip attachment and cooling permitted a single processor containing over 2.5 million gates to be packaged on a single board.

•    First industry system to be auto-tested for mechanical continuity, functionality AND performance using on-chip self-test circuitry. CPU processing units (≈3 million gates each) were validated for functionality and performance in less than 4 hours. Any interconnect errors were recorded, allowing chip-to-chip replacement to occur in minimal time. Other CPUs during this same period required weeks to months to check out and validate. Incoming testing of the logic IC chips (function and performance) also used the same self-test innovations.

•    First industry production system to utilize Liquid Nitrogen (cryogenic) cooling. The ETA Systems CPU was immersed in Liquid Nitrogen - 77 kelvin - to improve performance to greater than twice that of the same CMOS technology operated at room temperature - 300 kelvin.

•    First system at CDC to fully utilize computer design software to design chips and boards, validate the logic design and auto-diagnostic-test the system with synergistic tools. This permitted checkout of a CPU to be completed in less than 4 hours. Manufacturing costs were greatly reduced. This technique was also used at the IC supplier, greatly reducing probe test hardware and software.

•    First CDC system to have systems selling from less than $1M to greater than $20M from a single design. The performance range of the ETA Systems products was greater than 24:1 (an 8 processor system operating at a 7 nanosecond clock period versus a single processor system operating at 24 nanoseconds). Processors were manufactured, tested and validated on a single manufacturing line using identical components. (IC chips were performance sorted using the auto self-test circuitry embedded on each chip.)
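The stated range follows directly from processor count and clock ratio; a back-of-envelope check using only the figures quoted above:

```python
# Peak-throughput ratio between the top and bottom of the product line:
# throughput scales with processor count and inversely with clock period.
top_procs, top_clock_ns = 8, 7.0     # 8 processors at a 7 ns clock
low_procs, low_clock_ns = 1, 24.0    # 1 processor at a 24 ns clock

range_ratio = (top_procs / top_clock_ns) / (low_procs / low_clock_ns)
# range_ratio works out to roughly 27:1, consistent with ">24:1".
```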

 


Boring into the details

Any technology kit must be driven by a customer need. In the case of Supercomputers, the craving for increased computer performance at lower overall cost was the deciding factor. In any Supercomputer company, a combination of marketing requirements, architecture innovations and logic design demands dictates the initial objectives of the hardware circuit and packaging organization. I say "initial" since, once the objectives are digested and key technologies are evaluated for the time frame addressed, compromises are the norm. In the case of the ETA Systems technology selections in the early 1980's, this was the strategy implemented.

The following paragraphs sequence the thought process and the technology selection strategy utilized.

Integrated Circuit selection:

The objectives listed in earlier paragraphs were first integrated into the architecture and logic design requirements. A market survey of key integrated circuit suppliers was conducted, with emphasis on what was in development and planned for product introduction - not what was available at the time of the survey. A risk assessment was made. The primary focus was on the most dynamic technology, the IC logic technology. All decisions as to volume requirements, pins, packaging, etc. resulted from what was determined by this survey and risk analysis. Merging in the logic design objectives (gates, bandwidth and performance of key functions) was next.

An ECL (emitter coupled logic) high performance bipolar gate array using Motorola advanced IC technology was selected. Since Motorola was not fully staffed to begin the actual product development (application) but did have the process development underway, a cooperative development agreement was struck between the two companies (this occurred between Motorola and Control Data, since ETA Systems had not yet been formed). The design called for basic logic cells to be incorporated into a larger version of their existing gate array, advancing the process for increased performance and the chip size for increased gate capacity. The existing gate or function array utilized approximately 2,500 gates (it was used as the primary gate array for Cray Research's very popular Y-MP Supercomputer), and the planned gate array would contain in excess of 8,000 equivalent gates.

Logic cell libraries were agreed to (acceptable both to Motorola for the general market and to CDC for the logic designs). Pin counts (for power, ground and input/output logic communications) were established and power consumption estimates were made. Once these parameters were established, board size, power systems and thermal control were evaluated in a trade-off give-and-take. Features of Printed Circuit Boards (line widths, spacing, interconnect vias and number of layers) were compared to board size capacities, laminating press capabilities, drill designs and printed circuit board processing limits. IC packaging limits, i.e., minimum package size, pin spacing, thermal removal, etc., were evaluated in parallel with PC board limits.

The chip design, the cell library and the packaging all began once the parameters (pins, power consumption and die size objectives) were agreed to. Printed circuit board experiments also began. Once feasibility was established and practical limits were set (the original goals could be met as to physical design and performance, based on IC modeling and extrapolation from previously established functional systems), a preliminary specification was presented to the architects and logic designers for review.

From initial design data, logic design based on the parameters provided established a physical size for the Central Processing Unit, or CPU, the heart of the system. A multiple board processor was required. This placed additional constraints on packaging, since within a single processor all distances between circuits are crucial. Three-dimensional packaging concepts were considered. Three-dimensional packaging effectively meant a "sandwich" of multiple boards, with board-to-board interconnects distributed throughout the board area - not exclusively at the periphery - so that chips on each of the boards would minimize the distances between them. In addition, power consumption estimates were made; thermal removal paths and techniques were considered. A cost model was generated as well. All of these factors resulted in the CPU volume established.

In parallel with these efforts, memory designs were underway. Less freedom was available to memory, since the basic semiconductor device could not be altered to accommodate specific users. There were a few packaging alternatives - very few - and device configurations (word-bit architecture, pin numbering, power considerations, etc.) were dictated by the industry. Since memory design has its own objectives for cost, reliability and performance, this effort could continue quite independently, with one exception: the packaging of the total system had to be synergistic and compatible. A crucial parameter here is the interconnect mechanism between processors and memory.

A hardware system cost model was established - not only for current cost considerations but also for estimates of volume costs, based on learning-curve projections, for the life of the system.

The chief architect, after careful review, rejected the design. Three key reasons were cited: performance would be impacted due to the 8,000 gate limit (worst case logic paths could not reside in a single chip, and multiple-chip distances would increase the clock period); power consumption per CPU, although lower on a performance-ratio basis than previous generations, was too high when the total system size (including the multiprocessor objectives) was considered; and system cost appeared prohibitive - always a subjective issue but nevertheless a key component of the design. Reliability concerns were also stated, since the pin count per CPU, although much reduced from previous designs, was still of concern.

Back to the drawing board.

Insert - mini tutorial - Bipolar technology refers to conventional NPN and PNP transistors operating in a non-saturating mode. By not saturating the operating transistors (keeping the collector voltage above the base voltage), the switching characteristics were improved and balanced (the off and on logic levels had near identical delays). In addition, the non-saturating circuitry - titled ECL, for Emitter Coupled Logic - provided the TRUE and COMPLEMENT outputs for each logic function (i.e., AND and NAND, OR and NOR, etc.). This provided significant advantages to logicians as they designed complex Boolean functions (ADD units, MULTIPLY units, DIVIDE units, etc.). Under the category of "no free ride", ECL circuitry consumed higher power than the more popular but much slower saturating logic circuitry (TTL - transistor-transistor logic). Other improvements in performance for integrated versions of ECL logic circuitry included replacing conventional junction isolation between circuits on a single die with oxide isolation (lower capacitance per circuit, so less charging and discharging when logic levels switched).

CMOS (Complementary Metal Oxide Semiconductor) circuitry, especially at the time of ETA Systems, is a simpler and more efficient logic circuit. Stacking P channel and N channel transistors in series between the voltage bus rails defines a single complementary gate. The functionality of the logic devices is much more forgiving of process variations, due to the larger voltage swing and the fact that only active transistors are used to define the circuitry (no resistors, diodes, etc.). The physical size of a logic function, compared to a bipolar equivalent, is significantly smaller, resulting in a significant increase in circuitry per equivalent die (chip) size. CMOS technology also consumed power ONLY when the circuit was switching (changing states), so power consumption was directly proportional to the frequency at which it operated. ECL circuitry, by contrast, consumed approximately the same power whether switching or in a quiescent state. (Later forms of CMOS - especially those designed in the early 2000's and beyond - had increased power consumption, primarily caused by increased leakage currents resulting from lithography processes with features smaller than 90 nanometers. Technology at the time of the development of the ETA Supercomputers had minimum features of 1,200 nanometers.)
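The frequency-proportional power behavior described above is the classic CMOS dynamic-power relation, P ≈ α·C·V²·f; a minimal sketch, with hypothetical capacitance, supply and activity values:

```python
# Illustrative CMOS dynamic power: P = activity * C * Vdd^2 * f.
# An idle (non-switching) CMOS circuit dissipates essentially nothing,
# unlike ECL. All parameter values below are hypothetical.

def cmos_dynamic_power_w(activity: float, cap_farads: float,
                         vdd_volts: float, freq_hz: float) -> float:
    """Dynamic switching power in watts."""
    return activity * cap_farads * vdd_volts ** 2 * freq_hz

p_running = cmos_dynamic_power_w(0.2, 1e-9, 5.0, 40e6)  # switching at 40 MHz
p_idle    = cmos_dynamic_power_w(0.2, 1e-9, 5.0, 0.0)   # quiescent: zero
```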

The advantages of CMOS were obvious: more circuits per given chip area, lower power consumption and higher functional yield. It is important to stress "functional yield": CMOS devices functioned over a much larger range of processing variations (> 50%, vs. 15% to 25% for ECL). Performance variations were approximately 2 to 3 times for CMOS and 20 to 30 percent for ECL. For this reason CMOS devices were sold at much lower rated performance than any bipolar counterpart. There is one other key difference in the performance gap between bipolar and CMOS devices. For ECL (or any other bipolar device) the maximum operating frequency is defined, in part, by the base width - the physical distance between the emitter and collector of the transistor. This is determined by the diffusion or implant of the emitter, is controlled in the vertical direction, and is limited by process control that is quite precise. This parameter is very thin, and the maximum frequency is inversely proportional to the base width. For CMOS, the gate length defines the critical analogous parameter. Gate length is defined by a masking operation and limited by the optics available in any given generation. Bipolar devices in the 1980's, and well into the latter half of the 1990's, therefore had higher maximum operating frequencies than their CMOS counterparts. As capital equipment - primarily the optics for masking and etching - defined smaller and smaller geometries, CMOS technology improved dramatically in performance. This was a result of smaller gate lengths, but also of each generation having smaller devices, resulting in lower capacitance loading and lower time constants to charge and discharge.
During the time of the ETA Systems Supercomputer development, CMOS technology had not yet seen the advantages that bipolar devices could realize - but the potential for future improvements was obvious, and projections clearly indicated that by the second half of the 1990's (nearly 10 years after the first ETA Systems Supercomputer would be available), CMOS would overtake bipolar in the last and most important parameter - performance.


Bipolar technology was stretched to a practical limit for the time frame in question.

The IC industry had only one other technology candidate, CMOS, which was used exclusively for lower cost and considerably lower performance applications. The impressive characteristics of CMOS technology at this time were lower power consumption per function, smaller size per logic function and lower cost per die, the latter due to two key factors (smaller physical size per function meant more logical functions per unit of area, and higher chip yield - chip functionality per wafer manufactured - due to the reduced number of processing steps needed to generate CMOS devices). That was the good news. The concern was system performance. While bipolar technology had set the standard, with clock periods of 10 nsec for Supercomputer architectures such as the ETA Systems projection, CMOS was at least 5 times slower - in most cases 10 to 20 times slower for equivalent architectures. Based on this parameter alone, CMOS was not a candidate for Supercomputers in the 1990 time frame (the time frame in which the ETA Systems Supercomputer would be in high volume production).

The next steps were dramatic and at times emotional. First, the team had to discard the ECL design and terminate the effort with Motorola. This was very difficult, first because both companies depended on each other, and second because all objectives of the ECL product were being met within the specifications established. The CDC team (which later became ETA Systems) provided Motorola with all of the design details to date, and considerable effort was made to ensure that the program remained successful at Motorola.

A sidelight to this discussion: Motorola completed this product as an industry product. Cray Research Inc. (the key competitor and leader of the Supercomputer market) engaged with Motorola to successfully complete the development for a product announced in the late 1980s. That product, the Cray C-90, became another very successful supercomputer developed and manufactured by Cray Research Inc.

Next, a full-scale evaluation of all technology candidates occurred. CMOS futures were explored in depth. GaAs technology was also evaluated, as were alternative ECL (bipolar) candidates. CMOS was viewed as the technology of the future, but that future was beyond the time frame necessary for product introduction.

The following paragraphs summarize key events that led to the decision to use CMOS technology.

- Moore's law (formulated by the great innovator and co-founder of Intel, Gordon Moore) stated that IC (CMOS) technology would double in performance and density every 18 months to two years. The actual Moore's law may have been stated somewhat differently, but this captured all the project cared about. To achieve this predicted growth, several things had to occur:

  - The die size would increase (more gates per manufactured chip).

  - Features on the chip (the widths and spacings of the metal interconnecting the devices, and the device dimensions themselves) would shrink every 18 months to two years. Shrinking these features had two results positive for ETA Systems' goals: increased performance and more gates per die.

  - The technology would gain popularity. This would mean that capital equipment would keep pace with the "law" and that applications would multiply, increasing volume, thus lowering cost and increasing performance.

- Key US Government agencies began a technology acceleration program based on CMOS technology (VHSIC, the Very High Speed Integrated Circuits program).

- Honeywell, one of the participants in the VHSIC program, held a technology luncheon symposium at which they presented an 11,000-gate CMOS development effort. Attendees from CDC were impressed with what was occurring. The chip was certainly larger than any that had been developed to date, and the performance was accelerated beyond what had been predicted for the 1988 time frame (the introduction date set for the ETA Systems Supercomputer).

- Logicians and architects determined that a minimum density of 15,000 gates per die would allow them to achieve a key objective: having the worst case register-to-register clock path reside within a single chip. Some additional explanation is required here. Each architecture has a method of applying computational instructions to problems. The number of gates connected in serial fashion between the input and output registers (and this is truly simplifying the problem) determines the clock period that can be allowed. For the ETA Systems Supercomputer, it was determined that a functional unit clock path could reside within the boundary of a chip if the chip provided 15,000 gates of logic to the designer.

- Research into technology experiments uncovered significant performance features of CMOS technology. First of all, the technology was functional across a wide range of voltages and temperatures, but its performance was significantly altered by both. The higher the operating voltage (within semiconductor constraints, of course), the higher the performance; unfortunately the power consumption, although significantly lower than that of any alternative technology, increased as the square of the operating voltage. The lower the operating temperature of CMOS, the higher the performance as well. This factor was studied all the way from 400 K down to 77 K, the boiling point of liquid Nitrogen (room temperature, for comparison, is approximately 300 K).
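The voltage and temperature behavior described above can be sketched with first-order relations. Only two figures come from the text: the square-law dependence of power on supply voltage, and the roughly 2x speedup at 77 K. The capacitance, frequency and voltage constants below are invented for illustration, not ETA data.

```python
# First-order CMOS scaling relations drawn from the text: dynamic power
# grows with the square of the supply voltage, and operation at 77 K
# roughly doubles speed versus room temperature. The load capacitance
# and clock frequency used here are illustrative assumptions.

def dynamic_power(c_load_farads, freq_hz, vdd_volts):
    """Dynamic CMOS power: P = C * V^2 * f."""
    return c_load_farads * vdd_volts ** 2 * freq_hz

def cryo_delay(room_delay_ns, speedup=2.0):
    """Gate delay at 77 K, using the ~2x improvement cited in the text."""
    return room_delay_ns / speedup

# Raising the supply from 3.3 V to 5 V at a fixed frequency:
p_low = dynamic_power(1e-12, 50e6, 3.3)
p_high = dynamic_power(1e-12, 50e6, 5.0)
print(round(p_high / p_low, 2))  # (5.0 / 3.3)^2, about 2.3x the power
print(cryo_delay(10.0))          # a 10 ns room-temperature path at 77 K
```

The square-law cost of voltage is exactly why cryogenic operation was attractive: it bought performance without the quadratic power penalty.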

So, let's summarize what was learned from this evaluation:

- IC chips currently (four years before the need for an ETA Systems product) had a capacity of 11,000 gates.

- These gates, when operated at liquid Nitrogen temperature, would perform at least two times faster than at room temperature (not yet validated at CDC).

- 15,000 usable gates were required per chip to meet the logic designers' chip boundary requirements.

- If Moore's law was applied to these parameters, it was possible to achieve both the gates-per-chip density and the performance goals within the time frame required (if the system operated in a liquid Nitrogen environment).

- There were at least two IC suppliers (those having contracts with the US government) pursuing CMOS as a high performance, high gate density technology.
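The feasibility argument in the list above is simple compounding. The 11,000-gate starting point, the 18-to-24-month doubling period, and the four-year window are the figures quoted in the text; the calculation itself is a quick sanity check, not a reconstruction of the team's actual projection.

```python
# Moore's-law projection used to argue CMOS feasibility: start from the
# 11,000-gate Honeywell chip and double every 18 to 24 months over the
# four-year window before the ETA product was needed.

def projected_gates(start_gates, years, doubling_period_years):
    """Gate capacity after compounding Moore's-law doublings."""
    return start_gates * 2 ** (years / doubling_period_years)

conservative = projected_gates(11_000, 4, 2.0)   # doubling every 2 years
aggressive = projected_gates(11_000, 4, 1.5)     # doubling every 18 months
# Either trajectory clears the ~20,000-gate requirement:
print(int(conservative), int(aggressive) > 20_000)
```

Even the conservative two-year doubling rate lands at 44,000 raw gates, comfortably above the 15,000-usable / ~20,000-capacity target discussed below.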

Computer Aided Design (CAD) tools were, during the 1980s, in their infancy compared to today's capabilities. To design, place cells within the matrix of gates provided on the IC chip, and route the interconnections of those cells, accurately to the Boolean design required by the logicians and within the clock period constraints, was a challenge. The challenge applied to board layout designs as well. Control Data Corporation (CDC) recognized the challenges and established a small but efficient and dedicated organization to address them.

The industry had established a metric that using CAD tools for gate or cell arrays required an additional 20% to 30% of gates. This meant that if the ETA Supercomputer required at least 15,000 usable gates to accomplish the necessary designs, an 18,000 to approximately 20,000-gate capacity was required. The technology organization set as its objective a design of 20,000 gates plus the circuitry necessary to self-test each array. Compared to the gate array in development at Honeywell, this was nearly 2 times the capacity (11,000 total gates versus 20,000 total gates plus self-test circuitry). The task was to convince Honeywell to project the next generation's size and layout rules and to accept an R&D effort that would allow CDC / ETA Systems to achieve its objectives. Honeywell, an innovative organization, took on the task with key requirements:

- ETA Systems would accept costs based on wafers processed, not functional chips. Honeywell would provide the processing data necessary to show that wafers were processed within process parameter specifications.

- ETA Systems would provide test equipment for wafer testing and the test parameters for chip acceptance prior to packaging.

- Both companies would share facilities and key resources and work as a single team: as "open kimono" a relationship as one could imagine during this dynamic period of complex process development within the IC industry.
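The capacity arithmetic behind the 20,000-gate objective can be checked in a couple of lines; the 20% to 30% routing overhead is the industry metric quoted above, and the 15,000-gate requirement comes from the logic designers.

```python
# Quick check of the gate-capacity arithmetic cited in the text:
# place-and-route CAD tools of the era consumed an extra 20-30% of raw
# gates, so the raw capacity must exceed the usable-gate requirement.

def required_capacity(usable_gates, overhead):
    """Raw gates needed so that `usable_gates` survive routing overhead."""
    return usable_gates * (1 + overhead)

low = required_capacity(15_000, 0.20)   # 18,000 gates
high = required_capacity(15_000, 0.30)  # 19,500, i.e. roughly 20,000
print(int(low), int(high))
```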


Self-test circuitry was designed into the periphery of the basic cell array. The area consumed by this custom set of pseudo-randomly generated logic and registers was less than 15% of the total chip area. When the logic design team first heard of this area "waste" they lobbied for it to be removed in favor of more logic gates for function designs. Fortunately this request was not honored. The circuitry validated ICs both at the supplier in wafer form and at ETA Systems in packaged form, and the same circuitry was used in manufacturing checkout to detect board opens and shorts on assemblies at both room temperature and cryogenic temperature; it proved to be well worth the "wasted" area. Small, relatively inexpensive testing systems were designed by ETA Systems and provided to the supplier. The operands for initializing the pseudo-random logic were also supplied for each design (chip type).
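The pseudo-random pattern generation at the heart of such built-in self-test is classically done with a linear-feedback shift register (LFSR). The 4-bit Fibonacci LFSR below is a generic illustration of the technique, not ETA's actual circuit; the seed value plays the role of the "operands for initialization" mentioned above.

```python
# Generic 4-bit Fibonacci LFSR: the classic pseudo-random pattern
# generator used in built-in self-test. Taps and width are illustrative.

def lfsr_sequence(seed, taps, length):
    """Generate `length` successive states of a 4-bit Fibonacci LFSR."""
    state = seed
    out = []
    for _ in range(length):
        out.append(state)
        feedback = 0
        for t in taps:                      # XOR the tapped bits
            feedback ^= (state >> t) & 1
        state = ((state << 1) | feedback) & 0xF   # shift left, keep 4 bits
    return out

# Taps at bits 3 and 2 give a maximal-length 15-state cycle for 4 bits:
seq = lfsr_sequence(seed=0b0001, taps=(3, 2), length=15)
print(len(set(seq)))  # 15 distinct nonzero states before the cycle repeats
```

A maximal-length LFSR visits every nonzero state exactly once, which is why a known seed deterministically reproduces the same test pattern on the wafer tester and on the packaged-chip tester.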

Chip types (array design options) were carefully managed so as not to proliferate chip types in the system. This was a new constraint placed on the logic designers, and it was dealt with most professionally once understood. The resulting chip total for the CPU (processing unit) was fewer than 150, while the number of chip types, including clock chips and all logic design chips, was fewer than 20, as best recalled.

During the development cycle of the ETA Systems Supercomputer, Honeywell moved its manufacturing capability from a local Minneapolis-St. Paul facility to a state-of-the-art manufacturing facility in Colorado Springs, CO. The transition was very transparent. To accomplish this, team members from both companies acted as one in all decisions addressing the scheduling and timing of needs for the various chips, testing, packaging, etc. The open book relationship was very beneficial to both companies.

One feature incorporated into the chip design allowed critical next generation processing parameters to be applied to the existing layout. Although this would not optimize all the features of a new process (not every parameter was considered), key performance enhancements could be, and were, added to the present design. A key such feature was gate length, which was improved transparently to the physical chip layout and offered appreciable performance enhancement.

Chip design summary: the decision to utilize CMOS technology for the ETA Systems Supercomputer in the 1985-1988 time frame (premature by all industry metrics) resulted in the following additional "technology kit" decisions:

- Addition of chip self-test. This feature established functionality at wafer test, and functionality and performance sorting at ETA Systems.

- Computer layout tools that validated logic prior to chip release for fabrication.

- The requirement to operate the chip at 77 K, in liquid Nitrogen.

- Packaging, interconnect and assembly decisions based on the challenges of liquid Nitrogen operation.

- Remote testing of the CPU because of the challenges of liquid Nitrogen operation.

- Logic design partitioning challenges: designing within 15,000-gate chip boundaries and with a minimum of IC chip types.


Printed Circuit Board Design Selection

In the 1980s, the time frame of the ETA Systems Supercomputer development, printed circuit boards had maximum dimensions of approximately one square foot, and the total number of layers was fewer than 20. (Layers provide power and ground stability, interconnect capability for the circuits attached to the board, and inputs and outputs to and from the board. If the layers are allocated properly, approximately 50% are used for interconnect and the remainder for power and ground. Positioning of the power and ground layers also gives the interconnect layers transmission line characteristics, ensuring signal integrity throughout the board.) A state-of-the-art printed circuit board of this period, then, was approximately one square foot of active circuitry with 20 layers or fewer.

It was determined that a maximum of 150 chips would be required for the ETA Systems Supercomputer CPU. Packaging the ICs and interconnecting the chips on a PC board with minimum spacing between chips (some spacing was required to route interconnects to all of the necessary layers) resulted in a 1.2 inch by 1.2 inch footprint per chip. Doing the simple math results in a PC board of roughly 220 square inches at minimum. The number of layers required to interconnect the 150 chips and the necessary input and output at the board periphery was determined to be 45. Examining the design parameters of the board layers in more depth, and ensuring transmission line features for signal integrity, set the board thickness at slightly greater than 0.25 inches. This thickness was approximately three times that of the high-end printed circuit boards produced in this time frame. With a board area greater than 1.5 times what could then be produced, a thickness 300% of what was produced, and 2.5 times the number of layers, it was clear that the printed circuit board industry was not ready for the ETA Systems design!

The design had another limitation. A key factor when designing PC boards is ensuring proper connection of the layers, i.e., connecting the chip pins to the board, to the proper interconnect layer within the board, and back to the proper receiving chip. These connections are made by drilling holes through the layers and plating the hole walls with copper for conduction; they are called plated through holes, or PTH. A key parameter for ensuring that plating actually occurs in these holes is the depth-to-diameter ratio. The industry limit in this period (not much better today) was 6:1, i.e., the thickness of the board could be no more than 6 times the diameter of the hole. This ratio would dominate the size of the board: if it had been honored, the board area would have increased by more than 9 times. Talk about piling on! Since the design was deemed not feasible, issues like cost and time to fabricate the board were not even addressed.
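A quick check of the sizing arithmetic above; the chip count, footprint, thickness and aspect-ratio figures are those quoted in the text, and the 20:1 ratio is the one the "Push-Pull" additive process later achieved.

```python
# Checking the CPU-board sizing arithmetic described in the text:
# 150 chips at a 1.2 x 1.2 inch footprint, a 0.26 inch thick board,
# and the industry 6:1 plated-through-hole (PTH) aspect ratio.

N_CHIPS = 150
FOOTPRINT_IN = 1.2           # inches per side of each chip site
BOARD_THICKNESS_IN = 0.26
INDUSTRY_PTH_RATIO = 6       # depth : hole diameter, industry limit
ACHIEVED_PTH_RATIO = 20      # what the additive "Push-Pull" process reached

min_area = N_CHIPS * FOOTPRINT_IN ** 2        # ~216 sq in, quoted as ~220
hole_industry = BOARD_THICKNESS_IN / INDUSTRY_PTH_RATIO   # ~0.043 in minimum
hole_achieved = BOARD_THICKNESS_IN / ACHIEVED_PTH_RATIO   # ~0.013 in minimum

print(round(min_area), round(hole_industry, 3), round(hole_achieved, 3))
```

The industry ratio would have forced holes more than three times larger in diameter, and the hole pitch, not the chip footprint, would then have driven the board area, which is the 9x blow-up the text describes.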

Nestled into the design laboratory of Control Data Corporation was a small but very innovative printed circuit board prototype facility. The leader of this group kept his eyes and ears open for alternatives to conventional board fabrication techniques, and had displayed innovation (evolutionary in nature) in earlier generations: embedded termination resistors within layers was one example, finer features than the industry was producing another, and higher plated through hole (PTH) ratios than the industry a third.

New technologies in printed circuit boards were few and far between. The industry was set in its ways of subtractive etching of circuit layers (removing unwanted copper from pre-clad copper layers), conventional wet etch processes, and relatively simple assembly, i.e., lamination of layers under pressure. One inventor, Mr. Peter P. Pellegrino, arrived on the scene to discuss innovative, revolutionary and proven PC board processing. At first it appeared too good to be true: board size relatively unconstrained, aspect ratios exceeding 20:1 for PTH, and an additive process that permitted finer lines to be fabricated on individual layers. The lines were also embedded into the laminate, offering the opportunity for higher yield with reduced feature sizes. An additional benefit of additive plating was a reduction in waste and water usage.

A special plating cell was also introduced that permitted uniform deep hole plating by forcing plating fluid through each of the thousands of PTH. The process, called "Push-Pull", also accelerated the plating cycle by over an order of magnitude, reducing cost.

A small plating cell was incorporated into the prototype facility and experiments were conducted. The experiments were thorough and challenging, since no one in the industry could approach the lofty objectives of the ETA Systems Supercomputer CPU board, nor the lofty claims of the inventor. The results were simply outstanding. From these results, and a commitment to fabricate a larger line of plating insert cells, the 45 layer 15" x 24" CPU board became a finalized goal of ETA Systems.

Later, when manufacturing of the systems became viable, a production capacity was developed. Hundreds of these boards were fabricated from 1987 through early 1989. The yield of finished boards was nearly perfect: only one finished board was scrapped.

To this day (2007) few realize what a monumental accomplishment this was, and still is.

To accommodate routing and designing for minimum distance between IC chips, CAD tools were developed and the first use of diagonally routed layers was introduced. Prior to this, only x-y layers were permitted with manual and/or automatic (CAD) tools. This enhancement permitted the timing constraints between chips to be realized.

The final board had the following noteworthy characteristics:

- Board size: 15 inches by 22 inches by 0.26 inches.

- PTH aspect ratio of approximately 20:1, with a plating time of less than 20 minutes.

- 45 total layers per CPU panel.

- 150 IC chip locations (fewer were used in the final design).

- More than 30,000 plated through holes (PTH) used for interconnect.

In 2007 this board development and manufacturing still stands out as one of the major technology developments by ETA Systems.

Packaging

The key challenge in packaging the ETA Supercomputer processing unit was the cryogenic chamber for the processor. The cryostat containing the processor (two processor units) was a conventional (and quite heavy) circular design with a vacuum chamber between the outside environment and the inner environment. Liquid Nitrogen entered at the bottom of the chamber and gaseous Nitrogen escaped near the top of the unit. The piping carrying the Nitrogen to and from the regeneration unit was also temperature protected with vacuum lines. It was felt that a lighter and equally efficient chamber could be designed if time permitted, but the selection was conservative, both to accommodate the schedule and to familiarize the team with the challenges of cryogenics.

The compressor unit was a conventional liquid Nitrogen system (very large and bulky) of the type used to generate liquid Nitrogen for the commercial market. Thought was given to eliminating the closed regeneration system entirely and instead purchasing liquid Nitrogen in tanks, periodically refilled, as is done in the IC and other industries using liquid Nitrogen. This was discarded for the initial design, since several customer sites could not easily accommodate external access to liquid Nitrogen tanks. It was to be an option for future systems and for those customers that could easily accommodate it.

The final design was therefore a closed, recycled liquid Nitrogen system with the compressor located remotely, much like the Freon compressors that many Supercomputer customers were already accommodating.

The design challenge was at the surface where the processing boards were inserted (it looked much like a two-slot toaster). The seal had to accommodate the transmission lines connecting to the external, room temperature memory and I/O subsystems. A printed circuit board was designed to connect the processor to the outside world, and heaters were applied to the surface to prevent icing at the cryostat surface.

The third challenge was providing reliable soldering of the circuitry to the board amidst the severe temperature swing that the solder joints would be subjected to (greater than 250 degrees). Studies at the National Bureau of Standards provided input that the temperature should be profiled in a precise sequence as the board was cooled and heated. In addition, care had to be taken not to remove a board before it had warmed to room temperature, to avoid the condensation that would otherwise occur. The result was a 20-minute cycle to remove or insert a board, with a carefully prescribed sequence of temperature lowering and raising for both operations.

At the time of the unfortunate termination of ETA Systems, a more refined, lower cost and lower weight design was on the drawing boards. Although the cryostat and associated cooling were costly, analysis clearly showed that, for the resulting performance, the cost was less than that of any bipolar IC system designed at the time.

Air Cooled System

As stated earlier in the document, an air-cooled processor would operate considerably slower (2x slower) in a normal, room temperature environment. By sorting the devices for performance at incoming inspection, ETA Systems allowed a performance spread of roughly three times to be realized across the product line. Only the highest performance devices were reserved for the cryogenically cooled system. The remaining parts were then sorted into two categories for room temperature operation; the differential was a 4 nanosecond clock period between the two room temperature systems, and 17 nanoseconds (24 ns versus 7 ns) across the total product set.
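The binning scheme can be sketched as follows. The clock periods (7, 20 and 24 ns) come from the text; the gate-delay thresholds used to assign parts to bins are invented for illustration, not ETA's actual test limits.

```python
# Sketch of the performance-binning scheme described in the text: parts
# are speed-sorted at incoming inspection, the fastest reserved for the
# 7 ns cryogenic system, the rest split into two air-cooled bins whose
# clock periods differ by 4 ns.

CLOCK_PERIODS_NS = {"cryogenic": 7, "air_fast": 20, "air_slow": 24}

def assign_bin(gate_delay_ns):
    """Toy binning rule: faster measured parts go to faster systems."""
    if gate_delay_ns < 1.0:       # illustrative threshold
        return "cryogenic"
    elif gate_delay_ns < 1.5:     # illustrative threshold
        return "air_fast"
    return "air_slow"

spread = CLOCK_PERIODS_NS["air_slow"] - CLOCK_PERIODS_NS["cryogenic"]
print(spread)              # 17 ns differential across the product set
print(assign_bin(0.8))     # a fast part lands in the cryogenic bin
```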

Air was forced onto the processor chips using a plenum designed to cover each chip. Holes were designed into the plenum such that each operating chip would reach the same operating temperature. Since power consumption varied significantly among the part types, designing the appropriate number of holes above each chip location provided custom cooling, and the plenum could then be molded for mass production of the processing unit. Large volume cooling fans were designed for the system as well. Cost was the focus for the air-cooled systems, since the price tag was below $1M.
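The hole-count idea above amounts to a simple proportional allocation. The chip names, power figures and watts-per-hole constant below are all invented for illustration; only the principle (airflow scaled to per-chip power for equal temperatures) comes from the text.

```python
# Sketch of the plenum design principle: allocate cooling holes over
# each chip site in proportion to that chip's power dissipation, so all
# chips run at roughly the same temperature. All numbers are invented.

def holes_per_chip(power_w, watts_per_hole=0.5):
    """Cooling holes proportional to chip power (always at least one)."""
    return max(1, round(power_w / watts_per_hole))

chip_powers_w = {"chip_A": 4.0, "chip_B": 6.0, "chip_C": 2.0}
plenum = {name: holes_per_chip(p) for name, p in chip_powers_w.items()}
print(plenum)   # hotter chips get proportionally more holes
```

Once the per-site hole counts are fixed, the whole pattern can be cast into a single molded plenum, which is the mass-production point the text makes.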

Storage

Three-dimensional stacks were designed for the memory of the ETA Systems Supercomputer, for both the static (high performance) and dynamic (high density, lower performance) memories. These unique designs provided the highest density and optimum performance from the standard memory devices used. The ability to upgrade to future generations of memory (higher capacity ICs) was built into the design as well. The design worked well, and stacking became commonplace in the computer industry in later designs, eventually eliminating the chip package entirely.


Summary

The design of the ETA Systems Supercomputer hardware had many unique features. These brief pages highlight some of them.

It would be remiss not to briefly discuss the "team" concept used to design the hardware. By locating the CAD, packaging, memory, circuit and power expertise in close proximity, and holding concise project reviews at all levels at periodic and timely phases, everyone was kept abreast of each other's progress and challenges. This permitted changes to be made to the necessary designs, in a timely fashion, to properly accommodate the challenges and opportunities. Hardware was demonstrated on or near schedule despite the innovations required in each aspect of the design. The team was truly a "team". One missing link was logic design: these folks were separate, on another floor of the ETA Systems facility. It was strongly suggested, and accepted for future designs, that the logic team be part of this common organization.

Clearly, effective communications at all levels of the organization were key to this hardware design success.
