Oral-History:Robert M. Gray (1998)

== About Robert M. Gray ==

Born in San Diego in 1943, Robert M. Gray possessed an early interest in digital communications.

== Interview ==

Interview: Robert M. Gray<br>Interviewer: Frederik Nebeker<br>Date: 5 October 1998<br>Place: Chicago, Illinois

<br>

'''Nebeker:'''

I’d like to begin by asking you where and when you were born and just a little bit about your family.<br>

<br>

'''Gray:'''


I was born in 1943 in San Diego, actually in the North Island Naval Air Base Hospital that was hit by a plane six months later and no longer exists. My father was a lifetime naval officer from New England and my mother was a housewife from the Deep South.<br>

<br>

'''Nebeker:'''


You said your father was involved with radio in World War I?<br>

<br>

'''Gray:'''


He went to the Naval Academy and got out in 1910, went into submarines, and my guess is probably after he was moving from submarines to other ships he became involved with radio and ended up temporarily being a radio instructor, which meant he had to learn the material and then teach others. Radios were brand new in that time, as were launches other than small sailboats to get to some of the major ships.<br>

<br>

'''Nebeker:'''


Did that mean that you had an interest in radio as a kid?<br>

<br>

'''Gray:'''


I first became interested in radio when I was in high school and I joined a radio club. Then I got an amateur radio license. I was never terribly active. What intrigued me more was that my brother Steen, who had completed his undergraduate work at MIT, was teaching a course on computers at San Diego State. I would guess this is 1957-58, and he let me sit in. So I went and took a course in basic logic design and Boolean algebra. I got a little computer that I ordered called Geniac, and it came complete with a copy of Claude Shannon’s paper on Boolean algebra and applications to computing, and I did a science club project on logic using diodes. I recollect doing various simple logic operations with simple diode circuits. And that was probably what first really intrigued me. So I was pretty much digital from my origins rather than analog.<br>

<br>

'''Nebeker:'''


Where did you grow up?<br>

<br>

'''Gray:'''


I grew up in Coronado, California. That was a place a lot of naval officers retired to.<br>

<br>

'''Nebeker:'''


Beautiful place.<br>

<br>

'''Gray:'''


And it had I think a good high school, and I was able to get out and go take some courses at San Diego State. And now that I remember more carefully, my brother was in between MIT and going to Caltech for a Ph.D. <br>

<br>

'''Nebeker:'''


And you went to MIT?<br>

<br>

'''Gray:'''


I went to MIT as an undergraduate, essentially because both of my brothers had, and so it seemed very natural to follow them there, which I did in 1960.<br>

<br>

'''Nebeker:'''


And that was for an EE degree?<br>

<br>

'''Gray:'''


I knew I was going to go into EE fairly early, again probably because at the time I idolized my brothers and they had. But since they had done so well in EE, the only place I could outshine them was in humanities. So I ended up taking more humanities courses.<br>

<br>

'''Nebeker:'''


And I see you got also a Master’s degree at MIT.<br>

<br>

'''Gray:'''


I stayed there for a total of six years. Like my classmate Larry Rabiner, I was part of Course 6A, a cooperative program. He went to Bell Labs, I went to the U.S. Naval Ordnance Lab. Again, working basically on digital things, digital communications. Also passive sonar, which was more analog at the time. <br>

<br>

'''Nebeker:'''


Where is that Ordnance Lab?<br>

<br>

'''Gray:'''


It’s outside of Washington, D.C., in White Oak/Silver Spring. So I was in Washington, D.C. several summers and other quarters during the ‘60s, and I got to hear a speech by Jack Kennedy and by his brother. I was a Washington intern before it was a dirty word, because all students working for the summer were considered interns. And so we had a lot of shows put on for us, which was really fascinating.<br>

<br>

'''Nebeker:'''


I’ve heard very good things about that cooperative program. Did you like it?<br>

<br>

'''Gray:'''


I loved it. I think it was an excellent idea. I would love to see something more like that at Stanford. I think it was good to get away from school. In our case we had classes taught at the plant, so we kept up a certain academic part, but it was very different doing engineering in the real world from just taking classes and learning the theory. And I think it was an absolutely essential part of my upbringing. I probably never would have gone to graduate school if I had not had a taste of real life engineering beforehand and realized you could do much niftier stuff in a lot of fields if you simply had more schooling.<br>

<br>

'''Nebeker:'''


Were there any professors there at MIT that were particularly influential?<br>

<br>

'''Gray:'''


Yes. The first one that comes to mind is Jim Bruce, who actually wrote one of the unrecognized classics in quantization. After I guess he had done his research, which included some work with Bose on the early development of some of the speaker systems that were subsequently so well commercialized, he essentially got involved with student affairs and became more of an administrator at MIT for many years. But early on he was active in research in signal processing and he was also my original undergraduate advisor. There was also Paul Gray. Not the Berkeley one, but the MIT one. No relation. But, he was in charge of the co-op program for the Naval Ordnance Lab when I first got involved in it. When I decided to do some things differently, including taking a half year off to travel, breaking my education in the middle, he was a profound influence on me. Bill Siebert was one of the great contributors to the early development of signal processing, especially in training lots of people like Larry Rabiner and Al Oppenheim. I should also add Al Drake, who taught probability to engineers.<br>

<br>

'''Nebeker:'''


I know that you’ve had a lot to do with statistics in signal processing. Were you already interested in signal processing in those days?<br>

<br>

'''Gray:'''


I guess I didn’t know the words. That became a title I think probably in the ‘70s to cover at least a substantial portion of what I did.<br>

<br>

'''Nebeker:'''


Wasn’t the first course there in the late ‘60s somewhere at MIT?<br>

<br>

'''Gray:'''


Oppenheim and Schafer?<br>

<br>

'''Nebeker:'''


Yes.<br>

<br>

'''Gray:'''


I don’t know if that actually happened during the ‘60s. I didn’t take that course. The one I took was from Siebert, which was circuits, signals and systems. Their course could have started then, but I left MIT in ‘66, so my guess is that it really started after I had left.<br>

<br>

'''Nebeker:'''


Okay. What did you do on completion of the cooperative program, the master’s?<br>

<br>

'''Gray:'''


Well, for the master’s degree I worked mostly on sequential decoding, which I guess I’d call coding and information theory rather than signal processing, and I then decided to go elsewhere for graduate school, because my advisor, Irwin Jacobs, now well known for lots of other things, had suggested some problems to work on, and unfortunately he went to a school that did not yet have a graduate school. UCSD was brand new. So I ended up going to USC, which allowed him to be on my dissertation committee, and spent three years there. And Jacobs basically found me a job at the Jet Propulsion Lab, where I worked during two summers. So that got me more involved in communications, phase-locked loops, more with sequential decoding, and the research led to work, really more information theory than signal processing, on source coding and rate distortion theory, the Shannon theory. <br>

<br>

'''Nebeker:'''


Why did you choose USC?<br>

<br>

'''Gray:'''


A variety of reasons. For one, I was able to teach a course in probability and random processes rather than being assigned a lab. I always preferred computers and theory to hardware. I just felt more comfortable with USC when I visited it, and I was able to continue to work with Jacobs, which at the time was something I very much wanted to do. I guess the other thing was Zohrab Kaprielian, who was then the EE chair; USC was in the process of expanding and improving its engineering school and department. He had actively recruited an outstanding faculty, and the year I was looking he was actively recruiting students. So he went to all of those schools he considered really good and basically telephoned a group of students he selected based on their admissions applications, and quite honestly that was extremely flattering. And it also provided a unique situation of bringing together a whole bunch of good competition from good schools, and it was enjoyable. And it was a return to California.<br>

<br>

'''Nebeker:'''


What professors did you work with at USC?<br>

<br>

'''Gray:'''


My Ph.D. supervisor ended up being Bob Schultz, and his area was fairly distinct from mine. He was interested in communications and I think mostly synchronization codes in those days. On the other hand, he was willing to learn about source coding, which I was learning about, having been suggested by Jacobs. And I worked fairly well with him. It was an enjoyable collaboration, but technically I ended up working more with Lloyd Welch, who helped me out on some of the mathematical points. And as I had most of the EE courses from MIT already, I spent most of my course work in the math department. And there Tom Pitcher comes to mind, and Adriano Garcia, who was at JPL as well. And I spent a lot of time learning mathematics that is very important to signal processing.<br>

<br>

'''Nebeker:'''


Wasn’t Richard Bellman there at the time?<br>

<br>

'''Gray:'''


He was there, and I met him, but I didn’t work with him. He was certainly a presence there, but I think maybe the stronger influence was Solomon Golomb, and, not at SC but at JPL, Gus Solomon, who was extremely entertaining. He was I think one of the funniest of all geniuses I have ever met. And he also had an impact. Bill Lindsay was there, from whom I learned a lot about phase-locked loops, which I worked with for a while.<br>

<br>

'''Nebeker:'''


I’ve come across a number of people like you in signal processing who have been very much interested in mathematics. Manfred Schroeder says he’s been a crypto-mathematician, and Itakura said, you know, it’s always been his hobby to read mathematics. It sounds like you recognized very early on that math would be valuable?<br>

<br>

'''Gray:'''


Yes, for a variety of reasons. I think partially MIT, at least in those days, was not that great at teaching math. Engineers were sort of considered second class citizens, and except for those few like Rabiner who got into special courses, the math we had was dull and more mechanical. At USC not only was there all this time to take classes, there was also this attitude, perceived by this bunch of engineers that Kaprielian brought in, that the mathematicians looked down upon us. So it became a matter of honor as engineers to go in and beat them at their own game.<br>

<br>

'''Nebeker:'''


I see.<br>

<br>

'''Gray:'''


So we spent a great deal of time, and it was good stuff.<br>

<br>

'''Nebeker:'''


And what was your dissertation?<br>

<br>

'''Gray:'''


It dealt with rate distortion theory, or source coding, for autoregressive processes. This is a particular model of sources that has proved particularly useful in speech, and in my case it was trying to evaluate Shannon functions for interesting sources with memory where they hadn’t been evaluated before. And it involved in particular a branch of math that I learned about from the mathematics professors that I had, called Toeplitz forms. And so it was a case where I was able to use some math that I was familiar with to get a new result. And I won’t say it was an application, because it was still Shannon theory, but it was at least one step closer.<br>

<br>

'''Nebeker:'''


And has it proved useful in signal processing?<br>

<br>

'''Gray:'''


Well, this particular result I would say provided some insight, and it was also at a time when Shannon source coding theory was getting increasing attention and importance. So I think it had a minor impact simply because the area was blossoming at the time, and it was at least a result that was dealing with models of information sources that were less trivial than the previous ones that had been treated. And it had a very interesting form, suggesting there were reasons for this unusual behavior, but I don’t think to this day anybody has really understood or generalized the results much past that. So it was I think of primarily theoretical import, and it at least disproved the claim that the Shannon results could only be obtained for memoryless sources, which would exclude interesting things like speech and images.<br>

<br>

'''Nebeker:'''


Maybe I could at this point get you to comment on how close the interaction has been between information theory or coding theory and signal processing.<br>

<br>

'''Gray:'''


I think it’s been pretty significant. I think it is not a coincidence that the two trace their birthdays to almost the same period. I think what we now call signal processing was profoundly influenced by Norbert Wiener, who was very much a contemporary of Claude Shannon. They both talked about a lot of things that were similar. I guess Shannon’s viewpoint was maybe more digital and Wiener’s was more analog. I think a lot of people have made the bridges between the two areas. Tom Kailath and Dave Forney come to mind in particular, as they received their start in information theory but ended up doing things that are easily interpreted as signal processing. And a lot of information theory deals with algorithms, and I think classically signal processing has not always focused on how you go algorithmically from the continuous world to the discrete. Information theory has focused primarily on that frontier. And so I think with a lot of people nowadays it’s hard to distinguish one from the other. Traditionally information theory may have dealt more with performance bounds; how well you can do in the ideal situation.<br>

<br>

'''Nebeker:'''


Right. The higher, the more abstract the theory.<br>

<br>

'''Gray:'''


And signal processing has been a bit more concerned with the real world. But at least in things that I have dealt with, it’s usually been insight that comes from the information theory world. When you simply try and apply it to the specifics of a signal processing problem, speech or images, that forces you to find a way of accomplishing that goal, and then the information theory results give you some good guesses and insight and maybe some bounds to compare with to see how well you are doing.<br>

<br>

'''Nebeker:'''


How does coding theory fit in this picture? Is it part of information theory?<br>

<br>

'''Gray:'''


Coding theory. Information theory can claim much more than signal processing can claim. These become practical issues when you have separate IEEE societies fighting over territory for transactions. But coding, historically its main pushes came from the promises of information theory that you could do things vastly better than people have done by just doing ad hoc things, and thinking about things like the first error correcting codes. True, repetition codes probably pre-date Shannon, but the first real parity check codes, the first lossless source codes like Huffman coding. These all came out of the information theory community, mostly out of Bell Labs or MIT in the early days of the late 40s and 50s. Much of the development of coding theory until really this decade and maybe the end of the last one came out of information theory groups or organizations.<br>

<br>

'''Nebeker:'''


Can coding theory be very abstract and not of much value to the signal processing engineer who has to figure out some way of coding something? Are there people doing coding theory who are not the people actually figuring out how to code images or anything else?<br>

<br>

'''Gray:'''


There may be a few people doing coding theory who are not connected to applications through the real world. But I think by and large, as a field, if you look at the more famous of the code developers, people like Reed and Solomon, they were both either mathematicians or very mathematical engineers. Reed-Solomon codes are in virtually every CD. So the thing they developed is out there, and it’s everywhere. There have been a lot of cases where the mathematicians have come up with improved codes, and if they really were improved they got adopted and built. I don’t think it’s any different from signal processing history and Norbert Wiener, who nobody understood at the start, but as they understood it they realized how useful it was and built things like matched filters and estimators and predictors.

<br>

'''Nebeker:'''

I’ve heard the complaint though that even within the signal processing, but applied here to this information theory/coding theory, that often academics you know develop a field in ways that maybe become further and further from the world of applications, and their work is less and less relevant to practical work.<br>

<br>

'''Gray:'''


I won’t deny that, but I’d say they are digging their own grave when they do that, and they are not making their students terribly employable. I guess to show that there is the counter, I’d cite my own MS supervisor Irwin Jacobs, who went on to found Linkabit and then Qualcomm, and he and his wife recently gave something like $17 million to the UCSD School of Engineering, which is now the Jacobs School of Engineering. And what he was doing when I was there was sequential decoding. And this was considered abstract by a lot of people. However at JPL, they realized that the only way to get pictures reliably from deep space to here was by sensible coding. Jacobs and Andy Viterbi commercialized these ideas, ideas like the Viterbi algorithm, which is essentially dynamic programming, which again a lot of people thought was abstract. So I think the people who are the most famous are the people who come up with the algorithms that really solve problems. And then sometimes, not always, they are the people who realize how good those algorithms are. The Viterbi algorithm, for example, was invented because he could not understand the sequential decoding algorithm of Wozencraft. So he came up with something simpler that he could understand and prove results for. And then it was other people, in particular Jim Omura and Dave Forney, who figured out that what Viterbi was doing was actually optimal in a certain sense and practically had many advantages over the way it had been done before. So these were very bright people doing somewhat different things, but none of them got lost in abstraction and missed the connection to the real world.<br>
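
To make the “essentially dynamic programming” remark concrete, here is a minimal illustrative sketch of Viterbi decoding in Python. The rate-1/2, constraint-length-3 convolutional code with generators (7, 5), the test message, and the single injected bit error are arbitrary choices for this example, not details taken from the interview.

<syntaxhighlight lang="python">
def conv_encode(bits, state=(0, 0)):
    """Encode with a toy (7, 5) rate-1/2 convolutional code; state = last two input bits."""
    out = []
    s1, s0 = state
    for u in bits:
        out += [u ^ s1 ^ s0, u ^ s0]   # generator polynomials 111 and 101
        s1, s0 = u, s1
    return out


def viterbi_decode(received, n_bits):
    """Dynamic programming over the 4-state trellis with a Hamming-distance metric."""
    INF = float("inf")
    metrics = {(0, 0): 0, (0, 1): INF, (1, 0): INF, (1, 1): INF}
    paths = {s: [] for s in metrics}
    for t in range(n_bits):
        r = received[2 * t: 2 * t + 2]
        new_metrics = {s: INF for s in metrics}
        new_paths = {s: [] for s in metrics}
        for (s1, s0), m in metrics.items():
            if m == INF:
                continue
            for u in (0, 1):
                expected = [u ^ s1 ^ s0, u ^ s0]
                d = m + sum(a != b for a, b in zip(expected, r))
                nxt = (u, s1)
                if d < new_metrics[nxt]:
                    new_metrics[nxt] = d
                    new_paths[nxt] = paths[(s1, s0)] + [u]
        metrics, paths = new_metrics, new_paths
    best = min(metrics, key=metrics.get)
    return paths[best]


message = [1, 0, 1, 1, 0, 0, 1, 0]
codeword = conv_encode(message)
codeword[3] ^= 1                                            # flip one bit to simulate noise
print(viterbi_decode(codeword, len(message)) == message)    # True: the error is corrected
</syntaxhighlight>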

<br>

'''Nebeker:'''


But it might be that these are the famous people whose result did have this great utility. In your experience with the field, does it seem that there are many people of a different sort who are sort of going off in abstraction directions?<br>

<br>

'''Gray:'''


I don’t think there are many. Again, I think those who do that, most of them won’t last unless they’re really good at being abstract. Engineering departments pay better than math departments, and it’s pretty hard, if you have an engineering degree, to get a job in the math department even, if that’s what you want. At least I really like having the time just to fool around with the math and an idea that might or might not pay off. The danger is if that’s all you do and if that’s all you have your students do, you’re cloning yourself, which is a bad idea, creating your own competition, and you’re making it so that they’re not going to have an easy time fitting in. I think every student who works in compression for example should learn information theory, but I don’t encourage them to become professional Shannon theorists, unless it’s as a sideline. Because if they go into industry, or even into a university, it’s going to be JPEG and MPEG and ADPCM that they’re going to have to teach. And those are the tools they are going to have to give to most students. So they have to balance.<br>

<br>

'''Nebeker:'''


And it sounds like in your view it’s not a great problem, that in most cases today there is good connection between them.<br>

<br>

'''Gray:'''


I don’t think so. I think you could have gotten away with that maybe 30 years ago. I think it’s simply harder to get away with now. Funding is tougher. The funding agencies usually have to justify giving you money. Just because you’re working on a really fascinating problem isn’t enough. It’s got to at least hold promise for being able to do things better. And you can’t just say, “If I am successful, it’s going to improve the field.” You have to provide a convincing argument that if you can do this in fact it will have an impact on doing things better.<br>

<br>

'''Nebeker:'''


Now you’ve been in both the information theory community and the signal processing community. Do you see that as like a continuum that you’re moving in or are there distinct communities there?<br>

<br>

'''Gray:'''


I think they’re distinct communities, and there’s an overlap. My evolution is partially historical, maybe even largely historical, and most of my early career was involved with the Information Theory Society as well as with information theory. My change really came probably early ‘80s. As I changed technically to focus more and more on first speech and then later images, I didn’t drop information theory entirely, but I spent more time with people doing signal processing and got more involved with the Signal Processing Society.<br>

<br>

'''Nebeker:'''


Why don’t we go back to your dissertation? And I don’t know, maybe we’d finished that. But you said you then got a job at JPL?<br>

<br>

'''Gray:'''


That was while I was a student. I went straight out of USC into Stanford. Basically Tom Cover was a professor at Stanford who had been a roommate of Bob Schultz, my dissertation supervisor, so I found out about the position early and went up and interviewed and spent most of my time talking with Tom Kailath and Tom Cover. Kailath was already becoming as much signal processing as information theory. Cover was very definitely information theory. I had two offers, and what sent me to Stanford was that if I went to the other offer I would be sort of in the shadow of somebody doing source coding and compression, and if I went to Stanford I was on my own. And I guess Stanford offered me a little bit more money, but it was really more the fact that Stanford would be, I think, more of a challenge, and I liked the idea of staying in California.<br>

<br>

'''Nebeker:'''


Do you want to say what the other offer was?<br>

<br>

'''Gray:'''


It was Cornell. My dissertation ended up having a very strong overlap with a dissertation from about a year earlier, that of Toby Berger. And so I knew of him and he knew of me, and the similarity of our work had led to an offer from Cornell. And in fact after I went to Stanford we spent either the first or second summer visiting Cornell. And I liked it there, but I liked Stanford better.<br>

<br>

'''Nebeker:'''


I’m curious about the JPL job. Would you tell me about that?<br>

<br>

'''Gray:'''


Well that was, as I mentioned, for two summers. I loved JPL. I didn’t like living in the Pasadena/Altadena area. So I ended up making a special deal whereby I could work long days four days a week and then spend three days out on the coast, and I worked a shared housing arrangement so that I lived on somebody’s couch four days a week. He lived on our couch the other three. And it was a very good group. I was working with Bob Tausworthe. As I mentioned, my first summer there was also Gus Solomon’s first period there. Andy Viterbi was there and so was Bill Lindsay. It was in the telecommunications area, and I was working on analysis of tracking loops, which was almost control theory, and on sequential decoding and on phase-locked loops. It would have been a great place to work except you could not see the mountains across the street because of the smog. So I had no desire to stay there.<br>

<br>

'''Nebeker:'''


But that again gave you some contact with actual systems.<br>

<br>

'''Gray:'''


Yes, I did an analysis. I think it was the Mariner V tracking loop.<br>

<br>

'''Nebeker:'''


I see that here in your list of publications. There's another publication on analysis of the effect of input noise on a VCO.<br>

<br>

'''Gray:'''


There is a bizarre kind of random process noise model called flicker noise, or 1/f noise, and I got involved with the analysis of that, which was nice and mathematical, but it was a noise that very well fit something that crops up in phase-locked loops. And because it has some bizarre properties, some standard analysis techniques did not work for it. I got involved with that through Bob Tausworthe, who was I think my immediate supervisor, and that led to one of my first publications, an analysis that appeared in the communications transactions, probably even before the paper in the information theory transactions based on my Ph.D. work.<br>
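
As an illustrative aside, 1/f noise can be approximated numerically by shaping white Gaussian noise so that its power spectral density falls off roughly as 1/f. The sketch below is a generic construction with arbitrary parameters, not the flicker-noise analysis Gray describes.

<syntaxhighlight lang="python">
import numpy as np


def flicker_noise(n, seed=0):
    """Return n samples whose power spectral density falls off roughly as 1/f."""
    rng = np.random.default_rng(seed)
    white = rng.standard_normal(n)
    spectrum = np.fft.rfft(white)
    freqs = np.fft.rfftfreq(n)
    freqs[0] = freqs[1]                 # avoid dividing by zero at DC
    spectrum /= np.sqrt(freqs)          # amplitude ~ 1/sqrt(f), so power ~ 1/f
    samples = np.fft.irfft(spectrum, n)
    return samples / samples.std()


x = flicker_noise(4096)
print(x[:5])
</syntaxhighlight>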

<br>

'''Nebeker:'''


Now you said that you thought of yourself as being in information theory and coding theory in this period?<br>

<br>

'''Gray:'''


And maybe communications also.<br>

<br>

'''Nebeker:'''


Well, I’m just looking at your early publications there. This is now a few years later, but this paper with Davisson on source coding without the ergodic assumption.<br>

<br>

'''Gray:'''


Yes.<br>

<br>

'''Nebeker:'''


Evidently it was an important paper.<br>

<br>

'''Gray:'''


I would probably say that was part of my mathematical lunatic fringe. It was a paper that provided a generalization of Shannon theory that I think was interesting for a couple of reasons. One was that just doing things more generally is nice. But the second and probably more important reason is that it was one of the first examples of what later was referred to as a universal code. That is, these were codes designed when you don’t really know what the source you are looking at is, and so you have to have essentially a family of codes and somehow pick the best one on the fly. Although we didn’t know it at the time, the construction we had come up with to handle these very abstract things actually was a very good model for something that is practically important. And when we realized that, that led to some other work on universal coding. The paper though was unabashedly mathematical, and the fact that we had generalized the Shannon theory to that is of debatable significance to the universe. But it won an Information Theory Society paper prize, which of course both of us very much appreciated.<br>
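
The “pick the best one on the fly” idea can be illustrated with a toy two-pass scheme: keep a small family of prefix codes, encode the data with each, and transmit the index of whichever gives the shortest description. The codebooks and data below are invented for the example and are not the construction from the Gray–Davisson paper.

<syntaxhighlight lang="python">
CODEBOOKS = [
    {"a": "0", "b": "10", "c": "110", "d": "111"},   # matched to a source where 'a' dominates
    {"a": "110", "b": "0", "c": "10", "d": "111"},   # matched to a source where 'b' dominates
    {"a": "00", "b": "01", "c": "10", "d": "11"},    # a fixed-length fallback
]


def universal_encode(symbols):
    candidates = []
    for index, book in enumerate(CODEBOOKS):
        bits = "".join(book[s] for s in symbols)
        candidates.append((len(bits), index, bits))
    _, index, bits = min(candidates)            # pick whichever code happened to be best
    return format(index, "02b") + bits          # a 2-bit header names the chosen code


def universal_decode(bitstring, n_symbols):
    book = CODEBOOKS[int(bitstring[:2], 2)]
    inverse = {code: symbol for symbol, code in book.items()}
    decoded, word = [], ""
    for bit in bitstring[2:]:
        word += bit
        if word in inverse:                     # prefix codes decode greedily
            decoded.append(inverse[word])
            word = ""
            if len(decoded) == n_symbols:
                break
    return decoded


data = list("aaabacaadaa")
packet = universal_encode(data)
print(universal_decode(packet, len(data)) == data)   # True
</syntaxhighlight>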

<br>

'''Nebeker:'''


I see you have quite a few publications with Davisson. <br>

<br>

'''Gray:'''


Yes.<br>

<br>

'''Nebeker:'''


What’s the story there?<br>

<br>

'''Gray:'''


Well, you actually fill in a gap I forgot before. One of the reasons I went to USC was that they had hired Lee Davisson as a professor. He was probably one of the most famous people in compression, which was an area that I thought might be kind of fun to work in, because of conversations with Irwin Jacobs. However when I got to USC it turned out he decided to stay at Princeton. And so I did not have the advantage of working with him. He subsequently went to USC. So when I went to Stanford, he went to USC. I don’t remember how exactly we got started working together, but I visited and we may have overlapped a little bit, and we found out we had common interests in quantization and in compression. So we started collaborating technically, both on research and papers and on a textbook for random processes. We became very close friends over the years.<br>

<br>

'''Nebeker:'''

Even though he was in Southern California and you were up at Stanford?<br>
 


<br>


'''Gray:'''


Yes. Because southern California wasn’t that far away. And he visited Stanford a fair number of times and I visited southern California. And it was mostly also on the mathematical side that my interests in a branch of mathematics called ergodic theory, which is what led to the paper that you cited, very much coincided with his interest in doing some more mathematical things. He felt like he had been overdosing on the practical side. So it was a very nice match of technical and I think social interests.<br>
Yes. Because southern California wasn’t that far away. And he visited Stanford a fair number of times and I visited southern California. And it was mostly also on the mathematical side that my interests in a branch of mathematics called ergodic theory, which is what led to the paper that you cited, very much coincided with his interest to do some more mathematical things. He felt like he had been overdosing on the practical side. So it was a very nice just match of technical and I think social interests.<br>  
 


<br>


'''Nebeker:'''  


It does look like a lot of these early publications are coding theory.<br>
 


<br>


'''Gray:'''  


Yes.<br>
 


<br>


'''Nebeker:'''  


Non-block source coding, sliding block source coding. Can you give me some kind of an overview of all this early coding theory work?<br>
 


<br>


'''Gray:'''  
Well, I think the main idea was that Shannon had proved these performance bounds on how well communication systems can do in an ideal situation. And he had made a lot of assumptions, assumptions about the behavior of the data sources that you are trying to compress. In particular, most of what he had done had been for memoryless sources, when the most interesting things in the real world, like speech, have lots of memory. And there are lots of codes that are not the kind he considered. He considered block codes, and there are codes that work in a more sliding, continuous fashion. Plus the fact that he had also made a lot of assumptions about the nature of the communications channel, such as that you knew where a block started and ended; that is, synchronization. <br>


<br>


The initial thing that got Davisson and me interested in this was first discovering that some of his assumptions could be generalized using techniques from ergodic theory. Ergodic theory, interestingly enough, had undergone a profound change a few years earlier because Shannon’s work had been discovered to be very important for mathematics. And so, if you like, it now looked like the mathematics derived from engineering could come back into at least pseudo-engineering. And so we got very interested in generalizing the Shannon results to have sources with very complicated memory, and synchronization problems, and codes that as I mentioned have this sliding structure, as for example PCM, DPCM, ADPCM and delta modulation. These are not best modeled, I think, by block codes, but rather by sliding block codes. <br>
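
To make the contrast concrete, here is a minimal Python sketch of a sliding-block style encoder; the window length and the sign-of-the-local-average rule are arbitrary illustrative choices, not a code from this work. Every output symbol is produced from a short window of the input that slides along one sample at a time, so there are no block boundaries to lose synchronization over.

<pre>
def sliding_block_encode(x, window=3):
    # Each output bit depends only on a length-`window` slice of the
    # input centered at the current sample; the slice advances one
    # sample at a time, so there are no block boundaries.
    half = window // 2
    padded = [x[0]] * half + list(x) + [x[-1]] * half
    return [1 if sum(padded[n:n + window]) >= 0 else 0
            for n in range(len(x))]

samples = [0.9, 0.4, -0.2, -0.8, -0.3, 0.5, 1.0, 0.2, -0.6]
print(sliding_block_encode(samples))
</pre>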
<br>  
 
 


'''Nebeker:'''  


Now I know that it’s in fact a very common thing for mathematicians to try to generalize results and find out what it really depends upon and make it as general as possible, but it sounds like your work here was with at least an eye to what was going on in real systems.<br>
 


<br>


'''Gray:'''  


I think that’s fair, although I have to admit mathematicians often like to generalize. With the Shannon results, what happened is that the engineers had gone way beyond anything Shannon had ever proved a theorem for. And yet everybody seemed to assume the Shannon stuff should apply. So it was more like a clean-up operation, you know, the guy with the scoop following the circus: the idea was to get Shannon theory a little bit closer to the kinds of codes being used and the kinds of signals they were being applied to. For example, Shannon assumed that you had an exact statistical description of the information source you want to compress or code, and rarely is that the case in the real world. So you want to have an algorithm that lets you come in cold. What you have to do is first look at the data, build a model, then build your code for the model, and know somehow that if nature is nice and underlying it there is a nice statistical model, your algorithm essentially will approximate it well, and then the code will work fairly well not just for your model, but also for that underlying thing which the powers did not give you on a platter. You had to do some guessing.<br>
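
A toy Python sketch of that two-stage idea, under simplifying assumptions of my own choosing rather than the constructions being discussed: plain empirical symbol frequencies stand in for the "model," and a Huffman code stands in for the code built from it. The point is only the shape of the procedure: look at the data, fit a model, then build a code matched to that model.

<pre>
from collections import Counter
import heapq
import math

def code_lengths_for(data):
    # Stage 1: "look at the data, build a model" -- here just the
    # empirical symbol frequencies.
    counts = Counter(data)
    total = len(data)
    probs = {s: c / total for s, c in counts.items()}
    # Stage 2: build a code for that model (Huffman, purely for
    # illustration) and record the codeword length of each symbol.
    heap = [(p, [s]) for s, p in probs.items()]
    heapq.heapify(heap)
    lengths = dict.fromkeys(probs, 0)
    while len(heap) > 1:
        p1, group1 = heapq.heappop(heap)
        p2, group2 = heapq.heappop(heap)
        for s in group1 + group2:
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, group1 + group2))
    return probs, lengths

probs, lengths = code_lengths_for("abracadabra")
rate = sum(probs[s] * lengths[s] for s in probs)
entropy = -sum(p * math.log2(p) for p in probs.values())
print(f"average rate {rate:.2f} bits/symbol, empirical entropy {entropy:.2f}")
</pre>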
 


<br>


'''Nebeker:'''  


I’m just trying to get some picture of this. Is ergodic theory, is that statistics?<br>
 


<br>


'''Gray:'''  


Ergodic theory I think is properly considered a branch of measure theory, which is really a superset of probability theory; that is, one way to think about it is as a branch of probability theory and of random processes. It looks at the long term average behavior of things. So for example, it is ergodic theory that if you flip a coin forever and I count the number of heads, and if that coin is really fair, I ought to get something like fifty percent heads. So ergodic theory, or one of its aspects, is trying to understand when you can connect these abstract probabilities you compute with the real world measurements you make over a long period of time, of long term behavior. And so it tries to understand when those things work, and when they don’t work, whether you can patch up the theory to find something that does work.<br>
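
The coin-flipping picture is easy to mimic numerically. A small Python sketch (the bias values and flip counts below are arbitrary choices for illustration) just checks that the long-run fraction of heads settles near the underlying probability.

<pre>
import random

def head_fraction(p, flips, seed=1):
    # Long-run relative frequency of heads for a coin with P(heads) = p.
    rng = random.Random(seed)
    return sum(rng.random() < p for _ in range(flips)) / flips

for p in (0.5, 0.7):            # a fair coin and a "soldered" one
    for flips in (100, 10_000, 1_000_000):
        print(p, flips, round(head_fraction(p, flips), 4))
</pre>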
 


<br>


'''Nebeker:'''  


Okay.<br>
 


<br>


'''Gray:'''  


For example, if I tell you this is a fair coin and I flip it for a long time, you know roughly how it’s going to behave. But if I now have a couple of coins behind my back and they’ve each got a different amount of solder on them so they are not fair, you can’t predict quite so easily what’s going to happen, but you can say a lot about it if you think about it, and that’s part of what ergodic theory tries to do.<br>
 


<br>


'''Nebeker:'''  


And was this a well developed branch of mathematics whose results you were trying to apply to these coding problems? <br>
 
 


<br>


'''Gray:'''


Yes, and fairly recently. Shannon’s ‘49 paper was picked up by Russians who in particular used the idea of entropy to understand a lot more about when ergodic theoretic results apply. Also a lot of the ergodic theoretic results are essentially proved by coding arguments of one process into another, and there are questions like, “When can I code this process into that one and then back again?” And it turned out the Shannon ideas were very important. This was mostly due to Kolmogorov and Sinai, who realized that Shannon’s ideas had an impact. Then it was a mathematician named Don Ornstein, who happened to be at Stanford, who tied a lot of those results up into almost final form. And he had only recently done that in the early ‘70s when I started getting involved with this. While Ornstein was very abstract and I could not track him easily, there was also another mathematician, Paul Shields, visiting Stanford at that time. My then student Dave Neuhoff, who is here at ICASSP and wrote this quantization paper with me for the 50th anniversary issue of the IEEE Transactions on Information Theory, and I became disciples of Shields, who translated Ornstein. And we learned about all of these tools, techniques and different ways of coding.<br>


<br>


'''Nebeker:'''  


Was Ornstein in the EE department?<br>
 


<br>


'''Gray:'''  


No, he was in the mathematics department. And he had won one of the major mathematics prizes, so he is, I think, one of the most famous mathematicians in the world. And he was positively amused by the fact that here was this engineering professor and student trying hard to understand his techniques. I can’t help but mention his explanation for all of these new coding techniques he had thought up: he had thought them up because basically he was too lazy to read Shannon’s original papers and see how Shannon had done it. So he came up with alternative constructions, which we later called sliding-block codes. And so that’s what I guess was our main input at the time. I spent most of the ‘70s proving coding theorems, which is really Shannon theory. <br>
 


<br>


'''Nebeker:'''  


Okay. <br>
 


<br>


'''Nebeker:'''  


Was Dobrushin the other mathematician you mentioned?<br>
 


<br>


'''Gray:'''  


Yes, he was the other. Well, there was Paul Shields, who was probably the one I mentioned first. He was a mathematician very interested in learning engineering. He worked with Dave Neuhoff, my student, for many years after our collaboration ended. And, he is still very active in the information theory side of things. Shields wrote one of the nicer early books on ergodic theory when the Ornstein results were new. And as I said, it was very difficult for engineers to understand Ornstein’s writings. Shields sort of translated them to the masses. Dobrushin, I think, was one of the primary Russian, Soviet developers of information theory once Kolmogorov made that whole school aware of Shannon’s work. He was probably one of the best of the mathematicians working in information theory. He died a year or two ago.<br>
 


<br>


'''Nebeker:'''  


Now, in information theory, defined I suppose by these journals that have such a name in their title, are those people typically in EE departments?<br>
 


<br>


'''Gray:'''  


In the United States they have historically been mostly in EE departments. Shannon was in EE, and he was at MIT in EE. And the early days of information theory were mostly the MIT department of electrical engineering and Bell Labs. At Bell Labs it was the mathematics research group, but many of those mathematicians were engineers by training. Since then it has expanded, but in the U.S. it is still, I think, mostly in EE. But in the rest of the world you will find it in statistics departments, sometimes in computer science departments, sometimes in math departments.<br>
 


<br>


'''Nebeker:'''  


There are some fields that end up with people who were trained in other fields. I mean, physicists notoriously end up trying to do signal processing. But I was just wondering if there were many mathematicians who end up in information theory?<br>
 


<br>


'''Gray:'''  


Very few. I think in fact Paul Shields is one of the few to come to mind. John Kieffer is another. People might disagree with me, but I think if you are a mathematician trying to get an understanding of the engineering applications, which is the point of this whole thing, it takes a fair amount of work. And the two cultures are very different, and when I was working with Neuhoff and Shields it probably took us a year to communicate, because we were using completely different languages to discuss common things. But the payoffs were great. They found tools they hadn’t realized existed for their problems. We found tools and methods of proving things that allowed us to generalize results we wanted to generalize.<br>
 


<br>


'''Nebeker:'''  


Do you think that that’s something that’s under-exploited, that there are two communities that maybe don’t have as good communication as there might be?<br>
 


<br>


'''Gray:'''  


I think that that’s a chronic problem, that engineers and mathematicians often do not talk to each other enough. I think just generally, professionally, there is a problem of myopia, and I think we academics have a duty to try to kick our students out of that kind of complacency and force them into taking, gobbling up, classes in other departments to find out what they’re doing and what might be relevant.<br>
 


<br>


'''Nebeker:'''  


I was surprised to see that many of these papers in this coding or Shannon theory were published in the Annals of Probability, or there were one or two others that looked like mathematics journals?<br>
 


<br>


'''Gray:'''  


Well, for me that was great. I mean it was sort of like coming out of the closet as at least partially a mathematician, and also proving that some of these problems that arise in engineering applications are of genuine mathematical interest and really have some intrinsic interest even if it were not for applications. The things that were aimed at journals like the Annals were mostly of mathematical interest, and then other papers sent to engineering journals were heavily derivative. That is, they could build on these other papers and refer to them for the mathematics that would probably drive some of our colleagues nuts. But I do know people who are in information theory who look down on anything they consider to be real analysis. Because that’s mathematics, and engineers don’t need to worry about that. This way I don’t have to argue with them. We’ll publish what they probably think of as mathematics in the mathematical journals. My favorite paper was the one that was written with Dobrushin and Ornstein, which was in a mathematics journal which most of my colleagues and friends have never heard of, but probably took me more work than any other three or four papers I ever wrote.<br>
 


<br>


'''Nebeker:'''  


Can you find that for me quickly?<br>
 


<br>


'''Gray:'''  


I can, I think. It was one of the ones in the Annals of Probability, in 1980.<br>
 


<br>


'''Gray:'''  


That was the same year that the paper for which I am probably best known, the one with Yoseph Linde and Andrés Buzo, came out, if memory serves. And that one all my friends know about. But only a select few knew about the other.<br>
 


<br>


'''Nebeker:'''  


I see. Well, I’ll say Dobrushin was 35 on this listing, and the Linde-Buzo-Gray is 32.<br>
 


<br>


'''Gray:'''  


Okay. That was a good year. <br>
 


<br>


'''Nebeker:'''  


Would you tell me a little about that algorithm for vector quantization?<br>
 


<br>


'''Gray:'''  


That one has a bit of a history, as often happens in results like this. After we do something we discover there were lots of other things in the literature very close to it. So what I think I can say off the bat is we can take credit for popularizing it and I think contributing to the idea. The basic idea was born when I had two students with me, Andrés Buzo and Yoseph Linde, and up until that time my interests had been primarily in information theory, proving theorems. My bias had started to change around the middle ‘70s when I got involved with work with my brother, Steen, or A.H. Gray, Jr., who was interested in speech. Forgive the long introduction, but I think it’s necessary to explain how I got there. Steen, knowing I was supposedly an expert in the theory of quantization, asked me some questions about optimal quantization because he and his co-worker John Markel had become very puzzled by the fact that every conference they went to in signal processing would have umpteen papers titled optimal quantization for reflection coefficients or LPC coefficients, to which their thought was, “Well, they can’t all be optimal. What is really the optimal here?” He had come to me with some questions which got us involved in some work looking at things that were optimal in a certain sense. Not exactly Shannon, but relating to what is sometimes called Bennett theory of asymptotic quantization. The point of all of this is I had started worrying about not just Shannon theory but how these things actually worked in speech. Well, about that time one of the students, Yoseph Linde, had gotten involved with actually evaluating how well certain schemes worked and came up with an algorithm which subsequently proved to be due to Stu Lloyd from the 1950s at Bell Labs. We didn’t know it at that time.<br>
 


<br>


'''Nebeker:'''  


He independently came up with this?<br>
 


<br>


'''Gray:'''  
He had independently come up with it as just a computationally fast way to design a good scalar quantizer. And then I started dabbling with what Shannon has to say about this and looking at a multidimensional generalization of it, but didn’t really get very far or think about how it would actually work or how it would actually apply. And Buzo was the one who really cut the Gordian knot when he realized that you could take this algorithm, which Linde had programmed for the scalar case and I had written down some lines for the vector case, and put it together with an idea that was implicit in the work of Itakura and Saito. In fact, they had come up with this idea of a quality metric or a distortion measure for speech. I’m not sure, but I think that Buzo, Linde and I can claim credit for being the first to call it the Itakura-Saito distortion measure. <br>


 
<br>


It was clear it should be named after them because they were the first ones who did it, and then we just put that all together to design a speech coding system at very low bit rates. We were at first very discouraged by it because it sounded awful. We first described this work, which we thought was only of theoretical interest, at an information theory, not a signal processing, workshop that was held in Lake Tahoe. And my brother, who was the speech expert, came up and was also giving a speech about LPC coding and trying to interest the information theorists in signal processing. And what he discovered was that the data that he had sent us to try out was garbage. That is, they had sent us really lousy stuff. And our compressed stuff sounded lousy but very much like the lousy stuff that they had given us, and we realized that there was hope. And it was when we essentially got good clean data that we were able to come up with a system that wasn’t great, but it was taking something like 2400 bit per second LPC speech and knocking it down to something like 800 bits per second and having it at least intelligible, and certainly not that bad. <br>
<br>


Linde left the field completely when he graduated and went on and founded a chip making company. Buzo went back to Mexico and continued to do very good work in speech coding, speech compression and speech recognition. And that paper I think stirred up a bunch of work. Now I have a better historical appreciation for it, which Neuhoff and I tried to put in that survey paper that I mentioned to you before, and now we can clearly see the precursors to that work. And also I think a lot of people realized around that time that it was an intimate cousin to something that had developed in statistics called the k-means algorithm. I’d still argue there were differences, and part of what we did was quite distinct. That is, you need something like a bootstrap to get this whole thing started, because it needs an initial starting place, a code book. And one of the things we did was to provide a means of designing a very low bit rate code for speech based on LPC ideas from scratch. You just run the algorithm and it designs the code.<br>
<br>  
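
A bare-bones Python sketch of that kind of codebook design loop, in the spirit of the algorithm just described but not the published version: it uses squared error rather than the Itakura-Saito measure, random training data, a random initial codebook as the bootstrap, and a fixed number of iterations, all of which are simplifications made here for illustration. Each pass assigns every training vector to its nearest codeword and then replaces each codeword by the centroid of the vectors assigned to it.

<pre>
import random

def design_codebook(training, codebook_size=4, iterations=10, seed=0):
    rng = random.Random(seed)
    # Bootstrap: the loop needs some initial codebook to start from;
    # here it is just a random subset of the training vectors.
    codebook = list(rng.sample(training, codebook_size))
    dim = len(training[0])
    for _ in range(iterations):
        # Nearest-neighbor assignment under squared error.
        cells = [[] for _ in codebook]
        for vec in training:
            best = min(range(len(codebook)),
                       key=lambda i: sum((a - b) ** 2
                                         for a, b in zip(vec, codebook[i])))
            cells[best].append(vec)
        # Centroid update; an empty cell keeps its old codeword.
        for i, cell in enumerate(cells):
            if cell:
                codebook[i] = tuple(sum(v[d] for v in cell) / len(cell)
                                    for d in range(dim))
    return codebook

training = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(200)]
print(design_codebook(training))
</pre>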
 
 


'''Nebeker:'''  


Was this how you got interested in speech?<br>
 


<br>


'''Gray:'''  


Well, that was really the story I told before. It was when my brother got me interested in quantization and we ended up writing a paper called “Optimal Quantizations of Reflection Coefficients.” That was when I first got interested in speech as a guinea pig, but that’s why Buzo was working actively on speech and actually tried this LPC vector quantizer, and Buzo spent some time working with my brother and John Markel at Signal Technology Incorporated, which was, I think, one of the main places for the development of the LPC revolution which Itakura started.<br>


<br>


'''Nebeker:'''


I see. And then there was this, the Buzo-Gray-Gray-Markel paper, “Speech Coding Based on Vector Quantization.”<br>
 


<br>


'''Gray:'''  


That was simply taking the embryonic ideas and working out the speech application more thoroughly. The first paper had a more controversial time working through the reviewers. I think, as Adelson loves to say of the classic pyramid coding paper by Adelson and Burt, the reviewer said this will never go anywhere. We had some similar problems with our paper. On the other hand, the one specifically aimed at speech and doing lots of things that the original Trans. Communications paper did not do got a much friendlier reception at ICASSP. Both ICASSP and ASSP made me notice the Signal Processing Society as being somewhat more friendly to algorithms of this type.<br>
 


<br>


'''Nebeker:'''  


And is this the period when you started moving more toward things closer to application?<br>
 


<br>


'''Gray:'''  


Yes. I think from then on I was involved first with speech for several years, and then eventually with images. The movement into images really happened in the late ‘80s, from a different set of circumstances.<br>
 


<br>


'''Nebeker:'''  


Yes. I certainly want to hear about that. Before we leave speech though, is there some nice summary you can give me of these dozen or so papers that seem to have come out around ‘80?<br>
 


<br>


'''Gray:'''  


Part of it is completely non-technical. It was a rare experience I think to get to collaborate with a brother. And I know of very few sibling collaborations like that. I might add that was the time I rekindled my interest in amateur radio. We did a lot of discussing of this work on the amateur radio bands, which are of course quite public and not private. We would often get people running in and saying, “This sounds neat! What are you talking about?”<br>
 


<br>


'''Nebeker:'''  


Explain it all to us.<br>
 


<br>


'''Gray:'''  


It was a great collaboration in that it got my students temporary jobs and sometimes longer term jobs. The earlier ones worked on more primitive systems, and then as these things got fine tuned by the people down at Signal Technology and elsewhere, they got involved really in systems that I think subsequently proved to be quite practical. Embryonic versions of things like code excited linear prediction were done by that time by some very bright students. That work involved very little information theory by that time, but you could call it either signal processing or communications. Several of the works were published in the Transactions.<br>
 


<br>


'''Nebeker:'''  


I noticed that.<br>
 


<br>


'''Gray:'''  


As they took more and more advantage of ideas that are usually considered signal processing, they ended up migrating there, and there was at least one off-shoot, which was theoretical, which I really like, which ended up in the Information Theory Transactions, and one also in the Transactions on Pattern Analysis and Machine Intelligence, PAMI. But basically what we found, and this I can easily describe in English, was that the Itakura-Saito distortion measure, which they had argued for on statistical and speech modeling grounds, ended up having exactly the same form as a classical result from information theory that appeared in Pinsker's very abstract book on information measures. And that result, why those two things ended up being the same, led to a fair amount of work on our part that I really liked. I can’t say it’s had a great impact, but it was great for insight into understanding the Itakura-Saito distortion measure's origins and in finding alternative ways of deriving all of this theory that to me at least involved fewer unpleasant assumptions about speech. The most common assumption being that speech was a Gaussian process, which has always been controversial, and I think most people involved with speech would deny it completely. The other way of looking at things allowed you to view speech as not Gaussian at all, but simply as what happens if you try and synthesize speech by something that is Gaussian. So what you are trying to do is take a mathematical model you can build with a computer and have it sound like you. And that doesn’t bother me at all, because that’s what people do. And so it basically was theoretical, it was appropriate for information theory, and that’s where it went.<br>
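
For orientation, the Itakura-Saito distortion between a speech power spectrum <math>S(\omega)</math> and a model spectrum <math>\hat S(\omega)</math> is usually written in the standard textbook form below (given here for reference, not quoted from the interview); its term-by-term resemblance to a relative entropy between Gaussian models is the kind of connection being described.

<math>d_{IS}(S,\hat S) = \frac{1}{2\pi}\int_{-\pi}^{\pi}\left[\frac{S(\omega)}{\hat S(\omega)} - \ln\frac{S(\omega)}{\hat S(\omega)} - 1\right]d\omega</math>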
 


<br>


'''Nebeker:'''  


Maybe this is a time to ask you about the IEEE Society structure and these various Transactions. You’ve published in a number of different Transactions, the SIAM journal and so on.<br>
 


<br>


'''Gray:'''  


Yes.<br>
 


<br>


'''Nebeker:'''  


How do you see this? Is it organized as well as it could be? Is it inevitable that one is going to have to go to different types of publication?<br>
 


<br>


'''Gray:'''  
It’s a good question. I should admit one bias I’ve got is that I was involved editorially with the IEEE Transactions on Information Theory, first as associate editor and then as what we now call editor-in-chief, but then it was just called editor. So I think I paid my dues on that, and we tried very hard to keep that as a very high quality and well edited journal. When we talk about how things have evolved and changed, first I think having lots of journals with overlapping areas is a good thing. A lot of my changes were because of the types of applications. The boundaries are not always clear, and if [inaudible] so long as it is clear that you pick a target and go for it, it is never fair to take multiple targets. And you have to do some thinking about what the best place for the paper is. I think what I’d gripe about is that I think there has been less editorial control over the years as too many of these journals got too large, so that now you can have bad English appearing in some Transactions, which in those days didn’t happen. The Transactions people edited, the associate editors edited, the editors really edited, and we were sensitive to authors that didn’t like those changes. And the philosophy was that the more the people in the know-how, the technical people, us, edited the paper, then the less effect the IEEE editorial staff would have. <br>


<br>


We didn’t like the idea of technical editors who didn’t understand the meaning changing what went on. So long as they were all good journals, you could decide which one you thought was the best, and they did have differing reputations in differing communities. I think it’s safe to say if what you think you did was primarily of mathematical import, but it was aimed at engineers, then the information theory Transactions in some areas was the best place to send it to. If you wanted to hit the applications people, Communications or Transactions ASSP was better, but even then Transactions ASSP did have some theoretical papers, but I think fewer of them. The IT Transactions were the ones that took most of the hits about maybe being too theoretical or too abstract.<br>
<br>  
 
 


'''Nebeker:'''  


What about the stuff in the Communications Society that was related to this?<br>
 


<br>


'''Gray:'''  


I felt more comfortable in the Communications Society years ago than I do now. I think partially because they have, probably out of necessity, evolved so heavily into networks, about which I am less enthusiastic or knowledgeable. I fit in there less well. I am not really a wireless person. I mean, I knew about radio and use it, but they have branched into a lot of topics that were of more interest to their audience but are of less interest to me. So as things in speech coding and image coding got to be of less interest to them, I migrated elsewhere.<br>
 


<br>


'''Nebeker:'''  


But do you think that it’s worked fairly well, let’s say for the moment just within IEEE, as to how papers end up appearing in different places? <br>
 


<br>


'''Gray:'''  


Mostly. I’m a little concerned now, because I see turf battles that I don’t remember before. I got very concerned when, for example, during a recent reorganization in signal processing they created a technical committee on signal processing and communications with many subsets, pieces that sounded really more appropriate for a communications or even information theory Transactions, and I question, for example here at ICIP, is this really the place to be having sessions on lossless compression? Because lossless compression is historically communications and information theory, and for the last few years at least 50 percent computer science. So it has several venues, several conferences; does it need one more here?<br>
 


<br>


'''Nebeker:'''  


Yes.<br>
 


<br>


'''Gray:'''  


And my hope is the answer to that is yes, provided it’s very specific to images. That’s something the people making the editorial decisions and handling the papers that are coming in have to worry about.<br>
 


<br>


'''Nebeker:'''  


Well and these decisions are slowly, gradually shaping the fields. <br>
 


<br>


'''Gray:'''  


Yes.<br>
 


<br>


'''Nebeker:'''  


I mean, if it turns out that there is a substantial body of theory, maybe that will emerge as a separate community and these application areas out there, maybe it will never build to that amount and remain in these scattered application areas.<br>
 


<br>


'''Gray:'''


And the bottom line is quality. I mean if you risk diluting the area with too many conferences or too many societies, the quality of the papers that do appear may diminish. And that’s just something I think we have to be paranoid about.<br>
 


<br>


'''Nebeker:'''  


Well, if you look simply at the number of pages published by the IEEE Transactions, it has gone way up.<br>
 


<br>


'''Gray:'''  


Yes.<br>
 


<br>


'''Nebeker:'''  


Do you think this is a mistake? Should we be exerting greater editorial control over what comes out in the Transactions, having fewer of them, say?<br>
 


<br>


'''Gray:'''

That’s a dangerous question. My personal feeling is we’re probably publishing more pages than we need to. I just tend to suspect that they can’t all be as high quality as they historically have been. I think that every society has to be very careful about expanding the size of its journals, dragging its feet to be really sure that the stuff it is publishing is really up to snuff. When you have a new conference on the block, as ICIP still is, it’s fair to be forgiving, sympathetic, generous to get people here. So I don’t think, for example, the papers that appear at ICIP are going to be as solid as they are at ICASSP, simply because ICASSP has more of a tradition, it has an established community. ICIP has to grow up to that point. What worries me is, if now all of a sudden we have an international conference in multimedia signal processing or a Transactions in that area, is that going to bleed off a young Transactions like ours and a young conference like ours, with the end result that you can’t really have two where before there was one? I would like to see extreme caution exercised when creating new entities that are going to compete with the entities you have.<br>
 


<br>

'''Nebeker:'''

Yes.<br>


<br>


'''Gray:'''

It is clear that we had to spawn some new areas, because ICASSP could not handle it all. And I am convinced by the case for ICIP, and I think ICIP is getting steadily better. But I don’t want to see a new society spawned every two or three years. We can’t manage it.<br>
 


<br>


'''Nebeker:'''

Could I ask now about how you got into image processing?<br>
 


<br>


'''Gray:'''

Okay. That I guess is another story, where there were two influences. One was that at the time speech funding for students was diminishing. As a professor, the number of students I have is doubly influenced by funding: I can only have those that are fully funded on their own and choose what they want to do, or those that I can find funding for. It was clear that the funding for the speech side of things was going down; people of mine who were graduating having done Ph.D. theses in speech work were turning into networks people at Bell Labs and other places. So in some ways the writing on the wall to not encourage more speech work was there.<br>


<br>


The other thing was that I had students who were getting very interested in image processing, but who didn’t need funding. If memory serves, it was really Rich Baker who, being an entrepreneur by birth I think, had pretty much found his own funding, wanted to try doing all of this stuff in image processing, and developed what at the time we thought was the first vector quantization system for images. We later discovered that we had been scooped elsewhere by my perpetual colleague and cohort Allen Gersho, and also this time by a group I was to get to know pretty well in Japan. On the other hand, we had done a few things differently, and I like to think somewhat better, and it was clear that it was something that was within our capability of doing, and it sort of proved that I had been wrong in thinking we couldn’t do image processing because we couldn’t afford the equipment. Speech had seemed better because we could afford what we needed to do, and it looked like for image processing you needed a lot of stuff we didn’t have. Baker basically got a used Stanford Television Network TV and started working on a PC, and he was designing codes and doing compression, very nicely, with that equipment, which he traveled with because part of his Ph.D. was done while he traveled with his wife so she could finish her degree and while he was visiting other places, including Utah. I think that had an impact on subsequent students. I think it’s fair to say the main people who got the group into image coding were Rich Baker and Eve Riskin. She came to our group out of MIT, took a year to work, and then when she came back she got interested mostly in image vector quantization, thanks to Rich Baker. She was profoundly influenced by Phil Chou and Tom Lookabaugh, who had been the pinnacle of our speech work, and she was interested in medical images.<br>
<br>  
 
 


'''Nebeker:'''

I saw that here.<br>
 


<br>


'''Gray:'''

And so that started a long period of work with my group involved with radiologists, looking at compressing medical images and, perhaps more strangely, evaluating the quality of medical images that have been compressed, or actually that have been operated on in any way by a computer.<br>
 


<br>


'''Nebeker:'''

Well, that’s obviously a very important, essential thing to be able to do.<br>
 


<br>


'''Gray:'''

That led to several years of work that got us involved much more with statistics, actually involved with clinical testing. I think the work has had an impact on digital mammography.<br>
 


<br>


'''Nebeker:'''

So that the evaluation actually sort of went to the clinical level?<br>
 


<br>


'''Gray:'''

Yes.<br>
 


<br>


'''Nebeker:'''

The evaluation of how good these images are?<br>
 


<br>


'''Gray:'''

Because that’s the only way you’ll ever sell it to that community. The feeling of a lot of engineers was that doctors will never accept lossy compressed images because of the thought of lawsuits.<br>
 


<br>


'''Nebeker:'''

Yes.<br>
 


<br>


'''Gray:'''

Starting with Eve and up until fairly recent times, we got involved in the statistical design of clinical experiments. Not just of lossy compression, but simply of digital images as opposed to analog images. And we’re currently involved with the first submitted application for a fully digital mammography machine and trying to convince the FDA that it’s as good as analog mammography. That was a derivative work which I would say had signal processing origins, because the statistics we are using are very much like the statistics that are used in statistical signal processing, but it was as much a realization that you have to convince the community you want to adopt this technology, and that community is not even yet convinced that digital X-rays are a good thing.<br>
 


<br>


'''Nebeker:'''

I wanted to ask about this transfer of knowledge, techniques, and so on from speech signal processing to image processing.<br>
 


<br>


'''Gray:'''

I guess one way to think of it is that we were among the first multimedia researchers. Our basis was compressing anybody’s signal, so this was a move from speech to images. Parts of the theory and application did not generalize; for example, the right generalization to image coding of LPC or Itakura-Saito PARCOR coding has not been developed. Other parts did generalize. The common issue is how you do a good job of compressing whatever signal it is you’re looking at, and that involves many approaches. Our favorite approach was to look at a bunch of sample data and learn from that: the data designs your codes. Lots of techniques aim at avoiding the necessity for a learning set, but usually they are making implicit assumptions about how the data are going to behave. The underlying theory extends; it’s just that you’re dealing with a different kind of stuff. The statistics are going to differ. Which codes work well and which ones work badly, that’s going to differ. People may say that images are two-dimensional signals and speech is one-dimensional, but I don’t buy that, because the whole LPC approach is really a vector approach. You model a chunk of 50 milliseconds or more of speech as a linear filter, which is 10 or so coefficients plus the driving signal. So the idea of coding vectors is relevant for speech. You’re already at 10 or 12 dimensions or more. Images may be two-dimensional, but again we like to group pixels together and code vectors. That’s what JPEG does: 8x8 blocks, which are 64-dimensional vectors. So it’s all very much in the spirit of multidimensional signal processing. It’s just that the signals are different.<br>
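As a rough, hypothetical illustration of the “the data designs your codes” approach described above (this is not code from Gray’s group; it simply sketches the generic idea in Python with NumPy, and all function names and parameter values are illustrative assumptions), 8x8 image blocks can be treated as 64-dimensional vectors and a codebook trained from them with a generalized Lloyd, k-means-style iteration:

<pre>
import numpy as np

def extract_blocks(image, size=8):
    """Split a grayscale image into non-overlapping size-by-size blocks,
    each flattened into a vector (64-dimensional for 8x8 blocks)."""
    h, w = image.shape
    h, w = h - h % size, w - w % size  # trim so the blocks tile exactly
    blocks = (image[:h, :w]
              .reshape(h // size, size, w // size, size)
              .swapaxes(1, 2)
              .reshape(-1, size * size))
    return blocks.astype(float)

def train_codebook(vectors, codebook_size=64, iterations=20, seed=0):
    """Generalized Lloyd (k-means style) design: the sample data itself
    shapes the codebook instead of an assumed statistical model."""
    rng = np.random.default_rng(seed)
    codebook = vectors[rng.choice(len(vectors), codebook_size, replace=False)]
    for _ in range(iterations):
        # Nearest-neighbor (minimum squared error) assignment of each training vector.
        dists = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        nearest = dists.argmin(axis=1)
        # Centroid update; keep the old codeword if a cell happens to be empty.
        for k in range(codebook_size):
            members = vectors[nearest == k]
            if len(members):
                codebook[k] = members.mean(axis=0)
    return codebook

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # A synthetic "training image" standing in for real sample data.
    training_image = rng.integers(0, 256, size=(128, 128))
    vectors = extract_blocks(training_image)
    codebook = train_codebook(vectors, codebook_size=32)
    # Encoding a block = the index of its nearest codeword (5 bits for 32 codewords).
    index = ((codebook - vectors[0]) ** 2).sum(axis=1).argmin()
    print("codebook shape:", codebook.shape, "- first block maps to codeword", index)
</pre>

Encoding then amounts to sending, for each block, the index of its nearest codeword, which is the sense in which the training data, rather than an assumed model, shapes the code.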
 


<br>


'''Nebeker:'''

But let me put it this way: Wasn’t it an advantage when you tackled some of these image processing problems to have worked on speech or some other types of signals?<br>
 


<br>


'''Gray:'''

Yes, but I think that would have been true if we had worked on EKG or other signals. The advantage was having an appreciation for how hard it is to do a good and efficient job of compressing something in a lossy fashion with minimal complexity in some way. Speech is a very complex and rich signal. And it teaches you that if you design something, say, for a nice idealized Gaussian source, it’s probably not going to work all that well for speech, and you can do a whole lot better if you get a better feeling for the statistical behavior of real speech. And images are the same. You can do a lot better if you get a feeling for how those signals behave. Going the other way, things like wavelet coding have been very successful for images and not so successful for speech. That kind of signal processing just seems to do a nice job of breaking images up into primitive components, but they are not the right primitive components for speech, at least not yet.<br>
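The point about idealized models versus real data can be made concrete with a small, hypothetical numerical sketch (again in Python with NumPy; it is purely illustrative and not taken from the interview or from any particular study): a scalar quantizer whose levels are fit to a Gaussian of matching variance is compared, on heavy-tailed Laplacian samples standing in for the "real" data, against a quantizer whose levels are fit to those samples themselves. The data-designed quantizer would typically show the lower mean squared error.

<pre>
import numpy as np

def lloyd_levels(samples, n_levels=8, iterations=50):
    """Scalar Lloyd design: alternate nearest-level assignment and centroid
    updates so the reproduction levels fit the training samples."""
    levels = np.quantile(samples, (np.arange(n_levels) + 0.5) / n_levels)
    for _ in range(iterations):
        idx = np.abs(samples[:, None] - levels[None, :]).argmin(axis=1)
        for k in range(n_levels):
            cell = samples[idx == k]
            if len(cell):
                levels[k] = cell.mean()
    return np.sort(levels)

def quantizer_mse(samples, levels):
    """Mean squared error when each sample is mapped to its nearest level."""
    nearest = levels[np.abs(samples[:, None] - levels[None, :]).argmin(axis=1)]
    return np.mean((samples - nearest) ** 2)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Heavy-tailed stand-in for "real" data (unit-variance Laplacian) versus
    # an idealized Gaussian model of the same variance.
    real_data = rng.laplace(scale=1 / np.sqrt(2), size=50_000)
    model_data = rng.normal(scale=1.0, size=50_000)

    levels_from_model = lloyd_levels(model_data)  # designed for the assumed Gaussian
    levels_from_data = lloyd_levels(real_data)    # designed from the data itself

    print("MSE of model-designed quantizer on real data:",
          quantizer_mse(real_data, levels_from_model))
    print("MSE of data-designed quantizer on real data: ",
          quantizer_mse(real_data, levels_from_data))
</pre>

This is only a toy scalar example; the same reasoning is what motivates training vector quantizers and other codes on real speech or image data rather than on an assumed model.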
 


<br>


'''Nebeker:'''

Well, this seems to speak to this question of whether signal processing is a grab bag of techniques or whether there is a substantial body of theory that’s applicable to all different types of signal processing.<br>
 


<br>


'''Gray:'''

It’s both. I mean, you can go down to the bottom underlying all of it, and these are going to be things like good old-fashioned Fourier analysis. Even if you are going to do wavelets, a lot of the analysis and results there are done using Fourier analysis. Probability and random processes are at the bottom. If you are going to be analog, you need to know about differential equations and partial differential equations, calculus, and all of the mathematical tools that are important to EEs in general. Then when you start getting more specific, other tools are going to be more useful. I think a standard electrical engineering undergraduate degree and a master’s gives most people these days an excellent background for doing signal processing or circuit design or computer engineering. Then, building past that, you can start taking, for example, image processing and speech processing courses just based on a standard EE background. When you take those more advanced courses, you get into the specific nature of particular applications and the specific tools, which still are going to have common ground. In both image and speech processing, linear prediction theory is extremely important. Estimation, prediction, ideas of Wiener filters and their digital equivalents. So you have a common base.<br>
 


<br>


'''Nebeker:'''

So even at the forefront there is transfer.<br>
 


<br>


'''Gray:'''

Yes. I think more schools should teach an advanced undergraduate-level class in image and speech and audio processing, because they are fun areas, you can usually get into them with just a really basic EE background, and you can get up to speed and do some really interesting stuff given the computing toys that we now have. I brought my Mac G3 PowerBook with me, and as we speak it is designing image codes which I will be able to look at because they’ll be on my screen when I go back, and I’m expecting that some will have worked out nicely and others will have worked out not so nicely. It adds to my aura among my students if I don’t just throw out an abstract idea, but am competent to program in MATLAB and can actually try it out and show some promise. And then I don’t feel bad asking them to try it in a real language, like C, on a more hefty database. And it’s just marvelous that you can do that today.<br>
 


<br>


'''Nebeker:'''

Well, we’ve gone on quite a while. Is there something I haven’t asked about that you’d like to comment on?<br>
 


<br>


'''Gray:'''

Trying to look back, I think I can say mainly it’s a fun area. Images and speech are both part of the signal processing community, and I’ve been active in both. I pretty much wandered away from the speech; I feel more at home in the image area now. I still think it’s partially a fluke historically that I ended up being a signal processor instead of in some other area, but that was largely a question of technical seduction. That’s where the interesting problems were, where at least a few times I’ve been fortunate enough to be around when new things were getting tried that gave more than epsilon improvement, either by colleagues or students or siblings, and there is the fact that in signal processing with a computer you can get in there and play with real data and listen and see what it’s doing. I don’t think I would change any of those turns that I took, looking back upon them, and I can’t imagine anything better to say about a field. I have no regrets that I was more of a mathematician during part of it, and I think everybody in the field should spend a little bit of their time appreciating the theory.<br>
 


<br>


'''Nebeker:'''

And you don’t regret not having been a mathematician from the beginning, say?<br>
 


<br>


'''Gray:'''

No, it was important to me, I think egotistically and in some ways professionally, to establish credentials on that line, and it was really a thrill to work with some really great minds on the mathematics side. Maybe it’s the way I felt about myself when I was a musician in my youth: I knew I couldn’t make a living at music, but I really enjoyed it and I got to where I was good enough that I could really appreciate the really great ones. Well, happily, signal processing gave me a vocation that I earned a rather nice living at, and I even got to watch a few former students become millionaires.<br>
 


<br>


'''Nebeker:'''

Well, thank you very much.<br>
 


<br>


'''Gray:'''

You’re welcome.


<br>

Nebeker:

Where did you grow up?


Gray:

I grew up in Coronado, California. That was a place a lot of naval officers retired to.


Nebeker:

Beautiful place.


Gray:

And it had I think a good high school, and I was able to get out and go take some courses at San Diego State. And now that I remember more carefully, my brother was in between MIT and going to Caltech for a Ph.D.


Nebeker:

And you went to MIT?


Gray:

I went to MIT as an undergraduate, essentially because both of my brothers had, and so it seemed very natural to follow them there, which I did in 1960.


Nebeker:

And that was for an EE degree?


Gray:

I knew I was going to go into EE fairly early, again probably because at the time I idolized my brothers and they had. But since they had done so well in EE, the only place I could outshine them was in humanities. So I ended up taking more humanities courses.


Nebeker:

And I see you got also a Master’s degree at MIT.


Gray:

I stayed there for a total of six years. Like my classmate Larry Rabiner, I was part of Course 6A, a cooperative program. He went to Bell Labs, I went to the U.S. Naval Ordinance Lab. Again, working basically on digital things, digital communications. Also passive sonar which was more analog, at the time.


Nebeker:

Where is that Ordinance Lab?


Gray:

It’s outside of Washington, D.C. in White Oaks/Silver Spring. So I was in Washington, D.C. several summers and other quarters during the ‘60s and I got to hear a speech by Jack Kennedy and by his brother. I was a Washington intern before it was a dirty word. Because all students working for the summer were considered interns. And so we had a lot of shows put on for us, which was really fascinating.


Nebeker:

I’ve heard very good things about that cooperative program. Did you like it?


Gray:

I loved it. I think it was an excellent idea. I would love to see something more like that at Stanford. I think it was good to get away from school. In our case we had classes taught at the plant, so we kept up a certain academic part, but it was very different doing engineering in the real world from just taking classes and learning the theory. And I think it was an absolutely essential part of my upbringing. I probably never would have gone to graduate school if I had not had a taste of real life engineering beforehand and realized you could do much niftier stuff in a lot of fields if you simply had more schooling.


Nebeker:

Were there any professors there at MIT that were particularly influential?


Gray:

Yes. The first one that comes to mind is Jim Bruce, who actually wrote one of the unrecognized classics in quantization. After I guess he had done his research, which included some work with Bose on the early development of some of the speaker systems that were subsequently so well commercialized, he essentially got involved with student affairs and became more of an administrator at MIT for many years. But early on he was active in research in signal processing and he was also my original undergraduate advisor. There was also Paul Gray. Not the Berkeley one, but the MIT one. No relation. But, he was in charge of the co-op program for the Naval Ordnance Lab when I first got involved in it. When I decided to do some things differently, including taking a half year off to travel, breaking my education in the middle, he was a profound influence on me. Bill Siebert was one of the great contributors to the early development of signal processing, especially in training lots of people like Larry Rabiner and Al Oppenheim. I should also add Al Drake, who taught probability to engineers.


Nebeker:

I know that you’ve had a lot to do with statistics in signal processing. Were you already interested in signal processing in those days?


Gray:

I guess I didn’t know the words. That became a title I think probably in the ‘70s to cover at least a substantial portion of what I did.


Nebeker:

Wasn’t the first course there in the late ‘60s somewhere at MIT?


Gray:

Oppenheim and Schaefer?


Nebeker:

Yes.


Gray:

I don’t know if that actually happened during the ‘60s. I didn’t take that course. The one I took was from Siebert, which was circuits, signals and systems. Their course could have started then, but I left MIT in ‘66, so my guess is that it really started after I had left.


Nebeker:

Okay. What did you do on completion of the cooperative program, the masters?


Gray:

Well, for the masters degree I worked mostly on sequential decoding, which I guess I’d call coding and information theory rather than signal processing, and I then decided to go elsewhere for graduate school, because my advisor, Irwin Jacobs, now well known for lots of other things, had suggested some problems to work on and unfortunately he went to a school that did not yet have a graduate school. UCSD was brand new. So I ended up going to USC, which allowed him to be on my dissertation committee, and spent three years there. and Jacobs basically found me a job at the Jet Propulsion Lab, where I worked during two summers. So that got me more involved in communications, phase-locked loops, more with sequential decoding, and the research led to work really more information theory than signal processing on source coding and rate distortion theory, the Shannon theory.


Nebeker:

Why did you choose USC?


Gray:

A variety of reasons. For one, I was able to teach a course in probability and random processes rather than being assigned a lab. I always preferred computers and theory to hardware. I just felt more comfortable with USC when I visited it, and I was able to continue to work with Jacobs, which at the time was something I very much wanted to do. So I guess the other thing was Zahrab Kaprilian, who was then the EE chair, and USC was in the process of expanding and improving its engineering school and department. He had actively recruited an outstanding faculty, and the year I was looking he was actively recruiting students. So he went to all of those schools he considered really good and basically telephoned a group of students he selected based on their admissions applications, and quite honestly that was extremely flattering. And it also provided a unique situation of bringing together a whole bunch of good competition from good schools and it was enjoyable. And it was a return to California.


Nebeker:

What professors did you work with at USC?


Gray:

My Ph.D. supervisor ended up being Bob Schultz, and his area was fairly distinct from mine. He was interested in communications and I think mostly synchronization codes in those days. On the other hand, he was willing to learn about source coding, which I was learning about, having been suggested by Jacobs. And I worked fairly well with him. It was an enjoyable collaboration, but technically I ended up working more with Lloyd Welch, who helped me out on some of the mathematical points. And as I had most of the EE courses from MIT already, I spent most of my course work in the math department. And there Tom Pitcher comes to mind, and Adriano Garcia, who was at JPL as well. And I spent a lot of time learning mathematics that is very important to signal processing.


Nebeker:

Wasn’t Richard Bellman there at the time?


Gray:

He was there, and I met him, but I didn’t work with him. He was certainly a presence there, but I think maybe the stronger influence was Solomon Goloub, and not at SC but at JPL was Gus Solomon, who was extremely entertaining. He was I think one of the funniest of all geniuses I have ever met. And he also had an impact. Bill Lindsay was there, from whom I learned a lot about phase-locked loops, with which I worked for awhile.


Nebeker:

I’ve come across a number of people like you in signal processing who have been very much interested in mathematics. Manfred Schroeder says he’s been a crypto-mathematician and Itakura said, you know, it’s always his hobby to read mathematics. And that sounds like something very early on you recognize that math would be valuable?


Gray:

Yes, for a variety of reasons. I think partially MIT, at least in those days, was not that great at teaching math. Engineers were sort of considered second class citizens, and except for those few like Rabiner who got into special courses, the math we had was dull and more mechanical. At USC not only was there all this time to take classes, there was also this attitude perceived by this bunch of engineers that Kaprilion brought in that the mathematicians looked down upon us. So it became a matter of honor as engineers to go in and beat them at their own game.


Nebeker:

I see.


Gray:

So we spent a great deal of time, and it was good stuff.


Nebeker:

And what was your dissertation?


Gray:

It dealt with rate distortion theory or source coding for auto regressive processes. This is a particular model of sources that has proved particularly useful in speech and in my case it was trying to evaluate Shannon functions for interesting sources with memory where it hadn’t been evaluated before. And, it involved in particular a branch of math that I learned about from the mathematics professors that I had called Toeplitz forms. And so it was a case where I was able to use some math that I was familiar with to get a new result. And I won’t say it was an application, because it was still Shannon theory, but it was at least one step closer.


Nebeker:

And has it proved useful in signal processing?


Gray:

Well, this particular result I would say provided some insight, and it was also at a time when Shannon source coding theory was getting increasing attention and importance. So I think it had a minor impact simply because the area was blossoming at the time, and it was at least a result that was dealing with models of information sources that were less trivial than the previous ones that had been treated. And it had a very interesting form, suggesting there were reasons for this unusual behavior, but I don’t think to this day anybody has really understood or generalized the results much past that. So it was I think of primarily theoretical import, and it at least disproved the claim that the Shannon results could only be obtained for memoryless sources, which would exclude interesting things like speech and images.


Nebeker:

Maybe I could at this point get you to comment on how close the interaction has been between information theory or coding theory and signal processing.


Gray:

I think it’s been pretty significant. I think it is not a coincidence that the two trace their birthdays to almost the same period. I think what we now call signal processing was profoundly influenced by Norbert Wiener, who was very much a contemporary of Claude Shannon. They both talked about a lot of things that were similar. I guess Shannon’s viewpoint was maybe more digital and Wiener’s was more analog. I think a lot of people have made the bridges between the two areas. Tom Kailath and Dave Forney come to mind in particular as they received their start in information theory but ending up doing things that are easily interpreted as signal processing. And a lot of information theory deals with algorithms, and I think classically signal processing is not always focused on how you go algorithmically from the continuous world of the discrete. Information theory has focused primarily on that frontier. And so I think with a lot of people nowadays it’s hard to distinguish one from the other. Traditionally information theory may have dealt more with performance bounds; how well you can do in the ideal situation.


Nebeker:

Right. The higher, the more abstract the theory.


Gray:

And signal processing has been a bit more concerned with the real world. But at least in things that I have dealt with, it’s usually been insight that comes from the information theory world. When you simply try and apply it to the specifics of a signal processing problem, speech or images, that forces you to find a way of accomplishing that goal, and then the information theory results give you some good guesses and insight and maybe some bounds to compare with to see how well you are doing.


Nebeker:

How does coding theory fit in this picture? Is it part of information theory?


Gray:

Coding theory. Information theory can claim much more than signal processing can claim. These become practical issues when you have separate IEEE societies fighting over territory for transactions. But coding, historically its main pushes came from the promises of information theory that you could do things vastly better than people have done by just doing ad hoc things, and thinking about things like the first error correcting codes. True, repetition codes probably pre-date Shannon, but the first real parity check codes, the first lossless source codes like Huffman coding. These all came out of the information theory community, mostly out of Bell Labs or MIT in the early days of the late 40s and 50s. Much of the development of coding theory until really this decade and maybe the end of the last one came out of information theory groups or organizations.


Nebeker:

Can coding theory be very abstract and not of much value to the signal processing engineer who has to figure out some way of coding something. Are there people doing coding theory, not the people who are actually figuring out how to code images or anything else?


Gray:

There may be a few people doing coding theory who are not connected to applications through the real world. But, I think by and a large as a field, if you look at the more famous of the code developers, people like Reed and Solomon, they were both either mathematicians or very mathematical engineers. Reed-Solomon codes are in virtually every CD. So the thing they developed is out there, and it’s everywhere. There have been a lot of cases where the mathematicians have come up with improved codes, and if they really were improved they got adopted and built. I don’t think it’s any different from signal processing history and Norbert Wiener who nobody understood at the start, but as they understood it they realized how useful it was and built things like matched filters and estimators and predictors.


Nebeker:

I’ve heard the complaint though that even within the signal processing, but applied here to this information theory/coding theory, that often academics you know develop a field in ways that maybe become further and further from the world of applications, and their work is less and less relevant to practical work.


Gray:

I won’t deny that, but I’d say they are digging their own grave when they do that, and they are not making their students terribly employable. I guess to show that there is the counter, I’d cite my own MS supervisor Irwin Jacobs, who went on to found Linkabit and then QualComm, and he and his wife recently gave something like $17 million to the USCD School of Engineering, which is now the Jacobs School of Engineering. And what he was doing when I was there was sequential decoding. And this was considered abstract by a lot of people. However at JPL, they realized that the only way to get pictures reliably from deep space to here was by sensible coding. Jacobs and Andy Viterbi commercialized these ideas, ideas like the Viterbi algorithm, which is essentially dynamic programming, which again a lot of people thought was abstract. So I think the people who are the most famous are the people who come up with the algorithms that really solve problems. And then sometimes, not always, they are the people who realize how good those algorithms are. The Viterbi algorithm, for example, was invented because he could not understand the sequential decoding algorithm of Wozencraft. So he came up with something simpler that he could understand and prove results for. And then it was other people, in particular Jim Omura and Dave Forney, who figured out what Viterbi was doing was actually optimal in a certain sense and practically had many advantages over the way it had been done before. So these were very bright people doing somewhat different things, but none of them got lost in abstraction and missed the connection to the real world.


Nebeker:

But it might be that these are the famous people whose result did have this great utility. In your experience with the field, does it seem that there are many people of a different sort who are sort of going off in abstraction directions?


Gray:

I don’t think there are many. Again, I think those who do that, most of them won’t last unless they’re really good at being abstract. Engineering departments pay better than math departments, and it’s pretty hard, if you have an engineering degree, to get a job in the math department even, if that’s what you want. At least I really like having the time just to fool around with the math and an idea that might or might not pay off. The danger is if that’s all you do and if that’s all you have your students do, you’re cloning yourself, which is a bad idea, creating your own competition, and you’re making it so that they’re not going to have an easy time fitting in. I think every student who works in compression for example should learn information theory, but I don’t encourage them to become professional Shannon theorists, unless it’s as a sideline. Because if they go into industry, or even into a university, it’s going to be JPEG and MPEG and ADPCM that they’re going to have to teach. And those are the tools they are going to have to give to most students. So they have to balance.


Nebeker:

And it sounds like in your view it’s not a great problem, that in most cases today there is good connection between them.


Gray:

I don’t think so. I think you could have gotten away with that maybe 30 years ago. I think it’s simply harder to get away with now. Funding is tougher. The funding agencies usually have to justify giving you money. Just because you’re working on a really fascinating problem isn’t enough. It’s got to at least hold promise for being able to do things better. And you can’t just say, “If I am successful, it’s going to improve the field.” You have to provide a convincing argument that if you can do this in fact it will have an impact on doing things better.


Nebeker:

Now you’ve been in both the information theory community and the signal processing community. Do you see that as like a continuum that you’re moving in or are there distinct communities there?


Gray:

I think they’re distinct communities, and there’s an overlap. My evolution is partially historical, maybe even largely historical, and most of my early career was involved with the Information Theory Society as well as with information theory. My change really came probably early ‘80s. As I changed technically to focus more and more on first speech and then later images, I didn’t drop information theory entirely, but I spent more time with people doing signal processing and got more involved with the Signal Processing Society.


Nebeker:

Why don’t we go back to your dissertation? And I don’t know, maybe we’d finished that. But you said you then got a job at JPL?


Gray:

That was while I was a student. I went straight out of USC into Stanford. Basically Tom Cover was a professor a Stanford who had been a roommate of Bob Schultz, my dissertation supervisor, so I found out about the position early and went up and interviewed and spent most of my time talking with Tom Kailath and Tom Cover. Kailath was already becoming as much signal processing as information theory. Cover was very definitely information theory. I had two offers and what sent me to Stanford was that if I went to the other offer I would be sort of in the shadow of somebody doing source coding and compression, and if I went to Stanford I was on my own. And I guess Stanford offered me a little bit more money, but it was really more the fact that Stanford would be I think more of a challenge and, I liked the idea of staying in California.


Nebeker:

Do you want to say what the other offer was?


Gray:

It was Cornell. My dissertation ended up having a very strong overlap with the dissertation of about a year earlier, that of Toby Berger. And so I knew of him and he knew of me, and because of the similarity with work that had led to an offer at Cornell. And in fact after I went to Stanford we spent either the first or second summer visiting Cornell for a summer. And I liked it there, but I liked Stanford better.


Nebeker:

I’m curious about the JPL job. Would you tell me about that?


Gray:

Well that was, as I mentioned, for two summers. I loved JPL. I didn’t like living in the Pasadena/Alta Dena area. So I ended up making a special deal whereby I could work long days four days a week and then spend three days out on the coast, and I worked a shared housing arrangement so that I lived on somebody’s couch four days a week. He lived on our couch the other three. And it was a very good group. It was working with Bob Tauseworth. As I mentioned, my first summer there was also Gus Solomon’s first period there. Andy Viterbi was there and so was Bill Lindsay. It was in the telecommunications area, and it was working on analysis of tracking loops, which was almost control theory, and on sequential decoding and on phase-locked loops. It would have been a great place to work except you could not see the mountains across the street from the smog. So I had no desire to stay there.


Nebeker:

But that again gave you some contact with actual systems.


Gray:

Yes, I did an analysis. I think it was the Mariner V tracking loop.


Nebeker:

I see that here in your list of publications. There's another publication on analysis of the effect of input noise on a VCO.


Gray:

There is a bizarre kind of random processor noise model called flicker noise, or 1/f noise, and I got involved with the analysis of that, which was nice and mathematical but it was a noise that very well fit something that crops up in phase-locked loops. And because it has some bizarre properties, some standard analysis techniques did not work for it. I got involved with that through Bob Tauseworth, who was I think my immediate supervisor, and that led to one of my first publications, an analysis of that appeared in the communications transactions. Probably even before the paper in the information theory transactions based on my Ph.D. work.


Nebeker:

Now you said that you thought of yourself as being in information theory and coding theory in this period?


Gray:

And maybe communications also.


Nebeker:

Well, I’m just looking at your early publications there. This is now a few years later, but this paper with Davisson in source coding without the ergodic assumption.


Gray:

Yes.


Nebeker:

Evidently it was an important paper.


Gray:

I would probably say that was part of my mathematical lunatic fringe. It was a paper that provided a generalization of Shannon theory that I think was interesting for a couple of reasons. One was just doing things more generally is nice. But the second and probably more important reason is that is it was one of the first examples of what later was referred to as a universal code. That is, these were codes designed when you don’t really know what the source you are looking at is, and so you have to have essentially a family of codes and somehow pick the best one on the fly. Although we didn’t know it at the time, the construction we had come up with to handle these very abstract things actually was a very good model for something that is practically important. And when we realized that, that led to some other work on universal coding. The paper though was unabashedly mathematical and the fact that we had generalized the Shannon theory to that is of debatable significance to the universe. But it won an Information Theory Society paper prize, which of course both of us very much appreciated.


Nebeker:

I see you have quite a few publications with Davisson.


Gray:

Yes.


Nebeker:

What’s the story there?


Gray:

Well, you actually fill in a gap I forgot before. One of the reasons I went to USC was that they had hired Lee Davisson as a professor. He was probably one of the most famous people in compression, which was an area that I thought might be kind of fun to work in, because of conversations with Irwin Jacobs. However when I got to USC it turned out he decided to stay at Princeton. And so I did not have the advantage of working with him. He subsequently went to USC. So when I went to Stanford, he went to USC. I don’t remember how exactly we got started working together, but I visited and we may have overlapped a little bit, and we found out we had common interests in quantization and in compression. So we started collaborating technically, both on research and papers and on a textbook for random processes. We became very close friends over the years.


Nebeker:

Even though he was in Southern California and you were up at Stanford?


Gray:

Yes. Because southern California wasn’t that far away. And he visited Stanford a fair number of times and I visited southern California. And it was mostly also on the mathematical side that my interests in a branch of mathematics called ergodic theory, which is what led to the paper that you cited, very much coincided with his interest to do some more mathematical things. He felt like he had been overdosing on the practical side. So it was a very nice just match of technical and I think social interests.


Nebeker:

It does look like a lot of these early publications are coding theory.


Gray:

Yes.


Nebeker:

Non-block source coding, sliding block source coding. Can you give me some kind of an overview of all this early coding theory work?


Gray:

Well, I think the main idea was that Shannon had proved these performance bounds on how well communication systems can do in an ideal situation. And he had made a lot of assumptions, assumptions about the behavior of the data sources that you are trying to compress. In particular, most of what he had done had been for memoryless sources, when the most interesting things in the real world, like speech, have lots of memory. And there are lots of codes that are not the kind he considered. He considered block codes, and there are codes that work in a more sliding, continuous fashion. Plus he had also made a lot of assumptions about the nature of the communications channel, such as knowing where a block started and ended; that is, synchronization.


The initial thing that got Davisson and me interested in this was first discovering that some of his assumptions could be generalized using techniques from ergodic theory, which interestingly enough had undergone a profound change a few years earlier because Shannon’s work had been discovered to be very important for mathematics. And so, if you like, the mathematics derived from engineering could now come back into at least pseudo-engineering. And so we got very interested in generalizing the Shannon results to sources with very complicated memory, and synchronization problems, and codes that as I mentioned have this sliding structure, as for example PCM, DPCM, ADPCM and delta modulation. These are not best modeled, I think, by block codes, but rather by sliding-block codes.
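
As an illustrative aside (not part of the interview): a delta modulator is about the simplest example of the kind of sliding, non-block code mentioned above, producing one bit per input sample with no block boundaries or synchronization words. The sketch below, in Python, is a minimal reconstruction under the usual textbook formulation; the function names and the fixed step size are the editor’s assumptions.

<pre>
# Minimal sketch (editor's illustration): a delta modulator as an example of
# a sliding, sample-by-sample code, in contrast to a block code that chops
# the source into fixed-length blocks.

def delta_modulate(samples, step=0.1):
    """Encode a sequence of real samples into +/-1 bits, one bit per sample."""
    bits = []
    estimate = 0.0
    for x in samples:
        bit = 1 if x >= estimate else -1   # compare input to the running estimate
        bits.append(bit)
        estimate += step * bit             # the decoder can track the same estimate
    return bits

def delta_demodulate(bits, step=0.1):
    """Reconstruct the staircase approximation from the bit stream."""
    estimate = 0.0
    out = []
    for bit in bits:
        estimate += step * bit
        out.append(estimate)
    return out

if __name__ == "__main__":
    import math
    source = [math.sin(0.05 * n) for n in range(200)]   # a toy "analog" signal
    bits = delta_modulate(source)
    recon = delta_demodulate(bits)
    # The reconstruction follows the source with a staircase lag; decoding can
    # start anywhere in the stream, which is the sliding-code flavor.
</pre>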


Nebeker:

Now I know that it’s in fact a very common thing for mathematicians to try to generalize results and find out what it really depends upon and make it as general as possible, but it sounds like your work here was with at least an eye to what was going on in real systems.


Gray:

I think that’s fair, although I have to admit mathematicians often like to generalize. With the Shannon results, what happened is the engineers had gone way beyond anything Shannon had ever proved a theorem for, and yet everybody seemed to assume the Shannon stuff should apply. So it was more like a clean-up operation, you know, the guy with the scoop following the circus: the idea was to get Shannon theory a little bit closer to using the kinds of codes, and being applied to the kinds of signals, that in fact were being used. For example, Shannon assumed that you had an exact statistical description of the information source you want to compress or code, and rarely is that the case in the real world. So you want to be able to have an algorithm that works when you come in cold. What you have to do is first look at the data, build a model, then build your code for the model, and know somehow that if nature is nice and there really is a nice underlying statistical model, your algorithm will essentially approximate it well, and then the code will work fairly well not just for your model, but also for that underlying thing which the powers that be did not give you on a platter. You had to do some guessing.


Nebeker:

I’m just trying to get some picture of this. Is ergodic theory, is that statistics?


Gray:

Ergodic theory I think is properly considered a branch of measure theory, which is really a superset of probability theory; that is, one way to think about it is as a branch of probability theory and of random processes. It looks at the long-term average behavior of things. So for example you can consider it ergodic theory that if I flip a coin forever and count the number of heads, and that coin is really fair, I ought to get something like fifty percent heads. So one of the aspects of ergodic theory is trying to understand when you can connect these abstract probabilities you compute with the real-world measurements of long-term behavior that you make over a long period of time. It tries to understand when those things work, and, when they don’t work, whether you can patch up the theory to find something that does work.


Nebeker:

Okay.


Gray:

For example, if I tell you this is a fair coin and I flip it for a long time, you know roughly how it’s going to behave. But if I now have a couple of coins behind my back and they’ve each got a different amount of solder on them so they are not fair, you can’t predict quite so easily what’s going to happen, but you can say a lot about it if you think about it, and that’s part of what ergodic theory tries to do.
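
As an illustrative aside (not part of the interview): the coin example can be made concrete in a few lines. The sketch below simply simulates coin flips and shows the long-run fraction of heads settling near the true bias, the simplest instance of the ergodic behavior described above; the bias values and sample sizes are arbitrary choices by the editor.

<pre>
# Minimal sketch (editor's illustration): long-run relative frequencies
# converging to the underlying probability, the kind of statement ergodic
# theory makes precise for much more general processes than coin flips.

import random

def head_fraction(bias, flips):
    """Flip a (possibly unfair) coin 'flips' times; return the fraction of heads."""
    heads = sum(1 for _ in range(flips) if random.random() < bias)
    return heads / flips

if __name__ == "__main__":
    for bias in (0.5, 0.7):                       # a fair coin and a "soldered" coin
        for flips in (100, 10_000, 1_000_000):
            print(bias, flips, round(head_fraction(bias, flips), 4))
    # As the number of flips grows, the printed fractions cluster ever more
    # tightly around the true bias: the sample average converges to the
    # probability, which is the law of large numbers / simplest ergodic theorem.
</pre>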


Nebeker:

And was this a well developed branch of mathematics whose results you were trying to apply to these coding problems?


Gray:

Yes, and fairly recently developed. Shannon’s ‘49 paper was picked up by Russians who in particular used the idea of entropy to understand a lot more about when ergodic theoretic results apply. Also, a lot of the ergodic theoretic results are essentially proved by coding arguments of one process into another, and there are questions like, “When can I code this process into that one and then back again?” And it turned out the Shannon ideas were very important. This was mostly due to Kolmogorov and Sinai, who realized that Shannon’s ideas had an impact. Then it was a mathematician named Don Ornstein, who happened to be at Stanford, who tied a lot of those results up into almost final form. And he had only recently done that in the early ‘70s when I started getting involved with this. While Ornstein was very abstract and I could not follow him easily, there was also another mathematician, Paul Shields, visiting Stanford at that time. My then student Dave Neuhoff, who is here at ICASSP and wrote this quantization paper with me for the 50th anniversary issue of the IEEE Transactions on Information Theory, and I became disciples of Shields, who translated Ornstein. And we learned about all of these tools, techniques and different ways of coding.


Nebeker:

Was Ornstein in the EE department?


Gray:

No, he was in the mathematics department. And he had won one of the major mathematics prizes, so he is, I think, one of the most famous mathematicians in the world. And he was positively amused by the fact that here was this engineering professor and student trying hard to understand his techniques. I can’t help but mention that his explanation was that all of these new coding techniques he had thought up, he had thought up because basically he was too lazy to read Shannon’s original papers and see how Shannon had done it. So he came up with alternative constructions, which we later called sliding-block codes. And so that’s what I guess was our main input at the time. I spent most of the ‘70s proving coding theorems, which is really Shannon theory.


Nebeker:

Okay.


Nebeker:

Was Dobrushin the other mathematician you mentioned?


Gray:

Yes, he was the other. Well, there was Paul Shields, who was probably the one I mentioned first. He was a mathematician very interested in learning engineering. He worked with Dave Neuhoff, my student, for many years after our collaboration ended. And he is still very active in the information theory side of things. Shields wrote one of the nicer early books on ergodic theory when the Ornstein results were new, and as I said, it was very difficult for engineers to understand Ornstein’s writings. Shields sort of translated them for the masses. Dobrushin, I think, was one of the primary Russian, Soviet developers of information theory once Kolmogorov made that whole school aware of Shannon’s work. He was probably one of the best of the mathematicians working in information theory. He died a year or two ago.


Nebeker:

Now, in information theory, defined I suppose by these journals that have such a name in their title, are those people typically in EE departments?


Gray:

In the United States they have historically mostly been in EE departments. Shannon was in EE, and he was at MIT in EE. And the early days of information theory were mostly the MIT department of electrical engineering and Bell Labs. At Bell Labs it was the mathematics research group, but many of those mathematicians were engineers by training. Since then it has expanded, but in the U.S. I think it’s still mostly in EE. But in the rest of the world you will find it in statistics departments, sometimes in computer science departments, sometimes in math departments.


Nebeker:

There are some fields that end up with people who were trained in other fields. I mean, physicists notoriously end up trying to do signal processing. But I was just wondering if there were many mathematicians who end up in information theory?


Gray:

Very few. I think in fact Paul Shields is one of the few to come to mind. John Kieffer is another. People might disagree with me, but I think if you are a mathematician trying to get an understanding of the engineering applications, which is the point of this whole thing, it takes a fair amount of work. The two cultures are very different, and when I was working with Neuhoff and Shields it probably took us a year to communicate, because we were using completely different languages to discuss common things. But the payoffs were great. They found tools they hadn’t realized existed for their problems. We found tools and methods of proving things that allowed us to generalize results we wanted to generalize.


Nebeker:

Do you think that that’s something that’s under exploited, this, that there are two communities that maybe don’t have as good communication as there might be?


Gray:

I think that that’s a chronic problem: engineers and mathematicians often do not talk to each other enough. I think just generally, professionally, there is a problem of myopia, and I think we academics have a duty to try to kick our students out of that kind of complacency and force them into taking, gobbling up, classes in other departments to find out what they’re doing and what might be relevant.


Nebeker:

I was surprised to see that many of these papers in this coding or Shannon theory were published in the Annals of Probability, or there were one or two others that looked like mathematics journals?


Gray:

Well, for me that was great. I mean, it was sort of like coming out of the closet as at least partially a mathematician, and also proving that some of these problems that arise in engineering applications are of genuine mathematical interest and really have some intrinsic interest even apart from applications. The things that were aimed at journals like the Annals were mostly of mathematical interest, although other papers sent to engineering journals were then heavily derivative. That is, they could build on these other papers, referring to them for the mathematics that would probably drive some of our colleagues nuts. But I do know people in information theory who look down on anything they consider to be real analysis, because that’s mathematics and engineers don’t need to worry about that. This way I don’t have to argue with them. We publish what they probably think of as mathematics in the mathematical journals. My favorite paper was the one written with Dobrushin and Ornstein, which was in a mathematics journal that most of my colleagues and friends have never heard of, but it probably took me more work than any other three or four papers I ever wrote.


Nebeker:

Can you find that for me quickly?


Gray:

I can, I think. It was one of the ones in the Annals of Probability, in 1980.


Gray:

That was the same year that the paper for which I am probably best known, the one with Yoseph Linde and Andrés Buzo, came out, if memory serves. And that one all my friends know about. But only a select few knew about the other.


Nebeker:

I see. Well, I’ll say Dobrushin was 35 on this listing, and the Linde-Buzo-Gray is 32.


Gray:

Okay. That was a good year.


Nebeker:

Would you tell me a little about that algorithm for vector quantization?


Gray:

That one has a bit of a history, as often happens with results like this. After we do something, we discover there were lots of other things in the literature very close to it. So what I think I can say off the bat is we can take credit for popularizing it and, I think, contributing to the idea. The basic idea was born when I had two students with me, Andrés Buzo and Yoseph Linde, and up until that time my interests had been primarily in information theory, proving theorems. My bias had started to change around the middle ‘70s when I got involved with work with my brother Steen, or A. H. Gray, Jr., who was interested in speech. Forgive the long introduction, but I think it’s necessary to explain how I got there. Steen, knowing I was supposedly an expert in the theory of quantization, asked me some questions about optimal quantization, because he and his co-worker, John Markel, had become very puzzled by the fact that every conference they went to in signal processing would have umpteen papers titled optimal quantization of reflection coefficients or of LPC coefficients, to which their thought was, “Well, they can’t all be optimal. What really is optimal here?” He had come to me with some questions which got us involved in some work looking at things that were optimal in a certain sense. Not exactly Shannon, but relating to what is sometimes called the Bennett theory of asymptotic quantization. The point of all of this is that I had started worrying about not just Shannon theory but how these things actually worked for speech. Well, about that time one of the students, Yoseph Linde, had gotten involved with actually evaluating how well certain schemes worked and came up with an algorithm which subsequently proved to be due to Stu Lloyd from the 1950s at Bell Labs. We didn’t know it at that time.


Nebeker:

He independently came up with this?


Gray:

He had independently come up with it as just a computationally fast way to design a good scalar quantizer. And then I started dabbling with what Shannon had to say about this and looking at a multidimensional generalization of it, but didn’t really get very far or think about how it would actually work or how it would actually apply. And Buzo was the one who really cut the Gordian knot when he realized that you could take this algorithm, which Linde had programmed for scalars and for which I had written down some lines for vectors, and put it together with an idea that was implicit in the work of Itakura and Saito. In fact, they had come up with this idea of a quality metric or a distortion measure for speech. I’m not sure, but I think that Buzo, Linde and I can claim credit for being the first to call it the Itakura-Saito distortion measure.


It was clear it should be named after them because they were the first ones who did it, and then we just put that all together to design a speech coding system at very low bit rates. We were at first very discouraged by it because it sounded awful. We first described this work, which we thought was only of theoretical interest, at an information theory, not a signal processing, workshop that was held at Lake Tahoe. And my brother, who was the speech expert, came up and was also giving a talk about LPC coding and trying to interest the information theorists in signal processing. And what he discovered was that the data that he had sent us to try out was garbage. That is, they had sent us really lousy stuff. And our compressed stuff sounded lousy, but very much like the lousy stuff that they had given us, and we realized that there was hope. And when we eventually got good clean data, we were able to come up with a system that wasn’t great, but it was taking something like 2400 bit per second LPC speech and knocking it down to something like 800 bits per second and having it at least intelligible, and certainly not that bad.


Linde left the field completely when he graduated and went on and founded a chip making company. Buzo went back to Mexico and continued to do very good work in speech coding, speech compression and speech recognition. And that paper I think stirred up a bunch of work. Now I have a better historical appreciation for it, which Neuhoff and I tried to put in that survey paper that I mentioned to you before, and now we can clearly see the precursors to that work. And also I think a lot of people realized around that time that it was an intimate cousin to something that had developed in statistics called the k-means algorithm. I’d still argue there were differences, and part of what we did was quite distinct. That is, you need something like a bootstrap to get this whole thing started, because it needs an initial starting place, a code book. And one of the things we did was to provide a means of designing a very low bit rate code for speech based on LPC ideas from scratch. You just run the algorithm and it designs the code.
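
As an illustrative aside (not part of the interview): the design loop described above — partition the training data into nearest-codeword cells, replace each codeword by the centroid of its cell, and repeat, with a splitting step playing the bootstrap role of providing an initial codebook — can be sketched roughly as follows. This uses plain squared-error distortion on toy data; the actual speech work used LPC models and the Itakura-Saito distortion, and all names and parameters here are the editor’s assumptions.

<pre>
# Minimal sketch (editor's illustration) of a generalized Lloyd / LBG-style
# vector quantizer design under squared-error distortion.

import numpy as np

def design_codebook(training, size, iters=20, eps=1e-3):
    """training: (N, d) array of training vectors; size: desired codebook size."""
    codebook = training.mean(axis=0, keepdims=True)        # start with one codeword: the centroid
    while len(codebook) < size:
        # split each codeword into two slightly perturbed copies ("splitting" bootstrap)
        codebook = np.vstack([codebook + eps, codebook - eps])
        for _ in range(iters):                             # Lloyd iterations
            # nearest-neighbor partition of the training set
            d2 = ((training[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
            cells = d2.argmin(axis=1)
            # replace each codeword by the centroid of its cell
            for k in range(len(codebook)):
                members = training[cells == k]
                if len(members) > 0:
                    codebook[k] = members.mean(axis=0)
    return codebook

def quantize(vectors, codebook):
    """Map each input vector to the index of its nearest codeword."""
    d2 = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train = rng.normal(size=(2000, 2))                     # toy 2-D training data
    cb = design_codebook(train, size=8)
    idx = quantize(train[:5], cb)                          # the indices are what get transmitted
</pre>

The k-means connection Gray mentions is visible here: the inner loop is essentially k-means clustering, while the splitting step is the from-scratch codebook initialization he describes.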


Nebeker:

Was this how you got interested in speech?


Gray:

Well, that was really the story I told before. It was when my brother got me interested in quantization and we ended up writing a paper called “Optimal Quantizations of Reflection Coefficients.” That was when I first got interested in speech as a guinea pig, but that’s why Buzo was working actively on speech and actually tried this LPC vector quantizer, and Buzo spent some time working with my brother and John Markel at Signal Technology Incorporated, which was, I think, one of the main places for the development of the LPC revolution which Itakura started.


Nebeker:

I see. And then there was this, the Buzo-Gray-Gray-Markel paper, “Speech Coding Based on Vector Quantization.”


Gray:

That was simply taking the embryonic ideas and working out the speech application more thoroughly. The first paper had a more controversial time working through the reviewers. I think, as Adelson loves to say of the classic pyramid coding paper by Adelson and Burt, the reviewer said this will never go anywhere. We had some similar problems with our paper. On the other hand, the one specifically aimed at speech, doing lots of things that the original Transactions on Communications paper did not do, got a much friendlier reception at ICASSP. Both ICASSP and ASSP made me notice the Signal Processing Society as being somewhat more friendly to algorithms of this type.


Nebeker:

And is this the period when you started moving more toward things closer to application?


Gray:

Yes. I think from then on I was involved first with speech for several years, and then eventually with images. The movement into images really happened in the late ‘80s, from a different set of circumstances.


Nebeker:

Yes. I certainly want to hear about that. Before we leave speech though, is there some nice summary you can give me of these dozen or so papers that seem to have come out around ‘80?


Gray:

Part of it is completely non-technical. It was a rare experience I think to get to collaborate with a brother. And I know of very few sibling collaborations like that. I might add that was the time I rekindled my interest in amateur radio. We did a lot of discussing of this work on the amateur radio bands, which are of course quite public and not private. We would often get people running in and saying, “This sounds neat! What are you talking about?”


Nebeker:

Explain it all to us.


Gray:

It was a great collaboration in that it got my students temporary jobs and sometimes longer-term jobs. The earlier ones worked on more primitive systems, and then as these things got fine-tuned by the people down at Signal Technology and elsewhere, they got involved in systems that I think subsequently proved to be quite practical. Embryonic versions of things like code-excited linear prediction were done by that time by some very bright students. That work involved very little information theory by that time, but you could call it either signal processing or communications. Several of the works were published in Transactions.


Nebeker:

I noticed that.


Gray:

As they took more and more advantage of ideas that are usually considered signal processing, they ended up migrating there, and there was at least one off-shoot, which was theoretical, which I really like, which ended up in the Information Theory Transactions, and one also in the Transactions on Pattern Analysis and Machine Intelligence, PAMI. But basically what we found, and this I can easily describe in English, was that the Itakura-Saito distortion measure, which they had argued for on statistical and speech-modeling grounds, ended up having exactly the same form as a classical result from information theory that appeared in Pinsker's very abstract book on information measures. And understanding that result, why those two things ended up being the same, led to a fair amount of work on our part that I really liked. I can’t say it’s had a great impact, but it was great for insight into understanding the Itakura-Saito distortion measure's origins and for finding alternative ways of deriving all of this theory that, to me at least, involved fewer unpleasant assumptions about speech. The most common assumption was that speech is a Gaussian process, which has always been controversial, and I think most people involved with speech would deny it completely. The other way of looking at things allowed you to view speech as not Gaussian at all, but simply as what happens if you try to synthesize speech by something that is Gaussian. So what you are trying to do is take a mathematical model you can build with a computer and have it sound like you. And that doesn’t bother me at all, because that’s what people do. And so it basically was theoretical, it was appropriate for information theory, and that’s where it went.
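
For readers who have not seen it, the Itakura-Saito distortion measure discussed above is conventionally written, for a speech power spectrum <math>S(\omega)</math> and a model spectrum <math>\hat{S}(\omega)</math>, as

<math> d_{IS}(S,\hat{S}) \;=\; \frac{1}{2\pi}\int_{-\pi}^{\pi}\left[\frac{S(\omega)}{\hat{S}(\omega)} \;-\; \ln\frac{S(\omega)}{\hat{S}(\omega)} \;-\; 1\right]d\omega. </math>

The connection alluded to above is that essentially the same integrand appears in the relative entropy (Kullback-Leibler) rate between two stationary Gaussian processes with those spectra, the kind of information measure treated in Pinsker’s book. This is a standard statement added here for context, not a quotation from the interview.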


Nebeker:

Maybe this is a time to ask you about the IEEE Society structure and these various Transactions. You’ve published in a number of different Transactions, the SIAM journal and so on.


Gray:

Yes.


Nebeker:

How do you see this? Is it organized as well as it could be? Is it inevitable that one is going to have to go to different types of publication?


Gray:

It’s a good question. I should admit one bias I’ve got is that I was involved editorially with the IEEE Transactions on Information Theory, first as associate editor and then as what we now call editor-in-chief, but then it was just called editor. So I think I paid my dues on that, and we tried very hard to keep that a very high quality and well edited journal. When we talk about how things have evolved and changed, first I think having lots of journals with overlapping areas is a good thing. A lot of my changes were because of the types of applications. The boundaries are not always clear, and if [inaudible] so long as it is clear that you pick a target and go for it; it is never fair to take multiple targets. And you have to do some thinking about what the best place for the paper is. What I’d gripe about is that I think there has been less editorial control over the years as too many of these journals got too large, so that now you can have bad English appearing in some Transactions, which in those days didn’t happen. In those days the Transactions people edited: the associate editors edited, the editors really edited, and we were sensitive to authors who didn’t like those changes. And the philosophy was that the more the people in the know, the technical people, us, edited the paper, the less effect the IEEE editorial staff would have.


We didn’t like the idea of technical editors who didn’t understand the meaning changing what went on. So long as they were all good journals, you could decide which one you thought was the best, and they did have differing reputations in differing communities. I think it’s safe to say that if what you did was primarily of mathematical import, but it was aimed at engineers, then the Information Theory Transactions was in some areas the best place to send it. If you wanted to hit the applications people, Communications or the ASSP Transactions was better, though even then the ASSP Transactions did have some theoretical papers, I think fewer of them. The IT Transactions were the ones that took most of the hits about maybe being too theoretical or too abstract.


Nebeker:

What about the stuff in the Communications Society that was related to this?


Gray:

I felt more comfortable in the Communications Society years ago than I do now, I think partially because they have, probably out of necessity, evolved so heavily into networks, about which I am less enthusiastic and knowledgeable. I fit in there less well. I am not really a wireless person. I mean, I knew about radio and use it, but they have branched into a lot of topics that were of more interest to their audience but are of less interest to me. So as things in speech coding and image coding got to be of less interest to them, I migrated elsewhere.


Nebeker:

But do you think that it’s worked fairly well, let’s say for the moment just within IEEE, as to how papers end up appearing in different places?


Gray:

Mostly. I’m a little concerned now, because I see turf battles that I don’t remember before. I got very concerned when, for example, during a recent reorganization in signal processing they created a technical committee on signal processing and communications with many subsets, pieces that sounded really more appropriate for a communications or even information theory Transactions. And I question, for example here at ICIP, is this really the place to be having sessions on lossless compression? Because lossless compression is historically communications and information theory, and for the last few years at least 50 percent computer science. So it has several venues, several conferences; does it need one more here?


Nebeker:

Yes.


Gray:

And my hope is the answer to that is yes, provided it’s very specific to images. That’s something the people making the editorial decisions and handling the papers that are coming in have to worry about.


Nebeker:

Well and these decisions are slowly, gradually shaping the fields.


Gray:

Yes.


Nebeker:

I mean, if it turns out that there is a substantial body of theory, maybe that will emerge as a separate community and these application areas out there, maybe it will never build to that amount and remain in these scattered application areas.


Gray:

And the bottom line is quality. I mean if you risk diluting the area with too many conferences or too many societies, the quality of the papers that do appear may diminish. And that’s just something I think we have to be paranoid about.


Nebeker:

Well, if you look simply at the number of pages published by the IEEE Transactions, it has gone way up.


Gray:

Yes.


Nebeker:

Do you think this is a mistake? Should we be exerting greater editorial control over what comes out in Transactions, having fewer of them say?


Gray:

That’s a dangerous question. My personal feeling is we’re probably publishing more pages than we need to. I just tend to suspect that they can’t all be as high quality as they historically have been. I think that every society has to be very careful, dragging its feet about expanding the size of its journals, to be really sure that the stuff it is publishing is really up to snuff. When you have new conferences on the block, as ICIP still is, it’s fair to be forgiving, sympathetic, generous to get people here. So I don’t think, for example, the papers that appear at ICIP are going to be as solid as they are at ICASSP, simply because ICASSP has more of a tradition; it has an established community. ICIP has to grow up to that point. What worries me is if now, all of a sudden, we have an international conference in multimedia signal processing or a Transactions in that area, is that going to bleed off a young Transactions like ours and a young conference like ours, with the end result that you can’t really have two where before there was one? I would like to see extreme caution exercised when creating new entities that are going to compete with the entities you have.


Nebeker:

Yes.


Gray:

It is clear that we had to spawn some new areas, because ICASSP could not handle it all. And I am convinced by the case for ICIP, and I think ICIP is getting steadily better. But I don’t want to see a new society spawned every two or three years. We can’t manage it.


Nebeker:

Could I ask now about how you got into image processing?


Gray:

Okay. That I guess is another story where there were two influences. One was that at the time speech funding for students was diminishing. As a professor, the number of students I have is doubly influenced by funding: I can only have those that are fully funded on their own and choose what they want to do, or those that I can find funding for. It was clear that the funding for the speech side of things was going down; people of mine who were graduating, having done Ph.D. theses in speech work, were turning into networks people at Bell Labs and other places. So in some ways the writing on the wall not to encourage more speech work was there.


The other thing was that I had students who were getting very interested in image processing, but who didn’t need funding. If memory serves, it was really Rich Baker who, being an entrepreneur by birth I think, had pretty much found his own funding, wanted to try doing all of this stuff in image processing, and developed what at the time we thought was the first vector quantization system for images. We later discovered that we had been scooped elsewhere by my perpetual colleague and cohort Allen Gersho, and also this time by a group I was to get to know pretty well in Japan. On the other hand, we had done a few things differently, and I like to think somewhat better, and it was clear that it was something that was within our capability of doing, and it sort of proved that I had been wrong in thinking we couldn’t do image processing because we couldn’t afford the equipment. Speech had seemed better because we could afford what we needed, and it looked like for image processing you needed a lot of stuff we didn’t have. Baker basically got a used Stanford Television Network TV and started working on a PC, and he was designing codes and doing compression, very nicely, with that equipment, which he traveled with because part of his Ph.D. was done while he traveled with his wife so she could finish her degree and while he was visiting other places, including Utah. I think he had an impact on subsequent students. I think it’s fair to say the main people who got the group into image coding were Rich Baker and Eve Riskin. She came to our group out of MIT, took a year to work, and then when she came back got interested mostly in image vector quantization, thanks to Rich Baker. She was profoundly influenced by Phil Chou and Tom Lookabaugh, who had been the pinnacle of our speech work, and she was interested in medical images.


Nebeker:

I saw that here.


Gray:

And so that started a long period of work, with my group involved with radiologists, looking at compressing medical images and, perhaps more strangely, evaluating the quality of medical images that have been compressed, or actually that have been operated on in any way by a computer.


Nebeker:

Well, that’s obviously a very important, essential thing to be able to do.


Gray:

That led to several years of work that got us involved much more with statistics, actually involved with clinical testing. I think the work has had an impact on digital mammography.


Nebeker:

So that the evaluation actually sort of went to the clinical level?


Gray:

Yes.


Nebeker:

The evaluation of how good these images are?


Gray:

Because that’s the only way you’ll ever sell it to that community. The feeling of a lot of engineers was that doctors would never accept lossy compressed images because of the thought of lawsuits.


Nebeker:

Yes.


Gray:

Starting with Eve and up until fairly recent times, we got involved in the statistical design of clinical experiments, not just of lossy compression, but of simply digital images as opposed to analog images. And we’re currently involved with the first submitted application for a fully digital mammography machine, trying to convince the FDA that it’s as good as analog mammography. That was a derivative work which I would say had signal processing origins, because the statistics we are using are very much like the statistics that are used in statistical signal processing, but it was as much a realization that you have to convince the community you want to have adopt this technology, and that community is not even yet convinced that digital X-rays are a good thing.


Nebeker:

I wanted to ask about this transfer of knowledge, techniques, and so on from speech signal processing to image processing.


Gray:

I guess one way to think of it is that we were among the first multimedia researchers. Our basis was compressing anybody’s signal, so this was a move from speech to images. Parts of the theory and application did not generalize; for example, the right generalization to image coding of LPC or Itakura-Saito PARCOR coding has not been developed. Other parts did generalize. The common issue is how you do a good job of compressing whatever signal it is you’re looking at, and that involves many approaches. Our favorite approach was to look at a bunch of sample data and learn from that; the data designs your codes. Lots of techniques aim at avoiding the necessity for a learning set, but usually they are making implicit assumptions about how the data are going to behave. The underlying theory extends; it’s just that you’re dealing with a different kind of stuff. The statistics are going to differ. Which codes work well and which ones work badly, that’s going to differ. People may say that images are two-dimensional and speech signals are one-dimensional, but I don’t buy that, because the whole LPC approach is really a vector approach. You model a chunk of 50 milliseconds or more of speech as a linear filter, which is 10 or so coefficients plus the driving signal. So the idea of coding vectors is relevant for speech; you’re already at 10 or 12 dimensions or more. Images may be two-dimensional, but again we like to group pixels together and code vectors. That’s what JPEG does: 8x8, 64-dimensional vectors. So it’s all very much in the spirit of multidimensional signal processing. It’s just that the signals are different.
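
As an illustrative aside (not part of the interview): the “group pixels into vectors” step is easy to see in code. The sketch below just reshapes a grayscale image into non-overlapping blocks so that each block can be handed as one vector to a vector quantizer (for example, a codebook designed with the LBG-style sketch given earlier); the block size and all names are the editor’s choices.

<pre>
# Minimal sketch (editor's illustration): turning an image into a set of
# block vectors, the same grouping step that 8x8-block coders and image
# vector quantizers rely on before any actual coding is done.

import numpy as np

def image_to_blocks(image, b=8):
    """Split a 2-D grayscale image into non-overlapping b x b blocks,
    returned as an (num_blocks, b*b) array of vectors."""
    h, w = image.shape
    h, w = h - h % b, w - w % b               # crop so the dimensions divide evenly
    img = image[:h, :w]
    blocks = (img.reshape(h // b, b, w // b, b)
                 .swapaxes(1, 2)
                 .reshape(-1, b * b))
    return blocks

def blocks_to_image(blocks, h, w, b=8):
    """Inverse of image_to_blocks for an h x w image (h, w multiples of b)."""
    return (blocks.reshape(h // b, w // b, b, b)
                  .swapaxes(1, 2)
                  .reshape(h, w))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.integers(0, 256, size=(64, 96)).astype(float)
    vecs = image_to_blocks(img, b=8)          # 96 vectors of dimension 64
    back = blocks_to_image(vecs, 64, 96, b=8)
    assert np.allclose(img, back)             # the blocking itself is lossless
</pre>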


Nebeker:

But let me put it this way: Wasn’t it an advantage when you tackled some of these image processing problems to have worked on speech or some other types of signals?


Gray:

Yes, but I think that would have been true if we had worked on EKG or other signals. The advantage was having an appreciation for how hard it is to do a good and efficient job of compressing something in a lossy fashion with minimal complexity in some way. Speech is a very complex and rich signal. And it teaches you that if you design something say for a nice idealized Gaussian source, it’s probably not going to work all that well for speech, and you can do a whole lot better if you get a better feeling for the statistical behavior of real speech. And images are the same. You can do a lot better if you get a feeling for how those signals behave. Going the other way, things like wavelet coding have been very successful for images and not so successful for speech. That kind of signal processing just seems to do a nice job of breaking images up into primitive components, but they are not the right primitive components for speech, at least not yet.


Nebeker:

Well, this seems to speak to this question of whether signal processing is a grab bag of techniques or whether there is a substantial body of theory that’s applicable to all different types of signal processing.


Gray:

It’s both. I mean, you can go down to the bottom underlying all of it, and there you are going to find things like good old-fashioned Fourier analysis. Even if you are going to do wavelets, a lot of the analysis and results there are done using Fourier analysis. Probability and random processes are at the bottom. If you are going to be analog, you need to know about differential equations and partial differential equations, calculus, and all of the mathematical tools that are important to EEs in general. Then when you start getting more specific, other tools are going to be more useful. I think a standard electrical engineering undergraduate degree and a master’s gives most people these days an excellent background for doing signal processing or circuit design or computer engineering. Then, building past that, you can start taking, for example, image processing and speech processing courses just based on a standard EE background. When you take those more advanced courses, you get into the specific nature of particular applications and the specific tools, which still are going to have common ground. In both image and speech processing, linear prediction theory is extremely important: estimation, prediction, ideas of Wiener filters and their digital equivalents. So you have a common base.


Nebeker:

So even at the forefront there is transfer.


Gray:

Yes. I think more schools should teach an advanced undergraduate level class in image and speech and audio processing, because they are fun areas; you can usually get into them with just a really basic EE background and get up to speed doing some really interesting stuff, given the computing toys that we now have. I brought my Mac G3 PowerBook with me, and as we speak it is designing image codes which I will be able to look at, because they’ll be on my screen when I go back, and I’m expecting that some will have worked out nicely and others not so nicely. And it adds to my aura among my students if I don’t just throw out an abstract idea, but am competent to program in MATLAB and can actually try it out and show some promise. Then I don’t feel bad asking them to try it in a real language, like C, on a more hefty database. And it’s just marvelous that you can do that today.


Nebeker:

Well, we’ve gone on quite a while. Is there something I haven’t asked about that you’d like to comment on?


Gray:

Trying to look back, I think I can say mainly it’s a fun area. Images and speech are both part of the signal processing community, and I’ve been active in both. I pretty much wandered away from speech. I feel more at home in the image area now. I still think it’s partially a historical fluke that I ended up being a signal processor instead of being in some other area, but that was largely a question of technical seduction. That’s where the interesting problems were, where at least a few times I’ve been fortunate enough to be around when new things were getting tried, by colleagues or students or siblings, that gave more than epsilon improvement, and where in signal processing with a computer you can get in there and play with real data and listen and see what it’s doing. I don’t think I would change any of those turns that I took, looking back upon them, and I can’t imagine anything better to say about a field. I have no regrets that I was more of a mathematician during part of it, and I think everybody in the field should spend a little bit of their time appreciating the theory.


Nebeker:

And you don’t regret not having been a mathematician from the beginning say?


Gray:

No, it was important to me I think egotistically and in some ways professionally to establish credentials on that line, and it was really a thrill to work with some real great minds on the mathematics side. Maybe it’s the way I felt about myself when I was a musician in my youth, I knew I couldn’t make a living at music, but I really enjoyed it and I got to where I was good enough that I could really appreciate the really great ones. Well, happily, signal processing gave me a vocation that I earned a rather nice living at, and even got to watch a few former students become millionaires.


Nebeker:

Well, thank you very much.


Gray:

You’re welcome.