# First-Hand:The Hidden Markov Model

### From GHN

## Revision as of 15:39, 9 August 2012

**Contributed by:** Lawrence R. Rabiner, Fellow of the IEEE

In the late 1970s and early 1980s, the field of Automatic Speech Recognition (ASR) was undergoing a change in emphasis: from simple pattern recognition methods, based on templates and a spectral distance measure, to a statistical method for speech processing, based on the Hidden Markov Model (HMM). The underlying assumption of the HMM was that a speech signal could be well characterized and modeled, in both the time domain and in the frequency domain, using a Markov state diagram to characterize the temporal properties of speech, and a Gaussian mixture model to characterize the spectral properties of speech.

The earliest work on the theory of probabilistic functions of a Markov chain was published in a series of classic papers by Leonard E. Baum and his colleagues at the Institute for Defense Analyses (IDA) in Princeton, NJ, in the late 1960s. The process for disseminating information about the HMM methodology was somewhat serendipitous. An early form of the HMM methodology was initially adapted to speech processing applications by Jim Baker at Carnegie Mellon University (CMU), based on his reading and understanding of the Baum papers. The HMM methodology was introduced to Fred Jelinek at IBM Research in Yorktown Heights, NY, when Baker and his wife Janet joined the technical staff at IBM following completion of his doctoral research in the late 1970s. A key issue was that the HMM of the late 1970s was an incomplete (and somewhat ineffective) model for speech recognition, since it could only work with discrete probability densities (rather than the continuous Gaussian mixture densities of most modern HMM implementations). Hence, until the end of the 1970s, the HMM remained a research vehicle for speech recognition applications at CMU and IBM, but was not disseminated to or used by the rest of the speech recognition technical community.

The big breakthrough in popularizing the HMM was a classic set of lectures by Jack Ferguson and his colleagues at IDA in 1980. For this set of lectures, IDA received special permission to invite a selected number of researchers to come to IDA and be taught the fundamentals of Hidden Markov chains. A special publication, referred to as “The Blue Book,” was created for the attendees of this lecture series. The Blue Book was actually entitled *Applications of Hidden Markov Models to Text and Speech* and was provided to each attendee, but was never widely distributed in the technical community. The attendees from Bell Labs in Murray Hill, NJ, included Mohan Sondhi, Steve Levinson, and Joe Olive.

Immediately following these IDA lectures, Mohan and Steve gave a series of internal lectures at Bell Labs and drew Larry Rabiner and Fred Juang into the group working on methods for extending the capabilities and features of the Hidden Markov Model and its applications to speech recognition. A set of classic papers on the HMM methodology was published in the *Bell System Technical Journal* and *AT&T Technical Journal* in the mid-1980s. These papers introduced the scaling procedure for the forward-backward re-estimation method, showed how mixture densities could be utilized as part of the HMM framework, and described the segmental K-means method for iterative training of HMM parameters. Ultimately this work at Bell Labs led to the *IEEE Proceedings* paper “A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition” at the end of the 1980s.
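To make the scaling procedure mentioned above concrete, here is a minimal sketch of a scaled forward pass for a discrete-observation HMM. It is an illustration only, not the Bell Labs implementation: the function names and the toy two-state model (`PI`, `A`, `B`) are hypothetical. The idea is to normalize the forward variables at every time step and accumulate the logarithms of the normalizers, so that the likelihood of a long observation sequence does not underflow.

```python
import math

# Hypothetical toy model: 2 hidden states, 3 discrete output symbols.
PI = [0.6, 0.4]                          # initial state probabilities
A = [[0.7, 0.3], [0.4, 0.6]]             # A[i][j]: transition i -> j
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]   # B[i][k]: state i emits symbol k


def forward_unscaled(pi, A, B, obs):
    """Plain forward pass: returns P(obs | model); underflows on long obs."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)


def forward_scaled(pi, A, B, obs):
    """Scaled forward pass: returns log P(obs | model)."""
    n = len(pi)
    log_likelihood = 0.0
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for t in range(len(obs)):
        if t > 0:
            # Induction step on the already-normalized alphas.
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                     for j in range(n)]
        c = sum(alpha)                    # scaling factor for time t
        if c == 0.0:
            return float("-inf")          # sequence impossible under the model
        alpha = [a / c for a in alpha]    # normalized alphas sum to 1
        log_likelihood += math.log(c)     # log P(obs) is the sum of log c_t
    return log_likelihood
```

On short sequences the two functions agree (after exponentiating the scaled result); on a sequence of several thousand symbols the unscaled probability collapses to zero in floating point, while the scaled version still returns a finite log-likelihood.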

The HMM methodology spread rapidly after the publication of the *IEEE Proceedings* paper, and by the early 1990s the HMM had become the preferred technology for implementing a range of speech recognition systems, from simple isolated word recognition systems to large vocabulary speech understanding systems.

The genesis of the *IEEE Proceedings* paper was the work done at Bell Labs in the 1980s. The main motivation of the article was to describe, in the simplest possible form, how the HMM works, how it was implemented for a range of applications, and how it performed compared with previous methods used in speech recognition research systems. Suffice it to say, the HMM technology led to major improvements in performance and ultimately to speech recognition systems that were implemented in the field and were utilized, often on a daily basis, by hundreds of millions of users for telephony applications.

The impact of the tutorial paper has far exceeded any reasonable expectations. The reasons for the popularity of both the HMM technology and the tutorial paper were that the paper was written in a style and manner that most people could learn from and use “right out of the box,” and that the ultimate HMM methodology was so readily adapted to a broad range of applications, far beyond those anticipated by either the author of this tutorial or the original pioneers of the HMM methods.

### References

Lawrence R. Rabiner. “A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition,” *Proceedings of the IEEE* 77, no. 2 (February 1989), p. 257-86.

“Correction to: ‘A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition,’ Lawrence R. Rabiner, *Proc. IEEE*, Feb. 1989,” accessed 3 August 2012.

Rahimi, Ali. “An Erratum for ‘A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition,’” 30 December 2000, accessed 3 August 2012.

Lalit R. Bahl, Frederick Jelinek and Robert L. Mercer. “A Maximum Likelihood Approach to Continuous Speech Recognition,” *IEEE Transactions on Pattern Analysis and Machine Intelligence* Vol. PAMI-5, no. 2 (March 1983), p. 179-90; and in A. Waibel and K. F. Lee, eds., *Readings in Speech Recognition*, (San Mateo, CA: Morgan Kaufmann Publishers, 1990), p. 308-19.

James K. Baker. “Machine-aided Labeling of Connected Speech,” in *Working Papers in Speech Recognition XI*, Technical Reports, Computer Science Department, Carnegie-Mellon University, Pittsburgh, PA (1973).

________. “The DRAGON System—An Overview,” *IEEE Transactions on Acoustics, Speech, Signal Processing*, Vol. ASSP-23 (February 1975), p. 24-9.

James K. Baker and Lalit R. Bahl. “Some Experiments in Automatic Recognition of Continuous Speech,” *Proceedings of the 11th Annual IEEE Computer Society Conference* (1975), p. 326-9.

Leonard E. Baum and Ted Petrie. “Statistical Inference for Probabilistic Functions of Finite State Markov Chains,” *The Annals of Mathematical Statistics* 37, no. 6 (1966), p. 1554-63.

Leonard E. Baum and J. A. Eagon. “An Inequality with Applications to Statistical Estimation for Probabilistic Functions of Markov Processes and to a Model for Ecology,” *Bulletin of the American Mathematical Society* 73, no. 3 (1967), p. 360-3.

Leonard E. Baum, Ted Petrie, George Soules, and Norman Weiss. “A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains,” *The Annals of Mathematical Statistics* 41, no. 1 (1970), p. 164-71.

John D. Ferguson, ed. *Proceedings of the Symposium on the Applications of Hidden Markov Models to Text and Speech* (Princeton, NJ: IDA, Communications Research Division, 1980).

Frederick Jelinek. “Continuous Speech Recognition by Statistical Methods,” *Proceedings of the IEEE* 64, no. 4 (April 1976), p. 532-56.

Biing-Hwang Juang. “On Hidden Markov Models and Dynamic Time Warping for Speech Recognition – A Unified View,” *AT&T Bell Laboratories Technical Journal* 63, no. 7 (September 1984), p. 1213-44.

________. “Maximum Likelihood Estimation for Mixture Multivariate Stochastic Observations of Markov Chains,” *AT&T Technical Journal* 64, no. 6, Part 1 (July-August 1985), p. 1235-50.

Biing-Hwang (Fred) Juang and Lawrence R. Rabiner. “A Probabilistic Distance Measure for Hidden Markov Models,” *AT&T Technical Journal* 64, no. 2 (February 1985), p. 391-408.

Stephen E. Levinson, Lawrence R. Rabiner, and Man M. Sondhi. “An Introduction to the Application of the Theory of Probabilistic Functions of a Markov Process to Automatic Speech Recognition,” *Bell System Technical Journal* 62, no. 4 (April 1983), p. 1035-74.

Lawrence R. Rabiner, Stephen E. Levinson, and Man M. Sondhi. “On the Application of Vector Quantization and Hidden Markov Models to Speaker-Independent, Isolated Word Recognition,” *Bell System Technical Journal* 62, no. 4 (April 1983), p. 1075-1105.

L. R. Rabiner, B. H. Juang, S. E. Levinson, and M. M. Sondhi. “Recognition of Isolated Digits Using Hidden Markov Models with Continuous Mixture Densities,” *AT&T Technical Journal* 64, no. 6 (July-August 1985), p. 1211-34.