Milestone-Proposal:Line spectrum pair (LSP), an essential technology for high-compression speech coding, 1975: Difference between revisions

From ETHW
(Created page with "{{Proposal |more than 25 years=Yes |within fields of interest=Yes |benefit to humanity=Yes |regional importance=Yes |ou is paying=Yes |ou is arranging dedication=Yes |section is ...")
 
|plaque citation=LSP (Line Spectrum Pair), invented at NTT in 1975, is a key technology for speech coding. A speech synthesizer LSI was designed based on LSP in 1980, and many international speech coding standards of ITU-T, 3GPP, 3GPP2, and IETF have adopted LSP as an essential technology. These standards cover almost all cellular phones and IP phones worldwide.
|a2b=IEEE Tokyo Section, Japan
|IEEE units paying={{IEEE Organizational Unit Paying
|Unit=IEEE Tokyo Section, Japan
}}
|IEEE units arranging={{IEEE Organizational Unit Arranging
|Unit=IEEE Tokyo Section, Japan
}}
|IEEE sections monitoring={{IEEE Section Monitoring
|Section=IEEE Tokyo Section, Japan
}}
|Milestone proposers={{Milestone proposer
|Proposer name=Takehiro Moriya
|Proposer email=moriya.takehiro@lab.ntt.co.jp
}}
|a2a=NTT Musashino R&D center 
9-11, Midori-cho 3-Chome Musashino-Shi, Tokyo 180-8585 Japan
|a7=The initial invention and the follow-up investigations and developments were carried out at this site.
|a8=No. A new building has been built on the original site.
|mounting details=The plaque will be placed near the reception area in the ground floor entrance hall. All visitors have free access to this hall.
|a9=NTT’s receptionists are always near the plaque, and the plaque will be displayed in a transparent hard case.
|a10=NTT (Nippon Telegraph and Telephone) Corporation
|support letter=Agreement.pdf
|a4=The line spectrum pair (LSP), invented in 1975, is one of the most efficient representations of speech signal features. Owing to its many practical merits for high-compression speech coding, it is used worldwide in speech coding standards for cellular and IP phones, including 3GPP AMR (3G cellular in Europe and Japan), 3GPP2 EVRC (3G cellular in the USA and Japan), ITU-T G.723.1 and G.729 (IP phones), IETF SILK (software IP phones such as Skype), and PDC half-rate (2G cellular in Japan). Together, these standards cover almost all high-compression telephone communication systems in wide use around the world today, and they are expected to remain in use.
|a6=Auto-regressive linear predictive coding (LPC) is one of the most powerful and useful tools for speech coding [1]. Pioneering investigations of the technology were started independently and simultaneously by Dr. B. S. Atal and Dr. M. R. Schroeder at AT&T Bell Labs in 1966, and by Dr. F. Itakura and Dr. S. Saito at NTT (Nippon Telegraph and Telephone) Labs. The most critical issue in the LPC framework is how to represent and quantize the prediction coefficients with fewer bits and smaller LPC spectral distortion. At the same time, speech coding for cellular phones needs to be robust against transmission bit errors in the radio environment. Thus, the representation of the prediction coefficients needs to achieve small LPC spectral distortion with few quantization bits while remaining robust against transmission channel errors.
|a5=It is possible to transmit the prediction coefficients directly. Quantizing them, however, requires many bits to maintain the LPC spectral shape, and it is difficult to avoid the risk that the coding system becomes unstable.
Partial autocorrelation (PARCOR), invented by Dr. F. Itakura and Dr. S. Saito in 1972, enables an easy stability check but still needs many quantization bits to maintain the LPC spectral shape. The bit consumption for quantizing PARCOR can be reduced by applying adaptive bit allocation and variable-length coding. Both schemes, however, are extremely sensitive to transmission channel errors.
LSP, an alternative representation of the prediction coefficients, was invented by Dr. F. Itakura in 1975 [2]. It enables a simple stability check and can maintain the LPC spectral shape with around 30% fewer bits than PARCOR, even without adaptive bit allocation or variable-length coding [3]–[7]. This is because quantization distortion of LSP has a smaller and more natural influence on the LPC spectral shape than that of PARCOR. As a result, small LPC spectral distortion is achieved by efficient coding of LSP in combination with prediction, interpolation, and vector quantization.
|references=[1] B. S. Atal, “The History of Linear Prediction,” IEEE Signal Processing Magazine, pp. 154–157, March 2006.
[2] F. Itakura, “Line Spectrum Representation of Linear Predictive Coefficients of Speech Signals,” J. Acoust. Soc. Am., 57, 533(A), 1975.
[3] JP Patent 56051116 - ALL POLE TYPE DIGITAL FILTER invented by F. Itakura
http://worldwide.espacenet.com/publicationDetails/biblio?DB=EPODOC&II=8&ND=6&adjacent=true&locale=en_EP&FT=D&date=19810508&CC=JP&NR=56051116A&KC=A
[4] US Patent 4,393,272 Sound synthesizer invented by F. Itakura and N. Sugamura
http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.htm&r=1&f=G&l=50&s1=4,393,272.PN.&OS=PN/4,393,272&RS=PN/4,393,272
[5] F. Itakura, “Statistical Methods for Speech Analysis and Synthesis – from ML Vocoder to LSP through PARCOR –,” IEICE Fundamental Review Vol.3 No.3. 2010. (in Japanese)
Abstract: The invention process of the line spectrum pair (LSP), one of the most important analysis technologies for speech signals, is described. Partial autocorrelation (PARCOR) and LSP are alternative representation methods for a speech spectrum shape or a vocal tract shape. The two methods were invented at NTT Labs in 1972 and 1975, respectively. This paper covers the processes for these inventions, starting from the original invention of a speech analysis method based on maximum likelihood estimation in 1966.
[6] F. Itakura, T. Kobayashi and M. Honda, “A Hardware implementation of a new narrow and medium band speech coding,” Proc. ICASSP 82, pp. 1964–1967, 1982.
[7] F. Soong and B.-H. Juang, “Line spectrum pair (LSP) and speech data compression,” Proc. ICASSP 84, Vol. 9, pp. 37–40, 1984.
|supporting materials=[8] ITU-T G.723.1 (Dual rate speech coder for multimedia communications transmitting at 5.3 and 6.3 kbit/s)
http://www.itu.int/rec/T-REC-G.723.1-200605-I, section 2.4-2.7
[9] ITU-T G.729 (Coding of speech at 8 kbit/s using conjugate structure algebraic-code-excited linear prediction (CS-ACELP))
http://www.itu.int/rec/T-REC-G.729-200701-S, section 3.2
[10] 3GPP AMR http://www.etsi.org/deliver/etsi_ts/126000_126099/126090/10.01.00_60/ts_126090v100100p.pdf, section 5.2
[11] 3GPP2 EVRC
http://www.3gpp2.org/public_html/specs/C.S0014-0_v1.0_revised.pdf, section 4.2
[12] IETF SILK http://tools.ietf.org/html/draft-vos-silk-01, page 261
[13] Example of VoIP gateway supporting G.729
http://www.cisco.com/en/US/docs/ios/solutions_docs/voip_solutions/CAC.html
|submitted=No
}}
}}

Revision as of 23:52, 28 December 2012



Is the achievement you are proposing more than 25 years old? Yes

Is the achievement you are proposing within IEEE’s fields of interest? (e.g. “the theory and practice of electrical, electronics, communications and computer engineering, as well as computer science, the allied branches of engineering and the related arts and sciences” – from the IEEE Constitution) Yes

Did the achievement provide a meaningful benefit for humanity? Yes

Was it of at least regional importance? Yes

Has an IEEE Organizational Unit agreed to pay for the milestone plaque(s)? Yes

Has an IEEE Organizational Unit agreed to arrange the dedication ceremony? Yes

Has the IEEE Section in which the milestone is located agreed to take responsibility for the plaque after it is dedicated? Yes

Has the owner of the site agreed to have it designated as an Electrical Engineering Milestone? Yes


Year or range of years in which the achievement occurred:

1975

Title of the proposed milestone:

Line spectrum pair (LSP), an essential technology for high-compression speech coding, 1975

Plaque citation summarizing the achievement and its significance:

LSP (Line Spectrum Pair), invented at NTT in 1975, is a key technology for speech coding. A speech synthesizer LSI was designed based on LSP in 1980, and many international speech coding standards of ITU-T, 3GPP, 3GPP2, and IETF have adopted LSP as an essential technology. These standards cover almost all cellular phones and IP phones worldwide.

In what IEEE section(s) does it reside?

IEEE Tokyo Section, Japan

IEEE Organizational Unit(s) which have agreed to sponsor the Milestone:

IEEE Organizational Unit(s) paying for milestone plaque(s):

Unit: IEEE Tokyo Section, Japan
Senior Officer Name: Senior officer name masked to public

IEEE Organizational Unit(s) arranging the dedication ceremony:

Unit: IEEE Tokyo Section, Japan
Senior Officer Name: Senior officer name masked to public

IEEE section(s) monitoring the plaque(s):

IEEE Section: IEEE Tokyo Section, Japan
IEEE Section Chair name: Section chair name masked to public

Milestone proposer(s):

Proposer name: Proposer's name masked to public
Proposer email: Proposer's email masked to public

Please note: your email address and contact information will be masked on the website for privacy reasons. Only IEEE History Center Staff will be able to view the email address.

Street address(es) and GPS coordinates of the intended milestone plaque site(s):

NTT Musashino R&D center, 9-11, Midori-cho 3-Chome Musashino-Shi, Tokyo 180-8585 Japan

Describe briefly the intended site(s) of the milestone plaque(s). The intended site(s) must have a direct connection with the achievement (e.g. where developed, invented, tested, demonstrated, installed, or operated, etc.). A museum where a device or example of the technology is displayed, or the university where the inventor studied, are not, in themselves, sufficient connection for a milestone plaque.

Please give the address(es) of the plaque site(s) (GPS coordinates if you have them). Also please give the details of the mounting, i.e. on the outside of the building, in the ground floor entrance hall, on a plinth on the grounds, etc. If visitors to the plaque site will need to go through security, or make an appointment, please give the contact information visitors will need.

The initial invention and the follow-up investigations and developments were carried out at this site.

Are the original buildings extant?

No. A new building has been built on the original site.

Details of the plaque mounting:

The plaque will be placed near the reception area in the ground floor entrance hall. All visitors have free access to this hall.

How is the site protected/secured, and in what ways is it accessible to the public?

NTT’s receptionists are always near the plaque, and the plaque will be displayed in a transparent hard case.

Who is the present owner of the site(s)?

NTT (Nippon Telegraph and Telephone) Corporation

A letter in English, or with English translation, from the site owner(s) giving permission to place IEEE milestone plaque on the property:


A letter or email from the appropriate Section Chair supporting the Milestone application:

File:Agreement.pdf

What is the historical significance of the work (its technological, scientific, or social importance)?

The line spectrum pair (LSP), invented in 1975, is one of the most efficient representations of speech signal features. Owing to its many practical merits for high-compression speech coding, it is used worldwide in speech coding standards for cellular and IP phones, including 3GPP AMR (3G cellular in Europe and Japan), 3GPP2 EVRC (3G cellular in the USA and Japan), ITU-T G.723.1 and G.729 (IP phones), IETF SILK (software IP phones such as Skype), and PDC half-rate (2G cellular in Japan). Together, these standards cover almost all high-compression telephone communication systems in wide use around the world today, and they are expected to remain in use.

What obstacles (technical, political, geographic) needed to be overcome?

Auto-regressive linear predictive coding (LPC) is one of the most powerful and useful tools for speech coding [1]. Pioneering investigations of the technology were started independently and simultaneously by Dr. B. S. Atal and Dr. M. R. Schroeder at AT&T Bell Labs in 1966, and by Dr. F. Itakura and Dr. S. Saito at NTT (Nippon Telegraph and Telephone) Labs. The most critical issue in the LPC framework is how to represent and quantize the prediction coefficients with fewer bits and smaller LPC spectral distortion. At the same time, speech coding for cellular phones needs to be robust against transmission bit errors in the radio environment. Thus, the representation of the prediction coefficients needs to achieve small LPC spectral distortion with few quantization bits while remaining robust against transmission channel errors.
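To make the notion of LPC spectral distortion concrete, the following is a minimal Python sketch; it is not taken from the proposal or its references. It assumes the convention A(z) = 1 + a1 z^-1 + ... + ap z^-p, measures distortion as the root-mean-square difference in dB between two LPC log spectra on a uniform frequency grid, and applies a hypothetical uniform scalar quantizer directly to the coefficients. All function names and the example coefficients are illustrative assumptions.

```python
import cmath
import math


def lpc_log_spectrum_db(a, omega):
    """Return 10*log10(|1/A(e^{j*omega})|^2) for a = [1, a1, ..., ap]."""
    value = sum(c * cmath.exp(-1j * omega * k) for k, c in enumerate(a))
    return -20.0 * math.log10(abs(value))


def spectral_distortion_db(a_ref, a_test, points=256):
    """RMS difference (in dB) between two LPC log spectra over (0, pi)."""
    total = 0.0
    for i in range(points):
        omega = math.pi * (i + 0.5) / points  # grid avoids the endpoints
        diff = lpc_log_spectrum_db(a_ref, omega) - lpc_log_spectrum_db(a_test, omega)
        total += diff * diff
    return math.sqrt(total / points)


def quantize(a, step):
    """Hypothetical uniform scalar quantizer; the leading 1 is left untouched."""
    return [a[0]] + [round(c / step) * step for c in a[1:]]


a_original = [1.0, -0.9, 0.5]               # illustrative second-order filter
a_coarse = quantize(a_original, step=0.25)  # coarse steps stand in for few bits
print(a_coarse, spectral_distortion_db(a_original, a_coarse))
```

A coarser step stands in for a smaller bit budget. The sketch only illustrates the trade-off described above, where fewer bits tend to mean larger spectral distortion; it does not model any particular codec.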

What features set this work apart from similar achievements?

It is possible to transmit the prediction coefficients directly. Quantizing them, however, requires many bits to maintain the LPC spectral shape, and it is difficult to avoid the risk that the coding system becomes unstable.

Partial autocorrelation (PARCOR), invented by Dr. F. Itakura and Dr. S. Saito in 1972, enables an easy stability check but still needs many quantization bits to maintain the LPC spectral shape. The bit consumption for quantizing PARCOR can be reduced by applying adaptive bit allocation and variable-length coding. Both schemes, however, are extremely sensitive to transmission channel errors.

LSP, an alternative representation of the prediction coefficients, was invented by Dr. F. Itakura in 1975 [2]. It enables a simple stability check and can maintain the LPC spectral shape with around 30% fewer bits than PARCOR, even without adaptive bit allocation or variable-length coding [3]–[7]. This is because quantization distortion of LSP has a smaller and more natural influence on the LPC spectral shape than that of PARCOR. As a result, small LPC spectral distortion is achieved by efficient coding of LSP in combination with prediction, interpolation, and vector quantization.
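The simple stability check can be illustrated with a short Python sketch. This is a textbook-style construction under stated assumptions, not the method of the cited patents or standards. With A(z) = 1 + a1 z^-1 + ... + ap z^-p, it forms P(z) = A(z) + z^-(p+1) A(1/z) and Q(z) = A(z) - z^-(p+1) A(1/z), locates their zeros on the unit circle by a grid search with bisection, and treats the filter as stable when the P and Q frequencies alternate, starting with P. The function names, grid size, and example coefficients are all hypothetical.

```python
import math


def lsp_polynomials(a):
    """Build symmetric P and antisymmetric Q from a = [1, a1, ..., ap]."""
    ext = list(a) + [0.0]
    rev = [0.0] + list(reversed(a))
    p_poly = [x + y for x, y in zip(ext, rev)]
    q_poly = [x - y for x, y in zip(ext, rev)]
    return p_poly, q_poly


def _on_circle(coeffs, omega, symmetric):
    """Real-valued factor of the polynomial evaluated at z = e^{j*omega}."""
    n = len(coeffs) - 1
    f = math.cos if symmetric else math.sin
    return sum(c * f((n / 2 - k) * omega) for k, c in enumerate(coeffs))


def _unit_circle_roots(coeffs, symmetric, grid=512):
    """Find zeros in (0, pi) from sign changes on a grid, refined by bisection."""
    pts = [math.pi * (i + 0.5) / grid for i in range(grid)]
    roots = []
    for lo, hi in zip(pts, pts[1:]):
        f_lo = _on_circle(coeffs, lo, symmetric)
        f_hi = _on_circle(coeffs, hi, symmetric)
        if f_lo * f_hi < 0:
            for _ in range(60):
                mid = 0.5 * (lo + hi)
                f_mid = _on_circle(coeffs, mid, symmetric)
                if f_lo * f_mid <= 0:
                    hi = mid
                else:
                    lo, f_lo = mid, f_mid
            roots.append(0.5 * (lo + hi))
    return roots


def lpc_to_lsp(a):
    """Return sorted (frequency, 'P' or 'Q') pairs, frequencies in radians."""
    p_poly, q_poly = lsp_polynomials(a)
    tagged = [(w, "P") for w in _unit_circle_roots(p_poly, True)]
    tagged += [(w, "Q") for w in _unit_circle_roots(q_poly, False)]
    return sorted(tagged)


def looks_stable(a):
    """Stable if exactly p frequencies were found and they alternate P, Q, P, ..."""
    order = len(a) - 1
    labels = [label for _, label in lpc_to_lsp(a)]
    return labels == ["P" if i % 2 == 0 else "Q" for i in range(order)]


print(lpc_to_lsp([1.0, -0.9, 0.5]), looks_stable([1.0, -0.9, 0.5]))
print(looks_stable([1.0, -0.9, 1.2]))  # a2 > 1, so the filter is unstable
```

Because a stable filter corresponds to an ordered, interlaced set of frequencies, a decoder can check (or restore) stability after quantization by verifying that the ordering still holds. The fixed grid can miss very closely spaced roots, so this sketch is for illustration only.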

References to establish the dates, location, and importance of the achievement: Minimum of five (5), but as many as needed to support the milestone, such as patents, contemporary newspaper articles, journal articles, or citations to pages in scholarly books. At least one of the references must be from a scholarly book or journal article.

[1] B. S. Atal, “The History of Linear Prediction,” IEEE Signal Processing Magazine, pp. 154–157, March 2006.

[2] F. Itakura, “Line Spectrum Representation of Linear Predictive Coefficients of Speech Signals,” J. Acoust. Soc. Am., 57, 533(A), 1975.

[3] JP Patent 56051116 - ALL POLE TYPE DIGITAL FILTER invented by F. Itakura http://worldwide.espacenet.com/publicationDetails/biblio?DB=EPODOC&II=8&ND=6&adjacent=true&locale=en_EP&FT=D&date=19810508&CC=JP&NR=56051116A&KC=A

[4] US Patent 4,393,272 Sound synthesizer invented by F. Itakura and N. Sugamura http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.htm&r=1&f=G&l=50&s1=4,393,272.PN.&OS=PN/4,393,272&RS=PN/4,393,272

[5] F. Itakura, “Statistical Methods for Speech Analysis and Synthesis – from ML Vocoder to LSP through PARCOR –,” IEICE Fundamental Review Vol.3 No.3. 2010. (in Japanese) Abstract: The invention process of the line spectrum pair (LSP), one of the most important analysis technologies for speech signals, is described. Partial autocorrelation (PARCOR) and LSP are alternative representation methods for a speech spectrum shape or a vocal tract shape. The two methods were invented at NTT Labs in 1972 and 1975, respectively. This paper covers the processes for these inventions, starting from the original invention of a speech analysis method based on maximum likelihood estimation in 1966.

[6] F. Itakura, T. Kobayashi and M. Honda, “A Hardware implementation of a new narrow and medium band speech coding,” Proc. ICASSP 82, pp. 1964–1967, 1982.

[7] F. Soong and B.-H. Juang, “Line spectrum pair (LSP) and speech data compression,” Proc. ICASSP 84, Vol. 9, pp. 37–40, 1984.

Supporting materials (supported formats: GIF, JPEG, PNG, PDF, DOC): All supporting materials must be in English, or if not in English, accompanied by an English translation. You must supply the texts or excerpts themselves, not just the references. For documents that are copyright-encumbered, or which you do not have rights to post, email the documents themselves to ieee-history@ieee.org. Please see the Milestone Program Guidelines for more information.

[8] ITU-T G.723.1 (Dual rate speech coder for multimedia communications transmitting at 5.3 and 6.3 kbit/s) http://www.itu.int/rec/T-REC-G.723.1-200605-I, section 2.4-2.7

[9] ITU-T G.729 (Coding of speech at 8 kbit/s using conjugate structure algebraic-code-excited linear prediction (CS-ACELP)) http://www.itu.int/rec/T-REC-G.729-200701-S, section 3.2

[10] 3GPP AMR http://www.etsi.org/deliver/etsi_ts/126000_126099/126090/10.01.00_60/ts_126090v100100p.pdf, section 5.2

[11] 3GPP2 EVRC http://www.3gpp2.org/public_html/specs/C.S0014-0_v1.0_revised.pdf, section 4.2

[12] IETF SILK http://tools.ietf.org/html/draft-vos-silk-01, page 261

[13] Example of VoIP gateway supporting G.729 http://www.cisco.com/en/US/docs/ios/solutions_docs/voip_solutions/CAC.html