MPEG-D: Unified speech and audio coding

Standard: MPEG-D
Part: 3

This document specifies a unified speech and audio codec which is capable of coding signals having an arbitrary mix of speech and audio content. The codec has a performance comparable to, or better than, the best known coding technology that might be tailored specifically to coding of either speech or general audio content. The codec supports single and multi-channel coding at high bitrates and provides perceptually transparent quality. At the same time, it enables very efficient coding at very low bitrates while retaining the full audio bandwidth. This document incorporates several perceptually-based compression techniques developed in previous MPEG standards: perceptually shaped quantization noise, parametric coding of the upper spectrum region and parametric coding of the stereo sound stage. However, it combines these well-known perceptual techniques with a source coding technique: a model of sound production, specifically that of human speech.

Editions

Edition - 1: ISO/IEC 23003-3:2012 [Edition 1] Unified Speech and Audio Coding

Publication Year: 2012
Status: published
Motivations:
Objectives:

Edition - 2: Unified Speech and Audio Coding

Publication Year: 0
Status: released
Motivations: In general, audio signals can be music, speech or a mix of both. Across applications, the bitrates available in transmission channels can vary greatly. Hence, this work addresses the need to provide high compression for signals that are a mix of music and speech.
Objectives: To create compression technology that provides high performance over a wide range of bitrates. Specifically, to code mono signals at 12 kb/s, stereo signals from 16 kb/s and 5.1 channel signals at 96 kb/s.

Edition - 3: Unified Speech and Audio Coding, Third Edition

Publication Year: 0
Status: released
Motivations: To simplify management of this standard, this work removed Reference Software and Conformance from the text specification.
Objectives: Remove Reference Software and Conformance from the text specification.