G.729 is a speech codec specified by the ITU-T (strictly speaking a vocoder, or voice coder; see Parametric Audio Coding) for compressing voice into digital signals. Its technical name is "Conjugate-Structure Algebraic-Code-Excited Linear Prediction" (CS-ACELP). G.729 is used, for example, in IP telephony (Internet telephony).


G.729 is a hybrid compression method: it combines the analysis and transmission of voice parameters (the vocoder approach) with difference information, followed by speech synthesis. The codec works on frames of 10 ms length, which it analyzes for characteristics typical of speech. These are converted into parameters for the subsequent synthesis. In addition, the codec transmits difference information derived from the synthetically generated signal and the actual signal. A voice packet usually carries two 10 ms frames together, which results in a delay of approximately 25 milliseconds.

Audio signals that do not originate from human speech are difficult for this codec to process. For example, it handles the multi-frequency (DTMF) tones used in analog telephony only inadequately. A common workaround is to filter the multi-frequency tones out of the signal and transmit them as out-of-band signaling information according to RFC 2833.
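As an illustration of the out-of-band approach, the following minimal Python sketch packs the 4-byte telephone-event payload described in RFC 2833 (event code, end bit, reserved bit, volume, duration). The function name build_dtmf_payload and its default values are hypothetical; RTP header construction and the negotiated payload type are deliberately left out.

```python
import struct

# Event codes for DTMF digits as listed in RFC 2833: 0-9, *, #, A-D map to 0..15.
DTMF_EVENTS = {ch: code for code, ch in enumerate("0123456789*#ABCD")}

def build_dtmf_payload(digit, duration, volume=10, end=False):
    """Pack one telephone-event payload (4 bytes).

    duration is given in RTP timestamp units; volume is the tone
    power level expressed in -dBm0 and must fit in 6 bits.
    """
    if not 0 <= volume <= 63:
        raise ValueError("volume must fit in 6 bits")
    if not 0 <= duration <= 0xFFFF:
        raise ValueError("duration must fit in 16 bits")
    event = DTMF_EVENTS[digit.upper()]
    # Second byte: E (end) bit, R (reserved, always zero) bit, 6-bit volume.
    flags_volume = (0x80 if end else 0x00) | volume
    # Network byte order: event (8 bits), flags+volume (8 bits), duration (16 bits).
    return struct.pack("!BBH", event, flags_volume, duration)

payload = build_dtmf_payload("5", duration=800, end=True)
print(payload.hex())  # 058a0320
```

The sender would emit such payloads in place of the encoded tone, so the codec never has to reproduce the tone itself.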

Furthermore, G.729 suppresses pauses in speech. So that this does not sound like a dropped connection to the listener, the decoder can fill the pauses with so-called comfort noise. The standard includes implementations in both fixed-point format and the technically more complex floating-point format, which makes it easier to use on DSP platforms of differing complexity. G.729 is relatively computationally expensive, depending on the version used; depending on the implementation and the options it includes, it needs about 50 MIPS. The variants G.729A and G.729B have a lower computational complexity; for example, the non-optimized ITU-T reference implementation needs approximately 10.3 million clock cycles for 80 audio samples on the MicroBlaze microcontroller. MIPS figures can, however, vary with the architecture and the type of optimization, and serve only as a rough guide.
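To put the cycle count into perspective, the following rough Python sketch estimates the clock rate that real-time operation would require at that cost. It assumes that 80 samples correspond to one 10 ms frame (8 kHz sampling) and that the quoted figure covers all processing for that frame; neither the function name nor these assumptions come from the standard.

```python
SAMPLES_PER_FRAME = 80       # figure quoted in the text
FRAME_MS = 10                # G.729 frame length
CYCLES_PER_FRAME = 10.3e6    # non-optimized reference implementation on MicroBlaze

def required_clock_hz(cycles_per_frame, frame_ms):
    """Clock rate needed to finish one frame within one frame period."""
    frames_per_second = 1000 / frame_ms
    return cycles_per_frame * frames_per_second

clock = required_clock_hz(CYCLES_PER_FRAME, FRAME_MS)
sample_rate = SAMPLES_PER_FRAME * 1000 / FRAME_MS
print(f"sampling rate: {sample_rate:.0f} Hz")
print(f"required clock: {clock / 1e6:.0f} MHz")
```

Under these assumptions the non-optimized code would need a clock of roughly 1 GHz to run in real time, which illustrates why platform-specific optimization matters for embedded use.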


G.729 is divided into several variants, which are defined as annexes to the standard. These annexes are identified by different letters and other symbols. Each annex describes different possible combinations, which differ in the computing power they require, in the functional scope of the codec, and in the implementation effort. For correct decoding, encoder and decoder must be matched to each other.

The following variants are available within the scope of G.729:

The DTX option stands for discontinuous transmission, that is, the ability to fill pauses in speech with so-called comfort noise. In Mean Opinion Score (MOS) tests, G.729 achieves a perceived quality of 3.98 out of 5 points, whereas the G.729A variant reaches only 3.7 out of 5.

The encoded voice signal usually has a fixed bit rate of 8 kbit/s, but fixed bit rates of 6.4 kbit/s and 11.8 kbit/s are also possible in some variants. The frequency range covers 300-3400 Hz; because of the coding concept, only speech is reproduced accurately.
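The relationship between bit rate and frame size can be sketched in a few lines of Python. The example assumes the 10 ms frame length applies to all three bit rates; the helper name bits_per_frame is hypothetical.

```python
FRAME_MS = 10  # G.729 frame length in milliseconds

def bits_per_frame(bit_rate_kbps, frame_ms=FRAME_MS):
    """Number of bits in one encoded frame.

    kbit/s multiplied by ms yields bits directly, because the
    factors 1000 and 1/1000 cancel.
    """
    return round(bit_rate_kbps * frame_ms)

for rate in (6.4, 8, 11.8):
    bits = bits_per_frame(rate)
    print(f"{rate} kbit/s -> {bits} bits per frame ({bits / 8} bytes)")
```

At the usual 8 kbit/s this gives 80 bits, or 10 bytes, per frame.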

G.729.1 (G.729J)

The most recent extension, G.729J (this variant corresponds to the working designation G.729.1), supports wideband speech and audio coding: the transmitted frequency bandwidth has been extended to the range from 50 Hz to 7 kHz. The G.729J codec is organized hierarchically, so the actual bit rate, and with it the speech and audio quality, can be adjusted simply by truncating the bit stream.

Voice quality in comparison

To compare transmission quality, the Mean Opinion Score (MOS) method can be applied, which captures a user's subjective perception of voice quality (in a listening situation). The MOS scale is not absolute; it depends on the question being investigated and on the audio samples presented in the listening test. The same codec can therefore obtain different values in different tests. What matters is the difference between the codec under test and known reference codecs (e.g. G.711). In typical tests G.729 reaches a value of about 3.9 (on a five-point MOS scale). G.729 thus achieves a higher subjective speech quality than other codecs (e.g. G.723 and G.728), but falls short of the reference codec G.711 (ISDN). G.711 achieves a slightly higher MOS of approximately 4.1, but at 64 kbit/s it needs eight times the data rate of G.729, which requires only 8 kbit/s.

Overhead when used with RTP in an IPv4 network

The stated data rate of 8 kbit/s is nominal; it refers exclusively to the audio data itself. When the data stream is sent through a network, the protocol overhead of the packets into which the stream is packed is added. With RTP in an IPv4 network, this overhead is 40 bytes per packet (60 bytes for IPv6). The frame length of G.729 is 10 ms, and each frame is encoded in 10 bytes. Typically, 2 frames are sent per IP packet. With this setting, 20 ms of speech therefore effectively requires 60 bytes (40 + 10 + 10 bytes). At 50 packets per second, that is 3000 bytes per second, i.e. 24 kbit/s (3000 bytes * 8 / 1000 = 24 kbit). If more than 2 frames are packed into a packet, the relative share of the overhead decreases. With 3 frames per packet, only 18.7 kbit/s would be needed. The disadvantage is a greater delay: with 2 frames per packet it is 25 ms (10 ms per frame plus 5 ms processing time), whereas with 3 frames it is already 35 ms. If the delay becomes too large, users may find it disturbing.
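The arithmetic above can be reproduced with a short Python sketch. It uses only the figures from the text (10 bytes and 10 ms per frame, 40 or 60 bytes of IP, UDP and RTP headers, 5 ms processing time); the function name packet_stats is hypothetical, and lower-layer overhead such as Ethernet framing is ignored.

```python
FRAME_MS = 10        # duration of one G.729 frame
FRAME_BYTES = 10     # encoded size of one frame at 8 kbit/s
HEADER_BYTES = {"ipv4": 40, "ipv6": 60}  # IP + UDP + RTP headers
PROCESSING_MS = 5    # processing time assumed in the text

def packet_stats(frames_per_packet, ip_version="ipv4"):
    """Return (bytes per packet, kbit/s on the wire, delay in ms)."""
    packet_bytes = HEADER_BYTES[ip_version] + frames_per_packet * FRAME_BYTES
    packet_ms = frames_per_packet * FRAME_MS
    kbit_per_s = packet_bytes * 8 / packet_ms  # bits per ms equals kbit/s
    delay_ms = packet_ms + PROCESSING_MS
    return packet_bytes, kbit_per_s, delay_ms

for n in (1, 2, 3, 4):
    size, rate, delay = packet_stats(n)
    print(f"{n} frames: {size} bytes/packet, {rate:.1f} kbit/s, {delay} ms delay")
```

For 2 and 3 frames per packet the sketch reproduces the 24 kbit/s and 18.7 kbit/s quoted above; passing "ipv6" as the second argument shows the effect of the larger header.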


  • ITU-T G.729 - The standard includes a complete ITU-T reference implementation of G.729 in C for all variants.