Speex is a free (including patent-free) lossy audio codec by Jean-Marc Valin, designed specifically for space-efficient storage of audio data that contains human speech. Like all speech codecs, it is generally unsuitable for other types of signals. The name is a phonetic spelling of the English word "speaks" (third person singular, present tense).

The codec is developed under the umbrella of the Xiph.Org Foundation and released under Xiph's BSD-like license. It is intended to complement Vorbis, Xiph's lossy general-purpose codec.

By default, the data is stored in the Ogg container format, but Speex files usually carry the extension .spx to distinguish them from Ogg Vorbis files. Speex can also be carried in other containers or without any container; in IP telephony, for example, it is usually transmitted directly over UDP/RTP. Compared with general-purpose codecs such as MP3 or Vorbis, Speex cannot compress music or other signal types without clearly audible loss of quality; for spoken text, however, it achieves significantly better compression ratios.

The MIME type of Speex is audio/x-speex; registration of audio/speex is to be requested in the near future.


Speex is used in many IP telephony applications. Since Flash Player 10, Adobe Flash can use it in place of the deprecated Nellymoser codec (Flash also supports ADPCM, HE-AAC, MP3 and Nellymoser).


Unlike many other speech codecs, Speex does not tailor its bit-rate range and its error-tolerance and correction mechanisms to mobile phone applications, but rather to the conditions typical of IP telephony and file storage. The design goal was a codec that achieves both excellent voice quality and low data rates, which led to a codec with multiple bit rates. Because Speex is meant for IP telephony rather than mobile telephony, it expects packets to be lost rather than corrupted by transmission errors; the UDP protocol it runs over provides exactly this all-or-nothing delivery of data packets. These considerations led to the choice of Code Excited Linear Prediction (CELP) as the basic technique behind Speex. A main reason is that CELP has proven suitable both at low bit rates (for example, DoD CELP at 4.8 kbit/s) and at higher ones (such as G.728 at 16 kbit/s).
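CELP builds on linear prediction: each speech sample is predicted from a weighted sum of previous samples, and the codec transmits the predictor coefficients together with an excitation signal chosen from a codebook. As a rough illustration of the prediction half of that idea, the sketch below computes predictor coefficients with the textbook autocorrelation method and the Levinson-Durbin recursion. It is a generic sketch, not code taken from Speex; the function names are hypothetical, and a real codec adds windowing, coefficient quantization and the codebook search, none of which is shown here.

```python
def autocorrelation(frame, max_lag):
    """Return r[0..max_lag] for a list of samples (no windowing applied)."""
    n = len(frame)
    return [sum(frame[i] * frame[i - lag] for i in range(lag, n))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve for predictor coefficients a such that
    x[n] is approximated by sum(a[k] * x[n - 1 - k] for k in range(order)).

    Returns (a, prediction_error_energy).
    """
    a = [0.0] * order
    err = r[0]
    for i in range(order):
        if err <= 0.0:
            break  # degenerate input (e.g. a silent frame): stop early
        # Correlation at lag i + 1 not yet explained by the order-i predictor
        acc = r[i + 1] - sum(a[j] * r[i - j] for j in range(i))
        k = acc / err  # reflection coefficient for order i + 1
        new_a = a[:]
        new_a[i] = k
        for j in range(i):
            new_a[j] = a[j] - k * a[i - 1 - j]
        a = new_a
        err *= 1.0 - k * k  # residual energy shrinks with each order
    return a, err


def predict(history, a):
    """Predict the next sample from past samples (newest sample last)."""
    return sum(coef * history[-1 - k] for k, coef in enumerate(a))
```

For example, `levinson_durbin([1.0, 0.5, 0.25], 2)` returns `([0.5, 0.0], 0.75)`: the autocorrelation of a first-order process needs only one coefficient, so the second one comes out as zero.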


The main features can be summarized as follows:

  • Free software/open source, free of patents and royalties
  • Wide data-rate range (2-44 kbit/s)
  • Multiple complexity levels
  • Relatively high sampling rates (up to 48 kHz)
  • Ability to encode different bandwidths within the same data stream
  • Dynamic bit-rate switching and variable bit rate (VBR)
  • Optional intensity stereo encoding
  • Packet loss concealment
  • Echo cancellation
  • Voice activity detection (VAD), integrated into the variable-bit-rate mode
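Intensity stereo, in general, transmits a single mono signal plus a small amount of side information describing how the energy is split between the two channels. The sketch below shows that general idea with one energy-balance value per frame. It is an assumption-laden illustration, not Speex's stereo bitstream: the function names, the dB balance and the gain split are hypothetical choices, and Speex's actual stereo parameters and their quantization differ.

```python
import math


def intensity_stereo_encode(left, right):
    """Downmix to mono and compute a per-frame left/right energy balance in dB."""
    mono = [(l + r) / 2.0 for l, r in zip(left, right)]
    e_left = sum(x * x for x in left)
    e_right = sum(x * x for x in right)
    # A small floor avoids division by zero and log(0) on silent channels
    balance_db = 10.0 * math.log10((e_left + 1e-9) / (e_right + 1e-9))
    return mono, balance_db


def intensity_stereo_decode(mono, balance_db):
    """Rebuild two channels by scaling the mono signal according to the balance."""
    ratio = 10.0 ** (balance_db / 10.0)  # e_left / e_right
    # Gains chosen so that g_left**2 / g_right**2 == ratio
    # and g_left**2 + g_right**2 == 2 (equal gains of 1 for a centred source)
    g_right = math.sqrt(2.0 / (1.0 + ratio))
    g_left = math.sqrt(2.0 * ratio / (1.0 + ratio))
    return [g_left * m for m in mono], [g_right * m for m in mono]
```

Only the mono signal and one number per frame need to be coded, which is why intensity stereo costs little extra bandwidth; the price is that phase and time differences between the channels are lost.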


To allow very good quality, Speex also supports sampling rates higher than the 8 kHz usual for telephone quality. It supports sampling rates up to 48 kHz but is primarily designed for 8, 16 and 32 kHz, referred to as narrowband, wideband and ultra-wideband respectively.
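Speex processes audio in frames of 20 ms (consistent with the 250 bit/s figure for five-bit placeholder frames given under discontinuous transmission). The sketch below converts the three main sampling rates into samples per frame; the dictionary and function names are illustrative only.

```python
# Nominal sampling rates of the three main Speex modes (Hz)
SPEEX_MODES = {
    "narrowband": 8000,
    "wideband": 16000,
    "ultra-wideband": 32000,
}

FRAME_MS = 20  # Speex frame duration in milliseconds


def samples_per_frame(sample_rate_hz, frame_ms=FRAME_MS):
    """Number of PCM samples in one frame at the given sampling rate."""
    return sample_rate_hz * frame_ms // 1000


if __name__ == "__main__":
    for name, rate in SPEEX_MODES.items():
        print(f"{name:15s} {rate:6d} Hz -> {samples_per_frame(rate)} samples/frame")
```

This gives 160, 320 and 640 samples per frame for narrowband, wideband and ultra-wideband.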

Speex encoding is controlled primarily by a parameter that specifies a quality level from 0 to 10. For constant bit rate (CBR), the value is an integer; for variable bit rate, it is a floating-point number.
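The rule above (integer quality for CBR, floating-point quality for VBR, both from 0 to 10) can be captured in a small validation helper. This is a hypothetical sketch, not part of any Speex API; in libspeex the two settings are, to our understanding, passed through separate `speex_encoder_ctl` requests (`SPEEX_SET_QUALITY` with an integer and `SPEEX_SET_VBR_QUALITY` with a float), which the helper only mirrors conceptually.

```python
def check_quality(quality, vbr):
    """Validate a Speex-style quality setting in the range 0..10.

    Hypothetical helper: CBR expects an integer, VBR accepts a float.
    Returns the value as int (CBR) or float (VBR).
    """
    if isinstance(quality, bool):
        raise TypeError("quality must be a number, not a bool")
    if vbr:
        q = float(quality)
    else:
        if not isinstance(quality, int):
            raise TypeError("CBR quality must be an integer")
        q = quality
    if not 0 <= q <= 10:
        raise ValueError("quality must be between 0 and 10")
    return q
```

For example, `check_quality(8, vbr=False)` returns `8`, `check_quality(7.5, vbr=True)` returns `7.5`, and `check_quality(7.5, vbr=False)` raises `TypeError`.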


The Speex encoder can be set to different levels of complexity. The search depth is set by an integer between 1 and 10; going from level 1 to level 10 typically lowers the noise level by about one to two decibels but increases the computational cost by a factor of about 5. Levels 2 to 4 are recommended as a good compromise, with higher settings often helpful for signals that contain something other than human speech.
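The recommendation above can be turned into a small configuration helper. The function names are hypothetical, and the defaults are an assumed heuristic derived only from the text: the middle of the recommended band (level 3) for pure speech, its upper end (level 4) for other material, while an explicit request anywhere from 1 to 10 is honored.

```python
RECOMMENDED = range(2, 5)  # levels 2, 3 and 4


def choose_complexity(speech_only=True, requested=None):
    """Pick a search-depth level (1-10) for a Speex-style encoder."""
    if requested is not None:
        if isinstance(requested, bool) or not isinstance(requested, int):
            raise TypeError("complexity must be an integer")
        if not 1 <= requested <= 10:
            raise ValueError("complexity must be between 1 and 10")
        return requested
    # Assumed heuristic: middle of the band for speech, upper end otherwise
    return 3 if speech_only else 4


def is_recommended(level):
    """True if the level lies in the recommended 2-4 compromise band."""
    return level in RECOMMENDED
```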

Variable bit rate (VBR)

VBR allows the codec to adapt its bit rate dynamically to the complexity of the signal. For Speex this means, for example, that vowels and strong transients need more data for an adequate representation than fricatives. Variable bit rate therefore achieves higher quality with the same amount of data, or comparable quality with less data. This mode is less suitable for streaming applications, however, since the transmission channel imposes a fixed upper limit that may not be met when a quality level is specified and the input signal contains a passage that is too complex. Moreover, in this mode the average bit rate is not predictable overall.
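The streaming problem described above comes down to comparing the peak per-frame bit rate, not the average, with the channel capacity. The sketch below computes both from a list of per-frame bit counts, assuming 20 ms frames; the bit counts in the usage line are made-up numbers standing in for the output of a real encoder, and the function names are hypothetical.

```python
FRAME_MS = 20  # Speex frame duration in milliseconds


def bitrate_stats(frame_bits, frame_ms=FRAME_MS):
    """Return (average_bps, peak_bps) for a sequence of per-frame bit counts."""
    if not frame_bits:
        raise ValueError("need at least one frame")
    rates = [bits * 1000 / frame_ms for bits in frame_bits]
    return sum(rates) / len(rates), max(rates)


def fits_channel(frame_bits, channel_bps, frame_ms=FRAME_MS):
    """True if every frame fits a channel with a fixed capacity."""
    _, peak = bitrate_stats(frame_bits, frame_ms)
    return peak <= channel_bps
```

With hypothetical frames of 100, 300 and 200 bits, `bitrate_stats` gives an average of 10,000 bit/s but a peak of 15,000 bit/s, so a 12,000 bit/s channel would be overrun even though the average fits.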

Average bit rate (ABR)

In this mode, the quality is adjusted dynamically in real time (open loop) to reach a given target bit rate, so that the average bit rate is predictable. Overall, quality is slightly lower than if the encoder in true variable-bit-rate mode had been set to a quality level that happens to produce exactly the desired average bit rate.
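The sketch below simulates the kind of feedback that ABR relies on: after each frame, the running average bit rate is compared with the target and the quality for the next frame is nudged up or down. The encoder is replaced by a hypothetical callable `bits_for(i, q)`, and the fixed-step up/down rule is an illustrative assumption, not the control law libspeex uses.

```python
def run_abr(bits_for, n_frames, target_bps, quality=5.0, step=0.5, frame_ms=20):
    """Simulate a simple average-bit-rate controller.

    bits_for(i, q) stands in for an encoder (hypothetical interface): it
    returns the number of bits frame i would use at VBR quality q.
    Returns (quality_per_frame, achieved_average_bps).
    """
    total_bits = 0
    qualities = []
    for i in range(n_frames):
        qualities.append(quality)
        total_bits += bits_for(i, quality)
        avg_bps = total_bits * 1000 / (frame_ms * (i + 1))
        # Nudge quality toward the target for the next frame, clamped to 0..10
        if avg_bps > target_bps:
            quality = max(0.0, quality - step)
        else:
            quality = min(10.0, quality + step)
    return qualities, total_bits * 1000 / (frame_ms * n_frames)
```

With a toy cost model such as `lambda i, q: int(40 + 20 * q)` (2,000 bit/s plus 1,000 bit/s per quality step at 20 ms frames) and a 10,000 bit/s target, the quality oscillates around 8 and the long-run average settles close to the target.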

Voice activity detection (VAD)

Speex detects silence or background noise and, for those segments, stores only descriptive parameters from which background noise that sounds similar to the human ear can be generated, so-called comfort noise (comfort noise generation, CNG). This method is part of the variable-bit-rate mode.
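A common textbook way to detect voice activity is to compare each frame's energy with a slowly tracked noise floor. The sketch below does exactly that; it is not Speex's detector, and the class name, the 10 dB margin, the adaptation rate and the initial floor are all illustrative assumptions. Frames classified as inactive are the ones for which an encoder would send only comfort-noise parameters.

```python
import math


def frame_energy_db(frame):
    """Mean-square energy of a frame in dB (with a floor to avoid log(0))."""
    mean_sq = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(mean_sq + 1e-12)


class EnergyVAD:
    """Toy VAD: a frame counts as speech if its energy exceeds the
    tracked noise floor by a margin. Thresholds are assumptions."""

    def __init__(self, margin_db=10.0, adapt=0.05, initial_floor_db=-60.0):
        self.margin_db = margin_db
        self.adapt = adapt
        self.noise_floor_db = initial_floor_db

    def is_speech(self, frame):
        level = frame_energy_db(frame)
        active = level > self.noise_floor_db + self.margin_db
        if not active:
            # Update the background-noise estimate only on non-speech frames
            self.noise_floor_db += self.adapt * (level - self.noise_floor_db)
        return active
```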

Discontinuous transmission (DTX)

This technique extends variable bit rate and voice activity detection: data transmission can be suspended entirely while the background noise stays constant. In file-based operation, placeholder frames of five bits each are written instead, giving a bit rate of 250 bits per second.
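The 250 bit/s figure follows from five bits per 20 ms frame, that is 5 × 50 frames per second. The sketch below reproduces that arithmetic and counts the bits for a sequence of frames under DTX. The helper names are hypothetical, and the fixed bit count per speech frame is a simplification, since a VBR encoder would vary it.

```python
def placeholder_bitrate(bits_per_frame=5, frame_ms=20):
    """Bit rate (bit/s) of a stream made only of fixed-size placeholder frames."""
    return bits_per_frame * 1000 / frame_ms


def dtx_bits(vad_flags, speech_bits_per_frame, placeholder_bits=5, file_based=True):
    """Total bits for a frame sequence under discontinuous transmission.

    Speech frames cost speech_bits_per_frame (simplified as constant).
    Inactive frames cost placeholder_bits in a file, or nothing at all
    when transmission is simply suspended.
    """
    total = 0
    for active in vad_flags:
        if active:
            total += speech_bits_per_frame
        elif file_based:
            total += placeholder_bits  # keeps the file's frame sequence contiguous
        # otherwise nothing is sent for this frame
    return total
```

`placeholder_bitrate()` returns `250.0`, matching the value given above.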

Perceptual enhancement

This refers to techniques that mask, from human perception, the deviations from the original signal introduced by the encoding/decoding process. In favor of a subjectively better sound, they usually also alter the sound relative to the original.

Algorithmic delay

This corresponds to the length of a Speex frame plus a certain look-ahead that is needed before processing of a frame can begin. The result is a delay of 30 ms in narrowband mode (8 kHz) and 34 ms in wideband mode (16 kHz).
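Since the delay is the frame length plus the look-ahead, the look-ahead can be derived from the totals given above, assuming 20 ms frames: 30 − 20 = 10 ms for narrowband and 34 − 20 = 14 ms for wideband. The sketch below performs that subtraction and converts the result to samples; the names are hypothetical, and the figures are derived only from the stated totals rather than read from the Speex source.

```python
FRAME_MS = 20  # Speex frame length in milliseconds

# Total algorithmic delays stated in the text (ms)
TOTAL_DELAY_MS = {"narrowband": 30, "wideband": 34}


def lookahead_ms(total_delay_ms, frame_ms=FRAME_MS):
    """Look-ahead implied by: delay = frame length + look-ahead."""
    return total_delay_ms - frame_ms


def lookahead_samples(total_delay_ms, sample_rate_hz, frame_ms=FRAME_MS):
    """The same look-ahead expressed in samples at the given sampling rate."""
    return lookahead_ms(total_delay_ms, frame_ms) * sample_rate_hz // 1000
```

This yields 10 ms (80 samples at 8 kHz) for narrowband and 14 ms (224 samples at 16 kHz) for wideband.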


Speex is mainly used for telecommunication over the Internet, for example in IP telephony and for communication during online games (such as TeamSpeak, Mumble and Counter-Strike). Other applications include audio streaming, audiobooks and spoken podcasts. Accordingly, Speex is supported by a variety of programs from many fields, including audio players (Winamp, XMMS, foobar2000), audio editors, IP telephony programs (Ekiga, Jitsi, Jabbin, Linphone, KPhone, Twinkle) and video games. The Speex website lists programs and plug-ins. A DirectShow filter and an ACM codec are available, and many programs base their Speex support on them. The U.S. Army uses Speex in the Raytheon-designed EPLRS radio system of its Land Warrior system. Microsoft also uses Speex for the Xbox Live headset, as Ralph Giles, the maintainer of the Theora codec, reported on LugRadio. On the iPod and other MP3 players, Speex files can be played with the open-source Rockbox firmware.

As the Chaos Computer Club reported in its publication on the German "federal Trojan" (Bundestrojaner), Speex is also used there to compress voice recordings.