Data compression#Audio

Audio data compression (often shortened, ambiguously, to audio compression) is a form of data reduction ("lossy" algorithms) or data compression ("lossless" algorithms) applied to audio.

Audio data compression refers to specialized types of data compression that effectively reduce the size of digital audio data. As with other specialized types of data compression (especially video and image compression), specific properties of the signals concerned are exploited in different ways to achieve the reduction.

This type of compression should not be confused with dynamic range compression (also called dynamic narrowing), which is normally used to raise quieter passages or lower louder passages in an audio signal and does not save any data (see compressor).

Lossless audio compression

Lossless audio data compression (less precisely lossless audio compression, or, in the appropriate context, simply lossless compression or lossless audio, LA) refers to methods that generate, from an input signal, data that allow a bit-identical reconstruction of the original signal (see data compression).

Lossless audio codecs differ from general-purpose data compression algorithms in that they are specially adapted to the typical data structure of audio files and therefore, in almost all cases, compress audio files better than non-specialized methods such as Lempel-Ziv-based algorithms (Deflate/ZIP, RAR). The compression rate achievable with current methods for typical audio CD content (music, 16 bit / 44,100 Hz) is usually between 25 and 70 percent.


The methods are used in recording studios, on newer media such as SACD or DVD-Audio, and increasingly in the private music archives of quality-conscious listeners who want to avoid, for example, generation loss. In addition, many compression methods from the audio domain are also of interest for other signals, such as biological data, medical curves or seismic data.


The majority of sound recordings are natural sounds recorded from the real world; such data are difficult to compress. Similarly, photos cannot be compressed as well as computer-generated images, although computer-generated sound sequences can also contain very complicated waveforms that many compression algorithms reduce only poorly.

In addition, the values of audio samples change very quickly and there are rarely sequences of identical bytes, so general-purpose data compression algorithms do not work well.

Finding more economical representations

By its nature, the PCM representation of sound waves is generally difficult to simplify without a necessarily lossy conversion into frequency sequences, such as takes place in the human ear.

In the case of audio data, the following can be exploited:

  • Similarities between the (stereo) channels,
  • Dependencies between successive samples (through decorrelation), and then
  • The entropy of the samples of the residual signal.


Channel coupling

By coupling channels, dependencies between channels can be exploited. By describing one channel as the difference from an existing channel or from a newly formed center channel, repeated description of common content can be avoided.

The difference signals can be stored losslessly, quantized lossily and coded accordingly, or, for example, abstracted into parametric descriptions.
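
As a minimal sketch of lossless channel coupling, the following Python code converts a left/right pair of integer samples into a "mid" (center) channel and a "side" (difference) channel and back again. The function names and the particular integer rounding scheme are assumptions made for illustration; individual codecs differ in the details.

```python
def to_mid_side(left, right):
    """Convert integer left/right samples to a mid (center) and a side (difference) channel."""
    mid = [(l + r) >> 1 for l, r in zip(left, right)]  # floor of the average
    side = [l - r for l, r in zip(left, right)]        # difference signal
    return mid, side


def from_mid_side(mid, side):
    """Invert to_mid_side exactly.

    The bit dropped when halving (l + r) is recovered from the side channel,
    because l + r and l - r always have the same parity.
    """
    left, right = [], []
    for m, s in zip(mid, side):
        total = (m << 1) | (s & 1)  # restores l + r
        left.append((total + s) >> 1)
        right.append((total - s) >> 1)
    return left, right
```

When both channels carry similar content, the side channel contains only small values, which a later entropy coding stage can store in few bits.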


To exploit the dependencies between successive samples, decorrelation is performed by attempting to predict the course of the waveform. From this a residual (difference) signal can be calculated which, if the prediction is good, is correspondingly weak (that is, it has few significant digits) and can be compressed further with an entropy coding method. To this end, in most cases samples are extrapolated from other samples using sophisticated adaptive prediction methods.
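
The idea of prediction and residual can be sketched in Python as follows. The fixed second-order predictor used here (a straight-line extrapolation from the two preceding samples) is an assumption chosen for brevity; it stands in for the adaptive predictors described above, and the function names are hypothetical.

```python
def residual(samples):
    """Residual of a fixed second-order predictor: prediction = 2*x[n-1] - x[n-2]."""
    out = list(samples[:2])  # the first two samples are kept verbatim as warm-up
    for n in range(2, len(samples)):
        prediction = 2 * samples[n - 1] - samples[n - 2]
        out.append(samples[n] - prediction)
    return out


def reconstruct(res):
    """Invert residual() exactly by running the same predictor on the decoded samples."""
    out = list(res[:2])
    for n in range(2, len(res)):
        prediction = 2 * out[n - 1] - out[n - 2]
        out.append(res[n] + prediction)
    return out
```

For a smoothly changing input such as `[100, 110, 121, 133, 146, 160]`, the residual after the warm-up samples consists only of the value 1, which is far cheaper to encode than the original sample values.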


The entropy coding of the decorrelated residual signal exploits the differing probabilities of occurrence of, and similarities between, the sample values. Rice codes, for example, are often used for this.

A method is symmetrical if decoding the signal runs through the same steps as encoding, in reverse order, so that the computational effort needed for decoding depends on the computational complexity of encoding.
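
A Rice code with parameter k can be sketched in Python as follows. The bit stream is represented as a string of "0" and "1" characters for readability, and signed residuals are first mapped to non-negative integers with a zigzag mapping; both choices, like the function names, are assumptions made for this illustration rather than the layout of any particular codec.

```python
def zigzag(v):
    """Map signed integers to non-negative ones: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    return 2 * v if v >= 0 else -2 * v - 1


def rice_encode(values, k):
    """Encode each value as a unary quotient followed by a k-bit binary remainder."""
    bits = []
    for v in values:
        u = zigzag(v)
        quotient, remainder = u >> k, u & ((1 << k) - 1)
        bits.append("1" * quotient + "0")  # unary part, terminated by a 0
        if k:
            bits.append(format(remainder, "0{}b".format(k)))
    return "".join(bits)


def rice_decode(bits, k, count):
    """Decode `count` values from a bit string produced by rice_encode."""
    values, pos = [], 0
    for _ in range(count):
        quotient = 0
        while bits[pos] == "1":
            quotient += 1
            pos += 1
        pos += 1  # skip the terminating 0 of the unary part
        remainder = int(bits[pos:pos + k], 2) if k else 0
        pos += k
        u = (quotient << k) | remainder
        values.append(u // 2 if u % 2 == 0 else -((u + 1) // 2))
    return values
```

Small residuals produce short code words, so the better the preceding prediction, the fewer bits the entropy coder needs.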

Process features

Since lossless codecs by definition rule out differences in the quality of the audio signal, the methods differ in the following features:

  • Compression rate
  • Direct playback of compressed data
  • Jump to any position in an audio stream
  • Resources required by the compression and decompression
  • Software and hardware support
  • Flexibility in dealing with metadata
  • Type of license
  • Cross-platform availability
  • Support for multi-channel signals
  • Support for different resolutions, both in time (sampling frequency) and in amplitude (bit depth)
  • Possibly additional lossy or hybrid modes (lossy file plus correction file)
  • Streaming support
  • Fault tolerance / correction mechanisms
  • Embedded checksums for quickly checking a file for completeness
  • Symmetrical and asymmetrical coding options (dependence or independence of decoding speed on encoding speed)
  • Support for creating self-extracting files
  • Compatibility with the Replay Gain standard
  • Support for embedded cue sheets
  • Option to preserve the header data of the original format

Lossless audio formats

Lossless audio formats are:

  • Apple Lossless (also: Apple Lossless Encoding, or Apple Lossless Audio Codec (ALAC))
  • Adaptive Transform Acoustic Coding - Advanced Lossless (ATRAC)
  • Free Lossless Audio Codec (FLAC)
  • LA Lossless Audio
  • Meridian Lossless Packing (MLP)
  • Monkey's Audio (APE)
  • MPEG-4 Audio Lossless Coding (ALS)
  • MPEG-1 Audio Layer 3 (mp3HD)
  • OptimFROG
  • Shorten
  • Tom's lossless Audio Kompressor (TAK)
  • True Audio (TTA)
  • WavPack (WV/WVC)
  • Windows Media Audio Lossless (WMA Lossless)
  • Emagic ZAP

Lossy audio compression

Lossy audio data compression (less precisely lossy audio compression, or, in the appropriate context, simply lossy compression) refers to methods that perform data reduction by selectively storing less relevant signal components, usually approximated with reduced precision, or by discarding them irretrievably.

In simple methods such as μ-law and A-law, only the individual samples of the PCM data stream are quantized, using a logarithmic characteristic curve that depends on the level. Methods such as ADPCM additionally exploit the correlations between successive samples. Modern methods are mostly based on frequency transforms in conjunction with psychoacoustic models that mimic the properties of the human (inner) ear and, in line with its limitations, reduce the precision with which masked signal components are represented. Specialized methods additionally use models that recreate the sound source and so enable sound synthesis at the receiver, that is, in the decoder; a large part of the signal can then be described by parameters that control the synthesizer.
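
The logarithmic characteristic of μ-law can be sketched in Python using the continuous companding formula with μ = 255. This is only an illustration of the principle: the function names are hypothetical, the uniform quantizer is a simplification, and the code is not the bit-exact segmented encoding used in telephony standards.

```python
import math

MU = 255  # companding parameter; 255 is the value commonly used for 8-bit mu-law


def mu_law_compress(x, mu=MU):
    """Map a sample x in [-1, 1] through the logarithmic mu-law curve."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y, mu=MU):
    """Inverse of mu_law_compress."""
    return math.copysign(((1 + mu) ** abs(y) - 1) / mu, y)


def quantize(y, bits=8):
    """Uniformly quantize y in [-1, 1] and return the reconstructed value."""
    levels = 2 ** (bits - 1) - 1
    return round(y * levels) / levels


def encode_decode(x, bits=8):
    """Compress, quantize uniformly, then expand again."""
    return mu_law_expand(quantize(mu_law_compress(x), bits))
```

Because the curve is steep near zero, quiet samples receive finer quantization steps than loud ones: for a quiet sample such as 0.01, `encode_decode(0.01)` lies much closer to the input than plain uniform quantization with `quantize(0.01)`.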


Most modern methods do not try to minimize the mathematical error, but rather to optimize the subjective human perception of the sound. Because human hearing cannot analyze all the information in an incoming sound, a sound file can be modified heavily without affecting the listener's subjective perception. A codec can, for example, store sound components in the very high and very low frequency ranges, which lie at the edge of audibility, with reduced precision, or in exceptional cases even discard them entirely. Quiet sounds can also be reproduced with less accuracy when they are masked by loud sounds at neighboring frequencies. Another kind of masking is that a quiet sound is not perceived if it comes immediately before or after a loud sound (temporal masking). Such a model of the ear-brain system, which is responsible for these effects, is often called a psychoacoustic model (also "psycho-model" or "psy-model"). It exploits properties of human hearing such as frequency grouping, the limits of the hearing range, masking effects and the signal processing of the inner ear.

Most lossy compression algorithms that work according to a psychoacoustic model are based on simple transforms, such as the modified discrete cosine transform (MDCT), which convert the recorded waveform into its frequency components and thereby find approximate representations of the source material that can be quantized efficiently, since the representation is closer to human perception. Some modern algorithms use wavelets, but it is not yet certain whether such algorithms work better than those based on the MDCT.

Lossy methods inherently allow only the reconstruction of an approximately similar signal. With many methods transparency can be achieved, that is, a degree of similarity at which human hearing can detect no difference from the original. Transparency marks the upper end of the quality scale and can be established in blind listening tests. In most cases there is roughly a threshold bit rate above which transparency is possible, although a greater or lesser risk remains of exceptional passages that cannot (yet) be encoded transparently. This risk usually decreases as the bit rate is increased further and depends, among other things, on the architecture of the method; more modern methods can often offer better mechanisms for handling such problem cases. Below the transparency threshold, the compression artifacts introduced into the signal become audible, although they may still be masked to some extent by the distortions that inferior playback equipment introduces. Once compression artifacts are perceptible, an objective comparison of different methods is much more difficult, because the judgment often depends largely on the subjective preferences of the listener. One criterion can be the naturalness of the sound, for example whether the artifacts resemble naturally occurring disturbances such as noise. At the lower end of the quality scale, speech codecs are usually judged by the intelligibility threshold, below which speech content can no longer be reproduced understandably.
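
The following Python sketch shows a direct (slow) MDCT with a sine window and 50 percent overlapping frames, together with the inverse transform and overlap-add. It is meant only to illustrate the transform itself; the function names, the choice of window and the 2/N scaling convention are assumptions, and the quantization step that makes a real codec lossy is deliberately left out.

```python
import math


def sine_window(N):
    """Sine window of length 2N; its squares at n and n + N sum to 1."""
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]


def mdct(frame, N):
    """MDCT of one (already windowed) frame of 2N samples, giving N coefficients."""
    return [sum(frame[n] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                for n in range(2 * N))
            for k in range(N)]


def imdct(coeffs, N):
    """Inverse MDCT: N coefficients give 2N time-aliased samples (scaled by 2/N)."""
    return [2.0 / N * sum(coeffs[k] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                          for k in range(N))
            for n in range(2 * N)]


def analyze(x, N):
    """Split x into frames of length 2N with hop size N and transform each frame."""
    w = sine_window(N)
    return [mdct([w[n] * x[start + n] for n in range(2 * N)], N)
            for start in range(0, len(x) - 2 * N + 1, N)]


def synthesize(frames, N):
    """Window the inverse transforms again and overlap-add them."""
    w = sine_window(N)
    out = [0.0] * (N * (len(frames) + 1))
    for i, coeffs in enumerate(frames):
        y = imdct(coeffs, N)
        for n in range(2 * N):
            out[i * N + n] += w[n] * y[n]
    return out
```

Each frame of 2N samples yields only N coefficients, and the aliasing this introduces cancels between neighboring frames during overlap-add. Without quantization, all samples except the first and last N (which are covered by only one frame) are therefore reconstructed up to rounding error.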

Compression artifacts

In compression methods based on frequency transforms, typical artifacts include a noticeably thinned-out, impoverished sound spectrum, which leads, for example, to chirping artifacts ("birdie artifacts") or a characteristic muffled bubbling or gurgling sound, and pre-echoes ("pre-echo artifacts") around sharp, high-energy sound events (transients).

Generation loss

Since the lossy stages of a lossy compression method generally introduce (further) losses on every pass, so-called generation loss occurs, for example in transcoding, when a compressed file is decompressed and then compressed again. In practice this happens especially when an audio CD is burned from lossy audio files (audio CDs are uncompressed) and the material is later read back and compressed again. This makes lossy files unsuitable for professional audio editing ("data reduction is audio destruction"). Such files are nevertheless very popular with end users, because, depending on the complexity of the audio material, roughly one megabyte is enough for a minute of music at acceptable quality, which corresponds to a compression ratio of about 1:11.

Exceptions include lossy prefilters intended for combination with lossless methods, such as lossyWAV, which process the PCM data so that a (specific) lossless compression method subsequently achieves greater compression. In this case the data produced by the prefilter can of course, at least as long as they are not subsequently altered, be compressed and decompressed with the lossless method as often as desired without suffering further losses.

Quality assessment

The following estimates are based on various listening tests conducted by an online forum that serves as a platform for interested and experienced users as well as developers of various audio compression methods such as MP3 (LAME encoder), Vorbis or Nero AAC. Owing to the large number of participating test subjects, statistically reliable statements about quality emerge.

From the development of MP3 (around 1987), through the first widespread use of the codec (around 1997-2000), to its becoming the world's most popular audio format (since about 2003), the output quality has improved steadily. Other formats such as Vorbis, WMA or AAC were likewise developed as alternatives to MP3 or as its long-term replacement, and these formats too have been developed continuously.

An MP3 file with a bit rate of about 128 kbit/s sounded very modest in 1997; the promised CD-like quality was not achieved at that time. In 2005, according to listening tests of that time, the LAME encoder delivered, in the same format at about 128 kbit/s, quality that was already transparent for the clear majority of listeners, that is, indistinguishable from the original recording.

According to a listening test from August 2007, comparable quality can already be achieved at 96 kbit/s using the AAC format.

The most recent listening tests, at bit rates of 48 and 64 kbit/s, show that even at these low bit rates a quality can be obtained that is suitable for portable devices or web radio.

As of August 2007 it can therefore be stated that, with a good encoder and appropriate settings, very good quality can be achieved at 96-128 kbit/s, which the clear majority of users cannot distinguish from the CD.

Lossy audio formats

In the examples below, where known, the bit rates given are those at which a compressed file is indistinguishable from the original for most people, that is, sounds transparent, assuming concentrated listening with good equipment and a mature codec for the respective compression scheme, and depending on the type of music. It must be noted, however, that not everyone perceives transparency at the same bit rate. The quality of the D/A converter, amplifier and loudspeakers plays an important role here. While lossy compression is usually clearly audible on studio equipment, even to laypeople, on inferior playback devices it may be indistinguishable from the original even for a professional. The figures are therefore approximate values for the average listener with average equipment. The bit rate of a CD is 1411.2 kbit/s (kilobits per second).
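
The CD bit rate follows directly from the CD audio format (44,100 samples per second, 16 bits per sample, two channels). The short Python sketch below reproduces this figure and relates it to a lossy bit rate of 128 kbit/s; the function names are hypothetical, and a megabyte is taken here as 1,000,000 bytes, which is an assumption.

```python
SAMPLE_RATE_HZ = 44_100  # CD audio sampling frequency
BITS_PER_SAMPLE = 16     # CD audio bit depth
CHANNELS = 2             # stereo


def pcm_bit_rate_kbits(sample_rate=SAMPLE_RATE_HZ, bits=BITS_PER_SAMPLE, channels=CHANNELS):
    """Bit rate of uncompressed PCM audio in kbit/s."""
    return sample_rate * bits * channels / 1000


def megabytes_per_minute(bit_rate_kbits):
    """Storage needed for one minute of audio, in megabytes of 1,000,000 bytes."""
    return bit_rate_kbits * 1000 * 60 / 8 / 1_000_000


def compression_ratio(bit_rate_kbits):
    """How many times smaller a stream at the given bit rate is than CD audio."""
    return pcm_bit_rate_kbits() / bit_rate_kbits
```

With these definitions, `pcm_bit_rate_kbits()` gives 1411.2 kbit/s, or about 10.6 megabytes per minute, while a file at 128 kbit/s needs about 0.96 megabytes per minute, a ratio of roughly 1:11.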

For comparisons of various audio codecs, see Related links.

  • AC-3 (Dolby Digital or similar names)
  • ATRAC3plus (in Hi-MD and other portable audio devices from Sony): 48-352 kbit/s
  • DTS
  • mp3PRO
  • Opus
  • WMA
  • LPEC
  • TwinVQ