MP3

Magic number: FFFB (hex), \xFF\xFB (ASCII C notation)

MP3 (preferred spelling: mp3, after the file name extension; properly MPEG-1 Audio Layer III or MPEG-2 Audio Layer III) is a method for the lossy compression of digitally stored audio data. MP3 uses psychoacoustics with the aim of storing only those signal components that humans can perceive. This allows a strong reduction in the amount of data with little or no loss of perceived audio quality.

At an example data rate of 192 kbit/s, which already allows high quality, an MP3 audio file is about seven times smaller than the same material on an audio CD. MP3 is the dominant method for storing and transferring music on computers, smartphones, the Internet and portable music players (MP3 players), although a number of technically more advanced alternatives now exist. The method was developed mainly in Germany under the direction of Karlheinz Brandenburg and Hans-Georg Musmann.

History

The MP3 format was developed from 1982 onward under the direction of Hans-Georg Musmann by a group led by Karlheinz Brandenburg at the Fraunhofer Institute for Integrated Circuits (IIS) in Erlangen and at the Friedrich-Alexander University of Erlangen-Nuremberg, in cooperation with AT&T Bell Labs and Thomson. From 1989, development continued within ISO/IEC JTC1 SC29 WG11 (MPEG). In 1992 the format was standardized as part of MPEG-1. The history of the standardization and an appreciation of the researchers' contributions are presented in "Genesis of the MP3 audio coding standard" by Hans Georg Musmann, IEEE Transactions on Consumer Electronics, Vol. 52, No. 3, pp. 1043-1049, August 2006. The file name extension .mp3 (an abbreviation for ISO MPEG Audio Layer 3) was chosen after an internal survey at the institute on 14 July 1995; before that, the extension .bit had been used internally. As with many current coding methods, core parts of MP3 are protected by patents. Brandenburg has received several awards for the development of this data format.

In the mid-1990s, players and software for personal computers came into circulation that made it possible to store and play compressed MP3 files. Exchanging such files over the Internet also became easier: even at plain ISDN speed, transmission took only two to three times the playing time, and with DSL lines the transmission time was far below the playing time. This soon led to lively file sharing without regard to copyright. The music industry's attempts to counter it have so far had only moderate success, especially as the sharing systems keep evolving and, following the peer-to-peer principle, operate without central, controllable instances. By the late 1990s, large collections of music files had already emerged on the Internet, for example at MP3.com or Napster, which caused the number of users to rise sharply. From 1998, the first portable MP3 players appeared on the market.

Patents and licensing disputes

The Fraunhofer-Gesellschaft and other companies hold software patents on partial methods used for MPEG encoding. An all-encompassing MP3 patent does not exist. The Fraunhofer-Gesellschaft contributed the largest part to the development of the MP3 standard and was able to patent some methods for MP3 encoding. Together with Thomson, it holds 18 MP3-related patents. Since September 1998, by which time the MP3 standard had been established for six years, FhG/Thomson has demanded royalties for the production of hardware and software that use the MP3 format.

Patents of Bell Laboratories are also said to have been drawn on in the development of the format. These rights now lie with Alcatel-Lucent, which took over Bell Labs. The company filed patent lawsuits against Microsoft, Dell and Gateway a few years ago. In the case against Microsoft, Lucent was awarded 1.52 billion U.S. dollars in the first instance in February 2007. This verdict, however, was overturned in August 2007 by the federal district court in San Diego. The company Sisvel also asserts patent infringement claims on behalf of Philips.

Method

Like most lossy compression formats for music, the MP3 method exploits psychoacoustic effects in the human perception of tones and noise. For example, a person can distinguish two tones only if they differ in pitch by a certain minimum amount, and shortly before and after very loud sounds, quieter sounds are perceived poorly or not at all. The original signal therefore does not have to be stored exactly; it is sufficient to store the signal components that the human ear can actually perceive. The task of the encoder is to process the original audio signal according to fixed rules derived from psychoacoustics so that it needs less storage space but still sounds exactly like the original to the human ear. When the original and the MP3 version are subjectively indistinguishable to the listener, this is called transparency. The data or information that the encoder removes from the original signal, for example from an audio CD, is irretrievably lost: it is no longer present in the MP3 signal and cannot be reconstructed, even in principle. This explains the term lossy compression. There are also lossless methods for audio data compression, such as FLAC, but they achieve much lower compression ratios and are less widespread.

When the MP3 signal is played back, the decoder generates from the reduced data an analog audio signal that sounds like the original to the vast majority of listeners but is not identical to the original signal, because information was removed during conversion to the MP3 format. If one compared the waveform of the MP3 audio signal with that of the original, for example on the screen of an oscilloscope, clear differences would be visible. Because of the psychoacoustics of human perception described above, however, the MP3 signal sounds just like the original to a listener, provided that a mature encoder and a sufficiently high data rate (bit rate) were used for encoding.

While decoding always follows a fixed algorithm, encoding can be done by different algorithms (e.g. Fraunhofer encoder, LAME encoder) and accordingly yields different acoustic results. Whether a loss of quality is perceptible to some or even many listeners depends, among other things, on the quality of the encoder, the complexity of the signal, the data rate, the audio equipment used (amplifier, connecting cables, speakers) and finally on the listener's hearing. In addition to fixed data rates from 8 kbit/s to 320 kbit/s, the MP3 format allows arbitrary data rates up to 640 kbit/s in free-format mode (free-format MP3). However, few MP3 player decoders are designed for bit rates higher than those of the ISO standard (currently up to 320 kbit/s).

Impressions of quality are quite subjective and vary from person to person and from ear to ear. Above a certain bit rate and with a mature encoder, most people cannot distinguish the encoded material from the source material even when listening attentively. Nevertheless, in a listening test by the magazine c't, certain pieces of music could be distinguished from CD quality even at 256 kbit/s. That test was carried out in 2000, however, and MP3 encoders have improved considerably since then. For people with "abnormal" hearing (e.g. hearing damage caused by acoustic trauma), the mechanisms used sometimes do not work as intended, so that differences between encoded and source material become more noticeable (e.g. because loud tones that the damaged ear hears poorly can no longer mask other tones well).

Besides encoding at a constant data rate (and thus with a quality that varies with the changing complexity of the audio signal), encoding at constant quality (and thus with a varying data rate) is also possible. This (mostly) avoids drops in quality at passages that are difficult to encode, while saving data rate, and thus final file size, in quiet or completely silent passages of the audio stream. The quality level is specified, and the result is the smallest file needed to achieve it.

Data compression

  • A first step of data compression is based, for example, on channel coupling of the stereo signal by forming the difference, since the data of the right and left channels are highly correlated, that is, very similar. This is a lossless method; the original signals can be reconstructed completely (Mid/Side stereo).
  • Following the human hearing curve, signal components in frequency ranges that are perceived less precisely are represented with less precision by quantizing the Fourier-transformed data accordingly.
  • So-called masking effects are exploited to store signal components that matter less to the listening impression with reduced precision. These may be, for example, weak frequency components in the vicinity of strong overtones. A strong tone at 4 kHz can even mask frequencies up to 11 kHz. The largest saving in MP3 encoding therefore comes from storing tones only just accurately enough (with just enough bits) that the resulting quantization noise is still masked and thus inaudible.
  • The data, organized in so-called "frames", are finally Huffman coded.

With very heavy compression, clearly audible signal components are often affected as well; these are then heard as compression artifacts.
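The relationship between word length and quantization noise mentioned in the list above can be illustrated with a minimal Python sketch. It is not taken from any MP3 encoder: the uniform quantizer, the test tone and the function names are assumptions chosen for illustration. It only shows that each bit saved raises the noise floor by roughly 6 dB, which is the noise an encoder tries to keep below the masking threshold.

```python
import math

def quantize(x, bits):
    """Uniformly quantize a value in [-1, 1] to the given number of bits."""
    levels = 2 ** (bits - 1)          # quantization steps per polarity
    return round(x * levels) / levels

def snr_db(bits, n=4800, freq=997.0, rate=48000.0):
    """Signal-to-noise ratio (dB) of a full-scale sine after quantization."""
    signal = [math.sin(2 * math.pi * freq * i / rate) for i in range(n)]
    noise = [s - quantize(s, bits) for s in signal]
    p_signal = sum(s * s for s in signal)
    p_noise = sum(e * e for e in noise)
    return 10 * math.log10(p_signal / p_noise)

for bits in (4, 8, 12, 16):
    print(bits, "bits:", round(snr_db(bits), 1), "dB")
```

Whether a given noise level is audible depends on the masking situation in the respective frequency band, which this sketch does not model.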

A design flaw is that the method works block by block, so gaps can arise at the end of a file. This is a nuisance, for example, with audiobooks in which a continuous performance was split into individual tracks to make passages easier to find. Here the final blocks stand out as disturbing pauses. A remedy is to use the LAME encoder, which adds exact length information, in combination with a player that can handle it, such as foobar2000 or Winamp. Some player programs, such as Windows Media Player, do not support this method of gapless playback. Apple iTunes supports it from version 7.

Compression in detail

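The compression consists of the following steps, each described in one of the sections below:

  1. Sub-band transform of the signal
  2. MDCT transform of the signal
  3. Matrixing (for multi-channel signals)
  4. Quantization
  5. Huffman encoding

Step 4 is lossy; the main data reduction results from this step and step 5.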

Note: The spectral widths and times given in the text refer to an audio signal with a sampling frequency of 48 kHz.

Sub-band transform of the signal

In the sub-band transform, the signal is split by a polyphase filter bank into 32 frequency bands of equal width (as in MPEG Layer 1, MPEG Layer 2 and DTS). The filter bank operates on a FIFO buffer of 512 samples, to which 32 new samples are fed in each step. Thus 16 filter windows always overlap on the audio signal.
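The buffer handling described above can be sketched in Python as follows. This is only an illustration of the sliding 512-sample window under the stated numbers; the function name is hypothetical, and the actual filtering (window coefficients and matrixing into 32 sub-bands) is omitted.

```python
from collections import deque

BUFFER_SIZE = 512   # samples held by the FIFO
HOP = 32            # new samples shifted in per step

def filterbank_windows(samples):
    """Yield the 512-sample window seen by the filter bank at each step."""
    fifo = deque([0.0] * BUFFER_SIZE, maxlen=BUFFER_SIZE)
    for start in range(0, len(samples) - HOP + 1, HOP):
        fifo.extend(samples[start:start + HOP])  # the oldest 32 samples drop out
        yield list(fifo)

# Each input sample stays in the buffer for 512 / 32 = 16 steps,
# so 16 successive windows overlap at any point of the signal.
windows = list(filterbank_windows(list(range(2048))))
overlap = sum(1 for w in windows if 1000 in w)
print(len(windows), overlap)
```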

The decision to use frequency bands of equal width simplifies the filters but does not reflect human hearing, whose sensitivity depends nonlinearly on frequency.

Since ideal filters do not exist in practice, the frequency ranges overlap, so that after filtering a single frequency can appear in two adjacent sub-bands.

Sub-band filtering is encumbered by U.S. patent 6,199,039.

MDCT transform of the signal

The signals of the sub-bands are then transformed into the frequency domain by the modified discrete cosine transform (MDCT). This resolves the frequency bands further spectrally. The MDCT transforms the bands either in short blocks (12 samples yield 6 frequency bands) or in long blocks (36 samples, 18 frequency bands). Alternatively, the two lowest frequency bands can be transformed with long blocks and the remaining ones with short blocks. Long blocks have better frequency resolution and are suitable when the audio signal does not change abruptly within the frame (stationarity).
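The halving of the number of values by the MDCT can be illustrated with a naive Python sketch of the textbook MDCT definition. It is an assumption-laden illustration, not the transform of a real encoder: the window functions and any further processing used in practice are omitted, and the input data is arbitrary.

```python
import math

def mdct(block):
    """Naive MDCT: 2N input samples -> N coefficients (no windowing)."""
    n_out = len(block) // 2
    coeffs = []
    for k in range(n_out):
        acc = 0.0
        for n, x in enumerate(block):
            acc += x * math.cos(math.pi / n_out * (n + 0.5 + n_out / 2) * (k + 0.5))
        coeffs.append(acc)
    return coeffs

long_block = [math.sin(0.3 * n) for n in range(36)]   # arbitrary test data
short_block = long_block[:12]
print(len(mdct(long_block)), len(mdct(short_block)))
```

A long block of 36 samples yields 18 coefficients and a short block of 12 samples yields 6, matching the figures in the text.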

At the output of the MDCT, the signal is divided into blocks. From 576 input values (taking the window widths of the filters into account, there are actually 1663 input values in total), the two cascaded transforms generate either

  • 576 spectral coefficients (long blocks),
  • 3 × 192 spectral coefficients (short blocks) or
  • 36 + 3 × 180 spectral coefficients (hybrid block, hardly used).
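The three coefficient counts follow from the 32 sub-bands and the block lengths given above, as this short Python calculation shows (the constant names are chosen here for illustration only):

```python
SUBBANDS = 32
LONG_LINES = 18     # frequency lines per sub-band in a long block
SHORT_LINES = 6     # frequency lines per sub-band in one short block
SHORT_BLOCKS = 3    # short blocks that replace one long block

long_total = SUBBANDS * LONG_LINES
short_total = SHORT_BLOCKS * SUBBANDS * SHORT_LINES
hybrid_long = 2 * LONG_LINES                                  # two lowest sub-bands
hybrid_short = SHORT_BLOCKS * (SUBBANDS - 2) * SHORT_LINES    # remaining 30 sub-bands
print(long_total, short_total, hybrid_long, hybrid_short)
```

In every case the total is 576 values, so the block type changes only how time and frequency resolution are traded against each other.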

Matrixing

For two-channel stereo signals, it can now be decided whether the signal is to be encoded as mono (single channel), stereo, joint stereo or dual channel. In contrast to AAC or Ogg Vorbis, this decision is made globally for all frequencies.

The stereo method (not joint stereo), like dual channel, is lossy in that even at 320 kbit/s only 160 kbit/s are available per channel; depending on complexity, however, the two channels can be assigned different bit rates. Dual channel stores two independent mono tracks (for example bilingual voice tracks) encoded at the same bit rate; not every decoder, however, necessarily plays back both tracks at the same time.

For joint stereo there are two coding methods, intensity stereo and Mid/Side stereo, which can be used individually or in combination. Both methods form a mid channel from the sum of the two channels (L + R) and a side channel from the level difference of the two channels (L - R). In intensity stereo, unlike in Mid/Side stereo, the phase (time offset) of the signal is disregarded. The joint stereo method exploits the frequent redundancy between the stereo channels in order to encode the signals at a higher effective bit rate than the plain stereo method; if the channel signals are very dissimilar, joint stereo falls back to normal stereo encoding.
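The sum and difference coding described above can be sketched in Python as follows. This is a simplified illustration on integer sample values, assuming plain sums and differences; actual MP3 encoders apply the technique to spectral values and use a scaling factor, which is omitted here. The function names and sample data are hypothetical.

```python
def ms_encode(left, right):
    """Turn left/right integer samples into mid (sum) and side (difference)."""
    mid = [l + r for l, r in zip(left, right)]
    side = [l - r for l, r in zip(left, right)]
    return mid, side

def ms_decode(mid, side):
    """Recover left/right exactly: mid + side = 2L and mid - side = 2R."""
    left = [(m + s) // 2 for m, s in zip(mid, side)]
    right = [(m - s) // 2 for m, s in zip(mid, side)]
    return left, right

left = [1000, 1200, -300, 50]
right = [990, 1210, -310, 40]
mid, side = ms_encode(left, right)
print(side)  # small values when the two channels are similar
```

Because the side channel of similar stereo channels contains only small values, it can be coded with few bits, while the round trip through `ms_decode` restores both channels without loss.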

Since the audio signal is first split into frequency bands, the stereo information, to the extent that it is relevant to hearing at all, is likewise encoded band by band. Here, for example at low frequencies or at frequencies above 2 kHz, information can be saved by no longer encoding signals that cannot be localized true to channel, but instead coding them together with adjacent frequency bands (intensity stereo) or placing them in the stereo center.

With the continued development of the codecs, joint stereo is now regarded as the best solution for the strongly similar stereo channels that are customary in music, because of its better compression ratio, higher effective bit rate and lossless (except at low frequencies) stereo image.

Quantization

Quantization is the essential step at which the losses occur during encoding. It is chiefly responsible for the reduction in the amount of data.

Neighboring frequency bands are combined into groups of 4 to 18 bins. Each group receives a common scale factor $s = 2^{N/4}$, with which its values are quantized. The scale factor determines the accuracy with which this frequency band is coded. Smaller scale factors yield a more accurate coding, larger ones a less precise coding (or no values other than 0 at all).

From the values $x_0, x_1, \ldots, x_{17}$, the values of $N$ and $Q_0, Q_1, \ldots, Q_{17}$ are determined using the relationship $x_i \approx Q_i^{4/3} \cdot 2^{N/4}$.

The nonlinear coding $Q^{4/3}$ (for negative values: $-(-Q)^{4/3}$) was first introduced in MP3 encoding. MPEG Layer 1 and Layer 2 use linear coding.
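The relationship between spectral values, scale factor and quantized values can be tried out with a minimal Python sketch. It assumes simple rounding to the nearest integer and uses made-up input values; a real encoder chooses N with the help of the psychoacoustic model and may round differently. The function names are hypothetical.

```python
def quantize(x, n):
    """Quantize x with scale factor 2**(n/4) and the 3/4 power law."""
    scale = 2.0 ** (n / 4)
    q = round((abs(x) / scale) ** 0.75)
    return q if x >= 0 else -q

def dequantize(q, n):
    """Invert the quantizer: x ~ sign(Q) * |Q|**(4/3) * 2**(N/4)."""
    scale = 2.0 ** (n / 4)
    magnitude = abs(q) ** (4.0 / 3.0) * scale
    return magnitude if q >= 0 else -magnitude

values = [0.5, -3.0, 40.0, 900.0]   # made-up spectral values
for n in (0, 8, 24):
    qs = [quantize(x, n) for x in values]
    xs = [round(dequantize(q, n), 1) for q in qs]
    print(n, qs, xs)
```

With increasing N the quantized values become smaller and small inputs collapse to 0, which is exactly where the data reduction and the loss of precision come from.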

This step essentially determines both the quality and the data rate of the resulting MP3 data stream. It is supported by a psychoacoustic model that attempts to replicate the processes in the average human ear and controls the choice of scale factors.

Huffman Encoding

The scale factors $N$ and the quantized amplitudes $Q$ of the individual frequencies are Huffman coded using fixed code tables.
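Coding with a fixed table can be sketched in Python as follows. The table below is hypothetical and far smaller than the tables defined in the standard; it only illustrates the principle that frequent small values receive short, prefix-free code words.

```python
# Hypothetical prefix-free table: small values, which occur most often
# after quantization, get the shortest code words.
CODE_TABLE = {0: "1", 1: "01", 2: "001", 3: "000"}
DECODE_TABLE = {bits: value for value, bits in CODE_TABLE.items()}

def encode(values):
    """Concatenate the code word of every value into one bit string."""
    return "".join(CODE_TABLE[v] for v in values)

def decode(bitstring):
    """Read bits until they match a code word, emit its value, repeat."""
    values, current = [], ""
    for bit in bitstring:
        current += bit
        if current in DECODE_TABLE:
            values.append(DECODE_TABLE[current])
            current = ""
    return values

quantized = [0, 0, 1, 0, 3, 0, 0, 2]
bits = encode(quantized)
print(bits, len(bits), "bits instead of", 2 * len(quantized))
```

Because no code word is the prefix of another, the decoder needs no separators between the code words.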

The final MP3 file consists of a sequence of frames, each beginning with a start marker (sync) and containing one or two blocks produced in the manner described above.
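Locating frame starts can be sketched in Python as follows, assuming the commonly used check for 11 set sync bits (a 0xFF byte followed by a byte whose three highest bits are set, as in FFFB). The function name and the byte string are made up for illustration; a real parser must additionally validate the rest of the header, because the same bit pattern can occur by chance inside the audio data.

```python
def find_frame_syncs(data):
    """Return byte offsets where a frame sync pattern starts."""
    offsets = []
    for i in range(len(data) - 1):
        if data[i] == 0xFF and (data[i + 1] & 0xE0) == 0xE0:
            offsets.append(i)
    return offsets

# Made-up byte string for illustration, not a real MP3 stream.
sample = bytes([0x00, 0xFF, 0xFB, 0x90, 0x00, 0x12, 0xFF, 0x10, 0xFF, 0xFB])
print(find_frame_syncs(sample))
```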

Decompression

During decompression, the compression steps are carried out in reverse order. After Huffman decoding, the data are prepared by inverse quantization for the inverse modified discrete cosine transform (IMDCT). This passes its data on to an inverse filter bank, which then computes the output samples (which differ from the original because of the lossy quantization during encoding).

Development

MP3 is a very widespread format, especially on the Internet. In industry, it is used mainly for PC games. It is a proprietary format that was developed as a successor to MP2 and incorporated into the ISO standard.

At that time, the industry was already working on the MDCT-based AAC, which has a cleaner design and delivers better results at comparable effort.

In addition to developments toward high-quality encoding, there are also developments aimed at achieving acceptable sound quality even at very low data rates (below 96 kbit/s). Representatives of this category are mp3PRO and MPEG-4 AAC HE (also known as AAC+). Transparency, however, can be achieved with these methods only through High Definition (HD) AAC (AAC LC + SLS).

An extension to multi-channel capability is offered by the MP3 Surround format from the Fraunhofer Institute for Integrated Circuits IIS. MP3 Surround allows the playback of 5.1 sound at bit rates comparable to those of stereo sound and is fully backward compatible. Conventional MP3 decoders thus decode the signal in stereo, while MP3 Surround decoders produce full 5.1 multi-channel sound.

For this purpose, the multi-channel material is mixed down to a stereo signal and encoded by a regular MP3 encoder. At the same time, the spatial information from the original is inserted as surround extension data into the "ancillary data" field of the MP3 bitstream. The MP3 data can then be played by any MP3 decoder as a stereo signal. The MP3 Surround decoder uses the inserted extension data to reproduce the full multi-channel audio signal.

Further developments concern methods for copyright protection, which might be implemented in future versions.

Application

Raw audio material requires a lot of storage space (1 minute of stereo in CD quality takes about 10 MB) and, for transfer (e.g. over the Internet), high data rates or a lot of time. Lossless compression does not reduce the amount of data to be transmitted as much as lossy methods, which still deliver acceptable quality in most cases (exceptions include studio applications and archiving). The MP3 format thus quickly attained for audio data the status that JPEG compression has for image data.
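The storage figure for CD-quality audio can be checked with a short Python calculation. The audio CD parameters (44.1 kHz, 2 channels, 16 bits per sample) are standard; the MP3 rate of 192 kbit/s is merely an example value assumed here, and the function names are chosen for illustration.

```python
def pcm_bytes(seconds, rate=44100, channels=2, bits_per_sample=16):
    """Size of uncompressed PCM audio (audio CD parameters by default)."""
    return seconds * rate * channels * bits_per_sample // 8

def mp3_bytes(seconds, kbit_per_s):
    """Approximate size of a constant-bit-rate MP3 stream, ignoring tags."""
    return seconds * kbit_per_s * 1000 // 8

cd_minute = pcm_bytes(60)
mp3_minute = mp3_bytes(60, 192)
print(cd_minute, mp3_minute, round(cd_minute / mp3_minute, 2))
```

One minute of CD audio thus occupies a little over 10 million bytes, while the same minute at the assumed MP3 rate needs about a seventh of that.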

MP3 became known to the general public mainly through music file-sharing networks. In the warez scene, the MP3 audio format is used as the soundtrack in many DVD rips. CD ripper programs make it possible to extract the music from audio CDs and write it to MP3 files. There are also many programs that convert MP3 into another format and vice versa (example: the audio track of a YouTube video (FLV) is converted into an MP3 file). Another major application is the MP3 player, which makes it possible to listen to music on the move. Today most smartphones also support MP3 files.

On the WWW there are numerous applications of MP3 technology, from self-composed music and (self-)narrated audiobooks, radio plays, bird calls and other sounds to podcasting. Musicians can now distribute their music worldwide even without a distributor and make sound recordings available on a website without much effort (apart from GEMA fees, which apply even to one's own compositions if they are registered with GEMA). Users can find all kinds of (non-commercial) sounds and genres via search engines.

In multimedia software as well, especially PC games, the often numerous audio files are stored in MP3 format. MP3 is also used by numerous, mostly smaller, online music stores.

Tagging

Unlike more modern codecs, MP3 files originally offered no way to store metadata (e.g. title, artist, album, year, genre) about the piece of music they contain.

Independently of the developers of the format, a solution was found that is supported by almost all software and hardware players: ID3 tags are simply attached to the beginning or the end of the MP3 file. In the first version (ID3v1) they are appended at the end and are limited to 30 characters per entry and a few standard entries. The much more flexible version 2 (ID3v2), however, is not supported by all MP3 players (especially hardware players), because here the tags are inserted at the beginning of the MP3 file. There are also considerable differences within ID3v2. The most common versions are ID3v2.3 and ID3v2.4; only ID3v2.4 officially permits UTF-8 encoded characters (previously only ISO-8859-1 and UTF-16 were allowed). Many hardware players display UTF-8 tags only as garbled characters. Because ID3v2 tags are at the beginning of the file, this data can be read, for example during a transfer over HTTP, without first reading the whole file or requesting several parts of it. To avoid having to rewrite the whole file when changes are made, padding is usually used, that is, space reserved in advance for such changes.
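Reading the two tag variants can be sketched in Python as follows. The sketch assumes the widely documented layouts: a 128-byte ID3v1 block at the end of the file starting with "TAG", and a 10-byte ID3v2 header at the beginning starting with "ID3" whose last four bytes carry the tag size with 7 usable bits each. Function names and test data are made up; ID3v2 frames themselves are not parsed.

```python
def parse_id3v1(data):
    """Parse an ID3v1 tag from the last 128 bytes of a file, if present."""
    tag = data[-128:]
    if len(tag) < 128 or tag[:3] != b"TAG":
        return None

    def text(raw):
        # Fields are fixed-width, padded with NUL bytes or spaces.
        return raw.split(b"\x00")[0].decode("latin-1").strip()

    return {
        "title": text(tag[3:33]),
        "artist": text(tag[33:63]),
        "album": text(tag[63:93]),
        "year": text(tag[93:97]),
        "comment": text(tag[97:127]),
        "genre": tag[127],
    }

def id3v2_tag_size(data):
    """Return the ID3v2 tag body size from the 10-byte header, or None."""
    if len(data) < 10 or data[:3] != b"ID3":
        return None
    size = 0
    for byte in data[6:10]:          # four size bytes, 7 bits each
        size = (size << 7) | (byte & 0x7F)
    return size

# Synthetic example data, built here only to exercise the parsers.
v1 = (b"TAG" + b"Example Title".ljust(30, b"\x00")
      + b"Example Artist".ljust(30, b"\x00") + bytes(30)
      + b"2000" + bytes(30) + bytes([12]))
info = parse_id3v1(b"audio..." + v1)
print(info)
print(id3v2_tag_size(b"ID3\x03\x00\x00\x00\x00\x02\x01"))
```

The fixed 30-byte fields make the length limit of ID3v1 visible directly in the slice boundaries.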

The metadata from the ID3 tag can be used, for example, to display information about the piece currently playing, to sort the tracks in playlists or to organize archives.

Specification

Frame header

Frame data

The frame header is followed (optionally after a CRC) by the frame data, which contain the encoded audio data. A frame contains about 26 ms of audio data (1152 samples at a sampling rate of 44.1 kHz); the corresponding data length can be calculated from the properties specified in the header. The size of a frame is then calculated from the bit rate, the sampling rate and the padding bit, with the division carried out as integer division.
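The frame size calculation can be sketched in Python as follows, assuming the relation commonly given for MPEG-1 Layer III: frame size in bytes = 144 × bit rate / sampling rate + padding, with integer division. The function names are chosen here for illustration, and other MPEG versions or layers use different constants.

```python
def frame_size_bytes(bit_rate, sample_rate, padding=0):
    """Frame length in bytes for MPEG-1 Layer III.

    bit_rate in bit/s, sample_rate in Hz, padding is the header's
    padding bit (0 or 1). The division is an integer division.
    """
    return 144 * bit_rate // sample_rate + padding

def frame_duration_ms(sample_rate, samples_per_frame=1152):
    """Playing time of one frame in milliseconds."""
    return 1000.0 * samples_per_frame / sample_rate

print(frame_size_bytes(128000, 44100))      # unpadded frame
print(frame_size_bytes(128000, 44100, 1))   # padded frame
print(round(frame_duration_ms(44100), 2))
```

The padding bit compensates for the remainder lost in the integer division, so that the average data rate matches the nominal bit rate.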

If a complex passage of music produces more data than can be stored in one frame, MP3 provides the so-called "byte reservoir" (usually called the bit reservoir). This storage area serves as additional space and extends the data of the corresponding frame. To this end, the encoder encodes preceding passages at a lower data rate and thus does not fill earlier frames completely, which creates the reservoir. The free space created in this way can then be used for the larger amount of data of more complex passages. The maximum size of this reservoir is 511 bytes, and only preceding frames may be filled.
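The effect of the reservoir can be illustrated with a strongly simplified Python model. It assumes a fixed frame capacity and made-up data demands per frame and only tracks byte counts; how a real encoder decides on the demand of each passage is not modeled, and the function name is hypothetical.

```python
MAX_RESERVOIR = 511  # bytes, the maximum stated above

def fill_frames(demands, frame_capacity):
    """Distribute per-frame data demands using a capped reservoir.

    demands: bytes each passage would like to use.
    frame_capacity: bytes available in every frame at the chosen bit rate.
    Returns the bytes actually granted to each passage.
    """
    reservoir = 0
    granted = []
    for demand in demands:
        available = frame_capacity + reservoir   # own frame plus saved space
        used = min(demand, available)
        granted.append(used)
        # Unused space carries over to later frames, up to the maximum.
        reservoir = min(available - used, MAX_RESERVOIR)
    return granted

print(fill_frames([300, 300, 600], 417))
print(fill_frames([0, 0, 1000], 417))
```

In the first call the third passage receives more than one frame's worth of data because the two simple passages before it left space unused; in the second call the cap of 511 bytes limits how much can be borrowed.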

Common implementations

For encoding MP3 files, the licensed encoder of the Fraunhofer-Gesellschaft and the encoder of the open-source project LAME are available. There are also the ISO reference encoder dist10 and other projects such as Xing, Blade and Gogo.

Decoders include mpg123, MAD, libavcodec and others.

Alternative formats

Besides MP3, there are numerous other audio formats. The Vorbis format is open source and is described by its developers as patent-free, in contrast to MP3. In technical analyses and blind tests, Vorbis has proved superior to MP3 especially at low and medium bit rates, while at high bit rates (around 256 kbit/s) the lead is minimal. In addition, Ogg Vorbis offers multi-channel support, and the Ogg container format can also carry video and text data. The latter, however, is supported by only very few MP3 players and radios.

RealAudio from RealMedia was used primarily for audio data streams (streaming audio).

The free Musepack (formerly MPEGplus), which is based on MP2 algorithms, was developed to allow even better quality than the MP3 format at bit rates above 160 kbit/s. It was unable to spread widely, however, since it is aimed more at enthusiasts in the high-end sector and is hardly supported in the commercial sector. Files in the Musepack format can be recognized by the extension .mpc or .mp+.

Advanced Audio Coding (AAC) is a method standardized within MPEG-2 and MPEG-4 that was developed by several large companies. Apple and RealMedia use this format for their online music stores, and Nero AG provides an encoder for the format. A free encoder, faac, is also available. At low bit rates up to about 160 kbit/s, AAC is superior to MP3 in sound quality (the lower the bit rate, the more clearly), allows multi-channel audio and is broadly supported by industry (for example in mobile phones and MP3 players).

Windows Media Audio (WMA) is an audio format developed by Microsoft and is often used for DRM-protected downloads. Although it can be played on many common platforms, it has not been able to prevail against the MP3 format.

Trivia

The team led by Brandenburg carried out its first practical tests with the a cappella version of the song Tom's Diner by Suzanne Vega. Brandenburg had heard the song by chance and immediately regarded the piece as a suitable challenge for audio data compression.
