Redundancy (information theory)

In information theory, redundancy indicates how much information per character a source emits, on average, beyond what is strictly necessary. A unit of information is redundant if it can be omitted without loss of information. Identifying and removing such redundancies is called deduplication.


Message and information transmission

Redundancy is the portion of a message that carries no information; the redundant portion can be a function of the information contained in the message. In information technology and communications engineering, redundancy is used deliberately to detect errors. Stronger redundancy enables not only the detection of errors but also their correction. Redundancy thus allows an increase in quality (fewer defects) at the expense of quantity (a higher data rate). How strong the redundancy should be depends on the fault tolerance of the application: in banking or space flight a single bit flip can be very costly, while in Internet telephony or DVB even the occasional loss of entire packets is tolerable.
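The simplest form of deliberate redundancy for error detection is a parity bit. A minimal Python sketch (function names are illustrative):

```python
def add_parity(bits):
    """Append an even-parity bit: the redundant bit carries no new
    information, but it lets the receiver detect any single-bit error."""
    return bits + [sum(bits) % 2]

def check_parity(bits):
    """True if the received word has even parity, i.e. no single-bit error."""
    return sum(bits) % 2 == 0

word = add_parity([1, 0, 1, 1])   # -> [1, 0, 1, 1, 1]
assert check_parity(word)

corrupted = word[:]
corrupted[2] ^= 1                 # flip one bit in transit
assert not check_parity(corrupted)
```

One extra bit per word is enough to detect any single-bit error, but not to locate or correct it; correction requires stronger redundancy, as discussed below.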

Fault tolerance

Communication over an information channel can be made fault-tolerant by introducing redundancy: lost or corrupted partial information can then be reconstructed by the receiver from its context. A measure of this fault tolerance is the Hamming distance of the code.
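The Hamming distance counts the positions in which two equal-length words differ. A short Python sketch:

```python
def hamming_distance(a, b):
    """Number of positions in which two equal-length words differ."""
    if len(a) != len(b):
        raise ValueError("words must have equal length")
    return sum(x != y for x, y in zip(a, b))

# A code whose codewords have minimum Hamming distance d can detect
# up to d - 1 errors and correct up to (d - 1) // 2 of them.
print(hamming_distance("10110", "10011"))  # 2
```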

Average codeword length

The average codeword length L(C) of a source code C(z) for a source with probability distribution p(z) is given by:

  L(C) = Σ_{z ∈ Z} p(z) · ℓ(z),

where ℓ(z) is the length of the codeword C(z).
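The formula above weights each codeword length by the probability of its symbol. A minimal Python sketch (the example code and distribution are illustrative):

```python
def average_codeword_length(code, p):
    """L(C) = sum over z of p(z) * len(C(z))."""
    return sum(p[z] * len(code[z]) for z in code)

# A prefix-free code for a three-symbol source:
code = {"a": "0", "b": "10", "c": "11"}
p    = {"a": 0.5, "b": 0.25, "c": 0.25}

print(average_codeword_length(code, p))  # 1.5
```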

Redundancy of a code

The redundancy of a code is the difference between the average codeword length L(C) and the entropy H(X):

  R_code = L(C) − H(X).

Huffman coding, for example, yields an optimal (minimal) L(C).

The redundancy of the source is the difference between the maximum entropy Hmax(X) = log2 |Z| and the actual entropy H(X) of the source:

  R_source = Hmax(X) − H(X).

Since the average codeword length cannot be less than the entropy, the redundancy of a code is never negative.
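Both redundancies can be computed directly from the definitions. A Python sketch, reusing the three-symbol example (the code and distribution are illustrative):

```python
from math import log2

def entropy(p):
    """H(X) = -sum over z of p(z) * log2(p(z))."""
    return -sum(q * log2(q) for q in p.values() if q > 0)

p    = {"a": 0.5, "b": 0.25, "c": 0.25}
code = {"a": "0", "b": "10", "c": "11"}

H = entropy(p)                                # 1.5 bits per symbol
L = sum(p[z] * len(code[z]) for z in code)    # 1.5 bits per symbol

code_redundancy   = L - H                     # 0.0: this code is optimal
source_redundancy = log2(len(p)) - H          # log2(3) - 1.5, about 0.085
```

Here the code redundancy is zero because every probability is a negative power of two, so the prefix code matches the entropy exactly; the source redundancy is positive because the symbols are not equally probable.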

Encoding

In coding theory two forms of redundancy are distinguished:

  • Distribution redundancy arises because the characters of the alphabet occur with different probabilities.
  • Binding redundancy means that after a certain character, a particular other character is especially likely to occur. For example, in a German text a q is almost always followed by a u.
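Binding redundancy can be made visible by estimating the distribution of the character that follows a given one. A Python sketch on a toy sample (the sample text is illustrative, not a real corpus):

```python
from collections import Counter

def successor_distribution(text, char):
    """Empirical distribution of the character following `char`.
    A sharply peaked distribution indicates binding redundancy."""
    followers = Counter(b for a, b in zip(text, text[1:]) if a == char)
    total = sum(followers.values())
    return {c: n / total for c, n in followers.items()}

sample = "quelle quark bequem"
print(successor_distribution(sample, "q"))  # {'u': 1.0}
```

In this toy sample every q is followed by u, so the successor distribution is maximally peaked; the u after a q therefore carries almost no information.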

Databases and data structures

In database development, as in the data structures of programs, redundancy should be avoided where possible, since it can lead to higher memory requirements and to inconsistencies. Redundancies are therefore counted among the anomalies. Freedom from redundancy is a basic principle of a logical data model.

By normalizing the database schema, redundancies can be largely avoided. Some redundancies, however, are unavoidable (key redundancies, for example) and are accepted as a necessary evil. Others are accepted because avoiding them would cost more than it is worth, such as the multiple occurrence of an attribute value or storing the name Müller twice, once for Mr. Müller and once for Mrs. Müller.

The deliberate acceptance of redundancy to obtain better read performance is called denormalization.
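The trade-off can be illustrated with plain Python dictionaries standing in for database tables (the table and field names are illustrative):

```python
# Normalized: each fact is stored exactly once; reading an order's
# city requires an extra lookup (the analogue of a join).
customers = {1: {"name": "Müller", "city": "Bonn"}}
orders    = [{"id": 100, "customer_id": 1}]

def order_city_normalized(order):
    return customers[order["customer_id"]]["city"]

# Denormalized: the city is copied into every order for faster reads,
# at the cost of redundant storage and update effort. If the customer
# moves, every copy must be changed consistently.
orders_denorm = [{"id": 100, "customer_id": 1, "city": "Bonn"}]

def order_city_denormalized(order):
    return order["city"]

assert order_city_normalized(orders[0]) == order_city_denormalized(orders_denorm[0])
```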

Disadvantages

In the data structures of programs and in databases, redundancy can lead to program errors: the programmer must ensure that redundant data is kept consistent across all changes, which requires considerable synchronization effort. The larger the project, and the longer it is developed, the more difficult this becomes. When several programmers unknowingly work independently on redundant data, it is almost impossible to keep the changes consistent.

Advantages

In some cases, deliberately introduced data redundancy reduces the computation time of software. It can be achieved through targeted denormalization. This precisely calculated, deliberate redundancy must be clearly distinguished from redundancy incurred carelessly by failing to apply the normalization rules. Denormalization usually improves read performance but degrades write performance.

See also

  • Cybernetics
  • Information Theory
  • Database Theory