Hamming distance

The Hamming distance (even hamming distance ) and the Hamming weight, named after the American mathematician Richard Hamming Wesley ( 1915-1998 ), are measures of the diversity of character strings. The Hamming distance between two fixed length blocks (so-called codewords ) is the number of different locations.

The Hamming distance is used for error detection and error correction by the data units that are received via a transmission path, compared with valid characters. A possible correction of the characters is based on the probability principle. If an error detection or correction can take place, depends on the Hamming distance.

Often it is represented in binary numbers, such as in the coding theory. In this case, can be computationally implemented by an XOR operation and the counting of the resulting ones of comparison. However, there are also important applications for other number systems or alphabets.

  • 4.1 determination of the Hamming distance of a code
  • 6.1 Example
  • 7.1 example
  • 7.2 conclusion

Definition

Be a finite alphabet, and from, that is the same length words over this alphabet. Then we define the Hamming distance between and as:

Examples

Program Example

In the C programming language, the Hamming weight of the data word are determined as follows:

HammingDist int (int word1, word2 int ) {    int weight = 0; / / Number of 1 bits, starting with 0    int word = word1 word2 ^; / / Get all of the different bits      while ( word) / / As long as there are bits available ...    {      if ( word & 1 ) weight ; / / Add the lowest bit      word >> = 1; / / Remove lowest bit, move remaining bits by 1 to the right    }    return weight; / / Return the number of bits } or alternatively with the faster terminating algorithm by Wegner:

Weight ;    & word = word - 1; Hamming weight

The Hamming weight of a string is defined as the number of characters with the exception of the null character. This is at the same time to the Hamming distance to the zero vector ( the same length of string that consists only of zero characters).

For integers in binary representation is the number of set bits. For example, the Hamming weight of 1011 is equal to 3

Hamming distance of a code

Among the Hamming distance of a code is defined as the minimum of all the spaces between words within the code.

Example:

The smallest of the three distances is 1, the Hamming distance of the code is also equal to 1

Important is the Hamming distance, when one wants to develop code that enable error detection (EDC) or correction (ECC).

For codes with Hamming distance h all ( h-1) -bit error can be detected. In the example with h = 1 can thus not even any 1-bit errors are detected (x ↔ z does not occur to all other 1 -bit errors generate invalid code, such as 00111 from x or y).

For h = 2, all 1-bit errors can be detected. To be able to correct the errors, the Hamming distance must be increased to at least 2r 1, where r is the number of correctable bit errors.

For h = 3, all 1-bit errors can be detected and corrected. Step 2 -bit errors occur, these "corrected" under circumstances incorrect because the incorrect word may have the distance one to another valid codeword.

Wherein h = 4, all one -bit errors can also be detected and corrected. Step 2-bit errors, they can recognized but not be corrected. An incorrect " correction " is possible from 3 -bit errors.

The Hamming distance of a code is necessarily a natural number. A code with Hamming distance 0 is not possible because there are two code words could not be distinguished in this case.

Determine the Hamming distance of a code

The manual determination is best done with the Karnaugh - Veitch - Diagram. There you wearing a cross for each one occurring code value. Then, at least two crosses horizontally or vertically to each other with opposite edges coincide, then the Hamming distance = 1 If two crosses either diagonally to each other or to a field between horizontal or vertical to each other, so is the Hamming distance = 2. Karnaugh - Veitch the adjacent diagram for 4 bits ( gray boxes are cyclic repetitions ) shows the distance of a code value from a given ( cross). So you can, for example, realize that there are only two 4-bit values ​​with Hamming distance = 4, and that a complementary pair.

In binary code, the Hamming distance between two codewords a and b by ( a XOR b ), and the counting of the ones in the result can be determined.

Example of use

When transferring data must be ensured that information is not distorted or that changes in the data are at least noticed (detection of n-time errors) and perhaps can be corrected.

In the following example, a rotary switch has four settings. These are electronically as a binary number (code word) is transmitted to a receiver: 00, 01, 10, 11; The recipient will receive the code word, but otherwise has no way to check the switch position or to recognize. This is already the case in industrial applications, when the receiver is a microcontroller and said transmitter consists of the sensors within a switch.

The receiver has no way of a distortion in the transmission, or a malfunction of the switch (for example, faulty sensors within the switch ) can be seen in this scenario. Using the Hamming distance and the appropriate code is now to be found a way to detect faults in the transmitters or in the conduit.

The Hamming distance between the said four values ​​00, 01, 10, 11 is in each case 1, that is if only one bit is inverted by an error, the recipient receives indeed a different but equally valid codeword. If a distorted 00 to 01, the receiver can not detect the error alone at the news because both the intentional as well as the distorted value describe a valid position of the switch.

To improve the situation, the sender and receiver first agree to only certain (but longer ) to use code words and determine their importance in a table. To both can choose, for example, the code words 001, 010, 100, 111, each of which to one another the Hamming distance of 2 are - the remaining four code words with three bits in length, are not used.

For a single faulty bit ( single fault ), none of these four codewords 001, 010, 100, 111 changes in one of the other three valid code words. Thus, the receiver detects when a 011 arrives that an error must have occurred. However, a code with Hamming distance 2 is not certain correctable, as this example shows that the 011 could be reversed by only one bit from one of the three valid codewords 001, 010, 111 have been created.

If the recipient accepts that only single errors occur and the receiver would like to correct them, they must agree to the transmitter codewords, each having a Hamming distance ≥ 3 have, for example, 01011, 01100, 10010, 10,101th

  • When the recipient receives 01111 and it assumes that a single error has occurred, may 01111 only have arisen from the valid codeword 01011, in which the middle bit has been changed.
  • A double error can be also detected. However, since the transmitter and receiver know that they only use certain code words that differ by at least three bits ( Hamming distance ≥ 3 ), a double fault falls (only two bits changed ), which are not corrected with the transmitted information can.
  • Triple errors can not be detected, but the relevance of multiple faults decreases in technical systems, since the simultaneous occurrence of multiple failures is increasingly unlikely to ever more errors come together.

The double fault opens the possibility of error, how can the example 01111 show that if 01111 should be the result of a double fault from 01100, but the recipient feels it was a simple mistake and corrected, then from the actually intended by the sender 01100 by the double error a 01111 and the correction of the receiver incorrectly (due to the assumption of a single failure ), a 01011th

Because of the already mentioned decreasing probability of multiple errors ( n -fold errors) with increasing n to get in most applications with a Hamming distance of 4 (detection of triple faults) to 5 ( correcting double errors ).

The required length of the code word depends on the required Hamming distance and the number of possible switching positions and is shown in the table above. There one sees, for example, that for 20 different positions of a switch at least 8 bits have to be transmitted when all 20 code words to each other at least to reach the Hamming distance ≥ 3.

Representation of the bit string in a hypercube

The idea of ​​the Hamming distance can be well illustrated with the help of hypercubes. A hypercube is the generalization of a three-dimensional cube to the dimension d Each node in the figure corresponds to a bit combination which can also be understood as a coordinate specification in the room. The minimum number of edges which must be traversed in order to move from a valid word of a code word to another valid code is equal to the Hamming distance.

Example

If d = 3, the two words { 101, 010 } are elected for a code in the adjacent cube, the minimum Hamming distance 3 Thus, in a sphere with distance 1 to a point on a valid word (eg are detected and corrected as the valid code word 010) all errors (1 -bit error ) { 000, 110, 011 }.

If a code with the words { 000, 101, 110, 011 } is selected, the minimum Hamming distance 2 with a Hamming distance of 2 to 1-bit errors can only detect but not correct ( for example, can be Although realize that 111 is a bad word, but not if it is to be corrected after 110 or 011 or 101).

Minimum distance

The minimum distance between two adjacent code words could be interesting for the construction of a code, which can correct for bit positions m of k payload error. In block codes with fixed alphabet deliver the Singleton bound, the Hamming bound (keyword t- perfect) and Plotkin barrier more general statements about the maximum minimum distance.

It applies to a code with minimum distance h that errors be corrected and errors are visible.

Example

If all single errors to be corrected, so k = 1, it follows by inserting and changing h> 2 with h = 3 one can see 2 -bit errors, but also correct individual errors only.

Conclusion

For each code, the Hamming distance h must therefore be at least 3, so any errors are correctable.

See also: Hamming similarity, Hamming code, Levenshtein distance, Gray code

372220
de