MD5

Message-Digest Algorithm 5 (MD5) is a widely used cryptographic hash function that generates any message from a 128- bit hash value. This allows for example the easy check a download for correctness. MD5 was developed in 1991 by Ronald L. Rivest. It is now regarded not as safe as it is possible with reasonable effort, to produce different messages have the same MD5 hash.

History

MD5 is a representative of a set of ( cryptographic ) hash functions which were developed by Ronald L. Rivest at the Massachusetts Institute of Technology. As analyzes showed that the predecessor MD4 is probably uncertain, was developed as a safe substitute MD5 1991.

As early as 1993 published Bert de Boer and Antoon Bosselaers an algorithm for generating pseudo- collisions on the compression function of MD5: two different initialization constants yield the same hash value for the same message.

1996 was Hans Dobbertin a collision for two different messages. It is a real collision, two specially prepared messages that differ, but still yield the same hash value. However Dobbertin used a modified version MD5 may be used ( for A, B, C, D) in the other initialization constants. Also, it was not possible to specify the content of the colliding messages. Thus practical attacks on MD5 were not possible, but the weaknesses of MD5 were significantly so that cryptographers advised a change to other hash functions.

2004 succeeded in a Chinese research group led by Xiaoyun Wang, to systematically generate collisions when the beginning of the message can be chosen arbitrarily, but the same for both messages (common -prefix collision ). At this beginning of the message can be calculated with reasonable effort, two different continuations of the message that lead to the same hash value. This collision is maintained even when both of messages ( each consisting of the same beginning and the one and the other continued) the same suffix is ​​added. This attack has been improved by Wang and other research groups, so a PC can be calculated within seconds an MD5 collision today.

The time required to find a collision is greater when the head of the two different messages ( chosen- prefix collision ). In 2008, a team led by Marc Stevens and Alexander Sotirov perform such a collision attack to create a fake and recognized as trusted CA certificate. With this, they were in principle able for any URL to fake an SSL certificate and thus bypass the security mechanisms of HTTPS on the web. The work was first presented in December 2008 at the 25th Chaos Communication Congress and a few months later published in a scientific article. For collision calculation they used a cluster of 200 PlayStation 3 Sony

The 2012 discovered Windows malware Flame used a fake code signing certificate that is based on a new and previously unknown variant of a chosen- prefix collision for MD5.

Preimage attacks can not be performed in a reasonable time with the methods mentioned. Thus, it is still impossible to subsequently create a fake document that matches a specific, generated with MD5 certificate. However, it is possible by collision attacks, in many cases, to create two documents that give the same MD5 hash value to have then sign the first legitimate document, and then replace this by the second, fake document. Against this background, it is not recommended for further use of MD5.

MD5 hashes

The 128 -bit MD5 hashes ( also English "message - digests " ) will normally be listed as a 32- digit hexadecimal number. The following example shows a 59 bytes long ASCII input and the corresponding MD5 hash:

Md5 ( " quick brown fox completely dilapidated taxi across Bavaria ") = a3cca2b2aa1e3b5b3b5aad99a8529074 A small change of the text generates almost certainly a completely different hash. For example arises with Frank instead Franz ( only one letter changed ):

Md5 ( " Frank chases in the completely dilapidated taxi across Bavaria ") = 7e716d0e702df0505fc72e2b89467910 The hash of a zero-length string is:

Md5 ("") = d41d8cd98f00b204e9800998ecf8427e Use and availability

Under most Linux distributions, the program md5sum is coreutils installed as part of default.

On BSD -derived operating systems such as Mac OS X, there is md5 command.

In many other Unix derivatives can make do with the most installed program OpenSSL.

Microsoft Windows operating systems are not equipped ex works with a program for calculating MD5 hashes, but Microsoft has released the program File Checksum Integrity Verifier ( fciv.exe ) free ready.

Checking the MD5 hash value

After a successful download of a file or folder with files, the associated MD5 hash value is often provided in an additional file. About a testing program can then turn the hash value of the downloaded file are calculated, which is then compared with the hash value provided. If both hash values ​​are identical, the integrity of the downloaded file is confirmed. Thus, when downloading the file, there were no errors. This does not provide security in terms of targeted data manipulation by an attacker ( man-in -the -middle attack ), since the attacker can manipulate the transfer of the MD5 hash value.

Algorithm

MD5 generated from a variable length message a fixed-length output (128 bits). A first one is appended to the outgoing message. Thereafter, the output message is padded with zeros so that its length is 64 bits away from being divisible by 512. Next, a 64 -bit number encoding the length of the output message is appended. The message length is now divisible by 512.

The main algorithm MD5 uses a 128-bit buffer is divided into four 32 -bit words A, B, C and D. These are initialized with certain constants. Now this buffer compression function is called with the first 512 -bit block as a key parameter. The treatment of a message block is done in four stages similar to each other, called " rounds " of cryptographers. Each round consists of 16 operations based on a non-linear function "F", modular addition, and left rotation. There are four possible "F " functions in each round another thereof is used:

Stand for XOR, AND, OR, and NOT operations, respectively.

To the result of the same function to the second message block is called as a parameter and so on until the last 512 -bit block. As a result, in turn, a 128 -bit value is returned - the MD5 sum.

Pseudocode

The following is the pseudo code for the MD5 algorithm.

/ / Note: All variables are unsigned (unsigned ) 32 -bit values ​​and / / Behave in calculations congruent ( ≡ ) modulo 2 ^ 32 / / S defines the number of bits that are rotated per round: var uint s, K S [ 0 to 15] = { 7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22 } S [ 16, 31] = { 5, 9, 14, 20, 5, 9, 14, 20, 5, 9, 14, 20, 5, 9, 14, 20 } s [ 32 .47 ] = { 4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23 } s [ 48 .63 ] = { 6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21 } / / Use binary integer part of the 2 ^ 32 times the amount of the sinus / / Integer values ​​as constants: for all i from 0 to 63      K [i]: = floor (abs (sin ( I 1 )) × 2 ^ 32 ) / / Initialize the variables: (see RFC 1321 ) var uint a0: = 0x67452301 var uint b0: = 0xEFCDAB89 var uint c0: = 0x98BADCFE var uint d0: = 0x10325476 / / Preparation of the 'message' is: var uint message_laenge: = BIT_LENGTH (message) Extended message to bit "1" Extended message to bits " 0" to length of message in bits ≡ 448 (mod 512) Extended message to message_laenge as 64-bit little-endian integer / / Process the message to consecutive 512 -bit blocks: for all 512 -bit block of message      subdivide block into 16 32 -bit little-endian words M [ i], 0 ≤ i ≤ 15      / / Initialize the hash value for this block:      uint var A: = a0      var uint B: = b0      uint var c: = c0      var uint D: = d0      / / Main loop:      for all i from 0 to 63          if 0 ≤ i ≤ 15 then              F: = ( B and C) or ( (not B) and D)              g: = i          else if 16 ≤ i ≤ 31 then              F: = ( B and D) or ( C and (not D))              g: = (5 × i 1 ) mod 16          else if 32 ≤ i ≤ 47 then              F: = B xor C xor D              g: = (3 × i 5 ) mod 16          else if 48 ≤ i ≤ 63 then              F: = C xor (B or ( not D))              g: = (7 × i ) mod 16          wenn_ende          temp: = D          D: = C          C: = B          B: = B left rotation ( (A F K [i ] M [ g] ), s [i])          A: = temp      / / Add the hash value of the block to the sum of previous hashes:      a0: = a0 A      b0: = b0 B      c0: = c0 C      d0: = d0 D var uint digest: a0 = b0 append append append c0 d0 / / (represented as a little-endian) Instead of the original wording from RFC 1321 can be used to increase efficiency include:

(0 ≤ i ≤ 15): F: = D xor ( B and (C xor D)) (16 ≤ i ≤ 31): F: = C xor (D and (B xor C)) security

MD5 is widely used and was originally considered to be cryptographically secure. Already in 1994 discovered Bert den Boer and Antoon Bosselaers pseudo- collisions in MD5. Basic working to find real collisions made ​​, Hans Dobbertin (then at BSI), who had already developed the successful attack on MD4 and using techniques that are rendered on MD5.

Collision resistance

In August 2004, a Chinese team of scientists found the first collision in the full MD5 function. On an IBM P690 cluster their first attack required an hour of it starting to let more collisions within five minutes of each find. The attack of the Chinese research was based on analysis. Shortly after the publication of the work of Chinese MD5CRK is set, the attempt to find a collision by brute- force method.

Description: An input block ( 512 bits ) is modified, whereby attempts to create a certain difference in the original output. Through an elaborate analysis of the algorithm, the number of unknown bits can be reduced, so that this is achieved by calculation. In the next 512 -bit block can try using the same methods to pick up the difference again. So the forgery requires a contiguous block of data of 1024 bits = 128 bytes, which greatly restricts the use.

Meanwhile, the collision attacks are so advanced that further use of MD5, especially in those scenarios where the user does not want to sign the files completely controlled, should be rejected. A 2009 test conducted of the computer magazine c't using GPGPU allows a about a year old high-end gaming PC with two Nvidia GeForce 9800 GX2 ( a total of four GPUs ), found in nearly 35 minutes, a collision.

Disposable property

Another method of attack make rainbow tables dar. In these tables, strings are stored with the corresponding MD5 hashes. The attacker searches these tables by using the given hash value and can then read the appropriate strings. This attack can be used primarily to determine passwords that are stored as MD5 hashes. However, the necessary rainbow tables are very large and will require a high computational effort to create them. Therefore, this attack is generally only possible with short passwords. For this case, there are pre-computed rainbow tables, for which at least the computational effort required to create the list is omitted. The use of a salt, that is of a random non- predictable value which is appended to the plain text can, however, negate the effectiveness of pre-calculated tables rainbow.

Summary

Find collisions means two different texts M and M ' with hash ( M) = hash (M') to find. In a pre-image attack sought to be given M or hash ( M) is an M 'such that hash ( M) = hash (M'). Since you only M ' controlled the pre-image attack, not even M, this is much harder.

Currently MD5 is broken only with respect to collision attacks. Therefore, there is no imminent threat to passwords that have been stored as an MD5 hash. These collisions are more dangerous for digital signatures. For secure storage of passwords but also algorithms should be taken into consideration that were developed specifically for this purpose, such as bcrypt.

There is still no known first pre-image attack, signed MD5 hash currently (2013 ) are still safe in the past.

560707
de