Hash function

A hash function (also hash function ) is an illustration ( the key ) maps a large input set to a smaller target set (the hash value ) - it is not injective. Here, the input set also contain elements with different lengths, the elements of the target amount, however, usually have a fixed length.

The name " hash " comes from the verb to hash, which can be translated as " hacking ". The German name is hash function. Both names suggest that these functions are normally set out to " scatter " the data and " chop " (see also the chopper in radio technology ). Especially in the computer science one uses also the term hash algorithm (english hash algorithm ), since hash functions are often specified in the form of an algorithm instead of a mathematical function.

The hash - or scatter values are mostly scalar values from a limited subset of the natural numbers. A "good " hash function provides this for the (expected ) input data values so that two different inputs lead to different output values. A hash value is therefore also known as English Fingerprint, since it represents an almost unique identifier of a larger amount of data, such as a fingerprint, a person almost uniquely identified.

A so-called " collision " occurs when different input data is assigned the same hash value. Since the set of possible hash values is usually smaller than that of the possible inputs, such collisions are then in principle unavoidable, which is why there must be method for collision detection. A good hash function is characterized in that it generates as few collisions for the inputs for which it was designed.

In the data storage, a hash value is used to calculate the location of the requested data (such as a hash table ).

At checksums using hash values in order to detect transmission errors.

In cryptology special cryptologic hash functions are used, which additionally required that it is virtually impossible to find a collision deliberately.

For known and limited input quantities can be (in the sense of collision-free ) hash functions also found perfect.

5.1 Known
5.2 Lattice Based
5.3 checksums
5.4 Cryptologic hash functions

Explanation

Hash functions are typically used to:

Assign a location to a complex object Example: "John Doe " will be placed in the folder "m" - the first letter of the surname.
To compute a (short ) checksum to the object Example: Checksum an ISBN, Cyclic redundancy check to detect transmission errors of files
To identify a content almost unique ( but still " short"), without revealing anything about the content Example: Applications in cryptography

Depending on the application has different requirements to the function. To group, for example an address file by the first letter of the surname, saves you obviously find a lot of time - you only need a 26 parts to search. This " hash function " is very convenient for the people ( as they are very easy to "calculate" to ), however, a computer program would use other methods to organize an address book. For the program it is that is important to have as few collisions; but there are obviously a lot of names that begin with the same letter, and they come unevenly often. So If you put, for example, personnel files according to this principle from, one often has a lot of files in the folder with the letter " S " while the folder "Q" remains empty.

The digit sum of the digits is a simple hash function. It assigns a random number to a single digit number, so for example on mapped. As this checksum checksum is not suitable since the permutation of digits - a typical case with typing of long numbers - is not recognized. Thus, the number has the same checksum. Checksums as the ISBN of a book or the CRC-32 checksum of a file ( for example, when checking a file downloaded from the Internet for transmission errors ) are better suited to detect such errors.

In the identification of content with so-called cryptographic hash functions, it is not only important that the hash value at every little modification seemingly at random (ie, not with just a few bits ) changes ( this would be a normal checksum rich ) and that it is not easy, a second content to produce the same hash value (to avoid a complete replacement of the contents ). It is equally important that the contents can not be reconstructed from the hash value. If you have exchanged two documents, and would like for example on the phone verify successful transmission, it is sufficient, on the phone to verify the correctness of the hash value. If the call is intercepted, so it do not give away the content of the message, even if part of the content should be already known.

Definition of a hash function

A mapping is called a hash function, if the following holds. In particular, provides a hash table of size. The amount represents the data to be hashed, and is also the set of keys called; the amount is the set of possible hash values . Typically, the amount of hash values is selected as the initial segment of the natural numbers. This amount then is also called address space.

Typically in practice, only a small subset is always ready. The amount will then enter the used key.

The ratio provides the occupancy factor.

The case will be referred to as collision. An injective hash function is called perfect, inter alia because it produces no collisions.

Criteria for a good hash function

Low probability of collisions of hash values for the input range of values , so possible a uniform distribution of hash values to the expected input values.
The memory requirement of the hash value should be significantly smaller than that of the message (input value ).
Chaos - Similar source elements ( input values ) should lead to completely different hash values . In the ideal case the overturning of a bit change in the input average half the bits in the resulting hash value.
Surjectivity - No result value (hash value) is impossible to each result ( each hash value in the defined range of values) should be able to actually occur.
Efficiency - The function must be quickly computable, do without large memory consumption and should be the source data ( input values ) need as read only once.
Order-preserving - If the hash function is to allow a sorted access in a hash table.

Depending on the application, these criteria play a different role. So chaos and order preservation is apparently contradictory. In cryptography, the latter is taboo for certain database applications, this is it exactly desirable.

Applications

Hash functions have different fields of application. Here, three large areas can be subdivided:

Hash functions in databases

Database management systems use hash functions to search data in large data sets using a hash table. In this case one speaks of a database index.

Checksums

Checksums can be seen, a simple means to change to the transmitted data.

An error is detected if the calculated checksum of the received data from the transmitted checksum, that is the original data is different. The suitability of different hash functions for checksum calculation depends on their collision probability.

If the checksum is designed to protect against targeted manipulations of the data, a cryptographic hash function is used, because a collision can be found only with very high computational complexity.

Examples

Hash values have an important task among others P2P applications for various reasons. The hash values are used here both for searching and identifying files as well as for identifying and testing of the transferred file fragments. This allows large files reliably exchange in small segments.

For use in P2P networks are mainly stepped hash functions, where for small bits of the hash file and then from these values , a total value is calculated. The networks Gnutella G2, Shareaza and Direct Connect these are, for example, Tiger Tree hash functions.

Locating files based on the hash value of its content is protected at least in the U.S. as a software patent. The owner pursues programs and companies that enable this system based on the search of files, including companies that want to identify providers of unlicensed content on behalf of the RIAA or MPAA.

When programming Internet applications hash functions are used to generate session IDs by ( such as time, IP address), a hash value is calculated using varying state values .

Cryptology

Cryptologic hash functions have special properties, in practice there are collision- resistant one-way functions. You will be used to sign messages or to ensure the integrity of data.

Hash algorithms

Friend

Brent Hashing
Division remainder method
Double Hashing
FNV
Cuckoo hashing
Multiplicative method
Mitt square method
Decomposition method
Digit analysis

Lattice Based

Ajtai
Micciancio
Peikert - roses
Fast Fourier Transform (FFT hash function)
LASH

Checksums

Fletcher -32
Adler -32
CRC ( Cyclic Redundancy Check )
Parity
Checksum

Cryptologic hash functions

MD2, MD4, MD5
Secure Hash Algorithm ( SHA)
RIPEMD -160
Tiger
HAVAL
Whirlpool
LM hash
NTLM

International Standard Book Number Chaos (cosmogony) Hash tree Session-ID Fowler–Noll–Vo hash function Adler-32 MD2 (cryptography) Tiger (cryptography) Whirlpool (cryptography) NT LAN Manager Donald Knuth Charles E. Leiserson Ron Rivest

335180