In cryptography, a one-way compression function is a function that transforms two fixed-length inputs into an output of the same size as one of the inputs. The transformation is "one-way", meaning that given a particular output it is difficult to compute inputs that compress to that output. One-way compression functions are not related to data compression, which by definition is invertible.
One-way compression functions are used, for instance, in the Merkle–Damgård construction inside cryptographic hash functions.
One-way compression functions are often built from block ciphers. Some methods to turn any normal block cipher into a one-way compression function are Davies–Meyer, Matyas–Meyer–Oseas, Miyaguchi–Preneel (single-block-length compression functions) and MDC-2/Meyer–Schilling, MDC-4, Hirose (double-block-length compression functions). These methods are described in detail further down. (MDC-2 is also the name of a hash function patented by IBM.)
A compression function mixes two fixed-length inputs and produces a single fixed-length output of the same size as one of the inputs. This can also be seen as the compression function transforming one large fixed-length input into a shorter fixed-length output.
For instance, input A might be 128 bits, input B 128 bits, and they are compressed together to a single output of 128 bits. This is equivalent to compressing one single 256-bit input to a single output of 128 bits.
Some compression functions have inputs of different sizes, but the output usually is the same size as one of the inputs. For instance, input A might be 256 bits and input B 128 bits, and they are compressed together to a single output of 128 bits. That is, a total of 384 input bits are compressed to 128 output bits.
The mixing is done in such a way that the full avalanche effect is achieved: every output bit depends on every input bit.
A one-way function is a function that is easy to compute but hard to invert. A one-way compression function (also called a hash function) should have the following properties: it should be easy to compute, preimage-resistant, second preimage-resistant and collision-resistant.
Ideally, one would like the "infeasibility" in preimage resistance and second preimage resistance to mean a work of about 2^{n}, where n is the number of bits in the hash function's output. However, recent results indicate that in the case of second preimage resistance this is more difficult than had been expected.
One-way compression functions are used in the Merkle–Damgård construction inside cryptographic hash functions.
A hash function must be able to process an arbitrary-length message into a fixed-length output. This can be achieved by breaking the input up into a series of equal-sized blocks and operating on them in sequence using a one-way compression function. The compression function can either be specially designed for hashing or be built from a block cipher.
The last block processed should also be length-padded; this is crucial to the security of this construction. The construction is called the Merkle–Damgård construction. Most widely used hash functions, including SHA-1 and MD5, take this form.
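The iteration and length padding described above can be sketched in Python. This is a minimal illustration, not a real hash: `toy_compress` is a made-up, insecure stand-in for a proper one-way compression function, and the names and constants are illustrative only.

```python
def toy_compress(h: int, block: bytes) -> int:
    """Toy 32-bit compression function -- an insecure stand-in for
    a real one-way compression function (e.g. one built from a cipher)."""
    for byte in block:
        h = ((h ^ byte) * 0x01000193) & 0xFFFFFFFF  # FNV-style mixing
    return h

def merkle_damgard(message: bytes, iv: int = 0x6A09E667,
                   block_size: int = 16) -> int:
    """Iterate the compression function over length-padded blocks.
    MD-strengthening: append a 1-bit (0x80 byte), zeros, then the
    64-bit message bit-length."""
    bit_length = len(message) * 8
    padded = message + b'\x80'
    while len(padded) % block_size != block_size - 8:
        padded += b'\x00'
    padded += bit_length.to_bytes(8, 'big')

    h = iv  # fixed initial value H_0
    for i in range(0, len(padded), block_size):
        h = toy_compress(h, padded[i:i + block_size])
    return h
```

Because the message bit-length is appended, two messages of different lengths can never produce identical final padded blocks, which is what makes the length padding crucial to the construction's security.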
When length padding (also called MD-strengthening) is applied, attacks cannot find collisions faster than the birthday paradox (2^{n/2}, where n is the block size in bits) if the f-function used is collision-resistant^{[1]}^{[2]}. Hence, the Merkle–Damgård hash construction reduces the problem of finding a proper hash function to the problem of finding a proper compression function.
A second preimage attack (given a message m1, an attacker finds another message m2 satisfying hash(m1) = hash(m2)) can be done according to Kelsey and Schneier^{[3]} for a 2^{k}-message-block message in time k × 2^{n/2+1} + 2^{n−k+1}. Note that the complexity is above 2^{n/2} but below 2^{n} when messages are long, and that as messages get shorter the complexity of the attack approaches 2^{n}.
Due to several structural weaknesses, especially the length-extension problem, Stefan Lucks proposed the use of the wide-pipe hash^{[4]} instead of the Merkle–Damgård construction. The wide-pipe hash is very similar to the Merkle–Damgård construction but has a larger internal state size, meaning that the bit-length used internally is larger than the output bit-length. Therefore, in a final step a second compression function compresses the last internal hash value to the final hash value. SHA-224 and SHA-384 take this form, since they are derived from SHA-256 and SHA-512, respectively.
One-way compression functions are often built from block ciphers.
Like one-way compression functions, block ciphers take two fixed-size inputs (the key and the plaintext) and return one single output (the ciphertext), which is the same size as the input plaintext.
However, modern block ciphers are only partially one-way. That is, given a plaintext and a ciphertext it is infeasible to find a key that encrypts the plaintext to the ciphertext. But given a ciphertext and a key, a matching plaintext can be found simply by using the block cipher's decryption function. Thus, to turn a block cipher into a one-way compression function, some extra operations have to be added.
Some methods to turn any normal block cipher into a one-way compression function are Davies–Meyer, Matyas–Meyer–Oseas, Miyaguchi–Preneel (single-block-length compression functions) and MDC-2, MDC-4, Hirose (double-block-length compression functions).
Single-block-length compression functions output the same number of bits as processed by the underlying block cipher; double-block-length compression functions output twice that number of bits.
If a block cipher has a block size of, say, 128 bits, single-block-length methods create a hash function with a block size of 128 bits that produces a 128-bit hash. Double-block-length methods make hashes with double the size of the block size of the block cipher used, so a 128-bit block cipher can be turned into a 256-bit hash function.
These methods are then used inside the Merkle–Damgård construction to build the actual hash function. These methods are described in detail below. (MDC-2 is also the name of a hash function patented by IBM.)
Using a block cipher to build the one-way compression function for a hash function is usually somewhat slower than using a specially designed one-way compression function. This is because all known secure constructions do the key scheduling for each block of the message. Black, Cochran and Shrimpton have shown that it is impossible to construct a one-way compression function that makes only one call to a block cipher with a fixed key^{[5]}. In practice, reasonable speeds are achieved provided the key scheduling of the selected block cipher is not too heavy an operation.
But in some cases it is easier, because a single implementation of a block cipher can serve as both a block cipher and a hash function. It can also save code space in very tiny embedded systems such as smart cards or nodes in cars or other machines.
Therefore, the hash rate (or simply rate) gives a glimpse of the efficiency of a hash function based on a certain compression function. The rate of an iterated hash function is the ratio between the number of message bits processed and the number of block cipher operations required. More precisely, if n denotes the output bit-length of the block cipher, the rate is the ratio between the number of processed input bits |m_{i}| and the s block cipher operations needed to produce these n output bits:

R = |m_{i}| / (s × n)

Generally, using fewer block cipher operations can result in better overall performance of the entire hash function, but it also leads to a smaller hash value, which could be undesirable.
The hash function can only be considered secure if at least the following conditions are met: the block cipher has no special properties that distinguish it from ideal ciphers (such as weak keys); the resulting hash size is big enough to resist birthday attacks; and the last block is properly length-padded prior to hashing.
The constructions presented below, Davies–Meyer, Matyas–Meyer–Oseas, Miyaguchi–Preneel and Hirose, have been shown to be secure under black-box analysis^{[6]}^{[7]}. The goal is to show that any attack that can be found is at most as efficient as the birthday attack under certain assumptions. The black-box model assumes that a block cipher is used that is randomly chosen from a set containing all appropriate block ciphers. In this model an attacker may freely encrypt and decrypt any blocks, but does not have access to an implementation of the block cipher. The encryption and decryption functions are represented by oracles that receive a pair of either a plaintext and a key or a ciphertext and a key. The oracles then respond with a randomly chosen plaintext or ciphertext if the pair was asked for the first time. They both share a table of these triplets (a pair from the query and the corresponding response) and return the recorded response if a query is received a second time. For the proof there is a collision-finding algorithm that makes randomly chosen queries to the oracles. The algorithm returns 1 if two responses result in a collision involving the hash function that is built from a compression function applying this block cipher, and 0 otherwise. The probability that the algorithm returns 1 depends on the number of queries, which determines the security level.
The Davies–Meyer single-block-length one-way compression function feeds each block of the message (m_{i}) as the key to a block cipher. It feeds the previous hash value (H_{i−1}) as the plaintext to be encrypted. The output ciphertext is then also XORed (⊕) with the previous hash value (H_{i−1}) to produce the next hash value (H_{i}). In the first round, when there is no previous hash value, it uses a constant pre-specified initial value (H_{0}).
In mathematical notation Davies–Meyer can be described as:

H_{i} = E_{m_{i}}(H_{i−1}) ⊕ H_{i−1}
The scheme has the rate (k is the key size):

R_{DM} = k/n
If the block cipher uses, for instance, 256-bit keys, then each message block (m_{i}) is a 256-bit chunk of the message. If the same block cipher uses a block size of 128 bits, then the input and output hash values in each round are 128 bits.
Variations of this method replace XOR with any other group operation, such as addition of 32-bit unsigned integers.
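The Davies–Meyer step can be sketched in Python. The toy Feistel cipher below is a made-up, insecure stand-in for a real block cipher, and the function names (`toy_encrypt`, `davies_meyer`) are illustrative assumptions, not a standard API.

```python
MASK16, MASK32 = 0xFFFF, 0xFFFFFFFF

def _f(half: int, subkey: int) -> int:
    # Simple keyed round function on 16-bit halves (toy, insecure).
    return ((half * 0x9E37 + subkey) * 0x79B9 & MASK32) >> 16

def toy_encrypt(key: int, block: int) -> int:
    """Toy 8-round Feistel cipher: 32-bit block, 32-bit key. A Feistel
    network is always a permutation of the block, whatever _f is."""
    l, r = block >> 16, block & MASK16
    for i in range(8):
        l, r = r, l ^ _f(r, (key + i * 0x9E3779B9) & MASK32)
    return (l << 16) | r

def davies_meyer(blocks, iv=0x12345678):
    """H_i = E_{m_i}(H_{i-1}) XOR H_{i-1}: each message block is the key,
    the running hash value is the plaintext."""
    h = iv  # H_0
    for m in blocks:
        h = toy_encrypt(m, h) ^ h
    return h
```

Note that the message block is fed into the key schedule, which is why key scheduling must be repeated for every block of the message.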
A notable property of the Davies–Meyer construction is that even if the underlying block cipher is totally secure, it is possible to compute fixed points for the construction: for any m, one can find a value of h such that E_{m}(h) ⊕ h = h: one just has to set h = E_{m}^{−1}(0)^{[8]}. This is a property that random functions certainly do not have. So far, no practical attack has been based on this property, but one should be aware of this "feature".
To prevent the fixed points from being used to find collisions, one should use Merkle–Damgård (MD) strengthening (the bit-length of the message is appended at its end), as described in the section "The Merkle–Damgård construction" of this article. When MD-strengthening is applied, attacks cannot find collisions faster than the birthday paradox (2^{n/2}, where n is the block size in bits)^{[1]}^{[2]}. The fixed points can be used in the second preimage attack (given a message m1, the attacker finds another message m2 satisfying hash(m1) = hash(m2)) of Kelsey and Schneier^{[3]}: for a 2^{k}-message-block message it takes time 3 × 2^{n/2+1} + 2^{n−k+1}. If the construction does not allow easy creation of fixed points (like Matyas–Meyer–Oseas or Miyaguchi–Preneel), then this attack can be done in k × 2^{n/2+1} + 2^{n−k+1} time. Note that in both cases the complexity is above 2^{n/2} but below 2^{n} when messages are long, and that as messages get shorter the complexity of the attack approaches 2^{n}.
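The fixed-point computation can be demonstrated concretely with any invertible block cipher. The toy Feistel cipher below is a made-up, insecure stand-in, and all names are illustrative: setting h = D_{m}(0) gives E_{m}(h) = 0, so the Davies–Meyer output E_{m}(h) ⊕ h equals h.

```python
MASK16, MASK32 = 0xFFFF, 0xFFFFFFFF

def _f(half: int, subkey: int) -> int:
    # Toy keyed round function on 16-bit halves (insecure).
    return ((half * 0x9E37 + subkey) * 0x79B9 & MASK32) >> 16

def toy_encrypt(key: int, block: int) -> int:
    l, r = block >> 16, block & MASK16
    for i in range(8):
        l, r = r, l ^ _f(r, (key + i * 0x9E3779B9) & MASK32)
    return (l << 16) | r

def toy_decrypt(key: int, block: int) -> int:
    # Run the Feistel rounds in reverse to invert toy_encrypt.
    l, r = block >> 16, block & MASK16
    for i in reversed(range(8)):
        l, r = r ^ _f(l, (key + i * 0x9E3779B9) & MASK32), l
    return (l << 16) | r

def davies_meyer_fixed_point(m: int) -> int:
    """For any block m, h = D_m(0) satisfies E_m(h) XOR h == h."""
    return toy_decrypt(m, 0)
```

The attack needs only one decryption per message block m, which is why the property holds even for an ideal cipher.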
Most widely used hash functions, including MD5, SHA-1 and SHA-2, use this construction.
The security of the Davies–Meyer construction in the Ideal Cipher Model was first proved by R. Winternitz^{[9]}.
The Matyas–Meyer–Oseas single-block-length one-way compression function can be considered the dual (the opposite) of Davies–Meyer.
It feeds each block of the message (m_{i}) as the plaintext to be encrypted. The output ciphertext is then also XORed (⊕) with the same message block (m_{i}) to produce the next hash value (H_{i}). The previous hash value (H_{i−1}) is fed as the key to the block cipher. In the first round, when there is no previous hash value, it uses a constant pre-specified initial value (H_{0}).
If the block cipher has different block and key sizes, the hash value (H_{i−1}) will have the wrong size for use as the key. The cipher might also have other special requirements on the key. In that case the hash value is first fed through the function g() to be converted/padded to fit as the key for the cipher.
In mathematical notation Matyas–Meyer–Oseas can be described as:

H_{i} = E_{g(H_{i−1})}(m_{i}) ⊕ m_{i}
The scheme has the rate:

R_{MMO} = 1
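The Matyas–Meyer–Oseas step can be sketched in Python. The toy cipher is a made-up, insecure keyed permutation standing in for a real block cipher; since its key and block sizes match, the conversion function g() is simply the identity here. All names are illustrative assumptions.

```python
MASK32 = 0xFFFFFFFF

def toy_encrypt(key: int, block: int) -> int:
    """Insecure toy keyed permutation on 32-bit blocks: every step
    (modular add, xorshift, odd multiply) is invertible mod 2^32."""
    x = block
    for i in range(4):
        x = (x + key + i) & MASK32
        x ^= x >> 16
        x = (x * 0x85EBCA6B) & MASK32  # odd multiplier: invertible
    return x

def matyas_meyer_oseas(blocks, iv=0x12345678):
    """H_i = E_{g(H_{i-1})}(m_i) XOR m_i -- the previous hash value is
    the key, the message block is the plaintext. g() is the identity
    because the toy cipher's key and block sizes match."""
    h = iv  # H_0
    for m in blocks:
        h = toy_encrypt(h, m) ^ m
    return h
```

In contrast to Davies–Meyer, the message block enters as plaintext, so an attacker cannot choose the key schedule directly, which is why easy fixed-point creation is not available here.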
A second preimage attack (given a message m1, an attacker finds another message m2 satisfying hash(m1) = hash(m2)) can be done according to Kelsey and Schneier^{[3]} for a 2^{k}-message-block message in time k × 2^{n/2+1} + 2^{n−k+1}. Note that the complexity is above 2^{n/2} but below 2^{n} when messages are long, and that as messages get shorter the complexity of the attack approaches 2^{n}.
The Miyaguchi–Preneel single-block-length one-way compression function is an extended variant of Matyas–Meyer–Oseas. It was independently proposed by Shoji Miyaguchi and Bart Preneel.
It feeds each block of the message (m_{i}) as the plaintext to be encrypted. The output ciphertext is then XORed (⊕) with the same message block (m_{i}) and then also XORed with the previous hash value (H_{i−1}) to produce the next hash value (H_{i}). The previous hash value (H_{i−1}) is fed as the key to the block cipher. In the first round, when there is no previous hash value, it uses a constant pre-specified initial value (H_{0}).
If the block cipher has different block and key sizes, the hash value (H_{i−1}) will have the wrong size for use as the key. The cipher might also have other special requirements on the key. In that case the hash value is first fed through the function g() to be converted/padded to fit as the key for the cipher.
In mathematical notation Miyaguchi–Preneel can be described as:

H_{i} = E_{g(H_{i−1})}(m_{i}) ⊕ H_{i−1} ⊕ m_{i}
The scheme has the rate:

R_{MP} = 1
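The only difference from Matyas–Meyer–Oseas is the extra XOR with the previous hash value, which the following Python sketch makes explicit. The toy cipher is again a made-up, insecure stand-in with matching key and block sizes (so g() is the identity), and all names are illustrative.

```python
MASK32 = 0xFFFFFFFF

def toy_encrypt(key: int, block: int) -> int:
    """Insecure toy keyed permutation on 32-bit blocks (stand-in)."""
    x = block
    for i in range(4):
        x = (x + key + i) & MASK32
        x ^= x >> 16
        x = (x * 0x85EBCA6B) & MASK32  # odd multiplier: invertible
    return x

def miyaguchi_preneel(blocks, iv=0x12345678):
    """H_i = E_{g(H_{i-1})}(m_i) XOR H_{i-1} XOR m_i. The right-hand
    side is evaluated with the old h throughout, as required."""
    h = iv  # H_0
    for m in blocks:
        h = toy_encrypt(h, m) ^ m ^ h
    return h
```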
The roles of m_{i} and H_{i−1} may be switched, so that H_{i−1} is encrypted under the key m_{i}, thus making this method an extension of Davies–Meyer instead.
A second preimage attack (given a message m1, an attacker finds another message m2 satisfying hash(m1) = hash(m2)) can be done according to Kelsey and Schneier^{[3]} for a 2^{k}-message-block message in time k × 2^{n/2+1} + 2^{n−k+1}. Note that the complexity is above 2^{n/2} but below 2^{n} when messages are long, and that as messages get shorter the complexity of the attack approaches 2^{n}.
The Hirose^{[7]} double-block-length one-way compression function consists of a block cipher plus a permutation p. It was proposed by Shoichi Hirose in 2006 and is based on a work^{[10]} by Mridul Nandi.
It uses a block cipher whose key length k is larger than the block length n, and produces a hash of size 2n. For example, it can use any of the AES candidates with a 192- or 256-bit key (and a 128-bit block).
Each round accepts a portion of the message m_{i} that is k − n bits long and uses it to update two n-bit state values, G and H.
First, m_{i} is concatenated with H_{i−1} to produce a key K_{i}. Then the two feedback values are updated according to:

G_{i} = E_{K_{i}}(G_{i−1}) ⊕ G_{i−1}
H_{i} = E_{K_{i}}(p(G_{i−1})) ⊕ p(G_{i−1})
p(G_{i−1}) is an arbitrary fixed-point-free permutation on an n-bit value, typically defined as

p(x) = x ⊕ c

for an arbitrary non-zero constant c (all-ones may be a convenient choice).
Each encryption resembles the standard Davies–Meyer construction. The advantage of this scheme over other proposed double-block-length schemes is that both encryptions use the same key, and thus key-scheduling effort may be shared.
The final output is H_{t}∥G_{t}, the concatenation of the two state values. The scheme has the rate R_{Hirose} = (k−n)/(2n) relative to encrypting the message with the cipher.
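The two-call round can be sketched in Python with a made-up, insecure toy cipher whose key (64 bits) is longer than its block (32 bits), as the construction requires. The names, constants and initial values are illustrative assumptions, not part of Hirose's specification.

```python
MASK32 = 0xFFFFFFFF
C = MASK32  # non-zero constant for p(x) = x XOR c (all-ones here)

def toy_encrypt(key: int, block: int) -> int:
    """Insecure toy cipher: 32-bit block, 64-bit key (k > n)."""
    x = block
    for i in range(4):
        x = (x + (key >> 32) + i) & MASK32   # mix in high key half
        x ^= x >> 16
        x = (x + (key & MASK32)) & MASK32    # mix in low key half
        x = (x * 0x85EBCA6B) & MASK32        # odd multiplier: invertible
    return x

def hirose(blocks, g0=0x11111111, h0=0x22222222):
    """Each (k-n)=32-bit block m_i joins H_{i-1} to form the 64-bit key
    K_i; both Davies-Meyer-style encryptions share that key.
    Returns the 2n-bit hash H_t || G_t."""
    g, h = g0, h0
    for m in blocks:
        key = (m << 32) | h                  # K_i = m_i || H_{i-1}
        g, h = (toy_encrypt(key, g) ^ g,
                toy_encrypt(key, g ^ C) ^ g ^ C)
    return (h << 32) | g
```

With k = 64 and n = 32, each 32-bit message chunk costs two cipher calls under one key schedule, matching the rate (k−n)/(2n) = 1/2.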
Hirose also provides a proof in the Ideal Cipher Model.

