Figure: One round of XXTEA

General
Designers: David Wheeler, Roger Needham
First published: October 1998
Derived from: Block TEA

Cipher detail
Key sizes: 128 bits
Block sizes: arbitrary, at least two words (64 bits)
Structure: unbalanced Feistel network
Rounds: depends on the block size; about 52 + 6*words (6 to 32 full cycles)

In cryptography, Corrected Block TEA (often referred to as XXTEA) is a block cipher designed to correct weaknesses in the original Block TEA (Tiny Encryption Algorithm), which was first published together with a paper on TEA extensions (XTEA).^{[1]}^{[2]}
The cipher's designers were Roger Needham and David Wheeler of the Cambridge Computer Laboratory, and the algorithm was presented in an unpublished technical report in October 1998 (Wheeler and Needham, 1998). It is not subject to any patents.
Formally speaking, XXTEA is a consistent incomplete source-heavy heterogeneous UFN (unbalanced Feistel network) block cipher. XXTEA operates on variable-length blocks that are some arbitrary multiple of 32 bits in size (minimum 64 bits). The number of full cycles depends on the block size, but there are at least six (rising to 32 for small block sizes). The original Block TEA applies the XTEA round function to each word in the block and combines it additively with its leftmost neighbour. The slow diffusion rate of its decryption process was immediately exploited to break the cipher. Corrected Block TEA uses a more involved round function which makes use of both immediate neighbours in processing each word in the block.
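The dependence of the cycle count on the block size can be illustrated with a short C sketch. It assumes the schedule 6 + 52/n (truncating integer division, with n the number of 32-bit words) that appears in the code published by Wheeler and Needham; the function names xxtea_full_cycles and xxtea_word_rounds are hypothetical and exist only for this illustration.

```c
#include <assert.h>

/* Hypothetical helper (not part of the published code): number of full
 * cycles applied to a block of n 32-bit words, assuming the schedule
 * 6 + 52/n with truncating integer division. */
unsigned xxtea_full_cycles(unsigned n)
{
    assert(n >= 2);          /* blocks are at least two words (64 bits) */
    return 6 + 52 / n;
}

/* Total word-level rounds: each full cycle updates every word once. */
unsigned xxtea_word_rounds(unsigned n)
{
    return n * xxtea_full_cycles(n);
}
```

Under this assumption a two-word block receives 32 full cycles, a four-word block receives 19, and any block of 53 or more words receives the minimum of six.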
If the block size is equal to the entire message, XXTEA has the property that it does not need a mode of operation: the cipher can be directly applied to encrypt the entire message.
XXTEA is likely to be more efficient than XTEA for longer messages.
Needham & Wheeler make the following comments on the use of Block TEA:
For ease of use and general security the large block version is to be preferred when applicable for the following reasons.
 A single bit change will change about one half of the bits of the entire block, leaving no place where the changes start.
 There is no choice of mode involved.
 Even if the correct usage of always changing the data sent (possibly by a message number) is employed, only identical messages give the same result and the information leakage is minimal.
 The message number should always be checked as this redundancy is the check against a random message being accepted.
 Cut and join attacks do not appear to be possible.
 If it is not acceptable to have very long messages, they can be broken into chunks say of 60 words and chained analogously to the methods used for DES.
However, due to the incomplete nature of the round function, two large ciphertexts of 53 or more 32-bit words, identical in all but 12 words, can be found by a simple brute-force collision search requiring 2^{96−N} memory, 2^{N} time and 2^{N}+2^{96−N} chosen plaintexts. In other words, the total time*memory complexity is 2^{96}, which is 2^{wordsize*fullcycles/2} for any such cipher (here 32*6/2 = 96, since blocks of 53 or more words receive only the minimum of six full cycles). It is currently unknown whether such partial collisions pose any threat to the security of the cipher. Eight full cycles (32*8/2 = 128) would raise the bar for such a collision search above the complexity of parallel brute-force attacks.
The unusually small size of the XXTEA algorithm makes it a viable option where constraints are extreme, e.g. legacy hardware systems (perhaps embedded) in which the amount of available RAM is minimal.
The original formulation of the Corrected Block TEA algorithm, published by David Wheeler and Roger Needham, is as follows:^{[3]}
    #define MX (z>>5^y<<2) + (y>>3^z<<4)^(sum^y) + (k[p&3^e]^z);

    long btea(long* v, long n, long* k) {
      unsigned long z=v[n-1], y=v[0], sum=0, e, DELTA=0x9e3779b9;
      long p, q ;
      if (n > 1) {          /* Coding Part */
        q = 6 + 52/n;
        while (q-- > 0) {
          sum += DELTA;
          e = (sum >> 2) & 3;
          for (p=0; p<n-1; p++) y = v[p+1], z = v[p] += MX;
          y = v[0];
          z = v[n-1] += MX;
        }
        return 0 ;
      } else if (n < -1) {  /* Decoding Part */
        n = -n;
        q = 6 + 52/n;
        sum = q*DELTA ;
        while (sum != 0) {
          e = (sum >> 2) & 3;
          for (p=n-1; p>0; p--) z = v[p-1], y = v[p] -= MX;
          z = v[n-1];
          y = v[0] -= MX;
          sum -= DELTA;
        }
        return 0;
      }
      return 1;
    }
According to Needham and Wheeler:
btea will encode or decode n words as a single block where n > 1
 v is the n word data vector
 k is the 4 word key
 n is negative for decoding
 if n is zero result is 1 and no coding or decoding takes place, otherwise the result is zero
 assumes 32 bit ‘long’ and same endian coding and decoding
Note that the initialization of z reads v[n-1] before n has been checked. For n < 1 this is an out-of-bounds access (undefined behaviour in C) that may cause a segmentation fault, so it would be better placed inside the ‘Coding Part’ block. Also, in the definition of MX some programmers would prefer to use bracketing to clarify operator precedence.
A clarified version including those improvements is as follows:
    #include <stdint.h>

    #define DELTA 0x9e3779b9
    /* Fully bracketed, with no trailing semicolon, so it expands safely */
    #define MX (((z>>5^y<<2) + (y>>3^z<<4)) ^ ((sum^y) + (k[(p&3)^e] ^ z)))

    void btea(uint32_t *v, int n, uint32_t const k[4]) {
      uint32_t y, z, sum;
      unsigned p, rounds, e;
      if (n > 1) {          /* Coding Part */
        rounds = 6 + 52/n;
        sum = 0;
        z = v[n-1];         /* read only after n has been checked */
        do {
          sum += DELTA;
          e = (sum >> 2) & 3;
          for (p=0; p<n-1; p++)
            y = v[p+1], z = v[p] += MX;
          y = v[0];
          z = v[n-1] += MX;
        } while (--rounds);
      } else if (n < -1) {  /* Decoding Part */
        n = -n;
        rounds = 6 + 52/n;
        sum = rounds*DELTA;
        y = v[0];
        do {
          e = (sum >> 2) & 3;
          for (p=n-1; p>0; p--)
            z = v[p-1], y = v[p] -= MX;
          z = v[n-1];
          y = v[0] -= MX;
        } while ((sum -= DELTA) != 0);
      }
    }
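As a hedged illustration of single-block usage, the following C sketch encrypts a whole message as one block, decrypts it again, and counts how many ciphertext bits change when one plaintext bit is flipped. The names btea_sketch, mx, bit_difference, round_trip_ok and flipped_bits are hypothetical. btea_sketch is intended to follow the same steps as the clarified btea, with the MX macro replaced by a function, and the 64-word buffer limit is an arbitrary choice for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DELTA 0x9e3779b9u

/* Mixing step: the MX expression written with explicit brackets. */
static uint32_t mx(uint32_t y, uint32_t z, uint32_t sum,
                   unsigned p, unsigned e, const uint32_t k[4])
{
    return (((z >> 5) ^ (y << 2)) + ((y >> 3) ^ (z << 4)))
         ^ ((sum ^ y) + (k[(p & 3) ^ e] ^ z));
}

/* Assumed interface: n > 1 encodes n words in place, n < -1 decodes them. */
void btea_sketch(uint32_t *v, int n, const uint32_t k[4])
{
    uint32_t y, z, sum;
    unsigned p, rounds, e, words;

    if (n > 1) {                       /* coding part */
        words = (unsigned)n;
        rounds = 6 + 52 / words;
        sum = 0;
        z = v[words - 1];
        do {
            sum += DELTA;
            e = (sum >> 2) & 3;
            for (p = 0; p < words - 1; p++) {
                y = v[p + 1];
                z = v[p] += mx(y, z, sum, p, e, k);
            }
            y = v[0];                  /* here p == words - 1 */
            z = v[words - 1] += mx(y, z, sum, p, e, k);
        } while (--rounds);
    } else if (n < -1) {               /* decoding part */
        words = (unsigned)(-n);
        rounds = 6 + 52 / words;
        sum = rounds * DELTA;
        y = v[0];
        do {
            e = (sum >> 2) & 3;
            for (p = words - 1; p > 0; p--) {
                z = v[p - 1];
                y = v[p] -= mx(y, z, sum, p, e, k);
            }
            z = v[words - 1];          /* here p == 0 */
            y = v[0] -= mx(y, z, sum, p, e, k);
            sum -= DELTA;
        } while (--rounds);
    }
}

/* Count the bits that differ between two n-word buffers. */
unsigned bit_difference(const uint32_t *a, const uint32_t *b, unsigned n)
{
    unsigned count = 0;
    for (unsigned i = 0; i < n; i++)
        for (uint32_t x = a[i] ^ b[i]; x != 0; x &= x - 1)
            count++;
    return count;
}

/* Encrypt a copy of msg as a single block, decrypt it, compare with msg. */
int round_trip_ok(const uint32_t *msg, int n, const uint32_t k[4])
{
    uint32_t buf[64];
    assert(n > 1 && n <= 64);
    memcpy(buf, msg, (size_t)n * sizeof buf[0]);
    btea_sketch(buf, n, k);
    btea_sketch(buf, -n, k);
    return memcmp(buf, msg, (size_t)n * sizeof buf[0]) == 0;
}

/* Encrypt msg and a copy with one plaintext bit flipped; return the number
 * of ciphertext bits that differ. */
unsigned flipped_bits(const uint32_t *msg, int n, const uint32_t k[4])
{
    uint32_t a[64], b[64];
    assert(n > 1 && n <= 64);
    memcpy(a, msg, (size_t)n * sizeof a[0]);
    memcpy(b, msg, (size_t)n * sizeof b[0]);
    b[0] ^= 1u;
    btea_sketch(a, n, k);
    btea_sketch(b, n, k);
    return bit_difference(a, b, (unsigned)n);
}
```

For any four-word key, round_trip_ok is expected to return 1 for blocks of 2 to 64 words. flipped_bits gives an empirical view of the diffusion comment quoted from Needham and Wheeler, without any guarantee about the exact count for a particular input.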
