From Wikipedia, the free encyclopedia
This article is about the transmission of data
across noisy channels. For the storage of text in computers, see
variablewidth encoding.
In coding
theory a variablelength code is a code which maps source symbols to a
variable number of bits.
Variablelength codes can allow sources to be compressed
and decompressed with zero error (lossless data compression)
and still be read back symbol by symbol. With the right coding
strategy an i.i.d.
source may be compressed almost arbitrarily close to its entropy. This is in contrast to fixed
length coding methods, for which data compression is only possible
for large blocks of data, and any compression beyond the logarithm
of the total number of possibilities comes with a finite (though
perhaps arbitrarily small) probability of failure.
Some examples of wellknown variablelength coding strategies
are Huffman
coding, Lempel–Ziv coding and arithmetic
coding.
Extension of
a code
The extension of a code is the mapping of
finite length source sequences to finite length bit strings, that
is obtained by concatenating for each symbol of the source sequence
the corresponding codeword produced by the original code.
Classes of variablelength
codes
Variablelength codes can be strictly nested in order of
decreasing generality as nonsingular codes, uniquely decodable
codes and prefix codes. Prefix codes are always uniquely decodable,
and these in turn are always nonsingular :
Nonsingular
codes
A code is nonsingular if each source symbol is
mapped to a different nonempty bit string, i.e. the mapping from
source symbols to bit strings is onetoone.
 For example the mapping
is not nonsingular because both "a" and "b" map
to the same bit string "0" ; any extension of this mapping
will generate a lossy (nonlossless) coding. Such singular coding
may still be useful when some loss of information is acceptable
(for example when such code is used in audio or video compression,
where a lossy coding becomes equivalent to source quantization).
 However, the mapping
is nonsingular ; its extension will generate a lossless
coding, which will be useful for general data transmission (but
this feature is not always required). Note that it is not necessary
for the nonsingular code to be more compact than the source (and
in many applications, a larger code is useful, for example as a way
to detect and/or recover from encoding or transmission errors, or
in security applications to protect a source from undetectable
tamperring).
Uniquely decodable codes
A code is uniquely decodable if its extension
is nonsingular. Whether a given code is uniquely decodable can be
decided with the Sardinas–Patterson
algorithm.
 The mapping
is uniquely decodable (this can be demonstrated by looking at the
followset after each target bit string in the map,
because each bitstring is terminated as soon as we see a 0 bit
which cannot follow any existing code to create a longer valid code
in the map, but unambiguously starts a new code).
 Consider again the code M_{2} from the previous
section. This code, which is based on an example found in ^{[1]}, is
not uniquely decodable, since the string
011101110011 can be interpreted as the sequence of
codewords 011101110011, but also as the sequence of
codewords 011101110011. Two possible decodings of this
encoded string are thus given by cdb and
babe.
 The nonsingular example mapping M_{2} in the previous
paragraph is not uniquely decodable because (for
example) the source sequence "aa" maps to bit string "00" using the
extension, exactly like the source sequence "c". However, such a
code is useful when the set of all possible source symbols is
completely known and finite, or when there are restrictions (for
example a formal syntax) that determine if source elements of this
extension are acceptable. Such restrictions permit the decoding of
the original message by checking which of the possible source
symbols mapped to the same symbol are valid under those
restrictions.
Prefix
codes
Main article:
Prefix code
A code is a prefix code if no target bit string
in the mapping is a prefix of the target bit string of a different
source symbol in the same mapping. This means that symbols can be
decoded instantaneously after their entire codeword is received.
Other commonly used names for this concept are prefixfree
code, instantaneous code, or
contextfree code.
 The example mapping M_{3} in the previous
paragraph is not a prefix code because we don't
know after reading the bit string "0" if it encodes a "a" source
symbol, or if it is the prefix of the encodings of the "b" or "c"
symbols.
 An example of a prefix code is shown below.
Symbol 
Codeword 
a 
0 
b 
10 
c 
110 
d 
111 

 The source is one of four symbols: a, b,c or d. The binary
codeword for each symbol is given.
 The encoding and decoding of a portion of an example source
sequence is given below:
 aabacda → 001001101110 → 001001101110 → aabacda
A special case of prefix codes are block codes. Here all codewords must have
the same length. The latter are not very useful in the context of
source
coding, but often serve as error correcting codes in the
context of channel coding.
Advantages
The advantage of a variablelength code is that unlikely source
symbols can be assigned longer codewords and likely source symbols
can be assigned shorter codewords, thus giving a low
expected codeword length. For the above example, if the
probabilities of (a, b, c, d) were ,
the expected number of bits used to represent a source symbol using
the code above would be:

 .
As the entropy of this source is 1.7500 bits per symbol, this
code compresses the source as much as possible so that the source
can be recovered with zero error.
Notes
 ^
Berstel et al. (2009), Example 2.3.1, p. 63
References
 Jean Berstel, Dominique Perrin and Christophe Reutenauer. Codes
and Automata. Cambridge University Press. To appear (estimated Nov.
2009). Draft available
online