Dictionary Methods | 11 | v2 | Introduction to Information Theory and

ABSTRACT

In the previous chapters, lossless compression was obtained by using a probability model to drive a coder. Dictionary methods use a list of phrases (the dictionary), which hopefully includes many of the phrases in the source, to replace source fragments by pointers to the list. Compression occurs if the pointers require less space than the corresponding fragments. (Of course, the method of passing the dictionary must also be considered.)

In many ways, dictionary methods are easier to understand than probabilistic methods. At the simplest level, several (fixed) specialized dictionaries could be made available to both the coder and decoder. For text in English, a few thousand of the most commonly used words could serve as the dictionary; if the source consisted of code in some computer language such as C, then a list of the keywords and standard library functions might serve as a dictionary. Fixed dictionaries may be useful in some situations, but there are at least two serious drawbacks. First, the dictionaries must be known to both the coder and decoder. Changes to the dictionary would have to be propagated to all the sites which use the scheme. Second, fixed dictionary schemes cannot compress “unknown” text. In the case of C code, there would likely be little compression of the variable names created by the programmer.