Chapter 2

Compression of data

A mathematical notion of compression can be found in the fields of computer science and information theory (Wikipedia, ‘Data Compression’, 2013: 1-10). Data compression, also termed source coding or bit-rate reduction, is the use of fewer bits than the original representation to encode information, or data, and a compression format is either lossless or lossy (Wikipedia, ‘Data Compression’, 2013: 1). Lossless compression reduces statistically redundant information with no loss of the original information content (Wikipedia, ‘Data Compression’, 2013: 1). Lossy compression removes unnecessary information from the original source, hence the term ‘source’ coding, with a resulting loss of some of the original information or data (Wikipedia, ‘Data Compression’, 2013: 1). Data compression is symmetrical when the time to compress is the same as the time to decompress, and asymmetrical when the compression and decompression times differ (Wikipedia, ‘Data Compression Symmetry’, 2013: 1).
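
To make the lossless property concrete, the following Python sketch uses run-length encoding (RLE), a simple scheme not named in the sources above and chosen here only as an illustration: removing the statistical redundancy of repeated symbols still allows the original data to be recovered exactly.

# Run-length encoding (RLE): an illustrative lossless scheme, not one
# named in this chapter's sources. Runs of repeated symbols collapse
# into (symbol, count) pairs; decompression reproduces the input
# exactly, which is the defining property of lossless compression.

def rle_compress(data):
    runs = []
    for symbol in data:
        if runs and runs[-1][0] == symbol:
            runs[-1] = (symbol, runs[-1][1] + 1)
        else:
            runs.append((symbol, 1))
    return runs

def rle_decompress(runs):
    return "".join(symbol * count for symbol, count in runs)

original = "aaaabbbcca"
compressed = rle_compress(original)            # [('a', 4), ('b', 3), ('c', 2), ('a', 1)]
assert rle_decompress(compressed) == original  # lossless round trip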

A universal code is a prefix code that maps the positive integers onto binary code words in such a way that, for any monotonic probability distribution over the integers, the expected code word lengths stay within a constant factor of the lengths an optimal code would have produced (Wikipedia, ‘Universal Code’, 2013: 1). On average, universal codes assign longer code words to larger integers; they are not used when the statistical distribution is precisely known, and no universal code is optimal for every probability distribution (Wikipedia, ‘Universal Code’, 2013: 1). A universal code is not a universal source code, as no fixed prefix code is used in universal source coding; both Huffman coding and arithmetic coding compress better than universal coding unless the exact probabilities of the message are unknown, in which case a universal code is needed (Wikipedia, ‘Universal Code’, 2013: 1-2).
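
As one concrete instance of a universal code, the sketch below implements the Elias gamma code, one of the universal codes catalogued in the Wikipedia article cited above: each positive integer n is written as its binary representation preceded by enough zero bits to make the code word self-delimiting, so larger integers receive longer code words.

# Elias gamma coding: a universal prefix code for the positive integers.
# The code word for n is floor(log2 n) zero bits followed by the binary
# digits of n; the run of leading zeros tells a decoder how many binary
# digits to read next, so no code word is a prefix of another.

def elias_gamma(n):
    if n < 1:
        raise ValueError("Elias gamma encodes positive integers only")
    binary = bin(n)[2:]                  # binary digits of n, no '0b' prefix
    return "0" * (len(binary) - 1) + binary

for n in (1, 2, 3, 4, 9):
    print(n, elias_gamma(n))             # 1, 010, 011, 00100, 0001001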

Huffman coding was devised in 1951 by David A. Huffman, then a Ph.D. student at MIT, and is an entropy-encoding algorithm used for lossless data compression (Wikipedia, ‘Huffman Coding’, 2013: 1). Huffman coding uses a particular method for choosing the representation of each symbol, resulting in a prefix-free code: the bit string representing one symbol is never the prefix of the bit string representing any other symbol (Wikipedia, ‘Huffman Coding’, 2013: 1). Many variations of Huffman coding exist (Wikipedia, ‘Huffman Coding’, 2013: 7-10); a sketch of the basic algorithm is given below. Arithmetic coding, on average, achieves better compression because it can combine an arbitrary number of symbols for more efficient coding, and it adapts better to real-world input statistics (Wikipedia, ‘Huffman Coding’, 2013: 1).

Entropy encoding is a lossless data compression scheme that is independent of the specific characteristics of the medium used (Wikipedia, ‘Entropy Encoding’, 2013: 1). It assigns a unique prefix-free code to each specific symbol in the input, then replaces each fixed-length input symbol with the corresponding variable-length prefix-free output code word (Wikipedia, ‘Entropy Encoding’, 2013: 1). The most common symbols receive the shortest codes (Wikipedia, ‘Entropy Encoding’, 2013: 1).
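
The following Python sketch shows the core of Huffman's algorithm as described above: the two least frequent nodes are repeatedly merged until a single tree remains, and code words are read off the root-to-leaf paths. The example string and variable names are illustrative, not taken from the sources.

# Huffman coding sketch: merge the two least frequent nodes until one
# tree remains; each merge prepends one bit to the codes in its subtrees.
import heapq
from collections import Counter

def huffman_code(text):
    # Heap entries: (frequency, tie_breaker, [(symbol, code), ...]).
    heap = [(freq, i, [(sym, "")])
            for i, (sym, freq) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # least frequent subtree
        f2, _, right = heapq.heappop(heap)   # second least frequent
        merged = ([(s, "0" + c) for s, c in left] +
                  [(s, "1" + c) for s, c in right])
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return dict(heap[0][2])

codebook = huffman_code("this is an example of a huffman tree")
# Prefix-free check: in sorted order, any prefix relation would appear
# between adjacent code words.
codes = sorted(codebook.values())
assert all(not b.startswith(a) for a, b in zip(codes, codes[1:]))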
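
Arithmetic coding itself is more involved; the toy sketch below shows only its central idea, that an entire message is narrowed into a single subinterval of [0, 1), so symbols need not be forced onto whole-bit code words as in Huffman coding. Floating-point arithmetic is used here purely for readability; practical coders use integer arithmetic.

# Toy arithmetic coder (encoding side only): the whole message is
# represented by any number inside the final [low, high) interval,
# given the same symbol model and message length.
from collections import Counter

def arithmetic_interval(text):
    total = len(text)
    ranges, cum = {}, 0.0
    for sym, count in sorted(Counter(text).items()):
        ranges[sym] = (cum, cum + count / total)  # cumulative probability band
        cum += count / total
    low, high = 0.0, 1.0
    for sym in text:
        span = high - low
        sym_low, sym_high = ranges[sym]
        low, high = low + span * sym_low, low + span * sym_high
    return low, high

low, high = arithmetic_interval("abba")
print(low, high)   # any value in [0.375, 0.4375) identifies "abba"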
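
Finally, the shortest-codes-for-common-symbols principle has a quantitative floor: the Shannon entropy of the symbol distribution is the minimum average number of bits per symbol that any lossless entropy coder can approach. The short sketch below computes it for two arbitrary example strings (the strings are illustrative, not from the sources).

# Shannon entropy: the lower bound, in bits per symbol, on the average
# code word length achievable by any lossless entropy coder.
import math
from collections import Counter

def shannon_entropy(text):
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

print(shannon_entropy("aaaabbc"))    # ~1.379 bits/symbol: skewed source, cheap to code
print(shannon_entropy("abcdefgh"))   # 3.0 bits/symbol: uniform source, no savings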