ABSTRACT

Many naturally occurring and man-made phenomena exhibit a highly skewed, nonuniform distribution in which a small set of common elements in a class accounts for the bulk of all uses of the class [1]. This phenomenon has variously been referred to as the 80/20 rule [2], the Pareto principle (for continuous data) [3], Zipf’s law (for discrete data) [4] and the power-law phenomenon [5]. For example, linguistic researchers studying a large corpus of English text found that the word “the” constitutes 7% of all word uses in the corpus, while “to” and “of” each account for another 3%. Indeed, only 135 of the 50,000 unique words observed are needed to account for half of all word uses, while nearly half of the words in the corpus are used only once [6]. This pattern, commonly called Zipf’s law, has been observed in a variety of languages, including American English, Chinese and the Latin of Plautus [7]. The same phenomenon has been observed in areas as disparate as the population of human settlements [8], the distribution of wealth [9], and movie rentals from Netflix and purchases from Amazon.com [1].
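As a rough illustration of how a rank-frequency law produces numbers like those cited from [6], the sketch below makes a simplifying assumption. It takes an idealized Zipf law with exponent 1, where the share of the word at rank r is proportional to 1/r, over a hypothetical vocabulary of 50,000 words. This is not the corpus analysis of [6]. It only shows that the idealized law yields a top-word share and a "half of all uses" word count of the same order as the reported 7% and 135 words. The function names `zipf_shares` and `words_for_fraction` are illustrative, not from the source.

```python
import math


def zipf_shares(vocab_size, exponent=1.0):
    """Fraction of all word uses expected at each rank 1..vocab_size,
    assuming an idealized Zipf law: share(r) proportional to 1 / r**exponent."""
    weights = [1.0 / r ** exponent for r in range(1, vocab_size + 1)]
    total = math.fsum(weights)  # normalizing constant (harmonic number when exponent == 1)
    return [w / total for w in weights]


def words_for_fraction(shares, fraction):
    """Smallest number of top-ranked words whose combined share reaches `fraction`."""
    cumulative = 0.0
    for k, share in enumerate(shares, start=1):
        cumulative += share
        if cumulative >= fraction:
            return k
    return len(shares)


# Hypothetical 50,000-word vocabulary, chosen to match the corpus size cited above.
shares = zipf_shares(50_000)
top_share = shares[0]
half_count = words_for_fraction(shares, 0.5)

print(f"share of the most common word: {top_share:.1%}")
print(f"top-ranked words needed for half of all uses: {half_count}")
```

Real corpora deviate from the ideal curve, especially at the highest and lowest ranks, so the model's figures should be read only as a qualitative match to the measured ones.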