ABSTRACT

Introduction
Data Mining
The Tools of Data Mining
Statistical Forecasting and Data Mining
Terminology in Data Mining: Speak Like a Data Miner
A Data Mining Example: k-Nearest-Neighbor and R.A. Fisher
Churning: A Business Example
Text Analytics
Summary

One apocryphal story about the origin of statistics (and hence of analytics in general) goes back to 17th-century London. During the plague that decimated London and the surrounding area, it became popular to declare oneself dead in order to avoid paying taxes, even though one was very much alive. To stop this practice, the king required a death certificate recording some basic information, and a routine report summarizing recent mortality (the Bills of Mortality) was prepared for him. In preparing these reports, people discovered patterns; this was reportedly the first instance of noticing patterns in data, as opposed to patterns in nature (e.g., the stars in the night sky, the leaf structure of plants, and the tides of the sea). Mortality cropped up again in the analytics literature when Benjamin Gompertz, working from human mortality tables, discovered that the death rate increases exponentially with age. The pattern he described is a limiting case of the generalized logistic function used in present-day data mining.
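To make the link between Gompertz's pattern and the generalized logistic function concrete, here is a minimal Python sketch. It assumes one common parameterization of the generalized logistic (Richards) curve, y(t) = K / (1 + Q e^(-Bt))^(1/ν) with lower asymptote 0, and of the sigmoid Gompertz curve associated with his law, y(t) = K exp(-b e^(-ct)). Choosing Q = νb and B = c and letting ν shrink toward 0 turns the Richards curve into the Gompertz curve, because (1/ν) ln(1 + νb e^(-ct)) tends to b e^(-ct). The function names, parameter names, and the values K = 1, b = 2, c = 0.5 are illustrative assumptions, not taken from the text.

```python
import math

# Illustrative sketch: the parameter names (K, b, c, Q, B, nu) are assumptions
# that follow one common parameterization of each curve.

def gompertz(t: float, K: float, b: float, c: float) -> float:
    """Gompertz curve: y(t) = K * exp(-b * exp(-c * t))."""
    return K * math.exp(-b * math.exp(-c * t))


def generalized_logistic(t: float, K: float, Q: float, B: float, nu: float) -> float:
    """Richards (generalized logistic) curve with lower asymptote 0 and C = 1:
    y(t) = K / (1 + Q * exp(-B * t)) ** (1 / nu)."""
    return K / (1.0 + Q * math.exp(-B * t)) ** (1.0 / nu)


def richards_near_gompertz(t: float, K: float, b: float, c: float, nu: float) -> float:
    """Richards curve with Q = nu * b and B = c; tends to gompertz(t, K, b, c) as nu -> 0."""
    return generalized_logistic(t, K, nu * b, c, nu)


def max_gap(nu: float, K: float = 1.0, b: float = 2.0, c: float = 0.5) -> float:
    """Largest |Richards - Gompertz| over the grid t = 0, 0.5, ..., 20."""
    ts = [0.5 * i for i in range(41)]
    return max(abs(richards_near_gompertz(t, K, b, c, nu) - gompertz(t, K, b, c)) for t in ts)


if __name__ == "__main__":
    # The gap between the two curves should shrink as nu decreases.
    for nu in (1.0, 0.1, 0.01, 0.001):
        print(f"nu = {nu:<6} max gap = {max_gap(nu):.6f}")
```

With ν = 1 the Richards curve reduces to the ordinary logistic curve, which differs visibly from the Gompertz curve; once ν is small, the remaining gap shrinks roughly in proportion to ν.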