Rough Set and Neighborhood Systems in Big Data Analysis

doi:10.1201/9781315180748-14

ABSTRACT

The notion of a rough set was introduced by Pawlak, and its extensions have proved to be excellent models for capturing imprecision in real-life data sets. However, rough sets are better suited to handling categorical data than numeric data. The concept of a neighborhood system was introduced by Lin in 1998. Hu, who introduced the concept of neighborhood rough sets, observed that, besides being an extension of rough sets, they handle categorical and numeric data equally well. Rough sets are widely used for imputing missing values in data sets, and they are also quite efficient at generating rule sets from given data. Another problem with rough sets, however, is that they cannot handle large data sets on their own. As a result, Zhang et al. used techniques such as parallel processing, data reduction, and MapReduce to acquire knowledge from Big Data. Even so, this approach still does not handle heterogeneous data well. To address this problem, Hiremath et al. recently observed that neighborhood systems are more suitable for this purpose. Our aim in this chapter is to present these developments, along with some problems for future work on this topic.
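The sketch below illustrates why a neighborhood-based model can treat numeric and categorical attributes together. It computes delta-neighborhood lower and upper approximations of a target set, in the spirit of neighborhood rough sets. Several details are assumptions made only for this illustration and are not taken from the chapter: the mixed distance (absolute difference for numeric attributes assumed to be scaled to [0, 1], and 0/1 mismatch for categorical attributes, combined by taking the maximum), the threshold `delta`, the toy data, and the helper names `distance`, `neighborhood`, and `approximations`.

```python
from typing import List, Sequence, Set, Tuple, Union

Value = Union[float, str]

def distance(x: Sequence[Value], y: Sequence[Value]) -> float:
    """Assumed mixed distance: numeric attributes contribute |a - b|,
    categorical (string) attributes contribute 0 if equal and 1 otherwise;
    the overall distance is the maximum over all attributes."""
    d = 0.0
    for a, b in zip(x, y):
        if isinstance(a, str) or isinstance(b, str):
            term = 0.0 if a == b else 1.0
        else:
            term = abs(a - b)
        d = max(d, term)
    return d

def neighborhood(i: int, data: List[Tuple[Value, ...]], delta: float) -> Set[int]:
    """Indices of all samples within distance delta of sample i (including i)."""
    return {j for j, y in enumerate(data) if distance(data[i], y) <= delta}

def approximations(data: List[Tuple[Value, ...]], target: Set[int],
                   delta: float) -> Tuple[Set[int], Set[int]]:
    """Lower approximation: samples whose neighborhood lies inside target.
    Upper approximation: samples whose neighborhood meets target."""
    lower: Set[int] = set()
    upper: Set[int] = set()
    for i in range(len(data)):
        nb = neighborhood(i, data, delta)
        if nb <= target:
            lower.add(i)
        if nb & target:
            upper.add(i)
    return lower, upper

# Hypothetical toy data: one numeric attribute in [0, 1] and one categorical attribute.
data = [(0.10, "red"), (0.15, "red"), (0.50, "red"), (0.55, "red"), (0.90, "blue")]
target = {0, 1, 2}  # hypothetical decision class

lower, upper = approximations(data, target, delta=0.1)
print("lower:", sorted(lower))
print("upper:", sorted(upper))
print("boundary:", sorted(upper - lower))
```

With `delta=0.1`, samples 2 and 3 are close in the numeric attribute and share the category "red", so they fall into each other's neighborhoods. Because sample 3 is outside the target class, both end up in the boundary region. Sample 4 never joins a "red" neighborhood, because a categorical mismatch contributes a distance of 1.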