ABSTRACT

Objective:

The primary aim of this chapter is to help researchers identify a community detection algorithm that works efficiently on massive, temporally evolving datasets. The existing literature does not satisfactorily report the comparative performance of existing approaches on real-world datasets such as the Digital Bibliography and Library Project (DBLP). This motivated us to carry out a comparative analysis (quantitative or qualitative) of nine state-of-the-art community detection algorithms and to explore their suitability for large evolving datasets. Our analysis of the computational results of these nine algorithms is based on Lancichinetti-Fortunato-Radicchi (LFR) benchmark metrics.

Context:

In recent years, fields such as finance, engineering, medicine, and the general sciences have seen enormous growth in their domain data. Researchers widely use Social Network Analysis (SNA) to model such massive data as graphs (called social networks) for better analysis. Community detection is one SNA approach for analyzing large social networks, and many algorithms exist to detect the communities in a social network in order to analyze its structure. However, the efficiency of these algorithms on evolving social networks has remained unexplored. Such networks vary both in size and in how they change over time, and most existing algorithms are not designed to identify communities accurately in them. As a result, inferences about the efficiency of community detection algorithms on evolving real-world social networks have mostly been biased.

Methodology:

We compare nine state-of-the-art community detection algorithms theoretically and experimentally, based on the computed LFR metrics. Our work follows a three-stage process. First, we discuss how temporal, evolving massive datasets can be represented as a social network; in this step, we model our DBLP dataset as graphs in a way that accurately captures its nature. Second, we apply the nine community detection algorithms to the constructed graphs. Third, we compare the nine algorithms to study the strengths and weaknesses of each. The effectiveness of the algorithms is inferred from the computed LFR benchmark metrics, such as Modularity (M), Clustering Coefficient (CC), and Computation Time (CT).
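
To make the three stages concrete, the following is a minimal sketch in Python and not the pipeline used in this chapter. It rests on several assumptions: the dataset is modeled as an unweighted co-authorship graph, `PAPERS` is a hypothetical toy list of author tuples, CC is taken to be the average local clustering coefficient of the whole graph, and `greedy_merge` is a naive greedy modularity-merging heuristic that stands in for any one of the nine algorithms, which are not named here. All function names are illustrative.

```python
import time
from itertools import combinations

# Hypothetical toy data: each tuple is the author list of one paper.
PAPERS = [("a", "b", "c"), ("a", "b"), ("b", "c"),
          ("d", "e", "f"), ("d", "e"), ("e", "f"), ("c", "d")]

def build_coauthor_graph(papers):
    """Stage 1: undirected, unweighted co-authorship graph as an adjacency dict."""
    adj = {}
    for authors in papers:
        for u in authors:
            adj.setdefault(u, set())
        for u, v in combinations(authors, 2):
            adj[u].add(v)
            adj[v].add(u)
    return adj

def greedy_merge(adj):
    """Stage 2 (stand-in detector): start from singletons and repeatedly
    merge the pair of communities with the largest positive modularity gain."""
    m = sum(len(nb) for nb in adj.values()) / 2  # number of edges
    comms = [{u} for u in sorted(adj)]
    while True:
        best_gain, best_pair = 0.0, None
        for i, j in combinations(range(len(comms)), 2):
            e_ij = sum(1 for u in comms[i] for v in adj[u] if v in comms[j])
            d_i = sum(len(adj[u]) for u in comms[i])
            d_j = sum(len(adj[u]) for u in comms[j])
            gain = e_ij / m - d_i * d_j / (2 * m * m)
            if gain > best_gain:
                best_gain, best_pair = gain, (i, j)
        if best_pair is None:  # no merge improves modularity
            return comms
        i, j = best_pair
        comms[i] |= comms[j]
        del comms[j]

def modularity(adj, communities):
    """M: sum over communities of (internal edges / m) - (degree sum / 2m)^2."""
    m = sum(len(nb) for nb in adj.values()) / 2
    q = 0.0
    for comm in communities:
        degree_sum = sum(len(adj[u]) for u in comm)
        internal = sum(1 for u in comm for v in adj[u] if v in comm) / 2
        q += internal / m - (degree_sum / (2 * m)) ** 2
    return q

def average_clustering(adj):
    """CC: mean local clustering coefficient (0 for nodes of degree < 2)."""
    total = 0.0
    for u, nb in adj.items():
        k = len(nb)
        if k < 2:
            continue
        links = sum(1 for v, w in combinations(nb, 2) if w in adj[v])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)

def evaluate(detector, adj):
    """Stage 3: run one detector and collect M, CC, and CT (seconds)."""
    start = time.perf_counter()
    communities = detector(adj)
    elapsed = time.perf_counter() - start
    return {"communities": communities,
            "M": modularity(adj, communities),
            "CC": average_clustering(adj),
            "CT": elapsed}

graph = build_coauthor_graph(PAPERS)
report = evaluate(greedy_merge, graph)
print(report)
```

On this toy graph (two triangles joined by a single edge) the stand-in heuristic recovers the two triangles as communities, with M = 5/14 and CC = 7/9. Comparing several algorithms would amount to calling `evaluate` once per detector on the same graph and tabulating the three values; CT on a graph this small is dominated by timer noise and only becomes informative on larger networks.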

Results:

The comparative analysis of the algorithms was performed on three types of social networks: small, medium, and large. Based on the network properties and the experimental results, we infer which community detection algorithm performs best on networks of each size. In addition, our work should be helpful to researchers who aim to understand and explore the workings of different community detection methods.