ABSTRACT

Twitter is a platform where people express their thoughts on trending topics that interest them. Exploring this data can help us find peer groups, that is, groups of users with similar interests. Like any other social network, Twitter is subject to various spam attacks, so before peer groups are identified, accounts that are not genuine or that regularly engage in spamming activities have to be filtered out. The main idea is to use the URLs that accounts share, and the frequency with which they share them, to identify the account type. Instead of focusing on a single account, a group of accounts, or campaign, is identified based on the similarity of the accounts. The similarity measure is calculated by applying Shannon's information theory to estimate the amount of information in a URL and then using that value to determine the information shared by each account. Once similar accounts are identified, a graph is constructed that connects accounts whose similarity measure is above a threshold, and potential campaigns are identified from this graph. The campaigns are then classified as spammers or normal users using machine learning (ML) algorithms. The normal users identified in this way are members with similar interests. To further improve efficiency, these members are grouped by location, so that peer groups within a locality are identified. This peer group identification can help connect people with similar interests in a locality.
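
As a rough illustration of the similarity and graph steps described above, the following Python sketch shows one way they could fit together. It is not the implementation used in this work, and several details are assumptions made only for the example: the probability of a URL is estimated as the fraction of accounts that shared it, the similarity of two accounts is the information in their common URLs divided by the information in all of their URLs, the threshold of 0.5 is arbitrary, and candidate campaigns are taken to be connected components of the graph. The function names and the sample accounts and URLs are hypothetical, and per-account posting frequency is ignored for brevity.

```python
import math
from collections import Counter
from itertools import combinations


def url_information(account_urls):
    """Self-information I(u) = -log2 p(u) for every URL.

    Assumption: p(u) is the fraction of accounts that shared u at least once.
    """
    n = len(account_urls)
    counts = Counter(u for urls in account_urls.values() for u in set(urls))
    return {u: -math.log2(c / n) for u, c in counts.items()}


def similarity(a, b, account_urls, info):
    """Information in the URLs both accounts share, normalised by the
    information in all URLs either account shares (assumed normalisation)."""
    urls_a, urls_b = set(account_urls[a]), set(account_urls[b])
    total = sum(info[u] for u in urls_a | urls_b)
    if total == 0:
        return 0.0
    return sum(info[u] for u in urls_a & urls_b) / total


def candidate_campaigns(account_urls, threshold=0.5):
    """Connect accounts whose similarity exceeds the threshold and return
    the connected components that contain more than one account."""
    info = url_information(account_urls)
    adjacency = {a: set() for a in account_urls}
    for a, b in combinations(account_urls, 2):
        if similarity(a, b, account_urls, info) > threshold:
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen, groups = set(), []
    for start in adjacency:
        if start in seen:
            continue
        # Depth-first search to collect one connected component.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        if len(component) > 1:
            groups.append(component)
    return groups


# Hypothetical data: two accounts posting the same links, two unrelated ones.
sample = {
    "acct_a": ["http://example.com/1", "http://example.com/2"],
    "acct_b": ["http://example.com/1", "http://example.com/2"],
    "acct_c": ["http://example.org/9"],
    "acct_d": ["http://example.net/3"],
}
campaigns = candidate_campaigns(sample)
```

In this sketch a URL shared by few accounts carries more information than one shared by many, so two accounts that both post rare URLs score as more similar than two accounts that only have a popular URL in common. The resulting groups would then be passed to the classification step, which is not shown here.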