ABSTRACT

This chapter presents the most common approaches used by systems that identify social bots. It focuses on the egocentric analysis methodology because of its advantages in data collection and algorithmic complexity. The chapter examines the extracted features in terms of their contribution to overall performance and their redundancy within the feature set. It presents an online bot detection system, Botometer, which is freely available for academic and public use as part of the Observatory on Social Media. The chapter considers four types of links: retweet, mention, being retweeted, and being mentioned. It also describes a few additional feature selection methods inspired by information theory. The chapter analyzes the top features identified by the Random Forest algorithm and evaluates other feature selection mechanisms from the recent literature. Feature selection is as essential as feature engineering for improving the performance of bot detection systems, especially when trade-offs between accuracy and computational speed are taken into consideration.