ABSTRACT

Customer churn is a major problem and one of the most important concerns for large businesses. Because churn directly affects company profits, especially in the telecommunications sector, companies are seeking ways to predict which customers are likely to churn. Identifying the factors that increase customer churn is therefore important for taking the actions needed to reduce it. The main contribution of our work is a churn prediction model that helps telecommunication businesses predict the customers who are most likely to leave after a certain period of time. The model uses machine learning techniques on a big data platform and introduces a new approach to feature selection. To measure the performance of the model, the standard Area Under the Curve (AUC) measure is adopted, and the AUC value obtained is 84%. The model was built and tested in a Python environment on a large dataset obtained from www.kaggle.com. Four algorithms were evaluated: Logistic Regression, Random Forest, Support Vector Machines, and Extreme Gradient Boosting (XGBoost). The best results were obtained with the Random Forest algorithm, at an accuracy of 80%.
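
For readers unfamiliar with the AUC measure, it can be read as the probability that a randomly chosen churner receives a higher predicted score than a randomly chosen non-churner. The following is a minimal sketch of that pairwise definition in plain Python, not the evaluation code used in this work; the function name `auc_score` and the toy labels and scores are hypothetical and chosen only for illustration.

```python
def auc_score(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    labels: sequence of 0/1 values (1 = churned).
    scores: predicted churn scores, one per label.
    A tie between a positive and a negative counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data, not taken from the dataset used in this work.
toy_labels = [1, 0, 1, 0, 0, 1]
toy_scores = [0.9, 0.2, 0.4, 0.5, 0.1, 0.8]
toy_auc = auc_score(toy_labels, toy_scores)
```

In this toy case, 8 of the 9 churner/non-churner pairs are ordered correctly, so `toy_auc` is 8/9. The pairwise loop is quadratic in the number of samples; on a large dataset a rank-based implementation from a library would be the practical choice.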