ABSTRACT

Software projects often rely on software repositories from which applications can be retrieved and installed. To support the selection of the right application, applications are often organized into categories. However, because repositories are large and varied and their applications evolve over time, manual categorization is error-prone, costly, and time-consuming. Administrators therefore prefer automated tools that can detect the appropriate category of an application. In this context, we introduce a machine learning–based application categorization service that classifies software applications automatically and quickly. We evaluated several machine learning classification algorithms on public data sets, built the best-performing algorithm into a service, and deployed it on the cloud. The project source code is analyzed on the client side, and the cloud web service is then called automatically to determine the appropriate category of the application. Our experimental results show that cloud-based categorization of software applications is promising and that software companies can build their own prediction services to manage their large repositories of existing applications.