ABSTRACT

Diabetes mellitus is one of the most common diseases, so an efficient method for predicting it can help with self-diagnosis. The conventional way to identify diabetes is a blood glucose test, so most patients cannot get an immediate diagnosis. Machine learning (ML) can detect patterns in data and use them to predict the disease. In this chapter, we discuss five well-known, state-of-the-art ML algorithms for predicting diabetes mellitus: logistic regression, support vector machines, random forest, XGBoost, and neural networks. We use a dataset from the University of California, Irvine (UCI) ML repository, collected through a direct survey of patients at Sylhet Diabetes Hospital. An optional library was used to tune the hyperparameters of several of the algorithms. We also build a model-averaging ensemble that combines random forest, XGBoost, and a neural network, and we compare the performance of all models using accuracy as the metric. The best accuracy, 98.08%, was achieved by a neural network with two hidden layers. We provide a comparative discussion of our approach and results against the other existing methods in the literature. The results presented in this chapter show a significant improvement over the established state of the art in ML-based diabetes prediction.