ABSTRACT

Diabetes is a chronic disease characterized by high blood sugar levels over a prolonged period. It arises when the body is unable to produce enough insulin, unable to properly use the insulin it produces, or, in some cases, both. As a result, the body cannot move sugar from the blood into the cells, which leads to high blood sugar levels. Symptoms of high blood sugar (hyperglycaemia) include fatigue, nausea, vomiting, stomach pain, excessively dry mouth, increased heart rate, frequent urination, fruity breath odour, increased thirst and increased hunger. Glucose is the main form of sugar found in the blood and one of the body's prime energy sources. When the body lacks insulin, sugar builds up drastically in the bloodstream. If left untreated, diabetes can cause many complications, which are generally classified as acute or chronic. Acute complications include Hypoglycaemia, Hyperglycaemic Hyperosmolar State (HHS) and Diabetic Ketoacidosis (DKA); chronic complications include vision loss, kidney damage, nerve damage, heart and blood vessel disease, dental problems, and hand and foot problems. The risk and severity of diabetes can be reduced significantly if precise and early prediction is possible. Robust and precise prediction of this disease is a challenging task because few labelled datasets are available and because diabetes datasets contain outliers.
In this work, we propose a robust framework for diabetes prediction. The framework proceeds through Data Collection, Data Pre-processing, Exploratory Data Analysis (EDA), Outlier Detection, Feature Selection, Data Standardization and Data Splitting; we then apply different Machine Learning (ML) classifiers to the data set and, finally, perform K-fold Cross-validation and Hyperparameter tuning. The Random Forest Classifier gave the highest accuracy, 99.3%, and we used this classifier to build the final predictive model.
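
The later stages of the framework described above (standardization, splitting, a Random Forest classifier, K-fold cross-validation and hyperparameter tuning) can be illustrated with a minimal sketch. It assumes scikit-learn and uses synthetic placeholder data, not the study's dataset. The function name, the 80/20 split, the use of stratified folds, the five folds and the parameter grid are all illustrative assumptions, and the outlier detection and feature selection stages are omitted. It is not expected to reproduce the reported accuracy.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def fit_diabetes_style_pipeline(X, y, seed=0):
    """Split the data, then tune a scaled random forest with K-fold CV.

    Hypothetical helper: the split ratio, fold count and grid are assumptions.
    """
    # Data splitting: hold out 20% of the rows for a final check.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    # Standardization lives inside the pipeline so that each CV fold
    # fits the scaler on its own training portion only.
    pipeline = Pipeline([
        ("scale", StandardScaler()),
        ("clf", RandomForestClassifier(random_state=seed)),
    ])
    # Small illustrative grid; the source does not state the values searched.
    param_grid = {"clf__n_estimators": [50, 100], "clf__max_depth": [None, 5]}
    # Stratified 5-fold cross-validation (assumed variant of K-fold).
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    search = GridSearchCV(pipeline, param_grid, cv=cv, scoring="accuracy")
    search.fit(X_train, y_train)
    # Accuracy of the refitted best model on the held-out rows.
    test_accuracy = search.score(X_test, y_test)
    return search, test_accuracy


# Synthetic placeholder data standing in for a labelled diabetes dataset.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=5, random_state=0
)
search, test_accuracy = fit_diabetes_style_pipeline(X, y)
print(search.best_params_, round(test_accuracy, 3))
```

Placing the scaler inside the pipeline, rather than scaling the full dataset up front, keeps statistics from the validation folds out of training, which matters when accuracy figures are compared across classifiers.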