ABSTRACT

Cardiovascular disease (CVD) is the world’s leading cause of mortality, so diagnosing it as early as possible is paramount to preventing death. Routine ECG testing is expensive and impractical for the general population. This research examines the efficacy of different machine learning algorithms for cardiovascular disease prediction. It evaluates the best available models, provides a comparative analysis of these techniques, and offers a strong framework for future research in the field. The proposed approach applies data preprocessing and data transformation methods to generate reliable data for the machine learning models. A combined dataset was used, drawn from the Statlog, Long Beach VA, Switzerland, Hungarian, and Cleveland datasets. Feature extraction is performed using principal component analysis. Multilayer perceptron, random forest, gradient boosting, extra-trees, logistic regression, adaptive boosting, and decision tree classifiers are used to predict CVD. The models are evaluated using k-fold cross-validation, and the accuracy, precision, recall, and F1 score of each model are calculated. The analysis concludes that the random forest model combined with principal component analysis produced the highest accuracy, reaching 100% on one particular fold of the k-fold cross-validation.
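
The evaluation protocol described above (k-fold splitting followed by per-fold accuracy, precision, recall, and F1 score) can be illustrated with a minimal sketch. This is not the implementation used in this research: the function names `k_fold_indices` and `binary_metrics` are hypothetical, the labels are made-up toy values, and the predictions are fixed by hand rather than produced by a trained model. In practice, a classifier would be fitted on each fold's training indices and scored on its test indices.

```python
from typing import Dict, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n_samples-1 into k contiguous (train, test) folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    base, extra = divmod(n_samples, k)
    folds = []
    start = 0
    for i in range(k):
        # The first `extra` folds receive one additional sample.
        size = base + (1 if i < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 for labels in {0, 1} (1 = CVD present)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a fold has no positive predictions or labels.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy data (assumption): hand-written labels and predictions, not study results.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

fold_accuracies = []
for fold_number, (_, test_idx) in enumerate(k_fold_indices(len(y_true), k=5), start=1):
    scores = binary_metrics([y_true[i] for i in test_idx], [y_pred[i] for i in test_idx])
    fold_accuracies.append(scores["accuracy"])
    print(f"fold {fold_number}: {scores}")

print("mean accuracy:", sum(fold_accuracies) / len(fold_accuracies))
```

Because each fold is scored separately, the accuracy on an individual fold can differ from the mean across folds, so it helps to state which of the two a reported figure refers to.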