ABSTRACT

Missing data is a common phenomenon in many research areas and can significantly affect the conclusions drawn from the data. Existing imputation techniques replace missing entries with plausible values, using either deterministic or random methods. In this work, we propose a new approach, the weighted cluster softmax technique, which generalizes K-Means clustering to handle multivariate missing value imputation. We test the algorithm on multiple open-source data sets from the UCI Machine Learning Repository and observe that the proposed method performs better than both simple mean imputation and the existing K-Means clustering-based method.