Model Building with Big Complete and Incomplete Data | 23 | v3

ABSTRACT

Statisticians cannot avoid missing data. Almost every dataset has some missing data, which raises the question of how to handle them. All big datasets have large amounts of missing data, which raises greater concern. Traditional data-based methods, which predate big data, are known to be problematic with virtually all datasets, and their effectiveness on big data is unknown, which adds to the concern. This chapter presents a new data-based approach to model building with big complete and incomplete data, in the face of the known ineffectiveness of traditional data-based methods. It illustrates the approach with a small-dataset study for ease of presentation, which gives evidence that the proposed procedure is viable for datasets of all sizes. The chapter argues that a data-based method using principal component analysis (PCA) can serve for big complete and incomplete data, in contrast to complete case analysis (CCA), which is widely recognized as faulty when the missing at random (MAR) and missing completely at random (MCAR) assumptions are not satisfied.