ABSTRACT

Creating an efficient and appropriate data collection form and database is an important component of the research process, and where possible, the collaborating biostatistician should be involved. How the questions and instruments are designed influences the kind of data that are keyed in and, ultimately, analysed and interpreted. As the saying goes, ‘garbage in, garbage out’: it is often very difficult to fix a problem once the data have been collected and keyed into the dataset. Data entry errors, for instance, are quite prevalent. Clark and Mulligan (2011) reported on a study that evaluated the frequency and characteristics of data entry errors in large clinical databases. Error rates ranged between 2.3% and 26.9%, and the errors were not just mistakes in data entry: many occurred in non-random clusters that could potentially affect the outcome of the studies.