ABSTRACT

It is necessary to bring the data to one place to form a particular number of instances. This phase is commonly known as data collection. The authors, however, are concerned not only with collecting the data but also with ensuring that it is clean and noise-free, because this data serves as the input for the machine learning model. The way data is gathered may vary from person to person according to profession. Data collected in an Internet of Things (IoT) file is large in volume and comes in different formats, such as the BDB format and graphic formats. Collecting data from various sources poses a number of challenges. The data collected from the sources is preprocessed by maintaining the file format and handling the missing values. As a result, the data contains accurate values, no missing values, and aligned attributes.