ABSTRACT

The evolution of the Internet of Things (IoT) marks a new age of technology, and organizations that aim to contribute to this area will have to adapt their methods to new data forms and data sources. These modifications are only the start: as the IoT develops and businesses expand with it, they will face many more challenges. Raw IoT data is collected from many different sensors, which leads to problems such as noisy, heterogeneous, and massive data. Our proposed system aims to solve these problems of IoT data. Its architecture consists of two major stages: data pre-processing and data processing. In the pre-processing stage, we use KNN to clean noisy data and to replace missing values with the most probable value. Singular value decomposition (SVD) is used to reduce the data and thereby save time. Mutual information is used to detect relationships within the data and to support semantic clustering, with the goal of achieving high accuracy and reducing running time. We compared several techniques, namely KM, OPTICS, EM, and DBSCAN, with the proposed techniques FCM-DBSCAN and KMeans-DBSCAN. We found that FCM-DBSCAN achieved high accuracy across the different data-reduction approaches, and that FCM-DBSCAN with SVD achieved the highest accuracy and the shortest retrieval time. KMeans-DBSCAN also has a short data retrieval time, but its accuracy is lower than that of FCM-DBSCAN. The proposed FCM-DBSCAN technique is implemented on both MapReduce and Spark, and it runs faster on Spark than on MapReduce.
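The three pre-processing steps named above (KNN imputation, SVD reduction, and mutual information) can be illustrated with a minimal NumPy sketch. It assumes numeric sensor readings stored row-wise with NaN marking missing values. The function names `knn_impute`, `svd_reduce`, and `mutual_information`, the parameters `k`, `r`, and `bins`, and the sample readings are all hypothetical and chosen for illustration. Filling a gap with the mean of the k nearest neighbours and estimating mutual information from a histogram are common choices, but they are not necessarily the exact variants used in the proposed system.

```python
import numpy as np

def knn_impute(X, k=3):
    """Fill each NaN with the mean of that feature over the k nearest rows.

    Distance between two rows is the RMS difference over the features both
    rows observe. Assumption: features are on comparable scales (in practice
    they would be normalized first).
    """
    X = np.asarray(X, dtype=float)
    out = X.copy()
    n = X.shape[0]
    for i in range(n):
        missing = np.isnan(X[i])
        if not missing.any():
            continue
        dists = []
        for j in range(n):
            if j == i:
                continue
            shared = ~np.isnan(X[i]) & ~np.isnan(X[j])
            if shared.any():
                d = np.sqrt(np.mean((X[i, shared] - X[j, shared]) ** 2))
                dists.append((d, j))
        dists.sort()
        for f in np.where(missing)[0]:
            # k nearest neighbours that actually observed feature f
            vals = [X[j, f] for _, j in dists if not np.isnan(X[j, f])][:k]
            if vals:
                out[i, f] = np.mean(vals)
    return out

def svd_reduce(X, r):
    """Project mean-centred data onto its top-r right singular vectors."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :r] * s[:r]  # equals Xc @ Vt[:r].T, shape (n, r)

def mutual_information(x, y, bins=4):
    """Histogram estimate of I(x; y) in nats."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px * py)[nz])))

# Illustrative readings: temperature, humidity, pressure (hypothetical values)
readings = np.array([
    [20.1, 45.0, 1012.0],
    [20.3, np.nan, 1011.8],
    [19.9, 44.2, 1012.3],
    [25.0, 60.1, 1005.0],
    [np.nan, 59.8, 1005.4],
    [24.8, 61.0, 1004.9],
])
clean = knn_impute(readings, k=2)
reduced = svd_reduce(clean, r=2)
mi_temp_hum = mutual_information(clean[:, 0], clean[:, 1])
```

In this toy data the missing humidity in row 1 is filled from its two nearest rows (45.0 and 44.2), giving 44.6. The reduced matrix keeps two components per reading, and this is the kind of smaller input that a clustering stage such as FCM-DBSCAN would then consume.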