ABSTRACT

This chapter introduces readers to recent techniques for large-scale data collection. In the data science field, stream data generated by various sensors are analyzed to extract information. Because a larger amount of data can lead to higher-quality information, fast stream data collection is one of the field's main techniques, and various schemes have been proposed. However, existing techniques do not assume that data are collected periodically at several different intervals at the same time. We therefore define continuous sensor data with different intervals (cycles) as a sensor data stream and have proposed collection methods for distributed sensor data streams, realized as a topic-based pub/sub (TBPS) system. We evaluated the proposed method in simulation and confirmed that it can realize highly scalable systems for periodically collecting distributed sensor data. The scalability of the data collection system is important both for accommodating a huge number of objects and for encouraging the growth of the data science field.