ABSTRACT

Most sensor network applications aim at monitoring the spatiotemporal evolution of physical quantities, such as temperature, light, or chemicals, in an environment. In these applications, low-resource sensor nodes are deployed and programmed to collect measurements at a predefined sampling frequency. Measurements are then routed out of the network to a node with higher resources, commonly referred to as the base station or sink, where the spatiotemporal evolution of the quantities of interest can be monitored.

In many cases, the collected measurements exhibit high spatiotemporal correlations and follow predictable trends or patterns. An efficient way to optimize the data collection process in these settings is to rely on machine learning techniques, which can be used to model and predict the spatiotemporal evolution of the monitored phenomenon.

This chapter surveys the learning approaches that have recently been investigated for reducing the amount of communication in sensor networks. We classify these approaches into three groups, namely, model-driven data acquisition, replicated models (RM), and aggregative approaches.

In model-driven approaches, the network is partitioned into two subsets, one of which is used to predict the measurements of the other. The subset selection process is carried out at the base station, together with the computation of the models. Because the procedure is centralized, these approaches can produce both spatial and temporal models. Model-driven techniques can provide high energy savings, since part of the network can remain in an idle mode. Their accuracy, however, depends strongly on how well the model fits the sensor data. We present these approaches in Section 7.2.
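As an illustration of the model-driven idea, the following minimal sketch fits a model at the base station that predicts the reading of an idle sensor from the reading of an active one. The single-predictor linear model, the function name `fit_line`, and the historical readings are all assumptions made for this example; they are not taken from the approaches surveyed in the chapter.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical training data collected while both sensors were awake.
history_active = [18.0, 19.0, 20.0, 21.0, 22.0]
history_idle = [19.0, 20.0, 21.0, 22.0, 23.0]

# Model computed centrally, at the base station.
slope, intercept = fit_line(history_active, history_idle)

# Later, only the active sensor is queried; the idle one is predicted.
new_active_reading = 20.5
predicted_idle = slope * new_active_reading + intercept
print(predicted_idle)
```

In this sketch the idle sensor sends nothing after the training phase, which is where the energy savings come from. Nothing checks the prediction against the true measurement, so the accuracy depends entirely on the fit of the model.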

RM approaches encompass a set of techniques in which identical prediction models are run in the network and at the base station. The models are used at the base station to estimate the measurements of sensor nodes, and in the network to check that the predictions of the models are within a user-defined error threshold ϵ of the true measurements. A key advantage of these techniques is that they guarantee that the approximations provided by the models stay within the strict error threshold ϵ. We review these techniques in Section 7.3.
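The replicated-model mechanism can be sketched with the simplest possible shared model, a constant model that predicts the last transmitted value. The function name `replicated_constant_model`, the choice of a constant model, and the sample readings are assumptions made for illustration only.

```python
def replicated_constant_model(measurements, eps):
    """Simulate a sensor node and a base station sharing a constant model.

    The shared model state is the last transmitted value. The node sends
    an update only when its measurement deviates from the model
    prediction by more than eps.
    """
    sink_view = []   # values reconstructed at the base station
    sent = 0         # number of updates transmitted by the node
    model = None     # shared model state (identical on both sides)
    for x in measurements:
        if model is None or abs(x - model) > eps:
            model = x    # node transmits; both model copies are updated
            sent += 1
        sink_view.append(model)
    return sink_view, sent


readings = [20.0, 20.2, 20.4, 21.1, 21.3, 21.2, 19.9]
approx, sent = replicated_constant_model(readings, eps=0.5)
print(approx, sent)
```

With these hypothetical readings, three of the seven measurements are transmitted, and every value seen at the base station is within ϵ = 0.5 of the true measurement.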

Aggregative approaches reduce the amount of communication by combining data within the network and provide, to a certain extent, a mixture of the characteristics of model-driven and RM approaches. They rely on the ability of the network routing structure to aggregate information of interest on the fly, as the data are routed to the base station. As a result, the base station receives aggregated data that compactly summarize the sensor measurements. The way data are aggregated depends on the model designed at the base station, and in this sense these approaches are model driven. The resulting aggregates may, however, be communicated to all sensors in the network, allowing them to check the approximations against their actual measurements, as in RM approaches. We discuss aggregative approaches in Section 7.4.
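In-network aggregation along a routing tree can be sketched as follows, using the network average as the aggregate. The average, the tree layout, the node identifiers, and the readings are assumptions chosen for this example; the aggregates used by the surveyed approaches depend on the model designed at the base station.

```python
def aggregate(tree, readings, node):
    """Return the partial aggregate (sum, count) for the subtree at `node`.

    Each node combines its own reading with the partial aggregates
    received from its children, then forwards a single pair upward.
    """
    total, count = readings[node], 1
    for child in tree.get(node, []):
        child_sum, child_count = aggregate(tree, readings, child)
        total += child_sum
        count += child_count
    return total, count


# Hypothetical routing tree: node 1 is the root, closest to the base station.
tree = {1: [2, 3], 2: [4, 5]}
readings = {1: 20.0, 2: 21.0, 3: 19.0, 4: 22.0, 5: 18.0}

total, count = aggregate(tree, readings, 1)
average = total / count
print(average)
```

Each node transmits one (sum, count) pair regardless of the size of its subtree, so the base station obtains the average without receiving the individual measurements.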