ABSTRACT

In situ measurement implies that the instrument sensors are physically located in the environment they monitor. These sensors collect time-series data that flow continuously from the sensor to the data repository, creating a data stream. Typically, these sensors operate under harsh conditions, and the data they collect must be transmitted across various types of data communication networks; thus, the data can easily become corrupted through faults in the sensor or in data transmission. Undetected erroneous data can significantly reduce the value of the collected data for applications. This is especially critical when the data are used for real-time forecasting, where there is limited time to verify data quality. For this reason, robust and scientifically sound methods for detecting erroneous data before they are archived are necessary. Because of the vast quantity of data being collected, these methods must be automated to be practical. Since it is often difficult to determine whether an anomalous measurement is due to a sensor or data transmission fault or to an unusual environmental system response, many fault detection techniques seek to identify anomalous measurements: measurements that do not fit the historical pattern of the data but are not necessarily caused by sensor or data transmission faults. The causes of these measurements can then be investigated to determine whether the measurement actually represents the environmental system state.