ABSTRACT

Surveillance systems collect large amounts of data from an array of sensors covering visual, sound, infrared, vibration, and other sensing modalities. Reviewing, analyzing, and understanding these data effectively and efficiently is beyond human capacity. Therefore, in the “Big Data” era, it is crucial for surveillance systems to invest in intelligent data processing that automates many layers of their processing models. In this chapter, three feature extraction techniques for sound signal characterization are presented: Fast Fourier Transform (FFT), Linear Predictive Coding (LPC), and Statistical-based Characterization (SBC). To compare these methods, sound signals are collected via a signal frame grabber. Each signal frame is denoised and normalized, features are extracted using FFT, LPC, and SBC, an Artificial Neural Network (ANN) is trained on the resulting feature vector space, and the ANN model is finally tested for classification accuracy. The three methods were tested using 750 sound events of interest: noise, talking, pounding, screaming, and a mix of all. The FFT technique was superior to the others, with an overall accuracy of 93.8%.
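
As a rough illustration of the normalization and feature extraction stages described above, the following minimal sketch processes one synthetic signal frame. It is not the chapter's implementation: the peak normalization, the choice of summary statistics, the frame length, and all function names are assumptions made for this example. The spectrum is computed with a naive discrete Fourier transform, which yields the same magnitudes an FFT would, only more slowly. Denoising, LPC, and the ANN stage are omitted.

```python
import cmath
import math
import statistics


def normalize(frame):
    """Scale a frame to unit peak amplitude (one plausible normalization; assumed)."""
    peak = max(abs(x) for x in frame)
    return [x / peak for x in frame] if peak > 0 else list(frame)


def dft_magnitudes(frame):
    """Magnitude spectrum for bins 0..n/2 via a naive O(n^2) DFT.

    An FFT computes the same values with fewer operations.
    """
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def statistical_features(frame):
    """Mean, standard deviation, and zero-crossing rate.

    These are illustrative stand-ins; the statistics used by SBC are not
    specified in the abstract.
    """
    mean = statistics.fmean(frame)
    std = statistics.pstdev(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return [mean, std, crossings / (len(frame) - 1)]


# Synthetic frame (hypothetical data): a sinusoid with 4 cycles over 32 samples.
frame = [0.5 * math.sin(2 * math.pi * 4 * t / 32) for t in range(32)]

norm = normalize(frame)
spectrum = dft_magnitudes(norm)          # spectral feature vector
stats = statistical_features(norm)       # statistical feature vector
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
```

In a pipeline like the one described, a vector such as `spectrum` or `stats` would be computed per frame and passed to the classifier as its input features.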