Audio Feature Extraction

doi:10.1201/b11041-8

ABSTRACT

The automatic analysis of music stored as a digital audio signal requires a sophisticated process of distilling information. For example, a three-minute song stored as uncompressed digital audio is represented by a sequence of almost 16 million numbers (3 [minutes] * 60 [seconds per minute] * 2 [stereo channels] * 44100 [samples per second]). In the case of tempo induction, these roughly 16 million numbers must somehow be reduced to a single numerical estimate of the tempo of the piece.
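The sample count quoted above can be checked with a few lines of Python. This is only a minimal sketch of the arithmetic; the function name `sample_count` and its parameter names are illustrative assumptions, not part of the chapter.

```python
def sample_count(minutes, sample_rate_hz=44100, channels=2):
    """Return the number of sample values in uncompressed audio.

    Assumes a constant sampling rate and one number per channel per
    sampling instant (hypothetical helper, for illustration only).
    """
    seconds = minutes * 60
    return int(seconds * sample_rate_hz * channels)


# Three minutes of stereo audio sampled at 44100 Hz.
total = sample_count(3)
print(total)  # 15876000, i.e. almost 16 million numbers
```

Tempo induction then amounts to mapping this entire sequence to one value, such as a figure in beats per minute.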