ABSTRACT

Automatic speech scoring systems depend on accurate Automatic Speech Recognition (ASR) for optimal performance. To describe the strengths and limitations of these systems accurately, this chapter therefore presents information about this crucial component of their pipeline. ASR is the computational process of converting the speech signal in a digital audio recording into text output. It is a multidisciplinary research area that draws on electrical engineering, computer science, mathematics, and linguistics. Research on ASR began in the late 1940s with support from the US Department of Defense, and in 1952 the first spoken digit recognizer for a single speaker was built at Bell Laboratories. The speech signal captured by a microphone is converted into digital form by an analog-to-digital converter, which samples the signal at regular intervals and quantizes each sample to a discrete value. The result is stored as a waveform, a representation of how the signal's amplitude changes over time.
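The analog-to-digital step can be illustrated with a minimal Python sketch. It is a simulation under stated assumptions, not a model of any particular converter. A hypothetical `analog_signal` function (a 440 Hz tone) stands in for the microphone voltage. The 16 kHz sampling rate and 16-bit depth are assumed values commonly used for speech, not figures from this chapter. The sketch shows the two operations an analog-to-digital converter performs: sampling at regular time intervals, and quantizing each sample to an integer level. The resulting list of integers is the waveform.

```python
import math
from typing import Callable


def analog_signal(t: float) -> float:
    """Hypothetical stand-in for a microphone voltage: a 440 Hz tone
    with peak amplitude 0.5, on a normalized scale of -1.0 to 1.0."""
    return 0.5 * math.sin(2 * math.pi * 440.0 * t)


def digitize(signal: Callable[[float], float], sample_rate: int,
             duration: float, bit_depth: int = 16) -> list[int]:
    """Sample `signal` every 1/sample_rate seconds and quantize each
    sample to a signed integer with `bit_depth` bits (linear PCM)."""
    max_int = 2 ** (bit_depth - 1) - 1  # 32767 for 16-bit audio
    num_samples = int(round(sample_rate * duration))
    waveform = []
    for n in range(num_samples):
        t = n / sample_rate                       # sampling: discrete time steps
        x = max(-1.0, min(1.0, signal(t)))        # clip to the converter's input range
        waveform.append(int(round(x * max_int)))  # quantization: map to an integer level
    return waveform


# 10 ms of audio at an assumed 16 kHz sampling rate.
wave = digitize(analog_signal, sample_rate=16000, duration=0.01)
print(len(wave))  # 160
print(wave[:4])   # first few integer samples of the waveform
```

Raising `sample_rate` allows higher frequencies to be represented (up to half the sampling rate), while raising `bit_depth` reduces quantization error.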