The viability of an MER system largely depends on the accuracy of its emotion recognition. However, due to the semantic gap between low-level objective features and the human cognitive level of emotion perception, it is difficult for a machine to accurately compute emotion values, especially valence values. Consequently, many efforts have been made to incorporate mid-level features of music into MER. For example, Schuller et al. incorporated genre, ballroom dance style, chord progression, and lyrics in their MER system and found that many of these features contribute positively to prediction accuracy [286-288]. Similar observations have been made by many other researchers [60, 139, 183, 207]. The following three chapters describe how such mid-level features, including lyrics, chord progression, and genre metadata, can be utilized to improve MER. For simplicity, we focus on categorical MER in these chapters. We begin in this chapter with the use of text features extracted from lyrics.
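To make the idea of lyric-based text features concrete, the following is a minimal sketch of categorical emotion classification from lyrics using bag-of-words TF-IDF vectors and nearest-neighbor matching by cosine similarity. The toy lyric excerpts, the emotion labels, and the one-nearest-neighbor classifier are all illustrative assumptions for exposition; they are not the method or data discussed in this chapter.

```python
import math
from collections import Counter

# Toy corpus: lyric excerpts paired with categorical emotion labels.
# Both the excerpts and the labels are illustrative, not from any real dataset.
corpus = [
    ("shine bright happy day sunshine smile", "happy"),
    ("dance tonight party lights joy", "happy"),
    ("tears fall alone cold rain goodbye", "sad"),
    ("broken heart lonely night cry", "sad"),
]

docs = [text.split() for text, _ in corpus]
labels = [label for _, label in corpus]

def tf_idf(token_lists):
    """Return one TF-IDF vector (a dict term -> weight) per token list."""
    n = len(token_lists)
    df = Counter(t for tokens in token_lists for t in set(tokens))
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        vectors.append({t: (c / len(tokens)) * math.log(n / df[t])
                        for t, c in tf.items()})
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def classify(query):
    """Label a new lyric excerpt by its nearest training excerpt
    in TF-IDF space (1-nearest-neighbor)."""
    vectors = tf_idf(docs + [query.split()])
    q = vectors[-1]
    sims = [cosine(q, v) for v in vectors[:-1]]
    return labels[max(range(len(sims)), key=sims.__getitem__)]

print(classify("lonely tears cry"))   # -> sad
print(classify("party lights dance")) # -> happy
```

In practice, a categorical MER system would replace the toy corpus with a labeled lyric dataset and the nearest-neighbor rule with a trained classifier, but the feature-extraction step (tokenize, weight by TF-IDF) follows this same pattern.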