ABSTRACT

In a smart city, emotion recognition technology can be used to improve the efficiency and effectiveness of various public services. For example, it can be integrated into public safety systems to detect and respond to potentially dangerous situations, or used to monitor the emotional state of citizens so that targeted mental health resources can be provided. In this chapter, an ensemble learning model is developed to classify emotions from both speech and facial expressions, with the goal of enabling more natural and intuitive human-machine interaction. The model was trained on the RAVDESS dataset using a neural network-based approach, with an MLPClassifier for speech emotion recognition and a transfer-learning-based Convolutional Neural Network (CNN) for facial expression recognition. Future improvements could include expanding the dataset to cover more diverse emotions and actors, as well as exploring transformer-based models to capture temporal information.
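
The sketch below illustrates how such a two-branch ensemble might be assembled. It is not the chapter's implementation: the MFCC speech features, the VGG16 backbone, the input size, and the probability-averaging fusion rule are all illustrative assumptions, since the abstract names only an MLPClassifier for the speech branch and a transfer-learning CNN for the facial branch.

```python
# Minimal sketch of a speech + face emotion ensemble (assumptions noted above).
import numpy as np
import librosa
from sklearn.neural_network import MLPClassifier
from tensorflow.keras.applications import VGG16
from tensorflow.keras import layers, models

# The eight RAVDESS emotion classes.
EMOTIONS = ["neutral", "calm", "happy", "sad",
            "angry", "fearful", "disgust", "surprised"]

def speech_features(wav_path):
    """Mean MFCC vector for one utterance (assumed feature choice)."""
    y, sr = librosa.load(wav_path, sr=22050)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40)
    return mfcc.mean(axis=1)

# Speech branch: scikit-learn MLPClassifier, as named in the abstract.
speech_clf = MLPClassifier(hidden_layer_sizes=(256,), max_iter=500)

# Facial branch: transfer learning on a frozen ImageNet backbone
# (VGG16 is an assumed choice; the abstract only says CNN).
base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False
face_model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(128, activation="relu"),
    layers.Dense(len(EMOTIONS), activation="softmax"),
])
face_model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

def ensemble_predict(wav_path, face_batch):
    """Fuse both branches by averaging their class probabilities.

    Assumes speech_clf has been fitted and face_model trained;
    face_batch is an array of preprocessed (224, 224, 3) frames.
    """
    p_speech = speech_clf.predict_proba([speech_features(wav_path)])[0]
    p_face = face_model.predict(face_batch, verbose=0).mean(axis=0)
    return EMOTIONS[int(np.argmax((p_speech + p_face) / 2))]
```

Averaging probabilities is only one possible fusion rule; weighted averaging or a learned meta-classifier over the two branches' outputs are common alternatives in ensemble learning.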