ABSTRACT

Emotion recognition is one of the areas that has advanced most with artificial intelligence. However, building models that capture the nuances of natural language and speech remains a complex task. The challenge is greater for the elderly, a group more predisposed to physiological, psychological, and even social problems. With demographic growth and population aging, it is critical to develop systems that support the quality of life of these people. In this chapter, we propose both classical and less conventional approaches to recognizing emotions in elderly people from speech, using the public RAVDESS database. These models are intended for use in human-machine interfaces that support the therapists and physicians who care for these patients. Initially, we used a CNN architecture with the log-mel spectrogram as the input feature. Although its accuracy did not exceed 61%, this experiment served as a starting point for the subsequent models. In the second experiment, we applied the wavelet transform and converted the sound signals into pseudocolor images. From these images, a pre-trained ResNet network extracted 2048 features. We then applied the particle swarm optimization (PSO) algorithm, which selected the 410 most influential of the features extracted by the deep network in the previous step. To investigate the effect of PSO on the architecture, we used both resulting sub-bases (one with 2048 features and one with 410). These two sub-bases were used to train and test the following intelligent classifiers: Bayesian Network, Naive Bayes, J48 decision tree, Random Tree, Random Forest, and Support Vector Machines (SVM). The final results were compared with one another across several metrics. The SVM with an RBF kernel and γ = 0.5 showed great potential. With transfer-learning-based preprocessing, the best model reached an accuracy of 81.1%.
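The PSO feature-selection step can be illustrated with a minimal binary PSO sketch. This is an illustration under stated assumptions, not the chapter's implementation. The abstract does not give the PSO variant, swarm size, coefficients, or fitness function, so the function name `binary_pso_select`, its default parameters, and the sigmoid-transfer update (the classic binary PSO of Kennedy and Eberhart) are all hypothetical choices. In practice the fitness would presumably score a classifier trained on the selected subset of the 2048 ResNet features. Here it is any caller-supplied function that maps a 0/1 mask to a score to be maximized.

```python
import math
import random
from typing import Callable, List, Sequence


def binary_pso_select(
    n_features: int,
    fitness: Callable[[Sequence[int]], float],
    n_particles: int = 20,
    n_iters: int = 50,
    w: float = 0.7,       # inertia weight (hypothetical value)
    c1: float = 1.5,      # cognitive coefficient (hypothetical value)
    c2: float = 1.5,      # social coefficient (hypothetical value)
    v_max: float = 4.0,   # velocity clamp keeps sigmoid probabilities away from 0/1
    seed: int = 0,
) -> List[int]:
    """Hypothetical sketch: return the best 0/1 feature mask found by binary PSO.

    Each particle is a bit mask over the features (1 = feature kept).
    The parameters are illustrative, not the chapter's settings.
    """
    rng = random.Random(seed)

    def sigmoid(v: float) -> float:
        return 1.0 / (1.0 + math.exp(-v))

    # Random initial masks and zero velocities.
    pos = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]

    # Personal bests and the global best (the best of all personal bests).
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                v = max(-v_max, min(v_max, v))
                vel[i][d] = v
                # Sigmoid transfer: the velocity sets the probability that bit d is 1.
                pos[i][d] = 1 if rng.random() < sigmoid(v) else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                # gbest_fit >= every pbest_fit, so it can only improve here.
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest
```

For a quick check, a toy fitness such as the negative Hamming distance to a known target mask can stand in for a classifier score. Counting the ones in the returned mask gives the number of selected features, which corresponds to the 410 retained features in the chapter's experiment.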