The utilization of deep architectures for generation of literary and musical compositions: A survey

doi:10.1201/9781003559092-69

ABSTRACT

This survey explores the impact of generative artificial intelligence (AI) on creative expression, focusing on lyric, rap, and poetry generation, singing voice synthesis, and music composition, and examines systems that use deep neural network architectures to generate musical and literary works. For lyric generation, it discusses the success of LSTM networks in capturing genre-specific linguistic features. For rap composition, it examines LSTM-based models, prediction models, and Transformer-based autoregressive models. For poetry generation, it highlights Hafez and its three-step process: searching for related rhyme words, constructing a finite-state acceptor (FSA), and generating the poem with an RNN. The voice synthesis studies covered include DiffSinger, GAN-based singing voice synthesis, and deep autoregressive networks. The survey also covers MuseGAN and Learn2Sing, which address multi-track polyphonic music generation and individualized singing voice synthesis, respectively. A section on data and model training outlines the data sources and preprocessing techniques used by the different generation models. Beyond RNNs and LSTMs, alternatives such as Markov models, VAEs, HRED, GRUs, Transformer architectures, and GAN variants are examined for their potential to improve the coherence, fluency, and possibly the creativity of generated content. Overall, this work emphasizes the diversity of the methodologies employed and suggests avenues for future research on generating structured poetic and artistic content.