A Knowledge-enhanced Multi-modal Large Language Model for Chinese Guqin Subtractive Notation Interpretation

doi:10.4324/9781003666530-12

Chapter

ABSTRACT

The Guqin is one of the oldest traditional Chinese musical instruments, known for its refined sound and deep cultural symbolism, and it is a typical multimodal cultural memory resource. The development of multimodal large language models (MLLMs) offers new approaches to knowledge services for such resources. However, for the knowledge representation of certain specialised cultural memory resources, such as the text, image, audio, and video resources related to Guqin Subtractive Notation, existing MLLMs need further optimisation to achieve the expected results. This study uses multimodal Guqin Subtractive Notation resources as training data and combines them with a Knowledge Graph of Guqin Subtractive Notation to explore a vertical application path for MLLMs in the field of cultural heritage. Its ultimate goal is to use knowledge-enhanced MLLMs and Generative AI technologies to help more people understand Guqin Subtractive Notation. The practical construction of the interpretation application scenario in this study demonstrates that LLMs perform better on discriminative problems; for generative problems in natural language interaction, combining Knowledge Graph technology with formalised knowledge produced by human experts can significantly enhance the accuracy, reliability, and professionalism of LLMs in vertical tasks.