Graph-based Extractive Approach for English and Hindi Text Summarization

doi:10.1201/9781003272649-3

Chapter

ABSTRACT

In the current era, the exponential growth of the Internet has made an enormous number of documents available. Manually summarizing such large collections of documents has become a difficult and challenging task, which has motivated extensive research in the field of text summarization. Text summarization is a process that shortens large pieces of text while preserving the key information in their content. It has a large impact on users’ lives because it conveys relevant and important information to them. This chapter introduces an extractive approach to text summarization. The approach uses graph-based algorithms in which cosine similarity and the term frequency-inverse document frequency (TF-IDF) technique are used to generate the summarized text while keeping the relevant and meaningful information intact.

The proposed technique is implemented for English and Hindi text, and the results are shown as summarized text in both languages. The chapter treats multilingual natural language processing as a combination of computer science, linguistics, and mathematics, with English and Hindi selected for experimental purposes.
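
The pipeline outlined in the abstract (TF-IDF sentence vectors, cosine similarity as edge weights, graph-based ranking) can be illustrated with a minimal sketch. This is not the chapter's implementation: the abstract does not name the ranking algorithm, so a PageRank-style power iteration (as in TextRank or LexRank) is assumed here, and the function names, the smoothed IDF formula, the damping factor of 0.85, and the whitespace tokenizer are all illustrative choices. Sentences are split on `.`, `!`, `?`, and the Devanagari danda (`।`), so the same code accepts both English and Hindi input.

```python
import math
import re
from collections import Counter

PUNCT = ".,!?;:\"'()\u0964"  # includes the Devanagari danda (assumed set)

def split_sentences(text):
    # Split after ., !, ? or the danda when followed by whitespace.
    parts = re.split(r"(?<=[.!?\u0964])\s+", text.strip())
    return [p for p in parts if p]

def tokenize(sentence):
    # Whitespace tokenization keeps Devanagari words (with vowel signs) intact.
    words = (w.strip(PUNCT) for w in sentence.lower().split())
    return [w for w in words if w]

def tfidf_vectors(token_lists):
    # Each sentence is treated as a "document"; the IDF is smoothed (assumption).
    n = len(token_lists)
    df = Counter(t for tokens in token_lists for t in set(tokens))
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        vectors.append({
            t: (c / len(tokens)) * (math.log((1 + n) / (1 + df[t])) + 1)
            for t, c in tf.items()
        })
    return vectors

def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_sentences(sim, damping=0.85, iterations=50):
    # PageRank-style power iteration on the weighted similarity graph.
    n = len(sim)
    scores = [1.0 / n] * n
    out_weight = [sum(row) for row in sim]
    for _ in range(iterations):
        scores = [
            (1 - damping) / n + damping * sum(
                sim[j][i] / out_weight[j] * scores[j]
                for j in range(n) if out_weight[j] > 0
            )
            for i in range(n)
        ]
    return scores

def summarize(text, k=2):
    # Return the k highest-ranked sentences in their original order.
    sentences = split_sentences(text)
    if len(sentences) <= k:
        return sentences
    vectors = tfidf_vectors([tokenize(s) for s in sentences])
    n = len(sentences)
    sim = [[cosine(vectors[i], vectors[j]) if i != j else 0.0
            for j in range(n)] for i in range(n)]
    scores = rank_sentences(sim)
    top = sorted(sorted(range(n), key=lambda i: scores[i], reverse=True)[:k])
    return [sentences[i] for i in top]
```

Calling `summarize(document_text, k=3)` on a plain-text English or Hindi document returns three sentences taken verbatim from the input, which is what makes the approach extractive rather than abstractive. A fuller system would likely add language-specific stop-word removal and stemming, which this sketch omits.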