Graph-based Text Document Extractive Summarization

doi:10.1201/9781003272649-12

ABSTRACT

Text document summarization is an emerging application of natural language processing and information retrieval. Extracting representative or condensed content from a text document reduces the time end-users must spend reading material such as news articles and blogs. Summaries can be generated from a single document or from multiple documents.

Extractive summarization selects the important sentences from documents, whereas abstractive summarization generates a summary based on the important terms in documents. The two approaches pose different challenges, so the choice between them depends on the application.

This chapter focuses on how graph theory can be used to extract summaries of text documents. It provides the background to the text summarization problem and its types, advanced graph types, and methods for extracting summaries using graphs. Sentences, their order, and the order of the terms within them can be analyzed using graph theory to produce summaries. Graph types such as basic graphs, bipartite graphs, hypergraphs, and semigraphs have been explored by researchers for extracting document summaries.
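
As a rough illustration of the basic-graph case, the sketch below treats each sentence as a node, weights each edge by the cosine similarity of the two sentences' word counts, and keeps the sentences with the highest weighted degree. It is only a minimal sketch under stated assumptions, not the chapter's method: the function names (`split_sentences`, `bag_of_words`, `cosine`, `summarize`), the regex-based sentence splitter, and the choice of weighted degree as the ranking score are all hypothetical choices made for this example.

```python
import re
from collections import Counter
from math import sqrt


def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def bag_of_words(sentence):
    # Lowercased term counts for one sentence.
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a, b):
    # Cosine similarity between two term-count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def summarize(text, k=2):
    sentences = split_sentences(text)
    bags = [bag_of_words(s) for s in sentences]
    n = len(sentences)
    # Weighted adjacency matrix of an undirected sentence graph (no self-loops).
    weights = [
        [cosine(bags[i], bags[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    # Score each node by its weighted degree (sum of incident edge weights).
    scores = [sum(row) for row in weights]
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    # Return the selected sentences in their original document order.
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    sample = (
        "Graphs model sentences as nodes. "
        "Edges between nodes link similar sentences. "
        "The weather was sunny yesterday. "
        "Similar sentences share edges in graphs."
    )
    for sentence in summarize(sample, k=2):
        print(sentence)
```

Weighted degree is used here only because it is the simplest centrality to compute; other node-ranking schemes could be substituted in `summarize` without changing how the graph is built.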