Text Mining

doi:10.1201/9780367808495-7

ABSTRACT

The major objective of text mining (also called text data mining or text analytics) is to extract patterns and information from the large volume of available unstructured and semi-structured text data. Conventional data mining deals with structured data, whereas text mining deals with semi-structured or unstructured data. Because around 80% of the data stored worldwide is in unstructured or semi-structured form, there is a strong need for text mining to process this data in a meaningful way. Many techniques, including sentiment analysis, natural language processing (NLP), information extraction, information retrieval, clustering, concept linkage, association rule mining (ARM), summarization, and topic tracking, are used to extract information, with the choice depending on the nature of the data; each technique is discussed in this chapter. The major problem in text mining is the ambiguity of natural language: a single word can be interpreted in multiple ways. Ambiguity is the primary challenge for researchers to address, and possible solutions are explained. Algorithms such as the genetic algorithm and differential evolution can be combined to obtain the desired result, and the output of the algorithm can be scaled to help ensure the quality of text retrieval. Two measures, precision and recall, are used to assess text retrieval quality in text mining. Text mining has applications in many areas, such as healthcare, telecommunications, research paper categorization, market analysis, customer relationship management (CRM), banking, information technology, and other environments where huge volumes of unstructured data are generated.
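
As a brief illustration of the two retrieval quality measures named above, the following minimal sketch computes precision (the fraction of retrieved documents that are relevant) and recall (the fraction of relevant documents that are retrieved) using their standard set-based definitions. The function name `precision_recall` and the document identifiers are hypothetical and chosen only for this example; they do not come from the chapter.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: iterable of document ids returned by the system
    relevant:  iterable of document ids judged relevant (ground truth)
    Both names are illustrative assumptions, not from the chapter.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant documents that were retrieved
    # Guard against division by zero when either set is empty.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 documents retrieved, 3 truly relevant, 2 in common.
retrieved_docs = ["d1", "d2", "d3", "d4"]
relevant_docs = ["d2", "d4", "d5"]
precision, recall = precision_recall(retrieved_docs, relevant_docs)
print(f"precision={precision:.2f}, recall={recall:.2f}")
# prints: precision=0.50, recall=0.67
```

Here two of the four retrieved documents are relevant (precision 0.50), and two of the three relevant documents were found (recall 0.67). Retrieving more documents typically raises recall at the cost of precision, which is why the two measures are reported together.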