ABSTRACT

This chapter concentrates on techniques and technologies that extract relationships, discover networks of associations, and identify key concepts in unstructured content. Data extraction, often called data scraping, is the retrieval of data for further processing, such as clustering or segmentation analysis. It can also mean extracting unstructured data (text) so that it can be processed by analytical tools that expect structured input; this may require some transformation and possibly the addition of metadata. Typical unstructured data sources include web pages, e-mails, documents, PDFs, scanned text, mainframe reports, and spool files. Typical applications of data scraping include collecting information for litigation investigations and scraping web data for competitive intelligence.
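
As a rough illustration of the extraction, transformation, and metadata steps described above, the following minimal sketch pulls the visible text out of a small HTML fragment and wraps it in a structured record. It uses only Python's standard `html.parser` module. The names `TextExtractor` and `extract_record`, the sample markup, and the choice of metadata fields are assumptions made for this example, not part of any particular tool discussed in the chapter.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects visible text, skipping <script>/<style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0   # > 0 while inside a <script> or <style> element
        self.chunks = []       # text fragments in document order

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside a skipped element.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_record(html, source):
    """Turn raw HTML into a structured record with simple metadata."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Transformation step: collapse runs of whitespace into single spaces.
    text = " ".join(" ".join(parser.chunks).split())
    # Metadata step: the fields here are illustrative assumptions.
    return {"source": source, "text": text, "n_words": len(text.split())}


# Made-up sample input, standing in for a scraped web page.
sample = (
    "<html><head><style>p {color: red}</style></head>"
    "<body><h1>Acme Q3 report</h1><p>Revenue rose   12%.</p></body></html>"
)
record = extract_record(sample, source="example.html")
print(record)
```

The resulting dictionary is the kind of structured row that a clustering or segmentation tool could consume; a real pipeline would likely need source-specific handling for e-mails, PDFs, or scanned text.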