ABSTRACT

In this section, we will learn about web scraping, one of the more vital skills of a digital humanist. Web scraping is the process of automating requests to a server (which hosts a website) and parsing the response, which is typically an HTML file. HTML stands for HyperText Markup Language; it is the markup that gives web pages their structure. When we scrape a website, we write rules for extracting pieces of information from it based on how that data is structured within the HTML. To be competent at web scraping, therefore, one must be able to read and parse HTML.
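
To make the idea of "writing rules based on HTML structure" concrete, here is a minimal sketch using only Python's standard library. The sample page, the class name "author", and the helper names AuthorExtractor and extract_authors are all hypothetical, invented for illustration; in a real scrape the HTML would come from a server response rather than from a string in the script.

```python
from html.parser import HTMLParser

# Hypothetical page content; a real scraper would receive this from a server.
SAMPLE_HTML = """
<html>
  <body>
    <h1>Letters</h1>
    <p class="author">Jane Doe</p>
    <p class="summary">A short note.</p>
    <p class="author">John Roe</p>
  </body>
</html>
"""


class AuthorExtractor(HTMLParser):
    """Collects the text of every <p class="author"> element."""

    def __init__(self):
        super().__init__()
        self.in_author = False  # True while we are inside a matching <p>
        self.authors = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag == "p" and ("class", "author") in attrs:
            self.in_author = True

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_author = False

    def handle_data(self, data):
        # Keep only non-empty text found inside a matching paragraph.
        if self.in_author and data.strip():
            self.authors.append(data.strip())


def extract_authors(html):
    """Apply the extraction rule to an HTML string and return the matches."""
    parser = AuthorExtractor()
    parser.feed(html)
    parser.close()
    return parser.authors


print(extract_authors(SAMPLE_HTML))
```

The rule here is simply "take the text of each paragraph whose class is author". Everything else on the page is ignored, which is why knowing how the HTML is structured matters before any extraction can be written.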