ABSTRACT

A Web site may hold information that a user would like to analyze, or many files that the user wants to download without clicking each link manually. This chapter shows how to create a simple file-download scraper for all of RStudio’s PDF cheat sheets. It shows how to use the robotstxt package to check whether a Web site permits scraping, and how to use the SelectorGadget Chrome extension to identify the portions of a page to “scrape.” The rvest package can then extract information from a Web page based on the CSS selectors that SelectorGadget identifies. The chapter also looks at purrr’s walk() and map() functions, including the map_chr() variant, for applying download.file() to a list of links.