ABSTRACT

How data can be extracted from the Web in an astonishingly short time, without any certainty that what we get is what we actually wanted.

When the “network” first appeared at the end of the 1960s, only a small clique of experts could use it. ARPANET had been designed by keen scholars as a military and research tool, with no intention that it would ever become a vehicle (perhaps the vehicle) for commercial business. Even after the transition of the 1980s, when ARPANET and other existing networks started operating together under the TCP/IP protocol suite, giving rise to the Internet, the network was still not suitable for the general public.

The killer idea was the creation of the Web, with the crucial feature of assigning a Uniform Resource Locator (URL) to each Web page, for example: https://en.wikipedia.org/wiki/URL (the Wikipedia page about URLs). But even then, wandering in the ever-growing Web was hard for nonspecialists, a little like navigating an ocean without GPS or even a compass.1 A great improvement in Internet usage came with the development of browsers, which allow any Web page to be viewed once its URL is known. A search can start from a given or previously known site and proceed by following clickable links until a desired, or at least interesting, page is reached. It goes without saying that most of the relevant information on the Internet remains unreachable if one can only use a browser in this way, but the big leap towards the consumer Internet revolution had been taken.