ABSTRACT

The last decade has seen the rise of various mechanisms for organizing minimally structured, human-processable data in the large, from the ranking of HTML pages at the scale of the Web to the classification of keyword-annotated digital images. Today, it is believed that a new revolution targeting declarative, semi-structured, machine-processable information is on its way. End users, who used to be restricted to passively consuming manually curated digital information, are evolving into industrious supervisors of semi-automatic processes that create digital artifacts on a continuous basis. Peer production [Ben05], where decentralized communities of individuals collaborate to create complex digital artifacts, and human computation [vA06], where, interestingly, computational processes perform their functions by outsourcing certain steps to human agents, are just two facets of this evolving trend towards a data industrial revolution. Networks of computers, yesterday considered a convenient medium for storing and transmitting human-targeted information, are today evolving into autonomous spaces that consume, transform, and also produce their own information. As structure is still inherently implied by all machine-processable data, we believe that this revolution poses a formidable challenge: creating next-generation information management algorithms that rely on increasingly complex, but also uncertain, digital information to support higher-level data processing.