ABSTRACT

The need to document, manage, transfer, analyze, and preserve digital data is a significant driver of the development of tools and technologies for e-research. This 'data deluge' is the result of new instruments that collect or log massive amounts of data, the output of large-scale computer simulations, the product of experiments that produce vast quantities of data, and the creation of new databases through the aggregation and integration of distributed and often heterogeneous data (Emmott, 2007; Hey & Trefethen, 2005; 2008; Jankowski, 2007). For instance, in ecology, some types of data that were once collected by hand by ecologists in the field are now being gathered by embedded sensor networks. Small sciences such as ecology, which depend upon fieldwork, often lack the tools, infrastructure, and expertise to manage the growing amounts of data generated by new forms of instrumentation and the digitization and federation of legacy data (Borgman, Wallis, & Enyedy, 2007; see also Chapter 8 by Meyer in this volume). As these fields go from being somewhat data poor to suddenly being data rich, existing methods and tools to manage and analyze data are quickly becoming inadequate. This scenario is increasingly common in many fields, including social sciences and humanities, and even in big sciences such as astronomy and physics (Baru, 2007; Hey & Trefethen, 2008). It is not yet clear how this data deluge will affect research practice and outcomes. The purpose of this chapter is to analyze different approaches to data sharing in order to identify important factors that may lead to success.