ABSTRACT

Large scientific collections may be managed as data grids for sharing data, digital libraries for publishing data, persistent archives for preserving data, or real-time data repositories for sensor data. Although these data management objectives differ, each type of system can be built from generic software infrastructure. This entry examines the requirements driving the management of large data collections, the concepts on which current data management systems are based, and the current research initiatives for managing distributed data collections.