ABSTRACT

This chapter presents remote data integrity checking (RDIC) techniques for replication-based, erasure coding–based, and network coding–based distributed storage systems, and describes new directions recently proposed for the distributed RDIC paradigm. Replication-based distributed storage systems built on untrusted servers re-replicate data when replicas fail, check the correctness of the replicas they hold, and move replicas among sites to meet availability goals. Audit requirements provide several examples of long-term data retention, such as keeping business records for 7 years in accordance with the Sarbanes-Oxley Act and keeping back tax returns for 5 years. RDIC protocols allow an auditor to verify that data remain intact on storage and retrievable while using a constant amount of client metadata, a constant amount of network traffic, and (most importantly) reading only a constant number of file fragments. The chapter surveys several RDIC schemes for distributed systems that store data redundantly across multiple storage servers.
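Why can a constant number of fragments be enough? The reason is probabilistic spot checking. The sketch below is a generic calculation, not a specific scheme from this chapter. It assumes that the auditor challenges `challenged` distinct blocks chosen uniformly at random and that the server has corrupted `corrupted` of the file's `n_blocks` blocks. The function name and parameters are hypothetical. Detection fails only if every challenged block happens to be intact. That miss probability is bounded by (1 - corrupted/n_blocks)^challenged, which depends on the corrupted fraction and not on the file size.

```python
def detection_probability(n_blocks: int, corrupted: int, challenged: int) -> float:
    """Hypothetical helper: probability that a random challenge of `challenged`
    distinct blocks (sampled without replacement) hits at least one of the
    `corrupted` blocks among `n_blocks` total blocks."""
    if not (0 <= corrupted <= n_blocks and 0 <= challenged <= n_blocks):
        raise ValueError("counts must satisfy 0 <= corrupted, challenged <= n_blocks")
    # Probability that all challenged blocks are intact (hypergeometric miss).
    p_miss = 1.0
    for i in range(challenged):
        p_miss *= (n_blocks - corrupted - i) / (n_blocks - i)
    return 1.0 - p_miss


# With 1% of blocks corrupted, the same 460-block challenge works for both sizes.
for n in (10_000, 1_000_000):
    print(n, round(detection_probability(n, n // 100, 460), 4))
```

Each factor in the product is at most 1 - corrupted/n_blocks. A fixed challenge size therefore gives a detection probability that stays above a fixed bound no matter how large the file grows. This is the sense in which the auditor's read cost is constant.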