ABSTRACT

Deduplication, particularly within backup and recovery environments, has become a mainstream technology. It relies heavily on disk access speed to deliver operational efficiency. Deduplication is a logical extension of two existing technologies: single-instance storage (used successfully in archival products and many mail servers to reduce storage requirements) and traditional file and data compression. When deduplication is considered for primary storage, the key concerns typically fall into two categories: reliability and performance. Although deduplication storage systems are growing in popularity for primary storage, they have usually offered some of their highest cost savings and efficiency gains in backup and recovery environments. Garbage collection is an essential task in deduplication storage, but it can become a vicious cycle: a garbage collection run that is aborted because it takes too long only increases the amount of storage to be considered on the next run.