
Survey on Content based Chunking in Data Deduplication on Cloud Storage Environment

S. Ruba, A. M. Kalpana

Abstract


With the enormous growth of digital data in cloud storage, effective methods are needed to reduce hardware costs, meet bandwidth requirements, and increase storage efficiency. Data deduplication is one such method: it detects and eliminates redundant copies of data. By storing less data, a system needs less hardware and makes better use of its existing storage space. Data deduplication also plays an important role in data transmission in various data-intensive network and cloud applications. A deduplication system can use either fixed-size or variable-size chunking algorithms, which mainly aim to reduce storage space by storing only a single copy of duplicate data. This paper reviews the various chunking techniques that partition a data stream or file into variable-size chunks.
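To make the contrast between fixed-size and variable-size (content-defined) chunking concrete, here is a minimal Python sketch. It is illustrative only and is not the method of any surveyed paper. It assumes a Gear-style rolling hash (the hash family that FastCDC builds on) and uses hypothetical parameters `MIN_SIZE`, `AVG_BITS` and `MAX_SIZE`. A hypothetical `ChunkStore` keeps one copy of each chunk, indexed by its SHA-256 fingerprint. The example backs up a file, inserts one byte at the front, and counts how many chunks of the edited file are already stored.

```python
import hashlib
import random

# Hypothetical parameters chosen for illustration only.
MIN_SIZE = 512      # no cut point is tested before a chunk reaches this length
AVG_BITS = 11       # cut when 11 hash bits are zero: ~2 KiB expected gap after MIN_SIZE
MAX_SIZE = 8192     # force a cut if no content-defined boundary appears
MASK64 = (1 << 64) - 1
CUT_MASK = ((1 << AVG_BITS) - 1) << (64 - AVG_BITS)  # test the high bits of the hash

# Gear table: one pseudo-random 64-bit value per byte value (fixed seed for reproducibility).
_rng = random.Random(42)
GEAR = [_rng.getrandbits(64) for _ in range(256)]


def fixed_size_chunks(data: bytes, size: int = 2048) -> list[bytes]:
    """Split data every `size` bytes regardless of its content."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def gear_cdc_chunks(data: bytes) -> list[bytes]:
    """Content-defined chunking with a Gear-style rolling hash.

    h = (h << 1) + GEAR[b] (mod 2**64). Because of the shift, each byte's
    contribution leaves the hash after 64 steps, so once a chunk is at least
    64 bytes long, a cut decision depends only on the last 64 bytes of content.
    """
    chunks = []
    start, n = 0, len(data)
    while start < n:
        end = min(start + MAX_SIZE, n)
        cut = end  # default: forced cut at MAX_SIZE or at the end of the data
        h = 0
        for i in range(start, end):
            h = ((h << 1) + GEAR[data[i]]) & MASK64
            if i - start + 1 >= MIN_SIZE and (h & CUT_MASK) == 0:
                cut = i + 1
                break
        chunks.append(data[start:cut])
        start = cut
    return chunks


class ChunkStore:
    """Keeps a single copy of each distinct chunk, keyed by SHA-256 fingerprint."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}

    def put(self, chunk: bytes) -> bool:
        """Store the chunk if unseen; return True if it was a duplicate."""
        fp = hashlib.sha256(chunk).hexdigest()
        if fp in self.chunks:
            return True
        self.chunks[fp] = chunk
        return False


def duplicates_after_edit(chunker, original: bytes, edited: bytes) -> tuple[int, int]:
    """Back up `original`, then `edited`; return (duplicate chunks, chunks in edited)."""
    store = ChunkStore()
    for c in chunker(original):
        store.put(c)
    new_chunks = chunker(edited)
    dups = sum(store.put(c) for c in new_chunks)
    return dups, len(new_chunks)


if __name__ == "__main__":
    original = random.Random(7).randbytes(128 * 1024)  # requires Python 3.9+
    edited = b"X" + original  # insert one byte at the front
    for name, chunker in [("fixed-size", fixed_size_chunks), ("Gear CDC", gear_cdc_chunks)]:
        dups, total = duplicates_after_edit(chunker, original, edited)
        print(f"{name}: {dups}/{total} chunks already stored")
```

With fixed-size chunking, the one-byte insertion shifts every later boundary, so essentially no chunk of the edited file matches a stored one. The content-defined boundaries depend only on nearby bytes, so they realign after the first chunk or two and most chunks deduplicate. This sketch tests the high bits of `h` rather than the low bits because, with the left shift, low bit j depends only on the last j+1 bytes. The high bits give each cut decision a full 64-byte content window.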


Keywords


Backup Storage, Bimodal Chunking, Cloud Computing, Cloud Storage, Chunking Algorithms, Data Deduplication, Fixed Size Chunking, Multimodal CDC, Variable Size Chunking.






Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.