An Advanced Dictionary Based Lossless Compression Technique for English Text Data

Dipanjan Bhattacharya; Sanjay Chakraborty; Pinkar Roy; Animesh Kairi

Abstract

Data compression reduces the size of large volumes of data, which lowers both network bandwidth usage and storage requirements. Text compression is therefore an important concept in data management. This paper presents a new lossless compression technique for English text. It is a two-step process. First, the text is reduced using a dictionary-based look-up table, which is made available as part of the operating system and replaces each word with an 18-bit address. This reduction alone yields a compression of more than 50% in most cases, and its result is stored in a binary file. Second, the binary file is compressed with a modified Huffman algorithm that builds the Huffman tree from 6-bit data blocks. Together, the two steps compress the file to around 32-38% of its original size. The paper also compares the new technique with other well-known compression methods.
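The two steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the dictionary contents, how punctuation, capitalization, or unknown words are handled, or what the modification to Huffman coding is. The sketch therefore assumes a toy table built from the input words themselves and a standard (unmodified) Huffman coder; only the 18-bit address width and the 6-bit block size come from the abstract. All function names are hypothetical.

```python
import heapq
from collections import Counter
from itertools import count

ADDRESS_BITS = 18  # address width stated in the abstract
BLOCK_BITS = 6     # Huffman symbol width stated in the abstract


def build_table(words):
    """Toy stand-in for the dictionary: give each distinct word an address."""
    table = {}
    for word in words:
        table.setdefault(word, len(table))
    if len(table) > 1 << ADDRESS_BITS:
        raise ValueError("more distinct words than 18-bit addresses")
    return table


def reduce_words(words, table):
    """Step 1: replace every word by its 18-bit address (as a bit string)."""
    return "".join(format(table[w], f"0{ADDRESS_BITS}b") for w in words)


def split_blocks(bits):
    """Cut the bit string into 6-bit blocks (18 is a multiple of 6)."""
    return [bits[i:i + BLOCK_BITS] for i in range(0, len(bits), BLOCK_BITS)]


def huffman_codes(symbols):
    """Standard Huffman coding (assumption: the paper's variant differs)."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    tiebreak = count()  # unique counter so the dicts are never compared
    heap = [(f, next(tiebreak), {s: ""}) for s, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]


def compress(text):
    """Run both steps; return the encoded bits plus what decoding needs."""
    words = text.split()
    table = build_table(words)
    blocks = split_blocks(reduce_words(words, table))
    codes = huffman_codes(blocks)
    return "".join(codes[b] for b in blocks), codes, table


def decompress(encoded, codes, table):
    """Invert both steps: Huffman-decode, then map addresses back to words."""
    decode = {code: block for block, code in codes.items()}
    blocks, current = [], ""
    for bit in encoded:
        current += bit
        if current in decode:  # prefix-free codes make this unambiguous
            blocks.append(decode[current])
            current = ""
    stream = "".join(blocks)
    by_address = {addr: word for word, addr in table.items()}
    return " ".join(
        by_address[int(stream[i:i + ADDRESS_BITS], 2)]
        for i in range(0, len(stream), ADDRESS_BITS)
    )


if __name__ == "__main__":
    sample = "the cat and the dog and the bird"
    encoded, codes, table = compress(sample)
    print(len(sample) * 8, "bits as 8-bit characters")
    print(len(sample.split()) * ADDRESS_BITS, "bits after the look-up step")
    print(len(encoded), "bits after Huffman coding of 6-bit blocks")
    assert decompress(encoded, codes, table) == sample
```

In this sketch the table is derived from the input, so its size is not counted in the printed figures; in the scheme the abstract describes, the table is shared through the operating system and would not need to be stored with each file. The sizes printed for such a tiny sample say nothing about the 32-38% figure reported in the paper.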

Keywords

Text Segment, Reduction, Dictionary Table, Data Compression.

This work is licensed under a Creative Commons Attribution 3.0 License.
