
Efficient DNA Compression using CBT Technique-A Novel Algorithm

V. Hari Prasad, Dr. P.V. Kumar

Abstract


Biological databases are growing exponentially as more DNA sequences are stored every day. Compression has therefore become essential: reducing the size of a DNA sequence saves not only storage space but also transmission time when data is exchanged between systems through web services over the internet. The genome of an organism contains all of its hereditary information, encoded in DNA using four bases (A, C, G, and T), in regions that are either repetitive or non-repetitive. Many standard compression algorithms exist for compressing genetic sequences, and DNA compression algorithms work by exploiting the repetitiveness or non-repetitiveness of the bases in a sequence. Our earlier algorithms achieved high compression rates when the sequence is repetitive. However, sequences such as AT-rich DNA, which constitutes a distinct fraction of the cellular DNA of the archaebacterium Methanococcus voltae, consist of non-repetitive sequences. Earlier compression techniques do not achieve good compression rates on them, because the sequence contains many non-repetitive fragments and existing algorithms may have to perform a worst-case number of comparisons. Our proposed novel algorithm, CBT (Compression Bit Plane Technique), yields better compression rates, measured in bits per base, for sequences that contain many non-repetitive fragments. The algorithm is also compared with existing ones and is found to achieve a better compression ratio.
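For readers unfamiliar with the bits-per-base measure used above, the following is a minimal sketch of the naive baseline that DNA compressors are usually judged against: packing each of the four bases into 2 bits. It is not the CBT algorithm, which this abstract does not specify. The function names (`pack_2bit`, `unpack_2bit`, `bits_per_base`) and the code assignment A=00, C=01, G=10, T=11 are assumptions made for illustration only.

```python
# Illustrative baseline only: fixed 2-bit packing of DNA bases.
# This is NOT the CBT algorithm; the names and the code table are assumptions.

CODES = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack_2bit(seq: str) -> bytes:
    """Pack a DNA string into bytes, four bases per byte, most significant first."""
    out = bytearray()
    acc = 0      # bits collected for the current byte
    nbits = 0    # number of bits currently held in acc
    for base in seq.upper():
        acc = (acc << 2) | CODES[base]
        nbits += 2
        if nbits == 8:
            out.append(acc)
            acc = 0
            nbits = 0
    if nbits:
        # Pad the final partial byte with zero bits on the right.
        out.append(acc << (8 - nbits))
    return bytes(out)


def unpack_2bit(data: bytes, n_bases: int) -> str:
    """Recover the first n_bases bases from data produced by pack_2bit."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            if len(out) == n_bases:
                break
            out.append(BASES[(byte >> shift) & 3])
    return "".join(out)


def bits_per_base(compressed_bits: int, n_bases: int) -> float:
    """Compression rate: compressed size in bits divided by sequence length."""
    return compressed_bits / n_bases
```

With this packing, a sequence whose length is a multiple of four costs exactly 2 bits per base, so a DNA-specific compressor is only useful where it achieves a rate below that figure.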

Keywords


Arithmetic Coding, Huffman Coding, DNA Bit Compress, Genbit Compress and Huffbit Compress.




Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.