Motif discovery using Genetic Algorithm

Dibyendu Bhaumik; Subrata Sahana

Motif discovery using Genetic Algorithm

Dibyendu Bhaumik, Subrata Sahana

Abstract

In this paper we propose to use the basic structure of genetic algorithm for finding potential motifs in the regions located from the -2000 bp upstream to +1000 bp downstream of transcription start site (TSS). The mutation in the GA is performed by using position weight matrices to reserve the completely conserved positions. The crossover is implemented with special-designed gap penalties to produce the optimal child pattern. We also present a rearrangement method based on position weight matrices to avoid the presence of a very stable local minimum, which may make it quite difficult for the other operators to generate the optimal pattern. Our method provides better result than Multiple Em for Motif Elicitation (MEME) and Gibbs Sampler, which are two popular algorithms for finding motifs.

Keywords

DNA, FMGA, Motif Pattern, Total Fitness Score.

Full Text:

PDF

References

G. D. Stormo, “DNA binding sites: representation and discovery,” Bioinformatics, 16, pp. 16-23, 2000.

Diego di Bernardo1, Thomas Down and Tim Hubbard, “ddbRNA: detection of conserved secondary structures in multiple alignments,” J. Bioinformatics, 19, pp. 1606-1611, 2003.

Stephen T. Smale and James T. Kadonaga, “The RNA Polymerase II Core Promoter,”Annu. Rev. Biochem, 72, pp. 449-479, 2003.

[Thomas D. Schneider, “Consensus Sequence Zen,” Applied Bioinformatics, 1(3), pp. 111-119, 2002.

Uwe Ohler, Computational promoter recognition in eukaryotic genomic DNA. Ph.D. Thesis, Technische Fakultat der Unicersitat Erlangen-Nurnderg, 2001.

Jennifer E.F. Butler and James T. Kadonaga, “The RNA polymerase II core promoter: a key component in the regulation of gene expression,” Proc. Genes & Development, 16, pp. 2583-2592, 2002.

Alan K. Kutach and James T. Kadonaga, “The downstream promoter element DPE appears to be as widely used as the TATA box in Drosophila core promoters,” Molecular and Cellular Biology, 20 (13), pp. 4754-4764, 2000.

Timothy L. Bailey and Charles Elkan, “Fitting a mixture model by expectation maximization to discover motifs in biopolymers,”Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology, AAAI Press, Menlo Park, California, pp. 28-36, 1994.

William Thompson, Eric C. Rouchka and Charles E. Lawrence, “Gibbs Recursive Sampler: Finding transcription factor binding sites,” J. Nucleic Acids Research, 31, pp. 3580-3585, 2003.

N. W. Lo, S. W. Changchien, Y. F. Chang and T. C. Lu, “Human promoter prediction based on sorted consensus sequence patterns by genetic algorithms,” Proceedings of the International Congress on Biological and Medical Engineering, D3I-1540: pp 111-112, 2002.

Jonathan M. Keith et al., “A simulated annealing algorithm for finding consensus sequences,” J. Bioinformatics, 18, pp. 1494-1499, 2002.

Cedric Notredame and Desmond G. Higgins, “SAGA: Sequence alignment by genetic algorithm,” J. Nucleic Acids Research, 24, pp. 1515-1524, 1996.

M. Scherf, A. Klingenhoff, T. Werner, “Highly Specific Localization of Promoter Regions in Large Genomic Sequences by PromoterInspector: A Novel Context Analysis Approach,” J. Mol. Biol. 297 (3), pp. 599-606, 2000.

Sadiq M. Sait, Habib Youssef, “Iterative computer algorithms with applications in engineering: solving combinatorial optimization problems”. Los Alamitos, Calif.: IEEE Computer Society, c1999.

Falcon F.M Liu, Jeffrey J.P. Tsai, R.M Chen, S.N. Chen and S.H. Shih, “FMGA: Finding Motifs by Genetic Algorithm” Bioinformatics and Bioengineering, 2004. BIBE 2004, pp.459-466 Proceedings. Fourth IEEE Symposium on May 2004.

Refbacks

There are currently no refbacks.

This work is licensed under a Creative Commons Attribution 3.0 License.

Username
Password
Remember me