
The Data Mining Approaches for Multi-Class Protein Fold Recognition

Lokesh K. Sharma, Sourabh Rungta


Computational analysis of the biological data obtained from genome sequencing and other projects is essential for understanding cellular function and for discovering new drugs and therapies. Data mining has become an important tool for researchers in various fields, including bioinformatics. Protein fold recognition is an important approach to structure discovery in bioinformatics. This paper studies protein fold recognition methods. Supervised learning methods from data mining are applied to multi-class protein fold recognition and tested. Accuracy is measured with several statistical parameters, and the results are reported. The Bayesian Network classifier performs better than the other methods in the cross-validation test. On the independent test data, the Bayesian Network and the Multi-Layer Perceptron are comparable, with similar accuracy. It is also observed that the one-versus-others and all-versus-all mechanisms improve accuracy for individual parameters.
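The one-versus-others and all-versus-all mechanisms named above both reduce a multi-class problem to a set of binary ones. The following is a minimal sketch of that decomposition, not the authors' implementation: the nearest-centroid binary learner is only a stand-in for the classifiers studied in the paper, and the function names, fold labels, and two-dimensional feature vectors are all made up for illustration.

```python
from collections import Counter
from itertools import combinations


def centroid(rows):
    """Component-wise mean of a list of feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_binary(pos, neg):
    """Stand-in binary learner (nearest centroid), an assumption of this sketch.

    Returns a scorer; a positive score means x lies closer to the
    positive-class centroid than to the negative-class centroid.
    """
    cp, cn = centroid(pos), centroid(neg)
    return lambda x: sq_dist(x, cn) - sq_dist(x, cp)


def fit_one_vs_others(X, y):
    """One binary model per class: that class against all remaining classes."""
    models = {}
    for c in sorted(set(y)):
        pos = [x for x, lab in zip(X, y) if lab == c]
        neg = [x for x, lab in zip(X, y) if lab != c]
        models[c] = train_binary(pos, neg)
    return models


def predict_one_vs_others(models, x):
    """Pick the class whose binary model gives the highest score."""
    return max(models, key=lambda c: models[c](x))


def fit_all_vs_all(X, y):
    """One binary model per unordered pair of classes."""
    models = {}
    for a, b in combinations(sorted(set(y)), 2):
        pos = [x for x, lab in zip(X, y) if lab == a]
        neg = [x for x, lab in zip(X, y) if lab == b]
        models[(a, b)] = train_binary(pos, neg)
    return models


def predict_all_vs_all(models, x):
    """Each pairwise model casts one vote; the class with most votes wins."""
    votes = Counter(a if score(x) > 0 else b for (a, b), score in models.items())
    return votes.most_common(1)[0][0]


# Hypothetical toy data: four "folds", three 2-D feature vectors each.
X = [[0, 0], [0, 1], [1, 0],
     [10, 0], [10, 1], [11, 0],
     [0, 10], [1, 10], [0, 11],
     [10, 10], [11, 10], [10, 11]]
y = ["fold_a"] * 3 + ["fold_b"] * 3 + ["fold_c"] * 3 + ["fold_d"] * 3

ovo_models = fit_one_vs_others(X, y)   # k models for k classes
ava_models = fit_all_vs_all(X, y)      # k * (k - 1) / 2 models

query = [9.5, 0.5]
ovo_label = predict_one_vs_others(ovo_models, query)
ava_label = predict_all_vs_all(ava_models, query)
```

With k classes, one-versus-others trains k binary models on the full training set, while all-versus-all trains k(k-1)/2 models, each on only two classes' data. Which one gives better accuracy depends on the base classifier and the data.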


Protein Structure Recognition, Bioinformatics, Data Mining and Supervised Learning.







Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.