
Clustering of Yeast Gene Data Using WEKA

D. Gaya, Dr. Latha Parthiban

Abstract


Data mining has become increasingly popular across application domains. It is needed because we have abundant data and technology but too little useful information extracted from them. This paper aims to cluster the cellular localization of proteins in the yeast genome. Knowledge of protein localization can provide valuable information for target identification in drug discovery, and automated methods have become increasingly important in recent years owing to the steady growth in the amount of protein sequence data. In this paper, the data mining tool WEKA is used to cluster the Yeast gene dataset obtained from the UCI Machine Learning Repository.

Keywords


Data Mining; Data Preprocessing; Cluster Analysis; Yeast; Weka Tool


DOI: http://dx.doi.org/10.36039/AA032014009


Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.