
An Improved Clustering Technique Based On Statistical Model Preprocessing Using Gene Expression Data

R. Mallika, G. Selvanayaki

Abstract


Microarrays have become effective and widely used tools in biological and medical research, addressing a wide range of problems that include the classification of disease subtypes and tumors. Many statistical methods are available for analyzing and organizing these complex data into meaningful information, and one of the main goals in analyzing gene expression data is to detect samples or genes with similar expression patterns. This work compares the performance of several feature selection methods based on data preprocessing, including normalization and data reduction strategies, and proposes a new preprocessing technique based on classical statistics. A clustering technique is then applied to the preprocessed data, with promising results. The work also shows that choosing a good preprocessing technique prior to clustering improves clustering performance, and its results were the best when compared with previous work.
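The abstract does not specify the proposed statistical preprocessing technique or the clustering algorithm, so the following is only a minimal sketch of a generic "preprocess, then cluster" pipeline, not the authors' method. It assumes a genes-by-samples expression matrix and uses variance-based gene filtering as the data reduction step, per-gene z-score normalization, k-means on the samples, and the Rand index to measure agreement with known sample classes. All function names and the toy data are hypothetical.

```python
import math
from itertools import combinations


def select_high_variance_genes(matrix, n_keep):
    """Data reduction: keep the n_keep genes (rows) with the largest variance across samples."""
    def variance(row):
        mean = sum(row) / len(row)
        return sum((x - mean) ** 2 for x in row) / len(row)
    ranked = sorted(range(len(matrix)), key=lambda i: variance(matrix[i]), reverse=True)
    # Preserve the original gene order among the kept rows.
    return [matrix[i] for i in sorted(ranked[:n_keep])]


def zscore_rows(matrix):
    """Normalization: center each gene at mean 0 and scale to unit (population) standard deviation."""
    out = []
    for row in matrix:
        mean = sum(row) / len(row)
        sd = math.sqrt(sum((x - mean) ** 2 for x in row) / len(row))
        out.append([(x - mean) / sd if sd > 0 else 0.0 for x in row])
    return out


def kmeans_samples(matrix, k, n_iter=100):
    """Cluster samples (columns) with Lloyd's k-means and deterministic farthest-first initialization."""
    samples = [list(col) for col in zip(*matrix)]

    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    centroids = [samples[0]]
    while len(centroids) < k:
        # Next centroid: the sample farthest from all centroids chosen so far.
        far = max(samples, key=lambda s: min(dist2(s, c) for c in centroids))
        centroids.append(far)

    labels = [None] * len(samples)
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda j: dist2(s, centroids[j])) for s in samples]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        for j in range(k):
            members = [s for s, lab in zip(samples, labels) if lab == j]
            if members:
                centroids[j] = [sum(v) / len(members) for v in zip(*members)]
    return labels


def rand_index(labels_a, labels_b):
    """Rand index: fraction of sample pairs on which two partitions agree (same/same or different/different)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum((labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j]) for i, j in pairs)
    return agree / len(pairs)


# Toy data (hypothetical): 4 genes x 6 samples; samples 0-2 and 3-5 form two classes.
matrix = [
    [1.0, 1.2, 0.9, 5.0, 5.3, 4.8],  # informative gene
    [8.0, 7.5, 8.2, 2.0, 2.4, 1.9],  # informative gene
    [3.0, 3.1, 2.9, 3.0, 3.1, 3.0],  # nearly flat gene
    [4.0, 4.1, 4.0, 3.9, 4.0, 4.1],  # nearly flat gene
]
true_classes = ["A", "A", "A", "B", "B", "B"]

reduced = select_high_variance_genes(matrix, n_keep=2)
normalized = zscore_rows(reduced)
labels = kmeans_samples(normalized, k=2)
print(labels, rand_index(labels, true_classes))
```

Genes are filtered by variance before they are z-scored, because standardizing every gene to unit variance would make a variance filter meaningless afterwards. Farthest-first initialization only keeps the sketch deterministic; practical k-means implementations usually use randomized seeding (such as k-means++) with several restarts.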

Keywords


Clustering, Feature selection, Gene expression


