Parallel Gene Selection Process Using Mapreduce for Microarray Data Classification

C. Devi Arockia Vanitha

Abstract


Microarray technology is a vital tool for monitoring the expression levels of thousands of genes in a given organism, and it is useful in the classification of cancer. An important issue in classifying cancer microarray data is selecting, with high confidence, the informative genes that contribute to cancer from among the thousands of genes in the data. A dimensionality reduction method should eliminate genes that are irrelevant, redundant, or noisy for classification while retaining all the highly discriminative genes. In this paper, a novel gene selection method based on MapReduce is proposed to improve the running time of the algorithm. The proposed approach analyzes cancer gene expression datasets, extracts the most informative genes, and classifies cancerous samples from normal samples using a Support Vector Machine. The work of the gene selection algorithm is distributed across a set of mappers and reducers, which speeds up the classification process and reduces the memory requirements. The classifier model built with the Support Vector Machine is used to evaluate the performance of the proposed gene selection approach. Simulation results suggest that the proposed approach is relevant to clinical diagnosis and drug discovery for cancer, since it can handle large collections of data with good accuracy and fast response times.
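The abstract does not state which gene scoring criterion the mappers and reducers compute, so the following is only a minimal single-process sketch of how such a split could look, not the author's method. It assumes a signal-to-noise score (difference of class means over the sum of class standard deviations), binary labels 0 (normal) and 1 (cancerous) that both occur in the data, and hypothetical function names `mapper`, `reducer`, and `select_genes`. Each mapper sees only one block of samples and emits partial sums; the reducer merges them, so no single worker needs the full expression matrix in memory.

```python
from collections import defaultdict
from math import sqrt


def mapper(partition):
    """Emit ((gene, label), (count, sum, sum_of_squares)) for one block of samples.

    `partition` is a list of (label, expression_vector) pairs (assumed layout).
    """
    partial = defaultdict(lambda: [0, 0.0, 0.0])
    for label, expression in partition:
        for gene, value in enumerate(expression):
            acc = partial[(gene, label)]
            acc[0] += 1
            acc[1] += value
            acc[2] += value * value
    return [(key, tuple(acc)) for key, acc in partial.items()]


def reducer(emitted):
    """Merge partial statistics per (gene, label) and score every gene."""
    merged = defaultdict(lambda: [0, 0.0, 0.0])
    for key, (n, s, ss) in emitted:
        acc = merged[key]
        acc[0] += n
        acc[1] += s
        acc[2] += ss

    genes = {gene for gene, _ in merged}
    scores = {}
    for gene in genes:
        stats = []
        for label in (0, 1):
            n, s, ss = merged[(gene, label)]
            if n == 0:
                raise ValueError("both classes must be present for every gene")
            mean = s / n
            variance = max(ss / n - mean * mean, 0.0)  # population variance
            stats.append((mean, sqrt(variance)))
        (mean0, sd0), (mean1, sd1) = stats
        # Assumed criterion: signal-to-noise ratio; epsilon avoids division by zero.
        scores[gene] = abs(mean1 - mean0) / (sd0 + sd1 + 1e-12)
    return scores


def select_genes(partitions, k):
    """Run the map phase over every partition, reduce, and keep the top-k genes."""
    emitted = [pair for part in partitions for pair in mapper(part)]
    scores = reducer(emitted)
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy data: two partitions, three genes; gene 1 separates the classes best.
partitions = [
    [(0, [1.0, 5.0, 2.0]), (1, [1.1, 9.0, 2.1])],
    [(0, [0.9, 5.2, 8.0]), (1, [1.0, 9.1, 1.9])],
]
top_genes = select_genes(partitions, 2)
```

In a real Hadoop or Spark deployment the shuffle between the two phases would group the emitted pairs by key; here the list concatenation stands in for it. The selected gene indices would then be used to subset the expression matrix before training the Support Vector Machine.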

Keywords


Microarray, Gene Selection, Classification, Support Vector Machine, MapReduce




Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.