Cancer Classification of Gene Expression Data by Fast Clustering and Fuzzy TSVM

B. Kalaivani, T. Vishnusaranya

Abstract


Cancer classification using gene expression data usually relies on traditional supervised learning techniques, in which only labeled data can be exploited for learning. Such techniques are also useful for identifying potential gene markers for each cancer subtype, which helps in the successful diagnosis of a particular cancer type. The existing system identifies potential gene markers and then applies its classification technique to the selected genes to classify human cancer. In the proposed work, we use a fast clustering-based feature selection technique (FAST) for feature selection and a fuzzy transductive support vector machine (FTSVM) for classification. Fast clustering-based feature selection has a high probability of producing a subset of useful and independent features. To ensure the efficiency of FAST, we adopt the efficient minimum spanning tree (MST) clustering method. The FTSVM method generates membership values iteratively, based on the positions of the training vectors relative to the TSVM decision surface itself.
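
The idea of MST-based feature clustering can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not state: features are compared with absolute Pearson correlation (a simple stand-in for whatever relevance and redundancy measure the paper uses), MST edges longer than a hypothetical `cut` threshold are removed to form clusters, and each cluster is represented by the feature most correlated with the class label. The function names and the toy data are illustrative only.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)

def mst_edges(dist):
    """Prim's algorithm on a dense distance matrix; returns (i, j, weight) edges."""
    n = len(dist)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge linking each node to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, dist[parent[u]][u]))
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
                parent[v] = u
    return edges

def select_features(columns, labels, cut=0.5):
    """Cluster features via an MST, then keep one representative per cluster.

    `cut` is an assumed threshold: MST edges with distance above it are removed.
    """
    n = len(columns)
    # Distance between features: 1 - |correlation| (assumption, see lead-in).
    dist = [[1.0 - abs(pearson(columns[i], columns[j])) for j in range(n)]
            for i in range(n)]
    root = list(range(n))   # union-find over the surviving MST edges

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]
            i = root[i]
        return i

    for i, j, w in mst_edges(dist):
        if w <= cut:
            root[find(i)] = find(j)
    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    # Representative: the feature most correlated with the class label.
    return sorted(
        max(members, key=lambda i: abs(pearson(columns[i], labels)))
        for members in clusters.values()
    )

# Toy data: six samples, three hypothetical genes (one list per gene).
labels = [0, 0, 0, 1, 1, 1]
genes = [
    [1.0, 1.0, 1.0, 3.0, 3.0, 3.0],   # tracks the class label
    [2.1, 2.3, 1.9, 6.2, 6.0, 5.8],   # nearly redundant with the first gene
    [5.0, 1.0, 4.0, 2.0, 5.5, 1.5],   # weakly related to the other two
]
selected = select_features(genes, labels)
print(selected)
```

In this toy example the first two genes fall into one cluster and only one of them is kept, which is the sense in which the selected subset is "independent". A fuller treatment would also discard features with low relevance to the label before clustering; that step is omitted here.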


Keywords


Supervised Learning, Fuzzy Transductive Support Vector Machine, Knowledge Discovery Of Data Process, Pattern Evaluation, Gene Expression, Microarray.

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.