
Adaptive Model to Estimate Most Significant Features for Oversampling Medical Data

Shereen A. Taie, Ahmed Elazab

Abstract


Computerized classification plays an important role in determining cancer stage, and there is a growing need for automatic classification of cancer data. In this paper, a new model for stage classification of cancer data is developed. The proposed model uses a decision tree data mining technique, trained on medical data, to classify cancer data into several stages based on the weights of the most significant features. To address the scarcity of medical data, the paper also proposes a bootstrapping approach for oversampling the data. Finally, the proposed model improves the gain ratio technique to predict the weights of the factors (attributes) that affect the staging of each patient case, before and after oversampling. The performance of the proposed model is evaluated with the aim of developing more cost-effective and easy-to-use systems that support clinicians. The experimental results show that the precision of the proposed model is 94% on the original dataset and 90% on the oversampled dataset. These results illustrate the promise of the model for detecting breast cancer stages from a minimal dataset and a minimal set of attributes.
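The two building blocks named in the abstract, gain ratio and bootstrap oversampling, can be illustrated with a minimal sketch. It shows the standard textbook gain ratio (information gain divided by split information) and plain resampling with replacement, not the authors' improved variant, whose details are not given here. The function names, the toy attribute values, and the stage labels are all hypothetical.

```python
import math
import random
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(items)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(items).values())


def gain_ratio(values, labels):
    """Standard gain ratio of one categorical attribute w.r.t. class labels."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Expected label entropy after splitting on the attribute.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    # Split information penalizes attributes with many distinct values.
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0


def bootstrap_oversample(rows, n_extra, seed=0):
    """Append n_extra rows drawn with replacement from the original rows."""
    rng = random.Random(seed)
    return list(rows) + [rng.choice(rows) for _ in range(n_extra)]


# Hypothetical toy data: one attribute (tumor size) and a stage label.
rows = [("large", "III"), ("large", "III"), ("small", "I"), ("small", "I")]
oversampled = bootstrap_oversample(rows, n_extra=4)

for name, data in (("original", rows), ("oversampled", oversampled)):
    sizes = [r[0] for r in data]
    stages = [r[1] for r in data]
    print(name, len(data), round(gain_ratio(sizes, stages), 3))
```

Comparing the attribute weight before and after resampling, as in the loop above, mirrors the before/after comparison described in the abstract. On real data the attributes would be ranked by these weights before the decision tree is trained.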

Keywords


Classification, Cancer Data, Decision Trees, Gain Ratio, Oversampling.
