Construction of Complete Data Set from Incomplete Data Sets using Statistical Method like Attribute Relation Analysis, Principal Component Analysis and their Comparison for Accuracy

Virendra V. Dakhode, A.B. Bagwan, P.K. Deshmukh

Abstract


Nowadays, complete data sets are required for various data mining tasks, yet incomplete data sets have become almost ubiquitous in a wide variety of application domains. The incompleteness in these data sets may arise from a number of factors: in some cases it may simply reflect certain measurements not being available at the time; in others the information may be lost due to partial system failure; or it may be a result of users being unwilling to specify attributes due to privacy concerns. When a significant fraction of the entries are missing in all of the attributes, it becomes very difficult to perform any kind of reasonable extrapolation on the original data. For such cases, we introduce the novel idea of conceptual reconstruction, in which we create effective conceptual representations on which data mining algorithms can be directly applied. The attraction of conceptual reconstruction is that it uses the correlation structure of the data to express it in terms of concepts rather than the original dimensions. As a result, the reconstruction procedure estimates only those conceptual aspects of the data that can be mined from the incomplete data set, rather than forcing errors created by extrapolation. We demonstrate the effectiveness of the approach on a variety of real data sets.
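The idea of expressing incomplete records in terms of concepts rather than original dimensions can be illustrated with a small sketch. The code below is not the authors' implementation; it is a minimal illustration under stated assumptions: missing entries are marked with None, means and covariances are estimated only from the entries that are present, a single concept is taken as the leading direction of that covariance matrix (found by power iteration), and each record is projected onto it using only its available attributes. All function names and the toy data are hypothetical.

```python
import math

def column_means(rows):
    """Mean of each attribute, computed over the entries that are present."""
    means = []
    for j in range(len(rows[0])):
        seen = [r[j] for r in rows if r[j] is not None]
        means.append(sum(seen) / len(seen))
    return means

def pairwise_covariance(rows, means):
    """Covariance of each attribute pair, using only records where both are present."""
    d = len(means)
    cov = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(d):
            pairs = [(r[i] - means[i], r[j] - means[j])
                     for r in rows if r[i] is not None and r[j] is not None]
            if pairs:
                cov[i][j] = sum(a * b for a, b in pairs) / len(pairs)
    return cov

def leading_direction(cov, iterations=200):
    """Approximate the dominant eigenvector of cov by power iteration."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v

def concept_coordinate(row, means, direction):
    """Project one record onto the concept, skipping its missing attributes."""
    return sum((x - m) * v
               for x, m, v in zip(row, means, direction) if x is not None)

# Toy data (hypothetical): three strongly correlated attributes with gaps.
rows = [
    [1.0, 2.0, None],
    [2.0, None, 6.0],
    [3.0, 6.0, 9.0],
    [None, 8.0, 12.0],
    [5.0, 10.0, 15.0],
]

means = column_means(rows)
cov = pairwise_covariance(rows, means)
direction = leading_direction(cov)
coords = [concept_coordinate(r, means, direction) for r in rows]
```

Each record ends up with one concept coordinate even though no missing value was ever filled in. Note that a pairwise-estimated covariance matrix is not guaranteed to be positive semidefinite, and that records with many missing attributes get projections of smaller magnitude; a fuller treatment would need to account for both.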

Keywords


Weka Workbench, Attribute Relational Analysis (ARA), Principal Component Analysis (PCA), Data Sets

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.