
Hybrid Approaches for Outlier Detection and their Comparative Analysis

R. Anju, L N B Srinivas

Abstract


Data mining is the discovery of significant, hidden, and interesting knowledge from large amounts of data. As information technologies develop, the number of databases and their dimensionality and complexity grow rapidly, creating a need for automated analysis of large amounts of heterogeneous structured information. Data mining systems arose to meet this need. One of the biggest problems these systems face today is detecting and removing outliers. Outliers arise from mechanical faults, changes in system behavior, fraudulent behavior, network intrusions, or human error. This paper discusses hybrid approaches to outlier detection in data mining systems, and it discusses and experiments with different clustering-based and distance-based algorithms.


Keywords


Data Mining, Outlier Detection, Outliers, Clustering Algorithm, Distance Based Algorithm






Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.