
Analysis of Clustering Algorithm for Outlier Detection in Data Stream

H.P. Jani, I.K. Rajani

Abstract


Outlier detection is an important data mining task that aims to discover elements which deviate significantly from the expected behavior. In a data stream, an object that does not follow the behavior of normal data objects is called an outlier. Data stream mining poses particular challenges for outlier detection, such as concept drift, the huge volume of data, and the evolving nature of streaming data. Clustering techniques for data streams group similar data items together, and they can also be used to detect the outliers in a stream; this is called cluster-based outlier detection. It offers advantages such as lower memory requirements, lower time consumption, and exact identification of outliers. We propose a new framework for outlier detection in data streams that combines a neighbour-based outlier detection approach with a clustering-based approach, and that gives better output in terms of the true outliers detected from data streams.
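The abstract describes the combined framework only in words. The following is a minimal sketch of one plausible way a clustering phase and a neighbour-based phase might be chained over a stream; it is not the authors' algorithm. Several assumptions are made for illustration: the stream is processed in fixed-size chunks; k-means with deterministic farthest-first seeding stands in for the unspecified clustering step; a point becomes a *candidate* if it sits in a very small cluster or lies unusually far from its centroid, and a candidate is *confirmed* as an outlier only if it has fewer than a given number of neighbours within a given radius. The function names and the parameters `chunk_size`, `k`, `alpha`, `min_cluster_size`, `radius`, and `min_neighbors` are all hypothetical.

```python
import math
from typing import Iterable, Iterator, List, Sequence, Tuple

Point = Tuple[float, ...]


def farthest_first_init(points: Sequence[Point], k: int) -> List[Point]:
    """Deterministic seeding (assumption): begin with the first point, then
    repeatedly add the point farthest from every centroid chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def kmeans(points: Sequence[Point], k: int, iters: int = 20) -> Tuple[List[Point], List[int]]:
    """Plain Lloyd's k-means; returns (centroids, labels)."""
    centroids = farthest_first_init(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            updated.append(tuple(sum(xs) / len(members) for xs in zip(*members)) if members else centroids[c])
        if updated == centroids:
            break  # converged
        centroids = updated
    return centroids, labels


def chunk_outliers(chunk: Sequence[Point], k: int = 2, alpha: float = 2.0,
                   min_cluster_size: int = 3, radius: float = 1.5,
                   min_neighbors: int = 3) -> List[int]:
    """Return the indices (within `chunk`) of points flagged as outliers."""
    centroids, labels = kmeans(chunk, k)
    dists = [math.dist(p, centroids[lab]) for p, lab in zip(chunk, labels)]
    sizes = [labels.count(c) for c in range(k)]
    mean_dist = [
        sum(d for d, lab in zip(dists, labels) if lab == c) / sizes[c] if sizes[c] else 0.0
        for c in range(k)
    ]
    flagged = []
    for i, (p, lab) in enumerate(zip(chunk, labels)):
        # Phase 1 (clustering-based, hypothetical rule): a point is a candidate
        # if its cluster is tiny or it lies far from its cluster centroid.
        small = sizes[lab] < min_cluster_size
        far = dists[i] > alpha * mean_dist[lab]
        if not (small or far):
            continue
        # Phase 2 (neighbour-based, distance-based rule): confirm the candidate
        # only if fewer than `min_neighbors` other points lie within `radius`.
        neighbours = sum(1 for j, q in enumerate(chunk) if j != i and math.dist(p, q) <= radius)
        if neighbours < min_neighbors:
            flagged.append(i)
    return flagged


def stream_outliers(stream: Iterable[Sequence[float]], chunk_size: int = 100, k: int = 2,
                    **params) -> Iterator[Tuple[int, Point]]:
    """Consume the stream in fixed-size chunks (assumption) and yield
    (global_index, point) for every confirmed outlier."""
    chunk: List[Point] = []
    offset = 0
    for point in stream:
        chunk.append(tuple(point))
        if len(chunk) == chunk_size:
            for i in chunk_outliers(chunk, k=k, **params):
                yield offset + i, chunk[i]
            offset += len(chunk)
            chunk = []
    # Flush a final partial chunk if it holds enough points to form k clusters.
    if len(chunk) >= k:
        for i in chunk_outliers(chunk, k=k, **params):
            yield offset + i, chunk[i]


if __name__ == "__main__":
    cluster_a = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
                 (0, 0.5), (0.5, 0), (1, 0.5), (0.5, 1), (0.25, 0.25)]
    cluster_b = [(x + 10, y + 10) for x, y in cluster_a]
    stream = cluster_a + cluster_b + [(40, 40)]
    print(list(stream_outliers(stream, chunk_size=21)))  # [(20, (40, 40))]
```

In the usage example the isolated point ends up alone in its own cluster, so the clustering phase marks it as a candidate, and the neighbour phase confirms it because no other point lies within the radius. Points in the two dense groups are never confirmed because each has nine neighbours within the radius. The second phase is what keeps a point that merely sits far from a centroid, but inside a dense region, from being reported.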






Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.