An Efficient Approach to Detect Road Traffic Risk Analysis Using K-Means Algorithm

A. Gnanabaskaran; Dr. K. Duraiswamy

An Efficient Approach to Detect Road Traffic Risk Analysis Using K-Means Algorithm

A. Gnanabaskaran, Dr. K. Duraiswamy

Abstract

Clustering low dimensional spatial data for traffic riskanalysis has been major issue due to sparsity of data points.. Most of mthe clustering algorithm becomes inefficient if the required distance similarity measure is computed for low dimensional spatial space of high dimensional data with sparsity of data point along different dimensions. objective of this study were to contribute the complexity of projecting clusters for traffic risk analysis, (i) There is no support for minimizing the number of dimensions on spatial space in order to reduce the searching time (ii) Comparison of computation time of HARP ,Proclus, Doc, FastDoc, SSPC algorithms. During the first phase the satellite captured still images for different dimensions network are enhanced and this images are given as input to second phase spatial attribute relevance analysis for detecting dense and sparse regions after detecting dense and sparse regions the algorithm employees pruning technique to reduce the search space by taking only dense traffic regions and eliminating sparse traffic region and during third phase K-means algorithm is employed to project the clusters on different spatial dimensions. As per the Results first we showed that various projecting clustering algorithm on spatial space becomes inefficient if the number of dimensions increases .The new scheme proposed reduces the spatial dimension space so that it reduces the computation time and finally the result is compared with HARP ,Proclus,Doc,FastDoc,SSPC The algorithms produces acceptable results when the average cluster dimensionality is greater then 10%. Hence as for as Conclusion the findings suggested the overhead reasonably minimized and using simulations, we investigated the efficiency of our schemes in supporting low dimensional spatial clustering for traffic risk analysis.

Keywords

Data Mining, Clustering, Low Dimensions, Projected Clustering

Full Text:

PDF

References

R.Agrawal, J.Gehrke, D.Gunopulos and P.Raghavan, “Automatic Subspace Clustering of High Dimensional Data,” Data Mining and Knowledge Discovery, vol.11, no.1, pp.5-33, 2005.doi:10.1109/TKDE.2008.224.

A. K. Jain, M. N. Mutry, and P.J. Flynn, “Data Clustering: A

Review,”ACM Computing Surveys, vol. 31, no. 3, pp. 264–323, 1999.doi :10.1186/1471-2105-7S4-S10 3.

K. Beyer, J. Goldstein, R.Ramakrishan, and U. Shaft, “When Is Nearest Neighbor Meaningful?,” Proc. of the 7th International Conference on Database Theory, pp.217–235, 1999. doi:10.1145/1835449.1835482.

H. Liu and L. Yu, “Toward Integrating Feature Selection Algorithms for Classification and Clustering,” IEEE Trans.Knowledge and Data Eng., vol. 17, no. 3,pp. 1–12, 2005.doi: 10.1109/TKDE.2005.66

C.C. Aggarwal, C. Procopiuc, J. L. Wolf, P.S. Yu, and J.S. Park, “Fast Algorithm for Projected Clustering,” Proc. ACM SIGMOD Conf.,pp.61–72,1999.doi:10.1109/ICDE.2009 .188. 1152

K.Y.L. Yip, D. W. Cheung, M. K.Ng and K. Cheung, “Identifying Projected Clusters from Gene Expression Profiles,” Journal of Biomedical Informatics, vol. 37, no. 5, pp. 345–357, 2004.doi:10.1109/TKDE. 2008.162

K.Y.L. Yip, D.W. Cheng and M.K.Ng,“On Discovery of Extremely Low-Dimensional Clusters using Semi-Supervised Projected Clustering,” Proc. ICDE, pp. 329– 340, 2005.

doi.ieeecomputersociety.org/10.1109/ICDE.2005.96.

C.C. Aggarwal and P.S. Yu,“Redefining Clustering for High

Dimensional Applications,” IEEE Trans. Knowledge and Data Eng., vol. 14, no. 2, pp. 210–225, 2002. doi:10.1023/A:1009769707641.

K.Y.L. Yip, D.W. Cheng and .K. Ng, “HARP: A Practical Projected Clustering Algorithm,” IEEE Trans. Knowledge and Data Eng., vol. 16,No. 11, pp. 1387–1397, 2004. doi: 10.1186/1471-2148- 8-116.

C.M. Procopiuc, M. Jones, P.K. Agarwal, and T.M. Murali, “Monte Carlo Algorithm for Fast Projective Clustering,” Proc.ACM SIGMOD,pp. 418 – 427, 2002. doi:10.1109/TKDE.2008.224.

M. Lung and N. Mamoulis, “Iterative Projected Clustering by Subspace Mining,” IEEE Trans. Knowledge and Data Eng., vol. 17,no.2,pp.176- 189,Feb.2005.doi:10.1109/TKDE.2005.29.

E.K.K. Ng, A.W. Fu, and R.C. Wong, “Projective Clustering by Histograms,” IEEE Trans. Knowledge and Data Eng., vol.17,no.3,pp.369-383,Mar.2005.doi:10.1109/ TKDE . 2005.47.

M. Bouguessa, S. Wang, and Q. Jiang, “A K-Means-Based Algorithm for Projective Clustering,” Proc. 18th IEEE Int’l Conf. Pattern Recognition (ICPR ’06), pp. 888-891,2006.

doi:10.1109/TKDE.2008.162

C.H. Cheng, A.W. Fu, and Y. Zhang, “Entropy-Based Subspace Clustering for Mining Numerical Data,” Proc. ACM

SIGMOD’99,pp.84-93,1999.doi:10.1109 /TKDE.2008.224 ...

S. Goil, H. Nagesh, and A. Choudhary, “MAFIA: Efficient and Scalable Subspace Clustering for Very Large Data Sets,” Technical Report CPDC-TR-9906-010, Northwestern,univ,1999.doi:10.1234/12345678.

K. Kailing, H.-P. Kriegel, and P. Kroger, “Density-Connected Subspace Clustering for High-Dimensional Data,” Proc. Fourth SIAM Int’l Conf. Data Mining (SDM ’04), pp. 246-257,2004.doi:10.1109/TKDE.2008.224.

L. Parsons, E. Haque, and H. Liu, “Subspace Clustering for High Dimensional Data: A Review,” ACM SIGKDD Explorations Newsletter,vol. 6, no. 1, pp. 90-105,2004.

Doi.ieeecomputersociety.org/10.1109/TKDE.2006.106.

K.Y.L. Yip, “HARP: A Practical Projected Clustering Algorithm for Mining Gene Expression data,” master’s thesis, The Univ.ofHongKong,2004.doi:10.1109/TKDE.

74.

K. Bury, Statistical Distributions in Engineering. Cambridge Univ.Press, 1998. doi: 10.1002/(SICI)1521

N. Balakrishnan and V.B. Nevzorov, A Primer on Statistical Distributions. John Wiley & Sons, 2003.

R.V. Hogg, J.W. McKean, and A.T. Craig, Introduction to Mathematical Statistics, sixth ed. Pearson Prentice Hall, 2005.

J.F. Lawless, Statistical Models and Methods for Lifetime Data. John Wiley & Sons, 1982.

M. Bouguessa, S. Wang, and H. Sun, “An Objective Approach to Cluster Validation,” Pattern Recognition Letters, vol. 27, no. 13, pp.1419-1430, 2006.

J.J. Oliver, R.A. Baxter, and C.S. Wallace, “Unsupervised Learning Using MML,” Proc. 13th Int’l Conf. Machine Learning (ICML ’96),pp.364-3721996.doi.ieeecomputersociety.org /10. 1109 /3.

G. Schwarz, “Estimating the Dimension of a Model,” Annals of Statistics, vol. 6, no. 2, pp.461464,1978.Doi:10.1214/aos/1176344136

A. Dempster, N. Laird, and D. Rubin, “Maximum Likelihood from Incomplete Data via the EM Algorithm,” J. Royal Statistical Soc. (Series B), vol. 39, pp. 1-37, 1977.doi:10.1234/12345678.

M.A.T. Figueiredo and A.K. Jain, “Unsupervised Learning of Finite Mixture Models,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 24, no. 3, pp. 381-396, Mar.

doi:10.1364/AO.47.006904

G. McLachlan and T. Krishnan, The EM Algorithm and Extensions.John Wiley & Sons, 1997.

J.C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum, 1981.

J. Rissanen, Stochastic Complexity in Statistical Inquiry. World Scientific, 1989.

S. Breunig, H.-P. Kriegel, R. Ng, and J. Sander, “LOF: Identifying Density-Based Local Outliers,” Proc. ACMSIGMOD ’00, pp. 93-104, 2000.

J. Han and M. Kamber, Data Mining, Concepts and Techniques. Morgan Kaufman, 2001.

F. Angiulli and C. Pizzuti, “Outlier Mining in Large High- Dimensional Data Sets,” IEEE Trans. Knowledge and Data Eng., vol. 17, no. 2, pp.369-383,Feb.2005.doi:10.1109/TKDE. 2005 .31.

E.M. Knorr, R.T. Ng, and V. Tucakov, “Distance-Based

Outliers:Algorithms and Applications,” The VLDB J., vol. 8, nos. 3/4,pp. 237-253, 2000.

T. Li, “A Unified View on Clustering Binary Data,” Machine Learning,vol. 62, no.3,pp.199-215, 2006 .doi :10.1006 /jmva.1997.1687.

Refbacks

There are currently no refbacks.

This work is licensed under a Creative Commons Attribution 3.0 License.

Username
Password
Remember me