Applicability of Concurrent Data Structures for Data Intensive Data Mining Applications in Cloud Computing Environment

Asheesh Dixit, Akanksha Kherdikar Kurlekar

Abstract


Data mining is fast becoming a pervasive technology that is poised to touch all aspects of our lives, mainly because the world's data is growing at an unprecedented rate. Decision making today is more data-centric and complex than ever before, and the computational requirements of such complex, data-intensive decision support systems are increasing rapidly as well. These data mining applications are sequential in nature and are predominantly run in-house, even though the underlying architecture is multi-core. Data mining applications can leverage that multi-core infrastructure if they are built on concurrent data structures, and the infrastructure itself can readily be provided in a cloud computing environment through IaaS. This paper attempts to develop algorithms that overcome the drawbacks of sequential data structures and provide parallelization by using concurrent data structures, which should help improve the performance of data mining applications. The suggested applications and algorithms form an approach to gathering, managing, storing and presenting huge volumes of data, both locally and in a cloud computing environment.
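To make the contrast between sequential and concurrent data structures concrete, the following is a minimal sketch of one possible concurrent structure: a lock-striped counter that several worker threads update while scanning transactions for item support counts. It is not the authors' algorithm; the names `StripedCounter` and `count_items`, the stripe count and the sample transactions are all assumptions made for illustration. In CPython the global interpreter lock limits the actual speedup from threads, so the sketch shows the structure rather than a performance result.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class StripedCounter:
    """Hypothetical concurrent counter: keys are hashed onto stripes,
    and each stripe has its own lock and its own dictionary."""

    def __init__(self, stripes=16):
        self._locks = [threading.Lock() for _ in range(stripes)]
        self._maps = [{} for _ in range(stripes)]

    def add(self, key, amount=1):
        # Only the stripe that owns this key is locked, so threads
        # updating keys on different stripes do not block each other.
        i = hash(key) % len(self._locks)
        with self._locks[i]:
            self._maps[i][key] = self._maps[i].get(key, 0) + amount

    def snapshot(self):
        # Stripes hold disjoint key sets, so merging is a plain update.
        merged = {}
        for lock, part in zip(self._locks, self._maps):
            with lock:
                merged.update(part)
        return merged


def count_items(transactions, workers=4):
    """Count in how many transactions each item occurs, using
    several threads that share one StripedCounter."""
    counter = StripedCounter()

    def scan(chunk):
        for transaction in chunk:
            for item in set(transaction):  # count an item once per transaction
                counter.add(item)

    # Deal the transactions out to the workers round-robin.
    chunks = [transactions[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(scan, chunks))  # consume results to surface errors
    return counter.snapshot()


# Illustrative data only; not taken from the paper.
transactions = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["milk", "butter", "bread"],
    ["milk"],
]
support = count_items(transactions)
```

A sequential version would use a single dictionary and a single loop; the striped variant keeps the same interface while allowing updates to different stripes to proceed concurrently.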

Keywords


Cloud Computing (SaaS, PaaS, IaaS, DSaaS), Cluster Computing, Concurrent Data Structures, Data Intensive Data Mining, Grid Computing.

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.