Data Mining Systems Using Grid Computing
Distributing data and computation makes it possible to solve larger problems and to execute applications that are distributed in nature. The Grid is a distributed computing infrastructure that enables coordinated resource sharing within dynamic organizations consisting of individuals, institutions, and resources. The Grid extends the distributed and parallel computing paradigms by allowing resource negotiation and dynamic allocation, heterogeneity, and open protocols and services. Grid environments can be used for both compute-intensive tasks and data-intensive applications, as they offer resources, services, and data access mechanisms. Data mining algorithms and knowledge discovery processes are both compute- and data-intensive; therefore, the Grid can offer a computing and data management infrastructure for supporting decentralized and parallel data analysis. This paper discusses how Grid computing can be used to support distributed data mining. Grid-based data mining uses Grids as decentralized high-performance platforms on which to execute data mining tasks and knowledge discovery algorithms and applications. Here we outline some research activities in Grid-based data mining, discuss some challenges in this area, and sketch some promising future directions for developing Grid-based distributed data mining.
This work is licensed under a Creative Commons Attribution 3.0 License.