
Mining Tree Based Association Rules for XML Answering

S. Sakthi Nivetha, K. Suganya, J. Ragaventhiran

Abstract


Database research has concentrated on the Extensible Markup Language (XML) because its flexible hierarchical nature can be used to represent huge amounts of data. XML has no absolute, fixed schema, and documents may have an irregular and incomplete structure. Extracting information from semi-structured documents is a very hard task, and it will become more difficult as the amount of digital information available on the Internet grows. Querying an XML document directly can cause two problems: information overload, where too much data is included in the answer because the set of keywords specified for the search captures too many meanings, and information deprivation, where inappropriate keywords or a wrongly formulated query prevent the user from receiving the correct answer. In this work we describe an approach to mine Tree-based Association Rules (TARs) from XML documents. Such rules provide information on both the structure and the content of the XML document, and they can be stored in XML format for querying. The mined knowledge is used to provide quick, approximate answers to queries and information about structural regularities that can serve as data guides for document querying. We propose an algorithm that extends CMTreeMiner, which discovers closed and maximal frequent subtrees, to mine tree-based association rules from XML documents.
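To make the idea of a rule over XML structure and content concrete, the following is a minimal sketch, not the algorithm proposed in this work and not CMTreeMiner. It assumes a simplification: each XML record is reduced to its set of root-to-node label paths, a pattern is a set of such paths, and a rule "body implies head" is scored with the usual association-rule notions of support and confidence. The function names (`paths`, `count`, `rule_stats`) and the sample records are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET


def paths(xml_text):
    """Return the set of root-to-node label paths of one XML record.

    Assumption: a leaf's text is appended as a final '=text' step so that
    a pattern can constrain content as well as structure.
    """
    root = ET.fromstring(xml_text)
    found = set()

    def walk(node, prefix):
        here = prefix + (node.tag,)
        found.add(here)
        text = (node.text or "").strip()
        if text and len(node) == 0:
            found.add(here + ("=" + text,))
        for child in node:
            walk(child, here)

    walk(root, ())
    return found


def count(pattern, docs):
    """Number of records whose path set contains every path of the pattern."""
    return sum(1 for d in docs if pattern <= d)


def rule_stats(body, head, docs):
    """Support and confidence of the rule body => head.

    Assumption: the head pattern extends the body, as in a rule that adds
    nodes to an already observed structure.
    """
    assert body <= head, "head must contain the body"
    n_head = count(head, docs)
    n_body = count(body, docs)
    support = n_head / len(docs)
    confidence = n_head / n_body if n_body else 0.0
    return support, confidence


# Hypothetical sample records, invented for this sketch.
DOCS = [
    "<paper><author>Smith</author><topic>trees</topic></paper>",
    "<paper><author>Smith</author><topic>trees</topic></paper>",
    "<paper><author>Smith</author><topic>graphs</topic></paper>",
    "<paper><author>Jones</author><topic>trees</topic></paper>",
]
doc_paths = [paths(d) for d in DOCS]

body = {("paper", "author", "=Smith")}
head = body | {("paper", "topic", "=trees")}
support, confidence = rule_stats(body, head, doc_paths)
print(support, confidence)
```

In this toy collection the rule "papers by Smith are about trees" holds for two of the three Smith records, which is the kind of quick, approximate answer the mined rules are meant to supply. A real miner would match subtrees rather than path sets and would enumerate frequent patterns instead of testing a single hand-written rule.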


Keywords


Extensible Markup Language (XML), Approximate Query Answering, Data Mining, Intensional Information, Tree-Based Association Rules.


References


R. Agrawal and R. Srikant, “Fast Algorithms for Mining Association Rules in Large Databases,” Proc. 20th Int’l Conf. Very Large Data Bases, pp. 478-499, 1994.

T. Asai, K. Abe, S. Kawasoe, H. Arimura, H. Sakamoto, and S. Arikawa, “Efficient Substructure Discovery from Large Semi-Structured Data,” Proc. SIAM Int’l Conf. Data Mining, 2002.

Y. Xiao, J.F. Yao, Z. Li, and M.H. Dunham, “Efficient Data Mining for Maximal Frequent Subtrees,” Proc. IEEE Third Int’l Conf. Data Mining, pp. 379-386, 2003.

J.W.W. Wan and G. Dobbie, “Extracting Association Rules from XML Documents Using XQuery,” Proc. Fifth ACM Int’l Workshop Web

D. Braga, A. Campi, S. Ceri, M. Klemettinen, and P. Lanzi, “Discovering Interesting Information in XML Data with Association Rules,” Proc. ACM Symp. Applied Computing, pp. 450-454, 2003.

Y. Chi, Y. Yang, Y. Xia, and R.R. Muntz, “CMTreeMiner: Mining Both Closed and Maximal Frequent Subtrees,” Proc. Eighth Pacific-Asia Conf. Knowledge Discovery and Data Mining, pp. 63-73, 2004.

J. Paik, H.Y. Youn, and U.M. Kim, “New Method for Mining Association Rules from a Collection of XML Documents,” Proc. Int’l Conf. Computational Science and Its Applications, pp. 936-945, 2005.

M.J. Zaki, “Efficiently Mining Frequent Trees in a Forest: Algorithms and Applications,” IEEE Trans. Knowledge and Data Eng., vol. 17, no. 8, pp. 1021-1035, Aug. 2005.

E. Baralis, P. Garza, E. Quintarelli, and L. Tanca, “Answering XML Queries by Means of Data Summaries,” ACM Trans. Information Systems, vol. 25, no. 3, p. 10, 2007.

M. Mazuran, E. Quintarelli, and L. Tanca, “Data Mining for XML Query-Answering Support,” IEEE Trans. Knowledge and Data Eng., vol. 24, no. 8, Aug. 2012.




Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.