Open Access Open Access  Restricted Access Subscription or Fee Access

An Implementation of FP-Growth Algorithm for Software Specification Mining

R. Jeevarathinam, Dr. Antony Selvadoss Thanamani

Abstract


Specification mining is a machine learning approach for discovering formal specifications of the protocols that code must obey when interacting with an application program interface or abstract data type. Two major concerns in engineering software systems are high maintenance costs and reliability of systems. To reduce maintenance efforts, there is a need for automated tools to help software developers understand their existing code base. So, there is a need to extract specifications to aid program comprehension. In this paper a novel technique to efficiently mine software specifications, called FP_TraceMiner is proposed which mines software specifications from program execution traces. The FP-growth algorithm is currently one of the fastest approaches. To address the limitations of Apriori-like methods, a mining paradigm has been proposed, which uses FP-growth algorithm which transforms a database into FP-tree stored in main memory and then performs mining on that optimized FP-tree structure.

Keywords


Mining specifications, program execution traces, Apriori, FP_growth, frequent itemsets, frequent pattern.

Full Text:

PDF

References


G. Ammons, R. Bodik, and J. R. Larus. Mining specifications. In Proceedings of the 29th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 4–16, 2002.

J. Li, H. Li, L.Wong, J.Pei, and G. Dong. Minimum description Length principle: Generators are preferable to closed patterns. In AAAI, 2006.

J. Han and M. Kamber. Data Mining Concepts and Techniques. Morgan Kaufmann, 2001.

J. Pei, J.Han , B. Mortazavi-Asl, H.Pinto, Q.Chen, U. Dayal, and M.-C. Hsu. Prefixspan: Mining sequential patterns efficiently by prefix-projected pattern growth. In ICDE, 2001.

G. Ammons, D. Mandelin, R. Bodik, and J.R. Larus. Debugging temporal specifications with concept analysis. In Programming Language Design and Implementation, San Diego, California, June 2003.

J.Henkel and A. Diwan. Discovering algebraic specifications from java classes. In Proc. of the Euro. Conf. of Object Oriented Programming, 2003.

Johannes Henkel and Amer Diwan. A Tool for Writing and Debugging Algebraic Specifications, 2004.

X. Yan, J. Han, and R. Afshar. CloSpan: Mining closed sequential patterns in large datasets. In SDM, 2003.

Rajeev Alur and Pavol Gerny. Synthesis of Interface specifications for Java Classes, In Proc. Symposium on Principles of Programming Languages (POPL), pages 98-109, 2005.

J. Yang, D. Evans, D. Bhardwaj, T. Bhat, and M.Das. Perracotta: Mining temporal API rules from imperfect traces. In Proc. of Int. Conf. on Software Engineering, 2006.

Sujitkumar Chakrabarti, Y.N. Srikant. Specifications based Regression Testing using Explicit State Space Enumeration, April 2006.

D.Lo and S-C.Khoo. QUARK: Empirical assessment of automaton-based specifications miners. In Proc. Of Working Conf. on Reverse Engineering, 2006.

D. Lo and S-C. Khoo. SMArTIC: Toward building an accurate, robust and scalable specifications miner. In SIGSOFT FSE, 2006.

D. Lo, S. Maoz, and S.-C. Khoo. Mining modal scenario-based specifications from execution traces of reactive systems. In ASE, 2007.

D.Lo , S-C Khoo and Chao Liu. Mining Temporal Rules from Program Execution Traces, In proceedings of the 3rd International Workshop on Program Comprehension through Dynamic Analysis (PCODA'07). Vancouver, Canada. Oct 29, 2007.

D. Lo, S.-C. Khoo, and C. Liu. Efficient mining of iterative patterns for software specifications discovery. In Proc. of SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, 2007.

Mithun Acharya, Tao Xie, Jian Pei, and Jun Xu. Mining API Patterns as Partial Orders from Source Code: From Usage Scenarios to Specifications, September 2007.

D.Lo and S-C Khoo Software Specifications Discovery: A New Data mining Approach. In NSF NGDM, 2007.

D. Lo, J. Li, and S.-C. Khoo. Mining and ranking generators of sequential patterns. 2008.

N.Pasquier, Y.Bastide, R. Taouil and L.Lakhal. Discovering frequent closed itemsets for association rules. In ICDT, 1999.

Apache software foundations. Jakarta Commons/Net. http://jakarta.apache.org/commons/net/.

The Java hotspot performance engine architecture. http://java.sun.com/products/hotspot/whitepaper.html.

JRat. http://jrat.sourceforge.net.

Agrawal R., Imielinski T., Swami A: Mining Association Rules Between Sets of Items in Large Databases. Proc. of the 1993 ACM SIGMOD Conf. on Management of Data (1993).

Agrawal R., Srikant R.: Fast Algorithms for Mining Association Rules. Proc. of the 20th Int’l Conf. on Very Large Data Bases (1994).

Han J., Pei J.: Mining Frequent Patterns by Pattern-Growth: Methodology and Implications. SIGKDD Explorations, December 2000 (2000).

Han J., Pei J., Yin Y.: Mining frequent patterns without candidate generation. Proc. of the 2000 ACM SIGMOD Conf. on Management of Data (2000).

Imielinski T., Mannila H.: A Database Perspective on Knowledge Discovery. Communications of the ACM, Vol. 39, No. 11 (1996).

B. Goethals, "Memory Issues in Frequent Pattern Mining," in Proceedings of SAC'04. Nicosia, Cyprus: ACM, 2004.

B. Goethals, "Survey on Frequent Pattern Mining," vol. 2004. Helsinki, 2003, pp. 43.

M. J. Zaki and K. Gouda, "Fast Vertical Mining Using Diffsets," presented at 9th International Conference on Knowledge Discovery and Data Mining, Washington, DC, 2003.

R. Dass and A. Mahanti, "Frequent Pattern Mining in Real-Time – First Results," presented at TDM2004/ACM SIGKDD 2004, Seattle, Washington USA, 2004.

R. Dass and A. Mahanti, "An Efficient Technique for Frequent Pattern Mining in Real-Time Business Applications," presented at 38th IEEE Hawaii International Conference on System Sciences (HICSS 38), Big Island, 2005.

R. Agarwal, T. Imielinski, and A. Swami, "Mining Association Rules Between Sets of Items in Large Datasets," in Proceedings of the ACM SIGMOD Conference on Management of Data. Washington,D.C.: ACM, 1993, pp. 207-216.

M. J. Zaki, "Scalable Algorithms for Association Mining," IEEE Transactions on Knowledge and Data Engineering, vol. 12, pp. 372-390, 2000.

R.J.Bayard, “Efficiently Mining Long Patterns from Databases ,” In Proc. of the ACM SIGMOD Conference on Management of Data, pp. 85-93, 1998.

G.Grahne and J.Zhu, “High Performance Mining of Maximal Frequent Itemsets,” In Proc. of SIAM’03 Workshop on High Performance Data Mining, 2003.

G. Grahne and J. Zhu, “Efficiently Using Prefix-trees in MiningFrequent Itemsets,” In Proc. of the IEEE ICDM Workshop on Frequent Itemset Mining Implementations, 2003.

J.Pei, J.Han and R.Mao, “CLOSET: An Efficient Algorithm for Mining Frequent Closed Itemsets,” In Proc. of DMKD’00, 2000.

A.W.-C Fu., R.W.-W. Kwong and J.Tang, “Mining N-most Interesting Itemsets,” In Proc. of the ISMIS’00, 2000.

J. Han, J. Wang, Y. Lu and P. Tzvetkov, “Mining Top-k Frequent Closed Patterns without Minimum Support,” In Proc. of IEEE ICDM Conference on Data Mining, 2002.

R.Jeevarathinam and Antony Selvadoss Thanamani, “An Efficient Algorithm for Mining Software Specifications from Program Execution Trace” presented at International Conference on Sensors, Security, Software and Intelligent Systems (ISSSIS 2009).


Refbacks

  • There are currently no refbacks.


Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.