
Visualizing the Domain in 3-Dimension Using Semantic Clustering

Sanjay Madan, Purnima Ahuja, Shalini Batra

Abstract


Many approaches have been developed for understanding software source code, and most of them focus on the structural information of the program. This focus loses the crucial domain semantics contained in the text and symbols of the source code. To understand software as a whole, we need to enrich these approaches with conceptual insights gained from the domain semantics. This paper proposes mapping the domain to the code by applying information retrieval techniques to linguistic information, such as identifier names and comments in the source code. We introduce the concept of Semantic Clustering, and an algorithm that groups source artifacts based on how their synonymy and polysemy are related. After the clusters are detected, the program code is labeled automatically based on semantic similarity and is explored visually in a 3-dimensional format. The most important feature of this approach is that it works at the textual level of the source code, which makes it language independent. The approach correlates the semantics with structural information and applies at different levels of abstraction (e.g., packages, classes, methods).
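As an illustration of working at the textual level, the following minimal sketch extracts terms from identifiers and comments and compares source artifacts by their vocabulary. It is not the algorithm of this paper: it uses plain term-frequency vectors and cosine similarity in place of Latent Semantic Indexing, and the function names (`extract_terms`, `cosine_similarity`) and the three sample artifacts are hypothetical, chosen only for the example.

```python
import math
import re
from collections import Counter


def extract_terms(source_text):
    """Split source text into lowercase terms, breaking camelCase and snake_case."""
    terms = []
    # Take every run of letters and underscores; language syntax is ignored entirely.
    for token in re.findall(r"[A-Za-z_]+", source_text):
        for part in token.split("_"):
            # Break camelCase, e.g. "parseHttpRequest" into "parse", "Http", "Request".
            pieces = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", part)
            terms.extend(piece.lower() for piece in pieces)
    return terms


def cosine_similarity(terms_a, terms_b):
    """Cosine similarity between the term-frequency vectors of two term lists."""
    freq_a, freq_b = Counter(terms_a), Counter(terms_b)
    dot = sum(freq_a[t] * freq_b[t] for t in freq_a.keys() & freq_b.keys())
    norm_a = math.sqrt(sum(v * v for v in freq_a.values()))
    norm_b = math.sqrt(sum(v * v for v in freq_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical artifacts written in three different languages (illustration only).
documents = {
    "AccountService": "class AccountService { void openAccount(); void close_account(); } // bank account handling",
    "AccountReport": "def print_account_balance(account): pass  # account balance report",
    "HttpParser": "func parseHttpRequest(rawRequest string) {} // parse http request header",
}

terms = {name: extract_terms(text) for name, text in documents.items()}

if __name__ == "__main__":
    names = sorted(terms)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            score = cosine_similarity(terms[first], terms[second])
            print(f"{first} vs {second}: {score:.2f}")
```

Because only the text is inspected, the two account-related artifacts come out as more similar to each other than to the HTTP parser, even though each snippet is written in a different language. A full pipeline would additionally reduce the term-document matrix with Latent Semantic Indexing before clustering, which is what lets synonymous terms be related.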


Keywords


Information retrieval, Latent Semantic Indexing, Semantic clustering, Software reverse engineering






Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.