
Context-based Feature Extraction Technique – LSI vs LDA

A. M. Abirami, A. Askarunisa, T.S.B. Akshara, G. Prasannashree, K. Priyanga, K. Sarika


The Internet holds an enormous number of documents, and they need to be annotated for further processing. Customer reviews and product feedback are mostly analysed using text mining or text analytics techniques. Feature extraction plays a vital role in text analytics: it selects the most relevant features, which are then used for text processing. This research article focuses on the use of Latent Dirichlet Allocation (LDA) as a feature extraction technique and compares it with the prominent technique Latent Semantic Indexing (LSI).


Text Analytics, Feature Extraction, Latent Semantic Indexing (LSI), Latent Dirichlet Allocation (LDA), Document Categorization.







This work is licensed under a Creative Commons Attribution 3.0 License.