
Text Mining: State-of-the-Art and Research Directions

N. Venkata Sailaja, Dr. L. Padmasree, Dr. N. Mangathayaru

Abstract


The volume of unstructured text is growing exponentially. At its simplest, text is just a sequence of characters, but when such data exists in huge volumes, processing it by computer becomes a challenging task. Extracting meaningful and useful patterns from text therefore requires dedicated pre-processing methods and algorithms. In general, text mining is the process of extracting valuable information and knowledge from unstructured text, and discovering patterns in such text is a major research issue in data mining.
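As a minimal illustration of the pre-processing step described above, the sketch below lowercases a document, tokenizes it with a simple regular expression, removes stop words, and counts term frequencies to form a bag-of-words representation. The stop-word list and the function names `preprocess` and `term_frequencies` are hypothetical choices for illustration, not the authors' method, and regex tokenization is a deliberate simplification.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; practical systems use larger curated lists.
STOP_WORDS = {"a", "an", "and", "from", "in", "is", "of", "the", "to"}


def preprocess(text: str) -> list[str]:
    """Lowercase the text, split it into alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [token for token in tokens if token not in STOP_WORDS]


def term_frequencies(text: str) -> Counter:
    """Build a bag-of-words representation: pre-processed token -> occurrence count."""
    return Counter(preprocess(text))


if __name__ == "__main__":
    doc = "Text mining is the process of extracting knowledge from the text."
    print(preprocess(doc))
    print(term_frequencies(doc))
```

Stemming or lemmatization, which many pipelines apply after stop-word removal, is omitted here to keep the sketch dependency-free.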

In this survey, we discuss text mining, a young field that has evolved in recent years and draws on information retrieval, machine learning, statistics, computational data science, and advanced data mining. We also describe the main analysis tasks: pre-processing of the original text, text classification, text clustering, information extraction, classification techniques for text mining, and visualization. Finally, we discuss future challenges in this area, covering different techniques, possible improvements, and research directions.
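To make the text-classification task concrete, the following is a minimal sketch of a k-nearest-neighbour (kNN) classifier over bag-of-words vectors, using cosine similarity to find the most similar labelled documents. kNN is only one of several classification techniques used in text mining; the toy corpus `TRAINING`, its labels, the default `k=3`, and the helper names are hypothetical and chosen purely for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical toy training corpus of (document, label) pairs, for illustration only.
TRAINING = [
    ("the striker scored a late goal in the football match", "sports"),
    ("the team won the league after a penalty shootout", "sports"),
    ("the central bank raised interest rates again", "finance"),
    ("stock markets fell as bond yields rose", "finance"),
]


def vectorize(text: str) -> Counter:
    """Bag-of-words vector: lowercase alphabetic tokens mapped to counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def knn_classify(doc: str, training: list[tuple[str, str]], k: int = 3) -> str:
    """Label `doc` by majority vote among the k training documents most similar to it."""
    query = vectorize(doc)
    # Sort (similarity, label) pairs from most to least similar.
    scored = sorted(
        ((cosine(query, vectorize(text)), label) for text, label in training),
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    print(knn_classify("a late goal won the match", TRAINING))
    print(knn_classify("interest rates and bond yields", TRAINING))
```

Raw term counts keep the example short; TF-IDF weighting is a common alternative that down-weights terms appearing in many documents.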


Keywords


Text Mining, Preprocessing, Text Classification, Clustering, Machine Learning, Information Extraction.






This work is licensed under a Creative Commons Attribution 3.0 License.