Syntactical Knowledge based Stemmer for Automatic Document Summarization

Dipti Y Sakhare, Dr Raj Kumar

Abstract


With the rapid growth of data on the Internet, users are overloaded with huge amounts of information, which makes large volumes of documents difficult to access. Automatic text summarization is an important technique for analyzing high-volume text documents. Text summarization condenses the source text into a shorter version while preserving its information content and overall meaning. The proposed system generates a summary for a given input document by identifying and extracting the important sentences in the document. The model consists of four stages. In the first stage, the system decomposes the given text into its constituent sentences. The second stage removes stop words and stems the text. In the third stage, POS tags are assigned using dependency grammar. Finally, the sentences are ranked according to feature terms. This paper presents the work completed up to the stemming stage. The stemmer implemented here promises good results.
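
The staged pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the stop-word list, the suffix list, and the function names (`split_sentences`, `stem`, `preprocess`, `rank_sentences`) are hypothetical, the stemmer is a naive suffix stripper rather than the syntactical-knowledge-based stemmer of the paper, and the third stage (POS tagging with dependency grammar) is omitted because it requires a parser. Ranking is approximated here by summed stem frequency, which is only one possible reading of "feature terms".

```python
import re
from collections import Counter

# Illustrative stop-word list (an assumption, not the paper's list).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "in", "on",
              "to", "and", "for", "with", "it", "its", "this", "that"}

# Illustrative suffixes, longest first (an assumption, not the paper's rules).
SUFFIXES = ("ization", "ations", "ation", "ing", "ies", "ed", "es", "s")


def split_sentences(text):
    """Stage 1: split after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def stem(word):
    """Naive suffix stripping that leaves at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(sentence):
    """Stage 2: lowercase, tokenize, drop stop words, stem the rest."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]


def rank_sentences(text, top_n=2):
    """Stage 4 (simplified): score sentences by summed stem frequency."""
    sentences = split_sentences(text)
    stems_per_sentence = [preprocess(s) for s in sentences]
    # Document-wide frequency of every stem.
    freq = Counter(t for stems in stems_per_sentence for t in stems)
    scores = [sum(freq[t] for t in stems) for stems in stems_per_sentence]
    # Pick the top_n highest-scoring sentences, then restore document order.
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:top_n]
    return [sentences[i] for i in sorted(best)]


if __name__ == "__main__":
    sample = "Stemming reduces words. Stemmed words help ranking. Cats sleep."
    for sentence in rank_sentences(sample, top_n=2):
        print(sentence)
```

In this sketch, "Stemming" and "Stemmed" collapse to the same stem, so the two sentences that share it outrank the unrelated third one. A real system would replace the suffix list with a linguistically informed stemmer and add the POS-tagging stage before ranking.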


Keywords


Stemming, Sentence Ranking, Text Summarization

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.