
Sentence Boundary Detection Using Maximum Entropy Model

Tarun Dhar Diwan, Priti Verma, Dr. Kamal Mehta

Abstract


The sentence boundary detection system described here comprises three independent applications: rule-based, Hidden Markov Model (HMM), and Maximum Entropy. The Maximum Entropy Model is the central part of the system; it achieved an error rate of less than 2% on part of the Wall Street Journal (WSJ) Corpus using only eight binary features. The performance of the three applications is illustrated and discussed. Sentence Boundary Disambiguation (SBD) is the task of identifying the sentences within a paragraph or an article. Because the sentence is the basic textual unit immediately above the word and the phrase, SBD is an essential problem for many applications of Natural Language Processing, including parsing, information extraction, machine translation, and document summarization. The accuracy of the SBD system directly affects the performance of these applications. However, past research in this field has already achieved very high performance, and the area is no longer very active: the problem seems too simple to attract researchers' attention.
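To make the idea of a maximum entropy classifier over binary features concrete, the following is a minimal sketch, not the authors' system. The abstract does not list the eight features, so the four predicates, the abbreviation list, the label names, and the weights below are all hypothetical and hand-picked for illustration; in practice the weights would be estimated from annotated data (for example with Generalized Iterative Scaling, which the keywords mention), a step omitted here. The model computes p(label | context) as a normalized exponential of the summed weights of the features that fire.

```python
import math
import re

# Hypothetical abbreviation list and labels; not taken from the paper.
ABBREVIATIONS = {"mr", "mrs", "dr", "inc", "corp", "co"}
LABELS = ("boundary", "no_boundary")

# Hypothetical hand-set weights, keyed by (predicate, label).
# A trained model would learn these from a labelled corpus.
WEIGHTS = {
    ("prev_is_abbrev", "no_boundary"): 2.0,
    ("prev_is_single_letter", "no_boundary"): 1.5,
    ("next_is_capitalized", "boundary"): 1.0,
    ("next_is_end", "boundary"): 3.0,
}


def context_predicates(text, i):
    """Binary predicates describing the candidate '.' at index i."""
    match = re.findall(r"[A-Za-z]+$", text[:i])
    prev_word = match[0].lower() if match else ""
    rest = text[i + 1:].lstrip()
    return {
        "prev_is_abbrev": prev_word in ABBREVIATIONS,
        "prev_is_single_letter": len(prev_word) == 1,
        "next_is_capitalized": bool(rest) and rest[0].isupper(),
        "next_is_end": rest == "",
    }


def label_probabilities(text, i):
    """p(label | context) = exp(sum of active feature weights) / Z."""
    preds = context_predicates(text, i)
    scores = {
        label: sum(w for (name, lab), w in WEIGHTS.items()
                   if lab == label and preds[name])
        for label in LABELS
    }
    z = sum(math.exp(s) for s in scores.values())
    return {label: math.exp(s) / z for label, s in scores.items()}


def split_sentences(text):
    """Split at every '.' the model labels as a boundary."""
    sentences, start = [], 0
    for i, ch in enumerate(text):
        if ch == "." and label_probabilities(text, i)["boundary"] > 0.5:
            sentences.append(text[start:i + 1].strip())
            start = i + 1
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences
```

With these illustrative weights, `split_sentences("Mr. Smith left. He returned.")` keeps "Mr." attached to the first sentence, because the abbreviation feature outweighs the capitalization feature for that period.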

Keywords


Sentence Boundary Disambiguation, Maximum Entropy Model, Features, Generalized Iterative Scaling, Hidden Markov Model.



