A Novel Approach Based On Pattern Discovery and Supervised Learning to Identify Comparative Sentences

T. Viveka

doi:10.36039/AA052011004

Abstract

This paper studies the problem of identifying comparative sentences in text documents. The problem is related to, but quite different from, sentiment/opinion sentence identification or classification. Sentiment classification classifies a document or a sentence according to the subjective opinion of its author. An important application area of sentiment/opinion identification is business intelligence, since a product manufacturer always wants to know consumers’ opinions of its products. Comparisons, on the other hand, can be subjective or objective. Furthermore, a comparison is not concerned with an object in isolation; it compares the object with others. An example opinion sentence is “the sound quality of CD player X is poor”. An example comparative sentence is “the sound quality of CD player X is not as good as that of CD player Y”. Clearly, these two sentences convey different information, and their language constructs are quite different too. Identifying comparative sentences is also useful in practice because direct comparisons are perhaps one of the most convincing forms of evaluation, and may be even more important than opinions on each individual object. This paper first categorizes comparative sentences into different types and then presents a novel approach that integrates pattern discovery and supervised learning to identify comparative sentences in text documents. The study also indicates that lexicons built using semi-supervised methods, such as SentiWordNet, can be an important resource in sentiment classification tasks. Considerations for future improvements are presented, based on a detailed analysis of the classification results.
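To make the idea of combining patterns with a supervised learner concrete, the following is a minimal sketch only, not the method of the paper. It assumes two hand-written surface patterns (the abstract does not list the discovered patterns), a tiny invented training set that reuses the two example sentences above, and a Bernoulli naive Bayes classifier chosen purely for illustration. All names (`PATTERNS`, `features`, `train`, `predict`, `TOY_DATA`) are hypothetical.

```python
import math
import re
from collections import defaultdict

# Hypothetical surface patterns, written by hand for illustration only.
PATTERNS = {
    # "as good as", "as fast as", ...
    "as_adj_as": re.compile(r"\bas\s+\w+\s+as\b"),
    # "better than", "more reliable than", "longer battery life than", ...
    "comparative_than": re.compile(r"\b(\w+er|more|less)\b(\s+\w+){0,3}\s+than\b"),
}


def features(sentence):
    """Map a sentence to binary features: does each pattern occur in it?"""
    text = sentence.lower()
    return {name: bool(p.search(text)) for name, p in PATTERNS.items()}


def train(examples):
    """Count classes and pattern occurrences per class (label True = comparative)."""
    class_counts = defaultdict(int)
    feat_counts = defaultdict(lambda: defaultdict(int))
    for sentence, label in examples:
        class_counts[label] += 1
        for name, present in features(sentence).items():
            if present:
                feat_counts[label][name] += 1
    return class_counts, feat_counts


def predict(model, sentence):
    """Return the label with the highest Bernoulli naive Bayes log-score."""
    class_counts, feat_counts = model
    total = sum(class_counts.values())
    feats = features(sentence)
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)
        for name, present in feats.items():
            # Laplace smoothing keeps every probability strictly inside (0, 1).
            p = (feat_counts[label][name] + 1) / (n + 2)
            score += math.log(p if present else 1 - p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy data; the first and fourth sentences are the abstract's examples.
TOY_DATA = [
    ("the sound quality of CD player X is not as good as that of CD player Y", True),
    ("camera A is better than camera B", True),
    ("phone X has a longer battery life than phone Y", True),
    ("the sound quality of CD player X is poor", False),
    ("I bought this camera last week", False),
    ("the screen is bright and sharp", False),
]

model = train(TOY_DATA)
```

With this toy model, `predict(model, "laptop A is faster than laptop B")` labels the sentence as comparative, while a sentence that matches neither pattern falls back to the non-comparative class. Patterns learned from data and richer features would be needed for anything beyond illustration.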

Keywords

Comparative Sentences, WordNet, Sentiment Classification, Text Mining, Subjectivity Detection, SentiWordNet.

This work is licensed under a Creative Commons Attribution 3.0 License.
