
A Survey on Authorship Attribution Issues of Arabic Text

Abeer H. El Bakly, Nagy Ramadan Darwish, Hesham A. Hefny

Abstract


Authorship attribution is an active research area that covers many problems: determining whether a specific text belongs to a specific author, resolving competing authorship claims over a disputed work, discriminating between the writing styles of two or more authors, identifying the most probable author of an unknown text, and studying how authors' styles differ by gender, political view, religion, education, occupation, motivation, and so on. Many attempts have been made to solve these problems using statistical methods such as Naïve Bayes and other Bayesian methods. More recently, work in this domain has turned to artificial intelligence techniques such as machine learning and natural language processing. This paper presents a literature review of the use of machine learning techniques in authorship attribution. It also covers the main approaches to recent issues in Arabic authorship attribution.

Keywords


Arabic Text, Artificial Intelligence, Authorship Attribution, Machine Learning, Stylometric


