     Research Journal of Applied Sciences, Engineering and Technology


Role of Text Mining in Detection of Plagiarism in Arabic Texts: An Architectural Perspective

Abdullah Al Hussein
College of Computer and Information Sciences, Majmaah University, P.O. Box 66, Al Majmaah 11952, Kingdom of Saudi Arabia
Research Journal of Applied Sciences, Engineering and Technology, 2016, 4: 277-282
http://dx.doi.org/10.19026/rjaset.13.2943  |  © The Author(s) 2016
Received: June 8, 2015  |  Accepted: April 22, 2016  |  Published: August 15, 2016

Abstract

The aim of this study is to design a text mining tool for plagiarism detection in Arabic documents. Plagiarism is an act or instance of using or closely imitating the language and thoughts of another author without authorization and the representation of that author's work as one's own, as by not crediting the original author (El-Matarawy et al., 2013). Detecting plagiarism in Arabic-language documents is difficult because of the complex linguistic structure of the language. Text mining, a branch of data mining, plays an important role in natural language processing. In this study, we present a plagiarism detection architecture that compares Arabic texts to identify similarities using text mining methods. A new text mining algorithm, the Text Mining Algorithm for Plagiarism Detection (TMA-PD), is proposed to generate tokens from a given Arabic document. A new framework that combines the established text mining methods Topic Tracking, Clustering and Concept Linkage is also proposed. The tool thus reduces the time spent on the preliminary stage of plagiarism detection. Because existing text mining tools cannot process Arabic documents, a new add-on will be developed and integrated into them. Software agents are also used to improve comparisons and to find more similar texts. The performance of this tool will be evaluated on a large data set of Arabic texts.
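
The abstract does not spell out the steps of TMA-PD, so what follows is only a minimal Python sketch of the kind of tokenization and comparison stage such an architecture needs; it is not the author's algorithm. It assumes three things the abstract does not state: Arabic text is normalized by removing diacritics and tatweel and by unifying alef, alef maqsura and ta marbuta variants; documents are compared as sets of word trigrams (shingles); and overlap is scored with the Jaccard coefficient, a common baseline. The function names (`normalize`, `tokenize`, `shingles`, `jaccard`, `similarity`) and the tiny stop-word list are hypothetical.

```python
import re
import unicodedata

# Assumption: short-vowel and related marks U+064B..U+0652, superscript alef
# U+0670 and tatweel (kashida) U+0640 carry no weight for matching, so they
# are stripped before comparison.
_MARKS = re.compile("[\u064B-\u0652\u0670\u0640]")


def normalize(text: str) -> str:
    """Return text with marks removed and common letter variants unified."""
    text = unicodedata.normalize("NFC", text)
    text = _MARKS.sub("", text)
    text = re.sub("[\u0622\u0623\u0625]", "\u0627", text)  # alef variants -> bare alef
    text = text.replace("\u0649", "\u064A")  # alef maqsura -> ya
    text = text.replace("\u0629", "\u0647")  # ta marbuta -> ha
    return text


# Hypothetical, deliberately tiny stop-word list, normalized the same way as
# the documents so that lookups match (a real system needs a far larger list).
_STOPWORDS = {normalize(w) for w in ("في", "من", "على", "إلى", "عن", "أن", "و")}


def tokenize(text: str) -> list[str]:
    """Split normalized text into word tokens and drop stop words."""
    return [t for t in re.findall(r"\w+", normalize(text)) if t not in _STOPWORDS]


def shingles(tokens: list[str], n: int = 3) -> set[tuple[str, ...]]:
    """Set of contiguous word n-grams; empty if there are fewer than n tokens."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|, defined as 0.0 for two empty sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity(doc1: str, doc2: str, n: int = 3) -> float:
    """Word n-gram Jaccard similarity between two Arabic texts."""
    return jaccard(shingles(tokenize(doc1), n), shingles(tokenize(doc2), n))


if __name__ == "__main__":
    original = "يهدف البحث إلى تصميم أداة للتنقيب عن النصوص"
    vowelled = "يَهْدِفُ البَحْثُ إِلَى تَصْمِيمِ أَدَاةٍ لِلتَّنْقِيبِ عَنِ النُّصُوصِ"
    unrelated = "الطقس جميل اليوم في المدينة"
    print(similarity(original, vowelled))   # 1.0: the texts differ only in diacritics
    print(similarity(original, unrelated))  # 0.0: no shared trigrams
```

Normalizing before tokenizing lets a fully vowelled copy of a sentence match its unvowelled source exactly, which a plain string comparison would miss. A practical system would likely add stemming or root extraction and a threshold for flagging suspicious pairs; both are left out of this sketch.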

Keywords:

Plagiarism, text mining


References

  1. Al Hussein, A., S. Venkataraman and R. Jayabrabu, 2014. Detection of plagiarism in Arabic texts using text mining: A software agent based approach. Proceeding of the 6th International Integrity and Plagiarism Conference. Newcastle Gateshead, UK.
  2. Alzahrani, S.M., N. Salim and A. Abraham, 2012. Understanding plagiarism linguistic patterns, textual features, and detection methods. IEEE T. Syst. Man Cy. C, 42(2): 133-149.
  3. Barrón-Cedeño, A., P. Gupta and P. Rosso, 2013. Methods for cross-language plagiarism detection. Knowl-Based Syst., 50: 211-217.
  4. Bensalem, I., P. Rosso and S. Chikhi, 2012. Intrinsic plagiarism detection in Arabic text: Preliminary experiments. Proceeding of the 2nd Spanish Conference on Information Retrieval (CERI-2012). Valencia, Spain.
  5. Bin-Habtoor, A.S. and M.A. Zaher, 2012. A survey on plagiarism detection systems. Int. J. Comput. Theor. Eng., 4(2): 185-188.
  6. El-Matarawy, A., M. El-Ramly and R. Bahgat, 2013. Plagiarism detection using sequential pattern mining. Int. J. Appl. Inform. Syst., 5(2): 24-29.
  7. Gupta, V. and G.S. Lehal, 2009. A survey of text mining techniques and applications. J. Emerg. Technol. Web Intell., 1(1): 60-76.
  8. Jadalla, A. and A. Elnagar, 2012. A Plagiarism Detection System for Arabic Text-based Documents. In: Chau, M. et al. (Eds.), Intelligence and Security Informatics. Lecture Notes in Computer Science, Springer-Verlag, Berlin, Heidelberg, 7299: 145-153.
  9. Jayabrabu, R., V. Saravanan and K. Vivekanandan, 2012a. Software agents paradigm in automated data mining for better visualization using intelligent agents. J. Theor. Appl. Inform. Technol., 39(2): 167-177. http://www.jatit.org/volumes/Vol39No2/9Vol39No2.pdf.
  10. Jayabrabu, R., V. Saravanan and K. Vivekanandan, 2012b. A framework: Cluster detection and multidimensional visualization of automated data mining using intelligent agents. Int. J. Artif. Intell. Appl., 3(1): 125-138.
  11. Kent, C.K. and N. Salim, 2010. Features based text similarity detection. J. Comput., 2(1): 53-57. https://arxiv.org/ftp/arxiv/papers/1001/1001.3487.pdf.
  12. Mechti, S., M. Jaoua and L.H. Belguith, 2013. A Framework for Plagiarism Detection Based on Author Profiling. Notebook for PAN at CLEF 2013. ANLP Research Group-MIRACL Laboratory, University of Sfax, Tunisia.
  13. Menai, M.E.B. and M. Bagais, 2011. APlag: A plagiarism checker for Arabic texts. Proceeding of the 6th International Conference on Computer Science and Education (ICCSE, 2011), pp: 1379-1383.
  14. Oberreuter, G. and J.D. Velásquez, 2013. Text mining applied to plagiarism detection: The use of words for detecting deviations in the writing style. Expert Syst. Appl., 40(9): 3756-3763. http://www.sciencedirect.com/science/article/pii/S0957417412013231.
  15. Osman, A.H., N. Salim, M.S. Binwahlan, R. Alteeb and A. Abuobieda, 2012. An improved plagiarism detection scheme based on semantic role labeling. Appl. Soft Comput., 12(5): 1493-1502. http://www.sciencedirect.com/science/article/pii/S1568494612000087.
  16. Rajan, J. and V. Saravanan, 2008. A framework of an automated data mining system using autonomous intelligent agents. Proceeding of the International Conference on Computer Science and Information Technology, pp: 700-704.
  17. Ramya, L. and R. Venkatalakshmi, 2013. Intelligent plagiarism detection. Int. J. Res. Eng. Adv. Technol., 1(1): 1-4.

Competing interests

The authors have no competing interests.

Open Access Policy

This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

ISSN (Online):  2040-7467
ISSN (Print):   2040-7459
Copyright © 2024. MAXWELL Scientific Publication Corp., All rights reserved