Research Article | OPEN ACCESS
Role of Text Mining in Detection of Plagiarism in Arabic Texts: An Architectural Perspective
Abdullah Al Hussein
College of Computer and Information Sciences, Majmaah University, P.O. Box 66, Al Majmaah 11952, Kingdom of Saudi Arabia
Research Journal of Applied Sciences, Engineering and Technology 2016 4:277-282
Received: June 8, 2015 | Accepted: April 22, 2016 | Published: August 15, 2016
Abstract
The aim of this study is to design a text mining tool for plagiarism detection in Arabic documents. Plagiarism is the act of using or closely imitating the language and thoughts of another author without authorization and representing that author's work as one's own, for example by not crediting the original author (El-Matarawy et al., 2013). Detecting plagiarism in Arabic-language documents is difficult because of the complex linguistic structure of the language. Text mining, a branch of data mining, plays an important role in natural language processing. In this study, we present a plagiarism detection architecture that compares Arabic texts and identifies similarities using text mining methods. A new text mining algorithm, the Text Mining Algorithm for Plagiarism Detection (TMA-PD), is proposed to generate tokens from a given Arabic document. A new framework that combines the familiar text mining methods of Topic Tracking, Clustering and Concept Linkage is also proposed. The tool thus reduces the time spent on the preliminary stage of plagiarism detection. Because present text mining tools lack the ability to process Arabic documents, a new add-on will be developed and integrated into them. Software agents are also used to improve comparisons and to find more similar texts. The performance of the tool will be evaluated on a large data set of Arabic texts.
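The abstract does not specify how TMA-PD generates tokens or how similarity is scored, so the following is only a minimal sketch of the general idea of token-based comparison of Arabic texts. The normalization rules (removing diacritics and tatweel, unifying alef variants), the word-bigram shingles, the Jaccard score, and all function names are assumptions chosen for illustration, not the method proposed in the paper.

```python
import re

# Assumption: common Arabic preprocessing steps, not the TMA-PD algorithm,
# whose details are not given in the abstract.
DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")    # tashkeel marks and tatweel
ALEF_VARIANTS = re.compile(r"[\u0622\u0623\u0625]")  # alef with madda / hamza
ARABIC_WORD = re.compile(r"[\u0621-\u064A]+")        # runs of Arabic letters


def tokenize(text):
    """Normalize the text and return its list of Arabic word tokens."""
    text = DIACRITICS.sub("", text)
    text = ALEF_VARIANTS.sub("\u0627", text)  # map variants to bare alef
    return ARABIC_WORD.findall(text)


def shingles(tokens, n=2):
    """Return the set of overlapping word n-grams in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity(doc_a, doc_b, n=2):
    """Score two documents by the overlap of their word n-gram sets."""
    return jaccard(shingles(tokenize(doc_a), n), shingles(tokenize(doc_b), n))
```

In this sketch, two passages that differ only in diacritics or alef spelling normalize to the same tokens and score as identical, while passages that share no word bigrams score zero. A real detector would likely need stemming or other morphological analysis, which is omitted here.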
Keywords:
Plagiarism, text mining
References
Al Hussein, A., S. Venkataraman and R. Jayabrabu, 2014. Detection of plagiarism in Arabic texts using text mining: A software agent based approach. Proceeding of the 6th International Integrity and Plagiarism Conference. Newcastle Gateshead, UK.
Alzahrani, S.M., N. Salim and A. Abraham, 2012. Understanding plagiarism linguistic patterns, textual features, and detection methods. IEEE T. Syst. Man Cy. C, 42(2): 133-149.
Barrón-Cedeño, A., P. Gupta and P. Rosso, 2013. Methods for cross-language plagiarism detection. Knowl-Based Syst., 50: 211-217.
Bensalem, I., P. Rosso and S. Chikhi, 2012. Intrinsic plagiarism detection in Arabic text: Preliminary experiments. Proceeding of the 2nd Spanish Conference on Information Retrieval (CERI-2012). Valencia, Spain.
Bin-Habtoor, A.S. and M.A. Zaher, 2012. A survey on plagiarism detection systems. Int. J. Comput. Theor. Eng., 4(2): 185-188.
El-Matarawy, A., M. El-Ramly and R. Bahgat, 2013. Plagiarism detection using sequential pattern mining. Int. J. Appl. Inform. Syst., 5(2): 24-29.
Gupta, V. and G.S. Lehal, 2009. A survey of text mining techniques and applications. J. Emerg. Technol. Web Intell., 1(1): 60-76.
Jadalla, A. and A. Elnagar, 2012. A Plagiarism Detection System for Arabic Text-based Documents. In: Chau, M. et al. (Eds.), Intelligence and Security Informatics. Lecture Notes in Computer Science, Springer-Verlag, Berlin, Heidelberg, 7299: 145-153.
Jayabrabu, R., V. Saravanan and K. Vivekanandan, 2012a. Software agents paradigm in automated data mining for better visualization using intelligent agents. J. Theor. Appl. Inform. Technol., 39(2): 167-177. http://www.jatit.org/volumes/Vol39No2/9Vol39No2.pdf.
Jayabrabu, R., V. Saravanan and K. Vivekanandan, 2012b. A framework: Cluster detection and multidimensional visualization of automated data mining using intelligent agents. Int. J. Artif. Intell. Appl., 3(1): 125-138.
Kent, C.K. and N. Salim, 2010. Features based text similarity detection. J. Comput., 2(1): 53-57. https://arxiv.org/ftp/arxiv/papers/1001/1001.3487.pdf.
Mechti, S., M. Jaoua and L.H. Belguith, 2013. A Framework for Plagiarism Detection Based on Author Profiling. Notebook for PAN at CLEF 2013. ANLP Research Group-MIRACL Laboratory, University of Sfax, Tunisia.
Menai, M.E.B. and M. Bagais, 2011. APlag: A plagiarism checker for Arabic texts. Proceeding of the 6th International Conference on Computer Science and Education (ICCSE, 2011), pp: 1379-1383.
Oberreuter, G. and J.D. Velásquez, 2013. Text mining applied to plagiarism detection: The use of words for detecting deviations in the writing style. Expert Syst. Appl., 40(9): 3756-3763. http://www.sciencedirect.com/science/article/pii/S0957417412013231.
Osman, A.H., N. Salim, M.S. Binwahlan, R. Alteeb and A. Abuobieda, 2012. An improved plagiarism detection scheme based on semantic role labeling. Appl. Soft Comput., 12(5): 1493-1502. http://www.sciencedirect.com/science/article/pii/S1568494612000087.
Rajan, J. and V. Saravanan, 2008. A framework of an automated data mining system using autonomous intelligent agents. Proceeding of the International Conference on Computer Science and Information Technology, pp: 700-704.
Ramya, L. and R. Venkatalakshmi, 2013. Intelligent plagiarism detection. Int. J. Res. Eng. Adv. Technol., 1(1): 1-4.
Competing interests
The authors have no competing interests.
Open Access Policy
This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
ISSN (Online): 2040-7467
ISSN (Print): 2040-7459