Research Article | OPEN ACCESS
Similarity Measurements of Vector Space Model on Arabic Text
Ahmad M. Odat
Faculty of Science and Information Technology, Irbid National University, Irbid, Jordan
Research Journal of Applied Sciences, Engineering and Technology 2015 8:860-864
Received: May 20, 2015 | Accepted: July 14, 2015 | Published: November 15, 2015
Abstract
This study presents an effective retrieval model by comparing a set of similarity measures within the Vector Space Model (VSM). To support the comparison, we use two inverted-file mechanisms for indexing a text collection after removing stop words: the first is a word-oriented mechanism and the second is a block-oriented mechanism. The study uses a collection of 242 Arabic abstracts and a collection of 60 Arabic queries. While the inverted file is built as the index file, time and space costs are computed. After running the system, recall and precision are calculated to compare the retrieval efficiency of the inverted-file mechanisms. The VSM offers several similarity measures: the Cosine measure, the Dice measure, the Jaccard measure and the Inner product. The study achieved an effective retrieval system by applying the VSM with the Jaccard measure; compared with the other measures, the Jaccard measure obtained good results, particularly on the Arabic document collection. The study also obtained better results with the block-oriented mechanism than with the word-oriented mechanism. In conclusion, the best information retrieval model for Arabic documents is the VSM with the Jaccard measure using the block-oriented technique.
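The four similarity measures named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas or term-weighting scheme used in the study, so the code below assumes the common textbook forms for weighted term vectors; the function names, the example terms and the unit weights are all hypothetical.

```python
import math

# Documents and queries are modelled as dicts mapping term -> weight.
# Assumption: textbook definitions of the measures, not necessarily
# the exact variants used in the study.

def inner_product(d, q):
    """Sum of weight products over the terms shared by d and q."""
    return sum(w * q[t] for t, w in d.items() if t in q)

def _sq_norm(v):
    """Sum of squared weights of a vector."""
    return sum(w * w for w in v.values())

def cosine(d, q):
    denom = math.sqrt(_sq_norm(d)) * math.sqrt(_sq_norm(q))
    return inner_product(d, q) / denom if denom else 0.0

def dice(d, q):
    denom = _sq_norm(d) + _sq_norm(q)
    return 2 * inner_product(d, q) / denom if denom else 0.0

def jaccard(d, q):
    ip = inner_product(d, q)
    denom = _sq_norm(d) + _sq_norm(q) - ip
    return ip / denom if denom else 0.0

# Hypothetical example with unit weights (stop words already removed).
doc = {"نظام": 1.0, "استرجاع": 1.0, "معلومات": 1.0}
query = {"استرجاع": 1.0, "معلومات": 1.0}

for name, fn in [("inner", inner_product), ("cosine", cosine),
                 ("dice", dice), ("jaccard", jaccard)]:
    print(name, round(fn(doc, query), 4))
```

With unit weights the Inner product simply counts shared terms, while the other three measures normalise that count by vector size in different ways, which is why they can rank the same documents differently.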
Keywords:
Block-oriented mechanism, inverted file, Jaccard measure, precision, recall, Vector Space Model (VSM)
Competing interests
The authors have no competing interests.
Open Access Policy
This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
ISSN (Online): 2040-7467
ISSN (Print): 2040-7459