     Research Journal of Applied Sciences, Engineering and Technology


Rule- and Dictionary-based Solution for Variations in Written Arabic Names in Social Networks, Big Data, Accounting Systems and Large Databases

1Ahmad B.A. Hassanat and 2Ghada Awad Altarawneh
1Department of IT
2Department of Accounting, Mu'tah University, Mu'tah-Karak 61710, Jordan
Research Journal of Applied Sciences, Engineering and Technology  2014  14:1630-1638
http://dx.doi.org/10.19026/rjaset.8.1144  |  © The Author(s) 2014
Received: May 19, 2014  |  Accepted: June 18, 2014  |  Published: October 10, 2014

Abstract

This study addresses the problem that some Arabic names can be written in multiple ways. When a user searches for only one form of a name, neither exact nor approximate matching reliably returns its variants: exact matching requires the user to enter every form of the name, and approximate matching also returns names that are not variants of the one being sought. We attempt to solve the problem with a dictionary of all Arabic names mapped to their alternative written forms. We generated the alternatives using rules derived from reviewing the first names of 9.9 million citizens and former citizens of Jordan. The dictionary can be used both to standardize the written form when a new name is inserted into a database and to search for a name together with all its alternative written forms. Creating the dictionary automatically from the rules resulted in at least 7% erroneous acceptances and 7.9% erroneous rejections. We addressed these errors by manually editing the dictionary. The dictionary can help real-world databases, with the qualification that manual editing does not guarantee 100% correctness.
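The dictionary idea can be illustrated with a minimal sketch. The abstract does not list the authors' rules, so the character substitutions below (hamza forms of alef, ta marbuta versus ha, alef maqsura versus ya, and dropped diacritics) are assumed examples of commonly interchanged spellings, and the function names `canonical_key`, `build_dictionary` and `variants_of` are hypothetical. Using the normalized key as the stored form is likewise an illustrative choice, not the published method.

```python
import unicodedata
from collections import defaultdict

# Hypothetical rules (NOT the authors' rule set): characters that are often
# interchanged in written Arabic names, each mapped to a single form.
RULES = {
    "أ": "ا",  # alef with hamza above -> bare alef
    "إ": "ا",  # alef with hamza below -> bare alef
    "آ": "ا",  # alef with madda       -> bare alef
    "ة": "ه",  # ta marbuta            -> ha
    "ى": "ي",  # alef maqsura          -> ya
}


def canonical_key(name):
    """Drop diacritics (combining marks) and apply RULES to build a key."""
    stripped = "".join(ch for ch in name if not unicodedata.combining(ch))
    return "".join(RULES.get(ch, ch) for ch in stripped)


def build_dictionary(names):
    """Group the observed spellings under their shared canonical key."""
    groups = defaultdict(set)
    for name in names:
        groups[canonical_key(name)].add(name)
    return dict(groups)


def variants_of(name, dictionary):
    """Return every recorded spelling that shares the query's key."""
    return dictionary.get(canonical_key(name), set())


# Illustrative input: two spellings each of Ahmad, Fatima and Huda.
observed = ["أحمد", "احمد", "فاطمة", "فاطمه", "هدى", "هدي"]
name_dictionary = build_dictionary(observed)

# Searching with one spelling retrieves both recorded spellings.
print(variants_of("احمد", name_dictionary))

# On insert, the key can serve as the standardized stored form.
print(canonical_key("فاطمة"))
```

Grouping by rules alone can merge distinct names or miss genuine variants, which is consistent with the erroneous acceptances and rejections reported above and with the need for manual editing of the dictionary.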

Keywords:

Arabic names, Arabic names standardization, database, named entity recognition, NLP


Competing interests

The authors have no competing interests.

Open Access Policy

This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

ISSN (Online):  2040-7467
ISSN (Print):   2040-7459