
     Research Journal of Applied Sciences, Engineering and Technology


Challenges of Urdu Named Entity Recognition: A Scarce Resourced Language

Saeeda Naz (1, 3), Arif Iqbal Umar (1), Syed Hamad Shirazi (1), Sajjad Ahmad Khan (2), Imtiaz Ahmed (2) and Akbar Ali Khan (2)
(1) Department of Information Technology, Hazara University, Mansehra, Pakistan
(2) COMSATS Institute of Information Technology, Abbottabad, KPK, Pakistan
(3) Higher Education Department, GGPGC NO.1 Abbottabad, KPK, Pakistan
Research Journal of Applied Sciences, Engineering and Technology, 2014, 10: 1272-1278
http://dx.doi.org/10.19026/rjaset.8.1095  |  © The Author(s) 2014
Received: July 14, 2014  |  Accepted: September 13, 2014  |  Published: September 15, 2014

Abstract

In this study, we present a brief overview of Named Entity Recognition (NER) systems, the approaches used to build them and, in particular, NER systems for the Urdu language. Urdu poses several challenges to Natural Language Processing (NLP), largely because of its rich morphology. Research on NER for Urdu is still in its infancy; therefore, this study focuses on the challenges and peculiarities of building an Urdu NER system. We also review previous work on NER systems for South and South East Asian Languages (SSEAL). Finally, we summarize the existing work on NER for Urdu, a scarce-resourced and morphologically rich language, and for other SSEAL that share its features.
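The keywords name CRF (Conditional Random Fields) and ME (Maximum Entropy), both statistical models that are typically trained to assign a label to every token of a sentence. A common way to encode entities for such taggers is the BIO scheme: B marks the first token of an entity, I marks a continuation token and O marks a token outside any entity. The sketch below is illustrative only and is not taken from the study: the function name spans_to_bio, the PER/LOC labels and the example sentence ("Imran Khan went to Lahore" in Urdu script) are all assumptions, and the whitespace tokenization is a simplification.

```python
from typing import List, Tuple

# Hypothetical span format: (start token index, end token index exclusive, entity type).
Span = Tuple[int, int, str]


def spans_to_bio(num_tokens: int, spans: List[Span]) -> List[str]:
    """Convert non-overlapping entity spans into one BIO tag per token."""
    tags = ["O"] * num_tokens  # every token starts outside any entity
    for start, end, label in spans:
        if not 0 <= start < end <= num_tokens:
            raise ValueError(f"invalid span {(start, end, label)}")
        if any(tag != "O" for tag in tags[start:end]):
            raise ValueError("overlapping spans are not supported")
        tags[start] = f"B-{label}"  # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"  # continuation tokens of the same entity
    return tags


if __name__ == "__main__":
    # Hypothetical sentence, split on whitespace for simplicity (an assumption;
    # segmenting real Urdu text into words is itself a nontrivial step).
    tokens = ["عمران", "خان", "لاہور", "گئے"]
    spans = [(0, 2, "PER"), (2, 3, "LOC")]
    for token, tag in zip(tokens, spans_to_bio(len(tokens), spans)):
        print(token, tag)
```

Running the script prints each token with its tag: the two name tokens receive B-PER and I-PER, the city name receives B-LOC and the verb receives O. A CRF or ME tagger would then be trained to predict these tags from features of each token and its context.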

Keywords:

CRF, ME, SSEAL, Urdu named entity recognition



Competing interests

The authors have no competing interests.

Open Access Policy

This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.


ISSN (Online):  2040-7467
ISSN (Print):   2040-7459
Copyright © 2024. MAXWELL Scientific Publication Corp., All rights reserved