Research Article | OPEN ACCESS
Challenges of Urdu Named Entity Recognition: A Scarce Resourced Language
Saeeda Naz (1, 3), Arif Iqbal Umar (1), Syed Hamad Shirazi (1), Sajjad Ahmad Khan (2), Imtiaz Ahmed (2) and Akbar Ali Khan (2)
(1) Department of Information Technology, Hazara University, Mansehra, Pakistan
(2) COMSATS Institute of Information Technology, Abbottabad, KPK, Pakistan
(3) Higher Education Department, GGPGC No. 1 Abbottabad, KPK, Pakistan
Research Journal of Applied Sciences, Engineering and Technology 2014 10:1272-1278
Received: July 14, 2014 | Accepted: September 13, 2014 | Published: September 15, 2014
Abstract
In this study, we present a brief overview of Named Entity Recognition (NER) systems, the various approaches followed in building them and, finally, NER systems for the Urdu language. Urdu poses several challenges to Natural Language Processing (NLP), largely due to its rich morphology. Research on NER for Urdu is still in its infancy; therefore, the focus of this study is on the challenges and peculiarities of an Urdu NER system. We also explore previous work on NER systems for South and South East Asian Languages (SSEAL). Finally, we summarize the existing work on NER for Urdu, a scarce-resourced and morphologically rich language, and for other SSEAL that share similar features with Urdu.
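As a minimal illustration of what an NER system produces, the Python sketch below tags a tokenized Urdu sentence with BIO labels (B- marks the first token of an entity, I- a continuation, O a non-entity) by greedy longest-match lookup in a gazetteer. It is only a toy instance of the rule/gazetteer-based family of approaches, not the method of this study or of any system it reviews; the `GAZETTEER` entries, the `PER`/`LOC` label set and the `bio_tag` function are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical toy gazetteer mapping token sequences to entity types.
GAZETTEER: Dict[Tuple[str, ...], str] = {
    ("عمران", "خان"): "PER",    # Imran Khan
    ("اسلام", "آباد"): "LOC",   # Islamabad, written as two tokens
    ("پاکستان",): "LOC",        # Pakistan
}


def bio_tag(tokens: List[str], gazetteer: Dict[Tuple[str, ...], str]) -> List[str]:
    """Assign BIO tags by greedy longest-match gazetteer lookup."""
    max_len = max((len(key) for key in gazetteer), default=0)
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        # Try the longest candidate span starting at position i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            label = gazetteer.get(tuple(tokens[i:i + n]))
            if label is not None:
                tags[i] = "B-" + label
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + label
                i += n
                break
        else:
            # No gazetteer entry starts here: leave the token as O.
            i += 1
    return tags


if __name__ == "__main__":
    sentence = ["عمران", "خان", "اسلام", "آباد", "گئے"]  # "Imran Khan went to Islamabad"
    print(list(zip(sentence, bio_tag(sentence, GAZETTEER))))
```

For the sample sentence the tags are B-PER, I-PER, B-LOC, I-LOC, O. Matching on token tuples rather than single words lets a multi-word name such as اسلام آباد (Islamabad), commonly written as two space-separated tokens, receive a single B-/I- span. A plain lookup like this cannot handle inflected forms or unseen names, which is where the morphological challenges discussed in this study arise and where statistical taggers such as CRF or ME models are typically applied.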
Keywords:
CRF, ME, SSEAL, Urdu named entity recognition
Competing interests
The authors have no competing interests.
Open Access Policy
This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
ISSN (Online): 2040-7467
ISSN (Print): 2040-7459