     Research Journal of Applied Sciences, Engineering and Technology

    Abstract
2014(Vol.7, Issue:6)
Article Information:

Domain biased Bilingual Parallel Data Extraction and its Sentence Level Alignment for English-Hindi Pair

Deepa Gupta, Vani Raveendran and Rahul Kumar Yadav
Corresponding Author:  Deepa Gupta 
Submitted: March 19, 2013
Accepted: May 10, 2013
Published: February 15, 2014
Abstract:
Creating parallel corpora and aligning them efficiently at the sentence level remains a challenge for structurally distinct languages with a relatively low degree of correlation. This work emphasizes the importance of domain-biased parallel data collection and presents a structured methodology for obtaining such data for the English-Hindi language pair. Because the participating languages are structurally distinct, sentence-level alignment of the collected data has also been undertaken. In essence, the two aspects of this study are the collection of parallel corpora from different domains and the alignment of the extracted parallel corpus at the sentence level. The work is intended to help researchers in Natural Language Processing improve the accuracy, precision and robustness of their own methods, which is possible only when abundant parallel corpora are available, and more so when those corpora are organized by domain and aligned at least at the sentence level. The language pair considered for the development of the algorithm is English-Hindi. Because the algorithm is generic in nature, the approach can be extended to other similarly structured language pairs.

Key words:  Cost calculation, Natural Language Processing (NLP), non-official data, normal distribution, official data, parallel corpus collection, semi-official data, sentential alignment
Cite this Reference:
Deepa Gupta, Vani Raveendran and Rahul Kumar Yadav, 2014. Domain biased Bilingual Parallel Data Extraction and its Sentence Level Alignment for English-Hindi Pair. Research Journal of Applied Sciences, Engineering and Technology, 7(6): 1187-1198.
ISSN (Online):  2040-7467
ISSN (Print):   2040-7459
Copyright © 2024. MAXWELL Scientific Publication Corp., All rights reserved