Research Article | OPEN ACCESS
Arabic Sentiment Analysis with Optimal Combination of Features Selection and Machine Learning Approaches
Bilal Sabri and Saidah Saad
Data Mining and Optimization Research Group (DMO), Centre for Artificial Intelligence Technology (CAIT), Faculty of Information Science and Technology, School of Computer Science, Universiti Kebangsaan Malaysia (UKM), 43600 Bandar Baru Bangi, Malaysia
Research Journal of Applied Sciences, Engineering and Technology, 2016, 5: 386-393
Received: March 2, 2016 | Accepted: May 23, 2016 | Published: September 05, 2016
Abstract
The main objective of this study is to design a model that applies a novel technique to sentiment analysis of Arabic text. Sentiment analysis is an interesting task that draws on web mining, Natural Language Processing (NLP) and Machine Learning (ML). Most research on sentiment analysis has focused on English-language text, so research on sentiment analysis in Arabic and other languages is still in its infancy. This study empirically evaluates three Feature Selection Methods (FSM), namely Information Gain (IG), Chi-square (CHI) and Gini Index (GI), and three classification approaches, namely Association Rule (AR) mining, the N-gram model and the meta-classifier approach, for sentiment classification in Arabic. A series of experiments was carried out on the Opinion Corpus for Arabic (OCA). The results were favorable but depended on the algorithm used and the number of selected features, and they show that feature selection can improve the performance of Arabic sentiment classification. Among the FS methods, CHI produced the best performance, and among the classification approaches, the meta-classifier (a combination approach) outperformed the others. In conclusion, combining the meta-classifier with CHI feature selection yields the most accurate classifier, with accuracy as high as 90.80%.
Keywords:
Feature selection, meta-classifier, machine learning approach, NLP, opinion mining, sentiment analysis
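The three feature-selection scores compared in the abstract (CHI, IG and GI) are filter methods: each term is scored against the class labels independently of any classifier, and only the top-k terms are kept. The Python sketch below illustrates this for a binary positive/negative task. It is not the authors' implementation. The toy corpus (transliterated Arabic tokens), the function names and the binary term-presence representation are assumptions made for illustration, and GI is implemented here as the reduction in Gini impurity from splitting on term presence, which may differ from the Gini-index variant used in the paper. CHI uses the standard 2x2 term/class contingency table; with only two classes, the score for the positive class equals the score for the negative class.

```python
import math

# Hypothetical toy corpus: each review is a set of (transliterated) Arabic tokens
# with a polarity label. Real experiments would use tokenised OCA reviews instead.
DOCS = [
    ({"mumtaz", "film", "raei"}, "pos"),
    ({"film", "raei", "qissa"}, "pos"),
    ({"mumil", "film", "sayyi"}, "neg"),
    ({"sayyi", "qissa", "daeifa"}, "neg"),
]


def contingency(docs, term, cls):
    """Count (A, B, C, D) for term/class presence at document level.

    A: in cls, contains term      B: not in cls, contains term
    C: in cls, lacks term         D: not in cls, lacks term
    """
    a = b = c = d = 0
    for tokens, label in docs:
        has_term, in_cls = term in tokens, label == cls
        if has_term and in_cls:
            a += 1
        elif has_term:
            b += 1
        elif in_cls:
            c += 1
        else:
            d += 1
    return a, b, c, d


def chi_square(docs, term, cls):
    """Chi-square statistic of the 2x2 term/class contingency table."""
    a, b, c, d = contingency(docs, term, cls)
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - c * b) ** 2 / denom


def _entropy(counts):
    total = sum(counts)
    return -sum(k / total * math.log2(k / total) for k in counts if k)


def _gini(counts):
    total = sum(counts)
    return 1.0 - sum((k / total) ** 2 for k in counts)


def _impurity_reduction(docs, term, impurity):
    """Impurity of all docs minus the weighted impurity after splitting on `term`."""
    labels = sorted({label for _, label in docs})

    def class_counts(subset):
        return [sum(1 for _, lab in subset if lab == wanted) for wanted in labels]

    with_term = [doc for doc in docs if term in doc[0]]
    without_term = [doc for doc in docs if term not in doc[0]]
    score = impurity(class_counts(docs))
    for part in (with_term, without_term):
        if part:  # skip an empty branch (term in all or no documents)
            score -= len(part) / len(docs) * impurity(class_counts(part))
    return score


def information_gain(docs, term):
    return _impurity_reduction(docs, term, _entropy)


def gini_reduction(docs, term):
    # Assumption: GI taken as Gini-impurity reduction; the paper's variant may differ.
    return _impurity_reduction(docs, term, _gini)


def select_top_k(docs, k, score):
    """Rank the vocabulary by `score` (highest first, ties by term) and keep k terms."""
    vocab = sorted({t for tokens, _ in docs for t in tokens})
    return sorted(vocab, key=lambda t: (-score(t), t))[:k]


if __name__ == "__main__":
    scorers = {
        "CHI": lambda t: chi_square(DOCS, t, "pos"),
        "IG": lambda t: information_gain(DOCS, t),
        "GI": lambda t: gini_reduction(DOCS, t),
    }
    for name, fn in scorers.items():
        print(name, select_top_k(DOCS, 3, fn))
```

Changing `k` in `select_top_k` corresponds to varying the number of selected features, the factor on which the abstract reports the results depend; the retained terms would then typically form the feature space for the AR, N-gram or meta-classifier approach.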
Competing interests
The authors have no competing interests.
Open Access Policy
This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
ISSN (Online): 2040-7467
ISSN (Print): 2040-7459