Research Article | OPEN ACCESS
Self-Organizing Maps and Principal Component Analysis to Improve Classification Accuracy
Hicham Omara, Mohamed Lazaar and Youness Tabii
Lirosa Laboratory, Faculty of Science, Abdelmalek Essaadi University, Tetouan, Morocco
Research Journal of Applied Sciences, Engineering and Technology 2018 5:190-196
Received: February 5, 2018 | Accepted: March 2, 2018 | Published: May 15, 2018
Abstract
The aim of this study is to improve the performance of the Kohonen Self-Organizing Map (SOM) using Principal Component Analysis (PCA). SOM is an algorithm commonly used to visualize and classify datasets because it can project high-dimensional data onto a lower-dimensional map. However, its performance decreases when the problem size becomes too large. Reducing the size of the data, by removing irrelevant or redundant variables and keeping only the most significant ones according to certain criteria, has therefore become a requirement before any classification; this reduction should give the best performance according to a given objective function. Many researchers have tried to solve this problem. This study presents a new approach that improves SOM using PCA. Experiments on real datasets from the UCI Machine Learning Repository show that the proposed SOM outperforms the traditional approach, with classification accuracy improving by more than 2%.
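The pipeline outlined in the abstract (reduce the data with PCA, then train and label a SOM on the reduced data) can be illustrated with a minimal sketch. It is not the authors' implementation: the synthetic two-class data stand in for the UCI datasets, and the min-max scaling step, the 95% cumulative-variance threshold, the 3x3 map, the decay schedules and every function name below are illustrative assumptions.

```python
import numpy as np

def minmax_scale(X):
    """Rescale each feature to [0, 1] (constant features are left at 0)."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / np.where(hi > lo, hi - lo, 1.0)

def pca_fit(X, var_threshold=0.95):
    """Return the mean and the leading principal axes that together reach
    the (assumed) cumulative explained-variance threshold."""
    mean = X.mean(axis=0)
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    cum = np.cumsum(s**2 / np.sum(s**2))
    k = min(int(np.searchsorted(cum, var_threshold)) + 1, len(s))
    return mean, Vt[:k]

def train_som(X, rows=3, cols=3, epochs=20, lr0=0.5, seed=0):
    """Online SOM training with a Gaussian neighbourhood and linear decay."""
    rng = np.random.default_rng(seed)
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], float)
    W = X[rng.integers(0, len(X), rows * cols)].astype(float)
    sigma0, n_steps, step = max(rows, cols) / 2.0, epochs * len(X), 0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)
            sigma = max(sigma0 * (1.0 - frac), 0.5)
            bmu = np.argmin(((W - X[i]) ** 2).sum(axis=1))   # best-matching unit
            d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)       # grid distance to BMU
            W += lr * np.exp(-d2 / (2 * sigma**2))[:, None] * (X[i] - W)
            step += 1
    return W

def label_neurons(W, X, y):
    """Give each neuron the majority class of the samples it wins (-1 if none)."""
    bmus = np.argmin(((X[:, None, :] - W[None]) ** 2).sum(axis=2), axis=1)
    labels = np.full(len(W), -1)
    for j in np.unique(bmus):
        labels[j] = np.bincount(y[bmus == j]).argmax()
    return labels

def predict(W, labels, X):
    """Classify each sample by its nearest labelled neuron."""
    d = ((X[:, None, :] - W[None]) ** 2).sum(axis=2)
    d[:, labels < 0] = np.inf
    return labels[np.argmin(d, axis=1)]

def run_demo(seed=0):
    rng = np.random.default_rng(seed)
    n = 100
    # Synthetic stand-in data: 2 informative features plus 8 redundant ones.
    base = np.vstack([rng.normal(size=(n, 2)), rng.normal(size=(n, 2)) + 4.0])
    extra = base @ rng.normal(size=(2, 8)) + 0.05 * rng.normal(size=(2 * n, 8))
    X = minmax_scale(np.hstack([base, extra]))
    y = np.repeat([0, 1], n)
    idx = rng.permutation(2 * n)
    train, test = idx[:140], idx[140:]
    mean, axes = pca_fit(X[train])
    Z_train, Z_test = (X[train] - mean) @ axes.T, (X[test] - mean) @ axes.T
    W = train_som(Z_train, seed=seed)
    labels = label_neurons(W, Z_train, y[train])
    acc = float(np.mean(predict(W, labels, Z_test) == y[test]))
    return X.shape[1], Z_train.shape[1], acc

n_in, n_out, acc = run_demo()
print(f"features: {n_in} -> {n_out}, test accuracy: {acc:.2f}")
```

Fitting the PCA on the training split only, and then projecting the held-out samples with the same mean and axes, keeps the test data out of the reduction step. A comparison in the spirit of the abstract would train the same SOM on the unreduced data and compare the two accuracies.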
Keywords:
Classification, feature extraction, feature selection, principal component analysis, self-organizing maps
Competing interests
The authors have no competing interests.
Open Access Policy
This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
ISSN (Online): 2040-7467
ISSN (Print): 2040-7459