     Research Journal of Applied Sciences, Engineering and Technology


Self-Organizing Maps and Principal Component Analysis to Improve Classification Accuracy

Hicham Omara, Mohamed Lazaar and Youness Tabii
Lirosa Laboratory, Faculty of Science, Abdelmalek Essaadi University, Tetouan, Morocco
Research Journal of Applied Sciences, Engineering and Technology, 2018, 15(5): 190-196
http://dx.doi.org/10.19026/rjaset.15.5851  |  © The Author(s) 2018
Received: February 5, 2018  |  Accepted: March 2, 2018  |  Published: May 15, 2018

Abstract

The aim of this study is to improve the Kohonen Self-Organizing Map (SOM) using Principal Component Analysis (PCA). The SOM is an algorithm commonly used to visualize and classify datasets, thanks to its ability to project high-dimensional data onto a lower-dimensional map. However, its performance degrades when the problem becomes too large. Reducing the data beforehand, by removing irrelevant or redundant variables and retaining only the most significant ones according to certain criteria, has therefore become a prerequisite for classification; this reduction should yield the best performance with respect to a given objective function. Many researchers have addressed this problem. This study presents a new approach that improves the SOM by means of PCA. Experimental analysis on real datasets from the UCI machine learning repository shows that the proposed SOM outperforms the traditional approach, with an improvement of more than 2% in classification accuracy.
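The pipeline described in the abstract (scale the data, reduce it with PCA, then train a SOM and classify through its map units) can be illustrated with a short sketch. This is a minimal illustration, not the authors' implementation: it assumes the Wine dataset from the UCI repository (loaded via scikit-learn), a 95% explained-variance cutoff for PCA, a 10x10 map built with the third-party minisom library, and majority-vote labelling of neurons; the paper's actual datasets, map size and training parameters may differ.

```python
# Minimal sketch of a PCA-then-SOM classification pipeline (assumptions noted above).
from collections import Counter, defaultdict

import numpy as np
from minisom import MiniSom
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load a UCI dataset and standardize it (scaling is commonly applied before PCA and SOM).
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Step 1: PCA keeps only the components explaining 95% of the variance,
# discarding redundant or less informative directions before SOM training.
pca = PCA(n_components=0.95).fit(X_train)
Z_train, Z_test = pca.transform(X_train), pca.transform(X_test)

# Step 2: train a Kohonen SOM on the reduced data.
som = MiniSom(10, 10, Z_train.shape[1], sigma=1.0,
              learning_rate=0.5, random_seed=42)
som.random_weights_init(Z_train)
som.train_random(Z_train, 5000)

# Step 3: label each neuron with the majority class of the training samples
# mapped to it, then classify each test sample through its best-matching unit.
votes = defaultdict(Counter)
for z, label in zip(Z_train, y_train):
    votes[som.winner(z)][label] += 1
neuron_label = {pos: c.most_common(1)[0][0] for pos, c in votes.items()}

default_label = Counter(y_train).most_common(1)[0][0]  # fallback for unlabelled neurons
y_pred = [neuron_label.get(som.winner(z), default_label) for z in Z_test]
accuracy = np.mean(np.array(y_pred) == y_test)
print(f"PCA+SOM classification accuracy: {accuracy:.3f}")
```

Running the same script with the PCA step removed gives a baseline SOM accuracy for comparison, which is the kind of contrast the study reports.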

Keywords:

Classification, feature extraction, feature selection, principal component analysis, self-organizing maps



Competing interests

The authors have no competing interests.

Open Access Policy

This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Copyright

© The Author(s) 2018.

ISSN (Online):  2040-7467
ISSN (Print):   2040-7459