     Research Journal of Applied Sciences, Engineering and Technology


Self-Organizing Maps and Principal Component Analysis to Improve Classification Accuracy

Hicham Omara, Mohamed Lazaar and Youness Tabii
Lirosa Laboratory, Faculty of Science, Abdelmalek Essaadi University, Tetouan, Morocco
Research Journal of Applied Sciences, Engineering and Technology, 2018, 15(5): 190-196
http://dx.doi.org/10.19026/rjaset.15.5851  |  © The Author(s) 2018
Received: February 5, 2018  |  Accepted: March 2, 2018  |  Published: May 15, 2018

Abstract

The aim of this study is to improve the Kohonen Self-Organizing Map (SOM) using Principal Component Analysis (PCA). The SOM is an algorithm commonly used to visualize and classify datasets, thanks to its ability to project high-dimensional data onto a lower-dimensional map. However, its performance degrades when the problem becomes too large. Reducing the data beforehand, by removing irrelevant or redundant variables and retaining only the most significant ones according to certain criteria, has therefore become a prerequisite for classification; this reduction should yield the best performance with respect to a given objective function. Many researchers have addressed this problem. This study presents a new approach that improves the SOM by means of PCA. Experimental analysis on real datasets from the UCI machine learning repository shows that the proposed SOM outperforms the traditional approach, with an improvement of more than 2% in classification accuracy.
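The pipeline described in the abstract (scale the data, reduce it with PCA, then train a SOM and classify through its map units) can be illustrated with a short sketch. This is a minimal illustration, not the authors' implementation: it assumes the Wine dataset from the UCI repository (loaded via scikit-learn), a 95% explained-variance cutoff for PCA, a 10x10 map built with the third-party minisom library, and majority-vote labelling of neurons; the paper's actual datasets, map size and training parameters may differ.

```python
# Minimal sketch of a PCA-then-SOM classification pipeline (assumptions noted above).
from collections import Counter, defaultdict

import numpy as np
from minisom import MiniSom
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load a UCI dataset and standardize it (scaling is commonly applied before PCA and SOM).
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Step 1: PCA keeps only the components explaining 95% of the variance,
# discarding redundant or less informative directions before SOM training.
pca = PCA(n_components=0.95).fit(X_train)
Z_train, Z_test = pca.transform(X_train), pca.transform(X_test)

# Step 2: train a Kohonen SOM on the reduced data.
som = MiniSom(10, 10, Z_train.shape[1], sigma=1.0,
              learning_rate=0.5, random_seed=42)
som.random_weights_init(Z_train)
som.train_random(Z_train, 5000)

# Step 3: label each neuron with the majority class of the training samples
# mapped to it, then classify each test sample through its best-matching unit.
votes = defaultdict(Counter)
for z, label in zip(Z_train, y_train):
    votes[som.winner(z)][label] += 1
neuron_label = {pos: c.most_common(1)[0][0] for pos, c in votes.items()}

default_label = Counter(y_train).most_common(1)[0][0]  # fallback for unlabelled neurons
y_pred = [neuron_label.get(som.winner(z), default_label) for z in Z_test]
accuracy = np.mean(np.array(y_pred) == y_test)
print(f"PCA+SOM classification accuracy: {accuracy:.3f}")
```

Running the same script with the PCA step removed gives a baseline SOM accuracy for comparison, which is the kind of contrast the study reports.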

Keywords:

Classification, feature extraction, feature selection, principal component analysis, self-organizing maps



Competing interests

The authors have no competing interests.

Open Access Policy

This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Copyright

© The Author(s) 2018.

ISSN (Online):  2040-7467
ISSN (Print):   2040-7459