
     Research Journal of Applied Sciences, Engineering and Technology


A Novel Ensemble Classifier based Classification on Large Datasets with Hybrid Feature Selection Approach

J. Vandar Kuzhali¹ and S. Vengataasalam²
¹Department of Computer Applications, Erode Sengunthar Engineering College, India
²Department of Mathematics, Kongu Engineering College, India
Research Journal of Applied Sciences, Engineering and Technology, 2014, 7(17): 3633-3642
http://dx.doi.org/10.19026/rjaset.7.716  |  © The Author(s) 2014
Received: November 21, 2013  |  Accepted: December 18, 2013  |  Published: May 05, 2014

Abstract

Exploring and analyzing large datasets has been an active research area in data mining over the last two decades. Several approaches have been proposed in the literature for investigating large datasets comprising millions of records. The most important data mining steps involved in this task are preprocessing, feature selection and classification, and each plays its own part in carrying out the task effectively. Most existing techniques suffer from high complexity and are computationally costly on large datasets. In particular, classification techniques do not provide consistent and reliable results for large datasets, which makes existing classification systems inefficient and unreliable. This study focuses on developing a novel and efficient framework for analyzing and classifying large datasets through ensemble classification. First, an efficient preprocessing approach based on enhanced KNN is applied, followed by feature selection based on a genetic algorithm integrated with Kernel PCA, which selects a subset of informative attributes or variables for constructing models relating the data. Classification is then carried out on the selected features using an ensemble approach to obtain accurate results. Two types of ensemble classifiers, homogeneous and heterogeneous, are presented to evaluate the performance of the proposed system. Experimental results show that the proposed approach achieves significant results on various large datasets.
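The abstract does not state exactly what the enhanced KNN preprocessing step does. A common use of nearest neighbours in preprocessing is filling in missing attribute values, so the following Python sketch shows plain (not enhanced) KNN imputation under that assumption. The function name `knn_impute`, the distance rescaling and the default `k = 2` are illustrative choices, not details taken from the paper.

```python
import math
from typing import List, Optional, Sequence

Row = Sequence[Optional[float]]  # None marks a missing attribute value


def knn_impute(rows: Sequence[Row], k: int = 2) -> List[List[float]]:
    """Hypothetical helper: replace each None with the mean of that column
    over the k nearest rows that have the column observed."""

    def distance(a: Row, b: Row) -> float:
        # Euclidean distance over coordinates observed in both rows, rescaled by
        # n_cols / n_shared so rows sharing fewer coordinates are not favoured.
        shared = [(p - q) ** 2 for p, q in zip(a, b) if p is not None and q is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum(shared) * len(a) / len(shared))

    imputed = []
    for i, row in enumerate(rows):
        filled = list(row)
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: every other row that has column j observed.
            donors = [r for m, r in enumerate(rows) if m != i and r[j] is not None]
            if not donors:
                raise ValueError(f"column {j} has no observed values to borrow from")
            donors.sort(key=lambda r: distance(row, r))
            nearest = donors[:k]
            filled[j] = sum(r[j] for r in nearest) / len(nearest)
        imputed.append(filled)
    return imputed
```

Donor values are always read from the original rows, so the result does not depend on the order in which missing cells are filled.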
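The genetic-algorithm feature selection step can be illustrated with a small wrapper sketch. Each chromosome is a bit mask over the attributes, and fitness is assumed here to be leave-one-out 1-nearest-neighbour accuracy minus a small penalty per selected attribute. The operators (tournament selection, single-point crossover, bit-flip mutation, elitism), the parameter values and all names are hypothetical, and the Kernel PCA part of the authors' hybrid approach is not modelled.

```python
import random
from typing import Dict, List, Sequence, Tuple

Mask = Tuple[bool, ...]  # True = attribute selected


def loo_1nn_accuracy(X: Sequence[Sequence[float]], y: Sequence[int], mask: Mask) -> float:
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on the masked attributes."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0
    correct = 0
    for i, xi in enumerate(X):
        best_dist, best_label = float("inf"), None
        for k, xk in enumerate(X):
            if k == i:
                continue
            dist = sum((xi[j] - xk[j]) ** 2 for j in cols)
            if dist < best_dist:
                best_dist, best_label = dist, y[k]
        correct += best_label == y[i]
    return correct / len(X)


def ga_feature_selection(X, y, pop_size=20, generations=30, p_mut=0.1,
                         penalty=0.01, seed=0) -> Tuple[Mask, float]:
    """Hypothetical GA wrapper; needs >= 2 attributes and pop_size >= 3.
    Fitness = LOO 1-NN accuracy - penalty * number of selected attributes."""
    rng = random.Random(seed)
    n_features = len(X[0])
    cache: Dict[Mask, float] = {}

    def fitness(mask: Mask) -> float:
        if mask not in cache:
            cache[mask] = loo_1nn_accuracy(X, y, mask) - penalty * sum(mask)
        return cache[mask]

    def tournament(pop: List[Mask]) -> Mask:
        # Pick the fittest of three randomly drawn chromosomes.
        return max(rng.sample(pop, 3), key=fitness)

    pop = [tuple(rng.random() < 0.5 for _ in range(n_features)) for _ in range(pop_size)]
    for _ in range(generations):
        next_pop = sorted(pop, key=fitness, reverse=True)[:2]  # elitism: keep the two best
        while len(next_pop) < pop_size:
            a, b = tournament(pop), tournament(pop)
            cut = rng.randrange(1, n_features)  # single-point crossover
            child = tuple((not bit) if rng.random() < p_mut else bit  # bit-flip mutation
                          for bit in a[:cut] + b[cut:])
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=fitness)
    return best, fitness(best)
```

On a toy dataset where only the first attribute separates the classes, the search keeps that attribute; the penalty term is what pushes it to drop the uninformative ones.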
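For reference, the standard kernel PCA formulation is summarised below. Given $n$ samples $\mathbf{x}_1, \dots, \mathbf{x}_n$ and a kernel $k$, the kernel matrix is centred in feature space, its eigenvectors are normalised so that the corresponding feature-space directions have unit length, and each sample is projected onto the leading components. How the authors couple this with the genetic algorithm is not described in the abstract, and the RBF kernel with width parameter $\gamma$ is only an example choice.

```latex
K_{ij} = k(\mathbf{x}_i, \mathbf{x}_j), \qquad
k(\mathbf{x}, \mathbf{x}') = \exp\!\left(-\gamma \lVert \mathbf{x} - \mathbf{x}' \rVert^2\right)

\tilde{K} = K - \mathbf{1}_n K - K \mathbf{1}_n + \mathbf{1}_n K \mathbf{1}_n,
\qquad (\mathbf{1}_n)_{ij} = \tfrac{1}{n}

\tilde{K} \boldsymbol{\alpha}^{(m)} = n \lambda_m \boldsymbol{\alpha}^{(m)},
\qquad n \lambda_m \lVert \boldsymbol{\alpha}^{(m)} \rVert^2 = 1

z_m(\mathbf{x}_i) = \sum_{j=1}^{n} \alpha^{(m)}_j \tilde{K}_{ji}
```

Keeping only the components with the largest $\lambda_m$ gives the reduced representation $(z_1, \dots, z_d)$ on which a classifier can then be trained.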
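The difference between the two ensemble types evaluated in the study can be sketched as follows: a homogeneous ensemble trains copies of one base learner on different bootstrap samples (bagging, assumed here), whereas a heterogeneous ensemble combines different learners trained on the same data; both are combined by majority vote. The base learners (k-NN, nearest centroid, decision stump), the voting rule and every function name are illustrative assumptions, not the paper's configuration; the ANFIS and fuzzy rule-based classifiers named in the keywords are not reproduced.

```python
import random
from collections import Counter
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]
Classifier = Callable[[Vector], int]


def sqdist(a: Vector, b: Vector) -> float:
    return sum((p - q) ** 2 for p, q in zip(a, b))


def fit_knn(X: Sequence[Vector], y: Sequence[int], k: int = 1) -> Classifier:
    """k-nearest-neighbour classifier (plain majority among the k closest points)."""
    data = list(zip(X, y))

    def predict(x: Vector) -> int:
        nearest = sorted(data, key=lambda pair: sqdist(pair[0], x))[:k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]

    return predict


def fit_centroid(X: Sequence[Vector], y: Sequence[int]) -> Classifier:
    """Nearest-centroid classifier: predict the class whose mean vector is closest."""
    sums: Dict[int, List[float]] = {}
    counts = Counter(y)
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
    centroids = {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}
    return lambda x: min(centroids, key=lambda lab: sqdist(centroids[lab], x))


def fit_stump(X: Sequence[Vector], y: Sequence[int]) -> Classifier:
    """Decision stump: the single (attribute, threshold) split with the best training accuracy."""
    best = (-1, 0, 0.0, y[0], y[0])  # (correct, attribute, threshold, left label, right label)
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            left = [lab for x, lab in zip(X, y) if x[j] <= t]
            right = [lab for x, lab in zip(X, y) if x[j] > t]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            correct = left.count(left_label) + right.count(right_label)
            if correct > best[0]:
                best = (correct, j, t, left_label, right_label)
    _, j, t, left_label, right_label = best
    return lambda x: left_label if x[j] <= t else right_label


def majority_vote(members: Sequence[Classifier], x: Vector) -> int:
    return Counter(member(x) for member in members).most_common(1)[0][0]


def homogeneous_ensemble(X, y, n_members: int = 5, seed: int = 0) -> List[Classifier]:
    """Same learner (1-NN) trained on different bootstrap samples (bagging)."""
    rng = random.Random(seed)
    members = []
    for _ in range(n_members):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        members.append(fit_knn([X[i] for i in idx], [y[i] for i in idx], k=1))
    return members


def heterogeneous_ensemble(X, y) -> List[Classifier]:
    """Different learners trained on the same data."""
    return [fit_knn(X, y, k=3), fit_centroid(X, y), fit_stump(X, y)]
```

In this sketch the homogeneous ensemble gets its diversity only from resampling, while the heterogeneous ensemble gets it from the different inductive biases of its members.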

Keywords:

ANFIS, classification, datasets, dimensionality reduction, enhanced KNN, ensemble classifier, feature selection, FRB, fuzzy classifier, preprocessing



Competing interests

The authors have no competing interests.

Open Access Policy

This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.


ISSN (Online):  2040-7467
ISSN (Print):   2040-7459
Copyright © 2024. MAXWELL Scientific Publication Corp., All rights reserved