Research Article | OPEN ACCESS
A Novel Ensemble Classifier based Classification on Large Datasets with Hybrid Feature Selection Approach
1J. Vandar Kuzhali and 2S. Vengataasalam
1Department of Computer Applications, Erode Sengunthar Engineering College, India
2Department of Mathematics, Kongu Engineering College, India
Research Journal of Applied Sciences, Engineering and Technology 2014 17:3633-3642
Received: November 21, 2013 | Accepted: December 18, 2013 | Published: May 05, 2014
Abstract
Exploring and analyzing large datasets has been an active research area in data mining over the last two decades. Several approaches are available in the literature for investigating large datasets comprising millions of records. The most important data mining steps involved in this task are preprocessing, feature selection and classification, each of which plays its own role in carrying out the task effectively. Most existing techniques suffer from high complexity and are computationally costly on large datasets. In particular, classification techniques do not provide consistent and reliable results for large datasets, which makes existing classification systems inefficient and unreliable. This study focuses on developing a novel and efficient framework for analyzing and classifying large datasets through ensemble classification. Initially, an efficient preprocessing approach based on enhanced KNN is applied, followed by feature selection based on a genetic algorithm integrated with Kernel PCA, which selects a subset of informative attributes or variables for constructing models of the data. Classification is then carried out on the selected features using the ensemble approach to obtain accurate results. The study presents two types of ensemble classifiers, homogeneous and heterogeneous, to evaluate the performance of the proposed system. Experimental results show that the proposed approach provides significant results for various large datasets.
Keywords:
ANFIS, classification, datasets, dimensionality reduction, enhanced KNN, ensemble classifier, feature selection, FRB, fuzzy classifier, preprocessing
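The distinction the abstract draws between homogeneous and heterogeneous ensembles can be illustrated with a minimal sketch. It is not the authors' implementation: the base learners here (plain k-nearest-neighbour and a nearest-centroid rule), the majority-vote combiner, the toy data and all function names are assumptions chosen only to keep the example self-contained. A homogeneous ensemble combines several instances of one algorithm with different settings, whereas a heterogeneous ensemble combines different algorithms.

```python
import math
from collections import Counter


def knn_classifier(train, k):
    """Build a predictor that returns the majority label of the k nearest training points."""
    def predict(x):
        nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]
    return predict


def centroid_classifier(train):
    """Build a predictor that returns the label of the closest class centroid."""
    groups = {}
    for point, label in train:
        groups.setdefault(label, []).append(point)
    centroids = {
        label: tuple(sum(coords) / len(points) for coords in zip(*points))
        for label, points in groups.items()
    }

    def predict(x):
        return min(centroids, key=lambda label: math.dist(centroids[label], x))
    return predict


def majority_vote(classifiers, x):
    """Combine base predictions by simple (unweighted) majority voting."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]


# Toy two-class data (illustrative only, not taken from the paper).
train = [
    ((0.0, 0.0), "a"), ((0.2, 0.1), "a"), ((0.1, 0.3), "a"),
    ((1.0, 1.0), "b"), ((0.9, 1.2), "b"), ((1.1, 0.8), "b"),
]

# Homogeneous ensemble: the same algorithm (k-NN) with different k.
homogeneous = [knn_classifier(train, k) for k in (1, 3, 5)]

# Heterogeneous ensemble: different algorithms voting together.
heterogeneous = [
    knn_classifier(train, 1),
    knn_classifier(train, 3),
    centroid_classifier(train),
]

print(majority_vote(homogeneous, (0.1, 0.1)))
print(majority_vote(heterogeneous, (1.0, 0.9)))
```

An odd number of voters on a two-class problem avoids ties in the vote; a weighted or fuzzy combiner would need a different aggregation step.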
Competing interests
The authors have no competing interests.
Open Access Policy
This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
ISSN (Online): 2040-7467
ISSN (Print): 2040-7459