Research Article | OPEN ACCESS
A Hybrid Classifier for Leukemia Gene Expression Data
1S. Jacophine Susmi, 1H. Khanna Nehemiah, 2A. Kannan and 1J. Jabez Christopher
1Ramanujan Computing Centre, Anna University, Chennai, 600025, India
2Department of Information Science and Technology, Anna University, Chennai, 600025, India
Research Journal of Applied Sciences, Engineering and Technology 2015 2:197-205
Received: October 22, 2014 | Accepted: December 18, 2014 | Published: May 20, 2015
Abstract
This study presents a hybrid technique for classifying leukemia gene expression data that combines two classifiers: an Input Discretized Neural Network (IDNN) and a Genetic Algorithm-based Neural Network (GANN). The leukemia microarray gene expression data are first preprocessed with Probabilistic Principal Component Analysis (PPCA) for dimension reduction. The dimension-reduced data are then passed to both classifiers. In the IDNN, fuzzy logic discretizes the gene data using linguistic labels, and the discretized input is used to train the neural network. The GANN performs feature selection: a subset of genes is selected by evaluating the fitness of each chromosome (candidate solution), and the subset with maximum fitness is used to train the neural network. The hybrid classifier is evaluated by presenting the test data to both trained neural networks simultaneously. It employs a distance-based classification in which a mathematical model predicts the class type from the distances between the IDNN and GANN output values and the median threshold. The performance of the hybrid classifier is compared with that of existing classification techniques, namely a neural network classifier, the IDNN and the GANN. The comparison shows that the hybrid classifier achieves an accuracy of 88.23% on the leukemia gene data.
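The distance-based combination step can be illustrated with a minimal sketch. The abstract does not give the mathematical model, so everything here is an assumption made for illustration: both networks are taken to emit a score in [0, 1], the median threshold is taken to be 0.5, and the hypothetical function `hybrid_predict` simply trusts whichever output lies farther from the threshold. It is not the authors' actual model.

```python
def hybrid_predict(idnn_out: float, gann_out: float, threshold: float = 0.5) -> int:
    """Combine two network outputs with a distance-to-threshold rule.

    Assumptions (not taken from the paper): outputs lie in [0, 1],
    the threshold is 0.5, and the class labels are 0 and 1.
    """
    # Distance of each network's output from the threshold.
    d_idnn = abs(idnn_out - threshold)
    d_gann = abs(gann_out - threshold)

    # Use the output that is farther from the threshold;
    # ties are resolved in favour of the IDNN (an arbitrary choice).
    chosen = idnn_out if d_idnn >= d_gann else gann_out

    # Classify by which side of the threshold the chosen output falls on.
    return 1 if chosen >= threshold else 0


if __name__ == "__main__":
    # IDNN is far above the threshold, GANN is only slightly below it.
    print(hybrid_predict(0.9, 0.4))
    # GANN is far below the threshold, IDNN is only slightly above it.
    print(hybrid_predict(0.55, 0.1))
```

Under these assumptions, a sample on which the two networks disagree is assigned the class of the network whose output is less ambiguous. Other combination rules, such as a weighted sum of the two distances, would fit the same description equally well.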
Keywords:
Classification, fuzzy logic, genetic algorithm, microarray gene expression, neural network, PPCA
Competing interests
The authors have no competing interests.
Open Access Policy
This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
ISSN (Online): 2040-7467
ISSN (Print): 2040-7459