A Hybrid Classifier for Leukemia Gene Expression Data

S. Jacophine Susmi; H. Khanna Nehemiah; A. Kannan; J. Jabez Christopher

doi:10.19026/rjaset.10.2572

Abstract

This study presents a hybrid technique for classifying leukemia gene expression data that combines two classifiers: an Input Discretized Neural Network (IDNN) and a Genetic Algorithm-based Neural Network (GANN). The leukemia microarray gene expression data are first preprocessed with Probabilistic Principal Component Analysis (PPCA) to reduce their dimension. The dimension-reduced data are then passed to both classifiers. In the IDNN, fuzzy logic discretizes the gene data into linguistic labels, and the discretized input is used to train the neural network. The GANN performs feature selection: a subset of genes is selected by evaluating the fitness of each chromosome (candidate solution), and the subset with maximum fitness is used to train the neural network. The hybrid classifier is evaluated by presenting the test data to both trained neural networks simultaneously. It applies a distance-based classification in which a mathematical model predicts the class type from the output values of the IDNN and the GANN, using the distance between each output and the median threshold. The performance of the hybrid classifier is compared with that of a plain neural network classifier, the IDNN alone and the GANN alone. In this comparison, the hybrid classifier attains an accuracy of 88.23% on the leukemia gene data.
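
The distance-based decision described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the mathematical model itself, so everything here is an assumption made for illustration: each network is taken to emit one score in [0, 1], the median threshold is taken to be 0.5, the classes are labeled 0 and 1, and the rule simply trusts whichever network's output lies farther from the threshold. The function name `hybrid_predict` and the sample scores are hypothetical and are not taken from the paper.

```python
def hybrid_predict(idnn_out, gann_out, threshold=0.5):
    """Combine two network outputs with a distance-to-threshold rule.

    Assumption (not from the paper): the output that lies farther from
    the threshold is treated as the more confident one, and its side of
    the threshold decides the class (1 if at or above, else 0).
    """
    d_idnn = abs(idnn_out - threshold)  # distance of the IDNN output
    d_gann = abs(gann_out - threshold)  # distance of the GANN output
    chosen = idnn_out if d_idnn >= d_gann else gann_out
    return 1 if chosen >= threshold else 0


# Hypothetical (IDNN, GANN) output pairs for three test samples.
samples = [(0.9, 0.4), (0.45, 0.1), (0.55, 0.48)]
predictions = [hybrid_predict(a, b) for a, b in samples]
print(predictions)
```

In the first pair the two networks disagree and the IDNN output is farther from 0.5, so its decision is used. A rule of this kind lets the more decisive network settle disagreements; the model used in the paper may weight the two distances differently.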

Keywords:

Classification, fuzzy logic, genetic algorithm, microarray gene expression, neural network, PPCA

Research Journal of Applied Sciences, Engineering and Technology
