     Research Journal of Applied Sciences, Engineering and Technology


A Hybrid Classifier for Leukemia Gene Expression Data

1S. Jacophine Susmi, 1H. Khanna Nehemiah, 2A. Kannan and 1J. Jabez Christopher
1Ramanujan Computing Centre, Anna University, Chennai, 600025, India
2Department of Information Science and Technology, Anna University, Chennai, 600025, India
Research Journal of Applied Sciences, Engineering and Technology  2015  2:197-205
http://dx.doi.org/10.19026/rjaset.10.2572  |  © The Author(s) 2015
Received: October 22, 2014  |  Accepted: December 18, 2014  |  Published: May 20, 2015

Abstract

This study presents a hybrid technique for classifying leukemia gene expression data that combines two classifiers: an Input Discretized Neural Network (IDNN) and a Genetic Algorithm-based Neural Network (GANN). The leukemia microarray gene expression data are first preprocessed with Probabilistic Principal Component Analysis (PPCA) for dimension reduction. The dimension-reduced data are then passed to both classifiers. In the IDNN, fuzzy logic discretizes the gene data using linguistic labels, and the discretized input is used to train the neural network. The GANN performs feature selection: a subset of genes is selected by evaluating the fitness of each chromosome (candidate solution), and the subset with maximum fitness is used to train the neural network. The hybrid classifier is evaluated by presenting the test data to both trained neural networks simultaneously. It employs a distance-based classification that uses a mathematical model to predict the class type: the model takes the output values of the IDNN and the GANN and predicts the class from the distances between these outputs and the median threshold. The performance of the hybrid classifier is compared with that of existing classification techniques, namely a neural network classifier, the IDNN and the GANN. The comparison shows that the hybrid classifier achieves an accuracy of 88.23% on leukemia gene data.
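
The abstract does not state the exact form of the distance-based model, so the following is only a minimal sketch of one plausible reading: each network emits a score in [0, 1], the network whose score lies farther from the threshold is treated as the more confident one, and the class is read from the side of the threshold on which that score falls. The function name `hybrid_predict`, the default threshold of 0.5 and the tie-breaking rule are all assumptions made for illustration, not details taken from the paper.

```python
def hybrid_predict(idnn_out: float, gann_out: float, threshold: float = 0.5) -> int:
    """Combine two network outputs with a distance-to-threshold rule.

    Hypothetical illustration: the threshold value and the tie-break
    (prefer the IDNN output when both distances are equal) are assumptions.
    Returns 1 if the selected output is at or above the threshold, else 0.
    """
    # Distance of each output from the decision threshold.
    d_idnn = abs(idnn_out - threshold)
    d_gann = abs(gann_out - threshold)

    # Trust the output that lies farther from the threshold.
    chosen = idnn_out if d_idnn >= d_gann else gann_out

    # Read the class from the side of the threshold the chosen output is on.
    return 1 if chosen >= threshold else 0


if __name__ == "__main__":
    # IDNN is far above the threshold, GANN is only slightly below it.
    print(hybrid_predict(0.9, 0.4))
    # GANN is far below the threshold, IDNN is only slightly above it.
    print(hybrid_predict(0.55, 0.1))
```

Under this reading, a disagreement between the two networks is resolved in favour of whichever output is farther from the threshold; other combination rules (for example, summing signed distances) would fit the abstract's description equally well.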

Keywords:

Classification, fuzzy logic, genetic algorithm, microarray gene expression, neural network, PPCA


Competing interests

The authors have no competing interests.

Open Access Policy

This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Copyright

© The Author(s) 2015

ISSN (Online):  2040-7467
ISSN (Print):   2040-7459
Copyright © 2024. MAXWELL Scientific Publication Corp., All rights reserved