Research Article | OPEN ACCESS
Fuzzy Discretization based Classification of Medical Data
M. Shanmugapriya (1), H. Khanna Nehemiah (1), R.S. Bhuvaneswaran (1), Kannan Arputharaj (2) and J. Dhalia Sweetlin (1)
(1) Ramanujan Computing Centre
(2) Department of Information Science and Technology, Anna University, Chennai-600025, India
Research Journal of Applied Sciences, Engineering and Technology 2017 8:291-298
Received: December 22, 2016 | Accepted: April 11, 2017 | Published: August 15, 2017
Abstract
Discretization is a commonly used data preprocessing technique for improving the efficiency of knowledge extraction from clinical data. Clinical data generally contains numeric attributes with continuous values, and discretization simplifies the original data by transforming these continuous values into a finite set of intervals. Crisp discretization is not always an appropriate way to handle continuous attributes, however: attribute values may be vague or imprecise, or may have multiple distributions with different classes, which makes mining clinical data difficult. Hence, there is a need for fuzzy discretization to preprocess clinical data before mining. The aim of this study is to derive fuzzy discretization from crisp-interval discretization using a geometric approach for constructing fuzzy sets, in which the overlapping region between fuzzy sets is represented as a geometric area. The study comprises three steps. First, non-overlapping fuzzy sets are constructed from the intervals generated by crisp-interval discretization. Second, the area of overlap between the fuzzy sets is computed using the geometric approach, and an average area of overlap is estimated. Third, the fuzzy sets are redesigned based on the estimated average area of overlap. Fuzzy discretizations with three, five and seven intervals were examined using the Pima Indian Diabetes (PID) dataset and the BUPA Liver Disorder (BLD) dataset, both taken from the University of California Irvine machine learning repository. The variation in performance between crisp and fuzzy discretization methods is measured using six classification approaches (tree-based, probabilistic induction-based, rule-based, network learning, kernel-based and distance-based) and a rule-based fuzzy inference system. The results show that classification accuracy remains stable, with little deviation across the different classifiers and interval counts.
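The three steps can be illustrated with a minimal sketch. It is not the authors' construction: the abstract does not specify the membership-function shape or how the average overlap is estimated, so the sketch assumes equal-width crisp intervals, trapezoidal fuzzy sets, and a single hypothetical overlap width `w` applied at every interior cut point. All function names are likewise illustrative. Under these assumptions the overlap between two adjacent sets is a triangle of base `w` and height 0.5, so its area is `w / 4`.

```python
from typing import List, Tuple

# (a, b, c, d): feet at a and d, shoulders at b and c, with a <= b <= c <= d.
Trapezoid = Tuple[float, float, float, float]


def equal_width_cuts(lo: float, hi: float, k: int) -> List[float]:
    """Boundaries of k equal-width crisp intervals on [lo, hi] (assumed discretizer)."""
    step = (hi - lo) / k
    return [lo + i * step for i in range(k + 1)]


def fuzzy_sets(cuts: List[float], w: float) -> List[Trapezoid]:
    """Turn crisp intervals into trapezoids that overlap by width w at interior cuts.

    Assumes w is no larger than the narrowest interval. With w = 0 the sets
    reduce to the original non-overlapping crisp intervals.
    """
    k = len(cuts) - 1
    half = w / 2.0
    sets = []
    for i in range(k):
        left, right = cuts[i], cuts[i + 1]
        # The outermost sets keep a vertical edge at the domain boundary.
        a = left - half if i > 0 else left
        b = left + half if i > 0 else left
        c = right - half if i < k - 1 else right
        d = right + half if i < k - 1 else right
        sets.append((a, b, c, d))
    return sets


def membership(x: float, t: Trapezoid) -> float:
    """Membership grade of x in trapezoid t."""
    a, b, c, d = t
    if b <= x <= c:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if c < x < d:
        return (d - x) / (d - c)
    return 0.0


def overlap_area(w: float) -> float:
    """Area under min(falling edge, rising edge): a triangle of base w, height 0.5."""
    return 0.5 * w * 0.5


cuts = equal_width_cuts(0.0, 30.0, 3)
sets = fuzzy_sets(cuts, w=4.0)
grades = [membership(10.0, t) for t in sets]  # value sitting exactly on a cut point
```

In this sketch a value on an interior cut point belongs equally to the two neighbouring sets, and the grades of adjacent sets sum to 1 everywhere inside the overlap. Redesigning the sets for a different target overlap area amounts to choosing `w` as four times that area, again only under the trapezoid assumption made here.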
Keywords:
Classification, fuzzy discretization, fuzzy set, interval discretization, membership function, overlapping area
Competing interests
The authors have no competing interests.
Open Access Policy
This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
ISSN (Online): 2040-7467
ISSN (Print): 2040-7459