Research Article | OPEN ACCESS
Performance Comparison of Clustering Techniques
Sambourou Massinanke and Lu Zhimao
College of Information and Communication Engineering, Harbin Engineering University, Harbin, China
Research Journal of Applied Sciences, Engineering and Technology 2014 5:963-969
Received: January 29, 2013 | Accepted: March 14, 2013 | Published: February 05, 2014
Abstract
Data mining consists of extracting, or “mining”, information from large quantities of data. Clustering is one of the most significant research areas in data mining. It groups objects based on their features so that objects in the same group are similar and objects in different groups are dissimilar. This study reviews two representative clustering algorithms: k-modes and k-medoids. Both algorithms are tested and evaluated on partitioning Y-STR data, and are compared on the following factors: results over a certain number of runs, precision and recall. The overall results show that k-modes clustering performs better than k-medoids clustering on Y-STR data.
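As a rough illustration of how the two algorithms named in the abstract differ, the sketch below clusters a handful of categorical records with both update rules. It is not the authors' implementation: the function names are hypothetical, the marker values are made-up toy data rather than real Y-STR haplotypes, and the simple matching dissimilarity (a count of mismatched attributes) is an assumption about the distance used.

```python
from collections import Counter

def mismatch(a, b):
    """Simple matching dissimilarity: number of positions where a and b differ."""
    return sum(x != y for x, y in zip(a, b))

def assign(points, centers):
    """Index of the nearest center for each point (ties go to the lowest index)."""
    return [min(range(len(centers)), key=lambda j: mismatch(p, centers[j]))
            for p in points]

def mode_of(members):
    """k-modes update: attribute-wise most frequent value.
    The result need not coincide with any actual data point."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*members))

def medoid_of(members):
    """k-medoids update: the actual member with the smallest total
    dissimilarity to all members of its cluster."""
    return min(members, key=lambda c: sum(mismatch(c, p) for p in members))

def cluster(points, init_centers, update, max_iter=100):
    """Alternate assignment and center update until the centers stop changing."""
    centers = list(init_centers)
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old center if a cluster ends up empty.
            new_centers.append(update(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return assign(points, centers), centers

# Toy records: each tuple is a hypothetical set of repeat counts at four markers.
points = [
    (14, 13, 29, 24), (14, 13, 29, 23), (14, 13, 30, 24),
    (16, 12, 25, 21), (16, 12, 25, 22), (15, 12, 25, 21),
]
init = [points[0], points[3]]  # assumed initial centers, chosen by hand

modes_labels, modes_centers = cluster(points, init, mode_of)
medoid_labels, medoid_centers = cluster(points, init, medoid_of)
print(modes_labels, modes_centers)
print(medoid_labels, medoid_centers)
```

The only difference between the two runs is the update step: k-modes builds a synthetic center from per-attribute modes, while k-medoids restricts the center to an existing record. On data this small and well separated both give the same partition; differences in precision and recall of the kind the abstract reports would only show up on realistic data.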
Keywords:
Data clustering, k-medoids clustering, k-modes clustering, Y-STR data
Competing interests
The authors have no competing interests.
Open Access Policy
This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
ISSN (Online): 2040-7467
ISSN (Print): 2040-7459