Research Article | OPEN ACCESS
Outlier Removal Approach as a Continuous Process in Basic K-Means Clustering Algorithm
1Dauda Usman and 2Ismail Bin Mohamad
1, 2Department of Mathematical Sciences, Faculty of Science, Universiti
Teknologi Malaysia, 81310, UTM Johor Bahru, Johor Darul Ta’azim, Malaysia
Research Journal of Applied Sciences, Engineering and Technology 2014 4:771-777
Received: April 04, 2013 | Accepted: April 22, 2013 | Published: January 27, 2014
Abstract
Clustering is used to place similar data items in the same group. K-means clustering is a commonly used clustering approach that starts from randomly selected initial centroids. However, the existing method does not consider data preprocessing, which is an important task before clustering data drawn from different databases. This study proposes a new approach to the k-means clustering algorithm. Experimental analysis shows that the proposed method performs well on an infectious disease data set when compared with the conventional k-means clustering method.
Keywords:
Infectious diseases, k-means clustering, principal component analysis, principal components, standardization
Competing interests
The authors have no competing interests.
Open Access Policy
This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
ISSN (Online): 2040-7467
ISSN (Print): 2040-7459