     Research Journal of Applied Sciences, Engineering and Technology


Outlier Removal Approach as a Continuous Process in Basic K-Means Clustering Algorithm

Dauda Usman and Ismail Bin Mohamad
Department of Mathematical Sciences, Faculty of Science, Universiti Teknologi Malaysia, 81310, UTM Johor Bahru, Johor Darul Ta’azim, Malaysia
Research Journal of Applied Sciences, Engineering and Technology  2014  4:771-777
http://dx.doi.org/10.19026/rjaset.7.315  |  © The Author(s) 2014
Received: April 04, 2013  |  Accepted: April 22, 2013  |  Published: January 27, 2014

Abstract

Clustering is used to place similar data items in the same group. K-means clustering is a commonly used clustering approach that starts from randomly selected initial centroids. However, the existing method does not consider data preprocessing, which is an important task before clustering is executed on different databases. This study proposes a new approach to the k-means clustering algorithm. Experimental analysis shows that the proposed method performs well on an infectious disease data set when compared with the conventional k-means clustering method.
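
As a rough illustration of the conventional method the abstract refers to, the sketch below runs basic k-means with initial centroids drawn at random from the data, after a z-score standardization step. It is not the authors' proposed procedure, which this page does not describe. The function names `standardize` and `kmeans`, the `seed` and `max_iter` parameters, and the toy data are all assumptions made for the example.

```python
import random
from math import dist, sqrt


def standardize(rows):
    """Z-score each column; constant columns are mapped to 0.0."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [sqrt(sum((v - m) ** 2 for v in c) / len(c)) for c, m in zip(cols, means)]
    return [
        tuple((v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stds))
        for row in rows
    ]


def kmeans(points, k, seed=0, max_iter=100):
    """Basic k-means with initial centroids drawn at random from the data."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


# Toy data (made up): two well-separated groups measured on very different scales.
data = [(1.0, 100.0), (1.2, 110.0), (0.9, 95.0),
        (8.0, 900.0), (8.3, 880.0), (7.9, 910.0)]
labels, centroids = kmeans(standardize(data), k=2)
```

Standardizing first keeps the second column, whose raw values are about a hundred times larger, from dominating the Euclidean distances. Because the starting centroids are random, different seeds can give different results on less clearly separated data.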

Keywords:

Infectious diseases, k-means clustering, principal component analysis, principal components, standardization


Competing interests

The authors have no competing interests.

Open Access Policy

This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

ISSN (Online):  2040-7467
ISSN (Print):   2040-7459
Copyright © 2024. MAXWELL Scientific Publication Corp., All rights reserved