
     Research Journal of Applied Sciences, Engineering and Technology


Rough K-means Outlier Factor Based on Entropy Computation

Djoko Budiyanto Setyohadi, Azuraliza Abu Bakar and Zulaiha Ali Othman
Data Mining and Optimization Research Group, Center for Artificial Intelligence Technology, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Bangi, Selangor Darul Ehsan, 43000, Malaysia
Research Journal of Applied Sciences, Engineering and Technology  2014  3:398-409
http://dx.doi.org/10.19026/rjaset.8.986  |  © The Author(s) 2014
Received: March 29, 2014  |  Accepted: April 28, 2014  |  Published: July 15, 2014

Abstract

Many outlier detection methods follow the cluster-based approach, since it needs no prior knowledge of the dataset. However, previous studies compute the outlier factor only for a single point or a small cluster, as a measure of its deviation from a common cluster. Furthermore, all objects within an outlier cluster are assumed to be similar. Intuitively, outlier objects can be grouped into outlier clusters, and the outlier factors of the objects within an outlier cluster should differ gradually; it is unnatural for every object in an outlier cluster to have the same outlierness. This study proposes a new outlier detection method that hybridizes the Rough K-Means clustering algorithm with entropy computation. We introduce an outlier degree measure, the entropy outlier factor, for cluster-based outlier detection. The proposed algorithm first finds the outlier clusters and then calculates the outlier factor of each object within them. Each object within an outlier cluster is evaluated with a cluster-based entropy measure relative to the whole cluster. The algorithm was tested on four UCI benchmark data sets and shows superior performance, particularly in detection rate.
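
The clustering step named in the abstract can be illustrated with a minimal sketch of one rough k-means assignment pass, in which each object falls either into the lower approximation of exactly one cluster or into the upper approximations of several. This is not the authors' implementation: the function name rough_assign, the distance-ratio threshold ratio (some variants use a distance difference instead) and the toy data are assumptions made for illustration. The entropy outlier factor is not sketched, because the abstract does not state its formula.

```python
import math


def rough_assign(points, centroids, ratio=1.3):
    """One rough k-means assignment pass (illustrative sketch).

    Returns (lower, upper): two lists of sets of point indices, one set
    per cluster. `ratio` is a hypothetical threshold: a centroid whose
    distance is within `ratio` times the nearest distance also claims
    the point, which makes the point a boundary object.
    """
    k = len(centroids)
    lower = [set() for _ in range(k)]
    upper = [set() for _ in range(k)]
    for i, p in enumerate(points):
        dists = [math.dist(p, c) for c in centroids]
        nearest = min(range(k), key=dists.__getitem__)
        # Other clusters that are almost as close as the nearest one.
        close = [j for j in range(k)
                 if j != nearest and dists[j] <= ratio * dists[nearest]]
        upper[nearest].add(i)
        if close:
            # Ambiguous object: upper approximation of every close cluster,
            # lower approximation of none.
            for j in close:
                upper[j].add(i)
        else:
            # Unambiguous object: lower approximation of its nearest cluster.
            lower[nearest].add(i)
    return lower, upper


# Toy data (assumed): two tight groups and one point midway between them.
points = [(0, 0), (0.2, 0), (5, 5), (5.1, 5), (2.5, 2.5)]
centroids = [(0, 0), (5, 5)]
lower, upper = rough_assign(points, centroids)
print(lower)  # [{0, 1}, {2, 3}]
print(upper)  # [{0, 1, 4}, {2, 3, 4}]
```

In the toy data, the midpoint (index 4) is equidistant from both centroids, so it belongs to both upper approximations and to neither lower approximation. A full rough k-means run would alternate this pass with a centroid update that weights lower-approximation and boundary objects differently.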

Keywords:

Entropy outlier, outlier detection, rough k-means



Competing interests

The authors have no competing interests.

Open Access Policy

This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.


ISSN (Online):  2040-7467
ISSN (Print):   2040-7459
Copyright © 2024. MAXWELL Scientific Publication Corp., All rights reserved