
     Research Journal of Applied Sciences, Engineering and Technology


Distributed Anomaly Detection Over Big Data

Mohamed Sakr, Walid Atwa and Arabi Keshk
Faculty of Computers and Information, Shebeen El Kom, Menofia, 32511, Egypt
Research Journal of Applied Sciences, Engineering and Technology  2019  2:77-87
http://dx.doi.org/10.19026/rjaset.16.6003  |  © The Author(s) 2019
Received: December 27, 2018  |  Accepted: February 17, 2019  |  Published: March 15, 2019

Abstract

This study addresses the problem of detecting anomalies in big data. A Border-based Grid Partition (BGP) algorithm was proposed to calculate the Local Outlier Factor (LOF) for big data in a distributed environment. It splits the data into intersecting subsets and allocates these subsets to the slave nodes, with some parts of the subsets replicated between slave nodes. Each slave node calculates the LOF for the subsets that it owns. The data are split on a grid basis without considering the size of the data assigned to each slave node, so the BGP algorithm produces an unbalanced distribution of the subsets among the slave nodes. To overcome this problem, a modification of the BGP algorithm is proposed that takes into consideration the size of the data assigned to every slave node. The modified algorithm is called the Balanced Border-based Grid Partition (BBGP) algorithm. BBGP splits the data equally among the slave nodes, so that all slave nodes perform a balanced share of the processing when calculating the LOF. Finally, we evaluate the performance of the two algorithms through a series of simulation experiments over real data sets.
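
The contrast between grid-based and size-aware splitting, and the per-subset LOF step, can be illustrated with a small sketch. This is not the authors' implementation: the function names `grid_split`, `balanced_split`, and `lof_scores` are hypothetical, the split is shown along a single axis only, border replication is omitted, and the LOF routine is a brute-force version of the standard definition intended for small inputs.

```python
import math

def grid_split(values, parts):
    """Equal-width cells along one axis (assumed stand-in for a grid split).
    Cell sizes follow the data density, so they can be very uneven."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / parts or 1.0  # avoid a zero width when all values are equal
    cells = [[] for _ in range(parts)]
    for v in values:
        idx = min(int((v - lo) / width), parts - 1)  # clamp the maximum into the last cell
        cells[idx].append(v)
    return cells

def balanced_split(values, parts):
    """Equal-count cells (assumed stand-in for a size-aware split):
    sort the values, then cut them into chunks of nearly equal size."""
    ordered = sorted(values)
    n = len(ordered)
    bounds = [i * n // parts for i in range(parts + 1)]
    return [ordered[bounds[i]:bounds[i + 1]] for i in range(parts)]

def lof_scores(points, k):
    """Brute-force Local Outlier Factor for a small list of points.
    Assumes k < len(points) and no exact duplicate points."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    k_dist, neighbors = [], []
    for i in range(n):
        others = sorted((dist[i][j], j) for j in range(n) if j != i)
        kd = others[k - 1][0]  # distance to the k-th nearest neighbor
        k_dist.append(kd)
        # the k-neighborhood keeps ties at the k-distance
        neighbors.append([j for d, j in others if d <= kd])
    # local reachability density: inverse of the mean reachability distance
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbors[i]]
        lrd.append(len(reach) / sum(reach))
    # LOF: mean ratio of the neighbors' density to the point's own density
    return [sum(lrd[j] for j in neighbors[i]) / (len(neighbors[i]) * lrd[i])
            for i in range(n)]

# Skewed data: one far-away value stretches the grid.
values = [0, 1, 2, 3, 4, 5, 6, 7, 100]
print([len(c) for c in grid_split(values, 2)])      # [8, 1]
print([len(c) for c in balanced_split(values, 2)])  # [4, 5]

# Four points on a unit square plus one distant point.
print(lof_scores([(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)], k=2))
```

In the skewed example the equal-width split sends almost all values to one cell, while the equal-count split gives each cell a similar load. In the LOF example the four square corners score 1 and the distant point scores well above 1. A point near a cell boundary may have neighbors in the adjacent cell, which is why a distributed variant needs some border data replicated between nodes before each node computes LOF locally.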

Keywords:

Anomaly detection, big data, distributed environment, local outlier factor, outlier detection



Competing interests

The authors have no competing interests.

Open Access Policy

This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Copyright

© The Author(s) 2019.

ISSN (Online):  2040-7467
ISSN (Print):   2040-7459
Copyright © 2024. MAXWELL Scientific Publication Corp., All rights reserved