     Research Journal of Applied Sciences, Engineering and Technology


K-Means Clustering Scheme for Enhanced Spam Detection

Nadir Omer Fadl Elssied¹,² and Othman Ibrahim¹
¹Faculty of Computing, University Technology Malaysia, 81310, Skudai, Johor Bahru, Malaysia
²Algeraf Sharq Technical College, Khartoum, Sudan
Research Journal of Applied Sciences, Engineering and Technology, 2014, 10: 1940-1952
http://dx.doi.org/10.19026/rjaset.7.486  |  © The Author(s) 2014
Received: May 01, 2013  |  Accepted: June 22, 2013  |  Published: March 15, 2014

Abstract

In recent years, the growing volume of spam mail on the Internet has become a serious problem, and spam remains difficult to detect. Several e-mail classification methods have been proposed and their performance evaluated. Naïve Bayes (NB) classifiers are simple and efficient and have been widely used for e-mail classification, yet improving their accuracy and reducing their misclassification rate remain open problems that continue to attract research. This study proposes a hybrid scheme for e-mail classification based on Naïve Bayes and K-means clustering, aimed at obtaining better accuracy and a lower misclassification rate in spam detection. The feasibility of the proposed scheme was evaluated experimentally on the Spambase benchmark dataset. The hybrid enhances the Naïve Bayes classifier, increasing spam detection accuracy and reducing the misclassification rate. Experimental results on the Spambase dataset show that the enhanced Naïve Bayes (KNavie) significantly outperforms Naïve Bayes and many other recent spam detection methods.
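The abstract does not specify how K-means and Naïve Bayes are combined. The sketch below assumes one plausible arrangement, which is hypothetical and not necessarily the authors' scheme: K-means partitions the training e-mails, a separate Gaussian Naïve Bayes model is trained on each cluster, and a new message is classified by the model belonging to its nearest centroid. Accuracy is the fraction of correctly labelled messages and the misclassification rate is 1 - accuracy. The names `KMeansNB`, `var_smoothing` and the toy data are illustrative assumptions; Spambase itself is not loaded here.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))


def kmeans(points, k, iters=100, seed=0):
    """Lloyd-style K-means; returns (centroids, cluster label per point)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]  # assignment step
        new = []
        for c in range(k):  # update step: centroid = mean of its members
            members = [p for p, lab in zip(points, labels) if lab == c]
            new.append([sum(col) / len(members) for col in zip(*members)]
                       if members else centroids[c])
        if new == centroids:  # converged: assignments no longer change
            break
        centroids = new
    return centroids, [nearest(p, centroids) for p in points]


class GaussianNB:
    """Gaussian Naive Bayes: features assumed independent given the class."""

    def fit(self, X, y, var_smoothing=1e-6):
        self.stats = {}
        for c in sorted(set(y)):
            rows = [x for x, t in zip(X, y) if t == c]
            means = [sum(col) / len(rows) for col in zip(*rows)]
            variances = [sum((v - m) ** 2 for v in col) / len(rows) + var_smoothing
                         for col, m in zip(zip(*rows), means)]
            self.stats[c] = (math.log(len(rows) / len(X)), means, variances)
        return self

    def predict_one(self, x):
        def log_posterior(c):  # log P(c) + sum_i log N(x_i; mean_ci, var_ci)
            log_prior, means, variances = self.stats[c]
            return log_prior + sum(
                -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
                for xi, m, v in zip(x, means, variances))
        return max(self.stats, key=log_posterior)


class KMeansNB:
    """Hypothetical hybrid: one Gaussian NB model per K-means cluster."""

    def __init__(self, k=2, seed=0):
        self.k, self.seed = k, seed

    def fit(self, X, y):
        self.centroids, labels = kmeans(X, self.k, seed=self.seed)
        self.models = []
        for c in range(self.k):
            idx = [i for i, lab in enumerate(labels) if lab == c]
            self.models.append(
                GaussianNB().fit([X[i] for i in idx], [y[i] for i in idx]) if idx else None)
        self.fallback = GaussianNB().fit(X, y)  # used if a cluster ends up empty
        return self

    def predict(self, X):
        # Route each message to the NB model of its nearest centroid.
        return [(self.models[nearest(x, self.centroids)] or self.fallback).predict_one(x)
                for x in X]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Toy two-feature data (1 = spam, 0 = ham); NOT the Spambase dataset.
    X = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
    y = [0, 0, 0, 1, 1, 1]
    model = KMeansNB(k=2).fit(X, y)
    acc = accuracy(y, model.predict(X))
    print(f"accuracy={acc:.2f}, misclassification rate={1 - acc:.2f}")
```

Training one Naïve Bayes model per cluster lets each model fit a more homogeneous region of the feature space, which is one way clustering could reduce misclassification; on real data such as Spambase, `k` and the smoothing term would need tuning.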

Keywords:

K-means clustering, machine learning, Naïve Bayes, spam detection


Competing interests

The authors have no competing interests.

Open Access Policy

This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

ISSN (Online):  2040-7467
ISSN (Print):   2040-7459
Copyright © 2024. MAXWELL Scientific Publication Corp., All rights reserved