Privacy Preserving Probabilistic Possibilistic Fuzzy C Means Clustering

V.S. Thiyagarajan; Venkatachalapathy

doi:10.19026/rjaset.11.1672

Research Journal of Applied Sciences, Engineering and Technology

Research Article | OPEN ACCESS

¹V.S. Thiyagarajan and ²Venkatachalapathy

¹Annamalai University, Chidhambaram, India
²Department of Computer Science and Engineering, Faculty of Engineering and Technology, Annamalai University, Chidhambaram, India

Research Journal of Applied Sciences, Engineering and Technology 2015 1:27-39

http://dx.doi.org/10.19026/rjaset.11.1672 | © The Author(s) 2015

Received: December 14, 2014 | Accepted: April 13, 2015 | Published: September 05, 2015

Abstract

With the uncontrolled growth of data, clustering plays a major role in partitioning a dataset into small sets so that the relevant processing can be carried out within each set. Privacy and security become especially important when the data is large and is distributed to other sources for various purposes, so privacy preservation should be applied before the data is distributed. In this study, the proposed algorithm addresses both requirements: clustering accuracy and privacy preservation of the data. First, the whole dataset is divided into small segments. Next, the best sets of attribute combinations are found through an attribute weighting process, which achieves privacy preservation through vertical partitioning. The proposed Probabilistic Possibilistic Fuzzy C Means (PPFCM) clustering algorithm is then applied to each segment, producing a number of clusters per segment. PPFCM is then applied again to the centroids of those clusters. Finally, the data tuples corresponding to the grouped centroids are joined to obtain the final clustering result. The implementation is done in Java, and the performance of the proposed PPFCM algorithm is compared with possibilistic FCM and a probabilistic clustering algorithm on benchmark datasets.
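The two-stage flow described in the abstract (cluster each segment, cluster the resulting centroids, then join the tuples behind each grouped centroid) can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: standard fuzzy c-means stands in for PPFCM, whose update rules are not given here, and the attribute weighting and vertical partitioning step is omitted. The class name, method names, seeding rule, fuzzifier m = 2 and iteration count are all hypothetical choices made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TwoStageClusteringSketch {

    // Squared Euclidean distance, with a tiny offset to avoid division by zero.
    static double dist2(double[] a, double[] b) {
        double s = 1e-12;
        for (int j = 0; j < a.length; j++) s += (a[j] - b[j]) * (a[j] - b[j]);
        return s;
    }

    // Index of the centroid nearest to p (the same centroid has the highest FCM membership).
    static int nearest(double[] p, double[][] v) {
        int best = 0;
        for (int i = 1; i < v.length; i++)
            if (dist2(p, v[i]) < dist2(p, v[best])) best = i;
        return best;
    }

    // Standard fuzzy c-means, used here only as a stand-in for PPFCM. Requires c <= x.length.
    static double[][] fcm(double[][] x, int c, double m, int iters) {
        int n = x.length, dim = x[0].length;
        double[][] v = new double[c][];
        for (int i = 0; i < c; i++)                    // assumed seeding: evenly spaced rows
            v[i] = x[c == 1 ? 0 : i * (n - 1) / (c - 1)].clone();
        for (int it = 0; it < iters; it++) {
            double[][] num = new double[c][dim];
            double[] den = new double[c];
            for (double[] p : x) {
                for (int i = 0; i < c; i++) {
                    double sum = 0;                    // u = 1 / sum_j (d_i^2 / d_j^2)^(1/(m-1))
                    for (int j = 0; j < c; j++)
                        sum += Math.pow(dist2(p, v[i]) / dist2(p, v[j]), 1.0 / (m - 1));
                    double w = Math.pow(1.0 / sum, m); // u^m weights the centroid update
                    den[i] += w;
                    for (int k = 0; k < dim; k++) num[i][k] += w * p[k];
                }
            }
            for (int i = 0; i < c; i++)
                for (int k = 0; k < dim; k++) v[i][k] = num[i][k] / den[i];
        }
        return v;
    }

    // Stage 1: cluster each segment. Stage 2: cluster the centroids. Join: label each tuple.
    static int[] cluster(double[][] data, int segSize, int cSeg, int cFinal) {
        List<double[]> centroids = new ArrayList<>();
        int[] owner = new int[data.length];            // stage-1 centroid index of each tuple
        for (int start = 0; start < data.length; start += segSize) {
            int end = Math.min(start + segSize, data.length);
            double[][] seg = Arrays.copyOfRange(data, start, end);
            double[][] v = fcm(seg, Math.min(cSeg, seg.length), 2.0, 50);
            for (int k = start; k < end; k++)
                owner[k] = centroids.size() + nearest(data[k], v);
            centroids.addAll(Arrays.asList(v));
        }
        double[][] all = centroids.toArray(new double[0][]);
        double[][] finalV = fcm(all, Math.min(cFinal, all.length), 2.0, 50);
        int[] labels = new int[data.length];
        for (int k = 0; k < data.length; k++)          // tuple inherits its centroid's group
            labels[k] = nearest(all[owner[k]], finalV);
        return labels;
    }

    public static void main(String[] args) {
        // Toy data: two well-separated groups, interleaved so each segment sees both.
        double[][] data = {{0, 0}, {10, 10}, {0, 1}, {10, 11},
                           {1, 0}, {11, 10}, {1, 1}, {11, 11}};
        System.out.println(Arrays.toString(cluster(data, 4, 2, 2)));
    }
}
```

Only the segment centroids are passed to the second stage, so the second clustering run never touches the raw tuples; they are reattached to the grouped centroids only in the final join.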

Keywords:

Clustering, possibilistic fuzzy C means clustering, privacy preserving, probabilistic clustering

Competing interests

The authors have no competing interests.

Open Access Policy

This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

ISSN (Online): 2040-7467
ISSN (Print): 2040-7459
