     Research Journal of Applied Sciences, Engineering and Technology


Spatial Clustering Algorithm for Time Series Rainfall Data Using X-Means Data Splitting

1Noor Rasidah Ali and 2Ku Ruhana Ku-Mahamud
1Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA Cawangan Kedah, Kampus Sungai Petani, Malaysia
2School of Computing, College of Arts and Sciences, Universiti Utara Malaysia, Malaysia
Research Journal of Applied Sciences, Engineering and Technology  2017  6:221-226
http://dx.doi.org/10.19026/rjaset.14.4720  |  © The Author(s) 2017
Received: November 21, 2016  |  Accepted: February 13, 2017  |  Published: June 15, 2017

Abstract

The aim of this study is to present a new spatial clustering process for time series data. Time series clustering has become an important and demanding application when the data involve long chronological series and huge datasets. A major challenge is to find an optimal solution when searching for similarity along the series, which also requires very large-scale data analysis. Existing time series clustering algorithms become impractical because they do not scale well to longer series. Performance degrades further when an algorithm relies on the actual (untransformed) data, and many clustering algorithms have difficulty handling high-dimensional data. For spatial time series, the problem can be addressed with unsupervised approaches rather than supervised classification, provided that appropriate preprocessing techniques are used to transform the actual data. Unsupervised time series clustering is capable of extracting valuable information and identifying structure in complex and massive datasets such as spatial time series. Therefore, a clustering algorithm that introduces data transformation using X-means data splitting is proposed to investigate the spatial homogeneity of time series rainfall data. Hierarchical clustering was used to demonstrate the similarity once the data had been divided into training and testing sets. The proposed algorithm is compared with five data transformation techniques: mean and median on monthly data, and binary, cumulative and actual values on daily data. Results indicate that data transformation using X-means data splitting in hierarchical clustering outperformed the other transformation techniques and was more consistent between the training and testing datasets, based on similarity measures.
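
The comparison described above can be illustrated with a minimal sketch. It covers only the baseline transformations named in the abstract (binary, cumulative, monthly mean and monthly median) followed by a simple agglomerative clustering on training and testing halves. It does not implement the proposed X-means data splitting, whose details are not given here. The station names, rainfall values, wet-day threshold, Euclidean distance, average linkage and the half-and-half split are all assumptions chosen for illustration, not the authors' settings.

```python
from itertools import accumulate
from statistics import mean, median


def to_binary(daily, wet_threshold=0.0):
    """Binary transform: 1 for a wet day (rain above threshold), else 0."""
    return [1 if x > wet_threshold else 0 for x in daily]


def to_cumulative(daily):
    """Cumulative transform: running total of daily rainfall."""
    return list(accumulate(daily))


def to_monthly(daily, days_per_month, stat=mean):
    """Monthly transform: apply `stat` (mean or median) to each month."""
    out, start = [], 0
    for n in days_per_month:
        out.append(stat(daily[start:start + n]))
        start += n
    return out


def euclidean(a, b):
    """Euclidean distance between two equal-length series (assumed measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def agglomerate(vectors, n_clusters):
    """Average-linkage agglomerative clustering; returns lists of indices."""
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1, c2):
        return mean([euclidean(vectors[i], vectors[j]) for i in c1 for j in c2])

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge them.
        a, b = min(
            ((p, q) for p in range(len(clusters))
             for q in range(p + 1, len(clusters))),
            key=lambda pq: linkage(clusters[pq[0]], clusters[pq[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical daily rainfall (mm) for three stations over six days.
stations = {
    "A": [0.0, 5.0, 12.0, 0.0, 3.0, 8.0],
    "B": [0.0, 6.0, 11.0, 0.0, 2.0, 9.0],
    "C": [20.0, 0.0, 0.0, 15.0, 0.0, 0.0],
}

# Split each series into a training half and a testing half (assumed split),
# transform each half, then cluster the stations on each half separately.
half = 3
train = [to_cumulative(s[:half]) for s in stations.values()]
test = [to_cumulative(s[half:]) for s in stations.values()]

train_clusters = agglomerate(train, n_clusters=2)
test_clusters = agglomerate(test, n_clusters=2)

# A transformation is "consistent" here if both halves give the same grouping.
print("train:", train_clusters)
print("test: ", test_clusters)
print("consistent:", train_clusters == test_clusters)
```

Swapping `to_cumulative` for `to_binary`, or for `to_monthly` with `stat=mean` or `stat=median`, gives the other baseline transformations for the same train/test check.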

Keywords:

Clustering algorithm, similarity measures, spatial homogeneity, spatial time series, X-means data splitting


Competing interests

The authors have no competing interests.

Open Access Policy

This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

ISSN (Online):  2040-7467
ISSN (Print):   2040-7459
Copyright © 2024. MAXWELL Scientific Publication Corp., All rights reserved