An efficient clustering algorithm for spatial datasets with noise

Akash Nag; Sunil Karforma

doi:10.5815/ijmecs.2018.07.03

Journal: International Journal of Modern Education and Computer Science

Article in issue: Vol. 10, No. 7, 2018

Clustering is the technique of finding useful patterns in a dataset by grouping similar data items. It is an active research area with many algorithms available, but in practice most of them do not handle noise efficiently. Real-world data are prone to noise from many sources, and most algorithms, even those that claim to deal with noise, detect only large deviations as noise. In this paper, we present a data-clustering method named SIDNAC, which can efficiently detect clusters of arbitrary shape and is almost immune to noise, a much-desired feature in clustering applications. Another important feature of the algorithm is that it does not require a priori knowledge of the number of clusters, which is seldom available.

Keywords: Clustering, data mining, spatial datasets, noisy data

Short address: https://sciup.org/15016777

IDR: 15016777 | DOI: 10.5815/ijmecs.2018.07.03
