Analyzing the performance of various clustering algorithms

Authors: Bhupesh Rawat, Sanjay Kumar Dwivedi

Journal: International Journal of Modern Education and Computer Science (IJMECS)

Published in: Issue 1, Vol. 11, 2019.

Free access

Clustering is one of the most widely used techniques in data mining for analyzing large datasets to discover useful and interesting patterns. It partitions a dataset into mutually disjoint groups such that data points in the same cluster are highly similar to one another and data points in different clusters are highly dissimilar. Because a large number of clustering algorithms exist, researchers often find it difficult to select one that suits their purpose. With this in mind, this paper presents a comparative analysis of several clustering algorithms, namely k-means, expectation maximization, hierarchical clustering, and make density-based clustering, with respect to parameters such as the time taken to build a model, the choice of dataset, the size of the dataset, and whether the data are normalized or un-normalized, in order to determine the suitability of one algorithm over another.
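As a rough illustration of one comparison the abstract mentions (model build time on normalized versus un-normalized data), the following is a minimal sketch for k-means only. It is not the authors' experimental setup: the toy data, the min-max scaling choice, and all function names (`min_max_normalize`, `k_means`, `timed_k_means`, `make_data`) are assumptions made for this example.

```python
import random
import time


def min_max_normalize(points):
    """Rescale every feature to [0, 1]; constant features map to 0."""
    lows = [min(col) for col in zip(*points)]
    highs = [max(col) for col in zip(*points)]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(p, lows, highs)]
        for p in points
    ]


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, max_iter=100, seed=0):
    """Plain Lloyd-style k-means; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def timed_k_means(points, k, seed=0):
    """Return (seconds taken to build the model, labels)."""
    start = time.perf_counter()
    _, labels = k_means(points, k, seed=seed)
    return time.perf_counter() - start, labels


def make_data(n_per_cluster=100, seed=0):
    """Hypothetical toy data: three 2-D blobs with very different feature scales."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (5.0, 500.0), (10.0, 0.0)]
    return [
        [rng.gauss(cx, 1.0), rng.gauss(cy, 100.0)]
        for cx, cy in centers
        for _ in range(n_per_cluster)
    ]


if __name__ == "__main__":
    raw = make_data()
    for name, data in (("un-normalized", raw), ("normalized", min_max_normalize(raw))):
        seconds, labels = timed_k_means(data, k=3)
        print(f"{name:14s} build time: {seconds:.4f} s, clusters used: {len(set(labels))}")
```

Timings from a sketch like this depend on the machine, the data, and the random seed, so they only indicate how such a comparison could be set up, not what results to expect.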


Keywords: cluster analysis, k-means algorithm, hierarchical algorithm, expectation maximization, make density-based clustering, agglomerative clustering, divisive clustering, BIRCH, CURE

Short URL: https://sciup.org/15016824

IDR: 15016824   |   DOI: 10.5815/ijmecs.2019.01.06

Article type: Research article