Efficient clustering algorithm with enhanced cohesive quality clusters

Authors: Anand Khandare, Abrar Alvi

Journal: International Journal of Intelligent Systems and Applications (IJISA)

Issue: Vol. 10, No. 7, 2018

Free access

Analyzing data is challenging nowadays because the size of the data affects the results of the analysis: almost every application can generate massive amounts of data. Clustering techniques are key techniques for analyzing such data; they are a simple way to group similar data into clusters. Typical examples of clustering algorithms are k-means, k-medoids, c-means, hierarchical clustering, and DBSCAN. The k-means and DBSCAN algorithms are scalable, but they still need improvement because massive data hampers their performance with respect to cluster quality and efficiency. These algorithms also require user intervention to supply appropriate input parameters. For these reasons, this paper presents a modified, efficient clustering algorithm. It enhances cluster quality and makes clusters more cohesive using domain knowledge, spectral analysis, and a split-merge-refine technique, and it also takes care to minimize empty clusters. So far, no single algorithm has integrated all of these requirements as the proposed algorithm does. It also automatically predicts the value of k and the initial centroids, so that user intervention is minimal. The performance of this algorithm is compared with standard clustering algorithms on various small to large data sets, with respect to the number of records and dimensions of the data sets, using clustering accuracy, running time, and various cluster validity measures. The obtained results show that the proposed algorithm improves on the existing algorithms in both efficiency and quality.


Clustering, Cluster, Massive Data, k-means, Cohesive, Quality, Validity Measures

Short address: https://sciup.org/15016507

IDR: 15016507   |   DOI: 10.5815/ijisa.2018.07.05

Article text: Efficient clustering algorithm with enhanced cohesive quality clusters

Published Online July 2018 in MECS

For the above reasons, this paper proposes an efficient clustering algorithm using spectral analysis, domain knowledge, and a split-merge-refine approach. This enhances efficiency and quality and minimizes empty clusters. The performance of the algorithm is evaluated on real small-to-large data sets. This paper is organized as follows: Section II summarizes the literature and related work from 1999 to 2017. Section III describes the working of some standard clustering algorithms from the above categories. Section IV presents the proposed clustering algorithm. Section V analyzes the experimental results of the standard and proposed clustering algorithms on various data sets using validity measures. Section VI concludes the paper and outlines future work.

  • II.    Related Work

One simple and efficient algorithm is the filtering algorithm based on Lloyd's k-means clustering. It is easy to implement because it uses a k-dimensional tree data structure for data clustering, but it can be improved further [1]. The appropriate value of k and the initial centroid selection play a vital role in improving cluster quality. To overcome the problems associated with initial selection, a new greedy initialization method selects suitable initial centroids that form more compact and well-separated clusters [2], although this method takes more time to search for centroids. Distance measures also play a vital role in the clustering process. A new distance measure, together with a new k-means-based clustering algorithm known as circular k-means, is used to cluster vectors of directional information [3]. Selecting an appropriate value of k is difficult, and checking the validity of the final clusters is also challenging; for these two reasons, validity measures are useful. Nine validity measures based on the property of symmetry are used to estimate the value of k and the validity of clusters, including the Davies-Bouldin, Dunn, point symmetry, I, Xie-Beni, FS, K, and SV indexes [4]. The popular k-means algorithm requires the user to give the value of k and the initial centroids, which affects the overall quality of the clusters, and its results may be affected by noise in the data. Density-based noise detection can be used to filter this noise, which helps in getting more accurate results [5].

The authors of [27] propose a method to optimize the number of clusters k with minimum time complexity; it reduces the effort required per iteration by decreasing the re-clustering of data objects, and different distance measures are used to track their effect on the computational time per iteration, although the algorithm may produce less reliable clusters. The paper [28] presents an enhanced method to select k and the initial centers using a weighted mean; it is better in terms of mathematical computation and reliability. The performance of the standard k-means algorithm is affected by the selection of the initial centers and by convergence to local minima, so [29] proposes a new initialization algorithm for k-means that converges to a better local optimum and produces better clusters.

From the above literature, it is observed that very few papers focus on all of these enhancement aspects in a single algorithm. Hence, this paper integrates more than two of these features into a single algorithm.

  • III.    Candidate Clustering Algorithms

As per the above literature, clustering algorithms are partition-, hierarchy-, density-, grid-, and model-based. This section studies candidate clustering algorithms from each of these categories and analyzes their strong and weak points. This is done by applying them to various data sets from Kaggle and measuring their performance with various validity measures. These algorithms are then used for comparison with the proposed efficient clustering algorithm. The summaries of these clustering algorithms, with their basic steps and their strong and weak points, are given as follows:

  • A.    k-Means Clustering

The k-means algorithm belongs to partition-based clustering. Its input is the number of clusters, and it selects the initial centroids randomly. The algorithm is divided into the following four steps (a minimal sketch is given at the end of this subsection):

  • 1. Read the data and the value of k.

  • 2. Assign data objects to clusters using distance measures.

  • 3. Update the centroids and clusters.

  • 4. Form the final clusters.

The value of k must be provided at the beginning of clustering, and poor-quality clusters may be generated because the initial centroids are selected randomly. This algorithm can be used for numerical data, and its quality can be improved further.
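As a concrete illustration of the four steps above, the following is a minimal NumPy sketch of standard k-means. It is not the authors' implementation; the synthetic data, the value of k, and the convergence test are illustrative assumptions, and the random initial-centroid choice is exactly the weakness this paper later addresses.

```python
import numpy as np

def kmeans(data, k, max_iter=100, seed=0):
    """Standard k-means with random initial centroids (the step the paper improves)."""
    rng = np.random.default_rng(seed)
    # Step 1: read the data and k; pick k random points as initial centroids.
    centroids = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 2: assignment using squared Euclidean distance.
        dists = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: update centroids from the current clusters.
        new_centroids = np.array([
            data[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # Step 4: converged; the clusters are final.
        centroids = new_centroids
    return labels, centroids

# Toy usage with synthetic data (illustrative only).
data = np.random.default_rng(1).normal(size=(200, 2))
labels, centroids = kmeans(data, k=3)
```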

  • C.    The Balanced Iterative Reducing and Clustering using Hierarchies

BIRCH, or Balanced Iterative Reducing and Clustering using Hierarchies, is an example of a hierarchical clustering algorithm [44]. It constructs a clustering-feature (CF) tree from the data, and the leaf nodes are then clustered. Its steps are as follows (a sketch follows the list):

  • 1. Scan the data set to construct the clustering-feature tree.

  • 2. Apply clustering to the leaf nodes.

The quality of the clusters generated by BIRCH is not good.
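As a quick illustration of these two steps (not the authors' implementation), scikit-learn's Birch estimator builds the CF tree during fit and then clusters its leaf entries; the threshold, branching_factor, and data below are assumed values.

```python
import numpy as np
from sklearn.cluster import Birch

rng = np.random.default_rng(0)
data = rng.normal(size=(500, 4))  # illustrative data

# Step 1: scan the data and build the clustering-feature (CF) tree;
# Step 2: cluster the leaf entries into n_clusters final groups.
model = Birch(threshold=0.5, branching_factor=50, n_clusters=3)
labels = model.fit_predict(data)
```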

  • D.    The Clustering Using Representatives

CURE, or Clustering Using Representatives, is an example of a hierarchical clustering algorithm [44]. Its steps are as follows (the representative-point step is sketched after the list):

  • 1. Read the data set and create the p partitions.

  • 2. Create the representative points for the k clusters.

CURE has a high run-time complexity for big data.
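The core of step 2 is choosing c well-scattered representative points per cluster and shrinking them toward the centroid. Below is a minimal sketch of that step only, under assumed values of c and the shrink factor alpha; it is an illustration of the technique, not the full CURE algorithm.

```python
import numpy as np

def cure_representatives(cluster_points, c=4, alpha=0.3):
    """Pick c well-scattered representatives for one cluster, then shrink
    them toward the centroid by factor alpha (CURE's representative step)."""
    centroid = cluster_points.mean(axis=0)
    # First representative: the point farthest from the centroid.
    far = np.argmax(((cluster_points - centroid) ** 2).sum(axis=1))
    reps = [cluster_points[far]]
    while len(reps) < min(c, len(cluster_points)):
        # Next: the point maximizing its minimum distance to the chosen reps.
        dists = np.min(
            [((cluster_points - r) ** 2).sum(axis=1) for r in reps], axis=0
        )
        reps.append(cluster_points[np.argmax(dists)])
    reps = np.array(reps)
    # Shrink the representatives toward the centroid to dampen outliers.
    return reps + alpha * (centroid - reps)
```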

  • E.    The Density Based Spatial Clustering of Applications with Noise

DBSCAN, or Density-Based Spatial Clustering of Applications with Noise, is a density-based clustering algorithm. The user has to provide the minimum number of points and the radius. Its steps are as follows (see the sketch after the list):

  • 1. Select any unvisited data object as a starting point.

  • 2. Identify the neighborhood of the point.

  • 3. Cluster the points.

  • 4. Move on to the other unvisited points.

This algorithm is only applicable to specific types of data sets.
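A short illustration with scikit-learn's DBSCAN follows; eps is the radius and min_samples the minimum-points parameter the user must supply, and the values and data here are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
data = rng.normal(size=(300, 2))  # illustrative data

# eps = neighborhood radius, min_samples = minimum points; label -1 marks noise.
labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(data)
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
```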

  • F.    CLIQUE Clustering

CLIQUE is a grid-based clustering algorithm used to find subspace clusters in data sets. It is divided into the following steps (the grid/density step is sketched after the list):

  • 1. Find the dense areas of the data set.

  • 2. Generate the k-dimensional cells.

  • 3. Eliminate the low-density cells.

  • 4. Cluster the high-density cells.

It requires the user to supply more parameters to work correctly.
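A simplified sketch of the grid-and-density steps (1 and 3) follows; the grid resolution and density threshold are assumed parameters, and this is only the cell-counting core, not CLIQUE's full bottom-up subspace search.

```python
import numpy as np
from collections import Counter

def dense_cells(data, n_intervals=10, density_threshold=5):
    """CLIQUE-style grid step: partition each dimension into n_intervals bins,
    count points per k-dimensional cell, keep cells above the density threshold."""
    mins, maxs = data.min(axis=0), data.max(axis=0)
    widths = (maxs - mins) / n_intervals
    widths = np.where(widths == 0, 1.0, widths)  # guard constant dimensions
    cells = Counter(
        tuple(np.minimum((row - mins) // widths, n_intervals - 1).astype(int))
        for row in data
    )
    # Low-density cells are eliminated; the survivors seed the clusters.
    return {cell for cell, count in cells.items() if count >= density_threshold}
```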

  • G.    Expectation Maximization

EM, or Expectation Maximization, is a standard model-based approach to clustering. It is divided into the following steps (a sketch follows the list):

  • 1. Hypothetically assign each data object to one of the clusters (E-step).

  • 2. Update the hypothesis and assign data objects to new clusters (M-step).

EM takes more running time to cluster the data sets.
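As an illustration, scikit-learn's GaussianMixture runs exactly this E-step/M-step loop; the number of components and the data below are assumed for the example.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
data = rng.normal(size=(400, 3))  # illustrative data

# E-step: soft, hypothetical assignment of each object to the clusters;
# M-step: update the cluster hypotheses (means, covariances, weights).
gmm = GaussianMixture(n_components=3, max_iter=100, random_state=0).fit(data)
labels = gmm.predict(data)                   # hard assignments
responsibilities = gmm.predict_proba(data)   # soft per-cluster assignments
```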

  • IV.    Proposed Efficient Clustering Algorithm

Based on the surveys of standard and modified clustering algorithms, this paper presents an efficient clustering algorithm with the following features.

  • 1. It uses an improved k-means clustering algorithm [21] to automatically predict the number of clusters and appropriate initial centroids.

  • 2. It makes use of a split-and-merge technique to cluster large data.

  • 3. It removes empty clusters.

  • 4. A refinement step is added to form more cohesive clusters.

  • 5. The algorithm is more efficient and produces high-quality clusters.

This algorithm consists of the following three major steps:

  • 1. Predicting the value of k and the initial centroids using domain knowledge and spectral analysis [19][21].

  • 2. Forming the intermediate clusters.

  • 3. Refining the clusters to form the final clusters.

Fig.1 shows the flow of the proposed algorithm.

Fig.1. Flow of Proposed Algorithm

The detailed algorithm is given as follows:

Input: Data Objects

Output: Quality k-Clusters

Phase 1: Predicting k and initial centroids

  • 1. Scan the input data and estimate the value of k by understanding and analyzing the properties of the data objects using domain knowledge and spectral analysis (a spectral estimate of k is sketched after this list).

  • 2. Select only the required attributes from the data sets using the above analysis.

  • 3. Use the improved clustering algorithm to determine the initial centroids and form the clusters.
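The paper's exact prediction procedure is given in [19][21]. One common spectral route to estimating k is the eigengap heuristic, sketched below under assumed choices (RBF affinity with bandwidth sigma, symmetric normalized Laplacian); it is an illustration of spectral analysis for k, not necessarily the authors' method.

```python
import numpy as np

def estimate_k_eigengap(data, sigma=1.0, k_max=10):
    """Eigengap heuristic: build an RBF affinity matrix, form the normalized
    Laplacian, and pick k at the largest gap between consecutive eigenvalues."""
    sq_dists = ((data[:, None, :] - data[None, :, :]) ** 2).sum(axis=2)
    affinity = np.exp(-sq_dists / (2.0 * sigma ** 2))
    degree = affinity.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    laplacian = np.eye(len(data)) - d_inv_sqrt[:, None] * affinity * d_inv_sqrt[None, :]
    eigvals = np.sort(np.linalg.eigvalsh(laplacian))
    gaps = np.diff(eigvals[:k_max + 1])
    return int(np.argmax(gaps)) + 1  # k = position of the largest eigengap
```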

Phase 2: Forming intermediate clusters

  • 1. Check whether k is large enough to apply fine-tuning; if yes, fine-tune k by reducing it.

  • 2. Find the cluster with the maximum negative impact, i.e., the maximum within-cluster sum of squares (within-SS).

  • 3. Split that cluster into 2 new clusters and replace it in the list of clusters (this split step is sketched below).

  • 4. Calculate the accuracy of the newly formed clusters.

  • 5. Repeat steps 2, 3, and 4 as long as the accuracy is significantly increasing.

  • 6. If the new accuracy shows no significant improvement, take the previous instance of the cluster list as the final clusters.

  • 7. Create the clusters using the above method.
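A minimal sketch of the split step (2 and 3) follows, with clusters represented as a list of NumPy arrays and a 2-means split via scikit-learn; this is an illustration under those assumptions, not the authors' exact implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def split_worst_cluster(clusters):
    """Phase 2 split step: find the cluster with the largest within-cluster
    sum of squares (within-SS) and split it into two with 2-means."""
    # Within-SS of each cluster = summed squared distance to its centroid.
    wss = [((c - c.mean(axis=0)) ** 2).sum() for c in clusters]
    worst = int(np.argmax(wss))
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(clusters[worst])
    # Replace the worst cluster by its two halves in the cluster list.
    return (clusters[:worst]
            + [clusters[worst][labels == 0], clusters[worst][labels == 1]]
            + clusters[worst + 1:])
```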

Phase 3: Refining the clusters by removing outliers and empty clusters and merging the closest clusters

  • 1. Identify outliers and form them into one cluster.

  • 2. Reduce the number of clusters by eliminating empty clusters:

    • 2.1. Scan all the clusters and, for each cluster, check whether it contains any points.

    • 2.2. If the number of points in any cluster is zero, remove that cluster and reduce the value of k by one.

  • 3. Take all the formed clusters.

  • 4. For every element Ai in a cluster 'A' and every element Bj in another cluster 'B', compute the distance using Eq. (1) and store it in a Distances[] array.

  • 5. The minimum of Distances[] is the distance between clusters A and B.

  • 6. Store this minimum distance between the two clusters.

  • 7. Repeat steps 4, 5, and 6 for all cluster pairs.

  • 8. Find the cluster pair with the minimum distance between each other, say pair P.

  • 9. Merge these clusters and calculate the new accuracy (the pairwise-distance and merge steps are sketched after Eq. (1)).

  • 10. Repeat this as long as the accuracy significantly increases.

  • 11. If the new accuracy shows no significant improvement, take the previous instance of the cluster list as the final clusters.

  • 12. Stop when the criteria are met.

The distance in step 4 is the squared Euclidean distance:

D = (A - B)^2               (1)
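Read this way, steps 4-8 compute a single-linkage distance between every cluster pair (the minimum of Eq. (1) over all cross-cluster element pairs) and step 9 merges the closest pair. A sketch under that reading, with clusters as a list of NumPy arrays, follows.

```python
import numpy as np

def closest_pair(clusters):
    """Phase 3 merge step: the distance between clusters A and B is the
    minimum of Eq. (1) over all cross-cluster element pairs (single linkage);
    return the indices of the closest pair."""
    best, best_pair = np.inf, None
    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            diffs = clusters[i][:, None, :] - clusters[j][None, :, :]
            d = (diffs ** 2).sum(axis=2).min()  # min over all (Ai, Bj) pairs
            if d < best:
                best, best_pair = d, (i, j)
    return best_pair

def merge_pair(clusters, pair):
    """Merge the chosen pair; empty clusters are dropped the same way
    (remove the list entry and decrement k)."""
    merged = np.vstack([clusters[pair[0]], clusters[pair[1]]])
    return [c for idx, c in enumerate(clusters) if idx not in pair] + [merged]
```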

This algorithm thus has three phases. The first phase predicts the value of k and the initial centroids. The second phase uses split-and-merge techniques to form intermediate clusters based on the predicted value of k. The third phase is responsible for refining the clusters so that no empty clusters remain and the clusters are highly cohesive.

  • V.    Experimental Results

The clustering algorithms [25] k-means, PAM, hierarchical clustering, and DBSCAN, together with the proposed clustering algorithm, are applied to various data sets. The proposed algorithm is also compared with R's k-means clustering, which uses Lloyd's algorithm. The comparative experimental results are presented in this section. The experiments compare the performance of the algorithms using clustering accuracy, running time, and the silhouette, Dunn, DB, and CH scores; details of these measures are given in [4][13][40]. For better clustering, the values of accuracy, silhouette, Dunn, and CH should be high, whereas the running time and the DB index should be low. For most of the data sets, the proposed clustering algorithm obtains the best values of these measures. (A sketch of how such scores can be computed follows.)
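Three of these measures have standard scikit-learn implementations; the Dunn index does not, so a small hand-rolled version is included. The data and k below are illustrative stand-ins for the paper's Kaggle data sets.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import (silhouette_score, calinski_harabasz_score,
                             davies_bouldin_score, pairwise_distances)

def dunn_index(data, labels):
    """Dunn = min inter-cluster distance / max intra-cluster diameter."""
    dists = pairwise_distances(data)
    ks = np.unique(labels)
    diam = max(dists[np.ix_(labels == k, labels == k)].max() for k in ks)
    sep = min(dists[np.ix_(labels == a, labels == b)].min()
              for a in ks for b in ks if a < b)
    return sep / diam

rng = np.random.default_rng(0)
data = rng.normal(size=(300, 4))  # illustrative data
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(data)

print("Silhouette:", silhouette_score(data, labels))         # higher is better
print("CH score:  ", calinski_harabasz_score(data, labels))  # higher is better
print("DB score:  ", davies_bouldin_score(data, labels))     # lower is better
print("Dunn score:", dunn_index(data, labels))               # higher is better
```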

  • A. Data Sets

This paper uses 11 real data sets from the Kaggle site, ranging from small to large. Table 1 shows the details of the data sets used.

Table 1. Data Sets Used

| SN | Data Set | Number of Instances | Number of Attributes |
|----|------------------|------|----|
| 1  | Accident         | 2057 | 15 |
| 2  | Airline Clusters | 3999 | 7  |
| 3  | Breast Cancer    | 569  | 30 |
| 4  | Cities           | 493  | 10 |
| 5  | Diamond          | 3089 | 11 |
| 6  | Judges Rating    | 43   | 13 |
| 7  | Rating1          | 2105 | 27 |
| 8  | Salary           | 3123 | 6  |
| 9  | Galaxy           | 3462 | 65 |
| 10 | Voting           | 1076 | 5  |
| 11 | Sensor1          | 8845 | 78 |

Table 2 shows the results of the standard k-means and the proposed clustering algorithm with respect to efficiency.

Fig.2 to Fig.9 show the performance of the proposed, existing, and R-k-means clustering algorithms. Table 3 shows the results of the standard k-means and the proposed clustering algorithm with respect to the quality of clusters.

Table 2. The Efficiency of Standard k-means vs. Proposed Clustering

| Data Set | Algorithm | Accuracy (%) | Running Time |
|------------------|------------------|-------|----------|
| Accident         | Proposed Algo.   | 95.4  | 0.000581 |
| Accident         | Standard k-means | 93.81 | 0.000088 |
| Airline Clusters | Proposed Algo.   | 98.7  | 0.00212  |
| Airline Clusters | Standard k-means | 98.4  | 0.011    |
| Breast Cancer    | Proposed Algo.   | 97.43 | 0.000551 |
| Breast Cancer    | Standard k-means | 96.65 | 0.000015 |
| Cities           | Proposed Algo.   | 99.31 | 0.000521 |
| Cities           | Standard k-means | 97.5  | 0.000061 |
| Diamond          | Proposed Algo.   | 98.25 | 0.000249 |
| Diamond          | Standard k-means | 98.06 | 0.000669 |
| Judges Rating    | Proposed Algo.   | 97.15 | 0.0011   |
| Judges Rating    | Standard k-means | 90.94 | 0.00026  |
| Rating 1         | Proposed Algo.   | 79.37 | 0.000463 |
| Rating 1         | Standard k-means | 61.5  | 0.000039 |
| Salary           | Proposed Algo.   | 96.39 | 0.000666 |
| Salary           | Standard k-means | 95.8  | 0.000034 |
| Galaxy           | Proposed Algo.   | 99.86 | 0.00134  |
| Galaxy           | Standard k-means | 99.67 | 0.0122   |
| Voting           | Proposed Algo.   | 96.12 | 0.000559 |
| Voting           | Standard k-means | 95.44 | 0.000793 |
| Sensor 1         | Proposed Algo.   | 95.01 | 0.000891 |
| Sensor 1         | Standard k-means | 85.67 | 0.0138   |

Fig.2. Accuracy of Proposed Algorithm

Fig.3. Running Time of Proposed Algorithm

Table 3. Cluster Quality of the Proposed Clustering Algorithm

| Data Set | Algorithm | Silhouette Score | Dunn Score | CH Score | DB Score |
|------------------|------------------|------|---------|----------|------|
| Accident         | Proposed Algo.   | 0.43 | 0.0025  | 3258.9   | 0.1  |
| Accident         | Standard k-means | 0.46 | 0.002   | 2434     | 0.77 |
| Airline Clusters | Proposed Algo.   | 0.29 | 0.00078 | 0.0003   | 0.01 |
| Airline Clusters | Standard k-means | 0.28 | 0.0003  | 2051.93  | 0.94 |
| Breast Cancer    | Proposed Algo.   | 0.44 | 0.0079  | 1617.68  | 0.08 |
| Breast Cancer    | Standard k-means | 0.4  | 0.0071  | 1265.64  | 0.78 |
| Cities           | Proposed Algo.   | 0.35 | 0.0010  | 3224.34  | 0.07 |
| Cities           | Standard k-means | 0.34 | 0.0007  | 888.16   | 0.82 |
| Diamond          | Proposed Algo.   | 0.57 | 0.0060  | 19190.56 | 0.06 |
| Diamond          | Standard k-means | 0.56 | 0.0026  | 19188.48 | 0.49 |
| Judges Rating    | Proposed Algo.   | 0.69 | 0.33    | 96.17    | 0.13 |
| Judges Rating    | Standard k-means | 0.23 | 0.27    | 34.04    | 0.88 |
| Rating 1         | Proposed Algo.   | 0.19 | 0.050   | 619.0    | 0.5  |
| Rating 1         | Standard k-means | 0.15 | 0.026   | 621.89   | 1.9  |
| Salary           | Proposed Algo.   | 0.36 | 0.0013  | 1071.01  | 0.02 |
| Salary           | Standard k-means | 0.35 | 0.0007  | 989.83   | 0.91 |
| Galaxy           | Proposed Algo.   | 0.53 | 0.01    | 91385.11 | 0.64 |
| Galaxy           | Standard k-means | 0.35 | 0.01    | 33459.76 | 0.88 |
| Voting           | Proposed Algo.   | 0.64 | 0.011   | 468.29   | 0.03 |
| Voting           | Standard k-means | 0.24 | 0.0038  | 388.16   | 1.0  |
| Sensor 1         | Proposed Algo.   | 0.55 | 0.0005  | 4194.5   | 1.11 |
| Sensor 1         | Standard k-means | 0.5  | 0.0004  | 2063.7   | 0.88 |

Fig.4. Silhouette Score of Proposed Algorithm

Fig.5. Dunn Score of Proposed Algorithm

Fig.6. DB Score of Proposed Algorithm

Fig.7. Accuracy of Proposed Algorithm

Table 4. Proposed Algorithm vs. R-k-means

| Data Set | Algorithm | Accuracy (%) | Run Time | Silhouette Score | Dunn Score | CH Score | DB Score |
|------------------|----------------|-------|----------|------|---------|---------|------|
| Accident         | Proposed Algo. | 95.4  | 0.000581 | 0.43 | 0.0025  | 3258.9  | 0.1  |
| Accident         | R k-means      | 94.38 | 0.222    | 0.33 | 0.0033  | 2644    | 0.83 |
| Cities           | Proposed Algo. | 99.31 | 0.000520 | 0.35 | 0.0010  | 3224.34 | 0.07 |
| Cities           | R k-means      | 97.5  | 0.055    | 0.32 | 0.00079 | 908.29  | 0.92 |
| Airline Clusters | Proposed Algo. | 98.7  | 0.00212  | 0.29 | 0.00078 | 2712    | 0.01 |
| Airline Clusters | R k-means      | 98.4  | 0.89     | 0.27 | 0.00029 | 2147    | 1.04 |

Table 4 compares the k-means of the R programming environment with the proposed clustering algorithm. The R tool uses Lloyd's algorithm for k-means [38-39].

Fig.8. CH Score of Proposed Algorithm

Table 5 shows the comparison of the proposed clustering algorithm with other existing clustering algorithms.

Fig.9. The accuracy of Proposed Vs. Existing Algorithms

Table 5. Existing Algorithms vs. Proposed Algorithm

| Data Set | Algorithm | Accuracy (%) | Running Time | Silhouette Score | Dunn Score |
|------------------|---------------|-------|---------|--------|---------|
| Accident         | Proposed Alg. | 95.4  | 0.00058 | 0.43   | 0.0025  |
| Accident         | PAM           | 93.97 | 10.18   | 0.43   | 0.001   |
| Accident         | Hierarchical  | 45.43 | 0.0018  | -0.027 | 0.0013  |
| Accident         | DBSCAN        | 57.70 | 0.0076  | -0.57  | 4236599 |
| Cities           | Proposed Alg. | 99.31 | 0.00125 | 0.35   | 0.023   |
| Cities           | PAM           | 97.11 | 0.0023  | 0.33   | 0.020   |
| Cities           | Hierarchical  | 96.91 | 0.0032  | 0.30   | 0.021   |
| Cities           | DBSCAN        | 80.01 | 0.0033  | 0.31   | 0.22    |
| Airline Clusters | Proposed Alg. | 96.22 | 2.27    | 0.33   | 0.00032 |
| Airline Clusters | PAM           | 96.22 | 2.27    | 0.33   | 0.00032 |
| Airline Clusters | Hierarchical  | 97.12 | 2.12    | 0.33   | 0.0032  |
| Airline Clusters | DBSCAN        | 95.12 | 1.23    | 0.30   | 0.0021  |

From the above results, it is observed that as the number of instances in the data sets increases, the accuracy also increases. Fig.10 shows this trend.

From Table 3, it is observed that the higher the number of instances in a data set, the lower the silhouette score. This trend is shown in Fig.11.


Fig.10. Accuracy increases as Data Instances Increases

Fig.11. Higher Data Instances Lower Silhouette Score

References

  • [1] Tapas Kanungo, David M. Mount, Nathan S. Netanyahu, Christine D. Piatko, Ruth Silverman, and Angela Y. Wu, "An Efficient k-Means Clustering Algorithm: Analysis and Implementation", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 7, July 2002.
  • [2] Wei Zhong, Gulsah Altun, Robert Harrison, Phang C. Tai, and Yi Pan, "Improved K-Means Clustering Algorithm for Exploring Local Protein Sequence Motifs Representing Common Structural Property", IEEE Transactions on Nanobioscience, vol. 4, no. 3, September 2005.
  • [3] Dimitrios Charalampidis, "A Modified K-Means Algorithm for Circular Invariant Clustering", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 12, December 2005.
  • [4] Sriparna Saha and Sanghamitra Bandyopadhyay, "Performance Evaluation of Some Symmetry-Based Cluster Validity Indexes", IEEE Transactions on Systems, Man, and Cybernetics, vol. 39, no. 4, 2009.
  • [5] Juntao Wang and Xiaolong Su, "An Improved K-Means Clustering Algorithm", IEEE 3rd International Conference on Communication Software and Networks, 2011.
  • [6] Jiye Liang, Liang Bai, Chuangyin Dang, and Fuyuan Cao, "The K-Means-Type Algorithms Versus Imbalanced Data Distributions", IEEE Transactions on Fuzzy Systems, vol. 20, no. 4, August 2012.
  • [7] Mohamed Abubaker and Wesam Ashour, "Efficient Data Clustering Algorithms: Improvements over K-means", I.J. Intelligent Systems and Applications, pp. 37-49, 2013.
  • [8] Rui Máximo Esteves, Thomas Hacker, and Chunming Rong, "Competitive K-means", IEEE International Conference on Cloud Computing Technology and Science, 2013.
  • [9] Rui Xu and Donald Wunsch II, "Survey of Clustering Algorithms", IEEE Transactions on Neural Networks, vol. 16, no. 3, May 2005.
  • [10] Ferdinando Di Martino, Vincenzo Loia, and Salvatore Sessa, "Extended Fuzzy C-Means Clustering in GIS Environment for Hot Spot Events", B. Apolloni et al. (Eds.): KES 2007/WIRN 2007, Part I, LNAI 4692, pp. 101-107, Springer-Verlag Berlin Heidelberg, 2007.
  • [11] Bikram Keshari Mishra, Nihar Ranjan Nayak, Amiya Rath, and Sagarika Swain, "Far Efficient K-Means Clustering Algorithm", ICACCI-12, August 2012.
  • [12] Xiaohui Huang, Yunming Ye, and Haijun Zhang, "Extensions of Kmeans-Type Algorithms: A New Clustering Framework by Integrating Intracluster Compactness and Intercluster Separation", IEEE Transactions on Neural Networks and Learning Systems, vol. 25, no. 8, August 2014.
  • [13] Adil Fahad, Najlaa Alshatri, Zahir Tari, Abdullah Alamri, Ibrahim Khalil, Albert Y. Zomaya, Sebti Foufou, and Abdelaziz Bouras, "A Survey of Clustering Algorithms for Big Data: Taxonomy and Empirical Analysis", IEEE Transactions on Emerging Topics in Computing, 2014.
  • [14] Michał Kozielski and Aleksandra Gruca, "Soft Approach to Identification of Cohesive Clusters in Two Gene Representations", Procedia Computer Science 35, pp. 281-289, 2014.
  • [15] G. Sandhiya and Mrs. Ramya Jothikumar, "Enhanced K-Means with Dijkstra Algorithm for", 10th International Conference on Intelligent Systems and Control, 2016.
  • [16] Jeyhun Karimov and Murat Ozbayoglu, "Clustering Quality Improvement of k-means Using a Hybrid Evolutionary Model", Procedia Computer Science 61, pp. 38-45, 2015.
  • [17] Vikas Verma, Shweta Bhardwaj, and Harjit Singh, "A Hybrid K-Mean Clustering Algorithm for Prediction Analysis", Indian Journal of Science and Technology, vol. 9, no. 28, DOI: 10.17485/ijst/2016/v9i28/98392, July 2016.
  • [18] Shashank Sharma, Megha Goel, and Prabhjot Kaur, "Performance Comparison of Various Robust Data Clustering Algorithms", I.J. Intelligent Systems and Applications, pp. 63-71, MECS, 2013.
  • [19] Anand Khandare and A. S. Alvi, "Efficient Clustering Algorithm with Improved Clusters Quality", IOSR Journal of Computer Engineering, vol. 18, pp. 15-19, Nov.-Dec. 2016.
  • [20] Rui Xu, Jie Xu, and Donald C. Wunsch II, "A Comparison Study of Validity Indices on Swarm-Intelligence-Based Clustering", IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 42, no. 4, August 2012.
  • [21] Anand Khandare and A. S. Alvi, "Clustering Algorithms: Experiment and Improvements", IRSCNS, Springer, LNNS, July 2016.
  • [22] Anand Khandare and A. S. Alvi, "Survey of Improved k-means Clustering Algorithms: Improvements, Shortcomings, and Scope for Further Enhancement and Scalability", Information Systems Design and Intelligent Applications, Advances in Intelligent Systems and Computing 434, DOI: 10.1007/978-81-322-2752-6_48, 2016.
  • [23] https://www.rstudio.com
  • [24] https://cran.r-project.org
  • [25] https://www.kaggle.com/datasets
  • [26] http://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html
  • [27] Preeti Jain and Bala Buksh, "Accelerated K-means Clustering Algorithm", I.J. Information Technology and Computer Science, pp. 39-46, DOI: 10.5815/ijitcs.2016.10.05, MECS, 2016.
  • [28] Aleta C. Fabregas, Bobby D. Gerardo, and Bartolome T. Tanguilig III, "Enhanced Initial Centroids for K-means Algorithm", ISSN: 2074-9007 (Print), ISSN: 2074-9015 (Online), DOI: 10.5815/ijitcs, MECS, 2017.
  • [29] P. Sivakumar and M. Rajaram, "Efficient and Fast Initialization Algorithm for K-means Clustering", I.J. Information Technology and Computer Science, no. 1, pp. 19-24, DOI: 10.5815/ijitcs.2012.01.03, MECS, 2012.
  • [30] Yugal Kumar and G. Sahoo, "A Review on Gravitational Search Algorithm and its Applications to Data Clustering & Classification", I.J. Intelligent Systems and Applications, no. 6, pp. 79-93, DOI: 10.5815/ijisa.2014.06.09, MECS, 2014.
  • [31] Handayani Tjandrasa, Isye Arieshanti, and Radityo Anggoro, "Classification of Non-Proliferative Diabetic Retinopathy Based on Segmented Exudates using K-Means Clustering", I.J. Image, Graphics and Signal Processing, no. 1, pp. 1-8, DOI: 10.5815/ijigsp.2015.01.01, MECS, 2015.
  • [32] Muhammad Ali Masood and M. N. A. Khan, "Clustering Techniques in Bioinformatics", I.J. Modern Education and Computer Science, no. 1, pp. 38-46, DOI: 10.5815/ijmecs.2015.01.06, MECS, 2015.
  • [33] Tapas Kanungo, David M. Mount, Nathan S. Netanyahu, Christine D. Piatko, Ruth Silverman, and Angela Y. Wu, "An Efficient k-Means Clustering Algorithm: Analysis and Implementation", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 7, pp. 881-892, 2002.
  • [34] Purnawansyah and Haviluddin, "K-Means Clustering Implementation in Network Traffic Activities", International Conference on Computational Intelligence and Cybernetics, DOI: 10.1109/CyberneticsCom.2016.7892566, 2016.
  • [35] Chang Lu, Yueting Shi, and Yueyang Chen, "Data Mining Applied to Oil Well Using K-Means and DBSCAN", 7th International Conference on Cloud Computing and Big Data, DOI: 10.1109/CCBD.2016.018, 2016.
  • [36] Yohwan Noh, Donghyun Koo, Yong-Min Kang, DongGyu Park, and DoHoon Lee, "Automatic Crack Detection on Concrete Images Using Segmentation via Fuzzy C-Means Clustering", International Conference on Applied System Innovation, DOI: 10.1109/ICASI.2017.7988574, 2017.
  • [37] Kai-Shiang Chang, Yi-Wen Peng, and Wei-Mei Chen, "Density-Based Clustering Algorithm for GPGPU Computing", International Conference on Applied System Innovation, DOI: 10.1109/ICASI.2017.7988545, 2017.
  • [38] Dilmurat Zakirov, Aleksey Bondarev, and Nodar Momtselidze, "A Comparison of Data Mining Techniques in Evaluating Retail Credit Scoring Using R Programming", Twelfth International Conference on Electronics Computer and Computation, DOI: 10.1109/ICECCO.2015.7416867, 2015.
  • [39] Tran Duc Chung, Rosdiazli Ibrahim, and Sabo Miya Hassan, "Fast Approach for Automatic Data Retrieval Using R Programming Language", 2nd IEEE International Symposium on Robotics and Manufacturing Automation, DOI: 10.1109/ROMA.2016.7847824, 2016.
  • [40] M. Arif Wani and Romana Riyaz, "A New Cluster Validity Index Using Maximum Cluster Spread Based Compactness Measure", International Journal of Intelligent Computing and Cybernetics, ISSN: 1756-378X, 2016.
  • [41] Deepali Aneja and Tarun Kumar Rawat, "Fuzzy Clustering Algorithms for Effective Medical Image Segmentation", I.J. Intelligent Systems and Applications, no. 11, pp. 55-61, DOI: 10.5815/ijisa.2013.11.06, MECS, 2013.
  • [42] J. Anuradha and B. K. Tripathy, "Hierarchical Clustering Algorithm Based on Attribute Dependency for Attention Deficit Hyperactive Disorder", I.J. Intelligent Systems and Applications, no. 6, pp. 37-45, DOI: 10.5815/ijisa.2014.06.04, MECS, 2014.
  • [43] Sudipto Guha, Rajeev Rastogi, and Kyuseok Shim, "CURE: An Efficient Clustering Algorithm for Large Databases", DOI: 10.1016/S0306-4379(01)00008-4, Elsevier, 2001.
  • [44] Tian Zhang, Raghu Ramakrishnan, and Miron Livny, "BIRCH: An Efficient Data Clustering Method for Very Large Databases", SIGMOD '96: Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, pp. 103-114, 1996.
  • [45] https://www.python.org
  • [46] https://www.programiz.com/python-programming
  • [47] Brian S. Everitt, Sabine Landau, and Morven Leese, "Cluster Analysis", 4th ed., Wiley Publishing, ISBN: 0340761199, 9780340761199, 2009.
  • [48] Fareeha Zafar and Zaigham Mahmood, "Comparative Analysis of Clustering Algorithms Comprising GESC, UDCA, and k-Mean Methods for Wireless Sensor Networks", URSI Radio Science Bulletin, vol. 84, no. 4, DOI: 10.23919/URSIRSB.2011.7909974, 2011.
  • [49] Xiaohui Huang, Yunming Ye, and Haijun Zhang, "Extensions of Kmeans-Type Algorithms: A New Clustering Framework by Integrating Intracluster Compactness and Intercluster Separation", IEEE Transactions on Neural Networks and Learning Systems, vol. 25, no. 8, DOI: 10.1109/TNNLS.2013.2293795, 2014.
  • [50] Jianyun Lu and Qingsheng Zhu, "An Effective Algorithm Based on Density Clustering Framework", IEEE Wireless Communications Letters, vol. 5, no. 6, DOI: 10.1109/LWC.2016.2603154, 2016.
  • [51] Yuan Zhou, Ning Wang, and Wei Xiang, "Clustering Hierarchy Protocol in Wireless Sensor Networks Using an Improved PSO Algorithm", IEEE Access, vol. 5, DOI: 10.1109/ACCESS.2016.2633826, 2016.
  • [52] Neha Bharill, Aruna Tiwari, and Aayushi Malviya, "Fuzzy Based Scalable Clustering Algorithms for Handling Big Data Using Apache Spark", IEEE Transactions on Big Data, vol. 2, no. 4, pp. 339-352, DOI: 10.1109/TBDATA.2016.2622288, 2016.