Models and algorithms for automatic grouping of objects based on the k-means model

Автор: G. Sh. Shkaberina, L. A. Kazakovtsev, R. Li

Журнал: Siberian Aerospace Journal @vestnik-sibsau-en

Рубрика: Informatics, computer technology and management

Статья в выпуске: 3 vol.21, 2020 года.

Бесплатный доступ

The paper is devoted to the study and development of new algorithms for automatic grouping of objects. The algorithms can improve the accuracy and stability of the result of solving practical problems, such as the problems of identifying homogeneous batches of industrial products. The paper examines the application of the k-means algorithm with the Euclidean, Manhattan, Mahalanobis distance measures for the problem of automatic grouping of objects with a large number of parameters. A new model is presented for solving problems of automatic grouping of industrial products based on the k-means model with the Mahalanobis distance measure. The model uses a training procedure by calculating the averaged estimate of the covariance matrix for the training sample (sample with pre-labeled data). A new algorithm for automatic grouping of objects based on an optimization model of k-means with the Mahalanobis distance measure and a weighted average covariance matrix calculated from a training sample is proposed. The algorithm allows reducing the proportion of errors (increasing the Rand index) when identifying homogeneous production batches of products based on the results of tests. A new approach to the development of genetic algorithms for the k-means problem with the use of a single greedy agglomerative heuristic procedure as the crossover operator and the mutation operator is presented. The computational experiment has shown that the new mutation procedure is fast and efficient in comparison with the original mutation of the genetic algorithm. The high rate of convergence of the objective function is shown. The use of this algorithm allows a statistically significant increase both in the accuracy of the result (improving the achieved value of the objective function within the framework of the chosen mathematical model for solving the problem of automatic grouping), and in its stability, in a fixed time, in comparison with the known algorithms of automatic grouping. The results show that the idea of including a new mutation operator in the genetic algorithm significantly improves the results of the simplest genetic algorithm for the k-means problem.

Еще

Automatic grouping, k-means, Mahalanobis distance, genetic algorithm.

Короткий адрес: https://sciup.org/148321756

IDR: 148321756   |   DOI: 10.31772/2587-6066-2020-21-3-347-354

Список литературы Models and algorithms for automatic grouping of objects based on the k-means model

  • Orlov V. I., Kazakovtsev L. A., Masich I. S., Stashkov D. V. Algoritmicheskoe obespechenie podderzhki prinyatiya reshenii po otboru izdelii mikroelektroniki dlya kosmicheskogo priborostroeniya [Algorithmic support of decision-making on selection of microelectronics products for space industry]. Krasnoyarsk, 2017, 225 p.
  • Kazakovtsev L. A., Antamoshkin A. N. [Greedy heuristic method for location problems]. Vestnik SibGAU. 2015, Vol. 16, No. 2, P. 317–325 (In Russ.).
  • MacQueen J. Some methods for classification and analysis of multivariate observations. Proc. Fifth Berkeley Symp. Math. Stat. Probab. 1967, Vol. 1, P. 281–297.
  • Hosage C. M., Goodchild M. F. Discrete Space Location-Allocation Solutions from Genetic Algorithms. Annals of Operations Research. 1986, Vol 6, P. 35–46. http://doi.org/10.1007/bf02027381
  • Bozkaya B., Zhang J., Erkut E. A Genetic Algorithm for the p-Median Problem, Drezner Z., Hamacher H. (eds,), Facality Location: Applications and Theory, Springer, Berlin, 2002.
  • Alp O., Erkut E., Drezner Z. An Efficient Genetic Algorithm for the p-Median Problem. Annals of Operations Research. 2003, Vol. 122, P. 21–42. doi: 10.1023/A:1026130003508
  • Maulik U., Bandyopadhyay S. Genetic Algorithm-Based Clustering Technique. Pattern Recognition. 2000, Vol. 33, No. 9, P. 1455–1465. doi: 10.1016/S0031-3203(99)00137-5
  • William M. Rand. Objective Criteria for the Evaluation of Clustering Methods. Journal of the American Statistical Association. 1971, Vol. 66, No. 336, P. 846–850.
  • De Maesschalck R., Jouan-Rimbaud D., Massart D. L. The Mahalanobis distance. Chem Intell Lab Syst. 2000, Vol. 50, No. 1, P. 1–18. doi: 10.1016/S0169-7439(99)00047-7
  • McLachlan G. J. Mahalanobis Distance. Resonance. 1999, Vol. 4, No. 20. doi: 10.1007/BF02834632.
  • Xing E. P., Ng A. Y., Jordan M. I., Russel S. Distance metric learning with application to clustering with side-information. Advances in Neural Information Processing Systems. 2003, Vol. 15, P. 505–12.
  • Orlov V. I., Fedosov V. V. ERC clustering dataset, 2016. http://levk.info/data1526.zip
  • Orlov V. I., Shkaberina G. Sh., Rozhnov I. P., Stupina A. A., Kazakovtsev L. A. [Application pf clustering algorithm wirh special distance measures for the problem of automatic grouping of electronic and radio devices]. Control systems and information technologies.2019, Vol. 3, No. 3, P. 42–46 (In Russ.).
  • Hansen P., Mladenovic N. J-means: a new local search heuristic for minimum sum of squares lustering. Pattern recognition. 2001, Vol. 34, No. 2, P. 405–413.
  • Kazakovtsev L. A., Antamoshkin A. N. Genetic Algorithm with Fast Greedy Heuristic for Clustering and Location Problems. Informatica. 2014, Vol. 38, No. 3, P. 229–240.
Еще
Статья научная