Development of a subsystem for the automated application of dynamic load balancing algorithms in the LuNA system
Authors: Malyshkin Victor Emmanuilovich, Perepelkin Vladislav Alexandrovich, Chmil Alexander Vladimirovich
Journal: Problems of Informatics @problem-info
Section: Applied information technologies
Article in issue: 4 (57), 2022.
Scientific numerical simulation on supercomputers often runs into the problem of balancing the computational load, either statically or dynamically. The problem has no efficient universal solution, so in practice various special-case and heuristic algorithms are used to balance the load across computing nodes. Although the topic is well developed in the literature and many load balancing methods, algorithms and programs exist, applying them in each particular case is itself a problem. Even tuning the parameters of a suitable load balancing algorithm can become an insurmountable obstacle for a supercomputer user. This makes automatic load balancing across nodes relevant as a subtask of the automatic construction of parallel programs. If a programming system holds a set of balancing algorithms in a form that allows them to be applied automatically, the user is relieved of this problem. The LuNA system for automatic construction of parallel programs has facilities for accumulating and automatically applying static and dynamic load balancing algorithms. The paper considers the approach that makes such accumulation and application possible in LuNA.
Automatic construction of parallel programs, dynamic load balancing, fragmented programming technology, LuNA system
Short URL: https://sciup.org/143179782
IDR: 143179782 | UDC: 004.4’242 | DOI: 10.24412/2073-0667-2022-4-107-119
Dynamic load balancing algorithms application automation subsystem development for LUNA system
Computational load imbalance is one of the key problems of parallel programming. It arises from both hardware and software causes. Hardware imbalance results from heterogeneity of the computing system's resources. Software imbalance is associated with factors such as the dynamics of the simulated phenomenon and inefficient parallelization: excessive communication, a poor distribution of computations among nodes, and so on. Implementing large numerical models on supercomputers in a way that requires dynamic load balancing across computing nodes is a complex task of system parallel programming, and solving it requires specific qualifications. Ordinary supercomputer users in scientific numerical modeling usually lack such qualifications, which makes supercomputers hard to use for these tasks. The problem is partially solved by specialized software in which dynamic load balancing is already implemented. However, such software is not always applicable, especially to new numerical models. Dynamic load balancing is a relatively well-developed topic, with many publications covering algorithms, methods, software implementations and studies of their properties. Nevertheless, ensuring efficient and balanced use of supercomputer resources remains time-consuming and requires relevant qualifications. Even a “simple” adjustment of the parameters of a load balancing algorithm can become an insurmountable problem in practice. The problem is especially relevant for modern supercomputers of the petaflops and exaflops range, since fully loading the computing resources of such machines is not trivial even for simple tasks. Eliminating the imbalance is a non-trivial task for which there is no single method.
Many algorithms aim to eliminate the imbalance, but none of them is universal. The principal solution is to automate dynamic balancing of the computing load. Automation here means that various methods, algorithms and programs for dynamic load balancing are accumulated in a database or library in a form that allows them to be applied automatically. The user writes the program in such a way that the appropriate methods, algorithms and programs are applied automatically, without the user having to understand the problems of dynamic load balancing in depth. A special case is the support for dynamic load balancing in software such as Charm++ or PICADOR. The general case would be a programming system that is not specialized, in which all the knowledge about dynamic load balancing accumulated by researchers is available in a library and applied automatically. Significantly, there are no universal dynamic balancing algorithms because of the algorithmic complexity of the problem in its general formulation. Research in the field therefore focuses on particular and heuristic methods and algorithms used in various practical tasks. Accordingly, automating dynamic load balancing rests on accumulating these particular and heuristic algorithms together with information about when each is appropriate and how to determine which one suits a given situation. The LuNA system for automatic construction of parallel programs is being developed with this in mind. One of its tasks is to ensure the accumulation and automatic application of knowledge about dynamic load balancing. The paper examines how well suited the LuNA system is in principle to provide this accumulation and automatic application, and reports the extent to which the approach is currently implemented.
In particular, the results of an experimental study of the performance characteristics of programs in the LuNA system using various dynamic load balancing algorithms are presented.
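The idea of accumulating balancing algorithms in a library "in a form that allows their automatic application" can be illustrated with a minimal sketch. This is not LuNA's interface: the names `Balancer`, `LIBRARY`, `auto_balance`, the two toy heuristics and the thresholds 0.5 and 0.1 are all assumptions made for illustration. Each library entry pairs an algorithm with a predicate describing when it is appropriate, and the system, not the user, picks the first applicable one.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, List, Tuple

Loads = List[float]  # per-node load in arbitrary units (illustrative model)


def imbalance(loads: Loads) -> float:
    """Relative imbalance: max load over mean load, minus one (0.0 = balanced)."""
    avg = mean(loads)
    return max(loads) / avg - 1.0 if avg > 0 else 0.0


def even_split(loads: Loads) -> Loads:
    """Global redistribution: every node receives the mean load."""
    avg = mean(loads)
    return [avg] * len(loads)


def diffusion_step(loads: Loads) -> Loads:
    """One diffusion step on a ring: each node takes the average of itself
    and its two neighbours; the total load is conserved."""
    n = len(loads)
    return [(loads[i - 1] + loads[i] + loads[(i + 1) % n]) / 3.0 for i in range(n)]


@dataclass
class Balancer:
    name: str
    applicable: Callable[[Loads], bool]  # hypothetical applicability rule
    rebalance: Callable[[Loads], Loads]  # returns the new per-node loads


# Hypothetical library; the thresholds are arbitrary illustration values.
LIBRARY: List[Balancer] = [
    Balancer("even_split", lambda l: imbalance(l) > 0.5, even_split),
    Balancer("diffusion", lambda l: imbalance(l) > 0.1, diffusion_step),
]


def auto_balance(loads: Loads, library: List[Balancer] = LIBRARY) -> Tuple[str, Loads]:
    """Apply the first applicable algorithm; leave the loads unchanged if none applies."""
    for balancer in library:
        if balancer.applicable(loads):
            return balancer.name, balancer.rebalance(loads)
    return "none", list(loads)
```

Under these assumptions, a user program would only report per-node loads and call `auto_balance`; adding a new heuristic means appending one entry to the library, with no change to user code. A real system would select among algorithms using richer information than a single imbalance number.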
References
- Malyshkin V. Active Knowledge, LuNA and Literacy for Oncoming Centuries // LNCS, 2015. V. 9465. P. 292-303.
- Kale L. V., Krishnan S. Charm++. A portable concurrent object oriented system based on C++ // Proceedings of the eighth annual conference on Object-oriented programming systems, languages, and applications, 1993. P. 91-108.
- Bastrakov S. et al. Particle-in-cell plasma simulation on heterogeneous cluster systems // Journal of Computational Science, 2012. N 3(6). P. 474-479.
- Malyshkin V., Perepelkin V. LuNA Fragmented Programming System, Main Functions and Peculiarities of Run-Time Subsystem // Parallel Computing Technologies, 2011. LNCS 6873. P. 53-61.
- Malyshkin V. E., Perepelkin V. A., Schukin G. A. Distributed algorithm of data allocation in the fragmented programming system LuNA // International Conference on Parallel Computing Technologies. Springer, Cham, 2015. P. 80-85.
- Acar U. A., Chargueraud A., Rainey M. Scheduling parallel programs by work stealing with private deques // Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming, 2013. P. 219-228.