Analysis of the execution of fragmented programs based on SLOW factors
Authors: Kireev S.E., Litvinov V.S.
Published in: Issue 2, Vol. 13, 2024.
Free access
When executing parallel programs based on the task parallelism paradigm, several issues must be addressed: choosing the order in which tasks are started, respecting the dependencies between them, distributing data and tasks across parallel processes, and balancing resource utilization. These issues belong to system-level parallel programming and are typically handled by a dedicated execution system. The final performance of a parallel program depends on how effectively they are addressed, as well as on the structure and characteristics of the underlying algorithm. If the program’s performance is insufficient, optimization may be required, which in turn requires identifying the bottlenecks that limit it. Profiling can pinpoint bottlenecks by collecting performance metrics that may reveal the source of performance problems. However, the conventional tools used for profiling parallel programs cannot present their results in terms of the required concepts, because the asynchronous execution of multiple tasks is difficult to analyze and because these tools cannot distinguish between the application (multiple tasks) and system (operating system) components of an executing program. Consequently, such programs call for new profiling and analysis techniques. The paper discusses the problem of obtaining comprehensible performance characteristics of task-based parallel programs for performance analysis and optimization. It proposes evaluating the influence of the following factors: Starvation, Latency, Overhead, and Waiting for contention resolution (SLOW). An algorithm for obtaining the corresponding characteristics for the LuNA fragmented programming system is presented, together with a method for analyzing them to optimize LuNA programs. The correctness of the approach has been demonstrated on a number of synthetic tests, and its application to the analysis of a “real-world” numerical simulation program is shown.
Keywords: performance analysis, parallel programming, fragmented programming, task parallelism, LuNA system
Short URL: https://sciup.org/147243961
IDR: 147243961 | DOI: 10.14529/cmse240205