Hybrid MPI + OpenMP algorithm for symmetric spare matrix reordering and its application to the solving systems of linear equations
Автор: Pirova Anna Yurievna
Журнал: Проблемы информатики @problem-info
Рубрика: Теоретическая и системная информатика
Статья в выпуске: 1 (54), 2022 года.
Бесплатный доступ
Systems of linear equations (SLAE) with large sparse matrix arise from numerical simulations in various scientific and engineering applications. SLAE solution is one of the time and memory-consuming stages of modeling, which requires efficient implementation on modern supercomputers. Depending on the properties of the matrix, direct and iterative methods of solving such systems are applied. Direct methods are based on matrix factorization into two triangular matrices and backward solution of these systems. A feature of direct methods is that the number of non-zcro elements during the factorization step can significantly increase over the initial matrix. Symmetric row and column reordering is used before numerical factorization to minimize the fill-in. A suitable permutation reduces the memory consumption for storing the factor and the time required for the most time-consuming stage - numerical factorization. Therefore, it is important to develop new parallel reordering algorithms that reduce the total time for solving SLAEs with a sparse matrix, and hence, the time of numerical simulation in applications. The problem of finding the ordering that minimizes the factor fill-in is NP-hard. In practice, two heuristic approaches are commonly used: the minimum degree and nested dissection algorithms. The nested dissection algorithm is built on the divide-and-conquer principle. At each step of the algorithm, a graph separator is found. The separator vertices divide the graph into two disconnected subgraphs of similar size. Then the separator vertices are numbered and removed from the graph, and the algorithm operates recursively on new subgraphs. The process stops when all vertices are numbered. There are many modifications of the nested dissection method that differ in the algorithm for finding the separator. Since 1993, the modifications of the nested dissection algorithm employing the multilevel approach are used in parallel computations. Most sparse SLAE solvers have built-in implementations of reordering methods, as well as an interface for using third-party libraries and user permutations. Open source libraries ParMETIS and PT-Scotch are widely used in the academic community. They implement a multilevel nested dissection algorithm for distributed memory systems using MPI technology. In particular, the open academic direct solver MUMPS has an interface for using these libraries. The ParMETIS library also has a version for shared memory systems called mt-metis. The commercial solver Intel MKL PARDISO and its cluster version Intel Cluster PARDISO are examples of tight integration of the solver and ordering routine. They include optimized versions of the METIS library, designed for shared-memory and distributed- memory systems, but its approaches to optimizing and combining shared and distributed memory computations during the reordering process have not been published. Earlier, we presented the PMORSy and DMORSy reordering libraries that implement the multilevel nested section method for shared- and distributed-memory systems, respectively. The ordering The study was supported by Ministry of Science and Higher Education, project Л" 0729-2020-0055 © A. Yu. Pirova, 2022 А. Ю. Пирова, 29 algorithm in PMORSy is based on parallel processing of the task queue. The task is to find the separator in the current subgraph. After calculating the separator, new subgraphs go to the shared task queue, from which they are assigned for execution to free threads. The algorithm is implemented using OpenMP technology. The parallel algorithm in DMORSy is based on parallel processing of a binary tree constructed during the nested dissection method. Let the original graph be distributed among P processes. Then its separator is calculated in parallel by all P processes using a multilevel algorithm similar to that used in PT-Scotch. After the separator is found, new subgraphs are redistributed to P / 2 processes each, and the ordering process is continued independently for new subgraphs. The algorithm is implemented using MPI technology. In this paper, we propose a hybrid MPI + OpenMP parallel algorithm for distributed memory systems, in which the use of processes and threads within a single computing node is combined. The algorithm is constructed as follows. While the input graph is distributed among P > 1 processes, its separator is calculated in parallel by all SPS processes using a multilevel algorithm from DMORSy. Once the input subgraph is stored on a single process, the ordering is performed using a parallel task queue according to the algorithm from PMORSy. The combination of parallelization schemes includes their advantages: the scalability of the distributed memory algorithm and less factor fill-in obtained by the algorithm on shared memory.
Nested dissection ordering, sparse matrix reordering, distributed-memory parallel algorithms, sparse matrix solve
Короткий адрес: https://sciup.org/143179063
IDR: 143179063 | DOI: 10.24412/2073-0667-2022-1-28-41