The fastest and energy-efficient breadth-first search algorithm on a single node with various parallel architectures according to Graph500
Бесплатный доступ
The breadth-first search algorithm is one of the basic algorithms for graph traversing and is used in many other algorithms of high-level graph analysis. BFS is characterized with intensive irregular memory accesses and a random data dependency, which greatly complicates its parallelization to all existing architectures. The paper considers the implementation of the BFS algorithm (the core benchmark of the Graph500 rating) for processing large graphs on different architectures: Intel x86, IBM Power8+, Intel KNL and NVidia GPU. Algorithms for implementing breadth-first search will be considered, such as top-down traverse, bottom-up traverse, and a hybrid traverse that includes both top-down and bottom-up traverses. The features of the algorithm implementation on shared memory will be shown, as well as graph transformations (local sorting of graph vertices, global sorting of graph vertices, renumbering of all graph vertexes, compressed representation of graph vertices), which allow achieving the best performance and energy-efficiency in this algorithm among all single-node systems of Graph500 and GreenGraph500 ratings.
Bfs, cuda, power8, knl, graph500, parallel graph processing
Короткий адрес: https://sciup.org/147160643
IDR: 147160643 | УДК: 004.021 | DOI: 10.14529/cmse180201