Survey of Parallel Computation Models

Nadezhda A. Ezhova; Leonid B. Sokolinsky

doi:10.14529/cmse190304

Journal: Bulletin of the South Ural State University. Series: Computational Mathematics and Software Engineering @vestnik-susu-cmi

Published in: Vol. 8, No. 3, 2019.

Free access

This survey aims to give as complete a picture as possible of the achievements and current state of the art in analytical models of parallel computation, which make it possible to predict the computation time, speedup, efficiency, and scalability of parallel algorithms on various target multiprocessor platforms. Such models are important because, before a parallel algorithm is implemented as a program, they show how efficiently the algorithm can use a particular multiprocessor platform, so that the algorithm's design can be changed, or a different target hardware platform considered, if necessary. The survey traces the evolution of parallel computation models, which proceeded alongside the evolution of multiprocessor systems, from single-level shared-memory models to multi-level hierarchical distributed-memory models aimed at cluster computing systems with multicore accelerators. The survey concludes with recommendations on possible directions for further research in developing mathematical models of parallel computation.

Keywords: parallel computation model, survey, parallel programming, multiprocessor systems, performance evaluation, prediction of algorithm execution time

Short URL: https://sciup.org/147233202

IDR: 147233202 | UDC: 004.051 | DOI: 10.14529/cmse190304

Survey of parallel computation models

This survey aims to present the state of the art in analytic parallel computation models, describing particularly noteworthy efforts in sufficient detail. Such models make it possible to predict the computation time, speedup, efficiency, and scalability of parallel algorithms on various target multiprocessor platforms. Modeling the cost of computations and communications in multiprocessor systems is an important and challenging problem. It provides insight into the design of parallel algorithms and helps optimize their deployment on increasingly complex high-performance computing systems. The survey shows the evolution of parallel computing models, inspired by the evolution of multiprocessor systems, from single-level models with shared memory to multi-level hierarchical models with distributed memory, which correspond to multicore clusters. The survey concludes with prospective directions for further research in developing mathematical models for parallel computing.
