A Survey of Parallel Computation Models
Authors: Ezhova Nadezhda Aleksandrovna, Sokolinsky Leonid Borisovich
Article in issue: No. 3, Vol. 8, 2019.
Free access
The aim of this survey is to give as complete a picture as possible of the achievements and current state of the art in the development of analytical models of parallel computation, which make it possible to predict the computation time, speedup, efficiency, and scalability of parallel algorithms on various target multiprocessor platforms. Models of parallel computation are important because, before a parallel algorithm is implemented as a program, they show how efficiently the algorithm can use a particular multiprocessor platform and, if necessary, allow the algorithm design to be changed or a different target hardware platform to be considered. The survey traces the evolution of models of parallel computation, which proceeded in step with the evolution of multiprocessor systems, from single-level shared-memory models to multi-level hierarchical distributed-memory models aimed at cluster computing systems with multi-core accelerators. The survey concludes with recommendations on possible directions for further research in the development of mathematical models of parallel computation.
Model of parallel computation, survey, parallel programming, multiprocessor systems, performance evaluation, prediction of algorithm execution time
Short URL: https://sciup.org/147233202
IDR: 147233202 | DOI: 10.14529/cmse190304