Supercomputer application integral characteristics analysis for the whole queued job collection of large-scale HPC systems

Free access

Efficient use and high output of any supercomputer depend on many factors. One of them is control over how granted resources are utilized, a problem that becomes especially noticeable when many user projects run concurrently. Users need detailed information on the peculiarities of the jobs they run, and project managers need detailed information on resource utilization by project members, obtained through access to detailed job analysis. Unfortunately, such information is rarely available. Our proposed approach aims to close this gap: it analyzes integral characteristics of supercomputer applications over the whole collection of queued jobs on large-scale HPC systems, and is based on managing and studying system monitoring data, building integral job characteristics, and revealing job categories and the peculiarities of single job runs.


Supercomputer, efficiency, system monitoring, job categories, integral job characteristics, queued job collection, job queue, resource utilization control

Short URL: https://sciup.org/147160605

IDR: 147160605   |   DOI: 10.14529/cmse160403

Research article