Инкрементальные расширения потоково-локальной сборки мусора
Автор: Филатов Александр Юрьевич, Михеев Виталий Витальевич
Журнал: Проблемы информатики @problem-info
Рубрика: Прикладные информационные технологии
Статья в выпуске: 2 (55), 2022 года.
Бесплатный доступ
Потоково-локальные системы управления памятью сводят проблему обнаружения недостижимых объектов в многопроцессорной среде к применению трассирующего алгоритма в одном потоке к отдельному участку динамической памяти - локальной куче. Существенным недостатком такого подхода является необходимость выполнять дорогостоящую процедуру обхода объектного графа в одном потоке приложения, что негативно сказывается на отзывчивости программы. Данная работа обсуждает применимость различных инкрементальных техник, нацеленных на уменьшение времени локальной разметки, и обосновывает корректность предложенных алгоритмов. Описанные подходы расширили существующий потоково-локальный сборщик мусора в экспериментальной виртуальной машине для языка Java, что позволило провести сравнительный анализ эффективности предложенных стратегий на представительном наборе приложений для измерения производительности.
Инкрементальная сборка мусора, потоково-локальные кучи, виртуальная машина java, jvm, numa
Короткий адрес: https://sciup.org/143179389
IDR: 143179389 | УДК: 004.43 | DOI: 10.24412/2073-0667-2022-2-53-72
Incremental approaches to thread-local garbage collection
Recent advances in computational technology gave rise to highly multiprocessor systems that are used in different domains. Modern managed programming languages such as Java, Scala and Kotlin are expected to use all of the available computational resources to provide competitive performance in production environments. These requirements pose a new challenges to developers of managed runtimes which require scalable and efficient automatic memory management. Wide adoption of cache coherent non-uniform memory access (ccNUMA) systems drew attention to garbage collection techniques that encourage data locality and minimize number of inter-node memory accesses. Thread-local garbage collection is a promising research direction that allows to design scalabale, throughput-oriented and NUMA-aware algorithms for automatic memory management. Memory manager divide heap objects into independent groups: local objects that are biased to the thread allocated them and shared (or global) objects which are accessible by more than one thread. Any operation with thread-local memory - either allocation of a new object, tracing of reachable objects or reclamation of unused memory - may be performed in an independent manner without any synchronization between different threads. Improved locality of data make thread-local memory manager an attractive alternative to conventional GC algorithms especially for software targeting ccNUMA hardware. One of the main advantages of the proposed scheme is that memory manager may use any garbage collection approach to manage thread-local heaps. In particular, any existing well-studied algorithm for uniprocessor systems may by adopted to thread-local setting. However, single-threaded tracing approaches share a common drawback - marking phase may take a significant time to complete leading to low responce time and reduced throughput of an application. There are plenty of incremental approaches to classic garbage collector designes but applicability of them to thread-local memory management (which itself is an incremental GC design) is an open research question. This research paper focuses on the approach that treats local and global objects as generations and uses special „globalization" operation to evacuate an object from local heap into shared memory. Conventional generational scheme is known to introduce memory drag - unreachable objects from old generation remain in heap until collection of this generation is triggered. Problem of such „floatinggarbage" introduces more overheads for thread-local garbage collector because it cannot locally reclaim shared objects. Performance overheads of preliminary evacuation require thorough analysis before being- applied in production environments. Main contributions of this work are the following: Evacuation procedure is formalized in terms of an abstract object graph concurrently modified by intercommunicating agents and correctness of evacuating transformation is proven. Upper bound for potential number of inter-agent messages that depend on local component size is found. - Two non-trivial evacuation strategies are studied using large benchmarking suite. First strategy uses age of objects to determine the long-living ones and evacuate them into shared heap. Second strategy is based on the idea that stack frame depth is proportional to the overhead incurred by repeated thread-local marking of references stored in memory of this frame. Described incremental thread-local memory manager was used to run well-known DaCapo benchmark suite written in Java, machine-learning application written in Scala and several tests from open-source benchmarking repository aimed at performance evaluation of Apache Spark framework for distributed large-scale data processing. Performance measurements indicate that chosing optimal parameters to the studied incremental algorithms may tremendously increase throughput of some applications. At the same time, there are applications that are very sensitive to the configuration of a thread-local memory manager and slight modification of a parameter may lead to significant performance drop. Development of auto-tuning techniques for incremental thread-local garbage collection remains an open problem.
Список литературы Инкрементальные расширения потоково-локальной сборки мусора
- Jones R., Hosking A., Moss Е. The Garbage Collection Handbook: The Art of Automatic Memory Management. Chapman k, Hall/CRC, 1st edition. 2011.
- Doligez D., Leroy X. A Concurrent, Generational Garbage Collector for a Multithreaded Implementation of ML. In Proceedings of the 20th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL '93. Association for Computing Machinery. 1993. New York, USA. P. 113-123.
- Meyer B. Object-Oriented Software Construction (2nd Ed.). PrenticeHall, Inc., USA, 1997.
- Tamar Domani Gal, Goldshtein Gal, Kolodner Elliot K., Lewis Ethan, Petrank Erez, Sheinwald Dafna. Thread-Local Heaps for Java. In SIGPLAN Not, ACM Press, 2002. P. 76-87.
- Marlow S., Peyton .Jones S. Multicorc garbage collection with local heaps. In Proceedings of the International Symposium on Memory Management, ISMM '11, New York, NY, USA, 2011. Association for Computing Machinery. P. 21 32.
- Mole M., .Jones R., Kalibera T. A study of sharing definitions in thread-local heaps. In ICOOOLPS. 2012.
- Sivaramakrishnan КС, Dolan S., White L., .Jaffer S., Kelly Т., Sahoo A., Parimala S., Dhiman A., Madhavapeddv A. Retrofitting parallelism onto OCAML. Proc. ACM Program. Lang., 4(ICFP), August 2020.
- Filatov A., Mikhccv V. Quantitative evaluation of thread-local garbage collection efficiency for ■JAVA. Programming and Computer Software, 45:111, 01 2019.
- Wilson P. R. Uniprocessor garbage collection techniques. In Proceedings of the International Workshop on Memory Management, IWMM '92, P. 1 42, London, UK, UK, 1992. Springer-Verlag.
- Lieberman H., Hewitt C. A real-time garbage collector based on the lifetimes of objects. Commun. ACM, 26 (6): 419 429, .June 1983.
- Anderson T. A. Optimizations in a private nurserv-based garbage collector. In Proceedings of the 2010 International Symposium on Memory Management, ISMM '10, P. 21 30, New York, NY, USA, 2010. Association for Computing Machinery.
- Filatov A., Mikhccv V. Evaluation of thread-local garbage collection. In 2020 Ivannikov Memorial Workshop (IVMEM), P. 15 21, 2020.
- Apache Hadoop's official website, 2020 [Electron, res.]: https://hadoop.apache.org/.
- Apache SparkTVofficial website. [Electron, res.]: https://spark.apache.org/, 2020.
- Gurcvich Yu. Specification and validation methods, chapter Evolving Algebras 1993: Lipari Guide, P. 9 36. Oxford University Press, Inc., New York, NY, USA, 1995.
- Zamulin A. An ASM-based formal model of a .Java program. Programming and Computer Software, 29 (3): P. 130 139, 2003.
- Blackburn S. M., Garner R., Hoffmann C., Khang A. M., McKinlev K. S., Bcntzur R., Diwan A., Feinbcrg D., Frampton D., Guycr S. Z., Hirzcl M., Hosking A., .Jump M., Lee H., Moss .J.E.B., Phansalkar A., Stefanovie D., VanDruncn Т., Daniel von Dineklage, Wicdermann B. The DaXapo benchmarks: .Java benchmarking development and analysis. In Proceedings of the 21st Annual ACM SIGPLAN Conference on Object- Oriented Programming Systems, Languages, and Applications, OOPSLA '06, P. 169 190, New York, NY, USA, 2006. Association for Computing Machinery.
- Dreppcr U. What every programmer should know about memory. 2007.
- Mceallum A., Schultz K., Singh S. Factorie: Probabilistic programming via imperatively defined factor graphs. P. 1249 1257, 01 2009.
- Blei D., Ng A., .Jordan M. Latent Dirichlet allocation. .Journal of Machine Learning Research, 2003. V. 3. P. 993 1022.
- Sewe A., Mezini M., Sarimbckov A., Binder W. Da capo con Scala: design and analysis of a Scala benchmark suite for the .Java virtual machine. In OOPSLA '11 Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications, P. 657 676. ACM, 2011.
- HiBeneh source repository. [Electron. Res.]: https://github.com/Intel-bigdata/HiBench, 2020.