Incremental approaches to thread-local garbage collection

Authors: Filatov Alexander Yu., Mikheev Vitaly V.

Journal: Проблемы информатики (Problems of Informatics)

Section: Applied information technologies

Published in issue: 2 (55), 2022.


Recent advances in computational technology gave rise to highly multiprocessor systems that are used in many domains. Modern managed programming languages such as Java, Scala, and Kotlin are expected to use all of the available computational resources to provide competitive performance in production environments. These requirements pose new challenges to developers of managed runtimes, which need scalable and efficient automatic memory management. The wide adoption of cache-coherent non-uniform memory access (ccNUMA) systems has drawn attention to garbage collection techniques that encourage data locality and minimize the number of inter-node memory accesses.

Thread-local garbage collection is a promising research direction that allows designing scalable, throughput-oriented, and NUMA-aware algorithms for automatic memory management. The memory manager divides heap objects into independent groups: local objects, which are biased to the thread that allocated them, and shared (or global) objects, which are accessible by more than one thread. Any operation on thread-local memory, be it allocation of a new object, tracing of reachable objects, or reclamation of unused memory, may be performed independently, without any synchronization between threads. Improved data locality makes a thread-local memory manager an attractive alternative to conventional GC algorithms, especially for software targeting ccNUMA hardware.

One of the main advantages of the proposed scheme is that the memory manager may use any garbage collection approach to manage thread-local heaps. In particular, any existing well-studied algorithm for uniprocessor systems may be adapted to the thread-local setting. However, single-threaded tracing approaches share a common drawback: the marking phase may take significant time to complete, leading to long response times and reduced application throughput. There are plenty of incremental approaches to classic garbage collector designs, but their applicability to thread-local memory management (which is itself an incremental GC design) is an open research question.

This paper focuses on an approach that treats local and global objects as generations and uses a special "globalization" operation to evacuate an object from a local heap into shared memory. The conventional generational scheme is known to introduce memory drag: unreachable objects in the old generation remain in the heap until a collection of that generation is triggered. This "floating garbage" problem incurs additional overhead for a thread-local garbage collector because it cannot locally reclaim shared objects. The performance overhead of preliminary evacuation therefore requires thorough analysis before being applied in production environments.

The main contributions of this work are the following:

- The evacuation procedure is formalized in terms of an abstract object graph concurrently modified by intercommunicating agents, and the correctness of the evacuating transformation is proven. An upper bound on the potential number of inter-agent messages, which depends on the size of the local component, is established.

- Two non-trivial evacuation strategies are studied using a large benchmarking suite. The first strategy uses the age of objects to identify long-living ones and evacuates them into the shared heap. The second strategy is based on the idea that stack frame depth is proportional to the overhead incurred by repeated thread-local marking of references stored in the memory of that frame.
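As a rough illustration of the local/shared object split and the age-based evacuation strategy, here is a minimal Java sketch. All names (ManagedObject, ThreadLocalHeap, globalizationAge) are hypothetical and the code is not the memory manager described in the paper; real root scanning, tracing, and reclamation are omitted, and the stack-depth-based strategy is not shown.

```java
// Minimal sketch (assumed names, not the paper's implementation) of a
// per-thread heap with an age-based "globalization" policy: allocation and
// local collection touch only thread-local state, while evacuation into the
// shared heap is the only operation that needs synchronization.
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class ManagedObject {
    int survivedLocalGCs = 0;  // age measured in local collections
    boolean shared = false;    // set once the object becomes reachable from another thread
}

final class ThreadLocalHeap {
    private final List<ManagedObject> localObjects = new ArrayList<>();
    private final List<ManagedObject> sharedHeap;  // global heap shared by all threads
    private final int globalizationAge;            // tuning parameter of the age-based strategy

    ThreadLocalHeap(List<ManagedObject> sharedHeap, int globalizationAge) {
        this.sharedHeap = sharedHeap;
        this.globalizationAge = globalizationAge;
    }

    // Allocation touches only thread-local state: no locks or atomic operations.
    ManagedObject allocate() {
        ManagedObject obj = new ManagedObject();
        localObjects.add(obj);
        return obj;
    }

    // Simplified local "collection": every surviving local object is aged, and
    // objects that reached the age threshold (or were published to another
    // thread) are evacuated into the shared heap so that later local
    // collections no longer scan them.
    void localCollect() {
        Iterator<ManagedObject> it = localObjects.iterator();
        while (it.hasNext()) {
            ManagedObject obj = it.next();
            obj.survivedLocalGCs++;
            if (obj.shared || obj.survivedLocalGCs >= globalizationAge) {
                synchronized (sharedHeap) {  // evacuation is the only synchronized step
                    sharedHeap.add(obj);
                }
                it.remove();
            }
        }
    }
}
```

In this simplified form, lowering globalizationAge reduces the amount of repeated local marking but evacuates objects earlier; as noted above, prematurely globalized objects become floating garbage that a thread can no longer reclaim locally, which is the trade-off the age-based strategy has to balance.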
The described incremental thread-local memory manager was used to run the well-known DaCapo benchmark suite written in Java, a machine-learning application written in Scala, and several tests from an open-source benchmarking repository aimed at performance evaluation of the Apache Spark framework for distributed large-scale data processing. Performance measurements indicate that choosing optimal parameters for the studied incremental algorithms may dramatically increase the throughput of some applications. At the same time, some applications are very sensitive to the configuration of the thread-local memory manager, and a slight modification of a parameter may lead to a significant performance drop. The development of auto-tuning techniques for incremental thread-local garbage collection remains an open problem.


Keywords: incremental garbage collection, JVM, thread-local heaps, xi'ma

Short URL: https://sciup.org/143179389

IDR: 143179389   |   DOI: 10.24412/2073-0667-2022-2-53-72

Scientific article