On program restoration from checkpoints set
Author: Polyakov A.Y.
Published in issue: 35 (211), 2010.
Free access
This paper describes two approaches to the problem of restoring distributed programs from a set of checkpoints. An algorithm operating within a single computation node is proposed that recreates parent-child relationships and process group/session assignments at restore time. A coordinated algorithm for restoring a set of processes from several nodes/terminals is also designed. The described algorithms are implemented in the checkpointing package DMTCP (Distributed MultiThreaded Checkpointing).
Fault tolerance, rollback-recovery, checkpointing
Short URL: https://sciup.org/147159079
IDR: 147159079
Research article