A review of the recurrent transformer architecture in the context of memory-augmented neural networks

Authors: Bulatov A.S., Kuratov Y.M., Burtsev M.S.

Journal: Труды Московского физико-технического института (Proceedings of MIPT)

Section: Computer Science and Control

Issue: No. 4 (64), Vol. 16, 2024.

Open access

This paper provides an overview of memory-augmented neural network (MANN) architectures, with a focus on the Recurrent Memory Transformer (RMT) for long-context tasks. Transformer architectures demonstrate high effectiveness in processing text, images, and speech; however, their application to long sequences is limited by the quadratic computational complexity of the self-attention mechanism and the challenge of separately storing local and global information. This study examines key memory-augmented models, particularly in the context of natural language processing. We analyze the RMT architecture, which overcomes these limitations with a recurrent memory mechanism that introduces special memory tokens, allowing the model to store and transfer information between sequence segments. This approach enables the model to capture both local and global dependencies while maintaining computational efficiency and scalability. Experimental results show that RMT outperforms comparable models, such as Transformer-XL, in processing long sequences, achieving high efficiency even with limited memory resources. This architecture presents a promising solution for a wide range of tasks requiring long-context processing, such as algorithmic modeling and reasoning.
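The recurrent memory mechanism described above can be illustrated with a minimal sketch, assuming a PyTorch encoder backbone. The class name RecurrentMemoryWrapper, the toy dimensions, and the simplification that memory tokens are only prepended to each segment (the published RMT also uses separate read/write memory positions for decoder models) are assumptions for illustration, not the authors' implementation.

import torch
import torch.nn as nn

class RecurrentMemoryWrapper(nn.Module):
    """Illustrative RMT-style wrapper: memory tokens carried across segments."""

    def __init__(self, backbone: nn.Module, d_model: int, num_mem_tokens: int = 10):
        super().__init__()
        self.backbone = backbone                  # any encoder, e.g. nn.TransformerEncoder
        self.num_mem = num_mem_tokens
        # Learnable initial memory: one vector per memory token.
        self.mem_init = nn.Parameter(torch.randn(num_mem_tokens, d_model) * 0.02)

    def forward(self, segments):
        """segments: list of tensors, each of shape (batch, seg_len, d_model)."""
        batch = segments[0].size(0)
        memory = self.mem_init.unsqueeze(0).expand(batch, -1, -1)
        outputs = []
        for seg in segments:
            # Prepend the current memory state to the segment tokens.
            x = torch.cat([memory, seg], dim=1)
            y = self.backbone(x)
            # Updated memory states are passed on to the next segment,
            # transferring global information across the long sequence.
            memory = y[:, :self.num_mem, :]
            outputs.append(y[:, self.num_mem:, :])
        return torch.cat(outputs, dim=1), memory

# Toy usage: process a 512-token sequence as four 128-token segments.
layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
model = RecurrentMemoryWrapper(nn.TransformerEncoder(layer, num_layers=2), d_model=64)
long_seq = torch.randn(2, 512, 64)
out, final_memory = model(list(long_seq.split(128, dim=1)))

In this sketch the memory outputs of one segment become the memory inputs of the next, so self-attention remains quadratic only in the segment length while global information flows through the fixed-size memory.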


Keywords: deep learning, recurrent neural networks, natural language processing

Short URL: https://sciup.org/142243844

IDR: 142243844

Scientific article