Language model with uncertainty-based memory augmentation for the multi-hop question answering task
Authors: Sagirova A.R., Burtsev M.S.
Journal: Труды Московского физико-технического института (Proceedings of Moscow Institute of Physics and Technology) @trudy-mipt
Section: Informatics and Control
Article in issue: 3 (59), vol. 15, 2023.
Free access
Transformers have become the gold standard for many natural language processing tasks; however, models with self-attention mechanisms struggle to process long sequences due to their quadratic complexity, so handling long texts remains a challenge. To address this issue, we propose a two-stage method that first collects relevant information over the entire document and then combines it with the local context to solve the task. Our experimental results show that fine-tuning a pre-trained model with memory-augmented input, which includes the least uncertain global elements, improves the model's performance on the multi-hop question answering task compared to the baseline. We also find that the content of the global memory correlates with the supporting facts required for the correct answer.
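The abstract describes a two-stage pipeline: score every token of the full document by predictive uncertainty, keep the least uncertain tokens as global memory, and concatenate that memory with the local context before fine-tuning the reader model. Below is a minimal Python/PyTorch sketch of that idea; the entropy-based uncertainty score, the function names (token_entropy, select_memory, build_input), and the random stand-in logits are illustrative assumptions, not the authors' actual implementation.

```python
import torch

def token_entropy(logits: torch.Tensor) -> torch.Tensor:
    """Per-token predictive entropy of an LM; (seq_len, vocab) -> (seq_len,).
    Assumption: entropy is used as the token-level uncertainty measure."""
    log_probs = torch.log_softmax(logits, dim=-1)
    return -(log_probs.exp() * log_probs).sum(dim=-1)

def select_memory(token_ids: torch.Tensor, logits: torch.Tensor,
                  memory_size: int) -> torch.Tensor:
    """Stage 1: keep the `memory_size` least-uncertain tokens from the
    whole document as global memory, preserving document order."""
    entropy = token_entropy(logits)
    idx = torch.topk(-entropy, k=memory_size).indices.sort().values
    return token_ids[idx]

def build_input(memory_ids: torch.Tensor, local_ids: torch.Tensor,
                sep_id: int) -> torch.Tensor:
    """Stage 2: prepend the global memory to the local context, so the
    reader sees [global memory] SEP [local passage + question]."""
    sep = torch.tensor([sep_id], dtype=local_ids.dtype)
    return torch.cat([memory_ids, sep, local_ids])

if __name__ == "__main__":
    vocab, doc_len = 1000, 64
    doc_ids = torch.randint(0, vocab, (doc_len,))
    logits = torch.randn(doc_len, vocab)  # stand-in for real LM outputs
    memory = select_memory(doc_ids, logits, memory_size=8)
    model_input = build_input(memory, doc_ids[:16], sep_id=0)
    print(model_input.shape)  # 8 memory tokens + SEP + 16 local tokens
```

Sorting the selected indices keeps memory tokens in their original document order, which lets the downstream model read them as (fragmentary) text rather than an unordered bag of tokens; in practice the logits would come from a pre-trained language model's pass over the full document.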
Keywords: Transformer, global memory, multi-hop question answering
Short address: https://sciup.org/142239994
IDR: 142239994