Multi-task training for better generalization in structured query languages
Author: Somov O.D.
Journal: Proceedings of Moscow Institute of Physics and Technology (Trudy MIPT) @trudy-mipt
Section: Computer Science and Control
Article in issue: 2 (62), vol. 16, 2024.
Free access
Semantic parsing is the task of translating a natural language statement into a logical expression in a formal language. One application of semantic parsing is text-to-query, where a natural language statement is translated into an executable query against a knowledge base. The most popular text-to-query tasks are text-to-SQL and text-to-SPARQL. Semantic parsers often fall prey to the distribution shift problem. One of the most common shifts is compositional shift: the need to generate novel code compositions from known syntax elements. In this work, we explore the robustness of pretrained language models (PLMs) combined with a multi-task training approach. We propose specifically designed data splits of the SPARQL and SQL datasets LC-QuAD and WikiSQL to emulate distribution shift, and compare the original state-of-the-art text-to-query approach with the multi-task one. We provide an in-depth analysis of the data splits and model predictions and show the advantages of the multi-task approach over the original one for the text-to-query task.
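The abstract does not specify how the compositional splits were constructed; a common way to emulate compositional shift (a minimal sketch, with the `template` anonymization heuristic and the `compositional_split` function being illustrative assumptions, not the paper's actual procedure) is to group queries by structural template and hold out entire template groups, so that test-time query compositions are unseen in training while their individual syntax elements are not:

```python
import re
from collections import defaultdict

def template(sql: str) -> str:
    # Anonymize literals so queries with the same structure
    # (composition of syntax elements) map to the same template.
    sql = re.sub(r"'[^']*'", "<str>", sql)
    sql = re.sub(r"\b\d+(\.\d+)?\b", "<num>", sql)
    return sql

def compositional_split(examples, test_fraction=0.2):
    # Group examples by query template, then assign whole groups to
    # train or test: every test template is unseen during training,
    # even though its syntax elements still occur in the train split.
    groups = defaultdict(list)
    for ex in examples:
        groups[template(ex["sql"])].append(ex)
    templates = sorted(groups)
    n_test = max(1, int(len(templates) * test_fraction))
    test_templates = set(templates[:n_test])
    train = [ex for t in templates if t not in test_templates
             for ex in groups[t]]
    test = [ex for t in test_templates for ex in groups[t]]
    return train, test
```

Splitting at the template level, rather than uniformly at random over examples, is what forces the parser to compose known elements in unseen ways at test time.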
Semantic parsing, distribution shift, multi-task training
Short address: https://sciup.org/142242126
IDR: 142242126