Building a Chinese-Russian parallel discourse structure corpus of official texts

Free access

This paper describes the construction of a Chinese-Russian Parallel Discourse Structure Corpus of Official Texts (CRPDT), with the aim of producing a discourse treebank in which Chinese and Russian parallel texts are manually annotated and aligned at the level of discourse structure. In this corpus, discourse units and the discourse relations between them are annotated for each paragraph of the parallel texts. The experimental work is based on four Chinese source texts, the “Reports on the Work of the Government,” and their Russian translations. The paper reviews the history and development of discourse treebanks, presents the annotation principles for building parallel discourse treebanks, and shows how discourse segmentation is carried out for Chinese-Russian parallel texts. The annotation and alignment tools are taken from the Chinese-English Parallel Discourse Treebank. We suggest that the corpus may be useful for machine translation, language learning, translation studies, discourse analysis of Chinese and Russian texts, and future work in natural language processing.

Short URL: https://sciup.org/147154030

IDR: 147154030   |   DOI: 10.14529/ling160404

Research article