On using the case study method to create universal resources for conceptual annotation of multilingual corpora

Free access

The development of annotated corpora is crucial for computer technologies that process unstructured information (automatic classification, intelligent content and trend analysis, machine learning, machine translation, etc.). It is therefore a focus of international theoretical and applied linguistic research. The key issue is the automation of annotation procedures, which in turn requires static (linguistic) and dynamic (software) resources that can be reused, at least partially, for annotating multilingual texts in various domains. This paper presents an effort to create such resources for conceptual annotation, one of the most popular and problematic annotation levels, by using the case study method. Conceptual annotation is understood as a kind of semantic annotation focused on solving specific information problems within specific domains. The methodology and results of the study were worked out by applying the case study method to texts of the “Terrorism” domain in Russian, English and French. The resources created during the research include a universal methodology for resource development, as well as domain-oriented software and linguistic material (a multilingual ontology and conceptually annotated corpora in three languages). These can be used directly to augment the coverage of annotated corpora in the “Terrorism” domain, to develop metrics for resolving conceptual ambiguity, and to automate text annotation in other domains and languages. The results are also of interest for contrastive linguistic studies.

Keywords: conceptual annotation, static and dynamic resources, domain, ontology, multilingualism, terrorism

Short URL: https://sciup.org/147234412

IDR: 147234412   |   DOI: 10.14529/ling200408

Research article