On the problem of noun phrase recognition as applied to automatic information extraction from Russian texts
Authors: Vlasova Natalia Aleksandrovna, Podobryaev Alexey Vladimirovich
Journal: Программные системы: теория и приложения (Program Systems: Theory and Applications) @programmnye-sistemy
Section: Artificial intelligence, intelligent systems, neural networks
Article in issue: 1 (28), vol. 7, 2016.
Free access
The paper considers the problem of identifying complex noun phrases in Russian journalistic texts as applied to automatic information extraction. Complex noun phrases are understood here as long noun phrases that contain genitive and prepositional constructions as well as proper names. A scheme for finding noun phrase boundaries is proposed that starts from a text fragment known to contain a noun phrase. An algorithm for identifying such fragments has been developed. The fragments are classified by the frequency of the fragment types, the number of words in the fragment, its part-of-speech composition, the presence of already recognized named entities of different types, and whether parts of the fragment occur in lists of complex prepositions and set expressions. An original feature set is presented for building an algorithm that automatically extracts noun phrases within the fragments constructed at the first stage. In the experimental part of the study, 58,032 fragments were extracted from a collection of 1,000 socio-political texts, and difficult cases were analyzed.
Information extraction, named entity recognition, noun phrase chunking
Short URL: https://sciup.org/14336183
IDR: 14336183
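
The fragment characteristics named in the abstract (fragment type frequency, word count, part-of-speech composition, already recognized named entities, and matches against a list of complex prepositions) could be gathered roughly as in the sketch below. This is an illustration under stated assumptions, not the authors' implementation: the Token class, the fragment_features function, the tag names, the sample preposition list, and the choice of the part-of-speech sequence as the "fragment type" are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# Hypothetical sample list; the list of complex prepositions used in the paper is not given here.
COMPLEX_PREPOSITIONS = {("в", "течение"), ("в", "связи", "с"), ("по", "поводу")}


@dataclass(frozen=True)
class Token:
    text: str
    pos: str                      # part-of-speech tag (tag names are an assumption)
    entity: Optional[str] = None  # type of an already recognized named entity, if any


def contains_listed_sequence(words, listed):
    """Return True if any listed word sequence occurs contiguously in words."""
    for seq in listed:
        n = len(seq)
        for i in range(len(words) - n + 1):
            if tuple(words[i:i + n]) == seq:
                return True
    return False


def fragment_features(tokens, type_counts):
    """Collect the abstract's fragment characteristics into a plain dict.

    Assumption: a fragment's "type" is its part-of-speech sequence, and
    type_counts maps each such sequence to its corpus frequency.
    """
    words = [t.text.lower() for t in tokens]
    pos_signature = tuple(t.pos for t in tokens)
    return {
        "length": len(tokens),
        "pos_signature": pos_signature,
        "type_frequency": type_counts[pos_signature],
        "entity_types": sorted({t.entity for t in tokens if t.entity}),
        "has_complex_preposition": contains_listed_sequence(words, COMPLEX_PREPOSITIONS),
    }


# Usage on one made-up fragment.
fragment = [
    Token("в", "PREP"),
    Token("течение", "NOUN"),
    Token("заседания", "NOUN"),
    Token("Государственной", "ADJ", "ORG"),
    Token("Думы", "NOUN", "ORG"),
]
type_counts = Counter({tuple(t.pos for t in fragment): 3})
features = fragment_features(fragment, type_counts)
```

A flat dictionary of this kind could be passed to any rule-based or statistical classifier; the abstract does not say which kind of classifier the authors use.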