Noun phrase extraction from a Spanish corpus

Free access

The article investigates the application of a lexicalist approach to the extraction of noun phrases from a corpus of Spanish patent texts. A corpus-based study of noun phrases in patents for apparatuses yields syntactic patterns of the noun phrase typical of an inflectional language such as Spanish. The difficulties that prevent full reuse of the knowledge base compiled for English noun phrase recognition are examined. The noun phrase recognition algorithm, which filters a list of candidate noun phrases by analyzing their lexical contexts, has been enhanced by integrating morphological analysis of Spanish inflectional forms. The performance of the algorithm is evaluated by analyzing the causes of false acceptance and false rejection of candidates.

Noun phrase, information extraction, knowledge base, lexicalist approach, inflectional languages, Spanish language, syntactic patterns, context, domain, corpus-based research

Short URL: https://sciup.org/147153760

IDR: 147153760

Research article