Use of the linguistically oriented Python language modules for handling large texts in the eastern languages in order to mine the orientalistics data (with NLTK module taken as an example)

Free access

This article analyzes contemporary linguistically oriented software built on the Python programming language, taking the Natural Language Toolkit (NLTK) as an example. The study considers both the general principles of NLTK and those that apply specifically to eastern languages: Farsi, Arabic and Chinese. The author presents solutions for handling Unicode text as input and output for Python text-processing modules.
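To illustrate the kind of Unicode input-output handling the abstract refers to, here is a minimal sketch; it is not the author's code. It assumes Python 3 and UTF-8 encoded files. The helper `read_tokens`, the file name `sample.txt` and the Persian sample sentence are hypothetical names chosen for illustration. NLTK's `wordpunct_tokenize` is used when NLTK is installed; otherwise a standard-library fallback with the same regular expression is substituted.

```python
import tempfile
import unicodedata
from collections import Counter
from pathlib import Path

try:
    # NLTK's regex-based tokenizer; it needs no extra corpus downloads.
    from nltk.tokenize import wordpunct_tokenize
except ImportError:
    import re

    def wordpunct_tokenize(text):
        # Fallback (assumption): the same word/punctuation pattern, stdlib only.
        return re.findall(r"\w+|[^\w\s]+", text)


def read_tokens(path, encoding="utf-8"):
    """Read a text file with an explicit encoding and return its tokens."""
    text = Path(path).read_text(encoding=encoding)
    # Normalize to NFC so equivalent character sequences compare equal.
    text = unicodedata.normalize("NFC", text)
    return wordpunct_tokenize(text)


# Hypothetical sample: a short Persian phrase with one repeated word.
sample = "زبان فارسی و زبان عربی"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sample.txt"
    # Write and read back with the encoding stated explicitly on both sides.
    path.write_text(sample, encoding="utf-8")
    tokens = read_tokens(path)

counts = Counter(tokens)
print(counts.most_common(1))
```

Stating the encoding explicitly on both read and write avoids depending on the platform default. Note that this whitespace-and-punctuation tokenizer suits space-delimited scripts such as Farsi and Arabic; Chinese text, which is written without spaces between words, would need a dedicated word segmenter.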

NLTK, eastern languages, Python modules, Python, natural language processing, code, UTF-8 encoding, big data, Unix, modules

Short URL: https://sciup.org/147153945

IDR: 147153945

Research article