USE OF THE LINGUISTICALLY ORIENTED PYTHON LANGUAGE MODULES FOR HANDLING LARGE TEXTS IN THE EASTERN LANGUAGES IN ORDER TO MINE THE ORIENTALISTICS DATA (WITH NLTK MODULE TAKEN AS AN EXAMPLE)
Abstract
This article analyzes contemporary linguistically oriented software built on the Python
programming language, taking the Natural Language Toolkit (NLTK) as an example.
The study considers not only the general principles of NLTK but also those that apply
specifically to Eastern languages: Farsi, Arabic, and Chinese. The author presents solutions
for handling Unicode text as the input and output of Python text-processing modules.
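
The Unicode input-output handling mentioned above can be illustrated with a minimal sketch. It assumes Python 3 and uses only the standard library rather than NLTK itself; the file name, the sample Farsi phrase, and the helper names `read_text` and `tokenize` are hypothetical and are not taken from the article. The regular-expression tokenizer is a crude stand-in: it may work for space-delimited scripts such as Farsi and Arabic, but Chinese text would need a dedicated word segmenter.

```python
import os
import re
import tempfile
import unicodedata
from collections import Counter


def read_text(path, encoding="utf-8"):
    """Read a file as Unicode text and normalize it to NFC (hypothetical helper)."""
    with open(path, encoding=encoding) as handle:
        return unicodedata.normalize("NFC", handle.read())


def tokenize(text):
    """Split on runs of Unicode word characters; a stand-in for a real tokenizer."""
    return re.findall(r"\w+", text)


# Hypothetical sample: a three-word Farsi phrase in which the first word repeats.
sample = "کتاب خوب کتاب"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample_fa.txt")
    # Write with an explicit encoding so the result does not depend on the OS locale.
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(sample)
    tokens = tokenize(read_text(path))

# Count token frequencies, a typical first step in text mining.
frequencies = Counter(tokens)
```

Stating the encoding explicitly on both read and write is the design choice that matters here: relying on the platform default is a common source of garbled Farsi, Arabic, or Chinese text. The resulting token list could then be passed to NLTK components in place of the regex step.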