USE OF THE LINGUISTICALLY ORIENTED PYTHON LANGUAGE MODULES FOR HANDLING LARGE TEXTS IN THE EASTERN LANGUAGES IN ORDER TO MINE THE ORIENTALISTICS DATA (WITH NLTK MODULE TAKEN AS AN EXAMPLE)

Authors

  • Bulat G. Fatkulin

Abstract

This article analyzes contemporary linguistically oriented software built on the Python
programming language, taking the Natural Language Toolkit (NLTK) as an example.
The study considers not only the general principles of NLTK but also those that apply
specifically to Eastern languages: Farsi, Arabic and Chinese. The author presents several
solutions for handling Unicode text as the input and output of Python text-processing modules.
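
As an illustration of the kind of Unicode input-output handling discussed here, the following is a minimal sketch and not the author's own code. It uses only the Python standard library; the sample sentence, the file name sample_fa.txt and the helper read_tokens are hypothetical, and the regular-expression tokenizer is an assumption standing in for whichever NLTK tokenizer the article applies.

```python
import re
import tempfile
import unicodedata
from collections import Counter
from pathlib import Path

# Hypothetical sample sentence in Farsi
# (roughly: "this is a book and this is not a book").
SAMPLE = "این کتاب است و این کتاب نیست"


def read_tokens(path):
    """Read a UTF-8 file, normalize it to NFC, and split it into word tokens."""
    # Always state the encoding explicitly; the platform default may not be UTF-8.
    text = Path(path).read_text(encoding="utf-8")
    # NFC normalization makes visually identical strings compare equal.
    text = unicodedata.normalize("NFC", text)
    # In Python 3, \w matches Unicode letters and digits,
    # so words in Arabic script are matched as whole tokens.
    return re.findall(r"\w+", text)


# Round trip: write the sample as UTF-8, then read it back and tokenize it.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sample_fa.txt"
    path.write_text(SAMPLE, encoding="utf-8")
    tokens = read_tokens(path)

# Word frequencies, the simplest form of text mining.
freq = Counter(tokens)
print(len(tokens), "tokens;", len(freq), "distinct")
for word, count in freq.most_common(2):
    print(word, count)
```

This simple pattern is suited only to space-delimited text. Chinese is written without spaces between words and would need a dedicated word segmenter, and Arabic or Farsi text containing vowel marks or zero-width non-joiners would probably need a more permissive pattern.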

Author Biography

  • Bulat G. Fatkulin
    Associate Professor, Department of General Linguistics, South Ural State University, Chelyabinsk, Russian Federation, bfatkulin@gmail.com

Section

Articles