ON METHODS AND MODELS OF AUTOMATIC KEYWORD EXTRACTION
Abstract
The paper presents an overview and classification of major approaches to the automatic
extraction of keywords from text documents. The approaches can be divided into statistical and
hybrid ones. Both types can be further classified into corpus-based and document-based.
The advantages and shortcomings of particular approaches are analyzed. It is argued that the use
of statistical keyword extraction methods for inflecting languages, such as Russian, is problematic.
Requirements for an efficient model of automatic keyword extraction from Russian texts are
formulated, and specific recommendations for meeting these requirements are given. It is emphasized
that to create effective keyword extractors one should take into consideration the linguistic types of
natural languages (analytical, inflecting, agglutinative, isolating), the domain (sublanguage) and the
availability of linguistic and programming resources. The approach is illustrated by a case study of a
keyword extractor for Russian texts on mathematical modeling.
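
The difficulty that inflecting languages pose for statistical methods can be illustrated with a minimal sketch. It is not the extractor described in this paper: the sample text, the names surface_counts and lemma_counts, and the hand-made table TOY_LEMMAS are assumptions introduced for illustration only, and the table merely stands in for a real morphological analyzer. The sketch counts raw word forms, as a simple document-based statistical method might, and then repeats the count after mapping each form to its lemma.

```python
import re
from collections import Counter

# Toy lemma table: an illustrative assumption, not a real morphological analyzer.
TOY_LEMMAS = {
    "модели": "модель",
    "моделью": "модель",
    "моделей": "модель",
}

# Sample text (assumed for illustration): the noun "модель" occurs four times,
# each time in a different case or number form.
TEXT = (
    "Модель описывает процесс. Параметры модели заданы. "
    "Расчёт выполнен моделью. Сравнение моделей показало различия."
)


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def surface_counts(text):
    """Count raw word forms, as a purely statistical method would."""
    return Counter(tokenize(text))


def lemma_counts(text):
    """Count tokens after mapping known inflected forms to their lemma."""
    return Counter(TOY_LEMMAS.get(token, token) for token in tokenize(text))


if __name__ == "__main__":
    print(surface_counts(TEXT).most_common(3))
    print(lemma_counts(TEXT).most_common(3))
```

In this toy text every surface form occurs only once, so a frequency-based ranking has no reason to prefer the word for "model"; once the forms are merged under one lemma, it becomes the most frequent term. This is the kind of linguistic resource that a hybrid approach adds to the statistical core.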