ON METHODS AND MODELS OF KEYWORD AUTOMATIC EXTRACTION

Authors

  • Svetlana Olegovna Sheremetyeva
  • Pavel Grigor'evich Osminin

Abstract

The paper presents an overview and classification of the major approaches to the automatic
extraction of keywords from text documents. These approaches can be divided into statistical and
hybrid ones, and both types can be further classified as corpus-based or document-based.
The advantages and shortcomings of particular approaches are analyzed. The paper argues that
statistical keyword extraction methods are problematic for inflecting languages such as Russian.
Requirements for an efficient model of automatic keyword extraction from Russian texts are
formulated, and specific recommendations for meeting these requirements are given. It is emphasized
that creating effective keyword extractors requires taking into account the linguistic type of the
natural language (analytic, inflecting, agglutinative, isolating), the domain (sublanguage), and the
availability of linguistic and programming resources. The approach is illustrated by a case study of a
keyword extractor for Russian texts on mathematical modeling.
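The claim that purely statistical extraction is problematic for inflecting languages can be illustrated with a toy frequency count. The sketch below is not the authors' extractor: it assumes a plain in-document term-frequency ranking, and the helpers `tokenize`, `crude_stem` and `top_terms`, the ending list, and the Russian sample sentences are all hypothetical, invented only for this illustration. `crude_stem` is a deliberately naive suffix stripper standing in for real morphological analysis.

```python
import re
from collections import Counter

# Hypothetical, deliberately crude list of Russian inflectional endings,
# tried longest first. A real extractor would use a morphological analyzer
# or lemmatizer rather than suffix stripping.
ENDINGS = sorted(
    ["ями", "ами", "ов", "ей", "ом", "ем", "ью", "ах", "ях",
     "а", "я", "ы", "и", "у", "ю", "е", "о", "ь"],
    key=len,
    reverse=True,
)


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep Cyrillic words longer than two letters."""
    return [w for w in re.findall(r"[а-яё]+", text.lower()) if len(w) > 2]


def crude_stem(word: str, min_stem: int = 3) -> str:
    """Strip the longest matching ending, keeping at least min_stem letters."""
    for ending in ENDINGS:
        if word.endswith(ending) and len(word) - len(ending) >= min_stem:
            return word[: -len(ending)]
    return word


def top_terms(text: str, n: int = 3, normalize: bool = False) -> list[tuple[str, int]]:
    """Rank terms by raw in-document frequency, optionally after stemming."""
    tokens = tokenize(text)
    if normalize:
        tokens = [crude_stem(t) for t in tokens]
    return Counter(tokens).most_common(n)


sample = (
    "Модель описывает процесс. Параметры модели задаются заранее. "
    "Моделью управляет алгоритм. В модели используется метод."
)

print(top_terms(sample, n=1))                  # → [('модели', 2)]
print(top_terms(sample, n=1, normalize=True))  # → [('модел', 4)]
```

Counting surface forms splits the four occurrences of the lemma "модель" across three word forms, so the term's weight is underestimated; collapsing the forms restores it. Naive suffix stripping also conflates unrelated words, so it is only a stand-in for the linguistic resources the abstract calls for.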

Author Biographies

  • Svetlana Olegovna Sheremetyeva

    PhD (Habilitation), Professor, Department of Linguistics and Intercultural Communication,
    South Ural State University (Chelyabinsk), linklana@yahoo.com

  • Pavel Grigor'evich Osminin

    Assistant Professor, Department of Linguistics and Intercultural Communication, South
    Ural State University (Chelyabinsk), osperevod@gmail.com