COMPARATIVE ANALYSIS OF RAG METHODS FOR BUILDING RUSSIAN-SPEAKING INTELLIGENT SERVICES

Authors

  • Andrey V. Melnikov
  • Ivan E. Nikolaev
  • Mikhail A. Rusanov
  • Valerian R. Abbazov

Abstract

The paper discusses one of the most popular current approaches to building intelligent assistants and question-answering systems on top of large language models (LLMs), namely approaches that rely on in-context learning or retrieval-augmented generation (RAG). The recent proliferation of publications on this topic is primarily English-oriented and relies on leading models such as GPT-4o and their successors. At the same time, evaluations of RAG context retrieval methods for Russian-language tasks are practically absent, which makes research aimed at adapting and evaluating these methods for the Russian language an urgent task.

Aim. To study the effectiveness of different retrieval-augmented generation (RAG) approaches for Russian-language tasks, given that most studies in this area are English-oriented and use leading models such as GPT-4.

Materials and Methods. The paper reviews three basic approaches to RAG construction: naive RAG, HyDE, and a probabilistic approach based on the BM25 ranking function. Particular attention is paid to assessing the quality of these methods in terms of the mean average precision (mAP) metric for the three knowledge domains. Combined RAG methods such as SelfRAG were excluded so that each approach could be evaluated separately. Russian-language text corpora from the following knowledge domains were selected for the experiments: the oil and gas industry and jurisprudence.

Results. The study yielded quality scores for each of the methods considered. The results agree well with data from other studies but are inferior to those reported for English-language RAG systems.

Conclusion. The results can serve as baseline evaluations and as a basis for selecting optimal RAG architectures for Russian-language tasks. Further research will focus on integrating combined methods and adapting models to improve the quality of Russian-language generation.
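To make the BM25 retrieval step and the mAP metric named in the abstract concrete, here is a minimal sketch in Python. It is an illustration only, not the authors' implementation: the function names, the toy documents, the parameter values `k1=1.5` and `b=0.75` (common defaults), and the smoothed IDF variant are all assumptions. A real Russian-language pipeline would presumably also need proper tokenization and lemmatization, which are omitted here.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document against a tokenized query (Okapi BM25).

    k1 and b are assumed default values, not taken from the paper.
    """
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed IDF variant that stays non-negative (an assumption).
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term frequency saturation with document length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def average_precision(ranked_ids, relevant_ids):
    """Average precision of one ranked result list against a relevant set."""
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / len(relevant_ids) if relevant_ids else 0.0


def mean_average_precision(runs):
    """Mean of average precision over (ranked_ids, relevant_ids) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Hypothetical toy corpus: whitespace-tokenized, lowercased documents.
docs = [
    ["oil", "well", "drilling"],
    ["court", "ruling", "appeal"],
    ["oil", "price", "contract"],
]
scores = bm25_scores(["oil", "drilling"], docs)
# Rank document indices by descending BM25 score.
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking, mean_average_precision([(ranking, {0})]))
```

The same `mean_average_precision` helper could in principle score rankings produced by any retriever (for example, dense embeddings in a naive RAG setup or HyDE-style hypothetical documents), which is what makes a metric like mAP convenient for comparing retrieval methods side by side.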

Author Biographies

  • Andrey V. Melnikov
    Dr. Sci. (Eng.), Prof., Director, Ugra Research Institute of Information Technologies, Khanty-Mansiysk, Russia
  • Ivan E. Nikolaev
    Senior Lecturer of the Department of Information Technologies and Economic Informatics, Chelyabinsk State University, Chelyabinsk, Russia
  • Mikhail A. Rusanov
    Senior Lecturer of the Engineering School of Digital Technologies, Yugra State University, Khanty-Mansiysk, Russia
  • Valerian R. Abbazov
    Lead Programmer, Ugra Research Institute of Information Technologies, Khanty-Mansiysk, Russia

Published

2025-05-20

Issue

Section

Informatics and Computer Engineering