СРАВНИТЕЛЬНЫЙ АНАЛИЗ МЕТОДОВ RAG ДЛЯ ПОСТРОЕНИЯ РУССКОЯЗЫЧНЫХ ИНТЕЛЛЕКТУАЛЬНЫХ СЕРВИСОВ

Андрей Витальевич Мельников; Иван Евгеньевич Николаев; Михаил Александрович Русанов; Валерьян Ринатович Аббазов

Authors

Andrey V. Melnikov
Ivan E. Nikolaev
Mikhail A. Rusanov
Valerian R. Abbazov

Abstract

The paper examines one of the most popular current approaches to building intelligent assistants and question-answering systems on top of large language models (LLMs): in-context learning, or retrieval-augmented generation (RAG). The recent surge of publications on this topic is primarily oriented toward English and relies on leading models such as GPT-4o and their successors. Evaluations of RAG context-retrieval methods for Russian-language tasks, by contrast, are practically absent, which makes adapting and evaluating these methods for Russian an urgent research task.

Aim. To study the effectiveness of different RAG approaches for Russian-language tasks, given that most studies in this area are English-oriented and use leading models such as GPT-4.

Materials and Methods. The paper reviews three basic approaches to building RAG: naive RAG, HyDE, and a probabilistic approach based on the BM25 ranking function. Particular attention is paid to assessing the quality of these methods with the mean average precision (mAP) metric for the three knowledge domains. Combined RAG methods such as SelfRAG were deliberately left out so that each approach could be evaluated separately. Russian-language text corpora were selected for the experiments for the following knowledge domains: the oil and gas industry and jurisprudence.

Results. The study yielded quality scores for each of the methods considered. The results agree well with data from other studies but fall short of those reported for RAG systems in English.

Conclusion. The results can serve as baseline evaluations and as a basis for selecting optimal RAG architectures for Russian-language tasks. Further research will focus on integrating combined methods and adapting models to improve the quality of Russian-language generation.
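
For readers unfamiliar with the mean average precision (mAP) metric named in the abstract, the following is a minimal sketch of how it is commonly computed for a retriever. The abstract does not state the exact variant used in the paper (for example, a rank cutoff), so the normalization by the number of relevant documents, the function names, and the toy document identifiers below are all assumptions made for illustration.

```python
from typing import Hashable, Iterable, Sequence, Set, Tuple


def average_precision(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """Average precision for one query.

    Precision is taken at every rank that holds a relevant document and the
    sum is divided by the total number of relevant documents.
    Assumes `ranked` contains no duplicate identifiers.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: Iterable[Tuple[Sequence[Hashable], Set[Hashable]]]
) -> float:
    """Mean of the per-query average precision over all queries."""
    scores = [average_precision(ranked, relevant) for ranked, relevant in runs]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical toy data: (retrieved ranking, set of relevant documents) per query.
toy_runs = [
    (["d3", "d1", "d7"], {"d1", "d7"}),  # relevant documents found at ranks 2 and 3
    (["d2", "d5"], {"d2"}),              # relevant document found at rank 1
]

if __name__ == "__main__":
    print(f"mAP = {mean_average_precision(toy_runs):.4f}")
```

The same function could in principle score naive RAG, HyDE, and BM25 retrieval alike, since it only needs each method's ranked list of retrieved passages and a set of passages judged relevant for the query.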

Author Biographies

Andrey V. Melnikov

Dr. Sci. (Eng.), Prof., Director, Ugra Research Institute of Information Technologies, Khanty-Mansiysk, Russia

Ivan E. Nikolaev

Senior Lecturer of the Department of Information Technologies and Economic Informatics, Chelyabinsk State University, Chelyabinsk, Russia

Mikhail A. Rusanov

Senior Lecturer of the Engineering School of Digital Technologies, Yugra State University, Khanty-Mansiysk, Russia

Valerian R. Abbazov

Lead Programmer, Ugra Research Institute of Information Technologies, Khanty-Mansiysk, Russia

COMPARATIVE ANALYSIS OF RAG METHODS FOR BUILDING RUSSIAN-SPEAKING INTELLIGENT SERVICES
