COMPARATIVE ANALYSIS OF RAG METHODS FOR BUILDING RUSSIAN-SPEAKING INTELLIGENT SERVICES
Abstract
The paper discusses one of the most popular current approaches to building intelligent assistants and question-answering systems on top of large language models (LLMs): in-context learning, or retrieval augmented generation (RAG). The recent surge of publications on this topic is primarily English-oriented and relies on state-of-the-art models such as GPT-4o and its successors. At the same time, evaluations of RAG context retrieval methods for Russian-language tasks are practically absent, which makes research on adapting and evaluating these methods for the Russian language an urgent task. Aim. To study the effectiveness of different retrieval augmented generation (RAG) approaches for Russian-language tasks, given that most studies in this area are English-oriented and use leading models such as GPT-4. Materials and Methods. The paper examines three basic approaches to RAG construction: naive RAG, HyDE, and a probabilistic approach based on the BM25 ranking function. Particular attention is paid to assessing the quality of these methods with the mean average precision (mAP) metric for the three knowledge domains. Combined RAG methods such as SelfRAG were excluded so that each approach could be evaluated separately. Russian-language text corpora were selected for the experiments, covering the following knowledge domains: the oil and gas industry and jurisprudence. Results. The study yielded quality scores for each of the considered methods. The results agree well with the data of other studies, but fall short of the scores reported for known English-language RAG systems. Conclusion. The obtained results can serve as baseline evaluations and as a basis for selecting optimal RAG architectures for Russian-language tasks. Further research will be aimed at integrating combined methods and adapting models to improve the quality of Russian-language generation.
Published
2025-05-20
Issue
Section
Informatics and Computer Engineering
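
The abstract names mean average precision (mAP) as the quality metric for the retrieval methods. The following is a minimal sketch of how such a metric is commonly computed, not the authors' evaluation code. The function names, the document identifiers, and the assumption that each ranked list contains no duplicate documents are all illustrative and do not come from the paper.

```python
from typing import Hashable, Iterable, Sequence, Set, Tuple


def average_precision(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """Average precision for a single query.

    `ranked` is the retriever's output in rank order (assumed duplicate-free);
    `relevant` is the set of ground-truth relevant document ids.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            # Precision at rank k, counted only at ranks holding a relevant item.
            precision_sum += hits / k
    # Normalise by all relevant documents, so missed ones lower the score.
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: Iterable[Tuple[Sequence[Hashable], Set[Hashable]]]
) -> float:
    """Mean of per-query average precision over all (ranked, relevant) pairs."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Hypothetical toy data: two queries with hand-made rankings.
example_runs = [
    (["d1", "d2", "d3", "d4"], {"d1", "d3"}),
    (["d5", "d6"], {"d6"}),
]
map_score = mean_average_precision(example_runs)
```

Under this definition, a retriever that places relevant passages higher in the list scores closer to 1, which is why the metric suits a comparison of context search methods such as naive RAG, HyDE, and BM25.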