OVERFITTING IN MACHINE LEARNING: PROBLEMS AND SOLUTIONS

Victor Aleksandrovich Parasich; Irina Vasilyevna Parasich; Georgiy Iosifovich Volovich; Sergey Gennadyevich Nekrasov; Andrey Viktorovich Parasich

Authors

Victor A. Parasich
Irina V. Parasich
Georgiy I. Volovich
Sergey G. Nekrasov
Andrey V. Parasich

Abstract

Overfitting is one of the most important factors affecting the performance of machine learning algorithms, so the ability to counter it effectively is essential when solving machine learning problems.

Research objective. The purpose of this article is to study the problem of overfitting in machine learning tasks and to discuss effective training methods aimed at preventing it.

Materials and methods. The article focuses on a range of non-standard issues related to overfitting that are important from a practical point of view. It considers the causes of overfitting, its consequences, and methods of combating it. The dependence of overfitting and generalization ability on feature quality and on the properties of the training set is studied. Particular attention is paid to the specifics of training, and of forming a training set, in high-dimensional feature spaces. The article examines how to form a training set correctly and how to add data to it correctly so as to prevent overfitting, as well as how an incorrect distribution of the target variable affects overfitting. It explains why methods that add "incorrect" (synthetically mixed) samples to the training set, such as MixUp and CutMix, can improve the quality of training. It also considers the problem of an algorithm's confidence in its predictions, including overconfidence in incorrect predictions, which is also characteristic of ChatGPT, and the problem of assessing the quality of an algorithm. It is shown why normalization can help avoid overfitting.

Results. Random Samples Mix-Up, an algorithm for training decision trees, is proposed to combat overfitting; it improves the quality of the trained decision trees. The quality of models before and after applying this method is compared, and experiments on real data confirm its effectiveness.

Conclusion. The results of the study can be useful for developing new machine learning algorithms and improving the efficiency of existing ones, both for developers of machine learning algorithms and for specialists in the field of artificial intelligence.
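For readers unfamiliar with MixUp, mentioned in the abstract, here is a minimal illustrative sketch of the standard MixUp rule, not of the authors' Random Samples Mix-Up algorithm, whose details are not given here. Each training pair is replaced by a convex combination x̃ = λx_i + (1 − λ)x_j, ỹ = λy_i + (1 − λ)y_j with λ ~ Beta(α, α), where labels are one-hot vectors so that the mixed targets become "soft". The function names `mixup_batch` and `one_hot`, the per-pair sampling of λ, and the default α = 0.2 are assumptions chosen for illustration.

```python
import random
from typing import List, Optional, Sequence, Tuple


def one_hot(label: int, num_classes: int) -> List[float]:
    """Encode a class index as a one-hot probability vector (illustrative helper)."""
    if not 0 <= label < num_classes:
        raise ValueError("label out of range")
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec


def mixup_batch(
    xs: Sequence[Sequence[float]],
    ys: Sequence[Sequence[float]],
    alpha: float = 0.2,  # assumed default, not taken from the article
    rng: Optional[random.Random] = None,
) -> Tuple[List[List[float]], List[List[float]]]:
    """Return MixUp-augmented copies of (xs, ys) (hypothetical sketch).

    Each sample i is paired with a partner j taken from a shuffled copy of
    the batch; features and one-hot labels are mixed with the same weight
    lam ~ Beta(alpha, alpha).
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if rng is None:
        rng = random.Random()

    # Random pairing: partner[i] is the index of the sample mixed into sample i.
    partner = list(range(len(xs)))
    rng.shuffle(partner)

    mixed_x: List[List[float]] = []
    mixed_y: List[List[float]] = []
    for i, j in enumerate(partner):
        lam = rng.betavariate(alpha, alpha)  # mixing weight in [0, 1]
        mixed_x.append([lam * a + (1.0 - lam) * b for a, b in zip(xs[i], xs[j])])
        # Labels are mixed with the same weight, producing soft targets.
        mixed_y.append([lam * a + (1.0 - lam) * b for a, b in zip(ys[i], ys[j])])
    return mixed_x, mixed_y


if __name__ == "__main__":
    rng = random.Random(42)
    xs = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0]]
    ys = [one_hot(0, 2), one_hot(1, 2), one_hot(0, 2), one_hot(1, 2)]
    mx, my = mixup_batch(xs, ys, alpha=0.4, rng=rng)
    for x, y in zip(mx, my):
        # Each line shows a mixed feature vector and its soft label.
        print([round(v, 3) for v in x], [round(v, 3) for v in y])
```

With α < 1 the Beta(α, α) distribution is U-shaped, so most λ values fall near 0 or 1 and most mixed samples stay close to one of the originals; larger α produces stronger mixing.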

Author Biographies

Victor A. Parasich

Cand. Sci. (Eng.), Assoc. Prof., Associate Professor of the Department of Electronic Computing Machines, South Ural State University, Chelyabinsk, Russia

Irina V. Parasich

Cand. Sci. (Eng.), Associate Professor of the Department of Mathematical and Computer Modeling, South Ural State University, Chelyabinsk, Russia

Georgiy I. Volovich

LLC Chelenergopribor, Chelyabinsk, Russia

Sergey G. Nekrasov

Dr. Sci. (Eng.), Prof. of the Department of Information and Measuring Technology, South Ural State University, Chelyabinsk, Russia

Andrey V. Parasich

Software engineer, LLC TRIDIVI, Chelyabinsk, Russia
