Optimizing stochastic gradient boosting with out-of-sample evaluation metrics

Authors: Ibragimov B.L., Gusev G.G.

Journal: Труды Московского физико-технического института @trudy-mipt

Section: Mathematics

Article in issue: 3 (63), vol. 16, 2024.

Free access

Stochastic Gradient Boosting (SGB) is a powerful ensemble learning method widely used in various machine learning applications. It introduces regularization by discarding a subset of data samples at each iteration, a technique that helps prevent overfitting. However, these out-of-sample (OOS) data points, typically left unused during model training, present an untapped opportunity to enhance the robustness of the learning process. In this paper, we propose a novel approach that leverages OOS data not only to evaluate the quality of constructed decision trees but also to perform targeted pruning and hyperparameter optimization. By assessing the correlation score between actual and predicted gradient values on OOS data, we establish a metric that effectively approximates the trees' performance on unseen test data. Our empirical studies, conducted on a collection of real-world benchmark datasets with sizes up to 100,000 samples, demonstrate the effectiveness of this method. The results indicate a consistent reduction in error rates, with improvements reaching up to 2% in log-loss compared to standard SGB implementations. These findings highlight the potential of OOS-driven pruning and hyperparameter tuning to not only enhance model accuracy but also to provide a computationally efficient pathway for regularization in gradient boosting frameworks.
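The core idea described in the abstract, scoring each fitted tree on the rows discarded by subsampling, can be illustrated with a minimal sketch. The function and parameter names below (`oos_correlation_score`, `boosting_iteration`, `subsample`) are illustrative and not taken from the paper; the paper's exact scoring formula may differ from the plain Pearson correlation used here.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def oos_correlation_score(tree, X_oos, grad_oos):
    """Pearson correlation between the tree's predictions and the actual
    gradient values on the out-of-sample rows discarded by subsampling.
    A higher score suggests the tree generalizes beyond its training subsample.
    (Illustrative metric; the paper's exact formulation may differ.)"""
    preds = tree.predict(X_oos)
    return np.corrcoef(preds, grad_oos)[0, 1]

def boosting_iteration(X, grad, subsample=0.5, rng=None, **tree_params):
    """One SGB-style iteration: fit a regression tree on a random subsample
    of the rows, then score it on the remaining (out-of-sample) rows."""
    rng = rng or np.random.default_rng(0)
    in_sample = rng.random(X.shape[0]) < subsample   # rows used for fitting
    tree = DecisionTreeRegressor(**tree_params)
    tree.fit(X[in_sample], grad[in_sample])
    score = oos_correlation_score(tree, X[~in_sample], grad[~in_sample])
    return tree, score
```

In a full boosting loop, such a score could be used in the spirit of the abstract: trees with low OOS correlation could be pruned or down-weighted, and hyperparameters such as tree depth or the subsample rate could be selected by the score rather than by a separate validation split.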


Keywords: Stochastic gradient boosting, regularization, subsampling, ensemble, machine learning

Short URL: https://sciup.org/142242984

IDR: 142242984

Scientific article