Optimization problems of subset selection in linear regression with control of model significance using the F-test

Free access

This article addresses the problem of subset selection in multiple linear regression models. When such a selection is performed using only the coefficient of determination, the resulting model may turn out to be insignificant according to the F-test. To address this, two mixed 0-1 integer linear programming problems are proposed; solution algorithms for this class of problems have been improved dozens of times over the past 20 years. Solving the first problem yields the optimal model for a prescribed number of factors, while in the second problem the optimal number of factors is determined automatically. Computational experiments were carried out. For the second problem, an example shows that as the F-test significance requirement is tightened, the number of selected factors decreases. The technique proposed in the article, based on introducing additional binary variables, can further be used to control multicollinearity in models and the significance of estimates according to Student's t-test.
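The abstract describes subset selection posed as a mixed 0-1 integer linear program. The sketch below is not the authors' formulation but a minimal illustration of the general idea under stated assumptions: a least-absolute-deviations surrogate objective (so the problem stays linear), big-M constraints linking each coefficient to a binary inclusion variable, and a cardinality bound k on the number of selected factors. The function subset_selection_milp, the parameter big_m, and the use of the open-source PuLP package are all illustrative choices, not part of the article.

```python
# Hypothetical sketch: subset selection in linear regression as a mixed 0-1
# integer linear program.  Least squares is not linear, so this surrogate
# minimizes the sum of absolute residuals (LAD); binary variables z_j switch
# regressors on and off through big-M constraints.  Requires numpy and pulp.
import numpy as np
import pulp

def subset_selection_milp(X, y, k, big_m=1e3):
    n, p = X.shape
    prob = pulp.LpProblem("subset_selection", pulp.LpMinimize)
    # regression coefficients (bounded) and 0-1 inclusion indicators
    beta = [pulp.LpVariable(f"beta_{j}", lowBound=-big_m, upBound=big_m) for j in range(p)]
    z = [pulp.LpVariable(f"z_{j}", cat=pulp.LpBinary) for j in range(p)]
    # auxiliary variables for absolute residuals |y_i - x_i' beta|
    r = [pulp.LpVariable(f"r_{i}", lowBound=0) for i in range(n)]
    prob += pulp.lpSum(r)  # objective: sum of absolute residuals
    for i in range(n):
        fit = pulp.lpSum(X[i, j] * beta[j] for j in range(p))
        prob += r[i] >= y[i] - fit
        prob += r[i] >= fit - y[i]
    for j in range(p):
        # beta_j may be nonzero only if regressor j is selected (big-M link)
        prob += beta[j] <= big_m * z[j]
        prob += beta[j] >= -big_m * z[j]
    prob += pulp.lpSum(z) <= k  # at most k regressors enter the model
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    coefs = np.array([pulp.value(b) for b in beta])
    selected = [int(pulp.value(v)) for v in z]
    return coefs, selected

# usage on synthetic data: two relevant factors out of six
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(scale=0.1, size=50)
coefs, selected = subset_selection_milp(X, y, k=2)
print(selected)  # indicators of the chosen regressors
```

The article's second problem additionally determines the number of factors and enforces F-test significance through extra binary variables; that mechanism is not reproduced here.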


Regression analysis, linear regression, ordinary least squares, subset selection, mixed 0-1 integer linear programming, coefficient of determination, F-test

Short address: https://sciup.org/148330409

IDR: 148330409   |   DOI: 10.37313/1990-5378-2024-26-6-200-207

Scientific article