Optimization problems of subset selection in linear regression with control of its significance using F-test
Author: Bazilevskiy M.P.
Journal: Известия Самарского научного центра Российской академии наук @izvestiya-ssc
Section: Informatics, Computer Engineering and Control
Article in issue: no. 6, vol. 26, 2024.
Free access
This article is devoted to the problem of subset selection in multiple linear regression models. When the selection is driven by the coefficient of determination alone, the resulting model may turn out to be insignificant according to the F-test. To address this, two mixed 0-1 integer linear programming problems are proposed, a class of problems for which solution algorithms have been improved dozens of times over the past 20 years. Solving the first problem yields an optimal model with a prescribed number of factors, while in the second problem the optimal number of factors is determined automatically. Computational experiments were carried out. For the second problem, an example shows that as the F-test significance requirement for the model is tightened, the number of selected factors decreases. The technique proposed in the article, based on introducing additional binary variables, can further be used to control multicollinearity in models and the significance of coefficient estimates according to Student's t-test.
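For orientation only (this relation is standard and is not quoted from the article): with $n$ observations and $m$ selected regressors, the F-statistic of a linear regression can be expressed through the coefficient of determination $R^2$, so a significance requirement $F \ge F_{\mathrm{crit}}$, where $F_{\mathrm{crit}}$ is a chosen threshold, is equivalent to a lower bound on $R^2$ that grows with the number of factors:

$$
F=\frac{R^{2}/m}{(1-R^{2})/(n-m-1)}\;\ge\;F_{\mathrm{crit}}
\quad\Longleftrightarrow\quad
R^{2}\;\ge\;\frac{m\,F_{\mathrm{crit}}}{\,n-m-1+m\,F_{\mathrm{crit}}\,}.
$$

For fixed $n$, the right-hand bound increases both in $m$ and in $F_{\mathrm{crit}}$, which is consistent with the abstract's observation that tightening the F-test requirement pushes the selection toward fewer factors.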
Regression analysis, linear regression, ordinary least squares, subset selection, mixed 0-1 integer linear programming, coefficient of determination, F-test
Short address: https://sciup.org/148330409
IDR: 148330409 | DOI: 10.37313/1990-5378-2024-26-6-200-207