An Efficient Generalized Ridge Estimator for the Logistic Regression Model
Author: Ahmed Mutlag Algburi
Journal: Informatics. Economics. Management
Section: Informatics and computer engineering
Issue: 5 (1), 2026
Open access
In the logistic regression model (LRM), the unknown parameters are traditionally estimated by the method of maximum likelihood (MLE). However, in the presence of substantial multicollinearity among the explanatory variables, the MLE parameter estimates become unstable and have large variances, leading to wide confidence intervals and reduced statistical power of tests. To overcome these drawbacks, this paper considers a generalized ridge estimator for logistic regression (GRL), based on introducing a matrix of ridge parameters K that controls the degree of bias and reduces the variance of the estimated regression coefficients. The parameters of the GRL model are estimated using the maximum likelihood procedure, after which the efficiency of MLE and GRL is compared under various multicollinearity scenarios by Monte Carlo simulation. The numerical experiment analyzes a number of recently proposed methods for choosing the ridge parameter k and assesses their effect on the mean squared error (MSE) of the coefficient estimates. The simulation results demonstrate that the generalized ridge logistic regression estimator achieves lower MSE values than the classical MLE in all considered configurations of correlation between variables and noise levels, confirming its practical suitability for classification and prediction tasks under multicollinearity.
Logistic regression, ridge estimation, generalized ridge estimation, multicollinearity, Monte Carlo simulation.
Short address: https://sciup.org/14135108
IDR: 14135108 | DOI: 10.47813/2782-5280-2026-5-1-1033-1041
Logistic regression is a common method for modeling binary data in health sciences and biostatistics, and it is frequently used in classification and predictive analytics. The logistic model measures the probability of an event, such as presence/absence or success/failure, based on a given set of independent variables. Frisch (1934) first discussed the problem of multicollinearity, stating that any two correlated variables together generate a multicollinearity problem. It occurs when the independent variables in a multiple linear regression are correlated, making it difficult to obtain definitive answers to the research questions because the variances of the estimates are too high or the t-values too low; this state is known as the multicollinearity problem [1]. Because the independent variables may be correlated, the ridge regression method can be combined with logistic regression models. For further details on logistic regression and the ridge regression method, we refer the reader to [1-5], among others.
In many regression applications there is correlation between the explanatory variables. When these correlations are high, they lead to unstable estimates of the regression coefficients, making the estimates difficult to interpret. Under multicollinearity it becomes difficult to estimate the individual effect of each explanatory variable within the model, and the variability of the regression coefficients affects both inference and prediction. Several methods have been proposed to address multicollinearity [6]. The MLE is the method most commonly used to estimate the unknown coefficients of the LRM. One assumption of multiple regression models is that the explanatory variables are independent and uncorrelated, and MLE performs better in that case [7]. In practice, however, linear relationships between explanatory variables do arise. This multicollinearity problem, introduced by [8], has several disadvantages for parameter estimation by MLE. One issue is that parameter estimates often have large variances, making reliable results difficult to obtain; MLE can also produce unstable coefficient estimates. There is also the problem of wide confidence intervals and low statistical power, which increases the probability of type II errors in hypothesis tests on the regression coefficients.
Several methods exist for addressing the problem of multicollinearity. One of the most common is ridge regression, developed by [9]. Studies of the linear regression (LR) model have sought the best shrinkage value of the ridge coefficient k; see, for example, [9-11] and many others. [12] developed the ridge regression estimator for the generalized linear model (GLM), and, extending the idea of [12], many researchers have proposed ridge regression approaches for other models, for example [1, 4, 13].
The ridge method, first introduced by [9], addresses this problem by adding a ridge parameter, denoted k, which reduces the variance of the coefficient estimates at the cost of some bias. Researchers have shown that there exists a non-zero value of k for which the MSE of the ridge regression coefficient estimates is smaller than the variance of the maximum likelihood (MLE) estimates; among these are [11], [14-23]. Several methods for generalized ridge regression have also been proposed, among them [24-27].
The purpose of this article is to study several ridge parameters for generalized ridge logistic regression (GRL), estimated by the MLE method under conditions of high correlation between the explanatory variables. The article is organized as follows. Section 1 explains the model under analysis and formally defines several parameters for logistic ridge regression. Section 2 presents the generalized ridge estimator (GRE). Section 3 describes the simulation experiment, including the factors that can influence the sample characteristics for the proposed parameters. Section 4 reports the results for the different coefficients in terms of MSE. The conclusions of the article are presented in Section 5.
MATERIALS AND METHODS
Logistic Ridge Regression Model (LRRM)
In this section we introduce logistic regression, first proposed by [2], and discuss some of the parameters that researchers have used in ridge estimators [18, 21, 22].
Logistic regression analysis is a commonly applied statistical method when the dependent variable y_i in the regression model is Bernoulli, y_i ~ Be(p_i), with

p_i = exp(x_i^T β) / (1 + exp(x_i^T β)), (1)

where β is a (k+1)×1 vector of coefficients and x_i is row i of the data matrix X of size n×(k+1). The most common method of estimation is the MLE, for which the following log-likelihood function must be maximized:
l(X; β) = y^T log(p) + (1 − y)^T log(1 − p). (2)
Setting the first derivative to zero, the MLE solves the equation

∂l/∂β = X^T (y − p) = 0. (3)
The equations resulting from the first derivative are non-linear and have no closed-form solution. They are therefore solved by numerical methods, the most common of which is the Newton-Raphson algorithm; using the iteratively weighted least squares (IWLS) algorithm, the solution of Eq. (3) is obtained as:
β̂_ML = (X^T W X)^{-1} (X^T W z), (4)
where W = diag[p̂_i(1 − p̂_i)] and z is the vector whose element i equals z_i = log(p̂_i / (1 − p̂_i)) + (y_i − p̂_i) / (p̂_i(1 − p̂_i)). (5)
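The IWLS update of Eqs. (4)-(5) can be sketched as follows. This is a minimal illustration in Python/NumPy, not the authors' code; the function and variable names are ours.

```python
import numpy as np

def logistic_mle_irls(X, y, n_iter=100, tol=1e-10):
    """Maximum likelihood for logistic regression via IWLS, Eqs. (4)-(5)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))        # success probabilities, Eq. (1)
        W = p * (1.0 - p)                          # diagonal of the weight matrix W
        z = X @ beta + (y - p) / W                 # working response z_i, Eq. (5)
        beta_new = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * z))
        if np.max(np.abs(beta_new - beta)) < tol:  # stop when the update stalls
            return beta_new
        beta = beta_new
    return beta
```

At convergence the score equation (3), X^T(y − p) = 0, is satisfied up to numerical tolerance.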
The asymptotic covariance matrix of the MLE equals the inverse of the matrix of second derivatives (the inverse Hessian):

Cov(β̂_ML) = [E(−∂²l/∂β ∂β^T)]^{-1} = (X^T W X)^{-1}, (6)
and the asymptotic MSE is

E(L_ML) = E((β̂_ML − β)^T (β̂_ML − β)) = tr[(X^T W X)^{-1}] = Σ_j 1/λ_j. (7)
Here λ_j denotes the j-th eigenvalue of the matrix X^T W X. One drawback of maximum likelihood estimation is that the variance becomes large when there is strong correlation between the independent variables, because some eigenvalues will then be small. The ridge estimator of ridge regression can be extended directly to logistic ridge regression [1, 2] as follows:
β̂_LRR = (X^T W X + kI)^{-1} (X^T W X β̂_ML) = Z β̂_ML, (8)
where W and β̂_ML are the ML estimates derived from Eq. (4) and Z = (X^T W X + kI)^{-1} X^T W X. The mean squared error of the logistic ridge estimator is:
E(L_LRR) = E((β̂_LRR − β)^T (β̂_LRR − β))
= E[(β̂_ML − β)^T Z^T Z (β̂_ML − β)] + (Zβ − β)^T (Zβ − β)
= tr[(X^T W X)^{-1} Z^T Z] + k² β^T (X^T W X + kI)^{-2} β
= Σ_j λ_j / (λ_j + k)² + k² β^T (X^T W X + kI)^{-2} β, (9)

where k > 0. The estimator of Eq. (8) with k = 0 is simply the ML estimator.
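Given the ML fit, the shrinkage step of Eq. (8) requires only one extra linear solve. A minimal sketch (names are ours, not from the paper):

```python
import numpy as np

def logistic_ridge(X, beta_ml, p_hat, k):
    """Logistic ridge estimator of Eq. (8): (X'WX + kI)^{-1} X'WX beta_ML."""
    W = p_hat * (1.0 - p_hat)                  # diagonal of W from the ML fit
    XtWX = X.T @ (W[:, None] * X)
    return np.linalg.solve(XtWX + k * np.eye(X.shape[1]), XtWX @ beta_ml)

# With k = 0 the estimator reduces to the MLE; with k > 0 it shrinks the
# coefficient vector toward zero, trading bias for variance.
```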
Generalized Ridge Estimator
The generalized ridge estimator (GRE) differs from the ridge regression (RR) model in that it allows a separate value k_i for each coefficient:
β̂_GRE = (X^T X + K)^{-1} X^T y, (10)
where K = diag(k_1, k_2, ..., k_s). Finding optimal values of the k_i is worthwhile when using the GRE, because the resulting MSE is better than that of the ordinary ridge estimator and the MLE.
The generalized ridge definition for the logistic regression model (LGRR) is:
β̂_LGRR = (X^T W X + K)^{-1} (X^T W X β̂_ML) = Z β̂_ML. (11)
The selection of the matrix K must be considered carefully. Several approaches to estimating K are adapted in this study; they are listed below, in order.
The classical Hoerl-Kennard rule sets

k̂_i(HK) = σ̂² / α̂_i², i = 1, 2, ..., s, (12)

where α̂_i denotes element i of γ^T β̂_LRR, γ is the matrix of eigenvectors of X^T W X, and the dispersion coefficient σ̂² is estimated by σ̂² = Σ_{i=1}^n (y_i − p̂_i)² / (n − s).
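The generalized ridge step of Eq. (11) with the Hoerl-Kennard-type rule of Eq. (12) can be sketched as follows. This is a minimal illustration under one assumption of ours: the diagonal k_i are defined per eigendirection of X^T W X and rotated back to the original coordinates to form the matrix K used in Eq. (11). All names are ours.

```python
import numpy as np

def lgrr_hk(X, y, beta_ml, p_hat):
    """Generalized ridge logistic estimator, Eqs. (11)-(12).

    k_i is the Hoerl-Kennard ratio sigma^2 / alpha_i^2 per eigendirection of
    X'WX; K is rotated back to the original coordinates before the solve
    (our reading of Eq. (11), where K is a full ridge-parameter matrix).
    """
    n, s = X.shape
    W = p_hat * (1.0 - p_hat)
    XtWX = X.T @ (W[:, None] * X)
    lam, gamma = np.linalg.eigh(XtWX)          # spectrum of X'WX
    alpha = gamma.T @ beta_ml                  # coefficients in the canonical frame
    sigma2 = np.sum((y - p_hat) ** 2) / (n - s)
    k = sigma2 / alpha ** 2                    # Eq. (12)
    K = gamma @ np.diag(k) @ gamma.T           # ridge-parameter matrix
    return np.linalg.solve(XtWX + K, XtWX @ beta_ml)
```

Algebraically this shrinks each canonical coefficient α̂_i by the factor λ_i / (λ_i + k_i).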
In this study we use several approaches, modified from [27], to estimate K: specifically, the eleven estimators presented in Eqs. (74)-(84) of that paper, denoted k̂_1, ..., k̂_11 here (Eqs. (13)-(23)). Each of them is a variant of the Hoerl-Kennard ratio of Eq. (12), built from 2σ̂², the eigenvalues of X^T W X (typically λ_max), and a summary of the α̂_i² (their maximum, median, sum, or geometric mean). In the tables below these estimators are labeled GRR_Lukman1 through GRR_Lukman11.
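The simulation design behind the tables that follow can be sketched as below. This is a minimal Monte Carlo illustration under assumptions of ours: regressors generated as x_ij = sqrt(1 − ρ²) z_ij + ρ z_{i,s+1} (a common scheme in the ridge literature, giving pairwise correlation ρ²), true coefficients normalized to unit length, no intercept, and the Hoerl-Kennard rule of Eq. (12) standing in for the k̂_1-k̂_11 candidates. All function names are ours.

```python
import numpy as np

rng = np.random.default_rng(1)

def fit_mle(X, y, n_iter=50):
    """IWLS solution of Eq. (4)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = np.clip(p * (1.0 - p), 1e-10, None)   # guard against underflow
        z = X @ beta + (y - p) / W
        beta = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * z))
    return beta, 1.0 / (1.0 + np.exp(-X @ beta))

def simulate(n=200, s=4, rho=0.95, reps=100):
    """Average MSE of MLE vs. generalized ridge (HK rule) over `reps` replicates."""
    beta = np.ones(s) / np.sqrt(s)                # true coefficients, ||beta|| = 1
    mse_ml = mse_gr = 0.0
    for _ in range(reps):
        Z = rng.standard_normal((n, s + 1))
        X = np.sqrt(1 - rho**2) * Z[:, :s] + rho * Z[:, [s]]  # collinear regressors
        y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ beta))).astype(float)
        b_ml, p_hat = fit_mle(X, y)
        W = p_hat * (1.0 - p_hat)
        XtWX = X.T @ (W[:, None] * X)
        lam, gamma = np.linalg.eigh(XtWX)
        alpha = gamma.T @ b_ml
        sigma2 = np.sum((y - p_hat) ** 2) / (n - s)
        b_gr = gamma @ (lam / (lam + sigma2 / alpha**2) * alpha)  # Eqs. (11)-(12)
        mse_ml += np.sum((b_ml - beta) ** 2)
        mse_gr += np.sum((b_gr - beta) ** 2)
    return mse_ml / reps, mse_gr / reps
```

Under strong collinearity the generalized ridge MSE is typically several times smaller than the MLE MSE, mirroring the pattern in Tables 1-9.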
RESULTS
Table 1. Average MSE values when n = 100 and β0 = 0.

| Method | p=5, ρ=0.85 | p=5, ρ=0.95 | p=5, ρ=0.99 | p=10, ρ=0.85 | p=10, ρ=0.95 | p=10, ρ=0.99 |
|---|---|---|---|---|---|---|
| MLE | 2.5640 | 8.0337 | 40.4974 | 11.1561 | 44.3506 | 210.4606 |
| GRR_Lukman1 | 2.0123 | 5.6261 | 21.7362 | 8.1675 | 28.3941 | 109.3064 |
| GRR_Lukman2 | 1.0306 | 2.1555 | 7.0780 | 1.3152 | 2.4829 | 8.1120 |
| GRR_Lukman3 | 1.4339 | 3.3882 | 11.0671 | 3.1056 | 8.0597 | 29.4210 |
| GRR_Lukman4 | 2.0730 | 6.3484 | 29.2969 | 8.2033 | 32.7290 | 150.6817 |
| GRR_Lukman5 | 1.4426 | 4.6235 | 27.7736 | 1.6170 | 12.5240 | 159.1039 |
| GRR_Lukman6 | 1.7778 | 4.6801 | 18.5824 | 5.3897 | 16.3831 | 67.9481 |
| GRR_Lukman7 | 1.1269 | 2.2643 | 7.1078 | 3.7734 | 8.8272 | 28.1793 |
| GRR_Lukman8 | 2.0247 | 5.1535 | 15.6500 | 7.3830 | 21.2948 | 59.5732 |
| GRR_Lukman9 | 0.7515 | 1.1128 | 1.9983 | 1.2124 | 2.1293 | 3.3574 |
| GRR_Lukman10 | 1.4345 | 3.2204 | 10.7390 | 4.7142 | 13.1398 | 39.8583 |
| GRR_Lukman11 | 1.1615 | 2.2072 | 6.2601 | 2.7405 | 5.3214 | 13.6279 |
Table 2. Average MSE values when n = 200 and β0 = 0.

| Method | p=5, ρ=0.85 | p=5, ρ=0.95 | p=5, ρ=0.99 | p=10, ρ=0.85 | p=10, ρ=0.95 | p=10, ρ=0.99 |
|---|---|---|---|---|---|---|
| MLE | 1.0442 | 3.3817 | 16.5013 | 3.3025 | 11.0761 | 59.2111 |
| GRR_Lukman1 | 0.9321 | 2.7027 | 10.7266 | 2.8679 | 8.9635 | 41.6266 |
| GRR_Lukman2 | 0.7296 | 1.4919 | 4.7454 | 0.9928 | 2.0051 | 7.0140 |
| GRR_Lukman3 | 0.8855 | 2.0516 | 7.4390 | 1.9074 | 4.5544 | 19.4522 |
| GRR_Lukman4 | 0.9627 | 2.9010 | 13.3834 | 2.8315 | 9.2649 | 47.9272 |
| GRR_Lukman5 | 0.8856 | 2.1618 | 11.4463 | 1.0028 | 2.3699 | 38.5917 |
| GRR_Lukman6 | 0.9024 | 2.5009 | 9.8090 | 2.2084 | 6.8076 | 32.6604 |
| GRR_Lukman7 | 0.6180 | 1.3725 | 4.1141 | 1.4811 | 3.5308 | 12.7774 |
| GRR_Lukman8 | 0.9765 | 2.7784 | 9.4083 | 2.8747 | 8.4569 | 32.9171 |
| GRR_Lukman9 | 0.6009 | 0.8848 | 1.5433 | 0.9010 | 1.5181 | 2.7522 |
| GRR_Lukman10 | 0.7713 | 1.7973 | 5.3324 | 1.9129 | 4.9953 | 19.2595 |
| GRR_Lukman11 | 0.7240 | 1.4570 | 3.3814 | 1.3110 | 2.9221 | 9.7536 |
Table 3. Average MSE values when n = 300 and β0 = 0.

| Method | p=5, ρ=0.85 | p=5, ρ=0.95 | p=5, ρ=0.99 | p=10, ρ=0.85 | p=10, ρ=0.95 | p=10, ρ=0.99 |
|---|---|---|---|---|---|---|
| MLE | 0.7026 | 2.1128 | 10.9040 | 1.9295 | 6.4616 | 33.7800 |
| GRR_Lukman1 | 0.6486 | 1.8067 | 8.0236 | 1.7516 | 5.4456 | 26.1889 |
| GRR_Lukman2 | 0.6068 | 1.1835 | 4.1618 | 0.8440 | 1.7057 | 5.4734 |
| GRR_Lukman3 | 0.6694 | 1.5601 | 6.1269 | 1.4356 | 3.4610 | 14.0352 |
| GRR_Lukman4 | 0.6622 | 1.9106 | 9.4135 | 1.7402 | 5.5250 | 28.9724 |
| GRR_Lukman5 | 0.7314 | 1.5641 | 7.9563 | 0.9479 | 1.4907 | 18.8884 |
| GRR_Lukman6 | 0.6414 | 1.7482 | 7.6224 | 1.4887 | 4.4498 | 21.8357 |
| GRR_Lukman7 | 0.5835 | 1.0167 | 3.2071 | 1.0468 | 2.4077 | 8.5328 |
| GRR_Lukman8 | 0.6735 | 1.9004 | 7.6975 | 1.7914 | 5.3346 | 23.0749 |
| GRR_Lukman9 | 0.5482 | 0.8337 | 1.3530 | 0.8208 | 1.3343 | 2.5168 |
| GRR_Lukman10 | 0.5665 | 1.3308 | 4.4193 | 1.3282 | 3.3835 | 13.0538 |
| GRR_Lukman11 | 0.5693 | 1.1555 | 3.1013 | 1.0628 | 2.2775 | 6.7911 |
Table 4. Average MSE values when n = 100 and β0 = 1.

| Method | p=5, ρ=0.85 | p=5, ρ=0.95 | p=5, ρ=0.99 | p=10, ρ=0.85 | p=10, ρ=0.95 | p=10, ρ=0.99 |
|---|---|---|---|---|---|---|
| MLE | 2.8853 | 9.0213 | 39.3057 | 7.5031 | 23.3707 | 38.3376 |
| GRR_Lukman1 | 2.4003 | 6.6659 | 28.2197 | 6.1558 | 17.3545 | 19.5704 |
| GRR_Lukman2 | 1.4377 | 3.3739 | 12.0741 | 2.2103 | 4.4228 | 6.6966 |
| GRR_Lukman3 | 1.9670 | 5.0477 | 19.8731 | 4.1927 | 9.9221 | 11.5442 |
| GRR_Lukman4 | 2.5588 | 7.7533 | 28.9280 | 6.5001 | 19.6348 | 25.6908 |
| GRR_Lukman5 | 2.0190 | 6.4329 | 28.2873 | 3.5663 | 13.5733 | 17.2433 |
| GRR_Lukman6 | 2.3001 | 6.2631 | 28.0022 | 5.2918 | 14.4088 | 19.6552 |
| GRR_Lukman7 | 1.6270 | 3.5627 | 10.8803 | 3.6640 | 7.6278 | 8.9798 |
| GRR_Lukman8 | 2.5380 | 6.7096 | 22.9903 | 6.3420 | 16.2860 | 17.4482 |
| GRR_Lukman9 | 1.0476 | 1.6246 | 3.5449 | 1.7779 | 2.5738 | 3.8658 |
| GRR_Lukman10 | 1.8426 | 4.2598 | 15.7734 | 4.3020 | 10.2747 | 11.8678 |
| GRR_Lukman11 | 1.4828 | 2.9591 | 9.2960 | 2.9339 | 5.8681 | 8.7304 |
Table 5. Average MSE values when n = 200 and β0 = 1.

| Method | p=5, ρ=0.85 | p=5, ρ=0.95 | p=5, ρ=0.99 | p=10, ρ=0.85 | p=10, ρ=0.95 | p=10, ρ=0.99 |
|---|---|---|---|---|---|---|
| MLE | 1.1343 | 3.5181 | 17.9053 | 3.6015 | 12.1673 | 62.9317 |
| GRR_Lukman1 | 1.0455 | 2.9787 | 12.9131 | 3.2352 | 10.3182 | 47.3147 |
| GRR_Lukman2 | 0.8379 | 1.9236 | 7.3975 | 1.2874 | 2.8962 | 11.0290 |
| GRR_Lukman3 | 1.0018 | 2.6092 | 10.7833 | 2.3911 | 6.8679 | 27.8553 |
| GRR_Lukman4 | 1.0831 | 3.2344 | 15.9179 | 3.2767 | 10.9151 | 54.7164 |
| GRR_Lukman5 | 0.9942 | 2.7171 | 14.5964 | 1.1480 | 4.4925 | 47.5701 |
| GRR_Lukman6 | 1.0323 | 2.9789 | 13.2040 | 2.8057 | 8.8001 | 41.3562 |
| GRR_Lukman7 | 0.8180 | 1.8122 | 5.9311 | 1.9173 | 4.6245 | 16.5334 |
| GRR_Lukman8 | 1.0995 | 3.1639 | 12.5449 | 3.3397 | 10.2538 | 40.7839 |
| GRR_Lukman9 | 0.7700 | 1.1871 | 2.1919 | 1.2827 | 2.0791 | 3.6526 |
| GRR_Lukman10 | 0.8953 | 2.1650 | 7.7458 | 2.3931 | 6.4305 | 23.8907 |
| GRR_Lukman11 | 0.8112 | 1.6878 | 5.2282 | 1.7178 | 3.9754 | 11.6601 |
Table 6. Average MSE values when n = 300 and β0 = 1.

| Method | p=5, ρ=0.85 | p=5, ρ=0.95 | p=5, ρ=0.99 | p=10, ρ=0.85 | p=10, ρ=0.95 | p=10, ρ=0.99 |
|---|---|---|---|---|---|---|
| MLE | 0.7207 | 2.1623 | 10.9962 | 2.0544 | 7.0762 | 34.4338 |
| GRR_Lukman1 | 0.6837 | 1.9178 | 8.6506 | 1.9137 | 6.2860 | 27.8912 |
| GRR_Lukman2 | 0.6490 | 1.3828 | 5.4834 | 1.2596 | 2.2104 | 8.4649 |
| GRR_Lukman3 | 0.7137 | 1.7440 | 7.6352 | 1.6483 | 4.5553 | 19.6154 |
| GRR_Lukman4 | 0.6984 | 2.0469 | 10.0677 | 1.9301 | 6.5065 | 31.3165 |
| GRR_Lukman5 | 0.7794 | 1.7665 | 8.9426 | 0.9451 | 2.3031 | 24.0508 |
| GRR_Lukman6 | 0.6786 | 1.9157 | 8.8958 | 1.7323 | 5.5408 | 25.8418 |
| GRR_Lukman7 | 0.5699 | 1.2850 | 4.2769 | 1.2475 | 3.0282 | 11.2455 |
| GRR_Lukman8 | 0.7071 | 2.0398 | 8.9884 | 1.9753 | 6.4112 | 26.4480 |
| GRR_Lukman9 | 0.5107 | 1.0233 | 1.8671 | 1.1845 | 1.7906 | 3.0205 |
| GRR_Lukman10 | 0.6188 | 1.4839 | 5.5180 | 1.5486 | 4.2778 | 15.3369 |
| GRR_Lukman11 | 0.6091 | 1.2434 | 3.8175 | 1.2469 | 2.8739 | 8.3472 |
Table 7. Average MSE values when n = 100 and β0 = -1.

| Method | p=5, ρ=0.85 | p=5, ρ=0.95 | p=5, ρ=0.99 | p=10, ρ=0.85 | p=10, ρ=0.95 | p=10, ρ=0.99 |
|---|---|---|---|---|---|---|
| MLE | 2.9467 | 8.3180 | 42.4476 | 6.9314 | 23.8698 | 29.5804 |
| GRR_Lukman1 | 2.2975 | 5.4689 | 21.3817 | 5.4325 | 16.3420 | 19.1137 |
| GRR_Lukman2 | 1.2327 | 2.4029 | 7.3129 | 1.6253 | 3.1657 | 11.1405 |
| GRR_Lukman3 | 1.6177 | 3.2421 | 11.7059 | 3.0771 | 7.3726 | 27.3403 |
| GRR_Lukman4 | 2.3394 | 5.9420 | 28.9126 | 5.4588 | 17.8676 | 19.4914 |
| GRR_Lukman5 | 1.7009 | 4.5919 | 29.9182 | 2.5120 | 12.2020 | 13.8735 |
| GRR_Lukman6 | 2.0291 | 4.5375 | 18.8591 | 4.2049 | 12.0259 | 15.8053 |
| GRR_Lukman7 | 1.1560 | 2.2522 | 6.5854 | 2.5469 | 5.7325 | 9.0577 |
| GRR_Lukman8 | 2.2976 | 4.9186 | 14.5954 | 5.2753 | 13.7368 | 14.3222 |
| GRR_Lukman9 | 0.8749 | 1.1090 | 2.1244 | 1.4134 | 2.0125 | 3.7456 |
| GRR_Lukman10 | 1.6501 | 3.1216 | 9.2303 | 3.4325 | 8.1922 | 10.5109 |
| GRR_Lukman11 | 1.3906 | 2.3528 | 5.5576 | 2.4372 | 4.7003 | 12.6191 |
Table 8. Average MSE values when n = 200 and β0 = -1.

| Method | p=5, ρ=0.85 | p=5, ρ=0.95 | p=5, ρ=0.99 | p=10, ρ=0.85 | p=10, ρ=0.95 | p=10, ρ=0.99 |
|---|---|---|---|---|---|---|
| MLE | 1.2170 | 3.4007 | 19.1084 | 3.4705 | 11.5113 | 62.1224 |
| GRR_Lukman1 | 1.0572 | 2.4911 | 11.9543 | 3.0123 | 9.0812 | 42.3467 |
| GRR_Lukman2 | 0.8354 | 1.4736 | 5.0611 | 1.1765 | 2.3169 | 7.3114 |
| GRR_Lukman3 | 0.9406 | 1.8121 | 7.3855 | 2.0518 | 4.9306 | 19.3170 |
| GRR_Lukman4 | 1.0760 | 2.6376 | 14.6526 | 2.9965 | 9.3444 | 49.2212 |
| GRR_Lukman5 | 0.9304 | 1.9319 | 13.0337 | 1.0732 | 3.1636 | 42.8344 |
| GRR_Lukman6 | 1.0004 | 2.2707 | 10.5528 | 2.4440 | 7.1221 | 32.7479 |
| GRR_Lukman7 | 0.6580 | 1.2425 | 3.9162 | 1.4477 | 3.5208 | 11.7012 |
| GRR_Lukman8 | 1.0902 | 2.4790 | 9.7868 | 3.0087 | 8.4983 | 31.8784 |
| GRR_Lukman9 | 0.6815 | 0.8013 | 1.4840 | 1.0468 | 1.5884 | 2.8113 |
| GRR_Lukman10 | 0.8547 | 1.5684 | 5.5986 | 2.0808 | 4.7607 | 17.3599 |
| GRR_Lukman11 | 0.8371 | 1.4306 | 3.7911 | 1.5613 | 2.9232 | 9.0367 |
Table 9. Average MSE values when n = 300 and β0 = -1.

| Method | p=5, ρ=0.85 | p=5, ρ=0.95 | p=5, ρ=0.99 | p=10, ρ=0.85 | p=10, ρ=0.95 | p=10, ρ=0.99 |
|---|---|---|---|---|---|---|
| MLE | 0.7154 | 2.2026 | 11.1116 | 2.0765 | 6.7260 | 34.4314 |
| GRR_Lukman1 | 0.6542 | 1.8230 | 7.5016 | 1.8686 | 5.6814 | 25.6538 |
| GRR_Lukman2 | 0.6581 | 1.2301 | 3.6960 | 0.9910 | 1.8927 | 5.6851 |
| GRR_Lukman3 | 0.6899 | 1.5461 | 5.4742 | 1.5366 | 3.6920 | 13.3017 |
| GRR_Lukman4 | 0.6666 | 1.9237 | 8.9121 | 1.8533 | 5.7939 | 28.3836 |
| GRR_Lukman5 | 0.7541 | 1.5646 | 7.6838 | 0.9905 | 1.7285 | 20.5645 |
| GRR_Lukman6 | 0.6562 | 1.6968 | 6.8684 | 1.5623 | 4.6231 | 21.4535 |
| GRR_Lukman7 | 0.6435 | 0.9295 | 2.6926 | 1.0122 | 2.2516 | 7.7297 |
| GRR_Lukman8 | 0.6760 | 1.8802 | 6.7105 | 1.8957 | 5.5830 | 22.1709 |
| GRR_Lukman9 | 0.5874 | 0.8249 | 1.1946 | 0.9870 | 1.4071 | 2.2339 |
| GRR_Lukman10 | 0.6863 | 1.2994 | 3.6031 | 1.3793 | 3.3887 | 12.2619 |
| GRR_Lukman11 | 0.6371 | 1.2430 | 2.6854 | 1.1468 | 2.3586 | 7.0557 |
DISCUSSION
The simulation is based on a logistic regression model (LRM) incorporating the estimators described above under the multicollinearity problem. The results are summarized in Tables 1-9. The MSE criterion was evaluated across several factors: the number of explanatory variables p, the sample size n, and the correlation ρ. The following conclusions were obtained.

As Tables 1-9 show, the estimator GRR_Lukman9 attains the lowest MSE and performs better than the competing estimators under almost all conditions, while the MLE exhibits the worst performance, since it is the estimator most affected by the multicollinearity problem. Furthermore, increasing the correlation coefficient raises the MSE of all estimators when n and p are held constant; this is particularly pronounced when the correlation coefficient ρ is 0.99. Similarly, increasing the number of explanatory variables p increases the MSE of all estimators.
Performance of ρ
As the correlation increases, the estimators are in general negatively affected, while the estimators k̂_1, ..., k̂_11 proposed by [27] are only slightly affected; in fact, the mean squared error (MSE) values of these estimators sometimes decrease as the correlation increases. Thus, the usefulness of ridge logistic regression grows as the correlation grows. Overall, the best choice is the k̂_9 estimator at correlations of 0.85, 0.95, and 0.99.
Performance of n
Increasing the sample size n while the correlation coefficient ρ and the number of explanatory variables p remain constant decreases the MSE, indicating that a larger sample size has a positive effect on the performance of all estimators. In particular, the MSE of the GRR_Lukman9 estimator decreases relative to the other estimators, suggesting that sufficiently large sample sizes lead to stable estimates.
Performance of the number of explanatory variables (p)
Increasing the number of explanatory variables also changes the effective size of the problem, making direct comparisons of MSE values difficult. Nevertheless, increasing the number of explanatory variables leads to higher MSE values. Furthermore, the advantage of applying ridge logistic regression grows with the number of explanatory variables, because MLE outperforms the ridge estimators less frequently. In cases with 10 independent variables, ML estimation is never superior to the ridge estimators except in very large or weakly correlated samples.
CONCLUSION
The simulation results indicate that four important factors influence the behavior of the estimators: the number of observations n, the number of explanatory variables p, the correlation ρ between variables, and the value of the intercept β0. In most cases the mean squared error decreases as n increases and increases as the other factors increase. The conclusion of this article is therefore that the maximum likelihood estimator should not be used when there is a high degree of correlation between the explanatory variables, because it leads to a high mean squared error. Ridge logistic regression should be preferred: all ridge parameter estimators provide some reduction in the mean squared error, and some are markedly better than others. The optimal choice is k̂_9 at both low and high correlation; it substantially reduces the variance in all the cases investigated in this article.