KL-triggered Continual Adaptation for Nonstationary Resource Allocation: An Off-policy Actor–critic Approach with Nash Social Welfare

Yih-Chang Chen

doi:10.5815/ijisa.2026.03.01

Journal: International Journal of Intelligent Systems and Applications @ijisa

Article in issue: 3, vol. 18, 2026.

Free access

This paper proposes a drift-aware off-policy deterministic actor–critic framework for constrained continuous resource allocation in non-stationary environments. Feasible allocations are ensured by a simplex-parameterized policy using softmax normalization with budget scaling, avoiding projection or Lagrangian tuning. The reward integrates Nash social welfare via mean log-utility, efficiency, fairness, and constraint-violation penalties with adaptive weights. To improve sample efficiency, we adopt prioritized experience replay based on TD error and state novelty. Non-stationarity is detected by KL divergence between recent and historical state-visitation distributions; detected drift triggers buffer refresh and incremental fine-tuning, while Elastic Weight Consolidation mitigates catastrophic forgetting. Experiments across six application-motivated domains (food, medical, housing, education services, employment support, and elderly care) demonstrate improved utilization and welfare with reduced inequality and low decision latency compared with optimization, heuristic, and DRL baselines. Results are reported over multiple runs with mean ± standard deviation and corrected significance tests.
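Three of the mechanisms named in the abstract can be illustrated with a minimal sketch: the softmax policy head with budget scaling, the mean log-utility form of Nash social welfare, and a KL-based drift check. This is not the paper's implementation. The function names, the smoothing constant `alpha`, the guard `eps`, the threshold `DRIFT_THRESHOLD`, the assumption that states are already binned to integers, the choice of KL direction (recent relative to historical), and the use of allocations as stand-in utilities are all assumptions made for illustration.

```python
import math
from collections import Counter


def softmax_allocation(logits, budget):
    """Map unconstrained actor outputs to a feasible allocation.

    Softmax places the shares on the probability simplex; scaling by the
    budget makes every allocation positive and the total equal to the budget.
    """
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [budget * e / total for e in exps]


def mean_log_utility(utilities, eps=1e-8):
    """Nash social welfare in log form: the mean of log-utilities.

    eps is an assumed guard against log(0).
    """
    return sum(math.log(u + eps) for u in utilities) / len(utilities)


def kl_drift(recent_states, historical_states, n_bins, alpha=1.0):
    """KL(recent || historical) over discretized state visits.

    States are assumed to be integers in [0, n_bins). alpha is Laplace
    smoothing (an assumption) so both histograms are strictly positive.
    """
    def hist(states):
        counts = Counter(states)
        denom = len(states) + alpha * n_bins
        return [(counts.get(b, 0) + alpha) / denom for b in range(n_bins)]

    p, q = hist(recent_states), hist(historical_states)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


DRIFT_THRESHOLD = 0.1  # hypothetical value, not taken from the paper

# Feasible allocation of a budget of 100 units across three recipients.
allocation = softmax_allocation([0.2, -1.0, 1.5], budget=100.0)

# Welfare term, treating the allocations themselves as utilities (assumption).
welfare = mean_log_utility(allocation)

# Historical visits are uniform over four bins; recent visits concentrate on bin 3.
historical = [0, 0, 1, 1, 2, 2, 3, 3] * 10
recent = [3, 3, 3, 2, 3, 3, 3, 3]
drift_detected = kl_drift(recent, historical, n_bins=4) > DRIFT_THRESHOLD
```

Because the softmax output already sums to one, the scaled allocation meets the budget constraint by construction, which is why no projection step or Lagrange multiplier appears in the sketch. In a full system, a positive `drift_detected` would be the point at which the buffer refresh and fine-tuning described in the abstract are triggered.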

Keywords: Deep Reinforcement Learning, Non-Stationary Environments, Constrained Resource Allocation, Nash Social Welfare, Continual Adaptation

Short URL: https://sciup.org/15020391

IDR: 15020391 | DOI: 10.5815/ijisa.2026.03.01