A Hybrid Active and Semi-Supervised Learning Framework for Classification with Minimal Labeled Data
Authors: Kostiantyn O. Minkov, Igor V. Malyk
Journal: International Journal of Intelligent Systems and Applications @ijisa
Article in issue: No. 3, Vol. 18, 2026.
Open access
Modern machine learning models typically require large amounts of precisely labeled data to perform effectively. However, obtaining such labels is time-consuming and costly, especially in specialized domains such as medical image analysis and document classification, where unlabeled data is abundant but expert annotation is scarce. This paper addresses the problem of learning from very few labeled examples by jointly leveraging weak supervision, active learning (AL), and semi-supervised learning (SSL). A hybrid framework is proposed in which a small set of informative samples is actively selected for manual annotation using an entropy-based acquisition function combined with weak label disagreement scoring, while a large pool of unlabeled or weakly labeled data is exploited through SSL based on the FixMatch algorithm. The approach iteratively corrects noisy labels and refines the model with minimal human involvement. The framework is evaluated using a ResNet-18 classifier on the CIFAR-10 benchmark dataset and is compared against two baselines: pure active learning and pure semi-supervised learning. Each method is run independently with three random seeds at the key active learning rounds, and accuracy is reported as mean ± standard deviation. The hybrid framework consistently leads both baselines at intermediate labeling budgets, with the largest absolute gap at Round 15 (+1.27 percentage points over pure active learning, +1.35 percentage points over pure SSL). The framework also offers a clear label-efficiency advantage: at Round 15, with |D_L| = 6500 labels, the hybrid method already reaches a test accuracy of 0.6792 ± 0.0097, exceeding the accuracies that pure active learning (0.6730 ± 0.0139) and pure SSL (0.6687 ± 0.0056) attain only at Round 20 with |D_L| = 7000. By Round 20 all three methods saturate near a common data ceiling, indicating that the integrated use of weak supervision, active learning, and consistency-based SSL is most valuable when the annotation budget is genuinely constrained.
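To make the sample-selection step more concrete, the following is a minimal sketch of an entropy-based acquisition score combined with a weak-label disagreement score. The abstract does not specify how the two terms are computed or combined, so everything here is an assumption for illustration: disagreement is taken as the fraction of weak labelers that deviate from the majority vote, the two terms are mixed with a hypothetical weight `alpha`, and all function names are invented for this example rather than taken from the paper.

```python
import math
from typing import List, Sequence


def predictive_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def weak_label_disagreement(votes: Sequence[int], num_classes: int) -> float:
    """Assumed definition: share of weak labelers that differ from the majority vote."""
    if not votes:
        return 0.0
    counts = [0] * num_classes
    for v in votes:
        counts[v] += 1
    return 1.0 - max(counts) / len(votes)


def acquisition_scores(
    probs: Sequence[Sequence[float]],
    votes: Sequence[Sequence[int]],
    num_classes: int,
    alpha: float = 0.5,  # hypothetical mixing weight, not from the paper
) -> List[float]:
    """Weighted sum of normalized entropy and weak-label disagreement per sample."""
    max_entropy = math.log(num_classes)  # entropy of the uniform distribution
    scores = []
    for p, v in zip(probs, votes):
        h = predictive_entropy(p) / max_entropy  # scaled to [0, 1]
        d = weak_label_disagreement(v, num_classes)
        scores.append(alpha * h + (1.0 - alpha) * d)
    return scores


def select_for_annotation(scores: Sequence[float], budget: int) -> List[int]:
    """Indices of the `budget` highest-scoring samples, best first."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:budget]
```

In an active learning round, `probs` would hold the classifier's softmax outputs on the unlabeled pool and `votes` the weak labels for the same samples; the indices returned by `select_for_annotation` would then be sent for manual annotation, while the remaining pool would be left to the FixMatch-based SSL step.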
Keywords: Active Learning, Semi-Supervised Learning, Neural Networks, Classification, Machine Learning, Data Analysis, Model, Accuracy
Short URL: https://sciup.org/15020395
IDR: 15020395 | DOI: 10.5815/ijisa.2026.03.05