The Impact of Dataset Size on the Reliability of Model Testing and Ranking
Authors: A.V. Chuiko, V.V. Arlazarov, S.A. Usilin
Section: Programming
Article in issue: no. 2, vol. 18, 2025.
Free access
Machine learning is widely applied across diverse domains, with research teams continually developing new recognition models that compete on open datasets. In some tasks, accuracy surpasses 99%, so the gaps between top-ranked models often amount to mere fractions of a percentage point. These minimal differences, combined with the varying sizes of the benchmark datasets, raise questions about the reliability of model evaluation and ranking. This paper introduces a method for determining the dataset size necessary to ensure robust hypothesis testing of model performance. It also examines the statistical significance of accuracy rankings in recent studies on the MNIST, CIFAR-10, and CIFAR-100 datasets.
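The abstract does not spell out the paper's procedure, so as an illustration of the kind of calculation involved, the sketch below applies the textbook two-proportion z-test sample-size formula to ask how large a test set must be before a small accuracy gap becomes statistically detectable. The significance level, power, and example accuracies are assumptions for illustration, not values taken from the paper.

```python
import math
from scipy.stats import norm

def required_test_size(p1: float, p2: float,
                       alpha: float = 0.05, power: float = 0.80) -> int:
    """Test-set size needed for a two-sided two-proportion z-test to
    detect the accuracy gap |p1 - p2| at significance `alpha` with the
    given power. A standard approximation, not the paper's exact method."""
    z_alpha = norm.ppf(1 - alpha / 2)         # critical value of the test
    z_beta = norm.ppf(power)                  # quantile for desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)  # sum of Bernoulli variances
    delta = abs(p1 - p2)                      # accuracy gap to resolve
    return math.ceil((z_alpha + z_beta) ** 2 * variance / delta ** 2)

# Hypothetical example: two models reporting 99.83% vs. 99.87% accuracy.
# Roughly 147,000 test images are needed -- far more than the 10,000
# images in the MNIST test set, so such a ranking is hard to trust.
print(required_test_size(0.9983, 0.9987))
```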
Keywords: dataset size, object recognition, statistical significance, model evaluation, recognition quality assessment
Short URL: https://sciup.org/147250688
IDR: 147250688 | DOI: 10.14529/mmp250209