The Impact of Dataset Size on the Reliability of Model Testing and Ranking

Free access

Machine learning is widely applied across diverse domains, with research teams continually developing new recognition models that compete on open datasets. In some tasks, accuracy surpasses 99%, leaving only minimal differences between competing models. These minimal differences, combined with the varying sizes of the benchmark datasets, raise questions about the reliability of model evaluation and ranking. This paper introduces a method for determining the dataset size necessary to ensure robust hypothesis testing of model performance. It also examines the statistical significance of accuracy rankings in recent studies on the MNIST, CIFAR-10, and CIFAR-100 datasets.
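The abstract does not state the paper's exact procedure, but the underlying question can be illustrated with a standard two-proportion sample-size calculation: how many test examples are needed before a gap between two reported accuracies becomes statistically detectable? The sketch below is only an illustration under assumed values (hypothetical accuracies of 99.3% and 99.5%, a 5% significance level, 80% power); the function name and parameters are not from the paper.

```python
# Rough sketch (not the paper's method): estimate the test-set size needed to
# distinguish two hypothetical accuracies with a two-sided two-proportion z-test.
from math import sqrt, ceil
from scipy.stats import norm

def required_test_size(p1: float, p2: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Minimum number of test examples per model for the gap |p1 - p2|
    to be detectable at significance level alpha with the given power."""
    z_alpha = norm.ppf(1 - alpha / 2)   # critical value for a two-sided test
    z_beta = norm.ppf(power)            # quantile corresponding to the desired power
    p_bar = (p1 + p2) / 2               # pooled accuracy under the null hypothesis
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)

# Hypothetical example: two models at 99.3% and 99.5% accuracy.
print(required_test_size(0.993, 0.995))  # ~23,000 examples
```

For these assumed values the required test set (roughly 23,000 examples) is more than twice the 10,000-image test sets of MNIST, CIFAR-10, and CIFAR-100, which is exactly the kind of concern about ranking reliability the paper addresses.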

Dataset size, object recognition, statistical significance, model evaluation, recognition quality assessment

Short address: https://sciup.org/147250688

IDR: 147250688   |   DOI: 10.14529/mmp250209

Research article