Machine learning model compression method based on iterative layer filtering

Authors: Utkin I.A., Nagorny D.S.

Journal: Труды Московского физико-технического института (Proceedings of the Moscow Institute of Physics and Technology) @trudy-mipt

Published in: No. 2 (66), vol. 17, 2025.

Free access

The accelerating growth of both the size of machine learning models and the computational power they require has led to a number of techniques that reduce the resources spent when using such models. These techniques include quantization, pruning, distillation, and their combinations. This research addresses pruning of machine learning models for further compression, which in the long term will allow their use on more compact devices such as laptop computers or smartphones. Pruning, or filtering, of model parameters can be based on different criteria. The proposed method relies on a design feature of the model, namely normalization layers, which bring the values of the model weights to a normal distribution. Given this distribution, intervals of the standard deviation are proposed as the filtering criterion. The weights that fall within the chosen standard deviation interval are truncated in the layer, and the layer is then multiplied by a scaling factor. The filtering technique is applied to the whole model with periodic control of metrics during layer processing, which is implemented as an iterative algorithm. The algorithm produced a compressed model with an acceptable reduction in metrics (0.95 of the reference value). Depending on the input data, the compression ranged from 0.113 to 0.1848 of the total number of parameters. The fraction of parameters removed from a processed layer ranged from 0.74 to 0.99, and up to half of all model layers were processed. The software package with the calculations used in the study is available at reference [8].
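The filtering loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (their software package is the one cited as [8]); it only shows the general idea under stated assumptions. The function names `filter_layer` and `iterative_filter`, the interval width `k`, the caller-supplied `scale` factor, and the `evaluate` callback are all hypothetical. The abstract does not say how the scaling factor is chosen, so the sketch takes it as a parameter.

```python
from statistics import fmean, pstdev


def filter_layer(weights, k=1.0, scale=1.0):
    """Zero the weights lying within k standard deviations of the layer mean
    and multiply the surviving weights by `scale`.

    `k` and `scale` are illustrative parameters, not values from the paper.
    """
    mu = fmean(weights)
    sigma = pstdev(weights)
    return [0.0 if abs(w - mu) <= k * sigma else w * scale for w in weights]


def iterative_filter(layers, evaluate, k=1.0, scale=1.0, keep_ratio=0.95):
    """Filter layers one at a time, checking the metric after each layer.

    `evaluate` is a caller-supplied function mapping the list of layers to a
    quality metric (higher is better). A filtered layer is kept only if the
    metric stays at or above `keep_ratio` times the reference metric;
    otherwise the original layer is restored.
    """
    reference = evaluate(layers)
    current = [list(layer) for layer in layers]  # work on a copy
    accepted = []
    for i, original in enumerate(current):
        current[i] = filter_layer(original, k, scale)
        if evaluate(current) < keep_ratio * reference:
            current[i] = original  # metric dropped too far: revert this layer
        else:
            accepted.append(i)
    return current, accepted
```

Here each layer is a flat list of floats for simplicity. In a real model, `evaluate` would run the validation set through the network, and the layers would be weight tensors.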

Machine learning model, parameter pruning, language models, iterative filtering, binary classification

Short URL: https://sciup.org/142245007

IDR: 142245007

Research article