Influence of tunable parameters of a fully connected neural network on prediction quality for the lithotype classification task
Authors: G. A. Kossov, I. A. Seleznev
Journal: Проблемы информатики (Problems of Informatics)
Section: Applied Information Technologies
Published in issue: 1 (58), 2023.
The paper considers the problem of lithotype classification using a fully connected neural network. The training and test data are color and textural features obtained from the analysis of whole core images. The advantages of this approach are that the model can be trained online and adapted to a new data set by further training. Each training example had 48 features; the number of classes, corresponding to particular lithotypes, was 20. The paper shows that for classification with neural networks the most significant architectural parameters are the numbers of layers and nodes. An estimate of the algorithm complexity in terms of O-notation is proposed: the number of performed operations grows linearly, O(m), in the number of layers and cubically, O(n³), in the number of neurons per layer. However, in terms of prediction quality, increasing the number of layers does not lead to better results. Analysis of the F1-score as a function of the number of nodes for different layer counts showed that increasing the number of neurons improves prediction quality.
Neural networks, lithotype classification, core analysis, hyperparameters, supervised learning
Short address: https://sciup.org/143180996
IDR: 143180996 | УДК: 519.7 | DOI: 10.24412/2073-0667-2023-1-48-59
Influence of neural network parameters on the quality of prediction for the task of automatic lithotype description
Machine learning methods are widely used for interpreting and describing geological and geophysical data. One such problem is automatic lithology extraction from whole core photographs. In this paper we analyze parameters that represent the textural and color features of the images. The advantage of this approach is that it allows online training and retraining of the classification model. Among existing classification methods such as boosting, random forests, and support vector machines, neural networks are preferred for their universality and availability in various programming toolkits. Applying neural networks requires a clear understanding of the modelling goals, because the choice of model architecture is an important factor. Many parameters are set by the user, and all of them affect the quality of the prediction. The purpose of this research is therefore to study the behavior of networks with various configurations and to look for common regularities. The paper considers the problem of classifying lithotypes using fully connected neural networks. The input data are color and textural features obtained by processing whole core images. Thus, we consider the task of classifying training examples with 48 features into 20 classes corresponding to certain lithotypes. The test sample consisted of 2998 elements; the models were trained on samples of 10,000 and 1,000 elements, respectively. The hyperparameters of the model include the loss function, optimization method, activation function, batch size, number of epochs, number of hidden layers, and number of neurons per layer. For a given problem, some of these choices can be justified in advance. For the classification problem, the natural choice is the ReLU and LogSoftmax activation functions.
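As a minimal illustration of these two activation functions, here is a NumPy sketch (our own; the paper does not publish its implementation, and the function names below are illustrative):

```python
import numpy as np

def relu(x):
    # ReLU: max(0, x), applied element-wise; the standard choice
    # for hidden layers.
    return np.maximum(x, 0.0)

def log_softmax(x):
    # Numerically stable log-softmax for the output layer:
    # subtracting the row maximum before exponentiating avoids overflow.
    shifted = x - np.max(x, axis=-1, keepdims=True)
    return shifted - np.log(np.sum(np.exp(shifted), axis=-1, keepdims=True))

logits = np.array([2.0, 1.0, 0.1])
log_probs = log_softmax(logits)
# exp(log_softmax) is a proper probability distribution over classes
assert np.isclose(np.exp(log_probs).sum(), 1.0)
```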
CrossEntropyLoss was used as the loss function. This loss combines LogSoftmax and NLLLoss, so the use of LogSoftmax is also justified by simplifying the computation of CrossEntropyLoss. We use the Adam algorithm as the optimization method. The quality of the model was evaluated with the F1-score metric. Training a model with a fixed number of layers and nodes but different batch sizes showed that the optimal batch size is 256 elements. With this batch size, we determined that 30 epochs are enough to train the model. Among the many network hyperparameters, the hardest to determine are the numbers of network elements, i.e. the numbers of layers and neurons. Therefore, in this research we study the dependence of the F1-score and the loss value on the number of nodes in a layer. The paper shows that increasing the number of neurons leads to a clear gain in quality: the F1-score reaches 1 in all cases once a layer has more than 10 neurons. Moreover, a model with a poorly chosen number of layers can be improved by increasing the number of neurons in each layer. Increasing the number of layers allows the model to construct a more complex approximation, which can improve prediction quality; however, it also brings a risk of overfitting and of local minima of the error function that hinder training. Thus, the number of nodes in a layer is the defining parameter and should be set first. Another important factor in model training is the time spent on it. We propose the following estimate of the algorithm complexity, studying the influence of the number of layers (m) and the number of nodes per layer (n). The estimate is given in terms of O-notation.
It is shown that the number of performed operations grows linearly, O(m), in the number of layers and cubically, O(n³), in the number of neurons. Consequently, with respect to the number of operations it is preferable to increase the number of network layers rather than the layer width. However, more elements do not guarantee a rise in the F1-score. The predictions of some classification algorithms (for example, boosting or random forests) depend strongly on the initial parameter values. In our case, we investigated the dependence of the loss value on the random initialization of the neural network weights. We used the Epps-Pulley test to check the normality of the loss value distribution. The tests showed that the distribution of the loss value is not Gaussian. This fact should be taken into account when setting reproducibility requirements for experimental results: the initial model weights should be initialized accordingly.
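For intuition on the linear dependence on the number of layers, one can count the multiply-accumulate operations of a single forward pass through a fully connected network. This is a toy sketch under our own simplifying assumptions (biases and activation costs ignored); it does not reproduce the paper's full O(n³) training-complexity estimate, which covers the whole training procedure:

```python
def forward_pass_macs(n_features, n_hidden, n_layers, n_classes):
    """Multiply-accumulate count of one forward pass through a fully
    connected net with n_layers hidden layers of n_hidden neurons each.
    Biases and activation functions are ignored for simplicity."""
    macs = n_features * n_hidden              # input -> first hidden layer
    macs += (n_layers - 1) * n_hidden ** 2    # hidden -> hidden transitions
    macs += n_hidden * n_classes              # last hidden -> output layer
    return macs

# Each extra layer adds a fixed n_hidden**2 term (linear growth in m),
# while widening a layer grows the dominant term quadratically in n.
shallow = forward_pass_macs(48, 100, 4, 20)   # 48 features, 20 classes as in the paper
deep = forward_pass_macs(48, 100, 8, 20)
```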
References
- Thomas A., et al. Automated lithology extraction from core photographs // First Break. 2011. V. 29. N 6.
- Baraboshkin E. E., et al. Deep convolutions for in-depth automated rock typing // Computers and Geosciences. 2020. V. 135. P. 104330.
- Abashkin V. V., et al. Quantitative analysis of whole core photos for continental oilfield of Western Siberia // SPE Russian Petroleum Technology Conference, OnePetro, 2020.
- Seleznev I. A., et al. Joint Usage of Whole Core Images Obtained in Different Frequency Ranges for the Tasks of Automatic Lithotype Description and Modeling of Rocks’ Petrophysics Properties // Geomodel 2020, European Association of Geoscientists and Engineers, 2020. V. 2020. N 1. P. 1-5.
- Амиргалиев E. H., и др. Интеграция алгоритмов распознавания литологических типов // Проблемы информатики. 2013. № 4 (21). С. 11-20.
- Чанг Б. Т. Т., и др. Классификация изображений на основе применения цветовой информации, вейвлет-преобразования Хаара и многослойной нейронной сети // Проблемы информатики. 2011. № 5. С. 81-86.
- Мухамедгалиев А. Ф., Разакова М. Г., Смирнов В. В. Создание и развитие геоинформационных технологий тематической интерпретации данных радиолокационного зондирования с использованием математических методов и вычислительных алгоритмов текстурной классификации и нейронных сетей // Проблемы информатики. 2012. № 3. С. 69-73.
- Manurangsi, P., Reichman, D. The computational complexity of training ReLU(s). arXiv:1810.04207v2 [cs.CC]. 2018.
- Kingma, D. P., Ba, J. Adam: A method for stochastic optimization. arXiv:1412.6980 [cs.LG]. 2014.
- Максимушкин В. В., Арзамасцев А. А. Сравнительная оценка вычислительной сложности обучения искусственной нейронной сети с жестким ядром и сети с классической структурой // Вестник Тамбовского университета. Серия: Естественные и технические науки. 2006. Т. 11. № 2. С. 190-197.
- Makienko D., Seleznev I., Safonov I. The effect of the imbalanced training dataset on the quality of classification of lithotypes via whole core photos //Creative Commons License Attribution. 2020. V. 4.
- Bernard, S., Heutte, L., Adam, S. Influence of hyperparameters on random forest accuracy // International workshop on multiple classifier systems, Springer, Berlin, Heidelberg, 2009. P. 171-180.
- Epps, T. W., Pulley, L. B. A test for normality based on the empirical characteristic function // Biometrika. 1983. V. 70. N 3. P. 723-726.
- ГОСТ Р ИСО 5479-2002. Статистические методы. Проверка отклонения распределения вероятностей от нормального распределения. М.: Изд-во стандартов, 2002.
- Лемешко Б. Ю. Критерии проверки отклонения распределения от нормального закона. Руководство по применению. М.: ООО «Научно-издательский центр ИНФРА-М», 2015. 160 с.