Influence of neural network parameters on the quality of prediction for automatic lithotype description tasks
Authors: Kossov G.A., Seleznev I.A.
Journal: Проблемы информатики (Problems of Informatics) @problem-info
Section: Applied Information Technologies
Article in issue: 1 (58), 2023.
Machine learning methods are widely used to interpret and describe geological and geophysical data. One such task is automatic lithology extraction from whole-core photographs. In this paper we propose to analyze parameters that represent the textural and color features of the images; the advantage of this approach is that it allows online training and retraining of the classification model. Among the existing classification methods, such as boosting, random forests, and support vector machines, neural networks are preferred for their universality and their availability in a wide range of programming tools. Applying neural networks requires the user to have a clear understanding of the modelling goals, because the choice of model architecture is an important factor. There are many user-defined parameters, and all of them affect the quality of the prediction. The purpose of this research is therefore to study the behavior of networks with various configurations and to find common regularities.

The paper considers the problem of classifying lithotypes with fully connected neural networks. The input data are color and textural features obtained by processing whole-core images. We thus consider the task of classifying training examples with 48 features into 20 classes corresponding to particular lithotypes. The test sample consisted of 2998 elements, and the model was trained on samples of 10,000 and 1,000 elements, respectively.

The hyperparameters of the model include the loss function, the optimization method, the activation function, the batch size, the number of epochs, the number of hidden layers, and the number of neurons per layer. For the given problem, some of these choices can be justified in advance. For the classification problem, the ReLU and LogSoftMax activation functions are the natural choice. CrossEntropyLoss was used as the loss function; since it combines LogSoftMax and NLLLoss, the use of LogSoftMax is also justified by simplifying the calculation of CrossEntropyLoss. The Adam algorithm was used as the optimization method, and model quality was evaluated with the F1-score. Training a model with a fixed number of layers and nodes but different batch sizes showed that the optimal batch size is 256 elements; on this basis we determined that 30 epochs are enough to train the model.

Among the large set of network hyperparameters, the exact number of network elements, i.e. the number of layers and neurons, is the hardest to determine. In the current research we therefore study the dependence of the F1-score and the value of the loss function on the number of nodes per layer. The paper shows that increasing the number of neurons leads to a clear gain in quality: the F1-score reaches 1 in all cases once a layer contains more than 10 neurons. Moreover, a model with a poorly chosen number of layers can be improved by increasing the number of neurons in each layer. Increasing the number of layers lets the model construct a more complex approximation, which can improve the quality of the prediction; however, as the number of layers grows, so does the risk of overfitting and of local minima of the error function, which lead to training problems.
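The setup described above maps directly onto PyTorch, whose CrossEntropyLoss, NLLLoss, and Adam the abstract names. The following minimal sketch assembles a fully connected classifier with the dimensions from the paper (48 features, 20 classes, batch size 256, 30 epochs); the default depth and width, the synthetic data, and the helper names make_model and train are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def make_model(n_features=48, n_classes=20, n_layers=2, n_neurons=10):
    # Fully connected classifier; depth and width are the free parameters.
    layers, width = [], n_features
    for _ in range(n_layers):
        layers += [nn.Linear(width, n_neurons), nn.ReLU()]
        width = n_neurons
    layers.append(nn.Linear(width, n_classes))  # raw logits out
    return nn.Sequential(*layers)

def train(model, loader, epochs=30):
    # CrossEntropyLoss applies LogSoftmax and NLLLoss internally, which is
    # why the network ends in raw logits rather than an explicit LogSoftmax.
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters())
    for _ in range(epochs):
        for x, y in loader:
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()

# Synthetic stand-ins for the 10,000-element training sample of
# 48 color/texture features labelled with one of 20 lithotypes.
X = torch.randn(10_000, 48)
y = torch.randint(0, 20, (10_000,))
loader = DataLoader(TensorDataset(X, y), batch_size=256, shuffle=True)
train(make_model(), loader)
```

With this structure, the layer-and-neuron sweep studied in the paper reduces to looping over n_layers and n_neurons and recording the F1-score on a held-out sample.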
Thus, the number of nodes per layer is the defining parameter and should be set first. Training time is another important factor, so we propose an estimate of the algorithm complexity that captures the influence of the number of layers (m) and nodes (n). The estimate is given in O-notation: the number of operations grows linearly, O(m), in the number of layers and cubically, O(n³), in the number of neurons. Consequently, with respect to the number of operations, it is preferable to increase the number of network layers. However, a large number of elements does not guarantee a rise in the F1-score.

The predictions of some classification algorithms (for example, boosting or random forest) depend strongly on the initial parameter values. In our case, we investigated the dependence of the loss value on the random initialization of the neural network weights, using the Epps-Pulley test to check the normality of the loss value distribution. The tests showed that the distribution of the loss value is not Gaussian. This fact should be taken into account when setting requirements for the reproducibility of experimental results: the starting model weights should be initialized accordingly.
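A sketch of that initialization experiment, reusing make_model, train, loader, X, and y from the sketch above: each restart reseeds PyTorch's RNG, so only the random weight initialization changes between runs. SciPy has no built-in Epps-Pulley test, so the D'Agostino-Pearson normality test (scipy.stats.normaltest) stands in for it here; the number of restarts is an arbitrary assumption.

```python
from scipy import stats
import torch

final_losses = []
for seed in range(30):             # number of restarts is an assumption
    torch.manual_seed(seed)        # only the weight initialization varies
    model = make_model()
    train(model, loader)
    with torch.no_grad():
        final_losses.append(
            torch.nn.functional.cross_entropy(model(X), y).item())

# Small p-value: reject the hypothesis that the final losses are Gaussian,
# mirroring the paper's conclusion (reached there via the Epps-Pulley test).
stat, p_value = stats.normaltest(final_losses)
print(f"normality test p-value: {p_value:.3g}")
```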
Keywords: neural network, lithotype description, core analysis, hyperparameters, supervised learning
Short address: https://sciup.org/143180996
IDR: 143180996 | DOI: 10.24412/2073-0667-2023-1-48-59