Automatic differentiation algorithms for matrix operations


The article analyzes automatic differentiation (AD) algorithms for computing derivatives of functions specified as computer programs. AD is especially important in problems where analytical differentiation is very difficult or impossible, while it provides accuracy comparable to the analytical approach and avoids the errors typical of numerical differentiation. Particular attention is paid to the algorithm proposed by A.V. Klimov, which describes in detail both the forward and backward passes in a perceptron and gives a scheme for computing gradients when training neural networks; the algorithm emphasizes clear indexing, a detailed description of the operations performed in each node, formalization of gradient computation through the introduction of conjugate (adjoint) nodes, and accounting for node domains to generate correct differentiation code. The paper also considers the specifics of applying AD to matrix operations, namely the forward and reverse modes, and analyzes their impact on computational efficiency, as well as justifying the use of the chain rule and function transformations to achieve compositionality of differentiation. A comparative analysis of the forward and reverse modes of AD is carried out in terms of computational complexity and memory costs, and optimization techniques such as accumulating tangents in memory and using backpropagators are considered.
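To make the contrast between the two AD modes concrete, the following minimal sketch (not taken from the article; the one-layer perceptron, the loss function, and the names W, x, y, V are illustrative assumptions) uses JAX to compute the same derivative information by reverse mode (a full gradient in one backward sweep) and by forward mode (a directional derivative along one tangent), and checks that they agree via the chain rule.

```python
import jax
import jax.numpy as jnp

# Hypothetical one-layer perceptron loss; names are illustrative only.
def loss(W, x, y):
    # Forward pass: linear map followed by an elementwise nonlinearity.
    pred = jnp.tanh(W @ x)
    return jnp.sum((pred - y) ** 2)

key = jax.random.PRNGKey(0)
W = jax.random.normal(key, (3, 4))
x = jnp.ones(4)
y = jnp.zeros(3)

# Reverse mode (backward pass): one VJP sweep yields the full gradient
# dL/dW, which is why it suits training with many parameters and a
# scalar loss.
grad_W = jax.grad(loss, argnums=0)(W, x, y)

# Forward mode: one JVP sweep propagates a single tangent direction V
# through the computation, giving the directional derivative <dL/dW, V>.
V = jnp.ones_like(W)
_, dir_deriv = jax.jvp(lambda W_: loss(W_, x, y), (W,), (V,))

# Compositionality via the chain rule: both modes give the same
# directional derivative, sum(grad_W * V) == dir_deriv up to rounding.
print(jnp.allclose(jnp.sum(grad_W * V), dir_deriv, atol=1e-5))
```

The memory trade-off discussed in the abstract shows up here as well: reverse mode stores intermediate values from the forward pass to use in the backward sweep, while forward mode carries tangents alongside the computation and needs no such tape.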


Automatic differentiation, matrix operations, forward mode, reverse mode, gradient, optimization, computational complexity

Short address: https://sciup.org/170209955

IDR: 170209955   |   DOI: 10.24412/2500-1000-2025-2-3-100-104

Research article