# Journal articles - Bulletin of the South Ural State University. Series: Computational Mathematics and Informatics

All articles: 306

A method for creating structural models of text documents using neural networks

Research article

The article describes modern neural network BERT-based models and considers their application for Natural Language Processing tasks such as question answering and named entity recognition. The article presents a method for solving the problem of automatically creating structural models of text documents. The proposed method is hybrid and is based on jointly utilizing several NLP models. The method builds a structural model of a document by extracting sentences that correspond to various aspects of the document. Information extraction is performed by using the BERT Question Answering model with questions that are prepared separately for each aspect. The answers are filtered via the BERT Named Entity Recognition model and used to generate the contents of each field of the structural model. The article proposes two algorithms for field content generation: Exclusive answer choosing algorithm and Generalizing answer forming algorithm, that are used for short and voluminous fields respectively. The article also describes the software implementation of the proposed method and discusses the results of experiments conducted to evaluate the quality of the method.

Free

Research article

The structural inverse gravity problem in a multilayer medium is one of the most important problems in geophysics. Until recently, the problem was reduced to separating the gravitational fields and restoring the unknown layers independently. Methods that allow the unknown layers to be found simultaneously are now in demand. To solve the Urysohn integral equation of the first kind that describes the problem, regularized Levenberg-Marquardt type algorithms with weight factors are investigated. A new Levenberg-Marquardt type method based on the Levenberg-Marquardt scheme is proposed. The regularized Levenberg-Marquardt type method is compared with the classic Levenberg-Marquardt method. Some computational optimizations are offered for the classic Levenberg-Marquardt method. Numerical experiments using model gravitational data make it possible to compare the convergence rates, relative errors and program execution times of the classic Levenberg-Marquardt algorithm and the Levenberg-Marquardt type method. Parallel programs implementing the algorithms are developed using CUDA and OpenMP technologies.

Free
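
As a rough illustration of the classic Levenberg-Marquardt scheme mentioned in the abstract above, the sketch below fits a two-parameter exponential model. It is not the regularized, weighted method of the article and does not use the Urysohn gravity operator; the model `y = a * exp(b * t)`, the function name `lm_fit`, and the damping update factors (0.5 and 10) are assumptions chosen for illustration only.

```python
import math

def lm_fit(ts, ys, a, b, lam=1e-2, iters=100):
    """Fit y = a * exp(b * t) with a basic Levenberg-Marquardt loop."""
    def cost(a_, b_):
        return 0.5 * sum((a_ * math.exp(b_ * t) - y) ** 2 for t, y in zip(ts, ys))

    current = cost(a, b)
    for _ in range(iters):
        # Accumulate the 2x2 Gauss-Newton matrix J^T J and the gradient J^T r.
        h11 = h12 = h22 = g1 = g2 = 0.0
        for t, y in zip(ts, ys):
            e = math.exp(b * t)
            r = a * e - y
            j1, j2 = e, a * t * e  # partial derivatives of the residual w.r.t. a and b
            h11 += j1 * j1
            h12 += j1 * j2
            h22 += j2 * j2
            g1 += j1 * r
            g2 += j2 * r
        # Solve the damped system (J^T J + lam * I) d = -J^T r by Cramer's rule.
        a11, a22 = h11 + lam, h22 + lam
        det = a11 * a22 - h12 * h12
        da = (-g1 * a22 + g2 * h12) / det
        db = (-g2 * a11 + g1 * h12) / det
        trial = cost(a + da, b + db)
        if trial < current:
            # Accept the step and move toward Gauss-Newton behavior.
            a, b, current = a + da, b + db, trial
            lam *= 0.5
        else:
            # Reject the step and move toward gradient-descent behavior.
            lam *= 10.0
    return a, b

# Synthetic, noise-free data generated from a = 2.0, b = -1.5 (illustrative values).
ts = [i / 19 for i in range(20)]
ys = [2.0 * math.exp(-1.5 * t) for t in ts]
a_fit, b_fit = lm_fit(ts, ys, a=1.0, b=0.0)
```

The damping parameter `lam` is what separates this scheme from plain Gauss-Newton: it keeps the linear system well conditioned when `J^T J` is close to singular, which is the usual situation in ill-posed inverse problems.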

Research article

This paper describes the development of a program for analyzing the intonation of verbal pieces in the Russian language. The goal is to measure the differences in how native and foreign speakers of Russian intone verbal pieces. The research methodology is based on neural network analysis applied to the identification of speech samples obtained by recording the speech of non-native speakers. The experiment involved 12 people: native speakers of Russian and of Chinese, both male and female, aged 20 to 35. The total number of speech samples was 4800. Overall, 10 speech items with declarative and interrogative intonation were analyzed. A neural network that assesses how closely a speech sample corresponds to the standard intonation variant was built and trained. The results of the experimental research are presented as statistical assessments of the pronunciation of the verbal pieces with various intonations. These results are recommended for use in teaching Russian as a foreign language: the obtained data serve as a confidence threshold for identifying whether the intonation complies with the standard or deviates from it. The results can also be applied to the individualized automated compilation of recommendations for correcting mistakes.

Free

Research article

The article presents AstroPhi, a new hyper-scalable software package for simulating the dynamics of astrophysical objects on hybrid supercomputers equipped with Intel Xeon Phi accelerators. The numerical method for solving the gas dynamics equations is based on a combination of the large-particle method and the Godunov method, specially adapted for implementation on multiple accelerators. The fast Fourier transform is used to solve the Poisson equation. The software implementation was tested separately on gas dynamics problems, on the solution of the Poisson equation, and on classical problems of gravitational gas dynamics. The speedup of the software package when using Intel Xeon Phi accelerators is shown, and the notion of scalability when using accelerators is refined. Results of simulating the collapse of astrophysical objects are presented.

Free

Developing intelligent assistants to search for content on websites of a certain genre

Research article

This paper discusses an approach to the automatic generation of intelligent assistants that provide information search over the content of a website. A feature of the approach is the use of genre models developed for a given type of resource (educational, informational, etc.), on the basis of which genre structuring and subsequent thematic clustering of the target website's content are performed. The resulting genre structures allow us to define more precisely the boundaries of the thematic clusters related to the topic of the user’s search query. The search quality evaluation for Russian-language websites showed an F-score of 87.8% and originality of 80.9%, which exceeds the Yandex search engine results by 1.1% and 9.1%, respectively. To predict user information needs, a method for refining the resulting sample is proposed. Based on current and previous queries, it implicitly obtains information about what the user was not satisfied with in the previous search results. A model of the user’s search intentions has been developed; its computational component includes a method for evaluating query closeness based on the FRiS function. Based on the proposed methods, a chatbot was created on the Telegram messenger platform to search the websites of educational institutions. The experiments showed that the user needs an average of 1.75 clarifying questions to find the necessary information.

Free

Development of a numerical method for solving the inverse Cauchy problem for the heat equation

Research article

This work investigates the initial temperature in the inverse Cauchy problem for the linear heat conduction equation, where the initial temperature depends on the temperature given at a specified time. In this problem, the initial temperature distribution is unknown; instead, the temperature is known at a time t = T > 0. The heat conduction problem can be formulated as a Fredholm integral equation of the first kind. It is well known that this problem is ill-posed and that solving it directly is unacceptable. An algorithm is used to define a finite-dimensional operator for this problem, and the generalized discrepancy method is used to reduce the conditional extremum variational problem to an unconditional extremum variational problem for the integral equation. Discretizing the integral equation makes it possible to reduce the problem to a system of linear algebraic equations. Tikhonov's regularization method is then used to find an approximate solution. Finally, a numerical example is presented to verify the accuracy of the estimated solution.

Free
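
The ill-posedness described in the abstract above, and the effect of Tikhonov regularization, can be seen in a small spectral sketch. It does not reproduce the article's finite-dimensional operator or the generalized discrepancy method. It assumes the heat equation u_t = u_xx on [0, π] with zero boundary values, so that the k-th sine coefficient decays by the factor exp(-k²T); the coefficient values, the noise level and the regularization parameter `alpha` are hypothetical.

```python
import math

def tikhonov_backward_heat(data_coeffs, T, alpha):
    """Recover initial sine coefficients from coefficients observed at time T.

    The forward map multiplies mode k by s_k = exp(-k^2 * T); the Tikhonov
    estimate for each mode is s_k * d_k / (s_k^2 + alpha).
    """
    estimate = []
    for k, d in enumerate(data_coeffs, start=1):
        s = math.exp(-k * k * T)
        estimate.append(s * d / (s * s + alpha))
    return estimate

def max_error(estimate, truth):
    return max(abs(e - c) for e, c in zip(estimate, truth))

T = 0.2
true_u0 = [1.0, 0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0]  # illustrative initial state
noise = 1e-4                                         # illustrative data error

# Temperature coefficients at time T with a small alternating perturbation.
data = [math.exp(-k * k * T) * c + noise * (-1) ** k
        for k, c in enumerate(true_u0, start=1)]

# Direct inversion divides by exp(-k^2 * T) and amplifies the perturbation.
naive = [d / math.exp(-k * k * T) for k, d in enumerate(data, start=1)]
regularized = tikhonov_backward_heat(data, T, alpha=1e-4)

naive_error = max_error(naive, true_u0)
regularized_error = max_error(regularized, true_u0)
```

In this setup the direct inversion error is dominated by the highest mode, where a perturbation of 1e-4 is multiplied by exp(64 · 0.2), while the regularized estimate stays close to the true coefficients.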

Research article

Cloud computing is seeing steady growth in modern business. It reduces the cost of owning and operating IT infrastructure; however, there are issues related to the management of data processing centers. One of them is the effective use of companies' computing and network resources. The goal of optimization is to manage the traffic of cloud applications and services within data centers. Given the multitier architecture of modern data centers, this task requires special attention. An advantage of modern infrastructure virtualization is the possibility of using software-defined networks and software-defined data storage. However, existing algorithmic optimization solutions do not take into account the specific features of network traffic formation with multiple application types. The task of optimizing traffic distribution for cloud applications and services can be solved by using the software-defined infrastructure of virtual data centers. We have developed a simulation model of the traffic in software-defined network segments of data centers involved in processing user requests to cloud applications and services within a network environment. Our model makes it possible to implement the traffic management algorithm for cloud applications and to optimize access to storage systems through the effective use of data transmission channels. In experimental studies, we found that our algorithm decreases the response time of cloud applications and services and, therefore, increases the throughput of user request processing and reduces the number of refusals.

Free

Hierarchical model of architecture of supercomputer systems for comparison and ranking

Research article

The task of comparing the capabilities of computing systems and forming various ratings has many possible goals, including the identification of trends, the promotion of proven general-purpose architectures, and the demonstration of superiority in a certain class of tasks. Describing the achieved performance is, of course, not enough for all these purposes: various rankings and comparisons use different levels of abstraction and generalization, up to a level that makes it possible to associate the identified performance indicators with certain features of the system. In practice, descriptions of the architectural features of systems in ratings are rather scarce. The authors address the problem of developing a formal, relatively high-level description of computer systems that would, at the same time, allow the level of detail to be increased as required by the goals of applied research. Such a hierarchical system description model has been proposed and tested on well-known systems from the Top50 and Top500 lists.

Free

Hybrid computer system programming technology with adaptation and scaling of calculations

Research article

The paper considers a programming technology for hybrid computer systems that contain reconfigurable and microprocessor computational nodes. The technology is based on the high-level programming language COLAMO with extensions that allow various forms of organization of parallel calculations (structural, structural-procedural, multi-procedural and procedural) to be described in a unified parallel-pipeline form. The suggested parallel-pipeline form allows the form of organization of calculations to be modified. Such modifications are performed automatically by the COLAMO language preprocessor, which takes into account the current configuration of the hybrid computer system. Owing to the suggested technology, the program can be automatically adapted to a changed architecture or configuration of the hybrid computer system without any modification of the source code by the developer. For this purpose, the source parallel program written in COLAMO is transformed by the preprocessor into a canonical form. The preprocessor then estimates the available computational resource, determines effective parameters for implementing the program on that resource and, if necessary, reduces the program performance to adapt it to the current configuration of the hybrid computer system. The technology provides two-way scaling: induction when the available computational resource increases, and reduction when it decreases. This gives resource-independent programming, i.e. the developer is not “bound” to the available hardware resource of the computer system.

Free

Intermediate fusion approach for pneumonia classification on imbalanced multimodal data

Research article

In medical practice, the primary diagnosis of diseases should be carried out quickly and, if possible, automatically. The processing of multimodal data in medicine has become a ubiquitous technique in the classification, prediction and detection of diseases. Pneumonia is one of the most common lung diseases. In our study, we used chest X-ray images as the first modality and the results of a patient's laboratory tests as the second modality to detect pneumonia. The architecture of the multimodal deep learning model was based on intermediate fusion. The model was trained on balanced and imbalanced data, in which pneumonia was present in 50% and 9% of the total number of cases, respectively. For a more objective evaluation of the results, we compared the performance of our model with several other open-source models on our data. The experiments demonstrate the high performance of the proposed model for pneumonia detection based on two modalities, even with imbalanced classes (up to 96.6%), compared to the results of single-modality models (up to 93.5%). We made several integral estimates of the performance of the proposed model to cover and investigate all aspects of the multimodal data and architecture features. The metrics were accuracy, ROC AUC, PR AUC, F1 score, and the Matthews correlation coefficient. Using various metrics, we demonstrated the feasibility and meaningfulness of using the proposed model to classify the disease correctly. Experiments showed that the performance of the model trained on imbalanced data was even slightly higher than that of the other models considered.

Free
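
The abstract above reports several metrics rather than accuracy alone. A minimal sketch of why that matters at a 9% positive rate is given below; it computes accuracy, F1 score and the Matthews correlation coefficient from a confusion matrix. The sample of 100 cases and the "always negative" classifier are hypothetical and are not taken from the article's data.

```python
import math

def binary_metrics(tp, fp, fn, tn):
    """Return accuracy, F1 score and Matthews correlation coefficient."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, f1, mcc

# Hypothetical imbalanced sample: 9 pneumonia cases out of 100,
# scored for a classifier that predicts "no pneumonia" for everyone.
accuracy, f1, mcc = binary_metrics(tp=0, fp=0, fn=9, tn=91)
```

Here accuracy is 0.91 even though no positive case is found, while F1 and MCC are both 0, which is why class-sensitive metrics are reported alongside accuracy for imbalanced data.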

Investigation of different topologies of neural networks for data assimilation

Research article

Neural networks have emerged as a novel scheme for a data assimilation process. Neural network techniques are applied for data assimilation in the Lorenz chaotic system. A radial basis function and a multilayer perceptron neural networks are trained employing 1000, 2000, and 4000 examples. Three different observation intervals are used: 0.01, 0.06 and 0.1 s. The performance of the data assimilation technique is investigated for different architectures of these neural networks. The best results of the MP-NN for sampled observation at 0.06 and 0.01 s were obtained using 3 neurons, with hyperbolic-tangent in the output layer. For RBF-NN, the best

Free

Research article

The KernelGen project (http://kernelgen.org/) aims to create, on the basis of modern open technologies, a Fortran and C compiler for automatically porting applications to GPUs without modifying their source code. Parallelism analysis in KernelGen is based on the LLVM/Polly and CLooG infrastructure, modified to generate GPU kernels, and on runtime alias analysis. PTX assembly for NVIDIA GPUs is generated with the NVPTX backend. Thanks to the integration of the LLVM part with GCC through the DragonEgg plugin and a modified linker, KernelGen can, while remaining fully compatible with the GCC compiler, generate executable modules that contain both CPU and GPU variants of machine code. In comparative tests against the PGI OpenACC compiler, KernelGen demonstrates greater flexibility in a number of capabilities while delivering comparable or up to 60% higher performance.

Free

Octoshell: a system for administering large supercomputer complexes

Research article

Managing modern supercomputer centers and the computing systems they comprise is a difficult and complex process. The traditional use of numerous tools for individual supercomputer management and administration tasks becomes a limiting factor for the efficient use of computing resources as systems grow in scale. Octoshell, a system developed to support the operation of supercomputer centers, is intended to solve this problem by providing the main administration tools in a single interface; it makes it possible to automate, to a considerable extent, the typical tasks involved in keeping large supercomputer complexes running efficiently.

Free

On using the decision trees to identify the local extrema in parallel global optimization algorithm

Research article

This work considers solving multidimensional global optimization problems using a decision tree to reveal the attraction regions of local minima. The objective function of the problem is defined as a “black box” and may be non-differentiable, multi-extremal and computationally costly. We assume that the function satisfies the Lipschitz condition with an a priori unknown constant. A global search algorithm is applied to find the global minimum in problems of this type. It is well known that the solution complexity depends essentially on the presence of multiple local extrema. Within the framework of the global search algorithm, we propose a method for selecting the vicinities of local extrema of the objective function based on an analysis of accumulated search information. Conducting such an analysis using machine learning techniques makes it possible to decide when to run a local method, which can speed up the convergence of the algorithm. This suggestion was confirmed by the results of numerical experiments demonstrating the speedup when solving a series of test problems.

Free

Parallel algorithms for effective correspondence problem solution in computer vision

Research article

We propose new parallel algorithms for correspondence problem solution in computer vision. We develop an industrial photogrammetric system that uses artificial retroreflective targets that are photometrically identical. Therefore, we cannot use traditional descriptor-based point matching methods, such as SIFT, SURF etc. Instead, we use epipolar geometry constraints for finding potential point correspondences between images. In this paper, we propose new effective graph-based algorithms for finding point correspondences across the whole set of images (in contrast to traditional methods that use 2-4 images for point matching). We give an exact problem solution via superclique and show that this approach cannot be used for real tasks due to computational complexity. We propose a new effective parallel algorithm that builds the graph from epipolar constraints, as well as a new fast parallel heuristic clique finding algorithm. We use an iterative scheme (with backprojection of the points, filtering of outliers and bundle adjustment of point coordinates and cameras’ positions) to obtain an exact correspondence problem solution. This scheme allows using heuristic clique finding algorithm at each iteration. The proposed architecture of the system offers a significant advantage in time. Newly proposed algorithms have been implemented in code; their performance has been estimated. We also investigate their impact on the effectiveness of the photogrammetric system that is currently under development and experimentally prove algorithms’ efficiency.

Free

Research article

The paper covers the development and study of a mathematical model of interaction processes between plankton and ctenophore populations based on modern information technologies and computational methods, which increases the accuracy of predictive modeling of the ecological situation in shallow water in summer. The model takes into account the following: the transport of the water environment; microturbulent diffusion; nonlinear interaction of plankton and ctenophore populations; biogenic, temperature and oxygen regimes; and the influence of salinity. Using a calculation method based on partially filled cells to discretize the model significantly increases the computational accuracy and decreases the computational time. The practical significance lies in the software implementation of the proposed model; the limits and prospects of its practical use are defined. Experimental software was developed for a multiprocessor computer system; it is intended for mathematical modeling of possible development scenarios of shallow water ecosystems, using the Azov Sea in summer as an example. In the parallel implementation, we used grid domain decomposition methods for computationally laborious convection-diffusion problems, taking into account the architecture and parameters of the multiprocessor computer system.

Free

Preimage attack on MD4 hash function as a problem of parallel SAT-based cryptanalysis

Research article

In this paper we study the inversion problem of the MD4 cryptographic hash function developed by R. Rivest in 1990. By MD4-k we denote a truncated variant of the MD4 hash function in which k is the number of steps used to calculate a hash value (the full version of MD4 corresponds to MD4-48). H. Dobbertin showed that the MD4-32 hash function is not one-way, namely, that it can be inverted for the given image of a random input. He suggested adding special conditions to the equations that describe the computation of particular steps (chaining variables) of the considered hash function. These additional conditions made it possible to solve the inversion problem of MD4-32 within a reasonable time by solving the corresponding system of equations. The main result of the present paper is an automatic derivation of “Dobbertin’s conditions” using parallel SAT solving algorithms. We also managed to solve several inversion problems for functions of the kind MD4-k (for k from 31 up to 39 inclusive). Our method significantly outperforms previously existing approaches to solving these problems.

Free

Preliminary assessment of hydrothermal risks in the Euphrates-Tigris basin: droughts in Iraq

Research article

This paper presents the temporal and spatial patterns of precipitation, surface air temperature, and drought occurrence in the Euphrates-Tigris river basin, with special emphasis on Iraq. Historical records covering 115 years (1900-2014) of monthly precipitation and temperature data were divided into four sub-periods of 30 years each (first 1900-1929, second 1930-1959, third 1960-1989 and fourth 1985-2014) and studied separately. The results showed that the mean annual precipitation in Iraq for the four sub-periods is 218.5, 202.1, 196.4, and 174.9 mm, respectively, with an average of 198 mm. This indicates that the mean annual precipitation decreased by 43.6 mm (20 %) in the fourth sub-period compared to the first. The mean annual temperature in Iraq for the four sub-periods is 22.0, 21.9, 22.0, and 22.8 °C, respectively, with an average of 22.2 °C. This indicates that the average monthly temperature during the year in Iraq increased by 0.76 °C (3.45 %) in the fourth sub-period compared to the first. The probability of occurrence of dry (hot) periods in Iraq increased by 345.5 % (147.7 %) in the fourth sub-period compared to the first. Fortunately, the greatest drought occurrence is observed in the western parts of Iraq, where agriculture is irrigated; in the rain-fed areas of northern Iraq, precipitation has also decreased, but not as strongly as in the west of the country. A preliminary conclusion was drawn about the current climatic desertification and its possible consequences for Iraq.

Free
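
The percentages quoted in the abstract above follow from the reported sub-period means by simple arithmetic, reproduced in the short check below. It uses only the rounded values printed in the abstract; the variable names are illustrative. The 0.76 °C increase is taken as reported, since the rounded means (22.0 and 22.8 °C) give 0.8 °C and the underlying unrounded data are not available here.

```python
# Sub-period means as printed in the abstract (mm).
precipitation_mm = [218.5, 202.1, 196.4, 174.9]

# Average over the four sub-periods.
mean_precipitation = sum(precipitation_mm) / len(precipitation_mm)

# Absolute and relative change, fourth sub-period versus first.
precip_drop = precipitation_mm[0] - precipitation_mm[-1]
precip_drop_pct = 100.0 * precip_drop / precipitation_mm[0]

# Relative temperature change from the reported 0.76 °C increase
# over the first sub-period mean of 22.0 °C.
temp_rise_pct = 100.0 * 0.76 / 22.0
```

The values round to an average of 198 mm, a drop of 43.6 mm or about 20 %, and a temperature rise of about 3.45 %, matching the figures in the abstract.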

Solving grid equations using the alternating-triangular method on a graphics accelerator

Research article

The paper describes a parallel-pipeline implementation of the modified alternating-triangular iterative method (MATM) for solving the grid equations obtained by numerically solving the equations of mathematical physics. The greatest computational costs of this method fall on the stages of solving systems of linear algebraic equations (SLAE) with lower and upper triangular matrices. An algorithm for solving the SLAE with a lower triangular matrix on a graphics accelerator using NVIDIA CUDA technology is presented. To implement the parallel-pipeline method, a three-dimensional decomposition of the computational domain was used. The domain is divided along the y coordinate into blocks, the number of which corresponds to the number of GPU streaming multiprocessors involved in the calculations. The blocks, in turn, are divided into fragments along two spatial coordinates, x and z. The presented graph model describes the relationship between adjacent fragments of the computational grid and the pipeline calculation process. Based on the results of computational experiments, a regression model was obtained that describes the time required to calculate one MATM step on the GPU. The acceleration and efficiency of solving the SLAE with a lower triangular matrix by the parallel-pipeline method on the GPU were calculated for different numbers of streaming multiprocessors.

Free
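
For reference, solving an SLAE with a lower triangular matrix, as mentioned in the abstract above, reduces to forward substitution. The sketch below is a plain sequential version on a small dense matrix. It is not the article's CUDA parallel-pipeline algorithm and ignores the sparse grid structure; the function name and the 3×3 example are assumptions for illustration.

```python
def solve_lower_triangular(L, b):
    """Solve L x = b by forward substitution, L lower triangular with nonzero diagonal."""
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        # x[i] depends on every previously computed x[j], j < i.
        partial = sum(L[i][j] * x[j] for j in range(i))
        x[i] = (b[i] - partial) / L[i][i]
    return x

# Small illustrative system whose exact solution is [1, 2, 3].
L = [[2.0, 0.0, 0.0],
     [1.0, 3.0, 0.0],
     [4.0, -1.0, 5.0]]
b = [2.0, 7.0, 17.0]
x = solve_lower_triangular(L, b)
```

The dependence of each unknown on the ones before it is what prevents a naive parallel loop and motivates a pipelined schedule, in which a fragment starts as soon as the neighboring fragments it depends on have finished.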

Research article

The efficient use and high output of any supercomputer depend on a great number of factors. Controlling the utilization of granted resources is one of them, and it becomes especially noticeable when many user projects work concurrently. It is important to provide users with detailed information on the peculiarities of the jobs they run. It is equally important to provide project managers with detailed information on resource utilization by project members by giving them access to detailed job analysis. Unfortunately, such information is rarely available. Our proposed approach aims to close this gap. It analyzes the integral characteristics of supercomputer applications for the whole collection of queued jobs on large-scale HPC systems, based on managing and studying system monitoring data, building integral job characteristics, and revealing job categories and the peculiarities of single job runs.

Free