Development of a modified Winnowing method for aggregating bibliographic information data from citation systems under the conditions of incomplete information

Free access

Currently, the transition to electronic presentation of bibliographic information about scientific works has increased interest in scientometric research. At the same time, scientists criticize existing scientometric methods, because an incomplete bibliographic base and limited tools for assessing it do not allow the contribution of a scientific work to be assessed accurately. Scientometric assessments are, as a rule, based on the data of a single citation system, which does not include complete information about all of the authors' publications indexed in other citation systems, and this limits the quality of those assessments.
Aim. This study aims to develop an adaptive approach to forming aggregated bibliographic data for a scientific organization under conditions of incomplete information from the citation systems Russian Science Citation Index (RSCI), Google Scholar and Scopus.
Methods. The aggregated list of publications for the analysis of scientometric indicators was built using the Winnowing method, the Levenshtein algorithm, the shingle method and the Jaro-Winkler method. In the experimental study, the effectiveness of these methods for aggregating information from citation systems was assessed by precision, recall and F-measure.
Results. Experiments on test data, a list of publications by authors from Orenburg State University taken from RSCI, Google Scholar and Scopus, showed that the Winnowing method produced the most accurate lists of publications by the F-measure criterion. To improve the performance of this algorithm, a two-stage optimization of the aggregation process was carried out, which reduced the running time of the algorithm when generating a list of bibliographic descriptions.
Conclusion. The proposed approach to forming aggregated bibliographic data for a scientific organization under conditions of incomplete information from the Russian Science Citation Index, Google Scholar and Scopus increases productivity in forming a list of authors' publications and shows good efficiency in determining the scientometric characteristics of authors.
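
To make the Winnowing step concrete, the following is a minimal sketch of how fingerprints of two bibliographic titles could be compared. It illustrates the standard (unmodified) Winnowing scheme, not the authors' modified method or their two-stage optimization. The function names, the k-gram length `k=5`, the window size `window=4`, the MD5-based hash and the Jaccard comparison are all assumptions chosen for illustration.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Lowercase the text and drop everything except letters and digits."""
    return re.sub(r"[\W_]+", "", text.lower())


def kgram_hashes(text: str, k: int) -> list[int]:
    """Hash every k-gram of the text (assumed hash: first 4 bytes of MD5)."""
    return [
        int.from_bytes(hashlib.md5(text[i:i + k].encode("utf-8")).digest()[:4], "big")
        for i in range(len(text) - k + 1)
    ]


def winnow(text: str, k: int = 5, window: int = 4) -> set[int]:
    """Return the set of Winnowing fingerprints of a text.

    The minimum hash of every sliding window is selected as a fingerprint.
    Positions are not recorded, since only the hash values are needed
    for the similarity comparison below.
    """
    hashes = kgram_hashes(normalize(text), k)
    if not hashes:
        return set()
    if len(hashes) < window:
        # Text too short for a full window: keep the single smallest hash.
        return {min(hashes)}
    return {
        min(hashes[start:start + window])
        for start in range(len(hashes) - window + 1)
    }


def similarity(a: str, b: str, k: int = 5, window: int = 4) -> float:
    """Jaccard similarity of the two fingerprint sets (0.0 if either is empty)."""
    fa, fb = winnow(a, k, window), winnow(b, k, window)
    if not fa or not fb:
        return 0.0
    return len(fa & fb) / len(fa | fb)


if __name__ == "__main__":
    rsci_title = "Development of a modified Winnowing method"
    scopus_title = "development of a MODIFIED winnowing method."
    print(similarity(rsci_title, scopus_title))
```

In such a sketch, two records from different citation systems would be treated as the same publication when their similarity exceeds a chosen threshold. The threshold, like `k` and `window`, would need to be tuned against labelled data using precision, recall and F-measure.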


Citation system, scientometric methods, aggregation of bibliographic information, modification of the Winnowing method, Levenshtein method, shingle method

Short URL: https://sciup.org/147233779

IDR: 147233779   |   DOI: 10.14529/ctcr200413

Short communication