Using Fuzzy String Comparison for Automated Transfer of Formatting in Poetic Works

The creation of the scientific and educational resource "Pushkin Digital" requires typesetting poetic texts using layout information taken from other editions. Texts may vary from one edition to another, and each time the typesetting is performed anew according to the rules of the specific edition. Manual typesetting demands attentiveness and considerable time and effort from a specialist, since the same text has to be compared across several editions. The proposed method addresses two tasks. First, it determines the extent to which the texts differ between editions, which makes it possible to estimate the number of errors or deliberate transformations of the text, a separate subject of study for textual scholars. Second, based on an evaluation of line differences and their fuzzy alignment, the method generates typesetting rules for each line, taking into account the rules applied in earlier editions. The method was tested on 914 lyrical works by A.S. Pushkin and ensured correct and complete transfer of typesetting for 74.55% of the texts. For the remaining 25.45%, automatic transfer was not feasible and manual typesetting was required.
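
The two tasks above can be illustrated with a minimal Python sketch. It is not the authors' implementation and only assumes the general idea stated in the abstract: line differences are measured with plain Levenshtein distance, lines of two editions are aligned fuzzily, and each line of the new edition inherits the formatting of its matched source line. The function names (`levenshtein`, `similarity`, `edition_distance`, `align_lines`, `transfer_formatting`), the string format labels such as `"indent0"`, and the 0.6 similarity threshold are hypothetical choices made for illustration.

```python
from typing import Dict, List, Optional, Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit costs for insertion, deletion and substitution."""
    if len(a) < len(b):
        a, b = b, a  # keep the DP row as short as possible
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]: 1 - distance / length of the longer line."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def edition_distance(src: Sequence[str], dst: Sequence[str]) -> int:
    """Task 1 (sketch): total edit distance between two editions of a text."""
    return levenshtein("\n".join(src), "\n".join(dst))


def align_lines(src: Sequence[str], dst: Sequence[str],
                threshold: float = 0.6) -> Dict[int, int]:
    """Order-preserving fuzzy alignment of lines; returns {dst_index: src_index}.

    Maximizes the summed similarity of matched pairs; pairs below the
    hypothetical threshold are never matched.
    """
    n, m = len(src), len(dst)
    sims = [[similarity(s, d) for d in dst] for s in src]
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = max(score[i - 1][j], score[i][j - 1])  # skip a line
            s = sims[i - 1][j - 1]
            if s >= threshold:
                best = max(best, score[i - 1][j - 1] + s)  # match the two lines
            score[i][j] = best
    pairs: Dict[int, int] = {}
    i, j = n, m
    while i > 0 and j > 0:  # backtrack through the DP table
        s = sims[i - 1][j - 1]
        if s >= threshold and score[i][j] == score[i - 1][j - 1] + s:
            pairs[j - 1] = i - 1
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j]:
            i -= 1
        else:
            j -= 1
    return pairs


def transfer_formatting(src: Sequence[str], src_formats: Sequence[str],
                        dst: Sequence[str],
                        threshold: float = 0.6) -> List[Optional[str]]:
    """Task 2 (sketch): copy each matched source line's format label to dst.

    Unmatched lines get None, i.e. they are left for manual typesetting.
    """
    pairs = align_lines(src, dst, threshold)
    return [src_formats[pairs[j]] if j in pairs else None for j in range(len(dst))]


if __name__ == "__main__":
    # Hypothetical toy "editions": one line slightly changed, one replaced.
    src = ["abcdef", "ghijkl", "mnopqr"]
    formats = ["indent0", "indent1", "indent0"]
    dst = ["abcdxf", "zzzzzz", "mnopqr"]
    print(edition_distance(src, dst))              # 7
    print(transfer_formatting(src, formats, dst))  # ['indent0', None, 'indent0']
```

The alignment is order-preserving on the assumption that lines keep their relative order between editions; a simple nearest-match search without this constraint could map two new lines to the same source line. In this sketch a line that finds no sufficiently similar counterpart is returned as `None` and would have to be typeset by hand.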

Keywords: fuzzy string comparison, Levenshtein distance, formatting, text processing

Short URL: https://sciup.org/147251639

IDR: 147251639   |   DOI: 10.14529/mmp250308

Research article