Basic Algorithm for Automatic Spelling Correction of Russian Texts: Development, Evaluation and Prospects

Free access

Automatic spell checking and correction of Russian texts is a pressing task in natural language processing. Our research aims to develop, evaluate, and describe a computer programme that corrects spelling errors with high accuracy. The proposed method is based on line-by-line text processing that uses rules for spelling and capitalisation together with a probabilistic model that proposes candidate words for error correction. The algorithm operates at the level of individual words, which limits its ability to take context into account. The quality of the model is measured with Precision, Recall, and F1 Score. For ease of use and further refinement of the programme, we integrated automated error analysis and detailed report generation to identify the strengths and weaknesses of the algorithm. The detailed description of the development ensures the reproducibility of the algorithm and is in line with the open-source ideology. The results showed that the algorithm achieves Precision = 1.00, i.e., it corrects only those spelling errors that were specified in the reference text. However, Recall = 0.84 shows the need for further refinement, including handling context-dependent errors and processing sуе expressions. The F1 Score = 0.91 confirms the balanced performance of the algorithm and justifies its use as a baseline model for text correction in Russian. The conclusions of the study highlight the potential of the algorithm for automatic correction of Russian-language text and suggest directions for improving the source code, such as the use of n-grams and language models. This work lays the foundation for further research in the field of automatic correction of Russian-language texts.
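The relationship between the three reported metrics can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the function names, the word-by-word alignment of the three token lists, and the rule for counting a wrong edit as both a false positive and a false negative are all assumptions made for illustration. Only the values Precision = 1.00, Recall = 0.84, and F1 Score = 0.91 come from the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def word_level_scores(source, hypothesis, reference):
    """Score corrections over three token lists aligned word by word.

    Hypothetical counting scheme (an assumption, not taken from the paper):
      TP: the system changed a word and the result equals the reference
      FP: the system changed a word and the result differs from the reference
      FN: the reference differs from the source and the system did not fix it
    """
    tp = fp = fn = 0
    for src, hyp, ref in zip(source, hypothesis, reference):
        changed = hyp != src   # the system edited this word
        needed = ref != src    # the reference marks this word as misspelled
        if changed and hyp == ref:
            tp += 1
        else:
            if changed:
                fp += 1
            if needed:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_score(precision, recall)


# The reported Precision and Recall reproduce the reported F1 Score.
print(round(f1_score(1.00, 0.84), 2))

# Toy example: one error fixed, one error missed, no wrong edits.
source = ["привет", "какк", "дила"]
hypothesis = ["привет", "как", "дила"]
reference = ["привет", "как", "дела"]
print(word_level_scores(source, hypothesis, reference))
```

In the toy example the system never makes a wrong edit, so precision stays at 1.0 while the missed error lowers recall. The reported results show the same pattern of perfect precision with incomplete recall.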


Spelling errors, grammatical errors, Russian language, automatic text correction, natural language processing, precision, recall, F1 Score

Short URL: https://sciup.org/147247352

IDR: 147247352   |   DOI: 10.17072/1993-0550-2025-1-91-108

Research article