Hallucinations of Language Models through the Lens of Precedent Expressions

Free access

The article demonstrates that AI language models, despite their large training datasets and sophisticated learning algorithms, face two key challenges: training fatigue, in which models cease to effectively assimilate new data, and hallucinations, in which models generate unreliable information by substituting missing knowledge with statistical patterns and associative links. An analysis of precedent texts shows that AI systems often fail to recognize cultural context and the symbolic meaning of texts, which leads to distorted responses. The results of an experiment involving two transformer-based models (YandexGPT 4 Pro RC and ChatGPT-4) have confirmed the hypothesis that hallucinations arise from gaps in the training data. Both models exhibited a tendency to misidentify the sources of quotations and to distort precedent texts, while performing well in recognizing certain widely known materials (e.g., advertising slogans from the 1990s, Yeltsin’s aphorisms, and quotations from Ilf and Petrov). The study has also revealed that the likelihood of a correct response increases as the amount of contextual information grows; however, errors still occur even when extensive context is provided. This finding points to fundamental limitations in the models’ ability to understand and generate text while taking cultural and symbolic context into account. The results underscore the need for further investigation into the mechanisms underlying language models, the development of methods to reduce hallucinations, and improvements in the handling of cultural context. Promising directions include expanding and diversifying training corpora, as well as developing specialized methodologies for assessing and correcting the models’ cultural intelligence.
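The hallucination mechanism described above, a model substituting missing knowledge with statistically likely patterns, can be illustrated with a deliberately simplified sketch. The toy "model" below is not the architecture studied in the article; the corpus, function names, and fallback rule are all hypothetical, chosen only to show how a gap in training data turns into a confident misattribution of a quotation.

```python
from collections import Counter

# Hypothetical toy "training corpus": quote -> attributed-author pairs
# the model has seen (English glosses of well-known Russian precedent texts).
corpus = [
    ("manuscripts do not burn", "Bulgakov"),
    ("the ice has broken", "Ilf and Petrov"),
    ("the sitting continues", "Ilf and Petrov"),
    ("take as much sovereignty as you can swallow", "Yeltsin"),
]

# Statistical prior: which authors appear most often in training data.
author_counts = Counter(author for _, author in corpus)

def attribute_quote(quote: str) -> str:
    """Attribute a quotation to an author.

    If the quote was seen in training, return its true author.
    Otherwise 'hallucinate': fill the knowledge gap with the
    statistically most frequent author, i.e., an associative,
    pattern-based guess rather than an admission of ignorance.
    """
    for seen_quote, author in corpus:
        if quote == seen_quote:
            return author
    # Knowledge gap: fall back on the statistical pattern.
    return author_counts.most_common(1)[0][0]

# A known precedent text is attributed correctly...
print(attribute_quote("manuscripts do not burn"))  # Bulgakov
# ...but an unseen quotation is confidently misattributed to the
# author who is merely most frequent in the training data.
print(attribute_quote("to be or not to be"))       # Ilf and Petrov
```

The point of the sketch is that the erroneous answer is not random noise: it is the corpus-level statistical optimum, which is exactly why such hallucinations sound plausible and are hard to detect.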


Language models, artificial intelligence, hallucinations, precedent texts, linguocultural gaps, cultural context, associative and statistical token selection

Short address: https://sciup.org/148332639

IDR: 148332639   |   UDC: 811.161.1   |   DOI: 10.18101/2686-7095-2025-4-12-21