Analysis of categorical data encoding algorithms

Authors: Sh.A. Tursunov, A.E. Rashidov

Journal: Проблемы информатики (Problems of Informatics)

Section: Parallel system programming and computing technologies

Published in issue: 2 (67), 2025.

Free access

The effectiveness of artificial intelligence, which is widely regarded as one of the most useful tools across fields, depends on several factors. One of the most important is that the data fed to artificial intelligence algorithms must be in a form these algorithms can process. Because the algorithms are built on mathematical operations and expressions, it must be possible to apply mathematical operators to the incoming data. In some projects, however, the data includes values on which arithmetic operations cannot be performed. Discarding such data can degrade the results, so it has to be converted into another form, namely a numeric type. Several methods exist for encoding categorical data, and choosing the best one is a complex research task: the user must know not only the data set but also all of the available methods. This work analyzes methods of categorical data encoding. Twelve different methods for transforming text data are studied, and the advantages and disadvantages of each are examined. A comparative analysis of the methods is also carried out, and a general conclusion is given.


Keywords: artificial intelligence, categorical data encoding, data encoding methods

Short URL: https://sciup.org/143185032

IDR: 143185032 | UDC: 519 | DOI: 10.24412/2073-0667-2025-2-65-80