Overview of ACO Gym using reinforcement learning
Authors: Legkodumov A.A., Kozeev B.N., Belikov V.V., Korolkov A.V.
Section: Computer Science and Computer Engineering
Published in issue: 3, 2025.
Free access
The article examines the main components of ACO Gyms, the environments used to train autonomous agents. The agents are trained with reinforcement learning, a branch of artificial intelligence whose basics are also outlined in the article. Autonomous agents respond to security incidents in the infrastructure, thereby mitigating potential losses. An agent follows an optimal policy that it obtains by training in an ACO Gym that models a particular aspect of information security. Building such an environment is a multicomponent process, so before creating one’s own environment it is necessary to identify the key components of existing training environments. The purpose of the article is to highlight the key aspects of ACD training environments, how they operate, and how closely they approximate reality, which bears on the reliability and validity of the agents' policies when countering an attacker in real information systems. The scientific novelty of the work lies in a comprehensive systematization of existing approaches to the study of ACD training environments and in the identification of their components and the nuances of their operation. The paper sets out the foundations of reinforcement learning and describes the basic processes involved in training autonomous agents. It considers a new phenomenon in traditional information defense, Automated Cyber Defense (ACD), of which ACD training environments and autonomous agents are a part. The advantages and disadvantages of simulators and emulators are presented. It is shown that each information security task calls for a training environment suited to it. An introduction to reinforcement learning is given, together with a formal problem statement for reinforcement learning and for the domain under study. The main components of ACD training environments are identified; they can serve as a basis for creating one’s own training environment.
Reinforcement learning, ACO Gyms, autonomous agent, Automated Cyber Defense
Short URL: https://sciup.org/148331949
IDR: 148331949 | UDC: 004.942 | DOI: 10.18137/RNU.V9187.25.03.P.106