Application of topic modeling methods to identify groups of internet resources in order to reduce the risk of cyber threats

Authors: Dontsov D. Y., Isaev S. V.

Journal: Siberian Aerospace Journal

Section: Informatics, computer technology and management

Issue: Vol. 23, No. 2, 2022.

Internal network security is an important factor in an enterprise's success. Various tools exist for preventing cyber threats and analyzing the Internet resources that users visit, but their speed and applicability depend strongly on the volume of input data. This article reviews existing methods for detecting network threats by analyzing proxy server logs and proposes a method for clustering Internet resources that reduces the volume of input data, either by excluding groups of safe Internet resources or by selecting only suspicious ones. The proposed method consists of three stages: data preprocessing, data analysis, and interpretation of the results. The input to the method is a set of proxy server log entries. In the first stage, the data useful for analysis is selected from the source data, and the continuous data stream is then divided into short sessions using kernel density estimation. In the second stage, the visited Internet resources are soft-clustered using a topic modeling method; the output of this stage is a set of unlabeled groups of Internet resources. In the third stage, an expert interprets the results by analyzing the most popular Internet resources in each group. Each stage has many tunable settings, which allows the method to be adapted to any input data format and its specifics, and the method's scope of application is not restricted. The method can therefore be used as an additional preprocessing step to reduce the amount of input data.
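The abstract does not state the proxy log format or which fields the first stage keeps. As a minimal sketch, assuming Squid's native `access.log` format and assuming that only the timestamp, client address, and requested domain are needed, the selection step could look like this (`Visit`, `parse_squid_line`, and `select_visits` are hypothetical names, not the authors' code):

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Visit:
    """The subset of a log entry kept for analysis (assumed field choice)."""
    timestamp: float  # Unix time in seconds
    client: str       # client IP address
    domain: str       # host name of the requested resource


def parse_squid_line(line: str) -> Optional[Visit]:
    """Parse one line of Squid's native access.log format (an assumption).

    Expected whitespace-separated fields:
    time elapsed client code/status bytes method URL ident hierarchy/peer type
    """
    fields = line.split()
    if len(fields) < 7:
        return None  # malformed or truncated line
    method, url = fields[5], fields[6]
    if method == "CONNECT":
        # HTTPS tunnels are logged as "host:port" without a scheme
        host = url.rsplit(":", 1)[0].lower()
    else:
        host = urlsplit(url).hostname or ""  # hostname is already lowercased
    if not host:
        return None
    try:
        ts = float(fields[0])
    except ValueError:
        return None
    return Visit(timestamp=ts, client=fields[2], domain=host)


def select_visits(lines: Iterable[str]) -> List[Visit]:
    """Keep only the parseable entries, ordered by time."""
    visits = [v for v in map(parse_squid_line, lines) if v is not None]
    return sorted(visits, key=lambda v: v.timestamp)
```

Dropping everything except time, client, and domain is one way to shrink the data before sessionization; a real deployment might keep more fields (status code, bytes, content type) if later stages need them.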
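The abstract says only that the continuous stream is split into sessions with kernel density estimation. One plausible reading, sketched here under that assumption, is to estimate the density of one client's request timestamps with a Gaussian kernel and cut the stream at local minima of the estimate. The bandwidth `h` and grid `step` are hypothetical tuning parameters, and `split_sessions` is an illustrative name:

```python
import math
from bisect import bisect_right
from typing import List, Sequence


def kde(points: Sequence[float], x: float, h: float) -> float:
    """Gaussian kernel density estimate at x with bandwidth h."""
    norm = 1.0 / (len(points) * h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - p) / h) ** 2) for p in points)


def split_sessions(times: Sequence[float], h: float, step: float) -> List[List[float]]:
    """Split one client's sorted timestamps at local minima of their KDE."""
    if not times:
        return []
    lo = times[0] - 3 * h  # pad by 3h so the kernel tails are covered
    n = int((times[-1] + 3 * h - lo) / step) + 1
    grid = [lo + i * step for i in range(n)]
    dens = [kde(times, x, h) for x in grid]
    # A cut is a grid point where the density stops falling and starts rising
    cuts = [grid[i] for i in range(1, n - 1) if dens[i - 1] > dens[i] <= dens[i + 1]]
    sessions: List[List[float]] = [[] for _ in range(len(cuts) + 1)]
    for t in times:
        sessions[bisect_right(cuts, t)].append(t)
    return [s for s in sessions if s]  # drop empty slices between adjacent cuts


print(split_sessions([0, 10, 20, 30, 1000, 1010, 1020], h=30, step=5))
```

For two isolated requests, the sum of their Gaussian kernels dips between them only when they are more than 2h apart, so h acts roughly as the shortest pause that separates two sessions.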
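For the second stage, the abstract names topic modeling but not a specific model. The sketch below assumes LDA fitted by collapsed Gibbs sampling, treats each session as a document and each visited domain as a word, and returns soft memberships: `theta[d][k]` is the share of topic k in session d, and `phi[w][k]` is the probability of domain w in topic k. The hyperparameters `alpha`, `beta`, and `iters` are illustrative assumptions, and the code is plain Python only to stay self-contained:

```python
import random
from typing import Dict, List, Sequence, Tuple


def lda_gibbs(docs: Sequence[Sequence[str]], n_topics: int,
              alpha: float = 0.1, beta: float = 0.01,
              iters: int = 200, seed: int = 0,
              ) -> Tuple[List[List[float]], Dict[str, List[float]]]:
    """Fit LDA by collapsed Gibbs sampling (illustrative, not optimized).

    docs: sessions, each a list of visited domains (the "words").
    Returns (theta, phi) with theta[d][k] = p(topic k | session d)
    and phi[w][k] = p(domain w | topic k).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    n_vocab = len(vocab)
    ndk = [[0] * n_topics for _ in docs]      # session-topic counts
    nkw = {w: [0] * n_topics for w in vocab}  # domain-topic counts
    nk = [0] * n_topics                       # tokens assigned to each topic

    def update(d: int, w: str, k: int, delta: int) -> None:
        ndk[d][k] += delta
        nkw[w][k] += delta
        nk[k] += delta

    # Random initial topic for every token (every visit in every session)
    z = [[rng.randrange(n_topics) for _ in doc] for doc in docs]
    for d, doc in enumerate(docs):
        for w, k in zip(doc, z[d]):
            update(d, w, k, +1)

    topics = range(n_topics)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                update(d, w, z[d][i], -1)  # remove the token's current assignment
                # Unnormalized full conditional p(z = t | all other assignments)
                weights = [(ndk[d][t] + alpha) * (nkw[w][t] + beta)
                           / (nk[t] + n_vocab * beta) for t in topics]
                z[d][i] = rng.choices(topics, weights=weights)[0]
                update(d, w, z[d][i], +1)

    theta = [[(ndk[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in topics]
             for d, doc in enumerate(docs)]
    phi = {w: [(nkw[w][t] + beta) / (nk[t] + n_vocab * beta) for t in topics]
           for w in vocab}
    return theta, phi
```

Because every session gets a full distribution over topics rather than a single label, a domain that appears in several kinds of sessions can belong partly to several groups, which is what makes this a soft clustering.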
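The third stage and the resulting data reduction can be illustrated as follows, assuming a fitted topic model supplies `phi`, a mapping from each domain to its probabilities p(domain | topic). Here "most popular" is approximated by the most probable domains in each group, and the domain-to-group assignment is a simplification that treats all topics as equally likely; both are assumptions, and the function names are hypothetical:

```python
from typing import Dict, List, Sequence, Set


def top_domains(phi: Dict[str, List[float]], topic: int, n: int = 10) -> List[str]:
    """The n domains with the highest p(domain | topic), for expert review.

    Ties are broken alphabetically so the listing is stable.
    """
    return sorted(phi, key=lambda w: (-phi[w][topic], w))[:n]


def dominant_topic(weights: Sequence[float]) -> int:
    """Index of the largest weight (the first one on ties)."""
    return max(range(len(weights)), key=lambda k: weights[k])


def filter_visits(domains: Sequence[str], phi: Dict[str, List[float]],
                  safe_topics: Set[int]) -> List[str]:
    """Drop visits to domains whose group an expert has marked as safe.

    Simplification (assumption): each domain is assigned to the topic with the
    largest p(domain | topic), i.e. all topics are treated as equally likely.
    Domains the model has never seen are kept for further analysis.
    """
    return [w for w in domains
            if w not in phi or dominant_topic(phi[w]) not in safe_topics]
```

In this sketch, an expert who recognizes topic 0 as, for example, corporate mail and documentation would pass `{0}` as `safe_topics`, and only visits outside that group would go on to the heavier threat-analysis tools.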

Keywords: topic modeling, cyber security, data analysis

Short URL: https://sciup.org/148329616

IDR: 148329616   |   DOI: 10.31772/2712-8970-2022-23-2-148-155

Article type: scientific article