Developing intelligent assistants to search for content on websites of a certain genre

Rublev V.D., Sidorova E.A. (Vladislav Dmitrievich Rublev, Elena Anatolyevna Sidorova)

doi:10.14529/cmse220404



Journal: Bulletin of the South Ural State University. Series: Computational Mathematics and Software Engineering

Issue: vol. 11, no. 4, 2022.


This paper discusses an approach to the automatic generation of intelligent assistants that provide information search over the content of a website. A feature of the approach is the use of genre models developed for a given type of resource (educational, informational, etc.), on the basis of which the genre structuring and subsequent thematic clustering of the target website's content are performed. The resulting genre structures make it possible to define more precisely the boundaries of the thematic clusters related to the topic of the user's search query. A search quality evaluation on Russian-language websites showed an F-score of 87.8% and an originality of 80.9%, exceeding the results of the Yandex search engine by 1.1% and 9.1%, respectively. To anticipate the user's information needs, a method for refining the retrieved sample is proposed: based on the current and previous queries, it implicitly obtains information about what the user was not satisfied with in the previous search results. A model of user search intentions has been developed; its computational component includes a method for evaluating query closeness based on the FRiS (Function of Rival Similarity) function. Using the proposed methods, a chatbot for searching the websites of educational institutions was built on the Telegram messenger platform. Experiments showed that a user needs on average 1.75 clarifying questions to find the required information.
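The abstract names the FRiS (Function of Rival Similarity) function as the basis for evaluating query closeness, but gives neither the formula nor how it is applied to queries. As a minimal sketch, assuming the standard FRiS form F(z, a | b) = (r(z, b) - r(z, a)) / (r(z, b) + r(z, a)), where r is a distance and b is a rival of a, and assuming a plain bag-of-words cosine distance (a stand-in, not the authors' text representation), the Python below scores how close a new query is to a previous one relative to a rival query. The names `tf_vector`, `cosine_distance`, and `fris` are hypothetical.

```python
import math
from collections import Counter


def tf_vector(text: str) -> Counter:
    """Term-frequency vector from lowercased whitespace tokens (hypothetical, deliberately simple)."""
    return Counter(text.lower().split())


def cosine_distance(u: Counter, v: Counter) -> float:
    """Return 1 - cosine similarity of two sparse term-frequency vectors, in [0, 1]."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    if norm == 0.0:
        return 1.0  # an empty query shares nothing with anything
    # Clamp to guard against tiny negative values from floating-point rounding.
    return max(0.0, 1.0 - dot / norm)


def fris(z: Counter, a: Counter, b: Counter) -> float:
    """Rival similarity of z to a, with b as the competitor.

    Computes (r_b - r_a) / (r_b + r_a), where r_a and r_b are distances from z.
    The result lies in [-1, 1]: near 1 when z is much nearer to a,
    0 when z is equidistant from a and b, near -1 when z is nearer to b.
    """
    r_a = cosine_distance(z, a)
    r_b = cosine_distance(z, b)
    if r_a + r_b == 0.0:
        return 0.0  # z coincides with both a and b: no preference
    return (r_b - r_a) / (r_b + r_a)


if __name__ == "__main__":
    previous = tf_vector("master programs admission")
    current = tf_vector("master programs admission deadlines")
    rival = tf_vector("dormitory accommodation for students")
    # In this sketch, a positive score is read as "the current query refines the previous one".
    print(f"{fris(current, previous, rival):.3f}")  # prints 0.764
```

The rival term is what distinguishes FRiS from a plain distance: the same absolute distance to the previous query counts as "close" only if it is smaller than the distance to the competing alternative. Which objects the authors use as rivals is not stated in the abstract.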


Keywords: information retrieval, intelligent assistant, website genre model, thematic analysis, information retrieval system, user search intent model

Short URL: https://sciup.org/147239438

IDR: 147239438 | UDC: 004.912 | DOI: 10.14529/cmse220404

Development of intelligent assistants for searching the content of a website of a certain genre

This paper proposes an approach to creating intelligent assistants in the form of chatbots that support information search based on a model of user intentions and on preliminary genre and thematic clustering of website content. A feature of the approach is the use of genre models developed for a given type of resource (educational, informational, etc.), on the basis of which the genre structuring of the content of a specific website is performed. The resulting genre structures make it possible to determine more precisely the boundaries of the thematic clusters related to the topic of the user's search query. An evaluation of search quality on the NSU website showed an F-measure of 87.8% and an originality of 80.9%, exceeding the results of the Yandex search engine by 1.1% and 9.1%, respectively. To improve the quality of information support for the user, a model of user search intentions was developed; it makes it possible to implicitly obtain information about what the user was not satisfied with in the search results and to refine the new search query. In the practical part of the work, a chatbot was implemented on the Telegram messenger platform for information search over the websites of educational organizations. Experiments showed that a user needs on average 1.75 clarifying questions to find the required information.
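The abstracts report an F-score (F-measure) of 87.8% without defining how relevance was judged or how scores were averaged. A minimal sketch, assuming the balanced F1 (the harmonic mean of precision and recall) computed over the set of pages returned for one query versus the set judged relevant; the function name `precision_recall_f1` and the page paths are hypothetical, not data from the paper:

```python
def precision_recall_f1(retrieved: set[str], relevant: set[str]) -> tuple[float, float, float]:
    """Precision, recall and balanced F1 for one query's result set."""
    hits = len(retrieved & relevant)  # retrieved pages that were judged relevant
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of P and R
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical page paths for illustration only.
    retrieved = {"/admission", "/master", "/news/1", "/contacts"}
    relevant = {"/admission", "/master", "/master/deadlines"}
    p, r, f = precision_recall_f1(retrieved, relevant)
    print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")  # P=0.500 R=0.667 F1=0.571
```

Per-query scores would then be averaged over the evaluation query set; whether the authors macro- or micro-average, or weight precision and recall unequally, is not stated in the abstract.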

