A Survey on Hybrid Recommendation Engine for Businesses and Users
Authors: Spurthy Mutturaj, Shwetha B., Sangeetha P., Shivani Beldale, Sahana V.
Journal: International Journal of Information Engineering and Electronic Business @ijieeb
Published in issue: No. 3, Vol. 13, 2021.
Open access
Various techniques have been used over the years to implement recommendation systems. In this research, we analyzed several papers, the majority of which use collaborative and content-based filtering techniques to implement a recommender system. Building a recommender system requires an abundant amount of data comprising a spectrum of reviews and sentiments from all user domains. Websites such as Yelp and TripAdvisor allow users to post reviews of various businesses, products and services. In this work we have two objectives: 1) recommend restaurants to users based on user reviews in the Yelp dataset, and 2) suggest improvements to businesses based on user reviews. In the first scenario, we will use the comments and ratings available in the Yelp dataset to generate restaurant recommendations and personalize them with user profile data. In the second scenario, we intend to suggest improvements to businesses based on user reviews and provide them with a ranked list of predefined parameters to help them understand where they stand relative to their competitors and where they should improve. For both scenarios, we will perform two major steps: 1) sentiment analysis and 2) content-based recommendation. The first step gives us the sentiment, quality and subject of discussion relevant to the product; in the second step we use the outcomes of the first step to personalize and rank our results. We came across Gensim and Latent Dirichlet Allocation (LDA), which seemed interesting and well suited to our requirements. In the Yelp dataset, user comments are a mixture of various topics, which the LDA algorithm extracts in order to provide accurate recommendations for all users. A prototype of this method achieved 93% accuracy.
Gensim, LDA, Recommendation System, Topic Modelling
Short URL: https://sciup.org/15017782
IDR: 15017782 | DOI: 10.5815/ijieeb.2021.03.03
Full text of the article: A Survey on Hybrid Recommendation Engine for Businesses and Users
Published Online June 2021 in MECS
-
1. Introduction
-
2. Methodology
Recommender systems are primarily commercial applications and are a subclass of information filtering systems.
These systems help predict the rating or preference a user would give to an item. They are implemented in many areas and are most commonly used as playlist generators for video and music services, product recommenders for stores, and content recommenders for social media websites and the open web. Recommender systems have also been developed for a wide variety of other areas.
Popular recommender systems use two main approaches in their implementation: collaborative filtering and content-based filtering. Knowledge-based recommender systems have also been implemented. In the collaborative filtering technique, we build a model from a user's past rating preferences as well as behavior observed in other users. The model is then used to predict items that the user might like. The content-based filtering technique uses a list of labelled characteristics of an item to recommend similar items with related characteristics. Nowadays, recommender systems usually combine several approaches to build a hybrid recommender system.
This survey was conducted to explore the various methodologies that are available for 1) recommending restaurants to users and 2) suggesting improvements to businesses based on user reviews.
The limitations and disadvantages of collaborative filtering, knowledge-based filtering and demographic filtering are summarized in [Table 1]. For our requirements, content-based filtering seemed the better choice because it overcomes those limitations. The likes and dislikes of individuals usually vary: one customer might dislike a product that another enjoys, even though the two are like-minded. This is where content-based filtering (CBF) comes into the picture. Here the algorithm uses only the individual's own previous ratings to personalize the content, whereas techniques such as collaborative filtering also rely on reviews provided by other customers. Depending on the situation, this can be an advantage or a disadvantage.
The survey has helped us build a prototype for our project, in which we aim to build a hybrid recommender system in the restaurant domain. When tested on a subset of the Yelp academic dataset, the prototype achieved an accuracy of 93%.
Both Restaurant Recommendation system and Restaurant Improvement Recommendation system use a hybrid recommendation approach based on Sentiment Analysis and Content Based Recommendations.
-
[1] For the Sentiment Analysis system, we intend to use statistical methods to capture elements of subjective style and sentence polarity. We will study the following supervised machine learning algorithms in the context of Sentiment Analysis: K-Nearest Neighbor (K-NN), Naive Bayes, J48, BF Tree and OneR, and compare their overall accuracy, precision and recall values on the Yelp dataset.
-
[2] Content-based recommendation engine can take a user’s profile and past ratings to make new recommendations to the user. An item profile is created which contains the features of the item.
-
[3] From the item profiles, a user profile is inferred, which contains the weighted features of the items. Based on the user profile, the user is recommended similar items containing the features present in the item profiles.
-
[4] The Hybrid recommendation system will combine the output of the Content Based recommendation system and the Sentiment Analysis system to provide appropriate recommendations.
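The combination step in [4] can be sketched as a weighted blend of the two outputs. The text does not state how the outputs are combined, so the blending rule, the weight `alpha`, the function name `hybrid_rank` and the sample scores below are all illustrative assumptions rather than the authors' method.

```python
def hybrid_rank(content_scores, sentiment_scores, alpha=0.6):
    """Rank items by a weighted blend of two score dictionaries.

    Both dictionaries map item name -> score in [0, 1].
    alpha is an assumed weight for the content-based score.
    """
    blended = {}
    for item, content in content_scores.items():
        # Assumption: items without reviews get a neutral sentiment of 0.5.
        sentiment = sentiment_scores.get(item, 0.5)
        blended[item] = alpha * content + (1 - alpha) * sentiment
    return sorted(blended, key=blended.get, reverse=True)


# Hypothetical scores produced by the two subsystems.
content = {"Pasta Place": 0.9, "Sushi Bar": 0.7, "Burger Hut": 0.4}
sentiment = {"Pasta Place": 0.3, "Sushi Bar": 0.9, "Burger Hut": 0.8}

ranking = hybrid_rank(content, sentiment)
print(ranking)
```

With these sample numbers, a restaurant that matches the user's profile well but has poor review sentiment drops below one that scores reasonably on both.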
-
3. Sentiment Analysis
Sentiment analysis is a natural language processing method used to determine whether data is neutral, positive or negative. It is usually applied to textual data to help businesses monitor brand and product sentiment in customer feedback and understand customer requirements. Sentiment analysis models focus on polarity, feelings and emotions, urgency and even intentions. In our project, we intend to classify the opinions in the dataset.
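As an illustration of polarity classification, the following is a minimal sketch of a multinomial Naive Bayes classifier with add-one smoothing, one of the algorithms the methodology proposes to study. The class name `NaiveBayesSentiment` and the six training reviews are made up for the example; this is not the prototype described in the paper.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesSentiment:
    """Tiny multinomial Naive Bayes text classifier (illustrative only)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            words = text.lower().split()
            self.word_counts[label].update(words)
            self.doc_counts[label] += 1
            self.vocab.update(words)
        return self

    def predict(self, text):
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best_label, best_logp = None, -math.inf
        for label in self.doc_counts:
            # Log prior plus the sum of smoothed log likelihoods.
            logp = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in words:
                count = self.word_counts[label][word] + 1  # add-one smoothing
                logp += math.log(count / (total_words + len(self.vocab)))
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label


# Hypothetical labelled reviews.
train = [
    ("great food and friendly service", "positive"),
    ("loved the pasta great place", "positive"),
    ("terrible food and slow service", "negative"),
    ("rude staff terrible experience", "negative"),
    ("average food nothing special", "neutral"),
    ("okay place average prices", "neutral"),
]
model = NaiveBayesSentiment().fit([t for t, _ in train], [l for _, l in train])
print(model.predict("great friendly staff"))
print(model.predict("terrible slow service"))
```

A real system would need far more training data and proper tokenization; the sketch only shows how word counts per class turn into a polarity decision.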
-
A. Content Based Recommender System
This system works from data collected from the user, either implicitly or explicitly. From the collected data we create a user profile, which is then used to make suggestions to the user. The engine becomes more accurate as the user takes more actions.
User Profile: We create vectors that describe the user's preferences. To build the profile, we use the utility matrix, which defines the relation between users and items. With this data, the best estimate we can make of which items the user likes is an aggregation of the profiles of those items.
Item Profile: In this recommender, we have to build an item profile, which represents the prime characteristics of the item. For example, if the item is a restaurant, then its clientele, cuisines, dietary options and quality are its most significant features. We can also add its rating from the Yelp reviews here.
Utility Matrix: This matrix indicates the user's preference for certain items. From the information gathered from the user, we must find relations between the items the user likes and dislikes. We assign a value to each user-item pair; this value is known as the degree of preference. We then draw up a utility matrix of the user against the corresponding items to identify their preference relationships.
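The relationship between the item profile, the utility matrix and the user profile can be sketched in a few lines. The feature names, restaurants and ratings are hypothetical, and two design choices are assumptions made for the example: the user profile is a mean-centered, rating-weighted sum of item profiles, and candidates are ranked by cosine similarity.

```python
import math

# Assumed binary item features; a real item profile would be richer.
FEATURES = ["italian", "japanese", "vegetarian", "fine_dining"]

item_profiles = {
    "Pasta Place": [1, 0, 1, 0],
    "Trattoria":   [1, 0, 0, 1],
    "Sushi Bar":   [0, 1, 0, 1],
    "Veggie Wok":  [0, 0, 1, 0],
}

# Utility matrix: user -> {item: degree of preference on a 1-5 scale}.
utility = {"alice": {"Pasta Place": 5, "Sushi Bar": 1}}


def user_profile(ratings):
    """Aggregate rated item profiles, weighting each by (rating - mean rating)."""
    mean = sum(ratings.values()) / len(ratings)
    profile = [0.0] * len(FEATURES)
    for item, rating in ratings.items():
        for i, value in enumerate(item_profiles[item]):
            profile[i] += (rating - mean) * value
    return profile


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend(user):
    """Rank the items the user has not rated by similarity to the user profile."""
    ratings = utility[user]
    profile = user_profile(ratings)
    unseen = [item for item in item_profiles if item not in ratings]
    return sorted(unseen, key=lambda i: cosine(profile, item_profiles[i]), reverse=True)


recommendations = recommend("alice")
print(recommendations)
```

Mean-centering lets a low rating push the profile away from an item's features instead of merely contributing less.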
-
B. Gensim
Gensim is an open-source NLP library for unsupervised topic modeling that uses modern statistical machine learning. It is implemented in Python and Cython. It is designed to handle large volumes of text using data streaming and incremental online algorithms, which sets it apart from most other machine learning software packages, which target only in-memory processing.
The key advantages of Gensim are as follows:
First, although word embedding and topic modeling facilities are available in other packages such as scikit-learn, the facilities Gensim provides for building word embeddings and topic models are unmatched. It also provides more convenient facilities for text processing.
Second, it lets us handle huge text files without loading the whole file into memory.
Third, it does not require costly annotation or hand tagging of documents because it uses unsupervised models.
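The memory advantage comes from streaming: Gensim models accept any iterable that yields one document at a time as a list of (token id, count) pairs. The plain-Python sketch below shows that pattern without depending on Gensim itself. The class name `StreamingCorpus`, the hand-written `token2id` mapping and the demo file are assumptions for illustration; with Gensim, the mapping would normally come from `gensim.corpora.Dictionary`.

```python
import os
import tempfile
from collections import Counter


class StreamingCorpus:
    """Yield one bag-of-words per line without loading the whole file."""

    def __init__(self, path, token2id):
        self.path = path
        self.token2id = token2id

    def __iter__(self):
        with open(self.path, encoding="utf-8") as handle:
            for line in handle:  # only the current line is held in memory
                counts = Counter(
                    token for token in line.lower().split() if token in self.token2id
                )
                yield sorted((self.token2id[t], n) for t, n in counts.items())


# Write a small demo file; in practice this would be a large review dump.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("great pizza great service\nslow service\n")
    path = tmp.name

token2id = {"great": 0, "pizza": 1, "service": 2, "slow": 3}
corpus = StreamingCorpus(path, token2id)
docs = list(corpus)  # materialized here only to inspect the demo output
os.remove(path)
print(docs)
```

Because the object is re-iterable, a training routine can make several passes over the file while keeping memory use constant.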
-
C. Topic Modelling with Gensim
Topic modeling is a widely used method for extracting hidden topics from large volumes of textual data. Latent Dirichlet Allocation is a popular topic modeling algorithm with an excellent implementation in the Gensim package. The main challenge, however, is to extract good-quality topics that are meaningful, well separated and clear. This depends largely on the quality of text pre-processing and on the approach used to find the ideal number of topics.
-
D. Latent Dirichlet Allocation (LDA)
Latent Dirichlet Allocation approaches topic modelling by treating each document as a collection of topics in a certain proportion, and each topic as a collection of keywords in a certain proportion.
Once we provide the algorithm with the number of topics, it rearranges the topic distribution within the documents and the keyword distribution within the topics to obtain a good composition of the topic-keyword distribution. A topic is a group of dominant keywords that act as typical representatives of the item.
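The description above corresponds to the standard LDA generative model, summarized here for reference. The symbols are not used in the original text and are introduced only for this sketch: K is the number of topics supplied to the algorithm, and α and β are the Dirichlet hyperparameters.

```latex
\begin{align*}
\theta_d &\sim \mathrm{Dirichlet}(\alpha) && \text{topic proportions of document } d \\
\phi_k &\sim \mathrm{Dirichlet}(\beta) && \text{keyword distribution of topic } k,\ k = 1,\dots,K \\
z_{d,n} &\sim \mathrm{Categorical}(\theta_d) && \text{topic assigned to the } n\text{-th word of } d \\
w_{d,n} &\sim \mathrm{Categorical}(\phi_{z_{d,n}}) && \text{the observed word} \\
p(w \mid d) &= \sum_{k=1}^{K} \theta_{d,k}\,\phi_{k,w} && \text{probability of word } w \text{ in document } d
\end{align*}
```

Fitting the model means inferring the document-topic proportions and the topic-keyword distributions that best explain the observed words.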
The major factors in obtaining well-separated topics are:
-
1. The quality of text processing.
-
2. The diversity of topics the text represents.
-
3. The algorithm used for topic modeling.
-
4. The tuning parameters used in the algorithm.
-
5. The number of topics we feed into the algorithm.
-
4. Literature Survey
To summarize, we can store the user reviews in a database and clean them: split them into sentences, remove stop words, filter out all words that are not nouns, look up the lemma of each noun, and store the result in a final database. This is then used to train the Gensim LDA model, keeping the most frequent tokens and using a small number of topics. We then display the output of the previous step to visualize how topics are distributed in a given text. In essence, it gives us the "basket(s)" in which we can place the text.
The following is a brief survey of papers on recommender systems over the years and of the different methods they implement.
-
[1] The paper focuses on a sentiment-focused web crawling platform that enables the rapid discovery and analysis of sentimental content in movie and hotel reviews. Statistical techniques are used to capture subjective style and sentence polarity. Two supervised machine learning algorithms, K-Nearest Neighbor (K-NN) and Naïve Bayes, are discussed in detail in [1], and their overall accuracy, precision and recall values are compared. Naïve Bayes showed much better results than K-NN for movie reviews, but for hotel reviews the two algorithms gave lower and roughly similar accuracies. From this paper we can conclude that the Naïve Bayes classifier provides more accurate results than the K-NN algorithm.
-
[2] Recommender systems model user interests in order to recommend products to buy or examine. They have become essential applications in electronic commerce and information access, offering recommendations that efficiently prune large information spaces so that consumers are led towards the items that best suit their needs and preferences. From this paper we can state that recommender systems are widely used because they prioritize the interests of the user or consumer.
-
[3] A number of techniques have been proposed for making recommendations, including content-based, collaborative, knowledge-based and other techniques. These approaches have often been combined into hybrid recommenders to boost the overall performance.
The proposed hybrid, which incorporates content-based filtering within a collaborative framework, performs considerably better than pure content-based filtering, pure collaborative filtering, and a simple combination of the two. It addresses the shortcomings of both collaborative filtering and content-based filtering by strengthening collaborative filtering with content, and vice versa. Moreover, because of the nature of the method, any improvement in collaborative filtering or content-based filtering can readily be used to build a more powerful recommendation framework.
Sentiment analysis of customer feedback has a critical influence on a company's growth strategy. Although a review repository evolves over time, sentiment analysis usually relies on offline solutions in which the training data set is collected before a model is constructed. Incremental learning is the best alternative because it avoids retraining the entire model over time. In this work, a version of online random forests is implemented to perform sentiment analysis on customer feedback. The model achieves accuracy similar to that of offline approaches and comparable to that of other online models.
-
[4] Recommender systems use machine learning techniques to filter unseen information and can predict whether a user would like a given resource. The three key forms of recommender system are collaborative filtering, content-based filtering, and demographic recommender systems. Collaborative filtering recommender systems recommend items by considering the tastes of users, under the assumption that users will be interested in items that have been highly rated by similar users.
Content-based filtering recommender systems recommend items based on an item's textual information, under the assumption that users will like items similar to the ones they were fond of before. Demographic recommender systems categorize users or items based on their personal attributes and make recommendations based on demographic categorizations. Such systems suffer from scalability, data sparsity and cold-start problems, resulting in poor-quality recommendations and reduced coverage. In this paper, a specific cascading hybrid recommendation approach is proposed that combines the rating, feature and demographic information about items. Ultimately, collaborative filtering, content-based filtering and demographic recommendation (see [Table 1]) can be used to filter the unseen information on which users can rely.
-
[5] Travelers' online feedback has replaced word of mouth, but searching through it according to user preferences is a time-consuming and tough chore. Reviews written by travelers after their visits are a popular and useful source of knowledge for hotel recommendation, but little consideration has been given to how to present a reviewer's review in a comprehensible way. The aim of the paper is to recommend hotels to travelers based on their preferences by examining the feedback and ratings of various travelers.
To resolve this problem, a context-aware hybrid approach provides customized hotel recommendations by combining the collaborative filtering technique with sentiment analysis. Feature-based sentiment analysis is then carried out, in which the weight of each feature is determined in order to compute its orientation score.
-
[6] Recommender systems have become a significant research field because they are a kind of network intelligence technique for scanning through the massive amount of information available in digital form. In most recommender systems, collaborative filtering and content-based techniques are the two most widely used methods. Both have pros and cons in delivering good recommendations, and in certain cases a hybrid recommendation mechanism combining components of both approaches achieves acceptable results.
This paper presents a well-designed and efficient framework for integrating collaboration and content. It uses a content-based predictor to make the most of existing user information and item-related data, and then offers personalized suggestions through user-based and item-based collaborative filtering. The proposed framework applies clustering to the content-based and collaborative approaches, which leads to improved prediction quality for the hybrid recommender system. We can achieve optimized results by integrating collaborative filtering and content-based techniques.
In order to construct an analytical model of customer restaurant ratings, the concepts and techniques of recommendation systems are applied. Using Yelp's dataset, collaborative and content-based features are extracted to classify profiles of customers and restaurants.
-
[7] The paper implements, in particular, singular value decomposition, a hybrid cascade of K-nearest neighbor clustering, weighted bipartite graph projection, and several other learning algorithms. The performance of the algorithms is analyzed and compared using the Root Mean Squared Error and Mean Absolute Error metrics.
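Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE) are the usual error metrics for comparing predicted ratings with actual ratings. A minimal sketch follows; the rating values are made up for illustration and are not results from [7].

```python
import math


def rmse(actual, predicted):
    """Root Mean Squared Error: penalizes large errors more heavily."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def mae(actual, predicted):
    """Mean Absolute Error: average size of the error in rating units."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical star ratings and model predictions.
actual = [5, 3, 4, 1]
predicted = [4, 3, 5, 3]

print(rmse(actual, predicted))
print(mae(actual, predicted))
```

Lower values are better for both metrics; RMSE is never smaller than MAE on the same data, and the gap widens when a few predictions are far off.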
Conventional recommender systems recommend items on the basis of a single criterion, whereas multi-criteria approaches consider several different criteria for each item. Although multi-criteria recommender systems achieve consistent precision, they need many previous users to first rate items against these criteria, and it is practically impossible to obtain user ratings for every individual criterion of each item. [8] introduces a Multi Criteria Recommendation Framework for hotel recommendations that picks the best-matching hotel in a city based on the preferences of the user and the feedback of other users.
The authors of [8] apply various natural language processing techniques to a corpus of hotel reviews to assess how previous users rate a hotel with regard to different criteria, and create a user-item-feature database. The paper also discusses the text-messaging-language problem and the cold-start problem that arise when extracting user feedback. Hybrid recommendation is a key method for overcoming the defects of single-technique recommendation systems, and it will continue to play an active role in recommendation systems.
-
[9] With the popularity of the Internet and an increasingly varied range of products, the recommendation system has entered our everyday lives as a common solution that assists decision making when we buy something online. The traditional recommendation method is user-based collaborative filtering; to obtain better results, Amazon proposed an item-based collaborative filtering algorithm.
Through the analysis of these two conventional algorithms, a personalized recommendation system model based on both users and items is proposed. Tests were then conducted on the MovieLens-100K dataset and the recommendation results were analyzed. Performance improved compared to the standard collaborative filtering algorithm. In this paper, we analyzed and compared the limitations of different existing recommendation systems and how we can implement them in our project.
An overview of recommendation systems is given in [10]. Recommendation is a subfield of data mining. In the current age of e-commerce, recommendation systems are used to support organizations in implementing one-to-one marketing campaigns. Such strategies provide many benefits, such as building customer loyalty, raising the chance of cross-selling, and meeting customer needs by offering items that match customer interests. The recommendation system (RS) is essential in many web applications. Recommendation systems are generally divided into three categories: content-based, collaborative and hybrid approaches. Each category has its own benefits and disadvantages. The paper describes the various techniques in each category and the problems each one faces.
-
[11] The paper introduces a hybrid recommendation algorithm based on collaborative filtering and Word2Vec. It is motivated by the observation that conventional recommendation technology, when applied to large-scale mobile data, struggles to maintain both accuracy and efficiency. The traditional collaborative filtering algorithm is combined with the MapReduce framework and the Hive database.
A Word2Vec model is trained on the tag information in the data to obtain the similarity between the tags and the user's applications. The recommendation results are combined by weight according to the user's feedback actions.
The experimental findings show that the hybrid recommendation algorithm significantly increases recommendation performance and accuracy and makes recommendation more reliable on large-scale data. Thus, the Word2Vec-based recommendation approach outperforms the traditional recommendation algorithm.
In [12], latent subtopics are discovered in Yelp restaurant reviews by running an online Latent Dirichlet Allocation (LDA) algorithm. The aim is to identify what customers want from a large, high-dimensional collection of reviews. Such topics may give restaurants meaningful insights into what clients care about, helping them improve their Yelp scores, which directly affect their sales. The authors used the open dataset from the Yelp Dataset Challenge, which contains over 158,000 restaurant reviews.
Online LDA, a generative probabilistic model for collections of discrete data such as text corpora, is used to find latent subtopics in the reviews. The paper presents the breakdown of hidden topics across all reviews, predicts star ratings for the hidden topics found, and extends the results with temporal information about restaurants' peak hours. Overall, the paper uncovers some fascinating observations and methodology. The Latent Dirichlet Allocation algorithm thus works on user reviews and is able to isolate the areas of interest for restaurants.
-
[13] Knowledge-based recommendations are based on practical knowledge: they understand how a specific item serves a specific consumer need and can therefore reason about the relationship between that need and a potential recommendation. A user must complete a basic questionnaire about his skills, interests, and potential goals, among other things, and then receive suggestions based on his responses.
-
[14] A recommendation system is critical when looking for information on the internet. By delivering the best providers, the recommender system addresses the problem of information overload and improves customer relationships. In content-based filtering, an outlier navigation pattern can leave the recommender system unable to recognize users' strong preferences, resulting in irrelevant recommendations. A hybrid approach mitigates these issues more effectively.
-
[15] This paper presents an analysis of evaluations of TEL (technology-enhanced learning) recommendation systems, raising many questions, problems and challenges that these systems face. The authors also discuss related innovations and propose an architecture for a personalized learning content recommendation system. By making relevant recommendations, the proposed architecture is well suited to advising students in selecting suitable learning materials for their tests.
-
5. Comparative Study
-
6. Conclusion
Table 1
Method | Advantages | Disadvantages
Collaborative filtering system | No domain knowledge necessary: the embeddings are learned automatically. Serendipity: the model can help users discover new interests, even when the system on its own would not know that a given item interests the user. Good starting point: to some extent, the system needs only the feedback matrix to train a model. | Cannot handle fresh items: the model's prediction for a given user-item pair is the dot product of the corresponding embeddings, so if an item was not seen during training, the system cannot create an embedding for it and cannot query the model with it. This is often called the cold-start problem.
Content-based filtering system | Since the suggestions are specific to the individual user, the model needs no data about other users, which makes it easy to scale to a large number of users. The model can capture a user's particular interests and can recommend niche items that very few other users are interested in. | Since the feature representation of the items is hand-engineered to some degree, the technique requires a lot of domain knowledge. The model can only make suggestions based on the user's existing interests.
Hybrid recommender systems | They combine two or more recommendation methods to gain better performance with fewer of the drawbacks of any individual one. Feature combination: lets the system take collaborative data into account without relying on it exclusively, which reduces its sensitivity to the number of users who have rated an item. Cascade: lets the system avoid applying the second, lower-priority technique to items that are already well differentiated by the first. | Delivering more reliable and user-oriented assessments remains a long-standing challenge. In addition, new problems have emerged, such as reacting to changes in user context, shifting user preferences, and offering cross-domain recommendations.
Demographic recommender system | Unlike collaborative and content-based recommender schemes, it does not require a history of user ratings. Can identify cross-genre niches. Domain knowledge is not needed. Adaptive: quality improves over time. | Faces the new-user ramp-up problem. Quality depends on a large dataset of historical data. Suffers from the stability vs. plasticity problem.
Knowledge-based recommender system | No ramp-up required. Sensitive to changes in preference. Can include non-product features. Can map from user needs to products. | Mapping user needs to products requires knowledge engineering.
To summarize the comparative study: the basic concept behind CF-based algorithms is to make item suggestions or predictions based on the opinions of other users who have similar interests. Users' opinions may be obtained from them explicitly or through implicit measures. Content-based filtering uses item features to suggest other items similar to what the user likes, based on the user's previous activities or explicit feedback.
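The CF idea of using the opinions of similar users can be illustrated with a small user-based example. The users, restaurants and ratings are hypothetical, and the choice of cosine similarity over co-rated items with a similarity-weighted average is one common variant, assumed here for illustration.

```python
import math

# Hypothetical ratings: user -> {item: rating on a 1-5 scale}.
ratings = {
    "alice": {"Pasta Place": 5, "Sushi Bar": 1, "Veggie Wok": 4},
    "bob":   {"Pasta Place": 5, "Sushi Bar": 1, "Burger Hut": 5},
    "carol": {"Pasta Place": 1, "Sushi Bar": 5, "Burger Hut": 1},
}


def similarity(u, v):
    """Cosine similarity computed over the items both users have rated."""
    common = set(ratings[u]) & set(ratings[v])
    if not common:
        return 0.0
    dot = sum(ratings[u][i] * ratings[v][i] for i in common)
    norm_u = math.sqrt(sum(ratings[u][i] ** 2 for i in common))
    norm_v = math.sqrt(sum(ratings[v][i] ** 2 for i in common))
    return dot / (norm_u * norm_v)


def predict(user, item):
    """Similarity-weighted average of other users' ratings for the item."""
    numerator = denominator = 0.0
    for other in ratings:
        if other != user and item in ratings[other]:
            weight = similarity(user, other)
            numerator += weight * ratings[other][item]
            denominator += weight
    return numerator / denominator if denominator else None


prediction = predict("alice", "Burger Hut")
print(round(prediction, 2))
```

Alice's predicted rating leans towards Bob's opinion because their past ratings agree, while Carol's opposite tastes carry less weight.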
Hybrid recommender systems combine two or more recommendation methods in a variety of ways to take advantage of their synergistic benefits. We may conclude that using hybrid recommendation strategies in learning environments can be beneficial because precision improves.
The demographic recommender system is designed to classify users based on their characteristics and make suggestions based on demographic groups. Many companies have adopted this strategy because it is not too complicated or difficult to execute. Knowledge-based recommendations are based on practical knowledge: they understand how a specific item serves a specific consumer need and can therefore reason about the relationship between that need and a potential recommendation.
Customer experience is the key to business prosperity. Websites allow users to post online reviews of various businesses, products and services. In this project, we intend to recommend restaurants to users as well as suggest enhancements to businesses based on user reviews. This hybrid recommendation approach is based on sentiment analysis and content-based recommendation. Recommendation engines have the capacity to change the way websites communicate with users and to allow companies to maximize their return on investment based on the data they can collect on each customer's preferences and purchases.
As mentioned earlier, the prototype of the content-based recommender system achieved 93% accuracy. We hope to improve on this in the main implementation of the project.
Our approach focuses on Gensim and LDA, which have not been used extensively for recommender systems. Topic modeling is another aspect that we would like to explore in order to visualize the data.
References
- Lopamudra Dey, Sanjay Chakraborty, Anuraag Biswas, Beepa Bose, Sweta Tiwari,"Sentiment Analysis of Review Datasets Using Naïve Bayes' and K-NN Classifier", International Journal of Information Engineering and Electronic Business(IJIEEB), Vol.8, No.4, pp.54-62, 2016. DOI: 10.5815/ijieeb.2016.04.07.
- Prof Vipul vekariya and Dr G R Kulkarni. Hybrid Recommender systems: survey and Experiments. Journal of information, knowledge and research in computer engineering 2012.
- Tri Doan and Jugal Kalita. Sentiment Analysis of Restaurant Reviews on Yelp with Incremental Learning. 2016 15th IEEE International Conference on Machine Learning and Applications.
- Mustansar Ali Ghazanfar and Adam Prugel-Bennett. A Scalable, Accurate Hybrid Recommender System. 2010 Third International Conference on Knowledge Discovery and Data Mining.
- Khushbu Jalan and Prof. Kiran Gawande. Context-Aware Hotel Recommendation System based on Hybrid Approach to Mitigate Cold-Start-Problem. International Conference on Energy, Communication, Data Analytics and Soft Computing (ICECDS-2017).
- Sutheera Puntheeranurak and Hidekazu Tsuji. A Multi-Clustering Hybrid Recommender System. Seventh International Conference on Computer and Information Technology. © 2007 IEEE conference.
- Sumedh Sawant, Gina Pai. Yelp Food Recommendation System.
- Yashvardhan Sharma, Jigar Bhatt, Rachit Magon A Multi Criteria Review-Based Hotel Recommendation System. 2015 IEEE International Conference on Computer and Information Technology, Ubiquitous Computing Communications Dependable, Autonomic and Secure Computing; Pervasive Intelligence and Computing.
- Yannan Song, Shi Liu, Wei Ji. Research on Personalized Hybrid Recommendation System. Published in: 2017 international conference on computer, Information and Telecommunications System (CITS).
- Lipi Shah, Hetal Gaudani and Prem Balani. Survey on Recommendation System. International Journal of Computer Applications Volume 137 March 2016.
- Yao Xiao, Quan Shi. Research and Implementation of Hybrid Recommendation Algorithm Based on Collaborative Filtering and Word2Vec. 2015 8th International Symposium on Computational Intelligence and Design.
- James Huang, Stephanie Rogers, Eunkwang Joo. Improving Restaurants by Extracting Subtopics from Yelp Reviews. In Conference 2014 (Social Media Expo).
- Ya-Han Hu, Ju Lee, Kuanchin Chen, J. Michael Tarn, Duyen-Vi Dang. Hotel Recommendation System Based on Review and Context Information: A Collaborative Filtering Appro. (2016). PACIS 2016 Proceedings.
- Richa Sharma, Sharu Vinayak, Rahul Singh,"Guide Me: A Research Work Area Recommender System", International Journal of Intelligent Systems and Applications(IJISA), Vol.8, No.9, pp.30-37, 2016. DOI: 10.5815/ijisa.2016.09.04 .
- Santosh Kumar, Varsha," Survey on Personalized Web Recommender System", International Journal of Information Engineering and Electronic Business(IJIEEB), Vol.10, No.4, pp. 33-40, 2018. DOI: 10.5815/ijieeb.2018.04.05.
- Thoufeeq Ahmed Syed , Vasile Palade , Rahat Iqbal and Smitha Sunil Kumaran Nair. A Personalized Learning Recommendation System Architecture for Learning Management System. In Proceedings of the 9th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management. 2017 MECS.