Recommender systems are among the most successful applications of machine learning. A recommender system is a subclass of information filtering system that seeks to predict the preference or rating a user would give to a product or item.
Recommender systems are used in many areas, such as recommending videos on Netflix and YouTube and generating song playlists on Spotify. Amazon uses a recommender system to suggest products to its customers, e.g. “who bought this item also bought this”.
- 35% of Amazon.com’s revenue is generated by its recommendation engine.
- In 2006, Netflix announced a $1 million prize for a 10% improvement over its existing recommender system.
- Content-Based Filtering:
A content-based recommender system uses features of the items themselves; for a movie recommender, these could be genre, actors, director, and so on. Content-based filtering does not need the past activity of other users; it relies only on the user's own profile and the item metadata. Netflix offers a good illustration. When you sign up for Netflix for the first time, it has no record of your preferences or past activity to compare with other users, so it first asks for your preferences, such as languages and genres, and uses them to build your first page. Later, if you watch a Harry Potter movie and like it, Netflix learns that you like fantasy and can suggest other fantasy movies, such as The Hobbit, based on the metadata of the movies you have liked or watched.
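A minimal content-based sketch, assuming a tiny hypothetical catalogue in which each movie is described only by its genres and the user profile is reduced to a single liked movie. It ranks the other movies by Jaccard similarity of their genre sets (our choice of metric for illustration; real systems use many more features), so liking Harry Potter surfaces The Hobbit without looking at any other user's activity.

```python
# Hypothetical catalogue: each movie is described only by its genre metadata.
CATALOGUE: dict[str, set[str]] = {
    "Harry Potter": {"fantasy", "adventure", "family"},
    "The Hobbit": {"fantasy", "adventure"},
    "The Notebook": {"romance", "drama"},
    "Alien": {"sci-fi", "horror"},
}


def jaccard(a: set[str], b: set[str]) -> float:
    """Fraction of genres shared by two movies (0 = none, 1 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_similar(liked: str, catalogue: dict[str, set[str]], k: int = 2) -> list[str]:
    """Return up to k other movies whose metadata overlaps with the liked movie."""
    scores = {
        title: jaccard(catalogue[liked], genres)
        for title, genres in catalogue.items()
        if title != liked
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    # Drop movies that share no metadata at all with the liked one.
    return [title for title in ranked[:k] if scores[title] > 0]


print(recommend_similar("Harry Potter", CATALOGUE))  # ['The Hobbit']
```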
- Collaborative Filtering:
Collaborative filtering uses users' past behaviour to suggest the next movie: we look for users with similar tastes, based on their ratings or on their likes and dislikes of movies or products. Let us understand this with an example. Suppose we run a business like Netflix and need to build a recommender system that suggests the next movie based on user ratings.
Suppose we have a 10 × 10 user × movie matrix covering 10 users and 10 movies, in which each cell holds the rating a user gave to a movie (a real-life matrix is far larger). Here, u1 = user 1, m1 = movie 1, and so on. For example, user 1 rated movie 2 as 4, so matrix[u1][m2] = 4, and a “-” means the user has not rated or watched the movie.
By analysing this matrix, we find that user 5 and user 8 are very similar, because they gave the same ratings to movie 3, movie 4, movie 5, and movie 6.
User 8 has already watched movie 8 and rated it 4, which means user 8 likes it, so we can suggest movie 8 to user 5.
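Because the original matrix is not reproduced here, the sketch below uses a small hypothetical ratings table that recreates the same situation: user 5 and user 8 agree exactly on movies 3 to 6, and only user 8 has rated movie 8. Only rated cells are stored, so a missing key plays the role of “-”. The neighbour is chosen by counting identical ratings, which mirrors the manual reasoning above; the helper names and the assumption that a rating of 4 or more means “liked” are ours.

```python
# Hypothetical ratings; a movie missing from a user's dict means "-" (not rated).
RATINGS: dict[str, dict[str, int]] = {
    "u2": {"m1": 5, "m3": 1, "m4": 5, "m5": 2, "m8": 2},
    "u5": {"m3": 4, "m4": 2, "m5": 5, "m6": 3},
    "u8": {"m3": 4, "m4": 2, "m5": 5, "m6": 3, "m8": 4},
}


def exact_matches(a: dict[str, int], b: dict[str, int]) -> int:
    """Count movies that both users rated with exactly the same score."""
    return sum(1 for movie in a.keys() & b.keys() if a[movie] == b[movie])


def most_similar_user(target: str, ratings: dict[str, dict[str, int]]) -> str:
    """Pick the other user who agrees with the target on the most ratings."""
    others = [user for user in ratings if user != target]
    return max(others, key=lambda user: exact_matches(ratings[target], ratings[user]))


def suggest(target: str, ratings: dict[str, dict[str, int]], like_threshold: int = 4) -> list[str]:
    """Suggest movies the neighbour liked that the target has not rated yet."""
    neighbour = most_similar_user(target, ratings)
    seen = ratings[target].keys()
    return sorted(
        movie
        for movie, score in ratings[neighbour].items()
        if movie not in seen and score >= like_threshold
    )


print(most_similar_user("u5", RATINGS))  # u8
print(suggest("u5", RATINGS))  # ['m8']
```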
This example uses a small matrix and integer ratings, which is why we can pick out a similar user by hand. In real life, the matrix is very large, ratings may be floating-point values, and exact matches like this are rare, so in production we need a measure that treats ratings such as 3.4 and 3.5 as almost the same.
In real-life systems, we can use the following similarity measures:
Cosine similarity: Cosine similarity measures the similarity between two non-zero vectors; here, each vector is a user's array of ratings for the movies. Mathematically, it is the cosine of the angle between the two vectors.
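A minimal cosine-similarity sketch in plain Python, computing cos θ = (x · y) / (‖x‖ ‖y‖). The rating vectors are hypothetical and list ratings for the same movies in the same order; the example shows that ratings of 3.4 and 3.5 give a similarity very close to 1, which is the tolerance an exact-match comparison lacks.

```python
import math


def cosine_similarity(x: list[float], y: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero rating vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)


# Hypothetical ratings for the same four movies.
user_a = [4.0, 2.0, 5.0, 3.5]
user_b = [4.0, 2.0, 5.0, 3.4]
print(round(cosine_similarity(user_a, user_b), 4))  # 0.9999
```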
Pearson correlation: Pearson correlation measures the linear correlation between two variables x and y; here, x and y are two users' ratings of the same movies.
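A matching Pearson sketch, assuming x and y are two users' ratings of the movies they have both rated, in the same order. Because each user's ratings are centred on that user's own mean, someone who consistently rates two stars lower than another user still comes out as perfectly correlated, something cosine similarity on raw ratings does not fully account for. The example users are hypothetical.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two users' ratings of the same movies."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two co-rated movies")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Covariance and spreads are computed on mean-centred ratings.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a user gives identical ratings")
    return cov / (spread_x * spread_y)


# Hypothetical users: a generous rater and a harsh rater with the same taste.
generous = [3.0, 4.0, 5.0]
harsh = [1.0, 2.0, 3.0]
print(round(pearson(generous, harsh), 6))  # 1.0
```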
In collaborative filtering, we do not look at movie genres or actors. We use simple logic: if user A and user B like similar movies, they are similar users, so if user A likes movie M, user B will probably like it too.
Collaborative filtering has three main problems:
Cold start: The term “cold start” comes from cars. When the engine is cold, the car does not run smoothly, but once it reaches its optimal temperature, it works just fine. For a recommender system, cold start is the situation where we do not have enough data to make a recommendation. For example, when we launch a new movie streaming service, we do not yet have a large enough user base or enough rating data to recommend movies based on ratings.
Sparsity: Most users do not rate every movie they watch, and each user watches only a small part of the catalogue, so the user-rating matrix is sparse (most of its values are missing, often stored as zero). With so few ratings in common between users, it is hard to compute meaningful similarities.
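To make sparsity concrete, the sketch below stores only the ratings that exist (a common way to hold a sparse matrix) and measures two things for a hypothetical table of 3 users and 10 movies: the density of the matrix and the set of movies two users have both rated. When that overlap is empty or holds a single movie, a measure such as Pearson correlation has too little to work with. The names and numbers are illustrative assumptions.

```python
# Hypothetical sparse ratings: only the cells that hold a rating are stored.
SPARSE_RATINGS: dict[str, dict[str, float]] = {
    "ua": {"m1": 5.0, "m2": 3.0},
    "ub": {"m3": 4.0, "m9": 2.0},
    "uc": {"m1": 4.0, "m7": 1.0},
}
NUM_MOVIES = 10  # assumed catalogue size


def density(ratings: dict[str, dict[str, float]], num_movies: int) -> float:
    """Fraction of user x movie cells that actually contain a rating."""
    filled = sum(len(user_ratings) for user_ratings in ratings.values())
    return filled / (len(ratings) * num_movies)


def co_rated(a: dict[str, float], b: dict[str, float]) -> set[str]:
    """Movies rated by both users: the only ones a similarity measure can use."""
    return set(a.keys() & b.keys())


print(density(SPARSE_RATINGS, NUM_MOVIES))  # 0.2
print(co_rated(SPARSE_RATINGS["ua"], SPARSE_RATINGS["ub"]))  # set()
print(co_rated(SPARSE_RATINGS["ua"], SPARSE_RATINGS["uc"]))  # {'m1'}
```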
Scalability: As the business grows, so does the data. The matrix becomes very large, and computing similarities across it becomes expensive.
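A rough count shows why, assuming the straightforward approach of comparing every pair of n users over all m movies (a simplification; the figure of 10^6 users is purely hypothetical):

```latex
\text{pairs}(n) = \binom{n}{2} = \frac{n(n-1)}{2},
\qquad
\text{cost}(n, m) = O\!\left(\frac{n(n-1)}{2}\, m\right) = O(n^{2} m)

\text{pairs}(10^{6}) = \frac{10^{6}\,(10^{6}-1)}{2} \approx 5 \times 10^{11}
```

Since the number of pairs grows with n², doubling the user base roughly quadruples the work.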
- Hybrid Recommender System:
A hybrid recommender system uses both techniques. For example, we can build content-based filtering and collaborative filtering separately and then combine their results to get the advantages of both.
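One common way to combine the two is a weighted hybrid: score each candidate movie with both techniques, scale both scores to the 0 to 1 range, and blend them. The sketch below assumes the two score dictionaries already exist and uses a hypothetical weight `alpha`; the titles and numbers are made up. A movie with no ratings yet (a cold-start item) can still be recommended through its content score alone.

```python
def hybrid_scores(
    content: dict[str, float],
    collaborative: dict[str, float],
    alpha: float = 0.5,
) -> dict[str, float]:
    """Blend two score dicts (each already scaled to 0..1) with weight alpha.

    A movie missing from one source is treated as scoring 0 in that source.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    movies = content.keys() | collaborative.keys()
    return {
        movie: alpha * content.get(movie, 0.0) + (1 - alpha) * collaborative.get(movie, 0.0)
        for movie in movies
    }


def top_k(scores: dict[str, float], k: int) -> list[str]:
    """Return the k highest-scoring movies, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical scores from the two separate recommenders.
content_scores = {"The Hobbit": 0.9, "Alien": 0.1}
collab_scores = {"The Hobbit": 0.4, "Inception": 0.8}
blended = hybrid_scores(content_scores, collab_scores, alpha=0.5)
print(top_k(blended, 2))  # ['The Hobbit', 'Inception']
```

Setting `alpha` closer to 1 leans on item metadata, which helps while ratings are scarce; setting it closer to 0 leans on user behaviour.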
WebMob Technologies keeps up with the technologies that emerge in the market and works with them to create something groundbreaking!