Analysis of Douban Reading Recommendation Module
It has become a habit: whenever I come across a book that interests me, I go to Douban to check its rating and reviews. This helps me understand the book and decide whether it suits me.

While using it, I also noticed that Douban Reading's recommendations are not very good, which made them a fitting subject for an assignment in strategy product analysis.

Using the methodology of strategy product management and descriptive statistics, this article analyzes data to uncover the patterns behind Douban Reading's book recommendation module.

Note: The module analyzed here is the "People who like this book also like..." section on the Douban book details page, as shown:

Ideally, this module recommends books that are strongly related to the current book, arousing the user's interest and offering pleasant surprises.

From a user's point of view, what I want from the recommendation module on a book page is strongly related good books: books similar in content or genre, from the same series, or by the same author. At the same time, they should be books I have not read yet, so that they can surprise me.

The purpose of this analysis is to work out the recommendation strategy behind Douban Reading's "People who like this book also like" module and to identify possible problems with that strategy.

The data were collected from the books recommended under 10 sample books. By recording a set of fields for each book, this article analyzes and explores Douban's book recommendation strategy. (Because the data were collected by hand and each book has a large number of tags, the sample is limited to 10 books. The results may differ from Douban's actual recommendation strategy, but they still reveal some problems with it.)

We assume that the population data are normally distributed and that the sample is randomly selected. Due to limited time, the sample consists of 10 books, under which 86 books are recommended in total.

The selected fields are: title, genre, publisher, publication date, author, series, rating, tags, and doulists (豆列, user-curated Douban book lists).
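To make the tally concrete, here is a minimal sketch of how each collected row might be stored and how the "same as the current book" ratios could be computed. The `Book` class, its field names, and the helper functions are hypothetical, chosen only to mirror the fields listed above; the original data were collected by hand, not with this code.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable


@dataclass
class Book:
    """One manually collected row; all field names are hypothetical."""
    title: str
    genre: str = ""
    publisher: str = ""
    pub_date: str = ""
    author: str = ""
    series: str = ""      # empty if the book is not part of a series
    rating: float = 0.0   # Douban rating on a 10-point scale
    tags: set[str] = field(default_factory=set)
    doulists: set[str] = field(default_factory=set)


# A group pairs one sample book with the books recommended under it.
Group = tuple[Book, list[Book]]


def average_rating(groups: list[Group]) -> float:
    """Mean rating over all recommended books, pooled across sample books."""
    return mean(rec.rating for _, recs in groups for rec in recs)


def pooled_share(groups: list[Group], matches: Callable[[Book, Book], bool]) -> float:
    """Fraction of all recommendations for which matches(sample, rec) is true."""
    pairs = [(sample, rec) for sample, recs in groups for rec in recs]
    return sum(bool(matches(s, r)) for s, r in pairs) / len(pairs)


# Match predicates, one per field examined in the analysis.
def same_author(s: Book, r: Book) -> bool:
    return s.author == r.author


def same_series(s: Book, r: Book) -> bool:
    # A book outside any series cannot match on series.
    return bool(s.series) and s.series == r.series


def same_genre(s: Book, r: Book) -> bool:
    return s.genre == r.genre


def shares_tag(s: Book, r: Book) -> bool:
    return bool(s.tags & r.tags)


def shares_doulist(s: Book, r: Book) -> bool:
    return bool(s.doulists & r.doulists)
```

Pooling across groups mirrors how this analysis reports each ratio over all 86 recommendations rather than per sample book.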

For each sample book, we tallied how the books recommended under it relate to it on each of these fields.

Screenshot of some data:

We tallied these fields for the books recommended under the 10 sample books and obtained the following data. Some screenshots are as follows:

Before analyzing the sample, we formed the following hypotheses about the strategy behind Douban's "People who like this book also like" module:

Books differ from film and TV: readers on the book channel are more willing to take advice from experts in a field, and they want recommendations from experts whose tastes are similar to their own. Following this reasoning, we hypothesize that the "People who like this book also like" module uses the following recommendation strategy:

We test these hypotheses below.

First, we explore the dataset with descriptive statistics.

Note: The screenshots below show only part of the data and are not representative of the whole. The full dataset is attached at the end.

86 books were recommended in total, with an average rating of 8.5.

On Douban's 10-point scale, an average of 8.5 is relatively high. However, because the sample is small, we cannot establish precisely whether 8.5 counts as a high score; we can only judge it to be high subjectively.

Of the 86 books recommended under the 10 sample books, 27 are by the same author as the sample book, or 31.40%. Some screenshots are as follows:

This suggests some correlation between the recommended books and the author of the current book.

Of the 86 recommended books, only 13 are in the same series as the sample book, or 15.12%. Some screenshots are as follows:

The data show that being in the same series is only weakly correlated with being recommended.

In the sample, A Brief History of Mankind is a history book, but none of its 5 recommended books is a history book. Similarly, Understanding Business is a business book, but 8 of its 10 recommended books are not business books.

"Difficult to Start a Business" falls under economics and management, but 5 of its 6 recommended books do not.

So the recommended books have little to do with the genre of the current book.

Of the 86 recommended books, only 3 share no tags with the current book; that is, 96.51% of the recommendations share tags with it.

Therefore, the recommended books are strongly correlated with the current book's tags.

Looking closer, each book has dozens of tags, but the details page displays only the 7-8 most popular ones: the more users apply a tag, the higher it is displayed. This suggests that the number of times a tag has been applied is one of the weights in the recommendation.
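The observation that tag counts act as a weight can be illustrated with a small sketch. Everything here is an assumption, since Douban's real scoring is not public: the function names, the cutoff of 8 displayed tags, and the scoring rule (sum the current book's counts for displayed tags the candidate also displays) are all hypothetical.

```python
from collections import Counter


def displayed_tags(tag_counts: Counter, k: int = 8) -> list[str]:
    """Top-k tags by number of users who applied them, as shown on a details page."""
    return [tag for tag, _ in tag_counts.most_common(k)]


def tag_score(current: Counter, candidate: Counter, k: int = 8) -> int:
    """Hypothetical weight: heavily applied tags of the current book that the
    candidate also displays contribute their application counts."""
    shown = set(displayed_tags(candidate, k))
    return sum(current[t] for t in displayed_tags(current, k) if t in shown)
```

Under this rule, sharing one heavily applied tag outweighs sharing several rarely applied ones, which is consistent with the display order described above.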

Whether tag names are manually curated is unknown for now, and this article does not explore it.

Of the 86 recommended books, 56 appear in the same doulist as the current book, or 65.12%.

Therefore, doulists are strongly correlated with the recommended books.

In addition, the doulists that the recommendations come from have far more saves and recommendations than the other doulists containing the book. This suggests that the more popular a doulist is, the more likely its books are to be recommended.
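The ratios reported in this section can be re-derived from the counts stated in the text (86 recommendations in total). This sketch only redoes that arithmetic, plus the per-book genre matches for the three books discussed above; it does not use the underlying data.

```python
TOTAL = 86  # recommendations collected under the 10 sample books

# Counts stated in the analysis
counts = {
    "same author": 27,
    "same series": 13,
    "shares tags": TOTAL - 3,  # only 3 share no tags
    "shares a doulist": 56,
}


def pct(part: int, whole: int) -> str:
    """Format part/whole as a percentage with two decimals."""
    return f"{100 * part / whole:.2f}%"


for name, n in counts.items():
    print(f"{name}: {n}/{TOTAL} = {pct(n, TOTAL)}")

# Same-genre recommendations per sample book: (matching, recommended)
genre = {
    "A Brief History of Mankind": (0, 5),
    "Understanding Business": (2, 10),
    "Difficult to Start a Business": (1, 6),
}
for title, (match, n) in genre.items():
    print(f"{title}: {match}/{n} same genre = {pct(match, n)}")
```

The first loop prints 31.40%, 15.12%, 96.51%, and 65.12%; the genre loop shows same-genre shares of 0.00%, 20.00%, and 16.67%.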

In testing, the books recommended under Why Home Hurts People were identical whether I was logged out, logged in with my own account, or logged in with someone else's. So the results do not depend on an individual user's behavior, and the module is not personalized.
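The login test above amounts to comparing the recommendation lists seen under different sessions. A minimal sketch, assuming the lists were copied by hand for each session; the function name and session labels are hypothetical. Identical lists across sessions are consistent with a non-personalized module, though a single book is a small test.

```python
def is_personalized(lists_by_session: dict[str, list[str]]) -> bool:
    """True if any session saw a different recommendation list (order included)."""
    lists = list(lists_by_session.values())
    return any(lst != lists[0] for lst in lists[1:])


# Usage with placeholder titles:
sessions = {
    "logged out": ["Book A", "Book B"],
    "own account": ["Book A", "Book B"],
    "other account": ["Book A", "Book B"],
}
print(is_personalized(sessions))
```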

Based on the above descriptive analysis, the following conclusions are drawn:

Note: Because the sample is small and each conclusion comes from a single-variable exploration, the conclusions may differ from the actual situation. More data are needed to verify them.

These conclusions point to the following problems:

Based on the data and the reference material I consulted, Douban Reading's recommendation algorithm appears to be what that material calls CF, based on the similarity of item features: users and items are linked through certain features, and items whose features the user likes are recommended. Here those features are tags and doulists, and the results are drawn from the same set of popular tags and doulists. (Matching items on shared features in this way is more commonly called content-based filtering than collaborative filtering.)
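To illustrate the strategy described above and why it tends to favor the same popular books, here is a minimal sketch of feature-based item similarity. It is an assumption, not Douban's implementation: each book is represented by a set of tag and doulist features, similarity is Jaccard overlap, and a hypothetical popularity signal (for example, a doulist's save count) is multiplied in on a log scale. The names and the weighting formula are illustrative only.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    title: str
    features: frozenset[str]  # tags and doulists, e.g. "tag:history", "list:123"
    popularity: int           # hypothetical popularity signal, e.g. doulist saves


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Overlap of two feature sets: |A & B| / |A | B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(current: Item, catalog: list[Item], k: int = 10) -> list[str]:
    """Rank other items by feature similarity times log-scaled popularity."""
    scored = [
        (jaccard(current.features, it.features) * math.log1p(it.popularity), it.title)
        for it in catalog
        if it.title != current.title
    ]
    scored.sort(reverse=True)
    return [title for score, title in scored[:k] if score > 0]
```

With a popularity factor like this, a heavily saved book with partial overlap can outrank a closer but less popular one, which is the clustering effect described next.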

The result is a popularity-cluster effect: the recommendation slots are easily occupied by a few popular works for long periods. In the long run, this causes two problems:

Some recommended popular books have little to do with the current book. For example, the recommendations under Growth Hacker include Revelation. Strictly speaking, Growth Hacker is about operations, while Revelation is a product or management book, quite different in type from Growth Hacker.

Another example is the set of recommendations I saw under Understanding Business:

These recommended books are of a different type from the current book. For a user who wants to learn about business, a highly rated book similar to the current one would be more valuable, so the recommendations are not ideal.

Under books by Wu Zhihong and Wu Jun, almost all of the recommendations are other books by the same author.

The books marked in the screenshot are all by the same author as the current book. Same-author books appear too often, leaving little room for books by other authors.

The recommendations also include different editions of the same book. However, Douban already aggregates the long and short reviews across editions, so every edition of a book shows the same reviews. There is therefore no need to recommend different editions.
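Since Douban already groups the reviews of different editions, the recommendation list could likewise be deduplicated at the level of the work rather than the edition. A minimal sketch, assuming each recommended edition carries an identifier for its underlying work; `work_id` is a hypothetical field, not a documented Douban attribute.

```python
from typing import NamedTuple


class Edition(NamedTuple):
    title: str
    work_id: str  # hypothetical: shared by all editions of the same book


def dedupe_editions(recs: list[Edition]) -> list[Edition]:
    """Keep only the first (highest-ranked) edition of each work, preserving order."""
    seen: set[str] = set()
    out: list[Edition] = []
    for e in recs:
        if e.work_id not in seen:
            seen.add(e.work_id)
            out.append(e)
    return out
```

Keeping the first occurrence preserves the original ranking, so the freed slots go to the next distinct works.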