Current location - Training Enrollment Network - Books and materials - How to capture popular comments of Netease Cloud Music with Python
How to capture popular comments of Netease Cloud Music with Python
The content shared in this article is a hot comment on how to climb Netease cloud music with Python, which has certain reference value and can be consulted by friends in need.

order

Recently, I have been studying the content related to text mining. The so-called clever woman cannot cook without rice. If you want to analyze the text, you must first get the text. There are many ways to obtain text, such as downloading ready-made text documents from the Internet or obtaining data through APIs provided by third parties. But sometimes the data we want cannot be obtained directly, because there is no direct download channel or API for us to obtain the data. So what should we do at this time? There is a better way to get the desired data through web crawler, which is to write computer programs to pretend to be users. With the high efficiency of computer, we can get data conveniently and quickly.

About reptiles

So how do you write reptile? There are many languages that can write reptiles, such as Java, php, python and so on. Personally, I prefer python. Because python not only has a powerful built-in network library, but also many excellent third-party libraries, others made wheels directly, so we can just use them, which brings great convenience to writing reptiles. It is no exaggeration to say that you can actually write a small crawler with less than 10 lines of python code, while you can write a lot of code in other languages. Simplicity is a great advantage of python.

Well, before it's too late, let's get down to business today. In recent years, Netease cloud music has become popular. I am a user of Netease Cloud Music myself, and I have used it for several years. I used to use QQ music and cool dogs. Based on my own personal experience, I think the biggest feature of Netease Cloud Music is accurate song recommendation and unique user comments (solemnly declare! ! ! This is not a soft article, not an advertisement! ! ! Only represents personal views, don't spray if you don't like it! )。 There are often some divine comments that are praised a lot under a song. In addition, Netease Cloud Music moved selected user comments to the subway a few days ago, and Netease Cloud Music's comments were on fire again. So I want to analyze Netease Cloud's comments, find out the rules, and especially analyze the characteristics of some hot reviews. With this goal in mind, I started grabbing comments from Netease Cloud.

online library

Python has two built-in network libraries, urllib and urllib2, but neither of them is particularly convenient to use, so here we use a well-received third-party library, requests. With the request, you only need a few lines of code to set up the proxy, simulate login and other complex crawler work. If pip is already installed, you can use pip installation request to install directly.

The address of Chinese document is here = (organic) | utmcmd = organic; playerid = 8 15689 1 1; _ _ utmb = 94650624 . 23 . 10. 1490672820 ",

Connection':' Keep Alive',

Quoted by ":/"}

# Set proxy server

Agent = {

Element (url):

hot_comments_list = []

Hot_comments_list.append(u "user ID user nickname user avatar address comment time likes total comment content")

Params = get_params( 1) # Page 1

encSecKey = get_encSecKey()

json_text = get_json(url,params,encSecKey)

json_dict = json.loads(json_text)

Hot _ comments = JSON _ dict ['Top Comments'] # Top Comments

Print ("* * * has %d popular comments!" % len (popular comment))

For items in popular comments:

Comment = item['content'] # comment content

LikedCount = item['likedCount'] # Always like numbers.

Comment_time = item['time'] # comment time (timestamp)

UserID = item[' User'] ['User ID'] # Commenter ID

Nickname = Project ['User'] ['Nickname'] # Nickname

Avatar website = project ['user'] ['avatar website'] # avatar address

Comment_info = userID+""+nickname+"+avatar URL+""+comment _ time+""+likedcount+""+comment+u" "

hot _ comments _ list . append(comment _ info)

Return to the list of popular comments

# Capture all comments on a song

Define get all comments (url):

All_comments_list = [] # Store all comments.

All_comments_list.append(u "user ID user nickname user avatar address comment time likes total comment content) # header information

params = get_params( 1)

encSecKey = get_encSecKey()

json_text = get_json(url,params,encSecKey)

json_dict = json.loads(json_text)

comments _ num = int(JSON _ dict[' total '])

if(comments_num % 20 == 0):