用python爬取酷狗(网易云)音乐

用python爬取酷狗(网易云)音乐

(此文为小左1120原创,转载请标明出处,如违规请联系作者立刻删除)
此文为python爬虫文章,讲了用python爬取某狗里的歌名并下载(包括vip)后面带exe
在这里插入图片描述

2020/8/5

由于本人爬虫经验不足,代码可能会粗糙,有bug,麻烦大佬加个微信QQ一起学吧!代码失效了也给我通知一声(图片如果显示不出来的话请多次刷新,若问题仍存在,请告诉我换源)
微信:aicoder-XZ
QQ:2583188733
老规矩
配置好python
配置好pycharm
再安装依赖的模块:
点这里(感谢大神的文章)
接下来,按下win+R输入cmd-回车!在窗口里输入

1
pip install requests selenium

打开pycharm,
新建项目,(不会的话点这里
进去后,右键文件夹,
在这里插入图片描述
输入个你喜欢的名字,回车!

开始写代码吧!
先打开浏览器
访问某狗音乐下载(不要用ie)
在这里插入图片描述

随便搜索搜索看看网址
以《癞蛤蟆》为例
网址是

1
https://music.liuzhijin.cn/?name=癞蛤蟆&type=kugou

所以搜索的格式是

1
https://music.liuzhijin.cn/?name=歌曲名&type=kugou

好,咱先爬:

1
2
3
4
5
6
7
from selenium import webdriver
import time

name = input(">")
url = "https://music.liuzhijin.cn/?name=" + name + "&type=kugou"
browser = webdriver.Firefox()
browser.get(url)

这个网站有一点问题,底下有个载入更多,不能直接全部加载,
没有事的。。
按下F12
在出来的控制台的左上角点这个按钮
在这里插入图片描述
点击载入更多

ok,浏览器会定位到这样一条元素,

1
<div class="aplayer-more">载入更多</div>

右击-复制-完整的XPath(火狐是XPath)
在这里插入图片描述
在这里插入图片描述

1
2
3
4
5
6
7
8
9
10
11
12
13
14
from selenium import webdriver
import time

name = input(">")
url = "https://music.liuzhijin.cn/?name=" + name + "&type=kugou"
browser = webdriver.Firefox()
browser.get(url)
time.sleep(3)
while True:
try:
browser.find_element_by_xpath("你的XPath").click()
except:
break

我的是/html/body/section/div[1]/div/form[2]/div[4]/div/div[4]
所以就是

1
2
3
4
5
6
7
8
9
10
11
12
13
14
from selenium import webdriver
import time

name = input(">")
url = "https://music.liuzhijin.cn/?name=" + name + "&type=kugou"
browser = webdriver.Firefox()
browser.get(url)
time.sleep(3)
while True:
try:
browser.find_element_by_xpath("/html/body/section/div[1]/div/form[2]/div[4]/div/div[4]").click()
except:
break

一切就绪,获取网页源代码

1
html = browser.page_source

控制台上看一看

看到一大堆li
在这里插入图片描述

挑一个(我搜索的Mojito)

1
2
3
4
5
6
7
8
9
10
11
12
<li>
<span class="aplayer-list-cur" style="background: #0e90d2;"></span>
<span class="aplayer-list-index">2</span>
<span class="aplayer-list-title">Mojito</span>
<span class="aplayer-list-author">许66</span>
</li>
<li>
<span class="aplayer-list-cur" style="background: #0e90d2;"></span>
<span class="aplayer-list-index">3</span>
<span class="aplayer-list-title">Mojito</span>
<span class="aplayer-list-author">音乐之声-SongTaste</span>
</li>

经过阅读知道

1
2
3
4
5
6
<li>
<span class="aplayer-list-cur" style="background: #0e90d2;"></span>
<span class="aplayer-list-index">歌曲排名</span>
<span class="aplayer-list-title">歌曲名</span>
<span class="aplayer-list-author">歌手</span>
</li>

呵呵呵,我们用正则表达式提取一下,顺便封装成方法

在这里插入图片描述

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
#coding:utf-8
from selenium import webdriver
import time,re
name = input(">")
url = "https://music.liuzhijin.cn/?name=" + name + "&type=kugou"
def geturl(url):
browser = webdriver.Firefox()
browser.get(url)
time.sleep(3)
while True:
try:
browser.find_element_by_xpath("/html/body/section/div[1]/div/form[2]/div[4]/div/div[4]").click()
except:
break
return browser.page_source
html = geturl(url)
a111 = '''<li class="aplayer-list-light">
<span class="aplayer-list-cur" style="background: #0e90d2;"></span>
<span class="aplayer-list-index">(.*?)</span>
<span class="aplayer-list-title">(.*?)</span>
<span class="aplayer-list-author">(.*?)</span>
</li>
'''
a11='''<li>
<span class="aplayer-list-cur" style="background: .*?;"></span>
<span class="aplayer-list-index">(.*?)</span>
<span class="aplayer-list-title">(.*?)</span>
<span class="aplayer-list-author">(.*?)</span>
</li>'''
a='''<li>
<span class="aplayer-list-cur" style="background:.*?;"></span>
<span class="aplayer-list-index">(.*?)</span>
<span class="aplayer-list-title">(.*?)</span>
<span class="aplayer-list-author">(.*?)</span>
</li>'''
a1=re.findall(a111,html)
b1 = re.findall(a11,html)
b2=re.findall(a,html)
data = a1+b1+b2

加上选择

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
#coding:utf-8
from selenium import webdriver
import time,re
name = input(">")
url = "https://music.liuzhijin.cn/?name=" + name + "&type=kugou"
def geturl(url):
browser = webdriver.Firefox()
browser.get(url)
time.sleep(3)
while True:
try:
browser.find_element_by_xpath("/html/body/section/div[1]/div/form[2]/div[4]/div/div[4]").click()
except:
break
return browser.page_source
html = geturl(url)
a111 = '''<li class="aplayer-list-light">
<span class="aplayer-list-cur" style="background: #0e90d2;"></span>
<span class="aplayer-list-index">(.*?)</span>
<span class="aplayer-list-title">(.*?)</span>
<span class="aplayer-list-author">(.*?)</span>
</li>
'''
a11='''<li>
<span class="aplayer-list-cur" style="background: .*?;"></span>
<span class="aplayer-list-index">(.*?)</span>
<span class="aplayer-list-title">(.*?)</span>
<span class="aplayer-list-author">(.*?)</span>
</li>'''
a='''<li>
<span class="aplayer-list-cur" style="background:.*?;"></span>
<span class="aplayer-list-index">(.*?)</span>
<span class="aplayer-list-title">(.*?)</span>
<span class="aplayer-list-author">(.*?)</span>
</li>'''
a1=re.findall(a111,html)
b1 = re.findall(a11,html)
b2=re.findall(a,html)
data = a1+b1+b2
song = 0
for i in data:
print(i)
num = input(">>")
for i in data:
if (str(i[0])==num):
song = i
url = "https://music.liuzhijin.cn/?name=" + song[1]+' - '+song[2] + "&type=kugou"
print(geturl(url))

看下载链接的位置的代码是

1
2
3
4
5
<span class="am-input-group-btn">
<a id="j-src-btn" class="am-btn am-btn-default" target="_blank" href="(.*?)" download="(.*?)">
<i id="j-src-btn-icon" class="am-icon-download"></i>
</a>
</span>

我们通过搜索歌曲名 - 歌手来搜索并下载

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
#coding:utf-8
from selenium import webdriver
import time,re,requests
name = input(">")
url = "https://music.liuzhijin.cn/?name=" + name + "&type=kugou"
def geturl(url):
browser = webdriver.Firefox()
browser.get(url)
time.sleep(3)
while True:
try:
browser.find_element_by_xpath("/html/body/section/div[1]/div/form[2]/div[4]/div/div[4]").click()
except:
break
return browser.page_source
html = geturl(url)
a111 = '''<li class="aplayer-list-light">
<span class="aplayer-list-cur" style="background: #0e90d2;"></span>
<span class="aplayer-list-index">(.*?)</span>
<span class="aplayer-list-title">(.*?)</span>
<span class="aplayer-list-author">(.*?)</span>
</li>
'''
a11='''<li>
<span class="aplayer-list-cur" style="background: .*?;"></span>
<span class="aplayer-list-index">(.*?)</span>
<span class="aplayer-list-title">(.*?)</span>
<span class="aplayer-list-author">(.*?)</span>
</li>'''
a='''<li>
<span class="aplayer-list-cur" style="background:.*?;"></span>
<span class="aplayer-list-index">(.*?)</span>
<span class="aplayer-list-title">(.*?)</span>
<span class="aplayer-list-author">(.*?)</span>
</li>'''
a1=re.findall(a111,html)
b1 = re.findall(a11,html)
b2=re.findall(a,html)
data = a1+b1+b2
song = 0
for i in data:
print(i)
num = input(">>")
for i in data:
if (str(i[0])==num):
song = i
url1 = "https://music.liuzhijin.cn/?name=" + song[1]+' - '+song[2] + "&type=kugou"
a7='''<span class="am-input-group-btn">
<a id="j-src-btn" class="am-btn am-btn-default" target="_blank" href="(.*?)" download="(.*?)">
<i id="j-src-btn-icon" class="am-icon-download"></i>
</a>
</span>
'''
html1=geturl(url1)
a8 = re.findall(a7,html1)
r = requests.get(a8[0][0])
with open(a8[0][1], "wb") as code:
code.write(r.content)

接下来是优化环节:
我发现,用这东西乱蹦浏览器,我们使用无头模式,再加一个循环

完整代码

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
#coding:utf-8
from selenium import webdriver
import time,re,requests
while True:
name = input("输入你想下载的歌曲名并按下回车(不输入并按下回车退出):")
if (name==''):
break
else:
url = "https://music.liuzhijin.cn/?name=" + name + "&type=kugou"
def geturl(url):
options = webdriver.FirefoxOptions()
options.add_argument('-headless')
browser = webdriver.Firefox(options=options)
browser.get(url)
time.sleep(3)
while True:
try:
browser.find_element_by_xpath("/html/body/section/div[1]/div/form[2]/div[4]/div/div[4]").click()
except:
break
temp = browser.page_source
browser.close()
return temp
html = geturl(url)
a111 = '''<li class="aplayer-list-light">
<span class="aplayer-list-cur" style="background: #0e90d2;"></span>
<span class="aplayer-list-index">(.*?)</span>
<span class="aplayer-list-title">(.*?)</span>
<span class="aplayer-list-author">(.*?)</span>
</li>
'''
a11='''<li>
<span class="aplayer-list-cur" style="background: .*?;"></span>
<span class="aplayer-list-index">(.*?)</span>
<span class="aplayer-list-title">(.*?)</span>
<span class="aplayer-list-author">(.*?)</span>
</li>'''
a='''<li>
<span class="aplayer-list-cur" style="background:.*?;"></span>
<span class="aplayer-list-index">(.*?)</span>
<span class="aplayer-list-title">(.*?)</span>
<span class="aplayer-list-author">(.*?)</span>
</li>'''
a1=re.findall(a111,html)
b1 = re.findall(a11,html)
b2=re.findall(a,html)
data = a1+b1+b2
song = 0
for i in data:
print(i)
num = input("请输入序号:")
for i in data:
if (str(i[0])==num):
song = i
url1 = "https://music.liuzhijin.cn/?name=" + song[1]+' - '+song[2] + "&type=kugou"
a7='''<span class="am-input-group-btn">
<a id="j-src-btn" class="am-btn am-btn-default" target="_blank" href="(.*?)" download="(.*?)">
<i id="j-src-btn-icon" class="am-icon-download"></i>
</a>
</span>
'''
html1=geturl(url1)
a8 = re.findall(a7,html1)
r = requests.get(a8[0][0])
with open(a8[0][1], "wb") as code:
code.write(r.content)

使用示例:
在这里插入图片描述

exe文件下载地址:https://www.jianguoyun.com/p/DcMuR8YQs7rYCBjy1rED
-小左1120
这个程序费了我2天时间,求求你们点个赞,关个注呗

2年后的补充:

2022/10/4

大家好!版本已经失效了,我学c++,python忘完了,不过壳子能用。全新代码:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
#coding:utf-8
from selenium import webdriver
import time,re,requests
while True:
name = input("输入你想下载的歌曲名并按下回车(不输入并按下回车退出):")
if (name==''):
break
else:
url = "https://music.liuzhijin.cn/?name=" + name + "&type=netease"
def geturl(url):
options = webdriver.FirefoxOptions()
options.add_argument('-headless')
browser = webdriver.Firefox(options=options)
browser.get(url)
time.sleep(3)
while True:
try:
browser.find_element_by_xpath("/html/body/section/div[1]/div/form[2]/div[4]/div/div[4]").click()
except:
break
temp = browser.page_source
browser.close()
return temp
html = geturl(url)
a111 = '''<li class="aplayer-list-light">
<span class="aplayer-list-cur" style="background: #0e90d2;"></span>
<span class="aplayer-list-index">(.*?)</span>
<span class="aplayer-list-title">(.*?)</span>
<span class="aplayer-list-author">(.*?)</span>
</li>
'''
a11='''<li>
<span class="aplayer-list-cur" style="background: .*?;"></span>
<span class="aplayer-list-index">(.*?)</span>
<span class="aplayer-list-title">(.*?)</span>
<span class="aplayer-list-author">(.*?)</span>
</li>'''
a='''<li>
<span class="aplayer-list-cur" style="background:.*?;"></span>
<span class="aplayer-list-index">(.*?)</span>
<span class="aplayer-list-title">(.*?)</span>
<span class="aplayer-list-author">(.*?)</span>
</li>'''
a1=re.findall(a111,html)
b1 = re.findall(a11,html)
b2=re.findall(a,html)
data = a1+b1+b2
song = 0
for i in data:
print(i)
num = input("请输入序号:")
for i in data:
if (str(i[0])==num):
song = i
url1 = "https://music.liuzhijin.cn/?name=" + song[1]+' - '+song[2] + "&type=netease"
a7='''<span class="am-input-group-btn">
<a id="j-src-btn" class="am-btn am-btn-default" target="_blank" href="(.*?)" download="(.*?)">
<i id="j-src-btn-icon" class="am-icon-download"></i>
</a>
</span>
'''
html1=geturl(url1)
a8 = re.findall(a7,html1)
r = requests.get(a8[0][0])
with open(a8[0][1], "wb") as code:
code.write(r.content)

现在网站上酷狗失效了(网站上失效的,网站复原了,代码应该也有效)不过网易云还能用,网易云伺候!
写本文时 selenium版本3.141.0

2023/7/28 更新

现在代码仍可用,目前selenium版本4.10.0,火狐115.0.3 (32 位),geckodriver版本0.33.0放于源码目录下运行,本代码应该不能下载vip曲目,望了解。


用python爬取酷狗(网易云)音乐
https://xiaozuo1120.github.io/2020/08/05/用python爬取酷狗音乐/
作者
小左1120
发布于
2020年8月5日
许可协议