Python爬取網站視頻資源

思路：

在界面找到視頻對應的html元素位置，觀察發現視頻的url為https://www.pearvideo.com/video_+視頻的id，而這個id在html中的href中，所以第一步需要通過xpath捕獲到所需要的id

在https://www.pearvideo.com/video_+id的頁面，通過控制臺查看返回的響應消息，發現沒有視頻數據，說明視頻是進入頁面后由其他請求發起獲得

在搜索框中搜索mp4，發現視頻文件對應的請求，觀察請求的url與負載，發現負載1為視頻的id另一個為隨機生成的數字。方法為get

由其返回的視頻url與元素中的url進行對比發現是用cont-id替換了一段數字。這一段的url就為視頻的url

代碼實現：

代碼：

import os
from lxml import etree
import requests
import time
from fake_useragent import UserAgent
# UA繞過
ua = UserAgent()
headers = {'User-Agent': ua.random
}def deal_video(id):time.sleep(1)url = "https://www.pearvideo.com/video_" + idurl1 = "https://www.pearvideo.com/videoStatus.jsp?contId=" + idnew_headers = headersnew_headers["Referer"] = urlpage_json = requests.get(url=url1, headers=new_headers).json()video_src = page_json["videoInfo"]["videos"]["srcUrl"]key = "cont-"+url1.split("=")[1]return video_src.replace(video_src.split('/')[6].split('-')[0], key)def save_video(video_src,name):time.sleep(1)print("正在下載"+name)videoData = requests.get(url=video_src, headers=headers).contentif not os.path.exists("./videoLibs"):os.mkdir("./videoLibs")with open("./videoLibs/"+name+".mp4",'wb') as fp:fp.write(videoData)print(dic['name']+" 下載完成")post_url = 'https://www.pearvideo.com/category_1'
# 發出請求
page_text = requests.get(url=post_url, headers=headers).text
# 數據處理
urls = []
tree = etree.HTML(page_text)
videos = tree.xpath('//a[@class="vervideo-lilink actplay"]')
for video in videos:time.sleep(0.5)name = video.xpath('./@href')[0]information_url = "https://www.pearvideo.com/" + nameh = headersid = name.split("_")[1]#從函數中獲取到視頻的資源位置video_url=deal_video(id)dic = {'name': name,'url': video_url}save_video(video_url,name)urls.append(dic)

解析：

獲取主頁的text，然后通過xpath找到所以的視頻<a>標簽，for循環標簽，獲得href中的id。存儲url與名字。通過視頻id進入deal_video函數

在url后動態添加視頻id，一個作為訪問源url，表示從這個頁面向url1發起請求，請求頭需要攜帶Referer。通過字典查找獲得srcUrl中的視頻鏈接，并將其數字部分替換為cont-id(KEY)。返回視頻的url。

獲取視頻鏈接后進入保存函數。

向視頻鏈接發起請求保存到文件夾中

本文來自互聯網用戶投稿，該文觀點僅代表作者本人，不代表本站立場。本站僅提供信息存儲空間服務，不擁有所有權，不承擔相關法律責任。
如若轉載，請注明出處：http://www.pswp.cn/news/711500.shtml
繁體地址，請注明出處：http://hk.pswp.cn/news/711500.shtml
英文地址，請注明出處：http://en.pswp.cn/news/711500.shtml

如若內容造成侵權/違法違規/事實不符，請聯系多彩編程網進行投訴反饋email:809451989@qq.com，一經查實，立即刪除！