Python Web Scraping: The urllib Library (Part 3)

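Ajax GET requests

Fetching the first page of Douban movie data and saving it locally

Fetching the first ten pages of Douban movie data

Ajax POST requests

Summary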

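Ajax GET requests

Fetching the first page of Douban movie data and saving it locally

First, find the endpoint that returns the data in the browser (for example, in the Network panel of the developer tools).

The request URL can then be read from the headers of that request.

Add the User-Agent (UA) header as well,

customize the request object with it, and send the request the way a browser would.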
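Before sending anything, it can help to inspect what a customized request object holds. The following is a minimal sketch that builds the request without contacting the server; it assumes only the standard library, and it relies on the fact that `urllib.request.Request` stores header names in capitalized form (`User-agent`).

```python
import urllib.request

# Same endpoint as in this article; nothing is sent in this sketch.
url = 'https://movie.douban.com/j/chart/top_list?type=13&interval_id=100%3A90&action=&start=0&limit=20'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36'
}

# Building the Request only prepares it; urlopen() is what would send it.
request = urllib.request.Request(url=url, headers=headers)

method = request.get_method()                   # GET, because no data was supplied
user_agent = request.get_header('User-agent')   # header names are stored capitalized
print(method)
print(user_agent)
```

Because no `data` argument is given, the request is a GET; supplying `data` would turn it into a POST.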

The full code is as follows:

# GET request
# Fetch the first page of Douban movie data and save it locally
import urllib.request

url = 'https://movie.douban.com/j/chart/top_list?type=13&interval_id=100%3A90&action=&start=0&limit=20'

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36'
}

# Customize the request object
request = urllib.request.Request(url=url, headers=headers)

# Send the request as a browser would and read the response data
response = urllib.request.urlopen(request)
content = response.read().decode('utf-8')
# print(content)

# Save the data locally.
# open() uses the platform's default encoding (GBK on Chinese-language Windows),
# but the content was decoded as UTF-8 above, so the encoding
# must be set to utf-8 explicitly here.
with open('douban.json', 'w', encoding='utf-8') as fp:
    fp.write(content)

The data is now saved to douban.json.
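As a quick sanity check, the saved text can be parsed as JSON. The sketch below uses a hard-coded sample string in place of the real file, and the field names `title` and `score` are assumptions made for illustration; the actual response may use different keys.

```python
import json

# Hypothetical sample standing in for the contents of douban.json.
# The keys "title" and "score" are assumed for illustration only.
sample = '[{"title": "Movie A", "score": "9.1"}, {"title": "Movie B", "score": "8.7"}]'

# json.loads turns the JSON text into Python lists and dicts.
movies = json.loads(sample)
titles = [movie['title'] for movie in movies]

print(len(movies))
print(titles)
```

For the real file, the same idea applies with `json.load(fp)` on a file opened with `encoding='utf-8'`.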

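Fetching the first ten pages of Douban movie data

First, find the request URL used the first time the page loads more data:

https://movie.douban.com/j/chart/top_list?type=13&interval_id=100%3A90&action=&start=0&limit=20

Then the second:

https://movie.douban.com/j/chart/top_list?type=13&interval_id=100%3A90&action=&start=20&limit=20

Then the third:

https://movie.douban.com/j/chart/top_list?type=13&interval_id=100%3A90&action=&start=40&limit=20

Look at the parameters at the end of these URLs and the pattern is clear: start increases by limit on every request, so the results are fetched one page at a time by shifting the starting index. This kind of offset-based paging is also common in Java development, so it is easy to see how the query works.

So the value of start is: (page - 1) * 20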
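The formula can be checked against the three URLs above with a small sketch. The helper names `start_for_page` and `query_for_page` are hypothetical and exist only for this illustration.

```python
import urllib.parse

PAGE_SIZE = 20  # matches limit=20 in the URLs above


def start_for_page(page):
    """Return the start offset for a 1-based page number."""
    return (page - 1) * PAGE_SIZE


def query_for_page(page):
    """Build the start/limit part of the query string for one page."""
    return urllib.parse.urlencode({'start': start_for_page(page), 'limit': PAGE_SIZE})


# Pages 1, 2 and 3 should reproduce the offsets seen in the browser.
offsets = [start_for_page(page) for page in range(1, 4)]
print(offsets)
print(query_for_page(3))
```

The offsets for pages 1 to 3 come out as 0, 20 and 40, which matches the start values in the captured URLs.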

With that, we can write the following code:

# GET request
# Download the first ten pages of Douban movie data
import urllib.request
import urllib.parse


def create_request(page):
    """Build the request object for the given page."""
    base_url = 'https://movie.douban.com/j/chart/top_list?type=13&interval_id=100%3A90&action=&'
    data = {
        'start': (page - 1) * 20,
        'limit': 20
    }
    # For a GET request the parameters are appended to the URL as a string
    data = urllib.parse.urlencode(data)
    url = base_url + data
    print(url)
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36'
    }
    request = urllib.request.Request(url=url, headers=headers)
    return request


def get_content(request):
    """Send the request and return the response body as text."""
    response = urllib.request.urlopen(request)
    content = response.read().decode('utf-8')
    return content


def down_load(page, content):
    """Write the content of one page to a local file."""
    with open('douban_' + str(page) + '.json', 'w', encoding='utf-8') as fp:
        fp.write(content)


# Main entry point
if __name__ == '__main__':
    start_page = int(input('Enter the start page: '))
    end_page = int(input('Enter the end page: '))
    for page in range(start_page, end_page + 1):
        # Each page gets its own customized request object
        request = create_request(page)
        # Fetch the response data
        content = get_content(request)
        # Save it locally
        down_load(page, content)

Running this saves the data for every requested page, one file per page.

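Ajax POST requests

Scraping restaurant locations from the official KFC website

Why is this data sent by Ajax? Because the request involves the core Ajax object, XMLHttpRequest.

With the URL and headers in hand, we get the code below. Nothing here is new; it simply combines what was covered earlier.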

# POST request
# KFC official website
import urllib.request
import urllib.parse

# Page 1
# https://www.kfc.com.cn/kfccda/ashx/GetStoreList.ashx?op=cname
# cname: 哈爾濱
# pid:
# pageIndex: 1
# pageSize: 10

# Page 2
# https://www.kfc.com.cn/kfccda/ashx/GetStoreList.ashx?op=cname
# cname: 哈爾濱
# pid:
# pageIndex: 2
# pageSize: 10


def create_request(page):
    """Build the POST request object for the given page."""
    base_url = 'https://www.kfc.com.cn/kfccda/ashx/GetStoreList.ashx?op=cname'
    data = {
        'cname': '哈爾濱',
        'pid': '',
        'pageIndex': page,
        'pageSize': '10'
    }
    # POST data must be URL-encoded and then converted to bytes
    data = urllib.parse.urlencode(data).encode('utf-8')
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36'
    }
    request = urllib.request.Request(url=base_url, headers=headers, data=data)
    return request


def get_content(request):
    """Send the request and return the response body as text."""
    response = urllib.request.urlopen(request)
    content = response.read().decode('utf-8')
    return content


def down_load(page, content):
    """Write the content of one page to a local file."""
    with open('KFC' + str(page) + '.json', 'w', encoding='utf-8') as fp:
        fp.write(content)


if __name__ == '__main__':
    start_page = int(input('Enter the start page: '))
    end_page = int(input('Enter the end page: '))
    for page in range(start_page, end_page + 1):
        # Customize the request object
        request = create_request(page)
        # Fetch the response content
        content = get_content(request)
        # Save the content locally
        down_load(page, content)
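The main difference from the GET version is how the parameters travel: they go in the request body as bytes instead of being appended to the URL. The sketch below builds the request without sending it, so it makes no network call; it illustrates the encoding step and is not a complete scraper.

```python
import urllib.parse
import urllib.request

base_url = 'https://www.kfc.com.cn/kfccda/ashx/GetStoreList.ashx?op=cname'
form = {'cname': '哈爾濱', 'pid': '', 'pageIndex': 1, 'pageSize': '10'}

# urlencode returns a str with non-ASCII values percent-encoded ...
encoded = urllib.parse.urlencode(form)
# ... and the request body must be bytes, hence the extra encode step.
body = encoded.encode('utf-8')

# Supplying data makes urllib treat the request as a POST.
request = urllib.request.Request(url=base_url, data=body)
method = request.get_method()

print(type(encoded).__name__, type(body).__name__)
print(method)
```

Omitting `.encode('utf-8')` would leave the body as a `str`, which `urlopen` rejects for POST data.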

Summary

I'm tired, so no summary this time. See you, everyone ヾ(￣▽￣) Bye~Bye~
