Requests is a widely used and efficient Python library for fetching web pages.
1. A simple example
import requests  # import the requests library
r = requests.get("http://www.baidu.com")  # call get() to fetch the page
print(r.status_code)  # print the status code
print(r.text)  # print the page content
The code above returns a Response object.
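2. A general code framework
import requests

def getHtmlText(url):
    try:
        r = requests.get(url, timeout=30)  # set the URL and the timeout in seconds
        r.raise_for_status()  # raises HTTPError if the status code is an error (4xx or 5xx)
        r.encoding = r.apparent_encoding  # apparent_encoding is the encoding detected from the page content
        return r.text
    except requests.RequestException:  # base class of the exceptions raised by requests
        return "An exception occurred"

if __name__ == "__main__":
    url = 'http://www.baidu.com'
    print(getHtmlText(url))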
3. The requests library in detail
3.1 Response attributes
Logical structure of the attributes:
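As a rough illustration of the Response attributes used earlier in this article (status_code, text, encoding, apparent_encoding) plus content and headers, the sketch below fetches a page from a small local server so that it does not depend on an external site. The DemoHandler class and the page body are hypothetical and exist only for this example.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical local handler so the example needs no external site."""

    def do_GET(self):
        body = "<html><body>你好, requests</body></html>".encode("utf-8")
        self.send_response(200)
        # No charset is declared on purpose, so requests has to guess.
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


server = HTTPServer(("127.0.0.1", 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/"

r = requests.get(url, timeout=5)
print(r.status_code)              # HTTP status code of the response
print(r.headers["Content-Type"])  # response headers behave like a dict
print(r.encoding)                 # encoding derived from the headers
print(r.apparent_encoding)        # encoding guessed from the body; the guess may vary
print(type(r.text))               # body decoded to str using r.encoding
print(type(r.content))            # raw body as bytes

server.shutdown()
server.server_close()
```

Because the header declares no charset, r.encoding falls back to a default for text content, which is why the general framework assigns r.apparent_encoding to r.encoding before reading r.text.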
3.2 requests methods
requests methods compared with the HTTP protocol
Note: these methods take roughly three kinds of arguments, with slight differences between them.
3.2.1 get method
r = requests.get('http://www.baidu.com')
print(r.text)
3.2.2 head method
r = requests.head('http://www.baidu.com')
print(r.headers)
3.2.3 post method
payload = {'key1': 'value1', 'key2': 'value2'}
r = requests.post('http://httpbin.org/post', data=payload)
print(r.text)
# Output
{..."form": {"key1": "value1", "key2": "value2"},
...}
3.2.4 put method
payload = {'key1': 'value1', 'key2': 'value2'}
r = requests.put('http://httpbin.org/put', data=payload)
print(r.text)  # passing a dict to the URL encodes it automatically as a form
# passing a string instead sends it as data (the raw request body)
# Output
{..."form": {"key1": "value1", "key2": "value2"},
...}
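To see how the data argument is encoded without sending anything over the network, one option is to build the request with requests.Request and call prepare(), then inspect the result. This is only a sketch: the httpbin.org URL is reused from the examples above, and the string 'hello' is an arbitrary sample value.

```python
import requests

url = "http://httpbin.org/post"

# A dict passed as data is form-encoded.
form_req = requests.Request(
    "POST", url, data={"key1": "value1", "key2": "value2"}
).prepare()
print(form_req.headers["Content-Type"])  # application/x-www-form-urlencoded
print(form_req.body)                     # key1=value1&key2=value2

# A string passed as data is used as the body unchanged.
raw_req = requests.Request("POST", url, data="hello").prepare()
print(raw_req.body)                          # hello
print(raw_req.headers.get("Content-Type"))   # None
```

The same encoding rules apply to put, since both methods accept the data argument.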
3.2.5 request method: constructing a request
requests.request(method, url, **kwargs)
# method: the request method, one of the seven such as GET/PUT/POST
# url: the link to request
# **kwargs: 13 parameters that control access
Request methods:
requests.request('GET', url, **kwargs)
requests.request('HEAD', url, **kwargs)
requests.request('POST', url, **kwargs)
requests.request('PUT', url, **kwargs)
requests.request('PATCH', url, **kwargs)
requests.request('DELETE', url, **kwargs)
requests.request('OPTIONS', url, **kwargs)
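The seven calls above can be tried against a small local server that reports which HTTP method it received. This is a minimal sketch: EchoMethodHandler and the X-Method header are hypothetical names invented for the example, not part of requests.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests

METHODS = ["GET", "HEAD", "POST", "PUT", "PATCH", "DELETE", "OPTIONS"]


class EchoMethodHandler(BaseHTTPRequestHandler):
    """Hypothetical local handler that reports the HTTP method it received."""

    def _reply(self):
        self.send_response(200)
        self.send_header("X-Method", self.command)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo output quiet


# Register the same reply for every method (do_GET, do_HEAD, ...).
for name in METHODS:
    setattr(EchoMethodHandler, "do_" + name, EchoMethodHandler._reply)

server = HTTPServer(("127.0.0.1", 0), EchoMethodHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/"

seen = []
for name in METHODS:
    r = requests.request(name, url, timeout=5)
    seen.append(r.headers["X-Method"])
    print(name, r.status_code, r.headers["X-Method"])

# A shortcut function sends the same method as the generic call.
shortcut = requests.delete(url, timeout=5)
print(shortcut.headers["X-Method"])

server.shutdown()
server.server_close()
```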
**kwargs in detail:
kv = {'key1': 'value1', 'key2': 'value2'}  # data (dict)
r = requests.request('POST', 'http://python123.io/ws', data=kv)
data1 = 'hellowrld'  # data (string)
r = requests.request('POST', 'http://python123.io/ws', data=data1)
jso = {'key1': 'value1'}  # json
r = requests.request('POST', 'http://python123.io/ws', json=jso)
hd = {'key1': 'value1'}  # headers
r = requests.request('POST', 'http://python123.io/ws', headers=hd)
fs = {'file': open('data.xls', 'rb')}  # files
r = requests.request('POST', 'http://python123.io/ws', files=fs)
r = requests.request('POST', 'http://python123.io/ws', timeout=10)  # timeout, in seconds
pxs = {'http': 'http://usr:pass@10.10.10.1:1234', 'https': 'https://10.10.10.1:4321'}  # proxies
r = requests.request('GET', 'http://www.baidu.com', proxies=pxs)
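The effect of some of these keyword arguments can be checked offline by preparing a request and inspecting it instead of sending it. The sketch below covers params (appended to the URL as a query string), json, and headers; it reuses the python123.io URL from the examples above purely as a placeholder, and no request is actually sent.

```python
import json

import requests

url = "http://python123.io/ws"
kv = {"key1": "value1", "key2": "value2"}

# params: the dict is appended to the URL as a query string.
p = requests.Request("GET", url, params=kv).prepare()
print(p.url)  # http://python123.io/ws?key1=value1&key2=value2

# json: the object is serialized into the body and the content type is set.
j = requests.Request("POST", url, json={"key1": "value1"}).prepare()
print(j.headers["Content-Type"])  # application/json
print(json.loads(j.body))         # {'key1': 'value1'}

# headers: custom headers are added to the request.
h = requests.Request("POST", url, headers={"key1": "value1"}).prepare()
print(h.headers["key1"])          # value1
```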
3.2.6 delete method
3.2.7 patch method
3.3 The difference between PATCH and PUT
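In the usual HTTP semantics, PUT replaces the whole resource with the submitted representation, while PATCH changes only the submitted fields. The sketch below models this with a plain dict rather than a real server; apply_put, apply_patch, and the sample record are hypothetical names used only for illustration.

```python
def apply_put(stored, payload):
    """PUT: the payload becomes the new resource; missing fields are dropped."""
    return dict(payload)


def apply_patch(stored, payload):
    """PATCH: only the submitted fields change; the rest are kept."""
    updated = dict(stored)
    updated.update(payload)
    return updated


user = {"name": "alice", "email": "alice@example.com", "city": "Taipei"}
change = {"city": "Tainan"}

print(apply_put(user, change))    # {'city': 'Tainan'}
print(apply_patch(user, change))  # {'name': 'alice', 'email': 'alice@example.com', 'city': 'Tainan'}
```

When only one field changes, PATCH therefore needs to send less data than PUT, which would have to resend every field to avoid losing the others.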
4. Exceptions in the requests library
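As a hedged sketch of how some of these exceptions can be told apart, the example below triggers three of them locally: an HTTPError from a 404 response, a ConnectionError from a port with nothing listening, and a MissingSchema (caught here through the RequestException base class) from a URL without a scheme. NotFoundHandler and describe are hypothetical helpers written for this example.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class NotFoundHandler(BaseHTTPRequestHandler):
    """Hypothetical local handler that always answers 404."""

    def do_GET(self):
        self.send_response(404)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo output quiet


def describe(url):
    """Return a short label for what happened when fetching url."""
    try:
        r = requests.get(url, timeout=5)
        r.raise_for_status()
        return "ok"
    except requests.exceptions.HTTPError:
        return "HTTPError"
    except requests.exceptions.Timeout:
        return "Timeout"
    except requests.exceptions.ConnectionError:
        return "ConnectionError"
    except requests.exceptions.RequestException:
        return "RequestException"


server = HTTPServer(("127.0.0.1", 0), NotFoundHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Reserve a port and release it so that nothing is listening there.
with socket.socket() as s:
    s.bind(("127.0.0.1", 0))
    closed_port = s.getsockname()[1]

not_found = describe(f"http://127.0.0.1:{server.server_port}/")
refused = describe(f"http://127.0.0.1:{closed_port}/")
no_scheme = describe("www.baidu.com")  # missing "http://"
print(not_found, refused, no_scheme)

server.shutdown()
server.server_close()
```

The specific exception classes are listed before RequestException because it is their common base class and would otherwise catch everything first.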
This article was compiled from material on 慕課網 (imooc) and related online resources.