Web Scraping Notes (requests)

1. requests: Basic Usage

pip install requests -i https://pypi.douban.com/simple

One type and six attributes

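r.text	the page source, decoded to a string
r.encoding	read or set the encoding used to decode the response
r.url	the URL of the request
r.content	the response body as bytes
r.status_code	the HTTP status code of the response
r.headers	the response headers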

import requests

url = 'http://www.baidu.com'
response = requests.get(url=url)

# One type and six attributes
# The Response type
print(type(response))
# Set the encoding used to decode the response
response.encoding = 'utf-8'
# The page source as a string
print(response.text)
# The URL of the request
print(response.url)
# The body as raw bytes
print(response.content)
# The status code of the response
print(response.status_code)
# The response headers
print(response.headers)
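
A small offline sketch of why the example sets response.encoding before reading response.text: text is simply content decoded with encoding. When a server sends a text/* Content-Type without a charset, requests falls back to ISO-8859-1, which garbles UTF-8 pages. The byte string below is a stand-in for response.content and is an assumption made for illustration, not the output of a real request.

```python
# Stand-in for response.content: the UTF-8 bytes of a short Chinese string.
# (Illustrative data only; no request is sent.)
raw = '百度一下'.encode('utf-8')

# Roughly what response.text holds if the encoding is taken to be ISO-8859-1
garbled = raw.decode('ISO-8859-1')

# What response.text holds after response.encoding = 'utf-8'
decoded = raw.decode('utf-8')

print(repr(garbled))
print(decoded)
```

The bytes are identical in both cases; only the codec used to turn them into text changes.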

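2. requests: GET Requests

Customizing parameters

1. Pass the parameters with params.

2. The parameters do not need to be urlencoded by hand.

3. No custom request object is needed.

4. The ? at the end of the resource path is optional.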


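# urllib
# 1 one type and six methods
# 2 GET requests
# 3 POST requests
# 4 AJAX GET requests
# 5 AJAX POST requests
# 6 cookie login
# 7 proxies

# requests
# 1 one type and six attributes
# 2 GET requests
# 3 POST requests
# 4 proxies
# 5 cookies and CAPTCHAs

import requests

url = 'http://www.baidu.com/s?'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36'
}
data = {
    'wd': '北京'
}

# url: the resource path
# params: the query parameters
# kwargs: a dict of additional keyword arguments, such as headers
response = requests.get(url=url, params=data, headers=headers)
content = response.text
print(content)

# Summary
# Parameters are passed with params
# Parameters do not need to be urlencoded by hand
# No custom request object is needed
# The ? at the end of the resource path is optional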
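
To check points 2 and 4 without sending anything over the network, one option is to build the request and inspect the URL that requests would use. This is a minimal sketch that relies on requests.Request(...).prepare(); the variable names with_mark and without_mark are made up for illustration.

```python
import requests

params = {'wd': '北京'}

# Build (but do not send) the same GET request with and without the trailing '?'
with_mark = requests.Request('GET', 'http://www.baidu.com/s?', params=params).prepare()
without_mark = requests.Request('GET', 'http://www.baidu.com/s', params=params).prepare()

# requests percent-encodes the Chinese value itself
print(with_mark.url)
print(without_mark.url)
```

Both prepared requests end up with the same URL, and the Chinese value is already percent-encoded, so no manual urlencode step is required.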

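3. requests: POST Requests

How do GET and POST differ?

1: A GET request passes its parameters through params; a POST request passes them through data.

2: The ? after the resource path can be omitted.

3: No manual encoding or decoding is needed.

4: No custom request object is needed.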


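import requests

url = 'https://fanyi.baidu.com/sug'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36'}
data = {'kw': 'eye'}

# url: the request address
# data: the request parameters
# kwargs: a dict of additional keyword arguments, such as headers
response = requests.post(url=url, data=data, headers=headers)
content = response.text
# Parse the JSON body into Python objects
obj = response.json()
print(obj)

# Summary
# 1 A POST request needs no manual encoding or decoding
# 2 The parameters of a POST request go in data
# 3 No custom request object is needed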
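
As a rough illustration of point 1 in the summary, the request can be prepared without being sent, which shows the form encoding that requests applies to data. This sketch reuses the URL and the kw parameter from the example above; prepared is just an illustrative variable name.

```python
import requests

# Build (but do not send) the POST request from the example above
prepared = requests.Request(
    'POST',
    'https://fanyi.baidu.com/sug',
    data={'kw': 'eye'},
).prepare()

# The dict has been form-encoded into the body, and the matching header is set
print(prepared.body)
print(prepared.headers['Content-Type'])
```

Unlike params, data does not touch the URL: the parameters travel in the request body.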

4. Fetching Data with requests and XPath

Example: extract the text of the 百度一下 button

The code is as follows:


# Fetch data with requests and XPath
from lxml import etree
import requests

url = 'https://www.baidu.com/'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) Chrome/65.0.3325.181'}
response = requests.get(url, headers=headers)

# Set the encoding used to decode the response
response.encoding = 'utf-8'

# Get the page source
content = response.content.decode('utf-8')
# print(content)

# Parse with XPath
html = etree.HTML(content, parser=etree.HTMLParser(encoding='utf-8'))
result = html.xpath('//*[@id="su"]/@value')[0]
print(result)

Output: 百度一下
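
The XPath step can also be tried offline. The sketch below runs the same expression against a hand-written HTML snippet; sample_html is a hypothetical stand-in for the downloaded page, not Baidu's actual markup. It also guards against an empty result, since indexing [0] directly raises IndexError when nothing matches.

```python
from lxml import etree

# Hypothetical stand-in for the downloaded page; only the queried element is included
sample_html = '<html><body><input type="submit" id="su" value="百度一下"></body></html>'

html = etree.HTML(sample_html)

# xpath() returns a list of matches, here the value attribute of the element with id="su"
values = html.xpath('//*[@id="su"]/@value')

# Take the first match if there is one, otherwise fall back to None
button_text = values[0] if values else None
print(button_text)
```

If the live page is served in a different form (for example to an unrecognized User-Agent), the list may come back empty, which is the case the fallback covers.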
