Basic Usage of Python Web Scrapers

Python web scrapers are usually built on third-party libraries.

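Import the third-party library (for example, import requests).
requests is a third-party library for making HTTP requests. Check in your Python interpreter settings whether it is installed; if it is missing, click + to install it (or run pip install requests).
Get the target URL. It must be the exact address of the resource: for an image on the Baidu site, for instance, that means the address of the image itself, not www.baidu.com. You can find it by clicking the image to open it, or by using the browser's built-in Inspect tool.
Then fetch the page with requests.get and set the encoding that suits the page.
Use a = re.findall('<tag>regular expression</tag>', text), where text is the string to search (for example resp.text) and re is the standard-library regex module, to locate the content you want by its HTML tags and assign it to a variable a. The extracted data can then be saved to a file in whatever data type you need.
Use print to output the result and check it.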
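The re.findall step above can be tried offline before pointing it at a live site. The sketch below is only an illustration: sample_html is a made-up fragment standing in for resp.text, and extract_pairs is a hypothetical helper name, so a real page will need its own patterns.

```python
import re

# Hypothetical HTML fragment standing in for resp.text; a real page will differ.
sample_html = (
    '<ul>'
    '<li><span class="name">北京</span><span class="weather">晴</span></li>'
    '<li><span class="name">上海</span><span class="weather">多云</span></li>'
    '</ul>'
)


def extract_pairs(html):
    """Return [city, weather] pairs found in the given HTML text."""
    # With a single capture group, re.findall returns only the captured text.
    cities = re.findall(r'<span class="name">([\u4e00-\u9fa5]*)</span>', html)
    weather = re.findall(r'<span class="weather">([\u4e00-\u9fa5]*)</span>', html)
    # zip stops at the shorter list, so unmatched leftovers are dropped.
    return [[c, w] for c, w in zip(cities, weather)]


pairs = extract_pairs(sample_html)
for pair in pairs:
    print(pair)
```

If findall returns an empty list on a real page, the usual cause is that the tag or class in the pattern does not appear in the downloaded HTML exactly as written.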

An example follows:

import re
import requests

url = "http://weather.com.cn/weather1d/101010100.shtml#search"
resp = requests.get(url)          # send an HTTP GET request to this address
resp.encoding = 'UTF-8'
print(resp)       # prints e.g. <Response [200]>; 200 means success, 500 means a server error
print(resp.text)  # show the page as HTML source
city = re.findall('<span class="name">([\u4e00-\u9fa5]*)</span>', resp.text)
weather = re.findall('<span class="weather">([\u4e00-\u9fa5]*)</span>', resp.text)
# The patterns above only match if that markup actually exists in resp.text
lst = []
for a, b in zip(city, weather):  # pair up the scraped values and collect them in a list
    lst.append([a, b])
for i in lst:
    print(i)
# Scraping an image works much the same way
url = "https://uhf.microsoft.com/images/microsoft/RE1Mu3b.png"
resp = requests.get(url)
# print(resp.content)
with open('logo.png', 'wb') as f:
    f.write(resp.content)
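The image download above can be wrapped in a small function that fails loudly instead of silently saving an error page. This is a rough sketch rather than the author's method: filename_from_url and download are hypothetical helper names, and the 10-second timeout is an arbitrary assumption.

```python
import posixpath
from urllib.parse import urlparse


def filename_from_url(url, default="download.bin"):
    """Derive a local file name from the last path segment of a URL."""
    name = posixpath.basename(urlparse(url).path)
    return name or default


def download(url, dest=None, timeout=10):
    """Fetch url and write the raw response bytes to dest; return the path."""
    # Third-party library; imported here so filename_from_url works without it.
    import requests

    resp = requests.get(url, timeout=timeout)
    resp.raise_for_status()  # raises requests.HTTPError on 4xx/5xx responses
    dest = dest or filename_from_url(url)
    with open(dest, "wb") as f:
        f.write(resp.content)  # resp.content is the body as raw bytes
    return dest


print(filename_from_url("https://uhf.microsoft.com/images/microsoft/RE1Mu3b.png"))
```

Calling download(url) with the image address from the example would save the file under the name taken from the URL, provided the request succeeds.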
