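Oldboy (老男孩) Web Crawler Hands-On Intensive Course, Season 1, June 2018. Introductory crawler training, exercise 1: scraping news data from Autohome (汽車之家)...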

1. Introduction to crawlers

　　A crawler is a program that, given a URL, fetches information from a website.

2. Libraries used

　　requests, for downloading pages and images

　　bs4 (BeautifulSoup), for parsing the downloaded HTML

3. Content and steps

4. Code

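import os

import requests
from bs4 import BeautifulSoup

# 1. Download the page
ret = requests.get(url='https://www.autohome.com.cn/news/')
ret.encoding = ret.apparent_encoding
# print(ret.text)  # the raw HTML that was fetched

# 2. Parse: extract the parts we want with BeautifulSoup
soup = BeautifulSoup(ret.text, 'html.parser')  # 'lxml' can be used instead if installed
div = soup.find(name='div', id='auto-channel-lazyload-article')
li_list = div.find_all(name='li')

# Switch to a separate download directory ("圖片" means "images")
os.makedirs("圖片", exist_ok=True)  # does not fail if the directory already exists
os.chdir("圖片")

for li in li_list:
    # News title; <li> items without an <h3> are not articles, so skip them
    h3 = li.find(name='h3')
    if not h3:
        continue
    # News summary
    p = li.find(name='p')
    # Link to the article
    a = li.find(name='a')
    # print(a.attrs)  # all attributes of the tag, as a dict
    print(h3.text, a.get('href'), p.text)
    print('=' * 15)
    # Image link: find it and download the image
    img = li.find('img')
    if not img:
        continue
    src = img.get('src')
    # Split on the last '__' and keep the tail as the file name
    file_name = src.rsplit('__', maxsplit=1)[1]
    # src is scheme-relative here (it starts with '//'), so prepend the scheme
    ret_img = requests.get(url='https:' + src)
    with open(file_name, 'wb') as f:
        f.write(ret_img.content)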
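
The image step in the loop above relies on two assumptions about src: that it starts with '//' and that it contains '__'. The following is a minimal sketch, not part of the course code, of a more defensive version using only the standard library. The helper name image_url_and_name and the example.com URLs are hypothetical and serve only as an illustration.

```python
from urllib.parse import urljoin


def image_url_and_name(src, page_url="https://www.autohome.com.cn/news/"):
    """Return (absolute_url, file_name) for the src value of an <img> tag.

    Hypothetical helper for illustration; not part of the original script.
    """
    # urljoin resolves scheme-relative ('//host/path') and relative URLs
    # against the page URL, and leaves absolute URLs unchanged.
    absolute = urljoin(page_url, src)
    # Work on the last path segment only, so the result is never a path.
    last_segment = absolute.rsplit("/", 1)[-1]
    # Keep the text after the last '__' if there is one; [-1] falls back to
    # the whole segment instead of raising IndexError when '__' is absent.
    file_name = last_segment.rsplit("__", 1)[-1]
    return absolute, file_name


# Made-up example URLs
with_marker = image_url_and_name("//img.example.com/a/120x90_0_autohomecar__abc.jpg")
without_marker = image_url_and_name("//img.example.com/pic/photo.png")
print(with_marker)
print(without_marker)
```

In the loop, the returned URL would be passed to requests.get and the returned name to open, in place of 'https:' + src and file_name.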

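More ways to use find:

　　Besides name, find can match on:

　　　　1. the id and class_ keyword arguments (class_ has a trailing underscore because class is a reserved word in Python)

　　　　2. the attrs argument, a dict of attribute names and values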
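
The two styles can be compared on a small inline document. This is a minimal sketch: the HTML snippet, its ids, class names, and the data-kind attribute are invented for illustration and do not come from the Autohome page.

```python
from bs4 import BeautifulSoup

# Made-up HTML for illustration only
html = """
<div id="news" class="article-list">
  <a href="/a.html" class="title" data-kind="lead">First</a>
  <a href="/b.html" class="title">Second</a>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# 1. Keyword arguments: id and class_
by_id = soup.find(name="div", id="news")
by_class = soup.find(name="a", class_="title")  # first match only

# 2. attrs dict: useful for attribute names that are not valid
#    Python identifiers, such as data-kind
by_attrs = soup.find(name="a", attrs={"data-kind": "lead"})
same_div = soup.find(name="div", attrs={"id": "news", "class": "article-list"})

print(by_id.get("class"))
print(by_class.text, by_attrs.get("href"))
```

find returns the first matching tag, or None when nothing matches; find_all accepts the same arguments and returns every match.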

Reposted from: https://www.cnblogs.com/yhstcxx/p/10946511.html
