I. Identify the target data
1. Open the target site, navigate to the page that holds the target data, and click "逛店鋪" (browse shops).
2. Find the API or page that serves the target data
Press F12 to open the browser's developer tools, then search for a keyword from the page to locate the API (or page) whose response contains it.
3. Inspect the request parameters
1) Request parameters: the request carries encrypted sign and token parameters.
2) Pagination: the position parameter changes between pages; 1_0_0 means page 1 and 2_0_0 means page 2 (a small helper for this format is sketched below).
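From this observation, the page number appears to live in the first field of position while the other two fields stay 0. A minimal sketch of a page-marker helper, assuming that format holds (build_position is a hypothetical name, not from the original code):

```python
def build_position(page: int) -> str:
    # Observed format: "1_0_0" is page 1, "2_0_0" is page 2; the last
    # two fields were always 0 in the captured requests.
    return f"{page}_0_0"

# Markers for the first three pages
print([build_position(p) for p in range(1, 4)])  # ['1_0_0', '2_0_0', '3_0_0']
```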
II. Request the API
Use the requests library to call the API and return the data:
import time
import requests

# Method of the crawler class; self.uri, self.token, self.header and
# self.log_ are initialised elsewhere in that class.
def get_shop_list(self, per=10, position='1_0_0'):
    '''Get the shop list
    :param per: number of items per page
    :param position: start position (page marker, e.g. "1_0_0")
    :return: list of shop dicts, or [] on failure
    '''
    try:
        url = self.uri + "/druggmp/index/shopList"
        # Common fields sent in the query string
        params = {
            "traderName": "yaoex_pc",
            "trader": "pc",
            "closesignature": "yes",
            "timestamp": int(time.time() * 1000),
        }
        # Form body: the common fields again, plus token and paging
        data = {
            "traderName": "yaoex_pc",
            "trader": "pc",
            "closesignature": "yes",
            "timestamp": int(time.time() * 1000),
            "token": self.token,
            "queryAll": "yes",
            "isSearch": "yes",
            "per": per,
            "position": position,
        }
        self.log_.info(f"Request payload: {data}")
        resp = requests.post(url, headers=self.header, params=params, data=data).json()
        self.log_.info(f"Shops returned: {len(resp['data']['shopList'])}")
        return resp['data']['shopList']
    except Exception as e:
        self.log_.error(str(e))
        return []
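Note that params travels in the query string while data is the form body; both repeat the common trader fields, and the token rides in the body. A quick usage sketch (crawler stands for an instance of the crawler class and is an assumption here, since the listing only shows the method):

```python
# Hypothetical instance of the crawler class shown above.
first_page = crawler.get_shop_list(per=10, position='1_0_0')   # page 1
second_page = crawler.get_shop_list(per=10, position='2_0_0')  # page 2
```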
III. Parse the data
Iterate over the returned JSON and extract the target fields from each shop record:
# Excerpt from inside the paging loop (hence the break below).
# Get the shop list
shop_list = self.get_shop_list(per=10, position=position)
if not len(shop_list):
    self.log_.info('All pages crawled, done!')
    break
# Iterate over the shops
for shop_ in shop_list:
    # Shop id
    shop_id = shop_['enterpriseId']
    # Shop name
    shop_name = shop_['shopName']
    # Shop logo
    logo = shop_['logo']
    # Self-operated flag ('自營' means self-operated)
    self_str = shop_['shopExtTypeText']
    if self_str and self_str == '自營':
        is_self = 1
    else:
        is_self = 0
    # City
    if 'shipAddress' in shop_:
        city = shop_['shipAddress']
    else:
        city = ''
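Since shop records can omit fields (the shipAddress check above already guards against one such case), a slightly more defensive variant reads every field with dict.get(); parse_shop is a hypothetical helper, not part of the original code:

```python
def parse_shop(shop_: dict) -> dict:
    # dict.get() returns a default instead of raising KeyError
    # when a field is missing from the record.
    return {
        "shop_id": shop_.get("enterpriseId"),
        "shop_name": shop_.get("shopName", ""),
        "logo": shop_.get("logo", ""),
        "is_self": 1 if shop_.get("shopExtTypeText") == "自營" else 0,
        "city": shop_.get("shipAddress", ""),
    }
```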
IV. Store the data
After parsing, the fields are assembled into a replace statement and persisted into a MySQL table:
sql = f'''replace into yyc_shop(shop_id,shop_name,logo,shelves,is_self,biz_code,biz_url,yao_url,qs_url,official_name,province,city)
values('{shop_id}','{shop_name}','{logo}',{shelves},{is_self},'{biz_code}','{biz_url}','{yao_url}','{qs_url}','{official_name}','{province}','{city}')'''
self.log_.info(f"Insert SQL: {sql}")
self.base_.mysql_data(sql)
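One caveat: because the values are interpolated straight into the f-string, a stray quote in a shop name would break the statement. A hedged sketch of a parameterized alternative, assuming pymysql and placeholder connection details:

```python
import pymysql

# Placeholder credentials; substitute your own connection details.
conn = pymysql.connect(host="localhost", user="root",
                       password="***", database="spider")
sql = ("replace into yyc_shop(shop_id,shop_name,logo,shelves,is_self,"
       "biz_code,biz_url,yao_url,qs_url,official_name,province,city) "
       "values (%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)")
with conn.cursor() as cur:
    # The driver escapes each value, so quotes in names are handled safely.
    cur.execute(sql, (shop_id, shop_name, logo, shelves, is_self, biz_code,
                      biz_url, yao_url, qs_url, official_name, province, city))
conn.commit()
```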
V. Complete code
The complete code is as follows:
import time
import requests

# Method of the crawler class; self.uri, self.token, self.header and
# self.log_ are initialised elsewhere in that class.
def get_shop_list(self, per=10, position='1_0_0'):
    '''Get the shop list
    :param per: number of items per page
    :param position: start position (page marker, e.g. "1_0_0")
    :return: list of shop dicts, or [] on failure
    '''
    try:
        url = self.uri + "/druggmp/index/shopList"
        # Common fields sent in the query string
        params = {
            "traderName": "yaoex_pc",
            "trader": "pc",
            "closesignature": "yes",
            "timestamp": int(time.time() * 1000),
        }
        # Form body: the common fields again, plus token and paging
        data = {
            "traderName": "yaoex_pc",
            "trader": "pc",
            "closesignature": "yes",
            "timestamp": int(time.time() * 1000),
            "token": self.token,
            "queryAll": "yes",
            "isSearch": "yes",
            "per": per,
            "position": position,
        }
        self.log_.info(f"Request payload: {data}")
        resp = requests.post(url, headers=self.header, params=params, data=data).json()
        self.log_.info(f"Shops returned: {len(resp['data']['shopList'])}")
        return resp['data']['shopList']
    except Exception as e:
        self.log_.error(str(e))
        return []

# The block below runs inside the paging loop (hence the break).
# Get the shop list
shop_list = self.get_shop_list(per=10, position=position)
if not len(shop_list):
    self.log_.info('All pages crawled, done!')
    break
# Iterate over the shops
for shop_ in shop_list:
    # Shop id
    shop_id = shop_['enterpriseId']
    # Shop name
    shop_name = shop_['shopName']
    # Shop logo
    logo = shop_['logo']
    # Self-operated flag ('自營' means self-operated)
    self_str = shop_['shopExtTypeText']
    if self_str and self_str == '自營':
        is_self = 1
    else:
        is_self = 0
    # City
    if 'shipAddress' in shop_:
        city = shop_['shipAddress']
    else:
        city = ''
    # Number of items the shop has listed
    shelves = self.get_shop_drug_count(shop_id=shop_id)
    # Shop certificate info
    shop_info = self.get_shopcert(shop_id=shop_id)
    # Address
    address = shop_info['data']['baseInfo']['address']
    # Province: split the address on the city name, or on '省'/'市'
    try:
        if city and city in address:
            province = address.split(city)[0]
        else:
            provs = address.split('省')
            province = provs[0]
            city = provs[1].split('市')[0]
    except:
        province = ''
    # Supplier's full registered name
    official_name = shop_info['data']['baseInfo']['enterpriseName']
    # Certificate image list
    img_files = shop_info['data']['files']
    # Business licence ('營業執照')
    biz_url = ''
    # Trade licence ('經營許可證')
    yao_url = ''
    # Quality-system questionnaire ('質量體系調查表')
    qs_url = ''
    if len(img_files):
        for i in img_files:
            if '營業執照' in i['typeName']:
                biz_url = i['filePath']
            if '經營許可證' in i['typeName']:
                yao_url = i['filePath']
            if '質量體系調查表' in i['typeName']:
                qs_url = i['filePath']
    # Extract the business-licence number from the licence image
    biz_code = ''
    if biz_url:
        biz_code = self.get_shop_biz_code(img_link=biz_url)
    # Replace-insert into the database
    sql = f'''replace into yyc_shop(shop_id,shop_name,logo,shelves,is_self,biz_code,biz_url,yao_url,qs_url,official_name,province,city)
    values('{shop_id}','{shop_name}','{logo}',{shelves},{is_self},'{biz_code}','{biz_url}','{yao_url}','{qs_url}','{official_name}','{province}','{city}')'''
    self.log_.info(f"Insert SQL: {sql}")
    self.base_.mysql_data(sql)
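For completeness, the break in the listing implies an enclosing paging loop that the excerpt does not show. A minimal sketch of such a driver, with crawl_shops as a hypothetical method name and the loop body standing in for the parsing and storage code above:

```python
def crawl_shops(self):
    page = 1
    while True:
        position = f"{page}_0_0"  # page-marker format observed earlier
        shop_list = self.get_shop_list(per=10, position=position)
        if not shop_list:
            self.log_.info('All pages crawled, done!')
            break
        for shop_ in shop_list:
            ...  # parse and store each shop as in the listing above
        page += 1
```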
VI. Summary
A Python crawler boils down to three main steps:
- Request the API
- Parse the data
- Store the data
Copyright notice
This article is copyrighted by its author. Any reproduction or scraping without the author's permission is prohibited; the author reserves all rights to pursue violations.