[Python] Make Selenium extract elements by parsing the HTML structure, the way BeautifulSoup does!

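When I use Selenium's find_element to get page elements, I usually locate them by absolute position, through xpath, css_selector, or class_name.

But when a page gains a popup or some other extra element, the absolute position changes, and locators such as xpath have to be updated again and again.

So I wondered whether Selenium could work like BeautifulSoup: find the part you need from the structure of the HTML, then parse it out.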

Method:

Copy the css_selector of the target element.
Compare how the css_selector is built with where the element sits in the HTML hierarchy.

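    from selenium.webdriver.common.by import By

    # Find the outermost container: every <li> product card under div#List.
    # `driver` is assumed to be the Selenium WebDriver that loaded the page.
    products = driver.find_elements(By.CSS_SELECTOR, "div#List ul li")
    for product in products:
        # Extract the product link, searching only inside this product card
        link_element = product.find_element(By.CSS_SELECTOR, "div.p-name a")
        product_link = link_element.get_attribute("href")
        product_title = link_element.get_attribute("title")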

The style is similar to how you would write it with BeautifulSoup.

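To match several conditions at once, separate the selectors with a comma:

    tags_elements = product.find_elements(By.CSS_SELECTOR, "div.p-icons img, div.p-icons i")

This collects both the img and the i nodes under p-icons in a single call.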
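When several child tags share one parent, the comma-separated selector group can also be assembled in code. This is only a small sketch: `under` is a hypothetical helper name made up for illustration, not part of Selenium.

```python
def under(parent: str, *children: str) -> str:
    """Build a CSS selector group matching any of `children` below `parent`."""
    if not children:
        return parent
    # "parent child1, parent child2, ..." matches an element if ANY part matches.
    return ", ".join(f"{parent} {child}" for child in children)


selector = under("div.p-icons", "img", "i")
print(selector)  # div.p-icons img, div.p-icons i
```

The resulting string can be passed straight to product.find_elements(By.CSS_SELECTOR, selector).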

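Selecting a parent or child level:

For example, to get the img element one level below the div with class p-icons:

css_selector: #J_pro_100151669791 > img:nth-child(1)
This element sits under the div whose class is "p-icons".

In practice:

    tags_elements = product.find_elements(By.CSS_SELECTOR, "div.p-icons img:nth-child(1)")

Note that a space in a CSS selector matches descendants at any depth. To restrict the match to direct children, as in the copied selector, write "div.p-icons > img:nth-child(1)".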

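To extract a specific attribute value, such as desc above:

    tags = []
    for tag_element in tags_elements:
        # Prefer the desc attribute; fall back to the visible text
        tag = tag_element.get_attribute("desc") or tag_element.text
        if "XX超市" in tag or "五星旗艦店" in tag or "自營" in tag:
            tags.append(tag.strip())

This checks a whole batch of elements against these tag values in one pass.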
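The keyword check can be pulled out into a plain function so it can be tried without a browser. A minimal sketch, assuming the raw values have already been read from the elements; `keep_known_tags` and `KEYWORDS` are hypothetical names introduced here.

```python
KEYWORDS = ("XX超市", "五星旗艦店", "自營")


def keep_known_tags(raw_tags, keywords=KEYWORDS):
    """Return the stripped tags that contain at least one keyword."""
    kept = []
    for raw in raw_tags:
        tag = (raw or "").strip()  # tolerate None and whitespace-only values
        if tag and any(keyword in tag for keyword in keywords):
            kept.append(tag)
    return kept


print(keep_known_tags([" 自營 ", "", None, "新品", "XX超市"]))  # ['自營', 'XX超市']
```

With Selenium elements, the input could be produced as (el.get_attribute("desc") or el.text for el in tags_elements).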

The complete version:

    # is_exist_element, is_extact_element_element and is_element are the author's
    # own helper functions; their definitions are not shown in this post.
    # find_element raises NoSuchElementException when nothing matches, so wrap
    # these lookups in try/except if a field can be missing.
    for product in products:
        print()
        # Product link and product name
        link_element = product.find_element(By.CSS_SELECTOR, "div.p-name a")
        # print('Product link element:', link_element)
        product_link = link_element.get_attribute("href")  # product link
        product_title = link_element.text  # product name
        print(product_title)
        print('Product link:', product_link)
        # Price
        product_price_element = product.find_element(By.CSS_SELECTOR, "div.p-price i")
        product_price = product_price_element.text if product_price_element else "N/A"
        print(product_price)
        # Review count (#warecard_10116099611938 > div.p-commit > strong)
        comment_count_element = product.find_element(By.CSS_SELECTOR, "div.p-commit a")
        comment_count = comment_count_element.text if comment_count_element else "N/A"
        print(comment_count)
        # Shop name (#warecard_10129282745285 > div.p-shop > span)
        shop_name_element = product.find_element(By.CSS_SELECTOR, "div.p-shop a, div.p-shop span")
        shop_name = shop_name_element.text if shop_name_element else "N/A"
        print(shop_name)
        # Strike-through (original) price
        original_price = is_exist_element(product, "div.p-price span.originalPrice")
        print(original_price)
        # Self-operated
        is_self_operated = is_extact_element_element(product, "div.p-name.p-name-type-2 img", "alt", "自營")
        print(is_self_operated)
        # XX supermarket
        is_jd_supermarket = is_extact_element_element(product, "div.p-icons img", "desc", 'XX超市')
        print(is_jd_supermarket)
        # Five-star shop
        is_five_star = is_element(product, "div.p-shop img")
        print(is_five_star)
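One thing to watch in the listing above: Selenium's find_element raises NoSuchElementException when nothing matches, so a fallback of the form `x.text if x else default` never reaches its default. find_elements, by contrast, returns an empty list. The sketch below builds "optional" lookups on that behavior. The function names are hypothetical and are not the author's is_exist_element / is_extact_element_element / is_element helpers, whose definitions the post does not show. To keep the sketch runnable without a browser, it uses the string "css selector", which is the value of Selenium's By.CSS_SELECTOR; a real script would import By instead.

```python
CSS = "css selector"  # same string value as selenium's By.CSS_SELECTOR


def first_match(parent, selector):
    """Return the first element under `parent` matching `selector`, or None."""
    matches = parent.find_elements(CSS, selector)  # empty list when nothing matches
    return matches[0] if matches else None


def text_or_default(parent, selector, default="N/A"):
    """Return the matched element's stripped text, or `default` if absent or empty."""
    element = first_match(parent, selector)
    if element is None:
        return default
    return element.text.strip() or default


def has_attribute_value(parent, selector, attribute, expected):
    """Return True if any matching element's attribute contains `expected`."""
    for element in parent.find_elements(CSS, selector):
        value = element.get_attribute(attribute) or ""  # None when attribute is missing
        if expected in value:
            return True
    return False
```

Inside the loop this could be used as text_or_default(product, "div.p-price i") for the price, or has_attribute_value(product, "div.p-icons img", "desc", "XX超市") for a badge check, with `product` being any Selenium WebElement.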

If you repost this article, please credit the source: http://www.pswp.cn/news/904555.shtml