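Web automation has become a widely useful skill. With a few lines of code you can have a browser carry out tasks for you: clicking buttons, filling in forms, scraping data from pages, and running automated tests. This post walks through Selenium, a capable browser automation tool. It covers how to write stable, efficient automation scripts and how to apply them to save time, speed up development, and interact with web pages reliably. It is aimed at developers, automation engineers, and anyone curious about web technology. Let's get started.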

Selenium

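Introduction

Overview
Selenium is a web automation testing tool, originally developed for automated testing of websites. It resembles the macro tools people use in games (such as 按鍵精靈): it carries out operations automatically according to the commands it is given.
The difference is that Selenium drives a real browser directly, and it supports all mainstream browsers (historically including headless browsers such as PhantomJS).
Following our instructions, Selenium can have the browser load pages, fetch the data we need, take screenshots of a page, or check whether certain actions have happened on a site.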

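Documentation
Selenium with Python (a community-maintained guide to the Python bindings)
https://selenium-python.readthedocs.io/index.html
Note
Selenium does not ship with a browser and does not implement browser functionality itself; it has to be combined with a third-party browser to work.
Sometimes we need it to run embedded in code, without a visible window. Older guides used a tool called PhantomJS in place of a real browser for this. PhantomJS is no longer maintained and Selenium 4 no longer supports it, so headless Chrome or Firefox now fills that role.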

Installation

Installing selenium

pip install selenium

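Installing ChromeDriver
Mirror in China

https://registry.npmmirror.com/binary.html?path=chromedriver/
ChromeDriver
The driver version must match the browser version. In Chrome, open Help > About Google Chrome to find the browser version, download the matching ChromeDriver, and extract the downloaded file into python_version\Scripts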
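The version-matching rule can be checked mechanically before a script starts. The following is a minimal sketch, assuming both versions are dotted strings of the kind Chrome displays. `versions_match` is a hypothetical helper, the version numbers are illustrative only, and comparing just the leading (major) component is a simplifying assumption; pass a larger `parts` value for a stricter check.

```python
def versions_match(browser_version: str, driver_version: str, parts: int = 1) -> bool:
    """Return True when the first `parts` dotted components are identical."""
    browser = browser_version.strip().split(".")[:parts]
    driver = driver_version.strip().split(".")[:parts]
    return browser == driver


# Illustrative version strings, not taken from any real installation.
print(versions_match("114.0.5735.199", "114.0.5735.90"))  # same major version
print(versions_match("114.0.5735.199", "113.0.5672.63"))  # different major version
```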

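Installing Firefox geckodriver
Mirror in China

https://download-installer.cdn.mozilla.net/pub/firefox/releases/
Firefox geckodriver
Install the latest version of Firefox and add the directory of the Firefox executable to the system PATH environment variable. Remember to turn off Firefox's automatic updates.
Put the downloaded geckodriver.exe in a directory that is on PATH, for example D:\Python\python_version\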

Fundamentals

Basic operations

Creating a browser object

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
service = Service('./chromedriver.exe')
chrome = webdriver.Chrome(service=service)

Opening a page

chrome.get('http://www.baidu.com')

Opening a local page

import os
file_path='file:///'+os.path.abspath('./1.下拉菜單.html')
chrome.get(file_path)
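Building the URL by string concatenation works, but it leaves backslashes and non-ASCII characters in the result. As an alternative sketch, the standard library's `pathlib` can produce a properly encoded `file://` URL. `local_page_url` is a hypothetical helper name, not something Selenium provides.

```python
import os
from pathlib import Path


def local_page_url(relative_path: str) -> str:
    """Convert a path relative to the working directory into a file:// URL."""
    absolute = Path(os.path.abspath(relative_path))
    # as_uri() adds the file:// prefix and percent-encodes special characters.
    return absolute.as_uri()


url = local_page_url("./1.下拉菜單.html")
print(url)  # pass this string to chrome.get(...)
```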

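Getting the page's HTML source (line breaks included)

page = chrome.page_source

Sleeping

from time import sleep
sleep(8)

Closing the browser

chrome.quit()

Controlling the browser

Window size

chrome.maximize_window()  # maximize the window
chrome.set_window_size(600, 800)  # set the window size (width, height)

Forward and back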

chrome.forward()
chrome.back()

Basic locating

Locating elements

from selenium.webdriver.common.by import By
chrome.find_element(By.ID,'su')
chrome.find_element(By.XPATH, "//option[@value='10.69']").click()

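find_element(type, value): returns a single element
find_elements(type, value): returns a list of elements
Values available in By
XPATH: XPath selector
ID: id attribute
NAME: name attribute
CLASS_NAME: class attribute
LINK_TEXT: the full text of a hyperlink
PARTIAL_LINK_TEXT: part of a hyperlink's text
TAG_NAME: tag name
CSS_SELECTOR: CSS selector

Operating on elements
click: click the element
send_keys: simulate keyboard input on the element
clear: clear the element's content, if possible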

Basic example

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from time import sleep
from selenium.webdriver.common.by import By

service = Service('./chromedriver.exe')
chrome = webdriver.Chrome(service=service)
chrome.get('http://www.baidu.com')
sleep(3)
# type the search term into the search box
chrome.find_element(By.ID, 'kw').send_keys('CSDN')
sleep(3)
# click the search button
chrome.find_element(By.ID, 'su').click()
sleep(3)

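Common operations

Locating a dropdown menu

Note
To pick an item from a dropdown menu, first locate and click the parent element to open the menu, then simulate moving the cursor, then click the chosen item.
Page code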

<html><head><meta http-equiv="content-type" content="text/html;charset=utf-8" /><title>Level Locate</title>    <script type="text/javascript" src="https://cdn.jsdelivr.net/npm/jquery@1.12.4/dist/jquery.min.js"></script><link href="https://cdn.jsdelivr.net/npm/@bootcss/v3.bootcss.com@1.0.9/dist/css/bootstrap.min.css" rel="stylesheet" />    </head><body><h3>Level locate</h3><div class="span3 col-md-3">    <div class="well"><div class="dropdown"><a class="dropdown-toggle" data-toggle="dropdown" href="#">Link1</a><ul class="dropdown-menu" role="menu" aria-labelledby="dLabel" id="dropdown1" ><li><a tabindex="-1" href="http://www.bjsxt.com">Action</a></li><li><a tabindex="-1" href="#">Another action</a></li><li><a tabindex="-1" href="#">Something else here</a></li><li class="divider"></li><li><a tabindex="-1" href="#">Separated link</a></li></ul></div>        </div>      </div><div class="span3 col-md-3">    <div class="well"><div class="dropdown"><a class="dropdown-toggle" data-toggle="dropdown" href="#">Link2</a><ul class="dropdown-menu" role="menu" aria-labelledby="dLabel" ><li><a tabindex="-1" href="#">Action</a></li><li><a tabindex="-1" href="#">Another action</a></li><li><a tabindex="-1" href="#">Something else here</a></li><li class="divider"></li><li><a tabindex="-1" href="#">Separated link</a></li></ul></div>        </div>      </div></body><script src="https://cdn.jsdelivr.net/npm/@bootcss/v3.bootcss.com@1.0.9/dist/js/bootstrap.min.js"></script></html>

Core code

# locate the parent element and click it to open the menu
chrome.find_element(By.LINK_TEXT, 'Link1').click()
sleep(4)
# move the cursor over the menu item (imitates a human user; optional)
menu = chrome.find_element(By.LINK_TEXT, 'Action')
webdriver.ActionChains(chrome).move_to_element(menu).perform()
# click the child element
menu.click()

Example code

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from time import sleep
from selenium.webdriver.common.by import By
import os

service = Service('./chromedriver.exe')
chrome = webdriver.Chrome(service=service)
file_path = 'file:///' + os.path.abspath('./1.下拉菜單.html')
chrome.get(file_path)
sleep(3)
# locate the parent element and click it to open the menu
chrome.find_element(By.LINK_TEXT, 'Link1').click()
sleep(4)
# move the cursor over the menu item (imitates a human user; optional)
menu = chrome.find_element(By.LINK_TEXT, 'Action')
webdriver.ActionChains(chrome).move_to_element(menu).perform()
# click the child element
menu.click()
sleep(4)

Locating a select box

Overview
Unlike a dropdown menu, the options of a select box can be located directly.
Page code

<html>
<body>
<select id="ShippingMethod" onchange="updateShipping(options[selectedIndex]);" name="ShippingMethod"><option value="12.51">UPS Next Day Air ==> $12.51</option><option value="11.61">UPS Next Day Air Saver ==> $11.61</option><option value="10.69">UPS 3 Day Select ==> $10.69</option><option value="9.03">UPS 2nd Day Air ==> $9.03</option><option value="8.34">UPS Ground ==> $8.34</option><option value="9.25">USPS Priority Mail Insured ==> $9.25</option><option value="7.45">USPS Priority Mail ==> $7.45</option><option value="3.20" selected="">USPS First Class ==> $3.20</option>
</select>
</body>
</html>

Core code

# locate the select box, then pick an option with XPath
m = chrome.find_element(By.ID, "ShippingMethod")
# the leading "." keeps the search inside the select element
m.find_element(By.XPATH, ".//option[@value='10.69']").click()

Example code

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from time import sleep
from selenium.webdriver.common.by import By
import os

service = Service('./chromedriver.exe')
chrome = webdriver.Chrome(service=service)
file_path = 'file:///' + os.path.abspath('./3.drop_down.html')
chrome.get(file_path)
sleep(3)
# locate the select box, then pick an option with XPath
m = chrome.find_element(By.ID, "ShippingMethod")
# the leading "." keeps the search inside the select element
m.find_element(By.XPATH, ".//option[@value='10.69']").click()
sleep(3)
chrome.quit()

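Locating elements inside nested frames

Overview
Sometimes the locator for an element is correct but the element still cannot be found. In that case, check whether the element sits inside a frame.
Page code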

<html>
<head><meta http-equiv="content-type" content="text/html;charset=utf-8"/><title>inner</title>
</head>
<body>
<div class="row-fluid"><div class="span6 well"><h3>inner</h3><iframe id="f2" src="https://cn.bing.com/" width="700" height="500"></iframe></div>
</div>
</body>
</html>

<html>
<head><meta http-equiv="content-type" content="text/html;charset=utf-8"/><title>frame</title><script type="text/javascript" src="https://cdn.jsdelivr.net/npm/jquery@1.12.4/dist/jquery.min.js"></script><link href="http://netdna.bootstrapcdn.com/twitter-bootstrap/2.3.2/css/bootstrap-combined.min.css"rel="stylesheet"/>
</head><body>
<div class="row-fluid"><div class="span10 well"><h3>frame</h3><iframe id="f1" src="2.inner.html" width="800" , height="600"></iframe></div>
</div>
</body>
<script src="https://cdn.jsdelivr.net/npm/@bootcss/v3.bootcss.com@1.0.8/dist/js/bootstrap.min.js"></script>
</html>
</html>

Core code

Use the following call to step into the inner frame (the argument is the frame's id attribute)
chrome.switch_to.frame('f1')
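Nested frames have to be entered one level at a time, starting from the top-level document. The sketch below wraps that sequence in a small function. `enter_nested_frames` is a hypothetical helper; it relies only on `switch_to.default_content()` and `switch_to.frame()`, which a Selenium driver object provides.

```python
def enter_nested_frames(driver, frame_ids):
    """Switch from the top-level document down through each frame id in order."""
    # Start from the top-level page so the result does not depend on
    # whichever frame the driver happened to be in before.
    driver.switch_to.default_content()
    for frame_id in frame_ids:
        # Each call moves one level deeper, relative to the current frame.
        driver.switch_to.frame(frame_id)
```

With the pages above, `enter_nested_frames(chrome, ['f1', 'f2'])` would land in the innermost frame, and `chrome.switch_to.default_content()` returns to the outer page afterwards.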

Example code

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from time import sleep
from selenium.webdriver.common.by import By
import os

# locate an element inside nested frames (three levels)
service = Service('./chromedriver.exe')
chrome = webdriver.Chrome(service=service)
file_path = 'file:///' + os.path.abspath('./2.outer.html')
chrome.get(file_path)
sleep(3)
# switch into the frames by id, one level at a time
chrome.switch_to.frame('f1')
chrome.switch_to.frame('f2')
# locate elements on the third level (cn.bing.com)
chrome.find_element(By.ID, 'sb_form_q').send_keys('CSDN')
chrome.find_element(By.ID, 'search_icon').click()
sleep(3)

Handling alerts

Page code

<!DOCTYPE html>
<html lang="en">
<head><meta charset="UTF-8"><title>This is a page</title>
</head>
<body>
<div id="container"><div style="font: size 30px;">Hello,Python Spider</div>
</div>
</body>
<script>alert('這個是測試彈窗')</script>
</html>

Core code

# switch to the alert and click OK
chrome.switch_to.alert.accept()
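It is often useful to read the alert's message before dismissing it, for example to log it or to assert on it in a test. A minimal sketch, assuming a driver object with Selenium's `switch_to.alert` interface; `accept_alert` is a hypothetical helper name.

```python
def accept_alert(driver):
    """Read the open alert's message, click OK, and return the message."""
    alert = driver.switch_to.alert  # Selenium raises here if no alert is open
    message = alert.text            # read the text before the alert is closed
    alert.accept()
    return message
```

Called as `accept_alert(chrome)` on the page above, it would return the alert's text and close the dialog.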

Example code

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from time import sleep
from selenium.webdriver.common.by import By
import os

service = Service('./chromedriver.exe')
chrome = webdriver.Chrome(service=service)
file_path = 'file:///' + os.path.abspath('./4.彈出框.html')
chrome.get(file_path)
sleep(3)
# switch to the alert and click OK
chrome.switch_to.alert.accept()
sleep(4)
chrome.quit()

Dragging elements

Overview
Dragging an element, for example a block-level div tag

Page code

<!doctype html>
<html lang="en">
<head><meta charset="utf-8"><meta name="viewport" content="width=device-width, initial-scale=1"><title>jQuery UI Draggable - Auto-scroll</title><link rel="stylesheet" href="http://code.jquery.com/ui/1.12.1/themes/base/jquery-ui.css"><style>#draggable, #draggable2, #draggable3 { width: 100px; height: 100px; padding: 0.5em; float: left; margin: 0 10px 10px 0; }body {font-family: Arial, Helvetica, sans-serif;}table {font-size: 1em;}.ui-draggable, .ui-droppable {background-position: top;}</style><script src="https://code.jquery.com/jquery-1.12.4.js"></script><script src="https://code.jquery.com/ui/1.12.1/jquery-ui.js"></script><script>$( function() {$( "#draggable" ).draggable({ scroll: true });$( "#draggable2" ).draggable({ scroll: true, scrollSensitivity: 100 });$( "#draggable3" ).draggable({ scroll: true, scrollSpeed: 100 });} );</script>
</head>
<body>
<div id="draggable" class="ui-widget-content"><p>Scroll set to true, default settings</p>
</div><div id="draggable2" class="ui-widget-content"><p>scrollSensitivity set to 100</p>
</div><div id="draggable3" class="ui-widget-content"><p>scrollSpeed set to 100</p>
</div>
<div style="height: 5000px; width: 1px;"></div>
</body>
</html>

Core code

# locate the elements to drag
div1 = chrome.find_element(By.ID, 'draggable')
div2 = chrome.find_element(By.ID, 'draggable2')
div3 = chrome.find_element(By.ID, 'draggable3')
sleep(3)
# drag div1 onto div2
ActionChains(chrome).drag_and_drop(div1, div2).perform()
sleep(3)
# drag div3 by an offset: 10px to the right and 10px down
ActionChains(chrome).drag_and_drop_by_offset(div3, 10, 10).perform()
sleep(3)

Example code

from selenium import webdriver
from selenium.webdriver import ActionChains
from selenium.webdriver.chrome.service import Service
from time import sleep
from selenium.webdriver.common.by import By
import os

# drag elements
service = Service('./chromedriver.exe')
chrome = webdriver.Chrome(service=service)
file_path = 'file:///' + os.path.abspath('./5.拖拽元素.html')
chrome.get(file_path)
sleep(3)
# locate the elements to drag
div1 = chrome.find_element(By.ID, 'draggable')
div2 = chrome.find_element(By.ID, 'draggable2')
div3 = chrome.find_element(By.ID, 'draggable3')
sleep(3)
# drag div1 onto div2
ActionChains(chrome).drag_and_drop(div1, div2).perform()
sleep(3)
# drag div3 by an offset: 10px to the right and 10px down
ActionChains(chrome).drag_and_drop_by_offset(div3, 10, 10).perform()
sleep(3)
chrome.quit()

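Calling JavaScript

Overview
Sometimes we need to move the page's scroll bar, but the scroll bar is not an element on the page, so we use JavaScript to do it.

Note
Any JavaScript snippet can be tested first in the browser's developer tools:
open the Console and type the JavaScript there.

Core code

# move the scroll bar: scroll the window to x=100, y=400
js = "window.scrollTo(100,400)"
chrome.execute_script(js)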
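When the scroll position is computed at run time, the numbers can be passed as script arguments instead of being pasted into the JavaScript string. `execute_script` forwards extra positional arguments to the script, where they are available as `arguments[0]`, `arguments[1]`, and so on. A minimal sketch; `scroll_to` and `scroll_to_bottom` are hypothetical helper names.

```python
def scroll_to(driver, x, y):
    """Scroll the window to (x, y), passing the numbers as script arguments."""
    driver.execute_script("window.scrollTo(arguments[0], arguments[1]);", x, y)


def scroll_to_bottom(driver):
    """Scroll to the bottom of the page using the document's current height."""
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
```

`scroll_to(chrome, 100, 400)` has the same effect as the core code above.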

Example code

from selenium.webdriver.chrome.service import Service
from selenium import webdriver
from time import sleep

service = Service('./chromedriver.exe')
chrome = webdriver.Chrome(service=service)
chrome.get('https://www.jd.com/')
# move the scroll bar
js = "window.scrollTo(100,400)"
chrome.execute_script(js)
sleep(3)

Features

Waiting for elements

Forced wait
What it does: when execution reaches the forced-wait line, the script pauses for the specified time no matter what. It is implemented with the time module.
Pros: simple
Cons: cannot check any condition, so it wastes time

from time import sleep
sleep(3)

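Implicit wait
What it does: if an element has not loaded yet when it is looked up, the driver keeps retrying for up to the time we specify.
If the element still has not loaded when that time runs out, an exception is raised; if the element is already available, execution continues immediately.
Pros: set once, and every element lookup waits
Cons: each lookup must either find its element or run out the timeout before the script moves on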

from selenium import webdriver
chrome.implicitly_wait(10)

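Explicit wait
What it does: you specify a condition and a maximum wait time. The condition is checked repeatedly within that time, and the call returns as soon as it holds.
If it does not hold, the wait continues until the maximum time is reached; if the condition is still not met at that point, an exception is raised.
Pros: targets one specific element, and the following code runs as soon as that element has loaded
Cons: each element needs its own wait

from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait
# 10: the maximum wait time, in seconds
# 0.5: how often the condition is checked, in seconds (once every 0.5 seconds)
wait = WebDriverWait(chrome, 10, 0.5)
wait.until(EC.presence_of_element_located((By.CLASS_NAME, 'next')))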
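To make the polling behaviour concrete, here is a simplified, pure-Python illustration of the idea behind an explicit wait. It is not Selenium's implementation: the real `WebDriverWait` raises its own `TimeoutException` and by default also ignores element-not-found errors while polling, which this sketch leaves out. `wait_until` is a hypothetical name.

```python
import time


def wait_until(condition, timeout=10.0, poll_frequency=0.5):
    """Call `condition()` repeatedly until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result  # the condition held: return its value immediately
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_frequency)  # wait before checking again
```

The return value of the condition is handed back to the caller, which is why `wait.until(...)` in Selenium can return the located element directly.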

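Hiding the browser

Implementation

from selenium.webdriver import Chrome, ChromeOptions
from selenium.webdriver.chrome.service import Service
# set an option that hides the browser window (headless browser)
options = ChromeOptions()
options.add_argument('--headless')
# pass the options when creating the Chrome browser
service = Service('./chromedriver')
driver = Chrome(service=service, options=options)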

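Proxy mode

Implementation 1

from selenium.webdriver import Chrome, ChromeOptions
from selenium.webdriver.chrome.service import Service
# set an option that gives the browser a proxy
options = ChromeOptions()
# options.add_argument('--proxy-server=http://ip:port')
options.add_argument('--proxy-server=http://221.199.36.122:35414')
# set up the driver
service = Service('./chromedriver')
# start the Chrome browser
driver = Chrome(service=service, options=options)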
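When proxies come from a list or a configuration file, the command-line switch can be assembled from its parts. A minimal sketch; `proxy_argument` is a hypothetical helper, and the address used below is a placeholder, not a working proxy.

```python
def proxy_argument(host: str, port: int, scheme: str = "http") -> str:
    """Build the --proxy-server command-line switch for Chrome."""
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return f"--proxy-server={scheme}://{host}:{port}"


# Placeholder address for illustration only.
print(proxy_argument("127.0.0.1", 8080))
```

The returned string is what gets passed to `options.add_argument(...)`.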

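Implementation 2

from selenium.webdriver import Chrome, ChromeOptions
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.proxy import ProxyType, Proxy
# describe the proxy (host:port, without a scheme)
ip = '113.76.133.238:35680'
proxy = Proxy()
proxy.proxy_type = ProxyType.MANUAL
proxy.http_proxy = ip
proxy.ssl_proxy = ip
# attach the proxy to the browser options
# (assumes a recent Selenium 4 release, which no longer accepts desired_capabilities)
options = ChromeOptions()
options.proxy = proxy
# set up the driver
service = Service('./chromedriver')
# start the Chrome browser
driver = Chrome(service=service, options=options)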

Anti-detection settings

Implementation

from time import sleep
from selenium.webdriver import Chrome
from selenium.webdriver import ChromeOptions

options = ChromeOptions()
options.add_experimental_option('excludeSwitches', ['enable-automation'])
options.add_experimental_option('useAutomationExtension', False)
chrome = Chrome(options=options)
# run this script before every new document loads, so navigator.webdriver reads as false
chrome.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {
    "source": """
        Object.defineProperty(navigator, 'webdriver', {
            get: () => false
        })
    """
})
chrome.get('http://httpbin.org/get')
info = chrome.page_source
print(info)
sleep(20)

Hands-on practice

Huya

Scrape the streamers and their popularity figures from every page of the League of Legends category

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from lxml import etree
from selenium.webdriver.common.by import By

service = Service('../0.工具/chromedriver.exe')
chrome = webdriver.Chrome(service=service)
# set an implicit wait
chrome.implicitly_wait(5)
# scrape the League of Legends page
chrome.get('https://www.huya.com/g/lol')
# loop until there is no next page
while True:
    # parse the current page
    e = etree.HTML(chrome.page_source)
    names = e.xpath('//i[@class="nick"]/@title')  # streamer nicknames
    person_nums = e.xpath('//i[@class="js-num"]/text()')  # streamer popularity
    # print the extracted data
    for n, p in zip(names, person_nums):
        print(f'{n} ------ {p}')
    try:
        # find the "next page" button
        next_btn = chrome.find_element(By.XPATH, '//a[@class="laypage_next"]')
        # click it
        next_btn.click()
    except Exception:
        # no usable "next page" button: treat this as the last page
        break
    # Alternative exit check that does not rely on the exception:
    # if chrome.page_source.find('laypage_next') == -1:
    #     break
    # next_btn = chrome.find_element(By.XPATH, '//a[@class="laypage_next"]')
    # next_btn.click()
chrome.quit()
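The loop above mixes parsing and navigation. One way to keep the "stop when there is no next page" logic easy to test is to separate the two, as in the sketch below. `scrape_all_pages`, `extract_rows`, and `go_to_next_page` are hypothetical names, the `max_pages` safety cap is an added assumption, and the demo data stands in for real pages.

```python
def scrape_all_pages(extract_rows, go_to_next_page, max_pages=1000):
    """Collect rows from every page until navigation reports there is no next page."""
    rows = []
    for _ in range(max_pages):  # cap the loop in case the site never runs out of pages
        rows.extend(extract_rows())
        if not go_to_next_page():
            break
    return rows


# Demo with fake pages instead of a browser.
pages = [[("streamer_a", "100")], [("streamer_b", "200")], [("streamer_c", "300")]]
position = {"index": 0}


def extract_rows():
    return pages[position["index"]]


def go_to_next_page():
    if position["index"] + 1 >= len(pages):
        return False  # no next page
    position["index"] += 1
    return True


for name, popularity in scrape_all_pages(extract_rows, go_to_next_page):
    print(name, popularity)
```

With Selenium, `extract_rows` would parse `chrome.page_source`, and `go_to_next_page` would wrap the find-and-click step in a `try` block, returning `False` when the button cannot be found.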