????????在日常的數據分析工作中,我們經常需要對數據進行分組排序、計算移動平均值、統計累計求和等操作。在MySQL 8.0之前,這類需求通常需要編寫復雜的子查詢或連接查詢才能實現。而MySQL 8.0引入的窗口函數(Window Functions)極大地簡化了這類操作,讓數據分析變得更加簡單高效。
????????本文將通過通俗易懂的方式,帶你全面了解MySQL 8.0中的窗口函數,包括聚合類、排名類和跨行類窗口函數的使用方法。
什么是窗口函數?
????????窗口函數是一種特殊的SQL函數,它能夠在不改變原有行數的情況下,對查詢結果的某個"窗口"(一組相關的行)進行計算,并為每一行返回一個值。
????????可以把窗口函數想象成:你有一張全班學生的成績表,窗口函數允許你同時看到每個學生的成績、他在班級中的排名、他與前一名同學的分數差等信息,而不需要改變原始數據表的行數。這個排名是新加的一行。
窗口函數的基本語法
SELECT 列1, 列2,窗口函數() OVER ([PARTITION BY 分區字段] [ORDER BY 排序字段][frame_clause]) AS 別名
FROM 表名;
PARTITION BY
:將數據分成多個分區(類似于GROUP BY,但不合并行)ORDER BY
:確定分區內數據的排序方式frame_clause
:定義窗口框架,即計算范圍
一、聚合類窗口函數
聚合類窗口函數可以在保留所有行的同時,計算分組的聚合值。
1. SUM() OVER()
計算分區內的總和:
-- 計算每個部門的工資總額,同時顯示每個員工的詳細信息
SELECT employee_id,name,department,salary,SUM(salary) OVER (PARTITION BY department) AS dept_total_salary
FROM employees;
2. AVG() OVER()
計算分區內的平均值:
-- 計算每個部門的平均工資,同時顯示每個員工的詳細信息
SELECT employee_id,name,department,salary,AVG(salary) OVER (PARTITION BY department) AS dept_avg_salary
FROM employees;
3. COUNT() OVER()
計算分區內的行數:
-- 計算每個部門的員工數量,同時顯示每個員工的詳細信息
SELECT employee_id,name,department,salary,COUNT(*) OVER (PARTITION BY department) AS dept_employee_count
FROM employees;
二、排名類窗口函數
排名類窗口函數用于為分區內的行分配排名、序號等。
1. ROW_NUMBER()
為分區內的每一行分配一個唯一的序號:1,2,3,4,5,6,7
-- 為每個部門的員工按工資從高到低編號
SELECT employee_id,name,department,salary,ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) as rank_in_dept
FROM employees;
2. RANK()
計算分區內的排名,相同值會有相同排名,并跳過后續排名:1,1,3,3,5,6,7
-- 計算每個部門內的工資排名(允許并列)
SELECT employee_id,name,department,salary,RANK() OVER (PARTITION BY department ORDER BY salary DESC) as salary_rank
FROM employees;
3. DENSE_RANK()
計算分區內的排名,相同值有相同排名,但不跳過后續排名:1,1,2,2,3,4,5,6,6
-- 計算每個部門內的工資密集排名(允許并列但不跳號)
SELECT employee_id,name,department,salary,DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) as dense_rank
FROM employees;
4. NTILE()
將分區內的數據分成指定數量的組:
-- 將每個部門的員工按工資高低分成4個組
SELECT employee_id,name,department,salary,NTILE(4) OVER (PARTITION BY department ORDER BY salary DESC) as quartile
FROM employees;
三、跨行類窗口函數
跨行類窗口函數可以訪問分區內其他行的數據。
1. LAG()
訪問分區中當前行之前的數據:
-- 查看每位員工和上一名員工的工資差異
SELECT employee_id,name,department,salary,LAG(salary) OVER (PARTITION BY department ORDER BY salary DESC) as prev_salary,salary - LAG(salary) OVER (PARTITION BY department ORDER BY salary DESC) as diff_from_prev
FROM employees;
2. LEAD()
訪問分區中當前行之后的數據:
-- 查看每位員工和下一名員工的工資差異
SELECT employee_id,name,department,salary,LEAD(salary) OVER (PARTITION BY department ORDER BY salary DESC) as next_salary,salary - LEAD(salary) OVER (PARTITION BY department ORDER BY salary DESC) as diff_from_next
FROM employees;
3. FIRST_VALUE()
獲取分區內第一行的值:
-- 查看每位員工與部門最高工資的差異
SELECT employee_id,name,department,salary,FIRST_VALUE(salary) OVER (PARTITION BY department ORDER BY salary DESC) as highest_salary,salary - FIRST_VALUE(salary) OVER (PARTITION BY department ORDER BY salary DESC) as diff_from_highest
FROM employees;
4. LAST_VALUE()
獲取分區內最后一行的值:
-- 查看每位員工與部門最低工資的差異
SELECT employee_id,name,department,salary,LAST_VALUE(salary) OVER (PARTITION BY department ORDER BY salary DESCRANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as lowest_salary,salary - LAST_VALUE(salary) OVER (PARTITION BY department ORDER BY salary DESCRANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as diff_from_lowest
FROM employees;
注意:使用LAST_VALUE()時需要特別注意窗口框架的定義,否則可能得不到預期結果。
窗口框架(Window Frame)詳解
窗口框架定義了窗口函數計算時使用的行范圍,常用的有兩種:
ROWS模式:基于物理行偏移
RANGE模式:基于邏輯值偏移
示例:計算移動平均值
-- 計算每個員工與前2行、當前行、后2行共5行的平均工資
SELECT employee_id,name,department,salary,AVG(salary) OVER (PARTITION BY department ORDER BY salary DESCROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING) as moving_avg
FROM employees;
實際應用場景
場景1:計算同比/環比增長率
-- 計算每月銷售額與上月相比的增長率
SELECT year,month,sales,LAG(sales) OVER (ORDER BY year, month) as prev_month_sales,(sales - LAG(sales) OVER (ORDER BY year, month)) / LAG(sales) OVER (ORDER BY year, month) * 100 as growth_rate
FROM monthly_sales;
場景2:獲取Top N記錄
-- 獲取每個部門工資前三名的員工
WITH ranked_employees AS (SELECT employee_id,name,department,salary,DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) as rank_in_deptFROM employees
)
SELECT * FROM ranked_employees WHERE rank_in_dept <= 3;
場景3:計算累計百分比
-- 計算每個部門工資的累計百分比
SELECT employee_id,name,department,salary,SUM(salary) OVER (PARTITION BY department ORDER BY salary DESC) as running_total,SUM(salary) OVER (PARTITION BY department) as dept_total,SUM(salary) OVER (PARTITION BY department ORDER BY salary DESC) / SUM(salary) OVER (PARTITION BY department) * 100 as cumulative_percent
FROM employees;
性能優化建議
合理使用索引:為PARTITION BY和ORDER BY涉及的字段創建索引
避免過度使用窗口函數:在數據量大的表中,窗口函數可能影響性能
使用CTE(公用表表達式):將復雜查詢分解為多個步驟,提高可讀性和性能
限制窗口大小:對于移動平均等計算,盡量限制窗口框架的范圍
總結
MySQL 8.0的窗口函數為數據分析提供了強大而靈活的工具,使我們能夠在保留原始行細節的同時,進行各種復雜的計算和分析。通過本文的學習,你應該已經掌握了:
窗口函數的基本概念和語法結構
聚合類、排名類和跨行類窗口函數的使用方法
窗口框架的定義和使用場景
窗口函數在實際業務中的應用示例
窗口函數的學習曲線可能稍陡,但一旦掌握,將極大地提升你的數據處理能力和工作效率。建議在實際工作中多加練習,逐步掌握這些強大的功能。
希望本文對你理解和使用MySQL 8.0窗口函數有所幫助!如有任何疑問,歡迎留言討論。