一、窗口函數:數據分析的利器
1. 窗口函數基礎概念
窗口函數(Window Function)是MySQL 8.0引入的強大特性,它可以在不減少行數的情況下對數據進行聚合計算和分析
SELECT employee_name,department,salary,RANK() OVER (PARTITION BY department ORDER BY salary DESC) as dept_rank
FROM employees;
核心組件:
PARTITION BY:定義窗口分區(類似GROUP BY但不聚合)
ORDER BY:確定窗口內行的排序
frame_clause:定義窗口框架(ROWS/RANGE BETWEEN)
2. 常用窗口函數分類
排名函數
ROW_NUMBER():連續編號(1,2,3…)
RANK():并列排名會跳過后續序號(1,2,2,4…)
DENSE_RANK():并列排名不跳號(1,2,2,3…)
聚合函數
SUM()/AVG()/COUNT()/MIN()/MAX() OVER()
分布函數
PERCENT_RANK():相對排名百分比
CUME_DIST():累積分布值
前后函數
LAG(column, n):訪問前n行數據
LEAD(column, n):訪問后n行數據
FIRST_VALUE()/LAST_VALUE():窗口首尾值
3.高級窗口框架控制
SELECT date,revenue,AVG(revenue) OVER (ORDER BY date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) as moving_avg
FROM sales;
框架類型:
ROWS:物理行偏移
RANGE:邏輯值范圍
GROUPS:MySQL 8.0.2+支持,按組偏移
二、Common Table Expressions (CTE):提升查詢可讀性
基礎CTE語法
WITH department_stats AS (SELECT department,AVG(salary) as avg_salary,COUNT(*) as emp_countFROM employeesGROUP BY department
)
SELECT * FROM department_stats WHERE avg_salary > 5000;
遞歸CTE實現層次查詢
WITH RECURSIVE org_hierarchy AS (-- 基礎查詢(錨成員)SELECT id, name, manager_id, 1 as levelFROM employeesWHERE manager_id IS NULLUNION ALL-- 遞歸查詢(遞歸成員)SELECT e.id, e.name, e.manager_id, h.level + 1FROM employees eJOIN org_hierarchy h ON e.manager_id = h.id
)
SELECT * FROM org_hierarchy;
應用場景:
組織結構圖
產品分類樹
社交網絡關系
CTE優化技巧
MATERIALIZED:強制物化CTE結果
MERGE:將CTE合并到主查詢
限制遞歸深度:SET @@cte_max_recursion_depth = 100;
三、高級JSON處理:應對半結構化數據
JSON創建與修改
-- 創建JSON
SELECT JSON_OBJECT('name', name, 'salary', salary) as emp_json
FROM employees;-- 修改JSON
UPDATE products
SET attributes = JSON_SET(attributes, '$.color', 'blue')
WHERE id = 1001;
JSON路徑查詢
SELECT product_id,JSON_EXTRACT(attributes, '$.dimensions.width') as width,attributes->>'$.manufacturer' as manufacturer
FROM products
WHERE JSON_CONTAINS(attributes, '"wireless"', '$.features');
JSON聚合函數
SELECT department,JSON_ARRAYAGG(JSON_OBJECT('id', id, 'name', name)) as employees
FROM staff
GROUP BY department;
四、索引優化高級技巧
函數索引(MySQL 8.0+)
-- 創建基于表達式的索引
CREATE INDEX idx_name_lower ON employees ((LOWER(name)));-- 使用時必須完全匹配索引表達式
SELECT * FROM employees WHERE LOWER(name) = 'john';
不可見索引
-- 創建不可見索引(優化器忽略)
CREATE INDEX idx_temp ON orders (customer_id) INVISIBLE;-- 測試后決定是否可見
ALTER TABLE orders ALTER INDEX idx_temp VISIBLE;
降序索引優化
-- 創建降序索引
CREATE INDEX idx_created_desc ON log_entries (created_at DESC);-- 適合ORDER BY ... DESC查詢
SELECT * FROM log_entries ORDER BY created_at DESC LIMIT 100;
五、高級事務處理
保存點(Savepoint)控制
START TRANSACTION;
INSERT INTO orders (...) VALUES (...);
SAVEPOINT order_created;
UPDATE inventory SET quantity = quantity - 1;
-- 發生錯誤時
ROLLBACK TO SAVEPOINT order_created;
COMMIT;
多版本并發控制(MVCC)深度優化
-- 使用特定隔離級別
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;-- 優化長時間事務
SET TRANSACTION READ ONLY;
鎖優化策略
-- 行鎖升級為表鎖(慎用)
LOCK TABLES orders WRITE;-- 使用SKIP LOCKED處理高并發
SELECT * FROM jobs
WHERE status = 'pending'
ORDER BY priority DESC
LIMIT 1 FOR UPDATE SKIP LOCKED;
六、性能分析高級技術
執行計劃深度解讀
EXPLAIN FORMAT=JSON
SELECT * FROM orders WHERE customer_id IN (SELECT id FROM customers WHERE region = 'APAC'
);-- 關鍵指標分析
/*
"cost_info": {"query_cost": "10.25" -- 總預估成本
},
"table_scan": {"rows_examined_per_scan": 1000,"rows_produced_per_join": 100,"filtered": "10.00"
}
*/
優化器提示(Optimizer Hints)
SELECT /*+ INDEX(orders idx_customer) */ *
FROM orders FORCE INDEX (idx_customer)
WHERE customer_id = 1001;
性能模式(Performance Schema)監控
-- 分析最耗資源的SQL
SELECT * FROM performance_schema.events_statements_summary_by_digest
ORDER BY SUM_TIMER_WAIT DESC LIMIT 10;-- 查看鎖等待
SELECT * FROM performance_schema.events_waits_current
WHERE EVENT_NAME LIKE '%lock%';
七、實戰案例:電商數據分析系統
用戶購買路徑分析
WITH user_journey AS (SELECT user_id,event_time,event_type,LAG(event_type, 1) OVER (PARTITION BY user_id ORDER BY event_time) as prev_event,LEAD(event_type, 1) OVER (PARTITION BY user_id ORDER BY event_time) as next_eventFROM user_eventsWHERE event_date = CURDATE()
)
SELECT prev_event,event_type,next_event,COUNT(*) as transition_count
FROM user_journey
GROUP BY prev_event, event_type, next_event
ORDER BY transition_count DESC;
實時庫存預警
WITH inventory_status AS (SELECT product_id,current_stock,AVG(current_stock) OVER (PARTITION BY category_id) as category_avg,RANK() OVER (PARTITION BY warehouse_id ORDER BY current_stock) as stock_rankFROM inventory
)
SELECT product_id, current_stock
FROM inventory_status
WHERE current_stock < (0.2 * category_avg) ORstock_rank <= 5; -- 每個倉庫庫存最低的5個商品
建議
漸進式優化:先確保SQL正確性,再逐步應用高級優化
測試驗證:所有優化必須通過真實數據驗證
監控迭代:持續監控執行計劃變化
適度使用:避免過度復雜化SQL邏輯
版本特性:充分利用MySQL 8.0+的新特性