有11家門店數據,要求每天所有門店的各個指標的中位數
1.第一種做法,使用PERCENTILE_CONT() 函數 SQL SERVER 2012 版本及以上
PERCENTILE_CONT 函數簡介
PERCENTILE_CONT 是 SQL 中的窗口函數,用于計算連續百分位數(基于線性插值)。適用于需要獲取數據分布中特定百分位值的場景,如統計工資的中位數或 90% 分位數。
語法結構
PERCENTILE_CONT(percentile) WITHIN GROUP (ORDER BY sort_expression)
OVER ([PARTITION BY partition_expression])
- percentile:介于 0 和 1 之間的數值,指定所需的百分位(如 0.5 表示中位數)。
- sort_expression:用于排序的列或表達式。
- PARTITION BY:可選,按指定列分組計算百分位數。
基礎用法示例
計算整個表中工資列的中位數(50% 分位數):
SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) AS median_salary
FROM employees;
分組計算示例
按部門分組計算工資的 75% 分位數:
SELECT department_id,PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY salary) AS percentile_75
FROM employees
GROUP BY department_id;
窗口函數用法
結合 OVER 和 PARTITION BY 計算每個部門的工資中位數:
SELECT employee_id,department_id,salary,PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) OVER (PARTITION BY department_id) AS dept_median_salary
FROM employees;
多百分位計算
單次查詢中計算多個百分位值:
SELECT PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY salary) AS percentile_25,PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) AS percentile_50,PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY salary) AS percentile_75
FROM employees;
注意事項
- PERCENTILE_CONT 在 Oracle、PostgreSQL 和 SQL Server 中支持,但 MySQL 需使用 PERCENTILE_CONT 的替代方案。
- 結果為浮點數,即使輸入是整數類型。
- 空值(NULL)會被自動忽略。
- 與 PERCENTILE_DISC 的區別:PERCENTILE_CONT 返回插值結果,PERCENTILE_DISC 返回實際存在的值。
解決代碼:
WITH test_basedata AS (-- 模擬11家門店的指標數據(sdate統一為20250814,覆蓋不同指標值,方便驗證中位數)SELECT '20250814' AS sdate, 'SH001' AS shopid, 0.25 AS profit_rate, 89.5 AS avg_customer_price, 2.1 AS avg_num_purchases, 0.85 AS salevalue_finishrate, 0.12 AS abysku_rate, 35 AS dsiUNION ALL SELECT '20250814', 'SH002', 0.32, 105.2, 1.8, 1.02, 0.08, 28UNION ALL SELECT '20250814', 'SH003', 0.18, 76.9, 2.5, 0.68, 0.15, 42UNION ALL SELECT '20250814', 'SH004', 0.41, 120.7, 1.5, 1.25, 0.05, 22UNION ALL SELECT '20250814', 'SH005', 0.29, 98.3, 2.3, 0.95, 0.10, 30UNION ALL SELECT '20250814', 'SH006', 0.35, 112.1, 1.9, 1.10, 0.07, 25UNION ALL SELECT '20250814', 'SH007', 0.22, 82.4, 2.4, 0.75, 0.13, 38UNION ALL SELECT '20250814', 'SH008', 0.38, 109.6, 1.7, 1.08, 0.06, 26UNION ALL SELECT '20250814', 'SH009', 0.27, 92.8, 2.2, 0.88, 0.11, 33UNION ALL SELECT '20250814', 'SH010', 0.31, 10.5, 2.0, 0.99, 0.09, 29UNION ALL SELECT '20250814', 'SH011', 0.42, 10.5, 2.0, 0.99, 0.09, 29
),
daily_medians AS (SELECTdistinct sdate,-- 計算中位數PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY profit_rate) OVER(PARTITION BY sdate) AS median_profit_rate,PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY avg_customer_price) OVER(PARTITION BY sdate) AS median_avg_customer_price,PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY avg_num_purchases) OVER(PARTITION BY sdate) AS median_avg_num_purchases,PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salevalue_finishrate) OVER(PARTITION BY sdate) AS median_salevalue_finishrate,PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY abysku_rate) OVER(PARTITION BY sdate) AS median_abysku_rate,PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY dsi) OVER(PARTITION BY sdate) AS median_dsiFROM test_basedata -- GROUP BY sdate,profit_rate,avg_customer_price,avg_num_purchases,salevalue_finishrate,abysku_rate,dsi
)
SELECT * FROM daily_medians;
2.第二種做法,通用版本
解決代碼:
1.首先利用窗口函數,求出總行數以及每個指標的排序
2.再利用ceiling 向上取整函數,和floor 向下取整函數 去求中位數的位置,并且返回對應的中位數數值
3.最后兩數加和,取平均值
代碼功能說明
這段SQL代碼用于計算多個門店在不同業務指標上的中位數值。通過創建臨時測試數據表test_basedata
,模擬了11家門店在利潤、客單價、購買次數等指標上的數據,并采用窗口函數等方法計算各指標的中位數。
數據結構說明
臨時表test_basedata
包含以下字段:
sdate
:日期標識(固定為20250814)shopid
:門店編號profit_rate
:利潤率avg_customer_price
:客單價avg_num_purchases
:平均購買次數salevalue_finishrate
:銷售目標完成率abysku_rate
:缺貨率dsi
:庫存周轉天數
核心計算邏輯
中位數計算原理: 當數據量為奇數時取中間值,偶數時取中間兩個值的平均數。代碼通過CEILING
和FLOOR
函數分別定位中間位置的上下界。
計算過程分解:
- 內層查詢使用
ROW_NUMBER()
為每個指標按值排序并標記排名 - 中間層查詢通過
MAX(CASE WHEN...)
提取中間位置對應的指標值 - 外層查詢對偶數的中間位置取平均值得到最終中位數
關鍵函數說明
ROW_NUMBER() OVER (PARTITION BY...ORDER BY...)
窗口函數,對分組后的數據按指定字段排序并生成序號CEILING(total_count/2.0)
向上取整,用于定位中位數上界位置(如10個值時返回5,11返回6(5.5向上取整為6))FLOOR(total_count/2.0)+1
向下取整后+1,用于定位中位數下界位置(如10個值時返回6,11返回6(5.5向下取整為5,5+1=6))COUNT(*) OVER (PARTITION BY sdate)
計算每個日期的總記錄數,用于判斷奇偶性
結果輸出說明
最終輸出包含:
- 原始日期字段
sdate
median_profit_rate
:利潤率中位數median_avg_customer_price
:客單價中位數median_avg_num_purchases
:平均購買次數中位數median_salevalue_finishrate
:銷售完成率中位數median_abysku_rate
:缺貨率中位數median_dsi
:庫存周轉天數中位數
特殊處理說明
對于偶數個數據點(如示例中的11條記錄):
- 利潤率中位數 = (第6名的值 + 第6名的值)/2
- 實際會取相同值,等同于直接取第6名的值
對于奇數個數據點(如10條記錄):
- 會正確計算第5名和第6名指標值的平均數
解決代碼:
WITH test_basedata AS (-- 模擬10家門店的指標數據(sdate統一為20250814,覆蓋不同指標值,方便驗證中位數)SELECT '20250814' AS sdate, 'SH001' AS shopid, 0.25 AS profit_rate, 89.5 AS avg_customer_price, 2.1 AS avg_num_purchases, 0.85 AS salevalue_finishrate, 0.12 AS abysku_rate, 35 AS dsiUNION ALL SELECT '20250814', 'SH002', 0.32, 105.2, 1.8, 1.02, 0.08, 28UNION ALL SELECT '20250814', 'SH003', 0.18, 76.9, 2.5, 0.68, 0.15, 42UNION ALL SELECT '20250814', 'SH004', 0.41, 120.7, 1.5, 1.25, 0.05, 22UNION ALL SELECT '20250814', 'SH005', 0.29, 98.3, 2.3, 0.95, 0.10, 30UNION ALL SELECT '20250814', 'SH006', 0.35, 112.1, 1.9, 1.10, 0.07, 25UNION ALL SELECT '20250814', 'SH007', 0.22, 82.4, 2.4, 0.75, 0.13, 38UNION ALL SELECT '20250814', 'SH008', 0.38, 109.6, 1.7, 1.08, 0.06, 26UNION ALL SELECT '20250814', 'SH009', 0.27, 92.8, 2.2, 0.88, 0.11, 33UNION ALL SELECT '20250814', 'SH010', 0.31, 10.5, 2.0, 0.99, 0.09, 29UNION ALL SELECT '20250814', 'SH011', 0.42, 10.5, 2.0, 0.99, 0.09, 29
)
SELECT sdate,-- 計算中位數(偶數時取平均值)(median_profit_rate_high + median_profit_rate_low) / 2.0 AS median_profit_rate,(median_avg_customer_price_high + median_avg_customer_price_low) / 2.0 AS median_avg_customer_price,(median_avg_num_purchases_high + median_avg_num_purchases_low) / 2.0 AS median_avg_num_purchases,(median_salevalue_finishrate_high + median_salevalue_finishrate_low) / 2.0 AS median_salevalue_finishrate,(median_abysku_rate_high + median_abysku_rate_low) / 2.0 AS median_abysku_rate,(median_dsi_high + median_dsi_low) / 2.0 AS median_dsi
FROM (SELECT sdate,MAX(CASE WHEN profit_rate_rank = CEILING(total_count/2.0) THEN profit_rate END) AS median_profit_rate_high,MAX(CASE WHEN profit_rate_rank = FLOOR(total_count/2.0)+1 THEN profit_rate END) AS median_profit_rate_low,MAX(CASE WHEN avg_customer_price_rank = CEILING(total_count/2.0) THEN avg_customer_price END) AS median_avg_customer_price_high,MAX(CASE WHEN avg_customer_price_rank = FLOOR(total_count/2.0)+1 THEN avg_customer_price END) AS median_avg_customer_price_low,MAX(CASE WHEN avg_num_purchases_rank = CEILING(total_count/2.0) THEN avg_num_purchases END) AS median_avg_num_purchases_high,MAX(CASE WHEN avg_num_purchases_rank = FLOOR(total_count/2.0)+1 THEN avg_num_purchases END) AS median_avg_num_purchases_low,MAX(CASE WHEN salevalue_finishrate_rank = CEILING(total_count/2.0) THEN salevalue_finishrate END) AS median_salevalue_finishrate_high,MAX(CASE WHEN salevalue_finishrate_rank = FLOOR(total_count/2.0)+1 THEN salevalue_finishrate END) AS median_salevalue_finishrate_low,MAX(CASE WHEN abysku_rate_rank = CEILING(total_count/2.0) THEN abysku_rate END) AS median_abysku_rate_high,MAX(CASE WHEN abysku_rate_rank = FLOOR(total_count/2.0)+1 THEN abysku_rate END) AS median_abysku_rate_low,MAX(CASE WHEN dsi_rank = CEILING(total_count/2.0) THEN dsi END) AS median_dsi_high,MAX(CASE WHEN dsi_rank = FLOOR(total_count/2.0)+1 THEN dsi END) AS median_dsi_low,total_countFROM (SELECT sdate,profit_rate,avg_customer_price,avg_num_purchases,salevalue_finishrate,abysku_rate,dsi,ROW_NUMBER() OVER (PARTITION BY sdate ORDER BY profit_rate) AS profit_rate_rank,COUNT(*) OVER (PARTITION BY sdate) AS total_count,ROW_NUMBER() OVER (PARTITION BY sdate ORDER BY avg_customer_price) AS avg_customer_price_rank,ROW_NUMBER() OVER (PARTITION BY sdate ORDER BY avg_num_purchases) AS avg_num_purchases_rank,ROW_NUMBER() OVER (PARTITION BY sdate ORDER BY salevalue_finishrate) AS salevalue_finishrate_rank,ROW_NUMBER() OVER (PARTITION BY sdate ORDER BY abysku_rate) AS abysku_rate_rank,ROW_NUMBER() OVER (PARTITION BY sdate ORDER BY dsi) AS dsi_rankFROM test_basedata) ranked_dataGROUP BY sdate, total_count
) median_calculation