窗口函數介紹
窗口函數的引入是為了解決想要既顯示聚集前的數據,又要顯示聚集后的數據;窗口數對一組值進行操作,不需要使用GROUP BY子句對數據進行分組,能夠在同一行中同時返回基礎行的列和聚合列。
強調:使用MySQL 8.0版本方可實現
基本語法
函數名(列) over(選項) 選項為partition by 列 order by 列
解釋:
- over(partition by xxx) 按xxx分組的所有行進行分組
- over(partition by xxx order by aaa) 按列xxx分組,按列aaa排序
- over(order by aaa) 按aaa列排序
- over括號中的partition by和order by的使用根據具體情況選擇
-- 需求:計算每個學生的及格科目數
-- 使用聚合函數,類似數據透視表,原有表結構已發生變化
SELECT student_id,count( sid ) FROM score WHERE num >= 60 GROUP BY student_id;-- 使用窗口函數,不會更改原表結構
SELECT student_id,count( sid ) over ( PARTITION BY student_id ORDER BY student_id ) AS 及格數
FROM score WHERE num >= 60;
聚合窗口函數
語法:聚合函數(列) over(partition by 列 order by 列)
常見的聚合函數:sum() count() avg() max() min()
排序窗口函數
- row_number():僅僅根據行號進行排序,相同結果則排序按照順序依次排
- rank():排名,與row_number函數不同的是,rank函數考慮到了over子句中排序字段值相同的情況,over子句中排序字段值相同的排序結果是一樣的,后面字段值不相同的序號將跳過相同的排名號排下一個。如:11335
- dense_rank():密集排序,用法跟rank類似,唯一不同是當排序結果相同時,排序不跳躍,而是緊跟排下一個。如:11223
- ntile():桶排序,首先,ntile會先根據你的分組依據,然后把每個組的總記錄數進行按照你給的ntile(n)里的數字n進行均分,這個數字就是桶數,例如一個組內總共12條記錄,若n=6,則等劃分成6桶,然后按照num的排序等級劃分,12/6=2則每個桶兩條記錄,也就是112233445566的排序結果,常用于提取前百分之多少的應用場景。
都是排名函數,不同之處在對于名次相同的數據處理方式
-- 對每門課程進行排序
SELECTs.sid,s1.sname,s1.gender,c.cname,s.num,row_number() over ( PARTITION BY c.cname ORDER BY num DESC ) AS row_number排名,rank() over ( PARTITION BY c.cname ORDER BY num DESC ) AS rank排名,dense_rank() over ( PARTITION BY c.cname ORDER BY num DESC ) AS dense_rank排名,ntile( 6 ) over ( PARTITION BY c.cname ORDER BY num DESC ) AS ntile排名
FROMscore AS sJOIN student AS s1 ON s.student_id = s1.sidLEFT JOIN course AS c ON s.course_id = c.cid-- 計算每門課程前三,考慮排名相同的情況
SELECT * FROM (SELECT s.sid,s1.sname,s1.gender,c.cname,s.num,dense_rank() over ( PARTITION BY c.cname ORDER BY num DESC ) AS dense_rank排名 FROMscore AS sJOIN student AS s1 ON s.student_id = s1.sidLEFT JOIN course AS c ON s.course_id = c.cid ) AS a
WHEREdense_rank排名 <=3
位置移動窗口函數
- lag(col,n):col列名,n行數,用于統計窗口內往上第n行值
- lead(col,n):col列名,n行數,用于統計窗口內往下第n行值
這兩個函數可以用于同列中相鄰行的數據計算
應用場景:
- 計算作弊次數
-- 需求:對于下面的數據,對于同一用戶(uid)如果在2分鐘之內重新登陸,則判斷為作弊,統計哪些用戶有作弊行為,并計算作弊次數
-- 數據代碼
CREATE TABLE lead_table (id INT PRIMARY KEY,uid INT NOT NULL,login_time datetime NOT NULL );
INSERT INTO lead_table
VALUES( 1, 1, "2020-8-26 12:59:00" ),( 2, 1, "2020-8-26 13:02:23" ),( 3, 1, "2020-8-26 13:03:34" ),( 4, 1, "2020-8-26 13:09:00" ),( 5, 2, "2020-8-26 13:57:00" ),( 6, 2, "2020-8-26 13:59:00" ),( 7, 2, "2020-8-26 13:59:45" );
思路:根據題目要求,如果能把相鄰兩列的下面一列與上面那一列變成同一行,不久能實現相減了么,因此我們可以多生成一列,例如:把uid都為1的第二行記錄生成到第一行,以此類推,這就可以用到lead往下移動的操作了
-- 第一步
SELECT id,uid,login_time,
LEAD( login_time, 1 ) OVER ( PARTITION BY uid ORDER BY login_time ) lead_time
FROM lead_table;--第二步
SELECT id,uid,login_time,
LEAD( login_time, 1 ) OVER ( PARTITION BY uid ORDER BY login_time ) lead_time,
TIMESTAMPDIFF(SECOND,login_time,LEAD( login_time, 1 ) OVER ( PARTITION BY uid ORDER BY login_time )) AS 相差秒數
FROM lead_table;-- 最終代碼
SELECT uid,COUNT( 1 ) AS 作弊次數
FROM(SELECT id,uid,login_time,LEAD( login_time, 1 ) OVER ( PARTITION BY uid ORDER BY login_time ) lead_time,TIMESTAMPDIFF(SECOND,login_time,lead( login_time, 1 ) OVER ( PARTITION BY uid ORDER BY login_time )) AS 相差秒數
FROM lead_table ) AS e
WHERE 相差秒數 <= 120
GROUP BY uid;
- 計算次日留存率
其他窗口函數
- first_value(column):取分組排序后第一個值
SELECT s.sid,s1.sname,s1.gender,c.cname,s.num,
FIRST_VALUE(num) OVER(PARTITION by c.cname ORDER BY num DESC) AS first_value用法
FROM score AS s
JOIN student AS s1 ON s.student_id = s1.sid
LEFT JOIN course AS c ON s.course_id = c.cid
- last_value(column):取分組排序后最后一個值
SELECT s.sid,s1.sname,s1.gender,c.cname,s.num,
LAST_VALUE(num) OVER(PARTITION by c.cname ORDER BY num DESC) AS last_value用法
FROM score AS s
JOIN student AS s1 ON s.student_id = s1.sid
LEFT JOIN course AS c ON s.course_id = c.cid
為什么和想要的結果不一樣呢?
實際上,窗口函數默認統計范圍是rows between unbounded preceding and current row,也就是取當前行數據與當前行之前的數據的比較。
修改SQL,在order by條件的后面加上語句:rows between unbounded preceding and unbounded following,可以理解為:當前分組數據中的所有數據進行比較,取最后一條記錄。
SELECT s.sid,s1.sname,s1.gender,c.cname,s.num,
LAST_VALUE(num) OVER(PARTITION by c.cname ORDER BY num DESC rows between unbounded preceding and unbounded following) AS last_value用法
FROM score AS s
JOIN student AS s1 ON s.student_id = s1.sid
LEFT JOIN course AS c ON s.course_id = c.cid
詳細介紹:
- rows between XXX and XXX
- unbounded 無限制的
- preceding 分區的當前記錄的向前偏移量
- current 當前
- following 分區的當前記錄的向后偏移量
示例:累計計算每個月的銷售額
-- 示例數據
CREATE TABLE sale (id INT PRIMARY KEY auto_increment,年份 INT,月份 INT,
money FLOAT ( 10, 2 ));
INSERT INTO sale ( 年份, 月份, money )
VALUES( 2020, 1, 5840 ),( 2020, 2, 5780 ),( 2020, 3, 4300 ),( 2020, 4, 4760 ),( 2020, 5, 3630 ),( 2020, 6, 4130 ),( 2020, 7, 4350 );-- 語句
SELECT *,sum( money ) over ( ORDER BY 月份rows between unbounded preceding and current row) AS 累計銷售額
FROM sale;
本章示例數據
CREATE DATABASE school;
USE school;
CREATE TABLE class (cid INT ( 11 ) NOT NULL auto_increment,caption VARCHAR ( 32 ) NOT NULL,PRIMARY KEY ( cid )
) ENGINE = INNODB charset = utf8;INSERT INTO class
VALUES( 1, '三年二班' ),( 2, '三年三班' ),( 3, '二年二班' ),( 4, '一年二班' ),( 5, '二年五班' );CREATE TABLE teacher (tid INT ( 11 ) NOT NULL auto_increment,tname VARCHAR ( 32 ) NOT NULL,PRIMARY KEY ( tid )
) ENGINE = INNODB DEFAULT charset = utf8;INSERT INTO teacher
VALUES( 1, '張磊老師' ),( 2, '李平老師' ),( 3, '劉蘭老師' ),( 4, '朱朱老師' ),( 5, '李杰老師' );CREATE TABLE course (cid INT ( 11 ) NOT NULL auto_increment,cname VARCHAR ( 32 ) NOT NULL,teacher_id INT ( 11 ) NOT NULL,PRIMARY KEY ( cid ),KEY fk_couurse_teacher ( teacher_id ),CONSTRAINT fk_course_teacher FOREIGN KEY ( teacher_id ) REFERENCES teacher ( tid )
) ENGINE = INNODB DEFAULT charset = utf8;
INSERT INTO course
VALUES( 1, '生物', 1 ),( 2, '物理', 2 ),( 3, '體育', 3 ),( 4, '美術', 2 );CREATE TABLE student (sid INT ( 11 ) NOT NULL auto_increment,gender CHAR ( 1 ) NOT NULL,class_id INT ( 11 ) NOT NULL,sname VARCHAR ( 32 ) NOT NULL,PRIMARY KEY ( sid ),KEY fk_class ( class_id ),CONSTRAINT fk_class FOREIGN KEY ( class_id ) REFERENCES class ( cid )
) ENGINE = INNODB DEFAULT charset = utf8;INSERT INTO student
VALUES( 1, '男', 1, '理解' ),( 2, '女', 1, '鋼蛋' ),( 3, '男', 1, '張三' ),( 4, '男', 1, '張思' ),( 5, '女', 1, '網易' ),( 6, '男', 1, '王二' ),( 7, '男', 2, '鐵道' ),( 8, '男', 2, '李武' ),( 9, '男', 2, '劉三' ),( 10, '女', 2, '劉一' ),( 11, '男', 2, '劉思' ),( 12, '男', 3, '王三' ),( 13, '男', 3, '小五' ),( 14, '男', 3, '小七' ),( 15, '女', 3, '如花' ),( 16, '男', 3, '張四' );CREATE TABLE score (sid INT ( 11 ) NOT NULL auto_increment,student_id INT ( 11 ) NOT NULL,course_id INT ( 11 ) NOT NULL,num INT ( 11 ) NOT NULL,PRIMARY KEY ( sid ),KEY fk_score_student ( student_id ),KEY fk_score_course ( course_id ),CONSTRAINT fk_score_course FOREIGN KEY ( course_id ) REFERENCES course ( cid ),CONSTRAINT fk_score_student FOREIGN KEY ( student_id ) REFERENCES student ( sid )
) ENGINE = INNODB DEFAULT charset = utf8;INSERT INTO score
VALUES( 1, 1, 1, 10 ),( 2, 1, 2, 9 ),( 5, 1, 4, 66 ),( 6, 2, 1, 8 ),( 8, 2, 3, 68 ),( 9, 2, 4, 99 ),( 10, 3, 1, 77 ),( 11, 3, 2, 66 ),( 12, 3, 3, 87 ),( 13, 3, 4, 99 ),( 14, 4, 1, 79 ),( 15, 4, 2, 11 ),( 16, 4, 3, 67 ),( 17, 4, 4, 100 ),( 18, 5, 1, 79 ),( 19, 5, 2, 11 ),( 20, 5, 3, 67 ),( 21, 5, 4, 100 );
來自: 學習MySQL(五):窗口函數