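Python Pandas Aggregate Functions
Window functions can be combined with aggregate functions. An aggregate function reduces a group of values to a single result, such as the sum, maximum, minimum, or mean.
Applying aggregate functions
First, let's create a DataFrame object and a rolling window over it, which the aggregate functions will then be applied to.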
import pandas as pd
import numpy as np
df = pd.DataFrame(np.arange(20).reshape(5, 4), index=pd.date_range('12/14/2020', periods=5), columns=['A', 'B', 'C', 'D'])
print(df)
# Window size is 3; min_periods=1 means one observation is enough to produce a result
r = df.rolling(window=3, min_periods=1)
print(r)
Output:
             A   B   C   D
2020-12-14   0   1   2   3
2020-12-15   4   5   6   7
2020-12-16   8   9  10  11
2020-12-17  12  13  14  15
2020-12-18  16  17  18  19
Rolling [window=3,min_periods=1,center=False,axis=0,method=single]
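The effect of min_periods is easier to see by comparison. The sketch below reuses the same DataFrame and compares the rolling sum of column A with min_periods=1 against the default behaviour, where an integer window uses the window size as min_periods, so rows with fewer than three observations come out as NaN. The variable names relaxed, strict, and comparison are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(20).reshape(5, 4),
    index=pd.date_range('12/14/2020', periods=5),
    columns=['A', 'B', 'C', 'D'],
)

# min_periods=1: a result is produced as soon as one observation is available
relaxed = df['A'].rolling(window=3, min_periods=1).sum()

# Default for an integer window: min_periods equals the window size (3 here),
# so the first two rows do not have enough observations and yield NaN
strict = df['A'].rolling(window=3).sum()

# Put both results side by side for comparison
comparison = pd.DataFrame({'min_periods=1': relaxed, 'default': strict})
print(comparison)
```

With min_periods=1 the first rows are computed from a partial window, which is why the examples in this article contain no NaN values.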
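1) Aggregating the whole DataFrame
You can pass an aggregate function to the aggregate() method of the rolling object to apply it to every column, as shown below: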
import pandas as pd
import numpy as np
df = pd.DataFrame(np.arange(20).reshape(5, 4), index=pd.date_range('12/14/2020', periods=5), columns=['A', 'B', 'C', 'D'])
print(df)
# Window size is 3; min_periods=1 means one observation is enough to produce a result
r = df.rolling(window=3, min_periods=1)
print(r.aggregate(np.sum))
# The following also works
# print(r.sum())
Output:
             A   B   C   D
2020-12-14   0   1   2   3
2020-12-15   4   5   6   7
2020-12-16   8   9  10  11
2020-12-17  12  13  14  15
2020-12-18  16  17  18  19
               A     B     C     D
2020-12-14   0.0   1.0   2.0   3.0
2020-12-15   4.0   6.0   8.0  10.0
2020-12-16  12.0  15.0  18.0  21.0
2020-12-17  24.0  27.0  30.0  33.0
2020-12-18  36.0  39.0  42.0  45.0
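To make the meaning of these numbers concrete, here is a minimal sketch that checks two things: that naming the aggregation with the string 'sum' gives the same result as calling r.sum(), and that the last row is simply the sum of the last three rows of df. The names by_name, by_method, and manual_last are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(20).reshape(5, 4),
    index=pd.date_range('12/14/2020', periods=5),
    columns=['A', 'B', 'C', 'D'],
)
r = df.rolling(window=3, min_periods=1)

# Two spellings of the same rolling sum
by_name = r.aggregate('sum')   # aggregation named by a string
by_method = r.sum()            # dedicated method on the rolling object
print(by_name.equals(by_method))

# Manual check: the window ending at the last row covers the last three rows
manual_last = df.iloc[-3:].sum()
print(manual_last)
print(by_name.iloc[-1])
```

The last two printed Series should show the same values per column, the first as integers and the second as floats.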
2) Aggregating a single column
import pandas as pd
import numpy as np
df = pd.DataFrame(np.arange(20).reshape(5, 4), index=pd.date_range('12/14/2020', periods=5), columns=['A', 'B', 'C', 'D'])
print(df)
# Window size is 3; min_periods=1 means one observation is enough to produce a result
r = df.rolling(window=3, min_periods=1)
# Select column B from the rolling object, then aggregate it
print(r['B'].aggregate(np.sum))
Output:
             A   B   C   D
2020-12-14   0   1   2   3
2020-12-15   4   5   6   7
2020-12-16   8   9  10  11
2020-12-17  12  13  14  15
2020-12-18  16  17  18  19
2020-12-14     1.0
2020-12-15     6.0
2020-12-16    15.0
2020-12-17    27.0
2020-12-18    39.0
Freq: D, Name: B, dtype: float64
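The column can also be selected before the rolling window is created. The following sketch, with illustrative variable names, compares both orders of operation on column B and shows that they give the same Series.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(20).reshape(5, 4),
    index=pd.date_range('12/14/2020', periods=5),
    columns=['A', 'B', 'C', 'D'],
)

# Roll over the whole DataFrame, then pick column B
select_after = df.rolling(window=3, min_periods=1)['B'].sum()

# Pick column B first, then roll over the resulting Series
select_before = df['B'].rolling(window=3, min_periods=1).sum()

print(select_after.equals(select_before))
```

Selecting the column first avoids building a window over columns that are never used, which can matter on wide DataFrames.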
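3) Aggregating multiple columns
import pandas as pd
import numpy as np
df = pd.DataFrame(np.arange(20).reshape(5, 4), index=pd.date_range('12/14/2020', periods=5), columns=['A', 'B', 'C', 'D'])
print(df)
# Window size is 3; min_periods=1 means one observation is enough to produce a result
r = df.rolling(window=3, min_periods=1)
# Select several columns with a list of labels, then aggregate them
print(r[['B', 'C']].aggregate(np.sum))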
Output:
             A   B   C   D
2020-12-14   0   1   2   3
2020-12-15   4   5   6   7
2020-12-16   8   9  10  11
2020-12-17  12  13  14  15
2020-12-18  16  17  18  19
               B     C
2020-12-14   1.0   2.0
2020-12-15   6.0   8.0
2020-12-16  15.0  18.0
2020-12-17  27.0  30.0
2020-12-18  39.0  42.0
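One detail worth knowing when selecting columns: a single label yields a Series, while a list of labels yields a DataFrame, even if the list holds only one label. The sketch below illustrates this with column B; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(20).reshape(5, 4),
    index=pd.date_range('12/14/2020', periods=5),
    columns=['A', 'B', 'C', 'D'],
)
r = df.rolling(window=3, min_periods=1)

as_series = r['B'].sum()     # single label: result is a Series
as_frame = r[['B']].sum()    # one-element list: result is a one-column DataFrame

print(type(as_series).__name__)
print(type(as_frame).__name__)
```

Using the list form keeps the result two-dimensional, which is convenient when the output is later joined with other DataFrames.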
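4) Applying multiple functions to a single column
import pandas as pd
import numpy as np
df = pd.DataFrame(np.arange(20).reshape(5, 4), index=pd.date_range('12/14/2020', periods=5), columns=['A', 'B', 'C', 'D'])
print(df)
# Window size is 3; min_periods=1 means one observation is enough to produce a result
r = df.rolling(window=3, min_periods=1)
# Pass a list of functions; each one becomes a column in the result
print(r['B'].aggregate([np.sum, np.mean]))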
Output:
             A   B   C   D
2020-12-14   0   1   2   3
2020-12-15   4   5   6   7
2020-12-16   8   9  10  11
2020-12-17  12  13  14  15
2020-12-18  16  17  18  19
             sum  mean
2020-12-14   1.0   1.0
2020-12-15   6.0   3.0
2020-12-16  15.0   5.0
2020-12-17  27.0   9.0
2020-12-18  39.0  13.0
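The same pattern extends to the other aggregations mentioned at the start of this article. As a small sketch, the functions can also be named by strings, here adding the rolling maximum and minimum of column B; the variable name stats is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(20).reshape(5, 4),
    index=pd.date_range('12/14/2020', periods=5),
    columns=['A', 'B', 'C', 'D'],
)
r = df.rolling(window=3, min_periods=1)

# Each string names a built-in rolling aggregation; the result has one column per name
stats = r['B'].aggregate(['sum', 'mean', 'max', 'min'])
print(stats)
```

The result columns appear in the order in which the function names are listed.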
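5) Applying multiple functions to multiple columns
import pandas as pd
import numpy as np
df = pd.DataFrame(np.arange(20).reshape(5, 4), index=pd.date_range('12/14/2020', periods=5), columns=['A', 'B', 'C', 'D'])
print(df)
# Window size is 3; min_periods=1 means one observation is enough to produce a result
r = df.rolling(window=3, min_periods=1)
# Every function in the list is applied to every selected column
print(r[['B', 'C']].aggregate([np.sum, np.mean]))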
Output:
             A   B   C   D
2020-12-14   0   1   2   3
2020-12-15   4   5   6   7
2020-12-16   8   9  10  11
2020-12-17  12  13  14  15
2020-12-18  16  17  18  19
               B           C
             sum  mean   sum  mean
2020-12-14   1.0   1.0   2.0   2.0
2020-12-15   6.0   3.0   8.0   4.0
2020-12-16  15.0   5.0  18.0   6.0
2020-12-17  27.0   9.0  30.0  10.0
2020-12-18  39.0  13.0  42.0  14.0
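The result above has two-level column labels: the original column name on top and the function name below. The following sketch shows one possible way to work with such a result: selecting a single (column, function) pair, selecting all statistics for one column, and flattening the labels into plain strings. The names b_mean, c_stats, and flat, and the underscore naming scheme, are illustrative choices.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(20).reshape(5, 4),
    index=pd.date_range('12/14/2020', periods=5),
    columns=['A', 'B', 'C', 'D'],
)
r = df.rolling(window=3, min_periods=1)
result = r[['B', 'C']].aggregate(['sum', 'mean'])

# Column labels are (column, function) tuples
print(result.columns.tolist())

# One specific statistic: the rolling mean of B
b_mean = result[('B', 'mean')]

# All statistics computed for column C
c_stats = result['C']

# Flatten the two-level labels into strings such as 'B_sum'
flat = result.copy()
flat.columns = ['_'.join(col) for col in result.columns]
print(flat)
```

Flattened labels are often easier to handle when the result is exported to CSV or merged with other tables.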
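6) Applying different functions to different columns
import pandas as pd
import numpy as np
df = pd.DataFrame(np.arange(20).reshape(5, 4), index=pd.date_range('12/14/2020', periods=5), columns=['A', 'B', 'C', 'D'])
print(df)
# Window size is 3; min_periods=1 means one observation is enough to produce a result
r = df.rolling(window=3, min_periods=1)
# Map each column name to the function that should be applied to it
print(r.aggregate({"B": np.sum, "C": np.mean}))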
Output:
             A   B   C   D
2020-12-14   0   1   2   3
2020-12-15   4   5   6   7
2020-12-16   8   9  10  11
2020-12-17  12  13  14  15
2020-12-18  16  17  18  19
               B     C
2020-12-14   1.0   2.0
2020-12-15   6.0   4.0
2020-12-16  15.0   6.0
2020-12-17  27.0  10.0
2020-12-18  39.0  14.0
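Because the result keeps the original column names, it is no longer obvious which function produced which column. A minimal sketch of one way to handle this, using string function names and a follow-up rename, is shown below; the third column D and the new labels B_sum, C_mean, and D_max are illustrative additions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(20).reshape(5, 4),
    index=pd.date_range('12/14/2020', periods=5),
    columns=['A', 'B', 'C', 'D'],
)
r = df.rolling(window=3, min_periods=1)

# One function per column; columns missing from the dict (A here) are left out
summary = r.aggregate({'B': 'sum', 'C': 'mean', 'D': 'max'})

# Rename so that each label records the aggregation that was applied
summary = summary.rename(columns={'B': 'B_sum', 'C': 'C_mean', 'D': 'D_max'})
print(summary)
```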