官網地址:https://pandas.pydata.org/docs/reference/groupby.html
pandas中對數據進行分組操作的方法,官方有很詳細的教程。下面的案例是真實遇到的問題,看一看用pandas是如何解決的。

import pandas as pdimport numpy as npdf = pd.DataFrame(data={ "boss":["A"]*3+["B"]*3+["C"]*4, "owner":["A1","A1","A2","B1","B2","B2","C1","C1","C2","C2"], "month":[1,2,1,1,1,2,1,2,1,2], "fk_money":[10,20,30,40,50,60,70,80,90,100],})

boss | owner | month | fk_money | |
0 | A | A1 | 1 | 10 |
1 | A | A1 | 2 | 20 |
2 | A | A2 | 1 | 30 |
3 | B | B1 | 1 | 40 |
4 | B | B2 | 1 | 50 |
5?? | B | B2 | 2 | 60 |
6 | C | C1 | 1 | 70 |
7 | C | C1 | 2 | 80 |
8 | C | C2 | 1 | 90 |
9 | C | C2 | 2 | 100 |
解釋:比如第一條數據,老板A手下的業務員A1,在第1個月的放款金額為10萬。

解決思路:
- 按照owner分組,降序排列,取第一個數據
result1_df = df.sort_values(by="fk_money",ascending=False).groupby(by="owner").head(1)result1_df
boss | owner | month | fk_money | |
9 | C | C2 | 2 | 100 |
7 | C | C1 | 2 | 80 |
5 | B | B2 | 2 | 60 |
3 | B | B1 | 1 | 40 |
2 | A | A2 | 1 | 30 |
1 | A | A1 | 2 | 20 |
解釋:老板C手下的業務員C2在第2個月的放款金額最大為100萬。
拓展:如何取第二大的數據?
GroupBy.nth(),取每一組第n行的數據,n從0開始,0代表第一行。
- 沒有第n行的時候,不取。
result1_df = df.sort_values(by="fk_money",ascending=False).groupby(by="owner",as_index=False).nth(1)result1_df
owner | boss | month | fk_money |
A1 | A | 1 | 10 |
B1 | B | 1 | 50 |
C1 | C | 1 | 70 |
C2 | C | 1 | 90 |

解決思路:
計算出每個業務員總的放款金額owner_total_fk_money
將df與計算好的owner_total_fk_money合并
fk_money除以owner_total_fk_money得到需要的數據
### 代碼實現:owner_total_fk_money?=?df.groupby(by="owner",as_index=False).agg({"fk_money":"sum"})
result1_df = pd.merge(df,owner_total_fk_money,on="owner",how="left",suffixes=("","_total"))
result1_df["rate"] = (result1_df["fk_money"]/result1_df["fk_money_total"]).map(lambda x:"{:.2%}".format(x))result1_df
boss | owner | month | fk_money | fk_money_total | rate | |
0 | A | A1 | 1 | 10 | 30 | 33.33% |
1 | A | A1 | 2 | 20 | 30 | 66.67% |
2 | A | A2 | 1 | 30 | 30 | 100.00% |
3 | B | B1 | 1 | 40 | 40 | 100.00% |
4 | B | B2 | 1 | 50 | 110 | 45.45% |
5 | B | B2 | 2 | 60 | 110 | 54.55% |
6 | C | C1 | 1 | 70 | 150 | 46.67% |
7 | C | C1 | 2 | 80 | 150 | 53.33% |
8 | C | C2 | 1 | 90 | 190 | 47.37% |
9 | C | C2 | 2 | 100 | 190 | 52.63% |

解決思路:
需要知道每個老板總的放款金額,boss_df
需要知道每個業務員的放款金額,owner_df
按照boss字段合并boss_df和owner_df
業務員的放款金額除以每個老板總的放款金額
# 計算每一個boss的總fk_moneyboss_df = df.groupby(by="boss",as_index=False).agg({"fk_money":"sum"})
# 計算每一個owner的總fk_moneyowner_df = df.groupby(by=["boss","owner"],as_index=False).agg({"fk_money":"sum"})
# 合并owner_df和boss_dfresult2_df = pd.merge(owner_df,boss_df,on="boss",how="left",suffixes=("_owner","_boss"))
result2_df["占比"] = (result_df["fk_money_owner"]/result_df["fk_money_boss"]).map(lambda x:"{:.2%}".format(x)result2_df
boss | owner | fk_money_owner | fk_money_boss | reate | |
0 | A | A1 | 30 | 60 | 50.00% |
1 | A | A2 | 30 | 60 | 50.00% |
2 | B | B1 | 40 | 150 | 26.67% |
3 | B | B2 | 110 | 150 | 73.33% |
4 | C | C1 | 150 | 340 | 44.12% |
5 | C | C2 | 190 | 340 | 55.88% |
解釋:老板A手下的A1占總放款金額(60萬)比例為50%。
----END----