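In big-data risk control practice, the industry still mostly builds strategies by stacking rules, but this approach raises several questions:
1. We have 10 rules. Once the first 7 are live, what absolute risk gain do the last 3 add?
2. Should the rules be ordered, with the most important ones first?
3. What should we do when rules overlap, if we want to cut risk while still keeping a relatively high approval rate?
4. After all the rules have been applied, to what level does the amount-based overdue risk fall?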
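Therefore, we package the rules in order of importance (high lift and a large volume of bad loans caught), then loop through them one at a time, colliding each rule with the loans that survived the earlier rules, and compute each rule's absolute gain in sequence.
The advantages of this code are: later rules do not lose efficiency because of earlier ones, so the approval budget is spent where it matters most; rule hits do not overlap; risk is computed under different definitions of the Y metric; and risk targets can be budgeted more clearly and accurately.
This article is a code exchange; all strategy thresholds in it are fictitious.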
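The sequential collision described above can be sketched on toy data. This is only an illustration: the `score_a` and `score_b` columns, their thresholds, and the `toy_rules` list are hypothetical and are not the rules used later in this article. It compares how many loans each rule hits on its own with how many new loans it hits after the earlier rules have already removed theirs.

```python
import pandas as pd

# Toy data: all column names and values here are hypothetical.
loans = pd.DataFrame({
    "loan_id": [1, 2, 3, 4, 5, 6],
    "score_a": [9, 8, 7, 2, 1, 3],
    "score_b": [9, 1, 8, 7, 2, 1],
})

# Rules in priority order; each one returns a boolean mask over a frame.
toy_rules = [
    ("score_a>=7", lambda df: df["score_a"] >= 7),
    ("score_b>=7", lambda df: df["score_b"] >= 7),
]

remaining = loans
rows = []
for name, rule in toy_rules:
    standalone = int(rule(loans).sum())   # hits if the rule ran alone
    mask = rule(remaining)                # hits among loans not yet rejected
    marginal = int(mask.sum())            # new hits added by this rule
    rows.append({"rule": name, "standalone": standalone, "marginal": marginal})
    remaining = remaining[~mask]          # later rules only see the survivors

summary = pd.DataFrame(rows)
print(summary)
```

In this toy set the second rule hits three loans on its own but adds only one new loan after the first rule, which is the overlap that the sequential calculation removes.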
Define the rules and apply them
# Each rule is a (name, row-level predicate) pair, listed in priority order.
rules = [
    ('1_rhzx_nnm_over_acct>11', lambda row: row['rhzx_nnm_over_acct'] > 11),
    ('2_rhzx_nnm_over_cnt>=22', lambda row: row['rhzx_nnm_over_cnt'] >= 22),
    ('3_multi_final_level>=33', lambda row: row['multi_final_level'] >= 33),
    # More rules can be added here...
]
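Row-level lambdas like these are evaluated once per row with `DataFrame.apply(..., axis=1)`, which can be slow on large portfolios. As a possible alternative (an assumption on my part, not part of the original code), the same condition can be written against the whole column and return a boolean Series directly. The sketch below uses made-up values and checks that both forms select the same rows.

```python
import pandas as pd

# Toy frame: the values are made up; the column name follows the rules above.
toy = pd.DataFrame({"rhzx_nnm_over_acct": [5, 12, 30]})

# Row-level form, as in the rules list: called once per row.
row_rule = lambda row: row["rhzx_nnm_over_acct"] > 11
row_mask = toy.apply(row_rule, axis=1)

# Column-level form (hypothetical variant): one vectorized comparison.
col_rule = lambda df: df["rhzx_nnm_over_acct"] > 11
col_mask = col_rule(toy)

print(row_mask.tolist())
print(col_mask.tolist())
```

If the column-level form were adopted, the rule would be applied as `current_df[rule_func(current_df)]` rather than through `apply`.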
Define a function that calculates amount-based overdue risk
def cal_risk2(df2):
    # Returns (FPD7 amount rate, MOB4 M2 amount rate).
    # A rate is reported as 0 when its denominator is not positive.
    total_fm7_amt = df2['fpd_fm7_amt'].sum()
    total_fz7_amt = df2['fpd_fz7_amt'].sum()
    total_mob4m2fenmu_amt = df2['mob4m2fenmu_amt'].sum()
    total_mob4m2fenzi_amt = df2['mob4m2fenzi_amt'].sum()
    fpd7_rate = total_fz7_amt / total_fm7_amt if total_fm7_amt > 0 else 0
    mob4m2_rate = total_mob4m2fenzi_amt / total_mob4m2fenmu_amt if total_mob4m2fenmu_amt > 0 else 0
    return fpd7_rate, mob4m2_rate
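Written out, the two ratios returned by `cal_risk2` are amount-weighted rates. The formulas below read the `fz`/`fenzi` columns as numerators and the `fm`/`fenmu` columns as denominators, which is how the function uses them; the subscript $i$ runs over the rows of the frame passed in.

```latex
\[
\mathrm{FPD7}_{\$} = \frac{\sum_i \mathtt{fpd\_fz7\_amt}_i}{\sum_i \mathtt{fpd\_fm7\_amt}_i},
\qquad
\mathrm{MOB4M2}_{\$} = \frac{\sum_i \mathtt{mob4m2fenzi\_amt}_i}{\sum_i \mathtt{mob4m2fenmu\_amt}_i}
\]
```

For example, with three hypothetical loans whose due amounts are 1000, 2000 and 1000 and whose overdue amounts are 0, 200 and 0, the FPD7 amount rate is 200 / 4000 = 5%. When a denominator is zero, the function returns 0 for that rate instead of dividing.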
Apply the rules and calculate the marginal risk reduction
import pandas as pd

def cal_risk_lift(current_df):
    # Column key: '命中' = loans hit by the rule, '應用規則后' = loans left after the rule.
    # Baseline row: the whole population before any rule is applied.
    base_fpd7, base_mob4m2 = cal_risk2(current_df)
    results2 = [{
        'Rule': 'Baseline',
        'Touched_Count': len(current_df),
        'Due_FPD7_Amt': round(float(current_df['fpd_fm7_amt'].sum()), 2),
        'FPD_Fz7_Amt': current_df['fpd_fz7_amt'].sum(),
        '命中_$FPD7': base_fpd7,
        '命中_$MOB4M2': base_mob4m2,
        '應用規則后_$FPD7': base_fpd7,
        '應用規則后_$MOB4M2': base_mob4m2,
    }]

    for rule_name, rule_func in rules:
        # Loans hit by this rule among those that survived all earlier rules.
        touched_loans = current_df[current_df.apply(rule_func, axis=1)]
        touched_loan_ids = set(touched_loans['loan_id'].unique())

        # Due and overdue amounts of the newly hit loans.
        new_fm7_amount = round(touched_loans['fpd_fm7_amt'].sum(), 2)
        new_fz7_amount = touched_loans['fpd_fz7_amt'].sum()
        # cal_risk2 guards against a zero denominator when a rule hits nothing.
        hit_fpd7, hit_mob4m2 = cal_risk2(touched_loans)

        # Risk of the remaining loans after this rule (loan_id is the unique key).
        # cal_risk2 returns (0, 0) for an empty frame, so no special case is needed.
        remaining_df = current_df[~current_df['loan_id'].isin(touched_loan_ids)]
        new_df_risk = cal_risk2(remaining_df)

        # Record the result.
        results2.append({
            'Rule': rule_name,
            'Touched_Count': len(touched_loan_ids),
            'Due_FPD7_Amt': new_fm7_amount,
            'FPD_Fz7_Amt': new_fz7_amount,
            '命中_$FPD7': hit_fpd7,
            '命中_$MOB4M2': hit_mob4m2,  # newly hit loans only
            # '命中_$MOB4M2': df['mob4m2fenzi_amt'].sum()/df['mob4m2fenmu_amt'].sum(),  # total hits instead
            '應用規則后_$FPD7': new_df_risk[0],
            '應用規則后_$MOB4M2': new_df_risk[1],
        })

        # Carry the survivors forward so the next rule only sees loans not yet hit.
        current_df = remaining_df

    # Convert the results to a DataFrame.
    return pd.DataFrame(results2)
We can then compute the cumulative lift of the rules for each group:
data6.groupby(['xj_sxdurseg']).apply(cal_risk_lift)
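To show the shape of what a grouped `apply` returns when the applied function itself returns a DataFrame, here is a minimal sketch on toy data. The `segment`, `due_amt` and `overdue_amt` columns and the `segment_summary` function are hypothetical stand-ins, not the fields of `data6`. Recent pandas releases warn when `apply` operates on the grouping column, so the sketch selects the value columns explicitly; whether that matters for `data6` depends on the installed version.

```python
import pandas as pd

# Toy data: "segment" is a hypothetical stand-in for xj_sxdurseg.
toy = pd.DataFrame({
    "segment": ["A", "A", "B", "B"],
    "due_amt": [100.0, 300.0, 200.0, 200.0],
    "overdue_amt": [10.0, 30.0, 0.0, 100.0],
})

def segment_summary(group):
    # Returns a one-row DataFrame per group, like a very small cal_risk_lift.
    due = group["due_amt"].sum()
    overdue = group["overdue_amt"].sum()
    rate = overdue / due if due > 0 else 0
    return pd.DataFrame({"due_amt": [due], "overdue_rate": [rate]})

# The result is indexed by (segment, row number within the returned frame).
by_segment = toy.groupby("segment")[["due_amt", "overdue_amt"]].apply(segment_summary)

# Move the group key out of the index to get a flat table.
flat = by_segment.reset_index(level=0)
print(flat)
```

The same `reset_index(level=0)` step can be used on the output of the grouped `cal_risk_lift` call to get one flat table with the group as a regular column.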