Python: Notes on importing large files into a database in batches/chunks

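1. Problem background

When a data file is large, concatenating everything into a single SQL statement and inserting it in one go often runs into the database's own limit on SQL statement length. The data then has to be split into chunks or batches and loaded piece by piece. This note records a simple way to do that; the chunking approach below can be reused for batch loading into any kind of database.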

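2. Implementation notes

The main question is how to split the data into chunks. In pandas, read_csv (and some other readers such as read_sql) accept a chunksize parameter; when it is set, the call returns an iterator of DataFrames instead of one DataFrame, so each chunk can be inserted as it is read. Note that read_excel has no chunksize parameter: an Excel file has to be read in full (or in windows via skiprows and nrows) and then split after loading.

If the data has already been read in full and you want to split it into batches afterwards, the following code can serve as a reference: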
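As a rough illustration of the chunksize approach, here is a minimal sketch. The in-memory CSV text and the helper name read_in_chunks are assumptions made up for this example; in practice the source would be the path of the large file.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a large CSV file on disk.
csv_text = "id,name\n" + "\n".join(f"{i},item{i}" for i in range(10))


def read_in_chunks(source, chunksize):
    """Yield DataFrames of at most `chunksize` rows from a CSV source."""
    # With chunksize set, read_csv returns an iterator rather than one DataFrame.
    for chunk in pd.read_csv(source, chunksize=chunksize):
        yield chunk


# Each chunk could be inserted into the database here instead of being measured.
chunk_sizes = [len(chunk) for chunk in read_in_chunks(io.StringIO(csv_text), chunksize=4)]
print(chunk_sizes)  # [4, 4, 2]
```

Only one chunk is held in memory at a time, which is the main reason to prefer this over reading the whole file first.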

batch_size = 2000
total_rows = p_result_notice_mes_df.shape[0]
total_batches = total_rows // batch_size + (1 if total_rows % batch_size > 0 else 0)
for i in range(total_batches):
    start_index = i * batch_size
    end_index = min((i + 1) * batch_size, total_rows)
    batch_df = p_result_notice_mes_df.iloc[start_index:end_index]
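The same index arithmetic can be wrapped in a small generator so it does not have to be repeated at every call site. This is only a sketch; the function name iter_batch_bounds is a hypothetical helper, not something from pandas.

```python
def iter_batch_bounds(total_rows, batch_size):
    """Yield (start, end) index pairs that cover range(total_rows) in order."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # Stepping by batch_size removes the need to compute total_batches up front.
    for start in range(0, total_rows, batch_size):
        yield start, min(start + batch_size, total_rows)


bounds = list(iter_batch_bounds(total_rows=4500, batch_size=2000))
print(bounds)  # [(0, 2000), (2000, 4000), (4000, 4500)]
```

With a DataFrame, each pair can be used as p_result_notice_mes_df.iloc[start:end] to obtain the batch.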

Part of a sample program follows:

third_tbname = 'bods.scw_info'
# Compute the total row count and the number of batches
batch_size = 2000
total_rows = p_result_notice_mes_df.shape[0]
total_batches = total_rows // batch_size + (1 if total_rows % batch_size > 0 else 0)
if p_third_flag:
    # Empty the table once, before the loop; truncating inside the loop
    # would wipe the batches that were already inserted
    cursor.execute(f"truncate table {third_tbname}")
    for i in range(total_batches):
        start_index = i * batch_size
        end_index = min((i + 1) * batch_size, total_rows)
        batch_df = p_result_notice_mes_df.iloc[start_index:end_index]
        third_values_list = []
        # Build the bulk INSERT statement for this batch
        insert_query = f"""INSERT into {third_tbname} (changelog_id, notice_model, notice_batch,
            brand, vehicle_type, rated_quality, total_quality, curb_weight, fuel_type, emission_standard)
        VALUES """
        # Values are interpolated directly into the SQL text, which assumes
        # that none of them contains a single quote
        for index, row in batch_df.iterrows():
            third_values_list.append(f"""('{row["變記錄"]}' , '{row["告"]}', '{row["公次"]}', '{row["品牌"]}', '{row["類型"]}', '{row["額量"]}', '{row["總"]}','{row["整量"]}', '{row["燃類"]}', '{row["排放準"]}')""")
        insert_query += ',\n'.join(third_values_list)
        # Execute the bulk insert
        cursor.execute(insert_query)
    print('Notice URL information loaded into the database successfully!\n')
else:
    print('Notice link information does not need updating')
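Building the VALUES list by string formatting breaks as soon as a value contains a quote character. One possible alternative, sketched here under assumptions, is to let the driver bind the values through placeholders and executemany. The example uses the standard-library sqlite3 module with an in-memory database, a shortened three-column version of the table, and a hypothetical helper load_in_batches. The original post does not say which database driver it uses, and the placeholder style differs between drivers (? for sqlite3, often %s elsewhere). SQLite also has no TRUNCATE, so DELETE FROM stands in for it.

```python
import sqlite3


def load_in_batches(conn, rows, batch_size=2000):
    """Replace the contents of scw_info with `rows`, inserting batch by batch."""
    cur = conn.cursor()
    # Clear the table once, before any batch is written.
    cur.execute("DELETE FROM scw_info")
    insert_sql = (
        "INSERT INTO scw_info (changelog_id, brand, fuel_type) VALUES (?, ?, ?)"
    )
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        # Placeholders leave quoting to the driver, so values containing ' are safe.
        cur.executemany(insert_sql, batch)
    conn.commit()


# Assumed demo setup: an in-memory database and made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scw_info (changelog_id TEXT, brand TEXT, fuel_type TEXT)")
rows = [(str(i), f"brand'{i}", "diesel") for i in range(4500)]

load_in_batches(conn, rows)
row_count = conn.execute("SELECT COUNT(*) FROM scw_info").fetchone()[0]
print(row_count)  # 4500
```

When the data lives in a DataFrame, batch_df.itertuples(index=False, name=None) yields plain tuples that can be passed to executemany in the same way.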
