[Python] File Reading: A Line-by-Line Example with a JSONL File

Read the questions from a JSONL file line by line and save them to a new JSONL file.

import json
import argparse
import os  # used to check whether the input file exists


def read_questions_from_jsonl(file_path, limit):
    """
    Read the 'question' field from up to `limit` records of a JSONL file.

    Parameters:
        file_path: path to the JSONL file
        limit: number of questions to read

    Returns:
        A list containing at most `limit` questions.
    """
    questions = []
    count = 0
    try:
        with open(file_path, 'r', encoding='utf-8') as file:
            for line in file:
                if count >= limit:
                    break  # limit reached, stop reading
                # Parse one JSON record per line
                json_data = json.loads(line)
                # Extract the question
                if 'question' in json_data:
                    questions.append(json_data['question'])
                    count += 1
    except json.JSONDecodeError:
        # Reading stops at the first invalid line; questions read so far are kept
        print("Error: the file format is invalid or the file is corrupted")
    return questions


def save_questions_to_jsonl(questions, output_file_path):
    """
    Save a list of questions to a JSONL file.

    Parameters:
        questions: list of questions to save
        output_file_path: path to the output file
    """
    with open(output_file_path, 'w', encoding='utf-8') as file:
        for question in questions:
            json_line = json.dumps({"question": question}, ensure_ascii=False)
            file.write(json_line + '\n')


def main():
    # Set up command-line argument parsing
    parser = argparse.ArgumentParser(
        description='Extract questions from a JSONL file and save them to a new JSONL file')
    parser.add_argument('input_path', help='path to the input JSONL file')
    parser.add_argument('output_path', help='path to the output JSONL file')
    parser.add_argument('--limit', type=int, default=100,
                        help='number of questions to read (default: 100)')

    # Parse the command-line arguments
    args = parser.parse_args()

    # Check whether the input file exists
    if not os.path.exists(args.input_path):
        raise FileNotFoundError(f"Error: input file {args.input_path} does not exist")

    # Read the requested number of questions
    questions = read_questions_from_jsonl(args.input_path, args.limit)

    # Save the questions to the output JSONL file
    save_questions_to_jsonl(questions, args.output_path)
    print(f"Saved {len(questions)} questions to {args.output_path}")


if __name__ == "__main__":
    main()
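The reader above stops at the first line that fails to parse, which includes an empty line in the middle or at the end of the file. If skipping such lines is preferable, a variant along the following lines might work. This is only a sketch: the function name read_questions_tolerant and the choice to print a message for each skipped line are assumptions, not part of the original script.

```python
import json


def read_questions_tolerant(file_path, limit):
    """Hypothetical variant: skip blank or malformed lines instead of stopping."""
    questions = []
    with open(file_path, 'r', encoding='utf-8') as file:
        for line_number, line in enumerate(file, start=1):
            if len(questions) >= limit:
                break  # limit reached, stop reading
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                print(f"Skipping line {line_number}: invalid JSON")
                continue
            # Only JSON objects (dicts) can carry a 'question' key
            if isinstance(record, dict) and 'question' in record:
                questions.append(record['question'])
    return questions
```

The try/except now wraps a single json.loads call, so one bad record no longer ends the whole read.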

Checking whether the input file exists

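import os

# Check whether the input file exists
# (args is the result of parser.parse_args() in main())
if not os.path.exists(args.input_path):
    raise FileNotFoundError(f"Error: input file {args.input_path} does not exist")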
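Note that os.path.exists also returns True for directories, so a directory path would pass the check above and only fail later inside open. A slightly stricter check could look like the sketch below; the helper name validate_input_path is hypothetical and not part of the original script.

```python
import os


def validate_input_path(path):
    """Hypothetical helper: return `path` only if it is an existing non-directory."""
    if not os.path.exists(path):
        raise FileNotFoundError(f"Error: input file {path} does not exist")
    if os.path.isdir(path):
        # os.path.exists() is True for directories as well as files
        raise IsADirectoryError(f"Error: {path} is a directory, not a file")
    return path
```

In main(), this could be called as validate_input_path(args.input_path) before reading.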

Checking whether the output file exists

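When the open function opens a file in write mode ('w'), Python automatically creates the file if it does not exist. If the file already exists, its previous contents are discarded. Only the file itself is created: the directory that contains it must already exist, otherwise open raises FileNotFoundError.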
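The behavior of 'w' mode can be seen in a small sketch. The helper name write_text_fresh is hypothetical, and creating missing parent directories with os.makedirs is an addition that the original script does not do.

```python
import os


def write_text_fresh(path, text):
    """Hypothetical helper: create parent directories, then overwrite `path`."""
    parent = os.path.dirname(path)
    if parent:
        # open() in 'w' mode creates the file but not its parent directories
        os.makedirs(parent, exist_ok=True)
    with open(path, 'w', encoding='utf-8') as file:
        file.write(text)
```

Calling write_text_fresh twice on the same path leaves only the text from the second call, because 'w' truncates the file each time it is opened.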

To append to the output file when it already exists, and create a new one when it does not, the function can be changed to:

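import json
import os


def save_questions_to_jsonl(questions, output_file_path):
    """
    Save a list of questions to a JSONL file.

    Parameters:
        questions: list of questions to save
        output_file_path: path to the output file
    """
    # Check whether the output file exists
    if os.path.exists(output_file_path):
        # The file exists: open it in append mode
        mode = 'a'
    else:
        # The file does not exist: open it in write mode
        mode = 'w'

    # Open the file with the mode selected above
    with open(output_file_path, mode, encoding='utf-8') as file:
        for question in questions:
            json_line = json.dumps({"question": question}, ensure_ascii=False)
            file.write(json_line + '\n')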
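Append mode ('a') also creates the file when it does not exist, so the existence check is not strictly required for this behavior. A shorter equivalent might look like the sketch below; the function name append_questions_to_jsonl is hypothetical, chosen to keep it distinct from the function above.

```python
import json


def append_questions_to_jsonl(questions, output_file_path):
    """Hypothetical variant: append questions, creating the file if needed."""
    # 'a' appends to an existing file and creates a missing one
    with open(output_file_path, 'a', encoding='utf-8') as file:
        for question in questions:
            json_line = json.dumps({"question": question}, ensure_ascii=False)
            file.write(json_line + '\n')
```

This also avoids the short gap between checking for the file and opening it, during which another process could create or delete it.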
