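Python Beginner Notes, Issue 20: File I/O

Lesson 29: File I/O

In programming, file I/O (input/output) lets a program exchange data with external files. Python provides a rich, easy-to-use set of file I/O operations for reading, writing, and modifying files.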

I/O direction

Disk file -> read data -> memory (program): input stream
Memory (program) -> write data -> disk file: output stream

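File type basics

Operating systems commonly divide files into two types according to how their content is organized:

Character (text) files: the content is made up of characters. Quick check: opening the file in Notepad with the right encoding shows readable text.
- Source code in any programming language, text configuration files, Notepad documents, CSV files, JSON files, Markdown files, and so on
Byte (binary) files: the content is made up of raw bytes. Quick check: opening the file in Notepad shows garbled text.
- Images, audio, video, executables, archives, PPT, Word, and Excel files, and so on

Ultimately, every file is stored on disk as bytes.

A so-called character file is simply: byte file + encoding table = character file

This gives four kinds of streams:
Character input stream, character output stream
Byte input stream, byte output stream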
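
The relation "byte file + encoding table = character file" can be sketched in a few lines. This is only an illustration; the sample string is an assumption chosen to mix Chinese and ASCII characters.

```python
text = "你好, Python"

# Encoding turns characters into bytes according to an encoding table.
raw = text.encode("utf-8")
print(raw)
print(len(text))  # number of characters
print(len(raw))   # number of bytes (each Chinese character takes 3 bytes in UTF-8)

# Decoding with the same table recovers the original characters.
restored = raw.decode("utf-8")

# Decoding with a table that cannot represent these bytes fails.
try:
    garbled = raw.decode("ascii")
except UnicodeDecodeError:
    garbled = None
print(restored, garbled)
```

The character count and the byte count differ, which is why text mode and binary mode report different sizes for the same file.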

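1. Specifying file paths

Before operating on a file in Python, you need to specify its path correctly.

(1) Absolute paths

An absolute path is the full path starting from the root of the file system, so it identifies a file's location uniquely. Its form differs between operating systems:

Windows: the backslash \ is the path separator. Because the backslash is the escape character in Python strings, write a double backslash \\, prefix the string with r to make it a raw string, or use forward slashes, which Windows also accepts:

path1 = "C:\\Users\\HENG\\Desktop\\PyDay28"
path1_raw = r"C:\Users\HENG\Desktop\PyDay28"
path2 = "C:/Users/HENG/Desktop/PyDay28"

Linux and macOS: the forward slash / is the path separator:

path3 = '/home/xixi/desktop/test.txt'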
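(2) Relative paths

A relative path is resolved against the current working directory, that is, the directory the program is run from (not necessarily the directory that contains the script). Use os.getcwd() to get it. Common forms are:

- xixi.txt or ./xixi.txt: a file in the current working directory
- datas/example.txt or ./datas/example.txt: a file in a subdirectory
- ../YDJava/README.md: a file reached through the parent directory

Example (it assumes Demo.py is run from its own directory):

import os
# Show the current working directory
print(os.getcwd())
# C:\Users\HENG\Desktop\PyDay28

# xixi.txt is in the same directory as Demo.py
f = open("xixi.txt", "r")
print(f.read())
f.close()
# The datas directory is in the same directory as Demo.py
f = open("datas\\example.txt", "r")
print(f.read())
f.close()

# The same two files, written with an explicit ./ prefix
f = open("./xixi.txt", "r")
print(f.read())
f.close()
f = open("./datas\\example.txt", "r")
print(f.read())
f.close()

# ../ goes up one level: Desktop\YDJava\README.md
f = open("../YDJava\\README.md", "r", encoding="UTF-8")
print(f.read())
f.close()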

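(3) Handling paths with the os.path module

Python's os.path module provides many utility functions for handling file paths in a cross-platform way:

import os.path
# Join path components
# On Windows this gives D:\xixi\a.txt
full_path = os.path.join("D:\\", "xixi", "a.txt")
print(full_path)

# Get the absolute path of a file
abs_path = os.path.abspath("xixi.txt")
print(abs_path)

# Get the directory that contains the file (usually pass an absolute path)
dir_name = os.path.dirname(abs_path)
print(dir_name)

# Get the file name
file_name = os.path.basename(abs_path)
print(file_name)

# Split the file name into name and extension
name, ext = os.path.splitext(file_name)
print(name, ext)

# Check whether the path exists
print(os.path.exists(abs_path))

# Check whether the path is a file or a directory
print(os.path.isfile(abs_path))
print(os.path.isdir("./datas"))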

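2. Opening and closing files

The first step of any file operation is opening the file, and the last step is closing it.

(1) Opening a file

In Python, files are opened with the open() function. Its basic syntax is:

file_object = open(file_path, mode='r', buffering=-1, encoding=None, errors=None)

Main parameters:

file_path: the path of the file, absolute or relative.
mode: how the file is opened. Common modes:
- 'r': read-only; the file must exist (the default mode).
- 'w': write; creates the file if it does not exist, and empties it if it does.
- 'a': append; creates the file if it does not exist, and otherwise writes at the end of the file.
- 'x': exclusive creation; fails if the file already exists.
- 'rb': binary read-only.
- 'wb': binary write.
- 'ab': binary append.
- 'r+': read and write; the file must exist.
- 'w+': read and write; creates the file if it does not exist, and empties it if it does.
- 'a+': read and write; creates the file if it does not exist, and otherwise writes at the end of the file.
encoding: the text encoding (for character files only). Common encodings:
- 'utf-8': a Unicode encoding that supports characters from all languages (recommended).
- 'gbk': a Chinese encoding, mainly for Simplified Chinese.
- 'ascii': ASCII, which covers only English letters, digits, and basic symbols.
- 'latin-1': an encoding for Western European languages.
  When the encoding is omitted, the default depends on the platform, so specify it explicitly for text files to avoid encoding errors.
buffering: the buffering policy, used to make reads and writes more efficient. -1 selects the default policy, 0 disables buffering (binary mode only), 1 selects line buffering (text mode only), and a value greater than 1 sets the buffer size in bytes.
errors: how encoding and decoding errors are handled, for example 'strict' (the default, raises an exception), 'ignore' (skips the offending data), or 'replace' (substitutes a replacement character).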
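
The difference between the 'w', 'a', and 'x' modes can be checked with a small experiment. The sketch below is an assumption-laden illustration: it works in a temporary directory, and the file name modes.txt is made up.

```python
import os
import tempfile

# A throwaway directory so the sketch does not touch real files.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "modes.txt")

with open(path, "w", encoding="utf-8") as f:   # creates the file
    f.write("first\n")

with open(path, "w", encoding="utf-8") as f:   # empties the existing file
    f.write("second\n")

with open(path, "a", encoding="utf-8") as f:   # writes at the end
    f.write("third\n")

with open(path, "r", encoding="utf-8") as f:
    content = f.read()

# 'x' refuses to open a file that already exists.
try:
    open(path, "x", encoding="utf-8")
    exclusive_failed = False
except FileExistsError:
    exclusive_failed = True

print(repr(content), exclusive_failed)
```

Only "second" and "third" survive, because the second 'w' open discarded "first".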

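(2) Closing a file

When you are done with a file, call its close() method to release system resources. Leaving files open can leak resources and lose buffered data. Example:

# Open in 'r' mode: a file opened with 'w' cannot be read and would be emptied
file = open("./datas/example.txt", "r", encoding="utf-8")
content = file.read()
print(content)
file.close()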

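(3) Using the with statement (context manager)

To avoid forgetting to close a file, or leaving it open when an exception occurs, use the with statement (a context manager). It closes the file automatically when the block ends, even if an exception is raised inside it:

with open("./datas/example.txt", 'r', encoding="utf-8") as file:
    content = file.read()
    print(content)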
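
That the file really is closed after an exception can be verified directly. In this sketch the temporary file and the simulated ValueError are assumptions made for the demonstration.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("hello")

handle = None
try:
    with open(path, "r", encoding="utf-8") as f:
        handle = f  # keep a reference so we can inspect it afterwards
        raise ValueError("simulated failure inside the with block")
except ValueError:
    pass

# The with statement closed the file before the exception propagated.
print(handle.closed)
```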

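(4) Working with several files at once

The with statement can also open several files together. The parenthesized form below is officially supported from Python 3.10; on older versions, write both open() calls on one line separated by a comma:

with (
    open("./datas/example.txt", 'r', encoding="utf-8") as file1,
    open("./datas/copy.txt", "w", encoding="utf-8") as file2,
):
    content = file1.read()
    file2.write(content)
    print(content)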

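3. Reading files

Reading is one of the most common file operations. Python offers several ways to read a file, from loading the whole file at once to processing it line by line.

(1) Reading the whole file

The read() method reads the entire file in one call, which suits small files:

# Character (text) input stream
with open("datas/example.txt", "r", encoding="utf-8") as file:
    content = file.read()
    print(content)

# Byte input stream on a text file
with open("datas/example.txt", "rb") as file:
    content = file.read()
    print(content)  # prints the text in its bytes form

# Byte input stream on an image
with open('datas/fig.png', 'rb') as file:
    content = file.read()
    print(content)  # prints the raw bytes of the image

Passing an argument reads a given number of characters (text mode) or bytes (binary mode):

# Character (text) input stream
with open("datas/example.txt", "r", encoding="utf-8") as file:
    # In text mode, 10 means 10 characters
    content = file.read(10)  # read 10 characters from the start
    print(content)
    content = file.read(10)  # continue from the previous position
    print(content)
    content = file.read(10)  # continue from the previous position
    print(content)
    # Note: a newline counts as a character too!

# Byte input stream
with open("datas/example.txt", "rb") as file:
    content = file.read(12)
    print(content)
    content = file.read(12)
    print(content)
    content = file.read(12)
    print(content)
    # On Windows a line break is stored as \r\n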
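
The difference between counting characters and counting bytes is easiest to see with non-ASCII text. The sketch below assumes a temporary UTF-8 file containing four Chinese characters.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("你好世界")

with open(path, "r", encoding="utf-8") as f:
    two_chars = f.read(2)   # text mode: 2 characters

with open(path, "rb") as f:
    six_bytes = f.read(6)   # binary mode: 6 bytes, i.e. two 3-byte characters

print(two_chars)
print(six_bytes)
print(six_bytes.decode("utf-8"))
```

Reading a byte count that cuts a multi-byte character in half would make the decode step fail, which is one reason to read text files in text mode.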

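(2) Reading line by line

The readline() method reads one line per call, which suits line-oriented processing:

# Character (text) input stream
with open("datas/example.txt", "r", encoding="utf-8") as file:
    line = file.readline()
    while line:  # readline() returns '' at end of file
        print(line, end="")  # line already ends with a newline
        line = file.readline()

You can also iterate over the file object with a for loop. This is more concise, holds only one line in memory at a time, and is the recommended way:

# Character (text) input stream
with open("datas/example.txt", "r", encoding="utf-8") as file:
    for line in file:
        print(line, end="")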
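
When the lines are needed as data rather than printed, it is common to strip the trailing newline and number the lines while iterating. A minimal sketch, assuming a temporary three-line file:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "lines.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("alpha\nbeta\ngamma\n")

lines = []
with open(path, "r", encoding="utf-8") as f:
    # enumerate() supplies the line number; the file is still read lazily.
    for number, line in enumerate(f, start=1):
        lines.append(line.rstrip("\n"))
        print(number, line, end="")
```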

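(3) Reading all lines into a list

The readlines() method returns a list with one element per line, which suits cases that need random access to lines. It loads the whole file into memory:

# Character (text) input stream
with open("datas/example.txt", "r", encoding="utf-8") as file:
    lines = file.readlines()
    print(len(lines))
    for i in range(len(lines)):
        print(lines[i], end="")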

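(4) Tips for large files

When processing a large file, avoid reading it into memory all at once; read it line by line or in chunks instead:

# Generator that yields the file one chunk at a time
def read_in_chunks(file_path, chunk_size):
    with open(file_path, 'r', encoding='utf-8') as file:
        while True:
            block = file.read(chunk_size)
            if not block:  # end of file
                break
            yield block  # hand back the current chunk

for block in read_in_chunks("./datas/pride-and-prejudice.txt", 100):
    print("=" * 20)
    print(block)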

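4. Writing files

Writing to a file is how a program persists its data. Python offers several ways to write, including overwriting, appending, and writing line by line.

(1) Writing to a file

The write() method writes content to a file:

# Overwrite
with open('./datas/copy.txt', 'w', encoding='utf-8') as file:
    file.write("Hello")

(2) Appending to a file

Open the file in 'a' mode, then call write() to add content at the end of the file:

# Append
with open('./datas/copy.txt', 'a', encoding='utf-8') as file:
    file.write("\nHello")

(3) Writing several lines

The writelines() method writes a sequence of strings in one call. Note that it does not add newline characters for you:

# Append
with open('./datas/copy.txt', 'a', encoding='utf-8') as file:
    lines = ['Hello!\n', "World\n", "Nice\n"]
    file.writelines(lines)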
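
The missing newlines are easy to overlook. The following sketch, which assumes a temporary file and a made-up list of strings, shows the output with and without explicit line endings.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "lines.txt")
items = ["Hello!", "World", "Nice"]

# writelines() simply concatenates the strings.
with open(path, "w", encoding="utf-8") as f:
    f.writelines(items)
with open(path, "r", encoding="utf-8") as f:
    joined = f.read()

# Add the newline yourself, here with a generator expression.
with open(path, "w", encoding="utf-8") as f:
    f.writelines(item + "\n" for item in items)
with open(path, "r", encoding="utf-8") as f:
    separated = f.read()

print(repr(joined))
print(repr(separated))
```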

(4) Formatted writing

Combined with string formatting, writing becomes more flexible. This example appends a multiplication table:

# Append
with open('./datas/copy.txt', 'a', encoding='utf-8') as file:
    file.write('\n')
    for i in range(1, 10):
        for j in range(1, i + 1):
            file.write(f'{i} × {j} = {i * j} \t')
        file.write("\n")

(5) A practical use of file writing

A simple logger:

import datetime

def log_message(message):
    timestamp = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    # Use '\n' only: text mode converts it to the platform's line ending
    log_item = f'[{timestamp}] {message}\n'
    with open("./datas/log.txt", 'a', encoding="utf-8") as file:
        file.write(log_item)

log_message("Genshin Impact, launch!")
log_message("Pulled 100 times")
log_message("Got nothing!")
log_message("Uninstalling. Bye!")

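5. File pointer operations

The file pointer is an important concept: it marks the current read/write position in the file. By moving it, we can access any position in the file directly instead of reading sequentially from start to end.

(1) The seek() method

seek() moves the file pointer to a given position. Its syntax is:

file.seek(offset, whence=0)

Parameters:

offset: the offset, in bytes.
whence: optional; the reference point for the offset:
- 0: from the start of the file (the default).
- 1: from the current position. (Binary mode only for non-zero offsets.)
- 2: from the end of the file. (Binary mode only for non-zero offsets.)

In text mode, only offsets from the start of the file are allowed, and the only offsets guaranteed to be valid are 0 and values returned by tell(). For true random access, open the file in binary mode.

Example:

with open('./datas/example.txt', 'r', encoding='utf-8') as file:
    # These are byte offsets; they match character positions only while
    # the text before them is pure ASCII
    file.seek(12)
    content = file.read(12)
    print(content)
    file.seek(30)
    content = file.read(5)
    print(content)

(2) The tell() method

tell() returns the current position of the file pointer. In binary mode this is the number of bytes from the start of the file; in text mode it is an opaque number that is only meaningful when passed back to seek(). Example:

with open('./datas/example.txt', 'r', encoding='utf-8') as file:
    print(file.tell())
    content = file.read(5)
    print(content)
    print(file.tell())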
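
The three whence values are easiest to follow in binary mode, where every offset is a plain byte count. A minimal sketch, assuming a temporary file that holds the ten bytes 0123456789:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "digits.bin")
with open(path, "wb") as f:
    f.write(b"0123456789")

with open(path, "rb") as f:
    f.seek(3)        # whence=0: byte 3 counted from the start
    a = f.read(2)    # reads b"34"; the pointer is now at 5
    f.seek(2, 1)     # whence=1: skip 2 bytes forward, to 7
    b = f.read(1)    # reads b"7"
    f.seek(-2, 2)    # whence=2: 2 bytes before the end, at 8
    c = f.read()     # reads b"89"
    end = f.tell()   # 10, the file size in bytes

print(a, b, c, end)
```

The constants os.SEEK_SET, os.SEEK_CUR, and os.SEEK_END can be used in place of 0, 1, and 2 for readability.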

6. Binary file operations

Besides text files, Python can handle binary files such as images, audio, and video. Binary files must be opened in a binary mode ('rb', 'wb', 'ab', and so on). In binary mode Python performs no conversion on the data and works with the raw bytes.

(1) Reading a binary file

Reading a binary file returns a bytes object, not a string:

with open('./datas/fig.png', 'rb') as file:
    data = file.read()
    print(type(data))
    print(len(data))
    print(data[:10])

(2) Writing a binary file

Writing a binary file requires a bytes object, not a string:

with open('./datas/out.png', 'wb') as file:
    with open('./datas/fig.png', 'rb') as f:
        file.write(f.read())

(3) Processing large binary files in chunks

When processing a large binary file, avoid reading it into memory all at once:

def copy(source_file, target_file):
    chunk = 1024 * 1024 * 10  # 10 MB per chunk
    total = 0
    with open(source_file, 'rb') as s:
        with open(target_file, 'wb') as t:
            while True:
                block = s.read(chunk)
                if not block:  # end of file
                    break
                total += len(block)
                t.write(block)
                print("Bytes written:", total)

source_file = "F:\\系統鏡像\\CentOS-7-x86_64-DVD-1810.iso"
target_file = "F:\\系統鏡像\\CentOS-7-x86_64-DVD-1810-copy.iso"
copy(source_file, target_file)

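7. Exception handling for file operations

File operations can fail in many ways: the file does not exist, permissions are insufficient, the disk is full, and so on. To keep a program robust, these exceptions need to be handled properly.

(1) Common file-related exceptions

FileNotFoundError: raised when opening a file that does not exist
PermissionError: raised when there is no permission to access the file
IsADirectoryError: raised when a file operation is attempted on a directory
FileExistsError: raised when creating a file that already exists (with 'x' mode)
UnicodeDecodeError: raised when the file's bytes cannot be decoded with the specified encoding
IOError: raised when an input/output operation fails (for example, the disk is full); in Python 3 it is an alias of OSError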
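
How these exceptions relate to each other determines the order of except clauses. The short check below relies only on built-in exception classes; the missing file name is an assumption for the demonstration.

```python
import os
import tempfile

# In Python 3, IOError is just another name for OSError.
print(IOError is OSError)

# The file-specific errors are subclasses of OSError...
print(issubclass(FileNotFoundError, OSError))
print(issubclass(PermissionError, OSError))

# ...but UnicodeDecodeError is not: it derives from ValueError.
print(issubclass(UnicodeDecodeError, OSError))

missing = os.path.join(tempfile.mkdtemp(), "no_such_file.txt")
try:
    open(missing, "r", encoding="utf-8")
except OSError as e:
    # A broad "except OSError" also catches the more specific subclasses.
    caught = type(e).__name__
print(caught)
```

Because a parent class catches its subclasses, specific exceptions should be listed before OSError (or IOError) in a chain of except clauses.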

(2) Handling exceptions with try-except

Use a try-except statement to catch and handle exceptions:

try:
    with open("./datas/example.txt", 'r', encoding='utf-8') as file:
        content = file.read()
        # An exception here skips the rest of the try block
        print(1/0)
        print(content)
except FileNotFoundError:
    print("File not found!")
except PermissionError:
    print("Permission denied!")
except UnicodeDecodeError:
    print("Encoding error!")
# except ZeroDivisionError:
#     print("Cannot divide by zero")
except IOError as e:
    print(f"Another I/O error occurred: {e}")
except Exception as e:
    print(f"Catch-all for any other exception: {e}")
print("Code that runs afterwards")

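(3) Using finally to release resources

A finally clause runs whether or not an exception occurred, so it is typically used for cleanup.

Define the file variable before the try block so that the finally clause can always refer to it:

file = None
try:
    file = open("./datas/example.txt", 'r', encoding='utf-8')
    content = file.read()
    # An exception here skips the rest of the try block
    print(1/0)
    print(content)
except FileNotFoundError:
    print("File not found!")
except PermissionError:
    print("Permission denied!")
except UnicodeDecodeError:
    print("Encoding error!")
# except ZeroDivisionError:
#     print("Cannot divide by zero")
except IOError as e:
    print(f"Another I/O error occurred: {e}")
except Exception as e:
    print(f"Catch-all for any other exception: {e}")
finally:
    # Release resources (files, databases, network connections)
    # Close the file only if it was opened; if open() itself failed,
    # file is still None and there is nothing to close
    print(type(file))
    if file is not None:
        file.close()
print("Code that runs afterwards")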

8. Worked examples

(1) Counting word occurrences in a file

import re

def count_words(file_path):
    file = None
    # Dictionary of word counts: key = word, value = number of occurrences
    words = dict()
    try:
        file = open(file_path, 'r', encoding='utf-8')
        # Read the file one line at a time
        for line in file:
            # Extract all words (runs of English letters) with a regular expression
            words_in_line = re.findall(r'\b[a-zA-Z]+\b', line)
            # Lowercase every extracted word
            words_in_line = [word.lower() for word in words_in_line]
            for word in words_in_line:
                if word in words:
                    # Already a key: increment the count
                    words[word] += 1
                else:
                    # New key: start the count at 1
                    words[word] = 1
    except Exception as e:
        print(e)
    finally:
        if file is not None:
            file.close()
    return words

if __name__ == "__main__":
    file_path = "./Demo.py"
    # Count the words; the result is a dictionary
    words = count_words(file_path)
    # Print the dictionary contents
    for key in words.keys():
        print(f'Word {key} appears {words[key]} times')

(2) Copying a file

# Exercise: write a file-copy function that takes two file paths and uses try-except-finally
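
One possible solution to this exercise is sketched below. It is not the author's reference answer: the function name safe_copy, the chunk size, and the temporary test files are all assumptions made for illustration.

```python
import os
import tempfile

def safe_copy(source_path, target_path, chunk_size=64 * 1024):
    """Copy source_path to target_path in chunks; return True on success."""
    src = None
    dst = None
    try:
        src = open(source_path, "rb")
        dst = open(target_path, "wb")
        while True:
            block = src.read(chunk_size)
            if not block:  # end of file
                break
            dst.write(block)
        return True
    except FileNotFoundError:
        print(f"Source not found: {source_path}")
        return False
    except OSError as e:
        print(f"Copy failed: {e}")
        return False
    finally:
        # Runs on success and on failure; close whatever was opened.
        for handle in (dst, src):
            if handle is not None:
                handle.close()

workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "a.bin")
dst_path = os.path.join(workdir, "b.bin")
with open(src_path, "wb") as f:
    f.write(b"payload" * 1000)

ok = safe_copy(src_path, dst_path)
missing_ok = safe_copy(os.path.join(workdir, "missing.bin"), dst_path)
print(ok, missing_ok)
```

Opening the source before the target means a missing source fails before the target is created or emptied.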

(3) Copying a file with a progress bar

# pip install tqdm
from tqdm import tqdm
import time
import os

# Simulate a task that takes a while to finish
# total_items is the total number of work items
# def long_running_task(total_items):
#     for i in tqdm(range(total_items), desc="Progress"):
#         time.sleep(0.1)
# long_running_task(50)

# source_path: path of the source file
# destination_path: path of the destination file
def copy_file(source_path, destination_path):
    # Chunk size
    chunk_size = 1024 * 1024  # 1 MB
    # Size of the source file
    file_size = os.path.getsize(source_path)
    # Number of bytes copied so far
    copied_size = 0
    with open(source_path, 'rb') as source_file, open(destination_path, 'wb') as destination_file:
        # total: the value at which the bar is full
        # unit: the unit of the progress data
        # unit_scale: scale the unit automatically (B -> kB -> MB)
        with tqdm(total=file_size, unit="B", unit_scale=True, desc="Progress") as progress_bar:
            while True:
                chunk = source_file.read(chunk_size)
                if not chunk:
                    break
                destination_file.write(chunk)
                copied_size += len(chunk)
                progress_bar.update(len(chunk))
    print("Copy complete")

if __name__ == "__main__":
    s = "F:\\系統鏡像\\CentOS-7-x86_64-DVD-1810.iso"
    d = "F:\\系統鏡像\\CentOS-7-x86_64-DVD-1810-copy.iso"
    copy_file(s, d)

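(4) A file backup tool

"""
Workspace
    Directory A
        Directory A
        Directory B
        Directory C
        File A
        File B
        File C
    Directory B
    Directory C
    File A
    File B
    File C
"""
# source_dir: the source directory
# backup_dir: the backup directory
# file_extensions: the extensions to include, e.g. ['.java', '.py', '.c']
import os
# A module for high-level file operations
import shutil

def backup_files(source_dir, backup_dir, file_extensions):
    # Make sure the backup directory exists
    os.makedirs(backup_dir, exist_ok=True)
    # Statistics: files visited, files backed up, files skipped
    stats = {'total': 0, 'back_up': 0, 'skipped': 0}
    for root, dirs, files in os.walk(source_dir):
        # Skip the backup directory itself (if it lies inside the source directory)
        if os.path.abspath(root) == os.path.abspath(backup_dir):
            dirs[:] = []  # do not descend into its subdirectories either
            continue
        # Iterate over the files directly inside root (file names only)
        for file in files:
            stats['total'] += 1
            # Build the full path of the file (root is its parent directory)
            src_file = os.path.join(root, file)
            # Check the extension, case-insensitively (.mp3 and .MP3 match)
            name, ext = os.path.splitext(file)
            if ext.lower() not in file_extensions:
                # Count the skipped file
                stats['skipped'] += 1
                continue
            # The file passes the filter: compute its directory relative to source_dir
            rel_path = os.path.relpath(root, source_dir)
            # Destination directory, preserving the original directory structure
            dst_dir = os.path.join(backup_dir, rel_path)
            os.makedirs(dst_dir, exist_ok=True)
            # Copy the file
            dst_file = os.path.join(dst_dir, file)
            shutil.copy2(src_file, dst_file)
            stats['back_up'] += 1
            print(f'Backed up: {src_file} -> {dst_file}')
    print(f"Backup complete! Total {stats['total']}, backed up {stats['back_up']}, skipped {stats['skipped']}")

if __name__ == '__main__':
    backup_files("E:\\工作空間", "./備份", ['.java', '.py', '.c'])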

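9. Serialization

Serialization converts a program's data structures into a format that can be stored or transmitted; deserialization converts that format back into the original data structures. In Python, serialization is commonly used for data persistence, network transmission, and inter-process communication.

In these notes, data in a program that is neither characters nor bytes (lists, dictionaries, objects, and so on) is called abstract data.

Python provides modules that handle I/O between abstract data and files.

Serialization: break a complete piece of data into tagged parts and save them
Deserialization: reassemble the tagged parts into the original data

Built-in modules:

pickle (must know): serializes abstract data to a byte file; Python only
json (must know): serializes abstract data to a character file; works across languages
marshal (good to know): serializes abstract data to a byte file; mainly used internally by Python
shelve (good to know): stores abstract data behind a dictionary-like interface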

(1) pickle

The pickle module is Python's own serialization module. It can serialize almost any Python object to a byte stream, which makes it suitable for exchanging data between Python programs and for persistent storage. Only load pickle data from sources you trust, because unpickling can execute arbitrary code.

Serialize dictionary data to a file:

import pickle

user_dict = {
    "admin": {"username": "admin", "password": 123, "realname": "張三"},
    "manager": {"username": "manager", "password": 123, "realname": "李四"}
}
arr = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# In general, bundle the data into one object before serializing it
with open("pickle.pickle", mode="wb") as file:
    pickle.dump(user_dict, file)
    pickle.dump(arr, file)

Deserialize the data from the file back into the program. Each load() call returns one dumped object, and EOFError signals that none are left:

import pickle

with open("pickle.pickle", mode='rb') as file:
    while True:
        try:
            data = pickle.load(file)
            print(data, type(data))
        except EOFError:
            break

The recommended approach is:

import pickle

user_dict = {
    "admin": {"username": "admin", "password": 123, "realname": "張三"},
    "manager": {"username": "manager", "password": 123, "realname": "李四"}
}
arr = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
tup = (1, 2, 3, 4, 5, 6)
num_set = {1, 2, 3, 4}  # renamed so it does not shadow the built-in set
# Bundle everything that needs serializing into one object
data = {
    "user_dict": user_dict,
    "arr": arr,
    "tup": tup,
    "set": num_set
}

with open("pickle_upper.pickle", mode="wb") as file:
    pickle.dump(data, file)

with open('pickle_upper.pickle', mode='rb') as file:
    data = pickle.load(file)
    print(data, type(data))
    print(data['user_dict'])
    print(data['arr'])
    print(data['tup'])
    print(data['set'])

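(2) json

The json module in the standard library encodes and decodes JSON, a lightweight data-interchange format that can carry data between programming languages. JSON supports objects (dictionaries), arrays, numbers, booleans, strings (which must use "" double quotes), and null. Python tuples are written as arrays and come back as lists.

Serialize abstract data to a character file:

import json

user_dict = {
    "admin": {"username": "admin", "password": 123, "realname": "張三"},
    "manager": {"username": "manager", "password": 123, "realname": "李四"}
}
arr = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
tup = (1, 2, 3, 4, 5, 6)
data = {
    "user_dict": user_dict,
    "arr": arr,
    "tup": tup,
}
# The data must be JSON serializable
# ensure_ascii=False writes non-ASCII characters as-is instead of \u escapes
with open("json.txt", mode="w", encoding="utf-8") as file:
    json.dump(data, file, indent=4, ensure_ascii=False)

Deserialize the data from the character file back into the program:

import json

with open('json.txt', mode='r', encoding="utf-8") as file:
    data = json.load(file)
    print(data, type(data))
    print(data['arr'])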
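
A JSON round trip does not always return exactly the types that went in. The sketch below uses json.dumps and json.loads on an in-memory string; the sample dictionary is an assumption chosen to show the type mapping.

```python
import json

data = {"name": "張三", "scores": (90, 85), "active": True, "note": None}

# ensure_ascii=False keeps non-ASCII characters readable in the output.
text = json.dumps(data, ensure_ascii=False)
restored = json.loads(text)

print(text)
print(restored["scores"], type(restored["scores"]))
```

The tuple comes back as a list, True is written as true, and None is written as null, so code that depends on tuples or sets has to convert them again after loading.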

(3) marshal

Serialize abstract data to a byte file. The marshal format is not guaranteed to stay the same between Python versions, so prefer pickle or json for long-term storage:

import marshal

user_dict = {
    "admin": {"username": "admin", "password": 123, "realname": "張三"},
    "manager": {"username": "manager", "password": 123, "realname": "李四"}
}
arr = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
tup = (1, 2, 3, 4, 5, 6)
data = {
    "user_dict": user_dict,
    "arr": arr,
    "tup": tup,
}
with open('marshal.mar', mode='wb') as file:
    marshal.dump(data, file)

Deserialize the data from the byte file back into the program:

import marshal

with open('marshal.mar', mode='rb') as file:
    data = marshal.load(file)
    print(data, type(data))

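(4) shelve

shelve is the highest-level of these modules: it stores abstract data in a file behind a dictionary-like interface.

When abstract data needs to be saved to a file from Python, this module is the first choice in these notes, because its usage is the simplest and clearest of the serialization modules.

Serialize abstract data to a file:

import shelve

user_dict = {
    "admin": {"username": "admin", "password": 123, "realname": "張三"},
    "manager": {"username": "manager", "password": 123, "realname": "李四"}
}
user_set = {"admin": "張三", "manger": "李四"}
user_arr = [1, 2, 3, 4, 5, 6, 7, 8, 9]
# Depending on the underlying dbm backend, several files may be created:
# .bak  backup of the data
# .dir  index of the data
# .dat  the data itself, in bytes form
with shelve.open("my_shevel") as db:
    db['user_dict'] = user_dict
    db['user_set'] = user_set
    db['user_arr'] = user_arr

Deserialize the data from the file back into the program (this example overwrites user_arr first):

import shelve

with shelve.open("my_shevel") as db:
    db['user_arr'] = [66, 666, 6666]
    print(db['user_dict'])
    print(db['user_set'])
    print(db['user_arr'])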
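
One caveat with the dictionary-like interface is that changing a stored object in place is not saved unless the key is assigned again (or the shelf is opened with writeback=True). A minimal sketch, assuming a temporary shelf named demo_shelf:

```python
import os
import shelve
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), "demo_shelf")

with shelve.open(db_path) as db:
    db["nums"] = [1, 2, 3]

with shelve.open(db_path) as db:
    db["nums"].append(4)     # changes a temporary copy only
    unchanged = db["nums"]   # still the stored value

with shelve.open(db_path) as db:
    nums = db["nums"]
    nums.append(4)
    db["nums"] = nums        # assigning the key writes the change

with shelve.open(db_path) as db:
    persisted = db["nums"]

print(unchanged, persisted)
```

This happens because every read of db[key] unpickles a fresh copy of the stored object.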