Chapter 6: Advanced Topics
Figure: overfitting vs. underfitting — the relationship between model complexity and generalization
In earlier chapters we covered the basics of neural networks, common architectures, and the standard training loop. In real deep learning projects, however, these fundamentals alone are not enough. There are a few more advanced topics that often decide whether a model succeeds in practice.
This chapter explores three of them:
- Overfitting and regularization: how to keep a model from simply memorizing the training data
- Optimizers: how to choose a suitable optimization algorithm
- Hyperparameter tuning: how to search systematically for the best hyperparameter settings
These topics may sound technical, but they are challenges every real project has to face. Mastering them will make your deep learning projects more robust and efficient.
6.1 Overfitting and Regularization: Keeping the Model from "Memorizing"
What is overfitting?
Figure: an overfitting example — the model fits the training data perfectly but performs poorly on new data
Overfitting is one of the most common problems in machine learning. Put simply, an overfit model performs very well on the training data but poorly on new data.
Imagine a student who memorizes the answers to every practice problem without actually understanding the underlying concepts: faced with a new problem, they are lost. That is overfitting in a nutshell.
Recognizing overfitting
Let's observe overfitting with a simple example:
import torch
import torch.nn as nn
import torch.optim as optim
import matplotlib.pyplot as plt
import numpy as np
from torch.utils.data import DataLoader, TensorDataset

# Generate some simple data
np.random.seed(42)
x = np.linspace(0, 10, 100)
y_true = 2 * x + 1 + np.random.normal(0, 0.5, 100)  # a true linear relationship plus noise

# Convert to PyTorch tensors
x_tensor = torch.FloatTensor(x).reshape(-1, 1)
y_tensor = torch.FloatTensor(y_true).reshape(-1, 1)

# Create the dataset
dataset = TensorDataset(x_tensor, y_tensor)
dataloader = DataLoader(dataset, batch_size=32, shuffle=True)

# Define a complex model that is likely to overfit
class OverfittingModel(nn.Module):
    def __init__(self, hidden_size=100, num_layers=5):
        super(OverfittingModel, self).__init__()
        layers = []
        input_size = 1
        for i in range(num_layers):
            layers.append(nn.Linear(input_size, hidden_size))
            layers.append(nn.ReLU())
            input_size = hidden_size
        layers.append(nn.Linear(hidden_size, 1))
        self.network = nn.Sequential(*layers)

    def forward(self, x):
        return self.network(x)

# Train the model
model = OverfittingModel()
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

# Record the training loss
train_losses = []

# Training loop
epochs = 1000
for epoch in range(epochs):
    model.train()
    total_loss = 0
    for batch_x, batch_y in dataloader:
        optimizer.zero_grad()
        outputs = model(batch_x)
        loss = criterion(outputs, batch_y)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    avg_loss = total_loss / len(dataloader)
    train_losses.append(avg_loss)
    if epoch % 100 == 0:
        print(f'Epoch {epoch}, Loss: {avg_loss:.4f}')

# Visualize the results
plt.figure(figsize=(12, 4))

# Training loss curve
plt.subplot(1, 2, 1)
plt.plot(train_losses)
plt.title('Training loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')

# Model predictions
plt.subplot(1, 2, 2)
model.eval()
with torch.no_grad():
    predictions = model(x_tensor)
plt.scatter(x, y_true, alpha=0.5, label='True data')
plt.plot(x, predictions.numpy(), 'r-', linewidth=2, label='Model prediction')
plt.title('Model fit')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()

plt.tight_layout()
plt.show()
From this example we can see that:
- the training loss keeps falling and ends up close to zero
- the predicted curve is very wiggly, trying to pass through every single data point
- such a complex fit is likely to do poorly on new data — the held-out check sketched below makes this concrete
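To make that last point concrete, the quickest check is to split the data before training and compare the loss on points the model has seen with the loss on points it has not. A minimal sketch, reusing the `dataset`, `OverfittingModel`, and `criterion` defined above (the 80/20 split and the seed are arbitrary choices for illustration):

from torch.utils.data import random_split

# Split before training: 80 points for fitting, 20 held out as "new" data
torch.manual_seed(0)
train_part, heldout_part = random_split(dataset, [80, 20])
train_loader = DataLoader(train_part, batch_size=32, shuffle=True)
heldout_loader = DataLoader(heldout_part, batch_size=32)

def avg_loss(m, loader):
    """Mean MSE of model m over a data loader."""
    m.eval()
    total = 0.0
    with torch.no_grad():
        for bx, by in loader:
            total += criterion(m(bx), by).item()
    return total / len(loader)

# Retrain the same architecture, this time on the training split only
split_model = OverfittingModel()
opt = optim.Adam(split_model.parameters(), lr=0.01)
for epoch in range(1000):
    split_model.train()
    for bx, by in train_loader:
        opt.zero_grad()
        criterion(split_model(bx), by).backward()
        opt.step()

print(f'Train MSE:    {avg_loss(split_model, train_loader):.4f}')
print(f'Held-out MSE: {avg_loss(split_model, heldout_loader):.4f}')  # usually noticeably higher

If the held-out loss is clearly higher than the training loss, the model is overfitting.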
Regularization techniques
To prevent overfitting we use regularization. The most common methods are:
1. Dropout
Figure: the Dropout mechanism — randomly "switching off" some neurons during training to prevent overfitting
Dropout is one of the simplest and most effective regularization methods. The idea is to randomly "switch off" a fraction of the neurons during training, so the network cannot rely too heavily on any particular neuron.
class RegularizedModel(nn.Module):
    def __init__(self, hidden_size=100, num_layers=5, dropout_rate=0.5):
        super(RegularizedModel, self).__init__()
        layers = []
        input_size = 1
        for i in range(num_layers):
            layers.append(nn.Linear(input_size, hidden_size))
            layers.append(nn.ReLU())
            layers.append(nn.Dropout(dropout_rate))  # add Dropout
            input_size = hidden_size
        layers.append(nn.Linear(hidden_size, 1))
        self.network = nn.Sequential(*layers)

    def forward(self, x):
        return self.network(x)

# Model with regularization
regularized_model = RegularizedModel(dropout_rate=0.3)
criterion = nn.MSELoss()
optimizer = optim.Adam(regularized_model.parameters(), lr=0.01)

# Train the regularized model
reg_train_losses = []
epochs = 1000
for epoch in range(epochs):
    regularized_model.train()
    total_loss = 0
    for batch_x, batch_y in dataloader:
        optimizer.zero_grad()
        outputs = regularized_model(batch_x)
        loss = criterion(outputs, batch_y)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    avg_loss = total_loss / len(dataloader)
    reg_train_losses.append(avg_loss)
    if epoch % 100 == 0:
        print(f'Epoch {epoch}, Loss: {avg_loss:.4f}')

# Compare the two models
plt.figure(figsize=(15, 5))

# Training loss comparison
plt.subplot(1, 3, 1)
plt.plot(train_losses, label='No regularization')
plt.plot(reg_train_losses, label='With Dropout')
plt.title('Training loss comparison')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()

# Predictions of the unregularized model
plt.subplot(1, 3, 2)
model.eval()
with torch.no_grad():
    predictions = model(x_tensor)
plt.scatter(x, y_true, alpha=0.5, label='True data')
plt.plot(x, predictions.numpy(), 'r-', linewidth=2, label='No regularization')
plt.title('Unregularized model')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()

# Predictions of the regularized model
plt.subplot(1, 3, 3)
regularized_model.eval()
with torch.no_grad():
    reg_predictions = regularized_model(x_tensor)
plt.scatter(x, y_true, alpha=0.5, label='True data')
plt.plot(x, reg_predictions.numpy(), 'g-', linewidth=2, label='With Dropout')
plt.title('Regularized model')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()

plt.tight_layout()
plt.show()
2. L1 and L2 regularization
L1 and L2 regularization prevent overfitting by adding a weight penalty to the loss function:
- L2 regularization (weight decay): adds the sum of squared weights to the loss
- L1 regularization: adds the sum of absolute weight values to the loss
# L2 regularization (weight decay)
optimizer_l2 = optim.Adam(model.parameters(), lr=0.01, weight_decay=0.01)

# L1 regularization has to be implemented manually
def l1_regularization(model, lambda_l1=0.01):
    l1_loss = 0
    for param in model.parameters():
        l1_loss += torch.sum(torch.abs(param))
    return lambda_l1 * l1_loss

# Using L1 regularization inside the training loop
for epoch in range(epochs):
    model.train()
    total_loss = 0
    for batch_x, batch_y in dataloader:
        optimizer.zero_grad()
        outputs = model(batch_x)
        loss = criterion(outputs, batch_y)
        # Add the L1 penalty
        l1_loss = l1_regularization(model, lambda_l1=0.01)
        total_loss_with_l1 = loss + l1_loss
        total_loss_with_l1.backward()
        optimizer.step()
        total_loss += loss.item()  # track only the raw loss for reporting
3. Early stopping
Early stopping is a simple but effective form of regularization: stop training as soon as performance on the validation set starts to degrade:
def train_with_early_stopping(model, train_loader, val_loader, patience=10):
    criterion = nn.MSELoss()
    optimizer = optim.Adam(model.parameters(), lr=0.01)

    best_val_loss = float('inf')
    patience_counter = 0
    train_losses = []
    val_losses = []

    for epoch in range(1000):
        # Training phase
        model.train()
        train_loss = 0
        for batch_x, batch_y in train_loader:
            optimizer.zero_grad()
            outputs = model(batch_x)
            loss = criterion(outputs, batch_y)
            loss.backward()
            optimizer.step()
            train_loss += loss.item()

        # Validation phase
        model.eval()
        val_loss = 0
        with torch.no_grad():
            for batch_x, batch_y in val_loader:
                outputs = model(batch_x)
                loss = criterion(outputs, batch_y)
                val_loss += loss.item()

        avg_train_loss = train_loss / len(train_loader)
        avg_val_loss = val_loss / len(val_loader)
        train_losses.append(avg_train_loss)
        val_losses.append(avg_val_loss)

        # Early-stopping check
        if avg_val_loss < best_val_loss:
            best_val_loss = avg_val_loss
            patience_counter = 0
            # Save the best model so far
            torch.save(model.state_dict(), 'best_model.pth')
        else:
            patience_counter += 1
            if patience_counter >= patience:
                print(f'Early stopping at epoch {epoch}')
                break

        if epoch % 50 == 0:
            print(f'Epoch {epoch}: Train Loss: {avg_train_loss:.4f}, Val Loss: {avg_val_loss:.4f}')

    return train_losses, val_losses
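A minimal call sketch, assuming the `train_loader` and `heldout_loader` built in the held-out check earlier in this section (here the held-out loader doubles as the validation set; in a real project the validation and test sets should be kept separate):

# Train a fresh Dropout model with early stopping on the toy split
es_model = RegularizedModel(dropout_rate=0.3)
es_train_losses, es_val_losses = train_with_early_stopping(
    es_model, train_loader, heldout_loader, patience=10
)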
Data augmentation
For image data, data augmentation is a very effective form of regularization:
from torchvision import transforms

# Data augmentation for images (training set)
transform_train = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),                     # random horizontal flip
    transforms.RandomRotation(10),                              # random rotation
    transforms.RandomAffine(degrees=0, translate=(0.1, 0.1)),   # random translation
    transforms.ColorJitter(brightness=0.2, contrast=0.2),       # color jitter
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

# At test time only the basic transforms are applied
transform_test = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])
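The two pipelines are passed to a dataset through its `transform` argument, so the augmentation is applied on the fly each time a training image is loaded. A minimal usage sketch, assuming torchvision is available and using FashionMNIST purely as an example dataset (any torchvision image dataset takes the same arguments):

from torchvision import datasets
from torch.utils.data import DataLoader

# The training set gets the augmenting pipeline, the test set only the basic one
fmnist_train = datasets.FashionMNIST(root='./data', train=True, download=True,
                                     transform=transform_train)
fmnist_test = datasets.FashionMNIST(root='./data', train=False, download=True,
                                    transform=transform_test)

aug_train_loader = DataLoader(fmnist_train, batch_size=64, shuffle=True)
aug_test_loader = DataLoader(fmnist_test, batch_size=64)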
6.2 Different Optimizers: Choosing the Right "Path Down the Mountain"
Figure: gradient descent — walking downhill on the loss surface in search of a minimum
In gradient descent, the optimizer determines how we walk down the "slope" of the loss function. Different optimizers walk in different ways and suit different situations.
Stochastic gradient descent (SGD)
Figure: SGD with momentum — accumulating momentum along the gradient direction reduces oscillation
SGD is the most basic optimizer; it updates the parameters directly with the gradient:
# Plain SGD (no momentum)
optimizer_sgd = optim.SGD(model.parameters(), lr=0.01)

# Variant: SGD with momentum
optimizer_sgd_momentum = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# Variant: SGD with momentum and weight decay
optimizer_sgd_full = optim.SGD(
    model.parameters(), lr=0.01, momentum=0.9, weight_decay=0.0001
)
Characteristics of SGD:
- simple and direct, easy to understand
- can converge slowly in some cases
- requires careful tuning of the learning rate
- SGD with momentum usually outperforms plain SGD (a hand-written sketch of the momentum update follows this list)
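To make "accumulating momentum" concrete, here is a hand-written version of the textbook update rule (a sketch only; PyTorch's built-in `optim.SGD` additionally supports options such as dampening and Nesterov momentum):

def sgd_momentum_step(params, velocities, lr=0.01, momentum=0.9):
    """One update: v = momentum * v + grad, then p = p - lr * v."""
    with torch.no_grad():
        for p, v in zip(params, velocities):
            if p.grad is None:
                continue
            v.mul_(momentum).add_(p.grad)  # accumulate a running direction across steps
            p.sub_(lr * v)                 # step along the accumulated direction

# Usage sketch: one zero-initialized velocity tensor per parameter
# params = list(model.parameters())
# velocities = [torch.zeros_like(p) for p in params]
# ... compute the loss, call loss.backward(), then:
# sgd_momentum_step(params, velocities)

Because the velocity keeps part of the previous directions, updates that keep pointing the same way reinforce each other while back-and-forth components cancel, which is what reduces the oscillation shown in the figure above.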
The Adam optimizer
Figure: the Adam optimizer — combining the advantages of momentum and adaptive learning rates
Adam is one of the most popular optimizers today; it combines momentum with per-parameter adaptive learning rates:
# Plain Adam
optimizer_adam = optim.Adam(model.parameters(), lr=0.001)

# Adam with weight decay
optimizer_adam_wd = optim.Adam(
    model.parameters(), lr=0.001, weight_decay=0.0001
)

# AdamW (a variant of Adam with better-behaved, decoupled weight decay)
optimizer_adamw = optim.AdamW(
    model.parameters(), lr=0.001, weight_decay=0.01
)
Characteristics of Adam:
- adaptive learning rates, usually little manual tuning needed
- fast convergence
- relatively insensitive to hyperparameters
- performs well in most situations (a simplified sketch of its update rule follows this list)
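For reference, here is a simplified, hand-written sketch of the Adam update rule (weight decay and other options omitted). It is only meant to show where the momentum term and the per-parameter adaptive step enter, not to replace `optim.Adam`:

def adam_step(params, state, t, lr=0.001, betas=(0.9, 0.999), eps=1e-8):
    """One simplified Adam update at step t (t starts at 1)."""
    beta1, beta2 = betas
    with torch.no_grad():
        for p in params:
            if p.grad is None:
                continue
            if p not in state:
                state[p] = (torch.zeros_like(p), torch.zeros_like(p))
            m, v = state[p]
            m.mul_(beta1).add_((1 - beta1) * p.grad)       # running mean of gradients (momentum)
            v.mul_(beta2).add_((1 - beta2) * p.grad ** 2)  # running mean of squared gradients
            m_hat = m / (1 - beta1 ** t)                   # bias correction for the early steps
            v_hat = v / (1 - beta2 ** t)
            p.sub_(lr * m_hat / (v_hat.sqrt() + eps))      # per-parameter adaptive step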
RMSprop
RMSprop is another adaptive-learning-rate optimizer:
optimizer_rmsprop = optim.RMSprop(
    model.parameters(),
    lr=0.001,
    alpha=0.99,   # decay rate of the moving average
    eps=1e-08     # numerical-stability constant
)
An optimizer comparison experiment
Let's compare the performance of different optimizers with an experiment:
import torch
import torch.nn as nn
import torch.optim as optim
import matplotlib.pyplot as plt
from torch.utils.data import DataLoader, TensorDataset
import numpy as np

# Generate some test data
np.random.seed(42)
x = np.random.randn(1000, 10)
y = np.sum(x * np.random.randn(10), axis=1) + np.random.normal(0, 0.1, 1000)

x_tensor = torch.FloatTensor(x)
y_tensor = torch.FloatTensor(y).reshape(-1, 1)

dataset = TensorDataset(x_tensor, y_tensor)
dataloader = DataLoader(dataset, batch_size=32, shuffle=True)

# Define a simple model
class SimpleModel(nn.Module):
    def __init__(self, hidden_size=50, dropout_rate=0.0):
        super(SimpleModel, self).__init__()
        self.fc1 = nn.Linear(10, hidden_size)
        self.dropout = nn.Dropout(dropout_rate)
        self.fc2 = nn.Linear(hidden_size, 1)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.fc2(x)
        return x

# Each entry is a factory that builds a configured optimizer for a given set of parameters
optimizers = {
    'SGD': lambda params: optim.SGD(params, lr=0.001),
    'SGD with Momentum': lambda params: optim.SGD(params, lr=0.01, momentum=0.9),
    'Adam': lambda params: optim.Adam(params, lr=0.001),
    'RMSprop': lambda params: optim.RMSprop(params, lr=0.001),
    'AdamW': lambda params: optim.AdamW(params, lr=0.001),
}

# Training function
def train_model(optimizer_factory, model, dataloader, epochs=100):
    optimizer = optimizer_factory(model.parameters())
    criterion = nn.MSELoss()
    losses = []
    for epoch in range(epochs):
        model.train()
        total_loss = 0
        for batch_x, batch_y in dataloader:
            optimizer.zero_grad()
            outputs = model(batch_x)
            loss = criterion(outputs, batch_y)
            loss.backward()
            optimizer.step()
            total_loss += loss.item()
        avg_loss = total_loss / len(dataloader)
        losses.append(avg_loss)
    return losses

# Compare the optimizers
results = {}
for name, optimizer_factory in optimizers.items():
    print(f"Training with {name}...")
    model = SimpleModel()
    losses = train_model(optimizer_factory, model, dataloader)
    results[name] = losses

# Visualize the results
plt.figure(figsize=(12, 8))
for name, losses in results.items():
    plt.plot(losses, label=name, linewidth=2)
plt.title('Convergence speed of different optimizers')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.grid(True)
plt.yscale('log')  # a log scale shows the differences more clearly
plt.show()

# Print the final losses
print("\nFinal loss values:")
for name, losses in results.items():
    print(f"{name}: {losses[-1]:.6f}")
How do you choose an optimizer?
- Adam: the default choice in most situations, especially for deep learning tasks
- SGD + momentum: when you need finer control, or on certain tasks where it performs better
- AdamW: usually preferable to Adam when using weight decay
- RMSprop: can work better in certain specific settings
6.3 Hyperparameter Tuning: Finding the Best Configuration
Figure: the hyperparameter search space — searching for the best configuration across several dimensions
Hyperparameters are settings that cannot be learned automatically during training, such as the learning rate, the number of layers, and the number of neurons per layer. Choosing good hyperparameters is critical to model performance.
Common hyperparameters
- Learning rate: one of the most important hyperparameters
- Batch size: affects training stability and memory usage
- Network architecture: number of layers, neurons per layer
- Regularization parameters: Dropout rate, weight-decay coefficient
- Optimizer parameters: momentum, β1, β2, and so on
Grid search
Grid search is the simplest but most computationally expensive approach: the grid below already contains 3 × 3 × 3 × 3 = 81 combinations, each of which has to be trained and evaluated:
import itertools

def grid_search_hyperparameters():
    # Define the hyperparameter grid
    param_grid = {
        'learning_rate': [0.001, 0.01, 0.1],
        'hidden_size': [50, 100, 200],
        'dropout_rate': [0.1, 0.3, 0.5],
        'batch_size': [16, 32, 64]
    }

    # Enumerate every combination
    param_combinations = [
        dict(zip(param_grid.keys(), v))
        for v in itertools.product(*param_grid.values())
    ]

    best_score = float('inf')
    best_params = None
    results = []

    for params in param_combinations:
        print(f"Testing parameters: {params}")

        # Build the model and data loader for this combination
        model = SimpleModel(hidden_size=params['hidden_size'],
                            dropout_rate=params['dropout_rate'])
        train_loader = DataLoader(dataset, batch_size=params['batch_size'], shuffle=True)
        optimizer = optim.Adam(model.parameters(), lr=params['learning_rate'])
        criterion = nn.MSELoss()

        # Short training run (only a few epochs, for demonstration)
        for epoch in range(10):
            model.train()
            total_loss = 0
            for batch_x, batch_y in train_loader:
                optimizer.zero_grad()
                outputs = model(batch_x)
                loss = criterion(outputs, batch_y)
                loss.backward()
                optimizer.step()
                total_loss += loss.item()

        final_loss = total_loss / len(train_loader)
        results.append((params, final_loss))

        if final_loss < best_score:
            best_score = final_loss
            best_params = params

    return best_params, best_score, results

# Run the grid search
best_params, best_score, all_results = grid_search_hyperparameters()
print(f"\nBest parameters: {best_params}")
print(f"Best score: {best_score:.6f}")
Random search
Random search is usually more efficient than grid search:
import random

def random_search_hyperparameters(n_trials=20):
    best_score = float('inf')
    best_params = None
    results = []

    for trial in range(n_trials):
        # Randomly sample a hyperparameter configuration
        params = {
            'learning_rate': random.choice([0.001, 0.01, 0.1]),
            'hidden_size': random.choice([50, 100, 200, 300]),
            'dropout_rate': random.uniform(0.1, 0.6),
            'batch_size': random.choice([16, 32, 64, 128])
        }
        print(f"Trial {trial + 1}: {params}")

        # Train and evaluate (simplified)
        model = SimpleModel(hidden_size=int(params['hidden_size']),
                            dropout_rate=params['dropout_rate'])
        train_loader = DataLoader(dataset, batch_size=int(params['batch_size']), shuffle=True)
        optimizer = optim.Adam(model.parameters(), lr=params['learning_rate'])
        criterion = nn.MSELoss()

        # Training
        for epoch in range(10):
            model.train()
            total_loss = 0
            for batch_x, batch_y in train_loader:
                optimizer.zero_grad()
                outputs = model(batch_x)
                loss = criterion(outputs, batch_y)
                loss.backward()
                optimizer.step()
                total_loss += loss.item()

        final_loss = total_loss / len(train_loader)
        results.append((params, final_loss))

        if final_loss < best_score:
            best_score = final_loss
            best_params = params

    return best_params, best_score, results
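A minimal call sketch, assuming the `dataset` and `SimpleModel` defined earlier in this section:

# Run 20 random trials and report the best configuration found
rs_best_params, rs_best_score, rs_results = random_search_hyperparameters(n_trials=20)
print(f"\nBest parameters: {rs_best_params}")
print(f"Best score: {rs_best_score:.6f}")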
Learning-rate scheduling
Learning-rate scheduling is another important tuning technique:
# Learning-rate schedulers
from torch.optim.lr_scheduler import StepLR, ExponentialLR, ReduceLROnPlateau

# StepLR: lower the learning rate every fixed number of steps
scheduler_step = StepLR(optimizer, step_size=30, gamma=0.1)

# ExponentialLR: decay the learning rate exponentially
scheduler_exp = ExponentialLR(optimizer, gamma=0.95)

# ReduceLROnPlateau: lower the learning rate when the validation loss stops improving
scheduler_plateau = ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=10)

# Inside the training loop
for epoch in range(epochs):
    # ... training code ...

    # Update the learning rate
    scheduler_step.step()  # or scheduler_exp.step()
    # For ReduceLROnPlateau, pass in the validation loss instead:
    # scheduler_plateau.step(val_loss)
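To see what a schedule actually does, it helps to record the learning rate once per epoch and plot it. A small sketch, assuming the `SimpleModel` class from earlier in this chapter (the specific numbers are arbitrary):

from torch.optim.lr_scheduler import StepLR

# Visualize a StepLR schedule: the lr is multiplied by 0.1 every 30 epochs
opt = optim.SGD(SimpleModel().parameters(), lr=0.1)
sched = StepLR(opt, step_size=30, gamma=0.1)

lrs = []
for epoch in range(100):
    lrs.append(opt.param_groups[0]['lr'])  # current learning rate
    opt.step()    # a real run would do a full training epoch here
    sched.step()  # advance the schedule once per epoch

plt.plot(lrs)
plt.xlabel('Epoch')
plt.ylabel('Learning rate')
plt.title('StepLR schedule')
plt.show()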
Best practices for hyperparameter tuning
- Start small: search a narrow range first, then widen it once you know the promising region
- Use a validation set: make sure the chosen hyperparameters also work on data the model has not seen
- Record every experiment: log the parameters and results of each run so you never repeat work (a minimal logging sketch follows this list)
- Mind the compute budget: with limited resources, tune the most important hyperparameters first
- Use cross-validation: on small datasets, cross-validation gives a more reliable evaluation
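A minimal way to follow the "record every experiment" advice, sketched with only the standard library (the file name and fields are just an example; dedicated tools such as TensorBoard or Weights & Biases do this more thoroughly):

import csv
import os
import time

def log_experiment(params, val_loss, path='experiments.csv'):
    """Append one hyperparameter trial and its result to a CSV log."""
    write_header = not os.path.exists(path)
    with open(path, 'a', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=['timestamp', 'params', 'val_loss'])
        if write_header:
            writer.writeheader()
        writer.writerow({
            'timestamp': time.strftime('%Y-%m-%d %H:%M:%S'),
            'params': str(params),
            'val_loss': f'{val_loss:.6f}',
        })

# Usage sketch inside a search loop:
# log_experiment(params, final_loss)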
6.4 In Practice: A Complete Model-Tuning Workflow
Let's put these advanced techniques together in one complete example:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset, random_split
import matplotlib.pyplot as plt
import numpy as np

# Generate more complex data
np.random.seed(42)
x = np.random.randn(2000, 20)
y = np.sum(x * np.random.randn(20), axis=1) + np.random.normal(0, 0.1, 2000)

x_tensor = torch.FloatTensor(x)
y_tensor = torch.FloatTensor(y).reshape(-1, 1)

# Split the dataset
dataset = TensorDataset(x_tensor, y_tensor)
train_size = int(0.7 * len(dataset))
val_size = int(0.15 * len(dataset))
test_size = len(dataset) - train_size - val_size

train_dataset, val_dataset, test_dataset = random_split(
    dataset, [train_size, val_size, test_size]
)

# Define the full model class
class AdvancedModel(nn.Module):
    def __init__(self, input_size=20, hidden_sizes=[100, 50], dropout_rate=0.3):
        super(AdvancedModel, self).__init__()
        layers = []
        prev_size = input_size
        for hidden_size in hidden_sizes:
            layers.append(nn.Linear(prev_size, hidden_size))
            layers.append(nn.ReLU())
            layers.append(nn.Dropout(dropout_rate))
            prev_size = hidden_size
        layers.append(nn.Linear(prev_size, 1))
        self.network = nn.Sequential(*layers)

    def forward(self, x):
        return self.network(x)

# Training function
def train_model_with_regularization(model, train_loader, val_loader, optimizer, scheduler, epochs=100):
    criterion = nn.MSELoss()
    train_losses = []
    val_losses = []
    best_val_loss = float('inf')
    patience = 15
    patience_counter = 0

    for epoch in range(epochs):
        # Training phase
        model.train()
        train_loss = 0
        for batch_x, batch_y in train_loader:
            optimizer.zero_grad()
            outputs = model(batch_x)
            loss = criterion(outputs, batch_y)
            loss.backward()
            optimizer.step()
            train_loss += loss.item()

        # Validation phase
        model.eval()
        val_loss = 0
        with torch.no_grad():
            for batch_x, batch_y in val_loader:
                outputs = model(batch_x)
                loss = criterion(outputs, batch_y)
                val_loss += loss.item()

        avg_train_loss = train_loss / len(train_loader)
        avg_val_loss = val_loss / len(val_loader)
        train_losses.append(avg_train_loss)
        val_losses.append(avg_val_loss)

        # Learning-rate scheduling
        if isinstance(scheduler, optim.lr_scheduler.ReduceLROnPlateau):
            scheduler.step(avg_val_loss)
        else:
            scheduler.step()

        # Early stopping
        if avg_val_loss < best_val_loss:
            best_val_loss = avg_val_loss
            patience_counter = 0
            torch.save(model.state_dict(), 'best_model.pth')
        else:
            patience_counter += 1
            if patience_counter >= patience:
                print(f'Early stopping at epoch {epoch}')
                break

        if epoch % 20 == 0:
            print(f'Epoch {epoch}: Train Loss: {avg_train_loss:.6f}, '
                  f'Val Loss: {avg_val_loss:.6f}, LR: {optimizer.param_groups[0]["lr"]:.6f}')

    return train_losses, val_losses

# Hyperparameter optimization
def optimize_hyperparameters():
    best_score = float('inf')
    best_params = None

    # Define the search space
    param_combinations = [
        {'lr': 0.001, 'hidden_sizes': [100, 50], 'dropout_rate': 0.3, 'batch_size': 32},
        {'lr': 0.01, 'hidden_sizes': [200, 100], 'dropout_rate': 0.2, 'batch_size': 64},
        {'lr': 0.001, 'hidden_sizes': [150, 75], 'dropout_rate': 0.4, 'batch_size': 32},
        {'lr': 0.005, 'hidden_sizes': [100, 100, 50], 'dropout_rate': 0.3, 'batch_size': 32},
    ]

    for params in param_combinations:
        print(f"\nTesting parameters: {params}")

        # Create the data loaders
        train_loader = DataLoader(train_dataset, batch_size=params['batch_size'], shuffle=True)
        val_loader = DataLoader(val_dataset, batch_size=params['batch_size'])

        # Create the model
        model = AdvancedModel(
            hidden_sizes=params['hidden_sizes'],
            dropout_rate=params['dropout_rate']
        )

        # Create the optimizer and scheduler
        optimizer = optim.Adam(model.parameters(), lr=params['lr'])
        scheduler = optim.lr_scheduler.ReduceLROnPlateau(
            optimizer, mode='min', factor=0.5, patience=10
        )

        # Train the model
        train_losses, val_losses = train_model_with_regularization(
            model, train_loader, val_loader, optimizer, scheduler, epochs=50
        )

        # Evaluate
        final_val_loss = val_losses[-1]
        if final_val_loss < best_score:
            best_score = final_val_loss
            best_params = params

    return best_params, best_score

# Run the optimization
print("Starting hyperparameter optimization...")
best_params, best_score = optimize_hyperparameters()
print(f"\nBest parameters: {best_params}")
print(f"Best validation loss: {best_score:.6f}")

# Train the final model with the best parameters
print("\nTraining the final model with the best parameters...")
train_loader = DataLoader(train_dataset, batch_size=best_params['batch_size'], shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=best_params['batch_size'])
test_loader = DataLoader(test_dataset, batch_size=best_params['batch_size'])

final_model = AdvancedModel(
    hidden_sizes=best_params['hidden_sizes'],
    dropout_rate=best_params['dropout_rate']
)

optimizer = optim.Adam(final_model.parameters(), lr=best_params['lr'])
scheduler = optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode='min', factor=0.5, patience=10
)

train_losses, val_losses = train_model_with_regularization(
    final_model, train_loader, val_loader, optimizer, scheduler, epochs=100
)

# Final evaluation: reload the best checkpoint saved by early stopping before testing
final_model.load_state_dict(torch.load('best_model.pth'))
final_model.eval()
test_loss = 0
criterion = nn.MSELoss()
with torch.no_grad():
    for batch_x, batch_y in test_loader:
        outputs = final_model(batch_x)
        loss = criterion(outputs, batch_y)
        test_loss += loss.item()

test_loss /= len(test_loader)
print(f"\nFinal test loss: {test_loss:.6f}")

# Visualize the training process
plt.figure(figsize=(12, 4))

plt.subplot(1, 2, 1)
plt.plot(train_losses, label='Training loss')
plt.plot(val_losses, label='Validation loss')
plt.title('Training and validation loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.grid(True)

plt.subplot(1, 2, 2)
plt.plot(train_losses, label='Training loss', alpha=0.7)
plt.plot(val_losses, label='Validation loss', alpha=0.7)
plt.title('Training and validation loss (log scale)')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.grid(True)
plt.yscale('log')

plt.tight_layout()
plt.show()
Summary
In this chapter we took a deeper look at three important advanced topics in deep learning:
- Overfitting and regularization:
  - understood what overfitting is and why it hurts
  - learned several regularization techniques: Dropout, L1/L2 regularization, early stopping, and data augmentation
  - learned how to recognize and prevent overfitting
- Optimizers:
  - compared the main optimizers: SGD, Adam, RMSprop, and others
  - learned the characteristics of each optimizer and when it is appropriate
  - learned how to choose a suitable optimizer
- Hyperparameter tuning:
  - learned grid search and random search
  - learned learning-rate scheduling techniques
  - learned best practices for hyperparameter tuning
These advanced topics are indispensable in real deep learning projects. Mastering them will make your models more robust and efficient, and help them perform better in real applications.