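Fully Connected Neural Networks (MLP): Principles and a Detailed PyTorch Implementation

1. Overview of Fully Connected Neural Networks

A fully connected neural network (Fully Connected Neural Network), also known as a multilayer perceptron (Multi-Layer Perceptron, MLP), is one of the most basic network architectures in deep learning. It consists of several fully connected layers, in which every neuron in one layer is connected to every neuron in the next layer.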

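1.1 Basic Structure of a Neural Network

A typical fully connected neural network consists of the following components:

Input layer: receives the raw data
Hidden layers: extract and transform features (there may be several)
Output layer: produces the final prediction
Weights: the parameters on the connections between neurons
Bias: an additional parameter attached to each neuron
Activation function: introduces nonlinearity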

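1.2 Why Are Activation Functions Needed?

Without activation functions, the output of a neural network is a linear function of its input no matter how many layers it has, which makes it equivalent to a single-layer network. The reason is that a composition of linear transformations can always be reduced to one equivalent linear transformation.

Concretely:

Consider a two-layer network whose first layer has weight matrix W1 and whose second layer has weight matrix W2 (biases omitted for brevity)
Without an activation function, the output is y = W2(W1x) = (W2W1)x
This is equivalent to a single-layer network y = W'x, where W' = W2W1

This purely linear stacking severely limits the expressive power of the network:

It cannot learn nonlinear relationships (such as the XOR problem)
It cannot perform complex feature transformations
It cannot approximate arbitrary functions (the universal approximation theorem relies on a nonlinear activation)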
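
The collapse described above can be checked numerically. The sketch below uses small hand-picked matrices (all values are illustrative assumptions, and the biases b1 and b2 are added here although the derivation above omits them) to show that two stacked linear layers equal one linear layer with W' = W2W1 and b' = W2b1 + b2, and that inserting a ReLU breaks this equivalence.

```python
import numpy as np

# Hand-picked illustrative parameters (assumptions, not taken from the article).
W1 = np.array([[1.0, -1.0], [-1.0, 1.0]])
b1 = np.array([0.5, -0.5])
W2 = np.array([[1.0, 1.0]])
b2 = np.array([0.25])
x = np.array([2.0, 1.0])

# Two linear layers with no activation in between.
two_layer = W2 @ (W1 @ x + b1) + b2

# The equivalent single layer: W' = W2 W1, b' = W2 b1 + b2.
W_prime = W2 @ W1
b_prime = W2 @ b1 + b2
one_layer = W_prime @ x + b_prime

collapsed = np.allclose(two_layer, one_layer)

# The same two layers with a ReLU between them.
with_relu = W2 @ np.maximum(W1 @ x + b1, 0.0) + b2
relu_differs = not np.allclose(with_relu, one_layer)

print(collapsed, relu_differs, with_relu)
```

With these particular weights W' is the zero matrix, so the linear network outputs the same constant for every input, while the ReLU version still responds to the difference between the two input components.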

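Commonly used activation functions include:

ReLU (Rectified Linear Unit)
- Formula: f(x) = max(0, x)
- Characteristics:
  - Cheap to compute: only a threshold comparison is needed
  - Effectively mitigates the vanishing gradient problem in deep networks
  - Subject to the "dying ReLU" phenomenon (a neuron may end up never being activated)
- Typical use: hidden layers of deep networks such as CNNs and DNNs
- Example: classic networks such as AlexNet and ResNet use ReLU
Sigmoid
- Formula: f(x) = 1 / (1 + e^(-x))
- Characteristics:
  - Output range (0, 1), which can be interpreted as a probability
  - Suffers from vanishing gradients when the absolute value of the input is large
  - Involves an exponential, so it is relatively expensive to compute
- Typical use:
  - Output layer for binary classification
  - Hidden layers of traditional neural networks (now rarely used)
- Example: the activation function used in logistic regression
Tanh (hyperbolic tangent)
- Formula: f(x) = (e^x - e^(-x)) / (e^x + e^(-x))
- Characteristics:
  - Output range (-1, 1), centered at 0
  - Steeper gradient than sigmoid (maximum derivative 1 versus 0.25)
  - Also suffers from vanishing gradients
- Typical use: hidden layers of sequence models such as RNNs and LSTMs
- Example: used in LSTMs when updating the memory cell state
Softmax
- Formula: f(x_i) = e^(x_i) / Σ_j e^(x_j)
- Characteristics:
  - Turns the outputs into a probability distribution (summing to 1)
  - Amplifies the probability of the largest value and suppresses smaller ones
  - Usually paired with the cross-entropy loss
- Typical use:
  - Output layer for multi-class classification
  - Computing attention weights in attention mechanisms
- Example: the last layer of image classification networks (such as VGG and Inception)
Other common activation functions
- Leaky ReLU: f(x) = max(αx, x) (α is usually 0.01)
- ELU: f(x) = x if x > 0, α(e^x - 1) if x ≤ 0
- Swish: f(x) = x * sigmoid(βx) (a self-gated activation function proposed by Google)

Note: in practice, ReLU and its variants (such as Leaky ReLU) are currently the most widely used hidden-layer activations, while the output layer uses Sigmoid (binary classification) or Softmax (multi-class classification) depending on the task. Choosing an activation function means weighing computational cost, gradient behavior, and network depth.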
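
As a quick illustration of the formulas above, the following sketch evaluates several of these activations on a small tensor using PyTorch's built-in functions. The input values are arbitrary and chosen only for illustration.

```python
import torch
import torch.nn.functional as F

# Arbitrary sample inputs: one negative, one zero, one positive.
x = torch.tensor([-2.0, 0.0, 3.0])

relu_out = F.relu(x)                              # max(0, x)
leaky_out = F.leaky_relu(x, negative_slope=0.01)  # max(αx, x) with α = 0.01
sigmoid_out = torch.sigmoid(x)                    # values in (0, 1)
tanh_out = torch.tanh(x)                          # values in (-1, 1), centered at 0
softmax_out = F.softmax(x, dim=0)                 # non-negative, sums to 1

for name, out in [("relu", relu_out), ("leaky_relu", leaky_out),
                  ("sigmoid", sigmoid_out), ("tanh", tanh_out),
                  ("softmax", softmax_out)]:
    print(f"{name:>10}: {out}")
```

Note how ReLU zeroes the negative input while Leaky ReLU keeps a small negative slope, and how Softmax assigns most of the probability mass to the largest input.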

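2. Building a Fully Connected Neural Network with PyTorch

PyTorch is an open-source machine learning library for Python, based on Torch, and is widely used in areas such as computer vision and natural language processing. The following sections walk through building a fully connected neural network with it.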

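2.1 Environment Setup

First, install PyTorch:

# Install the CPU version of PyTorch with pip
# pip install torch torchvision

# With an NVIDIA GPU, install a CUDA build instead
# (cu113 is the CUDA 11.3 build; pick the tag that matches your CUDA version)
# pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113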

2.2 Importing the Required Libraries

import torch
from torch import nn  # neural network modules
from torch import optim  # optimizers
from torch.nn import functional as F  # commonly used functions
import numpy as np
from sklearn.datasets import make_classification  # generate classification data
from sklearn.model_selection import train_test_split  # split the dataset
from sklearn.preprocessing import StandardScaler  # standardize features

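2.3 Data Preparation

We use scikit-learn to generate a synthetic binary classification dataset:

# Set random seeds for reproducibility
torch.manual_seed(42)
np.random.seed(42)

# Generate synthetic data
# n_samples: number of samples
# n_features: number of features
# n_classes: number of classes
# n_informative: number of informative features
X, y = make_classification(n_samples=1000, n_features=10, n_classes=2, n_informative=8, random_state=42)

# Split into training and test sets (still NumPy arrays at this point)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize: fit the scaler on the training set only and apply it to both sets,
# so that no test-set statistics leak into training
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Convert to PyTorch tensors
X_train = torch.from_numpy(X_train).float()  # float32
X_test = torch.from_numpy(X_test).float()
y_train = torch.from_numpy(y_train).float().view(-1, 1)  # float32, shape (n_samples, 1)
y_test = torch.from_numpy(y_test).float().view(-1, 1)

# Wrap the tensors in a PyTorch Dataset and DataLoader
from torch.utils.data import TensorDataset, DataLoader

train_dataset = TensorDataset(X_train, y_train)
test_dataset = TensorDataset(X_test, y_test)

# DataLoader parameters:
# dataset: the dataset to load from
# batch_size: number of samples per batch
# shuffle: whether to shuffle the data
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)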
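
To see what a DataLoader actually yields, the sketch below iterates over a small stand-in dataset. The tensors `X_demo` and `y_demo` and their sizes are hypothetical and only mimic the shapes used above (10 features, labels of shape (n, 1)).

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Hypothetical toy data: 100 samples, 10 features, binary labels of shape (100, 1).
X_demo = torch.randn(100, 10)
y_demo = torch.randint(0, 2, (100, 1)).float()

demo_loader = DataLoader(TensorDataset(X_demo, y_demo), batch_size=32, shuffle=True)

# Each iteration yields one (inputs, labels) pair of tensors.
batch_shapes = [(xb.shape, yb.shape) for xb, yb in demo_loader]

num_batches = len(demo_loader)            # ceil(100 / 32) = 4
last_batch_size = batch_shapes[-1][0][0]  # 100 - 3 * 32 = 4

print(num_batches, batch_shapes[0], last_batch_size)
```

The last batch is smaller whenever the dataset size is not a multiple of `batch_size`. This matters when a loss is averaged over `len(loader)`, because the result is then a mean of per-batch means rather than an exact per-sample mean.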

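2.4 Defining the Model Architecture

We define our fully connected neural network by subclassing PyTorch's nn.Module:

class MLP(nn.Module):
    def __init__(self, input_size):
        """
        Initialize the MLP model.

        Args:
            input_size: dimensionality of the input features
        """
        super(MLP, self).__init__()
        # First fully connected layer
        # nn.Linear parameters:
        #   in_features: number of input features
        #   out_features: number of output features (number of neurons)
        #   bias: whether to use a bias term (default: True)
        self.fc1 = nn.Linear(input_size, 64)
        # Second fully connected layer
        self.fc2 = nn.Linear(64, 32)
        # Output layer
        self.fc3 = nn.Linear(32, 1)
        # Dropout layer to reduce overfitting
        # p: probability of dropping an element
        self.dropout = nn.Dropout(p=0.2)

    def forward(self, x):
        """
        Forward pass.

        Args:
            x: input data
        Returns:
            the model output
        """
        # First layer: linear transform + ReLU activation + Dropout
        x = F.relu(self.fc1(x))
        x = self.dropout(x)
        # Second layer: linear transform + ReLU activation + Dropout
        x = F.relu(self.fc2(x))
        x = self.dropout(x)
        # Output layer: linear transform + Sigmoid activation (binary classification)
        x = torch.sigmoid(self.fc3(x))
        return x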

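2.5 Instantiating the Model and Inspecting Its Parameters

# Instantiate the model
input_size = X_train.shape[1]  # input feature dimension
model = MLP(input_size)

# Print the model structure
print(model)

# Inspect the model parameters
for name, param in model.named_parameters():
    print(f"Layer: {name} | Size: {param.size()} | Values : {param[:2]} \n")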
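
Beyond printing each parameter, it is often useful to count them. The sketch below rebuilds the same layer sizes as the MLP above with `nn.Sequential` (an assumption made only to keep the snippet self-contained, with 10 input features as in the data preparation step) and sums the number of trainable values.

```python
from torch import nn

# Same layer sizes as the MLP above, assuming 10 input features.
layers = nn.Sequential(
    nn.Linear(10, 64), nn.ReLU(),
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, 1),
)

# Each nn.Linear holds in_features * out_features weights plus out_features biases.
per_layer = {name: p.numel() for name, p in layers.named_parameters()}
total_params = sum(p.numel() for p in layers.parameters() if p.requires_grad)

# (10*64 + 64) + (64*32 + 32) + (32*1 + 1) = 704 + 2080 + 33 = 2817
print(per_layer)
print(total_params)
```

The same one-liner works on the `model` instance above: `sum(p.numel() for p in model.parameters())`.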

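2.6 Defining the Loss Function and Optimizer

# Define the loss function
# nn.BCELoss(): binary cross-entropy loss
# Note: BCELoss expects probabilities, so the model output must already have passed through a Sigmoid
criterion = nn.BCELoss()

# Define the optimizer
# optim.SGD parameters:
# params: the parameters to optimize (usually model.parameters())
# lr: learning rate
# momentum: momentum factor (between 0 and 1)
# weight_decay: L2 regularization coefficient
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# The Adam optimizer can be used instead
# optimizer = optim.Adam(model.parameters(), lr=0.001, betas=(0.9, 0.999))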
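
The note about BCELoss requiring a Sigmoid output can be made concrete. The sketch below, using made-up logits and targets, compares `nn.BCELoss` applied to sigmoid probabilities with `nn.BCEWithLogitsLoss` applied to the raw logits. The latter is an alternative not used in this article: it folds the Sigmoid into the loss, so a model trained with it would return raw logits from its last layer.

```python
import torch
from torch import nn

# Made-up raw model outputs (logits) and binary targets, for illustration only.
logits = torch.tensor([[2.0], [-1.0], [0.5]])
targets = torch.tensor([[1.0], [0.0], [1.0]])

# Route used in this article: Sigmoid in the model, then BCELoss on probabilities.
loss_from_probs = nn.BCELoss()(torch.sigmoid(logits), targets)

# Alternative: no Sigmoid in the model, BCEWithLogitsLoss on the raw logits.
loss_from_logits = nn.BCEWithLogitsLoss()(logits, targets)

print(loss_from_probs.item(), loss_from_logits.item())
```

Both routes give the same value up to floating-point error. The logits-based form is generally the more numerically stable choice for very large or very small logits.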

2.7 Training the Model

# Training settings
num_epochs = 100

# Track loss and accuracy during training
train_losses = []
test_losses = []
train_accuracies = []
test_accuracies = []

for epoch in range(num_epochs):
    # Training mode
    model.train()
    running_loss = 0.0
    correct = 0
    total = 0

    for inputs, labels in train_loader:
        # Zero the gradients
        optimizer.zero_grad()
        # Forward pass
        outputs = model(inputs)
        # Compute the loss
        loss = criterion(outputs, labels)
        # Backward pass
        loss.backward()
        # Update the weights
        optimizer.step()

        # Accumulate statistics
        running_loss += loss.item()
        predicted = (outputs > 0.5).float()
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    # Average loss and accuracy on the training set
    train_loss = running_loss / len(train_loader)
    train_accuracy = 100 * correct / total
    train_losses.append(train_loss)
    train_accuracies.append(train_accuracy)

    # Evaluation mode
    model.eval()
    test_running_loss = 0.0
    test_correct = 0
    test_total = 0

    with torch.no_grad():  # do not compute gradients
        for inputs, labels in test_loader:
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            test_running_loss += loss.item()
            predicted = (outputs > 0.5).float()
            test_total += labels.size(0)
            test_correct += (predicted == labels).sum().item()

    # Average loss and accuracy on the test set
    test_loss = test_running_loss / len(test_loader)
    test_accuracy = 100 * test_correct / test_total
    test_losses.append(test_loss)
    test_accuracies.append(test_accuracy)

    # Print training progress
    print(f'Epoch [{epoch+1}/{num_epochs}], '
          f'Train Loss: {train_loss:.4f}, Train Acc: {train_accuracy:.2f}%, '
          f'Test Loss: {test_loss:.4f}, Test Acc: {test_accuracy:.2f}%')
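
The loop above switches between model.train() and model.eval() because layers such as Dropout behave differently in the two modes. The following sketch isolates that behavior on a standalone `nn.Dropout` layer with the same p = 0.2 as the model. The all-ones input is an arbitrary choice that makes the effect easy to read.

```python
import torch
from torch import nn

torch.manual_seed(0)
dropout = nn.Dropout(p=0.2)
x = torch.ones(1000)

# Training mode: elements are zeroed with probability p,
# and the survivors are scaled by 1 / (1 - p) = 1.25.
dropout.train()
train_out = dropout(x)

# Evaluation mode: dropout is a no-op.
dropout.eval()
eval_out = dropout(x)

dropped_fraction = (train_out == 0).float().mean().item()
print(f"dropped in train mode: {dropped_fraction:.2%}")
print(f"eval output unchanged: {torch.equal(eval_out, x)}")
```

Forgetting to call model.eval() before evaluation leaves dropout active, which makes test metrics noisy and pessimistic.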

2.8 Visualizing the Training Process

import matplotlib.pyplot as plt

# Plot the loss curves
plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1)
plt.plot(train_losses, label='Train Loss')
plt.plot(test_losses, label='Test Loss')
plt.title('Training and Test Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()

# Plot the accuracy curves
plt.subplot(1, 2, 2)
plt.plot(train_accuracies, label='Train Accuracy')
plt.plot(test_accuracies, label='Test Accuracy')
plt.title('Training and Test Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy (%)')
plt.legend()

plt.tight_layout()
plt.show()

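3. Advanced Techniques for Fully Connected Neural Networks

3.1 Weight Initialization

Good weight initialization can speed up convergence and improve model performance:

# Custom weight initialization
def init_weights(m):
    if isinstance(m, nn.Linear):
        # Xavier/Glorot initialization
        nn.init.xavier_uniform_(m.weight)
        # Initialize the bias to 0
        nn.init.zeros_(m.bias)

# Apply the initialization to every submodule
model.apply(init_weights)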
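
To make the effect of Xavier uniform initialization tangible, the sketch below initializes a single `nn.Linear` layer (the 10 to 64 size is borrowed from the first layer of the MLP above) and checks that every weight lies within the limit sqrt(6 / (fan_in + fan_out)), which is the bound `xavier_uniform_` uses with its default gain of 1.

```python
import math
import torch
from torch import nn

torch.manual_seed(0)
layer = nn.Linear(10, 64)

nn.init.xavier_uniform_(layer.weight)  # U(-bound, bound)
nn.init.zeros_(layer.bias)

fan_in, fan_out = 10, 64
bound = math.sqrt(6.0 / (fan_in + fan_out))  # Xavier uniform limit for gain = 1
max_abs_weight = layer.weight.abs().max().item()

print(f"bound = {bound:.4f}, largest |weight| = {max_abs_weight:.4f}")
```

Scaling the range by the layer's fan-in and fan-out keeps the variance of activations and gradients roughly constant from layer to layer, which is the motivation behind this scheme.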

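3.2 Learning Rate Scheduling

Adjusting the learning rate dynamically can improve training:

# Define the learning rate scheduler
# optim.lr_scheduler.StepLR parameters:
# optimizer: the optimizer
# step_size: number of epochs between learning rate adjustments
# gamma: multiplicative decay factor for the learning rate
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

# Add this inside the training loop (once per epoch, after optimizer.step())
# scheduler.step()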
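
The schedule that StepLR produces is easy to trace. The sketch below attaches the same settings (step_size=30, gamma=0.1, initial lr=0.01) to a tiny stand-in model and records the learning rate at the start of each epoch. `model_demo` and the random input are placeholders for a real model and a real training epoch.

```python
import torch
from torch import nn, optim

model_demo = nn.Linear(2, 1)  # hypothetical tiny model, for illustration only
optimizer_demo = optim.SGD(model_demo.parameters(), lr=0.01, momentum=0.9)
scheduler_demo = optim.lr_scheduler.StepLR(optimizer_demo, step_size=30, gamma=0.1)

lrs = []
for epoch in range(90):
    lrs.append(optimizer_demo.param_groups[0]["lr"])
    optimizer_demo.zero_grad()
    model_demo(torch.randn(4, 2)).sum().backward()  # stand-in for a real training epoch
    optimizer_demo.step()
    scheduler_demo.step()  # once per epoch, after optimizer.step()

# Learning rate at the start of epochs 0, 30 and 60 (0-indexed).
print(lrs[0], lrs[30], lrs[60])
```

The learning rate stays at 0.01 for the first 30 epochs and is then multiplied by 0.1 every 30 epochs.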

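3.3 Saving and Loading the Model

# Save the model (only its parameters, via state_dict)
torch.save(model.state_dict(), 'mlp_model.pth')

# Load the model: rebuild the architecture first, then load the parameters
loaded_model = MLP(input_size)
loaded_model.load_state_dict(torch.load('mlp_model.pth'))
loaded_model.eval()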
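
A simple way to confirm that saving and loading worked is to compare the outputs of the original and the restored model on the same input. The sketch below does this with a small `nn.Sequential` stand-in and a temporary file. The network, the file name `demo_model.pth`, and the input are all placeholders.

```python
import os
import tempfile

import torch
from torch import nn


def build_net():
    """Hypothetical small network standing in for the MLP class."""
    return nn.Sequential(nn.Linear(10, 4), nn.ReLU(), nn.Linear(4, 1))


torch.manual_seed(0)
net = build_net()
net.eval()
x = torch.randn(3, 10)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo_model.pth")
    torch.save(net.state_dict(), path)

    # A freshly built network has different random weights until the state is loaded.
    restored = build_net()
    restored.load_state_dict(torch.load(path))
    restored.eval()

with torch.no_grad():
    same_output = torch.equal(net(x), restored(x))

print(same_output)
```

Because only the state_dict is stored, the code that defines the architecture must be available at load time and must match the saved parameter names and shapes.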

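Complete Example: Creating and Training a Fully Connected Neural Network with PyTorch

The following complete example shows how to create, train, and evaluate a fully connected neural network (MLP) with PyTorch, with detailed comments and common good practices.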

import torch
import torch.nn as nn
import torch.nn.functional as F  # required for the F.relu calls in the model
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
import numpy as np

# 1. Set random seeds for reproducibility
torch.manual_seed(42)
np.random.seed(42)

# 2. Data preparation
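def prepare_data():
    """Generate and prepare the training data."""
    # Generate a synthetic dataset (1000 samples, 20 features, 2 classes)
    X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, n_informative=15, random_state=42)

    # Split into training and test sets (80% train, 20% test)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Standardize using statistics from the training set only
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)

    # Convert to PyTorch tensors; labels are reshaped to (n_samples, 1)
    X_train = torch.from_numpy(X_train).float()
    X_test = torch.from_numpy(X_test).float()
    y_train = torch.from_numpy(y_train).float().view(-1, 1)
    y_test = torch.from_numpy(y_test).float().view(-1, 1)

    # Create the DataLoaders
    train_dataset = TensorDataset(X_train, y_train)
    test_dataset = TensorDataset(X_test, y_test)
    train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
    test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)

    return train_loader, test_loader, X_train.shape[1]


# 3. Define the model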
class MLP(nn.Module):
    """Fully connected neural network model."""

    def __init__(self, input_size):
        """
        Initialize the MLP.

        Args:
            input_size (int): dimensionality of the input features
        """
        super(MLP, self).__init__()
        # Network structure
        self.fc1 = nn.Linear(input_size, 128)  # first hidden layer
        self.fc2 = nn.Linear(128, 64)          # second hidden layer
        self.fc3 = nn.Linear(64, 32)           # third hidden layer
        self.fc4 = nn.Linear(32, 1)            # output layer
        # Dropout layer (reduces overfitting)
        self.dropout = nn.Dropout(p=0.3)
        # Batch normalization layers (speed up training)
        self.bn1 = nn.BatchNorm1d(128)
        self.bn2 = nn.BatchNorm1d(64)
        self.bn3 = nn.BatchNorm1d(32)

    def forward(self, x):
        """Forward pass."""
        x = F.relu(self.bn1(self.fc1(x)))
        x = self.dropout(x)
        x = F.relu(self.bn2(self.fc2(x)))
        x = self.dropout(x)
        x = F.relu(self.bn3(self.fc3(x)))
        x = self.dropout(x)
        x = torch.sigmoid(self.fc4(x))  # sigmoid for binary classification
        return x

    def initialize_weights(self):
        """Custom weight initialization."""
        for m in self.modules():
            if isinstance(m, nn.Linear):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)


# 4. Training and evaluation functions
def train_model(model, train_loader, test_loader, num_epochs=100):
    """Train the model and record metrics."""
    # Define the loss function and optimizer
    criterion = nn.BCELoss()  # binary cross-entropy loss
    optimizer = optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-4)
    # Learning rate scheduler (the verbose argument is omitted because
    # recent PyTorch releases deprecate it)
    scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=10)

    # Recorded metrics
    history = {
        'train_loss': [],
        'test_loss': [],
        'train_acc': [],
        'test_acc': []
    }

    for epoch in range(num_epochs):
        # Training phase
        model.train()
        running_loss = 0.0
        correct = 0
        total = 0

        for inputs, labels in train_loader:
            optimizer.zero_grad()
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            running_loss += loss.item()
            predicted = (outputs > 0.5).float()
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

        # Training set metrics
        train_loss = running_loss / len(train_loader)
        train_acc = 100 * correct / total
        history['train_loss'].append(train_loss)
        history['train_acc'].append(train_acc)

        # Evaluation phase
        model.eval()
        test_loss = 0.0
        test_correct = 0
        test_total = 0

        with torch.no_grad():
            for inputs, labels in test_loader:
                outputs = model(inputs)
                loss = criterion(outputs, labels)
                test_loss += loss.item()
                predicted = (outputs > 0.5).float()
                test_total += labels.size(0)
                test_correct += (predicted == labels).sum().item()

        # Test set metrics
        test_loss /= len(test_loader)
        test_acc = 100 * test_correct / test_total
        history['test_loss'].append(test_loss)
        history['test_acc'].append(test_acc)

        # Update the learning rate based on the test loss
        scheduler.step(test_loss)

        # Print progress
        print(f'Epoch [{epoch+1}/{num_epochs}] | '
              f'Train Loss: {train_loss:.4f}, Acc: {train_acc:.2f}% | '
              f'Test Loss: {test_loss:.4f}, Acc: {test_acc:.2f}%')

    return history


def plot_history(history):
    """Plot the training curves."""
    plt.figure(figsize=(12, 5))

    # Loss curves
    plt.subplot(1, 2, 1)
    plt.plot(history['train_loss'], label='Train Loss')
    plt.plot(history['test_loss'], label='Test Loss')
    plt.title('Training and Test Loss')
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.legend()

    # Accuracy curves
    plt.subplot(1, 2, 2)
    plt.plot(history['train_acc'], label='Train Accuracy')
    plt.plot(history['test_acc'], label='Test Accuracy')
    plt.title('Training and Test Accuracy')
    plt.xlabel('Epoch')
    plt.ylabel('Accuracy (%)')
    plt.legend()

    plt.tight_layout()
    plt.show()


# 5. Main program
def main():
    # Prepare the data
    train_loader, test_loader, input_size = prepare_data()

    # Initialize the model
    model = MLP(input_size)
    model.initialize_weights()  # custom weight initialization

    # Print the model structure
    print(model)

    # Train the model
    history = train_model(model, train_loader, test_loader, num_epochs=50)

    # Plot the training curves
    plot_history(history)

    # Save the model
    torch.save(model.state_dict(), 'mlp_model.pth')
    print("Model saved to mlp_model.pth")


if __name__ == '__main__':
    main()