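Understanding the Softmax Function and Its Implementation in PyTorch

Introduction to the Softmax Function

In machine learning and deep learning, the Softmax function is widely used in the output layer of multi-class classifiers. It converts a vector of real numbers into a probability distribution: every element lies between 0 and 1, and the elements sum to 1.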

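Definition of the Softmax Function

Given an input vector $\boldsymbol{z} = [z_1, z_2, \dots, z_K]$ of length $K$, the Softmax function $\sigma(\boldsymbol{z})$ is defined as:

$\sigma(\boldsymbol{z})_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}, \quad \text{for all } i = 1, 2, \dots, K$

where:

$e$ is the base of the natural logarithm, approximately 2.71828.
$\sigma(\boldsymbol{z})_i$ is the Softmax output corresponding to the $i$-th component of the input vector.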
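
To make the definition concrete, here is a minimal sketch that transcribes the formula into plain Python using only the standard library. The function name `softmax` and the sample vector `[1.0, 2.0, 3.0]` are chosen for illustration; the code follows the definition literally and does nothing to guard against overflow.

```python
import math

def softmax(z):
    """Softmax computed directly from the definition (no stability tricks)."""
    exps = [math.exp(z_i) for z_i in z]   # numerator terms e^{z_i}
    total = sum(exps)                     # denominator: sum over j of e^{z_j}
    return [e / total for e in exps]

probs = softmax([1.0, 2.0, 3.0])          # illustrative input vector
print([f"{p:.4f}" for p in probs])        # ['0.0900', '0.2447', '0.6652']
print(sum(probs))                         # equals 1 up to floating-point rounding
```

Every output is strictly positive because $e^{z_i} > 0$, and dividing by the common sum makes the outputs add up to 1.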

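Properties of the Softmax Function

Produces a probability distribution: every element of the Softmax output lies in $(0, 1)$ and the elements sum to 1, so the output can be read as per-class probabilities.
Emphasizes larger values: because of the exponential, Softmax amplifies the probabilities of the larger input elements and compresses those of the smaller ones. This helps highlight the classes the model considers more likely.
Differentiable: Softmax is differentiable, which matters for gradient-based optimization, where the gradients are computed by backpropagation.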
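
The last two properties can be checked numerically. The sketch below is an illustration under stated assumptions rather than a reference implementation: it compares Softmax with plain proportional normalization (a baseline introduced here only for contrast), and it checks the standard Softmax Jacobian $\partial \sigma_i / \partial z_j = \sigma_i(\delta_{ij} - \sigma_j)$ against central finite differences. The helper `softmax` and the sample vector are illustrative choices.

```python
import math

def softmax(z):
    exps = [math.exp(v) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

z = [1.0, 2.0, 3.0]
p = softmax(z)

# 1) Emphasis on larger values: compare with proportional normalization z_i / sum(z).
linear = [v / sum(z) for v in z]
print("proportional:", [f"{v:.4f}" for v in linear])  # ['0.1667', '0.3333', '0.5000']
print("softmax:     ", [f"{v:.4f}" for v in p])       # ['0.0900', '0.2447', '0.6652']

# 2) Differentiability: analytic Jacobian sigma_i * (delta_ij - sigma_j).
K = len(z)
analytic = [[p[i] * ((1.0 if i == j else 0.0) - p[j]) for j in range(K)]
            for i in range(K)]

# Central finite differences as an independent numerical check.
eps = 1e-6
numeric = [[0.0] * K for _ in range(K)]
for j in range(K):
    z_plus = list(z)
    z_plus[j] += eps
    z_minus = list(z)
    z_minus[j] -= eps
    p_plus, p_minus = softmax(z_plus), softmax(z_minus)
    for i in range(K):
        numeric[i][j] = (p_plus[i] - p_minus[i]) / (2 * eps)

max_err = max(abs(analytic[i][j] - numeric[i][j])
              for i in range(K) for j in range(K))
print("max |analytic - numeric| =", max_err)
```

Relative to proportional normalization, Softmax moves probability mass toward the largest input, and the two Jacobians agree to within finite-difference error.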

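Numerical Stability

In practice, the exponentials can overflow for large inputs, so the input vector is usually shifted first. The common approach is to subtract the vector's maximum from every element before computing Softmax:

$\sigma(\boldsymbol{z})_i = \frac{e^{z_i - z_{\text{max}}}}{\sum_{j=1}^{K} e^{z_j - z_{\text{max}}}}$

where $z_{\text{max}} = \max\{z_1, z_2, \dots, z_K\}$. The shift leaves the Softmax output unchanged, because the common factor $e^{-z_{\text{max}}}$ cancels between numerator and denominator. It also makes the computation numerically stable: every exponent is at most 0, so no term can overflow.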
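
The effect of the shift is easy to see in plain Python. The following sketch, with the hypothetical helper names `softmax_naive` and `softmax_stable`, shows that the two versions agree on small inputs, while only the shifted version survives inputs around 1000 (where `math.exp` raises `OverflowError`).

```python
import math

def softmax_naive(z):
    exps = [math.exp(v) for v in z]          # overflows for large v
    total = sum(exps)
    return [e / total for e in exps]

def softmax_stable(z):
    z_max = max(z)
    exps = [math.exp(v - z_max) for v in z]  # every exponent is <= 0
    total = sum(exps)
    return [e / total for e in exps]

small = [2.0, 1.0, 0.1]
large = [1000.0, 1001.0, 1002.0]

# On well-behaved inputs the two versions agree.
print([f"{p:.4f}" for p in softmax_naive(small)])
print([f"{p:.4f}" for p in softmax_stable(small)])

# On large inputs the naive version fails, the shifted one does not.
try:
    softmax_naive(large)
    naive_overflowed = False
except OverflowError:
    naive_overflowed = True
print("naive overflowed:", naive_overflowed)          # naive overflowed: True
print([f"{p:.4f}" for p in softmax_stable(large)])    # ['0.0900', '0.2447', '0.6652']
```

The stable result for `large` equals the Softmax of `[1.0, 2.0, 3.0]`, since the two vectors differ only by a constant shift.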

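Where Softmax Is Used

Multi-class classification: in the last layer of a neural network, Softmax is commonly used to turn the model's linear outputs into a probability distribution for multi-class prediction.
Attention mechanisms: in attention models, Softmax computes the attention weights that highlight the important input features.
Language models: in natural language processing, Softmax produces the probability distribution over the next word.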
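
As an illustration of the attention use case, the sketch below computes attention weights for a single query in plain Python. It assumes the scaled dot-product form (scores divided by $\sqrt{d}$), which the text above does not specify, and the query, key, and value vectors are made-up toy data.

```python
import math

def softmax(z):
    z_max = max(z)                                   # shift for numerical stability
    exps = [math.exp(v - z_max) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

# Toy data (hypothetical): one query, three key/value pairs of dimension 2.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]

d = len(query)
# Similarity score of the query with each key, scaled by sqrt(d) (assumed form).
scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]

# Softmax turns the scores into weights that are positive and sum to 1.
weights = softmax(scores)

# The attention output is the weighted average of the value vectors.
context = [sum(w * v[i] for w, v in zip(weights, values))
           for i in range(len(values[0]))]

print("weights:", [f"{w:.4f}" for w in weights])
print("context:", [f"{c:.4f}" for c in context])
```

The keys that align with the query receive the larger weights, so their values dominate the output.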

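A Worked Example

Suppose we have a three-class classification problem and the neural network outputs a vector of length 3:

$\boldsymbol{z} = [z_1, z_2, z_3] = [2.0, 1.0, 0.1]$

We want to convert it into a probability distribution with the Softmax function.

Step 1: Compute the exponential of each element

$\begin{aligned} e^{z_1} &= e^{2.0} \approx 7.3891 \\ e^{z_2} &= e^{1.0} \approx 2.7183 \\ e^{z_3} &= e^{0.1} \approx 1.1052 \end{aligned}$

Step 2: Compute the sum of the exponentials

$\text{sum} = e^{z_1} + e^{z_2} + e^{z_3} \approx 7.3891 + 2.7183 + 1.1052 = 11.2126$

Step 3: Compute the Softmax outputs

$\begin{aligned} \sigma_1 &= \frac{e^{z_1}}{\text{sum}} \approx \frac{7.3891}{11.2126} \approx 0.6590 \\ \sigma_2 &= \frac{e^{z_2}}{\text{sum}} \approx \frac{2.7183}{11.2126} \approx 0.2424 \\ \sigma_3 &= \frac{e^{z_3}}{\text{sum}} \approx \frac{1.1052}{11.2126} \approx 0.0986 \end{aligned}$

So after applying the Softmax function, the output probability distribution is:

$\sigma(\boldsymbol{z}) \approx [0.6590, 0.2424, 0.0986]$

That is, the model predicts the first class with a probability of about 65.90%, the second with about 24.24%, and the third with about 9.86%.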

Implementing Softmax in PyTorch

PyTorch offers more than one way to apply Softmax. The examples below use torch.nn.functional.softmax and torch.nn.Softmax.

Creating the Input Data

First, create a sample input tensor:

import torch
import torch.nn as nn
import torch.nn.functional as F

# Create an input tensor of shape (batch_size, features)
input_tensor = torch.tensor([[2.0, 1.0, 0.1],
                             [1.0, 3.0, 0.2]])
print("Input tensor:")
print(input_tensor)

Output:

Input tensor:
tensor([[2.0000, 1.0000, 0.1000],
        [1.0000, 3.0000, 0.2000]])

Method 1: Using `torch.nn.functional.softmax`

Apply Softmax to the input directly with the torch.nn.functional.softmax function.

# Apply Softmax along dimension 1 (the feature dimension)
softmax_output = F.softmax(input_tensor, dim=1)
print("\nSoftmax output:")
print(softmax_output)

Output:

Softmax output:
tensor([[0.6590, 0.2424, 0.0986],
        [0.1131, 0.8360, 0.0508]])

Method 2: Using the `torch.nn.Softmax` Module

You can also use the Softmax module from torch.nn.

# Create a Softmax layer instance
softmax = nn.Softmax(dim=1)

# Apply the Softmax layer to the input tensor
softmax_output_module = softmax(input_tensor)
print("\nOutput of the nn.Softmax module:")
print(softmax_output_module)

Output:

Output of the nn.Softmax module:
tensor([[0.6590, 0.2424, 0.0986],
        [0.1131, 0.8360, 0.0508]])
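
Both methods take a `dim` argument, and choosing the wrong one is a common source of silent errors. The sketch below, which assumes the same `(batch_size, features)` layout as the example tensor, contrasts `dim=1` (normalize across classes within each sample) with `dim=0` (normalize across samples within each column).

```python
import torch
import torch.nn.functional as F

x = torch.tensor([[2.0, 1.0, 0.1],
                  [1.0, 3.0, 0.2]])   # shape (batch_size=2, features=3)

row_wise = F.softmax(x, dim=1)   # each row (sample) becomes a distribution
col_wise = F.softmax(x, dim=0)   # each column becomes a distribution instead

print(row_wise.sum(dim=1))   # every row sums to 1
print(col_wise.sum(dim=0))   # every column sums to 1

# dim=-1 refers to the last dimension, which is dim=1 for a 2-D tensor.
print(torch.allclose(F.softmax(x, dim=-1), row_wise))
```

For classification over a batch, `dim=1` (or `dim=-1`) is usually what is wanted, because the probabilities should sum to 1 per sample.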

Applying Softmax in a Neural Network Model

Build a simple neural network whose last layer applies a Softmax-type activation. The code below uses nn.LogSoftmax, which returns log-probabilities rather than probabilities, for better numerical stability.

class SimpleNetwork(nn.Module):
    def __init__(self, input_size, num_classes):
        super(SimpleNetwork, self).__init__()
        self.layer1 = nn.Linear(input_size, 5)
        self.layer2 = nn.Linear(5, num_classes)
        # Use LogSoftmax for better numerical stability
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, x):
        x = F.relu(self.layer1(x))
        x = self.layer2(x)
        x = self.softmax(x)
        return x

# Define the input size and the number of classes
input_size = 3
num_classes = 3

# Create a model instance
model = SimpleNetwork(input_size, num_classes)

# Inspect the model structure
print("\nModel structure:")
print(model)

Output:

Model structure:
SimpleNetwork(
  (layer1): Linear(in_features=3, out_features=5, bias=True)
  (layer2): Linear(in_features=5, out_features=3, bias=True)
  (softmax): LogSoftmax(dim=1)
)

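Forward pass:

# Make sure the input is a floating-point tensor
input_data = input_tensor.float()

# Forward pass
output = model(input_data)
print("\nModel output (log-probabilities):")
print(output)

Output (the exact values depend on the random weight initialization, so yours will differ):

Model output (log-probabilities):
tensor([[-1.2443, -0.7140, -1.5051],
        [-1.3700, -0.6532, -1.4894]], grad_fn=<LogSoftmaxBackward0>)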

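Converting to probabilities:

# Exponentiate the log-probabilities to recover probabilities
probabilities = torch.exp(output)
print("\nModel output (probabilities):")
print(probabilities)

Output:

Model output (probabilities):
tensor([[0.2882, 0.4898, 0.2220],
        [0.2541, 0.5204, 0.2255]], grad_fn=<ExpBackward0>)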

Predicting the class:

# Get the index of the most probable class for each sample
predicted_classes = torch.argmax(probabilities, dim=1)
print("\nPredicted classes:")
print(predicted_classes)

Output:

Predicted classes:
tensor([1, 1])
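
One reason to have the model emit log-probabilities is that they plug directly into a negative log-likelihood loss. As a hedged sketch (the logits and labels below are hypothetical, and the training setup is an assumption, not something the model above prescribes), `nn.NLLLoss` applied to `log_softmax` outputs gives the same value as `nn.CrossEntropyLoss` applied to the raw scores:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 3)             # hypothetical raw scores: 4 samples, 3 classes
targets = torch.tensor([0, 2, 1, 2])   # hypothetical class labels

# Log-probabilities, as produced by a model ending in nn.LogSoftmax(dim=1)
log_probs = F.log_softmax(logits, dim=1)

loss_nll = nn.NLLLoss()(log_probs, targets)        # expects log-probabilities
loss_ce = nn.CrossEntropyLoss()(logits, targets)   # expects raw scores (logits)

print(loss_nll.item(), loss_ce.item())
print(torch.allclose(loss_nll, loss_ce))

# argmax is unchanged by the log, so predictions can be read from either form.
same_prediction = torch.equal(log_probs.argmax(dim=1),
                              F.softmax(logits, dim=1).argmax(dim=1))
print(same_prediction)
```

Because the logarithm is monotonic, taking `argmax` on the log-probabilities gives the same predicted classes as taking it on the probabilities, so the `torch.exp` step is only needed when the probabilities themselves are of interest.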

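The Difference Between `torch.nn.functional.softmax` and `torch.nn.Softmax`

Design Philosophy of the Functional and Module APIs

PyTorch provides two kinds of API:

Functional API (torch.nn.functional):
- Characteristics: stateless; the functions hold no learnable parameters.
- Usage: call the function directly.
- Best suited for: applying operations flexibly inside the forward method.
Module API (torch.nn.Module):
- Characteristics: stateful; a module may hold learnable parameters. Some modules, such as Softmax, have no parameters but still subclass nn.Module.
- Usage: instantiate the module first, then call it during the forward pass.
- Best suited for: managing layers and operations uniformly when building a model.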

How This Applies to Softmax

torch.nn.functional.softmax (function):
- Example:
```
import torch.nn.functional as F
output = F.softmax(input_tensor, dim=1)
```
- Characteristics: called directly; concise and flexible.
torch.nn.Softmax (module):
- Example:
```
import torch.nn as nn
softmax = nn.Softmax(dim=1)
output = softmax(input_tensor)
```
- Characteristics: acts as a layer of the model, so it composes easily with other layers and keeps the code structure consistent.

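Why Are There Two Implementations?

The two implementations exist to suit different needs and programming styles.

Advantages of nn.Softmax:
- Every layer is declared when the model is defined, so the structure is clear.
- It works naturally with nn.Sequential for building sequential models.
- All parts of the model are managed in one place.
Advantages of F.softmax:
- The code is concise: just call the function.
- It suits cases that need flexible operations inside forward.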
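
The `nn.Sequential` point can be illustrated with a short sketch. The layer sizes below are arbitrary choices for illustration; the point is only that `nn.Softmax`, being a module, can be listed alongside other layers, whereas `F.softmax` would need a custom `forward` method.

```python
import torch
import torch.nn as nn

# A sequential model in which Softmax is just another layer.
model = nn.Sequential(
    nn.Linear(10, 5),     # illustrative sizes: 10 input features, 5 classes
    nn.Softmax(dim=1),    # normalize across the 5 class scores of each sample
)

x = torch.randn(2, 10)    # a batch of 2 random samples
probs = model(x)

print(model)              # Softmax appears in the printed layer list
print(probs.shape)        # torch.Size([2, 5])
print(probs.sum(dim=1))   # each row sums to 1
```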

Usage Examples

Using `nn.Softmax`

import torch
import torch.nn as nn

# Define the model
class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.layer = nn.Linear(10, 5)
        self.softmax = nn.Softmax(dim=1)

    def forward(self, x):
        x = self.layer(x)
        x = self.softmax(x)
        return x

# Instantiate and use the model
model = MyModel()
input_tensor = torch.randn(2, 10)
output = model(input_tensor)
print(output)

Using `F.softmax`

import torch
import torch.nn as nn
import torch.nn.functional as F

# Define the model
class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.layer = nn.Linear(10, 5)

    def forward(self, x):
        x = self.layer(x)
        x = F.softmax(x, dim=1)
        return x

# Instantiate and use the model
model = MyModel()
input_tensor = torch.randn(2, 10)
output = model(input_tensor)
print(output)