PyTorch Review Notes 4

PyTorch review notes, written for the author's own reference. Reference materials:

Dive into Deep Learning
Stanford University: Practical Machine Learning

Main topic of this article: deep learning computation in PyTorch.

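This article first shows how to define custom layers and blocks, then covers how to manage, read, and write parameters, and finally explains how to use GPUs.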

Index of the PyTorch notes:

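For common tensor operations, linear algebra, calculus, and probability in PyTorch, see PyTorch Review Notes 1;
For linear neural networks, see PyTorch Review Notes 2;
For multilayer perceptrons, see PyTorch Review Notes 3;
For deep learning computation, see PyTorch Review Notes 4;
For convolutional neural networks, see PyTorch Review Notes 5;
For modern convolutional neural networks, see PyTorch Review Notes 6;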

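A layer is the basic building unit of a neural network, for example a fully connected layer, a convolutional layer, or a pooling layer. A block is a larger functional unit made of layers and is used to build complex network architectures. A block can be a series of interconnected layers that together form one complete functional unit, or a repeating pattern of layers used to build repeated structures. The figure below shows several layers combined into blocks, which in turn form a larger model: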
[Figure: multiple layers combined into blocks, which are combined into a larger model]

In practice, you will often need to define your own layers and blocks.

I. Custom Blocks

1. Sequential Blocks

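nn.Sequential is essentially a sequential block: you build a network by instantiating layers inside the block. nn.Module is the base class for all neural network models in PyTorch, and both nn.Sequential and the built-in layers inherit from it. nn.Sequential maintains an ordered list of layers and chains them together, feeding the output of each layer to the next layer as its input.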

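To define your own sequential block, you must implement two key functions:

Constructor: adds the layers to the list one by one, in order;
Forward function: passes the input through the layers in order, feeding each layer's output to the next;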

```python
import torch
from torch import nn

class MySequential(nn.Module):
    def __init__(self, *args):
        super().__init__()
        for idx, module in enumerate(args):
            # Register each layer in self._modules under a string key
            self._modules[str(idx)] = module

    def forward(self, X):
        # self._modules is an OrderedDict, so layers run in insertion order
        for block in self._modules.values():
            X = block(X)
        return X

net = MySequential(nn.Linear(20, 256), nn.ReLU(), nn.Linear(256, 10))
X = torch.rand(2, 20)
output = net(X)
```

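In the code above, creating net automatically calls __init__(self, *args) to instantiate the MySequential object. Calling net(X) is equivalent to net.__call__(X), which in turn calls the forward() method defined in the class to run forward propagation; each step of the propagation is simply a call to block(X).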
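As a quick check of the explanation above, the sketch below (the comparison with nn.Sequential is my own addition, not from the original notes) passes the same layer objects to both containers and confirms that net(X) matches net.forward(X) and the built-in nn.Sequential result. It also shows that assigning into self._modules registers the layers, so their parameters appear in net.parameters().

```python
import torch
from torch import nn

class MySequential(nn.Module):
    def __init__(self, *args):
        super().__init__()
        for idx, module in enumerate(args):
            self._modules[str(idx)] = module  # register each layer as a submodule

    def forward(self, X):
        for block in self._modules.values():  # OrderedDict keeps insertion order
            X = block(X)
        return X

torch.manual_seed(0)
layers = [nn.Linear(20, 256), nn.ReLU(), nn.Linear(256, 10)]
my_net = MySequential(*layers)
ref_net = nn.Sequential(*layers)  # shares the very same layer objects

X = torch.rand(2, 20)
same_as_forward = torch.equal(my_net(X), my_net.forward(X))
same_as_sequential = torch.equal(my_net(X), ref_net(X))
num_params = len(list(my_net.parameters()))  # 2 Linear layers x (weight, bias)
print(same_as_forward, same_as_sequential, num_params)
```

Had the layers been stored in a plain Python list instead of self._modules, nn.Module would not see them, and an optimizer built from net.parameters() would have nothing to train.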

2. Custom Forward Propagation

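nn.Sequential wraps forward propagation in a fixed function: you can use it freely, but you cannot change how the propagation works. To control the details of forward propagation, subclass nn.Module and write your own forward function instead of relying only on the predefined container.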

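For example, suppose we need a layer that computes $f(\mathbf x,\mathbf w)=c \cdot \mathbf w^T \mathbf x$ and also uses control flow during propagation, where $\mathbf x$ is the input, $\mathbf w$ is a parameter, and $c$ is a specified constant that is not updated during optimization. To this end, define the FixedHiddenMLP class as follows: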

```python
import torch
from torch import nn
from torch.nn import functional as F

class FixedHiddenMLP(nn.Module):
    def __init__(self):
        super().__init__()
        # Constant weights that are not updated during optimization
        self.rand_weight = torch.rand((20, 20), requires_grad=False)
        self.linear = nn.Linear(20, 20)

    def forward(self, X):
        X = self.linear(X)
        X = F.relu(torch.mm(X, self.rand_weight) + 1)
        # Reuse the same fully connected layer: both calls share its parameters
        X = self.linear(X)
        # Control flow: halve X until its L1 norm is at most 1
        while X.abs().sum() > 1:
            X /= 2
        return X
```
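The constant rand_weight above is a plain tensor attribute, so it is not a parameter. The sketch below checks that, and compares it with a hypothetical variant (FixedHiddenMLPWithBuffer is my own name, not from the original) that stores the constant with register_buffer. A buffer is still excluded from parameters(), but unlike a plain attribute it is saved in state_dict() and moved by .to(device).

```python
import torch
from torch import nn
from torch.nn import functional as F

class FixedHiddenMLP(nn.Module):
    def __init__(self):
        super().__init__()
        # Plain tensor attribute: not registered as a parameter or a buffer
        self.rand_weight = torch.rand((20, 20), requires_grad=False)
        self.linear = nn.Linear(20, 20)

    def forward(self, X):
        X = self.linear(X)
        X = F.relu(torch.mm(X, self.rand_weight) + 1)
        X = self.linear(X)
        while X.abs().sum() > 1:
            X /= 2
        return X

class FixedHiddenMLPWithBuffer(FixedHiddenMLP):
    """Hypothetical variant: keep the constant as a registered buffer."""
    def __init__(self):
        super().__init__()
        const = self.rand_weight
        del self.rand_weight                        # drop the plain attribute first
        self.register_buffer('rand_weight', const)  # now part of state_dict()

net = FixedHiddenMLP()
out = net(torch.rand(2, 20))
plain_params = [name for name, _ in net.named_parameters()]
plain_in_state = 'rand_weight' in net.state_dict()

buf_net = FixedHiddenMLPWithBuffer()
buf_out = buf_net(torch.rand(2, 20))  # forward still finds self.rand_weight
buf_params = [name for name, _ in buf_net.named_parameters()]
buf_in_state = 'rand_weight' in buf_net.state_dict()
print(plain_params, plain_in_state, buf_in_state)
```

In both versions only linear.weight and linear.bias are trainable; the while loop guarantees the returned X has an L1 norm of at most 1.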

3. Nested Blocks

Layers can be combined into blocks, and blocks can in turn be nested to form larger models:

```python
import torch
from torch import nn
from torch.nn import functional as F

class FixedHiddenMLP(nn.Module):
    def __init__(self):
        super().__init__()
        # Constant weights that are not updated during optimization
        self.rand_weight = torch.rand((20, 20), requires_grad=False)
        self.linear = nn.Linear(20, 20)

    def forward(self, X):
        X = self.linear(X)
        X = F.relu(torch.mm(X, self.rand_weight) + 1)
        X = self.linear(X)          # reuse the same layer: shared parameters
        while X.abs().sum() > 1:    # control flow
            X /= 2
        return X.sum()

class NestMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(20, 64), nn.ReLU(),
                                 nn.Linear(64, 32), nn.ReLU())
        self.linear = nn.Linear(32, 16)

    def forward(self, X):
        return self.linear(self.net(X))

net = nn.Sequential(NestMLP(), nn.Linear(16, 20), FixedHiddenMLP())
X = torch.rand(2, 20)
output = net(X)
```
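To see how nesting is organized internally, the sketch below (a smaller model of my own that reuses NestMLP from above) lists every submodule with named_modules(), which walks the module tree recursively and joins names with dots; the empty name is the outer container itself.

```python
import torch
from torch import nn

class NestMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(20, 64), nn.ReLU(),
                                 nn.Linear(64, 32), nn.ReLU())
        self.linear = nn.Linear(32, 16)

    def forward(self, X):
        return self.linear(self.net(X))

model = nn.Sequential(NestMLP(), nn.Linear(16, 20))
# Pre-order traversal: a module is listed before its children
names = [name for name, _ in model.named_modules()]
out = model(torch.rand(2, 20))
print(names)
print(out.shape)
```

These dotted names are the same prefixes used later as state_dict() keys.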

II. Custom Layers

As with custom blocks, a custom layer needs a constructor and a forward propagation function.

1. Layers Without Parameters

```python
import torch
from torch import nn

class CenteredLayer(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, X):
        # Subtract the mean so that the output has zero mean
        return X - X.mean()

net = nn.Sequential(nn.Linear(8, 128), CenteredLayer())
X = torch.rand(4, 8)
output = net(X)
print(output.mean())  # ≈ 0, e.g. tensor(0., grad_fn=<MeanBackward0>); tiny floating-point residues such as 1e-9 are also possible
```
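Because the random input above makes the result hard to read, here is the same layer (a minimal sketch; defining only forward is enough since the layer has no parameters) applied to a small hand-picked vector whose mean is 3:

```python
import torch
from torch import nn

class CenteredLayer(nn.Module):
    def forward(self, X):
        return X - X.mean()  # subtract the mean of all elements

layer = CenteredLayer()
out = layer(torch.tensor([1.0, 2.0, 3.0, 4.0, 5.0]))
print(out)  # values -2, -1, 0, 1, 2
```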

2. Layers With Parameters

```python
import torch
from torch import nn
import torch.nn.functional as F

class MyLinear(nn.Module):
    def __init__(self, in_units, out_units):
        super().__init__()
        # nn.Parameter registers the tensors as trainable parameters of the layer
        self.weight = nn.Parameter(torch.randn(in_units, out_units))
        self.bias = nn.Parameter(torch.randn(out_units,))

    def forward(self, X):
        # Use the Parameters directly (not .data) so autograd can compute their gradients
        linear = torch.matmul(X, self.weight) + self.bias
        return F.relu(linear)

net = nn.Sequential(MyLinear(64, 8), MyLinear(8, 1))
X = torch.rand(2, 64)
output = net(X)
print(output)  # shape (2, 1); the values depend on the random initialization
```
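Why forward should not use self.weight.data: .data returns a tensor detached from autograd, so no gradient reaches the parameter and the layer would never be trained. The sketch below shows the difference; the use_data flag is hypothetical and exists only for this comparison.

```python
import torch
from torch import nn
import torch.nn.functional as F

class MyLinear(nn.Module):
    def __init__(self, in_units, out_units, use_data=False):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(in_units, out_units))
        self.bias = nn.Parameter(torch.randn(out_units))
        self.use_data = use_data  # hypothetical flag for this comparison only

    def forward(self, X):
        if self.use_data:
            # Detached from the graph: the Parameters get no gradient
            linear = torch.matmul(X, self.weight.data) + self.bias.data
        else:
            linear = torch.matmul(X, self.weight) + self.bias
        return F.relu(linear)

X = torch.rand(2, 64)

good = MyLinear(64, 8)
good(X).sum().backward()  # gradients flow into good.weight and good.bias

bad = MyLinear(64, 8, use_data=True)
out_bad = bad(X)          # calling backward() here would raise: nothing requires grad
print(good.weight.grad.shape, out_bad.requires_grad, bad.weight.grad)
```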

III. Parameter Management

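During experiments, you sometimes need to extract parameters to inspect them or to reuse them in another setting. This section covers how to access parameters and how to initialize them.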

1. Accessing Parameters

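net.state_dict() / net[i].state_dict(): returns the state dictionary holding the parameters of the whole model or of one layer;
net[i].weight.data / net[i].bias.data: returns the weight / bias tensor of one layer;
net[i].weight.grad: returns the gradient of one layer's weight. The gradient is only available after backward() has been called; before that it is None;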

```python
import torch
from torch import nn

net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
X = torch.rand(size=(2, 4))
output = net(X)

print(net.state_dict())
'''
OrderedDict([('0.weight', tensor([[ 0.2178, -0.3286,  0.4875, -0.0347],
        [-0.0415,  0.0009, -0.2038, -0.1813],
        [-0.2766, -0.4759, -0.3134, -0.2782],
        [ 0.4854,  0.0606,  0.1070,  0.0650],
        [-0.3908,  0.2412, -0.1348,  0.3921],
        [-0.3044, -0.0331, -0.1213, -0.1690],
        [-0.3875, -0.0117,  0.3195, -0.1748],
        [ 0.1840, -0.3502,  0.4253,  0.2789]])), ('0.bias', tensor([-0.2327, -0.0745,  0.4923, -0.1018,  0.0685,  0.4423, -0.2979,  0.1109])), ('2.weight', tensor([[ 0.1006,  0.2959, -0.1316, -0.2015,  0.2446, -0.0158,  0.2217, -0.2780]])), ('2.bias', tensor([0.2362]))])
'''
print(net[2].state_dict())
'''
OrderedDict([('weight', tensor([[ 0.1006,  0.2959, -0.1316, -0.2015,  0.2446, -0.0158,  0.2217, -0.2780]])), ('bias', tensor([0.2362]))])
'''
print(net[2].bias)
'''
Parameter containing:
tensor([0.2362], requires_grad=True)
'''
print(net[2].bias.data)
'''
tensor([0.2362])
'''
```
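The list above says .grad stays None until backward() runs; the code above never calls backward(), so here is a minimal sketch (using an arbitrary scalar, the sum of the outputs, as a stand-in loss) that shows the gradient appearing:

```python
import torch
from torch import nn

net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
X = torch.rand(size=(2, 4))

grad_before = net[2].weight.grad  # None: no backward pass yet
net(X).sum().backward()           # populates .grad for every parameter
grad_after = net[2].weight.grad   # same shape as the weight, (1, 8)
print(grad_before, grad_after.shape)
```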

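To access all parameters at once, iterate over named_parameters(), which recursively yields every parameter together with its name: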

```python
import torch
from torch import nn

net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
X = torch.rand(size=(2, 4))
output = net(X)

print(*[(name, param.data) for name, param in net[0].named_parameters()])
'''
('weight', tensor([[-0.0273, -0.4942, -0.0880,  0.3169],
        [ 0.2205,  0.3344, -0.4425, -0.0882],
        [ 0.1726, -0.0007, -0.0256, -0.0593],
        [-0.3854, -0.0934, -0.4641,  0.1950],
        [ 0.2358, -0.4820, -0.2315,  0.1642],
        [-0.2645,  0.2021,  0.3167, -0.0042],
        [ 0.1714, -0.2201, -0.3326, -0.2908],
        [-0.3196,  0.0584, -0.1059,  0.0256]])) ('bias', tensor([ 0.3285,  0.4167, -0.2343,  0.3099,  0.1576, -0.0397, -0.2190, -0.3854]))
'''
print(*[(name, param.shape) for name, param in net.named_parameters()])
'''
('0.weight', torch.Size([8, 4])) ('0.bias', torch.Size([8])) ('2.weight', torch.Size([1, 8])) ('2.bias', torch.Size([1]))
'''
```
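Iterating over the parameters also makes it easy to count them. A short sketch for the same network: (4·8 + 8) + (8·1 + 1) = 40 + 9 = 49. It also checks that a state_dict() key such as '2.bias' refers to the same tensor as net[2].bias.

```python
import torch
from torch import nn

net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))

total = sum(p.numel() for p in net.parameters())  # number of scalar parameters
same = torch.equal(net.state_dict()['2.bias'], net[2].bias.data)
print(total, same)  # 49 True
```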

If the network is built from nested blocks, index into the blocks first and then access the parameters:

```python
import torch
from torch import nn

def block1():
    return nn.Sequential(nn.Linear(4, 8), nn.ReLU(),
                         nn.Linear(8, 4), nn.ReLU())

def block2():
    net = nn.Sequential()
    for i in range(4):
        # Nest four block1 instances under the names 'block 0' ... 'block 3'
        net.add_module(f'block {i}', block1())
    return net

net = nn.Sequential(block2(), nn.Linear(4, 1))
X = torch.rand(size=(2, 4))
output = net(X)

print(net)
'''
Sequential(
  (0): Sequential(
    (block 0): Sequential(
      (0): Linear(in_features=4, out_features=8, bias=True)
      (1): ReLU()
      (2): Linear(in_features=8, out_features=4, bias=True)
      (3): ReLU()
    )
    (block 1): Sequential(
      (0): Linear(in_features=4, out_features=8, bias=True)
      (1): ReLU()
      (2): Linear(in_features=8, out_features=4, bias=True)
      (3): ReLU()
    )
    (block 2): Sequential(
      (0): Linear(in_features=4, out_features=8, bias=True)
      (1): ReLU()
      (2): Linear(in_features=8, out_features=4, bias=True)
      (3): ReLU()
    )
    (block 3): Sequential(
      (0): Linear(in_features=4, out_features=8, bias=True)
      (1): ReLU()
      (2): Linear(in_features=8, out_features=4, bias=True)
      (3): ReLU()
    )
  )
  (1): Linear(in_features=4, out_features=1, bias=True)
)
'''
# net[0] is block2, net[0][1] is 'block 1', net[0][1][0] is its first Linear layer
print(net[0][1][0].bias.data)
'''
tensor([-0.0083,  0.2490,  0.1794,  0.1927,  0.1797,  0.1156,  0.4409,  0.1320])
'''
```
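The same nested parameter can also be reached by name: state_dict() keys join the names along the path with dots, so net[0][1][0].bias corresponds to the key '0.block 1.0.bias' (the space comes from the names chosen in block2). A minimal sketch:

```python
import torch
from torch import nn

def block1():
    return nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 4), nn.ReLU())

def block2():
    net = nn.Sequential()
    for i in range(4):
        net.add_module(f'block {i}', block1())
    return net

net = nn.Sequential(block2(), nn.Linear(4, 1))

key = '0.block 1.0.bias'  # path: outer index 0 -> 'block 1' -> layer 0 -> bias
same = torch.equal(net.state_dict()[key], net[0][1][0].bias.data)
print(same)  # True
```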

2. Parameter Initialization

PyTorch's nn.init module provides many initialization methods:

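nn.init.constant_(layer.weight, c): initializes the weights to the constant c;
nn.init.zeros_(layer.weight): initializes the weights to 0;
nn.init.ones_(layer.weight): initializes the weights to 1;
nn.init.uniform_(layer.weight, a, b): draws the weights from the uniform distribution U(a, b);
nn.init.xavier_uniform_(layer.weight): draws the weights from the Xavier (Glorot) uniform distribution, whose range depends on the layer's numbers of inputs and outputs;
nn.init.normal_(layer.weight, mean, std): draws the weights from a normal distribution with the given mean and standard deviation;
nn.init.orthogonal_(layer.weight): initializes the weights as a (semi-)orthogonal matrix;
nn.init.sparse_(layer.weight, sparsity, std): initializes the weights as a sparse matrix; sparsity is the fraction of elements in each column set to 0, and the nonzero elements are drawn from a normal distribution with mean 0 and standard deviation std;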
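Two of the methods above have properties that are easy to verify. In this sketch, based on the documented Xavier formula, xavier_uniform_ samples from U(-a, a) with a = gain · sqrt(6 / (fan_in + fan_out)) and gain = 1 by default; for a weight with more rows than columns, orthogonal_ produces orthonormal columns, so WᵀW ≈ I.

```python
import math
import torch
from torch import nn

layer = nn.Linear(4, 8)  # weight shape (8, 4): fan_in = 4, fan_out = 8

nn.init.xavier_uniform_(layer.weight)
bound = math.sqrt(6 / (4 + 8))  # Xavier uniform bound with gain = 1
within_bound = bool(layer.weight.abs().max() <= bound)

nn.init.orthogonal_(layer.weight)
# 8 rows > 4 columns: the result is semi-orthogonal with orthonormal columns
gram = layer.weight.T @ layer.weight
is_orthonormal = torch.allclose(gram, torch.eye(4), atol=1e-5)
print(within_bound, is_orthonormal)
```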

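To initialize, call net.apply(init_method) for the whole network or net[i].apply(init_method) for a single layer; apply() calls the function on the module itself and recursively on every submodule: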

```python
import torch
from torch import nn

def init_normal(m):
    if type(m) == nn.Linear:
        nn.init.normal_(m.weight, mean=0, std=0.01)
        nn.init.zeros_(m.bias)

def init_constant(m):
    if type(m) == nn.Linear:
        nn.init.constant_(m.weight, 1)
        nn.init.zeros_(m.bias)

net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
X = torch.rand(size=(2, 4))
output = net(X)

# net.apply(init_normal)     # initialize the whole network
net[0].apply(init_normal)    # initialize only the first layer
net[2].apply(init_constant)  # initialize only the last layer
```
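A constant initialization makes the effect of apply() easy to check by hand. In this sketch (isinstance is used instead of type(...) == as a common idiom), all weights are 1 and all biases 0, so an all-ones input of size 4 gives 4 in each of the 8 hidden units and 8 · 4 = 32 at the output.

```python
import torch
from torch import nn

def init_constant(m):
    if isinstance(m, nn.Linear):  # apply() visits every submodule, so filter by type
        nn.init.constant_(m.weight, 1)
        nn.init.zeros_(m.bias)

net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
net.apply(init_constant)  # reaches both Linear layers; ReLU is skipped by the filter

out = net(torch.ones(1, 4))
print(out)  # 32 in a (1, 1) tensor
```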

3. Deferred Initialization

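Sometimes the input dimension of a network cannot be known in advance. To keep the code runnable, you can use deferred initialization: the framework infers the size of each layer dynamically only when data first passes through the model. Because PyTorch's lazy modules were introduced as a feature still under development, whose API and behavior may change, only a simple example is given here: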

```python
import torch
from torch import nn

net = nn.Sequential(nn.LazyLinear(256), nn.ReLU(), nn.LazyLinear(10))

print(net)
'''
Sequential(
  (0): LazyLinear(in_features=0, out_features=256, bias=True)
  (1): ReLU()
  (2): LazyLinear(in_features=0, out_features=10, bias=True)
)
'''
X = torch.rand(2, 20)
net(X)  # the first forward pass infers in_features for each layer
print(net)
'''
Sequential(
  (0): Linear(in_features=20, out_features=256, bias=True)
  (1): ReLU()
  (2): Linear(in_features=256, out_features=10, bias=True)
)
'''
```
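What happens under the hood, as a minimal sketch: before the first forward pass the weights of a lazy layer are torch.nn.parameter.UninitializedParameter objects with no shape; the first call materializes them as ordinary parameters with the inferred shapes.

```python
import torch
from torch import nn
from torch.nn.parameter import UninitializedParameter

net = nn.Sequential(nn.LazyLinear(256), nn.ReLU(), nn.LazyLinear(10))
uninit_before = isinstance(net[0].weight, UninitializedParameter)

net(torch.rand(2, 20))  # infers in_features = 20, then 256 for the second layer
uninit_after = isinstance(net[0].weight, UninitializedParameter)
print(uninit_before, uninit_after, net[0].weight.shape, net[2].weight.shape)
```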

IV. File I/O

Tensors and model parameters can be written with torch.save(x, file) and read back with torch.load(file).

1. Loading and Saving Tensors

All you need is to read the data back in the same structure it was saved in:

```python
import torch

x = torch.arange(4)
y = torch.zeros(4)
torch.save([x, y], 'xy.pth')

x2, y2 = torch.load('xy.pth')  # unpack in the same structure that was saved
print(x2, y2)                  # tensor([0, 1, 2, 3]) tensor([0., 0., 0., 0.])
```

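The file used to store tensors has no required format; it does not even need an extension. This is because torch.save(x, file) serializes objects with Python's pickle module (packed in a zip-based container in recent PyTorch versions), and pickle turns Python objects into a byte stream regardless of the file's extension. By convention, .pt or .pth is used.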
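Since torch.save and torch.load also accept file-like objects, an in-memory buffer shows most directly that no file name or extension is involved. A minimal sketch, also saving a dict of tensors instead of a list:

```python
import io
import torch

x = torch.arange(4)
y = torch.zeros(4)

buffer = io.BytesIO()                 # in-memory file: no name, no extension
torch.save({'x': x, 'y': y}, buffer)  # dicts of tensors work as well as lists
buffer.seek(0)                        # rewind before reading back
loaded = torch.load(buffer)
print(loaded['x'], loaded['y'])
```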

2. Loading and Saving Model Parameters

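A state dict stores only the parameters, not the model architecture, and models are usually custom classes. So before loading, first instantiate a model of the same class, then load the saved parameters into it: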

```python
import torch
from torch import nn
from torch.nn import functional as F

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(4, 2)
        self.output = nn.Linear(2, 3)

    def forward(self, x):
        return self.output(F.relu(self.hidden(x)))

net = MLP()
X = torch.randn(size=(2, 4))
Y = net(X)
print(net.state_dict())
'''
OrderedDict([('hidden.weight', tensor([[-0.0154, -0.3586, -0.3653, -0.2950],
        [ 0.2591, -0.2563,  0.3833,  0.1449]])), ('hidden.bias', tensor([0.1884, 0.3998])), ('output.weight', tensor([[-0.4805,  0.4077],
        [-0.0933,  0.0584],
        [ 0.3114,  0.6285]])), ('output.bias', tensor([-0.2552, -0.6520,  0.3290]))])
'''
torch.save(net.state_dict(), 'mlp.pth')

net2 = MLP()  # same class, so the parameter names and shapes match
net2.load_state_dict(torch.load('mlp.pth'))
print(net2.state_dict())
'''
OrderedDict([('hidden.weight', tensor([[-0.0154, -0.3586, -0.3653, -0.2950],
        [ 0.2591, -0.2563,  0.3833,  0.1449]])), ('hidden.bias', tensor([0.1884, 0.3998])), ('output.weight', tensor([[-0.4805,  0.4077],
        [-0.0933,  0.0584],
        [ 0.3114,  0.6285]])), ('output.bias', tensor([-0.2552, -0.6520,  0.3290]))])
'''
```
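Rather than comparing printed state dicts, a reloaded model can be checked by comparing its outputs with the original's on the same input. A sketch that writes to a temporary directory (my choice, so no mlp.pth is left behind):

```python
import os
import tempfile
import torch
from torch import nn
from torch.nn import functional as F

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(4, 2)
        self.output = nn.Linear(2, 3)

    def forward(self, x):
        return self.output(F.relu(self.hidden(x)))

net = MLP()
X = torch.randn(size=(2, 4))
Y = net(X)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'mlp.pth')
    torch.save(net.state_dict(), path)
    clone = MLP()
    clone.load_state_dict(torch.load(path))
    clone.eval()  # good habit before inference (no effect on this MLP)

Y_clone = clone(X)
print(torch.equal(Y, Y_clone))  # True: same parameters, same computation
```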

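You can also save and load the parameters of a single layer; the other layers of the new model keep their own random initialization: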

```python
import torch
from torch import nn
from torch.nn import functional as F

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(4, 2)
        self.output = nn.Linear(2, 3)

    def forward(self, x):
        return self.output(F.relu(self.hidden(x)))

net = MLP()
X = torch.randn(size=(2, 4))
Y = net(X)
print(net.state_dict())
'''
OrderedDict([('hidden.weight', tensor([[-0.2937,  0.1589,  0.2349,  0.1130],
        [ 0.4170,  0.2699,  0.3760,  0.0201]])), ('hidden.bias', tensor([ 0.3914, -0.1185])), ('output.weight', tensor([[0.0884, 0.2572],
        [0.1547, 0.0164],
        [0.3386, 0.5151]])), ('output.bias', tensor([-0.5032, -0.2515, -0.4531]))])
'''
torch.save(net.hidden.state_dict(), 'mlp.pth')  # save only the hidden layer

net2 = MLP()
net2.hidden.load_state_dict(torch.load('mlp.pth'))
print(net2.state_dict())
'''
OrderedDict([('hidden.weight', tensor([[-0.2937,  0.1589,  0.2349,  0.1130],
        [ 0.4170,  0.2699,  0.3760,  0.0201]])), ('hidden.bias', tensor([ 0.3914, -0.1185])), ('output.weight', tensor([[ 0.2318,  0.3837],
        [ 0.2380,  0.6463],
        [-0.6014,  0.3717]])), ('output.bias', tensor([-0.3154, -0.0078, -0.2676]))])
'''
```
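An alternative to loading into net2.hidden is to load a partial dict into the whole model with strict=False (with the default strict=True, missing keys raise an error). The sketch assumes a hypothetical situation where only the 'hidden.' entries are available; load_state_dict then reports which keys it did not find.

```python
import torch
from torch import nn
from torch.nn import functional as F

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(4, 2)
        self.output = nn.Linear(2, 3)

    def forward(self, x):
        return self.output(F.relu(self.hidden(x)))

net = MLP()
# Hypothetical partial checkpoint: keep only the hidden-layer entries
hidden_only = {k: v for k, v in net.state_dict().items() if k.startswith('hidden.')}

net2 = MLP()
result = net2.load_state_dict(hidden_only, strict=False)
print(result.missing_keys)     # ['output.weight', 'output.bias']
print(result.unexpected_keys)  # []
```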

V. GPU Computing

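In PyTorch, the CPU and the GPU are represented by torch.device('cpu') and torch.device('cuda'). With multiple GPUs, torch.device(f'cuda:{i}') denotes the i-th GPU; 'cuda' refers to the current CUDA device, which is cuda:0 by default. You can list all available GPUs or pick a specific one: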

```python
import torch

def try_gpu(i=0):
    """Return cuda:i if it exists, otherwise the CPU."""
    if torch.cuda.device_count() >= i + 1:
        return torch.device(f'cuda:{i}')
    return torch.device('cpu')

def try_all_gpus():
    """Return all available GPUs, or [cpu] if there is none."""
    devices = [torch.device(f'cuda:{i}') for i in range(torch.cuda.device_count())]
    return devices if devices else [torch.device('cpu')]

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
```
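Once a device is chosen, tensors can be created on it directly with the device argument instead of being created on the CPU and copied. A minimal sketch reusing try_gpu from above; it falls back to the CPU on machines without a GPU, and operands of an operation must live on the same device.

```python
import torch

def try_gpu(i=0):
    """Return cuda:i if it exists, otherwise the CPU."""
    if torch.cuda.device_count() >= i + 1:
        return torch.device(f'cuda:{i}')
    return torch.device('cpu')

device = try_gpu()
x = torch.ones(2, 3, device=device)  # allocated directly on `device`
y = torch.rand(2, 3, device=device)
z = x + y                            # both operands on the same device
print(z.device)
```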

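Both tensors and models can be moved to the GPU with .to(device) (or .cuda()). Note the difference: for a tensor, .to(device) returns a tensor on the target device and leaves the original where it was, so the result must be reassigned; for a model, it moves the parameters in place. Calling .cuda() on a machine without a usable GPU raises an error, whereas .to(device) also works when device is the CPU: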

```python
import torch
from torch import nn

net = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 32))
X = torch.randn(size=(4, 2))
loss = nn.MSELoss()

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
net.to(device)          # modules are moved in place
X = X.to(device)        # tensors are not: reassign the returned tensor
loss = loss.to(device)  # MSELoss has no parameters; moving it is harmless
```
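The in-place versus copy distinction is easy to miss, so here is a small sketch that checks it: the original tensor stays on the CPU, while net.to(device) returns the very same module object with its parameters moved.

```python
import torch
from torch import nn

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

net = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 32))
X = torch.randn(size=(4, 2))

X_dev = X.to(device)       # tensor on `device`; X itself does not move
returned = net.to(device)  # same module object, parameters moved in place
out = net(X_dev)           # input and parameters now share a device
print(X.device, X_dev.device, next(net.parameters()).device, out.device)
```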