Introduction:
This chapter is a detailed summary of the steps involved in developing a neural network with PyTorch, looking at how each step evolved from its most primitive form to the way it is done today. The steps are listed first (a minimal skeleton of the full training loop follows the list):
- Build the model; the key part is defining the forward pass
- Choose a loss function and a learning rate (learning_rate)
- Compute the gradients of the model weights from the loss function
- Update the weights by gradient descent, then reset the weight gradients (e.g. a.grad = None)
- Repeat the loop for the next iteration
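For orientation, here is a minimal runnable sketch of how these five steps fit together. The toy linear fit, the layer, and the optimizer chosen here are just placeholders; each step is developed in detail in the sections below.

import torch

# Toy data: fit y = 3x + 1 with a single linear layer
x = torch.linspace(-1, 1, 100).unsqueeze(-1)
y = 3 * x + 1

model = torch.nn.Linear(1, 1)                              # step 1: model / forward pass
loss_fn = torch.nn.MSELoss()                               # step 2: loss function
optimizer = torch.optim.SGD(model.parameters(), lr=1e-1)   # step 2: learning rate

for t in range(500):
    y_pred = model(x)                                      # forward pass
    loss = loss_fn(y_pred, y)
    optimizer.zero_grad()                                  # step 4: reset weight gradients
    loss.backward()                                        # step 3: gradients of the loss
    optimizer.step()                                       # step 4: gradient-descent update
    # step 5: loop back for the next iteration

print(model.weight.item(), model.bias.item())              # values close to 3 and 1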
PyTorch model development in detail:
Following the steps above, we will implement a simple model and walk through the meaning of each step, progressing from the most primitive approach to the most current one.
1. Building the model from scratch
Problem statement: as the running example we fit y = sin(x) with a third-order polynomial. The network has four parameters and is trained with gradient descent, fitting the data by minimizing the Euclidean distance (the distance between two points in a multi-dimensional space) between the network output and the true output.
Model analysis:
- Input X: (-pi, pi)
- Output Y: sin(x)
- Model prediction Ypred: a + b*x + c*x^2 + d*x^3
- Loss function: (Ypred - Y)^2, summed over all sample points
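The training code below computes the gradients by hand, so it helps to write out the chain-rule derivation first (summing over all sample points x):

L     = sum( (Ypred - Y)^2 ),  with Ypred = a + b*x + c*x^2 + d*x^3
dL/da = sum( 2*(Ypred - Y) )
dL/db = sum( 2*(Ypred - Y) * x )
dL/dc = sum( 2*(Ypred - Y) * x^2 )
dL/dd = sum( 2*(Ypred - Y) * x^3 )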
import torch
import math

dtype = torch.float
device = torch.device("cpu")
# device = torch.device("cuda:0")  # Uncomment this to run on GPU

# Create random input and compute the corresponding output
x = torch.linspace(-math.pi, math.pi, 2000, device=device, dtype=dtype)
y = torch.sin(x)

# Randomly initialize the weights; use torch.manual_seed() if you want
# the same initial weights on every run
a = torch.randn((), device=device, dtype=dtype)
b = torch.randn((), device=device, dtype=dtype)
c = torch.randn((), device=device, dtype=dtype)
d = torch.randn((), device=device, dtype=dtype)

learning_rate = 1e-6
for t in range(2000):
    # Forward pass: compute predicted y
    y_pred = a + b * x + c * x ** 2 + d * x ** 3

    # Define the loss function and compute the loss value
    loss = (y_pred - y).pow(2).sum().item()
    if t % 100 == 99:
        print(t, loss)

    # Compute the gradient of the loss with respect to each weight by hand
    grad_y_pred = 2.0 * (y_pred - y)
    grad_a = grad_y_pred.sum()
    grad_b = (grad_y_pred * x).sum()
    grad_c = (grad_y_pred * x ** 2).sum()
    grad_d = (grad_y_pred * x ** 3).sum()
    # Derivation of the gradients of the loss:
    # loss   = (a + b*x + c*x**2 + d*x**3 - y).pow(2).sum()
    # dloss/da = 2 * (a + b*x + c*x**2 + d*x**3 - y)
    # dloss/db = 2 * (a + b*x + c*x**2 + d*x**3 - y) * x
    # dloss/dc = 2 * (a + b*x + c*x**2 + d*x**3 - y) * x**2
    # dloss/dd = 2 * (a + b*x + c*x**2 + d*x**3 - y) * x**3

    # Update the weights in the direction opposite to the gradient (gradient descent)
    a -= learning_rate * grad_a
    b -= learning_rate * grad_b
    c -= learning_rate * grad_c
    d -= learning_rate * grad_d

print(f'Result: y = {a.item()} + {b.item()} x + {c.item()} x^2 + {d.item()} x^3')
2. Optimizing the gradient computation step
In the example above we implemented both the forward and the backward pass of the network by hand. Hand-coding the backward pass is fine for a small network, but it quickly becomes unwieldy for large ones. Thankfully, we can use automatic differentiation (autograd) to compute the backward pass automatically. With autograd, the forward pass of the network defines a computational graph: the nodes of the graph are tensors, and the edges are the functions that produce output tensors from input tensors. Backpropagating through this graph then lets us compute gradients easily (a small standalone autograd sketch follows the list below).
Forward pass: the process of computing the prediction from the input layer to the output layer. Throughout this process the network's weights and biases are fixed; the goal is to compute the output for a given input.
- Input layer: the input data is fed into the network's input layer
- Hidden layers: the data is processed by the neurons of each layer. Each neuron takes the weighted sum of the outputs of all neurons in the previous layer as its input and produces its output after an activation function (such as ReLU, sigmoid, or tanh).
- Output layer: finally, the data passes through the last layer (the output layer) to produce the prediction
Backward pass (backpropagation): the process in which the network computes gradients from the loss function and updates its weights and biases
- Loss function: measures the error between the prediction and the ground truth, typically with a loss such as mean squared error (MSE) or cross-entropy
- Gradient computation: uses the chain rule to compute the gradient (derivative) of the loss with respect to the weights and biases of every layer
- Weight update: uses an optimization algorithm (gradient descent, Adam, etc.) to update the weights and biases
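As a standalone illustration of how autograd performs the gradient-computation step for us (a minimal sketch, independent of the polynomial model), marking a tensor with requires_grad=True is enough for backward() to fill in its gradient:

import torch

# f(w) = (3*w)^2 = 9*w^2, so df/dw = 18*w
w = torch.tensor(2.0, requires_grad=True)
f = (3 * w) ** 2      # the forward pass records the computational graph
f.backward()          # backpropagate through the recorded graph
print(w.grad)         # tensor(36.), i.e. 18 * 2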
import torch
import math

dtype = torch.float
device = torch.device("cpu")
# device = torch.device("cuda:0")  # Uncomment this to run on GPU

# Create random input and compute the corresponding output
x = torch.linspace(-math.pi, math.pi, 2000, device=device, dtype=dtype)
y = torch.sin(x)

# Randomly initialize the weights; use torch.manual_seed() if you want
# the same initial weights on every run.
# requires_grad=True tells autograd to track operations on these tensors.
a = torch.randn((), device=device, dtype=dtype, requires_grad=True)
b = torch.randn((), device=device, dtype=dtype, requires_grad=True)
c = torch.randn((), device=device, dtype=dtype, requires_grad=True)
d = torch.randn((), device=device, dtype=dtype, requires_grad=True)

learning_rate = 1e-6
for t in range(2000):
    # Forward pass: compute predicted y
    y_pred = a + b * x + c * x ** 2 + d * x ** 3

    # Define the loss function and compute the loss value.
    # Note: keep loss as a tensor (no .item()) so we can call backward() on it.
    loss = (y_pred - y).pow(2).sum()
    if t % 100 == 99:
        print(t, loss.item())

    # Use backward() to compute the gradients of the weights
    loss.backward()

    # Update the weights in the direction opposite to the gradient (gradient descent)
    with torch.no_grad():
        a -= learning_rate * a.grad
        b -= learning_rate * b.grad
        c -= learning_rate * c.grad
        d -= learning_rate * d.grad

        # Manually zero the gradients after updating weights
        a.grad = None
        b.grad = None
        c.grad = None
        d.grad = None

print(f'Result: y = {a.item()} + {b.item()} x + {c.item()} x^2 + {d.item()} x^3')
Note that once backward() has been used to compute the gradients, the weight update must be wrapped in torch.no_grad() to disable gradient tracking. By default, every operation on tensors with requires_grad=True is recorded in the computational graph for backpropagation; recording the update step as well would waste memory and corrupt the gradient computation.
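Both requirements — wrapping the update in torch.no_grad() and resetting .grad afterwards — can be seen in isolation with a minimal sketch (independent of the polynomial model):

import torch

a = torch.tensor(1.0, requires_grad=True)

# .grad accumulates across backward() calls unless it is cleared
(a * 2).backward()
print(a.grad)           # tensor(2.)
(a * 2).backward()
print(a.grad)           # tensor(4.): the new gradient is added, not overwritten

# In-place updates of a leaf tensor that requires grad must happen inside
# torch.no_grad(); with gradient tracking on, the same line raises a RuntimeError.
# a -= 0.1 * a.grad     # error outside of no_grad()
with torch.no_grad():
    a -= 0.1 * a.grad   # fine: the update is not recorded in the graph

a.grad = None           # reset the gradient before the next iteration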
3. Using nn.Module
Computational graphs and autograd are a very powerful paradigm for defining complex operators and taking derivatives automatically; however, for large neural networks, raw autograd can be a bit too low-level. In TensorFlow, packages such as Keras, TensorFlow-Slim, and TFLearn provide higher-level abstractions over raw computational graphs that are very useful for building neural networks.
In PyTorch, the nn package serves the same purpose. It provides a set of Modules that are roughly equivalent to neural network layers. A Module receives input tensors and computes output tensors, and it can also hold internal state such as tensors containing learnable parameters. The nn package also defines a set of useful loss functions commonly used when training neural networks.
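As a quick illustration of what a Module is (a minimal sketch, separate from the polynomial example), an nn.Linear layer maps input tensors to output tensors and stores its learnable weight and bias internally:

import torch

layer = torch.nn.Linear(3, 1)        # a Module computing y = x @ W.T + b
x = torch.randn(5, 3)                # batch of 5 samples with 3 features
y = layer(x)                         # calling the Module runs its forward pass
print(y.shape)                       # torch.Size([5, 1])
print(layer.weight.shape)            # torch.Size([1, 3]) -- learnable state
print(layer.bias.shape)              # torch.Size([1])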
import torch
import math

# Create Tensors to hold input and outputs.
x = torch.linspace(-math.pi, math.pi, 2000)
y = torch.sin(x)

# For this example, the output y is a linear function of (x, x^2, x^3), so
# we can consider it as a linear layer neural network. Let's prepare the
# tensor (x, x^2, x^3).
p = torch.tensor([1, 2, 3])
xx = x.unsqueeze(-1).pow(p)

# In the above code, x.unsqueeze(-1) has shape (2000, 1), and p has shape
# (3,); for this case, broadcasting semantics will apply to obtain a tensor
# of shape (2000, 3).

# Use the nn package to define our model as a sequence of layers. nn.Sequential
# is a Module which contains other Modules, and applies them in sequence to
# produce its output. The Linear Module computes output from input using a
# linear function, and holds internal Tensors for its weight and bias.
# The Flatten layer flattens the output of the linear layer to a 1D tensor,
# to match the shape of `y`.
model = torch.nn.Sequential(
    torch.nn.Linear(3, 1),
    torch.nn.Flatten(0, 1)
)

# The nn package also contains definitions of popular loss functions; in this
# case we will use Mean Squared Error (MSE) as our loss function.
loss_fn = torch.nn.MSELoss(reduction='sum')

learning_rate = 1e-6
for t in range(2000):
    # Forward pass: compute predicted y by passing x to the model. Module objects
    # override the __call__ operator so you can call them like functions. When
    # doing so you pass a Tensor of input data to the Module and it produces
    # a Tensor of output data.
    y_pred = model(xx)

    # Compute and print loss. We pass Tensors containing the predicted and true
    # values of y, and the loss function returns a Tensor containing the loss.
    loss = loss_fn(y_pred, y)
    if t % 100 == 99:
        print(t, loss.item())

    # Zero the gradients before running the backward pass.
    model.zero_grad()

    # Backward pass: compute gradient of the loss with respect to all the learnable
    # parameters of the model. Internally, the parameters of each Module are stored
    # in Tensors with requires_grad=True, so this call will compute gradients for
    # all learnable parameters in the model.
    loss.backward()

    # Update the weights using gradient descent. Each parameter is a Tensor, so
    # we can access its gradients like we did before.
    with torch.no_grad():
        for param in model.parameters():
            param -= learning_rate * param.grad

# You can access the first layer of `model` like accessing the first item of a list
linear_layer = model[0]

# For the linear layer, its parameters are stored as `weight` and `bias`.
print(f'Result: y = {linear_layer.bias.item()} + {linear_layer.weight[:, 0].item()} x + {linear_layer.weight[:, 1].item()} x^2 + {linear_layer.weight[:, 2].item()} x^3')
4. Using an optimizer
So far we have updated the model weights by manually mutating the tensors that hold the learnable parameters inside torch.no_grad(). This is not a huge burden for simple optimization algorithms like stochastic gradient descent, but in practice we often train neural networks with more sophisticated optimizers such as AdaGrad, RMSProp, or Adam.
The optim package in PyTorch abstracts the idea of an optimization algorithm and provides implementations of commonly used optimizers.
Below we use the same model as before, but optimize it with the RMSprop algorithm from the optim package:
import torch
import math

# Create Tensors to hold input and outputs.
x = torch.linspace(-math.pi, math.pi, 2000)
y = torch.sin(x)

# Prepare the input tensor (x, x^2, x^3).
p = torch.tensor([1, 2, 3])
xx = x.unsqueeze(-1).pow(p)

# Use the nn package to define our model and loss function.
model = torch.nn.Sequential(
    torch.nn.Linear(3, 1),
    torch.nn.Flatten(0, 1)
)
loss_fn = torch.nn.MSELoss(reduction='sum')

# Use the optim package to define an Optimizer that will update the weights of
# the model for us. Here we will use RMSprop; the optim package contains many other
# optimization algorithms. The first argument to the RMSprop constructor tells the
# optimizer which Tensors it should update.
learning_rate = 1e-3
optimizer = torch.optim.RMSprop(model.parameters(), lr=learning_rate)

for t in range(2000):
    # Forward pass: compute predicted y by passing x to the model.
    y_pred = model(xx)

    # Compute and print loss.
    loss = loss_fn(y_pred, y)
    if t % 100 == 99:
        print(t, loss.item())

    # Before the backward pass, use the optimizer object to zero all of the
    # gradients for the variables it will update (which are the learnable
    # weights of the model). This is because by default, gradients are
    # accumulated in buffers (i.e. not overwritten) whenever .backward()
    # is called. Check out the docs of torch.autograd.backward for more details.
    optimizer.zero_grad()

    # Backward pass: compute gradient of the loss with respect to model parameters
    loss.backward()

    # Calling the step function on an Optimizer makes an update to its parameters
    optimizer.step()

linear_layer = model[0]
print(f'Result: y = {linear_layer.bias.item()} + {linear_layer.weight[:, 0].item()} x + {linear_layer.weight[:, 1].item()} x^2 + {linear_layer.weight[:, 2].item()} x^3')
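Switching to another optimizer from the optim package only changes the constructor call; the zero_grad/backward/step loop stays the same. For example (a sketch; 1e-3 is just a typical starting learning rate for Adam and may need tuning for this problem):

# optimizer = torch.optim.RMSprop(model.parameters(), lr=learning_rate)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)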
5. Defining a custom nn.Module
Section 3 showed that the nn package already ships with a variety of network layers. But what if we want to define our own layer, or a model made up of several layers?
In that case we can define our own modules by subclassing nn.Module and implementing a forward function that receives input tensors and produces output tensors using other modules or autograd operations on tensors (a sketch of a multi-layer custom module follows the example below).
import torch
import math

class Polynomial3(torch.nn.Module):
    def __init__(self):
        """
        In the constructor we instantiate four parameters and assign them as
        member parameters.
        """
        super().__init__()
        self.a = torch.nn.Parameter(torch.randn(()))
        self.b = torch.nn.Parameter(torch.randn(()))
        self.c = torch.nn.Parameter(torch.randn(()))
        self.d = torch.nn.Parameter(torch.randn(()))

    def forward(self, x):
        """
        In the forward function we accept a Tensor of input data and we must return
        a Tensor of output data. We can use Modules defined in the constructor as
        well as arbitrary operators on Tensors.
        """
        return self.a + self.b * x + self.c * x ** 2 + self.d * x ** 3

    def string(self):
        """
        Just like any class in Python, you can also define custom methods on PyTorch modules.
        """
        return f'y = {self.a.item()} + {self.b.item()} x + {self.c.item()} x^2 + {self.d.item()} x^3'

# Create Tensors to hold input and outputs.
x = torch.linspace(-math.pi, math.pi, 2000)
y = torch.sin(x)

# Construct our model by instantiating the class defined above
model = Polynomial3()

# Construct our loss function and an Optimizer. The call to model.parameters()
# in the SGD constructor will contain the learnable parameters (defined
# with torch.nn.Parameter) which are members of the model.
criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(model.parameters(), lr=1e-6)

for t in range(2000):
    # Forward pass: Compute predicted y by passing x to the model
    y_pred = model(x)

    # Compute and print loss
    loss = criterion(y_pred, y)
    if t % 100 == 99:
        print(t, loss.item())

    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(f'Result: {model.string()}')
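To answer the second half of the question above — a custom module that itself contains several layers — the same pattern applies: instantiate sub-modules in __init__ and call them in forward. The two-layer network below is only a sketch (the class name, layer sizes, and activation are illustrative, not part of the original example):

import torch

class TwoLayerNet(torch.nn.Module):
    def __init__(self, d_in, d_hidden, d_out):
        super().__init__()
        # Sub-modules assigned here are registered automatically,
        # so model.parameters() will include their weights and biases
        self.linear1 = torch.nn.Linear(d_in, d_hidden)
        self.linear2 = torch.nn.Linear(d_hidden, d_out)

    def forward(self, x):
        h = torch.relu(self.linear1(x))   # hidden layer with a ReLU activation
        return self.linear2(h)

# It plugs into the same training loop as Polynomial3, e.g.:
# model = TwoLayerNet(3, 16, 1)
# optimizer = torch.optim.SGD(model.parameters(), lr=1e-6)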
OK! That covers each step of building a model, from the bare shell to the finished interior. Of course, our example is only meant to show what the basic steps of building a model look like. Developing a model for a real business problem involves much more to learn; everyone is welcome to learn together, point out mistakes, and share.