0. The Motivating Question
During backpropagation in a neural network, suppose the loss is a vector loss = [loss₁, loss₂, loss₃]. Why does ∂loss/∂w become a 3×3 matrix? Each component gradient ∂lossᵢ/∂w (that is, ∂loss₁/∂w, ∂loss₂/∂w, ∂loss₃/∂w) has the same dimension as loss, a 3-element vector, so stacking [∂loss₁/∂w, ∂loss₂/∂w, ∂loss₃/∂w] yields a 3×3 matrix (the Jacobian).
As shown below:
import torch

w = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
loss = w * 3
print("loss: \n", loss)

loss_m = []
for i, val in enumerate(loss):
    w.grad = None  # zero the gradient before each backward pass
    val.backward(retain_graph=True)
    print(f"∂loss{i+1}/∂w = {w.grad}")
    loss_m.append(w.grad.clone())
print("loss_m: \n", torch.stack(loss_m))
Output:
loss:
 tensor([3., 6., 9.], grad_fn=<MulBackward0>)
∂loss1/∂w = tensor([3., 0., 0.])
∂loss2/∂w = tensor([0., 3., 0.])
∂loss3/∂w = tensor([0., 0., 3.])
loss_m:
 tensor([[3., 0., 0.],
        [0., 3., 0.],
        [0., 0., 3.]])
loss: tensor([3., 6., 9.]) is a vector, so its derivative with respect to w is a matrix (the Jacobian).
However, w.grad must be a tensor with the same shape as w; it cannot store this Jacobian, which is why backward() is normally called on a scalar loss.
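If the full Jacobian is really what you want, PyTorch can compute it in one call with torch.autograd.functional.jacobian. Here is a minimal sketch reproducing the 3×3 matrix above; the wrapper function f is just for illustration and is not part of the original example:

import torch

def f(w):
    return w * 3  # same element-wise mapping as above: loss_i = 3 * w_i

w = torch.tensor([1.0, 2.0, 3.0])
# Returns d f_i / d w_j, a 3x3 matrix matching loss_m above
J = torch.autograd.functional.jacobian(f, w)
print(J)  # tensor([[3., 0., 0.], [0., 3., 0.], [0., 0., 3.]])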
1. Scalar Differentiation
import torch

w = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
loss = w * 3
print("loss: \n", loss)

loss_m = []
# Method 1: compute each component's gradient separately
for i, val in enumerate(loss):
    w.grad = None  # zero the gradient before each backward pass
    val.backward(retain_graph=True)
    print(f"∂loss{i+1}/∂w = {w.grad}")
    loss_m.append(w.grad.clone())
print("loss_m: \n", torch.stack(loss_m))

# Method 2: reduce loss to a scalar, then differentiate
grads = torch.autograd.grad(loss.sum(), w, retain_graph=True)
print("grads: \n", grads)
grads1 = torch.autograd.grad(loss.mean(), w)[0]
print("grads1: \n", grads1)
Output:
loss:
 tensor([3., 6., 9.], grad_fn=<MulBackward0>)
∂loss1/∂w = tensor([3., 0., 0.])
∂loss2/∂w = tensor([0., 3., 0.])
∂loss3/∂w = tensor([0., 0., 3.])
loss_m:
 tensor([[3., 0., 0.],
        [0., 3., 0.],
        [0., 0., 3.]])
grads:
 (tensor([3., 3., 3.]),)
grads1:
 tensor([1., 1., 1.])
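A side note, not from the original example: calling backward() directly on the vector loss requires a gradient argument, the vector v in the vector-Jacobian product vᵀJ. Passing a vector of ones is equivalent to loss.sum().backward(), which is why grads is [3., 3., 3.]; loss.mean() additionally divides by 3, giving grads1 = [1., 1., 1.]. A small sketch:

import torch

w = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
loss = w * 3
# Vector-Jacobian product with v = [1, 1, 1]; same result as loss.sum().backward()
loss.backward(gradient=torch.ones_like(loss))
print(w.grad)  # tensor([3., 3., 3.])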
A similar example:
import torch

# Ground-truth data for 3 samples
x = torch.tensor([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], requires_grad=True)
y_true = torch.tensor([1.0, 2.0, 3.0])

# Linear model: y = w1*x1 + w2*x2
w = torch.tensor([0.5, 0.5], requires_grad=True)
predictions = (x @ w)  # [1.5, 3.5, 5.5]
print("Predictions:", predictions)

# Compute each sample's gradient separately
individual_grads = []
for i in range(3):
    loss = (predictions[i] - y_true[i]) ** 2
    loss.backward(retain_graph=True)
    individual_grads.append(w.grad.clone())
    w.grad.zero_()

print("Sample 1 gradient:", individual_grads[0])
print("Sample 2 gradient:", individual_grads[1])
print("Sample 3 gradient:", individual_grads[2])

# Scalar gradient: aggregated automatically
total_loss = ((predictions - y_true) ** 2).mean()
total_loss.backward()

# Verify: the scalar gradient equals the average of the per-sample gradients
manual_average = (individual_grads[0] + individual_grads[1] + individual_grads[2]) / 3
print("Manual average:", manual_average)
print("Scalar result:", w.grad)
Output:
Predictions: tensor([1.5000, 3.5000, 5.5000], grad_fn=<MvBackward0>)
Sample 1 gradient: tensor([1., 2.])
Sample 2 gradient: tensor([ 9., 12.])
Sample 3 gradient: tensor([25., 30.])
Manual average: tensor([11.6667, 14.6667])
Scalar result: tensor([11.6667, 14.6667])
Training a neural network aims to minimize the overall loss, not to optimize each sample separately:
# Actual training: minimize the average loss over the batch
batch_loss = individual_losses.mean()  # scalar
batch_loss.backward()                  # average gradient
optimizer.step()                       # update toward the average-optimal direction
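For completeness, a minimal runnable sketch of this pattern; the model, optimizer, and data below are hypothetical placeholders, not part of the original snippet:

import torch

model = torch.nn.Linear(2, 1)                         # hypothetical tiny model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.tensor([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y_true = torch.tensor([[1.0], [2.0], [3.0]])

optimizer.zero_grad()
individual_losses = (model(x) - y_true) ** 2          # one loss per sample
batch_loss = individual_losses.mean()                 # reduce to a scalar
batch_loss.backward()                                 # gradient of the average loss
optimizer.step()                                      # update in the averaged direction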
2. When Do You Need Per-Sample (Vector) Gradients?
Only for research purposes, e.g. analyzing per-sample sensitivity:
def compute_sample_gradients(model, x, y):
    """For analysis only, not for training"""
    grads = []
    for xi, yi in zip(x, y):
        model.zero_grad()
        pred = model(xi.unsqueeze(0))
        loss = ((pred - yi) ** 2)
        loss.backward()
        grads.append(model.weight.grad.clone())
    return grads  # each sample's individual gradient
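One possible way to call it, assuming a bias-free nn.Linear(2, 1) so that model.weight plays the role of w from the earlier example (illustrative only; the resulting per-sample gradients match the ones above, just with shape [1, 2]):

import torch

model = torch.nn.Linear(2, 1, bias=False)
with torch.no_grad():
    model.weight.copy_(torch.tensor([[0.5, 0.5]]))  # same weights as w above

x = torch.tensor([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y_true = torch.tensor([1.0, 2.0, 3.0])

sample_grads = compute_sample_gradients(model, x, y_true)
for i, g in enumerate(sample_grads):
    print(f"Sample {i+1} gradient:", g)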