PyTorch Tensors and Loss Functions

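Table of Contents

Tensors
- Tensor Types
- Tensor Examples
- Creating Tensors from Probability Distributions
  - Creating Tensors from a Normal Distribution (torch.normal)
  - Normal Distribution Examples
  - Creating Tensors from the Standard Normal Distribution
  - Standard Normal Distribution Examples
  - Creating Tensors from a Uniform Distribution
  - Uniform Distribution Examples
Activation Functions
- Common Activation Functions
Loss Functions (PyTorch API)
- L1 Loss
- Mean Squared Error Loss
- Cross-Entropy Loss
- Cosine Similarity Loss
  - Cosine Similarity of Two Vectors
  - Cosine Similarity of Two Matrices (Row-wise)
  - Cosine Similarity of Two Batches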

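Tensors

Tensor Types

A tensor is a multidimensional array, and each of its directions is called a mode. The order of a tensor is its number of dimensions: a first-order tensor is a vector, a second-order tensor is a matrix, and tensors of order three or higher are collectively called higher-order tensors.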
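To make the idea of order concrete, the short sketch below (with illustrative tensors that are not from the original) reads the order of a tensor from its `ndim` attribute; an order-0 tensor is a scalar.

```python
import torch

scalar = torch.tensor(3.0)              # order 0: a scalar
vector = torch.tensor([1.0, 2.0, 3.0])  # order 1: a vector
matrix = torch.ones(2, 3)               # order 2: a matrix
cube = torch.zeros(2, 3, 4)             # order 3: a higher-order tensor

# ndim is the number of dimensions (the order); shape gives the size of each mode
for name, t in [("scalar", scalar), ("vector", vector), ("matrix", matrix), ("cube", cube)]:
    print(name, t.ndim, tuple(t.shape))
```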

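Tensor is PyTorch's basic data structure and appears in code as torch.Tensor. Its main attributes are listed below (the first four describe the data, the last four relate to gradient computation):
- data: the wrapped tensor data.
- dtype: the tensor's data type.
- shape: the tensor's shape (its size along each dimension).
- device: the device that holds the tensor (CPU or GPU), which is key to accelerating computation.
- grad: the gradient of data.
- grad_fn: the Function that created the tensor (the key to automatic differentiation).
- requires_grad: whether gradients need to be computed for this tensor.
- is_leaf: whether the tensor is a leaf node of the computation graph.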
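As a minimal sketch of the four gradient-related attributes (the tensor names are illustrative), the snippet below contrasts a leaf tensor created directly by the user with a non-leaf tensor produced by an operation. By default, backward() fills in .grad only for leaf tensors that have requires_grad=True.

```python
import torch

# Leaf tensor: created directly, with gradient tracking switched on
w = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
# Non-leaf tensor: the result of an operation, so it records a grad_fn
v = w * 2
loss = v.sum()
loss.backward()

print(w.is_leaf, w.grad_fn)  # True None
print(v.is_leaf, v.grad_fn)  # False <MulBackward0 object at ...>
print(w.grad)                # tensor([2., 2., 2.]), since d(sum(2w))/dw = 2
```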

torch.dtype is the object that represents the data type of a torch.Tensor. The 9 most commonly used data types are listed below (PyTorch also supports others, such as torch.bfloat16 and complex types):

Data type	dtype	CPU tensor type	GPU tensor type
32-bit floating point	`torch.float32` or `torch.float`	`torch.FloatTensor`	`torch.cuda.FloatTensor`
64-bit floating point	`torch.float64` or `torch.double`	`torch.DoubleTensor`	`torch.cuda.DoubleTensor`
16-bit floating point	`torch.float16` or `torch.half`	`torch.HalfTensor`	`torch.cuda.HalfTensor`
8-bit unsigned integer	`torch.uint8`	`torch.ByteTensor`	`torch.cuda.ByteTensor`
8-bit signed integer	`torch.int8`	`torch.CharTensor`	`torch.cuda.CharTensor`
16-bit signed integer	`torch.int16` or `torch.short`	`torch.ShortTensor`	`torch.cuda.ShortTensor`
32-bit signed integer	`torch.int32` or `torch.int`	`torch.IntTensor`	`torch.cuda.IntTensor`
64-bit signed integer	`torch.int64` or `torch.long`	`torch.LongTensor`	`torch.cuda.LongTensor`
Boolean	`torch.bool`	`torch.BoolTensor`	`torch.cuda.BoolTensor`

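Floating-point tensors default to torch.float32.
Integer tensors default to torch.int64.
The boolean type stores True/False values.
GPU tensor types can only be used in a CUDA environment.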
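A quick check of the defaults listed above, assuming the global default dtype has not been changed with torch.set_default_dtype. Calling Tensor.type() with no argument returns the legacy type name shown in the CPU column of the table.

```python
import torch

print(torch.tensor([1.0, 2.0]).dtype)     # torch.float32 (default floating-point type)
print(torch.tensor([1, 2]).dtype)         # torch.int64 (default integer type)
print(torch.tensor([True, False]).dtype)  # torch.bool

# Explicit dtype on creation, then conversion with .to()
h = torch.tensor([1, 2], dtype=torch.float16)
d = h.to(torch.float64)
print(h.dtype, d.dtype)                   # torch.float16 torch.float64

# Legacy CPU tensor type name, matching the table
print(torch.tensor([1.0]).type())         # torch.FloatTensor
```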

Tensor Examples

import torch
import numpy as np
# 1. Create tensors
x = torch.tensor([[1, 2], [3, 4.]])  # the float literal 4. makes the whole tensor float32
print("Tensor x:\n", x)
y = torch.tensor(np.ones((3, 3)))  # a NumPy float64 array keeps dtype float64
print("Tensor y:\n", y)

Tensor x:tensor([[1., 2.],[3., 4.]])
Tensor y:tensor([[1., 1., 1.],[1., 1., 1.],[1., 1., 1.]], dtype=torch.float64)

# 2. Inspect tensor attributes
print("\nTensor attributes:")
print("data:", x.data)        # the wrapped tensor
print("dtype:", x.dtype)      # data type: torch.float32
print("shape:", x.shape)      # shape: torch.Size([2, 2])
print("device:", x.device)    # device: cpu
print("requires_grad:", x.requires_grad)  # whether gradients are tracked: False
print("is_leaf:", x.is_leaf)  # whether it is a leaf node: True

Tensor attributes:
data: tensor([[1., 2.],[3., 4.]])
dtype: torch.float32
shape: torch.Size([2, 2])
device: cpu
requires_grad: False
is_leaf: True

# 3. Set requires_grad=True so that operations on x are tracked
x = torch.tensor([[1., 2], [3, 4]], device='cpu', requires_grad=True)
print("\nx after setting requires_grad=True:", x)

x after setting requires_grad=True: tensor([[1., 2.],[3., 4.]], requires_grad=True)

# 4. Perform some operations
y = x + 2
z = y * y * 3
out = z.mean()
print("\nComputation:")
print("y = x + 2:\n", y)
print("z = y * y * 3:\n", z)
print("out = z.mean():", out)

Computation:
y = x + 2:tensor([[3., 4.],[5., 6.]], grad_fn=<AddBackward0>)
z = y * y * 3:tensor([[ 27.,  48.],[ 75., 108.]], grad_fn=<MulBackward0>)
out = z.mean(): tensor(64.5000, grad_fn=<MeanBackward0>)

# 5. Backpropagate to compute gradients
out.backward()
print("\nGradients:")
print("x.grad:\n", x.grad)  # d(out)/dx = 6 * (x + 2) / 4 = 1.5 * (x + 2)

Gradients:
x.grad:tensor([[4.5000, 6.0000],[7.5000, 9.0000]])

# 6. Inspect grad_fn
print("\nGradient functions:")
print("y.grad_fn:", y.grad_fn)  # <AddBackward0>
print("z.grad_fn:", z.grad_fn)  # <MulBackward0>
print("out.grad_fn:", out.grad_fn)  # <MeanBackward0>

Gradient functions:
y.grad_fn: <AddBackward0 object at 0x0000025AD0B28670>
z.grad_fn: <MulBackward0 object at 0x0000025AD0B919A0>
out.grad_fn: <MeanBackward0 object at 0x0000025AD0B28670>

# 7. Device management
if torch.cuda.is_available():
    device = torch.device("cuda")
    x_cuda = x.to(device)
    print("\nGPU Tensor:")
    print("x_cuda device:", x_cuda.device)
else:
    print("\nCUDA is not available")

GPU Tensor:
x_cuda device: cuda:0

# 8. Data type conversion
x_int = x.int()
print("\nData type conversion:")
print("x_int dtype:", x_int.dtype)  # torch.int32

Data type conversion:
x_int dtype: torch.int32

Creating Tensors from Probability Distributions

Creating Tensors from a Normal Distribution (torch.normal)

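torch.normal() creates a tensor by drawing random numbers from separate normal distributions whose means and standard deviations are given as arguments.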

torch.normal(mean, std, size=None, out=None)

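mean (Tensor/float): the mean(s) of the normal distribution (a scalar or a tensor)
std (Tensor/float): the standard deviation(s) of the normal distribution (a scalar or a tensor)
size (tuple): the shape of the output tensor (required only when mean and std are both scalars)
out (Tensor): optional output tensor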

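The four supported argument combinations are:
Mode 1: scalar mean and scalar std
Mode 2: tensor mean, scalar std
Mode 3: scalar mean, tensor std
Mode 4: tensor mean and tensor std (with the same shape)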

Normal Distribution Examples

import torch
# Mode 1: scalar mean and std (size is required)
normal_tensor1 = torch.normal(mean=0.0, std=1.0, size=(2, 2))
print("Scalar parameters:\n", normal_tensor1)
# Mode 2: tensor mean + scalar std (the output takes the shape of mean)
mean_tensor = torch.arange(1, 5, dtype=torch.float)
normal_tensor2 = torch.normal(mean=mean_tensor, std=1.0)
print("\nTensor mean:\n", normal_tensor2)
# Mode 4: tensor mean + tensor std
std_tensor = torch.linspace(0.1, 0.4, steps=4)
normal_tensor3 = torch.normal(mean=mean_tensor, std=std_tensor)
print("\nTensor mean and tensor std:\n", normal_tensor3)

Scalar parameters:
tensor([[-1.5585,  0.2315],[-1.5771, -0.0783]])
Tensor mean:
tensor([0.9710, 1.2523, 3.6285, 4.2808])
Tensor mean and tensor std:
tensor([1.0566, 2.1025, 3.1653, 3.3020])
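The example above covers modes 1, 2 and 4. The sketch below, with illustrative values, adds mode 3 and a rough sanity check that the sample statistics approach the requested parameters; the tolerance one accepts for such a check is a judgment call.

```python
import torch

torch.manual_seed(0)  # fix the seed so the run is repeatable

# Mode 3: scalar mean + tensor std; the output takes the shape of std
std_tensor = torch.tensor([0.1, 1.0, 10.0])
samples = torch.normal(mean=0.0, std=std_tensor)
print(samples.shape)  # torch.Size([3])

# With many draws, the sample mean and std approach the requested 2.0 and 0.5
big = torch.normal(mean=2.0, std=torch.full((10000,), 0.5))
print(big.mean().item(), big.std().item())
```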

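Creating Tensors from the Standard Normal Distribution

torch.randn: samples from the standard normal distribution (mean 0, variance 1)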

torch.randn(*size, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False)

size (tuple): a sequence of integers defining the shape of the tensor
dtype (torch.dtype): the data type (e.g. torch.float32)
device (torch.device): the device ('cpu' or 'cuda')
requires_grad (bool): whether to enable gradient tracking
torch.randn_like

torch.randn_like(input, dtype=None, layout=None, device=None, requires_grad=False)

input (Tensor): the reference tensor; the result has its shape and, unless overridden, its dtype and device

Standard Normal Distribution Examples

import torch
# Basic usage
randn_tensor = torch.randn(3, 4, dtype=torch.float64)
print("Standard normal tensor:\n", randn_tensor)
# Create a tensor with the same shape as an existing one
base_tensor = torch.empty(2, 3)
randn_like_tensor = torch.randn_like(base_tensor)
print("\nSame shape as base_tensor:\n", randn_like_tensor)
# Create a GPU tensor (requires a CUDA environment)
if torch.cuda.is_available():
    gpu_tensor = torch.randn(3, 3, device='cuda')
    print("\nGPU tensor:", gpu_tensor.device)

Standard normal tensor:
tensor([[-0.3266, -0.9314,  0.1892, -0.3418],[ 0.4397, -1.2986, -0.7380, -0.6443],[ 0.7485,  0.4076, -0.6021, -0.9000]], dtype=torch.float64)
Same shape as base_tensor:
tensor([[-0.8994,  0.5934, -1.3246],[-0.1019,  0.8172, -1.3164]])
GPU tensor: cuda:0
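A minimal CPU-only sketch of two points about the functions above: torch.randn_like inherits the dtype of its reference tensor as well as its shape, and a large standard normal sample should have a mean near 0 and a standard deviation near 1. The reference tensor and sample size are illustrative.

```python
import torch

torch.manual_seed(0)
ref = torch.zeros(2, 3, dtype=torch.float64)
r = torch.randn_like(ref)  # same shape as ref, and dtype float64 inherited from it
print(r.shape, r.dtype)    # torch.Size([2, 3]) torch.float64

# Sanity check on a large sample drawn from N(0, 1)
big = torch.randn(100_000)
print(big.mean().item(), big.std().item())
```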

Creating Tensors from a Uniform Distribution

torch.rand: samples from the uniform distribution on [0, 1)

torch.rand(*size, out=None, dtype=None, layout=torch.strided, device=None,requires_grad=False) → Tensor

torch.rand_like

torch.rand_like(input, dtype=None, layout=None, device=None, requires_grad=False)

Uniform Distribution Examples

import torch
# Basic uniform distribution on [0, 1)
uniform_tensor = torch.rand(2, 2)
print("Uniform tensor:\n", uniform_tensor)
# Uniform distribution on [a, b) via the linear transform a + (b - a) * U
a, b = 5, 10
scaled_tensor = a + (b - a) * torch.rand(3, 3)
print("\nTensor on [5, 10):\n", scaled_tensor)
# Uniform integers in [low, high) with torch.randint
int_tensor = torch.randint(low=0, high=10, size=(4,))
print("\nUniform integers:\n", int_tensor)

Uniform tensor:
tensor([[0.4809, 0.6847],[0.9278, 0.9965]])
Tensor on [5, 10):
tensor([[8.6137, 5.9940, 7.2302],[5.1680, 7.0532, 5.9403],[8.3315, 6.1549, 8.5181]])
Uniform integers:
tensor([8, 5, 9, 6])
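The example above does not use torch.rand_like. The sketch below shows it, together with Tensor.uniform_(a, b), an in-place alternative to the linear transform a + (b - a) * torch.rand(...). The reference tensor and the bounds are illustrative.

```python
import torch

torch.manual_seed(0)
ref = torch.empty(2, 3, dtype=torch.float64)
u = torch.rand_like(ref)  # same shape/dtype/device as ref, values in [0, 1)
print(u.shape, u.dtype)   # torch.Size([2, 3]) torch.float64

# Fill a tensor in place with samples from the uniform distribution on [a, b]
a, b = 5.0, 10.0
scaled = torch.empty(3, 3).uniform_(a, b)
print(scaled.min().item() >= a, scaled.max().item() <= b)  # True True
```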

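Activation Functions

An activation function is a function applied at the neurons of a neural network: it maps a neuron's input to its output and supplies the nonlinearity the network needs.

Common Activation Functions

See Part 5 of the Systematic Deep Learning Study series, "Deep Learning Fundamentals".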
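As a minimal illustration of the definition above (the input values are arbitrary), the snippet applies three widely used activation functions element-wise and checks that the module form nn.ReLU matches the function torch.relu.

```python
import torch
import torch.nn as nn

x = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0])
relu_out = torch.relu(x)        # max(0, x): negative inputs become 0
sigmoid_out = torch.sigmoid(x)  # 1 / (1 + exp(-x)), values in (0, 1)
tanh_out = torch.tanh(x)        # values in (-1, 1)
print(relu_out, sigmoid_out, tanh_out, sep="\n")

# Module form, as used inside nn.Sequential
act = nn.ReLU()
print(torch.equal(act(x), relu_out))  # True
```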

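Loss Functions (PyTorch API)

In supervised learning, a loss function measures the discrepancy between a sample's true value and the model's prediction, and its value is commonly used to evaluate model performance. Supervised learning algorithms all rely on loss functions, and algorithms for different application scenarios use different ones; even within the same scenario, different loss functions can assess the same sample differently.
Whether the loss function is chosen appropriately directly determines how well a supervised learning algorithm predicts.
In PyTorch, loss functions are provided in the torch.nn package.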

L1 Loss

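The L1 loss, L1Loss, is the mean absolute error between predictions and true values: it computes the absolute difference between the model's prediction (output) and the target, and can return either a tensor of the same shape or a scalar.
$\mathrm{loss}(x,y)=\frac{1}{N}\sum_{i=1}^{N}|x_i-y_i|$

torch.nn.L1Loss(size_average=None, reduce=None, reduction='mean')

size_average: (deprecated; use reduction instead) if True, the returned loss is the mean; if False, it is the sum of the per-sample losses.
reduce: (deprecated; use reduction instead) whether to return a scalar; defaults to True.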

import torch
import torch.nn as nn
loss = nn.L1Loss(reduction='mean')
input=torch.tensor([1.0,2.0,3.0,4.0])
target=torch.tensor([4.0,5.0,6.0,7.0])
output=loss(input,target)
print(output) # tensor(3.)

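Both inputs must have the same dtype. reduction takes one of three values: 'none' returns the unreduced per-element losses (same shape as the input), 'sum' returns their sum, and 'mean' returns their mean.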
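A minimal sketch of the three reduction modes. The target is changed slightly from the example above (illustrative values) so that the modes produce different numbers, and the 'mean' result is checked against the formula.

```python
import torch
import torch.nn as nn

pred = torch.tensor([1.0, 2.0, 3.0, 4.0])
target = torch.tensor([4.0, 6.0, 6.0, 7.0])

per_elem = nn.L1Loss(reduction='none')(pred, target)   # |x_i - y_i|, same shape as input
total = nn.L1Loss(reduction='sum')(pred, target)
mean_loss = nn.L1Loss(reduction='mean')(pred, target)

print(per_elem)   # tensor([3., 4., 3., 3.])
print(total)      # tensor(13.)
print(mean_loss)  # tensor(3.2500)

# 'mean' matches the formula (1/N) * sum |x_i - y_i|
manual = (pred - target).abs().mean()
print(torch.allclose(mean_loss, manual))  # True
```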

Mean Squared Error Loss

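The mean squared error loss, MSELoss, is the mean of the squared differences between predictions and true values: it computes the squared difference between the model's prediction (output) and the target, and can return either a tensor of the same shape or a scalar.
$\mathrm{loss}(x,y)=\frac{1}{N}\sum_{i=1}^{N}(x_i-y_i)^2$

torch.nn.MSELoss(size_average=None, reduce=None, reduction='mean')

reduce: (deprecated; use reduction instead) whether to return a scalar; defaults to True.
size_average: (deprecated; use reduction instead) effective only when reduce=True. If True, the returned loss is the mean; if False, it is the sum of the per-sample losses.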

import torch
import torch.nn as nn
loss=nn.MSELoss(reduction='mean')
input=torch.tensor([1.0,2.0,3.0,4.0])
target=torch.tensor([4.0,5.0,6.0,7.0])
output=loss(input,target)
print(output) # tensor(9.)
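A minimal check of the MSE formula against nn.MSELoss and its functional form F.mse_loss, followed by an illustrative comparison (made-up values) showing that squaring penalizes a single large error more heavily than the L1 loss does.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

pred = torch.tensor([1.0, 2.0, 3.0, 4.0])
target = torch.tensor([4.0, 5.0, 6.0, 7.0])

mse = nn.MSELoss(reduction='mean')(pred, target)
manual = ((pred - target) ** 2).mean()  # (1/N) * sum (x_i - y_i)^2
functional = F.mse_loss(pred, target)   # functional form of the same loss
print(mse, manual, functional)          # tensor(9.) tensor(9.) tensor(9.)

# Errors of 1, 1, 1 and 5: L1 averages them linearly, MSE is dominated by the 5
pred2 = torch.tensor([1.0, 1.0, 1.0, 5.0])
zeros = torch.zeros(4)
print(F.l1_loss(pred2, zeros), F.mse_loss(pred2, zeros))  # tensor(2.) tensor(7.)
```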

Cross-Entropy Loss

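The cross-entropy loss (CrossEntropyLoss) combines nn.LogSoftmax() and nn.NLLLoss() in one class and is very useful for training classifiers. Because the softmax is applied internally, its input should be raw, unnormalized scores (logits) rather than probabilities.
Cross-entropy measures how close the actual output is to the expected output; that is, it quantifies the difference between the network's output and the label, and this difference is backpropagated to update the network's parameters. More precisely, cross-entropy describes the distance between the actual and expected output probability distributions: the smaller the cross-entropy, the closer the two distributions. With p as the expected output distribution and q as the actual output distribution:
$H(p,q)=-\sum_x p(x)\log q(x)$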

torch.nn.CrossEntropyLoss(weight=None, size_average=None,ignore_index=-100,reduce=None,reduction='mean')

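weight (Tensor): a 1D tensor with one weight per class (C elements for C classes); very useful when the training samples are highly imbalanced. Defaults to None.
size_average: (deprecated; use reduction instead) effective only when reduce=True. If True, the returned loss is the mean; if False, it is the sum of the per-sample losses.
ignore_index: a target value to ignore; samples with this target contribute no loss and, when averaging, are excluded from the denominator.
reduce: (deprecated; use reduction instead) whether to return a scalar; defaults to True.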

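import torch
import torch.nn as nn
entropy = nn.CrossEntropyLoss(reduction='mean')
input = torch.tensor([[-0.011, -0.022, -0.033, -0.044]])  # raw scores (logits), shape (N=1, C=4)
target = torch.tensor([0])  # class index of the single sample
output = entropy(input, target)
print(output)  # tensor(1.3699)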
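To connect the class with the formula, the sketch below computes the same loss three ways on the example's input: with nn.CrossEntropyLoss, with the LogSoftmax + NLLLoss decomposition (through the functional forms F.log_softmax and F.nll_loss), and directly from H(p, q) with p taken as a one-hot distribution on the target class.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.tensor([[-0.011, -0.022, -0.033, -0.044]])  # shape (N=1, C=4)
target = torch.tensor([0])                                 # class index per sample

ce = nn.CrossEntropyLoss()(logits, target)

# Two-step decomposition: log-softmax over classes, then negative log-likelihood
two_step = F.nll_loss(F.log_softmax(logits, dim=1), target)

# H(p, q) = -sum_x p(x) log q(x), with p one-hot and q = softmax(logits)
p = F.one_hot(target, num_classes=4).float()
q = F.softmax(logits, dim=1)
manual = -(p * q.log()).sum(dim=1).mean()

print(ce.item(), two_step.item(), manual.item())  # all approximately 1.3699
```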

Cosine Similarity Loss

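A cosine similarity loss (in PyTorch, nn.CosineEmbeddingLoss) measures how similar two vectors are, and a model can be optimized by maximizing this similarity for pairs that should match. With label y = 1 for similar pairs and y = -1 for dissimilar pairs: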
$\mathrm{loss}(x_1,x_2,y)=\begin{cases}1-\cos(x_1,x_2), & y=1\\ \max(0,\cos(x_1,x_2)-\mathrm{margin}), & y=-1\end{cases}$
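A minimal sketch of nn.CosineEmbeddingLoss on two illustrative pairs, one labelled similar (y = 1) and one labelled dissimilar (y = -1), with margin = 0. The manual computation applies the piecewise formula to the row-wise cosine, computed here by hand as a dot product divided by the product of norms.

```python
import torch
import torch.nn as nn

x1 = torch.tensor([[1.0, 0.0], [1.0, 0.0]])
x2 = torch.tensor([[1.0, 1.0], [1.0, 1.0]])
y = torch.tensor([1.0, -1.0])  # 1: pair should be similar, -1: pair should be dissimilar
margin = 0.0

loss_fn = nn.CosineEmbeddingLoss(margin=margin, reduction='none')
loss = loss_fn(x1, x2, y)  # one loss value per pair

# Manual version of the piecewise formula
cos = (x1 * x2).sum(dim=1) / (x1.norm(dim=1) * x2.norm(dim=1))  # both rows: 1/sqrt(2)
manual = torch.where(y == 1, 1 - cos, torch.clamp(cos - margin, min=0.0))
print(loss, manual)  # both approximately tensor([0.2929, 0.7071])
```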
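torch.nn.functional.cosine_similarity is the PyTorch function for computing the cosine similarity between two tensors. Cosine similarity measures how closely two vectors point in the same direction; it ranges over [-1, 1], and larger values mean more similar directions.

torch.nn.functional.cosine_similarity(x1, x2, dim=1, eps=1e-8)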

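Parameter	Type	Description
`x1`	`Tensor`	The first input tensor
`x2`	`Tensor`	The second input tensor
`dim`	`int`	The dimension along which similarity is computed. The default `dim=1` compares, for inputs of shape (N, D), the feature vectors of each pair of corresponding samples.
`eps`	`float`	A small value that prevents division by zero when a vector's L2 norm is 0; defaults to `1e-8`.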
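A minimal check that the function computes the usual formula cos(x1, x2) = (x1 · x2) / (||x1|| ||x2||) along dim. The eps parameter only matters when a norm is close to zero, so the manual version below leaves it out.

```python
import torch
import torch.nn.functional as F

a = torch.tensor([[1.0, 2.0, 3.0]])
b = torch.tensor([[4.0, 5.0, 6.0]])

# Dot product along dim=1 divided by the product of the L2 norms
manual = (a * b).sum(dim=1) / (a.norm(dim=1) * b.norm(dim=1))
builtin = F.cosine_similarity(a, b, dim=1)
print(manual, builtin)  # both approximately tensor([0.9746])
```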

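Common uses

Text/image similarity (e.g. contrastive learning and retrieval tasks).
Loss design (e.g. minimizing 1 - cosine_similarity to reduce directional differences).
Feature matching (e.g. comparing two embedding vectors).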
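For the loss-design use listed above, a minimal sketch with hypothetical random embeddings: 1 - cosine_similarity is averaged over the batch and backpropagated. Since cosine similarity lies in [-1, 1], this loss lies in [0, 2].

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
emb = torch.randn(4, 8, requires_grad=True)  # hypothetical embeddings being learned
target = torch.randn(4, 8)                   # hypothetical target embeddings

# 1 - cos is 0 when directions match and 2 when they are opposite
loss = (1 - F.cosine_similarity(emb, target, dim=1)).mean()
loss.backward()
print(loss.item(), emb.grad.shape)  # a value in [0, 2], torch.Size([4, 8])
```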

Cosine Similarity of Two Vectors

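Input requirements: x1 and x2 should have the same shape (recent PyTorch versions also accept shapes that broadcast to a common shape). For 1D tensors (vectors) the default dim=1 does not exist, so either unsqueeze(0) them into 2D tensors (matrices) or pass dim=0. For example: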

import torch
import torch.nn.functional as F
a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([4.0, 5.0, 6.0])
# unsqueeze(0) turns each vector into a 1x3 matrix so that dim=1 exists
similarity = F.cosine_similarity(a.unsqueeze(0), b.unsqueeze(0), dim=1)
print(similarity)  # output: tensor([0.9746])
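For 1D inputs, an alternative to unsqueeze(0) is to reduce over dim=0 directly. A minimal sketch; the result is a 0-dim tensor instead of a 1-element vector:

```python
import torch
import torch.nn.functional as F

a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([4.0, 5.0, 6.0])
sim = F.cosine_similarity(a, b, dim=0)  # reduce over the only dimension
print(sim)  # tensor(0.9746)
```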

Cosine Similarity of Two Matrices (Row-wise)

import torch
import torch.nn.functional as F
x1 = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
x2 = torch.tensor([[5.0, 6.0], [7.0, 8.0]])
similarity = F.cosine_similarity(x1, x2, dim=1)  # one value per pair of rows
print(similarity)  # output: tensor([0.9734, 0.9972])

Cosine Similarity of Two Batches

import torch
import torch.nn.functional as F
batch_a = torch.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
batch_b = torch.tensor([[4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])
similarity = F.cosine_similarity(batch_a, batch_b, dim=1)  # one value per sample
print(similarity)  # output: tensor([0.9746, 0.9982])
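Row-wise similarity only compares matching rows. When every row of one batch must be compared with every row of the other (as in retrieval), one common approach, sketched below with the same illustrative batches, is to L2-normalize the rows with F.normalize and take a matrix product; the diagonal then reproduces the row-wise result.

```python
import torch
import torch.nn.functional as F

batch_a = torch.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
batch_b = torch.tensor([[4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])

# (N x M) similarity matrix: entry [i, j] is cos(batch_a[i], batch_b[j])
sim_matrix = F.normalize(batch_a, dim=1) @ F.normalize(batch_b, dim=1).T
print(sim_matrix)

# The diagonal matches F.cosine_similarity(..., dim=1)
row_wise = F.cosine_similarity(batch_a, batch_b, dim=1)
print(torch.allclose(sim_matrix.diagonal(), row_wise, atol=1e-6))  # True
```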