Converting a Custom ResNet50 Model to Ascend OM

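Original conversion script
Script error
Custom network based on the ResNet50 model
Basic network structure
Convolution module definition example
Bottleneck definition example
Network definition example
Improved conversion script
Script error: channel mismatch
Script error: dimension mismatch
Model input data types
Tensor size
NCHW and NHWC
Channel count of the custom network
Changing the input dimensions
Converting ONNX to an OM file
With the input name images
With the input name input
Summary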

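Converting the official YOLOv8 model earlier was simple: a few lines of commands were enough.

A custom ResNet50 is somewhat more involved. Working through the conversion is a good way to pick up some basic model concepts.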

Original conversion script

    import torch

    model = torch.load('ok.pt')  # returns whatever object was saved in ok.pt
    # model.to(device)
    model.eval()
    dummy_img = torch.randn(10, 2, 95)
    torch.onnx.export(
        model=model,  # the .pt/.pth model
        args=dummy_img,
    )

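Script error

    pth2onnx('resnet18')
      File "E:\project\10\model_trasfer.py", line 12, in pth2onnx
        model.eval()
    AttributeError: 'collections.OrderedDict' object has no attribute 'eval'

The error says that the loaded object has no eval attribute.

eval() switches a model to inference mode and must be called before export (see: Convert your PyTorch training model to ONNX | Microsoft Learn). It is a method of the model, not of the saved weights. Here torch.load returned an OrderedDict of weights rather than a model, so a model has to be defined first and the weights loaded into it before conversion.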
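A minimal sketch of why the call fails, assuming ok.pt was produced by saving only the weights with torch.save(model.state_dict(), ...); the save step is not shown in this article. TinyNet is a hypothetical stand-in for the real network, and an in-memory buffer replaces the file.

```python
import io
from collections import OrderedDict

import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for the real network."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)


# Save only the weights (assumed to be how ok.pt was produced).
buffer = io.BytesIO()
torch.save(TinyNet().state_dict(), buffer)
buffer.seek(0)

loaded = torch.load(buffer)
is_state_dict = isinstance(loaded, OrderedDict)  # mapping of parameter names to tensors
has_eval = hasattr(loaded, "eval")               # a plain mapping has no eval()

# The fix: build the network first, then load the weights into it.
model = TinyNet()
model.load_state_dict(loaded)
model.eval()
in_training_mode = model.training

print(is_state_dict, has_eval, in_training_mode)
```

If the file had been saved with torch.save(model, ...) instead, torch.load would return the module itself; checking the type of the loaded object is a quick way to tell the two cases apart.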

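Custom network based on the ResNet50 model

The error above shows that the conversion script needs a model definition.

Basic network structure

References: ResNets: Why do they perform better than Classic ConvNets? (Conceptual Analysis) | Towards Data Science

【DL系列】ResNet網絡結構詳解、完整代碼實現-CSDN博客 (ResNet network structure explained, with a complete code implementation)

The network mainly consists of:

1) A convolution module made up of a convolution layer (cyan), a batch normalization layer (light blue), a ReLU activation (orange) and a max pooling layer

2) Several bottleneck modules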

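Convolution module definition example

    import torch.nn as nn

    def Conv1(in_places, places, stride=2):
        # Stem: 7x7 convolution, batch norm, ReLU, then 3x3 max pooling
        return nn.Sequential(
            nn.Conv2d(in_channels=in_places, out_channels=places, kernel_size=7, stride=stride, padding=3, bias=False),
            nn.BatchNorm2d(places),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
        )

Parameters

in_places: number of input channels.

places: number of output channels.

stride: convolution stride, default 2.

Purpose: this is the first convolution stage of ResNet and performs the initial feature extraction on the input image.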
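The effect of this module on the spatial size can be checked by hand with the usual output-size formula for convolution and pooling layers, floor((n + 2 * padding - kernel) / stride) + 1. The sketch below applies it to the two size-changing layers of Conv1 (batch normalization and ReLU leave the shape unchanged). The helper names are illustrative and not part of the original code.

```python
def out_size(n, kernel, stride, padding):
    """Output length along one spatial axis of a conv or pooling layer."""
    return (n + 2 * padding - kernel) // stride + 1


def conv1_out_hw(height, width, stride=2):
    """Spatial size after the 7x7 convolution and the 3x3 max pooling in Conv1."""
    h = out_size(height, kernel=7, stride=stride, padding=3)
    w = out_size(width, kernel=7, stride=stride, padding=3)
    h = out_size(h, kernel=3, stride=2, padding=1)
    w = out_size(w, kernel=3, stride=2, padding=1)
    return h, w


print(conv1_out_hw(224, 224))  # (56, 56)
print(conv1_out_hw(2, 95))     # (1, 24)
```

The second call uses the 2 x 95 spatial size that the conversion script in this article ends up with.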

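Bottleneck definition example

    import torch.nn as nn

    class Bottleneck(nn.Module):
        def __init__(self, in_places, places, stride=1, downsampling=False, expansion=4):
            super(Bottleneck, self).__init__()
            self.expansion = expansion
            self.downsampling = downsampling

            self.bottleneck = nn.Sequential(
                nn.Conv2d(in_channels=in_places, out_channels=places, kernel_size=1, stride=1, bias=False),
                nn.BatchNorm2d(places),
                nn.ReLU(inplace=True),
                nn.Conv2d(in_channels=places, out_channels=places, kernel_size=3, stride=stride, padding=1, bias=False),
                nn.BatchNorm2d(places),
                nn.ReLU(inplace=True),
                # The listing is cut off here. The final 1x1 convolution below is
                # reconstructed from the description (1x1, 3x3, 1x1) and is an assumption.
                nn.Conv2d(in_channels=places, out_channels=places * self.expansion, kernel_size=1, stride=1, bias=False),
                nn.BatchNorm2d(places * self.expansion),
            )

            # Also reconstructed (assumption): forward() uses self.downsample, but the
            # listing never defines it. A 1x1 projection matches the residual shape.
            if self.downsampling:
                self.downsample = nn.Sequential(
                    nn.Conv2d(in_channels=in_places, out_channels=places * self.expansion, kernel_size=1, stride=stride, bias=False),
                    nn.BatchNorm2d(places * self.expansion),
                )
            self.relu = nn.ReLU(inplace=True)

        def forward(self, x):
            residual = x
            # print("bot input shape:", x.shape)
            out = self.bottleneck(x)
            if self.downsampling:
                residual = self.downsample(x)
            out += residual
            out = self.relu(out)
            # print("bot output shape:", out.shape)
            return out

Parameters

in_places: number of input channels.
places: number of channels in the intermediate layers.
stride: convolution stride, default 1.
downsampling: whether to downsample (used to match the dimensions of the residual connection).
expansion: expansion factor used to set the number of output channels.

Purpose: the 1x1, 3x3, 1x1 convolution stack reduces computation while extracting features.
The residual connection (out += residual) mitigates the vanishing gradient problem in deep networks.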

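Network definition example

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ResNet(nn.Module):
        def __init__(self, blocks, num_classes=10, expansion=4):
            super(ResNet, self).__init__()
            self.expansion = expansion
            self.conv1 = Conv1(in_places=2, places=64)
            self.layer1 = self.make_layer(in_places=64, places=64, block=blocks[0], stride=1)
            self.layer2 = self.make_layer(in_places=256, places=128, block=blocks[1], stride=2)
            self.layer3 = self.make_layer(in_places=512, places=256, block=blocks[2], stride=2)
            self.layer4 = self.make_layer(in_places=1024, places=512, block=blocks[3], stride=2)
            self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
            self.fc = nn.Linear(512 * expansion, num_classes)

        def make_layer(self, in_places, places, block, stride):
            layers = []
            layers.append(Bottleneck(in_places, places, stride, downsampling=True))
            for i in range(1, block):
                # print(i, in_places, places, self.expansion)
                layers.append(Bottleneck(places * self.expansion, places))
            return nn.Sequential(*layers)

        def forward(self, x):
            # print(x.shape)
            x1 = self.conv1(x)
            # print("conv1:", x1.shape)
            x2 = self.layer1(x1)
            # The listing skips from x2 to the return statement. The steps for
            # x3 to x8 are reconstructed from the description below (assumption).
            x3 = self.layer2(x2)
            x4 = self.layer3(x3)
            x5 = self.layer4(x4)
            x6 = self.avgpool(x5)
            x7 = torch.flatten(x6, 1)  # feature vector
            x8 = self.fc(x7)           # classification output
            return F.normalize(x7, dim=-1), F.normalize(x8, dim=-1)

Function: defines the complete ResNet model.
Parameters

blocks: number of Bottleneck modules in each stage (for example [3, 4, 6, 3] for ResNet-50).
num_classes: number of classes in the classification task, default 10.
expansion: expansion factor of the Bottleneck module, default 4.
Purpose:
Builds the stacked Bottleneck layers through the make_layer method.
Uses global average pooling (AdaptiveAvgPool2d) to turn the feature map into a vector.
Produces the classification result through the fully connected layer (fc).
Returns the normalized feature vector and the normalized classification result.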
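The channel bookkeeping behind make_layer can be tabulated without building the network. The sketch below lists (input, intermediate, output) channels for every Bottleneck, given the block counts and the expansion factor; stage_plan is an illustrative helper, not part of the original code.

```python
def stage_plan(blocks, expansion=4):
    """Return (in_channels, mid_channels, out_channels) for every Bottleneck."""
    plan = []
    in_channels = 64                  # output channels of Conv1
    for stage, count in enumerate(blocks):
        mid = 64 * 2 ** stage         # 64, 128, 256, 512
        for _ in range(count):
            plan.append((in_channels, mid, mid * expansion))
            in_channels = mid * expansion
    return plan


plan = stage_plan([3, 4, 6, 3])
print(plan[0], plan[3], plan[-1])
# (64, 64, 256) (256, 128, 512) (2048, 512, 2048)

# 1 stem convolution + 3 convolutions per Bottleneck + 1 fully connected layer
weighted_layers = 1 + 3 * len(plan) + 1
print(weighted_layers)  # 50
```

The first entry of each stage reproduces the in_places values passed to make_layer (64, 256, 512, 1024), and the final output width of 2048 is the 512 * expansion used for the fc layer. The count of 50 follows the usual convention of not counting the 1x1 projection convolutions on the shortcut path.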

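Improved conversion script

The main change is that model is now instantiated from the custom network defined above (through Resnet50_2d, whose definition is not shown here), and the saved weights are loaded into it.

    model = Resnet50_2d()  # instantiate the custom network
    model.load_state_dict(torch.load('ok.pt'))
    # model = models.torch.load('ok.pt')
    # model.to(device)
    model.eval()
    dummy_img = torch.randn(10, 2, 95)  # (batch_size, channels, height, width)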

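Script error: channel mismatch

    return F.conv2d(
    RuntimeError: Given groups=1, weight of size [64, 2, 7, 7], expected input[1, 10, 2, 95] to have 2 channels, but got 10 channels instead

The 3D tensor is treated as a single unbatched sample, input[1, 10, 2, 95], so its first dimension (10) is read as the channel count, while the first convolution expects 2 channels.

Script error: dimension mismatch

    raise ValueError("expected 4D input (got {}D input)".format(input.dim()))
    ValueError: expected 4D input (got 3D input)

The model expects a 4D input but received a 3D one.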
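Both errors can be reproduced with just the first two layers of the network. The sketch below is a hedged illustration: it rebuilds the convolution and batch normalization of Conv1 with in_places=2 and places=64, and reports what happens for three candidate input shapes. The probe helper is hypothetical.

```python
import torch
import torch.nn as nn

# Same first two layers as Conv1 with in_places=2, places=64.
stem = nn.Sequential(
    nn.Conv2d(2, 64, kernel_size=7, stride=2, padding=3, bias=False),
    nn.BatchNorm2d(64),
).eval()


def probe(shape):
    """Run a random tensor of the given shape through the stem.

    Returns the output shape, or the name of the exception that was raised.
    """
    try:
        with torch.no_grad():
            return tuple(stem(torch.randn(*shape)).shape)
    except (RuntimeError, ValueError) as exc:
        return type(exc).__name__


results = {shape: probe(shape) for shape in [(10, 2, 95), (2, 2, 95), (2, 2, 2, 95)]}
for shape, outcome in results.items():
    print(shape, "->", outcome)
```

With a recent PyTorch release, the first shape should reproduce the channel error, the second should pass the convolution as an unbatched sample and then be rejected by BatchNorm2d, and only the 4D shape should produce an output. Older releases may reject 3D input at the convolution already.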

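Model input data types

Every program needs input data; only the form of that data differs.

Tensor size

Tensor size: in model conversion, this refers to the shape of a tensor, that is, its dimension information. A tensor is a multi-dimensional array, widely used in deep learning frameworks to represent input data, weights, intermediate results and so on.

- Scalar: 0-dimensional tensor (a single value)
- Vector: 1-dimensional tensor (a one-dimensional array)
- Matrix: 2-dimensional tensor (a two-dimensional array)
- Higher-dimensional tensors: 3D, 4D and so on

The tensor size gives the extent of the tensor along each dimension. For example, a tensor of shape [3, 224, 224] represents a 3-channel image of 224x224 pixels.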
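As a small worked example, the shape alone determines the rank, the number of elements and the memory a dense tensor occupies. The sketch assumes float32 storage (4 bytes per element); tensor_stats is an illustrative helper.

```python
import math


def tensor_stats(shape, bytes_per_element=4):
    """Rank, element count and memory footprint of a dense tensor.

    bytes_per_element=4 assumes float32 storage.
    """
    count = math.prod(shape)
    return {"rank": len(shape), "elements": count, "bytes": count * bytes_per_element}


print(tensor_stats([3, 224, 224]))
# {'rank': 3, 'elements': 150528, 'bytes': 602112}
print(tensor_stats([]))
# {'rank': 0, 'elements': 1, 'bytes': 4}
```

The empty shape is the scalar case: rank 0, exactly one element.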

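NCHW and NHWC

Multi-dimensional data is stored in multi-dimensional arrays. The feature map of a convolutional neural network, for example, is usually stored as a four-dimensional (4D) array. NCHW and NHWC name the order of those four dimensions: NCHW places the channels right after the batch dimension, NHWC places them last. The letters mean:

N: batch size, for example the number of images.
H: height of the feature map, the number of pixels in the vertical direction.
W: width of the feature map, the number of pixels in the horizontal direction.
C: channels of the feature map, for example 3 for a color RGB image.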
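The two layouts hold the same values in a different memory order. A minimal sketch, using plain index arithmetic on a contiguous buffer, shows where one element lands in each layout; the function names are illustrative.

```python
def nchw_offset(n, c, h, w, shape):
    """Flat index of element (n, c, h, w) in a contiguous NCHW buffer."""
    _, C, H, W = shape
    return ((n * C + c) * H + h) * W + w


def nhwc_offset(n, c, h, w, shape):
    """Flat index of the same element when the buffer is stored as NHWC."""
    _, C, H, W = shape
    return ((n * H + h) * W + w) * C + c


def nchw_to_nhwc_shape(shape):
    """Reorder a logical (N, C, H, W) shape to (N, H, W, C)."""
    N, C, H, W = shape
    return (N, H, W, C)


shape = (2, 2, 2, 95)                  # N, C, H, W
print(nchw_to_nhwc_shape(shape))       # (2, 2, 95, 2)
print(nchw_offset(0, 1, 0, 0, shape))  # 190
print(nhwc_offset(0, 1, 0, 0, shape))  # 1
```

In NCHW the second channel starts after a whole H x W plane (2 * 95 = 190 elements), whereas in NHWC the channels of one pixel sit next to each other.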

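Channel count of the custom network

The channel count comes from the arguments passed in the convolution module definition and the network definition above:

    self.conv1 = Conv1(in_places=2, places=64)

The first convolution therefore expects 2 input channels, so the conversion script is changed to:

    dummy_img = torch.randn(2, 2, 95)  # (batch_size, channels, height, width)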

Changing the input dimensions

The input is changed from 3 dimensions to 4:

    dummy_img = torch.randn(2, 2, 2, 95)  # (batch_size, channels, height, width)

At this point the .pt model file can be converted to an ONNX file.

Converting ONNX to an OM file

With the input name images

    atc --model=ok.onnx --framework=5 --output=resnet --input_shape="images:1,3,640,640" --soc_version=Ascend310P3 --insert_op_conf=aipp.cfg
    ATC start working now, please wait for a moment.
    ...
    ATC run failed, Please check the detail log, Try 'atc --help' for more information
    E10016: 1970-01-01-08:05:58.540.117 Opname [images] specified in [--input_shape] is not found in the model, confirm whether this node name exists, or node is not split with the specified delimiter ';'

The error says that the model has no input named images. This deserves particular attention: the name given in --input_shape must be an input node name that actually exists in the ONNX model.
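A hypothetical pre-check, not part of ATC, can catch this before the conversion is started. The sketch parses a value in the name:dims form used in the commands in this article, with ';' between entries as the error message indicates, and lists names that the model does not define. The model's input name "input" is an assumption taken from the command that succeeds in the next section.

```python
def parse_input_shape(spec):
    """Parse 'name:d0,d1,...;name2:...' into {name: (d0, d1, ...)}."""
    shapes = {}
    for item in spec.split(";"):
        # rpartition keeps any ':' inside the name itself intact
        name, _, dims = item.rpartition(":")
        shapes[name.strip()] = tuple(int(d) for d in dims.split(","))
    return shapes


def missing_inputs(spec, model_input_names):
    """Names used in the spec that the model does not define."""
    return [name for name in parse_input_shape(spec) if name not in model_input_names]


# Assumption: the exported ONNX model has a single input named "input".
model_input_names = ["input"]
print(missing_inputs("images:1,3,640,640", model_input_names))  # ['images']
print(missing_inputs("input:2,2,2,95", model_input_names))      # []
```

The real input names can be read from the ONNX file itself, for example with a model viewer, and passed in place of the hard-coded list.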

With the input name input

    atc --model=ok.onnx --framework=5 --output=resnet50_2d --soc_version=Ascend310P3 --input_format=NCHW --input_shape="input:2,2,2,95" --log=info
    ATC start working now, please wait for a moment.
    ...
    ATC run success, welcome to the next use.

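Summary

1) Converting a custom network requires instantiating the network model in the conversion script.

2) Be clear about the number of dimensions of the model's input data and the size of each dimension. Processing that data is, after all, the model's main job.

The input in examples like this one is typically a 4D tensor of shape (batch_size, channels, height, width).

batch_size: the number of samples fed to the network at a time. Choose it according to hardware memory and training needs; common values are 32, 64 and 128.

At inference time, batch_size is the number of samples passed to the model in one go. Unlike training, inference usually needs no backpropagation or gradient computation, so memory use is lower and a larger batch_size can be supported.

Common batch_size choices

batch_size = 1:
- Suited to real-time inference tasks (such as processing a camera video stream).
- Lowest latency, but GPU utilization may be low.
batch_size = 8/16/32:
- Suited to batch processing tasks (such as offline image classification or object detection).
- Raises GPU utilization, memory permitting.
Dynamic batch_size:
- Adjust batch_size to the amount of input data.
- For example, if there are 100 images to process, all 100 can be fed in at once (memory permitting).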
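The batching choices above come down to how a list of samples is split before inference. A minimal sketch, with integers standing in for images and iter_batches as an illustrative helper:

```python
def iter_batches(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


images = list(range(100))  # stand-ins for 100 images
sizes = [len(batch) for batch in iter_batches(images, 32)]
print(sizes)  # [32, 32, 32, 4]
```

Note the short final batch. A model converted with a fixed --input_shape, as in the atc command in this article, expects exactly that batch size, so a short batch would typically need padding or a separate handling path.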