Table of Contents
- Preface
- I. Platform Environment Preparation
- II. Environment Setup
- 1. GFPGAN Code Preparation
- 2. MagicMind Conversion
- Modifying env.sh
- Modifying run.sh
- Parameter Reference
- Running
- 3. Running the Modified Model
Preface
MagicMind is an inference acceleration engine for Cambricon MLUs. It converts models trained in AI frameworks (TensorFlow, PyTorch, Caffe, ONNX, etc.) into MagicMind's unified computation-graph representation, and provides end-to-end model optimization, code generation, and inference deployment capabilities. MagicMind aims to provide high-performance, flexible, easy-to-use programming interfaces and companion tools, so that users can focus on developing and deploying their inference workloads without having to worry about low-level hardware details.
If you have a model trained on MLU, GPU, or CPU, you can use MagicMind to quickly deploy it for inference on an MLU. For inference workloads on MLUs, MagicMind offers:
- Extreme performance optimization.
- Reliable accuracy.
- Minimal memory footprint.
- Flexible customization.
- Clean, easy-to-use interfaces.
MagicMind is suited to (but not limited to) the following inference scenarios:
- Image processing (classification, detection, segmentation).
- Video processing.
- Natural language processing.
- Pose estimation.
- Search and recommendation.
MagicMind supports multiple system platforms and MLU hardware platforms. It targets both cloud-side and edge-side workloads through a unified programming interface, while providing the customizations each scenario needs (for example, a remote debug feature for edge deployment).
For details, see: https://www.cambricon.com/docs/sdk_1.15.0/magicmind_1.7.0/user_guide/2_introduction/0_what_is_magicmind/what_is_magicmind.html
I. Platform Environment Preparation
Image: pytorch:v24.10-torch2.4.0-torchmlu1.23.1-ubuntu22.04-py310 (this MagicMind workflow is not demanding about the image; any image with matching component versions will do)
Card: any MLU300-series or later card
II. Environment Setup
1. GFPGAN Code Preparation
git clone https://github.com/xuanandsix/GFPGAN-onnxruntime-demo.git
# Download the original GFPGAN model
wget https://githubfast.com/TencentARC/GFPGAN/releases/download/v1.3.0/GFPGANv1.4.pth
# Convert to ONNX
python torch2onnx.py --src_model_path ./GFPGANv1.4.pth --dst_model_path ./GFPGANv1.4.onnx --img_size 512
# ONNX inference
python demo_onnx.py --model_path GFPGANv1.4.onnx --image_path ./cropped_faces/Adele_crop.png --save_path Adele_v3.jpg
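The preprocessing the demo performs before inference can be sketched roughly as follows (a minimal sketch assuming the usual GFPGAN normalization to [-1, 1] and an NCHW model input; the exact steps in demo_onnx.py may differ slightly):

```python
import numpy as np

# Sketch of a typical GFPGAN pre_process step (assumed, not copied from demo_onnx.py)
def pre_process(img_hwc_uint8):
    img = img_hwc_uint8.astype(np.float32) / 255.0   # scale pixels to [0, 1]
    img = (img - 0.5) / 0.5                          # normalize to [-1, 1]
    img = img.transpose(2, 0, 1)[None]               # HWC -> NCHW, add batch dim
    return img

dummy = np.zeros((512, 512, 3), dtype=np.uint8)
print(pre_process(dummy).shape)  # (1, 3, 512, 512)
```

The NCHW layout produced here matches the ONNX model's input; this layout becomes relevant again when switching to the MagicMind model below.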
Performance:
(pytorch) root@notebook-mm-100semv-notebook-0:/workspace/volume/guojunceshi2/mmgfpgan/GFPGAN-onnxruntime-demo# python demo_onnx.py
infer time: 2.8468078281730413
infer time: 2.2596635334193707
infer time: 3.117730548605323
2. MagicMind Conversion
# Install the MagicMind wheel
pip install magicmind-1.13.0-cp310-cp310-linux_x86_64.whl
# Clone the sample code
git clone https://gitee.com/cambricon/magicmind_cloud.git
# 1. Configure environment variables
cd magicmind_cloud/buildin/cv/classification/resnet50_onnx/
Modifying env.sh
export NEUWARE_HOME=/usr/local/neuware # this line is the important change; leave the rest as-is
export MM_RUN_PATH=${NEUWARE_HOME}/bin
# Working directory of this sample
export PROJ_ROOT_PATH=$(cd $(dirname "${BASH_SOURCE[0]}");pwd)
export MAGICMIND_CLOUD=${PROJ_ROOT_PATH%buildin*}
export MODEL_PATH=${PROJ_ROOT_PATH}/data/models
# Common utils path for CV networks
export UTILS_PATH=${MAGICMIND_CLOUD}/buildin/cv/utils
# Python common components path
export PYTHON_COMMON_PATH=${MAGICMIND_CLOUD}/buildin/python_common
# CPP common interface path
export CPP_COMMON_PATH=$MAGICMIND_CLOUD/buildin/cpp_common

has_add_common_path=$(echo ${PYTHONPATH}|grep "${PYTHON_COMMON_PATH}")
if [ -z ${has_add_common_path} ];then
    export PYTHONPATH=${PYTHONPATH}:${PYTHON_COMMON_PATH}
fi
has_add_util_path=$(echo ${PYTHONPATH}|grep "${UTILS_PATH}")
if [ -z ${has_add_util_path} ];then
    export PYTHONPATH=${PYTHONPATH}:${UTILS_PATH}
fi
Then run: source env.sh
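The append-once pattern env.sh uses for PYTHONPATH can be illustrated in isolation (DEMO_PATH and SEARCH_PATH are stand-in names, not part of the sample):

```shell
# Append a directory to a search path only if it is not already present.
DEMO_PATH="/opt/demo"
SEARCH_PATH="/usr/lib/python3"
# grep exits non-zero on no match; '|| true' keeps the sketch safe under 'set -e'
has_demo=$(echo "${SEARCH_PATH}" | grep "${DEMO_PATH}" || true)
if [ -z "${has_demo}" ]; then
  SEARCH_PATH="${SEARCH_PATH}:${DEMO_PATH}"
fi
echo "${SEARCH_PATH}"   # /usr/lib/python3:/opt/demo
```

Sourcing env.sh twice is therefore harmless: the second run finds the paths already in PYTHONPATH and skips the append.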
Modifying run.sh
#!/bin/bash
set -e
set -x

magicmind_model=face_force_float32_true
precision=force_float32
batch_size=1
dynamic_shape=false

python gen_model.py --precision ${precision} \
                    --input_dims ${batch_size} 3 512 512 \
                    --batch_size ${batch_size} \
                    --dynamic_shape ${dynamic_shape} \
                    --magicmind_model ${magicmind_model} \
                    --input_layout NHWC \
                    --dim_range_min 1 3 512 512 \
                    --dim_range_max 64 3 512 512 \
                    --onnx /workspace/volume/guojunceshi2/mmgfpgan/GFPGAN-onnxruntime-demo/gfpgan14.onnx
Parameter Reference
--precision Optional. Precision mode; by default the whole network runs in float32, i.e. force_float32.
- force_float32: all operators use FLOAT32 as the input precision and output data type, and intermediate results are FLOAT32 as well.
- force_float16: all operators use FLOAT16 as the input precision and output data type, and intermediate results are FLOAT16 as well.
- qint8_mixed_float32: simulated-quantization operators take FLOAT32 input, quantize it to INT8, then convert back to FLOAT32 for computation; non-quantized operators use FLOAT32 for inputs, outputs, and intermediate results.
- qint16_mixed_float32: simulated-quantization operators take FLOAT32 input, quantize it to INT16, then convert back to FLOAT32 for computation; non-quantized operators use FLOAT32 for inputs, outputs, and intermediate results.
- qint8_mixed_float16: simulated-quantization operators take FLOAT16 input, quantize it to INT8, then convert back to FLOAT16 for computation; non-quantized operators use FLOAT16 for inputs, outputs, and intermediate results.
ONNX operators supporting simulated quantization: Conv1D, Conv2D, Conv3D, ConvTranspose1D, ConvTranspose2D, Gemm, MatMul. See the simulated-quantization section of the MagicMind documentation for the underlying concepts.
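The simulated-quantization idea behind the qint8 modes can be illustrated with a small numpy sketch (a symmetric max-abs scale is assumed here purely for illustration; MagicMind's actual calibration differs):

```python
import numpy as np

# Sketch of "simulated quantization" as in qint8_mixed_float32: a FLOAT32
# tensor is quantized to INT8, then dequantized back to FLOAT32 before the
# actual compute, so quantization error appears while compute stays float.
def fake_quant_int8(x):
    scale = np.abs(x).max() / 127.0                          # symmetric max-abs scale
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q.astype(np.float32) * scale                      # dequantize for FLOAT32 compute

x = np.array([0.5, -1.27, 1.27], dtype=np.float32)
print(fake_quant_int8(x))
```

The round trip introduces at most half a quantization step (scale / 2) of error per element, which is what the mixed-precision modes trade against speed and memory.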
--input_dims Input dimensions of the model (here: batch, channels, height, width).
--dynamic_shape Whether the generated model accepts variable input shapes; when true, the input dimensions may vary between --dim_range_min and --dim_range_max.
Running
Running run.sh generates the face_force_float32_true model file.
Note the input and output dimensions.
3. Running the Modified Model
Original model input code:
img = img.transpose(0, 3, 1, 2)
pre_process returns a 1,3,512,512 tensor; note that the ONNX model's img input dimensions are 1,3,512,512.
ort_inputs = {self.ort_session.get_inputs()[0].name: img}
ort_outs = self.ort_session.run(None, ort_inputs)
After modification:
img = img.transpose(0, 1, 2, 3)
pre_process now returns 1,512,512,3, matching the NHWC layout the MagicMind model was built with.
The model-loading code changes to the following; everything else stays the same.
Remember to run the source env.sh step above before inference:
from mm_runner import MMRunner
self.ort_session = MMRunner(mm_file = "face_force_float32_true",device_id = 0)
ort_outs = self.ort_session([img])
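The layout difference between the two paths, and why transpose(0, 1, 2, 3) is an identity permutation, can be checked with numpy:

```python
import numpy as np

# The MagicMind model was generated with --input_layout NHWC, so the
# (1, 512, 512, 3) tensor can be fed to MMRunner as-is; the NCHW transpose
# from the ONNX path is no longer needed.
img = np.random.rand(1, 512, 512, 3).astype(np.float32)

nchw = img.transpose(0, 3, 1, 2)   # ONNX path: NHWC -> NCHW
nhwc = img.transpose(0, 1, 2, 3)   # MagicMind path: identity permutation

print(nchw.shape, nhwc.shape)  # (1, 3, 512, 512) (1, 512, 512, 3)
```

Since the identity transpose changes nothing, that line could also simply be deleted; it is kept here only to mirror the structure of the original code.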
Run results:
2025-01-06 10:49:16,886: INFO: mm_runner.py:20] Model instance Created Success!
2025-01-06 10:49:16,898: INFO: mm_runner.py:32] Model dev Created Success!
2025-01-06 10:49:17,516: INFO: mm_runner.py:39] Model engine Created Success!
2025-01-06 10:49:17,644: INFO: mm_runner.py:43] Model context Created Success!
2025-01-06 10:49:17,645: INFO: mm_runner.py:47] Model queue Created Success!
2025-01-06 10:49:17,645: INFO: mm_runner.py:50] Model inputs Created Success!
2025-01-06 10:49:17,645: INFO: mm_runner.py:51] All Model resource Created Success!
infer time: 0.11474167183041573
infer time: 0.04283882491290569
infer time: 0.040602266788482666
infer time: 0.04028203524649143
infer time: 0.04049760662019253
infer time: 0.04016706347465515
infer time: 0.04045788757503033
infer time: 0.04026786610484123
infer time: 0.041572125628590584
infer time: 0.04047401808202267
infer time: 0.04045314900577068
infer time: 0.04047247767448425
infer time: 0.04037348926067352
infer time: 0.04047695733606815
infer time: 0.04112406447529793
MLU memory usage:
Every 2.0s: cnmon    notebook-mm-100semv-notebook-0: Mon Jan 6 10:49:26 2025
+------------------------------------------------------------------------------+
| CNMON v5.10.29 Driver v5.10.29 |
+-------------------------------+----------------------+-----------------------+
| Card VF Name Firmware | Bus-Id | Util Ecc-Error |
| Fan Temp Pwr:Usage/Cap | Memory-Usage | Mode Compute-Mode |
|===============================+======================+=======================|
| 0 / MLU370-M8 v1.1.4 | 0000:69:00.0 | 73% 0 |
| 0% 34C 179 W/ 300 W | 731 MiB/ 42396 MiB | FULL Default |
+-------------------------------+----------------------+-----------------------+
| 1 / MLU370-M8 v1.1.4 | 0000:72:00.0 | 0% 0 |
| 0% 27C 50 W/ 300 W | 0 MiB/ 42396 MiB | FULL Default |
+-------------------------------+----------------------+-----------------------+
+------------------------------------------------------------------------------+
| Processes: |
| Card MI PID Command Line MLU Memory Usage |
|==============================================================================|
| 0 / 40007 python 650 MiB |
+------------------------------------------------------------------------------+
Before optimization: 2.84-3.0 s per inference
After optimization: 0.04-0.1 s per inference
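Taking the steady-state timings above at face value, the rough speedup works out as follows (the first MagicMind run is slower because it includes warm-up):

```python
# Compare the fastest ONNX Runtime time with the steady-state MagicMind time.
onnx_time = 2.84   # seconds per inference, ONNX Runtime baseline
mm_time = 0.040    # seconds per inference, MagicMind on MLU after warm-up
print(f"speedup ~{onnx_time / mm_time:.0f}x")  # speedup ~71x
```

Even against the slowest MagicMind run (~0.115 s, which includes warm-up), the conversion still yields a speedup of roughly 25x.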