定點化和模型量化（三）

量化解決的是訓練使用的浮點和運行使用的硬件只支持定點的矛盾。這里介紹一些實際量化中使用到的工具。

SNPE簡介

The Snapdragon Neural Processing Engine (SNPE)是高通驍龍為了加速網絡模型設計的框架。但它不只支持高通，SNPE還支持多種硬件平臺，ARM平臺、Intel平臺等。支持的深度學習框架也有包括Caffe、TensorFlow和ONNX等。

SNPE可以前向運行模型，但需要先將模型轉換為Deep Learning Container (DLC) file才可以加載進SNPE中。

CPU支持支持雙精度浮點和8位量化的模型，GPU支持混合精度或者單精度浮點，數字信號處理器DSP就只支持支持8位整形。DLC進一步進行8bit量化才可以運行在Qualcomm?Hexagon?DSP上。

下圖是典型的workflow：

上圖的上半部分是我們熟悉的模型浮點訓練，當模型效果達到預期之后，模型參數固定下來，然后轉換成dlc，dlc再經過壓縮，量化等操作，最后運行在SNPE中。

SNPE安裝及命令

下載SNPE壓縮包，版本號與高通芯片有關，主流是2.13，還有2.17，2.22等。在壓縮包的docs文件夾里面有官方html文檔。

source ./bin/envsetup.sh，激活snpe環境。sh ，bash ，./,?source的作用都是執行腳本。然后就可以在終端中調用snpe指令了：

轉換dlc

以TensorFlow 為例，模型文件可以是pb file或者checkpoint+meta，調用SNPE轉換指令時給定模型路徑和輸入尺寸，最后一層節點名字，就可以得到dlc文件：

snpe-tensorflow-to-dlc --input_network $SNPE_ROOT/models/inception_v3/tensorflow/inception_v3_2016_08_28_frozen.pb \--input_dim input "1,299,299,3" --out_node "InceptionV3/Predictions/Reshape_1" \--output_path inception_v3.dlc

如果是pytorch框架得到的模型，使用snpe-pytorch-to-dlc，serialized PyTorch model into a SNPE DLC file。

更一般地，不管什么框架，都可以先將模型轉換為onnx格式。ONNX（Open Neural Network Exchange）是一種開放式的文件格式，專為機器學習設計，用于存儲訓練好的模型。它使得不同的深度學習框架（如Pytorch，MXNet）可以采用相同格式存儲模型數據。onnx轉dlc使用命令snpe-onnx-to-dlc。

解析dlc：

snpe-dlc-info -i ./xxxx.dlc

dlc量化：snpe-dlc-quantize。

[ --input_dlc=<val> ]Path to the dlc container containing the model for which fixed-point encodingmetadata should be generated. This argument is required.[ --input_list=<val> ]Path to a file specifying the trial inputs. This file should be a plain text file,containing one or more absolute file paths per line. These files will be taken to constitutethe trial set. Each path is expected to point to a binary file containing one trial inputin the 'raw' format, ready to be consumed by SNPE without any further modifications.This is similar to how input is provided to snpe-net-run 
application.
[ --enable_htp ]      Pack HTP information in quantized DLC.

Snapdragon Neural Processing Engine SDK: Tools

run on linux

cd $SNPE_ROOT/models/alexnet
snpe-net-run --container dlc/bvlc_alexnet.dlc --input_list data/cropped/raw_list.txt

run on android target

推lib下面對應架構的所有so

推lib/dsp下面的so

推bin里面的snpe-net-run

AIMET簡介

剛才提到的量化其實是后量化，要想實現量化感知訓練QAT，需要使用AIMET（AI Model Efficiency Toolkit），AIMET也是高通提高的工具，可以實現量化和壓縮。

AIMET是一個庫，可以對訓練好的模型進行量化和壓縮，從而在保證精度損失最小的情況下縮短運行時間，減輕內存壓力。

雖然是在訓練過后再使用AIMET，但它不是簡單地轉化為dlc，而是也有一個訓練的過程，這個過程盡量縮小與浮點模型的誤差。

AIMET實例

PTQ

參考github上面的文檔。AIMET PyTorch AutoQuant API — AI Model Efficiency Toolkit Documentation: ver tf-torch-cpu_1.31.0

即便是后量化也有很多方法組合，需要一些專業的分析。AIMET 提供了AutoQuant 這樣的接口，可以自動分析模型，選擇最合適的后量化方法。用戶只需要指明能接受的精度損失就可以了。

核心是實例化一個AutoQuant類：

auto_quant = AutoQuant(model,                        # Load a pretrained FP32 modeldummy_input=dummy_input,      # dummy_input 是一個隨機數組，只要維度符合輸入就行。data_loader=unlabeled_imagenet_data_loader, eval_callback=eval_callback)  # 統計準確率

使用默認值初始化后開始量化：

auto_quant.set_adaround_params(adaround_params)
model, optimized_accuracy, encoding_path = auto_quant.optimize(allowed_accuracy_drop=0.01)
print(f"- Quantized Accuracy (after optimization):  {optimized_accuracy}")

注意，auto_quant.optimize會返回三個值，分別是model，精度，和encoding path。encoding path以類似json的格式記錄了每一層的名稱，位寬，最大值最小值，offset，scale，還有是否是symmertic。

這里的offset，scale應該就對應上一篇量化中提到的step和zero point。

QAT

參考文檔。AIMET PyTorch Quantization SIM API — AI Model Efficiency Toolkit Documentation: ver tf-torch-cpu_1.31.0

關鍵是構建QuantizationSimModel：

sim = QuantizationSimModel(model=model,quant_scheme=QuantScheme.post_training_tf_enhanced,dummy_input=dummy_input,default_output_bw=8,  # activation quantizations的bitwidthdefault_param_bw=8)   # parameter  quantizations的bitwidth

真正的QAT也只用一行代碼完成：

ImageNetDataPipeline.finetune(sim.model, epochs=1, learning_rate=5e-7, learning_rate_schedule=[5, 10], use_cuda=use_cuda)

也可以把sim當成一個正常的模型，然后使用常規的torch的梯度更新的訓練方法進行訓練。

reference：

基于CentOS更新 glibc - 解決 `GLIBC_2.29‘ not found-CSDN博客

MVision/CNN/Deep_Compression/quantization at master · Ewenwan/MVision · GitHub

模型量化了解一下？ - 知乎

Snapdragon Neural Processing Engine SDK: Features Overview

Tensorflow模型量化(Quantization)原理及其實現方法 - 知乎

GitHub - quic/aimet: AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.AI Model Efficiency ToolkitGitHub - quic/aimet: AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.

本文來自互聯網用戶投稿，該文觀點僅代表作者本人，不代表本站立場。本站僅提供信息存儲空間服務，不擁有所有權，不承擔相關法律責任。
如若轉載，請注明出處：http://www.pswp.cn/bicheng/19303.shtml
繁體地址，請注明出處：http://hk.pswp.cn/bicheng/19303.shtml
英文地址，請注明出處：http://en.pswp.cn/bicheng/19303.shtml

如若內容造成侵權/違法違規/事實不符，請聯系多彩編程網進行投訴反饋email:809451989@qq.com，一經查實，立即刪除！