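I. Installing Intel VTune
The previous posts gave a first look at several performance testing tools. This post focuses on Intel VTune Profiler, a powerful performance analysis tool that is one of the tools in Intel's oneAPI toolkits. Installation is covered for Linux only (installing on Windows is comparatively simple).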
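1. Two installation methods
Method 1:
Go to https://www.intel.com/content/www/us/en/developer/tools/oneapi/toolkits.html#base-kit and install a suitable version. Note that this installs the full oneAPI Base Toolkit, which brings in many other tools besides VTune.
Method 2:
Install from the command line with the package manager (Intel's oneAPI YUM or APT repository must be configured first):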
sudo yum install intel-oneapi-vtune # CentOS
sudo apt install intel-oneapi-vtune # Ubuntu/Debian
Or use the offline installer (the package below is the full Base Toolkit 2025.1.1):
wget https://registrationcenter-download.intel.com/akdlm/IRC_NAS/6bfca885-4156-491e-849b-1cd7da9cc760/intel-oneapi-base-toolkit-2025.1.1.36_offline.sh
sudo sh ./intel-oneapi-base-toolkit-2025.1.1.36_offline.sh -a --silent --cli --eula accept
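Choosing between the yum and apt commands can be scripted. The sketch below assumes that the distro IDs from /etc/os-release map to the package managers listed. It also assumes Intel's oneAPI repository is already configured. The helper name vtune_install_cmd is hypothetical. The script only prints the command and does not run sudo.

```shell
#!/bin/sh
# Sketch: print the package-manager command for installing VTune on this distro.
# Assumption: the ID-to-package-manager mapping below; Intel's oneAPI repository
# must already be configured, otherwise the package will not be found.
vtune_install_cmd() {
    case "$1" in
        centos|rhel|rocky|almalinux|fedora)
            echo "sudo yum install intel-oneapi-vtune" ;;
        ubuntu|debian)
            echo "sudo apt install intel-oneapi-vtune" ;;
        *)
            return 1 ;;   # unknown distro: fall back to the offline installer
    esac
}

# Read the distro ID from /etc/os-release when it exists.
distro_id=""
if [ -r /etc/os-release ]; then
    distro_id=$(. /etc/os-release && echo "$ID")
fi

if ! vtune_install_cmd "$distro_id"; then
    echo "Unknown distro '$distro_id': use the offline installer instead"
fi
```

On distributions that are not in the list, the script falls through to the offline installer, which does not depend on a package repository.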
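2. Configuration
This mainly means setting up environment variables and user permissions:
# Set up environment variables
source /opt/intel/oneapi/vtune/latest/env/vars.sh # oneAPI releases
source /opt/intel/vtune_amplifier/amplxe-vars.sh # older VTune Amplifier releases
# Set user permissions: add the current user to the vtune group (takes effect at the next login)
sudo usermod -aG vtune $USER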
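The two configuration steps can be checked from a script. The sketch below assumes the default /opt/intel/oneapi install prefix and the group name vtune used above. The helper in_group_list is hypothetical. A new group membership from usermod only appears after a fresh login, so the script checks the current session with `id -nG`.

```shell
#!/bin/sh
# Sketch: load the VTune environment and check "vtune" group membership.
# Assumptions: default oneAPI prefix; group name "vtune" as in usermod above.
VTUNE_VARS=/opt/intel/oneapi/vtune/latest/env/vars.sh

# Return 0 if group $1 appears in the space-separated list $2.
in_group_list() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

if [ -f "$VTUNE_VARS" ]; then
    . "$VTUNE_VARS"    # adds vtune to PATH for this shell only
else
    echo "VTune env script not found at $VTUNE_VARS"
fi

# usermod takes effect only in new login sessions, so check the current one.
if in_group_list vtune "$(id -nG)"; then
    echo "current session is in the vtune group"
else
    echo "not in the vtune group yet: log out and back in (or run: newgrp vtune)"
fi
```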
3. Verification
Run the self-check script:
bash /opt/intel/oneapi/vtune/latest/bin64/vtune-self-checker.sh
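A small guard avoids a confusing "No such file" error when VTune lives elsewhere. The sketch below assumes the default install prefix. VTUNE_HOME is a hypothetical override variable, not one that VTune itself defines.

```shell
#!/bin/sh
# Sketch: run the VTune self-checker only when it is present.
# Assumption: default oneAPI prefix; VTUNE_HOME is a hypothetical override.
VTUNE_HOME=${VTUNE_HOME:-/opt/intel/oneapi/vtune/latest}

run_self_check() {
    checker="$1/bin64/vtune-self-checker.sh"
    if [ ! -f "$checker" ]; then
        echo "self-checker not found: $checker" >&2
        return 1
    fi
    # The checker runs sample analyses to confirm that data collection works.
    bash "$checker"
}

run_self_check "$VTUNE_HOME" || echo "VTune does not appear to be installed under $VTUNE_HOME"
```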
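II. Main features
VTune can be used in three ways: through the GUI, from the command line, or remotely.
1. Using the GUI
Launch it with:
vtune-gui
You can then drive the analysis from the UI. First, create a new project and choose the analysis type. Next, set the path of the target application or the ID of the process to attach to. Finally, start the analysis and examine the results, for example in the flame graph or call stack views.
See the figure below: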
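2. Command line
Run the following commands:
vtune -collect hotspots -r ./result_dir -- ./your_application # collect data
vtune -report hotspots -r ./result_dir -format text -report-output ./report.txt # generate a report
Here hotspots is the analysis type. Others include threading (called locksandwaits in older releases) and memory-access. Reports can be written as text, CSV, or HTML. Older VTune Amplifier releases name the command-line tool amplxe-cl instead of vtune.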
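The collect and report steps are often wrapped in one helper. The sketch below assumes vtune is on PATH after sourcing vars.sh. The helper name profile_app and the DRY_RUN switch are hypothetical. The format check only allows the three formats listed above.

```shell
#!/bin/sh
# Sketch: collect a result, then export a report in one call.
# Assumptions: vtune is on PATH; profile_app and DRY_RUN are hypothetical names.
# DRY_RUN=1 only prints the commands instead of running them.
profile_app() {
    analysis=$1 result_dir=$2 format=$3 output=$4
    shift 4                      # remaining arguments: the application and its args
    case "$format" in
        text|csv|html) ;;
        *) echo "unsupported report format: $format" >&2; return 2 ;;
    esac
    run=""
    [ "${DRY_RUN:-0}" = 1 ] && run="echo"
    $run vtune -collect "$analysis" -r "$result_dir" -- "$@" || return 1
    $run vtune -report "$analysis" -r "$result_dir" -format "$format" -report-output "$output"
}

# Example: print the commands for a hotspots run without executing them.
DRY_RUN=1 profile_app hotspots ./result_dir csv ./report.csv ./your_application
```

Dropping `DRY_RUN=1` runs the same two commands for real. The report step is skipped if collection fails.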
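3. Remote use
Remote use is also fairly simple and likewise works with or without a GUI, though it has little to do with VTune itself. You can use remote desktop tools (such as Sunlogin or a VNC server) or plugins in development IDEs (such as the oneAPI extensions for VS Code). It is not covered further here.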
III. Example
The following example profiles the matrix multiplication sample that ships with oneAPI:
vtune -collect hotspots -r ~/result -- ./matrix
vtune: Warning: Microarchitecture performance insights will not be available. Make sure the sampling driver is installed and enabled on your system.
vtune: Collection started. To stop the collection, either press CTRL-C or enter from another console window: vtune -r /home/fpc/result -command stop.
Addr of buf1 = 0x7f339f7b5010
Offs of buf1 = 0x7f339f7b5180
Addr of buf2 = 0x7f339d7b4010
Offs of buf2 = 0x7f339d7b41c0
Addr of buf3 = 0x7f339b7b3010
Offs of buf3 = 0x7f339b7b3100
Addr of buf4 = 0x7f33997b2010
Offs of buf4 = 0x7f33997b2140
Threads #: 16 Pthreads
Matrix size: 2048
Using multiply kernel: multiply1
Execution time = 3.516 seconds
vtune: Collection stopped.
vtune: Using result path `/home/fpc/result'
vtune: Executing actions 20 % Resolving information for `libtpsstool.so'
vtune: Warning: Cannot locate debugging information for file `/opt/intel/oneapi/vtune/2024.0/lib64/libtpsstool.so'.
vtune: Executing actions 75 % Generating a report
Elapsed Time: 3.535s
CPU Time: 46.929s
Effective Time: 46.929s
Spin Time: 0s
Overhead Time: 0s
Total Thread Count: 17
Paused Time: 0s

Top Hotspots
Function   Module     CPU Time  % of CPU Time(%)
---------  ---------  --------  ----------------
multiply1  matrix     46.909s   100.0%
init_arr   matrix     0.010s    0.0%
__GI_      libc.so.6  0.010s    0.0%

Collection and Platform Info
Application Command Line: ./matrix
Operating System: 5.19.0-50-generic DISTRIB_ID=Kylin DISTRIB_RELEASE=V10 DISTRIB_CODENAME=kylin DISTRIB_DESCRIPTION="Kylin V10 SP1" DISTRIB_KYLIN_RELEASE=V10 DISTRIB_VERSION_TYPE=enterprise DISTRIB_VERSION_MODE=normal
Computer Name: fjf
Result Size: 4.5 MB
Collection start time: 10:55:09 12/05/2025 UTC
Collection stop time: 10:55:13 12/05/2025 UTC
Collector Type: User-mode sampling and tracing
CPU
Name: Intel(R) microarchitecture code named Alderlake-S
Frequency: 2.112 GHz
Logical CPU Count: 20
Cache Allocation Technology
Level 2 capability: not detected
Level 3 capability: not detected

If you want to skip descriptions of detected performance issues in the report,
enter: vtune -report summary -report-knob show-issues=false -r <my_result_dir>.
Alternatively, you may view the report in the csv format: vtune -report
<report_name> -format=csv.
vtune: Executing actions 100 % done
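The report shows that almost all CPU time (46.909s of 46.929s) is spent in multiply1, so that function is where optimization should start. VTune also creates the specified result directory (here /home/fpc/result), which contains the data files for further analysis.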
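Dividing CPU Time by Elapsed Time gives a rough measure of how many logical CPUs were busy on average. The sketch below assumes the summary was saved as plain text, for example with `vtune -report summary -r ~/result -format text -report-output summary.txt`. It also assumes lines shaped like "Elapsed Time: 3.535s". The helper avg_busy_cpus is hypothetical. If a line appears more than once, the last occurrence wins.

```shell
#!/bin/sh
# Sketch: average busy logical CPUs = CPU Time / Elapsed Time.
# Assumptions: plain-text summary with lines like "Elapsed Time: 3.535s" and
# "CPU Time: 46.929s" (possibly indented); avg_busy_cpus is hypothetical.
avg_busy_cpus() {
    awk -F': *' '
        $1 ~ /Elapsed Time$/ { sub(/s$/, "", $2); elapsed = $2 + 0 }
        $1 ~ /^ *CPU Time$/  { sub(/s$/, "", $2); cpu = $2 + 0 }
        END { if (elapsed > 0) printf "%.2f\n", cpu / elapsed }
    ' "$1"
}

# Example with the numbers from the matrix run above.
summary=$(mktemp)
printf 'Elapsed Time: 3.535s\n    CPU Time: 46.929s\n        Effective Time: 46.929s\n' > "$summary"
avg_busy_cpus "$summary"    # prints 13.28
rm -f "$summary"
```

For the run above, 46.929 / 3.535 is about 13.28. So the 16 worker threads kept roughly 13 of the 20 logical CPUs busy on average.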
If the command fails with the following messages:
vtune: Error: Cannot start data collection because the scope of ptrace system call is limited. To enable profiling, please set /proc/sys/kernel/yama/ptrace_scope to 0. To make this change permanent, set kernel.yama.ptrace_scope to 0 in /etc/sysctl.d/10-ptrace.conf and reboot the machine.
vtune: Warning: Microarchitecture performance insights will not be available. Make sure the sampling driver is installed and enabled on your system
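The second message is only a warning. The sampling driver is not loaded, so VTune falls back to user-mode sampling (see the "Collector Type" line in the report above). The first message is the actual error. Fix it with either of the following commands. The change lasts until the next reboot; to make it permanent, use /etc/sysctl.d/10-ptrace.conf as the error message describes: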
echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope
or
sudo sysctl -w kernel.yama.ptrace_scope=0
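A pre-flight check can catch this before a long collection starts. The sketch below reads the same file that the VTune error message names. The helper name check_ptrace_scope is hypothetical. The persistent fix it prints writes the sysctl.d file mentioned in the error message, overwriting that file if it exists.

```shell
#!/bin/sh
# Sketch: check yama ptrace_scope before profiling and print the fixes.
# Assumption: path as in the VTune error; check_ptrace_scope is hypothetical.
check_ptrace_scope() {
    scope_file=${1:-/proc/sys/kernel/yama/ptrace_scope}
    if [ ! -r "$scope_file" ]; then
        echo "no yama ptrace_scope file: ptrace is not restricted by Yama"
        return 0
    fi
    scope=$(cat "$scope_file")
    if [ "$scope" = 0 ]; then
        echo "ptrace_scope=0: VTune can attach"
        return 0
    fi
    echo "ptrace_scope=$scope: VTune collection will fail"
    echo "temporary fix: sudo sysctl -w kernel.yama.ptrace_scope=0"
    echo "persistent fix: echo 'kernel.yama.ptrace_scope = 0' | sudo tee /etc/sysctl.d/10-ptrace.conf"
    return 1
}

check_ptrace_scope || true
```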
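IV. Summary
Being proficient with a range of profiling tools is a prerequisite for optimizing programs. In performance-critical scenarios especially, optimizing without tools drastically reduces the efficiency of the work. As the saying goes, sharpening the axe does not delay the cutting of firewood. Let us all keep this in mind!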