Working versions:
python | 3.9.23 |
cuda | 12.0 |
tensorflow-gpu | 2.7.0 |
tensorboard | 2.20.0 |
tensorboard-plugin-profile | 2.4.0 |
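Problem description:
1. After installing tensorboard and running `tensorboard --logdir=logs`, I opened the page in a browser and found that the profile module would not display. The error was: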
The profile plugin has moved.
Please install the new version of the profile plugin from PyPI by running the following command from the machine running TensorBoard……
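Solution: install the tensorboard-plugin-profile plugin, as the message instructs.
pip install tensorboard-plugin-profile
(If the PROFILE tab does not show up in the navigation bar, open the INACTIVE dropdown at the top right and select it there.)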
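2. After installing tensorboard-plugin-profile, running `tensorboard --logdir=logs` in the terminal reported errors:
Error 1: TensorFlow installation not found - running with reduced feature set. W0821
or ModuleNotFoundError: No module named 'tensorflow.tsl'
Solution:
1. Make sure you are in the correct Python interpreter environment (one that contains the tensorflow-gpu package).
2. The tensorflow-gpu installation may be missing some files, so reinstalling it is recommended.
3. Note that tensorflow and tensorflow-gpu are imported with the same statement (`import tensorflow`) but are uninstalled under different package names. To avoid confusion you can uninstall both (depending on your needs).
4. If a warning like this appears: WARNING: Failed to remove contents in a temporary directory 'D:\anaconda3\envs\tensorflow_gpu\Lib\site-packages\tensorflow\~ython'. You can safely remove it manually. Go to that folder and delete the leftover directory by hand.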
pip uninstall tensorflow tensorflow-gpu -y
pip install tensorflow-gpu==2.7.0
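Before reinstalling, it can help to confirm which interpreter is actually running and whether it can see TensorFlow at all. The following is a minimal sketch using only the standard library. The function name `describe_environment` is made up for this illustration and is not part of TensorFlow or TensorBoard.

```python
import importlib.util
import sys
from importlib import metadata


def describe_environment(module_name="tensorflow",
                         dist_names=("tensorflow", "tensorflow-gpu")):
    """Report the running interpreter and which distributions it can see."""
    # find_spec locates the module without importing it.
    spec = importlib.util.find_spec(module_name)
    installed = {}
    for dist in dist_names:
        try:
            installed[dist] = metadata.version(dist)
        except metadata.PackageNotFoundError:
            installed[dist] = None  # not installed in this environment
    return {
        "interpreter": sys.executable,
        "module_found": spec is not None,
        "distributions": installed,
    }


if __name__ == "__main__":
    info = describe_environment()
    print("interpreter:", info["interpreter"])
    print("tensorflow importable:", info["module_found"])
    for dist, version in info["distributions"].items():
        print(f"{dist}: {version or 'not installed'}")
```

Run it with the same `python` that launches `tensorboard`. If the interpreter path is not inside the intended conda environment, or both distributions show as not installed, that likely explains the "TensorFlow installation not found" message.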
Error 2:
17:10:13.666622 7312 profile_plugin_loader.py:75] Unable to load profiler plugin. Import error: cannot import name '_pywrap_profiler_plugin' from 'tensorboard_plugin_profile.convert' (unknown location)
Serving TensorBoard on localhost; to expose to the network, use a proxy or pass --bind_all
TensorBoard 2.20.0 at http://localhost:6006/ (Press CTRL+C to quit)
Solution: install compatible versions of tensorboard and the tensorboard-plugin-profile plugin.
(The exact error message may differ between version combinations, but the root cause is a version incompatibility, and switching to compatible versions resolves it.)
pip uninstall tensorboard tensorboard-plugin-profile -y
pip install tensorboard==2.20.0
pip install tensorboard-plugin-profile==2.4.0
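After reinstalling, a quick way to confirm that the environment matches the working versions listed at the top of this post is to compare what is installed against those pins. This is only a sketch: the helper names `installed_version` and `find_mismatches` are hypothetical, and the pins are simply the ones that worked on my machine, not an official compatibility matrix.

```python
from importlib import metadata

# Pins copied from the "working versions" list at the top of this post.
EXPECTED = {
    "tensorflow-gpu": "2.7.0",
    "tensorboard": "2.20.0",
    "tensorboard-plugin-profile": "2.4.0",
}


def installed_version(dist_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(expected, lookup=installed_version):
    """Return {package: (expected, found)} for every package that differs."""
    mismatches = {}
    for name, wanted in expected.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    problems = find_mismatches(EXPECTED)
    if not problems:
        print("All pinned packages match.")
    for name, (wanted, found) in problems.items():
        print(f"{name}: expected {wanted}, found {found or 'not installed'}")
```

The `lookup` parameter exists only so the comparison can be exercised without touching the real environment.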
3. After opening the page, the profiler page now appears, but the overview page shows ERRORS:
Failed to load libcupti (is it installed and accessible?)
Warnings
No step marker observed and hence the step time is unknown. This may happen if (1) training steps are not instrumented (e.g., if you are not using Keras) or (2) the profiling duration is shorter than the step time. For (1), you need to add step instrumentation; for (2), you may try to profile longer.
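Solution: searching showed that the problem is the file name of the CUPTI DLL in the CUDA installation. See these two articles:
Tensorboard Profiler: Failed to load libcupti (is it installed and accessible?) - Tencent Cloud Developer Community
Solved: tensorflow-gpu 2.6.0 logs an error at runtime: Could not load dynamic library 'cupti64_112.dll'... - CSDN blog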
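1. Add an environment variable: add C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\extras\CUPTI\lib64 to the system environment variables. Use your own install path, since the path changes with the version number. Mine, for example, is D:\CUDA\NVIDIA GPU Computing Toolkit\CUDA\v12.0\extras\CUPTI\lib64.
2. Change the name of cupti64_2020.3.0.dll in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\extras\CUPTI\lib64. My CUDA version is 12.0, where the file is cupti64_2022.4.0.dll. After I made a copy of that file and renamed the copy to cupti64_112.dll, the problem was resolved.
3. The required name may differ between versions. Unlike in the two articles above, I never found an error message that said exactly which file was missing. I therefore suggest making several copies of the .dll under different names (cupti64_112.dll, cupti64_113.dll, ...) and working out step by step which one you need.
4. After changing the files, restart VS Code, run your code again, and then run `tensorboard --logdir=logs` in the terminal.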
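The copy-and-rename step can be scripted so that the original DLL is left untouched. Below is a minimal sketch, assuming the CUPTI directory and file name from my own machine. The helper name `copy_under_aliases` is hypothetical, and the alias names are just the candidates mentioned above.

```python
import shutil
from pathlib import Path


def copy_under_aliases(source_dll, alias_names):
    """Copy source_dll next to itself under each alias name; return new files."""
    source = Path(source_dll)
    if not source.is_file():
        raise FileNotFoundError(f"CUPTI DLL not found: {source}")
    created = []
    for alias in alias_names:
        target = source.with_name(alias)
        if target.exists():
            continue  # never overwrite a file that is already there
        shutil.copy2(source, target)
        created.append(target)
    return created


if __name__ == "__main__":
    # Path and file name from this post; adjust to your own CUDA install.
    cupti_dir = Path(r"D:\CUDA\NVIDIA GPU Computing Toolkit\CUDA\v12.0\extras\CUPTI\lib64")
    source = cupti_dir / "cupti64_2022.4.0.dll"
    if source.is_file():
        for path in copy_under_aliases(source, ["cupti64_112.dll", "cupti64_113.dll"]):
            print("created", path)
    else:
        print("source DLL not found, edit the path first:", source)
```

If CUDA is installed under C:\Program Files, the script probably needs to be run from an elevated (administrator) terminal to be allowed to write there.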