Finally got to try RISC-V! The OS is openKylin, running in a Sophgo cloud workspace.
Goal: build and install PyTorch from source.
First, install git:
apt install git
Then download PyTorch and Sophgo's CPU library (cpuinfo):
git clone https://github.com/sophgo/cpuinfo.git
git clone https://github.com/pytorch/pytorch
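Notes:
cd pytorch
# Make sure the submodule remote URLs match the configuration in the parent repository
git submodule sync
# Fetch and update all submodules, initializing any that are not yet initialized and recursing into nested submodules
git submodule update --init --recursive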
Delete cpuinfo from the pytorch/third_party directory and replace it with Sophgo's cpuinfo library:
cd ~/pytorch/third_party
rm -rf cpuinfo
cp -rf ~/cpuinfo .
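The same swap can be scripted so it is repeatable after a fresh clone. This is only a minimal Python sketch: the helper name replace_vendored_dir is made up for illustration, and the example paths assume both repositories were cloned into the home directory.

```python
import shutil
from pathlib import Path


def replace_vendored_dir(vendored: Path, replacement: Path) -> None:
    """Delete `vendored` (if present) and copy `replacement` into its place."""
    if not replacement.is_dir():
        raise FileNotFoundError(f"replacement not found: {replacement}")
    if vendored.exists():
        shutil.rmtree(vendored)
    shutil.copytree(replacement, vendored)


# Example call (assumed paths, not executed here):
# replace_vendored_dir(Path.home() / "pytorch/third_party/cpuinfo",
#                      Path.home() / "cpuinfo")
```

Checking that the replacement exists before deleting anything avoids ending up with no cpuinfo at all if the path is mistyped.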
Install dependencies
apt install libopenblas-dev fails on this system and can be skipped.
apt install libblas-dev m4 cmake cython3 ccache
Build and install OpenBLAS manually:
git clone https://github.com/xianyi/OpenBLAS.git
cd OpenBLAS
make -j8
make PREFIX=/usr/local/OpenBLAS install
The build prints a pile of warnings.
Append this as the last line of /etc/profile:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/OpenBLAS/lib/
Then run: source /etc/profile
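To confirm the new entry is actually visible to a process, the variable can be inspected from Python. A small sketch; the helper name dir_in_search_path is hypothetical, and it only checks the variable's text, not whether the library itself loads.

```python
import os


def dir_in_search_path(directory: str, search_path: str) -> bool:
    """Return True if `directory` is one of the colon-separated entries."""
    wanted = os.path.normpath(directory)
    entries = [entry for entry in search_path.split(":") if entry]
    return any(os.path.normpath(entry) == wanted for entry in entries)


if __name__ == "__main__":
    current = os.environ.get("LD_LIBRARY_PATH", "")
    print(dir_in_search_path("/usr/local/OpenBLAS/lib/", current))
```

Paths are normalized before comparison, so a trailing slash in /etc/profile does not cause a false negative.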
Modify the source
In the pytorch directory, run: vi aten/src/ATen/CMakeLists.txt
Replace the statement: if(NOT MSVC AND NOT EMSCRIPTEN AND NOT INTERN_BUILD_MOBILE)
with: if(FALSE)
vi caffe2/CMakeLists.txt
Replace the statement: target_link_libraries(${test_name}_${CPU_CAPABILITY} c10 sleef gtest_main)
with: target_link_libraries(${test_name}_${CPU_CAPABILITY} c10 gtest_main)
vi test/cpp/api/CMakeLists.txt
Below the statement: add_executable(test_api ${TORCH_API_TEST_SOURCES})
add: target_compile_options(test_api PUBLIC -Wno-nonnull)
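Hand edits in vi are easy to get subtly wrong, so the three changes could also be applied by a script. The following is a sketch, not the method used here: the helper names are invented, and it assumes each target statement appears exactly once in its file, failing loudly if that is not the case.

```python
from pathlib import Path


def replace_once(text: str, old: str, new: str) -> str:
    """Replace exactly one occurrence of `old`; raise if it is missing or repeated."""
    count = text.count(old)
    if count != 1:
        raise ValueError(f"expected 1 occurrence, found {count}: {old!r}")
    return text.replace(old, new)


def insert_after(text: str, anchor: str, addition: str) -> str:
    """Insert `addition` on a new line directly after the text `anchor`."""
    return replace_once(text, anchor, anchor + "\n" + addition)


def patch_file(path: Path, transform) -> None:
    """Rewrite `path` with `transform` applied to its contents."""
    path.write_text(transform(path.read_text()))


# Example calls, run from the pytorch directory (not executed here):
# patch_file(Path("aten/src/ATen/CMakeLists.txt"), lambda t: replace_once(
#     t, "if(NOT MSVC AND NOT EMSCRIPTEN AND NOT INTERN_BUILD_MOBILE)", "if(FALSE)"))
# patch_file(Path("test/cpp/api/CMakeLists.txt"), lambda t: insert_after(
#     t, "add_executable(test_api ${TORCH_API_TEST_SOURCES})",
#     "target_compile_options(test_api PUBLIC -Wno-nonnull)"))
```

Because replace_once refuses to act on zero or multiple matches, running the script a second time on an already patched file raises an error instead of silently doing nothing or patching the wrong place.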
Environment variables
# Enter these directly in the terminal; they must be entered again after a restart
export USE_CUDA=0
export USE_DISTRIBUTED=0
export USE_MKLDNN=0
export MAX_JOBS=16
Source for this configuration: https://blog.csdn.net/m0_49267873/article/details/135670989
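Since the exports only last for the current shell, another option is to hand the same flags to the build process explicitly. A minimal sketch, assuming the build is launched from Python; build_env is a hypothetical helper, and the values are the ones listed above.

```python
import os

# Flags taken from the export lines in this post.
BUILD_FLAGS = {
    "USE_CUDA": "0",
    "USE_DISTRIBUTED": "0",
    "USE_MKLDNN": "0",
    "MAX_JOBS": "16",
}


def build_env(base=None):
    """Return a copy of `base` (default: the current environment) with the build flags set."""
    env = dict(os.environ if base is None else base)
    env.update(BUILD_FLAGS)
    return env


# Usage, from the pytorch directory (not executed here):
# import subprocess
# subprocess.run(["python3", "setup.py", "develop", "--cmake"],
#                env=build_env(), check=True)
```

This keeps the flags next to the build command, so a new terminal session cannot silently start a build with CUDA or MKLDNN detection re-enabled.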
Build and install
Run:
python3 setup.py develop --cmake
or: python3.10 setup.py install
Reportedly GCC 13 or later is required. The GCC that ships with the system:
gcc version 9.3.0 (Openkylin 9.3.0-ok12)
A patch is needed:
# If the patchelf command is missing, install it first
apt install patchelf
# /path is the directory that contains libtorch_cpu.so
patchelf --add-needed libatomic.so.1 /path/libtorch_cpu.so
On the Sophgo cloud system the command is: patchelf --add-needed libatomic.so.1 /root/pytorch/build/lib/libtorch_cpu.so
Preparation before building
These two libraries must be installed before building:
pip3 install pyyaml typing_extensions
setuptools also needs to be upgraded:
pip3 install setuptools -U
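Given that the build runs for hours, it is worth checking these prerequisites up front rather than discovering them partway through. A small sketch; missing_modules is a made-up helper, and note that the pyyaml package is imported under the name yaml.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found on sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # An empty list means all three prerequisites are importable.
    print(missing_modules(["yaml", "typing_extensions", "setuptools"]))
```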
Final build
In the pytorch directory, run:
python3 setup.py develop --cmake
The whole build takes roughly 3-4 hours.
The build finally completes:
Installed /usr/lib/python3.8/site-packages/mpmath-1.3.0-py3.8.egg
Searching for typing-extensions==4.9.0
Best match: typing-extensions 4.9.0
Adding typing-extensions 4.9.0 to easy-install.pth file
detected new path './mpmath-1.3.0-py3.8.egg'
Using /usr/local/lib/python3.8/dist-packages
Finished processing dependencies for torch==2.3.0a0+git5c5b71b
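Testing
Start python3 and run import pytorch: it fails because there is no module named pytorch. Run import torch instead.
That gave no error, so I assumed the test had passed. In fact it only worked because I was in the pytorch directory, which has a torch subdirectory, so I mistook it for a pass.
I spoke too soon: since the build used develop mode, this is how it is meant to be used.
In other words, you have to be in the pytorch directory for the develop-mode torch to be picked up. In ~/pytorch, start python3 and paste the code below into the interactive prompt; the test passes.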
import torch
import torch.nn as nn
import torch.optim as optim
import os
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"
N, D_in, H, D_out = 64, 1000, 100, 10  # N: batch size, D_in: input size, H: hidden size, D_out: output size
x = torch.randn(N,D_in) # x = np.random.randn(N,D_in)
y = torch.randn(N,D_out) # y = np.random.randn(N,D_out)
w1 = torch.randn(D_in,H) # w1 = np.random.randn(D_in,H)
w2 = torch.randn(H,D_out) # w2 = np.random.randn(H,D_out)
learning_rate = 1e-6
for it in range(200):
    # forward pass
    h = x.mm(w1)  # N * H; NumPy: h = x.dot(w1)
    h_relu = h.clamp(min=0)  # N * H; NumPy: np.maximum(h,0)
    y_pred = h_relu.mm(w2)  # N * D_out; NumPy: h_relu.dot(w2)
    # compute loss
    loss = (y_pred - y).pow(2).sum()  # np.square(y_pred-y).sum()
    print(it, loss.item())  # print(it,loss)
    # BP - compute the gradient
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)  # h_relu.T.dot(grad_y_pred)
    grad_h_relu = grad_y_pred.mm(w2.t())  # grad_y_pred.dot(w2.T)
    grad_h = grad_h_relu.clone()  # grad_h_relu.copy()
    grad_h[h < 0] = 0
    grad_w1 = x.t().mm(grad_h)  # x.T.dot(grad_h)
    # update weights of w1 and w2
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2
0 29870438.0
1 26166322.0
2 25949932.0
3 25343224.0
4 22287072.0
5 16840522.0
6 11024538.0
7 6543464.5
8 3774165.25
9 2248810.5
10 1440020.25
11 1001724.5
12 749632.625
13 592216.6875
14 485451.34375
15 407586.65625
16 347618.4375
17 299686.625
18 260381.9375
19 227590.734375
How can torch be made usable from anywhere in the environment?
It feels like an environment variable issue. Stay tuned.
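One way to narrow this down (a diagnostic, not a fix) is to compare where Python resolves torch from when started inside ~/pytorch and when started elsewhere. In interactive mode the current directory comes first on sys.path, which would explain why the torch subdirectory is found only there. The helper name module_origin is hypothetical.

```python
import importlib.util
import sys


def module_origin(name):
    """Return the file or directory a top-level module resolves to, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    if spec.origin:
        return spec.origin
    # Namespace packages have no origin; fall back to their first search location.
    locations = list(spec.submodule_search_locations or [])
    return locations[0] if locations else None


if __name__ == "__main__":
    print("torch resolves to:", module_origin("torch"))
    print("search path:", sys.path)
```

If the result is None outside ~/pytorch, the directory that develop mode registered is not on the search path of that interpreter, and the printed sys.path shows which directories are.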
Debugging
Installing libopenblas-dev fails
root@863c89a419ec:~/pytorch/third_party# apt install libopenblas-dev
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Package libopenblas-dev is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source
It turns out someone has already been through this pitfall: skip the package and build OpenBLAS from source instead.
Error while building PyTorch
python3 setup.py develop --cmake
Building wheel torch-2.3.0a0+git5c5b71b
-- Building version 2.3.0a0+git5c5b71b
Could not find any of CMakeLists.txt, Makefile, setup.py, LICENSE, LICENSE.md, LICENSE.txt in /root/pytorch/third_party/pybind11
Did you run 'git submodule update --init --recursive'?
Go into the third_party directory and run the following to fix it:
rm -rf pthreadpool
# Go back to the pytorch directory before running the next command
git submodule update --init --recursive
It still fails after that:
root@863c89a419ec:~/pytorch# python3 setup.py develop --cmake
Building wheel torch-2.3.0a0+git5c5b71b
-- Building version 2.3.0a0+git5c5b71b
Could not find any of CMakeLists.txt, Makefile, setup.py, LICENSE, LICENSE.md, LICENSE.txt in /root/pytorch/third_party/QNNPACK
Did you run 'git submodule update --init --recursive'?
Running git submodule update --init --recursive again changes nothing.
Deleting the QNNPACK directory and running git submodule update --init --recursive once more gets past it.
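Rather than discovering broken submodules one build attempt at a time, third_party can be scanned up front with the same marker files the error message lists. A sketch under assumptions: unpopulated_dirs is an invented helper, and a directory it reports is only a candidate, since not every entry in third_party is necessarily a submodule.

```python
from pathlib import Path

# File names copied from the "Could not find any of ..." error message.
MARKERS = ("CMakeLists.txt", "Makefile", "setup.py",
           "LICENSE", "LICENSE.md", "LICENSE.txt")


def unpopulated_dirs(third_party: Path):
    """Return names of subdirectories that contain none of the marker files."""
    return sorted(
        entry.name
        for entry in third_party.iterdir()
        if entry.is_dir() and not any((entry / marker).exists() for marker in MARKERS)
    )


if __name__ == "__main__":
    root = Path("third_party")  # run from the pytorch directory
    if root.is_dir():
        print(unpopulated_dirs(root))
```

Each reported directory can then be deleted and re-fetched with git submodule update --init --recursive, as was done for QNNPACK.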
Error: RuntimeError: Missing build dependency: Unable to `import yaml`.
pip3 install pyyaml
Error: ModuleNotFoundError: No module named 'typing_extensions'
pip3 install typing_extensions fixes it.
Error at 78% of the build
/usr/bin/ld: /root/pytorch/build/lib/libtorch_cpu.so: undefined reference to `__atomic_exchange_1'
collect2: error: ld returned 1 exit status
make[2]: *** [caffe2/CMakeFiles/NamedTensor_test.dir/build.make:101: bin/NamedTensor_test] Error 1
make[1]: *** [CMakeFiles/Makefile2:3288: caffe2/CMakeFiles/NamedTensor_test.dir/all] Error 2
/usr/bin/ld: /root/pytorch/build/lib/libtorch_cpu.so: undefined reference to `__atomic_exchange_1'
collect2: error: ld returned 1 exit status
make[2]: *** [caffe2/CMakeFiles/cpu_profiling_allocator_test.dir/build.make:101: bin/cpu_profiling_allocator_test] Error 1
make[1]: *** [CMakeFiles/Makefile2:3505: caffe2/CMakeFiles/cpu_profiling_allocator_test.dir/all] Error 2
[ 78%] Linking CXX executable ../bin/cpu_rng_test
/usr/bin/ld: /root/pytorch/build/lib/libtorch_cpu.so: undefined reference to `__atomic_exchange_1'
collect2: error: ld returned 1 exit status
make[2]: *** [caffe2/CMakeFiles/cpu_rng_test.dir/build.make:101: bin/cpu_rng_test] Error 1
make[1]: *** [CMakeFiles/Makefile2:3536: caffe2/CMakeFiles/cpu_rng_test.dir/all] Error 2
make: *** [Makefile:146: all] Error 2
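At first I suspected the cpu library. I checked it, and it is fine.
Try this approach:
Problem analysis: undefined reference to __atomic_exchange_1
Fix: use patchelf to add the required shared library
# If the patchelf command is missing, install it first
apt install patchelf
# /path is the directory that contains libtorch_cpu.so
patchelf --add-needed libatomic.so.1 /path/libtorch_cpu.so
Path to libtorch_cpu.so: /root/pytorch/build/lib/libtorch_cpu.so
So the command is: patchelf --add-needed libatomic.so.1 /root/pytorch/build/lib/libtorch_cpu.so
Sure enough, after running this command the build could continue.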
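The symbol __atomic_exchange_1 is a one-byte atomic operation that is provided by libatomic. As far as I understand, older GCC releases for RISC-V do not inline these sub-word atomics, which would also explain the advice to use GCC 13 or later. Since the build may be restarted several times, the patch can be made idempotent by checking the NEEDED list first. A sketch: the helper names are made up, and it relies on patchelf's --print-needed and --add-needed options.

```python
import subprocess


def has_needed(print_needed_output: str, library: str) -> bool:
    """Return True if `library` appears in the output of `patchelf --print-needed`."""
    return library in print_needed_output.split()


def ensure_needed(so_path: str, library: str = "libatomic.so.1") -> bool:
    """Add `library` to the NEEDED list only if absent. Returns True if it was added."""
    current = subprocess.run(
        ["patchelf", "--print-needed", so_path],
        check=True, capture_output=True, text=True,
    ).stdout
    if has_needed(current, library):
        return False
    subprocess.run(["patchelf", "--add-needed", library, so_path], check=True)
    return True


# Usage on the path from this post (not executed here):
# ensure_needed("/root/pytorch/build/lib/libtorch_cpu.so")
```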
Error at 100% of the build
running develop
/usr/lib/python3/dist-packages/setuptools/command/easy_install.py:146: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
Traceback (most recent call last):
  File "setup.py", line 1401, in <module>
    main()
  File "setup.py", line 1346, in main
    setup(
  File "/usr/lib/python3/dist-packages/setuptools/__init__.py", line 87, in setup
    return distutils.core.setup(**attrs)
  File "/usr/lib/python3/dist-packages/setuptools/_distutils/core.py", line 185, in setup
    return run_commands(dist)
  File "/usr/lib/python3/dist-packages/setuptools/_distutils/core.py", line 201, in run_commands
    dist.run_commands()
  File "/usr/lib/python3/dist-packages/setuptools/_distutils/dist.py", line 973, in run_commands
    self.run_command(cmd)
  File "/usr/lib/python3/dist-packages/setuptools/dist.py", line 1217, in run_command
    super().run_command(command)
  File "/usr/lib/python3/dist-packages/setuptools/_distutils/dist.py", line 991, in run_command
    cmd_obj.ensure_finalized()
  File "/usr/lib/python3/dist-packages/setuptools/_distutils/cmd.py", line 109, in ensure_finalized
    self.finalize_options()
  File "/usr/lib/python3/dist-packages/setuptools/command/develop.py", line 52, in finalize_options
    easy_install.finalize_options(self)
  File "/usr/lib/python3/dist-packages/setuptools/command/easy_install.py", line 231, in finalize_options
    self.config_vars = dict(sysconfig.get_config_vars())
UnboundLocalError: local variable 'sysconfig' referenced before assignment
Try upgrading setuptools:
root@863c89a419ec:~# pip3 install setuptools -U
Collecting setuptools
  Using cached setuptools-69.1.0-py3-none-any.whl (819 kB)
Installing collected packages: setuptools
  Attempting uninstall: setuptools
    Found existing installation: setuptools 65.3.0
    Not uninstalling setuptools at /usr/lib/python3/dist-packages, outside environment /usr
    Can't uninstall 'setuptools'. No files were found to uninstall.
Successfully installed setuptools-69.1.0
Then build again, and it passes!
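The pip log shows that the old copy under /usr/lib/python3/dist-packages was not removed, so two versions may now coexist. A quick check of which version the interpreter actually sees can be sketched as follows; installed_version is a hypothetical helper built on the standard importlib.metadata module.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the version of the first matching distribution on sys.path, or None."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("setuptools:", installed_version("setuptools"))
```

If this still reports 65.3.0 after the upgrade, the older copy is shadowing the new one on sys.path.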
Check the GCC version
Reportedly GCC 13 or later is required. The GCC that ships with the system:
gcc version 9.3.0 (Openkylin 9.3.0-ok12)
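To compare the installed compiler against the reported GCC 13 requirement in a script, the major version can be pulled out of that line. A minimal sketch; gcc_major is an invented helper and it only understands lines of the form shown above.

```python
import re


def gcc_major(version_line: str):
    """Extract the major version from a 'gcc version X.Y.Z ...' line, or None."""
    match = re.search(r"gcc version (\d+)\.", version_line)
    return int(match.group(1)) if match else None


line = "gcc version 9.3.0 (Openkylin 9.3.0-ok12)"
print(gcc_major(line))  # → 9
```

Here the result is 9, well below 13, which is consistent with having to add libatomic by hand.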