Today I ran into a CUDA problem. The deep-learning code I wanted to run contains custom CUDA kernels that are compiled at load time, and as soon as I launched it, it failed with a long error dump:
(score-denoise) ubuntu@GPUA10002:~/wbd/score-denoise_Transformerdepth20$ python train.py
Detected CUDA files, patching ldflags
Emitting ninja build file /home/ubuntu/wbd/score-denoise_Transformerdepth20/utils/cutils/build/build.ninja...
Building extension module cutils_...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/3] /usr/bin/nvcc -DTORCH_EXTENSION_NAME=cutils_ -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /data/miniconda3/envs/score-denoise/lib/python3.8/site-packages/torch/include -isystem /data/miniconda3/envs/score-denoise/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /data/miniconda3/envs/score-denoise/lib/python3.8/site-packages/torch/include/TH -isystem /data/miniconda3/envs/score-denoise/lib/python3.8/site-packages/torch/include/THC -isystem /data/miniconda3/envs/score-denoise/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 --compiler-options '-fPIC' -Xptxas -v --generate-code=arch=compute_80,code=sm_80 -std=c++14 -c /home/ubuntu/wbd/score-denoise_Transformerdepth20/utils/cutils/srcs/half_aligned_knn_sub_maxpooling.cu -o half_aligned_knn_sub_maxpooling.cuda.o
FAILED: half_aligned_knn_sub_maxpooling.cuda.o
/usr/bin/nvcc -DTORCH_EXTENSION_NAME=cutils_ -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /data/miniconda3/envs/score-denoise/lib/python3.8/site-packages/torch/include -isystem /data/miniconda3/envs/score-denoise/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /data/miniconda3/envs/score-denoise/lib/python3.8/site-packages/torch/include/TH -isystem /data/miniconda3/envs/score-denoise/lib/python3.8/site-packages/torch/include/THC -isystem /data/miniconda3/envs/score-denoise/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 --compiler-options '-fPIC' -Xptxas -v --generate-code=arch=compute_80,code=sm_80 -std=c++14 -c /home/ubuntu/wbd/score-denoise_Transformerdepth20/utils/cutils/srcs/half_aligned_knn_sub_maxpooling.cu -o half_aligned_knn_sub_maxpooling.cuda.o
nvcc fatal : Unsupported gpu architecture 'compute_80'
[2/3] /usr/bin/nvcc -DTORCH_EXTENSION_NAME=cutils_ -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /data/miniconda3/envs/score-denoise/lib/python3.8/site-packages/torch/include -isystem /data/miniconda3/envs/score-denoise/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /data/miniconda3/envs/score-denoise/lib/python3.8/site-packages/torch/include/TH -isystem /data/miniconda3/envs/score-denoise/lib/python3.8/site-packages/torch/include/THC -isystem /data/miniconda3/envs/score-denoise/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 --compiler-options '-fPIC' -Xptxas -v --generate-code=arch=compute_80,code=sm_80 -std=c++14 -c /home/ubuntu/wbd/score-denoise_Transformerdepth20/utils/cutils/srcs/aligned_knn_sub_maxpooling.cu -o aligned_knn_sub_maxpooling.cuda.o
FAILED: aligned_knn_sub_maxpooling.cuda.o
/usr/bin/nvcc -DTORCH_EXTENSION_NAME=cutils_ -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /data/miniconda3/envs/score-denoise/lib/python3.8/site-packages/torch/include -isystem /data/miniconda3/envs/score-denoise/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /data/miniconda3/envs/score-denoise/lib/python3.8/site-packages/torch/include/TH -isystem /data/miniconda3/envs/score-denoise/lib/python3.8/site-packages/torch/include/THC -isystem /data/miniconda3/envs/score-denoise/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 --compiler-options '-fPIC' -Xptxas -v --generate-code=arch=compute_80,code=sm_80 -std=c++14 -c /home/ubuntu/wbd/score-denoise_Transformerdepth20/utils/cutils/srcs/aligned_knn_sub_maxpooling.cu -o aligned_knn_sub_maxpooling.cuda.o
nvcc fatal : Unsupported gpu architecture 'compute_80'
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "/data/miniconda3/envs/score-denoise/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1666, in _run_ninja_build
    subprocess.run(
  File "/data/miniconda3/envs/score-denoise/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "train.py", line 13, in <module>
    from models.denoise import *
  File "/home/ubuntu/wbd/score-denoise_Transformerdepth20/models/denoise.py", line 7, in <module>
    from .feature import FeatureExtractionWithResLFE
  File "/home/ubuntu/wbd/score-denoise_Transformerdepth20/models/feature.py", line 6, in <module>
    from .ResLFE_block import ResLFE_Block
  File "/home/ubuntu/wbd/score-denoise_Transformerdepth20/models/ResLFE_block.py", line 8, in <module>
    from utils.cutils import knn_edge_maxpooling
  File "/home/ubuntu/wbd/score-denoise_Transformerdepth20/utils/cutils/__init__.py", line 14, in <module>
    cutils = load("cutils_", sources=sources, extra_cflags=["-O3", "-mavx2", "-funroll-loops"], extra_cuda_cflags=["-Xptxas","-v", "--generate-code=arch=compute_80,code=sm_80"],
  File "/data/miniconda3/envs/score-denoise/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1080, in load
    return _jit_compile(
  File "/data/miniconda3/envs/score-denoise/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1293, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/data/miniconda3/envs/score-denoise/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1405, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "/data/miniconda3/envs/score-denoise/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1682, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'cutils_'
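The key line in that dump is `nvcc fatal : Unsupported gpu architecture 'compute_80'`: the nvcc the build picked up (`/usr/bin/nvcc` here) is too old to know about compute capability 8.0 (Ampere-class GPUs). A quick sketch to see which nvcc a build will use and which architectures it supports — assuming a reasonably recent nvcc, which (in newer CUDA releases) accepts `--list-gpu-arch`:

```shell
# Which nvcc will the build pick up, and what version is it?
command -v nvcc && nvcc --version
# Newer nvcc releases can list the architectures they support;
# compute_80 must appear in this list for the build above to work
nvcc --list-gpu-arch 2>/dev/null || echo "nvcc missing or too old to list architectures"
```

If `compute_80` is not in the list (or the flag itself is unrecognized), the toolkit is too old for this build.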
Then I asked an AI, and its suggestion was to check the nvcc version:
nvcc --version
I found the command didn't exist; instead, the shell suggested:
sudo apt install nvidia-cuda-toolkit
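Before running that install, it's worth checking which version the distro repository would actually give you — on older Ubuntu releases the packaged toolkit can be years behind the GPU. A standard apt check (no installation happens):

```shell
# Show which nvidia-cuda-toolkit version apt would install, without installing it
apt-cache policy nvidia-cuda-toolkit 2>/dev/null || true
echo "checked apt candidate for nvidia-cuda-toolkit"
```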
So I installed it, and afterwards checked the version:
(base) ubuntu@GPUA10002:~/wbd/score-denoise_Transformerdepth20$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85
That turned out to be a very old release. On a newer install, the same command looks like this:
(score-denoise) wu@wu:~/code/pointDenoise/score-denoise_Transformerdepth20$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Tue_Feb_27_16:19:38_PST_2024
Cuda compilation tools, release 12.4, V12.4.99
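The mismatch is between the toolkit and the GPU: the build hardcodes `compute_80` because the card is an Ampere-class GPU with compute capability 8.x, while apt's `nvidia-cuda-toolkit` installed CUDA 9.1, which predates Ampere entirely. On a reasonably new driver you can ask nvidia-smi for the card's compute capability directly (the `compute_cap` query field requires a fairly recent driver; on older drivers it only appears in the full `nvidia-smi -q` output):

```shell
# Print each GPU's name and compute capability, e.g. "... , 8.0" for an A100
nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader 2>/dev/null \
  || echo "nvidia-smi unavailable or driver too old for the compute_cap field"
```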
With that old version, compilation naturally failed. I kept searching for related issues, and kept wondering about one thing: other people had run deep learning on this machine before nvcc was ever installed, which seemed very strange.
So later I uninstalled it:
sudo apt remove nvidia-cuda-toolkit
This is safe to remove, don't worry.
After that, the error changed to a CUDA path problem. So I went to look in ~/.bashrc and found that the CUDA path was indeed there. I then tried typing in the terminal:
export PATH=/usr/local/cuda-12.1/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-12.1/lib64:$LD_LIBRARY_PATH
Still no luck. At that point I was out of ideas.
Finally, I went into /usr/local and looked around — and found that what was actually installed was cuda-12.0, not 12.1. So, in the terminal where I run the code, I entered the two commands below, then launched the training script, and the extension compiled successfully:
export PATH=/usr/local/cuda-12.0/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-12.0/lib64:$LD_LIBRARY_PATH
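Before and after running those exports, it's worth confirming what is actually on disk and which nvcc the shell now resolves — this is exactly the check that would have saved me the afternoon (the `cuda-12.0` path matches this machine; adjust to whatever the listing shows on yours):

```shell
# Which CUDA toolkits are really installed under /usr/local?
ls -d /usr/local/cuda* 2>/dev/null || echo "no CUDA toolkit under /usr/local"
# After the exports, this should point inside the cuda-12.0 directory
command -v nvcc || echo "nvcc still not on PATH"
nvcc -V 2>/dev/null | grep release || echo "nvcc not runnable yet"
```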
I still don't know why ~/.bashrc points at cuda-12.1 when the installed toolkit is cuda-12.0, so I didn't edit the file — I left it as it was. Whenever I need to run the code, I first enter those two lines and then launch it. Of course, if the extension doesn't need to be compiled, you can run directly without them.
This problem wasted an entire afternoon of mine. If you run into something similar, be sure to look in /usr/local/ and check which CUDA version is actually installed there.
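If you'd rather not retype the two export lines each time, a small sketch like the one below could go in ~/.bashrc instead of a hardcoded version — it picks the newest /usr/local/cuda-* directory that actually exists, so it won't break when the installed version and the hardcoded path disagree. This is just an idea under the assumption that exactly the versions you want live under /usr/local:

```shell
# Pick the newest installed CUDA toolkit and put it on the paths
CUDA_HOME=$(ls -d /usr/local/cuda-* 2>/dev/null | sort -V | tail -n 1)
if [ -n "$CUDA_HOME" ]; then
  export PATH="$CUDA_HOME/bin:$PATH"
  export LD_LIBRARY_PATH="$CUDA_HOME/lib64:$LD_LIBRARY_PATH"
fi
echo "using CUDA at: ${CUDA_HOME:-none found}"
```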