Error: For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
The error message is as follows:
./aten/src/ATen/native/cuda/NLLLoss2d.cu:103: nll_loss2d_forward_kernel: block: [29,0,0], thread: [707,0,0] Assertion t >= 0 && t < n_classes failed.
。。。。。。
。。。。。。
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
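Training ran normally up to epoch=9 and then raised the error above.
Deleting the cache files under models/__pycache__ and rerunning on the dataset did not help; the error still occurred.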
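Solution:
The problem was in the labels. One image's label was corrupt: it contained more label classes than the configured number of classes, so some label value t fell outside the range 0 <= t < n_classes that the assertion checks.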
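
One way to track down such a bad label is to scan every label before training and report any value outside the valid range. The following is only a minimal sketch, not the code from the original project: the function names, the sample file names, and the ignore value 255 are all hypothetical, and each label is assumed to be available as a flat sequence of integer class ids (for a NumPy mask, something like np.unique(mask).tolist() would do).

```python
from typing import Dict, Iterable, List, Optional


def invalid_label_values(values: Iterable[int], n_classes: int,
                         ignore_index: Optional[int] = None) -> List[int]:
    """Return the sorted distinct label values outside [0, n_classes)."""
    bad = set()
    for v in values:
        v = int(v)
        if v == ignore_index:
            # Assumption: this value is deliberately excluded from the loss.
            continue
        if v < 0 or v >= n_classes:
            bad.add(v)
    return sorted(bad)


def scan_labels(labels_by_name: Dict[str, Iterable[int]], n_classes: int,
                ignore_index: Optional[int] = None) -> Dict[str, List[int]]:
    """Map each sample name to its out-of-range values; valid samples are omitted."""
    report = {}
    for name, values in labels_by_name.items():
        bad = invalid_label_values(values, n_classes, ignore_index)
        if bad:
            report[name] = bad
    return report


if __name__ == "__main__":
    # Hypothetical flattened label masks for a 3-class problem.
    demo = {
        "img_001.png": [0, 1, 2, 2, 1],
        "img_002.png": [0, 3, 1, 255],  # 3 is out of range; 255 is skipped
    }
    print(scan_labels(demo, n_classes=3, ignore_index=255))
```

Any file that shows up in the report has a label the loss cannot handle. Either fix or remove that label, or increase the configured number of classes if the extra value is a real class.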