RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one.

Error message

RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by (1) passing the keyword argument find_unused_parameters=True to torch.nn.parallel.DistributedDataParallel; (2) making sure all forward function outputs participate in calculating loss. If you already have done the above two steps, then the distributed data parallel module wasn’t able to locate the output tensors in the return value of your module’s forward function. Please include the loss function and the structure of the return value of forward of your module when reporting this issue (e.g. list, dict, iterable).

This error can have many causes. I won't go over fixes such as passing find_unused_parameters=True to torch.nn.parallel.DistributedDataParallel, because the error message above already spells them out clearly.
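For completeness, option (1) from the message looks roughly like the sketch below. It is a minimal single-process CPU setup, assuming the gloo backend and a file-based rendezvous purely so that the snippet can run on its own. TwoBranch, run_two_steps and the layer names are hypothetical; a real job would launch one process per device.

```python
import os
import tempfile

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


class TwoBranch(nn.Module):
    """Hypothetical toy model: `unused` never contributes to the output."""

    def __init__(self):
        super().__init__()
        self.used = nn.Linear(4, 2)
        self.unused = nn.Linear(4, 2)

    def forward(self, x):
        return self.used(x)


def run_two_steps():
    # Single-process group (world_size=1) so the sketch is self-contained.
    init_file = os.path.join(tempfile.mkdtemp(), "ddp_init")
    dist.init_process_group(
        backend="gloo", init_method=f"file://{init_file}", rank=0, world_size=1
    )
    try:
        # With find_unused_parameters=True, DDP marks parameters that are not
        # reachable from the forward outputs as ready instead of waiting for them.
        model = DDP(TwoBranch(), find_unused_parameters=True)
        losses = []
        for _ in range(2):
            model.zero_grad()
            loss = model(torch.randn(8, 4)).sum()
            loss.backward()
            losses.append(loss.item())
        return losses
    finally:
        dist.destroy_process_group()
```

Note that this detection walks the autograd graph starting from the tensors returned by the wrapped module's forward. It therefore does not help when an output is returned by forward but then left out of the loss, which is the situation discussed in the rest of this post.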

If changing one argument had solved your problem, you wouldn't have ended up on this blog post ^^.

Solution (one of several)

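The last part of the error message is actually worth a closer look:

If you already have done the above two steps, then the distributed data parallel module wasn’t able to locate the output tensors in the return value of your module’s forward function. Please include the loss function and the structure of the return value of forward of your module when reporting this issue (e.g. list, dict, iterable).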

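Still, if this is the first time you hit the problem, the official hint alone may leave you confused, so here I share my own understanding and how I solved it.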

Put simply, it comes down to one sentence: make sure that every output of every forward function is actually used to compute the loss.

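Note that this is not only about your model's forward function: your loss may be computed by a forward function too. In other words, every output of the forward function of every module that inherits from nn.Module (not just the model itself) has to take part in computing the loss. Any parameter that only feeds an output left out of the loss receives no gradient during backward, and DDP is then left waiting for a gradient reduction that never happens.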
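To see why a dropped term matters, here is a small sketch that does not need DDP at all. TwoHead, head_a, head_b and grads_missing are made-up names for illustration. The point is only that a parameter whose output never reaches the loss has no gradient after backward(), and that is the state DDP complains about.

```python
import torch
import torch.nn as nn


class TwoHead(nn.Module):
    """Hypothetical two-task model: a shared layer feeding two heads."""

    def __init__(self):
        super().__init__()
        self.shared = nn.Linear(4, 4)
        self.head_a = nn.Linear(4, 1)
        self.head_b = nn.Linear(4, 1)

    def forward(self, x):
        h = torch.relu(self.shared(x))
        return self.head_a(h), self.head_b(h)


def grads_missing(include_b):
    """Return names of parameters left without a gradient after one backward pass."""
    model = TwoHead()
    out_a, out_b = model(torch.randn(8, 4))
    loss = out_a.mean()
    if include_b:
        # Only when head_b's output joins the loss do its parameters get gradients.
        loss = loss + out_b.mean()
    loss.backward()
    return [name for name, p in model.named_parameters() if p.grad is None]


print(grads_missing(include_b=False))  # ['head_b.weight', 'head_b.bias']
print(grads_missing(include_b=True))   # []
```

Both outputs are returned by forward in either case; only the composition of the loss changes which parameters receive gradients.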

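In my own case, in a multi-task learning setup, the loss was computed by a whole module that inherits from nn.Module, but the total loss returned by its forward was missing one task's loss, and that caused this error.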


    import torch.nn as nn

    # ContrastiveLoss is the author's own module, defined elsewhere (not shown here).

    class multi_task_loss(nn.Module):
        def __init__(self, device, batch_size):
            super().__init__()
            self.ce_loss_func = nn.CrossEntropyLoss()
            self.l1_loss_func = nn.L1Loss()
            self.contra_loss_func = ContrastiveLoss(batch_size, device)

        def forward(self, rot_p, rot_t, pert_p, pert_t, emb_o, emb_h, emb_p,
                    original_imgs, rect_imgs):
            rot_loss = self.ce_loss_func(rot_p, rot_t)
            pert_loss = self.ce_loss_func(pert_p, pert_t)
            contra_loss = self.contra_loss_func(emb_o, emb_h) \
                          + self.contra_loss_func(emb_o, emb_p) \
                          + self.contra_loss_func(emb_p, emb_h)
            rect_loss = self.l1_loss_func(original_imgs, rect_imgs)
            # Buggy version: contra_loss was left out of the sum, even though all losses were returned.
            # tol_loss = rot_loss + pert_loss + rect_loss
            # Fixed version: training runs normally after changing to this line.
            tol_loss = rot_loss + pert_loss + contra_loss + rect_loss
            return tol_loss, (rot_loss, pert_loss, contra_loss, rect_loss)

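Check your own computation from end to end (not just the model itself) to see whether every output of every forward function is actually used to compute the loss.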
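One way to run that check is sketched below, assuming you can do a single forward and backward pass on the plain, unwrapped modules. The helper params_without_grad and the toy WeightedLoss are hypothetical and not part of PyTorch or of the original code. The helper simply lists every trainable parameter, across all modules you hand it, whose .grad is still None after backward().

```python
import torch
import torch.nn as nn


def params_without_grad(**modules):
    """List 'module_name.param_name' for trainable parameters whose .grad is None."""
    missing = []
    for mod_name, module in modules.items():
        for param_name, param in module.named_parameters():
            if param.requires_grad and param.grad is None:
                missing.append(f"{mod_name}.{param_name}")
    return missing


class WeightedLoss(nn.Module):
    """Hypothetical loss module with one learnable weight per task."""

    def __init__(self):
        super().__init__()
        self.w_main = nn.Parameter(torch.ones(()))
        self.w_aux = nn.Parameter(torch.ones(()))
        self.mse = nn.MSELoss()

    def forward(self, pred, target):
        main_loss = self.w_main * self.mse(pred, target)
        aux_loss = self.w_aux * pred.abs().mean()
        # Bug on purpose: aux_loss is returned but not added to the total.
        return main_loss, (main_loss, aux_loss)


model = nn.Linear(4, 2)
criterion = WeightedLoss()
total, _ = criterion(model(torch.randn(8, 4)), torch.randn(8, 2))
total.backward()
print(params_without_grad(model=model, criterion=criterion))  # ['criterion.w_aux']
```

Any name that shows up in the list points at a forward output that never reached the loss, whether it lives in the model or in the loss module.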

Ref:

https://discuss.pytorch.org/t/need-help-runtimeerror-expected-to-have-finished-reduction-in-the-prior-iteration-before-starting-a-new-one/119247
