Project background
At this point the network does segmentation only; there is no detection yet.
How lane lines and drivable areas are represented in autonomous-driving datasets
The main ways an autonomous-driving system represents lane lines and drivable areas are:
- Geometric models: describe the shape and position of lane lines and drivable areas with geometric primitives such as straight lines, curves and polygons. The model parameters are fitted from sensor data such as camera images or lidar point clouds (a small fitting sketch follows this list).
- Image segmentation: assign every pixel in the image to a class such as lane line, road surface or background. This usually requires training a deep-learning model to reach good segmentation accuracy.
- Semantic maps: describe the position and shape of lane lines and drivable areas inside a semantic map. A semantic map is a high-level map carrying rich semantic information such as road structure, traffic signs, intersections and buildings; lane lines and drivable areas can be stored as one layer of that map to provide more precise position and shape information.
- Point clouds: extract the position and shape of lane lines and drivable areas from lidar or millimetre-wave radar point clouds. Point clouds contain very many points, so denoising, segmentation and fitting algorithms are needed to extract the useful information.
In short, there are many ways to represent lane lines and drivable areas, and choosing a suitable representation improves the performance and reliability of the autonomous-driving system.
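To make the geometric-model option concrete, here is a minimal illustrative sketch (not taken from the project code) that fits one lane marking as a second-order polynomial x = f(y) from pixel coordinates, for example points sampled from a segmentation mask; the sample coordinates are made up.

```python
import numpy as np

def fit_lane_polynomial(xs, ys, degree=2):
    """Fit a polynomial x = f(y) to lane-pixel coordinates and return a callable model."""
    coeffs = np.polyfit(ys, xs, degree)  # highest-order coefficient first
    return np.poly1d(coeffs)

# toy example: a slightly curved lane marking (image coordinates, y grows downwards)
ys = np.array([700, 650, 600, 550, 500, 450])
xs = np.array([640, 642, 646, 652, 660, 670])
lane = fit_lane_polynomial(xs, ys)
print(lane(475))  # interpolated x position of the lane at image row y = 475
```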
- First, before the federated version, build a non-federated baseline that segments, detects and tracks what is on the road: segment the cars, the road / drivable area, lane lines, traffic lights, buildings, sky and so on. There are 19 classes in total mentioned -- really 18 segmentation classes, plus lane lines and drivable area, so call it 20 classes altogether. For detection, everything except lane lines and drivable area has to be detected and also tracked; a YOLOv8 alone can basically do that. What is being done right now is training the 18-class segmentation on one dataset; that trained model will then be used as the backbone for training lane lines and drivable area, because the datasets have not been merged yet. That is how far the work has got. Later, lane lines and drivable area should be handled together: merge the datasets and do it with a single model -- that is the next-stage task. Once that works, we also need to look at the strengths of other models and try to push our accuracy a bit higher by borrowing ideas from them; improving our model with reference to others will probably be the stage after that. When the accuracy is roughly where it should be, we move on to federated learning: a data-protecting setup with differential privacy and federated learning added in to protect the data. That is the final goal. Come to the office tomorrow; someone has already taken the work to this point, so sync up with him. Ideally look through the YOLOv5/v6/v8 models beforehand, but you can also catch up afterwards, since the other person has already spent two or three days on them. Tomorrow come and sit in on the hand-over and hear how far things have got.
It is apparently around 70 mIoU.
Because YOLOP was originally built exactly for the lane-line and drivable-area tasks
YOLOX hasn't been considered
As for which model is best, that's something to think about after the training setup itself is sorted out
Right now there are two datasets: the 18-class Cityscapes data, and the BDD data with drivable area and lane lines
These two datasets are what we work with
They're trained separately at the moment; first figure out how to train both well with a single model
Which model to use doesn't really matter
As long as the task gets done
Because we saw that v8
has both detection and segmentation
but its detection boxes are derived from the segmentation labels
So we can modify YOLOv8 with reference to the leaderboard entries for the Cityscapes data
In other words it really only trains segmentation, and the detection just comes out at inference time; as long as the final result meets the task requirements, that's fine
Because the segmentation classes aren't complete -- what we looked at that day was the result over all classes
We currently only use the fine annotations
That's about 10 GB, I think
5000 images?
YOLOP's segmentation heads are used on top of YOLOv8's backbone
If the current datasets can't be combined, the problem really is the labels; if they can't be merged, we'd have to switch datasets
What we saw earlier is that these two datasets don't merge very well
Because the lane-line label in the BDD data is just a polyline
To force-merge them you'd probably thicken the line and then extract polygon labels -- that's my guess (see the sketch below)
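A rough sketch of that "thicken the polyline, then extract polygons" guess (illustrative only, not project code; the image size and thickness are assumptions, and the OpenCV 4.x return signature of findContours is used):

```python
import cv2
import numpy as np

def polyline_to_polygons(lane_points, img_h=720, img_w=1280, thickness=8):
    """Rasterize one BDD lane-line polyline with some thickness and return its outline polygons."""
    mask = np.zeros((img_h, img_w), dtype=np.uint8)
    pts = np.array(lane_points, dtype=np.int32).reshape(-1, 1, 2)
    cv2.polylines(mask, [pts], isClosed=False, color=255, thickness=thickness)
    # outline of the thickened line, usable as a polygon-style segmentation label
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return [c.reshape(-1, 2) for c in contours]
```

Each returned polygon could then be normalized and written out in the same polygon label format the segmentation data already uses.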
bdd100k
https://blog.csdn.net/qq_41185868/article/details/100146709
https://zhuanlan.zhihu.com/p/110191240
The full BDD100K dataset is very large, roughly 40 TB in total. It contains over 100,000 images and videos of dynamic scenes, each scene with high-resolution imagery and video, along with many kinds of annotation: object detection and tracking, semantic segmentation, instance segmentation, pedestrian and vehicle behaviour prediction, and more.
Because BDD100K is so large, substantial storage and compute are needed to process and analyse it. To make it easier to use, BDD100K is distributed in several data formats with accompanying tools (images, videos, annotation files, etc.), and a benchmark platform is provided for research and development in scene understanding and autonomous driving.
BDD100K is a large-scale autonomous-driving dataset containing over 100,000 high-resolution images and videos collected in urban environments. It was released by the Berkeley Artificial Intelligence Research (BAIR) lab at UC Berkeley to provide rich data resources for autonomous-driving research.
The BDD100K dataset includes:
- Images and videos: more than 100,000 high-resolution images and videos collected in different cities.
- Multiple annotation types: object detection, semantic segmentation, instance segmentation, pedestrian behaviour prediction, vehicle behaviour prediction, and more.
- Multiple scenes: a wide range of urban settings including city streets, highways and parking lots.
- Diverse objects: pedestrians, vehicles, bicycles, traffic signs, and so on.
The release of BDD100K provides a valuable data resource for autonomous-driving research and can be used to train and evaluate the performance of autonomous-driving systems; it also serves as a rich dataset for training and evaluating all kinds of vision algorithms and models.
BDD100K covers a wide variety of urban street scenes and objects, including but not limited to:
- Scenes: city streets, highways, parking lots, city squares, intersections, etc.
- Objects: pedestrians, bicycles, cars, trucks, buses, motorcycles, traffic signs, buildings, etc.
- Weather: sunny, overcast, rainy, night-time, etc.
- Road conditions: road markings, tunnels, bridges, traffic lights, barriers, etc.
BDD100K is distributed in several data formats, covering images, videos and annotations. The most commonly used formats are:
- Images: JPEG, with each image either 720x1280 or 1080x1920 pixels.
- Videos: MP4; video length varies, typically between 30 seconds and 5 minutes.
- Annotations: object detection, semantic segmentation, instance segmentation, pedestrian behaviour prediction, vehicle behaviour prediction, etc. Annotations are stored as JSON, with one JSON file corresponding to one image or video (see the reading sketch after this list).
Every image or video in BDD100K has a unique ID, and all annotations are keyed to that ID. BDD100K also provides a dataset index file listing every image and video ID together with its path and the path of its annotation file, so the data can be accessed quickly.
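As an illustration of working with those JSON annotations, here is a small sketch for reading 2D boxes from a BDD100K detection-label file; the field names (name, labels, category, box2d) follow the commonly documented BDD100K format and should be checked against the actual files used in this project.

```python
import json

def load_bdd_boxes(json_path):
    with open(json_path) as f:
        frames = json.load(f)  # a list with one entry per image
    boxes = {}
    for frame in frames:
        items = []
        for label in frame.get("labels", []):
            if "box2d" in label:  # lane lines / drivable area use poly2d instead
                b = label["box2d"]
                items.append((label["category"], b["x1"], b["y1"], b["x2"], b["y2"]))
        boxes[frame["name"]] = items
    return boxes
```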
cityscapes
Cityscapes is a large-scale computer-vision dataset of high-resolution street-scene images, built to support urban scene segmentation and scene-understanding tasks. The dataset was created by a team spanning Daimler AG R&D, the Max Planck Institute for Informatics (MPI-INF) and TU Darmstadt.
The full Cityscapes download is roughly 350 GB. Concretely, it contains 5,000 high-resolution RGB images at 2048x1024 pixels, each with several kinds of annotation: pixel-level semantic segmentation, instance-level segmentation, and 2D/3D bounding boxes. It also includes camera parameters, vehicle odometry, city maps and other auxiliary data.
Cityscapes is a fairly large dataset and needs substantial storage and compute to process and analyse. To make it easier to use, it is distributed in several formats with accompanying tools (images, annotations, camera parameters, city maps, etc.), and a benchmark server is provided for evaluating and comparing street-scene segmentation and scene-understanding methods.
The Cityscapes dataset provides the following data:
- Image data: 5,000 high-resolution street-scene images, each 2048x1024 pixels.
- Annotations: fine-grained labels including pixel-level semantic segmentation, instance-level segmentation, and 2D/3D bounding boxes, stored as JSON and PNG files.
- Other data: camera parameters, vehicle odometry, city maps and other material supporting street-scene segmentation and scene-understanding research.
In addition, the dataset covers:
- Diverse scenes: city centres, suburbs, highways and other environments across German and other European cities.
- Varied capture conditions: images recorded over several months under varying daytime weather conditions.
Cityscapes is an important dataset in computer vision; it is used to train and evaluate the performance of a wide range of visual algorithms and models, and its release has provided a valuable data resource for street-scene segmentation and scene understanding.
Cityscapes provides fine-grained annotations, including pixel-level semantic segmentation, instance-level segmentation, and 2D/3D bounding boxes. The commonly used annotation formats are:
- Pixel-level semantic segmentation: for every image there is a label image in PNG format in which each pixel value is the ID of the class that pixel belongs to.
- Instance-level segmentation: every instance is annotated as a separate object. The annotations are stored as JSON, one file per image, containing each instance's ID, class and pixel (polygon) coordinates (see the conversion sketch after this list).
- 2D and 3D bounding boxes: boxes describing object position and size in the image, again stored as JSON, one file per image, with each object's ID, class, 2D box and 3D box.
In short, Cityscapes offers several fine-grained annotation formats that make street-scene segmentation and scene-understanding research convenient.
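Since the project trains YOLOv8-seg on Cityscapes, the polygon JSON files have to be converted into YOLO-style segmentation labels at some point. The sketch below shows one plausible way to do that; the name_to_id mapping is an assumption, so the project's own 18-class mapping should be used. Clamping the normalized coordinates to [0, 1] also avoids the "non-normalized or out of bounds coordinates" warnings that appear in the training log further below.

```python
import json

def cityscapes_polygons_to_yolo(json_path, name_to_id):
    """Turn one gtFine *_polygons.json file into YOLO segmentation label lines."""
    with open(json_path) as f:
        ann = json.load(f)
    w, h = ann["imgWidth"], ann["imgHeight"]
    lines = []
    for obj in ann["objects"]:
        cls = name_to_id.get(obj["label"])
        if cls is None:  # skip classes that are not trained on
            continue
        coords = []
        for x, y in obj["polygon"]:
            coords += [min(max(x / w, 0.0), 1.0), min(max(y / h, 0.0), 1.0)]
        lines.append(" ".join([str(cls)] + [f"{c:.6f}" for c in coords]))
    return "\n".join(lines)
```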
Some ways to speed up training:
- GPU acceleration: training on a high-performance GPU, with CUDA or OpenCL doing the heavy lifting, greatly speeds up deep-learning computation (a mixed-precision training sketch follows this list).
- Data augmentation: random crops, rotations, flips, scaling and similar transforms increase sample diversity and reduce overfitting, improving training effectiveness.
- Distributed training: spreading the training job across multiple compute nodes shortens training time; TensorFlow, PyTorch and other frameworks support distributed training natively.
- Small-batch training: smaller batches reduce per-step compute and memory requirements; dynamically adapting the batch size to the model and data can also help.
- Model pruning and quantization: reducing the number of parameters and the amount of computation improves the model's computational efficiency and speed.
- Pretrained models: starting from pretrained weights reduces training time and data requirements while improving generalization and final performance.
In short, with sensible algorithm design and optimization, plus suitable hardware and software tooling, deep-learning training can be made faster without hurting results.
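As a concrete example of the GPU-acceleration point, this is a generic mixed-precision (AMP) training-loop sketch in PyTorch; model, loader and criterion are placeholders, not project objects.

```python
import torch

def train_one_epoch(model, loader, criterion, optimizer, device="cuda"):
    scaler = torch.cuda.amp.GradScaler()
    model.train()
    for images, targets in loader:
        images, targets = images.to(device), targets.to(device)
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():        # forward pass in fp16 where safe
            loss = criterion(model(images), targets)
        scaler.scale(loss).backward()          # scaled backward avoids fp16 underflow
        scaler.step(optimizer)
        scaler.update()
```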
Federated learning can speed up training in some situations. Unlike conventional centralized machine learning, federated learning distributes the model across many devices or users: each participant trains and updates the model locally, and the local models are then aggregated into a global model, which is how the model gets updated and optimized.
One advantage is that the data never has to be centralized, avoiding the privacy and security risks that come with pooling it, while the distributed computation can shorten training and improve generalization. For large datasets and complex models in particular, federated learning reduces data-transfer and storage requirements and thus improves training efficiency.
It also has challenges and limitations: different devices or users may have different data distributions and characteristics, which can make convergence speed and quality unstable, and model aggregation and security need dedicated algorithmic support.
In summary, federated learning can accelerate training in specific settings, but it has to be chosen and tuned for the concrete scenario and requirements; a minimal aggregation sketch follows.
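A bare-bones FedAvg aggregation sketch to make the idea concrete (illustrative only; real federated training also needs client sampling, secure aggregation and, for the final goal described above, differential-privacy noise):

```python
import copy
import torch

def fed_avg(client_state_dicts, client_weights=None):
    """Weighted average of client model state_dicts into one global state_dict."""
    n = len(client_state_dicts)
    if client_weights is None:
        client_weights = [1.0 / n] * n
    avg = copy.deepcopy(client_state_dicts[0])
    for key in avg:
        # integer buffers such as num_batches_tracked may need special handling in practice
        avg[key] = sum(w * sd[key].float() for sd, w in zip(client_state_dicts, client_weights))
    return avg

# global_model.load_state_dict(fed_avg(local_states)) would then refresh the global model.
```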
yolov8
YOLOv8 is a deep-learning model for object detection, image segmentation and image classification. Built on recent advances in deep learning and computer vision, it offers excellent speed and accuracy, and its streamlined design makes it suitable for many applications and easy to adapt to different hardware platforms, from edge devices to cloud APIs. The framework is intended to support any YOLO architecture, not just v8.
YOLOv8 and YOLOv5 are both fast object-detection models capable of real-time processing. YOLOv8 is faster than YOLOv5, making it the better choice for applications that need real-time detection. YOLOv5, on the other hand, is often considered easier to use and deploy thanks to its mature PyTorch-based tooling. YOLOv8 provides a unified framework for training models that perform object detection, instance segmentation and image classification.
YOLOv8 also does well on accuracy: on the COCO dataset, the quoted average precision is 51.4% for the YOLOv8s model and 54.2% for the YOLOv8m model. It is also reported to detect small objects better and to address several limitations of YOLOv5.
P5 models use a 640-pixel input resolution; P6 models use 1280.
The YOLOv8-Seg model is an extension of the YOLOv8 object-detection model that also performs semantic segmentation of the input image. Its backbone is a CSPDarknet53 feature extractor, followed by a novel C2f module instead of the traditional YOLO neck architecture. The C2f module is followed by two segmentation heads that learn to predict the semantic segmentation masks of the input image. The model keeps a detection head similar to YOLOv8's, composed of five detection modules and a prediction layer. YOLOv8-Seg has been shown to achieve state-of-the-art results on a variety of object-detection and semantic-segmentation benchmarks while remaining fast and efficient.
Changes compared with YOLOv5:
- The C3 module is replaced by the C2f module.
- The first 6x6 Conv in the backbone is replaced with a 3x3 Conv.
- Two Convs are removed (No.10 and No.14 in the YOLOv5 config).
- The first 1x1 Conv in the bottleneck is replaced with a 3x3 Conv.
- A decoupled head is used and the objectness branch is removed.
YOLOv3's backbone was DarkNet53; YOLOv8 is a newer Ultralytics model that adopts a new backbone network, a new anchor-free split head and new loss functions, improving on speed, size and accuracy.
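For reference, launching this kind of segmentation training through the public Ultralytics API only takes a few lines; the arguments below mirror the run configuration that appears in the training log later in this document (yolov8m-seg.pt, cityspaces.yaml, imgsz 640, batch 6, devices 2 and 3).

```python
from ultralytics import YOLO

model = YOLO("yolov8m-seg.pt")  # pretrained segmentation weights
model.train(data="cityspaces.yaml", epochs=100, imgsz=640, batch=6, device="2,3")
```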
yolop
https://cloud.tencent.com/developer/article/2115041
YOLOP is a panoptic driving-perception network: an efficient multi-task network in which a single model handles three key autonomous-driving tasks at once -- traffic-object detection, drivable-area segmentation and lane detection -- reducing inference time while improving the performance of each task and saving a significant amount of compute.
YOLOP's two segmentation heads are drivable-area segmentation and lane-line detection.
From the YOLOP paper: we propose a simple and efficient feed-forward network that can accomplish traffic-object detection, drivable-area segmentation and lane detection simultaneously. As shown in Figure 2, the panoptic driving-perception single-shot network, called YOLOP, contains one shared encoder and three subsequent decoders that solve the specific tasks. There are no complex or redundant shared blocks between the different decoders, which reduces computational consumption and allows the network to be trained end-to-end easily.
3.1 Encoder. The network shares one encoder, composed of a backbone network and a neck network.
3.1.1 Backbone. The backbone extracts features from the input image. Usually, a classic image-classification network serves as the backbone. Because of YOLOv4's [1] excellent performance on object detection, CSPDarknet [26] is chosen as the backbone; it solves the problem of gradient duplication during optimization [27], and supports feature propagation and feature reuse, which reduces the number of parameters and computations and therefore helps guarantee real-time performance.
3.1.2 Neck. The neck fuses the features generated by the backbone. It consists mainly of a Spatial Pyramid Pooling (SPP) module [8] and a Feature Pyramid Network (FPN) module [11]. SPP generates and fuses features of different scales, and FPN fuses features at different semantic levels, so the resulting features contain information at multiple scales and multiple semantic levels. Concatenation is used to fuse the features.
The three heads of the network are the task-specific decoders for the three tasks.
{0: ‘person’, 1: ‘bicycle’, 2: ‘car’, 3: ‘motorcycle’, 4: ‘airplane’, 5: ‘bus’, 6: ‘train’, 7: ‘truck’, 8: ‘boat’, 9: ‘traffic light’, 10: ‘fire hydrant’, 11: ‘stop sign’, 12: ‘parking meter’, 13: ‘bench’, 14: ‘bird’, 15: ‘cat’, 16: ‘dog’, 17: ‘horse’, 18: ‘sheep’, 19: ‘cow’, 20: ‘elephant’, 21: ‘bear’, 22: ‘zebra’, 23: ‘giraffe’, 24: ‘backpack’, 25: ‘umbrella’, 26: ‘handbag’, 27: ‘tie’, 28: ‘suitcase’, 29: ‘frisbee’, 30: ‘skis’, 31: ‘snowboard’, 32: ‘sports ball’, 33: ‘kite’, 34: ‘baseball bat’, 35: ‘baseball glove’, 36: ‘skateboard’, 37: ‘surfboard’, 38: ‘tennis racket’, 39: ‘bottle’, 40: ‘wine glass’, 41: ‘cup’, 42: ‘fork’, 43: ‘knife’, 44: ‘spoon’, 45: ‘bowl’, 46: ‘banana’, 47: ‘apple’, 48: ‘sandwich’, 49: ‘orange’, 50: ‘broccoli’, 51: ‘carrot’, 52: ‘hot dog’, 53: ‘pizza’, 54: ‘donut’, 55: ‘cake’, 56: ‘chair’, 57: ‘couch’, 58: ‘potted plant’, 59: ‘bed’, 60: ‘dining table’, 61: ‘toilet’, 62: ‘tv’, 63: ‘laptop’, 64: ‘mouse’, 65: ‘remote’, 66: ‘keyboard’, 67: ‘cell phone’, 68: ‘microwave’, 69: ‘oven’, 70: ‘toaster’, 71: ‘sink’, 72: ‘refrigerator’, 73: ‘book’, 74: ‘clock’, 75: ‘vase’, 76: ‘scissors’, 77: ‘teddy bear’, 78: ‘hair drier’, 79: ‘toothbrush’}
YOLOv8 training code, trained on the Cityscapes dataset
# Ultralytics YOLO 🚀, GPL-3.0 license
from copy import copy

import torch
import torch.nn.functional as F
import sys

sys.path.append("/home/shenlan08/lihanlin_shijian/ultralytics")
# print(sys.path)
from ultralytics.nn.tasks import SegmentationModel
from ultralytics.yolo import v8
from ultralytics.yolo.utils import DEFAULT_CFG, RANK, IterableSimpleNamespace, yaml_load

DEFAULT_CFG_DICT = yaml_load('/home/shenlan08/lihanlin_shijian/ultralytics/ultralytics/yolo/cfg/default.yaml')
DEFAULT_CFG = IterableSimpleNamespace(**DEFAULT_CFG_DICT)

from ultralytics.yolo.utils.ops import crop_mask, xyxy2xywh
from ultralytics.yolo.utils.plotting import plot_images, plot_results
from ultralytics.yolo.utils.tal import make_anchors
from ultralytics.yolo.utils.torch_utils import de_parallel
from ultralytics.yolo.v8.detect.train import Loss

This code appears to be the class definition of the segmentation trainer, which inherits from the detection trainer class. The __init__ method initializes the class from the configuration dict plus any overrides and sets the task to 'segment'. get_model creates a segmentation model and loads pretrained weights if provided. get_validator returns a segmentation validator object for use during training. criterion computes the segmentation loss with the SegLoss function. plot_training_samples visualizes training samples by plotting the input images, masks, class labels, bounding boxes and file paths. plot_metrics plots the training metrics, saved in a CSV file, for the segmentation task.

# BaseTrainer python usage
class SegmentationTrainer(v8.detect.DetectionTrainer):

    def __init__(self, cfg=DEFAULT_CFG, overrides=None):
        if overrides is None:
            overrides = {}
        overrides['task'] = 'segment'
        super().__init__(cfg, overrides)

    def get_model(self, cfg=None, weights=None, verbose=True):
        model = SegmentationModel(cfg, ch=3, nc=self.data['nc'], verbose=verbose and RANK == -1)
        if weights:
            model.load(weights)
        return model

    def get_validator(self):
        self.loss_names = 'box_loss', 'seg_loss', 'cls_loss', 'dfl_loss'
        return v8.segment.SegmentationValidator(self.test_loader, save_dir=self.save_dir, args=copy(self.args))

    def criterion(self, preds, batch):
        if not hasattr(self, 'compute_loss'):
            self.compute_loss = SegLoss(de_parallel(self.model), overlap=self.args.overlap_mask)
        return self.compute_loss(preds, batch)

    def plot_training_samples(self, batch, ni):
        images = batch['img']
        masks = batch['masks']
        cls = batch['cls'].squeeze(-1)
        bboxes = batch['bboxes']
        paths = batch['im_file']
        batch_idx = batch['batch_idx']
        plot_images(images, batch_idx, cls, bboxes, masks, paths=paths, fname=self.save_dir / f'train_batch{ni}.jpg')

    def plot_metrics(self):
        plot_results(file=self.csv, segment=True)  # save results.png

This code defines the SegLoss class, a subclass of Loss. __init__ initializes the class with the segmentation model and a flag indicating whether mask overlap is computed. __call__ computes the segmentation loss from the predicted and ground-truth masks and boxes: it first splits the predicted features into the predicted distribution, predicted scores and predicted masks, then builds the targets, which consist of the ground-truth labels and ground-truth boxes preprocessed for the given image size. The predicted boxes are then decoded, and predicted and ground-truth scores are assigned to each other to compute the classification loss. If a foreground mask exists, the bounding-box and segmentation losses are computed as well. Finally the losses are multiplied by their respective gains, summed and returned. The single_mask_loss method computes the binary cross-entropy loss between the predicted and ground-truth masks of one image.

# Criterion class for computing training losses
class SegLoss(Loss):def __init__(self, model, overlap=True): # model must be de-paralleledsuper().__init__(model)self.nm = model.model[22].nm # number of masksself.overlap = overlapdef __call__(self, preds, batch):loss = torch.zeros(4, device=self.device) # box, cls, dflfeats, pred_masks, proto = preds if len(preds) == 3 else preds[1]batch_size, _, mask_h, mask_w = proto.shape # batch size, number of masks, mask height, mask widthpred_distri, pred_scores = torch.cat([xi.view(feats[0].shape[0], self.no, -1) for xi in feats], 2).split((self.reg_max * 4, self.nc), 1)# b, grids, ..pred_scores = pred_scores.permute(0, 2, 1).contiguous()pred_distri = pred_distri.permute(0, 2, 1).contiguous()pred_masks = pred_masks.permute(0, 2, 1).contiguous()dtype = pred_scores.dtypeimgsz = torch.tensor(feats[0].shape[2:], device=self.device, dtype=dtype) * self.stride[0] # image size (h,w)anchor_points, stride_tensor = make_anchors(feats, self.stride, 0.5)# targetstry:batch_idx = batch['batch_idx'].view(-1, 1)targets = torch.cat((batch_idx, batch['cls'].view(-1, 1), batch['bboxes']), 1)targets = self.preprocess(targets.to(self.device), batch_size, scale_tensor=imgsz[[1, 0, 1, 0]])gt_labels, gt_bboxes = targets.split((1, 4), 2) # cls, xyxymask_gt = gt_bboxes.sum(2, keepdim=True).gt_(0)except RuntimeError as e:raise TypeError('ERROR ? segment dataset incorrectly formatted or not a segment dataset.\n'"This error can occur when incorrectly training a 'segment' model on a 'detect' dataset, ""i.e. 'yolo train model=yolov8n-seg.pt data=coco128.yaml'.\nVerify your dataset is a ""correctly formatted 'segment' dataset using 'data=coco128-seg.yaml' "'as an example.\nSee https://docs.ultralytics.com/tasks/segmentation/ for help.') from e# pboxespred_bboxes = self.bbox_decode(anchor_points, pred_distri) # xyxy, (b, h*w, 4)_, target_bboxes, target_scores, fg_mask, target_gt_idx = self.assigner(pred_scores.detach().sigmoid(), (pred_bboxes.detach() * stride_tensor).type(gt_bboxes.dtype),anchor_points * stride_tensor, gt_labels, gt_bboxes, mask_gt)target_scores_sum = max(target_scores.sum(), 1)# cls loss# loss[1] = self.varifocal_loss(pred_scores, target_scores, target_labels) / target_scores_sum # VFL wayloss[2] = self.bce(pred_scores, target_scores.to(dtype)).sum() / target_scores_sum # BCEif fg_mask.sum():# bbox lossloss[0], loss[3] = self.bbox_loss(pred_distri, pred_bboxes, anchor_points, target_bboxes / stride_tensor,target_scores, target_scores_sum, fg_mask)# masks lossmasks = batch['masks'].to(self.device).float()if tuple(masks.shape[-2:]) != (mask_h, mask_w): # downsamplemasks = F.interpolate(masks[None], (mask_h, mask_w), mode='nearest')[0]for i in range(batch_size):if fg_mask[i].sum():mask_idx = target_gt_idx[i][fg_mask[i]]if self.overlap:gt_mask = torch.where(masks[[i]] == (mask_idx + 1).view(-1, 1, 1), 1.0, 0.0)else:gt_mask = masks[batch_idx.view(-1) == i][mask_idx]xyxyn = target_bboxes[i][fg_mask[i]] / imgsz[[1, 0, 1, 0]]marea = xyxy2xywh(xyxyn)[:, 2:].prod(1)mxyxy = xyxyn * torch.tensor([mask_w, mask_h, mask_w, mask_h], device=self.device)loss[1] += self.single_mask_loss(gt_mask, pred_masks[i][fg_mask[i]], proto[i], mxyxy,marea) # seg loss# WARNING: Uncomment lines below in case of Multi-GPU DDP unused gradient errors# else:# loss[1] += proto.sum() * 0 + pred_masks.sum() * 0# else:# loss[1] += proto.sum() * 0 + pred_masks.sum() * 0loss[0] *= self.hyp.box # box gainloss[1] *= self.hyp.box / batch_size # seg gainloss[2] *= self.hyp.cls # cls gainloss[3] *= self.hyp.dfl # dfl gainreturn loss.sum() * batch_size, 
loss.detach()  # loss(box, cls, dfl)

    def single_mask_loss(self, gt_mask, pred, proto, xyxy, area):
        # Mask loss for one image
        pred_mask = (pred @ proto.view(self.nm, -1)).view(-1, *proto.shape[1:])  # (n, 32) @ (32,80,80) -> (n,80,80)
        loss = F.binary_cross_entropy_with_logits(pred_mask, gt_mask, reduction='none')
        return (crop_mask(loss, xyxy).mean(dim=(1, 2)) / area).mean()

This code defines the train function, which can be used to train a segmentation model with Ultralytics' YOLO library. The function accepts a configuration object specifying the model, dataset and device to use. If use_python is set to True, the function trains the model through YOLO's Python API; otherwise it creates an instance of SegmentationTrainer and trains the model with its train method. SegmentationTrainer is a wrapper around torch.nn.Module that provides the extra functionality needed to train a segmentation model with YOLO.

def train(cfg=DEFAULT_CFG, use_python=False):
    model = cfg.model or 'best(1).pt'
    data = cfg.data or 'cityspaces.yaml'  # or yolo.ClassificationDataset("mnist")
    device = cfg.device if cfg.device is not None else '2'
    args = dict(model=model, data=data, device=device)
    if use_python:
        from ultralytics import YOLO
        YOLO(model).train(**args)
    else:
        trainer = SegmentationTrainer(overrides=args)
        trainer.train()


if __name__ == '__main__':
    train()
YOLOv8 training output
root@notebook-rn-20230301115620425bi81-k2w3o-0:/home/shenlan08/lihanlin_shijian/last_task/ultralytics# /root/miniconda3/bin/python /home/shenlan08/lihanlin_shijian/last_task/ultralytics/ultralytics/yolo/v8/segment/train.py
WARNING ?? Ultralytics settings reset to defaults. This is normal and may be due to a recent ultralytics package update, but may have overwritten previous settings.
View and update settings with 'yolo settings' or at '/root/.config/Ultralytics/settings.yaml'
New https://pypi.org/project/ultralytics/8.0.81 available 😃 Update with 'pip install -U ultralytics'
Ultralytics YOLOv8.0.61 🚀 Python-3.8.5 torch-1.8.0+cu111 CUDA:2 (GeForce RTX 2080 Ti, 11019MiB)CUDA:3 (GeForce RTX 2080 Ti, 11019MiB)
yolo/engine/trainer: task=segment, mode=train, model=yolov8m-seg.pt, data=cityspaces.yaml, epochs=100, patience=50, batch=6, imgsz=640, save=True, save_period=-1, cache=False, device=2,3, workers=8, project=None, name=None, exist_ok=False, pretrained=False, optimizer=SGD, verbose=True, seed=0, deterministic=True, single_cls=False, image_weights=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=True, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, show=False, save_txt=False, save_conf=False, save_crop=False, hide_labels=False, hide_conf=False, vid_stride=1, line_thickness=3, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, boxes=True, format=torchscript, keras=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=None, workspace=4, nms=False, lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=7.5, cls=0.5, dfl=1.5, fl_gamma=0.0, label_smoothing=0.0, nbs=64, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, copy_paste=0.0, cfg=None, v5loader=False, tracker=botsort.yaml, save_dir=/home/shenlan08/lihanlin_shijian/last_task/ultralytics/runs/segment/train6
Overriding model.yaml nc=80 with nc=18from n params module arguments 0 -1 1 1392 ultralytics.nn.modules.Conv [3, 48, 3, 2] 1 -1 1 41664 ultralytics.nn.modules.Conv [48, 96, 3, 2] 2 -1 2 111360 ultralytics.nn.modules.C2f [96, 96, 2, True] 3 -1 1 166272 ultralytics.nn.modules.Conv [96, 192, 3, 2] 4 -1 4 813312 ultralytics.nn.modules.C2f [192, 192, 4, True] 5 -1 1 664320 ultralytics.nn.modules.Conv [192, 384, 3, 2] 6 -1 4 3248640 ultralytics.nn.modules.C2f [384, 384, 4, True] 7 -1 1 1991808 ultralytics.nn.modules.Conv [384, 576, 3, 2] 8 -1 2 3985920 ultralytics.nn.modules.C2f [576, 576, 2, True] 9 -1 1 831168 ultralytics.nn.modules.SPPF [576, 576, 5] 10 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest'] 11 [-1, 6] 1 0 ultralytics.nn.modules.Concat [1] 12 -1 2 1993728 ultralytics.nn.modules.C2f [960, 384, 2] 13 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest'] 14 [-1, 4] 1 0 ultralytics.nn.modules.Concat [1] 15 -1 2 517632 ultralytics.nn.modules.C2f [576, 192, 2] 16 -1 1 332160 ultralytics.nn.modules.Conv [192, 192, 3, 2] 17 [-1, 12] 1 0 ultralytics.nn.modules.Concat [1] 18 -1 2 1846272 ultralytics.nn.modules.C2f [576, 384, 2] 19 -1 1 1327872 ultralytics.nn.modules.Conv [384, 384, 3, 2] 20 [-1, 9] 1 0 ultralytics.nn.modules.Concat [1] 21 -1 2 4207104 ultralytics.nn.modules.C2f [960, 576, 2] 22 [15, 18, 21] 1 5169446 ultralytics.nn.modules.Segment [18, 32, 192, [192, 384, 576]]
YOLOv8m-seg summary: 331 layers, 27250070 parameters, 27250054 gradients, 110.4 GFLOPsTransferred 531/537 items from pretrained weights
Running DDP command ['/root/miniconda3/bin/python', '-m', 'torch.distributed.launch', '--nproc_per_node', '2', '--master_port', '57941', '/home/shenlan08/lihanlin_shijian/last_task/ultralytics/ultralytics/yolo/v8/segment/train.py', 'task=segment', 'mode=train', 'model=yolov8m-seg.pt', 'data=cityspaces.yaml', 'epochs=100', 'patience=50', 'batch=6', 'imgsz=640', 'save=True', 'save_period=-1', 'cache=False', 'device=2,3', 'workers=8', 'project=None', 'name=None', 'exist_ok=False', 'pretrained=False', 'optimizer=SGD', 'verbose=True', 'seed=0', 'deterministic=True', 'single_cls=False', 'image_weights=False', 'rect=False', 'cos_lr=False', 'close_mosaic=10', 'resume=False', 'amp=True', 'overlap_mask=True', 'mask_ratio=4', 'dropout=0.0', 'val=True', 'split=val', 'save_json=False', 'save_hybrid=False', 'conf=None', 'iou=0.7', 'max_det=300', 'half=False', 'dnn=False', 'plots=True', 'source=None', 'show=False', 'save_txt=False', 'save_conf=False', 'save_crop=False', 'hide_labels=False', 'hide_conf=False', 'vid_stride=1', 'line_thickness=3', 'visualize=False', 'augment=False', 'agnostic_nms=False', 'classes=None', 'retina_masks=False', 'boxes=True', 'format=torchscript', 'keras=False', 'optimize=False', 'int8=False', 'dynamic=False', 'simplify=False', 'opset=None', 'workspace=4', 'nms=False', 'lr0=0.01', 'lrf=0.01', 'momentum=0.937', 'weight_decay=0.0005', 'warmup_epochs=3.0', 'warmup_momentum=0.8', 'warmup_bias_lr=0.1', 'box=7.5', 'cls=0.5', 'dfl=1.5', 'fl_gamma=0.0', 'label_smoothing=0.0', 'nbs=64', 'hsv_h=0.015', 'hsv_s=0.7', 'hsv_v=0.4', 'degrees=0.0', 'translate=0.1', 'scale=0.5', 'shear=0.0', 'perspective=0.0', 'flipud=0.0', 'fliplr=0.5', 'mosaic=1.0', 'mixup=0.0', 'copy_paste=0.0', 'cfg=None', 'v5loader=False', 'tracker=botsort.yaml']
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
New https://pypi.org/project/ultralytics/8.0.81 available 😃 Update with 'pip install -U ultralytics'
Overriding model.yaml nc=80 with nc=18
Transferred 531/537 items from pretrained weights
DDP settings: RANK 0, WORLD_SIZE 2, DEVICE cuda:0
TensorBoard: Start with 'tensorboard --logdir /home/shenlan08/lihanlin_shijian/last_task/ultralytics/runs/segment/train6', view at http://localhost:6006/
AMP: running Automatic Mixed Precision (AMP) checks with YOLOv8n...
AMP: checks passed ?
optimizer: SGD(lr=0.01) with parameter groups 86 weight(decay=0.0), 97 weight(decay=0.000515625), 96 bias
train: Scanning /home/shenlan08/lihanlin_shijian/cityspaces/labels/train... 2975 images, 0 backgrounds, 3 corrupt: 100%|██████████| 2975/2975 [00:22<00:00, 129.88it/s]
train: WARNING ?? /home/shenlan08/lihanlin_shijian/cityspaces/images/train/aachen_000000_000019_leftImg8bit.png: 6 duplicate labels removed
train: WARNING ?? /home/shenlan08/lihanlin_shijian/cityspaces/images/train/aachen_000026_000019_leftImg8bit.png: 1 duplicate labels removed
train: WARNING ?? /home/shenlan08/lihanlin_shijian/cityspaces/images/train/aachen_000030_000019_leftImg8bit.png: 4 duplicate labels removed
train: WARNING ?? /home/shenlan08/lihanlin_shijian/cityspaces/images/train/aachen_000031_000019_leftImg8bit.png: 1 duplicate labels removed
train: WARNING ?? /home/shenlan08/lihanlin_shijian/cityspaces/images/train/aachen_000034_000019_leftImg8bit.png: 1 duplicate labels removed
train: WARNING ?? /home/shenlan08/lihanlin_shijian/cityspaces/images/train/aachen_000042_000019_leftImg8bit.png: 3 duplicate labels removed
train: WARNING ?? /home/shenlan08/lihanlin_shijian/cityspaces/images/train/aachen_000054_000019_leftImg8bit.png: 1 duplicate labels removed
train: WARNING ?? /home/shenlan08/lihanlin_shijian/cityspaces/images/train/aachen_000058_000019_leftImg8bit.png: 1 duplicate labels removed
train: WARNING ?? /home/shenlan08/lihanlin_shijian/cityspaces/images/train/aachen_000060_000019_leftImg8bit.png: 24 duplicate labels removed
train: WARNING ?? /home/shenlan08/lihanlin_shijian/cityspaces/images/train/aachen_000061_000019_leftImg8bit.png: 40 duplicate labels removed
train: WARNING ?? /home/shenlan08/lihanlin_shijian/cityspaces/images/train/aachen_000062_000019_leftImg8bit.png: 9 duplicate labels removed
train: WARNING ?? /home/shenlan08/lihanlin_shijian/cityspaces/images/train/aachen_000064_000019_leftImg8bit.png: 1 duplicate labels removed
train: WARNING ?? /home/shenlan08/lihanlin_shijian/cityspaces/images/train/aachen_000070_000019_leftImg8bit.png: 1 duplicate labels removed
train: WARNING ?? /home/shenlan08/lihanlin_shijian/cityspaces/images/train/aachen_000072_000019_leftImg8bit.png: 1 duplicate labels removed
train: WARNING ?? /home/shenlan08/lihanlin_shijian/cityspaces/images/train/aachen_000078_000019_leftImg8bit.png: 4 duplicate labels removed
train: WARNING ?? /home/shenlan08/lihanlin_shijian/cityspaces/images/train/aachen_000079_000019_leftImg8bit.png: 5 duplicate labels removed
train: WARNING ?? /home/shenlan08/lihanlin_shijian/cityspaces/images/train/aachen_000080_000019_leftImg8bit.png: 45 duplicate labels removed
train: WARNING ?? /home/shenlan08/lihanlin_shijian/cityspaces/images/train/aachen_000090_000019_leftImg8bit.png: 2 duplicate labels removed
train: WARNING ?? /home/shenlan08/lihanlin_shijian/cityspaces/images/train/aachen_000092_000019_leftImg8bit.png: 1 duplicate labels removed
train: WARNING ?? /home/shenlan08/lihanlin_shijian/cityspaces/images/train/hanover_000000_020655_leftImg8bit.png: 1 duplicate labels removed
train: WARNING ?? /home/shenlan08/lihanlin_shijian/cityspaces/images/train/hanover_000000_022645_leftImg8bit.png: 4 duplicate labels removed
train: WARNING ?? /home/shenlan08/lihanlin_shijian/cityspaces/images/train/hanover_000000_026356_leftImg8bit.png: 1 duplicate labels removed
train: WARNING ?? /home/shenlan08/lihanlin_shijian/cityspaces/images/train/hanover_000000_034015_leftImg8bit.png: 2 duplicate labels removed
train: WARNING ?? /home/shenlan08/lihanlin_shijian/cityspaces/images/train/hanover_000000_034141_leftImg8bit.png: 2 duplicate labels removed
train: WARNING ?? /home/shenlan08/lihanlin_shijian/cityspaces/images/train/hanover_000000_036051_leftImg8bit.png: 1 duplicate labels removed
train: WARNING ?? /home/shenlan08/lihanlin_shijian/cityspaces/images/train/hanover_000000_038927_leftImg8bit.png: 2 duplicate labels removed
train: WARNING ?? /home/shenlan08/lihanlin_shijian/cityspaces/images/train/hanover_000000_043102_leftImg8bit.png: 1 duplicate labels removed
train: WARNING ?? /home/shenlan08/lihanlin_shijian/cityspaces/images/train/hanover_000000_047499_leftImg8bit.png: 1 duplicate labels removed
train: WARNING ?? /home/shenlan08/lihanlin_shijian/cityspaces/images/train/hanover_000000_047870_leftImg8bit.png: 1 duplicate labels removed
train: WARNING ?? /home/shenlan08/lihanlin_shijian/cityspaces/images/train/hanover_000000_051152_leftImg8bit.png: 2 duplicate labels removed
train: WARNING ?? /home/shenlan08/lihanlin_shijian/cityspaces/images/train/hanover_000000_051271_leftImg8bit.png: 2 duplicate labels removed
train: WARNING ?? /home/shenlan08/lihanlin_shijian/cityspaces/images/train/hanover_000000_051536_leftImg8bit.png: 2 duplicate labels removed
train: WARNING ?? /home/shenlan08/lihanlin_shijian/cityspaces/images/train/ulm_000056_000019_leftImg8bit.png: 2 duplicate labels removed
train: WARNING ?? /home/shenlan08/lihanlin_shijian/cityspaces/images/train/ulm_000058_000019_leftImg8bit.png: 3 duplicate labels removed
train: WARNING ?? /home/shenlan08/lihanlin_shijian/cityspaces/images/train/ulm_000067_000019_leftImg8bit.png: 3 duplicate labels removed
train: WARNING ?? /home/shenlan08/lihanlin_shijian/cityspaces/images/train/ulm_000068_000019_leftImg8bit.png: 3 duplicate labels removed
train: WARNING ?? /home/shenlan08/lihanlin_shijian/cityspaces/images/train/ulm_000069_000019_leftImg8bit.png: 3 duplicate labels removed
train: WARNING ?? /home/shenlan08/lihanlin_shijian/cityspaces/images/train/ulm_000078_000019_leftImg8bit.png: 1 duplicate labels removed
train: WARNING ?? /home/shenlan08/lihanlin_shijian/cityspaces/images/train/ulm_000079_000019_leftImg8bit.png: 1 duplicate labels removed
train: WARNING ?? /home/shenlan08/lihanlin_shijian/cityspaces/images/train/ulm_000084_000019_leftImg8bit.png: 2 duplicate labels removed
train: WARNING ?? /home/shenlan08/lihanlin_shijian/cityspaces/images/train/weimar_000013_000019_leftImg8bit.png: ignoring corrupt image/label: non-normalized or out of bounds coordinates [ 1.001]
train: WARNING ?? /home/shenlan08/lihanlin_shijian/cityspaces/images/train/weimar_000024_000019_leftImg8bit.png: 1 duplicate labels removed
train: WARNING ?? /home/shenlan08/lihanlin_shijian/cityspaces/images/train/zurich_000027_000019_leftImg8bit.png: 1 duplicate labels removed
train: WARNING ?? /home/shenlan08/lihanlin_shijian/cityspaces/images/train/zurich_000107_000019_leftImg8bit.png: 1 duplicate labels removed
train: New cache created: /home/shenlan08/lihanlin_shijian/cityspaces/labels/train.cache
val: Scanning /home/shenlan08/lihanlin_shijian/cityspaces/labels/val... 500 images, 0 backgrounds, 3 corrupt: 100%|██████████| 500/500 [00:04<00:00, 113.83it/s]
val: WARNING ?? /home/shenlan08/lihanlin_shijian/cityspaces/images/val/frankfurt_000001_016273_leftImg8bit.png: 1 duplicate labels removed
val: WARNING ?? /home/shenlan08/lihanlin_shijian/cityspaces/images/val/frankfurt_000001_017101_leftImg8bit.png: 1 duplicate labels removed
val: WARNING ?? /home/shenlan08/lihanlin_shijian/cityspaces/images/val/frankfurt_000001_028232_leftImg8bit.png: 1 duplicate labels removed
val: WARNING ?? /home/shenlan08/lihanlin_shijian/cityspaces/images/val/frankfurt_000001_044787_leftImg8bit.png: 1 duplicate labels removed
val: WARNING ?? /home/shenlan08/lihanlin_shijian/cityspaces/images/val/frankfurt_000001_046272_leftImg8bit.png: 1 duplicate labels removed
val: WARNING ?? /home/shenlan08/lihanlin_shijian/cityspaces/images/val/frankfurt_000001_077434_leftImg8bit.png: 1 duplicate labels removed
val: WARNING ?? /home/shenlan08/lihanlin_shijian/cityspaces/images/val/lindau_000000_000019_leftImg8bit.png: ignoring corrupt image/label: non-normalized or out of bounds coordinates [ 1.0034]
val: WARNING ?? /home/shenlan08/lihanlin_shijian/cityspaces/images/val/lindau_000013_000019_leftImg8bit.png: ignoring corrupt image/label: non-normalized or out of bounds coordinates [ 1.0005]
val: WARNING ?? /home/shenlan08/lihanlin_shijian/cityspaces/images/val/lindau_000018_000019_leftImg8bit.png: ignoring corrupt image/label: non-normalized or out of bounds coordinates [ 1.001]
val: WARNING ?? /home/shenlan08/lihanlin_shijian/cityspaces/images/val/munster_000015_000019_leftImg8bit.png: 1 duplicate labels removed
val: WARNING ?? /home/shenlan08/lihanlin_shijian/cityspaces/images/val/munster_000026_000019_leftImg8bit.png: 2 duplicate labels removed
val: WARNING ?? /home/shenlan08/lihanlin_shijian/cityspaces/images/val/munster_000029_000019_leftImg8bit.png: 1 duplicate labels removed
val: WARNING ?? /home/shenlan08/lihanlin_shijian/cityspaces/images/val/munster_000049_000019_leftImg8bit.png: 1 duplicate labels removed
val: WARNING ?? /home/shenlan08/lihanlin_shijian/cityspaces/images/val/munster_000108_000019_leftImg8bit.png: 1 duplicate labels removed
val: WARNING ?? /home/shenlan08/lihanlin_shijian/cityspaces/images/val/munster_000140_000019_leftImg8bit.png: 1 duplicate labels removed
val: WARNING ?? /home/shenlan08/lihanlin_shijian/cityspaces/images/val/munster_000141_000019_leftImg8bit.png: 1 duplicate labels removed
val: WARNING ?? /home/shenlan08/lihanlin_shijian/cityspaces/images/val/munster_000142_000019_leftImg8bit.png: 1 duplicate labels removed
val: WARNING ?? /home/shenlan08/lihanlin_shijian/cityspaces/images/val/munster_000167_000019_leftImg8bit.png: 1 duplicate labels removed
val: New cache created: /home/shenlan08/lihanlin_shijian/cityspaces/labels/val.cache
Plotting labels to /home/shenlan08/lihanlin_shijian/last_task/ultralytics/runs/segment/train6/labels.jpg...
Image sizes 640 train, 640 val
Using 6 dataloader workers
Logging results to /home/shenlan08/lihanlin_shijian/last_task/ultralytics/runs/segment/train6
Starting training for 100 epochs...Epoch GPU_mem box_loss seg_loss cls_loss dfl_loss Instances Size1/100 5.21G 1.501 3.665 1.669 1.309 172 640: 100%|██████████| 496/496 [04:18<00:00, 1.92it/s]Class Images Instances Box(P R mAP50 mAP50-95) Mask(P R mAP50 mAP50-95): 100%|██████████| 83/83 [00:20<00:00, 4.07it/s]all 497 32583 0.511 0.367 0.369 0.22 0.446 0.295 0.274 0.115Epoch GPU_mem box_loss seg_loss cls_loss dfl_loss Instances Size2/100 7.09G 1.375 3.271 1.17 1.228 54 640: 100%|██████████| 496/496 [04:18<00:00, 1.92it/s]Class Images Instances Box(P R mAP50 mAP50-95) Mask(P R mAP50 mAP50-95): 100%|██████████| 83/83 [00:12<00:00, 6.72it/s]all 497 32583 0.602 0.354 0.396 0.232 0.508 0.286 0.293 0.125Epoch GPU_mem box_loss seg_loss cls_loss dfl_loss Instances Size3/100 3.41G 1.41 3.316 1.172 1.252 233 640: 6%|▌ | 30/496 [00:16<03:02, 2.55it/s]
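Once a run like the one above finishes (or is stopped), the saved weights can be validated or used for inference through the same API; the checkpoint path below is an assumption based on where Ultralytics normally writes best.pt for this run directory.

```python
from ultralytics import YOLO

model = YOLO("runs/segment/train6/weights/best.pt")   # hypothetical best-checkpoint path
metrics = model.val(data="cityspaces.yaml")           # reports box/mask mAP per class
results = model.predict("some_street_image.png")      # masks + boxes for one image
```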
YOLOP training code, trained on the BDD100K dataset
import argparse
import os, sys
import math
BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
sys.path.append(BASE_DIR)
#os.environ['RANK']='0'
#os.environ['WORLD_SIZE']='4'
#os.environ['MASTER_ADDR'] = 'localhost'
#os.environ['MASTER_PORT'] = '5678'
import pprint
import time
import torch
import torch.nn.parallel
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.cuda import amp
import torch.distributed as dist
import torch.backends.cudnn as cudnn
import torch.optim
import torch.utils.data
import torch.utils.data.distributed
import torchvision.transforms as transforms
import numpy as np
from lib.utils import DataLoaderX, torch_distributed_zero_first
from tensorboardX import SummaryWriter

import lib.dataset as dataset
from lib.config import cfg
from lib.config import update_config
from lib.core.loss import get_loss
from lib.core.function import train
from lib.core.function import validate
from lib.core.general import fitness
from lib.models import get_net
from lib.utils import is_parallel
from lib.utils.utils import get_optimizer
from lib.utils.utils import save_checkpoint
from lib.utils.utils import create_logger, select_device
from lib.utils import run_anchor

This is the function that defines and parses command-line arguments for the script. It uses the argparse module to define and parse the arguments passed to the script at run time. Several arguments are defined, including the directories for the model, logs and data, a flag for using SyncBatchNorm, an object-confidence threshold, and the IoU threshold for non-maximum suppression (NMS). parse_args() creates a parser object with argparse.ArgumentParser, adds the defined arguments to it, then parses the command-line arguments and returns them as an args object. This simplifies passing arguments to the script and makes it easy to customise its behaviour.

def parse_args():
    parser = argparse.ArgumentParser(description='Train Multitask network')
    # general
    # parser.add_argument('--cfg',
    #                     help='experiment configure file name',
    #                     required=True,
    #                     type=str)
    # philly
    parser.add_argument('--modelDir', help='model directory', type=str, default='')
    parser.add_argument('--logDir', help='log directory', type=str, default='runs/')
    parser.add_argument('--dataDir', help='data directory', type=str, default='')
    parser.add_argument('--prevModelDir', help='prev Model directory', type=str, default='')
    parser.add_argument('--sync-bn', action='store_true', help='use SyncBatchNorm, only available in DDP mode')
    parser.add_argument('--local_rank', type=int, default=-1, help='DDP parameter, do not modify')
    parser.add_argument('--conf-thres', type=float, default=0.001, help='object confidence threshold')
    parser.add_argument('--iou-thres', type=float, default=0.6, help='IOU threshold for NMS')
    args = parser.parse_args()
    return args


def main():
此代碼塊設置腳本的配置并初始化 DDP(分布式數據并行)變量。它首先調用parse_args()函數來解析命令行參數并相應地更新配置文件。接下來,它通過設置和基于環境變量來初始化DDP 變量。如果未設置這些變量,則設置為 1 并設置為 -1。world_sizeglobal_rankworld_sizeglobal_rank該腳本然后創建一個記錄器并設置 tensorboard 日志目錄。如果等級為 -1 或 0,則該函數將參數和配置打印到記錄器并為張量板編寫器設置字典。最后,它設置了 cudnn 相關設置。總的來說,此代碼塊為分布式訓練設置了必要的配置和DDP變量,并設置了用于監控訓練過程的記錄器和張量板編寫器。# set all the configurationsargs = parse_args()update_config(cfg, args)# Set DDP variablesworld_size = int(os.environ['WORLD_SIZE']) if 'WORLD_SIZE' in os.environ else 1global_rank = int(os.environ['RANK']) if 'RANK' in os.environ else -1# dist.init_process_group("nccl")# rank = dist.get_rank()# print(f"Start running basic DDP example on rank {rank}.")rank = global_rank#print(rank)# TODO: handle distributed training logger# set the logger, tb_log_dir means tensorboard logdirlogger, final_output_dir, tb_log_dir = create_logger(cfg, cfg.LOG_DIR, 'train', rank=rank)if rank in [-1, 0]:logger.info(pprint.pformat(args))logger.info(cfg)writer_dict = {'writer': SummaryWriter(log_dir=tb_log_dir),'train_global_steps': 0,'valid_global_steps': 0,}else:writer_dict = None此代碼與CUDA相關設置并使用函數構建模型get_net()。將用于訓練的get_net()函數神經網絡模型。接下來,它定義了用于訓練的損失函數和優化器。和函數用于根據配置獲得適當的損失函數和優化器get_loss()。get_optimizer()然后,它使用實現余弦學習率退火策略的lambda 函數設置學習率調度程序。開始紀元被設置為配置中指定的值。如果等級為 -1 或 0,則該函數會嘗試從日志目錄加載檢查點模型。如果在配置中指定了預訓練模型路徑,則會加載預訓練模型。model使用函數將檢查點模型或預訓練模型加載到對象中load_state_dict()。最后,它初始化用于存儲最佳性能、最佳模型和最后一個紀元的變量。# cudnn related settingcudnn.benchmark = cfg.CUDNN.BENCHMARKtorch.backends.cudnn.deterministic = cfg.CUDNN.DETERMINISTICtorch.backends.cudnn.enabled = cfg.CUDNN.ENABLED# bulid up model# start_time = time.time()print("begin to bulid up model...")# DP modedevice = select_device(logger, batch_size=cfg.TRAIN.BATCH_SIZE_PER_GPU* len(cfg.GPUS)) if not cfg.DEBUG \else select_device(logger, 'cpu')# device_id = rank % torch.cuda.device_count()if args.local_rank != -1:assert torch.cuda.device_count() > args.local_ranktorch.cuda.set_device(args.local_rank)device = torch.device('cuda', args.local_rank)dist.init_process_group(backend='nccl', init_method='env://', world_size=world_size, rank=rank) # distributed backendprint("load model to device")model = get_net(cfg).to(device)# model = DDP(model, device_ids=[device_id])# print("load finished")#model = model.to(device)# print("finish build model")# define loss function (criterion) and optimizercriterion = get_loss(cfg, device=device)optimizer = get_optimizer(cfg, model)# load checkpoint modelbest_perf = 0.0best_model = Falselast_epoch = -1Encoder_para_idx = [str(i) for i in range(0, 17)]Det_Head_para_idx = [str(i) for i in range(17, 25)]Da_Seg_Head_para_idx = [str(i) for i in range(25, 34)]Ll_Seg_Head_para_idx = [str(i) for i in range(34,43)]lf = lambda x: ((1 + math.cos(x * math.pi / cfg.TRAIN.END_EPOCH)) / 2) * \(1 - cfg.TRAIN.LRF) + cfg.TRAIN.LRF # cosinelr_scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lf)begin_epoch = cfg.TRAIN.BEGIN_EPOCHif rank in [-1, 0]:checkpoint_file = os.path.join(os.path.join(cfg.LOG_DIR, cfg.DATASET.DATASET), 'checkpoint.pth')if os.path.exists(cfg.MODEL.PRETRAINED):logger.info("=> loading model '{}'".format(cfg.MODEL.PRETRAINED))checkpoint_state_dict = torch.load(cfg.MODEL.PRETRAINED) # begin_epoch = checkpoint['epoch']# # best_perf = checkpoint['perf']# last_epoch = checkpoint['epoch']# model.load_state_dict(checkpoint['state_dict'])# checkpoint1 = checkpoint['state_dict'] #分割類別變化了,會影響模型參數加載# -------begin看起來這段代碼是用于訓練神經網絡模型的PyTorch 腳本的一部分。以下是代碼作用的簡要總結:該腳本為模型加載一個預訓練的檢查點,其中包括神經網絡的權重和優化器狀態。
It checks whether the shape of each layer in the checkpoint matches the shape of the corresponding layer in the current model; mismatched layers are removed from the checkpoint.
It loads the checkpoint into the model, either for the whole model or only for specific branches, depending on the configuration.
It freezes certain layers of the model according to the configuration.
If multiple GPUs are available, it trains in parallel across them using DataParallel or DistributedDataParallel.
值得注意的是,用于凍結層和加載模型不同分支的特定配置選項可能特定于正在訓練的特定模型,因此如果沒有額外的上下文,很難詳細說明這段代碼的作用。 checkpoint_state_dict = checkpoint_state_dict['model']print(checkpoint_state_dict)checkpoint_state_dict = checkpoint_state_dict.float().state_dict()model_state_dict = model.state_dict()for k in list(checkpoint_state_dict.keys()):if k in model_state_dict:shape_model = tuple(model_state_dict[k].shape)shape_checkpoint = tuple(checkpoint_state_dict[k].shape)if shape_model != shape_checkpoint:# incorrect_shapes.append((k, shape_checkpoint, shape_model))checkpoint_state_dict.pop(k)print(k, shape_model, shape_checkpoint)else:print(k, ' layer is missing!')model.load_state_dict(checkpoint_state_dict, strict=False)freeze = [f'model.{x}.' for x in range(15)] # layers to freeze for k, v in model.named_parameters(): v.requires_grad = True # train all layers if any(x in k for x in freeze): print(f'freezing {k}') v.requires_grad = False# optimizer.load_state_dict(checkpoint['optimizer'])logger.info(f"=> loaded checkpoint '{cfg.MODEL.PRETRAINED}'")#cfg.NEED_AUTOANCHOR = False #disable autoanchorif os.path.exists(cfg.MODEL.PRETRAINED_DET):logger.info("=> loading model weight in det branch from '{}'".format(cfg.MODEL.PRETRAINED))det_idx_range = [str(i) for i in range(0,25)]model_dict = model.state_dict()checkpoint_file = cfg.MODEL.PRETRAINED_DETcheckpoint = torch.load(checkpoint_file)begin_epoch = checkpoint['epoch']last_epoch = checkpoint['epoch']checkpoint_dict = {k: v for k, v in checkpoint['state_dict'].items() if k.split(".")[1] in det_idx_range}model_dict.update(checkpoint_dict)model.load_state_dict(model_dict)logger.info("=> loaded det branch checkpoint '{}' ".format(checkpoint_file))if cfg.AUTO_RESUME and os.path.exists(checkpoint_file):logger.info("=> loading checkpoint '{}'".format(checkpoint_file))checkpoint = torch.load(checkpoint_file)begin_epoch = checkpoint['epoch']# best_perf = checkpoint['perf']last_epoch = checkpoint['epoch']model.load_state_dict(checkpoint['state_dict'])# optimizer = get_optimizer(cfg, model)optimizer.load_state_dict(checkpoint['optimizer'])logger.info("=> loaded checkpoint '{}' (epoch {})".format(checkpoint_file, checkpoint['epoch']))#cfg.NEED_AUTOANCHOR = False #disable autoanchor# model = model.to(device)if cfg.TRAIN.SEG_ONLY: #Only train two segmentation branchslogger.info('freeze encoder and Det head...')for k, v in model.named_parameters():v.requires_grad = True # train all layersif k.split(".")[1] in Encoder_para_idx + Det_Head_para_idx:print('freezing %s' % k)v.requires_grad = Falseif cfg.TRAIN.DET_ONLY: #Only train detection branchlogger.info('freeze encoder and two Seg heads...')# print(model.named_parameters)for k, v in model.named_parameters():v.requires_grad = True # train all layersif k.split(".")[1] in Encoder_para_idx + Da_Seg_Head_para_idx + Ll_Seg_Head_para_idx:print('freezing %s' % k)v.requires_grad = Falseif cfg.TRAIN.ENC_SEG_ONLY: # Only train encoder and two segmentation branchslogger.info('freeze Det head...')for k, v in model.named_parameters():v.requires_grad = True # train all layers if k.split(".")[1] in Det_Head_para_idx:print('freezing %s' % k)v.requires_grad = Falseif cfg.TRAIN.ENC_DET_ONLY or cfg.TRAIN.DET_ONLY: # Only train encoder and detection branchslogger.info('freeze two Seg heads...')for k, v in model.named_parameters():v.requires_grad = True # train all layersif k.split(".")[1] in Da_Seg_Head_para_idx + Ll_Seg_Head_para_idx:print('freezing %s' % k)v.requires_grad = Falseif cfg.TRAIN.LANE_ONLY: logger.info('freeze encoder and Det head and Da_Seg heads...')# 
print(model.named_parameters)for k, v in model.named_parameters():v.requires_grad = True # train all layersif k.split(".")[1] in Encoder_para_idx + Da_Seg_Head_para_idx + Det_Head_para_idx:print('freezing %s' % k)v.requires_grad = Falseif cfg.TRAIN.DRIVABLE_ONLY:logger.info('freeze encoder and Det head and Ll_Seg heads...')# print(model.named_parameters)for k, v in model.named_parameters():v.requires_grad = True # train all layersif k.split(".")[1] in Encoder_para_idx + Ll_Seg_Head_para_idx + Det_Head_para_idx:print('freezing %s' % k)v.requires_grad = Falseif rank == -1 and torch.cuda.device_count() > 1:model = torch.nn.DataParallel(model, device_ids=cfg.GPUS)# model = torch.nn.DataParallel(model, device_ids=cfg.GPUS).cuda()# # DDP modeif rank != -1:model = DDP(model, device_ids=[args.local_rank], output_device=args.local_rank,find_unused_parameters=True)這是用于訓練計算機視覺模型的代碼片段。代碼初始化模型參數,加載數
據,然后訓練模型指定的 epoch 數。在訓練期間,使用梯度下降優化模
型,并使用學習率調度程序調整學習率。該代碼還以指定的頻率在驗證集上
評估模型,并保存性能最佳的模型檢查點。訓練過程的輸出包括模型的性能
指標,例如準確性、IOU、精度、召回率和 mAP。它還將最終模型狀態保存
到文件中。# assign model paramsmodel.gr = 1.0model.nc = 3 #13 #1# print('bulid model finished')print("begin to load data")# Data loadingnormalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])train_dataset = eval('dataset.' + cfg.DATASET.DATASET)(cfg=cfg,is_train=True,inputsize=cfg.MODEL.IMAGE_SIZE,transform=transforms.Compose([transforms.ToTensor(),normalize,]))train_sampler = torch.utils.data.distributed.DistributedSampler(train_dataset) if rank != -1 else Nonetrain_loader = DataLoaderX(train_dataset,batch_size=cfg.TRAIN.BATCH_SIZE_PER_GPU * len(cfg.GPUS),shuffle=(cfg.TRAIN.SHUFFLE & rank == -1),num_workers=cfg.WORKERS,sampler=train_sampler,pin_memory=cfg.PIN_MEMORY,collate_fn=dataset.AutoDriveDataset.collate_fn)num_batch = len(train_loader)if rank in [-1, 0]:valid_dataset = eval('dataset.' + cfg.DATASET.DATASET)(cfg=cfg,is_train=False,inputsize=cfg.MODEL.IMAGE_SIZE,transform=transforms.Compose([transforms.ToTensor(),normalize,]))valid_loader = DataLoaderX(valid_dataset,batch_size=cfg.TEST.BATCH_SIZE_PER_GPU * len(cfg.GPUS),shuffle=False,num_workers=cfg.WORKERS,pin_memory=cfg.PIN_MEMORY,collate_fn=dataset.AutoDriveDataset.collate_fn)print('load data finished')if rank in [-1, 0]:if cfg.NEED_AUTOANCHOR:logger.info("begin check anchors")run_anchor(logger,train_dataset, model=model, thr=cfg.TRAIN.ANCHOR_THRESHOLD, imgsz=min(cfg.MODEL.IMAGE_SIZE))else:logger.info("anchors loaded successfully")det = model.module.model[model.module.detector_index] if is_parallel(model) \else model.model[model.detector_index]logger.info(str(det.anchors))# trainingnum_warmup = max(round(cfg.TRAIN.WARMUP_EPOCHS * num_batch), 1000)scaler = amp.GradScaler(enabled=device.type != 'cpu')print('=> start training...')for epoch in range(begin_epoch+1, cfg.TRAIN.END_EPOCH+1):if rank != -1:train_loader.sampler.set_epoch(epoch)# train for one epochtrain(cfg, train_loader, model, criterion, optimizer, scaler,epoch, num_batch, num_warmup, writer_dict, logger, device, rank)lr_scheduler.step()# evaluate on validation setif (epoch % cfg.TRAIN.VAL_FREQ == 0 or epoch == cfg.TRAIN.END_EPOCH) and rank in [-1, 0]:# print('validate')da_segment_results,ll_segment_results,detect_results, total_loss,maps, times = validate(epoch,cfg, valid_loader, valid_dataset, model, criterion,final_output_dir, tb_log_dir, writer_dict,logger, device, rank)fi = fitness(np.array(detect_results).reshape(1, -1)) #目標檢測評價指標msg = 'Epoch: [{0}] Loss({loss:.3f})\n' \'Driving area Segment: Acc({da_seg_acc:.3f}) IOU ({da_seg_iou:.3f}) mIOU({da_seg_miou:.3f})\n' \'Lane line Segment: Acc({ll_seg_acc:.3f}) IOU ({ll_seg_iou:.3f}) mIOU({ll_seg_miou:.3f})\n' \'Detect: P({p:.3f}) R({r:.3f}) mAP@0.5({map50:.3f}) mAP@0.5:0.95({map:.3f})\n'\'Time: inference({t_inf:.4f}s/frame) nms({t_nms:.4f}s/frame)'.format(epoch, loss=total_loss, da_seg_acc=da_segment_results[0],da_seg_iou=da_segment_results[1],da_seg_miou=da_segment_results[2],ll_seg_acc=ll_segment_results[0],ll_seg_iou=ll_segment_results[1],ll_seg_miou=ll_segment_results[2],p=detect_results[0],r=detect_results[1],map50=detect_results[2],map=detect_results[3],t_inf=times[0], t_nms=times[1])logger.info(msg)# if perf_indicator >= best_perf:# best_perf = perf_indicator# best_model = True# else:# best_model = False# save checkpoint model and best modelif rank in [-1, 0]:savepath = os.path.join(final_output_dir, f'epoch-{epoch}.pth')logger.info('=> saving checkpoint to {}'.format(savepath))save_checkpoint(epoch=epoch,name=cfg.MODEL.NAME,model=model,# 'best_state_dict': model.module.state_dict(),# 
'perf': perf_indicator,optimizer=optimizer,output_dir=final_output_dir,filename=f'epoch-{epoch}.pth')save_checkpoint(epoch=epoch,name=cfg.MODEL.NAME,model=model,# 'best_state_dict': model.module.state_dict(),# 'perf': perf_indicator,optimizer=optimizer,output_dir=os.path.join(cfg.LOG_DIR, cfg.DATASET.DATASET),filename='checkpoint.pth')# save final modelif rank in [-1, 0]:final_model_state_file = os.path.join(final_output_dir, 'final_state.pth')logger.info('=> saving final model state to {}'.format(final_model_state_file))model_state = model.module.state_dict() if is_parallel(model) else model.state_dict()torch.save(model_state, final_model_state_file)writer_dict['writer'].close()else:dist.destroy_process_group()if __name__ == '__main__':main()
YOLOP training output
(base) root@notebook-rn-20230301115620425bi81-k2w3o-0:/home/shenlan08/lihanlin_shijian/YOLOP/tools# /home/shenlan08/miniconda3/bin/python /home/shenlan08/lihanlin_shijian/YOLOP/tools/train.py
=> creating runs/BddDataset/_2023-04-17-10-19
Namespace(modelDir='', logDir='runs/', dataDir='', prevModelDir='', sync_bn=False, local_rank=-1, conf_thres=0.001, iou_thres=0.6)
AUTO_RESUME: False
CUDNN:BENCHMARK: TrueDETERMINISTIC: FalseENABLED: True
DATASET:COLOR_RGB: FalseDATAROOT: /home/shenlan08/lihanlin_shijian/bdd100k/imagesDATASET: BddDatasetDATA_FORMAT: jpgFLIP: TrueHSV_H: 0.015HSV_S: 0.7HSV_V: 0.4#檢測框LABELROOT: /home/shenlan08/lihanlin_shijian/bdd100k/det_annotations#ll這個是車道線LANEROOT: /home/shenlan08/lihanlin_shijian/bdd100k/ll_seg_annotations#掩碼分割MASKROOT: /home/shenlan08/lihanlin_shijian/bdd100k/daORG_IMG_SIZE: [720, 1280]ROT_FACTOR: 10SCALE_FACTOR: 0.25SELECT_DATA: FalseSHEAR: 0.0TEST_SET: valTRAIN_SET: trainTRANSLATE: 0.1
DEBUG: False
GPUS: (0, 1)
LOG_DIR: runs/
LOSS:BOX_GAIN: 0.05CLS_GAIN: 0.5CLS_POS_WEIGHT: 1.0DA_SEG_GAIN: 0.2FL_GAMMA: 0.0LL_IOU_GAIN: 0.2LL_SEG_GAIN: 0.2LOSS_NAME: MULTI_HEAD_LAMBDA: NoneOBJ_GAIN: 1.0OBJ_POS_WEIGHT: 1.0SEG_POS_WEIGHT: 1.0
MODEL:EXTRA:HEADS_NAME: ['']IMAGE_SIZE: [640, 640]NAME: PRETRAINED: bes.ptPRETRAINED_DET: STRU_WITHSHARE: False
NEED_AUTOANCHOR: True
PIN_MEMORY: False
PRINT_FREQ: 20
TEST:BATCH_SIZE_PER_GPU: 24MODEL_FILE: NMS_CONF_THRESHOLD: 0.1NMS_IOU_THRESHOLD: 0.2PLOTS: TrueSAVE_JSON: FalseSAVE_TXT: False
TRAIN:ANCHOR_THRESHOLD: 4.0BATCH_SIZE_PER_GPU: 12BEGIN_EPOCH: 0DET_ONLY: FalseDRIVABLE_ONLY: FalseENC_DET_ONLY: FalseENC_SEG_ONLY: FalseEND_EPOCH: 240GAMMA1: 0.99GAMMA2: 0.0IOU_THRESHOLD: 0.2LANE_ONLY: FalseLR0: 0.001LRF: 0.2MOMENTUM: 0.937NESTEROV: TrueOPTIMIZER: adamPLOT: TrueSEG_ONLY: FalseSHUFFLE: TrueVAL_FREQ: 1WARMUP_BIASE_LR: 0.1WARMUP_EPOCHS: 3.0WARMUP_MOMENTUM: 0.8WD: 0.0005
WORKERS: 8
num_seg_class: 2
begin to bulid up model...
Using torch 1.13.1+cu117 CUDA:0 (GeForce RTX 2080 Ti, 11019MB)CUDA:1 (GeForce RTX 2080 Ti, 11019MB)CUDA:2 (GeForce RTX 2080 Ti, 11019MB)CUDA:3 (GeForce RTX 2080 Ti, 11019MB)load model to device
begin to load data
building database...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20000/20000 [00:25<00:00, 785.99it/s]
database build finish
building database...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3000/3000 [00:02<00:00, 1302.99it/s]
database build finish
load data finished
begin check anchors
WARNING: Extremely small objects found. 7434 of 246279 labels are < 3 pixels in width or height.
Running kmeans for 9 anchors on 246273 points...
thr=0.25: 0.9987 best possible recall, 4.53 anchors past thr
n=9, img_size=640, metric_all=0.309/0.729-mean/best, past_thr=0.488-mean: 5,15, 10,31, 11,74, 20,44, 34,74, 28,176, 57,115, 84,208, 124,345
Evolving anchors with Genetic Algorithm: fitness = 0.7592: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1000/1000 [00:35<00:00, 28.03it/s]
thr=0.25: 0.9992 best possible recall, 5.26 anchors past thr
n=9, img_size=640, metric_all=0.349/0.759-mean/best, past_thr=0.502-mean: 4,12, 7,18, 5,39, 11,29, 17,41, 24,63, 39,89, 70,146, 102,272
tensor([[[0.5445, 1.4860],[0.8506, 2.2737],[0.6714, 4.9291]],[[0.6855, 1.7993],[1.0550, 2.5667],[1.5176, 3.9344]],[[1.2133, 2.7916],[2.2017, 4.5758],[3.1775, 8.5005]]], device='cuda:0')
New anchors saved to model. Update model config to use these anchors in the future.
=> start training...
Traceback (most recent call last):File "/home/shenlan08/lihanlin_shijian/YOLOP/tools/train.py", line 436, in <module>main()File "/home/shenlan08/lihanlin_shijian/YOLOP/tools/train.py", line 363, in maintrain(cfg, train_loader, model, criterion, optimizer, scaler,File "/home/shenlan08/lihanlin_shijian/YOLOP/lib/core/function.py", line 77, in traintotal_loss, head_losses = criterion(outputs, target, shapes,model)File "/home/shenlan08/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_implreturn forward_call(*input, **kwargs)File "/home/shenlan08/lihanlin_shijian/YOLOP/lib/core/loss.py", line 50, in forwardtotal_loss, head_losses = self._forward_impl(head_fields, head_targets, shapes, model)File "/home/shenlan08/lihanlin_shijian/YOLOP/lib/core/loss.py", line 70, in _forward_impltcls, tbox, indices, anchors = build_targets(cfg, predictions[0], targets[0], model) # targetsFile "/home/shenlan08/lihanlin_shijian/YOLOP/lib/core/postprocess.py", line 74, in build_targetsindices.append((b, a, gj.clamp_(0, gain[3] - 1), gi.clamp_(0, gain[2] - 1))) # image, anchor, grid indices
RuntimeError: result type Float can't be cast to the desired output type long int
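This "result type Float can't be cast to the desired output type long int" error is widely reported when running YOLOP/YOLOv5-style build_targets code under newer PyTorch (here torch 1.13): gj and gi are integer grid-index tensors while gain[3] - 1 is a float, and the in-place clamp_ no longer accepts that mix. A commonly used workaround (an assumption, not verified against this exact checkout) is to cast the bounds to int in lib/core/postprocess.py; pinning an older torch (e.g. the 1.8 used for the YOLOv8 run above) is the other usual fix.

```python
import torch

# minimal reproduction of the failing pattern in lib/core/postprocess.py (build_targets):
gain = torch.ones(7)                 # float tensor holding grid sizes
gj = torch.tensor([5, 80, 100])      # long tensor of grid-cell indices
# gj.clamp_(0, gain[3] - 1)          # newer torch: "result type Float can't be cast to ... long int"
gj.clamp_(0, int(gain[3]) - 1)       # casting the bound to a Python int keeps clamp_ in integer land

# the corresponding one-line change around postprocess.py line 74 would be:
# indices.append((b, a, gj.clamp_(0, int(gain[3]) - 1), gi.clamp_(0, int(gain[2]) - 1)))
```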
bdd100k.yaml
# Ultralytics YOLO 🚀, GPL-3.0 license
# COCO128 dataset https://www.kaggle.com/ultralytics/coco128 (first 128 images from COCO train2017) by Ultralytics
# Example usage: yolo train data=coco128.yaml
# parent
# ├── yolov5
# └── datasets
# └── coco128 ← downloads here (7 MB)

# Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..]
path: /home/shenlan08/lihanlin_shijian/10k/images # dataset root dir
train: val # train images (relative to 'path') 128 images
val: val # val images (relative to 'path') 128 images
test:  # test images (optional)

# Classes
names: {0: person, 1: bicycle, 2: car, 3: motorcycle, 4: airplane, 5: bus, 6: train, 7: truck, 8: boat,
        9: traffic light, 10: fire hydrant, 11: stop sign, 12: parking meter, 13: bench, 14: bird, 15: cat,
        16: dog, 17: horse, 18: sheep, 19: cow, 20: elephant, 21: bear, 22: zebra, 23: giraffe, 24: backpack,
        25: umbrella, 26: handbag, 27: tie, 28: suitcase, 29: frisbee, 30: skis, 31: snowboard, 32: sports ball,
        33: kite, 34: baseball bat, 35: baseball glove, 36: skateboard, 37: surfboard, 38: tennis racket,
        39: bottle, 40: wine glass, 41: cup, 42: fork, 43: knife, 44: spoon, 45: bowl, 46: banana, 47: apple,
        48: sandwich, 49: orange, 50: broccoli, 51: carrot, 52: hot dog, 53: pizza, 54: donut, 55: cake,
        56: chair, 57: couch, 58: potted plant, 59: bed, 60: dining table, 61: toilet, 62: tv, 63: laptop,
        64: mouse, 65: remote, 66: keyboard, 67: cell phone, 68: microwave, 69: oven, 70: toaster, 71: sink,
        72: refrigerator, 73: book, 74: clock, 75: vase, 76: scissors, 77: teddy bear, 78: hair drier,
        79: toothbrush}
First train YOLOv8 to completion, then restore the v8 model parameters into YOLOP, freeze all parameters before the two segmentation heads (lane line and drivable area), and then train only those two segmentation heads; a rough sketch follows.
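A rough sketch of that plan (illustrative, not the project script): load the YOLOv8-trained checkpoint into the multi-task model wherever shapes match, then leave only the two segmentation heads trainable. The layer-index ranges follow the YOLOP convention used in the training code above (encoder 0-16, detection head 17-24, drivable-area head 25-33, lane-line head 34-42); parameter names of the form 'model.<idx>.<...>' are assumed, as in that code.

```python
import torch

def load_matching_and_freeze(model, ckpt_path):
    ckpt = torch.load(ckpt_path, map_location="cpu")
    state = ckpt.get("state_dict", ckpt)                     # support raw or wrapped checkpoints
    own = model.state_dict()
    matched = {k: v for k, v in state.items() if k in own and own[k].shape == v.shape}
    model.load_state_dict(matched, strict=False)             # silently skip mismatched layers
    seg_head_idx = {str(i) for i in range(25, 43)}           # drivable-area + lane-line heads
    for name, param in model.named_parameters():
        layer_idx = name.split(".")[1]                       # 'model.<idx>.<...>'
        param.requires_grad = layer_idx in seg_head_idx      # train only the two seg heads
    return model
```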
YOLOv8 + YOLOP integrated training code
# Ultralytics YOLO 🚀, GPL-3.0 license
from copy import copy

import torch
import torch.nn.functional as F
import sys

sys.path.append("/home/shenlan08/lihanlin_shijian/last_task/ultralytics/")
from ultralytics.nn.tasks import SegmentationModel
from ultralytics.yolo import v8
from ultralytics.yolo.utils import DEFAULT_CFG, RANK
from ultralytics.yolo.utils.ops import crop_mask, xyxy2xywh
from ultralytics.yolo.utils.plotting import plot_images, plot_results
from ultralytics.yolo.utils.tal import make_anchors
from ultralytics.yolo.utils.torch_utils import de_parallel
from ultralytics.yolo.v8.detect.train import Loss

This code defines a class named `SegmentationTrainer` that inherits from `v8.detect.DetectionTrainer` and overrides several methods to adapt them to the segmentation task:
1. `__init__`: called when the class is instantiated; it forwards the configuration and overrides to the parent `DetectionTrainer.__init__` and sets the task type to `segment`.
2. `get_model`: creates a new segmentation model, a `SegmentationModel` instance with the given channel and class counts; if weights are specified they are loaded before the model is returned.
3. `get_validator`: creates a new segmentation validator, returning a `v8.segment.SegmentationValidator` built from the test dataloader and the save directory.
4. `criterion`: computes the training loss; it creates a `SegLoss` instance with the model and the overlap flag, then feeds the predictions and the batch through it and returns the loss.
5. `plot_training_samples`: plots one batch of training samples; it extracts images, masks, classes, boxes and file paths from the batch and calls `plot_images` to draw and save them.
6. `plot_metrics`: plots the metrics from training and validation by calling `plot_results` with the CSV file and `segment=True`.
# BaseTrainer python usage
class SegmentationTrainer(v8.detect.DetectionTrainer):def __init__(self, cfg=DEFAULT_CFG, overrides=None):if overrides is None:overrides = {}overrides['task'] = 'segment'super().__init__(cfg, overrides)def get_model(self, cfg=None, weights=None, verbose=True):model = SegmentationModel(cfg, ch=3, nc=self.data['nc'], verbose=verbose and RANK == -1)if weights:model.load(weights)return modeldef get_validator(self):self.loss_names = 'box_loss', 'seg_loss', 'cls_loss', 'dfl_loss'return v8.segment.SegmentationValidator(self.test_loader, save_dir=self.save_dir, args=copy(self.args))def criterion(self, preds, batch):if not hasattr(self, 'compute_loss'):self.compute_loss = SegLoss(de_parallel(self.model), overlap=self.args.overlap_mask)return self.compute_loss(preds, batch)def plot_training_samples(self, batch, ni):images = batch['img']masks = batch['masks']cls = batch['cls'].squeeze(-1)bboxes = batch['bboxes']paths = batch['im_file']batch_idx = batch['batch_idx']plot_images(images, batch_idx, cls, bboxes, masks, paths=paths, fname=self.save_dir / f'train_batch{ni}.jpg')def plot_metrics(self):plot_results(file=self.csv, segment=True) # save results.png這是用于訓練計算機視覺模型的分段損失函數的Python 代碼。損失函數旨在計算執行對象檢測和分割的模型訓練期間的訓練損失。損失函數接受模型的預測值和訓練數據中相應的真實值。預測值包括預測掩碼、邊界框和類別分數。地面實況值包括類標簽、邊界框坐標和分割掩碼。損失函數計算四種損失:框損失、類別損失、分割損失和變形損失。框損失衡量預測的邊界框坐標與地面真值邊界框坐標之間的差異。類損失衡量預測的類分數和真實類標簽之間的差異。分割損失衡量預測的分割掩碼和地面真值分割掩碼之間的差異。最后,變形損失衡量預測蒙版和地面真值蒙版之間的差異應用變形場后。損失函數還包括一些額外的步驟,例如對分割蒙版進行下采樣、將預測值分配給地面真值以及計算預測蒙版的面積。這些步驟對于確保損失函數準確且穩健是必要的。總的來說,這種損失函數是訓練執行對象檢測和分割的計算機視覺模型的關鍵組成部分。通過準確測量預測值和真實值之間的差異,損失函數幫助模型學習更好地識別和分割圖像中的對象。
# Criterion class for computing training losses
class SegLoss(Loss):

    def __init__(self, model, overlap=True):  # model must be de-paralleled
        super().__init__(model)
        self.nm = model.model[-1].nm  # number of masks
        self.overlap = overlap

    def __call__(self, preds, batch):
        loss = torch.zeros(4, device=self.device)  # box, cls, dfl
        feats, pred_masks, proto = preds if len(preds) == 3 else preds[1]
        batch_size, _, mask_h, mask_w = proto.shape  # batch size, number of masks, mask height, mask width
        pred_distri, pred_scores = torch.cat([xi.view(feats[0].shape[0], self.no, -1) for xi in feats], 2).split(
            (self.reg_max * 4, self.nc), 1)

        # b, grids, ..
        pred_scores = pred_scores.permute(0, 2, 1).contiguous()
        pred_distri = pred_distri.permute(0, 2, 1).contiguous()
        pred_masks = pred_masks.permute(0, 2, 1).contiguous()

        dtype = pred_scores.dtype
        imgsz = torch.tensor(feats[0].shape[2:], device=self.device, dtype=dtype) * self.stride[0]  # image size (h,w)
        anchor_points, stride_tensor = make_anchors(feats, self.stride, 0.5)

        # targets
        try:
            batch_idx = batch['batch_idx'].view(-1, 1)
            targets = torch.cat((batch_idx, batch['cls'].view(-1, 1), batch['bboxes'].to(dtype)), 1)
            targets = self.preprocess(targets.to(self.device), batch_size, scale_tensor=imgsz[[1, 0, 1, 0]])
            gt_labels, gt_bboxes = targets.split((1, 4), 2)  # cls, xyxy
            mask_gt = gt_bboxes.sum(2, keepdim=True).gt_(0)
        except RuntimeError as e:
            raise TypeError('ERROR ❌ segment dataset incorrectly formatted or not a segment dataset.\n'
                            "This error can occur when incorrectly training a 'segment' model on a 'detect' dataset, "
                            "i.e. 'yolo train model=yolov8n-seg.pt data=coco128.yaml'.\nVerify your dataset is a "
                            "correctly formatted 'segment' dataset using 'data=coco128-seg.yaml' "
                            'as an example.\nSee https://docs.ultralytics.com/tasks/segment/ for help.') from e

        # pboxes
        pred_bboxes = self.bbox_decode(anchor_points, pred_distri)  # xyxy, (b, h*w, 4)

        _, target_bboxes, target_scores, fg_mask, target_gt_idx = self.assigner(
            pred_scores.detach().sigmoid(), (pred_bboxes.detach() * stride_tensor).type(gt_bboxes.dtype),
            anchor_points * stride_tensor, gt_labels, gt_bboxes, mask_gt)

        target_scores_sum = max(target_scores.sum(), 1)

        # cls loss
        # loss[1] = self.varifocal_loss(pred_scores, target_scores, target_labels) / target_scores_sum  # VFL way
        loss[2] = self.bce(pred_scores, target_scores.to(dtype)).sum() / target_scores_sum  # BCE

        if fg_mask.sum():
            # bbox loss
            loss[0], loss[3] = self.bbox_loss(pred_distri, pred_bboxes, anchor_points, target_bboxes / stride_tensor,
                                              target_scores, target_scores_sum, fg_mask)
            # masks loss
            masks = batch['masks'].to(self.device).float()
            if tuple(masks.shape[-2:]) != (mask_h, mask_w):  # downsample
                masks = F.interpolate(masks[None], (mask_h, mask_w), mode='nearest')[0]

            for i in range(batch_size):
                if fg_mask[i].sum():
                    mask_idx = target_gt_idx[i][fg_mask[i]]
                    if self.overlap:
                        gt_mask = torch.where(masks[[i]] == (mask_idx + 1).view(-1, 1, 1), 1.0, 0.0)
                    else:
                        gt_mask = masks[batch_idx.view(-1) == i][mask_idx]
                    xyxyn = target_bboxes[i][fg_mask[i]] / imgsz[[1, 0, 1, 0]]
                    marea = xyxy2xywh(xyxyn)[:, 2:].prod(1)
                    mxyxy = xyxyn * torch.tensor([mask_w, mask_h, mask_w, mask_h], device=self.device)
                    loss[1] += self.single_mask_loss(gt_mask, pred_masks[i][fg_mask[i]], proto[i], mxyxy, marea)  # seg
                # WARNING: lines below prevents Multi-GPU DDP 'unused gradient' PyTorch errors, do not remove
                else:
                    loss[1] += (proto * 0).sum() + (pred_masks * 0).sum()  # inf sums may lead to nan loss
        # WARNING: lines below prevent Multi-GPU DDP 'unused gradient' PyTorch errors, do not remove
        else:
            loss[1] += (proto * 0).sum() + (pred_masks * 0).sum()  # inf sums may lead to nan loss

        loss[0] *= self.hyp.box  # box gain
        loss[1] *= self.hyp.box / batch_size  # seg gain
        loss[2] *= self.hyp.cls  # cls gain
        loss[3] *= self.hyp.dfl  # dfl gain

        return loss.sum() * batch_size, loss.detach()  # loss(box, cls, dfl)

    def single_mask_loss(self, gt_mask, pred, proto, xyxy, area):
        # Mask loss for one image
        pred_mask = (pred @ proto.view(self.nm, -1)).view(-1, *proto.shape[1:])  # (n, 32) @ (32,80,80) -> (n,80,80)
        loss = F.binary_cross_entropy_with_logits(pred_mask, gt_mask, reduction='none')
        return (crop_mask(loss, xyxy).mean(dim=(1, 2)) / area).mean()

This is the Python function used to launch training. It takes a configuration that specifies the model architecture, the dataset, and the device to train on. If the `use_python` flag is True, training goes through the Ultralytics YOLO API: a YOLO object is created with the specified model and its `train` method is called with the model, data and device arguments. If `use_python` is False, the custom segmentation trainer is used instead: a `SegmentationTrainer` is created with the given arguments and its `train` method is called. Either way, this function is a convenient entry point for training the detection-and-segmentation model, whether via the standard library API or the custom implementation.
def train(cfg=DEFAULT_CFG, use_python=True):
    model = cfg.model or 'yolov8m-seg.pt'
    data = cfg.data or 'cityspaces.yaml'  # or yolo.ClassificationDataset("mnist")
    device = cfg.device if cfg.device is not None else '2,3'

    args = dict(model=model, data=data, device=device)
    if use_python:
        from ultralytics import YOLO
        YOLO(model).train(**args)
    else:
        trainer = SegmentationTrainer(overrides=args)
        trainer.train()


if __name__ == '__main__':
    train()
cityspaces.yaml
# Ultralytics YOLO 🚀, GPL-3.0 license
# COCO128 dataset https://www.kaggle.com/ultralytics/coco128 (first 128 images from COCO train2017) by Ultralytics
# Example usage: yolo train data=coco128.yaml
# parent
# ├── yolov5
# └── datasets
# └── coco128 ← downloads here (7 MB)

# Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..]
path: /home/shenlan08/lihanlin_shijian/cityspaces/images # dataset root dir
train: train # train images (relative to 'path') 128 images
val: val # val images (relative to 'path') 128 images
test:  # test images (optional)

# Classes
names:
  # 0: road
  0: sidewalk
  1: building
  2: wall
  3: fence
  4: pole
  5: traffic light
  6: traffic sign
  7: vegetation
  8: terrain
  9: sky
  10: person
  11: rider
  12: car
  13: truck
  14: bus
  15: train
  16: motorcycle
  17: bicycle
YOLOv8 inference code
from ultralytics.nn.modules import Detect, Segment
from ultralytics.tracker.trackers.bot_sort import BOTSORT
from ultralytics.hub.utils import traces
from ultralytics.yolo.data.dataloaders.stream_loaders import LoadImages
from ultralytics.yolo.utils import ops, LOGGER, yaml_load
from ultralytics.yolo.utils.plotting import Annotator, colors
from ultralytics.yolo.engine.results import Boxes
import torch
import torch.nn as nn
from pathlib import Path
import numpy as np
import cv2
import os
os.environ['CUDA_VISIBLE_DEVICES']='2'
done_warmup = False
video_flag = False
vid_writer = None
This Python function loads a single model-weight file and returns the loaded PyTorch model together with the checkpoint dictionary. It takes the path of the weight file, the device to load onto, a flag controlling in-place operations, and a flag controlling layer fusion. It first loads the checkpoint with `torch.load`, then takes the model from the checkpoint, casts it to float32 and moves it to the given device. If `fuse` is True and the model exposes a `fuse` method, the layers are fused. It also applies compatibility updates for the PyTorch version in use: the `inplace` flag is set on certain modules for torch 1.7.0 compatibility, and a `recompute_scale_factor` attribute is added to `nn.Upsample` for torch 1.11.0 compatibility. Finally it returns the loaded model and the checkpoint dictionary, ready for inference or further training.
def attempt_load_one_weight(weight, device=None, inplace=True, fuse=False):
    # Loads a single model weights file
    ckpt, weight = torch.load(weight, map_location='cpu'), weight  # load ckpt
    model = ckpt['model'].to(device).float()  # FP32 model (GPU memory usage increases here)
    if not hasattr(model, 'stride'):
        model.stride = torch.tensor([32.])
    model = model.fuse().eval() if fuse and hasattr(model, 'fuse') else model.eval()  # model in eval mode

    # Module compatibility updates
    for m in model.modules():
        t = type(m)
        if t in (nn.Hardswish, nn.LeakyReLU, nn.ReLU, nn.ReLU6, nn.SiLU, Detect, Segment):
            m.inplace = inplace  # torch 1.7.0 compatibility
        elif t is nn.Upsample and not hasattr(m, 'recompute_scale_factor'):
            m.recompute_scale_factor = None  # torch 1.11.0 compatibility

    # Return model and ckpt
    return model, ckpt  # loaded model weights
This Python function converts a numpy array to a PyTorch tensor. It takes a numpy array and the device the tensor should be moved to. It first checks with `isinstance` whether the input is a numpy array; if so, it converts it with `torch.tensor` and moves it to the given device with `.to`, otherwise it returns the input unchanged.
def from_numpy(x, device):
    """Convert a numpy array to a tensor.

    Args:
        x (np.ndarray): The array to be converted.

    Returns:
        (torch.Tensor): The converted tensor
    """
    return torch.tensor(x).to(device) if isinstance(x, np.ndarray) else x
This Python function warms up a PyTorch model, initializing its parameters and allocating memory on the device. It takes the model, the device, and the shape of a dummy input tensor in (batch_size, channels, height, width) format. It creates a dummy float tensor of that shape on the device, runs a forward pass through the model, and returns the output: if the output is a list or tuple, the first element (or every element) is converted to a device tensor via `from_numpy`; otherwise the single output is converted the same way. This is a convenient way to warm the model up and allocate device memory before running inference.
def warmup(model, device, imgsz=(1, 3, 640, 640)):
    """Warm up the model by running one forward pass with a dummy input.

    Args:
        imgsz (tuple): The shape of the dummy input tensor in the format (batch_size, channels, height, width)

    Returns:
        The model output for the dummy input, converted to tensor(s) on the device (None when running on CPU).
    """
    if device.type != 'cpu':
        im = torch.empty(*imgsz, dtype=torch.float, device=device)  # input
        y = model(im)
        if isinstance(y, (list, tuple)):
            return from_numpy(y[0], device) if len(y) == 1 else [from_numpy(x, device) for x in y]
        else:
            return from_numpy(y, device)
This Python function pre-processes an input image for a PyTorch model. It takes the image as a numpy array and the device to use, converts the array to a tensor with `torch.from_numpy`, moves it to the device, casts it to float, and divides by 255 to scale pixel values into [0.0, 1.0], which is the usual preprocessing step for vision models. It then returns the preprocessed tensor.
def preprocess(img, device):
    img = torch.from_numpy(img).to(device)
    img = img.float()  # uint8 to fp16/32
    img /= 255  # 0 - 255 to 0.0 - 1.0
    return img
This Python function renders and saves segmentation result images for visualization. It takes the input image as a numpy array, the segmentation result, the batch index, the epoch number, the save directory and several optional flags. If no palette is given, one is generated for the segmentation classes. When `is_demo` is True it builds a color mask from the (drivable area, lane line) results and overlays it on the input image; otherwise it colorizes the per-pixel class result with the palette and overlays that. The overlay is converted to BGR and blended with the input image using a weighted average, and the result is written to the save directory with a file name based on the epoch and batch index. Overall it is a convenient way to produce and save segmentation visualizations during training or testing.
def show_seg_result(img, result, index, epoch, save_dir=None, is_ll=False, palette=None, is_demo=False, is_gt=False):
    if palette is None:
        palette = np.random.randint(0, 255, size=(3, 3))
    palette[0] = [0, 0, 0]
    palette[1] = [0, 255, 0]
    palette[2] = [255, 0, 0]
    palette = np.array(palette)
    assert palette.shape[0] == 3  # len(classes)
    assert palette.shape[1] == 3
    assert len(palette.shape) == 2

    if not is_demo:
        color_seg = np.zeros((result.shape[0], result.shape[1], 3), dtype=np.uint8)
        for label, color in enumerate(palette):
            color_seg[result == label, :] = color
    else:
        color_area = np.zeros((result[0].shape[0], result[0].shape[1], 3), dtype=np.uint8)
        color_area[result[0] == 1] = [0, 255, 0]
        color_area[result[1] == 1] = [255, 0, 0]
        color_seg = color_area

    # convert to BGR
    color_seg = color_seg[..., ::-1]
    # print(color_seg.shape)
    color_mask = np.mean(color_seg, 2)
    img[color_mask != 0] = img[color_mask != 0] * 0.5 + color_seg[color_mask != 0] * 0.5
    # img = img * 0.5 + color_seg * 0.5
    img = img.astype(np.uint8)
    # img = cv2.resize(img, (1280,720), interpolation=cv2.INTER_LINEAR)

    if not is_demo:
        if not is_gt:
            if not is_ll:
                cv2.imwrite(save_dir + "/batch_{}_{}_da_segresult.png".format(epoch, index), img)
            else:
                cv2.imwrite(save_dir + "/batch_{}_{}_ll_segresult.png".format(epoch, index), img)
        else:
            if not is_ll:
                cv2.imwrite(save_dir + "/batch_{}_{}_da_seg_gt.png".format(epoch, index), img)
            else:
                cv2.imwrite(save_dir + "/batch_{}_{}_ll_seg_gt.png".format(epoch, index), img)
    return img
This Python function post-processes the detection predictions by running non-maximum suppression and scaling the predicted boxes back to the original image size. It takes the predictions as PyTorch tensors, the network input image, the original image as a numpy array, the input path, and the class names. It first applies `ops.non_max_suppression` to the predicted boxes, then processes the predicted masks with `ops.process_mask` and rescales the boxes with `ops.scale_boxes`. It builds a results list containing the predicted masks and boxes for each image and returns it, giving the final masks and bounding boxes in a convenient form.
def postprocess(preds, img, orig_img, path, names):
    # TODO: filter by classes
    p = ops.non_max_suppression(prediction=preds[0][0],
                                conf_thres=0.1,
                                iou_thres=0.7,
                                agnostic=False,
                                max_det=300,
                                nc=len(names),
                                classes=None)
    results = []
    proto = preds[0][1][-1] if len(preds[0][1]) == 3 else preds[0][1]  # second output is len 3 if pt, but only 1 if exported
    for i, pred in enumerate(p):
        orig_img = orig_img[i] if isinstance(orig_img, list) else orig_img
        shape = orig_img.shape
        img_path = path[i] if isinstance(path, list) else path
        if not len(pred):  # save empty boxes
            # results.append(Results(orig_img=orig_img, path=img_path, names=names, boxes=pred[:, :6]))
            continue
        masks = ops.process_mask(proto[i], pred[:, 6:], pred[:, :4], img.shape[2:], upsample=True)  # HWC
        pred[:, :4] = ops.scale_boxes(img.shape[2:], pred[:, :4], shape).round()
        boxes = pred[:, :6].detach()
        boxes = Boxes(boxes, orig_img.shape[:2]) if boxes is not None else None  # native size boxes
        results.append(masks)
        results.append(boxes)
    return results

This Python function writes the detection results to an image and/or video file. It takes the index of the current image, the list of results, the input image as a PyTorch tensor, the number of images processed so far, the dataset, the class names, and the video-capture object. It first expands the input tensor with a batch dimension and initializes the log string, then takes the predicted masks and boxes from the results and creates an Annotator to draw them on the image. It iterates over the detected classes to count detections per class, draws the masks and boxes, and writes the annotated frame to the video writer (creating one on the first frame). Finally it returns a log string with per-class detection counts, which is how detection results are visualized and saved during inference.

def write_results(idx, results, batch, seen, dataset, names, vid_cap):
    p, im, im0 = batch
    log_string = ''
    if len(im.shape) == 3:
        im = im[None]  # expand for batch dim
    seen += 1
    # txt_path = f'_{frame}'
    log_string += '%gx%g ' % im.shape[2:]  # print string
    annotator = Annotator(im0, line_width=3, example=str(names))

    result = results
    if len(result) == 0:
        return f'{log_string}(no detections), '
    det, masks = result[1], result[0]  # getting tensors TODO: mask mask,box inherit for tensor

    # Print results
    for c in det.cls.unique():
        n = (det.cls == c).sum()  # detections per class
        log_string += f"{n} {names[int(c)]}{'s' * (n > 1)}, "

    # Mask plotting
    im_gpu = im[idx]
    annotator.masks(masks=masks, colors=[colors(x, True) for x in det.cls], im_gpu=im_gpu)

    # Write results
    for j, d in enumerate(reversed(det)):
        cls, conf = d.cls.squeeze(), d.conf.squeeze()
        # Add bbox to image
        c = int(cls)  # integer class
        name = f'id:{int(d.id.item())} {names[c]}' if d.id is not None else names[c]
        label = f'{name} {conf:.2f}'
        annotator.box_label(d.xyxy.squeeze(), label, color=colors(c, True))

    # save_preds
    global video_flag, vid_writer
    im0 = annotator.result()
    if vid_cap and not video_flag:  # video
        fps = int(vid_cap.get(cv2.CAP_PROP_FPS))  # integer required, floats produce error in MP4 codec
        w = int(vid_cap.get(cv2.CAP_PROP_FRAME_WIDTH))
        h = int(vid_cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
        vid_writer = cv2.VideoWriter("out.mp4", cv2.VideoWriter_fourcc(*'mp4v'), fps, (w, h))
        video_flag = True
    vid_writer.write(im0)
    return log_string
This is the main routine of the Python script that runs detection and tracking on a video. The script first builds the model and loads its weights from file, sets up the input source (a video file here), and initializes the dataset loader that iterates over the frames. It then warms the model up with a dummy input and initializes the object tracker from its config file. Each frame is run through the model, the segmentation outputs for drivable area and lane lines are rendered onto the frame, and the detection outputs are post-processed into bounding boxes and class labels. The tracker is updated with the detected boxes, the track IDs are attached to the results, and the detection and tracking results for every frame are visualized and written to the output video. Finally the video writer is released. Overall this is a complete video detection-and-tracking pipeline that can easily be adapted to other models and input sources.
if __name__ == '__main__':
    LOGGER.info('')

    # setup model
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    model = 'yolov8-seg-40.pt'
    model, ckpt = attempt_load_one_weight(weight=model,  # the model is obtained here
                                          device=device,
                                          inplace=True,
                                          fuse=True)
    # model = model.fuse()
    names = model.module.names if hasattr(model, 'module') else model.names  # get class names
    # print(names)  # this is the model's class-name table

    # setup source every time predict is called
    source = '1.mp4'
    imgsz = [640, 640]
    transforms = None
    dataset = LoadImages(source,
                         imgsz=imgsz,
                         stride=32,
                         auto=True,
                         transforms=transforms,
                         vid_stride=1)

    # warmup model
    if not done_warmup:
        warmup(model, device, imgsz=(1, 3, *imgsz))
        done_warmup = True

    seen, windows, dt, batch = 0, [], (ops.Profile(), ops.Profile(), ops.Profile()), None

    # init track
    tracker = 'botsort.yaml'
    cfg = yaml_load(tracker)
    tracker = BOTSORT(args=cfg, frame_rate=30)
    traces(traces_sample_rate=1.0)

    # predict_start
    for batch in dataset:
        path, im, im0s, vid_cap, s = batch
        im_copy = im.transpose(1, 2, 0)

        # preprocess
        with dt[0]:
            im = preprocess(im, device)
            if len(im.shape) == 3:
                im = im[None]  # expand for batch dim

        # inference
        with dt[1]:
            preds = model(im)

        # da and ll
        _, da_seg_out, ll_seg_out = preds
        _, da_seg_mask = torch.max(da_seg_out, 1)
        da_seg_mask = da_seg_mask.int().squeeze().cpu().numpy()
        _, ll_seg_mask = torch.max(ll_seg_out, 1)
        ll_seg_mask = ll_seg_mask.int().squeeze().cpu().numpy()
        im = show_seg_result(im_copy, (da_seg_mask, ll_seg_mask), _, _, is_demo=True)
        im = im.transpose(2, 0, 1)
        im = preprocess(im, device)
        if len(im.shape) == 3:
            im = im[None]  # expand for batch dim

        # postprocess
        with dt[2]:
            results = postprocess(preds, im, im0s, path, names)

        # do track
        im0s_track = im0s
        det = results[1].cpu().numpy()
        if len(det) == 0:
            continue
        tracks = tracker.update(det, im0s_track)
        if len(tracks) == 0:
            continue
        boxes = torch.as_tensor(tracks[:, :-1])
        if boxes is not None:
            results[1] = Boxes(boxes=boxes, orig_shape=im0s_track.shape[:2])
        if results[0] is not None:
            idx = tracks[:, -1].tolist()
            results[0] = results[0][idx]

        # visualize, save, write results
        n = len(im)
        for i in range(n):
            results[i].speed = {
                'preprocess': dt[0].dt * 1E3 / n,
                'inference': dt[1].dt * 1E3 / n,
                'postprocess': dt[2].dt * 1E3 / n}
            p, im0 = (path, im0s.copy())
            p = Path(p)
            # save
            s += write_results(i, results, (p, im, im0), seen, dataset, names, vid_cap)

        LOGGER.info(f'{s}{dt[1].dt * 1E3:.1f}ms')

    vid_writer.release()  # release final video writer
video 1/1 (2627/3553) /home/shenlan08/lihanlin_shijian/test/ultralytics_predict/4.mp4: 384x640 6 cars, 1 bus, 19.4ms
video 1/1 (2628/3553) /home/shenlan08/lihanlin_shijian/test/ultralytics_predict/4.mp4: 384x640 5 cars, 1 truck, 19.2ms
video 1/1 (2629/3553) /home/shenlan08/lihanlin_shijian/test/ultralytics_predict/4.mp4: 384x640 5 cars, 1 bus, 19.1ms
... (the per-frame inference log continues through frame 2651 in the same format: roughly 19 ms per 384x640 frame, mostly 5-6 cars plus an occasional truck or bus)
SegmentationModel(
(model): Sequential(
(0): Conv(
(conv): Conv2d(3, 48, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
(act): SiLU(inplace=True)
)
(1): Conv(
(conv): Conv2d(48, 96, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
(act): SiLU(inplace=True)
)
(2): C2f(
(cv1): Conv(
(conv): Conv2d(96, 96, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(192, 96, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(m): ModuleList(
(0): Bottleneck(
(cv1): Conv(
(conv): Conv2d(48, 48, kernel_size=(3, 3), stride=(1, 1), padding=[1, 1])
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(48, 48, kernel_size=(3, 3), stride=(1, 1), padding=[1, 1])
(act): SiLU(inplace=True)
)
)
(1): Bottleneck(
(cv1): Conv(
(conv): Conv2d(48, 48, kernel_size=(3, 3), stride=(1, 1), padding=[1, 1])
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(48, 48, kernel_size=(3, 3), stride=(1, 1), padding=[1, 1])
(act): SiLU(inplace=True)
)
)
)
)
(3): Conv(
(conv): Conv2d(96, 192, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
(act): SiLU(inplace=True)
)
(4): C2f(
(cv1): Conv(
(conv): Conv2d(192, 192, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(576, 192, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(m): ModuleList(
(0): Bottleneck(
(cv1): Conv(
(conv): Conv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), padding=[1, 1])
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), padding=[1, 1])
(act): SiLU(inplace=True)
)
)
(1): Bottleneck(
(cv1): Conv(
(conv): Conv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), padding=[1, 1])
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), padding=[1, 1])
(act): SiLU(inplace=True)
)
)
(2): Bottleneck(
(cv1): Conv(
(conv): Conv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), padding=[1, 1])
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), padding=[1, 1])
(act): SiLU(inplace=True)
)
)
(3): Bottleneck(
(cv1): Conv(
(conv): Conv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), padding=[1, 1])
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), padding=[1, 1])
(act): SiLU(inplace=True)
)
)
)
)
(5): Conv(
(conv): Conv2d(192, 384, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
(act): SiLU(inplace=True)
)
(6): C2f(
(cv1): Conv(
(conv): Conv2d(384, 384, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(1152, 384, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(m): ModuleList(
(0): Bottleneck(
(cv1): Conv(
(conv): Conv2d(192, 192, kernel_size=(3, 3), stride=(1, 1), padding=[1, 1])
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(192, 192, kernel_size=(3, 3), stride=(1, 1), padding=[1, 1])
(act): SiLU(inplace=True)
)
)
(1): Bottleneck(
(cv1): Conv(
(conv): Conv2d(192, 192, kernel_size=(3, 3), stride=(1, 1), padding=[1, 1])
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(192, 192, kernel_size=(3, 3), stride=(1, 1), padding=[1, 1])
(act): SiLU(inplace=True)
)
)
(2): Bottleneck(
(cv1): Conv(
(conv): Conv2d(192, 192, kernel_size=(3, 3), stride=(1, 1), padding=[1, 1])
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(192, 192, kernel_size=(3, 3), stride=(1, 1), padding=[1, 1])
(act): SiLU(inplace=True)
)
)
(3): Bottleneck(
(cv1): Conv(
(conv): Conv2d(192, 192, kernel_size=(3, 3), stride=(1, 1), padding=[1, 1])
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(192, 192, kernel_size=(3, 3), stride=(1, 1), padding=[1, 1])
(act): SiLU(inplace=True)
)
)
)
)
(7): Conv(
(conv): Conv2d(384, 576, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
(act): SiLU(inplace=True)
)
(8): C2f(
(cv1): Conv(
(conv): Conv2d(576, 576, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(1152, 576, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(m): ModuleList(
(0): Bottleneck(
(cv1): Conv(
(conv): Conv2d(288, 288, kernel_size=(3, 3), stride=(1, 1), padding=[1, 1])
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(288, 288, kernel_size=(3, 3), stride=(1, 1), padding=[1, 1])
(act): SiLU(inplace=True)
)
)
(1): Bottleneck(
(cv1): Conv(
(conv): Conv2d(288, 288, kernel_size=(3, 3), stride=(1, 1), padding=[1, 1])
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(288, 288, kernel_size=(3, 3), stride=(1, 1), padding=[1, 1])
(act): SiLU(inplace=True)
)
)
)
)
(9): SPPF(
(cv1): Conv(
(conv): Conv2d(576, 288, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(1152, 576, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(m): MaxPool2d(kernel_size=5, stride=1, padding=2, dilation=1, ceil_mode=False)
)
(10): Upsample(scale_factor=2.0, mode=nearest)
(11): Concat()
(12): C2f(
(cv1): Conv(
(conv): Conv2d(960, 384, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(768, 384, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(m): ModuleList(
(0): Bottleneck(
(cv1): Conv(
(conv): Conv2d(192, 192, kernel_size=(3, 3), stride=(1, 1), padding=[1, 1])
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(192, 192, kernel_size=(3, 3), stride=(1, 1), padding=[1, 1])
(act): SiLU(inplace=True)
)
)
(1): Bottleneck(
(cv1): Conv(
(conv): Conv2d(192, 192, kernel_size=(3, 3), stride=(1, 1), padding=[1, 1])
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(192, 192, kernel_size=(3, 3), stride=(1, 1), padding=[1, 1])
(act): SiLU(inplace=True)
)
)
)
)
(13): Upsample(scale_factor=2.0, mode=nearest)
(14): Concat()
(15): C2f(
(cv1): Conv(
(conv): Conv2d(576, 192, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(384, 192, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(m): ModuleList(
(0): Bottleneck(
(cv1): Conv(
(conv): Conv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), padding=[1, 1])
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), padding=[1, 1])
(act): SiLU(inplace=True)
)
)
(1): Bottleneck(
(cv1): Conv(
(conv): Conv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), padding=[1, 1])
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), padding=[1, 1])
(act): SiLU(inplace=True)
)
)
)
)
(16): Conv(
(conv): Conv2d(192, 192, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
(act): SiLU(inplace=True)
)
(17): Concat()
(18): C2f(
(cv1): Conv(
(conv): Conv2d(576, 384, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(768, 384, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(m): ModuleList(
(0): Bottleneck(
(cv1): Conv(
(conv): Conv2d(192, 192, kernel_size=(3, 3), stride=(1, 1), padding=[1, 1])
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(192, 192, kernel_size=(3, 3), stride=(1, 1), padding=[1, 1])
(act): SiLU(inplace=True)
)
)
(1): Bottleneck(
(cv1): Conv(
(conv): Conv2d(192, 192, kernel_size=(3, 3), stride=(1, 1), padding=[1, 1])
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(192, 192, kernel_size=(3, 3), stride=(1, 1), padding=[1, 1])
(act): SiLU(inplace=True)
)
)
)
)
(19): Conv(
(conv): Conv2d(384, 384, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
(act): SiLU(inplace=True)
)
(20): Concat()
(21): C2f(
(cv1): Conv(
(conv): Conv2d(960, 576, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(1152, 576, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(m): ModuleList(
(0): Bottleneck(
(cv1): Conv(
(conv): Conv2d(288, 288, kernel_size=(3, 3), stride=(1, 1), padding=[1, 1])
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(288, 288, kernel_size=(3, 3), stride=(1, 1), padding=[1, 1])
(act): SiLU(inplace=True)
)
)
(1): Bottleneck(
(cv1): Conv(
(conv): Conv2d(288, 288, kernel_size=(3, 3), stride=(1, 1), padding=[1, 1])
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(288, 288, kernel_size=(3, 3), stride=(1, 1), padding=[1, 1])
(act): SiLU(inplace=True)
)
)
)
)
(22): Segment(
(cv2): ModuleList(
(0): Sequential(
(0): Conv(
(conv): Conv2d(192, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
(1): Conv(
(conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
(2): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1))
)
(1): Sequential(
(0): Conv(
(conv): Conv2d(384, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
(1): Conv(
(conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
(2): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1))
)
(2): Sequential(
(0): Conv(
(conv): Conv2d(576, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
(1): Conv(
(conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
(2): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1))
)
)
(cv3): ModuleList(
(0): Sequential(
(0): Conv(
(conv): Conv2d(192, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
(1): Conv(
(conv): Conv2d(192, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
(2): Conv2d(192, 80, kernel_size=(1, 1), stride=(1, 1))
)
(1): Sequential(
(0): Conv(
(conv): Conv2d(384, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
(1): Conv(
(conv): Conv2d(192, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
(2): Conv2d(192, 80, kernel_size=(1, 1), stride=(1, 1))
)
(2): Sequential(
(0): Conv(
(conv): Conv2d(576, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
(1): Conv(
(conv): Conv2d(192, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
(2): Conv2d(192, 80, kernel_size=(1, 1), stride=(1, 1))
)
)
(dfl): DFL(
(conv): Conv2d(16, 1, kernel_size=(1, 1), stride=(1, 1), bias=False)
)
(proto): Proto(
(cv1): Conv(
(conv): Conv2d(192, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
(upsample): ConvTranspose2d(192, 192, kernel_size=(2, 2), stride=(2, 2))
(cv2): Conv(
(conv): Conv2d(192, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
(cv3): Conv(
(conv): Conv2d(192, 32, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
)
(cv4): ModuleList(
(0): Sequential(
(0): Conv(
(conv): Conv2d(192, 48, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
(1): Conv(
(conv): Conv2d(48, 48, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
(2): Conv2d(48, 32, kernel_size=(1, 1), stride=(1, 1))
)
(1): Sequential(
(0): Conv(
(conv): Conv2d(384, 48, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
(1): Conv(
(conv): Conv2d(48, 48, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
(2): Conv2d(48, 32, kernel_size=(1, 1), stride=(1, 1))
)
(2): Sequential(
(0): Conv(
(conv): Conv2d(576, 48, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
(1): Conv(
(conv): Conv2d(48, 48, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
(2): Conv2d(48, 32, kernel_size=(1, 1), stride=(1, 1))
)
)
)
(23): Conv(
(conv): Conv2d(576, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
(24): Upsample(scale_factor=2.0, mode=nearest)
(25): C3(
(cv1): Conv(
(conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv3): Conv(
(conv): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(m): Sequential(
(0): Bottleneck(
(cv1): Conv(
(conv): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
)
)
)
(26): Conv(
(conv): Conv2d(64, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
(27): Upsample(scale_factor=2.0, mode=nearest)
(28): Conv(
(conv): Conv2d(32, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
(29): C3(
(cv1): Conv(
(conv): Conv2d(16, 4, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(16, 4, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv3): Conv(
(conv): Conv2d(8, 8, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(m): Sequential(
(0): Bottleneck(
(cv1): Conv(
(conv): Conv2d(4, 4, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(4, 4, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
)
)
)
(30): Upsample(scale_factor=2.0, mode=nearest)
(31): Conv(
(conv): Conv2d(8, 2, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
(32): Conv(
(conv): Conv2d(576, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
(33): Upsample(scale_factor=2.0, mode=nearest)
(34): C3(
(cv1): Conv(
(conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv3): Conv(
(conv): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(m): Sequential(
(0): Bottleneck(
(cv1): Conv(
(conv): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
)
)
)
(35): Conv(
(conv): Conv2d(64, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
(36): Upsample(scale_factor=2.0, mode=nearest)
(37): Conv(
(conv): Conv2d(32, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
(38): C3(
(cv1): Conv(
(conv): Conv2d(16, 4, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(16, 4, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(cv3): Conv(
(conv): Conv2d(8, 8, kernel_size=(1, 1), stride=(1, 1))
(act): SiLU(inplace=True)
)
(m): Sequential(
(0): Bottleneck(
(cv1): Conv(
(conv): Conv2d(4, 4, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
(cv2): Conv(
(conv): Conv2d(4, 4, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
)
)
)
(39): Upsample(scale_factor=2.0, mode=nearest)
(40): Conv(
(conv): Conv2d(8, 2, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(act): SiLU(inplace=True)
)
)
)
Model training and ONNX export
A dataset containing the 18 classes plus lane lines and drivable area, using YOLOv8 for segmentation only.
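For the ONNX export step, a minimal sketch using the Ultralytics API is shown below. The checkpoint path is an assumption, and a standard YOLOv8-seg checkpoint is assumed; the custom model with the extra YOLOP-style heads may need its own export path.

from ultralytics import YOLO

# export a trained segmentation checkpoint to ONNX (writes best.onnx next to the weights)
model = YOLO('runs/segment/train14/weights/best.pt')
model.export(format='onnx', imgsz=640, opset=12, simplify=False)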
Lane-line dilation
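A minimal OpenCV sketch of this dilation idea, assuming a binary lane-line label image (255 = lane pixel) and OpenCV 4; the file name and kernel size are placeholders, not values from the project:

import cv2
import numpy as np

mask = cv2.imread('lane_label.png', cv2.IMREAD_GRAYSCALE)      # thin BDD-style lane-line mask
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (9, 9))
thick = cv2.dilate(mask, kernel, iterations=1)                  # thin polylines become thick bands

# extract polygon contours so the thickened lines can be written as segmentation polygons
contours, _ = cv2.findContours(thick, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
h, w = thick.shape
polygons = [(c.reshape(-1, 2) / np.array([w, h])).tolist() for c in contours]  # normalized x,y pairs
print(len(polygons), 'lane polygons')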
How to merge the two datasets
The concrete way to merge two datasets depends on their format and content. Some common approaches:
-
Dataset concatenation: if the two datasets share the same content and format, they can be merged by simply concatenating them. This works when the datasets are similar in size and structure; for example, if both consist of images with corresponding labels, they can be joined into one larger dataset.
-
Dataset expansion: if the two datasets differ in content or format, they can be merged and then augmented. This suits cases where more classes or samples are needed; for example, one dataset only contains people and vehicles while the other also contains animals, and after merging, data augmentation and sample duplication can enlarge the dataset and improve generalization.
-
Dataset fusion: if the two datasets are similar in content and format but partly overlap, they can be fused into one. This reduces duplicate annotations and improves quality; for example, both datasets contain people and vehicles but with different annotation conventions, and after fusing them the annotations can be validated and corrected to improve accuracy.
Note that when merging, the balance and diversity of the data should be preserved to avoid class skew and duplicate samples, which helps generalization and robustness. The merge method should be chosen according to the task and the characteristics of the datasets, with adequate pre- and post-processing to guarantee the quality of the merged data (a minimal merge sketch is given below).
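A minimal sketch of the concatenate-and-remap strategy for the two datasets used here, assuming both have already been converted to YOLO txt labels; the directory layout and the 18-class offset are assumptions for illustration only:

from pathlib import Path
import shutil

CITY_CLASSES = 18            # cityscapes-style classes keep ids 0..17
OFFSET = CITY_CLASSES        # e.g. drivable area -> 18, lane line -> 19

def merge(src_img_dir, src_lbl_dir, dst_img_dir, dst_lbl_dir, class_offset=0):
    dst_img_dir.mkdir(parents=True, exist_ok=True)
    dst_lbl_dir.mkdir(parents=True, exist_ok=True)
    for lbl in Path(src_lbl_dir).glob('*.txt'):
        lines = []
        for line in lbl.read_text().splitlines():
            cls, *coords = line.split()
            lines.append(' '.join([str(int(cls) + class_offset), *coords]))  # remap class id
        (dst_lbl_dir / lbl.name).write_text('\n'.join(lines))
        img = Path(src_img_dir) / (lbl.stem + '.jpg')
        if img.exists():
            shutil.copy(img, dst_img_dir / img.name)

merge(Path('cityspaces/images/train'), Path('cityspaces/labels/train'),
      Path('merged/images/train'), Path('merged/labels/train'), class_offset=0)
merge(Path('bdd100k/images/train'), Path('bdd100k/labels/train'),
      Path('merged/images/train'), Path('merged/labels/train'), class_offset=OFFSET)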
import json
import os


def bdd2yolo5(categorys, jsonFile, writepath):
    strs = ""
    f = open(jsonFile)
    info = json.load(f)
    # print(len(info))
    # print(info["name"])
    write = open(writepath + "%s.txt" % info["name"], 'w')
    for obj in info["frames"]:
        # print(obj["objects"])
        for objects in obj["objects"]:
            # print(objects)
            if objects["category"] in categorys:
                dw = 1.0 / 1280
                dh = 1.0 / 720
                strs += str(categorys.index(objects["category"]))
                strs += " "
                strs += str(((objects["box2d"]["x1"] + objects["box2d"]["x2"]) / 2.0) * dw)[0:8]
                strs += " "
                strs += str(((objects["box2d"]["y1"] + objects["box2d"]["y2"]) / 2.0) * dh)[0:8]
                strs += " "
                strs += str(((objects["box2d"]["x2"] - objects["box2d"]["x1"])) * dw)[0:8]
                strs += " "
                strs += str(((objects["box2d"]["y2"] - objects["box2d"]["y1"])) * dh)[0:8]
                strs += "\n"
    write.writelines(strs)
    write.close()
    print("%s has been dealt!" % info["name"])


if __name__ == "__main__":
    #################### args #####################
    categorys = ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck']  # target classes to extract from the BDD dataset
    readpath = "/home/shenlan08/lihanlin_shijian/bdd100k/det_annotations/train/"  # path of the BDD label files; change manually for train and val
    writepath = "/home/shenlan08/lihanlin_shijian/bdd100k/labels/train/"  # save path for the converted labels
    fileList = os.listdir(readpath)
    # print(fileList)
    for file in fileList:
        print(file)
        filepath = readpath + file
        bdd2yolo5(categorys, filepath, writepath)
This error is usually caused by insufficient CUDA memory: it is raised when the GPU tries to allocate memory and there is not enough free memory available.
Possible causes and fixes:
-
Out of memory: make sure the GPU has enough free memory for the job. If the task needs a lot of memory, try reducing the batch size, shrinking the input resolution, or using a smaller model.
-
Program bugs: a bug can cause memory leaks or excessive memory use. Make sure the code is correct, or try a simpler program to isolate the problem.
-
CUDA driver issues: an incompatible or broken CUDA driver version can also cause this. Try upgrading, downgrading, or reinstalling the driver.
-
GPU hardware problems: faulty GPU hardware can also trigger this error. Try another GPU or contact the manufacturer for repair or replacement.
In short, this error usually means the GPU ran out of memory. If the program needs a lot of memory, make sure enough is free; if the problem persists, try another GPU or have the hardware checked. (A small batch-size probe sketch follows below.)
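A small sketch of the "halve the batch size on OOM" idea, assuming torch 1.8 where a CUDA OOM surfaces as a RuntimeError; the dummy-allocation probe only checks allocator headroom, not the full training footprint (activations and gradients), so it is a rough guide at best:

import torch

def pick_batch_size(initial_bs=6, imgsz=640):
    bs = initial_bs
    while bs >= 1:
        try:
            dummy = torch.empty((bs, 3, imgsz, imgsz), device='cuda')  # probe allocation
            del dummy
            torch.cuda.empty_cache()
            return bs
        except RuntimeError as e:              # torch 1.8 raises RuntimeError on CUDA OOM
            if 'out of memory' not in str(e):
                raise
            torch.cuda.empty_cache()
            bs //= 2
    return 1

total = torch.cuda.get_device_properties(0).total_memory
used = torch.cuda.memory_allocated(0)
print(f'used {used / 1e9:.2f} / {total / 1e9:.2f} GB, usable batch size: {pick_batch_size()}')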
def train(cfg=DEFAULT_CFG, use_python=False):
    model = cfg.model or 'yolov8m-seg.pt'
    data = cfg.data or 'cityspaces.yaml'  # or yolo.ClassificationDataset("mnist")
    device = cfg.device if cfg.device is not None else '0,1,2'
    print(device)
Ultralytics YOLOv8.0.61 🚀 Python-3.8.5 torch-1.8.0+cu111 CUDA:0 (GeForce RTX 2080 Ti, 11019MiB)
                                                          CUDA:1 (GeForce RTX 2080 Ti, 11019MiB)
                                                          CUDA:2 (GeForce RTX 2080 Ti, 11019MiB)
Running DDP command ['/root/miniconda3/bin/python', '-m', 'torch.distributed.launch', '--nproc_per_node', '3', '--master_port', '37427', '/home/shenlan08/lihanlin_shijian/last_task/ultralytics/ultralytics/yolo/v8/segment/train.py', 'task=segment', 'mode=train', 'model=yolov8m-seg.pt', 'data=cityspaces.yaml', 'epochs=100', 'patience=50', 'batch=6', 'imgsz=640', 'save=True', 'save_period=-1', 'cache=False', 'device=0,1,2', 'workers=8', 'project=None', 'name=None', 'exist_ok=False', 'pretrained=False', 'optimizer=SGD', 'verbose=True', 'seed=0', 'deterministic=True', 'single_cls=False', 'image_weights=False', 'rect=False', 'cos_lr=False', 'close_mosaic=10', 'resume=False', 'amp=True', 'overlap_mask=True', 'mask_ratio=4', 'dropout=0.0', 'val=True', 'split=val', 'save_json=False', 'save_hybrid=False', 'conf=None', 'iou=0.7', 'max_det=300', 'half=False', 'dnn=False', 'plots=True', 'source=None', 'show=False', 'save_txt=False', 'save_conf=False', 'save_crop=False', 'hide_labels=False', 'hide_conf=False', 'vid_stride=1', 'line_thickness=3', 'visualize=False', 'augment=False', 'agnostic_nms=False', 'classes=None', 'retina_masks=False', 'boxes=True', 'format=torchscript', 'keras=False', 'optimize=False', 'int8=False', 'dynamic=False', 'simplify=False', 'opset=None', 'workspace=4', 'nms=False', 'lr0=0.01', 'lrf=0.01', 'momentum=0.937', 'weight_decay=0.0005', 'warmup_epochs=3.0', 'warmup_momentum=0.8', 'warmup_bias_lr=0.1', 'box=7.5', 'cls=0.5', 'dfl=1.5', 'fl_gamma=0.0', 'label_smoothing=0.0', 'nbs=64', 'hsv_h=0.015', 'hsv_s=0.7', 'hsv_v=0.4', 'degrees=0.0', 'translate=0.1', 'scale=0.5', 'shear=0.0', 'perspective=0.0', 'flipud=0.0', 'fliplr=0.5', 'mosaic=1.0', 'mixup=0.0', 'copy_paste=0.0', 'cfg=None', 'v5loader=False', 'tracker=botsort.yaml']
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
0,1,2
0,1,2
0,1,2
DDP settings: RANK 0, WORLD_SIZE 3, DEVICE cuda:0
TensorBoard: Start with 'tensorboard --logdir /home/shenlan08/lihanlin_shijian/last_task/ultralytics/runs/segment/train14', view at http://localhost:6006/
Overriding model.yaml nc=80 with nc=19
Transferred 531/537 items from pretrained weights
AMP: running Automatic Mixed Precision (AMP) checks with YOLOv8n...
AMP: checks passed ✅
I'll sort the data out and post it in the group chat; after that, take a look at how to get the model running first.
yolo8
that one was run before
but it seems the road class wasn't included
add the road label in and train it again
probably just changing that last part will do
then take another look at the evaluation metrics and check whether the code details match these
for the segmentation part
OK
if you end up doing too much yourself, see how to get the others to pitch in
the paper isn't on this image work for now
it's on trajectories
innovating on the image side is harder, or not so much harder as that the experiments are hard to run
What is PA?
PA and mIoU are both metrics for evaluating semantic segmentation models, but they assess performance from different angles.
PA is short for Pixel Accuracy: the fraction of all pixels that are classified correctly. It only looks at pixel-level accuracy, so it cannot distinguish between classes. It is computed as:
$$PA = \frac{TP + TN}{TP + TN + FP + FN}$$
where TP is the number of true-positive pixels, TN the true negatives, FP the false positives, and FN the false negatives.
What is mIoU?
mIoU is short for Mean Intersection over Union: the average IoU over all classes. IoU measures how well the predicted segmentation overlaps the ground truth, i.e. the ratio of their intersection to their union. Because mIoU aggregates the IoU of every class, it evaluates the model more comprehensively. It is computed as:
$$mIoU = \frac{1}{n}\sum_{i=1}^{n}\frac{TP_i}{TP_i + FP_i + FN_i}$$
where n is the number of classes, and TP_i, FP_i, FN_i are the true-positive, false-positive and false-negative pixel counts for class i.
In summary, PA and mIoU both evaluate semantic segmentation models, but the former focuses on pixel-level accuracy while the latter captures per-class differences. In practice the metric should be chosen according to the task and scenario.
Computing semantic segmentation metrics
https://blog.csdn.net/sinat_29047129/article/details/103642140
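A minimal numpy sketch of the two metrics defined above, computed from a per-pixel confusion matrix (rows are ground truth, columns are predictions); the random labels at the bottom are only for demonstration:

import numpy as np

def confusion_matrix(gt, pred, num_classes):
    mask = (gt >= 0) & (gt < num_classes)
    idx = num_classes * gt[mask].astype(int) + pred[mask].astype(int)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def pixel_accuracy(cm):
    return np.diag(cm).sum() / cm.sum()                    # PA = correct pixels / all pixels

def mean_iou(cm):
    intersection = np.diag(cm)
    union = cm.sum(0) + cm.sum(1) - intersection
    return (intersection / np.maximum(union, 1)).mean()    # average IoU over classes

gt = np.random.randint(0, 20, (640, 640))                  # dummy labels, 20 classes
pred = np.random.randint(0, 20, (640, 640))
cm = confusion_matrix(gt, pred, 20)
print('PA:', pixel_accuracy(cm), 'mIoU:', mean_iou(cm))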
YOLOv8 dataset preparation
Cityscapes dataset: Cityscapes is a widely used autonomous-driving dataset with high-resolution images of urban roads, squares, buildings and other scenes. It provides labels for roads, vehicles, pedestrians, buildings and more, and can be used to train and evaluate YOLO object detection.
ApolloScape dataset: ApolloScape, provided by Baidu, contains high-resolution driving images together with lidar data. It covers urban roads, highways and parking lots, with labels for vehicles, pedestrians, bicycles, traffic lights and more, and can be used to train and test YOLO detectors.
Mapillary Vistas dataset: Mapillary Vistas, provided by Mapillary, contains high-resolution images collected around the world, covering city streets, parks and rural areas. It includes labels for roads, vehicles, pedestrians, buildings and more, and can be used to train and test YOLO detectors.
https://blog.csdn.net/ruleng8662/article/details/129522449
Datasets annotated with tools such as CVAT usually store their labels in JSON format.
YOLO training expects TXT-format labels, so the two are not directly compatible.
The JSON annotations therefore need to be converted to YOLO format; YOLOv8 itself also ships with data-processing utilities for this.
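A minimal sketch of such a JSON-to-YOLO conversion for segmentation labels; the JSON keys ('objects', 'label', 'polygon'), the class map and the image size are assumptions, not the schema of a specific tool:

import json
from pathlib import Path

def json_to_yolo_seg(json_file, out_dir, class_map, img_w, img_h):
    data = json.loads(Path(json_file).read_text())
    lines = []
    for obj in data.get('objects', []):
        if obj['label'] not in class_map:
            continue
        coords = []
        for x, y in obj['polygon']:
            coords += [f'{x / img_w:.6f}', f'{y / img_h:.6f}']              # normalize to [0, 1]
        lines.append(' '.join([str(class_map[obj['label']]), *coords]))     # "<class> x1 y1 x2 y2 ..."
    out = Path(out_dir) / (Path(json_file).stem + '.txt')
    out.write_text('\n'.join(lines))

json_to_yolo_seg('sample.json', 'labels/train', {'road': 0, 'car': 12}, 2048, 1024)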