A recent project called for a tool to automate visual processing, and Python ended up being the language of choice, so I took the opportunity to study it systematically. With a short learning window and a need for rapid development, I am recording the key steps here so I do not forget them.
Links to this series:
Open-source Python application development (1): installing Python, pip, pyAutoGUI and OpenCV for Python - CSDN blog
Open-source Python application development (2): tool automation based on pyautogui and OpenCV visual recognition - CSDN blog
Open-source Python application development (3): an introduction to Python syntax - CSDN blog
Open-source Python application development (4): files and system integration in Python - CSDN blog
Recommended links:
Open-source ArkTS HarmonyOS application development (1): project file analysis - CSDN blog
Open-source ArkTS HarmonyOS application development (2): building and using a .har library - CSDN blog
Open-source ArkTS HarmonyOS application development (3): an introduction to ArkTS - CSDN blog
Open-source ArkTS HarmonyOS application development (4): layouts and common controls - CSDN blog
Open-source ArkTS HarmonyOS application development (5): control composition and complex controls - CSDN blog
Recommended links:
Open-source Java Android app development (1): setting up the development environment - CSDN blog
Open-source Java Android app development (2): project file structure - CSDN blog
Open-source Java Android app development (3): GUI layouts and common components - CSDN blog
Open-source Java Android app development (4): important GUI components - CSDN blog
Open-source Java Android app development (5): file and database storage - CSDN blog
Open-source Java Android app development (6): using multimedia - CSDN blog
Open-source Java Android app development (7): communication via TCP and HTTP - CSDN blog
Open-source Java Android app development (8): communication via MQTT and BLE - CSDN blog
Open-source Java Android app development (9): background threads and services - CSDN blog
Open-source Java Android app development (10): the broadcast mechanism - CSDN blog
Open-source Java Android app development (11): debugging and release - CSDN blog
Open-source Java Android app development (12): packaging an .aar library - CSDN blog
Recommended links:
Open-source C# .NET MVC development (1): setting up the web project - CSDN blog
Open-source C# .NET MVC development (2): building a website quickly - CSDN blog
Open-source C# .NET MVC development (3): intranet and internet access (publishing from Visual Studio, IIS site configuration, Peanut Shell NAT traversal) - CSDN blog
Open-source C# .NET MVC development (4): project structure, page submission and display - CSDN blog
Open-source C# .NET MVC development (5): common code for rapid development - CSDN blog
This installment implements object detection with the YOLOv3 (You Only Look Once, version 3) deep-learning model. It recognizes bananas, a mobile phone, and cars and pedestrians in a night scene; playing with it for the first time was good fun. The results depend somewhat on the input images, so substitute your own, preferably large, high-resolution pictures.
1. The YOLOv3 model
2. Key functions
3. Full code
4. Demo results
1. The YOLOv3 model
The script uses a pre-trained YOLOv3 model (it needs the yolov3.cfg and yolov3.weights files)
trained on the COCO dataset (80 classes).
Three files are required in total: yolov3.cfg, yolov3.weights and coco.names; all of them are easy to find online.
To use a different model, swap in the corresponding .cfg and .weights files.
Link: YOLO: Real-Time Object Detection
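If you would rather fetch the three files from a script than hunt for them manually, here is a minimal sketch using urllib. The download URLs below are the commonly cited ones from the official YOLO page and the Darknet GitHub repository; treat them as an assumption and verify them before relying on this (yolov3.weights is roughly 240 MB).

import os
import urllib.request

# Assumed download locations (verify before use); hosting may change over time.
MODEL_FILES = {
    "yolov3.cfg": "https://raw.githubusercontent.com/pjreddie/darknet/master/cfg/yolov3.cfg",
    "coco.names": "https://raw.githubusercontent.com/pjreddie/darknet/master/data/coco.names",
    "yolov3.weights": "https://pjreddie.com/media/files/yolov3.weights",
}

def fetch_model_files(target_dir="."):
    # Download whichever of the three files is not already present
    for name, url in MODEL_FILES.items():
        path = os.path.join(target_dir, name)
        if not os.path.exists(path):
            print(f"Downloading {name} ...")
            urllib.request.urlretrieve(url, path)

if __name__ == "__main__":
    fetch_model_files()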
The yolov3.cfg file:
[net]
# Testing
# batch=1
# subdivisions=1
# Training
batch=64
subdivisions=16
width=608
height=608
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1
learning_rate=0.001
burn_in=1000
max_batches = 500200
policy=steps
steps=400000,450000
scales=.1,.1

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky

# Downsample

[convolutional]
batch_normalize=1
filters=64
size=3
stride=2
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=32
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

# Downsample

[convolutional]
batch_normalize=1
filters=128
size=3
stride=2
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

# Downsample

[convolutional]
batch_normalize=1
filters=256
size=3
stride=2
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

# Downsample

[convolutional]
batch_normalize=1
filters=512
size=3
stride=2
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

# Downsample

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=2
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

######################

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=255
activation=linear

[yolo]
mask = 6,7,8
anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
classes=80
num=9
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1

[route]
layers = -4

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[upsample]
stride=2

[route]
layers = -1, 61

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=512
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=512
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=512
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=255
activation=linear

[yolo]
mask = 3,4,5
anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
classes=80
num=9
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1

[route]
layers = -4

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[upsample]
stride=2

[route]
layers = -1, 36

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=256
activation=leaky

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=256
activation=leaky

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=256
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=255
activation=linear

[yolo]
mask = 0,1,2
anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
classes=80
num=9
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1
The coco.names file:
person
bicycle
car
motorbike
aeroplane
bus
train
truck
boat
traffic light
fire hydrant
stop sign
parking meter
bench
bird
cat
dog
horse
sheep
cow
elephant
bear
zebra
giraffe
backpack
umbrella
handbag
tie
suitcase
frisbee
skis
snowboard
sports ball
kite
baseball bat
baseball glove
skateboard
surfboard
tennis racket
bottle
wine glass
cup
fork
knife
spoon
bowl
banana
apple
sandwich
orange
broccoli
carrot
hot dog
pizza
donut
cake
chair
sofa
pottedplant
bed
diningtable
toilet
tvmonitor
laptop
mouse
remote
keyboard
cell phone
microwave
oven
toaster
sink
refrigerator
book
clock
vase
scissors
teddy bear
hair drier
toothbrush
2. Key functions
2.1 The load_yolo() function
Purpose: load the YOLOv3 network and the related files.
def load_yolo():
    # Get the directory this script lives in
    current_dir = os.path.dirname(os.path.abspath(__file__))
    # Build full paths to the model files
    cfg_path = os.path.join(current_dir, "yolov3.cfg")
    weights_path = os.path.join(current_dir, "yolov3.weights")
    names_path = os.path.join(current_dir, "coco.names")
    # Make sure all three files exist
    if not all(os.path.exists(f) for f in [cfg_path, weights_path, names_path]):
        raise FileNotFoundError("Missing YOLO model files: make sure yolov3.cfg, yolov3.weights and coco.names are in the script directory")
    # Load the network
    net = cv2.dnn.readNet(weights_path, cfg_path)
    with open(names_path, "r") as f:
        classes = [line.strip() for line in f.readlines()]
    layer_names = net.getLayerNames()
    output_layers = [layer_names[i - 1] for i in net.getUnconnectedOutLayers()]
    return net, classes, output_layers
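One caveat: the return format of net.getUnconnectedOutLayers() differs between OpenCV releases (some builds return nested arrays such as [[200], [227], [254]], newer ones return plain integers), so the list comprehension above can raise an indexing error on some versions. A small version-tolerant sketch, assuming numpy is already imported as np:

layer_names = net.getLayerNames()
out_idx = net.getUnconnectedOutLayers()
# flatten() copes with both the nested and the flat return formats
output_layers = [layer_names[i - 1] for i in np.array(out_idx).flatten()]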
2.2 The detect_objects(img, net, output_layers) function
Purpose: run object detection on an input image.
def detect_objects(img, net, output_layers):
    # Create a blob from the image (scale to [0,1], resize to 416x416, swap BGR to RGB)
    blob = cv2.dnn.blobFromImage(img, scalefactor=1/255.0, size=(416, 416), swapRB=True, crop=False)
    # Set the input and run a forward pass
    net.setInput(blob)
    outputs = net.forward(output_layers)
    return outputs
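Note that the cfg above declares a 608x608 network input while blobFromImage resizes to 416x416 here; OpenCV's DNN module accepts either, and a larger blob generally trades speed for a little accuracy. To see what the forward pass actually returns on your own image, a quick inspection sketch (the shapes in the comment assume a 416x416 blob):

# Each element of outputs corresponds to one YOLO output layer.
# Each row is one candidate detection: [cx, cy, w, h, objectness, 80 class scores].
for i, out in enumerate(outputs):
    print(f"output layer {i}: {out.shape}")  # e.g. (507, 85), (2028, 85), (8112, 85)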
2.3 The get_box_dimensions(outputs, height, width) function
Purpose: parse the network outputs and extract bounding-box information.
def get_box_dimensions(outputs, height, width):
    boxes = []
    confidences = []
    class_ids = []
    for output in outputs:
        for detection in output:
            scores = detection[5:]
            class_id = np.argmax(scores)
            confidence = scores[class_id]
            if confidence > 0.5:  # confidence threshold
                # Convert the normalized box coordinates to pixels
                center_x = int(detection[0] * width)
                center_y = int(detection[1] * height)
                w = int(detection[2] * width)
                h = int(detection[3] * height)
                # Top-left corner of the rectangle
                x = int(center_x - w / 2)
                y = int(center_y - h / 2)
                boxes.append([x, y, w, h])
                confidences.append(float(confidence))
                class_ids.append(class_id)
    return boxes, confidences, class_ids
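To make the coordinate math concrete, here is a tiny worked example for a hypothetical 1280x720 image and a detection whose normalized values are 0.5, 0.5, 0.2, 0.3:

width, height = 1280, 720                      # hypothetical image size
cx, cy, bw, bh = 0.5, 0.5, 0.2, 0.3            # normalized center and size from the detection
w, h = int(bw * width), int(bh * height)       # 256 x 216 pixels
x, y = int(cx * width - w / 2), int(cy * height - h / 2)  # top-left corner (512, 252)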
2.4 The draw_labels(boxes, confidences, class_ids, classes, img) function
Purpose: draw the detection results on the image.
def draw_labels(boxes, confidences, class_ids, classes, img):
    # Apply non-maximum suppression to remove overlapping boxes
    indexes = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)
    # One random color per class
    colors = np.random.uniform(0, 255, size=(len(classes), 3))
    if len(indexes) > 0:
        for i in indexes.flatten():
            x, y, w, h = boxes[i]
            label = str(classes[class_ids[i]])
            confidence = str(round(confidences[i], 2))
            color = colors[class_ids[i]]
            # Draw the bounding box and the label
            cv2.rectangle(img, (x, y), (x + w, y + h), color, 2)
            cv2.putText(img, f"{label} {confidence}", (x, y - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)
    return img
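The two numeric arguments to cv2.dnn.NMSBoxes are the score threshold and the NMS (IoU) threshold. If the same object keeps receiving several overlapping boxes, lowering the second value suppresses more of them; naming the constants, as in this small sketch, makes the call easier to tune:

SCORE_THRESHOLD = 0.5   # discard boxes whose confidence is below this
NMS_THRESHOLD = 0.4     # suppress boxes whose overlap (IoU) with a better box exceeds this
indexes = cv2.dnn.NMSBoxes(boxes, confidences, SCORE_THRESHOLD, NMS_THRESHOLD)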
2.5 The object_detection(image_path) function
Purpose: the main function that ties the whole detection pipeline together.
def object_detection(image_path):
    try:
        net, classes, output_layers = load_yolo()
        img = cv2.imread(image_path)
        if img is None:
            raise FileNotFoundError(f"Could not load image: {image_path}")
        height, width = img.shape[:2]
        outputs = detect_objects(img, net, output_layers)
        boxes, confidences, class_ids = get_box_dimensions(outputs, height, width)
        img = draw_labels(boxes, confidences, class_ids, classes, img)
        cv2.imshow("Object Detection", img)
        cv2.waitKey(0)
        cv2.destroyAllWindows()
    except Exception as e:
        print(f"An error occurred: {str(e)}")
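cv2.imshow needs a desktop environment; if the script runs on a headless machine, a minimal alternative, assuming a hard-coded output name is acceptable, is to write the annotated image to disk instead of opening a window:

# Replace the imshow/waitKey/destroyAllWindows block with a simple file write
cv2.imwrite("result.jpg", img)   # "result.jpg" is just an example output path
print("Saved annotated image to result.jpg")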
3. Full code
import cv2
import numpy as np
import os


def load_yolo():
    # Get the directory this script lives in
    current_dir = os.path.dirname(os.path.abspath(__file__))
    # Build full paths to the model files
    cfg_path = os.path.join(current_dir, "yolov3.cfg")
    weights_path = os.path.join(current_dir, "yolov3.weights")
    names_path = os.path.join(current_dir, "coco.names")
    # Make sure all three files exist
    if not all(os.path.exists(f) for f in [cfg_path, weights_path, names_path]):
        raise FileNotFoundError("Missing YOLO model files: make sure yolov3.cfg, yolov3.weights and coco.names are in the script directory")
    # Load the network
    net = cv2.dnn.readNet(weights_path, cfg_path)
    with open(names_path, "r") as f:
        classes = [line.strip() for line in f.readlines()]
    layer_names = net.getLayerNames()
    output_layers = [layer_names[i - 1] for i in net.getUnconnectedOutLayers()]
    return net, classes, output_layers


def detect_objects(img, net, output_layers):
    # Create a blob from the image
    blob = cv2.dnn.blobFromImage(img, scalefactor=1/255.0, size=(416, 416), swapRB=True, crop=False)
    # Set the input and run a forward pass
    net.setInput(blob)
    outputs = net.forward(output_layers)
    return outputs


def get_box_dimensions(outputs, height, width):
    boxes = []
    confidences = []
    class_ids = []
    for output in outputs:
        for detection in output:
            scores = detection[5:]
            class_id = np.argmax(scores)
            confidence = scores[class_id]
            if confidence > 0.5:  # confidence threshold
                # Convert the normalized box coordinates to pixels
                center_x = int(detection[0] * width)
                center_y = int(detection[1] * height)
                w = int(detection[2] * width)
                h = int(detection[3] * height)
                # Top-left corner of the rectangle
                x = int(center_x - w / 2)
                y = int(center_y - h / 2)
                boxes.append([x, y, w, h])
                confidences.append(float(confidence))
                class_ids.append(class_id)
    return boxes, confidences, class_ids


def draw_labels(boxes, confidences, class_ids, classes, img):
    # Apply non-maximum suppression
    indexes = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)
    # One random color per class
    colors = np.random.uniform(0, 255, size=(len(classes), 3))
    if len(indexes) > 0:
        for i in indexes.flatten():
            x, y, w, h = boxes[i]
            label = str(classes[class_ids[i]])
            confidence = str(round(confidences[i], 2))
            color = colors[class_ids[i]]
            # Draw the bounding box and the label
            cv2.rectangle(img, (x, y), (x + w, y + h), color, 2)
            cv2.putText(img, f"{label} {confidence}", (x, y - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)
    return img


def object_detection(image_path):
    try:
        net, classes, output_layers = load_yolo()
        img = cv2.imread(image_path)
        if img is None:
            raise FileNotFoundError(f"Could not load image: {image_path}")
        height, width = img.shape[:2]
        outputs = detect_objects(img, net, output_layers)
        boxes, confidences, class_ids = get_box_dimensions(outputs, height, width)
        img = draw_labels(boxes, confidences, class_ids, classes, img)
        cv2.imshow("Object Detection", img)
        cv2.waitKey(0)
        cv2.destroyAllWindows()
    except Exception as e:
        print(f"An error occurred: {str(e)}")


if __name__ == "__main__":
    # Example usage
    image_path = "myimg.jpg"  # replace with your own image path
    object_detection(image_path)
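One convenient variation on the entry point, sketched here as an assumption rather than part of the original script, is to take the image path from the command line so the file name does not have to be edited for every test image:

import sys

if __name__ == "__main__":
    # Use the first command-line argument as the image path, falling back to the demo file
    image_path = sys.argv[1] if len(sys.argv) > 1 else "myimg.jpg"
    object_detection(image_path)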
4. Demo results
4.1 Detecting a banana
4.2 Detecting a mobile phone
4.3 Detecting cars and pedestrians in a night scene