基于千問2.5-VL-7B訓練識別人的表情

一、安裝LLaMA-Factory

? ? ? ? 我們使用LLaMA-Factory來進行微調，安裝LLaMA-Factory來參考文章：
大模型微調工具LLaMA-Factory的安裝流程-CSDN博客
?

二、下載千問2.5-VL-7B模型

我們使用千問2.5-VL-7B多模態模型來進行微調
通義千問2.5-VL-7B-Instruct

下載命令（需要先安裝modelscope的依賴，魔塔上有安裝教程）：

modelscope download --model Qwen/Qwen2.5-VL-7B-Instruct

三、數據集

????????在微調之前準備數據集，我們使用Kaggle平臺上開放的數據集FER-2013，該數據集包含了大量的人臉表情圖片，并做了對應的表情標注。

????????數據地址：FER-2013 | Kaggle

? ? ? ? 數據下載后需要構建訓練集和驗證集的json格式文件，包括圖片的路徑和情感標簽。
下面是處理數據的腳本：

import json
import os
from pathlib import Path# 定義消息對象
class Message:def __init__(self, role, content):self.role = roleself.content = content# 定義對話組對象
class ConversationGroup:def __init__(self, messages, images):self.messages = messagesself.images = imagesdef to_dict(self):return {"messages": [msg.__dict__ for msg in self.messages],"images": self.images}def get_file_paths(directory):"""獲取指定目錄下所有文件夾中的文件路徑:param directory: 要掃描的根目錄:return: 包含所有文件路徑的列表"""file_paths = []# 檢查目錄是否存在if not os.path.exists(directory):print(f"錯誤：目錄 '{directory}' 不存在")return file_paths# 遍歷目錄下的所有項目for item in os.listdir(directory):item_path = os.path.join(directory, item)# 只處理文件夾（忽略文件）if os.path.isdir(item_path):# 遍歷文件夾中的所有文件for file in os.listdir(item_path):file_path = os.path.join(item_path, file)# 只添加文件（忽略子文件夾）if os.path.isfile(file_path):file_paths.append(file_path)return file_pathsdef get_path_dir_info(path_file):new_path = "archive" + path_file.split("archive")[1]path_n = Path(new_path)# 獲取上一級目錄名parent_dir_name = path_n.parent.namereturn new_path, parent_dir_nameemotion = {"angry":"生氣/憤怒","disgust":"厭惡","fear":"害怕/恐懼","happy":"開心/快樂","neutral":"平靜","sad":"悲傷/難過","surprise":"驚訝/驚奇"
}
if __name__ == '__main__':all_files = get_file_paths("/Users/youngwea/Downloads/archive/train")print(all_files)output_data = []for file in all_files:new_path , dir_name = get_path_dir_info(file)user_message = Message("user", "<image>是什么表情？")assistant_message = Message("assistant", emotion.get(dir_name))conversation = ConversationGroup(messages=[user_message, assistant_message],images=[new_path])output_data.append(conversation.to_dict())json_output = json.dumps(output_data, indent=2, ensure_ascii=False)with open('../data/qwen2.5-vl-train-data.json', 'w', encoding='utf-8') as file:file.write(json_output)

處理好的數據集需要復制到LLaMA-Factory的工程項目的/data目錄中去，再在dataset_info.json文件中添加數據文件。

三、訓練配置:

訓練輪數為3的時候，對識別表情不太準確。增大后準確率高點。

訓練腳本：

llamafactory-cli train \--stage sft \--do_train True \--model_name_or_path xxx/modelscope/hub/models/Qwen/Qwen2.5-VL-7B-Instruct \--preprocessing_num_workers 16 \--finetuning_type lora \--template qwen2_vl \--flash_attn auto \--dataset_dir data \--dataset qwen2.5-vl-train-data \--cutoff_len 2048 \--learning_rate 5e-05 \--num_train_epochs 5.0 \--max_samples 100000 \--per_device_train_batch_size 2 \--gradient_accumulation_steps 8 \--lr_scheduler_type cosine \--max_grad_norm 1.0 \--logging_steps 5 \--save_steps 100 \--warmup_steps 0 \--packing False \--enable_thinking True \--report_to none \--output_dir saves/Qwen2.5-VL-7B-Instruct/lora/train_qwen2.5-vl-_2025-07-31-14-02-45 \--bf16 True \--plot_loss True \--trust_remote_code True \--ddp_timeout 180000000 \--include_num_input_tokens_seen True \--optim adamw_torch \--lora_rank 8 \--lora_alpha 16 \--lora_dropout 0 \--lora_target all \--freeze_vision_tower True \--freeze_multi_modal_projector True \--freeze_language_model False \--image_max_pixels 589824 \--image_min_pixels 1024 \--video_max_pixels 65536 \--video_min_pixels 256

?推薦一個非常好用的工具集合：在線工具集合 - 您的開發助手

本文來自互聯網用戶投稿，該文觀點僅代表作者本人，不代表本站立場。本站僅提供信息存儲空間服務，不擁有所有權，不承擔相關法律責任。
如若轉載，請注明出處：http://www.pswp.cn/diannao/93702.shtml
繁體地址，請注明出處：http://hk.pswp.cn/diannao/93702.shtml
英文地址，請注明出處：http://en.pswp.cn/diannao/93702.shtml

如若內容造成侵權/違法違規/事實不符，請聯系多彩編程網進行投訴反饋email:809451989@qq.com，一經查實，立即刪除！