Fine-tuning Qwen2.5-7B-Instruct on the Chinese DeepSeek-R1-Distill-data-110k distillation dataset

Download the model and data

Model download:
Hugging Face (HF Mirror): https://hf-mirror.com/Qwen/Qwen2.5-7B-Instruct
ModelScope: https://www.modelscope.cn/models/Qwen/Qwen2.5-7B-Instruct
Dataset download:
https://huggingface.co/datasets/Congliu/Chinese-DeepSeek-R1-Distill-data-110k
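
The downloads above can also be scripted. The following is a minimal sketch, assuming the `huggingface_hub` package is installed (it provides the `huggingface-cli` command) and that the mirror endpoint is reachable. `MODEL_DIR` matches the `--model` path used in the fine-tuning command in this post; `DATA_DIR` is a hypothetical location. The file later passed to `--dataset` is the author's own local file, and the post does not say how it was produced, so adjust that path to whatever you end up with.

```shell
# Use the mirror endpoint linked above; unset HF_ENDPOINT to use huggingface.co directly.
export HF_ENDPOINT="https://hf-mirror.com"

# MODEL_DIR matches the --model path used later in this post.
# DATA_DIR is a hypothetical location; adjust it to your own layout.
MODEL_DIR="/home/models/pretrained_models/llm/Qwen2.5-7B-Instruct"
DATA_DIR="/home/data/Chinese-DeepSeek-R1-Distill-data-110k"

# Download the base model weights into MODEL_DIR.
download_model() {
  huggingface-cli download Qwen/Qwen2.5-7B-Instruct --local-dir "$MODEL_DIR"
}

# Download the distillation dataset repository into DATA_DIR.
download_dataset() {
  huggingface-cli download Congliu/Chinese-DeepSeek-R1-Distill-data-110k \
    --repo-type dataset --local-dir "$DATA_DIR"
}

# Nothing is downloaded until the functions are called:
# download_model
# download_dataset
```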

Install ms-swift

Install with pip:

pip install ms-swift -U

Install from source:

# pip install git+https://github.com/modelscope/ms-swift.git
git clone https://github.com/modelscope/ms-swift.git
cd ms-swift
pip install -e .

Fine-tuning

CUDA_VISIBLE_DEVICES=0,1 \
swift sft \
    --model /home/models/pretrained_models/llm/Qwen2.5-7B-Instruct \
    --train_type lora \
    --dataset /home/data/Chinese-DeepSeek-R1-Distill-data-110k-SFT/new_distill_r1_110k_sft.json \
    --torch_dtype bfloat16 \
    --num_train_epochs 6 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --learning_rate 1e-4 \
    --lora_rank 8 \
    --lora_alpha 32 \
    --target_modules all-linear \
    --gradient_accumulation_steps 16 \
    --eval_steps 50 \
    --save_steps 50 \
    --save_total_limit 5 \
    --logging_steps 5 \
    --output_dir output \
    --system 'You are a deep thinking assistant.' \
    --warmup_ratio 0.05 \
    --dataloader_num_workers 4 \
    --model_author Q \
    --model_name Q-AILab-Qwen2.5-7B-Instruct-R1-Distill
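
To get a feel for how long a run with these settings is, the step count can be estimated from the flags. This is a rough sketch under two assumptions that the post does not state: the dataset has about 110,000 samples (rounded from the "110k" in its name, ignoring any samples held out for evaluation), and the command runs as a single training process with the model spread across both GPUs, so there is no data-parallel multiplier.

```shell
# Values taken from the swift sft command above.
PER_DEVICE_BATCH=1
GRAD_ACCUM=16
EPOCHS=6
SAVE_STEPS=50

# Assumptions, not stated in the post:
NUM_SAMPLES=110000   # rounded from the "110k" in the dataset name
DP_WORLD_SIZE=1      # one training process, no data parallelism

# Samples consumed per optimizer step.
EFFECTIVE_BATCH=$((PER_DEVICE_BATCH * GRAD_ACCUM * DP_WORLD_SIZE))
# Round up so a final partial batch still counts as a step.
STEPS_PER_EPOCH=$(( (NUM_SAMPLES + EFFECTIVE_BATCH - 1) / EFFECTIVE_BATCH ))
TOTAL_STEPS=$((STEPS_PER_EPOCH * EPOCHS))
# Number of times a checkpoint is written over the whole run.
CHECKPOINTS_WRITTEN=$((TOTAL_STEPS / SAVE_STEPS))

echo "effective batch size: $EFFECTIVE_BATCH"
echo "optimizer steps per epoch: $STEPS_PER_EPOCH"
echo "total optimizer steps: $TOTAL_STEPS"
echo "checkpoints written: $CHECKPOINTS_WRITTEN"
```

Because `--save_total_limit 5` is set, only the most recent five of those checkpoints should remain on disk at any time.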

Training run

Training ran on 2 A800 GPUs for 6 epochs and took 5 days.

Inference results

Inference:

CUDA_VISIBLE_DEVICES=0,1 \
swift infer \
    --adapters /home/model/swift/output/v6-20250217-075043/checkpoint-50 \
    --stream true \
    --temperature 0 \
    --max_new_tokens 8192
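
The command above points at `checkpoint-50`, the first checkpoint a run with `--save_steps 50` writes. To run inference against the most recent checkpoint of a run instead, a small helper can pick the directory with the highest step number. This is a sketch that assumes checkpoints are saved as `checkpoint-<step>` directories directly under the run directory, as in the path above; `latest_checkpoint` is a hypothetical helper name, not part of ms-swift.

```shell
# Print the checkpoint-<step> directory with the largest step under a run directory.
# Prints nothing if the run directory has no checkpoints.
# The body runs in a subshell so its variables do not leak into the caller.
latest_checkpoint() (
  run_dir="$1"
  best=""
  best_step=-1
  for dir in "$run_dir"/checkpoint-*; do
    [ -d "$dir" ] || continue
    step="${dir##*checkpoint-}"
    # Skip anything whose suffix is not a plain number.
    case "$step" in
      ''|*[!0-9]*) continue ;;
    esac
    if [ "$step" -gt "$best_step" ]; then
      best="$dir"
      best_step="$step"
    fi
  done
  if [ -n "$best" ]; then
    printf '%s\n' "$best"
  fi
)

# Usage with the run directory from this post:
# swift infer --adapters "$(latest_checkpoint /home/model/swift/output/v6-20250217-075043)" \
#     --stream true --temperature 0 --max_new_tokens 8192
```

Comparing step numbers numerically avoids the trap of lexical sorting, where `checkpoint-50` would sort after `checkpoint-1000`.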

Inference test:

Training of Qwen2.5-7B-Instruct-DeepSeek-R1-Distill-data-110K is complete!

For follow-up steps such as merging the LoRA weights, resuming training from a checkpoint, and pushing the model, see the ms-swift GitHub project:

https://github.com/modelscope/ms-swift
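
As a starting point, two of the follow-up steps mentioned above might look like the sketch below. It assumes the ms-swift 3.x command line, where `swift export` takes `--merge_lora true` and `swift sft` takes `--resume_from_checkpoint`; check the project documentation for the version you have installed. The `run`, `merge_lora`, and `resume_training` helpers are hypothetical wrappers, and by default they only print the command so it can be reviewed before anything is executed.

```shell
# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# Merge the LoRA weights of a checkpoint into the base model.
# $1: checkpoint directory, e.g. output/vx-xxx/checkpoint-xxx
merge_lora() {
  run swift export --adapters "$1" --merge_lora true
}

# Resume an interrupted run from a checkpoint.
# $1: checkpoint directory; remaining arguments: the original swift sft flags, unchanged.
resume_training() (
  ckpt="$1"
  shift
  run swift sft "$@" --resume_from_checkpoint "$ckpt"
)

# Example (prints the commands; set DRY_RUN=0 to execute them):
# merge_lora output/vx-xxx/checkpoint-xxx
# resume_training output/vx-xxx/checkpoint-xxx --model ... --train_type lora ...
```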