【LLaMA-Factory】Fine-Tuning DeepSeek-R1-Distill-Qwen-7B with LoRA
- Local environment
- Disable the open-source nouveau driver
- Install nvidia-smi
- Install Git
- Install Anaconda (conda)
- Download the `DeepSeek-R1-Distill-Qwen-7B` model
- Install LLaMA-Factory
- Download LLaMA-Factory
- Install LLaMA-Factory dependencies
- Update environment variables
- Install deepspeed
- Prepare the Alpaca dataset
- Prepare the LoRA config file
- Set the number of GPUs
- Start fine-tuning
- Watch GPU usage during fine-tuning
- Start the Web UI service
- Chat
- Loss curve
Local environment
Component | Version |
---|---|
Linux | BigCloud Enterprise Linux 8.6 |
GPU | NVIDIA Tesla T4 16GB × 8 |
Disable the open-source nouveau driver
If the open-source driver is not disabled, installing nvidia-smi directly will fail, and the log file /var/log/nvidia-installer.log will contain the following error:
ERROR: Unable to load the kernel module 'nvidia.ko'
- Check whether nouveau is running:
lsmod | grep nouveau
If output like the following does not appear, nouveau is already disabled:
nouveau 2334720 0
video 57344 1 nouveau
mxm_wmi 16384 1 nouveau
drm_kms_helper 262144 5 drm_vram_helper,ast,nouveau
ttm 114688 3 drm_vram_helper,drm_ttm_helper,nouveau
i2c_algo_bit 16384 3 igb,ast,nouveau
drm 614400 7 drm_kms_helper,drm_vram_helper,ast,drm_ttm_helper,ttm,nouveau
i2c_core 98304 9 drm_kms_helper,i2c_algo_bit,igb,ast,i2c_smbus,i2c_i801,ipmi_ssif,nouveau,drm
wmi 32768 2 mxm_wmi,nouveau
- Disable nouveau:
# Create the file if it does not exist
sudo sh -c 'cat > /etc/modprobe.d/blacklist.conf << EOF
blacklist nouveau
options nouveau modeset=0
EOF'
# Regenerate the initramfs
sudo dracut --force
# Reboot the machine
sudo reboot
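After the reboot, re-run the check from above; if the blacklist took effect, the command prints nothing:
# No output means nouveau is disabled
lsmod | grep nouveau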
Install nvidia-smi
- Open https://www.nvidia.com/en-us/drivers/ in a browser. A proxy is needed for the Manual Driver Search form to load.
- Fill in the driver information
- Driver search results
- Driver download page
- Download the driver file, NVIDIA-Linux-x86_64-570.133.20.run, and upload it to the Linux machine. Note that you cannot copy the download URL and fetch it on the machine with wget directly; such requests return HTTP 403.
- Install the driver. It must be installed as root. Useful flags (full command shown below):
- --no-x-check # skip the X server check while installing the driver
- --no-nouveau-check # skip the nouveau check while installing the driver
- --no-opengl-files # install only the driver files, not the OpenGL files
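The original post does not show the full command line; a typical invocation combining these flags would be (note the double dashes):
# Must be run as root, using the .run file downloaded above
sudo sh NVIDIA-Linux-x86_64-570.133.20.run --no-x-check --no-nouveau-check --no-opengl-files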
- During installation a kernel module type prompt appears; following the hint, select NVIDIA Proprietary with the left/right arrow keys, then press Enter:
Multiple kernel module types are available for this system. Which would you like to use?
NVIDIA Proprietary    MIT/GPL
- After installation, run the following command to confirm the driver was installed successfully:
nvidia-smi
The GPU information looks like this:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.133.20 Driver Version: 570.133.20 CUDA Version: 12.8 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 Tesla T4 Off | 00000000:3D:00.0 Off | 0 |
| N/A 51C P0 27W / 70W | 0MiB / 15360MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 Tesla T4 Off | 00000000:3E:00.0 Off | 0 |
| N/A 54C P0 28W / 70W | 0MiB / 15360MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 2 Tesla T4 Off | 00000000:40:00.0 Off | 0 |
| N/A 51C P0 27W / 70W | 0MiB / 15360MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 3 Tesla T4 Off | 00000000:41:00.0 Off | 0 |
| N/A 48C P0 26W / 70W | 0MiB / 15360MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 4 Tesla T4 Off | 00000000:B1:00.0 Off | 0 |
| N/A 47C P0 26W / 70W | 0MiB / 15360MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 5 Tesla T4 Off | 00000000:B2:00.0 Off | 0 |
| N/A 53C P0 30W / 70W | 0MiB / 15360MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 6 Tesla T4 Off | 00000000:B4:00.0 Off | 0 |
| N/A 56C P0 30W / 70W | 0MiB / 15360MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 7 Tesla T4 Off | 00000000:B5:00.0 Off | 0 |
| N/A 55C P0 30W / 70W | 0MiB / 15360MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
Install Git
Install git and git-lfs:
sudo dnf install git git-lfs
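To confirm both tools are available on PATH:
# Print the installed versions
git --version
git lfs version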
Install Anaconda (conda)
- Download page: https://www.anaconda.com/download/success
Download the 64-Bit (x86) Installer:
wget https://repo.anaconda.com/archive/Anaconda3-2024.10-1-Linux-x86_64.sh
- Install:
sudo sh Anaconda3-2024.10-1-Linux-x86_64.sh
- The installer prints a lot of output; answer yes to the prompts, and press q to skip through the license text:
Do you accept the license terms? [yes|no]
>>> yes

Anaconda3 will now be installed into this location:
/root/anaconda3
  - Press ENTER to confirm the location
  - Press CTRL-C to abort the installation
  - Or specify a different location below

[/root/anaconda3] >>> /data/ProgramFiles/anaconda3

You can undo this by running `conda init --reverse $SHELL`? [yes|no]
[no] >>> yes
- Set environment variables:
cat >> ~/.bash_profile << EOF
export ANACONDA3_HOME=/data/ProgramFiles/anaconda3
export CONDA_ENVS_PATH=\$ANACONDA3_HOME/envs
export PATH="\$ANACONDA3_HOME/bin:\$PATH"
EOF
source ~/.bash_profile
# The directory was created as root; grant ownership to the working user
sudo chown -R tkyj.tkyj /data
- Check the conda version to verify the installation:
conda -V
- Configure mirror channels:
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
conda config --set show_channel_urls yes
Download the DeepSeek-R1-Distill-Qwen-7B model
- ModelScope: https://modelscope.cn/models/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
mkdir -pv /data/llm/models
cd /data/llm/models
# To skip downloading the large LFS files during the clone, use:
GIT_LFS_SKIP_SMUDGE=1 git clone https://www.modelscope.cn/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B.git
# Make sure git-lfs is installed correctly
cd DeepSeek-R1-Distill-Qwen-7B
git lfs install
# Download the large files
git lfs pull
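A quick way to confirm the weights actually downloaded: with GIT_LFS_SKIP_SMUDGE=1, a missed git lfs pull leaves KB-sized pointer stubs instead of multi-GB shards.
# Shard files should be several GB each, not tiny pointer files
ls -lh *.safetensors
git lfs ls-files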
Install LLaMA-Factory
- Software requirements
Mandatory | Minimum | Recommend |
---|---|---|
python | 3.9 | 3.10 |
torch | 2.0.0 | 2.6.0 |
transformers | 4.45.0 | 4.50.0 |
datasets | 2.16.0 | 3.2.0 |
accelerate | 0.34.0 | 1.2.1 |
peft | 0.14.0 | 0.15.1 |
trl | 0.8.6 | 0.9.6 |
Optional | Minimum | Recommend |
---|---|---|
CUDA | 11.6 | 12.2 |
deepspeed | 0.10.0 | 0.16.4 |
bitsandbytes | 0.39.0 | 0.43.1 |
vllm | 0.4.3 | 0.8.2 |
flash-attn | 2.5.6 | 2.7.2 |
- Hardware requirements
Method | Bits | 7B | 14B | 30B | 70B | x B |
---|---|---|---|---|---|---|
Full (bf16 or fp16) | 32 | 120GB | 240GB | 600GB | 1200GB | 18x GB |
Full (pure_bf16) | 16 | 60GB | 120GB | 300GB | 600GB | 8x GB |
Freeze/LoRA/GaLore/APOLLO/BAdam | 16 | 16GB | 32GB | 64GB | 160GB | 2x GB |
QLoRA | 8 | 10GB | 20GB | 40GB | 80GB | x GB |
QLoRA | 4 | 6GB | 12GB | 24GB | 48GB | x/2 GB |
QLoRA | 2 | 4GB | 8GB | 16GB | 24GB | x/4 GB |
Download LLaMA-Factory
- Clone the project with git:
cd /data/ProgramFiles
git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
Install LLaMA-Factory dependencies
cd LLaMA-Factory
conda create --name llama_factory python=3.10
conda activate llama_factory
# Install the torch build matching CUDA 12.8 from the PyTorch index
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
pip install -e ".[torch,metrics]" -i https://pypi.tuna.tsinghua.edu.cn/simple
# Verify the installation
llamafactory-cli version
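A quick check (run inside the llama_factory environment) that torch was built with CUDA support and sees all eight T4s:
# Expect: a cu128 torch version, True, and 8
python -c "import torch; print(torch.__version__, torch.cuda.is_available(), torch.cuda.device_count())"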
Update environment variables
To automatically switch to the right Python environment on every terminal login:
cat >> ~/.bash_profile << EOF
conda activate llama_factory
EOF
source ~/.bash_profile
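If the login shell complains that conda is not initialized when it reaches `conda activate`, sourcing conda's profile script first fixes it; a sketch assuming the install location chosen above:
# Add this line before 'conda activate llama_factory' in ~/.bash_profile if needed
source /data/ProgramFiles/anaconda3/etc/profile.d/conda.sh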
Install deepspeed
Fine-tuning a 7B model on a single 16GB card is tight; deepspeed can help:
# List available cuda versions
conda search cuda -c nvidia
# Install the version matching nvidia-smi (installing cuda is a prerequisite for installing deepspeed)
conda install cuda=12.8.1 -c nvidia
DS_BUILD_CPU_ADAM=1 pip3 install deepspeed==0.16.7
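DeepSpeed ships a diagnostic command that reports which ops were built (cpu_adam should show as installed). Note that the training YAML below does not itself enable DeepSpeed; in LLaMA-Factory it is enabled by pointing a `deepspeed:` key at a JSON config, such as the samples shipped under examples/deepspeed/ in the repo.
# Show DeepSpeed's environment report, including op build status
ds_report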
Prepare the Alpaca dataset
Example of the alpaca format:
[
  {
    "instruction": "Invent a new word by combining two existing words.",
    "input": "",
    "output": "\"Chillax\" - a combination of the words \"chill\" and \"relax\". It means to calm down and unwind, enjoying oneself in a laidback manner."
  },
  {
    "instruction": "Transform the following sentence using a synonym: The car sped quickly.",
    "input": "",
    "output": "The car accelerated rapidly."
  }
]
- Edit LLaMA-Factory/data/dataset_info.json and add the following entry:
"cmic_financial_apaca": {
  "file_name": "/data/llm/dataset/cmic_financial_apaca.json",
  "columns": {
    "prompt": "instruction",
    "query": "input",
    "response": "output",
    "system": "system"
  }
}
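Before training, it is worth sanity-checking that the dataset file at the declared path is valid JSON and has the expected fields:
# Prints the record count and the keys of the first record
python -c "import json; data = json.load(open('/data/llm/dataset/cmic_financial_apaca.json')); print(len(data), 'records'); print(data[0].keys())"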
Prepare the LoRA config file
- Copy the original file to create a new config:
cd LLaMA-Factory/examples/train_lora
# Make a new copy from the original file
cp llama3_lora_sft.yaml ds_qwen7b_lora_sft.yaml
vi ds_qwen7b_lora_sft.yaml
- Edit ds_qwen7b_lora_sft.yaml; the main fields to change are model_name_or_path, dataset, template, cutoff_len, max_samples, and output_dir.
- Parameters worth paying attention to:
- model_name_or_path: path to the model
- dataset: dataset name, matching the cmic_financial_apaca entry declared above
- template: prompt template
- cutoff_len: maximum length of the input sequence
- output_dir: where the fine-tuned weights are saved
- gradient_accumulation_steps: number of gradient accumulation steps; reduce this value when GPU resources are insufficient
- num_train_epochs: number of training epochs
The full content of ds_qwen7b_lora_sft.yaml is as follows:
### model
model_name_or_path: /data/llm/models/DeepSeek-R1-Distill-Qwen-7B
trust_remote_code: true

### method
stage: sft
do_train: true
finetuning_type: lora
lora_rank: 8
lora_target: all

### dataset
dataset: cmic_financial_apaca
template: deepseek3
cutoff_len: 4096
max_samples: 4019
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4

### output
output_dir: /data/llm/models/sft/DeepSeek-R1-Distill-Qwen-7B
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true
save_only_model: false
report_to: none # choices: [none, wandb, tensorboard, swanlab, mlflow]

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 1.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
resume_from_checkpoint: null

### eval
#eval_dataset: alpaca_en_demo
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500
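With all 8 GPUs, this config gives an effective global batch size of per_device_train_batch_size × gradient_accumulation_steps × number of GPUs = 1 × 8 × 8 = 64.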
Set the number of GPUs
For multi-GPU parallel training, set the GPU count by editing LLaMA-Factory/examples/accelerate/fsdp_config.yaml:
num_processes: 8 # the number of GPUs in all nodes
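As an aside (not from the original post), the set of GPUs used can also be limited at launch time with CUDA_VISIBLE_DEVICES instead of editing the config:
# Example: run on the first four cards only
CUDA_VISIBLE_DEVICES=0,1,2,3 llamafactory-cli train /data/ProgramFiles/LLaMA-Factory/examples/train_lora/ds_qwen7b_lora_sft.yaml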
Start fine-tuning
conda activate llama_factory
# Run in the background
nohup llamafactory-cli train /data/ProgramFiles/LLaMA-Factory/examples/train_lora/ds_qwen7b_lora_sft.yaml > nohup.log 2>&1 &
# Follow the log
tail -fn200 nohup.log
Watch GPU usage during fine-tuning
Run: watch -n 0.5 nvidia-smi
Every 0.5s: nvidia-smi    localhost.localdomain: Mon Apr 28 15:33:10 2025

Mon Apr 28 15:33:14 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.133.20 Driver Version: 570.133.20 CUDA Version: 12.8 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 Tesla T4 Off | 00000000:3D:00.0 Off | 0 |
| N/A 39C P8 19W / 70W | 0MiB / 15360MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 Tesla T4 Off | 00000000:3E:00.0 Off | 0 |
| N/A 40C P0 26W / 70W | 0MiB / 15360MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 2 Tesla T4 Off | 00000000:40:00.0 Off | 0 |
| N/A 37C P0 26W / 70W | 0MiB / 15360MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 3 Tesla T4 Off | 00000000:41:00.0 Off | 0 |
| N/A 37C P0 25W / 70W | 0MiB / 15360MiB | 4% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 4 Tesla T4 Off | 00000000:B1:00.0 Off | 0 |
| N/A 36C P8 13W / 70W | 0MiB / 15360MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 5 Tesla T4 Off | 00000000:B2:00.0 Off | 0 |
| N/A 40C P8 15W / 70W | 0MiB / 15360MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 6 Tesla T4 Off | 00000000:B4:00.0 Off | 0 |
| N/A 42C P8 15W / 70W | 0MiB / 15360MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 7 Tesla T4 Off | 00000000:B5:00.0 Off | 0 |
| N/A 41C P8 11W / 70W | 0MiB / 15360MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
Start the Web UI service
# Stop the firewall so the web service port is reachable
sudo systemctl stop firewalld
GRADIO_SHARE=1 nohup llamafactory-cli webui > webui.log 2>&1 &
tail -fn200 webui.log
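Instead of stopping the firewall entirely, a less invasive option is to open only the Web UI port (Gradio defaults to 7860; adjust if your deployment differs):
# Open the Gradio port and reload the firewall rules
sudo firewall-cmd --permanent --add-port=7860/tcp
sudo firewall-cmd --reload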
Chat
llamafactory-cli chat /data/ProgramFiles/LLaMA-Factory/examples/inference/ds_qwen7b_lora_sft.yaml
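The inference config referenced above is not shown in the post; a minimal sketch, modeled on LLaMA-Factory's examples/inference/llama3_lora_sft.yaml and reusing the paths from this walkthrough (the exact contents are an assumption):
# Create the inference config: base model plus the fine-tuned LoRA adapter
cat > /data/ProgramFiles/LLaMA-Factory/examples/inference/ds_qwen7b_lora_sft.yaml << 'EOF'
model_name_or_path: /data/llm/models/DeepSeek-R1-Distill-Qwen-7B
adapter_name_or_path: /data/llm/models/sft/DeepSeek-R1-Distill-Qwen-7B
template: deepseek3
infer_backend: huggingface
trust_remote_code: true
EOF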
Loss curve
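Since the training config sets plot_loss: true, LLaMA-Factory saves the loss curve as an image in output_dir (typically named training_loss.png):
# The loss plot is written alongside the fine-tuned weights
ls /data/llm/models/sft/DeepSeek-R1-Distill-Qwen-7B/training_loss.png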