【LLaMA-Factory】Fine-Tuning DeepSeek-R1-Distill-Qwen-7B with LoRA
- Local environment
- Disable the open-source nouveau driver
- Install nvidia-smi
- Install Git
- Install Anaconda (conda)
- Download the `DeepSeek-R1-Distill-Qwen-7B` model
- Install LLaMA-Factory
- Download LLaMA-Factory
- Install LLaMA-Factory dependencies
- Update environment variables
- Install deepspeed
- Prepare the Alpaca dataset
- Prepare the LoRA config file
- Set the number of GPUs
- Start fine-tuning
- Watch GPU usage during fine-tuning
- Start the Web UI service
- Chat
- Loss curve
Local environment
Component | Version |
---|---|
Linux | BigCloud Enterprise Linux 8.6 |
GPU | NVIDIA Tesla T4 16GB × 8 |
Disable the open-source nouveau driver
If the open-source driver is not disabled, installing nvidia-smi directly will fail, and the log file /var/log/nvidia-installer.log will contain the following error:
ERROR: Unable to load the kernel module 'nvidia.ko'
- Check whether nouveau is running:
lsmod | grep nouveau
If output like the following does not appear, nouveau is already disabled:
nouveau 2334720 0
video 57344 1 nouveau
mxm_wmi 16384 1 nouveau
drm_kms_helper 262144 5 drm_vram_helper,ast,nouveau
ttm 114688 3 drm_vram_helper,drm_ttm_helper,nouveau
i2c_algo_bit 16384 3 igb,ast,nouveau
drm 614400 7 drm_kms_helper,drm_vram_helper,ast,drm_ttm_helper,ttm,nouveau
i2c_core 98304 9 drm_kms_helper,i2c_algo_bit,igb,ast,i2c_smbus,i2c_i801,ipmi_ssif,nouveau,drm
wmi 32768 2 mxm_wmi,nouveau
- Disable nouveau:
# Create the file if it does not exist
sudo sh -c 'cat > /etc/modprobe.d/blacklist.conf << EOF
blacklist nouveau
options nouveau modeset=0
EOF'
# Regenerate the initramfs
sudo dracut --force
# Reboot the machine
sudo reboot
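After the reboot, re-run the check from above; if the blacklist took effect, the command prints nothing:
# No output means nouveau is disabled
lsmod | grep nouveau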
Install nvidia-smi
- Open https://www.nvidia.com/en-us/drivers/ in a browser. A proxy is needed for the Manual Driver Search form to load.
- Fill in the driver information
- Driver search results
- Driver download page
- Download the driver file, NVIDIA-Linux-x86_64-570.133.20.run, and upload it to the Linux machine. Note that you cannot copy the download URL and fetch it on the machine with wget directly; such requests return HTTP 403.
- Install the driver. It must be installed as root. Useful flags (full command shown below):
- --no-x-check # skip the X server check while installing the driver
- --no-nouveau-check # skip the nouveau check while installing the driver
- --no-opengl-files # install only the driver files, not the OpenGL files
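The original post does not show the full command line; a typical invocation combining these flags would be (note the double dashes):
# Must be run as root, using the .run file downloaded above
sudo sh NVIDIA-Linux-x86_64-570.133.20.run --no-x-check --no-nouveau-check --no-opengl-files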
- During installation a kernel module type prompt appears; following the hint, select NVIDIA Proprietary with the left/right arrow keys, then press Enter:
Multiple kernel module types are available for this system. Which would you like to use?
NVIDIA Proprietary    MIT/GPL
- After installation, run the following command to confirm the driver was installed successfully:
nvidia-smi
The GPU information looks like this:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.133.20 Driver Version: 570.133.20 CUDA Version: 12.8 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 Tesla T4 Off | 00000000:3D:00.0 Off | 0 |
| N/A 51C P0 27W / 70W | 0MiB / 15360MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 Tesla T4 Off | 00000000:3E:00.0 Off | 0 |
| N/A 54C P0 28W / 70W | 0MiB / 15360MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 2 Tesla T4 Off | 00000000:40:00.0 Off | 0 |
| N/A 51C P0 27W / 70W | 0MiB / 15360MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 3 Tesla T4 Off | 00000000:41:00.0 Off | 0 |
| N/A 48C P0 26W / 70W | 0MiB / 15360MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 4 Tesla T4 Off | 00000000:B1:00.0 Off | 0 |
| N/A 47C P0 26W / 70W | 0MiB / 15360MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 5 Tesla T4 Off | 00000000:B2:00.0 Off | 0 |
| N/A 53C P0 30W / 70W | 0MiB / 15360MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 6 Tesla T4 Off | 00000000:B4:00.0 Off | 0 |
| N/A 56C P0 30W / 70W | 0MiB / 15360MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 7 Tesla T4 Off | 00000000:B5:00.0 Off | 0 |
| N/A 55C P0 30W / 70W | 0MiB / 15360MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
Install Git
Install git and git-lfs:
sudo dnf install git git-lfs
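To confirm both tools are available on PATH:
# Print the installed versions
git --version
git lfs version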
Install Anaconda (conda)
- Download page: https://www.anaconda.com/download/success
Download the 64-Bit (x86) Installer:
wget https://repo.anaconda.com/archive/Anaconda3-2024.10-1-Linux-x86_64.sh
- Install:
sudo sh Anaconda3-2024.10-1-Linux-x86_64.sh
- The installer prints a lot of output; answer yes to the prompts, and press q to skip through the license text:
Do you accept the license terms? [yes|no]
>>> yes

Anaconda3 will now be installed into this location:
/root/anaconda3
  - Press ENTER to confirm the location
  - Press CTRL-C to abort the installation
  - Or specify a different location below

[/root/anaconda3] >>> /data/ProgramFiles/anaconda3

You can undo this by running `conda init --reverse $SHELL`? [yes|no]
[no] >>> yes
- Set environment variables:
cat >> ~/.bash_profile << EOF
export ANACONDA3_HOME=/data/ProgramFiles/anaconda3
export CONDA_ENVS_PATH=\$ANACONDA3_HOME/envs
export PATH="\$ANACONDA3_HOME/bin:\$PATH"
EOF
source ~/.bash_profile
# The directory was created as root; grant ownership to the working user
sudo chown -R tkyj.tkyj /data
- Check the conda version to verify the installation:
conda -V
- Configure mirror channels:
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
conda config --set show_channel_urls yes
Download the DeepSeek-R1-Distill-Qwen-7B model
- ModelScope: https://modelscope.cn/models/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
mkdir -pv /data/llm/models
cd /data/llm/models
# To skip downloading the large LFS files during the clone, use:
GIT_LFS_SKIP_SMUDGE=1 git clone https://www.modelscope.cn/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B.git
# Make sure git-lfs is installed correctly
cd DeepSeek-R1-Distill-Qwen-7B
git lfs install
# Download the large files
git lfs pull
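A quick way to confirm the weights actually downloaded: with GIT_LFS_SKIP_SMUDGE=1, a missed git lfs pull leaves KB-sized pointer stubs instead of multi-GB shards.
# Shard files should be several GB each, not tiny pointer files
ls -lh *.safetensors
git lfs ls-files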
Install LLaMA-Factory
- Software requirements
Mandatory | Minimum | Recommend |
---|---|---|
python | 3.9 | 3.10 |
torch | 2.0.0 | 2.6.0 |
transformers | 4.45.0 | 4.50.0 |
datasets | 2.16.0 | 3.2.0 |
accelerate | 0.34.0 | 1.2.1 |
peft | 0.14.0 | 0.15.1 |
trl | 0.8.6 | 0.9.6 |
Optional | Minimum | Recommend |
---|---|---|
CUDA | 11.6 | 12.2 |
deepspeed | 0.10.0 | 0.16.4 |
bitsandbytes | 0.39.0 | 0.43.1 |
vllm | 0.4.3 | 0.8.2 |
flash-attn | 2.5.6 | 2.7.2 |
- Hardware requirements
Method | Bits | 7B | 14B | 30B | 70B | x B |
---|---|---|---|---|---|---|
Full (bf16 or fp16) | 32 | 120GB | 240GB | 600GB | 1200GB | 18x GB |
Full (pure_bf16) | 16 | 60GB | 120GB | 300GB | 600GB | 8x GB |
Freeze/LoRA/GaLore/APOLLO/BAdam | 16 | 16GB | 32GB | 64GB | 160GB | 2x GB |
QLoRA | 8 | 10GB | 20GB | 40GB | 80GB | x GB |
QLoRA | 4 | 6GB | 12GB | 24GB | 48GB | x/2 GB |
QLoRA | 2 | 4GB | 8GB | 16GB | 24GB | x/4 GB |
Download LLaMA-Factory
- Clone the project with git:
cd /data/ProgramFiles
git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
Install LLaMA-Factory dependencies
cd LLaMA-Factory
conda create --name llama_factory python=3.10
conda activate llama_factory
# Install the torch build matching CUDA 12.8 from the PyTorch index
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
pip install -e ".[torch,metrics]" -i https://pypi.tuna.tsinghua.edu.cn/simple
# Verify the installation
llamafactory-cli version
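A quick check (run inside the llama_factory environment) that torch was built with CUDA support and sees all eight T4s:
# Expect: a cu128 torch version, True, and 8
python -c "import torch; print(torch.__version__, torch.cuda.is_available(), torch.cuda.device_count())"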
Update environment variables
To automatically switch to the right Python environment on every terminal login:
cat >> ~/.bash_profile << EOF
conda activate llama_factory
EOF
source ~/.bash_profile
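If the login shell complains that conda is not initialized when it reaches `conda activate`, sourcing conda's profile script first fixes it; a sketch assuming the install location chosen above:
# Add this line before 'conda activate llama_factory' in ~/.bash_profile if needed
source /data/ProgramFiles/anaconda3/etc/profile.d/conda.sh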
Install deepspeed
Fine-tuning a 7B model on a single 16GB card is tight; deepspeed can help:
# List available cuda versions
conda search cuda -c nvidia
# Install the version matching nvidia-smi (installing cuda is a prerequisite for installing deepspeed)
conda install cuda=12.8.1 -c nvidia
DS_BUILD_CPU_ADAM=1 pip3 install deepspeed==0.16.7
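DeepSpeed ships a diagnostic command that reports which ops were built (cpu_adam should show as installed). Note that the training YAML below does not itself enable DeepSpeed; in LLaMA-Factory it is enabled by pointing a `deepspeed:` key at a JSON config, such as the samples shipped under examples/deepspeed/ in the repo.
# Show DeepSpeed's environment report, including op build status
ds_report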
Prepare the Alpaca dataset
Example of the alpaca format:
[
  {
    "instruction": "Invent a new word by combining two existing words.",
    "input": "",
    "output": "\"Chillax\" - a combination of the words \"chill\" and \"relax\". It means to calm down and unwind, enjoying oneself in a laidback manner."
  },
  {
    "instruction": "Transform the following sentence using a synonym: The car sped quickly.",
    "input": "",
    "output": "The car accelerated rapidly."
  }
]
- Edit LLaMA-Factory/data/dataset_info.json and add the following entry:
"cmic_financial_apaca": {
  "file_name": "/data/llm/dataset/cmic_financial_apaca.json",
  "columns": {
    "prompt": "instruction",
    "query": "input",
    "response": "output",
    "system": "system"
  }
}
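Before training, it is worth sanity-checking that the dataset file at the declared path is valid JSON and has the expected fields:
# Prints the record count and the keys of the first record
python -c "import json; data = json.load(open('/data/llm/dataset/cmic_financial_apaca.json')); print(len(data), 'records'); print(data[0].keys())"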
Prepare the LoRA config file
- Copy the original file to create a new config:
cd LLaMA-Factory/examples/train_lora
# Make a new copy from the original file
cp llama3_lora_sft.yaml ds_qwen7b_lora_sft.yaml
vi ds_qwen7b_lora_sft.yaml
- Edit ds_qwen7b_lora_sft.yaml; the main fields to change are model_name_or_path, dataset, template, cutoff_len, max_samples, and output_dir.
- Parameters worth paying attention to:
- model_name_or_path: path to the model
- dataset: dataset name, matching the cmic_financial_apaca entry declared above
- template: prompt template
- cutoff_len: maximum length of the input sequence
- output_dir: where the fine-tuned weights are saved
- gradient_accumulation_steps: number of gradient accumulation steps; reduce this value when GPU resources are insufficient
- num_train_epochs: number of training epochs
The full content of ds_qwen7b_lora_sft.yaml is as follows:
### model
model_name_or_path: /data/llm/models/DeepSeek-R1-Distill-Qwen-7B
trust_remote_code: true

### method
stage: sft
do_train: true
finetuning_type: lora
lora_rank: 8
lora_target: all

### dataset
dataset: cmic_financial_apaca
template: deepseek3
cutoff_len: 4096
max_samples: 4019
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4

### output
output_dir: /data/llm/models/sft/DeepSeek-R1-Distill-Qwen-7B
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true
save_only_model: false
report_to: none # choices: [none, wandb, tensorboard, swanlab, mlflow]

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 1.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
resume_from_checkpoint: null

### eval
#eval_dataset: alpaca_en_demo
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500
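With all 8 GPUs, this config gives an effective global batch size of per_device_train_batch_size × gradient_accumulation_steps × number of GPUs = 1 × 8 × 8 = 64.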
Set the number of GPUs
For multi-GPU parallel training, set the GPU count by editing LLaMA-Factory/examples/accelerate/fsdp_config.yaml:
num_processes: 8 # the number of GPUs in all nodes
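As an aside (not from the original post), the set of GPUs used can also be limited at launch time with CUDA_VISIBLE_DEVICES instead of editing the config:
# Example: run on the first four cards only
CUDA_VISIBLE_DEVICES=0,1,2,3 llamafactory-cli train /data/ProgramFiles/LLaMA-Factory/examples/train_lora/ds_qwen7b_lora_sft.yaml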
Start fine-tuning
conda activate llama_factory
# Run in the background
nohup llamafactory-cli train /data/ProgramFiles/LLaMA-Factory/examples/train_lora/ds_qwen7b_lora_sft.yaml > nohup.log 2>&1 &
# Follow the log
tail -fn200 nohup.log
Watch GPU usage during fine-tuning
Run: watch -n 0.5 nvidia-smi
Every 0.5s: nvidia-smi    localhost.localdomain: Mon Apr 28 15:33:10 2025

Mon Apr 28 15:33:14 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.133.20 Driver Version: 570.133.20 CUDA Version: 12.8 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 Tesla T4 Off | 00000000:3D:00.0 Off | 0 |
| N/A 39C P8 19W / 70W | 0MiB / 15360MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 Tesla T4 Off | 00000000:3E:00.0 Off | 0 |
| N/A 40C P0 26W / 70W | 0MiB / 15360MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 2 Tesla T4 Off | 00000000:40:00.0 Off | 0 |
| N/A 37C P0 26W / 70W | 0MiB / 15360MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 3 Tesla T4 Off | 00000000:41:00.0 Off | 0 |
| N/A 37C P0 25W / 70W | 0MiB / 15360MiB | 4% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 4 Tesla T4 Off | 00000000:B1:00.0 Off | 0 |
| N/A 36C P8 13W / 70W | 0MiB / 15360MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 5 Tesla T4 Off | 00000000:B2:00.0 Off | 0 |
| N/A 40C P8 15W / 70W | 0MiB / 15360MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 6 Tesla T4 Off | 00000000:B4:00.0 Off | 0 |
| N/A 42C P8 15W / 70W | 0MiB / 15360MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 7 Tesla T4 Off | 00000000:B5:00.0 Off | 0 |
| N/A 41C P8 11W / 70W | 0MiB / 15360MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
Start the Web UI service
# Stop the firewall so the web service port is reachable
sudo systemctl stop firewalld
GRADIO_SHARE=1 nohup llamafactory-cli webui > webui.log 2>&1 &
tail -fn200 webui.log
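Instead of stopping the firewall entirely, a less invasive option is to open only the Web UI port (Gradio defaults to 7860; adjust if your deployment differs):
# Open the Gradio port and reload the firewall rules
sudo firewall-cmd --permanent --add-port=7860/tcp
sudo firewall-cmd --reload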
Chat
llamafactory-cli chat /data/ProgramFiles/LLaMA-Factory/examples/inference/ds_qwen7b_lora_sft.yaml
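The inference config referenced above is not shown in the post; a minimal sketch, modeled on LLaMA-Factory's examples/inference/llama3_lora_sft.yaml and reusing the paths from this walkthrough (the exact contents are an assumption):
# Create the inference config: base model plus the fine-tuned LoRA adapter
cat > /data/ProgramFiles/LLaMA-Factory/examples/inference/ds_qwen7b_lora_sft.yaml << 'EOF'
model_name_or_path: /data/llm/models/DeepSeek-R1-Distill-Qwen-7B
adapter_name_or_path: /data/llm/models/sft/DeepSeek-R1-Distill-Qwen-7B
template: deepseek3
infer_backend: huggingface
trust_remote_code: true
EOF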
Loss curve
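Since the training config sets plot_loss: true, LLaMA-Factory saves the loss curve as an image in output_dir (typically named training_loss.png):
# The loss plot is written alongside the fine-tuned weights
ls /data/llm/models/sft/DeepSeek-R1-Distill-Qwen-7B/training_loss.png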