Internally referred to as Tuning-Factory.
Parameter documentation: https://llamafactory.readthedocs.io/zh-cn/latest/index.html
Advanced techniques, e.g. acceleration: https://llamafactory.readthedocs.io/zh-cn/latest/advanced/acceleration.html
0. Environment
conda env list
conda remove --name llm --all
conda create -n llm python=3.10
(do not use Python 3.11; see the recommended versions in readme.md)
conda activate llm
cd LLaMA-Factory
pip install -e ".[torch,metrics]" -i https://pypi.tuna.tsinghua.edu.cn/simple --no-build-isolation
Success.
(
You can also pin a specific tag, e.g.:
git clone --branch 0.7.1 --depth 1 https://github.com/username/repository.git
pip install llamafactory[metrics]==0.7.1
or download the source archive for a given tag directly with wget.
Note that a plain git clone fetches whatever the latest dev version is at the time, e.g. 0.9.4.dev0.
)
llamafactory-cli version (takes a while)
Errors hit in earlier attempts:
llamafactory-cli version failed with ImportError: cannot import name 'logging' from 'huggingface_hub'
from transformers import AutoTokenizer, AutoModelForCausalLM failed
1. SFT
Downloading the base model
Several options:
- ModelScope model hub: git clone https://www.modelscope.cn/Qwen/Qwen2.5-0.5B-Instruct.git
- Hugging Face
CLI fine-tuning
Change into the LLaMA-Factory repository directory.
For a custom dataset, move the dataset JSON file into the data directory and register it in data/dataset_info.json by adding a key (dataset name) whose value points to the file, e.g.:
"identity_xuefeng": {"file_name": "identity_xuefeng.json"},
Copy the llama3_lora_sft_awq.yaml file provided under examples/train_qlora, rename it, and edit its contents (a sketch of the resulting file follows this list):
- model_name_or_path: absolute path of the base model downloaded earlier
- template: llama3 or qwen
- dataset: the key name registered in data/dataset_info.json
- output_dir: a relative path under the repository, e.g. saves/Qwen2.5-0.5B-Instruct/lora/sft
- num_train_epochs
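A minimal sketch of what the edited training config could look like; the file name matches the train command below, while the base-model path and the hyperparameter values are placeholders for this walkthrough, not values taken from the repository:
# examples/train_qlora/xuefeng_qwen_lora_sft_awq.yaml (sketch; adjust paths and values)
model_name_or_path: /abs/path/to/Qwen2.5-0.5B-Instruct  # placeholder
trust_remote_code: true
stage: sft
do_train: true
finetuning_type: lora
lora_target: all
dataset: identity_xuefeng
template: qwen
cutoff_len: 1024
output_dir: saves/Qwen2.5-0.5B-Instruct/lora/sft
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0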
llamafactory-cli help
Usage:
  llamafactory-cli api -h: launch an OpenAI-style API server
  llamafactory-cli chat -h: launch a chat interface in CLI
  llamafactory-cli eval -h: evaluate models
  llamafactory-cli export -h: merge LoRA adapters and export model
  llamafactory-cli train -h: train models
  llamafactory-cli webchat -h: launch a chat interface in Web UI
  llamafactory-cli webui: launch LlamaBoard
  llamafactory-cli version: show version info
llamafactory-cli train examples/train_qlora/xuefeng_qwen_lora_sft_awq.yaml
llamafactory-cli version
FORCE_TORCHRUN=1 CUDA_VISIBLE_DEVICES=0 llamafactory-cli train examples/train_qlora/xuefeng_qwen_lora_sft_awq.yaml
Web UI fine-tuning
llamafactory-cli webui
export USE_MODELSCOPE_HUB=1 && llamafactory-cli webui
CUDA_VISIBLE_DEVICES=0 USE_MODELSCOPE_HUB=1 llamafactory-cli webui
2. Inference
Direct inference with the base model
/home/jinxuefeng.jxf/xuefeng/LLaMA-Factory/examples/inference/llama3_lora_sft.yaml
Copy the llama3_lora_sft.yaml file provided under examples/inference, rename it, and edit its contents (a consolidated sketch follows this list):
- model_name_or_path: absolute path of the base model downloaded earlier
- template: llama3 or qwen
- adapter_name_or_path: saves/Qwen2.5-0.5B-Instruct/lora/sft (the SFT output path; comment it out to use the base model only)
- infer_backend: huggingface # choices: [huggingface, vllm, sglang]
- trust_remote_code: true
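Put together, the inference config could look like the following sketch (the base-model path is a placeholder):
# examples/inference/xuefeng_qwen_lora_sft.yaml (sketch)
model_name_or_path: /abs/path/to/Qwen2.5-0.5B-Instruct  # placeholder
adapter_name_or_path: saves/Qwen2.5-0.5B-Instruct/lora/sft  # comment out to chat with the base model only
template: qwen  # or llama3
infer_backend: huggingface  # choices: [huggingface, vllm, sglang]
trust_remote_code: true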
llamafactory-cli chat examples/inference/xuefeng_qwen_lora_sft.yaml
Error 1:
MaxRetryError: HTTPSConnectionPool(host='huggingface.co', port=443)
Fix: vim ~/.bashrc and add
export HF_ENDPOINT=https://hf-mirror.com
source ~/.bashrc
conda activate llm
Then an error about LOCAL_RANK, which took quite a while to resolve.
export LOCAL_RANK=0
FORCE_TORCHRUN=1 CUDA_VISIBLE_DEVICES=0 llamafactory-cli chat examples/inference/xuefeng_qwen_lora_sft.yaml
Fix: transformers 4.52.4, as installed by pip install -e ., raises the LOCAL_RANK error above; downgrade to transformers 4.51.3 (pip install transformers==4.51.3), even though requirements.txt allows the whole range:
transformers>=4.49.0,<=4.52.4,!=4.52.0; sys_platform != 'darwin'
transformers>=4.49.0,<=4.51.3,!=4.52.0; sys_platform == 'darwin'
Downgrading transformers alone is not enough, it still errors; the command itself also needs changing. The final working command is:
LOCAL_RANK=0 MASTER_ADDR=127.0.0.1 MASTER_PORT=12346 WORLD_SIZE=1 RANK=0 CUDA_VISIBLE_DEVICES=0 llamafactory-cli chat examples/inference/xuefeng_qwen_lora_sft.yaml
In fact, the minimal working invocation is:
LOCAL_RANK=0 MASTER_ADDR=127.0.0.1 MASTER_PORT=12345 llamafactory-cli chat examples/inference/xuefeng_qwen_lora_sft.yaml
Alternatively, try exporting them:
export LOCAL_RANK=0
export MASTER_ADDR=127.0.0.1
At this point, the path of downloading a model and running direct inference (without SFT) works end to end.
Inference after SFT
However, inference after (Q)LoRA SFT still fails:
RuntimeError: aten.add_.Tensor: got mixed torch.Tensor and DTensor, need to convert all torch.Tensor to DTensor before calling distributed operators!
regardless of whether finetuning_type: lora is present or commented out in the inference config.
Full traceback:
[INFO|2025-07-12 18:58:51] llamafactory.model.model_utils.attention:143 >> Using torch SDPA for faster training and inference.
[rank0]: Traceback (most recent call last):
[rank0]: File "/home/jinxuefeng.jxf/.conda/envs/llm/bin/llamafactory-cli", line 8, in <module>
[rank0]: sys.exit(main())
[rank0]: File "/home/jinxuefeng.jxf/xuefeng/LLaMA-Factory/src/llamafactory/cli.py", line 151, in main
[rank0]: COMMAND_MAP[command]()
[rank0]: File "/home/jinxuefeng.jxf/xuefeng/LLaMA-Factory/src/llamafactory/chat/chat_model.py", line 154, in run_chat
[rank0]: chat_model = ChatModel()
[rank0]: File "/home/jinxuefeng.jxf/xuefeng/LLaMA-Factory/src/llamafactory/chat/chat_model.py", line 53, in __init__
[rank0]: self.engine: BaseEngine = HuggingfaceEngine(model_args, data_args, finetuning_args, generating_args)
[rank0]: File "/home/jinxuefeng.jxf/xuefeng/LLaMA-Factory/src/llamafactory/chat/hf_engine.py", line 59, in __init__
[rank0]: self.model = load_model(
[rank0]: File "/home/jinxuefeng.jxf/xuefeng/LLaMA-Factory/src/llamafactory/model/loader.py", line 184, in load_model
[rank0]: model = init_adapter(config, model, model_args, finetuning_args, is_trainable)
[rank0]: File "/home/jinxuefeng.jxf/xuefeng/LLaMA-Factory/src/llamafactory/model/adapter.py", line 300, in init_adapter
[rank0]: model = _setup_lora_tuning(
[rank0]: File "/home/jinxuefeng.jxf/xuefeng/LLaMA-Factory/src/llamafactory/model/adapter.py", line 184, in _setup_lora_tuning
[rank0]: model = model.merge_and_unload()
[rank0]: File "/home/jinxuefeng.jxf/.conda/envs/llm/lib/python3.10/site-packages/peft/tuners/lora/model.py", line 900, in merge_and_unload
[rank0]: return self._unload_and_optionally_merge(
[rank0]: File "/home/jinxuefeng.jxf/.conda/envs/llm/lib/python3.10/site-packages/peft/tuners/lora/model.py", line 531, in _unload_and_optionally_merge
[rank0]: target.merge(safe_merge=safe_merge, adapter_names=adapter_names)
[rank0]: File "/home/jinxuefeng.jxf/.conda/envs/llm/lib/python3.10/site-packages/peft/tuners/lora/layer.py", line 617, in merge
[rank0]: base_layer.weight.data += delta_weight
[rank0]: File "/home/jinxuefeng.jxf/.conda/envs/llm/lib/python3.10/site-packages/torch/_compile.py", line 51, in inner
[rank0]: return disable_fn(*args, **kwargs)
[rank0]: File "/home/jinxuefeng.jxf/.conda/envs/llm/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 838, in _fn
[rank0]: return fn(*args, **kwargs)
[rank0]: File "/home/jinxuefeng.jxf/.conda/envs/llm/lib/python3.10/site-packages/torch/distributed/tensor/_api.py", line 344, in __torch_dispatch__
[rank0]: return DTensor._op_dispatcher.dispatch(
[rank0]: File "/home/jinxuefeng.jxf/.conda/envs/llm/lib/python3.10/site-packages/torch/distributed/tensor/_dispatch.py", line 167, in dispatch
[rank0]: op_info = self.unwrap_to_op_info(op_call, args, kwargs)
[rank0]: File "/home/jinxuefeng.jxf/.conda/envs/llm/lib/python3.10/site-packages/torch/distributed/tensor/_dispatch.py", line 366, in unwrap_to_op_info
[rank0]: self._try_replicate_spec_for_scalar_tensor(
[rank0]: File "/home/jinxuefeng.jxf/.conda/envs/llm/lib/python3.10/site-packages/torch/distributed/tensor/_dispatch.py", line 468, in _try_replicate_spec_for_scalar_tensor
[rank0]: raise RuntimeError(
[rank0]: RuntimeError: aten.add_.Tensor: got mixed torch.Tensor and DTensor, need to convert all torch.Tensor to DTensor before calling distributed operators!
[rank0]:[W712 18:58:52.262922106 ProcessGroupNCCL.cpp:1479] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
Workaround: run export (merge) first, then infer; this works!! (The SFT train command before merging works with or without LOCAL_RANK=0 MASTER_ADDR=127.0.0.1 MASTER_PORT=12345 set.)
merge_lora.yaml contents (see also the export-step sketch after this listing):
model_name_or_path: merge/Qwen2.5-0.5B-Instruct/identity_xuefeng
# model_name_or_path: /home/jinxuefeng.jxf/xuefeng/LLaMA-Factory/offline_models/qwen/Qwen2.5-0.5B-Instruct
# model_name_or_path: /home/jinxuefeng.jxf/xuefeng/LLaMA-Factory/offline_models/llama/Llama-3.2-1B-Instruct
# adapter_name_or_path: saves/Qwen2.5-0.5B-Instruct/lora/sft
template: qwen # llama3
infer_backend: huggingface # choices: [huggingface, vllm, sglang]
trust_remote_code: true
# finetuning_type: lora
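For the export (merge) step itself, a minimal sketch following the layout of examples/merge_lora/llama3_lora_sft.yaml; the base-model path and export_dir below are placeholders from this walkthrough, not verified values:
# examples/merge_lora/xuefeng_qwen_merge_lora.yaml (sketch; hypothetical file name)
model_name_or_path: /abs/path/to/Qwen2.5-0.5B-Instruct  # placeholder: base model path
adapter_name_or_path: saves/Qwen2.5-0.5B-Instruct/lora/sft
template: qwen
finetuning_type: lora
trust_remote_code: true
export_dir: merge/Qwen2.5-0.5B-Instruct/identity_xuefeng
export_size: 2
export_device: cpu
export_legacy_format: false
Run it with llamafactory-cli export examples/merge_lora/xuefeng_qwen_merge_lora.yaml, then point the chat config's model_name_or_path at export_dir, as in the listing above.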
3. Fine-tuning datasets
LLM fine-tuning datasets come in two formats: sharegpt and alpaca.
alpaca format
General instruction fine-tuning:
{
  "instruction": "將以下中文翻譯成英文",
  "input": "今天的天氣非常好",
  "output": "The weather is very nice today."
}
instruction: the explicit task instruction (required)
input: the task input (may be empty)
output: the expected output (required)
sharegpt format
Multi-turn dialogue:
{"id": "chatcmpl-7F6Wr8JQ6JgB","conversations": [{"from": "human", "value": "Python里如何快速排序列表?"},{"from": "gpt", "value": "可以使用sorted()函數..."},{"from": "human", "value": "時間復雜度是多少?"}]
}
def alpaca_to_sharegpt(alpaca_data):
    # Convert one alpaca-format record into a sharegpt-format record:
    # instruction + input become the human turn, output becomes the gpt turn.
    return {
        "conversations": [
            {"from": "human", "value": f"{alpaca_data['instruction']}\n{alpaca_data['input']}"},
            {"from": "gpt", "value": alpaca_data["output"]},
        ]
    }
Easy Dataset
A tool for building LLM fine-tuning datasets.
Processes raw text and generates datasets in alpaca or sharegpt format.
https://github.com/ConardLi/easy-dataset
Takes arbitrary text as input and automatically generates questions and the corresponding answers.
Workflow: document processing, question generation, answer construction, tag management, format export.
git clone https://www.modelscope.cn/datasets/xiaofengalg/Chinese-medical-dialogue.git
From the ModelScope dataset page, download the data into LLaMA-Factory/data/xxx.json
"custom_sft_train_data":{
"file_name":"Chinese-medical-dialogue/data/train_0001_of_0001.json",
"columns":{
"prompt":"instruction",
"query":"input",
"response":"output"}
},
Write an entry matching the dataset's format into LLaMA-Factory/data/dataset_info.json.
If the dataset is already in sharegpt format:
"data_name":{
"file_name":"xx/xxx/xx.json",
"formatting": "sharegpt"
},
4. Directory structure and model checkpoint layout
Repository layout
The LLaMA-Factory project directory structure. Below is a brief introduction to the more important files and folders, to give an overview of the overall framework:
- Folders
  - assets
    - Purpose: typically holds the project's static resources such as images, stylesheets, and JavaScript files.
    - Note: these resources are mainly used by the front end or user interface.
  - data
    - Purpose: holds datasets, configuration files, and other data-related files. (Downloaded fine-tuning datasets go here.)
    - Note: these files may include training data, test data, or model configuration info.
  - docker
    - Purpose: Docker-related configuration files and scripts for containerized deployment.
    - Note: these files help automate deployment and keep environments consistent.
  - evaluation
    - Purpose: scripts and tools for evaluating model performance.
    - Note: these scripts measure model accuracy and other metrics.
  - examples
    - Purpose: example code and use cases to help users get started quickly. (The config files for fine-tuning and training live here.)
    - Note: these examples show how to use the project's features.
  - scripts
    - Purpose: assorted scripts for automation and auxiliary tasks.
    - Note: these may include data preprocessing, model training, and similar tasks.
  - src
    - Purpose: the project's source code.
    - Note: this is where the core code lives.
  - tests
    - Purpose: test code that verifies the project's functionality.
    - Note: these tests help ensure code quality and stability.
Output files after fine-tuning/training
- model
  - config.json: model configuration file, containing the architecture, parameters, etc.
  - generation_config.json: generation-time configuration
  - merges.txt: the tokenizer's merge-rules file, used to combine subwords into full tokens
  - model.safetensors: model weights in the safetensors binary format; large models may be split into multiple shard files
  - optimizer.pt (largest file): presumably the optimizer state
  - scheduler.pt
  - tokenizer_config.json
  - tokenizer.json
  - vocab.json: the vocabulary