【GPT Primer】Lesson 43: Fine-Tuning Llama3 with LLaMA-Factory

  • 1. Environment Preparation
  • 2. Download the Base Model
  • 3. Deploy and Launch LLaMA-Factory
  • 4. Retraining

1. Environment Preparation

Rent an AutoDL server with a single RTX 3090 (24 GB of GPU memory); it costs roughly ¥2 per hour.
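Once the instance is up, it is worth confirming that the GPU is actually visible before starting any training run. A minimal sketch, assuming PyTorch is already installed in the AutoDL image:

```python
# Quick GPU sanity check (assumes PyTorch is installed in the image).
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    # Expect something like "NVIDIA GeForce RTX 3090" with roughly 24 GiB of memory.
    print(f"GPU: {props.name}, memory: {props.total_memory / 1024**3:.1f} GiB")
else:
    print("No CUDA device visible -- check the instance type and drivers.")
```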

2. Download the Base Model

Download the base model:

source /etc/network_turbo
pip install -U huggingface_hub
export HF_ENDPOINT=https://hf-mirror.com
huggingface-cli download --resume-download shenzhi-wang/Llama3-8B-Chinese-Chat --local-dir /root/autodl-tmp/models/Llama3-8B-Chinese-Chat

Notes:
Configure the Hugging Face mirror for mainland China:
export HF_ENDPOINT=https://hf-mirror.com
Enable AutoDL's academic network acceleration:
source /etc/network_turbo
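If you prefer to drive the download from Python rather than the CLI, the same huggingface_hub package exposes snapshot_download. A minimal sketch, assuming HF_ENDPOINT has already been exported to the mirror as above:

```python
# Download the base model with the Python API of huggingface_hub.
# Assumes HF_ENDPOINT is already set to https://hf-mirror.com in the environment.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="shenzhi-wang/Llama3-8B-Chinese-Chat",
    local_dir="/root/autodl-tmp/models/Llama3-8B-Chinese-Chat",
)
print(f"Model files downloaded to: {local_path}")
```

Recent huggingface_hub releases resume interrupted downloads by default, matching the --resume-download flag used with the CLI above.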

3. Deploy and Launch LLaMA-Factory

  • Official repository:
    https://github.com/hiyouga/LLaMA-Factory

Download LLaMA-Factory onto the data disk:

cd /root/autodl-tmp
  • Clone the repository
git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
pip install -e .
  • Launch the web UI
llamafactory-cli webui
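On a remote AutoDL instance the web UI is not directly reachable from your local browser, so it helps to first confirm that it is serving at all from the same machine. A minimal sketch, assuming the UI listens on Gradio's default port 7860 (the port is an assumption; check the startup output of llamafactory-cli webui):

```python
# Probe the LLaMA-Factory web UI from the same machine.
# Port 7860 (Gradio's default) is an assumption -- confirm it against the webui startup output.
from urllib.request import urlopen

with urlopen("http://127.0.0.1:7860", timeout=5) as resp:
    print(f"Web UI responded with HTTP status {resp.status}")
```

To open the UI in a local browser, you would then forward that port, for example over an SSH tunnel.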

Configure the model-related parameters in the web UI (screenshot omitted).

The web UI can generate the corresponding training command; click Start to begin training.

Training loss: (screenshot omitted)
Training results: (screenshot omitted)
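The per-step loss values also appear in the console log (see the excerpts in the next section), so the curve can be re-plotted outside the web UI. A minimal sketch that parses the loss/epoch pairs out of a saved console log; the file name train_console.log is a hypothetical placeholder for wherever you captured that output:

```python
# Re-plot the training loss from a captured console log.
# "train_console.log" is a hypothetical file containing the "[INFO|...] ... >> {'loss': ...}" lines.
import ast
import re

import matplotlib.pyplot as plt

pattern = re.compile(r">> (\{'loss'.*\})")  # the metrics dict printed at each logging step
epochs, losses = [], []

with open("train_console.log", encoding="utf-8") as f:
    for line in f:
        match = pattern.search(line)
        if match:
            record = ast.literal_eval(match.group(1))  # e.g. {'loss': 1.2718, 'epoch': 0.17, ...}
            epochs.append(record["epoch"])
            losses.append(record["loss"])

plt.plot(epochs, losses, marker="o")
plt.xlabel("epoch")
plt.ylabel("training loss")
plt.title("LoRA fine-tuning loss")
plt.savefig("training_loss.png")
```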

4. Retraining

Modify the configuration and train again (screenshot omitted).

  • Monitor resource usage with nvitop: during training, both system memory and the GPU are fully utilized.
  • Training process record.
  • Training results (console log):
null,"rope_theta": 500000.0,"tie_word_embeddings": false,"torch_dtype": "bfloat16","transformers_version": "4.55.0","use_cache": true,"vocab_size": 128256
}[INFO|2025-08-10 17:41:53] logging.py:143 >> KV cache is disabled during training.
[INFO|2025-08-10 17:41:53] modeling_utils.py:1305 >> loading weights file /root/autodl-tmp/models/Llama3-8B-Chinese-Chat1/model.safetensors.index.json
[INFO|2025-08-10 17:41:53] modeling_utils.py:2411 >> Instantiating LlamaForCausalLM model under default dtype torch.bfloat16.
[INFO|2025-08-10 17:41:53] configuration_utils.py:1098 >> Generate config GenerationConfig {"bos_token_id": 128000,"eos_token_id": 128009,"use_cache": false
}
[INFO|2025-08-10 17:41:55] modeling_utils.py:5606 >> All model checkpoint weights were used when initializing LlamaForCausalLM.
[INFO|2025-08-10 17:41:55] modeling_utils.py:5614 >> All the weights of LlamaForCausalLM were initialized from the model checkpoint at /root/autodl-tmp/models/Llama3-8B-Chinese-Chat1.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
[INFO|2025-08-10 17:41:55] configuration_utils.py:1051 >> loading configuration file /root/autodl-tmp/models/Llama3-8B-Chinese-Chat1/generation_config.json
[INFO|2025-08-10 17:41:55] configuration_utils.py:1098 >> Generate config GenerationConfig {"bos_token_id": 128000,"eos_token_id": 128009,"pad_token_id": 128009
}
[INFO|2025-08-10 17:41:55] logging.py:143 >> Gradient checkpointing enabled.
[INFO|2025-08-10 17:41:55] logging.py:143 >> Using torch SDPA for faster training and inference.
[INFO|2025-08-10 17:41:55] logging.py:143 >> Upcasting trainable params to float32.
[INFO|2025-08-10 17:41:55] logging.py:143 >> Fine-tuning method: LoRA
[INFO|2025-08-10 17:41:55] logging.py:143 >> Found linear modules: k_proj,gate_proj,o_proj,down_proj,v_proj,up_proj,q_proj
[INFO|2025-08-10 17:41:55] logging.py:143 >> trainable params: 20,971,520 || all params: 8,051,232,768 || trainable%: 0.2605
[INFO|2025-08-10 17:41:56] trainer.py:757 >> Using auto half precision backend
[INFO|2025-08-10 17:41:56] trainer.py:2433 >> ***** Running training *****
[INFO|2025-08-10 17:41:56] trainer.py:2434 >>   Num examples = 481
[INFO|2025-08-10 17:41:56] trainer.py:2435 >>   Num Epochs = 10
[INFO|2025-08-10 17:41:56] trainer.py:2436 >>   Instantaneous batch size per device = 2
[INFO|2025-08-10 17:41:56] trainer.py:2439 >>   Total train batch size (w. parallel, distributed & accumulation) = 16
[INFO|2025-08-10 17:41:56] trainer.py:2440 >>   Gradient Accumulation steps = 8
[INFO|2025-08-10 17:41:56] trainer.py:2441 >>   Total optimization steps = 310
[INFO|2025-08-10 17:41:56] trainer.py:2442 >>   Number of trainable parameters = 20,971,520
[INFO|2025-08-10 17:42:46] logging.py:143 >> {'loss': 1.2718, 'learning_rate': 4.9979e-05, 'epoch': 0.17, 'throughput': 973.44}
[INFO|2025-08-10 17:43:33] logging.py:143 >> {'loss': 1.1509, 'learning_rate': 4.9896e-05, 'epoch': 0.33, 'throughput': 977.09}
[INFO|2025-08-10 17:44:25] logging.py:143 >> {'loss': 1.1788, 'learning_rate': 4.9749e-05, 'epoch': 0.50, 'throughput': 973.40}
[INFO|2025-08-10 17:45:15] logging.py:143 >> {'loss': 1.1129, 'learning_rate': 4.9538e-05, 'epoch': 0.66, 'throughput': 972.88}
[INFO|2025-08-10 17:46:07] logging.py:143 >> {'loss': 1.0490, 'learning_rate': 4.9264e-05, 'epoch': 0.83, 'throughput': 973.34}
[INFO|2025-08-10 17:46:52] logging.py:143 >> {'loss': 1.0651, 'learning_rate': 4.8928e-05, 'epoch': 1.00, 'throughput': 970.40}
[INFO|2025-08-10 17:47:32] logging.py:143 >> {'loss': 1.0130, 'learning_rate': 4.8531e-05, 'epoch': 1.13, 'throughput': 968.83}
[INFO|2025-08-10 17:48:25] logging.py:143 >> {'loss': 1.0207, 'learning_rate': 4.8073e-05, 'epoch': 1.30, 'throughput': 968.68}
[INFO|2025-08-10 17:49:20] logging.py:143 >> {'loss': 0.9748, 'learning_rate': 4.7556e-05, 'epoch': 1.46, 'throughput': 968.56}
[INFO|2025-08-10 17:50:18] logging.py:143 >> {'loss': 0.9706, 'learning_rate': 4.6980e-05, 'epoch': 1.63, 'throughput': 969.27}
[INFO|2025-08-10 17:51:10] logging.py:143 >> {'loss': 0.8635, 'learning_rate': 4.6349e-05, 'epoch': 1.80, 'throughput': 968.72}
[INFO|2025-08-10 17:51:52] logging.py:143 >> {'loss': 0.8789, 'learning_rate': 4.5663e-05, 'epoch': 1.96, 'throughput': 967.37}
[INFO|2025-08-10 17:52:26] logging.py:143 >> {'loss': 0.7969, 'learning_rate': 4.4923e-05, 'epoch': 2.10, 'throughput': 966.36}
[INFO|2025-08-10 17:53:13] logging.py:143 >> {'loss': 0.8251, 'learning_rate': 4.4133e-05, 'epoch': 2.27, 'throughput': 965.43}
[INFO|2025-08-10 17:54:01] logging.py:143 >> {'loss': 0.8170, 'learning_rate': 4.3293e-05, 'epoch': 2.43, 'throughput': 965.72}
[INFO|2025-08-10 17:54:52] logging.py:143 >> {'loss': 0.7979, 'learning_rate': 4.2407e-05, 'epoch': 2.60, 'throughput': 965.57}
[INFO|2025-08-10 17:55:41] logging.py:143 >> {'loss': 0.8711, 'learning_rate': 4.1476e-05, 'epoch': 2.76, 'throughput': 966.39}
[INFO|2025-08-10 17:56:35] logging.py:143 >> {'loss': 0.8219, 'learning_rate': 4.0502e-05, 'epoch': 2.93, 'throughput': 966.59}
[INFO|2025-08-10 17:57:19] logging.py:143 >> {'loss': 0.8487, 'learning_rate': 3.9489e-05, 'epoch': 3.07, 'throughput': 967.22}
[INFO|2025-08-10 17:58:06] logging.py:143 >> {'loss': 0.6639, 'learning_rate': 3.8438e-05, 'epoch': 3.23, 'throughput': 967.08}
[INFO|2025-08-10 17:58:06] trainer.py:4408 >> 
***** Running Evaluation *****
[INFO|2025-08-10 17:58:06] trainer.py:4410 >>   Num examples = 10
[INFO|2025-08-10 17:58:06] trainer.py:4413 >>   Batch size = 2
[INFO|2025-08-10 17:58:08] trainer.py:4074 >> Saving model checkpoint to saves/Llama-3-8B-Instruct/lora/train_2025-08-10-17-35-45/checkpoint-100
[INFO|2025-08-10 17:58:08] configuration_utils.py:750 >> loading configuration file /root/autodl-tmp/models/Llama3-8B-Chinese-Chat1/config.json
[INFO|2025-08-10 17:58:08] configuration_utils.py:817 >> Model config LlamaConfig {"architectures": ["LlamaForCausalLM"],"attention_bias": false,"attention_dropout": 0.0,"bos_token_id": 128000,"eos_token_id": 128009,"head_dim": 128,"hidden_act": "silu","hidden_size": 4096,"initializer_range": 0.02,"intermediate_size": 14336,"max_position_embeddings": 8192,"mlp_bias": false,"model_type": "llama","num_attention_heads": 32,"num_hidden_layers": 32,"num_key_value_heads": 8,"pretraining_tp": 1,"rms_norm_eps": 1e-05,"rope_scaling": null,"rope_theta": 500000.0,"tie_word_embeddings": false,"torch_dtype": "bfloat16","transformers_version": "4.55.0","use_cache": true,"vocab_size": 128256
}
[INFO|2025-08-10 17:58:08] tokenization_utils_base.py:2393 >> chat template saved in saves/Llama-3-8B-Instruct/lora/train_2025-08-10-17-35-45/checkpoint-100/chat_template.jinja
[INFO|2025-08-10 17:58:08] tokenization_utils_base.py:2562 >> tokenizer config file saved in saves/Llama-3-8B-Instruct/lora/train_2025-08-10-17-35-45/checkpoint-100/tokenizer_config.json
[INFO|2025-08-10 17:58:08] tokenization_utils_base.py:2571 >> Special tokens file saved in saves/Llama-3-8B-Instruct/lora/train_2025-08-10-17-35-45/checkpoint-100/special_tokens_map.json
[INFO|2025-08-10 17:58:57] logging.py:143 >> {'loss': 0.7453, 'learning_rate': 3.7353e-05, 'epoch': 3.40, 'throughput': 964.85}
[INFO|2025-08-10 17:59:43] logging.py:143 >> {'loss': 0.7854, 'learning_rate': 3.6237e-05, 'epoch': 3.56, 'throughput': 964.81}
[INFO|2025-08-10 18:00:31] logging.py:143 >> {'loss': 0.7068, 'learning_rate': 3.5091e-05, 'epoch': 3.73, 'throughput': 964.43}
[INFO|2025-08-10 18:01:27] logging.py:143 >> {'loss': 0.7155, 'learning_rate': 3.3920e-05, 'epoch': 3.90, 'throughput': 964.82}
[INFO|2025-08-10 18:02:05] logging.py:143 >> {'loss': 0.6858, 'learning_rate': 3.2725e-05, 'epoch': 4.03, 'throughput': 964.73}
[INFO|2025-08-10 18:02:52] logging.py:143 >> {'loss': 0.6058, 'learning_rate': 3.1511e-05, 'epoch': 4.20, 'throughput': 964.82}
[INFO|2025-08-10 18:03:41] logging.py:143 >> {'loss': 0.6702, 'learning_rate': 3.0280e-05, 'epoch': 4.37, 'throughput': 964.40}
[INFO|2025-08-10 18:04:29] logging.py:143 >> {'loss': 0.6454, 'learning_rate': 2.9036e-05, 'epoch': 4.53, 'throughput': 964.05}
[INFO|2025-08-10 18:05:23] logging.py:143 >> {'loss': 0.6122, 'learning_rate': 2.7781e-05, 'epoch': 4.70, 'throughput': 964.45}
[INFO|2025-08-10 18:06:15] logging.py:143 >> {'loss': 0.6598, 'learning_rate': 2.6519e-05, 'epoch': 4.86, 'throughput': 964.33}
[INFO|2025-08-10 18:06:55] logging.py:143 >> {'loss': 0.5058, 'learning_rate': 2.5253e-05, 'epoch': 5.00, 'throughput': 964.34}
[INFO|2025-08-10 18:07:57] logging.py:143 >> {'loss': 0.6128, 'learning_rate': 2.3987e-05, 'epoch': 5.17, 'throughput': 964.73}
[INFO|2025-08-10 18:08:49] logging.py:143 >> {'loss': 0.5506, 'learning_rate': 2.2723e-05, 'epoch': 5.33, 'throughput': 965.04}
[INFO|2025-08-10 18:09:33] logging.py:143 >> {'loss': 0.4599, 'learning_rate': 2.1465e-05, 'epoch': 5.50, 'throughput': 964.54}
[INFO|2025-08-10 18:10:21] logging.py:143 >> {'loss': 0.5927, 'learning_rate': 2.0216e-05, 'epoch': 5.66, 'throughput': 964.48}
[INFO|2025-08-10 18:11:14] logging.py:143 >> {'loss': 0.5014, 'learning_rate': 1.8979e-05, 'epoch': 5.83, 'throughput': 964.93}
[INFO|2025-08-10 18:11:58] logging.py:143 >> {'loss': 0.5629, 'learning_rate': 1.7758e-05, 'epoch': 6.00, 'throughput': 965.11}
[INFO|2025-08-10 18:12:35] logging.py:143 >> {'loss': 0.3429, 'learning_rate': 1.6555e-05, 'epoch': 6.13, 'throughput': 964.97}
[INFO|2025-08-10 18:13:27] logging.py:143 >> {'loss': 0.5253, 'learning_rate': 1.5374e-05, 'epoch': 6.30, 'throughput': 965.11}
[INFO|2025-08-10 18:14:15] logging.py:143 >> {'loss': 0.4926, 'learning_rate': 1.4218e-05, 'epoch': 6.46, 'throughput': 965.22}
[INFO|2025-08-10 18:14:15] trainer.py:4408 >> 
***** Running Evaluation *****
[INFO|2025-08-10 18:14:15] trainer.py:4410 >>   Num examples = 10
[INFO|2025-08-10 18:14:15] trainer.py:4413 >>   Batch size = 2
[INFO|2025-08-10 18:14:18] trainer.py:4074 >> Saving model checkpoint to saves/Llama-3-8B-Instruct/lora/train_2025-08-10-17-35-45/checkpoint-200
[INFO|2025-08-10 18:14:18] configuration_utils.py:750 >> loading configuration file /root/autodl-tmp/models/Llama3-8B-Chinese-Chat1/config.json
[INFO|2025-08-10 18:14:18] configuration_utils.py:817 >> Model config LlamaConfig {"architectures": ["LlamaForCausalLM"],"attention_bias": false,"attention_dropout": 0.0,"bos_token_id": 128000,"eos_token_id": 128009,"head_dim": 128,"hidden_act": "silu","hidden_size": 4096,"initializer_range": 0.02,"intermediate_size": 14336,"max_position_embeddings": 8192,"mlp_bias": false,"model_type": "llama","num_attention_heads": 32,"num_hidden_layers": 32,"num_key_value_heads": 8,"pretraining_tp": 1,"rms_norm_eps": 1e-05,"rope_scaling": null,"rope_theta": 500000.0,"tie_word_embeddings": false,"torch_dtype": "bfloat16","transformers_version": "4.55.0","use_cache": true,"vocab_size": 128256
}
[INFO|2025-08-10 18:14:18] tokenization_utils_base.py:2393 >> chat template saved in saves/Llama-3-8B-Instruct/lora/train_2025-08-10-17-35-45/checkpoint-200/chat_template.jinja
[INFO|2025-08-10 18:14:18] tokenization_utils_base.py:2562 >> tokenizer config file saved in saves/Llama-3-8B-Instruct/lora/train_2025-08-10-17-35-45/checkpoint-200/tokenizer_config.json
[INFO|2025-08-10 18:14:18] tokenization_utils_base.py:2571 >> Special tokens file saved in saves/Llama-3-8B-Instruct/lora/train_2025-08-10-17-35-45/checkpoint-200/special_tokens_map.json
[INFO|2025-08-10 18:15:09] logging.py:143 >> {'loss': 0.4487, 'learning_rate': 1.3090e-05, 'epoch': 6.63, 'throughput': 963.77}
[INFO|2025-08-10 18:16:01] logging.py:143 >> {'loss': 0.5197, 'learning_rate': 1.1992e-05, 'epoch': 6.80, 'throughput': 964.14}
[INFO|2025-08-10 18:16:48] logging.py:143 >> {'loss': 0.4752, 'learning_rate': 1.0927e-05, 'epoch': 6.96, 'throughput': 964.07}
[INFO|2025-08-10 18:17:29] logging.py:143 >> {'loss': 0.4830, 'learning_rate': 9.8985e-06, 'epoch': 7.10, 'throughput': 964.03}
[INFO|2025-08-10 18:18:11] logging.py:143 >> {'loss': 0.4122, 'learning_rate': 8.9088e-06, 'epoch': 7.27, 'throughput': 963.90}
[INFO|2025-08-10 18:19:01] logging.py:143 >> {'loss': 0.3856, 'learning_rate': 7.9603e-06, 'epoch': 7.43, 'throughput': 963.90}
[INFO|2025-08-10 18:19:49] logging.py:143 >> {'loss': 0.4221, 'learning_rate': 7.0557e-06, 'epoch': 7.60, 'throughput': 963.93}
[INFO|2025-08-10 18:20:43] logging.py:143 >> {'loss': 0.4601, 'learning_rate': 6.1970e-06, 'epoch': 7.76, 'throughput': 963.97}
[INFO|2025-08-10 18:21:38] logging.py:143 >> {'loss': 0.4670, 'learning_rate': 5.3867e-06, 'epoch': 7.93, 'throughput': 964.27}
[INFO|2025-08-10 18:22:17] logging.py:143 >> {'loss': 0.3538, 'learning_rate': 4.6267e-06, 'epoch': 8.07, 'throughput': 964.06}
[INFO|2025-08-10 18:23:05] logging.py:143 >> {'loss': 0.3823, 'learning_rate': 3.9190e-06, 'epoch': 8.23, 'throughput': 963.94}
[INFO|2025-08-10 18:23:58] logging.py:143 >> {'loss': 0.3467, 'learning_rate': 3.2654e-06, 'epoch': 8.40, 'throughput': 963.98}
[INFO|2025-08-10 18:24:45] logging.py:143 >> {'loss': 0.4004, 'learning_rate': 2.6676e-06, 'epoch': 8.56, 'throughput': 963.92}
[INFO|2025-08-10 18:25:33] logging.py:143 >> {'loss': 0.4647, 'learning_rate': 2.1271e-06, 'epoch': 8.73, 'throughput': 963.86}
[INFO|2025-08-10 18:26:28] logging.py:143 >> {'loss': 0.3902, 'learning_rate': 1.6454e-06, 'epoch': 8.90, 'throughput': 964.21}
[INFO|2025-08-10 18:27:05] logging.py:143 >> {'loss': 0.4114, 'learning_rate': 1.2236e-06, 'epoch': 9.03, 'throughput': 963.96}
[INFO|2025-08-10 18:28:06] logging.py:143 >> {'loss': 0.4310, 'learning_rate': 8.6282e-07, 'epoch': 9.20, 'throughput': 964.55}
[INFO|2025-08-10 18:28:53] logging.py:143 >> {'loss': 0.4251, 'learning_rate': 5.6401e-07, 'epoch': 9.37, 'throughput': 964.58}
[INFO|2025-08-10 18:29:35] logging.py:143 >> {'loss': 0.3407, 'learning_rate': 3.2793e-07, 'epoch': 9.53, 'throughput': 964.39}
[INFO|2025-08-10 18:30:26] logging.py:143 >> {'loss': 0.4325, 'learning_rate': 1.5518e-07, 'epoch': 9.70, 'throughput': 964.48}
[INFO|2025-08-10 18:30:26] trainer.py:4408 >> 
***** Running Evaluation *****
[INFO|2025-08-10 18:30:26] trainer.py:4410 >>   Num examples = 10
[INFO|2025-08-10 18:30:26] trainer.py:4413 >>   Batch size = 2
[INFO|2025-08-10 18:30:28] trainer.py:4074 >> Saving model checkpoint to saves/Llama-3-8B-Instruct/lora/train_2025-08-10-17-35-45/checkpoint-300
[INFO|2025-08-10 18:30:28] configuration_utils.py:750 >> loading configuration file /root/autodl-tmp/models/Llama3-8B-Chinese-Chat1/config.json
[INFO|2025-08-10 18:30:28] configuration_utils.py:817 >> Model config LlamaConfig {"architectures": ["LlamaForCausalLM"],"attention_bias": false,"attention_dropout": 0.0,"bos_token_id": 128000,"eos_token_id": 128009,"head_dim": 128,"hidden_act": "silu","hidden_size": 4096,"initializer_range": 0.02,"intermediate_size": 14336,"max_position_embeddings": 8192,"mlp_bias": false,"model_type": "llama","num_attention_heads": 32,"num_hidden_layers": 32,"num_key_value_heads": 8,"pretraining_tp": 1,"rms_norm_eps": 1e-05,"rope_scaling": null,"rope_theta": 500000.0,"tie_word_embeddings": false,"torch_dtype": "bfloat16","transformers_version": "4.55.0","use_cache": true,"vocab_size": 128256
}
[INFO|2025-08-10 18:30:28] tokenization_utils_base.py:2393 >> chat template saved in saves/Llama-3-8B-Instruct/lora/train_2025-08-10-17-35-45/checkpoint-300/chat_template.jinja
[INFO|2025-08-10 18:30:28] tokenization_utils_base.py:2562 >> tokenizer config file saved in saves/Llama-3-8B-Instruct/lora/train_2025-08-10-17-35-45/checkpoint-300/tokenizer_config.json
[INFO|2025-08-10 18:30:28] tokenization_utils_base.py:2571 >> Special tokens file saved in saves/Llama-3-8B-Instruct/lora/train_2025-08-10-17-35-45/checkpoint-300/special_tokens_map.json
[INFO|2025-08-10 18:31:15] logging.py:143 >> {'loss': 0.3490, 'learning_rate': 4.6201e-08, 'epoch': 9.86, 'throughput': 963.47}
[INFO|2025-08-10 18:32:01] logging.py:143 >> {'loss': 0.3733, 'learning_rate': 1.2838e-09, 'epoch': 10.00, 'throughput': 963.59}
[INFO|2025-08-10 18:32:01] trainer.py:4074 >> Saving model checkpoint to saves/Llama-3-8B-Instruct/lora/train_2025-08-10-17-35-45/checkpoint-310
[INFO|2025-08-10 18:32:01] configuration_utils.py:750 >> loading configuration file /root/autodl-tmp/models/Llama3-8B-Chinese-Chat1/config.json
[INFO|2025-08-10 18:32:01] configuration_utils.py:817 >> Model config LlamaConfig {"architectures": ["LlamaForCausalLM"],"attention_bias": false,"attention_dropout": 0.0,"bos_token_id": 128000,"eos_token_id": 128009,"head_dim": 128,"hidden_act": "silu","hidden_size": 4096,"initializer_range": 0.02,"intermediate_size": 14336,"max_position_embeddings": 8192,"mlp_bias": false,"model_type": "llama","num_attention_heads": 32,"num_hidden_layers": 32,"num_key_value_heads": 8,"pretraining_tp": 1,"rms_norm_eps": 1e-05,"rope_scaling": null,"rope_theta": 500000.0,"tie_word_embeddings": false,"torch_dtype": "bfloat16","transformers_version": "4.55.0","use_cache": true,"vocab_size": 128256
}
[INFO|2025-08-10 18:32:01] tokenization_utils_base.py:2393 >> chat template saved in saves/Llama-3-8B-Instruct/lora/train_2025-08-10-17-35-45/checkpoint-310/chat_template.jinja
[INFO|2025-08-10 18:32:01] tokenization_utils_base.py:2562 >> tokenizer config file saved in saves/Llama-3-8B-Instruct/lora/train_2025-08-10-17-35-45/checkpoint-310/tokenizer_config.json
[INFO|2025-08-10 18:32:01] tokenization_utils_base.py:2571 >> Special tokens file saved in saves/Llama-3-8B-Instruct/lora/train_2025-08-10-17-35-45/checkpoint-310/special_tokens_map.json
[INFO|2025-08-10 18:32:01] trainer.py:2718 >> Training completed. Do not forget to share your model on huggingface.co/models =)
[INFO|2025-08-10 18:32:01] trainer.py:4074 >> Saving model checkpoint to saves/Llama-3-8B-Instruct/lora/train_2025-08-10-17-35-45
[INFO|2025-08-10 18:32:01] configuration_utils.py:750 >> loading configuration file /root/autodl-tmp/models/Llama3-8B-Chinese-Chat1/config.json
[INFO|2025-08-10 18:32:01] configuration_utils.py:817 >> Model config LlamaConfig {"architectures": ["LlamaForCausalLM"],"attention_bias": false,"attention_dropout": 0.0,"bos_token_id": 128000,"eos_token_id": 128009,"head_dim": 128,"hidden_act": "silu","hidden_size": 4096,"initializer_range": 0.02,"intermediate_size": 14336,"max_position_embeddings": 8192,"mlp_bias": false,"model_type": "llama","num_attention_heads": 32,"num_hidden_layers": 32,"num_key_value_heads": 8,"pretraining_tp": 1,"rms_norm_eps": 1e-05,"rope_scaling": null,"rope_theta": 500000.0,"tie_word_embeddings": false,"torch_dtype": "bfloat16","transformers_version": "4.55.0","use_cache": true,"vocab_size": 128256
}
[INFO|2025-08-10 18:32:01] tokenization_utils_base.py:2393 >> chat template saved in saves/Llama-3-8B-Instruct/lora/train_2025-08-10-17-35-45/chat_template.jinja
[INFO|2025-08-10 18:32:01] tokenization_utils_base.py:2562 >> tokenizer config file saved in saves/Llama-3-8B-Instruct/lora/train_2025-08-10-17-35-45/tokenizer_config.json
[INFO|2025-08-10 18:32:01] tokenization_utils_base.py:2571 >> Special tokens file saved in saves/Llama-3-8B-Instruct/lora/train_2025-08-10-17-35-45/special_tokens_map.json
[WARNING|2025-08-10 18:32:02] logging.py:148 >> No metric eval_accuracy to plot.
[INFO|2025-08-10 18:32:02] trainer.py:4408 >> 
***** Running Evaluation *****
[INFO|2025-08-10 18:32:02] trainer.py:4410 >>   Num examples = 10
[INFO|2025-08-10 18:32:02] trainer.py:4413 >>   Batch size = 2
[INFO|2025-08-10 18:32:04] modelcard.py:456 >> Dropping the following result as it does not have all the necessary fields:
{'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}
  • Training finished.
  • Training results.
  • After training completes, the memory is released automatically.
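Two things are worth reading out of the log above. First, the step count is consistent with the data and batch settings: 481 examples with a per-device batch size of 2 and 8 gradient-accumulation steps give an effective batch of 16, so one epoch takes 31 optimizer steps (481 / 16 rounded up) and 10 epochs give the reported 310 total optimization steps. Second, because the fine-tuning method is LoRA, only 20,971,520 of the 8,051,232,768 parameters (about 0.26%) are trained, and what gets written under saves/Llama-3-8B-Instruct/lora/train_2025-08-10-17-35-45 is a small adapter rather than a full copy of the model. Below is a minimal sketch of loading that adapter on top of the base model for a quick test; it assumes the transformers and peft packages are installed, and it uses the base-model path that appears in the log (/root/autodl-tmp/models/Llama3-8B-Chinese-Chat1).

```python
# Load the LoRA adapter from this run on top of the base model for a quick generation test.
# Assumes `transformers` and `peft` are installed; paths are taken from the training log above.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_path = "/root/autodl-tmp/models/Llama3-8B-Chinese-Chat1"
adapter_path = "saves/Llama-3-8B-Instruct/lora/train_2025-08-10-17-35-45"

tokenizer = AutoTokenizer.from_pretrained(base_path)
model = AutoModelForCausalLM.from_pretrained(base_path, torch_dtype=torch.bfloat16, device_map="auto")
model = PeftModel.from_pretrained(model, adapter_path)  # attach the LoRA weights
model.eval()

messages = [{"role": "user", "content": "你好,请简单介绍一下你自己。"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Alternatively, the Chat tab of the LLaMA-Factory web UI can load the same adapter for interactive testing.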
