How to evaluate a DeepSpeed-trained checkpoint like a Hugging Face model

Detailed steps

Let's start with an example:

[Screenshot: file listing of the checkpoint directory]

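This is a checkpoint I obtained by training for 1728 steps with DeepSpeed in the open-instruct framework. Note that the examples below use step_1152; the procedure is exactly the same for step_1728, only the directory name differs.

A checkpoint produced by DeepSpeed training cannot simply be loaded by pointing at its path the way you pass a model_name for a pretrained Hugging Face model; a few extra steps are needed first.

The information and steps needed to load and evaluate a DeepSpeed checkpoint are as follows: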

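1. DeepSpeed checkpoint structure

According to the file listing in the screenshot, the DeepSpeed checkpoint contains the following:

pytorch_model/: the directory holding the distributed weights (sharded across ranks by ZeRO).
random_states_*.pkl: the random-number-generator states saved during training, used for reproducibility when resuming.
scheduler.bin: the state of the learning-rate scheduler.
latest: a text file recording the tag of the most recent checkpoint (here, the name of the subdirectory that holds the shards).
zero_to_fp32.py: a script that merges the ZeRO-partitioned weights into a single, complete FP32 weights file.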
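
To see at a glance which of these pieces a given checkpoint directory actually contains, a small helper like the following can be used. This is only a sketch: the function name `describe_checkpoint` is my own, and the file names are simply the ones listed above.

```python
from pathlib import Path


def describe_checkpoint(ckpt_dir):
    """Group the entries of a checkpoint directory by the roles listed above."""
    ckpt = Path(ckpt_dir)
    names = sorted(p.name for p in ckpt.iterdir())
    latest = ckpt / "latest"
    return {
        # Tag written into `latest`, or None if the file is absent.
        "latest_tag": latest.read_text().strip() if latest.is_file() else None,
        "has_sharded_weights": (ckpt / "pytorch_model").is_dir(),
        "has_merge_script": (ckpt / "zero_to_fp32.py").is_file(),
        "has_scheduler_state": (ckpt / "scheduler.bin").is_file(),
        "rng_state_files": [
            n for n in names
            if n.startswith("random_states_") and n.endswith(".pkl")
        ],
    }
```

Calling `describe_checkpoint("step_1152")` from the parent directory returns a dictionary; if `has_merge_script` is False, the merge step below cannot be run from that directory.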

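2. Can the path be loaded directly?

If you want to pass the path the same way you would pass a Hugging Face model_name, you need to confirm two things:

Have the distributed weights been merged into a single, complete weights file (FP32)?
Are the model configuration files present? These are mainly the model config and the tokenizer files; a typical set is config.json, special_tokens_map.json, tokenizer_config.json, and tokenizer.json.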
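
Both checks can be automated. The sketch below reports which files are still missing from a directory; the helper name `missing_for_hf_loading` is hypothetical, and the file list is just the typical set named above (your model may need a different set of tokenizer files).

```python
from pathlib import Path

# Typical file set from the text above; adjust for your model (assumption).
CONFIG_FILES = [
    "config.json",
    "special_tokens_map.json",
    "tokenizer_config.json",
    "tokenizer.json",
]


def missing_for_hf_loading(ckpt_dir, weights_name="pytorch_model.bin"):
    """Return the names of required files that are not present in ckpt_dir."""
    ckpt = Path(ckpt_dir)
    required = [weights_name] + CONFIG_FILES
    return [name for name in required if not (ckpt / name).is_file()]
```

An empty list means the directory has everything this article relies on; otherwise the list tells you what to merge or copy next.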

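3. Steps: merge the weights and evaluate

(1) Merge the weights

DeepSpeed's zero_to_fp32.py script merges the distributed weights into a single file.

Run the following command from inside the checkpoint directory:

python zero_to_fp32.py . pytorch_model.bin

. is the path to the checkpoint (the current directory).
pytorch_model.bin is the output file containing the full FP32 weights.

After it finishes, you have a complete model file (here pytorch_model.bin) that can be used directly for evaluation.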
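
If you would rather drive the merge from Python (for example, to loop over several step_* directories), the same command can be wrapped with subprocess. This is a minimal sketch: `build_merge_command` and `merge_checkpoint` are names I made up, and the arguments mirror the shell command above. I believe newer DeepSpeed releases treat the second argument as an output directory rather than a file name, so check `python zero_to_fp32.py --help` for your version.

```python
import subprocess
import sys
from pathlib import Path


def build_merge_command(output_name="pytorch_model.bin"):
    # Same arguments as the shell command: checkpoint dir ".", then the output.
    return [sys.executable, "zero_to_fp32.py", ".", output_name]


def merge_checkpoint(ckpt_dir, output_name="pytorch_model.bin"):
    """Run the zero_to_fp32.py that DeepSpeed saved inside ckpt_dir."""
    ckpt = Path(ckpt_dir).expanduser()
    if not (ckpt / "zero_to_fp32.py").is_file():
        raise FileNotFoundError(f"zero_to_fp32.py not found in {ckpt}")
    # cwd=ckpt makes "." refer to the checkpoint directory itself.
    subprocess.run(build_merge_command(output_name), cwd=ckpt, check=True)
    return ckpt / output_name
```

Usage would look like `merge_checkpoint("~/code/peft_study/open-instruct/output/sft_gemma_2b/step_1152")`.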

This is only an example. For reference, the DeepSpeed tutorial at https://www.deepspeed.ai/tutorials/zero/ gives the following usage:

If you’d like to get the fp32 weights, we supply a special script that can do offline consolidation. It requires no configuration files or GPUs. Here is an example of its usage:
$ cd /path/to/checkpoint_dir
$ ./zero_to_fp32.py . pytorch_model.bin
Processing zero checkpoint at global_step1
Detected checkpoint of type zero stage 3, world_size: 2
Saving fp32 state dict to pytorch_model.bin (total_numel=60506624)
The zero_to_fp32.py script gets created automatically when you save a checkpoint.

Here is my actual run:

(peft_study) (base) xxx@test:~/code/peft_study/open-instruct/output/sft_gemma_2b/step_1152$ python zero_to_fp32.py . pytorch_model.bin
[2025-01-01 03:54:14,913] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Processing zero checkpoint './pytorch_model'
/data/xxx/code/peft_study/open-instruct/output/sft_gemma_2b/step_1152/zero_to_fp32.py:146: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  state_dict = torch.load(f, map_location=device)
Detected checkpoint of type zero stage 3, world_size: 4
/data/xxx/code/peft_study/open-instruct/output/sft_gemma_2b/step_1152/zero_to_fp32.py:98: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  state_dict = torch.load(file, map_location=device)
Parsing checkpoint created by deepspeed==0.15.0
Reconstructed Trainable fp32 state dict with 288 params 2614341888 elements
Saving fp32 state dict to pytorch_model.bin
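
The element count in the log gives a quick way to sanity-check the output file: FP32 uses 4 bytes per element, so the merged file should be roughly four times the element count in bytes (plus a little serialization overhead). A short calculation, where `fp32_size_bytes` is just an illustrative helper:

```python
def fp32_size_bytes(num_elements):
    # float32 stores each element in 4 bytes.
    return num_elements * 4


elements = 2_614_341_888  # from the "Reconstructed Trainable fp32 state dict" log line
size = fp32_size_bytes(elements)
print(f"{size} bytes = {size / 1024**3:.2f} GiB")  # 10457367552 bytes = 9.74 GiB
```

If the pytorch_model.bin on disk is much smaller than this, the merge was probably interrupted.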

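(2) Check the model configuration

DeepSpeed does not generate the Hugging Face files config.json, special_tokens_map.json, tokenizer_config.json, and tokenizer.json for you. You need to supply files that match your model.

The required files are shown in the figure below: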

[Screenshot: the required config and tokenizer files]

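For example, if you trained with a model architecture like gemma-2-2b, you can copy over its configuration files. The path usually looks like ~/.cache/huggingface/hub/models--google--gemma-2-2b/snapshots/c5ebcd40d208330abc697524c919956e692655cf. Note that the files under the snapshots directory of a Hugging Face download are symbolic links (they point to the actual files stored in blobs, whose names are usually hash values), so depending on how you copy, you may end up with links instead of the file contents. Either dereference the links when copying (for example with cp -L) or, as I did, create the files manually and paste the contents in.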
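
The copy can also be done from Python, since `shutil.copy` follows symbolic links by default and writes a regular file at the destination. A minimal sketch, assuming the four file names used in this article; `copy_config_files` is a hypothetical helper name.

```python
import shutil
from pathlib import Path

DEFAULT_NAMES = (
    "config.json",
    "special_tokens_map.json",
    "tokenizer_config.json",
    "tokenizer.json",
)


def copy_config_files(snapshot_dir, ckpt_dir, names=DEFAULT_NAMES):
    """Copy config/tokenizer files out of a Hugging Face snapshot directory."""
    src = Path(snapshot_dir).expanduser()
    dst = Path(ckpt_dir).expanduser()
    copied = []
    for name in names:
        if (src / name).exists():  # exists() follows symlinks
            # follow_symlinks defaults to True: the link target's content is copied.
            shutil.copy(src / name, dst / name)
            copied.append(name)
    return copied
```

The return value lists what was actually copied, so any file missing from the snapshot is easy to spot.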

Example configuration file:

{
  "architectures": ["Gemma2ForCausalLM"],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "attn_logit_softcapping": 50.0,
  "bos_token_id": 2,
  "cache_implementation": "hybrid",
  "eos_token_id": 1,
  "final_logit_softcapping": 30.0,
  "head_dim": 256,
  "hidden_act": "gelu_pytorch_tanh",
  "hidden_activation": "gelu_pytorch_tanh",
  "hidden_size": 2304,
  "initializer_range": 0.02,
  "intermediate_size": 9216,
  "max_position_embeddings": 8192,
  "model_type": "gemma2",
  "num_attention_heads": 8,
  "num_hidden_layers": 26,
  "num_key_value_heads": 4,
  "pad_token_id": 0,
  "query_pre_attn_scalar": 256,
  "rms_norm_eps": 1e-06,
  "rope_theta": 10000.0,
  "sliding_window": 4096,
  "torch_dtype": "float32",
  "transformers_version": "4.42.4",
  "use_cache": true,
  "vocab_size": 256000
}

Name this file config.json and put it in the same directory as pytorch_model.bin.
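
One way to check that this config.json really matches the merged weights is to compute the parameter count it implies and compare it with the merge log ("288 params 2614341888 elements"). The sketch below does that arithmetic. The layer layout it assumes (bias-free q/k/v/o projections, a three-matrix MLP, four RMSNorm weights per layer, and an lm_head tied to the embedding) is my assumption about the Gemma 2 architecture, not something stated in the log, although it does reproduce the logged numbers.

```python
def gemma2_param_count(cfg):
    h, inter = cfg["hidden_size"], cfg["intermediate_size"]
    q_dim = cfg["num_attention_heads"] * cfg["head_dim"]
    kv_dim = cfg["num_key_value_heads"] * cfg["head_dim"]
    attn = h * q_dim + 2 * h * kv_dim + q_dim * h  # q, k, v, o projections, no bias
    mlp = 3 * h * inter                            # gate, up, down projections
    norms = 4 * h                                  # four RMSNorm weights per layer (assumed)
    embed = cfg["vocab_size"] * h                  # lm_head assumed tied to the embedding
    return cfg["num_hidden_layers"] * (attn + mlp + norms) + embed + h  # + final norm


def gemma2_tensor_count(cfg):
    # 4 attention + 3 MLP + 4 norm tensors per layer, plus embedding and final norm.
    return cfg["num_hidden_layers"] * 11 + 2


cfg = {
    "hidden_size": 2304, "intermediate_size": 9216, "head_dim": 256,
    "num_attention_heads": 8, "num_key_value_heads": 4,
    "num_hidden_layers": 26, "vocab_size": 256000,
}
print(gemma2_param_count(cfg))   # 2614341888
print(gemma2_tensor_count(cfg))  # 288
```

Both values agree with the merge log, which suggests this config.json describes the same architecture as the checkpoint.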

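(3) Load the model with Hugging Face Transformers

You can load the merged model with the following code:

import os
from transformers import AutoModelForCausalLM, AutoTokenizer

# Model directory; expanduser is needed because from_pretrained does not expand "~"
model_path = os.path.expanduser("~/code/peft_study/open-instruct/output/sft_gemma_2b/step_1152")
# Load the model (reads config.json and pytorch_model.bin from model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype="auto")
# Load the tokenizer (make sure it matches the model)
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b")
# Quick generation test
inputs = tokenizer("Evaluate this model:", return_tensors="pt")
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))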

Of course, if you use a separate evaluation framework as I do, for example olmes, you can simply set MODEL_NAME to the path of your own checkpoint:

export CUDA_VISIBLE_DEVICES="4,5"
MODEL_NAME=/data/xxx/code/peft_study/open-instruct/output/sft_gemma_2b_backup/step_1152
# meta-llama/Llama-3.1-8B
BATCH_SIZE=4
echo "Running the command and measuring execution time..."
TASK_NAME_02=ifeval::tulu
OUTPUT_DIR_02=eval-gemma-2-2b-ifeval::tulu
mkdir -p $OUTPUT_DIR_02
{ time olmes \
    --model $MODEL_NAME \
    --task $TASK_NAME_02 \
    --batch-size $BATCH_SIZE \
    --output-dir $OUTPUT_DIR_02 \
    --model-args '{"max_length": 4096}'; } 2>&1 | tee $OUTPUT_DIR_02/time_output.log

That is all it takes to evaluate a checkpoint saved locally by DeepSpeed.

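4. Other notes

If you trained with the DeepSpeed ZeRO Stage 3 distributed optimizer, you do not need DeepSpeed at evaluation time; loading the merged weights is enough.

Summary

Merge the weight files with zero_to_fp32.py.
Provide the matching model configuration files (such as config.json).
Load the model with Hugging Face's from_pretrained method.
Make sure the tokenizer matches the model.

Once these steps are done, you can evaluate your model just as you would a Hugging Face model.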

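The directory contains several .bin files. Which one gets loaded?

I asked GPT-4o:

I have now merged the files into pytorch_model.bin and put the tokenizer configuration in place, but there are two .bin files here: pytorch_model.bin and scheduler.bin. When evaluating the model the way one loads a Hugging Face model, how does it decide which .bin file to load?

During evaluation, Hugging Face's from_pretrained method loads only the main model weights file (for example pytorch_model.bin, or another file you specify explicitly). It does not load the learning-rate scheduler file (scheduler.bin).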

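Here are the details and how to make sure the right file is loaded:

1. Hugging Face default loading logic

When you call from_pretrained, Hugging Face looks for a model weights file under a standard name in the given path. By default it looks for:
- pytorch_model.bin or model.safetensors (or the corresponding sharded index files, pytorch_model.bin.index.json and model.safetensors.index.json).
- A configuration file (config.json).
If the directory contains several .bin files, only the one with a standard weights name (such as pytorch_model.bin) is loaded. Other files (such as scheduler.bin) are ignored, because they are not model weights but scheduler state saved during training.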
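
The lookup can be pictured as a fixed list of candidate names tried in order. The following is a simplified sketch, not the library's actual code: the candidate order is what I understand recent transformers releases to use for a local directory with PyTorch weights, and `resolve_weights_file` is a made-up name.

```python
from pathlib import Path

# Assumed lookup order for a local directory (PyTorch weights only).
CANDIDATES = [
    "model.safetensors",
    "model.safetensors.index.json",
    "pytorch_model.bin",
    "pytorch_model.bin.index.json",
]


def resolve_weights_file(model_dir):
    """Return the first standard weights file found in model_dir, or None."""
    for name in CANDIDATES:
        if (Path(model_dir) / name).is_file():
            return name
    return None
```

With both pytorch_model.bin and scheduler.bin present, this returns pytorch_model.bin; scheduler.bin is never considered because its name is not in the list.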

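2. What `scheduler.bin` is for

scheduler.bin stores the state of the learning-rate scheduler, which is used to restore progress when resuming an interrupted training run.
It is not needed for inference or evaluation; it is only used when resuming training from a checkpoint.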

3. Making sure the right weights are loaded

You only need to make sure of the following:

The distributed weights have been merged into pytorch_model.bin.
A config.json file defining the model architecture is present.
The correct tokenizer files (such as tokenizer.json and tokenizer_config.json) are present.
The path is given explicitly when loading.

Example code:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Model directory
model_path = "./sft_gemma_2b_backup/step_1152"
# Load the model
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype="auto")
# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path)
# Test inference
inputs = tokenizer("Evaluate this model:", return_tensors="pt")
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Here, from_pretrained automatically reads pytorch_model.bin and config.json and ignores the other files (such as scheduler.bin).

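4. When there are several `.bin` files

If the directory contains several .bin files and pytorch_model.bin is the main weights file, you do not need to worry about Hugging Face loading the wrong one. It proceeds as follows:

It first looks for a model file with a standard name (such as pytorch_model.bin).
It then loads those weights into the architecture described by config.json.

scheduler.bin is not part of the model, so Hugging Face never tries to load it.

Summary

scheduler.bin is an auxiliary training file and is not loaded during evaluation or inference.
Having pytorch_model.bin and config.json in the same directory is enough for loading to work.
When you call from_pretrained, Hugging Face automatically ignores files that are irrelevant to evaluation.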

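What if the file is not named pytorch_model.bin?

I asked a follow-up:

What if the file is not called pytorch_model.bin but something else, for example gemma2_combined.bin?

If your weights file has a custom name such as gemma2_combined.bin instead of pytorch_model.bin, Hugging Face's from_pretrained method will not find it automatically. You have to load the weights file explicitly.

Here is how to load a weights file with a custom name: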

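1. Solution: load the weights file explicitly

Load the custom-named file into a state_dict yourself, build the model from config.json, and copy the weights in. As far as I can tell, passing state_dict to from_pretrained does not avoid the problem, because from_pretrained still searches the directory for a standard weights file first.

Example code:

import torch
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

# Model directory and custom-named weights file
model_path = "./sft_gemma_2b_backup/step_1152"
custom_weight_file = f"{model_path}/gemma2_combined.bin"
# Load the config (reads config.json) and the tokenizer
config = AutoConfig.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)
# Load the custom-named weights file manually
state_dict = torch.load(custom_weight_file, map_location="cpu", weights_only=True)
# Build the model from the config, then copy the weights in
model = AutoModelForCausalLM.from_config(config)
result = model.load_state_dict(state_dict, strict=False)
print(result)  # missing/unexpected keys; a tied lm_head.weight may show up as missing
model.eval()
# Test inference
inputs = tokenizer("Evaluate this model:", return_tensors="pt")
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Key points:

torch.load(custom_weight_file, map_location="cpu") loads the weights manually.
The weights are then copied into a model built from config.json with load_state_dict.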

2. Alternative: rename the weights file

If you would rather keep using Hugging Face's automatic loading, simply rename the file to pytorch_model.bin and leave it in the same directory. Then there is no need to load a state_dict by hand.

Example command:

mv ./sft_gemma_2b_backup/step_1152/gemma2_combined.bin ./sft_gemma_2b_backup/step_1152/pytorch_model.bin

Then run the code directly:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "./sft_gemma_2b_backup/step_1152"
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_path)
inputs = tokenizer("Evaluate this model:", return_tensors="pt")
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

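3. Common errors and fixes

Error 1: `RuntimeError: ... Unexpected key(s) in state_dict`

Cause:

The weights file and the configuration file (config.json) do not match, for example because the weights were produced by LoRA or another custom fine-tuning method.

Fix: make sure the model structure defined in config.json matches the weights file. If you used a custom fine-tuning method (such as LoRA), load the weights with the corresponding library or method.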
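
When this error appears, it helps to see exactly which keys differ before changing anything. The helper below is a small sketch for that; `diff_state_dict_keys` is my own name, and the LoRA-style key names in the usage example are purely illustrative.

```python
def diff_state_dict_keys(model_keys, checkpoint_keys):
    """Compare the parameter names a model expects with those in a weights file."""
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    return {
        # Expected by the model but absent from the file.
        "missing": sorted(model_keys - checkpoint_keys),
        # Present in the file but unknown to the model.
        "unexpected": sorted(checkpoint_keys - model_keys),
    }


# Illustrative key names only (hypothetical).
report = diff_state_dict_keys(
    ["layers.0.q_proj.weight"],
    ["layers.0.q_proj.weight", "layers.0.q_proj.lora_A.weight"],
)
print(report["unexpected"])  # ['layers.0.q_proj.lora_A.weight']
```

In practice you would pass `model.state_dict().keys()` and `state_dict.keys()`; unexpected keys that look like adapter weights are a strong hint that the file needs the fine-tuning library to load.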

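Error 2: `OSError: File not found`

Cause:

from_pretrained cannot find a model weights file.

Fix: make sure the path is correct, or load the file explicitly into a state_dict.

Summary

If the weights file has some other name (such as gemma2_combined.bin), load it explicitly into a state_dict and load that into the model yourself.
For convenience, you can simply rename the weights file to pytorch_model.bin and make sure it sits in the same directory as config.json.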

Postscript

Written in Shanghai at 12:57 on January 1, 2025, with the assistance of the GPT-4o large model.
