Problem Overview
While using MindIE 1.0 RC2 for large language model inference, I ran into quite a few problems; this post records how I worked through them.
My hardware is a 910B4.
Problems and Solutions
Problem 1
When starting MindIE inside Docker, the terminal reports:
Fatal Python error: PyThreadState_Get: the function must be called with the GIL held, but the GIL is released (the current Python thread state is NULL)
Checking logs/pythonlog.log.xxxx shows:
File "/usr/local/Ascend/atb-models/atb_llm/utils/file_utils.py", line 110, in check_owner
raise argparse.ArgumentTypeError("The path is not owned by current user or root")
argparse.ArgumentTypeError: The path is not owned by current user or root
Analysis: the model directory was mapped in from the host and is owned by a user named guest, while the user inside the Docker container is root, so MindIE's ownership check on the path fails.
Solution: change the owner and group of the mapped model directory to root:

chown -R root:root /path/to/directory
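For reference, the failing check in file_utils.py boils down to comparing the path's owner uid against the current uid and root. A minimal sketch reconstructed from the traceback (the real implementation may differ):

import argparse
import os

def check_owner(path):
    # The path must be owned by the current effective user or by root (uid 0)
    owner_uid = os.stat(path).st_uid
    if owner_uid not in (os.geteuid(), 0):
        raise argparse.ArgumentTypeError("The path is not owned by current user or root")

check_owner("/path/to/directory")  # raises if the directory is still owned by guest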
Problem 2
When starting MindIE inside Docker, the terminal reports:
Fatal Python error: PyThreadState_Get: the function must be called with the GIL held, but the GIL is released (the current Python thread state is NULL)
Python runtime state: finalizing (tstate=0x0000ffff8401d570)
Checking logs/pythonlog.log.xxxx shows:
File "/root/.cache/huggingface/modules/transformers_modules/Baichuan2-13B-Base/tokenization_baichuan.py", line 7, in <module>
import sentencepiece as spm
ModuleNotFoundError: No module named 'sentencepiece'
Analysis: I was loading the Baichuan2-13B model, whose custom tokenizer depends on the sentencepiece package, which was not installed in the container.
Solution:
pip install sentencepiece
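After installing, a quick sanity check before relaunching MindIE (the model path below is a placeholder for your local directory):

from transformers import AutoTokenizer

# Constructing the tokenizer imports tokenization_baichuan.py, which needs sentencepiece
tokenizer = AutoTokenizer.from_pretrained(
    "/path/to/Baichuan2-13B-Base",  # placeholder
    use_fast=False,
    trust_remote_code=True)
print(tokenizer.tokenize("hello world"))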
Problem 3
When starting MindIE inside Docker, the terminal reports:
Exception:unsupported type: torch.bfloat16
Analysis: the model I was loading is stored in bfloat16, which MindIE apparently does not support; only fp16 works. The model's dtype is recorded in the torch_dtype field of config.json in the model directory.
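A quick way to check the dtype without opening the file by hand (the path is a placeholder):

import json

with open("/path/to/model/config.json") as f:
    config = json.load(f)
print(config.get("torch_dtype"))  # prints "bfloat16" here, which MindIE rejects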
Solution: convert the model to fp16:
import torch

def convert_bin2st_from_pretrained(model_path, out_path):
    from transformers import AutoModelForCausalLM, AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained(
        model_path,
        revision="v2.0",
        use_fast=False,
        trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        pretrained_model_name_or_path=model_path,
        low_cpu_mem_usage=True,
        trust_remote_code=True,
        torch_dtype=torch.float16)  # force float16 here
    print(f"Saving the target model to {out_path}")
    model.save_pretrained(out_path, safe_serialization=True)  # writes safetensors
    print(f"Saving the tokenizer to {out_path}")
    tokenizer.save_pretrained(out_path)

if __name__ == '__main__':
    print("converting model to safetensors")
    convert_bin2st_from_pretrained("./Qwen2-72B-Instruct", "./Qwen2-72B-Instruct_fp16")
After the conversion, manually copy ./Qwen2-72B-Instruct/tokenizer.json to ./Qwen2-72B-Instruct_fp16; all the other files are already there.
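The copy can also be scripted:

import shutil

# The conversion above does not emit tokenizer.json, so carry it over from the source model
shutil.copy("./Qwen2-72B-Instruct/tokenizer.json",
            "./Qwen2-72B-Instruct_fp16/tokenizer.json")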
Problem 4
When starting MindIE inside Docker, the terminal reports:
Fatal Python error: PyThreadState_Get: the function must be called with the GIL held, but the GIL is released (the current Python thread state is NULL)
Python runtime state: finalizing (tstate=0x0000ffffac01d570)
Checking logs/pythonlog.log.xxxx shows:
File "/usr/local/Ascend/atb-models/atb_llm/models/qwen2/router_qwen2.py", line 39, in checkout_config_qwen
if value < min_val or value > max_val:
TypeError: '<' not supported between instances of 'NoneType' and 'int'
Tracing through shows that the sliding_window value read in router_qwen2.py is None; this was caused by converting the model with the script from the previous step.
Solution: in the converted model directory, set the sliding_window field in config.json to 131072.
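This can be patched in place with a few lines of Python (131072 as prescribed above):

import json

path = "./Qwen2-72B-Instruct_fp16/config.json"
with open(path) as f:
    config = json.load(f)
config["sliding_window"] = 131072  # was null after the conversion
with open(path, "w") as f:
    json.dump(config, f, indent=2, ensure_ascii=False)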
Summary
Many of these problems surface as GIL-related fatal errors, but the real failure is in the worker process; the true cause is almost always recorded in logs/pythonlog.log.xxxx.
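So when MindIE dies with one of these fatal errors, check the newest python log first, e.g.:

import glob
import os

# Print the tail of the most recently modified pythonlog.log.* file
logs = sorted(glob.glob("logs/pythonlog.log.*"), key=os.path.getmtime)
if logs:
    with open(logs[-1]) as f:
        print("".join(f.readlines()[-40:]))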