1 Overview
1.1 Summary
Step1X-Edit: a unified image editing model that performs well across a wide range of real-world user instructions.
Step1X-Edit delivers performance comparable to closed-source models such as GPT-4o and Gemini2 Flash. More specifically, we adopt a multimodal LLM to process the reference image and the user's editing instruction, extract latent embeddings, and combine them with a diffusion image decoder to produce the target image. To train the model, we built a data generation pipeline that yields a high-quality dataset. For evaluation, we developed GEdit-Bench, a new benchmark rooted in real-world user instructions. Experimental results on GEdit-Bench show that Step1X-Edit substantially outperforms existing open-source baselines and approaches leading proprietary models, making a significant contribution to the field of image editing. For more technical details, see https://arxiv.org/abs/2504.17761
Model link: https://modelers.cn/models/StepFun/Step1X-Edit-npu
2 Environment Setup
2.1 Obtaining the CANN Packages & Preparing the Environment
Supported versions:
Package | Version
---|---
CANN | 8.0.0
PTA | 6.0.0
HDK | 24.1.0
PyTorch | 2.3.1
Python | 3.11
2.2 PyTorch & CANN Installation
- PyTorch & Ascend Extension for PyTorch installation guide: https://www.hiascend.com/document/detail/zh/Pytorch/600/configandinstg/instg/insg_0001.html
The following commands install Python 3.11, PyTorch 2.3.1, and PTA plugin 6.0.0 on an AArch64 system with CANN 8.0.0:
# Download the PyTorch wheel
wget https://download.pytorch.org/whl/cpu/torch-2.3.1-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
# Download the torch_npu plugin wheel
wget https://gitee.com/ascend/pytorch/releases/download/v6.0.0-pytorch2.3.1/torch_npu-2.3.1.post4-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
# Install both wheels
pip3 install torch-2.3.1-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
pip3 install torch_npu-2.3.1.post4-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
- Package download: Atlas 800I A2
- CANN package installation
The following .run packages from the CANN release need to be installed:
# Make the packages executable. {version} is the software version, {arch} the CPU architecture, {soc} the Ascend AI processor model.
chmod +x ./Ascend-cann-toolkit_{version}_linux-{arch}.run
chmod +x ./Ascend-cann-kernels-{soc}_{version}_linux.run
# Verify the consistency and integrity of the installation files
./Ascend-cann-toolkit_{version}_linux-{arch}.run --check
./Ascend-cann-kernels-{soc}_{version}_linux.run --check
# Install
./Ascend-cann-toolkit_{version}_linux-{arch}.run --install
./Ascend-cann-kernels-{soc}_{version}_linux.run --install
# Set environment variables
source /usr/local/Ascend/ascend-toolkit/set_env.sh
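Before moving on, a quick smoke test can confirm that the driver, CANN, and torch_npu stack is wired up correctly (a minimal sketch, not part of the official installation steps):

```python
import torch
import torch_npu  # registers the "npu" device type with PyTorch

# A tiny tensor operation on npu:0 exercises the whole driver/CANN/torch_npu stack.
print("NPU available:", torch_npu.npu.is_available())
x = torch.ones(2, 2).to("npu:0")
print("2 + 2 =", (x + x).sum().item())
```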
2.3 依賴包安裝
由于NPU下當前對Triton的inductor后端支持并不完備,請注釋requirements.txt中的liger_kernel依賴信息,具體如下:
liger_kernel -> # liger_kernel
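For example, the edit can be applied with a one-line sed command (a sketch; run it from the repository root):

```bash
sed -i 's/^liger_kernel/# liger_kernel/' requirements.txt
```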
Then install the dependencies:
pip install -r requirements.txt
Note: the NPU ships its own flash_attn operator implementation, so the flash_attn library does not need to be installed.
3 Model Download
- Huggingface
Model | Link
---|---
Step1X-Edit | 🤗huggingface
- Modelers community
Model | Link
---|---
Step1X-Edit | modelers
4 Running Inference
- Get the Step1X-Edit source code:
git clone https://github.com/stepfun-ai/Step1X-Edit.git
- In scripts/run_examples.sh, set the model_path parameter to the path where the model was downloaded.
Run the inference script:
bash scripts/run_examples.sh
On success, two folders are created in the current directory, output_cn and output_en, corresponding to the two prompt sets (Chinese and English) under the examples directory. Sample results:
Prompt (Chinese): 給這個女生的脖子上戴一個帶有紅寶石的吊墜 (put a pendant with a ruby around this girl's neck)
Prompt (English): Change the outerwear to be made of top-grain calfskin.
5 FAQ
5.1 Issue 1: rms_norm
Traceback (most recent call last):
  File "/home/Step1X-Edit/Step1X-Edit/inference.py", line 23, in <module>
    from modules.model_edit import Step1XParams, Step1XEdit
  File "/home/Step1X-Edit/Step1X-Edit/modules/model_edit.py", line 8, in <module>
    from .connector_edit import Qwen2Connector
  File "/home/Step1X-Edit/Step1X-Edit/modules/connector_edit.py", line 8, in <module>
    from .layers import MLP, TextProjection, TimestepEmbedder, apply_gate, attention
  File "/home/Step1X-Edit/Step1X-Edit/modules/layers.py", line 27, in <module>
    from liger_kernel.ops.rms_norm import LigerRMSNormFunction
ModuleNotFoundError: No module named 'liger_kernel'
Solution:
Replace liger_kernel's RMSNorm with the NPU fused operator torch_npu.npu_rms_norm.
Reference: "RmsNorm & RmsNormGrad" under Fused Operator Replacement / NPU Affinity Adaptation / Performance Tuning in the Ascend Extension for PyTorch 6.0.0 documentation on the Ascend community site.
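A minimal sketch of such a replacement, assuming torch_npu.npu_rms_norm returns a tuple whose first element is the normalized tensor (verify the exact signature against the reference above); the class name NPURMSNorm is illustrative, not part of the Step1X-Edit code:

```python
import torch
import torch_npu

class NPURMSNorm(torch.nn.Module):
    """Drop-in RMSNorm using the NPU fused operator instead of LigerRMSNormFunction."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.ones(dim))  # learnable gain, as in RMSNorm
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # npu_rms_norm fuses normalization and scaling in one aclNN kernel;
        # element [0] of the returned tuple is the normalized output.
        return torch_npu.npu_rms_norm(x, self.weight, epsilon=self.eps)[0]
```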
5.2 Issue 2: liger_kernel
File "/usr/local/lib/python3.11/site-packages/transformers/configuration_utils.py", line 594, in get_config_dictconfig_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^File "/usr/local/lib/python3.11/site-packages/transformers/configuration_utils.py", line 653, in _get_config_dictresolved_config_file = cached_file(^^^^^^^^^^^^File "/usr/local/lib/python3.11/site-packages/transformers/utils/hub.py", line 385, in cached_fileraise EnvironmentError(
OSError: We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like Qwen/Qwen2.5-VL-7B-Instruct is not the path to a directory containing a file named config.json.
Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'
ModuleNotFoundError: No module named 'liger_kernel'
Solution:
liger_kernel is a Triton-based module with no NPU implementation yet; remove it and use the corresponding NPU aclNN operators directly.
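One way to do this without deleting code paths is to guard the import in modules/layers.py so the model falls back to a non-Triton implementation when liger_kernel is absent (a sketch; the HAS_LIGER flag is an illustrative name):

```python
try:
    from liger_kernel.ops.rms_norm import LigerRMSNormFunction
    HAS_LIGER = True
except ImportError:
    # liger_kernel targets Triton/GPU and is unavailable on NPU;
    # downstream code should branch on HAS_LIGER and use the NPU RMSNorm instead.
    LigerRMSNormFunction = None
    HAS_LIGER = False
```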
5.3 Issue 3: flash_attn
File "/usr/local/lib/python3.11/site-packages/transformers/modeling_utils.py", line 4179, in from_pretrainedconfig = cls._autoset_attn_implementation(^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^File "/usr/local/lib/python3.11/site-packages/transformers/modeling_utils.py", line 1575, in _autoset_attn_implementationcls._check_and_enable_flash_attn_2(File "/usr/local/lib/python3.11/site-packages/transformers/modeling_utils.py", line 1710, in _check_and_enable_flash_attn_2raise ImportError(f"{preface} the package flash_attn seems to be not installed. {install_message}")
ImportError: FlashAttention2 has been toggled on, but it cannot be used due to the following error: the package flash_attn seems to be not installed. Please refer to the documentation of https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-2 to install Flash Attention 2.
[ERROR] 2025-04-25-14:53:48 (PID:7549, Device:0, RankID:-1) ERR99999 UNKNOWN application exception
+ python inference.py --input_dir ./examples --model_path /home/Step1X-Edit/weight/ --json_path ./examples/prompt_cn.json --output_dir ./output_cn --seed 1234 --size_level 1024
Solution:
flash_attn is GPU-only and has no NPU package. When loading the model, change the attn_implementation argument from "flash_attention_2" to "eager":
self.model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_path,
    torch_dtype=dtype,
-   attn_implementation="flash_attention_2",
+   attn_implementation="eager",
).to(torch.cuda.current_device())
5.4 Issue 4: torch.compile
File "/usr/local/lib64/python3.11/site-packages/torch/_inductor/graph.py", line 1307, in compile_to_fnreturn self.compile_to_module().call^^^^^^^^^^^^^^^^^^^^^^^^File "/usr/local/lib64/python3.11/site-packages/torch/_dynamo/utils.py", line 262, in time_wrapperr = func(*args, **kwargs)^^^^^^^^^^^^^^^^^^^^^File "/usr/local/lib64/python3.11/site-packages/torch/_inductor/graph.py", line 1250, in compile_to_moduleself.codegen_with_cpp_wrapper() if self.cpp_wrapper else self.codegen()^^^^^^^^^^^^^^File "/usr/local/lib64/python3.11/site-packages/torch/_inductor/graph.py", line 1203, in codegenself.init_wrapper_code()File "/usr/local/lib64/python3.11/site-packages/torch/_inductor/graph.py", line 1134, in init_wrapper_codewrapper_code_gen_cls is not None
torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
AssertionError: Device npu not supportedSet TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information
Solution:
Remove the torch.compile usage; the inductor backend does not currently support NPU.
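If keeping torch.compile for GPU runs is desirable, the call can be guarded instead of deleted (a sketch; maybe_compile is an illustrative helper, not part of the Step1X-Edit code):

```python
import torch

def maybe_compile(model: torch.nn.Module) -> torch.nn.Module:
    # inductor can only generate code for devices it supports; on NPU, stay in eager mode.
    if torch.cuda.is_available():
        return torch.compile(model)
    return model
```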
5.5 Issue 5: flash_attn_func
File "/usr/local/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_implreturn self._call_impl(*args, **kwargs)^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^File "/usr/local/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_implreturn forward_call(*args, **kwargs)^^^^^^^^^^^^^^^^^^^^^^^^^^^^^File "/home/Step1X-Edit/Step1X-Edit/modules/layers.py", line 557, in forwardattn = attention_after_rope(q, k, v, pe=pe)^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^File "/home/Step1X-Edit/Step1X-Edit/modules/layers.py", line 370, in attention_after_ropex = attention(q, k, v, mode="flash")^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^File "/home/Step1X-Edit/Step1X-Edit/modules/attention.py", line 82, in attentionassert flash_attn_func is not None, "flash_attn_func未定義"^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: flash_attn_func未定義
[ERROR] 2025-04-25-15:20:18 (PID:20394, Device:0, RankID:-1) ERR99999 UNKNOWN application exception
Solution: call attention with torch's mode instead of "flash", so attention runs through the aclNN FlashAttention implementation rather than the GPU-only flash_attn_func. (The assertion message "flash_attn_func未定義" means "flash_attn_func is undefined".)
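Concretely, this is the call-site change implied by the traceback above (a sketch of the edit in modules/layers.py, assuming modules/attention.py accepts mode="torch"):

```python
# Before: requires the GPU-only flash_attn package
# x = attention(q, k, v, mode="flash")
# After: torch's scaled-dot-product path, which the NPU serves with its aclNN FlashAttention kernel
x = attention(q, k, v, mode="torch")
```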
5.6 Issue 6: Accuracy problem
The generated result images are completely garbled, looking like mosaic noise.
Solution:
The accuracy problem is caused by torch.nn.functional.scaled_dot_product_attention; switch to the torch_npu.npu_fusion_attention interface and adapt the arguments accordingly.
Reference:
"FlashAttentionScore" under Fused Operator Replacement / NPU Affinity Adaptation / Performance Tuning in the Ascend Extension for PyTorch 6.0.0 documentation on the Ascend community site.
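A minimal sketch of the replacement, assuming q/k/v in (batch, heads, seq, dim) layout and the npu_fusion_attention argument names from the Ascend Extension for PyTorch 6.0.0 documentation (verify against the FlashAttentionScore reference above; sdpa_npu is an illustrative helper name):

```python
import math
import torch
import torch_npu

def sdpa_npu(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """NPU replacement for torch.nn.functional.scaled_dot_product_attention."""
    out = torch_npu.npu_fusion_attention(
        q, k, v,
        head_num=q.shape[1],                 # number of attention heads in BNSD layout
        input_layout="BNSD",                 # (batch, heads, seq, dim), matching SDPA inputs
        scale=1.0 / math.sqrt(q.shape[-1]),  # same 1/sqrt(d) scaling SDPA applies by default
        keep_prob=1.0,                       # no attention dropout at inference time
    )[0]  # npu_fusion_attention returns a tuple; the first element is the attention output
    return out
```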