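Problem Background
After updating the ktransformers Docker image to v0.3 (previously v0.2.4post1), the service no longer starts with the launch command that worked before the update. It fails with the following error: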
Traceback (most recent call last):
  File "/workspace/ktransformers/ktransformers/server/main.py", line 12, in <module>
    from ktransformers.server.utils.create_interface import create_interface, GlobalInterface
  File "/opt/conda/lib/python3.11/site-packages/ktransformers/server/utils/create_interface.py", line 14, in <module>
    from ktransformers.server.backend.context_manager import ThreadContextManager
  File "/opt/conda/lib/python3.11/site-packages/ktransformers/server/backend/context_manager.py", line 8, in <module>
    from ktransformers.server.backend.interfaces.transformers import TransformersThreadContext
  File "/opt/conda/lib/python3.11/site-packages/ktransformers/server/backend/interfaces/transformers.py", line 5, in <module>
    from transformers import (
  File "<frozen importlib._bootstrap>", line 1229, in _handle_fromlist
  File "/opt/conda/lib/python3.11/site-packages/transformers/utils/import_utils.py", line 1956, in __getattr__
    value = getattr(module, name)
            ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/transformers/utils/import_utils.py", line 1955, in __getattr__
    module = self._get_module(self._class_to_module[name])
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/transformers/utils/import_utils.py", line 1969, in _get_module
    raise RuntimeError(
RuntimeError: Failed to import transformers.models.llama.modeling_llama because of the following error (look up to see its traceback):
operator torchvision::nms does not exist
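Root Cause Analysis
A search showed that this exception is caused by a mismatch between the installed torchvision and torch versions: the torchvision build in the image was not compiled for the torch version that is installed, so its compiled operators (such as torchvision::nms) are not available.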
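To confirm the mismatch before changing anything, the installed versions can be compared against the published torch/torchvision pairings. The following is a minimal sketch and not part of ktransformers: the helper names (major_minor, is_matching_pair, installed_version) are made up for illustration, and the pairing table lists only a few recent releases. It reads package metadata instead of importing torch, so it still works when the import itself fails.

```python
from importlib import metadata

# A few known torch -> torchvision release pairings (major.minor only).
# Assumption: extend this table if your torch version is not listed.
TORCH_TO_TORCHVISION = {
    "2.3": "0.18",
    "2.4": "0.19",
    "2.5": "0.20",
    "2.6": "0.21",
    "2.7": "0.22",
}


def major_minor(version):
    """Reduce a version such as '2.6.0+cu126' to '2.6'."""
    base = version.split("+")[0]  # drop a local build tag like '+cu126'
    return ".".join(base.split(".")[:2])


def is_matching_pair(torch_version, torchvision_version):
    """Return True/False for a known torch version, None if it is not in the table."""
    expected = TORCH_TO_TORCHVISION.get(major_minor(torch_version))
    if expected is None:
        return None
    return major_minor(torchvision_version) == expected


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    torch_v = installed_version("torch")
    vision_v = installed_version("torchvision")
    print("torch:", torch_v, "torchvision:", vision_v)
    if torch_v and vision_v:
        print("matching pair:", is_matching_pair(torch_v, vision_v))
```

A result of False means the two packages come from different release pairs, which is the situation described above.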
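Solution
Uninstall torchvision and reinstall the version that matches the installed torch.
Note, however, that you cannot simply install the latest release, for example with pip install --upgrade torchvision. That upgrades both torchvision and torch to their latest versions, after which the prebuilt ktransformers binaries no longer match the installed torch, and startup fails with the following error: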
Traceback (most recent call last):
  File "/workspace/ktransformers/ktransformers/server/main.py", line 10, in <module>
    from ktransformers.server.args import ArgumentParser
  File "/opt/conda/lib/python3.11/site-packages/ktransformers/server/args.py", line 3, in <module>
    from ktransformers.util.utils import get_free_ports
  File "/opt/conda/lib/python3.11/site-packages/ktransformers/util/utils.py", line 14, in <module>
    from ktransformers.util.custom_gguf import translate_name_to_gguf
  File "/opt/conda/lib/python3.11/site-packages/ktransformers/util/custom_gguf.py", line 27, in <module>
    import KTransformersOps
ImportError: /opt/conda/lib/python3.11/site-packages/KTransformersOps.cpython-311-x86_64-linux-gnu.so: undefined symbol: _ZN3c106detail23torchInternalAssertFailEPKcS2_jS2_RKSs
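The correct approach is to uninstall torchvision and then install the torchvision release that corresponds to the installed torch version. The command below is for torch 2.6.0: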
pip install torchvision==0.21.0
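For torch versions other than 2.6.0, the pin can be looked up in the same way. This is a small sketch under the assumption that the table below (a subset of the official release pairings) covers your torch version. torchvision_pin and pip_install_command are hypothetical helper names, and the code only builds the command rather than running it.

```python
import sys

# Subset of the official torch -> torchvision release pairings.
# Assumption: add your own torch version here if it is missing.
TORCHVISION_FOR_TORCH = {
    "2.4.0": "0.19.0",
    "2.4.1": "0.19.1",
    "2.5.0": "0.20.0",
    "2.5.1": "0.20.1",
    "2.6.0": "0.21.0",
    "2.7.0": "0.22.0",
}


def torchvision_pin(torch_version):
    """Return the torchvision version paired with the given torch version."""
    base = torch_version.split("+")[0]  # ignore a build tag such as '+cu126'
    if base not in TORCHVISION_FOR_TORCH:
        raise ValueError(f"no known torchvision pairing for torch {base}")
    return TORCHVISION_FOR_TORCH[base]


def pip_install_command(torch_version):
    """Build (but do not run) a pinned pip install command as an argument list."""
    pin = torchvision_pin(torch_version)
    return [sys.executable, "-m", "pip", "install", f"torchvision=={pin}"]


if __name__ == "__main__":
    print(" ".join(pip_install_command("2.6.0")))
```

Pinning an exact version keeps pip from pulling in a newer torchvision, which is what triggers the unwanted torch upgrade described above.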
After the matching version is installed, the error no longer occurs.
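One way to check the fix without starting the whole service is to call the operator named in the first error directly. The sketch below is an illustration only: torchvision_smoke_test is a hypothetical helper, and it reports "skipped" when torch or torchvision is not installed at all.

```python
def torchvision_smoke_test():
    """Return 'ok', 'skipped' (package missing), or 'failed: <reason>'."""
    try:
        import torch
        from torchvision.ops import nms
    except ModuleNotFoundError:
        return "skipped"
    except Exception as exc:  # e.g. the RuntimeError or ImportError shown above
        return f"failed: {exc}"

    try:
        # Boxes 0 and 1 overlap heavily (IoU of about 0.68); box 2 is far away.
        boxes = torch.tensor([
            [0.0, 0.0, 10.0, 10.0],
            [1.0, 1.0, 11.0, 11.0],
            [50.0, 50.0, 60.0, 60.0],
        ])
        scores = torch.tensor([0.9, 0.8, 0.7])
        kept = nms(boxes, scores, 0.5).tolist()
    except Exception as exc:
        return f"failed: {exc}"

    # With an IoU threshold of 0.5, box 1 is suppressed by box 0.
    if kept == [0, 2]:
        return "ok"
    return f"failed: unexpected result {kept}"


if __name__ == "__main__":
    print(torchvision_smoke_test())
```

If this prints ok inside the container, torchvision's compiled operators are loading correctly against the installed torch.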