The conclusion up front: as of 2025-07-25, it does not work. The ERNIE 4.5 models cannot run inference under llama.cpp or Ollama, essentially because llama.cpp does not support ERNIE 4.5's heterogeneous MoE architecture.
This is not specific to FreeBSD: the same attempt fails on Windows, and in theory it will not work on Ubuntu either.
What was tried
Installing llama-cpp
First, install llama-cpp with pkg:
pkg install llama-cpp
I also tried building from source.
Download the source code:
git clone https://github.com/ggerganov/llama.cpp
Enter the llama.cpp directory and build:
mkdir build
cd build
cmake ..
cmake --build . --config Release
Add the build output directory to PATH:
export PATH=~/github/llama.cpp/build/bin:$PATH
With that, the llama.cpp binaries can be run.
With a plain build, the resulting executable is named main, and running it looks like this:
main -m ~/work/model/chinesellama/ggml-model-f16.gguf -p "Building a website can be done in 10 simple steps:\nStep 1:" -n 400 -e
Downloading the model
Download it from: unsloth/ERNIE-4.5-0.3B-PT-GGUF at main
If the download is slow, you can instead download from the official Hugging Face site, which of course requires a proxy.
Once downloaded:
ls E*
ERNIE-4.5-0.3B-PT-F16.gguf ERNIE-4.5-0.3B-PT-Q2_K.gguf
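Before blaming the runtime, it can be worth checking that the download itself is intact: every valid GGUF file begins with the four magic bytes GGUF. A minimal sketch (the file name in the comment is just the one downloaded above):

```python
def looks_like_gguf(path):
    """Cheap sanity check: a valid GGUF file starts with the magic bytes b'GGUF'."""
    with open(path, "rb") as f:
        return f.read(4) == b"GGUF"

# Example usage:
# looks_like_gguf("ERNIE-4.5-0.3B-PT-Q2_K.gguf")  # True for an intact download
```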
Alternatively, download the original model files and convert them to GGUF with llama.cpp's conversion script (older trees shipped convert.py; recent checkouts name it convert_hf_to_gguf.py instead):
python convert.py ~/work/model/chinesellama/
Running
llama-cli -m ERNIE-4.5-0.3B-PT-Q2_K.gguf -p "hello"
If the built executable is main, run instead:
main -m ERNIE-4.5-0.3B-PT-Q2_K.gguf -p "hello"
The run failed.
Summary
As of now, ERNIE 4.5 cannot be run with llama.cpp for inference.
Frankly, this does limit how widely ERNIE 4.5 can spread.
Debugging
The run aborted with: Terminating due to uncaught exception 0x28323c45c340 of type std::runtime_error
main: chat template is available, enabling conversation mode (disable it with -no-cnv)
*** User-specified prompt will pre-start conversation, did you mean to set --system-prompt (-sys) instead?
[New LWP 112399 of process 29362]
[New LWP 112400 of process 29362]
[New LWP 112401 of process 29362]
[New LWP 112402 of process 29362]
0x0000000829dc1818 in _wait4 () from /lib/libc.so.7
#0  0x0000000829dc1818 in _wait4 () from /lib/libc.so.7
#1  0x0000000821b3993c in ?? () from /lib/libthr.so.3
#2  0x00000008231e6809 in ?? () from /usr/local/lib/libggml-base.so
#3  0x00000008281be199 in std::terminate() () from /lib/libcxxrt.so.1
#4  0x00000008281be674 in ?? () from /lib/libcxxrt.so.1
#5  0x00000008281be589 in __cxa_throw () from /lib/libcxxrt.so.1
#6  0x00000000002d8070 in ?? ()
#7  0x00000000002d8adc in ?? ()
#8  0x000000000025e8b8 in ?? ()
#9  0x0000000829d0dc3a in __libc_start1 () from /lib/libc.so.7
#10 0x000000000025e120 in ?? ()
[Inferior 1 (process 29362) detached]
Terminating due to uncaught exception 0x28323c45c340 of type std::runtime_error
Abort trap (core dumped)
At first glance this looked like it might be an out-of-memory problem.
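The FreeBSD crash message does not say what the runtime_error actually was. One way to narrow it down is to read the architecture string the model file declares about itself, straight from the GGUF metadata header. Below is a minimal hand-rolled parser sketch; it handles only scalar and string metadata values and gives up at arrays, which is usually enough because converters conventionally write general.architecture as the first entry:

```python
import struct

# Byte sizes of GGUF scalar metadata value types (type id -> size);
# type 8 is a string, type 9 an array (not handled by this sketch).
SCALAR_SIZES = {0: 1, 1: 1, 2: 2, 3: 2, 4: 4, 5: 4, 6: 4, 7: 1, 10: 8, 11: 8, 12: 8}

def gguf_architecture(path):
    """Return the general.architecture string declared in a GGUF file, or None."""
    with open(path, "rb") as f:
        if f.read(4) != b"GGUF":
            raise ValueError("not a GGUF file")
        _version, _n_tensors, n_kv = struct.unpack("<IQQ", f.read(20))
        for _ in range(n_kv):
            key_len, = struct.unpack("<Q", f.read(8))
            key = f.read(key_len).decode("utf-8")
            vtype, = struct.unpack("<I", f.read(4))
            if vtype == 8:  # string value
                val_len, = struct.unpack("<Q", f.read(8))
                val = f.read(val_len).decode("utf-8")
                if key == "general.architecture":
                    return val
            elif vtype in SCALAR_SIZES:
                f.seek(SCALAR_SIZES[vtype], 1)  # skip a scalar we don't care about
            else:
                break  # array value: this sketch stops here
    return None
```

Run against ERNIE-4.5-0.3B-PT-F16.gguf, this should report ernie4_5, the same string the Windows log below complains about.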
Later, running llama.cpp on Windows produced a clearer error:
print_info: file size = 688.14 MiB (16.00 BPW)
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'ernie4_5'
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model 'e:\360Downloads\ERNIE-4.5-0.3B-PT-F16.gguf'
main: error: unable to load model
This confirms that this model really cannot be run with llama.cpp: the build simply does not recognize the ernie4_5 architecture.
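To check whether a given llama.cpp checkout knows an architecture name at all, one can simply search the source tree for the quoted string. The sketch below assumes the layout of recent llama.cpp trees, where the architecture name table lives under src/ (e.g. src/llama-arch.cpp); adjust the path for older checkouts:

```python
from pathlib import Path

def arch_in_llama_source(repo_dir, arch):
    """Return True if any .cpp file under <repo_dir>/src contains the
    quoted architecture name, e.g. "ernie4_5"."""
    needle = f'"{arch}"'
    return any(needle in p.read_text(errors="ignore")
               for p in Path(repo_dir, "src").glob("*.cpp"))

# Hypothetical usage:
# arch_in_llama_source("/home/me/github/llama.cpp", "ernie4_5")
```

If this returns False for ernie4_5, the checkout predates any ERNIE 4.5 support and will fail exactly as shown above.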