The conclusion up front: as of 2025-07-25, it does not work. The ERNIE 4.5 models cannot run inference under llama.cpp or Ollama, essentially because llama.cpp does not support ERNIE 4.5's heterogeneous MoE architecture.
This is not specific to FreeBSD: the same attempt fails on Windows, and in theory it will not work on Ubuntu either.
What was tried
Installing llama-cpp
First, install llama-cpp with pkg:
pkg install llama-cpp
I also tried building from source.
Download the source code:
git clone https://github.com/ggerganov/llama.cpp
Enter the llama.cpp directory and build:
mkdir build
cd build
cmake ..
cmake --build . --config Release
Add the build output directory to PATH:
export PATH=~/github/llama.cpp/build/bin:$PATH
With that, the llama.cpp binaries can be run.
With a plain build, the resulting executable is named main, and running it looks like this:
main -m ~/work/model/chinesellama/ggml-model-f16.gguf -p "Building a website can be done in 10 simple steps:\nStep 1:" -n 400 -e
Downloading the model
Download it from: unsloth/ERNIE-4.5-0.3B-PT-GGUF at main
If the download is slow, you can instead download from the official Hugging Face site, which of course requires a proxy.
Once downloaded:
ls E*
ERNIE-4.5-0.3B-PT-F16.gguf ERNIE-4.5-0.3B-PT-Q2_K.gguf
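Before blaming the runtime, it can be worth checking that the download itself is intact: every valid GGUF file begins with the four magic bytes GGUF. A minimal sketch (the file name in the comment is just the one downloaded above):

```python
def looks_like_gguf(path):
    """Cheap sanity check: a valid GGUF file starts with the magic bytes b'GGUF'."""
    with open(path, "rb") as f:
        return f.read(4) == b"GGUF"

# Example usage:
# looks_like_gguf("ERNIE-4.5-0.3B-PT-Q2_K.gguf")  # True for an intact download
```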
Alternatively, download the original model files and convert them to GGUF with llama.cpp's conversion script (older trees shipped convert.py; recent checkouts name it convert_hf_to_gguf.py instead):
python convert.py ~/work/model/chinesellama/
Running
llama-cli -m ERNIE-4.5-0.3B-PT-Q2_K.gguf -p "hello"
If the built executable is main, run instead:
main -m ERNIE-4.5-0.3B-PT-Q2_K.gguf -p "hello"
The run failed.
Summary
As of now, ERNIE 4.5 cannot be run with llama.cpp for inference.
Frankly, this does limit how widely ERNIE 4.5 can spread.
Debugging
The run aborted with: Terminating due to uncaught exception 0x28323c45c340 of type std::runtime_error
main: chat template is available, enabling conversation mode (disable it with -no-cnv)
*** User-specified prompt will pre-start conversation, did you mean to set --system-prompt (-sys) instead?
[New LWP 112399 of process 29362]
[New LWP 112400 of process 29362]
[New LWP 112401 of process 29362]
[New LWP 112402 of process 29362]
0x0000000829dc1818 in _wait4 () from /lib/libc.so.7
#0  0x0000000829dc1818 in _wait4 () from /lib/libc.so.7
#1  0x0000000821b3993c in ?? () from /lib/libthr.so.3
#2  0x00000008231e6809 in ?? () from /usr/local/lib/libggml-base.so
#3  0x00000008281be199 in std::terminate() () from /lib/libcxxrt.so.1
#4  0x00000008281be674 in ?? () from /lib/libcxxrt.so.1
#5  0x00000008281be589 in __cxa_throw () from /lib/libcxxrt.so.1
#6  0x00000000002d8070 in ?? ()
#7  0x00000000002d8adc in ?? ()
#8  0x000000000025e8b8 in ?? ()
#9  0x0000000829d0dc3a in __libc_start1 () from /lib/libc.so.7
#10 0x000000000025e120 in ?? ()
[Inferior 1 (process 29362) detached]
Terminating due to uncaught exception 0x28323c45c340 of type std::runtime_error
Abort trap (core dumped)
At first glance this looked like it might be an out-of-memory problem.
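The FreeBSD crash message does not say what the runtime_error actually was. One way to narrow it down is to read the architecture string the model file declares about itself, straight from the GGUF metadata header. Below is a minimal hand-rolled parser sketch; it handles only scalar and string metadata values and gives up at arrays, which is usually enough because converters conventionally write general.architecture as the first entry:

```python
import struct

# Byte sizes of GGUF scalar metadata value types (type id -> size);
# type 8 is a string, type 9 an array (not handled by this sketch).
SCALAR_SIZES = {0: 1, 1: 1, 2: 2, 3: 2, 4: 4, 5: 4, 6: 4, 7: 1, 10: 8, 11: 8, 12: 8}

def gguf_architecture(path):
    """Return the general.architecture string declared in a GGUF file, or None."""
    with open(path, "rb") as f:
        if f.read(4) != b"GGUF":
            raise ValueError("not a GGUF file")
        _version, _n_tensors, n_kv = struct.unpack("<IQQ", f.read(20))
        for _ in range(n_kv):
            key_len, = struct.unpack("<Q", f.read(8))
            key = f.read(key_len).decode("utf-8")
            vtype, = struct.unpack("<I", f.read(4))
            if vtype == 8:  # string value
                val_len, = struct.unpack("<Q", f.read(8))
                val = f.read(val_len).decode("utf-8")
                if key == "general.architecture":
                    return val
            elif vtype in SCALAR_SIZES:
                f.seek(SCALAR_SIZES[vtype], 1)  # skip a scalar we don't care about
            else:
                break  # array value: this sketch stops here
    return None
```

Run against ERNIE-4.5-0.3B-PT-F16.gguf, this should report ernie4_5, the same string the Windows log below complains about.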
Later, running llama.cpp on Windows produced a clearer error:
print_info: file size = 688.14 MiB (16.00 BPW)
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'ernie4_5'
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model 'e:\360Downloads\ERNIE-4.5-0.3B-PT-F16.gguf'
main: error: unable to load model
This confirms that this model really cannot be run with llama.cpp: the build simply does not recognize the ernie4_5 architecture.
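To check whether a given llama.cpp checkout knows an architecture name at all, one can simply search the source tree for the quoted string. The sketch below assumes the layout of recent llama.cpp trees, where the architecture name table lives under src/ (e.g. src/llama-arch.cpp); adjust the path for older checkouts:

```python
from pathlib import Path

def arch_in_llama_source(repo_dir, arch):
    """Return True if any .cpp file under <repo_dir>/src contains the
    quoted architecture name, e.g. "ernie4_5"."""
    needle = f'"{arch}"'
    return any(needle in p.read_text(errors="ignore")
               for p in Path(repo_dir, "src").glob("*.cpp"))

# Hypothetical usage:
# arch_in_llama_source("/home/me/github/llama.cpp", "ernie4_5")
```

If this returns False for ernie4_5, the checkout predates any ERNIE 4.5 support and will fail exactly as shown above.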