Hardware environment
Radxa Orion O6 development board (Cix P1 SoC) + RX 580 graphics card
- Product introduction: https://docs.radxa.com/orion/o6/getting-started/introduction
OpenHarmony 5.0.0
llama.cpp with the Vulkan backend (GPU)
# ./llama-bench -m /data/qwen1_5-0_5b-chat-q2_k.gguf -ngl 100
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 580 2048SP (RADV POLARIS10) (radv) | uma: 0 | fp16: 0 | bf16: 0 | warp size: 64 | shared memory : 65536 | int dot: 0 | matrix cores: none
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen2 0.5B Q2_K - Medium | 278.92 MiB | 619.57 M | Vulkan | 100 | pp512 | 2425.55 ± 2.33 |
| qwen2 0.5B Q2_K - Medium | 278.92 MiB | 619.57 M | Vulkan | 100 | tg128 | 136.98 ± 7.70 |

build: unknown (0)
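During this run the RX 580 can be observed running close to full load. In theory an RX 7900 XTX 24G should also work in its place, but the author's budget does not stretch that far.
CPU only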
# ./llama-bench -m /data/qwen1_5-0_5b-chat-q2_k.gguf
| model | size | params | backend | threads | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | --------------: | -------------------: |
| qwen2 0.5B Q2_K - Medium | 278.92 MiB | 619.57 M | CPU | 12 | pp512 | 138.70 ± 0.36 |
| qwen2 0.5B Q2_K - Medium | 278.92 MiB | 619.57 M | CPU | 12 | tg128 | 8.41 ± 0.22 |

build: unknown (0)
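Conclusion
Significant GPU advantage: for on-device inference of this model, the Vulkan backend is about 17.5x faster than the 12-thread CPU run for prompt processing (pp512) and about 16.3x faster for token generation (tg128), which makes it especially well suited to high-throughput tasks.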
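
The speedup in the conclusion can be checked directly from the two llama-bench tables. The following is a minimal sketch that simply divides the Vulkan throughput by the CPU throughput for each test. The dictionary names and the `speedup` helper are illustrative only and are not part of llama.cpp. The numbers are the mean t/s values copied from the tables, and the ± spread is ignored.

```python
# Mean throughput in tokens/s, copied from the two llama-bench tables above.
# The +/- spread reported by llama-bench is ignored in this sketch.
vulkan_tps = {"pp512": 2425.55, "tg128": 136.98}
cpu_tps = {"pp512": 138.70, "tg128": 8.41}


def speedup(gpu, cpu):
    """Return the GPU/CPU throughput ratio for each test name (hypothetical helper)."""
    return {test: gpu[test] / cpu[test] for test in gpu}


ratios = speedup(vulkan_tps, cpu_tps)
for test, ratio in ratios.items():
    print(f"{test}: {ratio:.1f}x")
# prints:
# pp512: 17.5x
# tg128: 16.3x
```

Both ratios come from a single small model (qwen2 0.5B Q2_K), so they should be read as a data point for this setup rather than a general figure.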