Back to Table of Contents
[Review] First Impressions of the Qwen3-Embedding Models
The model's introduction page
0.6B test machine: laptop with an i5-8265U, 16 GB RAM, no discrete GPU (integrated graphics only), Windows 10
8B test machine: AMD 8700G, 64 GB RAM, RTX 4090D with 24 GB VRAM, Ubuntu 24.04
Below, the sample code from the introduction page is used as-is to try out the models.
1. Download the models with modelscope
$ modelscope download --model Qwen/Qwen3-Embedding-0.6B
$ modelscope download --model Qwen/Qwen3-Embedding-8B
0.6B model: 1.12 GB; 8B model: 14.1 GB
2. Modify the sample code to load the model from a local path
Running the default sample code fails with:
OSError: We couldn't connect to 'https://huggingface.co' to load the files, and couldn't find them in the cached files.
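Besides hard-coding a local path into the script (as done below), the Hugging Face libraries can be told up front to work offline; a minimal sketch, assuming the model files are already in the local cache:

```shell
# Skip all lookups against huggingface.co and use only locally cached files.
export HF_HUB_OFFLINE=1
export TRANSFORMERS_OFFLINE=1
```

With these set, a genuinely missing file fails immediately instead of timing out on the network.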
# test_qwen3-embedding.py
# Requires transformers>=4.51.0
# Requires sentence-transformers>=2.7.0
from sentence_transformers import SentenceTransformer

# Load the model
# model = SentenceTransformer("Qwen/Qwen3-Embedding-8B")  # replaced with a local load:
model = SentenceTransformer(r"C:\Users\Administrator\.cache\modelscope\hub\models\Qwen\Qwen3-Embedding-8B")

# We recommend enabling flash_attention_2 for better acceleration and memory saving,
# together with setting `padding_side` to "left":
# model = SentenceTransformer(
#     "Qwen/Qwen3-Embedding-8B",
#     model_kwargs={"attn_implementation": "flash_attention_2", "device_map": "auto"},
#     tokenizer_kwargs={"padding_side": "left"},
# )

# The queries and documents to embed
queries = [
    "What is the capital of China?",
    "Explain gravity",
]
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun.",
]

# Encode the queries and documents. Note that queries benefit from using a prompt
# Here we use the prompt called "query" stored under `model.prompts`, but you can
# also pass your own prompt via the `prompt` argument
query_embeddings = model.encode(queries, prompt_name="query")
document_embeddings = model.encode(documents)

# Compute the (cosine) similarity between the query and document embeddings
similarity = model.similarity(query_embeddings, document_embeddings)
print(similarity)
# tensor([[0.7493, 0.0751],
#         [0.0880, 0.6318]])
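Per the comment in the sample, `model.similarity` here computes cosine similarity between the two embedding matrices. The same numbers can be reproduced by hand; a minimal NumPy sketch, with toy 2-D vectors standing in for the real embeddings:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a_norm = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_norm = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_norm @ b_norm.T  # shape: (len(a), len(b))

# Toy stand-ins for query/document embeddings:
q = np.array([[1.0, 0.0], [0.0, 1.0]])
d = np.array([[1.0, 1.0], [0.0, 2.0]])
print(cosine_similarity(q, d))
# [[0.70710678 0.        ]
#  [0.70710678 1.        ]]
```

Each entry [i][j] is the similarity of query i to document j, which is why the diagonal dominates when queries are paired with their matching documents.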
Probably because the machine's specs are too low (a 14.1 GB checkpoint on 16 GB of RAM, CPU only), the run stalled at shard loading and never produced a result. The SyntaxWarning in the log comes from the unescaped `\m` in the original path string; a raw string (r"...") or doubled backslashes avoids it.
D:\workspace\test_qwen3-embedding.py:8: SyntaxWarning: invalid escape sequence '\m'
model = SentenceTransformer("C:\Users\Administrator\.cache\modelscope\hub\models\Qwen\Qwen3-Embedding-8B")
Loading checkpoint shards:  25%|██████████████▎ | 1/4 [00:14<00:42, 14.24s/it]
3. Switch the sample code to the 0.6B model
# test_qwen3-embedding.py
...
# Load the model
# model = SentenceTransformer("Qwen/Qwen3-Embedding-8B")  # replaced with a local load:
model = SentenceTransformer(r"C:\Users\Administrator\.cache\modelscope\hub\models\Qwen\Qwen3-Embedding-0.6B")
...
(workspace) PS D:\workspace> uv run .\test_qwen3-embedding.py
D:\workspace\test_qwen3-embedding.py:8: SyntaxWarning: invalid escape sequence '\m'
model = SentenceTransformer("C:\Users\Administrator\.cache\modelscope\hub\models\Qwen\Qwen3-Embedding-0.6B")
tensor([[0.7646, 0.1414],
[0.1355, 0.6000]])
It runs successfully and produces the result within a few seconds, with the CPU fans spinning hard.
4. Running the 8B model on the 4090D machine
The first attempt failed with:
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 192.00 MiB. GPU 0 has a total capacity of 23.64 GiB of which 72.94 MiB is free. Process 3052744 has 434.64 MiB memory in use. Including non-PyTorch memory, this process has 23.12 GiB memory in use. Of the allocated memory 22.78 GiB is allocated by PyTorch, and 1.10 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
# test_qwen3-embedding.py
...
# Load the model
model = SentenceTransformer(
    "/mnt/wd4t/models/modlescope/Qwen3-Embedding-8B",
    device="cuda",
    model_kwargs={"torch_dtype": "auto"},  # modified loading line: use the checkpoint's dtype instead of float32
)
$ uv run test_qwen3_embedding.py
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████| 4/4 [00:01<00:00,  2.48it/s]
tensor([[0.7471, 0.0770],
        [0.0894, 0.6321]])
The output closely matches the reference result in the sample code's comments.
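The earlier out-of-memory error is consistent with rough arithmetic: without an explicit `torch_dtype`, the weights are loaded as float32, and 8B parameters at 4 bytes each already exceed the 4090D's 24 GB before any activations are counted. Assuming the checkpoint ships half-precision (e.g. bfloat16) weights, as most modern releases do, `torch_dtype="auto"` halves that footprint:

```python
# Back-of-the-envelope VRAM needed for the weights alone of an 8B-parameter model.
params = 8e9
fp32_gb = params * 4 / 1024**3  # float32 load: ~29.8 GB -> OOM on a 24 GB card
bf16_gb = params * 2 / 1024**3  # half precision: ~14.9 GB -> fits with room to spare
print(f"fp32: {fp32_gb:.1f} GB, bf16: {bf16_gb:.1f} GB")
```

This ignores KV caches and activation memory, but for embedding a handful of short texts the weights dominate.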
Back to Table of Contents