Open-Source Embedding LLM - Qwen3-Embedding

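1 Introduction to Qwen3-Embedding

Qwen3-Embedding is released under the Apache 2.0 license. Model sizes range from 0.6B to 8B parameters, and the models support encoding long texts of up to 32K tokens.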

Model Type	Models	Size	Layers	Sequence Length	Embedding Dimension	MRL Support	Instruction Aware
Text Embedding	Qwen3-Embedding-0.6B	0.6B	28	32K	1024	Yes	Yes
Text Embedding	Qwen3-Embedding-4B	4B	36	32K	2560	Yes	Yes
Text Embedding	Qwen3-Embedding-8B	8B	36	32K	4096	Yes	Yes
Text Reranking	Qwen3-Reranker-0.6B	0.6B	28	32K	-	-	Yes
Text Reranking	Qwen3-Reranker-4B	4B	36	32K	-	-	Yes
Text Reranking	Qwen3-Reranker-8B	8B	36	32K	-	-	Yes
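
The "MRL Support" column refers to Matryoshka Representation Learning, which is generally understood to mean that a leading slice of the embedding vector can serve as a smaller embedding. A minimal sketch, assuming the usual truncate-then-renormalize recipe; the helper name `truncate_embedding` and the toy vector are hypothetical and not part of any Qwen3-Embedding API:

```python
import math

def truncate_embedding(vector, dim):
    """Keep the first `dim` components and rescale the result to unit length."""
    if not 0 < dim <= len(vector):
        raise ValueError("dim must be between 1 and len(vector)")
    head = vector[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return list(head)  # nothing to normalize
    return [x / norm for x in head]

# Toy 4-dimensional "embedding" (made-up numbers, for illustration only)
full = [3.0, 4.0, 12.0, 0.5]
small = truncate_embedding(full, 2)
print(small)  # [0.6, 0.8]
```

sentence-transformers exposes a `truncate_dim` argument on `SentenceTransformer` that is intended for this kind of use; whether it is available depends on the installed version, so check its documentation before relying on it.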

2 sentence-transformers example

Install sentence-transformers

pip install -U sentence-transformers -i https://pypi.tuna.tsinghua.edu.cn/simple

Code example

import os
os.environ['HF_ENDPOINT'] = "https://hf-mirror.com"

# Requires transformers>=4.51.0
# Requires sentence-transformers>=2.7.0
import torch
from sentence_transformers import SentenceTransformer

# Load the model
model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")

# We recommend enabling flash_attention_2 for better acceleration and memory saving,
# together with setting `padding_side` to "left":
# model = SentenceTransformer(
#     "Qwen/Qwen3-Embedding-0.6B",
#     model_kwargs={"attn_implementation": "flash_attention_2", "device_map": "auto"},
#     tokenizer_kwargs={"padding_side": "left"},
# )

# The queries and documents to embed
queries = [
    "What is the capital of China?",
    "Explain gravity",
]
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun.",
]

with torch.no_grad():
    # Encode the queries and documents. Note that queries benefit from using a prompt
    # Here we use the prompt called "query" stored under `model.prompts`, but you can
    # also pass your own prompt via the `prompt` argument
    query_embeddings = model.encode(queries, prompt_name="query")
    document_embeddings = model.encode(documents)

    # Compute the (cosine) similarity between the query and document embeddings
    similarity = model.similarity(query_embeddings, document_embeddings)

print(similarity)
# tensor([[0.7646, 0.1414], [0.1355, 0.6000]])
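
To make the similarity matrix easier to read, here is a small sketch in plain Python of what a cosine similarity is and how the best document per query can be picked from the matrix. The helper names `cosine_similarity` and `best_match` are hypothetical; the scores are copied from the example output above, with rows as queries and columns as documents.

```python
import math

def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def best_match(similarity_row):
    """Index of the highest-scoring document for one query."""
    return max(range(len(similarity_row)), key=lambda j: similarity_row[j])

# Rows are queries, columns are documents (values from the example output)
similarity = [[0.7646, 0.1414], [0.1355, 0.6000]]
ranking = [best_match(row) for row in similarity]
print(ranking)  # [0, 1]
```

Each query scores highest against its own document (index 0 for the first query, index 1 for the second), which is the expected outcome for this toy data.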

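3 Hugging Face transformers example

For the installation steps, see the CSDN blog post "Open-Source Embedding LLM - BGE (BAAI General Embedding)".

The example code is as follows. It loads Qwen3-Reranker-0.6B and scores each query-document pair.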

import os
os.environ['HF_ENDPOINT'] = "https://hf-mirror.com"

# Requires transformers>=4.51.0
import torch
from transformers import AutoModel, AutoTokenizer, AutoModelForCausalLM


def format_instruction(instruction, query, doc):
    # Fall back to a generic retrieval instruction when none is given
    if instruction is None:
        instruction = 'Given a web search query, retrieve relevant passages that answer the query'
    output = "<Instruct>: {instruction}\n<Query>: {query}\n<Document>: {doc}".format(
        instruction=instruction, query=query, doc=doc)
    return output


def process_inputs(pairs):
    # Tokenize without padding, leaving room for the fixed prefix and suffix tokens
    inputs = tokenizer(
        pairs, padding=False, truncation='longest_first',
        return_attention_mask=False,
        max_length=max_length - len(prefix_tokens) - len(suffix_tokens)
    )
    # Wrap every sequence in the chat-template prefix and suffix
    for i, ele in enumerate(inputs['input_ids']):
        inputs['input_ids'][i] = prefix_tokens + ele + suffix_tokens
    # Pad the batch and move it to the model's device
    inputs = tokenizer.pad(inputs, padding=True, return_tensors="pt", max_length=max_length)
    for key in inputs:
        inputs[key] = inputs[key].to(model.device)
    return inputs


@torch.no_grad()
def compute_logits(inputs, **kwargs):
    # Logits at the last position, restricted to the "no" and "yes" tokens
    batch_scores = model(**inputs).logits[:, -1, :]
    true_vector = batch_scores[:, token_true_id]
    false_vector = batch_scores[:, token_false_id]
    batch_scores = torch.stack([false_vector, true_vector], dim=1)
    # Softmax over the two logits; the "yes" probability is the score
    batch_scores = torch.nn.functional.log_softmax(batch_scores, dim=1)
    scores = batch_scores[:, 1].exp().tolist()
    return scores


tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-Reranker-0.6B", padding_side='left')
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-Reranker-0.6B").eval()

# We recommend enabling flash_attention_2 for better acceleration and memory saving.
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-Reranker-0.6B", torch_dtype=torch.float16, attn_implementation="flash_attention_2").cuda().eval()

token_false_id = tokenizer.convert_tokens_to_ids("no")
token_true_id = tokenizer.convert_tokens_to_ids("yes")
max_length = 8192

prefix = "<|im_start|>system\nJudge whether the Document meets the requirements based on the Query and the Instruct provided. Note that the answer can only be \"yes\" or \"no\".<|im_end|>\n<|im_start|>user\n"
suffix = "<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n"
prefix_tokens = tokenizer.encode(prefix, add_special_tokens=False)
suffix_tokens = tokenizer.encode(suffix, add_special_tokens=False)

task = 'Given a web search query, retrieve relevant passages that answer the query'

queries = [
    "What is the capital of China?",
    "Explain gravity",
]

documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun.",
]

pairs = [format_instruction(task, query, doc) for query, doc in zip(queries, documents)]

# Tokenize the input texts
inputs = process_inputs(pairs)
scores = compute_logits(inputs)

print("scores: ", scores)
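
The scoring step in `compute_logits` reduces to a two-way softmax over the "no" and "yes" logits, with the "yes" share used as the relevance score. A minimal sketch of that arithmetic in plain Python, without the model; the function name `yes_probability` and the sample logits are made up for illustration:

```python
import math

def yes_probability(logit_no, logit_yes):
    """Two-way softmax over the logits, returning the share assigned to "yes"."""
    m = max(logit_no, logit_yes)  # subtract the max for numerical stability
    exp_no = math.exp(logit_no - m)
    exp_yes = math.exp(logit_yes - m)
    return exp_yes / (exp_no + exp_yes)

# Equal logits mean the model is undecided
print(yes_probability(0.0, 0.0))  # 0.5

# A much larger "yes" logit pushes the score close to 1
strong_yes = yes_probability(-2.0, 3.0)
```

Because only the two logits enter the softmax, the score always lies between 0 and 1 and depends only on the difference between the "yes" and "no" logits.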

