Elasticsearch: How to use Qwen3 for vector search

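In this article we will use Qwen3 to run vector search over our data. The qwen3 model is used as the embedding model to vectorize the data, and Qwen3 is used for inference over it. Before reading on, please read the earlier article "How to set up and run Qwen3 locally with Ollama".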

Installation

Elasticsearch and Kibana

If you have not installed Elasticsearch and Kibana yet, follow the article "Run Elasticsearch locally with the start-local script" to install them. By default, this setup has no SSL configured:

$ curl -fsSL https://elastic.co/start-local | sh
-------------------------------------------------
🚀 Run Elasticsearch and Kibana for local testing
-------------------------------------------------
Do not use this script in a production environment
Setting up Elasticsearch and Kibana v9.1.2-arm64...
- Generated random passwords
- Created the elastic-start-local folder containing the files:
  - .env, with settings
  - docker-compose.yml, for Docker services
  - start/stop/uninstall commands
- Running docker compose up --wait
[+] Running 6/6
 Network elastic-start-local_default             Created     0.1s
 Volume "elastic-start-local_dev-kibana"         Create...   0.0s
 Volume "elastic-start-local_dev-elasticsearch"  Created     0.0s
 Container es-local-dev                          Healthy    22.0s
 Container kibana-local-settings                 Exited     21.9s
 Container kibana-local-dev                      Healthy    31.9s
🎉 Congrats, Elasticsearch and Kibana are installed and running in Docker!
🌐 Open your browser at http://localhost:5601
   Username: elastic
   Password: u06Imqiu
🔌 Elasticsearch API endpoint: http://localhost:9200
🔑 API key: QzNJQnA1Z0JiSkRyN2UwaUk3VFQ6dXFrdkFvRkt1UXlJX2Z1bm5qblpndw==
Learn more at https://github.com/elastic/start-local
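The `API key` printed above is already in the encoded form that Elasticsearch accepts in an `Authorization: ApiKey <value>` header (it is the Base64 encoding of `id:api_key`), while the `elastic` user and password can be sent as HTTP Basic auth. A minimal standard-library sketch that builds both header values; the credentials are the ones from this particular run (yours will differ) and the helper names are hypothetical:

```python
import base64

# Credentials printed by start-local in the run above (yours will differ).
ES_USER = "elastic"
ES_PASSWORD = "u06Imqiu"
ENCODED_API_KEY = "QzNJQnA1Z0JiSkRyN2UwaUk3VFQ6dXFrdkFvRkt1UXlJX2Z1bm5qblpndw=="


def basic_auth_header(user: str, password: str) -> str:
    """Authorization header value for HTTP Basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def api_key_header(encoded_key: str) -> str:
    """Authorization header value for an already-encoded API key."""
    return f"ApiKey {encoded_key}"


def split_api_key(encoded_key: str) -> tuple[str, str]:
    """Decode the encoded key back into its (id, api_key) parts."""
    key_id, _, secret = base64.b64decode(encoded_key).decode("utf-8").partition(":")
    return key_id, secret


print(basic_auth_header(ES_USER, ES_PASSWORD))
print(api_key_header(ENCODED_API_KEY))
print(split_api_key(ENCODED_API_KEY)[0])  # the API key id
```

Either header value works with `curl -H "Authorization: ..." http://localhost:9200`; the API key avoids putting the superuser password into scripts.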

After installation, the Elasticsearch version I got was 9.1.2, the latest at the time of writing.

Writing data to Elasticsearch

We use the following code to write the data into Elasticsearch:

elasticsearch_qwen3.py

from langchain_community.vectorstores import ElasticsearchStore
from langchain_community.embeddings import OllamaEmbeddings

# Replace with your actual Elasticsearch endpoint
ELASTICSEARCH_URL = "http://localhost:9200"
INDEX_NAME = "my_embeddings_index"

# Initialize Ollama embeddings with the locally pulled qwen3 model
embeddings = OllamaEmbeddings(model="qwen3")

# Create the ElasticsearchStore (the index is created on the first write)
vectorstore = ElasticsearchStore(
    embedding=embeddings,
    es_url=ELASTICSEARCH_URL,
    index_name=INDEX_NAME,
    es_user="elastic",
    es_password="u06Imqiu",  # password printed by start-local; yours will differ
)

# Example: add two documents to the index
# str1: Alibaba (China) Co., Ltd. was founded on 2007-03-26; legal representative 蔣芳
str1 = "阿里巴巴（中國）有限公司成立于2007年03月26日，法定代表人蔣芳"
# str2: Baidu, a leading AI company with a strong Internet foundation, founded in Zhongguancun on 2000-01-01
str2 = "百度是擁有強大互聯網基礎的領先AI公司。百度愿景是：成為最懂用戶，并能幫助人們成長的全球頂級高科技公司。于2000年1月1日在中關村創建了百度公司"
docs = [str1, str2]

# Each text is embedded with qwen3 and bulk-indexed as one document
vectorstore.add_texts(docs)
print(f"Index '{INDEX_NAME}' created and documents added.")

We need to install the following Python packages:

pip install langchain_community elasticsearch

Here the qwen3 model turns each input sentence into a vector; any other embedding model could be used instead. Run the code above:


$ python3 elasticsearch_qwen3.py 
Index 'my_embeddings_index' created and documents added.
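`add_texts` embeds each text with the configured model and bulk-indexes one document per text into the `text`, `vector` and `metadata` fields (the same fields that appear in the mapping below). A minimal offline sketch of that document shape follows; `fake_embed` is a hypothetical stand-in for the Ollama call so the sketch runs without a model, the random UUID ids are an assumption, and this is not LangChain's actual implementation:

```python
import hashlib
import uuid
from typing import Callable, Iterable

INDEX_NAME = "my_embeddings_index"


def fake_embed(text: str, dims: int = 8) -> list[float]:
    """Hypothetical stand-in for OllamaEmbeddings: deterministic but NOT semantic."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [byte / 255.0 for byte in digest[:dims]]


def build_bulk_actions(
    texts: Iterable[str],
    embed: Callable[[str], list[float]],
    index_name: str,
) -> list[dict]:
    """One bulk 'index' action per text, using the text/vector/metadata layout."""
    return [
        {
            "_op_type": "index",
            "_index": index_name,
            "_id": str(uuid.uuid4()),  # assumption: random ids when none are supplied
            "text": text,
            "vector": embed(text),
            "metadata": {},
        }
        for text in texts
    ]


# The two sample texts (the second one shortened)
docs = ["阿里巴巴（中國）有限公司成立于2007年03月26日，法定代表人蔣芳", "百度是擁有強大互聯網基礎的領先AI公司。"]
actions = build_bulk_actions(docs, fake_embed, INDEX_NAME)
print(len(actions), len(actions[0]["vector"]))  # 2 8
```

With a real client, a list like `actions` could be handed to `elasticsearch.helpers.bulk(client, actions)`; as far as we can tell, that helper is also where the `bulk_kwargs` options used later in this article (`chunk_size`, `max_chunk_bytes`) end up.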

We can check the result in Kibana:

Two documents have been written.

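Each vector has 4096 dimensions because the model we use, qwen3:8b (what `qwen3` resolves to in Ollama), produces 4096-dimensional embeddings.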
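Because the `vector` field is created with a fixed `dims` of 4096, switching to another embedding model only works if it produces vectors of the same length; Elasticsearch rejects documents whose vector size differs from the mapping, so a model with a different output size needs a new index. A minimal sketch of a guard that could run before indexing; `check_dims` and `EXPECTED_DIMS` are hypothetical names:

```python
EXPECTED_DIMS = 4096  # dims of the 'vector' field in my_embeddings_index (qwen3:8b)


def check_dims(vectors: list[list[float]], expected: int = EXPECTED_DIMS) -> None:
    """Raise ValueError if any embedding length differs from the index's dims."""
    for i, vec in enumerate(vectors):
        if len(vec) != expected:
            raise ValueError(
                f"embedding {i} has {len(vec)} dims but the index expects {expected}; "
                "create a new index for this model"
            )


check_dims([[0.0] * 4096, [0.1] * 4096])  # matching size: passes silently
try:
    check_dims([[0.0] * 1024])  # e.g. a smaller model's output
except ValueError as err:
    print(err)
```

With the real model, the call would look like `check_dims(embeddings.embed_documents(docs))`.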

Kibana also shows the generated vectors. To inspect the index mapping, run:

GET my_embeddings_index/_mapping

{
  "my_embeddings_index": {
    "mappings": {
      "properties": {
        "metadata": {
          "type": "object"
        },
        "text": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "vector": {
          "type": "dense_vector",
          "dims": 4096,
          "index": true,
          "similarity": "cosine",
          "index_options": {
            "type": "bbq_hnsw",
            "m": 16,
            "ef_construction": 100,
            "rescore_vector": {
              "oversample": 3
            }
          }
        }
      }
    }
  }
}

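The mapping contains a text field holding the original sentence (with a keyword subfield), a vector field of type dense_vector with 4096 dims, cosine similarity and the bbq_hnsw index type, and a metadata object.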
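The `bbq_hnsw` index type is what keeps 4096-dim vectors manageable: as we understand it, BBQ (Better Binary Quantization) stores about one bit per dimension for the HNSW search, and `rescore_vector.oversample: 3` means roughly `k × 3` candidates are rescored with the original float vectors, which are still kept. The back-of-the-envelope sketch below reads the relevant values out of the mapping above and ignores BBQ's small per-vector correction terms; the helper name is hypothetical:

```python
import json

# The vector part of the mapping returned by GET my_embeddings_index/_mapping
MAPPING_JSON = """
{"my_embeddings_index": {"mappings": {"properties": {
  "vector": {"type": "dense_vector", "dims": 4096, "index": true,
             "similarity": "cosine",
             "index_options": {"type": "bbq_hnsw", "m": 16, "ef_construction": 100,
                               "rescore_vector": {"oversample": 3}}}}}}}
"""


def vector_storage_estimate(mapping: dict, index: str, field: str = "vector") -> dict:
    """Rough per-vector bytes: raw float32 vs. 1-bit BBQ codes (corrections ignored)."""
    props = mapping[index]["mappings"]["properties"][field]
    dims = props["dims"]
    return {
        "dims": dims,
        "float32_bytes": dims * 4,  # 4 bytes per float32 component
        "bbq_bytes": dims // 8,     # about 1 bit per dimension
        "oversample": props["index_options"]["rescore_vector"]["oversample"],
    }


est = vector_storage_estimate(json.loads(MAPPING_JSON), "my_embeddings_index")
print(est)  # {'dims': 4096, 'float32_bytes': 16384, 'bbq_bytes': 512, 'oversample': 3}
```

So each qwen3:8b vector is about 16 KiB as raw floats, versus about 512 bytes in the quantized form that the HNSW graph searches.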

We can then run a vector search against the index with the following code:

elasticsearch_qwen3.py

from langchain_community.vectorstores import ElasticsearchStore
from langchain_community.embeddings import OllamaEmbeddings

# Replace with your actual Elasticsearch endpoint
ELASTICSEARCH_URL = "http://localhost:9200"
INDEX_NAME = "my_embeddings_index"

# Queries must be embedded with the same model as the documents
embeddings = OllamaEmbeddings(model="qwen3")

# Connect the ElasticsearchStore to the index
vectorstore = ElasticsearchStore(
    embedding=embeddings,
    es_url=ELASTICSEARCH_URL,
    index_name=INDEX_NAME,
    es_user="elastic",
    es_password="u06Imqiu",
)

# Only add the example documents if the index does not exist yet,
# so re-running the script does not index duplicates
if not vectorstore.client.indices.exists(index=INDEX_NAME):
    str1 = "阿里巴巴（中國）有限公司成立于2007年03月26日，法定代表人蔣芳"
    str2 = "百度是擁有強大互聯網基礎的領先AI公司。百度愿景是：成為最懂用戶，并能幫助人們成長的全球頂級高科技公司。于2000年1月1日在中關村創建了百度公司"
    docs = [str1, str2]
    # Bulk helper limits per request: at most 300 docs or 4096 bytes
    vectorstore.add_texts(docs, bulk_kwargs={"chunk_size": 300, "max_chunk_bytes": 4096})
    print(f"Index '{INDEX_NAME}' created and documents added.")

# Query: "Who is Alibaba's legal representative?"
results = vectorstore.similarity_search(
    query=" alibaba法定代表人是誰",
    # k=1,
    # filter=[{"term": {"metadata.source.keyword": "tweet"}}],
)
# print(results, len(results))
for res in results:
    print(f"* {res.page_content}")

Run the code above:

$ python3 elasticsearch_qwen3.py 
* 阿里巴巴（中國）有限公司成立于2007年03月26日，法定代表人蔣芳
* 百度是擁有強大互聯網基礎的領先AI公司。百度愿景是：成為最懂用戶，并能幫助人們成長的全球頂級高科技公司。于2000年1月1日在中關村創建了百度公司
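Both documents come back for every query because `similarity_search` returns up to `k` results (4 by default in LangChain) and the index holds only two; the ranking is what carries the answer. Conceptually, the query is embedded with the same model and the stored vectors are ordered by the field's `cosine` similarity, which Elasticsearch reports as a score of (1 + cosine) / 2. The real index uses approximate HNSW search over BBQ-quantized vectors, so this exact brute-force version with made-up 3-dimensional vectors is only an illustration:

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query_vec: list[float], docs: dict[str, list[float]], k: int = 4):
    """Exact kNN: rank docs by an Elasticsearch-style cosine score in [0, 1]."""
    scored = [(name, (1 + cosine(query_vec, vec)) / 2) for name, vec in docs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Made-up 3-d vectors; real qwen3:8b embeddings have 4096 dims.
toy_docs = {"alibaba": [0.9, 0.1, 0.0], "baidu": [0.1, 0.9, 0.2]}
for name, score in top_k([1.0, 0.0, 0.0], toy_docs):
    print(f"{name}: {score:.3f}")  # alibaba ranks first
```

Uncommenting `k=1` in the script would keep only the best match, just as `top_k(..., k=1)` does here.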

Now change the query to:

results = vectorstore.similarity_search(
    query="中國的搜索引擎公司是哪個",  # "Which Chinese company is the search engine company?"
    # k=1,
    # filter=[{"term": {"metadata.source.keyword": "tweet"}}],
)

$ python3 elasticsearch_qwen3.py 
* 百度是擁有強大互聯網基礎的領先AI公司。百度愿景是：成為最懂用戶，并能幫助人們成長的全球頂級高科技公司。于2000年1月1日在中關村創建了百度公司
* 阿里巴巴（中國）有限公司成立于2007年03月26日，法定代表人蔣芳

Next, change the search query to:

results = vectorstore.similarity_search(
    query="淘寶",  # "Taobao"
    # k=1,
    # filter=[{"term": {"metadata.source.keyword": "tweet"}}],
)

$ python3 elasticsearch_qwen3.py 
* 阿里巴巴（中國）有限公司成立于2007年03月26日，法定代表人蔣芳
* 百度是擁有強大互聯網基礎的領先AI公司。百度愿景是：成為最懂用戶，并能幫助人們成長的全球頂級高科技公司。于2000年1月1日在中關村創建了百度公司
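The 淘寶 (Taobao) query is a useful check: it does not share a single character with either document, so plain keyword matching has nothing to work with, yet the Alibaba document still ranks first. That suggests the ranking comes from what the qwen3 embeddings encode about the two companies (Taobao is an Alibaba platform) rather than from surface overlap. A tiny sketch to confirm the zero overlap; shared characters are only a rough proxy for what an analyzer would tokenize, and `shared_chars` is a hypothetical helper:

```python
DOC_ALIBABA = "阿里巴巴（中國）有限公司成立于2007年03月26日，法定代表人蔣芳"
DOC_BAIDU = (
    "百度是擁有強大互聯網基礎的領先AI公司。百度愿景是：成為最懂用戶，"
    "并能幫助人們成長的全球頂級高科技公司。于2000年1月1日在中關村創建了百度公司"
)


def shared_chars(query: str, doc: str) -> set[str]:
    """Characters that appear in both strings (spaces in the query ignored)."""
    return set(query.replace(" ", "")) & set(doc)


for name, doc in [("alibaba", DOC_ALIBABA), ("baidu", DOC_BAIDU)]:
    print(name, sorted(shared_chars("淘寶", doc)))  # both print an empty list
```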

Finally, query for Alibaba's legal representative:

results = vectorstore.similarity_search(
    query="阿里巴巴的法人代表",  # "Alibaba's legal representative"
    # k=1,
    # filter=[{"term": {"metadata.source.keyword": "tweet"}}],
)

$ python3 elasticsearch_qwen3.py 
* 阿里巴巴（中國）有限公司成立于2007年03月26日，法定代表人蔣芳
* 百度是擁有強大互聯網基礎的領先AI公司。百度愿景是：成為最懂用戶，并能幫助人們成長的全球頂級高科技公司。于2000年1月1日在中關村創建了百度公司
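To send a comparable query without LangChain (for example from Kibana Dev Tools or with the `elasticsearch` Python client), you can write an approximate kNN search body yourself. The sketch below builds one against the `vector` field; it is hand-written rather than the exact request LangChain sends, the `num_candidates=50` default is an assumption, `knn_body` is a hypothetical helper, and the optional `filter` mirrors the commented-out line in the script:

```python
import json
from typing import Optional


def knn_body(
    query_vector: list[float],
    k: int = 4,
    num_candidates: int = 50,
    filters: Optional[list[dict]] = None,
) -> dict:
    """Build an approximate kNN search body for the 'vector' field."""
    knn = {
        "field": "vector",
        "query_vector": query_vector,
        "k": k,
        "num_candidates": max(num_candidates, k),  # must be >= k
    }
    if filters:
        # e.g. [{"term": {"metadata.source.keyword": "tweet"}}]
        knn["filter"] = filters
    return {"knn": knn}


body = knn_body([0.1, 0.2, 0.3], k=1)  # toy vector; a real one has 4096 dims
print(json.dumps(body))
```

In practice `query_vector` would be the output of `embeddings.embed_query(...)`, and with the 8.x client the body could be sent as `client.search(index=INDEX_NAME, knn=body["knn"])`. Elasticsearch requires `num_candidates` to be at least `k`, hence the `max` in the helper.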
