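Author: Tomás Murúa, from Elastic
Using Alibaba Cloud AI Service features with Elastic.
For further reading, see "Elasticsearch: Vector search using the Alibaba inference API and semantic text".
In this article, we'll cover how to integrate Alibaba Cloud AI features with Elasticsearch to improve relevance in semantic search.
Alibaba Cloud AI Search is a solution that combines advanced AI features with Elasticsearch tools, using the Qwen LLM/DeepSeek-R1 series to provide advanced reasoning and classification models. In this article, we'll use descriptions of novels and plays written by the same author to test the Alibaba reranking and sparse embedding endpoints.
Steps
- Configure Alibaba Cloud AI
- Create Elasticsearch mappings
- Index data into Elasticsearch
- Query data
- Bonus: Answering questions with completion
Configure Alibaba Cloud AI
Alibaba Cloud AI reranking and embeddings
Open inference Alibaba Cloud offers different services. In this example, we'll use descriptions of popular books and plays by Agatha Christie to test Alibaba Cloud's embedding and rerank endpoints in semantic search.
The Alibaba Cloud AI rerank endpoint is a semantic reranking feature. This type of reranking uses a machine learning model to reorder search results based on their semantic similarity to the query. This lets you use out-of-the-box semantic search capabilities on existing full-text search indices.
The sparse embedding endpoint produces a type of embedding in which most values are zero, which makes the relevant information stand out.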
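To make the idea of a sparse embedding concrete, here is a minimal sketch in Python. It assumes a sparse vector can be pictured as a mapping from tokens to non-zero weights; the tokens, the weights, and the `sparse_dot` helper are invented for illustration and are not output of the Alibaba model.

```python
# Illustrative only: toy sparse vectors as {token: weight} dicts.
# Tokens and weights are made up; they do not come from the Alibaba model.

def sparse_dot(query_vec, doc_vec):
    """Dot product over the tokens that both sparse vectors share."""
    # Iterate over the smaller dict; a token missing from either side counts as zero.
    small, large = sorted((query_vec, doc_vec), key=len)
    return sum(weight * large[token] for token, weight in small.items() if token in large)

query = {"novel": 1.2, "christie": 0.9}
novel_doc = {"novel": 0.8, "christie": 0.7, "nile": 0.5}
play_doc = {"play": 0.9, "christie": 0.7, "coffee": 0.4}

novel_score = sparse_dot(query, novel_doc)
play_score = sparse_dot(query, play_doc)
print(novel_score > play_score)  # the document sharing more weighted tokens scores higher
```

Because only non-zero entries are stored and compared, the score is driven by the few tokens that the query and the document have in common.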
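Get an Alibaba Cloud API key
We need a valid API key to integrate Alibaba with Elasticsearch. To get one, follow these steps:
- Access the Alibaba Cloud portal from the Service Plaza section.
- Go to API Keys in the left menu.
- Generate a new API key.
Configure the Alibaba endpoints
We'll first configure the sparse embedding endpoint, which transforms the text descriptions into semantic vectors:
Embeddings endpoint: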
PUT _inference/sparse_embedding/alibabacloud_ai_search_sparse
{"service": "alibabacloud-ai-search","service_settings": {"api_key": "<api_key>","service_id": "ops-text-sparse-embedding-001","host": "default-j01.platform-cn-shanghai.opensearch.aliyuncs.com","workspace": "default"}
}
Then we'll configure the rerank endpoint to reorder the results.
Rerank endpoint:
PUT _inference/rerank/alibabacloud_ai_search_rerank
{"service": "alibabacloud-ai-search","service_settings": {"api_key": "<api_key>","service_id": "ops-bge-reranker-larger","host": "default-j01.platform-cn-shanghai.opensearch.aliyuncs.com","workspace": "default"}
}
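The two endpoint requests above differ only in task type, inference ID, and service ID. As a sketch, the shared structure could be generated with a small Python helper; `inference_endpoint_request` is a hypothetical name introduced here, and the code only builds the path and body without sending anything.

```python
import json

# Host value copied from the requests above; replace it with your own workspace host.
DEFAULT_HOST = "default-j01.platform-cn-shanghai.opensearch.aliyuncs.com"

def inference_endpoint_request(task_type, inference_id, service_id, api_key,
                               host=DEFAULT_HOST, workspace="default"):
    """Return (path, body) for a PUT _inference request (hypothetical helper)."""
    path = f"_inference/{task_type}/{inference_id}"
    body = {
        "service": "alibabacloud-ai-search",
        "service_settings": {
            "api_key": api_key,
            "service_id": service_id,
            "host": host,
            "workspace": workspace,
        },
    }
    return path, body

sparse_path, sparse_body = inference_endpoint_request(
    "sparse_embedding", "alibabacloud_ai_search_sparse",
    "ops-text-sparse-embedding-001", "<api_key>")
rerank_path, rerank_body = inference_endpoint_request(
    "rerank", "alibabacloud_ai_search_rerank",
    "ops-bge-reranker-larger", "<api_key>")

print("PUT", sparse_path)
print(json.dumps(sparse_body, indent=2))
```

The printed path and body can be pasted into the Kibana Console or sent with any HTTP client of your choice.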
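Now that the endpoints are configured, we can prepare the Elasticsearch index.
Create Elasticsearch mappings
Let's configure the mappings. For this, we need to organize both the text with the descriptions and the vectors generated by the model.
We'll use the following properties:
- semantic_description: stores the embeddings generated by the model and is used to run semantic search.
- description: a "text" field that stores the descriptions of the novels and plays and is used for full-text search.
We'll include the copy_to parameter so that both the text and semantic fields are available for hybrid search: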
PUT arts
{"mappings": {"properties": {"semantic_description": {"type": "semantic_text","inference_id": "alibabacloud_ai_search_sparse"},"description": {"type": "text","copy_to": "semantic_description"}}}
}
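With the mappings ready, we can now index the data.
Index data into Elasticsearch
Here is the dataset with the descriptions that we'll use in this example. We'll index it using the Elasticsearch Bulk API.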
POST arts/_bulk
{ "index": {} }
{ "description": " Black Coffee is a play by the British crime-fiction author Agatha Christie. In the play, a scientist discovers that someone in his household has stolen the formula for an explosive." }
{ "index": {} }
{ "description": "The Mousetrap is a murder mystery play by Agatha Christie. The play opened in London's West End in 1952 and ran continuously until 16 March 2020." }
{ "index": {} }
{ "description": "The Body in the Murder is a Miss Marple mystery novel published by Agatha Christie in 1942. The case involves the murder of two teenage girls who are similar in appearance." }
{ "index": {} }
{ "description": " Agatha Christie's last published novel before she passed, Curtain: Poirot's Last Case is also her indelible detective's last appearance. Poirot and Hastings return to the very same house from The Mysterious Affairs at Styles over 30 years later." }
{ "index": {} }
{ "description": " Death on the Nile is Agatha Christie's most daring travel mystery novel. The tranquillity of a cruise along the Nile is shattered by the discovery that Linnet Ridgeway has been shot through the head." }
{ "index": {} }
{ "description": " The Murder of Roger Ackroyd was Agatha Christie’s first book to be published by William Collins in the spring of 1926. William Collins became part of HarperCollins and are still Christie’s publishers today." }
Note that the first two documents, Black Coffee and The Mousetrap, are plays, while the others are novels.
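The bulk body above alternates an action line with a document line. A minimal Python sketch of how such a newline-delimited body could be generated from a list of descriptions follows; `bulk_body` is a hypothetical helper, and the two descriptions are shortened to their first sentence to keep the example small.

```python
import json

def bulk_body(descriptions):
    """Build the newline-delimited body for POST arts/_bulk (hypothetical helper)."""
    lines = []
    for description in descriptions:
        lines.append(json.dumps({"index": {}}))                 # action line
        lines.append(json.dumps({"description": description}))  # document line
    # The Bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"

# First sentences of two descriptions from the dataset above.
descriptions = [
    "Black Coffee is a play by the British crime-fiction author Agatha Christie.",
    "Death on the Nile is Agatha Christie's most daring travel mystery novel.",
]

body = bulk_body(descriptions)
print(body)
```

Using `json.dumps` for each line keeps quotes and apostrophes in the descriptions correctly escaped.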
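Query data
To compare the results of different query types, we'll run them one after another: first a semantic query, then reranking, and finally a combination of the two. We'll use the same question throughout: "Which novel was written by Agatha Christie?" We expect to get the three documents that explicitly say "novel", plus the one that says "book". The two plays should rank last.
Semantic search
We'll begin by querying the semantic_text field with the question "Which novel was written by Agatha Christie?" Let's see what happens: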
GET /arts/_search
{"_source": {"includes": ["description"]},"query": {"semantic": {"field": "semantic_description","query": "Which novel was written by Agatha Christie?"}}
}
The response is:
{"took": 1246,"timed_out": false,"_shards": {"total": 1,"successful": 1,"skipped": 0,"failed": 0},"hits": {"total": {"value": 6,"relation": "eq"},"max_score": 0.1759066,"hits": [{"_index": "arts","_id": "rdJ4-ZMB36zj9EVTnMgJ","_score": 0.1759066,"_source": {"description": " Death on the Nile is Agatha Christie's most daring travel mystery novel. The tranquillity of a cruise along the Nile is shattered by the discovery that Linnet Ridgeway has been shot through the head."}},{"_index": "arts","_id": "rNJ4-ZMB36zj9EVTnMgJ","_score": 0.17499167,"_source": {"description": " Agatha Christie's last published novel before she passed, Curtain: Poirot's Last Case is also her indelible detective's last appearance. Poirot and Hastings return to the very same house from The Mysterious Affairs at Styles over 30 years later."}},{"_index": "arts","_id": "q9J4-ZMB36zj9EVTnMgJ","_score": 0.16319725,"_source": {"description": "The Body in the Murder is a Miss Marple mystery novel published by Agatha Christie in 1942. The case involves the murder of two teenage girls who are similar in appearance."}},{"_index": "arts","_id": "qtJ4-ZMB36zj9EVTnMgJ","_score": 0.15506727,"_source": {"description": "The Mousetrap is a murder mystery play by Agatha Christie. The play opened in London's West End in 1952 and ran continuously until 16 March 2020."}},{"_index": "arts","_id": "qdJ4-ZMB36zj9EVTnMgJ","_score": 0.14572844,"_source": {"description": " Black Coffee is a play by the British crime-fiction author Agatha Christie. In the play, a scientist discovers that someone in his household has stolen the formula for an explosive."}},{"_index": "arts","_id": "rtJ4-ZMB36zj9EVTnMgJ","_score": 0.13951442,"_source": {"description": " The Murder of Roger Ackroyd was Agatha Christie’s first book to be published by William Collins in the spring of 1926. William Collins became part of HarperCollins and are still Christie’s publishers today."}}]}
}
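In this case, the response ranked most of the novels first, but the document that says "book" came last. We can still refine the results further with reranking.
Refining results with reranking
Here, we'll use an _inference/rerank request to score the documents we got in the first query and improve their order in the results.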
POST _inference/rerank/alibabacloud_ai_search_rerank
{"query": "Which novel was written by Agatha Christie?","input": ["Black Coffee is a play by the British crime-fiction author Agatha Christie. In the play, a scientist discovers that someone in his household has stolen the formula for an explosive.","The Mousetrap is a murder mystery play by Agatha Christie. The play opened in London's West End in 1952 and ran continuously until 16 March 2020."," The Body in the Murder is a Miss Marple mystery novel published by Agatha Christie in 1942. The case involves the murder of two teenage girls who are similar in appearance."," Agatha Christie's last published novel before she passed, Curtain: Poirot's Last Case is also her indelible detective's last appearance. Poirot and Hastings return to the very same house from The Mysterious Affairs at Styles over 30 years later."," Death on the Nile is Agatha Christie's most daring travel mystery novel. The tranquillity of a cruise along the Nile is shattered by the discovery that Linnet Ridgeway has been shot through the head."," The Murder of Roger Ackroyd was Agatha Christie’s first book to be published by William Collins in the spring of 1926. William Collins became part of HarperCollins and are still Christie’s publishers today."]
}
The response is:
{"rerank": [{"index": 3,"relevance_score": 0.91086304},{"index": 4,"relevance_score": 0.8409133},{"index": 2,"relevance_score": 0.76838577},{"index": 5,"relevance_score": 0.2295352},{"index": 0,"relevance_score": 0.13846178},{"index": 1,"relevance_score": 0.06620602}]
}
The response shows that both plays are now at the bottom of the results.
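Each "index" in the rerank response refers to a position in the "input" array of the request. A short Python sketch of mapping the scores back to the documents follows; it reuses the scores from the response above, uses the titles as stand-ins for the full descriptions, and `apply_rerank` is a hypothetical helper.

```python
# Scores copied from the rerank response above.
rerank_response = {"rerank": [
    {"index": 3, "relevance_score": 0.91086304},
    {"index": 4, "relevance_score": 0.8409133},
    {"index": 2, "relevance_score": 0.76838577},
    {"index": 5, "relevance_score": 0.2295352},
    {"index": 0, "relevance_score": 0.13846178},
    {"index": 1, "relevance_score": 0.06620602},
]}

# Titles stand in for the full descriptions, in the same order as the request's "input".
inputs = [
    "Black Coffee",
    "The Mousetrap",
    "The Body in the Murder",
    "Curtain: Poirot's Last Case",
    "Death on the Nile",
    "The Murder of Roger Ackroyd",
]

def apply_rerank(inputs, response):
    """Return (document, score) pairs ordered by descending relevance."""
    ranked = sorted(response["rerank"], key=lambda r: r["relevance_score"], reverse=True)
    return [(inputs[r["index"]], r["relevance_score"]) for r in ranked]

for title, score in apply_rerank(inputs, rerank_response):
    print(f"{score:.4f}  {title}")
```

Read this way, the three documents that say "novel" come first, the "book" document is fourth, and the two plays are last.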
Combining semantic search and the rerank endpoint
Using a retriever, we can combine the semantic query and the reranking into a single step:
POST /arts/_search
{"_source": {"includes": ["description"]},"retriever": {"text_similarity_reranker": {"retriever": {"standard": {"query": {"semantic": {"field": "semantic_description","query": "Which novel was written by Agatha Christie?"}}}},"field": "description","rank_window_size": 10,"inference_id": "alibabacloud_ai_search_rerank","inference_text": "Which novel was written by Agatha Christie?"}}
}
The response is:
{
"took": 1568,"timed_out": false,"_shards": {"total": 1,"successful": 1,"skipped": 0,"failed": 0},"hits": {"total": {"value": 6,"relation": "eq"},"max_score": 0.91086304,"hits": [{"_index": "arts","_id": "rNJ4-ZMB36zj9EVTnMgJ","_score": 0.91086304,"_source": {"description": " Agatha Christie's last published novel before she passed, Curtain: Poirot's Last Case is also her indelible detective's last appearance. Poirot and Hastings return to the very same house from The Mysterious Affairs at Styles over 30 years later."}},{"_index": "arts","_id": "rdJ4-ZMB36zj9EVTnMgJ","_score": 0.8409133,"_source": {"description": " Death on the Nile is Agatha Christie's most daring travel mystery novel. The tranquillity of a cruise along the Nile is shattered by the discovery that Linnet Ridgeway has been shot through the head."}},{"_index": "arts","_id": "q9J4-ZMB36zj9EVTnMgJ","_score": 0.76838577,"_source": {"description": "The Body in the Murder is a Miss Marple mystery novel published by Agatha Christie in 1942. The case involves the murder of two teenage girls who are similar in appearance."}},{"_index": "arts","_id": "rtJ4-ZMB36zj9EVTnMgJ","_score": 0.2295352,"_source": {"description": " The Murder of Roger Ackroyd was Agatha Christie’s first book to be published by William Collins in the spring of 1926. William Collins became part of HarperCollins and are still Christie’s publishers today."}},{"_index": "arts","_id": "qdJ4-ZMB36zj9EVTnMgJ","_score": 0.13846178,"_source": {"description": " Black Coffee is a play by the British crime-fiction author Agatha Christie. In the play, a scientist discovers that someone in his household has stolen the formula for an explosive."}},{"_index": "arts","_id": "qtJ4-ZMB36zj9EVTnMgJ","_score": 0.06620602,"_source": {"description": "The Mousetrap is a murder mystery play by Agatha Christie. The play opened in London's West End in 1952 and ran continuously until 16 March 2020."}}]}
}
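The results here differ from those of the semantic query. The document that says "book" instead of "novel" (The Murder of Roger Ackroyd) now ranks higher than it did in the first semantic search, even though it has no exact match for "novel". The two plays still come last, just as they did with reranking alone.
Bonus: Answering questions with completion
With embeddings and reranking we can satisfy a search query, but the user still sees a list of search results rather than an actual answer.
With the examples provided, we are one step away from a RAG implementation: we can send the top results plus the question to an LLM to get the right answer.
Fortunately, Alibaba Cloud AI Service also provides a completion endpoint that we can use for this purpose.
Let's create the endpoint
Create the completion endpoint using Alibaba Qwen: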
PUT _inference/completion/alibabacloud_ai_search_completion
{"service": "alibabacloud-ai-search","service_settings": {"host" : "default-j01.platform-cn-shanghai.opensearch.aliyuncs.com","api_key": "<api_key>","service_id": "ops-qwen-turbo","workspace" : "default"}
}
We can also create one using deepseek-r1:
PUT _inference/completion/alibabacloud_ai_search_completion_deepseek_r1
{"service": "alibabacloud-ai-search","service_settings": {"host" : "default-j01.platform-cn-shanghai.opensearch.aliyuncs.com","api_key": "<api_key>","service_id": "deepseek-r1","workspace" : "default"}
}
Now, let's send the results of the previous query together with the question:
Querying with Alibaba Qwen
POST _inference/completion/alibabacloud_ai_search_completion
{"input": """Answer the following question using the context provided:
QUESTION: Which novel was written by Agatha Christie?
CONTEXT:
DOCUMENT1
Black Coffee is a play by the British crime-fiction author Agatha Christie. In the play, a scientist discovers that someone in his household has stolen the formula for an explosive.
DOCUMENT2
The Mousetrap is a murder mystery play by Agatha Christie. The play opened in London's West End in 1952 and ran continuously until 16 March 2020.
DOCUMENT3
The Body in the Murder is a Miss Marple mystery novel published by Agatha Christie in 1942. The case involves the murder of two teenage girls who are similar in appearance.
DOCUMENT4
Agatha Christie's last published novel before she passed, Curtain: Poirot's Last Case is also her indelible detective's last appearance. Poirot and Hastings return to the very same house from The Mysterious Affairs at Styles over 30 years later.
DOCUMENT5
Death on the Nile is Agatha Christie's most daring travel mystery novel. The tranquillity of a cruise along the Nile is shattered by the discovery that Linnet Ridgeway has been shot through the head.
DOCUMENT6
The Murder of Roger Ackroyd was Agatha Christie’s first book to be published by William Collins in the spring of 1926. William Collins became part of HarperCollins and are still Christie’s publishers today.
ANSWER:"""
}
The response is:
{"completion": [
{"result": "Agatha Christie wrote several novels, including \"The Body in the Murder,\" \"Curtain: Poirot's Last Case,\" \"Death on the Nile,\" and \"The Murder of Roger Ackroyd.\""}]
}
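The prompt sent to the completion endpoint is the question followed by numbered context documents. As a sketch, it could be assembled in Python from the descriptions returned by the search; `build_prompt` is a hypothetical helper, and only two documents are used here to keep the example short.

```python
def build_prompt(question, documents):
    """Assemble a question-plus-context prompt (hypothetical helper)."""
    parts = [
        "Answer the following question using the context provided:",
        f"QUESTION: {question}",
        "CONTEXT:",
    ]
    for number, text in enumerate(documents, start=1):
        parts.append(f"DOCUMENT{number}")
        parts.append(text.strip())  # drop stray leading spaces from the source text
    parts.append("ANSWER:")
    return "\n".join(parts)

# First sentences of two descriptions from the dataset, including a leading space.
documents = [
    " Death on the Nile is Agatha Christie's most daring travel mystery novel.",
    "The Mousetrap is a murder mystery play by Agatha Christie.",
]

prompt = build_prompt("Which novel was written by Agatha Christie?", documents)
print(prompt)
```

In a real pipeline, the `documents` list would come from the `_source.description` values of the reranked hits, so the LLM sees the most relevant context first.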
Querying with Alibaba deepseek-r1
POST _inference/completion/alibabacloud_ai_search_completion_deepseek_r1?timeout=180s
{"input": "<|system|>你是一個機器人助手.</s><|user|>CONTEXT:Black Coffee is a play by the British crime-fiction author Agatha Christie. In the play, a scientist discovers that someone in his household has stolen the formula for an explosive;The Mousetrap is a murder mystery play by Agatha Christie. The play opened in London's West End in 1952 and ran continuously until 16 March 2020;The Body in the Murder is a Miss Marple mystery novel published by Agatha Christie in 1942. The case involves the murder of two teenage girls who are similar in appearance;Agatha Christie's last published novel before she passed, Curtain: Poirot's Last Case is also her indelible detective's last appearance. Poirot and Hastings return to the very same house from The Mysterious Affairs at Styles over 30 years later;Death on the Nile is Agatha Christie's most daring travel mystery novel. The tranquillity of a cruise along the Nile is shattered by the discovery that Linnet Ridgeway has been shot through the head;The Murder of Roger Ackroyd was Agatha Christie’s first book to be published by William Collins in the spring of 1926. William Collins became part of HarperCollins and are still Christie’s publishers today;QUESTION: Which novela were written by Agatha Christie?</s><|assistant|>"
}
Note: DeepSeek's inference takes a relatively long time, so we set the timeout parameter to 180s.
The inference result is as follows:
{"completion": [{"result": """<think>
Okay, let's see. The user is asking which novels were written by Agatha Christie based on the given context. First, I need to go through each item in the context and determine if it's a novel. The user mentioned "novela," which I think is Spanish for "novel," so they're asking about novels, not plays or other works.
Looking at the context entries one by one:
1. **Black Coffee** is described as a play by Christie. So that's a play, not a novel. Exclude.
2. **The Mousetrap** is a murder mystery play, opened in London's West End. Definitely a play, not a novel. Exclude.
3. **The Body in the Murder** is listed as a Miss Marple mystery novel published in 1942. Wait, the title here might be a bit off. Agatha Christie wrote a novel called "The Body in the Library," which is a Miss Marple story from 1942. Maybe the user made a typo. Assuming it's "The Body in the Library," then yes, that's a novel. But the title given is "The Body in the Murder," which I don't recall. Need to check if that's a real title or a mistake. However, since the context says it's a Miss Marple novel published in 1942, I'll proceed with that, even if the title is slightly wrong. So include as a novel.
4. **Curtain: Poirot's Last Case** is mentioned as her last published novel before she passed. So that's a novel. Include.
5. **Death on the Nile** is described as a travel mystery novel. That's a novel. Include.
6. **The Murder of Roger Ackroyd** was her first book published by William Collins. That's a novel. Include.
So the novels listed here are: The Body in the Murder (assuming typo), Curtain, Death on the Nile, and The Murder of Roger Ackroyd. However, "The Body in the Murder" might actually be "The Body in the Library," which is the correct title. But since the user provided that exact title, I should list it as given, even if there's an error. Alternatively, note the possible typo.
Also, check if there are other works mentioned. The other entries are plays.
So the answer should list the four novels mentioned in the context, being careful with the title accuracy.
</think>
The novels written by Agatha Christie mentioned in the context are:
1. **The Body in the Murder** (likely a typo for *The Body in the Library*, a Miss Marple novel published in 1942).
2. **Curtain: Poirot's Last Case** (her final published novel featuring Hercule Poirot).
3. **Death on the Nile** (a travel mystery novel set on a Nile cruise).
4. **The Murder of Roger Ackroyd** (her breakthrough novel published in 1926).
*Note*:
- *Black Coffee* and *The Mousetrap* are plays, not novels.
- If "The Body in the Murder" is intended to refer to *The Body in the Library*, the latter is the correct title of Christie's 1942 Miss Marple novel."""}]
}
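The DeepSeek-R1 result places the model's reasoning between <think> and </think> tags, ahead of the final answer. If only the answer should reach the user, the reasoning can be separated out. The following Python sketch assumes the result always uses that tag pair; `split_reasoning` is a hypothetical helper and the sample string is a shortened stand-in for the real result.

```python
import re

# Non-greedy match across newlines for the reasoning block.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_reasoning(result):
    """Return (reasoning, answer) from a completion result string."""
    match = THINK_BLOCK.search(result)
    reasoning = match.group(1).strip() if match else ""
    answer = THINK_BLOCK.sub("", result).strip()
    return reasoning, answer

# Shortened stand-in for the "result" value shown above.
sample = (
    "<think>\nOkay, let's see. The user is asking which novels were written "
    "by Agatha Christie.\n</think>\n"
    "The novels written by Agatha Christie mentioned in the context are: ..."
)

reasoning, answer = split_reasoning(sample)
print(answer)
```

If a result contains no <think> block, the helper returns an empty reasoning string and the text unchanged apart from trimmed whitespace.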
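Conclusion
Integrating Alibaba Cloud AI Search with Elasticsearch gives us easy access to completion, embedding, and reranking models that we can incorporate into our search pipeline.
With the help of retrievers, we can use the rerank and embedding endpoints separately or together.
We can also bring in the completion endpoint to finish an end-to-end RAG implementation.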
Original article: Embeddings and reranking with Alibaba Cloud AI Service - Elasticsearch Labs