Author: Jeffrey Rengifo, Elastic
Learn how to build a production-ready Elasticsearch backend with JavaScript.
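This is the first article in a series on how to use Elasticsearch with JavaScript. In this series, you'll learn the basics of using Elasticsearch in a JavaScript environment and review the most relevant features and best practices for building a search app. By the end, you'll know everything you need to run Elasticsearch using JavaScript.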
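In this first part, we will cover:
- Environment setup
  - Frontend, backend, or serverless?
  - Connecting the client
- Indexing documents
  - Elasticsearch client
  - Semantic mappings
  - Bulk helper
- Searching data
  - Lexical query
  - Semantic query
  - Hybrid query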
You can check out the source code with the examples here.
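What is the Elasticsearch Node.js client?
The Elasticsearch Node.js client is a JavaScript library that wraps the HTTP REST calls of the Elasticsearch API in JavaScript. This makes them easier to work with, and the library also provides helpers that simplify tasks such as indexing documents in batches.
For further reading, see the article “Elasticsearch:使用最新的 Nodejs client 8.x 來創建索引并搜索” (creating indices and searching with the latest Node.js client 8.x).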
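Environment
Frontend, backend, or serverless?
To build a search app with the JavaScript client, we need at least two components: an Elasticsearch cluster and a JavaScript runtime to run the client.
The JavaScript client supports all Elasticsearch solutions (Cloud, on-prem, and Serverless) and handles the differences between them internally, so you don't need to worry about which one you use.
The JavaScript runtime, however, must run on a server and not directly in the browser.
This is because calls made to Elasticsearch from the browser would let users obtain sensitive information, such as the cluster API key, the host, or the query itself. Elasticsearch recommends never exposing the cluster directly to the internet, and instead placing a layer in between that hides this information so that users only see parameters. You can read more about this topic here.
We suggest an architecture like this:
In this setup, the client only sends the search terms and an authentication key for your server, while your server keeps full control of the query and of the communication with Elasticsearch.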
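As a rough sketch of this split of responsibilities, the server can accept nothing but a search term and build the full Elasticsearch request itself. The function name `buildSearchRequest`, the 200-character limit, and the index and field names are assumptions made for illustration; none of them come from the Elasticsearch client.

```javascript
// Hypothetical helper: turns an untrusted search term into a request that
// the server fully controls. The limit, index, and fields are assumptions.
const MAX_QUERY_LENGTH = 200;

function buildSearchRequest(rawTerm) {
  // Query-string values can also arrive as arrays or objects, so check the type.
  if (typeof rawTerm !== "string") {
    throw new TypeError("The search term must be a string");
  }
  const term = rawTerm.trim();
  if (term.length === 0 || term.length > MAX_QUERY_LENGTH) {
    throw new RangeError("The search term is empty or too long");
  }
  // The caller never chooses the index, the fields, or the page size.
  return {
    index: "my-index",
    size: 5,
    query: {
      multi_match: { query: term, fields: ["title", "description"] },
    },
  };
}

console.log(JSON.stringify(buildSearchRequest("  blue t-shirt ")));
```

With this shape, a route handler would pass the returned object to the client and answer with an HTTP 400 when the helper throws.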
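Connecting the client
Start by creating an API key following these steps.
Following the previous example, we'll create a simple Express server and connect to Elasticsearch from it using the Node.js client.
We'll initialize the project with npm and install the Elasticsearch client and Express. Express is a library for building servers in Node.js; with it, we can interact with our backend over HTTP.
Let's initialize the project: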
npm init -y
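Install the dependencies:
npm install @elastic/elasticsearch express body-parser split2 dotenv
Here is what each package does:
- @elastic/elasticsearch: the official Node.js client
- express: lets us quickly spin up a lightweight Node.js server to expose Elasticsearch
- body-parser: middleware that parses JSON request bodies (server.js requires it)
- split2: splits a text stream into lines, so we can process our ndjson file one line at a time
- dotenv: lets us manage environment variables through a .env file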
Create a .env file at the root of the project and add the following lines:
ELASTICSEARCH_ENDPOINT="Your Elasticsearch endpoint"
ELASTICSEARCH_API_KEY="Your Elasticsearch API key"
This way, we can load these variables using the dotenv package.
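A missing variable would otherwise only surface later as a failed connection. A minimal sketch of a startup check follows; `readConfig` is a hypothetical helper, not part of dotenv, and the values in the example call are placeholders.

```javascript
// Hypothetical startup check: fail fast when a required variable is missing.
const REQUIRED_VARIABLES = ["ELASTICSEARCH_ENDPOINT", "ELASTICSEARCH_API_KEY"];

function readConfig(env) {
  const missing = REQUIRED_VARIABLES.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return {
    endpoint: env.ELASTICSEARCH_ENDPOINT,
    apiKey: env.ELASTICSEARCH_API_KEY,
  };
}

// In server.js this would be readConfig(process.env), called after
// require("dotenv").config(). Placeholder values are used here.
const config = readConfig({
  ELASTICSEARCH_ENDPOINT: "https://localhost:9200",
  ELASTICSEARCH_API_KEY: "example-key",
});
console.log(config.endpoint);
```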
Create a server.js file:
const express = require("express");
const bodyParser = require("body-parser");
const { Client } = require("@elastic/elasticsearch");

require("dotenv").config(); // environment variables setup

const ELASTICSEARCH_ENDPOINT = process.env.ELASTICSEARCH_ENDPOINT;
const ELASTICSEARCH_API_KEY = process.env.ELASTICSEARCH_API_KEY;
const PORT = 3000;

const app = express();

app.listen(PORT, () => {
  console.log("Server running on port", PORT);
});

app.use(bodyParser.json());

const esClient = new Client({
  node: ELASTICSEARCH_ENDPOINT,
  auth: { apiKey: ELASTICSEARCH_API_KEY },
});

app.get("/ping", async (req, res) => {
  try {
    const result = await esClient.info();

    res.status(200).json({
      success: true,
      clusterInfo: result,
    });
  } catch (error) {
    console.error("Error getting Elasticsearch info:", error);

    res.status(500).json({
      success: false,
      clusterInfo: null,
      error: error.message,
    });
  }
});
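This code sets up a basic Express.js server that listens on port 3000 and connects to an Elasticsearch cluster, authenticating with an API key. It includes a /ping endpoint that, when reached with a GET request, queries the cluster for basic information using the .info() method of the Elasticsearch client.
If the query succeeds, it returns the cluster information in JSON format; otherwise, it returns an error message. The server also uses the body-parser middleware to handle JSON request bodies.
Run the file to start the server: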
node server.js
The output should look like this:
Server running on port 3000
Now, let's call the /ping endpoint to check the status of our Elasticsearch cluster.
curl http://localhost:3000/ping
{
  "success": true,
  "clusterInfo": {
    "name": "instance-0000000000",
    "cluster_name": "61b7e19eec204d59855f5e019acd2689",
    "cluster_uuid": "BIfvfLM0RJWRK_bDCY5ldg",
    "version": {
      "number": "9.0.0",
      "build_flavor": "default",
      "build_type": "docker",
      "build_hash": "112859b85d50de2a7e63f73c8fc70b99eea24291",
      "build_date": "2025-04-08T15:13:46.049795831Z",
      "build_snapshot": false,
      "lucene_version": "10.1.0",
      "minimum_wire_compatibility_version": "8.18.0",
      "minimum_index_compatibility_version": "8.0.0"
    },
    "tagline": "You Know, for Search"
  }
}
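The /ping response above returns everything the cluster reports, including identifiers such as cluster_uuid. If this endpoint were reachable by end users, one option, in line with the earlier advice to hide cluster details, would be to return only a summary. The helper name `summarizeClusterInfo` and the choice of fields are assumptions for illustration.

```javascript
// Hypothetical helper: keeps only the fields a basic health check needs.
function summarizeClusterInfo(info) {
  return {
    name: info.name,
    version: info.version.number,
    tagline: info.tagline,
  };
}

// Shortened copy of the response shown above.
const exampleInfo = {
  name: "instance-0000000000",
  cluster_uuid: "BIfvfLM0RJWRK_bDCY5ldg",
  version: { number: "9.0.0", build_type: "docker" },
  tagline: "You Know, for Search",
};

console.log(summarizeClusterInfo(exampleInfo));
```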
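Indexing documents
Once connected, we can index documents using mappings such as semantic_text for semantic search and text for full-text queries. With these two field types, we can also run hybrid search.
We'll create a new load.js file to generate the mappings and upload the documents.
Elasticsearch client
First, we need to instantiate and authenticate the client: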
const { Client } = require("@elastic/elasticsearch");

const ELASTICSEARCH_ENDPOINT = "cluster/project_endpoint";
const ELASTICSEARCH_API_KEY = "apiKey";

const esClient = new Client({
  node: ELASTICSEARCH_ENDPOINT,
  auth: { apiKey: ELASTICSEARCH_API_KEY },
});
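Semantic mappings
We'll create an index with data from a veterinary hospital. It stores information about the owner, the pet, and the details of each visit.
Data we want to run full-text search on, such as names and descriptions, is stored as text. Categorical data, such as the animal's species or breed, is stored as keyword.
Additionally, we'll copy the values of all fields into a semantic_text field so that we can also run semantic search against that information.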
const INDEX_NAME = "vet-visits";

const createMappings = async (indexName, mapping) => {
  try {
    console.log(`Creating mappings for index ${indexName}...`);

    const body = await esClient.indices.create({
      index: indexName,
      mappings: mapping,
    });

    console.log("Index created successfully:", body);
  } catch (error) {
    console.error("Error creating mapping:", error);
  }
};

// Top-level await requires an ES module; in a CommonJS file, call this from an async function.
await createMappings(INDEX_NAME, {
  properties: {
    owner_name: {
      type: "text",
      copy_to: "semantic_field",
    },
    pet_name: {
      type: "text",
      copy_to: "semantic_field",
    },
    species: {
      type: "keyword",
      copy_to: "semantic_field",
    },
    breed: {
      type: "keyword",
      copy_to: "semantic_field",
    },
    vaccination_history: {
      type: "keyword",
      copy_to: "semantic_field",
    },
    visit_details: {
      type: "text",
      copy_to: "semantic_field",
    },
    // Receives a copy of every field above for semantic search
    semantic_field: {
      type: "semantic_text",
    },
  },
});
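The mapping above repeats the same copy_to setting on every field. As a small sketch, the properties object could also be generated from a field-to-type table. `buildProperties` is a hypothetical helper; the field names and types simply mirror the mapping above.

```javascript
// Hypothetical helper: builds the "properties" object from a field-to-type
// table, adding copy_to on every source field.
function buildProperties(fieldTypes, semanticField) {
  const properties = {};
  for (const [name, type] of Object.entries(fieldTypes)) {
    properties[name] = { type, copy_to: semanticField };
  }
  // The target field itself has no copy_to.
  properties[semanticField] = { type: "semantic_text" };
  return properties;
}

const properties = buildProperties(
  {
    owner_name: "text",
    pet_name: "text",
    species: "keyword",
    breed: "keyword",
    vaccination_history: "keyword",
    visit_details: "text",
  },
  "semantic_field"
);

console.log(JSON.stringify(properties, null, 2));
```

The result can be passed as `{ properties }` wherever the hand-written mapping is used. Whether this is worth it depends on how many fields the index has; for six fields the explicit version is arguably easier to read.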
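Bulk helper
Another advantage of the client is the bulk helper for indexing in batches. The bulk helper makes it easy to handle concurrency, retries, and what to do with each document that succeeds or fails.
An attractive feature of this helper is that it works with streams. It lets you send a file line by line instead of loading the whole file into memory and sending it to Elasticsearch in one go.
To upload the data to Elasticsearch, create a file called data.ndjson at the root of the project and add the information below (alternatively, you can download the file with the dataset from here):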
{"owner_name":"Alice Johnson","pet_name":"Buddy","species":"Dog","breed":"Golden Retriever","vaccination_history":["Rabies","Parvovirus","Distemper"],"visit_details":"Annual check-up and nail trimming. Healthy and active."}
{"owner_name":"Marco Rivera","pet_name":"Milo","species":"Cat","breed":"Siamese","vaccination_history":["Rabies","Feline Leukemia"],"visit_details":"Slight eye irritation, prescribed eye drops."}
{"owner_name":"Sandra Lee","pet_name":"Pickles","species":"Guinea Pig","breed":"Mixed","vaccination_history":[],"visit_details":"Loss of appetite, recommended dietary changes."}
{"owner_name":"Jake Thompson","pet_name":"Luna","species":"Dog","breed":"Labrador Mix","vaccination_history":["Rabies","Bordetella"],"visit_details":"Mild ear infection, cleaning and antibiotics given."}
{"owner_name":"Emily Chen","pet_name":"Ziggy","species":"Cat","breed":"Mixed","vaccination_history":["Rabies","Feline Calicivirus"],"visit_details":"Vaccination update and routine physical."}
{"owner_name":"Tomás Herrera","pet_name":"Rex","species":"Dog","breed":"German Shepherd","vaccination_history":["Rabies","Parvovirus","Leptospirosis"],"visit_details":"Follow-up for previous leg strain, improving well."}
{"owner_name":"Nina Park","pet_name":"Coco","species":"Ferret","breed":"Mixed","vaccination_history":["Rabies"],"visit_details":"Slight weight loss; advised new diet."}
{"owner_name":"Leo Martínez","pet_name":"Simba","species":"Cat","breed":"Maine Coon","vaccination_history":["Rabies","Feline Panleukopenia"],"visit_details":"Dental cleaning. Minor tartar buildup removed."}
{"owner_name":"Rachel Green","pet_name":"Rocky","species":"Dog","breed":"Bulldog Mix","vaccination_history":["Rabies","Parvovirus"],"visit_details":"Skin rash, antihistamines prescribed."}
{"owner_name":"Daniel Kim","pet_name":"Mochi","species":"Rabbit","breed":"Mixed","vaccination_history":[],"visit_details":"Nail trimming and general health check. No issues."}
We use split2 to stream the file line by line while the bulk helper sends the lines to Elasticsearch.
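const { createReadStream } = require("fs");
const split = require("split2");

const indexData = async (filePath, indexName) => {
  try {
    console.log(`Indexing data from ${filePath} into ${indexName}...`);

    const result = await esClient.helpers.bulk({
      // Stream the file one line (one JSON document) at a time
      datasource: createReadStream(filePath).pipe(split()),
      // Bulk action to perform for each document
      onDocument: () => {
        return {
          index: { _index: indexName },
        };
      },
      // Called for each document that could not be indexed
      onDrop(doc) {
        console.error("Error processing document:", doc);
      },
    });

    // The helper resolves with aggregate stats such as total and failed
    console.log(
      `Bulk indexing completed. Total documents: ${result.total}, Failed: ${result.failed}`
    );
  } catch (error) {
    console.error("Error indexing data:", error);
    throw error;
  }
};

// Top-level await requires an ES module; in a CommonJS file, call this from an async function.
await indexData("./data.ndjson", INDEX_NAME);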
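The code above reads the .ndjson file line by line and indexes each JSON object into the specified Elasticsearch index in bulk using the helpers.bulk method. It streams the file with createReadStream and split2, sets the index metadata for each document, and logs any document that fails to process. When it finishes, it logs a summary of the indexing result.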
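For context, the Bulk API that sits behind the helper expects a newline-delimited body in which each document is preceded by an action line. The sketch below builds such a body by hand for two documents. `toBulkBody` is a hypothetical name, and this is only an illustration of the request format, not the helper's implementation, which also takes care of batching, concurrency, and retries.

```javascript
// Hypothetical helper: builds a Bulk API body (NDJSON) for "index" actions.
function toBulkBody(docs, indexName) {
  const lines = [];
  for (const doc of docs) {
    lines.push(JSON.stringify({ index: { _index: indexName } })); // action line
    lines.push(JSON.stringify(doc)); // source line
  }
  // The Bulk API requires the body to end with a newline.
  return lines.join("\n") + "\n";
}

const bulkBody = toBulkBody(
  [{ pet_name: "Buddy" }, { pet_name: "Milo" }],
  "vet-visits"
);

console.log(bulkBody);
```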
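As an alternative to the indexData function, you can upload the file directly through the Kibana UI using the data file upload interface.
Let's run the file to upload the documents to our Elasticsearch cluster.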
node load.js
Creating mappings for index vet-visits...
Index created successfully: { acknowledged: true, shards_acknowledged: true, index: 'vet-visits' }
Indexing data from ./data.ndjson into vet-visits...
Bulk indexing completed. Total documents: 10, Failed: 0
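Because load.js uses `require`, Node.js treats it as a CommonJS file, where `await` is only allowed inside an async function. One way to run the two steps in order is a small async entry point. In the sketch below, `runLoad` is a hypothetical name, and the two steps are passed in as parameters with stand-in implementations so that the example runs on its own; in load.js you would pass the real createMappings and indexData functions.

```javascript
// Runs the two load steps in order: the index must exist before indexing.
async function runLoad(steps, { indexName, mappings, filePath }) {
  await steps.createMappings(indexName, mappings);
  await steps.indexData(filePath, indexName);
}

// Stand-ins that only record the call order (assumptions for this sketch).
const calls = [];
const standIns = {
  createMappings: async (indexName) => {
    calls.push(`mappings:${indexName}`);
  },
  indexData: async (filePath, indexName) => {
    calls.push(`index:${filePath}->${indexName}`);
  },
};

const done = runLoad(standIns, {
  indexName: "vet-visits",
  mappings: { properties: {} },
  filePath: "./data.ndjson",
});

done
  .then(() => console.log(calls.join("\n")))
  .catch((error) => {
    console.error("Load failed:", error);
    process.exitCode = 1;
  });
```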
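Searching data
Back in our server.js file, we'll create different endpoints to run lexical, semantic, or hybrid search.
In short, these search types are not mutually exclusive; which one to use depends on the kind of question you need to answer.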
Query type | Use case | Example question |
---|---|---|
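Lexical search | The words or word roots in the question are likely to appear in the indexed documents. Token similarity between question and documents. | I’m looking for a blue sport t-shirt. |
Semantic search | The words in the question are unlikely to appear in the documents. Conceptual similarity between question and documents. | I’m looking for clothing for cold weather. |
Hybrid search | The question contains lexical and/or semantic components. Token and semantic similarity between question and documents. | I’m looking for an S size dress for a beach wedding. |
The lexical part of the question is likely to be part of titles, descriptions, or category names, while the semantic part consists of concepts related to those fields. Blue is probably part of a category name or description, whereas beach wedding likely is not, but it can be semantically related to linen clothing.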
Lexical query (/search/lexic?q=<query_term>)
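Lexical search, also called full-text search, is search based on token similarity; that is, after analysis, the documents that contain the search tokens are returned.
You can check out our hands-on lexical search tutorial here.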
app.get("/search/lexic", async (req, res) => {
  const { q } = req.query;

  const INDEX_NAME = "vet-visits";

  try {
    const result = await esClient.search({
      index: INDEX_NAME,
      size: 5,
      query: {
        // Match the search terms against the text fields
        multi_match: {
          query: q,
          fields: ["owner_name", "pet_name", "visit_details"],
        },
      },
    });

    res.status(200).json({
      success: true,
      results: result.hits.hits,
    });
  } catch (error) {
    console.error("Error performing search:", error);

    res.status(500).json({
      success: false,
      results: null,
      error: error.message,
    });
  }
});
Let's test it with “nail trimming”.
curl "http://localhost:3000/search/lexic?q=nail%20trimming"
Response:
{
  "success": true,
  "results": [
    {
      "_index": "vet-visits",
      "_id": "-RY6RJYBLe2GoFQ6-9n9",
      "_score": 2.7075968,
      "_source": {
        "pet_name": "Mochi",
        "owner_name": "Daniel Kim",
        "species": "Rabbit",
        "visit_details": "Nail trimming and general health check. No issues.",
        "breed": "Mixed",
        "vaccination_history": []
      }
    },
    {
      "_index": "vet-visits",
      "_id": "8BY6RJYBLe2GoFQ6-9n9",
      "_score": 2.560356,
      "_source": {
        "pet_name": "Buddy",
        "owner_name": "Alice Johnson",
        "species": "Dog",
        "visit_details": "Annual check-up and nail trimming. Healthy and active.",
        "breed": "Golden Retriever",
        "vaccination_history": ["Rabies", "Parvovirus", "Distemper"]
      }
    }
  ]
}
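To build some intuition for why these two documents match, here is a toy version of token matching. The `analyze` function below is a deliberately naive stand-in (lowercasing and splitting on non-alphanumeric characters); Elasticsearch's analyzers and relevance scoring are considerably more sophisticated, so treat this only as an illustration of the idea.

```javascript
// Toy analyzer: lowercases and splits on non-alphanumeric characters.
function analyze(text) {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter(Boolean);
}

// A document matches when it shares at least one token with the query.
function matchesLexically(query, documentText) {
  const documentTokens = new Set(analyze(documentText));
  return analyze(query).some((token) => documentTokens.has(token));
}

const visitDetails = "Nail trimming and general health check. No issues.";

console.log(matchesLexically("nail trimming", visitDetails)); // true
console.log(matchesLexically("pedicure", visitDetails)); // false
```

The second call shows the limit of lexical search: "pedicure" is close in meaning to "nail trimming" but shares no token with it.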
Semantic query (/search/semantic?q=<query_term>)
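Semantic search, unlike lexical search, uses vector search to find results whose meaning is similar to that of the search terms.
You can check out our hands-on semantic search tutorial here.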
app.get("/search/semantic", async (req, res) => {
  const { q } = req.query;

  const INDEX_NAME = "vet-visits";

  try {
    const result = await esClient.search({
      index: INDEX_NAME,
      size: 5,
      query: {
        // Query the semantic_text field by meaning
        semantic: {
          field: "semantic_field",
          query: q,
        },
      },
    });

    res.status(200).json({
      success: true,
      results: result.hits.hits,
    });
  } catch (error) {
    console.error("Error performing search:", error);

    res.status(500).json({
      success: false,
      results: null,
      error: error.message,
    });
  }
});
Let's test it with “Who got a pedicure?”.
curl "http://localhost:3000/search/semantic?q=Who%20got%20a%20pedicure?"
Response:
{
  "success": true,
  "results": [
    {
      "_index": "vet-visits",
      "_id": "-RY6RJYBLe2GoFQ6-9n9",
      "_score": 4.861466,
      "_source": {
        "owner_name": "Daniel Kim",
        "pet_name": "Mochi",
        "species": "Rabbit",
        "breed": "Mixed",
        "vaccination_history": [],
        "visit_details": "Nail trimming and general health check. No issues."
      }
    },
    {
      "_index": "vet-visits",
      "_id": "8BY6RJYBLe2GoFQ6-9n9",
      "_score": 4.7152824,
      "_source": {
        "pet_name": "Buddy",
        "owner_name": "Alice Johnson",
        "species": "Dog",
        "visit_details": "Annual check-up and nail trimming. Healthy and active.",
        "breed": "Golden Retriever",
        "vaccination_history": ["Rabies", "Parvovirus", "Distemper"]
      }
    },
    {
      "_index": "vet-visits",
      "_id": "9RY6RJYBLe2GoFQ6-9n9",
      "_score": 1.6717153,
      "_source": {
        "pet_name": "Rex",
        "owner_name": "Tomás Herrera",
        "species": "Dog",
        "visit_details": "Follow-up for previous leg strain, improving well.",
        "breed": "German Shepherd",
        "vaccination_history": ["Rabies", "Parvovirus", "Leptospirosis"]
      }
    },
    {
      "_index": "vet-visits",
      "_id": "9xY6RJYBLe2GoFQ6-9n9",
      "_score": 1.5600781,
      "_source": {
        "pet_name": "Simba",
        "owner_name": "Leo Martínez",
        "species": "Cat",
        "visit_details": "Dental cleaning. Minor tartar buildup removed.",
        "breed": "Maine Coon",
        "vaccination_history": ["Rabies", "Feline Panleukopenia"]
      }
    },
    {
      "_index": "vet-visits",
      "_id": "-BY6RJYBLe2GoFQ6-9n9",
      "_score": 1.2696637,
      "_source": {
        "pet_name": "Rocky",
        "owner_name": "Rachel Green",
        "species": "Dog",
        "visit_details": "Skin rash, antihistamines prescribed.",
        "breed": "Bulldog Mix",
        "vaccination_history": ["Rabies", "Parvovirus"]
      }
    }
  ]
}
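The ranking above reflects closeness in meaning rather than shared words. As a toy illustration of the idea, the sketch below compares made-up three-dimensional vectors with cosine similarity. Everything here is an assumption for illustration: real embeddings are produced by the inference endpoint behind the semantic_text field, have many more dimensions, may be sparse rather than dense, and the scoring Elasticsearch applies depends on that model.

```javascript
// Cosine similarity between two dense vectors of equal length.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Made-up vectors; a real model would compute these from the text.
const pedicure = [0.9, 0.1, 0.0];
const nailTrimming = [0.8, 0.2, 0.1];
const skinRash = [0.1, 0.9, 0.2];

const closer =
  cosineSimilarity(pedicure, nailTrimming) >
  cosineSimilarity(pedicure, skinRash);

console.log(closer); // true
```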
Hybrid query (/search/hybrid?q=<query_term>)
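Hybrid search lets us combine semantic and lexical search, getting the best of both: the precision of token-based search together with the closeness in meaning of semantic search.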
app.get("/search/hybrid", async (req, res) => {
  const { q } = req.query;

  const INDEX_NAME = "vet-visits";

  try {
    const result = await esClient.search({
      index: INDEX_NAME,
      retriever: {
        // Reciprocal rank fusion merges the rankings of both retrievers
        rrf: {
          retrievers: [
            {
              // Lexical retriever
              standard: {
                query: {
                  bool: {
                    must: {
                      multi_match: {
                        query: q,
                        fields: ["owner_name", "pet_name", "visit_details"],
                      },
                    },
                  },
                },
              },
            },
            {
              // Semantic retriever
              standard: {
                query: {
                  bool: {
                    must: {
                      semantic: {
                        field: "semantic_field",
                        query: q,
                      },
                    },
                  },
                },
              },
            },
          ],
        },
      },
      size: 5,
    });

    res.status(200).json({
      success: true,
      results: result.hits.hits,
    });
  } catch (error) {
    console.error("Error performing search:", error);

    res.status(500).json({
      success: false,
      results: null,
      error: error.message,
    });
  }
});
Let's test it with “Who got a pedicure or dental treatment?”.
curl "http://localhost:3000/search/hybrid?q=who%20got%20a%20pedicure%20or%20dental%20treatment"
Response:
{
  "success": true,
  "results": [
    {
      "_index": "vet-visits",
      "_id": "9xY6RJYBLe2GoFQ6-9n9",
      "_score": 0.032522473,
      "_source": {
        "pet_name": "Simba",
        "owner_name": "Leo Martínez",
        "species": "Cat",
        "visit_details": "Dental cleaning. Minor tartar buildup removed.",
        "breed": "Maine Coon",
        "vaccination_history": ["Rabies", "Feline Panleukopenia"]
      }
    },
    {
      "_index": "vet-visits",
      "_id": "-RY6RJYBLe2GoFQ6-9n9",
      "_score": 0.016393442,
      "_source": {
        "pet_name": "Mochi",
        "owner_name": "Daniel Kim",
        "species": "Rabbit",
        "visit_details": "Nail trimming and general health check. No issues.",
        "breed": "Mixed",
        "vaccination_history": []
      }
    },
    {
      "_index": "vet-visits",
      "_id": "8BY6RJYBLe2GoFQ6-9n9",
      "_score": 0.015873017,
      "_source": {
        "pet_name": "Buddy",
        "owner_name": "Alice Johnson",
        "species": "Dog",
        "visit_details": "Annual check-up and nail trimming. Healthy and active.",
        "breed": "Golden Retriever",
        "vaccination_history": ["Rabies", "Parvovirus", "Distemper"]
      }
    },
    {
      "_index": "vet-visits",
      "_id": "9RY6RJYBLe2GoFQ6-9n9",
      "_score": 0.015625,
      "_source": {
        "pet_name": "Rex",
        "owner_name": "Tomás Herrera",
        "species": "Dog",
        "visit_details": "Follow-up for previous leg strain, improving well.",
        "breed": "German Shepherd",
        "vaccination_history": ["Rabies", "Parvovirus", "Leptospirosis"]
      }
    },
    {
      "_index": "vet-visits",
      "_id": "8xY6RJYBLe2GoFQ6-9n9",
      "_score": 0.015384615,
      "_source": {
        "pet_name": "Luna",
        "owner_name": "Jake Thompson",
        "species": "Dog",
        "visit_details": "Mild ear infection, cleaning and antibiotics given.",
        "breed": "Labrador Mix",
        "vaccination_history": ["Rabies", "Bordetella"]
      }
    }
  ]
}
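The small scores in this response come from reciprocal rank fusion (RRF), where each retriever contributes 1 / (k + rank) for every document it returns. The sketch below reimplements that formula with k = 60, the default rank constant in Elasticsearch. The two input rankings are an assumption: they were not returned by the API, but were chosen because they reproduce the scores above (for example, Simba at rank 1 in one list and rank 2 in the other gives 1/61 + 1/62 ≈ 0.0325).

```javascript
// Reciprocal rank fusion: sums 1 / (rankConstant + rank) over all rankings.
function reciprocalRankFusion(rankings, rankConstant = 60) {
  const scores = new Map();
  for (const ranking of rankings) {
    ranking.forEach((id, position) => {
      const rank = position + 1; // ranks start at 1
      const previous = scores.get(id) || 0;
      scores.set(id, previous + 1 / (rankConstant + rank));
    });
  }
  // Highest fused score first.
  return [...scores.entries()].sort((a, b) => b[1] - a[1]);
}

// Assumed per-retriever rankings, consistent with the scores shown above.
const lexicalRanking = ["Simba"];
const semanticRanking = ["Mochi", "Simba", "Buddy", "Rex", "Luna"];

const fused = reciprocalRankFusion([lexicalRanking, semanticRanking]);
for (const [petName, score] of fused) {
  console.log(petName, score.toFixed(6));
}
```

Simba ends up first because it is the only document that appears in both rankings, which is the behavior hybrid search relies on.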
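Conclusion
In this first part of the series, we covered how to set up the environment and create a server with different search endpoints to query Elasticsearch documents following client/server best practices. Stay tuned for part two, where you'll learn production best practices and how to run the Elasticsearch Node.js client in serverless environments.
Original article: https://www.elastic.co/search-labs/blog/how-to-use-elasticsearch-in-javascript-part-i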