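Full-Text Search
- 1. Full-text search
- 1.1 Preparing test data
- 1.2 Case studies
- 1.2.1 match (analyzed search)
- 1.2.2 match_phrase (phrase search)
- 1.2.3 match_phrase_prefix (phrase prefix matching)
- 1.2.4 multi_match (multi-field matching)
- 1.2.5 query_string (advanced query syntax)
- 1.2.6 simple_query_string
- 1.3 Comparison summary table
- 2. Compound queries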
1. Full-text search
1.1 Preparing test data
Create an index.
PUT /products
{"mappings": {"properties": {"name": {"type": "text", "fields": {"keyword": {"type": "keyword"}}},"price": {"type": "double"},"category": {"type": "keyword"},"tags": {"type": "keyword"},"description": {"type": "text"},"stock": {"type": "integer"},"sku": {"type": "keyword"},"created_at": {"type": "date"},"metadata": {"type": "object"}}}
}
Insert test data.
POST /products/_bulk
{"index":{}}
{"name":"Laptop X1","price":1299.99,"category":"electronics","tags":["new","sale"],"description":"High performance laptop","stock":50,"sku":"LP-X1-2023","created_at":"2023-01-15","metadata":{"weight":1.5,"color":"silver"}}
{"index":{}}
{"name":"Smartphone S10","price":899.99,"category":"electronics","tags":["new","popular"],"description":"Latest smartphone model","stock":120,"sku":"SP-S10-2023","created_at":"2023-02-20","metadata":{"weight":0.3,"color":"black"}}
{"index":{}}
{"name":"Wireless Headphones","price":199.99,"category":"audio","tags":["sale","popular"],"description":"Noise cancelling headphones","stock":75,"sku":"WH-200-2022","created_at":"2022-11-10","metadata":{"weight":0.25,"color":"white"}}
{"index":{}}
{"name":"Smart Watch","price":249.99,"category":"wearables","tags":["new","featured"],"description":"Fitness tracking smartwatch","stock":30,"sku":"SW-500-2023","created_at":"2023-03-05","metadata":{"weight":0.1,"color":"black"}}
{"index":{}}
{"name":"4K TV","price":1499.99,"category":"electronics","tags":["premium","large"],"description":"55-inch 4K television","stock":15,"sku":"TV-4K-55-2023","created_at":"2023-01-25","metadata":{"weight":18.5,"color":"black"}}
{"index":{}}
{"name":"Bluetooth Speaker","price":129.99,"category":"audio","tags":["portable"],"description":"Waterproof bluetooth speaker","stock":60,"sku":"BS-100-2022","created_at":"2022-12-15","metadata":{"weight":0.8,"color":"blue"}}
{"index":{}}
{"name":"Gaming Mouse","price":79.99,"category":"accessories","tags":["gaming"],"description":"High DPI gaming mouse","stock":90,"sku":"GM-X200","created_at":"2023-02-01","metadata":{"weight":0.12,"color":"rgb"}}
{"index":{}}
{"name":"External SSD 1TB","price":159.99,"category":"storage","tags":["fast","reliable"],"description":"Portable SSD drive","stock":45,"sku":"ESSD-1TB-2023","created_at":"2023-03-10","metadata":{"weight":0.05,"color":"gray"}}
{"index":{}}
{"name":"Keyboard Pro","price":109.99,"category":"accessories","tags":["ergonomic"],"description":"Mechanical keyboard","stock":25,"sku":"KB-PRO-2023","created_at":"2023-03-15","metadata":{"weight":1.1,"color":"black"}}
{"index":{}}
{"name":"Tablet T8","price":499.99,"category":"electronics","tags":["new","portable"],"description":"10-inch tablet","stock":40,"sku":"TAB-T8-2023","created_at":"2023-02-28","metadata":{"weight":0.5,"color":"silver"}}
{"index":{}}
{"name":"Camera DSLR","price":899.99,"category":"photography","tags":["professional"],"description":"24MP DSLR camera","stock":20,"sku":"CAM-DSLR-24","created_at":"2023-01-10","metadata":{"weight":0.7,"color":"black"}}
{"index":{}}
{"name":"Monitor 27\"","price":299.99,"category":"electronics","tags":["office"],"description":"27-inch office monitor","stock":35,"sku":"MON-27-2023","created_at":"2023-02-15","metadata":{"weight":4.2,"color":"black"}}
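The _bulk body above is newline-delimited JSON: each {"index":{}} action line is followed by the document source on its own line, and the body must end with a newline. A minimal Python sketch that builds such a body from a list of dicts; build_bulk_body is a hypothetical helper, and only two of the test documents are shown, trimmed to a few fields.

```python
import json


def build_bulk_body(docs):
    """Build an NDJSON body for POST /products/_bulk (hypothetical helper).

    Each document becomes two lines: an {"index": {}} action line (no _id,
    so Elasticsearch generates one) followed by the document source.
    """
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {}}))
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


docs = [
    {"name": "Laptop X1", "price": 1299.99, "category": "electronics"},
    {"name": "Gaming Mouse", "price": 79.99, "category": "accessories"},
]
body = build_bulk_body(docs)
print(body, end="")
```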
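1.2 Case studies
1.2.1 match (analyzed search)
Runs the query text through the field's analyzer and matches the resulting terms against the field. Supports fuzzy matching (fuzziness) and an operator parameter (or by default, or and).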
GET /products/_search
{"query": {"match": {"description": {
  "query": "niose cancelling", // "noise" is deliberately misspelled to test fuzzy matching
  "fuzziness": "AUTO"
}}}}
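Roughly, fuzziness "AUTO" derives the allowed edit distance from each query term's length (0 edits for 1 to 2 characters, 1 for 3 to 5, 2 for longer terms), and by default swapping two adjacent characters counts as a single edit, which is why niose still finds noise. The Python sketch below models that idea; the helper names are hypothetical, and the whitespace tokenizer is a simplification rather than Elasticsearch's analyzer or its automaton-based implementation.

```python
def auto_fuzziness(term: str) -> int:
    """Max edits allowed by fuzziness "AUTO" (the default AUTO:3,6) for a query term."""
    if len(term) < 3:
        return 0  # 1-2 characters: must match exactly
    if len(term) < 6:
        return 1  # 3-5 characters: one edit
    return 2  # 6+ characters: two edits


def osa_distance(a: str, b: str) -> int:
    """Edit distance where insert, delete, substitute and adjacent transposition
    each count as one edit (optimal string alignment variant)."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


def fuzzy_match(query: str, text: str, operator: str = "or") -> bool:
    """Rough model of a match query with fuzziness AUTO on a single field.
    A lowercase whitespace split stands in for the standard analyzer."""
    doc_terms = text.lower().split()
    hits = [
        any(osa_distance(q, t) <= auto_fuzziness(q) for t in doc_terms)
        for q in query.lower().split()
    ]
    return all(hits) if operator == "and" else any(hits)


print(fuzzy_match("niose cancelling", "Noise cancelling headphones", operator="and"))
```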
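1.2.2 match_phrase (phrase search)
Requires all terms to appear, complete and in order. The slop parameter relaxes this by allowing other words in between.
GET /products/_search
{"query": {"match_phrase": {"description": {
  "query": "high laptop",
  "slop": 1 // allows one extra word in between, so this matches "High performance laptop"
}}}}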
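One way to picture slop: shift each matched term's position in the document by its offset in the phrase; for an exact phrase all shifted positions coincide, and the spread between them is the slop required. The following is a simplified Python model of that check, assuming each query term occurs at most once in the document; the function names are hypothetical.

```python
def required_slop(query_terms, doc_terms):
    """Smallest slop for which doc_terms match the phrase, or None if a term is missing.

    Simplified sloppy-phrase model assuming each query term occurs at most once:
    every term's document position is shifted by its offset in the phrase, and
    the spread of the shifted positions is the slop needed.
    """
    shifted = []
    for offset, term in enumerate(query_terms):
        if term not in doc_terms:
            return None
        shifted.append(doc_terms.index(term) - offset)
    return max(shifted) - min(shifted)


def phrase_matches(query: str, text: str, slop: int = 0) -> bool:
    needed = required_slop(query.lower().split(), text.lower().split())
    return needed is not None and needed <= slop


desc = "High performance laptop"
print(phrase_matches("high laptop", desc, slop=0))  # False: "performance" sits in between
print(phrase_matches("high laptop", desc, slop=1))  # True
print(phrase_matches("laptop high", desc, slop=1))  # False: reversed order costs more slop
```

In this model, reversing the order of two terms costs extra slop, which is why "laptop high" needs a larger slop than "high laptop".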
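1.2.3 match_phrase_prefix (phrase prefix matching)
Works like phrase matching, except that the last term is treated as a prefix.
GET /products/_search
{"query": {"match_phrase_prefix": {"name": {
  "query": "Smart Wa", // matches "Smart Watch" and similar names
  "max_expansions": 10 // caps how many terms the final prefix may expand to
}}}}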
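To make max_expansions concrete: the final term is expanded against the terms in the index, and only a limited number of expansions is kept. The Python sketch below runs that idea over the test product names; the regex tokenizer and picking the first N terms in sorted order are simplifying assumptions, and phrase_prefix_search is a hypothetical helper.

```python
import re

NAMES = [
    "Laptop X1", "Smartphone S10", "Wireless Headphones", "Smart Watch",
    "4K TV", "Bluetooth Speaker", "Gaming Mouse", "External SSD 1TB",
    "Keyboard Pro", "Tablet T8", "Camera DSLR", 'Monitor 27"',
]


def tokenize(text):
    # Rough stand-in for the standard analyzer: lowercase runs of word characters.
    return re.findall(r"\w+", text.lower())


def phrase_prefix_search(query, docs, max_expansions=50):
    """Simplified match_phrase_prefix: all terms but the last must appear as an
    exact phrase, immediately followed by one of the last term's expansions."""
    *head, prefix = tokenize(query)
    vocabulary = sorted({t for d in docs for t in tokenize(d)})
    # Expand the prefix against the term dictionary, keeping at most max_expansions terms.
    expansions = [t for t in vocabulary if t.startswith(prefix)][:max_expansions]
    hits = []
    for d in docs:
        toks = tokenize(d)
        for i in range(len(toks) - len(head)):
            if toks[i:i + len(head)] == head and toks[i + len(head)] in expansions:
                hits.append(d)
                break
    return expansions, hits


print(phrase_prefix_search("Smart Wa", NAMES, max_expansions=10))  # (['watch'], ['Smart Watch'])
```

Note that "Smartphone S10" is not a hit for "Smart Wa": its first term is smartphone, not smart. Setting max_expansions too low can also drop legitimate matches, because expansions beyond the cap are ignored.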
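1.2.4 multi_match (multi-field matching)
The multi_match query runs a match query against several fields at once. It is a convenient way to search for the same keywords across multiple fields, replacing what would otherwise be a more complex hand-written multi-field query.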
GET /products/_search
{"query": {"multi_match": {"query": "portable","fields": ["name", "description", "tags"],"type": "best_fields"}}
}
Because several fields are involved, multi_match must merge the per-field scores into a single relevance score. The type parameter selects how this is done, for example most_fields, best_fields, or cross_fields.
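A simplified Python sketch of how two of these types combine per-field scores: best_fields takes the best single field plus tie_breaker (0 by default) times the other matching fields, while most_fields adds up every matching field. The per-field scores below are hypothetical inputs rather than real BM25 output, and cross_fields, which blends term statistics across fields, is not modelled.

```python
def combine_field_scores(field_scores, type_="best_fields", tie_breaker=0.0):
    """Combine per-field scores like multi_match's best_fields / most_fields types,
    in simplified form. A field with a score of 0 is treated as not matching."""
    scores = [s for s in field_scores.values() if s > 0]
    if not scores:
        return 0.0
    if type_ == "best_fields":
        # Best single field, plus tie_breaker times every other matching field.
        best = max(scores)
        return best + tie_breaker * (sum(scores) - best)
    if type_ == "most_fields":
        # Every matching field adds to the score.
        return sum(scores)
    raise ValueError(f"type not modelled in this sketch: {type_}")


# Hypothetical per-field scores for one document (not real BM25 output).
scores = {"name": 0.0, "description": 1.5, "tags": 0.5}
print(combine_field_scores(scores, "best_fields"))                   # 1.5
print(combine_field_scores(scores, "best_fields", tie_breaker=0.5))  # 1.75
print(combine_field_scores(scores, "most_fields"))                   # 2.0
```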
To emphasize the tags field, we boost it with ^3: scores from the tags field are multiplied by 3, so documents that match on tags receive a higher relevance score.
GET /products/_search
{"query": {"multi_match": {"query": "portable","fields": ["name", "description", "tags^3"], "type": "best_fields"}}
}
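The field^boost notation can be read as "field name, then a score multiplier". The Python sketch below parses the field list from the query above and applies the boosts to hypothetical per-field scores (made-up numbers, not real BM25 output); parse_field is a hypothetical helper, and the comparison assumes best_fields with the default tie_breaker of 0.

```python
def parse_field(spec: str):
    """Split a multi_match field spec such as "tags^3" into (field, boost)."""
    name, _, boost = spec.partition("^")
    return name, float(boost) if boost else 1.0


fields = ["name", "description", "tags^3"]
boosts = dict(parse_field(f) for f in fields)

# Hypothetical raw per-field scores for one document (not real BM25 output).
raw = {"name": 0.0, "description": 1.5, "tags": 1.0}
boosted = {f: raw[f] * boosts[f] for f in raw}

# With best_fields and the default tie_breaker of 0, the highest field score wins.
best_field = max(boosted, key=boosted.get)
print(boosts)      # {'name': 1.0, 'description': 1.0, 'tags': 3.0}
print(best_field)  # tags (it would be description without the ^3 boost)
```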
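1.2.5 query_string (advanced query syntax)
Supports the Lucene query syntax; powerful, but more complex.
For example, find all products whose name or description field contains laptop or smartphone and whose price is between 100 and 1000. The bare terms are searched in the fields listed in fields, while price:[100 TO 1000] names its own field; the square brackets make both bounds inclusive.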
GET /products/_search
{"query": {"query_string": {"query": "(laptop OR smartphone) AND price:[100 TO 1000]","fields": ["name", "description"],"default_operator": "AND"}}
}
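To check what this query should return, here is the same logic hand-translated into a Python predicate over a subset of the test documents. This is not a query_string parser, and the regex tokenizer is a simplification of the analyzer.

```python
import re

# A subset of the test documents (name, description and price only).
products = [
    {"name": "Laptop X1", "description": "High performance laptop", "price": 1299.99},
    {"name": "Smartphone S10", "description": "Latest smartphone model", "price": 899.99},
    {"name": "Tablet T8", "description": "10-inch tablet", "price": 499.99},
    {"name": "Gaming Mouse", "description": "High DPI gaming mouse", "price": 79.99},
]


def terms(doc):
    # Unqualified terms are searched in name and description (the "fields" list).
    return set(re.findall(r"\w+", f"{doc['name']} {doc['description']}".lower()))


def matches(doc):
    text_ok = bool({"laptop", "smartphone"} & terms(doc))  # (laptop OR smartphone)
    price_ok = 100 <= doc["price"] <= 1000  # price:[100 TO 1000], both bounds inclusive
    return text_ok and price_ok  # AND


hits = [d["name"] for d in products if matches(d)]
print(hits)  # ['Smartphone S10']
```

Laptop X1 contains laptop but is excluded by the price range, so only Smartphone S10 remains.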
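1.2.6 simple_query_string
A simpler syntax that is friendlier to direct user input and more forgiving: instead of returning an error for invalid syntax, it ignores the invalid parts of the query.
For example, search for products that meet all of the following conditions:
- Search only the product name (name) and description (description) fields.
- Must contain speaker (expressed as +speaker).
- Must not contain blue (expressed as -blue).
- Contain waterproof (written without a prefix; because default_operator is AND in this query, this term is required as well).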
GET /products/_search
{"query": {"simple_query_string": {"query": "waterproof +speaker -blue", "fields": ["name", "description"],"default_operator": "AND"}}
}
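+ means the term must be present, and - means it must be absent. "default_operator": "AND" means that terms written without a + or - prefix are combined with AND logic (the default is OR). The AND operator raises precision (fewer but more relevant results), while the OR operator raises recall (more results, but possibly some irrelevant ones).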
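Expressed in SQL, this is roughly equivalent to the following. The analogy is loose: full-text search matches whole analyzed terms, so bluetooth does not count as blue, whereas LIKE '%blue%' would also match Bluetooth.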
SELECT * FROM products
WHERE (name LIKE '%speaker%' OR description LIKE '%speaker%')
AND (name NOT LIKE '%blue%' AND description NOT LIKE '%blue%')
AND (name LIKE '%waterproof%' OR description LIKE '%waterproof%')
Note: although the Bluetooth Speaker's metadata.color is blue, the query never looks at metadata.color, so that product is still returned under the conditions above.
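The behaviour described above can be reproduced with a small Python model of this simple_query_string query. It assumes default_operator AND, as in the example (unprefixed and + terms are required, - terms are excluded), uses a regex tokenizer in place of the analyzer, and only looks at name and description; simple_query is a hypothetical helper.

```python
import re

docs = [
    {"name": "Bluetooth Speaker", "description": "Waterproof bluetooth speaker"},
    {"name": "Wireless Headphones", "description": "Noise cancelling headphones"},
]


def tokens(doc, fields=("name", "description")):
    # Lowercased word tokens from the searched fields only (metadata is never examined).
    return set(re.findall(r"\w+", " ".join(doc[f] for f in fields).lower()))


def simple_query(query, doc):
    """Evaluate a query like "waterproof +speaker -blue", assuming default_operator AND."""
    toks = tokens(doc)
    for word in query.lower().split():
        if word.startswith("-"):
            if word[1:] in toks:
                return False  # excluded term is present
        elif word.lstrip("+") not in toks:
            return False  # required term is missing
    return True


hits = [d["name"] for d in docs if simple_query("waterproof +speaker -blue", d)]
print(hits)  # ['Bluetooth Speaker']: the token "bluetooth" is not the token "blue"
print("blue" in "Bluetooth Speaker".lower())  # True: a substring test would exclude it
```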
If the real goal is to exclude blue products, query like this instead:
GET /products/_search
{"query": {"bool": {"must": {"simple_query_string": {"query": "waterproof +speaker","fields": ["name", "description"],"default_operator": "AND"}},"must_not": {"term": {"metadata.color": "blue"}}}}
}
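The difference between the two queries comes down to whether metadata.color is consulted at all. A minimal Python sketch of the bool version for the Bluetooth Speaker document; text_match and color_is are hypothetical helpers, and the tokenizer is a simplification.

```python
import re

doc = {
    "name": "Bluetooth Speaker",
    "description": "Waterproof bluetooth speaker",
    "metadata": {"weight": 0.8, "color": "blue"},
}


def text_match(doc, required=("waterproof", "speaker")):
    # must: every required term appears in name or description
    toks = set(re.findall(r"\w+", f"{doc['name']} {doc['description']}".lower()))
    return all(t in toks for t in required)


def color_is(doc, value):
    # must_not: drop the document if metadata.color is the given value
    return doc.get("metadata", {}).get("color") == value


text_only = text_match(doc)
with_must_not = text_match(doc) and not color_is(doc, "blue")
print(text_only, with_must_not)  # True False
```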
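1.3 Comparison summary table
| Query type | Characteristics | Typical use | Syntax complexity |
|---|---|---|---|
| match | Basic analyzed matching; supports fuzziness | General search | Low |
| match_phrase | Exact phrase matching (optional slop) | Quoted searches, fixed phrases | Low |
| match_phrase_prefix | Phrase matching plus a prefix on the last term | Autocomplete | Medium |
| multi_match | Matching across multiple fields | Cross-field search | Medium |
| query_string | Full Lucene query syntax | Advanced search forms | High |
| simple_query_string | Simplified syntax | Direct user input | Medium |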
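2. Compound queries
The bool query combines clauses of the following kinds:
- must: documents must satisfy the clause, and it contributes to the score.
- must_not: documents must not satisfy the clause. It runs in filter context, so scoring is ignored and it adds nothing to the score.
- filter: documents must satisfy the clause, but scoring is ignored here too; a query made up only of filter or must_not clauses returns a score of 0 for every document. Filter clauses can be cached, which can improve query performance.
- should: optional clauses that add to the score when they match. The minimum number that must match is set by the minimum_should_match parameter (it defaults to 1 when the bool query has should clauses but no must or filter clause, and to 0 otherwise).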
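The clause rules above can be summarized as a small Python evaluator for one document. Clause and bool_query are hypothetical, clause scores are made-up inputs (real ones come from BM25), and the sketch only models which clauses gate matching and which add to the score.

```python
from dataclasses import dataclass


@dataclass
class Clause:
    matched: bool
    score: float = 0.0  # score the clause would contribute in query (scoring) context


def bool_query(must=(), filter_=(), should=(), must_not=(), minimum_should_match=None):
    """Return (matches, score) for one document, given each clause's result on it."""
    if minimum_should_match is None:
        # Defaults to 1 when there are should clauses but no must/filter clause, else 0.
        minimum_should_match = 1 if (should and not (must or filter_)) else 0
    if not all(c.matched for c in must) or not all(c.matched for c in filter_):
        return False, 0.0
    if any(c.matched for c in must_not):
        return False, 0.0
    matched_should = [c for c in should if c.matched]
    if len(matched_should) < minimum_should_match:
        return False, 0.0
    # Only must and matching should clauses add to the score; filter and must_not never do.
    score = sum(c.score for c in must) + sum(c.score for c in matched_should)
    return True, float(score)


print(bool_query(must=[Clause(True, 2.0)], should=[Clause(False), Clause(True, 0.5)]))  # (True, 2.5)
print(bool_query(must=[Clause(True, 2.0)], should=[Clause(False)]))  # (True, 2.0)
print(bool_query(must=[Clause(True, 2.0)], should=[Clause(False)], minimum_should_match=1))  # (False, 0.0)
print(bool_query(filter_=[Clause(True, 9.9)]))  # (True, 0.0): filter ignores scoring
```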
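🚀 In Elasticsearch queries, query and filter serve different purposes:
- query evaluates how relevant each document is and scores the results; it is typically used for search.
- filter narrows down the documents without scoring them; it is typically used for filtering.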
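Business requirement: find relevant products such that:
- Each product must contain electronics in category or description.
- The following products are shown first:
  - products whose description mentions high performance;
  - products tagged popular.
- Products that meet several of these bonus conditions rank higher.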
GET /products/_search
{"query": {"bool": {"must": [{"multi_match": {"query": "electronics","fields": ["category^2", "description"],"type": "most_fields"}}],"should": [{"match_phrase": {"description": {"query": "high performance","slop": 2}}},{"match": {"tags": {"query": "popular"}}}],"minimum_should_match": 1}}
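}
Note that "minimum_should_match": 1 makes at least one should clause mandatory, so electronics products that meet neither bonus condition (such as 4K TV or Tablet T8) are excluded rather than merely ranked lower. To use the should clauses purely as ranking bonuses, as the requirement describes, omit minimum_should_match; because the query has a must clause, it then defaults to 0.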
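To see how minimum_should_match changes the result set of this business query, here is a plain-Python sketch over five of the test products. It ignores real scoring (BM25 and the category^2 boost), reduces the match_phrase clause to a substring test that ignores slop, and ranks by the number of matching should clauses as a crude stand-in for the score.

```python
# Five of the test products (fields trimmed to what the query uses).
products = [
    {"name": "Laptop X1", "category": "electronics", "tags": ["new", "sale"],
     "description": "High performance laptop"},
    {"name": "Smartphone S10", "category": "electronics", "tags": ["new", "popular"],
     "description": "Latest smartphone model"},
    {"name": "Wireless Headphones", "category": "audio", "tags": ["sale", "popular"],
     "description": "Noise cancelling headphones"},
    {"name": "4K TV", "category": "electronics", "tags": ["premium", "large"],
     "description": "55-inch 4K television"},
    {"name": "Tablet T8", "category": "electronics", "tags": ["new", "portable"],
     "description": "10-inch tablet"},
]


def must_matches(p):
    # must: "electronics" in category (keyword, exact) or in the description text
    return p["category"] == "electronics" or "electronics" in p["description"].lower().split()


def should_hits(p):
    # should: phrase check simplified to a substring test (slop ignored), plus a tag check
    return sum(["high performance" in p["description"].lower(), "popular" in p["tags"]])


def run(minimum_should_match):
    hits = [(p["name"], should_hits(p)) for p in products
            if must_matches(p) and should_hits(p) >= minimum_should_match]
    # More matching should clauses first (stable sort keeps input order on ties).
    return [name for name, _ in sorted(hits, key=lambda h: -h[1])]


print(run(1))  # ['Laptop X1', 'Smartphone S10']
print(run(0))  # ['Laptop X1', 'Smartphone S10', '4K TV', 'Tablet T8']
```

Wireless Headphones is tagged popular but fails the must clause, so it never appears in either result.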