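In Chinese search scenarios, users often type pinyin (for example "pingguo", or the initials "pg") to find Chinese content (for example "蘋果手機"). To improve the experience, Elasticsearch can combine a pinyin analyzer with the Completion Suggester to provide pinyin completion.
This article provides a complete, ready-to-use Elasticsearch configuration template for pinyin completion. It supports:
- Chinese input → Chinese completion
- Pinyin input → Chinese completion
- Pinyin-initials input → Chinese completion
- Typo tolerance through fuzzy matching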
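I. Prerequisites
1. Install the pinyin analysis plugin
Elasticsearch does not ship with a pinyin analyzer, so a third-party plugin is required:
# Change to the Elasticsearch home directory
cd /usr/share/elasticsearch
# Install the pinyin analysis plugin (pick the release that matches your ES version)
bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-pinyin/releases/download/v8.11.0/elasticsearch-analysis-pinyin-8.11.0-linux-x86_64.zip
Supported versions: 6.x to 8.x (see the plugin's GitHub project page). The plugin version must match the Elasticsearch version exactly, so confirm the release and the asset file name on the releases page before installing.
The index template in the next section also uses the ik_max_word analyzer, which comes from the separate IK analysis plugin and must be installed as well.
Restart Elasticsearch for the plugins to take effect.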
II. Index configuration template
PUT /products-pinyin
{
  "settings": {
    "analysis": {
      "analyzer": {
        "pinyin_analyzer": {
          "type": "custom",
          "tokenizer": "pinyin",
          "filter": ["lowercase"]
        }
      },
      "tokenizer": {
        "pinyin": {
          "type": "pinyin",
          "keep_separate_first_letter": false,
          "keep_full_pinyin": true,
          "keep_original": true,
          "limit_first_letter_length": 16,
          "lowercase": true,
          "remove_duplicated_term": true
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "ik_max_word",
        "fields": {
          "pinyin": {
            "type": "text",
            "analyzer": "pinyin_analyzer"
          }
        }
      },
      "suggest": {
        "type": "completion",
        "analyzer": "simple",
        "preserve_separators": true,
        "preserve_position_increments": true,
        "max_input_length": 50
      },
      "suggest_pinyin": {
        "type": "completion",
        "analyzer": "pinyin_analyzer",
        "preserve_separators": false,
        "preserve_position_increments": false,
        "max_input_length": 50
      }
    }
  }
}
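III. Field reference
Field | Purpose |
---|---|
title | Original text, used for full-text search |
title.pinyin | Used for pinyin search (e.g. match: { "title.pinyin": "pingguo" }) |
suggest | Completion for Chinese input (e.g. "蘋" → "蘋果手機") |
suggest_pinyin | Completion for pinyin input (e.g. "ping" → "蘋果手機") |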
IV. Indexing a sample document
PUT /products-pinyin/_doc/1
{
  "title": "蘋果手機 iPhone 15",
  "suggest": {
    "input": ["蘋果手機", "iPhone 15", "蘋果", "手機"],
    "weight": 30
  },
  "suggest_pinyin": {
    "input": ["pingguoshouji", "pingguo", "shouji", "pgsj", "pg"],
    "weight": 30
  }
}
The suggest_pinyin input list contains:
- Full pinyin: pingguoshouji
- Initials: pgsj, pg
- Per-word pinyin: pingguo, shouji
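The input variants listed above can be derived mechanically once each word's syllables are known. The following is a minimal sketch, assuming the title has already been segmented into words and converted to pinyin syllables in the application (for example with a pinyin library). `build_pinyin_inputs` is a hypothetical helper name, not part of Elasticsearch or the plugin.

```python
def build_pinyin_inputs(words):
    """Build completion inputs from words given as lists of pinyin syllables.

    `words` is e.g. [["ping", "guo"], ["shou", "ji"]] for 蘋果手機.
    Syllables are assumed to be non-empty strings.
    """
    inputs = []

    def add(value):
        # Keep insertion order and skip duplicates or empty strings.
        if value and value not in inputs:
            inputs.append(value)

    syllables = [s.lower() for word in words for s in word]
    add("".join(syllables))                        # full pinyin of the phrase
    for word in words:
        add("".join(s.lower() for s in word))      # full pinyin of each word
    add("".join(s[0] for s in syllables))          # initials of the phrase
    for word in words:
        add("".join(s[0].lower() for s in word))   # initials of each word
    return inputs


print(build_pinyin_inputs([["ping", "guo"], ["shou", "ji"]]))
```

Because the completion suggester matches by prefix, shorter variants that are already a prefix of a longer one (such as "pg" for "pgsj") are optional; they are kept here only to mirror the sample document.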
V. Querying
1. Chinese prefix completion
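POST /products-pinyin/_search
{
  "suggest": {
    "title-suggest": {
      "prefix": "蘋",
      "completion": { "field": "suggest" }
    }
  }
}
Each suggestion needs a name (title-suggest here, an arbitrary choice), and the typed text goes under prefix.
Response (abbreviated):
"suggest": {
  "title-suggest": [
    {
      "text": "蘋",
      "options": [
        { "text": "蘋果手機", "_score": 30.0 }
      ]
    }
  ]
}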
2. Pinyin prefix completion
POST /products-pinyin/_search
{
  "suggest": {
    "title-suggest": {
      "prefix": "ping",
      "completion": { "field": "suggest_pinyin" }
    }
  }
}
Response (abbreviated):
"options": [
  { "text": "pingguoshouji", "_score": 30.0 }
]
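Note: the text of each option is the matched pinyin input, not the title. The completion field itself cannot carry the title: its value accepts only input, weight, and (for context-enabled fields) contexts, so adding a _source key there is rejected. This is not needed anyway, because each option in the response already includes the _source of the matching document, and the title can be read from options[]._source.title. To keep responses small, restrict the returned fields with source filtering:
POST /products-pinyin/_search
{
  "_source": ["title"],
  "suggest": {
    "title-suggest": {
      "prefix": "ping",
      "completion": { "field": "suggest_pinyin" }
    }
  }
}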
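In the application layer, mapping pinyin matches back to titles comes down to reading `_source` from each returned option. The sketch below assumes the suggestion was named `title-suggest` in the request (an arbitrary name) and runs against a hand-written sample shaped like a suggest response, not real cluster output. `titles_from_suggest` is a hypothetical helper.

```python
def titles_from_suggest(response, suggester_name):
    """Return de-duplicated titles from a completion suggest response."""
    titles = []
    for entry in response.get("suggest", {}).get(suggester_name, []):
        for option in entry.get("options", []):
            # Fall back to the matched input text if _source was filtered out.
            title = option.get("_source", {}).get("title", option["text"])
            if title not in titles:
                titles.append(title)
    return titles


# Hand-written sample for illustration only.
sample = {
    "suggest": {
        "title-suggest": [
            {
                "text": "ping",
                "offset": 0,
                "length": 4,
                "options": [
                    {
                        "text": "pingguoshouji",
                        "_id": "1",
                        "_score": 30.0,
                        "_source": {"title": "蘋果手機 iPhone 15"},
                    }
                ],
            }
        ]
    }
}

print(titles_from_suggest(sample, "title-suggest"))
```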
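3. Pinyin-initials completion
POST /products-pinyin/_search
{
  "suggest": {
    "title-suggest": {
      "prefix": "pgs",
      "completion": { "field": "suggest_pinyin" }
    }
  }
}
This matches any document whose input list has an entry starting with pgs.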
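4. Fuzzy pinyin completion (typo tolerance)
POST /products-pinyin/_search
{
  "suggest": {
    "title-suggest": {
      "prefix": "pinggou",
      "completion": {
        "field": "suggest_pinyin",
        "fuzzy": {
          "fuzziness": 1,
          "transpositions": true
        }
      }
    }
  }
}
This matches pingguo: with transpositions enabled, swapping the adjacent letters o and u counts as a single edit, so the edit distance is 1.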
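To see why "pinggou" is within distance 1 of "pingguo" only when transpositions are counted as one edit, the distance can be computed by hand. The function below is a plain-Python sketch of the optimal string alignment variant of Damerau-Levenshtein distance. It illustrates the distance measure only; Elasticsearch evaluates fuzzy matches internally in a different way, and `osa_distance` is a hypothetical name.

```python
def osa_distance(a, b, transpositions=True):
    """Edit distance with optional adjacent transpositions (optimal string alignment)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                # Swap of two adjacent characters counts as one edit.
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(osa_distance("pinggou", "pingguo"))                        # with transpositions
print(osa_distance("pinggou", "pingguo", transpositions=False))  # without
```

With transpositions disabled the same pair needs two substitutions, so a fuzziness of 1 would no longer be enough to match it.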
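VI. Optimization tips
Scenario | Recommendation |
---|---|
Input performance | Pre-generate full pinyin and initials at index time instead of computing them per request |
Storage | suggest_pinyin.input can hold many entries; cap the number of inputs per document and use max_input_length to bound the length of each one |
Weighting | Give popular products a higher weight |
Caching | Cache results for high-frequency prefixes in the application layer (e.g. "i", "ip", "iph") |
Mixed input | Support mixed English and pinyin input |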
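The caching row can be sketched with the standard library alone. This is a minimal illustration, assuming an in-process cache is acceptable; `fetch_from_backend` is a stand-in that returns canned data instead of calling Elasticsearch, and all names here are hypothetical.

```python
from functools import lru_cache

calls = []  # records which prefixes actually reached the backend


def fetch_from_backend(prefix):
    # Stand-in for the real suggest request; returns canned data.
    calls.append(prefix)
    catalog = ["iphone 15", "ipad", "pingguoshouji"]
    return tuple(item for item in catalog if item.startswith(prefix))


@lru_cache(maxsize=1024)
def cached_suggestions(prefix):
    # Results are tuples so callers cannot mutate the cached value.
    return fetch_from_backend(prefix)


def suggest(user_input):
    # Normalise before the lookup so "IP " and "ip" share one cache entry.
    return cached_suggestions(user_input.strip().lower())


suggest("ip")
suggest("IP ")   # served from the cache
suggest("pin")
print(calls)
```

`lru_cache` has no expiry, so entries stay until evicted by size. If weights or documents change often, a cache with a time-to-live would be a better fit.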
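VII. End-to-end completion flow (application layer)
The code below uses the official Python client with 8.x-style keyword arguments. The cluster address and the is_chinese / is_pinyin helpers are simple illustrative assumptions.
import re
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed local cluster address
SUGGESTER = "title-suggest"  # arbitrary suggestion name

def is_chinese(text):
    # True if the input contains at least one CJK ideograph.
    return re.search(r"[\u4e00-\u9fff]", text) is not None

def is_pinyin(text):
    # Rough check: ASCII letters only (full pinyin or initials).
    return re.fullmatch(r"[A-Za-z]+", text) is not None

def get_suggestions(user_input):
    if is_chinese(user_input):
        # 1. Chinese input: query the `suggest` field.
        completion = {"field": "suggest", "size": 10}
    elif is_pinyin(user_input):
        # 2. Pinyin input: query `suggest_pinyin` with fuzzy matching.
        completion = {"field": "suggest_pinyin", "size": 10, "fuzzy": {"fuzziness": 1}}
    else:
        return []
    res = es.search(
        index="products-pinyin",
        suggest={SUGGESTER: {"prefix": user_input.lower(), "completion": completion}},
    )
    suggestions = []
    for opt in res["suggest"][SUGGESTER][0]["options"]:
        # Each option carries the document's _source, so the title is read directly.
        title = opt["_source"]["title"]
        if title not in suggestions:
            suggestions.append(title)
    return suggestions[:10]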
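VIII. Extensions
Scenario | Suggested approach |
---|---|
Dynamic pinyin generation | Use an ingest pipeline to populate the pinyin field at index time |
Mixed pinyin + Chinese completion | Use a multi_match query over title.pinyin and title |
Personalized completion | Adjust weight based on each user's history |
Cold start | Seed the index with popular terms curated by the operations team |
Performance monitoring | Monitor suggest query latency and hit rate |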
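For the mixed pinyin and Chinese row, a request body along the following lines could be used. It is a sketch rather than a tuned query: the `title^2` boost and the `build_mixed_query` helper are assumptions made for illustration, and the field names come from the index template in section II.

```python
import json


def build_mixed_query(user_input, size=10):
    """Build a search body that matches either Chinese text or pinyin."""
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": user_input,
                # Assumed boost: prefer matches on the original text.
                "fields": ["title^2", "title.pinyin"],
                "type": "best_fields",
            }
        },
    }


body = build_mixed_query("pingguo 手機")
print(json.dumps(body, ensure_ascii=False, indent=2))
```

The resulting dictionary can be sent as the body of `POST /products-pinyin/_search`. Unlike the completion suggester, this is a regular full-text query, so it matches whole analyzed terms rather than prefixes.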
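IX. Generating pinyin automatically with an ingest pipeline (optional)
PUT /_ingest/pipeline/add_pinyin_suggest
{
  "description": "Automatically add the pinyin completion field",
  "processors": [
    {
      "script": {
        "lang": "painless",
        "source": """
          // Painless cannot call external services or convert Hanzi to pinyin,
          // so the values below are fixed placeholders for illustration only.
          // In practice, compute the pinyin inputs before the document reaches the pipeline.
          ctx.suggest_pinyin = [];
          ctx.suggest_pinyin.add('pingguoshouji');
          ctx.suggest_pinyin.add('pgsj');
        """
      }
    }
  ]
}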
Use the pipeline when indexing:
PUT /products-pinyin/_doc/2?pipeline=add_pinyin_suggest
{
  "title": "華為手機"
}