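Technology Middle Platform and Search

Table of Contents

5. Language Processing and Autocompletion: A Technical Exploration
- 5.1 Custom Corpus
  - 5.1.1 Corpus Mapping OpenAPI
  - 5.1.2 Corpus Document OpenAPI
- 5.2 Product Search and Autocompletion
  - 5.2.1 Chinese-character Completion OpenAPI
  - 5.2.2 Pinyin Completion OpenAPI
- 5.3 Product Search and Language Processing
  - 5.3.1 What Is Language Processing (Spelling Correction)
  - 5.3.2 Language Processing OpenAPI
- 5.4 Summary
6. Product Recommendation for the E-commerce Platform
- 6.1 What Is Search Recommendation
- 6.2 Product Recommendation OpenAPI
7. Metric Aggregation and Drill-down Analysis
- 7.1 Metric Aggregations and Their Categories
- 7.2 Designing Metric Aggregation and Drill-down
  - 7.2.1 Setting Up the Base Framework
  - 7.2.2 Single-value Analysis API Design
  - 7.2.3 Multi-value Analysis API Design
8. Log Instrumentation and Hot Search Terms for the E-commerce Platform
- 8.1 What Is Hot Search
- 8.2 Extracting Hot Search Terms
- 8.3 Log Instrumentation
- 8.4 Persisting the Data
- 8.5 Hot Search OpenAPI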

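5. Language Processing and Autocompletion: A Technical Exploration

Target behavior

The end result resembles the JD.com search box: suggestions appear while the user types. Typing full pinyin or pinyin initials produces the same suggestions.

Typing Chinese characters
[Screenshot: suggestions for Chinese-character input]

Typing pinyin
[Screenshot: suggestions for pinyin input]

Typing initials
[Screenshot: suggestions for initial-letter input]

5.1 Custom Corpus

5.1.1 Corpus Mapping OpenAPI

Index mapping OpenAPI

Define the index (mapping) interface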

/**
 * Index operations interface.
 */
public interface ElasticsearchIndexService {
    // Create an index together with its settings and mapping
    boolean addIndexAndMapping(CommonEntity commonEntity) throws Exception;
}

Implement the index (mapping) interface

@Override
public boolean addIndexAndMapping(CommonEntity commonEntity) throws Exception {
    boolean flag = false;
    // Build the create-index request
    CreateIndexRequest request = new CreateIndexRequest(commonEntity.getIndexName());
    // Parameters supplied by the calling service
    Map<String, Object> map = commonEntity.getMap();
    for (Map.Entry<String, Object> entry : map.entrySet()) {
        // Apply "settings" when it is a non-empty map
        if ("settings".equals(entry.getKey()) && entry.getValue() instanceof Map && ((Map) entry.getValue()).size() > 0) {
            request.settings((Map) entry.getValue());
        }
        // Apply "mapping" when it is a non-empty map
        if ("mapping".equals(entry.getKey()) && entry.getValue() instanceof Map && ((Map) entry.getValue()).size() > 0) {
            request.mapping((Map) entry.getValue());
        }
    }
    // Client for index-level operations
    IndicesClient indicesClient = client.indices();
    // Create the index and read the acknowledgement
    CreateIndexResponse response = indicesClient.create(request, RequestOptions.DEFAULT);
    flag = response.isAcknowledged();
    return flag;
}

Add the controller

/**
 * Index operations controller.
 */
@RestController
@RequestMapping("v1/indices")
public class ElasticsearchIndexController {
    private static final Logger logger = LoggerFactory.getLogger(ElasticsearchIndexController.class);

    @Autowired
    ElasticsearchIndexService elasticsearchIndexService;

    @PostMapping(value = "/add")
    public ResponseData addIndexAndMapping(@RequestBody CommonEntity commonEntity) {
        // Response object returned to the caller
        ResponseData rData = new ResponseData();
        if (StringUtils.isEmpty(commonEntity.getIndexName())) {
            rData.setResultEnum(ResultEnum.param_isnull);
            return rData;
        }
        // Whether the index (and mapping) was created
        boolean isSuccess = false;
        try {
            // Create the index and mapping through the service interface
            isSuccess = elasticsearchIndexService.addIndexAndMapping(commonEntity);
            // Fill the response; the payload type is resolved by generic type inference and autoboxing
            rData.setResultEnum(isSuccess, ResultEnum.success, null);
            logger.info(TipsEnum.create_index_success.getMessage());
        } catch (Exception e) {
            // Print the stack trace to the console
            e.printStackTrace();
            logger.error(TipsEnum.create_index_fail.getMessage());
            // Build the error response
            rData.setResultEnum(ResultEnum.error);
        }
        return rData;
    }
}

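Create the mapping

http://127.0.0.1:8888/v1/indices/add

Parameters
A custom analyzer, ik_pinyin_analyzer, which combines the ik tokenizer with a pinyin token filter

Tip
The pinyin plugin must be installed before the mapping is created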

{"indexName": "product_completion_index","map": {"settings": {"number_of_shards": 1,"number_of_replicas": 2,"analysis": {"analyzer": {"ik_pinyin_analyzer": {"type": "custom","tokenizer": "ik_smart","filter": "pinyin_filter"}},"filter": {"pinyin_filter": {"type": "pinyin","keep_first_letter": true,"keep_separate_first_letter": false,"keep_full_pinyin": true,"keep_original": true,"limit_first_letter_length": 16,"lowercase": true,"remove_duplicated_term": true}}}},"mapping": {"properties": {"name": {"type": "keyword"},"searchkey": {"type": "completion","analyzer": "ik_pinyin_analyzer"}}}}
}

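Everything under settings is the index configuration; the parameters are passed through dynamically and follow the Elasticsearch DSL.
Everything under mapping describes the mapped fields; the parameters are passed through dynamically and follow the Elasticsearch DSL.

Option	Description
keep_first_letter	When enabled, emits the initials, e.g. 劉德華 > ldh. Default: true
keep_separate_first_letter	When enabled, emits each initial as a separate term, e.g. 劉德華 > l, d, h. Default: false. Note: results may become too fuzzy because such short terms are very frequent
limit_first_letter_length	Maximum length of the first_letter result. Default: 16
keep_full_pinyin	When enabled, emits the full pinyin of each character, e.g. 劉德華 > [liu, de, hua]. Default: true
keep_joined_full_pinyin	When enabled, emits the joined full pinyin, e.g. 劉德華 > [liudehua]. Default: false
keep_none_chinese	Keeps non-Chinese letters and digits in the result. Default: true
keep_none_chinese_together	Keeps non-Chinese letters together. Default: true, e.g. DJ音樂家 -> DJ, yin, yue, jia; when set to false, e.g. DJ音樂家 -> D, J, yin, yue, jia. Note: keep_none_chinese must be enabled first
keep_none_chinese_in_first_letter	Keeps non-Chinese letters in the initials, e.g. 劉德華 AT2016 -> ldhat2016. Default: true
keep_none_chinese_in_joined_full_pinyin	Keeps non-Chinese letters in the joined full pinyin, e.g. 劉德華 2016 -> liudehua2016. Default: false
none_chinese_pinyin_tokenize	Splits non-Chinese letters into separate pinyin terms when they form pinyin. Default: true, e.g. liudehuaalibaba13zhuanghan -> liu, de, hua, a, li, ba, ba, 13, zhuang, han. Note: keep_none_chinese and keep_none_chinese_together must be enabled first
keep_original	When enabled, the original input is kept as well. Default: false
lowercase	Lowercases non-Chinese letters. Default: true
trim_whitespace	Default: true
remove_duplicated_term	When enabled, duplicate terms are removed to save index space, e.g. de的 > de. Default: false. Note: position-related queries may be affected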
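
To make the effect of a few of these options concrete, here is a minimal sketch. It is not the pinyin plugin's implementation: the class `PinyinOptionsSketch` and its `terms` method are hypothetical, the syllables are supplied by hand (the real plugin derives them from its own dictionary), and the order of the emitted terms is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only, NOT the pinyin plugin's code.
final class PinyinOptionsSketch {

    // syllables: the pinyin of each character, supplied by hand (assumption).
    static List<String> terms(List<String> syllables, boolean keepFullPinyin,
                              boolean keepJoinedFullPinyin, boolean keepFirstLetter,
                              int limitFirstLetterLength) {
        List<String> out = new ArrayList<>();
        if (keepFullPinyin) {
            out.addAll(syllables);                  // keep_full_pinyin: liu, de, hua
        }
        if (keepJoinedFullPinyin) {
            out.add(String.join("", syllables));    // keep_joined_full_pinyin: liudehua
        }
        if (keepFirstLetter) {
            StringBuilder initials = new StringBuilder();
            for (String s : syllables) {
                // limit_first_letter_length caps the length of the initials term
                if (!s.isEmpty() && initials.length() < limitFirstLetterLength) {
                    initials.append(s.charAt(0));
                }
            }
            out.add(initials.toString());           // keep_first_letter: ldh
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(terms(Arrays.asList("liu", "de", "hua"), true, false, true, 16));
    }
}
```

With the settings used in the mapping above (full pinyin and initials enabled), a prefix such as "xm" or "xiaomi" can match the same suggestion as the Chinese-character prefix.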

Response
[Screenshot: response of the create-index call]

5.1.2 Corpus Document OpenAPI

Define the bulk-insert interface

    // Bulk-insert documents
    public RestStatus bulkAndDoc(CommonEntity commonEntity) throws Exception;

Implement the bulk-insert interface

    @Override
    public RestStatus bulkAndDoc(CommonEntity commonEntity) throws Exception {
        // Build the bulk request against the target index
        BulkRequest bulkRequest = new BulkRequest(commonEntity.getIndexName());
        // Add one index request per document supplied by the caller
        for (int i = 0; i < commonEntity.getList().size(); i++) {
            bulkRequest.add(new IndexRequest().source(XContentType.JSON, SearchTools.mapToObjectGroup(commonEntity.getList().get(i))));
        }
        // Execute the bulk insert
        BulkResponse bulkResponse = client.bulk(bulkRequest, RequestOptions.DEFAULT);
        return bulkResponse.status();
    }

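Official documentation
[Screenshot: the source(XContentType, Object...) form in the official documentation]

As the screenshot shows, the document has to be passed in the alternating key/value form it highlights.
That is why SearchTools.mapToObjectGroup above converts each map into an array.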
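
The source of `SearchTools.mapToObjectGroup` is not shown in this article, so the following is only a guess at what such a helper might look like: it flattens a map into an alternating key/value array, which is the layout a varargs `source(XContentType, Object...)` call expects. The class name `MapToObjectGroupSketch` is hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical reconstruction; the article does not show the real SearchTools helper.
final class MapToObjectGroupSketch {

    // Flattens {k1=v1, k2=v2} into [k1, v1, k2, v2].
    static Object[] mapToObjectGroup(Map<String, Object> map) {
        Object[] group = new Object[map.size() * 2];
        int i = 0;
        for (Map.Entry<String, Object> e : map.entrySet()) {
            group[i++] = e.getKey();
            group[i++] = e.getValue();
        }
        return group;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("searchkey", "adidas");
        doc.put("name", "adidas");
        System.out.println(Arrays.toString(mapToObjectGroup(doc)));
    }
}
```

A LinkedHashMap is used in the demo only so that the field order in the printed array is predictable.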

Define the bulk-insert controller

    @PostMapping(value = "/batch")
    public ResponseData bulkAndDoc(@RequestBody CommonEntity commonEntity) {
        // Response object returned to the caller
        ResponseData rData = new ResponseData();
        if (StringUtils.isEmpty(commonEntity.getIndexName()) || CollectionUtils.isEmpty(commonEntity.getList())) {
            rData.setResultEnum(ResultEnum.param_isnull);
            return rData;
        }
        // Status of the bulk operation
        RestStatus result = null;
        try {
            // Run the bulk insert through the service interface
            result = elasticsearchDocumentService.bulkAndDoc(commonEntity);
            // Fill the response; the payload type is resolved by generic type inference
            rData.setResultEnum(result, ResultEnum.success, null);
            logger.info(TipsEnum.batch_create_doc_success.getMessage());
        } catch (Exception e) {
            // Print the stack trace to the console
            e.printStackTrace();
            logger.error(TipsEnum.batch_create_doc_fail.getMessage());
            // Build the error response
            rData.setResultEnum(ResultEnum.error);
        }
        return rData;
    }

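Call the bulk-insert endpoint

http://127.0.0.1:8888/v1/docs/batch

Parameters
A suggestion corpus of 23 entries (小米手機 is deliberately listed twice to verify deduplication)

Tip
Once aggregations have been covered, this corpus can be maintained through log instrumentation and data persistence (section 8)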

{"indexName": "product_completion_index","list": [{"searchkey": "小米手機","name": "小米(MI)"},{"searchkey": "小米10","name": "小米(MI)"},{"searchkey": "小米電視","name": "小米(MI)"},{"searchkey": "小米路由器","name": "小米(MI)"},{"searchkey": "小米9","name": "小米(MI)"},{"searchkey": "小米手機","name": "小米(MI)"},{"searchkey": "小米耳環","name": "小米(MI)"},{"searchkey": "小米8","name": "小米(MI)"},{"searchkey": "小米10Pro","name": "小米(MI)"},{"searchkey": "小米筆記本","name": "小米(MI)"},{"searchkey": "小米攝像頭","name": "小米(MI)"},{"searchkey": "小米電飯煲","name": "小米(MI)"},{"searchkey": "小米充電寶","name": "小米(MI)"},{"searchkey": "adidas男鞋","name": "adidas男鞋"},{"searchkey": "adidas女鞋","name": "adidas女鞋"},{"searchkey": "adidas外套","name": "adidas外套"},{"searchkey": "adidas褲子","name": "adidas褲子"},{"searchkey": "adidas官方旗艦店","name": "adidas官方旗艦店"},{"searchkey": "阿迪達斯襪子","name": "阿迪達斯襪子"},{"searchkey": "阿迪達斯外套","name": "阿迪達斯外套"},{"searchkey": "阿迪達斯運動鞋","name": "阿迪達斯運動鞋"},{"searchkey": "耐克外套","name": "耐克外套"},{"searchkey": "耐克運動鞋","name": "耐克運動鞋"}]
}

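Response
[Screenshot: response of the bulk-insert call]

Inspect the indexed documents

GET product_completion_index/_search

[Screenshot: search result listing the corpus documents]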

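5.2 Product Search and Autocompletion

[Figure: the suggester types offered by Elasticsearch]

Term suggester: tokenizes the input text and offers suggestions for each individual token.
Phrase suggester: builds on the term suggester and also takes the relationship between multiple terms into account.
Completion suggester: aimed at the search-as-you-type ("auto completion") scenario.
Context suggester: a completion suggester that can filter or boost suggestions by context.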

GET product_completion_index/_search
{"from": 0,"size": 100,"suggest": {"czbk-suggest": {"prefix": "小米","completion": {"field": "searchkey","size": 20,"skip_duplicates": true}}}
}

5.2.1 Chinese-character Completion OpenAPI

Define the autocompletion interface

    // Autocompletion (completion suggester)
    public List<String> cSuggest(CommonEntity commonEntity) throws Exception;

Implement the autocompletion interface

    @Override
    public List<String> cSuggest(CommonEntity commonEntity) throws Exception {
        // Suggestions to return
        List<String> suggestList = new ArrayList<>();
        // Completion suggestion builder for the configured field
        CompletionSuggestionBuilder completionSuggestionBuilder = SuggestBuilders.completionSuggestion(commonEntity.getSuggestFileld());
        // Prefix typed by the user
        completionSuggestionBuilder.prefix(commonEntity.getSuggestValue());
        // Skip duplicate suggestions
        completionSuggestionBuilder.skipDuplicates(true);
        // Number of suggestions to return
        completionSuggestionBuilder.size(commonEntity.getSuggestCount());
        // Build the search request carrying the suggester
        SearchRequest searchRequest = new SearchRequest().indices(commonEntity.getIndexName())
                .source(new SearchSourceBuilder()
                        .sort(new ScoreSortBuilder().order(SortOrder.DESC))
                        .suggest(new SuggestBuilder().addSuggestion("czbk-suggest", completionSuggestionBuilder)));
        // Execute the search
        SearchResponse response = client.search(searchRequest, RequestOptions.DEFAULT);
        // Read the completion suggestion by its name
        CompletionSuggestion completionSuggestion = response.getSuggest().getSuggestion("czbk-suggest");
        // Options of the first entry
        List<CompletionSuggestion.Entry.Option> optionList = completionSuggestion.getEntries().get(0).getOptions();
        // Collect the suggestion texts
        if (!CollectionUtils.isEmpty(optionList)) {
            optionList.forEach(item -> suggestList.add(item.getText().toString()));
        }
        return suggestList;
    }

Define the autocompletion controller

    @GetMapping(value = "/csuggest")
    public ResponseData cSuggest(@RequestBody CommonEntity commonEntity) {
        // Response object returned to the caller
        ResponseData rData = new ResponseData();
        if (StringUtils.isEmpty(commonEntity.getIndexName()) || StringUtils.isEmpty(commonEntity.getSuggestFileld()) || StringUtils.isEmpty(commonEntity.getSuggestValue())) {
            rData.setResultEnum(ResultEnum.param_isnull);
            return rData;
        }
        // Suggestions returned by the service
        List<String> result = null;
        try {
            // Fetch completion suggestions through the service interface
            result = elasticsearchDocumentService.cSuggest(commonEntity);
            // Fill the response; the payload type is resolved by generic type inference
            rData.setResultEnum(result, ResultEnum.success, result.size());
            logger.info(TipsEnum.csuggest_get_doc_success.getMessage());
        } catch (Exception e) {
            // Print the stack trace to the console
            e.printStackTrace();
            logger.error(TipsEnum.csuggest_get_doc_fail.getMessage());
            // Build the error response
            rData.setResultEnum(ResultEnum.error);
        }
        return rData;
    }

Verify the autocompletion call

http://192.168.150.7:6666/v1/docs/csuggest
or
http://localhost:6666/v1/docs/csuggest

Parameters

{"indexName": "product_completion_index","suggestFileld": "searchkey","suggestValue": "小米","suggestCount": 13
}

indexName: index name
suggestFileld: field used for autocompletion
suggestValue: keyword typed by the user
suggestCount: number of suggestions to return (JD.com returns 13)

{"code": "200","desc": "操作成功！","data": ["小米10","小米10Pro","小米8","小米9","小米充電寶","小米手機","小米攝像頭","小米電視","小米電飯煲","小米筆記本","小米耳環","小米路由器"],"count": 12
}

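The duplicate entry has been removed automatically: 13 documents start with 小米, but only 12 suggestions are returned.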
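
The combination of prefix matching, `skip_duplicates` and `size` can be pictured with a small in-memory sketch. This is an illustration only, not how Elasticsearch implements the completion suggester (which uses a dedicated in-memory structure and orders results by weight); the class `PrefixSuggestSketch` is hypothetical and keeps corpus order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory illustration of prefix + dedup + size.
final class PrefixSuggestSketch {

    static List<String> suggest(List<String> corpus, String prefix, int size) {
        // LinkedHashSet drops duplicates while keeping first-seen order
        Set<String> unique = new LinkedHashSet<>();
        for (String entry : corpus) {
            if (unique.size() >= size) {
                break;                      // enough suggestions collected
            }
            if (entry.startsWith(prefix)) {
                unique.add(entry);
            }
        }
        return new ArrayList<>(unique);
    }

    public static void main(String[] args) {
        List<String> corpus = Arrays.asList("adidas shoes", "adidas coat", "adidas shoes", "nike coat");
        System.out.println(suggest(corpus, "adidas", 13));
    }
}
```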

5.2.2 Pinyin Completion OpenAPI

Look up 小米 by typing pinyin

http://localhost:8888/v1/docs/csuggest

Parameters

// Full pinyin
{"indexName": "product_completion_index","suggestFileld": "searchkey","suggestValue": "xiaomi","suggestCount": 13
}
// Full pinyin, separated by a space
{"indexName": "product_completion_index","suggestFileld": "searchkey","suggestValue": "xiao mi","suggestCount": 13
}
// Initials only
{"indexName": "product_completion_index","suggestFileld": "searchkey","suggestValue": "xm","suggestCount": 13
}

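These requests only work when the pinyin plugin is installed.

5.3 Product Search and Language Processing

5.3.1 What Is Language Processing (Spelling Correction)

Scenario
For example, the misspelled input [adidaas官方旗艦店] should be corrected to [adidas官方旗艦店].

[Screenshot: spelling correction in the search box]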
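
Spelling correction of this kind rests on the idea that the misspelled term is only a few single-character edits away from an indexed term. As a rough illustration (a textbook Levenshtein distance, not the code Elasticsearch or Lucene runs internally; the class `EditDistanceSketch` is hypothetical), the distance between "adidaas" and "adidas" is 1:

```java
// Hypothetical illustration: classic Levenshtein edit distance with two rolling rows.
final class EditDistanceSketch {

    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;                        // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;                        // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("adidaas", "adidas"));
    }
}
```

A small distance such as 1 or 2 is what makes a candidate term a plausible correction.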

5.3.2 Language Processing OpenAPI

GET product_completion_index/_search
{"suggest": {"czbk-suggestion": {"text": "adidaas官方旗艦店","phrase": {"field": "name","size": 13}}}
}

Define the spelling-correction interface

    // Spelling correction
    public String pSuggest(CommonEntity commonEntity) throws Exception;

Implement the spelling-correction interface

    @Override
    public String pSuggest(CommonEntity commonEntity) throws Exception {
        // Corrected text to return (empty when there is no suggestion)
        String pSuggestString = "";
        // Phrase suggestion builder for the configured field
        PhraseSuggestionBuilder phraseSuggestionBuilder = new PhraseSuggestionBuilder(commonEntity.getSuggestFileld());
        // Text typed by the user
        phraseSuggestionBuilder.text(commonEntity.getSuggestValue());
        // Only the best suggestion is needed
        phraseSuggestionBuilder.size(1);
        // Build the search request carrying the suggester
        SearchRequest searchRequest = new SearchRequest().indices(commonEntity.getIndexName())
                .source(new SearchSourceBuilder()
                        .sort(new ScoreSortBuilder().order(SortOrder.DESC))
                        .suggest(new SuggestBuilder().addSuggestion("czbk-suggest", phraseSuggestionBuilder)));
        // Execute the search
        SearchResponse response = client.search(searchRequest, RequestOptions.DEFAULT);
        // Read the phrase suggestion by its name
        PhraseSuggestion phraseSuggestion = response.getSuggest().getSuggestion("czbk-suggest");
        // Options of the first entry
        List<PhraseSuggestion.Entry.Option> optionList = phraseSuggestion.getEntries().get(0).getOptions();
        // Take the top option, if any
        if (!CollectionUtils.isEmpty(optionList)) {
            pSuggestString = optionList.get(0).getText().toString();
        }
        return pSuggestString;
    }

Define the spelling-correction controller

    @GetMapping(value = "/psuggest")
    public ResponseData pSuggest(@RequestBody CommonEntity commonEntity) {
        // Response object returned to the caller
        ResponseData rData = new ResponseData();
        if (StringUtils.isEmpty(commonEntity.getIndexName()) || StringUtils.isEmpty(commonEntity.getSuggestFileld()) || StringUtils.isEmpty(commonEntity.getSuggestValue())) {
            rData.setResultEnum(ResultEnum.param_isnull);
            return rData;
        }
        // Corrected text returned by the service
        String result = null;
        try {
            // Run the spelling correction through the service interface
            result = elasticsearchDocumentService.pSuggest(commonEntity);
            // Fill the response; the payload type is resolved by generic type inference
            rData.setResultEnum(result, ResultEnum.success, null);
            logger.info(TipsEnum.psuggest_get_doc_success.getMessage());
        } catch (Exception e) {
            // Print the stack trace to the console
            e.printStackTrace();
            logger.error(TipsEnum.psuggest_get_doc_fail.getMessage());
            // Build the error response
            rData.setResultEnum(ResultEnum.error);
        }
        return rData;
    }

Verify the language-processing call

http://192.168.150.7:6666/v1/docs/psuggest
or
http://localhost:6666/v1/docs/psuggest

Parameters
```
{"indexName": "product_completion_index","suggestFileld": "name","suggestValue": "adidaas官方旗艦店"
}
```
- indexName: index name
- suggestFileld: field the correction is looked up in
- suggestValue: keyword typed by the user

Response
```
{"code": "200","desc": "操作成功！","data": "adidas官方旗艦店"
}
```

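5.4 Summary

Keep a dedicated suggestion corpus (search-term index) separate from the business index, so the corpus can be maintained and upgraded on its own.
Query the corpus with the tokenized input and any other search conditions, and return a limited number of entries (JD.com returns 13, Taobao/Tmall 10, Baidu 4).
To improve accuracy, suggestions are usually matched by prefix.

6. Product Recommendation for the E-commerce Platform

6.1 What Is Search Recommendation

[Screenshot: a search with no direct match and the recommended alternatives]

For example, the user enters the keyword [阿迪達斯耐克外套運動鞋襪子].

The site responds along these lines: "Woof~ No products were found for '阿迪達斯耐克外套運動鞋襪子'. Here are products related to '阿迪達斯耐克運動鞋', or try:"

6.2 Product Recommendation OpenAPI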

GET product_completion_index/_search
{"suggest": {"czbk-suggestion": {"text": "阿迪達斯 耐克 外套 運動鞋 襪子","term": {"field": "name","min_word_length": 2,"string_distance": "ngram","analyzer": "ik_smart"}}}
}

For the caveats, see the official documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/7.4/search-suggesters.html#term-suggester

[Screenshot: term suggester section of the official documentation]

Define the search-recommendation interface

    // Search recommendation
    public String tSuggest(CommonEntity commonEntity) throws Exception;

Implement the search-recommendation interface

    @Override
    public String tSuggest(CommonEntity commonEntity) throws Exception {
        // Recommended term to return (empty when there is no suggestion)
        String tSuggestString = "";
        // Term suggestion builder for the configured field
        TermSuggestionBuilder termSuggestionBuilder = SuggestBuilders.termSuggestion(commonEntity.getSuggestFileld());
        // Text typed by the user
        termSuggestionBuilder.text(commonEntity.getSuggestValue());
        // Analyzer used to tokenize the text
        termSuggestionBuilder.analyzer("ik_smart");
        // Minimum length a token must have to be considered
        termSuggestionBuilder.minWordLength(2);
        // String distance algorithm used to compare terms
        termSuggestionBuilder.stringDistance(TermSuggestionBuilder.StringDistanceImpl.NGRAM);
        // Build the search request carrying the suggester
        SearchRequest searchRequest = new SearchRequest().indices(commonEntity.getIndexName())
                .source(new SearchSourceBuilder()
                        .sort(new ScoreSortBuilder().order(SortOrder.DESC))
                        .suggest(new SuggestBuilder().addSuggestion("czbk-suggest", termSuggestionBuilder)));
        // Execute the search
        SearchResponse response = client.search(searchRequest, RequestOptions.DEFAULT);
        // Read the term suggestion by its name
        TermSuggestion termSuggestion = response.getSuggest().getSuggestion("czbk-suggest");
        // Options of the first entry (the first token of the input)
        List<TermSuggestion.Entry.Option> optionList = termSuggestion.getEntries().get(0).getOptions();
        // Take the top option, if any
        if (!CollectionUtils.isEmpty(optionList)) {
            tSuggestString = optionList.get(0).getText().toString();
        }
        return tSuggestString;
    }

Define the search-recommendation controller

    @GetMapping(value = "/tsuggest")
    public ResponseData tSuggest(@RequestBody CommonEntity commonEntity) {
        // Response object returned to the caller
        ResponseData rData = new ResponseData();
        if (StringUtils.isEmpty(commonEntity.getIndexName()) || StringUtils.isEmpty(commonEntity.getSuggestFileld()) || StringUtils.isEmpty(commonEntity.getSuggestValue())) {
            rData.setResultEnum(ResultEnum.param_isnull);
            return rData;
        }
        // Recommendation returned by the service
        String result = null;
        try {
            // Fetch the recommendation through the service interface
            result = elasticsearchDocumentService.tSuggest(commonEntity);
            // Fill the response; the payload type is resolved by generic type inference
            rData.setResultEnum(result, ResultEnum.success, null);
            logger.info(TipsEnum.tsuggest_get_doc_success.getMessage());
        } catch (Exception e) {
            // Print the stack trace to the console
            e.printStackTrace();
            logger.error(TipsEnum.tsuggest_get_doc_fail.getMessage());
            // Build the error response
            rData.setResultEnum(ResultEnum.error);
        }
        return rData;
    }

Verify the search-recommendation call

http://127.0.0.1:8888/v1/docs/tsuggest

Parameters

{"indexName": "product_completion_index","suggestFileld": "name","suggestValue": "阿迪達斯 耐克 外套 運動鞋 襪子"
}

indexName: index name
suggestFileld: field the recommendation is looked up in
suggestValue: keyword typed by the user

{"code": "200","desc": "操作成功！","data": "阿迪達斯外套"
}

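7. Metric Aggregation and Drill-down Analysis

7.1 Metric Aggregations and Their Categories

What is a metric aggregation?

Aggregation analysis is a core database capability: it computes aggregate values over the data set matched by a query,
for example the maximum, minimum, sum or average of a field (or of a computed expression).
As both a search engine and a data store, Elasticsearch offers equally powerful aggregation analysis.
Aggregations that compute indicators such as the maximum, minimum, sum or average of a data set are called metric aggregations in Elasticsearch.

Metric aggregations fall into two groups: single-value and multi-value analysis.

Single-value analysis outputs exactly one result:
min, max, avg, sum, cardinality (cardinality counts the distinct values of a field, comparable to DISTINCT in MySQL)
Multi-value analysis outputs several results:
stats, extended_stats, percentiles, percentile_ranks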
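7.2 Designing Metric Aggregation and Drill-down

Syntax (from the official documentation)

"aggregations" : {
    "<aggregation_name>" : {                                <!-- name of the aggregation -->
        "<aggregation_type>" : {                            <!-- type of the aggregation -->
            <aggregation_body>                              <!-- body: which fields to aggregate -->
        }
        [,"meta" : { [<meta_data_body>] } ]?                <!-- metadata -->
        [,"aggregations" : { [<sub_aggregation>]+ } ]?      <!-- sub-aggregations nested inside this one -->
    }
    [,"<aggregation_name_2>" : { ... } ]*                   <!-- further aggregations -->
}

OpenAPI design goals and principles:

Abstract the DSL calls and syntax as far as possible, with dynamically supplied parameters
Support hundreds of call combinations through result converters:
query, constant_score, match / match_all / filter / sort / size / from / highlight / _source / includes
Share the common processing logic to strengthen the API's business handling
Preserve the usage of the native API and its parameters

7.2.1 Setting Up the Base Framework

[Figure: base framework]

7.2.2 Single-value Analysis API Design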

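Avg (average)

Computes the average of the price values extracted from the aggregated documents.
Avg aggregation over all documents (DSL):

POST product_list/_search
{"size": 0,"aggs": {"czbk": {"avg": {"field": "price"}}}
}

This request computes the average price across all documents.
"size": 0 means the response carries only the aggregation result and no document hits; to also return 50 hits, set size to 50.
aggs: declares an aggregation.
czbk: a freely chosen aggregation name; the result is returned under this key.

Result: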

{"took" : 1662,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 10000,"relation" : "gte"},"max_score" : null,"hits" : [ ]},"aggregations" : {"czbk" : {"value" : 920.1535462724372}}
}

OpenAPI request parameters

{"indexName": "product_list","map": {"size": 0,"aggs": {"czbk": {"avg": {"field": "price"}}}}
}

Aggregating over filtered documents

POST product_list/_search
{"size": 0,"query": {"match": {"onelevel": "手機通訊"}},"aggs": {"czbk": {"avg": {"field": "price"}}}
}

Result:

{"took" : 159,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 10000,"relation" : "gte"},"max_score" : null,"hits" : [ ]},"aggregations" : {"czbk" : {"value" : 314.77633210684854}}
}

OpenAPI request parameters

{"indexName": "product_list","map": {"size": 0,"query": {"match": {"onelevel": "手機通訊"}},"aggs": {"czbk": {"avg": {"field": "price"}}}}
}

Computing the average with a script:

Elasticsearch scripts are written in Painless, a secure and efficient scripting language that runs on the JVM.

# All documents
POST product_list/_search?size=0
{"aggs": {"czbk": {"avg": {"script": {"source": "doc.evalcount.value"}}}}
}
Result: "value" : 599929.110015995
# Filtered documents
POST product_list/_search?size=0
{"query": {"match": {"onelevel": "手機通訊"}},"aggs": {"czbk": {"avg": {"script": {"source": "doc.evalcount"}}}}
}
Result: "value" : 600055.6935087288

OpenAPI request parameters

{"indexName": "product_list","map": {"size": 0,"aggs": {"czbk": {"avg": {"script": {"source": "doc.evalcount"}}}}}
}

Summary:
The avg aggregation was used in four ways:
1. avg over all documents
2. avg over a filtered subset of documents
3. script-based avg over all documents
4. script-based avg over a filtered subset of documents

Code

    // Average
    if (m.getValue() instanceof ParsedAvg) {
        map.put("value", ((ParsedAvg) m.getValue()).getValue());
    }

Verification

http://localhost:6666/v1/analysis/metric/agg
or
http://localhost:5555/v1/analysis/metric/agg

Max (maximum)

Computes the maximum of the numeric values extracted from the aggregated documents.

All documents

POST product_list/_search
{"size": 0,"aggs": {"czbk": {"max": {"field": "price"}}}
}

Result: "value" : 1.0E8

OpenAPI request parameters

{"indexName": "product_list","map": {"size": 0,"aggs": {"czbk": {"max": {"field": "price"}}}}
}

Filtered documents

POST product_list/_search
{"size": 0,"query": {"match": {"onelevel": "手機通訊"}},"aggs": {"czbk": {"max": {"field": "price"}}}
}

Result: "value" : 2474000.0

OpenAPI request parameters

{"indexName": "product_list","map": {"size": 0,"query": {"match": {"onelevel": "手機通訊"}},"aggs": {"czbk": {"max": {"field": "price"}}}}
}

Result: "value" : 2474000.0

Code

    // Maximum
    if (m.getValue() instanceof ParsedMax) {
        map.put("value", ((ParsedMax) m.getValue()).getValue());
    }

Verification

http://localhost:6666/v1/analysis/metric/agg
or
http://localhost:5555/v1/analysis/metric/agg

Min (minimum)

Computes the minimum of the numeric values extracted from the aggregated documents.

All documents

POST product_list/_search
{"size": 0,"aggs": {"czbk": {"min": {"field": "price"}}}
}

Result: "value": 0.0

OpenAPI request parameters

{"indexName": "product_list","map": {"size": 0,"aggs": {"czbk": {"min": {"field": "price"}}}}
}

Filtered documents

POST product_list/_search
{"size": 1,"query": {"match": {"onelevel": "手機通訊"}},"aggs": {"czbk": {"min": {"field": "price"}}}
}

Result: "value": 0.0

With size=1 the response also contains one matching document next to the aggregation result. Note that this hit is the top-scoring match, which is not necessarily the zero-priced document; sorting by price in ascending order would surface that one.

OpenAPI request parameters

{"indexName": "product_list","map": {"size": 1,"query": {"match": {"onelevel": "手機通訊"}},"aggs": {"czbk": {"min": {"field": "price"}}}}
}

Code

    // Minimum
    if (m.getValue() instanceof ParsedMin) {
        map.put("value", ((ParsedMin) m.getValue()).getValue());
    }

Verification

http://localhost:6666/v1/analysis/metric/agg
or
http://localhost:5555/v1/analysis/metric/agg

Sum (total)

Sum over filtered documents

POST product_list/_search
{"size": 0,"query": {"constant_score": {"filter": {"match": {"threelevel": "手機"}}}},"aggs": {"czbk": {"sum": {"field": "price"}}}
}

Result: "value" : 9.652872986812243E8

OpenAPI request parameters

{"indexName": "product_list","map": {"size": 0,"query": {"constant_score": {"filter": {"match": {"threelevel": "手機"}}}},"aggs": {"czbk": {"sum": {"field": "price"}}}}
}

Code

    // Sum
    if (m.getValue() instanceof ParsedSum) {
        map.put("value", ((ParsedSum) m.getValue()).getValue());
    }

Verification

http://localhost:6666/v1/analysis/metric/agg
or
http://localhost:5555/v1/analysis/metric/agg

Cardinality (distinct count)

Cardinality aggregation. It is a single-value metric: based on one value per document (a specific field or
the result of a script), it counts the distinct values (the count is approximate), comparable to DISTINCT in SQL.

All documents

POST product_list/_search
{"size": 0,"aggs": {"czbk": {"cardinality": {"field": "storename.keyword"}}}
}

Result: "value" : 103169

OpenAPI request parameters

{"indexName": "product_list","map": {"size": 0,"aggs": {"czbk": {"cardinality": {"field": "storename.keyword"}}}}
}

Filtered documents

POST product_list/_search
{"size": 0,"query": {"constant_score": {"filter": {"match": {"threelevel": "手機"}}}},"aggs": {"czbk": {"cardinality": {"field": "storename.keyword"}}}
}

OpenAPI request parameters

{"indexName": "product_list","map": {"size": 0,"query": {"constant_score": {"filter": {"match": {"threelevel": "手機"}}}},"aggs": {"czbk": {"cardinality": {"field": "storename.keyword"}}}}
}

Code

    // Distinct count
    if (m.getValue() instanceof ParsedCardinality) {
        map.put("value", ((ParsedCardinality) m.getValue()).getValue());
    }

Verification

http://localhost:6666/v1/analysis/metric/agg
or
http://localhost:5555/v1/analysis/metric/agg

7.2.3 Multi-value Analysis API Design

Stats Aggregation

Stats aggregation. It is a multi-value metric: based on one value per document (a specific numeric field or the result of a script), it computes five statistics: min, max, sum, count and avg.

All documents

POST product_list/_search
{"size": 0,"aggs": {"czbk": {"stats": {"field": "price"}}}
}

  "aggregations" : {"czbk" : {"count" : 5072448,"min" : 0.0,"max" : 1.0E8,"avg" : 920.1535462724372,"sum" : 4.667431015482532E9}}

OpenAPI request parameters

{"indexName": "product_list","map": {"size": 0,"aggs": {"czbk": {"stats": {"field": "price"}}}}
}

Filtered documents

POST product_list/_search
{"size": 0,"query": {"constant_score": {"filter": {"match": {"threelevel": "手機"}}}},"aggs": {"czbk": {"stats": {"field": "price"}}}
}

OpenAPI request parameters

{"indexName": "product_list","map": {"size": 0,"query": {"constant_score": {"filter": {"match": {"threelevel": "手機"}}}},"aggs": {"czbk": {"stats": {"field": "price"}}}}
}

Code

    // Stats
    if (m.getValue() instanceof ParsedStats) {
        ParsedStats stats = (ParsedStats) m.getValue();
        map.put("count", stats.getCount());
        map.put("min", stats.getMin());
        map.put("max", stats.getMax());
        map.put("avg", stats.getAvg());
        map.put("sum", stats.getSum());
    }

Verification

http://localhost:6666/v1/analysis/metric/agg
or
http://localhost:5555/v1/analysis/metric/agg

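Extended Stats Aggregation

Extended stats aggregation. It is a multi-value metric that adds four results to those of stats: the sum of squares, the variance, the standard deviation, and the interval of the mean plus/minus two standard deviations.

All documents

POST product_list/_search
{"size": 0,"aggs": {"czbk": {"extended_stats": {"field": "price"}}}
}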

  "aggregations" : {"czbk" : {"count" : 5072448,"min" : 0.0,"max" : 1.0E8,"avg" : 920.1535462724372,"sum" : 4.667431015482532E9,"sum_of_squares" : 2.0182209454063148E16,"variance" : 3.9779441210362864E9,"variance_population" : 3.9779441210362864E9,"variance_sampling" : 3.9779449052621484E9,"std_deviation" : 63070.94514145389,"std_deviation_population" : 63070.94514145389,"std_deviation_sampling" : 63070.951358467304,"std_deviation_bounds" : {"upper" : 127062.04382918023,"lower" : -125221.73673663534,"upper_population" : 127062.04382918023,"lower_population" : -125221.73673663534,"upper_sampling" : 127062.05626320705,"lower_sampling" : -125221.74917066217}}}

sum_of_squares: sum of squares
variance: variance
std_deviation: standard deviation
std_deviation_bounds: interval of the mean plus/minus two standard deviations
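
How the extra values relate to each other can be sketched with the population formulas, which agree with the numbers in the response above (for instance, upper = avg + 2 × std_deviation). This is a plain in-memory illustration; the class `ExtendedStatsSketch` and its sample data are hypothetical and this is not Elasticsearch code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory illustration of the extended_stats formulas (population variant).
final class ExtendedStatsSketch {

    static Map<String, Double> extendedStats(double[] values) {
        double sum = 0;
        double sumOfSquares = 0;
        for (double v : values) {
            sum += v;
            sumOfSquares += v * v;
        }
        double count = values.length;
        double avg = sum / count;
        // Population variance: mean of the squares minus the square of the mean
        double variance = sumOfSquares / count - avg * avg;
        double stdDeviation = Math.sqrt(variance);

        Map<String, Double> result = new LinkedHashMap<>();
        result.put("sum_of_squares", sumOfSquares);
        result.put("variance", variance);
        result.put("std_deviation", stdDeviation);
        result.put("upper", avg + 2 * stdDeviation);   // two standard deviations above the mean
        result.put("lower", avg - 2 * stdDeviation);   // two standard deviations below the mean
        return result;
    }

    public static void main(String[] args) {
        System.out.println(extendedStats(new double[]{2, 4, 4, 4, 5, 5, 7, 9}));
    }
}
```

Because prices cannot be negative, a negative lower bound such as the one in the response above simply signals a heavily skewed distribution.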

OpenAPI request parameters

{"indexName": "product_list","map": {"size": 0,"aggs": {"czbk": {"extended_stats": {"field": "price"}}}}
}

Filtered documents

POST product_list/_search
{"size": 1,"query": {"constant_score": {"filter": {"match": {"threelevel": "手機"}}}},"aggs": {"czbk": {"extended_stats": {"field": "price"}}}
}

  "aggregations" : {"czbk" : {"count" : 340378,"min" : 0.0,"max" : 2474000.0,"avg" : 2835.927406240193,"sum" : 9.652872986812243E8,"sum_of_squares" : 6.06065362437439E13,"variance" : 1.7001407710991383E8,"variance_population" : 1.7001407710991383E8,"variance_sampling" : 1.7001457659747353E8,"std_deviation" : 13038.944631752749,"std_deviation_population" : 13038.944631752749,"std_deviation_sampling" : 13038.963785419206,"std_deviation_bounds" : {"upper" : 28913.81666974569,"lower" : -23241.961857265305,"upper_population" : 28913.81666974569,"lower_population" : -23241.961857265305,"upper_sampling" : 28913.854977078605,"lower_sampling" : -23242.00016459822}}}

OpenAPI request parameters

{"indexName": "product_list","map": {"size": 1,"query": {"constant_score": {"filter": {"match": {"threelevel": "手機"}}}},"aggs": {"czbk": {"extended_stats": {"field": "price"}}}}
}

Code

ParsedStats is the parent class of ParsedExtendedStats, so an extended-stats result also passes the ParsedStats check.

Because each check is an independent if statement, the order of the checks does not need to change.

    // Extended stats
    if (m.getValue() instanceof ParsedExtendedStats) {
        ParsedExtendedStats extended = (ParsedExtendedStats) m.getValue();
        map.put("count", extended.getCount());
        map.put("min", extended.getMin());
        map.put("max", extended.getMax());
        map.put("avg", extended.getAvg());
        map.put("sum", extended.getSum());
        map.put("sum_of_squares", extended.getSumOfSquares());
        map.put("variance", extended.getVariance());
        map.put("std_deviation", extended.getStdDeviation());
        map.put("upper", extended.getStdDeviationBound(ExtendedStats.Bounds.UPPER));
        map.put("lower", extended.getStdDeviationBound(ExtendedStats.Bounds.LOWER));
    }

Verification

http://localhost:6666/v1/analysis/metric/agg
or
http://localhost:5555/v1/analysis/metric/agg

Percentiles Aggregation

Percentiles aggregation. It is a multi-value metric: it orders the values of the given field (or script) from low to high and, for each requested percentage, returns the value below which that share of the matched documents falls. By default it returns the values at the [1, 5, 25, 50, 75, 95, 99] percentiles.

These are the percentiles people are most commonly interested in.

All documents

POST product_list/_search
{"size": 0,"aggs": {"czbk": {"percentiles": {"field": "price"}}}
}

  "aggregations" : {"czbk" : {"values" : {"1.0" : 0.0,"5.0" : 14.99999272133453,"25.0" : 58.76038168571048,"50.0" : 139.47447505232998,"75.0" : 388.59368606915626,"95.0" : 3634.3835145207904,"99.0" : 12547.450833578012}}}

OpenAPI request parameters

{"indexName": "product_list","map": {"size": 0,"aggs": {"czbk": {"percentiles": {"field": "price"}}}}
}

Filtered documents

POST product_list/_search
{"size": 0,"query": {"constant_score": {"filter": {"match": {"threelevel": "手機"}}}},"aggs": {"czbk": {"percentiles": {"field": "price"}}}
}

OpenAPI request parameters

{"indexName": "product_list","map": {"size": 0,"query": {"constant_score": {"filter": {"match": {"threelevel": "手機"}}}},"aggs": {"czbk": {"percentiles": {"field": "price"}}}}
}

Code

    // Percentiles: key = percentage, value = field value at that percentile
    if (m.getValue() instanceof ParsedTDigestPercentiles) {
        for (Iterator<Percentile> iterator = ((ParsedTDigestPercentiles) m.getValue()).iterator(); iterator.hasNext(); ) {
            Percentile p = iterator.next();
            map.put(p.getPercent(), p.getValue());
        }
    }

Verification

http://localhost:6666/v1/analysis/metric/agg
or
http://localhost:5555/v1/analysis/metric/agg

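Percentile Ranks Aggregation

Percentile ranks aggregation: a closely related metric is percentile_ranks. The percentiles metric tells
us the value below which a given percentage of documents falls; percentile_ranks answers the inverse question, namely what percentage of documents falls at or below a given value.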
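
The idea behind a percentile rank can be shown with an exact, naive computation. Elasticsearch estimates these values with an approximation algorithm, so real results may differ slightly; the class `PercentileRankSketch` and its sample data are hypothetical.

```java
// Hypothetical illustration: exact percentile rank over an in-memory array.
final class PercentileRankSketch {

    // Returns the percentage of values that are less than or equal to the threshold.
    static double percentileRank(double[] values, double threshold) {
        if (values.length == 0) {
            return Double.NaN;          // no data, no meaningful rank
        }
        int atOrBelow = 0;
        for (double v : values) {
            if (v <= threshold) {
                atOrBelow++;
            }
        }
        return 100.0 * atOrBelow / values.length;
    }

    public static void main(String[] args) {
        double[] prices = {5, 10, 15, 20, 40};
        System.out.println(percentileRank(prices, 15));
        System.out.println(percentileRank(prices, 30));
    }
}
```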

All documents

Compute the share of documents priced at 15 yuan or less and the share priced at 30 yuan or less.

Tips:
The figures change as the data changes.
The thresholds 15 and 30 play the same role as a threshold of 200 in an SLA check; only the field being compared differs.

POST product_list/_search
{"size": 0,"aggs": {"czbk": {"percentile_ranks": {"field": "price","values": [15,30]}}}
}

Response
About 4.89% of the documents are priced at 15 yuan or less
About 12.73% of the documents are priced at 30 yuan or less

  "aggregations" : {"czbk" : {"values" : {"15.0" : 4.89331591488828,"30.0" : 12.732247823263487}}}

OpenAPI request parameters

{"indexName": "product_list","map": {"size": 0,"aggs": {"czbk": {"percentile_ranks": {"field": "price","values": [15,30]}}}}
}

Filtered documents

POST product_list/_search
{"size": 0,"query": {"constant_score": {"filter": {"match": {"threelevel": "手機"}}}},"aggs": {"czbk": {"percentile_ranks": {"field": "price","values": [15,30]}}}
}

OpenAPI request parameters

{"indexName": "product_list","map": {"size": 0,"query": {"constant_score": {"filter": {"match": {"threelevel": "手機"}}}},"aggs": {"czbk": {"percentile_ranks": {"field": "price","values": [15,30]}}}}
}

Code

    // Percentile ranks: key = field value, value = percentage of documents at or below it
    if (m.getValue() instanceof ParsedTDigestPercentileRanks) {
        for (Iterator<Percentile> iterator = ((ParsedTDigestPercentileRanks) m.getValue()).iterator(); iterator.hasNext(); ) {
            Percentile p = iterator.next();
            map.put(p.getValue(), p.getPercent());
        }
    }

Verification

http://localhost:6666/v1/analysis/metric/agg
or
http://localhost:5555/v1/analysis/metric/agg

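8. Log Instrumentation and Hot Search Terms for the E-commerce Platform

8.1 What Is Hot Search

Hot search terms on JD.com:
[Screenshot: JD.com hot search terms]

8.2 Extracting Hot Search Terms

Hot-search analysis flow:
[Figure: flowchart of the hot-search analysis]

8.3 Log Instrumentation

The configuration below applies to every service that needs instrumentation.
service-elasticsearch is used as the example here.

Integrating Log4j2 with Spring Cloud

Compared with other logging frameworks, Log4j2 is less prone to losing log events; its asynchronous loggers are built on the Disruptor and reportedly outperform Logback by a factor of 10 or more in multi-threaded environments; and it builds on the concurrency features introduced in JDK 1.5, which reduces the risk of deadlocks.

Exclude the default Logback integration.
Spring Cloud integrates Logback by default, so first exclude it in pom.xml: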
```
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
    <exclusions>
        <exclusion>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-logging</artifactId>
        </exclusion>
    </exclusions>
</dependency>
```

Add the Log4j2 starter dependency

        <!-- Log4j2 starter -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-log4j2</artifactId>
        </dependency>
        <!-- Ring buffer (Disruptor) that Log4j2 depends on -->
        <dependency>
            <groupId>com.lmax</groupId>
            <artifactId>disruptor</artifactId>
            <version>3.4.2</version>
        </dependency>

Set up the configuration file

If the Log4j2 configuration file has a custom name, it must be declared in application.yml.

Change the configuration in Nacos:
```
logging:
  config: classpath:log4j2-dev.xml
```
Configuration file template
```
<Configuration>
    <Appenders>
        <Socket name="Socket" host="192.168.150.7" port="4567">
            <JsonLayout compact="true" eventEol="true" />
        </Socket>
    </Appenders>
    <Loggers>
        <Root level="info">
            <AppenderRef ref="Socket"/>
        </Root>
    </Loggers>
</Configuration>
```
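As the configuration shows, a Socket appender sends the log events to Logstash.

Note that the Socket appender must be referenced from a Logger (here the Root logger); otherwise nothing is sent to Logstash.

The host is the address of the machine running Logstash, and the port must match the one configured in Logstash.

Log instrumentation

    private void getClientConditions(CommonEntity commonEntity, SearchSourceBuilder searchSourceBuilder) {
        // Iterate over the query conditions supplied by the caller
        for (Map.Entry<String, Object> m : commonEntity.getMap().entrySet()) {
            if (StringUtils.isNotEmpty(m.getKey()) && m.getValue() != null) {
                String key = m.getKey();
                String value = String.valueOf(m.getValue());
                // Build the query part of the DSL request body
                searchSourceBuilder.query(QueryBuilders.matchQuery(key, value));
                // Instrumentation point: log the searched keyword so it can be shipped to Logstash
                logger.info("search for the keyword:" + value);
            }
        }
    }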
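
Somewhere between this log line and the searchkey field of the es-log index, the keyword has to be separated from the fixed message prefix. The article does not show that step (it would normally live in the Logstash pipeline), so the following is only an assumed illustration of the parsing logic; the class `SearchKeyExtractorSketch` and the regular expression are hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of how the keyword could be split from the log message.
final class SearchKeyExtractorSketch {

    // Matches the message format written by logger.info("search for the keyword:" + value)
    private static final Pattern KEYWORD = Pattern.compile("^search for the keyword:(.+)$");

    static Optional<String> extract(String message) {
        Matcher matcher = KEYWORD.matcher(message);
        if (matcher.matches()) {
            return Optional.of(matcher.group(1).trim());
        }
        return Optional.empty();        // not an instrumentation message
    }

    public static void main(String[] args) {
        System.out.println(extract("search for the keyword:xiaomi"));
        System.out.println(extract("index created"));
    }
}
```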

Create the index

The index below stores the keywords users type. Its data is later aggregated, and the results end up in the suggestion corpus.

PUT es-log/
{"mappings": {"properties": {"@timestamp": {"type": "date"},"host": {"type": "text"},"searchkey": {"type": "keyword"},"port": {"type": "long"},"loggerName": {"type": "text"}}}
}

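8.4 Persisting the Data

Configure Logstash.conf

There are two ways to connect to Logstash:
(1) a Socket connection
(2) a GELF connection

Expose port 4567 of the Logstash container (see the reference documentation).

Run a full-text search

http://localhost:8888/v1/docs/mquery

Parameters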

{"pageNumber": 1,"pageSize": 3,"indexName": "product_list","highlight": "productname","map": {"productname": "小米"}
}

Check that the data has arrived

GET es-log/_search
{"from": 0,"size": 200,"query": {"match_all": {}}
}

    "hits" : [{"_index" : "es-log","_type" : "_doc","_id" : "H94AKpQB5vqCNWpIYHYT","_score" : 1.0,"_source" : {"host" : "192.168.150.1","loggerName" : "com.xin.service.impl.ElasticsearchDocumentServiceImpl","@timestamp" : "2025-01-03T02:30:55.118Z","searchkey" : "小米","port" : 54544}},{"_index" : "es-log","_type" : "_doc","_id" : "ZdgAKpQBrYxtVgSQgvHB","_score" : 1.0,"_source" : {"host" : "192.168.150.1","loggerName" : "com.xin.service.impl.ElasticsearchDocumentServiceImpl","@timestamp" : "2025-01-03T02:31:04.021Z","searchkey" : "小米","port" : 54544}}]

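8.5 Hot Search OpenAPI

Aggregation

Group the documents in the es-log index by search term, count how often each term occurs, and keep only the terms that are frequent enough.

DSL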

POST es-log/_search?size=0
{"aggs": {"czbk": {"terms": {"field": "searchkey","min_doc_count": 5,"size": 2,"order": {"_count": "desc"}}}}
}

{"took" : 155,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 14,"relation" : "eq"},"max_score" : null,"hits" : [ ]},"aggregations" : {"czbk" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 0,"buckets" : [{"key" : "華為","doc_count" : 7},{"key" : "小米","doc_count" : 7}]}}
}
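
What the terms aggregation above computes can be mirrored with an in-memory sketch: count each keyword, drop those below `min_doc_count`, sort by count in descending order, and keep the top `size`. This is an illustration only, not Elasticsearch code; the class `HotWordsSketch` is hypothetical, and breaking ties by key is an assumption made to keep the output deterministic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory analogy of a terms aggregation with min_doc_count, size and order.
final class HotWordsSketch {

    static Map<String, Long> hotWords(List<String> searchKeys, long minDocCount, int size) {
        // Count occurrences of each keyword
        Map<String, Long> counts = new HashMap<>();
        for (String key : searchKeys) {
            counts.merge(key, 1L, Long::sum);
        }
        // Drop keywords below the threshold (min_doc_count)
        List<Map.Entry<String, Long>> entries = new ArrayList<>(counts.entrySet());
        entries.removeIf(e -> e.getValue() < minDocCount);
        // Order by count descending; ties are broken by key (assumption)
        entries.sort(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                .thenComparing(Map.Entry.<String, Long>comparingByKey()));
        // Keep the top "size" buckets in order
        Map<String, Long> result = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : entries) {
            if (result.size() == size) {
                break;
            }
            result.put(e.getKey(), e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("a", "b", "a", "c", "b", "a", "b", "d", "d");
        System.out.println(hotWords(keys, 2, 2));
    }
}
```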

OpenAPI request parameter design

Define the hot-words interface

    // Fetch the hot search terms
    public Map<String, Long> hotWords(CommonEntity commonEntity) throws Exception;

Implement the hot-words interface

    @Override
    public Map<String, Long> hotWords(CommonEntity commonEntity) throws Exception {
        // Result map; LinkedHashMap keeps the bucket order returned by Elasticsearch
        Map<String, Long> map = new LinkedHashMap<>();
        // Execute the search
        SearchResponse response = getSearchResponse(commonEntity);
        // Read the first aggregation in the response as a terms aggregation
        Terms termsAggData = response.getAggregations().get(response.getAggregations().getAsMap().entrySet().iterator().next().getKey());
        for (Terms.Bucket entry : termsAggData.getBuckets()) {
            if (entry.getKey() != null) {
                // The bucket key is the grouped field value (the search term)
                String key = entry.getKey().toString();
                // Number of documents in the bucket
                Long count = entry.getDocCount();
                map.put(key, count);
            }
        }
        return map;
    }

Define the hot-words controller

    @GetMapping(value = "/hotwords")
    public ResponseData hotWords(@RequestBody CommonEntity commonEntity) {
        // Response object returned to the caller
        ResponseData responseData = new ResponseData();
        if (StringUtils.isEmpty(commonEntity.getIndexName())) {
            responseData.setResultEnum(ResultEnum.param_isnull);
            return responseData;
        }
        // Hot words returned by the service
        Map<String, Long> result = null;
        try {
            result = analysisService.hotWords(commonEntity);
            // Fill the response; the payload type is resolved by generic type inference
            responseData.setResultEnum(result, ResultEnum.success, null);
            logger.info(TipsEnum.hotwords_get_doc_success.getMessage());
        } catch (Exception e) {
            // Print the stack trace to the console
            e.printStackTrace();
            logger.error(TipsEnum.hotwords_get_doc_fail.getMessage());
            // Build the error response
            responseData.setResultEnum(ResultEnum.error);
        }
        return responseData;
    }

Verification

Fetch the hot search terms from the analysis service

http://localhost:5555/v1/analysis/hotwords

Parameters
```
{"indexName": "es-log","map": {"aggs": {"per_count": {"terms": {"field": "searchkey","min_doc_count": 5,"size": 2,"order": {"_count": "desc"}}}}}
}
```
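- field: the field to group by
- min_doc_count: minimum number of documents a term must appear in to be returned
- size: number of terms to return
- order: sort order (ascending or descending)

Response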
```
{"code": "200","desc": "操作成功！","data": {"華為": 7,"小米": 7}
}
```