Elasticsearch General Notes

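Table of Contents

I. Preface
II. Overview
- 1. Directory Layout
- 2. Assumptions Used in the Examples
III. Operations
- 1. Running ES as a Service
- 2. Checking Cluster Health
  - Field Reference
- 3. Index Queries
  - 3.1. Querying the Contents of an Index
    - 3.1.1. Match Query
    - 3.1.2. Exact Match (No Analysis)
    - 3.1.3. Range Query
    - 3.1.4. ID Query
    - 3.1.5. Wildcard and Prefix Match
    - 3.1.6. Regular Expression Query (Most Flexible Fuzzy Matching)
    - 3.1.7. Multi-Field Match
    - 3.1.8. Function Score Query (for Business Scoring Logic)
    - 3.1.9. Boolean Query (Combining Queries)
  - 3.2. Viewing the Mapping
  - 3.3. Listing All Indices
  - 3.4. Deleting a Document
  - 3.5. Counting Documents
IV. Troubleshooting
- 1. Data Migration
- 2. Handling the Shard Limit
- 3. Setting the Maximum Clause Count
- 4. Checking Shard Allocation Settings (Usually When Allocation Has Been Disabled)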

I. Preface

This article covers the structure of ES, basic queries, and some practical usage, based mainly on version 7.x.

II. Overview

1. Directory Layout

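bin/ Contains the Elasticsearch startup scripts and management tools,

for example elasticsearch (starts the service) and the plugin manager elasticsearch-plugin.

config/ Holds the Elasticsearch configuration files.

The main ones are:

elasticsearch.yml: the core configuration file, defining the cluster name, node settings, network binding, and so on.
jvm.options: JVM parameters such as heap size and garbage collection settings.
log4j2.properties: logging configuration.

data/ Stores the actual index data.

All indices on a node, together with their shard data, are kept in this directory.

logs/ Holds the Elasticsearch log files.

modules/ Holds the core Elasticsearch modules,

such as the built-in analyzers and monitoring features. These usually need no manual intervention.

plugins/ Holds installed plugins; each plugin has its own subdirectory.

For example, analysis plugins or security plugins.

lib/ Contains the core libraries and dependencies Elasticsearch needs at runtime.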

2. Assumptions Used in the Examples

Default host IP: 192.168.6.8
User group and user: elasticsearch:elasticsearch
Install directory: /opt/elasticsearch
Test index: my_index

III. Operations

1. Running ES as a Service

1.1. Create the Service File

Adjust User and Group below to your environment; this guide uses the elasticsearch user and group.

sudo nano /etc/systemd/system/elasticsearch.service

[Unit]
Description=Elasticsearch
Documentation=https://www.elastic.co
Wants=network-online.target
After=network-online.target

[Service]
Type=simple
User=elasticsearch
Group=elasticsearch
ExecStart=/opt/elasticsearch/bin/elasticsearch
Restart=always
LimitNOFILE=65535
LimitNPROC=4096

[Install]
WantedBy=multi-user.target

1.2. Reload systemd (So It Picks Up the New Unit File)

sudo systemctl daemon-reload

1.3. Enable at Boot, Start, and Stop

# Enable start at boot
sudo systemctl enable elasticsearch
# Start the service
sudo systemctl start elasticsearch
# Stop the service
sudo systemctl stop elasticsearch
# Check the service status
sudo systemctl status elasticsearch

2. Checking Cluster Health

curl -XGET 'http://192.168.6.8:9200/_cluster/health?pretty=true'

Sample response

{
  "cluster_name" : "elasticsearch",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 2,
  "active_primary_shards" : 10,
  "active_shards" : 20,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 10,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 66.7
}

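Field Reference

Field	Meaning
`cluster_name`	Name of the Elasticsearch cluster.
`status`	Cluster health: `green` (healthy), `yellow` (some replica shards are unassigned), `red` (some primary shards are unassigned).
`number_of_nodes`	Total number of nodes in the cluster, including master and data nodes.
`number_of_data_nodes`	Number of nodes that store and search data.
`active_primary_shards`	Number of primary shards currently active.
`unassigned_shards`	Number of shards not assigned to any node.
`number_of_pending_tasks`	Number of cluster-level tasks waiting to be processed.
`active_shards_percent_as_number`	Percentage of shards that are active; for example, `66.7` means 66.7% of shards are active.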
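A minimal Python sketch (standard library only) of reading the health response and interpreting `status` the way the table above describes. `ES_URL`, `describe_health`, and `fetch_health` are hypothetical names introduced for illustration; only the `/_cluster/health` endpoint comes from this article.

```python
import json
import urllib.request

# Assumption: host taken from section II.2; adjust for your cluster.
ES_URL = "http://192.168.6.8:9200"


def describe_health(health: dict) -> str:
    """Summarize a parsed _cluster/health body in one line.

    yellow means every primary is assigned, so any unassigned shards
    must be replicas; red means at least one primary is unassigned.
    """
    status = health["status"]
    unassigned = health.get("unassigned_shards", 0)
    if status == "green":
        return "green: all primary and replica shards are assigned"
    if status == "yellow":
        return f"yellow: all primaries assigned, {unassigned} unassigned shard(s) are replicas"
    return f"red: at least one primary shard is unassigned ({unassigned} unassigned in total)"


def fetch_health(base_url: str = ES_URL) -> dict:
    """GET /_cluster/health and return the parsed JSON body."""
    with urllib.request.urlopen(f"{base_url}/_cluster/health", timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage: `print(describe_health(fetch_health()))`. With the sample response above it reports a yellow cluster with 10 unassigned replica shards.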

3. Index Queries

Basic query format

curl -XGET 'http://192.168.6.8:9200/my_index/_search?pretty' -H "Content-Type: application/json" -d '{"query": {"match_all": {}}
}'
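The same request can be built from Python with only the standard library. This is a sketch under assumptions: `build_search_request` and `search` are hypothetical helper names, and POST is used instead of curl's `-XGET` because some HTTP clients and proxies do not send a body with GET, while Elasticsearch accepts a search body on either method.

```python
import json
import urllib.request

ES_URL = "http://192.168.6.8:9200"  # assumption: host from section II.2


def build_search_request(index: str, body: dict, base_url: str = ES_URL) -> urllib.request.Request:
    """Build a POST /<index>/_search request carrying a JSON query body."""
    return urllib.request.Request(
        f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(index: str, body: dict, base_url: str = ES_URL) -> dict:
    """Send the search request and return the parsed JSON response."""
    with urllib.request.urlopen(build_search_request(index, body, base_url), timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage: `search("my_index", {"query": {"match_all": {}}})` returns the same JSON as the curl command. Any of the query bodies in section 3.1 can be passed as `body`.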

3.1. Querying the Contents of an Index

3.1.1. Match Query

The example documents contain the fields name, age, and birthday.

# Replace {{keyword}} with:
# [analyzed match]: match
# [phrase match, terms in order and adjacent]: match_phrase
{"query": {"{{keyword}}": {"name": "ringo lam"}}
}
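A small Python sketch of filling the `{{keyword}}` placeholder in the template above; `match_query` is a hypothetical helper, not part of any Elasticsearch client.

```python
def match_query(field: str, text: str, phrase: bool = False) -> dict:
    """Build the match / match_phrase body from the template above.

    phrase=False -> "match": the text is analyzed, and documents matching
                    any of the resulting terms are returned (default operator OR).
    phrase=True  -> "match_phrase": all terms must appear in the same order, adjacent.
    """
    keyword = "match_phrase" if phrase else "match"
    return {"query": {keyword: {field: text}}}
```

The returned dict can be serialized with `json.dumps` and used as the `-d` body of the basic curl command in section 3.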

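3.1.2. Exact Match (No Analysis)

Term-level queries do not analyze their input, so they are meant for keyword, numeric, or date fields. If `name` is mapped as `text`, "ringo lam" is indexed as the separate tokens `ringo` and `lam`, and a `term` query for the whole string will not match; query the keyword sub-field instead (for example `name.keyword`, which the default dynamic mapping creates for string fields).

# Replace {{keyword}} and {{value}} with:
# [exact match, single value]: term, value: "ringo lam"
# [exact match, any of several values]: terms, value: ["lao wu", "lao liu"]
{"query": {"{{keyword}}": {"name": {{value}}}}
}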
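A Python sketch of choosing between the two variants above: a single value becomes `term`, a list becomes `terms`. `exact_query` is a hypothetical helper, and the `name.keyword` field in the usage assumes the default dynamic mapping.

```python
from typing import List, Union


def exact_query(field: str, value: Union[str, List[str]]) -> dict:
    """Use "term" for one value and "terms" for a list of values.

    Neither query analyzes its input, so `field` should normally be a
    keyword field (e.g. "name.keyword" under the default dynamic mapping).
    """
    if isinstance(value, list):
        return {"query": {"terms": {field: value}}}
    return {"query": {"term": {field: value}}}
```

Usage: `exact_query("name.keyword", ["lao wu", "lao liu"])` matches documents whose whole name is either value.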

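3.1.3. Range Query

Range queries work best on numeric fields and can occasionally be troublesome on dates; specifying `format` (as in the date example) makes the expected date layout explicit.

# Numeric example
{"query": {"range": {"age": {"gte": 10, "lte": 20}}}
}
# Date example: gte = start date, lte = end date, format = date format (optional)
{"query": {"range": {"birthday": {"gte": "2020-03-01", "lte": "2023-03-31", "format": "yyyy-MM-dd"}}}
}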
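A Python sketch of building the date example from `datetime.date` values, so the strings always match the `yyyy-MM-dd` format. `date_range_query` is a hypothetical helper; `date.isoformat()` produces `YYYY-MM-DD`, which corresponds to that Elasticsearch pattern.

```python
from datetime import date
from typing import Optional


def date_range_query(field: str, start: date, end: date,
                     fmt: Optional[str] = "yyyy-MM-dd") -> dict:
    """Build an inclusive (gte/lte) range query on a date field.

    Pass fmt=None to leave out "format" and rely on the field's mapping.
    """
    if start > end:
        raise ValueError("start must not be after end")
    bounds = {"gte": start.isoformat(), "lte": end.isoformat()}
    if fmt is not None:
        bounds["format"] = fmt
    return {"query": {"range": {field: bounds}}}
```

Usage: `date_range_query("birthday", date(2020, 3, 1), date(2023, 3, 31))` reproduces the date example above.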

3.1.4. ID Query

Look up documents by their `_id`.

{"query": {"ids": {"values": ["1", "2"]}}
}

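3.1.5. Wildcard and Prefix Match

Wildcard patterns support the following operators:

*: matches zero or more characters (including none).
?: matches exactly one character.

# Replace {{keyword}} and {{value}} with:
# [starts with a prefix]: prefix, value: "lao"
# [wildcard pattern]: wildcard, value: "lao*"
{"query": {"{{keyword}}": {"name": {{value}}}}
}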
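A Python sketch for trying a pattern locally before sending it. It assumes a keyword field, where the pattern is matched against the whole value; Python's `fnmatch.fnmatchcase` treats `*` and `?` the same way and is case-sensitive, but it also understands `[...]` classes, which the Elasticsearch wildcard query does not, so stick to `*` and `?`. `wildcard_query` and `preview` are hypothetical helpers.

```python
from fnmatch import fnmatchcase
from typing import List


def wildcard_query(field: str, pattern: str) -> dict:
    """Build the "wildcard" variant of the template above."""
    return {"query": {"wildcard": {field: pattern}}}


def preview(pattern: str, values: List[str]) -> List[str]:
    """Return the sample values a keyword-field wildcard would be expected to match."""
    return [v for v in values if fnmatchcase(v, pattern)]
```

Usage: `preview("lao ?u", ["lao wu", "lao liu"])` keeps only `"lao wu"`, because `?` stands for exactly one character.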

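3.1.6. Regular Expression Query (Most Flexible Fuzzy Matching)

This allows arbitrary regular-expression matching, at a cost in performance. Note that the pattern must match the whole term, because Lucene regular expressions are always anchored.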

{"query": {"regexp": {"name": "lao.*"}}
}

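3.1.7. Multi-Field Match

The query text is analyzed and then searched across several fields.
This example searches the name and nick_name fields.

{"query": {"multi_match": {"query": "lao liu","fields": ["name", "nick_name"]}}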
}
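A Python sketch of the multi_match body; the key point is that the field list goes under `"fields"`. `multi_match_query` is a hypothetical helper. Field names can also carry a boost such as `"name^2"` to weight one field more heavily.

```python
from typing import List


def multi_match_query(text: str, fields: List[str]) -> dict:
    """Search one analyzed query string across several fields."""
    if not fields:
        raise ValueError("multi_match needs at least one field")
    return {"query": {"multi_match": {"query": text, "fields": fields}}}
```

Usage: `multi_match_query("lao liu", ["name^2", "nick_name"])` ranks hits on `name` above equal hits on `nick_name`.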

3.1.8. Function Score Query (for Business Scoring Logic)

To be completed once the business requirements have been clarified.

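3.1.9. Boolean Query (Combining Queries)

Combines several conditions into one query, mainly to add filtering conditions.
The available clauses are must (must match), should (should match; when must is also present, should clauses are optional and only raise the score), and must_not (must not match).

{"query": {"bool": {
  "must": [{ "match": { "name": "lao liu" } }],
  "should": [{ "match": { "name": "lao wu" } }],
  "must_not": [{ "term": { "name": "lao san" } }]
}}
}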
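A Python sketch of assembling a bool query from clause lists, dropping any clause type that is empty. `bool_query` is a hypothetical helper; the usage reproduces the example above.

```python
from typing import List, Optional


def bool_query(must: Optional[List[dict]] = None,
               should: Optional[List[dict]] = None,
               must_not: Optional[List[dict]] = None) -> dict:
    """Combine sub-queries into a bool query, omitting empty clause lists."""
    clauses = {"must": must, "should": should, "must_not": must_not}
    return {"query": {"bool": {k: v for k, v in clauses.items() if v}}}
```

Usage:
`bool_query(must=[{"match": {"name": "lao liu"}}], should=[{"match": {"name": "lao wu"}}], must_not=[{"term": {"name": "lao san"}}])`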

3.2. Viewing the Mapping

curl -XGET 'http://192.168.6.8:9200/my_index/_mappings?pretty'

3.3. Listing All Indices

curl -XGET 'http://192.168.6.8:9200/_cat/indices?v'

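Column	Meaning
`health`	Index health: `green` (healthy), `yellow` (some replica shards are unassigned), `red` (some primary shards are unassigned).
`status`	Index state: `open` means the index is open, `close` means it is closed.
`index`	Name of the index.
`uuid`	Unique identifier of the index, distinguishing it from other indices.
`pri`	Number of primary shards the index has.
`rep`	Number of replicas per primary shard.
`docs.count`	Number of documents in the index, taken from the primary shards only. It is a Lucene-level count, so nested documents are included; use `_count` for the number of top-level documents.
`docs.deleted`	Number of documents marked as deleted but not yet physically removed from disk.
`store.size`	Total disk space used by the index, including primary and replica shards.
`pri.store.size`	Disk space used by the primary shards only.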
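The same listing can be requested as JSON by adding `format=json` (for example `_cat/indices?format=json`). The Python sketch below ranks indices by `docs.count` from that output; it assumes, as is usual for the cat APIs, that values arrive as strings and that `docs.count` may be null for closed indices (treated as 0 here). `largest_indices` is a hypothetical helper.

```python
import json
from typing import List, Tuple


def largest_indices(cat_json: str, top: int = 5) -> List[Tuple[str, int]]:
    """Return (index, docs.count) pairs, largest first, from _cat/indices JSON."""
    rows = json.loads(cat_json)
    counts = [(row["index"], int(row.get("docs.count") or 0)) for row in rows]
    counts.sort(key=lambda item: item[1], reverse=True)
    return counts[:top]
```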

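3.4. Deleting a Document

The example below deletes the test document with ID 6bd3b7e63f844886909e66c7f5548b50. To remove the whole index instead, send DELETE to the index itself: curl -XDELETE 'http://192.168.6.8:9200/my_index'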

curl -XDELETE 'http://192.168.6.8:9200/my_index/_doc/6bd3b7e63f844886909e66c7f5548b50'

3.5. Counting Documents

curl -XGET 'http://192.168.6.8:9200/my_index/_count?pretty'

IV. Troubleshooting

1. Data Migration

See the separate data migration article.

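2. Handling the Shard Limit

Run the following with curl, or send the same request from the Head plugin or Kibana Dev Tools. Note that the example uses `persistent`, which survives a full cluster restart; a setting placed under `transient` instead is lost when the cluster restarts.

1. Via the cluster settings API (takes effect immediately, no restart needed)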

curl -X PUT "http://192.168.6.8:9200/_cluster/settings?pretty" -H 'Content-Type: application/json' -d '
{"persistent" : {"cluster" : {"max_shards_per_node" : "10000"}}
}'

2. Via elasticsearch.yml (permanent; takes effect after a node restart)
cluster.max_shards_per_node: 10000

3. Verify the setting

curl "http://192.168.6.8:9200/_cluster/settings?pretty"
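To judge how close a cluster is to the limit, a rough Python estimate can be made from the `_cluster/health` response. This sketch assumes the 7.x behaviour in which the cluster-wide limit is `cluster.max_shards_per_node` (default 1000) multiplied by the number of data nodes, and it approximates the current shard total as active + initializing + unassigned shards; `shard_headroom` is a hypothetical helper.

```python
def shard_headroom(health: dict, max_shards_per_node: int = 1000) -> int:
    """Estimate how many more shards fit before the cluster-wide limit.

    limit   = max_shards_per_node x number_of_data_nodes
    current ~ active_shards + initializing_shards + unassigned_shards
    """
    limit = max_shards_per_node * health["number_of_data_nodes"]
    current = (health["active_shards"]
               + health["initializing_shards"]
               + health["unassigned_shards"])
    return limit - current
```

With the sample health response from section III.2 (2 data nodes, 20 active and 10 unassigned shards), the default limit gives 2000 - 30 = 1970 shards of headroom.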

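3. Setting the Maximum Clause Count

The following error appears at query time, for example:
QueryPhaseExecutionException[failed to execute query]; nested: TooManyClauses[maxClauseCount is set to 1024];

1. Via the cluster settings API. Caution: the 7.x documentation lists `indices.query.bool.max_clause_count` as a static setting, so the cluster may reject this request as not dynamically updateable; in that case use the elasticsearch.yml method below.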

curl -X PUT "http://192.168.6.8:9200/_cluster/settings?pretty" -H 'Content-Type: application/json' -d '
{"persistent" : {"indices.query.bool.max_clause_count" : "2048"}
}'

2. Via elasticsearch.yml (set it on every node, then restart):
indices.query.bool.max_clause_count: 2048

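3. Verify the setting (`_cluster/settings` lists only values applied through the API; a value set in elasticsearch.yml can be checked with curl 'http://192.168.6.8:9200/_nodes/settings?pretty')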

curl "http://192.168.6.8:9200/_cluster/settings?pretty"

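4. Checking Shard Allocation Settings (Usually When Allocation Has Been Disabled)

Applies while the cluster is up and reachable.

1. Check the allocation setting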
curl -XGET 'http://localhost:9200/_cluster/settings?pretty'

2. Re-enable shard allocation (`transient` reverts on a full cluster restart; use `persistent` to keep it)

curl -XPUT 'http://localhost:9200/_cluster/settings' -H 'Content-Type: application/json' -d '{"transient": {"cluster.routing.allocation.enable": "all"}
}'