Elasticsearch: using reindex for data synchronization or index restructuring

1. Tuning the batch size for bulk copying

POST _reindex
{
  "source": {
    "index": "source",
    "size": 5000
  },
  "dest": {
    "index": "dest"
  }
}

2. Increasing scroll parallelism with slices

POST _reindex?slices=5&refresh
{
  "source": {
    "index": "twitter"
  },
  "dest": {
    "index": "new_twitter"
  }
}

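Notes on choosing the number of slices:
1) slices can be set to an explicit number or to auto. With auto, a single source index gets one slice per shard; with multiple source indices, the slice count is based on the index with the fewest shards.
2) Queries are most efficient when the number of slices equals the number of shards in the index. Setting slices higher than the shard count does not improve throughput and only adds overhead.
3) If that number is very large (for example 500), choose a lower one, because too many slices hurts performance.
Result
In the author's experience, reindexing this way can be more than 10 times faster than with the default settings.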
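The sizing rule in the notes above can be sketched in a few lines of Python. This is only an illustration of the heuristic, not Elasticsearch's own code: the function name `choose_slices` and the optional `ceiling` parameter are assumptions made for this example, and the shard counts would have to be looked up separately.

```python
def choose_slices(shard_counts, ceiling=None):
    """Pick a slice count following the 'auto' rule described above.

    shard_counts: number of shards of each source index (hypothetical input).
    ceiling: optional upper bound, an assumption of this sketch.
    """
    if not shard_counts:
        raise ValueError("at least one source index is required")
    # One index: its shard count. Several indices: the smallest shard count.
    slices = min(shard_counts)
    if ceiling is not None:
        slices = min(slices, ceiling)
    return max(slices, 1)


single = choose_slices([5])               # one index with 5 shards
multi = choose_slices([5, 3, 8])          # several indices
capped = choose_slices([40], ceiling=20)  # very large shard count, capped
print(single, multi, capped)
```

The returned value would then be passed as the slices parameter of the _reindex request.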

3. Filtering by query and syncing only some fields

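Here "includes" lists the fields to copy and "excludes" lists the fields to leave out; both belong inside "_source":

POST _reindex
{
  "source": {
    "index": "maindata",
    "_source": {
      "includes": ["dataId", "website"],
      "excludes": ["column1", "column2"]
    },
    "query": {
      "match_phrase": {
        "teamId": 3
      }
    }
  },
  "dest": {
    "index": "maindatagroup",
    "version_type": "internal"
  }
}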

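Explanation:
"version_type": "internal" selects internal versioning. Omitting version_type or setting it to internal makes Elasticsearch blindly dump documents into the destination, overwriting any existing document with the same ID (and the same type, in versions that still have mapping types).
This is also the most common way to rebuild an index.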
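To make the overwrite behaviour concrete, here is a toy Python model that uses plain dictionaries in place of indices. It is a sketch under assumptions, not how Elasticsearch is implemented: `simulate_reindex` is a hypothetical helper, the default branch stands in for internal versioning, and the "create" branch imitates `"op_type": "create"` combined with `"conflicts": "proceed"`, where existing documents are kept and counted as conflicts.

```python
def simulate_reindex(source_docs, dest_docs, op_type="index"):
    """Toy model of copying source_docs into dest_docs (dicts keyed by document ID).

    op_type="index": overwrite on an ID clash (default, internal versioning).
    op_type="create": keep the existing document and count a conflict.
    """
    result = dict(dest_docs)  # do not mutate the caller's dictionary
    conflicts = 0
    for doc_id, doc in source_docs.items():
        if op_type == "create" and doc_id in result:
            conflicts += 1
            continue
        result[doc_id] = doc
    return result, conflicts


# Made-up sample documents for illustration only.
source = {"1": {"website": "new"}, "2": {"website": "b"}}
dest = {"1": {"website": "old"}}

overwritten, overwrite_conflicts = simulate_reindex(source, dest)
kept, create_conflicts = simulate_reindex(source, dest, op_type="create")
print(overwritten["1"], kept["1"], create_conflicts)
```

With the default, document "1" in the destination is replaced; with "create" it is left alone and only document "2" is added.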

4. Reindexing from a remote cluster

POST _reindex
{
  "source": {
    "remote": {
      "host": "http://otherhost:9200",
      "username": "user",
      "password": "pass",
      "socket_timeout": "1m",
      "connect_timeout": "10s"
    },
    "index": "source",
    "query": {
      "match": {
        "test": "data"
      }
    }
  },
  "dest": {
    "index": "dest"
  }
}

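Note: the remote host must be whitelisted in elasticsearch.yml on the new (destination) cluster, and the entry must match the host and port used in "remote.host", for example: reindex.remote.whitelist: "172.16.76.147:9200"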
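Before sending the request, it can help to check that the remote address would match a whitelist entry. The Python sketch below compares only host and port and uses `fnmatch` for wildcard entries; that is an approximation of Elasticsearch's matching, and `is_whitelisted` is a hypothetical helper written for this example.

```python
from fnmatch import fnmatch
from urllib.parse import urlsplit


def is_whitelisted(remote_host_url, whitelist):
    """Return True if the host:port of remote_host_url matches a whitelist entry."""
    parts = urlsplit(remote_host_url)
    if parts.hostname is None or parts.port is None:
        return False  # this sketch requires an explicit host and port
    host_port = f"{parts.hostname}:{parts.port}"
    # The scheme (http/https) is ignored; only host and port are compared.
    return any(fnmatch(host_port, pattern) for pattern in whitelist)


exact = is_whitelisted("http://172.16.76.147:9200", ["172.16.76.147:9200"])
missing = is_whitelisted("http://otherhost:9200", ["172.16.76.147:9200"])
wildcard = is_whitelisted("https://127.0.10.7:9200", ["127.0.10.*:9200"])
print(exact, missing, wildcard)
```

In the request above, "http://otherhost:9200" would be rejected by a whitelist that only contains "172.16.76.147:9200", which is a common cause of remote reindex failures.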

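5. Restructuring data by modulo

Take organId modulo 2 for each document in publicsentimenthot and route the document to the corresponding index (pubtest_0 or pubtest_1).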

POST _reindex
{
  "source": {
    "index": "publicsentimenthot",
    "size": 1000
  },
  "dest": {
    "index": "pubtest_0",
    "op_type": "create"
  },
  "script": {
    "lang": "painless",
    "source": "ctx._index = 'pubtest_' + (ctx._source.organId ?: 0) % 2;"
  }
}
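The routing rule in the script can be checked offline with a small Python equivalent. This is a sketch that assumes organId is a non-negative integer; for negative values Python's % and Painless's % give different results, so the two would diverge. The function name `target_index` and its parameters are made up for this example.

```python
def target_index(doc, prefix="pubtest_", buckets=2):
    """Mirror the Painless script: prefix + (organId, or 0 if missing) % buckets."""
    organ_id = doc.get("organId") or 0  # stands in for `?: 0` on a missing/null field
    return f"{prefix}{organ_id % buckets}"


odd = target_index({"organId": 7})
even = target_index({"organId": 4})
absent = target_index({"title": "no organId here"})
print(odd, even, absent)
```

Running a few sample documents through such a function is a cheap way to confirm which target indices need to exist before starting the reindex.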

6. Querying reindex tasks

(1) List reindex tasks

GET _tasks?detailed=true&actions=*reindex

(2) View a task by its ID

GET _tasks/r1A2WoRbTwKZ516z6NEs5A:36619

Note: r1A2WoRbTwKZ516z6NEs5A:36619 is a task ID taken from the task list.

(3) Cancel a task

POST _tasks/r1A2WoRbTwKZ516z6NEs5A:36619/_cancel
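The response of GET _tasks/<task id> for a reindex task carries a status object with counters such as total, created, updated and deleted, which can be used to estimate progress. The Python sketch below works on an already-parsed response; the sample numbers are invented for illustration and `reindex_progress` is a hypothetical helper.

```python
def reindex_progress(task_response):
    """Estimate progress as (created + updated + deleted) / total."""
    status = task_response["task"]["status"]
    total = status.get("total", 0)
    done = (
        status.get("created", 0)
        + status.get("updated", 0)
        + status.get("deleted", 0)
    )
    return done / total if total else 0.0


# Made-up response fragment, trimmed to the fields this sketch reads.
sample = {
    "completed": False,
    "task": {
        "status": {"total": 6154, "created": 3000, "updated": 77, "deleted": 0}
    },
}
fraction = reindex_progress(sample)
print(f"{fraction:.0%} done")
```

Polling the task at an interval and printing this fraction gives a rough progress indicator; the task is finished when the sum of the three counters reaches total.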

7. Using Logstash to restructure an index by an ID in the data

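input {
  elasticsearch {
    hosts => ["<first cluster address>"]
    index => "<source index name>"
    query => '{"query": {"match_all": {}}}'
    size => 1000
    scroll => "5m"
    # Keep the source document's metadata (including _id) in @metadata
    docinfo => true
  }
}
filter {
  ruby {
    # Route each event to one of 10 indices by organId modulo 10 (0 if missing)
    code => "
      organ_id = event.get('organId').to_i rescue 0
      target_index = '<target index prefix>_' + (organ_id % 10).to_s
      event.set('[@metadata][target_index]', target_index)
    "
  }
}
output {
  elasticsearch {
    hosts => ["<second cluster address>"]
    index => "%{[@metadata][target_index]}"
    # Depending on the plugin version and ECS compatibility mode, the ID may
    # instead be stored under [@metadata][input][elasticsearch][_id]
    document_id => "%{[@metadata][_id]}"
  }
}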
