Elasticsearch shard allocation and routing settings

Published: 2023/12/20


This article is based on Elasticsearch 7.3.
The main cluster-level shard allocation settings are the following:

cluster.routing.allocation.enable: enables or disables allocation for specific kinds of shards. It takes one of four values:

all - (default) Allows shard allocation for all kinds of shards, both primary and replica.
primaries - Allows shard allocation only for primary shards. After a node restarts, its replica shards are not recovered.
new_primaries - Allows shard allocation only for primary shards of newly created indices. In my own quick test the behavior looked much the same as primaries.
none - No shard allocations of any kind are allowed for any indices. Newly created indices get neither primary nor replica shards.

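The setting only affects shards whose state changes. By default, when a node fails, the shards it held are recovered onto other nodes; with any value other than all, shards of the failed node are not recovered elsewhere. This is very useful during cluster maintenance, because it avoids the cost of moving shards between nodes while a node restarts. Note that, whatever the value, if a restarted node holds a replica copy of a shard and the cluster has no primary copy of that shard, that replica copy is recovered as the primary. Also, although a new index gets no shards at all under none, primary shards are still allocated after the cluster restarts.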
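The maintenance use described above can be sketched as a small Python helper that builds the request bodies for the cluster update settings API. The helper name `allocation_enable_body` is hypothetical, and the choice of `primaries` before a restart is only one reasonable option; passing `None` produces a JSON `null`, which resets the setting to its default.

```python
import json

# Values accepted by cluster.routing.allocation.enable, as listed above.
ALLOWED_VALUES = {"all", "primaries", "new_primaries", "none"}


def allocation_enable_body(value, scope="persistent"):
    """Build the body for PUT _cluster/settings (hypothetical helper).

    value=None is serialized as JSON null, which resets the setting.
    """
    if value is not None and value not in ALLOWED_VALUES:
        raise ValueError(f"unsupported value: {value!r}")
    if scope not in ("persistent", "transient"):
        raise ValueError(f"unsupported scope: {scope!r}")
    return {scope: {"cluster.routing.allocation.enable": value}}


# Before restarting a node: stop replicas from being reallocated.
before_restart = allocation_enable_body("primaries")
# After the node has rejoined: go back to the default (all).
after_restart = allocation_enable_body(None)

print("PUT _cluster/settings", json.dumps(before_restart))
print("PUT _cluster/settings", json.dumps(after_restart))
```

The helper only builds the payload; sending it with an HTTP client or Kibana Console is left out on purpose.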

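cluster.routing.allocation.node_concurrent_incoming_recoveries: the number of concurrent incoming shard recoveries allowed on a single node. Here the node is the recovery target: a shard has to be recovered on this node, either because of a rebalance or because another node failed. The default is 2 concurrent shards.

cluster.routing.allocation.node_concurrent_outgoing_recoveries: the number of concurrent outgoing shard recoveries allowed on a single node. Here the node is the recovery source, for example when a rebalance moves some of its shards to other nodes. The default is 2 concurrent shards.

cluster.routing.allocation.node_concurrent_recoveries: a shortcut for setting the two parameters above. Whether the value is a total split between the two or a limit applied to each was not clear to me at first; the source code below shows that it is applied to each one separately.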

// ThrottlingAllocationDecider.java
public static final int DEFAULT_CLUSTER_ROUTING_ALLOCATION_NODE_CONCURRENT_RECOVERIES = 2;
public static final int DEFAULT_CLUSTER_ROUTING_ALLOCATION_NODE_INITIAL_PRIMARIES_RECOVERIES = 4;
public static final String NAME = "throttling";

public static final Setting<Integer> CLUSTER_ROUTING_ALLOCATION_NODE_CONCURRENT_RECOVERIES_SETTING =
    new Setting<>("cluster.routing.allocation.node_concurrent_recoveries",
        Integer.toString(DEFAULT_CLUSTER_ROUTING_ALLOCATION_NODE_CONCURRENT_RECOVERIES),
        (s) -> Setting.parseInt(s, 0, "cluster.routing.allocation.node_concurrent_recoveries"),
        Property.Dynamic, Property.NodeScope);

public static final Setting<Integer> CLUSTER_ROUTING_ALLOCATION_NODE_INITIAL_PRIMARIES_RECOVERIES_SETTING =
    Setting.intSetting("cluster.routing.allocation.node_initial_primaries_recoveries",
        DEFAULT_CLUSTER_ROUTING_ALLOCATION_NODE_INITIAL_PRIMARIES_RECOVERIES, 0,
        Property.Dynamic, Property.NodeScope);

public static final Setting<Integer> CLUSTER_ROUTING_ALLOCATION_NODE_CONCURRENT_INCOMING_RECOVERIES_SETTING =
    new Setting<>("cluster.routing.allocation.node_concurrent_incoming_recoveries",
        CLUSTER_ROUTING_ALLOCATION_NODE_CONCURRENT_RECOVERIES_SETTING::getRaw,
        (s) -> Setting.parseInt(s, 0, "cluster.routing.allocation.node_concurrent_incoming_recoveries"),
        Property.Dynamic, Property.NodeScope);

public static final Setting<Integer> CLUSTER_ROUTING_ALLOCATION_NODE_CONCURRENT_OUTGOING_RECOVERIES_SETTING =
    new Setting<>("cluster.routing.allocation.node_concurrent_outgoing_recoveries",
        CLUSTER_ROUTING_ALLOCATION_NODE_CONCURRENT_RECOVERIES_SETTING::getRaw,
        (s) -> Setting.parseInt(s, 0, "cluster.routing.allocation.node_concurrent_outgoing_recoveries"),
        Property.Dynamic, Property.NodeScope);
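The fallback seen in the excerpt (the incoming and outgoing settings take the raw value of node_concurrent_recoveries as their default) can be modelled in a few lines of Python. This is a simplified sketch of the lookup only, not of the decider itself; the function name `effective_recovery_limits` and the plain-dict settings are assumptions made for illustration.

```python
PREFIX = "cluster.routing.allocation."


def effective_recovery_limits(settings):
    """Resolve the per-node incoming/outgoing limits from a settings dict.

    Simplified model: each specific setting falls back to the shared
    node_concurrent_recoveries value, which itself defaults to 2.
    """
    shared = int(settings.get(PREFIX + "node_concurrent_recoveries", 2))
    incoming = int(settings.get(PREFIX + "node_concurrent_incoming_recoveries", shared))
    outgoing = int(settings.get(PREFIX + "node_concurrent_outgoing_recoveries", shared))
    for name, value in (("incoming", incoming), ("outgoing", outgoing)):
        if value < 0:
            # The excerpt parses these settings with a minimum of 0.
            raise ValueError(f"{name} limit must be >= 0, got {value}")
    return {"incoming": incoming, "outgoing": outgoing}


# Setting only the shortcut gives each direction the full value, not half.
print(effective_recovery_limits({PREFIX + "node_concurrent_recoveries": "6"}))
```

With the shortcut set to 6, both directions resolve to 6, which matches the reading that the value is applied to each limit separately.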

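cluster.routing.allocation.node_initial_primaries_recoveries: the number of initial primary recoveries that may run in parallel on a single node. This refers to the recovery, after a node restart, of primary shards that already belonged to that node. These recoveries read from the local disk and are therefore fast. The default is 4.

cluster.routing.allocation.same_shard.host: whether to check that one host does not hold more than one copy of the same shard. It only matters when several nodes of the same cluster run on one host. The default is false.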

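Other recovery-related settings:

indices.recovery.max_bytes_per_sec: the limit on the combined inbound and outbound recovery bandwidth of a single node. The default is 40mb.
indices.recovery.max_concurrent_file_chunks: the number of file chunks that can be sent in parallel for each shard recovery. The default is 2. A file chunk can be thought of as one piece of a file whose content has been cut up for transfer, loosely comparable to a page in an operating system. Oracle's shared pool also allocates memory in chunks, although those chunks differ in size.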

Cluster-level shard rebalancing settings:

cluster.routing.rebalance.enable: enables or disables rebalancing for specific kinds of shards. It takes one of four values:

all - (default) Allows shard balancing for all kinds of shards.
primaries - Allows shard balancing only for primary shards.
replicas - Allows shard balancing only for replica shards.
none - No shard balancing of any kind is allowed for any indices.

cluster.routing.allocation.allow_rebalance: specifies when shard rebalancing is allowed. It takes one of three values:

always - Always allow rebalancing.
indices_primaries_active - Only when all primaries in the cluster are allocated (active).
indices_all_active - (default) Only when all shards (primaries and replicas) in the cluster are allocated (active).

cluster.routing.allocation.cluster_concurrent_rebalance: controls how many shard rebalances may run concurrently across the cluster. The default is 2. It only affects rebalancing caused by an uneven shard distribution; it does not limit shard movements caused by allocation filtering or forced awareness.

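Shard rebalancing heuristics settings:

cluster.routing.allocation.balance.shard: the shard factor used for rebalancing, default 0.45f;
cluster.routing.allocation.balance.index: the index factor used for rebalancing, default 0.55f. Together with the previous setting it is passed to WeightFunction, a static nested class of BalancedShardsAllocator, which computes the weight of a node.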

// BalancedShardsAllocator.java
public static final Setting<Float> INDEX_BALANCE_FACTOR_SETTING =
    Setting.floatSetting("cluster.routing.allocation.balance.index", 0.55f, 0.0f, Property.Dynamic, Property.NodeScope);
public static final Setting<Float> SHARD_BALANCE_FACTOR_SETTING =
    Setting.floatSetting("cluster.routing.allocation.balance.shard", 0.45f, 0.0f, Property.Dynamic, Property.NodeScope);
public static final Setting<Float> THRESHOLD_SETTING =
    Setting.floatSetting("cluster.routing.allocation.balance.threshold", 1.0f, 0.0f,
        Property.Dynamic, Property.NodeScope);

@Inject
public BalancedShardsAllocator(Settings settings, ClusterSettings clusterSettings) {
    setWeightFunction(INDEX_BALANCE_FACTOR_SETTING.get(settings), SHARD_BALANCE_FACTOR_SETTING.get(settings));
    setThreshold(THRESHOLD_SETTING.get(settings));
    clusterSettings.addSettingsUpdateConsumer(INDEX_BALANCE_FACTOR_SETTING, SHARD_BALANCE_FACTOR_SETTING, this::setWeightFunction);
    clusterSettings.addSettingsUpdateConsumer(THRESHOLD_SETTING, this::setThreshold);
}

private void setWeightFunction(float indexBalance, float shardBalanceFactor) {
    weightFunction = new WeightFunction(indexBalance, shardBalanceFactor);
}

public static class WeightFunction {

    private final float indexBalance;
    private final float shardBalance;
    private final float theta0;
    private final float theta1;

    public WeightFunction(float indexBalance, float shardBalance) {
        float sum = indexBalance + shardBalance;
        if (sum <= 0.0f) {
            throw new IllegalArgumentException("Balance factors must sum to a value > 0 but was: " + sum);
        }
        theta0 = shardBalance / sum;
        theta1 = indexBalance / sum;
        this.indexBalance = indexBalance;
        this.shardBalance = shardBalance;
    }

    public float weight(Balancer balancer, ModelNode node, String index) {
        return weight(balancer, node, index, 0);
    }

    public float weightShardAdded(Balancer balancer, ModelNode node, String index) {
        return weight(balancer, node, index, 1);
    }

    public float weightShardRemoved(Balancer balancer, ModelNode node, String index) {
        return weight(balancer, node, index, -1);
    }

    private float weight(Balancer balancer, ModelNode node, String index, int numAdditionalShards) {
        final float weightShard = node.numShards() + numAdditionalShards - balancer.avgShardsPerNode();
        final float weightIndex = node.numShards(index) + numAdditionalShards - balancer.avgShardsPerNode(index);
        return theta0 * weightShard + theta1 * weightIndex;
    }
}
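To make the formula concrete, here is a small Python port of the weight calculation with a worked example. It is a sketch under assumptions: the Balancer and ModelNode objects are replaced by plain numbers passed as arguments (hypothetical parameter names), and Python floats are doubles rather than Java floats.

```python
class WeightFunction:
    """Python port of the weight formula from the excerpt above."""

    def __init__(self, index_balance=0.55, shard_balance=0.45):
        total = index_balance + shard_balance
        if total <= 0.0:
            raise ValueError(f"Balance factors must sum to a value > 0 but was: {total}")
        # Normalize the two factors so that theta0 + theta1 == 1.
        self.theta0 = shard_balance / total
        self.theta1 = index_balance / total

    def weight(self, node_shards, avg_shards_per_node,
               node_index_shards, avg_index_shards_per_node, additional=0):
        # How far the node is above (+) or below (-) the cluster average,
        # counted over all shards and over the shards of one index.
        weight_shard = node_shards + additional - avg_shards_per_node
        weight_index = node_index_shards + additional - avg_index_shards_per_node
        return self.theta0 * weight_shard + self.theta1 * weight_index


wf = WeightFunction()
# Made-up numbers: the cluster averages 3 shards per node, 1 of them from the index.
heavy = wf.weight(node_shards=4, avg_shards_per_node=3,
                  node_index_shards=2, avg_index_shards_per_node=1)
light = wf.weight(node_shards=2, avg_shards_per_node=3,
                  node_index_shards=0, avg_index_shards_per_node=1)
print(heavy, light, heavy - light)
```

With the default factors the heavy node comes out at about 1.0 and the light node at about -1.0, so the difference between them is about 2.0.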

cluster.routing.allocation.balance.threshold: the threshold. Shards are reallocated only when the weight difference between nodes is greater than this value. The default is 1.0f; raising it makes rebalancing less sensitive:

private static boolean lessThan(float delta, float threshold) {
    /* deltas close to the threshold are "rounded" to the threshold manually
       to prevent floating point problems if the delta is very close to the
       threshold ie. 1.000000002 which can trigger unnecessary balance actions*/
    return delta <= (threshold + 0.001f);
}
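The effect of the threshold can be tried out with a short Python sketch. `less_than` mirrors the Java method above; `should_rebalance` is a hypothetical wrapper that assumes the delta is the weight of the heaviest node minus the weight of the lightest one, which simplifies what the balancer does.

```python
def less_than(delta, threshold):
    # Same comparison as the Java lessThan: a small tolerance is added
    # so that deltas barely above the threshold do not trigger a move.
    return delta <= threshold + 0.001


def should_rebalance(heaviest_weight, lightest_weight, threshold=1.0):
    """Hypothetical wrapper: rebalance only if the delta clearly exceeds the threshold."""
    delta = heaviest_weight - lightest_weight
    return not less_than(delta, threshold)


print(should_rebalance(1.0, -1.0))                  # delta 2.0
print(should_rebalance(0.5, -0.5000002))            # delta just above 1.0
print(should_rebalance(1.0, -1.0, threshold=3.0))   # higher threshold
```

Raising the threshold from 1.0 to 3.0 turns the first case from a rebalance into no action, which is the reduced sensitivity mentioned above.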

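Besides the cluster-level settings above, shard allocation is also affected by disk-based shard allocation and by shard allocation awareness.

Elasticsearch takes the remaining disk space into account when deciding whether to allocate new shards to a node or to move shards from that node to other nodes in the cluster. The related settings are:
cluster.routing.allocation.disk.threshold_enabled: whether the disk-based allocation decider is enabled. The default is true.
cluster.routing.allocation.disk.watermark.low: the low watermark for disk usage. The default is 85%: once disk usage reaches 85%, Elasticsearch no longer allocates shards to the node, except for the primary shards of newly created indices, which have never been allocated before. When given as a byte value such as 500mb, it denotes the minimum free disk space instead.
cluster.routing.allocation.disk.watermark.high: the high watermark for disk usage. The default is 90%: once disk usage reaches 90%, Elasticsearch tries to move shards away from the node. This applies to all kinds of shards, including unassigned shards. It can also be a byte value such as 250mb, which then denotes the minimum free disk space.
cluster.routing.allocation.disk.watermark.flood_stage: the upper limit for disk usage. The default is 95%: once disk usage reaches 95%, Elasticsearch marks every index that has a shard on the node as read-only with deletes allowed (index.blocks.read_only_allow_delete). After disk space has been freed, the block on those indices has to be reset with a request like this:

PUT /twitter/_settings
{
  "index.blocks.read_only_allow_delete": null
}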

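Note that the three watermark settings cannot mix percentages and byte values: either all three are percentages or all three are byte values. Percentages must go up from low to high to flood_stage, and byte values must go down.
cluster.info.update.interval: how often disk usage is checked. The default is 30s.
cluster.routing.allocation.disk.include_relocations: whether the disk usage estimate takes into account the space of shards that are currently being relocated. The default is true. This can make the estimate too high: if a relocating shard is 1 GB and the relocation is 50% complete, the estimate is too high by the 50% that has already been copied. An example of setting these parameters:

PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.disk.watermark.low": "100gb",
    "cluster.routing.allocation.disk.watermark.high": "50gb",
    "cluster.routing.allocation.disk.watermark.flood_stage": "10gb",
    "cluster.info.update.interval": "1m"
  }
}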
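The rule about not mixing percentages and byte values, and about the ordering of the three watermarks, can be checked before sending a request. The following Python sketch does that; the function names are hypothetical, only the `%`, `b`, `kb`, `mb`, `gb` and `tb` suffixes are handled, and treating equal neighbouring values as acceptable is an assumption of this sketch.

```python
import re

_SIZE_UNITS = {"b": 1, "kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3, "tb": 1024 ** 4}


def parse_watermark(value):
    """Return ("percent", number) or ("bytes", number) for a watermark string."""
    text = value.strip().lower()
    if text.endswith("%"):
        return ("percent", float(text[:-1]))
    match = re.fullmatch(r"(\d+(?:\.\d+)?)(b|kb|mb|gb|tb)", text)
    if match is None:
        raise ValueError(f"unsupported watermark value: {value!r}")
    return ("bytes", float(match.group(1)) * _SIZE_UNITS[match.group(2)])


def validate_watermarks(low, high, flood_stage):
    """Check the three watermarks against the rules described above."""
    parsed = [parse_watermark(v) for v in (low, high, flood_stage)]
    kinds = {kind for kind, _ in parsed}
    if len(kinds) != 1:
        raise ValueError("watermarks must be all percentages or all byte values")
    low_v, high_v, flood_v = (number for _, number in parsed)
    if kinds == {"percent"}:
        ordered = low_v <= high_v <= flood_v   # used space: must go up
    else:
        ordered = low_v >= high_v >= flood_v   # free space: must go down
    if not ordered:
        raise ValueError("watermarks are in the wrong order")
    return kinds.pop()


print(validate_watermarks("85%", "90%", "95%"))
print(validate_watermarks("100gb", "50gb", "10gb"))
```

The second call uses the same values as the request above: byte values describe free space, so they shrink as the watermarks get stricter.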

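Awareness-based allocation addresses the following scenario: an Elasticsearch cluster may consist of a number of servers spread over several racks, over data centers in different locations, or over different network zones. For disaster tolerance you may want the primary and replica copies of an index's shards placed on different racks; or, for locality, you may want get requests routed to nodes in the same network zone as the coordinating node. Enabling shard allocation awareness takes the following steps:

1. In each node's elasticsearch.yml, set a node attribute. Both the attribute name and its value are arbitrary. Suppose my cluster has 3 nodes, and the attribute is called my_rack_id:
node1:    node.attr.my_rack_id: rack1
node2:    node.attr.my_rack_id: rack1
node3:    node.attr.my_rack_id: rack2

2. In each node's elasticsearch.yml, set cluster.routing.allocation.awareness.attributes:
cluster.routing.allocation.awareness.attributes: my_rack_id
or set it through the cluster update settings API:

PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.awareness.attributes": "my_rack_id"
  }
}

Be especially careful with cluster.routing.allocation.awareness.attributes. A wrong value, for example an attribute that does not exist, causes allocation to go wrong: newly created indices cannot get their shards allocated and replica copies of existing indices cannot be allocated, so cluster health turns yellow or even red.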

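Note that of the three nodes here, node1 and node2 both have my_rack_id set to rack1, and only node3 has my_rack_id set to rack2.

Now consider this case: every index has 3 shards and 1 replica. The cluster then holds 6 shard copies, an average of 2 per node. With the settings above, node3 necessarily stores one copy of each of the 3 shards, so node3 holds 3 copies, and the other 3 copies are spread arbitrarily over the remaining two nodes. The cluster is now unbalanced, but that is what it takes to satisfy the user's settings.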

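Things move on, and at some point node3 goes down. The copies that were on node3 are then re-created on the other two nodes, ignoring the disaster-tolerance intent behind the awareness setting, and node1 and node2 end up sharing all 6 copies. To keep the layout that existed before node3 failed, set cluster.routing.allocation.awareness.force, so that the nodes sharing one my_rack_id value never hold more than one copy of a shard. In other words, after node3 fails, node1 and node2 hold only one copy from each shard's replication group, not all of them. The copies on node1 and node2 all become primaries, there are no replicas, and the index status is yellow. For example:

PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.awareness.attributes": "my_rack_id",
    "cluster.routing.allocation.awareness.force.my_rack_id.values": "rack1,rack2"
  }
}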
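The difference between plain and forced awareness in the node3 scenario can be illustrated with a deliberately simplified Python model. It assumes that the copies of one shard are spread evenly over the known attribute values, with the per-value cap rounded up; the real decider has more detail and its edge cases may differ. The function name `max_copies_per_value` is hypothetical.

```python
import math


def max_copies_per_value(total_copies, live_values, forced_values=()):
    """Simplified model: cap on copies of ONE shard per attribute value.

    live_values   - attribute values present on nodes currently in the cluster
    forced_values - values listed in awareness.force.<attribute>.values
    """
    values = set(live_values) | set(forced_values)
    if not values:
        raise ValueError("no attribute values known")
    return math.ceil(total_copies / len(values))


copies = 2  # 1 primary + 1 replica per shard

# All three nodes up: both racks are visible, one copy per rack.
print(max_copies_per_value(copies, {"rack1", "rack2"}))
# node3 (rack2) is down, no forcing: rack1 may take both copies.
print(max_copies_per_value(copies, {"rack1"}))
# node3 is down, forcing rack1,rack2: rack1 still takes one copy only,
# so the replica stays unassigned and the index is yellow.
print(max_copies_per_value(copies, {"rack1"}, forced_values={"rack1", "rack2"}))
```

The forced values keep rack2 in the calculation even though no node with that value is currently alive, which is what stops the replicas from piling up on rack1.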

Shard allocation filters can likewise be used to include or exclude nodes when distributing shards. There are three related settings:
cluster.routing.allocation.include.{attribute}: Allocate shards to a node whose {attribute} has at least one of the comma-separated values. The value of the setting is a comma-separated list of attribute values;
cluster.routing.allocation.require.{attribute}: Only allocate shards to a node whose {attribute} has all of the comma-separated values.
cluster.routing.allocation.exclude.{attribute}: Do not allocate shards to a node whose {attribute} has any of the comma-separated values. Shards are kept off, and moved away from, nodes that have any of the listed values. Note that this is not enforced unconditionally; the other settings have to be satisfied as well. For example, with node1 and node2 having my_rack_id rack1 and node3 having my_rack_id rack2, and awareness set to my_rack_id, the primary and replica copies of a shard cannot both be placed under the same my_rack_id value.
{attribute} can be a custom attribute or one of the following built-in attributes:

_name	Match nodes by node names
_ip	Match nodes by IP addresses (the IP address associated with the hostname)
_host	Match nodes by hostnames

For example:

PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.exclude._ip": "192.168.2.*,192.168.1.*"
  }
}

PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.include._name": "node1,node2"
  }
}

PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.require.my_rack_id": "rack1,rack2"
  }
}
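How the include and exclude filters pick nodes can be sketched in Python. This is a rough model under assumptions: it covers only include and exclude, represents a node as a plain dict of attributes, and uses `fnmatchcase` for the `*` wildcard, which is not the matcher Elasticsearch uses. The function names are hypothetical.

```python
from fnmatch import fnmatchcase


def _matches_any(value, patterns_csv):
    # A filter value is a comma-separated list; "*" acts as a wildcard.
    return any(fnmatchcase(value, pattern.strip()) for pattern in patterns_csv.split(","))


def node_allowed(node_attrs, include=None, exclude=None):
    """Simplified model of the include/exclude allocation filters."""
    include = include or {}
    exclude = exclude or {}
    # exclude: any matching value rules the node out.
    for attr, patterns in exclude.items():
        value = node_attrs.get(attr)
        if value is not None and _matches_any(value, patterns):
            return False
    # include: if present, at least one listed value has to match.
    if include:
        return any(
            node_attrs.get(attr) is not None and _matches_any(node_attrs[attr], patterns)
            for attr, patterns in include.items()
        )
    return True


nodes = [
    {"_name": "node1", "_ip": "192.168.2.15"},
    {"_name": "node2", "_ip": "192.168.1.20"},
    {"_name": "node3", "_ip": "10.0.0.3"},
]
for node in nodes:
    allowed = node_allowed(node, exclude={"_ip": "192.168.2.*,192.168.1.*"})
    print(node["_name"], allowed)
```

In this made-up cluster the exclude filter from the first request above leaves only node3 eligible.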

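Because these settings can be changed dynamically, the feature is typically used when taking a node out of service: setting cluster.routing.allocation.exclude moves the shards from that node to other nodes.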

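Other settings:
cluster.blocks.read_only: Make the whole cluster read only (indices do not accept write operations), metadata is not allowed to be modified (create or delete indices). This blocks create, update and delete operations on documents as well as changes to index metadata;
cluster.blocks.read_only_allow_delete: Identical to cluster.blocks.read_only but allows to delete indices to free up resources.
cluster.max_shards_per_node: Controls the number of shards allowed in the cluster per data node. Only shards of open indices count; shards that belong to closed indices do not. The default is 1000. With a fixed number of data nodes, this value also caps the total number of shards in the cluster, primaries and replicas included, at the setting multiplied by the number of data nodes. A create index, restore snapshot or open index operation fails if it would push the shard count over the limit. If the setting is lowered below the number of shards that already exist (for example 900 shards exist and the setting is changed to 500), no index can be created or opened any more.
cluster.metadata.*: user-defined settings. Any custom key and value can be stored here.
cluster.indices.tombstones.size: the size of the index tombstone list, default 500. This is a static setting. The cluster state keeps the index_name, the index_uuid and the deletion time delete_date_in_millis of deleted indices. They can be retrieved with this request: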

GET _cluster/state?filter_path=metadata.index-graveyard.tombstones

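This setting controls how many deleted indices are tracked in the cluster state. Suppose node A leaves the cluster and an index is deleted while it is away. Once the deletion succeeds, the cluster has no record of that index at all. When node A rejoins, Elasticsearch imports indices that exist on the node but not in the cluster, so it could re-import the indices that were deleted while node A was away and thereby undo the deletion. To guard against this, the cluster state keeps information about deleted indices. If the cluster deletes indices frequently, the setting can be raised, but tracking too many deleted indices bloats the cluster state, so it is a trade-off.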
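Coming back to cluster.max_shards_per_node from the list above, the limit check is simple arithmetic and can be sketched in Python. The function names are hypothetical, and the sketch assumes that the limit is compared against the cluster-wide total of open shards.

```python
def shards_for_index(primaries, replicas):
    # Every primary has `replicas` additional copies.
    return primaries * (1 + replicas)


def would_exceed_shard_limit(open_shards, new_shards, data_nodes, max_shards_per_node=1000):
    """Simplified check: the cap is max_shards_per_node times the number of data nodes."""
    cluster_limit = max_shards_per_node * data_nodes
    return open_shards + new_shards > cluster_limit


# The case from the text: 900 shards exist on one data node, limit lowered to 500.
new_index = shards_for_index(primaries=3, replicas=1)
print(would_exceed_shard_limit(900, new_index, data_nodes=1, max_shards_per_node=500))
# With the default limit the same index still fits.
print(would_exceed_shard_limit(900, new_index, data_nodes=1))
```

Under the lowered limit even a single new shard is over the cap, which is why creating and opening indices stops working until shards are removed or the limit is raised again.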

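Persistent task allocation settings:
Once created, a persistent task is stored in the cluster state, so that it survives a cluster restart. The task still has to be assigned to a specific node to run.
cluster.persistent_tasks.allocation.enable: enables or disables the allocation of persistent tasks. It takes the value all or none. The setting does not affect tasks that already exist, only newly created tasks and tasks that have to be reassigned to a node (for example because a node lost its connection or ran short of resources).

all - (default) Allows persistent tasks to be assigned to nodes
none - No allocations are allowed for any type of persistent task

cluster.persistent_tasks.allocation.recheck_interval: the interval at which task reassignment is rechecked. When a node loses its connection, the master automatically reassigns its tasks to other nodes, because the cluster state changes when a node leaves and the master therefore knows which tasks need a new node. When a task has to move because a node runs short of resources, however, the master has to check periodically. The default is 30s and the minimum is 10s.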

Logger settings: these are covered together with the other logging topics.
