elasticsearch - Adding a node to a running Elasticsearch cluster causes a master not discovered exception

Tags: elasticsearch cluster-computing node-modules

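Question

I have a running cluster and want to add a data node to it. The running cluster is

x.x.x.246

and the data node is
x.x.x.99

The servers can reach each other via ping.
OS: CentOS 7
Elasticsearch: 7.6.1

Configuration:

This is the elasticsearch.yml of x.x.x.246: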
cluster.name: elasticsearch
node.master: true
node.name: Node_master
node.data: true
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: x.x.x.246
http.port: 9200
discovery.seed_hosts: ["x.x.x.99:9300"]
cluster.initial_master_nodes: ["x.x.x.246:9300"]

This is the elasticsearch.yml of x.x.x.99:
cluster.name: elasticsearch
node.name: Node_master
node.data: true
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: x.x.x.99
http.port: 9200
discovery.seed_hosts: ["x.x.x.245:9300"]
cluster.initial_master_nodes: ["x.x.x.246:9300"]

Testing Elasticsearch on the machines

When I run systemctl start elasticsearch on each machine, it starts without errors.

Test run on x.x.x.246:
curl -X GET "X.X.X.246:9200/_cluster/health?pretty"

Result: the number of nodes does not change.
curl -X GET "X.X.X.99:9200/_cluster/health?pretty"

Result:
{
  "error" : {
    "root_cause" : [
      {
        "type" : "master_not_discovered_exception",
        "reason" : null
      }
    ],
    "type" : "master_not_discovered_exception",
    "reason" : null
  },
  "status" : 503
}

Edit

This is the elasticsearch.yml of x.x.x.246:
cluster.name: elasticsearch
node.name: master
node.master: true
node.data: true
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 0.0.0.0
http.port: 9200
discovery.seed_hosts: ["x.x.x.99","x.x.x.246"]
cluster.initial_master_nodes: ["x.x.x.246"]
logger.org.elasticsearch.discovery: TRACE


This is the elasticsearch.yml of x.x.x.99:
cluster.name: elasticsearch
node.name: node
node.data: true
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 0.0.0.0
http.port: 9200
discovery.seed_hosts: ["x.x.x.246","x.x.x.99"]
cluster.initial_master_nodes: ["x.x.x.246"]
logger.org.elasticsearch.discovery: TRACE


Log from x.x.x.99:
[root@dev ~]# tail -30 /var/log/elasticsearch/elasticsearch.log
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:692) ~[elasticsearch-7.6.1.jar:7.6.1]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.6.1.jar:7.6.1]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
        at java.lang.Thread.run(Thread.java:830) ~[?:?]
[2020-03-19T12:12:04,462][INFO ][o.e.c.c.JoinHelper       ] [node-1] failed to join {master}{0UHYehfNQ2-WCadTC_VVkA}{1FNy5AJrTpKOCAejBLKR2w}{10.64.2.246}{10.64.2.246:9300}{dilm}{ml.machine_memory=1907810304, ml.max_open_jobs=20, xpack.installed=true} with JoinRequest{sourceNode={node-1}{jb_3lJq1R5-BZtxlPs_NyQ}{a4TYDhG7SWqL3CSG4tusEg}{10.64.2.99}{10.64.2.99:9300}{d}{xpack.installed=true}, optionalJoin=Optional[Join{term=178, lastAcceptedTerm=8, lastAcceptedVersion=100, sourceNode={node-1}{jb_3lJq1R5-BZtxlPs_NyQ}{a4TYDhG7SWqL3CSG4tusEg}{10.64.2.99}{10.64.2.99:9300}{d}{xpack.installed=true}, targetNode={master}{0UHYehfNQ2-WCadTC_VVkA}{1FNy5AJrTpKOCAejBLKR2w}{10.64.2.246}{10.64.2.246:9300}{dilm}{ml.machine_memory=1907810304, ml.max_open_jobs=20, xpack.installed=true}}]}
org.elasticsearch.transport.RemoteTransportException: [master][10.64.2.246:9300][internal:cluster/coordination/join]
Caused by: java.lang.IllegalStateException: failure when sending a validation request to node
        at org.elasticsearch.cluster.coordination.Coordinator$2.onFailure(Coordinator.java:514) ~[elasticsearch-7.6.1.jar:7.6.1]
        at org.elasticsearch.action.ActionListenerResponseHandler.handleException(ActionListenerResponseHandler.java:59) ~[elasticsearch-7.6.1.jar:7.6.1]
        at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1118) ~[elasticsearch-7.6.1.jar:7.6.1]
        at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1118) ~[elasticsearch-7.6.1.jar:7.6.1]
        at org.elasticsearch.transport.InboundHandler.lambda$handleException$2(InboundHandler.java:244) ~[elasticsearch-7.6.1.jar:7.6.1]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:633) ~[elasticsearch-7.6.1.jar:7.6.1]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
        at java.lang.Thread.run(Thread.java:830) [?:?]
Caused by: org.elasticsearch.transport.RemoteTransportException: [node-1][10.64.2.99:9300][internal:cluster/coordination/join/validate]
Caused by: org.elasticsearch.cluster.coordination.CoordinationStateRejectedException: join validation on cluster state with a different cluster uuid P4QlwvuRRGSmlT77RroSjA than local cluster uuid oUoIe2-bSbS2UPg722ud9Q, rejecting
        at org.elasticsearch.cluster.coordination.JoinHelper.lambda$new$4(JoinHelper.java:148) ~[elasticsearch-7.6.1.jar:7.6.1]
        at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler$1.doRun(SecurityServerTransportInterceptor.java:257) ~[?:?]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.6.1.jar:7.6.1]
        at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler.messageReceived(SecurityServerTransportInterceptor.java:315) ~[?:?]
        at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:63) ~[elasticsearch-7.6.1.jar:7.6.1]
        at org.elasticsearch.transport.InboundHandler$RequestHandler.doRun(InboundHandler.java:264) ~[elasticsearch-7.6.1.jar:7.6.1]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:692) ~[elasticsearch-7.6.1.jar:7.6.1]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.6.1.jar:7.6.1]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
        at java.lang.Thread.run(Thread.java:830) ~[?:?]

Best answer

For node x.x.x.99, the seed hosts entry is wrong. It should look like this:

discovery.seed_hosts: ["x.x.x.246:9300"]
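The discovery.seed_hosts list is used to find the master node: it holds the addresses of master-eligible nodes, and those nodes also know which node is the current master. Because the configuration on x.x.x.99 points to x.x.x.245 instead of x.x.x.246, node x.x.x.99 cannot discover the master.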
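
A typo like this can be caught before restarting the node. The following is only a sketch: the function name seed_hosts_contains is made up for illustration, and it assumes the inline, double-quoted list form used in the configs in this post.

```shell
# Succeed if the discovery.seed_hosts line of a config file lists the given address.
# Assumption: the list is written inline with double quotes, e.g.
#   discovery.seed_hosts: ["x.x.x.246:9300"]
# $1 = path to elasticsearch.yml, $2 = address to look for (a port in the file is optional)
seed_hosts_contains() {
  grep -E '^discovery\.seed_hosts:' "$1" \
    | grep -F -q -e "\"$2\"" -e "\"$2:"
}

# Usage sketch (the config path is an assumption, adjust it to your install):
#   seed_hosts_contains /etc/elasticsearch/elasticsearch.yml x.x.x.246 \
#     || echo "seed_hosts does not list the master"
```

The match is a fixed-string match on the quoted entry, so x.x.x.24 does not accidentally match x.x.x.246.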

Following the discussion in the comments, the correct configuration should be:

Master node:
cluster.name: elasticsearch
node.name: master
node.master: true
node.data: true
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 0.0.0.0
http.port: 9200
discovery.seed_hosts: ["x.x.x.246"]
cluster.initial_master_nodes: ["master"]

Note that if you want the node above to act only as a master and hold no data, set
node.data: false

Data node:
cluster.name: elasticsearch
node.name: data-node-1
node.data: true
node.master: false
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 0.0.0.0
http.port: 9200
discovery.seed_hosts: ["x.x.x.246"]

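Also, because node x.x.x.99 could not join the cluster, it holds a stale cluster state (the log above shows it rejecting the join because its local cluster UUID differs from the master's). So delete the data folder on x.x.x.99 and then restart that node.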
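
The steps above could be scripted roughly as follows. This is a sketch under stated assumptions, not a definitive procedure: the three function names are hypothetical helpers, HTTP on port 9200 is assumed to be reachable on both nodes, and the data directory is the path.data value from the configs above. The helpers only parse the JSON that the root endpoint (GET /) and _cluster/health return.

```shell
# Print the "cluster_uuid" value from the JSON on stdin
# (the response of a node's root endpoint, GET /).
cluster_uuid_from_json() {
  sed -n 's/.*"cluster_uuid"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Print the "number_of_nodes" value from a _cluster/health response on stdin.
node_count_from_health() {
  sed -n 's/.*"number_of_nodes"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Stop the service, empty the data path, and start the service again.
# WARNING: this deletes every index stored on the node it runs on.
# $1 = the node's path.data directory (required).
reset_data_path() {
  systemctl stop elasticsearch &&
    rm -rf -- "${1:?path.data directory required}"/* &&
    systemctl start elasticsearch
}

# Usage sketch (addresses are the placeholders from the post):
#   curl -s "http://x.x.x.246:9200/" | cluster_uuid_from_json
#   curl -s "http://x.x.x.99:9200/"  | cluster_uuid_from_json
#   # if the two values differ, run on x.x.x.99 only:
#   reset_data_path /var/lib/elasticsearch
#   # then check that the cluster now reports both nodes:
#   curl -s "http://x.x.x.246:9200/_cluster/health" | node_count_from_health
```

Comparing the UUIDs first avoids wiping a node whose join problem has a different cause, such as the wrong seed host address.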

Regarding elasticsearch - Adding a node to a running Elasticsearch cluster causes a master not discovered exception, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/60736842/
