cassandra-2.0 - Need some clarification on running Cassandra nodetool repair

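We have been having trouble balancing the workload on our current cluster, mostly because of budget constraints and because we cannot add more nodes right now. Until recently, nodes going down overnight was a frequent occurrence, so I was running nodetool repair often. Lately the cluster has become more stable and nodes no longer go down regularly, so last weekend I created cron jobs that run nodetool repair -pr on each node once a week. gc_grace is still at the default of 10 days, and the max hint window is still at the default of 3 hours.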

My questions are:

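1) If we lose a node for longer than 3 hours, what exactly happens to the hint/s? Does it/they no longer exist?
2) If we lost a node for longer than the 3 hours but for some reason didn't realize that the node had been down that long, what will happen if the nodetool repair -pr is run rather than the full repair on the downed node?
3) How would you fix the issue/s from question 2 if that is in fact the case?
4) Is there a way to check that all nodes are significantly consistent/repaired?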

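This hasn't happened yet (at least I don't think so), but I'm trying to plan ahead for the worst case, since our cluster's stability may or may not last, and I would rather be as prepared as I can.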

Best answer

1) If we lose a node for longer than 3 hours, what exactly happens to the hint/s? Does it/they no longer exist?

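Yes, that's right: your hints will be deleted (tombstoned), and they will go away through the regular compaction process. You can see this for yourself by selecting from the system.hints table.

See our docs and Jonathan's blog post on HH.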
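
To make the 3-hour window concrete, here is a minimal Python sketch (not Cassandra code) that estimates how much of an outage falls outside the hint window. The names `unhinted_gap` and `needs_repair` are hypothetical, and the model assumes that coordinators stop storing hints for a node once it has been down longer than the max hint window, so writes after that point are only recovered by repair or read repair.

```python
from datetime import timedelta

# Default max hint window (3 hours), as stated in the question.
MAX_HINT_WINDOW = timedelta(hours=3)


def unhinted_gap(downtime, hint_window=MAX_HINT_WINDOW):
    """Return how long the node was down beyond the hint window.

    Writes made during this gap are assumed to have no hints stored
    for the node, so hinted handoff cannot replay them.
    """
    return max(downtime - hint_window, timedelta(0))


def needs_repair(downtime, hint_window=MAX_HINT_WINDOW):
    """True if part of the outage fell outside the hint window."""
    return unhinted_gap(downtime, hint_window) > timedelta(0)
```

For example, `needs_repair(timedelta(hours=5))` is true with a gap of two hours, while a two-hour outage stays inside the window. Even inside the window, hints are best-effort, so this is a lower bound on when a repair is warranted.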

2) If we lost a node for longer than the 3 hours but for some reason didn't realize that the node had been down that long, what will happen if the nodetool repair -pr is run rather than the full repair on the downed node?

Between the time the node comes back up and the time repair runs, you may be holding stale data.

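-pr means you only repair the primary range on that machine. If you run repair with -pr on every node in the cluster, you will still repair everything.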
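
The claim that per-node -pr runs add up to a full repair can be illustrated with a simplified token ring. This is a sketch under stated assumptions, not Cassandra's implementation: it uses a toy ring of tokens 0..99, one token per node (no vnodes), and hypothetical helper names. Each node's primary range is taken to be (previous node's token, own token], wrapping around the ring.

```python
def in_primary_range(start, end, token):
    """True if token lies in (start, end], allowing wrap-around past the ring's end."""
    if start < end:
        return start < token <= end
    # Wrapping range (also the single-node case, where start == end).
    return token > start or token <= end


def primary_ranges(node_tokens):
    """Map each node token to its primary range (previous token, own token]."""
    tokens = sorted(node_tokens)
    # For i == 0, tokens[i - 1] is the last token, which gives the wrapping range.
    return {tok: (tokens[i - 1], tok) for i, tok in enumerate(tokens)}


def nodes_repairing(node_tokens, token):
    """Nodes whose primary range (the part repaired by -pr) contains the token."""
    return [node for node, (start, end) in primary_ranges(node_tokens).items()
            if in_primary_range(start, end, token)]


nodes = [10, 40, 70]
# Every token on the toy ring falls in exactly one node's primary range.
covered_once = all(len(nodes_repairing(nodes, t)) == 1 for t in range(100))
```

In this model every token is covered exactly once when all nodes run -pr. The flip side is that running -pr on a single node covers only that node's primary range, not the ranges it holds as a replica for other nodes.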

I recommend trying the OpsCenter repair service rather than using cron. It automates this process.

3) How would you fix the issue/s from question 2 if that is in fact the case?

Repair will bring you back to a baseline of full consistency, which is why you should run it once a week (or in

4) Is there a way to check that all nodes are significantly consistent/repaired?

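The only way is to build the Merkle trees, which is what repair does. Once you have found an inconsistency, you might as well repair it. There is no way to only compare without repairing.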
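
As an illustration of the Merkle tree idea mentioned above, the following Python sketch compares two replicas by hashing their data per sub-range and descending only into subtrees whose hashes differ. It is a generic Merkle comparison, not Cassandra's actual repair code: the function names are hypothetical, the leaf data is made up, and it assumes both replicas are split into the same power-of-two number of sub-ranges.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_tree(leaves):
    """Build a Merkle tree from a list of byte strings (one per sub-range).

    Returns a list of levels, leaf hashes first and the root last.
    Assumes len(leaves) is a power of two.
    """
    levels = [[_h(leaf) for leaf in leaves]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([_h(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels


def differing_leaves(tree_a, tree_b):
    """Return the indexes of sub-ranges whose contents differ between replicas."""
    out = []
    stack = [(len(tree_a) - 1, 0)]  # start at the root
    while stack:
        level, idx = stack.pop()
        if tree_a[level][idx] == tree_b[level][idx]:
            continue  # identical subtree, nothing to compare below it
        if level == 0:
            out.append(idx)
        else:
            stack.append((level - 1, 2 * idx))
            stack.append((level - 1, 2 * idx + 1))
    return sorted(out)


replica_a = [b"range0-v1", b"range1-v1", b"range2-v1", b"range3-v1"]
replica_b = [b"range0-v1", b"range1-v1", b"range2-v2", b"range3-v1"]
out_of_sync = differing_leaves(build_tree(replica_a), build_tree(replica_b))
```

Here `out_of_sync` identifies sub-range 2 as the only one that differs. Building the trees means reading and hashing the data on each replica, which is the expensive part; that is why the comparison and the repair are done together.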

Note: hints will be improved in 3.0; see this post by Aleksey: http://www.datastax.com/dev/blog/whats-coming-to-cassandra-in-3-0-improved-hint-storage-and-delivery

Regarding "cassandra-2.0 - Need some clarification on running Cassandra nodetool repair", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/28728046/
