amazon-ec2 - No traffic from ELB to one of the Auto Scaling instances

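We use Auto Scaling and it has worked very well for us, but something happened this morning. For some reason, one of the instances had about 0% CPU utilization, which pushed the remaining instances in the same availability zone to 100% CPU utilization. The group did not scale out, because the average CPU utilization across all instances was about 70%, while the trigger is supposed to launch new instances when 80% is hit. The ELB instance health check is used as well, but this 0% instance was healthy.

Is it possible to configure Auto Scaling to remove such instances? We don't want to set up any custom cronjobs for check-ups.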

Best answer

Update 2

Is it possible to configure Auto Scaling to remove such Instances?

Yes, see below. According to your comment, you have already done this correctly.

We don't want to setup any custom cronjobs for check ups.

Given that your configuration is apparently correct (which hints at a problem in Auto Scaling and/or ELB themselves), I'm afraid there is no way around a custom solution: either proactively shut down the unused instances, or use as-set-instance-health as suggested in my initial answer below. tribalcrossing's answer to ELB-Unhealthy instances taken OOS then removed from ELB automatically suggests the former as well, which seems to address your scenario:

We run a cronjob that's fired every 5 minutes to scan all of the servers in an ELB to check to see if it's been up for more than 5 minutes AND is unhealthy. When we find one, we shut it down. We've had issues of "dead" instances stuck in ELB and throwing off monitoring metrics that trigger autoscaling actions, and that cronjob has solved the problem for us.
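The quoted cron job could be sketched roughly as follows. This is only an illustration, not tribalcrossing's actual script: it assumes a Classic ELB, the boto3 library with configured credentials, and that "shut it down" means terminating the instance so Auto Scaling can replace it. The names select_stuck_instances and run_once are hypothetical.

```python
from datetime import datetime, timedelta, timezone

MIN_UPTIME = timedelta(minutes=5)  # threshold taken from the quoted answer


def select_stuck_instances(health_states, launch_times, now, min_uptime=MIN_UPTIME):
    """Return IDs of instances that the ELB reports as OutOfService
    and that have been running for longer than min_uptime."""
    stuck = []
    for state in health_states:
        instance_id = state["InstanceId"]
        launched = launch_times.get(instance_id)
        if launched is None:
            continue  # unknown launch time: leave the instance alone
        if state["State"] == "OutOfService" and now - launched > min_uptime:
            stuck.append(instance_id)
    return stuck


def run_once(load_balancer_name):
    """One cron run (hypothetical helper). Assumes boto3 is installed and
    that terminating is the desired way to shut an instance down."""
    import boto3  # imported here so the selection logic works without boto3

    elb = boto3.client("elb")
    ec2 = boto3.client("ec2")
    states = elb.describe_instance_health(
        LoadBalancerName=load_balancer_name
    )["InstanceStates"]
    ids = [s["InstanceId"] for s in states]
    if not ids:
        return []
    launch_times = {}
    # Pagination is ignored here; fine for a small group, an assumption otherwise.
    for reservation in ec2.describe_instances(InstanceIds=ids)["Reservations"]:
        for instance in reservation["Instances"]:
            launch_times[instance["InstanceId"]] = instance["LaunchTime"]
    stuck = select_stuck_instances(states, launch_times, datetime.now(timezone.utc))
    if stuck:
        ec2.terminate_instances(InstanceIds=stuck)
    return stuck
```

Calling run_once("my-load-balancer") from a 5-minute cron entry would mirror the schedule in the quote; the load balancer name is a placeholder.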

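Update 1

ELB Instance health check is used as well, but this 0% Instance was healthy.

Which health indicator are you referring to? How did you conclude that the instance was healthy?

It is important to realize that Auto Scaling and ELB measure instance health in different ways, see alighafour's reply to Autoscaling not reacting to unhealthy instances: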

ELB checks at the application layer while autoscaling checks at the machine layer.

This difference is explained in more detail in the AWS team's reply to the linked question ELB-Unhealthy instances taken OOS then removed from ELB automatically (which actually addresses the inverse problem):

Autoscaling is looking at instance health - they'll take an instance down if the data shows that the instance is not healthy. They'll take it out of the ELB at that time and then shut down the instance.

ELB, on the other hand, is doing an application health check by reading in a file or doing a connection to a port. If the application fails a certain number of these checks, the instance continues to run, but the ELB won't send it any new traffic. The ELB continues to perform the health check - if the application instance becomes healthy again, it'll start routing traffic to it. ELB doesn't remove the instances from the ELB registration - it simply stops sending it traffic until it's healthy again. [emphasis mine]

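Conclusion

It looks like the above may indeed apply to what you experienced: the ELB stopped sending traffic to your instance because the ELB health check failed, while the Auto Scaling health check found nothing wrong with the instance itself. This can happen, for example, if the ELB health check probes a web page served by Apache and that page does not respond for whatever reason (Apache crashed, or similar).

Solution

You need to configure your Auto Scaling group to make health decisions based on both the EC2 health status and the ELB health status, as described in the section Create a Health Check for Elastic Load Balancing within Maintaining Current Scaling Level: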
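To see why the scaling trigger from the question never fired, the averaging can be worked through with made-up numbers. The sketch below assumes a hypothetical group of four instances (the question does not state the group size) and the 80% threshold mentioned there.

```python
def average_cpu(utilizations):
    """Mean CPU utilization across the group, in percent."""
    return sum(utilizations) / len(utilizations)


def should_scale_out(utilizations, threshold=80.0):
    """True when the group average reaches the scale-out threshold."""
    return average_cpu(utilizations) >= threshold


# Hypothetical fleet: three saturated instances and one that gets no traffic.
fleet = [100.0, 100.0, 100.0, 0.0]
print(average_cpu(fleet))       # 75.0
print(should_scale_out(fleet))  # False
```

Even though every instance that receives traffic is saturated, the idle one pulls the average below the threshold, so no new instance is launched.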

By default, Auto Scaling uses the Amazon EC2 health status for all Auto-Scaling-managed instances. To also use the Elastic Load Balancer's health check, set the HealthCheckType property of the group to ELB:

% as-update-autoscaling-group myGroup --health-check-type ELB

With this configuration, an instance is also considered unhealthy as soon as the ELB health check fails, and it is replaced accordingly.
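For readers using the current AWS SDK rather than the legacy command line tools, the same setting can be sketched with boto3. This is an assumption-laden example, not part of the original answer: the function names are hypothetical, boto3 must be installed with credentials configured, and the 300-second grace period is an arbitrary illustrative value.

```python
def elb_health_check_settings(group_name, grace_period_seconds=300):
    """Build the arguments that switch a group to ELB-based health checks.
    The grace period gives new instances time to boot before they are judged."""
    return {
        "AutoScalingGroupName": group_name,
        "HealthCheckType": "ELB",
        "HealthCheckGracePeriod": grace_period_seconds,
    }


def enable_elb_health_check(group_name, grace_period_seconds=300):
    """Apply the settings (hypothetical helper; assumes boto3 and credentials)."""
    import boto3  # imported here so the settings builder works without boto3

    client = boto3.client("autoscaling")
    client.update_auto_scaling_group(
        **elb_health_check_settings(group_name, grace_period_seconds)
    )
```

Calling enable_elb_health_check("myGroup") would correspond to the command quoted above.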

Initial answer

Is it possible to have multiple triggers for one Auto Scaling Group?

Unfortunately not, see e.g. the AWS team's response to How to set Multiple Triggers in Template:

Unfortunately, the Auto Scaling service only allows 1 trigger per Auto Scaling group and so we do not support having multiple triggers for the same group within a template at this time.

An alternative is to implement a custom solution via as-set-instance-health, as described in the Custom Health Check section of Maintaining Current Scaling Level:

If you have your own health check system, you can integrate it with Auto Scaling. Use SetInstanceHealth to send the instance's health information directly from your system to Auto Scaling.
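A custom health check along these lines might look like the sketch below. It is only one possible design under stated assumptions: the "starved" rule (an instance near 0% CPU while its peers are saturated) and its thresholds are invented for illustration, the per-instance CPU figures are assumed to come from your own monitoring, and boto3 is used for the SetInstanceHealth call.

```python
def find_starved_instances(cpu_by_instance, idle_below=5.0, busy_above=90.0):
    """Return IDs of instances that sit idle while their peers are saturated.
    Thresholds are illustrative assumptions, in percent."""
    starved = []
    for instance_id, cpu in cpu_by_instance.items():
        peers = [v for k, v in cpu_by_instance.items() if k != instance_id]
        if peers and cpu < idle_below and sum(peers) / len(peers) > busy_above:
            starved.append(instance_id)
    return starved


def report_unhealthy(instance_ids):
    """Mark instances as Unhealthy so Auto Scaling replaces them
    (hypothetical helper; assumes boto3 and configured credentials)."""
    import boto3  # imported here so the detection logic works without boto3

    client = boto3.client("autoscaling")
    for instance_id in instance_ids:
        client.set_instance_health(
            InstanceId=instance_id,
            HealthStatus="Unhealthy",
            ShouldRespectGracePeriod=True,
        )
```

Something still has to run this check periodically, so it does not remove the need for a scheduled job; it only moves the replacement itself into Auto Scaling.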

Source: Stack Overflow, https://stackoverflow.com/questions/9287022/
