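amazon-web-services - ECS Execution Role causes logging driver failure on container start?

Tags: amazon-web-services timeout amazon-iam amazon-ecs

When we use a custom IAM role as the execution role of an ECS task definition, the services we spawn fail to start on our ECS instances because the CloudWatch logging driver cannot be initialized. Specifically, we see the following error from the ECS agent in CloudWatch: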

2019-10-24T21:43:10Z [INFO] TaskHandler: Adding event: TaskChange: [arn:aws:ecs:us-west-1:REDACTED -> STOPPED, Known Sent: NONE, PullStartedAt: 2019-10-24 21:43:08.499577397 +0000 UTC m=+187.475751716, PullStoppedAt: 2019-10-24 21:43:09.69279918 +0000 UTC m=+188.668973506, ExecutionStoppedAt: 2019-10-24 21:43:10.153954812 +0000 UTC m=+189.130129126, arn:aws:ecs:us-west-1:REDACTED wordpress -> STOPPED, Reason CannotStartContainerError: Error response from daemon: failed to initialize logging driver: CredentialsEndpointError: failed to load credentials

caused by: Get http://169.254.170.2/v2/credentials/REDACTED: dial tcp 169.254.170.2:80: connect: connection refused, Known Sent: NONE] sent: false

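This "connection refused" error used to be a timeout error. After reading similar issues, I tried to debug it by adding the iptables entries from https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-agent-install.html, even though this is an Amazon ECS-provisioned CoreOS EC2 instance (not a custom instance).

Essentially, that link and other issues similar to mine recommend the following, which at least changed the error from a timeout to "connection refused":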

ubuntu:~$ sudo iptables -t nat -A PREROUTING -p tcp -d 169.254.170.2 --dport 80 -j DNAT --to-destination 127.0.0.1:51679
ubuntu:~$ sudo iptables -t nat -A OUTPUT -d 169.254.170.2 -p tcp -m tcp --dport 80 -j REDIRECT --to-ports 51679
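
The difference between the two errors is informative: a timeout typically means packets to the address are dropped or never routed, while "connection refused" typically means the packets arrive but nothing is listening on the target port. A minimal Python sketch for telling the two apart from the host is shown here. The function name probe_tcp is hypothetical; the addresses come from the error message and the iptables rules above.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 2.0) -> str:
    """Classify one TCP connection attempt.

    Returns 'open', 'refused', 'timeout', or 'error' (any other OS error,
    for example an unreachable network).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        return "timeout"
    except ConnectionRefusedError:
        return "refused"
    except OSError:
        return "error"


if __name__ == "__main__":
    # Address the logging driver tried to reach, per the agent log.
    print("169.254.170.2:80 ->", probe_tcp("169.254.170.2", 80))
    # Port the iptables rules redirect that traffic to.
    print("127.0.0.1:51679 ->", probe_tcp("127.0.0.1", 51679))
```

If the second probe reports "refused", the redirect is working but no process is accepting connections on port 51679 on the host, which would match the error in the log.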

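Note that this container definition runs normally and works fine when we do not use a custom IAM execution role. However, I am trying to add an AWS Secrets Manager secret to the task definition, which requires us to define a custom role that can access the secret.

Edit: here are the role policy JSON for the ECS instance and the cloud-config.yml:

Role policy JSON: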

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:Describe*",
        "elasticloadbalancing:DeregisterInstancesFromLoadBalancer",
        "elasticloadbalancing:DeregisterTargets",
        "elasticloadbalancing:Describe*",
        "elasticloadbalancing:RegisterInstancesWithLoadBalancer",
        "elasticloadbalancing:RegisterTargets"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ecr:GetAuthorizationToken",
        "ecr:BatchCheckLayerAvailability",
        "ecr:GetDownloadUrlForLayer",
        "ecr:BatchGetImage",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ssm:GetParameters",
        "secretsmanager:GetSecretValue",
        "kms:Decrypt"
      ],
      "Resource": [
        "${var.aws_mysql_secret_arn}"
      ]
    }
  ]
}
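
To rule out a simple permissions gap, a policy document like the one above can be checked mechanically. The following is a rough Python sketch, not an IAM evaluator: it only looks at the Action lists of Allow statements and ignores Resource, Condition, NotAction, and explicit Deny. The function names and the REQUIRED list are illustrative assumptions, and the embedded policy is an abbreviated copy of the one above.

```python
import fnmatch
import json


def allowed_actions(policy: dict) -> list:
    """Collect the action patterns from every Allow statement."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]
    patterns = []
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):  # a single action may be a bare string
            actions = [actions]
        patterns.extend(actions)
    return patterns


def missing_actions(policy: dict, required: list) -> list:
    """Return the required actions that no Allow pattern matches.

    Action names are compared case-insensitively, and '*' / '?' in the
    policy are treated as wildcards.
    """
    patterns = [p.lower() for p in allowed_actions(policy)]
    return [
        action
        for action in required
        if not any(fnmatch.fnmatchcase(action.lower(), p) for p in patterns)
    ]


# Abbreviated copy of the policy above (assumption: trimmed for brevity).
POLICY = json.loads("""
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["ecr:GetAuthorizationToken", "logs:CreateLogStream", "logs:PutLogEvents"],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": ["ssm:GetParameters", "secretsmanager:GetSecretValue", "kms:Decrypt"],
      "Resource": ["${var.aws_mysql_secret_arn}"]
    }
  ]
}
""")

# Illustrative list of actions this setup appears to need.
REQUIRED = ["logs:CreateLogStream", "logs:PutLogEvents", "secretsmanager:GetSecretValue"]

if __name__ == "__main__":
    print(missing_actions(POLICY, REQUIRED))  # prints []
```

An empty result suggests the Action lists themselves are not the problem, which is consistent with the error being a failure to reach the credentials endpoint rather than an access-denied response.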

cloud-config.yml:

coreos:
  units:
   - name: update-engine.service
     command: stop
   - name: amazon-ecs-agent.service
     command: start
     runtime: true
     content: |
       [Unit]
       Description=AWS ECS Agent
       Documentation=https://docs.aws.amazon.com/AmazonECS/latest/developerguide/
       Requires=docker.socket
       After=docker.socket

       [Service]
       Environment=ECS_CLUSTER=${ecs_cluster_name}
       Environment=ECS_LOGLEVEL=${ecs_log_level}
       Environment=ECS_VERSION=${ecs_agent_version}
       Restart=on-failure
       RestartSec=30
       RestartPreventExitStatus=5
       SyslogIdentifier=ecs-agent
       ExecStartPre=-/bin/mkdir -p /var/log/ecs /var/ecs-data /etc/ecs
       ExecStartPre=-/usr/bin/docker kill ecs-agent
       ExecStartPre=-/usr/bin/docker rm ecs-agent
       ExecStartPre=iptables -t nat -A PREROUTING -p tcp -d 169.254.170.2 --dport 80 -j DNAT --to-destination 127.0.0.1:51679
       ExecStartPre=iptables -t nat -A OUTPUT -d 169.254.170.2 -p tcp -m tcp --dport 80 -j REDIRECT --to-ports 51679
       ExecStartPre=/usr/bin/docker pull amazon/amazon-ecs-agent:$${ECS_VERSION}
       ExecStart=/usr/bin/docker run --name ecs-agent \
                                     --volume=/var/run/docker.sock:/var/run/docker.sock \
                                     --volume=/var/log/ecs:/log \
                                     --volume=/var/ecs-data:/data \
                                     --volume=/sys/fs/cgroup:/sys/fs/cgroup:ro \
                                     --volume=/run/docker/execdriver/native:/var/lib/docker/execdriver/native:ro \
                                     --publish=127.0.0.1:51678:51678 \
                                     --env=ECS_LOGFILE=/log/ecs-agent.log \
                                     --env=ECS_LOGLEVEL=$${ECS_LOGLEVEL} \
                                     --env=ECS_DATADIR=/data \
                                     --env=ECS_CLUSTER=$${ECS_CLUSTER} \
                                     --env=ECS_AVAILABLE_LOGGING_DRIVERS='["awslogs"]' \
                                     --env=ECS_ENABLE_AWSLOGS_EXECUTIONROLE_OVERRIDE=true \
                                     --log-driver=awslogs \
                                     --log-opt awslogs-region=${aws_region} \
                                     --log-opt awslogs-group=${ecs_log_group_name} \
                                     amazon/amazon-ecs-agent:$${ECS_VERSION}

Best answer

If you hit this failure, check two things.

  1. The permissions in the ECS execution role policy. It should include logs:CreateLogStream and logs:PutLogEvents. For example:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ecr:GetAuthorizationToken",
                "ecr:BatchCheckLayerAvailability",
                "ecr:GetDownloadUrlForLayer",
                "ecr:BatchGetImage",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "*"
        }
    ]
}
  2. Configure the ECS agent so that the awslogs driver is available.

The agent's configuration file is /etc/ecs/ecs.config on the host. With the awslogs driver added, it should look like this:

ECS_CLUSTER=test_ecs_cluster
ECS_AVAILABLE_LOGGING_DRIVERS=["awslogs","json-file"]
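
The second check can also be scripted. The sketch below, in Python, parses the KEY=VALUE format shown above and reports whether awslogs appears in ECS_AVAILABLE_LOGGING_DRIVERS. The function names are hypothetical, and a missing variable is simply reported as "not configured"; the agent's own defaults are not modelled here.

```python
import json


def parse_ecs_config(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blanks, comments, and malformed lines."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def has_logging_driver(settings: dict, driver: str = "awslogs") -> bool:
    """True if the driver is listed in ECS_AVAILABLE_LOGGING_DRIVERS."""
    raw = settings.get("ECS_AVAILABLE_LOGGING_DRIVERS")
    if raw is None:
        return False
    try:
        drivers = json.loads(raw)  # the value is a JSON array of strings
    except json.JSONDecodeError:
        return False
    return isinstance(drivers, list) and driver in drivers


SAMPLE = """\
ECS_CLUSTER=test_ecs_cluster
ECS_AVAILABLE_LOGGING_DRIVERS=["awslogs","json-file"]
"""

if __name__ == "__main__":
    print(has_logging_driver(parse_ecs_config(SAMPLE)))  # prints True
```

On a real host, the text would be read from /etc/ecs/ecs.config instead of the SAMPLE string.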


Original question on Stack Overflow: https://stackoverflow.com/questions/58549635/
