amazon-web-services - CDK refactoring causes ECS task failure: Unable to Retrieve ECR Registry Auth

Tags: amazon-web-services aws-cloudformation aws-cdk

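I changed my CDK deployment code to make it more modular, moving the task definition and FargateService code into a separate class, EcsService. After these changes, the stack deployment gets stuck on ECS: the task cannot pull the image, apparently because of a permissions or networking problem. The error is shown below, followed by my old and new code.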

Error

Task stopped at: 2023-08-31T05:55:55.882Z
ResourceInitializationError: unable to pull secrets or registry auth: execution resource retrieval failed: unable to retrieve ecr registry auth: service call has been retried 3 time(s): RequestError: send request failed caused by: Post "https://api.ecr.us-east-1.amazonaws.com/": dial tcp 44.213.79.50:443: i/o timeout. Please check your task network configuration.

Old code

securityGroup.addIngressRule(ec2.Peer.anyIpv4(), ec2.Port.tcp(3000));

// Validation
if (!envJSON.ssdDockerImageTag ) {
  throw new Error('Missing ssd-fe image tag.');
}
const cluster = new ecs.Cluster(this, "ssdCluster", { vpc });

// Define the task definition with a container using an image from ECR
const taskDefinition = new ecs.FargateTaskDefinition(this, 'ssdTaskDef');
const container = taskDefinition.addContainer('ssdContainer', {
  image: ecs.ContainerImage.fromEcrRepository(
    ecr.Repository.fromRepositoryName(this, 'ssdRepo', 'ssd-fe'),
    envJSON.ssdDockerImageTag),
  memoryLimitMiB: 512,
  cpu: 256,
  portMappings: [{
    containerPort: 3000
  }],
  environment: {
    NODE_ENV: "production",
    API_BASE_URL: api.url
  }
});

// Create the Fargate Service
const service = new ecs.FargateService(this, 'ssdService', {
  cluster,
  taskDefinition,
  desiredCount: 1,
  vpcSubnets: {
    subnetType: ec2.SubnetType.PUBLIC,
  },
  securityGroups: [securityGroup],
  assignPublicIp: true,
});

LoadBalancer.getInstance(this, 'LoadBalancer', {
  vpc,
  ecsService: service,
});

New code

securityGroup.addIngressRule(ec2.Peer.anyIpv4(), ec2.Port.tcp(3000));

// Validation
if (!envJSON.ssdDockerImageTag ) {
  throw new Error('Missing ssd-fe image tag.');
}
const cluster = new ecs.Cluster(this, "ssdCluster", { vpc });

// Create ECS Service
const ecsService = new EcsService(this, 'ssdService', {
  vpc,
  securityGroup: securityGroup,
  cluster: cluster,
  repoName: 'ssd-fe',
  imageTag: envJSON.ssdDockerImageTag,
  environment: {
    NODE_ENV: "production",
    API_BASE_URL: api.url
  }
});

LoadBalancer.getInstance(this, 'LoadBalancer', {
  vpc,
  ecsService: ecsService.service,
});

// EcsService class
import * as cdk from 'aws-cdk-lib';
import * as ecs from 'aws-cdk-lib/aws-ecs';
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import * as ecr from 'aws-cdk-lib/aws-ecr';
import * as logs from 'aws-cdk-lib/aws-logs';
import * as iam from 'aws-cdk-lib/aws-iam';
import { Construct } from 'constructs';

interface EcsServiceProps {
  vpc: ec2.IVpc;
  securityGroup: ec2.ISecurityGroup;
  cluster: ecs.ICluster;
  repoName: string;
  imageTag: string;
  environment?: { [key: string]: string };
}

export class EcsService extends Construct {
  public readonly service: ecs.FargateService;

  constructor(scope: Construct, id: string, props: EcsServiceProps) {
    super(scope, id);
    
    const ecrRepository = ecr.Repository.fromRepositoryName(this, `${id}Repo`, props.repoName);
    
    const taskDefinition = new ecs.FargateTaskDefinition(this, `${id}TaskDef`);
    taskDefinition.addContainer(`${id}Container`, {
      image: ecs.ContainerImage.fromEcrRepository(ecrRepository, props.imageTag),
      memoryLimitMiB: 512,
      cpu: 256,
      portMappings: [{ containerPort: 3000 }],
      environment: props.environment,
    });

    this.service = new ecs.FargateService(this, id, {
      cluster: props.cluster,
      taskDefinition,
      desiredCount: 1,
      vpcSubnets: { subnetType: ec2.SubnetType.PUBLIC },
      securityGroups: [props.securityGroup],
    });
  }
}

IAM statement changes

┌───┬───────────────────────────────────────┬────────┬───────────────────────────────────────┬───────────────────────────────────────┬───────────┐
│   │ Resource                              │ Effect │ Action                                │ Principal                             │ Condition │
├───┼───────────────────────────────────────┼────────┼───────────────────────────────────────┼───────────────────────────────────────┼───────────┤
│ - │ *                                     │ Allow  │ ecr:GetAuthorizationToken             │ AWS:${ssdTaskDefExecutionRole469C7625 │           │
│   │                                       │        │                                       │ }                                     │           │
├───┼───────────────────────────────────────┼────────┼───────────────────────────────────────┼───────────────────────────────────────┼───────────┤
│ - │ arn:aws:ecr:us-east-1:533732470418:re │ Allow  │ ecr:BatchCheckLayerAvailability       │ AWS:${ssdTaskDefExecutionRole469C7625 │           │
│   │ pository/ssd-fe                       │        │ ecr:BatchGetImage                     │ }                                     │           │
│   │                                       │        │ ecr:GetDownloadUrlForLayer            │                                       │           │
├───┼───────────────────────────────────────┼────────┼───────────────────────────────────────┼───────────────────────────────────────┼───────────┤
│ + │ ${ssdService/ssdServiceTaskDef/Execut │ Allow  │ sts:AssumeRole                        │ Service:ecs-tasks.amazonaws.com       │           │
│   │ ionRole.Arn}                          │        │                                       │                                       │           │
├───┼───────────────────────────────────────┼────────┼───────────────────────────────────────┼───────────────────────────────────────┼───────────┤
│ + │ ${ssdService/ssdServiceTaskDef/TaskRo │ Allow  │ sts:AssumeRole                        │ Service:ecs-tasks.amazonaws.com       │           │
│   │ le.Arn}                               │        │                                       │                                       │           │
├───┼───────────────────────────────────────┼────────┼───────────────────────────────────────┼───────────────────────────────────────┼───────────┤
│ + │ *                                     │ Allow  │ ecr:GetAuthorizationToken             │ AWS:${ssdService/ssdServiceTaskDef/Ex │           │
│   │                                       │        │                                       │ ecutionRole}                          │           │
├───┼───────────────────────────────────────┼────────┼───────────────────────────────────────┼───────────────────────────────────────┼───────────┤
│ + │ arn:aws:ecr:us-east-1:533732470418:re │ Allow  │ ecr:BatchCheckLayerAvailability       │ AWS:${ssdService/ssdServiceTaskDef/Ex │           │
│   │ pository/ssd-fe                       │        │ ecr:BatchGetImage                     │ ecutionRole}                          │           │
│   │                                       │        │ ecr:GetDownloadUrlForLayer            │                                       │           │
└───┴───────────────────────────────────────┴────────┴───────────────────────────────────────┴───────────────────────────────────────┴───────────┘

ChatGPT suggested explicitly adding permissions in the EcsService class, so I made the following changes. Even after these changes, the error is still the same.

// Create an execution role
const executionRole = new iam.Role(this, 'ExecutionRole', {
  assumedBy: new iam.ServicePrincipal('ecs-tasks.amazonaws.com'),
});

// Grant permissions to the execution role to pull from ECR
executionRole.addToPolicy(new iam.PolicyStatement({
  actions: [
    'ecr:GetAuthorizationToken'
  ],
  resources: ['*'],
}));

const ecrRepository = ecr.Repository.fromRepositoryName(this, `${id}Repo`, props.repoName);

const taskDefinition = new ecs.FargateTaskDefinition(this, `${id}TaskDef`, {
  executionRole: executionRole
});

How can I fix this?

Best answer

I was able to solve this with help from @gshpychka. Here is what I did.

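The issue was related to the ECS task's ability to reach the internet in order to pull the Docker image from ECR. When I refactored the code, I inadvertently removed the assignPublicIp: true property from the FargateService constructor, thinking it was unnecessary.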
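The stopped reason quoted in the question already points in this direction: it ends in a "dial tcp ... i/o timeout" against the ECR endpoint, which is a connectivity failure rather than an authorization failure. As a rough illustration, the triage could be sketched like this. The helper classifyStopReason is hypothetical, and the strings it treats as permission errors ("AccessDenied", "not authorized") are an assumption about how IAM failures usually read, not something taken from the original post.

```typescript
// Hypothetical helper: rough triage of an ECS task "stopped reason" string.
type StopCause = "network" | "permissions" | "unknown";

function classifyStopReason(reason: string): StopCause {
  const text = reason.toLowerCase();

  // A timeout while dialing the ECR endpoint means the request never
  // arrived, so IAM policies were not evaluated at all.
  if (text.includes("i/o timeout") || text.includes("dial tcp")) {
    return "network";
  }

  // Assumption: IAM failures surface with AccessDenied / "not authorized" text.
  if (text.includes("accessdenied") || text.includes("not authorized")) {
    return "permissions";
  }

  return "unknown";
}

// Tail of the stopped reason from the question.
const sampleReason =
  'Post "https://api.ecr.us-east-1.amazonaws.com/": dial tcp 44.213.79.50:443: ' +
  "i/o timeout. Please check your task network configuration.";

console.log(classifyStopReason(sampleReason)); // prints "network"
```

Read this way, the extra execution-role statements tried in the question could not have helped, because the request to ECR never left the task's network.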

To fix it, I added assignPublicIp: true back to the FargateService constructor, as shown below:

this.service = new ecs.FargateService(this, id, {
  cluster: props.cluster,
  taskDefinition,
  desiredCount: 1,
  assignPublicIp: true  // This line solved the issue
});

Adding this property assigns the task a public IP, which gives it the internet access it needs to pull the Docker image successfully.
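To make the reasoning concrete, here is a minimal sketch of the rule of thumb behind the fix. It is a simplified model, not an AWS API: the TaskNetworkConfig type, its field names, and the canPullFromEcr function are all hypothetical, and the model assumes a Fargate task reaches ECR either through a public IP in a public subnet, through a NAT gateway, or through VPC endpoints for ECR and S3.

```typescript
// Hypothetical, simplified model of the inputs that decide whether a
// Fargate task has a network path to ECR.
interface TaskNetworkConfig {
  subnetType: "public" | "privateWithNat" | "isolated";
  assignPublicIp: boolean;
  // Assumption: true means VPC endpoints for the ECR API, ECR Docker
  // registry, and S3 are all in place.
  hasEcrVpcEndpoints: boolean;
}

function canPullFromEcr(cfg: TaskNetworkConfig): boolean {
  // With VPC endpoints the pull stays inside the VPC.
  if (cfg.hasEcrVpcEndpoints) {
    return true;
  }
  // A public subnet routes to the internet gateway, but the task can only
  // use that route if it has a public IP of its own.
  if (cfg.subnetType === "public") {
    return cfg.assignPublicIp;
  }
  // A private subnet with a NAT gateway gets outbound access via the NAT;
  // an isolated subnet has no outbound route at all.
  return cfg.subnetType === "privateWithNat";
}

// Old code: public subnet with assignPublicIp: true.
const oldCode: TaskNetworkConfig = {
  subnetType: "public",
  assignPublicIp: true,
  hasEcrVpcEndpoints: false,
};

// Refactored code: public subnet, assignPublicIp omitted (false by default).
const refactored: TaskNetworkConfig = { ...oldCode, assignPublicIp: false };

console.log(canPullFromEcr(oldCode));    // prints true
console.log(canPullFromEcr(refactored)); // prints false
```

The refactored service matches the second case: it sat in a public subnet but had no public IP, so the request to ECR timed out.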

PS: I also removed the vpcSubnets and securityGroups properties. They were unnecessary.

Source: this question and answer come from Stack Overflow: https://stackoverflow.com/questions/77013257/
