kubernetes - Unable to access AWS EKS cluster set up via Terraform from a bastion host

Tags: kubernetes terraform terraform-provider-aws amazon-eks terraform-aws-modules

Background and Context

I am working on a Terraform project whose end goal is an EKS cluster with the following properties:

  1. Not exposed to the public internet
  2. Accessible via a bastion host
  3. Uses worker groups
  4. Able to have resources (deployments, cron jobs, etc.) provisioned via the Terraform Kubernetes provider

To achieve this I made slight modifications to the Terraform EKS example (code at the bottom of the question). The problem I am running into is that after SSHing into the bastion I cannot ping the cluster, and any command like kubectl get pods times out after roughly 60 seconds.

Here are the facts/things I know to be true:

  1. For testing purposes I have (temporarily) switched the cluster to a public one. Previously, when I had cluster_endpoint_public_access set to false, the terraform apply command could not even complete because it could not reach the /healthz endpoint on the cluster (see the sketch after this list).
  2. The bastion configuration works, in the sense that the user data runs successfully and installs kubectl and the kubeconfig file.
  3. I can SSH into the bastion from our static IPs (i.e. var.company_vpn_ips in the code).
  4. It is entirely possible that this is purely a networking issue rather than an EKS/Terraform one, since my understanding of how the VPC and its security groups fit into this picture is not fully mature.
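
For reference, one common middle ground while debugging this (a sketch only, reusing the terraform-aws-modules/eks inputs from the cluster configuration below and assuming the module version in use supports cluster_endpoint_public_access_cidrs) is to keep the private endpoint enabled and restrict, rather than disable, the public endpoint:

module "eks_cluster" {
  source = "terraform-aws-modules/eks/aws"

  # ... all other arguments as in the cluster configuration below ...

  # Keep the in-VPC (private) endpoint enabled so the bastion and the workers
  # can reach the API server without leaving the VPC
  cluster_endpoint_private_access = true

  # Keep the public endpoint only for the machines running terraform apply,
  # locked down to known egress IPs (var.company_vpn_ips is reused here purely
  # as an illustration of a restricted CIDR list)
  cluster_endpoint_public_access       = true
  cluster_endpoint_public_access_cidrs = var.company_vpn_ips
}

With this in place, terraform apply can still reach the cluster's /healthz endpoint from outside the VPC while the API server stays closed to the rest of the internet.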

Code

Here is the VPC configuration:

locals {
  vpc_name            = "my-vpc"
  vpc_cidr            = "10.0.0.0/16"
  public_subnet_cidr  = ["10.0.4.0/24", "10.0.5.0/24", "10.0.6.0/24"]
  private_subnet_cidr = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
}

# The definition of the VPC to create

module "vpc" {

  source  = "terraform-aws-modules/vpc/aws"
  version = "3.2.0"

  name                 = local.vpc_name
  cidr                 = local.vpc_cidr
  azs                  = data.aws_availability_zones.available.names
  private_subnets      = local.private_subnet_cidr
  public_subnets       = local.public_subnet_cidr
  enable_nat_gateway   = true
  single_nat_gateway   = true
  enable_dns_hostnames = true

  tags = {
    "kubernetes.io/cluster/${var.cluster_name}" = "shared"
  }

  public_subnet_tags = {
    "kubernetes.io/cluster/${var.cluster_name}" = "shared"
    "kubernetes.io/role/elb"                    = "1"
  }

  private_subnet_tags = {
    "kubernetes.io/cluster/${var.cluster_name}" = "shared"
    "kubernetes.io/role/internal-elb"           = "1"
  }
}

data "aws_availability_zones" "available" {}

Then the security groups I created for the cluster:

resource "aws_security_group" "ssh_sg" {
  name_prefix = "ssh-sg"
  vpc_id      = module.vpc.vpc_id

  ingress {
    from_port = 22
    to_port   = 22
    protocol  = "tcp"

    cidr_blocks = [
      "10.0.0.0/8",
    ]
  }
}

resource "aws_security_group" "all_worker_mgmt" {
  name_prefix = "all_worker_management"
  vpc_id      = module.vpc.vpc_id

  ingress {
    from_port = 22
    to_port   = 22
    protocol  = "tcp"

    cidr_blocks = [
      "10.0.0.0/8",
      "172.16.0.0/12",
      "192.168.0.0/16",
    ]
  }
}

Here is the cluster configuration:

locals {
  cluster_version = "1.21"
}

# Create the EKS resource that will setup the EKS cluster
module "eks_cluster" {
  source = "terraform-aws-modules/eks/aws"

  # The name of the cluster to create
  cluster_name = var.cluster_name

  # Public access to the cluster API endpoint (temporarily enabled for testing)
  cluster_endpoint_public_access = true

  # Enable private access to the cluster API endpoint
  cluster_endpoint_private_access = true

  # The version of the cluster to create
  cluster_version = local.cluster_version

  # The VPC ID to create the cluster in
  vpc_id = var.vpc_id

  # The subnets to add the cluster to
  subnets = var.private_subnets

  # Default information on the workers
  workers_group_defaults = {
    root_volume_type = "gp2"
  }

  worker_additional_security_group_ids = [var.all_worker_mgmt_id]

  # Specify the worker groups
  worker_groups = [
    {
      # The name of this worker group
      name = "default-workers"
      # The instance type for this worker group
      instance_type = var.eks_worker_instance_type
      # The number of instances to raise up
      asg_desired_capacity = var.eks_num_workers
      asg_max_size         = var.eks_num_workers
      asg_min_size         = var.eks_num_workers
      # The security group IDs for these instances
      additional_security_group_ids = [var.ssh_sg_id]
    }
  ]
}

data "aws_eks_cluster" "cluster" {
  name = module.eks_cluster.cluster_id
}

data "aws_eks_cluster_auth" "cluster" {
  name = module.eks_cluster.cluster_id
}

output "worker_iam_role_name" {
  value = module.eks_cluster.worker_iam_role_name
}
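
These two data sources are what typically feed the Kubernetes provider needed for goal 4; the provider wiring is not shown in the post, but a minimal sketch of it (assuming the standard hashicorp/kubernetes provider) looks like this:

provider "kubernetes" {
  # The API endpoint and CA certificate come from the aws_eks_cluster data source above
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority[0].data)

  # Short-lived token generated from the caller's IAM identity
  token = data.aws_eks_cluster_auth.cluster.token
}

Note that this provider has to reach the same API endpoint as kubectl on the bastion, so it is subject to the same endpoint-access and security-group constraints described above.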

And finally the bastion:

locals {
  ami           = "ami-0f19d220602031aed" # Amazon Linux 2 AMI (us-east-2)
  instance_type = "t3.small"
  key_name      = "bastion-kp"
}

resource "aws_iam_instance_profile" "bastion" {
  name = "bastion"
  role = var.role_name
}

resource "aws_instance" "bastion" {
  ami           = local.ami
  instance_type = local.instance_type

  key_name                    = local.key_name
  associate_public_ip_address = true
  subnet_id                   = var.public_subnet
  iam_instance_profile        = aws_iam_instance_profile.bastion.name

  security_groups = [aws_security_group.bastion-sg.id]

  tags = {
    Name = "K8s Bastion"
  }

  lifecycle {
    ignore_changes = all
  }

  user_data = <<EOF
      #! /bin/bash

      # Install Kubectl
      curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
      install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
      kubectl version --client

      # Install Helm
      curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
      chmod 700 get_helm.sh
      ./get_helm.sh
      helm version

      # Install AWS
      curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
      unzip awscliv2.zip
      ./aws/install
      aws --version

      # Install aws-iam-authenticator
      curl -o aws-iam-authenticator https://amazon-eks.s3.us-west-2.amazonaws.com/1.21.2/2021-07-05/bin/linux/amd64/aws-iam-authenticator
      chmod +x ./aws-iam-authenticator
      mkdir -p $HOME/bin && cp ./aws-iam-authenticator $HOME/bin/aws-iam-authenticator && export PATH=$PATH:$HOME/bin
      echo 'export PATH=$PATH:$HOME/bin' >> ~/.bashrc
      aws-iam-authenticator help

      # Add the kube config file 
      mkdir ~/.kube
      echo "${var.kubectl_config}" >> ~/.kube/config
  EOF
}

resource "aws_security_group" "bastion-sg" {
  name   = "bastion-sg"
  vpc_id = var.vpc_id
}

resource "aws_security_group_rule" "sg-rule-ssh" {
  security_group_id = aws_security_group.bastion-sg.id
  from_port         = 22
  protocol          = "tcp"
  to_port           = 22
  type              = "ingress"
  cidr_blocks       = var.company_vpn_ips
  depends_on        = [aws_security_group.bastion-sg]
}

resource "aws_security_group_rule" "sg-rule-egress" {
  security_group_id = aws_security_group.bastion-sg.id
  type              = "egress"
  from_port         = 0
  protocol          = "all"
  to_port           = 0
  cidr_blocks       = ["0.0.0.0/0"]
  ipv6_cidr_blocks  = ["::/0"]
  depends_on        = [aws_security_group.bastion-sg]
}

The Ask

The most pressing issue for me is finding a way to interact with the cluster through the bastion so that the rest of the Terraform code (the resources launched inside the cluster itself) can run. I would also appreciate some insight into how to set this up when the terraform apply command cannot reach a private cluster. Thanks in advance for any help you can offer!

Best Answer

Look at how your node groups communicate with the control plane: you need to attach that same cluster security group to your bastion host so that it can talk to the control plane. You can find the SG ID in the EKS console, under the Networking tab.
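
In the question's Terraform, that could look roughly like the following (a sketch, assuming the EKS module version in use exposes the cluster_primary_security_group_id output; vpc_security_group_ids is the in-VPC replacement for the security_groups argument used on the bastion above):

resource "aws_instance" "bastion" {
  # ... all other arguments as in the bastion configuration above ...

  # Attach the EKS-managed cluster security group alongside the bastion's own SG
  # so that traffic from the bastion to the control plane endpoint is allowed.
  vpc_security_group_ids = [
    aws_security_group.bastion-sg.id,
    module.eks_cluster.cluster_primary_security_group_id,
  ]
}

A tighter alternative is to leave the bastion's security groups unchanged and instead add an aws_security_group_rule on the cluster security group allowing TCP 443 from aws_security_group.bastion-sg.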

This question, "kubernetes - Unable to access AWS EKS cluster set up via Terraform from a bastion host", comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/70477754/
