python - Passing values from a Glue Connection resource to a Python job

Tags: python sql-server amazon-web-services aws-cloudformation aws-glue

In my AWS::Glue::Connection resource I have set all the credentials needed to access my SQL Server database.

  GlueJDBCConnection:
    Type: AWS::Glue::Connection
    Properties:
      CatalogId: !Ref AWS::AccountId
      ConnectionInput:
        ConnectionType: "JDBC"
        ConnectionProperties:
          USERNAME: !Ref Username
          PASSWORD: !Ref Password
          JDBC_CONNECTION_URL: !Ref GlueJDBCStringTarget
          sslMode: 'REQUIRED'
        PhysicalConnectionRequirements:
          AvailabilityZone: !If [IsProd, !Ref AvailabilityZoneProd, !Ref AvailabilityZoneNonProd]
          SecurityGroupIdList:
            - Fn::GetAtt: GlueJobSecurityGroup.GroupId
          SubnetId: !If [IsProd, !Ref PrivateSubnetAz2, !Ref PrivateSubnetAz3]
        Name: !Ref JDBCConnectionName

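I need to use USERNAME and PASSWORD in my Python script, but I don't want them exposed in the "Job parameters" section of the AWS console. Is there another way to accomplish what I am doing below?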

  GlueJob:
    Type: AWS::Glue::Job
    DependsOn: GlueSecurityConfiguration
    Properties:
      Name: !Ref GlueJobName
      Role: !Ref RoleForRTMI
      SecurityConfiguration: !Ref SecurityConfiguration
      Command:
        Name: glueetl
        PythonVersion: 3
        ScriptLocation: !Sub 's3://xyz-${AWS::AccountId}-xx-xxxx-0/${blablabla}'
      DefaultArguments:
        '--USER': !Ref Username
        '--PASS': !Ref Password
      Connections:
        Connections:
        - Ref: GlueJDBCConnection
      ExecutionProperty:
        MaxConcurrentRuns: 2
      #MaxCapacity: 2 #if used, don't use WorkerType and NumberOfWorkers
      WorkerType: G.1X
      NumberOfWorkers: 2
      MaxRetries: 1
      GlueVersion: '2.0'
      Tags:
        name: value_1

Python example:

from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from pyspark.sql import DataFrame


class FrameWriter:

    def __init__(self, environment: str, context: GlueContext):
        self.environment = environment
        self.context = context

    def write_frame(self, table_name: str, spark_df: DataFrame, rds_user: str, rds_pass: str):
        # The JDBC URL is read from the Glue connection; the user and password
        # still arrive as job arguments, which is what exposes them in the console.
        rds_creds = glue_rds_cred(self.environment)
        rds_url = dict_recursive_lookup("JDBC_CONNECTION_URL", rds_creds)

        glue_df = DynamicFrame.fromDF(spark_df, self.context, "glue_df")
        self.context.write_dynamic_frame.from_options(
            frame=glue_df,
            connection_type="sqlserver",
            connection_options={
                "url": f"{rds_url}/db_name",
                "user": rds_user,
                "password": rds_pass,
                "dbtable": f"rdm.{table_name}",
            },
            transformation_ctx="output",
        )

writer = FrameWriter(environment, glue_context)
writer.write_frame(name, sp_df, args["USER"], args["PASS"])

Best answer

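I came up with the code below, which uses boto3 to fetch the user credentials from the connection and pass them along, so they are not exposed in the AWS Glue console: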

from typing import Optional

import boto3


def glue_rds_cred(environment) -> dict:
    # HidePassword=False asks Glue to include the stored password in the response.
    client_glue = boto3.client("glue")
    response_rds_pass = client_glue.get_connection(
        # CatalogId='string',
        Name=f"instance_name-{environment}",
        HidePassword=False,
    )
    return response_rds_pass


def dict_recursive_lookup(k: str, d: dict) -> Optional[str]:
    # Depth-first search through nested dicts; returns None if k is not found.
    if k in d:
        return d[k]
    for v in d.values():
        if isinstance(v, dict):
            a = dict_recursive_lookup(k, v)
            if a is not None:
                return a
    return None
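
As a possible follow-up, here is a minimal sketch of reading all three values (URL, user, password) straight from the get_connection response, so that the '--USER' and '--PASS' job arguments are no longer needed. It relies on the response nesting the properties under "Connection" -> "ConnectionProperties", which is how boto3 documents it. The function names fetch_jdbc_credentials, extract_jdbc_credentials and build_connection_options are hypothetical, and the "/database" suffix on the URL simply mirrors the f"{rds_url}/db_name" convention from the question. The client is passed in as a parameter so the logic can be exercised without AWS access.

```python
from typing import Any, Dict


def extract_jdbc_credentials(response: Dict[str, Any]) -> Dict[str, str]:
    """Pick the JDBC settings out of a Glue get_connection response."""
    # The property keys match those set under ConnectionProperties
    # in the CloudFormation template from the question.
    props = response["Connection"]["ConnectionProperties"]
    return {
        "url": props["JDBC_CONNECTION_URL"],
        "user": props["USERNAME"],
        "password": props["PASSWORD"],
    }


def fetch_jdbc_credentials(glue_client: Any, connection_name: str) -> Dict[str, str]:
    """Call get_connection on a boto3 Glue client (or a stand-in) and extract the credentials."""
    # HidePassword=False is required, otherwise the password is left out of the response.
    response = glue_client.get_connection(Name=connection_name, HidePassword=False)
    return extract_jdbc_credentials(response)


def build_connection_options(creds: Dict[str, str], database: str, table: str) -> Dict[str, str]:
    """Assemble the connection_options dict used by write_dynamic_frame.from_options."""
    # Assumption: the database name is appended as "/database", as in the question's code.
    return {
        "url": f"{creds['url']}/{database}",
        "user": creds["user"],
        "password": creds["password"],
        "dbtable": table,
    }
```

Inside the job this would be called with boto3.client("glue") and the connection name, for example f"instance_name-{environment}" as in the answer above. The job's IAM role presumably needs permission to call glue:GetConnection on that connection for this to work.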

Source: a similar question on Stack Overflow, "python - Passing values from a Glue Connection resource to a Python job": https://stackoverflow.com/questions/72005392/
