amazon-web-services - AWS Glue CloudFormation Exclusions pattern Exclude: String

Tags: amazon-web-services aws-cloudformation aws-glue

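I have successfully set up a Glue crawler in the AWS console. I now have a CloudFormation template that reproduces the whole setup, except that I cannot add the Exclusions: field to the template. Background: in the AWS Glue API, the Exclusions: field holds glob patterns used to exclude files or folders that match a given pattern within the data store (an S3 data store in my case).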

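Despite a lot of effort, I cannot get the glob patterns to show up on the crawler in the Glue console. Every other value in the script is applied to the crawler configuration: the S3Target, crawler name, IAM role, and grouping behavior all populate correctly from the CFN template. The only exception is the Exclusions field, which the Glue console calls "Exclude patterns". My CFN template passes validation, and I have run the crawler in the hope that the exclusion globs (albeit hidden) would still have an effect, but unfortunately I still cannot seem to populate the Exclusions field.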

Here's the S3Target Exclusion AWS Glue API guide

Here's an AWS sample YAML CFN for a Glue Crawler

Here's a helpful YAML string array guide

YAML

  CFNCrawlerSecDeraNUM:
    Type: AWS::Glue::Crawler
    Properties:
      Name: !Ref CFNCrawlerName
      Role: !GetAtt CFNRoleSecDERA.Arn
      #Classifiers: none, use the default classifier
      Description: AWS Glue crawler to crawl SecDERA data
      #Schedule: none, use default run-on-demand
      DatabaseName: !Ref CFNDatabaseName
      Targets:
        S3Targets:
          - Exclusions:
              - "*/readme.htm"
              - "*/sub.txt"
              - "*/pre.txt"
              - "*/tag.txt"
          - Path: "s3://sec-input"
      TablePrefix: !Ref CFNTablePrefixName
      SchemaChangePolicy:
        UpdateBehavior: "UPDATE_IN_DATABASE"
        DeleteBehavior: "LOG"
        # Added single schema grouping Glue API option
      Configuration: "{\"Version\":1.0,\"CrawlerOutput\":{\"Partitions\":{\"AddOrUpdateBehavior\":\"InheritFromTable\"},\"Tables\":{\"AddOrUpdateBehavior\":\"MergeNewColumns\"}},\"Grouping\":{\"TableGroupingPolicy\":\"CombineCompatibleSchemas\"}}"

JSON

"CFNCrawlerSecDeraNUM": {
    "Type": "AWS::Glue::Crawler",
    "Properties": {
        "Name": {
            "Ref": "CFNCrawlerName"
        },
        "Role": {
            "Fn::GetAtt": [
                "CFNRoleSecDERA",
                "Arn"
            ]
        },
        "Description": "AWS Glue crawler to crawl SecDERA data",
        "DatabaseName": {
            "Ref": "CFNDatabaseName"
        },
        "Targets": {
            "S3Targets": [
                {
                    "Exclusions": [
                        "*/readme.htm",
                        "*/sub.txt",
                        "*/pre.txt",
                        "*/tag.txt"
                    ]
                },
                {
                    "Path": "s3://sec-input"
                }
            ]
        },
        "TablePrefix": {
            "Ref": "CFNTablePrefixName"
        },
        "SchemaChangePolicy": {
            "UpdateBehavior": "UPDATE_IN_DATABASE",
            "DeleteBehavior": "LOG"
        },
        "Configuration": "{\"Version\":1.0,\"CrawlerOutput\":{\"Partitions\":{\"AddOrUpdateBehavior\":\"InheritFromTable\"},\"Tables\":{\"AddOrUpdateBehavior\":\"MergeNewColumns\"}},\"Grouping\":{\"TableGroupingPolicy\":\"CombineCompatibleSchemas\"}}"
    }
}

Best answer

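You are passing Exclusions as a new S3Target object in the S3Targets list. Each "-" item under S3Targets is a separate S3Target, so the template defines one target that has only Exclusions and a second target that has only Path.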

Try changing this:

  Targets:
    S3Targets:
      - Exclusions:
          - "*/readme.htm"
          - "*/sub.txt"
          - "*/pre.txt"
          - "*/tag.txt"
      - Path: "s3://sec-input"

to this:

  Targets:
    S3Targets:
      - Path: "s3://sec-input"
        Exclusions:
          - "*/readme.htm"
          - "*/sub.txt"
          - "*/pre.txt"
          - "*/tag.txt"
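
For the JSON form of the template shown in the question, the same fix would presumably look like the sketch below: Path and Exclusions sit inside one S3Target object instead of two. The fragment is only an illustration derived from the YAML above. It shows just the Targets property, wrapped in braces so that it parses on its own.

```json
{
    "Targets": {
        "S3Targets": [
            {
                "Path": "s3://sec-input",
                "Exclusions": [
                    "*/readme.htm",
                    "*/sub.txt",
                    "*/pre.txt",
                    "*/tag.txt"
                ]
            }
        ]
    }
}
```

In the full template, only the Targets value inside Properties should need to change; the rest of the crawler resource can stay as it is.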

Regarding "amazon-web-services - AWS Glue CloudFormation Exclusions pattern Exclude: String", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/58850677/
