python - AWS Athena boto3 Iceberg table - insert

Tags: python boto3 amazon-athena

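I am using AWS with Athena Iceberg. I am trying to insert new records into an Athena Iceberg table with boto3, but I get an internal error: GENERIC_INTERNAL_ERROR: Unable to commit without a transaction conflict. If the data manifest file was generated at * - my guess is that boto3 tries to add files to S3 the way it would for a regular Athena table. Any ideas?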

...

import time

import boto3
import pandas as pd


def athena_query_to_dataframe(db, s3Bucket, query):

    client = boto3.client('athena')
    # States Athena reports while the query is still in progress
    listOfInitialStatus = ['RUNNING', 'QUEUED']

    print('Starting Query Execution:')

    # Athena writes the query result files to this S3 location
    tempS3Path = 's3://{}'.format(s3Bucket)

    response = client.start_query_execution(
        QueryString = query,
        QueryExecutionContext = {
            'Database': db
        },
        ResultConfiguration = {
            'OutputLocation': tempS3Path,
        }
    )

    queryExecutionId = response['QueryExecutionId']
    print(client.get_query_execution(QueryExecutionId = queryExecutionId))

    status = client.get_query_execution(QueryExecutionId = queryExecutionId)['QueryExecution']['Status']['State']

    # Poll until the query leaves the QUEUED/RUNNING states, pausing between calls
    while status in listOfInitialStatus:
        time.sleep(1)
        status = client.get_query_execution(QueryExecutionId = queryExecutionId)['QueryExecution']['Status']['State']

    # Report the final state, including when the query finished before the first poll
    if status == 'SUCCEEDED':
        print('Query Succeeded!')
        paginator = client.get_paginator('get_query_results')
        query_results = paginator.paginate(
            QueryExecutionId = queryExecutionId,
            PaginationConfig = {'PageSize': 1000}
        )
    elif status == 'FAILED':
        print('Query Failed!')
    elif status == 'CANCELLED':
        print('Query Cancelled!')
    print(client.get_query_execution(QueryExecutionId = queryExecutionId))

    results = []
    rows = []

    print('Processing Response')


in_cmd = """ insert into my_iceberg_table values ('aaaaa','bbb');"""
athena_query_to_dataframe('my_db', 'my-bck/athena/tables/my_iceberg_table/', in_cmd)

...
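
For anyone reproducing this: the function above only prints 'Query Failed!', so the actual error text is easy to miss. The sketch below shows one way the reason could be pulled out of a get_query_execution response through the StateChangeReason field of the status. The helper name summarize_query_failure and the sample response are hypothetical and used only for illustration; the sample message is a placeholder, not the real error.

```python
def summarize_query_failure(execution_response):
    """Return a one-line summary of an Athena get_query_execution response."""
    status = execution_response['QueryExecution']['Status']
    state = status['State']
    # StateChangeReason holds the error text when a query fails or is cancelled;
    # it may be absent for queries that are still running or that succeeded.
    reason = status.get('StateChangeReason', 'no reason reported')
    return '{}: {}'.format(state, reason)


# Hypothetical response, shaped like the output of
# client.get_query_execution(QueryExecutionId=...). Not real Athena output.
sample = {
    'QueryExecution': {
        'QueryExecutionId': 'example-id',
        'Status': {
            'State': 'FAILED',
            'StateChangeReason': 'GENERIC_INTERNAL_ERROR: example message',
        },
    }
}

print(summarize_query_failure(sample))
```

Calling it on the response already printed at the end of the function would put the state and the error message on a single line.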

Best answer

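This was a permissions issue... Because this is an Iceberg table, I also needed Glue catalog permissions for the table and the schema. So I added this IAM permission: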

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": [
                "glue:GetDatabase",
                "glue:GetDatabases",
                "glue:CreateDatabase",
                "glue:UpdateTable",
                "glue:GetTable"
            ],
            "Resource": [
                "arn:aws:glue:*:<account>:table/<schema>/<table>",
                "arn:aws:glue:*:<account>:schema/AwsDataCatalog",
                "arn:aws:glue:*:<account>:database/<schema>",
                "arn:aws:glue:*:<account>:catalog"
            ]
        }
    ]
}
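
If the policy has to be applied from Python rather than the console, a minimal sketch could look like the following. It rebuilds the same document for concrete values and attaches it as an inline role policy with the IAM put_role_policy call. The answer does not say whether the permission was attached to a role or a user, so the role-based attachment is an assumption, as are the helper names build_glue_policy and attach_glue_policy, the account ID 123456789012, and the database name my_db.

```python
import json


def build_glue_policy(account, schema, table):
    """Build the Glue policy from the answer for one account, schema and table."""
    prefix = 'arn:aws:glue:*:{}:'.format(account)
    return {
        'Version': '2012-10-17',
        'Statement': [{
            'Sid': 'VisualEditor1',
            'Effect': 'Allow',
            'Action': [
                'glue:GetDatabase',
                'glue:GetDatabases',
                'glue:CreateDatabase',
                'glue:UpdateTable',
                'glue:GetTable',
            ],
            'Resource': [
                prefix + 'table/{}/{}'.format(schema, table),
                prefix + 'schema/AwsDataCatalog',
                prefix + 'database/{}'.format(schema),
                prefix + 'catalog',
            ],
        }],
    }


def attach_glue_policy(role_name, policy_name, policy):
    """Attach the policy inline to an IAM role (assumption: a role is used).

    Needs boto3 and AWS credentials; it is not called in this sketch.
    """
    import boto3  # imported here so the builder above works without boto3

    iam = boto3.client('iam')
    iam.put_role_policy(
        RoleName=role_name,
        PolicyName=policy_name,
        PolicyDocument=json.dumps(policy),
    )


# Placeholder values for illustration only
policy = build_glue_policy('123456789012', 'my_db', 'my_iceberg_table')
print(json.dumps(policy, indent=4))
```

The builder is kept separate from the attach step so the generated document can be inspected or diffed against the JSON above before anything is changed in IAM.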

Regarding python - AWS Athena boto3 Iceberg table - insert, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/73929159/
