python - Accessing data in chunked JSON files on S3 from Blaze

I'm trying to access line-delimited JSON data on S3. From my understanding of the docs, I should be able to do something like

print data(S3(Chunks(JSONLines))('s3://KEY:SECRET@bucket/dir/part-*.json')).peek()

which throws

BotoClientError: BotoClientError: Bucket names cannot contain upper-case characters when using either the sub-domain or virtual hosting calling format.

I've tried variations on this, which lead to different errors.

I can get the following to work with local files:

print data(chunks(JSONLines)(map(JSONLines, glob("/home/me/data/*")))).peek()

I'm not quite sure why the (map(JSONLines, glob( part is needed, though.

I don't really understand how to use the type modifiers.

Best answer

The comments section of Example 6 on this page, http://www.programcreek.com/python/example/51587/boto.exception.BotoClientError, states:

Bucket names must not contain uppercase characters. We check for this by appending a lowercase character and testing with islower(). Note this also covers cases like numeric bucket names with dashes.

The function used for this check is check_lowercase_bucketname(n). Calling it on a few examples gives:

>>> check_lowercase_bucketname("Aaaa")
Traceback (most recent call last):
...
BotoClientError: S3Error: Bucket names cannot contain upper-case
characters when using either the sub-domain or virtual hosting calling
format.

>>> check_lowercase_bucketname("1234-5678-9123")
True
>>> check_lowercase_bucketname("abcdefg1234")
True
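
For reference, the check described in that comment can be sketched in a few lines. This is a minimal stand-alone version, not boto's own code: the function name check_lowercase_name is made up for illustration, and it raises ValueError instead of BotoClientError so that it runs without boto installed.

```python
def check_lowercase_name(name):
    """Stand-alone sketch of the check described above (hypothetical helper, not boto's code)."""
    # Appending 'a' guarantees at least one cased character, so names made
    # only of digits and dashes still pass islower().
    if not (name + "a").islower():
        raise ValueError(
            "Bucket names cannot contain upper-case characters when using "
            "either the sub-domain or virtual hosting calling format."
        )
    return True
```

Any string that contains an upper-case letter fails, which is why "Aaaa" raises while "1234-5678-9123" and "abcdefg1234" pass.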

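The above leads me to believe that your call with 's3://KEY:SECRET@bucket/dir/part-*.json' is failing because the KEY and/or SECRET values contain upper-case or otherwise disallowed characters.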
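
One way to test that hypothesis without touching S3 is to split the URI and look at each piece. The sketch below uses only the Python 3 standard library. The helper has_uppercase is hypothetical, and the idea that the credentials end up being checked as part of the bucket name is an assumption about how the URI is handled, not something the traceback confirms.

```python
from urllib.parse import urlsplit

# Placeholder credentials and bucket, copied from the question.
uri = "s3://KEY:SECRET@bucket/dir/part-*.json"
parts = urlsplit(uri)


def has_uppercase(text):
    # Same trick as the quoted check: append a lowercase letter, then test islower().
    return not (text + "a").islower()


# The authority part of the URI still carries the credentials.
print(parts.netloc)                   # KEY:SECRET@bucket
# Checked on its own, the bucket name passes; the raw authority does not.
print(has_uppercase(parts.hostname))  # False
print(has_uppercase(parts.netloc))    # True
```

If the real bucket name itself contains upper-case letters, the same check fails regardless of the credentials, so it is worth testing both pieces.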

Source: a similar question on Stack Overflow, https://stackoverflow.com/questions/42603299/
