amazon-web-services - S3 performance for LIST by prefix with millions of objects in a single bucket

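I have a project in which an S3 bucket will hold roughly 80 million objects. Every day I will delete about 4 million of them and add 4 million new ones. The object names will follow a pseudo-directory structure: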

/012345/0123456789abcdef0123456789abcdef

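For the deletes, I will need to list all objects with the prefix 012345/ and then delete them. I am concerned about how long this LIST operation will take. While it seems clear that S3's access time for an individual asset does not grow with the number of objects, I have not found anything definitive saying that a LIST operation over 80MM objects, searching for 10 objects that all share the same prefix, will stay fast in such a large bucket.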
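For reference, the list-then-delete step described above could be sketched roughly as follows. This is only an illustration, not a tested production routine: the helper names `iter_keys` and `delete_prefix` are hypothetical, and `s3_client` is assumed to be a boto3 S3 client (for example one created with `boto3.client("s3")`). The batch size of 1000 reflects the per-request key limit of the DeleteObjects call.

```python
from typing import Any, Iterator, List


def iter_keys(s3_client: Any, bucket: str, prefix: str) -> Iterator[str]:
    """Yield every key under `prefix`, following LIST pagination."""
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent when a page holds no matching keys.
        for obj in page.get("Contents", []):
            yield obj["Key"]


def delete_prefix(s3_client: Any, bucket: str, prefix: str,
                  batch_size: int = 1000) -> int:
    """Delete all objects under `prefix`; return how many keys were submitted.

    Hypothetical helper: it does not inspect the "Errors" list that
    delete_objects may return, so real code should check that response.
    """
    batch: List[dict] = []
    submitted = 0
    for key in iter_keys(s3_client, bucket, prefix):
        batch.append({"Key": key})
        if len(batch) == batch_size:
            s3_client.delete_objects(
                Bucket=bucket, Delete={"Objects": batch, "Quiet": True}
            )
            submitted += len(batch)
            batch = []
    if batch:
        # Flush the final partial batch.
        s3_client.delete_objects(
            Bucket=bucket, Delete={"Objects": batch, "Quiet": True}
        )
        submitted += len(batch)
    return submitted
```

With about 10 objects per prefix, each prefix would need one LIST request and one batched delete request, so the cost in requests depends more on the number of prefixes than on the total object count.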

From a side comment on a question about the maximum number of objects that can be stored in a bucket (from 2008):

In my experience, LIST operations do take (linearly) longer as object count increases, but this is probably a symptom of the increased I/O required on the Amazon servers, and down the wire to your client.

From the Amazon S3 documentation:

There is no limit to the number of objects that can be stored in a bucket and no difference in performance whether you use many buckets or just a few. You can store all of your objects in a single bucket, or you can organize them across several buckets.

While I am inclined to believe the Amazon documentation, it is not entirely clear which operations their comment refers to.

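Before committing to this expensive plan, I would like to know definitively whether LIST operations that search by prefix remain fast when the bucket contains millions of objects. If anyone has real-world experience with buckets this large, I would love to hear from you.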
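One way to check this empirically, before committing, would be to time a prefix LIST against a test bucket as it fills up. The sketch below is only an assumption about how such a measurement might look: `time_prefix_list` is a hypothetical helper, and `s3_client` is assumed to be a boto3 S3 client.

```python
import time
from typing import Any, Tuple


def time_prefix_list(s3_client: Any, bucket: str, prefix: str) -> Tuple[int, float]:
    """Return (number of keys found, elapsed seconds) for one prefix listing."""
    start = time.perf_counter()
    count = 0
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # A page without "Contents" contributes zero keys.
        count += len(page.get("Contents", []))
    elapsed = time.perf_counter() - start
    return count, elapsed
```

Running this for the same prefix at different bucket sizes, and repeating each measurement several times to smooth out network noise, would show whether the elapsed time tracks the total object count or only the number of keys under the prefix.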

Best answer

Prefix searches are fast if you have chosen your prefixes correctly. An explanation is here: https://cloudnative.io/blog/2015/01/aws-s3-performance-tuning/

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/25061960/