I'm using Python code to export data from BigQuery to GCS, then gsutil to copy it to S3. After exporting to GCS, I noticed some files are larger than 5 GB, which gsutil cannot handle. Is there a way to limit the size of the exported files?
Best answer
Try using a single wildcard URI.
See the documentation on Exporting data into one or more files:
Use a single wildcard URI if you think your exported data will be larger than BigQuery's 1 GB per file maximum value. BigQuery shards your data into multiple files based on the provided pattern. If you use a wildcard in a URI component other than the file name, be sure the path component does not exist before exporting your data.
Property definition:
['gs://[YOUR_BUCKET]/file-name-*.json']
Creates:
gs://my-bucket/file-name-000000000000.json
gs://my-bucket/file-name-000000000001.json
gs://my-bucket/file-name-000000000002.json
...
Property definition:
['gs://[YOUR_BUCKET]/path-component-*/file-name.json']
Creates:
gs://my-bucket/path-component-000000000000/file-name.json
gs://my-bucket/path-component-000000000001/file-name.json
gs://my-bucket/path-component-000000000002/file-name.json
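A minimal sketch of such an export in Python, using the google-cloud-bigquery client library. The project, dataset, table, bucket, and prefix names are placeholders; the key point is the `*` wildcard in the destination URI, which tells BigQuery to shard the export into multiple numbered files, each under the per-file size limit:

```python
def make_wildcard_uri(bucket: str, prefix: str) -> str:
    """Build a sharded-export destination URI with a '*' wildcard."""
    return f"gs://{bucket}/{prefix}-*.json"


def export_table(table_id: str, bucket: str, prefix: str) -> None:
    """Export a BigQuery table to GCS as sharded newline-delimited JSON.

    Requires google-cloud-bigquery and valid GCP credentials.
    table_id has the form "project.dataset.table".
    """
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.ExtractJobConfig(
        destination_format=bigquery.DestinationFormat.NEWLINE_DELIMITED_JSON,
    )
    # Because the URI contains '*', BigQuery writes file-name-000000000000.json,
    # file-name-000000000001.json, ... instead of one oversized file.
    job = client.extract_table(
        table_id,
        make_wildcard_uri(bucket, prefix),
        job_config=job_config,
    )
    job.result()  # block until the extract job finishes
```

The resulting shards are small enough for gsutil to copy to S3 without hitting its 5 GB single-object limit.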
Regarding "google-bigquery - How to limit the size of files exported from BigQuery to GCS?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/44117092/