I'm trying to read a DataFrame from Google Cloud Storage with PySpark, but I keep getting an error saying the service account does not have the storage.objects.create permission. The account deliberately has no WRITER permission, and the job only reads a Parquet file:
spark_session.read.parquet(input_path)
18/12/25 13:12:00 INFO com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl: Repairing batch of 1 missing directories.
18/12/25 13:12:01 ERROR com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl: Failed to repair some missing directories.
com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.json.GoogleJsonResponseException: 403 Forbidden
{
"code" : 403,
"errors" : [ {
"domain" : "global",
"message" : "***.gserviceaccount.com does not have storage.objects.create access to ***.",
"reason" : "forbidden"
} ],
"message" : "***.gserviceaccount.com does not have storage.objects.create access to ***."
}
Accepted answer
We found the problem. It is caused by the GCS connector's implicit automatic directory repair. We disabled this behavior by setting fs.gs.implicit.dir.repair.enable to false.
A similar question, "pyspark - Google Cloud Storage requires storage.objects.create permission when reading from pyspark", can be found on Stack Overflow: https://stackoverflow.com/questions/53922777/