google-cloud-platform - Google Cloud Storage file system (gcsfs) Python package error: AttributeError: 'GCSFile' object has no attribute 'gcsfs'

Tags: google-cloud-platform python-3.7 google-cloud-run

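I am trying to run Python code that downloads data in chunks from a source URL and streams it to a destination Cloud Storage blob. It works fine on a standalone PC, in a local function, and so on. But when I run it on GCP Cloud Run, it throws a strange error.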

AttributeError: 'GCSFile' object has no attribute 'gcsfs'

Full error:

Traceback (most recent call last):
  File "/home/<user>/.local/lib/python3.9/site-packages/fsspec/spec.py", line 1683, in __del__
    self.close()
  File "/home/<user>/.local/lib/python3.9/site-packages/fsspec/spec.py", line 1661, in close
    self.flush(force=True)
  File "/home/<user>/.local/lib/python3.9/site-packages/fsspec/spec.py", line 1527, in flush
    self._initiate_upload()
  File "/home/<user>/.local/lib/python3.9/site-packages/gcsfs/core.py", line 1443, in _initiate_upload
    self.gcsfs.loop,
AttributeError: 'GCSFile' object has no attribute 'gcsfs'
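A note on reading this traceback: every frame starts from __del__, so the error is raised while a file object is being finalized, not necessarily at the point of the original failure. In Python, __del__ still runs on an object whose __init__ raised partway through, and any attribute that __init__ had not yet assigned is then missing. The sketch below illustrates only that general behavior with a hypothetical Handle class; whether gcsfs fails this way in this case is an assumption that the traceback alone does not confirm.

```python
class Handle:
    """Hypothetical stand-in for a buffered file object (not gcsfs code)."""

    def __init__(self, path):
        # Fail before the attribute below is assigned, e.g. on a bad path.
        if "/" not in path:
            raise OSError(f"no object name in path: {path!r}")
        self.backend = "connected"

    def close(self):
        # Raises AttributeError if __init__ never reached the assignment.
        return self.backend

    def __del__(self):
        # Runs even for a half-constructed object; Python reports errors
        # raised here on stderr ("Exception ignored in ...") and carries on.
        self.close()


def open_handle(path):
    """Return the name of the exception the caller actually sees, or None."""
    try:
        Handle(path)
    except OSError as exc:
        return type(exc).__name__
    return None


if __name__ == "__main__":
    # The caller sees the OSError; the AttributeError from __del__ only
    # shows up as a separate message on stderr.
    print(open_handle("temp"))
```

If something similar is happening here, the AttributeError would be a side effect, and the exception printed by the except block in the handler would be the one worth looking at.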

This has cost me a week. Any help or guidance is much appreciated; thanks in advance.

The code actually used:

from flask import Flask, request
import os
import gcsfs
import requests

app = Flask(__name__)


@app.route('/urltogcs')
def urltogcs():
    try:
        os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "secret.json"
        gcp_file_system = gcsfs.GCSFileSystem(project='<project_id>')
        session = requests.Session()
        url = request.args.get('source', 'temp')
        blob_path = request.args.get('destination', 'temp')
        with session.get(url, stream=True) as r:
            r.raise_for_status()
            with gcp_file_system.open(blob_path, 'wb') as f_obj:
                for chunk in r.iter_content(chunk_size=1024 * 1024):
                    f_obj.write(chunk)
        return f'Successfully downloaded from {url} to {blob_path} :)'
    except Exception as e:
        print("Failure")
        print(e)
        return f'download failed for  {url} :('


if __name__ == "__main__":
    app.run(debug=True, host="0.0.0.0", port=int(os.environ.get("PORT", 8080)))

Best answer

Your code (with the suggested changes) works for me:

main.py:

from flask import Flask, request
import os
import gcsfs
import requests

app = Flask(__name__)

project = os.getenv("PROJECT")
port = os.getenv("PORT", 8080)

@app.route('/urltogcs')
def urltogcs():
    try:
        gcp_file_system = gcsfs.GCSFileSystem(project=project)
        session = requests.Session()
        url = request.args.get('source', 'temp')
        blob_path = request.args.get('destination', 'temp')
        with session.get(url, stream=True) as r:
            r.raise_for_status()
            with gcp_file_system.open(blob_path, 'wb') as f_obj:
                for chunk in r.iter_content(chunk_size=1024 * 1024):
                    f_obj.write(chunk)
        return f'Successfully downloaded from {url} to {blob_path} :)'
    except Exception as e:
        print("Failure")
        print(e)
        return f'download failed for  {url} :('


if __name__ == "__main__":
    app.run(debug=True, host="0.0.0.0", port=int(port))

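Note: the code takes project from the environment, which is not ideal. It would be better if gcsfs.GCSFileSystem did not require project. Alternatively, project could be obtained from Google's metadata service. For convenience (!), I set it through the environment.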
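The metadata-service alternative mentioned in the note could look roughly like the sketch below. It assumes the standard metadata endpoint that Google Cloud runtimes expose (metadata.google.internal, queried with the Metadata-Flavor: Google header). The helper names resolve_project and fetch_project_from_metadata are made up for this illustration, and PROJECT is the environment variable read in main.py.

```python
import os
import urllib.request

# Standard project-id path on the Google Cloud metadata server.
METADATA_URL = (
    "http://metadata.google.internal/computeMetadata/v1/project/project-id"
)


def fetch_project_from_metadata(timeout=2.0):
    """Ask the metadata server for the project ID (only works on GCP)."""
    req = urllib.request.Request(
        METADATA_URL, headers={"Metadata-Flavor": "Google"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8").strip()


def resolve_project(env=None, fetch=fetch_project_from_metadata):
    """Prefer the PROJECT environment variable, else query the metadata server."""
    env = os.environ if env is None else env
    project = env.get("PROJECT")
    if project:
        return project
    return fetch()


if __name__ == "__main__":
    # Locally, set PROJECT; on Cloud Run the metadata lookup is the fallback.
    print(resolve_project({"PROJECT": "my-local-project"}))
```

With something like this, main.py could call resolve_project() once at startup instead of requiring --set-env-vars at deploy time, while local runs would still work by exporting PROJECT.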

requirements.txt:

Flask==2.2.2
gcsfs==2022.7.1
gunicorn==20.1.0

Dockerfile:

FROM python:3.10-slim

ENV PYTHONUNBUFFERED True

ENV APP_HOME /app
WORKDIR $APP_HOME
COPY . ./

RUN pip install --no-cache-dir -r requirements.txt

CMD exec gunicorn --bind :$PORT --workers 1 --threads 8 --timeout 0 main:app

Bash script:

BILLING="[YOUR-BILLING]"
PROJECT="[YOUR-PROJECT]"
REGION="[YOUR-REGION]"
BUCKET="[YOUR-BUCKET]"

# Create Project
gcloud projects create ${PROJECT}

# Associate with Billing Account
gcloud beta billing projects link ${PROJECT} \
--billing-account=${BILLING}

# Enable services
SERVICES=(
  "artifactregistry"
  "cloudbuild"
  "run"
)
for SERVICE in ${SERVICES[@]}
do
  gcloud services enable ${SERVICE}.googleapis.com \
  --project=${PROJECT}
done

# Create Bucket
gsutil mb -p ${PROJECT} gs://${BUCKET}

# Service Account
ACCOUNT=tester
EMAIL=${ACCOUNT}@${PROJECT}.iam.gserviceaccount.com

# Create Service Account
gcloud iam service-accounts create ${ACCOUNT} \
--project=${PROJECT}

# Create Service Account key
gcloud iam service-accounts keys create ${PWD}/${ACCOUNT}.json \
--iam-account=${EMAIL} \
--project=${PROJECT}

# Ensure Service Account can write to storage
gcloud projects add-iam-policy-binding ${PROJECT} \
--role=roles/storage.admin \
--member=serviceAccount:${EMAIL}

# Only needed for local testing
export GOOGLE_APPLICATION_CREDENTIALS=${PWD}/${ACCOUNT}.json

# Deploy Cloud Run service
# Run service as Service Account
NAME="urltogcs"
gcloud run deploy ${NAME} \
--source=${PWD}  \
--set-env-vars=PROJECT=${PROJECT} \
--no-allow-unauthenticated \
--service-account=${EMAIL} \
--region=${REGION} \
--project=${PROJECT}

# Grab the Cloud Run service's endpoint
ENDPOINT=$(gcloud run services describe ${NAME} \
--region=${REGION} \
--project=${PROJECT} \
--format="value(status.url)")

# Cloud Run service requires auth
TOKEN=$(gcloud auth print-identity-token)

# This page
SRC="https://stackoverflow.com/questions/73393808/"

# Generate a GCS Object name by epoch
DST="gs://${BUCKET}/$(date +%s)"

curl \
--silent \
--get \
--header "Authorization: Bearer ${TOKEN}" \
--data-urlencode "source=${SRC}" \
--data-urlencode "destination=${DST}" \
--write-out '%{response_code}' \
--output /dev/null \
${ENDPOINT}/urltogcs

Yields:

200

And:

gsutil ls gs://${BUCKET}

gs://${BUCKET}/1660780270
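For reference, the curl call above combines --get with --data-urlencode, which appends the URL-encoded parameters to the query string. A minimal Python sketch of building the same request follows. build_urltogcs_request is a hypothetical helper, and the endpoint, token, and bucket values in the usage are placeholders rather than real ones.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_urltogcs_request(endpoint, token, source, destination):
    """Build an authenticated GET request for the /urltogcs route."""
    # urlencode percent-encodes ':' and '/' so both URLs survive as parameters.
    query = urlencode({"source": source, "destination": destination})
    return Request(
        f"{endpoint.rstrip('/')}/urltogcs?{query}",
        headers={"Authorization": f"Bearer {token}"},
    )


if __name__ == "__main__":
    req = build_urltogcs_request(
        endpoint="https://example-service.a.run.app",  # placeholder endpoint
        token="IDENTITY_TOKEN",  # placeholder identity token
        source="https://stackoverflow.com/questions/73393808/",
        destination="gs://example-bucket/1660780270",  # placeholder bucket
    )
    print(req.full_url)
```

Passing the resulting request to urllib.request.urlopen would send it; the sketch stops short of that so it can be run without a deployed service.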

Source: the original question on Stack Overflow, https://stackoverflow.com/questions/73393808/
