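I'm trying to run Python code that downloads data in blocks from a source URL and streams them to a destination Cloud Storage blob. It works fine on a standalone PC, in local functions, and so on. But when I run it on GCP Cloud Run, it throws a strange error: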
AttributeError: 'GCSFile' object has no attribute 'gcsfs'
Full error:
Traceback (most recent call last):
  File "/home/<user>/.local/lib/python3.9/site-packages/fsspec/spec.py", line 1683, in __del__
    self.close()
  File "/home/<user>/.local/lib/python3.9/site-packages/fsspec/spec.py", line 1661, in close
    self.flush(force=True)
  File "/home/<user>/.local/lib/python3.9/site-packages/fsspec/spec.py", line 1527, in flush
    self._initiate_upload()
  File "/home/<user>/.local/lib/python3.9/site-packages/gcsfs/core.py", line 1443, in _initiate_upload
    self.gcsfs.loop,
AttributeError: 'GCSFile' object has no attribute 'gcsfs'
This has cost me a week. Any help or guidance is greatly appreciated; thanks in advance.
The code I actually used:
from flask import Flask, request
import os
import gcsfs
import requests

app = Flask(__name__)


@app.route('/urltogcs')
def urltogcs():
    try:
        os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "secret.json"
        gcp_file_system = gcsfs.GCSFileSystem(project='<project_id>')
        session = requests.Session()
        url = request.args.get('source', 'temp')
        blob_path = request.args.get('destination', 'temp')
        with session.get(url, stream=True) as r:
            r.raise_for_status()
            with gcp_file_system.open(blob_path, 'wb') as f_obj:
                for chunk in r.iter_content(chunk_size=1024 * 1024):
                    f_obj.write(chunk)
        return f'Successfully downloaded from {url} to {blob_path} :)'
    except Exception as e:
        print("Failure")
        print(e)
        return f'download failed for {url} :('


if __name__ == "__main__":
    app.run(debug=True, host="0.0.0.0", port=int(os.environ.get("PORT", 8080)))
Best answer
Your code (with the suggested changes) works for me:
main.py:
from flask import Flask, request
import os
import gcsfs
import requests

app = Flask(__name__)

# Both values come from the environment; Cloud Run sets PORT.
project = os.getenv("PROJECT")
port = os.getenv("PORT", 8080)


@app.route('/urltogcs')
def urltogcs():
    # Read the query parameters first so the failure message can always use url.
    url = request.args.get('source', 'temp')
    blob_path = request.args.get('destination', 'temp')
    try:
        gcp_file_system = gcsfs.GCSFileSystem(project=project)
        session = requests.Session()
        with session.get(url, stream=True) as r:
            r.raise_for_status()
            # Stream the response body to the blob in 1 MiB chunks.
            with gcp_file_system.open(blob_path, 'wb') as f_obj:
                for chunk in r.iter_content(chunk_size=1024 * 1024):
                    f_obj.write(chunk)
        return f'Successfully downloaded from {url} to {blob_path} :)'
    except Exception as e:
        print("Failure")
        print(e)
        return f'download failed for {url} :('


if __name__ == "__main__":
    app.run(debug=True, host="0.0.0.0", port=int(port))
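The chunked copy at the heart of main.py can be pulled out into a small helper so it can be exercised without a network or a bucket. This is only a sketch: stream_chunks is a hypothetical name, and it assumes nothing more than an iterable of bytes and a writable binary file-like object.

```python
import io


def stream_chunks(chunks, dest):
    """Write each non-empty chunk to dest and return the number of bytes written."""
    total = 0
    for chunk in chunks:
        if chunk:  # skip empty chunks
            dest.write(chunk)
            total += len(chunk)
    return total


# Local check with an in-memory buffer standing in for the blob.
buffer = io.BytesIO()
written = stream_chunks([b"ab", b"", b"cde"], buffer)
```

In the handler, the same helper could be called as stream_chunks(r.iter_content(chunk_size=1024 * 1024), f_obj), which keeps memory use bounded by the chunk size rather than the size of the download.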
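Note: main.py takes project from the environment, which isn't ideal. It would be better if gcsfs.GCSFileSystem didn't require project. Alternatively, project could be obtained from Google's metadata service. For convenience (!), I set it through the environment.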
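A rough sketch of the metadata-service alternative, using only the standard library. The helper names resolve_project and fetch_project_from_metadata are hypothetical, the fallback order (environment first, then metadata server) is an assumption, and the metadata endpoint is only reachable from inside Google Cloud, so the function returns None elsewhere.

```python
import os
import urllib.request

# Metadata server path for the project ID; requires the Metadata-Flavor header.
METADATA_URL = (
    "http://metadata.google.internal/computeMetadata/v1/project/project-id"
)


def fetch_project_from_metadata(timeout=2.0):
    """Ask the metadata server for the project ID (works only on Google Cloud)."""
    req = urllib.request.Request(METADATA_URL, headers={"Metadata-Flavor": "Google"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8").strip()


def resolve_project(env=None, fetch=fetch_project_from_metadata):
    """Prefer the PROJECT environment variable, then fall back to the metadata server."""
    env = os.environ if env is None else env
    project = env.get("PROJECT")
    if project:
        return project
    try:
        return fetch()
    except OSError:  # covers URLError and timeouts when no metadata server exists
        return None
```

With this in place, main.py could use project = resolve_project() instead of os.getenv("PROJECT"), and the --set-env-vars flag on deployment would become optional.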
requirements.txt:
Flask==2.2.2
gcsfs==2022.7.1
gunicorn==20.1.0
Dockerfile:
FROM python:3.10-slim
ENV PYTHONUNBUFFERED True
ENV APP_HOME /app
WORKDIR $APP_HOME
COPY . ./
RUN pip install --no-cache-dir -r requirements.txt
CMD exec gunicorn --bind :$PORT --workers 1 --threads 8 --timeout 0 main:app
Bash script:
BILLING="[YOUR-BILLING]"
PROJECT="[YOUR-PROJECT]"
REGION="[YOUR-REGION]"
BUCKET="[YOUR-BUCKET]"
# Create Project
gcloud projects create ${PROJECT}
# Associate with Billing Account
gcloud beta billing projects link ${PROJECT} \
  --billing-account=${BILLING}
# Enable services
SERVICES=(
  "artifactregistry"
  "cloudbuild"
  "run"
)
for SERVICE in "${SERVICES[@]}"
do
  gcloud services enable ${SERVICE}.googleapis.com \
    --project=${PROJECT}
done
# Create Bucket
gsutil mb -p ${PROJECT} gs://${BUCKET}
# Service Account
ACCOUNT=tester
EMAIL=${ACCOUNT}@${PROJECT}.iam.gserviceaccount.com
# Create Service Account
gcloud iam service-accounts create ${ACCOUNT} \
  --project=${PROJECT}
# Create Service Account key
gcloud iam service-accounts keys create ${PWD}/${ACCOUNT}.json \
  --iam-account=${EMAIL} \
  --project=${PROJECT}
# Ensure Service Account can write to storage
gcloud projects add-iam-policy-binding ${PROJECT} \
  --role=roles/storage.admin \
  --member=serviceAccount:${EMAIL}
# Only needed for local testing
export GOOGLE_APPLICATION_CREDENTIALS=${PWD}/${ACCOUNT}.json
# Deploy Cloud Run service
# Run service as Service Account
NAME="urltogcs"
gcloud run deploy ${NAME} \
  --source=${PWD} \
  --set-env-vars=PROJECT=${PROJECT} \
  --no-allow-unauthenticated \
  --service-account=${EMAIL} \
  --region=${REGION} \
  --project=${PROJECT}
# Grab the Cloud Run service's endpoint
ENDPOINT=$(gcloud run services describe ${NAME} \
  --region=${REGION} \
  --project=${PROJECT} \
  --format="value(status.url)")
# Cloud Run service requires auth
TOKEN=$(gcloud auth print-identity-token)
# This page
SRC="https://stackoverflow.com/questions/73393808/"
# Generate a GCS Object name by epoch
DST="gs://${BUCKET}/$(date +%s)"
curl \
  --silent \
  --get \
  --header "Authorization: Bearer ${TOKEN}" \
  --data-urlencode "source=${SRC}" \
  --data-urlencode "destination=${DST}" \
  --write-out '%{response_code}' \
  --output /dev/null \
  "${ENDPOINT}/urltogcs"
This returns the expected status code:
200
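For reference, the curl call passes source and destination as URL-encoded query parameters on a GET request. A minimal Python sketch of building that URL follows; build_request_url is a hypothetical helper, and an actual request would still need the Authorization: Bearer header with an identity token, as in the curl call.

```python
from urllib.parse import urlencode


def build_request_url(endpoint, source, destination):
    """Return the /urltogcs URL with both parameters URL-encoded."""
    query = urlencode({"source": source, "destination": destination})
    return f"{endpoint.rstrip('/')}/urltogcs?{query}"
```

Encoding matters here because both values contain characters such as ":" and "/" that would otherwise be ambiguous inside a query string.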
And the object appears in the bucket:
gsutil ls gs://${BUCKET}
gs://${BUCKET}/1660780270
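On the original traceback: it starts in __del__, which suggests (this is an assumption, not something confirmed above) that the GCSFile object was only partly constructed when it was garbage-collected, so the AttributeError may be masking an earlier failure. The toy class below is not gcsfs code; PartialWriter and its backend attribute are hypothetical and only illustrate the general Python behaviour.

```python
class PartialWriter:
    """Toy stand-in for a buffered file object (not gcsfs code)."""

    def __init__(self, backend, path):
        if not path:
            # Fails before self.backend is assigned.
            raise OSError("empty path")
        self.backend = backend

    def close(self):
        # Raises AttributeError if __init__ never reached the assignment.
        self.backend.flush()

    def __del__(self):
        # Python still runs __del__ on an object whose __init__ raised;
        # an exception here is reported on stderr and otherwise ignored.
        self.close()


def close_error_without_init():
    """Return the error message from close() on a writer whose __init__ never ran."""
    writer = object.__new__(PartialWriter)  # allocate without calling __init__
    try:
        writer.close()
    except AttributeError as exc:
        return str(exc)
    return None
```

If that is what happened in the question, the exception printed by print(e) in the handler is the one worth looking for in the Cloud Run logs, rather than the AttributeError raised during cleanup.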
Source: the Stack Overflow question "Google Cloud Storage file system, Python package error: AttributeError: 'GCSFile' object has no attribute 'gcsfs'" (google-cloud-platform): https://stackoverflow.com/questions/73393808/