python - Apache Beam Google Datastore ReadFromDatastore 实体 protobuf

标签 python google-cloud-datastore protocol-buffers google-cloud-dataflow apache-beam

我正在尝试使用 apache beam 的 google datastore api 来 ReadFromDatastore

p = beam.Pipeline(options=options)
(p
 | 'Read from Datastore' >> ReadFromDatastore(gcloud_options.project, query)
 | 'reformat'            >> beam.Map(reformat)
 | 'Write To Datastore'  >> WriteToDatastore(gcloud_options.project))

传递给我的重新格式化函数的对象是类型

google.cloud.proto.datastore.v1.entity_pb2.Entity

它是protobuf格式,很难修改或读取。

我想我可以将 entity_pb2.Entity 转换为字典

entity= dict(google.cloud.datastore.helpers._property_tuples(entity_pb))

但出于某种原因,尝试导入以下两个库时出现一些错误:

import google.cloud.datastore.helpers  
from apache_beam.io.gcp.datastore.v1.datastoreio import ReadFromDatastore 

错误:

Traceback (most recent call last):
  File "/home/nburn42/MotoGarage/MotoGarage/MotoGarageBackgroundJobs/format_data.py", line 16, in <module>
    import google.cloud.datastore.helpers
  File "/usr/local/lib/python2.7/dist-packages/google/cloud/datastore/__init__.py", line 57, in <module>
    from google.cloud.datastore.batch import Batch
  File "/usr/local/lib/python2.7/dist-packages/google/cloud/datastore/batch.py", line 24, in <module>
    from google.cloud.datastore import helpers
  File "/usr/local/lib/python2.7/dist-packages/google/cloud/datastore/helpers.py", line 29, in <module>
    from google.cloud.grpc.datastore.v1 import entity_pb2 as _entity_pb2
  File "/usr/local/lib/python2.7/dist-packages/google/cloud/grpc/datastore/v1/entity_pb2.py", line 28, in <module>
    dependencies=[google_dot_api_dot_annotations__pb2.DESCRIPTOR,google_dot_protobuf_dot_struct__pb2.DESCRIPTOR,google_dot_protobuf_dot_timestamp__pb2.DESCRIPTOR,google_dot_type_dot_latlng__pb2.DESCRIPTOR,])
  File "/usr/local/lib/python2.7/dist-packages/google/protobuf/descriptor.py", line 824, in __new__
    return _message.default_pool.AddSerializedFile(serialized_pb)
TypeError: Couldn't build proto file into descriptor pool!
Invalid proto descriptor for file "google/cloud/grpc/datastore/v1/entity.proto":
  google.datastore.v1.PartitionId.project_id: "google.datastore.v1.PartitionId.project_id" is already defined in file "google/cloud/proto/datastore/v1/entity.proto".
  google.datastore.v1.PartitionId.namespace_id: "google.datastore.v1.PartitionId.namespace_id" is already defined in file "google/cloud/proto/datastore/v1/entity.proto".
  google.datastore.v1.PartitionId: "google.datastore.v1.PartitionId" is already defined in file "google/cloud/proto/datastore/v1/entity.proto".
  google.datastore.v1.Key.partition_id: "google.datastore.v1.Key.partition_id" is already defined in file "google/cloud/proto/datastore/v1/entity.proto".
  google.datastore.v1.Key.path: "google.datastore.v1.Key.path" is already defined in file "google/cloud/proto/datastore/v1/entity.proto".
  google.datastore.v1.Key.PathElement.id_type: "google.datastore.v1.Key.PathElement.id_type" is already defined in file "google/cloud/proto/datastore/v1/entity.proto".
  google.datastore.v1.Key.PathElement.kind: "google.datastore.v1.Key.PathElement.kind" is already defined in file "google/cloud/proto/datastore/v1/entity.proto".
  google.datastore.v1.Key.PathElement.id: "google.datastore.v1.Key.PathElement.id" is already defined in file "google/cloud/proto/datastore/v1/entity.proto".
  google.datastore.v1.Key.PathElement.name: "google.datastore.v1.Key.PathElement.name" is already defined in file "google/cloud/proto/datastore/v1/entity.proto".
  google.datastore.v1.Key.PathElement: "google.datastore.v1.Key.PathElement" is already defined in file "google/cloud/proto/datastore/v1/entity.proto".
  google.datastore.v1.Key: "google.datastore.v1.Key" is already defined in file "google/cloud/proto/datastore/v1/entity.proto".
  google.datastore.v1.ArrayValue.values: "google.datastore.v1.ArrayValue.values" is already defined in file "google/cloud/proto/datastore/v1/entity.proto".
  google.datastore.v1.ArrayValue: "google.datastore.v1.ArrayValue" is already defined in file "google/cloud/proto/datastore/v1/entity.proto".
  google.datastore.v1.Value.value_type: "google.datastore.v1.Value.value_type" is already defined in file "google/cloud/proto/datastore/v1/entity.proto".
  google.datastore.v1.Value.null_value: "google.datastore.v1.Value.null_value" is already defined in file "google/cloud/proto/datastore/v1/entity.proto".
  google.datastore.v1.Value.boolean_value: "google.datastore.v1.Value.boolean_value" is already defined in file "google/cloud/proto/datastore/v1/entity.proto".
  google.datastore.v1.Value.integer_value: "google.datastore.v1.Value.integer_value" is already defined in file "google/cloud/proto/datastore/v1/entity.proto".
  google.datastore.v1.Value.double_value: "google.datastore.v1.Value.double_value" is already defined in file "google/cloud/proto/datastore/v1/entity.proto".
  google.datastore.v1.Value.timestamp_value: "google.datastore.v1.Value.timestamp_value" is already defined in file "google/cloud/proto/datastore/v1/entity.proto".
  google.datastore.v1.Value.key_value: "google.datastore.v1.Value.key_value" is already defined in file "google/cloud/proto/datastore/v1/entity.proto".
  google.datastore.v1.Value.string_value: "google.datastore.v1.Value.string_value" is already defined in file "google/cloud/proto/datastore/v1/entity.proto".
  google.datastore.v1.Value.blob_value: "google.datastore.v1.Value.blob_value" is already defined in file "google/cloud/proto/datastore/v1/entity.proto".
  google.datastore.v1.Value.geo_point_value: "google.datastore.v1.Value.geo_point_value" is already defined in file "google/cloud/proto/datastore/v1/entity.proto".
  google.datastore.v1.Value.entity_value: "google.datastore.v1.Value.entity_value" is already defined in file "google/cloud/proto/datastore/v1/entity.proto".
  google.datastore.v1.Value.array_value: "google.datastore.v1.Value.array_value" is already defined in file "google/cloud/proto/datastore/v1/entity.proto".
  google.datastore.v1.Value.meaning: "google.datastore.v1.Value.meaning" is already defined in file "google/cloud/proto/datastore/v1/entity.proto".
  google.datastore.v1.Value.exclude_from_indexes: "google.datastore.v1.Value.exclude_from_indexes" is already defined in file "google/cloud/proto/datastore/v1/entity.proto".
  google.datastore.v1.Value: "google.datastore.v1.Value" is already defined in file "google/cloud/proto/datastore/v1/entity.proto".
  google.datastore.v1.Entity.key: "google.datastore.v1.Entity.key" is already defined in file "google/cloud/proto/datastore/v1/entity.proto".
  google.datastore.v1.Entity.properties: "google.datastore.v1.Entity.properties" is already defined in file "google/cloud/proto/datastore/v1/entity.proto".
  google.datastore.v1.Entity.PropertiesEntry.key: "google.datastore.v1.Entity.PropertiesEntry.key" is already defined in file "google/cloud/proto/datastore/v1/entity.proto".
  google.datastore.v1.Entity.PropertiesEntry.value: "google.datastore.v1.Entity.PropertiesEntry.value" is already defined in file "google/cloud/proto/datastore/v1/entity.proto".
  google.datastore.v1.Entity.PropertiesEntry: "google.datastore.v1.Entity.PropertiesEntry" is already defined in file "google/cloud/proto/datastore/v1/entity.proto".
  google.datastore.v1.Entity: "google.datastore.v1.Entity" is already defined in file "google/cloud/proto/datastore/v1/entity.proto".
  google.datastore.v1.Key.partition_id: "google.datastore.v1.PartitionId" seems to be defined in "google/cloud/proto/datastore/v1/entity.proto", which is not imported by "google/cloud/grpc/datastore/v1/entity.proto".  To use it here, please add the necessary import.
  google.datastore.v1.Key.path: "google.datastore.v1.Key.PathElement" seems to be defined in "google/cloud/proto/datastore/v1/entity.proto", which is not imported by "google/cloud/grpc/datastore/v1/entity.proto".  To use it here, please add the necessary import.
  google.datastore.v1.ArrayValue.values: "google.datastore.v1.Value" seems to be defined in "google/cloud/proto/datastore/v1/entity.proto", which is not imported by "google/cloud/grpc/datastore/v1/entity.proto".  To use it here, please add the necessary import.
  google.datastore.v1.Value.key_value: "google.datastore.v1.Key" seems to be defined in "google/cloud/proto/datastore/v1/entity.proto", which is not imported by "google/cloud/grpc/datastore/v1/entity.proto".  To use it here, please add the necessary import.
  google.datastore.v1.Value.entity_value: "google.datastore.v1.Entity" seems to be defined in "google/cloud/proto/datastore/v1/entity.proto", which is not imported by "google/cloud/grpc/datastore/v1/entity.proto".  To use it here, please add the necessary import.
  google.datastore.v1.Value.array_value: "google.datastore.v1.ArrayValue" seems to be defined in "google/cloud/proto/datastore/v1/entity.proto", which is not imported by "google/cloud/grpc/datastore/v1/entity.proto".  To use it here, please add the necessary import.
  google.datastore.v1.Entity.PropertiesEntry.value: "google.datastore.v1.Value" seems to be defined in "google/cloud/proto/datastore/v1/entity.proto", which is not imported by "google/cloud/grpc/datastore/v1/entity.proto".  To use it here, please add the necessary import.
  google.datastore.v1.Entity.key: "google.datastore.v1.Key" seems to be defined in "google/cloud/proto/datastore/v1/entity.proto", which is not imported by "google/cloud/grpc/datastore/v1/entity.proto".  To use it here, please add the necessary import.
  google.datastore.v1.Entity.properties: "google.datastore.v1.Entity.PropertiesEntry" seems to be defined in "google/cloud/proto/datastore/v1/entity.proto", which is not imported by "google/cloud/grpc/datastore/v1/entity.proto".  To use it here, please add the necessary import.

我可以做些什么来将 entity_pb2.Entity 转换为可用的东西吗?
ReadFromDatastore 是否太新而不能立即实际使用?
我应该使用另一种方法吗?

谢谢,
弥敦道

最佳答案

我遇到了同样的问题和 accepted answer对我不起作用。

OP 有 3 个问题:

<强>1。我可以做些什么来将 entity_pb2.Entity 转换为可用的东西吗?

您没有具体说明在使用返回值时遇到的困难,但 entity_pb2.Entity 的所有实例都应该有一个 properties属性(property)。然后您应该能够使用它从您的实体中获取值。例如property_value = entity.properties.get('<your_property_name>')


更新:我想我现在可能知道 OP 所说的“可用”是什么意思了,即使您这样做了 property_value = entity.properties.get('<your_property_name>')您在 property_value 中获得的值是 Protocol Buffer 格式...所以要获得属性的字典,您可以这样做...

from googledatastore import helper

value_dict = dict((prop_name, helper.get_value(entity.properties.get(prop_name)),) for prop_name in entity.properties)

<强>2。 ReadFromDatastore 是否太新,目前无法实际使用?

我最初也有同样的想法,但我现在似乎已经开始工作了(请参阅下面我对问题 3 的回答)。

<强>3。我应该使用另一种方法吗?

绝对不能导入 google-cloud-datastore库到您的项目中。这样做会导致 TypeError: Couldn't build proto file into descriptor pool!导入时要提出的原始问题中的错误 ReadFromDatastore来自 apache_beam .

从我一直在做的调查/调试看来,apache-beam (v2.8.0) 的当前版本似乎是库与 google-cloud-datastore (v1.7.1) 不兼容图书馆。这意味着我们必须改为使用 bundled googledatastore (v7.0.1)库来实现我们想要的。

进一步阅读/引用:

https://cloud.google.com/blog/products/gcp/how-to-do-data-processing-and-analytics-from-google-app-engine-with-google-cloud-dataflow

https://github.com/amygdala/gae-dataflow

https://gcloud-python.readthedocs.io/en/0.10.0/_modules/gcloud/datastore/helpers.html

关于python - Apache Beam Google Datastore ReadFromDatastore 实体 protobuf,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/44117621/

相关文章:

python - 在 python 中使用公差唯一化数组/列表(相当于 uniquetol)

python - numpy.resize() 重新排列而不是调整大小?

python - GAE - 如何在没有连接的情况下生活?

google-app-engine - 给父项获取子实体/键的有效方法

java - 编译巨大的 .java 文件时 Eclipse 挂起

java - 用于 Protobuf 编译器的 gRPC Java Codegen 插件

python - Django1.6 IntegrityError 尽管有 transaction.atomic()?

python - BadKeyError : Invalid string key

c++ - Protobuf重复字段反序列化

python - 如何在 matplotlib 中隐藏曲面图后面的一条线?