python - Read and write schemas when using the Python avro library

Tags: python python-2.7 avro

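The Avro specification allows the writer schema and the reader schema to differ, provided they match. The specification also allows aliases to bridge differences between the two schemas. The Python 2.7 code below attempts to illustrate this.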

import uuid
import avro.schema
import json
from avro.datafile import DataFileReader, DataFileWriter
from avro.io import DatumReader, DatumWriter


write_schema = {
    "namespace": "example.avro",
    "type": "record",
    "name": "User",
    "fields": [
         {"name": "name", "type": "string"},
         {"name": "favorite_number", "type": ["int", "null"]},
         {"name": "favorite_color", "type": ["string", "null"]}
     ]
}
writer = DataFileWriter(open("users.avro", "wb"), DatumWriter(write_schema))
writer.append({"name": "Alyssa", "favorite_number": 256})
writer.append({"name": "Ben", "favorite_number": 7, "favorite_color": "red"})
writer.close()

read_schema = {
    "namespace": "example.avro",
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "first_name", "type": "string", "aliases": ["name"]},
        {"name": "favorite_number", "type": ["int", "null"]},
        {"name": "favorite_color", "type": ["string", "null"]}
    ]
}

# Open the Avro file, passing both the writer schema and the reader schema
reader = DataFileReader(open("users.avro", "rb"), DatumReader(write_schema, read_schema))
reader.close()

This code fails with the following error message:

/Library/Frameworks/Python.framework/Versions/2.7/bin/python2.7 /Users/simonshapiro/python_beam/src/avrov_test.py
Traceback (most recent call last):
  File "/Users/simonshapiro/python_beam/src/avrov_test.py", line 67, in <module>
    writer.append({"name": "Alyssa", "favorite_number": 256})
  File "/Library/Python/2.7/site-packages/avro/datafile.py", line 196, in append
    self.datum_writer.write(datum, self.buffer_encoder)
  File "/Library/Python/2.7/site-packages/avro/io.py", line 768, in write
    if not validate(self.writers_schema, datum):
  File "/Library/Python/2.7/site-packages/avro/io.py", line 103, in validate
    schema_type = expected_schema.type
AttributeError: 'dict' object has no attribute 'type'

Process finished with exit code 1

When it is run without the differing schemas, using this line instead:

reader = DataFileReader(open("users.avro", "rb"), DatumReader())

it works fine.

Best answer

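After more work, I found that the schemas were not set up correctly: each schema has to be parsed into a schema object with avro.schema.parse(json.dumps(...)) rather than passed as a plain dict, and the writer schema goes to DataFileWriter as its third argument instead of to DatumWriter. This code works as expected: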

import uuid
import avro.schema
import json
from avro.datafile import DataFileReader, DataFileWriter
from avro.io import DatumReader, DatumWriter


write_schema = avro.schema.parse(json.dumps({
    "namespace": "example.avro",
    "type": "record",
    "name": "User",
    "fields": [
         {"name": "name", "type": "string"},
         {"name": "favorite_number", "type": ["int", "null"]},
         {"name": "favorite_color", "type": ["string", "null"]}
     ]
}))

writer = DataFileWriter(open("users.avro", "wb"), DatumWriter(), write_schema)
writer.append({"name": "Alyssa", "favorite_number": 256})
writer.append({"name": "Ben", "favorite_number": 7, "favorite_color": "red"})
writer.close()

read_schema = avro.schema.parse(json.dumps({
    "namespace": "example.avro",
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "first_name", "type": "string", "default": "", "aliases": ["name"]},
        {"name": "favorite_number", "type": ["int", "null"]},
        {"name": "favorite_color", "type": ["string", "null"]}
    ]
}))

# Open the Avro file, passing both the writer schema and the reader schema
reader = DataFileReader(open("users.avro", "rb"), DatumReader(write_schema, read_schema))
new_schema = reader.get_meta("avro.schema")
users = []
for user in reader:
    users.append(user)
reader.close()
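
To make the alias rule from the question concrete, here is a minimal pure-Python sketch of what matching a reader field to a writer field through an alias means. It is not the avro library's implementation: `resolve_with_aliases` is a hypothetical helper written for illustration, it ignores types entirely, and whether a particular avro release applies aliases during schema resolution is worth checking against that release's documentation.

```python
def resolve_with_aliases(record, reader_fields):
    """Rename writer-side keys to reader-side field names via aliases.

    Hypothetical helper for illustration only; not part of the avro package.
    """
    resolved = {}
    for field in reader_fields:
        # Candidate names: the reader's own name first, then its aliases.
        candidates = [field["name"]] + field.get("aliases", [])
        for candidate in candidates:
            if candidate in record:
                resolved[field["name"]] = record[candidate]
                break
        else:
            # No writer field matched: use the reader's default if one is declared.
            if "default" in field:
                resolved[field["name"]] = field["default"]
            else:
                raise KeyError("no value or default for field %r" % field["name"])
    return resolved


# Reader fields copied from the read schema in the answer above.
reader_fields = [
    {"name": "first_name", "type": "string", "default": "", "aliases": ["name"]},
    {"name": "favorite_number", "type": ["int", "null"]},
    {"name": "favorite_color", "type": ["string", "null"]},
]

# A record as it was written, using the writer's field name "name".
written = {"name": "Ben", "favorite_number": 7, "favorite_color": "red"}
print(resolve_with_aliases(written, reader_fields))
```

In this sketch the value stored under the writer's `name` key ends up under the reader's `first_name` key, and the declared default is used only when neither the field name nor any alias is present in the written record.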

Source: a similar question on Stack Overflow, "python - Read and write schemas when using the Python avro library": https://stackoverflow.com/questions/44487684/
