将 json 解析为 avro 模式 : avro. schema.SchemaParseException 时出现 Python 异常:没有 "type"属性

标签 python json dictionary avro

我从文件中读取记录并将其转换为字典。后来我将该字典转换为 json 格式,以便我可以进一步尝试将其转换为 avro 模式。

到目前为止,这是我的代码片段:-

import json
from avro import schema, datafile, io

def json_to_avro():
        fo = open("avro_record.txt", "r")
        data = fo.readlines()
        final_header = []
        final_rec = []
        for header in data[0:1]:
            header = header.strip("\n")
            header = header.split(",")
            final_header = header
        for rec in data[1:]:
            rec = rec.strip("\n")
            rec = rec.split(" ")
            rec = ' '.join(rec).split()
            final_rec = rec
        final_dict = dict(zip(final_header,final_rec))
        #print final_dict
        json_dumps = json.dumps(final_dict, ensure_ascii=False)
        #print json_dumps
        SCHEMA = schema.parse(json_dumps)

json_to_avro()

当我打印 final_dict 时,输出是:-

{'TransportProtocol': 'udp', 'MSISDN': '+62696174735', 'ResponseCode': 'E6%B8    %B8%E%A%8%E%8%93&pfid=139ver=10.1.2.571title=Air%20fighter_pakage.apk', 'GGSN IP': '202.89.193.185', 'MSTimeZone': '+0008', 'Numbers of time period': '1', 'Mime Type': 'audio/aac', 'EndTime': '1462251588', 'OutBound': '709', 'Inbound': '35','Method': 'GET', 'RAT': 'ph', 'Referer': 'ghijk', 'TAC': '35893783', 'UserAgent': '961', 'MNC': '02', 'OutPayload': '0', 'CI': '34301', 'StartTime': '1462251588', 'DestinationIP':'ef50:5fcd:498e:c265:a37b:10ec:7984:c6a3', 'URL': 'http:///group1/M00/6F/B2/poYBAFYtlqiALni4AG51LNrVFEQ342.apk?pn=com.airfly.fightergame.en1949&caller=9game&m=kxV5msjNq6PPBXxz_cPqzg&t=1451175690&sid=1df9ab75-48c6-41a6-9b86-b0d98976378b&gid=628195&fz=7238956&pid=2&site=%E4%B9%9D%', 'SGSN IP': '202.89.204.5', 'InPayload': '100', 'Protocol': 'http', 'WebDomain': '3', 'Source IP': 'e5df:602a:5a83:eaf1:8049:23c4:0fb7:f78e', 'MCC': '515', 'LAC': '36202', 'FlushFlag': '0', 'APN': '.internet.globe.com.', 'DestinationPort': '80', 'SourcePort': '82', 'LineFormat': 'http7', 'IMSI': '515-02-040687823335'}

当我打印 json_dumps 时,输出是:-

{"TransportProtocol": "udp", "MSISDN": "+62696174735", "ResponseCode":"E6%B%B8%E5%AE%89%E5%8D%93&pfid=139&ver=10.1.2.571title=Air%20fighter_pakage.apk", "GGSN IP": "202.89.193.185", "MSTimeZone": "+0008", "Numbers of time period": "1", "Mime Type": "audio/aac", "EndTime": "1462251588", "OutBound": "709", "Inbound": "35", "Method": "GET", "RAT": "ph", "Referer": "ghijk", "TAC": "35893783", "UserAgent": "961", "MNC": "02", "OutPayload": "0", "CI": "34301", "StartTime": "1462251588", "DestinationIP": "ef50:5fcd:498e:c265:a37b:10ec:7984:c6a3", "URL": "http:///group1/M00/6F/B2/poYBAFYtlqiALni4AG51LNrVFEQ342.apk?pn=com.airfly.fightergame.en1949&caller=9game&m=kxV5msjNq6PPBXxz_cPqzg&t=1451175690&sid=1df9ab75-48c6-41a6-9b86-b0d98976378b&gid=628195&fz=7238956&pid=2&site=%E4%B9%9D%", "SGSN IP": "202.89.204.5", "InPayload": "100", "Protocol": "http", "WebDomain": "3", "Source IP": "e5df:602a:5a83:eaf1:8049:23c4:0fb7:f78e", "MCC": "515", "LAC": "36202", "FlushFlag": "0", "APN": ".internet.globe.com.", "DestinationPort": "80", "SourcePort": "82", "LineFormat": "http7", "IMSI": "515-02-040687823335"}

我想我想进一步将其转换为 avro 模式的 json 格式。但是

SCHEMA = schema.parse(json_dumps)

抛出异常:-

Traceback (most recent call last):
File "convertToAvro.py", line 23, in <module>
json_to_avro()
File "convertToAvro.py", line 20, in json_to_avro
SCHEMA = schema.parse(json_dumps)
File "/usr/lib/python2.7/site-packages/avro/schema.py", line 785, in parse
return make_avsc_object(json_data, names)
File "/usr/lib/python2.7/site-packages/avro/schema.py", line 756, in make_avsc_object
raise SchemaParseException('No "type" property: %s' % json_data)
avro.schema.SchemaParseException: No "type" property: {u'TransportProtocol':u'udp', u'MSISDN': u'+62696174735', u'ResponseCode': u'E6%B8%B8%E5%AE%89%E5%8D%93&pfid=139&ver=10.1.2.571&title=Air%20fighter_pakage.apk', u'GGSN IP': u'202.89.193.185', u'EndTime': u'1462251588', u'Method': u'GET',u'Mime Type': u'audio/aac', u'OutBound': u'709', u'Inbound': u'35',u'Numbers of time period': u'1', u'RAT': u'import jsonph', u'Referer':u'ghijk', u'TAC': u'35893783', u'UserAgent': u'961', u'MNC':u'02',u'OutPayload': u'0', u'CI': u'34301', u'DestinationPort': u'80',u'DestinationIP': u'ef50:5fcd:498e:c265:a37b:10ec:7984:c6a3', u'URL':u'http:///group1/M00//B/poYBAFYtlqiALni4AG51LNrVFEQ342.apk?pn=com.airfly.fightergame.en1949&caller=9game&m=kxV5msjNq6PPBXxz_cPqzg&t=1451175690&sid=1df9ab75-48c6-41a6-9b86-b0d98976378b&gid=628195&fz=7238956&pid=2&site=%E4%B9%9D%', u'SGSN IP': u'202.89.204.5', u'InPayload': u'100', u'Protocol': u'http', u'WebDomain': u'3', u'Source IP': u'e5df:602a:5a83:eaf1:8049:23c4:0fb7:f78e', u'MCC': u'515', u'MSTimeZone': u'+0008', u'FlushFlag': u'0', u'APN': u'.internet.globe.com.', u'StartTime': u'1462251588', u'SourcePort': u'82', u'LineFormat': u'http7', u'LAC': u'36202', u'IMSI': u'515-02-040687823335'}

以防万一,这是我的输入记录:-

Protocol,LineFormat,StartTime,EndTime,MSTimeZone,IMSI,MSISDN,TAC,MCC,MNC,LAC,CI,SGSNIP,GGSNIP,APN,RAT,WebDomain,SourceIP,DestinationIP,SourcePort,DestinationPort,TransportProtocol,FlushFlag,Numbers of time period,OutBound,Inbound,Method,URL,ResponseCode,UserAgent,MimeType,Referer,OutPayload,InPayload

http    http7   1462251588      1462251588      +0008       515-02-040687823335     +62696174735    35893783        515     02      36202   34301   202.89.204.5    202.89.193.185  .internet.globe.com.        ph  3               e5df:602a:5a83:eaf1:8049:23c4:0fb7:f78e ef50:5fcd:498e:c265:a37b:10ec:7984:c6a3 82      80      udp     0       1       709     35      GET     http:///group1/M00/6F/B2/poYBAFYtlqiALni4AG51LNrVFEQ342.apk?pn=com.airfly.fightergame.en1949&caller=9game&m=kxV5msjNq6PPBXxz_cPqzg&t=1451175690&sid=1df9ab75-48c6-41a6-9b86-b0d98976378b&gid=628195&fz=7238956&pid=2&site=%E4%B9%9D%        E6%B8%B8%E5%AE%89%E5%8D%93&pfid=139&ver=10.1.2.571&title=Air%20fighter_pakage.apk   961             audio/aac       ghijk   0       100

最佳答案

发生这种情况是因为 schema.parse() 函数中的参数必须是 avro-schema(不是记录本身),就像这里 ( https://avro.apache.org/docs/1.8.0/gettingstartedpython.html ):

schema = avro.schema.parse(open("user.avsc", "rb").read())

当您传递 json 记录时,它会中断。

关于将 json 解析为 avro 模式 : avro. schema.SchemaParseException 时出现 Python 异常:没有 "type"属性,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/38980367/

相关文章:

iPhone开发-调用外部JSON API(Apple会拒绝吗?)

JQuery 不处理 ColdFusion JSON

c# - Newtonsoft.Json.JsonSerializationException 未由用户代码处理

dictionary - 遍历自引用字典的 Pythonic 方式

python - 使用 linspace 和 ones 创建 Numpy 矩阵

python - 从网页中抓取 - python

python - 如何在 scikit 中对分类数据使用一种热编码器?

python - 一个易于刷新的 Python 线程队列

java - 在映射中多次添加相同的类对象作为 KEY

python - 将值设置为字典的字典值